Apr 28, 2025·7 min read

React Query caching rules for operator screens by business risk

React Query caching rules should match business risk on operator screens, so teams set staleTime, polling, and focus refetching with fewer surprises.

Why defaults fail on operator screens

Operators do not open a screen to catch up on context. They open it to act. If the queue length, stock count, machine state, or fraud flag is even a little old, the next click can be wrong.

That is what separates an operator screen from a normal product page. A marketing page can survive stale data for a minute. An operations view often cannot. One bad value can send a driver to the wrong stop, approve an order that should pause, or make a failed job look healthy.

A small example makes the problem clear. Picture a support lead watching a live ticket board. The screen still shows one agent as available because the status query has not refreshed yet. The lead assigns a priority case to that person, waits, and loses ten minutes before anyone notices. The data was only slightly stale. The decision was fully wrong.

React Query defaults are broad on purpose. With the default staleTime of 0, data counts as stale as soon as it arrives, and stale queries refetch when a component mounts, the window regains focus, or the network reconnects. That works fine for many apps. It is a poor fit for screens where people make decisions from what they see in that exact moment.

The usual mistake is to start with network traffic. On operator screens, start with business risk instead. If stale data can cause a bad refund, a dispatch error, a missed alert, or a compliance issue, set freshness rules around that risk and accept the extra requests.

Good cache rules do not begin with library defaults. They begin with the cost of being wrong.

Sort data by business risk

Most operator screens mix very different kinds of data. A payment failure count, a low stock alert, and a user profile photo do not need the same freshness. Treat them the same, and you either waste requests or hide real problems.

A practical split usually has three groups. Action data needs a fast reaction: open incidents, failed payments, stock below reorder level, machines in alarm, or tickets waiting past SLA. Support data helps people understand what they are seeing: customer names, recent notes, order details, and status history. Background data changes slowly or barely affects a decision: settings, labels, reference tables, and help text.

Before you touch code, ask one blunt question for each value on screen: if this stays stale for 30 seconds, who pays for it?

Sometimes the answer is money. Sometimes it is backlog, missed stock movement, or a safety problem. Those fields deserve aggressive refetching. A color theme or last-updated label does not.

Set a freshness tolerance for each group in plain language. Action data might need updates every 5 to 15 seconds. Support data might be fine at 30 to 60 seconds. Background data can often stay fresh for 10 minutes or longer.

Those tolerances later become your staleTime and refetch settings. The number should come from business impact, not guesswork.
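
The three groups and their tolerances can be written down as a small lookup before any query code exists. The sketch below is one way to do that in TypeScript. The tier names and the exact millisecond values are assumptions picked from the ranges above, not fixed recommendations.

```typescript
// Hypothetical risk tiers; names and numbers are illustrative, taken from the ranges above.
type RiskTier = "action" | "support" | "background";

// Maximum acceptable age of the data, in milliseconds, per tier.
const MAX_AGE_MS: Record<RiskTier, number> = {
  action: 10_000,      // inside the 5 to 15 second range
  support: 60_000,     // upper end of 30 to 60 seconds
  background: 600_000, // 10 minutes
};

// Returns true when data of the given age is still within the tier's tolerance.
function isFreshEnough(tier: RiskTier, ageMs: number): boolean {
  return ageMs <= MAX_AGE_MS[tier];
}
```

A value such as MAX_AGE_MS.action can then be reused as the staleTime for every query in that tier, which keeps similar queries on the same rule.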

A warehouse screen shows this well. Low inventory alerts and failed barcode scans belong in the first group. Product descriptions and supplier contact names belong in the second. Static shift rules belong in the third. If an alert is late, work stops or the wrong item ships. If a supplier phone number is two minutes old, nobody cares.

Keep the same rule across similar queries. If every alert badge refreshes every 10 seconds except one that waits five minutes, operators stop trusting the whole screen. Predictable behavior matters almost as much as speed.

Define what "fresh enough" means

"Fresh enough" needs a number. If a value can be 30 seconds old and nobody makes a bad decision, say 30 seconds. If a late update can send work to the wrong person, miss an alert, or show the wrong stock level, say 5 seconds or less.

This is where teams often go wrong. They accept the default behavior, then debate it later when operators say the screen feels late. A better question is simple: what happens if this value is 5 seconds old, and what happens if it is 2 minutes old?

The answer changes by field. A live queue count on a support desk may need updates every 10 seconds because staff act on it right away. A shift summary can be 10 minutes old and still be fine because nobody uses it to make second-by-second decisions.

Match the refresh speed to how fast the work changes. If the work changes every few seconds, keep staleTime short and refetch often. If the work changes a few times per hour, let the cache sit longer so the screen stays calm and the backend does less work.

A simple rule sheet helps product, support, and engineering use the same language:

  • Alarm or incident status: 0 to 5 seconds old
  • Active queue, stock, or capacity numbers: 10 to 30 seconds old
  • Task lists that operators reopen often: 30 to 60 seconds old
  • Reports and summaries: 5 to 15 minutes old

These are not magic numbers. They are just a clear starting point. Support staff can usually judge them faster than engineers can. They will tell you, "If this number is one minute old, we can still work," or, "If this status is late by 20 seconds, we call the wrong customer."

That is much more useful than saying a screen should be "real time." It gives you an actual stale window and a reason behind it.

Set query options step by step

An operator screen works best when each query matches the cost of being wrong. A payment queue, a fraud alert list, and a country dropdown should not share the same cache policy just because they live on the same page.

Start with risk buckets

Start with staleTime. Ask one direct question: if this value is 30 seconds old, what can go wrong?

A simple split is usually enough:

  • Critical status data: staleTime from 0 to 5 seconds
  • Active work queues: staleTime around 5 to 15 seconds
  • Context panels and summaries: staleTime around 30 to 120 seconds
  • Reference data like labels or dropdowns: staleTime in minutes or hours

Once staleTime makes sense, decide whether the screen should refetch when the tab gets focus again. refetchOnWindowFocus is on by default and only fires for stale queries, so keep it for queries operators depend on after switching tabs, such as live queues or issue counts. Turn it off explicitly for slow-moving data. Otherwise every tab switch creates noise, spinners, and wasted requests.

Use refetchInterval only for data that people actually watch as it changes. Lists of jobs, incidents, orders, or messages often need polling. A customer profile card usually does not. If every query polls, the screen feels busy and the backend does extra work for little gain.
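
The choices above can be collected into a few named presets. This is a minimal sketch: the field names staleTime, refetchOnWindowFocus, and refetchInterval match React Query option names, but the preset names, the local CachePolicy type, and the specific numbers are assumptions for illustration.

```typescript
// Local type that mirrors three React Query options; defined here so the sketch stands alone.
interface CachePolicy {
  staleTime: number;               // ms the data counts as fresh
  refetchOnWindowFocus: boolean;   // refetch stale data when the tab regains focus
  refetchInterval: number | false; // polling interval in ms, or false for no polling
}

type PolicyName = "criticalStatus" | "activeQueue" | "contextPanel" | "referenceData";

const SECOND = 1_000;
const MINUTE = 60 * SECOND;

// Hypothetical presets, one per risk bucket from the list above.
const cachePolicies: Record<PolicyName, CachePolicy> = {
  criticalStatus: { staleTime: 0, refetchOnWindowFocus: true, refetchInterval: 5 * SECOND },
  activeQueue: { staleTime: 10 * SECOND, refetchOnWindowFocus: true, refetchInterval: 10 * SECOND },
  contextPanel: { staleTime: 60 * SECOND, refetchOnWindowFocus: false, refetchInterval: false },
  referenceData: { staleTime: 30 * MINUTE, refetchOnWindowFocus: false, refetchInterval: false },
};
```

With the library in place, a preset would typically be spread into the query call, for example useQuery({ queryKey, queryFn, ...cachePolicies.activeQueue }).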

Handle writes and failures honestly

After a write, invalidate the queries that show the changed record on the same screen. If an operator assigns a ticket, resolves an alert, or updates a shipment, the related list and detail panel should refresh right away. Do not wait for the next interval and hope the UI catches up.
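
One way to keep invalidation from being forgotten is to list, per action, which query keys show the changed record. The sketch below assumes a ticket screen. The action names and the query keys are hypothetical and would need to match the keys your queries actually use.

```typescript
type QueryKey = readonly unknown[];

// Hypothetical operator actions, for illustration only.
type TicketAction = "assign" | "resolve" | "reopen";

// Returns every query key that shows the changed ticket on the same screen.
function keysToInvalidate(action: TicketAction, ticketId: string): QueryKey[] {
  const keys: QueryKey[] = [
    ["tickets", "list"],
    ["tickets", "detail", ticketId],
  ];
  // Resolving or reopening also changes the open/resolved totals.
  if (action !== "assign") keys.push(["tickets", "totals"]);
  return keys;
}
```

In a mutation's onSuccess handler, each returned key would then be passed to queryClient.invalidateQueries({ queryKey: key }), so the list and detail panel refetch as soon as the write succeeds.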

Set retry rules carefully. React Query retries a failed query three times by default, with a growing delay between attempts. Automatic retries help with short network blips, but too many retries can hide a broken endpoint or a stuck backend job. For critical operator data, a small retry count is often better than quiet looping. Let the error show up, and make the missing data obvious.
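
A limited retry rule could look like the sketch below. React Query accepts a function of the form (failureCount, error) => boolean for its retry option. The HttpError shape and the limit of two retries are assumptions, and the sketch assumes failureCount is 0 on the first failure.

```typescript
// Hypothetical error shape: assumes the fetch layer attaches an HTTP status to errors.
interface HttpError {
  status?: number;
}

const MAX_RETRIES = 2;

// Decides whether a failed query should be retried.
function shouldRetry(failureCount: number, error: HttpError): boolean {
  const status = error.status;
  // Client errors (4xx) will not fix themselves, so surface them immediately.
  if (status !== undefined && status >= 400 && status < 500) return false;
  // With failureCount starting at 0, this allows at most two retries.
  return failureCount < MAX_RETRIES;
}
```

Passing this as retry: shouldRetry on a critical query lets a short network blip recover quietly while a broken endpoint shows an error after a few seconds instead of looping.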

A practical setup is usually boring: short staleTime for risky data, focus refetch only where tab switching matters, polling only for live lists, query invalidation after writes, and limited retries. That keeps the screen fresh enough without turning it into a constant reload cycle.

A simple operator screen example

Picture a support or dispatch queue that stays open all day. New requests arrive, jobs change hands, and people act fast. On that part of the screen, stale data causes real mistakes.

If one agent sees a job as unassigned for even a few seconds after someone else took it, the team wastes time. If a new urgent request appears late, a customer waits longer than they should. That is why the live queue should refresh every few seconds instead of following library defaults.

For waiting jobs, use a very short staleTime and a steady polling interval. In many teams, 3 to 5 seconds works well. If the queue moves more slowly, 10 seconds may be enough.

The totals on the same page need a different rule. A count of open jobs, overdue items, or resolved requests helps staff keep pace, but it does not need to refetch every time the browser tab gains focus. That usually adds noise, not clarity.

A better pattern is simple. When someone edits a record, closes a job, or reassigns work, refetch the totals right after that action. The numbers stay accurate when people need them, and the app avoids extra requests from focus changes.

History and notes belong in a third bucket. Agents read them for context: what happened before, what the customer said, what another teammate tried. That context matters, but it rarely changes the next second.

One setup that fits many operator screens:

  • Waiting jobs: staleTime near 0, refetch every 3 to 5 seconds
  • Totals: refetch after create, edit, assign, and close actions
  • History log: staleTime around 5 minutes
  • Internal notes: staleTime of 10 to 15 minutes unless staff edit them often

This is where cache settings stop feeling like technical details and start matching business risk. Keep the urgent part of the screen fresh. Let slower context stay cheap and calm.

Choose refetch triggers carefully

A screen can look calm and still be wrong. Bad refresh behavior usually starts when every trigger stays on by default, even though the screen already polls every few seconds.

When a tab gains focus, ask one practical question: did anything important likely change while the operator was away? If the page shows dispatch status, stock counts, or alert queues, a focus refetch often makes sense. If the screen already polls every 5 seconds, focus refetch may add load without helping much.

The same logic applies after the network reconnects. Some data should refresh at once because users may have missed a real event during the outage. Other data can wait for the next scheduled poll. Treat reconnect as a recovery trigger, not a switch you turn on everywhere.

Operators also like control. A visible manual refresh button helps when they suspect the screen is behind or when they just completed an action and want to confirm the result. Put it near the data it affects, not buried in a menu. Show a clear loading state so people know the app heard them.

Background tabs need different behavior too. If the screen is not visible, full-speed polling is usually waste. Slow it down or pause it unless the user must return to an up-to-date queue right away. For many operator screens, polling every 2 seconds in the foreground and every 30 to 60 seconds in the background is a safer trade.
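
The foreground and background split can be expressed as a small function. React Query pauses interval polling in hidden tabs unless refetchIntervalInBackground is set to true, so slowing down rather than pausing means enabling that option and computing the interval yourself. The constants and the function name below are assumptions.

```typescript
const FOREGROUND_POLL_MS = 2_000;
const BACKGROUND_POLL_MS = 30_000;

// Picks a polling interval from tab visibility. Returning false means "do not poll".
function pollIntervalFor(tabVisible: boolean, pauseWhenHidden = false): number | false {
  if (tabVisible) return FOREGROUND_POLL_MS;
  return pauseWhenHidden ? false : BACKGROUND_POLL_MS;
}
```

In a browser, tabVisible would usually come from document.visibilityState === "visible", and the result would feed the refetchInterval option of the live queue query.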

One mistake causes a lot of confusion: teams stack polling, focus refetch, and reconnect refetch on the same hot query. Then a user clicks back into the tab right as the poll fires, and the app sends duplicate requests for the same data. That does not improve freshness. It just makes behavior harder to reason about.

A short rule set works well:

  • Use focus refetch for data that can change a lot while the user is away
  • Use reconnect refetch for data that should recover fast after outages
  • Add manual refresh where operators check outcomes often
  • Slow or pause polling in background tabs
  • Do not combine every trigger on hot queries unless you can explain why
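
The last rule is easy to check mechanically. The sketch below flags extra triggers on queries that already poll fast. The field names mirror React Query options, while the 5 second threshold for a "hot" query and the function name are assumptions.

```typescript
interface TriggerPolicy {
  refetchInterval: number | false;
  refetchOnWindowFocus: boolean;
  refetchOnReconnect: boolean;
}

const HOT_POLL_MS = 5_000; // assumed threshold for a "hot" query

// Returns the triggers that need an explicit justification on a fast-polling query.
function triggersToJustify(policy: TriggerPolicy): string[] {
  const isHot = policy.refetchInterval !== false && policy.refetchInterval <= HOT_POLL_MS;
  if (!isHot) return [];
  const flagged: string[] = [];
  if (policy.refetchOnWindowFocus) flagged.push("refetchOnWindowFocus");
  if (policy.refetchOnReconnect) flagged.push("refetchOnReconnect");
  return flagged;
}
```

A check like this can run in a unit test over all shared policies, so a stacked trigger shows up in review instead of in production traffic.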

Good production behavior is usually quiet. The screen updates when it should, stays still when it can, and gives operators a clear way to ask for fresh data.

Common mistakes

Teams often pick one staleTime for every query because it feels tidy. It is also one of the fastest ways to get the wrong screen behavior. A job queue with customer impact should not refresh on the same schedule as a sidebar count, a help note, or a list of yesterday's completed tasks.

Other teams make the opposite mistake. They poll low-risk data all day because nobody wants to miss a change. That burns requests, adds noise to logs, and makes rate limits more likely. If a panel changes a few times per shift, polling it every 5 seconds is not caution. It is waste.

Mutations cause another class of bug. A user acknowledges an alert, reassigns a ticket, or marks a batch as complete, but the team forgets to invalidate the related query. The action succeeds, yet the screen still shows the old state. Operators click again because they think the first attempt failed.

Loading states can also hurt trust. Some teams hide stale but still useful data behind a full-screen spinner whenever a refetch starts. On an operator screen, that is often worse than showing data that is 20 seconds old with a clear refresh indicator. If a table disappears during every refetch, people stop trusting it and start checking another tool instead.
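
The difference between a first load and a background refetch can be made explicit in the view logic. In the sketch below, data and isFetching mirror fields on a React Query result. The view names are hypothetical, and error handling is left out to keep the example short.

```typescript
type PanelView = "spinner" | "table" | "table-with-refresh-hint";

// Chooses what a panel should render without hiding data that is already on screen.
function panelView<T>(result: { data: T | undefined; isFetching: boolean }): PanelView {
  if (result.data === undefined) return "spinner"; // nothing to show yet
  return result.isFetching ? "table-with-refresh-hint" : "table";
}
```

The table only gives way to a spinner when there is truly nothing to show. Every later refetch keeps the rows visible and adds a small hint instead.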

Timestamps matter more than many teams expect. If the screen shows no "last updated" time, operators have to guess whether the numbers are fresh, delayed, cached, or broken. A small label like "Updated 12 seconds ago" removes that guesswork.
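
A label like that takes only a few lines. In the sketch below, updatedAtMs is assumed to come from something like React Query's dataUpdatedAt field, which is a timestamp in milliseconds. The function name and the wording of the label are illustrative.

```typescript
// Formats the age of a panel's data as a short, human-readable label.
function formatUpdatedLabel(updatedAtMs: number, nowMs: number): string {
  // Clamp at 0 so clock skew never produces a negative age.
  const seconds = Math.max(0, Math.floor((nowMs - updatedAtMs) / 1000));
  if (seconds < 60) return `Updated ${seconds} second${seconds === 1 ? "" : "s"} ago`;
  const minutes = Math.floor(seconds / 60);
  return `Updated ${minutes} minute${minutes === 1 ? "" : "s"} ago`;
}
```

The label needs its own timer to re-render, for example once per second, because the query result does not change while the data simply gets older.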

A quick smell test helps:

  • Every panel refreshes at the same rate
  • Low-risk widgets create most of the network traffic
  • User actions do not update nearby data right away
  • Refetching wipes the screen instead of keeping prior data
  • Nobody can tell when a panel last changed

Most of these problems do not come from React Query. They come from treating every query as if it carries the same business cost.

Checks before you ship

A screen can look correct in testing and still fail during a busy shift. A few quick checks catch most problems before operators do.

Start with the operator's view. They do not care about cache theory. They care about whether a number is current, whether a recent action changed the screen, and whether the app stays calm when the network drops and returns.

A warehouse example makes this obvious. If someone marks an order as packed, but the queue still shows the old count for 40 seconds, people stop trusting the screen and start refreshing at random. That usually creates more load, not less.

Show the age of changing data. A note like "Updated 12 seconds ago" removes guesswork. If one panel updates every 10 seconds and another every 2 minutes, make that visible too.

Refetch after actions that change real work. When an operator approves, assigns, ships, or cancels something, refresh the queries tied to that decision. Do not wait for the next poll if the action changes what they should do next.

Keep expensive queries quiet when nobody watches them. If a tab is hidden, a drawer is closed, or a detail panel is off screen, stop polling or slow it down. There is no reason to spend requests on data nobody can act on.

Treat reconnect as one recovery step, not a burst. If the network returns, refetch once for each shared query and then go back to the normal interval. Shared query keys and deduping help prevent four widgets from hitting the same endpoint at once.
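
The idea behind deduping is simple: while a request for a key is in flight, every other caller gets the same promise. React Query does this for components that share a query key. The sketch below is an illustration of the concept with hypothetical names, not the library's implementation.

```typescript
const inFlight = new Map<string, Promise<unknown>>();

// Returns the pending request for `key` if one exists; otherwise starts one.
function fetchOnce<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  const pending = inFlight.get(key);
  if (pending) return pending as Promise<T>;
  // Remove the entry once the request settles so the next poll can start a new one.
  const request = fetcher().finally(() => inFlight.delete(key));
  inFlight.set(key, request);
  return request;
}
```

Four widgets that call fetchOnce("queue", loadQueue) at the moment the network returns produce one request, which is the behavior you get from the library when those widgets share a query key.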

Ask support staff to explain the refresh rules out loud. They should be able to say, in about a minute, what updates itself, what refreshes after an action, what pauses in the background, and when a manual refresh still makes sense. If they cannot, the rules are probably too hard to live with.

What to do next

Start with a small risk table, not the code. Put every query from one operator screen into three buckets: low risk, annoying if stale, and expensive if stale. Then add one plain sentence for each query: what goes wrong if this data is 30 seconds old, 2 minutes old, or 10 minutes old.

That exercise usually settles most cache decisions faster than another round of debate about defaults. Operators do not work in the abstract. They feel the damage when a queue count is behind, a payment status lags, or a job still looks stuck after it already failed.

Pick one busy screen and watch real people use it. Sit beside support, operations, or sales for an hour. Ask them to point out every moment when they pause, refresh manually, second-guess the screen, or open another tool to confirm the truth. Those are the places where stale data costs time, money, or trust.

A simple process works well:

  • Write the risk table before changing any query option
  • Change one busy screen first, not the whole app
  • Add a short note beside each query that explains why its refresh rule exists
  • Review those notes after incidents, support spikes, or workflow changes

Keep those notes close to the code. A short comment like "Refetch every 15s because operators route urgent tickets from this count" is enough. Six months later, that note can stop someone from "cleaning up" the query back to a default that looks neat but hurts the team.
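
If comments tend to get lost, the note can become a required field. The sketch below is one possible convention, not a React Query feature: the DocumentedPolicy type, the definePolicy helper, and the minimum length check are all assumptions.

```typescript
interface DocumentedPolicy {
  staleTime: number;
  refetchInterval: number | false;
  reason: string; // required, so the rule cannot exist without its explanation
}

// Rejects policies whose reason is missing or too short to name a business risk.
function definePolicy(policy: DocumentedPolicy): DocumentedPolicy {
  if (policy.reason.trim().length < 10) {
    throw new Error("Cache policy needs a reason that names the business risk");
  }
  return policy;
}

const urgentTicketCount = definePolicy({
  staleTime: 0,
  refetchInterval: 15_000,
  reason: "Refetch every 15s because operators route urgent tickets from this count",
});
```

Only staleTime and refetchInterval would be handed to the query. The reason stays in the codebase for the next person who wonders why the number is not a default.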

Use a real test after each change. If operators stop mashing refresh, stop cross-checking another tool, and clear work faster, the rule is probably right. If they still hesitate, the screen may need shorter polling, a manual refresh button in a better place, or event-based updates for the small set of fields that carry the most risk.

If the screen affects revenue, on-call load, or support volume, an outside review can help. Oleg Sotnikov at oleg.is works as a Fractional CTO and startup advisor, and this kind of operator workflow review is often cheaper than turning every screen into a live feed that burns budget.

Cache settings are rarely universal. They should match the cost of being wrong.

Frequently Asked Questions

Why are React Query defaults a bad fit for operator screens?

Because operators act on what they see right now. If a queue, stock count, or alert stays old for even a few seconds, someone can assign the wrong job, miss an incident, or approve the wrong thing. Start from the cost of a wrong decision, not from the library defaults.

How do I choose the right staleTime?

Group queries by business risk first. If stale data can cause money loss, missed work, or a compliance problem, keep staleTime very short. If the data only adds context, give it a longer cache window and fewer refetches.

Which data needs the fastest refresh?

Refresh action data first. Open incidents, live queues, failed payments, stock alerts, and machine alarms usually need the fastest updates because people make immediate decisions from them. Notes, labels, and settings can wait longer.

Should every query use polling?

No. Poll the parts people watch as they change, like job queues or alert lists. Leave profile cards, help text, and slow-moving reference data on a longer cache so the screen stays calm and the backend does less work.

When should I turn on refetchOnWindowFocus?

Use it when people leave the tab and come back to data that may have changed a lot. It fits dispatch boards, issue counts, and live queues. Skip it on slow data, and skip it on hot queries that already poll often unless you have a clear reason.

What should happen after a mutation like assign or close?

Refresh the affected queries right after the action succeeds. If someone assigns a ticket, closes an alert, or ships an order, update the nearby list and detail view at once. Do not wait for the next polling cycle and hope the screen catches up.

Should I hide old data behind a spinner while refetching?

Keep the old data on screen and show that a refresh is happening. A full-screen spinner often hurts trust more than slightly old data, especially on busy tables. Operators usually prefer a stable view with a clear loading hint and a visible timestamp.

Do operator screens need a last updated timestamp?

Yes. A small note like "Updated 12 seconds ago" tells people whether the screen is fresh, delayed, or stuck. That removes guesswork and cuts down on random manual refreshes.

How should background tabs handle refetching?

Slow polling down or pause it when nobody watches the screen. Many teams keep a fast interval in the foreground and a much slower one in the background. That saves requests without making the screen feel behind when the user returns.

When should I use real-time events instead of polling?

Use polling for most changing data, then add event-based updates only where delay hurts the business. If a field drives urgent routing, revenue, or on-call work, push updates for that small set and keep the rest on normal cache rules. That gives you fresher data without turning the whole app into a live feed.