Error handling architecture for frontend teams that scales
Build an error handling architecture your frontend team can reuse across screens, with clear retries, support messages, and simple review rules.

Why teams end up with messy error states
A product can end up with five different error styles faster than most teams expect. The sign-in screen shows a red banner, the billing page uses a modal, the profile form leaves the button spinning, and the export screen prints raw server text. None of those choices feels huge on its own, but together they make the product feel unreliable.
Users notice that inconsistency right away. If one screen says "Try again" and another only says "Something went wrong," people stop trusting the message. They do not know if the action saved, if retry is safe, or if they should contact support.
That usually starts for ordinary reasons. Different teams ship screens at different times. Backend errors arrive in different formats. Designers focus on happy paths first and leave error states for later. Developers patch things under deadline pressure. No one owns a shared pattern across the product.
The result feels random. A customer updates a password and sees a clear message when the network drops. Later, the same customer tries to change a billing address and gets a blank form with no clue about what failed. Even if both failures came from the same API timeout, the product now feels inconsistent.
Support teams pay for that inconsistency. When messages are vague or missing, tickets arrive with almost no useful detail: "It broke," "The page froze," or "I clicked save and nothing happened." Support asks follow-up questions, engineers search logs, and everyone spends 20 minutes on a problem that a better message could have narrowed down in seconds.
The fix is not more custom UI. Teams need one error handling pattern that every screen can follow. That pattern should answer the same questions every time: what failed, can the person retry, what should they do next, and what detail should support receive.
Once people learn that pattern, the whole product gets easier to use. Errors feel less alarming, retries make sense, and support gets reports they can act on instead of guesses.
Choose the errors people can recover from
A calm interface starts with one simple split: user mistakes on one side, system failures on the other. If someone enters the wrong password, picks an invalid date, or leaves a required field empty, they can fix it now. If the server times out or a background job fails, they usually cannot.
That sounds obvious, but many teams skip this step. They show the same red banner for everything, then each screen invents its own behavior. Error handling gets much simpler when each error points to one clear next action.
Most teams only need four actions:
- Fix now - the user can change input and try again.
- Retry now - the app can safely try again right away.
- Retry later - the action may work later, but not now.
- Contact support - the user needs help because the app cannot recover safely.
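The four actions above can be written down as a small lookup so every screen asks the same question. This is a minimal sketch: the error codes and the `nextActionFor` helper are hypothetical names chosen for illustration, not part of any existing API.

```typescript
// The four next actions from the list above.
type NextAction = "fix-now" | "retry-now" | "retry-later" | "contact-support";

// Hypothetical catalog. The codes are illustrative, not taken from a real backend.
const ACTION_BY_CODE = new Map<string, NextAction>([
  ["CARD_EXPIRED", "fix-now"],
  ["NETWORK_TIMEOUT", "retry-now"],
  ["SERVICE_UNAVAILABLE", "retry-later"],
  ["DUPLICATE_CHARGE_RISK", "contact-support"],
]);

// Unknown codes fall back to support instead of promising a retry
// that the app cannot guarantee is safe.
function nextActionFor(code: string): NextAction {
  return ACTION_BY_CODE.get(code) ?? "contact-support";
}
```

Defaulting unknown codes to support is one possible choice. It keeps the recoverable group strict, because nothing lands there unless someone put it there on purpose.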
A billing screen makes the difference clear. An expired credit card is a user-fixable problem. A failed request caused by a weak mobile connection may deserve a retry. A duplicate charge warning should not retry at all. That case may need support, because a second attempt could make things worse.
Be strict about what goes into the recoverable group. If the user has no control over the cause, do not pretend they do. "Please try again" is fine for a timeout. It is a bad message for a permission bug, a corrupted response, or a server-side rule the UI cannot explain.
Most products only need a short catalog of recoverable cases. Ten to fifteen common errors is often enough for a first pass. If the list grows too fast, the team is usually naming backend details instead of user-facing situations.
Make these choices early and everyone works from the same map. Designers, frontend engineers, and support staff stop guessing, and screens stop inventing their own recovery rules.
Create one shared error shape
When every screen returns a different error object, the frontend starts guessing. One component shows a toast, another opens a modal, and a third dumps raw text on the page. A shared shape fixes that quickly and gives the team one rule to follow.
Keep the shape small. Most teams do fine with five fields:
- code: a short stable id like AUTH_EXPIRED or PAYMENT_TIMEOUT
- userMessage: plain language for the screen
- supportDetail: a technical note for logs, support, or a copy action
- retry: simple rules such as allowed, wait time, and max attempts
- context: safe facts like HTTP status, service name, or request id
This works because each field has one job. The user message tells a person what happened and what to do next. The support detail helps a teammate trace the problem without guessing which API failed.
Stable codes are worth the small effort. They let you map many backend failures to one clear UI pattern, and they give support the same label every time. Do not build codes from raw backend text. Write them once, keep them steady, and use them in logs, analytics, and ticket notes.
Store retry rules on the error when the screen needs them. A timeout may allow two silent retries and then show a button. An expired session should skip retry and send the person to sign in again. When the rule lives with the error, each screen needs less custom logic.
A simple shape can look like this:
{
  "code": "PAYMENT_TIMEOUT",
  "userMessage": "We could not confirm your payment. Try again in a minute.",
  "supportDetail": "billing confirm call timed out, request_id=9f2c, status=504",
  "retry": { "allowed": true, "afterMs": 60000, "maxAttempts": 2 },
  "context": { "service": "billing", "status": 504 }
}
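In a TypeScript codebase, the same shape could be described once as a type, with a runtime check for data that arrives from the network. This is a sketch under assumptions: the field names come from the JSON above, while the `AppError` and `RetryRule` type names, the optional markers, and the `isAppError` guard are illustrative choices.

```typescript
interface RetryRule {
  allowed: boolean;
  afterMs?: number;      // assumed optional: not every error needs a wait time
  maxAttempts?: number;  // assumed optional for the same reason
}

interface AppError {
  code: string;
  userMessage: string;
  supportDetail: string;
  retry: RetryRule;
  context: Record<string, string | number>;
}

// Runtime check for values parsed from a response body.
// It verifies only the fields the UI depends on.
function isAppError(value: unknown): value is AppError {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const retry = v.retry as Record<string, unknown> | null | undefined;
  return (
    typeof v.code === "string" &&
    typeof v.userMessage === "string" &&
    typeof v.supportDetail === "string" &&
    typeof retry === "object" &&
    retry !== null &&
    typeof retry.allowed === "boolean" &&
    typeof v.context === "object" &&
    v.context !== null
  );
}
```

A screen that receives something failing this check can treat it as an unknown error instead of guessing at its fields.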
Keep raw backend text out of the UI. If the server sends SQLSTATE errors, stack traces, or vendor messages, log them and translate them before they reach the screen. People need a clear next step, not a backend crash report.
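One way to enforce that rule is a single translation step between the network layer and the screen. The sketch below is hypothetical: `RawFailure`, `UiError`, `translateFailure`, the message wording, and the status-to-code mapping are assumptions for illustration, and a real product would map from its own backend signals.

```typescript
interface RawFailure {
  status: number;
  body: string;       // raw server text, possibly a stack trace or SQL error
  requestId: string;
}

interface UiError {
  code: string;
  userMessage: string;
  supportDetail: string;
}

// The raw body goes to the log callback only. The returned object
// contains nothing copied from the server text.
function translateFailure(raw: RawFailure, log: (line: string) => void): UiError {
  log(`request_id=${raw.requestId} status=${raw.status} body=${raw.body}`);
  const supportDetail = `request_id=${raw.requestId}, status=${raw.status}`;

  // Assumption: 401 means an expired session in this hypothetical API.
  if (raw.status === 401) {
    return { code: "AUTH_EXPIRED", userMessage: "Your session has expired. Sign in again to continue.", supportDetail };
  }
  if (raw.status === 504) {
    return { code: "REQUEST_TIMEOUT", userMessage: "The request took too long. Try again in a minute.", supportDetail };
  }
  return {
    code: "UNEXPECTED_ERROR",
    userMessage: "We could not complete this action. Contact support and share the error code.",
    supportDetail,
  };
}
```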
Write messages people can act on
A good error message helps someone finish the task. A bad one only reports that the app is unhappy. When people see "Something went wrong" or "Request failed," they stop, guess, or contact support with a screenshot and no context.
Each message should answer two plain questions: what happened, and what should I do now? Keep the wording close to the action on the screen. If a payment failed, say that. If a file upload timed out, say that. People should not have to translate system language into normal language.
A few examples make the gap obvious:
- "Request failed" leaves the user stuck.
- "We could not upload your file. Check your connection and try again." gives a clear next step.
- "Validation error" hides the real problem.
- "Enter an email address with a valid format, like name@example.com." tells the person how to fix it.
Plain language matters even more when the user did nothing wrong. Avoid blame. "You entered invalid data" feels cold and often misses the point. "We could not save this form because the date is missing" is clearer and easier to accept.
Keep technical detail out of the main message. Most users do not need HTTP codes, stack traces, database terms, or vendor names. Put those details in logs, monitoring, or support notes instead. If support may need to trace the problem, show a short reference code such as "Error code: INV-204" and keep the rest behind the scenes.
Buttons need the same discipline. Use short labels that match the next step. "Try again" works. "Check details" works. "Use another card" works. Labels like "OK," "Dismiss," or "Continue" often hide what will happen.
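The message, reference code, and button label can be assembled in one place so screens do not invent their own wording. This is an illustrative sketch: `Recovery`, `ErrorView`, and `toErrorView` are hypothetical names, and the "Contact support" label is an assumption that does not come from the examples above.

```typescript
type Recovery = "fix" | "retry" | "support";

interface ErrorView {
  message: string;      // plain language, no technical detail
  reference: string;    // short code the user can share with support
  buttonLabel: string;  // names the next step
}

// One label per recovery path, so "OK" and "Dismiss" never sneak in.
const BUTTON_LABELS: Record<Recovery, string> = {
  fix: "Check details",
  retry: "Try again",
  support: "Contact support", // assumed label
};

function toErrorView(userMessage: string, code: string, recovery: Recovery): ErrorView {
  return {
    message: userMessage,
    reference: `Error code: ${code}`,
    buttonLabel: BUTTON_LABELS[recovery],
  };
}
```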
This is where the architecture becomes visible. When every screen uses the same rules for tone, detail, and button text, users recover faster and support gets cleaner reports. One clear sentence can save ten confused clicks.
Plan retry flows step by step
Retries help when the problem is temporary, like a dropped mobile connection or a slow server. They hurt when the app repeats an action that changes data, charges money, or sends the same form twice. Good teams settle that rule before the screen ships.
Start by mapping each action on the page. Ask a plain question: if this request runs again, what can go wrong? Read-only requests usually retry safely. Actions that create, update, delete, or submit something need more care, even when the button looks harmless.
A simple rule set is enough for most products:
- Auto-retry data fetches, background refreshes, and other read requests.
- Retry updates only if the backend uses request IDs or version checks to block duplicates.
- Keep payments, destructive actions, and one-time submissions on manual retry.
- If the answer is unclear, choose manual retry first.
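The rule set above is small enough to fit in one function. The sketch below reflects one reading of those rules, in which a protected update may retry automatically and anything else waits for the user. `RequestKind`, `RetryMode`, and `retryModeFor` are hypothetical names.

```typescript
type RequestKind =
  | "read"
  | "update"
  | "payment"
  | "destructive"
  | "one-time-submit"
  | "unknown";

type RetryMode = "auto" | "manual";

// backendBlocksDuplicates: true when the backend uses request IDs or
// version checks to reject a repeated update.
function retryModeFor(kind: RequestKind, backendBlocksDuplicates: boolean): RetryMode {
  switch (kind) {
    case "read":
      return "auto";
    case "update":
      return backendBlocksDuplicates ? "auto" : "manual";
    default:
      // Payments, destructive actions, one-time submissions, and
      // anything unclear stay on manual retry.
      return "manual";
  }
}
```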
Feedback during retry matters as much as the rule itself. When the app starts another attempt, show a loading state that says what is happening. Disable the action button while that request is still running, or people will tap again and create a mess. If the app will try again after a delay, say so in plain words, such as "Trying again in 5 seconds."
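Disabling the button is a visual cue, and a small guard in the submit logic can back it up in case a second tap slips through. This is a minimal sketch with a hypothetical `createSubmitGuard` helper. It is framework-agnostic and only tracks whether a request is in flight.

```typescript
function createSubmitGuard() {
  let inFlight = false;
  return {
    // Returns true if the caller may start the request,
    // false if one is already running.
    tryStart(): boolean {
      if (inFlight) return false;
      inFlight = true;
      return true;
    },
    // Call when the request settles, whether it succeeded or failed.
    finish(): void {
      inFlight = false;
    },
    // Useful for binding the button's disabled state.
    isBusy(): boolean {
      return inFlight;
    },
  };
}
```

A submit handler would call `tryStart()` first and return early when it gets `false`, then call `finish()` once the request settles.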
A short pause between attempts usually feels better than instant repeats. Two or three tries are enough for most screens. After that, stop. Endless loops waste battery, hide the real failure, and make support harder because no one knows when the app actually gave up.
A simple rhythm for most screens
- On the first failure, retry once in the background if the request is safe.
- If it fails again, keep the screen stable and tell the user when the next attempt will start.
- After the final failed attempt, stop retrying and show one clear next step.
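That rhythm can be expressed as a pure function that decides what happens after each failure, which keeps timers and UI code out of the decision. The names `RetryPlan`, `RetryStep`, and `nextRetryStep` are hypothetical, the final message wording is an assumption, and the sketch applies only to requests that are safe to repeat.

```typescript
interface RetryPlan {
  maxAttempts: number; // total attempts, including the first one
  delayMs: number;     // pause before each retry
}

type RetryStep =
  | { kind: "retry"; delayMs: number; notice: string | null }
  | { kind: "stop"; notice: string };

// failedAttempts: how many attempts have failed so far (1 after the first failure).
function nextRetryStep(failedAttempts: number, plan: RetryPlan): RetryStep {
  if (failedAttempts >= plan.maxAttempts) {
    // Final failure: stop and show one clear next step.
    return { kind: "stop", notice: "We could not load this. Check your connection and try again." };
  }
  if (failedAttempts === 1) {
    // First failure: retry in the background without a notice.
    return { kind: "retry", delayMs: plan.delayMs, notice: null };
  }
  // Later failures: tell the user when the next attempt will start.
  const seconds = Math.ceil(plan.delayMs / 1000);
  const unit = seconds === 1 ? "second" : "seconds";
  return { kind: "retry", delayMs: plan.delayMs, notice: `Trying again in ${seconds} ${unit}.` };
}
```

With `{ maxAttempts: 3, delayMs: 5000 }`, the first failure retries silently, the second shows the countdown notice, and the third stops.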
Picture a dashboard loading account data just as the user's train enters a tunnel. The app can retry that fetch quietly once or twice. A money transfer is different. If the first attempt fails, the app should stop, show the status clearly, and wait for the person to decide. That small difference prevents duplicate actions and cuts a lot of avoidable support tickets.
A simple example from one screen
A good test case is a consultation form on an advisory site. A founder enters their name, email, company stage, and a short note about the problem they need help with. They tap "Send," the spinner shows for a few seconds, and then the request times out.
The first response should stay calm and specific. Do not wipe the form. Show a short message near the button: "We could not send your request because the connection took too long. Your details are still here." That tells the person what happened and removes the first fear: losing their work.
In a solid pattern, the screen only offers "Try again" when the action is safe to repeat. For a form submit, that usually means the app sends the same request ID again so the backend can ignore duplicates. If the action is not safe, the screen should not guess.
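The request ID idea can be sketched with an in-memory stand-in for the backend. Everything here is hypothetical: `createFakeBackend` imitates a server that deduplicates by ID, and `newRequestId` is a simple illustrative generator, not a recommendation for production IDs.

```typescript
// Stand-in for a backend that ignores a request ID it has already stored.
function createFakeBackend() {
  const stored = new Map<string, string>();
  return {
    submit(requestId: string, note: string): "created" | "duplicate" {
      if (stored.has(requestId)) return "duplicate";
      stored.set(requestId, note);
      return "created";
    },
    count(): number {
      return stored.size;
    },
  };
}

let requestCounter = 0;

// Illustrative ID generator: timestamp plus a local counter.
function newRequestId(): string {
  requestCounter += 1;
  return `req-${Date.now().toString(36)}-${requestCounter}`;
}

// The ID is created once per submission and reused by every retry,
// so a request that reached the server before the timeout is not stored twice.
function createSubmission(note: string) {
  return { requestId: newRequestId(), note };
}
```

If the first attempt timed out on the client but still reached the server, the retry carries the same `requestId` and the backend reports a duplicate instead of creating a second record.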
A simple flow looks like this:
- The submit request times out.
- The form stays filled, and the button becomes active again.
- The page shows one clear error message and a "Try again" button.
- If the user retries and it fails again, the page adds a support message.
Keeping the entered data in place matters more than teams think. If someone wrote a long problem description, forcing them to type it again often means they leave. Keep every field value, keep the same screen state, and return focus to the failed action so they can retry fast.
The support message should wait until repeated failure. After the second failed attempt, you can show something like: "Still not working? Contact support and mention code FORM_TIMEOUT." That message is short, useful, and easy for a support team to search in logs.
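The flow for this form can be modeled as a small state reducer, which makes the "keep the data, delay the support message" rules easy to test. This is a sketch under assumptions: `ConsultationFormState`, `SubmitOutcome`, and `reduceForm` are hypothetical names, and the message strings are the ones quoted in this section.

```typescript
interface ConsultationFormState {
  values: { name: string; email: string; stage: string; note: string };
  submitting: boolean;
  failedAttempts: number;
  errorMessage: string | null;
  supportMessage: string | null;
}

type SubmitOutcome = "started" | "timed-out" | "succeeded";

function reduceForm(state: ConsultationFormState, outcome: SubmitOutcome): ConsultationFormState {
  switch (outcome) {
    case "started":
      return { ...state, submitting: true, errorMessage: null };
    case "succeeded":
      return { ...state, submitting: false, failedAttempts: 0, errorMessage: null, supportMessage: null };
    case "timed-out": {
      const failedAttempts = state.failedAttempts + 1;
      return {
        ...state, // field values are carried over untouched
        submitting: false, // the button becomes active again
        failedAttempts,
        errorMessage:
          "We could not send your request because the connection took too long. Your details are still here.",
        // The support message appears only after the second failed attempt.
        supportMessage:
          failedAttempts >= 2 ? "Still not working? Contact support and mention code FORM_TIMEOUT." : null,
      };
    }
  }
}
```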
This kind of screen-level pattern feels small, but it stops random UI behavior from spreading across the product. One timeout, one message style, one retry rule, and one support path already remove a lot of confusion.
Common mistakes that cause more support tickets
A generic toast for every failure is one of the fastest ways to confuse people. If the screen always says "Something went wrong," users do not know whether they should retry, wait, check their connection, or stop before they make a bigger mess. The message disappears, the context disappears with it, and support gets a ticket with no useful detail.
Clearing user input after an error is just as bad. Someone fills a long form, taps Save, hits a network issue, and the screen wipes everything out. Most people will feel annoyed before they feel helped. Keep the data in place, mark the field or action that failed, and let them try again without retyping the whole thing.
Retries also need more care than many teams expect. Retrying a harmless fetch is usually fine. Retrying an action that can charge a card, send an email, create an order, or invite a user can create duplicates. That is how one small failure turns into two support tickets: one for the original problem, and one for the duplicate record.
Raw server text should almost never reach the UI. Database errors, stack traces, and internal service messages do not help users fix anything. They can also expose details you should keep private. Translate technical failures into plain language, then keep the raw detail in logs where the team can use it.
Support struggles when the product gives them nothing to work with. Leave a short trail that both users and support can follow:
- Show what action failed, such as "saving draft" or "submitting payment".
- Keep the user input on screen when it is safe to do so.
- Include a short reference or request ID the user can share.
- Record the real server error in logs, not in the interface.
- Block repeat actions when a second attempt might create duplicates.
Picture a checkout screen. A user taps Pay, sees a vague toast, taps again, and gets charged twice. Support now has to guess whether the first request reached the server, which payment belongs to which click, and what the user actually saw. Clear messages, safe retry rules, and preserved input prevent that mess before it starts.
A quick review checklist before release
A short pass before release catches most error bugs that turn into support tickets later. Ten minutes is often enough. The goal is simple: every screen should handle failure in the same calm, predictable way.
If the team cannot review errors quickly, the system is still too loose, and each screen will drift into its own style.
Use the same five checks every time:
- Confirm each failure maps to one shared pattern. A timeout, permission issue, empty result, and server problem should not all produce custom UI states unless the screen truly needs them.
- Read the message next to the action that failed. If a user tried to save, the message should mention saving. If they tried to load a report, the message should talk about loading that report.
- Show a retry button only when retrying is safe. Loading data is usually safe. Submitting a payment, sending a message, or creating a record may not be.
- Check the logs before release, not after. Each error should record a stable code, enough screen context to trace the problem, and the attempt count so the team can tell whether the user retried once or six times.
- Ask one support question: can a support person identify the problem from the message alone? "Something went wrong" is useless. "Order could not load. Error code ORD-404" gives support a place to start.
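The logging check is easier to pass when every screen builds its log record the same way. The sketch below is hypothetical: `ErrorLogEntry`, `buildErrorLogEntry`, and `toLogLine` are illustrative names, and the fields are the ones the checklist asks for plus a timestamp, which is an assumption.

```typescript
interface ErrorLogEntry {
  code: string;        // stable error code
  screenName: string;  // where the failure happened
  action: string;      // what the user was doing, such as "loading order"
  attempt: number;     // 1 for the first try, 2 for the first retry, and so on
  loggedAt: string;    // ISO 8601 timestamp (assumed field)
}

function buildErrorLogEntry(
  code: string,
  screenName: string,
  action: string,
  attempt: number,
  now: Date = new Date(),
): ErrorLogEntry {
  return { code, screenName, action, attempt, loggedAt: now.toISOString() };
}

// One JSON object per line keeps the log easy to search by code.
function toLogLine(entry: ErrorLogEntry): string {
  return JSON.stringify(entry);
}
```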
A profile screen with an expired session is a good example. The UI should not offer endless retry. It should tell the user to sign in again, log the auth error code, and avoid pretending the screen can fix itself.
Test the worst case, not only the happy path. Turn off the network, force a timeout, and click retry more than once. If the screen stays clear, the logs stay readable, and support can tell what failed, the release is in good shape.
Next steps for rolling this out
Start small. Pick one screen that gets a lot of traffic and a steady trickle of support issues, like sign in, checkout, or file upload. That gives you enough real cases to test the new pattern without forcing the whole product to change at once.
Keep the first pass narrow. Use one shared error shape, one rule for retry behavior, and one writing style for messages. If the team tries to change every screen in one sprint, people usually fall back to old habits and custom fixes.
A practical rollout is simple: map the current error states on that screen, replace custom states with the shared shape, rewrite messages so users know what to do next, and track retries, fallbacks, and support tickets for the first couple of weeks.
After release, read real support conversations and bug reports. Look for places where users still get stuck. Maybe the message says something failed but gives no next step. Maybe retry works for a timeout but keeps looping on a permission error. Production shows those weak spots very quickly.
Then write down the pattern your team wants to keep. Keep the doc short enough that people will use it: the shared fields, when retry makes sense, when it should stop, and how to write copy that a support agent can understand without extra context. Two clear pages beat twenty vague ones.
If rollout gets stuck because every screen seems "special," an outside review can help. Oleg Sotnikov at oleg.is works with startups and small teams as a Fractional CTO and advisor, and this kind of frontend-system cleanup fits that work well. A fresh review of UI patterns, API responses, and support cases can expose gaps much faster than another round of ad hoc fixes.
Wait until the first screen feels easy to rebuild. That is when the pattern is ready to spread. Move to the next screen, keep the scope small again, and let real user behavior shape the version your team reuses.