Incident communication runbook for sales and support teams
Use an incident communication runbook to give sales, support, and success teams clear steps, calm wording, and simple handoffs while developers fix the issue.

What goes wrong when everyone improvises
An incident gets worse the moment people start filling gaps with guesses. A support rep says the issue is limited. A sales rep tells a prospect it should be fixed in 10 minutes. A customer success manager promises a full update soon, even though nobody owns the next update. Nobody is trying to mislead anyone. The problem is that mixed messages spread fast.
Customers notice the mismatch right away. They compare chat replies, emails, and account calls. If the story changes every 15 minutes, trust drops before the technical issue is even fixed.
Developers feel the cost almost immediately. Instead of working the incident, they answer the same question in Slack, email, and side messages: What is broken? Who is affected? What should we tell customers? A handful of small interruptions can easily burn 20 or 30 minutes of real fix time.
The pattern is usually the same:
- People fill gaps with guesses.
- Customers get silence or vague replies.
- Sales keeps active deals moving without knowing the risk.
- Support escalates the same question more than once.
- Engineers become the only source of truth for every team.
Silence is not safer. If customers hear nothing, they assume the team either does not know what is happening or does not want to say. Vague replies are only a little better. "We are looking into it" helps once. After that, it starts to sound empty unless you also give a clear time for the next update.
Picture a billing outage on a weekday morning. Support tells one customer the problem is under review. Sales says new accounts can still onboard normally. Customer success tells a larger client the issue only affects a small group. Meanwhile, engineers are still checking the scope. Now the team has two problems: the outage itself and the cleanup from bad communication.
The fix is simple in theory. Non-engineers need clear actions, clear limits, and approved customer language. Sales should know when to pause outreach. Support should know when to reassure and when to escalate. Customer success should know which accounts need a personal note. A good runbook gives the whole company one shared message, fewer interruptions, and more time for developers to solve the issue.
Decide who owns communication
When an incident starts, one person should own customer updates. Choose that person before anything breaks. If everyone writes their own version, customers get mixed answers and engineers lose time answering the same question again and again.
That owner does not need to fix the problem. The job is narrower than that. They collect confirmed facts, turn them into plain language, and send updates on a set schedule. In some teams, support is the right home for this role. In others, customer success is a better fit. The exact department matters less than the name on the page.
A simple split works for most teams:
- Support handles incoming tickets, tags urgent accounts, and shares the approved message.
- Sales pauses promises that depend on the broken service and reassures active deals without guessing.
- Customer success reaches out to high-impact accounts, explains business impact in plain language, and tracks follow-up needs.
Engineering should not field random questions from every team during the incident. Set checkpoints instead. For example, the communication owner checks in with engineering every 30 minutes, or sooner if the scope changes, a workaround appears, or recovery starts. If a salesperson or success manager hears a new customer concern, they pass it to the communication owner first. That keeps one thread of truth.
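The checkpoint rule above can be sketched as a small function. This is a minimal illustration, not a prescribed tool: the name `should_check_in`, the event labels, and the 30-minute default are assumptions taken from the example in the text.

```python
from datetime import datetime, timedelta

# Events that justify an early check-in (hypothetical labels based on the text).
EARLY_TRIGGERS = {"scope_changed", "workaround_found", "recovery_started"}


def should_check_in(last_check_in, now, events, interval_minutes=30):
    """Return True when the communication owner should contact engineering.

    A check-in is due when the regular interval has elapsed, or when any
    early trigger has occurred since the last check-in.
    """
    interval_elapsed = now - last_check_in >= timedelta(minutes=interval_minutes)
    early_trigger = bool(EARLY_TRIGGERS & set(events))
    return interval_elapsed or early_trigger
```

A new customer concern on its own is not in the trigger set, so it waits for the next scheduled checkpoint instead of interrupting engineering.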
Some messages need a short approval path. Keep it tight or it slows everything down. Normal status updates can go out after engineering confirms the facts. Security issues, data loss, legal risk, refunds, or contract impact should go through one named leader before anyone sends them.
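One way to keep the approval path tight is to write it down as a rule that anyone can follow. The sketch below assumes hypothetical topic labels and return values; your team's roles and categories may differ.

```python
# Topics that need sign-off from one named leader (list taken from the text).
SENSITIVE_TOPICS = {"security", "data_loss", "legal_risk", "refunds", "contract_impact"}


def approval_path(topics, facts_confirmed_by_engineering):
    """Return who must clear a draft before it goes out.

    The return values are illustrative labels, not real role names.
    """
    if SENSITIVE_TOPICS & set(topics):
        return "named_leader"
    if facts_confirmed_by_engineering:
        return "communication_owner"
    return "hold_until_confirmed"
```

A normal status update with confirmed facts goes straight to the communication owner, while anything that touches refunds or security is routed to the named leader first.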
The runbook should also ban ad-lib language during incidents. Sales should not say "fixed" until engineering confirms recovery. Support should not guess timelines. Customer success should not offer credits on the spot unless the approval path allows it. Clear ownership and simple limits do a lot of heavy lifting.
Prepare before anything breaks
Customers tend to report the same few symptoms over and over. They cannot sign in. Payments fail. Data looks missing or delayed. Pages load but actions time out. Emails or third-party syncs stop working. A good runbook starts with these customer-facing incident types because support, sales, and customer success need to recognize them fast without waiting for an engineer to translate.
Severity levels need plain definitions too. Skip labels that only engineers understand. A support rep should be able to tell the difference between "many customers cannot use the product" and "a workaround exists, but the experience is poor" without asking someone else what that means.
Good severity definitions explain two things: customer impact and response speed. A Sev 1 might mean the product is down for many customers and updates need fast approval. A Sev 2 might mean a major feature is broken, but customers can still do some work. A Sev 3 might mean the issue is limited and there is a known workaround. If two people read the same report, they should choose the same severity.
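To test whether two people would choose the same severity, it can help to write the definitions as data plus a short decision rule. The sketch below paraphrases the three example levels from the text. The two yes/no questions in `classify` are an assumption for illustration, and response speed is left out because the text only gives it for Sev 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Severity:
    label: str
    customer_impact: str  # plain-language definition anyone can read


# Definitions paraphrased from the examples in the text.
SEVERITIES = {
    1: Severity("Sev 1", "The product is down for many customers."),
    2: Severity("Sev 2", "A major feature is broken, but customers can still do some work."),
    3: Severity("Sev 3", "The issue is limited and there is a known workaround."),
}


def classify(product_down_for_many: bool, major_feature_broken: bool) -> Severity:
    """Pick a severity from two yes/no questions (hypothetical rule)."""
    if product_down_for_many:
        return SEVERITIES[1]
    if major_feature_broken:
        return SEVERITIES[2]
    return SEVERITIES[3]
```

Because the rule only takes yes/no answers, two reps reading the same report should land on the same label.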
You also need one place where every non-engineering team finds the latest approved update. If support checks Slack, sales checks email, and customer success checks an old document, customers will hear three different stories. Pick one source and make it the only source. That can be a shared internal page, a pinned incident summary, or a status note owned by the incident lead.
Keep customer-ready templates in the same place. They do not need to be fancy. You need an initial reply, a follow-up update, a workaround message, and a resolution note. Write them now, while nobody is under pressure.
A good test is simple. Hand the runbook to a new support rep. If they can tell what happened, how serious it is, where to look, and what to send in under five minutes, the setup is probably good. If they get lost in policy language, it is too long.
Run the process during the first hour
The runbook starts the moment an internal alert looks real. One person takes ownership of communication right away. That person checks three facts with the incident lead: what users can see, who seems affected, and what the team still does not know.
At that point, everyone outside engineering needs a hard stop on guessing. Do not name a cause, promise a fix time, or say the issue is solved unless the incident lead confirms it. A wrong guess creates more work than a short delay.
Send the first customer update quickly, even if the facts are still thin. Waiting for a perfect message usually makes things worse. A short note is enough. Confirm there is an issue, describe the visible impact, say the team is investigating, and give the time for the next update.
Then stick to a rhythm. Every 15 minutes works for a full outage. Every 30 minutes is usually enough for a partial outage. For a smaller issue with limited impact, hourly updates may be fine. What matters most is consistency. Silence makes customers nervous. A brief update with no major change still helps if it arrives on time.
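The rhythm above is simple enough to compute rather than remember. This is a minimal sketch: the impact labels and the function name `next_update_due` are assumptions, and the intervals are the ones suggested in the text.

```python
from datetime import datetime, timedelta

# Minutes between customer updates, using the intervals suggested in the text.
UPDATE_INTERVAL_MINUTES = {
    "full_outage": 15,
    "partial_outage": 30,
    "limited_issue": 60,
}


def next_update_due(last_update: datetime, impact: str) -> datetime:
    """Return the time by which the next customer update should go out."""
    return last_update + timedelta(minutes=UPDATE_INTERVAL_MINUTES[impact])
```

For example, a full outage with an update sent at 9:15 has its next update due at 9:30, whether or not anything has changed.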
While developers work on the fix, the communication owner keeps one internal source of truth up to date. Support uses it for tickets. Sales uses it for active deals and worried prospects. Customer success uses it for larger accounts that expect direct outreach. Everyone should send the same status, the same timing, and the same wording.
Before you send the final message, confirm that the service is actually stable. Check the customer-facing symptom, not just an internal graph. Then close with a simple note: what customers experienced, when service recovered, whether they need to do anything, and when they can expect a fuller follow-up.
That last message should feel complete. Customers do not need every technical detail right away. They do need a clear end to the incident and a clear next step.
Write updates customers can use
Customers do not need a long note during an incident. They need an update that answers five questions fast: what broke, what they may see, what your team is doing, when you will speak again, and who owns the next update.
Good incident language is calm because it is specific. Bad incident language is vague, dramatic, or oddly polished. "We are aware of a situation" says almost nothing. "We are seeing intermittent login failures" gives customers something they can recognize.
A strong update often fits in four short sentences:
"We are seeing intermittent login failures in the app. Some customers cannot sign in or may get timed out after login. Our engineers are working on the issue now and checking the authentication service. We will post the next update by 2:30 PM ET."
Each sentence has one job. The first states the problem. The second describes the customer impact. The third explains what the team is doing. The last gives a specific update time, even if you do not have a fix yet. When the update goes to a specific account rather than a public channel, add who will send the next one.
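The four-part structure can be turned into a small template helper so nobody forgets a part under pressure. This is a sketch under stated assumptions: the function name `build_update` and its parameters are hypothetical, and the closing sentence reuses the wording from the example above.

```python
def build_update(problem: str, impact: str, action: str, next_update: str) -> str:
    """Join the four parts of a customer update into one short message.

    Raises ValueError if any part is missing, so an update never goes out
    without a stated problem, impact, action, and next update time.
    """
    fields = {
        "problem": problem,
        "impact": impact,
        "action": action,
        "next_update": next_update,
    }
    for name, value in fields.items():
        if not value.strip():
            raise ValueError(f"Missing part: {name}")
    closing = f"We will post the next update by {next_update.strip()}."
    return " ".join([problem.strip(), impact.strip(), action.strip(), closing])
```

Calling it with the three sentences from the example and "2:30 PM ET" reproduces that message word for word.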
Do not guess. If you do not know the cause, say what you know instead. Do not write "full outage" if only one feature is failing. Do not promise a fix by a certain time unless the engineering lead approves it.
Sales usually needs a slightly different version for live deals and renewals, but the facts should stay the same. For an active deal, a rep might say: "We are managing an active incident that affects login for some users. The team is working on it now, and the next update is scheduled for 2:30 PM ET. I will share that update with you as soon as it is posted."
For a renewal, the language can be just as direct: "We are currently handling a login issue that affects some users. The team has assigned engineers to it and will send the next update by 2:30 PM ET. I can collect the incident notes and timeline for your review after service returns to normal."
Lock these message patterns in before anything breaks. That saves support and sales from writing from scratch while developers stay focused on the fix.
Example: checkout outage on a Tuesday morning
At 9:10 a.m., customers start reporting failed card charges at checkout. Orders do not complete, support gets a spike in tickets, and sales has two live demos booked before lunch.
By 9:15, the incident lead confirms the problem: payments are failing at checkout, the scope is not yet known, and the team is still checking whether the payment provider or the app caused it. Engineering starts the fix. Everyone else follows the runbook so developers can work without constant interruptions.
Support does not wait for a perfect root cause. In the first 15 minutes, they send a short message to new tickets and live chat cases: "We are investigating a payment issue affecting checkout. Some charges may fail right now. Our team is working on it, and we will share the next update by 9:30 a.m. If your payment failed, please do not retry more than once until we confirm service is stable."
That message does a lot with very little. It confirms the issue, sets the next update time, and tells customers what to do right now.
Sales handles prospect calls with a simple script. If a prospect asks about the outage, the rep says: "We have an active payment issue this morning. The engineering team is fixing it now. The checkout problem does not change your demo, but I want to be direct about it." If the call depends on a live purchase flow, sales reschedules instead of improvising or dragging an engineer into the meeting.
Customer success reaches out to affected accounts, especially those with active launches, campaigns, or high order volume. A brief note is enough: "We can confirm a payment outage is affecting checkout for your team. We will send you another update at 9:30 a.m. If you need help handling inbound customer questions, reply here and we will help with wording."
Engineering does not answer live questions from support, sales, or customer success one by one. Engineers do not join customer calls, respond to repeated status pings, or rewrite messages in the middle of debugging. The communication owner collects confirmed facts from the incident lead and pushes updates to the rest of the company.
At 9:30, every customer-facing team sends the same status: what is broken, who is affected, what customers should do next, and when the next update will arrive. That consistency matters more than polish.
Mistakes that slow recovery
One bad update can turn a short outage into a long, messy day. The biggest issue is often not the bug itself. It is the confusion around it.
Teams guess when they feel pressure to say something fast. That is how support tells customers it is "probably a payment issue," sales says service will return in 15 minutes, and customer success gives a different story to large accounts. If the cause is still unknown, say that. If the timeline is still unknown, say that too.
Early promises create a second problem. Customers often remember the missed deadline more than the original bug. A safer message is plain and honest: the team has confirmed the issue, work is in progress, and the next update will come at a set time.
Vague language causes damage too. Phrases like "some users may see temporary disruption" sound neat, but they hide the real impact. Customers need direct words. Can they log in? Can they check out? Are emails delayed? Say what changed for them.
Another common mistake is pulling engineers into the same conversation over and over. Ten people asking for status every five minutes does not speed up the fix. It slows it down.
A few rules prevent most of this:
- Use one approved status message at a time.
- Name the customer impact in plain words.
- Share update times, not guessed fix times.
- Send questions through the communication owner, not straight to engineers.
This matters even more in small teams. If one developer and one CTO-level leader are driving the fix, every interruption counts. Clear customer language protects their focus, and that usually gets service back faster.
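The rule about sharing update times instead of guessed fix times can be partly checked before a message goes out. The sketch below is a crude illustration, not a complete filter: the function name `risky_phrases`, the word list, and the patterns are assumptions, and a person still makes the final call.

```python
import re

# Words that claim recovery, and phrases that look like a guessed timeline.
RECOVERY_WORDS = re.compile(r"\b(fixed|resolved)\b", re.IGNORECASE)
TIMELINE_GUESS = re.compile(r"\b(in|within)\s+\d+\s+(minutes?|hours?)\b", re.IGNORECASE)


def risky_phrases(message, recovery_confirmed=False, fix_time_approved=False):
    """Return a list of warnings for wording the team has not earned yet."""
    findings = []
    if not recovery_confirmed and RECOVERY_WORDS.search(message):
        findings.append("claims recovery before engineering confirmed it")
    if not fix_time_approved and TIMELINE_GUESS.search(message):
        findings.append("guesses a timeline")
    return findings
```

A draft such as "It should be fixed in 10 minutes" triggers both warnings, while "We will post the next update by 2:30 PM ET" passes. The pattern would also flag "next update in 30 minutes", so treat a warning as a prompt to reread the draft, not as a verdict.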
Do a two-minute check before sending
Fast messages cause a lot of avoidable damage. A quick review can stop the usual mistakes: the wrong product area, an old timestamp, or a promise nobody can keep.
Start with the latest internal incident note, not your memory. If engineering updated the status at 10:40 and your draft still uses details from 10:15, customers will get mixed signals before the fix even ships.
Use a short check before sending:
- Does the draft match the newest internal status, current scope, and customer impact?
- Are customer names, product names, regions, and timestamps correct?
- Does the message say when the next update will arrive, even if there is no resolution yet?
- Would sales, support, and customer success send almost the same wording if three customers asked at once?
- If a large account replies, is one person clearly assigned to handle that thread?
That last point matters more than most teams expect. Big customers often ask for extra detail, and the trouble starts when sales replies one way, support replies another way, and customer success adds a third version ten minutes later. One owner keeps the conversation steady and keeps engineers out of five side chats.
Small wording changes help too. "We are investigating reports" is weak if you already know the affected area. "We are investigating login failures in the billing portal. Next update by 14:30 UTC" gives customers something they can act on.
Keep this check near the send button, not buried in a long policy document. During an outage, nobody wants a page of theory. They need five quick checks, one approved message, and one named owner.
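The five checks above can also be kept as a tiny script or form next to the send button. This is a sketch only: the `Draft` fields and the function name `pre_send_problems` are hypothetical, and most of the answers still come from a person reading the draft.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Draft:
    based_on_status_at: datetime     # time of the internal note the draft was written from
    details_verified: bool           # names, products, regions, timestamps were checked
    next_update_time: Optional[str]  # e.g. "14:30 UTC", or None if missing
    matches_approved_wording: bool   # same wording other teams would send
    thread_owner: Optional[str]      # who handles replies from large accounts


def pre_send_problems(draft: Draft, latest_status_at: datetime) -> List[str]:
    """Return the checks that failed; an empty list means the draft can go out."""
    problems = []
    if draft.based_on_status_at < latest_status_at:
        problems.append("Draft is based on an older internal status.")
    if not draft.details_verified:
        problems.append("Names, regions, or timestamps are not verified.")
    if not draft.next_update_time:
        problems.append("No next update time.")
    if not draft.matches_approved_wording:
        problems.append("Wording differs from the approved message.")
    if not draft.thread_owner:
        problems.append("No owner assigned for replies.")
    return problems
```

A draft written from the 10:15 note after engineering updated the status at 10:40 fails the first check, which is the mixed-signal case described above.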
Review the response after service is back
When service returns, most teams want to move on. That is usually the moment when the same communication mistakes get baked in for the next incident. Give the team 20 to 30 minutes while the details are still fresh and review the customer side of the response, not just the technical fix.
Start with the questions customers asked again and again. If ten people asked whether their data was safe, your update probably missed that point. If sales kept asking support for status, the internal update was probably too thin or too slow.
A short review is enough. Ask which customer questions came up most often, which templates got calm replies, which messages created confusion, where ownership slowed down, and how long approvals held up updates.
Be honest about the weak spots. Often the wording is not the real problem. One person was missing. Nobody knew who could approve a public message. Updates sat too long because everyone wanted perfect phrasing. During an incident, clear and timely beats polished.
Then fix the runbook while the pain is still easy to remember. Remove extra steps. Name a backup owner for each role. Set a simple update rhythm, such as every 30 minutes until the issue stabilizes. If a template caused confusion, rewrite it in plain language and test it on someone outside the incident channel.
A short practice drill helps more than a long meeting. Give sales, support, and customer success a small scenario and ask them to send one internal update and one customer reply. You will spot vague language fast. You will also see whether people can use the runbook under pressure.
Keep the whole thing short enough that someone can scan it in under two minutes. If it reads like policy writing, people will ignore it when things get messy. The best version is usually one page, a few templates, clear owners, and a short checklist people can trust when they are tired.
Start small and practice
Your first runbook does not need to cover every type of incident. Start with one problem your team sees often, such as a login outage, failed checkout, or delayed data sync. Build a small set of templates around that case and make sure sales, support, and customer success all know when to use them.
Give the process clear owners. One person sends the first internal alert. One person owns customer updates. One backup covers each role if someone is out. That removes the familiar delay where everyone waits for someone else to speak.
A short drill will tell you more than a long planning meeting. Put 20 minutes on the calendar and walk through the first hour of a fake incident. Ask support what they would send to affected customers. Ask sales how they would answer an active prospect who asks whether the issue affects them. Keep developers focused on the fix, not on rewriting status messages.
A simple starting set is enough:
- one incident type to cover first
- one owner for internal updates
- one owner for customer updates
- four templates: first reply, follow-up update, workaround, and resolved
- one drill date in the next two weeks
After every real incident, update the runbook while people still remember what happened. Remove lines nobody used. Add the questions customers asked most. Tighten any message that caused confusion. If a template took too long to approve, fix that step too.
Some teams can build this on their own. Others move faster with a second pair of eyes. If you need help tying customer communication to the technical response, Oleg Sotnikov at oleg.is works with startups and small teams as a Fractional CTO and advisor. That kind of outside review is useful when the runbook, the infrastructure, and the actual incident process all need to match.
A good target for this week is simple: choose one incident type, assign owners, write the first templates, and schedule one drill.
Frequently Asked Questions
Who should send customer updates during an incident?
Pick one named owner before the next outage. That person gets confirmed facts from the incident lead, turns them into plain language, and sends updates on a schedule. Support or customer success can own this, but one person must make the call.
What should the first customer message include?
Send it fast. Say what users can see, explain the customer impact in plain words, share what the team is doing now, and give the exact time for the next update. If you do not know the cause yet, say that instead of guessing.
How often should we update customers?
For a full outage, send updates about every 15 minutes. For a partial issue, 30 minutes usually works, and a smaller issue can wait an hour. Pick a rhythm and stick to it, because silence makes people assume the team has no handle on the problem.
Should sales keep pushing deals during an outage?
Not on anything that depends on the broken feature. Reps can keep conversations calm, but they should pause any promise tied to it. If a demo needs login or checkout and those flows fail, reschedule instead of trying to talk around it.
When should support escalate instead of reassure?
Support can reassure customers when the runbook covers the symptom and the approved message still fits. Escalate when the customer reports a new symptom, the account has high business impact, or the approved wording no longer matches what the customer sees.
Can we promise a fix time?
Only if the incident lead approves it. Most teams should give a next update time, not a repair promise. Customers usually accept honest uncertainty better than a deadline your team misses 20 minutes later.
Where should the team look for the latest status?
Pick one place for the latest approved update and make every team use it. A pinned note, a shared page, or a single status post all work if one person keeps it fresh. If support checks one place and sales checks another, customers will hear two different stories.
What templates do we need first?
Start with four short templates: the first reply, a follow-up update, a workaround note, and a resolved message. Write them before the next incident so nobody has to invent wording under pressure.
How do we keep engineers focused on the fix?
Route questions through the communication owner instead of sending them straight to engineers. Set check-ins with engineering on a timer or when scope changes, a workaround appears, or recovery starts. That gives developers longer blocks to work the issue.
What should we do after the incident ends?
Run a short review while the details still feel fresh. Look at the questions customers asked most, the messages that confused people, the approvals that slowed updates, and the owner gaps that created side chats. Then update the runbook right away and schedule a short drill.