AI operational design: prompts need approvals and logs
AI operational design keeps real work moving by adding approvals, logs, and exception queues so teams catch mistakes before they spread.

Why good demos fail in daily work
A demo starts with clean test data and a clear goal. Real work doesn't. Customers send vague messages, attach the wrong file, ask two things at once, or leave out the one detail the model needs.
That gap looks small until the tool touches something expensive. A polished answer in a sandbox feels safe. The same answer sent to a customer can trigger a refund, promise the wrong delivery date, or approve a change nobody meant to approve.
The same problem shows up in back-office work. A model that reads invoices can look accurate in a test run, then miss a duplicate charge or misread a tax line when the format changes. Someone still has to catch that before money goes out.
People also confuse the prompt with the product. It isn't. The prompt shapes the reply, but daily work depends on who checks risky actions, where staff step in, and what happens when the model gets stuck or says something odd.
That's why AI operational design matters. A model can draft, classify, or suggest the next step. Staff still need approval points for anything tied to money, contracts, account access, or customer promises. Without those checks, the team ends up watching the tool by hand and fixing mistakes after the fact.
Ownership is another weak spot. One person writes the prompt, another team deals with customer complaints, and nobody owns the full process. The demo looks great, the launch goes ahead, and the team spends the next month cleaning up edge cases nobody planned for.
A good demo proves that a model can respond. Real work asks a harder question: can the team trust that response inside the mess of daily operations? If the answer depends on a human catching mistakes at the last second, the process is still unfinished.
What operational design actually includes
AI operational design is the part teams skip when a demo looks good. A prompt can draft an answer, sort a ticket, or prepare a change. The harder part is deciding who checks it, what gets saved, and where uncertain cases go.
Start with roles. Someone should own every action the AI can take. One person approves a refund. Another edits the draft before it goes out. A manager or admin can stop the action if the result looks wrong or the case falls outside policy.
That sounds basic, but it changes how safe the system feels. People trust the tool more when they know who can step in and what happens next.
Good approval workflows for AI also need a record that people can read later. Keep the input the model received, the answer it produced, and any action it tried to take. Add a timestamp, the model version, and the rule that allowed the step. Those AI audit logs help with support, training, and debugging when something goes wrong.
Unclear work should not bounce around in email or chat. Put it in one review queue. Angry customer messages, missing account data, and low-confidence replies all need one visible place where a person can decide what happens next.
The first version can stay simple. Retry once if the model fails for a technical reason, such as a timeout. If confidence falls below the level you set, fall back to a smaller action or a safer prompt. If money, access, or customer risk is involved, hand the case to a person. If the same step keeps failing, stop after a small, fixed number of attempts instead of letting the system keep guessing.
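The four rules above fit in one small decision function. This is a minimal sketch, not a prescribed design. It assumes your pipeline reports a technical-error flag, a confidence score between 0 and 1, and whether the step touches money, access, or customer risk. The names `RunResult`, `next_step`, `MIN_CONFIDENCE`, and `MAX_ATTEMPTS` and both threshold values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class NextStep(Enum):
    PROCEED = "proceed"
    RETRY = "retry"
    SAFER_FALLBACK = "safer_fallback"   # smaller action or safer prompt
    HUMAN_REVIEW = "human_review"
    STOP = "stop"                       # stop guessing and hand the case off


@dataclass
class RunResult:
    technical_error: bool   # e.g. a timeout or a malformed response
    confidence: float       # hypothetical 0.0-1.0 score from your own pipeline
    touches_risk: bool      # money, account access, or customer risk involved


# Assumed limits; tune them for your own workflow.
MIN_CONFIDENCE = 0.7
MAX_ATTEMPTS = 3


def next_step(result: RunResult, attempts: int) -> NextStep:
    """Pick the next move after one run.

    `attempts` counts runs of this step so far, including this one.
    """
    if result.technical_error:
        # Retry once for a technical failure; after that, stop instead of looping.
        return NextStep.RETRY if attempts <= 1 else NextStep.STOP
    if result.touches_risk:
        # Money, access, or customer risk always goes to a person.
        return NextStep.HUMAN_REVIEW
    if result.confidence < MIN_CONFIDENCE:
        # Low confidence: try a smaller or safer action, but only up to the limit.
        return NextStep.SAFER_FALLBACK if attempts < MAX_ATTEMPTS else NextStep.STOP
    return NextStep.PROCEED
```

The technical-error check comes first, so a failed run never reaches a reviewer with an empty draft. The risk check comes before the confidence check, so a confident refund still waits for a person.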
You do not need a huge setup on day one. Even a small team can define approval rights, keep clean logs, and create exception queues before it expands automation. Oleg Sotnikov often works with companies at exactly this stage. The model already works, but the operating rules still live in people's heads. Writing those rules down usually saves more time than another prompt tweak.
Where approvals belong
Approval should sit at the point where a wrong answer costs money, creates legal risk, or changes a customer record. That usually means payments, refunds, contract language, policy exceptions, and any message that sounds official. If an AI suggests a refund amount, a person should approve it before the money leaves the account.
The same rule applies when the model works with weak data. If a customer record is incomplete, if two systems disagree, or if the AI fills gaps with guesses, send that case to review. Teams often trust fluent output too quickly. Clean writing can hide shaky inputs.
Low-risk work does not need the same friction. Draft notes, internal summaries, first-pass replies, and brainstorming text can move without approval if no customer sees them and no system acts on them. That keeps the team fast. If every AI action needs sign-off, people stop using the tool or start clicking "approve" without reading.
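These placement rules reduce to a short approval gate. This is a sketch under stated assumptions. The action categories in `HIGH_RISK` and the fields on `ProposedAction` are hypothetical labels, and a real system would map its own actions onto them. The gate sends money, legal, record, and official-message actions to review. It also sends anything built on incomplete or conflicting data. Other work passes only when no customer sees it and no system acts on it.

```python
from dataclasses import dataclass

# Hypothetical action categories that carry real downside.
HIGH_RISK = {
    "payment", "refund", "contract_language", "policy_exception",
    "customer_record_change", "official_message",
}


@dataclass
class ProposedAction:
    kind: str                    # e.g. "refund" or "internal_summary"
    customer_visible: bool       # will a customer see the result?
    acts_on_system: bool         # will a system change something because of it?
    record_complete: bool = True
    sources_agree: bool = True


def needs_approval(action: ProposedAction) -> bool:
    if action.kind in HIGH_RISK:
        return True
    if not (action.record_complete and action.sources_agree):
        # Weak data: fluent output can hide shaky inputs.
        return True
    # Low-risk work skips sign-off only if nobody outside sees it and nothing acts on it.
    return action.customer_visible or action.acts_on_system
```

Keeping the high-risk list explicit also makes the friction easy to audit. If a category on the list rarely causes trouble, you can see it and remove it.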
Give each decision one owner
Every approval step needs one named person or one clear role with the final say. Not a group inbox. Not "someone from ops." If three people share the decision, nobody really owns it and the queue starts to stall.
A simple split works well. Finance approves money movement. Legal or compliance approves regulated language. Team leads handle edge cases in customer communication. Product or operations approves workflow changes that affect records.
This is where risk control becomes practical. Prompts shape the output, but approvals shape the consequences. A good flow sends only the right cases to humans, gives them enough context to decide quickly, and records who approved what. That matters later, when a customer asks why a refund was denied or why a message was sent.
What to log from day one
When an AI task goes wrong, the prompt is rarely enough to explain it. You need a record of what went in, what came out, when it happened, and what the system touched along the way. If you skip that early, people end up arguing from memory, and memory is usually wrong.
Start small. Save the request text or source payload, the full output, a clear timestamp, and an ID that ties the run to a user, ticket, order, or document. That gives support staff something concrete to search when a customer says, "This answer looks off" or "Why did this approval go through?"
Then save the run context. Record which model handled the task, which tool or agent called it, and which data source fed it. If your workflow pulls from a CRM, help desk, spreadsheet, or internal knowledge base, note that too. Many bad results come from stale or partial data, not from the model itself.
A useful first log usually covers five things:
- the original input and final output
- the exact time of each run
- the model, tool, and data source used
- every human edit, approval, rejection, and override
- a short plain-language summary anyone on the team can read
That last item matters more than teams expect. Raw JSON helps engineers, but support and operations staff need logs they can scan in seconds. Labels like "customer message," "AI draft," "manager approved," and "agent changed final reply" work better than a nested blob nobody wants to decode during a live issue.
Human actions need their own trail. If a person edits the output, save both versions. If a manager approves or blocks a step, record who did it and when. If staff override a model decision, capture the reason in a short note. Those details turn a messy dispute into a fixable process issue.
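A log record that covers the five items above and keeps a separate trail of human actions might look like this. It is a minimal sketch, not a standard schema. Field names such as `subject_id`, `summary`, and `human_actions` are hypothetical. The AI's original output is never overwritten, so edits always keep both versions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class HumanAction:
    actor: str                      # who acted
    action: str                     # "edit", "approve", "reject", or "override"
    at: str                         # when, as an ISO timestamp
    note: str = ""                  # short reason, especially for overrides
    before: Optional[str] = None    # edits only: text before the change
    after: Optional[str] = None     # edits only: text after the change


@dataclass
class RunLog:
    run_id: str
    subject_id: str                 # ticket, order, user, or document ID
    input_text: str                 # request text or source payload
    output_text: str                # the AI's original output, never overwritten
    model: str
    tool: str
    data_source: str
    summary: str                    # one plain-language line for support and ops
    started_at: str = field(default_factory=utc_now)
    final_text: Optional[str] = None
    human_actions: List[HumanAction] = field(default_factory=list)

    def record_edit(self, actor: str, new_text: str, note: str = "") -> None:
        # Save both versions: whatever was current before, and the new text.
        previous = self.final_text if self.final_text is not None else self.output_text
        self.human_actions.append(HumanAction(actor, "edit", utc_now(), note, previous, new_text))
        self.final_text = new_text

    def record_decision(self, actor: str, action: str, note: str = "") -> None:
        self.human_actions.append(HumanAction(actor, action, utc_now(), note))

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

The `summary` field carries the plain-language label that support staff scan. The JSON export is for engineers.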
Teams that run lean AI operations learn this fast. The prompt gets attention because it's visible. The log does the harder work later, when someone needs a clear answer to one simple question: what happened in this run, and who changed it?
Why exception queues matter
An AI system should not guess its way through edge cases. When the model sees something unusual, it needs one visible place to send that case instead of hiding it in a chat thread, email inbox, or silent failure.
That queue gives people a clean handoff point. A support lead, ops manager, or founder can open one screen and see what needs attention now, what can wait, and what looks risky.
A good queue is not just a pile of problem cases. Teams should sort items by risk, urgency, and customer impact. A billing error for an active customer belongs above a low-stakes formatting issue, even if both arrived at the same time.
Every handoff also needs a short reason. One sentence is enough: "Customer asked for a refund after an unusual charge pattern" or "Order ID did not match the account name." That reason saves time and stops people from rechecking the whole case from scratch.
Queues also need a timer. If cases sit for three days, the queue turns into a graveyard and people stop trusting the system. Track how long each case has waited, how many cases missed the target response time, and which case types pile up most often.
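One way to get both the sorting and the timer is a priority queue that also reports waiting times. This sketch rests on a few assumptions. The 1 to 3 scoring scale, the 24-hour target, and the `Case` fields are all hypothetical. It uses Python's standard `heapq` module, which is a min-heap, so the scores are negated to pop the highest priority first.

```python
import heapq
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Case:
    case_id: str
    kind: str             # e.g. "billing_error", "missing_order_id"
    reason: str           # one-sentence handoff reason
    risk: int             # assumed scale: 1 (low) to 3 (high)
    urgency: int
    customer_impact: int
    created_at: datetime


class ExceptionQueue:
    def __init__(self, target_wait: timedelta = timedelta(hours=24)):
        self.target_wait = target_wait   # assumed response-time target
        self._heap: List[Tuple[tuple, Case]] = []

    def add(self, case: Case) -> None:
        # Highest risk first, then urgency, then customer impact; older cases win ties.
        key = (-case.risk, -case.urgency, -case.customer_impact, case.created_at, case.case_id)
        heapq.heappush(self._heap, (key, case))

    def next_case(self) -> Case:
        return heapq.heappop(self._heap)[1]

    def stats(self, now: datetime) -> dict:
        waiting = [case for _, case in self._heap]
        waits = [now - c.created_at for c in waiting]
        return {
            "open": len(waiting),
            "oldest_wait": max(waits, default=timedelta(0)),
            "past_target": sum(w > self.target_wait for w in waits),
            "by_kind": Counter(c.kind for c in waiting),   # which case types pile up
        }
```

Check `by_kind` every week. It is the quickest way to see which handoff keeps returning.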
Those patterns show where the workflow is weak. If the same handoff appears every afternoon, the fix might be a better rule, a better prompt, or a better data source. The queue is not only for cleanup. It shows where the system keeps asking for human rescue.
This is often the first practical step for small teams. Oleg Sotnikov, through his work at oleg.is, often helps companies start with a simple exception queue and tighter operating rules before they add more automation.
Build the flow in small steps
Start with one task that appears every week and follows clear rules. Good first choices are invoice data entry, lead triage, or first-pass document sorting. If people already handle it the same way most of the time, it is a good candidate for AI operational design.
Then mark the exact moments where a person needs to step in. Do not put review everywhere. Blanket review slows the work and hides which steps the system already handles well. Put review where the cost of a wrong action is real, such as sending money, changing customer records, or closing a case.
A small team can sketch the first version on one page:
- the AI reads the input and makes a draft decision
- a person reviews only high-risk actions
- the system records the input, output, confidence, and final choice
- unusual cases move to an exception queue
Before launch, the team tests the flow on real examples.
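The one-page flow above can be sketched as a single function. The model call and the human review are passed in as plain functions, so the wiring stays visible. Several details are assumptions: `Draft`, `run_flow`, the 0.7 threshold, and the use of low confidence as the only trigger for "unusual" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Draft:
    decision: str
    confidence: float   # hypothetical score returned by your model wrapper
    high_risk: bool     # e.g. sends money, changes a record, closes a case


def run_flow(
    case_input: str,
    draft_fn: Callable[[str], Draft],
    review_fn: Callable[[str, Draft], str],
    log: List[dict],
    exceptions: List[Tuple[str, str]],
    min_confidence: float = 0.7,   # assumed threshold
) -> Optional[str]:
    draft = draft_fn(case_input)                          # AI drafts a decision
    if draft.confidence < min_confidence:
        exceptions.append((case_input, "low confidence"))  # unusual case leaves the main path
        final = None
    elif draft.high_risk:
        final = review_fn(case_input, draft)              # a person reviews high-risk actions only
    else:
        final = draft.decision
    # Record input, output, confidence, and the final choice for every run.
    log.append({"input": case_input, "output": draft.decision,
                "confidence": draft.confidence, "final": final})
    return final
```

Passing `draft_fn` and `review_fn` in as arguments also makes it easy to replay past cases through the same flow with a stubbed model.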
Logging needs the same care as prompts. If the system changes a field, who approved it? If it skips a case, why? AI audit logs should answer those questions without detective work. Store the original request, the model result, the prompt version, the approval decision, and the final action.
You also need a clear rule for cases that leave the main path. Low confidence, missing data, policy conflicts, and repeated retries are common triggers. Those items should land in exception queues with enough context for a person to fix them fast.
Real examples matter more than synthetic tests. Run 30 to 50 past cases through the flow and compare the result with what your team actually did. You will usually find edge cases in the first hour.
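The comparison against past cases can be as plain as counting disagreements and listing them for review. This is a minimal sketch. `PastCase`, `backtest`, and the idea of a `flow` function that returns a decision label are hypothetical. The mismatch list matters more than the agreement rate, because it points straight at the edge cases.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class PastCase:
    case_id: str
    input_text: str
    team_decision: str   # what your team actually did


def backtest(cases: List[PastCase], flow: Callable[[str], str]) -> Tuple[float, list]:
    if not cases:
        raise ValueError("need past cases to compare against")
    mismatches = []
    for case in cases:
        result = flow(case.input_text)
        if result != case.team_decision:
            mismatches.append((case.case_id, case.team_decision, result))
    agreement = 1 - len(mismatches) / len(cases)
    return agreement, mismatches
```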
This is the same pattern many small companies use when they bring in outside CTO help: start narrow, add review points, and keep the logs clean. It feels slower for a week, but it saves months of rework later.
A simple example from a support team
A support inbox is a good place to see why operational design matters. Order status questions look simple, but the work breaks down fast when customer messages are incomplete or money is involved.
Picture a small ecommerce team. A customer writes, "Where is my order?" The AI reads the message, checks the order system, and drafts a reply with the shipment status, tracking note, and next expected step. The agent does not start from a blank page, which saves time on the repetitive part.
The draft is only half the story. The flow also decides when a person needs to step in.
If the customer asks for a refund and the amount is above a set limit, the AI does not send anything on its own. It prepares the response, adds the order details, and sends the case to a team lead for approval. That keeps small cases moving while stopping larger ones before they create a costly mistake.
The team also logs what happened on each ticket. They keep the order number, the order data the AI used, the draft it produced, the final message that went out, and who approved it if approval was required. When a customer complains later, the team can see the full path instead of guessing.
Some messages do not have enough information to act. A customer may forget the order number, use the wrong email, or mention two orders in one note. Those cases should not bounce around the inbox. The AI sends them to an exception queue with a short reason such as "missing order number" or "order not found."
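The support flow in this example boils down to a small router. This is a sketch under stated assumptions. The 50.00 refund limit, the order dictionary shape, and the destination labels are hypothetical. A real version would read orders from your own system. Each exception carries the short reason the team will see in the queue.

```python
from typing import Dict, List, Optional, Tuple

REFUND_LIMIT = 50.00  # hypothetical threshold; set your own

Route = Tuple[str, str]  # (destination, short reason or draft text)


def route_ticket(order_ids: List[str], orders: Dict[str, dict],
                 refund_amount: Optional[float] = None) -> Route:
    if not order_ids:
        return ("exception_queue", "missing order number")
    if len(order_ids) > 1:
        return ("exception_queue", "more than one order in one message")
    order_id = order_ids[0]
    order = orders.get(order_id)
    if order is None:
        return ("exception_queue", "order not found")
    if refund_amount is not None:
        if refund_amount > REFUND_LIMIT:
            # Above the limit: prepare the case, but a team lead approves it.
            return ("team_lead_approval",
                    f"refund of {refund_amount:.2f} on order {order_id} is above the limit")
        return ("agent_draft", f"Refund of {refund_amount:.2f} for order {order_id}")
    # Plain status question: the agent gets a draft instead of a blank page.
    return ("agent_draft", f"Order {order_id} is {order['status']}; tracking: {order['tracking']}")
```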
That queue becomes a weekly review habit. After a few weeks, the team usually spots easy fixes. They might change the contact form to ask for an order number first, tighten the refund threshold, or add a clearer reply template. That is how the system improves: not from a smarter prompt alone, but from small changes to approvals, logs, and exception handling.
Common mistakes that create rework
Most rework does not start with a bad prompt. It starts around the prompt. A team gets a decent AI draft, then loses time because nobody planned the next step, the review step, or the fallback when something looks wrong.
One common mistake is simple: teams automate the answer but ignore the handoff. The model writes a reply, summary, or recommendation, but the employee still has to copy it into another tool, check the customer record again, and ask for approval somewhere else. The draft looks fast. The process is still slow.
Another problem appears a few weeks after launch. Someone created an approval queue, but nobody owns it once the project goes live. Items sit there for hours or days. People start bypassing the queue because waiting feels worse than doing the work by hand. Approval workflows for AI only work when one person owns the queue, another person covers days off, and the team knows how fast reviews should happen.
Logs often fail more quietly. Teams keep a timestamp and the final output, then discover that those two fields explain almost nothing. When a bad outcome appears, nobody can tell what source data the model saw, which rule fired, who approved the result, or what changed before it went out. Thin AI audit logs turn every mistake into detective work.
Alerts can make things worse too. If exceptions land in email, chat, the help desk, and a dashboard, staff miss the one that matters. People need one main place to watch. Two is already pushing it.
Rules also pile up faster than teams expect. A temporary exception for one customer sticks around forever. An old threshold keeps firing after the business process changes. Six months later, nobody trusts the flow because it behaves like a junk drawer.
A simple cleanup habit helps. Assign one owner for the approval queue. Keep one home for alerts. Review old rules every month. Log enough detail to explain a bad result. Fixing those four things often cuts rework before the model improves at all.
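The monthly rule review is easier when a script lists the candidates for you. This is a minimal sketch. The `Rule` fields, including the hit count and the `temporary` flag, are hypothetical. It assumes your system already tracks when each rule last fired and who owns it. It flags rules with no owner, rules nobody has reviewed in a month, rules that never fire, and temporary exceptions that are still active.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class Rule:
    name: str
    owner: Optional[str]
    last_reviewed: date
    hits_last_30_days: int
    temporary: bool = False


def rules_to_review(rules: List[Rule], today: date,
                    max_age: timedelta = timedelta(days=30)) -> List[Tuple[str, List[str]]]:
    flagged = []
    for rule in rules:
        reasons = []
        if rule.owner is None:
            reasons.append("no owner")
        if today - rule.last_reviewed > max_age:
            reasons.append("not reviewed in the last month")
        if rule.hits_last_30_days == 0:
            reasons.append("never fired in 30 days")
        if rule.temporary:
            reasons.append("temporary exception still active")
        if reasons:
            flagged.append((rule.name, reasons))
    return flagged
```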
Checks before you expand
Expansion is where weak systems break. A team can tolerate fuzzy steps when five people use the tool. That tolerance disappears when twenty people touch it every day.
Ask a new team member to explain the flow after one short walkthrough. If they cannot say where the AI starts, where a person steps in, and what happens when something looks wrong, the process is still too fuzzy. Good operational design feels almost boring because people know the path without guessing.
Then test your approvals. Pick one risky action, such as sending a refund, changing contract language, or closing a fraud flag. Can you see who approved it, when they approved it, and what the AI suggested first? If that answer takes ten minutes and three tools, your trail is too thin.
Next, look at the cases that did not follow the normal path. You should spot them fast. A clean exception queue makes this easy. If unusual cases hide inside chat threads, inboxes, or private notes, patterns stay hidden until customers complain.
Staff also need a simple way to correct a bad answer on the spot. If they must open a ticket for every wrong summary or misrouted case, they will stop fixing problems and start working around the system. Small fixes teach you a lot, but only if people can make them quickly.
One more check matters more than many teams admit: do you review exceptions often enough? A weekly review is enough for many small teams. You want to notice that the same failure shows up every Tuesday, or that one approval rule catches almost nothing and only slows work down.
A short pre-expansion test works well:
- ask a new hire to explain the flow in plain language
- trace one risky action from suggestion to approval
- pull five exception cases and look for repeats
- correct one bad output without leaving the tool
- check whether the review rhythm still fits the volume
If these checks take effort, fix the flow before you add more users or more automation.
What to do next
Pick one process that already happens every week and draw it on paper. Keep it boring and specific. A support refund flow, a sales follow-up, or an invoice check is better than a big "AI transformation" plan nobody can test.
Mark the points where the system can do real damage if it gets something wrong. Start with actions that can spend money, send a customer message, change data, or expose private details. Those are the places where approval workflows for AI matter most.
Do not design the prompt in isolation. AI operational design works when four parts stay together: the prompt, the approval step, the log, and the exception queue. If one part is missing, the team ends up guessing what happened after a bad output reaches a real customer.
A useful first pass is simple. Let the model draft or classify, but do not let it act on its own. Add human approval before any payment, account change, or customer-facing message. Record the input, output, model version, approval decision, and final action in AI audit logs. Send unclear cases to an exception queue with a short reason instead of letting them fail silently.
That is enough to learn where the flow breaks. After a week or two, patterns usually show up. Maybe half the exceptions come from missing customer context. Maybe approvals take only 30 seconds, which means the review step costs less time than people feared. Maybe one step should stay manual because the gain is tiny.
If you want a second set of eyes on the process, Oleg Sotnikov shares his Fractional CTO and startup advisory work at oleg.is. This kind of outside review is useful when a team already has decent prompts but no clear rules for approvals, logs, or handoff when the model gets stuck.
The best next move is small: choose one live process, sketch the flow, and test it with real exceptions this week.
Frequently Asked Questions
What is AI operational design?
AI operational design is the layer around the prompt. It sets who reviews risky actions, what the system logs, and where odd cases go when the model should not guess.
Without that layer, staff end up fixing mistakes by hand after the tool already touched a customer, a payment, or a record.
Where should human approval happen?
Put approval right before any action that could cost money, change a record, affect access, or send an official customer message. Refunds, contract text, policy exceptions, and account changes usually need a person to sign off.
If the data looks incomplete or two systems disagree, send the case to review too.
What should I log from day one?
Start with the original input, the model output, the time, and an ID that ties the run to a ticket, order, user, or document. Also save the model version, the data source, and any human edits or approvals.
That record lets your team answer a simple question fast: what happened, and who changed it?
Why isn’t a strong prompt enough?
Because a prompt only shapes the reply. Daily work depends on messy inputs, weak data, handoffs, and rules about who approves what.
A prompt may look great in a test and still cause trouble in production if nobody owns the queue or checks risky actions.
What belongs in an exception queue?
Send cases there when the model lacks needed details, sees conflicting data, hits a retry limit, or reaches a step with real customer or money risk. The queue should hold cases that need a person, not every minor issue.
Keep the reason short and clear so someone can decide fast.
How do I choose the first process to automate?
Pick one task that happens often and follows clear rules. Invoice entry, lead triage, and first-pass support replies usually work better than broad projects.
If your team already handles the task the same way most of the time, you have a good starting point.
Should every AI action need approval?
No. If every action needs sign-off, people either stop using the tool or approve things without reading.
Let AI draft low-risk work such as internal summaries or rough replies. Save human review for actions with real downside.
Who should own approvals and the review queue?
Give each decision to one named person or one clear role. Finance should own money movement, team leads should own edge cases in customer messages, and legal or compliance should own regulated language.
Do the same for the exception queue. One owner keeps it moving, and one backup covers time off.
How should I test the workflow before launch?
Run real past cases through the flow before launch. Compare the AI path with what your team actually did and look for gaps around approvals, missing data, and handoffs.
Thirty to fifty examples usually expose the rough spots fast.
How do I know the system is ready to expand?
Ask a new team member to explain the flow after one short walkthrough. If they cannot say where AI starts, where a person steps in, and what happens when something looks wrong, the process still needs work.
Also trace one risky action end to end and review a handful of exception cases. If that feels slow or confusing, fix the flow before you add more users.