AI agent architecture: put agents at the workflow edge
AI agent architecture works best when agents handle messy inputs at the workflow edge, while fixed services protect payments, records, and access.

Why this gets messy fast
Messy agent architecture usually starts with a shortcut that looks smart in a demo: one agent handles everything. It reads the request, decides what the user means, updates the system, and approves the result. In a real business workflow, that mixes judgment, action, and authority in one place.
The trouble starts when the input is vague. People send incomplete tickets, half-finished chat messages, and sloppy emails every day. An agent does not like blanks, so it fills them in. Sometimes it guesses right. Sometimes it guesses wrong with total confidence.
That gets dangerous when the next step touches money, records, or access. One bad guess can send a refund to the wrong customer, change an account detail, or remove a permission someone still needs. Fixing that later is rarely simple. A person has to trace what happened, repair the record, and explain the mess to the customer.
A small support example shows how fast this can go sideways. A user says, "Please use the same payment method as before and let my colleague handle this account now." A single agent might infer which card to use, decide who the colleague is, update the account, and trigger a refund. If any part of that guess is wrong, the company now has a payment problem and an access problem at the same time.
Trust falls even faster than accuracy. People can accept suggestions from a tool. They get nervous when it changes permissions or edits records without a clear rule behind the action. If nobody can explain why access changed, the system starts to feel random.
That is where teams get stuck. They thought they were buying speed, but they gave one agent too much room to improvise in places where the business needs fixed rules.
What the workflow edge means
The workflow edge is the point where messy human input first enters your system. That is usually where an agent fits best. People write vague emails, paste half a support ticket, upload a screenshot, or ask for two things at once. An agent can read that mess, pull out intent, and turn it into something the rest of the system can use.
That matters because the edge is full of missing details, mixed signals, and odd wording. A customer might write, "I changed banks last month and still got charged twice. Can you fix my account and send the money back?" A rules-based service will struggle if you push that whole message straight into business logic. An agent can split it into separate requests, spot what is unclear, and ask one follow-up question instead of making a risky guess.
At this stage, the agent should do a narrow set of jobs: read chats, emails, forms, and call notes; sort requests by intent and urgency; extract names, dates, IDs, and other fields; and draft the next action in a structured format.
It should stop there when money, records, or permissions are involved. The agent can prepare a refund request, a billing update request, or an account change request, but it should not press the final button on its own. Another service should check the rules, confirm the user has access, and record what happened.
That handoff is the point. The agent turns messy language into clean data. Deterministic services then work with fixed inputs, fixed rules, and clear logs. You get the flexibility of natural language at the edge without letting the uncertain part of the system edit customer records or move funds.
A simple way to frame it: the agent drafts, the service decides. If the agent reads a note wrong, the system can catch the mistake before anything serious happens.
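Here is a minimal Python sketch of "the agent drafts, the service decides." It assumes a hypothetical `RefundDraft` shape and uses an in-memory `ORDERS` table as a stand-in for the real order store. The agent's only output is the draft. `decide_refund` then checks that draft against records the agent never touches.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RefundDraft:
    """What the agent produces: structured data with no side effects (hypothetical shape)."""
    order_id: str
    amount_cents: int
    reason: str

# Stand-in for the real order store that only the service can read.
ORDERS = {"48291": {"paid_cents": 4900, "customer_id": "c-17"}}

def decide_refund(draft: RefundDraft, customer_id: str) -> str:
    """Deterministic service: checks the draft against real records."""
    order = ORDERS.get(draft.order_id)
    if order is None:
        return "denied: unknown_order"
    if order["customer_id"] != customer_id:
        return "denied: not_owner"
    if draft.amount_cents > order["paid_cents"]:
        return "denied: exceeds_payment"
    return "approved"
```

Suppose the agent misreads a note and drafts the wrong order or an inflated amount. The service rejects the draft before any money moves.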
What deterministic services should own
An agent can talk to users, collect context, and suggest the next action. It should not move money, change formal records, or decide who gets access. Those jobs belong to deterministic services with fixed rules and clear service boundaries.
Payments sit firmly on that side. If a customer asks for a refund, the agent can gather the order number, explain the policy, and prepare the request. The payment service should still decide whether it can capture, refund, or void a charge. It should do that with exact limits, duplicate-request checks, and a clear result.
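One common way to get exact limits and duplicate-request checks is an idempotency key supplied by the caller. When a request is retried, the service returns the first result instead of refunding twice. The sketch below assumes a hypothetical in-memory `PaymentService`. A real one would persist keys and balances and call a payment provider.

```python
from dataclasses import dataclass, field

@dataclass
class PaymentService:
    """Refund path with an idempotency key and a hard limit (hypothetical, in-memory).
    `captured` maps charge IDs to captured amounts in cents."""
    captured: dict[str, int]
    refunded: dict[str, int] = field(default_factory=dict)
    seen_keys: dict[str, str] = field(default_factory=dict)

    def refund(self, idempotency_key: str, charge_id: str, amount_cents: int) -> str:
        # Duplicate request: replay the stored result instead of acting again.
        if idempotency_key in self.seen_keys:
            return self.seen_keys[idempotency_key]
        result = self._apply(charge_id, amount_cents)
        self.seen_keys[idempotency_key] = result
        return result

    def _apply(self, charge_id: str, amount_cents: int) -> str:
        if charge_id not in self.captured:
            return "denied:unknown_charge"
        if amount_cents <= 0:
            return "denied:invalid_amount"
        remaining = self.captured[charge_id] - self.refunded.get(charge_id, 0)
        if amount_cents > remaining:
            return "denied:exceeds_remaining"
        self.refunded[charge_id] = self.refunded.get(charge_id, 0) + amount_cents
        return "refunded"
```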
Records need the same treatment. Customer profiles, invoices, contracts, and account history should live in systems that allow only approved writes. The service should check field formats, required approvals, and allowed state changes before it saves anything. That may feel strict, but strict rules save teams from expensive cleanup.
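The sketch below shows allowed state changes for one record type. It assumes a hypothetical invoice lifecycle and a rule that voids and refunds need a named approver. `validate_write` returns every problem it finds, and the caller saves only when the list is empty.

```python
# Hypothetical invoice lifecycle: each state lists the states it may move to.
ALLOWED_TRANSITIONS = {
    "draft": {"issued", "void"},
    "issued": {"paid", "void"},
    "paid": {"refunded"},
    "void": set(),
    "refunded": set(),
}
NEEDS_APPROVAL = {"void", "refunded"}  # assumed: these moves need a named approver
REQUIRED_FIELDS = {"invoice_id", "state", "amount_cents"}

def validate_write(current: dict, proposed: dict) -> list:
    """Return a list of problems; an empty list means the write may be saved."""
    problems = []
    missing = REQUIRED_FIELDS - set(proposed)
    if missing:
        problems.append("missing:" + ",".join(sorted(missing)))
    if not isinstance(proposed.get("amount_cents"), int):
        problems.append("bad_format:amount_cents")
    old, new = current["state"], proposed.get("state")
    if new != old and new not in ALLOWED_TRANSITIONS.get(old, set()):
        problems.append(f"bad_transition:{old}->{new}")
    if new != old and new in NEEDS_APPROVAL and not proposed.get("approved_by"):
        problems.append(f"approval_required:{new}")
    return problems
```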
Roles and permission checks should also come from one place, not from prompts or tool-specific logic. The agent should ask whether a user can perform an action, and the authorization service should answer yes or no. If several tools make their own guess, access rules drift quickly and people stop trusting the system.
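Here is a minimal sketch of a single authorization source, using hypothetical roles and actions. Every caller asks the same `is_allowed` function, and none keeps its own copy of the rules.

```python
# Single source of truth for who may do what (hypothetical roles and actions).
ROLE_PERMISSIONS = {
    "viewer": {"read_account"},
    "billing_admin": {"read_account", "update_billing_contact", "issue_refund"},
    "owner": {"read_account", "update_billing_contact", "issue_refund", "change_roles"},
}
USER_ROLES = {"u-1": "viewer", "u-2": "billing_admin"}

def is_allowed(user_id: str, action: str) -> bool:
    """One yes/no answer for every caller: agents, tools, and services alike."""
    role = USER_ROLES.get(user_id)
    return role is not None and action in ROLE_PERMISSIONS.get(role, set())
```

In a real system this table would sit behind an authorization service. The agent would call it the same way any other client does.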
Approvals and state changes need a full audit trail. When someone updates a billing address, grants a refund, or unlocks an account, the service should record who requested it, who approved it, what changed, and when it happened. If finance or support reviews the case later, they need facts, not a vague chat summary.
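This sketch shows an append-only audit entry that records who requested a change, who approved it, what changed, and when. The field names and the JSON-lines format are assumptions for illustration. A Python list stands in for the durable storage a production log would use.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class AuditEntry:
    action: str
    record_id: str
    requested_by: str
    approved_by: Optional[str]
    changes: dict  # field -> [old, new], only for fields that actually changed
    at: str        # ISO 8601 timestamp in UTC

def record_change(log: list, action: str, record_id: str, requested_by: str,
                  approved_by: Optional[str], before: dict, after: dict) -> AuditEntry:
    changes = {k: [before.get(k), after.get(k)]
               for k in sorted(set(before) | set(after))
               if before.get(k) != after.get(k)}
    entry = AuditEntry(action, record_id, requested_by, approved_by, changes,
                       datetime.now(timezone.utc).isoformat())
    log.append(json.dumps(asdict(entry), sort_keys=True))  # one JSON line per change
    return entry
```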
Simple business rules should also block bad requests early. A refund larger than the original payment should fail. A role change without manager approval should fail. An account closure with an unpaid invoice should fail. The agent can explain the rejection in plain language, but the service should make the call.
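Here are the three rules above as plain checks. Each returns a reason code, or `None` when the rule passes. The function names and reason codes are hypothetical. The point is that each rule is a few testable lines of code, not a sentence in a prompt.

```python
from typing import Optional

def check_refund(amount_cents: int, original_cents: int) -> Optional[str]:
    return "refund_exceeds_payment" if amount_cents > original_cents else None

def check_role_change(manager_approved: bool) -> Optional[str]:
    return None if manager_approved else "manager_approval_required"

def check_account_closure(unpaid_invoices: int) -> Optional[str]:
    return "unpaid_invoice" if unpaid_invoices > 0 else None
```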
That split keeps the agent useful and keeps risky actions boring. Around money, records, and access, boring is exactly what you want.
How the handoff works
A clean handoff keeps the agent useful without letting it touch the risky parts of the system. The agent listens to messy human input, then passes a small, boring request to code that follows fixed rules.
The first step is translation. A user writes something loose like, "Please move this payment to next month and keep my current access." The agent should not act on that text directly. It should turn it into a short structured request with fields such as action type, account ID, date, and reason. Small is better. If the agent invents extra meaning, the rest of the flow gets shaky.
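This sketch covers the translation step. It assumes the agent is asked to emit JSON that must match a hypothetical four-field `ChangeRequest`. Parsing rejects both missing and extra keys. A field the schema does not define, such as a hypothetical `keep_access` flag, stops the flow instead of slipping through.

```python
import json
from dataclasses import dataclass, fields
from datetime import date

@dataclass(frozen=True)
class ChangeRequest:
    action: str
    account_id: str
    effective_date: date
    reason: str

ALLOWED_KEYS = {f.name for f in fields(ChangeRequest)}

def parse_agent_output(raw: str) -> ChangeRequest:
    """Turn the agent's JSON into a typed request; refuse anything extra or missing."""
    data = json.loads(raw)
    extra = set(data) - ALLOWED_KEYS
    missing = ALLOWED_KEYS - set(data)
    if extra or missing:
        raise ValueError(f"extra={sorted(extra)} missing={sorted(missing)}")
    return ChangeRequest(
        action=data["action"],
        account_id=data["account_id"],
        effective_date=date.fromisoformat(data["effective_date"]),  # rejects bad dates
        reason=data["reason"],
    )
```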
Then the system should check the request before any business action starts. A validator confirms that required fields exist, checks formats and limits, rejects actions the user cannot ask for, and blocks anything unclear until the missing details are filled in.
That validator should stay strict. It should not guess. If the request says "change owner" but the allowed action list only includes "change billing contact," the system should stop there.
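Here is a strict validator in the same spirit. It uses a hypothetical allow-list of actions and a hypothetical 31-day deferral limit. It returns a rejection code or `None`, and it never repairs the request on its own.

```python
from typing import Optional

# Hypothetical allow-list: the only actions this channel may request.
ALLOWED_ACTIONS = {"change_billing_contact", "defer_payment", "refund_review"}
REQUIRED = {"action", "account_id"}
MAX_DEFERRAL_DAYS = 31

def validate(request: dict) -> Optional[str]:
    """Return a rejection code, or None if the request may proceed. Never guesses."""
    missing = REQUIRED - set(request)
    if missing:
        return "missing_field:" + ",".join(sorted(missing))
    if request["action"] not in ALLOWED_ACTIONS:
        return "action_not_allowed:" + str(request["action"])
    if request["action"] == "defer_payment":
        days = request.get("days")
        if not isinstance(days, int) or not 1 <= days <= MAX_DEFERRAL_DAYS:
            return "bad_value:days"
    return None
```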
After validation, a normal service makes the decision. This service reads the real records, applies permission checks, and follows fixed business rules. It returns a plain result such as approved, denied, or needs review, plus a reason code. The agent does not decide whether money moves, whether records change, or whether a permission update is allowed. The service does.
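This sketch shows the decision service for the payment-deferral example. The `Outcome` values mirror approved, denied, and needs review. The account data and the review threshold are assumptions, and so is the `caller_may_defer` flag, which would come from the authorization service.

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    APPROVED = "approved"
    DENIED = "denied"
    NEEDS_REVIEW = "needs_review"

@dataclass(frozen=True)
class Decision:
    outcome: Outcome
    reason_code: str

# Hypothetical record store and policy threshold.
ACCOUNTS = {
    "a-1": {"status": "active", "balance_due_cents": 0},
    "a-2": {"status": "locked", "balance_due_cents": 0},
    "a-3": {"status": "active", "balance_due_cents": 250_000},
}
REVIEW_THRESHOLD_CENTS = 100_000

def decide_deferral(account_id: str, caller_may_defer: bool) -> Decision:
    account = ACCOUNTS.get(account_id)
    if account is None:
        return Decision(Outcome.DENIED, "unknown_account")
    if not caller_may_defer:
        return Decision(Outcome.DENIED, "not_permitted")
    if account["status"] != "active":
        return Decision(Outcome.DENIED, "account_not_active")
    if account["balance_due_cents"] > REVIEW_THRESHOLD_CENTS:
        return Decision(Outcome.NEEDS_REVIEW, "large_balance")
    return Decision(Outcome.APPROVED, "within_policy")
```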
Record the outcome before the agent answers the user. Write the request, the decision, and the reason to the system of record or audit log first. Then let the agent turn that result into a human reply. That order matters. If support needs to review the case later, they can see exactly what the user asked, what the service decided, and why.
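Code can enforce this ordering: decide, write the record, and only then render a reply. In this sketch `decide` and `render_reply` are hypothetical callables standing in for the service and the agent, and a plain list stands in for the audit log.

```python
import json

def handle(request: dict, decide, audit_log: list, render_reply) -> str:
    """Decide, persist, then let the agent phrase the answer, in that order."""
    decision = decide(request)
    # Write first: if this raises, no reply is ever produced.
    audit_log.append(json.dumps({"request": request, "decision": decision}, sort_keys=True))
    return render_reply(decision)
```

If the write fails, the exception propagates. No reply can then claim an outcome that was never recorded.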
When teams skip this sequence, the agent starts acting like a hidden backend. That is when mistakes get expensive.
How to place an agent in a workflow
Map the workflow before you add any AI. Look for the spots where people type in their own words, paste long messages, upload odd files, or ask for something that rules alone cannot classify well. That is usually the right place for the agent.
Then mark the risky steps. If a step moves money, edits a record, changes access, sends a legal notice, or deletes data, do not let the agent do it directly. The agent should sit just before that boundary. It turns messy input into a clean request, then a normal service decides whether the action can happen.
A few simple rules work well: put the agent where the input is messy, keep business rules in deterministic services, run permission checks after the agent instead of inside it, require approval for high-risk actions, and log the agent output separately from the final action.
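These placement rules fit in a small lookup, assuming you tag each workflow step with its effects. Which effects count as high-risk and need human approval is a team decision, so the sets below are hypothetical.

```python
# Hypothetical effect tags for workflow steps.
RISKY_EFFECTS = {"moves_money", "edits_record", "changes_access",
                 "sends_legal_notice", "deletes_data"}
HIGH_RISK_EFFECTS = {"changes_access", "sends_legal_notice", "deletes_data"}

def place_step(effects: set, input_is_free_text: bool) -> str:
    """Return who owns a workflow step under the placement rules."""
    if effects & HIGH_RISK_EFFECTS:
        return "service+human_approval"
    if effects & RISKY_EFFECTS:
        return "service"
    if input_is_free_text:
        return "agent"  # messy input, no side effects: the agent's home turf
    return "service"
```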
This matters more than most teams expect. An agent can read, summarize, extract intent, and ask follow-up questions. It should not be the part that refunds a payment, grants admin access, or rewrites a customer record. A deterministic service can check account state, policy, limits, and audit rules the same way every time.
Think of the agent as a translator, not a cashier.
A support flow makes this easy to see. A customer writes, "I got charged twice and also lost access after changing my email." The agent can split that into two structured requests: refund review and account recovery. After that, separate services check payment history, confirm identity, inspect permission rules, and decide whether each action is allowed.
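Here is a sketch of that split. The agent's output is a list of typed requests, and a router sends each one to its own service. The request shapes and stub services are hypothetical. Real services would check payment history and identity.

```python
from typing import Callable

# Hypothetical structured output the agent produced from the message above.
agent_output = [
    {"type": "refund_review", "account_id": "a-7", "charge_ids": ["ch_1", "ch_2"]},
    {"type": "account_recovery", "account_id": "a-7", "email_recently_changed": True},
]

def refund_service(req: dict) -> str:
    # Stub: two charges on the claim means a human checks for a duplicate.
    return "needs_review" if len(req["charge_ids"]) > 1 else "denied:no_duplicate"

def recovery_service(req: dict) -> str:
    # Stub: recovery never completes without an identity check.
    return "pending_identity_check"

ROUTES: dict[str, Callable[[dict], str]] = {
    "refund_review": refund_service,
    "account_recovery": recovery_service,
}

def dispatch(requests: list) -> dict:
    results = {}
    for req in requests:
        handler = ROUTES.get(req["type"])
        results[req["type"]] = handler(req) if handler else "rejected:unknown_type"
    return results
```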
Test the flow with bad input before you ship it. Use vague requests, missing details, prompt injection attempts, and requests that should fail. If the agent confuses two actions, skips a field, or makes a risky guess, you want to catch that in testing, not in production.
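Bad-input cases can live in a table that runs on every build. This sketch assumes a simplified validator with a hypothetical two-action allow-list. Each case pairs an input the agent might produce with the rejection you expect.

```python
from typing import Optional

ALLOWED_ACTIONS = {"refund_review", "account_recovery"}

def validate(req: dict) -> Optional[str]:
    if req.get("action") not in ALLOWED_ACTIONS:
        return "action_not_allowed"
    if req["action"] == "refund_review" and not req.get("order_id"):
        return "missing:order_id"
    return None

# Inputs the agent might produce from vague, incomplete, or hostile messages.
BAD_CASES = [
    ({"action": "refund_review"}, "missing:order_id"),                   # vague request
    ({"action": "refund_review", "order_id": ""}, "missing:order_id"),   # blank field
    ({"action": "grant_admin", "order_id": "1"}, "action_not_allowed"),  # injected action
    ({}, "action_not_allowed"),                                          # nothing extracted
]

def run_bad_input_suite() -> list:
    """Return a description of every case that did not fail the way we expected."""
    failures = []
    for req, expected in BAD_CASES:
        got = validate(req)
        if got != expected:
            failures.append(f"{req}: expected {expected}, got {got}")
    return failures
```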
Teams that do this well keep the agent close to the workflow edge and keep the hard controls in plain code. That setup is easier to audit, easier to debug, and much safer when the stakes are real.
A simple example with refunds and account changes
A customer sends one support message: "Please refund order 48291, and change my billing email to my new address." The agent reads the note, pulls out the order number, and separates the message into two jobs: a refund request and an account update request.
That split matters. The agent handles the messy part: free-form language, missing punctuation, and mixed intent. It does not decide who gets money back, and it does not change account records on its own.
The refund request goes to a refund service. That service checks the paid amount, the order status, and the refund policy. It can ask straightforward questions: Was the charge settled? Has this order already been refunded? Is it still inside the allowed refund window? Then it returns a clear result like "approved for $49" or "denied because the order is older than 30 days."
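The refund service's questions map to plain conditions. This sketch uses a hypothetical `Order` record and the 30-day window from the example. `today` is passed in so the check is deterministic and testable.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REFUND_WINDOW = timedelta(days=30)

@dataclass
class Order:
    order_id: str
    paid_cents: int
    settled: bool
    already_refunded: bool
    paid_on: date

def review_refund(order: Order, today: date) -> str:
    if not order.settled:
        return "denied because the charge has not settled"
    if order.already_refunded:
        return "denied because the order was already refunded"
    if today - order.paid_on > REFUND_WINDOW:
        return "denied because the order is older than 30 days"
    return f"approved for ${order.paid_cents / 100:.2f}"
```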
The email change goes to an account service. That service should be stricter. Changing a billing email affects receipts, invoices, and often account recovery. So the service asks for identity proof before it updates anything. It might require a one-time code sent to the current email, a signed-in session, or another approved check. Until the customer passes that check, the service keeps the change in a pending state.
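This sketch shows the pending state for a billing-email change. It assumes a six-digit one-time code sent to the current address. The class and the `outbox` list (a stand-in for an email sender) are hypothetical. A real service would also expire codes and limit attempts.

```python
import hmac
import secrets

class AccountService:
    """A billing-email change stays pending until a one-time code is confirmed."""

    def __init__(self, email: str, outbox: list):
        self.email = email
        self.pending_email = None
        self._code = None
        self._outbox = outbox  # hypothetical stand-in for an email sender

    def request_email_change(self, new_email: str) -> str:
        self.pending_email = new_email
        self._code = f"{secrets.randbelow(10**6):06d}"
        self._outbox.append((self.email, self._code))  # code goes to the CURRENT address
        return "pending_identity_check"

    def confirm(self, code: str) -> bool:
        # Constant-time comparison so response timing does not leak the code.
        if self._code is None or not hmac.compare_digest(code, self._code):
            return False
        self.email, self.pending_email, self._code = self.pending_email, None, None
        return True
```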
Now the agent has two structured results. It can write a final reply in plain language:
"Your refund for order 48291 is approved and will return to your original payment method. Your billing email is not changed yet. Please confirm your identity with the code we sent to your current address."
If the refund service rejects the request, the agent should explain that too, using the exact reason from the service instead of inventing one.
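One way to keep the agent from inventing reasons is to ground its wording in the service's reason code. The codes and phrasings below are hypothetical. An unknown code is passed through as-is rather than replaced with a guess.

```python
# Hypothetical mapping from service reason codes to customer-facing wording.
REASON_TEXT = {
    "older_than_30_days": "the order is older than our 30-day refund window",
    "already_refunded": "this order has already been refunded",
    "identity_check_pending": "we need you to confirm your identity first",
}

def explain(result: dict) -> str:
    """Phrase a service result; unknown codes are passed through, never invented."""
    if result["outcome"] == "approved":
        return "Your request is approved."
    reason = result.get("reason_code")
    text = REASON_TEXT.get(reason, f"the service returned reason code {reason!r}")
    return f"We could not complete this because {text}."
```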
This is a clean pattern. The agent sits at the workflow edge, where language is messy. Deterministic services sit behind service boundaries, where rules, permission checks, and audit trails need to stay exact. The customer gets one smooth conversation, but the risky actions stay under tight control.
Mistakes teams make
Most failures start with a shortcut, not with the model itself. Teams build a useful agent, then let it do too much because the demo feels smooth. That is usually where trust breaks.
The first bad move is giving the agent direct access to payment, admin, or account tools. An agent can suggest a refund, draft an account change, or prepare a support action. It should not press the final button on money movement, record changes, or permission updates. A separate service should check the rules every time.
Shortcuts that cost money
Another common mistake is hiding business rules inside prompts. A prompt is easy to change and hard to audit. If your refund policy says "30 days, original payment method, no open fraud flag," put that logic in code, not in a paragraph the model reads. Prompts can explain the job. They should not decide policy.
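Here is that refund policy as code with a version label. The `RefundCase` fields and the `POLICY_VERSION` string are hypothetical. Logging the version with each decision shows which rules were in force at the time.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

POLICY_VERSION = "refund-policy-v3"  # hypothetical; changed only through code review
REFUND_WINDOW = timedelta(days=30)

@dataclass(frozen=True)
class RefundCase:
    purchased_on: date
    refund_method: str
    original_method: str
    open_fraud_flag: bool

def refund_policy(case: RefundCase, today: date) -> Optional[str]:
    """Return a denial reason, or None if policy allows the refund."""
    if today - case.purchased_on > REFUND_WINDOW:
        return "outside_30_day_window"
    if case.refund_method != case.original_method:
        return "must_use_original_payment_method"
    if case.open_fraud_flag:
        return "open_fraud_flag"
    return None
```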
Small flows often skip logs because they look harmless. That choice ages badly. When a customer asks, "Why did the system change my address?" or finance asks why a refund went through, you need a record of the request, the data used, the checks run, and the final action. If you cannot replay the path, you cannot fix it with confidence.
Broad write access is another trap. Teams give the agent wide permissions to save time during setup, then forget to tighten them later. Start narrow. Let the agent read what it needs, ask for a decision, and hand off the write step to a service with strict checks.
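A deny-by-default tool scope is one way to start narrow. This sketch assumes each tool is tagged read, request, or write, and the tool names are hypothetical. The agent's scope simply leaves out write.

```python
AGENT_SCOPES = frozenset({"read", "request"})

# Hypothetical tool registry; "request" tools queue a draft for a service.
TOOLS = {
    "get_order": "read",
    "submit_change_request": "request",
    "issue_refund": "write",
    "update_account": "write",
}

def agent_may_call(tool: str) -> bool:
    """Deny by default: unknown tools and write tools are refused."""
    return TOOLS.get(tool) in AGENT_SCOPES
```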
Confident wording fools people more often than strange wording. The model says something clearly, so the team assumes it must be true. Style should never beat evidence. If the agent says a user is eligible for a refund, the service still needs to check order status, amount, account history, and who approved the action.
A good rule is simple: let the agent talk, summarize, and prepare. Let deterministic services decide, verify, and write.
Quick checks before launch
If an agent can touch money, customer records, or access rights, you need a short pre-launch check that a real team can answer in a few minutes. If the answers sound fuzzy, the design is still fuzzy.
A good test is whether each risky action has one clear owner. The agent can suggest, classify, or prepare a request, but one deterministic service should decide things like charging a card, changing an address, granting access, or deleting data. If two services can both perform the same risky action, people will lose track of where the real rule lives.
Logs matter just as much. A support person should be able to replay a decision later and see what the agent saw, what data it passed along, which rule service answered, and why the final action happened. If your log only says "model approved," that is not enough.
Before launch, run a few checks. Pick five risky actions in the flow and write down the single service that owns the final decision for each one. Take one completed case from staging and see whether another person can trace it from user request to final outcome without guessing. Break the input on purpose by removing a field, sending conflicting data, or giving an unclear request. The flow should stop cleanly instead of inventing the missing part. Find an unusual case and route it to a human, who should be able to approve, reject, or edit the action without fighting the system. Then replace the model in a test environment. Your business rules should stay the same, and the rest of the flow should still work.
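The model-swap check is easier when the model sits behind a narrow interface. In this sketch `Extractor` is a hypothetical protocol and `KeywordExtractor` is a deterministic stand-in for a model. `business_rule` stays the same no matter which extractor is plugged in.

```python
from typing import Protocol

class Extractor(Protocol):
    def extract(self, message: str) -> dict: ...

class KeywordExtractor:
    """Stand-in for a model in tests: crude but deterministic."""
    def extract(self, message: str) -> dict:
        action = "refund_review" if "refund" in message.lower() else "unknown"
        return {"action": action}

def business_rule(request: dict) -> str:
    # Rules live here and do not change when the extractor is swapped.
    return "route_to_refund_service" if request.get("action") == "refund_review" else "ask_human"

def run(extractor: Extractor, message: str) -> str:
    return business_rule(extractor.extract(message))
```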
That human-review check gets skipped too often. Teams assume a person can step in later, then discover there is no review screen, no audit trail, and no clean way to override the result.
One more rule helps: fail closed. If the agent seems unsure, or the input looks incomplete, stop the action and ask for more data or a human decision. A paused workflow is annoying. A wrong payment or bad permission change is worse.
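Here is a fail-closed gate. It assumes the extraction step reports some confidence value and uses a hypothetical 0.9 threshold. This sketch does not say where that number comes from, since model-reported scores are not guaranteed to be calibrated. A missing confidence value is treated the same as a low one.

```python
from typing import Optional

REQUIRED = {"action", "account_id"}
MIN_CONFIDENCE = 0.9  # hypothetical threshold; tune against real cases

def gate(draft: dict, confidence: Optional[float]) -> str:
    """Fail closed: anything missing or uncertain pauses instead of proceeding."""
    if confidence is None or confidence < MIN_CONFIDENCE:
        return "pause:human_review"
    present = {k for k, v in draft.items() if v not in (None, "")}
    missing = REQUIRED - present
    if missing:
        return "pause:ask_user:" + ",".join(sorted(missing))
    return "proceed_to_validator"
```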
Next steps for your team
Start with one workflow that already costs your team time or creates support tickets. Refunds, account changes, onboarding, and document intake are good places to start because the risk is easy to see. If you try to redesign everything at once, the discussion gets fuzzy fast.
Put that workflow on one page and label each step: agent, service, or human. Then mark the steps that can change money, records, or permissions. Those steps need deterministic services, clear service boundaries, and strict permission checks.
A simple split works well. Let the agent handle intake, follow-up questions, drafting, summaries, and triage. Let deterministic services calculate totals, update records, enforce rules, and call payment or identity systems. Let humans review unusual cases, complaints, and anything with legal or brand risk. Log every handoff so your team can compare what the agent suggested with what the service actually did.
For the first version, keep the agent at the workflow edge even if that feels conservative. In a refund flow, for example, the agent can collect the order number, reason, and customer tone. The refund service should still check policy, calculate the amount, and execute the transaction.
Then test the flow with real cases from the last month. Use easy cases, messy ones, and a few that should fail. If your team cannot explain why the agent stopped, escalated, or passed control to a service, the design is still too loose.
This is where architecture stops being theory and starts becoming an operating model. People know what the agent may suggest, what the system may do, and where a human still decides.
If you need a second opinion on those boundaries, Oleg Sotnikov at oleg.is works with startups and small teams as a fractional CTO on AI-first software, automation, and product architecture. That kind of outside review can help before an agent touches a live workflow.
Frequently Asked Questions
What does the workflow edge mean?
The workflow edge is where messy human input first hits your system. Think emails, chats, forms, screenshots, or call notes. Put the agent there so it can read the mess, pull out intent, and turn it into structured data for the rest of your stack.
Why not let one agent handle the whole workflow?
Because one agent should not guess, decide, and execute risky actions in the same step. If the input is vague, it may fill in blanks and act with confidence. That causes real damage when the action touches refunds, account records, or permissions.
What should an agent do in a support flow?
Let the agent read free text, split mixed requests, extract names, dates, and IDs, and ask follow-up questions when details are missing. It should draft the next step in a structured format, then stop before any write, approval, or payment action.
What should deterministic services own?
Deterministic services should own money movement, record changes, permission checks, approvals, and audit logging. Those services read fixed inputs, apply fixed rules, and return a result with a reason. That keeps risky actions predictable.
How should the handoff from agent to service work?
First, the agent translates the user's message into a small structured request. Next, a validator checks required fields, allowed actions, and limits. After that, the service reads real records, applies rules, writes the result, and logs the decision before the agent replies.
When should a human review the action?
Ask for human approval when the action carries real risk or the input stays unclear. Refund exceptions, access changes, legal notices, account ownership changes, and odd edge cases should pause for review. A human needs a screen that shows the request, the checks, and the proposed action.
Why should business rules stay out of prompts?
Prompts change easily and leave weak audit trails. Business rules need code, tests, and one owner. If your refund policy or access policy lives in a prompt, your team will struggle to trace who changed it and why a decision happened.
What should we log for risky workflows?
Log the user request, the agent's structured output, the validator result, the service decision, the reason code, and the final write. Record who approved the action and when it happened. That gives support and finance a path they can replay without guessing.
How do we test this before launch?
Use real cases, messy input, missing fields, conflicting details, and requests that should fail. Try prompt injection too. The flow should stop, ask for more data, or route to a human instead of inventing a missing fact or pushing through a risky action.
What does fail closed mean in practice?
Fail closed means the system stops when it lacks enough data or confidence. It asks for clarification or sends the case to a person instead of making a risky guess. That slows one case down, but it prevents bad refunds, bad record edits, and bad access changes.