Dec 21, 2024·8 min read

What not to automate with AI in an AI-first company

Learn which work to keep manual until rules settle, data cleans up, and product changes slow.

Why this goes wrong fast

When teams ask what they should not automate with AI, the worst answer is usually "nothing." They see the loudest pain in the company, feel pressure to fix it, and rush automation into a process that still has loose rules, messy data, and too many exceptions.

That feels practical. It usually makes the mess harder to spot.

If people already disagree on how a task should be done, the model fills the gaps on its own. It does not know your intent. It guesses. Sometimes it guesses well enough to look useful for a week, which is almost worse. The errors stay hidden until they reach a customer, a contract, or a report.

Early product churn makes the problem sharper. Startups change pricing, packaging, onboarding, naming, and internal rules all the time. Small changes can quietly break prompts, routing logic, and tool handoffs. The workflow keeps running, but it now follows last week's product instead of today's.

Bad inputs spread the damage fast. A wrong field name, a stale CRM note, a missing status, or a half-cleaned spreadsheet can push a small error through several steps before anyone notices. One bad summary becomes the wrong ticket. The wrong ticket triggers the wrong reply. Then someone trusts it because "the system" produced it.

A simple rule helps: if humans still solve the task by asking each other what the rule means, do not automate it yet. Write the rule first. Watch real cases. Find the edge cases that people fix by instinct and never document.

This is why solid AI-first work can look slower at the start. The hard part is not writing a prompt. The hard part is trimming unstable product areas, cleaning weak data paths, and getting the team to agree on what a correct outcome looks like. Until then, automation can turn a local problem into a company-wide one.

Do not automate work with fuzzy rules

If two managers answer the same question in two different ways, stop there. AI will not solve that disagreement. It will turn a hidden conflict into fast, repeatable mistakes.

You see this in ordinary work all the time. One support lead approves a refund because the customer sounds upset. Another says no because the account falls outside policy. Both answers may feel reasonable. The rule is still too loose for automation.

Unwritten exceptions are another warning sign. Staff carry them in memory: "We usually do X, unless the client is in renewal" or "Skip this step for older accounts." Humans can work around that for a while. A model usually cannot, and the team may not notice the errors until customers complain.

Keep human review when the goal changes from case to case. Some days the team wants speed. Other days it wants margin, retention, or legal caution. The same input can need a different answer depending on what matters that day. A person can pause and ask one more question. Automation needs that choice written down first.

Write the rule in plain language before you automate anything. If a new hire cannot read it and make the same call as your strongest operator, the task is still too vague.

A quick test works well. Give three people five real examples and ask them to decide using only the written rule. If they disagree often, keep the task manual for now. Tighten the rule, collect the missing exceptions, and test again.
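The quick test above can be scored in a few lines. This is a minimal sketch, not a prescribed method: the case IDs and decisions are made up, and counting only unanimous cases as agreement is an assumption about what "disagree often" should mean.

```python
def agreement_rate(decisions_by_case):
    """Share of cases where every reviewer made the same call."""
    if not decisions_by_case:
        raise ValueError("need at least one case")
    unanimous = sum(
        1 for calls in decisions_by_case.values() if len(set(calls)) == 1
    )
    return unanimous / len(decisions_by_case)


# Hypothetical results: three reviewers, five real cases, written rule only.
decisions = {
    "case-1": ["approve", "approve", "approve"],
    "case-2": ["approve", "deny", "approve"],
    "case-3": ["deny", "deny", "deny"],
    "case-4": ["approve", "deny", "deny"],
    "case-5": ["approve", "approve", "approve"],
}

rate = agreement_rate(decisions)
# Cases where reviewers split are the ones that expose a loose rule.
disputed = [case for case, calls in decisions.items() if len(set(calls)) > 1]
print(f"unanimous on {rate:.0%} of cases; revisit: {disputed}")
```

The disputed cases are the useful output. Each one points at an exception or a missing sentence in the written rule.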

Clear rules save more time than a rushed bot ever will.

Leave moving product areas alone for now

Some product areas change too fast to automate well. If the team still edits the flow every sprint, AI will learn yesterday's version and push the wrong action today.

This happens most often in onboarding, pricing, account setup, and new feature flows. The team changes button text, moves fields, adds a review step, then removes it a week later. If you automate around that too early, you lock in guesses instead of a real process.

A screen redesign is a clear warning sign. If product and design still argue about what belongs on the page, do not build prompts, agents, or rules around that screen yet. You will spend more time fixing the automation than the automation saves.

New features need the same caution. Open questions are normal at the start, but they make poor inputs for automation. If the team still asks, "Who approves this?" or "Should users see this before payment?" wait. Human judgment is cheaper than rebuilding brittle workflows every few days.

You can usually spot unstable work quickly. Hold off if the flow changed more than once in the last two weeks, support gives different answers to the same user question, design files and the live product do not match, or product decisions still depend on one person's memory.

User feedback should settle the shape of the work first. A few weeks of real usage show where people get stuck, what they ignore, and what they misunderstand. That is better than building automation around internal guesses.

A startup might launch a new billing page and feel pressure to automate upgrade help right away. It sounds efficient, but it often backfires. If users still ask where invoices live, whether plans prorate, and why seat counts changed, the team is still learning the flow. Fix the product, watch the questions repeat, then automate the parts that stop moving.

Stable work is boring in a good way. That is usually where automation starts paying off.

Watch for weak data paths

Bad data makes AI output look more reliable than it is. The model does not know that a field is missing, mislabeled, or copied into the wrong box unless you build checks around it. If your team still asks, "Why did this record end up here?" pause automation.

The trouble often starts in tool handoffs. Someone copies a lead from a form into a CRM, pastes the company name into a spreadsheet, then updates a support tool by hand. That looks harmless until one typo or skipped field changes the next step. AI will move faster than a person, but it will spread the same mistake across more places.

Duplicate records create the same kind of mess. If one customer appears three times with small differences in name, email, or account status, an automated workflow may send the wrong message, open a second ticket, or mark the wrong deal as closed. Fix that before you add AI. Cleanup work is dull, but it saves far more time than debugging bad actions later.

Start with the places where data changes hands or gets reused. Look for fields that arrive blank, in the wrong format, or under the wrong label. Look for staff copying and pasting between tools, duplicate records for the same customer, and values that trigger actions in billing, support, or product access.
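Two of those checks, blank fields and duplicate customers, are easy to sketch. The field names, the sample records, and the choice to match duplicates on a trimmed, lowercased email are all assumptions for illustration. Real records usually need fuzzier matching.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("email", "plan", "status")  # hypothetical field names


def blank_fields(record):
    """Return the required fields that are missing or empty in one record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def duplicate_groups(records):
    """Group record ids that share an email after trimming and lowercasing."""
    by_email = defaultdict(list)
    for record in records:
        email = str(record.get("email") or "").strip().lower()
        if email:
            by_email[email].append(record["id"])
    return {email: ids for email, ids in by_email.items() if len(ids) > 1}


# Made-up records standing in for a CRM export.
records = [
    {"id": 1, "email": "ana@example.com", "plan": "pro", "status": "active"},
    {"id": 2, "email": " Ana@Example.com", "plan": "", "status": "active"},
    {"id": 3, "email": "bo@example.com", "plan": "free", "status": None},
]

for record in records:
    missing = blank_fields(record)
    if missing:
        print(f"record {record['id']} is missing: {missing}")
print("possible duplicates:", duplicate_groups(records))
```

Running a check like this on a recent export, before any AI touches the data, shows how much cleanup is waiting.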

One bad value can travel a long way. A wrong plan type in the CRM can create the wrong invoice, limit product access, and trigger a confused support reply. That chain reaction is easy to miss because each step looks small on its own.

If the path under the task is shaky, the model is not the real problem. The path is.

Oleg Sotnikov often pushes teams to fix architecture and process problems before adding more automation. He is right to do that. Clean inputs, clear field names, and simple checks beat clever prompts every time. Once the data path is stable, AI can help without turning small mistakes into expensive ones.

Keep high-stakes calls with people

If one bad answer can lose money, trigger a dispute, or lock out the wrong user, keep a person in the loop. AI can draft, sort, and summarize. It should not make the final call when the cost of a mistake is high.

Pricing is a good example. A model may suggest discounts, rewrite plan details, or reply to a prospect with the wrong promise. That can cut revenue for months, not minutes. Let AI prepare options, but have a person approve any pricing change that affects margin, contract terms, or renewal dates.

The same goes for angry customer messages. When someone writes after an outage, a billing error, or a failed delivery, tone matters as much as facts. An automatic reply can sound cold or miss the point and turn a fixable problem into a public complaint. Let AI draft the response, then let support staff edit and send it.

Some actions should stay manual even at low volume: larger refunds, contract edits, account access changes, user bans or reinstatements, and messages that mention legal claims, threats, or compliance issues.

A practical rule works well: if the task changes money, rights, or trust, require approval. That also covers unusual password reset cases, ownership transfers, and requests to delete records. Speed matters, but not more than judgment.
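As a rough sketch, the money, rights, or trust rule can sit in front of any action the AI proposes. The `ProposedAction` type, its flags, and the action names below are hypothetical. How each flag gets set is the hard part, and a person should decide that per action type.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    kind: str                      # hypothetical label, e.g. "refund"
    changes_money: bool = False    # refunds, discounts, pricing
    changes_rights: bool = False   # access, ownership, deletion
    changes_trust: bool = False    # legal, compliance, upset customers


def needs_approval(action):
    """Apply the rule: anything touching money, rights, or trust needs a person."""
    return action.changes_money or action.changes_rights or action.changes_trust


def route(action):
    """Send high-stakes actions to a human queue and let the rest run."""
    return "human_review" if needs_approval(action) else "auto"


actions = [
    ProposedAction("refund", changes_money=True),
    ProposedAction("ownership_transfer", changes_rights=True),
    ProposedAction("summarize_ticket"),
]
for action in actions:
    print(action.kind, "->", route(action))
```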

Picture a small SaaS company with one support inbox. A customer says they were charged twice, wants a refund, and threatens to post screenshots online. AI can pull the invoice, summarize past tickets, and draft a reply in seconds. A person should still decide the refund, the wording, and any goodwill offer. That extra review might take three minutes. It can save the customer relationship and prevent a much bigger mess.

How to test a task before automation

A task is ready for automation only when people can describe it the same way and get close to the same result. If the team cannot do that, the task is still too messy.

Start with one page. Write down the input, the action, and the expected output. Define what arrives, in what format, and from whom. Then describe the exact steps a person takes and what good work looks like at the end.

Keep it plain. If the task description turns into a long debate, stop there. AI will not fix a rule that people cannot explain.

Next, ask three team members to explain the rule on their own. Do this separately, not in a group call. If one person says "approve if the customer looks serious" and another says "approve if the budget is clear," you do not have a rule. You have judgment, habits, and guesswork.

Then test the work on ten recent real cases by hand. Ten is enough to expose the mess without turning this into a big project. Use actual examples from the last few weeks, not perfect samples picked after the fact.

As you go, mark every exception. A missing field, a weird customer note, a broken handoff between tools, or a case that needs manager approval all matter. If four out of ten cases need special handling, the task is not stable yet.
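A tally of the marked exceptions makes the result hard to argue with. The sketch below uses invented case notes and treats four in ten as a cutoff, which is one reading of the rule of thumb above, not a fixed standard.

```python
UNSTABLE_AT = 0.4  # assumed cutoff: four out of ten cases need special handling


def exception_share(cases):
    """Share of cases that have at least one exception noted."""
    if not cases:
        raise ValueError("need at least one case")
    flagged = sum(1 for notes in cases.values() if notes)
    return flagged / len(cases)


# Hypothetical notes from ten recent cases; an empty list means a clean run.
cases = {
    "case-01": [],
    "case-02": ["missing field"],
    "case-03": [],
    "case-04": ["needs manager approval"],
    "case-05": [],
    "case-06": [],
    "case-07": ["broken handoff between tools"],
    "case-08": [],
    "case-09": ["weird customer note", "missing field"],
    "case-10": [],
}

share = exception_share(cases)
verdict = "not stable yet" if share >= UNSTABLE_AT else "worth a narrow pilot"
print(f"{share:.0%} of cases needed special handling: {verdict}")
```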

One pattern works well in most teams: automate the boring middle first. Leave the first judgment and the final sign-off with people for now. Let AI clean text, pull fields from forms, sort requests, draft summaries, or move data between systems. Those parts are repetitive, and when they fail, the damage is usually smaller.

A good test feels a little boring. That is the point. If a task still depends on memory, side chats, or "you just know," keep it manual until the rule gets tighter.

A simple startup example

A five-person SaaS team decides to use AI for trial emails. It sounds simple. New users sign up, wait a few days, get a reminder, and then see an upgrade offer.

But the offer changes all the time. In January, the team pushes annual plans. In February, it tests a lower entry price. In March, it adds a limited discount for users who invite teammates. If the team lets AI send messages on its own, the copy starts to drift. Some users get last month's price. Others see a discount that no longer exists.

The data path makes it worse. Product usage shows up fast, but billing status arrives late from another system. For a few hours, the app says one thing and the billing tool says another. That gap is enough to send the wrong email to the wrong person. A paid user might get a trial reminder. A low-usage user might get pushed to upgrade too early.

So the team does not start with auto-send. It starts with drafts. AI writes the first version of each email, including the subject line and offer text. Then a person checks the price, the segment, and the timing before anything goes out.

That sounds less impressive, but it is the better move. The team still saves time. Writing a draft takes minutes instead of half an hour, and someone can catch obvious mistakes before they hit customers.

After a month or two, pricing stops changing every week. The team also fixes the late billing sync and adds a simple check so emails go out only when both systems agree on account status. Only then does it automate the next step, such as sending reminder emails for one stable user segment.
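The "both systems agree" check might look something like this. The user IDs, status values, and dictionary lookups are stand-ins for whatever the app and billing tool actually expose. The point is only that a missing or mismatched status blocks the send.

```python
def statuses_agree(user_id, app_status, billing_status):
    """True only when both systems know the user and report the same status."""
    from_app = app_status.get(user_id)
    from_billing = billing_status.get(user_id)
    return from_app is not None and from_app == from_billing


# Hypothetical snapshots taken while the billing sync is still lagging.
app_status = {"u1": "trial", "u2": "trial", "u3": "trial"}
billing_status = {"u1": "trial", "u2": "paid"}  # u3 has not synced yet

# Only users both systems call "trial" get a reminder; the rest wait.
send_to = [
    user_id
    for user_id, status in app_status.items()
    if status == "trial" and statuses_agree(user_id, app_status, billing_status)
]
print("send trial reminder to:", send_to)
```

Here `u2` is skipped because billing already shows a paid plan, and `u3` is skipped because billing has no record yet. Those are the two cases that would have produced a wrong email.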

That is the right sequence. Do not automate the final action when the rules keep moving and the data arrives out of order. Let AI help with the work first. Let people keep control until the offer and the data settle.

Mistakes teams make early

Teams usually make the same mistake first: they automate a messy process and hope the mess disappears. It rarely does. AI just helps the broken process fail faster and in more places.

Support triage is a common example. If the team already tags tickets inconsistently, writes unclear escalation rules, and changes priorities every week, an AI layer will not fix that. It will spread the confusion. Now nobody knows whether the bad outcome came from the model or the process.

Another early mistake is trusting one prompt because it looked good in a demo. A prompt that works on ten clean examples can still miss edge cases, quiet failures, and expensive errors in real work. Teams need simple checks: how often does it miss, what kind of miss hurts most, and who notices first.

Ownership gets ignored too often. Someone needs to handle exceptions, stop bad runs, and roll work back when output goes sideways. Without a named owner, small failures sit in the system for days because each person assumes someone else is watching.

Tool sprawl shows up fast in AI-first teams. A company connects chat, email, CRM, docs, tickets, and billing on day one because the setup looks easy. Then one field changes, one webhook fails, or one API rate limit is hit, and the whole chain starts dropping context. Weak data flows do not become safer because more tools sit in the middle.

The last mistake is removing human review too early. Teams get a few good results, feel pressure to save time, and cut the review step before they earn that trust. That is usually when the worst cases arrive: unusual refunds, tricky sales promises, or customer records with missing data.

A better pace is boring on purpose. Clean the process first. Test one narrow task. Measure misses for a few weeks. Keep a person in the loop until the errors are small, predictable, and cheap to fix.

Checks before you automate

Most bad automation starts one step too early. A team sees a task happen a few times, adds AI, and then spends weeks cleaning up edge cases, bad outputs, and confused customers.

A short manual check saves a lot of repair work later. Keep the task human for now if any of these points fail: one person cannot explain the rule in one paragraph, the process has not stayed mostly the same for at least a month, you cannot trace every input back to a clear source, nobody can stop the workflow in seconds, or one wrong action would be hard to reverse.
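Those five points work as a plain checklist that someone fills in honestly before any build starts. The check names and answers below are illustrative, not a standard.

```python
# Hypothetical answers for one candidate task; a person fills these in.
READINESS_CHECKS = {
    "rule_fits_in_one_paragraph": True,
    "process_stable_for_a_month": False,
    "every_input_has_a_clear_source": True,
    "workflow_can_be_stopped_in_seconds": True,
    "wrong_action_is_easy_to_reverse": False,
}


def failed_checks(checks):
    """Return the names of the checks that did not pass, in order."""
    return [name for name, passed in checks.items() if not passed]


failed = failed_checks(READINESS_CHECKS)
if failed:
    print("keep it manual for now; failing:", ", ".join(failed))
else:
    print("ready for a narrow, reviewed pilot")
```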

The first check is where things often break. If two people describe the same task in different ways, the rule is still fuzzy. AI will not fix that. It will copy the confusion and do it faster.

Stability matters just as much. If the team changed the flow three times this month, wait. Automation works best on work that has settled down. A moving target turns every prompt, script, and guardrail into rework.

Data is the next weak spot. If you cannot say where each input came from, you should not trust the output. A scraped spreadsheet, a copied message, and a half-filled CRM field may look usable, but weak data paths create silent mistakes.

You also need a fast stop button. If an AI agent can send emails, update records, or approve actions, someone must be able to pause it right away.

Keep the blast radius small. If one wrong step can create a refund mess, a legal problem, or a broken customer account, do the task by hand until you add tighter controls.

A simple test sums it up: can you explain the work clearly, trust the inputs, stop it fast, and undo the damage? If the answer is no, you are not saving time yet. You are just moving risk around.

What to do next

Start small. Pick one task with stable inputs, a clear expected result, and little downside if it goes wrong. The safest place to begin is usually the dullest work in the company.

Do the task by hand first, even if you plan to automate it next week. Write down each step in plain language, then track the error rate, edge cases, and how often a person needs to step in. If the manual version keeps changing, the automated version will drift even faster.

A simple first pass is enough: choose one low-risk task that repeats often, document the manual steps and expected output, measure mistakes for a week or two, and add human review before anything goes live.

Logs matter more than most teams think. You need a record of what the system saw, what it produced, and who approved it. You also need a clear stop button so someone can shut the workflow off in seconds when the output starts to look wrong.
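A toy version shows how little is needed to get both. Everything here is hypothetical: the class, the field names, and the in-memory list. A real setup would keep the log and the pause flag somewhere shared, such as a database, so that anyone on the team can flip it.

```python
from datetime import datetime, timezone


class ReviewedWorkflow:
    """Toy workflow with an audit log and a pause flag."""

    def __init__(self):
        self.paused = False
        self.log = []

    def pause(self):
        """The stop button: later calls to run() refuse to do any work."""
        self.paused = True

    def run(self, seen, produce, approved_by):
        """Produce output for one input and record who approved it."""
        if self.paused:
            raise RuntimeError("workflow is paused")
        produced = produce(seen)
        self.log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "seen": seen,                # what the system saw
            "produced": produced,        # what it produced
            "approved_by": approved_by,  # who approved it
        })
        return produced


workflow = ReviewedWorkflow()
workflow.run(
    "Ticket: charged twice",
    lambda text: f"Draft reply for: {text}",
    approved_by="dana",
)
workflow.pause()  # from here on, run() raises instead of acting
```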

Do not expand the scope just because the first demo looks good. Let the task stay boring for a while. If it runs the same way over and over, with few surprises and low correction time, that is a much better signal than one flashy week.

This is where teams save money instead of burning it. They automate one narrow job, watch it closely, and only then move to the next one.

If you want a second opinion before automating a process, Oleg Sotnikov at oleg.is does this kind of review as part of his Fractional CTO work. A short workflow review can catch the part that still needs a human before it turns into a bigger mess.

Frequently Asked Questions

What is the first sign a task is not ready for AI automation?

Do not automate a task when people still stop and ask each other what the rule means. If the team cannot explain the rule in plain language and reach the same answer on real cases, AI will fill the gaps with guesses.

Should I automate a process if managers handle it differently?

No. If two managers make different calls on the same input, the problem sits in the rule, not in the tool. Write the rule, test it on real examples, and keep the task manual until people agree.

Are fast-changing product areas bad candidates for automation?

Usually yes. Onboarding, pricing, account setup, and new feature flows often change every week, so automation quickly ends up following outdated rules. Wait until the flow stays mostly the same for a while and support answers stop changing.

Why does messy data cause so many automation problems?

Because AI trusts the inputs you give it. One wrong field, stale note, or duplicate record can trigger the wrong reply, ticket, invoice, or access change across several tools.

Which decisions should stay with people?

Keep people on any decision that changes money, rights, or trust. Refunds, contract edits, account access, bans, ownership changes, and legal or compliance messages need human judgment even if AI drafts the first pass.

How can I test if a task is ready for automation?

Start with one page that defines the input, the steps, and the expected output. Then ask three people to use that rule on ten recent cases; if they disagree often or hit many exceptions, wait and tighten the process first.

Is starting with AI drafts better than full automation?

Yes, that is often the safer start. Let AI write the draft, pull data, or summarize context, then let a person check the final action until the rules and data stop moving.

What mistakes do teams make when they rush AI automation?

Teams often automate a messy process and hope the mess goes away. They also trust a demo too soon, connect too many tools at once, and skip naming one person who owns failures and fixes.

Do I really need logs and a stop button before I automate anything?

Yes. Someone needs to pause the workflow in seconds when it starts doing the wrong thing. Logs also matter because you need to see what the system saw, what it produced, and who approved it.

What should I automate first instead?

Start with repetitive, low-risk work where the correct result stays clear. Good first candidates include drafting emails, cleaning text, sorting requests, pulling fields from forms, and moving simple data between systems with review in place.