Apr 18, 2026

QA plan before shrinking engineering: what to change first

A QA plan that survives a smaller engineering team starts with clear test ownership, human review points, and production feedback that catches what automation misses.

Why the old QA plan stops working

Most old QA plans assume three things: spare reviewer time, clear ownership, and a test suite people still trust. When the engineering team gets smaller, reviewer time disappears first. Code still gets merged, but fewer people pause to question assumptions, replay odd user paths, or challenge a test that passes without proving much.

That is when trust starts to slip. The old suite still runs on every commit, yet people quietly stop believing parts of it. Some tests protect flows the product no longer uses. Others fail for noisy reasons, then pass on rerun. A wall of green checks can look reassuring while real gaps keep growing underneath.

If the team shrinks and the QA plan stays the same, the rituals usually stay in place while attention drops. That is worse than having fewer tests and knowing exactly what they cover.

Ownership gets blurry fast, too. On a larger team, one person builds the feature, another reviews it, and someone else thinks about release risk. On a smaller team, one engineer often does all three while also handling support, bug fixes, and delivery. Bugs slip through when each person assumes someone else checked edge cases, data cleanup, or rollback steps.

AI changes the pace, but it does not replace judgment. It can draft many tests in minutes, especially around happy paths and obvious edge cases. It cannot decide which failure would hurt the business most, which workflow breaks customer trust, or which old test is still worth maintaining. Teams often mistake speed for coverage, then learn too late that they automated the easy parts.

That is why the old plan fails. It was built for a different team shape, a different review habit, and a different sense of ownership. If you cut the team before you rewrite QA, you keep the ceremony and lose the safety.

Sort work by risk, not by tool

Many teams organize QA into neat boxes: manual testing, automated testing, and monitoring. It looks tidy, but it hides the only question that matters: what hurts customers or blocks revenue when it fails?

Put risk first. If login breaks, people leave. If an internal report is wrong for an hour, you can usually live with it. Those checks should not get the same attention just because they sit in the same suite.

A simple exercise helps. Put every test, alert, and review step into one sheet. For each item, note what flow it protects, whether it blocks a release or just gives a signal, who designs it, who reviews it, whether AI can draft it, and what production signal would tell you it missed something.
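
The sheet is easier to keep honest when every row has the same shape. Here is a minimal sketch in Python, assuming you export the sheet into code or a script. The field names and the two sample rows are hypothetical, not a required schema.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    flow: str              # user flow the check protects
    blocks_release: bool   # True = release gate, False = signal only
    designer: str          # who designs it
    reviewer: str          # who reviews it ("" means nobody)
    ai_can_draft: bool
    miss_signal: str       # production signal that would reveal a miss ("" means none)


def needs_attention(check: Check) -> list[str]:
    """Return the reasons a row in the sheet is incomplete."""
    problems = []
    if not check.reviewer:
        problems.append("no reviewer")
    if not check.miss_signal:
        problems.append("no production signal")
    return problems


# Hypothetical rows for illustration only.
sheet = [
    Check("login smoke test", "login", True, "dana", "lee", True, "login error rate"),
    Check("weekly report diff", "internal report", False, "lee", "", True, ""),
]

for check in sheet:
    print(check.name, needs_attention(check))
```

Rows that come back with problems are the first candidates to fix, fold into another check, or remove.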

This gets honest very quickly. Most teams find checks nobody reads, reports that never change a release decision, and alerts that fire so often people ignore them. If nobody acts on a check, remove it or fold it into something smaller.

Customer-facing flows go to the top. For most SaaS products, that means signup, login, billing, the main product action, and anything tied to data loss or security. Internal tools, rare edge cases, and cosmetic checks can sit lower unless they have a real business cost.

After that, split the work into four lanes. AI can draft repeatable tests, fixtures, and regression cases when the requirements are clear. People still need to design coverage for risky flows, weird user behavior, and the tradeoff between speed and safety. Human review should stay on release blockers and on tests that look easy to write but are easy to get wrong. Production signals close the loop through error reports, failed jobs, support tickets, and usage drops.

Once teams work this way, they stop asking, "Can AI handle QA?" They start asking better questions: what must not fail, what still needs a person, and what production will catch before customers feel the damage.

What AI can draft well

If you are rebuilding QA before shrinking the team, start with the work that already has clear inputs and clear expected results. AI is good at turning rules, examples, and past fixes into a useful first draft. That saves time on repetitive setup, even when a person still reviews the result.

It works best when the rule is concrete and repeatable. Unit tests for pricing thresholds, permission checks, and form validation are a good fit. So are regression tests based on bugs you already fixed, especially when the bug report shows the trigger, the wrong result, and the expected result. API test cases also work well when the contract rarely changes. The same goes for test data builders, mocks, and fixtures: the plumbing work that takes longer to type than to understand.

The difference between drafting and designing matters here. AI can write twenty tests for a discount rule in a few minutes. A person still needs to ask whether those tests cover the weird cases. But the slow part is already done.
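
To make the split concrete, here is a small sketch of what a drafted test table might look like. The discount rule, its thresholds, and the function names are all invented for illustration and do not come from any real product.

```python
def discount_percent(order_total: float) -> int:
    """Hypothetical rule: 10% off from 100, 20% off from 500."""
    if order_total < 0:
        raise ValueError("order total cannot be negative")
    if order_total >= 500:
        return 20
    if order_total >= 100:
        return 10
    return 0


# The kind of cases a draft usually covers: values on each side of a threshold.
DRAFTED_CASES = [
    (0, 0),
    (99.99, 0),
    (100, 10),
    (499.99, 10),
    (500, 20),
]


def run_drafted_cases() -> int:
    """Check every drafted case and return how many were checked."""
    for total, expected in DRAFTED_CASES:
        assert discount_percent(total) == expected, (total, expected)
    return len(DRAFTED_CASES)


print(run_drafted_cases())
```

The table covers the thresholds, which is the part that is tedious to type. Whether a negative total should raise an error at all, as this sketch assumes, is the kind of question a person still has to answer.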

History is another good input. Give AI a ticket, the pull request, and the final fix, and it can produce a solid first pass of test notes. Those notes often point to the affected area, the likely failure path, and a short list of tests to add. That is not finished QA thinking, but it is a strong start.

API testing is especially efficient when the contract is stable. If an endpoint takes a user ID, a date range, and a sort order, AI can draft happy-path and error-path cases quickly. Many teams lose hours on this kind of setup.
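
As a rough sketch of that setup, assume a hypothetical report endpoint that validates its three parameters before doing any work. The validator, the allowed sort values, and the cases below are assumptions made for the example.

```python
from datetime import date

ALLOWED_SORT = {"asc", "desc"}  # assumed contract, not from a real API


def validate_query(user_id: int, start: date, end: date, sort: str) -> list[str]:
    """Return validation errors for a hypothetical report endpoint."""
    errors = []
    if user_id <= 0:
        errors.append("user_id must be positive")
    if start > end:
        errors.append("start must not be after end")
    if sort not in ALLOWED_SORT:
        errors.append("sort must be 'asc' or 'desc'")
    return errors


# (label, arguments, expected number of errors)
CASES = [
    ("happy path", (42, date(2026, 1, 1), date(2026, 1, 31), "asc"), 0),
    ("single-day range", (42, date(2026, 1, 1), date(2026, 1, 1), "desc"), 0),
    ("reversed range", (42, date(2026, 2, 1), date(2026, 1, 1), "asc"), 1),
    ("unknown sort", (42, date(2026, 1, 1), date(2026, 1, 31), "newest"), 1),
    ("bad id and bad sort", (0, date(2026, 1, 1), date(2026, 1, 31), ""), 2),
]

for label, args, expected_errors in CASES:
    assert len(validate_query(*args)) == expected_errors, label
```

A table like this is cheap to draft and cheap to review, which is why stable contracts are a good place to start.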

One warning matters more than the rest: AI drafts well when the rule is visible. If the real behavior lives in unwritten habits, tribal knowledge, or a messy chain of side effects, the draft will look neat and still miss the point. Use AI where the system speaks clearly on its own.

What people still need to design

AI is good at producing many plausible tests. It is bad at deciding which failures would damage the business, break trust, or create a messy rollback at 2 a.m. People still need to do that work.

Start with the areas where one mistake has a real cost. Payments, refunds, access control, account deletion, and any action that can erase or expose data belong in that group. A model can draft test cases for these flows, but a person has to define the outcomes that must never happen.

Specs also miss how real users behave. People click back, reopen old tabs, retry after a lag, paste stale data, or finish steps in the wrong order. Those paths rarely appear in clean product docs. Humans catch them because they know how customers actually move through the product, and they know which weird path turns into a support ticket or a lost sale.

Cross-system failures need the same kind of judgment. If your app saves a record, then calls billing, then sends email, where does the source of truth live when one step times out? What should the user see if a background job finishes late? Should the system retry, block the action, or ask for manual review? AI can suggest cases, but people need to define the right behavior.

Smaller teams also need short, strict release gates for risky changes:

  • Require manual review for money movement and permission changes.
  • Block release if rollback steps are unclear or untested.
  • Hold the deploy if monitoring cannot show failure within minutes.
  • Pause launch if support cannot explain the new edge cases.
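
The four gates above can be written down as a single check that returns the reasons a release should wait. This is only a sketch: the class and field names are hypothetical, and it assumes the function is called only for changes the team has already judged risky.

```python
from dataclasses import dataclass


@dataclass
class RiskyChange:
    touches_money_or_permissions: bool
    manually_reviewed: bool
    rollback_tested: bool
    failure_visible_in_minutes: bool  # monitoring can show failure within minutes
    support_briefed: bool             # support can explain the new edge cases


def release_blockers(change: RiskyChange) -> list[str]:
    """Return every gate the change currently fails."""
    blockers = []
    if change.touches_money_or_permissions and not change.manually_reviewed:
        blockers.append("manual review required for money movement or permission changes")
    if not change.rollback_tested:
        blockers.append("rollback steps unclear or untested")
    if not change.failure_visible_in_minutes:
        blockers.append("monitoring cannot show failure within minutes")
    if not change.support_briefed:
        blockers.append("support cannot explain the new edge cases")
    return blockers


change = RiskyChange(
    touches_money_or_permissions=True,
    manually_reviewed=False,
    rollback_tested=True,
    failure_visible_in_minutes=True,
    support_briefed=True,
)
print(release_blockers(change))
```

An empty list means the change may ship. Anything else is a named reason to stop, which is easier to argue about than a vague feeling that the release is not ready.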

Stop rules matter just as much. Some features should not ship if the team cannot prove basic safety. Missing audit logs, vague timeout handling, or actions that can leave data half-written are good reasons to stop.

A small SaaS team can accept a minor UI bug and fix it tomorrow. The same team should not accept a billing bug that charges twice or a role bug that exposes one customer's data to another. People set that line. AI does not.

A simple way to rebuild the workflow

Do not try to redo the whole system at once. Start with one product area people use every day, such as billing, signup, or data import. If you rebuild everything in one pass, you create noise and learn almost nothing.

Start with evidence, not guesses. Pull the last few months of incidents, support tickets, flaky tests, and rollback notes into one small review. Focus on the places where the product already showed stress. Those are the places a smaller team will feel first.

Use real bugs as test fuel

Ask AI to draft tests from actual failures and the current code. This works much better than asking for broad, generic coverage. A support issue like "CSV import drops rows with empty cells" gives the model something concrete. The drafted tests are usually far closer to what you need.

Do not ship those drafts untouched. Someone still needs to cut weak cases, merge duplicates, and add the checks AI misses. That usually means edge cases around business rules, silent data damage, and cases where the code looks fine but the user outcome is still wrong.

A small SaaS team might feed ten recent import bugs into a model and get thirty proposed tests back in an hour. After review, maybe twelve survive. That is still a good trade. The team saved time, and a person decided what deserved trust.

Prove the flow before you expand it

Run the new process for two sprints before you spread it wider. Track a few plain numbers: how many drafted tests you kept, how many bugs escaped anyway, and whether review time stayed reasonable. If the process adds work without catching more problems, change it early.
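
Those numbers fit in a few lines of code. The sketch below uses hypothetical field names and made-up sprint figures, and the rule in `should_rethink` is one possible reading of "adds work without catching more problems".

```python
from dataclasses import dataclass


@dataclass
class SprintStats:
    drafted: int         # tests the AI proposed
    kept: int            # tests that survived human review
    escaped_bugs: int    # bugs that reached users anyway
    review_hours: float  # human time spent reviewing drafts


def keep_rate(stats: SprintStats) -> float:
    """Share of drafted tests that survived review."""
    return stats.kept / stats.drafted if stats.drafted else 0.0


def should_rethink(before: SprintStats, after: SprintStats) -> bool:
    """Flag the process if review time went up but escaped bugs did not go down."""
    return (
        after.review_hours > before.review_hours
        and after.escaped_bugs >= before.escaped_bugs
    )


# Made-up figures for two consecutive sprints.
first = SprintStats(drafted=30, kept=12, escaped_bugs=4, review_hours=3.0)
second = SprintStats(drafted=25, kept=14, escaped_bugs=4, review_hours=5.0)

print(round(keep_rate(first), 2), should_rethink(first, second))
```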

This is also where production closes the loop. New incidents should go back into the same queue so every escaped bug becomes material for the next round of AI-drafted tests and human review. Simple cycles work best on small teams.

Use production signals to fill test gaps

QA should not stop at pre-release testing. Once the team gets smaller, production tells you where the real holes are, often faster than any planning document.

Support tickets are a good starting point. If three customers hit the same confusing bug in a week, that is not just support work. It points to a missing test case, and often to a missing product rule.

Look at failures by feature, not just by service. A database alert may tell you where the error happened, but users feel it inside a feature like signup, billing, or export. If your team already uses Sentry, Grafana, or raw logs, group issues around user flows so patterns show up sooner.
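
Grouping by feature can start as a small script over exported error events. The sketch below assumes each event carries a request path and that a hand-written prefix table maps paths to features. Both are assumptions for illustration and not the format of any particular monitoring tool.

```python
from collections import Counter

# Hypothetical mapping from URL prefix to user-facing feature.
FEATURE_BY_PREFIX = {
    "/signup": "signup",
    "/billing": "billing",
    "/export": "export",
}


def feature_for(path: str) -> str:
    """Map a request path to the feature a user would name."""
    for prefix, feature in FEATURE_BY_PREFIX.items():
        if path.startswith(prefix):
            return feature
    return "unmapped"


def count_by_feature(events: list[dict[str, str]]) -> Counter:
    """Count error events per feature instead of per service."""
    return Counter(feature_for(event["path"]) for event in events)


# Made-up events: two different services, one affected feature.
events = [
    {"service": "db", "path": "/billing/invoice"},
    {"service": "api", "path": "/billing/card"},
    {"service": "db", "path": "/export/csv"},
    {"service": "worker", "path": "/health"},
]

print(count_by_feature(events).most_common())
```

In this made-up data, two alerts from different services turn out to be one billing problem, which is the pattern a per-service view hides.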

Small teams also miss near misses because nothing fully breaks. A retry that succeeds on the second attempt, a form that fails only for one input shape, or a page that times out for a few seconds still tells you something. Those moments often point to weak checks that a full outage report will never catch.

A simple loop works well:

  • Collect repeated support issues every week.
  • Tag errors by feature and user action.
  • Save failing inputs and odd edge cases.
  • Replay them in a safe test environment.
  • Add the case to your test prompt or template.

Saved inputs matter more than many teams think. If a CSV import fails on one strange header, keep that file. If checkout breaks on one tax setting, store that payload. Replay the exact input in staging or in a test harness, then turn it into a permanent regression test.
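
A replayed input can be as small as the sketch below. It assumes a CSV import where rows with empty cells must survive. The saved input, the `import_rows` function, and the column names are hypothetical stand-ins for your own importer.

```python
import csv
import io

# Saved verbatim from a (hypothetical) failing import: the second row has an empty cell.
SAVED_INPUT = "name,email,plan\nAda,ada@example.com,pro\nBob,,free\n"


def import_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text, keeping rows even when some cells are empty."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


def test_import_keeps_rows_with_empty_cells() -> None:
    """Regression test built from the exact input that failed."""
    rows = import_rows(SAVED_INPUT)
    assert len(rows) == 2
    assert rows[1] == {"name": "Bob", "email": "", "plan": "free"}


test_import_keeps_rows_with_empty_cells()
```

The test keeps the original input rather than a cleaned-up version, so it fails again if the same bug returns.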

AI can help here too. It can turn bug reports, logs, and failing payloads into draft test cases in minutes. People still need to check the logic, remove noise, and decide what should become a permanent rule.

The loop only works if the team feeds discoveries back into the system. Update prompts, test templates, and review checklists every time production teaches you something new. That habit stops small teams from fixing the same bug twice.

A simple example from a small SaaS team

One small SaaS company cut its engineering group to three people. Each release had to carry less ceremony and more intent. The same engineers now handled releases, support tickets, and bug fixes, so they rewrote the QA plan before reducing the team any further.

They stopped treating every test the same. After each merged change, AI drafted API tests and basic form checks. That worked well for repeatable paths like signup fields, profile edits, and simple validation rules. It saved time and caught the sort of breakage that often slips into fast edits.

They did not trust AI with everything. One engineer owned checks around billing and account access. She wrote the cases that could hurt customers most: failed renewals, wrong plan limits, locked accounts, and permission mix-ups. Those flows needed judgment, not just coverage.

A week later, production logs exposed a gap. Invite emails failed in a narrow edge case: a user got invited twice, clicked the older email, and landed in a broken state. None of the drafted tests caught it because the path depended on timing and real user behavior.

The team did two small things instead of rebuilding the whole suite:

  • They added one replay test that copied the broken invite sequence.
  • They added one release check for duplicate invites before each deploy.

That was enough. The replay test kept the bug fixed, and the release check gave the team a quick manual stop point.

That is often what small teams need: let AI handle routine checks, let people design the risky ones, and let production show you where the blind spots still are.

Mistakes that leave blind spots

Quality usually does not collapse all at once. It erodes in the parts of the system people stopped watching.

One common mistake starts early: asking AI to write tests before anyone names the risky paths. AI can draft plenty of useful checks, but it cannot decide which failure would hurt the business most. If nobody marks billing, permissions, data export, migrations, or login as high risk, the suite fills with easy cases and misses the paths that can do real damage.

Flaky tests create another blind spot. They make the dashboard look busy, then teach people to ignore failures. A test that fails at random is not helping. It trains the team to rerun jobs until they turn green and ship changes without trusting the result.
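
One simple way to spot candidates is to look for tests that both failed and passed on the same commit. The sketch below assumes you can export run history as (test name, commit, passed) records. The sample data is made up, and this heuristic only catches flakiness that a rerun exposed.

```python
from collections import defaultdict

# (test name, commit, passed) records; hypothetical sample data.
RUNS = [
    ("test_login", "abc123", True),
    ("test_export", "abc123", False),
    ("test_export", "abc123", True),    # passed on rerun of the same commit
    ("test_billing", "def456", False),
    ("test_billing", "def456", False),  # fails consistently: more likely a real bug
]


def find_flaky(runs: list[tuple[str, str, bool]]) -> list[str]:
    """Return tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for name, commit, passed in runs:
        outcomes[(name, commit)].add(passed)
    flaky = {name for (name, _), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


print(find_flaky(RUNS))
```

A test that shows up here repeatedly is a candidate to fix or delete, not to rerun.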

Small teams also get into trouble when one person writes a risky change, reviews it, and approves the release. That might save twenty minutes today and cost two days later. For billing logic, access control, data deletion, or infrastructure changes, a second human check is still worth the pause.

Production warnings get dismissed too often because they did not become full incidents. That is a mistake. Slow queries, retry spikes, odd error clusters, and a handful of similar support complaints usually point to a gap in coverage. If the team waits for a public outage, the lesson gets much more expensive.

The wrong metric makes all of this worse. Test count is easy to brag about, but it says very little. A smaller set of stable tests that catches bad releases is better than hundreds of checks nobody trusts.

Watch the numbers that matter: how many defects reached production, how often flaky tests fail without a real bug, whether the top-risk areas have direct coverage, and how fast the team notices and fixes warning signs. If those numbers improve, the QA plan is getting sharper. If only the total test count grows, the team is probably adding noise.

Quick checks before you reduce the team

Cutting headcount before you answer a few plain questions is how small defects turn into customer outages. A smaller engineering team can work well, but only if each release check has a clear owner and a clear reason to exist.

Run a short audit on the last month of real work, not on theory. Look at actual release steps, the last few incidents, and the checks people skip when deadlines get tight. That shows what the process really depends on.

  • Put one name next to every release check. If nobody owns it, it will be skipped or everyone will assume someone else handled it.
  • Write down the user flows that would hurt most if they broke. For most SaaS products, that means signup, login, payment, data save, or export.
  • Review each recent incident and force a follow-up: add a test, add a monitor, or write down why neither would have helped. If incidents do not lead to change, the same bugs come back.
  • Make the handoff visible for new engineers too. They should be able to see, in one place, which AI-drafted tests are ready for review and which human checks still need judgment.
  • Delete checks that never change a release decision. A manual pass that has found nothing for a year might still matter, but many old checks survive by habit alone.

This audit usually exposes the same problem: teams keep too many low-value checks and too few checks around risky flows. AI can cover repeatable paths quickly, but people still need to decide what failure matters, which edge cases deserve attention, and what production signal should wake someone up at 2 a.m.

Use one plain standard for every check: owner, trigger, action. Who looks at it, when does it run, and what happens if it fails? If the team cannot answer that in one sentence, the process is still too vague. Fix that before you reduce the team.

What to do next

Pick one user flow that would hurt most if it broke. For many teams, that is signup, billing, login, or the action customers pay for. Start there this week.

Write that flow down on one page. Note the happy path, the common failure points, and the signals you already watch in production. Keep it plain. If a new engineer cannot read it in five minutes, it is too long.

Then make a short policy for AI-drafted tests and human approval. Keep it strict enough to remove guesswork. Let AI draft repetitive tests, fixtures, and edge cases based on a real spec. Ask a person to approve test intent, risk coverage, and any test that can block a release. Require a human to design checks for unclear requirements, money movement, permissions, and data loss. Mark every test with an owner so stale checks do not live forever.
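
A policy this short can also be expressed as a routing rule, which makes disagreements visible early. The sketch below is one possible encoding: the area names, the `ProposedTest` fields, and the returned labels are hypothetical.

```python
from dataclasses import dataclass

# Areas where a person designs the checks (taken from the policy above).
HUMAN_DESIGNED = {"money movement", "permissions", "data loss"}


@dataclass
class ProposedTest:
    area: str
    has_clear_spec: bool
    blocks_release: bool
    owner: str = ""


def route(test: ProposedTest) -> str:
    """Decide who designs and who approves a proposed test."""
    if not test.owner:
        return "reject: no owner"
    if test.area in HUMAN_DESIGNED or not test.has_clear_spec:
        return "human designs"
    if test.blocks_release:
        return "AI drafts, human approves as release blocker"
    return "AI drafts, human approves intent"


print(route(ProposedTest("form validation", True, False, "lee")))
print(route(ProposedTest("money movement", True, True, "dana")))
```

The ownership check comes first on purpose: under this sketch, a test without an owner is rejected before anyone spends time drafting it.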

That policy does not need legal language. Half a page is enough if people can follow it without debate.

After that, review incidents every sprint. Look at bugs that reached users, false alarms that wasted time, and checks that never catch anything. Remove dead checks. Add one test or alert for each real failure that slipped through. Over a month or two, this works much better than dumping hundreds of AI-drafted tests into the suite and hoping coverage improves.

If you want a second opinion before you cut the team, Oleg Sotnikov offers Fractional CTO advisory through oleg.is. His work focuses on AI-first software development, release process design, and lean engineering operations, so this kind of QA reset fits the problems he already helps teams solve.

Frequently Asked Questions

Why should we change the QA plan before shrinking the engineering team?

Rewrite it first because a smaller team loses reviewer time and clear ownership fast. If you keep the old process, the rituals stay but the safety drops, and people stop trusting green test runs.

Where should we start when we rebuild QA?

Start with one flow people use every day and that would hurt the business if it broke, like signup, billing, login, or import. Pull recent incidents, support tickets, flaky tests, and rollback notes for that flow so you fix a real problem instead of guessing.

How do we decide what needs the most QA attention?

Rank work by business damage, not by whether it is manual or automated. Give the most attention to flows that can block revenue, expose data, or break customer trust, and spend less time on checks nobody uses to make release decisions.

What kinds of tests can AI draft well?

AI drafts repeatable work well when the rules are clear. It can save time on unit tests for validation and pricing rules, API cases with stable contracts, and regression tests based on bugs you already fixed.

What still needs human QA design?

People still need to define what must never happen. Billing, refunds, permissions, account deletion, data loss, and messy cross-system failures need human judgment because someone has to decide the right outcome, rollback plan, and release risk.

How should we use real bugs to improve the test suite?

Feed the model real failures, the related ticket, and the final fix. That gives you a much better draft than a broad request for coverage, but a person should still remove weak cases, merge duplicates, and add the business-rule checks the draft missed.

What production signals should we pay attention to?

Watch support tickets, repeated errors, retry spikes, odd payloads, slow paths, and usage drops by feature. Production shows you gaps that pre-release testing missed, especially around timing issues and real user behavior.

What should we do with flaky tests?

Treat flaky tests as noise until you fix or remove them. If a test fails at random, people stop trusting the suite and start rerunning jobs until they get green, which defeats the whole point of the check.

Which release checks should always keep human review?

Keep human review on money movement, permission changes, data deletion, and infrastructure changes. For risky releases, make sure someone can explain rollback steps and that monitoring can show failure within minutes.

How do we know the new QA process is actually working?

Use a few plain numbers: escaped defects, flaky failures without a real bug, coverage on top-risk flows, and how fast the team spots and fixes warnings. If those improve while review time stays reasonable, the process works better than before.