Nov 21, 2024·7 min read

Pairing a staff engineer with an assistant without losing judgment

Pairing a staff engineer with an assistant works when the engineer sets limits, reviews outputs, and owns tradeoffs before release.

Why speed creates new mistakes

Fast output can fool a team. Code that looks polished often feels finished before anyone has proved it works in the real product.

An assistant can draft a handler, migration, or test suite in minutes. That speed changes how people judge quality. Instead of asking whether the work fits the product, they start asking whether it looks done.

That creates a different kind of error. The first draft often becomes the template for the next few tasks. If that draft handles an edge case badly, names a concept the wrong way, or skips a quiet business rule, the mistake spreads through the codebase.

Speed also hides uncertainty. When an assistant returns three clean options, teams often assume one of them must be safe. In reality, all three can share the same blind spot because they came from the same prompt and the same missing context.

Tools also miss rules that live in people's heads. A repo and a ticket rarely explain why one customer has a special exception, why finance closes orders in two steps, or why support needs a field that seems useless to engineers. The code can look neat, pass tests, and still break a real workflow.

A checkout example makes this obvious. A team asks an assistant to update annual plan logic. The draft calculates totals correctly and even adds tests. But it misses the refund rule for mid-cycle upgrades, so the team ships code that looks complete and creates a billing mess a week later.

This is why working with an assistant only helps when a human keeps ownership of judgment. The staff engineer still decides which constraints matter, which tradeoffs the team can accept, and where speed is too risky. The assistant makes options cheaper to produce. It does not make the final call.

Good teams treat fast drafts as proposals, not answers. Their review gets sharper, not longer. They check assumptions early, catch small errors before others copy them, and keep product judgment with the person who understands the whole system.

What the human should own

When a staff engineer works with an assistant, the human still owns the hard calls. Speed is useful, but speed without a clear owner turns into cleanup later.

Start by writing the goal in plain language before any draft begins. Skip prompts like "improve the service." Say what should change, who it helps, and what must stay the same. "Cut signup drop-off without changing billing logic" gives the assistant a clear lane and gives the reviewer something concrete to check.

The human also sets the limits. Budget, risk, deadline, and team capacity shape the right answer more than code style does. If the team has two days, no budget for new services, and no room for downtime, the assistant should not drift toward a rewrite. Good engineering often means choosing the smaller move on purpose.

Product rules, security rules, and data rules stay with the human owner. The assistant can suggest changes, but it should not decide how customer data gets stored, what actions need approval, or whether a shortcut is safe enough. Those choices carry business cost, legal risk, and trust risk.

A simple split helps. Let the assistant handle bounded work such as drafting tests for current behavior, proposing a small refactor inside one module, writing release notes, or summarizing logs and errors. It should stop and ask before touching authentication, payment logic, permissions, data retention, public APIs, or anything that could raise costs later.
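
One way to make that split checkable is a small path check that runs before an assistant's change goes to review. The sketch below is a minimal illustration, not a standard tool: the directory prefixes in `SENSITIVE_PREFIXES` and the function name `needs_human_approval` are assumptions and would need to match your own repository layout.

```python
from typing import Iterable

# Hypothetical path prefixes for the areas that need a human stop.
# These are illustrative assumptions; adjust them to your repository.
SENSITIVE_PREFIXES = (
    "auth/",
    "payments/",
    "permissions/",
    "retention/",
    "api/public/",
)


def needs_human_approval(changed_paths: Iterable[str]) -> list[str]:
    """Return the changed paths that fall inside a sensitive area."""
    return sorted(
        path for path in changed_paths
        if path.startswith(SENSITIVE_PREFIXES)
    )


if __name__ == "__main__":
    flagged = needs_human_approval(
        ["docs/release_notes.md", "payments/refunds.py", "tests/test_signup.py"]
    )
    if flagged:
        print("Stop and ask before touching:", ", ".join(flagged))
    else:
        print("Bounded work: the assistant may proceed to a draft.")
```

A path list like this only catches the obvious cases. It does not replace the owner's judgment about changes that raise costs later without touching any of these directories.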

Final approval also stays with the human, even if the assistant did most of the typing. That approval should mean more than "looks fine." The owner should check the tradeoff that matters most for the task. Sometimes that is latency. Sometimes it is support load, rollback risk, or plain readability.

A short written decision helps. One or two sentences are enough: what changed, what risk you accepted, and why this version is the right choice for now.

How to set guardrails before work starts

Speed goes up quickly when a team starts using an assistant. Drift goes up quickly too. The fix is simple: set the limits before anyone writes code.

Start with one sentence that defines the job. Keep it plain. "Add retry logic to failed webhook delivery without changing the public API" is much better than a long brief full of side notes. A short task statement gives the engineer something to judge against when the assistant starts suggesting extra work.

Then separate the problem into two buckets: what must stay the same and what may change. That sounds basic, but it prevents expensive rework.

Public API shape, audit logs, data rules, latency budgets, and release dates often belong in the first bucket. Internal function names, query structure, retry timing, test layout, and logging detail usually belong in the second.

Those boundaries matter because assistants often optimize for local neatness. Teams care about bigger tradeoffs. A cleaner refactor is not a win if it breaks support docs, changes response fields, or adds one more thing to maintain.

Ask for options before asking for code. Two or three approaches are enough. Each should include the upside, the risk, and what stays untouched. That pushes judgment to the front, when changes are still cheap. It also makes review easier because the team has already agreed on why one path won.

A staff engineer should also write down any non-negotiables that sit outside the ticket. Maybe the team cannot add a new service. Maybe this part of the product has a messy dependency and nobody wants to widen the blast radius a week before release. Those facts rarely appear in code, but they shape good decisions.

Save the chosen rules where the whole team can see them. Put them in the ticket, design note, or pull request template. If the rules live only in one person's head, the assistant will miss them and reviewers will end up debating things that were already settled.
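
As a sketch of what a saved rule set might look like, the snippet below keeps the task statement and the two buckets in one small structure and renders them as plain text for a ticket or pull request description. The class name `Guardrails`, its fields, and the output layout are illustrative assumptions, not a prescribed template.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrails:
    """Task statement plus the two buckets, kept together (hypothetical shape)."""
    task: str
    must_stay: list[str] = field(default_factory=list)
    may_change: list[str] = field(default_factory=list)
    non_negotiables: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the rules as plain text that can be pasted into a ticket."""
        lines = [f"Task: {self.task}"]
        for title, items in (
            ("Must stay the same", self.must_stay),
            ("May change", self.may_change),
            ("Non-negotiables outside the ticket", self.non_negotiables),
        ):
            lines.append(f"{title}:")
            # An empty bucket is shown explicitly so reviewers can see the gap.
            lines.extend(f"  - {item}" for item in items or ["(none recorded)"])
        return "\n".join(lines)


rules = Guardrails(
    task="Add retry logic to failed webhook delivery without changing the public API",
    must_stay=["Public API shape", "Audit logs", "Latency budget"],
    may_change=["Retry timing", "Internal function names", "Logging detail"],
    non_negotiables=["No new services"],
)
print(rules.render())
```

Printing an empty bucket as "(none recorded)" is a deliberate choice here: a missing rule is easier to question when it is visible in the ticket.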

This usually takes ten minutes. It can save hours of backtracking.

How to review the work without slowing everything down

Start with the shape of the change. Check which files moved, what logic changed, and whether the assistant touched data models, permissions, or user flows. If the overall shape feels wrong, a slow line-by-line review will not rescue it.

The human review should test the work against real product rules, not just code style. A change can look clean and still break billing logic, approval steps, rate limits, or audit needs. Many bad changes fail because the assistant made an assumption too early and nobody checked it.

Read the tests before reading every branch of the code. Tests show what the assistant believed mattered most. Then ask what is missing. Teams often miss old accounts, partial failure, retries, empty states, and users who do things in the wrong order.

A short review loop works well. Ask what changed for the user or support team, which assumption could fail in production, what proof the tests actually give, what still has no proof, and what you can undo quickly if it goes wrong.

Ask why the assistant chose this path. Why add a new helper instead of changing the current one? Why move this work into the background? Why cache this result? The answer shows whether the assistant made a real tradeoff or simply copied a nearby pattern.

Keep risky work separate from safe cleanup. Renames, formatting, and dead code removal should not travel with a behavior change. Smaller review batches move faster, and they make judgment errors easier to catch before users do.

How to handle tradeoffs on purpose

Tradeoffs need names. If nobody says them out loud, speed starts winning by default, and that is usually when teams create next month's cleanup.

Some parts of the job should move fast. Drafting tests, writing repetitive glue code, and cleaning up routine files are good places to save time. Design changes, security rules, migration plans, and anything hard to undo need more care. A staff engineer should draw that line before work starts, not after something breaks.

Time saved now usually shows up as a cost somewhere else. Sometimes that cost is fine. A quick fix for an internal admin page might be worth a little cleanup later. The same shortcut in billing logic is a bad bet.

Small problems usually deserve simple fixes. If a bug affects one edge case, a clear patch is often better than a clever rewrite that touches five systems. Predictability matters more than novelty on a busy team. Elegant ideas can wait. If the team is shipping under pressure, the better choice is usually the one more people can read, review, and fix at 2 a.m. without a long handoff.

It helps to ask a few direct questions: if we choose the fast option, what cleanup will we owe later? If we choose the careful option, what do we delay today? How hard will this be to change in a week? Who carries the cost if this choice turns out wrong?

Write the decision down in plain language. One or two lines are enough: what you chose, why you chose it, and what you rejected.

This is also the kind of judgment Oleg Sotnikov often talks about in his AI-first CTO work: use assistants to remove repetitive effort, but keep human review on choices that affect reliability, cost, and future complexity. That split is what keeps speed useful.

A simple example from a product team

A product team wants to add usage limits to a paid feature. On paper, the rule sounds simple: customers on the basic plan can run it 100 times a month, and customers on the higher plan get more. The assistant quickly drafts the first pass, including schema changes for monthly usage tracking and API checks that block requests after the limit.

The draft looks clean. It adds a counter, resets it each month, and returns a clear error when a customer goes over. This is often the easy part. The assistant can write the obvious path in minutes.

Then the staff engineer spots what the draft missed. Billing does not follow the simple rule in the code. Some customers already have older contracts with custom limits, and a downgrade does not cut them off in the middle of a billing period. The draft would enforce the new rule immediately and create support tickets on day one.

So the team changes the plan. Instead of shipping the full system at once, they split it into two releases. The first release only records usage and applies limits to new accounts on the new pricing rules. Older accounts keep their current behavior for now.

The second release comes later. That one cleans up old plan data, adds the billing exceptions, and moves all accounts to one shared limit check. It is slower than the assistant's original draft, but much safer.

Before release, they also write a rollback plan. They keep enforcement behind a feature flag, make the migration additive so old code still works, keep usage logging on even if they turn enforcement off, and note who can disable the change and how quickly they can do it.
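
The first release and its rollback plan can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Account` fields, the `ENFORCE_USAGE_LIMITS` flag name, the in-memory counter, and the `record_and_check` function are all hypothetical. It models only the basic-plan limit, and a real system would persist usage and read flags from its own flag service.

```python
from collections import Counter
from dataclasses import dataclass

BASIC_MONTHLY_LIMIT = 100  # the basic-plan limit from the example above

# Hypothetical in-memory stand-ins for a flag store and a usage table.
flags = {"ENFORCE_USAGE_LIMITS": True}
usage = Counter()


@dataclass
class Account:
    id: str
    on_new_pricing: bool  # older contracts keep their current behavior


def record_and_check(account: Account) -> bool:
    """Record one use and return True if the request is allowed."""
    # Usage logging stays on even when enforcement is switched off.
    usage[account.id] += 1

    # Enforcement applies only behind the flag and only to new-pricing accounts.
    if not flags["ENFORCE_USAGE_LIMITS"] or not account.on_new_pricing:
        return True
    return usage[account.id] <= BASIC_MONTHLY_LIMIT


new = Account(id="acct-new", on_new_pricing=True)
legacy = Account(id="acct-legacy", on_new_pricing=False)

results_new = [record_and_check(new) for _ in range(101)]
results_legacy = [record_and_check(legacy) for _ in range(101)]

# Rollback: one switch disables enforcement while counts keep accumulating.
flags["ENFORCE_USAGE_LIMITS"] = False
after_rollback = record_and_check(new)
```

In this sketch the new-pricing account is blocked on its 101st request, the legacy account is never blocked, and turning the flag off restores access without losing the usage count.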

The assistant helped produce working code. The staff engineer protected the business rules, the rollout order, and the escape hatch. Speed does not replace that part.

Common mistakes that weaken judgment

Judgment usually slips in quiet ways. The code looks tidy, the tests pass, and the diff feels small. That is exactly when weak decisions get through.

One common mistake starts before any code exists: asking for a full solution before naming the constraints. If nobody says that a change must keep old API behavior, avoid a risky migration, or stay readable for the on-call team, the assistant fills in the blanks. It will still produce something polished, but it may solve the wrong problem.

Passing tests can fool people for the same reason. Tests prove that some expected cases still work. They do not prove the change is safe with messy production data, odd user behavior, or old records nobody has touched in months.

A better review asks a harder question: what did we stop checking because the tests looked clean?

Another habit that weakens review is mixing refactors with feature work in one batch. If a pull request renames files, rewrites helpers, and changes business behavior at the same time, the reviewer has to hunt for the real risk. Most people miss something.

Split those changes when you can. Let one diff change behavior. Let another clean up the structure later. That keeps review focused and makes rollback much less painful.

Teams also get into trouble when the assistant invents business rules. If nobody defines how refunds round, when a trial ends, or which users can skip a step, the assistant will guess. The guess may sound reasonable and still be wrong for finance, support, or compliance.

Staff engineers should treat missing rules as open questions, not coding prompts. Write down the unknowns. Ask product or operations to answer them. Code should follow the rule, not create it.

Skipping rollback is another mistake, especially when the change looks tiny. A one-line config edit can break signups. A small schema tweak can slow a page enough to trigger a support spike. Size does not measure blast radius.

Before merging, decide how to undo the change quickly. That might mean a feature flag, a safe revert, or a checkpoint on the data. If the team cannot explain the rollback in one minute, the change is not ready.

Good judgment is usually boring. It names constraints early, separates messy diffs, refuses invented rules, and keeps an exit door open.

Quick checks before release

Every release needs one owner. Not five people "feeling good about it." One person, usually the staff engineer or the engineer closest to the change, makes the final call to ship, wait, or revert. Shared responsibility sounds fair, but it creates quiet gaps.

This matters even more when an assistant wrote part of the code. A clean diff can still hide weak choices, missing edge cases, or risky assumptions that nobody said out loud.

Before release, write down the assumptions that could fail in production. Maybe traffic stays lower than expected. Maybe a third-party API responds fast enough. Maybe users follow the path the team expects. Once those assumptions are visible, review gets sharper.

The release check can stay simple. Decide who owns the final call, which assumptions are most likely to break first, how the team will notice failure through logs or alerts, what product and support should expect users to see, and whether rollback works in one calm step.
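
For teams that prefer a written record, the same check can be captured as a small structure that reports what is still unanswered. The field names and the `missing` helper are illustrative assumptions, and the sample answers are invented for the example. The questions themselves come from the paragraph above.

```python
from dataclasses import dataclass, fields


@dataclass
class ReleaseCheck:
    """One entry per question in the release check (hypothetical field names)."""
    owner: str = ""
    riskiest_assumptions: str = ""
    failure_signals: str = ""
    user_facing_note: str = ""
    rollback_step: str = ""

    def missing(self) -> list[str]:
        """Return the names of questions that have no written answer yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


check = ReleaseCheck(
    owner="staff engineer on the change",
    failure_signals="alert on error rate for the risky path",
    rollback_step="turn off the feature flag",
)
unanswered = check.missing()
print("Ready to ship" if not unanswered else f"Not ready, missing: {unanswered}")
```

A filled-in field only shows that someone wrote an answer. Whether the answer is good enough remains the owner's call.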

If the change affects user behavior, product and support need a short note before launch. Two or three sentences are enough. Say what changed, who will notice, and what confusion might show up first. That small habit saves time when the first ticket arrives.

Teams also need failure signals in place before they ship. New code should add logs for the risky path and alerts for the failure that matters most. If a failure could go unnoticed for an hour, the release plan is not done.

Rollback should feel boring. If reverting the change needs a meeting, a manual database scramble, and three people with shell access, the team is taking on more risk than it thinks. A good rollback plan names the trigger, the person who can act, and the exact step to return to a safe state.

The whole check is short on purpose. It does not slow the team down. It keeps speed from turning into cleanup work tomorrow.

Next steps for a small team

Small teams usually get better results by testing this on one narrow job first. Pick a task that shows up every week, like writing an internal tool, fixing small bugs, or drafting tests for a common API change. Then give that task to one clear owner. If everyone uses the assistant differently, the team learns very little and review turns into guesswork.

Keep the routine boring for two weeks. Use the same prompt style, the same review steps, and the same approval rule each time. That window is long enough to spot patterns and short enough to change course without much cost. The goal is not raw output. It is faster output that still passes human judgment.

Track a few numbers, not a dashboard. Three are usually enough: how often the team reworked the first draft, how many bugs slipped through review, and how long review took from first pass to approval.

If review time drops but bugs rise, speed is hiding a quality problem. If bugs stay flat but rework stays high, the instructions are weak or the task is unclear. When the same mistake shows up twice, change the rule right away. Add a check, narrow the task, or require the engineer to explain the tradeoff before code moves forward. Repeated errors usually point to a process problem, not a talent problem.
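
The three numbers are simple enough to compute by hand or in a short script. The sketch below assumes each task is logged as a small record and implements only the first reading rule above. The `TaskRecord` fields, the use of a median for review time, and the comparison between two periods are illustrative choices, not something the routine requires.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskRecord:
    reworked_first_draft: bool   # did the team rework the assistant's first draft?
    bugs_after_review: int       # bugs that slipped through review
    review_hours: float          # time from first pass to approval


def summarize(records: list[TaskRecord]) -> dict[str, float]:
    """Compute the three tracked numbers for one period (needs at least one record)."""
    return {
        "rework_rate": sum(r.reworked_first_draft for r in records) / len(records),
        "escaped_bugs": sum(r.bugs_after_review for r in records),
        "median_review_hours": median(r.review_hours for r in records),
    }


def warning(before: dict[str, float], after: dict[str, float]) -> str:
    """Flag the case where review got faster while escaped bugs went up."""
    if (after["median_review_hours"] < before["median_review_hours"]
            and after["escaped_bugs"] > before["escaped_bugs"]):
        return "speed is hiding a quality problem"
    return "no warning"


week1 = summarize([TaskRecord(True, 0, 3.0), TaskRecord(False, 1, 2.0)])
week2 = summarize([TaskRecord(True, 2, 1.0), TaskRecord(True, 1, 1.5)])
print(week1, week2, warning(week1, week2))
```

With so few tasks per week, these numbers are a prompt for a conversation rather than a statistical result. That matches the advice to track a few numbers, not a dashboard.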

A small team also benefits from an outside view now and then. If the process keeps swinging between too much freedom and too much control, someone experienced can spot the pattern faster. Oleg Sotnikov does this kind of Fractional CTO work with startups and small businesses, helping teams set clearer roles, review habits, and limits for AI-assisted coding without turning the process into red tape.

Frequently Asked Questions

What should the human still own when working with an assistant?

The human owns the goal, the limits, and the final decision. That includes business rules, security choices, data handling, rollout order, and which tradeoff the team accepts.

Let the assistant produce options and draft code, but keep judgment with the engineer who understands the product, support load, and failure cost.

Which tasks fit an assistant best?

Start with bounded work. Ask it to draft tests for current behavior, write repetitive glue code, summarize logs, prepare release notes, or clean up one small module.

Those jobs save time without letting the tool invent product rules or change risky parts of the system on its own.

What work needs human approval every time?

Stop and review by hand before the assistant touches authentication, payments, permissions, data retention, public APIs, or anything that can raise long-term cost.

Those areas carry business and trust risk. A neat diff does not make them low risk.

How do I write a prompt that keeps the assistant focused?

Write the task in one plain sentence. Say what should change, who it helps, and what must stay the same.

Something like "Add retry logic to failed webhook delivery without changing the public API" gives the assistant a lane and gives the reviewer a real target.

How can I review assistant-written code without slowing everything down?

First check the shape of the change. Look at which files moved, what logic changed, and whether the diff touches data models, permissions, or user flows.

Then read the tests to see what the assistant thought mattered. After that, ask what real product rule or messy case the tests still miss.

Are passing tests enough to trust the change?

No. Tests only prove the cases someone wrote down. They do not prove the change handles old records, partial failure, retries, odd user behavior, or hidden business exceptions.

Use passing tests as one signal, not a green light to stop thinking.

How do I handle speed versus safety?

Name the tradeoff before work starts. Decide where you want speed and where you want caution.

A quick patch often beats a clever rewrite when the problem is small and the team ships under pressure. In billing, migrations, or security, spend more time and keep the path easy to undo.

Why should one person own the release decision?

Pick one owner for the release. That person makes the final call to ship, wait, or revert.

Shared ownership sounds fair, but it often leaves gaps. One owner keeps assumptions, failure signals, and rollback steps in one place.

What makes a rollback plan good?

A good rollback feels boring. The team knows what failure looks like, who can act, and which exact step returns the system to a safe state.

Feature flags, additive migrations, and simple reverts help because they let you back out fast without a late scramble.

How should a small team start using an assistant?

Start with one narrow workflow that repeats every week. Keep one owner, one prompt style, and one review routine for a short trial.

Track rework, bugs that slip through review, and review time. If the same mistake shows up twice, change the rule right away instead of hoping the team will just be more careful.