Dec 06, 2025·7 min read

Managing engineers with AI assistants after boilerplate fades

Managing engineers with AI assistants means changing sprint planning, code review, and feedback so teams evaluate judgment, speed, and ownership.

What changes when small tasks disappear

Engineers can clear routine work in minutes now. Setup code, repeated edits, test scaffolds, small refactors, and draft docs no longer eat half a day.

That feels better, but it removes a signal many managers used without noticing. When boilerplate fades, effort gets harder to see. A task that once took four hours may now take forty minutes, so commit counts, ticket counts, and visible typing tell you much less.

A fast result does not mean the work was easy. The hard part often moved earlier in the process. Someone still has to define the task well, pick the right tradeoff, spot weak output, and decide when the assistant is wrong.

Review pressure changes next. Code reaches review sooner, so the bottleneck shifts from writing to checking. One engineer can open more pull requests in a day, but reviewers, QA, and product leads do not suddenly get extra time. Keep the same review habits and queues grow fast.

That makes judgment more important than raw typing speed. Strong engineers ask better questions before they generate code. They test edge cases, reject shaky suggestions, and cut work that should not exist. A weaker engineer can move fast too, but they often ship polished mistakes.

On teams Oleg Sotnikov advises, the pattern shows up quickly: code lands earlier, while verification becomes the expensive step. Managers need to watch who keeps quality steady as output speeds up, not who looks busiest.

The old picture of performance changes with it. Writing a lot of code is a weak proxy for progress now. Clear problem framing, safe changes, careful review comments, and good decisions during ambiguity carry more weight. If you still reward visible grind over sound judgment, you will miss the people who keep the team effective.

What planning needs to cover now

When assistants handle boilerplate, planning stops being a rough guess about effort and becomes a clear statement of intent. A manager can no longer hand out a vague ticket and expect a clean result. If the goal is fuzzy, the assistant will help produce the wrong code faster.

Start with the result. Say what should work, who will use it, which inputs matter, and what must not break. "Add export support" is too loose. "Export orders as CSV, include these six fields, keep the current filters, and finish under five seconds" gives an engineer something solid to build and check.

Ask for assumptions and risks before coding starts. That small step changes the work. An engineer might spot inconsistent data, an API with no rate limit buffer, or a generated test suite that misses awkward edge cases. You want those notes at the start, not at the end of the sprint.

It also helps to decide what the assistant can draft and what still needs a person to think it through. Let the assistant draft repetitive tests, CRUD handlers, migration skeletons, and first-pass docs. Keep design choices, tricky queries, security checks, and user-facing tradeoffs in human hands unless there is a good reason to do otherwise.

A useful task brief covers five things:

  • the expected result in plain language
  • the assumptions the engineer is making
  • the risks or unknowns worth checking first
  • which parts the assistant can draft
  • the rule for done

That last part is easy to skip and expensive to ignore. "Done" should include working behavior, tests, cleanup, and a short note that explains what changed and why. Otherwise teams ship fast drafts that look finished and leave a mess for review.

Leave room for the last stretch of work. Teams moving to AI-first development often underestimate testing, cleanup, and handoff notes because code arrives so quickly. The draft may take thirty minutes. Verifying edge cases, removing dead code, and writing a clear note can take twice that. Planning should reflect the full job, not just the speed of the first draft.
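
The five-part brief described above can be kept honest with a very small structure. The sketch below is one possible shape, not a prescribed format: the class name, field names, and sample values are all hypothetical and only illustrate how a team might notice an incomplete brief before coding starts.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical shape for the five-part brief; field names are illustrative."""
    expected_result: str                                   # outcome in plain language
    assumptions: list[str] = field(default_factory=list)   # what the engineer takes as given
    risks: list[str] = field(default_factory=list)         # unknowns worth checking first
    assistant_can_draft: list[str] = field(default_factory=list)
    done_rule: str = ""                                    # what counts as finished

    def missing_parts(self) -> list[str]:
        """Return the names of brief sections that are still empty."""
        parts = {
            "expected_result": self.expected_result.strip(),
            "assumptions": self.assumptions,
            "risks": self.risks,
            "assistant_can_draft": self.assistant_can_draft,
            "done_rule": self.done_rule.strip(),
        }
        return [name for name, value in parts.items() if not value]


# Illustrative values only; a real brief would come from the planning session.
brief = TaskBrief(
    expected_result="Export orders as CSV with the current filters, in under five seconds",
    assumptions=["Order data is consistent across regions"],
    assistant_can_draft=["CSV serializer", "test scaffolds"],
)
print(brief.missing_parts())  # ['risks', 'done_rule']
```

A brief that still reports missing parts is a prompt for a short conversation, not a hard gate. The point is to surface the empty "done" rule before the first draft exists.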

How to plan work step by step

When an assistant can write setup code, tests, and glue code in one pass, old task plans get noisy fast. A better plan tracks the user problem, the risky decisions, and who owns the result.

A short planning flow works well:

  1. Write the user problem in one clear sentence, then add one number that tells you if the work helped. That number can be simple, like cutting report time from twenty minutes to five.
  2. Call out the parts where a person must decide. That usually includes product tradeoffs, security, data rules, naming, and anything users will notice right away.
  3. Give one engineer ownership of the final result. Other people and the assistant can help, but one person needs to answer for what ships.
  4. Put a midpoint check on the calendar before full review. Ask for a demo, a short diff, or sample output so you can catch the wrong direction early.
  5. At the end, compare the result with the original problem and the number you chose. If the team shipped more code but missed the goal, say that plainly.

This is one of the clearest changes in managing engineers with AI assistants. You spend less time breaking work into tiny boilerplate tasks and more time checking judgment, direction, and finish quality.

A small example makes it obvious. If a team wants to speed up invoice export, do not plan the work as "build endpoint, add tests, wire UI." Plan it as "reduce export time for finance staff from twelve minutes to two, without missing tax fields." The assistant can draft much of the plumbing. The engineer still decides field rules, edge cases, and how to roll the change out safely.

The midpoint check matters more than many managers expect. Fast code can send a project off course just as fast. A fifteen-minute check on day two often saves a full review cycle later.

End with the same question you started with: did this solve the user problem? If the answer is only "we shipped it," the plan was too loose.

How review shifts when code arrives faster

Code can show up much faster with an assistant. Review should not speed up in the same way. The first change is simple: start with reasoning, not style.

Clean formatting can hide weak choices, copied patterns, or a missed business rule. If a reviewer spends ten minutes on spacing and naming before checking the logic, they can miss the real problem.

Good review now starts with a few direct questions:

  • What problem does this change solve?
  • Which parts did the assistant draft?
  • Where did the engineer accept suggestions with little change?
  • What assumptions might be wrong?
  • Which edge cases still need a human check?

Those questions often tell you more than the diff itself. An engineer might say, "The assistant wrote the validation layer, but I rewrote the billing logic." That is useful context. It points the reviewer to the risky part right away.

Naming still matters, but now it tells you something larger. Vague names like processData or handleFlow often mean the code hides too much. Hidden complexity is common in AI-generated code because the first draft tries to satisfy every case at once. Good review breaks that apart before it spreads.

Pull requests should also get smaller. When code arrives fast, engineers tend to batch more changes into one branch. That looks efficient and usually hurts review quality. A 600-line pull request written in an afternoon can take longer to review well than three smaller ones sent over the same day.

Smaller pull requests keep reviewers sharper. They also make feedback easier to act on because the engineer can see exactly which choice caused concern.
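
One way to make pull request size visible is to count changed lines before review. The sketch below parses the tab-separated output of `git diff --numstat` (added, deleted, path, with "-" for binary files). The function name, the sample input, and the 400-line threshold are assumptions for illustration; the right limit depends on what your reviewers can read carefully.

```python
def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of counts
        total += int(added) + int(deleted)
    return total


# Hypothetical threshold, not a recommendation from the article.
MAX_REVIEWABLE_LINES = 400

# Made-up sample in numstat format: added<TAB>deleted<TAB>path.
sample = (
    "120\t30\tapp/export.py\n"
    "-\t-\tdocs/diagram.png\n"
    "500\t10\ttests/test_export.py\n"
)
size = changed_lines(sample)
print(size, size > MAX_REVIEWABLE_LINES)  # 660 True
```

A count like this only flags candidates for splitting. A large diff of generated fixtures may be fine, while a small diff in billing logic may still deserve a slow read.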

When the same comments keep appearing, stop rewriting them by hand. Turn them into team rules, checklists, linters, or assistant prompts. If reviewers often write "cover empty input" or "use domain names, not generic helper names," add that rule to the prompt and the pull request template.
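
A repeated naming comment is one of the easier ones to automate. The following is a minimal sketch, assuming a hypothetical deny-list and a deliberately simple pattern that only recognizes plain `def name(` and `function name(` declarations. It is not a real linter and will miss methods, arrow functions, and other forms.

```python
import re

# Hypothetical deny-list built from comments reviewers keep repeating.
GENERIC_NAMES = {"processData", "handleFlow", "process_data", "handle_flow"}

# Matches simple `def name(` and `function name(` declarations only.
DECLARATION = re.compile(r"\b(?:def|function)\s+([A-Za-z_]\w*)\s*\(")


def generic_function_names(source: str) -> list[str]:
    """Return declared function names that appear on the deny-list, in order."""
    return [name for name in DECLARATION.findall(source) if name in GENERIC_NAMES]


# Made-up snippet mixing two languages to show the pattern is text-based.
snippet = """
function processData(rows) { return rows; }
function exportOrdersAsCsv(orders) { return orders; }
def handle_flow(request):
    return request
"""
print(generic_function_names(snippet))  # ['processData', 'handle_flow']
```

The same list can be pasted into the assistant prompt, so the first draft avoids those names before a reviewer ever sees them.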

Over time, review becomes less about polishing code after the fact and more about teaching the team how to produce better first drafts.

What good performance feedback looks like now

Once assistants do much of the typing, praise needs to shift as well. If one person ships twice as much boilerplate because a tool wrote it, that alone says very little. Good feedback focuses on judgment: who picked a sound approach, who spotted a bad draft quickly, and who kept the work clear for the next person.

Fast output still matters. It just is not the whole picture. Strong engineers do not accept the first answer from the assistant. They question it, trim it, test it, and fix the weak parts before those mistakes reach the team.

Many AI drafts look clean at first glance. That is why managers should notice who catches shaky logic early. An engineer who stops a flawed draft in ten minutes often saves more time than someone who pushes a bigger batch of code without checking it closely.

In each review cycle, talk about the full job, not only the diff. Testing, short docs, and handoff notes count because they show whether the engineer owned the result after the assistant produced the first version. In 1:1s, ask how people actually use assistants in their daily work. One engineer may use AI for test setup and refactors. Another may lean on it too early and miss plain errors. They can look equally fast on paper and still need very different feedback.

Keep speed and ownership separate when you give feedback. "You moved quickly" is not the same as "You made this safe to ship." When both are true, say both. When only one is true, be direct.

A good note sounds like this: "You finished the API update fast, but the first AI draft missed edge cases and you caught them late. Next time, test failure paths earlier and leave a short handoff note." That kind of feedback is specific, fair, and easy to act on.

A simple team example

A small startup team needs to update its signup flow before a product launch. The change sounds small: cleaner forms, better validation, and a shorter path from invite email to first login. In practice, the work spreads across UI code, backend checks, tests, and a pile of little fixes.

One engineer uses an assistant to draft most of the form code and a first pass of the tests. By lunch, she already has working fields, error states, and basic coverage for common input mistakes. A job that used to take two days now fits into one long afternoon.

That speed changes the manager's job. He spends less time asking when each form field will be finished and more time asking for a short risk note before review. He wants three things in plain language: what changed, what the assistant wrote, and where the engineer feels least sure.

The note is brief but useful. It says the invite flow changed, old users might hit account matching issues, and mobile Safari still needs a manual check. That gives review a better focus. The team spends less time on naming and boilerplate, and more time on user paths, assumptions, and failure cases.

They ship faster than usual. QA still finds one miss: an invited user pastes an email address with a trailing space, and the signup flow rejects it at the final step. The generated tests covered clean input and obvious bad input, but not the messy copy-and-paste case real users create all the time.

The manager does not treat that bug as proof that the assistant failed. He treats it as a planning gap. In the next sprint, the team adds four small habits: a fifteen-minute check in the middle of the week, a review checklist for risky user flows, one manual test on mobile before QA starts, and a note on rollback steps for signup changes.

Those changes do not slow the team much. They save rework and make review calmer because everyone knows what to check.
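
The trailing-space miss in this example is the kind of case worth adding to the test list by hand. The sketch below assumes the fix is to trim whitespace before validating, which the example does not actually specify. The helper names and the loose validity check are hypothetical and stand in for whatever the real signup flow uses.

```python
def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace before validation (hypothetical helper)."""
    return raw.strip()


def is_plausible_email(raw: str) -> bool:
    """Deliberately loose check: exactly one "@", a local part, a dot in the domain."""
    email = normalize_email(raw)
    local, sep, domain = email.partition("@")
    return (
        bool(sep)
        and bool(local)
        and "." in domain
        and "@" not in domain
        and " " not in email
    )


# Clean input, obvious bad input, and the messy copy-and-paste cases.
cases = {
    "ana@example.com": True,
    "not-an-email": False,
    "ana@example.com ": True,      # trailing space from a pasted invite
    "\tana@example.com\n": True,   # stray tab and newline around the address
    "ana @example.com": False,     # a space inside the address is still rejected
}
for raw, expected in cases.items():
    assert is_plausible_email(raw) is expected, repr(raw)
```

The first two cases are what a generated suite tends to cover. The last three are the ones a person adds after thinking about how invited users really enter an address.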

Mistakes managers make early

Old habits break fast once assistants handle boilerplate. The first mistakes usually come from trusting polished output too much and measuring the wrong things.

A common one is counting lines of code as progress. An engineer can produce five files before lunch with a decent prompt. That does not mean they understood the problem, picked a simple design, or protected the risky parts of the system. Sometimes more code means they accepted too much generated noise.

Another trap is overrating the best prompter on the team. Prompt skill helps, but judgment still decides who you can trust. If someone gets fast drafts from the tool but misses edge cases, cannot explain tradeoffs, or keeps pushing messy changes into review, the speed is mostly theater.

Clean drafts also tempt managers to skip manual checks. That is where trouble starts. AI often writes code that looks calm, consistent, and tidy while hiding weak error handling, thin tests, or bad assumptions about real user behavior. A human still needs to read the risky paths and ask, "What did you verify yourself?"

Feedback can stay stuck in the old model too. Managers keep praising raw output and ticket count even though assistants now do much of the typing. That sends the wrong signal. Praise engineers who frame the problem well, catch flaws early, cut unnecessary work, and know when to reject a slick draft.

The timing of questions changes as well. If you wait until final review to ask basic questions, you waste days. Ask early what changed for the user or the team, which parts the engineer checked by hand, where the assistant could be wrong, why this approach is simpler than the alternative, and what would break first in production. Managers who ask those questions early get fewer surprises, shorter reviews, and a much clearer view of who thinks well in AI-heavy workflows.

Weekly checks that actually help

Weekly management gets simpler when boilerplate shrinks. You do not need a long scorecard. You need a few checks that show whether the team is using assistants well or just producing more code.

Start with how work began. A solid task names the result, the limit, and the finish line. "Add OAuth for admin users, keep the current login working, and ship tests for the main path" gives an engineer and an assistant something clear to aim at. "Improve auth flow" does not.

By the end of the week, look for a few patterns:

  • Pick a small sample of finished tasks and check whether the expected result was clear before anyone started.
  • Make sure someone inspected assistant-written code, tests, and edge cases instead of trusting the first draft.
  • Sample recent reviews and see whether comments stayed on failure risk, security, and user impact instead of spacing fights.
  • Read one piece of manager feedback and ask whether it covered choices, tradeoffs, and follow-through, not only speed.

This takes about twenty minutes if you keep it tight. You do not need to inspect every ticket. You need enough evidence to spot drift early.

A bad week can look productive on paper. The team closes eighteen tickets, but half of them started with fuzzy goals, reviews focus on style, and nobody notices that assistant output introduced a permission bug. A better week may close fewer tickets, yet the work lands cleanly because the team checked the right things.

One more check helps: ask the team which repeated chore they removed or shortened that week. If the answer keeps changing from test setup to release notes to API stub writing, the gains are probably real. If nobody can name one, the team may just be generating more output and calling it progress.

What to do next

Pick one team ritual and change only that for the next thirty days. Small changes stick better than a full reset. If your team still plans work as if boilerplate takes half the sprint, start with planning. If pull requests keep growing and reviews feel rushed, start with review.

A simple trial is enough:

  • shorten planning meetings and spend the saved time on risk checks
  • require review notes on assumptions, tests, and edge cases
  • rewrite performance notes so they mention ownership, follow-through, and how well someone verifies AI output
  • keep the trial period fixed, then decide what to keep based on what the team actually did

Most teams do not need a new management system. They need cleaner standards. When assistants handle repetitive code, managers should stop rewarding volume and start rewarding judgment. The engineer who spots a weak prompt, catches a bad migration, or tightens a vague task often helps the team more than the person who ships the most lines.

Your review rubric should reflect that shift. Ask whether the engineer framed the task clearly, checked the output, and closed the loop after feedback. That tells you more than counting tickets or praising speed on easy work.

Feedback notes need the same update. Write down who follows through without reminders, who leaves loose ends, and who turns fast AI output into dependable results. Those details are far more useful than broad labels like "strong contributor" or "needs to improve communication."

If the team feels faster but less predictable, a second pair of eyes can help. Oleg Sotnikov at oleg.is works with startups and small companies as a Fractional CTO and advisor, with hands-on experience in AI-driven software development. He helps teams review architecture, delivery habits, infrastructure, and the shift to AI-based workflows.

Make one change, run it for a month, and see what gets calmer, clearer, and easier to trust.

Frequently Asked Questions

What should managers measure when AI handles most boilerplate?

Measure judgment, finish quality, and follow-through. Look at who frames the problem clearly, catches weak AI output early, keeps changes safe, and leaves solid notes for review and handoff.

How detailed should a task brief be now?

Write the result in plain language and add limits. State who will use it, what inputs matter, what must not break, which parts the assistant can draft, and what counts as done.

What work should stay with a human instead of the assistant?

Let the assistant draft repetitive code like setup, CRUD handlers, test scaffolds, and first-pass docs. Keep product tradeoffs, security checks, tricky queries, rollout choices, and user-facing decisions with the engineer.

Why does review feel slower even though coding is faster?

Because code reaches review sooner, but reviewers and QA do not get more hours. The bottleneck moves from writing to checking, so teams need smaller changes and earlier midpoint checks.

How large should pull requests be in an AI-heavy workflow?

Keep pull requests small enough that a reviewer can understand the intent fast and inspect risky paths without rushing. If one branch mixes many concerns, split it before review.

What should reviewers ask before they comment on code style?

Start with reasoning, not style. Ask what problem the change solves, which parts the assistant drafted, what assumptions might fail, and which edge cases the engineer checked by hand.

How should performance feedback change for engineers using AI assistants?

Separate speed from ownership. Praise fast delivery when it comes with testing, cleanup, and sound decisions, and call out gaps when someone ships quickly but misses edge cases or leaves loose ends.

What mistakes do managers make early with AI assistants?

Many managers trust polished output too much, count lines of code, or reward the best prompter instead of the best thinker. Others wait until final review to ask basic questions, which turns small misses into bigger rework.

What weekly checks actually show if the team is using AI well?

Check whether tasks started with clear goals, whether someone verified assistant-written code by hand, whether reviews focused on risk and user impact, and whether feedback covered choices instead of just speed. You can spot drift with a quick sample instead of a long audit.

What is the best first process change to try with my team?

Change one team ritual for thirty days and keep the test small. Good first moves include adding a short risk note before review, requiring assumption and edge-case notes in pull requests, or using a midpoint check before full review.