Jan 01, 2025·7 min read

PR handoff rules for AI code in real software teams

Set clear PR handoff rules for AI code so assistants draft safely, humans finish risky work, and teams know who approves each change.

Why AI-written PRs create confusion

Teams run into trouble when a pull request has real code but unclear ownership. One person clicks "generate," another pastes the patch, and the reviewer assumes the author checked the logic. By the time comments start, nobody fully owns the change.

The problem gets worse when the PR looks like normal human work. If the assistant wrote the query, the test, and half the refactor, but nobody marks those parts, reviewers can't tell where they should slow down. They end up checking style and naming while missing the harder question: who actually understands this code?

AI can produce code that looks finished before anyone has thought it through. It often matches local patterns well enough to pass a quick scan, even when it introduces unhandled edge cases, hidden assumptions, or extra complexity. A team merges it because it "seems fine," then loses half a day chasing a bug nobody can explain.

Reviewers feel that pressure first. They may not want to block a PR just because an assistant helped. But they also should not become the first person to reason through unfamiliar logic line by line. Once review, authorship, and verification blur together, the whole process gets sloppy.

Small teams feel this even more. Startups move fast, and AI makes that speed hard to resist. Without clear handoff rules, a team can drift from "a person used a tool" to "the tool shaped the change, and no person owns the result."

The risk is bigger than bad code. It is weaker accountability. When production breaks, "the model suggested it" does not help the on-call engineer, the customer, or the founder who asked for the change.

Clear rules solve most of this. They do not ban AI, and they do not turn every task into a process fight. They simply define when assistant output is a draft, when a human must finish the work, and when a reviewer should ask for deeper checks before merge.

That clarity makes review calmer. The author knows what they must verify. The reviewer knows what to inspect. The team stops treating AI-generated code like a shortcut that removes ownership.

What the assistant should handle

Give the assistant a narrow job. It can draft code for small, low-risk changes with a clear boundary: renaming a field, adding input validation, fixing a simple bug, updating one query, or wiring one extra UI state. If the task is easy to describe in a few sentences and easy to test in one place, the assistant can take the first pass.

It often helps even more around the code than inside it. Local work is fine when it stays close to the change and does not spill into unrelated files or larger design decisions. Good uses include:

drafting unit tests for the exact function, handler, or component being changed
suggesting comments where a rule is easy to miss
cleaning up a small refactor inside the same file or module
writing PR notes that explain behavior and edge cases

The boundary matters. Letting an assistant patch a validation bug in a checkout form and add two tests is reasonable. Letting it redesign payments, change pricing logic, or touch settlement code is a different class of risk.

The assistant should also expose its assumptions in the PR notes. If it assumed a field can be null, guessed that an old endpoint still needs backward compatibility, or inferred that a 30-second timeout is acceptable, the reviewer should see that in plain language. Hidden guesses create messy reviews.

Some inputs stay off limits. The assistant should not use secrets, production credentials, customer records, private tickets, or raw logs with personal data. Work from mocks, redacted examples, and test fixtures instead. It should not open env files, paste tokens into tools, or inspect live production data to "figure things out."

That line preserves the speed benefit without turning a draft into a security problem.
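
One way to make "work from redacted examples" concrete is a small filter that scrubs log lines before anyone pastes them into a prompt. The sketch below is only an illustration: the two patterns and the placeholder strings are assumptions, and they are nowhere near a complete filter for personal data.

```python
import re

# Illustrative patterns only (assumptions); a real team would maintain its own list.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical rule: treat "token=..." or "password: ..." style pairs as secrets.
SECRET_RE = re.compile(
    r"(?i)\b(token|api[_-]?key|password|secret)\b(\s*[=:]\s*)(\S+)"
)


def redact(line: str) -> str:
    """Replace secret values and email addresses with placeholders."""
    # Keep the key name and separator so the log line stays readable.
    line = SECRET_RE.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", line)
    line = EMAIL_RE.sub("[EMAIL]", line)
    return line


if __name__ == "__main__":
    print(redact("login ok user=ann@example.com token=abc123"))
```

A filter like this is a backstop, not permission to feed production logs to a tool. Mocks and fixtures remain the safer default.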

When a human must take over

An assistant can draft a lot of code. It should not finish every pull request. Once a change can affect money, access, or permanent data, a person needs to own the final part of the work.

Hand the change to a human when the PR touches:

login, roles, tokens, sessions, or permission checks
billing, pricing, invoices, subscriptions, refunds, or usage limits
queries that expose private records or cross-account data
database schema, migrations, backfills, or deletion logic
shared architecture such as common libraries, APIs, queue contracts, or deployment flow

That rule is simple because it works. If the code can lock users out, charge the wrong customer, leak data, or break several services at once, the assistant can still suggest a patch, but a human should finish it.
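
A team could encode that list as path patterns and flag a PR from the files it changes. This is a minimal sketch under stated assumptions: the directory names and the `HIGH_RISK_PATTERNS` mapping are hypothetical, and a path match is a prompt for human judgment, not a substitute for it.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; each team would map these to its own repo layout.
HIGH_RISK_PATTERNS = {
    "auth": ["*/auth/*", "*/permissions/*", "*/sessions/*"],
    "billing": ["*/billing/*", "*/invoices/*", "*/pricing/*"],
    "data": ["*/migrations/*", "*backfill*"],
    "shared": ["libs/common/*", "*/queue_contracts/*", "deploy/*"],
}


def risk_areas(changed_files):
    """Return the sorted names of high-risk areas touched by the changed files."""
    hits = set()
    for path in changed_files:
        for area, patterns in HIGH_RISK_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                hits.add(area)
    return sorted(hits)


def needs_human_finish(changed_files):
    """True when at least one changed file falls in a high-risk area."""
    return bool(risk_areas(changed_files))
```

Path matching will miss risky logic that lives in an unexpected file, so the unclear-task rule that follows still applies.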

The same handoff applies when the task itself is unclear. If the bug report is vague, the logs point in two directions, or the product spec conflicts with old behavior, stop AI-only work early. The assistant can summarize what it found, list open questions, and propose a few safe options. A developer or tech lead should choose the path.

Schema changes need extra care because they are hard to undo under load. Deletion logic has the same problem. One wrong condition can wipe records, orphan data, or make an audit trail useless. These changes need someone who understands the app's history, not just the file being edited.

Shared architecture is another bright line. A small change in a common auth helper or event format can ripple through half the codebase. In that case, move ownership to the engineer responsible for that area, even if the assistant wrote the first draft.

The point is not to ban assistant output. The point is to set a clear stop point. Let the tool draft, test, and explain. Let a human make the final call when the risk goes beyond one isolated fix.

A simple handoff flow

This process works well because it is short.

1. Tag the task by risk before anyone opens a branch. A short note in the ticket is enough. Copy fixes and tiny refactors are low risk. Auth, billing, permissions, migrations, and anything that can corrupt data start as high risk.
2. Ask the assistant for a draft PR, patch, or test idea. Do not ask for a final answer. Ask for assumptions too, so weak spots are visible.
3. Have a human run the code locally, test normal paths, test ugly paths, and finish the parts the assistant often misses. That usually includes validation, error messages, cleanup, naming, and small diff edits that make review easier.
4. Mark the PR clearly. Note which files had AI help, which ones the human rewrote, and where the logic changed during cleanup.
5. Send the PR for review only when one human can explain every change, why it exists, and what could still break.

This keeps ownership clear. The assistant adds speed. The human supplies judgment.
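
The flow above can be sketched as a small record that travels with the task, plus a function that lists what is still missing before review. The field names and messages are hypothetical. The point is only that each step leaves a visible trace.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HandoffState:
    """Hypothetical record of where a task stands in the handoff flow."""
    risk_tag: Optional[str] = None      # "low" or "high", set before branching
    assumptions_listed: bool = False    # the assistant's draft states its assumptions
    human_ran_locally: bool = False     # a person ran normal and ugly paths
    ai_help_marked: bool = False        # the PR says which files had AI help
    owner_can_explain: bool = False     # one human can explain every change


def blockers(state: HandoffState) -> List[str]:
    """Return the steps still missing before the PR should go to review."""
    checks = [
        (state.risk_tag in ("low", "high"), "tag the task by risk"),
        (state.assumptions_listed, "ask the assistant to list its assumptions"),
        (state.human_ran_locally, "have a human run and test the code"),
        (state.ai_help_marked, "mark where AI helped in the PR"),
        (state.owner_can_explain, "one human must be able to explain every change"),
    ]
    return [message for ok, message in checks if not ok]
```

An empty list means the PR can go to review. Anything else names the step that was skipped.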

A startup can keep the process light. If the task is "add a missing null check in an internal admin page," the assistant can draft it first and the engineer can finish it in ten minutes. If the task touches invoice totals, the engineer should write or rewrite the risky parts by hand, then use AI for tests, comments, or boring glue code.

Teams that skip the middle step end up reviewing machine guesses instead of code someone actually understands. That is where confusion starts, and it slows everyone down.

Who owns the result

Every PR needs one human owner, even if an assistant wrote most of the code. Put one person's name on the PR as the accountable author. That person answers review comments, fixes issues, and owns the result after merge if the change breaks something.

This clears up a lot of confusion. AI can propose code. It cannot carry responsibility.

The reviewer has a different job. The reviewer checks whether the code is clear, whether the tests match the change, and whether the PR stayed within the agreed scope. Scope creep matters more with AI because a small prompt can turn into a much larger change than anyone asked for.

For normal changes, one author and one reviewer are enough. For security-sensitive or production-impacting work, add a second approver. That second person should look at risk, not just style. Changes to auth, payments, permissions, infrastructure, migrations, or data deletion deserve that extra step.

A simple policy is usually enough:

one human is the accountable author on every PR
one reviewer checks code quality, tests, and scope
a second approver signs off on security or production-risk changes
nobody merges their own PR without the required approval

Keep the rule short and boring. If the team needs a flowchart to merge a small fix, people will skip it the first time they are under pressure.
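
The approval part of that policy is small enough to write down as code. This is a sketch, assuming approvals arrive as a set of usernames and that someone has already tagged the PR as high risk. It only counts approvals and cannot tell whether the second approver actually looked at risk rather than style.

```python
def can_merge(author, approvers, high_risk):
    """Apply the approval rule: one independent approval, two for high-risk work.

    author: username of the accountable human owner.
    approvers: iterable of usernames who approved the PR.
    high_risk: True for security-sensitive or production-risk changes.
    """
    # The author's own approval never counts toward the requirement.
    independent = set(approvers) - {author}
    required = 2 if high_risk else 1
    return len(independent) >= required
```

For example, `can_merge("ann", {"bo"}, high_risk=True)` returns `False` until a second person signs off.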

A useful test is this: if a bug shows up a week later, can the team tell who approved the design, who checked the code, and who accepted the risk? If not, tighten ownership before you add more AI to the workflow.

A realistic example

A product dashboard has a small bug in its settings form. A user pastes an email address for daily reports, leaves a few spaces at the end, and the form saves it as-is. The next job that sends reports fails because the stored value is dirty.

The assistant gets a narrow task: fix the validation, keep the change inside the form, and update tests. It drafts a pull request that trims the input before save, blocks submit when the email is empty after trimming, and shows a short error message under the field.

It also updates the tests. One test covers a normal email with trailing spaces. Another checks that a field with only spaces does not save. A third makes sure the old saved value still shows correctly when the page loads.

That draft is helpful, but it is not ready to merge. A human engineer reads the patch and spots a few rough edges right away.

The helper name is vague, something like checkFieldValue, so the engineer renames it to isValidReportEmail. The assistant also missed one edge case: the API can return null for older accounts, and the form crashes if that value goes straight into the input. The engineer fixes that and checks that the form still works when the user never edits the field.

Then the engineer tests the behavior in the browser. They paste an email with spaces, clear the field, reload the page with an older account, and confirm the error message disappears as soon as the input becomes valid. That manual pass matters because UI bugs often hide in the gap between passing tests and actual clicks.

The reviewer signs off last, but the review stays tight. They do not ask whether the form should support multiple emails or a new alert schedule because that was not the ticket. They check that the PR changed validation, tests, and naming cleanup only, and that the original bug is gone.

That is a healthy handoff. The assistant drafted the first pass. The human finished the parts that needed judgment. The reviewer confirmed that the change stayed inside scope.
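
The finished logic from this example might look roughly like the sketch below. It is written in Python for illustration, so `is_valid_report_email` stands in for the `isValidReportEmail` helper mentioned above. The error text and the `display_value` helper are assumptions, and the sketch covers only trimming, the empty check, and the null case.

```python
def display_value(stored):
    """Value to show in the input on page load; older accounts may store None."""
    return "" if stored is None else stored


def normalize_report_email(raw):
    """Trim the input and return (value, error); error is None when it can be saved."""
    value = (raw or "").strip()
    if not value:
        # Hypothetical message shown under the field.
        return "", "Enter an email address for daily reports."
    return value, None


def is_valid_report_email(raw):
    """True when the trimmed input is non-empty and can be saved."""
    _, error = normalize_report_email(raw)
    return error is None
```

The three automated tests from the draft map onto these functions, and the engineer's null fix corresponds to `display_value(None)` returning an empty string instead of crashing the form.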

Mistakes that cause trouble

Teams get into trouble when they treat AI help like a minor detail instead of part of the change record. If a pull request used an assistant, the PR description should say so in plain words. Hiding that fact weakens review because reviewers do not know where they should slow down, re-check assumptions, or read generated code more carefully.

Small fixes often turn messy for another reason. Someone asks the assistant to fix one bug, and it also renames helpers, rewrites tests, and touches files that had no problem. That extra refactor can look neat in a diff with passing checks, but it makes the real change harder to inspect. For a narrow fix, keep the scope narrow.

Green tests do not prove the change is safe. They only prove that the current test suite did not catch a problem. AI-generated code can still break edge cases, raise cloud costs, weaken security checks, or add logic that passes for the wrong reason. A human still needs to check intent, side effects, and failure paths.

Ownership also breaks down faster than teams expect. One person prompts the assistant, another skims the code, someone else merges it, and nobody feels fully responsible for the result. That is how bad code reaches production. One named human should own the PR and say, "I read this, I understand it, and I accept the risk."

Risky changes also need rollback notes before merge. This gets skipped all the time, especially when the code looks simple. But if the PR touches billing, auth, data migration, background jobs, or shared infrastructure, the team should know how to undo it quickly. A short rollback plan in the PR can save a lot of time when production starts acting strangely at 4 p.m. on a Friday.
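
A lightweight way to stop rollback notes from being skipped is to check the PR description for them. The sketch below assumes a team convention of an "AI help:" line on every PR and a "Rollback:" line on high-risk ones. Both labels are hypothetical, and the check only proves that a section exists, not that the plan is any good.

```python
def missing_sections(description, high_risk):
    """Return the required section labels that the PR description lacks."""
    lowered = description.lower()
    required = ["ai help:"]          # disclosure is expected on every PR
    if high_risk:
        required.append("rollback:")  # risky changes also need a way back
    return [label for label in required if label not in lowered]
```

An empty result means the description carries the expected sections. A reviewer still has to read them.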

Most avoidable messes come down to the same few problems: unclear disclosure, loose scope, weak human review, blurry ownership, and no written way back.

Quick checks before merge

The merge button should never be the moment when the team guesses what a PR actually does. If a human cannot explain why each file changed, the PR is not ready.

Use a short checklist:

a human owner can explain the purpose of every changed file in plain language
the PR notes say where AI helped, such as drafting tests, suggesting a refactor, or writing an initial patch
tests cover the changed behavior, including at least one failure case or edge case
the diff still matches the original task
the approver reviewed the final version, not an earlier draft

That last point matters more than teams expect. AI-assisted PRs often change several times in a row. A developer accepts one suggestion, asks for another, then cleans up the code by hand. If the approver only saw version one, the final merge did not get real review.
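
The "final version" check reduces to comparing what each approver saw with what is about to merge. This is a minimal sketch, assuming the team can obtain the commit each approval was given on. The mapping shape is hypothetical, and some code hosts offer a built-in setting that dismisses approvals when new commits arrive.

```python
def stale_approvals(head_sha, approvals):
    """Return reviewers whose approval was given on a commit other than the head.

    head_sha: commit that would be merged.
    approvals: mapping of reviewer name -> commit SHA they approved.
    """
    return sorted(
        reviewer for reviewer, sha in approvals.items() if sha != head_sha
    )
```

If the returned list is not empty, those reviewers approved an earlier draft and should look again.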

A simple example makes this clear. Suppose an assistant drafts a patch for a billing bug. The PR fixes the bug, but it also changes a shared date utility and updates three unrelated snapshots. If the owner cannot explain those extra edits, stop and trim the diff before merge.
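
That trim step is easier when the ticket states which paths the change is expected to touch. The sketch below assumes the owner writes down a few allowed path patterns up front. The file names are invented to mirror the billing example.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, allowed_patterns):
    """Return changed files that match none of the patterns agreed for the task."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


if __name__ == "__main__":
    changed = [
        "billing/invoice_total.py",
        "tests/billing/test_invoice_total.py",
        "utils/dates.py",
        "web/__snapshots__/nav.snap",
    ]
    # Files the owner must either explain or remove from the PR.
    print(out_of_scope(changed, ["billing/*", "tests/billing/*"]))
```

Anything the function returns is a question for the owner: explain the edit or cut it.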

PR notes should be specific. "AI assisted" says almost nothing. "AI drafted the first test file and suggested the cache invalidation change; the developer rewrote the final handler" is much better. That gives reviewers context without turning the PR into a confession.

If one of these checks fails, do not merge and promise to clean it up later. Fix the notes, cut unrelated changes, add the missing test, or get final review first. That usually takes minutes and saves hours later.

Start small and adjust

Do not rewrite your whole engineering policy in one pass. Start with one repo, or one team that already uses AI often. That gives you real pull requests to study and keeps mistakes cheap.

A small pilot tells you more than a long debate. If one team writes backend APIs every week, begin there. If another team works on billing or auth, leave that area out until the rules feel stable.

Your first draft can stay simple:

define which kinds of PRs the assistant can draft on its own
mark the kinds of changes that need a human to finish the work
name who approves the PR and who owns bugs after merge
write down the few checks every PR must pass before approval

Run that version for two weeks. Then review a handful of PRs, not just the clean ones. Look at the cases where the assistant guessed wrong, changed too much, or created extra work for reviewers.

Tighten the weak spots you find. If reviewers keep asking for missing tests, make tests mandatory for assistant-written code. If ownership still feels blurry, add one rule that says a named engineer takes responsibility before merge. Short rules usually age better than long ones.

Track a few numbers after rollout. Rework rate is useful. Review time is useful too. Production issues matter most, because a fast PR process is not worth much if defects show up a day later. Even a simple weekly sheet can show whether the policy helps or just adds noise.
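
A weekly sheet like that can come from a few lines of code over exported PR data. This sketch rests on assumptions: the dictionary keys are hypothetical, and "rework rate" is defined here as the share of merged PRs that needed more than one review round, which is only one reasonable definition.

```python
from statistics import median


def weekly_summary(prs):
    """Summarize merged PRs; returns None when nothing was merged.

    Each PR is a dict with hypothetical keys: merged (bool), review_rounds (int),
    review_hours (number), production_issues (int).
    """
    merged = [pr for pr in prs if pr["merged"]]
    if not merged:
        return None
    reworked = sum(1 for pr in merged if pr["review_rounds"] > 1)
    return {
        "merged": len(merged),
        "rework_rate": reworked / len(merged),
        "median_review_hours": median(pr["review_hours"] for pr in merged),
        "production_issues": sum(pr["production_issues"] for pr in merged),
    }
```

Comparing the same numbers week over week matters more than any single value.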

Most teams do not need a perfect policy. They need one people can follow on a busy Tuesday. Clear handoff points, clear ownership, and a short review loop are enough to start.

If your team wants outside help setting those rules, Oleg Sotnikov at oleg.is works as a Fractional CTO and startup advisor on practical AI-first development workflows. That can help when you need a sane process for code review, handoffs, and automation without turning the team into process police.

Frequently Asked Questions

What kinds of pull requests should AI draft?

Let AI draft small, local changes that stay inside a clear boundary. Good examples include trimming input, adding a null check, fixing a simple bug in one module, or writing tests for one function. If the task fits in a few sentences and one person can test it quickly, AI can take the first pass.

When does a human need to finish the work?

A human should step in when the change can affect money, access, private data, or shared systems. That includes auth, billing, permissions, migrations, deletion logic, and common libraries. If the task feels unclear or the bug report points in two directions, stop the AI-only flow and let a developer choose the path.

Do we really need to say that AI helped in the PR?

Yes. Note it in the PR description in plain language so reviewers know where to slow down. That helps them check assumptions, scope, and odd logic instead of treating the change like normal hand-written code.

Who owns an AI-assisted PR after merge?

One human owns the result every time. That person answers review comments, explains every changed file, fixes issues, and carries the post-merge responsibility if the code breaks. AI can draft code, but it does not own outcomes.

What should reviewers look for in AI-written code?

Reviewers should check intent, scope, tests, and risky assumptions. They should not become the first person to reason through unfamiliar logic line by line because the author skipped that work. If the owner cannot explain the change clearly, send it back.

Are green tests enough to trust AI-generated changes?

No. Passing tests only show that the current test suite did not catch a problem. AI can still add bad assumptions, miss ugly paths, or change more than the ticket asked for, so a human still needs to read the code and click through the real behavior.

Should AI ever use secrets or production data?

Keep the assistant away from secrets, live credentials, customer records, private tickets, and raw logs with personal data. Use mocks, redacted examples, and test fixtures instead. If the model needs production data to guess the answer, the team should solve the task another way.

Do high-risk PRs need more than one approver?

For auth, payments, permissions, infrastructure, migrations, or data deletion, add a second approver. One reviewer can miss a risky side effect in those areas. A second set of eyes helps catch issues that style review will not find.

What should the PR notes include?

Write what the assistant did, what the human rewrote, and any assumptions that still matter. A short note like "AI drafted the first tests and suggested the handler change; the developer rewrote validation and null handling" gives reviewers useful context. Vague notes like "AI assisted" do not help much.

How can a team adopt these rules without slowing down?

Start with one repo or one team that already uses AI often. Keep the rules short: mark risk early, let AI draft low-risk work, require one human owner, and ask for final review on the finished diff. After a couple of weeks, review real PRs and tighten only the parts that caused trouble.