Jan 24, 2026·7 min read

AI code review starts before the pull request stage

AI code review works better when prompts set repo rules, task limits, and test expectations before coding starts. Use a simple setup that cuts repeat fixes.

Why review pain starts earlier

Most review pain starts before anyone opens a pull request. It begins with a loose prompt like "add user search" or "clean up the billing flow." The model fills in the blanks on its own, and that is where the repeated cleanup begins.

Reviewers then fix the same things over and over. They rename files to match the repo style, move code into the right folder, and ask for tests that should have shipped with the change. None of that work is hard, but it steals time from the part of review that matters.

By the time the pull request appears, the mess is already visible in the code, file names, and test gaps. Review can catch those choices, but catching them late is always slower than preventing them.

A lot of teams keep repo rules in people's heads. One engineer knows where shared helpers belong. Another knows how service names should look. A reviewer knows every bug fix needs a test. If those rules never make it into the prompt, the model can only guess.

That guesswork creates mixed style and extra scope. A vague request for one feature often turns into unrelated refactors, new patterns, or file placement that does not match the rest of the repo. Now the reviewer has to answer two questions at once: does the feature work, and does this change belong in the codebase at all?

Small startup teams feel this fast. When one senior engineer or fractional CTO carries most of the standards, every weak prompt turns into another round of avoidable comments. Ten minutes spent making the request clear can save an hour of review cleanup and back-and-forth.

What the prompt must include

A coding prompt needs more than a feature request. If you leave gaps, the model fills them with habits from some other repo. That is why review quality often starts with prompt quality, not with comments on the pull request.

Start with scope. Name the exact folders, files, and modules the task can change. If the model should only edit src/billing/* and tests/billing/*, say so. If shared/ is off limits, say that too. Small boundaries save time because the model stops making unrelated edits that reviewers later have to undo.

Then pin down your repo rules. Most teams have local patterns that only make sense inside that codebase: file names, import order, error handling, test layout, and how methods or components get named. Put those rules in the prompt, even if they feel obvious. The model does not know what "the usual pattern" means unless you spell it out.

It also helps to say what the repo should not use. Old libraries, helper functions you plan to remove, and patterns the team already moved away from tend to reappear in generated code. A short ban list prevents a lot of repeat cleanup.

Keep four things in every prompt:

- which files and folders the task may touch
- which naming, structure, and style rules the repo follows
- which libraries, helpers, and patterns the model must avoid
- what "done" means before coding starts

That last point matters most. "Done" should mean more than "the code runs." Say whether the task needs unit tests, updated types, docs, or specific edge-case handling. If the work is only complete when lint passes, old tests still pass, and new tests cover the change, write that in the prompt before the model writes a single line.

A good prompt feels a little strict. That is usually a good sign. Reviewers should spend their time on product decisions and code tradeoffs, not on telling the same model again to stop touching unrelated files or bringing back old code.

Set task boundaries before coding

Most messy pull requests start with a task that was too wide. If you ask a model to "improve the auth flow," it may touch API handlers, session logic, UI text, and logging in one pass. Reviewers then waste time sorting the real change from side edits.

Give the model one job at a time: one feature, one bug, or one refactor. It may feel slower at first, but it makes review much cleaner. Small scope gives the reviewer a simple question: did this change do the one thing we asked for?

A good prompt sets hard edges. Name the files, folders, or modules the model can edit. Name the layer too if that helps, such as "service layer only" or "frontend component only." When the model knows where it can work, it is less likely to wander into unrelated code.

The no-touch list matters just as much. Tell the model what it must leave alone, even if it thinks a cleanup would help. That often includes the database schema, shared interfaces, public function names, dependency versions, config files, or formatting outside the edited files.

This cuts out a common review headache. A reviewer may approve a bug fix in five minutes, then spend twenty more spotting a renamed method, a moved file, and a quiet config tweak that nobody asked for.

Uncertainty needs a rule too. Tell the model to stop and ask when it hits missing context, conflicting patterns, or a change that crosses the boundary. If a small API fix suddenly needs a migration or a wider refactor, that is a new task.

Teams with lean staffing benefit most from this. When a few people handle product, code, and infrastructure at once, tight boundaries save real time. Ten small changes are easier to review, test, and roll back than one oversized pull request.

Keep the boundary visible in the prompt:

- one concrete outcome
- allowed edit area
- no-touch area
- stop-and-ask rule

Strict limits help the model write code that fits the task, and they help reviewers focus on the change instead of cleaning up after it.

State test expectations up front

Most review churn around tests has nothing to do with skill. It happens because the prompt never said what proof the change needs. The model writes code, skips tests, or adds the wrong kind, and the pull request turns into cleanup work.

A better prompt names the tests before coding starts. That gives the model a finish line. It also makes review much faster, because the reviewer can check whether the change met a clear test plan instead of guessing what "done" means.

Be specific about tests

Ask for the exact tests to add or update. Use real names when you can. "Update auth service unit tests" is better than "add tests." "Add checkout timeout integration test" is better still.

Say when a unit test fits and when an integration test fits. Unit tests work for small logic changes inside one function, class, or module. Integration tests fit when the change touches a database, API call, queue, file storage, or anything that crosses system boundaries.

Do not stop at "add coverage." Ask for, at minimum, one test for the main path, one for a failure path, the command used to run the tests, and the expected result after the run.

That last part matters more than many teams think. If the prompt asks for the test command and expected result, the model is less likely to hand over unverified code. A simple line like "Run pytest tests/payments/test_retry.py and expect all tests to pass" removes a lot of back-and-forth.
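One way to make "report the exact command and the result" concrete is a small wrapper that runs the command and returns both. This is a minimal sketch, not a prescribed tool: `run_and_report` and the shape of its return value are assumptions made for illustration.

```python
import shlex
import subprocess
import sys


def run_and_report(command: list[str]) -> dict:
    """Run a test command and return what a reviewer needs to see.

    Hypothetical helper: the name and the report fields are illustrative.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": shlex.join(command),      # the exact command, copy-pasteable
        "exit_code": result.returncode,      # 0 means the command succeeded
        "passed": result.returncode == 0,
    }
```

Calling `run_and_report([sys.executable, "-m", "pytest", "tests/payments/test_retry.py"])` would produce a report for the example command above, assuming pytest is installed and that path exists in the repo.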

A startup repo makes this easy to picture. Say the task is to add retry logic for failed webhook delivery. A unit test should check that the retry counter increases and stops at the set limit. An integration test should check that the app retries after a real failed delivery attempt and records the error correctly. One failure path could cover a bad response from the remote service.
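The unit test described above could look roughly like this. It is a sketch under stated assumptions: `deliver_with_retry` is a hypothetical stand-in for the retry logic, `send` is an assumed callable that returns True on success, and the tests are written in pytest style.

```python
from typing import Callable


def deliver_with_retry(send: Callable[[], bool], max_attempts: int = 3) -> int:
    """Call send() until it succeeds or max_attempts is reached.

    Returns the number of attempts made. Hypothetical helper for illustration.
    """
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if send():
            break
    return attempts


def test_retry_stops_at_limit():
    # Failure path: the remote service never accepts the webhook.
    calls = []

    def always_fails() -> bool:
        calls.append(1)
        return False

    assert deliver_with_retry(always_fails, max_attempts=3) == 3
    assert len(calls) == 3  # the counter stopped at the set limit


def test_retry_stops_after_success():
    # Main path: the first attempt fails, the second succeeds.
    outcomes = iter([False, True])
    assert deliver_with_retry(lambda: next(outcomes), max_attempts=5) == 2
```

The integration test is left out on purpose, because it depends on how the app records delivery errors, which the prompt would need to spell out.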

If your team uses one stack across many repos, write this rule once and reuse it: use the repo's current test framework, cover the main path and one failure path, and report the exact command run. That small habit cuts a surprising amount of review noise.

Write the prompt in five parts

A good prompt reads like a small job ticket, not a vague request to "fix" or "improve" something. If you want less cleanup later, give the model five things in a fixed order before it writes a single line.

1. Start with the user problem in one sentence. Keep it concrete. For example: users lose draft comments after a timeout, and they need to get them back without changing how the editor feels.
2. Add repo rules pulled from real code. Mention patterns the repo already uses, such as thin handlers, typed errors, current naming style, and how tests are named. A short example from existing code helps more than a general rule.
3. Define the scope and the no-go areas. Say what the model can touch, what it must leave alone, and which edge cases it has to cover. If the task is about draft recovery, say not to change auth flow, billing logic, or unrelated UI.
4. List the test work and acceptance checks. Name the tests you expect, the behavior that must stay the same, and the failure cases that must pass. If a change needs one unit test and one integration test, say that up front.
5. Ask for a short plan before any code. A four-step or five-step plan is enough. It should include open questions, likely files or modules to change, and any conflict with repo rules.

The order helps because each part narrows the next one. The user problem sets the goal. Repo rules shape the style. Scope cuts off side quests. Tests define done. The short plan gives you one cheap chance to stop a bad approach before it turns into a noisy diff.
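If a team wants the five parts to always appear in the same order, a small template can enforce that. The sketch below is one possible shape, not a standard: the `TaskPrompt` class, its field names, and the section labels are all assumptions, and the path `src/drafts/*` is a made-up example.

```python
from dataclasses import dataclass


def _bullets(items: list[str]) -> str:
    """Render a list of strings as plain '- ' bullets."""
    return "\n".join(f"- {item}" for item in items)


@dataclass
class TaskPrompt:
    """Hypothetical container for the five prompt parts, in fixed order."""
    problem: str
    repo_rules: list[str]
    allowed: list[str]
    no_touch: list[str]
    tests: list[str]
    plan_steps: int = 5

    def render(self) -> str:
        sections = [
            f"Problem: {self.problem}",
            "Repo rules:\n" + _bullets(self.repo_rules),
            "Edit only:\n" + _bullets(self.allowed)
            + "\n\nDo not change:\n" + _bullets(self.no_touch),
            "Tests and acceptance checks:\n" + _bullets(self.tests),
            f"Before writing code, reply with a plan of at most {self.plan_steps} "
            "steps. List open questions, likely files to change, and any "
            "conflict with the repo rules.",
        ]
        return "\n\n".join(sections)


prompt = TaskPrompt(
    problem="Users lose draft comments after a timeout and need to get them back.",
    repo_rules=["Keep handlers thin", "Use typed errors"],
    allowed=["src/drafts/*"],  # hypothetical path
    no_touch=["auth flow", "billing logic", "unrelated UI"],
    tests=["One unit test for draft recovery", "One integration test for the timeout case"],
)
print(prompt.render())
```

Keeping the template in code is optional. A plain text file with the same five headings works just as well, as long as nobody can skip a section without noticing.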

Teams that work lean with AI usually do this on purpose. Five extra lines in the prompt can save a full review round, and that is often where better review starts to pay off.

A simple example from a startup repo

A small startup team wants one change in the login flow: when a user enters the wrong password, the message should be clearer and less cold. The last time they asked a model to "improve login UX," it rewrote half the form, renamed fields, changed button copy, and touched auth logic that nobody asked to touch. The pull request looked busy, but most of it was cleanup work for the reviewer.

A tighter prompt changes that.

Task: improve login error messages for failed sign-in.

Edit only:
- the web login form component
- the login API handler
- tests related to failed login states

Do not change:
- field names
- success flow
- copy style used in the rest of the app
- validation rules
- routing or session handling

Use the current tone of the product. Keep messages short and plain.
Add or update tests for wrong password, unknown email, and locked account.
If you find a larger auth issue, leave a comment in the code and do not expand the scope.

That prompt does two useful things. It sets the task boundary before the model writes a single line, and it tells the model what must stay the same. The rule against changing the app's copy style matters more than people think. Without it, a model often swaps plain product text for generic app language that feels out of place.

The reviewer now gets a small pull request. One form changes, one API handler changes, and the related tests change with them. They can read the diff in a few minutes and focus on whether the new messages make sense, not on why the model decided to rename email to username.

This is where review gets easier. The model still writes the code, but the prompt blocks the usual mess before it starts. Small scope, fixed names, and clear test rules beat reviewing a surprise rewrite every time.

Mistakes that create the same cleanup

Most repeat cleanup in pull request review starts with a prompt that leaves too much room for guessing. A one-line request can turn a 30-line fix into a 12-file diff, and then the reviewer spends time undoing side effects instead of checking the actual change.

Where prompts go wrong

The first mistake is asking the model to "improve this" or "clean this up" with no boundary. The model then picks its own job. It may rename methods, move logic, reformat files, or swap patterns that nobody asked it to touch.

Another common miss is leaving the repo map out of the prompt. If you do not name the files, folders, and layers in scope, the model reaches into helpers, shared types, or nearby API handlers. That may feel tidy in the moment, but it creates larger diffs and more risk.

Tests cause the same problem. "Add tests" sounds clear, but it is not. The model might add one happy-path test and call it done. Review gets easier when the prompt says what good coverage means for this task, such as invalid input, one failure path, and no new network calls.

A lot of messy pull requests come from silent rewrites around the target code. The model changes names, comments, or formatting in adjacent files while it works. Those edits make the diff noisy, and they hide the one change you actually need to review.

The last mistake is skipping the short plan before coding starts. A five-line plan forces the model to name the files it will touch, the tests it will add, and what it will leave alone. If the plan looks wrong, you can stop there. That saves more time than cleaning up a bad diff later.

Small teams notice this quickly. One vague prompt repeated ten times a week can waste hours in review. Clear limits may feel strict, but they give the model less room to wander and give reviewers a diff they can trust.

Quick checks before the pull request

Most pull requests need less debate than people think. They need a fast smell test. If the change breaks the prompt rules, a reviewer should catch that before reading every line.

Use a short pre-PR check:

- Compare the changed files to the task. If the prompt named one service and one test folder, the diff should stay there. Extra files often mean the model drifted past the job.
- Scan names, comments, and code patterns. New code should look like it belongs in the repo, not pasted from a different project with different habits.
- Check tests early. If the task changes behavior, the author should add or update the tests the prompt asked for. Missing tests usually mean the prompt was vague or ignored.
- Read the task and the diff together. The change should solve the stated problem and stop there. Small bonus fixes often create more review work than they save.
- Give yourself two minutes. If you still cannot explain what changed, why it changed, and how it was tested, the pull request is not ready.
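The first check in that list, comparing changed files to the task, is easy to automate. Here is a minimal sketch, assuming the prompt's allowed edit area is written as glob patterns. The function name `out_of_scope` and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return the changed files that match none of the allowed patterns.

    Note: in fnmatch-style patterns, '*' also matches '/', so
    'src/billing/*' covers nested folders under src/billing/.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical diff for a billing task that was limited to two folders.
changed = [
    "src/billing/invoice.py",
    "tests/billing/test_invoice.py",
    "shared/dates.py",
]
drift = out_of_scope(changed, ["src/billing/*", "tests/billing/*"])
print(drift)  # ['shared/dates.py']
```

The list of changed files could come from `git diff --name-only` against the base branch. An empty result does not prove the change is good, only that it stayed inside the area the prompt named.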

This part matters because reviewers get tired fast. They can forgive a rough sentence in a comment. They should not have to hunt through unrelated file changes, renamed helpers, and silent behavior changes.

A simple startup example makes this clear. If the task says "update the signup validation and add tests," but the pull request also rewrites shared utilities and renames form fields, review gets slow right away. The extra cleanup has nothing to do with the task, yet someone still has to check it.

Good review starts here, before the formal pull request review. A clean diff tells the reviewer that the prompt had real boundaries and the author followed them. A messy diff says the team will pay for the same mistake again next week.

The best pull requests feel almost boring. They stay in scope, match the repo, include the expected tests, and make sense on a quick read.

What to do next with your team

Most teams can predict the same pull request comments before they even open the diff. That is useful. It means you already have the raw material for a better prompt.

Start by turning your best prompt into a shared team template. Do not leave it in one engineer's notes or one chat thread. Put it where everyone can copy it, use it, and improve it.

A solid team template usually has four short parts:

- the task goal and what is out of scope
- repo rules the code must follow
- test expectations before anyone marks work as done
- common review mistakes to avoid

Keep your repo rules close to the code. A short file in the repository works better than a long page nobody opens. If naming, folder structure, error handling, or test style matter, write them down in plain language and keep the file easy to scan.

Then look at your last ten reviews. You will probably find the same two or three comments again and again. Maybe the model changes files outside the task, skips tests, or ignores an existing pattern. Add those repeat comments to the prompt template so the reviewer does not need to type them again next week.

One small habit helps a lot: update the template after real review pain, not during a big process meeting. If one comment appears three times, it belongs in the prompt. That keeps the system grounded in actual work.
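The "three times" rule can be applied with a simple tally. This sketch assumes someone has already reduced each review comment to a short label such as "missing tests"; the function name `repeat_comments` and the sample labels are made up for illustration.

```python
from collections import Counter


def repeat_comments(comments: list[str], threshold: int = 3) -> list[str]:
    """Return comment labels that appear at least `threshold` times."""
    counts = Counter(comment.strip().lower() for comment in comments)
    return [label for label, count in counts.most_common() if count >= threshold]


# Hypothetical labels collected from recent reviews.
recent = [
    "missing tests",
    "file outside scope",
    "Missing tests",
    "ignored existing pattern",
    "missing tests",
    "file outside scope",
]
print(repeat_comments(recent))  # ['missing tests']
```

Anything the function returns is a candidate line for the shared prompt template. The labeling step is the real work, and it stays manual in this sketch.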

Review gets much easier when the team treats prompts like shared engineering rules, not personal writing style. The goal is simple: fewer surprise changes, fewer cleanup comments, and less time wasted on avoidable fixes.

If your team needs help setting this up, outside guidance can speed it up. Oleg Sotnikov does this kind of practical Fractional CTO and startup advisory work, focused on repo rules, testing habits, and lean AI-first development. More of that work is outlined on oleg.is.

Frequently Asked Questions

Why does review pain start before the pull request?

Because the model starts guessing long before review starts. If your prompt only says "add search" or "clean up billing," it will invent scope, file placement, naming, and test work from habits that may not fit your repo. Reviewers then spend time removing side edits instead of judging the real change.

What should I put in every coding prompt?

Start with five things: the user problem, the repo rules, the exact scope, the no-touch area, and the test requirements. That gives the model enough context to write code that fits your codebase instead of a generic pattern from somewhere else.

How small should the task scope be?

Keep it tight. Ask for one feature, one bug fix, or one refactor at a time. When you give the model one concrete outcome, the reviewer can answer one simple question: did this change do the job or not?

Should I tell the model which files it can and cannot change?

Yes. Name the files, folders, or modules it may edit, and name what it must leave alone. That simple boundary stops a lot of noisy diffs, like renamed helpers, moved files, or config tweaks that nobody asked for.

How do I define done in the prompt?

Write "done" as proof, not as a feeling. Say whether the task needs unit tests, integration tests, updated types, docs, lint passing, or specific edge-case handling. If old tests must still pass, say that too.

What test expectations should I set up front?

Ask for the exact tests you expect, not just "add tests." A good default is one test for the main path, one for a failure path, plus the command the model should run and the result it should report. That makes review much faster.

Why should I ask for a plan before coding starts?

Before any code, ask for a short plan with the likely files, the tests, and any open questions. You can spot a bad approach in five lines and stop it early instead of cleaning up a messy pull request later.

What should the model do if it finds a bigger issue?

Tell it to stop and ask. If a small change suddenly needs a migration, shared interface update, or a wider refactor, treat that as a new task. That rule keeps one request from turning into a surprise rewrite.

How can reviewers catch prompt drift fast?

Use a quick smell test before deep review. Check whether the changed files match the prompt, whether the new code looks like the repo, and whether the expected tests are there. If you still cannot explain the diff in two minutes, send it back.

How do we turn repeat review comments into a team process?

Take your last few reviews and look for repeated comments. If you keep asking for the same fixes, put those rules into a shared prompt template. Over time, the template should cover scope, repo rules, test expectations, and common mistakes.