Oct 21, 2025·8 min read

Code review bottlenecks: 3 metrics that show process drag

Spot code review bottlenecks by tracking wait time to first comment, merge age, and reopen rate so you can fix slow reviews early.

Why reviews get stuck

Most review bottlenecks start with silence, not disagreement. Someone opens a pull request, expects a quick reaction, and gets nothing for half a day. That gap breaks focus fast. By the time comments arrive, the author is already in another task and has to rebuild context.

That is why review work often feels slower than the ticket board suggests. A team can look busy and still leave pull requests untouched. The problem usually is not one slow person. It is a weak routine: nobody owns the first response, reviews arrive in bursts, or changes are too large to scan quickly.

Total cycle time does not show this very well. It blends coding, waiting, review, fixes, and merge into one number. If a change spent two days waiting for the first comment and only twenty minutes getting feedback, cycle time hides the part that actually hurt.

Three simple metrics make the delay visible: wait time to first comment, merge age, and reopen rate. Each one points to a different habit. Long wait time usually means reviews start too late. High merge age often means work sits in queues, bounces between people, or grows too big before anyone feels ready to approve it. A rising reopen rate can mean reviews are rushed, testing is thin, or the team merges before concerns are cleared.

Used together, these numbers keep the focus on team behavior instead of blame. If the same delays show up across many pull requests, the process needs work. That is a better conversation than asking who was slow on Tuesday.

The three numbers worth tracking

Teams often collect plenty of pull request metrics and still miss the delay people feel every day. These three usually tell the story faster than a dashboard full of charts.

Wait time to first comment is the gap between opening a pull request and getting the first human response. That response can be a review note, a question, or a clear approval. This metric shows how long work sits untouched. It can point to overloaded reviewers, poor handoffs, or fuzzy ownership. It does not tell you whether the review was any good.

Merge age is the full time from pull request open to merge. It includes waiting, review rounds, fixes, test runs, and pauses between them. This shows how long code stays in limbo before it reaches the main branch. It can reveal slow handoffs or oversized pull requests, but it cannot explain the cause by itself.

Reopen rate tracks how often a pull request needs another pass after people think it is nearly done. That can happen when requested changes trigger a new review, checks fail after updates, or fresh issues appear late. This metric points to rework. It can hint at unclear feedback, weak test coverage, or rushed approvals. It cannot prove that one reviewer or one author caused the problem.

Each number is limited on its own, which is why they work best together. A team might respond quickly but still take days to merge because reviews drag across too many rounds. Another team might merge fast but reopen often because they approve too early.

The pattern gets clearer when you read the metrics as a set. If first comment time is low, merge age is high, and reopen rate keeps rising, the team probably starts reviews on time but struggles to finish them cleanly. That is a process problem.
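The three definitions above can be sketched in a few lines. This is a minimal illustration, not a reporting tool: the `PullRequest` record and its field names are hypothetical and do not match any git host's schema, and the sample timestamps are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PullRequest:
    # Hypothetical record; field names are illustrative only.
    opened_at: datetime
    first_comment_at: Optional[datetime]  # first human response, bots excluded
    merged_at: Optional[datetime]
    reopened: bool  # needed another pass after it looked nearly done


def wait_to_first_comment(pr: PullRequest) -> Optional[timedelta]:
    """Gap between opening the pull request and the first human response."""
    if pr.first_comment_at is None:
        return None
    return pr.first_comment_at - pr.opened_at


def merge_age(pr: PullRequest) -> Optional[timedelta]:
    """Full time from pull request open to merge."""
    if pr.merged_at is None:
        return None
    return pr.merged_at - pr.opened_at


def reopen_rate(prs: list[PullRequest]) -> float:
    """Share of pull requests that needed another pass."""
    if not prs:
        return 0.0
    return sum(pr.reopened for pr in prs) / len(prs)


# Invented sample data for illustration.
prs = [
    PullRequest(datetime(2025, 10, 1, 9), datetime(2025, 10, 1, 10),
                datetime(2025, 10, 2, 9), False),
    PullRequest(datetime(2025, 10, 1, 14), datetime(2025, 10, 2, 11),
                datetime(2025, 10, 4, 14), True),
]

for pr in prs:
    print(wait_to_first_comment(pr), merge_age(pr))
print(reopen_rate(prs))
```

Returning `None` for pull requests with no comment or no merge keeps unfinished work out of the numbers instead of counting it as zero.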

How to collect the data this week

Start small. One repository and one recent month give you enough signal to spot review bottlenecks without drowning in edge cases.

Pick a repo with steady activity. Skip the quietest one, and skip the repo that caught fire last week. A normal month works best because it shows the delays your team actually lives with.

Your git host already stores most of the events you need. Pull the timestamps for when each pull request opened, when the first review comment appeared, when approvals happened, when the branch merged, and when the pull request reopened after review or merge.

Do not rely only on final status. The timeline matters. A pull request can sit untouched for two days, get one fast approval, then bounce back and forth three more times.

A spreadsheet is enough for the first pass. Give each pull request one row and track opened at, first comment at, approved at, merged at, and reopened at if it happened.

Then label each row by work type. Keep hotfixes separate from normal feature work. A hotfix often gets reviewed in minutes because production is broken, and that can make your usual review flow look healthier than it is.
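If you would rather start from a file than type into a spreadsheet, the one-row-per-pull-request layout might look like the sketch below. The column names, the `work_type` labels, and the sample rows are assumptions made for illustration; the real values come from your git host's event history.

```python
import csv
import io

# Assumed column names, one row per pull request.
COLUMNS = [
    "pr_id", "work_type", "opened_at", "first_comment_at",
    "approved_at", "merged_at", "reopened_at",
]

# Invented rows; an empty string means the event never happened.
rows = [
    {"pr_id": "101", "work_type": "feature",
     "opened_at": "2025-09-02T09:15", "first_comment_at": "2025-09-02T13:40",
     "approved_at": "2025-09-03T10:05", "merged_at": "2025-09-03T10:30",
     "reopened_at": ""},
    {"pr_id": "102", "work_type": "hotfix",
     "opened_at": "2025-09-04T16:02", "first_comment_at": "2025-09-04T16:09",
     "approved_at": "2025-09-04T16:20", "merged_at": "2025-09-04T16:21",
     "reopened_at": ""},
]


def to_csv(records: list[dict[str, str]]) -> str:
    """Serialize the rows as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Keep hotfixes out of the baseline for normal review flow.
normal_work = [row for row in rows if row["work_type"] != "hotfix"]

print(to_csv(normal_work))
```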

Do not stop at averages. One giant pull request can wreck the number for the whole month. Use the median as your baseline, then add percentiles like p75 or p90 so you can see where the delay starts to feel painful.

A median wait time to first comment of three hours may sound fine. If p90 is twenty-nine hours, about one pull request in ten still waits more than a full day for attention. That is where frustration starts.
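A small sketch shows how the mean, the median, and the upper percentiles can diverge on the same data. The wait times below are invented, and the `percentile` helper uses the nearest-rank method, which is only one of several common percentile conventions; a spreadsheet or statistics library may interpolate and give slightly different values.

```python
import math
from statistics import median


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Invented wait times to first comment, in hours, for ten pull requests.
wait_hours = [0.5, 1, 2, 2.5, 3, 3, 4, 6, 29, 40]

summary = {
    "mean": sum(wait_hours) / len(wait_hours),
    "median": median(wait_hours),
    "p75": percentile(wait_hours, 75),
    "p90": percentile(wait_hours, 90),
}

for name, hours in summary.items():
    print(f"{name}: {hours:.1f} h")
```

In this sample two slow pull requests pull the mean far above the median, while p90 shows how long the slow tail actually waits.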

Before you trust the report, open a handful of pull requests and read the event history yourself. Teams often find messy details on the first pass: bot comments counted as human review, approvals that happened after a force push, or reopened pull requests that were really follow-up fixes.

That quick manual check saves time later. Bad labels and bad timestamps turn a neat chart into a false story.
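One of those messy details, bot comments counted as human review, is easy to guard against in code. The sketch below is an assumption-heavy illustration: the `[bot]` suffix rule and the account names in `KNOWN_BOTS` are hypothetical, and ignoring the author's own comments is a choice made here, not something your git host does for you.

```python
from datetime import datetime
from typing import Optional

# Assumed naming rule and hypothetical account names; adjust to your own setup.
BOT_SUFFIX = "[bot]"
KNOWN_BOTS = {"ci-runner", "dependency-updater"}


def is_bot(author: str) -> bool:
    """Treat an account as automated if it matches the assumed rules above."""
    return author.endswith(BOT_SUFFIX) or author in KNOWN_BOTS


def first_human_comment(comments: list[dict], pr_author: str) -> Optional[datetime]:
    """Earliest comment that is neither automated nor written by the author."""
    human_times = [
        comment["at"]
        for comment in comments
        if not is_bot(comment["author"]) and comment["author"] != pr_author
    ]
    return min(human_times, default=None)


# Invented event history for one pull request.
comments = [
    {"author": "coverage[bot]", "at": datetime(2025, 9, 2, 9, 16)},
    {"author": "maria", "at": datetime(2025, 9, 2, 9, 30)},  # the author
    {"author": "jonas", "at": datetime(2025, 9, 2, 14, 5)},
]

print(first_human_comment(comments, pr_author="maria"))
```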

What wait time to first comment reveals

The first hours after a pull request opens tell you a lot. If an author posts work and hears nothing for half a day, the problem often is not the code. The team has a queue problem.

Long silent gaps create waste fast. Authors switch to other tasks, lose context, and come back cold. Reviewers lose momentum too. A change that could have moved in one short pass now needs more explanation, more checking, and often another round of edits.

This metric gets more useful when you break it up by time of day. Many teams think reviews are slow in general, but the delay often clusters around handoff hours. Work opened in the morning may get a comment within thirty minutes. Work opened late in the afternoon may wait until the next day. That points to staffing and scheduling, not weak effort.
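Breaking the metric up by time of day can be sketched as a simple grouping step. The bucket names and cutoff hours below are assumptions, the timestamps are taken to be in the team's local time, and the sample waits are invented.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def time_bucket(opened_at: datetime) -> str:
    """Assign a pull request to a part of the day, using assumed cutoffs."""
    if opened_at.hour < 12:
        return "morning"
    if opened_at.hour < 15:
        return "midday"
    return "late afternoon"


def median_wait_by_bucket(samples: list[tuple[datetime, float]]) -> dict[str, float]:
    """Median wait to first comment, in hours, per time-of-day bucket."""
    groups: dict[str, list[float]] = defaultdict(list)
    for opened_at, wait_hours in samples:
        groups[time_bucket(opened_at)].append(wait_hours)
    return {bucket: median(waits) for bucket, waits in groups.items()}


# Invented samples: (opened at, hours until first comment).
samples = [
    (datetime(2025, 10, 6, 9, 30), 0.5),
    (datetime(2025, 10, 7, 10, 15), 1.0),
    (datetime(2025, 10, 6, 16, 45), 17.0),
    (datetime(2025, 10, 7, 17, 10), 16.0),
]

print(median_wait_by_bucket(samples))
```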

Reviewer load matters too. If one person gives the first comment on most changes, that person becomes the gate. You will see it in the data. Their queue grows, everyone else waits, and routine work piles up behind them. Spread first-pass review across more people when you can, especially for low-risk changes.
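To see whether one person has become the gate, count who leaves the first comment. A minimal sketch, with invented reviewer names:

```python
from collections import Counter


def first_comment_share(first_commenters: list[str]) -> dict[str, float]:
    """Fraction of pull requests on which each person left the first comment."""
    counts = Counter(first_commenters)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.most_common()}


# Invented data: one entry per pull request, naming the first human commenter.
first_commenters = ["ana", "ana", "ben", "ana", "ana", "chen", "ana", "ana"]

for name, share in first_comment_share(first_commenters).items():
    print(f"{name}: {share:.0%}")
```

What share counts as too much is a judgment call for the team; the point of the tally is to make the concentration visible.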

Set a simple target. Routine changes should get a first comment within one or two hours during working time. Medium changes should get a first comment the same half day. Larger or risky changes should still get a quick acknowledgment, even if the deeper review comes later.

That last part matters. A short reply like "I will review this after lunch" is not full feedback, but it tells the author the work is moving.

If your team tracks only one review metric at first, use this one. Merge age can stay low for the wrong reasons, and reopen rate shows trouble later. Wait time to first comment shows delay right where it starts.

How to read merge age without blaming people

Merge age tells you how long a pull request stays open. It sounds simple, but it mixes several kinds of delay into one number. If you treat it as a score for individual speed, you will blame the wrong person fast.

Start by separating review time from coding time after feedback. A pull request that waits two days for the first review and then gets merged in an hour tells a different story from one that gets comments quickly but sits four more days while the author rewrites tests. Both show up as a single multi-day merge age on paper. They do not have the same problem.

Size matters too. A thirty-line fix and a two-thousand-line refactor should not share the same expectation. Group small pull requests apart from large ones, or the big ones will make the whole team look slow. Medium changes often hide the real drag because people postpone them just long enough for the queue to grow.
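Grouping by size before computing merge age might look like the sketch below. The thresholds of 50 and 400 changed lines are assumptions picked for illustration, not a recommendation from this article, and the sample data is invented.

```python
from collections import defaultdict
from statistics import median

# Assumed thresholds in changed lines; tune them to your own codebase.
SMALL_MAX = 50
MEDIUM_MAX = 400


def size_bucket(lines_changed: int) -> str:
    """Classify a pull request by the number of lines it changes."""
    if lines_changed <= SMALL_MAX:
        return "small"
    if lines_changed <= MEDIUM_MAX:
        return "medium"
    return "large"


def median_age_by_size(prs: list[tuple[int, float]]) -> dict[str, float]:
    """Median merge age in days for each size bucket."""
    groups: dict[str, list[float]] = defaultdict(list)
    for lines_changed, age_days in prs:
        groups[size_bucket(lines_changed)].append(age_days)
    return {bucket: median(ages) for bucket, ages in groups.items()}


# Invented samples: (lines changed, merge age in days).
prs_by_size = [(30, 0.5), (45, 2.0), (300, 1.5), (2000, 6.0)]

print(median_age_by_size(prs_by_size))
```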

Also watch what happens after approval. Some pull requests get the green light, then sit for a day or two before anyone merges them. That usually points to workflow friction, not review friction. The branch may no longer merge cleanly, tests may fail and nobody owns the fix, release rules may block merges until a certain window, or nobody may know who should press the final button.

This is where bottlenecks often hide. The review is done, but the work still feels stuck.
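The splits described in this section can be sketched by cutting merge age into three phases. The phase names are made up for this example, and the function assumes you have already picked one first-review timestamp and one final-approval timestamp per pull request.

```python
from datetime import datetime, timedelta


def split_merge_age(
    opened_at: datetime,
    first_review_at: datetime,
    approved_at: datetime,
    merged_at: datetime,
) -> dict[str, timedelta]:
    """Break merge age into waiting, review-and-rework, and post-approval time."""
    return {
        "waiting_for_first_review": first_review_at - opened_at,
        "review_and_rework": approved_at - first_review_at,
        "approved_to_merge": merged_at - approved_at,
    }


def largest_phase(phases: dict[str, timedelta]) -> str:
    """Name of the phase that took the most time."""
    return max(phases, key=phases.get)


# Invented timeline: a two-day wait, a short review, and a quick merge.
slow_start = split_merge_age(
    opened_at=datetime(2025, 10, 6, 9, 0),
    first_review_at=datetime(2025, 10, 8, 9, 0),
    approved_at=datetime(2025, 10, 8, 9, 45),
    merged_at=datetime(2025, 10, 8, 10, 0),
)

print(slow_start)
print(largest_phase(slow_start))
```

The three phases always add up to the full merge age, so the split shows where the time went without changing the headline number.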

Merge age works best when you use it to find friction in the path. If one team has older pull requests, ask what slows movement: handoffs, unclear ownership, flaky checks, or oversized changes. Do not turn it into a ranking board for developers. People will game the number, split work badly, or rush reviews.

What reopen rate says about review quality

A reopened pull request often means the review looked finished before the work was actually ready. That can happen after approval, or even after merge, when someone finds a bug, a missed edge case, or a change that broke another part of the app.

Count both cases. If work returns to review after approval, that is a signal. If work returns after merge, that signal is louder because the cost is higher and the fix usually interrupts other work.

One number alone is still not enough. Split reopen rate into two buckets: work reopened because bugs showed up later, and work reopened because the review process missed something.

The second group tells you more about review quality. If people keep reopening work because tests were thin, requirements were vague, or reviewers noticed major issues only at the end, you are not looking at random mistakes. You are looking at a pattern.

High reopen rates usually come from familiar causes. Tests cover the happy path but miss failure cases. The ticket leaves out an important rule. A reviewer focuses on style first, then asks for a large design change after two rounds of comments. That kind of late feedback slows everything down and creates the drag teams feel every week.

Use examples with the metric. If your reopen rate is nine percent, pull five reopened items and read what happened. You may find that three came from unclear acceptance criteria, one came from a missed migration step, and one came from a real production bug. That says much more than "reviews need to improve."

If most reopen cases come from bugs after merge, look at test depth and release checks. If most come from process issues before merge, look at review timing, review scope, and whether the author shared enough context early.
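Once someone has read the reopened items and written down a reason for each, the two-bucket split is a short tally. The mapping from reason to bucket below is a hypothetical labeling, including the choice to file a missed migration step under the review process; your team may sort the same reasons differently.

```python
from collections import Counter

# Hypothetical mapping from a plain-words reason to one of the two buckets.
BUCKET_BY_REASON = {
    "unclear acceptance criteria": "review process missed something",
    "missed migration step": "review process missed something",
    "production bug": "bug showed up later",
}


def bucket_reopens(reasons: list[str]) -> dict[str, int]:
    """Count reopened pull requests per bucket; unknown reasons stay visible."""
    buckets = Counter(BUCKET_BY_REASON.get(reason, "unlabeled") for reason in reasons)
    return dict(buckets)


# The five sampled items from the example above.
sampled_reasons = [
    "unclear acceptance criteria",
    "unclear acceptance criteria",
    "unclear acceptance criteria",
    "missed migration step",
    "production bug",
]

print(bucket_reopens(sampled_reasons))
```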

A simple team example

A five-person product team tracked pull request metrics for two weeks. They had three engineers, one product manager, and one design lead. The team shipped often, but people still felt review drag every few days.

In week one, small changes moved well. A copy fix, a button color change, and a small test update got a first comment in twenty to forty-five minutes. Larger changes told a different story. One billing update and one search refactor sat for sixteen to twenty-one hours before anyone left the first review note.

That gap mattered because the team treated larger changes as "manager approval required." The manager joined customer calls most afternoons, so those pull requests waited even when other engineers had time to review them.

By the end of week two, the pattern was clear. Small changes still got fast first comments. Larger changes waited close to a full workday. Average merge age rose from 1.3 days to 2.4 days. Four pull requests sat in approved status because the manager had not yet clicked merge. Three Friday merges reopened on Monday after missed edge cases showed up.

The reopen rate got worse for a simple reason. On Friday afternoon, people wanted to clear the board before the weekend. Reviews got shorter, testers had less time, and two bug fixes had to come back for another round on Monday morning. One engineer put it well: the work felt done on Friday, then half-done again on Monday.

The team made one small change for the next sprint. Any pull request over a set size could get final approval from a rotating engineer, not only the manager. They did not change the review tool or add new meetings.

The result to watch was straightforward. If the change worked, large pull requests should get a first comment sooner, and merge age should be the first number to fall. If Friday reopen rate stayed high, the team would know review delay was only part of the problem.

Mistakes that distort the numbers

A dashboard can look precise and still tell the wrong story. When teams track review bottlenecks, the biggest errors usually come from messy inputs and unfair comparisons.

Start with the obvious cleanup. Bots should not count as reviewers. Draft pull requests should not sit in the same bucket as work that is ready for feedback. Abandoned pull requests need their own status, or they will quietly inflate merge age and make the team look slower than it is.

A few checks prevent most bad data:

  • Remove bot comments and automated updates.
  • Exclude drafts until the author marks them ready.
  • Close abandoned pull requests after a clear rule, such as 14 or 30 days.
  • Track review rules beside the metric, such as one approval versus two.
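The draft and abandonment checks above can be applied as a labeling step before any metric is computed. This is a sketch under assumptions: the dictionary keys are hypothetical, the 30-day cutoff is one of the two values mentioned in the list, and "last activity" is whatever your team decides counts as activity.

```python
from datetime import datetime, timedelta

ABANDON_AFTER = timedelta(days=30)  # the list above suggests 14 or 30 days


def classify(pr: dict, now: datetime) -> str:
    """Label a pull request so drafts and abandoned work stay out of the metrics."""
    if pr["merged_at"] is not None:
        return "merged"
    if pr["is_draft"]:
        return "draft"
    if now - pr["last_activity_at"] > ABANDON_AFTER:
        return "abandoned"
    return "open"


def eligible_for_metrics(prs: list[dict], now: datetime) -> list[dict]:
    """Keep only merged pull requests and open ones that are ready for review."""
    return [pr for pr in prs if classify(pr, now) in {"merged", "open"}]


# Invented records with hypothetical keys.
now = datetime(2025, 10, 21)
sample_prs = [
    {"id": 1, "is_draft": False, "merged_at": datetime(2025, 10, 3),
     "last_activity_at": datetime(2025, 10, 3)},
    {"id": 2, "is_draft": True, "merged_at": None,
     "last_activity_at": datetime(2025, 10, 20)},
    {"id": 3, "is_draft": False, "merged_at": None,
     "last_activity_at": datetime(2025, 8, 1)},
    {"id": 4, "is_draft": False, "merged_at": None,
     "last_activity_at": datetime(2025, 10, 19)},
]

print([pr["id"] for pr in eligible_for_metrics(sample_prs, now)])
```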

Comparison creates the next problem. One team may require a full test pass, two reviewers, and a product check before merge. Another team may merge after one comment from a teammate. Their numbers will never match, and they should not. If you compare them anyway, you reward the looser process, not the better one.

Large or risky changes also skew the picture. A security fix, schema change, or core payment update should take longer than a typo fix. If managers punish people for higher merge age on this kind of work, engineers learn the wrong lesson. They split work in awkward ways, avoid hard changes, or rush reviews.

The worst mistake is chasing a lower number after quality starts to slip. Faster first comments mean little if reopen rate climbs because reviewers miss real problems. A quick review that sends the same pull request back three times is not efficient.

Time zones matter too. A team spread across three time zones will show different wait patterns than a team in one office. The same goes for part-time reviewers or founders who review between meetings. If your data ignores working hours, you may label a normal delay as a process problem.
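One way to keep working hours in the data is to count only the hours that fall inside a working window. The sketch below assumes a Monday to Friday, 09:00 to 17:00 schedule and naive timestamps already converted to the reviewer's local time; both are simplifications, and holidays are ignored.

```python
from datetime import datetime, time, timedelta

# Assumed working window in the reviewer's local time.
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def working_hours_between(start: datetime, end: datetime) -> float:
    """Hours between start and end that fall inside Monday to Friday working time."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            window_start = datetime.combine(day, WORK_START)
            window_end = datetime.combine(day, WORK_END)
            overlap_start = max(start, window_start)
            overlap_end = min(end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


# Opened Friday at 16:00, first comment Monday at 10:00.
opened = datetime(2025, 10, 17, 16, 0)
first_comment = datetime(2025, 10, 20, 10, 0)

raw_hours = (first_comment - opened).total_seconds() / 3600
print(raw_hours, working_hours_between(opened, first_comment))
```

The raw gap here is 66 hours, but only two of them fall inside working time, which is the difference between a normal weekend and a process problem.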

Use the numbers, but keep the context attached. Otherwise the metric becomes the bottleneck.

A quick weekly check

A good weekly review takes about fifteen minutes. You do not need a dashboard meeting or a long debate. You need a small habit that makes bottlenecks easy to spot before they turn into team friction.

Start with median wait time to first comment. If that number jumps, reviewers are overloaded, distracted, or unsure who should go first. A few slow reviews do not tell you much, but a rising median usually means the team changed something in its routine.

Then check median merge age, but split it by pull request size. A two-day merge age for tiny changes feels slow. The same number for a large refactor may be normal. If you mix all pull requests together, the number gets muddy fast.

Next, count reopened pull requests from the past week and write down the reason in plain words. "Missed test case," "unclear requirement," and "merge conflict after long wait" tell you more than the raw count.

After that, read three outliers from start to finish. Pick one that waited too long, one that aged badly, and one that got reopened. Read the timeline, the comments, and the final fix. Patterns show up quickly. Maybe nobody reviewed small changes on Friday. Maybe one service lacks a clear owner. Maybe large pull requests keep hiding simple mistakes.

Do not leave the check with five action items. Pick one change for the next week. Rotate the first reviewer. Ask for smaller pull requests. Set a rule that someone leaves a first comment within a few working hours. Then check the same three numbers again next week.

If your team uses GitLab or GitHub, this review is easy to do by hand at first. The point is not perfect reporting. The point is to notice drag early, test one fix, and see if the numbers move.

What to do next

Start with one shared report. If it takes ten minutes to decode, people will ignore it. Put only three numbers on it: wait time to first comment, merge age, and reopen rate. That is enough to spot review bottlenecks without turning review into reporting work.

Keep the report short and visible inside the team. A lead, manager, or founder should be able to read it in five minutes and see where work is waiting and where review load is piling up.

Small rules usually work better than new meetings. For routine pull requests, set a clear review window such as same day or within twenty-four hours. That gives reviewers a target and helps authors know when a follow-up makes sense. Move review work away from the busiest reviewer before you add more process. Ask authors to flag urgent or high-risk changes early. Then check the same three numbers again after two weeks.

Reviewer load matters more than many teams admit. If one senior engineer reviews half the code, merge age will rise even if everyone is working hard. Spread reviews across the team, trim approval rules where you can, and make it normal for more than one person to handle routine changes.

If the numbers do not move, the problem may sit deeper in the process. Oleg at oleg.is works as a Fractional CTO and helps teams review code ownership, tooling, and delivery flow, which can be useful when a small or mid-sized team has outgrown its current habits.

Two weeks is long enough to learn something useful. If wait time drops but reopen rate climbs, reviews got faster but weaker. If merge age falls and reopen rate stays steady, keep the change and make it part of the workflow.