Refactor budget in sprints without constant roadmap fights
A simple way to set a refactor budget in sprints, protect feature delivery, and stop cleanup work from turning into a weekly debate.

Why this becomes a sprint fight
Every sprint starts with the same pressure: ship the thing customers can see. A new checkout step, a reporting fix, a small feature for sales. Those items are easy to defend because they have a clear owner and a clear date.
Cleanup work rarely looks that neat. Renaming a messy module, removing duplicate code, fixing flaky tests, or untangling a slow database query sounds optional when the board is already full. On paper, the product can still move forward without it, so it gets pushed.
The cost shows up a few weeks later. A change that should take half a day takes two days because the code is harder to touch. A bug slips through because nobody trusts the tests. A simple release needs extra checking because one old part of the system breaks every time. Small defects pile up, and the team keeps paying for them.
That is why this argument gets old fast. People are not really fighting about one refactor or one bug. They are fighting about time, risk, and who gets to spend part of the sprint on work that does not look urgent yet. Product wants visible progress. Engineers want to stop the codebase from getting slower and more fragile. Both sides are right.
The bigger problem is the lack of a standing rule. Without one, every cleanup task has to win its own debate. Teams end up comparing a login improvement against test cleanup, or a customer request against a dependency update, one by one. The loudest item wins.
Small teams feel this first. Two engineers can still ship a new billing option while wrestling with old payment code nobody wants to touch. But each change takes longer, reviews slow down, and surprise bugs keep appearing. The sprint looks full of feature work, yet the team gets less done each month.
What a refactor budget means
A refactor budget is a fixed share of every sprint that the team protects for cleanup work. You choose the slice once, then keep using it every cycle. It might be 10%, 15%, or 20% of capacity. The exact number matters less than the habit.
The point is simple: cleanup stops fighting features ticket by ticket. Instead of reopening the same debate every planning meeting, the team agrees in advance that part of delivery time goes to removing drag.
That reserved slice is planned work. It is not "if we finish early" time, and it is not a bucket for random nice-to-haves. When teams treat cleanup as spare time, it disappears as soon as estimates slip or a stakeholder asks for one more feature.
A real budget works like any other sprint commitment. The team estimates the work, adds it to the sprint, and finishes it with the rest of the backlog. If your sprint usually holds 40 points and you reserve 6 for cleanup, those 6 points are already taken before feature planning starts.
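The arithmetic behind that example can be sketched in a few lines. This is a minimal illustration, not a prescribed tool: the function name `split_sprint` and the choice to round to the nearest whole point are assumptions made for the example.

```python
def split_sprint(capacity_points: int, cleanup_percent: int) -> tuple[int, int]:
    """Return (cleanup_points, feature_points) for one sprint.

    Assumption: the cleanup slice is rounded to the nearest whole point,
    with halves rounded up. A team could just as well round down.
    """
    if not 0 <= cleanup_percent <= 100:
        raise ValueError("cleanup_percent must be between 0 and 100")
    # Integer math avoids floating-point surprises when rounding.
    cleanup = (capacity_points * cleanup_percent + 50) // 100
    return cleanup, capacity_points - cleanup


cleanup, features = split_sprint(40, 15)
print(f"cleanup: {cleanup} points, features: {features} points")
# prints: cleanup: 6 points, features: 34 points
```

The feature number is whatever is left after the cleanup slice is taken out, which mirrors the order the team plans in.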
That changes the conversation in a healthy way. People no longer have to prove that each refactor matters more than a feature request. The team already agreed that some work should reduce friction, fix weak spots, and make the next changes easier to ship.
This slice usually goes to a few familiar problems: code that slows every change, brittle areas that break during small updates, repeated manual steps in testing or release work, and confusing structures that keep tripping the team up.
Keep the rule short enough that everyone remembers it. "We reserve 15% of sprint capacity for cleanup" is enough. If the rule needs a long explanation, people will ignore it when the sprint gets busy.
Picking the right percentage
This works best when the number is boring and predictable. If you keep changing it, the team starts debating every sprint again, and the whole point is lost.
If you do not have a baseline yet, start with 10% to 20% of sprint capacity. That is usually enough to fix recurring friction without making the roadmap feel frozen. In a two-week sprint with ten working days, that works out to one or two days per engineer. One person can take the whole slice, or the team can spread it across everyone.
Start closer to 10% if feature delivery already slips almost every sprint. A bigger cleanup slice can add stress when the team is already missing commitments. Keep the scope tight and prove that the work reduces future drag.
Move toward 15% if the team still ships, but small messes keep slowing them down. Go closer to 20% when bugs, rework, and fragile code already eat feature time every week. Many teams think they cannot afford cleanup, then lose that time anyway by patching the same problems.
Once you pick a number, hold it steady for three or four sprints. That gives you enough time to see whether the budget is helping or whether the team picked a number that only sounded good in a meeting. Weekly changes hide the result.
Watch a few plain signals during that trial period. Do bug fixes interrupt planned work less often? Do estimates get a little more stable? Does the team spend less time reopening old code just to make one feature safe enough to ship?
After those three or four sprints, adjust once. If cleanup still spills into feature time, increase the slice a little. If delivery stays shaky and the cleanup items do not reduce rework, trim it back and narrow what qualifies.
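The review rule above can be written down as a small decision function, which makes it harder to bend under pressure. This is only a sketch: the name `review_budget`, the 5-point step, and the three-sprint minimum are assumptions chosen for illustration, and the three yes/no inputs are the team's own judgment calls.

```python
def review_budget(
    percent: int,
    sprints_run: int,
    cleanup_spills_into_features: bool,
    delivery_shaky: bool,
    rework_reduced: bool,
    step: int = 5,        # assumed size of "a little"
    min_trial: int = 3,   # assumed trial length, in sprints
) -> tuple[int, str]:
    """Return the budget for the next trial period and the reason."""
    if sprints_run < min_trial:
        return percent, "hold: the trial period is not over yet"
    if cleanup_spills_into_features:
        return percent + step, "increase: cleanup still spills into feature time"
    if delivery_shaky and not rework_reduced:
        return max(percent - step, 0), "trim: narrow what qualifies as cleanup"
    return percent, "keep: the current slice is working"


print(review_budget(15, 2, True, False, False))
# prints: (15, 'hold: the trial period is not over yet')
print(review_budget(15, 4, True, False, False))
# prints: (20, 'increase: cleanup still spills into feature time')
```

The first check is the important one: no adjustment happens before the trial ends, whatever the other signals say.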
Running it in sprint planning
Start with real capacity, not wishful capacity. Count who is available, subtract time off, meetings, support duty, and anything else that will eat working hours. If your team plans in points, use the same system here, but base it on what the team can finish in a normal sprint, not on the best sprint it ever had.
Then reserve the cleanup slice before anyone starts arguing for feature work. That order matters. If you debate roadmap items first, cleanup turns into a leftover bucket, and it loses every time.
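The two steps above, real capacity first and the cleanup slice second, can be sketched like this. The `Person` fields, the ten-day sprint, and the sample team are all invented for the example. A team that plans in points would use the same order with points instead of days.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    sprint_days: float = 10.0  # assumes a two-week sprint
    time_off: float = 0.0
    meetings: float = 0.0
    support_duty: float = 0.0

    def available_days(self) -> float:
        """Working days left after known deductions, never below zero."""
        used = self.time_off + self.meetings + self.support_duty
        return max(self.sprint_days - used, 0.0)


def plan_capacity(team: list[Person], cleanup_percent: float) -> tuple[float, float, float]:
    """Return (real_capacity, cleanup_days, feature_days).

    The cleanup slice is taken out first. Feature work gets what is left.
    """
    capacity = sum(p.available_days() for p in team)
    cleanup = capacity * cleanup_percent / 100
    return capacity, cleanup, capacity - cleanup


# Hypothetical three-person team.
team = [
    Person("Ana", time_off=2, meetings=1),      # 7 days available
    Person("Ben", meetings=1, support_duty=2),  # 7 days available
    Person("Cy", meetings=1, support_duty=3),   # 6 days available
]
print(plan_capacity(team, 15))
# prints: (20.0, 3.0, 17.0)
```

On paper this team has 30 days. Real capacity is 20, so the cleanup slice and the feature commitment both come out smaller than wishful planning would suggest.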
Planning gets much easier when the team keeps a short, ranked debt list ready before the meeting. Five to ten items is usually enough. If the list grows too long, nobody can tell what matters now, and planning slows down.
Each item should have a clear start, a clear end, and a plain reason for doing it. "Split this risky service into two smaller modules" can work if the team can finish it in one sprint. "Improve the architecture" is too fuzzy and should stay out.
That rule matters more than people expect. If nobody can explain the problem, the impact, and what done looks like, the item is not ready. Teams often smuggle unclear refactors into the sprint because they sound smart. It usually ends with half-finished work and a fresh argument at review.
It also helps to assign an owner before the sprint starts. That person does not need to do all the work, but they keep the item moving, answer scope questions, and stop it from expanding halfway through the sprint.
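The readiness rules in this section amount to a checklist: a problem, an impact, a definition of done, an estimate that fits the slice, and an owner. A sketch of that check follows. The `DebtItem` fields and the sample values are hypothetical, not a required ticket format.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    title: str
    problem: str = ""
    impact: str = ""
    done_when: str = ""
    estimate_days: float = 0.0
    owner: str = ""


def readiness_gaps(item: DebtItem, slice_days: float) -> list[str]:
    """List what is missing. An empty list means the item is ready."""
    gaps = []
    for field in ("problem", "impact", "done_when"):
        if not getattr(item, field).strip():
            gaps.append(field)
    if item.estimate_days <= 0:
        gaps.append("estimate")
    elif item.estimate_days > slice_days:
        gaps.append("too big for the cleanup slice")
    if not item.owner.strip():
        gaps.append("owner")
    return gaps


# Illustrative items; every detail below is made up for the example.
ready = DebtItem(
    title="Split this risky service into two smaller modules",
    problem="Small changes in the service break unrelated behavior",
    impact="Routine changes take far longer than they should",
    done_when="Two modules, each with its own tests, deployed",
    estimate_days=3,
    owner="Ana",
)
fuzzy = DebtItem(title="Improve the architecture")

print(readiness_gaps(ready, slice_days=3))
# prints: []
print(readiness_gaps(fuzzy, slice_days=3))
# prints: ['problem', 'impact', 'done_when', 'estimate', 'owner']
```

An item with any gap stays on the debt list until someone fills it in. It does not enter the sprint on the strength of its title.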
What belongs in the cleanup slice
The cleanup slice should pay for work that removes repeat pain, not work that simply feels satisfying to tidy.
The best items already cost the team time, attention, or sleep. Support keeps seeing the same bug. A developer touches one old area and spends hours chasing side effects. Release day still depends on a manual step that fails often. Those are business costs, not side projects.
Good candidates usually do one of a few things: reduce a bug pattern that keeps coming back, make a fragile part of the app safer to change, add tests around code the team edits often, or remove release steps someone has to babysit.
Tests belong here when they protect busy, risky parts of the product. Coverage around login, billing, checkout, or a shared API can prevent the kind of bug that burns half a sprint. Coverage around an old feature nobody has touched in a year can wait.
Small structural fixes also fit well. If one service has grown messy enough that a tiny change takes half a day, cleanup can pay back quickly. Split an oversized function. Remove dead code. Isolate a flaky dependency. Rename confusing pieces so the next person does not have to guess.
Release work is another common target. Maybe deployments fail because one script breaks every third attempt, so a developer has to stay online and rerun steps by hand. Fixing that process is cleanup work, and it often saves more time than one more minor feature.
What should stay out of this slice matters just as much. Do not hide a large rewrite inside routine cleanup. Replacing a framework, rebuilding a core service, or redesigning the data model needs its own plan and a direct decision.
If the team can finish the work inside the sprint and explain the payoff in one plain sentence, it probably belongs here.
A simple example from a product team
A six-person product team runs two-week sprints. For months, they kept having the same argument. Product wanted every slot for new work, and engineers kept asking for time to clean up old code. Nothing felt big enough to win the roadmap debate on its own, so the mess stayed in place.
They changed one planning rule. Instead of fighting over each cleanup task, they reserved about one person's time in every sprint for maintenance and refactoring. In practice, that meant roughly 10 person-days out of 60 person-days of sprint capacity, or about 17%. Nobody had to prove that each fix was more important than a feature. The team had already agreed that this slice existed.
In one sprint, they used that time on two small jobs that had been annoying everyone for weeks. First, they fixed a flaky checkout test that failed often enough to slow releases. The bug was not in checkout itself. The test relied on timing that changed under load, so people kept rerunning the pipeline and guessing whether a failure was real.
Second, they simplified one busy part of the app where discounts, taxes, and payment states all lived in one file. Adding any small checkout change took longer than it should. A developer split the logic into smaller parts, removed duplicate conditions, and added a few focused tests around the risky paths.
Nothing about that sprint looked dramatic on a roadmap. Customers did not get a shiny new tab or a launch announcement. Still, the next sprint started in better shape. The team spent less time rechecking failed tests, fewer changes touched the same fragile code, and planning felt calmer because estimates stopped swinging so much.
That is why this works better than one-off cleanup requests. The team does not need to sell every repair as a special case. It protects a fixed slice of time, uses it on the parts that create drag, and carries less mess into the next sprint.
Mistakes that break the plan
This usually fails in small, predictable ways. Teams agree on a cleanup budget when things are calm, then drop it the moment a deadline feels tight. That teaches everyone the same lesson: cleanup is optional, features are not.
A fixed slice only helps if it stays fixed long enough to matter. If the team keeps moving it from 20% to 10% to 0% whenever pressure rises, nobody can plan around it. Debt keeps growing, and the next sprint feels even tighter.
The most common mistakes are easy to spot. Teams fill the slice with vague work like "code cleanup" or "improve architecture." People use it to finish feature leftovers. Someone sneaks in a rewrite that is too large for one sprint. Product and engineering never agree on what "done" means. Then the team forgets to show what the work saved.
Large rewrites deserve extra suspicion. A sprint cleanup slice is for reducing friction, not rebuilding half the product. If a task cannot fit into that slice and produce a useful result by sprint end, break it down or keep it off the board.
The other quiet failure is weak evidence. A team that says "we cleaned things up" will lose the argument to a team that says "this change cut support tickets" or "tests now catch this bug before release." Even rough numbers help. Saving two hours a week is easier to defend than saying things feel better.
When the slice is stable, small, and measurable, it stops looking like engineering preference. It becomes normal sprint work.
Checks before you lock the sprint
A sprint gets messy when cleanup tasks are vague or too big. The fix is boring, but it works.
Before the team commits, make sure each cleanup item can finish inside the sprint. If a task will spill into the next sprint, cut it down or leave it out. Name the problem, not the wish. "Reduce test suite runtime from 28 to 20 minutes" is clear. "Improve testing" is not.
Protect the cleanup slice before feature work expands. If you wait until the end of planning, urgent feature requests will eat it. Ask for one visible result per item, even if the result is small: fewer support tickets, one slow query fixed, build time cut by 10 minutes, or one flaky job removed.
It also helps to reject fuzzy language on the spot. When someone proposes "code cleanup," ask one follow-up question: what problem will disappear by the end of the sprint? That question usually shows whether the task is real work or just a good intention.
Review the rule after three or four sprints. If the slice is too small, too large, or filled with the wrong kind of work, change it once and try again.
What to do next
Start with a small rule and give it enough time to work. The first cycle often feels awkward because people still argue about what belongs there. Run the same rule for three sprints before you judge it. One sprint can be a fluke. Three gives you a pattern.
Keep your debt list short. Rank items by pain today, not by age or by how annoying the code looks. Work that causes repeat bugs, slows releases, or creates rework should stay near the top. Old cleanup ideas with no current cost can wait.
Track a few numbers the whole team can understand: repeat bugs from the same area, rework after QA, cycle time from start to done, and spillover into the next sprint. You do not need a heavy reporting system. A few simple notes are enough if they stay consistent.
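The "few simple notes" can be as light as one small record per sprint. The sketch below compares per-sprint averages before and during the trial. The metric names follow the four signals listed above. The function name and every number in the sample data are invented for illustration.

```python
from statistics import mean

METRICS = ("repeat_bugs", "qa_rework", "cycle_time_days", "spillover_items")


def average_change(before: list[dict], during: list[dict]) -> dict[str, float]:
    """Change in the per-sprint average of each metric.

    Negative values mean the number went down, which is the goal for
    all four metrics tracked here.
    """
    return {
        m: mean(s[m] for s in during) - mean(s[m] for s in before)
        for m in METRICS
    }


# Made-up notes: two sprints before the budget, three sprints with it.
before = [
    {"repeat_bugs": 4, "qa_rework": 3, "cycle_time_days": 6, "spillover_items": 2},
    {"repeat_bugs": 6, "qa_rework": 5, "cycle_time_days": 8, "spillover_items": 4},
]
during = [
    {"repeat_bugs": 3, "qa_rework": 3, "cycle_time_days": 6, "spillover_items": 2},
    {"repeat_bugs": 2, "qa_rework": 2, "cycle_time_days": 5, "spillover_items": 1},
    {"repeat_bugs": 1, "qa_rework": 1, "cycle_time_days": 4, "spillover_items": 0},
]

for metric, change in average_change(before, during).items():
    print(f"{metric}: {change:+}")
```

With only a handful of sprints these averages are rough signals, not proof. They are still easier to defend in planning than "things feel better."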
Review the cleanup slice at the end of each sprint in plain language. Ask what got easier. Ask what still hurts. If the same messy area keeps interrupting feature work, keep working on it until the pain drops. That is better technical debt planning than picking random refactors because they feel overdue.
If planning fights keep eating time, an outside technical lead can help set the rule and keep the team honest about scope. Oleg Sotnikov at oleg.is works with startups and small companies as a fractional CTO on product architecture, delivery process, infrastructure, and AI-first development. That kind of outside view helps when everyone agrees on the pain but nobody can get to a clean decision in planning.
If the numbers improve after three sprints, keep the rule. If they do not, change the slice size or tighten the debt list. Do not go back to arguing item by item unless you enjoy losing the same hour every sprint.
Frequently Asked Questions
Why should we reserve sprint time for refactoring at all?
Because cleanup work already costs time, even when you do not plan for it. A fixed slice stops every refactor, test fix, or release fix from fighting features one by one, and it gives the team a steady way to remove drag before it grows.
How much sprint capacity should we set aside for cleanup?
Most teams do well with 10% to 20% of sprint capacity. Pick one number, keep it steady for three or four sprints, and judge it by results instead of changing it every week.
When should we choose 10% versus 20%?
Start near 10% if delivery already slips and you need a small, safe trial. Move closer to 15% or 20% when repeat bugs, fragile code, and rework already eat feature time every sprint.
What kind of work belongs in the cleanup slice?
Use it for work that removes repeat pain. Good examples include flaky tests in busy areas, slow queries that block common changes, risky modules that break often, and manual release steps someone keeps babysitting.
What should stay out of the refactor budget?
Keep large rewrites out. A framework replacement, a data model redesign, or a rebuild of a core service needs its own plan, because it is too big and risky to hide inside normal sprint cleanup.
How do we plan cleanup without losing feature delivery?
Reserve the cleanup slice before feature planning starts. If you wait until the end, feature requests will eat it, and cleanup turns back into leftover time that never shows up.
How do we know a refactor task is ready for a sprint?
A cleanup item is ready when the team can explain the problem, the payoff, and what done looks like in plain language. If a ticket says only "improve architecture" or "code cleanup," it is still too fuzzy.
How long should we keep the same budget before adjusting it?
Give the rule three or four sprints before you change it. That window lets you see real patterns like fewer interrupts, steadier estimates, or less time spent reopening the same risky code.
What should we measure to see if the budget is working?
Track a few simple signals the whole team can trust. Repeat bugs from the same area, rework after QA, build or test time, release failures, and sprint spillover usually tell you fast whether cleanup helps.
What mistakes usually ruin a refactor budget?
Teams usually break the plan by dropping the cleanup slice under deadline pressure, filling it with vague tasks, or sneaking in work that is too big for one sprint. Keep the slice fixed, keep the scope small, and show one clear result from each item.