Feb 08, 2025·8 min read

Rollback database migrations without breaking releases

Pair deploy order, reversible schema changes, and feature flags so you can roll back a database migration without breaking the release.

Why rollbacks fail after a schema change

Rolling back app code is usually fast. Rolling back data is not.

You can switch servers back to the previous version in a few minutes, but the database may already hold rows written by the new release. That mismatch causes most migration rollback failures. The old app still expects the old shape of the data. If the new release renamed a column, split one field into two, changed a default, or started writing a new format, the old code can crash or read the wrong value.

The risk gets worse when a schema change removes something instead of adding it. Dropping a column, deleting a table, or making a field required can block the path back. Once the new release writes data that only the new schema understands, the previous version has nowhere safe to land.

Mixed versions make this even messier. During a real deploy, not every process flips at the same second. One web server may run new code while another still runs the old version. Background jobs, admin tools, and scheduled tasks can all hit the same database at once.

That leads to a simple rule: a schema that works for only one app version is risky. In practice, teams usually run into the same problems. New code writes data the old code cannot read. Old code still queries columns that a migration removed or renamed. Constraints pass for one version and fail for the other. Workers or delayed jobs keep using old queries after the web deploy finishes.

A small example makes this clear. Say a release changes one name field into first_name and last_name. The deploy looks fine, but half an hour later you roll back the code. The old version now looks for name, while the database only has the new columns. The app is back, but the release is still broken.

Safe releases treat code and schema as one system. The database needs a path that lets old and new versions live together for a while, even if you hope you never need it.

Know which database changes are risky

For a rollback to work, old and new code need to live with the same database for at least a short time. Some schema changes allow that. Others break it right away.

The safest change is usually adding a new nullable column. Old code ignores it, and new code can start writing to it when ready. That gives you breathing room, which is rare during a release.

Risk climbs fast when you rename or drop a column. Old code still asks for the old name. If the column is gone, requests fail at once. Even a clean deploy can leave a few old app instances running for a minute or two, and that minute is enough to cause real errors.

A few changes need extra caution. Renaming columns or tables often breaks old code on the next query. Dropping columns, indexes, or constraints can remove something the old version still needs. Changing a data format can confuse both versions. A date stored as text today and as a timestamp tomorrow is a common trap. Large backfills bring a different risk because they can slow the database, fill logs, and turn a short deploy into a long incident.

Data format changes are easy to underestimate. If the old app expects "12.99" as text and the new app stores 1299 as cents, both versions may run, but one reads nonsense. That is worse than a loud failure because bad data can spread quietly.
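A minimal sketch of that quiet failure, assuming a hypothetical price column that the old release reads as a decimal string and the new release fills with integer cents. The function names are illustrative only.

```python
from decimal import Decimal


def old_read_price(stored: str) -> Decimal:
    """Old release: the column holds a decimal string such as "12.99"."""
    return Decimal(stored)


def new_write_price(amount: Decimal) -> str:
    """New release: the same column now holds integer cents, such as "1299"."""
    return str(int(amount * 100))


stored = new_write_price(Decimal("12.99"))
misread = old_read_price(stored)  # no exception, just the wrong amount
```

Nothing raises here, which is the problem. After a code rollback, the old reader would treat the stored value as 1299 instead of 12.99, and every report built on it inherits the error.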

Backfills create pressure in a different way. The schema may be fine, but the work takes time. A migration that updates 50 million rows can lock tables, push CPU load up, or make replicas lag. If you need to roll back while that job is still running, cleanup gets messy fast.

A simple test helps: ask whether old code can still read and write after the change. If the answer is yes, the change is usually low risk. If the answer is "only after every server updates" or "only after data conversion finishes," treat it as a risky release.
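That test can be automated in a rough way. The sketch below uses an in-memory SQLite database, and everything in it is an assumption made for illustration: the users table, the nickname column, and the two queries that stand in for the previous release.

```python
import sqlite3

OLD_SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"

# Hypothetical candidate migrations to evaluate.
ADDITIVE = ["ALTER TABLE users ADD COLUMN nickname TEXT"]
DESTRUCTIVE = [
    "CREATE TABLE users_new (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)",
    "DROP TABLE users",
    "ALTER TABLE users_new RENAME TO users",
]


def old_code_still_works(migration: list[str]) -> bool:
    """Apply the migration to a scratch database, then run the old release's queries."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(OLD_SCHEMA)
        conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")
        for statement in migration:
            conn.execute(statement)
        # Queries the previous release is assumed to run: one write, one read.
        conn.execute("INSERT INTO users (name) VALUES ('Grace Hopper')")
        rows = conn.execute("SELECT name FROM users ORDER BY id").fetchall()
        return [row[0] for row in rows] == ["Ada Lovelace", "Grace Hopper"]
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```

Under these assumptions, old_code_still_works(ADDITIVE) returns True and old_code_still_works(DESTRUCTIVE) returns False. A real check would replay the actual queries of the previous release against a copy of the migrated database.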

Match the deploy style to the migration

A deploy plan and a schema plan have to fit each other. If version A and version B might run at the same time, the database has to make sense to both. That is where many rollbacks fail: the code rolls back, but the data no longer matches what the old code expects.

Rolling deploys are the strictest case. One server may still run old code while another already runs new code, so both versions must read and write the same records without breaking. That usually means additive changes first. Add a new column, keep the old one, write to both for a while, and remove the old column later.

Blue-green sounds safer because you switch traffic between two full environments. It still fails if the old and new versions interpret the same data in different ways. If the new release rewrites a field, changes its meaning, or drops data the old release needs, switching back will not save you.

Canary releases give you an earlier warning. A small slice of traffic hits the new code first, which makes bad writes easier to spot before they spread through the whole database. This is especially useful when a migration changes how the app writes records, not just how it reads them.

Pick the deploy style by compatibility

Use rolling deploys only with backward-compatible schema changes. Use blue-green when the app version changes a lot, but keep the data format compatible until the cutover is stable. Use canary when you want real traffic to test new writes before a full release.

Big schema jumps rarely fit a safe rollback plan. Renaming a column in place, splitting one field into several, or changing data types across large tables can lock you into the new version fast. Once production writes start using the new shape, the old app may read garbage or fail outright.

The safer pattern is an expand and contract migration. First, expand the schema so both versions work. Then update the app. Backfill old rows if needed. Only after the new path is stable do you contract the schema by removing old columns or constraints.

If a change cannot support old and new code together, treat it like a special release. Slow it down, narrow traffic, and plan the rollback around data, not just app code.

Write migrations you can reverse

A migration is only reversible if the old app version can still run after the schema change. In practice, that means adding things first and removing things later. If you drop or rename a column too early, the rollback fails even when the code deploy itself is easy to undo.

Start with additive changes. Add the new column, table, index, or default first. Let both the old and new code paths work for a while, then remove the old structure only after production proves the new path is stable.

Keeping old columns or tables for one extra release often feels slow, but it saves real pain. If the new code has a bug, you can switch traffic back without trying to rebuild lost data under pressure.

Before you merge, write the down step in plain language. Do not leave it as cleanup for later. If you cannot explain exactly how to undo the migration, that is a warning that the change is too big or destroys data too early.

For each migration, answer a few questions before it ships. What does the old app still need to read and write? What data should stay duplicated for one release cycle? How would you move new data back if you roll back the code? When is it actually safe to remove the old schema piece?
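One way to keep the down step honest is to write it next to the up step. This is a sketch under stated assumptions, not a framework API: it uses SQLite, a hypothetical users table with id and full_name, and plain up and down functions.

```python
import sqlite3


def up(conn: sqlite3.Connection) -> None:
    """Expand: add nullable columns. The old release never references them."""
    conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def down(conn: sqlite3.Connection) -> None:
    """Undo: rebuild the table without the new columns.

    up() never touched full_name, so no data has to be moved back.
    The rebuild avoids DROP COLUMN, which older SQLite versions lack.
    """
    conn.execute("CREATE TABLE users_old (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO users_old (id, full_name) SELECT id, full_name FROM users")
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_old RENAME TO users")


def columns(conn: sqlite3.Connection) -> list[str]:
    """Return the column names of users, in order."""
    return [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

The down step stays short because the up step only adds things. If the up step had dropped or rewritten full_name, the down step would need to rebuild data, and that is the signal to split the migration.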

Small migrations are easier to trust. Adding one nullable column is easy to undo. Adding three columns, backfilling them, changing constraints, and deleting the old field in one file is not. Split that work into separate steps so each migration can stand on its own.

If you split full_name into first_name and last_name, do not drop full_name in the same release. Add the new columns, keep writing full_name, and copy data both ways if needed. After the new code runs cleanly in production, you can stop using the old field. Only then should you remove it.

Teams that handle reversible database migrations well tend to be stubborn about this. They accept extra temporary schema pieces because a clean rollback matters more than a tidy database on release day.

Ship the change in small steps

Big failures usually happen when teams change the app and the database in one move. A safer release breaks that work into smaller deploys. Each step should run fine on its own, and each step should be easy to undo.

Start by adding the new schema without changing what users see. Create the new column, table, or index, but keep the old path working exactly as before. If you can, make the new fields optional at first so old code does not crash.

Then ship code that understands both data shapes. During this stage, the app might read from the old schema, the new schema, or both. For writes, many teams use a temporary dual-write period so new data lands in both places. It is a little messy, but it gives you room to roll back the app without losing fresh data.

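A common order is simple: add the new schema, deploy code that can handle both versions and writes to both, backfill old data into the new structure, switch reads only after you confirm the new data is complete, and stop writing the old schema last. Starting dual writes before the backfill keeps rows that are created or edited during the backfill from ending up stale.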

Reads and writes should not switch at the same time unless the change is tiny. Writes usually move first because they are easier to reason about. Reads often touch more screens, jobs, and reports, so they need more care. Check row counts, error logs, and a few real user flows before you move the next part.

Do not delete the old schema right after the new path starts working. Let it stay in place through normal traffic, background jobs, and at least one full business cycle. A startup might wait a few days. A business with monthly billing might wait until that billing run finishes cleanly.

Only after the new path stays stable should you remove the old columns or tables in a separate release. That is the expand and contract migration pattern, and it is one of the few practical ways to make rollbacks work under real traffic.

Control release timing with feature flags

When a release depends on a schema change, do not expose the new behavior the moment the code lands. Put the new path behind a feature flag. That gives you time to deploy the code, run the migration, and check logs and queries before real users touch the change.

This split matters when you need safe rollbacks without turning a routine release into a fire drill. If the database already has a new column or table, you may not want to undo that change right away. Often the safer move is simpler: turn the feature off, keep users on the old path, and sort out the issue without another full deploy.

A good rollout starts small. Turn the flag on for your team first, then a few test accounts, then a small slice of production traffic. Watch error rates, slow queries, failed jobs, and support messages. If one step looks wrong, stop there.

The flag should control the risky behavior itself, not just the screen that reveals it. A hidden button is useless if background jobs, webhooks, or mobile clients still write the new schema. Put the off switch around reads, writes, and any async work tied to the change.

A few habits keep flags useful. Default new flags to off. Target a small group before everyone else. Log every flag change with time and owner. Remove the flag after the rollout is stable.
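Those habits fit in a few lines. The sketch below is a hypothetical in-memory flag, not the API of any flag service: it defaults to off, targets an allow list first, widens by percentage, and records each change with its time and owner.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Flag:
    """A hypothetical in-memory flag; real systems usually store this centrally."""

    name: str
    allow_users: set[int] = field(default_factory=set)  # team and test accounts
    percent: int = 0  # default off: 0% of general traffic
    audit_log: list[str] = field(default_factory=list)

    def set_percent(self, percent: int, owner: str, when: str) -> None:
        """Change the rollout slice and record who changed it and when."""
        self.audit_log.append(f"{when} {owner} set {self.name} to {percent}%")
        self.percent = percent

    def is_enabled(self, user_id: int) -> bool:
        """Return True for allow-listed users and for users inside the slice."""
        if user_id in self.allow_users:
            return True
        # Stable bucket in 0..99 so a user does not flip between code paths.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        return digest[0] * 100 // 256 < self.percent
```

Hashing the flag name together with the user id keeps each user in the same bucket across requests, so raising the percentage only adds users and never shuffles them.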

Picture a signup form that now saves first_name and last_name instead of one name field. You can deploy code that understands both formats, keep the new form off, migrate data in the background, and then enable it for a few accounts. If parsing names causes bad data, switch the flag off in seconds. Users stay on the old flow while you fix the code, and the release stays calm.

Example: splitting one name field into two

A common schema change looks small and still breaks rollbacks. Imagine a users table with one full_name column, and the app now needs first_name and last_name for forms, email greetings, and sorting.

Do not replace full_name in one deploy. Add first_name and last_name first, keep full_name, and let the old code keep working. That is the safe side of an expand and contract migration.

A simple rollout looks like this:

Release 1 adds nullable first_name and last_name.
Write both formats when a user creates or edits a profile.
Backfill existing rows from full_name once dual writes are live, so no row is missed.
Switch reads to the new columns behind a feature flag.
In a later release, remove full_name after the new path proves stable.

The dual-write step matters more than people expect. For one full release, every profile update should save full_name and also save first_name plus last_name. That gives you time to compare real data instead of trusting the parsing logic.

Names get messy fast. "Mary Ann van der Berg" and "Prince" do not fit a neat split. If your parser makes a bad guess, keep the original full_name untouched and mark the row for review or leave one of the new fields blank.
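A cautious dual write might look like the sketch below. It assumes SQLite and a hypothetical name_needs_review column, and the two-word rule is deliberately conservative rather than a recommended name parser.

```python
import sqlite3
from typing import Optional, Tuple


def split_name(full_name: str) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (first_name, last_name, needs_review).

    Only a plain two-word name is split. Anything else is left for review
    rather than guessed at.
    """
    parts = full_name.split()
    if len(parts) == 2:
        return parts[0], parts[1], False
    return None, None, True


def save_profile(conn: sqlite3.Connection, user_id: int, full_name: str) -> None:
    """Dual-write: full_name is always stored exactly as the user entered it."""
    first, last, needs_review = split_name(full_name)
    conn.execute(
        "UPDATE users SET full_name = ?, first_name = ?, last_name = ?, "
        "name_needs_review = ? WHERE id = ?",
        (full_name, first, last, int(needs_review), user_id),
    )
```

Because full_name is written untouched on every save, a rollback to the old release still finds the value it expects, whatever the parser decided.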

You also want a quick comparison check. Rebuild a display name from first_name and last_name, then compare it with full_name for a sample of records or for every changed row. Small mismatches are easier to fix while both formats still exist.
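The comparison can be a small query plus a loop. This sketch assumes SQLite and the users columns from the example; the function name and the whitespace normalization are illustrative choices.

```python
import sqlite3


def find_name_mismatches(
    conn: sqlite3.Connection, limit: int = 100
) -> list[tuple[int, str, str]]:
    """Return (id, full_name, rebuilt) for rows whose rebuilt name differs.

    Rows where both new columns are still NULL are skipped: they are not
    filled yet, or were left for review, so there is nothing to compare.
    """
    rows = conn.execute(
        "SELECT id, full_name, first_name, last_name FROM users "
        "WHERE first_name IS NOT NULL OR last_name IS NOT NULL ORDER BY id"
    )
    mismatches = []
    for user_id, full_name, first, last in rows:
        rebuilt = " ".join(part for part in (first, last) if part)
        # Collapse repeated spaces in full_name before comparing.
        if rebuilt != " ".join((full_name or "").split()):
            mismatches.append((user_id, full_name, rebuilt))
            if len(mismatches) >= limit:
                break
    return mismatches
```

An empty result on a large sample is a reasonable signal to move on to the read switch. A non-empty one gives you concrete rows to inspect while full_name is still the source of truth.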

When the numbers look good, turn on reads from the new columns with a feature flag. If something goes wrong, switch the flag off and read full_name again. That is what makes rolling back a migration practical: the database still supports the old code path.
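A flag-controlled read can be as small as the function below. It is a sketch: the row is assumed to be a plain dict, and use_new_columns stands in for whatever flag lookup the app really uses.

```python
def display_name(row: dict, use_new_columns: bool) -> str:
    """Return the name to show, reading the new columns only when the flag is on.

    Falls back to full_name when the flag is off or the new columns are empty,
    so switching the flag off needs no schema change.
    """
    if use_new_columns:
        parts = [row.get("first_name"), row.get("last_name")]
        rebuilt = " ".join(part for part in parts if part)
        if rebuilt:
            return rebuilt
    return row["full_name"]
```

The fallback also covers rows that were left for review, so turning the flag on does not produce blank names for unsplit records.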

Drop full_name only after one more clean release. If you delete it too early, the rollback stops being a rollback and turns into a repair job.

Mistakes that create rollback pain

Rollback trouble usually starts before anyone runs the rollback. It starts when a release assumes the app, schema, data, and background workers will all move in lockstep. They rarely do.

A common mistake is dropping a column in the same release that stops using it. That feels tidy, but it removes your escape route. If the new code fails and you redeploy the old version, the old version may still read or write that column. Keep the old column for one more release, let both versions work, and remove it later.

Backfills cause a different kind of pain. Teams often launch a big data rewrite and treat it like a one-way trip. Then the job slows the database, dies halfway through, or writes mixed data across millions of rows. If you cannot pause it, restart it safely, and tell which rows it already changed, you do not have a rollback plan. You have a hope-and-wait plan.
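A backfill that can pause, resume, and tell which rows it already handled might be sketched like this. It assumes SQLite, positive integer ids, and the users columns from the earlier example. A real job would persist the checkpoint somewhere durable instead of returning it.

```python
import sqlite3


def backfill_batch(conn: sqlite3.Connection, after_id: int, batch_size: int = 1000) -> int:
    """Backfill one batch of rows with id > after_id; return the new checkpoint.

    Returns after_id unchanged when nothing is left to scan. Only rows whose
    new columns are still empty are updated, so re-running a batch is harmless.
    """
    rows = conn.execute(
        "SELECT id, full_name, first_name, last_name FROM users "
        "WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, batch_size),
    ).fetchall()
    for user_id, full_name, first, last in rows:
        parts = (full_name or "").split()
        if first is None and last is None and len(parts) == 2:
            conn.execute(
                "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
                (parts[0], parts[1], user_id),
            )
    conn.commit()  # one short transaction per batch
    return rows[-1][0] if rows else after_id


def run_backfill(conn, checkpoint=0, should_pause=lambda: False, batch_size=1000) -> int:
    """Process batches until done or paused; return the last checkpoint."""
    while not should_pause():
        next_checkpoint = backfill_batch(conn, checkpoint, batch_size)
        if next_checkpoint == checkpoint:
            break
        checkpoint = next_checkpoint
    return checkpoint
```

Calling run_backfill again with the returned checkpoint resumes where the job stopped. Small batches with a commit after each one keep transactions short, which limits lock time and gives the pause check a chance to run.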

Another mistake is believing one rollback command fixes everything. Rolling back the app is easy compared with rolling back data. Once a migration changes values, splits fields, or deletes rows, the old app may no longer understand what it sees. Deploys and migrations need separate steps, with clear rules about which versions can run against which schema.

Background jobs often get ignored until they break the rollback. Web servers may run the old code again, but workers, cron jobs, and queue consumers can keep writing the new shape of data for minutes or hours. That creates a messy database fast. Pause workers, drain queues, or make jobs handle both old and new formats before you release.
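Making a job tolerate both formats can be a small normalization step at the top of the handler. The payload keys below are assumptions for this sketch: jobs queued by the old release are taken to carry full_name, and jobs from the new release first_name and last_name.

```python
def name_from_job(payload: dict) -> dict:
    """Normalize a queued job payload to one internal shape.

    Accepts the old shape ("full_name") and the new shape ("first_name",
    "last_name"). Raises ValueError when neither is present.
    """
    if "first_name" in payload or "last_name" in payload:
        first = payload.get("first_name") or None
        last = payload.get("last_name") or None
        full = " ".join(part for part in (first, last) if part)
    elif "full_name" in payload:
        full = payload["full_name"]
        first = last = None  # do not guess a split inside the worker
    else:
        raise ValueError("job payload has no name fields")
    return {"full_name": full, "first_name": first, "last_name": last}
```

With this in place, jobs that were queued before a deploy or a rollback still run, whichever release picks them up.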

The usual trouble spots are predictable: removing old schema too early, running backfills without checkpoints or batch limits, tying app rollback and data rollback into one blind step, leaving workers and scheduled jobs out of the release plan, and skipping the short period where both schemas work.

Boring releases are safer. If each step is reversible on its own, rollbacks stay small and predictable instead of turning into a late-night repair job.

Quick checks before you deploy

A safe migration rollback starts with one blunt question: can the old app still run against the new schema? If yesterday's version cannot read or write safely, your rollback plan is mostly wishful thinking. Test that path before release, not during an incident.

A short review catches most avoidable mistakes. Run the previous app version against a copy of the migrated database. Create and edit real records, then check whether writes fail, fields go blank, or values get cut off. Confirm that you can turn off the new behavior with a feature flag or config switch. If you need another deploy to stop the feature, recovery will be slower than it looks on paper.

Rehearse rollback on production-like data. Tiny test fixtures hide the messy cases: nulls, duplicate rows, old bad data, and records shaped by earlier bugs. Also decide who makes the stop or go call before deployment starts. One owner and one backup are enough.

That last point matters more than many teams admit. During a rough release, delay often comes from indecision, not tooling. On a small team, the caller may be the CTO, founder, or fractional CTO. The job is simple: look at the signals, make the call, and keep everyone aligned.

Feature control also buys time. If the migration is live but the new code path stays behind a flag, you can pause the rollout, inspect errors, and keep the database where it is. That is often safer than trying to force a full rollback under pressure.

One practical habit helps a lot: write down the first ten minutes of failure response. Who disables the feature, who checks database health, who watches error tracking, and who tells the team whether to continue or stop. If those answers are fuzzy before deploy, they will be worse after it starts.

What to do next

Start with the next schema change already on your roadmap. Do not wait for a big release. A small change, done with care, is the fastest way to build a process that survives a bad deploy.

Write the change as three stages on one page: add the new schema without breaking old code, switch the app in a controlled release, often behind a flag, and remove the old schema only after the new path has been stable for a while.

That simple map forces clear thinking. If a step has no safe rollback, the plan is not ready yet. This is where migration rollbacks usually fail: teams skip the middle stage and try to do everything at once.

Next, run one rollback drill before your next risky release. Use a staging setup that looks as close to production as you can. Apply the migration, turn the new code path on, then walk backward. Check that old code still runs, old reads still work, and partial backfill data does not leave the app in a strange state.

A short template helps more than a long policy document. For each release, capture what schema change you add first, what app change switches traffic or behavior, what data backfill runs in the background, what tells you it is safe to roll back, and when you can finally remove the old column, table, or code path.

This gets even more useful when several people touch releases. It cuts down on guesswork and makes reversible database migrations part of normal work instead of a last-minute scramble.

If your team wants a second review before a risky release, Oleg Sotnikov at oleg.is works as a Fractional CTO and startup advisor on product architecture and infrastructure. A calm outside review often finds the one unsafe step that could turn a routine deploy into a long night.

Frequently Asked Questions

Why doesn’t a code rollback fix a bad schema release?

Because new code may have already written rows that the old app cannot read. If the old version expects a column or format that no longer exists, requests fail or the app reads wrong values.

Which database changes are safest if I need rollback?

Start with additive changes such as a new nullable column or a new table. Old code ignores them, and new code can use them without blocking a fast rollback.

What schema changes make rollback risky?

Renames, drops, type changes, stricter constraints, and format rewrites cause trouble fast. Old queries break, or old code reads new values in the wrong way.

What is an expand and contract migration?

You first add the new schema and keep the old one working. After the new path runs cleanly in production, you remove the old columns or rules in a later release.

Should I switch reads and writes at the same time?

Usually no. Move writes first, confirm that new records look right, and switch reads only after you know the new data is complete and jobs still work.

How long should dual-write stay on?

Keep it for at least one full release and enough normal traffic to catch edge cases. If you have billing, reports, or weekly jobs, wait until that cycle finishes cleanly too.

Do feature flags help with schema changes?

Yes, if the flag controls the risky behavior itself. Put it around reads, writes, and async jobs so you can turn the new path off without another deploy.

How should I handle a large backfill?

Run it in small batches with checkpoints and a pause button. Watch database load, locks, and replica lag, and make sure you can resume without rewriting the same rows twice.

What should I do about background jobs during deployment?

Treat workers, cron jobs, and queue consumers as part of the release. Pause them, drain queues, or make them handle both schemas before you switch traffic.

What should I test before a risky migration?

Run the previous app version against a copy of the migrated database and try real create and edit flows. Also rehearse who disables the flag, who checks database health, and who decides to stop the release.