Feb 14, 2025·8 min read

Go migration tools for safe schema changes in real teams

Go migration tools vary a lot on rollbacks, locks, CI use, and repo layout. This review shows what to check before your team picks one.

Why schema changes get messy fast

A schema change can look tiny on paper and still derail a calm deploy. A team adds one new column with a default value before lunch. The app code is ready, tests pass, and the change looks harmless. Then the database holds a lock longer than expected, writes slow down, and the release stalls while everyone watches dashboards.

Teams usually fear migrations more than code changes for a simple reason: code rolls back more cleanly than data. If a release goes bad, you can redeploy old app code in minutes. You cannot always undo a column drop, a bad backfill, or a type change without losing data or blocking traffic.

The stress gets worse when two services share one database. Service A starts writing a new field today. Service B still reads the old shape and breaks on nulls, missing values, or changed meaning. Nobody changed much in either service, but deploy order suddenly matters, and one quiet database change becomes a cross-team problem.

That is why database work feels less forgiving than normal application work. Timing matters. Lock behavior matters. Old and new versions often run at the same time. A migration can work in staging and still hurt in production because production has more rows, more traffic, and less room for mistakes.

Teams that ship cleanly aim for predictable releases. They make small changes, keep old and new code compatible for a while, and run the same migration steps in local setups, CI, and production. Go migration tools can help, but the goal is simpler: make schema changes routine enough that nobody has to hold their breath during deploys.

The tools worth comparing

Most Go teams end up looking at the same four names: goose, golang-migrate, Atlas, and dbmate. They all solve the same basic problem, but they push teams into different habits. That matters more than a long feature table.

If your team likes writing SQL by hand and keeping the migration layer thin, goose and dbmate feel familiar. If you want a stricter file pattern that behaves the same way every time, golang-migrate is often easier to standardize. Atlas goes further than a migration runner. It adds schema rules, planning, and generated diffs. Some teams love that structure. Others find it heavier than they need.

goose stays close to SQL, but it also lets you write Go migrations when SQL is not enough. It usually keeps up and down steps in the same file, separated by `-- +goose Up` and `-- +goose Down` comments.
golang-migrate is plain and direct. It normally expects paired files such as .up.sql and .down.sql, which makes rollback intent obvious in code review.
Atlas adds more structure. It does not center the workflow on hand-written up and down pairs. Teams often use it to generate ordered SQL migrations from a declared schema state.
dbmate is simple and SQL-first. Like goose, it keeps up and down sections in one file, marked with `-- migrate:up` and `-- migrate:down` comments, which many small teams find easy to read and maintain.
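
With the paired-file layout, a forgotten down file is easy to miss in review. As a minimal sketch, a small Go check could flag unpaired files before merge. The function name and the sample file names are made up for illustration, and the check only assumes the .up.sql and .down.sql suffixes mentioned above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// missingPairs returns the file names that would be needed to complete every
// .up.sql/.down.sql pair. The result is sorted for stable CI output.
func missingPairs(files []string) []string {
	const upSuffix, downSuffix = ".up.sql", ".down.sql"
	ups := map[string]bool{}
	downs := map[string]bool{}
	for _, f := range files {
		switch {
		case strings.HasSuffix(f, upSuffix):
			ups[strings.TrimSuffix(f, upSuffix)] = true
		case strings.HasSuffix(f, downSuffix):
			downs[strings.TrimSuffix(f, downSuffix)] = true
		}
	}
	var missing []string
	for name := range ups {
		if !downs[name] {
			missing = append(missing, name+downSuffix)
		}
	}
	for name := range downs {
		if !ups[name] {
			missing = append(missing, name+upSuffix)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Hypothetical file names; the second migration has no down file.
	files := []string{
		"000001_create_users.up.sql",
		"000001_create_users.down.sql",
		"000002_add_preferred_locale.up.sql",
	}
	for _, m := range missingPairs(files) {
		fmt.Println("missing:", m)
	}
}
```

A check like this only proves the file exists. It says nothing about whether the down step actually works, which still needs a real run against a database.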

A small team with one Postgres app can stay productive for a long time with goose or dbmate. The files are easy to inspect, and nobody needs to learn much tooling. In a repo with several services, golang-migrate often feels cleaner because the naming stays rigid and the behavior is easy to script.

Atlas makes more sense when schema drift already hurts, or when one person keeps fixing migration mistakes by hand. It asks more from the team, but it can prevent a lot of messy database work later.

How rollback support differs

Rollback support matters most on a bad deploy day. A migration ships, the app starts throwing errors, and the team has to decide fast: undo the schema change or push a new fix that moves forward.

In day-to-day work, rollback support usually means one of two things. Either your tool can run an explicit down migration for each step, or your team treats migrations as forward-only and fixes mistakes with another migration.

Down migrations or forward-only

Explicit down migrations feel safer because they give you a direct escape hatch. If a migration only adds an index, creates a table, or adds a nullable column, a down file can work well and save a lot of stress.

That safety can be misleading. A down migration may reverse the SQL, but it may not reverse the app state around it. If one service already writes to the new column, or the old code path is gone, rolling back the database can break things in a different way.

Forward-only flows are stricter. You never pretend the old state is easy to restore. If a migration causes trouble, you write a new migration that repairs the schema or keeps the old and new code working together for one more release.

Many teams using Go migration tools end up preferring forward-only in production, even if the tool supports down files. It is slower in the moment, but it usually matches how real systems behave.

Where rollbacks help, and where they do not

Rollbacks help most with small, isolated changes:

adding or removing an index
creating a new table that nothing depends on yet
adding a column that no code uses yet
fixing a typo in a constraint or default

They get risky when the migration changes live data. Data backfills, column splits, renames, and type conversions rarely roll back cleanly. Once a script overwrites values, merges fields, or drops old data, the old shape may be gone for good.

That is why rolling back a database migration sounds cleaner than it often is. For schema-only changes, it can be practical. For data migrations, the safer plan is usually staged releases, backups, and a tested fix-forward path.

What lock behavior tells you

A migration tool that takes a database lock before it changes anything is usually safer than one that just starts running. The lock stops two processes from editing the schema at the same time, which is how teams end up with half-applied changes and messy deploys.

Picture two deploy jobs that start a few seconds apart. One comes from a merge to main. The other comes from a manual retry after a flaky test. If the tool has a clear locking step, one job runs and the other waits or exits. If it does not, both jobs may try to add the same column, create the same index, or write conflicting rows into the migration table.

That is why lock behavior matters more in CI than on a laptop. Local runs are usually one person, one command, one database. CI jobs overlap all the time. Production is even less forgiving because retries, restarts, and extra workers can trigger the same migration twice.

When you compare Go migration tools, check a few plain things:

Does the tool lock before it reads and applies pending migrations?
Does the second job wait, fail fast, or keep going without a lock?
If a job crashes, how do you clear a stale lock?
Can your team see lock events in logs?

Stale locks deserve extra attention. A tool that leaves a lock behind after a crash can block every later deploy until someone clears it by hand. That might be fine in development. It is a real problem in production at 2 a.m. Good tools make cleanup obvious with a timeout, a clear unlock command, or both.
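
To make the fail-fast and stale-lock points concrete, here is a rough sketch of one way a runner could guard migrations with a Postgres session-level advisory lock. It is not how any of the four tools is implemented. The locker interface, pgLocker, memLocker, and the helper names are hypothetical, and the $1 placeholder style assumes a Postgres driver that accepts it. Postgres releases session-level advisory locks when the session ends, so a crashed runner does not leave this kind of lock behind.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// ErrLockHeld means another runner already holds the migration lock.
var ErrLockHeld = errors.New("migration lock is held by another runner")

// locker is a hypothetical interface for this sketch, not part of any tool.
type locker interface {
	TryLock(ctx context.Context) (bool, error)
	Unlock(ctx context.Context) error
}

// pgLocker takes a session-level advisory lock on one dedicated connection.
// The key is an arbitrary number the application picks for "migrations".
type pgLocker struct {
	conn *sql.Conn
	key  int64
}

func (l pgLocker) TryLock(ctx context.Context) (bool, error) {
	var ok bool
	err := l.conn.QueryRowContext(ctx, "SELECT pg_try_advisory_lock($1)", l.key).Scan(&ok)
	return ok, err
}

func (l pgLocker) Unlock(ctx context.Context) error {
	_, err := l.conn.ExecContext(ctx, "SELECT pg_advisory_unlock($1)", l.key)
	return err
}

// runLocked fails fast with ErrLockHeld instead of migrating concurrently.
func runLocked(ctx context.Context, l locker, migrate func(context.Context) error) error {
	ok, err := l.TryLock(ctx)
	if err != nil {
		return fmt.Errorf("acquire migration lock: %w", err)
	}
	if !ok {
		return ErrLockHeld
	}
	defer l.Unlock(ctx) // unlock error ignored in this sketch
	return migrate(ctx)
}

// memLocker stands in for the database so the sketch runs without Postgres.
// It is not safe for concurrent use; it only models "held" versus "free".
type memLocker struct{ held *bool }

func (m memLocker) TryLock(context.Context) (bool, error) {
	if *m.held {
		return false, nil
	}
	*m.held = true
	return true, nil
}

func (m memLocker) Unlock(context.Context) error { *m.held = false; return nil }

// collide simulates a second runner arriving while the first still migrates.
func collide() error {
	held := false
	l := memLocker{held: &held}
	return runLocked(context.Background(), l, func(ctx context.Context) error {
		return runLocked(ctx, l, func(context.Context) error { return nil })
	})
}

func main() {
	fmt.Println(collide())
}
```

In this sketch the second runner exits with an error rather than waiting. Whether your pipeline should wait or fail fast is a team decision; the point is that the behavior is explicit and visible in logs.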

If your team runs lean CI/CD with frequent deploys, lock behavior is not a small detail. It tells you whether the tool was built for real collisions or only for happy-path local runs.

How they fit into CI

A migration tool proves itself in CI when it can build a database from nothing, every single time. That matters more than fancy features. If a fresh run fails, the team will keep finding the same surprise in new environments, test databases, and recovery drills.

Start with a simple check: create an empty database and apply every migration in order. That run exposes missing files, bad numbering, and hidden assumptions about old data. Many teams skip this and only test the newest migration, which leaves a gap big enough to hurt later.

Failure reporting matters just as much. CI needs a clear non-zero exit code, and the logs need to point to the exact migration that broke. If the output only says "migration failed" without the file name or SQL error, people waste time guessing instead of fixing.
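
As an illustration of that reporting rule, the sketch below applies migrations in order, stops at the first failure, names the failing file, and produces a non-zero exit code. The migration type, fakeExec, and the file names are stand-ins invented for this example. A real job would execute SQL against a fresh database, and most tools already report failures in their own format.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sort"
	"strings"
)

// migration is a hypothetical in-memory form of one migration file.
type migration struct {
	File string
	SQL  string
}

// applyAll runs migrations in file-name order (this assumes zero-padded
// version prefixes) and stops at the first failure. The returned error names
// the file so CI logs point at the exact migration that broke.
func applyAll(migs []migration, exec func(sql string) error) (int, error) {
	sorted := append([]migration(nil), migs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].File < sorted[j].File })
	applied := 0
	for _, m := range sorted {
		if err := exec(m.SQL); err != nil {
			return applied, fmt.Errorf("migration %s failed: %w", m.File, err)
		}
		applied++
	}
	return applied, nil
}

// fakeExec stands in for a database call. It rejects statements that start
// with "BROKEN" so the failure path can be seen without a database.
func fakeExec(sql string) error {
	if strings.HasPrefix(sql, "BROKEN") {
		return errors.New("syntax error (simulated)")
	}
	return nil
}

// run returns the process exit code: 0 on success, 1 on any failure.
func run(migs []migration) int {
	applied, err := applyAll(migs, fakeExec)
	if err != nil {
		fmt.Fprintf(os.Stderr, "applied %d, then: %v\n", applied, err)
		return 1
	}
	fmt.Printf("applied %d migrations\n", applied)
	return 0
}

func main() {
	code := run([]migration{
		{File: "000001_create_users.up.sql", SQL: "CREATE TABLE users (id bigint PRIMARY KEY);"},
		{File: "000002_add_preferred_locale.up.sql", SQL: "BROKEN ALTER TABLE users;"},
	})
	fmt.Println("exit code:", code) // a real command would call os.Exit(code)
}
```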

Most teams end up with two separate jobs:

a pull request check that runs migrations on a fresh database
a release job that applies approved migrations in the target environment

That split keeps feedback fast in review and keeps deploy steps predictable. It also helps when several services share one repo, because you can test only the migrations that belong to the service under review while still keeping a full release path.

Local use should match CI as closely as possible. Developers should run the same command, with the same config shape, on their own machines. If CI needs special wrappers, custom flags, or hidden setup steps, debugging gets messy fast.

This is one place where boring wins. The better choice is often the tool that does less, reports more clearly, and fits into one repeatable command your team can run on a laptop, in pull requests, and in release automation.

What changes in a multi-service repo

A single app with one database is usually easy. A busy monorepo is not. Once several services ship code on different schedules, migration files stop feeling like simple SQL and start acting like shared infrastructure.

One shared migrations folder works when one team owns the database and deploys changes in one path. Everyone can see the full history, and rollback order is clear. The downside is friction. Two pull requests can both add the next version, and one of them will have to renumber or rebase.
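
The version collision described above can be caught before merge. The following is a sketch only: it assumes file names start with a version followed by an underscore, and the function name and sample names are invented for illustration. It treats an .up.sql and .down.sql pair as one migration, so paired layouts do not trigger false alarms.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// duplicateVersions returns, per version prefix, the migration names that
// claim it, but only for versions claimed by more than one migration.
func duplicateVersions(files []string) map[string][]string {
	names := map[string]map[string]bool{} // version -> set of migration names
	for _, f := range files {
		base := f
		// Check the specific suffixes first, since both also end in ".sql".
		for _, suffix := range []string{".up.sql", ".down.sql", ".sql"} {
			if strings.HasSuffix(base, suffix) {
				base = strings.TrimSuffix(base, suffix)
				break
			}
		}
		version, _, ok := strings.Cut(base, "_")
		if !ok {
			continue // not a versioned migration file
		}
		if names[version] == nil {
			names[version] = map[string]bool{}
		}
		names[version][base] = true
	}
	dups := map[string][]string{}
	for version, set := range names {
		if len(set) < 2 {
			continue
		}
		for name := range set {
			dups[version] = append(dups[version], name)
		}
		sort.Strings(dups[version])
	}
	return dups
}

func main() {
	// Two hypothetical pull requests both claimed version 000003.
	files := []string{
		"000002_create_orders.up.sql", "000002_create_orders.down.sql",
		"000003_add_locale.up.sql", "000003_add_locale.down.sql",
		"000003_add_index.up.sql", "000003_add_index.down.sql",
	}
	for version, names := range duplicateVersions(files) {
		fmt.Println("version", version, "claimed by", names)
	}
}
```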

A folder per service feels cleaner when each service owns its own database or schema. Teams move faster because they do not wait on each other for every change. That model breaks down when services share tables. Then the repo layout says one thing while the database reality says another.

Shared tables need shared rules. If two services depend on the same schema, one owner should approve the migration order, compatibility window, and cleanup timing. Without that, the tool choice matters less than coordination. Even a good migration library cannot fix a team that changes shared tables without a rollout plan.

This is also where strict naming helps. Predictable versioning, clear ownership, and one place that actually runs migrations make a multi-service repository much easier to live with.

A simple way to test tools in your repo

Skip toy examples. Take one migration you already need from your backlog and test two Go migration tools against that exact change. A good candidate is small enough to run in minutes, but real enough to expose problems, such as adding a column, backfilling data, or changing an index on a table your app touches often.

Use a fresh local database for each run. Old tables, leftover locks, and half-failed test data hide the behavior you actually need to see. If your repo has more than one service, run the migration the way your team would run it for real, with the same config files, startup order, and database user.

A quick bake-off usually tells you more than a long feature list:

Run tool A and tool B on the same fresh database and measure the obvious stuff: setup time, command clarity, and how easy it is to see which version ran.
Start two migration runners at the same time. One can come from your app container, the other from a local shell or second service. Watch what happens to lock behavior.
Force a failed migration. Add a bad SQL statement on purpose or stop the process halfway through. Then inspect what the tool leaves behind: partial tables, dirty state flags, or clear retry steps.
Fix the broken migration and run it again. This part matters more than the happy path. Teams remember the tool that makes recovery predictable.
Put the better tool into a throwaway CI job before you commit. That test shows whether it works well in CI, not just on your laptop.

If you finish this test in one afternoon, you will know more than you would from reading ten comparison pages. The winner is usually the tool your team can rerun without guessing, not the one with the longest feature list.

A realistic team scenario

A team has an API service, a worker, and one Postgres database. They add a new feature that stores a user's preferred locale, so they need one new column and a backfill for older rows.

The first migration should stay small: add preferred_locale as nullable. Do not combine that with a huge update in the same step. If the API goes out first and starts reading a column that does not exist yet, requests fail. If the migration goes out first, nothing breaks, because the old code can ignore an extra nullable column.

The next deploy changes the API code. It writes the new column for fresh updates, but it still works when preferred_locale is empty. That matters because old rows still need data.
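
In Go, tolerating an empty preferred_locale usually comes down to scanning the column into a nullable type and choosing a fallback. The sketch below uses sql.NullString from the standard library. The helper name and the "en" fallback are assumptions made for this example.

```go
package main

import (
	"database/sql"
	"fmt"
)

// Sample values as a row scan might produce them.
var (
	unsetLocale  = sql.NullString{}                          // old row, column still NULL
	germanLocale = sql.NullString{String: "de", Valid: true} // row written by the new code
)

// effectiveLocale returns the stored preference when one is present and the
// fallback otherwise, so reads behave the same before and after the backfill.
func effectiveLocale(pref sql.NullString, fallback string) string {
	if pref.Valid && pref.String != "" {
		return pref.String
	}
	return fallback
}

func main() {
	fmt.Println(effectiveLocale(unsetLocale, "en"))  // en
	fmt.Println(effectiveLocale(germanLocale, "en")) // de
}
```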

The backfill should run after that, usually in the worker or a one-off job, not inside the schema migration itself. Batch updates keep locks short and avoid a long deploy that sits on Postgres for minutes. A migration file that both adds a column and rewrites millions of rows is hard to review and even harder to roll back.
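
A batched backfill of this kind might look roughly like the loop below. Treat it as a sketch under assumptions: the users table, its id column, the 'en' default, the batch size, and the function names are all invented here. The updateBatch callback is where a real worker would execute the statement and return the affected row count; the demo uses a fake so it runs without a database.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// backfillBatchSQL is one possible batch statement. It only touches rows that
// are still NULL, so running it again after a crash is safe.
const backfillBatchSQL = `
UPDATE users SET preferred_locale = 'en'
WHERE id IN (
    SELECT id FROM users WHERE preferred_locale IS NULL LIMIT $1
)`

// backfill calls updateBatch until it reports zero rows, pausing between
// batches so other traffic gets a turn. It stops early if ctx is cancelled.
func backfill(ctx context.Context, batchSize int, pause time.Duration,
	updateBatch func(ctx context.Context, limit int) (int64, error)) (int64, error) {
	var total int64
	for {
		n, err := updateBatch(ctx, batchSize)
		if err != nil {
			return total, fmt.Errorf("backfill stopped after %d rows: %w", total, err)
		}
		total += n
		if n == 0 {
			return total, nil
		}
		select {
		case <-ctx.Done():
			return total, ctx.Err()
		case <-time.After(pause):
		}
	}
}

// fakeTable simulates a table with the given number of rows still NULL.
func fakeTable(remaining int64) func(context.Context, int) (int64, error) {
	return func(_ context.Context, limit int) (int64, error) {
		n := int64(limit)
		if remaining < n {
			n = remaining
		}
		remaining -= n
		return n, nil
	}
}

// backfillDemo processes 2500 simulated rows in batches of 1000.
func backfillDemo() (int64, error) {
	return backfill(context.Background(), 1000, time.Millisecond, fakeTable(2500))
}

func main() {
	total, err := backfillDemo()
	fmt.Println(total, err)
}
```

Because each batch is its own short statement, the job can be stopped, throttled by raising the pause, or restarted without touching the schema migration.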

After the backfill finishes, the API can switch to reading preferred_locale first. Only in a later release should the team make the column required or remove any old field that fed the backfill.

For this sort of change, a SQL-first tool with explicit up and down files is easy to live with. golang-migrate is a good fit if the team wants strict migration pairs and very plain CI behavior. goose also works well if the repo already mixes SQL and a little Go.

That choice keeps review simple. One file adds the column, another handles any follow-up schema step, and the backfill stays in worker code where developers can stop, retry, or throttle it. If the feature goes wrong, the team can roll back the API, stop the worker, and decide whether the column should stay for safety or be removed in a clean follow-up migration.

Common mistakes teams make

Most migration problems start with habits, not with the tool. Teams often blame Go migration tools when the real issue is how they write and run changes.

The first mistake is packing too much into one migration. A single file that adds columns, moves data, rebuilds indexes, and drops old tables is hard to review and even harder to undo. Small steps take longer to write, but they fail in smaller ways, and that matters a lot in production.

Another common mess is mixing data repair work into schema files with no retry plan. A data update can fail halfway because of bad rows, timeouts, or a lock that lasts longer than expected. If the script is not safe to run again, the team now has a half-changed database and no clean path forward.

Running migrations from every service at startup is also asking for trouble. In a repo with several apps, two instances can try to migrate at the same time. Even if the library has locking, startup becomes noisy and brittle. One service or one CI job should own the migration step.
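
One simple way to give the migration step a single owner is an explicit switch that only the migration job sets. The sketch below assumes an environment variable named RUN_MIGRATIONS, which is a made-up name for this example, not a convention of any tool.

```go
package main

import (
	"fmt"
	"os"
)

// shouldMigrate reports whether this process owns the migration step.
// RUN_MIGRATIONS is a hypothetical variable that only the dedicated
// migration job would set; ordinary service instances leave it unset.
func shouldMigrate(getenv func(string) string) bool {
	return getenv("RUN_MIGRATIONS") == "true"
}

func main() {
	if shouldMigrate(os.Getenv) {
		fmt.Println("running migrations, then exiting")
		return
	}
	fmt.Println("skipping migrations, starting service")
}
```

Passing the lookup function in, instead of calling os.Getenv directly, keeps the decision easy to test.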

Teams also skip down files even when the library expects them. That looks fine until someone needs a quick rollback during a bad deploy. If the rollback path is empty, broken, or never tested, the tool cannot save you.

I have seen this exact pattern on a small team: one big SQL file, startup migrations in two services, and tests only on a developer laptop that already had old tables lying around. Everything looked fine until a clean staging database exposed missing steps.

A better routine is simple:

keep each migration narrow
separate schema changes from risky data fixes
run migrations once, in one controlled place
test on a clean database and on a copy with old data

That takes a bit more discipline, but it cuts a lot of avoidable drama.

A short checklist before you choose

The best tool is the one your team can use on a busy Tuesday without breaking production. A long feature list matters less than a few plain checks that catch trouble early.

Before you choose among Go migration tools, test them against your real repo, not a demo app:

Start from an empty database. A new developer should clone the repo, run one command, and get the schema the app expects.
Break a migration on purpose in CI. The job should stop fast, print the failed file, and show a clear error.
Run two migration jobs at the same time. The tool's lock should let only one runner apply migrations at a time.
Open the repo tree and imagine six months of changes. Service folders, shared migrations, and file names should still make sense.
Put an owner on every migration. That can be a team, a service, or one engineer.

A small test tells you more than a comparison chart. If a tool lets a new hire build from zero, helps CI fail loudly, handles concurrent runners, stays readable, and makes ownership obvious, it will probably hold up in a real team.

If one of those checks fails now, it will fail at 5 p.m. on release day too.

Next steps for your team

Choose one tool, then stop comparing tools for a while. Most teams get more safety from a clear routine than from chasing the "best" library. If you already narrowed down a few options, pick the one that fits your rollback rules, your lock tolerance, and the way you ship code.

Write that decision down in one short team note. Set one migration style, who writes migrations, when they run, and what the team does if a deploy fails. If two services touch the same database, ask both service owners to agree on those rules now, not during an outage.

A small CI path is the safest next move:

apply migrations to a fresh database
start the app and run a smoke test
test the rollback path if your tool supports it
fail the pipeline on lock waits or timeouts

Keep this path separate from production deploys at first. After a few clean runs, move it into the main pipeline and make it part of every release.
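
The last item in the list above, failing on lock waits or timeouts, can be approximated in Go with a context deadline around the migration step. This is a sketch with invented names and an arbitrary 50 ms budget for the demo. On Postgres, the lock_timeout setting is a complementary guard inside the database, since it makes a statement give up instead of queuing behind a lock.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runWithDeadline gives the migration step a hard time budget and reports a
// timeout as an error, so the pipeline fails instead of hanging on a lock.
func runWithDeadline(parent context.Context, budget time.Duration,
	step func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, budget)
	defer cancel()
	err := step(ctx)
	if errors.Is(err, context.DeadlineExceeded) {
		return fmt.Errorf("migration exceeded %s budget: %w", budget, err)
	}
	return err
}

// blockedStep simulates a migration stuck waiting on a lock: it does nothing
// until its context is cancelled, then returns the context error.
func blockedStep(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// timedOut runs the blocked step with a short budget and reports whether the
// result is a deadline error.
func timedOut() bool {
	err := runWithDeadline(context.Background(), 50*time.Millisecond, blockedStep)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("timed out:", timedOut())
}
```

For the deadline to have any effect in a real job, the step has to pass the context through to its database calls.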

Teams with many services should review lock behavior and rollback policy together. One service that keeps a lock too long can block another team's deploy, even when both teams did their own work correctly. A 30-minute review with every service owner can save a very long afternoon later.

If your team wants an outside review, Oleg Sotnikov at oleg.is works as a Fractional CTO and startup advisor and helps teams tighten migration flow, CI, and infrastructure without adding a lot of extra process.

The best plan is usually the boring one. Pick a tool, test it in CI, agree on rules, and make schema changes feel routine.

Frequently Asked Questions

Which Go migration tool should most teams start with?

For most teams, start with a simple SQL-first tool. golang-migrate fits well if you want strict .up.sql and .down.sql files. goose or dbmate work well if you want a lighter workflow and easy-to-read files.

Pick the one your team can run the same way on a laptop, in CI, and in production. Consistent habits matter more than extra features.

Should we rely on down migrations in production?

Use down migrations for small schema changes like adding an index or a nullable column. For data changes, treat production as fix-forward unless you have tested the rollback on production-like data.

A down file can reverse SQL, but it cannot always undo what your app already wrote.

Why do database changes break in production when staging looked fine?

Production has more rows, more traffic, and tighter timing. A migration that looks harmless in staging can hold locks longer or touch far more data in production.

That is why small steps and staged releases work better than big all-in-one changes.

Where should we put backfills?

Keep the schema migration small and move the backfill into app code, a worker, or a one-off job. Run it in batches so Postgres does not sit under long locks.

If the backfill fails halfway, you want a job you can stop, retry, or throttle without rerunning the schema step.

Should every service run migrations on startup?

No. Let one job or one service own migrations. If every service runs them at startup, two instances can collide and turn deploys into guesswork.

A single migration step keeps logs cleaner and makes failures easier to debug.

What should CI test for database migrations?

Start with one simple check: build an empty database from scratch on every pull request. Then make sure the job fails with a clear file name and SQL error when something breaks.

It also helps to run a release job that applies approved migrations in the target environment, separate from review checks.

How do we test lock behavior before production?

Run two migration commands against the same fresh database at nearly the same time. One runner should wait or fail cleanly instead of changing the schema at the same time as the other.

Also test what happens after a crash. You want a clear way to remove stale locks and retry.

What folder structure works best in a multi-service repo?

If each service owns its own database or schema, a folder per service is usually fine. If services share tables, keep shared migrations under one clear owner and one release path.

The folder layout matters less than ownership and deploy order. Shared tables need shared rules.

When is Atlas worth the extra structure?

Atlas makes sense when schema drift already hurts or your team wants generated diffs and stricter rules. It asks more from the team than a thin migration runner.

If your schema work is still straightforward, a simpler tool often gives you fewer surprises.

What is the safest way to roll out a new column?

Add the new column as nullable first. Deploy app code that can handle both old and new states. Backfill later in batches, then make the column required only after the data is ready.

That order keeps deploys calm because old code can ignore the extra column while new code starts writing it.