Sep 25, 2025·8 min read

Terraform state backend: GitLab, S3, or Terraform Cloud

Compare GitLab, S3, and Terraform Cloud for a small team. Learn how Terraform state backend choices affect locking, access, backups, and recovery.

Why state storage becomes a team problem

A local state file feels fine when one person builds everything. That changes as soon as a second person runs plan or apply. At that point, the Terraform state backend stops being a personal convenience and becomes shared infrastructure.

The state file is Terraform's memory. It tells Terraform what already exists and how real infrastructure maps to your code. It usually includes resource IDs, tracked attributes, dependency details, outputs, and sometimes sensitive values.

That matters because cloud infrastructure does not tolerate confusion. If Terraform loses track of a database, load balancer, or VPC ID, the next apply can try to create a duplicate or replace something that should stay in place.

Small teams run into this faster than they expect. One engineer changes a security group. Another updates an instance size from stale state. The second apply can overwrite the first change or fail halfway through. Then nobody trusts the result, and cleanup takes longer than the original work.

State also needs tighter handling than many teams assume. Provider data, generated passwords, private endpoints, account IDs, and internal resource names can end up in the file or in the backups around it. Even when Terraform redacts a value marked sensitive in plan or output, the state file still stores that value in plain text. Stored state therefore deserves the same care as other sensitive operational data.
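A state file is plain JSON, so it is easy to check what it actually holds. The sketch below works on a made-up sample that assumes the version 4 state layout, with top-level resources and outputs keys. It lists resource addresses and any outputs flagged sensitive. The values of those outputs are still readable in the file. The helper names are illustrative only.

```python
import json

# Made-up, trimmed-down state file in Terraform's version 4 JSON layout.
SAMPLE_STATE = """
{
  "version": 4,
  "serial": 12,
  "lineage": "example-lineage",
  "outputs": {
    "db_endpoint": {"value": "db.internal.example:5432", "type": "string"},
    "db_password": {"value": "s3cr3t-example", "type": "string", "sensitive": true}
  },
  "resources": [
    {"mode": "managed", "type": "aws_db_instance", "name": "main",
     "instances": [{"attributes": {"id": "db-123"}}]},
    {"module": "module.network", "mode": "data", "type": "aws_vpc", "name": "default",
     "instances": [{"attributes": {"id": "vpc-456"}}]}
  ]
}
"""


def resource_addresses(state: dict) -> list[str]:
    """Rebuild addresses such as module.network.data.aws_vpc.default (instance keys ignored)."""
    addresses = []
    for res in state.get("resources", []):
        module = f"{res['module']}." if "module" in res else ""
        kind = "data." if res.get("mode") == "data" else ""
        addresses.append(f"{module}{kind}{res['type']}.{res['name']}")
    return addresses


def sensitive_outputs(state: dict) -> dict[str, object]:
    """Outputs marked sensitive, with the values the file still holds in plain text."""
    return {
        name: out.get("value")
        for name, out in state.get("outputs", {}).items()
        if out.get("sensitive")
    }


state = json.loads(SAMPLE_STATE)
print(resource_addresses(state))  # ['aws_db_instance.main', 'module.network.data.aws_vpc.default']
print(sensitive_outputs(state))   # {'db_password': 's3cr3t-example'}
```

The "sensitive" flag only changes what Terraform prints. It does not change what the file stores.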

That is why backend choice is really about team rules. You want one shared source of truth, a lock that blocks two applies at once, and access controls that match who should read or change production. You also want a dull recovery plan: where backups live, who can restore them, and how the team tests that process before something breaks.

For a small team, simple wins. One source of truth, clear permissions, and a recovery drill save more pain than any clever workflow.

What to compare before you choose

State files look boring until two people run apply at once or someone exposes a secret. For a small team, the best backend is usually the one that makes mistakes harder to make and recovery boring.

Start with locking. If one person starts a change, can the backend block a second write right away? If a lock gets stuck, does the team know how to clear it safely? Weak locking feels acceptable right up until a long Friday evening proves otherwise.
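At its core, state locking is an atomic "take the lock only if nobody holds it" check. The sketch below imitates that check with an in-memory table, using a threading.Lock in place of the backend's atomic conditional write. LockTable, its methods, and the LockInfo fields are hypothetical. They illustrate the idea and are not any backend's real API.

```python
import threading
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LockInfo:
    lock_id: str
    who: str
    operation: str
    created: float = field(default_factory=time.time)


class LockTable:
    """Hypothetical in-memory stand-in for a backend's lock store."""

    def __init__(self) -> None:
        self._locks: dict[str, LockInfo] = {}
        # Plays the role of the backend's atomic conditional write.
        self._guard = threading.Lock()

    def acquire(self, state_path: str, who: str, operation: str) -> Optional[LockInfo]:
        """Take the lock only if nobody holds it; otherwise return None right away."""
        with self._guard:
            if state_path in self._locks:
                return None  # the second writer fails fast instead of overwriting state
            info = LockInfo(lock_id=str(uuid.uuid4()), who=who, operation=operation)
            self._locks[state_path] = info
            return info

    def release(self, state_path: str, lock_id: str) -> bool:
        """Release only when the caller presents the ID of the lock it holds."""
        with self._guard:
            current = self._locks.get(state_path)
            if current is None or current.lock_id != lock_id:
                return False
            del self._locks[state_path]
            return True


table = LockTable()
first = table.acquire("prod/network", who="alice", operation="apply")
second = table.acquire("prod/network", who="ci-job-42", operation="apply")
print(first is not None, second is None)  # True True
table.release("prod/network", first.lock_id)
```

The important design choice is that the second caller gets an immediate refusal. It is never allowed to wait and then write over the first caller's changes.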

Then look at access in plain language. Who can read state, who can write it, and who can delete old versions? State often holds enough detail to cause real trouble, so broad access is a bad habit. If you can split read and write permissions, do it.

Version history matters more than most teams expect. You want old snapshots, a clear audit trail, and backups that do not depend on one laptop or one admin remembering a manual step. Automatic versioning can save you after a bad import, a force-unlock, or a rushed state edit.

Recovery needs to be a real process, not a vague idea. Ask a blunt question: if someone damages the state today, how do you restore yesterday's version? The answer should cover who approves the restore, where the older copy lives, and how the team checks that restored state still matches real infrastructure.

Be honest about admin work too. S3 usually means extra setup around IAM, versioning, encryption, and locking. GitLab can feel simpler if your team already runs GitLab CI and backs up its GitLab data well. Terraform Cloud removes much of the setup, but you need to be comfortable with its access model and the way it runs changes.

A good Terraform state backend is not the one with the longest feature list. It is the one your team can still run calmly six months later, with clear access rules and a recovery path that does not depend on memory.

GitLab as the backend

For a small team, GitLab is often the easiest backend to live with if you already keep code and CI there. You do not need another account, another bill, or another permissions model. State stays tied to the same project where people review merge requests and run pipelines.

GitLab stores state in its own managed storage. On GitLab.com, GitLab handles that storage for you. On a self-managed setup, the state lives wherever your GitLab instance stores it, whether that is local disk or object storage behind the instance. That detail matters because your recovery plan depends on how GitLab admins back up that storage, not just on Terraform.

Access and locking

GitLab keeps access control fairly simple. Most teams already understand project roles, and those roles often map well enough to state access. That said, you still need to check who can trigger applies and who can read sensitive outputs.

Locking works well for normal day-to-day use. If one pipeline or user is running terraform apply, GitLab can lock the state so a second run does not write at the same time. That alone saves small teams from the classic "two people applied at once" mess. It does not fix bad habits, though. If someone kills a job halfway through, the team still needs a clear rule for checking and clearing stale locks.
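A written rule for stale locks can be as simple as this: force-unlock only when the run that took the lock is confirmed dead and the lock is old enough. The sketch below encodes that rule. The one-hour threshold, the job_still_running flag, and the function name are assumptions for illustration. The actual clearing step would still be terraform force-unlock with the lock ID, and only after this check passes.

```python
from datetime import datetime, timedelta, timezone


def safe_to_force_unlock(
    created: datetime,
    job_still_running: bool,
    now: datetime,
    min_age: timedelta = timedelta(hours=1),  # assumed team threshold
) -> tuple[bool, str]:
    """Apply a hypothetical team rule before anyone clears a lock by hand."""
    if job_still_running:
        return False, "the run holding the lock is still active; wait or cancel it first"
    age = now - created
    if age < min_age:
        minutes = int(age.total_seconds() // 60)
        return False, f"lock is only {minutes} minutes old; confirm with its owner"
    return True, "holder is gone and the lock is stale; force-unlock, then run a fresh plan"


now = datetime(2025, 9, 25, 18, 0, tzinfo=timezone.utc)
for created, running in [
    (now - timedelta(minutes=10), False),
    (now - timedelta(hours=3), True),
    (now - timedelta(hours=3), False),
]:
    print(safe_to_force_unlock(created, running, now))
```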

Backup and recovery

Before you standardize on GitLab state, answer a few practical questions. Who backs up the GitLab storage that holds state? How often does the team test a restore instead of just creating backups? Can you export current state quickly if you later move to S3 or Terraform Cloud? Who can unlock state after a failed pipeline? Where do recovery notes live so the whole team can follow them?

GitLab fits best when the team is small, GitLab already runs your repo and CI, and you want one place to manage access. It is a sensible default for a startup with two to eight engineers. If you already run self-hosted GitLab with CI/CD, which many lean teams do, keeping state there removes one more moving part without making daily work harder.

S3 as the backend

S3 gives a small team a lot of control. That control is useful when most of your infrastructure already runs in AWS, but it also means you need to make a few decisions early instead of patching them later.

The first rule is simple: keep state access separate from Git access. A developer may need to read a GitLab repo and review merge requests, but that does not mean they should read or change production state. Use AWS IAM roles for people and CI jobs, and treat the S3 bucket like production infrastructure, not like another project file.

Set bucket rules on day one. Teams that postpone this usually keep moving with weak defaults. Turn on versioning, block public access, enable server-side encryption, limit access to named roles, and make sure audit logs exist.
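These day-one rules are easy to turn into a review check. The sketch below assumes you have already collected the bucket's settings into a plain dictionary, for example from your own infrastructure code or an audit export. The key names and principal strings are hypothetical and do not match any AWS API response.

```python
# Hypothetical setting names mapped to the fix each one needs.
REQUIRED_SETTINGS = {
    "versioning_enabled": "turn on versioning",
    "public_access_blocked": "block public access",
    "encryption_enabled": "enable server-side encryption",
    "audit_logging_enabled": "make sure audit logs exist",
}


def bucket_gaps(settings: dict, allowed_principals: set[str]) -> list[str]:
    """Return the day-one rules a state bucket does not meet yet."""
    gaps = [fix for key, fix in REQUIRED_SETTINGS.items() if not settings.get(key, False)]
    # Anyone with access who is not on the named-role list is a gap too.
    extra = set(settings.get("principals_with_access", [])) - allowed_principals
    if extra:
        gaps.append("remove access for: " + ", ".join(sorted(extra)))
    return gaps


settings = {
    "versioning_enabled": True,
    "public_access_blocked": True,
    "encryption_enabled": False,
    "audit_logging_enabled": True,
    "principals_with_access": ["role/terraform-ci", "role/platform-admin", "user/temp-contractor"],
}
print(bucket_gaps(settings, {"role/terraform-ci", "role/platform-admin"}))
# ['enable server-side encryption', 'remove access for: user/temp-contractor']
```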

Versioning matters more than many teams expect. State changes often, and a bad write, broken migration, or rushed manual fix can leave a mess behind. With object versioning, recovery is usually a controlled rollback instead of a late-night rebuild.
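With versioning on, a rollback starts by picking the right older object. The sketch below assumes a list of version records. Each record holds the object version ID, its timestamp, and the serial and lineage fields read from that state file. The function picks the newest version written before the bad change that still belongs to the same lineage. The record type and its values are made up. The lineage filter mirrors a real Terraform safeguard: terraform state push refuses a state with a different lineage unless forced.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StateVersion:
    version_id: str      # object version ID as reported by the bucket (made-up values below)
    written_at: datetime
    serial: int          # read from the state file; Terraform bumps it when state changes
    lineage: str         # read from the state file; fixed for the life of one state


def last_good_version(
    versions: list[StateVersion], bad_write_at: datetime, lineage: str
) -> Optional[StateVersion]:
    """Newest version written before the bad change that belongs to the same state lineage."""
    candidates = [v for v in versions if v.written_at < bad_write_at and v.lineage == lineage]
    return max(candidates, key=lambda v: (v.serial, v.written_at), default=None)


versions = [
    StateVersion("v-a", datetime(2025, 9, 24, 10, 0), 40, "lin-1"),
    StateVersion("v-b", datetime(2025, 9, 24, 12, 0), 41, "lin-1"),
    StateVersion("v-x", datetime(2025, 9, 24, 14, 0), 3, "lin-other"),  # a different state copied in by mistake
    StateVersion("v-c", datetime(2025, 9, 24, 15, 0), 42, "lin-1"),     # the bad write
]
choice = last_good_version(versions, bad_write_at=datetime(2025, 9, 24, 15, 0), lineage="lin-1")
print(choice.version_id if choice else "no safe version found")  # v-b
```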

S3 also forces you to think about locking. A bucket on its own does not stop two applies from writing at the same time. Many teams pair S3 with DynamoDB locking so concurrent runs fail fast instead of corrupting state. Recent Terraform releases can also lock through a lock file stored in the bucket itself, using the S3 backend's use_lockfile option. If your team uses GitLab CI, locking matters even more, because a retry or second pipeline can start before anyone notices.

Someone should own IAM roles, encryption settings, and audit logs. In a small team, that is often a senior engineer or the person acting as fractional CTO. If nobody owns it, permissions spread, temporary exceptions become permanent, and nobody knows who changed what.

S3 makes the most sense when you want tighter access control, customer-managed encryption, or clear recovery steps inside AWS. A team running AWS workloads with GitLab CI often lands here: GitLab handles code, AWS IAM handles state access, S3 stores the file, DynamoDB handles locking, and version history gives you a practical recovery path.

Terraform Cloud as the backend

Terraform Cloud gives a small team the fastest path to shared state. You create a workspace, connect your repo if you want remote runs, add variables, and start planning and applying without building your own storage, locking, or access rules.

That matters as soon as more than one person touches the same infrastructure. Terraform Cloud keeps the state file in the workspace and runs plans on its side. One workspace processes one run at a time, so two engineers do not write state at once. For a small team, that removes a common source of broken state with very little setup.

Permissions are also easier than in most self-managed options. You can give one group read access, let another queue plan runs, and limit apply rights to the people who own production. That split works well when a founder, one engineer, and an outside advisor all need different levels of control.

State history is straightforward too. Terraform Cloud keeps older state versions, so if a bad apply lands, you can review what changed, compare versions, and restore an earlier state. The safe habit is simple: restore the older version, run a fresh plan, and confirm Terraform shows only the changes you expect before anyone applies again.
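Before restoring, it helps to see which resources appeared or disappeared between two versions. The sketch below diffs two state snapshots by resource address. It assumes both are already loaded as dictionaries in the version 4 layout, and the sample data is made up. Treat it as a rough review aid. It does not replace reading a fresh plan.

```python
def addresses(state: dict) -> set[str]:
    """Resource addresses in a version 4 state dict (instance keys ignored)."""
    found = set()
    for res in state.get("resources", []):
        module = f"{res['module']}." if "module" in res else ""
        kind = "data." if res.get("mode") == "data" else ""
        found.add(f"{module}{kind}{res['type']}.{res['name']}")
    return found


def diff_states(older: dict, newer: dict) -> dict:
    """Summarize what appeared or disappeared between two state versions."""
    old, new = addresses(older), addresses(newer)
    return {
        "serials": (older.get("serial"), newer.get("serial")),
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }


older = {"serial": 41, "resources": [
    {"mode": "managed", "type": "aws_sqs_queue", "name": "jobs"},
    {"mode": "managed", "type": "aws_ecs_service", "name": "api"},
]}
newer = {"serial": 42, "resources": [
    {"mode": "managed", "type": "aws_ecs_service", "name": "api"},
    {"mode": "managed", "type": "aws_lb", "name": "public"},
]}
print(diff_states(older, newer))
# {'serials': (41, 42), 'added': ['aws_lb.public'], 'removed': ['aws_sqs_queue.jobs']}
```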

The biggest advantage is saved setup time. You do not need to wire up S3, DynamoDB, bucket policies, encryption settings, or GitLab storage rules. You also get remote runs, variables, run history, and policy controls in one place. If your team wants a clean process quickly, Terraform Cloud is hard to beat.

The trade-off is less control. Your workflow depends on another service, and some teams do not like that. Costs can also rise as usage grows. Remote runs may feel restrictive if you need custom network access, unusual tooling, or strict rules about where state and secrets live.

For a small team that wants fewer moving parts, Terraform Cloud is often the least painful choice. For a team that already runs strong AWS or GitLab infrastructure and wants tighter control, it can feel limiting.

A simple decision path for a small team

For a small team, the right backend is usually the one you can keep boring. If storage, locking, and recovery depend on one person's memory, you do not have a standard yet.

Start by mapping who actually runs Terraform. Team size matters, but habits matter more. Maybe one engineer still runs plans and applies by hand. Maybe several engineers review code, but only one or two people apply. Maybe CI applies after approval. Maybe contractors need read access, or someone outside engineering needs audit visibility. Those details matter more than product marketing pages.

Once you know who does what, set one clear rule for writes. A simple rule works well: many people can read state, very few can change it, and the backend accepts writes only from approved CI jobs or named operators. That cuts down on accidental applies and makes incident review much easier.

Then choose the smallest setup your team can support. If your team already works in GitLab every day and understands its permissions model, GitLab may be the easiest fit. If AWS is where you already run most infrastructure, S3 may feel more natural. If nobody wants to run backend plumbing, Terraform Cloud may save time. Pick the option your team already understands.
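This decision path can be written as a few ordered rules. The sketch below is a deliberately simplified, hypothetical encoding of the rules of thumb in this article. The inputs are yes/no answers about your team. The ordering of the checks is an assumption, and the output is a starting suggestion, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class TeamProfile:
    uses_gitlab_daily: bool
    infra_mostly_on_aws: bool
    nobody_wants_backend_plumbing: bool
    needs_tight_aws_access_control: bool


def suggest_backend(team: TeamProfile) -> str:
    """One possible ordering of the article's rules of thumb (hypothetical)."""
    if team.nobody_wants_backend_plumbing:
        return "Terraform Cloud"
    if team.infra_mostly_on_aws and team.needs_tight_aws_access_control:
        return "S3"
    if team.uses_gitlab_daily:
        return "GitLab"
    if team.infra_mostly_on_aws:
        return "S3"
    return "Terraform Cloud"  # fallback: least setup when nothing else pulls the team


# A small startup that lives in GitLab and runs on AWS, with no special access needs.
startup = TeamProfile(
    uses_gitlab_daily=True,
    infra_mostly_on_aws=True,
    nobody_wants_backend_plumbing=False,
    needs_tight_aws_access_control=False,
)
print(suggest_backend(startup))  # GitLab
```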

Do not move every workspace at once. Test one low-risk project first, such as a staging environment or an internal service. Run a few normal changes, one failed run, and one lock or access error on purpose. Small tests expose messy details fast.

Before a full rollout, write recovery steps in plain language and store them where the team can find them. The document should answer five questions quickly: who can unlock state, where the backup copy lives, how to confirm the last good version, when to stop applies and inspect drift, and who makes the final call during recovery.

If your team cannot answer those points in two minutes, wait before you standardize. A calm setup beats a clever one every time.

A realistic example

Picture a four-person startup with one AWS account. The founder set up Terraform early and kept the state file on a laptop. At first, that works. One person makes the changes, one person remembers what happened, and nothing collides.

That stops working when the second engineer joins. Now two people can run plan and apply from different machines with different copies of the same state. One engineer adds an SQS queue. A few hours later, another changes an ECS service from an older local state file. The team starts asking in chat, "Did you already apply that?" That is the moment local state stops being convenient and starts being risky.

In that case, GitLab is often the sensible choice if the startup already keeps code and CI there. It gives the team one place to store the repo, run pipelines, and keep state under access control. S3 can work well, but for a small team it usually means more setup across buckets, IAM, and locking. Terraform Cloud is also a solid option, but it adds another tool before the team has a clear reason to do that.

At this stage, a simple rule is enough. Only GitLab CI should run apply on the main branch. Engineers can run plan in feature branches. Only two people should be able to unlock state or change backend credentials.
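A rule this small fits in one function that a pipeline or review script could call. The sketch below is hypothetical: the actor names, the "ci:" prefix for pipeline identities, and the branch check are all assumptions. Real enforcement would still live in GitLab protected branches, environments, and project roles.

```python
UNLOCKERS = {"alice", "bob"}  # hypothetical: the only two people who may unlock state


def is_allowed(action: str, actor: str, branch: str) -> bool:
    """Hypothetical check mirroring the rule: CI applies on main, anyone plans, two people unlock."""
    is_ci = actor.startswith("ci:")  # assumed naming convention for pipeline identities
    if action == "plan":
        return True  # engineers may plan on feature branches
    if action == "apply":
        return is_ci and branch == "main"
    if action in {"unlock", "change_backend_credentials"}:
        return actor in UNLOCKERS
    return False  # deny anything the rule does not mention


for action, actor, branch in [
    ("plan", "carol", "feature/sqs-queue"),
    ("apply", "carol", "main"),
    ("apply", "ci:pipeline", "main"),
    ("unlock", "carol", "main"),
]:
    print(action, actor, branch, "->", is_allowed(action, actor, branch))
```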

That one rule removes most "who changed this?" problems in a single move.

The team should also run a restore drill. Once a month, use a staging stack and practice recovery from a bad state change. Save the latest good state snapshot, confirm that live AWS resources still match it, then rehearse restoring that snapshot and run a fresh plan. If the plan shows unexpected deletes, stop and fix the gap before production ever sees it.
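The "stop on unexpected deletes" step can be automated. Running terraform show -json on a saved plan file produces JSON with a resource_changes list. Each entry has an address and a change.actions list, such as ["delete"], or ["delete", "create"] for a replacement. The sketch below scans that structure. The sample plan is made up, and the set of expected deletes is an assumption you would fill in for each drill.

```python
import json

# Made-up excerpt of `terraform show -json` output for a saved plan.
SAMPLE_PLAN_JSON = """
{
  "resource_changes": [
    {"address": "aws_sqs_queue.jobs", "change": {"actions": ["no-op"]}},
    {"address": "aws_ecs_service.api", "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}}
  ]
}
"""


def unexpected_deletes(plan: dict, expected: set[str]) -> list[str]:
    """Addresses the plan would delete or replace that are not on the expected list."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        if "delete" in rc["change"]["actions"] and rc["address"] not in expected:
            flagged.append(rc["address"])
    return flagged


plan = json.loads(SAMPLE_PLAN_JSON)
problems = unexpected_deletes(plan, expected=set())
if problems:
    print("Stop: plan would delete or replace", ", ".join(problems))
```

In a drill, the input would come from terraform plan -out=drill.tfplan followed by terraform show -json drill.tfplan.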

They can move later if the setup outgrows this shape. Common triggers are a second AWS account, separate teams touching the same infrastructure, stricter approval rules, or a need for deeper audit and policy controls. Until then, GitLab keeps the process simple and harder to misuse.

Mistakes that cause pain later

A Terraform backend usually fails in boring ways first. The team gets work done, nobody writes down the rules, and six months later nobody knows who can read state, who can unlock it, or how to recover it after a bad run.

Giving read access to everyone is one of the most common mistakes. State files can expose resource IDs, private endpoints, and often secrets such as generated passwords or database credentials. A developer who only needs plan output does not need raw state access, and neither does every CI job.

Teams also skip versioning because the backend feels like enough protection on its own. It usually is not. If you use S3, turn on versioning. If you use GitLab state or Terraform Cloud state, learn exactly what history and recovery options you have before you need them. A lost or damaged state file can freeze changes across the whole team.

Another problem is the unwritten rulebook. People say things like "don't force unlock unless you're sure" or "only run applies from this pipeline," but they never put that into a short document. Then one failed apply turns into a long Slack thread while everyone guesses. Write down who can unlock state, who can move it, and what to do after a partial apply.

Mixing production and test state in one place also creates trouble. It feels simpler at the start, especially for a small team, but one wrong workspace, path, or variable set can point test changes at production state. Keep environments separate from day one. Separate storage paths help, and separate permissions help even more.
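One way to keep environments apart is to derive the state location from the environment name and refuse anything that does not match. The sketch below assumes a hypothetical project/env/terraform.tfstate key layout and a CI step that knows which environment it targets. Adapt both to your backend.

```python
ALLOWED_ENVS = {"dev", "staging", "prod"}


def state_key(project: str, env: str) -> str:
    """Hypothetical layout: one state object per project and environment."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{project}/{env}/terraform.tfstate"


def check_target(project: str, env: str, configured_key: str) -> None:
    """Fail before plan or apply if the configured state key belongs to another environment."""
    expected = state_key(project, env)
    if configured_key != expected:
        raise RuntimeError(
            f"state key {configured_key!r} does not match environment {env!r} (expected {expected!r})"
        )


check_target("billing", "staging", "billing/staging/terraform.tfstate")  # passes silently
try:
    check_target("billing", "staging", "billing/prod/terraform.tfstate")
except RuntimeError as err:
    print("Refusing to run:", err)
```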

The mistake that hurts most is waiting for an outage before you test restore. Recovery is not theory. Pick a non-production stack and practice it: restore an older state version, run plan and check what changed, confirm the team can handle locks cleanly, and document the exact steps that worked.

That one drill often exposes the real gaps. Maybe backups exist but nobody can reach them. Maybe only one person knows the process. Maybe the team learns that Terraform state recovery is slower than expected. It is much better to discover that on a quiet afternoon than during a production incident.

Quick checks before you standardize

A backend choice gets expensive to change once several projects rely on it. Most trouble starts after setup, when a lock sticks, a secret leaks, or the one person who knows the recovery steps is on vacation.

Pick one owner for the setup. That person should create the backend, write the team note, and keep the first version simple. Ownership matters because small gaps pile up fast: a missing bucket policy, a token with too much access, or no written restore process.

Then remove the hero problem. At least two people should know how to unlock state safely, pull a backup, and restore the last good version. If only one person can do it, your team does not have a process. It has a weak spot.
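The "at least two people" rule is easy to check against a simple roster. The sketch below uses a hypothetical mapping from each recovery task to the people who have actually practiced it. It reports every task that depends on fewer than two people.

```python
MIN_PEOPLE = 2  # the article's threshold: no single-person recovery tasks

# Hypothetical roster: who has actually practiced each task.
roster = {
    "unlock state safely": {"alice", "bob"},
    "pull a backup": {"alice"},
    "restore last good version": set(),
}


def weak_spots(roster: dict[str, set[str]], minimum: int = MIN_PEOPLE) -> list[str]:
    """Tasks that fewer than `minimum` people can perform."""
    return [task for task, people in roster.items() if len(people) < minimum]


print(weak_spots(roster))  # ['pull a backup', 'restore last good version']
```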

A short checklist is enough. Limit write access to the few people and CI jobs that actually run applies. Keep one backend pattern across repos unless you have a clear reason to break it. Store recovery steps where the team can find them fast. Test the lock flow on purpose. Rehearse a restore with a low-risk project.

One small drill can save hours later. Take a non-critical workspace, create a fresh backup, simulate a bad change, and restore it with a second team member following the written steps. If the notes are vague or permissions fail, fix that now, not during an outage.

Keep the backend boring, limit who can change it, and make recovery routine. For a small team, that usually matters more than chasing the perfect feature list.

What to do next

A small team does better with one clear choice than with a debate that never ends. Pick the backend that fits your current size, risk, and habits, then write down why you chose it in plain language. Two or three sentences are enough if they cover locking, access control, backups, and who will support it.

Do not start with your most sensitive project. Run a short pilot on a low-risk stack first, such as a dev environment or an internal tool. That gives you a safe place to test state locking, team permissions, and the exact recovery steps if someone damages a state file or deletes access by mistake.

A simple rollout works best. Choose one backend and record the reason. Test it on one small project for a week or two. Fix permission gaps before moving more stacks. Document restore steps with the rest of your team docs. Rehearse a restore every few months.

Access usually causes more trouble than the backend itself. Before you migrate more projects, check who can read state, who can write it, and who can change backend settings. If those rules are fuzzy, slow down and fix them first. A messy permission model can turn a routine change into a long afternoon.

Recovery deserves a real drill, not a note in a wiki that nobody has tested. Pick a quiet day, restore a recent version, and confirm that the team can get back to a working state without guessing. If the drill takes an hour of confusion, that tells you something useful: the recovery process still needs work.

If you want an outside review before you standardize, Oleg Sotnikov at oleg.is advises startups and small teams on backend choices, access rules, and recovery plans. That kind of review can help when you are choosing between GitLab state, S3, and Terraform Cloud and want to avoid redoing the decision a few months later.