Feb 04, 2025·8 min read

Terraform for teams that still move fast by hand and where to start

Terraform for teams works best when you start with the parts that change often. Learn a simple rollout, common mistakes, and a quick checklist.

Why the all-at-once approach breaks down

A full Terraform rewrite looks neat in a planning doc. On a real team, it usually slows everything down.

Infrastructure changes faster than most teams document it. Someone fixes a production issue in the cloud console. Someone else updates a script on a laptop. A third person changes a setting during a late-night release and forgets to mention it. A few days later, the live setup has moved on, but the Terraform project still describes last Tuesday.

That gap gets expensive fast. Terraform plans changes by comparing your code with its recorded state and the real infrastructure, so it works best when all three still agree. When hand edits pile up across consoles, scripts, and emergency fixes, the first big import gets messy. People spend hours asking, "Why does Terraform want to replace this?" instead of shipping.

The biggest mistake is trying to codify everything at once. Teams pull in networks, databases, queues, secrets, DNS, monitoring, and app deploys in one sweep. It sounds efficient, but it freezes normal work for too long. Product teams still need to release. Support still needs quick fixes. Engineers start treating Terraform like paperwork instead of help.

Trust drops the moment the tool starts fighting daily work. If a simple change now needs a long plan review, state repair, or three people on a call, the team goes back to hand edits. They may not say it directly, but the message is clear: Terraform is getting in the way.

You see this a lot in startups and small product teams. They run on speed, memory, and habit. That can be messy, but it keeps the business moving. A forced all-at-once rollout removes that flexibility before the new process has proved itself.

A better start is smaller and less dramatic. Pick the parts that change often, break often, or waste the most time when people do them by hand. Teams accept Terraform much faster when it removes pain they already feel.

What to codify first

Start with the parts your team changes all the time. If someone edits the same security group, DNS record, queue, database setting, or app service every week, that area is a better first target than a large network nobody has touched in a year.

Repetition is the giveaway. When people do the same setup by hand again and again, they stop reading each screen carefully. One missed checkbox can break deploys, block traffic, or leave staging different from production for days. Terraform works best here because it removes work people already dislike.

Keep the first scope small enough to finish in days, not months. Pick one app, one environment, or one cloud account. That gives you a clear boundary and makes review easier. If something goes wrong, the impact stays small.

A good first batch is usually simple: app settings that change during releases, DNS records for one product, queues or buckets tied to one app, access rules the team keeps recreating, or alerts for the same environment.

Skip old systems that only get attention when they fail. They often carry years of hidden assumptions and no clear owner. If you start there, the rollout turns into a cleanup project, and cleanup projects drag.

Look for places where mistakes show up fast. Staging is often a smart start because teams touch it often and notice drift quickly. Production can work too, if the changes are narrow and well understood, such as a fixed set of DNS records or a small group of scheduled jobs.

A simple test helps: ask, "If we leave this manual for three more months, what keeps going wrong?" Start there. If the answer is "nothing, unless it breaks once a year," leave it alone for now.
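If your team keeps even a rough record of hand changes and breakages, you can turn the three-month question into a quick ranking. The sketch below is a minimal illustration only. It assumes a hypothetical list of `(area, kind)` events pulled from tickets or chat, where `kind` is either a routine hand edit or a breakage. The weights are arbitrary placeholders, not a recommendation.

```python
from collections import Counter

# Hypothetical event log: (area, kind) pairs from the last few months.
# kind is "edit" for a routine hand change, "break" for something that went wrong.
EVENTS = [
    ("staging-dns", "edit"), ("staging-dns", "edit"), ("staging-dns", "break"),
    ("billing-queue", "edit"), ("billing-queue", "edit"),
    ("billing-queue", "edit"), ("billing-queue", "edit"),
    ("shared-network", "break"),
]

# Placeholder weights: a breakage counts more than a routine edit.
WEIGHTS = {"edit": 1, "break": 3}


def rank_candidates(events):
    """Return (area, score) pairs sorted by weighted pain score, highest first."""
    scores = Counter()
    for area, kind in events:
        scores[area] += WEIGHTS.get(kind, 0)
    # Sort by score descending, then by name so ties print in a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for area, score in rank_candidates(EVENTS):
    print(f"{area}: {score}")
```

With this sample data, the network that only shows up after one rare failure lands last, which matches the "leave it alone for now" answer.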

One SaaS team started by codifying only the infrastructure for a single customer-facing app in one environment. They left shared networking and legacy servers alone. Within two weeks, deploy prep got faster, review got easier, and nobody had to guess which settings changed last Friday.

That is enough for a first win. Once one clean area is under Terraform, the next area gets easier because the team already has patterns, naming rules, and a better sense of what should stay manual for a while.

Set a clear boundary before you start

Teams get stuck when Terraform changes and hand edits overlap without rules. One person updates a load balancer in the cloud console, another runs apply, and now nobody trusts the result.

Start with one slice of infrastructure and make the line obvious. Good first slices are small, change often, and have limited risk. A staging environment, one new service, or one DNS zone is usually a better first target than the whole production account.

Write the boundary in plain English before you write any code. Keep it short enough that a new engineer can read it in a minute and know where changes belong.

A simple boundary note

Your note can be as simple as this:

  • Terraform creates and updates the staging app servers, security groups, and DNS records.
  • Engineers can still change production secrets and emergency firewall rules by hand.
  • One person approves changes for this area and decides what moves into code next.
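Some teams also keep a machine-readable copy of the boundary next to the code, so scripts and reviewers apply the same rule. This is a minimal sketch that assumes a hypothetical `BOUNDARY` structure mirroring the note above. The environment names, resource categories, and the `OWNER` value are illustrative only.

```python
# Hypothetical machine-readable copy of the boundary note.
OWNER = "alex"  # hypothetical person who approves changes for this area

BOUNDARY = {
    "staging": {
        "terraform": {"app_servers", "security_groups", "dns_records"},
        "manual_ok": set(),
    },
    "production": {
        "terraform": set(),
        "manual_ok": {"secrets", "emergency_firewall_rules"},
    },
}


def where_does_it_change(env, category):
    """Say whether a change belongs in Terraform, by hand, or with the owner."""
    rules = BOUNDARY.get(env)
    if rules is None:
        return f"ask {OWNER}"
    if category in rules["terraform"]:
        return "terraform"
    if category in rules["manual_ok"]:
        return "manual"
    # Anything not written down is a question for the owner, not a guess.
    return f"ask {OWNER}"
```

The design choice is deliberate. Anything outside both lists goes to one person instead of being decided ad hoc.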

That owner matters more than most teams expect. It does not mean one person does all the work. It means one person answers the annoying questions quickly: "Should this live in Terraform?" "Can we still patch this manually today?" "Who fixes drift if someone changes it by hand?"

Use plain naming rules from day one. Pick one pattern for folders, workspaces, and state files, then keep it boring. Environment plus service usually does the job, such as staging-api or prod-billing. If names need a long explanation, simplify them.
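A naming rule is easiest to keep boring when a script checks it. The sketch below assumes the environment-plus-service pattern from the examples above and a hypothetical list of allowed environment prefixes. Adjust both to whatever your team already uses.

```python
import re

# Assumed allowed environments; names look like "<env>-<service>".
ALLOWED_ENVS = ("staging", "prod")
NAME_PATTERN = re.compile(
    r"^(?P<env>[a-z]+)-(?P<service>[a-z][a-z0-9]*(?:-[a-z0-9]+)*)$"
)


def check_name(name):
    """Return a list of problems with a folder, workspace, or state name."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return [f"{name!r} does not match <env>-<service> in lowercase"]
    if match.group("env") not in ALLOWED_ENVS:
        return [f"{name!r} uses unknown environment {match.group('env')!r}"]
    return []  # empty list means the name follows the rule
```

For example, `check_name("staging-api")` returns an empty list. `check_name("api-production")` is flagged because `api` is not a known environment prefix.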

Do the same for state. One boundary should map to one state file unless you have a strong reason not to. Mixing unrelated resources into one state makes even small changes feel risky and slows every review.
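One way to keep the one-boundary, one-state rule visible is to list which state key each boundary uses and flag any key shared by more than one boundary. This is a minimal sketch. It assumes you keep a hypothetical mapping from boundary name to backend state key (for example, the `key` argument of an S3 backend). The `<name>/terraform.tfstate` layout is only an illustration.

```python
from collections import defaultdict


def shared_state(mapping):
    """Return state keys used by more than one boundary, with the boundaries."""
    users = defaultdict(list)
    for boundary, key in mapping.items():
        users[key].append(boundary)
    # Only keys with two or more boundaries break the one-to-one rule.
    return {key: sorted(names) for key, names in users.items() if len(names) > 1}


# Hypothetical mapping where two boundaries were pointed at the same state.
STATE_KEYS = {
    "staging-api": "staging-api/terraform.tfstate",
    "staging-dns": "staging-api/terraform.tfstate",
    "prod-billing": "prod-billing/terraform.tfstate",
}
```

Running `shared_state(STATE_KEYS)` would flag the staging key, because `staging-api` and `staging-dns` share it.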

Terraform gets easier when the first boundary is small enough to finish quickly. If your first rollout needs imports from ten old systems, several exceptions, and a meeting for every apply, cut the scope in half. Finishing one clean area teaches the team more than half-codifying the whole company.

A good first win feels almost boring. That is usually a sign you picked the right boundary.

How to move one area into Terraform

Pick one area that people touch often and that has a clear edge. A small network, one app environment, a DNS zone, or a group of cloud buckets works well. If you start with half the company stack, you will spend weeks arguing about edge cases instead of getting anything under control.

Start by listing every resource in that area. Use the cloud console, billing view, tags, and team notes. The goal is simple: know what exists, what it does, who uses it, and whether anyone still needs it.
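The inventory does not need a special tool. A list of records with the four questions above is enough to show what is still unknown. The sketch below assumes a hypothetical record layout. The field names are not from any cloud provider's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InventoryItem:
    """One row of the pre-migration inventory (the field layout is an assumption)."""
    name: str
    purpose: str = ""                    # what it does
    users: str = ""                      # who or what depends on it
    still_needed: Optional[bool] = None  # None means nobody has confirmed yet


def open_questions(items):
    """List the gaps to answer before writing Terraform for this area."""
    gaps = []
    for item in items:
        if not item.purpose:
            gaps.append(f"{item.name}: purpose unknown")
        if not item.users:
            gaps.append(f"{item.name}: no known users")
        if item.still_needed is None:
            gaps.append(f"{item.name}: nobody confirmed it is still needed")
    return gaps
```

Any resource that produces several gaps is a good candidate for the cleanup step before it goes anywhere near Terraform.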

Clean up obvious mess before you write much Terraform. Old names, test copies, and duplicate resources turn a small move into a confusing one. If two buckets do the same job, pick one and retire the other. If names look random, rename them now, or at least document the mapping so the Terraform code stays readable.

Import or recreate

Each resource needs a plain decision: import it into state, or rebuild it from code. Import works better for things that must stay live, like a production database, load balancer, or long-used DNS record. Recreate makes more sense for cheap, low-risk items where a clean reset saves time.

A simple rule works well. Import resources that carry live traffic or hard-to-replace data. Recreate resources that are disposable and easy to test. Delay messy edge cases if they block the rest of the move.
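The rule above fits in a few lines, which makes it easy to apply the same way across a long inventory. This is a minimal sketch with a hypothetical `Resource` record. The fallback to "import" for anything not clearly disposable is an assumption added here, not part of the rule above. The import itself would then happen with `terraform import` or, in Terraform 1.5 and later, an `import` block.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    address: str                    # hypothetical address, e.g. "aws_db_instance.main"
    live_traffic: bool              # serves users right now
    hard_to_replace_data: bool      # losing it would hurt
    easy_to_test: bool              # cheap to rebuild and check
    blocks_migration: bool = False  # messy edge case holding up the rest


def decide(resource):
    """Return "import", "recreate", or "defer" for one resource."""
    if resource.blocks_migration:
        return "defer"
    if resource.live_traffic or resource.hard_to_replace_data:
        return "import"
    if resource.easy_to_test:
        return "recreate"
    # Assumption: when a resource is not clearly disposable, importing is safer.
    return "import"
```

For example, a production database with live traffic comes back as "import". A throwaway test queue comes back as "recreate".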

Before anyone applies changes, run the plan as part of review and have another person read it. The plan should show only what you expect. If Terraform wants to replace half the area, stop and fix the code, the state, or the resource settings before you touch production.
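To make unexpected replacements harder to skim past, a script can read the JSON form of a saved plan and list every delete or replacement. This is a minimal sketch. It assumes the plan was saved with `terraform plan -out=tfplan` and converted with `terraform show -json tfplan > plan.json`. It reads the `resource_changes` list and each entry's `change.actions` from Terraform's JSON plan format, where a replacement appears as both a delete and a create.

```python
import json


def risky_changes(plan):
    """Return (address, label) pairs for deletes and replacements in a JSON plan."""
    risky = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions and "create" in actions:
            risky.append((rc["address"], "replace"))
        elif "delete" in actions:
            risky.append((rc["address"], "delete"))
    return risky


def load_plan(path):
    """Load the output of `terraform show -json tfplan` from a file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Usage would look like `risky_changes(load_plan("plan.json"))`. An empty result does not replace a human read of the plan, but a non-empty one is a clear signal to stop.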

Do the first live apply in a quiet window, not during a launch, migration, or sales push. Watch logs, health checks, and alerts right away. Small drift shows up fast when names, tags, or dependencies do not match what Terraform expects.

Write rollback steps before the first apply. Keep them boring and specific: which settings you will restore by hand, which backups you checked, who approves the rollback, and how you will confirm the service still works. That document often saves more time than the Terraform code itself.
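If rollback notes live in a structured file or ticket template, a tiny check can refuse to start an apply while any of the four answers above is blank. This is a minimal sketch. The field names are hypothetical and simply mirror the questions in the paragraph above.

```python
# Hypothetical required fields, one per question in the rollback note.
REQUIRED = ("restore_by_hand", "backups_checked", "approver", "health_check")


def missing_rollback_fields(note):
    """Return the required rollback fields that are empty or absent."""
    return [field for field in REQUIRED if not note.get(field)]


# Hypothetical note with the approver still missing.
NOTE = {
    "restore_by_hand": "re-create the api DNS record from the saved export",
    "backups_checked": "database snapshot taken before the apply",
    "approver": "",
    "health_check": "health endpoint returns 200 and error rate stays flat",
}
```

Here `missing_rollback_fields(NOTE)` would return `["approver"]`, which is enough to pause until someone is named.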

How to handle hand changes during the transition

Most teams still need a few direct changes while they migrate. Incidents happen, a customer needs a quick fix, or someone spots an open security group late in the day. If you pretend nobody will touch the console, you get hidden drift and arguments later.

Terraform stays manageable only when everyone knows when hand edits are still acceptable. Set that rule per area, not for the whole company at once. You might allow temporary DNS edits for one sprint, while compute, networking, and IAM changes must go through code from day one.

A shared log matters more than a fancy workflow. Keep every manual change in one place the whole team checks, such as a ticket board or a note in the repo. Record who changed it, what changed, why they did it, and when the team will pull it back into Terraform.

Do not let urgent fixes sit outside code for days. If someone increases database storage to stop an outage at noon, they should open the Terraform change before the day ends, or before the next apply at the latest. Small gaps turn into confusing plans very quickly.
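The shared log and the same-day rule can be checked together. This is a minimal sketch that assumes a hypothetical `ManualChange` record with the who/what/why fields from the log above and a `codified_at` time set once the Terraform change lands. The 12-hour limit is an assumed stand-in for "before the day ends". Before any apply, `uncodified` should return nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ManualChange:
    """One entry in the shared manual-change log (field names are assumptions)."""
    who: str
    what: str
    why: str
    changed_at: datetime
    codified_at: Optional[datetime] = None  # set when the Terraform change lands


def uncodified(log):
    """Entries that still live only in the console; should be empty before apply."""
    return [entry for entry in log if entry.codified_at is None]


def overdue(log, now, max_age=timedelta(hours=12)):
    """Uncodified entries older than max_age (an assumed same-day limit)."""
    return [entry for entry in uncodified(log) if now - entry.changed_at > max_age]
```

The two functions map to the two deadlines in the text. `overdue` catches fixes that missed the end of the day. `uncodified` is the hard stop before the next apply.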

A simple operating rule

  • Allow manual changes only in areas the team named in advance.
  • Write every console edit in the shared log.
  • Turn emergency fixes into Terraform code right away.
  • Review drift on a schedule, even when nothing seems wrong.
  • Once Terraform owns an area, stop editing it by hand.

A short drift review each week is enough for many small teams. Run a plan, compare it with the change log, and fix any mismatch while people still remember what happened.
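Part of the weekly review can be scripted. This is a minimal sketch. It assumes Terraform is on the PATH and relies on `terraform plan -detailed-exitcode`, which exits 0 when there is no diff, 1 on error, and 2 when changes are present. The comparison step also assumes a hypothetical convention: each log entry records the Terraform address it touched, and you have extracted the changed addresses from the plan.

```python
import subprocess


def plan_status(workdir):
    """Run a plan and translate -detailed-exitcode into a status string."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    # 0 = no diff, 2 = diff present, anything else (usually 1) = error.
    return {0: "clean", 2: "changes"}.get(result.returncode, "error")


def unexplained(changed_addresses, logged_addresses):
    """Addresses the plan wants to change that nobody wrote in the log."""
    return sorted(set(changed_addresses) - set(logged_addresses))
```

A "changes" status can also mean someone merged code that was never applied, so treat it as a prompt to look, not proof of drift. If you only want to see edits made outside Terraform, `terraform plan -refresh-only` narrows the plan to differences between state and real infrastructure.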

When a folder, module, or service is fully under Terraform, close the door on console edits for that scope. That is usually the point where manual changes stop helping and start wasting time.

A realistic first rollout

A small startup with one product, one staging setup, and one production setup does not need a grand Terraform program. It needs one clean first win.

A sensible rollout starts in staging, where mistakes cost less. The team picks the parts they touch often: DNS records, app secrets, and the queues behind background jobs. Those pieces change enough to justify code, and they are simple enough to move without weeks of prep.

They do not try to pull every old decision into Terraform. Existing networking rules stay manual for now, especially if they grew over time and nobody wants to risk a surprise outage. That may feel messy, but it is better than dragging a fragile setup into code before the team understands it.

The first pass is boring on purpose. The team writes Terraform for staging DNS, stores secrets in the chosen secret system, and defines queues with clear names and limits. Then they apply, check the app, and make a few small fixes.

For the next two weeks, they keep using the same pattern. If someone needs a new staging queue or a DNS change, they make it in Terraform instead of clicking around. That short stretch matters because it shows whether the workflow works under normal pressure, not just in a quiet demo.

Once staging stays stable for two weeks, the team repeats the pattern in production. They do not redesign everything. They copy the structure, swap values, review it carefully, and keep the scope narrow. That is when incremental Terraform adoption starts to feel useful instead of theoretical.

Some things can wait. Rare billing settings, account-level preferences, and odd admin pages often sit untouched for months. Putting them in code on day one adds work without solving a real problem.

This kind of rollout is plain, but it works. Teams that move fast by hand usually need less ambition and more repetition. Oleg Sotnikov often uses the same approach in infrastructure work for smaller companies: move the parts people change every week, prove the process, then widen the boundary only after the team trusts it.

Mistakes that slow teams down

Teams usually do not fail because Terraform is hard. They fail because they turn a small migration into a company-wide rewrite.

The most common mistake is scope. A team starts with one pain point, like repeat server setup, then decides to model every network, database, secret, and policy before the first useful apply. Work piles up, review gets slow, and nobody trusts the plan output anymore.

Another mistake is mixing cleanup with migration. If your cloud account has old names, unused security groups, odd tags, and forgotten test resources, fix those in a separate track. Do not rewrite history and move live systems into Terraform at the same time. That turns every change into a debate.

A few simple rules prevent most of the chaos:

  • Keep the first rollout narrow, and choose one area that changes often and hurts when done by hand.
  • Decide where state lives before people write much code. Shared remote state beats five local copies on laptops.
  • Set naming rules early. If one team uses "prod-api" and another uses "api-production," the mess shows up in every plan.
  • Give each Terraform area a clear owner.
  • Once Terraform owns a resource, stop fixing it in the console.

State and ownership rules sound boring, but they save real time. If two engineers run changes against the same resources with different state files, one of them will eventually replace something by accident. That is not a Terraform problem. It is a team rule problem.

A small example makes this obvious. Say your team moves one staging app into Terraform. On Tuesday, someone changes an environment variable in the cloud console to test a bug fix. On Thursday, another engineer runs apply from code that still has the old value. The app flips back, the bug returns, and now people blame the tool.

That is how rollouts stall. Terraform can only manage what the team agrees to manage, in one place, with one source of truth.

Quick checks before each apply

Fast teams get into trouble when one Terraform apply quietly turns into three changes at once. Before you run anything, make sure the apply covers one clear area and nothing else. If it touches a database, a DNS record, and a load balancer in one go, split it up.

That boundary matters more than many teams think. Small applies are easier to review, easier to explain, and much easier to undo when something goes wrong.

A short checklist keeps people honest:

  • Confirm the change affects one defined area only.
  • Read the plan line by line with another person.
  • Check that names match the ones your team already uses.
  • Make sure rollback steps are written down and easy to follow.
  • Ask whether anyone made hand changes since the last apply.
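The first item on that list can be checked mechanically if each area lives under its own module. This is a minimal sketch. It assumes the hypothetical convention that every resource in an area sits under a module address such as `module.staging_api`. It also assumes you pass in the changed addresses taken from the JSON plan.

```python
def area_of(address):
    """Map a resource address to its top-level module, or "root"."""
    parts = address.split(".")  # naive split: for_each keys containing dots are not handled
    if parts[0] == "module" and len(parts) > 1:
        # "module.staging_api[0].aws_lb.main" -> "module.staging_api"
        return "module." + parts[1].split("[")[0]
    return "root"


def areas_touched(addresses):
    """Set of areas a plan would change."""
    return {area_of(address) for address in addresses}


def one_area_only(addresses):
    """True when the apply stays inside a single area (or changes nothing)."""
    return len(areas_touched(addresses)) <= 1
```

If `one_area_only` returns False, that is the signal from the text to split the apply before anyone reads the plan in detail.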

Plan review should be slower than feels comfortable. Skimming is how teams miss a renamed resource, a forced replacement, or a tag change that breaks a script later. A careful five-minute review saves a lot more than five minutes.

Names matter too. If your servers, buckets, or queues use one naming pattern in day-to-day work, Terraform should use the same one. A new naming style might look cleaner on paper, but it confuses people during incidents.

Rollback needs to be boring. Someone on the team should know exactly what to do if the apply fails halfway through. "We can probably fix it" is not a rollback plan. "Revert this variable, run this apply, and restore this setting" is.

Hand changes are the last trap. In teams that still move fast by hand, someone often updates a setting in the cloud console and forgets to mention it. Then Terraform plans a surprise change because the real system no longer matches the code. Ask about those edits every time, even if it feels repetitive.

One simple example: if a teammate changed an autoscaling limit by hand last night, raise it before the next apply and either update the code to match or agree to revert the change. Otherwise, Terraform may switch it back and create a problem you did not mean to cause.

What to do next

Start with one area you can finish soon. A single service, one database, or one staging environment is enough. Put a finish date on it, even if it is only 10 business days away. Deadlines force choices, and early Terraform work usually stalls because teams leave the scope open.

Keep the first pass plain. You do not need perfect modules, naming rules for every future case, or a big cleanup project first. You need one area where the team can say, "this now lives in Terraform," and mean it.

Before more people touch it, write simple ownership rules. Keep them short so people will read them: who can change the code, who can approve applies, what stays manual for now, and what to do if someone makes an urgent hand change.

Those rules save more time than clever code. Without them, teams drift back to chat messages, quick console edits, and mystery changes nobody remembers a week later.

After the first area is stable, copy the same rollout pattern to the next one. Do not redesign the process every time. Pick a boundary, migrate it, run it for a bit, fix the rough edges, then move on. Repetition is what turns infrastructure as code into normal team behavior instead of a side project.

Keep reviews short. A good Terraform review should answer a few plain questions: what changes, who owns it, and what could break? If reviews turn into long architecture debates, the team will start avoiding Terraform.

A small example works well here. If your team moves one queue worker into Terraform this month and one background job cluster next month, that is real progress. Two finished areas beat a half-planned company-wide rollout every time.

If you are unsure where to draw the first boundary, or you want a lean rollout that does not slow delivery, Oleg Sotnikov at oleg.is offers Fractional CTO and startup advisory help. That kind of outside input is most useful early, when a small decision can save weeks of cleanup later.

Frequently Asked Questions

Why is moving everything into Terraform at once a bad idea?

Because big rewrites slow normal work and expose every hidden hand edit at once. Start with one small area that changes often, prove the workflow, and let the team build trust before you widen the scope.

What should we codify first?

Pick the part your team changes every week and keeps messing up by hand. DNS for one product, one app environment, queues, buckets, or repeat access rules usually make better first targets than old shared networking nobody touches.

Should we begin in staging or production?

Start in staging if your team uses it often and can absorb a small mistake. Move to production after the same pattern works for a couple of weeks and the scope stays narrow.

How small should the first rollout be?

Keep it small enough to finish in days, not months. One service, one environment, or one cloud account usually gives you a clean edge and keeps reviews and rollback simple.

Should we import existing resources or recreate them?

Import anything that carries live traffic or hard-to-replace data, like a production database or long-used DNS record. Recreate cheap, disposable resources when a clean rebuild saves time.

How do we handle console changes during the transition?

Allow manual edits only in named areas and log every change in one shared place. Then turn each emergency fix into Terraform code right away so the next plan does not surprise anyone.

Who should own the first Terraform scope?

Give one person clear ownership of the first Terraform area. That owner answers scope questions, decides what stays manual for now, and keeps drift and review from turning into team-wide debates.

What should we check before running apply?

Read the plan line by line with another engineer, confirm the apply touches one defined area, and make sure rollback steps are written down. Also ask whether anyone changed something by hand since the last apply.

What should we leave out of Terraform for now?

Leave low-change, messy legacy systems for later. If a system only gets attention once a year and nobody fully understands it, you will spend more time cleaning history than fixing current pain.

When should we move to the next area?

Expand only after the first area stays stable under normal team pressure, not just in a quiet test. When people stop reaching for the console and the code matches reality, copy the same rollout pattern to the next area.