Feb 05, 2026·8 min read

Move one workload off cloud by starting with steady jobs

Learn how to move one workload off cloud with less risk by starting with logging, CI, or storage, testing recovery first, and keeping scope small.

Why moving too much at once backfires

When a team moves several workloads off the cloud at the same time, it loses the chance to learn from the change. Every problem blends into one noisy mess. Did costs stay high because storage needed faster disks, because CI runners sat idle, or because logs grew faster than expected? If you change all three together, you never get a clean answer.

That confusion hurts planning. Teams often start with a simple goal: cut cloud spend. Then the project grows. They rebuild networks, change backups, adjust alerts, rewrite access rules, and replace old scripts. Cost work turns into reliability work, security work, and process work all at once. Soon nobody can say which change saved money and which one added risk.

Rollback also gets messy fast. One workload is usually easy to reverse. You keep the cloud version alive for a short time, switch traffic back, and review what failed. Several workloads create dependencies. CI may publish artifacts to new storage. Logging may feed the alerting setup. A rollback for one system now drags other systems with it, and that is when rushed mistakes show up.

Small outages spread farther than most teams expect. A logging gap sounds minor until support loses context during an incident. A CI slowdown looks tolerable until developers wait 20 extra minutes on every merge. A storage hiccup can block builds, backups, and file access at the same time. The technical issue may stay small, but the team impact grows quickly because more people have to stop and coordinate.

The safer approach is to keep the first move boring. Pick a system with steady traffic, clear ownership, and a simple recovery path. That gives you real numbers on cost, uptime, and team effort. More important, it gives the team one change they understand well enough to repeat.

What makes a good first workload

A good first move is usually a boring one. That is a compliment. If a workload behaves almost the same every day, you can plan capacity, test recovery, and spot trouble early.

Steady traffic matters more than raw size. A service with a predictable stream of jobs, logs, or file writes is easier to run than one that suddenly spikes because of a product launch, a batch import, or customer activity you cannot control. For a first move, you want fewer surprises, not a harder puzzle.

Data growth should also be easy to estimate. If storage goes up by about the same amount each week, you can buy disk, set retention rules, and know when you will need more space. If data jumps around with no pattern, small planning mistakes turn into late-night cleanup.
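As a rough sketch of that estimate, the function below works out how many weeks of disk you have left. It assumes growth really is linear and that you want to keep some space free as headroom. The function name, the 20% default headroom, and the sample numbers are illustrative, not taken from a real system.

```python
import math


def weeks_until_full(capacity_gb: float, used_gb: float,
                     weekly_growth_gb: float, headroom: float = 0.2) -> int:
    """Whole weeks before usage crosses the usable limit.

    The usable limit is total capacity minus a free-space reserve
    (headroom), so the answer arrives before the disk is truly full.
    """
    if weekly_growth_gb <= 0:
        raise ValueError("weekly_growth_gb must be positive")
    usable_gb = capacity_gb * (1 - headroom)
    remaining_gb = usable_gb - used_gb
    if remaining_gb <= 0:
        return 0  # already past the limit: buy disk or trim retention now
    return math.floor(remaining_gb / weekly_growth_gb)


# Hypothetical numbers: a 2 TB volume, 600 GB used, growing 50 GB a week.
print(weeks_until_full(capacity_gb=2000, used_gb=600, weekly_growth_gb=50))
```

If the weekly growth figure swings so much that this number changes every time you run it, that is the "no pattern" case described above.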

Ownership is another filter teams skip. Pick something one team already knows well and can test without waiting on five other groups. If the same team owns the workload, the deployment, and the rollback plan, decisions get faster and mistakes get fixed faster too.

The safest first workload is one you can rebuild without drama. That does not mean backups do not matter. It means you know exactly how to restore service, reload data if needed, and confirm the result is correct. Self-hosted CI is a common example because runners, pipelines, and caches are often easier to recreate than customer-facing databases.

A short delay during cutover should also be acceptable. Internal systems usually fit better than public ones. If a team can wait ten minutes for builds to resume or logs to catch up, the move stays manageable. If every extra minute costs sales or triggers support tickets, save that workload for later.

A good first candidate usually has most of these traits:

Daily load looks similar week to week.
Storage growth is easy to forecast.
One team can run tests and approve changes.
Backup and restore steps are already written down.
A brief cutover delay will not hurt users.

If a workload misses three or more of those five, pick another one.
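The five traits can be written down as a small checklist so the decision is recorded, not argued from memory. This is a minimal sketch: the field names are my own shorthand for the traits above, and the only rule it encodes is the "three misses" cutoff.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkloadTraits:
    # One flag per trait from the list above (names are illustrative).
    steady_daily_load: bool
    forecastable_storage_growth: bool
    single_team_ownership: bool
    restore_steps_documented: bool
    cutover_delay_tolerable: bool


def missed_traits(traits: WorkloadTraits) -> list[str]:
    """Names of the traits this workload does not have."""
    return [f.name for f in fields(traits) if not getattr(traits, f.name)]


def is_good_first_candidate(traits: WorkloadTraits, max_misses: int = 2) -> bool:
    """Reject a workload that misses three or more of the five traits."""
    return len(missed_traits(traits)) <= max_misses


# Hypothetical example: CI with no written restore steps yet.
ci = WorkloadTraits(True, True, True, False, True)
print(missed_traits(ci), is_good_first_candidate(ci))
```

The list of missed traits is more useful than the yes or no answer, because it tells you what to fix before the move.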

Three workloads that usually fit first

If you want an off-cloud move that does not turn into a rescue project, start with something dull. A steady job with known load gives you cleaner numbers, fewer surprises, and a much easier rollback.

Logging is often a good first pick. Most teams already know their daily write volume, how long they keep data, and who needs access. That makes self-hosted logging easier to size than a customer app that can jump from quiet to overloaded in minutes. Retention rules help too. If you keep 14 or 30 days of logs, you can estimate storage, backup windows, and recovery time without much guesswork.
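The retention estimate is simple multiplication. The sketch below compares a 14-day and a 30-day window. The daily ingest figure and the overhead factor (extra space for indexes or replicas) are assumptions you would replace with your own measurements.

```python
def retained_log_storage_gb(daily_ingest_gb: float, retention_days: int,
                            overhead_factor: float = 1.0) -> float:
    """Disk needed to hold a full retention window of logs.

    overhead_factor is an assumed multiplier for indexes, replicas,
    or filesystem overhead; 1.0 means raw log volume only.
    """
    if daily_ingest_gb < 0 or retention_days < 0 or overhead_factor <= 0:
        raise ValueError("inputs must be non-negative, overhead must be positive")
    return daily_ingest_gb * retention_days * overhead_factor


# Hypothetical: 20 GB of logs per day, 50% overhead for indexes.
for days in (14, 30):
    size = retained_log_storage_gb(daily_ingest_gb=20, retention_days=days,
                                   overhead_factor=1.5)
    print(f"{days} days of retention needs about {size:.0f} GB")
```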

CI is another strong option. Self-hosted CI runs in batches, not as a constant public service. Builds start when someone pushes code, then the system goes quiet again. That rhythm gives teams room to pause jobs, limit concurrency, and test with one runner before moving the whole pipeline. CI is also easier to measure, cap, and recover without putting customers in the middle, which is one reason experienced operators often start there. Oleg Sotnikov, for example, runs self-hosted GitLab runners in production.

File or object storage can also work, but only when access stays predictable. Internal backups, build artifacts, archived media, and team files are usually safer than hot user uploads. A predictable storage workload has stable read and write patterns, no sudden global traffic spikes, and clear recovery rules. If one storage node fails, your team should know exactly what to restore and how fast.

Leave risky systems for later

Some systems look small on a diagram and still cause the most pain when they fail. Keep these in the cloud until your team has more operating experience:

customer-facing apps with sharp traffic spikes
payment flows
authentication services
live messaging or real-time features

Those systems punish small mistakes. A delayed log pipeline is annoying. A broken login flow stops people at the door.

If your goal is cloud cost reduction, the order matters. Start where traffic stays calm, recovery stays simple, and your team can practice operating the stack without business panic. One solid win teaches more than five rushed migrations.

How to choose between logging, CI, and storage

Pick the workload that is expensive enough to matter, simple enough to reverse, and familiar enough that your team will not dread supporting it. That choice matters more than the move itself.

Start with a rough cost check. Compare one month of cloud spend with the full monthly cost of running it yourself: hardware, disks, power, backups, monitoring, and the hours your team will spend on upkeep. If cloud logging costs $800 a month but self-hosting saves only $150 after ops time, it is probably not the first move. If hosted CI costs $2,000 and a small runner setup cuts that roughly in half with little extra work, CI looks better.
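The same check can be written as a few lines of arithmetic. The $800 and $2,000 cloud figures come from the paragraph above. The self-hosted line items and the hourly rate are made-up values, chosen only so the totals reproduce the $150 and roughly-half outcomes.

```python
from dataclasses import dataclass


@dataclass
class SelfHostedCost:
    # All amounts are per month; hardware is the amortized share.
    hardware: float
    disks: float
    power: float
    backups: float
    monitoring: float
    ops_hours: float    # team hours spent on upkeep each month
    hourly_rate: float  # assumed loaded cost of one engineer hour

    def monthly_total(self) -> float:
        return (self.hardware + self.disks + self.power + self.backups
                + self.monitoring + self.ops_hours * self.hourly_rate)


def monthly_savings(cloud_spend: float, self_hosted: SelfHostedCost) -> float:
    """Positive means self-hosting is cheaper once team time is counted."""
    return cloud_spend - self_hosted.monthly_total()


# Hypothetical breakdowns that match the examples in the text.
logging_cost = SelfHostedCost(hardware=100, disks=60, power=30, backups=30,
                              monitoring=30, ops_hours=8, hourly_rate=50)
ci_cost = SelfHostedCost(hardware=450, disks=100, power=120, backups=50,
                         monitoring=80, ops_hours=4, hourly_rate=50)

print("logging saves", monthly_savings(800, logging_cost))
print("CI saves", monthly_savings(2000, ci_cost))
```

Ops hours are the line item teams most often leave out, and in the logging example they are the largest single cost.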

Then count the operational edges. Logging often touches alerts, retention rules, dashboards, and access for several teams. Storage adds backup schedules, restore tests, permissions, and sometimes legal retention rules. CI can be simpler if your pipelines are already clear and your team mostly needs runners, caches, and secrets.

A short scorecard helps:

Monthly spend: which workload hurts enough to fix now?
Complexity: how many integrations, alerts, backups, and access rules come with it?
Rollback: how fast can you switch back if it misbehaves?
Team comfort: which workload does your team already understand?
30-day result: what should improve, and by how much?

Rollback should break ties. If self-hosted CI fails, many teams can point jobs back to the old runner setup the same day. Logging can also be safe when you mirror data for a while. Storage is usually harder because restores take time and mistakes can affect live data.
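One way to keep the scorecard honest is to rate each candidate and let rollback time settle ties. This is a sketch under stated assumptions: the 1 to 5 ratings, the equal weighting, and the sample values are illustrative, not a recommended scoring model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    spend_pain: int        # 1-5: how much the monthly bill hurts
    simplicity: int        # 1-5: higher means fewer integrations to carry
    team_comfort: int      # 1-5: how well the team already knows it
    rollback_hours: float  # estimated time to switch back

    @property
    def score(self) -> int:
        return self.spend_pain + self.simplicity + self.team_comfort


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Highest score first; on a tie, the faster rollback wins."""
    return sorted(candidates, key=lambda c: (-c.score, c.rollback_hours))


# Hypothetical ratings for the three workloads discussed here.
options = [
    Candidate("logging", spend_pain=4, simplicity=3, team_comfort=4, rollback_hours=24),
    Candidate("ci", spend_pain=4, simplicity=3, team_comfort=4, rollback_hours=4),
    Candidate("storage", spend_pain=3, simplicity=2, team_comfort=3, rollback_hours=72),
]
print([c.name for c in rank(options)])
```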

Team knowledge matters more than people admit. A team that already manages GitLab runners will often get a quick win from self-hosted CI. That matches what practical operators usually do: move the part they already know how to observe, debug, and recover.

Write down a plain 30-day target before you choose. For CI, that might be "cut build spend by 40% with no slower deploys." For logging, "keep search and alerts stable while lowering retention cost." For storage, "run backups daily and pass one full restore test." If you cannot describe success in one sentence, wait before you move it.

How to move one workload safely

To move one workload safely, shrink the project until you can explain it on one page. If you need a diagram with ten boxes, you picked the wrong first move.

Start by mapping real usage. Find out who touches this workload every day, what data it keeps, how fast it grows, and when demand spikes. Those answers matter more than vendor feature lists. A logging system may look quiet at noon and then jump hard during nightly batch jobs or deploys.

The first setup should be small and plain. One service, one clear backup plan, and enough disk, CPU, and memory for normal peaks is often better than a full cluster. Many teams overbuild on day one and create more risk than the cloud bill ever did.

If the workload is CI, count how many jobs run at the same time and how long they wait in queue. If it is logging, measure daily ingest and retention. If it is storage, check file size, growth rate, and how often people read or write data.
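For CI, both numbers can be derived from job timestamps you probably already export. The sketch below assumes each job is a tuple of (queued, started, finished) times in seconds; that record shape is an assumption, not a format any particular CI system produces.

```python
from statistics import mean

# Each job: (queued_at, started_at, finished_at), all in seconds.
Job = tuple[float, float, float]


def peak_concurrency(jobs: list[Job]) -> int:
    """Largest number of jobs running at the same moment."""
    events = []
    for _, started, finished in jobs:
        events.append((started, 1))    # a job begins
        events.append((finished, -1))  # a job ends
    # Sorting by (time, delta) puts an end before a start at the same
    # instant, so back-to-back jobs do not count as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def average_queue_wait(jobs: list[Job]) -> float:
    """Mean seconds a job spent waiting before it started."""
    return mean(started - queued for queued, started, _ in jobs)


# Hypothetical sample of four jobs.
sample = [(0, 0, 10), (2, 5, 15), (5, 10, 20), (30, 30, 40)]
print(peak_concurrency(sample), average_queue_wait(sample))
```

Peak concurrency tells you how many runner slots normal work needs. Queue wait tells you whether developers will notice if you provide fewer.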

Backups come before the move, not after. Run them, then restore into a test environment and confirm the data is usable. A backup that only exists on paper is useless.

Before the switch, keep both sides running long enough to compare results. Mirror a slice of CI jobs, sync logs to both destinations, or replicate storage data until checks match. This catches messy details like missing environment variables, time zone issues, and permission mistakes.

A safe move usually follows this order:

Measure normal load and peak hours.
Build the smallest setup that covers that load.
Test backup and restore with real data.
Sync or mirror before the switch.
Move in a quiet window with rollback ready.

Pick a low-traffic window. Keep the old system available, write down the rollback steps, and decide who can call the rollback if errors rise. Make that decision before the move, not during it.
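Writing the rollback trigger down as data makes it hard to renegotiate in the middle of the move. The sketch below is one possible shape: the error-rate limit, the "sustained for N minutes" rule, and the owner name are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackRule:
    owner: str              # the one person who makes the call
    max_error_rate: float   # e.g. 0.05 means 5% of requests or jobs failing
    sustained_minutes: int  # how long the limit must be exceeded


def should_roll_back(rule: RollbackRule, error_rates: list[float]) -> bool:
    """True when the latest `sustained_minutes` samples all exceed the limit.

    error_rates holds one sample per minute, oldest first.
    """
    if rule.sustained_minutes < 1:
        raise ValueError("sustained_minutes must be at least 1")
    window = error_rates[-rule.sustained_minutes:]
    if len(window) < rule.sustained_minutes:
        return False  # not enough data yet to judge
    return all(rate > rule.max_error_rate for rate in window)


# Hypothetical rule agreed before the cutover window.
rule = RollbackRule(owner="ci-owner", max_error_rate=0.05, sustained_minutes=3)
print(should_roll_back(rule, [0.01, 0.06, 0.07, 0.08]))
```

Requiring several bad minutes in a row avoids rolling back on a single blip, at the cost of reacting a little later.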

For the next week, watch the numbers that show pain early. Errors, queue times, disk growth, memory pressure, and failed jobs tell the truth quickly. A small team can review those every day in a simple dashboard. Oleg Sotnikov, for example, uses self-hosted GitLab, Sentry, Grafana, and Prometheus in production to keep issues visible without adding extra layers.

If the workload stays stable for a week, then you have earned the right to think about the next one. Until then, leave the rest alone.

Example: a small team moves CI off cloud

A 12-person product team had a cloud bill that felt silly. They paid for hosted runners every day even though their work had a clear rhythm. Most builds started between 9 a.m. and 6 p.m., and most finished in less than 15 minutes.

That pattern made CI a safe first move. The team did not try to rebuild everything at once. They chose one self-hosted runner on a single box, kept their cloud runners active, and routed only routine jobs to the new machine.

They used the local runner for normal branch builds, tests, and package jobs that rarely changed. Release builds and anything with sudden spikes stayed in the cloud. That split kept the setup boring, which was the point.

Before they trusted it, they pushed on the parts that usually break:

failed builds that stop halfway through
cache reads and cache misses
runner restarts after updates
jobs queued during busy morning hours

They also watched recovery time. If the runner froze, they wanted one person to fix it fast without turning CI into a full-time chore. A simple restart plan mattered more than fancy tuning.

This is why self-hosted CI often works well as a first move. Predictable daytime traffic is easy to size. One runner box can handle a lot when jobs are short and the queue is steady. Teams waste money when they pay for cloud convenience on work that looks the same every weekday.

After two weeks, they had enough data. Routine jobs moved off the cloud because the local runner stayed stable and build times stayed close to what developers already expected. Burst jobs still ran on cloud runners, especially when several developers pushed at once or when a release needed extra capacity.

They did not chase a perfect migration. They picked the part that was steady, cheap to recover, and easy to measure. That let them move one workload without betting the whole delivery pipeline on day one.

Mistakes that create extra work

Do not start with the workload that creates the biggest invoice. Big bills often come from noisy traffic, sudden spikes, or messy dependencies. That kind of move looks brave on paper and turns into late-night cleanup fast.

A calmer first step works better. Logging, CI, or a storage job with steady use usually gives you a cleaner test. You learn how your team handles hardware, backups, and alerts without putting the most fragile part of the business at risk.

Another common mistake is treating backups like proof of safety. A backup file means very little until someone restores it, checks the data, and confirms the service starts again. Teams skip this because restore tests feel boring. They are boring, and they save you when a disk fails on a Tuesday morning.

Teams also create extra work when they copy the old cloud setup piece by piece. That usually drags in too many runners, too much log retention, and too many moving parts. Your first off-cloud workload should get smaller, not identical. Trim what nobody uses. Keep the setup plain.

Ownership gets ignored more often than people admit. A server does not patch itself, reboot itself, or notice that a disk is almost full. If nobody owns those jobs, the workload becomes a side project that breaks at the worst time. Someone needs a simple routine for updates, monitoring, and recovery.

A short check helps before you move anything:

Pick a workload with stable demand, not the noisiest bill.
Run one real restore test before cutover.
Remove old extras instead of cloning them.
Assign one person or team to patching and alerts.
Leave other cost cuts for later.

That last point matters more than it seems. Teams often try to save on compute, storage, networking, and licenses all at once. Then they cannot tell which change caused the problem. Move one steady workload, let it run quietly for a few weeks, and only then decide what deserves the next round.

Quick checks before cutover

The last hour before cutover matters more than the plan you wrote last week. If the team cannot answer a few plain questions quickly, wait a day and fix the gaps. That delay is cheaper than a messy rollback at 4 a.m.

Start with rollback. One person should be able to explain it in under a minute: what triggers it, who makes the call, how long it takes, and what users will notice. If that explanation turns into a long debate, the cutover is not ready.

A short preflight list keeps this simple:

Confirm you tested restore, not just backup.
Rebuild one part from scratch. For CI, that might mean creating a fresh runner and proving jobs still start.
Check access controls with real accounts.
Verify alerts for disk space, memory use, queue time, and hard failures.
Name one owner for the cutover call.
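For the disk-space alert in particular, one cheap check is to compute the same number yourself on the host and confirm your alerting tool agrees. The sketch below uses only the Python standard library. The 80% threshold and the function names are assumptions, and it does not talk to any monitoring system.

```python
import shutil


def disk_used_fraction(path: str = "/") -> float:
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total


def disk_alert_should_fire(path: str = "/", threshold: float = 0.8) -> bool:
    """True when usage is at or above the alert threshold."""
    return disk_used_fraction(path) >= threshold


# Compare this against what the dashboard shows for the same volume.
print(f"{disk_used_fraction('.'):.0%} used, alert expected: "
      f"{disk_alert_should_fire('.')}")
```

If this says the alert should fire and nothing arrives, the gap is in the alert path, which is exactly what the preflight is meant to find.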

Team communication matters just as much as server checks. Tell affected teammates what changes, when it happens, and what they should do if something looks wrong. Keep that message short. People rarely read long migration notes when they are busy.

For logging or storage, test the dull stuff. Can someone find yesterday's data? Can they restore a file without extra permissions? Can they tell the difference between a small delay and a real outage? Those checks catch more trouble than another architecture review.

If you want an outside review before the switch, this is where it helps most. A fresh pair of eyes often spots the missing alert, the weak rollback step, or the teammate nobody warned.

Cut over only when the answers are boring and clear. Boring answers usually mean the plan is ready.

What to do after the first move

The first 30 days tell you more than the migration plan did. A steady workload can look easy on paper and still create small daily annoyances once real people use it. Watch the boring stuff: what it costs, how reliably it stays up, and how much team time it quietly eats.

A short review works better than a thick report. Pull the numbers for one month and compare them with the old setup. If your bill dropped but your team now spends two extra hours each week fixing runners, disks, or alerts, the move may not be worth repeating yet.
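That comparison is easy to put a number on. The sketch below subtracts the cost of extra team time from the drop in the bill. The bills, the hourly rate, and the weeks-per-month conversion are assumptions for illustration.

```python
WEEKS_PER_MONTH = 52 / 12  # average, about 4.33


def net_monthly_savings(old_bill: float, new_bill: float,
                        extra_hours_per_week: float,
                        hourly_rate: float) -> float:
    """Bill reduction minus the cost of extra upkeep time."""
    support_cost = extra_hours_per_week * WEEKS_PER_MONTH * hourly_rate
    return (old_bill - new_bill) - support_cost


# Hypothetical month: the bill halved, but upkeep takes 2 extra hours a week.
result = net_monthly_savings(old_bill=2000, new_bill=1000,
                             extra_hours_per_week=2, hourly_rate=60)
print(f"net savings per month: {result:.0f}")
```

A positive result does not settle the question on its own, since those hours also come out of other work, but a negative one is a clear signal to stop and tighten the setup.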

Track a few plain measures:

total monthly cost
uptime and failed jobs or missed logs
support time from engineers or ops
recovery time for one small incident

Then write down what stayed simple. Maybe CI jobs ran on schedule and developers barely noticed the change. Maybe log retention got cheaper, but disk growth was faster than expected. Those notes matter because teams tend to remember the drama and forget the quiet wins.

Surprises are not a bad sign by themselves. The problem starts when every surprise points to the same weakness, such as poor backups, weak alerts, or unclear ownership. Fix that first. Do not add a second workload just because the first one technically works.

A second move makes sense when the first one feels boring. The team knows who owns it, recovery steps are tested, and on-call noise did not go up. If that is not true, stop there and tighten the setup before you expand.

This is also where many teams get overconfident. They move one workload off the cloud, save some money, and assume the next four will be just as easy. They usually are not. Logging, CI, and storage fail in very different ways.

If your team wants a second opinion before choosing the next step, Oleg Sotnikov shares this kind of practical infrastructure and workload review through oleg.is. The useful part is not a big migration plan. It is getting a clear read on what is steady enough to move, what should stay in the cloud, and how to avoid cheap-looking changes that create expensive support work later.

Frequently Asked Questions

Why should I avoid moving several workloads off cloud at the same time?

Because you learn less and risk more. When you change logging, CI, and storage together, every issue blends together and you cannot tell what saved money or what caused trouble.

Rollback also gets harder. One workload usually has a simple fallback, but several connected systems can pull each other down.

What makes a good first workload to move off cloud?

Start with a boring internal system. Pick one with steady demand, one team that owns it, and a recovery plan you already understand.

CI, logging, or predictable internal storage usually fit better than anything customer-facing.

Is self-hosted CI usually the best first move?

Usually, yes. CI often has predictable daytime traffic, short jobs, and a simple rollback path. You can move routine builds first and leave burst or release jobs in the cloud.

That gives you real cost and uptime data without putting customers in the middle.

When does logging make sense as the first workload?

Logging works well when your daily ingest stays fairly stable and your retention window stays fixed. That makes disk planning, backup timing, and recovery checks much easier.

It gets harder when many teams depend on the same alerts, dashboards, and access rules.

When is storage a bad first choice?

Skip storage first if reads and writes jump around, live users depend on it, or restore mistakes could affect production data. Storage looks simple until you need to recover quickly under pressure.

It fits better when you store backups, build artifacts, archives, or other internal files with predictable access.

How do I choose between logging, CI, and storage?

Use a short scorecard. Compare monthly cloud spend against hardware, power, backups, monitoring, and team time. Then check which option your team already knows how to run and reverse.

If two choices look close, pick the one with the faster rollback.

What should I test before cutover?

Test restore before you switch anything. A backup file alone does not protect you if nobody can reload the data and start the service again.

Also check access, alerts, disk space, queue time, and who has authority to call rollback if errors rise.

How long should I run the old and new systems in parallel?

Keep both sides running until the new setup matches the old one under normal use. For CI, mirror routine jobs. For logging, send data to both places. For storage, sync data and verify that integrity checks match on both sides.

Do not rush this part. A few extra days of overlap usually cost less than one messy outage.

What should I watch after the first move?

For the first month, watch cost, uptime, failed jobs or missing logs, disk growth, memory pressure, and how much team time the system eats. Those numbers show whether the move actually helped.

If the bill drops but your team spends hours fixing small issues every week, pause before you move anything else.

When should I move the next workload?

Move a second workload only after the first one feels routine. The owner should know the update steps, the restore path, and the rollback plan without a long meeting.

If you still see frequent surprises, fix those first. If you want a second opinion before the next step, get help from an experienced CTO who can review the workload, rollback path, and operating cost.