Search indexing schedules by content type that save CPU
Search indexing schedules work better when you group content by update speed, refresh hot records first, and let old archives wait.

Why one refresh schedule fails
A single indexing schedule breaks for a simple reason: it assumes every record changes at the same pace. Real content never does.
A job listing edited three times today and a help article from 2021 should not go through the same refresh path. But one global timer treats them as if they should.
When that timer runs too slowly, fresh records go stale. A price changes, a product description gets fixed, a new page goes live, or an item gets removed. Search still shows the old version. Users notice fast, and they blame search.
When the timer runs too often, the system wastes CPU on records that barely change. The indexer keeps checking archived pages, closed items, and old content no one has touched in months. That extra work does not make search feel fresher.
The problem gets worse on sites with mixed content. Live inventory or listings can change many times a day. Docs, profiles, and landing pages change now and then. Archived records may sit untouched for months.
One schedule cannot handle all of that well. If you tune it for slow content, fresh records lag behind. If you tune it for fast content, old data burns CPU for no good reason.
Teams often fall back to full reindex runs because they feel safe. They also create the load spike you could have avoided. The database gets hammered, caches churn, and indexing workers start competing with normal user traffic.
The issue is not search itself. The issue is using one pace for content that lives at very different speeds.
Group content by how often it changes
A better schedule starts by sorting content into a few simple groups.
The first group is for records that change often and matter right away. Think prices, stock counts, booking slots, live listings, and anything users edit directly. When these records change, search should update soon after. If search says an item is in stock but the product page says it is sold out, trust drops fast.
The second group is for content that changes, but not all the time. Articles, help pages, profile pages, and category pages often fit here. These records still matter, but they do not need the same treatment as inventory or active listings.
The third group is the archive. Closed orders, retired items, old logs, and past events can wait much longer between refreshes. Some teams put them in a separate index. Others update them during quiet hours. Either way, moving archive work out of the hot path usually cuts CPU use a lot.
The rules for moving records between groups should stay simple. New or recently edited items go into the fast group. If a record sits unchanged for a while, move it to the slower group. Finished, expired, or hidden records go to the archive. If someone edits an archived record again, move it back to the fast group.
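The movement rules above can be sketched as one small function. This is a minimal illustration, not a definitive implementation: the `Record` fields, the `COOL_DOWN` threshold of seven days, and the group names are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed threshold: how long a record may sit unchanged before it
# leaves the fast group. Pick a value that fits your content.
COOL_DOWN = timedelta(days=7)


@dataclass
class Record:
    last_edited: datetime
    # Set when the record is finished, expired, or hidden (hypothetical field).
    closed_at: Optional[datetime] = None


def assign_group(record: Record, now: datetime) -> str:
    """Return "fast", "slow", or "archive" for a record."""
    recently_edited = now - record.last_edited < COOL_DOWN
    if record.closed_at is not None:
        # An edit made after closing pulls the record back into the fast group
        # for as long as that edit counts as recent.
        edited_after_close = record.last_edited > record.closed_at
        if edited_after_close and recently_edited:
            return "fast"
        return "archive"
    return "fast" if recently_edited else "slow"
```

Keeping the whole rule in one function makes it easier for support, product, and engineering to read the same definition.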
If support, product, and engineering cannot agree on where a record belongs, the rules are too vague.
A marketplace makes this easy to see. Active listings belong in the fast group while sellers keep changing prices or stock. Seller profile pages can sit in the slower group. Sold listings and old transaction logs belong in the archive once the sale closes. That split keeps fresh results fresh without asking the indexer to recheck everything all day.
Set freshness targets people can live with
A good freshness target is the longest delay users will accept before search feels wrong.
Start with user pain, not system limits. Ask one question for each group: after this record changes, how long can search stay out of date before it causes a problem?
The answer changes by content type. Stock levels, prices, and new listings usually need tight targets because mistakes cost money or create support issues. A stale help article is annoying, but it rarely does the same damage.
Support tickets, sales calls, and search logs are more useful than guesses here. If customers complain when stock is 20 minutes behind, that group needs a tighter window. If nobody cares that an old case study takes half a day to show an edit, give it half a day.
A simple set of targets might look like this:
- Inventory, prices, and time-sensitive listings: 1 to 5 minutes
- New user content or marketplace posts: 5 to 15 minutes
- Product details and category pages: 30 to 60 minutes
- Help docs and blog posts: 4 to 12 hours
- Archived content: 1 to 7 days
These numbers only work when they match business cost. If wrong prices or expired offers lead to lost sales, refresh them quickly. If an old blog post changes once every few months, let it wait.
Leave room for bad days. Traffic spikes, queues grow, and jobs fail. If a group must stay under 15 minutes, do not build a pipeline that averages 14 minutes only when traffic is calm. Aim lower so the schedule survives busy hours.
A practical rule is to set one user-facing target and one internal target. If search should be fresh within 30 minutes, design the pipeline to finish in 10 to 15. That buffer gives you room for retries, peak load, and routine maintenance.
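One way to keep both targets in sync is to store only the user-facing number and derive the internal one. The sketch below uses the upper bound of each example range from the list of targets; the group names and the `BUFFER_RATIO` of 0.5 are assumptions for illustration.

```python
from datetime import timedelta

# User-facing targets, taken from the upper bound of each example range.
# The dictionary keys are hypothetical group names.
USER_FACING_TARGETS = {
    "inventory_and_prices": timedelta(minutes=5),
    "new_user_content": timedelta(minutes=15),
    "product_and_category_pages": timedelta(minutes=60),
    "help_docs_and_blog": timedelta(hours=12),
    "archive": timedelta(days=7),
}

# Assumed buffer: the pipeline aims to finish in half the promised window.
BUFFER_RATIO = 0.5


def internal_target(user_facing: timedelta, ratio: float = BUFFER_RATIO) -> timedelta:
    """Return the tighter deadline the pipeline should be designed for."""
    return user_facing * ratio
```

With a ratio of 0.5, a 30 minute promise becomes a 15 minute internal target, which matches the upper end of the buffer described above.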
Build the update paths
Start with one clean full index. Every content group needs a solid baseline before you tune refresh timing. If the index already contains stale fields, missing records, or old URLs, faster updates will only spread bad data faster.
After the full pass, save a checkpoint for each group. That gives you a clear starting point and makes later problems easier to track.
Then split updates by content type and change rate.
Hot records should use event-driven updates. When a product price changes or a listing gets edited, push that single record to the index right away.
Normal content can run on scheduled batches. A fixed interval every 30 or 60 minutes is often enough for pages that change a few times a day.
Archive content should use a separate queue that runs in quiet hours. Old posts, closed listings, and past events rarely need daytime CPU.
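A scheduler for these three paths can be small. The following sketch decides which paths may run at a given moment; the path names, the 30 minute batch interval, and the 01:00 to 05:00 quiet window are assumptions chosen for the example.

```python
from datetime import datetime, time, timedelta
from typing import List


def due_paths(
    now: datetime,
    last_batch: datetime,
    batch_interval: timedelta = timedelta(minutes=30),
    quiet_start: time = time(1, 0),
    quiet_end: time = time(5, 0),
) -> List[str]:
    """Return the update paths allowed to run right now."""
    # Event-driven updates for hot records are always allowed.
    paths = ["event"]
    # Normal content runs on a fixed interval.
    if now - last_batch >= batch_interval:
        paths.append("batch")
    # Archive work runs only inside the quiet window
    # (this simple check assumes the window does not cross midnight).
    if quiet_start <= now.time() < quiet_end:
        paths.append("archive")
    return paths
```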
Failed records need their own retry path. Retry only what broke, with limits and backoff. Do not rerun the whole dataset because one record failed.
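A retry path along those lines might look like the sketch below. The function `index_one` stands in for whatever call writes a single record to your index; it and the default limits are assumptions, not part of any specific search library.

```python
import time
from typing import Callable, Iterable, List


def index_with_retry(
    record_ids: Iterable[str],
    index_one: Callable[[str], None],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Index records, retrying only the failures. Return ids that never succeeded."""
    pending = list(record_ids)
    for attempt in range(max_attempts):
        failed = []
        for record_id in pending:
            try:
                index_one(record_id)
            except Exception:
                # Broad catch keeps the sketch short; narrow it in real code.
                failed.append(record_id)
        if not failed:
            return []
        # Only the records that broke go into the next round.
        pending = failed
        if attempt < max_attempts - 1:
            # Exponential backoff: 1x, 2x, 4x the base delay.
            sleep(base_delay * 2 ** attempt)
    return pending
```

Whatever the function returns is the set of records that needs a log entry or an alert, since the retry limit has been reached.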
Track each path on its own. If event updates stall while archive jobs keep finishing, you want to see that immediately.
A few small rules make the whole setup more stable. Keep batch sizes modest so one slow job does not block everything behind it. Store change timestamps so the scheduler can skip records that did not change. Deduplicate updates when the same page changes several times in a short window.
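The timestamp and deduplication rules can be combined in one pass. This is a sketch under simple assumptions: changes arrive as `(record_id, changed_at)` pairs, and `last_indexed` is a hypothetical mapping from record id to the time it was last written to the index.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def records_to_index(
    changes: Iterable[Tuple[str, datetime]],
    last_indexed: Dict[str, datetime],
) -> List[str]:
    """Collapse repeated changes and skip records the index already reflects."""
    latest: Dict[str, datetime] = {}
    for record_id, changed_at in changes:
        # Keep only the newest change per record (deduplication).
        if record_id not in latest or changed_at > latest[record_id]:
            latest[record_id] = changed_at
    return [
        record_id
        for record_id, changed_at in latest.items()
        # Skip records whose latest change is not newer than the last index write.
        if record_id not in last_indexed or changed_at > last_indexed[record_id]
    ]
```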
This kind of pipeline is plain, and that is exactly why it works. It uses less CPU, keeps fresh results moving, and is much easier to fix when one queue falls behind.
A simple example from one store
Take an online store with three types of pages: product pages, browse pages, and old content.
Product pages belong in the fast lane. If a price changes or a popular item sells out, shoppers notice quickly. The store should push those updates into search within a few minutes. Wrong pricing or stale stock leads to abandoned carts, support messages, and frustrated buyers.
Category pages can move slower. A page for "running shoes" or "desk lamps" does not need a rebuild every time one item changes. Refreshing it once an hour is usually close enough for shoppers and much easier on CPU.
Older material can wait. Blog posts from last year, help articles that change once in a while, and retired product pages can go into a nightly batch. People can still find them in search, but the system stops wasting cycles on pages that almost never need minute-by-minute updates.
Seasonal pages need their own temporary rule. A holiday gift guide or a "back to school" collection may stay quiet for months, then change all day during a sale. During that busy stretch, move those pages into the fast group so search reflects new stock, updated prices, and featured items quickly.
When the sale ends, move them back to the slower path. That small switch matters more than most teams expect. Many schedules fail because they treat seasonal content as fixed even when the business treats it as active.
Watch load before it turns into backlog
A schedule can look fine until indexing jobs start piling up. Then search gets stale, CPU stays high, and one busy hour spills into the next.
You do not need a huge reporting stack to catch this. A small dashboard with a few numbers is usually enough. Watch CPU during indexing jobs, queue length, job runtime, and failed jobs. Those four numbers tell you whether the system is draining work faster than new work arrives.
If only one number jumps, the fix is often simple. If all four rise together, one content group is probably causing most of the pain.
Look for the group that creates the biggest spikes. Fresh product updates may finish in three minutes while archive rebuilds run for 25 minutes and push CPU from 40 percent to 90 percent. That tells you where to tune first.
When search starts slowing down, cap batch size early. Smaller batches can look less efficient on paper, but they usually keep query latency steady and reduce retry storms. A batch that finishes cleanly is better than a giant batch that times out and starts over.
Archive runs should move to quiet periods. People rarely notice if ten-year-old content waits until late evening. They notice quickly when current records stop showing up in search.
Traffic spikes also need a follow-up review. After a campaign, release, or seasonal rush, compare backlog growth with a normal week. If the queue keeps growing for several cycles, adjust the rules. Lower batch size, change refresh timing, or pause archive work until the hot path catches up.
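A check for a queue that keeps growing can be very plain. The sketch below assumes you sample queue length once per scheduling cycle and treats three consecutive increases as the signal; both the sampling and the default of three cycles are assumptions.

```python
from typing import Sequence


def backlog_growing(queue_lengths: Sequence[int], cycles: int = 3) -> bool:
    """Return True if queue length rose in each of the last `cycles` intervals."""
    if len(queue_lengths) < cycles + 1:
        # Not enough samples to judge yet.
        return False
    recent = queue_lengths[-(cycles + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

For example, samples of 100, 120, 150, and 200 count as growing, while 100, 120, 110, and 130 do not, because the queue drained once in between.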
A healthy schedule does not keep CPUs busy all day. It keeps fresh results moving and stops yesterday's work from spilling into tomorrow.
Mistakes that waste CPU and slow search
The most common mistake is treating every change like an emergency.
A typo fix in one record should not trigger a full table reindex. That turns a tiny update into thousands or millions of writes, extra cache churn, and more merge work for no real gain.
Another common mistake is giving cosmetic edits the same priority as changes that affect money or availability. If a product goes out of stock, search needs that update fast. If someone fixes punctuation in a subtitle, users can wait.
Deletes cause a different kind of trouble. Some teams handle inserts and edits well, but they delay removals until a nightly cleanup. Dead records stay searchable, users click them, and the index keeps spending space and compute on data that should be gone.
Most systems do not need a heavy delete flow. Remove the document quickly, or mark it so the next refresh drops it without touching unrelated records.
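The "mark it so the next refresh drops it" option can be sketched with a set of tombstoned ids. Here the index is modeled as a plain dictionary purely for illustration; a real search engine would expose its own delete call.

```python
from typing import Dict, Set


def refresh(index: Dict[str, dict], changed: Dict[str, dict], tombstones: Set[str]) -> Dict[str, dict]:
    """Apply changed documents, then drop every tombstoned id. Deletes win."""
    index.update(changed)
    for record_id in tombstones:
        # Only marked records are touched; unrelated records stay as they are.
        index.pop(record_id, None)
    return index
```

Applying deletes last means a record that was edited and then removed in the same window does not reappear.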
One stuck job can also damage the whole pipeline. A malformed record, timeout, or failed dependency call should not freeze every update behind it. Yet many queues still work like a single lane road.
A few warning signs show up early:
- Queue lag keeps growing during normal traffic
- Small edits trigger the same job as full imports
- Deleted items still appear hours later
- One failed record stops fresh content from showing up
Timing rules also get old. A schedule that worked for a small catalog or a quiet blog can become wasteful after traffic changes, archives grow, or one content type starts changing all day.
Review the schedule on a regular cadence. Check how much CPU each path uses, how long records wait in queue, and which content types really need fast refresh. If those numbers changed, the schedule should change too.
Checks before you ship
A schedule can look clean on paper and still fail under real traffic. Before launch, test the parts that usually break first.
Start with the promise users will notice. If a product, ticket, or listing changes, it should appear in search within the target window for hot content. Do a small live test. Edit five recent records, note the time, and check when the index reflects each change. If one takes 30 seconds and another takes 12 minutes, the path is not stable yet.
Then test the safety brakes. Archive jobs should slow down or stop when CPU gets too high, or they will crowd out fresh updates. A simple rule works well: if CPU stays above your limit for a few minutes, pause archive batches and let hot updates finish first.
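That pause rule can be sketched as a small counter. The class below assumes the caller feeds it one CPU reading per minute from whatever monitoring is already in place; the 80 percent limit and the three sample window are placeholder values.

```python
class ArchiveThrottle:
    """Pause archive batches when CPU stays above a limit for several samples."""

    def __init__(self, cpu_limit: float = 80.0, sustained_samples: int = 3) -> None:
        self.cpu_limit = cpu_limit
        self.sustained_samples = sustained_samples
        self._over_limit_streak = 0

    def record_sample(self, cpu_percent: float) -> None:
        # Count consecutive readings above the limit; any calm reading resets it.
        if cpu_percent > self.cpu_limit:
            self._over_limit_streak += 1
        else:
            self._over_limit_streak = 0

    def should_pause(self) -> bool:
        return self._over_limit_streak >= self.sustained_samples
```

The archive worker would call `should_pause()` before picking up each batch, so hot updates are never the ones that wait.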
Keep the prelaunch checklist short:
- Change a hot record and confirm search updates inside the target window
- Raise system load and confirm archive jobs pause first
- Break one update on purpose and make sure the retry path works
- Delete a record and confirm it disappears from the index
- Ask one teammate to explain each schedule in one sentence
That last check sounds minor, but it catches messy plans fast. If the team cannot say, "news updates in 2 minutes, product pages in 15, archive posts at night," the system is probably too hard to run.
Watch delete handling closely. Old records that stay searchable confuse users and create support work. Test soft deletes, hard deletes, and merged records if your system has them.
Retry logic matters too. One failed batch should not leave a hole in the index until someone notices it the next morning. The job should retry, log the reason, and stop after a sensible limit.
If these checks pass, you do not have a perfect system. You have one the team can trust on a normal Tuesday, which is what matters.
Keep the schedule alive
Most teams do better with a schedule they can explain in one minute. If the rules fit on a page, people keep them current. If the rules turn into a pile of exceptions, they drift and CPU use starts climbing.
Three groups are enough for a first version: hot content that changes often, normal content that changes now and then, and archive content that rarely changes. That covers most sites.
Run the first version for two weeks and watch CPU, queue depth, refresh delay, and a small set of search results your team checks often. Real traffic usually exposes the weird cases quickly.
After that review, remove rules that no longer match the way content changes. A section that used to update every hour may now change once a month. An archive area may become active again after a relaunch. Clean up those cases before they turn into permanent waste.
It also helps to name one owner for the schedule. That person does not need to run every job by hand. They just need to notice when load, content patterns, or business priorities change and call for an update.
If your team wants an outside review, Oleg Sotnikov at oleg.is works with startups and smaller companies on infrastructure, search, and AI-first engineering operations. A short review is often enough to spot refresh jobs that run too often, queues that should move to quieter hours, or rules no one needs anymore.
Set the next review date before you ship. If no one owns that date, the schedule will go stale long before the index does.
Frequently Asked Questions
Why is one global indexing schedule a bad idea?
Because different records change at different speeds. Push fast-changing records like prices or stock through a quick path, and let older or rarely edited content wait longer. That keeps search fresh where users care and cuts wasted CPU on stale pages.
What content groups should I start with?
Start with three groups: hot content, normal content, and archive content. Put items users edit often or content tied to price, stock, or availability in the hot group. Put articles, docs, and profile pages in the normal group, and move expired or closed records into archive.
How do I choose a good freshness target?
Ask how long search can stay wrong before users notice or support gets involved. For stock, prices, and active listings, that window is usually minutes. For docs or blog posts, hours often work fine.
When should I use event-driven indexing instead of batches?
Use event updates for records that affect money, availability, or recent user actions. If someone changes a product price, stock count, booking slot, or live listing, send that record to the index right away instead of waiting for the next batch.
Should archived content use a separate queue or index?
Yes, if old content creates load and rarely changes. A separate queue or index for archive records keeps them out of the hot path, so current content updates faster and daytime CPU stays lower.
How do I stop small edits from wasting CPU?
Store change timestamps and reindex only records that actually changed. Also deduplicate rapid edits, so five updates in two minutes turn into one indexing job instead of five. That saves writes, merge work, and queue space.
What is the best way to handle deletes?
Remove deleted records quickly, or mark them so the next refresh drops them at once. Do not leave deletes for a nightly cleanup if users can still click dead results during the day.
Which metrics matter most for indexing schedules?
Watch CPU during indexing, queue length, job runtime, and failed jobs. Those four numbers tell you whether work drains faster than it arrives. If backlog grows during normal traffic, your schedule needs tuning.
What should I test before I ship a new schedule?
Edit a few hot records and confirm search updates inside the target window. Then force one failure, test a delete, and raise system load to make sure archive work pauses before hot updates fall behind.
How often should I review and change the schedule?
Review it on a steady cadence, then adjust when traffic or content patterns change. A rule that made sense for a small catalog can turn wasteful later. Keep the schedule simple enough that one teammate can explain it in a minute.