Aug 13, 2024 · 7 min read

OpenTelemetry exemplars for finding slow API requests

OpenTelemetry exemplars connect latency metrics to traces, so teams can jump from a spike to the exact slow API request and fix the cause faster.

Why slow API spikes stay hard to explain

Slow API pain usually comes from a few bad requests, not from every request getting slower. An average response time of 220 ms can look healthy while one checkout call takes 14 seconds and times out. Users feel the 14 seconds. The dashboard often hides it.
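
To make that arithmetic concrete, here is a minimal sketch with made-up numbers: 999 hypothetical requests at 206 ms plus one 14-second checkout call. The values are assumptions chosen so the average lands near the 220 ms mentioned above.

```python
from statistics import mean

# Hypothetical sample: 999 fast requests plus one 14-second checkout call.
latencies_ms = [206] * 999 + [14_000]

average_ms = mean(latencies_ms)  # dominated by the fast majority
worst_ms = max(latencies_ms)     # the request the user actually felt

print(f"average: {average_ms:.0f} ms")
print(f"worst:   {worst_ms} ms")
```

The average barely registers the outlier, which is why a chart built on averages can look healthy during a bad minute.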

Metrics show the shape of the problem. You see p95 jump, error rate twitch, and request volume stay normal. That tells you something changed, but not which request caused it. A chart can point at 10:42:13. It can't tell you that one customer sent a huge payload, one database query waited on a lock, or one downstream service retried three times.

Traces have the opposite problem. One trace can show the full path of a request in detail: gateway, app, cache, database, retry, then the span that ate 8 seconds. That's great once you have the right trace. Finding it is the hard part.

On a busy API, a bad minute can contain thousands of requests. Engineers open the tracing tool, filter by time, pick a few traces, and hope one looks ugly. Often they inspect several normal requests first. Then they go back to metrics, compare timestamps by hand, guess which pod handled the traffic, and try again.

Metrics show trends. Traces show stories. You need both, but they do not point at each other on their own. Without a direct link, teams spend incident time hunting for the right request instead of checking the cause.

OpenTelemetry exemplars close that gap. They attach trace context to a metric sample that actually contributed to the spike. When latency jumps, you stop guessing and start with a real request that helps explain the chart.

What exemplars add to normal metrics

Normal metrics tell you that something changed. They do not tell you which request did it. A latency chart can jump from 120 ms to 2.4 s in one minute, yet the graph still leaves you guessing.

An exemplar adds a real sample to a metric point. Think of it as a note attached to one histogram bucket or one recorded measurement. That note carries trace or span context from an actual request, often just the trace ID and span ID.

That makes the graph much more useful. The metric still keeps the big picture for the service. It shows traffic, error rate, and latency over time. The trace shows the path one slow request took through your code and dependencies. The exemplar is the bridge between them.

Each signal answers a different question. The metric shows when the spike started and how broad it was. The trace shows where the time went inside one request. The exemplar connects the chart to that request.

Latency histograms are usually the best place for exemplars. Histogram buckets already group requests by duration, so attaching a sample trace to a slow bucket fits naturally. If the 2 s to 5 s bucket suddenly fills up, an exemplar can point straight to a trace from that range instead of forcing you to search through thousands of spans.
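
As a rough mental model, the sketch below builds a tiny histogram that remembers one exemplar per bucket. It is plain Python, not the OpenTelemetry SDK; the class names, bucket bounds, and IDs are all assumptions for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Exemplar:
    value_s: float
    trace_id: str
    span_id: str


@dataclass
class LatencyHistogram:
    # Upper bounds in seconds (assumed); one extra bucket catches anything above 5 s.
    bounds: tuple = (0.1, 0.5, 1.0, 2.0, 5.0)
    counts: list = field(default_factory=list)
    exemplars: dict = field(default_factory=dict)

    def __post_init__(self):
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, value_s, trace_id=None, span_id=None):
        # Index of the first bucket whose upper bound is >= the value.
        idx = bisect_left(self.bounds, value_s)
        self.counts[idx] += 1
        if trace_id is not None:
            # Keep the most recent traced sample as this bucket's exemplar.
            self.exemplars[idx] = Exemplar(value_s, trace_id, span_id)
        return idx


hist = LatencyHistogram()
hist.record(0.08)
hist.record(0.12)
slow_bucket = hist.record(
    2.4,
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
)
```

Here the 2.4 s observation lands in the 2 s to 5 s bucket, and `hist.exemplars[slow_bucket]` holds the trace ID a chart could link to. A real SDK makes its own choices about which samples to keep.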

Without exemplars, slow API debugging often turns into guesswork. An engineer opens random traces from the same minute and hopes one looks bad enough to explain the spike. That works once in a while, but it wastes time and can miss the request that pushed the metric up.

With exemplars, the chart becomes concrete. You still use metrics to watch the whole service, but now you also get a direct clue from a real request that contributed to the number on the screen. That is the missing link between "something got slower" and "this call to the auth service added 1.8 seconds."

When exemplars help most

Exemplars matter most when the graph looks bad, but the reason hides inside a small set of requests. If p95 or p99 latency jumps for three or four minutes and then drops back, averages blur the problem. A normal metric can tell you that something spiked. An exemplar can point to one trace that shows what happened.

This pays off when one endpoint slows down and the rest of the API looks fine. A shared dashboard may show overall latency creeping up, yet only /checkout or /search is in trouble. Without exemplars, engineers often open random traces and hope to get lucky. With exemplars, they can jump from the bucket that moved to a real request from that moment.

Rare failures are another good fit. Timeouts that happen once every few hundred requests often drown in noisy logs. You may have thousands of harmless warnings around the same time, plus a few unrelated retries. An exemplar cuts through that noise because it ties the bad metric point to a trace ID, not to a guess.

Deploys can create the same pattern. Maybe the new code adds extra work for users with large payloads, a specific header, or a feature flag. Most requests stay fast, so the average barely moves. The slow tail gets worse, and customers feel that first. Exemplars make it much easier to compare traces from before and after the deploy without digging through unrelated traffic.

They also help when one dependency hurts only one code path. Imagine an API with two ways to build a response. One path reads cached data and returns in 80 ms. The other runs a database query and sometimes takes 2.5 seconds. If that query slows down, the whole app is not broken. Only the requests that touch that branch suffer. An exemplar can lead straight to the trace with the long database span.

That is when metrics stop acting like a warning light and start acting like a map.

How the metric and trace stay connected

When one API request comes in, your app usually does two things at nearly the same moment. It opens a trace span for that request, and it measures how long the work takes. Those are separate signals, but OpenTelemetry can tie them to the same request.

The trace span carries the request's identity. It has a trace ID and a span ID. If /checkout takes 2.4 seconds, that slow run gets its own IDs from the moment the handler starts until the response goes out.

At the same time, the code records latency in a histogram metric. The histogram does not store every request as a full row. It groups observations into buckets such as 100 ms, 500 ms, 1 s, and higher. That keeps charts fast and cheap, but it also hides which exact request caused the bump.

This is where exemplars come in. When the SDK records that 2.4 second observation, it can attach the active trace context to that single metric sample. The histogram still behaves like a normal histogram, but one sample inside it now carries a pointer back to the trace. It is basically a note pinned to the metric point: "this request landed here."

Your telemetry system stores both parts. It keeps the histogram data for the chart, and it keeps the exemplar metadata next to the metric point. It also stores the full trace. Because both records include the same trace context, the UI can join them later without guessing.
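
The join itself is simple once both records share a trace ID. The following sketch uses in-memory dictionaries as hypothetical stand-ins for a trace store and one metric point; the field names and the metric name are assumptions, not any backend's real schema.

```python
# Hypothetical stand-in for a trace store, keyed by trace ID.
trace_store = {
    "4bf92f3577b34da6a3ce929d0e0e4736": [
        {"span": "POST /checkout", "parent": None, "duration_s": 2.4},
        {"span": "cache.get", "parent": "POST /checkout", "duration_s": 0.01},
        {"span": "db.query", "parent": "POST /checkout", "duration_s": 2.2},
    ],
}

# Hypothetical metric point: bucket data plus exemplar metadata stored next to it.
metric_point = {
    "name": "request_duration_seconds",
    "bucket_le_s": 5.0,
    "count": 17,
    "exemplar": {
        "value_s": 2.4,
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        "span_id": "00f067aa0ba902b7",
    },
}


def trace_for(point, store):
    """Follow the exemplar's trace ID to the stored spans, if both exist."""
    exemplar = point.get("exemplar")
    if exemplar is None:
        return None
    return store.get(exemplar["trace_id"])


def slowest_child(spans):
    """Return the child span that took the longest."""
    children = [s for s in spans if s["parent"] is not None]
    return max(children, key=lambda s: s["duration_s"])


spans = trace_for(metric_point, trace_store)
culprit = slowest_child(spans)
print(culprit["span"], culprit["duration_s"])
```

If the point has no exemplar, or the trace never reached storage, `trace_for` returns `None`, which is the "marker that points nowhere" case.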

What you see during a spike

Imagine p95 latency jumps for three minutes right after a deploy. On the chart, you hover over the spike and see an exemplar attached to one of the high-latency points. You click it, and the system opens the trace for that exact slow request.

Now you can inspect the span tree and see where the time went: a database query, a call to another service, or a retry loop.

That is the full chain. One request creates a span. The same request records a latency sample. The SDK stamps that sample with trace context. The backend stores it. The chart lets you jump from the metric to the trace. Without that link, slow API debugging often turns into educated guessing.

Set up exemplars step by step

Exemplars work best when the first rollout is small. Pick one API service, wire up tracing and metrics there, and ignore the rest of the system for now. If you try to instrument everything at once, you usually end up chasing configuration problems instead of learning whether the link works.

A simple rollout looks like this:

1. Add both tracing and metrics to the same request path. The request needs a span and a duration metric in the same service, or the SDK has nothing to connect.
2. Measure latency with a histogram. A counter can tell you how many requests happened, but it cannot show the shape of response times or isolate the slow tail.
3. Keep trace context alive for the whole request. Pass it through HTTP handlers, background work started by the request, and database calls. If the context disappears halfway through, the chart may still show a spike, but the exemplar will not lead to the trace you need.
4. Send both signals to a system that can store and display exemplars. Some tools accept metrics and traces but ignore the connection between them. Confirm support in the collector, storage layer, and chart UI before you spend hours tuning code.
5. Test with one slow request on purpose. Add a short delay in staging or hit a route you already know is slow. Then open the latency chart, find the spike, and click the exemplar marker. If it opens the matching trace, the setup works.
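
The step about keeping trace context alive is the one that most often needs code. The sketch below is a stdlib-only illustration, not the OpenTelemetry API: `current_trace_id` is a hypothetical stand-in for the SDK's current-span slot, and `query_inventory` is a made-up background task. Whether worker threads inherit context by default depends on the runtime, so the sketch copies it explicitly.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the tracing SDK's "current trace" slot.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def query_inventory():
    # Background work: reports which trace it believes it belongs to.
    return current_trace_id.get()


def handle_request(trace_id):
    token = current_trace_id.set(trace_id)
    try:
        # Snapshot the request's context and run the background task inside it,
        # so the worker thread sees the same trace ID as the handler.
        ctx = contextvars.copy_context()
        with ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(ctx.run, query_inventory).result()
    finally:
        # The request is over; clear the slot again.
        current_trace_id.reset(token)


seen_by_worker = handle_request("trace-checkout-1")
print(seen_by_worker)
```

Real instrumentation libraries usually ship their own helpers for this, so treat the pattern as the idea to look for rather than code to copy.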

After that, expand service by service. This slower approach is usually faster in practice. You find broken context, wrong metric types, and backend gaps early, before the whole observability setup turns into guesswork again.

A simple example from a real incident

A product search API normally finished in about 180 ms, so the team treated it as stable. Then at 9:10, p99 latency, the threshold that only the slowest 1 percent of requests exceed, jumped past 3 seconds and stayed there for about ten minutes.

If you only look at a latency chart, that kind of spike invites bad guesses. People start asking about CPU load, network delays, cache misses, or some odd cloud noise. Those guesses can burn an hour fast.

This time, the chart had exemplars turned on. In the latency histogram, the 3 s and up bucket showed an exemplar attached to one of the bad requests. That small marker changed the whole investigation.

The engineer clicked the exemplar and opened the exact trace behind the spike. The trace told a clear story. The HTTP span looked normal at the start, but one database span took almost the entire 3.2 seconds.

A new search filter had gone live earlier that morning. When users picked that filter, the API built a SQL query that skipped an index and scanned far more rows than expected. The rest of the request barely mattered. The database query was the problem.

Without that trace, the team probably would have checked the wrong things first: CPU graphs on the API pods, network timing between services, recent deploys outside the search service, or autoscaling behavior. Instead, they fixed the query, added the right index, and changed the filter logic so it stopped producing that expensive pattern. Latency dropped back near its usual level right after the change.

That is the point of connecting metrics to traces. The metric told the team that users felt pain. The exemplar pointed to one real request. The trace showed the span that caused it. No guessing, no blame session, and no wasted effort chasing infrastructure that was fine.

Mistakes that break the link

Most teams lose exemplar data in instrumentation, not in the dashboard. A latency chart can jump from 180 ms to 2.8 s, yet the spike has no trace attached because the app recorded the metric after the request span had already ended.

That first mistake is easy to miss. A request starts a span, does the work, closes the span, and only then writes the latency metric in cleanup code. The number still reaches the chart, but exemplars cannot attach a trace when no active span exists. Record request duration while that span is still current.

The metric type can block you too. Summaries look simple, but many systems cannot attach exemplar samples to them in a useful way. Histograms work much better because they keep latency in buckets. When one bucket suddenly fills with slow requests, you can inspect the attached trace samples instead of guessing.

Route names cause trouble more often than people expect. If one release reports GET /orders/{id} and the next reports orders.show, your chart splits into different series. Raw paths are even worse because /orders/48192 and /orders/48193 become separate labels. Keep route names stable and use templates, not per-request paths.
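
If the framework does not hand you the matched route, a fallback normalizer can collapse raw paths into templates. The sketch below assumes IDs are either plain integers or UUIDs; `route_template` is a hypothetical helper, and the router's own matched pattern is usually the better source when it is available.

```python
import re

# Assumed ID formats: UUIDs and plain integers as whole path segments.
_UUID_SEGMENT = re.compile(
    r"/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)"
)
_INT_SEGMENT = re.compile(r"/\d+(?=/|$)")


def route_template(path):
    """Replace ID-looking path segments with a stable placeholder."""
    path = _UUID_SEGMENT.sub("/{id}", path)
    return _INT_SEGMENT.sub("/{id}", path)


for raw in ("/orders/48192", "/orders/48193", "/users/123/orders/456", "/search"):
    print(raw, "->", route_template(raw))
```

With this in place, `/orders/48192` and `/orders/48193` report under the same label instead of creating two series.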

Trace sampling can erase the evidence. Teams often lower sampling to save money, then wonder why a bad slowdown left no trace behind. If you keep only 1 in 1,000 traces, a rare slow request may never reach storage. Keep a higher sampling rate for slow requests and errors, or use tail-based sampling, which decides what to keep after a trace finishes and can preserve the unusual ones.
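
One way to picture a rule that protects unusual traces is a keep-or-drop decision made once the request's duration and outcome are known. The threshold, the baseline rate, and the `keep_trace` function below are assumptions for illustration, not settings from any particular collector.

```python
import random

SLOW_THRESHOLD_S = 1.0  # assumed cutoff for "slow"
BASELINE_RATE = 0.001   # keep roughly 1 in 1,000 ordinary traces


def keep_trace(duration_s, is_error, rng=random.random):
    """Decide after the request finishes whether its trace is worth storing."""
    if is_error or duration_s >= SLOW_THRESHOLD_S:
        return True  # always keep errors and slow requests
    return rng() < BASELINE_RATE  # sample ordinary traffic sparsely


print(keep_trace(2.4, is_error=False))   # slow request: kept
print(keep_trace(0.09, is_error=True))   # fast but failed: kept
```

Because the decision needs the final duration, it has to run after the trace completes, which is why this kind of rule sits in a collector or backend rather than at the start of the request.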

Labels can wreck the data as well. If you add user IDs, session IDs, or cart IDs to request metrics, you create too many unique series. Costs go up, charts get noisy, and exemplar samples get scattered across tiny slices of traffic. A small label set usually works best: service name, route template, HTTP method, and status code or status class.
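
The series count is easy to estimate before you ship a label. This sketch generates hypothetical traffic (1,000 users, two routes, two status classes) and counts the unique label combinations with and without a user ID; every name and number in it is made up.

```python
from itertools import product

# Hypothetical traffic: every user hits both routes with both outcomes.
routes = ["/orders/{id}", "/search"]
status_classes = ["2xx", "5xx"]
requests = [
    {"route": route, "method": "GET", "status": status, "user_id": user}
    for user, route, status in product(range(1000), routes, status_classes)
]


def series_count(samples, label_names):
    """Count unique label combinations, which is the number of time series."""
    return len({tuple(s[name] for name in label_names) for s in samples})


small = series_count(requests, ["route", "method", "status"])
large = series_count(requests, ["route", "method", "status", "user_id"])
print(small, large)
```

The same traffic produces 4 series with the small label set and 4,000 once `user_id` is added, so each series sees far fewer requests and exemplars get spread thin.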

When metrics and traces stop lining up, check span timing, metric type, route naming, sampling, and labels first. In slow API debugging, those five checks solve the problem more often than any backend tweak.

Quick checks before you trust the data

A nice latency chart can still mislead you. Before you chase a slow request, make sure your telemetry path is complete from the first timing point to the full trace. If that chain breaks in one place, exemplars turn into random dots instead of proof.

Start with the simplest test. Send one request that you know will take longer than usual, then confirm your app creates a span for that request and records the same request in the latency histogram. Both signals need to come from the same execution path. If the span comes from middleware but the metric comes from a different timer, the link often falls apart.

Then check the labels on the metric. Route names should stay stable across deploys. A label like /users/123/orders/456 creates noise fast. A label like /users/{id}/orders/{orderId} keeps the chart readable and lets you compare spikes over time instead of starting from scratch after every release.

A short checklist helps. Trigger one known slow request and verify that it produces both a trace span and one histogram observation. Confirm the route label uses the same pattern before and after a deploy. Make sure your trace sampling keeps slow requests often enough to inspect later. Open the latency chart and check that exemplar markers appear on or near the spike, not on unrelated buckets. Then click a marker and confirm it opens a full trace with child spans for database calls, external APIs, queues, or whatever else the request touched.

Sampling causes more trouble than most teams expect. You may record exemplars correctly, yet lose the slow trace because the sampler drops it. That leaves you with a marker that points nowhere useful. For spike hunts, many teams keep all error traces and a bigger share of slow traces, even if they sample normal traffic more aggressively.

One last sanity check is simple. Pick a real spike from the chart, click an exemplar, and read the child spans. You should see where the time went in plain terms, like a 2.4 second database query or a stalled upstream call. If you cannot get that answer in under a minute, the setup still needs work.

What to do next

Pick one endpoint that hurts when it slows down. A login call, checkout step, or search request is enough. Start there and prove that your metrics, traces, and exemplars line up before you touch the rest of the service.

That small test gives you a clean yes or no answer. When a latency chart spikes, you should be able to click through to a real trace, inspect the slow span, and name the cause in a few minutes. If that does not work on one endpoint, a wider rollout will only spread confusion.

A short runbook helps more than another dashboard. Open the latency chart for the endpoint, find the spike, open the attached trace, inspect the slow span or downstream call, then write down the cause and the fix in one place. Teams actually use runbooks when they fit on one page. After the first incident, update it with screenshots, field names, and the exact chart your team should trust.

Before you roll out exemplars across more services, review the parts that usually break the flow. Check trace sampling first. If you sample too hard, the chart may point to traces that never reach storage. Then check retention. A spike from yesterday is useless if the trace disappeared after six hours. Last, check cost. Keep enough data to debug real problems, but do not keep every trace forever just because you can.

Share one real incident with the team. A short note like "p95 jumped to 2.4 s, exemplar pointed to trace 7ab, slow query came from a missing index" changes minds quickly. People trust a method when they see it save 30 minutes during a live issue.

If you need help setting up tracing, metrics, storage, and incident workflow without turning observability into a giant side project, Oleg Sotnikov at oleg.is works with startups and small teams as a Fractional CTO and advisor.

Frequently Asked Questions

What problem do exemplars solve?

They connect a bad metric point to a real trace. Instead of staring at a p95 spike and guessing which request caused it, you can jump from the chart to one slow request and inspect the span that used the time.

Where should I add exemplars first?

Start with one endpoint that hurts when it slows down, like login, checkout, or search. That keeps the rollout small and shows fast whether your metrics, traces, and storage actually line up.

Do I need histograms, or are counters enough?

Use a latency histogram. Counters tell you how many requests happened, but they do not show the slow tail well and they do not give you a useful path from a spike to one request.

Why don’t I see exemplar links on a latency spike?

Most missing links come from timing or context issues. Record the latency while the request span is still active, keep trace context through the full request path, and confirm your backend and UI both support exemplars.

How much trace sampling do I need?

Keep enough traces to catch slow requests and errors. If you sample too hard, the chart may show an exemplar marker but the full trace never reaches storage, which makes the marker far less useful.

What labels should I keep on API latency metrics?

Keep labels small and stable. Service name, route template, HTTP method, and status code or status class usually give you clean charts without blowing up series count.

Can exemplars help with short-lived spikes?

Yes, that is one of the best uses. When a spike lasts only a few minutes, averages blur it, but an exemplar can still point to one request from that window so you can see what changed.

How do I test the setup before production?

Make one request slow on purpose in staging, then open the latency chart and look for the marker on the slow bucket. Click it and check that it opens the matching trace with child spans for the database, cache, or downstream calls.

What should an engineer do during an incident?

Open the latency chart for the affected endpoint, find the spike, click the exemplar, and read the slow span first. That gives you a concrete starting point before you check CPU, autoscaling, or other system-level signals.

When should a small team ask for outside help?

Ask for help when your team keeps losing time on setup instead of fixing incidents. If you need someone to wire tracing, metrics, storage, and a simple incident flow without turning it into a huge side project, a Fractional CTO or advisor can shorten that work a lot.