May 19, 2025·8 min read

Docs vs code context for bugs, features, and incidents

Choosing between docs and code as context helps teams give a model the right source for bugs, feature work, and incidents, so it gets less noise and returns clearer answers.

Why sending both sources often makes answers worse

Feeding a model both docs and code at the same time feels safe. In practice, it often makes the answer less accurate. The model has to sort through extra material before it can focus on the actual job.

That noise matters. If you're asking about a bug, the fastest path is usually the code that runs today, not a product spec from three months ago. When you paste both, the model can burn time summarizing background, comparing versions, or repeating intent instead of tracing the fault.

Docs and code answer different questions. Docs explain what the team wanted. Code shows what the system does right now. In any real product, those two sources drift. If a model reads both without a clear starting point, it can blur that difference and give you a neat answer that hides the mismatch.

That makes the choice between docs and code a practical one, not a style preference. If pricing rules changed in code but the docs still describe the old behavior, mixed context can make the model treat both as equally true. You get a reply that sounds reasonable but does not match production.

Large inputs create a second problem: cost and review time. More context means more tokens, slower replies, and longer outputs to check. That adds up fast during bug triage with AI or incident work, where teams need a tight loop and a clear trail of reasoning.

A smaller prompt usually gives a better first pass. Start with the source that matches the task. Add the second source only if the model hits a real gap, such as missing intent, unclear business rules, or code paths that depend on behavior nobody wrote down.

Start by naming the job

Most bad AI context starts before anyone attaches a file. Teams mix a bug report, a feature request, an incident note, and a cleanup task into one prompt, then wonder why the answer feels vague. The model didn't fail. The job was blurry.

Give the work a simple label first: bug, feature, incident, or cleanup. That one word changes what the model should read and what it should ignore. A bug asks about current behavior. A feature asks about intended behavior. An incident asks about live state. Cleanup asks what you can remove, simplify, or rename without changing results.

Then write the request in one plain sentence. "Why does the discount disappear after checkout?" is enough. "Add regional pricing rules for annual plans" is enough. If the sentence keeps growing, split the work. One prompt should answer one thing.

After that, decide what kind of context the model needs. If the question is about intent, use docs, product notes, or decision records. If it's about behavior, use code, tests, and recent changes. If it's about live state, use logs, alerts, dashboards, and runbooks.
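
The mapping above can be written down as a small lookup. This is a minimal sketch, not a fixed rule: the names `QUESTION_KIND`, `SOURCES`, and `sources_for` are hypothetical, and treating cleanup as a behavior question is an assumption based on the goal of not changing results.

```python
# Hypothetical lookup: job label -> kind of question -> sources to attach first.
QUESTION_KIND = {
    "bug": "behavior",          # a bug asks about current behavior
    "feature": "intent",        # a feature asks about intended behavior
    "incident": "live_state",   # an incident asks about live state
    "cleanup": "behavior",      # assumption: cleanup must preserve behavior
}

SOURCES = {
    "intent": ["docs", "product notes", "decision records"],
    "behavior": ["code", "tests", "recent changes"],
    "live_state": ["logs", "alerts", "dashboards", "runbooks"],
}


def sources_for(job: str) -> list[str]:
    """Return the sources to attach first for a labeled job."""
    try:
        kind = QUESTION_KIND[job]
    except KeyError:
        # An unlabeled job is the problem the text describes: name it first.
        raise ValueError(f"label the job first: {sorted(QUESTION_KIND)}") from None
    return SOURCES[kind]


print(sources_for("bug"))       # ['code', 'tests', 'recent changes']
print(sources_for("incident"))  # ['logs', 'alerts', 'dashboards', 'runbooks']
```

The lookup refuses an unknown label on purpose. If the job does not fit one of the four words, the request probably needs splitting before any file is attached.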

This is where teams save time. If your sentence asks about intent, the model does not need half the repository. If it asks about behavior, a product brief won't help much. If it asks about live state, old docs may only distract. When choosing between docs and code, less is often better.

Drop every file and page that does not help answer the sentence you wrote. A long architecture doc won't fix a broken endpoint. Ten source files won't explain a policy decision. During incident work, yesterday's logs may matter more than a polished spec.

It sounds minor, but it changes the output fast. A clean prompt often cuts the context in half and gives you an answer you can use.

For bug work, read code first

When a bug already exists, the fastest truth is usually in the code path that fails. When the choice is between docs and code, bugs lean hard toward code. Docs can tell you what should happen. Code shows what did happen, what can happen by mistake, and where the wrong branch starts.

Start small. Open the handler, service, query, or UI action closest to the bug report. Then open the test that covers that path, if one exists, and the config or feature flag that can change behavior. That set is often enough to find the issue without dragging in half the repo.

Tests matter because they expose assumptions. A test name might say "guest users can save drafts," while the implementation quietly blocks guests when one flag flips. Config matters for the same reason. Many bugs are not deep logic failures. They come from a default value, a stale env setting, a fallback, or one missing case in a switch.

A good bug context pack is usually small:

  • the failing function or component
  • the nearest test file
  • config, flags, or schema tied to that path
  • the most recent change touching those files

Recent changes deserve extra attention. If a bug appeared this week, compare the current code with the last working version before you read broad design docs. Teams often waste time pasting architecture notes into the prompt when the real cause is one changed default, one null case, or one renamed field.

Use docs later, not first. They help confirm expected behavior when the code and bug report disagree. If the docs say a request should round up and the code rounds down, now the mismatch means something. Before that, docs often distract more than they help.

Keep the context close to the bug. If a checkout total is wrong, send the pricing function, related tests, tax config, and the last few commits around that area. Skip unrelated modules, old planning notes, and the full repository tree. A tighter prompt usually gets a cleaner answer.

For feature work, start with docs

A new feature begins with intent, not files. Code shows how the system works today. Docs show what should change, who the change affects, and what must stay the same. If you begin in code, the model often grabs onto local patterns and misses the product rule that actually matters.

Before you send source files, collect the smallest written context that defines the change. That usually means the business rule, the user flow, a short note on what "done" means, and any edge cases the team already agreed on.

That order works because the two sources do different jobs. Docs answer "what are we trying to ship?" Code answers "where can we change it safely?" For feature work, starting with docs usually gives better results.

Then move to code with a narrow scope. Pull only the modules you expect to change, plus one nearby test if there is one. That gives the model enough context to find touch points and constraints without burying it in unrelated files.

A simple example makes the difference clear. Say a SaaS product adds a trial rule: trial users can invite teammates, but only the account owner can export data. The doc tells you the rule, the affected user path, and the success criteria. The code search should then focus on invitation permissions, export checks, and trial-state handling. You do not need the whole billing system or every account model.

Before the model proposes code, ask one direct question: compare the docs with the selected files and list any gaps. That often reveals the real work. Maybe the endpoint checks plan status but ignores user role. Maybe the acceptance notes mention screen text, but no component renders it yet.

That simple order cuts noise. It also gives you a better chance of shipping the feature the team actually asked for, not the one the current code structure happens to suggest.

For incident work, start with runbooks and signals

During an incident, speed beats breadth. If you give the model the whole repo and a pile of docs, it often wanders into theory while the system is still failing. Start with the runbook, the alert text, and the ownership note for the service. That gives the model the current symptom, the expected checks, and the team or system in scope.

Then feed it live evidence. Good incident response context is usually small and concrete: what fired, when it started, what changed, and what users feel right now. In practice, that means logs, dashboards, and recent deploys matter more than architecture notes in the first few minutes.

A tight incident packet often includes the alert name and exact message, the affected service or user flow, recent deploys or config edits, a few log lines with timestamps and error text, and the runbook steps already tried.

Tools like Sentry, Grafana, and Prometheus help because they give the model something real to reason about. If error rate spiked at 14:03 and a deploy finished at 14:01, that timing matters. If the runbook says "check queue depth first" and the queue is normal, the model can stop guessing there and move on.
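
The timing check in that example can be sketched in a few lines. This is an illustration under stated assumptions: the function name `deploys_before`, the service names, the date, and the 15-minute window are all made up, and the timestamps would normally come from your deploy tool and alerting system.

```python
from datetime import datetime, timedelta


def deploys_before(spike: datetime, deploys: dict[str, datetime],
                   window: timedelta = timedelta(minutes=15)) -> list[str]:
    """Names of deploys that finished within `window` before the spike, newest first."""
    recent = [(finished, name) for name, finished in deploys.items()
              if timedelta(0) <= spike - finished <= window]
    return [name for finished, name in sorted(recent, reverse=True)]


# Arbitrary date; only the 14:01 and 14:03 times come from the example above.
spike = datetime(2025, 1, 1, 14, 3)
deploys = {
    "billing-api": datetime(2025, 1, 1, 14, 1),     # hypothetical service
    "search-indexer": datetime(2025, 1, 1, 9, 30),  # hypothetical service
}
print(deploys_before(spike, deploys))  # ['billing-api']
```

A short list like this is the kind of evidence worth pasting into the prompt: one deploy, two minutes before the spike, and nothing else to distract from it.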

Open code after you narrow the failing area. Reading code too early slows triage because the model starts explaining how the system works instead of why it broke now. Once you isolate the likely service, job, or handler, send the small code slice that matches the symptom, plus the recent diff.

Say a billing alert fires and checkout returns 500s. Send the alert text, the last deploy summary, a few payment logs, and the billing runbook first. If those signals point to one webhook handler, then add that handler and the changed lines. That keeps the model focused on the live issue instead of every possible issue.

During the first pass, ask narrow questions. "What changed? What should we check next? What is the safest rollback point?" usually gets a better answer than "Read all this and find the problem."

How to choose context step by step

Most bad AI output starts with a simple mistake: the model gets too much material and no clear job. A repo, a spec, and a runbook dumped together often create noise, not clarity.

Write one sentence that names the task and one sentence that names the expected output. "Find the cause of this failed checkout test and suggest a fix" is clear. "Review this project" is not.

Then pick one primary source:

  • Use code when you need to explain behavior, trace a bug, or change logic.
  • Use docs when you need intent, scope, rules, or user-facing behavior.
  • Use a runbook or incident notes when the system is failing right now and time matters.

That first choice should do most of the work. When choosing between docs and code, the best default is not balance. It is focus.

Add one supporting source only if the first source leaves a real gap. If code shows a tax calculation bug but does not tell you the business rule, add the pricing spec. If a product doc describes a new setting but not where it lives, add the relevant module or API handler. If an alert fires and the runbook says "check queue lag," add the dashboard snapshot or recent logs, not the whole service folder.

After that, trim hard. Every file, page, or log chunk needs a reason to stay. If you cannot finish the sentence "I included this because...", remove it. Teams often keep extra context because it feels safer. It usually makes the model less certain and more likely to blend sources that should stay separate.

Before adding more material, ask one direct question: what is missing to finish the task well? The answer should be specific, such as "the expected discount rule" or "the worker that writes this field." If the gap sounds vague, the task is still too loose.

A small, purposeful context set gives better answers faster. It also makes review easier, because you can see why each source is there.

A simple example from a pricing bug

A support ticket says invoice totals look wrong. Customers see a few cents of difference between the checkout screen and the final invoice. It sounds minor, but finance bugs create support work fast.

Start with the pricing code and the tests, not the policy doc. The bug lives in behavior, and behavior comes from code. Ask the model to trace one invoice from item price to discount, tax, and final total.

In this case, the model finds that the code rounds each line item before summing. Each rounding step looks harmless on its own. Across several items, those tiny steps can push the final number away from what customers expect.

Now check the policy doc. It says "round once at the end." That gives you the real mismatch. The code answers "what the system does today." The doc answers "what the system is supposed to do."

The model can then explain the bug in plain language: early rounding changes the math, so the invoice total can drift by a few cents. It can also draft a fix. Keep full precision through the line calculations, add the raw values, and round only the final invoice amount.

To make the fix safe, ask for one failing test before any code change and one regression test after. A small case with three items, a discount, and tax is usually enough to show the old result and the corrected one.
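
The drift and the fix can be shown with a small three-item case. This is a sketch under assumptions, not the product's actual pricing code: the prices, the 15% discount, the 8% tax, the discount-then-tax order, and half-up rounding are all invented for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
DISCOUNT = Decimal("0.15")  # made-up 15% discount
TAX = Decimal("0.08")       # made-up 8% tax
PRICES = [Decimal("0.99"), Decimal("1.49"), Decimal("2.49")]


def line_amount(price: Decimal) -> Decimal:
    """Full-precision amount for one line: discount first, then tax."""
    return price * (1 - DISCOUNT) * (1 + TAX)


def total_rounding_each_line(prices: list[Decimal]) -> Decimal:
    """The buggy behavior: round every line to cents, then add."""
    rounded = (line_amount(p).quantize(CENT, rounding=ROUND_HALF_UP) for p in prices)
    return sum(rounded, Decimal("0"))


def total_rounding_once(prices: list[Decimal]) -> Decimal:
    """The policy: add the raw values, round only the final amount."""
    raw = sum((line_amount(p) for p in prices), Decimal("0"))
    return raw.quantize(CENT, rounding=ROUND_HALF_UP)


print(total_rounding_each_line(PRICES))  # 4.57
print(total_rounding_once(PRICES))       # 4.56
```

With these numbers the raw line amounts are 0.90882, 1.36782, and 2.28582. Each rounds up on its own, so the per-line total lands one cent above the correctly rounded sum of 4.56246. An assertion that the total equals 4.56 fails against the per-line version and passes after the fix, which is the shape the failing test and the regression test should take.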

This case shows why the order matters. Start with code to find the defect. Then confirm the intended rule in the doc. If you send both at once without a purpose, the model may mix up actual behavior and policy and give you a vague answer instead of a fix.

Common mistakes teams make

Teams often send too much at once. They paste a whole folder, a wiki export, and a stack of old docs into one prompt, then ask the model to "find the bug." That usually hurts the answer. The model spends attention on dead files, stale decisions, and repeated explanations instead of the few lines that control the behavior.

Another mistake is skipping the files that actually shape runtime behavior. People share the controller, API route, or UI component, but leave out the test, the feature flag, the environment setting, or the pricing config. Then the model fills gaps with guesses. If behavior changes by plan, region, or customer type, config and tests can matter more than another source file.

Old docs create a different kind of mess. A spec might describe the product from six months ago, before a release changed the logic. If nobody updated that doc, the model can treat bad information as current information. When docs and code disagree, stale docs should act like a clue that something drifted, not like final truth.

Teams also ask for a fix before they define the problem. "Fix this" is too vague. Is it a regression, a missing edge case, a bad migration, or a mismatch between the spec and the shipped code? When the task is fuzzy, the model cannot choose the right context source.

A small habit helps a lot: write one plain sentence before you paste anything. "Find why discounts fail on annual plans after last Friday's release" is enough to narrow the search. That sentence tells the model what job it is doing, which means it can ask for the right code, the right docs, or the right incident signals instead of swallowing everything you have.

Quick checks before you send context

Most weak AI output starts before the model reads a single file. The task is fuzzy, the context is mixed, and half the material adds noise. A two-minute check often does more than adding 20 extra pages.

Write down one exact question first. If you cannot fit it into one sentence, the model will usually guess wrong. "Why does annual pricing show the monthly total on checkout?" is clear. "Review this pricing system" is not.

Then pick the source that should lead. Between docs, code, and live signals, one source usually deserves first place:

  • If you need to explain current behavior or find a bug, code is usually the primary source.
  • If you need to plan a feature, check intended behavior, or confirm rules, docs usually go first.
  • If you are handling an incident, start with runbooks, alerts, traces, and recent changes.

After that, make every file earn its place. A good context pack is small and pointed. Include only the files, pages, and logs that help answer the question you wrote down.

Cut anything stale or noisy. Old docs confuse the model. Generated files waste space. Huge logs with no clear signal bury the one line that matters. If a file does not change the answer, remove it.

A simple test works well: ask a teammate why each item is there. If they cannot explain it in one short sentence, you probably should not send it.

You can use this short pre-send filter:

  • One question, one task.
  • One primary source chosen on purpose.
  • Only supporting material that helps answer that question.
  • No stale docs, generated output, or random log dumps.
  • A teammate can explain every included item.
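
The filter above can also run as a small check before a prompt goes out. This is a rough sketch with hypothetical names (`ContextItem`, `ContextPack`, `presend_problems`) and made-up file names. It can only verify that a question and a reason were written down, so whether they are good ones remains a human call.

```python
from dataclasses import dataclass, field

PRIMARY_SOURCES = {"code", "docs", "runbook"}


@dataclass
class ContextItem:
    name: str
    reason: str = ""         # one short sentence: why this item is included
    stale: bool = False      # e.g. a doc nobody updated after a release
    generated: bool = False  # e.g. build output


@dataclass
class ContextPack:
    question: str
    primary: str
    items: list[ContextItem] = field(default_factory=list)


def presend_problems(pack: ContextPack) -> list[str]:
    """Return readable problems; an empty list means the pack passes."""
    problems = []
    question = pack.question.strip()
    if not question or "\n" in question:
        problems.append("write the question as one sentence")
    if pack.primary not in PRIMARY_SOURCES:
        problems.append("choose one primary source")
    for item in pack.items:
        if not item.reason.strip():
            problems.append(f"{item.name}: no reason given")
        if item.stale:
            problems.append(f"{item.name}: stale")
        if item.generated:
            problems.append(f"{item.name}: generated output")
    return problems


pack = ContextPack(
    question="Why does annual pricing show the monthly total on checkout?",
    primary="code",
    items=[
        ContextItem("pricing.py", reason="computes the checkout total"),
        ContextItem("old_pricing_spec.md", reason="describes plan pricing", stale=True),
        ContextItem("dist/bundle.js", generated=True),
    ],
)
for problem in presend_problems(pack):
    print(problem)
```

For this pack the check flags the stale spec and the generated bundle and leaves `pricing.py` alone. That matches what the teammate test would catch by hand.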

This habit improves the context you give a model more than most prompt tweaks. The model gets less to read, but more signal to work with. That usually means faster bug triage, cleaner feature planning, and calmer incident response.

Next steps for teams that want cleaner AI work

Most teams do not need a bigger prompt library. They need a few rules people actually follow. If the team agrees on where the model should look first, the answers get shorter, faster, and less confused.

Start with a small playbook for the three common jobs. For bugs, give the model the failing test, error message, stack trace, and the files that likely caused the issue. For features, give it the product note, acceptance criteria, and the docs that explain expected behavior. For incidents, start with the runbook, logs, recent deploys, and current service signals. Save prompts that worked, label them by job type, and reuse them instead of rewriting from scratch.

Keep these sources easy to find. If runbooks live in one place, product notes in another, and tests hide in random folders, people will keep dumping everything into the prompt. A simple index, clear names, and one owner for each source fix a lot of that mess.

It also helps to review a few real prompts every month. Pick one bug, one feature, and one incident. Check what context the team sent, what the model missed, and what should have been left out. After two or three rounds, the patterns usually show up fast.

For teams setting rules about docs and code as context, a short template often beats a long guide. One page is enough if it answers three things: what job is this, what source comes first, and what proof should the model use before it suggests a fix.

If you want outside help setting this up, keep it practical. Oleg Sotnikov, at oleg.is, works as a Fractional CTO and startup advisor. He helps teams build AI-first development workflows, lean infrastructure, and context rules that fit day-to-day engineering work.

Frequently Asked Questions

Should I send docs and code together every time?

No. Start with the source that matches the task. For a bug, send code first. For a feature, send the rule or product note first. For an incident, send alerts, logs, and the runbook first.

Add a second source only when the first one leaves a real gap. That keeps the answer focused and easier to review.

What should I send first when I want help with a bug?

For most bugs, send the failing code path first. Include the nearest test, any config or feature flag that changes behavior, and the recent change around that area.

That small pack usually gives the model enough to trace what the system does right now without getting lost in old plans or broad docs.

What should I send first for a new feature?

With feature work, begin with intent. Send the business rule, the user flow, what “done” means, and any edge cases the team already agreed on.

After that, add only the modules you expect to change. This order helps the model build the right thing instead of copying whatever pattern it finds in nearby code.

How should I prompt the model during an incident?

Start from live signals, not the whole repo. Send the alert text, runbook, recent deploy or config change, and a few log lines with timestamps.

Once you narrow the failing area, add the small code slice that matches the symptom. That keeps triage tight and helps you decide the next safe step faster.

How much context is enough?

Use the smallest context set that can answer one clear question. If you cannot explain why a file, page, or log chunk belongs in the prompt, remove it.

Small prompts usually work better because the model spends less time sorting noise and more time on the actual problem.

What if the docs and the code say different things?

Treat the mismatch as the issue you need to inspect. Code tells you what the system does now. Docs tell you what the team wanted or expected.

For bug work, trust the running behavior first. Then use the docs to confirm whether the behavior matches the rule or whether the rule drifted.

Why should I include tests and config in a bug prompt?

Tests and config often explain behavior that the main code file does not show. A flag, env value, schema rule, or test assumption can flip the result even when the function looks fine.

If you skip those files, the model has to guess. That is where vague or wrong fixes usually start.

What should I ask before I add more files?

Ask one direct question: what is missing to finish this task well? A good answer sounds concrete, like “the discount rule” or “the worker that writes this field.”

If the answer sounds fuzzy, the task is still too broad. Tighten the question before you add more material.

Can stale docs make the answer worse?

Yes. Old docs can pull the model toward behavior that no longer exists. That gets worse when a team changed the code but never updated the written rule.

Use stale docs as a clue that something drifted. Do not treat them as final truth without checking current code or live signals.

How can a team make this approach consistent?

Pick a simple playbook for the common jobs. Define what people send for bugs, features, and incidents, and keep those sources easy to find.

Review a few real prompts each month and trim what did not help. Teams improve fast when they reuse a small template instead of dumping everything into every request.