Oct 20, 2025·7 min read

Source attribution that survives answer rewriting in AI

Use a simple process for source attribution that survives answer rewriting, so a second model can change tone or structure without breaking citations.

Why citations break after a rewrite

Most citation failures start with a very ordinary workflow. One model writes the answer. Another rewrites it to be shorter, clearer, or closer to the brand voice.

The second pass often improves the writing, but it also changes the structure. A claim that originally sat right next to its source can move three lines down. A sentence that held one fact can turn into two. Two separate ideas can get merged into one smooth line.

On the page, that still looks fine. The problem is that citation markers usually stay attached to the old text position, not to the fact itself. After the rewrite, the answer still looks cited, but the citation may now point to the wrong claim.

Mixed sentences make this worse. A rewrite model might combine a market number from one report with a product detail from another source. If the new sentence keeps one marker, readers cannot tell which part it supports. If it keeps both, readers still have to guess which source belongs to which clause.

This happens a lot in multi-model pipelines. A team drafts with one model, polishes with another, then runs a final style pass with a third. Each step can compress, split, and blend statements. Citation markers survive as leftover punctuation while the wording changes enough that they no longer make sense.

Readers notice fast. They see a citation, but they cannot tell what it proves. Does it support the whole sentence, only the first clause, or just the number at the end? Once that doubt appears, trust drops.

Readable writing is not enough. A good answer lets readers trace each factual claim back to clear evidence, even after the wording changes.

Track claims, not sentences

A rewrite should change wording, not evidence. Once those two get mixed together, citations drift.

The safer approach is to store facts as separate units before any model rewrites the answer. Think of each fact as a small record with three parts: the claim, the evidence behind it, and a stable ID. The polished sentence you show readers can change later. The record should not.
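A minimal sketch of that record in Python, assuming a frozen dataclass is enough for your store. `FactRecord` and its field names are hypothetical, and the evidence string is invented for illustration. Freezing the record means later steps can read it but cannot edit it in place.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FactRecord:
    """One fact, stored apart from any prose that mentions it (hypothetical schema)."""

    claim_id: str  # stable handle such as "C014"; never reused or renumbered
    claim: str     # plain, neutral statement of the fact
    evidence: str  # exact supporting span copied from the source


record = FactRecord(
    claim_id="C014",
    claim="Revenue grew 18% in Q2.",
    evidence="Revenue for Q2 was up 18%.",  # invented source text for the example
)

# A rewrite step may read the record, but assignment raises FrozenInstanceError.
try:
    record.claim = "The company posted a strong quarter."
except FrozenInstanceError:
    print("record is read-only")
```

The polished sentence lives elsewhere and can change freely; only this record is treated as evidence.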

Keep the claim itself plain and neutral. Save the underlying point, not the final phrasing. "Revenue grew 18% in Q2" is a fact. "The company posted a strong quarter with notable growth" is style.

The evidence needs to be exact. Save the supporting passage, table cell, quote, or paragraph. A document title alone is too loose. If you only store "annual report," the next model can easily attach that citation to the wrong sentence. A precise span gives you something stable to carry through every rewrite.

Some claims need more than one source, and that should be clear from the start. "The outage lasted 43 minutes and affected EU customers only" may depend on one source for the timeline and another for customer scope. Treating that as one simple claim with one citation invites trouble.
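One way to make multiple sources explicit, sketched in Python: each evidence span records which part of the claim it supports, so a reviewer can see whether any part is left uncovered. `Evidence`, `Claim`, `uncovered_parts`, the ID C031, and the part labels are all hypothetical, and the spans are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str    # document the span comes from
    span: str      # exact supporting text copied from that document
    supports: str  # which part of the claim this span proves


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str
    evidence: tuple[Evidence, ...] = ()

    def uncovered_parts(self, parts: list[str]) -> list[str]:
        """Return the parts of the claim that no evidence span is marked as supporting."""
        covered = {e.supports for e in self.evidence}
        return [p for p in parts if p not in covered]


outage = Claim(
    claim_id="C031",
    text="The outage lasted 43 minutes and affected EU customers only.",
    evidence=(
        Evidence("incident timeline", "<span copied from the timeline>", supports="duration"),
        Evidence("customer impact report", "<span copied from the report>", supports="scope"),
    ),
)

print(outage.uncovered_parts(["duration", "scope"]))           # []
print(outage.uncovered_parts(["duration", "scope", "cause"]))  # ['cause']
```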

Teams that do this well often keep a small fact sheet next to the draft. It is not fancy. It just gives every claim a stable handle so attribution still works after tone, structure, and phrasing change.

Build the fact sheet before rewriting

Before you rewrite anything, pull the answer apart into claims and park them in a simple table. A spreadsheet is enough.

Each row should hold one claim only. That rule matters more than it seems. If a sentence says, "The company launched in 2021 and reached 50,000 users in its first year," split it into two rows. Later, a second model may turn one sentence into two, merge two into one, or drop half of it. One row per claim makes all of that safer.

A useful fact sheet usually includes the claim in plain language, the source name, the source date, the exact quoted span that supports it, and a short review note if something needs checking.
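If the spreadsheet is exported as CSV, a short loader can refuse rows that skip the quoted span before any rewrite runs. This is a sketch using Python's standard `csv` module. The columns follow the list above plus a `claim_id` column; the exact column names and the sample rows are assumptions.

```python
import csv
import io

# Column names are assumptions; rename them to match your own sheet.
REQUIRED_COLUMNS = ["claim_id", "claim", "source_name", "source_date", "quoted_span", "review_note"]


def load_fact_sheet(text: str) -> list[dict[str, str]]:
    """Parse a CSV fact sheet and reject rows that lack a claim or a quoted span."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"fact sheet is missing columns: {missing}")
    rows = []
    for row in reader:
        # Short rows get None for absent fields, so guard before calling strip().
        if not (row["claim"] or "").strip() or not (row["quoted_span"] or "").strip():
            raise ValueError(f"{row['claim_id']}: claim and quoted_span are both required")
        rows.append(row)
    return rows


# Invented sample rows; the quoted spans are placeholders for real source text.
SHEET = """claim_id,claim,source_name,source_date,quoted_span,review_note
C001,The company launched in 2021.,About page,2024-01-15,<exact launch sentence>,
C002,"The company reached 50,000 users in its first year.",Annual report,2022-03-01,<exact table cell>,check how users are counted
"""

for row in load_fact_sheet(SHEET):
    print(row["claim_id"], "|", row["claim"], "|", row["review_note"] or "-")
```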

The quoted span is the part many teams skip, and it is usually the reason attribution falls apart. "Source: annual report" is too broad. You want the exact line, paragraph, or table entry that proves the claim. When someone checks the answer later, they should not have to hunt through a long document.

Be strict about weak support. If the source only hints at a claim, do not keep it as a fact. Remove it, or rewrite it as an estimate or opinion if that is honest. This single step cuts a lot of quiet errors before they spread.
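A verbatim check cannot decide whether a span truly proves a claim; that stays a human judgment. What it can catch cheaply is a "quoted" span that was paraphrased or invented. A minimal sketch, assuming the source is available as plain text; `normalize` and `span_is_verbatim` are hypothetical helpers, and the policy text is made up.

```python
import re


def normalize(text: str) -> str:
    """Unify curly quotes and collapse whitespace so line wrapping cannot hide a match."""
    for curly, straight in (("\u2018", "'"), ("\u2019", "'"), ("\u201c", '"'), ("\u201d", '"')):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip()


def span_is_verbatim(span: str, source_text: str) -> bool:
    """True only if the quoted span occurs word for word (after normalizing) in the source."""
    needle = normalize(span)
    return bool(needle) and needle in normalize(source_text)


# Made-up policy text, wrapped across lines the way extracted documents often are.
source = """Customers may cancel at any time. Service remains active until the end of the
current billing period."""

print(span_is_verbatim("Service remains active until the end of the current billing period.", source))  # True
print(span_is_verbatim("Service stays on for the rest of the month.", source))  # False
```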

Give extra attention to numbers and dates. Review them one by one, even if the rest of the draft looks fine. Small wording changes can bend a statistic, and a missing year can change the meaning of a whole sentence.

Ten careful minutes here can save an hour of cleanup after the rewrite.

Keep facts and citations together during the rewrite

Citations usually break when a rewrite model treats the draft as plain text. It moves clauses, combines ideas, or shortens sentences, and the link between a fact and its source disappears.

Start with claim extraction. Pull each factual statement out of the first draft and make it small enough to verify. "Revenue grew 18% in Q2" is one claim. "Revenue grew 18% in Q2 and support tickets fell" is two, so split it.

Next, attach a stable ID to each claim. A simple label like C014 works well if you keep it permanent across every pass. In your claim table, store the ID, the source document, the exact supporting passage, and a short plain summary of the claim.
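A sketch of an ID allocator that keeps IDs permanent, assuming the `C` plus zero-padded number format from the example. `ClaimIdAllocator` is hypothetical. It never refills a gap, so pass it every ID ever issued, including retired ones; otherwise deleting the highest ID would let that ID be handed out again.

```python
import re

ID_PATTERN = re.compile(r"C(\d+)")


class ClaimIdAllocator:
    """Hands out IDs like C014 and never reuses a number it has seen."""

    def __init__(self, issued_ids: list[str]) -> None:
        # issued_ids should include retired IDs, not just the ones still in use.
        numbers = [int(m.group(1)) for i in issued_ids if (m := ID_PATTERN.fullmatch(i))]
        self._next = max(numbers, default=0) + 1

    def new_id(self) -> str:
        claim_id = f"C{self._next:03d}"
        self._next += 1
        return claim_id


allocator = ClaimIdAllocator(["C001", "C002", "C014"])
print(allocator.new_id())  # C015
print(allocator.new_id())  # C016
```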

Then let the rewrite model work on style, but not on evidence. Tell it clearly that it can shorten, reorder, and smooth the writing, while keeping each claim ID attached to the same fact. If it splits one sentence into two, it should carry the right ID into both parts where needed. If it merges two supported claims, it should carry both IDs forward.
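The instructions for the rewrite model can be generated from the claim table so every pass gets the same rules. The sketch below only builds the prompt text; the wording of the rules and the `build_rewrite_prompt` helper are assumptions, and sending the prompt depends on whichever model API you use.

```python
def build_rewrite_prompt(draft: str, claims: dict[str, str], style: str) -> str:
    """Assemble rewrite instructions that let the model change style but not evidence."""
    claim_lines = "\n".join(f"[{claim_id}] {text}" for claim_id, text in claims.items())
    return (
        f"Rewrite the draft below to be {style}.\n"
        "You may shorten, reorder, split, and merge sentences.\n"
        "Rules:\n"
        "1. Keep each [ID] marker attached to the sentence that states that fact.\n"
        "2. If you split a sentence, repeat the ID in every part that states the fact.\n"
        "3. If you merge sentences, keep all of their IDs.\n"
        "4. Do not add numbers, dates, names, causes, or comparisons that are not in the claims.\n"
        "5. Never create a new ID.\n\n"
        f"Approved claims:\n{claim_lines}\n\n"
        f"Draft:\n{draft}\n"
    )


prompt = build_rewrite_prompt(
    draft="The system cut response time by 32% [C021] and reduced cloud spend by 18% [C022].",
    claims={"C021": "Response time fell 32%.", "C022": "Cloud spend fell 18%."},
    style="shorter and friendlier",
)
print(prompt)
```

Listing the approved claims next to the draft gives the model the exact fact set it is allowed to express, which makes rule 4 easier to follow and easier to check afterward.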

A short example shows the point. Suppose the first draft says: "The system cut response time by 32% [C021] and reduced cloud spend by 18% [C022]." A rewrite model might turn that into: "Response time dropped 32%. Cloud costs also fell 18%." That rewrite is fine only if C021 stays with the first sentence and C022 stays with the second.

Check the final draft before you publish. Your review script or editor should flag any factual sentence with no ID, any ID that does not exist in the claim table, and any claim whose meaning changed during the rewrite. That small check saves a lot of cleanup later.
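A minimal version of that review script, assuming claim markers look like `[C021]` and sit before the sentence's closing punctuation. `review_draft` is hypothetical, and the sentence splitter is naive (abbreviations such as "e.g." will trip it). It flags every sentence without an ID, so a reviewer still decides which flagged sentences are opinion, and still checks by hand whether a claim's meaning changed.

```python
import re

ID_MARKER = re.compile(r"\[(C\d+)\]")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # naive: splits after ., ! or ? plus whitespace


def review_draft(draft: str, claim_table: dict[str, str]) -> list[str]:
    """Flag sentences that carry no claim ID and IDs that are missing from the claim table."""
    problems = []
    for sentence in SENTENCE_END.split(draft.strip()):
        ids = ID_MARKER.findall(sentence)
        if not ids:
            problems.append(f"no claim ID: {sentence!r}")
        for claim_id in ids:
            if claim_id not in claim_table:
                problems.append(f"unknown ID {claim_id}: {sentence!r}")
    return problems


claims = {"C021": "Response time fell 32%.", "C022": "Cloud spend fell 18%."}
final = "Response time dropped 32% [C021]. Cloud costs also fell 18% [C023]. Customers loved it."

for problem in review_draft(final, claims):
    print(problem)
# unknown ID C023: 'Cloud costs also fell 18% [C023].'
# no claim ID: 'Customers loved it.'
```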

A simple example with two model passes

A support team gets a customer question: "Can I cancel mid-cycle, and what happens to my data?" The team has a policy document with exact wording, but they want to send a short, friendly reply rather than a legal block of text.

The first model does one job only. It reads the policy and extracts claims with source pointers:

C1: Customers can cancel at any time. Source: policy lines 12-13.
C2: Service stays active until the end of the current billing period. Source: lines 14-16.
C3: The company deletes account data 30 days after cancellation unless law requires longer retention. Source: lines 21-24.

At this stage, the wording does not need polish. It just needs a clean claim-to-source mapping. If a reviewer checks C3, they know exactly where it came from.

The second model gets a different task. It does not search the source again. It rewrites the approved claims into a warmer answer, keeps the IDs attached, and drops anything that does not fit the customer's question.

The final reply might read:

"Yes, you can cancel whenever you want [C1]. Your service will remain active until the end of your current billing period [C2]. After cancellation, we delete account data after 30 days unless we need to keep it for legal reasons [C3]."

The tone changed. The wording got shorter. The order now makes sense for the customer. But each fact kept its ID, so the system can still show the right policy lines in a review panel or under the answer.

This also holds up if the second model combines claims. If it rewrites C1 and C2 into "You can cancel anytime, and your access continues through the paid period," both IDs stay attached to that sentence. The sentence changed, but the attribution did not.
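The review panel described in this example can be sketched by splitting the reply into sentences and resolving each ID against the claim table. This reuses the C1 to C3 source pointers from the example; `review_panel` and the `[C1]` marker format are assumptions. The sample reply uses the merged wording, so its first sentence resolves to two policy ranges.

```python
import re

# Claim table from the support example: ID -> (claim, source pointer).
CLAIMS = {
    "C1": ("Customers can cancel at any time.", "policy lines 12-13"),
    "C2": ("Service stays active until the end of the current billing period.", "policy lines 14-16"),
    "C3": ("The company deletes account data 30 days after cancellation "
           "unless law requires longer retention.", "policy lines 21-24"),
}

ID_MARKER = re.compile(r"\s*\[(C\d+)\]")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def review_panel(reply: str) -> list[tuple[str, list[str]]]:
    """Pair each reply sentence (markers removed) with the source lines its IDs point to."""
    rows = []
    for sentence in SENTENCE_END.split(reply.strip()):
        ids = ID_MARKER.findall(sentence)
        sources = [CLAIMS[i][1] if i in CLAIMS else f"unknown ID {i}" for i in ids]
        rows.append((ID_MARKER.sub("", sentence), sources))
    return rows


reply = (
    "You can cancel anytime, and your access continues through the paid period [C1][C2]. "
    "After cancellation, we delete account data after 30 days unless we need to keep it "
    "for legal reasons [C3]."
)
for text, sources in review_panel(reply):
    print(text, "->", ", ".join(sources))
```

Readers see the clean sentence; reviewers see the same sentence with its policy lines attached.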

That is the whole pattern: extract facts first, rewrite second, and never let the rewrite step invent its own citations.

When one sentence turns into two

Rewrites often split a sentence for clarity. That looks harmless, but sentence-based citations break fast.

A sentence is only packaging. The claim is the unit that needs support.

Take a line from a fact sheet: "The team cut cloud costs by 30% in Q2 and kept uptime above 99.9%." That sentence contains two claims. If a second model rewrites it as "Cloud costs fell by 30% in Q2. Uptime stayed above 99.9%," each new sentence should keep the right ID. Do not create fresh IDs just because the wording changed.

Sometimes one original claim gets split into two sentences. In that case, the same ID may need to appear twice. "The outage lasted 14 minutes because a bad deploy filled the disk" might become "The outage lasted 14 minutes. A bad deploy filled the disk." If both points come from the same incident report and belong to the same approved claim unit, both sentences should keep that claim's ID.

The reverse happens too. A rewrite may combine two supported claims into one smoother line. Then one sentence may need two IDs. That is normal. "Revenue rose 12% after the pricing change, and support tickets stayed flat" can carry one ID for the revenue claim, backed by the finance report, and another for the ticket claim, backed by the support dashboard.
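A quick way to check a split or merge is to compare the set of IDs before and after the rewrite. Splitting a claim across two sentences or merging two claims into one leaves the set unchanged, while a swapped or newly minted ID shows up immediately. This is a sketch; `compare_passes` and the IDs C040 to C042 are hypothetical.

```python
import re

ID_MARKER = re.compile(r"\[(C\d+)\]")


def compare_passes(before: str, after: str) -> dict[str, set[str]]:
    """Claim IDs that a rewrite dropped or introduced, ignoring how often each appears."""
    ids_before = set(ID_MARKER.findall(before))
    ids_after = set(ID_MARKER.findall(after))
    return {
        "dropped": ids_before - ids_after,   # can be intentional; have a reviewer confirm
        "invented": ids_after - ids_before,  # a rewrite should never introduce an ID
    }


draft = "The team cut cloud costs by 30% in Q2 [C040] and kept uptime above 99.9% [C041]."
good = "Cloud costs fell by 30% in Q2 [C040]. Uptime stayed above 99.9% [C041]."
bad = "Cloud costs fell by 30% in Q2 [C040]. Uptime stayed above 99.9% [C042]."

print(compare_passes(draft, good))  # {'dropped': set(), 'invented': set()}
print(compare_passes(draft, bad))   # {'dropped': {'C041'}, 'invented': {'C042'}}
```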

The important boundary is simple: tone changes are fine. Extra meaning is not. "Cloud costs fell by 30%" is supported if the source says that. "The team became more efficient" is a new claim unless a source proves it.

This is where the workflow either works or fails. If you map claims to sources before rewriting, a second model can change tone and structure without losing proof. If you map sentences to sources, even a tidy edit can scramble the evidence.

Mistakes that break attribution

A clean rewrite can still ruin your citations. The usual problem is subtle: the text reads better, so nobody notices that the source marker no longer matches the exact claim on the page.

One common mistake starts early. A writer cites a whole paragraph when only one sentence in it came from a source. That feels neat, but it creates blur. Later, the rewrite model changes sentence order, merges ideas, or drops one line, and now the citation appears to support claims it never covered.

Another mistake is letting the rewrite model add details. This happens all the time with harmless-looking filler: a date, a percentage, a cause, or a comparison. The new wording sounds smooth, but those extra details may have no source at all. A rewrite step should change wording, not expand the fact set.

Fast-moving topics fail in a different way. Teams keep the source name but drop the date. That makes an answer look grounded when it is already stale. If a claim depends on a current policy, price, benchmark, or product change, the date is part of the fact.
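Staleness can be checked mechanically if each fact record carries a source date and a flag for time-sensitive claims. The `time_sensitive` flag, the 90-day threshold, the `stale_claims` helper, and the sample IDs are assumptions; pick an age limit per topic, since a pricing page goes stale faster than a founding year.

```python
from datetime import date


def stale_claims(claims: list[dict], today: date, max_age_days: int) -> list[str]:
    """IDs of time-sensitive claims whose source is undated or older than max_age_days."""
    flagged = []
    for claim in claims:
        if not claim.get("time_sensitive"):
            continue  # e.g. a founding year does not expire
        source_date = claim.get("source_date")
        if source_date is None or (today - source_date).days > max_age_days:
            flagged.append(claim["claim_id"])
    return flagged


claims = [
    {"claim_id": "C050", "time_sensitive": True, "source_date": date(2025, 1, 10)},
    {"claim_id": "C051", "time_sensitive": True, "source_date": None},  # date was dropped
    {"claim_id": "C052", "time_sensitive": True, "source_date": date(2025, 9, 30)},
    {"claim_id": "C053", "time_sensitive": False, "source_date": None},
]

print(stale_claims(claims, today=date(2025, 10, 20), max_age_days=90))  # ['C050', 'C051']
```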

Messy markers can hide trouble too. If one citation covers a direct quote, a summary of the same source, and the writer's opinion, readers cannot tell what came from where. Keep those separate. Quotes need exact support. Summaries need source backing. Opinion needs no citation unless it contains a factual claim.

The last mistake is very human: skipping the final fact check because the draft sounds right. Good writing can hide bad grounding.

Before publishing, check a few basics. Each factual claim should point to a source, not just the paragraph around it. The rewrite should add no new numbers, dates, names, or causes. Claims that depend on timing should still include their dates. Quotes, summaries, and opinion should stay separate. And someone should compare the final wording to the source, not just to the first draft.

If a sentence reads better after the rewrite, great. If the citation still matches every claim, that is what counts.

A quick review before you publish

A rewritten answer can sound clean and still drift away from the source. The last pass should test whether every factual claim still has a clear path back to evidence.

Start with the IDs. Every statement that asserts a fact should carry one, even if the reader never sees it. If a sentence says a company launched in 2021, cut costs by 18%, or serves 4,000 users, each part needs an ID that points to the fact record behind it.

Then do a short review:

Open each ID and confirm that it points to a real source span, not just a whole page or document.
Compare names, dates, totals, and quoted wording against the source text line by line.
Check whether the rewrite merged two facts into one sentence and changed the meaning.
Look for new claims about cause, certainty, scale, or timing.
Ask a reviewer to trace each claim from answer to source in under a minute.

The first step matters more than many teams expect. A citation to a long article is weak if the reviewer still has to hunt for proof. Good attribution points to the exact sentence, table cell, or transcript span that supports the claim.
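Breadth can be approximated with two rough signals: how many sentences the cited span contains and what share of the source it covers. The thresholds are assumptions to tune, and `span_too_broad` is a hypothetical helper. On very short sources the share test fires on almost any span, so treat a flag as a prompt to look, not a verdict.

```python
import re

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def span_too_broad(span: str, source_text: str, max_sentences: int = 3, max_share: float = 0.25) -> bool:
    """True if a cited span is so long that a reviewer would still have to hunt for the proof."""
    span = span.strip()
    sentences = [s for s in SENTENCE_END.split(span) if s]
    share = len(span) / max(len(source_text.strip()), 1)
    return len(sentences) > max_sentences or share > max_share


# A made-up ten-sentence source document.
source = " ".join(f"Sentence number {i} states one fact." for i in range(10))

print(span_too_broad(source, source))                               # True: whole document cited
print(span_too_broad("Sentence number 3 states one fact.", source))  # False
```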

Numbers deserve extra suspicion. Models often round them, swap units, or turn a range into a single figure. Names can drift too. A rewrite may shorten an organization name, change a job title, or confuse two people with similar roles. Small edits like these break the mapping even when the answer still sounds plausible.
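Rounded or added numbers can be caught by comparing the numbers in a rewritten sentence with the numbers in the claims it cites. A sketch, assuming `[C021]`-style markers and English number formatting (commas as thousands separators); `unsupported_numbers` is hypothetical. It ignores spelled-out numbers and cannot see a range collapsed to one of its own endpoints, so it narrows the review rather than replacing it.

```python
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
ID_MARKER = re.compile(r"\[C\d+\]")


def numbers_in(text: str) -> set[str]:
    """Numeric tokens in the text, with thousands separators removed (50,000 -> 50000)."""
    return {token.replace(",", "") for token in NUMBER.findall(text)}


def unsupported_numbers(sentence: str, cited_claims: list[str]) -> set[str]:
    """Numbers in a rewritten sentence that none of its cited claims contain."""
    allowed: set[str] = set()
    for claim in cited_claims:
        allowed |= numbers_in(claim)
    # Drop markers like [C021] first so the digits in an ID are not read as data.
    return numbers_in(ID_MARKER.sub("", sentence)) - allowed


print(unsupported_numbers("Response time dropped about 30% [C021].", ["Response time fell by 32%."]))
# {'30'}
```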

One simple rule catches a lot of problems: if the rewrite added any fact that does not map to an existing ID, remove it or verify it first. Fluent wording is not evidence.

A good final review feels almost boring. A reviewer should be able to scan the answer, spot a claim, open its ID, and see the source span right away. If that takes effort, the answer is not ready.

Next steps for teams using AI answers

Start smaller than you think. Pick one answer type and make it work there first. Support replies are a good place to begin because they are short, repeat often, and fail in obvious ways when a citation goes missing. Research summaries work too if your team already reviews them by hand.

Set one simple rule before you automate anything else: every factual claim gets a claim ID, and every claim ID points to an exact source span, not a whole document. It sounds strict, but it saves time later when another model changes sentence order, merges ideas, or splits one sentence into two.

A lightweight rule set is enough. Give every factual statement a stable ID. Save the exact quoted span or table cell behind it. Keep the ID attached through drafting, rewriting, and final formatting. Block publication if a final sentence has no mapped source.

Then test the cases that usually break things. Ask the rewrite model to make an answer friendlier, shorter, more formal, or more direct. Ask it to turn bullets into prose and prose into bullets. If attribution survives those changes, the workflow is probably strong enough for daily use.

One practical check helps a lot: compare the final answer against the claim map, not against the first draft. Teams often review the wrong artifact. The first draft may be perfectly grounded while the published version quietly adds a word like "usually" or "most" that the source never said.
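That kind of added qualifier can be flagged by comparing the published sentence with the approved claim text. The word list is a hypothetical starting point and `added_qualifiers` is an assumed helper; it only catches single words, not rephrasings like "in many cases".

```python
import re

# Hypothetical starter list; extend it with qualifiers your sources tend to avoid.
QUALIFIERS = {"usually", "most", "always", "never", "often", "typically", "all", "only", "guaranteed"}


def added_qualifiers(final_sentence: str, claim_text: str) -> set[str]:
    """Qualifier words that appear in the published sentence but not in the approved claim."""
    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z']+", text.lower()))

    return (words(final_sentence) & QUALIFIERS) - words(claim_text)


claim = "Customers can cancel at any time."
published = "Most customers can usually cancel at any time [C1]."
print(sorted(added_qualifiers(published, claim)))  # ['most', 'usually']
```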

If this workflow matters to your product or support team, outside review can save weeks of trial and error. Oleg Sotnikov at oleg.is works with startups and smaller companies on AI-first development systems, and this kind of answer-flow review fits naturally into that work.

Start with ten real answers, not a hundred. You will spot the breakpoints faster, write better rules, and end up with a process your team can still follow next week.

Frequently Asked Questions

Why do citations break after a rewrite?

They break because the rewrite changes wording and sentence structure, but the citation marker often stays where the old text used to sit. After that, the marker may point at the wrong fact, only part of a sentence, or a claim that the source never proved.

What should I attach citations to?

Cite claims, not sentences. A sentence is just packaging, but a claim is the thing you need to prove. If you track each claim as its own record, you can rewrite the prose without losing the link to evidence.

What should a fact sheet include?

Keep one claim per row. Add a stable claim ID, the source name, the source date, and the exact quoted span or table cell that supports it. If something looks weak or unclear, add a short review note so someone checks it before you publish.

Do I really need exact source spans?

Use the exact source span whenever you can. A document title alone is too broad, so reviewers have to hunt for proof and models can attach the citation to the wrong line later. A precise quote or table cell gives you a clear anchor.

How do I handle a sentence with two facts in it?

Split the sentence into separate claims before you rewrite it. If one line says the company launched in 2021 and reached 50,000 users, give each fact its own row and its own ID. That makes later edits much safer.

Can one claim use more than one source?

Yes, and you should mark that early. If one source proves the timeline and another proves the scope, keep both attached to the claim instead of forcing one citation to cover everything. That makes review faster and avoids fuzzy attribution.

Should the rewrite model add missing details if they sound right?

No. Let the rewrite model change tone, length, and order, but do not let it invent dates, numbers, causes, or comparisons. If it adds a new fact, verify it first or remove it.

Do claim IDs need to appear in the final answer?

Not unless you want them to. You can keep claim IDs behind the scenes for review and still show clean prose to readers. What matters is that your team can open each ID and see the exact source span right away.

What should I review before publishing?

Check every factual statement against its claim ID and source span. Pay extra attention to names, dates, numbers, and quoted wording, because small edits often bend them. If a reviewer cannot trace a claim to proof in under a minute, fix it before you ship it.

How should a team start using this workflow?

Start with one answer type, like support replies or short research summaries. Build a simple claim table, keep IDs through every draft, and test a few rewrites that change tone and structure. Ten real examples will teach you more than a big plan on paper.