Jan 23, 2026·7 min read

Vendor-neutral agent memory for federated agents

Vendor-neutral agent memory keeps facts and task state outside any model, so federated agents can switch providers without losing context or history.

Why agents forget after a model switch

Most agents do not remember anything by themselves. They seem consistent because the app keeps sending hidden chat history, summaries, tool outputs, and old instructions back to the model on every turn. When that flow changes, the "memory" disappears.

A provider switch often breaks that flow. The new model may have a different context window, prefer shorter summaries, interpret system prompts differently, or call tools in another format. Even if your app logic stays the same, the model may pay attention to different parts of the conversation. Details that once felt stable suddenly drop out.

You see this quickly in support. A customer says, "Please send updates to our Berlin office, not London," and the agent gets it right for a few replies. Then you switch models, that note sits deep in the transcript, and the next reply asks for the address again or sends the update to the wrong office.

Sales follow-ups fail the same way. A prospect says they want a demo next Tuesday and only care about API access, not pricing. If that lives only in the conversation, a new provider may miss it and send a generic note about plans, discounts, or the wrong meeting date.

That is the difference between model memory and business data. Business data is what your company needs to keep straight: customer preferences, order status, next action date, account owner, open issues. Model memory is temporary context unless you store those facts somewhere outside the model.

That is why vendor-neutral agent memory matters. When facts and task state live in a database, ticket system, or CRM, you can swap models without losing the thread. The model still helps with writing and reasoning, but it is no longer the only place where your agent "knows" anything.

What to store outside the model

A model should not be the only place where your agents remember things. If facts, open work, and past tool output live only inside prompts, a provider switch can wipe context in a day. Vendor-neutral agent memory starts with a shared store that any model can read and update.

Start with durable customer facts. Names, account details, product choices, contract terms, preferences, and known constraints belong in records, not in chat history. Each fact should also include when you learned it and where it came from. A timestamp and source note save a lot of confusion when two agents disagree or a customer changes something later.
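As a minimal sketch of a fact that carries its source and timestamp, the record could look like the following. The field names, the `latest_fact` helper, and the sample customer ID are assumptions made for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """One durable customer fact plus where and when it was learned."""
    subject_id: str       # hypothetical stable customer ID
    key: str              # what the fact is about, e.g. "update_office"
    value: str
    source: str           # e.g. "email", "crm"
    learned_at: datetime


def latest_fact(facts: List[Fact], subject_id: str, key: str) -> Optional[Fact]:
    """Return the most recently learned value for a key, or None if unknown."""
    matches = [f for f in facts if f.subject_id == subject_id and f.key == key]
    return max(matches, key=lambda f: f.learned_at, default=None)


facts = [
    Fact("cust_42", "update_office", "London", "crm",
         datetime(2026, 1, 5, tzinfo=timezone.utc)),
    Fact("cust_42", "update_office", "Berlin", "email",
         datetime(2026, 1, 20, tzinfo=timezone.utc)),
]

current = latest_fact(facts, "cust_42", "update_office")
print(current.value, current.source)  # prints: Berlin email
```

Because both values are kept with their timestamps, an agent that finds "London" and "Berlin" can tell which one is newer and where each came from instead of guessing.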

Agents also need a place to track work in progress. Many teams save the conversation but forget to save the work state. Then a new model has to reconstruct what happened from messy history, and it guesses wrong.

A simple task record usually needs:

  • a current status such as open, waiting, blocked, or done
  • recent tool results, like an API check or document lookup
  • decisions that were already made, and by whom
  • deadlines, reminders, and due dates
  • the next action and its owner

Tool results matter more than people expect. If an agent already checked inventory, pulled a billing record, or validated a support ticket, save that result with the task. Do not force the next agent to run the same checks again unless the data expired. That cuts cost, saves time, and reduces conflicting answers.
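One way to decide whether a saved check can be reused is a simple age limit. The sketch below assumes each tool result stores a `checked_at` timestamp. The `fresh_result` helper, the tool name, and the time limits are illustrative choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def fresh_result(tool_results: list, tool: str, now: datetime,
                 max_age: timedelta) -> Optional[dict]:
    """Return the newest saved result for `tool`, or None if missing or expired."""
    candidates = [r for r in tool_results if r["tool"] == tool]
    if not candidates:
        return None
    newest = max(candidates, key=lambda r: r["checked_at"])
    return newest if now - newest["checked_at"] <= max_age else None


saved = [
    {"tool": "inventory_check",
     "result": {"sku": "A-100", "in_stock": 3},
     "checked_at": datetime(2026, 4, 10, 9, 0, tzinfo=timezone.utc)},
]
now = datetime(2026, 4, 10, 9, 20, tzinfo=timezone.utc)

# The saved check is 20 minutes old: reusable under a 1-hour limit,
# expired under a 10-minute limit.
reusable = fresh_result(saved, "inventory_check", now, timedelta(hours=1))
stale = fresh_result(saved, "inventory_check", now, timedelta(minutes=10))
print(reusable is not None, stale is None)  # prints: True True
```

The right limit depends on the data. Inventory may go stale in minutes, while a contract term may hold for months.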

Keep the format plain. A small database row or JSON document works well if it is easy to query and update. The goal is simple: any agent, on any provider, should be able to load the same customer facts, see the same task state, and continue from the same next action without guessing.

Separate facts, task state, and history

If everything goes into one chat log, agents start mixing solid facts with rough guesses. Memory gets messy fast. A cleaner setup gives facts, task state, and history different jobs.

Verified facts need the strictest rules. Store only details you can confirm, such as a customer ID, product plan, billing status, or shipping address. If an agent infers something from context, keep it in a separate field such as "assumption" or do not save it until a person or another trusted system confirms it.

Long conversation history should stay out of the working memory that agents read first. Raw logs help with audits and edge cases, but they are slow to search and full of noise. Most agents do better with a short summary of what matters now, plus a pointer to the full transcript when they need detail.

Task state works best in fixed fields, not in free text. Instead of a note like "waiting on customer, refund maybe tomorrow," store structured values such as status: waiting_for_reply, owner: billing_agent, next_action: send_refund_form, due_at: 2026-04-15. That makes handoffs much safer when different models read the same record.
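A rough sketch of those fixed fields, using the values from the example above: the `Status` enum members and the `parse_task_state` helper are assumptions, and a real workflow would define its own allowed statuses.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Status(str, Enum):
    OPEN = "open"
    WAITING_FOR_REPLY = "waiting_for_reply"
    BLOCKED = "blocked"
    DONE = "done"


@dataclass
class TaskState:
    status: Status
    owner: str
    next_action: str
    due_at: date


def parse_task_state(raw: dict) -> TaskState:
    """Build a TaskState from plain values; raises ValueError on bad input."""
    return TaskState(
        status=Status(raw["status"]),               # rejects unknown statuses
        owner=raw["owner"],
        next_action=raw["next_action"],
        due_at=date.fromisoformat(raw["due_at"]),   # rejects malformed dates
    )


state = parse_task_state({
    "status": "waiting_for_reply",
    "owner": "billing_agent",
    "next_action": "send_refund_form",
    "due_at": "2026-04-15",
})
print(state.status.value, state.due_at.isoformat())

# A free-text status is refused instead of being stored as-is.
try:
    parse_task_state({"status": "refund maybe tomorrow", "owner": "billing_agent",
                      "next_action": "send_refund_form", "due_at": "2026-04-15"})
    free_text_rejected = False
except ValueError:
    free_text_rejected = True
```

Rejecting free text at write time is the design choice here: a model that invents its own status wording fails loudly instead of leaving a record the next model has to interpret.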

The split can stay simple:

  • Facts: confirmed details with source and timestamp
  • Task state: current status, owner, deadline, next action
  • History: full transcript and a short rolling summary

Agents also need clear read and write rules. A triage agent may read everything but only write a draft summary. A billing agent may update invoice status but not change the customer profile. A supervisor agent may turn an assumption into a verified fact after checking it against a trusted source.
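Write rules like these can be expressed as a small lookup table that is checked before any change is saved. The sketch below is one possible shape. The agent names follow the examples above, while the field names and the `apply_write` helper are hypothetical.

```python
WRITE_RULES = {
    "triage_agent": {"draft_summary"},
    "billing_agent": {"invoice_status"},
    "supervisor_agent": {"verified_facts"},
}


def apply_write(record: dict, agent: str, field: str, value) -> dict:
    """Return an updated copy of the record, or raise PermissionError."""
    allowed = WRITE_RULES.get(agent, set())
    if field not in allowed:
        raise PermissionError(f"{agent} may not write {field}")
    updated = dict(record)  # shallow copy; the caller's record is not modified
    updated[field] = value
    return updated


record = {"invoice_status": "unpaid", "customer_profile": {"plan": "pro"}}

# The billing agent may update invoice status...
record = apply_write(record, "billing_agent", "invoice_status", "refunded")

# ...but not the customer profile.
try:
    apply_write(record, "billing_agent", "customer_profile", {"plan": "free"})
    blocked = False
except PermissionError:
    blocked = True

print(record["invoice_status"], blocked)  # prints: refunded True
```

Keeping the rules in data rather than in prompts means they apply the same way to every model.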

A support case makes this concrete. If one agent says, "The customer sounds upset and probably wants a refund," that is not a fact. The fact is "Customer asked about a failed charge." The task state is "billing review open." The history keeps the exact wording. When you switch models, that separation keeps the new agent grounded.
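Written out as a record, that support case might look like the sketch below. The field names, the transcript pointer, and the `confirm_assumption` helper are illustrative assumptions. The follow-up reply from the customer is also invented for the example.

```python
from datetime import datetime, timezone

case = {
    "facts": [
        {"text": "Customer asked about a failed charge", "source": "email"},
    ],
    "assumptions": [
        {"text": "Customer probably wants a refund", "by": "triage_agent"},
    ],
    "task_state": {"status": "billing_review_open"},
    "history": {
        "transcript_ref": "transcript_1001",   # pointer, not the full log
        "summary": "Customer reported a failed charge.",
    },
}


def confirm_assumption(case: dict, index: int, confirmed_text: str,
                       source: str, now: datetime) -> None:
    """Replace one assumption with a fact once a trusted source confirms it."""
    case["assumptions"].pop(index)
    case["facts"].append({
        "text": confirmed_text,
        "source": source,
        "confirmed_at": now.isoformat(),
    })


# Hypothetical follow-up: the customer explicitly asks for a refund.
confirm_assumption(case, 0, "Customer requested a refund", "customer_reply",
                   datetime(2026, 4, 11, 10, 0, tzinfo=timezone.utc))
print(len(case["facts"]), len(case["assumptions"]))  # prints: 2 0
```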

Build a shared memory API

A shared memory setup works best when agents do not talk to the database directly. Put a thin API in the middle. Every agent, no matter which model runs it, should read and write through the same small set of endpoints.

That saves a lot of cleanup later. If you move from one model provider to another, the storage layer stays the same, and the new agent can pick up work without guessing how old data was stored.

The API should enforce one schema for every provider. Keep it plain. A fact is a fact, a task is a task, and an event is an event whether the request comes from Claude, GPT, or an open model. If one model returns long prose and another returns dense JSON, normalize both before you save anything.
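A normalization step could be as small as the sketch below. It assumes models are asked to include one JSON object somewhere in their reply. The required field names and both helper functions are made up for illustration, and pulling JSON out of prose this way is a simple heuristic rather than a robust parser.

```python
import json

REQUIRED = ("kind", "subject_id", "key", "value")
KINDS = ("fact", "task", "event")


def extract_json(text: str) -> dict:
    """Parse the text between the first '{' and the last '}' in a reply."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])


def normalize(raw) -> dict:
    """Reduce a model reply (string or dict) to the shared schema."""
    data = extract_json(raw) if isinstance(raw, str) else dict(raw)
    missing = [name for name in REQUIRED if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["kind"] not in KINDS:
        raise ValueError(f"unknown kind: {data['kind']}")
    return {name: data[name] for name in REQUIRED}  # drop provider extras


chatty = (
    "Sure! Here is the record you asked for:\n"
    '{"kind": "fact", "subject_id": "cust_42", "key": "plan", '
    '"value": "pro", "confidence": "high"}\n'
    "Let me know if you need anything else."
)
dense = '{"kind":"fact","subject_id":"cust_42","key":"plan","value":"pro"}'

print(normalize(chatty) == normalize(dense))  # prints: True
```

Both replies end up as the same stored record, so nothing downstream needs to know which model produced it.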

Store prompts, instructions, and memory rules in your own system, not inside a provider dashboard. Then your team can version them, review changes, and apply the same rules everywhere. That is vendor-neutral agent memory in practice.

Each task record only needs a few things: an ID, status, owner, timestamps, durable facts, recent tool calls and results, a short rolling summary, and the next action. Save tool logs next to the task they changed. If an agent updated a ticket, sent an email, checked an order, or changed a CRM field, the next agent should see that action without rereading the full transcript.

For chat history, store less. Whole conversations get noisy fast, and they often carry provider-specific formatting that ages badly. A short summary plus the last few messages is usually enough.

Think about a refund case. The memory layer does not need every reply the customer sent over three days. It needs the stable facts: duplicate charge reported, refund requested, payment checked, refund pending bank confirmation. Any model can use that and keep moving.

How to build it step by step

Start small. If you try to design the perfect schema on day one, you will waste time and still miss something real users need.

Pick one workflow that matters, such as support triage, bug follow-up, or sales handoff. Then give every agent the same memory rules, no matter which model you use underneath.

Write down the facts every agent must know before it can do useful work. Keep this short: customer name, account tier, open issues, current owner, recent decision, and any hard business rule. If a fact changes often, store the latest value instead of making the model infer it from old chat.

Pick stable IDs early. Give each user, task, project, and conversation its own ID, and use those IDs everywhere. It sounds boring, but it prevents a lot of confusion when one provider calls something a "thread" and another calls it a "session."
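One way to keep IDs stable is to mint your own conversation ID and treat each provider's handle as an alias for it. The class below is a hypothetical sketch. The provider names and handle values are placeholders, and a real system would persist the mapping rather than hold it in memory.

```python
import uuid
from typing import Dict, Tuple


class ConversationIds:
    """Maps provider-specific handles to one internal conversation ID."""

    def __init__(self) -> None:
        self._aliases: Dict[Tuple[str, str], str] = {}

    def new_conversation(self) -> str:
        """Create an internal ID that does not depend on any provider."""
        return f"conv_{uuid.uuid4().hex}"

    def link(self, provider: str, external_id: str, conversation_id: str) -> None:
        """Record that a provider handle refers to an internal conversation."""
        self._aliases[(provider, external_id)] = conversation_id

    def resolve(self, provider: str, external_id: str) -> str:
        """Look up the internal ID; raises KeyError for an unknown handle."""
        return self._aliases[(provider, external_id)]


ids = ConversationIds()
conv = ids.new_conversation()

# One provider calls it a thread, another calls it a session.
ids.link("provider_a", "thread_abc", conv)
ids.link("provider_b", "session_xyz", conv)

same = ids.resolve("provider_a", "thread_abc") == ids.resolve("provider_b", "session_xyz")
print(same)  # prints: True
```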

Set write rules before you add more agents. Decide who can create facts, who can update task state, and who can only append notes. If two agents can rewrite the same field without rules, the memory layer turns into a mess.

Then test one workflow through two providers. Let Claude handle the first pass and GPT handle the retry, or swap the order the next day. Both agents should read the same facts, update the same task record, and leave the same sort of audit note.

After that, tighten the schema. Look for fields one model ignored, fields both models rewrote badly, or details they kept burying in free text. When that happens, add structure. A plain status field usually beats a paragraph.

Keep history separate from active state. History helps with review. Active state helps agents act now.

One test works well: pause the workflow halfway, swap providers, and see if the second agent finishes cleanly. If it asks the same questions again, loses ownership, or repeats work, the model is not the problem. Your memory design still depends too much on the provider.
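The pause-and-swap test can be scripted. In the sketch below the two "providers" are stand-in stub functions, not real model calls, and the three workflow steps are invented. The point is only the shape of the check: the second agent receives nothing but the serialized record.

```python
import copy
import json

STEPS = ["check_order", "confirm_payment", "draft_reply"]  # hypothetical workflow


def run_next_step(record: dict, provider: str) -> dict:
    """Stub agent: performs the next unfinished step using only the record."""
    record = copy.deepcopy(record)
    done = {entry["step"] for entry in record["log"]}
    remaining = [step for step in STEPS if step not in done]
    if remaining:
        record["log"].append({"step": remaining[0], "by": provider})
    # The task is done once the last remaining step has been handled.
    record["status"] = "done" if len(remaining) <= 1 else "open"
    return record


record = {"task_id": "task_1001", "status": "open", "log": []}

# The first provider handles one step, then the workflow is paused.
record = run_next_step(record, "provider_a")

# Only the serialized record crosses the swap: no prompt, no chat history.
handoff = json.loads(json.dumps(record))

while handoff["status"] != "done":
    handoff = run_next_step(handoff, "provider_b")

steps_run = [entry["step"] for entry in handoff["log"]]
print(steps_run)  # prints: ['check_order', 'confirm_payment', 'draft_reply']
```

With real models in place of the stub, the same assertions apply: every step runs exactly once, and the log shows which provider did what.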

Example: a support team

A support inbox shows why vendor-neutral agent memory matters. A customer writes in and asks for a refund because an order arrived late and one item was missing.

The first agent reads the email and saves facts to a shared record instead of stuffing them into a prompt and hoping the next model remembers. It stores the order number, customer ID, refund reason, missing item, and stated deadline. Its read of the customer's tone goes into a separate assumption field, because tone is an inference rather than a confirmed fact. It also opens a task with a status such as "awaiting order check."

The second agent does a different job. It checks the order system, confirms payment, looks up shipment scans, and notes whether the package arrived after the promised date. If the warehouse already marked one item as short shipped, that fact goes into the same record.

Now the memory layer holds customer-reported facts, verified facts from internal systems, the current task status and owner, and a short decision trail.

A third agent drafts the reply. It does not need the full email thread or the private logic of the other agents. It reads the shared facts, sees that the order was late by three days, sees that one item never shipped, and writes a clear refund message.

Then the team changes the reply model. Maybe the old model costs too much, or the team wants a different tone. Nothing breaks. The new model reads the same task record, the same verified facts, and the same policy notes. It can draft the next reply without asking the first two agents to repeat their work.

The support manager still sees one clean case file. They do not chase context across tools or models. They can open the task and check who touched it, what the system verified, what refund amount the team approved, and whether the customer already got a response.

That is the point of an agent memory layer. Models can change. The case record stays put, and the team keeps moving.

Mistakes that break portability

Portability usually fails for boring reasons. Teams switch providers and then discover that the old agent memory lived inside prompts, chat logs, or a provider-specific tool. The model changed, but the memory did not come with it.

The first mistake is saving everything inside transcripts. A long conversation can contain facts, decisions, open tasks, and wrong guesses all mixed together. That works until you move to another provider and discover the new system cannot reliably recover what matters. If you want vendor-neutral agent memory, pull durable information out of the transcript and store it in plain structures your team controls.

Another common problem is letting each provider keep its own memory format. One system stores notes as tool calls, another stores them as hidden summaries, and a third keeps them in custom metadata. Now every switch needs a translator, and translators always lose something. A shared memory layer should look the same whether your federated agents use Claude, GPT, or an open model.

Teams also break portability when they mix model guesses with verified facts. An agent might infer that a customer already approved a change, or guess the cause of an outage from partial logs. If that guess lands in the same table as confirmed facts, the next agent treats it as truth. Keep a hard line between "the model thinks" and "we know."

Task state causes a quieter failure. Many teams overwrite it instead of versioning changes. That feels simple, but it erases who changed the task, when they changed it, and what the previous state was. When two agents disagree, you need a history, not just the latest value.

A few fields prevent a lot of pain: source, owner, timestamp, verification status, and version. Without them, memory turns into rumor.
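A minimal sketch of versioned updates, assuming each record carries a version counter and an append-only history list. The `update_field` helper and its field names are illustrative. The version check is a common optimistic-locking pattern, swapped in here as one way to catch two agents writing over each other.

```python
from datetime import datetime, timezone


def update_field(record: dict, field: str, value, writer: str,
                 expected_version: int, now: datetime) -> None:
    """Apply one change if the writer saw the latest version; log old and new."""
    if record["version"] != expected_version:
        raise RuntimeError(
            f"stale write by {writer}: expected v{expected_version}, "
            f"record is at v{record['version']}"
        )
    record["version"] += 1
    record["history"].append({
        "version": record["version"],
        "field": field,
        "old": record["state"].get(field),
        "new": value,
        "writer": writer,
        "at": now.isoformat(),
    })
    record["state"][field] = value


record = {"version": 1, "state": {"status": "open"}, "history": []}
now = datetime(2026, 4, 10, 9, 0, tzinfo=timezone.utc)

# Both agents read the record at version 1, then try to write.
update_field(record, "status", "waiting", "billing_agent", 1, now)
try:
    update_field(record, "status", "done", "reply_agent", 1, now)
    conflict = False
except RuntimeError:
    conflict = True

print(record["state"]["status"], record["version"], conflict)  # prints: waiting 2 True
```

The second write is refused rather than silently replacing the first, and the history entry still shows what the status was before the change.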

A support team can survive one provider switch with messy data. By the second switch, work starts to vanish, repeat, or contradict itself. At that point, portability stops being an architecture idea and becomes an operations problem.

Checks before you change providers

A provider switch should feel boring. If one model leaves and another takes over, the work should continue with the same facts, the same task status, and a clear record of who changed what. If that breaks, your memory layer still depends on the model more than you think.

Test this on a real task, not a clean demo. First, ask the new model to complete work using only the stored facts and current task state. If it still needs hidden prompt glue or a custom rewrite, your format is tied to the old provider.

Next, replay one unfinished task from the saved record alone. Another model should see the open steps, understand the latest status, and choose the same next action without guessing.

Then interrupt one agent in the middle of an update. The shared record should remain intact, with either the old state or the new state, not a half-written mix of both.
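If the shared record lives in a SQL database, a transaction gives you this all-or-nothing behavior. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table layout, the `hand_off` helper, and the simulated interruption are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (task_id TEXT PRIMARY KEY, status TEXT, owner TEXT)"
)
conn.execute("INSERT INTO tasks VALUES ('task_1001', 'open', 'triage_agent')")
conn.commit()


def hand_off(conn, task_id, status, owner, fail_midway=False):
    """Update status and owner together: both change or neither does."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "UPDATE tasks SET status = ? WHERE task_id = ?", (status, task_id)
        )
        if fail_midway:
            raise RuntimeError("agent interrupted mid-update")
        conn.execute(
            "UPDATE tasks SET owner = ? WHERE task_id = ?", (owner, task_id)
        )


def read_task(conn, task_id):
    return conn.execute(
        "SELECT status, owner FROM tasks WHERE task_id = ?", (task_id,)
    ).fetchone()


# Interrupted update: the record keeps its old state.
try:
    hand_off(conn, "task_1001", "waiting", "billing_agent", fail_midway=True)
except RuntimeError:
    pass
after_failure = read_task(conn, "task_1001")

# Completed update: the record moves to the new state.
hand_off(conn, "task_1001", "waiting", "billing_agent")
after_success = read_task(conn, "task_1001")

print(after_failure, after_success)
```

After the interrupted call the row still reads open and triage_agent, never a mix of a new status with the old owner.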

It also helps to put one wrong fact into the system on purpose, such as a customer's plan or a project deadline. Someone on the team should be able to correct it quickly without editing prompts or touching model settings.

Finally, check whether every update has a writer, a timestamp, and a short reason. When memory changes, your team should know which agent or person made the change.

A support workflow makes weak spots obvious. One agent marks a ticket urgent, another drafts a reply, and a third assigns the next action. After a provider change, the new model should still understand why the ticket is urgent, what reply is pending, and who changed the record last.

If it cannot do that, you do not have vendor-neutral agent memory. You have a transcript that only works with one model's habits.

Teams that handle this well keep facts, task state, and edit history in plain structures any model can read. It sounds simple, but it is the difference between a clean switch and a week of confusion. When the stored record can stand on its own, you can change providers for cost, speed, or quality without losing your place.

Next steps for your team

Pick one workflow that wastes time every week and start there. Good first candidates are support triage, sales follow-up, or bug handoff from chat to your tracker. If a process already causes missed context, repeated questions, or manual copy and paste, it is a good place to test vendor-neutral agent memory.

Keep the first schema small and boring. Most teams only need a few fields at the start: customer facts, current task status, owner, last decision, and a short summary of what happened. If you try to store everything, the memory layer turns into a junk drawer and people stop trusting it.

A simple rollout is enough:

  • Choose one workflow and one team
  • Define the minimum fields the agent must read and write
  • Store memory outside the model in a database your team already understands
  • Run the same workflow with two model providers before launch and log failures in plain language

That provider swap matters more than many teams expect. If one model handles the workflow well but the second loses task state, writes facts in a different shape, or breaks your prompts, you are still tied to a vendor. A portable AI architecture should survive a swap with only small prompt and routing changes.

It also helps to get a second pair of eyes on storage, rollout order, and failure cases. Small design mistakes can create weeks of cleanup later. Teams often mix permanent facts with temporary task state, then watch old session data leak into new work. Another common problem is ownership: nobody knows who cleans bad memory entries or updates the schema when the workflow changes.

If you want outside help, Oleg Sotnikov at oleg.is works as a fractional CTO and startup advisor for startups and smaller businesses. He helps teams design practical AI-first development workflows and shared memory setups that can survive real provider changes, not just a demo.

Frequently Asked Questions

What does vendor-neutral agent memory mean?

It means your agent keeps facts and work state in your own storage instead of leaving them inside one model's prompt history. When you change providers, the new model reads the same record and keeps going.

Why do agents forget after a model switch?

Agents only look consistent because your app keeps feeding them history, summaries, and tool output. Change the model, and it may read that context differently, miss older details, or expect another tool format.

What should we store outside the model?

Store durable facts and active work. That usually includes customer details, preferences, order or account status, recent tool results, the current owner, the next action, and any due date.

How do we stop agents from mixing facts with guesses?

Keep facts and assumptions apart from the start. Save confirmed details with a source and timestamp, and put guesses in a separate field until a person or trusted system confirms them.

Do we need the full chat history in working memory?

No. Save the full transcript for audits, but let agents read a short summary and the last few messages first. That keeps memory smaller and cuts noise.

Should we save tool results too?

Yes, if the result affects the next step. When an agent checks inventory, validates a payment, or updates a ticket, save that result next to the task so the next model does not repeat the same work.

Why use a shared memory API instead of direct database access?

A thin API keeps every model on the same schema and write rules. It also gives you one place to normalize messy model output before it reaches your records.

What is the smallest schema we can start with?

Start with a few fields: task ID, status, owner, timestamps, confirmed facts, recent tool results, a short summary, and the next action. Keep it plain so your team can query and update it without extra glue.

How do we test portability before changing providers?

Run one real workflow through two providers and pause it halfway. If the second model can read the record, pick the same next action, and finish without asking the same questions again, your design holds up.

Which workflow should we start with first?

Support triage, sales follow-up, and bug handoff make good first tests because they break fast when memory gets messy. Pick the workflow that already causes repeat questions, missed details, or manual copy and paste.