Agent Memory, Forgetting, and Cost Control in Production AI
Agentic systems should not treat memory as unlimited shared context. Production reliability depends on deliberate forgetting, scoped recall, and economic controls.
Memory Is Not a Bigger Library
Enterprise agent architecture discussions often frame memory as shared context, institutional recall, or a way to preserve useful experience across workflows. That is partly right. Agents do need access to durable knowledge: policies, examples, approved procedures, customer context, workflow history, and verified outcomes. But production systems fail when memory is treated as an ever-growing library that the agent can browse freely.
The better question is not only "what should the agent remember?" but also "what must the agent forget to act correctly?" Old context can be as dangerous as missing context. A previous exception can become a false precedent. A draft policy can be retrieved instead of the approved one. A customer-specific workaround can leak into a general recommendation. A long conversation can bias the agent toward a plan that no longer matches the current state.
Shared Context Creates Shared Failure
In multi-agent systems, shared memory sounds efficient. One agent learns something, and other agents can use it. In practice, shared context can spread errors across the mesh. If a planning agent stores a flawed decomposition, a tool agent may execute against it, a validation agent may judge the wrong artifact, and a reporting agent may produce a confident explanation of a faulty workflow. The error is no longer local.
Memory should therefore be scoped. Session memory helps maintain a single interaction. Workflow memory records the state of a specific process. Domain memory stores approved knowledge for a bounded business area. Audit memory preserves what happened for reconstruction. Training memory contains human-verified examples. These categories should not be mixed casually. An agent should not use audit logs as instructions, or failed attempts as reusable best practice, unless the system explicitly marks them that way.
Good memory design uses provenance, expiry, access control, and confidence. Every retrieved memory should answer: where did this come from, who approved it, when does it expire, which workflow may use it, and what should happen if it conflicts with a higher-authority source?
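One way to make those questions concrete is to attach the answers to every memory record and check them in plain code before anything reaches an agent. The sketch below is illustrative only: the `MemoryRecord` fields, the numeric `authority` ranking, and the `usable` and `resolve` helpers are assumptions made for this example, not part of any particular framework.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    # All field names are hypothetical; they mirror the questions in the text.
    content: str
    source: str                   # where did this come from?
    approved_by: Optional[str]    # who approved it? None means unapproved
    expires_at: datetime          # when does it expire?
    allowed_workflows: frozenset  # which workflows may use it?
    authority: int                # higher value wins when records conflict
    confidence: float


def usable(record, workflow, now):
    """A record is usable only if it is approved, unexpired, and in scope."""
    return (
        record.approved_by is not None
        and now < record.expires_at
        and workflow in record.allowed_workflows
    )


def resolve(records, workflow, now):
    """Return the usable record with the highest authority, or None.

    Confidence only breaks ties between records of equal authority.
    """
    candidates = [r for r in records if usable(r, workflow, now)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.authority, r.confidence))


# Example: an unapproved draft never wins, even with higher confidence.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
later = datetime(2026, 1, 1, tzinfo=timezone.utc)
approved = MemoryRecord("Refunds within 30 days", "policy-repo", "legal",
                        later, frozenset({"refunds"}), 10, 0.90)
draft = MemoryRecord("Refunds within 90 days", "shared-drive", None,
                     later, frozenset({"refunds"}), 20, 0.99)
winner = resolve([approved, draft], "refunds", now)
```

The design choice worth noting is that authority outranks confidence: a confident retrieval score should never let a draft override an approved source.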
Forgetting Is a Safety Mechanism
Forgetting is often treated as a limitation of AI systems. In production, deliberate forgetting is a safety mechanism. Agents should forget temporary reasoning after a task completes. They should forget sensitive details that are not needed for future work. They should forget failed intermediate plans unless those failures are stored as labeled negative examples. They should forget user preferences when those preferences conflict with policy or current facts.
This can be implemented through memory tiers. Short-term scratchpad state should disappear at the end of execution. Workflow state should persist only as long as the process is active. Approved knowledge should live in governed repositories with versioning and owners. Audit evidence should be retained according to regulatory and business requirements, but not made freely available as agent context. Human-verified golden records should be curated separately from raw interaction logs.
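The tiers above can be sketched as a small in-memory store in which forgetting is an explicit operation tied to lifecycle events. This is a minimal illustration, not a storage design: the `Tier` names, the `TieredMemory` class, and the decision to keep golden records out of agent context are all assumptions made for the example.

```python
from enum import Enum


class Tier(Enum):
    SCRATCHPAD = "scratchpad"  # cleared at the end of execution
    WORKFLOW = "workflow"      # cleared when the process completes
    APPROVED = "approved"      # governed, versioned knowledge
    AUDIT = "audit"            # retained, but never used as context
    GOLDEN = "golden"          # human-verified examples, curated separately


# Assumption for this sketch: only these tiers may be handed to an agent.
CONTEXT_TIERS = {Tier.SCRATCHPAD, Tier.WORKFLOW, Tier.APPROVED}


class TieredMemory:
    def __init__(self):
        self._items = {tier: [] for tier in Tier}

    def write(self, tier, item):
        self._items[tier].append(item)

    def end_execution(self):
        """Forget temporary reasoning once a task finishes."""
        self._items[Tier.SCRATCHPAD].clear()

    def end_workflow(self):
        """Forget scratchpad and workflow state when the process completes."""
        self.end_execution()
        self._items[Tier.WORKFLOW].clear()

    def agent_context(self):
        """Return only items from tiers that are allowed into agent context."""
        return [
            item
            for tier in Tier
            if tier in CONTEXT_TIERS
            for item in self._items[tier]
        ]
```

Audit entries survive both lifecycle calls in this sketch, yet `agent_context` never returns them, which mirrors the separation between retaining evidence and exposing it as context.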
In other words, memory architecture should look more like records management than chat history. The goal is not maximum recall. The goal is correct recall.
Memory Also Drives Cost
Memory is not only a reliability issue. It is also an economic one. Larger context windows, repeated retrieval, multi-agent handoffs, and validation calls all increase token consumption. A mesh architecture that passes broad context to every agent can become expensive quickly, especially when workflows retry or loop, because each retry typically resends the same context.
Production cost control requires context budgeting. Each workflow should define what context is required, what is optional, and what is forbidden. Retrieval should be filtered by metadata before semantic search. Summaries should be created with clear authority labels. Agents should receive the minimum context needed for their role, not the entire conversation or enterprise knowledge base.
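A rough sketch of that ordering follows: filter on metadata first, rank only what survives, then stop adding documents once the budget is spent. Everything here is an assumption for illustration, including the `Doc` fields, the whitespace-based `estimate_tokens` stand-in for a real tokenizer, and the caller-supplied `score` function standing in for semantic search.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    domain: str
    status: str  # e.g. "approved", "draft", "obsolete"


def estimate_tokens(text):
    # Crude stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def build_context(docs, domain, score, budget):
    """Select documents for one agent call within a token budget.

    1. Metadata filter: drop anything out of domain or not approved.
    2. Rank the remainder with the supplied relevance function.
    3. Add documents in rank order, skipping any that would exceed the budget.
    """
    eligible = [d for d in docs if d.domain == domain and d.status == "approved"]
    ranked = sorted(eligible, key=score, reverse=True)
    selected, used = [], 0
    for doc in ranked:
        cost = estimate_tokens(doc.text)
        if used + cost > budget:
            continue
        selected.append(doc)
        used += cost
    return selected, used
```

Filtering before ranking means a draft or out-of-domain document can never win on similarity alone, and the ranking step has fewer candidates to score.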
Measure cost per successful outcome, not only cost per model call. Include failed runs, retries, human review, logging, evaluation, and infrastructure overhead. If a memory strategy reduces model hallucination but doubles review time and triples token spend, the architecture may still be economically weak.
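The metric can be sketched in a few lines. The `Run` fields and the single `overhead` figure are simplifying assumptions for this example; a real accounting would break out logging, evaluation, and infrastructure separately.

```python
from dataclasses import dataclass


@dataclass
class Run:
    model_cost: float   # token spend for this run, including retries
    review_cost: float  # human review time converted to money
    succeeded: bool


def cost_per_successful_outcome(runs, overhead=0.0):
    """Total spend across all runs, failed ones included, per success."""
    total = overhead + sum(r.model_cost + r.review_cost for r in runs)
    successes = sum(1 for r in runs if r.succeeded)
    if successes == 0:
        return float("inf")  # money spent, nothing delivered
    return total / successes


runs = [
    Run(model_cost=0.5, review_cost=2.0, succeeded=True),
    Run(model_cost=0.5, review_cost=0.0, succeeded=False),
    Run(model_cost=1.0, review_cost=2.0, succeeded=True),
]
result = cost_per_successful_outcome(runs, overhead=1.0)  # 7.0 / 2 = 3.5
```

In this made-up example the average model spend per call is about 0.67, while the cost per successful outcome is 3.5, which is the gap the paragraph above warns about.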
Use Smaller Models and Procedural Checks
Many memory and routing tasks do not require frontier LLMs. A small language model can classify a ticket domain. A deterministic rule can choose the correct policy hierarchy. A metadata filter can exclude expired documents. A schema validator can detect malformed tool output. A procedural service can compute whether a workflow is within approval limits.
This matters because agent systems often become fragile when every decision is sent to a large model. The result is higher cost, slower response, more variance, and harder debugging. Use the expensive model where open-ended language reasoning is genuinely needed: interpreting ambiguous requests, drafting explanations, comparing trade-offs, or summarizing complex evidence. Use smaller models and deterministic components for the operational plumbing.
The practical architecture is hybrid. Probabilistic components help with language and ambiguity. Deterministic components enforce memory scope, tool rights, policy, and cost limits.
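As a minimal sketch of that split, the example below keeps routing, approval limits, and tool-output validation in ordinary code and reserves the large model for open-ended work. The task kinds, the `ROUTES` table, and the fallback to the large model for unknown kinds are assumptions chosen for illustration.

```python
# Hypothetical routing table: which component handles which kind of task.
ROUTES = {
    "expiry_filter": "rule",
    "approval_limit": "rule",
    "schema_check": "rule",
    "ticket_classification": "small_model",
    "ambiguous_request": "large_model",
    "tradeoff_comparison": "large_model",
}


def route(kind):
    """Pick a handler; unknown kinds fall back to the large model (assumption)."""
    return ROUTES.get(kind, "large_model")


def within_approval_limit(amount, limit):
    """Procedural check: no model call needed to compare two numbers."""
    return amount <= limit


def validate_tool_output(output, required_fields):
    """Deterministic schema check. Returns a list of problems, empty if valid."""
    problems = []
    for name, expected_type in required_fields.items():
        if name not in output:
            problems.append(f"missing field: {name}")
        elif not isinstance(output[name], expected_type):
            problems.append(f"wrong type for {name}")
    return problems


problems = validate_tool_output(
    {"ticket_id": "T-1", "amount": "12"},
    {"ticket_id": str, "amount": float, "currency": str},
)
```

Checks like these are cheap, repeatable, and easy to debug, which is why they suit the operational plumbing better than a model call.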
A Memory Governance Checklist
Before deploying an enterprise agent mesh, define a memory governance checklist. Which memory stores exist? Who owns each store? Which agents can read or write? What is the retention period? How are memories labeled as approved, draft, obsolete, private, failed, or audit-only? How are conflicts resolved? Which memories are never sent to an LLM? Which memories can be used for future training or prompt improvement?
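Several of these checklist answers can be written down as data and enforced in code rather than left in a document. The sketch below is one possible shape, with hypothetical store names, agent names, and retention periods; the deny-by-default behavior for unknown stores is also an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorePolicy:
    owner: str
    readers: frozenset
    writers: frozenset
    retention_days: int
    send_to_llm: bool          # may contents ever appear in a prompt?
    usable_for_training: bool  # may contents feed training or prompt tuning?


# Hypothetical stores and agents, for illustration only.
POLICIES = {
    "workflow_state": StorePolicy(
        owner="ops-team",
        readers=frozenset({"planner", "tool_agent"}),
        writers=frozenset({"planner"}),
        retention_days=30,
        send_to_llm=True,
        usable_for_training=False,
    ),
    "audit_log": StorePolicy(
        owner="compliance",
        readers=frozenset({"auditor"}),
        writers=frozenset({"planner", "tool_agent"}),
        retention_days=2555,
        send_to_llm=False,
        usable_for_training=False,
    ),
}


def can_read(agent, store):
    policy = POLICIES.get(store)
    return policy is not None and agent in policy.readers


def can_write(agent, store):
    policy = POLICIES.get(store)
    return policy is not None and agent in policy.writers


def may_enter_prompt(agent, store):
    """Contents reach an LLM only if the store allows it and the agent can read it."""
    policy = POLICIES.get(store)
    return policy is not None and policy.send_to_llm and agent in policy.readers
```

Note that in this sketch the tool agent can write to the audit log without being able to read it, and no agent can place audit contents in a prompt.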
These questions may seem operational, but they determine whether the system can scale safely. The failure mode of agent memory is not simply forgetting something useful. The more dangerous failure is remembering the wrong thing with confidence.
Agentic AI will not become production-grade because agents remember more. It will become production-grade when systems remember selectively, forget deliberately, and spend reasoning budget only where it improves the outcome.