Memory Systems for Long-Running Agents
An agent that forgets everything between turns is just an expensive API wrapper. Real agent systems need memory — but not all memory is created equal.
The Three Memory Types
Drawing from cognitive science, we can model agent memory as three distinct systems:
Working Memory is the agent’s current context window. It holds the active task, recent messages, and immediate tool results. This is what most frameworks implement by default — it’s just the prompt.
Episodic Memory stores past interactions as retrievable experiences. When the agent encounters a similar problem, it can recall how it solved it before. Think of it as a searchable log of past sessions.
Semantic Memory captures distilled knowledge — facts, preferences, and learned patterns extracted from episodic memories. This is the agent’s long-term understanding of the world.
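To make the relationship concrete before diving into implementations, here is a rough sketch of the three tiers as a single container the agent consults each turn. The class and field names are illustrative, not from any particular framework:

    from dataclasses import dataclass, field

    # Illustrative sketch of the three tiers; names are hypothetical.
    @dataclass
    class AgentMemory:
        working: list[str] = field(default_factory=list)    # live prompt segments
        episodic: list[str] = field(default_factory=list)   # summaries of past sessions
        semantic: list[str] = field(default_factory=list)   # distilled facts and preferences

        def build_prompt(self) -> str:
            # Each turn, distilled knowledge and relevant past episodes are
            # folded back into the live context.
            return "\n".join(self.semantic + self.episodic + self.working)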
Implementing Episodic Memory
The simplest useful episodic memory is a vector store of past interaction summaries:
    class EpisodicMemory:
        def __init__(self, embedding_model, vector_store):
            self.embedder = embedding_model
            self.store = vector_store

        async def remember(self, interaction: Interaction):
            # Summarize before storing; see the summarization sketch below.
            summary = await self.summarize(interaction)
            embedding = await self.embedder.embed(summary)
            await self.store.upsert(
                id=interaction.id,
                embedding=embedding,
                metadata={
                    "summary": summary,
                    "timestamp": interaction.timestamp,
                    "outcome": interaction.outcome,
                    "task_type": interaction.task_type,
                },
            )

        async def recall(self, query: str, k: int = 5) -> list[Memory]:
            embedding = await self.embedder.embed(query)
            return await self.store.query(embedding, top_k=k)
The critical design decision is what to store. Raw transcripts are too noisy. You need a summarization step that extracts the decision-making process, not just the outcome.
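A minimal version of that summarization step might prompt a model to capture the approach and the key decisions rather than the raw transcript. The prompt wording and the llm.complete call below are placeholders for whatever client you actually use:

    SUMMARY_PROMPT = """Summarize this agent session in 3-5 sentences.
    Focus on the task, the approach taken, the key decisions and why they
    were made, and whether the outcome succeeded. Omit raw tool output.

    Transcript:
    {transcript}"""

    async def summarize_interaction(llm, interaction) -> str:
        # llm is a hypothetical async completion client; swap in your own.
        return await llm.complete(
            SUMMARY_PROMPT.format(transcript=interaction.transcript)
        )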
Working Memory Management
The context window is finite. Managing it well is the difference between an agent that works and one that loses track mid-task:
    class WorkingMemory:
        def __init__(self, max_tokens: int = 8000):
            self.max_tokens = max_tokens
            self.segments = []  # (priority, content, token_count)

        def add(self, content: str, priority: int = 1):
            tokens = count_tokens(content)  # tokenizer-specific helper
            self.segments.append((priority, content, tokens))
            self._compact()

        def _compact(self):
            total = sum(s[2] for s in self.segments)
            # Sort ascending by priority; ties keep insertion order, so the
            # oldest low-priority segment is removed first.
            self.segments.sort(key=lambda s: s[0])
            while total > self.max_tokens and self.segments:
                removed = self.segments.pop(0)
                total -= removed[2]
Priority-based eviction means that the segments you mark as high priority, such as system instructions and recent tool results, survive compaction, while older, low-priority conversation history gets dropped first.
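In practice that looks something like this; the priority values are arbitrary, with higher meaning "evict later":

    memory = WorkingMemory(max_tokens=8000)
    memory.add("You are a coding assistant...", priority=10)            # system instructions
    memory.add("Tool result: 3 tests failing in auth.py", priority=5)   # recent tool output
    memory.add("Earlier discussion about logging format", priority=1)   # first to go

    # Whatever survives compaction becomes the prompt for the next turn.
    prompt = "\n".join(content for _, content, _ in memory.segments)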
Semantic Memory as Learned Preferences
After enough episodic memories accumulate, patterns emerge. Semantic memory extracts these:
- “User prefers TypeScript over JavaScript”
- “Database queries should use parameterized statements”
- “Always check for null before accessing nested properties”
These become persistent instructions that shape future behavior — essentially, the agent learns from experience.
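One way to sketch that distillation is a periodic LLM pass over recent episodic summaries. As before, the prompt and the llm.complete call are assumptions, not a specific library:

    DISTILL_PROMPT = """Here are summaries of recent agent sessions:

    {summaries}

    Extract up to 5 durable facts or preferences that should guide future
    sessions. State each as a single imperative sentence. Skip anything
    that only applied to one task."""

    async def distill_semantic_memories(llm, episodic_summaries: list[str]) -> list[str]:
        # Hypothetical async LLM client; returns one learned rule per line.
        response = await llm.complete(
            DISTILL_PROMPT.format(summaries="\n".join(episodic_summaries))
        )
        return [line.strip("- ").strip() for line in response.splitlines() if line.strip()]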
The Memory Pipeline
In practice, these three systems form a pipeline:
- Working memory handles the current turn
- After each session, episodic memory stores a summary
- Periodically, a background job distills episodic memories into semantic memory
- Semantic memories are injected into working memory as system context
The result is an agent that gets better over time — not just at individual tasks, but at understanding the user’s needs and preferences.
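Wired together, a single turn through the pipeline might look roughly like this, reusing the sketches above. The run_agent call and the shape of the recalled Memory objects are assumptions:

    async def handle_turn(user_message: str,
                          working: WorkingMemory,
                          episodic: EpisodicMemory,
                          semantic_rules: list[str]) -> str:
        # 1. Inject distilled preferences as high-priority system context.
        working.add("\n".join(semantic_rules), priority=10)

        # 2. Recall similar past sessions and add their summaries at medium priority.
        for memory in await episodic.recall(user_message, k=3):
            working.add(memory.metadata["summary"], priority=3)

        # 3. The current message matters most for this turn.
        working.add(user_message, priority=8)

        # 4. Run the agent on whatever survived compaction (run_agent is a placeholder).
        prompt = "\n".join(content for _, content, _ in working.segments)
        answer = await run_agent(prompt)

        # 5. Afterwards, the session itself gets stored for future recall:
        #    await episodic.remember(interaction)
        return answer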
Practical Considerations
- Storage costs scale with interaction volume — set retention policies
- Retrieval latency matters — keep vector indices warm
- Privacy requires careful handling — users should be able to delete memories
- Staleness is a real problem — semantic memories need expiration dates
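As a small example of the last two points, a retention pass over semantic memories might look like this, assuming each entry carries a created_at timestamp:

    from datetime import datetime, timedelta

    def prune_stale(semantic_memories: list[dict], max_age_days: int = 90) -> list[dict]:
        # Drop distilled facts older than the retention window; if they still
        # hold, they will be re-learned from fresh episodes.
        cutoff = datetime.now() - timedelta(days=max_age_days)
        return [m for m in semantic_memories if m["created_at"] >= cutoff]

Episodic memories deserve the same treatment, both to control storage costs and to honor deletion requests.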