Memory Systems for Long-Running Agents
An agent that forgets everything between turns is just an expensive API wrapper. Real agent systems need memory — but not all memory is created equal.
The Three Memory Types
Drawing from cognitive science, we can model agent memory as three distinct systems:
Working Memory is the agent’s current context window. It holds the active task, recent messages, and immediate tool results. This is what most frameworks implement by default — it’s just the prompt.
Episodic Memory stores past interactions as retrievable experiences. When the agent encounters a similar problem, it can recall how it solved it before. Think of it as a searchable log of past sessions.
Semantic Memory captures distilled knowledge — facts, preferences, and learned patterns extracted from episodic memories. This is the agent’s long-term understanding of the world.
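To make the relationship concrete before diving into implementations, here is a rough sketch of the three tiers as a single container the agent consults each turn. The class and field names are illustrative, not from any particular framework:

    from dataclasses import dataclass, field

    # Illustrative sketch of the three tiers; names are hypothetical.
    @dataclass
    class AgentMemory:
        working: list[str] = field(default_factory=list)    # live prompt segments
        episodic: list[str] = field(default_factory=list)   # summaries of past sessions
        semantic: list[str] = field(default_factory=list)   # distilled facts and preferences

        def build_prompt(self) -> str:
            # Each turn, distilled knowledge and relevant past episodes are
            # folded back into the live context.
            return "\n".join(self.semantic + self.episodic + self.working)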
Implementing Episodic Memory
The simplest useful episodic memory is a vector store of past interaction summaries:
    class EpisodicMemory:
        def __init__(self, embedding_model, vector_store):
            self.embedder = embedding_model
            self.store = vector_store

        async def remember(self, interaction: Interaction):
            # Summarize before storing; see the summarization sketch below.
            summary = await self.summarize(interaction)
            embedding = await self.embedder.embed(summary)
            await self.store.upsert(
                id=interaction.id,
                embedding=embedding,
                metadata={
                    "summary": summary,
                    "timestamp": interaction.timestamp,
                    "outcome": interaction.outcome,
                    "task_type": interaction.task_type,
                },
            )

        async def recall(self, query: str, k: int = 5) -> list[Memory]:
            embedding = await self.embedder.embed(query)
            return await self.store.query(embedding, top_k=k)
The critical design decision is what to store. Raw transcripts are too noisy. You need a summarization step that extracts the decision-making process, not just the outcome.
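A minimal version of that summarization step might prompt a model to capture the approach and the key decisions rather than the raw transcript. The prompt wording and the llm.complete call below are placeholders for whatever client you actually use:

    SUMMARY_PROMPT = """Summarize this agent session in 3-5 sentences.
    Focus on the task, the approach taken, the key decisions and why they
    were made, and whether the outcome succeeded. Omit raw tool output.

    Transcript:
    {transcript}"""

    async def summarize_interaction(llm, interaction) -> str:
        # llm is a hypothetical async completion client; swap in your own.
        return await llm.complete(
            SUMMARY_PROMPT.format(transcript=interaction.transcript)
        )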
Working Memory Management
The context window is finite. Managing it well is the difference between an agent that works and one that loses track mid-task:
    class WorkingMemory:
        def __init__(self, max_tokens: int = 8000):
            self.max_tokens = max_tokens
            self.segments = []  # (priority, content, token_count)

        def add(self, content: str, priority: int = 1):
            tokens = count_tokens(content)  # tokenizer-specific helper
            self.segments.append((priority, content, tokens))
            self._compact()

        def _compact(self):
            total = sum(s[2] for s in self.segments)
            # Sort ascending by priority; ties keep insertion order, so the
            # oldest low-priority segment is removed first.
            self.segments.sort(key=lambda s: s[0])
            while total > self.max_tokens and self.segments:
                removed = self.segments.pop(0)
                total -= removed[2]
Priority-based eviction means that the segments you mark as high priority, such as system instructions and recent tool results, survive compaction, while older, low-priority conversation history gets dropped first.
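In practice that looks something like this; the priority values are arbitrary, with higher meaning "evict later":

    memory = WorkingMemory(max_tokens=8000)
    memory.add("You are a coding assistant...", priority=10)            # system instructions
    memory.add("Tool result: 3 tests failing in auth.py", priority=5)   # recent tool output
    memory.add("Earlier discussion about logging format", priority=1)   # first to go

    # Whatever survives compaction becomes the prompt for the next turn.
    prompt = "\n".join(content for _, content, _ in memory.segments)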
Semantic Memory as Learned Preferences
After enough episodic memories accumulate, patterns emerge. Semantic memory extracts these:
- “User prefers TypeScript over JavaScript”
- “Database queries should use parameterized statements”
- “Always check for null before accessing nested properties”
These become persistent instructions that shape future behavior — essentially, the agent learns from experience.
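One way to sketch that distillation is a periodic LLM pass over recent episodic summaries. As before, the prompt and the llm.complete call are assumptions, not a specific library:

    DISTILL_PROMPT = """Here are summaries of recent agent sessions:

    {summaries}

    Extract up to 5 durable facts or preferences that should guide future
    sessions. State each as a single imperative sentence. Skip anything
    that only applied to one task."""

    async def distill_semantic_memories(llm, episodic_summaries: list[str]) -> list[str]:
        # Hypothetical async LLM client; returns one learned rule per line.
        response = await llm.complete(
            DISTILL_PROMPT.format(summaries="\n".join(episodic_summaries))
        )
        return [line.strip("- ").strip() for line in response.splitlines() if line.strip()]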
The Memory Pipeline
In practice, these three systems form a pipeline:
- Working memory handles the current turn
- After each session, episodic memory stores a summary
- Periodically, a background job distills episodic memories into semantic memory
- Semantic memories are injected into working memory as system context
The result is an agent that gets better over time — not just at individual tasks, but at understanding the user’s needs and preferences.
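Wired together, a single turn through the pipeline might look roughly like this, reusing the sketches above. The run_agent call and the shape of the recalled Memory objects are assumptions:

    async def handle_turn(user_message: str,
                          working: WorkingMemory,
                          episodic: EpisodicMemory,
                          semantic_rules: list[str]) -> str:
        # 1. Inject distilled preferences as high-priority system context.
        working.add("\n".join(semantic_rules), priority=10)

        # 2. Recall similar past sessions and add their summaries at medium priority.
        for memory in await episodic.recall(user_message, k=3):
            working.add(memory.metadata["summary"], priority=3)

        # 3. The current message matters most for this turn.
        working.add(user_message, priority=8)

        # 4. Run the agent on whatever survived compaction (run_agent is a placeholder).
        prompt = "\n".join(content for _, content, _ in working.segments)
        answer = await run_agent(prompt)

        # 5. Afterwards, the session itself gets stored for future recall:
        #    await episodic.remember(interaction)
        return answer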
Practical Considerations
- Storage costs scale with interaction volume — set retention policies
- Retrieval latency matters — keep vector indices warm
- Privacy requires careful handling — users should be able to delete memories
- Staleness is a real problem — semantic memories need expiration dates
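As a small example of the last two points, a retention pass over semantic memories might look like this, assuming each entry carries a created_at timestamp:

    from datetime import datetime, timedelta

    def prune_stale(semantic_memories: list[dict], max_age_days: int = 90) -> list[dict]:
        # Drop distilled facts older than the retention window; if they still
        # hold, they will be re-learned from fresh episodes.
        cutoff = datetime.now() - timedelta(days=max_age_days)
        return [m for m in semantic_memories if m["created_at"] >= cutoff]

Episodic memories deserve the same treatment, both to control storage costs and to honor deletion requests.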