
Context Window Management for Multi-Turn Agents


An agent that can hold a 20-turn conversation but forgets what happened in turn 3 isn’t having a conversation — it’s having 17 separate interactions.

The Context Window Problem

Every model has a finite context window. As conversations grow, you face a choice: truncate history (lose information) or summarize it (lose nuance). Neither is ideal, but smart strategies make the trade-off manageable.

Sliding Window with Summaries

The most practical approach combines recent messages with compressed history:

class ContextManager:
    def __init__(self, max_tokens=6000, summary_threshold=4000):
        self.messages = []
        self.summary = ""
        self.max_tokens = max_tokens
        self.summary_threshold = summary_threshold

    def token_count(self):
        # Rough heuristic: ~4 characters per token. Swap in a real
        # tokenizer (e.g. tiktoken) if you need accurate counts.
        total_chars = len(self.summary) + sum(len(m["content"]) for m in self.messages)
        return total_chars // 4

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})
        if self.token_count() > self.summary_threshold:
            self._compress()

    def _compress(self):
        # Keep the last 4 messages verbatim; fold the rest into the
        # running summary. summarize_messages is an LLM call that merges
        # the existing summary with the messages being dropped.
        to_summarize = self.messages[:-4]
        self.summary = summarize_messages(self.summary, to_summarize)
        self.messages = self.messages[-4:]

    def build_prompt(self):
        parts = []
        if self.summary:
            parts.append(f"Previous conversation summary:\n{self.summary}")
        parts.extend(f"{m['role']}: {m['content']}" for m in self.messages)
        return "\n\n".join(parts)

The key insight: summarize early and often. Don’t wait until you hit the limit — by then you’ve already lost context in a way that’s hard to recover from.

Priority-Based Eviction

Not all context is equally important. A priority system ensures critical information survives compression:

  • Priority 1 (never evict): system instructions, tool schemas, active task description
  • Priority 2 (summarize): previous tool results, conversation history
  • Priority 3 (evict first): verbose tool outputs, error traces, exploration paths that led nowhere
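A minimal sketch of this scheme, assuming a character budget as a stand-in for tokens (the `ContextItem`/`PriorityContext` names and tier constants are illustrative, not from any particular framework; real priority-2 handling would call a summarizer rather than truncate):

```python
from dataclasses import dataclass, field

PIN, SUMMARIZE, EVICT = 1, 2, 3  # priority tiers from the list above

@dataclass
class ContextItem:
    content: str
    priority: int

@dataclass
class PriorityContext:
    items: list = field(default_factory=list)
    budget_chars: int = 8000  # stand-in for a real token budget

    def add(self, content, priority):
        self.items.append(ContextItem(content, priority))
        self._enforce_budget()

    def _size(self):
        return sum(len(i.content) for i in self.items)

    def _enforce_budget(self):
        # Drop priority-3 items first, then shrink priority-2 items to a
        # short stub; priority-1 items are never touched. Stop as soon as
        # the context fits the budget.
        for tier in (EVICT, SUMMARIZE):
            for item in list(self.items):
                if self._size() <= self.budget_chars:
                    return
                if item.priority == tier:
                    if tier == EVICT:
                        self.items.remove(item)
                    else:
                        item.content = item.content[:200] + " [compressed]"
```

The two-pass order matters: cheap, disposable content disappears entirely before anything valuable gets compressed, so a single oversized tool output never forces a lossy summarization of the conversation history.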

Chunked Tool Outputs

Large tool outputs are the most common context budget buster. Chunk them aggressively:

def smart_truncate(output, max_chars=2000):
    if len(output) <= max_chars:
        return output

    # Asymmetric split: half the budget for the beginning (usually the
    # most relevant part), a quarter for the end (conclusions), leaving
    # room for the truncation marker within max_chars.
    head = output[:max_chars // 2]
    tail = output[-(max_chars // 4):]

    return f"{head}\n\n[... {len(output) - len(head) - len(tail)} characters truncated ...]\n\n{tail}"

The Compression-Fidelity Trade-off

Every summarization step loses information. Track what you’re losing:

  • Entity preservation — are key names, numbers, and identifiers surviving compression?
  • Decision context — does the summary explain why decisions were made, not just what was decided?
  • Actionable details — can the agent act on the summary alone, or does it need the original?

The sweet spot is summaries that preserve decisions and their rationale while discarding the exploration that led to them.
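One lightweight way to monitor entity preservation is to diff the identifiers found in the original text against those in its summary. A sketch, assuming a crude regex over capitalized names and numbers (the `lost_entities` helper and its pattern are illustrative; production systems would use proper NER):

```python
import re

# Capitalized multi-letter names, or numbers (optionally with dots).
ID_PATTERN = re.compile(r"[A-Z][A-Za-z0-9_]+|\d[\d.]*")

def lost_entities(original, summary):
    """Return identifiers that appear in the original text
    but did not survive into the summary."""
    before = set(ID_PATTERN.findall(original))
    after = set(ID_PATTERN.findall(summary))
    return before - after
```

Running this after each compression step turns silent information loss into an observable metric: a nonempty result flags entities the summarizer dropped, which you can log or feed back into the summarization prompt.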
