Conversation Memory
Understand how conversation history is managed and included in the context window for both main and delegated agents
Overview
Conversation memory determines how much of the conversation history is included in the context window when your Agent processes a new message. The Inkeep Agent Framework automatically manages conversation history to balance context retention with token efficiency, with specialized handling for delegated agents and tool results.
What's Included in Memory
The conversation history includes:
- Chat messages: User messages and agent responses
- Tool results: Results from tool executions, providing context about what actions were performed
- Agent communications: Messages exchanged between agents during transfers and delegations
Default Limits
By default, the system includes conversation history using these limits:
- 50 messages: Up to the 50 most recent messages from the conversation
- 8,000 tokens: Maximum of 8,000 tokens from previous conversation messages
Both limits are defaults; the token limit can be adjusted via the AGENTS_CONVERSATION_HISTORY_MAX_OUTPUT_TOKENS_DEFAULT environment variable if needed.
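As a minimal sketch of how such a limit might be resolved (assuming a Node.js runtime; the constant and function names here are illustrative, not part of the framework's API):

```typescript
// Illustrative only: resolve the conversation-history token limit,
// preferring the environment override when it parses as a positive integer.
const DEFAULT_HISTORY_TOKEN_LIMIT = 8_000; // framework default

function resolveHistoryTokenLimit(): number {
  const raw = process.env.AGENTS_CONVERSATION_HISTORY_MAX_OUTPUT_TOKENS_DEFAULT;
  const parsed = raw ? Number.parseInt(raw, 10) : NaN;
  return Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_HISTORY_TOKEN_LIMIT;
}
```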
How It Works
1. Message Retrieval: The system retrieves up to the 50 most recent messages from the conversation history.
2. Delegation Filtering: Messages are filtered based on delegation context; delegated agents see their own tool results plus top-level conversation context.
3. Token Calculation: The remaining messages are processed, calculating a token count for each message.
4. Exclusion: If the total token count exceeds the token limit (8,000 by default), older messages are excluded from the context window, as sketched below.
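A rough sketch of steps 3 and 4, under the assumption of a simple per-message token counter; Message, countTokens, and trimHistory are illustrative names, not the framework's actual API:

```typescript
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Illustrative only: keep the most recent messages whose combined token
// count stays within the limit, dropping the oldest ones first.
function trimHistory(
  messages: Message[],
  maxMessages: number,
  maxTokens: number,
  countTokens: (text: string) => number,
): Message[] {
  const recent = messages.slice(-maxMessages); // e.g. the 50 most recent
  const kept: Message[] = [];
  let total = 0;
  // Walk backwards from the newest message so the oldest are excluded first.
  for (let i = recent.length - 1; i >= 0; i--) {
    const cost = countTokens(recent[i].content);
    if (total + cost > maxTokens) break;
    total += cost;
    kept.unshift(recent[i]);
  }
  return kept;
}
```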
Memory for Delegated Agents
When agents delegate tasks to other agents, memory is intelligently filtered:
Main Agents
- See complete conversation history including all tool results
- Maintain full context of delegated actions and their results
Delegated Agents
- See conversation history filtered to their delegation scope
- Receive tool results from:
- Their own tool executions
- Top-level (non-delegated) tool executions
- Cannot see tool results from unrelated delegations
This ensures delegated agents have sufficient context while preventing memory pollution from unrelated parallel delegations.
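A simplified sketch of this visibility rule, assuming each tool-result message carries an optional delegation identifier (the field and function names are illustrative, not the framework's API):

```typescript
// Illustrative only: a delegated agent keeps tool results from its own
// delegation and from the top level, and drops results from sibling delegations.
interface ToolResultMessage {
  type: "tool-result";
  delegationId?: string; // undefined means a top-level (non-delegated) execution
  content: string;
}

function isVisibleToDelegate(
  message: ToolResultMessage,
  ownDelegationId: string,
): boolean {
  return (
    message.delegationId === undefined || // top-level context
    message.delegationId === ownDelegationId // this agent's own results
  );
}
```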
Tool Results in Memory
Tool execution results are automatically included in conversation history, helping agents:
- Understand what actions have already been performed
- Avoid duplicate tool calls
- Build on previous results when transferring between agents
Tool results include both the input parameters and the output results.
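The framework's exact serialization isn't reproduced here; purely as an illustration, a stored tool result might carry a shape like the following (all field names are assumptions):

```typescript
// Hypothetical shape of a tool result as it might appear in history;
// the actual framework format may differ.
interface StoredToolResult {
  toolName: string;
  input: Record<string, unknown>; // parameters the tool was called with
  output: unknown;                // what the tool returned
  timestamp: string;              // ISO-8601
}

const example: StoredToolResult = {
  toolName: "searchDocs",
  input: { query: "refund policy" },
  output: { matches: 3, topResult: "docs/refunds.md" },
  timestamp: "2025-01-01T12:00:00Z",
};
```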
Conversation Compacting System
For very long conversations that exceed model context limits, the framework includes an intelligent compacting system that automatically manages memory by condensing older messages while preserving essential context.
How Compacting Works
The compacting system activates automatically when conversations approach token limits:
1. Context Monitoring: The system continuously monitors conversation size against the model's limits.
2. Automatic Triggering: Compacting triggers at 50% of the context window for conversation-level compacting, or at model-aware thresholds (~75-91%, depending on model size) for sub-agent generation; see the sketch after this list.
3. Tool Result Archiving: Large tool results are stored as artifacts and replaced with summary references.
4. AI Summarization: Older parts of the conversation are summarized by AI while key context is preserved.
5. Fallback Protection: If AI summarization fails, the system falls back to simple message truncation.
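A hedged sketch of the triggering checks; the ratios come from the table below, and the function names are illustrative rather than the framework's API:

```typescript
// Illustrative only: decide whether to compact, given current token usage
// and the model's context window size.
const CONVERSATION_THRESHOLD = 0.5; // 50% of the context window

function shouldCompactConversation(
  usedTokens: number,
  contextWindow: number,
): boolean {
  return usedTokens >= contextWindow * CONVERSATION_THRESHOLD;
}

// Sub-agent generation uses a model-aware threshold (~75-91%); here we
// assume it is passed in, since the exact derivation rule isn't documented.
function shouldCompactGeneration(
  usedTokens: number,
  contextWindow: number,
  generationThreshold: number, // e.g. 0.83 for a 400K-token model
): boolean {
  return usedTokens >= contextWindow * generationThreshold;
}
```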
Model-Specific Behavior
Different models have different context windows, and compacting adapts accordingly:
| Model | Context Window | Conversation Threshold | Sub-Agent Generation Threshold |
|---|---|---|---|
| GPT-5.1 | 400K tokens | 200K (50%) | ~332K (83%) |
| Claude-4.5 Sonnet | 200K tokens | 100K (50%) | ~166K (83%) |
| Gemini 3 Flash | 1M tokens | 500K (50%) | ~910K (91%) |
Compacting Types
Conversation-Level Compacting
- Trigger: When the conversation reaches 50% of the model's context window
- Action: Compacts entire conversation history into summary + artifacts
- Use Case: Long conversations with extensive history
Sub-Agent Generation Compacting
- Trigger: During sub-agent execution when tool results exceed model-aware limits (75-91% depending on model size)
- Action: Compacts generated tool results while preserving original context (see the archiving sketch below)
- Use Case: Sub-agents performing many tool operations during generation
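One way to picture the tool-result archiving step (an illustration under assumed names, not the framework's implementation): an oversized result is swapped for a short reference to a stored artifact.

```typescript
// Illustrative only: replace an oversized tool result with a compact
// reference to an archived artifact.
interface ArtifactRef {
  type: "artifact-ref";
  artifactId: string;
  summary: string; // short summary of the archived content
}

function archiveIfLarge(
  result: string,
  maxChars: number,
  store: (content: string) => string, // persists content, returns an artifact id
  summarize: (content: string) => string,
): string | ArtifactRef {
  if (result.length <= maxChars) return result; // small results pass through
  return {
    type: "artifact-ref",
    artifactId: store(result),
    summary: summarize(result),
  };
}
```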
Compacting happens automatically and transparently. Your agents will continue to work normally even with compacted conversations, as the system preserves all essential context and provides artifact references for detailed information.