Conversation Memory

Understand how conversation history is managed and included in the context window for both main and delegated agents

Overview

Conversation memory determines how much of the conversation history is included in the context window when your Agent processes a new message. The Inkeep Agent Framework automatically manages conversation history to balance context retention with token efficiency, with specialized handling for delegated agents and tool results.

What's Included in Memory

Conversation history includes:

  • Chat messages: User messages and agent responses
  • Tool results: Results from tool executions, providing context about what actions were performed
  • Agent communications: Messages exchanged between agents during transfers and delegations

Default Limits

By default, the system includes conversation history using these limits:

  • 50 messages: Up to the 50 most recent messages from the conversation
  • 8,000 tokens: Maximum of 8,000 tokens from previous conversation messages
Note

The 50-message and 8,000-token limits are the default values. The token limit can be adjusted via the AGENTS_CONVERSATION_HISTORY_MAX_OUTPUT_TOKENS_DEFAULT environment variable if needed.
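As a rough illustration of how such an override could be resolved, the sketch below reads the variable from an environment map and falls back to the 8,000-token default. Only the variable name and the default come from this page; the helper name `resolveHistoryTokenLimit` and its parsing and fallback rules are assumptions, not the framework's actual behavior.

```typescript
// Hypothetical helper, not part of the framework's API.
const DEFAULT_MAX_HISTORY_TOKENS = 8000;
const HISTORY_TOKENS_ENV_KEY =
  "AGENTS_CONVERSATION_HISTORY_MAX_OUTPUT_TOKENS_DEFAULT";

function resolveHistoryTokenLimit(
  env: Record<string, string | undefined>,
): number {
  const raw = env[HISTORY_TOKENS_ENV_KEY];
  if (raw === undefined) return DEFAULT_MAX_HISTORY_TOKENS;

  const parsed = parseInt(raw, 10);
  // Assumption: non-numeric or non-positive values fall back to the default.
  return !isNaN(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_HISTORY_TOKENS;
}
```

In a Node.js process, the map passed in would typically be `process.env`.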

How It Works

Message Retrieval: The system retrieves up to the 50 most recent messages from the conversation history

Delegation Filtering: Messages are filtered based on delegation context, so delegated agents see their own tool results plus top-level conversation context

Token Calculation: Remaining messages are processed, calculating token count for each message

Exclusion: If the total token count exceeds the token limit (8,000 tokens by default), older messages are excluded from the context window
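The retrieval, token calculation, and exclusion steps can be sketched roughly as follows, using the default limits of 50 messages and 8,000 tokens. The `HistoryMessage` type, the `selectHistory` function, and the 4-characters-per-token estimate are all assumptions made for illustration; the framework's real tokenizer and data model may differ, and delegation filtering is left out here.

```typescript
// Illustrative types and names; not the framework's actual API.
interface HistoryMessage {
  role: "user" | "agent" | "tool";
  content: string;
}

// Assumption: a crude 4-characters-per-token estimate replaces a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function selectHistory(
  messages: HistoryMessage[],
  maxMessages: number = 50,
  maxTokens: number = 8000,
): HistoryMessage[] {
  if (maxMessages <= 0) return [];

  // Message retrieval: keep only the most recent `maxMessages` messages.
  const recent = messages.slice(-maxMessages);

  // Token calculation and exclusion: walk from newest to oldest and stop
  // once the budget is spent, so older messages are the ones dropped.
  const kept: HistoryMessage[] = [];
  let usedTokens = 0;
  for (const message of recent.slice().reverse()) {
    const cost = estimateTokens(message.content);
    if (usedTokens + cost > maxTokens) break;
    usedTokens += cost;
    kept.unshift(message); // restore chronological order
  }
  return kept;
}
```

Walking from the newest message backwards guarantees that the latest turns survive when the budget runs out.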

Memory for Delegated Agents

When agents delegate tasks to other agents, memory is filtered according to each agent's delegation scope:

Main Agents

  • See complete conversation history including all tool results
  • Maintain full context of delegated actions and their results

Delegated Agents

  • See conversation history filtered to their delegation scope
  • Receive tool results from:
    • Their own tool executions
    • Top-level (non-delegated) tool executions
  • Cannot see tool results from unrelated delegations

This ensures delegated agents have sufficient context while preventing memory pollution from unrelated parallel delegations.
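A minimal sketch of this scoping rule is shown below. It assumes each tool result carries an optional `delegationId` that is absent for top-level executions; that field, the `MemoryEntry` type, and the function name are hypothetical. The sketch only scopes tool results and lets chat messages pass through unchanged.

```typescript
// Hypothetical shape: `delegationId` identifies the delegation that produced
// a tool result and is undefined for top-level (non-delegated) executions.
interface MemoryEntry {
  kind: "chat" | "tool_result";
  content: string;
  delegationId?: string;
}

function filterForDelegatedAgent(
  history: MemoryEntry[],
  currentDelegationId: string,
): MemoryEntry[] {
  return history.filter((entry) => {
    // Assumption: chat messages are not scoped in this sketch.
    if (entry.kind !== "tool_result") return true;
    // Keep top-level results and this delegation's own results;
    // drop results that belong to unrelated delegations.
    return (
      entry.delegationId === undefined ||
      entry.delegationId === currentDelegationId
    );
  });
}
```

A main agent would skip this filter entirely, since it sees the complete history.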

Tool Results in Memory

Tool execution results are automatically included in conversation history, helping agents:

  • Understand what actions have already been performed
  • Avoid duplicate tool calls
  • Build on previous results when transferring between agents

The tool results include both the input parameters and output results, formatted as:

## Tool: search_knowledge_base

**Input:**
{
  "query": "API authentication methods"
}

**Output:**
{
  "results": [...]
}
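A formatter that produces this layout could look like the sketch below. The function name `formatToolResult` is an assumption, and the framework's own serializer may differ in details such as indentation.

```typescript
// Hypothetical formatter that mirrors the layout shown above.
function formatToolResult(
  toolName: string,
  input: unknown,
  output: unknown,
): string {
  return [
    `## Tool: ${toolName}`,
    "",
    "**Input:**",
    JSON.stringify(input, null, 2), // pretty-print with 2-space indentation
    "",
    "**Output:**",
    JSON.stringify(output, null, 2),
  ].join("\n");
}
```

For example, `formatToolResult("search_knowledge_base", { query: "API authentication methods" }, { results: [] })` yields a block with the same headings and structure as the sample above.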

Conversation Compacting System

For long conversations that would otherwise exceed model context limits, the framework includes a compacting system that automatically condenses older messages while preserving essential context.

How Compacting Works

The compacting system activates automatically when conversations approach token limits:

Context Monitoring: The system continuously monitors conversation size against the model's context limit

Automatic Triggering: Conversation-level compacting triggers at 50% of the context window; sub-agent generation compacting triggers at model-aware thresholds (~75-91%, depending on model size)

Tool Result Archiving: Large tool results are stored as artifacts and replaced with summary references

AI Summarization: Older parts of the conversation are summarized by AI while preserving key context

Fallback Protection: If AI summarization fails, the system falls back to simple message truncation
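The summarize-then-fall-back behavior can be sketched as follows. Everything here is illustrative: the `Turn` type, the `compactHistory` function, the `keepRecent` parameter, and the synchronous `Summarizer` callback are assumptions (a real summarizer would call a model asynchronously), and tool result archiving is not shown.

```typescript
// Illustrative types; not the framework's actual API.
interface Turn {
  role: string;
  content: string;
}

// Assumption: the summarizer is synchronous here for brevity and
// signals failure by throwing.
type Summarizer = (older: Turn[]) => string;

function compactHistory(
  turns: Turn[],
  keepRecent: number,
  summarize: Summarizer,
): Turn[] {
  const keep = Math.max(0, keepRecent);
  if (turns.length <= keep) return turns;

  const cut = turns.length - keep;
  const older = turns.slice(0, cut);
  const recent = turns.slice(cut);

  try {
    // AI summarization: replace the older turns with a single summary turn.
    const summary = summarize(older);
    return [{ role: "system", content: summary }, ...recent];
  } catch {
    // Fallback protection: if summarization fails, truncate to recent turns.
    return recent;
  }
}
```

Keeping the most recent turns verbatim in both branches means the agent never loses the immediate context, even when the summary step fails.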

Model-Specific Behavior

Different models have different context windows, and compacting adapts accordingly:

| Model | Context Window | Conversation Threshold | Sub-Agent Generation Threshold |
| --- | --- | --- | --- |
| GPT-5.1 | 400K tokens | 200K (50%) | ~332K (83%) |
| Claude-4.5 Sonnet | 200K tokens | 100K (50%) | ~166K (83%) |
| Gemini 3 Flash | 1M tokens | 500K (50%) | ~910K (91%) |
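The thresholds listed for each model follow from simple arithmetic on the context window. The sketch below reproduces them; the function names are hypothetical, and the sub-agent ratios (0.83 and 0.91) are taken from the listed figures rather than from any documented formula.

```typescript
// Illustrative helpers; the ratios come from the figures listed above.
function conversationThreshold(contextWindowTokens: number): number {
  // Conversation-level compacting triggers at 50% of the context window.
  return Math.round(contextWindowTokens * 0.5);
}

function subAgentThreshold(contextWindowTokens: number, ratio: number): number {
  // `ratio` is model-dependent (roughly 0.75 to 0.91 according to this page).
  return Math.round(contextWindowTokens * ratio);
}

function shouldCompactConversation(
  currentTokens: number,
  contextWindowTokens: number,
): boolean {
  return currentTokens >= conversationThreshold(contextWindowTokens);
}
```

For a 200K-token window, `conversationThreshold(200_000)` gives 100,000 and `subAgentThreshold(200_000, 0.83)` gives 166,000, matching the Claude-4.5 Sonnet figures.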

Compacting Types

Conversation-Level Compacting

  • Trigger: When conversation reaches 50% of model's context window
  • Action: Compacts entire conversation history into summary + artifacts
  • Use Case: Long conversations with extensive history

Sub-Agent Generation Compacting

  • Trigger: During sub-agent execution when tool results exceed model-aware limits (75-91% depending on model size)
  • Action: Compacts generated tool results while preserving original context
  • Use Case: Sub-agents performing many tool operations during generation
Note

Compacting happens automatically and transparently. Your agents will continue to work normally even with compacted conversations, as the system preserves all essential context and provides artifact references for detailed information.
