
Context Engine

The context engine enhances your AI interactions through three mechanisms: conversation compression, cross-profile context sharing, and local RAG (Retrieval-Augmented Generation).

Conversation Compression

When conversations grow beyond a token threshold, Claudex uses an LLM to summarize older messages, keeping recent ones intact.

```toml
[context.compression]
enabled = true
threshold_tokens = 50000            # compress when total tokens exceed this
keep_recent = 10                    # always keep the last N messages
profile = "openrouter"              # reuse a profile's base_url + api_key
model = "qwen/qwen-2.5-7b-instruct" # override model (optional)
```
  1. Before forwarding a request, Claudex estimates total token count
  2. If tokens exceed threshold_tokens, older messages (beyond keep_recent) are replaced with a summary
  3. The summary is generated by the configured local LLM
  4. The compressed conversation is then forwarded to the provider
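The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Claudex's implementation: the `summarize()` placeholder stands in for the call to the configured LLM, and the 4-characters-per-token estimate is a rough hypothetical heuristic.

```python
# Sketch of the compression flow. summarize() and the 4-chars-per-token
# estimate are illustrative stand-ins, not Claudex's actual API.

THRESHOLD_TOKENS = 50000  # mirrors threshold_tokens
KEEP_RECENT = 10          # mirrors keep_recent

def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages):
    # Placeholder for the call to the configured summarization LLM.
    return "Summary of earlier conversation: ..."

def compress(messages):
    if estimate_tokens(messages) <= THRESHOLD_TOKENS:
        return messages  # under threshold: forward unchanged
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = {"role": "system", "content": summarize(older)}
    return [summary] + recent  # summary replaces the older messages
```

Only the messages beyond `keep_recent` are summarized, so the most recent exchange is always forwarded verbatim.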

Cross-Profile Context Sharing

Share context across different provider profiles within the same session.

```toml
[context.sharing]
enabled = true
max_context_size = 2000 # max tokens to inject from other profiles
```

This is useful when switching between providers mid-task: relevant context from previous interactions is included automatically.
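One plausible way to apply the `max_context_size` cap is to walk the other profiles' history newest-first and stop once the token budget runs out. The sketch below assumes that strategy; the function name, message shape, and 4-chars-per-token estimate are all hypothetical.

```python
# Illustrative sketch of cross-profile sharing: inject recent context from
# other profiles, capped at max_context_size tokens. All names hypothetical.

MAX_CONTEXT_SIZE = 2000  # mirrors max_context_size

def estimate_tokens(text):
    return len(text) // 4  # rough heuristic: ~4 characters per token

def inject_shared_context(request_messages, other_profile_messages):
    budget = MAX_CONTEXT_SIZE
    shared = []
    # Walk the other profiles' history newest-first until the budget runs out.
    for msg in reversed(other_profile_messages):
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break
        budget -= cost
        shared.append(msg["content"])
    if not shared:
        return request_messages
    header = "Context from other profiles:\n" + "\n".join(reversed(shared))
    return [{"role": "system", "content": header}] + request_messages
```

Injecting the shared context as a single system message keeps the original request untouched, so the provider sees it as ordinary conversation preamble.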

Local RAG

Index local code and documentation for retrieval-augmented generation. Relevant code snippets are automatically injected into requests.

```toml
[context.rag]
enabled = true
index_paths = ["./src", "./docs"] # directories to index
profile = "openrouter"            # reuse a profile's base_url + api_key
model = "openai/text-embedding-3-small" # embedding model
chunk_size = 512                  # text chunk size
top_k = 5                         # number of results to inject
```
  1. On startup, Claudex indexes files in index_paths using the embedding model
  2. For each request, the user’s message is embedded and compared against the index
  3. The top-k most relevant chunks are injected as additional context in the request
  4. The provider receives richer context about your codebase
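The chunking and top-k retrieval steps can be sketched as follows. To stay self-contained, the example uses a toy letter-frequency `embed()` in place of the real embedding model; in Claudex the vectors would come from the configured `model` (e.g. `openai/text-embedding-3-small`).

```python
# Sketch of RAG chunking and retrieval. embed() is a toy bag-of-letters
# stand-in for the configured embedding model; everything else is generic.
import math

TOP_K = 5        # mirrors top_k
CHUNK_SIZE = 512 # mirrors chunk_size

def chunk(text, size=CHUNK_SIZE):
    # Split indexed files into fixed-size character chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    # Toy 26-dim letter-frequency vector (stand-in for the real model).
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(query, chunks, k=TOP_K):
    # Rank every indexed chunk by similarity to the embedded query.
    qv = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)
    return scored[:k]
```

The returned chunks would then be prepended to the request as additional context, which is what gives the provider visibility into your codebase.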