上下文引擎

上下文引擎通过三种机制增强 AI 交互：对话压缩、跨 profile 上下文共享和本地 RAG（检索增强生成）。

对话压缩

当对话超过 token 阈值时，Claudex 使用 LLM 压缩旧消息，保留最近的消息不变。

[context.compression]
enabled = true
threshold_tokens = 50000    # token 总数超过此值时压缩
keep_recent = 10            # 始终保留最近 N 条消息
profile = "openrouter"      # 复用此 profile 的 base_url + api_key
model = "qwen/qwen-2.5-7b-instruct"  # 覆盖模型（可选）

工作流程

转发请求前，Claudex 估算总 token 数
若超过 threshold_tokens，keep_recent 之前的旧消息被替换为摘要
摘要由配置的本地 LLM 生成
压缩后的对话转发到提供商

跨 Profile 共享

在同一会话中跨不同提供商 profile 共享上下文。

[context.sharing]
enabled = true
max_context_size = 2000    # 从其他 profile 注入的最大 token 数

在任务中切换提供商时很有用 — 之前交互的相关上下文会自动包含。

本地 RAG

索引本地代码和文档，用于检索增强生成。相关代码片段自动注入请求。

[context.rag]
enabled = true
index_paths = ["./src", "./docs"]     # 要索引的目录
profile = "openrouter"                 # 复用此 profile 的 base_url + api_key
model = "openai/text-embedding-3-small"  # embedding 模型
chunk_size = 512                       # 文本块大小
top_k = 5                             # 注入的结果数量

工作流程

启动时，Claudex 使用 embedding 模型索引 index_paths 中的文件
对每个请求，用户消息被嵌入并与索引进行比较
最相关的 top-k 个片段作为额外上下文注入请求
提供商获得关于你代码库的更丰富上下文