【Agent Learning s06】Why Does Your AI Assistant "Lose Its Memory" Mid-Conversation? A Complete Guide to Context Compression
🧠 Have you ever had this happen: a conversation with an AI is going fine, then it suddenly starts talking nonsense, or errors out with "context window full"? That is not the AI's fault; it means your harness is not doing compression. This article digs into the three-layer compression strategy (micro_compact, auto_compact, manual_compact) and shows how to give an AI effectively "unlimited memory".
📚 Learning Objectives
After this article, you will be able to:
- ✅ Understand the limits and challenges of the context window
- ✅ Grasp the design thinking behind the three-layer compression strategy
- ✅ Implement micro_compact, auto_compact, and manual_compact
- ✅ Understand token estimation and threshold-based triggering
- ✅ Build a working Agent with context compression
🤔 Start from the Problem
Think about it
Suppose you are building a coding-assistant Agent, and a user starts a long conversation:

```
User:  Help me create a React project
Agent: Done, project created...        [conversation continues]
User:  Add a login page
Agent: Added...                        [conversation continues]
User:  Implement user authentication
Agent: Working on it...                [conversation continues]
... 50 turns later ...
User:  Optimize the performance
Agent: [Error] Context window full, cannot continue
```

The question: how do you make an Agent support conversations of unbounded length?
The Core Contradictions

| Constraint | Why it hurts |
|---|---|
| The context window is finite | every model has a hard token limit |
| The conversation keeps growing | each turn adds messages and tool results |
| Quality degrades | models reason worse as the window fills up |
| Cost rises | every token in context is billed on every call |
💡 Study note: compression is not simple deletion; it keeps the key information and drops the redundant detail. Much like human memory: remember the gist, forget the details.
🎯 The Solution: A Three-Layer Compression Strategy
Strategy Overview

```
Three-Layer Compression Strategy
================================

Layer 1: micro_compact (micro-compression)
  When:   before every LLM call
  Action: trim large blobs out of tool_result content
  Effect: 30-50% context reduction

Layer 2: auto_compact (automatic compression)
  When:   the token estimate exceeds the threshold
  Action: have the LLM summarize the conversation history
  Effect: keeps key information, drops redundancy

Layer 3: manual_compact (manual compression)
  When:   the user or the Agent triggers it explicitly
  Action: summarize around a specified focus
  Effect: precise control over what is preserved
```
Why Three Layers?
Problems with a single compression mechanism:
- compress too often → context is lost → quality drops
- compress too rarely → context explodes → cost spirals

Advantages of the three-layer strategy:
- Layer 1: continuous cleanup prevents buildup
- Layer 2: automatic triggering needs no intervention
- Layer 3: precise control preserves what matters
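Which layers fire on a given loop iteration can be sketched as a small pure function. This is an illustrative sketch only; the function name and the `manual_requested` flag are assumptions, not part of the article's code:

```python
# Illustrative: which compression layers would run on one loop iteration.
def choose_layers(estimated_tokens: int, threshold: int,
                  manual_requested: bool = False) -> list:
    """Return the compression layers that would run on this iteration."""
    layers = ["micro_compact"]          # Layer 1 always runs before the LLM call
    if estimated_tokens > threshold:    # Layer 2: threshold-based auto trigger
        layers.append("auto_compact")
    if manual_requested:                # Layer 3: explicit user/Agent request
        layers.append("manual_compact")
    return layers

print(choose_layers(50_000, 120_000))        # well under threshold: micro only
print(choose_layers(130_000, 120_000))       # over threshold: micro + auto
print(choose_layers(50_000, 120_000, True))  # explicit request: micro + manual
```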
📖 Concepts: The Compression Mechanisms in Detail
Layer 1: micro_compact (micro-compression)
Core idea: tool_result blocks often carry large amounts of transient data that can be discarded once it has been used.

```python
def micro_compact(messages: list):
    """Micro-compression: trim large blobs inside tool_result blocks.

    Why is this needed?
    - bash output can run to thousands of lines
    - read_file returns entire files
    - this content is useless once it has been consumed

    Strategy:
    - keep the first 500 characters (to preserve the context)
    - keep the last 500 characters (to preserve the outcome)
    - replace the middle with "... (X chars truncated)"
    """
    for msg in messages:
        if msg.get("role") == "user" and isinstance(msg.get("content"), list):
            for block in msg["content"]:
                if block.get("type") == "tool_result":
                    content = block.get("content", "")
                    if len(content) > 1000:
                        block["content"] = (
                            content[:500]
                            + f"\n... ({len(content) - 1000} chars truncated)\n"
                            + content[-500:]
                        )
```
Effect:

```
Before:
tool_result: "line1\nline2\n...line5000\n"   # 50,000 chars

After:
tool_result: "line1\n...line10\n... (49000 chars truncated)\n...line4990\nline5000\n"   # ~1,000 chars
```
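The truncation can be reproduced offline. Here is a minimal, self-contained run of the micro_compact strategy on a fabricated tool_result (all data here is made up for the demo):

```python
# Minimal demo of the micro_compact strategy on a fabricated message list.
def micro_compact(messages):
    for msg in messages:
        if msg.get("role") == "user" and isinstance(msg.get("content"), list):
            for block in msg["content"]:
                if block.get("type") == "tool_result" and len(block.get("content", "")) > 1000:
                    c = block["content"]
                    block["content"] = (
                        c[:500] + f"\n... ({len(c) - 1000} chars truncated)\n" + c[-500:]
                    )

big_output = "\n".join(f"line{i}" for i in range(1, 5001))  # tens of thousands of chars
messages = [{"role": "user",
             "content": [{"type": "tool_result", "tool_use_id": "t1",
                          "content": big_output}]}]

micro_compact(messages)
compacted = messages[0]["content"][0]["content"]
print(len(big_output), "->", len(compacted))  # huge -> just over 1,000 chars
print("chars truncated" in compacted)         # True
```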
Layer 2: auto_compact (automatic compression)
Core idea: when the context nears its limit, have the LLM summarize its own history.

```python
def auto_compact(messages: list) -> list:
    """Automatic compression: have the LLM summarize the conversation history.

    Trigger: estimate_tokens(messages) > THRESHOLD

    Workflow:
    1. extract the older part of the conversation
    2. call the LLM to generate a summary
    3. replace the history with the summary
    4. keep the most recent turns untouched
    """
    # keep the most recent turns uncompressed
    keep_recent = 4
    if len(messages) <= keep_recent:
        return messages

    to_compress = messages[:-keep_recent]
    recent = messages[-keep_recent:]

    # build the summarization prompt
    summary_prompt = """Summarize the conversation so far.
Focus on:
- Key decisions made
- Important context discovered
- Current task status
- Files modified
Be concise but preserve essential information."""

    # call the LLM to generate the summary
    response = client.messages.create(
        model=MODEL,
        messages=to_compress + [{"role": "user", "content": summary_prompt}],
        max_tokens=2000,
    )
    summary = response.content[0].text

    # build the new message history
    return [
        {"role": "user", "content": f"[Previous conversation summary]\n{summary}"},
        {"role": "assistant", "content": "Understood. I have the context from the summary. Continuing."},
    ] + recent
```
Effect:

```
Before: 50 turns of conversation, ~100,000 tokens

After:
[Previous conversation summary]
- Created React project with TypeScript
- Added login page with form validation
- Implemented JWT authentication
- Current task: Optimizing performance
- Modified files: src/App.tsx, src/auth/Login.tsx, src/api/auth.ts

Understood. I have the context from the summary. Continuing.

[most recent 4 turns...]

Total: ~5,000 tokens
```
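Setting the LLM call aside, the history-rewriting step of auto_compact is pure and easy to check offline. The sketch below isolates it; `replace_history` and the summary string are illustrative, not part of the original code:

```python
# The message-rewriting step of auto_compact, isolated from the LLM call
# so it can be exercised offline (the summary string is fabricated).
def replace_history(messages: list, summary: str, keep_recent: int = 4) -> list:
    """Swap the older history for a summary, keeping the last turns."""
    if len(messages) <= keep_recent:
        return messages
    return [
        {"role": "user", "content": f"[Previous conversation summary]\n{summary}"},
        {"role": "assistant",
         "content": "Understood. I have the context from the summary. Continuing."},
    ] + messages[-keep_recent:]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
new = replace_history(history, "- Created React project\n- Added login page")
print(len(history), "->", len(new))  # 10 -> 6  (2 summary messages + last 4 turns)
```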
Layer 3: manual_compact (manual compression)
Core idea: the user or the Agent can trigger compression explicitly and specify what to preserve.

```python
# Tool definition for `compact`
{
    "name": "compact",
    "description": "Trigger manual conversation compression.",
    "input_schema": {
        "type": "object",
        "properties": {
            "focus": {
                "type": "string",
                "description": "What to preserve in the summary"
            }
        }
    }
}

# Tool handler
def handle_compact(focus: str = None):
    """Manual compression: summarize around a specified focus.

    Use cases:
    - the Agent finishes a phase of work and wants to clear context
    - the user types a /compact command
    - the Agent notices the context is hurting its efficiency
    """
    prompt = (f"Summarize the conversation, focusing on: {focus}"
              if focus else "Summarize the conversation")
    # ... reuse the auto_compact logic
```
🔧 How It Works: The Compression Architecture
Architecture Diagram

```
Context Compression Architecture
================================

  Agent Loop
  +--------------------------------------------+
  |  while True:                               |
  |      # Layer 1: clean before every call    |
  |      micro_compact(messages)               |
  |                                            |
  |      # Layer 2: auto-compact on threshold  |
  |      if estimate_tokens() > THRESHOLD:     |
  |          messages = auto_compact()         |
  |                                            |
  |      response = llm(messages)              |
  |      messages.append(response)             |
  |                                            |
  |      # Layer 3: manual compact trigger     |
  |      if tool == "compact":                 |
  |          messages = auto_compact()         |
  +--------------------------------------------+
                        |
                        v
  Token Estimator
  +--------------------------------------------+
  |  def estimate_tokens(messages):            |
  |      total = 0                             |
  |      for msg in messages:                  |
  |          total += len(str(msg)) // 4       |
  |      return total                          |
  +--------------------------------------------+
```
💡 Key Design: Token Estimation
Why estimate instead of counting exactly?
Problems with exact counting:
- requires a tokenizer library
- costs time on every check
- tokenizers differ across models

Advantages of estimation:
- fast: O(n) in the string length
- good enough: the error stays within an acceptable range
- universal: no dependency on a specific tokenizer
The Estimator

```python
def estimate_tokens(messages: list) -> int:
    """Estimate the token count of a message list.

    Rules of thumb:
    - English: 1 token ≈ 4 characters
    - Chinese: 1 token ≈ 1.5 characters
    - Code:    1 token ≈ 3 characters

    Here we use the simple estimate: 1 token = 4 characters.
    """
    total = 0
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, str):
            total += len(content)
        elif isinstance(content, list):
            for block in content:
                if isinstance(block, dict):
                    total += len(str(block.get("content", "")))
    return total // 4

# Threshold setting
# With Claude's 200K context, compress at around 60-70%
THRESHOLD = 120000  # ~60% of 200K
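A quick sanity check of the estimator on fabricated messages:

```python
# Sanity check of the 1-token-per-4-characters estimator on made-up messages.
def estimate_tokens(messages):
    total = 0
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, str):
            total += len(content)
        elif isinstance(content, list):
            for block in content:
                if isinstance(block, dict):
                    total += len(str(block.get("content", "")))
    return total // 4

messages = [
    {"role": "user", "content": "x" * 400},                  # plain string content
    {"role": "user", "content": [                            # tool_result block
        {"type": "tool_result", "tool_use_id": "t1", "content": "y" * 800}]},
]
print(estimate_tokens(messages))  # (400 + 800) // 4 = 300
```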
💻 Hands-On: The Complete Code
Now let's implement a complete Agent with all three compression layers.
Full Code (with detailed comments)
```python
"""s06 - Context Compact: a three-layer context compression strategy

This example demonstrates context compression inside an Agent:
1. micro_compact: trim tool_result blobs before every call
2. auto_compact: summarize automatically once past the threshold
3. manual_compact: triggered explicitly by the user or the Agent

Core ideas:
- the context always fills up eventually; you need a way to make room
- compression is summarization, not deletion
- three layers: continuous cleanup + automatic trigger + precise control
"""
import os
import subprocess
from pathlib import Path

from anthropic import Anthropic

# ============ Configuration ============
client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
MODEL = "claude-sonnet-4-20250514"
WORKDIR = Path(".")
THRESHOLD = 120000  # ~60% of the 200K context window

SYSTEM = """You are a helpful coding assistant.
You have access to tools for file operations and shell commands.
Use the `compact` tool when you want to compress the conversation history."""


# ============ Token estimation ============
def estimate_tokens(messages: list) -> int:
    """Estimate the token count of a message list.

    Uses the simple estimate 1 token ≈ 4 characters,
    which is good enough for threshold checks.
    """
    total = 0
    for msg in messages:
        content = msg.get("content", "")
        if isinstance(content, str):
            total += len(content)
        elif isinstance(content, list):
            for block in content:
                if isinstance(block, dict):
                    total += len(str(block.get("content", "")))
    return total // 4


# ============ Layer 1: micro_compact ============
def micro_compact(messages: list):
    """Micro-compression: trim large blobs inside tool_result blocks.

    This is the first line of defense, run before every LLM call.

    Strategy:
    - keep the first 500 characters
    - keep the last 500 characters
    - replace the middle with a truncation marker
    """
    for msg in messages:
        if msg.get("role") == "user" and isinstance(msg.get("content"), list):
            for block in msg["content"]:
                if block.get("type") == "tool_result":
                    content = block.get("content", "")
                    # only compress content over 1000 characters
                    if len(content) > 1000:
                        block["content"] = (
                            content[:500]
                            + f"\n... ({len(content) - 1000} chars truncated)\n"
                            + content[-500:]
                        )


# ============ Layer 2: auto_compact ============
def auto_compact(messages: list) -> list:
    """Automatic compression: have the LLM summarize the history.

    This is the second line of defense, triggered when the token
    estimate exceeds the threshold.

    Workflow:
    1. keep the most recent turns
    2. have the LLM summarize the rest
    3. replace the history with the summary
    """
    # keep the most recent turns uncompressed
    keep_recent = 4
    if len(messages) <= keep_recent:
        return messages

    to_compress = messages[:-keep_recent]
    recent = messages[-keep_recent:]

    # build the summarization prompt
    summary_prompt = """Summarize the conversation so far.
Focus on:
- Key decisions made
- Important context discovered
- Current task status
- Files modified
Be concise but preserve essential information for continuing the work."""

    # call the LLM to generate the summary
    response = client.messages.create(
        model=MODEL,
        messages=to_compress + [{"role": "user", "content": summary_prompt}],
        max_tokens=2000,
    )
    summary = response.content[0].text

    # return the compressed message history
    return [
        {"role": "user", "content": f"[Previous conversation summary]\n{summary}"},
        {"role": "assistant", "content": "Understood. I have the context from the summary. Continuing."},
    ] + recent


# ============ Tool definitions ============
TOOLS = [
    {
        "name": "bash",
        "description": "Run a shell command.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
    {
        "name": "read_file",
        "description": "Read file contents.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}},
            "required": ["path"],
        },
    },
    {
        "name": "write_file",
        "description": "Write content to file.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"],
        },
    },
    {
        "name": "edit_file",
        "description": "Replace exact text in file.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "old_text": {"type": "string"},
                "new_text": {"type": "string"},
            },
            "required": ["path", "old_text", "new_text"],
        },
    },
    {
        "name": "compact",
        "description": "Trigger manual conversation compression.",
        "input_schema": {
            "type": "object",
            "properties": {
                "focus": {"type": "string", "description": "What to preserve in the summary"}
            },
        },
    },
]


# ============ Tool handlers ============
def safe_path(p: str) -> Path:
    """Resolve a path and refuse anything that escapes the workspace."""
    path = (WORKDIR / p).resolve()
    if not path.is_relative_to(WORKDIR.resolve()):
        raise ValueError(f"Path escapes workspace: {p}")
    return path


def run_bash(command: str) -> str:
    """Execute a shell command with a basic denylist and a timeout."""
    dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"]
    if any(d in command for d in dangerous):
        return "Error: Dangerous command blocked"
    try:
        r = subprocess.run(
            command, shell=True, cwd=WORKDIR,
            capture_output=True, text=True, timeout=120,
        )
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"


def run_read(path: str, limit: int = None) -> str:
    """Read a file, optionally truncated to `limit` lines."""
    try:
        lines = safe_path(path).read_text().splitlines()
        if limit and limit < len(lines):
            lines = lines[:limit] + [f"... ({len(lines) - limit} more)"]
        return "\n".join(lines)[:50000]
    except Exception as e:
        return f"Error: {e}"


def run_write(path: str, content: str) -> str:
    """Write content to a file, creating parent directories as needed."""
    try:
        fp = safe_path(path)
        fp.parent.mkdir(parents=True, exist_ok=True)
        fp.write_text(content)
        return f"Wrote {len(content)} bytes"
    except Exception as e:
        return f"Error: {e}"


def run_edit(path: str, old_text: str, new_text: str) -> str:
    """Replace the first occurrence of old_text in a file."""
    try:
        fp = safe_path(path)
        content = fp.read_text()
        if old_text not in content:
            return f"Error: Text not found in {path}"
        fp.write_text(content.replace(old_text, new_text, 1))
        return f"Edited {path}"
    except Exception as e:
        return f"Error: {e}"


TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"], kw["new_text"]),
    "compact": lambda **kw: "Manual compression requested.",
}


# ============ Agent loop ============
def agent_loop(messages: list):
    """Main Agent loop.

    Key points:
    1. run micro_compact before every LLM call
    2. run auto_compact when the token estimate exceeds the threshold
    3. detect the `compact` tool call and run manual compression
    """
    while True:
        # Layer 1: micro_compact - clean before every call
        micro_compact(messages)

        # Layer 2: auto_compact - compress once past the threshold
        if estimate_tokens(messages) > THRESHOLD:
            print("[auto_compact triggered]")
            messages[:] = auto_compact(messages)

        # call the LLM
        response = client.messages.create(
            model=MODEL,
            system=SYSTEM,
            messages=messages,
            tools=TOOLS,
            max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        # stop when no further tool calls are requested
        if response.stop_reason != "tool_use":
            return

        # handle tool calls
        results = []
        manual_compact_requested = False
        for block in response.content:
            if block.type == "tool_use":
                # detect the manual compression trigger
                if block.name == "compact":
                    manual_compact_requested = True
                    output = "Compressing..."
                else:
                    handler = TOOL_HANDLERS.get(block.name)
                    try:
                        output = handler(**block.input) if handler else f"Unknown tool: {block.name}"
                    except Exception as e:
                        output = f"Error: {e}"
                print(f"> {block.name}: {str(output)[:200]}")
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(output),
                })
        messages.append({"role": "user", "content": results})

        # Layer 3: manual_compact - explicit trigger
        if manual_compact_requested:
            print("[manual compact]")
            messages[:] = auto_compact(messages)


# ============ Main ============
if __name__ == "__main__":
    history = []
    while True:
        try:
            query = input("\033[36ms06 >> \033[0m")
        except (EOFError, KeyboardInterrupt):
            break
        if query.strip().lower() in ("q", "exit", ""):
            break
        history.append({"role": "user", "content": query})
        agent_loop(history)
        # print the final response
        response_content = history[-1]["content"]
        if isinstance(response_content, list):
            for block in response_content:
                if hasattr(block, "text"):
                    print(block.text)
        print()
```
🎯 Practical Tips
Tip 1: Set the Threshold Sensibly

```python
# Conservative: compress at 50% of context
THRESHOLD = 100000  # ~50% of 200K

# Aggressive: compress at 70% of context
THRESHOLD = 140000  # ~70% of 200K

# Balanced: compress at 60% of context (recommended)
THRESHOLD = 120000  # ~60% of 200K
```
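Rather than hard-coding these numbers, the threshold can be derived from the model's context window. `threshold_for` is an illustrative helper, not part of the original code:

```python
# Illustrative helper: derive the compression threshold from a model's
# context window size instead of hard-coding token counts.
def threshold_for(context_window: int, ratio: float = 0.6) -> int:
    """Compress once the estimated tokens exceed `ratio` of the window."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return int(context_window * ratio)

print(threshold_for(200_000))        # 120000 (balanced)
print(threshold_for(200_000, 0.5))   # 100000 (conservative)
print(threshold_for(200_000, 0.7))   # 140000 (aggressive)
```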
Tip 2: Refine the Summary Prompt

```python
# Basic
summary_prompt = "Summarize the conversation."

# Better: state explicitly what to preserve
summary_prompt = """Summarize the conversation so far.
Focus on:
- Key decisions made
- Important context discovered
- Current task status
- Files modified
Be concise but preserve essential information."""

# Advanced: tailor the prompt to the task type
def get_summary_prompt(task_type: str) -> str:
    if task_type == "coding":
        return """Summarize the coding session.
Focus on:
- Files created/modified
- Key architectural decisions
- Current implementation status
- Remaining tasks"""
    elif task_type == "debugging":
        return """Summarize the debugging session.
Focus on:
- Error symptoms
- Root cause identified
- Fixes applied
- Test results"""
    # fall back to the generic prompt for other task types
    return "Summarize the conversation so far."
```
Tip 3: Preserve Key Context

```python
def auto_compact_smart(messages: list, key_context: list = None) -> list:
    """Smarter compression: preserve specific messages verbatim.

    key_context: indices of messages that must survive compression.
    """
    if not key_context:
        return auto_compact(messages)

    # extract the messages to preserve
    preserved = [messages[i] for i in key_context if i < len(messages)]

    # compress everything else
    to_compress = [m for i, m in enumerate(messages) if i not in key_context]
    # ... run the compression logic, then re-insert the preserved messages
```
📊 Before and After
No Compression vs. Three-Layer Compression
Scenario: a 50-turn conversation with many file reads and bash commands.

Without compression:
- Token usage: 180K / 200K (90%)
- API cost: $3.60
- Quality: starts degrading (from around 40% context usage)
- Outcome: context overflow, conversation dies

With the three-layer strategy:
- Token usage: ~40K on average (20%)
- API cost: $0.80
- Quality: stays stable
- Outcome: the conversation can continue indefinitely

Cost savings: 78%
Quality: stable
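The cost figures above can be sanity-checked with simple arithmetic. The $20-per-million-tokens blended price is an assumption chosen to match the article's numbers; real pricing differs by model and by input vs. output tokens:

```python
# Back-of-the-envelope check of the savings figures (illustrative pricing).
def cost_usd(tokens: int, price_per_mtok: float = 20.0) -> float:
    """Rough cost assuming a single blended price per million tokens."""
    return tokens / 1_000_000 * price_per_mtok

uncompressed = cost_usd(180_000)   # 3.6
compressed = cost_usd(40_000)      # 0.8
savings = 1 - compressed / uncompressed
print(f"${uncompressed:.2f} -> ${compressed:.2f}, saving {savings:.0%}")
```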
🔗 How It Relates to the Other Mechanisms

```
s01 Agent Loop
      ↓
s02 Tool Use
      ↓
s03 TodoWrite ───────────┐
      ↓                  │
s04 Subagents            │
      ↓                  │
s05 Skills               │
      ↓                  │
[ s06 Context Compact ] ←┤  compresses TodoWrite history
      ↓                  │  compresses subagent results
s07 Task System ─────────┘  compresses loaded Skills content
      ↓
...
```
💭 Questions to Ponder
1. Threshold choice: why compress at 60% of context rather than waiting until 90%?
2. Timing: micro_compact runs before every call; does that hurt performance, and how could you optimize it?
3. Summary quality: how do you ensure the LLM-generated summary keeps all the key information?
4. Granularity: is keeping the last 4 turns the right choice? When would you adjust it?
📚 Further Reading
- Claude Code official docs – context management [1]
- learn-claude-code GitHub [2]
- Claude Context Windows [3]
🎓 Summary
This article covered the Agent's context compression mechanisms:

| Layer | When | Action | Effect |
|---|---|---|---|
| micro_compact | before every LLM call | trim large tool_result blobs | 30-50% context reduction |
| auto_compact | token estimate exceeds the threshold | LLM summarizes the history | keeps key info, drops redundancy |
| manual_compact | user or Agent triggers it | summarize around a given focus | precise control over what is kept |
Key takeaway:
The context always fills up eventually; you need a way to make room. Compression is summarization, not deletion.
The three layers work together:
- Continuous cleanup: micro_compact prevents buildup
- Automatic triggering: auto_compact needs no intervention
- Precise control: manual_compact preserves what matters
Next up: s07 Task System [4]: big goals are split into small tasks, ordered, and persisted to disk.
夜雨聆风