[Agent Learning s06] Why Does Your AI Assistant Suddenly "Lose Its Memory" Mid-Conversation? A Complete Guide to Context Compression - 夜雨聆风

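🧠 Has this ever happened to you: a chat with an AI is going fine, then it suddenly starts rambling or fails outright with "context full"? The model is not to blame; your harness is not compressing its context. This article walks through a three-layer compression strategy (micro_compact, auto_compact, manual_compact) that lets an agent keep working as if its memory were unlimited.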

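📚 Learning objectives

After this article you will be able to:

✅ Understand the limits and challenges of the context window
✅ Explain the design thinking behind the three-layer compression strategy
✅ Implement micro_compact, auto_compact, and manual_compact
✅ Understand token estimation and threshold-based triggering
✅ Build an agent with context compression yourself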

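🤔 Start from a problem

Think about it

Suppose you are building a coding-assistant agent, and a user starts a long conversation:

User: Create a React project for me
Agent: Done, project created... [conversation continues]
User: Add a login page
Agent: Added... [conversation continues]
User: Implement user authentication
Agent: Implementing... [conversation continues]
... 50 rounds later ...
User: Optimize the performance
Agent: [Error] Context window is full, cannot continue

So the question is: how can the agent support conversations of unbounded length?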

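The core tension

Challenge	Description
Limited context	Claude: 200K tokens; GPT-4 Turbo: 128K tokens
Growing conversation	Each round adds 500-5000 tokens
Quality degradation	Output quality can start to slip once 20-40% of the context is in use
Rising cost	The longer the context, the more each API call costs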
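
To make the table concrete, here is a rough back-of-the-envelope sketch. It assumes that nothing is ever removed from the history and that every round adds a fixed number of tokens; `rounds_until_full` is a hypothetical helper name, and the figures are the ones from the table.

```python
def rounds_until_full(window_tokens: int, tokens_per_round: int, start_tokens: int = 0) -> int:
    """Whole rounds that fit before the history would exceed the window."""
    if tokens_per_round <= 0:
        raise ValueError("tokens_per_round must be positive")
    return max(0, (window_tokens - start_tokens) // tokens_per_round)


CLAUDE_WINDOW = 200_000  # from the table above

for per_round in (500, 5_000):  # light and heavy rounds, from the table above
    print(per_round, rounds_until_full(CLAUDE_WINDOW, per_round))
```

At the heavy end (5,000 tokens per round) a 200K window fills after about 40 rounds, which is in line with the 50-round failure in the scenario above.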

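💡 Study note: Compression is not simple deletion; it keeps the key information and drops redundant detail. It works like human memory: remember the main points, forget the details.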

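🎯 The solution: a three-layer compression strategy

Overview of the strategy

─────────────────────────────────────────────────────────────
              Three-layer compression strategy
─────────────────────────────────────────────────────────────

  Layer 1: micro_compact (micro compression)
  ────────────────────────────────
  When:    before every LLM call
  Action:  trim large contents inside tool_result blocks
  Effect:  shrinks the context by 30-50%

  Layer 2: auto_compact (automatic compression)
  ────────────────────────────────
  When:    the token estimate exceeds the threshold
  Action:  have the LLM summarize the conversation history
  Effect:  keeps key information, drops redundancy

  Layer 3: manual_compact (manual compression)
  ────────────────────────────────
  When:    triggered explicitly by the user or the agent
  Action:  summarize with a specified focus
  Effect:  precise control over what is kept

─────────────────────────────────────────────────────────────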

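Why three layers?

Problems with a single compression mechanism:
- Compress too often → context is lost → quality drops
- Compress too rarely → the context explodes → cost gets out of control

Advantages of the three-layer strategy:
- Layer 1: continuous cleanup, prevents buildup
- Layer 2: triggers automatically, no intervention needed
- Layer 3: precise control, keeps what matters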

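📖 Concepts: the compression mechanisms in detail

Layer 1: micro_compact (micro compression)

Core idea: tool_result blocks often hold large amounts of temporary data that can be discarded once it has been used

    def micro_compact(messages: list) -> None:
        """
        Micro compression: trim large tool_result contents in place.

        Why it is needed:
        - bash output can run to thousands of lines
        - read_file returns the whole file
        - once used, this content is no longer needed

        Strategy:
        - keep the first 500 characters (to understand the context)
        - keep the last 500 characters (to understand the result)
        - replace the middle with "... (X chars truncated)"
        """
        for msg in messages:
            if msg.get("role") == "user" and isinstance(msg.get("content"), list):
                for block in msg["content"]:
                    if block.get("type") == "tool_result":
                        content = block.get("content", "")
                        # Only plain strings are trimmed. Blocks truncated on an
                        # earlier pass (marker at offset 500) are skipped, so the
                        # reported character count stays accurate.
                        if (isinstance(content, str) and len(content) > 1000
                                and not content.startswith("\n... (", 500)):
                            block["content"] = (
                                content[:500]
                                + f"\n... ({len(content) - 1000} chars truncated)\n"
                                + content[-500:]
                            )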

Example of the effect:

Before: tool_result: "line1\nline2\n...line5000\n"  # 50,000 characters
After:  tool_result: "<first 500 chars>\n... (49000 chars truncated)\n<last 500 chars>"  # about 1,030 characters
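
One consequence of running micro_compact before every LLM call is that a freshly produced tool result is also trimmed before the model has read it in full. A possible variant, sketched below, leaves the newest few tool results untouched and only trims older ones. The name `micro_compact_keep_recent` and the parameters `keep_recent`, `limit`, and `edge` are assumptions for illustration, not part of the original code.

```python
def micro_compact_keep_recent(messages: list, keep_recent: int = 3,
                              limit: int = 1000, edge: int = 500) -> int:
    """Trim old, large tool_result strings in place; return how many were trimmed.

    The newest `keep_recent` tool results are left intact (hypothetical policy).
    """
    if limit < 2 * edge:
        raise ValueError("limit must be at least 2 * edge")

    # Collect every tool_result block in conversation order.
    blocks = []
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") == "user" and isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_result":
                    blocks.append(block)

    older = blocks[:-keep_recent] if keep_recent > 0 else blocks
    marker_prefix = "\n... ("
    trimmed = 0
    for block in older:
        text = block.get("content", "")
        if not isinstance(text, str) or len(text) <= limit:
            continue
        if text.startswith(marker_prefix, edge):
            continue  # already trimmed on an earlier pass
        cut = len(text) - 2 * edge
        block["content"] = text[:edge] + f"{marker_prefix}{cut} chars truncated)\n" + text[-edge:]
        trimmed += 1
    return trimmed
```

The function returns a count so that a caller can log how much work each pass did; calling it again on the same history should trim nothing further.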

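Layer 2: auto_compact (automatic compression)

Core idea: when the context approaches its limit, let the LLM summarize the history itself

    def auto_compact(messages: list) -> list:
        """
        Automatic compression: have the LLM summarize the conversation history.

        Trigger: estimate_tokens(messages) > THRESHOLD

        Workflow:
        1. Extract the older history
        2. Call the LLM to generate a summary
        3. Replace the history with the summary
        4. Keep the most recent messages as they are
        """
        # Keep the most recent messages uncompressed
        keep_recent = 4
        if len(messages) <= keep_recent:
            return messages

        to_compress = messages[:-keep_recent]
        recent = messages[-keep_recent:]
        # Caveat: a tool_result must directly follow its tool_use, so a robust
        # version should not let `recent` begin with a tool_result message.

        # Build the summary prompt
        summary_prompt = """Summarize the conversation so far.
    Focus on:
    - Key decisions made
    - Important context discovered
    - Current task status
    - Files modified
    Be concise but preserve essential information."""

        # Call the LLM to generate the summary
        response = client.messages.create(
            model=MODEL,
            messages=to_compress + [{"role": "user", "content": summary_prompt}],
            max_tokens=2000,
        )
        summary = response.content[0].text

        # Build the new message history
        return [
            {"role": "user", "content": f"[Previous conversation summary]\n{summary}"},
            {"role": "assistant", "content": "Understood. I have the context from the summary. Continuing."},
        ] + recent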

Example of the effect:

Before: 50 rounds of conversation, 100,000 tokens

After:
[Previous conversation summary]
- Created React project with TypeScript
- Added login page with form validation
- Implemented JWT authentication
- Current task: Optimizing performance
- Modified files: src/App.tsx, src/auth/Login.tsx, src/api/auth.ts

Understood. I have the context from the summary. Continuing.

[last 4 messages...]

Total: 5,000 tokens
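
Splitting the history at a fixed offset can separate a tool_result from the tool_use that produced it, and the API expects each tool_result to directly follow its tool_use. The sketch below shows one way to pick a safer cut point by moving it earlier until the kept tail no longer opens with a tool_result message. `has_tool_result` and `safe_split_index` are hypothetical helper names, and the sketch assumes tool results are stored as dict blocks, as in the agent loop of this article.

```python
def has_tool_result(msg: dict) -> bool:
    """True if the message carries at least one tool_result block."""
    content = msg.get("content")
    return isinstance(content, list) and any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content
    )


def safe_split_index(messages: list, keep_recent: int = 4) -> int:
    """Index where the kept tail starts; everything before it gets summarized."""
    split = max(0, len(messages) - keep_recent)
    # Move the cut earlier so the tail keeps the tool_use together with its result.
    while split > 0 and has_tool_result(messages[split]):
        split -= 1
    return split


history = [
    {"role": "user", "content": "read the config"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "t1", "name": "read_file", "input": {}}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "t1", "content": "..."}]},
    {"role": "assistant", "content": "done"},
    {"role": "user", "content": "thanks"},
    {"role": "assistant", "content": "ok"},
]
split = safe_split_index(history)
to_compress, recent = history[:split], history[split:]
```

A result of 0 means there is nothing safe to summarize yet, in which case the history can simply be returned unchanged.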

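Layer 3: manual_compact (manual compression)

Core idea: the user or the agent can trigger compression explicitly and say what to preserve

    # compact tool definition
    {
        "name": "compact",
        "description": "Trigger manual conversation compression.",
        "input_schema": {
            "type": "object",
            "properties": {
                "focus": {
                    "type": "string",
                    "description": "What to preserve in the summary"
                }
            }
        }
    }

    # Tool handler
    def handle_compact(focus: str | None = None):
        """
        Manual compression: summarize with a specified focus.

        Typical uses:
        - The agent has finished a phase of work and wants to clear the context
        - The user types the /compact command
        - The agent notices that the context is hurting efficiency
        """
        prompt = f"Summarize the conversation, focusing on: {focus}" if focus else "Summarize the conversation"
        # ... call the auto_compact logic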
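
The handler above mentions a /compact user command without showing how it would be recognized. A minimal sketch follows, assuming the command takes the form `/compact <optional focus text>`; that syntax and the helper names `parse_compact_command` and `build_compact_prompt` are assumptions for illustration.

```python
from typing import Optional


def build_compact_prompt(focus: Optional[str] = None) -> str:
    """Build the summary prompt, with or without a focus (same wording as handle_compact)."""
    focus = (focus or "").strip()
    if focus:
        return f"Summarize the conversation, focusing on: {focus}"
    return "Summarize the conversation"


def parse_compact_command(user_input: str) -> Optional[str]:
    """Return the focus text ('' if none) for a /compact command, or None otherwise."""
    text = user_input.strip()
    if text == "/compact":
        return ""
    if text.startswith("/compact "):
        return text[len("/compact "):].strip()
    return None


focus = parse_compact_command("/compact the authentication flow")
if focus is not None:
    prompt = build_compact_prompt(focus)
```

Returning None for ordinary input lets the main loop fall through to the normal agent turn, while an empty string still triggers an unfocused summary.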

🔧 How it works: the compression architecture

Architecture diagram

                Context compression architecture
                ================================

    +------------------------------------------+
    |              Agent Loop                  |
    |  +------------------------------------+  |
    |  | while True:                        |  |
    |  |   # Layer 1: clean before call     |  |
    |  |   micro_compact(messages)          |  |
    |  |                                    |  |
    |  |   # Layer 2: compact past limit    |  |
    |  |   if estimate_tokens() > THRESHOLD:|  |
    |  |     messages = auto_compact()      |  |
    |  |                                    |  |
    |  |   response = llm(messages)         |  |
    |  |   messages.append(response)        |  |
    |  |                                    |  |
    |  |   # Layer 3: manual trigger        |  |
    |  |   if tool == "compact":            |  |
    |  |     messages = auto_compact()      |  |
    |  +------------------------------------+  |
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    |           Token estimator                |
    |  +------------------------------------+  |
    |  | def estimate_tokens(messages):     |  |
    |  |   total = 0                        |  |
    |  |   for msg in messages:             |  |
    |  |     total += len(str(msg)) // 4    |  |
    |  |   return total                     |  |
    |  +------------------------------------+  |
    +------------------------------------------+

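💡 Key design: token estimation

Why estimate?

Problems with exact counting:
- It requires a tokenizer library
- Every count takes time
- Different models use different tokenizers

Advantages of estimation:
- Fast: O(n) in the string length
- Good enough: the error is within an acceptable range
- Generic: no dependency on a specific tokenizer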

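Estimator implementation

    def estimate_tokens(messages: list) -> int:
        """
        Estimate the number of tokens in the message list.

        Rules of thumb:
        - English: 1 token ≈ 4 characters
        - Chinese: 1 token ≈ 1.5 characters
        - Code:    1 token ≈ 3 characters

        This uses the flat rule 1 token ≈ 4 characters. It is cheap, but by the
        rules above it undercounts Chinese text and code, so leave headroom
        in THRESHOLD.
        """
        total = 0
        for msg in messages:
            content = msg.get("content", "")
            if isinstance(content, str):
                total += len(content)
            elif isinstance(content, list):
                for block in content:
                    # Count every block (text, tool_use, tool_result), whether it
                    # is a dict or an SDK object from an assistant turn.
                    total += len(str(block))
        return total // 4

    # Threshold
    # Claude has a 200K context window; compressing at 60-70% is suggested
    THRESHOLD = 120000  # ~60% of 200K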
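
The rules of thumb in the docstring can also be applied per character rather than as one flat ratio. The sketch below counts CJK ideographs at 1.5 characters per token and everything else at 4 characters per token. `estimate_tokens_mixed` is a hypothetical name, the ratios are the article's own rules of thumb rather than measured values, and the Unicode range covers only the basic CJK Unified Ideographs block.

```python
def estimate_tokens_mixed(text: str) -> int:
    """Rough token estimate that treats CJK ideographs separately.

    Ratios are rules of thumb (assumptions), not tokenizer output.
    """
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return round(cjk / 1.5 + other / 4)


english = "a" * 400
chinese = "中" * 300
print(estimate_tokens_mixed(english), len(english) // 4)
print(estimate_tokens_mixed(chinese), len(chinese) // 4)
```

For the 300-character Chinese string, the flat `len // 4` rule gives 75 while the per-script rule gives 200, which is why a Chinese-heavy conversation may need a lower threshold or an estimator along these lines.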

💻 Hands-on: the complete code

Now let's implement a complete agent with three-layer compression.

Complete code (with detailed comments)

"""s06 - Context Compact: 三层上下文压缩策略本示例演示如何在 Agent 中实现上下文压缩：1. micro_compact：每次调用前清理 tool_result2. auto_compact：超过阈值自动总结3. manual_compact：用户/Agent 主动触发核心思想：- 上下文总会满，要有办法腾地方- 压缩不是删除，而是总结- 三层策略：持续清理 + 自动触发 + 精确控制"""import osimport subprocessfrom pathlib import Pathfrom anthropic import Anthropic# ============ 配置 ============client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))MODEL = "claude-sonnet-4-20250514"WORKDIR = Path(".")THRESHOLD = 120000# ~60% of 200K context windowSYSTEM = """You are a helpful coding assistant.You have access to tools for file operations and shell commands.Use the `compact` tool when you want to compress the conversation history."""# ============ Token 估算 ============defestimate_tokens(messages: list) -> int:"""    估算消息的 token 数量    使用保守估计：1 token ≈ 4 字符    这个估算足够用于阈值判断    """    total = 0for msg in messages:        content = msg.get("content", "")ifisinstance(content, str):            total += len(content)elifisinstance(content, list):for block in content:ifisinstance(block, dict):                    total += len(str(block.get("content", "")))return total // 4# ============ Layer 1: micro_compact ============defmicro_compact(messages: list):"""    微观压缩：清理 tool_result 中的大块内容    这是第一道防线，每次 LLM 调用前执行    策略：    - 保留前 500 字符    - 保留后 500 字符    - 中间用省略标记替代    """for msg in messages:if msg.get("role") == "user"andisinstance(msg.get("content"), list):for block in msg["content"]:if block.get("type") == "tool_result":                    content = block.get("content", "")# 只压缩超过 1000 字符的内容iflen(content) > 1000:                        block["content"] = (                            content[:500] +f"\n... ({len(content) - 1000} chars truncated)\n" +                            content[-500:]                        )# ============ Layer 2: auto_compact ============defauto_compact(messages: list) -> list:"""    自动压缩：让 LLM 总结历史对话    这是第二道防线，当 token 估算超过阈值时触发    工作流程：    1. 保留最近的几轮对话    2. 让 LLM 总结历史    3. 
用总结替换历史    """# 保留最近的对话    keep_recent = 4iflen(messages) <= keep_recent:return messages    to_compress = messages[:-keep_recent]    recent = messages[-keep_recent:]# 构建总结提示    summary_prompt = """Summarize the conversation so far.Focus on:- Key decisions made- Important context discovered- Current task status- Files modifiedBe concise but preserve essential information for continuing the work."""# 调用 LLM 生成总结    response = client.messages.create(        model=MODEL,        messages=to_compress + [{"role": "user", "content": summary_prompt}],        max_tokens=2000,    )    summary = response.content[0].text# 返回压缩后的消息历史return [        {"role": "user", "content": f"[Previous conversation summary]\n{summary}"},        {"role": "assistant", "content": "Understood. I have the context from the summary. Continuing."},    ] + recent# ============ 工具定义 ============TOOLS = [    {"name": "bash","description": "Run a shell command.","input_schema": {"type": "object","properties": {"command": {"type": "string"}            },"required": ["command"]        }    },    {"name": "read_file","description": "Read file contents.","input_schema": {"type": "object","properties": {"path": {"type": "string"},"limit": {"type": "integer"}            },"required": ["path"]        }    },    {"name": "write_file","description": "Write content to file.","input_schema": {"type": "object","properties": {"path": {"type": "string"},"content": {"type": "string"}            },"required": ["path", "content"]        }    },    {"name": "edit_file","description": "Replace exact text in file.","input_schema": {"type": "object","properties": {"path": {"type": "string"},"old_text": {"type": "string"},"new_text": {"type": "string"}            },"required": ["path", "old_text", "new_text"]        }    },    {"name": "compact","description": "Trigger manual conversation compression.","input_schema": {"type": "object","properties": {"focus": {"type": "string","description": "What to preserve in the summary"   
             }            }        }    }]# ============ 工具处理器 ============defsafe_path(p: str) -> Path:"""安全路径检查，防止路径逃逸"""    path = (WORKDIR / p).resolve()ifnot path.is_relative_to(WORKDIR):raise ValueError(f"Path escapes workspace: {p}")return pathdefrun_bash(command: str) -> str:"""执行 bash 命令"""    dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"]ifany(d in command for d in dangerous):return"Error: Dangerous command blocked"try:        r = subprocess.run(            command, shell=True, cwd=WORKDIR,            capture_output=True, text=True, timeout=120        )        out = (r.stdout + r.stderr).strip()return out[:50000] if out else"(no output)"except subprocess.TimeoutExpired:return"Error: Timeout (120s)"defrun_read(path: str, limit: int = None) -> str:"""读取文件"""try:        lines = safe_path(path).read_text().splitlines()if limit and limit < len(lines):            lines = lines[:limit] + [f"... ({len(lines) - limit} more)"]return"\n".join(lines)[:50000]except Exception as e:returnf"Error: {e}"defrun_write(path: str, content: str) -> str:"""写入文件"""try:        fp = safe_path(path)        fp.parent.mkdir(parents=True, exist_ok=True)        fp.write_text(content)returnf"Wrote {len(content)} bytes"except Exception as e:returnf"Error: {e}"defrun_edit(path: str, old_text: str, new_text: str) -> str:"""编辑文件"""try:        fp = safe_path(path)        content = fp.read_text()if old_text notin content:returnf"Error: Text not found in {path}"        fp.write_text(content.replace(old_text, new_text, 1))returnf"Edited {path}"except Exception as e:returnf"Error: {e}"TOOL_HANDLERS = {"bash": lambda **kw: run_bash(kw["command"]),"read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),"write_file": lambda **kw: run_write(kw["path"], kw["content"]),"edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"], kw["new_text"]),"compact": lambda **kw: "Manual compression requested.",}# ============ Agent Loop ============defagent_loop(messages: list):"""    
Agent 主循环    关键点：    1. 每次 LLM 调用前执行 micro_compact    2. 检查 token 估算，超过阈值执行 auto_compact    3. 检测 compact 工具调用，执行手动压缩    """whileTrue:# Layer 1: micro_compact - 每次调用前清理        micro_compact(messages)# Layer 2: auto_compact - 超过阈值自动压缩if estimate_tokens(messages) > THRESHOLD:print("[auto_compact triggered]")            messages[:] = auto_compact(messages)# 调用 LLM        response = client.messages.create(            model=MODEL,            system=SYSTEM,            messages=messages,            tools=TOOLS,            max_tokens=8000,        )        messages.append({"role": "assistant", "content": response.content})# 检查是否需要继续工具调用if response.stop_reason != "tool_use":return# 处理工具调用        results = []        manual_compact = Falsefor block in response.content:if block.type == "tool_use":# 检测手动压缩触发if block.name == "compact":                    manual_compact = True                    output = "Compressing..."else:                    handler = TOOL_HANDLERS.get(block.name)try:                        output = handler(**block.input) if handler elsef"Unknown tool: {block.name}"except Exception as e:                        output = f"Error: {e}"print(f"> {block.name}: {str(output)[:200]}")                results.append({"type": "tool_result","tool_use_id": block.id,"content": str(output)                })        messages.append({"role": "user", "content": results})# Layer 3: manual_compact - 手动压缩触发if manual_compact:print("[manual compact]")            messages[:] = auto_compact(messages)# ============ 主程序 ============if __name__ == "__main__":    history = []whileTrue:try:            query = input("\033[36ms06 >> \033[0m")except (EOFError, KeyboardInterrupt):breakif query.strip().lower() in ("q", "exit", ""):break        history.append({"role": "user", "content": query})        agent_loop(history)# 打印响应        response_content = history[-1]["content"]ifisinstance(response_content, list):for block in response_content:ifhasattr(block, "text"):print(block.text)print()

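🎯 Practical tips

Tip 1: choose the threshold sensibly

    # Conservative: compress at 50% of the context
    THRESHOLD = 100000  # ~50% of 200K

    # Aggressive: compress at 70% of the context
    THRESHOLD = 140000  # ~70% of 200K

    # Balanced: compress at 60% of the context (recommended)
    THRESHOLD = 120000  # ~60% of 200K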
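
Rather than hard-coding each number, the threshold can be derived from the window size and a percentage. The sketch below reproduces the three values above; `compaction_threshold` and its `reserve_output` parameter (room set aside for the model's reply, for example the `max_tokens=8000` used in the agent loop) are assumptions for illustration.

```python
def compaction_threshold(context_window: int, percent: int, reserve_output: int = 0) -> int:
    """Token count at which compression should trigger.

    percent is an integer (e.g. 60 for 60%) so the arithmetic stays exact.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    if not 0 <= reserve_output < context_window:
        raise ValueError("reserve_output must be smaller than the window")
    return (context_window - reserve_output) * percent // 100


WINDOW = 200_000
for name, pct in (("conservative", 50), ("balanced", 60), ("aggressive", 70)):
    print(name, compaction_threshold(WINDOW, pct))
```

Passing `reserve_output=8000` with the balanced setting gives 115,200 instead of 120,000, which leaves explicit room for the next reply.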

Tip 2: improve the summary prompt

    # Basic version
    summary_prompt = "Summarize the conversation."

    # Improved version: state explicitly what to keep
    summary_prompt = """Summarize the conversation so far.
    Focus on:
    - Key decisions made
    - Important context discovered
    - Current task status
    - Files modified
    Be concise but preserve essential information."""

    # Advanced version: tailor the prompt to the task type
    def get_summary_prompt(task_type: str) -> str:
        if task_type == "coding":
            return """Summarize the coding session.
    Focus on:
    - Files created/modified
    - Key architectural decisions
    - Current implementation status
    - Remaining tasks"""
        elif task_type == "debugging":
            return """Summarize the debugging session.
    Focus on:
    - Error symptoms
    - Root cause identified
    - Fixes applied
    - Test results"""
        # Fall back to the basic prompt for any other task type
        return "Summarize the conversation."

Tip 3: preserve key context

    def auto_compact_smart(messages: list, key_context: list = None) -> list:
        """
        Smart compression: preserve specified key context.

        key_context: indices of the messages that must be preserved
        """
        if not key_context:
            return auto_compact(messages)

        # Extract the key context
        preserved = [messages[i] for i in key_context if i < len(messages)]

        # Compress everything else
        to_compress = [m for i, m in enumerate(messages) if i not in key_context]
        # ... run the compression logic
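
The elided compression step could be completed in several ways. The sketch below is one possibility, not the author's implementation: it takes the summarizer as a plain callable so it can be tried without an API key, and it assumes plain text messages (tool_use/tool_result pairs would need the same care as in auto_compact). The name `compact_preserving` and the `summarize` parameter are hypothetical.

```python
from typing import Callable, Iterable, List, Optional


def compact_preserving(messages: List[dict],
                       key_context: Optional[Iterable[int]],
                       summarize: Callable[[List[dict]], str],
                       keep_recent: int = 4) -> List[dict]:
    """Summarize old messages but carry the ones listed in key_context over verbatim."""
    keep = {i for i in (key_context or []) if 0 <= i < len(messages)}
    cut = max(0, len(messages) - keep_recent)

    old = list(enumerate(messages[:cut]))
    preserved = [m for i, m in old if i in keep]
    to_compress = [m for i, m in old if i not in keep]
    if not to_compress:
        return messages  # nothing worth summarizing yet

    summary = summarize(to_compress)
    return [
        {"role": "user", "content": f"[Previous conversation summary]\n{summary}"},
        {"role": "assistant", "content": "Understood. I have the context from the summary. Continuing."},
    ] + preserved + messages[cut:]


# Usage with a stand-in summarizer (a real one would call the LLM).
demo = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"m{i}"} for i in range(10)]
compacted = compact_preserving(demo, [0, 2], lambda ms: f"{len(ms)} messages summarized")
```

Injecting the summarizer keeps the selection logic testable on its own; in the agent it would wrap the same `client.messages.create` call that auto_compact uses.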

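📊 Comparing the results

No compression vs. three-layer compression

Scenario: a 50-round conversation with many file reads and bash commands

Without compression:
- Token usage: 180K / 200K (90%)
- API cost: $3.60
- Quality: starts to degrade (at 40% context usage)
- Outcome: context overflow, the conversation is cut off

With three-layer compression:
- Token usage: 40K on average (20%)
- API cost: $0.80
- Quality: stays stable
- Outcome: the conversation can continue indefinitely

Cost saving: 78%
Quality: stable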
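
The 78% figure can be checked from the numbers in the comparison. The short sketch below does the arithmetic; `savings_percent` is a hypothetical helper, and the observation that cost tracks token usage holds only for the figures quoted above.

```python
def savings_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


cost_saving = savings_percent(3.60, 0.80)        # API cost in dollars
token_saving = savings_percent(180_000, 40_000)  # context tokens in use

print(f"cost:   {cost_saving:.1f}%")
print(f"tokens: {token_saving:.1f}%")
```

Both reductions come out at roughly 77.8%, which rounds to the 78% quoted above and suggests the cost figures were derived in proportion to token usage.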

🔗 Relationship to the other mechanisms

s01 Agent Loop
    ↓
s02 Tool Use
    ↓
s03 TodoWrite ──────┐
    ↓               │
s04 Subagents       │
    ↓               │
s05 Skills          │
    ↓               │
[ s06 Context Compact ] ←── compresses TodoWrite history
    ↓               │       compresses subagent return values
s07 Task system ────┘       compresses loaded Skills content
    ↓
...

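💭 Questions to think about

Threshold choice: why is it recommended to compress at 60% of the context rather than waiting until 90%?
Compression timing: micro_compact runs before every call. Does that hurt performance, and how could it be optimized?
Summary quality: how can you make sure the LLM-generated summary keeps all the key information?
Compression granularity: is keeping the last 4 messages (keep_recent = 4) appropriate? When would it need adjusting?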

📚 Further reading

Claude Code official documentation – context management^[1]
learn-claude-code on GitHub^[2]
Claude Context Windows^[3]

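🎓 Summary

This article covered the context compression mechanisms of an agent:

Layer	Name	Trigger	Core action
Layer 1	micro_compact	Before every LLM call	Trim large tool_result contents
Layer 2	auto_compact	Token estimate exceeds the threshold	LLM summarizes the conversation history
Layer 3	manual_compact	Triggered explicitly by the user or the agent	Summarize with a specified focus

Key takeaway:

The context will always fill up, so there must be a way to make room. Compression is not deletion, it is summarization.

The three layers work together:

Continuous cleanup: micro_compact prevents buildup
Automatic trigger: auto_compact needs no intervention
Precise control: manual_compact keeps what matters

Next: s07 Task System^[4]: break big goals into small tasks, put them in order, and record them on disk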