Harness Engineering, Dissecting an AI Coding Assistant (Part 3): Inside the Core, the Six Modules Behind a Single Instruction (2)

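The previous post covered the LLM client (the foundation for everything) and the tool system (the agent's hands and feet).

Today we take apart the last three of the six modules: context compression (memory management), session persistence, and the system prompt.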

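Context Compression (context.py, 197 lines)

What it is

This is the most technically demanding module. It solves a core problem: an LLM's context window is finite. As a conversation grows and the message history exceeds the model's context limit, the API simply returns an error. The context compression module intelligently "condenses" the history before that error occurs.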

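Why three layers

One intuitive approach is a blunt cut: when the context is nearly full, delete the oldest messages. That loses important information, because the LLM may have made a key decision in round 3 that it still needs to refer to in round 20.

CoreCoder uses a progressive compression strategy with three layers of increasing aggressiveness: Snip at 50% of the context limit, Summarize at 70%, and Hard Collapse at 90%.

💡 For PMs: this is like human memory management. When short-term memory fills up you don't suddenly go blank. You first forget unimportant details (Snip), then write a summary of older memories (Summarize), and only in extreme cases keep nothing but the most critical information (Collapse).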

Layer 1: Snip (50% threshold)

    # corecoder/context.py:69-94
    @staticmethod
    def _snip_tool_outputs(messages: list[dict]) -> bool:
        changed = False
        for m in messages:
            if m.get("role") != "tool":         # only tool output is processed
                continue
            content = m.get("content", "")
            if len(content) <= 1500:            # short output is left alone
                continue
            lines = content.splitlines()
            if len(lines) <= 6:                 # so is output with few lines
                continue
            # keep the first 3 lines + the last 3 lines
            snipped = (
                "\n".join(lines[:3])
                + f"\n... ({len(lines)} lines, snipped to save context) ...\n"
                + "\n".join(lines[-3:])
            )
            m["content"] = snipped
            changed = True
        return changed

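Why start as early as 50%? Because tool output usually takes up the bulk of the context. A single bash tool call may return 3,000 characters of test output, and a single read_file may return a 2,000-line file. Truncating these large chunks early postpones the more expensive LLM summarization (Layer 2 needs one extra LLM call, which has a cost).

This layer is pure string manipulation with no LLM call, so its cost is zero. Only messages with the tool role are trimmed, and only when they are longer than 1500 characters and have more than 6 lines. User and assistant messages are left untouched.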
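To see what this rule does to a single message, here is a small standalone sketch that applies the same head-and-tail trimming to one string. snip_text is a hypothetical helper written for illustration, not a CoreCoder function. The 1500-character limit and the 3-line head and tail mirror the excerpt above.

```python
def snip_text(content: str, max_chars: int = 1500, keep: int = 3) -> str:
    """Keep the first and last `keep` lines of a long, multi-line string.

    Hypothetical helper for illustration; not part of CoreCoder.
    """
    if len(content) <= max_chars:
        return content          # short output is returned unchanged
    lines = content.splitlines()
    if len(lines) <= 2 * keep:
        return content          # too few lines to be worth trimming
    return (
        "\n".join(lines[:keep])
        + f"\n... ({len(lines)} lines, snipped to save context) ...\n"
        + "\n".join(lines[-keep:])
    )


# A fake 200-line test log, well over 1500 characters.
log = "\n".join(f"test_case_{i:03d} ... ok" for i in range(200))
snipped = snip_text(log)
print(snipped)
```

The trimmed result has 7 lines: three from the head, one marker line, and three from the tail. Everything in between is gone for good, which is acceptable for verbose logs but is also why this layer is limited to tool output.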

Layer 2: Summarize (70% threshold)

    # corecoder/context.py:96-117
    def _summarize_old(self, messages: list[dict], llm: LLM | None,
                       keep_recent: int = 8) -> bool:
        if len(messages) <= keep_recent:
            return False
        old = messages[:-keep_recent]       # old messages (all but the last 8)
        tail = messages[-keep_recent:]      # recent messages (the last 8)
        summary = self._get_summary(old, llm)   # generate a summary with the LLM
        messages.clear()
        messages.append({
            "role": "user",
            "content": f"[Context compressed - conversation summary]\n{summary}",
        })
        messages.append({
            "role": "assistant",
            "content": "Got it, I have the context from our earlier conversation.",
        })
        messages.extend(tail)
        return True

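This code does two things.

First, it splits the messages into old (everything except the last 8) and tail (the last 8).

Second, it uses the LLM to compress the old part into a summary, then replaces the message list with: [summary] + [a short assistant acknowledgement] + [the last 8 messages].

Note the messages.clear() + messages.extend() pattern: it modifies the passed-in list in place (rather than returning a new list), so the self.messages reference held by the Agent sees the update automatically.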
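The difference between mutating a list in place and rebinding a local name is easy to miss, so here is a minimal sketch of both. The two function names and the "[summary]" placeholder are made up for this example. Only the clear() and extend() pattern comes from the code above.

```python
def compress_in_place(messages: list[dict]) -> None:
    """Mutate the caller's list: every reference to it sees the change."""
    tail = messages[-2:]
    messages.clear()
    messages.append({"role": "user", "content": "[summary]"})
    messages.extend(tail)


def compress_by_rebinding(messages: list[dict]) -> None:
    """Rebind the local name only: the caller's list is left untouched."""
    messages = [{"role": "user", "content": "[summary]"}] + messages[-2:]


history = [{"role": "user", "content": f"msg {i}"} for i in range(10)]
agent_view = history            # a second reference, like Agent.self.messages

compress_by_rebinding(history)
length_after_rebinding = len(agent_view)   # still 10, nothing changed

compress_in_place(history)
length_after_in_place = len(agent_view)    # 3: summary + last two messages

print(length_after_rebinding, length_after_in_place)
```

With the rebinding version the compression would silently have no effect unless the caller also reassigned its own reference, which is one plausible reason to prefer in-place mutation here.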

Fallback when the LLM is unavailable:

    # corecoder/context.py:135-161
    def _get_summary(self, messages: list[dict], llm: LLM | None) -> str:
        flat = self._flatten(messages)
        if llm:
            try:
                resp = llm.chat(
                    messages=[
                        {
                            "role": "system",
                            "content": (
                                "Compress this conversation into a brief summary. "
                                "Preserve: file paths edited, key decisions made, "
                                "errors encountered, current task state. "
                                "Drop: verbose command output, code listings, "
                                "redundant back-and-forth."
                            ),
                        },
                        {"role": "user", "content": flat[:15000]},
                    ],
                )
                return resp.content
            except Exception:
                pass    # LLM summarization failed, fall through to the fallback
        # Fallback: extract file paths and error messages with regexes
        return self._extract_key_info(messages)

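The summarization system prompt tells the LLM explicitly what to keep and what to drop: file paths, key decisions, and errors are preserved, while verbose command output and code listings can be discarded.

The fallback, _extract_key_info(), uses nothing but regexes and string checks to pull out file paths (anything matching the xxx.yyy form) and lines that contain "error":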

    # corecoder/context.py:173-196
    @staticmethod
    def _extract_key_info(messages: list[dict]) -> str:
        import re
        files_seen = set()
        errors = []
        for m in messages:
            text = m.get("content", "") or ""
            for match in re.finditer(r'[\w./\-]+\.\w{1,5}', text):
                files_seen.add(match.group())
            for line in text.splitlines():
                if 'error' in line.lower() or 'Error' in line:
                    errors.append(line.strip()[:150])
        parts = []
        if files_seen:
            parts.append(f"Files touched: {', '.join(sorted(files_seen)[:20])}")
        if errors:
            parts.append(f"Errors seen: {'; '.join(errors[:5])}")
        return "\n".join(parts) or "(no extractable context)"

🔧 The fallback is crude, but it guarantees that compression can proceed even when the LLM is unavailable. This is an important engineering principle: a fallback on the critical path must have zero external dependencies.
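To get a feel for how coarse the path pattern is, the sketch below runs the same regular expression over one made-up sentence. The sample text is invented for illustration. The pattern is copied from the excerpt above.

```python
import re

# Same pattern as in _extract_key_info(); the sample text is made up.
PATH_PATTERN = r'[\w./\-]+\.\w{1,5}'

sample = "Edited src/app/main.py and README.md, then ran the tests on Python 3.11"
found = sorted({m.group() for m in re.finditer(PATH_PATTERN, sample)})
print(found)
# → ['3.11', 'README.md', 'src/app/main.py']
```

Both file paths are captured, but so is the version number 3.11, because the pattern only looks for "something, a dot, then one to five word characters". For a zero-dependency last resort that trade-off seems reasonable: a noisy summary is better than none.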

Layer 3: Hard Collapse (90% threshold)

    # corecoder/context.py:119-133
    def _hard_collapse(self, messages: list[dict], llm: LLM | None):
        tail = messages[-4:] if len(messages) > 4 else messages[-2:]
        summary = self._get_summary(messages[:-len(tail)], llm)
        messages.clear()
        messages.append({
            "role": "user",
            "content": f"[Hard context reset]\n{summary}",
        })
        messages.append({
            "role": "assistant",
            "content": "Context restored. Continuing from where we left off.",
        })
        messages.extend(tail)

This is the last line of defense. When context usage reaches 90%, only the summary plus the last 4 messages are kept (more aggressive than Layer 2's 8), which keeps a context overflow from turning into an API error.

The entry point: maybe_compress()

The scheduling logic for the three layers lives in the maybe_compress() method (lines 45-67):

    # corecoder/context.py:45-67
    def maybe_compress(self, messages: list[dict], llm: LLM | None = None) -> bool:
        current = estimate_tokens(messages)
        compressed = False

        # Layer 1: snip verbose tool outputs
        if current > self._snip_at:
            if self._snip_tool_outputs(messages):
                compressed = True
                current = estimate_tokens(messages)   # re-estimate

        # Layer 2: LLM-powered summarization
        if current > self._summarize_at and len(messages) > 10:
            if self._summarize_old(messages, llm, keep_recent=8):
                compressed = True
                current = estimate_tokens(messages)   # re-estimate

        # Layer 3: hard collapse
        if current > self._collapse_at and len(messages) > 4:
            self._hard_collapse(messages, llm)
            compressed = True

        return compressed

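Note that the token count is re-estimated after each layer to decide whether the next layer is needed. A single maybe_compress() call may therefore trigger only Layer 1 (snipping already brought usage below 70%), or all three layers in an extreme case. Layers 2 and 3 also require a minimum number of messages (more than 10 and more than 4 respectively), so a short conversation made of a few huge messages never reaches them.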
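The excerpt does not show where _snip_at, _summarize_at, and _collapse_at come from. A minimal sketch, assuming they are simply 50%, 70%, and 90% of a maximum token budget, could look like this. Thresholds, thresholds_for, exceeded_layers, and the 100,000-token budget are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    snip_at: int
    summarize_at: int
    collapse_at: int


def thresholds_for(max_tokens: int) -> Thresholds:
    """Derive the three trigger points from a token budget (assumed 50/70/90%)."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return Thresholds(
        snip_at=max_tokens * 50 // 100,
        summarize_at=max_tokens * 70 // 100,
        collapse_at=max_tokens * 90 // 100,
    )


def exceeded_layers(current: int, t: Thresholds) -> list[str]:
    """List the layers whose threshold the current usage is strictly above."""
    layers = []
    if current > t.snip_at:
        layers.append("snip")
    if current > t.summarize_at:
        layers.append("summarize")
    if current > t.collapse_at:
        layers.append("collapse")
    return layers


limits = thresholds_for(100_000)
print(exceeded_layers(60_000, limits))   # only the first threshold is crossed
print(exceeded_layers(95_000, limits))   # all three thresholds are crossed
```

This only classifies a starting usage level. The real method re-estimates after every layer, so a conversation that starts at 95% may still stop after Layer 1 if snipping frees enough space.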

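🔧 estimate_tokens() uses a minimal estimation formula: len(text) // 3 (lines 22-24), i.e. roughly 3 characters = 1 token. That is a reasonable approximation for mixed Chinese and English content. Exact token counts require a tokenizer such as tiktoken, but that adds an extra dependency and performance overhead.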
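As a rough illustration of the three-characters-per-token rule, the sketch below applies len(text) // 3 to each message and sums the results. rough_token_estimate is a hypothetical re-implementation. How the real estimate_tokens() walks the message list is not shown in this article, so the per-message summation is an assumption.

```python
def rough_token_estimate(messages: list[dict]) -> int:
    """Approximate token usage as len(text) // 3, summed per message.

    Hypothetical re-implementation for illustration only.
    """
    total = 0
    for m in messages:
        text = m.get("content") or ""   # tolerate a missing or None content
        total += len(text) // 3
    return total


sample_messages = [
    {"role": "user", "content": "hello world!"},   # 12 characters -> 4
    {"role": "assistant", "content": "你好，世界"},   # 5 characters -> 1
    {"role": "assistant", "content": None},        # no text -> 0
]
estimate = rough_token_estimate(sample_messages)
print(estimate)
# → 5
```

The number is only a proxy. Its job is to decide when to compress, and for that purpose a cheap estimate that errs in a consistent direction is usually enough.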

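Session Persistence (session.py, 69 lines)

What it is

It saves the current conversation state to disk so it can be resumed later, much like a game's save/load feature.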

How it works

Save (lines 15-31):

    # corecoder/session.py:15-31
    def save_session(messages: list[dict], model: str, session_id: str | None = None) -> str:
        SESSIONS_DIR.mkdir(parents=True, exist_ok=True)   # make sure ~/.corecoder/sessions/ exists
        if not session_id:
            session_id = f"session_{int(time.time())}"    # build the ID from a timestamp
        data = {
            "id": session_id,
            "model": model,
            "saved_at": time.strftime("%Y-%m-%d %H:%M:%S"),
            "messages": messages,
        }
        path = SESSIONS_DIR / f"{session_id}.json"
        path.write_text(json.dumps(data, ensure_ascii=False, indent=2))
        return session_id

Load (lines 34-41):

    # corecoder/session.py:34-41
    def load_session(session_id: str) -> tuple[list[dict], str] | None:
        path = SESSIONS_DIR / f"{session_id}.json"
        if not path.exists():
            return None
        data = json.loads(path.read_text())
        return data["messages"], data["model"]

List (lines 44-68): iterates over the sessions directory, sorts by time in descending order, uses the first user message as a preview, and returns at most 20 sessions.
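The listing code is not reproduced in this article, so the following is only a sketch of the behavior just described. It assumes that "time" means the saved_at field written by save_session() and that the preview is cut at 80 characters. The function takes the directory as a parameter so it can be tried against a temporary folder. Its name and signature are not taken from session.py.

```python
import json
import tempfile
from pathlib import Path


def list_sessions(sessions_dir: Path, limit: int = 20) -> list[dict]:
    """Return up to `limit` sessions, newest first, with a short preview."""
    entries = []
    for path in sessions_dir.glob("*.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or corrupt files
        # First user message, or an empty string if there is none.
        preview = next(
            (m.get("content") or "" for m in data.get("messages", [])
             if m.get("role") == "user"),
            "",
        )
        entries.append({
            "id": data.get("id", path.stem),
            "saved_at": data.get("saved_at", ""),
            "preview": preview[:80],
        })
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    entries.sort(key=lambda e: e["saved_at"], reverse=True)
    return entries[:limit]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sid, ts in [("session_1", "2024-01-01 10:00:00"),
                    ("session_2", "2024-01-02 10:00:00")]:
        payload = {
            "id": sid,
            "model": "demo-model",
            "saved_at": ts,
            "messages": [{"role": "user", "content": f"hello from {sid}"}],
        }
        (root / f"{sid}.json").write_text(
            json.dumps(payload, ensure_ascii=False), encoding="utf-8"
        )
    listing = list_sessions(root)

print([entry["id"] for entry in listing])
```

Skipping corrupt files instead of raising keeps one damaged save from hiding every other session in the list.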

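💡 For PMs: the ensure_ascii=False argument lets the JSON file store Chinese text as-is. Without it, Chinese characters are escaped as \uXXXX sequences. Programs still read the file correctly, but it is no longer readable by humans.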
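The effect is easy to check with the standard json module. The record below is a made-up example that contains Chinese text.

```python
import json

# Made-up record with Chinese text ("fix the login bug").
record = {"content": "修复登录 bug"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps the characters as-is

print(escaped)
print(readable)

# Both forms decode back to exactly the same data.
same_after_loading = json.loads(escaped) == json.loads(readable) == record
```

Because the readable form contains non-ASCII characters, it is safer to pass encoding="utf-8" explicitly when writing it with Path.write_text() and reading it back, rather than relying on the platform's default encoding.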

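System Prompt (prompt.py, 34 lines)

What it is

The system prompt is the "role definition" placed at the front of the message list on every LLM call. It tells the LLM who it is, where it is running, which tools are available, and which rules to follow.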

How it works

The full code (all 34 lines of prompt.py):

    # corecoder/prompt.py (full code)
    import os
    import platform


    def system_prompt(tools) -> str:
        cwd = os.getcwd()
        tool_list = "\n".join(f"- **{t.name}**: {t.description}" for t in tools)
        uname = platform.uname()

        return f"""\
    You are CoreCoder, an AI coding assistant running in the user's terminal.
    You help with software engineering: writing code, fixing bugs, refactoring,
    explaining code, running commands, and more.

    # Environment
    - Working directory: {cwd}
    - OS: {uname.system} {uname.release} ({uname.machine})
    - Python: {platform.python_version()}

    # Tools
    {tool_list}

    # Rules
    1. **Read before edit.** Always read a file before modifying it.
    2. **edit_file for small changes.** Use edit_file for targeted edits; write_file
       only for new files or complete rewrites.
    3. **Verify your work.** After making changes, run relevant tests or commands.
    4. **Be concise.** Show code over prose. Explain only what's necessary.
    5. **One step at a time.** For multi-step tasks, execute them sequentially.
    6. **edit_file uniqueness.** Include enough surrounding context in old_string.
    7. **Respect existing style.** Match the project's coding conventions.
    8. **Ask when unsure.** If the request is ambiguous, ask for clarification.
    """

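Four kinds of information are injected dynamically: the working directory, the operating system details, the Python version, and the tool list.

Of the 8 rules, rule 2 (prefer edit_file) and rule 6 (keep old_string unique) directly serve the design of the search-and-replace edit tool.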

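🔧 The system prompt is generated once when the Agent is constructed (self._system = system_prompt(self.tools), agent.py line 34) and is never updated afterwards. So if the user cds into another directory during the session, the "Working directory" in the system prompt is not refreshed. This is a known simplification: Claude Code's prompts.ts (914 lines) regenerates the prompt dynamically on every call, but is far more complex as a result.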
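The staleness is easy to reproduce in isolation. The sketch below builds one environment line up front and another after the process has changed directory. environment_line is a hypothetical stand-in for the prompt builder, and os.chdir() stands in for whatever changes the working directory during a session.

```python
import os
import tempfile


def environment_line() -> str:
    """Hypothetical stand-in for the prompt's working-directory line."""
    return f"- Working directory: {os.getcwd()}"


start_dir = os.getcwd()
cached = environment_line()      # built once, like a prompt made in __init__

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        fresh = environment_line()   # rebuilt after the directory changed
    finally:
        os.chdir(start_dir)          # always restore the original directory

print(cached)
print(fresh)
```

The cached line still points at the starting directory while the fresh one reflects the move. Rebuilding the prompt on every call would close that gap at the cost of extra work per request, which is the trade-off described above.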

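Closing thoughts

At this point we have taken apart all the underlying modules: the LLM client (the foundation for everything), the tool system including sub-agents (the agent's hands and feet), context compression (memory management), session persistence, and the system prompt. In the next post we zoom out and look at the 5 core design patterns that run through all of these modules: not just the code, but the thinking behind it.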