Harness Engineering, Dissecting an AI Coding Assistant (Part 3): Into the Kernel. Behind One Instruction, Six Modules Each Doing Their Job (1)

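The previous chapter walked through the system's runtime flow from a high level. This chapter opens the system up and dissects the internals of each module one by one. We follow the runtime dependency order: first the LLM client (the foundation of everything), then the tool system (the agent's hands and feet), then context compression (memory management), and finally session persistence and the system prompt.

Because there is a lot of ground to cover, this part is split into two posts. This post covers the LLM client and the tool system.

Let's begin!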

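LLM Client (llm.py, 200 lines)

What it is

The LLM client is the system's mouthpiece: all communication with the language model passes through these 200 lines. It is built on the OpenAI SDK and provides three core capabilities: streaming response handling, exponential-backoff retry, and cost estimation.

💡 For PMs: think of the LLM client as an interpreter. Whatever the agent (the brain) wants to say, and whichever tool it wants to call, goes through this interpreter to the remote AI model. The interpreter phrases the message clearly (assembles the API request), brings back the reply (parses the streamed response), repeats itself when the other side did not catch it (retry), and keeps track of how much was spent (cost estimation).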

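Why a thin wrapper is needed

You might ask: the OpenAI SDK already exists, so why wrap it in another layer instead of using it directly?

There are three reasons:

Streaming tool_calls must be assembled by hand. The OpenAI SDK only returns the raw chunk stream; one complete tool call is split across several delta fragments, and the caller has to accumulate and join them.

Retry logic should not be scattered through business code. Network jitter and rate limiting are the norm, so retries belong in the lowest layer.

Cost tracking needs a single entry point. Token usage from every call has to be summed, and only centralized bookkeeping gives an accurate total.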

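Core capability 1: streaming response handling

The chat() method (lines 105-182) is the heart of the file. When the agent calls llm.chat(), far more happens than "send a request, get a result."

Start with the method signature and the request parameters: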

# corecoder/llm.py:105-127
def chat(
    self,
    messages: list[dict],
    tools: list[dict] | None = None,
    on_token=None,
) -> LLMResponse:
    params: dict = {
        "model": self.model,
        "messages": messages,
        "stream": True,           # key point: always stream
        **self.extra,
    }
    if tools:
        params["tools"] = tools

    # stream_options is an OpenAI extension; not every provider supports it
    try:
        params["stream_options"] = {"include_usage": True}
        stream = self._call_with_retry(params)
    except Exception:
        params.pop("stream_options", None)
        stream = self._call_with_retry(params)

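🔧 stream_options = {"include_usage": True} is an important detail. In streaming mode the OpenAI API does not return token usage by default; it has to be requested explicitly. Other providers such as DeepSeek or Qwen may not recognize the field and reject the request. The code handles this with a simple compatibility strategy: send the request with stream_options first, and if that fails, drop the field and try again. This yields exact token counts on OpenAI without crashing elsewhere. One consequence of catching a bare Exception is that a failure unrelated to stream_options also triggers a second attempt without the field.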

Next comes the most important part: accumulating tool_calls chunk by chunk.

# corecoder/llm.py:129-162
content_parts: list[str] = []
tc_map: dict[int, dict] = {}  # index -> {id, name, args}
prompt_tok = 0
completion_tok = 0

for chunk in stream:
    # token usage arrives in the final chunk
    if chunk.usage:
        prompt_tok = chunk.usage.prompt_tokens
        completion_tok = chunk.usage.completion_tokens

    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # accumulate text content
    if delta.content:
        content_parts.append(delta.content)
        if on_token:
            on_token(delta.content)    # live callback so the CLI can render the stream

    # accumulate tool_calls: this is the subtle part
    if delta.tool_calls:
        for tc_delta in delta.tool_calls:
            idx = tc_delta.index       # index distinguishes separate tool_calls
            if idx not in tc_map:
                tc_map[idx] = {"id": "", "name": "", "args": ""}
            if tc_delta.id:
                tc_map[idx]["id"] = tc_delta.id
            if tc_delta.function:
                if tc_delta.function.name:
                    tc_map[idx]["name"] = tc_delta.function.name
                if tc_delta.function.arguments:
                    tc_map[idx]["args"] += tc_delta.function.arguments  # note: += concatenates

Why accumulate at all? Beginners often trip over this. The answer lies in how an LLM streams its output.

When the LLM decides to call a tool, for example edit_file(file_path='main.py', old_string='...', new_string='...'), it does not emit the whole call at once. It produces it token by token, the way a person speaks word by word. In the streaming API this shows up as multiple delta fragments:

chunk 1: tc_delta.id = "call_abc", tc_delta.function.name = "edit_file"
chunk 2: tc_delta.function.arguments = '{"file_path": "main'
chunk 3: tc_delta.function.arguments = '.py", "old_string": "from ut'
chunk 4: tc_delta.function.arguments = 'ils import halper", "new_string"...'
chunk 5: tc_delta.function.arguments = ': "from utils import helper"}'

Each chunk carries only one fragment of arguments. The line tc_map[idx]["args"] += tc_delta.function.arguments joins all fragments into one complete JSON string, which is then parsed in a single pass on line 169:

# corecoder/llm.py:164-172
parsed: list[ToolCall] = []
for idx in sorted(tc_map):
    raw = tc_map[idx]
    try:
        args = json.loads(raw["args"])
    except (json.JSONDecodeError, KeyError):
        args = {}    # on a parse failure pass empty arguments and let the tool report the error
    parsed.append(ToolCall(id=raw["id"], name=raw["name"], arguments=args))
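To make the accumulation concrete, here is a minimal, self-contained sketch that replays fragments modeled on the walkthrough above without calling any API. The `fragment` helper and the `accumulate` function are hypothetical stand-ins written for this illustration (the real deltas are SDK objects, and the real code builds ToolCall instances); the fourth fragment is completed here only so the joined string is valid JSON.

```python
import json
from types import SimpleNamespace as NS


def fragment(index, id=None, name=None, arguments=None):
    """Hypothetical stand-in for one streamed tool_call delta."""
    return NS(index=index, id=id, function=NS(name=name, arguments=arguments))


# Five fragments for tool call index 0, adapted from the example in the text.
deltas = [
    fragment(0, id="call_abc", name="edit_file"),
    fragment(0, arguments='{"file_path": "main'),
    fragment(0, arguments='.py", "old_string": "from ut'),
    fragment(0, arguments='ils import halper", "new_string"'),
    fragment(0, arguments=': "from utils import helper"}'),
]


def accumulate(deltas):
    """Join argument fragments per index, then parse each joined string once."""
    tc_map = {}
    for d in deltas:
        slot = tc_map.setdefault(d.index, {"id": "", "name": "", "args": ""})
        if d.id:
            slot["id"] = d.id
        if d.function and d.function.name:
            slot["name"] = d.function.name
        if d.function and d.function.arguments:
            slot["args"] += d.function.arguments  # concatenate, never overwrite
    calls = []
    for idx in sorted(tc_map):
        raw = tc_map[idx]
        try:
            args = json.loads(raw["args"])
        except json.JSONDecodeError:
            args = {}  # same fallback as the article's code: empty arguments
        calls.append({"id": raw["id"], "name": raw["name"], "arguments": args})
    return calls


calls = accumulate(deltas)
print(calls[0]["name"], calls[0]["arguments"]["file_path"])
```

No single fragment is valid JSON on its own, which is why parsing has to wait until the stream has ended.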

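💡 For PMs: picture streaming output as a delivery (the LLM's tool call) that the courier does not bring in one trip but as many small parcels arriving one after another. You first collect every parcel (accumulate the deltas), then assemble them in order (parse the JSON), and only then can you see the complete item (the tool call arguments).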

Core capability 2: exponential-backoff retry

The _call_with_retry() method (lines 184-199) handles every "the network is flaky" scenario:

# corecoder/llm.py:184-199
def _call_with_retry(self, params: dict, max_retries: int = 3):
    """Retry on transient errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return self.client.chat.completions.create(**params)
        except (RateLimitError, APITimeoutError, APIConnectionError) as e:
            if attempt == max_retries - 1:
                raise               # the last attempt failed too: give up and raise
            wait = 2 ** attempt     # 1s, then 2s (with max_retries=3 the third failure is raised)
            time.sleep(wait)
        except APIError as e:
            # 5xx = server error, retry; 4xx = client error, do not retry
            if e.status_code and e.status_code >= 500 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)
            else:
                raise

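The design philosophy is to retry only errors that have a chance of succeeding: rate limits, timeouts, connection failures, and 5xx server errors are retried, while 4xx client errors are raised immediately, because sending the same bad request again cannot help.

🔧 Exponential backoff is a classic strategy in distributed systems. wait = 2 ** attempt means waiting 1 second after the first failure and 2 seconds after the second. With the default max_retries = 3 the third failure is raised straight away, so a 4-second wait only occurs when max_retries is set higher. Growing delays give the server room to recover and help avoid a cascade: if 1,000 clients hit a rate limit at the same moment and all retry immediately, the server only gets worse.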
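The wait schedule is easy to check with a small stand-alone sketch. It mirrors the loop shape of _call_with_retry() but is not the project's code: `TransientError`, `call_with_retry`, and the injectable `sleep` parameter are hypothetical names introduced here so the delays can be recorded instead of actually slept.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for RateLimitError / APITimeoutError."""


def call_with_retry(fn, max_retries=3, sleep=time.sleep):
    """Call fn(), backing off 2**attempt seconds between failed attempts."""
    for attempt in range(max_retries):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(2 ** attempt)


waits = []            # delays that would have been slept, in order
attempts = {"n": 0}   # how many times the flaky function was called


def flaky():
    """Fail twice, then succeed on the third call."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("rate limited")
    return "ok"


result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)
```

With three attempts allowed, only two delays can ever occur (1 s and 2 s). Production clients often add random jitter to each delay so that many clients do not retry in lockstep; that is a common refinement, not something the code shown in this article does.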

Core capability 3: cost estimation

The estimated_cost property (lines 93-103) acts as a live cost calculator:

# corecoder/llm.py:48-76 (pricing table)
_PRICING = {
    # OpenAI
    "gpt-5.4": (2.5, 15),          # (input $/M tokens, output $/M tokens)
    "gpt-4o": (2.5, 10),
    "gpt-4o-mini": (0.15, 0.6),
    # DeepSeek
    "deepseek-chat": (0.27, 1.10),
    # Anthropic Claude
    "claude-opus-4-6": (5, 25),
    # Alibaba Qwen
    "qwen-max": (0.78, 3.9),
    # Moonshot Kimi
    "kimi-k2.5": (0.6, 3),
    # ... 15+ models in total
}

# corecoder/llm.py:93-103
@property
def estimated_cost(self) -> float | None:
    pricing = _PRICING.get(self.model)
    if not pricing:
        return None       # unknown model: return None rather than guess
    input_rate, output_rate = pricing
    return (
        self.total_prompt_tokens * input_rate / 1_000_000
        + self.total_completion_tokens * output_rate / 1_000_000
    )
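A worked example helps make the units concrete. The sketch below reimplements the same formula as a free function; `estimate_cost` and the token counts are made up for illustration, and the single pricing entry is copied from the table excerpt above.

```python
# One entry copied from the pricing excerpt: (input $/M tokens, output $/M tokens).
PRICING = {"gpt-4o-mini": (0.15, 0.6)}


def estimate_cost(model, prompt_tokens, completion_tokens):
    """Return the estimated cost in dollars, or None for an unknown model."""
    pricing = PRICING.get(model)
    if pricing is None:
        return None
    input_rate, output_rate = pricing
    return (
        prompt_tokens * input_rate / 1_000_000
        + completion_tokens * output_rate / 1_000_000
    )


# Hypothetical session: 120,000 prompt tokens and 8,000 completion tokens.
cost = estimate_cost("gpt-4o-mini", 120_000, 8_000)
print(f"${cost:.4f}")
print(estimate_cost("some-unlisted-model", 1, 1))
```

By hand: 120,000 × 0.15 / 1,000,000 = 0.018 and 8,000 × 0.6 / 1,000,000 = 0.0048, for a total of about $0.0228. Prompt tokens dominate here even though the output rate is four times higher, which is typical of multi-round tool use where the whole conversation is resent on every call.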

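💡 For PMs: cost estimation tells users in real time how much they have spent. This matters most in multi-round tool use, where one complex task can involve 20+ LLM calls. Typing /tokens in the CLI shows the token usage and cost of the current session.

🔧 Note that the pricing table is hard-coded, so it has to be updated by hand whenever a provider changes its prices. A production system would want to load prices from a source that can be updated without a code change, such as a config file or a provider pricing endpoint where one exists. For CoreCoder's teaching purposes, hard-coding is easier to follow.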

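Tool System (tools/, 478 lines in total)

The tool system is the agent's hands and feet: the LLM does the thinking and the tools do the work. It consists of 9 files: 1 base class, 1 registry, and 7 concrete tools.

Tool base class (base.py, 28 lines)

What it is: the "birth certificate" of every tool. It defines the interface each tool must implement.

Why: the agent needs a uniform way to call different tools. Whether a tool runs a command, reads or writes a file, or spawns a sub-agent, the agent uses the same get_tool(name) → execute(**kwargs) pattern. This is the classic Strategy pattern.

How: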

# corecoder/tools/base.py (full source, only 28 lines)
from abc import ABC, abstractmethod


class Tool(ABC):
    """Minimal tool interface. Subclass this to add new capabilities."""

    name: str           # tool name, e.g. "bash" or "edit_file"
    description: str    # natural-language description shown to the LLM
    parameters: dict    # JSON Schema describing the accepted arguments

    @abstractmethod
    def execute(self, **kwargs) -> str:
        """Run the tool and return a text result."""
        ...

    def schema(self) -> dict:
        """OpenAI function-calling schema."""
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }

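🔧 schema() returns a tool definition in the OpenAI function-calling format, with the parameters expressed as JSON Schema. Many providers with OpenAI-compatible APIs accept this format (OpenAI, DeepSeek, Qwen, Kimi, and others). When the agent calls llm.chat(tools=self._tool_schemas()), it hands the LLM the list of all tool schemas, and the LLM uses that list to decide which tool to call and with what arguments.

Tool registry (__init__.py, 28 lines)

What it is: it instantiates the 7 tools, puts them in a list, and provides lookup by name.

How: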

# corecoder/tools/__init__.py (full source)
from .bash import BashTool
from .read import ReadFileTool
from .write import WriteFileTool
from .edit import EditFileTool
from .glob_tool import GlobTool
from .grep import GrepTool
from .agent import AgentTool

ALL_TOOLS = [
    BashTool(),
    ReadFileTool(),
    WriteFileTool(),
    EditFileTool(),
    GlobTool(),
    GrepTool(),
    AgentTool(),
]


def get_tool(name: str):
    """Look up a tool by name."""
    for t in ALL_TOOLS:
        if t.name == name:
            return t
    return None

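You might think the linear scan in get_tool() is inefficient. With only 7 tools, though, the difference between an O(n) scan with n = 7 and an O(1) dictionary lookup is negligible (nanoseconds). Code simplicity matters far more here than optimization.

💡 For PMs: adding a tool takes two steps: (1) create a new file under tools/ with a class that inherits from Tool; (2) import it in __init__.py and add an instance to ALL_TOOLS. That is all, which is where the README's claim that adding a custom tool takes about 20 lines comes from.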
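As an illustration of how small a new tool can be, here is a self-contained sketch. The base class is repeated (slightly condensed) so the snippet runs on its own; `WordCountTool` is a hypothetical example, not one of CoreCoder's seven tools.

```python
from abc import ABC, abstractmethod


class Tool(ABC):
    """Condensed copy of the base class shown earlier, so this runs stand-alone."""

    name: str
    description: str
    parameters: dict

    @abstractmethod
    def execute(self, **kwargs) -> str:
        ...

    def schema(self) -> dict:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class WordCountTool(Tool):
    """Hypothetical tool: counts whitespace-separated words in a string."""

    name = "word_count"
    description = "Count the words in a piece of text."
    parameters = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count"}},
        "required": ["text"],
    }

    def execute(self, text: str) -> str:
        return f"{len(text.split())} words"


tool = WordCountTool()
print(tool.schema()["function"]["name"])
print(tool.execute(text="search and replace editing"))
```

The subclass supplies three class attributes and one method; the inherited schema() turns them into the definition the LLM sees. In the real project the remaining step would be adding an instance to ALL_TOOLS.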

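Search-and-replace editing (edit.py, 90 lines): the most important tool

What it is: one of Claude Code's core innovations. It makes the AI's code changes precise, safe, and auditable.

Why: there are three ways for an AI to modify a file, each with its own trade-offs. CoreCoder chooses the third, search-and-replace: the model supplies an exact old_string and the new_string that should take its place.

How: the core logic of execute() (lines 44-73):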

# corecoder/tools/edit.py:44-73
def execute(self, file_path: str, old_string: str, new_string: str) -> str:
    try:
        p = Path(file_path).expanduser().resolve()
        if not p.exists():
            return f"Error: {file_path} not found"

        content = p.read_text()
        occurrences = content.count(old_string)

        if occurrences == 0:
            # not found: report it and show the start of the file (helps the LLM correct itself)
            preview = content[:500] + ("..." if len(content) > 500 else "")
            return (
                f"Error: old_string not found in {file_path}.\n"
                f"File starts with:\n{preview}"
            )
        if occurrences > 1:
            # several matches: report it and ask for more context
            return (
                f"Error: old_string appears {occurrences} times in {file_path}. "
                f"Include more surrounding lines to make it unique."
            )

        # exactly one match: replace it
        new_content = content.replace(old_string, new_string, 1)
        p.write_text(new_content)
        _changed_files.add(str(p))    # remember which files were modified

        # produce a unified diff for the user to review
        diff = _unified_diff(content, new_content, str(p))
        return f"Edited {file_path}\n{diff}"
    except Exception as e:
        return f"Error: {e}"

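The handling of the three outcomes shows careful design:

0 matches: instead of a bare error, the first 500 characters of the file are returned to the LLM. This amounts to telling it: "Here is what the file actually looks like; look again."

More than 1 match: the LLM is told exactly how many matches there are and advised to include more context so that the match becomes unique.

Exactly 1 match: the replacement is applied and a standard unified diff is generated.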
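The three outcomes can be reproduced on plain strings, with no files involved. This is a minimal sketch of the uniqueness rule only; `replace_unique` and its messages are hypothetical and shorter than the tool's real error text.

```python
def replace_unique(content: str, old: str, new: str) -> tuple[bool, str]:
    """Replace `old` with `new` only when `old` occurs exactly once.

    Returns (True, new_content) on success or (False, reason) otherwise.
    """
    occurrences = content.count(old)
    if occurrences == 0:
        return False, "old_string not found"
    if occurrences > 1:
        return False, f"old_string appears {occurrences} times; add surrounding context"
    return True, content.replace(old, new, 1)


source = "x = load()\nprint(x)\ny = load()\n"

print(replace_unique(source, "parse()", "read()"))          # no match
print(replace_unique(source, "load()", "read()"))           # ambiguous: two matches
print(replace_unique(source, "y = load()", "y = read()"))   # unique once context is added
```

The third call shows the recovery path the error message nudges the LLM toward: the same edit succeeds once old_string carries enough surrounding text to be unambiguous.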

After every replacement, _unified_diff() is called to produce a human-readable summary of the change:

# corecoder/tools/edit.py:76-89
def _unified_diff(old: str, new: str, filename: str, context: int = 3) -> str:
    """Generate a compact unified diff between old and new file content."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile=f"a/{filename}", tofile=f"b/{filename}",
        n=context,
    )
    result = "".join(diff)
    # truncate oversized diffs so they do not blow up the context
    if len(result) > 3000:
        result = result[:2500] + "\n... (diff truncated)\n"
    return result
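To see what the LLM and the user actually receive, the following sketch runs the standard library's difflib.unified_diff on a tiny made-up file. The file name main.py and its contents are invented for the example.

```python
import difflib

# A made-up four-line file with a misspelled import.
old = "import os\nfrom utils import halper\n\nhalper.run()\n"
new = old.replace("from utils import halper", "from utils import helper", 1)

diff = "".join(
    difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/main.py",
        tofile="b/main.py",
        n=3,  # lines of context around each change
    )
)
print(diff)
```

The output starts with the `--- a/main.py` and `+++ b/main.py` header lines, followed by a hunk in which the old import line is prefixed with `-` and the new one with `+`. Passing keepends=True matters: unified_diff expects each line to carry its own newline, so joining with an empty string rebuilds a well-formed diff.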

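🔧 Note the _changed_files set on line 15: it is a module-level global that tracks every file modified during the current session. write.py shares the same set via from .edit import _changed_files, and the CLI's /diff command iterates over it to show all changes made in the session.

💡 For PMs: the payoff of search-and-replace editing is auditability. The user sees a standard diff and knows exactly what changed, like a colleague showing you git diff instead of saying "I changed a few places."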

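Shell execution (bash.py, 116 lines)

What it is: runs a shell command in a subprocess and returns its output. This is the most powerful tool and also the most dangerous.

Why: an AI coding agent must be able to run commands. Running tests, installing dependencies, and checking git status all depend on a shell.

How: three core features.

Feature 1: dangerous-command interception (lines 19-29)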

# corecoder/tools/bash.py:19-29
_DANGEROUS_PATTERNS = [
    (r"\brm\s+(-\w*)?-r\w*\s+(/|~|\$HOME)", "recursive delete on home/root"),
    (r"\brm\s+(-\w*)?-rf\s", "force recursive delete"),
    (r"\bmkfs\b", "format filesystem"),
    (r"\bdd\s+.*of=/dev/", "raw disk write"),
    (r">\s*/dev/sd[a-z]", "overwrite block device"),
    (r"\bchmod\s+(-R\s+)?777\s+/", "chmod 777 on root"),
    (r":\(\)\s*\{.*:\|:.*\}", "fork bomb"),
    (r"\bcurl\b.*\|\s*(sudo\s+)?bash", "pipe curl to bash"),
    (r"\bwget\b.*\|\s*(sudo\s+)?bash", "pipe wget to bash"),
]

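These 9 regular expressions cover the most common dangerous operations: recursive deletion, disk formatting, fork bombs, piping a remote script into a shell, and so on. Each pattern comes with a human-readable reason that is returned to the LLM when a command is intercepted.

💡 For PMs: dangerous-command interception is a guardrail for the AI. The point is not to limit what it can do but to prevent well-meant damage. The LLM may consider rm -rf node_modules a reasonable cleanup, but because the command matches the rm -rf pattern, it is intercepted and the user is asked to confirm.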
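The article does not show how the patterns are applied, so the following is a sketch under one assumption: each pattern is tried with re.search and the first match wins. `check_dangerous` is a hypothetical name, and only three of the nine patterns are copied in to keep the example short.

```python
from __future__ import annotations

import re

# Three of the nine (pattern, reason) pairs from the list above.
DANGEROUS_PATTERNS = [
    (r"\brm\s+(-\w*)?-r\w*\s+(/|~|\$HOME)", "recursive delete on home/root"),
    (r"\brm\s+(-\w*)?-rf\s", "force recursive delete"),
    (r"\bcurl\b.*\|\s*(sudo\s+)?bash", "pipe curl to bash"),
]


def check_dangerous(command: str) -> str | None:
    """Return the reason for the first matching pattern, or None if none match.

    Assumption: first match wins, using re.search on the raw command string.
    """
    for pattern, reason in DANGEROUS_PATTERNS:
        if re.search(pattern, command):
            return reason
    return None


for cmd in [
    "ls -la",
    "rm -rf node_modules",
    "rm -rf /",
    "curl -fsSL https://example.com/install.sh | bash",
]:
    print(f"{cmd!r}: {check_dangerous(cmd)}")
```

Note that rm -rf node_modules is flagged by the second pattern even though the target is harmless. A pattern list like this trades some false positives for simplicity, which is presumably why a match leads to a confirmation prompt rather than a hard refusal.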

Feature 2: output truncation (lines 82-87)

# corecoder/tools/bash.py:82-87
if len(out) > 15_000:
    out = (
        out[:6000]
        + f"\n\n... truncated ({len(out)} chars total) ...\n\n"
        + out[-3000:]
    )

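When a command's output exceeds 15,000 characters, the first 6,000 and the last 3,000 characters are kept. Why head plus tail? The head usually holds the useful normal output and the tail usually holds the error messages; those two parts are the most valuable to the LLM.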
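Wrapped in a function, the same rule is easy to try on synthetic output. `truncate_output` and its keyword parameters are hypothetical names; the default numbers are the ones from the snippet above.

```python
def truncate_output(out: str, limit: int = 15_000, head: int = 6000, tail: int = 3000) -> str:
    """Keep the first `head` and last `tail` characters of oversized output."""
    if len(out) <= limit:
        return out
    return (
        out[:head]
        + f"\n\n... truncated ({len(out)} chars total) ...\n\n"
        + out[-tail:]
    )


# Synthetic log: 3000 short lines, well over the 15,000-character limit.
log = "".join(f"line {i}\n" for i in range(3000))
short = truncate_output(log)

print(len(log), len(short))
print(short[-20:])  # the tail of the original output survives intact
```

The marker records the original length, so the LLM can tell that output is missing and how much, and can rerun the command with a filter if the middle section matters.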

Feature 3: working-directory tracking (lines 103-116)

# corecoder/tools/bash.py:103-116
def _update_cwd(command: str, current_cwd: str):
    """Track directory changes from cd commands."""
    global _cwd
    parts = command.split("&&")
    for part in parts:
        part = part.strip()
        if part.startswith("cd "):
            target = part[3:].strip().strip("'\"")
            if target:
                new_dir = os.path.normpath(
                    os.path.join(current_cwd, os.path.expanduser(target))
                )
                if os.path.isdir(new_dir):
                    _cwd = new_dir

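🔧 After every command that succeeds (returncode == 0), _update_cwd() is called to pick up any cd in the command. It handles && chains such as cd src && npm test, so the next command starts in the right directory. The implementation is deliberately simplified: it does not support pushd/popd, variable expansion, or other complex shell features, and as written each cd is resolved against the directory the command started in, so several relative cd steps in one chain are not composed. It still covers the most common cases.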
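For comparison, here is a variation written as a pure function. It is not the project's code: `resolve_cwd` is a hypothetical name, it returns the new directory instead of assigning a global, and it threads the updated directory through the loop so that a chain like cd src && cd lib ends in src/lib.

```python
import os
import tempfile


def resolve_cwd(command: str, cwd: str) -> str:
    """Return the directory an `a && b && c` chain of commands ends in.

    Only plain `cd <dir>` steps are understood; targets that are not
    existing directories are ignored, as in the original.
    """
    for part in command.split("&&"):
        part = part.strip()
        if not part.startswith("cd "):
            continue
        target = part[3:].strip().strip("'\"")
        if not target:
            continue
        candidate = os.path.normpath(os.path.join(cwd, os.path.expanduser(target)))
        if os.path.isdir(candidate):
            cwd = candidate  # later cd steps resolve against the updated directory
    return cwd


with tempfile.TemporaryDirectory() as root:
    root = os.path.realpath(root)
    os.makedirs(os.path.join(root, "src", "lib"))
    final = resolve_cwd("cd src && cd lib && ls", root)
    missing = resolve_cwd("cd does_not_exist && ls", root)
    print(os.path.relpath(final, root))
```

Returning the value rather than mutating module state also makes the function easy to test in isolation. Like the original, this sketch still ignores pushd/popd, variables, and commands joined with ; or ||.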

File reading (read.py, 54 lines)

What it is: reads a file and adds line numbers. Simple but useful.

How:

# corecoder/tools/read.py:32-53
def execute(self, file_path: str, offset: int = 1, limit: int = 2000) -> str:
    try:
        p = Path(file_path).expanduser().resolve()
        if not p.exists():
            return f"Error: {file_path} not found"
        if not p.is_file():
            return f"Error: {file_path} is a directory, not a file"

        text = p.read_text(errors="replace")
        lines = text.splitlines()
        total = len(lines)

        start = max(0, offset - 1)       # 1-based → 0-based
        chunk = lines[start : start + limit]
        numbered = [f"{start + i + 1}\t{ln}" for i, ln in enumerate(chunk)]
        result = "\n".join(numbered)

        if total > start + limit:
            result += f"\n... ({total} lines total, showing {start+1}-{start+len(chunk)})"
        return result or "(empty file)"
    except Exception as e:
        return f"Error: {e}"

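Two design highlights:

Line numbers: each line is rendered as {line number}\t{content}. edit_file itself does not use line numbers, but they help the LLM orient itself in the file before it makes an edit_file call.
Pagination: the offset and limit parameters support reading a large file in segments, with a default of at most 2,000 lines per call.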
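The numbering and paging arithmetic can be tried on an in-memory string. `number_lines` below is a hypothetical extraction of just that part of execute(), without file handling or the trailing summary line.

```python
def number_lines(text: str, offset: int = 1, limit: int = 2000) -> str:
    """Return up to `limit` lines starting at 1-based line `offset`, each numbered."""
    lines = text.splitlines()
    start = max(0, offset - 1)  # convert the 1-based offset to a 0-based index
    chunk = lines[start : start + limit]
    return "\n".join(f"{start + i + 1}\t{ln}" for i, ln in enumerate(chunk))


text = "alpha\nbeta\ngamma\ndelta\n"

# Second "page" of a four-line file, two lines per page.
print(number_lines(text, offset=2, limit=2))
```

This prints lines 2 and 3 (beta and gamma), each prefixed by its original line number and a tab. The numbers refer to positions in the whole file rather than in the page, so consecutive pages line up when the LLM reads a large file in several calls.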

File writing (write.py, 39 lines)

What it is: creates a new file or completely overwrites an existing one.

How (full source):

# corecoder/tools/write.py (full source, only 39 lines)
from pathlib import Path

from .base import Tool
from .edit import _changed_files


class WriteFileTool(Tool):
    name = "write_file"
    description = (
        "Create a new file or completely overwrite an existing one. "
        "For small edits to existing files, prefer edit_file instead."
    )
    parameters = {
        "type": "object",
        "properties": {
            "file_path": {"type": "string", "description": "Path for the file"},
            "content": {"type": "string", "description": "Full file content to write"},
        },
        "required": ["file_path", "content"],
    }

    def execute(self, file_path: str, content: str) -> str:
        try:
            p = Path(file_path).expanduser().resolve()
            p.parent.mkdir(parents=True, exist_ok=True)   # create nested directories automatically
            p.write_text(content)
            _changed_files.add(str(p))                     # shared record of modified files
            n_lines = content.count("\n") + (1 if content and not content.endswith("\n") else 0)
            return f"Wrote {n_lines} lines to {file_path}"
        except Exception as e:
            return f"Error: {e}"

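💡 For PMs: the line p.parent.mkdir(parents=True, exist_ok=True) is a thoughtful detail. When the LLM wants to create src/utils/helper.py and the src/utils/ directory does not exist, the directory is created automatically instead of failing with "directory does not exist." The LLM no longer needs two steps (run mkdir, then write the file).

🔧 Note that the description deliberately says "For small edits to existing files, prefer edit_file instead." This steers the LLM's behavior. Rule 2 of the system prompt says the same thing. The double nudge makes the LLM pick precise search-and-replace over rewriting the whole file in most cases.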

File search (glob_tool.py + grep.py)

glob (48 lines): search by file-name pattern:

# corecoder/tools/glob_tool.py:28-47 (core logic)
def execute(self, pattern: str, path: str = ".") -> str:
    base = Path(path).expanduser().resolve()
    if not base.is_dir():
        return f"Error: {path} is not a directory"

    hits = list(base.glob(pattern))
    # sort by modification time, newest first
    hits.sort(key=lambda p: p.stat().st_mtime if p.exists() else 0, reverse=True)

    total = len(hits)
    shown = hits[:100]             # return at most 100 entries
    ...

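grep (79 lines): search by file content:

# corecoder/tools/grep.py:8
_SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", ".tox", "dist", "build"}

grep has two protections:

It skips 8 "noise" directories (.git, node_modules, and so on), so results are not buried under irrelevant files.

It returns at most 200 matches and scans at most 5,000 files, which keeps a search from running too long.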
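The article shows only the skip list, not the search loop, so the following is a sketch of how the two protections could be combined, assuming an os.walk traversal. The `grep` function, its parameters, and the path:line:text output format are hypothetical; only the directory names and the two limits come from the text.

```python
import os
import re
import tempfile

SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv", ".tox", "dist", "build"}


def grep(pattern: str, root: str, max_matches: int = 200, max_files: int = 5000) -> list[str]:
    """Return 'relative/path:line_no:text' for lines under root matching pattern."""
    regex = re.compile(pattern)
    matches: list[str] = []
    scanned = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for fname in sorted(filenames):
            if scanned >= max_files:
                return matches  # file budget exhausted
            scanned += 1
            path = os.path.join(dirpath, fname)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if regex.search(line):
                            rel = os.path.relpath(path, root)
                            matches.append(f"{rel}:{lineno}:{line.rstrip()}")
                            if len(matches) >= max_matches:
                                return matches  # match budget exhausted
            except OSError:
                continue  # unreadable file: skip it
    return matches


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src"))
    os.makedirs(os.path.join(root, "node_modules", "pkg"))
    for rel in ("src/app.py", "node_modules/pkg/index.js"):
        with open(os.path.join(root, *rel.split("/")), "w", encoding="utf-8") as fh:
            fh.write("# TODO: refactor\n")
    hits = grep(r"TODO", root)
    print(hits)
```

Both files contain the same text, but only the one under src/ is reported. Assigning to dirnames[:] rather than rebinding the name is what makes the pruning work: os.walk consults that same list object to decide where to descend next.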

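Sub-agent (agent.py, 59 lines)

What it is: spawns an independent, smaller agent to carry out a subtask.

Why: complex tasks often need many intermediate steps (for example, "analyze the architecture of the whole project"). Doing all of that inside the main agent's context would quickly exhaust the context window. A sub-agent provides context isolation: the subtask's working memory does not pollute the main agent.

How (core logic):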

# corecoder/tools/agent.py:36-58
def execute(self, task: str) -> str:
    if self._parent_agent is None:
        return "Error: agent tool not initialized (no parent agent)"

    from ..agent import Agent    # deferred import to avoid a circular dependency

    parent = self._parent_agent
    sub = Agent(
        llm=parent.llm,                                          # (1) shared LLM
        tools=[t for t in parent.tools if t.name != "agent"],     # (2) exclude itself
        max_context_tokens=parent.context.max_tokens,
        max_rounds=20,                                           # (3) at most 20 rounds
    )

    try:
        result = sub.chat(task)
        # truncate long results so they do not blow up the parent's context
        if len(result) > 5000:
            result = result[:4500] + "\n... (sub-agent output truncated)"
        return f"[Sub-agent completed]\n{result}"
    except Exception as e:
        return f"Sub-agent error: {e}"

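Four key design decisions:

Shared LLM (llm=parent.llm): the sub-agent uses the same LLM instance (same API key, same model), so no extra connection overhead is incurred.
Excluding itself (if t.name != "agent"): this prevents recursion. A sub-agent cannot spawn further sub-agents, which avoids nesting-doll style unbounded depth.
Independent context (sub = Agent(...) creates a fresh instance): the sub-agent has its own messages list, which is discarded when the task ends and never touches the main agent.
Result truncation (5,000-character threshold): a sub-agent can produce a very long report, and putting all of it into the main agent's context window is not worth the cost. Anything over 5,000 characters is cut to the first 4,500 characters plus a truncation marker.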

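💡 For PMs: a sub-agent is like outsourcing a task to a new hire. The new hire has their own working memory (it will not get mixed up with yours) and uses the same toolbox (shared LLM and tools), but hands you only a short report when done. Your attention is not consumed by the long intermediate process.

🔧 The sub-agent tool's _parent_agent is injected in Agent.__init__() (corecoder/agent.py, lines 37-39) rather than passed to the AgentTool constructor, because the ALL_TOOLS list is instantiated at module load time, before any Agent object exists.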
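The ordering problem behind this late injection is easier to see in a stripped-down sketch. Everything below is a hypothetical miniature, not the project's code: the article does not show Agent.__init__(), so the injection loop, the isinstance check, and the `label` attribute are assumptions made for illustration.

```python
class AgentTool:
    """Miniature stand-in for the sub-agent tool."""

    name = "agent"

    def __init__(self):
        self._parent_agent = None  # no Agent exists yet when tools are created

    def execute(self, task: str) -> str:
        if self._parent_agent is None:
            return "Error: agent tool not initialized (no parent agent)"
        return f"[would delegate {task!r} to a sub-agent of {self._parent_agent.label}]"


# Built once, when the module is imported, before any Agent exists.
ALL_TOOLS = [AgentTool()]


class Agent:
    """Miniature stand-in for the main agent."""

    def __init__(self, tools, label="main"):
        self.tools = tools
        self.label = label
        for tool in tools:
            if isinstance(tool, AgentTool):
                tool._parent_agent = self  # late injection (assumed mechanism)


print(ALL_TOOLS[0].execute(task="map the repo"))  # before any Agent is constructed
agent = Agent(ALL_TOOLS)
print(ALL_TOOLS[0].execute(task="map the repo"))  # after injection
```

The first call returns the "not initialized" error and the second succeeds, which is the guard at the top of execute() in the real tool doing its job. Because the sub-agent is built from a tool list with the agent tool filtered out, creating it does not re-point _parent_agent at the child.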

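That is everything for this post. The next one covers context compression (memory management), session persistence, and the system prompt.

Project: https://github.com/he-yufeng/CoreCoder

Follow me to catch the whole series. See you in the next post.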