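Hermes Agent Source Code: The Conversation Loop

Hermes Agent Source Code Walkthrough

Lecture 2: The Agent Conversation Loop, the Full Path from User Input to LLM Response

Based on the Hermes Agent v0.16.0 source · 2026-06-15

I. The Conversation Loop: The Heart of the Agent

Lecture 1 built a map of the overall architecture. This lecture goes into the most central and most complex region of the Hermes codebase, the conversation loop, and walks layer by layer through the full path from user input to LLM response.

📦 Source repository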

https://github.com/NousResearch/hermes-agent

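Local checkout: ~/.hermes/hermes-agent/

Core file for this lecture: agent/conversation_loop.py (4,252 lines)

II. Entry Point and Architecture of the Conversation Loop

1. The forwarder pattern in AIAgent

AIAgent.__init__ is a forwarder: it accepts roughly 70 parameters and delegates all of them to agent.agent_init.init_agent(). Likewise, run_conversation() delegates to agent/conversation_loop.py:

📄 run_agent.py (lines 415-488)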

    def __init__(self, base_url=None, api_key=None, provider=None,
                 max_iterations=90, tool_delay=1.0, ...):
        """Forwarder: see `agent.agent_init.init_agent`."""
        from agent.agent_init import init_agent
        init_agent(self, base_url=base_url, api_key=api_key, ...)

    def run_conversation(self, user_message, system_message=None, ...):
        """Thin forwarder: see `agent.conversation_loop.run_conversation`."""
        from agent.conversation_loop import run_conversation
        return run_conversation(self, user_message, system_message, ...)

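Design intent: the 5,400-line run_agent.py was broken up into agent/ submodules, each with a single responsibility. The conversation loop now lives in its own 4,252-line conversation_loop.py, but it still reaches state through an agent parameter rather than through class methods, a style best described as "structured procedural".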
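
A minimal sketch of the forwarder pattern described above. MiniAgent and the echo reply are hypothetical stand-ins, not Hermes code; the point is only that the method hands self to a module-level function, which then reads and mutates state through that parameter:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


def run_conversation(agent: "MiniAgent", user_message: str) -> Dict[str, Any]:
    """Module-level loop body: all state is reached through `agent`."""
    agent.history.append({"role": "user", "content": user_message})
    reply = f"echo: {user_message}"  # hypothetical stand-in for the LLM call
    agent.history.append({"role": "assistant", "content": reply})
    return {"final_response": reply, "messages": len(agent.history)}


@dataclass
class MiniAgent:
    """Hypothetical stand-in for AIAgent: holds state, forwards behavior."""
    history: List[Dict[str, str]] = field(default_factory=list)

    def run_conversation(self, user_message: str) -> Dict[str, Any]:
        # Thin forwarder: inside a method body the bare name resolves to the
        # module-level function, not to this method.
        return run_conversation(self, user_message)
```

Because the loop is a free function, it can be unit-tested with any object that exposes the attributes it touches, which is one plausible motivation for this style.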

2. The three-phase architecture of the conversation loop

A complete turn is divided into three phases:

┌────────────────────────────────────────────────────────────────────┐
│                         run_conversation()                         │
│                                                                    │
│  ┌─ Phase 1: Turn Prologue (build_turn_context) ────────────────┐  │
│  │ • User message sanitization (surrogates)                     │  │
│  │ • System prompt restore/rebuild                              │  │
│  │ • Iteration budget reset (IterationBudget)                   │  │
│  │ • Preflight context compression                              │  │
│  │ • Plugin hook: pre_llm_call                                  │  │
│  │ • External memory prefetch                                   │  │
│  │ • Returns a TurnContext dataclass                            │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                 ↓                                  │
│  ┌─ Phase 2: Main loop (while loop) ────────────────────────────┐  │
│  │ • Interrupt check (interrupt_requested)                      │  │
│  │ • Budget check (IterationBudget.consume())                   │  │
│  │ • API request build + Anthropic cache injection              │  │
│  │ • Retry loop (retry_count < max_retries)                     │  │
│  │   ├─ Error classification (classify_api_error)               │  │
│  │   ├─ Credential pool rotation / fallback switch              │  │
│  │   ├─ Context overflow → compression                          │  │
│  │   └─ Exponential backoff + interrupt-aware wait              │  │
│  │ • Response parsing + tool-call dispatch                      │  │
│  │ • Tool execution (agent._execute_tool_calls)                 │  │
│  │ • Empty-response recovery / continuation / thinking prefill  │  │
│  │ • In-loop context compression (should_compress)              │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                 ↓                                  │
│  ┌─ Phase 3: Turn Finalization (finalize_turn) ─────────────────┐  │
│  │ • Budget exhausted → no-tool summary (handle_max_iterations) │  │
│  │ • Trajectory save (save_trajectory)                          │  │
│  │ • Session persistence (persist_session)                      │  │
│  │ • Turn-exit diagnostic log                                   │  │
│  │ • Memory/skill review trigger                                │  │
│  │ • Returns the result dict                                    │  │
│  └──────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────┘

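III. Phase 1: Turn Prologue in Depth

1. TurnContext: structured context passing

build_turn_context() returns a TurnContext dataclass, which gives structure to local variables that were previously scattered across 470 lines of code:

📄 agent/turn_context.py (lines 37-62)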

@dataclass
class TurnContext:
    """Values produced by the turn prologue and consumed by the turn loop."""
    user_message: str                          # sanitized user message
    original_user_message: Any                 # original message (for logging/memory)
    messages: List[Dict[str, Any]]             # working message list
    conversation_history: Optional[List[...]]  # may be cleared by compression
    active_system_prompt: Optional[str]        # current system prompt
    effective_task_id: str                     # task ID (VM isolation)
    turn_id: str                               # turn ID
    current_turn_user_idx: int                 # index of the user message in messages
    should_review_memory: bool = False         # whether to trigger a memory review
    plugin_user_context: str = ""              # context injected by plugins
    ext_prefetch_cache: str = ""               # external memory prefetch result

2. System prompt restore/rebuild

The restore logic lives in _restore_or_build_system_prompt(), which distinguishes four stored states (missing, null, empty, present):

📄 agent/conversation_loop.py (lines 225-337)

def _restore_or_build_system_prompt(agent, system_message, conversation_history):
    stored_prompt = None
    stored_state = "missing"
    if conversation_history and agent._session_db:
        session_row = agent._session_db.get_session(agent.session_id)
        raw_prompt = session_row.get("system_prompt")
        if raw_prompt is None:
            stored_state = "null"        # legacy session, no prompt stored
        elif raw_prompt == "":
            stored_state = "empty"       # persistence bug, empty string
        else:
            stored_prompt = raw_prompt
            stored_state = "present"     # normal restore

    if stored_prompt:
        agent._cached_system_prompt = stored_prompt  # reuse: cache hit
        return
    # first turn or restore failed → rebuild
    agent._cached_system_prompt = agent._build_system_prompt(system_message)
    # persist to SessionDB
    agent._session_db.update_system_prompt(agent.session_id, ...)

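Key insight: restoring the system prompt from SessionDB instead of rebuilding it is Hermes's core mechanism for protecting the prompt cache. The Gateway path creates a new AIAgent instance on every turn and relies on this DB round trip to keep the prompt cache valid across turns.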
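
The restore-or-build decision can be sketched in isolation. This is a simplified illustration, assuming a plain dict in place of SessionDB and a hypothetical build callback; it shows why a stored prompt is returned unchanged (keeping the cached prefix byte-identical) while the other three states trigger a rebuild and a write-back:

```python
from typing import Callable, Dict, Optional, Tuple


def restore_or_build(
    store: Dict[str, Optional[str]],   # hypothetical stand-in for SessionDB
    session_id: str,
    build: Callable[[], str],          # hypothetical prompt builder
) -> Tuple[str, str]:
    """Return (prompt, stored_state); rebuild only when nothing usable is stored."""
    stored = store.get(session_id)
    if session_id not in store:
        state = "missing"
    elif stored is None:
        state = "null"
    elif stored == "":
        state = "empty"
    else:
        return stored, "present"  # reuse the exact stored text
    prompt = build()
    store[session_id] = prompt  # persist so the next turn restores it
    return prompt, state
```

Calling it twice for the same session invokes build only once; the second call returns the stored string untouched.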

3. Preflight context compression

Before the API call, if the message volume exceeds the threshold, the context is compressed proactively:

📄 agent/turn_context.py (lines 248-314)

    # ── Preflight context compression ──
    if (agent.compression_enabled
        and len(messages) > agent.context_compressor.protect_first_n
                            + agent.context_compressor.protect_last_n + 1):
        _preflight_tokens = estimate_request_tokens_rough(messages, ...)
        if _compressor.should_compress(_preflight_tokens):
            for _pass in range(3):  # at most 3 compression passes
                messages, active_system_prompt = agent._compress_context(...)
                if len(messages) >= _orig_len:
                    break  # cannot compress any further

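Design note: preflight compression runs at most 3 passes and re-estimates the token count on each pass. If a pass no longer reduces the message count, the loop exits immediately, so no time is spent on ineffective compression.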
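
The bounded-pass idea can be sketched as a standalone function. The names estimate_tokens, compress_once and threshold are assumptions made for illustration and do not mirror the Hermes signatures:

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def preflight_compress(
    messages: List[Message],
    estimate_tokens: Callable[[List[Message]], int],        # assumed estimator
    compress_once: Callable[[List[Message]], List[Message]],  # assumed compressor
    threshold: int,
    max_passes: int = 3,
) -> List[Message]:
    """Compress until under threshold, out of passes, or no longer shrinking."""
    for _ in range(max_passes):
        if estimate_tokens(messages) <= threshold:
            break  # already fits, nothing to do
        before = len(messages)
        messages = compress_once(messages)
        if len(messages) >= before:
            break  # this pass did not shrink the list; stop early
    return messages
```

The early exit matters because a compressor that cannot shrink the list any further would otherwise burn all remaining passes for no gain.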

4. Plugin hooks and external memory

The prologue fires the pre_llm_call plugin hook, and its results are injected into the user message rather than the system prompt:

📄 agent/turn_context.py (lines 316-374)

    # Plugin hook: pre_llm_call
    _pre_results = _invoke_hook("pre_llm_call",
        session_id=agent.session_id, task_id=effective_task_id, ...)
    # Results are appended to the user message; the system prompt is left untouched (protects the cached prefix)

    # External memory provider: prefetch once before the tool loop
    ext_prefetch_cache = agent._memory_manager.prefetch_all(_query) or ""
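
A rough sketch of why the injection targets the user message: the system message stays byte-identical, so a prefix cache keyed on it is unaffected. The function name and the blank-line separator are assumptions for illustration only:

```python
from typing import Dict, List


def inject_turn_context(
    messages: List[Dict[str, str]],
    user_idx: int,
    plugin_context: str = "",
    memory_context: str = "",
) -> List[Dict[str, str]]:
    """Return a copy of messages with extra context appended to one user turn."""
    out = [dict(m) for m in messages]  # per-message copies; input is not mutated
    extras = [c for c in (memory_context, plugin_context) if c]
    if extras:
        # Separator format is an assumption, not taken from Hermes.
        out[user_idx]["content"] += "\n\n" + "\n\n".join(extras)
    return out
```

With no extra context the function returns an equal list, so turns without plugin or memory output produce exactly the same request as before.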

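IV. Phase 2: Core Mechanisms of the Main Loop

1. Loop control and the iteration budget

The main loop continues while api_call_count < max_iterations and iteration_budget.remaining > 0, or while a grace call (agent._budget_grace_call) is still pending:

📄 agent/conversation_loop.py (line 461)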

while ((api_call_count < agent.max_iterations
        and agent.iteration_budget.remaining > 0)
       or agent._budget_grace_call):

IterationBudget is a thread-safe counter that supports refund() (for example, execute_code tool calls do not count against the budget):

📄 agent/iteration_budget.py (lines 17-62)

class IterationBudget:
    def __init__(self, max_total: int):
        self.max_total = max_total
        self._used = 0
        self._lock = threading.Lock()

    def consume(self) -> bool:
        with self._lock:
            if self._used >= self.max_total:
                return False
            self._used += 1
            return True

    def refund(self) -> None:
        with self._lock:
            if self._used > 0:
                self._used -= 1
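
A small usage sketch of the consume/refund cycle. Budget restates the counter above in compact form; the remaining property is written here as an assumption (the loop condition reads it, but its body is not part of the excerpt), and run is a hypothetical driver:

```python
import threading
from typing import Iterable, Set, Tuple


class Budget:
    """Compact restatement of the counter, plus an assumed `remaining`."""

    def __init__(self, max_total: int):
        self.max_total = max_total
        self._used = 0
        self._lock = threading.Lock()

    def consume(self) -> bool:
        with self._lock:
            if self._used >= self.max_total:
                return False
            self._used += 1
            return True

    def refund(self) -> None:
        with self._lock:
            if self._used > 0:
                self._used -= 1

    @property
    def remaining(self) -> int:
        # Assumed implementation: never reports a negative value.
        with self._lock:
            return max(0, self.max_total - self._used)


def run(steps: Iterable[Set[str]], max_total: int) -> Tuple[int, int]:
    """Hypothetical driver: count executed steps, refunding code-only ones."""
    budget = Budget(max_total)
    executed = 0
    for tool_names in steps:
        if not budget.consume():
            break
        executed += 1
        if tool_names == {"execute_code"}:
            budget.refund()  # code-only iterations are free
    return executed, budget.remaining
```

With a budget of 2, the sequence execute_code, read_file, execute_code, read_file, read_file executes four steps: the two code-only steps are refunded and the fifth step is refused.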

2. The API request build pipeline

Before each API call, the messages pass through several processing stages:

API request build pipeline (conversation_loop.py L573-750)

messages (raw)
  ↓
┌─ Tool-call argument repair (sanitize_tool_call_arguments)
├─ Role alternation repair (repair_message_sequence_with_cursor)
├─ Per-message copy into api_messages (shallow copy)
│   ├─ Inject external memory context (ext_prefetch_cache)
│   ├─ Inject plugin context (plugin_user_context)
│   ├─ Copy reasoning_content (multi-turn reasoning context)
│   ├─ Remove internal fields (reasoning, finish_reason, _thinking_prefill)
│   └─ Strict API compatibility: remove Codex Responses fields
├─ Add the system prompt (cached + ephemeral)
├─ Inject prefill messages
├─ Anthropic cache_control injection (system_and_3 strategy)
├─ Orphaned tool-result cleanup (_sanitize_api_messages)
├─ Thinking-only cleanup (_drop_thinking_only_and_merge_users)
├─ Whitespace/JSON normalization (guarantees a byte-identical prefix)
└─ Surrogate character cleanup (_sanitize_messages_surrogates)
  ↓
api_messages → _build_api_kwargs() → API call
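
Two of the stages above can be sketched in a few lines: removing internal fields, and re-serializing tool-call arguments so that equal arguments always produce equal bytes. The helper names, and the choice of sorted keys with compact separators, are assumptions for illustration rather than the Hermes implementation:

```python
import json
from typing import Any, Dict

# Field names taken from the pipeline listing above.
INTERNAL_KEYS = {"reasoning", "finish_reason", "_thinking_prefill"}


def to_api_message(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow-copy a message, leaving out fields the API should not see."""
    return {k: v for k, v in msg.items() if k not in INTERNAL_KEYS}


def normalize_tool_args(raw: str) -> str:
    """Re-serialize JSON deterministically; leave invalid JSON untouched."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    # Assumed canonical form: sorted keys, no optional whitespace.
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))
```

Deterministic serialization is what makes a prefix cache usable here: two requests that differ only in whitespace or key order would otherwise diverge at the first differing byte.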

3. Anthropic prompt caching strategy

Hermes implements a system_and_3 caching strategy: cache_control markers are injected on the system prompt plus the last 3 non-system messages:

📄 agent/prompt_caching.py (lines 49-79)

def apply_anthropic_cache_control(api_messages, cache_ttl="5m", native_anthropic=False):
    """At most 4 cache_control breakpoints: system + the last 3 non-system messages."""
    messages = copy.deepcopy(api_messages)
    marker = _build_marker(cache_ttl)  # {"type": "ephemeral"}, optionally with "ttl": "1h"
    breakpoints_used = 0  # must be initialized before it is incremented

    if messages[0].get("role") == "system":
        _apply_cache_marker(messages[0], marker, native_anthropic)
        breakpoints_used += 1

    remaining = 4 - breakpoints_used
    non_sys = [i for i in range(len(messages))
               if messages[i].get("role") != "system"]
    for idx in non_sys[-remaining:]:
        _apply_cache_marker(messages[idx], marker)
    return messages  # the deep copy carries the markers; the input is untouched

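Effect: in multi-turn conversations, input token cost drops by roughly 75%. The system prompt stays byte-for-byte stable for the lifetime of the conversation, so the upstream prefix cache keeps hitting.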
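
The breakpoint selection can be illustrated on its own, without touching message contents. This sketch only computes which indices would receive a marker; the function name is hypothetical, and the explicit guard for a zero remainder is an addition made here because a slice of the form [-0:] returns the whole list:

```python
from typing import Dict, List


def cache_breakpoint_indices(
    messages: List[Dict[str, str]], max_breakpoints: int = 4
) -> List[int]:
    """Indices that would get a cache marker: system first, then the tail."""
    if not messages:
        return []
    chosen: List[int] = []
    if messages[0].get("role") == "system":
        chosen.append(0)
    remaining = max_breakpoints - len(chosen)
    non_system = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    if remaining > 0:  # guard: non_system[-0:] would select every index
        chosen.extend(non_system[-remaining:])
    return chosen
```

For a system prompt followed by five messages this yields indices 0, 3, 4 and 5. Without a system message, all four breakpoints go to the tail.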

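4. Error classification and the recovery decision engine

When an API call fails, classify_api_error() maps the error to one member of the FailoverReason enum, which then drives the recovery strategy:

📄 agent/error_classifier.py (lines 24-64)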

class FailoverReason(enum.Enum):
    auth = "auth"                        # 401/403: refresh/rotate credentials
    auth_permanent = "auth_permanent"    # still failing after refresh: give up
    billing = "billing"                  # 402: rotate immediately
    rate_limit = "rate_limit"            # 429: back off, then rotate
    overloaded = "overloaded"            # 503/529: back off
    server_error = "server_error"        # 500/502: retry
    timeout = "timeout"                  # connection timeout: rebuild the client
    context_overflow = "context_overflow"  # context overflow: compress
    payload_too_large = "payload_too_large"  # 413: compress
    content_policy_blocked = "..."       # safety policy: do not retry unchanged
    thinking_signature = "..."           # invalid Anthropic thinking signature
    unknown = "unknown"                  # unknown: back off and retry

Recovery decision chain (conversation_loop.py L2000-3100):

Error recovery decision chain (in priority order)

API Error → classify_api_error() → ClassifiedError
  ↓
┌─ billing + nous? → refresh credentials → continue
├─ credential pool rotation → continue
├─ image_too_large → shrink the image → continue
├─ multimodal_tool_content → downgrade to text → continue
├─ oauth_long_context_beta → disable the beta → continue
├─ thinking_signature → strip reasoning_details → continue
├─ invalid_encrypted_content → disable replay → continue
├─ context_overflow → compress the context → break (back to the outer loop)
├─ non-retryable → fallback → continue / abort
├─ retry_count >= max → fallback → continue / abort
└─ otherwise → exponential backoff + interrupt-aware wait → continue
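
A heavily simplified sketch of a classify-then-dispatch chain. The status-code mapping follows the enum comments above; the Reason subset, the action names, and the backoff constants are assumptions chosen for illustration and cover only a few branches of the real chain:

```python
import enum


class Reason(enum.Enum):
    AUTH = "auth"
    BILLING = "billing"
    RATE_LIMIT = "rate_limit"
    OVERLOADED = "overloaded"
    SERVER_ERROR = "server_error"
    PAYLOAD_TOO_LARGE = "payload_too_large"
    UNKNOWN = "unknown"


# Status codes as listed in the enum comments.
_STATUS_TO_REASON = {
    401: Reason.AUTH, 403: Reason.AUTH,
    402: Reason.BILLING,
    413: Reason.PAYLOAD_TOO_LARGE,
    429: Reason.RATE_LIMIT,
    500: Reason.SERVER_ERROR, 502: Reason.SERVER_ERROR,
    503: Reason.OVERLOADED, 529: Reason.OVERLOADED,
}


def classify(status_code: int) -> Reason:
    return _STATUS_TO_REASON.get(status_code, Reason.UNKNOWN)


def next_action(reason: Reason, retry_count: int, max_retries: int) -> str:
    """First matching rule wins, as in a priority-ordered chain."""
    if reason in (Reason.AUTH, Reason.BILLING):
        return "rotate_credentials"
    if reason is Reason.PAYLOAD_TOO_LARGE:
        return "compress"
    if retry_count >= max_retries:
        return "fallback"
    return "backoff"


def backoff_delay(retry_count: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff, capped; base and cap are assumed values."""
    return min(cap, base * (2 ** retry_count))
```

Ordering the checks from most specific to most generic keeps cheap, targeted fixes (credential rotation, compression) ahead of the generic retry path.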

5. Truncated responses and continuation

When finish_reason == "length", Hermes supports up to 3 continuation attempts:

📄 agent/conversation_loop.py (lines 1319-1473)

if finish_reason == "length":
    # Detect thinking-budget exhaustion (the model spent everything on reasoning)
    if _thinking_exhausted:
        return {"error": "Thinking Budget Exhausted", ...}

    if length_continue_retries < 3:
        interim_msg = agent._build_assistant_message(...)
        messages.append(interim_msg)
        truncated_response_parts.append(assistant_message.content)

        continue_msg = {
            "role": "user",
            "content": _get_continuation_prompt(is_partial_stub, dropped_tools),
        }
        messages.append(continue_msg)
        break  # back to the outer loop to retry the API call

The continuation prompt takes three forms:

Scenario	Prompt
Plain truncation	"Continue exactly where you left off. Do not restart or repeat prior text."
Stream interruption	"The previous response was cut off by a network error mid-stream."
Tool-call truncation	"Do NOT retry the same large tool call. Break into smaller calls."
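
The continuation mechanism for the plain-truncation case can be sketched as a loop that stitches partial outputs together. call_model is a hypothetical callable returning (text, finish_reason); the prompt string is the one from the table above:

```python
from typing import Callable, Dict, List, Tuple

CONTINUE_PROMPT = (
    "Continue exactly where you left off. Do not restart or repeat prior text."
)

Message = Dict[str, str]


def run_with_continuation(
    call_model: Callable[[List[Message]], Tuple[str, str]],  # hypothetical
    messages: List[Message],
    max_continuations: int = 3,
) -> Tuple[str, str]:
    """Call the model, re-prompting while it stops with finish_reason 'length'."""
    parts: List[str] = []
    continuations = 0
    while True:
        text, finish_reason = call_model(messages)
        parts.append(text)
        if finish_reason != "length" or continuations >= max_continuations:
            return "".join(parts), finish_reason
        # Keep the partial answer in the history, then ask for the rest.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": CONTINUE_PROMPT})
        continuations += 1
```

The cap bounds the worst case to one initial call plus three continuations, after which the caller receives whatever text has accumulated along with the last finish_reason.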

6. Tool-call dispatch and execution

When the LLM returns tool_calls, Hermes runs a full chain of validation, execution, and compression checks:

📄 agent/conversation_loop.py (lines 3506-3835)

if assistant_message.tool_calls:
    # ── Validate tool names ──
    for tc in assistant_message.tool_calls:
        if tc.function.name not in agent.valid_tool_names:
            repaired = agent._repair_tool_call(tc.function.name)
            # at most 3 invalid-tool retries

    # ── Validate JSON arguments ──
    for tc in assistant_message.tool_calls:
        try:
            json.loads(tc.function.arguments)
        except json.JSONDecodeError:
            ...  # at most 3 retries, then inject a repair result

    # ── Post-validation guardrails ──
    assistant_message.tool_calls = agent._cap_delegate_task_calls(...)
    assistant_message.tool_calls = agent._deduplicate_tool_calls(...)

    # ── Execute tools ──
    agent._execute_tool_calls(assistant_message, messages, ...)

    # ── Check guardrail halt ──
    if agent._tool_guardrail_halt_decision is not None:
        break  # guardrail triggered, exit the loop

    # ── In-loop compression ──
    if _compressor.should_compress(_real_tokens):
        messages, active_system_prompt = agent._compress_context(...)

    # ── execute_code does not count against the budget ──
    if _tc_names == {"execute_code"}:
        agent.iteration_budget.refund()
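
Two of the guardrails above, name repair and deduplication, can be approximated with the standard library. This is one possible approach, not the Hermes implementation: fuzzy matching via difflib and the 0.8 cutoff are assumptions, as are both function names:

```python
import difflib
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


def repair_tool_name(
    name: str, valid_names: Iterable[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the name itself, its closest valid match, or None."""
    valid = sorted(valid_names)
    if name in valid:
        return name
    # Assumed repair strategy: nearest valid name by similarity ratio.
    matches = difflib.get_close_matches(name, valid, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def deduplicate_calls(calls: List[ToolCall]) -> List[ToolCall]:
    """Drop repeated (name, arguments) pairs, keeping first occurrences."""
    seen = set()
    unique: List[ToolCall] = []
    for name, args in calls:
        key = (name, json.dumps(args, sort_keys=True))  # order-insensitive key
        if key not in seen:
            seen.add(key)
            unique.append((name, args))
    return unique
```

Serializing arguments with sorted keys means two calls that differ only in key order are treated as the same call.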

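7. Empty-response recovery strategy

When the LLM returns an empty response, Hermes works through a multi-level recovery chain: four targeted recoveries, then a plain retry, a fallback provider switch, and finally a terminal sentinel:

Empty-response recovery strategy (conversation_loop.py L3837-4105)

empty response?
  ↓
┌─ Partial stream recovery
│   Content already streamed to the user → used as the final response
├─ Fallback to prior turn content
│   Prior turn had content + all tools were housekeeping → reuse the prior content
├─ Post-tool-call empty response nudge
│   Empty response after tool execution → inject a nudge message asking the model to continue
├─ Thinking-only prefill continuation
│   Reasoning but no content → append an assistant message so the model continues
├─ Empty-response retry (up to 3 times)
├─ Fallback provider switch
└─ Terminal: return "(empty)" + _empty_terminal_sentinel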
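
The ordering in the chain above can be sketched as a list of strategies tried in sequence, where the first one that yields content wins. This is a deliberately reduced model: in the real loop several steps re-enter the loop instead of returning text, and the state dict and strategy names here are hypothetical:

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, str]
Strategy = Tuple[str, Callable[[State], Optional[str]]]


def recover_empty_response(
    state: State, strategies: List[Strategy]
) -> Tuple[str, str]:
    """Try each strategy in order; return (content, name of what produced it)."""
    for name, strategy in strategies:
        content = strategy(state)
        if content:
            return content, name
    # Nothing worked: fall through to the terminal placeholder.
    return "(empty)", "empty_response_exhausted"


# Hypothetical strategies for the first two levels of the chain.
STRATEGIES: List[Strategy] = [
    ("partial_stream_recovery", lambda s: s.get("streamed_text") or None),
    ("prior_turn_fallback", lambda s: s.get("prior_turn_content") or None),
]
```

Keeping the strategies in a list makes the priority order explicit and lets a new recovery level be added without touching the loop.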

V. Phase 3: Turn Finalization

1. finalize_turn(): the post-processing pipeline

After the main loop ends, finalize_turn() handles the wrap-up work:

📄 agent/turn_finalizer.py (lines 30-150)

def finalize_turn(agent, *, final_response, api_call_count, ...):
    # Budget exhausted → tool-free summary
    if final_response is None and budget_exhausted:
        final_response = agent._handle_max_iterations(messages, ...)

    # Save the trajectory
    agent._save_trajectory(messages, user_msg, completed)

    # Clean up VM/browser resources
    agent._cleanup_task_resources(effective_task_id)

    # Drop empty-response scaffolding, then persist
    agent._drop_trailing_empty_response_scaffolding(messages)
    agent._persist_session(messages, conversation_history)

    # Turn-exit diagnostic log
    _last_msg_role = messages[-1].get("role")
    logger.info("turn exit: reason=%s last_role=%s exit_code=%s", ...)

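2. Exit reason classification

_turn_exit_reason records why the loop exited, for diagnostics:

Exit reason	Meaning
`text_response`	Normal completion; the LLM returned text
`interrupted_by_user`	User interrupt (a new message arrived)
`budget_exhausted`	Iteration budget exhausted
`guardrail_halt`	A tool guardrail triggered a stop
`partial_stream_recovery`	Stream interrupted; already-sent content was used
`empty_response_exhausted`	Empty-response retries and fallback exhausted
`compression_exhausted`	Context compression reached its limit

VI. Complete Flowchart of the Conversation Loop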

┌──────────────────────────────────────────────────────────────────┐
│                     run_conversation() entry                     │
│                                                                  │
│  build_turn_context() ──→ TurnContext                            │
│    ├─ sanitize user_message                                      │
│    ├─ restore_or_build_system_prompt()                           │
│    ├─ reset IterationBudget + counters                           │
│    ├─ preflight compression (if needed)                          │
│    ├─ pre_llm_call plugin hook                                   │
│    └─ memory prefetch                                            │
│                                                                  │
│  while budget > 0:                                               │
│    │                                                             │
│    ├─ check interrupt → break                                    │
│    ├─ consume budget → break if exhausted                        │
│    ├─ drain pending /steer commands                              │
│    │                                                             │
│    │  ┌─ retry loop (max_retries) ──────────────────────────┐    │
│    │  │                                                     │    │
│    │  │  build api_messages:                                │    │
│    │  │    ├─ repair tool call args                         │    │
│    │  │    ├─ repair role alternation                       │    │
│    │  │    ├─ inject memory + plugin context                │    │
│    │  │    ├─ apply cache_control markers                   │    │
│    │  │    ├─ sanitize surrogates                           │    │
│    │  │    └─ normalize JSON whitespace                     │    │
│    │  │                                                     │    │
│    │  │  API call (streaming preferred):                    │    │
│    │  │    try:                                             │    │
│    │  │      response = _perform_api_call(api_kwargs)       │    │
│    │  │      → normalize response                           │    │
│    │  │      → track token usage + cost                     │    │
│    │  │      → break (success)                              │    │
│    │  │    except InterruptedError:                         │    │
│    │  │      → break outer loop                             │    │
│    │  │    except Exception:                                │    │
│    │  │      classify_api_error() → FailoverReason          │    │
│    │  │      ├─ credential pool rotate → continue           │    │
│    │  │      ├─ context_overflow → compress → break         │    │
│    │  │      ├─ fallback provider → continue                │    │
│    │  │      ├─ non-retryable → abort                       │    │
│    │  │      └─ backoff + sleep(0.2) loop → continue        │    │
│    │  │                                                     │    │
│    │  └─────────────────────────────────────────────────────┘    │
│    │                                                             │
│    │  process response:                                          │
│    │    ├─ finish_reason == "length"?                            │
│    │    │   ├─ thinking exhausted? → error                       │
│    │    │   ├─ no tool calls → continuation retry (×3)           │
│    │    │   └─ has tool calls → truncated retry (×3)             │
│    │    │                                                        │
│    │    ├─ has tool_calls?                                       │
│    │    │   ├─ validate names (repair hallucinations)            │
│    │    │   ├─ validate JSON args                                │
│    │    │   ├─ guardrails: cap delegates + deduplicate           │
│    │    │   ├─ _execute_tool_calls()                             │
│    │    │   ├─ check guardrail halt → break                      │
│    │    │   ├─ should_compress()? → compress                     │
│    │    │   └─ continue (next iteration)                         │
│    │    │                                                        │
│    │    └─ final response (no tool calls):                       │
│    │        ├─ partial stream recovery                           │
│    │        ├─ prior turn content fallback                       │
│    │        ├─ post-tool nudge                                   │
│    │        ├─ thinking prefill continuation                     │
│    │        ├─ empty response retry (×3)                         │
│    │        ├─ fallback provider                                 │
│    │        └─ break (done)                                      │
│    │                                                             │
│    └─ finalize_turn():                                           │
│        ├─ budget exhausted? → summary via no-tool call           │
│        ├─ save trajectory                                        │
│        ├─ cleanup resources                                      │
│        ├─ persist session                                        │
│        └─ return result dict                                     │
└──────────────────────────────────────────────────────────────────┘

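VII. Summary of Key Design Patterns

Pattern	Implementation	Effect
Structured procedural	State accessed through the `agent` parameter	Avoids deep inheritance; state is directly visible
Layered recovery	Error classification → credential rotation → fallback	Maximizes availability
Interrupt awareness	sleep(0.2) loop that checks for interrupts	The user can interrupt at any time
Budget refund	`execute_code` does not count against the budget	Programmatic tool calls do not consume quota
Progressive compression	Preflight compression + in-loop compression	Compresses on demand without wasted computation
Multi-level empty-response recovery	7-level recovery chain	An empty reply is rarely returned
Cache protection	System prompt restored from the DB	Protects the prefix cache
Exit diagnostics	`_turn_exit_reason`	Observability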
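
The interrupt-awareness row can be made concrete with a short sketch: a long backoff is split into small sleeps, and an interrupt flag is checked between them. Using threading.Event as the flag and the function name interruptible_sleep are assumptions; the 0.2-second tick follows the table:

```python
import threading
import time


def interruptible_sleep(
    total_seconds: float, interrupted: threading.Event, tick: float = 0.2
) -> bool:
    """Sleep up to total_seconds; return False early if the flag is set."""
    deadline = time.monotonic() + total_seconds
    while True:
        if interrupted.is_set():
            return False  # caller should abandon the retry
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return True  # full wait completed without interruption
        time.sleep(min(tick, remaining))
```

With this shape, a 30-second backoff reacts to an interrupt within roughly one tick instead of blocking for the whole delay.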

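VIII. Key Files for This Lecture

File	Lines	Responsibility
`agent/conversation_loop.py`	4,252	Main conversation loop logic
`agent/turn_context.py`	388	Turn prologue construction
`agent/turn_finalizer.py`	428	Turn post-processing
`agent/iteration_budget.py`	62	Iteration budget counter
`agent/error_classifier.py`	1,365	API error classification
`agent/prompt_caching.py`	79	Anthropic caching strategy
`agent/turn_retry_state.py`	~100	Retry state tracking
`agent/message_sanitization.py`	~800	Message sanitization/repair
`agent/model_metadata.py`	~600	Token estimation/model metadata
`agent/tool_dispatch_helpers.py`	~500	Tool dispatch helpers

📌 Preview

The next lecture covers the tool system: the ToolRegistry self-registration mechanism, ToolSet group management, check_fn probes, and the tool dispatch logic in model_tools.py. We will see how Hermes manages 50+ tools while keeping the core lean.

Based on source analysis of Hermes Agent v0.16.0 (commit 45f9099e)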
