Deterministic Testing for AI Agent Tool Calls

Your AI agent called the right tool in development, then sent the wrong Slack message in production. What went wrong?

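What It Is

ToolGuard is a Python framework for testing and validating the reliability of AI agent tool calls. It takes a deterministic approach: policy checks run before a tool executes, rather than relying on an LLM's non-deterministic judgment.

The project grew out of the EMNLP 2025 paper "Towards Enforcing Company Policy Adherence in Agentic Workflows". The core idea: instead of letting the LLM decide for itself whether it should call a tool, enforce the policy with deterministic code.

The traditional approach writes the policy rules into the system prompt and trusts the LLM to follow them. That approach has several serious problems:

Non-determinism: the same input can produce different judgments
Not auditable: there is no way to prove the LLM actually "understood" the rules
Scaling bottleneck: the more complex the policy, the lower the LLM's compliance rate

ToolGuard's answer is a two-stage architecture: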

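Stage 1: Generate specs at build time

Input: policy document + tool definitions
Output: ToolGuardSpec (a guard spec for each tool)
Process: an LLM analyzes the policy document and produces structured examples of compliant and violating calls

Stage 2: Enforce guards at run time

Input: ToolGuardSpec + tool call request
Output: an allow/deny decision
Process: deterministic code checks, with no LLM involved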
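
To make the split between the two stages concrete, here is a minimal sketch. It does not use ToolGuard's real classes: `Rule`, `GuardSpec`, `Decision`, `check`, and the `issue_refund` tool with its 500 limit are all hypothetical names chosen for illustration. The spec is written by hand where ToolGuard would generate it with an LLM; the run-time check is ordinary code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# All names below are hypothetical; this is not ToolGuard's API.
Params = dict[str, Any]


@dataclass
class Rule:
    description: str
    violates: Callable[[Params], bool]  # returns True when the call breaks the rule


@dataclass
class GuardSpec:
    tool: str
    rules: list[Rule] = field(default_factory=list)


@dataclass
class Decision:
    allowed: bool
    reason: str = ""


def check(spec: GuardSpec, params: Params) -> Decision:
    """Run-time stage: evaluate every rule with plain code, no LLM call."""
    for rule in spec.rules:
        if rule.violates(params):
            return Decision(False, rule.description)
    return Decision(True)


# Stand-in for the build-time stage: the spec is written by hand here.
refund_spec = GuardSpec(
    tool="issue_refund",
    rules=[
        Rule("Refunds above 500 need manual approval", lambda p: p["amount"] > 500),
    ],
)

print(check(refund_spec, {"amount": 120}))
print(check(refund_spec, {"amount": 900}))
```

Because the spec is plain data plus plain predicates, it can be reviewed, diffed, and versioned like any other code before it reaches production.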

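Why You Need It Now

AI agent development in 2026 has moved into deep water. According to industry surveys:

27.2% of engineering teams have abandoned the authorization mechanisms their frameworks provide and fallen back to custom hard-coded logic
90% of government organizations lack purpose-binding controls for AI agents
492+ MCP servers are exposed in production with no authentication or encryption

This is not an alignment problem; it is an authorization problem. Just as early web APIs lacked OAuth, today's AI agents lack a standard per-action authorization mechanism.

ToolGuard addresses policy enforcement before a tool call. It is not a sandbox (a sandbox only limits the blast radius and cannot stop an unauthorized action), and it is not model-based review (which is probabilistic). It is a deterministic, auditable policy enforcement layer.

An example: your AI agent has a "send email" tool. The traditional approach is to write "do not email external contacts" in the prompt, but the LLM can be talked around that by social engineering. ToolGuard's approach is to check the recipient's domain before the tool call and reject the call outright if it is not a company domain. The check is deterministic, so prompt injection cannot influence it.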
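
A recipient-domain check of this kind might look like the sketch below. It is hand-written for illustration rather than ToolGuard output, and `ALLOWED_DOMAINS`, `recipient_allowed`, and `guard_send_email` are assumed names.

```python
# Assumption: the set of domains your company treats as internal.
ALLOWED_DOMAINS = {"example.com"}


def recipient_allowed(address: str) -> bool:
    """Return True only if the address has a local part and an allowed domain."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local:
        return False  # no "@" at all, or nothing before it
    return domain.lower() in ALLOWED_DOMAINS


def guard_send_email(recipients: list[str]) -> tuple[bool, str]:
    """Reject the whole call if any recipient is external or malformed."""
    for address in recipients:
        if not recipient_allowed(address):
            return False, f"External or malformed recipient: {address}"
    return True, ""


print(guard_send_email(["alice@example.com"]))
print(guard_send_email(["alice@example.com", "bob@evil.test"]))
```

The match is exact on the full domain, so a lookalike such as `example.com.evil.test` is rejected. Whatever text the model was fed, the outcome depends only on the recipient list.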

How to Use It: A Hands-On Guide

Installation

    uv pip install toolguard

Core Concepts

ToolGuard provides two APIs:

Buildtime API (toolguard.buildtime): generates guard specs and code
Runtime API (toolguard.runtime): enforces the guards when a tool is called

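Full Example: A Calculator Policy Guard

Suppose we have a set of calculator tools and need to enforce the following policies:

Division by zero is not allowed
Adding two numbers whose product is 365 is not allowed
Multiplying is not allowed when any operand's KDI value is 6.28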

Step 1: Define the tools

    # tools.py

    def divide_tool(g: float, h: float) -> float:
        """Divides one number by another."""
        return g / h


    def add_tool(a: float, b: float) -> float:
        """Adds two numbers."""
        return a + b


    def multiply_tool(a: float, b: float) -> float:
        """Multiplies two numbers."""
        return a * b


    def map_kdi_number(i: float) -> float:
        """Maps a number to its KDI value."""
        return 3.14 * i

Step 2: Write the policy document

    # Calculator Usage Policy

    ## Operation Constraints

    - **Division by Zero is Not Allowed**
      The calculator must not allow division by zero.
    - **Summing Numbers Whose Product is 365 is Not Allowed**
      The calculator must not allow addition if their product equals 365.
      For example, adding 5 + 73 should be disallowed (5 * 73 = 365).
    - **Multiplying Numbers When Any Operand's KDI Value Equals 6.28 is Not Allowed**
      If any operand has KDI(x) = 6.28, multiplication must be rejected.
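
Before handing this policy to an LLM, it can help to write down what each rule means as a plain predicate, so there is something to compare the generated guards against. The sketch below is hand-written, not ToolGuard output; the `violates_*` names are assumptions, and `kdi` simply repeats the mapping from `map_kdi_number`. It uses `math.isclose` instead of `==` for the floating-point comparisons, which is a design choice of this sketch rather than something the policy specifies.

```python
import math


def kdi(x: float) -> float:
    """Same mapping as map_kdi_number in tools.py."""
    return 3.14 * x


def violates_divide(g: float, h: float) -> bool:
    """Rule 1: division by zero."""
    return h == 0


def violates_add(a: float, b: float) -> bool:
    """Rule 2: addition when the operands' product equals 365."""
    return math.isclose(a * b, 365.0)


def violates_multiply(a: float, b: float) -> bool:
    """Rule 3: multiplication when any operand's KDI value equals 6.28."""
    return any(math.isclose(kdi(x), 6.28) for x in (a, b))


print(violates_divide(10, 0))
print(violates_add(5, 73))
print(violates_multiply(2, 9))
```

Note that rule 3 indirectly depends on another tool (`map_kdi_number`): an operand of 2 is forbidden only because its KDI value is 6.28. That is the kind of cross-tool dependency worth checking carefully in any generated guard.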

Step 3: Generate the guard specs (build time)

    import asyncio
    from pathlib import Path

    from toolguard.buildtime import generate_guard_specs, LitellmModel

    # The tools defined in step 1
    from tools import divide_tool, add_tool, multiply_tool, map_kdi_number


    async def generate_specs():
        # Configure the LLM used during the build stage
        llm = LitellmModel(
            model_name="gpt-4o",
            provider="azure",
            kw_args={
                "api_base": "your-api-base",
                "api_version": "2024-08-01-preview",
                "api_key": "your-api-key",
            },
        )

        # Load the policy text
        policy_text = Path("policy.md").read_text(encoding="utf-8")

        # Tools to generate guards for
        tools = [divide_tool, add_tool, multiply_tool, map_kdi_number]

        # Generate the specs
        specs = await generate_guard_specs(
            policy_text=policy_text,
            tools=tools,
            work_dir="output/step1",
            llm=llm,
        )
        return specs


    # Run the generation
    specs = asyncio.run(generate_specs())

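Step 4: Enforce the guard (run time)

    from toolguard.runtime import execute_guard

    from tools import divide_tool

    # `specs` is the value returned by generate_specs() in step 3

    # The tool call request
    tool_call = {
        "tool": "divide_tool",
        "parameters": {"g": 10.0, "h": 0.0},
    }

    # Run the guard before the tool
    decision = execute_guard(tool_call, specs)

    if decision.allowed:
        # Execute the tool; keep its output separate from the guard decision
        output = divide_tool(**tool_call["parameters"])
    else:
        print(f"Guard blocked: {decision.reason}")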
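
Calling the guard by hand before every tool call is easy to forget. One common pattern is to bind the check to the tool with a decorator so the tool cannot run unguarded. The sketch below is self-contained and does not use ToolGuard's runtime API; `guarded`, `GuardViolation`, and `no_zero_divisor` are hypothetical names.

```python
import functools
from typing import Callable, Optional


class GuardViolation(Exception):
    """Raised when a guard rejects a tool call (hypothetical name)."""


def guarded(check: Callable[..., Optional[str]]):
    """Wrap a tool so that `check` runs first.

    `check` receives the tool's arguments and returns a reason string
    to block the call, or None to allow it.
    """
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            reason = check(*args, **kwargs)
            if reason is not None:
                raise GuardViolation(f"{tool.__name__}: {reason}")
            return tool(*args, **kwargs)
        return wrapper
    return decorator


def no_zero_divisor(g: float, h: float) -> Optional[str]:
    return "Division by zero" if h == 0 else None


@guarded(no_zero_divisor)
def divide_tool(g: float, h: float) -> float:
    """Divides one number by another."""
    return g / h


print(divide_tool(10.0, 2.0))
try:
    divide_tool(10.0, 0.0)
except GuardViolation as exc:
    print(f"Guard blocked: {exc}")
```

Raising an exception rather than returning a flag means an agent loop that ignores the result still cannot reach the tool body.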

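Advanced Techniques

1. Custom guard logic

The guard code ToolGuard generates is deterministic Python, so you can edit it directly:

    # GuardResult is the result type used by the generated guard code;
    # import it from wherever your generated module defines it.

    # The generated guard might look like this
    def divide_guard(params):
        if params["h"] == 0:
            return GuardResult(allowed=False, reason="Division by zero")
        return GuardResult(allowed=True)


    # You can add more complex logic
    def divide_guard_custom(params):
        if params["h"] == 0:
            return GuardResult(allowed=False, reason="Division by zero")
        if abs(params["g"] / params["h"]) > 1e6:
            return GuardResult(allowed=False, reason="Result too large")
        return GuardResult(allowed=True)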

2. Integrate into CI/CD

    # .github/workflows/test-guards.yml
    name: Test AI Agent Guards

    on: [push, pull_request]

    jobs:
      test-guards:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - name: Set up Python
            uses: actions/setup-python@v4
            with:
              python-version: '3.10'
          - name: Install dependencies
            run: pip install toolguard pytest
          - name: Run guard tests
            run: pytest tests/test_guards.py -v

3. Test the guards themselves

    # tests/test_guards.py
    from toolguard.runtime import execute_guard

    # load_guard is not defined in this snippet; it stands for a
    # project-local helper that returns the generated guard for a tool.


    def test_divide_by_zero_blocked():
        guard = load_guard("divide_tool")
        result = execute_guard({"g": 10, "h": 0}, guard)
        assert not result.allowed
        assert "zero" in result.reason.lower()


    def test_normal_division_allowed():
        guard = load_guard("divide_tool")
        result = execute_guard({"g": 10, "h": 2}, guard)
        assert result.allowed

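Caveats

1. The build stage needs an LLM; the run-time stage does not

ToolGuard's build stage uses an LLM to analyze the policy document and generate specs. That stage makes API calls and costs money. The generated guard code is deterministic, however, and needs no LLM at run time.

2. Human review is essential

Because the build stage uses an LLM, the generated specs can be wrong. ToolGuard deliberately decouples the two stages so that you have a chance to review and correct the specs. Do not skip this step.

3. It is not a silver bullet

ToolGuard addresses policy enforcement before a tool call. It cannot:

Prevent the LLM from choosing the wrong tool (that is a routing problem)
Verify the quality of a tool's output (that is an evaluation problem)
Replace a complete security architecture (that is a system design problem)

4. Performance considerations

The overhead of a deterministic check is small (on the order of microseconds), but the LLM calls in the build stage can take anywhere from a few seconds to a few minutes, depending on how complex the policy is.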
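
If you want to confirm the run-time cost on your own hardware, a rough measurement with the standard library's `timeit` is enough. The sketch below times a hand-written stand-in (`divide_guard` here is an assumed, simplified guard, not generated code), so treat the number it prints as an indication only.

```python
import timeit


def divide_guard(params: dict) -> bool:
    """Simplified stand-in for a generated guard: True means the call is allowed."""
    return params["h"] != 0


calls = 100_000
total_seconds = timeit.timeit(
    lambda: divide_guard({"g": 10.0, "h": 2.0}),
    number=calls,
)
per_call_us = total_seconds / calls * 1e6
print(f"{per_call_us:.3f} microseconds per check")
```

Real guards with more rules, or rules that call other tools, will cost more than this, so measure the generated code itself before relying on any latency budget.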

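Tool	Approach	Use case
ToolGuard	Deterministic policy enforcement	Authorization checks before a tool call
Open Agent Passport (OAP)	Pre-action authorization + auditing	Enterprise compliance scenarios
Pytest + Mock	Traditional unit testing	Tool routing and parameter validation
DeepEval	LLM-based evaluation	Output quality verification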

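Closing Thoughts

The reliability question for AI agents is shifting from "does it work?" to "can it be trusted?". ToolGuard offers a pragmatic answer: enforce policy with deterministic code rather than depending on the LLM's "good will".

For teams building production-grade AI agents, my advice is:

Start with the critical tools: do not try to guard every tool at once; begin with the destructive ones (delete, send, pay)
Keep the policy document clear: ToolGuard is only as good as the policy document it is given, and vague rules produce vague guards
Build a testing culture: guards need tests too, so make guard tests part of your CI/CD pipeline

In the age of AI agents, what we need is not a smarter LLM but more reliable infrastructure. ToolGuard is an interesting attempt in that direction.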

Related Resources

GitHub: https://github.com/AgentToolkit/toolguard
Paper: EMNLP 2025 - Towards Enforcing Company Policy Adherence in Agentic Workflows
Related discussion: Open Agent Passport (OAP) - arxiv.org/html/2603.20953v1