Building an Agent Step by Step: Making Documents "Run"

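In traditional software engineering there is a natural "semantic gap" between natural-language documents (Docs) and executable code (Code). We usually have to go through a long, lossy chain: requirements document → abstract logic → hand-written code → runtime environment. As large-model capabilities evolve, the boundary between natural-language documents and the executable environment is blurring quickly. A document is no longer just a static record of logic; it can be injected directly into an Agent as execution instructions. The inefficient pattern of "change the document, then refactor the code" is fading, replaced by a dynamic execution flow that aims to preserve the document's semantics and takes effect immediately.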

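Today we use an everyday office application, a content-processing and decision-making agent, to get a real feel for what "docs as execution" means.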

Technical implementation: building a "document-driven" Agent

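Under the "docs as execution" paradigm, the job of code changes: it is no longer hard-coded logic but a carrier of capabilities. We break the core implementation of the whole pipeline into the following three layers.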

1. Infrastructure: Tool Registry

To let the Agent "discover" and "understand" the tools that document instructions refer to, we build a decorator-based registry.

# Core: decouple logic through a Tool Registry
from functools import wraps

TOOL_REGISTRY = {}

def register_tool(name=None, tags=None):
    def decorator(func):
        tool_name = name or func.__name__

        # Wrap the function while preserving its metadata (name, docstring)
        @wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        # Inject the tool into the registry so the Agent can discover it dynamically
        TOOL_REGISTRY[tool_name] = wrapper
        if tags:
            wrapper.tags = tags
        return wrapper
    return decorator
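As a rough illustration of how an Agent might "discover" registered tools, the sketch below restates a minimal version of the registry and renders each tool's name, tags, and first docstring line as a plain-text catalogue that could be placed in a prompt. The helper `describe_tools` and the sample tool `add_numbers` are hypothetical and not part of the original code; unlike the decorator above, this sketch always sets `tags` (to an empty list when none are given) to keep the lookup simple.

```python
from functools import wraps
import inspect

TOOL_REGISTRY = {}

def register_tool(name=None, tags=None):
    """Minimal restatement of the registry decorator so this sketch runs alone."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        wrapper.tags = list(tags or [])  # assumption: always present, possibly empty
        TOOL_REGISTRY[name or func.__name__] = wrapper
        return wrapper
    return decorator

@register_tool(tags=["math"])
def add_numbers(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    return a + b

def describe_tools(tag=None):
    """Hypothetical helper: render registered tools as text for an LLM prompt."""
    lines = []
    for tool_name, tool in sorted(TOOL_REGISTRY.items()):
        if tag is not None and tag not in tool.tags:
            continue
        doc = inspect.getdoc(tool) or ""
        summary = doc.splitlines()[0] if doc else "(no description)"
        lines.append(f"- {tool_name} [{', '.join(tool.tags)}]: {summary}")
    return "\n".join(lines)

print(describe_tools(tag="math"))
```

Because `@wraps` copies the docstring onto the wrapper, the docstring itself becomes the tool description the model sees, which is why the tools in this article carry detailed docstrings.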

2. Context protocol: state management with ActionContext

class ActionContext(dict):
    """Holds the intermediate state and variable index of the execution flow."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

3. Core nodes: from docstring to execution logic (five modules implement the whole flow)

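Perception layer (OCR): a vision model or OCR engine performs pixel-level recognition of the raw content (e.g., an invoice).
Data structuring: the LLM's ability to follow a schema turns the unstructured text into standard JSON, completing feature extraction.
Dynamic logic matching (core): this is where "docs as execution" shows up. The Agent reads the latest reimbursement policy document at run time and makes classification and compliance decisions through semantic reasoning. When the business-logic document changes, nothing in the logic layer has to be modified.
Result feedback: return the decision together with the reasoning behind it (the business-rule justification for the verdict).
Data persistence: write the structured result to the business database, closing the loop.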
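The five modules above can be pictured as stages that pass a shared context along. The sketch below is only a structural outline under that assumption: every stage is a stub with canned values, and the names `perceive`, `structure`, `match_rules`, `report`, `persist`, and `run_pipeline` are hypothetical rather than taken from the original code.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]

def perceive(ctx: Context) -> None:
    # Stand-in for OCR: a real stage would read the image at ctx["image_path"].
    ctx["raw_text"] = "Invoice No. INV-001 Total 35.55"

def structure(ctx: Context) -> None:
    # Stand-in for LLM extraction into a schema-shaped dict.
    ctx["invoice"] = {"invoice_number": "INV-001", "total_amount": 35.55}

def match_rules(ctx: Context) -> None:
    # Stand-in for the LLM decision; the policy text travels as data in ctx.
    ctx["evaluation"] = {"reimbursable": True, "rules_seen": ctx["rules"]}

def report(ctx: Context) -> None:
    number = ctx["invoice"]["invoice_number"]
    ctx["report"] = f"{number}: reimbursable={ctx['evaluation']['reimbursable']}"

def persist(ctx: Context) -> None:
    # Stand-in for a database write: append to an in-memory list.
    ctx["db"].append({"invoice": ctx["invoice"], "evaluation": ctx["evaluation"]})

PIPELINE: List[Callable[[Context], None]] = [perceive, structure, match_rules, report, persist]

def run_pipeline(ctx: Context) -> Context:
    for stage in PIPELINE:
        stage(ctx)
        ctx.setdefault("trace", []).append(stage.__name__)  # record execution order
    return ctx

ctx = run_pipeline({"image_path": "invoice.png", "rules": "policy text", "db": []})
print(ctx["trace"])
print(ctx["report"])
```

Only `match_rules` consumes the policy text, so in this layout a policy change touches the data flowing into one stage and none of the stage code.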

3.1 Implementing the perception layer: extract the invoice content (a camera scan could also be plugged in; here we use an image directly)

from PIL import Image
import pytesseract

def extract_text_from_invoice(image_path: str) -> str:
    """
    Read the text content from an invoice image.

    Args:
        image_path: Path to the invoice image (jpg/png; a PDF would first need to be converted to an image)

    Returns:
        The extracted text string
    """
    image = Image.open(image_path)
    # Optional optimization: grayscale + binarization
    # image = image.convert('L')  # convert to grayscale
    # image = image.point(lambda x: 0 if x < 128 else 255)  # simple binarization
    text = pytesseract.image_to_string(image, lang='chi_sim')  # explicitly use lang='chi_sim' for Chinese OCR
    return text

extracted_text = extract_text_from_invoice("/content/test_invoice.png")
print("Re-extracted text from invoice with chi_sim:")
print(extracted_text)  # unstructured text

3.2 Structured data output (make the structure explicit: schema, prompts)

import json
from typing import Any, Dict

invoice_schema = {
    "type": "object",
    "required": ["invoice_number", "date", "total_amount"],
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"}
            }
        },
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"}
                }
            }
        }
    }
}

# Re-define prompt_llm_for_json prompts
@register_tool()
def prompt_llm_for_json(
    action_context: ActionContext,
    schema: Dict[str, Any],
    prompt: str
) -> Dict[str, Any]:
    """
    Have the LLM generate JSON in response to a prompt.
    Always use this tool when you need structured data out of the LLM.

    Args:
        action_context: Agent runtime context
        schema: JSON schema defining the expected structure
        prompt: The prompt to send to the LLM

    Returns:
        A dictionary matching the provided schema
    """
    generate_response = action_context.get("llm")
    if generate_response is None:
        raise RuntimeError("LLM generator not found in action_context")

    system_prompt = (
        "You MUST produce output that strictly adheres to the following JSON schema.\n"
        "Do not include any extra text.\n"
        "Output ONLY valid JSON wrapped in a ```json code block.\n\n"
        f"{json.dumps(schema, indent=4)}"
    )
    # ... (remainder of the function omitted here; see the full code linked at the end)
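The listing above stops once the system prompt is built. One plausible next step, given that the prompt asks for JSON wrapped in a json code fence, is to call the model and parse the fenced payload. The sketch below shows only that parsing step; `extract_json_block` and `fake_llm` are hypothetical names, and the two-argument call signature of the model function is an assumption, since the original does not show how `generate_response` is invoked.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def extract_json_block(reply: str) -> Dict[str, Any]:
    """Hypothetical helper: parse the first fenced json block, else the whole reply."""
    match = re.search(FENCE + r"json\s*(.*?)" + FENCE, reply, re.DOTALL)
    payload = match.group(1) if match else reply
    return json.loads(payload)

def fake_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for a real model call; always returns a canned fenced reply."""
    return f'{FENCE}json\n{{"invoice_number": "INV-001", "total_amount": 12.5}}\n{FENCE}'

reply = fake_llm("Output ONLY valid JSON ...", "Extract the invoice fields.")
data = extract_json_block(reply)
print(data["invoice_number"], data["total_amount"])
```

Falling back to parsing the whole reply covers the case where the model returns bare JSON without a fence; a production version would likely also catch `json.JSONDecodeError` and retry.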

# Extracted structured invoice data (structured output):
{
  "invoice_number": "2511700000031440xxxx",
  "date": "2025-03-18",
  "total_amount": 35.55,
  "vendor": {
    "name": "北京xxxx有限公司",
    "address": null
  },
  "line_items": [
    {
      "description": "运输服务*客运服务费",
      "quantity": 1,
      "unit_price": 37.87,
      "total": 37.87
    },
    {
      "description": "运输服务*客运服务费",
      "quantity": null,
      "unit_price": null,
      "total": -3.36
    }
  ]
}
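Since OCR and LLM extraction can both introduce errors, it may be worth sanity-checking the structured output before it reaches the decision step. The sketch below is not part of the original pipeline: `check_invoice` is a hypothetical helper that verifies the schema's required fields and compares the sum of the line items with `total_amount`. On the sample output above, the line items sum to 34.51 while `total_amount` is 35.55; the cause cannot be determined from the output alone, but this is the kind of mismatch such a check would surface.

```python
from typing import Any, Dict, List

def check_invoice(invoice: Dict[str, Any], required: List[str], tolerance: float = 0.01) -> List[str]:
    """Hypothetical helper: return human-readable problems (empty list if none)."""
    problems = [f"missing field: {key}" for key in required if invoice.get(key) is None]
    totals = [item["total"] for item in invoice.get("line_items", []) if item.get("total") is not None]
    if totals and invoice.get("total_amount") is not None:
        line_sum = round(sum(totals), 2)
        if abs(line_sum - invoice["total_amount"]) > tolerance:
            problems.append(
                f"line items sum to {line_sum}, but total_amount is {invoice['total_amount']}"
            )
    return problems

# Values copied from the sample output above (vendor omitted for brevity).
invoice = {
    "invoice_number": "2511700000031440xxxx",
    "date": "2025-03-18",
    "total_amount": 35.55,
    "line_items": [{"total": 37.87}, {"total": -3.36}],
}
problems = check_invoice(invoice, ["invoice_number", "date", "total_amount"])
for problem in problems:
    print(problem)
```

Problems found this way could be stored in the context alongside the invoice so that the evaluation step, or a human reviewer, can take them into account.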

3.3 Incorporating the core business-logic requirements (core)

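# Company Invoice Reimbursement Policy
## General Principles
1. Business relevance: all expenses must be directly related to company business.
2. Timeliness: reimbursement requests must be submitted within 30 days of the expense being incurred.
3. Receipt requirement: every reimbursement must include the original paper invoice or an electronic invoice.
## Expense Categories and Detailed Rules
XXX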
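The point of keeping the policy as a document is that editing it changes behavior without touching code. A minimal sketch of that idea, assuming the policy lives in a text file that is re-read on every evaluation: the file name, `load_rules`, `build_evaluation_prompt`, and the 60-day edit are all hypothetical and used only for illustration.

```python
import tempfile
from pathlib import Path

def load_rules(path: Path) -> str:
    """Read the policy text fresh on every call (no caching)."""
    return path.read_text(encoding="utf-8")

def build_evaluation_prompt(rules: str, invoice_json: str) -> str:
    """Hypothetical prompt layout: rules first, then the invoice to judge."""
    return f"Reimbursement rules:\n{rules}\n\nInvoice:\n{invoice_json}\n"

with tempfile.TemporaryDirectory() as tmp:
    policy = Path(tmp) / "reimbursement_policy.md"
    policy.write_text("Submit within 30 days.", encoding="utf-8")
    first = build_evaluation_prompt(load_rules(policy), '{"total_amount": 35.55}')

    # Edit the document only; no Python code changes.
    policy.write_text("Submit within 60 days.", encoding="utf-8")
    second = build_evaluation_prompt(load_rules(policy), '{"total_amount": 35.55}')

print("30 days" in first, "60 days" in second)
```

The trade-off is that each decision now depends on whatever the file contains at that moment, so versioning the policy document becomes as important as versioning code.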

3.4 Decision and reasoning driven by prompts (e.g., the invoice reimbursement result), plus the business-level justification (structured)

evaluation_schema = {  # require structured output
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "description": "The category of the expense based on the rules.",
            "enum": [
                "Meals and Entertainment",
                "Office Supplies",
                "Travel Expenses",
                "Software Licenses",
                "Training and Professional Development",
                "Miscellaneous",
                "Non-Reimbursable"
            ]
        }
        # ... (remaining properties omitted here; see the full code linked at the end)
    }
}

@register_tool(tags=["invoice_processing", "reimbursement_evaluation"])
def evaluate_invoice_reimbursement(
    action_context: ActionContext,
    invoice_data: dict,
    reimbursement_rules: str
) -> dict:
    """
    Evaluates invoice data against defined reimbursement rules using an LLM,
    categorizing the expense and determining its reimbursability.

    Args:
        action_context: The agent's runtime context.
        invoice_data: A dictionary of the extracted invoice details.
        reimbursement_rules: A string containing the natural language reimbursement rules.

    Returns:
        A dictionary containing the expense category, reimbursability status,
        and an explanation, structured according to a defined schema.
    """
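The body of this tool is not shown here. As a hedged sketch of what such an evaluation could look like, the code below combines the rules and the invoice into one prompt, asks a model for JSON, and rejects categories outside the allowed list. `evaluate_sketch` and `stub_llm` are hypothetical, and the assumption that `action_context["llm"]` is a callable taking one prompt string and returning a JSON string is mine, not the original's.

```python
import json
from typing import Any, Callable, Dict

# Category names copied from the evaluation schema in this section.
ALLOWED_CATEGORIES = {
    "Meals and Entertainment", "Office Supplies", "Travel Expenses",
    "Software Licenses", "Training and Professional Development",
    "Miscellaneous", "Non-Reimbursable",
}

def evaluate_sketch(action_context: Dict[str, Any], invoice_data: dict, reimbursement_rules: str) -> dict:
    """Illustrative only: prompt an LLM with rules + invoice and validate the reply."""
    llm: Callable[[str], str] = action_context["llm"]  # assumed signature
    prompt = (
        "Classify the invoice and decide whether it is reimbursable.\n"
        f"Rules:\n{reimbursement_rules}\n\n"
        f"Invoice:\n{json.dumps(invoice_data, ensure_ascii=False)}\n"
        'Reply with JSON containing "category", "reimbursable", "explanation".'
    )
    result = json.loads(llm(prompt))
    if result.get("category") not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {result.get('category')!r}")
    return result

def stub_llm(prompt: str) -> str:
    """Canned reply so the sketch runs offline; a real model call goes here."""
    return json.dumps({"category": "Travel Expenses", "reimbursable": True, "explanation": "stubbed"})

result = evaluate_sketch({"llm": stub_llm}, {"total_amount": 35.55}, "All expenses must be business related.")
print(result["category"], result["reimbursable"])
```

Note that the rules arrive as a plain string argument: the function contains no reimbursement logic of its own, which is what lets the policy document drive the decision.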

Business result: reimbursable, categorized as travel expenses, with the reasoning below.
Invoice Reimbursement Evaluation:
{
  "category": "Travel Expenses",
  "reimbursable": true,
  "explanation": "The invoice is for a 'Transport Service*Passenger Service Fee' from a transportation vendor, which falls under Ground Transportation in the Travel Expenses category (Rule 3). The expense is directly related to company business as it is a transportation service, satisfying the General Principle 1. The total amount is $35.55, which is under the $200 threshold requiring only manager approval (Approval Requirements). There is no indication of personal items, traffic fines, or other non-reimbursable elements. The invoice includes a line item and a discount/credit adjustment, resulting in the net total. As a standard ground transportation expense for business purposes, it is reimbursable provided it was submitted within 30 days (General Principle 2) and has the required original receipt/digital invoice (General Principle 3)."
}

3.5 State storage: results saved in ActionContext can also be used by the next tool

# In the Agent flow, this data can be stored in the context for the next tool to use.
context['invoice_details'] = invoice_data
context['reimbursement_evaluation'] = evaluation_result

print("Data successfully stored in ActionContext:")
print("--- stored in context['invoice_details'] ---")
print(json.dumps(context['invoice_details'], ensure_ascii=False, indent=2))
print("\n--- stored in context['reimbursement_evaluation'] ---")
print(json.dumps(context['reimbursement_evaluation'], ensure_ascii=False, indent=2))

# Use by other Agents or tools:
# Suppose there is another tool named 'process_further' that receives this context
# def process_further(action_context: ActionContext):
#     invoice_details_from_context = action_context.get('invoice_details')
#     evaluation_from_context = action_context.get('reimbursement_evaluation')
#     print(f"\nInvoice number accessed in another tool: {invoice_details_from_context.get('invoice_number')}")
#     print(f"Reimbursement status accessed in another tool: {evaluation_from_context.get('reimbursable')}")
#     # further processing...

# # Suppose we call this tool
# process_further(context)
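For the persistence step that closes the loop, the context contents eventually need to land in a database. The sketch below uses an in-memory SQLite database from the standard library purely as an illustration; the table name `reimbursements`, its columns, and the `persist` helper are hypothetical, and the sample context values are placeholders rather than the real run's output.

```python
import json
import sqlite3

# Placeholder context with the same two keys used above.
context = {
    "invoice_details": {"invoice_number": "INV-001", "total_amount": 35.55},
    "reimbursement_evaluation": {"category": "Travel Expenses", "reimbursable": True},
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reimbursements ("
    "invoice_number TEXT PRIMARY KEY, category TEXT, reimbursable INTEGER, payload TEXT)"
)

def persist(conn: sqlite3.Connection, ctx: dict) -> None:
    """Hypothetical helper: store key fields plus the full context as JSON."""
    invoice = ctx["invoice_details"]
    evaluation = ctx["reimbursement_evaluation"]
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO reimbursements VALUES (?, ?, ?, ?)",
            (
                invoice["invoice_number"],
                evaluation["category"],
                int(evaluation["reimbursable"]),
                json.dumps(ctx, ensure_ascii=False),
            ),
        )

persist(conn, context)
row = conn.execute(
    "SELECT invoice_number, category, reimbursable FROM reimbursements"
).fetchone()
print(row)
```

Keeping the full context as a JSON payload next to a few indexed columns preserves the model's explanation for later audit while still allowing simple queries by category or status.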

Thanks to Google Colab.

Full code 🔗 https://github.com/LEE2020/DocAsExec-Agent#