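Today we will use an everyday office application, a content-processing and decision-making agent, to get a real feel for what "document as execution" means.
Technical Implementation: Building a "Document-Driven" Agent
1. Infrastructure: Tool Registry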
# Core: decouple the logic through a Tool Registry
from functools import wraps

TOOL_REGISTRY = {}

def register_tool(name=None, tags=None):
    def decorator(func):
        tool_name = name or func.__name__

        # Wrap the function while preserving its metadata
        @wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        # Inject the tool into the registry so the Agent can discover it dynamically
        TOOL_REGISTRY[tool_name] = wrapper
        if tags:
            wrapper.tags = tags
        return wrapper
    return decorator
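As a usage sketch, the block below repeats the decorator (so it runs on its own) and adds a hypothetical `find_tools_by_tag` helper, showing how an Agent could discover tools by tag and then call one by its registered name. The two example tools are placeholders, not part of the original project.

```python
from functools import wraps

TOOL_REGISTRY = {}

def register_tool(name=None, tags=None):
    # Same decorator as above, repeated so this block is self-contained
    def decorator(func):
        tool_name = name or func.__name__
        @wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        TOOL_REGISTRY[tool_name] = wrapper
        if tags:
            wrapper.tags = tags
        return wrapper
    return decorator

def find_tools_by_tag(tag):
    """Hypothetical helper: sorted names of registered tools carrying `tag`."""
    return sorted(name for name, tool in TOOL_REGISTRY.items()
                  if tag in getattr(tool, "tags", []))

@register_tool(tags=["invoice_processing"])
def extract_text(path):
    return f"text from {path}"

@register_tool(name="evaluate", tags=["invoice_processing", "reimbursement_evaluation"])
def evaluate_invoice(data):
    return {"reimbursable": True}

print(find_tools_by_tag("invoice_processing"))  # ['evaluate', 'extract_text']
print(TOOL_REGISTRY["extract_text"]("a.png"))   # text from a.png
```

Tools without tags simply have no `tags` attribute, which is why the helper uses `getattr` with a default.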
2. Context Protocol: State Management with ActionContext
class ActionContext(dict):
    """Carries intermediate state and variable references across the execution flow."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
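A minimal sketch of how the registry and the context could work together, assuming the Agent always passes the shared `ActionContext` as a tool's first argument. `run_tool`, `store_amount` and `needs_review` are hypothetical names for illustration, and the thresholds are arbitrary; the point is that the second tool reads state written by the first instead of receiving it as an argument.

```python
from functools import wraps

TOOL_REGISTRY = {}

def register_tool(name=None):
    """Simplified copy of the registry decorator (tags omitted) so this block runs alone."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        TOOL_REGISTRY[name or func.__name__] = wrapper
        return wrapper
    return decorator

class ActionContext(dict):
    """Carries intermediate state and variable references across the execution flow."""

def run_tool(tool_name, action_context, **kwargs):
    """Hypothetical dispatcher: look a tool up by name and inject the shared context."""
    tool = TOOL_REGISTRY.get(tool_name)
    if tool is None:
        raise KeyError(f"Unknown tool: {tool_name}")
    return tool(action_context, **kwargs)

@register_tool()
def store_amount(action_context, amount):
    # Writes state for later tools
    action_context["total_amount"] = amount

@register_tool()
def needs_review(action_context, threshold):
    # Reads state written by an earlier tool instead of taking it as an argument
    return action_context["total_amount"] > threshold

ctx = ActionContext()
run_tool("store_amount", ctx, amount=35.55)
print(run_tool("needs_review", ctx, threshold=100))  # False
```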
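- Perception layer (OCR): a vision model or OCR engine performs pixel-level recognition of the raw content (e.g., an invoice).
- Data structuring: the LLM's ability to follow a schema is used to turn the unstructured text into standard JSON, completing feature extraction.
- Dynamic logic matching (the core): this is where "document as execution" shows. The Agent reads the latest reimbursement policy document at run time and uses semantic reasoning to classify the expense and make the compliance decision. When the business-logic document changes, nothing in the logic-layer implementation needs to be modified.
- Result feedback: the Agent returns the verdict together with its decision process (the reasoning behind each business-rule judgment).
- Data persistence: the structured result is written to the business database, closing the loop.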
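The five steps can be read as a fixed orchestration in which only the policy text is meant to change. The sketch below wires them together through injected callables; `run_pipeline` and the stub steps are hypothetical stand-ins for the OCR, LLM and database calls described above, and the stubs only record the order in which they run.

```python
from typing import Any, Callable, Dict

def run_pipeline(image_path: str,
                 ocr: Callable[[str], str],
                 structure: Callable[[str], Dict[str, Any]],
                 load_rules: Callable[[], str],
                 evaluate: Callable[[Dict[str, Any], str], Dict[str, Any]],
                 persist: Callable[[Dict[str, Any], Dict[str, Any]], None]) -> Dict[str, Any]:
    """Hypothetical orchestration of the five steps; each step is injected as a callable."""
    raw_text = ocr(image_path)          # perception: pixels -> raw text
    invoice = structure(raw_text)       # structuring: raw text -> JSON-like dict
    rules = load_rules()                # dynamic matching: fetch the current policy text
    verdict = evaluate(invoice, rules)  # dynamic matching: reason over rules + invoice
    persist(invoice, verdict)           # persistence: write both to storage
    return verdict                      # feedback: verdict plus explanation for the caller

calls = []

def stub(name, result):
    # Builds a fake step that records its name and returns a fixed result
    def step(*args):
        calls.append(name)
        return result
    return step

verdict = run_pipeline(
    "invoice.png",
    ocr=stub("ocr", "raw text"),
    structure=stub("structure", {"total_amount": 35.55}),
    load_rules=stub("load_rules", "policy text"),
    evaluate=stub("evaluate", {"reimbursable": True, "explanation": "stub"}),
    persist=stub("persist", None),
)
print(calls)  # ['ocr', 'structure', 'load_rules', 'evaluate', 'persist']
```

Injecting the steps keeps the orchestration free of business rules: swapping the policy source or the LLM does not touch `run_pipeline`.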
from PIL import Image
import pytesseract

def extract_text_from_invoice(image_path: str) -> str:
    """Read the text content from an invoice image.

    Args:
        image_path: Path to the invoice image (jpg/png). Pillow cannot open PDF
            files, so convert PDF invoices to images first.

    Returns:
        The extracted text string
    """
    image = Image.open(image_path)
    # Optional optimization: grayscale + binarization
    # image = image.convert('L')  # convert to grayscale
    # image = image.point(lambda x: 0 if x < 128 else 255)  # simple binarization
    # lang='chi_sim' selects the Simplified Chinese model (its traineddata must be installed)
    text = pytesseract.image_to_string(image, lang='chi_sim')
    return text

extracted_text = extract_text_from_invoice("/content/test_invoice.png")
print("Re-extracted text from invoice with chi_sim:")
print(extracted_text)  # unstructured text
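Tesseract's `chi_sim` output often contains spurious spaces between Chinese characters, which adds noise for the structuring step that follows. A small post-processing sketch (the helper `clean_ocr_text` is hypothetical) that removes whitespace only when it sits between two CJK characters or full-width punctuation marks, leaving Latin text and numbers alone:

```python
import re

# Basic CJK Unified Ideographs, CJK symbols/punctuation, and full-width forms
_CJK = r"[\u4e00-\u9fff\u3000-\u303f\uff00-\uffef]"
# Spaces or tabs with a CJK character on both sides
_CJK_GAP = re.compile(rf"(?<={_CJK})[ \t]+(?={_CJK})")

def clean_ocr_text(text: str) -> str:
    """Remove spaces between CJK characters, trim each line, and drop blank lines."""
    lines = [_CJK_GAP.sub("", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)

print(clean_ocr_text("发 票 号 码 ： 2511\n\n 运 输 服 务 "))
```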
3.2 Structured Data Output (making the structure explicit: schema and prompts)
import json
from typing import Any, Dict

# register_tool and ActionContext are defined in sections 1 and 2

invoice_schema = {
    "type": "object",
    "required": ["invoice_number", "date", "total_amount"],
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "address": {"type": "string"}
            }
        },
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"}
                }
            }
        }
    }
}

# Redefine prompt_llm_for_json
@register_tool()
def prompt_llm_for_json(action_context: ActionContext,
                        schema: Dict[str, Any],
                        prompt: str) -> Dict[str, Any]:
    """Have the LLM generate JSON in response to a prompt.

    Always use this tool when you need structured data out of the LLM.

    Args:
        action_context: Agent runtime context
        schema: JSON schema defining the expected structure
        prompt: The prompt to send to the LLM

    Returns:
        A dictionary matching the provided schema
    """
    generate_response = action_context.get("llm")
    if generate_response is None:
        raise RuntimeError("LLM generator not found in action_context")

    system_prompt = (
        "You MUST produce output that strictly adheres to the following JSON schema.\n"
        "Do not include any extra text.\n"
        "Output ONLY valid JSON wrapped in a ```json code block.\n\n"
        f"{json.dumps(schema, indent=4)}"
    )
    # ... (the remainder of the function is not shown in the original)
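The original stops after building `system_prompt`. Since that prompt asks for JSON wrapped in a json-tagged code fence, the reply has to be unwrapped and parsed before it can be used. A hedged sketch of that step with a hypothetical `parse_llm_json` helper, which also checks the schema's `required` keys; how the LLM itself is called is not shown in the source, so it is left out here.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # three backticks, built at runtime so the snippet embeds cleanly in Markdown
# Optional "json" tag after the opening fence, then the payload up to the closing fence
_JSON_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)

def parse_llm_json(response: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: unwrap a fenced JSON reply, parse it, check required keys."""
    match = _JSON_BLOCK.search(response)
    payload = match.group(1) if match else response.strip()  # fall back to the raw reply
    data = json.loads(payload)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"LLM output is missing required fields: {missing}")
    return data

reply = (FENCE + 'json\n{"invoice_number": "2511700000031440xxxx", '
         '"date": "2025-03-18", "total_amount": 35.55}\n' + FENCE)
print(parse_llm_json(reply, {"required": ["invoice_number", "date", "total_amount"]}))
```

Failing loudly on missing required fields lets the Agent retry the LLM call instead of passing an incomplete record to the decision step.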
# Extracted structured invoice data (structured output):
{
  "invoice_number": "2511700000031440xxxx",
  "date": "2025-03-18",
  "total_amount": 35.55,
  "vendor": {
    "name": "北京xxxx有限公司",
    "address": null
  },
  "line_items": [
    {
      "description": "运输服务*客运服务费",
      "quantity": 1,
      "unit_price": 37.87,
      "total": 37.87
    },
    {
      "description": "运输服务*客运服务费",
      "quantity": null,
      "unit_price": null,
      "total": -3.36
    }
  ]
}
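Checking the extracted data against the schema is a cheap safeguard before it reaches the decision step. The sketch below implements only a small subset of JSON Schema (`type`, `required`, `properties`, `items`); `check_schema` is a hypothetical helper, and a full validator such as the `jsonschema` package is the more complete choice. Applied to the output above, it shows that the `null` values violate the declared `string`/`number` types; declaring `"type": ["string", "null"]` is the standard JSON Schema way to allow a missing value.

```python
from typing import Any, Dict, List

_PY_TYPES = {"object": dict, "array": list, "string": str, "boolean": bool}

def _type_ok(value: Any, expected: str) -> bool:
    if expected == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, _PY_TYPES[expected])

def check_schema(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Hypothetical helper: a small subset of JSON Schema (type/required/properties/items)."""
    expected = schema.get("type")
    if expected and not _type_ok(value, expected):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors: List[str] = []
    if expected == "object":
        errors += [f"{path}.{key}: missing" for key in schema.get("required", []) if key not in value]
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors += check_schema(value[key], sub_schema, f"{path}.{key}")
    elif expected == "array":
        for i, item in enumerate(value):
            errors += check_schema(item, schema.get("items", {}), f"{path}[{i}]")
    return errors

invoice_schema = {
    "type": "object",
    "required": ["invoice_number", "date", "total_amount"],
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "total_amount": {"type": "number"},
        "vendor": {"type": "object", "properties": {
            "name": {"type": "string"}, "address": {"type": "string"}}},
        "line_items": {"type": "array", "items": {"type": "object", "properties": {
            "description": {"type": "string"}, "quantity": {"type": "number"},
            "unit_price": {"type": "number"}, "total": {"type": "number"}}}},
    },
}

extracted = {
    "invoice_number": "2511700000031440xxxx", "date": "2025-03-18", "total_amount": 35.55,
    "vendor": {"name": "北京xxxx有限公司", "address": None},
    "line_items": [
        {"description": "运输服务*客运服务费", "quantity": 1, "unit_price": 37.87, "total": 37.87},
        {"description": "运输服务*客运服务费", "quantity": None, "unit_price": None, "total": -3.36},
    ],
}

for problem in check_schema(extracted, invoice_schema):
    print(problem)
```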
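# Company Invoice Reimbursement Policy
## General Principles:
1. Business relevance: all expenses must be directly related to company business.
2. Timeliness: reimbursement requests must be submitted within 30 days after the expense is incurred.
3. Documentation: every reimbursement requires an original paper invoice or an electronic invoice.
## Expense Categories and Detailed Rules
XXX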
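Because the policy lives in a plain document, reading it fresh on every evaluation is what lets a policy edit take effect without a code change. A minimal sketch, assuming the policy is stored as a UTF-8 Markdown file; `load_policy` and the file name are hypothetical, and the 15-day deadline in the usage is an invented example edit, not part of the real policy.

```python
import tempfile
from pathlib import Path

def load_policy(path: str) -> str:
    """Hypothetical helper: read the policy document fresh on every call (no caching),
    so the next evaluation always sees the current text."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"Policy document is empty: {path}")
    return text

with tempfile.TemporaryDirectory() as tmp:
    policy_path = Path(tmp) / "reimbursement_policy.md"
    policy_path.write_text("## General Principles:\n2. Timeliness: submit within 30 days.", encoding="utf-8")
    before = load_policy(str(policy_path))
    # Invented example edit: the deadline is tightened in the document only, no code changes
    policy_path.write_text("## General Principles:\n2. Timeliness: submit within 15 days.", encoding="utf-8")
    after = load_policy(str(policy_path))

print(before.splitlines()[-1])  # 2. Timeliness: submit within 30 days.
print(after.splitlines()[-1])   # 2. Timeliness: submit within 15 days.
```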
evaluation_schema = {  # require structured output
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "description": "The category of the expense based on the rules.",
            "enum": [
                "Meals and Entertainment",
                "Office Supplies",
                "Travel Expenses",
                "Software Licenses",
                "Training and Professional Development",
                "Miscellaneous",
                "Non-Reimbursable"
            ]
        },
        # ... (the remaining properties are not shown in the original)
    }
}
@register_tool(tags=["invoice_processing", "reimbursement_evaluation"])
def evaluate_invoice_reimbursement(action_context: ActionContext,
                                   invoice_data: dict,
                                   reimbursement_rules: str) -> dict:
    """Evaluates invoice data against defined reimbursement rules using an LLM,
    categorizing the expense and determining its reimbursability.

    Args:
        action_context: The agent's runtime context.
        invoice_data: A dictionary of the extracted invoice details.
        reimbursement_rules: A string containing the natural language reimbursement rules.

    Returns:
        A dictionary containing the expense category, reimbursability status,
        and an explanation, structured according to a defined schema.
    """
    # ... (function body not shown in the original)
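The body of `evaluate_invoice_reimbursement` is not shown. One possible shape is sketched below under explicit assumptions: `action_context["llm"]` is taken to be a callable `llm(system_prompt, user_prompt)` that returns raw JSON text (the source does not specify this interface), and the returned category is checked against the `enum` from `evaluation_schema`, so an invented category is rejected rather than passed on. The registry decorator is omitted to keep the block self-contained, and the fake LLM in the usage is purely illustrative.

```python
import json
from typing import Any, Dict

# The category enum from evaluation_schema
CATEGORIES = {
    "Meals and Entertainment", "Office Supplies", "Travel Expenses",
    "Software Licenses", "Training and Professional Development",
    "Miscellaneous", "Non-Reimbursable",
}

class ActionContext(dict):
    """Carries intermediate state across the execution flow."""

def evaluate_invoice_reimbursement(action_context: ActionContext,
                                   invoice_data: Dict[str, Any],
                                   reimbursement_rules: str) -> Dict[str, Any]:
    # Assumption: "llm" is a callable llm(system_prompt, user_prompt) -> raw JSON text
    llm = action_context.get("llm")
    if llm is None:
        raise RuntimeError("LLM generator not found in action_context")
    system_prompt = (
        "Classify the expense using ONLY the reimbursement rules below. Reply with JSON "
        "containing category, reimbursable (boolean) and explanation.\n\n"
        + reimbursement_rules
    )
    result = json.loads(llm(system_prompt, json.dumps(invoice_data, ensure_ascii=False)))
    # Guardrails: reject categories the policy does not define, and non-boolean verdicts
    if result.get("category") not in CATEGORIES:
        raise ValueError(f"Unexpected category: {result.get('category')!r}")
    if not isinstance(result.get("reimbursable"), bool):
        raise ValueError("'reimbursable' must be a boolean")
    return result

seen = {}

def fake_llm(system_prompt, user_prompt):
    # Records the prompt and returns a canned answer
    seen["system"] = system_prompt
    return '{"category": "Travel Expenses", "reimbursable": true, "explanation": "Ground transportation"}'

ctx = ActionContext(llm=fake_llm)
rules = "3. Documentation: every reimbursement requires an original paper invoice or an electronic invoice."
evaluation = evaluate_invoice_reimbursement(ctx, {"total_amount": 35.55}, rules)
print(evaluation["category"], evaluation["reimbursable"])  # Travel Expenses True
```

The rules travel in the prompt as plain text, so the same function serves any version of the policy document.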
Business result: reimbursable, in the Travel Expenses category, together with the reasoning:

Invoice Reimbursement Evaluation:
{
  "category": "Travel Expenses",
  "reimbursable": true,
  "explanation": "The invoice is for a 'Transport Service*Passenger Service Fee' from a transportation vendor, which falls under Ground Transportation in the Travel Expenses category (Rule 3). The expense is directly related to company business as it is a transportation service, satisfying the General Principle 1. The total amount is $35.55, which is under the $200 threshold requiring only manager approval (Approval Requirements). There is no indication of personal items, traffic fines, or other non-reimbursable elements. The invoice includes a line item and a discount/credit adjustment, resulting in the net total. As a standard ground transportation expense for business purposes, it is reimbursable provided it was submitted within 30 days (General Principle 2) and has the required original receipt/digital invoice (General Principle 3)."
}
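The explanation can only call the invoice reimbursable *provided* it was submitted within 30 days, because the model never sees the submission date. Hard numeric rules like this one can be checked deterministically alongside the LLM verdict. A sketch with a hypothetical `within_submission_window` helper, assuming the invoice date is the date the expense was incurred, that day 30 itself still counts as on time, and that the caller knows the submission date:

```python
from datetime import date, timedelta

SUBMISSION_WINDOW_DAYS = 30  # General Principle 2 of the policy

def within_submission_window(invoice_date: str, submitted_on: date,
                             window_days: int = SUBMISSION_WINDOW_DAYS) -> bool:
    """Hypothetical helper: True if submitted on or after the invoice date and
    no more than `window_days` later (day 30 itself counts as on time)."""
    incurred = date.fromisoformat(invoice_date)
    return incurred <= submitted_on <= incurred + timedelta(days=window_days)

evaluation = {"category": "Travel Expenses", "reimbursable": True}  # LLM verdict from above
submitted = date(2025, 4, 20)  # hypothetical submission date
final_reimbursable = evaluation["reimbursable"] and within_submission_window("2025-03-18", submitted)
print(final_reimbursable)  # False: 2025-04-20 is 33 days after 2025-03-18
```

Splitting the work this way leaves semantic judgments (category, business relevance) to the LLM and keeps arithmetic rules in code, where they cannot be misread.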
# In the Agent flow, this data can be stored in the context for the next tool to use.
# Assumes `context` is the ActionContext used throughout, and that invoice_data and
# evaluation_result hold the outputs of the two previous steps.
import json

context['invoice_details'] = invoice_data
context['reimbursement_evaluation'] = evaluation_result

print("Data successfully stored in ActionContext:")
print("--- Stored in context['invoice_details'] ---")
print(json.dumps(context['invoice_details'], ensure_ascii=False, indent=2))
print("\n--- Stored in context['reimbursement_evaluation'] ---")
print(json.dumps(context['reimbursement_evaluation'], ensure_ascii=False, indent=2))

# Usage by other Agents or tools:
# Suppose there is another tool named 'process_further' that receives this context
# def process_further(action_context: ActionContext):
#     invoice_details_from_context = action_context.get('invoice_details')
#     evaluation_from_context = action_context.get('reimbursement_evaluation')
#     print(f"\nInvoice number accessed from another tool: {invoice_details_from_context.get('invoice_number')}")
#     print(f"Reimbursement status accessed from another tool: {evaluation_from_context.get('reimbursable')}")
#     # further processing...
#
# # Suppose this tool is then called
# process_further(context)
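The last step of the flow, persisting the structured result to the business database, is not shown in the source. A sketch using the standard-library `sqlite3` module with an in-memory database; the `reimbursements` table and its columns are hypothetical, and a real deployment would target its own database and schema. `INSERT OR REPLACE` keyed on the invoice number keeps re-runs idempotent.

```python
import json
import sqlite3
from typing import Any, Dict

def persist_result(conn: sqlite3.Connection, invoice: Dict[str, Any],
                   evaluation: Dict[str, Any]) -> None:
    """Write one invoice and its evaluation; re-running with the same invoice number overwrites the row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """CREATE TABLE IF NOT EXISTS reimbursements (
                   invoice_number TEXT PRIMARY KEY,
                   invoice_date   TEXT,
                   total_amount   REAL,
                   category       TEXT,
                   reimbursable   INTEGER,
                   explanation    TEXT,
                   invoice_json   TEXT)"""
        )
        conn.execute(
            "INSERT OR REPLACE INTO reimbursements VALUES (?, ?, ?, ?, ?, ?, ?)",
            (invoice["invoice_number"], invoice["date"], invoice["total_amount"],
             evaluation["category"], int(evaluation["reimbursable"]),
             evaluation["explanation"], json.dumps(invoice, ensure_ascii=False)),
        )

conn = sqlite3.connect(":memory:")
invoice = {"invoice_number": "2511700000031440xxxx", "date": "2025-03-18", "total_amount": 35.55}
evaluation = {"category": "Travel Expenses", "reimbursable": True, "explanation": "Ground transportation"}
persist_result(conn, invoice, evaluation)
persist_result(conn, invoice, evaluation)  # idempotent re-run

count = conn.execute("SELECT COUNT(*) FROM reimbursements").fetchone()[0]
row = conn.execute("SELECT category, reimbursable FROM reimbursements WHERE invoice_number = ?",
                   (invoice["invoice_number"],)).fetchone()
print(count, row)  # 1 ('Travel Expenses', 1)
```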
夜雨聆风