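[Original: Splunk AI Threat Hunting Add-on Development Tutorial 13] Day 13: Cost Accounting (Extracting API Token Costs in a Real Environment)

🚀 Day 13: Cost Accounting (Extracting API Token Costs in a Real Environment)

Today's goal: LLM vendors report token consumption in entirely different formats (some in usage.total_tokens, some split into input and output counts, some even tucked into HTTP headers), so we will write a robust, polymorphic extraction function (Universal Token Extractor). Whichever model you connect to in the future, the real cost of every AI hunt gets recorded in Splunk, where it can be audited.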

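💻 Architecture outline: how will we refactor the "billing engine" today?

1. Retire the naive lookup: remove the fragile earlier approach that relied solely on response_json.get("usage", {}).get("total_tokens", 0).

2. Multi-vendor compatible dictionary traversal:

• OpenAI / Alibaba / DeepSeek family: read usage.total_tokens.

• Anthropic / Claude family: compute usage.input_tokens + usage.output_tokens on the fly.

• Gateway / proxy family: read fields such as x-token-usage from the HTTP response headers.

3. Graceful degradation: even if a vendor overhauls its API and extraction fails, a try-except fallback returns 0. A billing failure must never crash the core security blocking workflow.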
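To make the three response shapes in the outline concrete, here is a minimal sketch. The sample payloads and the helper name describe_usage_shape are hypothetical and exist only for illustration; the token numbers are made up. It names which strategy would apply and does not extract anything.

```python
# Hypothetical sample payloads; the numbers are made up for illustration.
openai_style = {"usage": {"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150}}
anthropic_style = {"usage": {"input_tokens": 120, "output_tokens": 30}}
gateway_style = {}  # the body carries no usage block at all
gateway_headers = {"x-token-usage": "150"}


def describe_usage_shape(body, headers):
    """Name which of the three outlined strategies would apply."""
    usage = body.get("usage", {}) if isinstance(body, dict) else {}
    if "total_tokens" in usage:
        return "total"
    if "input_tokens" in usage and "output_tokens" in usage:
        return "split"
    if any("token-usage" in key.lower() for key in (headers or {})):
        return "header"
    return "unknown"


print(describe_usage_shape(openai_style, {}))                # total
print(describe_usage_shape(anthropic_style, {}))             # split
print(describe_usage_shape(gateway_style, gateway_headers))  # header
print(describe_usage_shape({}, {}))                          # unknown
```

The "unknown" case is the one that graceful degradation has to cover: the extractor should report 0 rather than raise.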

💻 Hands-on: Day 13 FinOps billing edition, full code

Open the Define & Test editor in Add-on Builder and overwrite the existing code with the following.

import os
import sys
import time
import datetime
import json
import uuid
import requests
import splunklib.client as client
import splunklib.results as results

# ==========================================
# HELPER 1: Execute AI Generated SPL
# ==========================================
def execute_ai_spl(helper, service, spl_query):
    """
    Execute SPL generated by AI and return the raw result data.
    """
    spl_query = spl_query.strip()
    # Force the 'search' prefix to prevent syntax errors
    if not spl_query.startswith("search") and not spl_query.startswith("|"):
        spl_query = "search " + spl_query
    kwargs_oneshot = {"output_mode": "json"}
    helper.log_info(f"[Agentic Engine] Executing SPL: {spl_query}")
    try:
        search_results = service.jobs.oneshot(spl_query, **kwargs_oneshot)
        reader = results.JSONResultsReader(search_results)
        # Keep only result rows; the reader also yields diagnostic Message objects
        result_data = [res for res in reader if isinstance(res, dict)]
        helper.log_info(f"[Agentic Engine] SUCCESS: Found {len(result_data)} events.")
        return result_data
    except Exception as e:
        helper.log_error(f"[Agentic Engine] FAILED execution: {str(e)}")
        return []

# ==========================================
# HELPER 2: Fetch Real Logs (M-ATH Concept)
# ==========================================
def fetch_rare_logs(helper, service, target_index):
    """
    Fetch the most recent rare/anomalous logs from the target index to feed the AI.
    """
    helper.log_info("Fetching real rare logs for analysis...")
    # Fetching fresh data. Use cluster only if CPU permits, otherwise use head.
    spl = f"search index={target_index} | head 5 | table _raw"
    try:
        results_data = execute_ai_spl(helper, service, spl)
        if not results_data:
            return None
        # Extract the _raw strings and join them into a single text payload
        raw_logs = [item.get("_raw", "") for item in results_data if "_raw" in item]
        payload = "\n".join(raw_logs)
        # =========================================================================
        # Context Distillation (Payload Truncation)
        # Prevents massive Splunk logs from blowing up the LLM Context Window
        # =========================================================================
        MAX_CHARS = 6000  # Roughly equals 1500 Tokens
        if len(payload) > MAX_CHARS:
            helper.log_info(f"Payload too large ({len(payload)} chars). Truncating to {MAX_CHARS}...")
            # Slice the string and append a clear signal for the LLM
            payload = payload[:MAX_CHARS] + "\n\n...[TRUNCATED DUE TO CONTEXT LIMITS. ANALYZE AVAILABLE DATA ONLY.]..."
        # =========================================================================
        return payload
    except Exception as e:
        helper.log_error(f"Failed to fetch rare logs: {str(e)}")
        return None

# =========================================================================
# [DAY 13 NEW]: Universal Token Extractor (FinOps Cost Tracking)
# =========================================================================
def extract_token_usage(helper, response_json, response_headers):
    """
    Robustly extract token usage across different LLM providers and API gateways.
    Ensures FinOps tracking never crashes the main thread.
    """
    try:
        # Strategy 1: OpenAI / DeepSeek / DashScope standard format
        if "usage" in response_json:
            usage = response_json["usage"]
            if "total_tokens" in usage:
                return int(usage["total_tokens"])
            # Strategy 2: Anthropic-style or granular input/output split
            elif "prompt_tokens" in usage and "completion_tokens" in usage:
                return int(usage["prompt_tokens"]) + int(usage["completion_tokens"])
            elif "input_tokens" in usage and "output_tokens" in usage:
                return int(usage["input_tokens"]) + int(usage["output_tokens"])
        # Strategy 3: API Gateway headers (e.g., Azure, Cloudflare AI Gateway)
        # Iterate over items so the lookup also works with a plain, case-sensitive dict
        for key, value in response_headers.items():
            lowered = key.lower()
            if "token-usage" in lowered or "x-ratelimit-usage" in lowered:
                return int(value)
    except Exception as e:
        helper.log_error(f"[FinOps Warning] Failed to parse token usage correctly: {str(e)}")
    # Graceful degradation: Return 0 if extraction fails, ensuring pipeline survival
    return 0

# ==========================================
# HELPER 3: The LLM API Connector
# ==========================================
# Added dynamic 'max_tokens' parameter to function signature
def call_llm_api(helper, api_key, base_url, model, system_prompt, user_prompt, max_tokens):
    """
    Establish real HTTP connection to the LLM API and return the content and token count.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        # Mandatory flag for modern LLMs to strictly output JSON
        "response_format": {"type": "json_object"},
        # Hard output boundary (Token Circuit Breaker)
        "max_tokens": max_tokens
    }
    # Ensure URL formatting is correct
    endpoint = base_url if base_url.endswith("/chat/completions") else f"{base_url.rstrip('/')}/chat/completions"
    try:
        helper.log_info(f"Initiating network request to LLM API: {endpoint} (Max Tokens: {max_tokens})")
        # 120s timeout ensures deep-thinking models (CoT) have enough time
        response = requests.post(endpoint, headers=headers, json=payload, timeout=120)
        response.raise_for_status()
        response_json = response.json()
        llm_content = response_json["choices"][0]["message"]["content"]
        # =========================================================================
        # [DAY 13 MODIFIED]: Call the Universal Token Extractor
        # =========================================================================
        total_tokens = extract_token_usage(helper, response_json, response.headers)
        helper.log_info(f"API Call Success. FinOps Tracked: {total_tokens} tokens consumed.")
        return llm_content, total_tokens
    except requests.exceptions.RequestException as e:
        helper.log_error(f"Network error during API call: {str(e)}")
        raise

# ==========================================
# MAIN WORKFLOW: The Autonomous Agent
# ==========================================
def collect_events(helper, ew):
    """
    The Ultimate Live Workflow.
    Features: Real API Integration, Unix Epoch Time injection, Anti-Hallucination, Truncation, and FinOps Tracking.
    """
    helper.log_info("PEAK AI Hunter: LIVE MODE INITIALIZED.")
    cycle_start_time = time.time()
    # Generate a unique Session ID to stitch the flattened logs together
    hunt_session_id = str(uuid.uuid4())
    try:
        # 1. Acquire Splunk Service Session
        session_key = getattr(helper, 'session_key', None) or getattr(helper._input_definition, 'metadata', {}).get('session_key')
        if not session_key:
            raise ValueError("Failed to acquire session_key.")
        service = client.Service(token=session_key)
        # 2. Acquire Global Setup Configurations (API credentials)
        api_key = helper.get_global_setting("api_key")
        base_url = helper.get_global_setting("base_url")
        model_name = helper.get_global_setting("model_name")
        target_index = helper.get_output_index() or "main"
        if not api_key or not base_url:
            raise ValueError("API Key or Base URL is missing in Global Settings.")

        # ==========================================
        # PHASE 1: PREPARE (Real LLM Call for Blueprint)
        # ==========================================
        rare_logs_payload = fetch_rare_logs(helper, service, target_index)
        if not rare_logs_payload:
            helper.log_info("No anomalous logs found to analyze. Terminating cycle early gracefully.")
            return
        # Prompt Distillation - Forcing extreme conciseness
        sys_prompt_prepare = "You are a Senior Threat Hunter. You MUST reply in JSON format. Be extremely concise. No pleasantries. Schema requires: 'analysis' (string) and 'hypotheses' (array of objects). Each hypothesis must have 'hypothesis_id', 'ABLE' (Actor, Behavior, Location, Evidence), 'spl_round_1_validation', and 'spl_round_2_drilldown'."
        # ANTI-HALLUCINATION FIX: Forcing the LLM to strictly use the {target_index} placeholder
        # ({{target_index}} in the f-string emits the literal text {target_index}, replaced in Phase 2)
        usr_prompt_prepare = f"Analyze these real, rare logs from our environment:\n{rare_logs_payload}\n\nGenerate exactly 2 hunting hypotheses. CRITICAL: For 'spl_round_1_validation' and 'spl_round_2_drilldown', you MUST strictly start your queries with 'search index={{target_index}}'. Do NOT guess or use real index names! Output ONLY JSON format."
        helper.log_info("Triggering LLM for Prepare Phase...")
        # Pass max_tokens=1500 for generating SPLs
        blueprint_text, prep_tokens = call_llm_api(helper, api_key, base_url, model_name, sys_prompt_prepare, usr_prompt_prepare, max_tokens=1500)
        ai_hunting_plan = json.loads(blueprint_text.strip())
        hypotheses = ai_hunting_plan.get("hypotheses", [])
        # Write Plan to Splunk IMMEDIATELY (Injecting dynamic Unix Time)
        ew.write_event(helper.new_event(
            source=helper.get_input_type(), index=target_index, sourcetype="_json",
            time=time.time(),  # THE ULTIMATE TIMEZONE FIX
            data=json.dumps({
                "session_id": hunt_session_id,
                "event_type": "PEAK_Plan",
                "timestamp": round(time.time(), 3),
                "content": ai_hunting_plan
            }, ensure_ascii=False)
        ))

        # ==========================================
        # PHASE 2: EXECUTE (Agentic Splunk Query Loop)
        # ==========================================
        all_hunt_evidence = []
        for i, hyp in enumerate(hypotheses):
            hyp_start = time.time()
            spl_r1 = hyp.get("spl_round_1_validation", "").replace("{target_index}", target_index)
            spl_r2 = hyp.get("spl_round_2_drilldown", "").replace("{target_index}", target_index)
            r1_hits = len(execute_ai_spl(helper, service, spl_r1))
            r2_hits = len(execute_ai_spl(helper, service, spl_r2))
            all_hunt_evidence.append({
                "hypothesis_id": hyp.get("hypothesis_id", i+1),
                "threat_behavior": hyp.get('ABLE', {}).get('Behavior', 'Unknown'),
                "round_1_hit_count": r1_hits,
                "round_2_hit_count": r2_hits,
                "execution_duration_sec": round(time.time() - hyp_start, 2)
            })
        # Write Evidence to Splunk IMMEDIATELY (Injecting dynamic Unix Time)
        ew.write_event(helper.new_event(
            source=helper.get_input_type(), index=target_index, sourcetype="_json",
            time=time.time(),  # THE ULTIMATE TIMEZONE FIX
            data=json.dumps({
                "session_id": hunt_session_id,
                "event_type": "PEAK_Evidence",
                "timestamp": round(time.time(), 3),
                "content": all_hunt_evidence
            }, ensure_ascii=False)
        ))

        # ==========================================
        # PHASE 3: ACT (Real LLM Call for Final Report)
        # ==========================================
        # Concise prompt for Act Phase (Limits summary length)
        sys_prompt_act = "You are a Security Director. Output ONLY valid JSON. Keep summaries under 30 words. Keys: 'executive_summary', 'threat_qualification' (Benign/Suspicious/Confirmed), 'risk_score' (0-100), 'recommended_alert_spl'."
        usr_prompt_act = f"Here is the quantitative execution evidence collected by our agent:\n{json.dumps(all_hunt_evidence)}\n\nBased on these hit counts, qualify the threat, assign a risk score, and generate an alert SPL. Reply in JSON format."
        helper.log_info("Triggering LLM for Act Phase...")
        # Pass max_tokens=800 since this is just a short summary
        report_text, act_tokens = call_llm_api(helper, api_key, base_url, model_name, sys_prompt_act, usr_prompt_act, max_tokens=800)
        try:
            final_report = json.loads(report_text.strip())
        except json.JSONDecodeError as e:
            helper.log_error("JSON Truncation in Act Phase. Engaging fallback.")
            final_report = {"executive_summary": "LLM output truncated.", "risk_score": -1, "raw": report_text}
        # Write Final Report to Splunk (Injecting dynamic Unix Time)
        ew.write_event(helper.new_event(
            source=helper.get_input_type(), index=target_index, sourcetype="_json",
            time=time.time(),  # THE ULTIMATE TIMEZONE FIX
            data=json.dumps({
                "session_id": hunt_session_id,
                "event_type": "PEAK_Final_Report",
                "timestamp": round(time.time(), 3),
                # Sum of both LLM calls; either term is 0 if extraction fell back
                "total_tokens_used": prep_tokens + act_tokens,
                "content": final_report
            }, ensure_ascii=False)
        ))
        helper.log_info(f"LIVE CYCLE COMPLETE. Time: {round(time.time() - cycle_start_time, 2)}s. Session ID: {hunt_session_id}")
    except Exception as e:
        helper.log_error(f"FATAL Pipeline Crash: {str(e)}")
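Before saving the add-on, the extractor can be exercised offline without Splunk or a live API. The sketch below is an assumption-laden test harness, not part of the tutorial's code: StubHelper is a hypothetical stand-in for the Add-on Builder helper, the payloads are made up, and extract_token_usage is a condensed copy of the function above so that the snippet runs on its own.

```python
class StubHelper:
    """Hypothetical stand-in for the Add-on Builder helper; it only records errors."""

    def __init__(self):
        self.errors = []

    def log_error(self, message):
        self.errors.append(message)


def extract_token_usage(helper, response_json, response_headers):
    """Condensed copy of the tutorial's extractor so this snippet is self-contained."""
    try:
        usage = response_json.get("usage") or {}
        if "total_tokens" in usage:
            return int(usage["total_tokens"])
        if "prompt_tokens" in usage and "completion_tokens" in usage:
            return int(usage["prompt_tokens"]) + int(usage["completion_tokens"])
        if "input_tokens" in usage and "output_tokens" in usage:
            return int(usage["input_tokens"]) + int(usage["output_tokens"])
        for key, value in response_headers.items():
            if "token-usage" in key.lower() or "x-ratelimit-usage" in key.lower():
                return int(value)
    except Exception as exc:
        helper.log_error(f"[FinOps Warning] Failed to parse token usage: {exc}")
    return 0  # graceful degradation


helper = StubHelper()
cases = {
    "openai": ({"usage": {"total_tokens": 150}}, {}),
    "anthropic": ({"usage": {"input_tokens": 120, "output_tokens": 30}}, {}),
    "gateway": ({}, {"X-Token-Usage": "77"}),
    "malformed": (None, {}),  # body is not a dict, so the except branch runs
}
results = {name: extract_token_usage(helper, body, hdrs) for name, (body, hdrs) in cases.items()}
print(results)             # {'openai': 150, 'anthropic': 150, 'gateway': 77, 'malformed': 0}
print(len(helper.errors))  # 1
```

The malformed case is the important one: it yields 0 and one logged warning instead of an exception, which is the behaviour the main workflow relies on.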

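💵 Geek verification: turning tokens into real money

The code is done, and LLM spend is no longer a murky ledger. Now let's play actuary inside Splunk.

1. Save the code in AOB and click Test to run the full workflow once.

2. Go back to Splunk's Search view. This time we add a dynamic calculation to the panel that converts tokens directly into a US dollar cost (Cost USD). (As an estimate, we assume an average price of roughly $0.002 per 1,000 tokens, in the range of GPT-4o-mini or Qwen.)

Run the following finance-oriented dashboard SPL: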
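The conversion behind Cost USD is plain arithmetic: tokens divided by 1,000, multiplied by the per-1K price. A minimal sketch follows; the function name tokens_to_usd and the sample token counts are hypothetical, and the $0.002 rate is the rough estimate stated above, not a vendor quote.

```python
PRICE_PER_1K_TOKENS_USD = 0.002  # the tutorial's rough estimate, not a vendor quote


def tokens_to_usd(total_tokens, price_per_1k=PRICE_PER_1K_TOKENS_USD):
    """Convert a token count to USD, rounded to 6 decimals."""
    return round((total_tokens / 1000) * price_per_1k, 6)


# Hypothetical hunt: 1,850 tokens in the Prepare phase plus 450 in the Act phase.
prep_tokens, act_tokens = 1850, 450
print(tokens_to_usd(prep_tokens + act_tokens))  # 0.0046
```

Keeping the rate in one named constant makes it easy to swap in your provider's actual price once you know it.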

index=main sourcetype="_json" (event_type="PEAK_Plan" OR event_type="PEAK_Evidence" OR event_type="PEAK_Final_Report")
| spath
| stats
    min(timestamp) as Start_Time_Epoch,
    max(timestamp) as End_Time_Epoch,
    latest(content.risk_score) as Risk_Score,
    latest(content.executive_summary) as Summary,
    sum(content{}.round_1_hit_count) as Total_R1_Hits,
    sum(content{}.round_2_hit_count) as Total_R2_Hits,
    sum(total_tokens_used) as Total_Tokens
    by session_id
| eval Execution_Time_Sec = round(End_Time_Epoch - Start_Time_Epoch, 2)
| eval Start_Time = strftime(Start_Time_Epoch, "%Y-%m-%d %H:%M:%S")
| eval Cost_USD = "$" . tostring(round((Total_Tokens / 1000) * 0.002, 6))
| sort - Start_Time_Epoch
| table Start_Time, session_id, Risk_Score, Total_R1_Hits, Total_R2_Hits, Execution_Time_Sec, Total_Tokens, Cost_USD, Summary
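To see what the stats and eval steps compute, the same roll-up can be mimicked in Python on a handful of events. This is only an illustrative sketch: the events are hypothetical and hand-made in the shape the add-on writes, summarize is an invented helper name, and the output formatting is Python's, which may differ from how Splunk renders the same numbers.

```python
from collections import defaultdict

# Hypothetical events shaped like the ones the add-on writes; values are made up.
events = [
    {"session_id": "a1", "event_type": "PEAK_Plan", "timestamp": 1000.0},
    {"session_id": "a1", "event_type": "PEAK_Evidence", "timestamp": 1004.5,
     "content": [{"round_1_hit_count": 3, "round_2_hit_count": 1},
                 {"round_1_hit_count": 0, "round_2_hit_count": 2}]},
    {"session_id": "a1", "event_type": "PEAK_Final_Report", "timestamp": 1009.25,
     "total_tokens_used": 2300, "content": {"risk_score": 40}},
]


def summarize(events, price_per_1k=0.002):
    """Group by session_id and derive duration, hit totals, tokens, and cost."""
    sessions = defaultdict(lambda: {"start": None, "end": None, "r1": 0, "r2": 0, "tokens": 0})
    for event in events:
        row = sessions[event["session_id"]]
        ts = event["timestamp"]
        row["start"] = ts if row["start"] is None else min(row["start"], ts)
        row["end"] = ts if row["end"] is None else max(row["end"], ts)
        row["tokens"] += event.get("total_tokens_used", 0)
        content = event.get("content")
        if isinstance(content, list):  # only PEAK_Evidence carries a list
            row["r1"] += sum(item.get("round_1_hit_count", 0) for item in content)
            row["r2"] += sum(item.get("round_2_hit_count", 0) for item in content)
    for row in sessions.values():
        row["duration_sec"] = round(row["end"] - row["start"], 2)
        row["cost_usd"] = "$" + str(round(row["tokens"] / 1000 * price_per_1k, 6))
    return dict(sessions)


print(summarize(events)["a1"])
```

Only the PEAK_Final_Report event carries total_tokens_used, so summing that field per session_id yields the combined token count of both LLM calls in one hunt.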

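🎯 Your acceptance check: look at the second-to-last column of the table. Cost_USD, prefixed with $, shows exactly what those few seconds of LLM reasoning cost the company, down to fractions of a cent. With this compatibility logic in place, the cost table keeps tracking spend whichever vendor's model your security team switches to later, as long as the provider reports usage in one of the handled formats. This is where enterprise-grade development ends up: the business loop must close, and the finances must be transparent.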

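👇 The full tutorial series is updated in sync across platforms 👇

✅ GitHub (original docs + code)

https://github.com/ziaoxin/Splunk-AI-PEAK-Tutorial

✅ Juejin (illustrated technical posts, published here first)

https://juejin.cn/column/7618098747671183414

✅ CSDN (for ops / Splunk readers)

https://blog.csdn.net/thewindrider/category_13144991.html

✅ GitCode (mirror in China)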

https://gitcode.com/Chang_feng_Po/Splunk-AI-PEAK-Tutorial