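[Splunk AI Threat Hunting Add-on Development Tutorial 11] Day 11: Soul Injection (Retiring the Mock and Connecting a Real LLM API)

🚀 Day 11: Soul Injection (Retiring the Mock and Connecting a Real LLM API)

Today's goal: combine the global API credentials configured on Day 4, the real log-fetching logic from Day 6, and the execution engine from Day 10. We bring in the requests library to open the network path from Splunk to an external LLM, producing a fully dynamic closed loop: real anomaly discovery -> AI proposes a hunting plan on the spot -> automated drill-down execution -> AI delivers the final verdict.

💻 Architecture outline: what changes today?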

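1. Timezone fix (The Timezone Fix): drop the timezone-naive utcnow() entirely and use datetime.datetime.now(datetime.timezone.utc), which carries an explicit +00:00 offset, so Splunk aligns the timestamps correctly no matter which continent the server runs on.

2. Dynamic credential retrieval: read the API Key and Base URL from Splunk's underlying credential vault.

3. Live probe injection: write a Python function that dynamically pulls the 5 rarest log entries from the most recent events in your own system (the M-ATH idea, implemented with clustering).

4. Network wrapper: write the call_llm_api function, which wraps the HTTP request and talks to the real cloud endpoint.

5. Two LLM calls: invoke the model for real in the Prepare phase (writing the hunting blueprint) and in the Act phase (issuing the final verdict report).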
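To make item 1 concrete, here is a minimal sketch of the difference between a naive and a timezone-aware UTC timestamp. The variable names are illustrative only; the naive value is derived by stripping tzinfo, which gives the same kind of object utcnow() returns.

```python
import datetime

# Timezone-aware UTC timestamp: tzinfo is set, so isoformat() ends with +00:00
aware = datetime.datetime.now(datetime.timezone.utc)

# The same instant with tzinfo stripped, i.e. what utcnow() would have produced
naive = aware.replace(tzinfo=None)

aware_iso = aware.isoformat()
naive_iso = naive.isoformat()

print(aware_iso)  # carries the +00:00 offset
print(naive_iso)  # no offset, so a reader has to guess the timezone
```

Without the offset, whoever parses the string has to guess which timezone it refers to, which is where the misalignment comes from.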

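💻 Hands-on: Day 11 full code baseline

To keep the code from turning into spaghetti, we use a modular structure and add two core helper functions: fetch_rare_logs and call_llm_api.

Open the Define & Test editor in Add-on Builder, clear the existing code, and paste in the full listing below.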

import os
import sys
import time
import datetime
import json
import uuid

import requests
import splunklib.client as client
import splunklib.results as results

# ==========================================
# HELPER 1: Execute AI Generated SPL
# ==========================================
def execute_ai_spl(helper, service, spl_query):
    """
    Execute SPL generated by AI and return the raw result data.
    """
    spl_query = spl_query.strip()
    # An empty query would otherwise become a bare "search" over everything
    if not spl_query:
        helper.log_error("[AgenticEngine] Empty SPL received. Skipping execution.")
        return []
    # A oneshot search must start with a generating command; prepend "search" if needed
    if not spl_query.startswith("search") and not spl_query.startswith("|"):
        spl_query = "search " + spl_query

    kwargs_oneshot = {"output_mode": "json"}
    helper.log_info(f"[AgenticEngine] Executing SPL: {spl_query}")

    try:
        search_results = service.jobs.oneshot(spl_query, **kwargs_oneshot)
        reader = results.JSONResultsReader(search_results)
        # The reader also yields diagnostic messages; keep only result rows (dicts)
        result_data = [res for res in reader if isinstance(res, dict)]
        helper.log_info(f"[AgenticEngine] SUCCESS: Found {len(result_data)} events.")
        return result_data
    except Exception as e:
        helper.log_error(f"[AgenticEngine] FAILED execution: {str(e)}")
        return []

# ==========================================
# HELPER 2: Fetch Real Logs (M-ATH Concept)
# ==========================================
def fetch_rare_logs(helper, service, target_index):
    """
    Fetch the most recent rare/anomalous logs from the target index to feed the AI.
    """
    helper.log_info("Fetching real rare logs for analysis...")

    # Using a simple SPL to grab actual data.
    # 'cluster showcount=t' writes the cluster size to the cluster_count field,
    # so sorting ascending on it puts the rarest patterns first.
    # Note: If 'cluster' consumes too much CPU, simplify to 'head 5'
    spl = f"search index={target_index} | head 1000 | cluster showcount=t | sort cluster_count | head 5 | table _raw"

    try:
        results_data = execute_ai_spl(helper, service, spl)
        if not results_data:
            return None

        # Extract the _raw strings and join them into a single newline-separated payload
        raw_logs = [item.get("_raw", "") for item in results_data if "_raw" in item]
        return "\n".join(raw_logs)
    except Exception as e:
        helper.log_error(f"Failed to fetch rare logs: {str(e)}")
        return None

# ==========================================
# HELPER 3: The LLM API Connector
# ==========================================
def call_llm_api(helper, api_key, base_url, model, system_prompt, user_prompt):
    """
    Establish a real HTTP connection to the LLM API.
    Returns a tuple: (message content as a string, total tokens consumed).
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        # Force JSON output (API requires the word 'JSON' in the prompt)
        "response_format": {"type": "json_object"}
    }

    # Ensure URL ends correctly
    endpoint = base_url if base_url.endswith("/chat/completions") else f"{base_url.rstrip('/')}/chat/completions"

    try:
        helper.log_info(f"Initiating network request to LLM API: {endpoint}")
        # Increased timeout to 120s for complex reasoning models
        response = requests.post(endpoint, headers=headers, json=payload, timeout=120)
        response.raise_for_status()

        response_json = response.json()
        llm_content = response_json["choices"][0]["message"]["content"]

        # Log token usage for FinOps tracking
        total_tokens = response_json.get("usage", {}).get("total_tokens", 0)
        helper.log_info(f"API Call Success. Consumed {total_tokens} tokens.")

        return llm_content, total_tokens
    except requests.exceptions.RequestException as e:
        helper.log_error(f"Network error during API call: {str(e)}")
        raise

# ==========================================
# MAIN WORKFLOW: The Autonomous Agent
# ==========================================
def collect_events(helper, ew):
    """
    Day 11: Real API Integration Workflow with Timezone Fix.
    """
    helper.log_info("PEAK AI Hunter: LIVE MODE INITIALIZED.")
    cycle_start_time = time.time()
    hunt_session_id = str(uuid.uuid4())

    try:
        # 1. Acquire Splunk Service Session
        session_key = getattr(helper, 'session_key', None) or getattr(helper._input_definition, 'metadata', {}).get('session_key')
        if not session_key:
            raise ValueError("Failed to acquire session_key.")
        service = client.Service(token=session_key)

        # 2. Acquire Global Setup Configurations (from Day 4)
        api_key = helper.get_global_setting("api_key")
        base_url = helper.get_global_setting("base_url")
        model_name = helper.get_global_setting("model_name")
        target_index = helper.get_output_index() or "main"

        # THE FIX: Generate timezone-aware UTC timestamp (e.g., 2026-05-02T12:05:18.142200+00:00)
        timestamp_now = datetime.datetime.now(datetime.timezone.utc).isoformat()

        if not api_key or not base_url:
            raise ValueError("API Key or Base URL is missing in Global Settings.")

        # ==========================================
        # PHASE 1: PREPARE (Real LLM Call for Blueprint)
        # ==========================================
        rare_logs_payload = fetch_rare_logs(helper, service, target_index)
        if not rare_logs_payload:
            helper.log_info("No anomalous logs found to analyze. Terminating cycle early gracefully.")
            return

        sys_prompt_prepare = "You are a Senior Threat Hunter. You MUST reply in JSON format. Output strictly valid JSON. Schema requires: 'analysis' (string) and 'hypotheses' (array of objects). Each hypothesis must have 'hypothesis_id', 'ABLE' (Actor, Behavior, Location, Evidence), 'spl_round_1_validation', and 'spl_round_2_drilldown'."
        usr_prompt_prepare = f"Analyze these real, rare logs from our environment:\n{rare_logs_payload}\n\nGenerate exactly 2 hunting hypotheses to investigate them. Write efficient Splunk SPL for the drill-downs. Output only JSON format."

        helper.log_info("Triggering LLM for Prepare Phase...")
        blueprint_text, prep_tokens = call_llm_api(helper, api_key, base_url, model_name, sys_prompt_prepare, usr_prompt_prepare)

        # A parse failure here propagates to the outer except and aborts the cycle
        ai_hunting_plan = json.loads(blueprint_text.strip())
        hypotheses = ai_hunting_plan.get("hypotheses", [])

        # Write Plan to Splunk IMMEDIATELY
        ew.write_event(helper.new_event(
            source=helper.get_input_type(), index=target_index, sourcetype="_json",
            data=json.dumps({"session_id": hunt_session_id, "event_type": "PEAK_Plan", "timestamp": timestamp_now, "content": ai_hunting_plan}, ensure_ascii=False)
        ))

        # ==========================================
        # PHASE 2: EXECUTE (Agentic Splunk Query Loop)
        # ==========================================
        all_hunt_evidence = []

        for i, hyp in enumerate(hypotheses):
            hyp_start = time.time()
            spl_r1 = hyp.get("spl_round_1_validation", "").replace("{target_index}", target_index)
            spl_r2 = hyp.get("spl_round_2_drilldown", "").replace("{target_index}", target_index)

            r1_hits = len(execute_ai_spl(helper, service, spl_r1))
            r2_hits = len(execute_ai_spl(helper, service, spl_r2))

            all_hunt_evidence.append({
                "hypothesis_id": hyp.get("hypothesis_id", i + 1),
                "threat_behavior": hyp.get('ABLE', {}).get('Behavior', 'Unknown'),
                "round_1_hit_count": r1_hits,
                "round_2_hit_count": r2_hits,
                "execution_duration_sec": round(time.time() - hyp_start, 2)
            })

        # Write Evidence to Splunk IMMEDIATELY (one event holding all hypotheses)
        ew.write_event(helper.new_event(
            source=helper.get_input_type(), index=target_index, sourcetype="_json",
            data=json.dumps({"session_id": hunt_session_id, "event_type": "PEAK_Evidence", "timestamp": timestamp_now, "content": all_hunt_evidence}, ensure_ascii=False)
        ))

        # ==========================================
        # PHASE 3: ACT (Real LLM Call for Final Report)
        # ==========================================
        sys_prompt_act = "You are a Security Director. You MUST reply in JSON format. Output ONLY valid JSON with keys: 'executive_summary', 'threat_qualification' (Benign/Suspicious/Confirmed), 'risk_score' (0-100), 'recommended_alert_spl'."
        usr_prompt_act = f"Here is the quantitative execution evidence collected by our agent:\n{json.dumps(all_hunt_evidence)}\n\nBased on these hit counts, qualify the threat, assign a risk score, and generate an alert SPL. Reply in JSON format."

        helper.log_info("Triggering LLM for Act Phase...")
        report_text, act_tokens = call_llm_api(helper, api_key, base_url, model_name, sys_prompt_act, usr_prompt_act)

        try:
            final_report = json.loads(report_text.strip())
        except json.JSONDecodeError as e:
            helper.log_error("JSON Truncation in Act Phase. Engaging fallback.")
            final_report = {"executive_summary": "LLM output truncated.", "risk_score": -1, "raw": report_text}

        # Write Final Report to Splunk
        ew.write_event(helper.new_event(
            source=helper.get_input_type(), index=target_index, sourcetype="_json",
            data=json.dumps({"session_id": hunt_session_id, "event_type": "PEAK_Final_Report", "timestamp": timestamp_now, "total_tokens_used": prep_tokens + act_tokens, "content": final_report}, ensure_ascii=False)
        ))

        helper.log_info(f"LIVE CYCLE COMPLETE. Time: {round(time.time() - cycle_start_time, 2)}s. Session ID: {hunt_session_id}")

    except Exception as e:
        helper.log_error(f"FATAL Pipeline Crash: {str(e)}")
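In the listing above, the Act phase wraps json.loads in a try/except with a fallback, while the Prepare phase parses the model output directly. One possible way to treat both phases the same is a small parsing helper. The sketch below is not part of the tutorial's baseline: parse_llm_json and the parse_error key are hypothetical names, and the fence-stripping step assumes that some models wrap their JSON in a Markdown code fence even when told not to.

```python
import json

# Three backticks, built programmatically so this listing stays easy to embed
FENCE = "`" * 3


def parse_llm_json(text, fallback_key="raw"):
    """Parse LLM output as a JSON object; return a fallback dict instead of raising."""
    cleaned = text.strip()

    # Strip a surrounding Markdown fence (the opening line may carry a language tag)
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        cleaned = "\n".join(lines).strip()

    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError:
        # Truncated or malformed output: keep the raw text for later inspection
        return {"parse_error": True, fallback_key: text}

    # The workflow calls .get() on the result, so anything but an object is rejected
    if not isinstance(parsed, dict):
        return {"parse_error": True, fallback_key: text}
    return parsed


print(parse_llm_json('{"hypotheses": []}'))
print(parse_llm_json('{"hypotheses": ['))  # truncated output falls back
```

With a helper like this, a truncated blueprint in the Prepare phase would yield an empty hypotheses list rather than ending the cycle in the outer except block. Whether that is preferable depends on how loudly you want failures to surface.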

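🔍 Geek verification: the moment of truth

We are now connected to a real cloud API and have cleared the surrounding obstacles: the timeout is extended to 120 seconds, the API's JSON-mode requirement is satisfied, and the core timezone offset problem is fixed. This test run should therefore go smoothly.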
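The JSON-mode requirement can be checked before any network traffic happens. The sketch below separates request construction from sending so it can be exercised offline. build_chat_request is a hypothetical helper, not part of the tutorial's code. It assumes an OpenAI-compatible chat completions endpoint, and it trims a trailing slash before testing the suffix, a small variation on the endpoint logic in call_llm_api.

```python
def build_chat_request(base_url, model, system_prompt, user_prompt):
    """Return (endpoint, payload) for an OpenAI-compatible chat completions call."""
    suffix = "/chat/completions"
    trimmed = base_url.rstrip("/")
    endpoint = trimmed if trimmed.endswith(suffix) else trimmed + suffix

    # JSON mode expects the prompt itself to mention JSON, so fail fast if it does not
    if "json" not in (system_prompt + user_prompt).lower():
        raise ValueError("Prompts must mention 'JSON' when response_format is json_object.")

    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "response_format": {"type": "json_object"},
    }
    return endpoint, payload


endpoint, payload = build_chat_request(
    "https://api.openai.com/v1/",
    "gpt-4o",
    "You are a Senior Threat Hunter. Reply in JSON format.",
    "Analyze these logs.",
)
print(endpoint)  # https://api.openai.com/v1/chat/completions
```

The returned pair could then be handed to requests.post(endpoint, headers=headers, json=payload, timeout=120), matching the call in the listing.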

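Steps:

1. Pre-test check: make sure that in the AOB interface from Day 4 (Configuration -> Add-on Setup Parameters) you have entered a real API Key, a Base URL (for example https://api.openai.com/v1, or the compatible address of another vendor), and a model name (for example gpt-4o or qwen-plus).

2. Run the test: click Test in the upper-right corner of the code editor. This time the wait is noticeably longer (possibly 15-30 seconds), because the code is carrying out real HTTP communication with the cloud-hosted model.

3. Save the result: once the test finishes and the green Done appears, click Save right away.

4. Check the output: open the Splunk Search interface and run the query below with the time range set to Last 15 minutes. If your input writes to an index other than main, replace main accordingly: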

index=main sourcetype="_json" (event_type="PEAK_Plan" OR event_type="PEAK_Evidence" OR event_type="PEAK_Final_Report")
| stats
    latest(content.risk_score) as Risk_Score,
    latest(content.executive_summary) as Summary,
    sum(content{}.round_1_hit_count) as Total_R1_Hits,
    sum(content{}.round_2_hit_count) as Total_R2_Hits
    by session_id
| sort - Risk_Score
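For readers less familiar with stats, the roll-up performed by the query can be mimicked in plain Python. This is only an illustration: the sample events are invented, summarize_sessions is a hypothetical name, and "latest" is simplified to "last seen in the list" rather than "most recent by _time".

```python
from collections import defaultdict

# Invented sample events, shaped like the three event types the add-on writes
events = [
    {"session_id": "s1", "event_type": "PEAK_Plan",
     "content": {"analysis": "sample analysis", "hypotheses": []}},
    {"session_id": "s1", "event_type": "PEAK_Evidence", "content": [
        {"hypothesis_id": 1, "round_1_hit_count": 3, "round_2_hit_count": 1},
        {"hypothesis_id": 2, "round_1_hit_count": 0, "round_2_hit_count": 0},
    ]},
    {"session_id": "s1", "event_type": "PEAK_Final_Report",
     "content": {"executive_summary": "Suspicious logon burst.", "risk_score": 65}},
]


def summarize_sessions(events):
    """Roll events up by session_id, mirroring the stats clause of the query."""
    summary = defaultdict(lambda: {"Risk_Score": None, "Summary": None,
                                   "Total_R1_Hits": 0, "Total_R2_Hits": 0})
    for event in events:
        row = summary[event["session_id"]]
        content = event["content"]
        if isinstance(content, list):
            # PEAK_Evidence: content{} is an array of per-hypothesis results
            row["Total_R1_Hits"] += sum(item.get("round_1_hit_count", 0) for item in content)
            row["Total_R2_Hits"] += sum(item.get("round_2_hit_count", 0) for item in content)
        elif "risk_score" in content:
            # PEAK_Final_Report: carries the score and the summary
            row["Risk_Score"] = content["risk_score"]
            row["Summary"] = content.get("executive_summary")
    return dict(summary)


result = summarize_sessions(events)
print(result["s1"])
```

Each session collapses to one row: the score and summary come from the final report event, and the hit totals come from the evidence event.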

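🎉 Hands-on milestone: if the query returns results and the summary is an analysis of real logs from your own machine, then the security AI agent you built can now form hypotheses, issue queries, and collect evidence on its own.

One question remains: what happens if today's anomalous logs add up to 100,000 characters and all of that is handed to the model? That is the problem Day 12, Dynamic Context Distillation and Protection, sets out to solve. Get ready for real enterprise-scale conditions.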

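👇 The full tutorial series is published in parallel on several platforms 👇

✅ GitHub (original documents and code)

https://github.com/ziaoxin/Splunk-AI-PEAK-Tutorial

✅ Juejin (first publication of the illustrated technical articles)

https://juejin.cn/column/7618098747671183414

✅ CSDN (for the operations and Splunk audience)

https://blog.csdn.net/thewindrider/category_13144991.html

✅ GitCode (mirror for mainland China)

https://gitcode.com/Chang_feng_Po/Splunk-AI-PEAK-Tutorial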