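🔐 Core Focus: The Real Attack Surface of AI Infrastructure
This week's security research concentrated on the deep end of the architecture layer and the semantic layer. Researchers are moving beyond simple prompt injection: they have begun systematically cataloguing the isolation design flaws at the base of existing agent frameworks, and probing the blind spots in VLMs (vision-language models) that semantic attacks can exploit.
In a system as complex as OpenClaw, a single CVE is often only the tip of the iceberg. This week's headline study analyzes more than 190 security vulnerabilities spanning architectural layers such as the sandbox, plugins, and prompts, and exposes a systemic risk: remote code execution (RCE) that requires no authentication.
Featured Papers: Vulnerability Taxonomies, Semantic Backdoors, and Decentralized Manipulation
📊 This Week in AI Security: Overview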
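| Category | Papers |
|---|---|
| AI Agent Architecture and Automated Attack and Defense | 28 |
| LLM Safety, Alignment, Jailbreaks, and Injection | 22 |
| AI Data Privacy and Edge Federated Learning | 25 |
| Web3, DeFi, and Blockchain Governance | 11 |
| Frontier Cryptography and Quantum and Sybil Resistance | 13 |
| AI-Powered Next-Generation Threat Detection | 9 |
| System Foundations, Hardware, and Cloud-Native Security | 12 |
| Miscellaneous Security Research | 54 |
| Total | 174 |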
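💡 Editor's Observations
Signals of an industry shift seen in this week's research:
⚠ Weak defenses at the framework's foundation
Agent frameworks cannot rely on the execution sandbox alone. The system-level chained vulnerabilities disclosed in OpenClaw this week show that adding a permission allowlist at the isolation gateway is not enough on its own. Once parameter injection occurs, every later step in the call chain ends up carrying out the attacker's input.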
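To make the allowlist point concrete, here is a minimal sketch in Python. It is not taken from OpenClaw or from any of the papers listed here: the `report` tool, the `reportgen` CLI, and both helper functions are hypothetical names invented for illustration. It only shows why a name-level allowlist lets an injected parameter through, and how a check on the argument itself changes the outcome.

```python
from typing import List

# Hypothetical gateway allowlist: it only knows tool *names*.
ALLOWED_TOOLS = {"report"}


def gateway_allows(tool: str) -> bool:
    """Name-level check, standing in for a permission allowlist at the gateway."""
    return tool in ALLOWED_TOOLS


def build_argv_naive(tool: str, target: str) -> List[str]:
    """Passes the allowlist, then forwards the model-supplied value untouched."""
    if not gateway_allows(tool):
        raise PermissionError(f"tool not allowed: {tool}")
    # 'reportgen' is a made-up CLI; 'target' may be attacker-influenced.
    return ["reportgen", target]


def build_argv_checked(tool: str, target: str) -> List[str]:
    """Same allowlist, plus validation of the argument itself."""
    if not gateway_allows(tool):
        raise PermissionError(f"tool not allowed: {tool}")
    if target.startswith("-"):
        raise ValueError(f"option-like argument rejected: {target!r}")
    # "--" is the conventional end-of-options marker for many CLIs.
    return ["reportgen", "--", target]


injected = "--output=/tmp/owned"  # illustrative payload, not a real exploit

# The allowlist is satisfied, yet the payload lands in argv as an option.
print(build_argv_naive("report", injected))

try:
    build_argv_checked("report", injected)
except ValueError as err:
    print("blocked:", err)
```

In both versions the gateway check passes, because the tool name is legitimate. The difference lies entirely in whether the argument is inspected before it reaches the next stage, which is the layer a name-only allowlist never sees.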
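🔬 Semantic backdoors go underground
Backdoor attacks on multimodal models have become remarkably refined, evolving from patch-based triggers to triggers based on natural behavior. For the commercial deployment of VLMs, implicit backdoors that inject harmful associated content (advertising, fabricated sexual rumors, and the like) would be a severe blow to trust.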
📚 Full Paper List for the Week (174 papers)
AI Agent Architecture and Automated Attack and Defense (28)
Towards Context-Aware Image Anonymization with Multi-Agent Reasoning[2603.27817v1]
A Systematic Taxonomy of Security Vulnerabilities in the OpenClaw AI Agent Framework[2603.27517v1]
SkillTester: Benchmarking Utility and Security of Agent Skills[2603.28815v1]
SafetyDrift: Predicting When AI Agents Cross the Line Before They Actually Do[2603.27148v1]
SafeClaw-R: Towards Safe and Secure Multi-Agent Personal Assistants[2603.28807v1]
Red-MIRROR: Agentic LLM-based Autonomous Penetration Testing with Reflective Verification and Knowledge-augmented Interaction[2603.27127v1]
Hermes Seal: Zero-Knowledge Assurance for Autonomous Vehicle Communications[2603.26343v1]
Knowdit: Agentic Smart Contract Vulnerability Detection with Auditing Knowledge Summarization[2603.26270v1]
Clawed and Dangerous: Can We Trust Open Agentic Systems?[2603.26221v1]
AVDA: Autonomous Vibe Detection Authoring for Cybersecurity[2603.25930v2]
From Logic Monopoly to Social Contract: Separation of Power and the Institutional Foundations for Autonomous Agent Economies[2603.25100v1]
The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities[2603.25056v1]
AIP: Agent Identity Protocol for Verifiable Delegation Across MCP and A2A[2603.24775v1]
Infrastructure for Valuable, Tradable, and Verifiable Agent Memory[2603.24564v1]
ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers[2603.24414v1]
AgentRFC: Security Design Principles and Conformance Testing for Agent Protocols[2603.23801v1]
The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense[2603.23791v1]
RTS-ABAC: Real-Time Server-Aided Attribute-Based Authorization & Access Control for Substation Automation Systems[2603.23012v1]
AgentRAE: Remote Action Execution through Notification-based Visual Backdoors against Screenshots-based Mobile GUI Agents[2603.23007v1]
SoK: The Attack Surface of Agentic AI -- Tools, and Autonomy[2603.22928v1]
Agent-Sentry: Bounding LLM Agents via Execution Provenance[2603.22868v1]
Agent Audit: A Security Analysis System for LLM Agent Applications[2603.22853v1]
Observable Channels, Not Just Storage: Evaluating Privacy Leakage in LLM Agent Pipelines[2603.22751v2]
CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training[2603.23559v1]
STRIATUM-CTF: A Protocol-Driven Agentic Framework for General-Purpose CTF Solving[2603.22577v1]
Model Context Protocol Threat Modeling and Analyzing Vulnerabilities to Prompt Injection with Tool Poisoning[2603.22489v1]
Are AI-assisted Development Tools Immune to Prompt Injection?[2603.21642v1]
Auditing MCP Servers for Over-Privileged Tool Capabilities[2603.21641v1]
LLM Safety, Alignment, Jailbreaks, and Injection (22)
Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models[2603.27522v1]
GUARD-SLM: Token Activation-Based Defense Against Jailbreak Attacks for Small Language Models[2603.28817v1]
Sovereign Context Protocol: An Open Attribution Layer for Human-Generated Content in the Age of Large Language Models[2603.27094v1]
Reentrancy Detection in the Age of LLMs[2603.26497v1]
Protecting User Prompts Via Character-Level Differential Privacy[2603.26032v1]
Unveiling the Resilience of LLM-Enhanced Search Engines against Black-Hat SEO Manipulation[2603.25500v1]
Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models[2603.25412v1]
Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models[2603.25403v2]
PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems[2603.25164v1]
IrisFP: Adversarial-Example-based Model Fingerprinting with Enhanced Uniqueness and Robustness[2603.24996v2]
Bridging Code Property Graphs and Language Models for Program Analysis[2603.24837v1]
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs[2603.24511v1]
Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage[2603.23966v2]
How Vulnerable Are Edge LLMs?[2603.23822v1]
Leveraging Large Language Models for Trustworthiness Assessment of Web Applications[2603.23781v1]
Not All Tokens Are Created Equal: Query-Efficient Jailbreak Fuzzing for LLMs[2603.23269v1]
Robust Safety Monitoring of Language Models via Activation Watermarking[2603.23171v2]
Does Teaming-Up LLMs Improve Secure Code Generation? A Comprehensive Evaluation with Multi-LLMSecCodeEval[2603.22717v1]
BioShield: A Context-Aware Firewall for Securing Bio-LLMs[2603.22612v1]
OrgForge-IT: A Verifiable Synthetic Benchmark for LLM-Based Insider Threat Detection[2603.22499v1]
Evaluating the Reliability and Fidelity of Automated Judgment Systems of Large Language Models[2603.22214v1]
Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models[2603.21697v1]
AI Data Privacy and Edge Federated Learning (25)
SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation[2603.28824v1]
Gender-Based Heterogeneity in Youth Privacy-Protective Behavior for Smart Voice Assistants: Evidence from Multigroup PLS-SEM[2603.27117v1]
Privacy-Preserving Iris Recognition: Performance Challenges and Outlook[2603.26890v1]
Towards Privacy-Preserving Federated Learning using Hybrid Homomorphic Encryption[2603.26417v1]
Privacy-Enhancing Encryption in Data Sharing: A Survey on Security, Performance and Functionality[2603.26224v1]
EPDQ: Efficient and Privacy-Preserving Exact Distance Query on Encrypted Graphs[2603.26219v1]
Gaussian Shannon: High-Precision Diffusion Model Watermarking Based on Communication[2603.26167v1]
Not All Entities are Created Equal: A Dynamic Anonymization Framework for Privacy-Preserving Retrieval-Augmented Generation[2603.26074v1]
Supercharging Federated Intelligence Retrieval[2603.25374v1]
On the Vulnerability of Deep Automatic Modulation Classifiers to Explainable Backdoor Threats[2603.25310v1]
Physical Backdoor Attack Against Deep Learning-Based Modulation Classification[2603.25304v1]
An Explainable Federated Framework for Zero Trust Micro-Segmentation in IIoT Networks[2603.24754v1]
Amplified Patch-Level Differential Privacy for Free via Random Cropping[2603.24695v1]
PAC-DP: Personalized Adaptive Clipping for Differentially Private Federated Learning[2603.24003v1]
An Empirical Analysis of Google Play Data Safety Disclosures: A Consistency Study of Privacy Indicators in Mobile Gaming Apps[2603.23935v1]
Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions[2603.23472v1]
PRETTINESS -- Privacy pResErving aTTrIbute maNagEment SyStem[2603.23221v1]
Privacy-Aware Smart Cameras: View Coverage via Socially Responsible Coordination[2603.23197v1]
Multi-User Multi-Key Image Steganography with Key Isolation[2603.23005v1]
A Critical Review on the Effectiveness and Privacy Threats of Membership Inference Attacks[2603.22987v1]
Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy[2603.22968v1]
Privacy-Preserving EHR Data Transformation via Geometric Operators: A Human-AI Co-Design Technical Report[2603.22954v1]
Combinatorial Privacy: Private Multi-Party Bitstream Grand Sum by Hiding in Birkhoff Polytopes[2603.22808v4]
In-network Attack Detection with Federated Deep Learning in IoT Networks: Real Implementation and Analysis[2603.21596v1]
Hardening Confidential Federated Compute against Side-channel Attacks[2603.21469v1]
Web3, DeFi, and Blockchain Governance (11)
Ordering Power is Sanctioning Power: Sanction Evasion-MEV and the Limits of On-Chain Enforcement[2603.27739v1]
HFIPay: Privacy-Preserving, Cross-Chain Cryptocurrency Payments to Human-Friendly Identifiers[2603.26970v1]
Auditing Blockchain Innovations: Technical Challenges Beyond Traditional Finance[2603.26361v1]
Bitcoin Smart Accounts: Trust-Minimized Native Bitcoin DeFi Infrastructure[2603.26293v1]
PEB Separation and State Migration: Unmasking the New Frontiers of DeFi AML Evasion[2603.26290v1]
zk-X509: Privacy-Preserving On-Chain Identity from Legacy PKI via Zero-Knowledge Proofs[2603.25190v2]
SolRugDetector: Investigating Rug Pulls on Solana[2603.24625v1]
An Adaptive Neuro-Fuzzy Blockchain-AI Framework for Secure and Intelligent FinTech Transactions[2603.23829v1]
n-VM: A Multi-VM Layer-1 Architecture with Shared Identity and Token State[2603.23670v1]
Albank -- a case study on the use of ethereum blockchain technology and smart contracts for secure decentralized bank application[2603.21894v1]
Connecting Distributed Ledgers: Surveying Novel Interoperability Solutions in On-chain Finance[2603.21797v1]
Frontier Cryptography and Quantum and Sybil Resistance (13)
Quantum Bit Error Rate Analysis in BB84 Quantum Key Distribution: Measurement, Statistical Estimation, and Eavesdropping Detection[2603.27278v1]
Attacks on Sparse LWE and Sparse LPN with new Sample-Time tradeoffs[2603.27190v1]
Information-Theoretic Solutions for Seedless QRNG Bootstrapping and Hybrid PQC-QKD Key Combination[2603.26907v1]
Cryptanalysis of a PIR Scheme based on Linear Codes over Rings[2603.26409v1]
Send the Key in Cleartext: Halving Key Consumption while Preserving Unconditional Security in QKD Authentication[2603.25496v1]
Efficient ML-DSA Public Key Management Method with Identity for PKI and Its Application[2603.25043v1]
IPsec based on Quantum Key Distribution: Adapting non-3GPP access to 5G Networks to the Quantum Era[2603.24426v1]
Efficient Encrypted Computation in Convolutional Spiking Neural Networks with TFHE[2603.26781v1]
On the Vulnerability of FHE Computation to Silent Data Corruption[2603.23253v1]
mmFHE: mmWave Sensing with End-to-End Fully Homomorphic Encryption[2603.22437v1]
Asymptotically Ideal Hierarchical Secret Sharing Based on CRT for Integer Ring[2603.22011v1]
Asymptotically Ideal Conjunctive Hierarchical Secret Sharing Scheme Based on CRT for Polynomial Ring[2603.22001v1]
Q-AGNN: Quantum-Enhanced Attentive Graph Neural Network for Intrusion Detection[2603.22365v1]
AI-Powered Next-Generation Threat Detection (9)
Context-Aware Phishing Email Detection Using Machine Learning and NLP[2603.27326v1]
Machine Learning Transferability for Malware Detection[2603.26632v1]
Understanding AI Methods for Intrusion Detection and Cryptographic Leakage[2603.25826v1]
CANGuard: A Spatio-Temporal CNN-GRU-Attention Hybrid Architecture for Intrusion Detection in In-Vehicle CAN Networks[2603.25763v1]
Targeted Adversarial Traffic Generation : Black-box Approach to Evade Intrusion Detection Systems in IoT Networks[2603.23438v1]
An Experimental Study of Machine Learning-Based Intrusion Detection for OPC UA over Industrial Private 5G Networks[2603.23416v1]
Security Barriers to Trustworthy AI-Driven Cyber Threat Intelligence in Finance: Evidence from Practitioners[2603.23304v1]
How Far Should We Need to Go : Evaluate Provenance-based Intrusion Detection Systems in Industrial Scenarios[2603.22982v1]
TLS Certificate and Domain Feature Analysis of Phishing Domains in the Danish .dk Namespace[2603.21652v1]
System Foundations, Hardware, and Cloud-Native Security (12)
Finding Memory Leaks in C/C++ Programs via Neuro-Symbolic Augmented Static Analysis[2603.27224v2]
SPARK: Secure Predictive Autoscaling for Robust Kubernetes[2603.26833v1]
Disguising Topology and Side-Channel Information through Covert Gate- and ML-Enabled IP Camouflaging[2603.25904v1]
ALPS: Automated Least-Privilege Enforcement for Securing Serverless Functions[2603.25393v1]
Design and Development of an ML/DL Attack Resistance of RC-Based PUF for IoT Security[2603.28798v1]
Towards Remote Attestation of Microarchitectural Attacks: The Case of Rowhammer[2603.24172v2]
Walma: Learning to See Memory Corruption in WebAssembly[2603.24167v1]
Toward a Multi-Layer ML-Based Security Framework for Industrial IoT[2603.24111v2]
Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution[2603.23064v2]
Explainable Threat Attribution for IoT Networks Using Conditional SHAP and Flow Behavior Modelling[2603.22771v1]
Semi-Automated Threat Modeling of Cloud-Based Systems Through Extracting Software Architecture from Configuration and Network Flow[2603.22603v1]
Framework for Risk-Based IoT Cybersecurity Audit Engagements[2603.22191v1]
Miscellaneous Security Research (54)
Decentralized Proof-of-Location for Content Provenance: Towards Capture-Time Authenticity[2603.27883v1]
Attacking AI Accelerators by Leveraging Arithmetic Properties of Addition[2603.27439v1]
"Elementary, My Dear Watson." Detecting Malicious Skills via Neuro-Symbolic Reasoning across Heterogeneous Artifacts[2603.27204v1]
Detecting Protracted Vulnerabilities in Open Source Projects[2603.27067v1]
On the Optimal Number of Grids for Differentially Private Non-Interactive $K$-Means Clustering[2603.26963v1]
Evolution-Based Timed Opacity under a Universal Observation Model[2603.26573v1]
Hidden Elo: Private Matchmaking through Encrypted Rating Systems[2603.26407v2]
ROAST: Risk-aware Outlier-exposure for Adversarial Selective Training of Anomaly Detectors Against Evasion Attacks[2603.26093v1]
A Large-scale Empirical Study on the Generalizability of Disclosed Java Library Vulnerability Exploits[2603.25997v1]
Neighbor-Aware Localized Concept Erasure in Text-to-Image Diffusion Models[2603.25994v1]
Why Safety Probes Catch Liars But Miss Fanatics[2603.25861v1]
TAAC: A gate into Trustable Audio Affective Computing[2603.25570v1]
Multi-target Coverage-based Greybox Fuzzing[2603.25354v1]
Second order Recurrences, quadratic number fields and cyclic codes[2603.25343v1]
Usability of Passwordless Authentication in Wi-Fi Networks: A Comparative Study of Passkeys and Passwords in Captive Portals[2603.25290v1]
Mitigating Evasion Attacks in Fog Computing Resource Provisioning Through Proactive Hardening[2603.25257v1]
A Public Theory of Distillation Resistance via Constraint-Coupled Reasoning Architectures[2603.25022v1]
LiteGuard: Efficient Task-Agnostic Model Fingerprinting with Enhanced Generalization[2603.24982v1]
On the Foundations of Trustworthy Artificial Intelligence[2603.24904v1]
Sovereign AI at the Front Door of Care: A Physically Unidirectional Architecture for Secure Clinical Intelligence[2603.24898v1]
An Approach to Generate Attack Graphs with a Case Study on Siemens PCS7 Blueprint for Water Treatment Plants[2603.24888v1]
Trusted-Execution Environment (TEE) for Solving the Replication Crisis in Academia[2603.24878v1]
AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective[2603.24857v1]
Analysing the Safety Pitfalls of Steering Vectors[2603.24543v1]
A Large-Scale Study of Telegram Bots[2603.24302v1]
Software Supply Chain Smells: Lightweight Analysis for Secure Dependency Management[2603.24282v2]
Attack Assessment and Augmented Identity Recognition for Human Skeleton Data[2603.24232v1]
Invisible Threats from Model Context Protocol: Generating Stealthy Injection Payload via Tree-based Adaptive Search[2603.24203v1]
When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm[2603.24079v1]
Forensic Implications of Localized AI: Artifact Analysis of Ollama, LM Studio, and llama.cpp[2603.23996v1]
AetherWeave: Sybil-Resistant Robust Peer Discovery with Stake[2603.23793v1]
Space Fabric: A Satellite-Enhanced Trusted Execution Architecture[2603.23745v1]
CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection[2603.23459v1]
Canonical Byte-String Encoding for Finite-Ring Cryptosystems[2603.23364v1]
What a Mesh: Formal Security Analysis of WPA3 SAE Wireless Authentication[2603.23352v1]
The Power of Power Codes: New Classes of Easy Instances for the Linear Equivalence Problem[2603.23230v1]
Gyokuro: Source-assisted Private Membership Testing using Trusted Execution Environments[2603.23226v1]
TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches[2603.23117v1]
Secure Two-Party Matrix Multiplication from Lattices and Its Application to Encrypted Control[2603.22857v1]
Digital Twin Enabled Simultaneous Learning and Modeling for UAV-assisted Secure Communications with Eavesdropping Attacks[2603.22753v1]
BlindMarket: Enabling Verifiable, Confidential, and Traceable IP Core Distribution in Zero-Trust Settings[2603.22685v1]
Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks[2603.22590v1]
Tock: From Research to Securing 10 Million Computers[2603.22585v1]
Adversarial Vulnerabilities in Neural Operator Digital Twins: Gradient-Free Attacks on Nuclear Thermal-Hydraulic Surrogates[2603.22525v1]
CTF as a Service: A reproducible and scalable infrastructure for cybersecurity training[2603.22511v2]
Architecture-Derived CBOMs for Cryptographic Migration: A Security-Aware Architecture Tradeoff Method[2603.22442v1]
TALUS: Threshold ML-DSA with One-Round Online Signing via Boundary Clearance and Carry Elimination[2603.22109v2]
SecureBreak -- A dataset towards safe and secure models[2603.21975v1]
Publicly Understandable Electronic Voting: A Non-Cryptographic, End-to-End Verifiable Scheme[2603.21833v1]
Cybersecurity Guidance for Smart Homes: A Cross-National Review of Government Sources[2603.21703v1]
Bridges connecting Encryption Schemes[2603.21694v1]
Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks[2603.21654v1]
A Survey of Web Application Security Tutorials[2603.21556v1]
When the Abyss Looks Back: Unveiling Evolving Dark Patterns in Cookie Consent Banners[2603.21515v1]
For the week's complete paper list or detailed analysis materials, feel free to leave a comment.
夜雨聆风