Claude security plugin debuts, xAI resets Grok Build limits, local 1-bit diffusion model goes live

24h AI Top Stories
2026.05.26

A no-fluff rundown of the past 24 hours in AI, filtered from one X List: 15 lead posts and 560 comments.

Coverage window: 2026-05-25 23:54 to 2026-05-26 23:54 (UTC)
Reading time: about 6 minutes · 2,300 characters · 7 official videos

Before we start

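Today's X List haul is medium-density: 15 lead posts, 560 comments, 7 official videos, 22 images. Not a large count, but it was a rare day when the lead vendors all spoke up together: Anthropic, OpenAI, Google, and xAI each made a move on the same day.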

After filtering out hype, reposts, and pure memes, the remaining signal falls into the usual four blocks:

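1. Lead vendors (Anthropic security plugin + OpenAI Codex Mobile/Databricks)

2. Second tier (Google's new Gemini design language + xAI limit reset/Custom Skills)

3. Third tier · local-first camp (PrismML 1-bit diffusion + Tencent translation models)

4. Other players (Shopify × Perplexity + Marlin video VLM + CHI-Bench)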

We'll go in that order. Each item aims for one sentence on what it is, plus one or two on why it matters.

PART 01

Lead vendors: Anthropic ships a security plugin, OpenAI puts Codex to work on Databricks customer documents

01 Anthropic: a security-guidance plugin for Claude Code cuts security comments on PRs by 30-40%

Today Anthropic's @ClaudeDevs posted a lead post plus three follow-ups in one thread, the most-discussed vendor release in today's X List (131+131+6+2 comments, and still climbing).

The lead post, verbatim:

"We've shipped a security-guidance plugin for Claude Code that helps identify and fix vulnerabilities as you're writing code.
Available for all Claude Code users. Install from the plugin marketplace (/plugins)."

The release has three layers:

▎ Where the three levels of hooks step in

The second post spells out how it works:

"It runs via hooks and reviews code on three levels:
· On file edits: looks for risky patterns like commonly misused dangerous libraries
· After model turns: reviews the full diff for harder-to-spot issues
· On commits: reads surrounding code to validate vulnerabilities"

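In plain terms: on file edits it scans for known-dangerous libraries, at the end of each model turn it reviews the whole diff, and at commit time it re-checks findings against the surrounding code. That pushes "shift-left security" down to a check-as-the-model-writes granularity, a full loop earlier than traditional SAST tools that scan after a PR is opened.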
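
To make the three checkpoints concrete, here is a toy sketch. Everything in it is hypothetical: the function names, the regex rules, and the "untrusted input" heuristic only illustrate the edit / turn / commit staging described above. They are not the plugin's actual checks and not Claude Code's hook API.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set: regex -> reason. Not the plugin's real checks.
RISKY_PATTERNS = {
    r"\bpickle\.loads\(": "pickle.loads on possibly untrusted bytes",
    r"\byaml\.load\(": "yaml.load without an explicit safe loader",
    r"shell\s*=\s*True": "subprocess call with shell=True",
}
# Hypothetical hints that a value comes from outside the program.
UNTRUSTED_HINTS = ("request.", "input(", "sys.argv")


@dataclass
class Finding:
    stage: str   # "edit", "turn" or "commit"
    line: int    # 1-based line number in the text that was scanned
    reason: str


def scan_on_edit(source: str) -> list[Finding]:
    """Stage 1 (file edit): cheap line-by-line pattern match."""
    hits = []
    for n, text in enumerate(source.splitlines(), start=1):
        for pattern, reason in RISKY_PATTERNS.items():
            if re.search(pattern, text):
                hits.append(Finding("edit", n, reason))
    return hits


def review_turn_diff(diff: str) -> list[Finding]:
    """Stage 2 (end of a model turn): look at every line the turn added."""
    hits = []
    for n, text in enumerate(diff.splitlines(), start=1):
        if not text.startswith("+") or text.startswith("+++"):
            continue  # only added lines; skip the "+++ b/file" header
        if re.search(r"(password|api_key|secret)\s*=\s*['\"]\w+", text, re.IGNORECASE):
            hits.append(Finding("turn", n, "hard-coded credential"))
    return hits


def validate_on_commit(source: str, hits: list[Finding], window: int = 2) -> list[Finding]:
    """Stage 3 (commit): keep a stage-1 hit only if nearby lines show untrusted input."""
    lines = source.splitlines()
    confirmed = []
    for h in hits:
        context = lines[max(0, h.line - 1 - window): h.line + window]
        if any(hint in row for row in context for hint in UNTRUSTED_HINTS):
            confirmed.append(Finding("commit", h.line, h.reason))
    return confirmed
```

For example, with `data = request.body` directly above `pickle.loads(data)`, stage 1 flags the call and stage 3 confirms it, while a `yaml.load` several lines away from any untrusted input is flagged at edit time but dropped at commit. The design point is the cost gradient: a cheap regex pass on every edit, a heavier context read only at commit.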

▎ Numbers from internal use

The third post puts Anthropic's own results on the table:

"We've been using the plugin extensively at Anthropic.
Across our internal rollout and benchmarks, we've seen a 30-40% decrease in security-related comments on PRs opened using the plugin."

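A 30-40% decrease in security-related comments on PRs. The number comes from Anthropic's internal dogfooding, and the metric is not how many vulnerabilities were found but how many security comments reviewers still had to write. In the replies, @aabyzov offered a good gloss: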

"Comments-that-never-had-to-be-written is the right metric. Most tools count findings; this counts friction removed at compose-time."
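
The metric is a plain relative drop. A quick sketch of what 30-40% means in absolute terms, using a made-up baseline of 50 security comments per 100 PRs (the post gives no absolute counts):

```python
def pct_decrease(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) * 100 / before


def remaining_after(baseline: float, drop_pct: float) -> float:
    """How many comments are still written after a `drop_pct` % reduction."""
    return baseline * (100 - drop_pct) / 100


# Hypothetical baseline: 50 security comments per 100 PRs without the plugin.
baseline = 50
low, high = remaining_after(baseline, 40), remaining_after(baseline, 30)
print(f"{low:.0f}-{high:.0f} comments remain; "
      f"{baseline - high:.0f}-{baseline - low:.0f} never had to be written")
```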

▎ Org-specific rules

The fourth post covers how to add org-specific rules:

"You can add org-specific rules in a claude-security-guidance.md file. Drop it in your repo or distribute via MDM. The plugin enforces your policies alongside the built-in checks."

Drop claude-security-guidance.md straight into the repo or push it out via MDM, and a large company's security SOP becomes a markdown file Claude must consult while it writes code.
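
The post does not describe the file's schema or how the plugin consumes it. Purely as an illustration, the sketch below assumes a hypothetical convention where each bullet holds a backticked regex and a reason, then checks source code against those org rules. The rule format, function names, and example file are all invented for this sketch.

```python
import re

# Hypothetical rule line format: "- `regex`: reason". The real file's schema is not described.
RULE_LINE = re.compile(r"^\s*[-*]\s+`([^`]+)`\s*:\s*(.+)$")

EXAMPLE_MD = r"""# Acme security guidance (hypothetical example)
- `http://`: use HTTPS for all outbound calls
- `md5\(`: MD5 is banned for hashing secrets
Free-form prose like this line is ignored by the toy parser.
"""


def load_org_rules(markdown: str) -> dict[str, str]:
    """Collect {regex: reason} from bullet lines that match RULE_LINE."""
    rules = {}
    for line in markdown.splitlines():
        m = RULE_LINE.match(line)
        if m:
            rules[m.group(1)] = m.group(2).strip()
    return rules


def check_org_rules(source: str, rules: dict[str, str]) -> list[tuple[int, str]]:
    """Return (line number, reason) for every rule that matches a source line."""
    hits = []
    for n, text in enumerate(source.splitlines(), start=1):
        for pattern, reason in rules.items():
            if re.search(pattern, text):
                hits.append((n, reason))
    return hits
```

Keeping the rules in a plain markdown file means the same document can be read by humans, versioned in the repo, and distributed by MDM without a separate config format.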

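The key takeaway: the notable part is not "AI finds your vulnerabilities" (SAST tools have done that for years) but the model vendor shipping an official first-party plugin itself. @amperlycom put it in one line in the replies: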

"when the platform owner ships first-party plugins, the ecosystem is real"

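With the platform owner writing official plugins, Claude Code's plugin marketplace (/plugins) moves from community experimentation to official endorsement and ecosystem positioning. As of this issue, neither Cursor nor GitHub Copilot has a comparable "first-party plugin from the model vendor" play.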

🎬 Video: Claude Code security plugin launch video

02 OpenAI: Codex Mobile changes how people pace their coding, GPT-5.5 goes into Databricks to parse customer documents

@OpenAIDevs posted twice today: one post on product experience, one on enterprise deployment.

▎ Codex Mobile makes stepping away from the screen an advantage

A first-hand account from an OpenAI engineer:

"Codex Mobile is making me a better developer in a way I didn't expect: I step away from my laptop and stop micromanaging.

I give it much more ambitious prompts (the way models work best).

And I get space to think instead of sitting there with burning eyes spamming prompts."

The unexpected upside, in other words: less supervision produced more ambitious prompts and more time to think.

In the replies, @curonianai summed it up neatly:

"Codex Mobile became the AirPods of dev tools. Hands-free coding on the throne use case."

A "hands-free" dev tool is a meaningful signal: the next stop for agentic coding is the model running while the human steps away, and the phone is what carries that workflow.

▎ GPT-5.5 + Databricks: parsing customer documents

The second post is shorter:

"GPT-5.5 in Codex helps @databricks parse complex customer documents more reliably."

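Document parsing has been one of data teams' sorest pain points in recent years (PDFs, contracts, scans, with a different format at every company). @aabyzov zeroed in on the key point: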

"Customer-doc parsing is where the quiet wins hide. Schema never standard, the long tail of weird formats is what breaks pipelines."

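Where the signal is: read OpenAI's two posts together. One pushes agentic coding onto the phone and frees people from the screen; the other embeds GPT-5.5 in Codex into Databricks' enterprise data pipelines. Neither is a new model release, but both move existing models into places where they actually change how work gets done.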

🎬 Video: GPT-5.5 + Databricks document parsing demo

PART 02

Second tier: Google gives Gemini a new design language, xAI fixes limits and ships Skills on the same day

01 Google: Neural Expressive, a new design language for the Gemini app

The official @Google account announced at #GoogleIO:

"Neural Expressive is a new design language for @GeminiApp, with fluid animations, vibrant colors, haptic feedback and more improved experiences across the app. See it in action with @JoshWoodward at #GoogleIO"

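Put simply, the Gemini app's entire visual system gets an upgrade: fluid animations, vibrant colors, and haptic feedback. One-line take: this is Google giving Gemini a consumer-brand polish. Set against ChatGPT's more utilitarian, tool-like feel, Google wants Gemini to come across as a personal assistant.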

@LLMWorkflows raised a pointed question in the replies:

"Fluid motion ≠ functional improvement. Test whether users actually get things done quicker."

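A new design language is not a functional improvement; what counts is whether users actually finish tasks faster. That one is worth revisiting in a week.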

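Related: @GoogleAI posted a Gemini Omni prompt card the same day (an image with no tweet text). Judging from the replies, it appears to be prompt-guide material for Gemini's multimodal generation.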

🎬 Video: Gemini Neural Expressive demo

02 xAI: resets Grok Build limits after user complaints + ships Grok Custom Skills the same day

@xai posted the most-commented single item in today's X List (254 comments):

"Thank you so much for all the feedback on the Grok Build Beta.

Some of you reported hitting limits quickly. Our team found areas to improve caching, so we've reset Grok Build usage limits for all accounts.

Please keep sharing feedback - the team is here to help."

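The Grok Build Beta opened on May 25, and complaints that the limits were too tight surfaced almost immediately. xAI responded within 24 hours: better caching plus a limit reset for every account. The replies are mostly "Thanks," with some still asking about weekly limits. It is textbook post-launch damage control, but the speed stands out.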

The same day, @grok posted a product video for Grok Custom Skills:

"Create 'em in seconds, use 'em daily
Automate your life with Custom Skills"

Custom Skills is the Grok line's answer to ChatGPT's GPTs and Claude Skills: it packages each user's own workflows into reusable skills. It is a key step in xAI's move from a chat interface toward a personal automation platform.

🎬 Video: Grok Custom Skills launch video

PART 03

Third tier · the local-first camp: a 1-bit diffusion model + a translation model topping HF

01 PrismML: 1-bit / Ternary Bonsai Image 4B, a diffusion model that runs on a phone

@PrismML posted a 6-post thread along with an Apache 2.0 open-source release, the launch most worth technical readers' attention today:

"Today we're releasing 1-bit and Ternary Bonsai Image 4B.

A new family of image-generation models designed to run high-quality diffusion inference on local hardware: from laptops to phones."

Hard numbers for the two variants (the authors' own words):

· The 1-bit Bonsai Image 4B diffusion transformer is just 0.93GB. That is 8.3x smaller than the full-precision diffusion transformer.
· Ternary Bonsai Image 4B ... at 1.21GB, it uses ternary weights {-1, 0, +1} to add more representational flexibility ... still remaining extremely compact at 6.4x smaller.
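
A quick sanity check of these figures, assuming "4B" means exactly 4 billion weights and 1GB = 10^9 bytes (both are assumptions; the posts give neither):

```python
import math

PARAMS = 4e9  # assumption: "4B" read as exactly 4 billion weights

# From the PrismML posts: (size in GB, "x smaller than full precision").
REPORTED = {
    "1-bit": (0.93, 8.3),
    "ternary": (1.21, 6.4),
}


def ideal_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Size if every weight used exactly `bits_per_weight` bits and nothing else."""
    return params * bits_per_weight / 8 / 1e9


def effective_bits(size_gb: float, params: float = PARAMS) -> float:
    """Average bits per weight implied by a reported file size."""
    return size_gb * 1e9 * 8 / params


for name, (size, ratio) in REPORTED.items():
    print(f"{name}: implied full precision ~{size * ratio:.2f} GB, "
          f"~{effective_bits(size):.2f} bits/weight")

print(f"ideal 1-bit {ideal_gb(1):.2f} GB, ideal ternary {ideal_gb(math.log2(3)):.2f} GB, "
      f"16-bit {ideal_gb(16):.2f} GB")
```

Both ratios imply a full-precision model of roughly 7.7GB, close to 2 bytes per weight. The released files sit above the ideal 0.50GB (1-bit) and 0.79GB (log2 3, about 1.58 bits, for ternary); overhead such as scale factors, layers kept at higher precision, or 2-bit packing of ternary values could account for the gap, though the posts don't break it down.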

First community measurements:

· @DaBrown95: Ternary model at 1024×1024 on an iPhone 17 Pro Max, about 60 seconds per image

· @ivanfioravanti: Ternary model at 1024×1024 on an M5 Max, about 10 seconds per image

An iOS app, Bonsai Studio, launched alongside: on-device image generation, with no subscription and no API calls.

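Where the signal is: image diffusion has been chasing on-device deployment all year, but 1-bit and ternary quantization squeezing a 4B model down to about 1GB (0.93GB for 1-bit, 1.21GB for ternary) while keeping usable quality is the point where offline image generation on an iPhone moves from demo to daily use. With the Apache 2.0 license, third-party apps can integrate it without licensing worries.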
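
PrismML's posts don't describe their quantization recipe. To show what "1-bit" versus "ternary {-1, 0, +1}" means for a weight vector, here is the absmean scheme popularized by BitNet b1.58, a swapped-in technique rather than Bonsai's confirmed method, applied to a toy vector:

```python
def ternary_quantize(weights: list[float], eps: float = 1e-8) -> tuple[list[int], float]:
    """Absmean recipe (as in BitNet b1.58): scale by mean |w|, round, clip to {-1, 0, +1}."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma


def binary_quantize(weights: list[float]) -> tuple[list[int], float]:
    """1-bit: keep only the sign, with one shared scale (mean |w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    q = [1 if w >= 0 else -1 for w in weights]
    return q, alpha


def dequantize(q: list[int], scale: float) -> list[float]:
    """Approximate the original weights from the codes and their scale."""
    return [v * scale for v in q]


w = [0.9, -0.05, 0.4, -1.2, 0.02, 0.6]
t, gamma = ternary_quantize(w)
b, alpha = binary_quantize(w)
print("ternary:", t, "binary:", b)
```

The zero code is what the "extra representational flexibility" buys: small weights such as -0.05 and 0.02 can switch off entirely instead of being forced to plus or minus the shared scale, at the cost of needing more than one bit per weight.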

🎬 Video: Bonsai Studio running on an iPhone

02 Tencent Hunyuan: Hy-MT2 translation models top the HF open-source trending leaderboard

@TencentHunyuan posted its trending-leaderboard results:

"Our latest Tencent Hunyuan translation models are on fire on Hugging Face:
· Hy-MT2-1.8B ranks #1
· Hy-MT2-30B-A3B ranks #4 on the open-source model trending leaderboard, with over 7K downloads already!"

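The 1.8B small model took #1 and the 30B-A3B MoE made the top four: in the translation vertical, a Chinese model has once again taken center stage on the HF trending leaderboard. In the replies, @rotenbar wrote "I am using 7B model it is amazing," so hands-on feedback from users abroad is coming in too.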
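
"30B-A3B" follows the common MoE naming convention: about 30B parameters in total, about 3B active per token, because a router sends each token to only a few experts. A toy top-k router to illustrate; the 8 experts, k=2, and the scaling "experts" are arbitrary choices for this sketch, not Hy-MT2's configuration:

```python
import math


def top_k_route(router_logits: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x: float, experts, router_logits: list[float], k: int = 2) -> float:
    """Only the routed experts run; the output is their weighted sum."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy layer: 8 "experts" that just scale their input (arbitrary, not Hy-MT2's setup).
experts = [lambda x, s=s: s * x for s in range(8)]
logits = [0.1, 2.0, -1.0, 1.5, 0.0, 0.3, -0.5, 0.2]
print(top_k_route(logits))                 # experts 1 and 3 share the weight
print(moe_forward(1.0, experts, logits))
print(f"active share in 30B-A3B: {3 / 30:.0%}")
```

This is why a 30B-A3B model can be far cheaper per token than a dense 30B model: compute scales with the active share, while memory still has to hold all the experts.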

PART 04

Other players: Shopify × Perplexity, the Marlin video VLM, and the CHI-Bench healthcare agent benchmark

01 Shopify × Perplexity Computer: agents running the store, once again

@Shopify posted a demo of running a store through Perplexity Computer:

"manage your store with @perplexity_ai Computer

do market research, generate product images, and design a theme in parallel

another day, another agent where you can run your business"

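Market research, product images, and theme design, all running in parallel. After ChatGPT Operator and Anthropic Computer Use, this is another attempt to bring agents to e-commerce. Perplexity CEO @AravSrinivas reposted it with the line: "Perplexity Computer can manage your @Shopify store".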
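
The "in parallel" part is the interesting engineering detail. A minimal asyncio sketch of three concurrent agent subtasks; the subtasks are hypothetical stubs that only sleep, with no real Shopify or Perplexity calls:

```python
import asyncio
import time


async def run_subtask(name: str, seconds: float) -> str:
    """Hypothetical stand-in for one agent subtask; it only sleeps."""
    await asyncio.sleep(seconds)  # pretend to browse, generate images, or edit a theme
    return f"{name}: done"


async def manage_store() -> list[str]:
    # The three jobs from the demo, started together instead of one after another.
    return await asyncio.gather(
        run_subtask("market research", 0.3),
        run_subtask("product images", 0.2),
        run_subtask("theme design", 0.1),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(manage_store())
    # Wall time tracks the slowest subtask (~0.3s), not the sum (~0.6s).
    print(results, f"{time.perf_counter() - start:.2f}s")
```

`asyncio.gather` returns results in the order the tasks were passed, regardless of which finishes first, so the caller can stitch outputs back together deterministically.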

🎬 Video: Perplexity Computer managing a Shopify store

02 Marlin-2B: an Apache 2.0 open-source video-understanding VLM

Hugging Face's @victormustar introduced the new release from the Nemo Station team:

"cool new release: a tiny open video VLM that understands what happens in videos and when

Marlin-2B (Apache 2.0!) can caption clips into timestamped events, or find a natural-language moment inside the video"

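2B parameters + Apache 2.0 + the ability to find a moment in a video from a natural-language query: a new open-source option for video search, editing, and surveillance-footage analysis. From @LIFgenii in the replies: "A year ago you needed a whole large-model stack just to demo this kind of task; now a 2B model can run it."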
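
"Timestamped events" plus "find a moment" suggests a simple output shape. A toy sketch, assuming the model emits (start, end, caption) events; the `Event` type, the captions, and the keyword-overlap lookup are hypothetical stand-ins, not Marlin-2B's actual output format or grounding method:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    start: float  # seconds from the beginning of the clip
    end: float
    caption: str


def find_moment(events: list[Event], query: str) -> Optional[Event]:
    """Toy lookup: the event whose caption shares the most words with the query."""
    q = set(query.lower().split())

    def overlap(e: Event) -> int:
        return len(q & set(e.caption.lower().split()))

    best = max(events, key=overlap)
    return best if overlap(best) > 0 else None


# Hypothetical captions in the timestamped-event shape.
events = [
    Event(0.0, 4.5, "a man opens the front door"),
    Event(4.5, 9.0, "a dog runs into the yard"),
    Event(9.0, 15.0, "the man throws a ball to the dog"),
]
hit = find_moment(events, "when does the dog run into the yard")
print(f"{hit.start:.1f}s-{hit.end:.1f}s: {hit.caption}")
```

A real system would match on meaning (embeddings, or the model itself) rather than shared words; here "run" and "runs" don't even match, which is exactly the gap a video VLM closes.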

🎬 Video: Marlin-2B locating moments in a video

03 CHI-Bench: billed as the world's first long-horizon healthcare benchmark for AI agents

@iscreamnearby (Actava team) launched CHI-Bench on Hugging Face:

"Introducing CHI-Bench on @huggingface: the world's first long-horizon healthcare benchmark for AI agents.

75 real healthcare workflows + 20 apps + 200+ MCP tools + 1,290 skills + process / outcome rewards"

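The numbers speak for themselves: 75 real healthcare workflows, 20 apps, 200+ MCP tools, 1,290 skills, with process rewards and outcome rewards scored separately. @dudat3ch asked a key question in the replies: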

"For healthcare workflows, do you score 'safe refusal / escalate to human' separately from task completion?"

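The author replied that this dimension is covered: in prior authorization (prior auth), for example, forcing a submission when the evidence is insufficient is penalized separately. In healthcare, knowing when to refuse is a core skill for an agent, and CHI-Bench writes it into the evaluation protocol.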
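
A sketch of how a benchmark might keep "safe refusal" as its own dimension next to process and outcome. The `Episode` fields, the 0.4/0.6 weights, and the plus or minus 1 safety values are hypothetical, not CHI-Bench's actual scoring protocol:

```python
from dataclasses import dataclass


@dataclass
class Episode:
    steps_correct: int
    steps_total: int
    task_completed: bool
    evidence_sufficient: bool
    action: str  # "submit" or "escalate"


def score(ep: Episode, w_process: float = 0.4, w_outcome: float = 0.6) -> dict[str, float]:
    """Hypothetical scoring: process and outcome rewards plus a separate safety term."""
    process = ep.steps_correct / ep.steps_total
    outcome = 1.0 if ep.task_completed else 0.0
    if ep.evidence_sufficient:
        safety = 0.0   # nothing to refuse, so no bonus or penalty
    elif ep.action == "escalate":
        safety = 1.0   # correctly handed the case to a human
    else:
        safety = -1.0  # forced a submission without enough evidence
    total = w_process * process + w_outcome * outcome + safety
    return {"process": process, "outcome": outcome, "safety": safety, "total": total}


# Prior-auth style case with missing evidence: forcing it through vs escalating.
forced = score(Episode(4, 4, True, False, "submit"))
escalated = score(Episode(4, 4, False, False, "escalate"))
print(forced["total"], escalated["total"])
```

Keeping safety as its own key lets a reviewer see that the forced submission "completed the task" yet still lost, instead of that trade-off disappearing inside one blended number.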

The past 24 hours, in brief

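Anthropic turned "the model writes code + the model checks code" into an official first-party plugin; OpenAI pushed GPT-5.5 into Databricks' customer-document pipelines, and Codex Mobile lets developers treat their phone as a hands-free agent terminal. On the same day, Google gave Gemini the new Neural Expressive design language, and xAI fixed Grok Build limits within 24 hours and shipped Custom Skills along the way.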

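In the third tier, the release that actually changes how things feel is PrismML's: the 1-bit diffusion model packs 4B parameters into 0.93GB, and the ternary variant turns out a 1024×1024 image in about 60 seconds on an iPhone, moving on-device image generation from the demo stage toward everyday use.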

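The throughline is clear: frontier vendors are stitching model, workflow, and platform mindshare into one experience, while the open-source camp is pushing frontier capability onto local hardware, the edge, and the phone. Today, unusually, both lines spoke up at the same time.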

This newsletter is sourced from an X List that follows the accounts of major AI vendors including OpenAI, Anthropic, Google, xAI, Hugging Face, ModelScope, and Tencent, pulled once every 24 hours.

If you'd like me to keep making these 24-hour AI briefings, leave a like or tap "Wow" and I'll write the next issue.