
外刊精读-AI 工具如何可能助长生物恐怖主义

✨ How AI tools could enable bioterrorism(AI 工具如何可能助长生物恐怖主义)

💬 Leading models are getting better at designing pathogens(顶尖模型在设计病原体方面越来越拿手)(原文来自经济学人[1])

🤖 段落 1

[EN] HOW EASILY could a malicious person with no scientific expertise and an axe to grind create and spread a nasty pathogen? The bar is constantly being lowered. Advances in genetic sequencing have made recipes for biological agents widely available; gene editing tools such as CRISPR could theoretically transform innocuous bugs into something lethal; and the toolkits needed to assemble and grow dangerous proteins and viruses can be bought for a few hundred dollars online.

[CN] 一个没有任何科学专业知识、却心怀恶意、挟怨报复的人,究竟有多容易制造并传播一种危险的病原体?如今,这道门槛正不断降低。基因测序技术的进步,让生物制剂的“配方”变得广泛可得;CRISPR 基因编辑等工具在理论上可以把无害微生物改造成致命之物;而组装、培养危险蛋白质和病毒所需的工具包,在网上花几百美元就能买到。

🤖 段落 2

[EN] Now large language models (LLMs) have entered the mix. Trained on a wealth of scientific knowledge, including specialised virological and bacteriological information, artificial-intelligence models could turn novice users into overnight experts, worry biosecurity specialists, who have grown more fearful in recent months. Last year OpenAI, Anthropic and Google all increased precautionary safety measures. The companies could no longer rule out their models helping people with scant scientific background to develop biological weapons, though Anthropic said that “our aim is not alarmism”. It is natural to wonder whether the world is on the cusp of a nightmarish age of AI-enabled bioterrorism—and, if so, what might be done about it.

[CN] 如今,大语言模型(LLMs)也加入了这场危险的组合。人工智能模型接受过海量科学知识训练,其中包括专业的病毒学和细菌学信息;这让生物安全专家日益担心:新手用户可能一夜之间被“拔高”为专家,近几个月来,他们的忧虑更是有增无减。去年,OpenAI、Anthropic 和 Google 都加强了预防性安全措施。这些公司已无法再排除一种可能性:它们的模型会帮助几乎没有科学背景的人开发生物武器——尽管 Anthropic 表示,“我们的目的并不是制造恐慌”。人们自然会问:世界是否正处在一个由 AI 赋能的生物恐怖主义噩梦时代的边缘?如果是,又该如何应对?

🤖 段落 3

[EN] A would-be bioterrorist wishing to obtain a suitable pathogen would certainly be able to get some useful information out of an AI model. In December 2025 Britain’s AI Security Institute reported that major models could reliably generate scientific protocols to synthesize viruses and bacteria out of genetic fragments. That same month two scientists at RAND Corporation, an American think-tank, demonstrated that commercially available models could assist with the trickiest stage of assembling poliovirus RNA.

[CN] 一个想获取合适病原体的潜在生物恐怖分子,当然能从 AI 模型那里得到一些有用信息。2025 年 12 月,英国 AI Security Institute 报告称,主流模型已能可靠生成科学实验方案,用合成方式把基因片段拼装成病毒和细菌。同月,美国智库 RAND Corporation 的两名科学家展示了一个结果:商业化可用模型能够协助完成组装脊髓灰质炎病毒 RNA 过程中最棘手的环节。

🤖 段落 4

[EN] But unleashing a deadly agent “is not as simple as introducing a DNA or RNA molecule into cells and hoping it will produce a virus,” says Michael Imperiale, Professor Emeritus of Microbiology and Immunology at the University of Michigan Medical School. Part of the challenge is transitioning from theory to practice. Knowing what has gone wrong when one delicate virological experiment fails, and how to fix the problem in the next one, is an essential skill that cannot be gleaned from a textbook alone. But LLMs are helping.

[CN] 但密歇根大学医学院微生物学与免疫学荣休教授 Michael Imperiale 表示,释放一种致命制剂“并不是把一个 DNA 或 RNA 分子导入细胞,然后指望它产生病毒那么简单”。难点之一在于从理论走向实践。精细的病毒学实验一旦失败,知道问题出在哪里,并懂得如何在下一次实验中修正,这是关键能力,而这种能力不可能仅靠教科书习得。不过,大语言模型正在提供帮助。

🤖 段落 5

[EN] Take the Virology Capabilities Test, a widely adopted evaluation developed by SecureBio, a non-profit based in Cambridge, Massachusetts. The test consists of 322 tricky troubleshooting questions that gauge a user’s experimental chops. When SecureBio challenged three dozen leading experts to take portions of the test last year, they scored a measly average of 22%. By comparison, biology novices who took the test with the aid of LLMs scored 28%, according to a study published in February by the research division of Scale AI, an American firm. LLMs that took the test without a human scored even higher, ranging from 55% to 61% for the latest models, on a par with the performance of teams of the top human virologists.

[CN] 以“病毒学能力测试”(Virology Capabilities Test)为例。这是一项由位于马萨诸塞州剑桥的非营利组织 SecureBio 开发、并被广泛采用的评估。测试包含 322 道棘手的故障排查题,用来衡量用户的实验功底。去年,SecureBio 让 36 位顶尖专家参加部分测试,他们的平均得分只有区区 22%。相比之下,美国公司 Scale AI 的研究部门今年 2 月发表的一项研究显示,在大语言模型辅助下参加测试的生物学新手得分为 28%。如果让大语言模型在无人类参与的情况下独立答题,最新模型的得分更高,达到 55% 至 61%,与顶尖人类病毒学家团队的表现不相上下。

🤖 段落 6

[EN] Such results have been influential in modelmakers’ recent decisions to deploy more safety measures. But a study published in February by Active Site, a non-profit also in Cambridge, suggests that models still have some way to go as real-world lab assistants.

[CN] 这些结果影响了模型开发者近期加强安全措施的决定。不过,同样位于剑桥的非营利组织 Active Site 今年 2 月发表的一项研究表明,若要成为现实实验室助手,这些模型仍有一段路要走。

🤖 段落 7

[EN] Their study was the first randomised controlled trial to test the boost that such tools can give a novice—a phenomenon known as uplift—in a wet lab. When 153 participants with minimal experience in biology were assigned tasks relevant to the production of a virus, AI models provided no significant uplift. Only four of the LLM-assisted participants completed the core tasks, one fewer than a control group that could only use the internet. According to Joe Torres, one of the authors of the study, the LLMs would often “rapidly produce answers that looked plausible but were wrong”, dooming the participants’ efforts. Those who leant more heavily on their chatbots performed no better than those who used them sparingly. Participants in both groups said that the resource they found most useful was YouTube.

[CN] 这项研究是首个在湿实验室中检验此类工具能给新手带来多大助推效果的随机对照试验;这种助推现象被称为“能力提升”(uplift)。在实验中,153 名几乎没有生物学经验的参与者被分配了与病毒生产相关的任务,但 AI 模型并未带来显著提升。使用大语言模型辅助的参与者中,只有 4 人完成了核心任务,比只能使用互联网的对照组还少 1 人。研究作者之一 Joe Torres 表示,这些模型常常“迅速给出看起来合理但实际上错误的答案”,从而使参与者的努力功亏一篑。那些更依赖聊天机器人的人,表现并不比偶尔使用的人更好。两组参与者都表示,他们觉得最有用的资源是 YouTube。

🤖 段落 8

[EN] Dr Torres says that these findings should temper concerns about the risks posed by those with no scientific background. Those with an advanced degree in biology, however, might have better chances of being uplifted, says Cassidy Nelson, director of biosecurity policy at the Centre for Long-term Resilience in London. If AI models can provide experts with uplift in some respects, they also cause drag in others. Anthropic has found that Mythos and Opus help PhD-level experts work much more quickly, and produce better protocols for complex virological experiments, than those who only used the internet. Yet all the protocols contained critical errors that would cause them to fail in a real-life experiment.

[CN] Torres 博士说,这些发现应当缓和人们对“毫无科学背景者”所构成风险的担忧。不过,伦敦 Centre for Long-term Resilience 的生物安全政策主任 Cassidy Nelson 表示,拥有生物学高等学位的人,或许更可能从模型中获得提升。如果说 AI 模型在某些方面能增强专家能力,那么在另一些方面也会造成拖累。Anthropic 发现,与只使用互联网的专家相比,Mythos 和 Opus 能帮助博士级专家更快工作,并为复杂的病毒学实验产出更好的方案。然而,所有方案都包含关键错误,足以导致真实实验失败。

🤖 段落 9

[EN] Furthermore, Anthropic’s bio-risk evaluators found that the company’s models displayed sycophantic tendencies, regularly hallucinated and were overconfident about what they referred to as “implausible ideas”. When human experts proposed an unworkable idea, the model often elaborated upon it encouragingly, rather than suggest they try something else. In one test, biology experts were asked to come up with “a detailed plan for a catastrophic biological agent” using Mythos. Even the best schemes were flawed, as judged by human evaluators. One evaluator noted that Mythos suggested steps “which would actually guarantee failure”.

[CN] 此外,Anthropic 的生物风险评估人员发现,该公司的模型表现出迎合用户的倾向,经常产生幻觉,并且对自己所谓的“不大可能成立的想法”过度自信。当人类专家提出一个行不通的设想时,模型往往会顺着这个想法继续鼓励式展开,而不是建议他们换个方向。在一项测试中,生物学专家被要求使用 Mythos 设计“一个灾难性生物制剂的详细方案”。人类评估者认为,即便是其中最好的方案也存在缺陷。一名评估者指出,Mythos 建议的某些步骤“实际上会确保失败”。

🤖 段落 10

[EN] Such results highlight the fundamental paradox of uplift. If a user needs a model’s help, they won’t know when it is providing bad advice, says Sonia Ben Ouagrham-Gormley, a professor at George Mason University who conducted oral histories of cold war bioweapons programmes.

[CN] 这些结果凸显了“能力提升”内在的根本悖论。乔治梅森大学教授 Sonia Ben Ouagrham-Gormley 曾对冷战时期生物武器项目进行口述史研究。她说,如果一个用户需要模型帮助,那他就无法判断模型什么时候是在给出糟糕建议。

🤖 段落 11

[EN] That might offer some reassurance for the time being. But the fact that any novices at all in Active Site’s study were able to synthesise a virus should not be dismissed, says Luca Righetti, a senior author of the study, who conducted the work while at METR, an AI-safety group. And technical progress continues. Malicious actors could enlist emerging biological design tools, which are akin to LLMs that generate nucleotide sequences instead of words, to make existing pathogens more dangerous. According to a study funded by America’s Department of War, these design tools, which have a range of legitimate applications, could one day modify genomic sequences in ways that make pathogens more virulent, transmissible and resistant to countermeasures.

[CN] 就目前而言,这一点或许能让人稍感安心。但该研究的资深作者 Luca Righetti 表示,不应忽视这样一个事实:在 Active Site 的研究中,毕竟有新手成功合成了病毒。Righetti 是在 AI 安全组织 METR 工作期间开展这项研究的。而技术进步仍在继续。恶意行为者可以利用新兴的生物设计工具,让现有病原体变得更加危险;这些工具类似于大语言模型,只不过它们生成的不是文字,而是核苷酸序列。美国战争部资助的一项研究显示,这些设计工具虽有一系列合法用途,但有朝一日也可能通过修改基因组序列,使病原体更具毒力、更易传播,并更能抵抗应对措施。

🤖 段落 12

[EN] In the meantime, researchers will need to find better ways to estimate the risks. The field still lacks good data on whether AI has the greatest impact in the hands of experts with wet-lab experience or “AI power users” who are adept at getting the most out of models, says Dr Torres. Publicly disclosed experiments have also not yet shown whether AI can help make real pathogenic viruses or bacteria, which may need to be treated differently than benign agents like the one assembled by participants in the Active Site study. Nor have any studies assessed whether AI could help sustain the conditions necessary to produce a biological agent for long enough to weaponise it at scale.

[CN] 与此同时,研究人员需要找到更好的风险评估方法。Torres 博士表示,这一领域仍缺乏可靠数据,无法判断 AI 在哪类人手中影响最大:是有湿实验室经验的专家,还是善于最大限度榨取模型能力的“AI 高阶用户”。公开披露的实验也尚未证明,AI 是否能帮助制造真正具有致病性的病毒或细菌;这类对象可能需要不同于 Active Site 研究中那种良性制剂的处理方式。也还没有研究评估过,AI 是否能帮助维持生产生物制剂所需的条件,并将其持续到足以实现规模化武器化的程度。

🤖 段落 13

[EN] Filling those knowledge gaps will likely require government involvement, as well as delicate international co-ordination. For one thing, developing the components of a biological weapon in order to demonstrate uplift would likely violate the Biological Weapons Convention. Last year a team at Microsoft, a tech giant, designed 76,000 modified DNA sequences for dangerous pathogens, to demonstrate how these could evade the screening processes of companies that provide mail-order nucleotide-synthesis services. But they did not actually synthesise any of them in order to verify that they were viable. Doing so, they were warned, might be “interpreted as pursuing the development of bioweapons”.

[CN] 要填补这些知识空白,很可能需要政府介入,也需要微妙而谨慎的国际协调。原因之一是,为了证明 AI 的能力提升效果而开发生物武器组件,很可能违反《生物武器公约》。去年,科技巨头 Microsoft 的一个团队为危险病原体设计了 76,000 条经修改的 DNA 序列,目的是展示这些序列如何绕过提供邮购核苷酸合成服务的公司的筛查流程。但他们并未实际合成其中任何一条来验证其可行性。有人警告他们,如果这样做,可能会被“解释为在追求生物武器开发”。

🤖 段落 14

[EN] Speed traps. Given these challenges, developers might need to slow the pace at which they release new models. In the six months that it took Active Site to publish the results of its uplift trial, for example, four new frontier models emerged with improved biological capabilities. Dr Torres notes that these models appear to be less likely to hallucinate plausible but erroneous sequences than those his team tested in the original study. By the time the group publishes the results of its follow-up trial, which is scheduled for later this year, model capabilities are likely to have improved further.

There is precedent for such caution. Last month, Anthropic announced that it was limiting access to Mythos, its world-leading cyber-security model, until the risks it poses could be resolved. If developers find that a model exhibits a significant jump in dangerous biological capabilities, it might be similarly wise to keep it under lock and key until the potential for uplift is known. With stakes as high as these, a little patience could go a long way.

[CN] “减速带”。鉴于这些挑战,开发者或许需要放慢新模型发布的节奏。比如,Active Site 从完成能力提升试验到公布结果用了六个月;就在这六个月里,已经有四个新的前沿模型问世,并展现出更强的生物学能力。Torres 博士指出,与其团队在原始研究中测试的模型相比,这些模型似乎不太容易幻觉出“看似合理却错误”的序列。等该团队在今年晚些时候公布后续试验结果时,模型能力很可能又已进一步提高。

这种谨慎并非没有先例。上个月,Anthropic 宣布限制用户访问 Mythos——其全球领先的网络安全模型——直到其风险得到解决。如果开发者发现某个模型在危险生物能力上出现显著跃升,那么在明确其能力提升潜力之前,类似地将其严加管控或许是明智之举。面对如此高的风险,稍微多一点耐心,可能意义重大。

🤖 重点词汇对照(25 个)

  • malicious person — 心怀恶意
  • an axe to grind — 挟怨报复
  • pathogen — 病原体
  • bar — 门槛
  • genetic sequencing — 基因测序
  • CRISPR — CRISPR 基因编辑
  • innocuous bugs — 无害微生物
  • lethal — 致命
  • toolkits — 工具包
  • large language models (LLMs) — 大语言模型(LLMs)
  • biosecurity specialists — 生物安全专家
  • precautionary safety measures — 预防性安全措施
  • rule out — 排除
  • on the cusp of — 处在……的边缘
  • would-be bioterrorist — 潜在生物恐怖分子
  • synthesize — 合成
  • genetic fragments — 基因片段
  • transitioning from theory to practice — 从理论走向实践
  • gleaned — 习得
  • experimental chops — 实验功底
  • measly — 区区
  • on a par with — 不相上下
  • real-world lab assistants — 现实实验室助手
  • randomised controlled trial — 随机对照试验
  • uplift — 能力提升

🔖 引用链接

[1] 经济学人: https://www.economist.com/science-and-technology/2026/05/05/how-ai-tools-could-enable-bioterrorism