
The Economist: How AI tools could enable bioterrorism

Article overview:

This article discusses the risk that AI could be used for bioterrorism. Although large language models can already help generate some biological experiment protocols, they still make frequent errors in real experiments and remain some distance from truly "teaching" an ordinary person to make a biological weapon. The article argues that as model capabilities continue to improve, governments, research institutions and companies will need to assess the risks more carefully and control the pace at which models are released.

Key vocabulary:

  1. bioterrorism
     /ˌbaɪoʊˈterərɪzəm/ n. terrorism using biological agents
  2. pathogen
     /ˈpæθədʒən/ n. an agent that causes disease
  3. malicious
     /məˈlɪʃəs/ adj. intending harm; harmful
  4. expertise
     /ˌekspɜːrˈtiːz/ n. specialist knowledge; expert skill
  5. sequencing
     /ˈsiːkwənsɪŋ/ n. (gene) sequencing; ordering
  6. innocuous
     /ɪˈnɒkjuəs/ adj. harmless; posing no threat
  7. lethal
     /ˈliːθəl/ adj. deadly; fatal
  8. virological
     /ˌvaɪrəˈlɒdʒɪkəl/ adj. relating to virology
  9. bacteriological
     /ˌbæktɪəriəˈlɒdʒɪkəl/ adj. relating to bacteriology
  10. precautionary
     /prɪˈkɔːʃəneri/ adj. preventive; taken as a precaution
  11. scant
     /skænt/ adj. insufficient; scarce
  12. cusp
     /kʌsp/ n. point of transition; the very beginning of something

How AI tools could enable bioterrorism
Leading models are getting better at designing pathogens
HOW EASILY could a malicious person with no scientific expertise and an axe to grind create and spread a nasty pathogen? The bar is constantly being lowered. Advances in genetic sequencing have made recipes for biological agents widely available; gene editing tools such as CRISPR could theoretically transform innocuous bugs into something lethal; and the toolkits needed to assemble and grow dangerous proteins and viruses can be bought for a few hundred dollars online.
Now large language models (LLMs) have entered the mix. Trained on a wealth of scientific knowledge, including specialised virological and bacteriological information, artificial-intelligence models could turn novice users into overnight experts, worry biosecurity specialists, who have grown more fearful in recent months. Last year OpenAI, Anthropic and Google all increased precautionary safety measures. The companies could no longer rule out their models helping people with scant scientific background to develop biological weapons (though Anthropic said that “our aim is not alarmism”). It is natural to wonder whether the world is on the cusp of a nightmarish age of AI-enabled bioterrorism—and, if so, what might be done about it.
A would-be bioterrorist wishing to obtain a suitable pathogen would certainly be able to get some useful information out of an AI model. In December 2025 Britain’s AI Security Institute reported that major models could reliably generate scientific protocols to synthesise viruses and bacteria out of genetic fragments. That same month two scientists at RAND Corporation, an American think-tank, demonstrated that commercially available models could assist with the trickiest stage of assembling poliovirus RNA.
But unleashing a deadly agent “is not as simple as introducing a DNA or RNA molecule into cells and hoping it will produce a virus,” says Michael Imperiale, an emeritus professor of microbiology and immunology at the University of Michigan Medical School. Part of the challenge is making the leap from theory to practice. Knowing what has gone wrong when a delicate virological experiment fails, and how to fix the problem in the next one, is an essential skill that cannot be gleaned from a textbook alone. But LLMs are helping.
Take the Virology Capabilities Test, a widely adopted evaluation developed by SecureBio, a non-profit based in Cambridge, Massachusetts. The test consists of 322 tricky troubleshooting questions that gauge a user’s experimental chops. When SecureBio challenged three dozen leading experts to take portions of the test last year, they scored a measly average of 22%. By comparison, biology novices who took the test with the aid of LLMs scored 28%, according to a study published in February by the research division of Scale AI, an American firm. LLMs that took the test without a human scored even higher, ranging from 55% to 61% for the latest models, on a par with the performance of teams of the top human virologists.
Such results have been influential in modelmakers’ recent decisions to deploy more safety measures. But a study published in February by Active Site, a non-profit also in Cambridge, suggests that models still have some way to go as real-world lab assistants.
The study was the first randomised controlled trial to test the boost that such tools can give a novice—a phenomenon known as uplift—in a wet lab. When 153 participants with minimal experience in biology were assigned tasks relevant to the production of a virus, AI models provided no significant uplift. Only four of the LLM-assisted participants completed the core tasks, one fewer than in a control group that could use only the internet. According to Joe Torres, one of the authors of the study, the LLMs would often “rapidly produce answers that looked plausible but were wrong”, dooming the participants’ efforts. Those who leant more heavily on their chatbots performed no better than those who used them sparingly. Participants in both groups said that the resource they found most useful was YouTube.
Dr Torres says that these findings should temper concerns about the risks posed by those with no scientific background. Those with an advanced degree in biology, however, might have a better chance of being uplifted, says Cassidy Nelson, director of biosecurity policy at the Centre for Long-term Resilience in London. If AI models can provide experts with uplift in some respects, they also cause drag in others. Anthropic has found that Mythos and Opus help PhD-level experts work much more quickly, and produce better protocols for complex virological experiments, than experts using the internet alone. Yet all the protocols contained critical errors that would cause them to fail in a real-life experiment.
Furthermore, Anthropic’s bio-risk evaluators found that the company’s models displayed sycophantic tendencies, regularly hallucinated and were overconfident about what evaluators referred to as “implausible ideas”. When human experts proposed an unworkable idea, the model often elaborated upon it encouragingly rather than suggesting they try something else. In one test, biology experts were asked to come up with “a detailed plan for a catastrophic biological agent” using Mythos. Even the best schemes were flawed, as judged by human evaluators. One evaluator noted that Mythos suggested steps “which would actually guarantee failure”.
Such results highlight the fundamental paradox of uplift. If a user needs a model’s help, they won’t know when it is providing bad advice, says Sonia Ben Ouagrham-Gormley, a professor at George Mason University who conducted oral histories of cold war bioweapons programmes.
That might offer some reassurance for the time being. But the fact that any novices at all in Active Site’s study were able to synthesise a virus should not be dismissed, says Luca Righetti, a senior author of the study, who conducted the work while at METR, an AI-safety group. And technical progress continues. Malicious actors could enlist emerging biological design tools, which are akin to LLMs that generate nucleotide sequences instead of words, to make existing pathogens more dangerous. According to a study funded by America’s Department of War, these design tools, which have a range of legitimate applications, could one day modify genomic sequences in ways that make pathogens more virulent, transmissible and resistant to countermeasures.
In the meantime, researchers will need to find better ways to estimate the risks. The field still lacks good data on whether AI has the greatest impact in the hands of experts with wet-lab experience or “AI power users” who are adept at getting the most out of models, says Dr Torres. Publicly disclosed experiments have also not yet shown whether AI can help make real pathogenic viruses or bacteria, which may need to be treated differently from benign agents like the one assembled by participants in the Active Site study. Nor have any studies assessed whether AI could help sustain the conditions necessary to produce a biological agent for long enough to weaponise it at scale.
Filling those knowledge gaps will likely require government involvement, as well as delicate international co-ordination. For one thing, developing the components of a biological weapon in order to demonstrate uplift would likely violate the Biological Weapons Convention. Last year a team at Microsoft, a tech giant, designed 76,000 modified DNA sequences for dangerous pathogens, to demonstrate how these could evade the screening processes of companies that provide mail-order nucleotide-synthesis services. But they did not actually synthesise any of them in order to verify that they were viable. Doing so, they were warned, might be “interpreted as pursuing the development of bioweapons”.
Speed traps
Given these challenges, developers might need to slow the pace at which they release new models. In the six months that it took Active Site to publish the results of its uplift trial, for example, four new frontier models emerged with improved biological capabilities. Dr Torres notes that these models appear to be less likely to hallucinate plausible but erroneous sequences than those his team tested in the original study. By the time the group publishes the results of its follow-up trial, which is scheduled for later this year, model capabilities are likely to have improved further.
There is precedent for such caution. Last month, Anthropic announced that it was limiting access to Mythos, its world-leading cyber-security model, until the risks it poses could be resolved. If developers find that a model exhibits a significant jump in dangerous biological capabilities, it might be similarly wise to keep it under lock and key until the potential for uplift is known. With stakes as high as these, a little patience could go a long way. ■