外刊精读-AI 模型或能为数学家提供一套共同语言

✨ AI models could offer mathematicians a common language（AI 模型或能为数学家提供一套共同语言）

💬 Some hope they will simplify the process of verifying proofs（有人希望它们能简化核验证明的过程）（原文来自经济学人^[1]）

🤖 段落 1

[EN] WORKING OUT how to most efficiently pack a crate full of oranges may seem like a juvenile pursuit for professional mathematicians. And yet the sphere-packing problem, as this pastime is properly known, confounded geometers for centuries. A breakthrough came in 1998, when Thomas Hales, a mathematician then at the University of Michigan, claimed to have proved what had long been conjectured: that hexagonal stacking, in which each sphere sits in the recess formed by a circle of six on the layer beneath it, provides the densest possible configuration.[CN] 研究一箱橙子怎样摆放最省空间，听上去也许像职业数学家才会沉迷的幼稚游戏。可这个正式名称为球体堆积问题的难题，几个世纪来一直难倒几何学家。1998 年，局面终于出现突破：当时任教于密歇根大学的数学家托马斯·黑尔斯（Thomas Hales）声称，他证明了一个长期以来的猜想——六角形堆叠，也就是每个球都落在下一层六个球围出的凹槽中，是最致密的排列方式。

🤖 段落 2

[EN] The story did not quite end there. It took over a decade of checking and rechecking before his fellow mathematicians were happy that Dr Hales’s calculations were, in fact, correct. Such a state of affairs is not unusual. Before a mathematical conclusion can be elevated to the exalted status of proof, its steps have to be painstakingly examined symbol by symbol and proposition by proposition. That highlights a “core bottleneck in mathematics, which is trust”, says Patrick Shafto, who works at America’s Defence Advanced Research Projects Agency (DARPA).[CN] 但事情并没有就此结束。黑尔斯博士的同行们花了十多年反复核查，才终于认可他的计算确实无误。这种局面在数学界并不罕见。一个数学结论要被提升到“证明”这一崇高地位，其中每一步都必须被一丝不苟地逐个符号、逐条命题地审查。美国国防高级研究计划局（DARPA）的 Patrick Shafto 说，这恰恰凸显了数学中的一个核心瓶颈：信任。

🤖 段落 3

[EN] Dr Shafto leads a team hoping to use artificial intelligence to accelerate the rate of progress in pure mathematics by, among other things, streamlining this formalisation process. If highbrow maths-literate large language models (LLMs) can indeed certify existing proofs, as well as help develop new ones, many mathematicians hope they could speed up discovery in an area that is otherwise painfully slow.[CN] Shafto 博士领导着一个团队，希望借助人工智能加快纯数学的推进速度；其中一个办法，就是把证明的形式化流程进一步简化提速。如果那些真正读得懂高等数学的大语言模型（LLM）不仅能验证现有证明，还能帮助发展新的证明，那么许多数学家都希望，它们能让这个原本推进得异常缓慢的领域更快产生新发现。

🤖 段落 4

[EN] One thing that is already clear is that LLMs work through mathematical logic very differently from humans. Before getting stuck into a challenging computation, humans like to devise at least the outline of a plan of attack—setting out their intentions with the aim of writing proofs that can be followed at each step. LLMs, by contrast, run as a “stream of consciousness,” says Terence Tao, a mathematician at the University of California, Los Angeles. Such systems, therefore, work through problems based on what they think they should see next, not future steps; they meander and sound more like “improv dialogue rather than scripted text”.[CN] 目前已经很清楚的一点是，LLM 处理数学逻辑的方式与人类大不相同。面对一项棘手计算时，人通常会先搭出一个大致的进攻框架，先把自己的意图交代清楚，好写出每一步都能被跟上的证明。相比之下，加州大学洛杉矶分校的数学家陶哲轩（Terence Tao）说，LLM 更像是在进行“意识流”运转。因此，这类系统解题时依赖的是它认为“下一步应该出现什么”，而不是更远处的后续步骤；它们常常一路游移迂回，听起来更像即兴对话，而不是预先写好的文本。

🤖 段落 5

[EN] This improvisational approach nonetheless pays dividends. In recent months AI models have proved capable of solving long-standing open mathematical questions as well as entirely new problems designed to flummox them. “The field is moving very rapidly,” says Dr Tao.[CN] 尽管如此，这种即兴式的方法还是带来了回报。近几个月来，AI 模型不仅展现出解决长期悬而未决数学问题的能力，也能解开那些专门用来难住它们的全新题目。陶哲轩说：“这个领域进展得非常快。”

🤖 段落 6

[EN] For Pushmeet Kohli, who heads AI for Science work at Google DeepMind, an AI lab, progress depends on ensuring these models express themselves in ways mathematicians can follow. One of the models he and his team have produced with this aim in mind is AlphaEvolve, a tool designed to generate proofs for optimisation problems: ones whose goal is to find mathematical objects that best meet some given criteria (densest sphere-packing, say). Although some techniques to solve such problems already existed, they were accessible only to experts. AlphaEvolve, by contrast, can be prompted using “natural language”, meaning non-experts can work with it. Although Dr Tao, who helped with its development, says that its reasoning can occasionally be hard to follow, he feels that DeepThink AI, another Google tool, is able to adequately explain its working.[CN] 在谷歌 DeepMind 负责 AI for Science 的 Pushmeet Kohli 看来，进步的关键在于让这些模型以数学家能跟得上的方式表达自己。他和团队为此开发的一款模型名叫 AlphaEvolve，专门为优化问题生成证明——也就是寻找最符合某项标准的数学对象，比如最致密的球体堆积。虽然求解这类问题的一些技术原本就存在，但过去只有专家能熟练使用；相比之下，AlphaEvolve 可以用“自然语言”来提示，因此非专家也能操作。参与其开发的陶哲轩承认，它的推理偶尔仍不太好追踪，但他认为谷歌的另一款工具 DeepThink AI 已经能较充分地解释其运作过程。

🤖 段落 7

[EN] There are plenty of other players in the space. One is Harmonic, an American startup. When its bot, known as Aristotle, verifies a proof, it tries to follow the steps as submitted by the user. It does this by first translating the proof into a rigorous language of symbols and axioms that allows each step to be checked. Aristotle makes use of Lean, an open-source coding language popular with mathematicians. If the human proof-writer has made small errors, Aristotle then attempts to fix them. If the human has skipped some steps (mathematicians often make intuitive leaps), it fills in the details. The result is an airtight proof that follows the same principles as the original, says Tudor Achim, Harmonic’s boss. If a human wants to understand it, they still have to do some leg-work to decompose it, he explains, “But they know that every single step is correct.”[CN] 这一领域的参与者远不止这些，美国初创公司 Harmonic 就是其中之一。它的机器人 Aristotle 在核验证明时，会尽量沿着用户提交的步骤往下走。为此，它先把证明翻译成一套足够严密、由符号和公理构成的语言，从而使每一步都可以被检查。Aristotle 使用的是 Lean——一种在数学界颇受欢迎的开源形式化语言。如果人类写证明时犯了小错，它会尝试修补；如果人类省略了某些步骤（数学家常凭直觉跳步），它也会把缺失的细节补上。Harmonic 的负责人 Tudor Achim 说，最终得到的是一个与原证明遵循相同原则、却无懈可击的证明。若人类想真正读懂它，仍得自己下些功夫把它拆开理解；“但至少他们知道，每一步都绝对正确。”

🤖 段落 8

[EN] Math, Inc. is another AI startup pursuing the same goal. It is developing a model called Gauss that can also convert a human-written proof into lines of Lean code. One pair of targets it had been pursuing were proofs to higher-dimensional versions of the sphere-packing problem obtained in 2016 by Maryna Viazovska, a mathematician then at Humboldt University of Berlin. Gauss successfully formalised these proofs—for 8-dimensional and 24-dimensional spheres, respectively—within weeks. Even though both proofs had already been verified without AI, Gauss’s work has contributed to mathematicians’ understanding of the tools in Dr Viazovska’s proofs.[CN] 另一家朝同一目标前进的 AI 初创公司是 Math, Inc.。它正在开发一个名为 Gauss 的模型，也能把人类写出的证明转成一行行 Lean 代码。它此前瞄准的一组目标，是柏林洪堡大学数学家 Maryna Viazovska 于 2016 年给出的高维球体堆积证明。Gauss 在数周内就成功把其中关于 8 维和 24 维球体的证明完成了形式化。尽管这两项证明在没有 AI 的情况下也已被验证过，Gauss 的工作仍加深了数学家对 Viazovska 证明中所用工具的理解。

🤖 段落 9

[EN] Show your working. At DARPA, Dr Shafto hopes to find a way to automate translation between natural language and formal languages such as Lean, and decompose complex proofs into the propositions they are typically made up of. Bringing what he describes, at present, as “a hot mess of papers, text books and human heads” into a unified body of work would allow younger mathematicians to more readily navigate the existing literature. That could radically accelerate the progress of pure mathematicians and lower the barrier to entry for others. For now, though, he warns that mathematical expertise remains a valuable skill for users. “The AI makes mistakes, and you have to be able to figure out where,” he says.[CN] “把推导过程写出来。”在 DARPA，Shafto 博士希望找到一种办法，能在自然语言与 Lean 这类形式语言之间实现自动翻译，并把复杂证明拆解成它们通常由之构成的一条条命题。若能把他口中如今仍是一团“论文、教科书和人脑里的乱麻”的东西，整理成一个统一知识体系，年轻数学家就能更容易地在现有文献中摸清路径。这或许会大幅加快纯数学研究的进度，也会降低其他人进入这一领域的门槛。不过就目前而言，他提醒说，数学专长对使用者依旧很有价值。“AI 会犯错，而你得有能力找出错在哪里。”

🤖 段落 10

[EN] There are other pitfalls. Timothy Gowers, a mathematician at Cambridge University, is working on automatic theorem proving, in which computers are trained to find proofs by trying to mirror how humans do so. As part of that work, the lab is exploring whether AI models can make non-obvious connections between mathematical subfields, rather than simply retreading old ground. When human mathematicians are thinking through alternative approaches to a problem, says Dr Gowers, “you have to sort of dig around; there’s a whole process by which you try to generate non-standard ideas.”[CN] 当然，别的隐患也不少。剑桥大学数学家 Timothy Gowers 正在研究自动定理证明：训练计算机去模仿人类寻找证明的方式。作为这项工作的一部分，他的实验室正在探索，AI 模型能否在不同数学子领域之间建立那些并不显然的联系，而不是一味重走老路。Gowers 说，人类数学家在为一个问题寻找替代路径时，往往得四处摸索，通过一整套过程去生成非常规的想法。

🤖 段落 11

[EN] So far, the models’ ability to replicate human creativity has fallen short. One failing, says Dr Gowers, is that LLMs struggle to apply what they have learned in solving one problem when tackling another. Human mathematicians also develop what he calls an aesthetic sense, prompting them to look for neater proofs that can sometimes yield surprising results. That combination of aesthetic sense and deep knowledge seems quite difficult for AI models to emulate. On that front, at least, “humans still have the edge.”[CN] 到目前为止，这些模型复制人类创造力的能力仍显不足。Gowers 说，其中一个弱点是：LLM 很难把它们在解决一个问题时学到的东西迁移到另一个问题上。与此同时，人类数学家还会形成他所谓的审美判断，从而驱使自己去寻找更漂亮、更简洁的证明，而那有时恰恰会带来出人意料的结果。审美判断与深厚知识结合起来形成的能力，似乎仍很难被 AI 模仿。至少在这一点上，“人类仍然占优”。

🤖 段落 12

[EN] Bots can also misbehave in surprising ways. In March Donald Knuth, a mathematician at Stanford University, was working with his colleague Filip Stappers on a problem akin to that of the travelling salesman, in which the shortest route has to be found for a salesman to visit a number of cities once before returning home. To help move things along, they turned to Claude Opus 4.6, an LLM developed by Anthropic.[CN] 机器人还会以出人意料的方式“出岔子”。今年 3 月，斯坦福大学数学家 Donald Knuth 与同事 Filip Stappers 正在研究一个类似旅行商问题的课题：一个推销员要把若干城市各走一遍后回到起点，怎样走最短。为了推进研究，他们求助于 Anthropic 开发的大语言模型 Claude Opus 4.6。

🤖 段落 13

[EN] After using several different approaches, Claude was able to solve the problem for cases when, at each stop, an odd number of routes remained to choose from. But when it was pushed to solve the problem for all even numbers, the model malfunctioned. “In the end, it was not even able to write and run explore programs correctly any more,” wrote Mr Stappers, referring to the code used by the model to express its reasoning. “Very weird.” All the same, wrote Dr Knuth, the way Claude tackled the problem left him impressed. (A few weeks later another researcher built on this work using ChatGPT 5.4 Pro, another LLM, to solve the problem for even numbers.)[CN] 在尝试了几种不同方法后，Claude 成功解决了这样一类情形：每到一站，剩下可选路线的条数都是奇数。但当研究者要求它把问题推广到所有偶数情形时，模型却失灵了。Stappers 写道，到了最后，它“甚至已经不能正确地编写和运行 explore 程序了”——这里指的是模型用来表达推理过程的代码，“非常诡异”。不过，Knuth 也写道，Claude 处理这个问题的方式仍让他印象深刻。（几周之后，另一位研究者又在这项工作的基础上，使用另一款大语言模型 ChatGPT 5.4 Pro，把偶数情形也解决了。）

🤖 段落 14

[EN] If these difficulties can be overcome, there are prizes aplenty on the horizon. In theory, an AI model able to retain the mathematical corpus could make new connections which have long eluded human researchers. And a model that can effectively reason about mathematics could also be taught to reason about other quantitative fields, from economics to physics. With such immense problem-solving abilities at their disposal, says Mr Kohli, the real challenge for humans will be finding the next set of problems worth tackling.[CN] 如果这些困难能够被克服，前方的回报将十分丰厚。理论上，一个能够保有整个数学知识语料的 AI 模型，也许可以建立起那些长期未被发现的新联系。而一个能有效进行数学推理的模型，也可能被训练去处理从经济学到物理学等其他定量学科的推理。有了这样强大的解题能力在手，Kohli 说，人类真正的挑战反而会变成：下一批值得去攻克的问题，究竟是什么。

🤖 重点词汇对照（20–25 个）

• sphere-packing problem — 球体堆积问题
• confounded — 难倒
• breakthrough — 突破
• conjectured — 猜想
• exalted status — 崇高地位
• painstakingly — 一丝不苟地
• bottleneck — 瓶颈
• streamlining — 简化提速
• formalisation — 形式化
• certify — 验证
• stream of consciousness — 意识流
• meander — 游移迂回
• improvisational — 即兴式
• flummox — 难住
• optimisation — 优化
• natural language — 自然语言
• rigorous — 严密
• axioms — 公理
• airtight — 无懈可击
• unified body of work — 统一知识体系
• pitfalls — 隐患
• theorem proving — 定理证明
• retreading old ground — 重走老路
• aesthetic sense — 审美判断
• eluded — 未被发现

🔖 引用链接

[1] 经济学人: https://www.economist.com/science-and-technology/2026/04/08/ai-models-could-offer-mathematicians-a-common-language