The Economics of the AI Supercycle: Infrastructure and Capstone Case

"I would go long on the lowest layer of the stack because at least in the US, we have forgotten how to build very foundational infrastructure." — Sachin Katti

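Sachin Katti (OpenAI Head of Industrial Compute, former Intel CTO, and a Stanford EE professor for 15 years) joins "The Economics of the AI Supercycle," the course taught by Apoorv Agrawal, to give an insider's view from the person who runs Industrial Compute at OpenAI: the economics of compute are being rewritten, by 2026 the word "compute" has expanded from GPUs to an entire supply chain, and AI itself is designing the next generation of AI infrastructure.

One-sentence summary

The truly scarce resource in the AI supercycle is not GPUs, models, or capital. It is the ability to bring 1 GW / $70 billion / 500,000 GPUs online in the shortest possible time. The physical limit on that is locked in by ASML and a handful of fabs, and OpenAI's whole bet is on how, within that locked supply, to get every watt of compute in front of the model six months sooner.

Key takeaways at a glance

• Compute is a full supply chain, not just GPUs. Chips + memory + networking + power + cooling + buildings + generation + transmission + land = "compute"; OpenAI's job is to procure and orchestrate that chain.

• Revenue = compute scale × utilization, which makes it a lagging indicator for a frontier lab. Over the past three years OpenAI's compute has tripled every year and revenue has tripled every year, with no inflection point in sight.

• Inference is already the majority and is heading to 80%+. Scaling laws have spread from pretraining to the whole lifecycle: RL post-training, synthetic data, and product inference.

• Three token dimensions: (1) make each token cheaper; (2) make each token smarter; (3) make each task consume fewer tokens.

• 1 GW ≈ 500,000 GPUs ≈ $70 billion. The hard part comes after the contract is signed: assembling the parts into a system that runs, and keeping the chips at high utilization.

• The compute graph has gone from single-node, to multi-node inference, to a DAG. In the agent era, inference calls interleave with tool calls, databases, and RL VMs, and complexity explodes.

• Heterogeneous compute is a prerequisite for agent UX. Wafer-scale chips such as Cerebras deliver very fast inference, and TSMC, acting in its own interest, will keep multiple chip vendors viable.

• Centralized compute still wins in 2026: a distributed 50 MW site costs far more per MW than a centralized 1 GW site, and agent first-token latency of roughly 400-500 ms dwarfs any gain from moving to the edge.

• AI is designing the next generation of AI infrastructure (recursion): models decide during training what hardware they want to run on. A 3-year chip design cycle is already an eternity relative to model iteration.

• Long foundational infra, short apps/wrappers. Transformers, batteries, generation, cooling, and materials form the forgotten compounding layer; apps are a "crutch to get to an outcome," and future interaction may no longer take the form of an app.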

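Topic sections

I. Guest background: a full-stack view from Intel to OpenAI

Apoorv gives Sachin Katti a full introduction: he started out at a networking startup, served as Intel CTO and head of its AI business, and moved to OpenAI in November 2025 to run "Industrial Compute." He is one of the few full-stack observers whose view runs from electrons at the bottom to agents at the top. Katti takes the opportunity to comment on Intel: global supply is severely constrained and the agent era is making CPUs attractive again, so Intel, as the only remaining leading-edge manufacturer based in the US, has two strong tailwinds, but execution remains the open question.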

"We push on three dimensions. Keep improving hardware and software to generate tokens more cheaply. We keep pushing on the capabilities of models to make sure every token is more intelligent. And we keep pushing the hardness, like Codex, to make it such that we need less number of tokens to perform any given task." — Sachin Katti

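II. OpenAI's compute-revenue curve: revenue = compute × utilization

Looking at the OpenAI compute-planning chart that Sarah Friar published earlier, Katti admits that the chart is his day job. He offers a sharp framing: revenue is a lagging indicator for a frontier lab, equal to "compute scale × utilization." Over the past three years compute has tripled every year and revenue has tripled in step, with no inflection point in sight. After GPT-5.5 launched, Codex recorded double-digit growth within two weeks, and its use expanded from writing code to general knowledge work.

On scaling laws, Katti notes that they have spread from pretraining to the whole lifecycle: RL post-training, synthetic data generation (real-world data has been exhausted), and ChatGPT/Codex themselves are all mainly inference workloads. Inference is already the majority, and 80%+ will be inference in the future. Apoorv follows up: as the inference share rises, will dollars per gigawatt rise too? Katti answers yes: token consumption correlates positively with revenue, but OpenAI's mission is to make tokens cheap, so it has to push on three dimensions at once.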
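To see how the three dimensions interact, here is a minimal sketch that folds them into one number: the expected cost of a successfully completed task. The formula and every input value are illustrative assumptions, not figures from the talk. "Smarter tokens" is modeled, as a simplification, as a higher per-attempt success rate, with failed attempts retried independently.

```python
def cost_per_successful_task(tokens_per_attempt, dollars_per_million_tokens,
                             success_rate):
    """Expected dollars spent per completed task.

    Assumes independent retries, so the expected number of attempts
    is 1 / success_rate (a modeling assumption, not from the talk).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt * dollars_per_million_tokens / 1_000_000
    return cost_per_attempt / success_rate


# Hypothetical baseline: 200k tokens per attempt, $10 per million tokens,
# half of all attempts succeed.
baseline = cost_per_successful_task(200_000, 10.0, 0.5)

# Dimension 1: cheaper tokens (hardware and software efficiency).
cheaper_tokens = cost_per_successful_task(200_000, 5.0, 0.5)

# Dimension 2: smarter tokens (higher success rate per attempt).
smarter_tokens = cost_per_successful_task(200_000, 10.0, 0.8)

# Dimension 3: fewer tokens per task (a better harness).
fewer_tokens = cost_per_successful_task(100_000, 10.0, 0.5)

print(baseline, cheaper_tokens, smarter_tokens, fewer_tokens)
```

The three levers multiply, which is one way to read why all three are pushed at the same time rather than in sequence.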

"Revenue is basically a lagging indicator for frontier lab companies. And what I mean by that is it basically is very simple calculation of how much compute we have and how well utilized is the compute. And so the last three years have borne that out. Every year, we have tripled compute. Year over year and revenue has tripled." — Sachin Katti

Katti sums up his job with a quip, making the numbers on the chart go up and to the right: "Forecasting is an easier job than actually making it happen."
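The "revenue = compute × utilization" framing can be written out as a short projection. This is a sketch under stated assumptions: the starting compute, the utilization, and the revenue earned per utilized GW are made-up placeholder values, and only the 3x yearly growth factor comes from the talk.

```python
def project_revenue(start_compute_gw, utilization, revenue_per_utilized_gw,
                    growth=3.0, years=3):
    """Return (year, compute_gw, revenue) rows with compute growing each year."""
    rows = []
    compute_gw = start_compute_gw
    for year in range(years + 1):
        # Revenue lags compute: it is whatever the installed, utilized compute earns.
        revenue = compute_gw * utilization * revenue_per_utilized_gw
        rows.append((year, compute_gw, revenue))
        compute_gw *= growth
    return rows


# Hypothetical inputs: 1 GW to start, 50% utilization, 10 revenue units per utilized GW.
rows = project_revenue(start_compute_gw=1.0, utilization=0.5,
                       revenue_per_utilized_gw=10.0)
for year, compute_gw, revenue in rows:
    print(f"year {year}: {compute_gw:5.1f} GW -> {revenue:6.1f} revenue units")
```

With utilization and revenue per utilized GW held constant, three years of tripling compounds to 27x for both compute and revenue, which is why the forecast is the easy part and delivering the compute is the hard part.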

III. The full compute supply chain: 1 GW ≈ 500,000 GPUs

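Apoorv asks: getting compute right now is like a brawl, so is the bottleneck power, energy, chips, or land? Katti opens with "All of the above." He then redefines compute: it is not a single commodity but the sum of a full supply chain: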

```
compute = chips + memory + networking + power + cooling
        + data center buildings + power generation + transmission + land
```

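OpenAI's job is not "buying compute" but procuring and orchestrating this supply chain so that every component arrives together and on schedule.

Key orders of magnitude:

• 1 GW ≈ 500,000 GPUs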

• A 6 GW or 10 GW data center means millions of chips (at 1 GW ≈ 500,000 GPUs) that must be networked, powered, cooled, and kept running

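• The really hard work comes after the contract is signed: making sure suppliers deliver as agreed, engineering the parts into a runnable system, and keeping chips that are extremely sensitive to cooling and voltage at high utilization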
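The orders of magnitude above are easy to sanity-check with arithmetic. The sketch below uses only the two conversions quoted in these notes (1 GW ≈ 500,000 GPUs and 1 GW ≈ $70B); the derived per-GPU values are all-in averages implied by those conversions, not quoted prices or chip power ratings.

```python
GPUS_PER_GW = 500_000        # conversion quoted in the talk
DOLLARS_PER_GW = 70e9        # conversion quoted in the talk

# Implied all-in averages (chip plus its share of cooling, networking, buildings).
watts_per_gpu_all_in = 1e9 / GPUS_PER_GW
dollars_per_gpu_all_in = DOLLARS_PER_GW / GPUS_PER_GW


def site(gigawatts):
    """Scale the two conversions linearly to a site of the given size."""
    return {
        "gpus": gigawatts * GPUS_PER_GW,
        "capex_usd": gigawatts * DOLLARS_PER_GW,
    }


print(watts_per_gpu_all_in, dollars_per_gpu_all_in)
for gw in (1, 6, 10):
    print(gw, site(gw))
```

Linear scaling is itself an assumption; the point is only that 6 GW and 10 GW sites land in the range of 3 to 5 million GPUs and several hundred billion dollars.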

"All of that is equal to compute. All of that needs to come together to build compute at a gigawatt scale. ... The fun part is the contract signing. The hard part is everything after." — Sachin Katti

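IV. The power shock: a training job can bring down a state-level grid

The discussion then turns to several forward-looking, case-study-grade decisions:

• Synchronized grid shock: with a GW-scale data center in Georgia or Michigan, training jobs ramping up and down in sync cause instantaneous swings of hundreds of MW that could collapse a state-level grid, so operations must be designed to be grid-friendly

• Geopolitical de-risking: spreading wafer fabs and memory plants across different regions of the world

• Energy de-risking: decoupling from the grid, then moving to natural gas, and eventually to nuclear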

"It's not that crazy to think every one of us should have a GPU. ... A GPU is what? A kilowatt to 2 kilowatts now. 7 billion humans out there. That's 7 terawatts of compute. And so that is two orders of magnitude more than what we are talking about here." — Sachin Katti

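The scale of what AI will take from the US grid: hyperscalers together are planning 100 GW, roughly one fifth of the US grid or more, so AI compute will consume a double-digit percentage of US electricity.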
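The power figures in this section can be cross-checked with a few lines of arithmetic. The inputs are the numbers quoted above (7 billion people, 1 to 2 kW per GPU, 100 GW planned, about one fifth of the grid); the implied grid size is simply what the "one fifth" fraction works out to, not an independently sourced grid statistic.

```python
HUMANS = 7e9                    # population figure used in the quote
HYPERSCALER_PLAN_W = 100e9      # 100 GW planned, as quoted
GRID_SHARE = 1 / 5              # "one fifth of the US grid", as quoted


def gpu_per_person_watts(kw_per_gpu):
    """Total power if every person had one GPU of the given size."""
    return HUMANS * kw_per_gpu * 1e3


low_w = gpu_per_person_watts(1.0)    # 1 kW per GPU
high_w = gpu_per_person_watts(2.0)   # 2 kW per GPU

ratio_low = low_w / HYPERSCALER_PLAN_W
ratio_high = high_w / HYPERSCALER_PLAN_W

# Grid size implied by the talk's own fraction (an inference, not a quoted number).
implied_grid_gw = HYPERSCALER_PLAN_W / GRID_SHARE / 1e9

print(low_w / 1e12, high_w / 1e12)   # terawatts
print(ratio_low, ratio_high)
print(implied_grid_gw)
```

The one-GPU-per-person scenario comes out at 70x to 140x the 100 GW plan, which matches the "two orders of magnitude" in the quote as a rough figure.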

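V. Heterogeneous compute: from a single GPU vendor to TSMC's multi-customer matrix

Apoorv asks whether different agent workloads will diverge into different hardware preferences. Katti uses a self-deprecating slide to introduce a vision in which the human is the bottleneck. Today an agent takes your task and runs for minutes or even hours, and by the time it comes back for a decision you have already context-switched, so the AI is the bottleneck. Real success looks like the reverse, where the human becomes the bottleneck: every step is extremely fast and accurate, and you stay in a flow state.

That UX cannot be delivered on GPUs alone; it requires heterogeneous compute:

• Wafer-scale chips such as Cerebras for very fast inference

• Accelerators that can hold a complete long context (suited to a coding agent loading an entire GitHub repo)

• Different agent nodes matched to different hardware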
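One way to picture "different agent nodes matched to different hardware" is a simple routing rule. The sketch below is purely illustrative: the node fields, the 200k-token threshold, and the three hardware class labels are assumptions invented for this example, and a real scheduler would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentNode:
    name: str
    context_tokens: int      # how much context the step must hold
    latency_sensitive: bool  # whether a human is waiting on this step


def pick_hardware(node, long_context_threshold=200_000):
    """Map a node to a hardware class using two hypothetical rules."""
    if node.context_tokens >= long_context_threshold:
        # Whole-repo style contexts go to hardware that can hold them.
        return "long-context accelerator"
    if node.latency_sensitive:
        # Interactive steps go to the fastest token generator.
        return "wafer-scale inference chip"
    return "general-purpose GPU"


nodes = [
    AgentNode("load_whole_repo", context_tokens=400_000, latency_sensitive=False),
    AgentNode("inline_suggestion", context_tokens=8_000, latency_sensitive=True),
    AgentNode("nightly_refactor", context_tokens=50_000, latency_sensitive=False),
]
assignments = {node.name: pick_hardware(node) for node in nodes}
print(assignments)
```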

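Katti goes on to comment on the hyperscalers' ASIC arms race: Amazon Trainium is already at a $50B run rate and NVIDIA is still "the big guy," but the market will tilt toward multiple suppliers, because the world should not be single-threaded on any one component.

The key structural insight about the industry: TSMC, acting in its own interest, will actively make sure multiple customers succeed and will not commit all its wafers to one customer. A player at OpenAI's scale therefore has to learn to use every type of chip; it has no choice.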

"The way TSMC allocates wafers will mean that there have to be multiple GPUs and accelerators... by definition, we have to learn how to use all of these chips because we don't have a choice." — Sachin Katti

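VI. Why compute should be centralized: the truth about 500 ms first-token latency

Apoorv asks about the difference in workload shape between training (synchronous, coherent cluster) and inference (spiky, hard to predict), which leads to the question of whether compute should be pushed out to the edge. Katti gives a sober, engineering-minded answer:

Reason 1: economics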

"50 megawatts of compute is far more expensive per megawatt than building a gigawatt of compute at one location." — Sachin Katti

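Labor is the biggest bottleneck in the US: better to fully staff one large cluster than to hire everywhere for distributed 50 MW sites.

Reason 2: technical

In the agent era, first-token latency is on the order of 400-500 ms (the entire context has to be paged into attention), far more than any network-latency gain from moving to the edge. For the foreseeable future inference will therefore keep concentrating in centralized clusters, unless models are distilled enough to run at the edge.

Latency breakdown: Codex now has a 400k-token context, and the prefill stage must run attention over the whole context before the first token is generated, which is where those few hundred ms go. The rest (the app assembling the prompt, the API, load balancing onto a GPU) adds only tens of ms.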
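The latency breakdown above can be turned into a small budget comparison. In this sketch the 450 ms prefill is the midpoint of the quoted 400-500 ms range and the 40 ms stack overhead stands in for "tens of ms"; the two network round-trip times (60 ms to a central site, 10 ms to an edge site) are assumptions chosen only to illustrate the argument.

```python
def time_to_first_token_ms(prefill_ms, stack_overhead_ms, network_rtt_ms):
    """Sum the three components of time to first token."""
    return prefill_ms + stack_overhead_ms + network_rtt_ms


PREFILL_MS = 450         # midpoint of the quoted 400-500 ms
STACK_OVERHEAD_MS = 40   # "tens of ms" for app, API, and load balancing

# Hypothetical round-trip times; moving to the edge only changes this term.
central_ttft = time_to_first_token_ms(PREFILL_MS, STACK_OVERHEAD_MS, network_rtt_ms=60)
edge_ttft = time_to_first_token_ms(PREFILL_MS, STACK_OVERHEAD_MS, network_rtt_ms=10)

saving_ms = central_ttft - edge_ttft
saving_fraction = saving_ms / central_ttft

print(central_ttft, edge_ttft, saving_ms, round(saving_fraction, 3))
```

Under these assumptions the edge saves under a tenth of the total, because prefill dominates and does not move. Shrinking the prefill term itself is where faster hardware changes the picture.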

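Katti also shares an anecdote: once Cerebras was plugged in, token generation got faster, which exposed inefficiencies in every other layer of OpenAI's API stack, and the team had just published a blog post explaining how they changed the API architecture to keep pace with Cerebras. He calls it a whack-a-mole problem, much like optimizing page load in the early days of the internet.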

"The time to first token is still on the order of 400 to 500 milliseconds... 400 to 500 milliseconds is far larger than any latency benefits you get by putting compute closer to the user." — Sachin Katti

"We literally published a blog post on this yesterday... it forced us to change OpenAI's API infrastructure structure to actually keep pace with Cerebras." — Sachin Katti

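Latency is the new battleground: every 30-50 ms shaved off improves engagement, revenue, and retention. "All of us are going to compete on that dimension."

VII. AI is designing the next generation of AI infrastructure (recursion)

Apoorv asks the question he puts to every guest: what is the AI community's biggest misconception about infrastructure? Katti's answer is the sharpest. He argues that today's AI systems are extremely simple: a large compute unit with one layer of HBM attached, without the mature hierarchy of multi-level caches, flash, and disk that the CPU era developed. We are at a very early stage in the evolution of AI systems.

The second insight carries more weight: AI is designing the next generation of AI infrastructure. OpenAI has started using its latest models to design the next chip and low-level software, which Katti calls "recursion." Today, labs train models, chip vendors design independently, and after delivery the labs work out how to run on the hardware; in the future, the model will decide what hardware it wants to run on while it is being trained.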

"We are increasingly using our latest models to design the next chip and the next set of low level software needed to run the next model. ... We are not that far from that world." — Sachin Katti

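A 3-year chip design cycle is too long: it has been about 3 years since ChatGPT launched, and that already feels like a lifetime ago. Recursion is the only viable path; otherwise design iteration with a human in the loop cannot keep up with model iteration.

VIII. Q&A highlights: the trillion-dollar bet / the fab bottleneck / Stargate / open source

1. The trillion-dollar market-cap bet

The host asks who will reach a $10 trillion market cap first. Katti names NVIDIA almost without hesitation, then adds that OpenAI will certainly get there too, backing both ends of the AI value chain in a single answer.

2. The biggest unsolved infrastructure problem

Katti does not point to software, power, or financing, but directly to fab capacity: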

"I'd say the single structural issue is enough fab capacity across logic and memory. ... it's ASML. For all of these, you need ASML machines. So to me, that is the single choke point of the whole supply chain." — Sachin Katti

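This is the most sober counterargument to the question of whether AI is a bubble: a bubble would not have buyers queuing up at the ASML layer.

3. Stargate's siting logic

A student asks directly why Stargate is centralized. Katti offers the most counterintuitive line of the session: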

"A big part in our approach is now time to compute rather than amount of compute." — Sachin Katti

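1 GW = $70B = 500,000 GPUs: the two figures are physical facts in themselves, but what actually decides where investment goes is how fast a site comes online. That explains why OpenAI prefers compute in large, centralized blocks: operational complexity is O(N²), so the more dispersed the build-out, the slower it goes.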
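The O(N²) remark can be illustrated with a toy count. The notes do not say what exactly scales quadratically, so the sketch below assumes one plausible reading: every pair of sites needs some coordination (networking, scheduling, operations), giving N(N-1)/2 pairwise links. The site sizes are the 50 MW and 1 GW figures from the talk.

```python
def sites_needed(total_mw, site_mw):
    """Number of sites of a given size needed to reach the total (rounded up)."""
    return -(-total_mw // site_mw)  # integer ceiling division


def pairwise_links(n_sites):
    """Coordination links if every pair of sites must coordinate (assumed model)."""
    return n_sites * (n_sites - 1) // 2


TOTAL_MW = 1000  # 1 GW of capacity

centralized_sites = sites_needed(TOTAL_MW, 1000)  # one 1 GW site
distributed_sites = sites_needed(TOTAL_MW, 50)    # many 50 MW sites

print(centralized_sites, pairwise_links(centralized_sites))
print(distributed_sites, pairwise_links(distributed_sites))
```

Under this assumed model, the same 1 GW split into 50 MW sites goes from zero cross-site links to 190, which gives a feel for why time to compute favors large blocks.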

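4. Open-weight models

Katti draws a clear line: frontier intelligence will keep consuming all available compute (scaling laws have not broken), and open source will distill, catch up, and shrink parameter counts, but a six-month lead in intelligence is a huge lead. That one point cuts down the whole narrative that open source will undercut closed models. OpenAI's compute investment logic will not change because of Llama or DeepSeek.

IX. Value will move up: the historical rhyme of Jensen's five-layer cake

Apoorv brings out Jensen's "five-layer cake" chart (energy / chips / infra / models / apps) and asks Katti which layer holds the long-term value. Katti draws an analogy with the mobile revolution: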

"If you look at the mobile revolution, initially a lot of the money was made by the telcos and the people building the infrastructure. Then it moved up into the application layer, the people building the apps. And then it moved up into the cloud services, cloud services layer. I don't see any reason why this cycle will be different." — Sachin Katti

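The money is in the bottom layer (infra) today and will move up over time. This is cold water on the narrative that infra stays valuable forever.

X. Long foundational infra, short apps

Long: foundational infrastructure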

"I would go long on the lowest layer of the stack because at least in the US, we have forgotten how to build very foundational infrastructure." — Sachin Katti

Specifically:

• Transformers (grid-scale)

• Batteries

• Generation & distribution

• Cooling

• Components of all kinds

His reasoning: "Differentiation is sustainable because it's both technical as well as scale. If you build it, it's very hard for other people to replicate it." Against the background of 15 years teaching in Stanford EE, during which he watched enrollment in low-level courses (transistors, materials) decline steadily, he strongly urges the audience to "go long on the lowest layer of the stack."

Short: apps / model wrappers

"I'm short anything that is a model wrapper... I think this whole notion of apps probably to me is the one that I'd be short of." — Sachin Katti

"Today apps are a crutch to get to an outcome." — Sachin Katti

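Future interaction may take the form "this is the outcome I want, go figure it out," and the concept of an app itself may be marginalized. Models evolve so fast that wrapper businesses being squeezed out is a structural problem.

Key concepts and figures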

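Concept / figure	Meaning
30 GW	OpenAI's compute target through 2030
1 GW ≈ 500,000 GPUs	Conversion for sizing a GW-scale data center
1 GW = $70B	Order-of-magnitude benchmark for the build cost of a single data center
100 GW	Total compute planned by all US hyperscalers ≈ 1/5 of the US grid
7 TW	One 1-2 kW GPU per person × 7 billion people (7 TW at 1 kW each) = long-term "ceiling" reference
Compute 3x per year	OpenAI's trajectory over the past three years, in step with 3x revenue growth
Inference 80%+	Future inference share of compute; already the majority today
400-500 ms	Agent first-token latency, mostly consumed by the prefill stage
400k tokens	Codex's current context length
3 years	Chip design cycle, already an eternity relative to model iteration
ASML	The single choke point of the whole AI supply chain
Heterogeneous compute	GPUs / CPUs / Cerebras / long-context accelerators coexisting
Time-to-compute	Replaces "total compute" as the primary optimization target
AI designing AI infra	Recursion: models decide during training what hardware they want to run on
Six-month intelligence lead	The time window of frontier intelligence is itself a moat
Whack-a-mole latency	Optimizing one layer exposes inefficiency in the others, as with page-load optimization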

Key quotes (the 8 sharpest lines of the session)

• "All of that is equal to compute. ... The fun part is the contract signing. The hard part is everything after." — Compute = the full supply chain

• "It's not that crazy to think every one of us should have a GPU. ... 7 billion humans. That's 7 terawatts of compute." — The long-term vision

• "It is dangerous for the world to be single threaded on any one component." — The fundamental case for heterogeneous compute

• "50 megawatts of compute is far more expensive per megawatt than building a gigawatt of compute at one location." — The economics of why centralized wins

• "Now we have a much more directed acyclic graph... So the compute graph is now a lot more complex that you're executing." — The shape of agent workloads

• "ASML... that is the single choke point of the whole supply chain." — The ultimate single point in fab capacity

• "A big part in our approach is now time to compute rather than amount of compute." — The investment objective function, rewritten

• "Today apps are a crutch to get to an outcome." — The structural short on apps / wrappers