Is the golden age of the PDF over? | The Economist

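Introduction

Dismissed as a “dumb idea” when it was born in 1993, the PDF went on to become one of the most common file formats in the world, and more than 2.5trn PDFs now exist. In the age of AI, however, the PDF's complex structure makes it hard to parse, and models frequently misread it. Some startups are trying to create a new file format to replace it, while Adobe's technical experts maintain that, with better AI tools, the PDF still has a chance to keep ruling the digital world.

Original text, unannotated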

The war against PDFs is heating up

Will the file type survive the AI revolution?

From: The Economist

Business

Feb 24, 2026 | 389 words | ★★

When Adobe introduced the portable document format (PDF) in 1993, a consultant from Gartner called it “the dumbest idea I’ve ever heard in my life”. Users would have to twiddle their thumbs waiting for the megabyte-sized files to download over their dial-up internet, then wait again for their PCs to render them. The software-maker’s board wanted to kill the project. But as sharing digital files became essential, the PDF triumphed—particularly after the Internal Revenue Service, America’s tax authority, started using it for its forms. Today more than 2.5trn PDFs float in the ether. But will the format survive the AI revolution?

PDFs still have drawbacks. They are a pain to view on a smartphone. Copying data from them is fiddly. Software tools that read screens for blind people struggle with PDFs. The file type, which Adobe relinquished control over in 2008, is also a vehicle for malware: a fifth of email-based cyber-attacks utilise PDF attachments, according to Check Point, a cyber-security firm.

Lately another source of criticism has emerged. The large language models underpinning generative AI are often bamboozled by PDFs, reading a page set in columns from left to right rather than top to bottom, say, or getting confused by headers and footers. Trouble parsing PDFs is one of the reasons AI chatbots occasionally “hallucinate”, generating nonsense.

Enter the disrupters. Startups such as Factify are on a mission to build a new file type that is better suited to the technology. Matan Gavish, its boss, talks of his “megalomaniac” vision of displacing the PDF.

Yet Duff Johnson, head of the PDF Association, protector of the format, argues that the fault lies not in the file type but in ourselves. He contends that there is no reason developers cannot build bots that are able to use PDFs. The AI assistant embedded in Acrobat, Adobe’s PDF reader, is designed to do precisely that, notes Leonard Rosenthol, the software firm’s PDF guru. Google, a leader in AI, has rolled out a tool for developers using its Gemini models that makes it easier to ingest PDFs. The format’s reign is not over yet. 

According to the passage, why do large language models sometimes struggle with PDFs?

A. Because PDFs are too large for AI systems to process efficiently.
B. Because the layout of PDFs can confuse AI when extracting information.
C. Because PDFs are protected by Adobe and cannot be accessed by AI tools.
D. Because PDFs often contain inaccurate or outdated information.

The war against PDFs is heating up

针对PDF的“战争”正在升温

Will the file type survive the AI revolution?

这种文件格式能否在AI革命中继续存活?

01

When Adobe introduced the portable document format (PDF) in 1993, a consultant from Gartner called it “the dumbest idea I’ve ever heard in my life”. Users would have to twiddle their thumbs waiting for the megabyte-sized files to download over their dial-up internet, then wait again for their PCs to render them. The software-maker’s board wanted to kill the project. But as sharing digital files became essential, the PDF triumphed—particularly after the Internal Revenue Service, America’s tax authority, started using it for its forms. Today more than 2.5trn PDFs float in the ether. But will the format survive the AI revolution?

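Annotation cards

📌

twiddle /ˈtwɪdəl/ v.

to fiddle with or twirl (one’s fingers, etc.) when bored or nervous

twiddle means to turn something with your fingers, or to move your fingers in a nervous or idle way. Example: He sat there nervously twiddling his pen while waiting for the interview to begin.

twiddle appears in the fixed expression twiddle one’s thumbs, which means “to wait around idly with nothing to do”. The author borrows this colloquial expression to paint a vivid picture of the user experience on the internet of the early 1990s: connections were extremely slow, and people could only sit and wait for files to finish downloading. The reader can almost see users idly fiddling with their fingers to pass the time. The phrase also underlines how primitive the technology was, setting up a historical contrast with the PDF's eventual success.

📖 A social-commentary piece in The Guardian included the line: “Workers were left twiddling their thumbs as the project stalled due to bureaucratic delays.” twiddle one’s thumbs often describes people who are powerless while they wait and can only pass the time passively. The expression carries a touch of irony: it conveys the boredom of waiting and also hints at inefficiency or institutional delay.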

📌

essential /ɪˈsenʃəl/ adj. 

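indispensable; extremely important

essential means extremely important or necessary; something that is needed in order for something to exist or happen. Example: Good communication is essential for a successful team.

The author uses the word to show that, as the internet developed, sharing digital files gradually became an indispensable need, and it was against this backdrop that the PDF format spread quickly and succeeded. Compared with the plain word important, essential is stronger: it stresses that without the thing in question, normal functioning is impossible. The word therefore explains why the PDF succeeded and also highlights the close link between technological development and user needs, helping readers understand how a file format that was doubted at first ended up as a global standard.

📖 A technology and economics commentary in The Economist included the line: “Reliable data are essential for governments trying to design effective economic policies.” In policy and economic analysis, essential is often used to stress the foundational, critical role a factor plays in how a system works. It means more than “important”: it implies that the overall goal is hard to achieve without this condition. The word brings out the central place of one element in a complex problem, making the argument more logical and persuasive.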

1993年,当Adobe推出便携式文档格式(PDF)时,咨询公司Gartner的一名顾问曾将其称为“我这辈子听过最愚蠢的想法”。当时用户必须百无聊赖地等待兆字节级文件通过拨号上网慢慢下载下来,然后还要再等电脑将其渲染显示。甚至连这家软件公司的董事会也一度想叫停这个项目。然而,随着数字文件共享变得越来越重要,PDF最终取得了胜利——尤其是在美国税务机构美国国税局开始用它发布各类表格之后。如今,互联网上漂浮着超过2.5万亿份PDF文件。但在人工智能革命的冲击下,这一格式还能继续生存吗?

02

PDFs still have drawbacks. They are a pain to view on a smartphone. Copying data from them is fiddly. Software tools that read screens for blind people struggle with PDFs. The file type, which Adobe relinquished control over in 2008, is also a vehicle for malware: a fifth of email-based cyber-attacks utilise PDF attachments, according to Check Point, a cyber-security firm.

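Annotation cards

📌

fiddly /ˈfɪdli/ adj.

awkward to do or handle; requiring care

fiddly means difficult to do or handle because it involves small, delicate, or complicated parts. Example: The watch repair is quite fiddly because the parts inside are extremely small.

The author uses the word to describe the inconvenience of copying data out of a PDF, stressing that the task is not impossible but tends to be tedious, slow and clumsy. The word choice keeps the article close to the user's experience and vividly illustrates a common pain point of using PDFs in practice.

📖 A product review in The Washington Post included the line: “Setting up the device can be a bit fiddly for first-time users.” In technology and product reviews, fiddly often describes small problems that involve many steps and intricate details and call for patience. The word does not necessarily mean that something is very hard; it stresses that there are many details and that the operation is not intuitive.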

📌

relinquish /rɪˈlɪŋkwɪʃ/ v.

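to give up; to hand over; to cede

relinquish means to voluntarily give up something, such as a right, claim, position, power or control. Example: After much consideration, she decided to relinquish her role as team leader.

The author uses the word to explain that Adobe handed over control of the PDF in 2008, making it a more open standard. relinquish is more formal than give up and is often used when a company, government or institution voluntarily surrenders power, ownership or control. In this article the word supplies technical history and also hints at an important reason the PDF spread so widely: it was no longer entirely controlled by a single company, so more software makers and institutions adopted it.

📖 A political analysis in The Economist included the line: “Military leaders are often reluctant to relinquish power even after a transition to civilian rule.” In political commentary, relinquish often describes the handing over of power, control or privilege; the tone is formal and carries overtones of institutions and power structures. The word stresses the weight of the power itself and the significance of giving it up. Using relinquish in news and commentary therefore signals the political or institutional nature of an event and makes the writing sound more serious and analytical.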

尽管如此,PDF仍然存在不少缺点。在智能手机上阅读PDF并不方便,从中复制数据也十分繁琐。为盲人朗读屏幕内容的软件在处理PDF时也常常困难重重。此外,这种Adobe在2008年已放弃控制权的文件格式还可能成为恶意软件的载体。网络安全公司Check Point的数据显示,大约五分之一的基于电子邮件的网络攻击都利用PDF附件实施。

03

Lately another source of criticism has emerged. The large language models underpinning generative AI are often bamboozled by PDFs, reading a page set in columns from left to right rather than top to bottom, say, or getting confused by headers and footers. Trouble parsing PDFs is one of the reasons AI chatbots occasionally “hallucinate”, generating nonsense.

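Annotation cards

📌

bamboozled /bæmˈbuːzld/ adj.

confused; deceived

bamboozled means completely tricked or confused by someone or something. Example: He felt completely bamboozled by the complicated instructions.

The author uses the word to give a vivid picture of the confusion large language models often fall into when handling PDF files, such as reading multi-column text in the wrong order or being thrown off by headers and footers. bamboozled is a livelier word, suggesting being left dizzy by a complicated situation. The expression makes the article more readable and also highlights the technical challenge that PDF structure poses for AI parsing, helping readers understand why AI sometimes makes “hallucination”-style errors.

📖 A political commentary in TIME included the line: “Voters were left bamboozled by the government’s shifting explanations of the policy.” bamboozled often describes the public's bewilderment when faced with complicated or contradictory information. It is more emotionally coloured than confused, implying that the information is not only complex but possibly misleading or muddled. Using bamboozled in news and commentary therefore makes the language livelier and also highlights how unclear information or a complicated situation makes things hard to understand.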

最近,PDF又遭遇了新的批评来源。支撑生成式人工智能的大语言模型在处理PDF时常常被其版式“难住”:例如把分栏页面按从左到右而不是从上到下阅读,或被页眉页脚所干扰。解析PDF时出现困难,也是AI聊天机器人偶尔产生“幻觉”、输出无意义内容的原因之一。
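The column-order mistake described in this paragraph can be illustrated with a toy sketch. It does not show how any particular AI model or PDF tool works; the `Block` type, the two functions, the coordinates and the assumption of exactly two columns split at the page midline are all hypothetical, chosen only to show why reading "left to right" scrambles a page set in columns.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """A positioned text fragment, roughly what a text extractor might report."""
    x: float  # left edge, in points from the left margin (hypothetical units)
    y: float  # top edge, in points from the top of the page
    text: str


def naive_order(blocks: list[Block]) -> list[str]:
    """Read strictly row by row: top to bottom, then left to right."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_aware_order(blocks: list[Block], page_width: float) -> list[str]:
    """Read the whole left column top to bottom, then the right column.

    Assumes exactly two columns split at the page midline (a simplification).
    """
    midline = page_width / 2
    return [b.text for b in sorted(blocks, key=lambda b: (b.x >= midline, b.y))]


# Hypothetical two-column page, 600 points wide.
page = [
    Block(50, 100, "PDFs still"),
    Block(320, 100, "Copying data"),
    Block(50, 120, "have drawbacks."),
    Block(320, 120, "is fiddly."),
]

print(" ".join(naive_order(page)))
# PDFs still Copying data have drawbacks. is fiddly.
print(" ".join(column_aware_order(page, page_width=600)))
# PDFs still have drawbacks. Copying data is fiddly.
```

The first line of output interleaves the two columns, which is the kind of garbled input the article says can lead a chatbot astray; the second keeps each column's sentence intact.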

04

Enter the disrupters. Startups such as Factify are on a mission to build a new file type that is better suited to the technology. Matan Gavish, its boss, talks of his “megalomaniac” vision of displacing the PDF.

于是,一些“颠覆者”开始登场。Factify等初创公司正试图打造一种更适合人工智能时代的新型文件格式。该公司负责人马坦·加维什甚至直言,他怀揣着一个“近乎狂妄”的愿景——取代PDF。

05

Yet Duff Johnson, head of the PDF Association, protector of the format, argues that the fault lies not in the file type but in ourselves. He contends that there is no reason developers cannot build bots that are able to use PDFs. The AI assistant embedded in Acrobat, Adobe’s PDF reader, is designed to do precisely that, notes Leonard Rosenthol, the software firm’s PDF guru. Google, a leader in AI, has rolled out a tool for developers using its Gemini models that makes it easier to ingest PDFs. The format’s reign is not over yet.

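Annotation cards

📌

embed /ɪmˈbed/ v.

to embed; to fix firmly in place

embed means to fix something firmly and deeply in a surrounding mass, or to insert something within another thing.

Example: The newspaper embedded a reporter with the army unit to cover the conflict firsthand.

The author uses the word to stress that the AI assistant is built directly into the Acrobat software rather than existing as a standalone tool or add-on plug-in. Compared with include or install, embed emphasises deep, tight integration and a seamless blend of the technology with the software's features. The word choice tells us how the AI feature is delivered and also hints at convenience for users and at the software's technical sophistication.

📖 An article in the Financial Times included the line: “Security features are embedded into the new smartphone to protect user data.” In technology reporting and product analysis, embed often describes a feature or technology that is deeply integrated into a device or system, stressing that it is inseparable from the whole and works system-wide. Using embed highlights the coherence of the design, so readers understand that the features are natively integrated rather than bolted on afterwards.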

不过,PDF协会负责人、这一格式的守护者达夫·约翰逊认为,问题并不在于文件格式本身,而在于我们自身。他指出,没有任何理由说明开发者无法打造能够处理PDF的机器人。Adobe的PDF阅读器Acrobat中嵌入的AI助手正是为此而设计的,该公司PDF技术专家伦纳德·罗森索尔表示。与此同时,人工智能领域的领军企业谷歌也推出了面向开发者的工具,使其Gemini模型更容易读取PDF。看来,PDF的统治时代尚未结束。

