The Economist | The war against PDFs is heating up
Will the file type survive the AI revolution?
When Adobe introduced the portable document format (PDF) in 1993, a consultant from Gartner called it “the dumbest idea I’ve ever heard in my life”. Users would have to twiddle their thumbs waiting for the megabyte-sized files to download over their dial-up internet, then wait again for their PCs to render them. The software-maker’s board wanted to kill the project. But the PDF triumphed, particularly after the Internal Revenue Service, America’s tax authority, began to use it for digital tax forms. Today more than 2.5trn PDFs float in the ether. But will the format survive the AI revolution?
PDFs still have drawbacks. They are a pain to view on a smartphone. Copying data from them is fiddly. Software tools that read screens for blind people struggle with PDFs. The file type, which Adobe relinquished control over in 2008, is also a vehicle for malware: a fifth of email-based cyber-attacks utilise PDF attachments, according to Check Point, a cyber-security firm.
Lately another source of criticism has emerged. The large language models (LLMs) underpinning generative AI are often bamboozled by PDFs, reading a page set in several columns from left to right rather than top to bottom, say, or getting confused by headers and footers. Trouble parsing PDFs is one of the reasons AI chatbots occasionally “hallucinate” nonsense.
Enter the disrupters. Startups such as Factify are on a mission to build a new file type that is better suited to the technology. Matan Gavish, its boss, talks of his “megalomaniac” vision of displacing the PDF.
Yet Duff Johnson, head of the PDF Association, protector of the format, argues that the fault lies not in the file type but in ourselves. He contends that there is no reason developers cannot build bots that are able to use PDFs. The AI assistant embedded in Acrobat, Adobe’s PDF reader, is designed to do precisely that, points out Leonard Rosenthol, the software-maker’s PDF guru. Google, a leader in AI, has also rolled out a tool for developers who use its Gemini models that makes it easier to ingest PDFs. The format’s reign is not over yet.
