Main text: The war against PDFs is heating up

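Since its debut in 1993, the PDF has become the world's common digital document standard thanks to its cross-platform consistency, but it now faces a serious challenge in the age of artificial intelligence. Although the format solved a long-standing file-sharing problem, it has long been criticised for its poor mobile experience, its security vulnerabilities and its unfriendliness to screen readers. The core of the current dispute is that the complex layout logic of traditional PDFs often trips up large language models as they parse data, contributing to "hallucinations", which has prompted startups to try to replace the format with new file types built for AI. The format's defenders, however, argue that technology should focus on improving AI's ability to read PDFs rather than on abandoning the standard. The outcome of this contest over the future of file formats will shape how we interact with digital information.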

Attachment issues
The war against PDFs is heating up
反对PDF的战争正在升温

Will the file type survive the AI revolution?
文件格式能否在人工智能革命中幸存?

▶ English paragraph:
WHEN ADOBE introduced the portable document format (PDF) in 1993, a consultant from Gartner called it “the dumbest idea I’ve ever heard in my life”. Users would have to twiddle their thumbs waiting for the megabyte-sized files to download over their dial-up internet, then wait again for their PCs to render them. The software-maker’s board wanted to kill the project. But as sharing digital files became essential, the PDF triumphed—particularly after the Internal Revenue Service, America’s tax authority, started using it for its forms. Today more than 2.5trn PDFs float in the ether. But will the format survive the AI revolution?

▶ Chinese translation:
当Adobe于1993年推出便携文档格式(PDF)时,Gartner的一位顾问称其为“我一生中听过的最愚蠢的想法”。用户不得不无所事事地等待兆字节大小的文件通过拨号互联网下载,然后再次等待他们的个人电脑渲染它们。这家软件制造商的董事会希望取消该项目。但随着数字文件共享变得至关重要,PDF最终获得了胜利——尤其是在美国税务机关国内收入局(Internal Revenue Service)开始使用它来处理表格之后。如今,超过2.5万亿个PDF文件漂浮在网络中。但这种格式能否在人工智能革命中幸存下来?

▶ English paragraph:
PDFs still have drawbacks. They are a pain to view on a smartphone. Copying data from them is fiddly. Software tools that read screens for blind people struggle with PDFs. The file type, which Adobe relinquished control over in 2008, is also a vehicle for malware: a fifth of email-based cyber-attacks utilise PDF attachments, according to Check Point, a cyber-security firm.

▶ Chinese translation:
PDF仍然存在缺点。它们在智能手机上查看非常麻烦。从中复制数据也很棘手。为盲人读取屏幕的软件工具在处理PDF时会遇到困难。Adobe已于2008年放弃对这种文件类型的控制权,而它也是恶意软件的载体:根据网络安全公司Check Point的数据,五分之一的电子邮件网络攻击会利用PDF附件。

▶ English paragraph:
Lately another source of criticism has emerged. The large language models underpinning generative AI are often bamboozled by PDFs, reading a page set in columns from left to right rather than top to bottom, say, or getting confused by headers and footers. Trouble parsing PDFs is one of the reasons AI chatbots occasionally “hallucinate”, generating nonsense.

▶ Chinese translation:
最近,又出现了一种批评的声音。驱动生成式人工智能的大型语言模型常常被PDF所困扰,例如,它们会从左到右而非从上到下阅读分栏的页面,或者被页眉页脚弄混。解析PDF的困难是人工智能聊天机器人偶尔会“幻觉”(产生胡言乱语)的原因之一。

▶ English paragraph:
Enter the disrupters. Startups such as Factify are on a mission to build a new file type that is better suited to the technology. Matan Gavish, its boss, talks of his “megalomaniac” vision of displacing the PDF.

▶ Chinese translation:
颠覆者登场了。Factify等初创公司正致力于构建一种更适合这项技术的新文件格式。其负责人Matan Gavish谈到了他“自大狂”般的愿景:取代PDF。

▶ English paragraph:
Yet Duff Johnson, head of the PDF Association, protector of the format, argues that the fault lies not in the file type but in ourselves. He contends that there is no reason developers cannot build bots that are able to use PDFs. The AI assistant embedded in Acrobat, Adobe’s PDF reader, is designed to do precisely that, notes Leonard Rosenthol, the software firm’s PDF guru. Google, a leader in AI, has rolled out a tool for developers using its Gemini models that makes it easier to ingest PDFs. The format’s reign is not over yet.

▶ Chinese translation:
然而,PDF协会的负责人、该格式的保护者Duff Johnson认为,问题不在于文件类型本身,而在于我们。他认为,开发者完全可以构建能够使用PDF的机器人。软件公司Adobe的PDF专家Leonard Rosenthol指出,嵌入在Adobe PDF阅读器Acrobat中的AI助手正是为此而设计。作为人工智能领域的领导者,谷歌已经推出了一个工具,供使用其Gemini模型的开发者使用,该工具可以更轻松地摄取PDF。这种格式的统治地位尚未结束。

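All articles on this site are written by hand; reproduction without permission is not allowed: 夜雨聆风 » Main text: The war against PDFs is heating up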