[Paper Picks] Frontier AI Papers TOP10, 2026-04-27
2026-04-27 · Trending papers across five research areas
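This report selects the latest papers as of 2026-04-27 across five AI research areas: large language models, multimodal learning, video generation, reinforcement learning, and world models. Each paper comes with a summary and its English abstract.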
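🦄 Large Language Models
The large language model track covers core problems such as inference optimization, architectural innovation, and cross-task generalization.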
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks
Authors: Longju Bai, Zhemin Huang, Xingyao Wang, et al.
Published: 2026-04-24
Summary: This paper presents the first systematic study of token consumption patterns of AI agents on coding tasks. By analyzing the execution trajectories of eight frontier LLMs on SWE-bench Verified, it shows how token usage is distributed across task stages and models, evaluates each model's token efficiency, and examines whether models can predict their own token usage, providing empirical evidence for budget control and efficient deployment.
Abstract: The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questions naturally arise: (1) Where do AI agents spend the tokens? (2) Which models are more token-efficient? and (3) Can agents predict their token usage before task execution? In this paper, we present the first systematic study of token consumption patterns in agentic coding tasks. We analyze trajectories from eight frontier LLMs on SWE-bench Verified and evaluate models’ ability to predict their own token usage before task execution. Our findings reveal significant variation in token efficiency across models and task stages, providing actionable insights for budget planning and cost reduction.
Link: http://arxiv.org/abs/2604.22750v1
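The three questions in this abstract (where tokens go, which runs are efficient, how good a pre-task estimate is) can be illustrated with a small accounting sketch. Everything here is hypothetical: the stage names, the token counts, and the predicted total are made up for illustration and are not taken from the paper or from SWE-bench Verified.

```python
from collections import defaultdict

# Hypothetical trajectory: (stage, input_tokens, output_tokens) per agent step.
# Stage names and numbers are illustrative only, not data from the paper.
trajectory = [
    ("explore", 12_000, 400),
    ("explore", 15_000, 350),
    ("edit", 18_000, 1_200),
    ("test", 20_000, 300),
]

def tokens_by_stage(steps):
    """Sum input and output tokens for each stage."""
    totals = defaultdict(int)
    for stage, tokens_in, tokens_out in steps:
        totals[stage] += tokens_in + tokens_out
    return dict(totals)

def stage_shares(totals):
    """Fraction of all tokens spent in each stage."""
    grand_total = sum(totals.values())
    return {stage: count / grand_total for stage, count in totals.items()}

def relative_error(predicted, actual):
    """Signed relative error of a pre-task token estimate."""
    return (predicted - actual) / actual

totals = tokens_by_stage(trajectory)
shares = stage_shares(totals)
actual_total = sum(totals.values())
predicted_total = 50_000  # hypothetical estimate made before the task runs
error = relative_error(predicted_total, actual_total)

for stage, share in shares.items():
    print(f"{stage}: {totals[stage]} tokens ({share:.1%})")
print(f"prediction error: {error:+.1%}")
```

A negative error means the estimate undershot the actual usage, which is the case that matters most for budget control.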
Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities
Authors: Ilana Nguyen, Harini Suresh, Thema Monroe-White, et al.
Published: 2026-04-24
Summary: This paper systematically studies representational bias against global-majority national and regional identities in narratives generated by large language models. Because models can encode and perpetuate harmful biases about non-dominant groups, the authors build a cross-cultural text dataset and annotate model outputs at a fine-grained level, revealing bias along the dimensions of ethnicity, religion, and social status. They propose mitigation strategies based on debiasing and data augmentation as a reference for safe deployment.
Abstract: Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. While many works highlight the new potential applications of LLMs, there are risks of LLMs encoding and perpetuating harmful biases about non-dominant communities across the globe. To better evaluate and mitigate such harms, we study how national origin identities are portrayed by widely-adopted LLMs in responsive narratives. Our analysis reveals systematic biases across ethnicity, religion, and social status dimensions, and proposes mitigation strategies based on debiasing and data augmentation techniques.
Link: http://arxiv.org/abs/2604.22749v1
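👁️ Multimodal
The multimodal track focuses on unified vision-language representations, cross-modal alignment, and fine-grained perception.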
Code for All: Educational Applications of the “Vibe Coding” Hackathon in Programming Education across All Skill Levels
Authors: Ashley J. Chen, Yijia Cao, Minghao Shao, et al.
Published: 2026-04-24
Summary: This paper examines the value of "vibe coding" for programming education: a way of programming in which users describe their intent in natural language and AI generates or revises the code. Through a month-long online hackathon open to participants from multiple countries, ranging from complete beginners to experienced developers, it evaluates whether AI-assisted programming can broaden access to programming while preserving meaningful learning outcomes, offering empirical evidence for reforming programming education.
Abstract: The emergence of large language models has enabled vibe coding, a natural language approach to programming in which users describe intent and AI generates or revises code, potentially broadening access to programming while preserving meaningful learning outcomes. We investigate its educational value through a month-long online hackathon that welcomed participants from multiple countries, ranging from complete beginners to experienced developers across three tracks with increasing technical demands. Our findings suggest that AI-assisted programming can effectively support learning across skill levels while significantly lowering barriers to entry.
Link: http://arxiv.org/abs/2604.22747v1
Inter-Stance: A Dyadic Multimodal Corpus for Conversational Stance Analysis
Authors: Xiang Zhang, Xiaotian Li, Taoyue Wang, et al.
Published: 2026-04-24
Summary: This paper introduces Inter-Stance, a dyadic multimodal corpus for conversational stance analysis. Existing public datasets do not combine multimodal recordings of multiple people with self-report measures. By collecting data across modalities including gesture, facial expression, voice, and speech, the corpus supports the study of how stance is formed and expressed in social interaction, provides data for understanding attitude evaluation and behavioral response in interpersonal communication, and advances multimodal affective computing.
Abstract: Social interactions dominate our perceptions of the world and shape our daily behavior by attaching social meaning to acts as simple and spontaneous as gestures, facial expressions, voice, and speech. Yet no publicly-available dataset includes multimodal recordings and self-report measures of multiple persons in social interaction. We present Inter-Stance, a new data corpus of multimodal recordings capturing stance formation and expression in dyadic social interactions, annotated with self-report measures. Our dataset enables research into the mechanisms of attitude evaluation and behavioral response in interpersonal communication, advancing the field of multimodal affective computing.
Link: http://arxiv.org/abs/2604.22739v1
🎬 Video Generation
The video generation track explores temporal consistency, physical realism, and controllable video synthesis.
Video Analysis and Generation via a Semantic Progress Function
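Authors: Chenhao Zhang, Liang Xu, Jiaxu Liu, et al.
Published: 2026-04-25
Summary: This paper proposes a new paradigm for video analysis and generation based on a Semantic Progress Function. Existing methods struggle to handle a video's temporal semantic progression and visual quality control at the same time. The paper explicitly models a video's semantic change as a learnable progress function, enabling precise description of how content evolves and guided generation. Experiments on multiple benchmark datasets show that the method outperforms existing approaches in both generation quality and semantic consistency.
Abstract: Existing video analysis and generation methods struggle to simultaneously capture temporal semantic progression and visual quality control. We propose a Semantic Progress Function (SPF) framework that explicitly models the semantic evolution of video content as a learnable progress function, enabling precise description and guided generation of video sequences. Experiments on multiple benchmark datasets demonstrate significant improvements in both visual quality and semantic consistency over state-of-the-art approaches.
Link: http://arxiv.org/abs/2604.22554v1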
GCImOpt: Learning Efficient Goal-Conditioned Policies by Imitating Optimal Trajectories
Authors: Lin Huang, Wei Chen, Qiang Zhang, et al.
Published: 2026-04-24
Summary: This paper proposes GCImOpt, which learns efficient goal-conditioned policies by imitating optimal trajectories. Learning goal-reaching policies is very challenging in real-world robotic environments with sparse rewards. The method extracts structured policies from a small number of optimal demonstrations, substantially improving sample efficiency, and is validated on a variety of robotic manipulation tasks.
Abstract: Learning goal-conditioned policies in environments with sparse rewards remains a fundamental challenge in robotics. We present GCImOpt, a method that learns efficient goal-conditioned policies by imitating optimal trajectories. By extracting structured policies from a small number of optimal demonstrations, our approach achieves superior sample efficiency across a variety of robotic manipulation tasks in real-world environments with sparse reward signals.
Link: http://arxiv.org/abs/2604.22724v1
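🤖 Reinforcement Learning
The reinforcement learning track studies policy optimization, reward design, and new paradigms for human-AI collaborative reinforcement learning.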
Spend Less, Fit Better: Budget-Efficient Scaling Law Fitting via Active Experiment Selection
Authors: Sijie Li, Shanda Li, Haowei Lin, et al.
Published: 2026-04-24
Summary: This paper recasts scaling-law fitting as a budget-aware sequential experimental design problem. In large-scale training, deciding which pilot experiments are most informative is itself a key budget-allocation problem. Given a pool of runnable experiments with heterogeneous costs, the method actively selects the experiments that maximize information gain, fitting scaling laws efficiently within a limited budget and offering a cost-effective strategy for planning multi-million-dollar training runs.
Abstract: Scaling laws are used to plan multi-million-dollar training runs, but fitting those laws can itself cost millions. In modern large-scale workflows, assembling a sufficiently informative set of pilot experiments is already a major budget-allocation problem. We formulate scaling-law fitting as budget-aware sequential experimental design: given a finite pool of runnable experiments with heterogeneous costs, choose which runs to execute so as to maximize extrapolation accuracy in a high-cost target region. We then propose an uncertainty-aware method for sequential experiment selection that achieves near-optimal accuracy at a fraction of the budget.
Link: http://arxiv.org/abs/2604.22753v1
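To make the problem setting concrete, here is a minimal sketch of budget-aware run selection. It is not the paper's method. It assumes a power law that is linear in log space (log loss = a + b · log N), unit observation noise, and a weak Gaussian prior, and it greedily picks the affordable run that most reduces predictive variance at the target scale per unit cost. The pool sizes, costs, budget, and all function names are hypothetical.

```python
import math

def features(n):
    """Log-linear power-law features: log L = a + b * log N."""
    return (1.0, math.log(n))

def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def predictive_variance(precision, n):
    """Posterior variance of the fitted line at scale n (unit noise assumed)."""
    cov = inverse_2x2(precision)
    f = features(n)
    return sum(f[i] * cov[i][j] * f[j] for i in range(2) for j in range(2))

def add_observation(precision, n):
    """Posterior precision after observing one run at scale n."""
    f = features(n)
    return [[precision[i][j] + f[i] * f[j] for j in range(2)] for i in range(2)]

def select_runs(pool, budget, target_n, prior_precision=1e-3):
    """Greedy selection: largest variance reduction at target_n per unit cost."""
    precision = [[prior_precision, 0.0], [0.0, prior_precision]]
    remaining = dict(pool)  # run size -> cost; each run can be chosen once
    chosen = []
    while True:
        current = predictive_variance(precision, target_n)
        best, best_gain = None, 0.0
        for n, cost in remaining.items():
            if cost > budget:
                continue  # skip runs we can no longer afford
            updated = add_observation(precision, n)
            gain = (current - predictive_variance(updated, target_n)) / cost
            if gain > best_gain:
                best, best_gain = n, gain
        if best is None:
            return chosen, current
        chosen.append(best)
        budget -= remaining.pop(best)
        precision = add_observation(precision, best)

# Hypothetical pool: model size -> cost in arbitrary budget units.
pool = {10**6: 1.0, 10**7: 10.0, 10**8: 100.0, 10**9: 1000.0}
budget = 150.0
target_n = 10**10
chosen, final_variance = select_runs(pool, budget, target_n)
print(chosen, final_variance)
```

Because the model is linear in its parameters, the variance at the target depends only on which scales are run, not on the observed losses, so this toy selection can be computed up front. A sequential method for a nonlinear scaling law would instead refit after each run.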
From Physics to Statistics: A Simple Route to Exponential Families via Maximum Entropy
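Authors: Korbinian Strimmer
Published: 2026-04-24
Summary: This paper gives a short, self-contained derivation of exponential families from the principle of maximum entropy. Exponential families are a core framework of modern statistics and machine learning, but textbooks rarely derive them from first principles in an accessible way. The paper combines the maximum entropy principle from physics with the statistical concept of minimal sufficiency and bypasses lengthy technical details, keeping the derivation concise, lowering the barrier to entry, and offering a new approach to teaching statistical learning theory.
Abstract: Exponential families form the backbone of modern statistics and machine learning, but textbooks seldom derive them from first principles in an accessible way. Although minimal sufficiency and the principle of maximum entropy, originating in physics, provide core motivation, they are often presented as technical and requiring advanced prerequisites. Here, a short, self-contained derivation of exponential families based on maximum entropy is presented that is straightforward to carry out, requires only a modest background in information entropy, and avoids technicalities like constrained optimization, providing an intuitive yet rigorous pedagogical pathway.
Link: http://arxiv.org/abs/2604.22752v1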
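One standard way to see the link between maximum entropy and exponential families without Lagrange multipliers is a Gibbs-inequality argument. The sketch below is that textbook argument and is not claimed to be the paper's derivation. The notation is assumed for illustration: a unit base measure, a statistic T(x), a log-partition function A(θ), and a parameter θ chosen so that the exponential-family density meets the moment constraint μ.

```latex
% Assumptions (not from the abstract): unit base measure, statistic T(x),
% and \theta chosen so that p_\theta satisfies the moment constraint.
\begin{align*}
p_\theta(x) &= \exp\bigl(\theta^\top T(x) - A(\theta)\bigr),
  \qquad \mathbb{E}_{p_\theta}[T(x)] = \mu, \\
H(p_\theta) &= -\mathbb{E}_{p_\theta}[\log p_\theta(x)]
  = A(\theta) - \theta^\top \mu. \\
\intertext{For any density $q$ with $\mathbb{E}_q[T(x)] = \mu$:}
0 \le \mathrm{KL}(q \,\|\, p_\theta)
  &= -H(q) - \mathbb{E}_q[\log p_\theta(x)] \\
  &= -H(q) - \theta^\top \mu + A(\theta) \\
  &= H(p_\theta) - H(q).
\end{align*}
```

So every density q that matches the same moments has entropy no greater than that of the exponential-family member, with equality only when q equals that member almost everywhere.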
🌍 World Models
The world model track focuses on simulating the physical world, embodied reasoning, and long-horizon planning.
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
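Authors: Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, et al.
Published: 2026-04-24
Summary: This paper surveys research on AI world modeling and proposes a two-dimensional "levels x laws" taxonomy. The first axis defines three capability levels: L1 Predictor (single-step local transitions), L2 Simulator (multi-step composition), and L3 Reflector (metacognition). The second axis covers different kinds of laws, such as physical, causal, and social. The paper argues that AI systems moving from generating text to accomplishing goals face a bottleneck in environment modeling. The framework aims to unify how different communities understand world models and to provide a theoretical foundation for general interactive intelligence.
Abstract: As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. We introduce a “levels x laws” taxonomy organized along two axes: capability levels (L1 Predictor, L2 Simulator, L3 Reflector) and law types (physical, causal, social). This framework unifies how different research communities conceptualize world models and provides a theoretical foundation for building general interactive AI systems that can plan, reason, and adapt in complex environments.
Link: http://arxiv.org/abs/2604.22748v1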
SS3D: End-to-End Self-Supervised 3D Reconstruction from Web Videos
Authors: Qiang Wang, Hui Li, Wei Liu, et al.
Published: 2026-04-26
Summary: This paper proposes SS3D, an end-to-end self-supervised method for learning 3D reconstruction from web videos. Existing methods rely on multi-view geometry or depth sensors, whereas SS3D learns cross-view consistency signals from large amounts of unannotated web video, achieving 3D scene understanding without human annotation. The method is competitive on both indoor and outdoor scenes and opens a new path for large-scale self-supervised 3D learning.
Abstract: We present SS3D, an end-to-end self-supervised approach for 3D reconstruction from web videos. Existing methods rely on multi-view geometry or depth sensors, while SS3D learns cross-view consistency signals from large-scale unannotated web videos to achieve 3D understanding without human labels. Our method demonstrates competitive performance on both indoor and outdoor scenes, opening a new pathway for large-scale self-supervised 3D learning with potential applications in robotics navigation and world modeling.
Link: http://arxiv.org/abs/2604.22686v1
Compiled by Euler from web search results | 2026-04-27 · Paper Picks