[Medical Imaging and AI Literature Express] Issue 38 | June 7, 2026

1. Mamba-based deep learning for predicting pathological complete response in breast cancer

Journal: npj Digital Medicine

Original title: Deep learning prediction of pathological complete response in breast cancer using Mamba architecture.

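Summary

Objective: To develop a Mamba-based deep learning model that uses needle biopsy samples to predict pathological complete response (pCR) after neoadjuvant chemotherapy (NAC) in patients with breast cancer, addressing the reliance of current methods on small cohorts and on conventional convolutional neural network or Transformer architectures.

Methods: The study included 1646 patients with breast cancer from five tertiary hospitals and developed the MCEN model from needle biopsy samples. The 1023 biopsy samples from one hospital were randomly split 8:2 into training and validation sets. The other four hospitals served as external test sets for evaluating the model's performance and robustness.

Results: In the training and validation sets, the MCEN model achieved areas under the receiver operating characteristic curve (AUROCs) of 0.923 and 0.78, respectively. AUROCs in the four external test sets ranged from 0.761 to 0.809. Adding clinicopathological information improved predictive performance: AUROCs reached 0.937 and 0.811 in the training and validation sets and ranged from 0.773 to 0.84 in the external test sets.

Conclusion: The study demonstrates the potential of the MCEN model as a clinical decision-support tool that can effectively predict pCR after NAC in breast cancer.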

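Editorial comment

Using Mamba, an emerging sequence-modeling architecture, this study shows on a relatively large multicenter dataset that deep learning can feasibly predict chemotherapy response in breast cancer, and it offers a new direction for combining imaging AI with pathology AI. Performance was stable across the external test sets, but the AUROCs (0.761 to 0.809) leave room for improvement. Future work could combine multimodal imaging with molecular subtyping to further improve generalization.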
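
The 8:2 split and the AUROC metric in the summary above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the seed, the helper names `split_indices` and `auroc`, and the toy scores are assumptions made for illustration. The AUROC is computed in its rank (Mann-Whitney) form, that is, the fraction of responder/non-responder pairs in which the responder receives the higher score, with ties counted as one half.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # floor of the training share
    return indices[:n_train], indices[n_train:]


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical split of the 1023 single-hospital samples
# (the abstract gives the ratio, not the exact set sizes).
train_idx, val_idx = split_indices(1023)
print(len(train_idx), len(val_idx))

# Toy labels (1 = pCR) and model scores, invented purely to exercise the function.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(auroc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of samples, which is acceptable at this scale. A sort-based rank computation would be the usual choice for larger cohorts.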

Original abstract

Deep learning is capable of efficiently predicting the therapeutic efficacy of neoadjuvant chemotherapy (NAC) in breast cancer. However, current methods predominantly rely on convolutional neural networks or transformer architectures and are often validated in small patient cohorts. We developed a Mamba-based deep learning model for predicting chemotherapy efficacy using needle biopsy (MCEN) from 1646 patients with breast cancer across five tertiary hospitals, aiming to predict pathological complete response following NAC. We randomly divided 1023 biopsy samples from one hospital into training and validation sets at an 8:2 ratio and used the remaining four hospitals as external test sets to evaluate the model's performance and robustness. In the training and validation sets, the MCEN achieved areas under the receiver operating characteristic curve (AUROCs) of 0.923 and 0.78, respectively. For the four external test sets, the MCEN achieved AUROCs ranging from 0.761 to 0.809. Incorporating clinicopathological information improved the MCEN model's predictive performance, achieving AUROCs of 0.937 and 0.811 in the training and validation sets, respectively, and ranging from 0.773 to 0.84 in the external test sets. Our study demonstrates the potential of the MCEN as a valuable tool in clinical decision-making.

Source

[1] https://doi.org/10.1038/s41746-026-02849-2

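2. A mixed-methods study of generative AI use, dependence behaviors, and standardized application pathways among Chinese medical students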

Journal: npj Digital Medicine

Original title: Mixed-methods study on GenAI Usage, dependence behaviors, and standardized application paths among Chinese medical students.

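Summary

Objective: To examine how Chinese medical students use generative artificial intelligence (GenAI), their dependence behaviors and the factors that influence them, and to explore standardized application pathways.

Methods: An explanatory sequential mixed-methods design was used. The quantitative phase surveyed 1295 Chinese medical students and analyzed the responses empirically. The qualitative phase interviewed 16 medical educators and applied thematic analysis to clarify the underlying mechanisms.

Results: GenAI was deeply integrated into medical students' daily learning. Tool selection favored general-purpose platforms, specialist medical tools had very low utilization, and the likelihood of clinical application stayed below 20% in every scenario. The overall dependence score was 21.91 ± 6.75, and more than 60% of students reported dependence on GenAI. Multivariate linear regression showed that performance expectancy, academic pressure, and social influence were significantly and positively associated with GenAI dependence, while critical thinking was significantly and negatively associated with it.

Conclusion: Medical education should strategically reposition GenAI as a "cognitive scaffold", reinforcing critical thinking and establishing standardized usage guidelines to support high-quality development.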

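Editorial comment

This study shows that medical students depend heavily on general-purpose GenAI tools while specialist medical tools remain underused, a cautionary finding for nuclear medicine and imaging AI. Teaching of imaging diagnosis should strengthen critical-thinking training, and the field should develop specialist AI assistants that conform to clinical standards, so that students do not lean on general-purpose models at the expense of core imaging skills. Standardized application pathways would help balance what the technology enables against the development of clinical competence.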
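
The regression result in the summary above (positive coefficients for performance expectancy, academic pressure, and social influence, and a negative one for critical thinking) can be illustrated with ordinary least squares. The sketch below is a toy under loud assumptions: the data are synthetic and noise-free, the coefficient values are invented, and `fit_ols` and `solve_linear_system` are hypothetical helpers that solve the normal equations in plain Python. It is not the authors' analysis.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic predictors on a 1-5 scale, in this order: performance expectancy,
# academic pressure, social influence, critical thinking. All values are invented.
rng = random.Random(0)
rows = [[rng.uniform(1, 5) for _ in range(4)] for _ in range(200)]
true_coefs = [10.0, 2.0, 1.5, 1.0, -2.5]  # hypothetical intercept and slopes
targets = [true_coefs[0] + sum(c * v for c, v in zip(true_coefs[1:], r)) for r in rows]

coefs = fit_ols(rows, targets)
names = ["intercept", "performance_expectancy", "academic_pressure",
         "social_influence", "critical_thinking"]
for name, value in zip(names, coefs):
    print(f"{name}: {value:+.3f}")
```

A real analysis would also report standard errors and p-values for each coefficient, which this sketch omits.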

Original abstract

Generative artificial intelligence (GenAI) is reshaping medical education while fostering technological dependence among students. This study employed an explanatory sequential mixed-methods design. In the quantitative phase, an empirical analysis was conducted using survey data collected from a sample of 1295 Chinese medical students. The subsequent qualitative phase involved thematic analysis of interview transcripts from 16 medical educators to elucidate the underlying mechanisms. Findings reveal that GenAI was deeply integrated into medical students' daily learning routines. Tool selection favored general-purpose platforms, whereas specialist medical tools exhibited exceptionally low utilization rates. The clinical application possibilities remained below 20% across all situations. With an overall dependency score of 21.91 ± 6.75, over 60% of students reported dependence on GenAI. Multivariate linear regression analysis indicated performance expectancy, academic pressure, and social influence showed significant positive correlations with GenAI dependency. Conversely, critical thinking exhibited a significant negative correlation. Future medical education should strategically reposition GenAI as a "cognitive scaffold" by reinforcing critical thinking and establishing standardized usage guidelines to facilitate high-quality development.

Source

[2] https://doi.org/10.1038/s41746-026-02839-4

3. Clinical outcomes and reporting quality of large language model interventions in clinical practice: a systematic evidence map

Journal: npj Digital Medicine

Original title: Clinical outcomes and reporting quality of large language model interventions in practice: a systematic evidence map.

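Summary

Objective: To assess the evidence base for the real-world effectiveness of large language models (LLMs) deployed in clinical settings, and to characterize systematically the outcome measures used in published studies and registered clinical trials.

Methods: Systematic evidence mapping was applied to 55 included studies that evaluated LLM performance between January 2022 and June 2025. The analysis covered the share of human-AI collaborative designs, the types of intervention, and reporting quality.

Results: Human-AI collaborative designs predominated (65.5%) and were used mainly for decision support and symptom management. LLM-only interventions focused on functional performance and on operational or process outcomes such as accuracy and time saving, whereas LLM-assisted interventions showed positive clinical effects, for example in psychological health endpoints. Key evidence gaps included the following: diagnostic accuracy in randomized trials was notably lower and more variable (range 0.65-0.88) than in non-randomized studies (typically ≥ 0.80); effects on clinical efficiency were inconsistent; and reporting quality was suboptimal (mean CONSORT-AI adherence 78.8%), with critical omissions in reporting how data quality and performance errors were handled.

Conclusion: The current evidence is heterogeneous and insufficient. Standardized core outcome sets, mandatory use of specialized reporting guidelines, and robust clinical trials are needed to ensure that LLMs are integrated safely.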

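Editorial comment

This study systematically exposes the weak points in the evidence base for clinical use of LLMs, an important warning for nuclear medicine and imaging AI. Research on AI-assisted imaging diagnosis faces the same problems of inconsistent reporting standards and insufficient real-world validation, and should adopt specialized guidelines such as CONSORT-AI to raise study quality. Future work should focus on the variability of diagnostic accuracy and on validating clinical effectiveness when LLMs are used for imaging report generation and structured interpretation.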
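
As a sanity check, the 65.5% share of human-AI collaborative designs can be converted back into a study count. The abstract does not state the count, so the sketch below infers it under the assumption that the percentage was rounded to one decimal place out of the 55 included studies.

```python
TOTAL_STUDIES = 55        # from the abstract
REPORTED_PERCENT = 65.5   # from the abstract

# Keep every integer count whose share of 55 rounds to the reported percentage.
candidates = [
    k for k in range(TOTAL_STUDIES + 1)
    if abs(100 * k / TOTAL_STUDIES - REPORTED_PERCENT) < 0.05
]
print(candidates)
```

Under that assumption the only consistent count is 36 of 55 studies, since 36/55 is about 65.45%.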

Original abstract

Large language models (LLMs) are being deployed in clinical settings despite an underdeveloped evidence base regarding their real-world effectiveness. This study employed systematic evidence mapping to characterize outcome measures used in published studies and registered clinical trials (Jan 2022-Jun 2025) evaluating LLM performance. Analysis of 55 included studies revealed a predominance of human-AI collaborative designs (65.5%) for decision support and symptom management. LLM-only interventions focused on functional performance and operational or process impact outcomes (e.g., accuracy and time saving), whereas LLM-assisted interventions showed positive clinical effects, particularly in psychological health endpoints. Critical evidence gaps persist: diagnostic accuracy in randomized trials was notably lower and more variable (range 0.65-0.88) compared to non-randomized studies (typically ≥ 0.80); clinical efficiency impacts were inconsistent, and reporting quality was suboptimal (78.8% mean CONSORT-AI adherence), with critical omissions in handling data quality and performance errors. These findings indicate a heterogeneous and insufficient evidence landscape, necessitating standardized core outcome sets, mandatory use of specialized reporting guidelines, and robust clinical trials to ensure the safe integration of LLMs.

Source

[3] https://doi.org/10.1038/s41746-026-02837-6