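[Medical Imaging and AI Literature Express] Issue 56 | June 25, 2026
1. Deep fusion of clinical metadata with [18F]FDG PET/CT for histological subtyping of non-small cell lung cancer: a multi-center study
Journal: EJNMMI
Original title: Deep integration of clinical metadata with [18F]FDG PET/CT imaging for histological subtyping in non-small cell lung cancer: a multi-center study.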
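Summary
Purpose: To develop a multimodal deep learning framework that integrates clinical metadata with [18F]FDG PET/CT imaging, improving noninvasive histological subtyping of non-small cell lung cancer (NSCLC) into adenocarcinoma versus squamous cell carcinoma.
Methods: A multi-center surgical NSCLC cohort of 780 patients was split into a development set (n = 675) and an independent external test set (n = 105). The model first localizes the tumor with a 3D Transformer, then uses a multi-task network based on Feature-wise Linear Modulation (FiLM) to dynamically inject clinical metadata into the visual backbone. Binary clinical staging (early vs. advanced) serves as an auxiliary task.
Results: On the validation set, the multimodal model reached an AUC of 0.894 (95% CI 0.813-0.959), significantly higher than the conventional radiomics baseline (0.796, P = 0.017) and the clinical-only baseline (0.759, P = 0.004). On the internal test set the AUC was 0.832 (95% CI 0.744-0.906), numerically the highest but without statistically significant differences (all P > 0.11). On the external test set the AUC was 0.787 (95% CI 0.687-0.876); the differences from the clinical-only model (0.740, P = 0.480) and the image-only model (0.685, P = 0.082) were not significant, but the multimodal model had the highest F1-score and the best net benefit in decision curve analysis. On the auxiliary staging task, FiLM fusion raised the AUC to 0.656.
Conclusion: Early clinico-biological fusion can improve the ability of FDG PET/CT to discriminate NSCLC histology, and the multimodal model held up best across centers, although the external-cohort differences were not statistically significant.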
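To make the FiLM step in the Methods more concrete, here is a minimal pure-Python sketch of feature-wise linear modulation: a clinical vector is mapped to one scale (gamma) and one shift (beta) per channel, which are then applied to the image features. The function names, the toy weights, and the flattened two-channel layout are all illustrative assumptions; the abstract does not describe the network's actual layers or dimensions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Plain affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(feature_maps: Matrix, clinical: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Modulate each channel as gamma_c * x + beta_c.

    gamma and beta are predicted from the clinical vector, so the same
    image features are rescaled differently for different patients.
    """
    gamma = linear(clinical, w_gamma, b_gamma)
    beta = linear(clinical, w_beta, b_beta)
    return [[g * v + b for v in channel]
            for channel, g, b in zip(feature_maps, gamma, beta)]


# Toy example (hypothetical values): 2 channels, 2 flattened voxels each.
features = [[1.0, 2.0], [3.0, 4.0]]
clinical = [1.0, 0.0]            # e.g. two encoded clinical variables
w_gamma = [[2.0, 0.0], [0.0, 1.0]]
b_gamma = [0.0, 1.0]
w_beta = [[0.5, 0.0], [0.0, 0.0]]
b_beta = [0.0, -1.0]

modulated = film(features, clinical, w_gamma, b_gamma, w_beta, b_beta)
print(modulated)
```

In a real model the two affine maps would be learned jointly with the visual backbone, and the modulation would typically be applied at one or more intermediate layers rather than to raw inputs.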
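Editor's comment
Multimodal lung cancer studies like this one show that PET/CT alone often hits a ceiling where metabolic phenotypes overlap; injecting clinical priors is not an add-on but a way to break through that ceiling. The external AUC is still modest, but the direction is right.
Original abstract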
PURPOSE: To develop and validate a multimodal deep learning framework that integrates clinical metadata with [18F]FDG PET/CT imaging to resolve overlapping metabolic phenotypes. The primary objective is the histological subtyping of non-small cell lung cancer (NSCLC), utilizing binary clinical staging (early vs. advanced) strategically as an auxiliary regularization task.
METHODS: A multi-center surgical NSCLC cohort (n = 780) was partitioned into a development set (n = 675) and an independent external test set (n = 105). The framework first utilized a 3D Transformer for bounding-box-based tumor localization. Subsequently, a multi-task network employed Feature-wise Linear Modulation (FiLM) to dynamically inject clinical metadata into the visual backbone.
RESULTS: For histological subtyping of adenocarcinoma versus squamous cell carcinoma, in the validation cohort, the proposed multimodal framework achieved the highest area under the receiver operating characteristic curve (AUC) of 0.894 (95% CI: 0.813-0.959), significantly outperforming the conventional radiomics baseline (AUC = 0.796, DeLong test P = 0.017) and the clinical-only baseline (AUC = 0.759, P = 0.004). On the internal test set, the multimodal model maintained an AUC of 0.832 (95% CI: 0.744-0.906), outperforming competing models numerically, though differences did not reach statistical significance (all P > 0.11). On the independent external test cohort, the multimodal framework demonstrated superior cross-center stability, maintaining an AUC of 0.787 (95% CI: 0.687-0.876). On the external cohort, the between-model AUC differences did not reach statistical significance against the clinical-only model (AUC of 0.740, P = 0.480) or the image-only model (AUC of 0.685, P = 0.082). Nevertheless, the multimodal framework achieved the highest F1-score and yielded the most optimal net clinical benefit across a wide range of threshold probabilities in decision curve analysis. For the intrinsically challenging auxiliary staging task, the unguided image-only network exhibited severe vulnerability, however, the FiLM-based multimodal mechanism effectively enhanced diagnostic capacity by employing systemic clinical priors, improving the AUC to 0.656.
CONCLUSION: Combining 3D detection with an early clinico-biological fusion strategy effectively enhances NSCLC characterization on [18F]FDG PET/CT, which has the potential to mitigate the limitations of single-modality imaging in resolving diagnostically ambiguous cases characterized by overlapping [18F]FDG uptake phenotypes, thereby providing a non-invasive decision-support tool in the precision management of NSCLC.
Source
[1] https://doi.org/10.1007/s00259-026-08010-1
2. Prognostic value of baseline and interim-therapy PET features in high-risk pediatric Hodgkin lymphoma: a retrospective analysis of the AHOD1331 trial
Journal: Journal of Nuclear Medicine (JNM)
Original title: Evaluation of Baseline and Interim-Therapy PET Features for Prognostication in High-Risk Pediatric Hodgkin Lymphoma: A Retrospective Analysis of the AHOD1331 Trial.
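Summary
Purpose: To assess how well conventional PET metrics and advanced imaging features predict progression-free survival in high-risk pediatric Hodgkin lymphoma (HL), and to compare features derived from automated deep learning segmentation with those from manual segmentation.
Methods: Using data from the Children's Oncology Group AHOD1331 trial, the study included 558 patients from 150 institutions. Conventional metrics and radiomics features were extracted from baseline and interim-therapy PET and combined with clinical variables in CoxNet outcome models evaluated by nested cross-validation stratified by institution.
Results: Among all feature types, the model using only conventional baseline PET metrics performed best, with a C-index of 0.72 ± 0.01, significantly better than the clinical-only model (0.65 ± 0.01). Adding radiomics or interim-therapy PET features gave no further improvement. Models built on deep learning segmentations also reached a C-index of 0.72 ± 0.01, comparable to models built on physician annotations.
Conclusion: In high-risk pediatric HL, baseline quantitative PET features offered more robust prognostic value than complex radiomics or interim-therapy features, and automated segmentation can support automating this kind of feature extraction.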
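For readers less familiar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index in pure Python. It is a simplified illustration, not the study's pipeline: ties in follow-up time are ignored, and the four toy patients are hypothetical.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Fraction of comparable patient pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    The pair is concordant when the model gives i the higher risk;
    equal risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up time, event flag (1 = progression),
# and a model risk score for each of four patients.
times = [5, 8, 12, 20]
events = [1, 0, 1, 0]
risks = [0.9, 0.4, 0.1, 0.2]

c_index = concordance_index(times, events, risks)
print(c_index)  # 3 of 4 comparable pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the gap between 0.65 and 0.72 in context.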
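Editor's comment
This study delivers an important counterintuitive result: complex features do not necessarily beat basic PET metrics. For multi-center pediatric cohorts, robustness and reproducibility are often worth more than "fancier" imaging representations.
Original abstract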
This study aimed to evaluate the prognostic value of conventional and advanced PET metrics for predicting progression-free survival in high-risk pediatric Hodgkin lymphoma (HL), using data from the Children's Oncology Group AHOD1331 trial. Methods: This retrospective analysis included 558 patients from 150 institutions. Conventional PET metrics and radiomics features were extracted from both baseline and interim-therapy PET images. Clinical variables, including demographic characteristics and common risk factors, were also considered. A standardized outcome-modeling pipeline was developed, comprising feature selection and penalized Cox regression with elastic net regularization (CoxNet). Models were trained and evaluated using nested cross-validation stratified by institution, ensuring consistent external testing. We investigated whether conventional PET metrics added prognostic value beyond clinical variables and whether radiomics or interim PET-derived features offered further benefit. Additionally, we compared the prognostic performance of features extracted from automated deep learning (DL) segmentations against those derived from physician annotations. Model performance was assessed using the concordance index (C-index). Results: Among all types of features, the CoxNet model using conventional baseline PET metrics achieved the highest performance (C-index, 0.72 ± 0.01), significantly outperforming the clinical model (C-index, 0.65 ± 0.01). Neither the inclusion of radiomics features nor interim PET-derived features improved performance. Models using DL-generated segmentations achieved comparable prognostic accuracy (C-index, 0.72 ± 0.01) to those using physician-based segmentations. Conclusion: Quantitative PET features provided significant prognostic improvements over clinical variables. Furthermore, DL-based segmentation offered a promising approach for automating feature extraction, supporting more effective risk stratification in pediatric HL.
Source
[2] https://doi.org/10.2967/jnumed.125.270791
3. FetalCLIP: a vision-language foundation model for fetal ultrasound image analysis
Journal: npj Digital Medicine
Original title: FetalCLIP: a visual-language foundation model for fetal ultrasound image analysis.
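Summary
Purpose: To build a vision-language foundation model that learns general-purpose representations of fetal ultrasound images, improving performance and data efficiency across downstream fetal ultrasound tasks.
Methods: The authors pre-trained FetalCLIP with a multimodal approach on 210,035 fetal ultrasound images paired with text, then compared it against baseline models on classification, gestational age estimation, congenital heart defect detection, and fetal structure segmentation.
Results: FetalCLIP outperformed all baselines across the benchmarks, generalized well, and kept strong performance when labeled data were limited. The abstract reports no task-level numbers, but describes the training set as the largest paired dataset of its kind used for vision-language foundation model development to date.
Conclusion: A fetal ultrasound foundation model pre-trained on large-scale paired data can provide a unified, robust representation for multi-task analysis.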
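Editor's comment
Fetal ultrasound has long been constrained by complex views, scarce paired text, and high annotation costs, so the foundation-model route is appealing. The key question now becomes whether such general-purpose representations remain stable across centers, devices, and operators.
Original abstract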
Foundation models are becoming increasingly effective in the medical domain, offering pre-trained models on large datasets that can be readily adapted for downstream tasks. Despite progress, fetal ultrasound images remain a challenging domain for visual-language foundation models due to their inherent complexity, often requiring substantial additional training and facing limitations due to the scarcity of paired multimodal data. To overcome these challenges, here we introduce FetalCLIP, a vision-language foundation model capable of generating universal representation of fetal ultrasound images. FetalCLIP was pre-trained using a multimodal learning approach on a diverse dataset of 210,035 fetal ultrasound images paired with text. This represents the largest paired dataset of its kind used for visual-language foundation model development to date. This unique training approach allows FetalCLIP to effectively learn the intricate anatomical features present in fetal ultrasound images, resulting in robust representations that can be used for a variety of downstream applications. In extensive benchmarking across a range of key fetal ultrasound applications, including classification, gestational age estimation, congenital heart defect (CHD) detection, and fetal structure segmentation, FetalCLIP outperformed all baselines while demonstrating remarkable generalizability and strong performance even with limited labeled data. The FetalCLIP model is publicly available at https://github.com/biomedia-mbzuai/fetalclip to support the broader scientific community.
Source
[3] https://doi.org/10.1038/s41746-026-02907-9
4. Deep learning and hand-crafted radiomics with MRI and ultrasound for predicting axillary lymph node status in breast cancer: a systematic review and meta-analysis
Journal: European Radiology
Original title: Assessing the performance of deep learning and hand-crafted radiomics models using MRI and ultrasound in predicting axillary lymph node status in breast cancer: a systematic review and meta-analysis.
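Summary
Purpose: To summarize the diagnostic performance of MRI- and ultrasound-based AI models for predicting axillary lymph node metastasis (ALNM) in breast cancer, and to compare deep learning (DL), hand-crafted radiomics (HCR), and combined strategies.
Methods: Four databases were searched up to November 2024, and 41 studies with histopathology as the reference standard were included. Sensitivity, specificity, and AUC were pooled with a bivariate random-effects model, followed by heterogeneity and subgroup analyses.
Results: For internal validation, pooled sensitivity was 0.79 (95% CI 0.74-0.84), specificity 0.78 (95% CI 0.75-0.81), and AUC 0.84. For external validation, sensitivity was 0.78, specificity 0.74, and AUC 0.82. Likelihood ratio analysis (LR+ 3.0, LR- 0.33) pointed to limited standalone clinical utility. Ensembles combining DL and HCR performed best, with an AUC of 0.88 for MRI and 0.92 for ultrasound. Models covering both intratumoral and peritumoral regions outperformed intratumoral-only models (AUC 0.81 vs 0.75).
Conclusion: Existing MRI and ultrasound AI models show moderate diagnostic accuracy for ALNM and are better suited as adjunctive risk-stratification tools. Standardized development and external validation across populations remain prerequisites for clinical translation.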
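A short worked example may help show why LR+ 3.0 and LR- 0.33 imply limited standalone utility. The sketch below converts a pretest probability into post-test probabilities using the standard odds form of Bayes' rule. The 30% pretest probability of nodal metastasis is a hypothetical value chosen for illustration and does not come from the meta-analysis.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Update a pretest probability with a likelihood ratio.

    post-test odds = pretest odds * LR, then convert odds back to
    a probability.
    """
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRETEST = 0.30      # hypothetical prevalence of nodal metastasis
LR_POSITIVE = 3.0   # pooled LR+ reported in the abstract
LR_NEGATIVE = 0.33  # pooled LR- reported in the abstract

p_after_positive = post_test_probability(PRETEST, LR_POSITIVE)
p_after_negative = post_test_probability(PRETEST, LR_NEGATIVE)

print(f"after a positive AI result: {p_after_positive:.3f}")
print(f"after a negative AI result: {p_after_negative:.3f}")
```

Under this assumed pretest probability, a positive result raises the probability to roughly 56% and a negative result lowers it to roughly 12%. Neither result settles nodal status on its own, which is consistent with the adjunctive role described above.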
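Editor's comment
The practical value of meta-analyses like this one is that they set a boundary for clinicians: do not over-read AI. The AUCs look good, but the likelihood ratios still suggest these models work as supplementary preoperative information rather than a replacement for sentinel lymph node biopsy.
Original abstract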
BACKGROUND: Axillary lymph node metastasis (ALNM) is a critical prognostic factor in breast cancer. While sentinel lymph node biopsy remains the gold standard, conventional imaging relies on operator expertise with variable diagnostic accuracy. This meta-analysis evaluates the diagnostic performance of deep learning (DL) and hand-crafted radiomics (HCR) models using MRI and ultrasound (US) for ALNM prediction.
METHODS: Literature search was conducted across four databases up to November 2024. Studies assessing DL or HCR models for ALNM prediction using MRI or US, with histopathological confirmation as the reference standard, were included. Diagnostic accuracy metrics were pooled using bivariate random-effects meta-analysis. Heterogeneity was assessed using Higgins I², and subgroup analyses explored its potential sources.
RESULTS: Across 41 included studies, pooled sensitivity was 0.79 (95% CI: 0.74-0.84) and specificity 0.78 (95% CI: 0.75-0.81) for internal validation, with AUC 0.84. External validation demonstrated sensitivity of 0.78 and specificity of 0.74, with an AUC of 0.82. Likelihood ratio analysis (LR+ 3.0, LR- 0.33) indicated limited standalone clinical utility. Ensemble approaches combining DL and HCR showed higher diagnostic performance (AUC = 0.88 in MRI and AUC = 0.92 in US) compared to individual methods. Models incorporating both intratumoral and peritumoral regions yielded higher AUCs (0.81 vs 0.75) than intratumoral alone.
CONCLUSION: AI models demonstrate moderate diagnostic accuracy but limited standalone clinical utility. These tools may serve adjunctive roles in risk stratification and treatment planning. Ensemble approaches combining DL and HCR achieve superior performance. Methodological standardization and validation across diverse populations are essential before clinical implementation.
KEY POINTS: Question Can artificial intelligence models applied to US and MRI accurately predict axillary lymph-node metastasis in breast cancer? Findings Across 41 studies, AI models demonstrated good diagnostic performance for predicting nodal metastasis. DL, HCR, and combined approaches each achieved clinically meaningful accuracy, with integrated DL + HCR models showing the highest pooled diagnostic effect and lowest heterogeneity. Clinical relevance AI-enhanced imaging can assist in non-invasively stratifying nodal status and may reduce reliance on invasive procedures such as sentinel lymph-node biopsy. Consistent imaging protocols, standardized model development, and external validation are needed before clinical adoption.
Source
[4] https://doi.org/10.1007/s00330-026-12682-6
5. Modeling day-long ECG signals with explainable AI to predict heart failure risk
Journal: npj Digital Medicine
Original title: Modeling day-long ECG signals to predict heart failure risk with explainable AI.
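Summary
Purpose: To test whether a deep learning model applied to 24-hour single-lead Holter ECG can predict the risk of heart failure (HF) within five years.
Methods: The study used the Technion-Leumit Holter ECG dataset (69,663 recordings from 47,729 patients, collected over 20 years) to train the DeepHHF model, compared it with a model using 30-second segments and with a clinical score, and ran an explainability analysis.
Results: DeepHHF reached an AUC of 0.80, outperforming both the 30-second-segment model and the clinical score. Individuals flagged as high-risk by the model had about twice the chance of hospitalization or death events. The explainability analysis showed that the model focused mainly on arrhythmias and heart abnormalities.
Conclusion: Deep learning on continuous 24-hour ECG can capture paroxysmal events and is feasible for long-term HF risk prediction, with the practical advantages of being low-cost and non-invasive.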
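Editor's comment
Although this is not an imaging study, it reflects the shift in medical AI from static snapshots to long-horizon time-series modeling. Such models have considerable translational potential where wearable devices are combined with outpatient follow-up.
Original abstract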
Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To research this, the Technion-Leumit Holter ECG (TLHE) dataset, including 69,663 recordings from 47,729 patients, collected over 20 years, was used. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80 that outperformed a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had a two-fold chance of hospitalization or death incidents. Explainability analysis showed DeepHHF focused on arrhythmias and heart abnormalities. This study highlights the feasibility of deep learning to model 24-hour continuous ECG data, capturing paroxysmal events essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.
Source
[5] https://doi.org/10.1038/s41746-026-02835-8
6. Attention-guided fair AI modeling for skin cancer diagnosis
Journal: npj Digital Medicine
Original title: Attention guided fair artificial intelligence modeling for skin cancer diagnosis.
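Summary
Purpose: To mitigate gender bias in dermatologic AI while preserving performance on binary skin cancer (malignancy) classification.
Methods: The authors propose LesionAttn, an algorithm that explicitly directs model attention toward the lesion region and combines this with Pareto Frontier optimization for dual-objective model selection. It was compared with existing bias-mitigation algorithms on two large-scale dermatologic datasets.
Results: LesionAttn significantly reduced gender bias while maintaining high diagnostic performance, and outperformed existing bias-mitigation methods. The abstract gives no specific AUC or fairness-gap values, but states that the approach balances fairness and diagnostic performance.
Conclusion: Anchoring model attention on medically essential features is a practical way to advance both performance and fairness, and offers a reference for more reliable clinical dermatology AI.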
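As an illustration of what Pareto-frontier model selection over two objectives can look like, the sketch below keeps only the candidates that are not dominated on diagnostic performance (higher is better) and gender bias gap (lower is better). The candidate names, the metric values, and the exact definition of both objectives are hypothetical; the abstract does not specify how LesionAttn defines or measures them.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    performance: float  # e.g. a diagnostic metric, higher is better
    bias_gap: float     # e.g. a between-gender metric gap, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True when `a` is at least as good as `b` on both objectives
    and strictly better on at least one."""
    no_worse = a.performance >= b.performance and a.bias_gap <= b.bias_gap
    strictly_better = a.performance > b.performance or a.bias_gap < b.bias_gap
    return no_worse and strictly_better


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical checkpoints from one training run.
candidates = [
    Candidate("ckpt_a", 0.90, 0.08),
    Candidate("ckpt_b", 0.88, 0.03),
    Candidate("ckpt_c", 0.85, 0.05),  # worse than ckpt_b on both objectives
    Candidate("ckpt_d", 0.80, 0.01),
]

frontier_names = [c.name for c in pareto_frontier(candidates)]
print(frontier_names)
```

The frontier only narrows the choice; picking a single model from it still requires a stated preference between performance and fairness.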
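Editor's comment
If fairness relies only on post-processing constraints, discriminative ability is easily sacrificed. Injecting clinical priors directly into how attention is allocated is a more promising route than simply adding a penalty term to the loss function.
Original abstract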
Artificial intelligence (AI) has shown promise in dermatology, offering accurate and non-invasive diagnosis of skin cancer. While extensive research has addressed skin-tone bias, gender bias in dermatologic AI remains underexplored, potentially perpetuating diagnostic disparities. In this study, we developed LesionAttn, an algorithm designed to mitigate gender bias by directing model attention toward lesions, thereby mirroring clinicians' diagnostic focus. Combined with Pareto Frontier optimization for dual-objective model selection, LesionAttn balances gender fairness and diagnostic performance. Validated on two large-scale dermatologic datasets for binary malignancy classification, LesionAttn significantly mitigated gender bias while maintaining high diagnostic performance, outperforming existing bias-mitigation algorithms. Our study demonstrates that explicitly guiding model attention to medically essential features provides a practical approach to advance both performance and fairness in dermatologic AI. By leveraging clinical priors to bridge the gap between human expertise and algorithmic optimization, this study demonstrates a feasible pathway for developing equitable and reliable diagnostic tools.
Source
[6] https://doi.org/10.1038/s41746-026-02897-8