Publication 19647

Title:	Can a Natural Image-Based Foundation Model Outperform a Retina-Specific Model in Detecting Ocular and Systemic Diseases?
Journal:	Ophthalmology Science
Published:	27 Aug 2025
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/41140901/
DOI:	https://doi.org/10.1016/j.xops.2025.100923
URL:	https://www.ophthalmologyscience.org/article/S2666-9145(25)00221-0/pdf

Abstract

Purpose: DINOv2 is a natural image-based foundation model (FM), pretrained exclusively on 142 million natural images from the LVD-142M data set. In contrast, RETFound is a retina-specific FM, pretrained on ∼3 million images, including natural images, color fundus photos, and OCT images (∼1 million each). Despite DINOv2's massive pretraining data set, its application in ophthalmology and relative performance to domain-specific FMs remain understudied. To address this gap, we conducted a head-to-head comparative evaluation between DINOv2 and RETFound models across a range of downstream ocular and systemic disease tasks.

Design: Retrospective head-to-head evaluation.

Subjects: Ocular disease detection tasks included diabetic retinopathy (DR), glaucoma, and multiclass eye diseases, whereas systemic disease incidence prediction focused on the 3-year incidence of heart failure, myocardial infarction, and ischemic stroke. Eight open-source data sets (APTOS-2019, IDRID, MESSIDOR2 for DR; PAPILA, Glaucoma Fundus for glaucoma; JSIEC, Retina, OCTID for multiclass eye diseases) and the Moorfields AlzEye data set (for systemic diseases) were used for fine-tuning and internal testing. External test sets included the same open-source data sets (cross-dataset validation) and the UK Biobank (for systemic diseases).

Methods: We replicated the fine-tuning methodology from the original RETFound study on 3 DINOv2 models (large, base, small). All models were fine-tuned on the respective data sets and evaluated through internal and external testing.

Main Outcome Measures: The area under the receiver operating characteristic curve (AUC) was used to measure performance, and 2-sided t-tests were used to compare the models.
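As an illustration of these two measures, the following minimal Python sketch computes the area under the receiver operating characteristic curve as the fraction of positive-negative pairs that a model ranks correctly, and a Welch t statistic for comparing two sets of scores. It is not the study's code. The function names, the toy labels and scores, and the choice of the Welch (unequal-variance) form are assumptions made for illustration; the abstract does not state how the t-tests were set up. Converting the statistic to a 2-sided P value requires the t distribution and is left out here.

```python
from math import sqrt
from statistics import mean, variance


def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed test form)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Toy data only, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75

# Hypothetical AUCs from repeated runs of two models (illustrative values).
auc_model_a = [0.90, 0.92, 0.91]
auc_model_b = [0.85, 0.86, 0.84]
print(welch_t(auc_model_a, auc_model_b))
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch; a rank-based implementation would be the usual choice for large test sets.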

Results: For ocular disease detection, DINOv2 models generally outperformed RETFound. For DR, DINOv2-large achieved AUCs of 0.850 to 0.952, exceeding RETFound's 0.823 to 0.944 (all P ≤ 0.007). For multiclass eye diseases, DINOv2-large (AUC = 0.892, Retina data set) surpassed RETFound (AUC = 0.846, P < 0.001). For glaucoma, DINOv2-base (AUC = 0.958, Glaucoma Fundus) outperformed RETFound (AUC = 0.940, P < 0.001). Conversely, for systemic disease incidence prediction, RETFound achieved superior AUCs of 0.796 (heart failure), 0.732 (myocardial infarction), and 0.754 (ischemic stroke), outperforming the best DINOv2 models (AUCs of 0.663 to 0.771, all P < 0.001). This trend persisted in external validation.

Conclusions: Our findings reveal the merits of DINOv2 in ocular disease detection tasks, whereas RETFound demonstrates an edge in systemic disease incidence prediction. These findings showcase the distinct scenarios where general-purpose and domain-specific FMs excel, highlighting the importance of aligning FM selection with task-specific requirements to optimize clinical performance.

Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

22 Authors

Qingshan Hou

Yukun Zhou

Jocelyn Hui Lin Goh

Ke Zou

Samantha Min Er Yew

Sahana Srinivasan

Meng Wang

Thaddaeus Wai Soon Lo

Xiaofeng Lei

Siegfried K. Wagner

Mark A. Chia

Gabriel Dawei Yang

Hongyang Jiang

An Ran Ran

Rui Santos

Gabor Mark Somfai

Juan Helen Zhou

Haoyu Chen

Qingyu Chen

Carol Y. Cheung

Pearse A. Keane

Yih Chung Tham