Abstract
Purpose: DINOv2 is a natural image-based foundation model (FM), pretrained exclusively on 142 million natural images from the LVD-142M data set. In contrast, RETFound is a retina-specific FM, pretrained on ∼3 million images, including natural images, color fundus photos, and OCT images (∼1 million each). Despite DINOv2's massive pretraining data set, its application in ophthalmology and its performance relative to domain-specific FMs remain understudied. To address this gap, we conducted a head-to-head comparative evaluation between DINOv2 and RETFound models across a range of downstream ocular and systemic disease tasks.
Design: Retrospective head-to-head evaluation.
Subjects: Ocular disease detection tasks included diabetic retinopathy (DR), glaucoma, and multiclass eye diseases, whereas systemic disease incidence prediction focused on the 3-year incidence of heart failure, myocardial infarction, and ischemic stroke. Eight open-source data sets (APTOS-2019, IDRID, MESSIDOR2 for DR; PAPILA, Glaucoma Fundus for glaucoma; JSIEC, Retina, OCTID for multiclass eye diseases) and the Moorfields AlzEye data set (for systemic diseases) were used for fine-tuning and internal testing. External test sets included the same open-source data sets (cross-dataset validation) and the UK Biobank (for systemic diseases).
Methods: We replicated the fine-tuning methodology from the original RETFound study on 3 DINOv2 models (large, base, small). All models were fine-tuned on the respective data sets and evaluated through internal and external testing.
Main Outcome Measures: Area under the receiver operating characteristic curve (AUC) and 2-sided t-tests were used to compare model performance.
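As an illustration of the outcome measures above, the following is a minimal sketch, not the study's actual analysis code. It computes the AUC from labels and scores using the rank-sum identity, then compares two sets of per-run AUCs with a 2-sided t-test. Several points are assumptions made for illustration: the Welch (unequal-variance) form of the test, the function names `auroc`, `t_pdf`, and `welch_t_test`, and the per-run AUC values, which are invented and are not results from this study.

```python
import math
from statistics import mean, variance


def auroc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of samples tied with the score at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def welch_t_test(a, b, steps=20000):
    """2-sided Welch t-test; returns (t statistic, degrees of freedom, p value)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # p = 1 - P(-|t| <= T <= |t|); integrate the density on [0, |t|] with
    # Simpson's rule (steps is even) and double it by symmetry.
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = 2 * (h / 3) * total
    return t, df, max(0.0, 1.0 - central)


# Hypothetical per-run AUCs (made-up numbers, NOT results from the study).
model_a = [0.91, 0.92, 0.90, 0.93, 0.92]
model_b = [0.88, 0.89, 0.87, 0.90, 0.89]
t_stat, dof, p_value = welch_t_test(model_a, model_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}, two-sided p = {p_value:.4f}")
```

The numerical integration keeps the sketch free of third-party dependencies. In practice a statistics library would normally be used, and the choice between paired, pooled-variance, and Welch tests should follow the study design.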
Results: For ocular disease detection, DINOv2 models generally outperformed RETFound. For DR, DINOv2-large achieved AUCs of 0.850 to 0.952, exceeding RETFound's 0.823 to 0.944 (all P ≤ 0.007). For multiclass eye diseases, DINOv2-large (AUC = 0.892, Retina data set) surpassed RETFound (AUC = 0.846, P < 0.001). For glaucoma, DINOv2-base (AUC = 0.958, Glaucoma Fundus) outperformed RETFound (AUC = 0.940, P < 0.001). Conversely, for systemic disease incidence prediction, RETFound achieved superior AUCs of 0.796 (heart failure), 0.732 (myocardial infarction), and 0.754 (ischemic stroke), exceeding the AUCs of the best-performing DINOv2 models (0.663 to 0.771, all P < 0.001). This trend persisted in external validation.
Conclusions: Our findings reveal the merits of DINOv2 in ocular disease detection tasks, whereas RETFound demonstrates an edge in systemic disease incidence prediction. These findings showcase the distinct scenarios where general-purpose and domain-specific FMs excel, highlighting the importance of aligning FM selection with task-specific requirements to optimize clinical performance.
Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.