| Title: | Adaptive stratified sampling design in two-phase studies for average causal effect estimation |
| Journal: | Biometrics |
| Published: | 8 Oct 2025 |
| Pubmed: | https://pubmed.ncbi.nlm.nih.gov/41140202/ |
| DOI: | https://doi.org/10.1093/biomtc/ujaf143 |
Causal inference using observational data often suffers from numerous confounding effects, and average causal effect (ACE) estimates can be greatly distorted if the confounders are ignored. Information on some confounders, such as genetic biomarkers and medical imaging, is prohibitively expensive to obtain in practice. Two-phase studies are a resource-efficient solution to this problem. In such studies, the outcome, treatment, and inexpensive confounders are measured for a large number of subjects in the first phase; costly confounder measurements are then collected for a limited number of subjects in the second phase. An efficient statistical design is essential for controlling the cost arising in the second phase. In this paper, we propose an adaptive stratified sampling design (AdaStrat), which minimizes the variance of the ACE estimator for a given second-phase sample size. AdaStrat begins by gathering costly confounder measures for randomly selected pilot data, which are used to develop a stratification strategy and determine the sampling probabilities of the strata. The resulting stratification and sampling strategy is then applied to all first-phase subjects to determine which second-phase subjects receive costly confounder measures. We rigorously show that AdaStrat produces a more efficient ACE estimator than existing sampling designs whose strata are fixed in advance. Finite-sample properties of AdaStrat were evaluated through simulation studies, demonstrating its superiority over the fixed stratified sampling design (FixStrat), with relative efficiencies ranging from 20% to 30% in our simulation settings. The desired finite-sample properties of AdaStrat were further confirmed through an application to UK Biobank data.
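The pilot-then-allocate workflow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' AdaStrat algorithm: it keeps the strata fixed rather than learning them from the pilot, and it substitutes classical Neyman allocation (second-phase draws proportional to stratum size times the pilot standard deviation of the costly variable) for the paper's ACE-variance objective. The function names `neyman_allocation` and `two_phase_design`, and the inputs `strata` and `costly`, are hypothetical and introduced only for illustration.

```python
import random
import statistics


def neyman_allocation(sizes, sds, budget):
    """Split `budget` draws across strata in proportion to size * SD,
    never asking a stratum for more subjects than it contains."""
    weights = [n * s for n, s in zip(sizes, sds)]
    if sum(weights) == 0:
        weights = list(sizes)  # no spread information: fall back to proportional
    total = sum(weights)
    raw = [budget * w / total for w in weights]
    alloc = [min(int(r), n) for r, n in zip(raw, sizes)]
    # Hand out the remaining draws, largest fractional remainder first.
    order = sorted(range(len(sizes)), key=lambda h: raw[h] - int(raw[h]), reverse=True)
    leftover = min(budget, sum(sizes)) - sum(alloc)
    while leftover > 0:
        for h in order:
            if leftover == 0:
                break
            if alloc[h] < sizes[h]:
                alloc[h] += 1
                leftover -= 1
    return alloc


def two_phase_design(strata, costly, n_pilot, n_second, seed=0):
    """Illustrative two-phase selection (assumed setup, not the paper's method).

    strata[i]: stratum label of subject i, known from cheap first-phase data.
    costly[i]: expensive measurement, treated as observed only once selected.
    """
    rng = random.Random(seed)
    ids = list(range(len(strata)))
    pilot = rng.sample(ids, n_pilot)  # step 1: simple random pilot
    pilot_set = set(pilot)
    labels = sorted(set(strata))
    sds = []
    for h in labels:  # step 2: per-stratum spread estimated from the pilot
        vals = [costly[i] for i in pilot if strata[i] == h]
        sds.append(statistics.stdev(vals) if len(vals) > 1 else 0.0)
    pools = [[i for i in ids if strata[i] == h and i not in pilot_set] for h in labels]
    # step 3: spend the remaining second-phase budget by Neyman allocation
    alloc = neyman_allocation([len(p) for p in pools], sds, n_second - n_pilot)
    selected = list(pilot)
    for pool, k in zip(pools, alloc):  # step 4: random draws within each stratum
        selected.extend(rng.sample(pool, k))
    return selected, dict(zip(labels, alloc))
```

In this sketch a stratum with fewer than two pilot subjects gets a standard deviation of zero and therefore no further draws; a real design would need a more careful rule for sparse strata.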
| Application ID | Title |
|---|---|
| 96744 | Causal relationship between psychosocial factors and cancer prognosis mediated by gene expressions |