Abstract
Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here we argue that a causal perspective improves the identification of these challenges and formalizes their relation to the robustness and generalization of machine learning-based diagnostics. To make for a concrete discussion, we focus on a specific, recently established high-dimensional biomarker—adaptive immune receptor repertoires (AIRRs). Through simulations, we illustrate how major biological and experimental factors of the AIRR domain may influence the learned biomarkers. In conclusion, we argue that causal modelling improves machine learning-based biomarker robustness by identifying stable relations between variables and guiding the adjustment of the relations and variables that vary between populations.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
9,800 Yen / 30 days
cancel any time
Subscribe to this journal
Receive 12 digital issues and online access to articles
¥14,900 per year
only ¥1,242 per issue
Buy this article
- Purchase on SpringerLink
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout





Similar content being viewed by others
Data availability
All data and results for the analysis presented in the manuscript are openly available on Zenodo at https://zenodo.org/record/7756163 (experiment 1), https://zenodo.org/record/7752837 (experiment 2), https://zenodo.org/record/7752115 (experiment 3, AIRR setting) and https://zenodo.org/record/7727894 (experiment 3, transcriptomic setting).
Code availability
The source code for the experiments is openly available on GitHub at https://github.com/uio-bmi/causalairr.
References
Frazer, K. A., Murray, S. S., Schork, N. J. & Topol, E. J. Human genetic variation and its contribution to complex traits. Nat. Rev. Genet. 10, 241–251 (2009).
Locke, W. J. et al. DNA methylation cancer biomarkers: translation to the clinic. Front. Genet. 10, 1150 (2019).
Byron, S. A., Van Keuren-Jensen, K. R., Engelthaler, D. M., Carpten, J. D. & Craig, D. W. Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat. Rev. Genet. 17, 257–271 (2016).
Huang, K., Wu, L. & Yang, Y. Gut microbiota: an emerging biological diagnostic and treatment approach for gastrointestinal diseases. JGH Open 5, 973–975 (2021).
Arnaout, R. A. et al. The future of blood testing is the immunome. Front. Immunol 12, 626793 (2021).
Strimbu, K. & Tavel, J. A. What are biomarkers? Curr. Opin. HIV AIDS 5, 463–466 (2010).
Subbaswamy, A. & Saria, S. From development to deployment: dataset shift, causality and shift-stable models in health AI. Biostatistics 21, 345–352 (2020).
Castro, D. C., Walker, I. & Glocker, B. Causality matters in medical imaging. Nat. Commun. 11, 3673 (2020).
Whalen, S., Schreiber, J., Noble, W. S. & Pollard, K. S. Navigating the pitfalls of applying machine learning in genomics. Nat. Rev. Genet. 23, 169–181 (2021).
Dockès, J., Varoquaux, G. & Poline, J.-B. Preventing dataset shift from breaking machine-learning biomarkers. GigaScience. 10, giab055 (2021).
Daumé, H. & Marcu, D. Domain adaptation for statistical classifiers. J. Artif. Intell. Res. 26, 101–126 (2006).
Kouw, W. M. & Loog, M. A review of domain adaptation without target labels. IEEE Trans. Pattern Anal. Mach. Intell. 43, 766–785 (2021).
Wang, J. et al. Generalizing to unseen domains: a survey on domain generalization. IEEE Trans. Knowl. Data Eng. 35, 8052–8072 (2023).
Gulrajani, I. & Lopez-Paz, D. In search of lost domain generalization. Preprint at https://arxiv.org/abs/2007.01434 (2020).
Liu, J. et al. Towards out-of-distribution generalization: a survey. Preprint at https://doi.org/10.48550/arXiv.2108.13624 (2023).
Pearl, J. Causality (Cambridge Univ. Press, 2009); https://doi.org/10.1017/CBO9780511803161
Peters, J., Janzing, D. & Schölkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms (MIT Press, 2017).
Hernán, M. & Robins, J. Causal Inference: What If (Chapman & Hall/CRC, 2020).
Rothenhäusler, D. & Bühlmann, P. Distributionally robust and generalizable inference. Statist. Sci. 38, 527–542 (2023).
Kaddour, J., Lynch, A., Liu, Q., Kusner, M. J. & Silva, R. Causal machine learning: a survey and open problems. Preprint at https://doi.org/10.48550/arXiv.2206.15475 (2022).
Heinze-Deml, C., Maathuis, M. H. & Meinshausen, N. Causal structure learning. Annu. Rev. Stat. Appl. 5, 371–391 (2018).
Squires, C. & Uhler, C. Causal structure learning: a combinatorial perspective. Found. Comput. Math. https://doi.org/10.1007/s10208-022-09581-9 (2022).
Peters, J., Bühlmann, P. & Meinshausen, N. Causal inference by using invariant prediction: identification and confidence intervals. J. R. Stat. Soc. B Stat. Methodol. 78, 947–1012 (2016).
Arjovsky, M., Bottou, L., Gulrajani, I. & Lopez-Paz, D. Invariant risk minimization. Preprint at https://doi.org/10.48550/arXiv.1907.02893 (2020).
Jiang, Y. & Veitch, V. Invariant and transportable representations for anti-causal domain shifts. Adv. Neural Inf. Process Syst. 35, 20782–20794 (2022).
Magliacane, S. et al. Domain adaptation by using causal inference to predict invariant conditional distributions. Adv. Neural Inf. Process Syst. 31, 10846–10856 (2018).
Schölkopf, B. et al. Toward causal representation learning. Proc. IEEE 109, 612–634 (2021).
Cui, P. & Athey, S. Stable learning establishes some common ground between causal inference and machine learning. Nat. Mach. Intell. 4, 110–115 (2022).
Bareinboim, E. & Pearl, J. Causal inference and the data-fusion problem. Proc. Natl Acad. Sci. USA 113, 7345–7352 (2016).
Richens, J. G., Lee, C. M. & Johri, S. Improving the accuracy of medical diagnosis with causal machine learning. Nat. Commun. 11, 3923 (2020).
Prosperi, M. et al. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nat. Mach. Intell. 2, 369–375 (2020).
Raita, Y., Camargo, C. A., Liang, L. & Hasegawa, K. Big data, data science and causal inference: a primer for clinicians. Front. Med. 8, 678047 (2021).
Schölkopf, B. et al. On causal and anticausal learning. In Proc. 29th International Conference on Machine Learning 459–466 (Omnipress, 2012).
Greiff, V., Yaari, G. & Cowell, L. Mining adaptive immune receptor repertoires for biological and clinical information using machine learning. Curr. Opin. Syst. Biol. https://doi.org/10.1016/j.coisb.2020.10.010 (2020).
Emerson, R. O. et al. Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire. Nat. Genet. 49, 659–665 (2017).
Kelly, C. J., Karthikesalingam, A., Suleyman, M., Corrado, G. & King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 17, 195 (2019).
Britanova, O. V. et al. Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling. J. Immunol. 192, 2689–2698 (2014).
Schneider-Hohendorf, T. et al. Sex bias in MHC I-associated shaping of the adaptive immune system. Proc. Natl Acad. Sci. USA 115, 2168–2173 (2018).
Slabodkin, A. et al. Individualized VDJ recombination predisposes the available Ig sequence space. Genome Res. 31, 2209–2224 (2021).
Dendrou, C. A., Petersen, J., Rossjohn, J. & Fugger, L. HLA variation and disease. Nat. Rev. Immunol. 18, 325–339 (2018).
Ishigaki, K. et al. HLA autoimmune risk alleles restrict the hypervariable region of T cell receptors. Nat. Genet. 54, 393–402 (2022).
Barennes, P. et al. Benchmarking of T cell receptor repertoire profiling methods reveals large systematic biases. Nat. Biotechnol. 39, 236–245 (2021).
Trück, J. et al. Biological controls for standardization and interpretation of adaptive immune receptor repertoire profiling. eLife 10, e66274 (2021).
Smirnova, A. O. et al. The use of non-functional clonotypes as a natural calibrator for quantitative bias correction in adaptive immune receptor repertoire profiling. eLife 12, e69157 (2023).
Krishna, C., Chowell, D., Gönen, M., Elhanati, Y. & Chan, T. A. Genetic and environmental determinants of human TCR repertoire diversity. Immun. Ageing 17, 26 (2020).
Klein, S. L. & Flanagan, K. L. Sex differences in immune responses. Nat. Rev. Immunol. 16, 626–638 (2016).
Castelo-Branco, C. & Soveral, I. The immune system and aging: a review. Gynecol. Endocrinol. 30, 16–22 (2014).
Hernán, M. A., Hsu, J. & Healy, B. A second chance to get causal inference right: a classification of data science tasks. Chance 32, 42–49 (2019).
Blaas, A., Miller, A., Zappella, L., Jacobsen, J.-H. & Heinze-Deml, C. Considerations for distribution shift robustness in health. In Proc. Machine Learning for Healthcare Workshop (ICLR, 2023).
Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739 (2010).
Bonaguro, L. et al. A guide to systems-level immunomics. Nat. Immunol. 23, 1412–1423 (2022).
Bareinboim, E. & Pearl, J. Controlling selection bias in causal inference. In Proc. 15th International Conference on Artificial Intelligence and Statistics Vol. 22 (eds Lawrence, N. et al.), 100–108 (PMLR, 2012).
Correa, J., Tian, J. & Bareinboim, E. Generalized adjustment under confounding and selection biases. In Proc. 32nd AAAI Conference on Artificial Intelligence Vol. 32, 6335–6342 (AAAI, 2018).
Laubach, Z. M., Murray, E. J., Hoke, K. L., Safran, R. J. & Perng, W. A biologist’s guide to model selection and causal inference. Proc. R. Soc. B Biol. Sci. 288, 20202815 (2021).
Hernán, M. A., Hernández-Díaz, S. & Robins, J. M. A structural approach to selection bias. Epidemiology 15, 615–625 (2004).
Zhang, K., Schölkopf, B., Muandet, K. & Wang, Z. Domain adaptation under target and conditional shift. In Proc. International Conference on Machine Learning 28 (eds Dasgupta, S. et al.) 819–827 (PMLR, 2013).
Garg, S., Wu, Y., Balakrishnan, S. & Lipton, Z. C. A unified view of label shift estimation. Adv. Neural Inf. Proc. Syst. 33, 3290–3300 (2020).
Pearl, J. & Bareinboim, E. External validity: from Do-calculus to transportability across populations. Stat. Sci. 29, 579–595 (2014).
Degtiar, I. & Rose, S. A review of generalizability and transportability. Annu. Rev. Stat. Appl. 10, 501–524 (2023).
Sharon, E. et al. Genetic variation in MHC proteins is associated with T cell receptor expression biases. Nat. Genet. 48, 995–1002 (2016).
Jabri, B. & Sollid, L. M. T cells in Celiac disease. J. Immunol. 198, 3005–3014 (2017).
Schaafsma, E., Fugle, C. M., Wang, X. & Cheng, C. Pan-cancer association of HLA gene expression with cancer prognosis and immunotherapy efficacy. Br. J. Cancer 125, 422–432 (2021).
Rappazzo, C. G. et al. Defining and studying B cell receptor and TCR interactions. J. Immunol. 211, 311–322 (2023).
Hendrycks, D., Lee, K. & Mazeika, M. Using pre-training can improve model robustness and uncertainty. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 2712–2721 (PMLR, 2019).
Pradier, M. F. et al. AIRIVA: a deep generative model of adaptive immune repertoires. Preprint at https://doi.org/10.48550/arXiv.2304.13737 (2023).
Gao, Y. et al. Pan-Peptide meta learning for T-cell receptor–antigen binding recognition. Nat. Mach. Intell. 5, 236–249 (2023).
Ostrovsky-Berman, M., Frankel, B., Polak, P. & Yaari, G. Immune2vec: embedding B/T cell receptor sequences in ℝN using natural language processing. Front. Immunol. 12, 680687 (2021).
Fang, Y., Liu, X. & Liu, H. Attention-aware contrastive learning for predicting T cell receptor–antigen binding specificity. Brief. Bioinform. 23, bbac378 (2022).
Gupta, G., Kapila, R., Gupta, K. & Raskar, R. Domain generalization in robust invariant representation. Preprint at https://doi.org/10.48550/arXiv.2304.03431 (2023).
Zhang, J. & Bottou, L. Learning useful representations for shifting tasks and distributions. In Proc. 40th International Conference on Machine Learning (eds Krause, A et al.), 40830–40850 (PMLR, 2023).
Walsh, I. et al. DOME: recommendations for supervised machine learning validation in biology. Nat. Methods 18, 1122–1127 (2021).
Wiles, O. et al. A fine-grained analysis on distribution shift. Preprint at https://arxiv.org/abs/2110.11328 (2021).
Byrd, J. & Lipton, Z. What is the effect of importance weighting in deep learning? In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 872–881 (PMLR, 2019).
Rubelt, F. et al. Adaptive Immune Receptor Repertoire Community recommendations for sharing immune-repertoire sequencing data. Nat. Immunol. 18, 1274–1278 (2017).
Vander Heiden, J. A. et al. AIRR community standardized representations for annotated immune repertoires. Front. Immunol. 9, 2206 (2018).
Peng, K. et al. Diversity in immunogenomics: the value and the challenge. Nat. Methods 18, 588–591 (2021).
Huang, Y.-N. et al. Ancestral diversity is limited in published T cell receptor sequencing studies. Immunity 54, 2177–2179 (2021).
Registered Reports (Center for Open Science); https://www.cos.io/initiatives/registered-reports
DeWitt, W. S. III et al. Human T cell receptor occurrence patterns encode immune history, genetic background and receptor specificity. eLife 7, e38358 (2018).
Zaslavsky, M. E. et al. Disease diagnostics using machine learning of immune receptors. Preprint at bioRxiv https://doi.org/10.1101/2022.04.26.489314 (2023).
Langenberg, C., Hingorani, A. D. & Whitty, C. J. M. Biological and functional multimorbidity—from mechanisms to management. Nat. Med. 29, 1649–1657 (2023).
Bongers, S., Forré, P., Peters, J. & Mooij, J. M. Foundations of structural causal models with cycles and latent variables. Ann. Stat. 49, 2885–2915 (2021).
Chakraborty, B. & Murphy, S. A. Dynamic treatment regimes. Annu. Rev. Stat. Appl. 1, 447–464 (2014).
Bizzarri, M. et al. A call for a better understanding of causation in cell biology. Nat. Rev. Mol. Cell Biol. 20, 261–262 (2019).
Baron, R. M. & Kenny, D. A. The moderator–mediator variable distinction in social psychological research: conceptual, strategic and statistical considerations. J. Pers. Soc. Psychol. 51, 1173–1182 (1986).
Greiff, V., Miho, E., Menzel, U. & Reddy, S. T. Bioinformatic and statistical analysis of adaptive immune repertoires. Trends Immunol. 36, 738–749 (2015).
Nikolich-Žugich, J., Slifka, M. K. & Messaoudi, I. The many important facets of T-cell repertoire diversity. Nat. Rev. Immunol. 4, 123–132 (2004).
Zarnitsyna, V., Evavold, B., Schoettle, L., Blattman, J. & Antia, R. Estimating the diversity, completeness, and cross-reactivity of the T cell repertoire. Front. Immunol. 4, 485 (2013).
Murugan, A., Mora, T., Walczak, A. M. & Callan, C. G. Statistical inference of the generation probability of T-cell receptors from sequence repertoires. Proc. Natl Acad. Sci. USA 109, 16161–16166 (2012).
Tonegawa, S. Somatic generation of antibody diversity. Nature 302, 575–581 (1983).
Weinstein, J. A., Jiang, N., White, R. A., Fisher, D. S. & Quake, S. R. High-throughput sequencing of the zebrafish antibody repertoire. Science 324, 807–810 (2009).
Xu, J. L. & Davis, M. M. Diversity in the CDR3 region of VH is sufficient for most antibody specificities. Immunity 13, 37–45 (2000).
Davis, M. M. & Bjorkman, P. J. T-cell antigen receptor genes and T-cell recognition. Nature 334, 395–402 (1988).
Brown, A. J. et al. Augmenting adaptive immunity: progress and challenges in the quantitative engineering and analysis of adaptive immune receptor repertoires. Mol. Syst. Des. Eng. 4, 701–736 (2019).
Qi, Q. et al. Diversity and clonal selection in the human T-cell repertoire. Proc. Natl Acad. Sci. USA 111, 13139–13144 (2014).
Elhanati, Y. et al. Inferring processes underlying B-cell repertoire diversity. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 370, 20140243 (2015).
Greiff, V. et al. A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status. Genome Med. 7, 49 (2015).
Elhanati, Y., Sethna, Z., Callan, C. G. Jr, Mora, T. & Walczak, A. M. Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination. Immunol. Rev. 284, 167–179 (2018).
Varoquaux, G. & Cheplygina, V. Machine learning for medical imaging: methodological failures and recommendations for the future. Npj Digit. Med. 5, 48 (2022).
Ben-David, S. et al. A theory of learning from different domains. Mach. Learn. 79, 151–175 (2010).
Acknowledgements
We acknowledge generous support by The Leona M. and Harry B. Helmsley Charitable Trust (grant no. 2019PG-T1D011 to V.G.), the UiO World-Leading Research Community (to V.G. and L.M.S.), the UiO:LifeScience Convergence Environment Immunolingo (to V.G. and G.K.S.), the UiO:LifeScience Convergence Environment RealArt (to G.K.S. and C.K.), EU Horizon 2020 iReceptorplus (grant no. 825821 to V.G. and L.M.S.), a Research Council of Norway FRIPRO project (grant no. 300740 to V.G.), a Norwegian Cancer Society Grant (215817 to V.G.), a Research Council of Norway IKTPLUSS project (grant no. 311341 to V.G. and G.K.S.) and Stiftelsen Kristian Gerhard Jebsen (K.G. Jebsen Coeliac Disease Research Centre, to L.M.S. and G.K.S.).
Author information
Authors and Affiliations
Contributions
M.P., V.G. and G.K.S. conceived the study. M.P., G.S.A.H. and C.K. performed the experiments. J.P., M.E.W., L.M.S., C.K. and G.S.A.H. provided critical feedback. M.P., V.G. and G.K.S. drafted the manuscript. G.K.S. supervised the project. All authors read and approved the final manuscript and are personally accountable for its content.
Corresponding authors
Ethics declarations
Competing interests
V.G. declares advisory board positions in aiNET GmbH, Enpicom BV, Absci, Omniscope and Diagonal Therapeutics. V.G. is a consultant for Adaptyv Biosystems, Specifica Inc., Roche/Genentech, immunai and LabGenius. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Anna Susmelj and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary text with details for experiments.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pavlović, M., Al Hajj, G.S., Kanduri, C. et al. Improving generalization of machine learning-identified biomarkers using causal modelling with examples from immune receptor diagnostics. Nat Mach Intell 6, 15–24 (2024). https://doi.org/10.1038/s42256-023-00781-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s42256-023-00781-8