Improving generalization of machine learning-identified biomarkers with causal modeling: an investigation into immune receptor diagnostics

Pavlović, Milena; Hajj, Ghadi S. Al; Kanduri, Chakravarthi; Pensar, Johan; Wood, Mollie; Sollid, Ludvig M.; Greiff, Victor; Sandve, Geir Kjetil

arXiv:2204.09291 (q-bio)

[Submitted on 20 Apr 2022 (v1), last revised 3 Apr 2023 (this version, v2)]

Authors: Milena Pavlović, Ghadi S. Al Hajj, Chakravarthi Kanduri, Johan Pensar, Mollie Wood, Ludvig M. Sollid, Victor Greiff, Geir Kjetil Sandve

Abstract: Machine learning is increasingly used to discover diagnostic and prognostic biomarkers from high-dimensional molecular data. However, a variety of factors related to experimental design may affect the ability to learn generalizable and clinically applicable diagnostics. Here, we argue that a causal perspective improves the identification of these challenges and formalizes their relation to the robustness and generalization of machine learning-based diagnostics. To make the discussion concrete, we focus on a specific, recently established high-dimensional biomarker: adaptive immune receptor repertoires (AIRRs). Through simulations, we illustrate how major biological and experimental factors of the AIRR domain may influence the learned biomarkers. In conclusion, we argue that causal modeling improves machine learning-based biomarker robustness by identifying stable relations between variables and by guiding the adjustment of the relations and variables that vary between populations.
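
The generalization problem described in the abstract can be illustrated with a toy simulation. The following is a minimal sketch and not the simulation used in the paper: the variable names, the effect sizes, the fixed 0.5 threshold (used in place of a trained model), and the 0.95 and 0.50 confounding strengths are all assumptions chosen for illustration. One signal depends directly on disease status in every population, while a second signal depends on a confounder (for example a batch or age group) whose association with disease differs between the source and target populations.

```python
import random


def simulate(n, p_confounder_matches_disease, rng):
    """Draw n samples of (disease, causal_signal, confounded_signal).

    disease -> causal_signal is the same in every population (stable relation).
    confounder -> confounded_signal is also the same everywhere, but how closely
    the confounder tracks disease is a property of the population (assumption).
    """
    rows = []
    for _ in range(n):
        disease = rng.random() < 0.5
        # The confounder agrees with disease status with the given probability.
        agrees = rng.random() < p_confounder_matches_disease
        confounder = disease if agrees else not disease
        causal_signal = rng.gauss(1.0 if disease else 0.0, 0.5)
        confounded_signal = rng.gauss(1.0 if confounder else 0.0, 0.2)
        rows.append((disease, causal_signal, confounded_signal))
    return rows


def accuracy(rows, feature_index, threshold=0.5):
    """Accuracy of predicting disease by thresholding a single feature."""
    hits = sum((row[feature_index] > threshold) == row[0] for row in rows)
    return hits / len(rows)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
source = simulate(5000, 0.95, rng)  # confounder strongly tied to disease
target = simulate(5000, 0.50, rng)  # confounder unrelated to disease

results = {
    name: (accuracy(source, idx), accuracy(target, idx))
    for name, idx in (("causal", 1), ("confounded", 2))
}
for name, (src, tgt) in results.items():
    print(f"{name:>10}: source={src:.2f} target={tgt:.2f}")
```

Under these assumptions, the confounded signal looks like the better biomarker in the source population but drops to roughly chance level in the target population, whereas the signal that depends directly on disease performs about the same in both.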

Subjects:	Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)
Cite as:	arXiv:2204.09291 [q-bio.QM]
	(or arXiv:2204.09291v2 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2204.09291

Submission history

From: Milena Pavlović
[v1] Wed, 20 Apr 2022 08:15:54 UTC (1,258 KB)
[v2] Mon, 3 Apr 2023 09:03:07 UTC (1,395 KB)
