{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T04:38:29Z","timestamp":1722919109988},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,10,1]],"date-time":"2020-10-01T00:00:00Z","timestamp":1601510400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,10,1]],"date-time":"2020-10-01T00:00:00Z","timestamp":1601510400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["LM010098","AI116794","LM012601","DK112217","ES013508"],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"Background: A typical task in bioinformatics consists of identifying which features are associated with a target outcome of interest and building a predictive model. Automated machine learning (AutoML) systems such as the Tree-based Pipeline Optimization Tool (TPOT) constitute an appealing approach to this end. However, in biomedical data, there are often baseline characteristics of the subjects in a study or batch effects that need to be adjusted for in order to better isolate the effects of the features of interest on the target. Thus, the ability to perform covariate adjustments becomes particularly important for applications of AutoML to biomedical big data analysis.\nResults: We developed an approach to adjust for covariates affecting features and\/or target in TPOT. Our approach is based on regressing out the covariates in a manner that avoids \u2018leakage\u2019 during the cross-validation training procedure. We describe applications of this approach to toxicogenomics and schizophrenia gene expression data sets. The TPOT extensions discussed in this work are available at https:\/\/github.com\/EpistasisLab\/tpot\/tree\/v0.11.1-resAdj.\nConclusions: In this work, we address an important need in the context of AutoML, which is particularly crucial for applications to bioinformatics and medical informatics, namely covariate adjustments. To this end we present a substantial extension of TPOT, a genetic programming based AutoML approach. We show the utility of this extension by applications to large toxicogenomics and differential gene expression data. The method is generally applicable in many other scenarios from the biomedical field.","DOI":"10.1186\/s12859-020-03755-4","type":"journal-article","created":{"date-parts":[[2020,10,1]],"date-time":"2020-10-01T05:09:07Z","timestamp":1601528947000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Embedding covariate adjustments in tree-based automated machine learning for biomedical big data analyses"],"prefix":"10.1186","volume":"21","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-4110-3714","authenticated-orcid":false,"given":"Elisabetta","family":"Manduchi","sequence":"first","affiliation":[]},{"given":"Weixuan","family":"Fu","sequence":"additional","affiliation":[]},{"given":"Joseph D.","family":"Romano","sequence":"additional","affiliation":[]},{"given":"Stefano","family":"Ruberto","sequence":"additional","affiliation":[]},{"given":"Jason 
H.","family":"Moore","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,10,1]]},"reference":[{"key":"3755_CR1","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1007\/978-3-319-31204-0_9","volume-title":"Applications of evolutionary computation","author":"RS Olson","year":"2016","unstructured":"Olson RS, Urbanowicz RJ, Andrews PC, Lavender NA, Kidd LC, Moore JH. Automating biomedical data science through tree-based pipeline optimization. In: Squillero G, Burelli P, editors. Applications of evolutionary computation. Cham: Springer; 2016. p. 123\u201337."},{"key":"3755_CR2","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1007\/978-3-030-05318-5_8","volume-title":"Automated machine learning: methods, systems, challenges","author":"RS Olson","year":"2019","unstructured":"Olson RS, Moore JH. TPOT: a tree-based pipeline optimization tool for automating machine learning. In: Hutter F, Kotthoff L, Vanschoren J, editors. Automated machine learning: methods, systems, challenges. Cham: Springer; 2019. p. 151\u201360. https:\/\/doi.org\/10.1007\/978-3-030-05318-5_8."},{"key":"3755_CR3","doi-asserted-by":"publisher","unstructured":"Orlenko A, Moore JH, Orzechowski P, Olson RS, Cairns J, Caraballo PJ, et al. Considerations for automated machine learning in clinical metabolic profiling: altered homocysteine plasma concentration associated with metformin exposure. In: Biocomputing 2018. World Scientific; 2017. p. 460\u201371. Doi: https:\/\/doi.org\/10.1142\/9789813235533_0042.","DOI":"10.1142\/9789813235533_0042"},{"key":"3755_CR4","doi-asserted-by":"crossref","first-page":"1772","DOI":"10.1093\/bioinformatics\/btz796","volume":"36","author":"A Orlenko","year":"2020","unstructured":"Orlenko A, Kofink D, Lyytik\u00e4inen L-P, Nikus K, Mishra P, Kuukasj\u00e4rvi P, et al. Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning. Bioinformatics. 
2020;36:1772\u20138.","journal-title":"Bioinformatics"},{"key":"3755_CR5","doi-asserted-by":"publisher","first-page":"250","DOI":"10.1093\/bioinformatics\/btz470","volume":"36","author":"TT Le","year":"2020","unstructured":"Le TT, Fu W, Moore JH. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics. 2020;36:250\u20136.","journal-title":"Bioinformatics"},{"key":"3755_CR6","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3462-1","author":"F Harrell","year":"2001","unstructured":"Harrell F. Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer, New York. 2001. https:\/\/doi.org\/10.1007\/978-1-4757-3462-1.","journal-title":"Springer, New York."},{"issue":"Database issue","key":"3755_CR7","doi-asserted-by":"publisher","first-page":"D921","DOI":"10.1093\/nar\/gku955","volume":"43","author":"Y Igarashi","year":"2015","unstructured":"Igarashi Y, Nakatsu N, Yamashita T, Ono A, Ohno Y, Urushidani T, et al. Open TG-GATEs: a large-scale toxicogenomics database. Nucleic Acids Res. 2015;43(Database issue):D921\u20137.","journal-title":"Nucleic Acids Res"},{"key":"3755_CR8","doi-asserted-by":"publisher","DOI":"10.1126\/science.aat8464","author":"D Wang","year":"2018","unstructured":"Wang D, Liu S, Warrell J, Won H, Shi X, Navarro FCP, et al. Comprehensive functional genomic resource and integrative model for the human brain. Science. 2018. https:\/\/doi.org\/10.1126\/science.aat8464.","journal-title":"Science"},{"key":"3755_CR9","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1038\/ng1180","volume":"34","author":"VK Mootha","year":"2003","unstructured":"Mootha VK, Lindgren CM, Eriksson K-F, Subramanian A, Sihag S, Lehar J, et al. PGC-1\u03b1-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 
2003;34:267\u201373.","journal-title":"Nat Genet"},{"key":"3755_CR10","unstructured":"MacQueen J. Some methods for classification and analysis of multivariate observations. The Regents of the University of California; 1967. https:\/\/projecteuclid.org\/euclid.bsmsp\/1200512992. Accessed 29 May 2020."},{"key":"3755_CR11","doi-asserted-by":"publisher","first-page":"100","DOI":"10.2307\/2346830","volume":"28","author":"JA Hartigan","year":"1979","unstructured":"Hartigan JA, Wong MA. Algorithm AS 136: a K-means clustering algorithm. Appl Stat. 1979;28:100.","journal-title":"Appl Stat"},{"key":"3755_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v061.i06","volume":"61","author":"M Charrad","year":"2014","unstructured":"Charrad M, Ghazzali N, Boiteau V, Niknafs A. NbClust: an R package for determining the relevant number of clusters in a data set. J Stat Softw. 2014;61:1\u201336.","journal-title":"J Stat Softw"},{"key":"3755_CR13","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1080\/01969727408546059","volume":"4","author":"JC Dunn","year":"1974","unstructured":"Dunn JC. Well-separated clusters and optimal fuzzy partitions. J Cybern. 1974;4:95\u2013104.","journal-title":"J Cybern."},{"key":"3755_CR14","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1016\/S0898-6568(97)00137-X","volume":"10","author":"RH Weiss","year":"1998","unstructured":"Weiss RH. G protein-coupled receptor signalling in the kidney. Cell Signal. 1998;10:313\u201320.","journal-title":"Cell Signal"},{"key":"3755_CR15","doi-asserted-by":"publisher","DOI":"10.3389\/fphys.2015.00219","author":"F Park","year":"2015","unstructured":"Park F. Accessory proteins for heterotrimeric G-proteins in the kidney. Front Physiol. 2015. 
https:\/\/doi.org\/10.3389\/fphys.2015.00219.","journal-title":"Front Physiol"},{"key":"3755_CR16","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1016\/j.matbio.2016.12.003","volume":"57\u201358","author":"OM Viquez","year":"2017","unstructured":"Viquez OM, Yazlovitskaya EM, Tu T, Mernaugh G, Secades P, McKee KK, et al. Integrin alpha6 maintains the structural integrity of the kidney collecting system. Matrix Biol J Int Soc Matrix Biol. 2017;57\u201358:244\u201357.","journal-title":"Matrix Biol J Int Soc Matrix Biol"},{"key":"3755_CR17","doi-asserted-by":"publisher","first-page":"10182","DOI":"10.1038\/ncomms10182","volume":"6","author":"JM Herter","year":"2015","unstructured":"Herter JM, Grabie N, Cullere X, Azcutia V, Rosetti F, Bennett P, et al. AKAP9 regulates activation-induced retention of T lymphocytes at sites of inflammation. Nat Commun. 2015;6:10182.","journal-title":"Nat Commun"},{"key":"3755_CR18","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1136\/jclinpath-2018-205456","volume":"72","author":"SH Kim","year":"2019","unstructured":"Kim SH, Park WS, Chung J. Tumour heterogeneity in triplet-paired metastatic tumour tissues in metastatic renal cell carcinoma: concordance analysis of target gene sequencing data. J Clin Pathol. 2019;72:152\u20136.","journal-title":"J Clin Pathol"},{"key":"3755_CR19","doi-asserted-by":"publisher","first-page":"eaan2507","DOI":"10.1126\/science.aan2507","volume":"357","author":"M Uhlen","year":"2017","unstructured":"Uhlen M, Zhang C, Lee S, Sj\u00f6stedt E, Fagerberg L, Bidkhori G, et al. A pathology atlas of the human cancer transcriptome. Science. 2017;357:eaan2507.","journal-title":"Science."},{"key":"3755_CR20","first-page":"1441","volume":"22","author":"C Chen","year":"2017","unstructured":"Chen C, Chi H, Min L, Junhua Z. 
Downregulation of guanine nucleotide-binding protein beta 1 (GNB1) is associated with worsened prognosis of clearcell renal cell carcinoma and is related to VEGF signaling pathway. J BUON. 2017;22:1441\u20136.","journal-title":"J BUON"},{"key":"3755_CR21","doi-asserted-by":"publisher","first-page":"5985","DOI":"10.1038\/onc.2017.210","volume":"36","author":"O Zimmermannova","year":"2017","unstructured":"Zimmermannova O, Doktorova E, Stuchly J, Kanderova V, Kuzilkova D, Strnad H, et al. An activating mutation of GNB1 is associated with resistance to tyrosine kinase inhibitors in ETV6-ABL1 -positive leukemia. Oncogene. 2017;36:5985\u201394.","journal-title":"Oncogene"},{"key":"3755_CR22","doi-asserted-by":"publisher","first-page":"1131","DOI":"10.1016\/j.tranon.2019.05.005","volume":"12","author":"R Ohashi","year":"2019","unstructured":"Ohashi R, Schraml P, Batavia A, Angori S, Simmler P, Rupp N, et al. Allele loss and reduced expression of CYCLOPS genes is a characteristic feature of chromophobe renal cell carcinoma. Transl Oncol. 2019;12:1131\u20137.","journal-title":"Transl Oncol"},{"key":"3755_CR23","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1016\/S0165-0173(03)00203-0","volume":"43","author":"MS Lidow","year":"2003","unstructured":"Lidow MS. Calcium signaling dysfunction in schizophrenia: a unifying approach. Brain Res Brain Res Rev. 2003;43:70\u201384.","journal-title":"Brain Res Brain Res Rev"},{"key":"3755_CR24","doi-asserted-by":"publisher","first-page":"S17","DOI":"10.1186\/1755-8794-6-S1-S17","volume":"6","author":"Y Liu","year":"2013","unstructured":"Liu Y, Li Z, Zhang M, Deng Y, Yi Z, Shi T. Exploring the pathogenetic association between schizophrenia and type 2 diabetes mellitus diseases based on pathway analysis. BMC Med Genom. 
2013;6:S17.","journal-title":"BMC Med Genom"},{"key":"3755_CR25","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1007\/s00441-014-1806-z","volume":"357","author":"MJ Berridge","year":"2014","unstructured":"Berridge MJ. Calcium signalling and psychiatric disease: bipolar disorder and schizophrenia. Cell Tissue Res. 2014;357:477\u201392.","journal-title":"Cell Tissue Res"},{"key":"3755_CR26","doi-asserted-by":"publisher","DOI":"10.3389\/fncel.2014.00370","author":"Y Mizoguchi","year":"2014","unstructured":"Mizoguchi Y, Kato TA, Horikawa H, Monji A. Microglial intracellular Ca2+ signaling as a target of antipsychotic actions for the treatment of schizophrenia. Front Cell Neurosci. 2014. https:\/\/doi.org\/10.3389\/fncel.2014.00370.","journal-title":"Front Cell Neurosci"},{"key":"3755_CR27","doi-asserted-by":"publisher","first-page":"2894","DOI":"10.1038\/s41598-018-21297-x","volume":"8","author":"Y Hu","year":"2018","unstructured":"Hu Y, Fang Z, Yang Y, Rohlsen-Neal D, Cheng F, Wang J. Analyzing the genes related to nicotine addiction or schizophrenia via a pathway and network based approach. Sci Rep. 2018;8:2894.","journal-title":"Sci Rep"},{"key":"3755_CR28","doi-asserted-by":"publisher","first-page":"466","DOI":"10.1016\/j.neuron.2018.03.017","volume":"98","author":"E Nanou","year":"2018","unstructured":"Nanou E, Catterall WA. Calcium channels, synaptic plasticity, and neuropsychiatric disease. Neuron. 2018;98:466\u201381.","journal-title":"Neuron"},{"key":"3755_CR29","doi-asserted-by":"publisher","first-page":"200","DOI":"10.1016\/j.schres.2011.11.002","volume":"135","author":"DE Adkins","year":"2012","unstructured":"Adkins DE, Khachane AN, McClay JL, \u00c5berg K, Buksz\u00e1r J, Sullivan PF, et al. SNP-based analysis of neuroactive ligand-receptor interaction pathways implicates PGE2 as a novel mediator of antipsychotic treatment response: data from the CATIE study. Schizophr Res. 
2012;135:200\u20131.","journal-title":"Schizophr Res"},{"key":"3755_CR30","doi-asserted-by":"publisher","first-page":"689","DOI":"10.1016\/S0006-3223(99)00104-3","volume":"46","author":"SV Kyosseva","year":"1999","unstructured":"Kyosseva SV, Elbein AD, Griffin WS, Mrak RE, Lyon M, Karson CN. Mitogen-activated protein kinases in schizophrenia. Biol Psychiatry. 1999;46:689\u201396.","journal-title":"Biol Psychiatry"},{"key":"3755_CR31","doi-asserted-by":"publisher","first-page":"896","DOI":"10.1038\/npp.2011.267","volume":"37","author":"AJ Funk","year":"2012","unstructured":"Funk AJ, McCullumsmith RE, Haroutunian V, Meador-Woodruff JH. Abnormal activity of the MAPK- and cAMP-associated signaling pathways in frontal cortical areas in postmortem brain in schizophrenia. Neuropsychopharmacology. 2012;37:896\u2013905.","journal-title":"Neuropsychopharmacology"},{"key":"3755_CR32","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1186\/s12920-015-0098-9","volume":"8","author":"M Maschietto","year":"2015","unstructured":"Maschietto M, Tahira AC, Puga R, Lima L, Mariani D, da Silveira PB, et al. Co-expression network of neural-differentiation genes shows specific pattern in schizophrenia. BMC Med Genom. 2015;8:23.","journal-title":"BMC Med Genom"},{"key":"3755_CR33","first-page":"990","volume":"18","author":"MV Frantseva","year":"2001","unstructured":"Frantseva MV, Fitzgerald PB, Chen R, M\u00f6ller B, Daigle M, Daskalakis ZJ. Evidence for impaired long-term potentiation in schizophrenia and its relationship to motor skill learning. Cereb Cortex N Y N 1991. 2001;18:990\u20136.","journal-title":"Cereb Cortex N Y N 1991"},{"key":"3755_CR34","doi-asserted-by":"publisher","first-page":"15545","DOI":"10.1073\/pnas.0506580102","volume":"102","author":"A Subramanian","year":"2005","unstructured":"Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. 
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102:15545\u201350.","journal-title":"Proc Natl Acad Sci USA"},{"key":"3755_CR35","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1093\/bib\/bbs037","volume":"14","author":"C Lazar","year":"2013","unstructured":"Lazar C, Meganck S, Taminau J, Steenhoff D, Coletta A, Molter C, et al. Batch effect removal methods for microarray gene expression data integration: a survey. Brief Bioinform. 2013;14:469\u201390.","journal-title":"Brief Bioinform"},{"key":"3755_CR36","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1038\/s41586-018-0579-z","volume":"562","author":"C Bycroft","year":"2018","unstructured":"Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203\u20139.","journal-title":"Nature"},{"key":"3755_CR37","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825\u201330.","journal-title":"J Mach Learn Res"},{"key":"3755_CR38","doi-asserted-by":"publisher","first-page":"D711","DOI":"10.1093\/nar\/gky964","volume":"47","author":"A Athar","year":"2019","unstructured":"Athar A, F\u00fcllgrabe A, George N, Iqbal H, Huerta L, Ali A, et al. ArrayExpress update\u2014from bulk to single-cell expression data. Nucleic Acids Res. 
2019;47:D711\u20135.","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03755-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-020-03755-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-03755-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,10,1]],"date-time":"2021-10-01T00:31:59Z","timestamp":1633048319000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-03755-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,1]]},"references-count":38,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3755"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-03755-4","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.08.24.265116","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,10,1]]},"assertion":[{"value":"21 July 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 September 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 October 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"430"}}