{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,12,4]],"date-time":"2024-12-04T17:13:37Z","timestamp":1733332417070,"version":"3.30.1"},"update-to":[{"updated":{"date-parts":[[2023,3,24]],"date-time":"2023-03-24T00:00:00Z","timestamp":1679616000000},"DOI":"10.1371\/journal.pcbi.1010963","type":"new_version","label":"New version"}],"reference-count":41,"publisher":"Public Library of Science (PLoS)","issue":"3","license":[{"start":{"date-parts":[[2023,3,14]],"date-time":"2023-03-14T00:00:00Z","timestamp":1678752000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005416","name":"Norges Forskningsr\u00e5d","doi-asserted-by":"publisher","award":["272402"],"id":[{"id":"10.13039\/501100005416","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"Estimating feature importance, which is the contribution of a prediction or several predictions due to a feature, is an essential aspect of explaining data-based models. Besides explaining the model itself, an equally relevant question is which features are important in the underlying data generating process. We present a Shapley-value-based framework for inferring the importance of individual features, including uncertainty in the estimator. We build upon the recently published model-agnostic feature importance score of SAGE (Shapley additive global importance) and introduce Sub-SAGE. For tree-based models, it has the advantage that it can be estimated without computationally expensive resampling. We argue that for all model types the uncertainties in our Sub-SAGE estimator can be estimated using bootstrapping and demonstrate the approach for tree ensemble methods. 
The framework is exemplified on synthetic data as well as large genotype data for predicting feature importance with respect to obesity.","DOI":"10.1371\/journal.pcbi.1010963","type":"journal-article","created":{"date-parts":[[2023,3,14]],"date-time":"2023-03-14T17:32:13Z","timestamp":1678815133000},"page":"e1010963","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":4,"title":["Inferring feature importance with uncertainties with application to large genotype data"],"prefix":"10.1371","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2599-7914","authenticated-orcid":true,"given":"P\u00e5l Vegard","family":"Johnsen","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1820-6544","authenticated-orcid":true,"given":"Inga","family":"Str\u00fcmke","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5714-0288","authenticated-orcid":true,"given":"Mette","family":"Langaas","sequence":"additional","affiliation":[]},{"given":"Andrew Thomas","family":"DeWan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5308-7651","authenticated-orcid":true,"given":"Signe","family":"Riemer-S\u00f8rensen","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2023,3,14]]},"reference":[{"issue":"10","key":"pcbi.1010963.ref001","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1038\/s42256-020-00236-4","article-title":"Drug discovery with explainable artificial intelligence","volume":"2","author":"J Jim\u00e9nez-Luna","year":"2020","journal-title":"Nature Machine Intelligence"},{"key":"pcbi.1010963.ref002","article-title":"A Value for n-Person Games","volume":"Volume II","author":"LS Shapley","year":"1953","journal-title":"Contributions to the Theory of Games (AM-28)"},{"key":"pcbi.1010963.ref003","article-title":"Explaining individual predictions when features 
are dependent: More accurate approximations to Shapley values","volume":"298","author":"K Aas","year":"2021","journal-title":"Artificial Intelligence"},{"issue":"1","key":"pcbi.1010963.ref004","doi-asserted-by":"crossref","DOI":"10.1038\/s42256-019-0138-9","article-title":"From local explanations to global understanding with explainable AI for trees","volume":"2","author":"SM Lundberg","year":"2020","journal-title":"Nature Machine Intelligence"},{"issue":"46","key":"pcbi.1010963.ref005","doi-asserted-by":"crossref","first-page":"2027","DOI":"10.21105\/joss.02027","article-title":"shapr: An R-package for explaining machine learning models with dependence-aware Shapley values","volume":"5","author":"N Sellereite","year":"2019","journal-title":"Journal of Open Source Software"},{"key":"pcbi.1010963.ref006","first-page":"4765","volume-title":"Advances in Neural Information Processing Systems 30","author":"SM Lundberg","year":"2017"},{"key":"pcbi.1010963.ref007","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1007\/s10115-013-0679-x","article-title":"Explaining prediction models and individual predictions with feature contributions","volume":"41","author":"E Strumbelj","year":"2013","journal-title":"Knowledge and Information Systems"},{"key":"pcbi.1010963.ref008","first-page":"1","article-title":"An Efficient Explanation of Individual Classifications using Game Theory","volume":"11","author":"E Strumbelj","year":"2010","journal-title":"Journal of Machine Learning Research"},{"year":"2019","author":"SM Lundberg","journal-title":"Consistent Individualized Feature Attribution for Tree Ensembles","key":"pcbi.1010963.ref009"},{"year":"2020","author":"A Redelmeier","journal-title":"Explaining predictive models with mixed features using Shapley values and conditional inference trees","key":"pcbi.1010963.ref010"},{"year":"2021","author":"Y Kwon","journal-title":"Efficient computation and analysis of distributional Shapley 
values","key":"pcbi.1010963.ref011"},{"key":"pcbi.1010963.ref012","doi-asserted-by":"crossref","first-page":"1060","DOI":"10.1137\/15M1048070","article-title":"Shapley Effects for Global Sensitivity Analysis: Theory and Computation","volume":"4","author":"E Song","year":"2016","journal-title":"SIAM\/ASA Journal on Uncertainty Quantification"},{"year":"2021","author":"N Moehle","journal-title":"Portfolio Performance Attribution via Shapley Value","key":"pcbi.1010963.ref013"},{"year":"2020","author":"I Covert","journal-title":"Explaining by Removing: A Unified Framework for Model Explanation","key":"pcbi.1010963.ref014"},{"key":"pcbi.1010963.ref015","doi-asserted-by":"crossref","first-page":"1887","DOI":"10.1162\/0899766041336387","article-title":"Fair Attribution of Functional Contribution in Artificial and Biological Networks","volume":"16","author":"A Keinan","year":"2003","journal-title":"Neural Computation"},{"key":"pcbi.1010963.ref016","doi-asserted-by":"crossref","first-page":"e582","DOI":"10.7717\/peerj-cs.582","article-title":"Model independent feature attributions: Shapley values that uncover non-linear dependencies","volume":"7","author":"DV Fryer","year":"2021","journal-title":"PeerJ Computer Science"},{"year":"2020","author":"I Covert","journal-title":"Understanding Global Feature Contributions With Additive Importance Measures","key":"pcbi.1010963.ref017"},{"year":"2021","author":"D Fryer","journal-title":"Shapley values for feature selection: The good, the bad, and the axioms","key":"pcbi.1010963.ref018"},{"issue":"3","key":"pcbi.1010963.ref019","doi-asserted-by":"crossref","first-page":"e1001779","DOI":"10.1371\/journal.pmed.1001779","article-title":"UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age","volume":"12","author":"C Sudlow","year":"2015","journal-title":"PLoS 
medicine"},{"issue":"7726","key":"pcbi.1010963.ref020","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/s41586-018-0579-z","article-title":"The UK Biobank resource with deep phenotyping and genomic data","volume":"562","author":"C Bycroft","year":"2018","journal-title":"Nature"},{"issue":"7538","key":"pcbi.1010963.ref021","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1038\/nature14177","article-title":"Genetic studies of body mass index yield new insights for obesity biology","volume":"518","author":"AE Locke","year":"2015","journal-title":"Nature"},{"key":"pcbi.1010963.ref022","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1007\/BF01769885","article-title":"Monotonic solutions of cooperative games","volume":"14","author":"HP Young","year":"1985","journal-title":"International Journal of Game Theory"},{"key":"pcbi.1010963.ref023","doi-asserted-by":"crossref","first-page":"1239","DOI":"10.1214\/12-EJS710","article-title":"Axiomatic arguments for decomposing goodness of fit according to Shapley and Owen values","volume":"6","author":"F Huettner","year":"2012","journal-title":"Electronic Journal of Statistics"},{"issue":"9","key":"pcbi.1010963.ref024","doi-asserted-by":"crossref","DOI":"10.1038\/s41588-018-0184-y","article-title":"Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies","volume":"50","author":"W Zhou","year":"2018","journal-title":"Nature Genetics"},{"issue":"1","key":"pcbi.1010963.ref025","doi-asserted-by":"crossref","DOI":"10.1186\/s12859-021-04041-7","article-title":"A new method for exploring gene\u2013gene and gene\u2013environment interactions in GWAS with tree ensemble methods and SHAP values","volume":"22","author":"PV Johnsen","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1010963.ref026","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical 
Learning","author":"T Hastie","year":"2009","edition":"2"},{"issue":"397","key":"pcbi.1010963.ref027","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1080\/01621459.1987.10478410","article-title":"Better Bootstrap Confidence Intervals","volume":"82","author":"B Efron","year":"1987","journal-title":"Journal of the American Statistical Association"},{"doi-asserted-by":"crossref","unstructured":"Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining\u2014KDD\u201916. 2016; p. 785\u2013794.","key":"pcbi.1010963.ref028","DOI":"10.1145\/2939672.2939785"},{"issue":"1","key":"pcbi.1010963.ref029","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/j.ajhg.2017.06.005","article-title":"10 Years of GWAS Discovery: Biology, Function, and Translation","volume":"101","author":"PM Visscher","year":"2017","journal-title":"American Journal of Human Genetics"},{"issue":"11","key":"pcbi.1010963.ref030","doi-asserted-by":"crossref","first-page":"1946","DOI":"10.1002\/sim.6082","article-title":"Multiple hypothesis testing in genomics","volume":"33","author":"JJ Goeman","year":"2014","journal-title":"Statistics in Medicine"},{"key":"pcbi.1010963.ref031","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1038\/35075590","article-title":"Linkage disequilibrium in the human genome","volume":"411","author":"DE Reich","year":"2001","journal-title":"Nature"},{"issue":"4","key":"pcbi.1010963.ref032","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/hdy.2010.91","article-title":"Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses","volume":"106","author":"MJ 
Sillanp\u00e4\u00e4","year":"2011","journal-title":"Heredity"},{"issue":"15","key":"pcbi.1010963.ref033","doi-asserted-by":"crossref","first-page":"2595","DOI":"10.1093\/bioinformatics\/btv153","article-title":"PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R","volume":"31","author":"J Grau","year":"2015","journal-title":"Bioinformatics"},{"issue":"9","key":"pcbi.1010963.ref034","doi-asserted-by":"crossref","first-page":"1390","DOI":"10.1038\/s41591-019-0563-7","article-title":"Contribution of genetics to visceral adiposity and its relation to cardiovascular and metabolic disease","volume":"25","author":"T Karlsson","year":"2019","journal-title":"Nature medicine"},{"issue":"6","key":"pcbi.1010963.ref035","doi-asserted-by":"crossref","DOI":"10.1038\/s41588-020-0622-5","article-title":"Exploring and visualizing large-scale genetic associations by using PheWeb","volume":"52","author":"SA Gagliano Taliun","year":"2020","journal-title":"Nature Genetics"},{"issue":"7845","key":"pcbi.1010963.ref036","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1038\/s41586-021-03205-y","article-title":"Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program","volume":"590","author":"D Taliun","year":"2021","journal-title":"Nature"},{"issue":"6","key":"pcbi.1010963.ref037","doi-asserted-by":"crossref","first-page":"991","DOI":"10.1161\/CIRCRESAHA.116.305697","article-title":"Obesity-Induced Hypertension","volume":"116","author":"JE Hall","year":"2015","journal-title":"Circulation Research"},{"key":"pcbi.1010963.ref038","first-page":"66","article-title":"Can We Trust the Bootstrap in High-dimensions? 
The Case of Linear Models","volume":"19","author":"NE Karoui","year":"2018","journal-title":"Journal of Machine Learning Research"},{"key":"pcbi.1010963.ref039","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1002\/9781118555552","volume-title":"Computational Statistics","author":"GH Givens","year":"2012"},{"issue":"9","key":"pcbi.1010963.ref040","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1038\/s41576-018-0018-x","article-title":"The personal and clinical utility of polygenic risk scores","volume":"19","author":"A Torkamani","year":"2018","journal-title":"Nature Reviews Genetics"},{"issue":"1","key":"pcbi.1010963.ref041","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s42003-022-03812-z","article-title":"Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations","volume":"5","author":"M Elgart","year":"2022","journal-title":"Communications Biology"}],"updated-by":[{"updated":{"date-parts":[[2023,3,24]],"date-time":"2023-03-24T00:00:00Z","timestamp":1679616000000},"DOI":"10.1371\/journal.pcbi.1010963","type":"new_version","label":"New version"}],"container-title":["PLOS Computational 
Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010963","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,24]],"date-time":"2023-03-24T17:48:44Z","timestamp":1679680124000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010963"}},"subtitle":[],"editor":[{"given":"Shaun","family":"Mahony","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,3,14]]},"references-count":41,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,3,14]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010963","relation":{},"ISSN":["1553-7358"],"issn-type":[{"type":"electronic","value":"1553-7358"}],"subject":[],"published":{"date-parts":[[2023,3,14]]}}}