{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,9]],"date-time":"2025-04-09T05:52:58Z","timestamp":1744177978090,"version":"3.37.3"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2021,11,18]],"date-time":"2021-11-18T00:00:00Z","timestamp":1637193600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Lifelines Biobank"},{"name":"FES (Fonds Economische Structuurversterking), SNN"},{"name":"REP"},{"DOI":"10.13039\/501100003246","name":"Dutch Research Council","doi-asserted-by":"publisher","award":["ZonMW-VIDI 917.14.374"],"id":[{"id":"10.13039\/501100003246","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Research Council) Starting Grant","award":["637640"]},{"DOI":"10.13039\/501100001826","name":"The Netherlands Organisation for Health Research and Development (ZonMw","doi-asserted-by":"crossref","award":["09150161910057"],"id":[{"id":"10.13039\/501100001826","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,27]]},"abstract":"Abstract<\/jats:title>\n \n Motivation<\/jats:title>\n Identifying sample mix-ups in biobanks is essential to allow the repurposing of genetic data for clinical pharmacogenetics. Pharmacogenetic advice based on the genetic information of another individual is potentially harmful. Existing methods for identifying mix-ups are limited to datasets in which additional omics data (e.g. gene expression) is available. Cohorts lacking such data can only use sex, which can reveal only half of the mix-ups. 
Here, we describe Id\u00e9fix, a method for the identification of accidental sample mix-ups in biobanks using polygenic scores.<\/jats:p>\n <\/jats:sec>\n \n Results<\/jats:title>\n In the Lifelines population-based biobank, we calculated polygenic scores (PGSs) for 25 traits for 32\u00a0786 participants.\u00a0We then applied Id\u00e9fix to compare the actual phenotypes to PGSs, and to use the relative discordance that is expected for mix-ups, compared to correct samples. In a simulation, using induced mix-ups, Id\u00e9fix reaches an AUC of 0.90 using 25 polygenic scores and sex. This is a substantial improvement over using only sex, which has an AUC of 0.75. Subsequent simulations present Id\u00e9fix\u2019s potential in varying datasets with more powerful PGSs. This suggests its performance will likely improve when more highly powered GWASs for commonly measured traits become available. Id\u00e9fix can be used to identify a set of high-quality participants for whom it is very unlikely that they reflect sample mix-ups, and for these participants we can use genetic data for clinical purposes, such as pharmacogenetic profiles. For instance, in Lifelines, we can select 34.4% of participants, reducing the sample mix-up rate from 0.15% to 0.01%.<\/jats:p>\n <\/jats:sec>\n \n Availability and implementation<\/jats:title>\n Id\u00e9fix is freely available at https:\/\/github.com\/molgenis\/systemsgenetics\/wiki\/Idefix. The individual-level data that support the findings were obtained from the Lifelines biobank under project application number ov16_0365. 
Data is made available upon reasonable request submitted to the LifeLines Research office (research@lifelines.nl, https:\/\/www.lifelines.nl\/researcher\/how-to-apply\/apply-here).<\/jats:p>\n <\/jats:sec>\n \n Supplementary information<\/jats:title>\n Supplementary data are available at Bioinformatics online.<\/jats:p>\n <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab783","type":"journal-article","created":{"date-parts":[[2021,11,15]],"date-time":"2021-11-15T13:48:17Z","timestamp":1636984097000},"page":"1059-1066","source":"Crossref","is-referenced-by-count":4,"title":["Id\u00e9fix: identifying accidental sample mix-ups in biobanks using polygenic scores"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8691-0053","authenticated-orcid":false,"given":"Robert","family":"Warmerdam","sequence":"first","affiliation":[{"name":"Department of Genetics, University Medical Center Groningen, University of Groningen , 9700AB Groningen, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7288-4445","authenticated-orcid":false,"given":"Pauline","family":"Lanting","sequence":"additional","affiliation":[{"name":"Department of Genetics, University Medical Center Groningen, University of Groningen , 9700AB Groningen, The Netherlands"}]},{"name":"Lifelines Cohort Study","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5654-3966","authenticated-orcid":false,"given":"Patrick","family":"Deelen","sequence":"additional","affiliation":[{"name":"Department of Genetics, University Medical Center Groningen, University of Groningen , 9700AB Groningen, The Netherlands"},{"name":"Department of Genetics, University Medical Center Utrecht , 3508GA Utrecht, The Netherlands"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5159-8802","authenticated-orcid":false,"given":"Lude","family":"Franke","sequence":"additional","affiliation":[{"name":"Department of Genetics, University Medical Center Groningen, University of 
Groningen , 9700AB Groningen, The Netherlands"}]}],"member":"286","published-online":{"date-parts":[[2021,11,18]]},"reference":[{"key":"2023020108532723700_btab783-B2","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1159\/000194981","article-title":"When a case is not a case: effects of phenotype misclassification on power and sample size requirements for the transmission disequilibrium test with affected child trios","volume":"67","author":"Buyske","year":"2009","journal-title":"Hum. Hered"},{"key":"2023020108532723700_btab783-B3","doi-asserted-by":"crossref","first-page":"1266","DOI":"10.1002\/humu.23265","article-title":"Matching phenotypes to whole genomes: lessons learned from four iterations of the personal genome project community challenges","volume":"38","author":"Cai","year":"2017","journal-title":"Hum. Mutat"},{"key":"2023020108532723700_btab783-B4","doi-asserted-by":"crossref","first-page":"1593","DOI":"10.1038\/s41588-018-0248-z","article-title":"An atlas of genetic associations in UK Biobank","volume":"50","author":"Canela-Xandri","year":"2018","journal-title":"Nat. 
Genet"},{"key":"2023020108532723700_btab783-B5","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/s13742-015-0047-8","article-title":"Second-generation PLINK: rising to the challenge of larger and richer datasets","volume":"4","author":"Chang","year":"2015","journal-title":"GigaScience"},{"key":"2023020108532723700_btab783-B6","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1016\/j.ajhg.2020.05.004","article-title":"Non-parametric Polygenic Risk Prediction via Partitioned GWAS Summary Statistics","volume":"107","author":"Chun","year":"2020","journal-title":"American Journal of Human Genetics"},{"key":"2023020108532723700_btab783-B7","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1056\/NEJMc0904266","article-title":"Codeine, ultrarapid-metabolism genotype, and postoperative death","volume":"361","author":"Ciszkowski","year":"2009","journal-title":"N. Engl. J. Med"},{"key":"2023020108532723700_btab783-B8","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1038\/s41588-017-0014-7","article-title":"Multiancestry association study identifies new asthma risk loci that colocalize with immune-cell enhancer marks","volume":"50","author":"Demenais","year":"2018","journal-title":"Nat. 
Genet"},{"key":"2023020108532723700_btab783-B9","doi-asserted-by":"crossref","first-page":"e0182438","DOI":"10.1371\/journal.pone.0182438","article-title":"A SNP panel and online tool for checking genotype concordance through comparing QR codes","volume":"12","author":"Du","year":"2017","journal-title":"PLoS One"},{"key":"2023020108532723700_btab783-B10","doi-asserted-by":"crossref","first-page":"e1003348","DOI":"10.1371\/journal.pgen.1003348","article-title":"Power and predictive accuracy of polygenic risk scores","volume":"9","author":"Dudbridge","year":"2013","journal-title":"PLOS Genet"},{"key":"2023020108532723700_btab783-B11","first-page":"648","article-title":"Assessing the accuracy of observer-reported ancestry in a biorepository linked to electronic medical records","volume":"12","author":"Dumitrescu","year":"2010","journal-title":"Genet. Med. Off. J. Am. Coll. Med. Genet"},{"key":"2023020108532723700_btab783-B12","doi-asserted-by":"crossref","first-page":"3328","DOI":"10.1038\/s41467-019-11112-0","article-title":"Analysis of polygenic risk score usage and performance in diverse human populations","volume":"10","author":"Duncan","year":"2019","journal-title":"Nat. Commun"},{"key":"2023020108532723700_btab783-B13","doi-asserted-by":"crossref","DOI":"10.1101\/185330","article-title":"Major flaws in \u201cIdentification of individuals by trait prediction using whole-genome sequencing data\u201d","author":"Erlich","year":"2017"},{"key":"2023020108532723700_btab783-B14","doi-asserted-by":"crossref","first-page":"1412","DOI":"10.1038\/s41588-018-0205-x","article-title":"Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits","volume":"50","author":"Evangelou","year":"2018","journal-title":"Nat. 
Genet"},{"year":"2018","author":"Fialkowski","key":"2023020108532723700_btab783-B15"},{"key":"2023020108532723700_btab783-B16","doi-asserted-by":"crossref","first-page":"2827","DOI":"10.1056\/NEJMoa041888","article-title":"Codeine intoxication associated with ultrarapid CYP2D6 metabolism","volume":"351","author":"Gasche","year":"2004","journal-title":"N. Engl. J. Med"},{"key":"2023020108532723700_btab783-B17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-019-09718-5","article-title":"Polygenic prediction via Bayesian regression and continuous shrinkage priors","volume":"10","author":"Ge","year":"2019","journal-title":"Nat. Commun"},{"key":"2023020108532723700_btab783-B18","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1007\/s10549-019-05345-2","article-title":"Survival after bilateral risk-reducing mastectomy in healthy BRCA1 and BRCA2 mutation carriers","volume":"177","author":"Heemskerk-Gerritsen","year":"2019","journal-title":"Breast Cancer Res. Treat"},{"key":"2023020108532723700_btab783-B19","doi-asserted-by":"crossref","first-page":"597","DOI":"10.1007\/s00439-010-0880-x","article-title":"Using public control genotype data to increase power and decrease cost of case\u2013control genetic association studies","volume":"128","author":"Ho","year":"2010","journal-title":"Hum. Genet"},{"key":"2023020108532723700_btab783-B20","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1038\/s41588-018-0064-5","article-title":"A large electronic-health-record-based genome-wide study of serum lipids","volume":"50","author":"Hoffmann","year":"2018","journal-title":"Nat. Genet"},{"key":"2023020108532723700_btab783-B21","doi-asserted-by":"crossref","first-page":"e1007522","DOI":"10.1371\/journal.pcbi.1007522","article-title":"DRAMS: a tool to detect and re-align mixed-up samples for integrative studies of multi-omics data","volume":"16","author":"Jiang","year":"2020","journal-title":"PLOS Comput. 
Biol"},{"key":"2023020108532723700_btab783-B22","doi-asserted-by":"crossref","first-page":"1112","DOI":"10.1038\/s41588-018-0147-3","article-title":"Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals","volume":"50","author":"Lee","year":"2018","journal-title":"Nat. Genet"},{"key":"2023020108532723700_btab783-B23","doi-asserted-by":"crossref","first-page":"617","DOI":"10.1002\/cpt.1665","article-title":"Repurposing of diagnostic whole exome sequencing data of 1,583 individuals for clinical pharmacogenetics","volume":"107","author":"Lee","year":"2020","journal-title":"Clin. Pharmacol. Ther"},{"key":"2023020108532723700_btab783-B24","doi-asserted-by":"crossref","first-page":"604","DOI":"10.7326\/0003-4819-150-9-200905050-00006","article-title":"A new equation to estimate glomerular filtration rate","volume":"150","author":"Levey","year":"2009","journal-title":"Ann. Intern. Med"},{"key":"2023020108532723700_btab783-B25","doi-asserted-by":"publisher","first-page":"10166","DOI":"10.1073\/pnas.1711125114","article-title":"Identification of individuals by trait prediction using whole-genome sequencing data","volume":"114","author":"Lippert","year":"2017","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"key":"2023020108532723700_btab783-B26","doi-asserted-by":"crossref","first-page":"562","DOI":"10.1016\/j.clinbiochem.2017.02.004","article-title":"Managing the patient identification crisis in healthcare and laboratory medicine","volume":"50","author":"Lippi","year":"2017","journal-title":"Clin. 
Biochem"},{"key":"2023020108532723700_btab783-B46"},{"key":"2023020108532723700_btab783-B27","doi-asserted-by":"crossref","first-page":"1505","DOI":"10.1038\/s41588-018-0241-6","article-title":"Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps","volume":"50","author":"Mahajan","year":"2018","journal-title":"Nat. Genet"},{"key":"2023020108532723700_btab783-B28","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1038\/nature21039","article-title":"Rare and low-frequency coding variants alter human adult height","volume":"542","author":"Marouli","year":"2017","journal-title":"Nature"},{"volume-title":"Returning Individual Research Results to Participants: Guidance for a New Research Paradigm","year":"2018","author":"Downey","key":"2023020108532723700_btab783-B29"},{"key":"2023020108532723700_btab783-B30","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/nature17671","article-title":"Genome-wide association study identifies 74 loci associated with educational attainment","volume":"533","author":"Okbay","year":"2016","journal-title":"Nature"},{"author":"Purcell","key":"2023020108532723700_btab783-B31"},{"key":"2023020108532723700_btab783-B32","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1186\/1471-2105-12-77","article-title":"pROC: an open-source package for R and S+ to analyze and compare ROC curves","volume":"12","author":"Robin","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020108532723700_btab783-B59967953"},{"key":"2023020108532723700_btab783-B33","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1016\/j.tig.2009.09.008","article-title":"Detecting new neurodegenerative disease genes: does phenotype accuracy limit the horizon?","volume":"25","author":"Samuels","year":"2009","journal-title":"Trends 
Genet"},{"author":"Smail","key":"2023020108532723700_btab783-B34"},{"key":"2023020108532723700_btab783-B35","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1007\/s10654-007-9204-4","article-title":"Universal risk factors for multifactorial diseases: lifeLines: a three-generation population-based study","volume":"23","author":"Stolk","year":"2008","journal-title":"Eur. J. Epidemiol"},{"key":"2023020108532723700_btab783-B36","doi-asserted-by":"crossref","first-page":"449","DOI":"10.2217\/pgs.10.14","article-title":"Amelogenin-based sex identification as a strategy to control the identity of DNA samples in genetic association studies","volume":"11","author":"Tzvetkov","year":"2010","journal-title":"Pharmacogenomics"},{"key":"2023020108532723700_btab783-B37","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1161\/CIRCRESAHA.117.312086","article-title":"Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease","volume":"122","author":"Van der Harst","year":"2018","journal-title":"Circ. 
Res"},{"key":"2023020108532723700_btab783-B38","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-21706-2","volume-title":"Modern Applied Statistics with S Fourth","author":"Venables","year":"2002"},{"key":"2023020108532723700_btab783-B39","doi-asserted-by":"crossref","first-page":"1214","DOI":"10.1016\/j.cell.2020.08.008","article-title":"The polygenic and monogenic basis of blood traits and diseases","volume":"182","author":"Vuckovic","year":"2020","journal-title":"Cell"},{"key":"2023020108532723700_btab783-B40","doi-asserted-by":"crossref","first-page":"2104","DOI":"10.1093\/bioinformatics\/btr323","article-title":"MixupMapper: correcting sample mix-ups in genome-wide datasets increases power to detect small genetic effects","volume":"27","author":"Westra","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020108532723700_btab783-B41","doi-asserted-by":"crossref","first-page":"e1002383","DOI":"10.1371\/journal.pmed.1002383","article-title":"Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: a transethnic genome-wide meta-analysis","volume":"14","author":"Wheeler","year":"2017","journal-title":"PLoS Med"},{"key":"2023020108532723700_btab783-B42","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1038\/nrg3457","article-title":"Pitfalls of predicting complex traits from SNPs","volume":"14","author":"Wray","year":"2013","journal-title":"Nat. Rev. Genet"},{"key":"2023020108532723700_btab783-B43","doi-asserted-by":"crossref","first-page":"957","DOI":"10.1038\/s41588-019-0407-x","article-title":"A catalog of genetic loci associated with kidney function from analyses of a million individuals","volume":"51","author":"Wuttke","year":"2019","journal-title":"Nat. 
Genet"},{"key":"2023020108532723700_btab783-B44","doi-asserted-by":"crossref","first-page":"3641","DOI":"10.1093\/hmg\/ddy271","article-title":"Meta-analysis of genome-wide association studies for height and body mass index in \u223c700000 individuals of European ancestry","volume":"27","author":"Yengo","year":"2018","journal-title":"Hum. Mol. Genet"},{"key":"2023020108532723700_btab783-B45","doi-asserted-by":"crossref","first-page":"869","DOI":"10.1002\/sim.1976","article-title":"The impact of diagnostic error on testing genetic association in case-control studies","volume":"24","author":"Zheng","year":"2005","journal-title":"Stat. Med"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab783\/41818015\/btab783.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/4\/1059\/49008904\/btab783.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/4\/1059\/49008904\/btab783.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T20:14:47Z","timestamp":1675282487000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/4\/1059\/6430970"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,11,18]]},"references-count":46,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,1,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab783","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.03.12.435080","asserted-by":"object"}]}
,"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2022,2,15]]},"published":{"date-parts":[[2021,11,18]]}}}