{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,4,23]],"date-time":"2024-04-23T09:55:40Z","timestamp":1713866140779},"reference-count":23,"publisher":"Oxford University Press (OUP)","license":[{"start":{"date-parts":[[2019,9,11]],"date-time":"2019-09-11T00:00:00Z","timestamp":1568160000000},"content-version":"vor","delay-in-days":253,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000865","name":"Bill & Melinda Gates Foundation","doi-asserted-by":"publisher","award":["E-SPACE (1504-004)"],"id":[{"id":"10.13039\/100000865","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,1,1]]},"abstract":"Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. To make use of this information effectively requires DNA extraction facilities and marker production facilities that can efficiently deploy the desired set of markers across samples with a rapid turnaround time that allows for selection before crosses needed to be made. In reality, breeders often have a short window of time to make decisions by the time they are able to collect all their phenotyping data and receive corresponding genotyping data. This presents a challenge to organize information and utilize it in downstream analyses to support decisions made by breeders. In order to implement genomic selection routinely as part of breeding programs, one would need an efficient genotyping data storage system. 
We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extract times are greatly influenced by the orientation in which genotype data is stored in a system. HDF5 consistently performed best, in part because it can more efficiently work with both orientations of the allele matrix. Availability: http:\/\/gobiin1.bti.cornell.edu:6083\/projects\/GBM\/repos\/benchmarking\/browse","DOI":"10.1093\/database\/baz096","type":"journal-article","created":{"date-parts":[[2019,9,11]],"date-time":"2019-09-11T09:38:37Z","timestamp":1568194717000},"source":"Crossref","is-referenced-by-count":5,"title":["Benchmarking database systems for Genomic Selection implementation"],"prefix":"10.1093","volume":"2019","author":[{"ORCID":"http:\/\/orcid.org\/0000-0001-8398-9982","authenticated-orcid":false,"given":"Yaw","family":"Nti-Addae","sequence":"first","affiliation":[{"name":"Institute of Biotechnology, Cornell University"}]},{"given":"Dave","family":"Matthews","sequence":"additional","affiliation":[{"name":"Boyce Thompson Institute"}]},{"given":"Victor Jun","family":"Ulat","sequence":"additional","affiliation":[{"name":"Centro Internacional de Mejoramiento de Ma\u00edz y Trigo (CIMMYT)"}]},{"given":"Raza","family":"Syed","sequence":"additional","affiliation":[{"name":"Institute of Biotechnology, Cornell University"}]},{"given":"Guilhem","family":"Semp\u00e9r\u00e9","sequence":"additional","affiliation":[{"name":"INTERTRYP, Univ Montpellier, CIRAD, IRD"}]},{"given":"Adrien","family":"P\u00e9tel","sequence":"additional","affiliation":[{"name":"UMR PVBMT, CIRAD"}]},{"given":"Jon","family":"Renner","sequence":"additional","affiliation":[{"name":"University of Minnesota"}]},{"given":"Pierre","family":"Larmande","sequence":"additional","affiliation":[{"name":"UMR DIADE, IRD, University of 
Montpellier"}]},{"given":"Valentin","family":"Guignon","sequence":"additional","affiliation":[{"name":"Bioversity International"}]},{"given":"Elizabeth","family":"Jones","sequence":"additional","affiliation":[{"name":"Institute of Biotechnology, Cornell University"}]},{"given":"Kelly","family":"Robbins","sequence":"additional","affiliation":[{"name":"Section of Plant Breeding and Genetics, School of Integrative Plants Sciences, Cornell University"}]}],"member":"286","published-online":{"date-parts":[[2019,9,11]]},"reference":[{"key":"2019091105383209400_ref1","doi-asserted-by":"crossref","first-page":"1819 LP","DOI":"10.1093\/genetics\/157.4.1819","article-title":"Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps","volume":"157","author":"Meuwissen","year":"2001","journal-title":"Genetics"},{"key":"2019091105383209400_ref2","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1038\/ng.3920","article-title":"Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery","volume":"49","author":"Hickey","year":"2017","journal-title":"Nat. 
Genet."},{"key":"2019091105383209400_ref3","doi-asserted-by":"crossref","first-page":"1177","DOI":"10.1071\/CP13363","article-title":"Genomic selection in crops, trees and forages: a review","volume":"65","author":"Lin","year":"2014","journal-title":"Crop Pasture Sci."},{"key":"2019091105383209400_ref4","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1186\/1471-2164-15-S8-S3","article-title":"High dimensional biological data retrieval optimization with NoSQL technology","volume":"15","author":"Wang","year":"2014","journal-title":"BMC genomics"},{"key":"2019091105383209400_ref5","article-title":"Data management for high-throughput genomics","volume-title":"arXiv Prepr","author":"R\u00f6hm","year":"2009"},{"key":"2019091105383209400_ref6","doi-asserted-by":"crossref","first-page":"1458","DOI":"10.1093\/bioinformatics\/btq164","article-title":"The Genomedata format for storing large-scale functional genomics data","volume":"26","author":"Hoffman","year":"2010","journal-title":"Bioinformatics"},{"key":"2019091105383209400_ref7","doi-asserted-by":"crossref","first-page":"D1023","DOI":"10.1093\/nar\/gku1039","article-title":"SNP-Seek database of SNPs derived from 3000 rice genomes","volume":"43","author":"Alexandrov","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"2019091105383209400_ref8","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/s13222-015-0198-x","article-title":"Genome sequence analysis with MonetDB","volume":"15","author":"Cijvat","year":"2015","journal-title":"Datenbank-Spektrum"},{"key":"2019091105383209400_ref9","doi-asserted-by":"crossref","DOI":"10.1109\/BIBM.2015.7359902","article-title":"A study of genomic data provenance in NoSQL document-oriented database systems","author":"Guimaraes","year":"2015"},{"issue":"1","key":"2019091105383209400_ref10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.ygeno.2012.05.006","article-title":"Relax with CouchDB\u2014Into the non-relational DBMS era of 
bioinformatics","volume":"100","author":"Manyam","year":"2012","journal-title":"Genomics"},{"key":"2019091105383209400_ref11","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1186\/s12859-015-0861-0","article-title":"BigQ: a NoSQL based framework to handle genomic variants in i2b2","volume":"16","author":"Gabetta","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2019091105383209400_ref12","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1016\/j.jbi.2016.10.015","article-title":"Evaluation of relational and NoSQL database architectures to manage genomic annotations","volume":"64","author":"Schulz","year":"2016","journal-title":"J. Biomed. Inform."},{"key":"2019091105383209400_ref13","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1145\/2465848.2465849","article-title":"Performance evaluation of a mongodb and hadoop platform for scientific data analysis","volume-title":"Proceedings of the 4th ACM workshop on Scientific cloud computing","author":"Dede","year":"2013"},{"key":"2019091105383209400_ref14","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1186\/s13742-016-0131-8","article-title":"Gigwa\u2014Genotype investigator for genome-wide analyses","volume":"5","author":"Semp\u00e9r\u00e9","year":"2016","journal-title":"Gigascience"},{"key":"2019091105383209400_ref15","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1126\/science.1174320","article-title":"Genetic properties of the maize nested association mapping population","volume":"325","author":"McMullen","year":"2009","journal-title":"Science (80-.)"},{"key":"2019091105383209400_ref16","doi-asserted-by":"crossref","first-page":"e90346","DOI":"10.1371\/journal.pone.0090346","article-title":"TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline","volume":"9","author":"Glaubitz","year":"2014","journal-title":"PLoS One"},{"key":"2019091105383209400_ref17","volume-title":"Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and 
Analytics Engine","author":"Gormley","year":"2015"},{"key":"2019091105383209400_ref18","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1145\/2934664","article-title":"Apache spark: a unified engine for big data processing","volume":"59","author":"Zaharia","year":"2016","journal-title":"Commun. ACM"},{"key":"2019091105383209400_ref19","first-page":"gix134","article-title":"Construction of the third-generation Zea mays haplotype map","author":"Bukowski","year":"2018","journal-title":"GigaScience"},{"key":"2019091105383209400_ref20","volume-title":"Advances in Computational Biology. Advances in Experimental Medicine and Biology","author":"Mason","year":"2010"},{"key":"2019091105383209400_ref21","doi-asserted-by":"crossref","article-title":"Poretools: a toolkit for analyzing nanopore sequence data","author":"Loman","DOI":"10.1093\/bioinformatics\/btu555"},{"key":"2019091105383209400_ref22","author":"Thomson","year":"2014"},{"key":"2019091105383209400_ref23","author":"","year":"2016"}],"container-title":["Database"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/database\/article-pdf\/doi\/10.1093\/database\/baz096\/29956991\/baz096.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,28]],"date-time":"2022-09-28T05:23:20Z","timestamp":1664342600000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/database\/article\/doi\/10.1093\/database\/baz096\/5566651"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,1]]},"references-count":23,"URL":"https:\/\/doi.org\/10.1093\/database\/baz096","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/519017","asserted-by":"object"}]},"ISSN":["1758-0463"],"issn-type":[{"value":"1758-0463","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019]]},"published":{"date-parts":[[2019,1,1]]}}}