{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,14]],"date-time":"2024-09-14T09:04:41Z","timestamp":1726304681707},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1201,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,7,1]]},"abstract":"Motivation: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. Results: We propose novel regression-based models of TF\u2013DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. 
To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF\u2013DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TF with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF\u2013DNA binding specificity. Availability: Our code is available at http:\/\/genome.duke.edu\/labs\/gordan\/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026. Contact: raluca.gordan@duke.edu","DOI":"10.1093\/bioinformatics\/btt221","type":"journal-article","created":{"date-parts":[[2013,6,27]],"date-time":"2013-06-27T05:33:26Z","timestamp":1372311206000},"page":"i117-i125","source":"Crossref","is-referenced-by-count":45,"title":["Stability selection for regression-based models of transcription factor\u2013DNA binding specificity"],"prefix":"10.1093","volume":"29","author":[{"given":"Fantine","family":"Mordelet","sequence":"first","affiliation":[]},{"given":"John","family":"Horton","sequence":"additional","affiliation":[]},{"given":"Alexander J.","family":"Hartemink","sequence":"additional","affiliation":[]},{"given":"Barbara E.","family":"Engelhardt","sequence":"additional","affiliation":[]},{"given":"Raluca","family":"Gord\u00e2n","sequence":"additional","affiliation":[]}],"member":"286","published-online":{"date-parts":[[2013,6,19]]},"reference":[{"key":"2023062614325930300_btt221-B1","doi-asserted-by":"crossref","first-page":"e1000916","DOI":"10.1371\/journal.pcbi.1000916","article-title":"High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions","volume":"6","author":"Agius","year":"2010","journal-title":"PLoS Comput. 
Biol."},{"key":"2023062614325930300_btt221-B2","doi-asserted-by":"crossref","first-page":"e20059","DOI":"10.1371\/journal.pone.0020059","article-title":"A linear model for transcription factor binding affinity prediction in protein binding microarrays","volume":"6","author":"Annala","year":"2011","journal-title":"PLoS One"},{"key":"2023062614325930300_btt221-B3","author":"Bach","year":"2008"},{"key":"2023062614325930300_btt221-B4","doi-asserted-by":"crossref","first-page":"1720","DOI":"10.1126\/science.1162327","article-title":"Diversity and complexity in DNA recognition by transcription factors","volume":"324","author":"Badis","year":"2009","journal-title":"Science"},{"key":"2023062614325930300_btt221-B5","author":"Barash","year":"2003"},{"key":"2023062614325930300_btt221-B6","doi-asserted-by":"crossref","first-page":"4442","DOI":"10.1093\/nar\/gkf578","article-title":"Additivity in protein-DNA interactions: how good an approximation is it?","volume":"30","author":"Benos","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023062614325930300_btt221-B7","doi-asserted-by":"crossref","first-page":"1429","DOI":"10.1038\/nbt1246","article-title":"Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities","volume":"24","author":"Berger","year":"2006","journal-title":"Nat. Biotechnol."},{"key":"2023062614325930300_btt221-B8","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1038\/nprot.2008.195","article-title":"Universal protein-binding microarrays for the comprehensive characterization of the DNA binding specificities of transcription factors","volume":"4","author":"Berger","year":"2009","journal-title":"Nat. 
Protoc."},{"key":"2023062614325930300_btt221-B9","doi-asserted-by":"crossref","first-page":"1255","DOI":"10.1093\/nar\/30.5.1255","article-title":"Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors","volume":"30","author":"Bulyk","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023062614325930300_btt221-B10","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1214\/009053604000000067","article-title":"Least angle regression","volume":"32","author":"Efron","year":"2004","journal-title":"Ann. Stat."},{"key":"2023062614325930300_btt221-B11","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"ENCODE Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023062614325930300_btt221-B12","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1016\/j.devcel.2012.01.015","article-title":"Genetic and epigenetic determinants of neurogenesis and myogenesis","volume":"22","author":"Fong","year":"2012","journal-title":"Dev. 
Cell"},{"key":"2023062614325930300_btt221-B13","doi-asserted-by":"crossref","first-page":"2090","DOI":"10.1101\/gr.094144.109","article-title":"Distinguishing direct versus indirect transcription factor-DNA interactions","volume":"19","author":"Gord\u00e2n","year":"2009","journal-title":"Genome Res."},{"key":"2023062614325930300_btt221-B14","doi-asserted-by":"crossref","first-page":"R125","DOI":"10.1186\/gb-2011-12-12-r125","article-title":"Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights","volume":"12","author":"Gord\u00e2n","year":"2011","journal-title":"Genome Biol."},{"key":"2023062614325930300_btt221-B15","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/j.celrep.2013.03.014","article-title":"Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape","volume":"3","author":"Gord\u00e2n","year":"2013","journal-title":"Cell Rep."},{"key":"2023062614325930300_btt221-B16","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1038\/nature02800","article-title":"Transcriptional regulatory code of a eukaryotic genome","volume":"431","author":"Harbison","year":"2004","journal-title":"Nature"},{"key":"2023062614325930300_btt221-B17","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1186\/1752-0509-6-145","article-title":"TIGRESS: Trustful inference of gene regulation using stability selection","volume":"6","author":"Haury","year":"2012","journal-title":"BMC Syst. Biol."},{"key":"2023062614325930300_btt221-B18","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1042\/BJ20111768","article-title":"The crystal structure of the Sox4 HMG domain-DNA complex suggests a mechanism for positional interdependence in DNA recognition","volume":"443","author":"Jauch","year":"2012","journal-title":"Biochem. 
J."},{"key":"2023062614325930300_btt221-B19","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1101\/gr.100552.109","article-title":"Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities","volume":"20","author":"Jolma","year":"2010","journal-title":"Genome Res."},{"key":"2023062614325930300_btt221-B20","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/j.cell.2012.12.009","article-title":"DNA binding specificities of human transcription factors","volume":"152","author":"Jolma","year":"2013","journal-title":"Cell"},{"key":"2023062614325930300_btt221-B21","doi-asserted-by":"crossref","first-page":"e1001290","DOI":"10.1371\/journal.pgen.1001290","article-title":"Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development","volume":"7","author":"Kaplan","year":"2011","journal-title":"PLoS Genet."},{"key":"2023062614325930300_btt221-B22","doi-asserted-by":"crossref","first-page":"27116","DOI":"10.1074\/jbc.M403818200","article-title":"Cbf1p is required for chromatin remodeling at promoter-proximal CACGTG motifs in yeast","volume":"279","author":"Kent","year":"2004","journal-title":"J. Biol. 
Chem."},{"key":"2023062614325930300_btt221-B23","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.cell.2012.08.026","article-title":"Transcriptional amplification in tumor cells with elevated c-Myc","volume":"151","author":"Lin","year":"2012","journal-title":"Cell"},{"key":"2023062614325930300_btt221-B24","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1186\/1471-2105-7-113","article-title":"An improved map of conserved regulatory sites for Saccharomyces cerevisiae","volume":"7","author":"MacIsaac","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023062614325930300_btt221-B25","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1126\/science.1131007","article-title":"A systems approach to measuring the binding energy landscapes of transcription factors","volume":"315","author":"Maerkl","year":"2007","journal-title":"Science"},{"key":"2023062614325930300_btt221-B26","first-page":"1","article-title":"Feature selection for support vector regression via kernel penalization","volume-title":"IJCNN 2010","author":"Maldonado","year":"2010"},{"key":"2023062614325930300_btt221-B27","doi-asserted-by":"crossref","first-page":"2471","DOI":"10.1093\/nar\/29.12.2471","article-title":"Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay","volume":"29","author":"Man","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023062614325930300_btt221-B28","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1111\/j.1467-9868.2010.00740.x","article-title":"Stability selection","volume":"72","author":"Meinshausen","year":"2010","journal-title":"J. R. Stat. Soc. Ser. 
B"},{"key":"2023062614325930300_btt221-B29","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1007\/978-3-642-37195-0_12","article-title":"Distinguishing between genomic regions bound by paralogous transcription factors","volume":"7821","author":"Munteanu","year":"2013","journal-title":"Recomb2013. Lect. Notes Comp. Sci."},{"key":"2023062614325930300_btt221-B30","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1016\/j.patcog.2009.09.003","article-title":"Optimal feature selection for support vector machines","volume":"43","author":"Nguyen","year":"2010","journal-title":"Pattern Recogn."},{"key":"2023062614325930300_btt221-B31","doi-asserted-by":"crossref","first-page":"2646","DOI":"10.1128\/MCB.15.5.2646","article-title":"The GCR1 requirement for yeast glycolytic gene expression is suppressed by dominant mutations in the SGC1 gene, which encodes a novel basic-helix-loop-helix protein","volume":"15","author":"Nishi","year":"1995","journal-title":"Mol. Cell. Biol."},{"key":"2023062614325930300_btt221-B32","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1101\/gr.112623.110","article-title":"Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data","volume":"21","author":"Pique-Regi","year":"2011","journal-title":"Genome Res."},{"key":"2023062614325930300_btt221-B33","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1146\/annurev-biochem-060408-091030","article-title":"Origins of specificity in protein-DNA recognition","volume":"79","author":"Rohs","year":"2010","journal-title":"Annu. Rev. Biochem."},{"key":"2023062614325930300_btt221-B34","doi-asserted-by":"crossref","first-page":"e1000154","DOI":"10.1371\/journal.pcbi.1000154","article-title":"A feature-based approach to modeling protein-DNA interactions","volume":"4","author":"Sharon","year":"2008","journal-title":"PLoS Comput. 
Biol."},{"key":"2023062614325930300_btt221-B35","doi-asserted-by":"crossref","first-page":"e9722","DOI":"10.1371\/journal.pone.0009722","article-title":"Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix","volume":"5","author":"Siddharthan","year":"2010","journal-title":"PLoS One"},{"key":"2023062614325930300_btt221-B36","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1023\/B:STCO.0000035301.49549.88","article-title":"A tutorial on support vector regression","volume":"14","author":"Smola","year":"2004","journal-title":"Stat. Comput."},{"issue":"1 Pt 2","key":"2023062614325930300_btt221-B37","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1093\/nar\/12.1Part2.505","article-title":"Computer methods to locate signals in nucleic acid sequences","volume":"12","author":"Staden","year":"1984","journal-title":"Nucleic Acids Res."},{"key":"2023062614325930300_btt221-B38","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1093\/bioinformatics\/16.1.16","article-title":"DNA binding sites: representation and discovery","volume":"16","author":"Stormo","year":"2000","journal-title":"Bioinformatics"},{"key":"2023062614325930300_btt221-B39","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. 
B"},{"key":"2023062614325930300_btt221-B40","doi-asserted-by":"crossref","first-page":"933","DOI":"10.1093\/bioinformatics\/btm055","article-title":"Position dependencies in transcription factor binding sites","volume":"23","author":"Tomovic","year":"2007","journal-title":"Bioinformatics"},{"key":"2023062614325930300_btt221-B41","volume-title":"Statistical Learning Theory","author":"Vapnik","year":"1998"},{"key":"2023062614325930300_btt221-B42","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1038\/nbt.2486","article-title":"Evaluation of methods for modeling transcription-factor sequence specificity","volume":"31","author":"Weirauch","year":"2013","journal-title":"Nat. Biotechnol."},{"key":"2023062614325930300_btt221-B43","doi-asserted-by":"crossref","first-page":"W389","DOI":"10.1093\/nar\/gki439","article-title":"enoLOGOS: a versatile web tool for energy normalized sequence logos","volume":"33","author":"Workman","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023062614325930300_btt221-B44","first-page":"343","article-title":"Feature selection for support vector regression using probabilistic prediction","volume-title":"ACM SIGKDD","author":"Yang","year":"2010"},{"key":"2023062614325930300_btt221-B45","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1038\/nbt.1893","article-title":"Quantitative analysis demonstrates most transcription factors require only simple models of specificity","volume":"29","author":"Zhao","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"2023062614325930300_btt221-B46","doi-asserted-by":"crossref","first-page":"e1000590","DOI":"10.1371\/journal.pcbi.1000590","article-title":"Inferring binding energies from selected binding sites","volume":"5","author":"Zhao","year":"2009","journal-title":"PLoS Comput. 
Biol."},{"key":"2023062614325930300_btt221-B47","doi-asserted-by":"crossref","first-page":"781","DOI":"10.1534\/genetics.112.138685","article-title":"Improved models for transcription factor binding site identification using nonindependent interactions","volume":"191","author":"Zhao","year":"2012","journal-title":"Genetics"},{"key":"2023062614325930300_btt221-B48","doi-asserted-by":"crossref","first-page":"909","DOI":"10.1093\/bioinformatics\/bth006","article-title":"Modeling within-motif dependence for transcription factor binding site predictions","volume":"20","author":"Zhou","year":"2004","journal-title":"Bioinformatics"},{"key":"2023062614325930300_btt221-B49","doi-asserted-by":"crossref","first-page":"826","DOI":"10.1016\/j.molcel.2011.05.025","article-title":"Integrated approaches reveal determinants of genome-wide binding and function of the transcription factor Pho4","volume":"42","author":"Zhou","year":"2011","journal-title":"Mol. Cell"},{"key":"2023062614325930300_btt221-B50","doi-asserted-by":"crossref","first-page":"556","DOI":"10.1101\/gr.090233.108","article-title":"High-resolution DNA binding specificity analysis of yeast transcription factors","volume":"19","author":"Zhu","year":"2009","journal-title":"Genome Res."},{"key":"2023062614325930300_btt221-B51","doi-asserted-by":"crossref","first-page":"7046","DOI":"10.1073\/pnas.88.16.7046","article-title":"Static and statistical bending of DNA evaluated by Monte Carlo simulations","volume":"88","author":"Zhurkin","year":"1991","journal-title":"Proc. Natl Acad. Sci. 
USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/13\/i117\/50703570\/bioinformatics_29_13_i117.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/13\/i117\/50703570\/bioinformatics_29_13_i117.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,13]],"date-time":"2024-05-13T02:35:31Z","timestamp":1715567731000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/13\/i117\/192242"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,6,19]]},"references-count":51,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2013,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt221","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2013,7]]},"published":{"date-parts":[[2013,6,19]]}}}