{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T11:15:44Z","timestamp":1742382944645,"version":"3.32.0"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation of prokaryotic TIS. However, inherent difficulties of these approaches arise from the considerable variation of TIS characteristics across different species. Therefore prior assumptions about the properties of prokaryotic gene starts may cause suboptimal predictions for newly sequenced genomes with TIS signals differing from those of well-investigated genomes.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We introduce a clustering algorithm for completely unsupervised scoring of potential TIS, based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation. As compared with other methods for improving predictions of gene starts in bacterial genomes, our approach is not based on any specific assumptions about prokaryotic TIS. Despite the generality of the underlying algorithm, the prediction rate of our method is competitive on experimentally verified test data from <jats:italic>E. coli<\/jats:italic> and <jats:italic>B. subtilis<\/jats:italic>. Regarding genomes with high G+C content, in contrast to some previously proposed methods, our algorithm also provides good performance on <jats:italic>P. 
aeruginosa<\/jats:italic>, <jats:italic>B. pseudomallei<\/jats:italic> and <jats:italic>R. solanacearum<\/jats:italic>.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>On reliable test data we showed that our method provides good results in post-processing the predictions of the widely-used program GLIMMER. The underlying clustering algorithm is robust with respect to variations in the initial TIS annotation and does not require specific assumptions about prokaryotic gene starts. These features are particularly useful on genomes with high G+C content. The algorithm has been implemented in the tool \u00bbTICO\u00ab (TIs COrrector) which is publicly available from our web site.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-7-121","type":"journal-article","created":{"date-parts":[[2006,3,10]],"date-time":"2006-03-10T19:19:59Z","timestamp":1142018399000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["An unsupervised classification scheme for improving predictions of prokaryotic TIS"],"prefix":"10.1186","volume":"7","author":[{"given":"Maike","family":"Tech","sequence":"first","affiliation":[]},{"given":"Peter","family":"Meinicke","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2006,3,9]]},"reference":[{"issue":"23","key":"860_CR1","doi-asserted-by":"publisher","first-page":"4636","DOI":"10.1093\/nar\/27.23.4636","volume":"27","author":"AL Delcher","year":"1999","unstructured":"Delcher AL, Harmon D, Kasif S, White O, Salzberg SL: Improved microbial gene identification with GLIMMER. Nucleic Acids Res 1999, 27(23):4636\u20134641. 10.1093\/nar\/27.23.4636","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"860_CR2","doi-asserted-by":"publisher","first-page":"1780","DOI":"10.1093\/nar\/gkg254","volume":"31","author":"FB Guo","year":"2003","unstructured":"Guo FB, Ou HY, Zhang CT: ZCURVE: A new system for recognizing protein-coding genes in bacterial and archaeal genomes. 
Nucleic Acids Res 2003, 31(6):1780\u20131789. 10.1093\/nar\/gkg254","journal-title":"Nucleic Acids Res"},{"issue":"12","key":"860_CR3","doi-asserted-by":"publisher","first-page":"2607","DOI":"10.1093\/nar\/29.12.2607","volume":"29","author":"J Besemer","year":"2001","unstructured":"Besemer J, Lomsadze A, Borodovsky M: GeneMarkS: A self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 2001, 29(12):2607\u20132618. 10.1093\/nar\/29.12.2607","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"860_CR4","doi-asserted-by":"publisher","first-page":"535","DOI":"10.1016\/j.biocel.2003.08.013","volume":"36","author":"HY Ou","year":"2004","unstructured":"Ou HY, Guo FB, Zhang CT: GS-Finder: A program to find bacterial gene start sites with a self-training method. The International Journal of Biochemistry & Cell Biology 2004, 36(3):535\u2013544. 10.1016\/j.biocel.2003.08.013","journal-title":"The International Journal of Biochemistry & Cell Biology"},{"issue":"18","key":"860_CR5","doi-asserted-by":"publisher","first-page":"3308","DOI":"10.1093\/bioinformatics\/bth390","volume":"20","author":"HQ Zhu","year":"2004","unstructured":"Zhu HQ, Hu GQ, Ouyang ZQ, Wang J, She ZS: Accuracy improvement for identifying translation initiation sites in microbial genomes. Bioinformatics 2004, 20(18):3308\u20133317. 10.1093\/bioinformatics\/bth390","journal-title":"Bioinformatics"},{"issue":"4","key":"860_CR6","first-page":"441","volume":"3","author":"M Tech","year":"2003","unstructured":"Tech M, Merkl R: YACOP: Enhanced gene prediction obtained by a combination of existing methods. In Silico Biology 2003, 3(4):441\u201351. 
[http:\/\/www.bioinfo.de\/isb\/2003\/03\/0037\/main.html]","journal-title":"In Silico Biology"},{"key":"860_CR7","first-page":"262","volume-title":"ISMB","author":"M Tompa","year":"1999","unstructured":"Tompa M: An exact method for finding short motifs in sequences, with application to the ribosome binding site problem. ISMB 1999, 262\u2013271."},{"issue":"17","key":"860_CR8","doi-asserted-by":"publisher","first-page":"3577","DOI":"10.1093\/nar\/27.17.3577","volume":"27","author":"SS Hannenhalli","year":"1999","unstructured":"Hannenhalli SS, Hayes WS, Hatzigeorgiou AG, Fickett JW: Bacterial start site prediction. Nucleic Acids Res 1999, 27(17):3577\u20133582. 10.1093\/nar\/27.17.3577","journal-title":"Nucleic Acids Res"},{"key":"860_CR9","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1016\/0097-8485(93)85004-V","volume":"17","author":"M Borodovsky","year":"1993","unstructured":"Borodovsky M, McIninch J: GenMark: Parallel gene recognition for both DNA strands. Comput Chem 1993, 17: 123\u2013133. 10.1016\/0097-8485(93)85004-V","journal-title":"Comput Chem"},{"issue":"12","key":"860_CR10","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1093\/bioinformatics\/17.12.1123","volume":"17","author":"BE Suzek","year":"2001","unstructured":"Suzek BE, Ermolaeva MD, Schreiber M, Salzberg SL: A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics 2001, 17(12):1123\u20131130. 10.1093\/bioinformatics\/17.12.1123","journal-title":"Bioinformatics"},{"key":"860_CR11","volume-title":"Bioinformatics","author":"M Tech","year":"2005","unstructured":"Tech M, Pfeifer N, Morgenstern B, Meinicke P: TICO: A tool for improving predictions of prokaryotic translation initiation sites. Bioinformatics 2005. 
[http:\/\/bioinformatics.oxfordjournals.org\/cgi\/content\/abstract\/21\/17\/3568]"},{"key":"860_CR12","doi-asserted-by":"crossref","unstructured":"Meinicke P, Tech M, Morgenstern B, Merkl R: Oligo Kernels for datamining on biological sequences: A case study on prokaryotic translation initiation sites. BMC Bioinformatics 2004., 5(169): [http:\/\/www.biomedcentral.com\/1471\u20132105\/5\/169\/abstract]","DOI":"10.1186\/1471-2105-5-169"},{"key":"860_CR13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","volume":"39","author":"AP Dempster","year":"1977","unstructured":"Dempster AP, Laird NM, Rubin DB: Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society Series B 1977, 39: 1\u201338.","journal-title":"Journal of the Royal Statistical Society Series B"},{"issue":"8","key":"860_CR14","doi-asserted-by":"publisher","first-page":"2115","DOI":"10.1162\/089976698300016990","volume":"10","author":"A Utsugi","year":"1998","unstructured":"Utsugi A: Density Estimation by Mixture Models with Smoothing Priors. Neural Computation 1998, 10(8):2115\u20132135. 10.1162\/089976698300016990","journal-title":"Neural Computation"},{"key":"860_CR15","volume-title":"Signal Detection Theory and ROC Analysis","author":"JP Egan","year":"1975","unstructured":"Egan JP: Signal Detection Theory and ROC Analysis. New York: Academic Press; 1975."},{"issue":"4857","key":"860_CR16","doi-asserted-by":"publisher","first-page":"1285","DOI":"10.1126\/science.3287615","volume":"240","author":"JA Swets","year":"1988","unstructured":"Swets JA: Measuring the accuracy of diagnostic systems. Science 1988, 240(4857):1285\u20131293. 10.1126\/science.3287615","journal-title":"Science"},{"key":"860_CR17","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1093\/nar\/28.1.60","volume":"28","author":"KE Rudd","year":"2000","unstructured":"Rudd KE: EcoGene: A genome sequence database for Escherichia coli K-12. 
Nucleic Acids Res 2000, 28: 60\u201364. 10.1093\/nar\/28.1.60","journal-title":"Nucleic Acids Res"},{"key":"860_CR18","doi-asserted-by":"publisher","first-page":"1259","DOI":"10.1002\/elps.1150180807","volume":"18","author":"AJ Link","year":"1997","unstructured":"Link AJ, Robinson K, Church GM: Comparing the predicted and observed properties encoded in the genome of Escherichia coli . Electrophoresis 1997, 18: 1259\u20131313. 10.1002\/elps.1150180807","journal-title":"Electrophoresis"},{"issue":"5331","key":"860_CR19","doi-asserted-by":"publisher","first-page":"1453","DOI":"10.1126\/science.277.5331.1453","volume":"277","author":"FR Blattner","year":"1997","unstructured":"Blattner FR, Plunkett GI, Bloch CA, Perna NT, Burland V, Riley M, Collodo-Vides J, Glasner DD, Rode CK, Mayhew GF, Gregor J, WDavis N, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y: Complete genome sequence of Escherichia coli K-12. Science 1997, 277(5331):1453\u20131474. 10.1126\/science.277.5331.1453","journal-title":"Science"},{"issue":"6657","key":"860_CR20","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1038\/36786","volume":"390","author":"F Kunst","year":"1997","unstructured":"Kunst F, Ogasawara N, Moszer L, Albertini AM, Alloni G, Azevedo V, Bertero MG, Bessieres P, Bolotin A, Borriss SB, Boursier L, Brans A, Braun M, Brignell SC, Bron S, Brouillet S, Bruschi CV, Caldwell B, Capuano V, Carter NM, Choi SK, Codani JJ, Connerton IF, Cummings NJ, Daniel RA, Denizot F, Devine KM, D\u00fcsterh\u00f6ft A, Ehrlich SD, Emmerson PT, Entian KD, Errington J, Fabret C, Ferrari E, Foulger D, Fritz C, Fujita M, Fujita Y, Fuma S, Galizzi A, Galleron N, Ghim SY, Glaser P, Goffeau A, Golightly EJ, Grandi G, Guiseppi G, Guy BJ, Haga K, Haiech J, Harwood CR, H\u00e9naut A, Hilbert H, Holsappel S, Hosono S, Hullo MF, Itaya M, Jones L, Joris B, Karamata D, Kasahara Y, Klaerr-Blanchard M, Klein C, Kobayashi Y, Koetter P, Koningstein G, Krogh S, Kumano M, Kurita K, Lapidus A, Lardinois S, Lauber 
J, Lazarevic V, Lee SM, Levine A, Liu H, Masuda S, Mau\u00ebl C, M\u00e9digue C, Medina N, Mellado RP, Mizuno M, Moest D, Nakai S, Noback M, Noone D, O'Reilly M, Ogawa K, Ogiwara A, Oudega B, Park SH, Parro V, Pohl TM, Portetelle D, Porwolli S, Prescott AM, Presecan E, Pujic P, Purnelle B, Rapoport G, Rey M, Reynolds S, Rieger M, Rivolta C, Rocha E, Roche B, Rose M, Sadaie Y, Sato T, Scanlan E, Schleich S, Schroeter R, Scoffone F, Sekiguchi J, Sekowska A, Seror SJ, Serror P, Shin BS, Soldo B, Sorokin A, Tacconi E, Takagi T, Takahashi H, Takemaru K, Takeuchi M, Tamakoshi A, Tanaka T, Terpstra P, Tognoni A, Tosato V, Uchiyama S, Vandenbol M, Vannier F, Vassarotti A, Viari A, Wambutt R, Wedler E, Wedler H, Weitzenegger T, Winters P, Wipat A, Yamamoto H, Yamane K, Yasumoto K, Yata K, Yoshida K, Yoshikawa HF, Zumstein E, Yoshikawa H, Danchin A: The complete genome sequence of the Gram-positive bacterium Bacillus subtilis . Nature 1997, 390(6657):249\u2013256. 10.1038\/36786","journal-title":"Nature"},{"key":"860_CR21","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1093\/dnares\/8.3.97","volume":"8","author":"T Yada","year":"2001","unstructured":"Yada T, Totoki Y, Takagi T, Nakai K: A novel bacterial gene-finding system with improved accuracy in locating start codon. DNA Res 2001, 8: 97\u2013106. 10.1093\/dnares\/8.3.97","journal-title":"DNA Res"},{"key":"860_CR22","unstructured":"Center of Theoretical Biology (CTB), Peking University[http:\/\/ctb.pku.edu.cn\/main\/SheGroup]"},{"key":"860_CR23","doi-asserted-by":"publisher","first-page":"959","DOI":"10.1038\/35023079","volume":"406","author":"K Stover","year":"2000","unstructured":"Stover K, Pham X, Erwin A, Mizoguchi S, Warrener P, Hickey M, Brinkman F, Hufnagle WO, Kowalik D, Lagrou M: Complete genome sequence of Pseudomonas aeruginosa PAO1: an opportunistic pathogen. 
Nature 2000, (406):959\u2013964.","journal-title":"Nature"},{"key":"860_CR24","unstructured":"Pseudomonas aeruginosa Community Annotation Project[http:\/\/pseudomonas.com\/]"},{"issue":"39","key":"860_CR25","doi-asserted-by":"publisher","first-page":"14240","DOI":"10.1073\/pnas.0403302101","volume":"101","author":"M Holden","year":"2004","unstructured":"Holden M, Titball R, Peacock S, Cerdeno-Tarraga A, Atkins T, Crossman L, Pitt T, Churcher C, Mungall K, Bentley S, Sebaihia M, Thomson N, Beacham NBI, Brooks K, Brown K, Brown N, Challis G, Cherevach I, Chillingworth T, Cronin A, Crossett B, Davis P, DeShazer D, Feltwell T, Fraser A, Hance Z, Hauser H, Holroyd S, Jagels K, Keith K, Moule MMS, Price C, Quail M, Rabbinowitsch E, Rutherford K, Sanders M, Simmonds M, Songsivilai S, Stevens K, Tumapa S, Vesaratchavest M, Yeats SWC, Barrell B, Oyston P, Parkhill J: From the cover: genomic plasticity of the causative agent of melioidosis, Burkholderia pseudomallei . Proc Natl Acad Sci USA 2004, 101(39):14240\u201314245. 10.1073\/pnas.0403302101","journal-title":"Proc Natl Acad Sci USA"},{"issue":"6871","key":"860_CR26","doi-asserted-by":"publisher","first-page":"497","DOI":"10.1038\/415497a","volume":"415","author":"M Salanoubat","year":"2002","unstructured":"Salanoubat M, Genin S, Artiguenave F, Gouzy J, Mangenot S, Arlat M, Billault A, Brottier P, Camus J, Cattolico L, Chandler M, Choisne N, Claudel-Renard C, Cunnac S, Demange N, Gaspin C, Lavie M, Moisan A, Robert C, Saurin W, Schiex T, Siguier P, Thebault P, Whalen M, Wincker P, Levy M, Weissenbach J, Boucher C: Genome sequence of the plant pathogen Ralstonia solanacearum . Nature 2002, 415(6871):497\u2013502. 
10.1038\/415497a","journal-title":"Nature"},{"key":"860_CR27","unstructured":"Sanger Trust Institute[http:\/\/www.sanger.ac.uk\/]"},{"key":"860_CR28","doi-asserted-by":"publisher","first-page":"3738","DOI":"10.1093\/nar\/gkg610","volume":"31","author":"T Schiex","year":"2003","unstructured":"Schiex T, Gouzy J, Moisan A, de Oliveira Y: FrameD: a flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences. Nucl Acids Res 2003, 31: 3738\u20133741. [http:\/\/www.toulouse.inra.fr\/FrameD.html\/] 10.1093\/nar\/gkg610","journal-title":"Nucl Acids Res"},{"key":"860_CR29","doi-asserted-by":"publisher","first-page":"1342","DOI":"10.1073\/pnas.71.4.1342","volume":"71","author":"J Shine","year":"1974","unstructured":"Shine J, Dalgarno L: The 3' terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc Natl Acad Sci 1974, (71):1342\u20131346. 10.1073\/pnas.71.4.1342","journal-title":"Proc Natl Acad Sci"},{"key":"860_CR30","doi-asserted-by":"publisher","first-page":"2941","DOI":"10.1093\/nar\/26.12.2941","volume":"26","author":"D Frishman","year":"1998","unstructured":"Frishman D, Mironov A, Mewes HW, Gelfand M: Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 1998, 26: 2941\u20132947. 
10.1093\/nar\/26.12.2941","journal-title":"Nucleic Acids Res"},{"key":"860_CR31","unstructured":"TICO web interface[http:\/\/tico.gobics.de\/]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-121.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,8]],"date-time":"2025-01-08T00:01:28Z","timestamp":1736294488000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-121"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,3,9]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["860"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-121","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2006,3,9]]},"assertion":[{"value":"28 November 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"121"}}