{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T10:00:33Z","timestamp":1704189633808},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2010,12]]},"abstract":"Abstract\n\nBackground\nWhile multiple alignment is the first step of the usual classification schemes for biological sequences, alignment-free methods are increasingly used as alternatives when multiple alignments fail. Subword-based combinatorial methods are popular for their low algorithmic complexity (suffix trees ...) or exhaustivity (motif search), generally with a fixed word length and\/or a fixed number of mismatches. We previously developed a method to detect local similarities (the N-local decoding) based on the occurrences of repeated subwords of fixed length, which does not impose a fixed number of mismatches. The resulting similarities are, for some \"good\" values of N, sufficiently relevant to form the basis of a reliable alignment-free classification. The aim of this paper is to develop a method that uses the similarities detected by N-local decoding without imposing a fixed value of N. We present a procedure that selects an adaptive value of N for every position in the sequences, and we implement it as the MS4 classification tool.\n\nResults\nAmong the equivalence classes produced by the N-local decodings for all N, we select a (relatively) small number of \"relevant\" classes corresponding to variable-length subwords that carry enough information to perform the classification. 
The parameter N, whose correct values are data-dependent and thus hard to guess, is replaced here by the average repetitivity \u03ba of the sequences. We show that our approach yields classifications of several sets of HIV\/SIV sequences that agree with the accepted taxonomy, even on repetitive regions that are usually discarded (such as the non-coding part of the LTR).\n\nConclusions\nMS4 satisfactorily classifies a set of sequences that are notoriously hard to align. This suggests that our approach forms the basis of a reliable alignment-free classification tool. The only parameter of MS4, \u03ba, appears to give reasonable results even at its default value, which can be a great advantage for sequence sets for which little information is available.","DOI":"10.1186\/1471-2105-11-406","type":"journal-article","created":{"date-parts":[[2010,7,30]],"date-time":"2010-07-30T18:59:45Z","timestamp":1280516385000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["MS4 - Multi-Scale Selector of Sequence Signatures: An alignment-free method for classification of biological sequences"],"prefix":"10.1186","volume":"11","author":[{"given":"Eduardo","family":"Corel","sequence":"first","affiliation":[]},{"given":"Florian","family":"Pitschi","sequence":"additional","affiliation":[]},{"given":"Ivan","family":"Laprevotte","sequence":"additional","affiliation":[]},{"given":"Gilles","family":"Grasseau","sequence":"additional","affiliation":[]},{"given":"Gilles","family":"Didier","sequence":"additional","affiliation":[]},{"given":"Claudine","family":"Devauchelle","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,7,30]]},"reference":[{"key":"3863_CR1","first-page":"87","volume-title":"RECOMB-CG Proceedings","author":"B 
Haubold","year":"2008","unstructured":"Haubold B, Domazet-Loso M, Wiehe T: Alignment-free distance measure for closely related genomes. RECOMB-CG Proceedings 2008, 87\u201399."},{"issue":"18","key":"3863_CR2","doi-asserted-by":"publisher","first-page":"2224","DOI":"10.1093\/bioinformatics\/btl376","volume":"22","author":"T Lingner","year":"2006","unstructured":"Lingner T, Meinicke P: Remote homology detection based on oligomer distances. Bioinformatics 2006, 22(18):2224\u20132231. 10.1093\/bioinformatics\/btl376","journal-title":"Bioinformatics"},{"key":"3863_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1142\/S0219720004000442","volume":"2","author":"B Hao","year":"2004","unstructured":"Hao B, Qi J: Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance. J Bioinform Comput Biol 2004, 2: 1\u201319. 10.1142\/S0219720004000442","journal-title":"J Bioinform Comput Biol"},{"issue":"Suppl 6","key":"3863_CR4","doi-asserted-by":"publisher","first-page":"S15","DOI":"10.1186\/1471-2105-9-S6-S15","volume":"9","author":"G Lu","year":"2008","unstructured":"Lu G, Zhang S, Fang X: An improved string composition method for sequence comparison. BMC Bioinformatics 2008, 9(Suppl 6):S15. 10.1186\/1471-2105-9-S6-S15","journal-title":"BMC Bioinformatics"},{"key":"3863_CR5","doi-asserted-by":"publisher","first-page":"2369","DOI":"10.1093\/nar\/27.11.2369","volume":"27","author":"AL Delcher","year":"1999","unstructured":"Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL: Alignment of whole genomes. Nucl Acids Res 1999, 27: 2369\u20132376. 10.1093\/nar\/27.11.2369","journal-title":"Nucl Acids Res"},{"key":"3863_CR6","doi-asserted-by":"publisher","first-page":"S312","DOI":"10.1093\/bioinformatics\/18.suppl_1.S312","volume":"18","author":"M H\u00f6hl","year":"2002","unstructured":"H\u00f6hl M, Kurtz S, Ohlebusch E: Efficient multiple genome alignment. 
Bioinformatics 2002, 18: S312\u2013S320.","journal-title":"Bioinformatics"},{"key":"3863_CR7","doi-asserted-by":"publisher","first-page":"R12","DOI":"10.1186\/gb-2004-5-2-r12","volume":"5","author":"S Kurtz","year":"2004","unstructured":"Kurtz S, Phillippy A, Delcher AL, Shumway M, Antonescu C, Salzberg SL: Versatile and open software for comparing large genomes. Genome Biol 2004, 5: R12. 10.1186\/gb-2004-5-2-r12","journal-title":"Genome Biol"},{"key":"3863_CR8","doi-asserted-by":"publisher","first-page":"1394","DOI":"10.1101\/gr.2289704","volume":"14","author":"A Darling","year":"2004","unstructured":"Darling A, Mau B, Blattner FR, Perna NT: Mauve: multiple alignment of conserved genomic sequences with rearrangements. Genome Res 2004, 14: 1394\u20131403. 10.1101\/gr.2289704","journal-title":"Genome Res"},{"key":"3863_CR9","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1016\/S0304-3975(97)00122-9","volume":"215","author":"G Didier","year":"1999","unstructured":"Didier G: Caract\u00e9risation des N-\u00e9critures et application \u00e0 l'\u00e9tude des suites de complexit\u00e9 ultimement n +cste. Theor Comput Sci 1999, 215: 31\u201349. 10.1016\/S0304-3975(97)00122-9","journal-title":"Theor Comput Sci"},{"key":"3863_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-8-1","volume":"8","author":"G Didier","year":"2007","unstructured":"Didier G, Debomy L, Pupin M, Zhang M, Grossmann A, Devauchelle C, Laprevotte I: Comparing sequences without using alignments: application to HIV\/SIV subtyping. BMC Bioinformatics 2007, 8: 1. 10.1186\/1471-2105-8-1","journal-title":"BMC Bioinformatics"},{"key":"3863_CR11","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1093\/molbev\/msj030","volume":"23","author":"DH Huson","year":"2006","unstructured":"Huson DH, Bryant D: Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 2006, 23: 254\u2013267. 
10.1093\/molbev\/msj030","journal-title":"Mol Biol Evol"},{"key":"3863_CR12","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1093\/molbev\/msh018","volume":"21","author":"D Bryant","year":"2004","unstructured":"Bryant D, Moulton V: NeighborNet: an agglomerative algorithm for the construction of planar phylogenetic networks. Mol Biol Evol 2004, 21: 255\u2013265. 10.1093\/molbev\/msh018","journal-title":"Mol Biol Evol"},{"key":"3863_CR13","doi-asserted-by":"publisher","first-page":"1465","DOI":"10.1089\/cmb.2006.13.1465","volume":"13","author":"G Didier","year":"2006","unstructured":"Didier G, Laprevotte I, Pupin M, H\u00e9naut A: Local decoding of sequences and alignment-free comparison. J Comput Biol 2006, 13: 1465\u20131476. 10.1089\/cmb.2006.13.1465","journal-title":"J Comput Biol"},{"key":"3863_CR14","first-page":"27","volume-title":"Computational and Evolutionary Analysis of HIV Molecular Sequences","author":"CL Kuiken","year":"2001","unstructured":"Kuiken CL, Leitner T: HIV-1 Subtyping. In Computational and Evolutionary Analysis of HIV Molecular Sequences. Edited by: Rodrigo AG, Learn GHJ. 
Kluwer Academic Publishers; 2001:27\u201353."},{"key":"3863_CR15","unstructured":"HIV and SIV Nomenclature[http:\/\/www.hiv.lanl.gov\/content\/sequence\/HelpDocs\/subtypes-more.html]"},{"key":"3863_CR16","unstructured":"Los Alamos HIV sequence database[http:\/\/hiv-web.lanl.gov\/]"},{"key":"3863_CR17","unstructured":"HIV-1\/HIV-2\/SIV Complete Genomes[http:\/\/www.hiv.lanl.gov\/content\/sequence\/HIV\/COMPENDIUM\/2000\/HIV12SIVcomplete.pdf]"},{"key":"3863_CR18","doi-asserted-by":"publisher","first-page":"1231","DOI":"10.1093\/oxfordjournals.molbev.a003909","volume":"18","author":"I Laprevotte","year":"2001","unstructured":"Laprevotte I, Pupin M, Coward E, Didier G, Terzian C, Devauchelle C, H\u00e9naut A: HIV-1 and HIV-2 nucleotide sequences: assessment of the alignment by N-block presentation, \"retroviral signatures\" of overrepeated oligonucleotides, and probable important role of scrambled stepwise duplications\/deletions in molecular evolution. Mol Biol Evol 2001, 18: 1231\u20131245.","journal-title":"Mol Biol Evol"},{"issue":"1","key":"3863_CR19","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1186\/1748-7188-1-6","volume":"1","author":"B Morgenstern","year":"2006","unstructured":"Morgenstern B, Prohaska S, P\u00f6hler D, Stadler PF: Multiple sequence alignment with user-defined anchor points. Algorithms for Molecular Biology 2006, 1(1):6. 10.1186\/1748-7188-1-6","journal-title":"Algorithms for Molecular Biology"},{"key":"3863_CR20","doi-asserted-by":"publisher","first-page":"1189","DOI":"10.1093\/bioinformatics\/btp033","volume":"25","author":"AM Waterhouse","year":"2009","unstructured":"Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ: Jalview Version 2 - a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009, 25: 1189\u20131191. 
10.1093\/bioinformatics\/btp033","journal-title":"Bioinformatics"},{"key":"3863_CR21","volume-title":"BMC Bioinformatics","author":"F Pitschi","year":"2010","unstructured":"Pitschi F, Devauchelle C, Corel E: Automatic detection of anchor points for multiple alignment. BMC Bioinformatics 2010, in press."},{"key":"3863_CR22","unstructured":"Jalview Download Page[http:\/\/www.jalview.org\/download.html]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-406.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T05:29:37Z","timestamp":1630474177000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-11-406"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,7,30]]},"references-count":22,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["3863"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-11-406","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,7,30]]},"assertion":[{"value":"20 October 2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 July 2010","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 July 2010","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"406"}}