Abstract
A new method to identify all sufficiently long repeating nucleotide substrings in one or several DNA sequences is proposed. The method based on a specific gauge applied to DNA sequences that guarantees identification of the repeating substrings. The method allows the matching substrings to contain a given level of errors. The gauge is based on the development of a heavily sparse dictionary of repeats, thus drastically accelerating the search procedure. Some biological applications illustrate the method.
Similar content being viewed by others
References
Altschul, S.F., Gish, W., Miller, W., Meyers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
Bergroth, L., Hakonen, H., Raita, T.: A survey of longest common subsequence algorithms. In: SPIRE 2000, pp. 39–48. IEEE Computer Society (2000)
Chen, G.L., Chang, Y.J., Hsueh, C.H.: PRAP: an ab initio software package for automated genome-wide analysis of DNA repeats for prokaryotes. Bioinformatics. 29(21), 2683–2689 (2013)
Erciyes, K.: Distributed and Sequential Algorithms for Bioinformatics. Computational Biology, vol. 23, p. 367. Springer, Cham (2015)
Girgis, H.Z.: Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinform. 16, 227 (2015)
Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM 18(6), 341–343 (1975)
Lian, S., Chen, X., Wang, P., Zhang, X., Dai, X.: A complete and accurate Ab initio repeat finding algorithm. Interdiscip. Sci. 8(1), 75–83 (2016)
Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25(2), 322–336 (1978). ACM Press
Masek, W.J., Paterson, M.S.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)
Novák, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J.: RepeatExplorer: a galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29(6), 792–793 (2013)
Pearson, W., Lipman, D.: Improved tools for biological sequence comparison. PNAS USA 85, 2444–2448 (1988)
Sadovsky, M.G.: Information capacity of nucleotide sequences and its applications. Bull. Math. Biology. 68, 156 (2006)
Acknowledgments
This study was supported by a research grant No. 14.Y26.31.0004 from the Government of the Russian Federation (M.G. Sadovsky) and the grant from Russian Ministry of Education and Science to Siberian Federal University, contract No. 1.1462.2014/K (S.P. Tsarev).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Tsarev, S.P., Sadovsky, M.G. (2016). New Error Tolerant Method for Search of Long Repeats in DNA Sequences. In: Botón-Fernández, M., Martín-Vide, C., Santander-Jiménez, S., Vega-Rodríguez, M.A. (eds) Algorithms for Computational Biology. AlCoB 2016. Lecture Notes in Computer Science(), vol 9702. Springer, Cham. https://doi.org/10.1007/978-3-319-38827-4_14
Download citation
DOI: https://doi.org/10.1007/978-3-319-38827-4_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38826-7
Online ISBN: 978-3-319-38827-4
eBook Packages: Computer ScienceComputer Science (R0)