Abstract
A novel algorithm to find all sufficiently long repeating nucleotide substrings in one or several DNA sequences is proposed. The algorithm searches approximately matching strings very fast with given level of local error density. Some biological applications illustrate the method.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Alsmadi, I., Nuser, M.: String matching evaluation methods for DNA comparison. Int. J. Adv. Sci. Technol. 47, 13–32 (2012)
Altschul, S.F., Gish, W., Miller, W., Meyers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. NAR 25(17), 3389–3402 (1997)
Bugaenko, N.N., Gorban, A.N., Sadovsky, M.G.: Maximum entropy method in analysis of genetic text and measurement of its information content. Open Syst. Inf. Dyn. 5(2), 265–278 (1998)
Canzar, S., Salzberg, S.L.: Short read mapping: an algorithmic tour. Proc. IEEE 105(3), 436–458 (2017)
Chegrane I., Belazzougui, D.: Simple, compact and robust approximate string dictionary. arXiv:1312.4678v2 [cs.DS] (2014)
Gonnet, G.H.: Some string matching problems from Bioinformatics which still need better solutions. J. Dis. Alg. 2(1), 3–15 (2004)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10(8), 707–710 (1966)
Maier, D.: The complexity of some problems on substrings and supersequences. J. ACM 25(2), 322–336 (1978)
Marçais, G., Delcher, A.L., Phillippy, A.M., Coston, R., Salzberg, S.L., Zimin, A.: MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14(1), e1005944 (2018)
Martin, R.R.: On the computation of edit distance functions. Dis. Math. 338(2), 291–305 (2015)
Pearson, W., Lipman, D.: Improved tools for biological sequence comparison. PNAS 85, 2444–2448 (1988)
Sadovsky, M.G.: Comparison of real frequencies of strings vs. the expected ones reveals the information capacity of macromoleculae. J. Biol. Phys. 29, 23 (2003)
Sadovsky, M.G.: Information capacity of nucleotide sequences and its applications. Bull. Math. Biol. 68, 156 (2006)
Tsarev, S.P., Sadovsky, M.G.: New error tolerant method for search of long repeats in DNA sequences. In: Boton-Fernandez, M., Martin-Vide, C., Santander-Jimenez, S., Vega-Rodríguez, M.A. (eds.) Algorithms for Computational Biology. LNCS, vol. 9702, pp. 171–182. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-38827-4_14
Acknowledgments
This study was supported by a research grant No. 14.Y26.31.0004 from the Government of the Russian Federation (M.G. Sadovsky) and the grant from Russian Ministry of Education and Science \(N^{o}\) 1.8591.2017/VU (S.P. Tsarev).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Tsarev, S.P., Senashova, M.Y., Sadovsky, M.G. (2018). Fast Algorithm for Vernier Search of Long Repeats in DNA Sequences with Bounded Error Density. In: Jansson, J., Martín-Vide, C., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2018. Lecture Notes in Computer Science(), vol 10849. Springer, Cham. https://doi.org/10.1007/978-3-319-91938-6_8
Download citation
DOI: https://doi.org/10.1007/978-3-319-91938-6_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91937-9
Online ISBN: 978-3-319-91938-6
eBook Packages: Computer ScienceComputer Science (R0)