Abstract
Recently a new variation of approximate Boyer-Moore string matching was presented for the k-mismatch problem. This variation was developed for gene sequences. We further tuned this algorithm gaining speedups in both preprocessing and search times. Our preprocessing has lower time complexity than the previous algorithm and our experiments show that our algorithm is over 30% faster than the previous one. We also present two variations of the algorithm for the k-difference problem.
Work supported by Academy of Finland.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Arlazarova, V., Dinic, E., Kronrod, M., Faradzev, I.: On economic construction of the transitive closure of a directed graph. Doklady Academi Nauk SSSR 194, 487–488 (1970) (in Russian) (English translation in Soviet Mathematics Doklady 11, 1209–1210 (1975))
Baeza-Yates, R., Gonnet, G.: A new approach to text searching. Communications of the ACM 35(10), 74–82 (1992)
Baeza-Yates, R., Gonnet, G.: Fast string matching with mismatches. Information and Computation 108(2), 187–199 (1994)
Baeza-Yates, R.A., Perleberg, C.H.: Fast and practical approximate string matching. Information Processing Letters 59(1), 21–27 (1996)
Boyer, R., Moore, J.: A fast string searching algorithm. Communications of the ACM 10(20), 762–772 (1977)
El-Mabrouk, N., Crochemore, M.: Boyer-Moore strategy to efficient approximate string matching. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 24–38. Springer, Heidelberg (1996)
Fredriksson, K., Navarro, G.: Average-optimal single and multiple approximate string matching. ACM Journal of Experimental Algorithmics 9, 1–47 (2004)
Horspool, N.: Practical fast searching in strings. Software Practice & Experience 10, 501–506 (1980)
Liu, Z., Chen, X., Borneman, J., Jiang, T.: A fast algorithm for approximate string matching on gene sequences. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 79–90. Springer, Heidelberg (2005)
Masek, W., Paterson, M.: A faster algorithm for computing string edit distances. Journal of Computer and System Sciences 20, 18–31 (1980)
Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM 46(3), 395–415 (1999)
Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1), 31–88 (2001)
Navarro, G., Sutinen, E., Tanninen, J., Tarhio, J.: Indexing text with approximate q-grams. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 350–363. Springer, Heidelberg (2000)
Tarhio, J., Ukkonen, E.: Approximate Boyer-Moore string matching. SIAM Journal on Computing 22, 243–260 (1993)
Wu, S., Manber, U., Myers, E.: A subquadratic algorithm for approximate limited expression matching. Algorithmica 15(1), 50–67 (1996)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kalsi, P., Salmela, L., Tarhio, J. (2007). Tuning Approximate Boyer-Moore for Gene Sequences. In: Ziviani, N., Baeza-Yates, R. (eds) String Processing and Information Retrieval. SPIRE 2007. Lecture Notes in Computer Science, vol 4726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75530-2_16
Download citation
DOI: https://doi.org/10.1007/978-3-540-75530-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75529-6
Online ISBN: 978-3-540-75530-2
eBook Packages: Computer ScienceComputer Science (R0)