Summary
In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch is devoted to fulfil the binary similarity matrices by signalling the distances between occurrences of the same symbol. The scoring scheme is further applied, when analysed the maximal extension of the pattern. Employing bit parallelism to analyse the global similarity matrix’s upper triangle, the new methodology searches the sequence(s) for all the exact and approximate patterns in regular or reverse order. The algorithm accepts parameterization to work with greater seeds for near-optimal results. Performance tests show significant efficiency improvement over traditional optimal methods based on dynamic programming. Comparing the new algorithm’s efficiency against heuristic based methods, equalizing the required sensitivity, the proposed algorithm remains acceptable.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Altschul, S.F., Gish, W., Miller, W., Myers, E., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)
Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982)
Henikoff, Henikoff.: Amino Acid Substitution Matrices from Protein Blocks. Natl. Acad. Sci. USA 89, 10915 (1989)
Huang, X., Miller, W.: A time-eficient, linear-space local similarity algorithm. Adv. Appl. Math. 12, 337–357 (1991)
José, M.V., Govezensky, T., Bobadilla, J.R.: Statistical properties of DNA sequences revisited: the role of inverse bilateral symmetry in bacterial chromosomes. Physica A: Statistical Mechanics and its Applications 351, 477–498 (2005)
Kolpakov, R., Bana, G., Kucherov, G.: mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. 31, 3672–3678 (2003)
Kruskal, J.B.: An overview of sequence comparison. Addison Wesley, Reading (1983)
Lefebvre, A., Lecroq, T., Dauchel, H., Alexandre, J.: FORRepeats: detects repeats on entire chromosomes and between genomes. Bioinformatics 19, 319–326 (2002)
Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunter II: Highly Sensitive and Fast Homology Search. J. Bioinform. Comput. Biol. 2, 417–439 (2004)
Ma, B., Tromp, J., Li, M.: Pattern Hunter: fast and more sensitive homology search. Bioinformatics 18, 440–445 (2002)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443–453 (1970)
Pearson, W.R.: Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 11, 635–650 (1991)
Peltola, H., Tarhio, J.: Alternative Algorithms for Bit-Parallel String Matching. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 80–93. Springer, Heidelberg (2003)
Sanchez, F., Salami, E., Ramirez, A., Valero, M.: Performance Analysis of Sequence Alignment Applications. In: IEEE International Symposium on Workload Characterization, pp. 51–60 (2006)
Schmidt, T., Heslop-Harrison, J.S.: Genomes genes and junk: the large-scale organization of plant chromosomes. Trends Plant Sci. 3, 195–199 (1998)
Sellers, P.H.: On the theory and computation of evolutionary distances. SIAM J. Appl. Math. 26, 787–793 (1974)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197 (1981)
Teodorescu, H.-N., Fira, L.-I.: Analysis of the predictability of time series obtained from genomic sequences by using several predictors. Journal of Intelligent and Fuzzy Systems 19, 51–63 (2008)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deusdado, S.A.D., Carvalho, P.M.M. (2009). SimSearch: A New Variant of Dynamic Programming Based on Distance Series for Optimal and Near-Optimal Similarity Discovery in Biological Sequences. In: Corchado, J.M., De Paz, J.F., Rocha, M.P., Fernández Riverola, F. (eds) 2nd International Workshop on Practical Applications of Computational Biology and Bioinformatics (IWPACBB 2008). Advances in Soft Computing, vol 49. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85861-4_25
Download citation
DOI: https://doi.org/10.1007/978-3-540-85861-4_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85860-7
Online ISBN: 978-3-540-85861-4
eBook Packages: EngineeringEngineering (R0)