SimSearch: A New Variant of Dynamic Programming Based on Distance Series for Optimal and Near-Optimal Similarity Discovery in Biological Sequences | SpringerLink
Skip to main content

SimSearch: A New Variant of Dynamic Programming Based on Distance Series for Optimal and Near-Optimal Similarity Discovery in Biological Sequences

  • Conference paper
2nd International Workshop on Practical Applications of Computational Biology and Bioinformatics (IWPACBB 2008)

Part of the book series: Advances in Soft Computing ((AINSC,volume 49))

  • 707 Accesses

Summary

In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch is devoted to fulfil the binary similarity matrices by signalling the distances between occurrences of the same symbol. The scoring scheme is further applied, when analysed the maximal extension of the pattern. Employing bit parallelism to analyse the global similarity matrix’s upper triangle, the new methodology searches the sequence(s) for all the exact and approximate patterns in regular or reverse order. The algorithm accepts parameterization to work with greater seeds for near-optimal results. Performance tests show significant efficiency improvement over traditional optimal methods based on dynamic programming. Comparing the new algorithm’s efficiency against heuristic based methods, equalizing the required sensitivity, the proposed algorithm remains acceptable.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 17159
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 21449
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Altschul, S.F., Gish, W., Miller, W., Myers, E., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)

    Google Scholar 

  2. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)

    Article  Google Scholar 

  3. Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957)

    Google Scholar 

  4. Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982)

    Article  Google Scholar 

  5. Henikoff, Henikoff.: Amino Acid Substitution Matrices from Protein Blocks. Natl. Acad. Sci. USA 89, 10915 (1989)

    Article  Google Scholar 

  6. Huang, X., Miller, W.: A time-eficient, linear-space local similarity algorithm. Adv. Appl. Math. 12, 337–357 (1991)

    Article  MATH  MathSciNet  Google Scholar 

  7. José, M.V., Govezensky, T., Bobadilla, J.R.: Statistical properties of DNA sequences revisited: the role of inverse bilateral symmetry in bacterial chromosomes. Physica A: Statistical Mechanics and its Applications 351, 477–498 (2005)

    Article  Google Scholar 

  8. Kolpakov, R., Bana, G., Kucherov, G.: mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. 31, 3672–3678 (2003)

    Article  Google Scholar 

  9. Kruskal, J.B.: An overview of sequence comparison. Addison Wesley, Reading (1983)

    Google Scholar 

  10. Lefebvre, A., Lecroq, T., Dauchel, H., Alexandre, J.: FORRepeats: detects repeats on entire chromosomes and between genomes. Bioinformatics 19, 319–326 (2002)

    Article  Google Scholar 

  11. Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunter II: Highly Sensitive and Fast Homology Search. J. Bioinform. Comput. Biol. 2, 417–439 (2004)

    Article  Google Scholar 

  12. Ma, B., Tromp, J., Li, M.: Pattern Hunter: fast and more sensitive homology search. Bioinformatics 18, 440–445 (2002)

    Article  Google Scholar 

  13. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443–453 (1970)

    Article  Google Scholar 

  14. Pearson, W.R.: Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 11, 635–650 (1991)

    Article  Google Scholar 

  15. Peltola, H., Tarhio, J.: Alternative Algorithms for Bit-Parallel String Matching. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 80–93. Springer, Heidelberg (2003)

    Google Scholar 

  16. Sanchez, F., Salami, E., Ramirez, A., Valero, M.: Performance Analysis of Sequence Alignment Applications. In: IEEE International Symposium on Workload Characterization, pp. 51–60 (2006)

    Google Scholar 

  17. Schmidt, T., Heslop-Harrison, J.S.: Genomes genes and junk: the large-scale organization of plant chromosomes. Trends Plant Sci. 3, 195–199 (1998)

    Article  Google Scholar 

  18. Sellers, P.H.: On the theory and computation of evolutionary distances. SIAM J. Appl. Math. 26, 787–793 (1974)

    Article  MATH  MathSciNet  Google Scholar 

  19. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197 (1981)

    Article  Google Scholar 

  20. Teodorescu, H.-N., Fira, L.-I.: Analysis of the predictability of time series obtained from genomic sequences by using several predictors. Journal of Intelligent and Fuzzy Systems 19, 51–63 (2008)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Juan M. Corchado Juan F. De Paz Miguel P. Rocha Florentino Fernández Riverola

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deusdado, S.A.D., Carvalho, P.M.M. (2009). SimSearch: A New Variant of Dynamic Programming Based on Distance Series for Optimal and Near-Optimal Similarity Discovery in Biological Sequences. In: Corchado, J.M., De Paz, J.F., Rocha, M.P., Fernández Riverola, F. (eds) 2nd International Workshop on Practical Applications of Computational Biology and Bioinformatics (IWPACBB 2008). Advances in Soft Computing, vol 49. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85861-4_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-85861-4_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85860-7

  • Online ISBN: 978-3-540-85861-4

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics