High Performance Genomic Sequencing: A Filtered Approach

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 294)

Abstract

Protein and DNA homology detection systems are an essential part of computational biology applications. These algorithms have evolved over time from dynamic programming approaches, which find the optimal local alignment between two sequences, to statistical approaches that use various heuristics to reduce the execution times of their predecessors. However, the continuously increasing size of input datasets is driving the adoption of High Performance Computing (HPC) hardware and software to address this problem. The aim of the research presented in this paper is to propose a new filtering methodology, based on general-purpose graphics processing units (GP-GPUs) and multi-core processors, that removes sequences considered irrelevant in terms of homology and similarity. The proposed methodology is completely independent of the homology detection algorithm. This is useful for researchers and practitioners because they do not need to learn a new algorithm. This design has been approved by the National Biotechnology Research Center of Spain (CNB).
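The abstract does not say which criterion the filter uses, so the following is only a minimal sketch under stated assumptions, not the authors' method. A hypothetical k-mer pre-filter (`kmer_prefilter`, with illustrative parameters `k` and `min_shared`) discards database sequences that share too few k-mers with the query, and only the survivors are passed to whatever pairwise aligner the caller supplies. Smith-Waterman local alignment (score only, linear gap penalty, made-up scoring values) stands in for the dynamic programming approach described above.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Illustrative scoring values (assumptions, not taken from the paper).
MATCH, MISMATCH, GAP = 2, -1, -2


def smith_waterman_score(a: str, b: str) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty, no traceback)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Clamping at 0 lets a new local alignment start at any cell.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_prefilter(query: str, database: Iterable[str],
                   k: int = 3, min_shared: int = 2) -> List[str]:
    """Hypothetical filter: keep sequences sharing >= min_shared distinct k-mers with the query."""
    query_kmers = kmers(query, k)
    return [s for s in database if len(query_kmers & kmers(s, k)) >= min_shared]


def filtered_search(query: str, database: Iterable[str],
                    aligner: Callable[[str, str], int],
                    k: int = 3, min_shared: int = 2) -> List[Tuple[str, int]]:
    """Run any pairwise aligner only on the sequences that survive the pre-filter."""
    candidates = kmer_prefilter(query, database, k, min_shared)
    return [(s, aligner(query, s)) for s in candidates]


if __name__ == "__main__":
    query = "ACGTACGTGA"
    database = ["ACGTACGTGA", "TTTTTTTTTT", "GGACGTACCA"]
    # The poly-T sequence shares no 3-mers with the query and is never aligned.
    print(filtered_search(query, database, smith_waterman_score))
    # [('ACGTACGTGA', 20), ('GGACGTACCA', 12)]
```

Because the filter only decides which sequences reach `aligner`, another homology search routine can be plugged in without changing it, which is the kind of algorithm independence the abstract describes. Each database sequence is tested independently, so the filtering loop is data-parallel; a multi-core or GPU version would partition the database across workers, which this sketch does not attempt.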

Author information

Correspondence to German Retamosa.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Retamosa, G., de Pedro, L., Gonzalez, I., Tamames, J. (2014). High Performance Genomic Sequencing: A Filtered Approach. In: Saez-Rodriguez, J., Rocha, M., Fdez-Riverola, F., De Paz Santana, J. (eds) 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014). Advances in Intelligent Systems and Computing, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-319-07581-5_16

  • DOI: https://doi.org/10.1007/978-3-319-07581-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07580-8

  • Online ISBN: 978-3-319-07581-5

  • eBook Packages: Engineering, Engineering (R0)