Abstract
Protein and DNA homology detection systems are an essential part of computational biology applications. These algorithms have evolved over time from dynamic programming approaches, which find the optimal local alignment between two sequences, to statistical approaches that use various heuristics to reduce execution times. However, the continuously increasing size of input datasets is driving the use of High Performance Computing (HPC) hardware and software to address this problem. The aim of the research presented in this paper is to propose a new filtering methodology, based on general-purpose graphics processing units (GP-GPUs) and multi-core processors, that removes sequences considered irrelevant in terms of homology and similarity. The proposed methodology is completely independent of the homology detection algorithm. This makes the approach useful for researchers and practitioners, because they do not need to learn a new algorithm. The design has been approved by the National Biotechnology Research Center of Spain (CNB).
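The idea of a filter that sits in front of an arbitrary homology detection tool can be illustrated with a minimal sketch. The abstract does not specify the filtering criterion, so the shared k-mer fraction used here is an assumption chosen for illustration and is not the authors' method. The names `kmers`, `shared_fraction`, `prefilter`, and `search`, and the parameters `k` and `threshold`, are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(query_kmers, candidate, k):
    """Fraction of the query's k-mers that also occur in candidate."""
    if not query_kmers:
        return 0.0
    return len(query_kmers & kmers(candidate, k)) / len(query_kmers)


def prefilter(query, database, k=3, threshold=0.2):
    """Keep only database sequences sharing enough k-mers with the query.

    Assumption: k-mer overlap stands in for whatever relevance
    criterion the real filter applies.
    """
    query_kmers = kmers(query, k)
    return [s for s in database
            if shared_fraction(query_kmers, s, k) >= threshold]


def search(query, database, aligner, k=3, threshold=0.2):
    """Run any aligner callable on the filtered database only.

    `aligner` is a hypothetical callable (query, subject) -> result;
    the filter never inspects how it works.
    """
    survivors = prefilter(query, database, k, threshold)
    return [(s, aligner(query, s)) for s in survivors]
```

Each candidate is scored independently of the others, which is the property that makes a filter of this kind a natural fit for data-parallel hardware such as GPUs and multi-core processors. Because `search` only receives the aligner as a callable, the downstream tool can be swapped without touching the filter.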
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Retamosa, G., de Pedro, L., Gonzalez, I., Tamames, J. (2014). High Performance Genomic Sequencing: A Filtered Approach. In: Saez-Rodriguez, J., Rocha, M., Fdez-Riverola, F., De Paz Santana, J. (eds) 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014). Advances in Intelligent Systems and Computing, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-319-07581-5_16
DOI: https://doi.org/10.1007/978-3-319-07581-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07580-8
Online ISBN: 978-3-319-07581-5
eBook Packages: Engineering, Engineering (R0)