Abstract
In the era of Big Data in the Life Sciences, the efficient processing and analysis of vast amounts of sequence data is becoming an increasingly daunting challenge. Among such analyses, sequence alignment is one of the most commonly used procedures, as it provides useful insights into the functionality and relationships of the entities involved. It is also one of the most common computational bottlenecks in bioinformatics workflows. We have designed and implemented a time-efficient, distributed, modular application for sequence alignment, phylogenetic profiling and clustering of protein sequences that uses the European Grid Infrastructure. Optimizing the use of the Grid for each of the respective modules allowed us to achieve significant speedups, on the order of 1400%.
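To make the phylogenetic profiling step more concrete, the following is a minimal sketch of how binary presence/absence profiles could be derived from alignment hits and then grouped. It is not the framework's implementation: the function names, the `(query_id, genome, evalue)` hit format and the `max_evalue` threshold are all assumptions made for illustration, and grouping by identical profile is a deliberately simple stand-in for a real clustering method.

```python
from collections import defaultdict


def build_profiles(hits, genomes, max_evalue=1e-5):
    """Build a binary phylogenetic profile for each query protein.

    hits: iterable of (query_id, genome, evalue) tuples (assumed format).
    genomes: ordered list of genome names; fixes the profile columns.
    max_evalue: hypothetical significance cutoff for counting a hit.

    Queries with no hit passing the cutoff are omitted from the result.
    """
    present = defaultdict(set)
    for query_id, genome, evalue in hits:
        if evalue <= max_evalue:
            present[query_id].add(genome)
    return {
        query_id: tuple(1 if g in found else 0 for g in genomes)
        for query_id, found in present.items()
    }


def group_by_profile(profiles):
    """Group proteins that share exactly the same profile."""
    groups = defaultdict(list)
    for query_id, profile in profiles.items():
        groups[profile].append(query_id)
    return {profile: sorted(ids) for profile, ids in groups.items()}


if __name__ == "__main__":
    # Toy data, invented purely for this example.
    genomes = ["gA", "gB", "gC"]
    hits = [
        ("p1", "gA", 1e-30), ("p1", "gB", 1e-10),
        ("p2", "gA", 1e-20), ("p2", "gB", 1e-8),
        ("p3", "gC", 1e-50), ("p3", "gA", 0.5),
    ]
    profiles = build_profiles(hits, genomes)
    for profile, ids in group_by_profile(profiles).items():
        print(profile, ids)
```

Because each query's profile depends only on that query's own hits, the alignment and profiling stages can in principle be split across independent jobs, which is the property a distributed setup such as a Grid would rely on.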
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Vrousgou, O.T., Psomopoulos, F.E., Mitkas, P.A. (2015). A Grid-Enabled Modular Framework for Efficient Sequence Analysis Workflows. In: Iliadis, L., Jayne, C. (eds) Engineering Applications of Neural Networks. EANN 2015. Communications in Computer and Information Science, vol 517. Springer, Cham. https://doi.org/10.1007/978-3-319-23983-5_5
DOI: https://doi.org/10.1007/978-3-319-23983-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23981-1
Online ISBN: 978-3-319-23983-5
eBook Packages: Computer Science, Computer Science (R0)