Abstract
In this manuscript, we present an optimized and parallel version of our previous work IMSAME, an exhaustive gapped aligner for the pairwise and accurate comparison of metagenomes. Parallelization strategies are applied to take advantage of modern multiprocessor architectures. In addition, sequential optimizations in CPU time and memory consumption are provided. These algorithmic and computational enhancements enable IMSAME to calculate near optimal alignments which are used to directly assess similarity between metagenomes without requiring reference databases. We show that the overall efficiency of the parallel implementation is superior to 80% while retaining scalability as the number of parallel cores used increases. Moreover, we also show that sequential optimizations yield up to 8\(\times \) speedup for scenarios with larger data.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
References
Alyass, A., Turcotte, M., Meyre, D.: From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med. Genomics 8(1), 33 (2015)
Altschul, S.F., Madden, T.L., Schffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402 (1997)
Darling, A., Carey, L., Feng, W.C.: The design, implementation, and evaluation of mpiBLAST. In: Proceedings of ClusterWorld, pp. 13–15 (2003)
http://www.timelogic.com/catalog/757. Accessed 9 May 2017
Liu, Y., Schmidt, B.: GSWABE: faster GPU accelerated sequence alignment with optimal alignment retrieval for short DNA sequences. Concurr. Comput. Pract. Exp. 27(4), 958–972 (2015)
Liu, Y., Schmidt, B.: CUSHAW2-GPU: empowering faster gapped short-read alignment using GPU computing. IEEE Design Test 31(1), 31–39 (2014)
Liu, Y., Tran, T.T., Lauenroth, F., Schmidt, B.: SWAPHI-LS: Smith-Waterman algorithm on Xeon Phi coprocessors for long DNA sequences. In: Cluster Computing (CLUSTER), 2014 IEEE IC, pp. 257–265, September 2014
Langmead, B.: Aligning short sequencing reads with Bowtie. Curr. Protoc. Bioinform. 11–17 (2010)
Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows Wheeler transform. Bioinformatics 25(14), 1754–1760 (2009)
Jing, G., Sun, Z., Wang, H., Gong, Y., Huang, S., Ning, K., Su, X.: Parallel-META 3: comprehensive taxonomical and functional analysis platform for efficient comparison of microbial communities. Sci. Rep. 7, 40371 (2017)
Nichols, B., Buttlar, D., Farrell, J.: A POSIX standard for better multiprocessing. O’Reilly Media Inc., Sebastopol (1996)
Perez-Wohlfeil, E., Torreno, O., Trelles, O.: Pairwise and incremental multi-stage alignment of metagenomes: a new proposal. In: International Conference on Bioinformatics and Biomedical Engineering, pp. 74–80, April 2017
Ondov, B.D., Treangen, T.J., Melsted, P., Mallonee, A.B., Bergman, N.H., Koren, S., Phillippy, A.M.: Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 17(1), 132 (2016)
Benoit, G., Peterlongo, P., Mariadassou, M., Drezen, E., Schbath, S., Lavenier, D., Lemaitre, C.: Multiple comparative metagenomics using multiset k-mer counting. PeerJ Comput. Sci. 2, e94 (2016)
Maillet, N., Collet, G., Vannier, T., Lavenier, D., Peterlongo, P.: COMMET: comparing and combining multiple metagenomic datasets. In: Bioinformatics and Biomedicine (BIBM), 2014 IEEE IC, pp. 94–98, November 2014
Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162(3), 705–708 (1982)
Torreno, O., Trelles, O.: Breaking the computational barriers of pairwise genome comparison. BMC Bioinform. 16(1), 250 (2015)
Pitteway, M.L.V., Watkinson, D.J.: Bresenham’s algorithm with Grey scale. Commun. ACM 23(11), 625–626 (1980)
Polychronopoulos, C.D., Kuck, D.J.: Guided self-scheduling: a practical scheduling scheme for parallel supercomputers. IEEE Trans. Comput. 100(12), 1425–1439 (1987)
http://www.scbi.uma.es/site/scbi/hardware. Accessed 9 May 2017
Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1), 29–36 (1982)
Acknowledgments
This work has been partially supported by the European project ELIXIR- EXCELERATE (grant no. 676559), the Spanish national projects Plataforma de Recursos Biomoleculares y Bioinformticos (ISCIII-PT13.0001.0012) and RIRAAF (ISCIII-RD12/0013/0006) and the University of Malaga.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Pérez-Wohlfeil, E., Torreno, O., Trelles, O. (2017). Accelerating Exhaustive Pairwise Metagenomic Comparisons. In: Ibrahim, S., Choo, KK., Yan, Z., Pedrycz, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2017. Lecture Notes in Computer Science(), vol 10393. Springer, Cham. https://doi.org/10.1007/978-3-319-65482-9_46
Download citation
DOI: https://doi.org/10.1007/978-3-319-65482-9_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65481-2
Online ISBN: 978-3-319-65482-9
eBook Packages: Computer ScienceComputer Science (R0)