Abstract
Previously [1], we reported a coarse-grained parallel computational approach to identifying rare molecular evolutionary events often referred to as horizontal gene transfers. Very high degrees of parallelism (up to 65x speedup on 4,096 processors) were reported, yet the overall execution time for a realistic problem size was still on the order of 12 days. With large numbers of compute clusters available, along with genomic sequence from more than 2,000 species containing as many as 35,000 genes each and trillions of sequence nucleotides in all, we demonstrated the computational feasibility of a method that examines "clusters" of genes using phylogenetic tree similarity as a distance metric. A full serial solution to this problem requires years of CPU time yet makes only modest IPC and memory demands; it is therefore an ideal candidate for a grid computing approach involving low-cost compute nodes. This paper describes a multi-granularity parallel solution that exploits multi-core shared-memory nodes to address fine-grained aspects of the tree-clustering phase in our previous deployment of XenoCluster 1.0. In addition to benchmarking results that show up to 80% speedup efficiency on 8 CPU cores, we report on the biological accuracy and relevance of our results compared with a reported set of known xenologs in yeast.
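As a reading aid, the multi-core figure quoted in the abstract can be converted into a speedup factor. The sketch below assumes the conventional definition of parallel efficiency, E = S / p (speedup divided by the number of workers); the abstract does not state which definition the authors use, and the function names are illustrative only.

```python
def speedup_from_efficiency(efficiency: float, workers: int) -> float:
    """Speedup S implied by efficiency E on p workers, assuming E = S / p."""
    return efficiency * workers


def efficiency_from_speedup(speedup: float, workers: int) -> float:
    """Parallel efficiency E for a measured speedup S on p workers."""
    return speedup / workers


# Figures taken from the abstract: up to 80% efficiency on 8 CPU cores.
multicore_speedup = speedup_from_efficiency(0.80, 8)
print(f"Implied speedup on 8 cores: {multicore_speedup:.1f}x")
```

Under that assumption, 80% efficiency on 8 cores corresponds to a speedup of roughly 6.4x over a single core.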
References
Walters, J., Casavant, T., Robinson, J., Bair, T., Braun, T., Scheetz, T.: XenoCluster: A Grid Computing Approach to Finding Ancient Evolutionary Anomalies. In: Malyshkin, V.E. (ed.) PaCT 2005. LNCS, vol. 3606, pp. 355–366. Springer, Heidelberg (2005)
Tatusov, R.L., Fedorova, N.D., Jackson, J.D., Jacobs, A.R., Kiryutin, B., Koonin, E.V., Krylov, D., Mazumder, R., Mekhedov, S.L., Nikolskaya, A.N., Rao, B.S., Smirnov, S., Sverdlov, A.V., Vasudevan, S., Wolf, Y.I., Yin, J.J., Natale, D.A.: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4(1), 41 (2003)
Li, L., Stoeckert Jr., C., Roos, D.S.: OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes. Genome Res. 13, 2178–2189 (2003)
Lee, Y., Sultana, R., Pertea, G., Cho, J., Karamycheva, S., Tsai, J., Parvizi, B., Cheung, F., Antonescu, V., White, J., Holt, I., Liang, F., Quackenbush, J.: Cross-referencing eukaryotic genomes: TIGR orthologous gene alignments (TOGA). Genome Res. 12(3), 493–502 (2002)
Felsenstein, J.: PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5, 164–166 (1989)
Swofford, D.: PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts (2003)
Stamatakis, A.: RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models. Bioinformatics 22(21), 2688–2690 (2006)
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
Pruitt, K.D., Katz, K., Sicotte, H., Maglott, D.R.: Introducing RefSeq and LocusLink: curated human genome resources at the NCBI. Trends Genet. 16(1), 44–47 (2000)
Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., Fuellen, G., Gilbert, J.G.R., Korf, I., Lapp, H., Lehvaslaiho, H., Matsalla, C., Mungall, C.J., Osborne, B.I., Pocock, M.R., Schattner, P., Senger, M., Stein, L.D., Stupka, E.D., Wilkinson, M., Birney, E.: The Bioperl Toolkit: Perl modules for the life sciences. Genome Res. 12(10), 1611–1618 (2002)
PBS Pro, http://www.pbspro.com/
Thompson, J.D., Higgins, D.G., Gibson, T.J.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994)
Wang, J.T.L., Shan, H., Shasha, D., Piel, W.H.: TreeRank: A Similarity Measure for Nearest Neighbor Searching in Phylogenetic Databases. In: Proceedings of the 15th International Conference on Scientific and Statistical Database Management (SSDBM 2003), Cambridge, Massachusetts, pp. 171–180 (2003)
Nichols, B., Buttlar, D., Farrell, J.P.: Pthreads Programming: A POSIX Standard for Better Multiprocessing. O’Reilly, Sebastopol (1996)
Squyres, J.M., Lumsdaine, A.: A Component Architecture for LAM/MPI. In: Dongarra, J., Laforenza, D., Orlando, S. (eds.) EuroPVM/MPI 2003. LNCS, vol. 2840, pp. 379–387. Springer, Heidelberg (2003)
Hall, C., Brachat, S., Dietrich, F.S.: Contribution of horizontal gene transfer to the evolution of Saccharomyces cerevisiae. Eukaryot. Cell 4(6), 1102–1115 (2005)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Walters, J.D., Bair, T.B., Braun, T.A., Scheetz, T.E., Robinson, J.P., Casavant, T.L. (2009). Multi-granularity Parallel Computing in a Genome-Scale Molecular Evolution Application. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2009. Lecture Notes in Computer Science, vol 5698. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03275-2_6
DOI: https://doi.org/10.1007/978-3-642-03275-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03274-5
Online ISBN: 978-3-642-03275-2
eBook Packages: Computer Science, Computer Science (R0)