Abstract
The various clustering methods are widely applied in analyzing biological interaction networks, such as the protein–protein interaction and the genetic interaction networks. With the rapid growth of these biological datasets in scale, much longer runtime is required to make cluster analyses on them. In this paper, we propose a novel parallel Markov clustering (MCL) method based on the ELLPACK-R sparse matrix format that can run on multiple graphic processing units (GPUs) equipped standalone computers. The method is implemented using the Compute Unified Device Architecture (CUDA) programming framework, and fine-grained warp-level optimization is introduced for improving the performance. The BioGRID, a large-scale and freely accessible database of protein and genetic interactions, is adopted as the dataset in the experiment. The method has been assessed on a desktop computer equipped with two NVIDIA GTX 1070 GPUs. The results show that the proposed multi-GPU method can conduct MCL clustering on the full-size BioGRID database with about 6.5 min, that is much faster than the CPU serial MCL implementation which needs almost an hour and a half execution time.
Similar content being viewed by others
References
Van Dongen SM (2000) Graph clustering by flow simulation. University of Utrecht, Utrecht
Brohee S, Van Helden J (2006) Evaluation of clustering algorithms for protein–protein interaction networks. BMC Bioinform 7(1):488
Sharan R, Ulitsky I, Shamir R (2007) Network-based prediction of protein function. Mol Syst Biol 3(1):88
Vlasblom J, Wodak SJ (2009) Markov clustering versus affinity propagation for the partitioning of protein interaction graphs. BMC Bioinform 10(1):99
Keckler SW, Dally WJ, Khailany B et al (2011) GPUs and the future of parallel computing. IEEE Micro 31(5):7–17
NVIDIA (2019) CUDA C programming guide v10.1.243. https://docs.nvidia.com/cuda/cuda-c-programming-guide/
AMD (2019) OpenCL programming guide. http://developer.amd.com/wordpress/media/2013/07/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide-rev-2.7.pdf
Zhou W, Cai Z, Lian B et al (2017) Protein database search of hybrid alignment algorithm based on GPU parallel acceleration. J Supercomput 73(10):4517–4534
Zhou W, Cai Z, Lian B et al (2018) A multi-GPU protein database search model with hybrid alignment manner on distributed GPU clusters. Concurr Comput Pract Exp 30(18):e4522
Rani S, Gupta OP (2017) CLUS\_GPU-BLASTP: accelerated protein sequence alignment using GPU-enabled cluster. J Supercomput 73(10):4580–4595
Anderson JA, Lorenz CD, Travesset A (2008) General purpose molecular dynamics simulations fully implemented on graphics processing units. J Comput Phys 227(10):5342–5359
Bustamam A, Burrage K, Hamilton NA (2012) Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format. IEEE/ACM Trans Comput Biol Bioinform (TCBB) 9(3):679–692
Saadi H, Taboudjemat NN, Rahmoun A et al (2019) Efficient GPU-based parallelization of solvation calculation for the blind docking problem. J Supercomput. https://doi.org/10.1007/s11227-019-02834-5
Escobar JJ, Ortega J, Díaz AF et al (2019) Time-energy analysis of multilevel parallelism in heterogeneous clusters: the case of EEG classification in BCI tasks. J Supercomput 75(7):3397–3425
Cheng J, Grossman M, McKercher T (2014) Professional Cuda C programming. Wiley, Indianapolis
Saad Y (2003) Iterative methods for sparse linear systems. SIAM, Philadelphia
Vazquez F, Ortega G, Fernández JJ et al (2010) Improving the performance of the sparse matrix vector product with GPUs. In: 2010 10th IEEE International Conference on Computer and Information Technology. IEEE, pp 1146–1151
Wikipedia (2019) Markov chain. https://en.wikipedia.org/wiki/Markov_chain
Stark C, Breitkreutz BJ, Reguly T et al (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34(suppl-1):D535–D539
He L, Lu L, Wang Q (2017) An optimal parallel implementation of Markov Clustering based on the coordination of CPU and GPU. J Intell Fuzzy Syst 32(5):3609–3617
Azad A, Pavlopoulos GA, Ouzounis CA et al (2018) HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks. Nucleic Acids Res 46(6):e33–e33
Liu Y, Schmidt B (2018) Lightspmv: faster cuda-compatible sparse matrix-vector multiplication using compressed sparse rows. J Signal Process Syst 90(1):69–86
Acknowledgements
The authors would like to thank all the reviewers for their precious comments.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Fu, Y., Zhou, W. A novel parallel Markov clustering method in biological interaction network analysis under multi-GPU computing environment. J Supercomput 76, 7689–7706 (2020). https://doi.org/10.1007/s11227-020-03193-2
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-020-03193-2