Abstract
When social network has reached hundreds of million users, the analysis of data in social network services becomes very important. Understanding how nodes interconnect in large graphs is an essential problem in many fields. In order to find connecting nodes between two nodes or two groups of source nodes in huge graphs, we propose a parallelized data-mining algorithm to get the shortest path between nodes in a social network based on HBase distributed key/value store. Our algorithm can achieve the shortest path among different nodes in network under the parallel environment. We analyze the social network model by this algorithm first, and then optimize the output from cloud platform by using the intermediary degrees and degree central algorithm. Finally, with a simulated social network, we validate the efficiency of the proposed algorithm. The experiment results indicate that our algorithm can improve the efficiency of parallel breath-first search (BSF).
Similar content being viewed by others
References
Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286:509
Barabási A-L, Albert R, Jeong H (1999) Mean-field theory for scale-free random networks. Physica A 272:173
Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177
Chang F, et al. (2006) Bigtable: a distributed storage system for structured data. OSDI
Dean J, Ghemawat S (2008) Mapreduce simplified data processing on large clusters. Commun ACM 51(1):107–113
Dijkstra EW (1959) A note on two problems in connexion with graphs. Numerische Mathematik 1(1):269–271
Gu L, Huang HL, Zhang XD (2013) The clustering coefficient and the diameter of small-world networks. Acta Mathematica Sinica 29:199–208 English Series
Holme P, Kim BJ (2002) Growing scale free networks with tunable clustering. Phys Rev E 65:026107
Hoque I, Gupta IC (2012) Disk layout techniques for online social network data. IEEE INTERNET COMPUTING
Lakshman A, Malik P (2010) Cassandra: a decentralized structured storage system. SIGOPS 44(2):35–40
Lin J, Dyer C (2010) Data-intensive text processing with MapReduce, ser. Synthesis lectures on human language technologies. Morgan and Claypool Publishers, Florida
Lu Z, Fan L, Wu W, Thuraisingham B, Yang K (2014) Efficient influence spread estimation for influence maximization under the linear threshold model. To appear in computational, social networks
McCubbin C, Perozzi B (2011) Finding the ‘Needle’: Locating interesting nodes using the K-shortest paths algorithm in MapReduce. 2011 11th IEEE International Conference on Data Mining Workshops
Missen MMSC (2008) The small world of web network graphs. International Multi Topic Conference on Wireless Networks, Information Processing and Systems, IMTIC
Qin L, Li H (2011) Centrality analysis of BBS reply networks. 2011 International Conference on Information Technology, Computer Engineering and Management Sciences, ICM 2011. September 24-25 2011
Škrabálek J, Kunc P, Nguyen F (2013) Towards effective social network system implementation/new trends in databases and information systems. Springer Berlin, Heidelberg, pp 327–336
Taylor RC (2010) An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinform 11(Suppl 12):S1
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440
Acknowledgments
This study was supported by the National Natural Science Foundation of China (Grant No. 61202163, 61240035, 61373100); Natural Science Foundation of Shanxi Province (Grant No. 2012011015-1) and Programs for Science and Technology Development of Shanxi Province (Grant No. 20120313032-3). This work was also supported in part by the US National Science Foundation (NSF) under Grant no. CNS-1016320 and CCF-0829993.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Qiang, Y., Pei, B., Wu, W. et al. Improvement of path analysis algorithm in social networks based on HBase. J Comb Optim 28, 588–599 (2014). https://doi.org/10.1007/s10878-013-9675-z
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10878-013-9675-z