Abstract
Community discovery is an essential topic in social network analysis since it provides a way for recursively decomposing a large social graph to easily interpretable subgraphs. The implementation of four major community discovery algorithms, namely the Newman–Girvan or Edge Betweeness, the Walktrap, the Louvain, and the CNM as Java analytics over Neo4j is described. Their correctness was evaluated functionally in two real Twitter graphs with vastly different characteristics. This was done on the grounds that a successful structural graph partitioning should eventually be reflected in the network functionality domain. Additionally, most real world graphs lack a list of ground truth communities, rendering a structural verification difficult, while functionality can be easily observed in most cases. Naturally, this renders the evaluation network-specific, as different social networks have different operational characteristics. The primary algorithmic finding was that the Louvain algorithm yields Twitter communities whose distribution size matches closer, in terms of the Kullback–Leibler divergence, the tweet and retweet distributions, with Newman–Girvan, Walktrap, and CNM following in that order.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Agichtein E, Castillo C, Donato D, Gionis A, Mishne D (2008) Finding high-quality content in social media. WSDM, ACM, New York, pp 183–194
Aiello W, Chung F, Lu L (2000) A random graph model for massive graphs. STOC, ACM, New York, pp 171–180
Barrière L, Huemer C, Mitsche D, Orden D (2011) On the Fiedler value of large planar graphs. Electr Notes Discrete Math 38:111–116
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of community hierarchies in large networks. J Stat Mech Theory Exp P1000
Bollobás B (1998) Modern graph theory, volume 184. Graduate texts in mathematics. Springer, New York
Carrington PJ, Scott J, Wasserman S (2005) Models and methods in social network analysis. Cambridge University Press, Cambridge
Clauset A, Newman ME, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):066111
Cox IJ, Rao S, Zhong Y (1996) Ratio regions: a technique for image segmentation. In: ICPR, pp 557–564
Drakopoulos G (2016) Tensor fusion of social structural and functional analytics over Neo4j. In: IISA, Chalkidiki, Greece
Drakopoulos G, Kanavos A, Tsakalidis A (2016) A Neo4j implementation of fuzzy random walkers. In: SETN, Thessaloniki, Greece
Drakopoulos G, Kanavos A, Karydis I, Sioutas S, Vrahatis AG (2017a) Tensor-based semantically-aware topic clustering of biomedical documents. Computation 5(3):34
Drakopoulos G, Kanavos A, Tsakalidis K (2017b) Fuzzy random walkers with second order bounds: an asymmetric analysis. Algorithms 10(2):40
Drakopoulos G, Kanavos A, Tsolis D, Mylonas P, Sioutas S (2017c) Towards a framework for tensor ontologies over Neo4j: Representations and operations. In: IISA, Larnaca, Cyprus
Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. KDD, ACM, New York, pp 150–160
Fortunato S (2010) Community detection in graphs. Phys Rep 486:75–174
Girvan M, Newman M (2002) Community structure in social and biological networks. PNAS 99(2):7821–7826
Hagen LW, Kahng AB (1991) Fast spectral methods for ratio cut partitioning and clustering. In: ICCAD, pp 10–13
Jurczyk P, Agichtein E (2007) Discovering authorities in question answer communities by using link analysis. In: CIKM, ACM press, pp 919–922
Kanavos A, Drakopoulos G, Tsakalidis A (2017) Graph community discovery algorithms in Neo4j with a regularization-based evaluation metric. In: WEBIST, Porto, Portugal
Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49(2):291–307
Kleinberg JM (1998) Authoritative sources in a hyperlinked environment. In: SODA, pp 668–677
Kontopoulos S, Drakopoulos G (2014) A space efficient scheme for graph representation. In: ICTAI, Limassol, Cyprus, pp 299–303
Langville A, Meyer C (2006) Google’s PageRank and beyond: the science of search engine rankings. Princeton University Press, Princeton
Leskovec J (2011) Social media analytics: tracking, modeling and predicting the flow of information through networks. WWW, ACM, New York, pp 277–278
Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C, Ghahramani Z (2010) Kronecker graphs: an approach to modeling networks. J Mach Learn Res 11:985–1042
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2010) GraphLab: a new parallel framework for machine learning. In: UAI
Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: ICDM, pp 135–146
Newman ME (2004a) Detecting community structure in networks. Eur Phys J B Condens Matter Complex Syst 38(2):321–330
Newman ME (2004b) Fast algorithm for detecting community structure in networks. Phys Rev E 69(6):066133
Newman ME (2010) Netkworks: an introduction. Oxford University Press, Oxford
Pal A, Counts S (2011) Identifying topical authorities in microblogs. In: WSDM, pp 45–54
Pons P, Latapy M (2005) Computing communities in large networks using random walks
Scott J (2000) Social network analysis: a handbook. SAGE Publications Ltd, Thousand Oaks
Shi J, Malik J (2000) Normalized cuts and image segmentation. TPAMI 22(8):888–905
Tagkalakis F, Papagiannaki A, Drakopoulos G, Megalooikonomou V (2016) Augmenting fMRI-generated brain connectivity with temporal information. In: IISA, IEEE, Chalkidiki, Greece
Wei YA, Cheng C (1989) Towards efficient hierarchical designs by ratio cut partitioning. In: IEEE/ACM international conference on computer-aided design, (ICCAD), pp 298–301
Weng J, Lim EP, Lim J, Jiang QH (2010) TwitterRank: finding topic-sensitive influential Twitterers. In: WSDM, pp 261–270
Xin RS, Gonzalez JE, Franklin MJ, Stoica I (2013) GraphX: a resilient distributed graph system on Spark. In: First international workshop on graph data management experiences and systems, ACM
Zamparas V, Kanavos A, Makris C (2015) Real time analytics for measuring user influence on Twitter. In: ICTAI, Vietri sul Mare, Italy
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations
Rights and permissions
About this article
Cite this article
Drakopoulos, G., Gourgaris, P. & Kanavos, A. Graph communities in Neo4j. Evolving Systems 11, 397–407 (2020). https://doi.org/10.1007/s12530-018-9244-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12530-018-9244-x