A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems | Distributed and Parallel Databases Skip to main content
Log in

A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems

  • Published:
Distributed and Parallel Databases Aims and scope Submit manuscript

Abstract

One of the concepts that attracts attention since entering of big data era is the graph-structured data. Suitable frameworks to handle such data would face several constraints, especially scalability, partitioning challenges, processing complexity and hardware configurations. Unfortunately, although several works deal with big data issues, there is a lack of literature review concerning the challenges related to query answering on large-scale graph data. In this survey paper, we review current problems related to the partitioning and processing of graph-structured data. We discuss existing graph processing systems and provide some insights to know how to choose the right system for parallel and distributed processing of large-scale graph data. Finally, we survey current open challenges in this field.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

Notes

  1. http://eulerarchive.maa.org/.

  2. https://snap.stanford.edu/.

References

  1. Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: LinkBench: a database benchmark based on the facebook social graph. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD ’13, pp. 1185–1196. ACM, New York (2013)

  2. Watts, D.J., Strogatz, S.H.: Collective dynamics of ’small-world’ networks. Nature 393(6684), 440–442 (1998)

    MATH  Google Scholar 

  3. Travers, J., Milgram, S.: An experimental study of the small world problem. Sociometry 32(4), 425–443 (1969)

    Google Scholar 

  4. Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)

    MathSciNet  MATH  Google Scholar 

  5. Albert, R., Barabási, A.-L.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74(1), 47 (2002)

    MathSciNet  MATH  Google Scholar 

  6. Abeywickrama, T., Cheema, M.A., Taniar, D.: K-nearest neighbors on road networks: a journey in experimentation and in-memory implementation. Proc. VLDB Endow. 9(6), 492–503 (2016)

    Google Scholar 

  7. Beutel, A.: User behavior modeling with large-scale graph analysis. PhD thesis, University of Trento (2016)

  8. Czerepicki, A.: Application of graph databases for transport purposes. Bull. Pol. Acad. Sci. Tech. Sci. 64(3), 457–466 (2016)

    Google Scholar 

  9. Miler, M., Medak, D., Odobašióc, D.: The shortest path algorithm performance comparison in graph and relational database on a transportation network. Promet Traffic Transp. 26(1), 75–82 (2014)

    Google Scholar 

  10. Have, C.T., Jensen, L.J.: Are graph databases ready for bioinformatics? Bioinformatics 29(24), 3107–3108 (2013)

    Google Scholar 

  11. Yoon, B.-H., Kim, S.-K., Kim, S.-Y.: Use of graph database for the integration of heterogeneous biological data. Genomics Inform. 15(1), 19–27 (2017)

    Google Scholar 

  12. Adoni, W.Y.H., Nahhal, T., Aghezzaf, B., Elbyed, A.: MRA*: parallel and distributed path in large-scale graph using MapReduce-A* based approach. In: Ubiquitous Networking, Lecture Notes in Computer Science, pp. 390–401. Springer, Cham (2017)

  13. Aridhi, S., d’Orazio, L., Maddouri, M., Mephu, N.E.: Density-based data partitioning strategy to approximate large-scale subgraph mining. Inf. Syst. 48, 213–223 (2015)

    Google Scholar 

  14. Lakhotia, K., Kannan, R., Prasanna, V.: Accelerating pagerank using partition-centric processing. In: 2018 USENIX Annual Technical Conference (USENIX ATC 18). USENIX Association, Boston (2018)

  15. Plimpton, S.J., Devine, K.D.: MapReduce in MPI for large-scale graph algorithms. Parallel Comput. 37(9), 610–632 (2011)

    Google Scholar 

  16. Kleinberg, J.M., Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.S.: The web as a graph: measurements, models, and methods. In: Computing and Combinatorics, Lecture Notes in Computer Science, pp. 1–17. Springer, Berlin (1999)

  17. Maillo, J., Ramírez, S., Triguero, I., Herrera, F.: kNN-IS: an iterative spark-based design of the k-nearest neighbors classifier for big data. Knowl. Based Syst. 117, 3–15 (2016)

    Google Scholar 

  18. Guo, K., Guo, W., Chen, Y., Qiu, Q., Zhang, Q.: Community discovery by propagating local and global information based on the MapReduce model. Inf. Sci. 323, 73–93 (2015)

    MathSciNet  Google Scholar 

  19. Moon, S., Lee, J.-G., Kang, M., Choy, M., Lee, J.-W.: Parallel community detection on large graphs with MapReduce and GraphChi. Data Knowl. Eng. 104, 17–31 (2016)

    Google Scholar 

  20. Dhillon, I.S.: Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’01, pp. 269–274. ACM, New York (2001)

  21. Junghanns, M., Petermann, A., Neumann, M., Rahm, E.: Management and analysis of big graph data: current systems and open challenges. In: Handbook of Big Data Technologies, pp. 457–505. Springer (2017)

  22. Skhiri, S., Jouili, S.: Large graph mining: recent developments, challenges and potential solutions. In: European Business Intelligence Summer School, pp. 103–124. Springer (2012)

  23. Adoni, W.Y.H., Nahhal, T., Aghezzaf, B., Elbyed, A.: The MapReduce-based approach to improve the shortest path computation in large-scale road networks: the case of A* algorithm. J. Big Data 5(1), 16 (2018)

    Google Scholar 

  24. Cossalter, M., Mengshoel, O., Selker, T.: Visualizing and understanding large-scale Bayesian networks. In: Proceedings of the 17th AAAI Conference on Scalable Integration of Analytics and Visualization, AAAIWS’11-17, pp. 12–21. AAAI Press, Menlo Park (2011)

  25. Gantz, J., Reinsel, D.: Extracting value from chaos. IDC iView 1142(2011), 1–12 (2011)

    Google Scholar 

  26. Alekseev, V.E., Boliac, R., Korobitsyn, D.V., Lozin, V.V.: NP-hard graph problems and boundary classes of graphs. Theor. Comput. Sci. 389(1), 219–236 (2007)

    MathSciNet  MATH  Google Scholar 

  27. Cameron, K., Eschen, E.M., Hoáng, C.T., Sritharan, R.: The complexity of the list partition problem for graphs. SIAM J. Discret. Math. 21(4), 900–929 (2008)

    MathSciNet  MATH  Google Scholar 

  28. Yan, D., Tian, Y., Cheng, J.: Systems for Big Graph Analytics. Springer Briefs in Computer Science. Springer, Cham (2017)

    Google Scholar 

  29. Goel, A.: Neo4j Cookbook Harness the Power of Neo4j to Perform Complex Data Analysis over the Course of 75 Easy-to-Follow Recipes. Packt Publishing, Birmingham (2015)

    Google Scholar 

  30. Guerrieri, A.: Distributed computing for large-scale graphs. PhD thesis, University of Trento (2015)

  31. Yan, D., Huang, L., Jordan, M.I.: Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’09, pp. 907–916. ACM, New York (2009)

  32. Martin, C.H.: Spectral clustering: a quick overview. PhD thesis (2012)

  33. Filippone, M., Camastra, F., Masulli, F., Rovetta, S.: A survey of kernel and spectral methods for clustering. Pattern Recognit. 41(1), 176–190 (2008)

    MATH  Google Scholar 

  34. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, NIPS’01, pp. 849–856. MIT Press, Cambridge (2001)

  35. Kong, H., Akakin, H.C., Sarma, S.E.: A generalized Laplacian of Gaussian filter for Blob detection and its applications. IEEE Trans. Cybern. 43(6), 1719–1733 (2013)

    Google Scholar 

  36. Kamvar, S.D., Klein, D., Manning, C.D.: Spectral learning. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI’03, pp. 561–566. Morgan Kaufmann Publishers Inc., San Francisco (2003)

  37. Dhillon, I.S., Guan, Y., Kulis, B.: Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pp. 551–556. ACM, New York (2004)

  38. Qiu, Y., Li, R., Li, J., Qiao, S., Wang, G., Yu, J.X., Mao, R.: Efficient structural clustering on probabilistic graphs. IEEE Trans. Knowl. Data Eng. 31, 1555–1568 (2018)

    Google Scholar 

  39. Aggarwal, C.C., Wang, H.: A survey of clustering algorithms for graph data. In: Aggarwal, C.C., Wang, H. (eds.) Managing and Mining Graph Data, vol. 40, pp. 275–301. Springer, Boston (2010)

    MATH  Google Scholar 

  40. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)

    MATH  Google Scholar 

  41. Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. In: Proceedings of the 19th Design Automation Conference, DAC ’82, pp. 175–181. IEEE Press, Piscataway (1982)

  42. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 359–392 (1998)

    MathSciNet  MATH  Google Scholar 

  43. Karypis, G., Kumar, V.: Multilevel algorithms for multi-constraint graph partitioning. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, SC ’98, pp. 1–13. IEEE Computer Society, Washington, DC (1998)

  44. Karypis, G., Kumar, V.: Multilevel K-way hypergraph partitioning. In: Proceedings of the 36th Annual ACM/IEEE Design Automation Conference, DAC ’99, pp. 343–348. ACM, New York (1999)

  45. Schloegel, K., Karypis, G., Kumar, V.: Parallel multilevel algorithms for multi-constraint graph partitioning. In: Euro-Par 2000 Parallel Processing. Lecture Notes in Computer Science, pp. 296–310. Springer, Berlin (2000)

  46. Apache Spark-Lightning-Fast Cluster Computing. https://spark.apache.org/

  47. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10(10), 95 (2010)

    Google Scholar 

  48. Kyrola, A., Blelloch, G., Guestrin, C.: GraphChi: large-scale graph computation on just a PC. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 31–46. USENIX Association, Berkeley (2012)

  49. Johnson, D.S., Aragon, C.R., McGeoch, L.A., Schevon, C.: Optimization by simulated annealing: an experimental evaluation; part I, graph partitioning. Oper. Res. 37(6), 865–892 (1989)

    MATH  Google Scholar 

  50. Rolland, E., Pirkul, H., Glover, F.: Tabu search for graph partitioning. Ann. Oper. Res. 63, 209–232 (1996)

    MATH  Google Scholar 

  51. Bui, T.N., Strite, L.C.: An ant system algorithm for graph bisection. In: Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation, GECCO’02, pp. 43–51. Morgan Kaufmann Publishers Inc., San Francisco (2002)

  52. Maini, H., Mehrotra, K., Mohan, C., Ranka, S.: Genetic algorithms for graph partitioning and incremental graph partitioning. In: Proceedings of the 1994 ACM/IEEE Conference on Supercomputing, Supercomputing ’94, pp. 449–457. IEEE Computer Society Press, Los Alamitos (1994)

  53. Kim, J., Hwang, I., Kim, Y.-H., Moon, B.-R.: Genetic approaches for graph partitioning: a survey. In: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, GECCO ’11, pp. 473–480. ACM, New York (2011)

  54. Chen, R., Weng, X., He, B., Choi, B., Yang, M.: Network Performance Aware Graph Partitioning for Large Graph Processing Systems in the Cloud. Nanyang Technological University, Singapore (2014)

    Google Scholar 

  55. Aggarwal, C.C., Zhao, Y., Yu, P.S.: A framework for clustering massive graph streams. Stat. Anal. Data Min. 3(6), 399–416 (2010)

    MathSciNet  Google Scholar 

  56. Tsourakakis, C., Gkantsidis, C., Radunovic, B., Vojnovic, M.: FENNEL: streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM ’14, pp. 333–342. ACM, New York (2014)

  57. Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 17–30. USENIX Association, Berkeley (2012)

  58. Rahimian, F., Payberah, A.H., Girdzijauskas, S., Jelasity, M., Haridi, S.: A distributed algorithm for large-scale graph partitioning. ACM Trans. Auton. Adapt. Syst. 10(2), 1–24 (2015)

    Google Scholar 

  59. Rahimian, F., Payberah, A.H., Girdzijauskas, S., Haridi, S.: Distributed vertex-cut partitioning. In: IFIP International Conference on Distributed Applications and Interoperable Systems, pp. 186–200. Springer (2014)

  60. Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, pp. 1222–1230. ACM, New York (2012)

  61. Tashkova, K., Koros̆ec, P., S̆ilc, J.: A distributed multilevel ant-colony algorithm for the multi-way graph partitioning. Int. J. Bio-Inspired Comput. 3(5), 286–296 (2011)

    Google Scholar 

  62. White, T.: Hadoop: The Definitive Guide, 3rd edn. O’Reilly, Beijing (2012)

    Google Scholar 

  63. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

    Google Scholar 

  64. Vavilapalli, V.K., Seth, S., Saha, B., Curino, C., O’Malley, O., Radia, S., Reed, B., Baldeschwieler, E., Murthy, A.C., Douglas, C., Agarwal, S., Konar, M., Evans, R., Graves, T., Lowe, J., Shah, H.: Apache Hadoop YARN: Yet Another Resource Negotiator, pp. 1–16. ACM Press, Santa Clara (2013)

    Google Scholar 

  65. Al hajj Hassan, M., Bamha, M.: Handling Limits of High Degree Vertices in Graph Processing Using MapReduce and Pregel, Research Report. Université Orléans, INSA Centre Val de Loire, LIFO EA 4022, Orléans (2017)

    Google Scholar 

  66. Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)

    Google Scholar 

  67. Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From think like a vertex to think like a graph. Proc. VLDB Endow. 7(3), 193–204 (2013)

    Google Scholar 

  68. Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pp. 135–146. ACM, New York (2010)

  69. Sagharichian, M., Naderi, H., Haghjoo, M.: ExPregel: a new computational model for large-scale graph processing. Concur. Comput. Pract. Exp. 27(17), 4954–4969 (2015)

    Google Scholar 

  70. Ching, A.: Giraph: large-scale graph processing infrastructure on Hadoop. In: Proceedings of the Hadoop Summit, Vol. 11 of 3, Santa Clara, pp. 5–9 (2011)

  71. Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.: GraphLab: A new framework for parallel machine learning. In: Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI’10, pp. 340–349. AUAI Press, Arlington (2010)

  72. Salihoglu, S., Widom, J.: Gps: a graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management, SSDBM, vol. 22, pp. 1–12. ACM, New York (2013)

  73. Yan, D., Cheng, J., Lu, Y., Ng, W.: Blogel: a block-centric framework for distributed computation on real-world graphs. Proc. VLDB Endow. 7(14), 1981–1992 (2014)

    Google Scholar 

  74. Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: Graphx: a resilient distributed graph system on spark. In: First International Workshop on Graph Data Management Experiences and Systems, GRADES ’13, pp. 1–6. ACM, New York (2013)

  75. Junghanns, M., Petermann, A., Gómez, K., Rahm, E.: GRADOOP: Scalable Graph Data Management and Analytics with Hadoop. CoRR abs/1506.00548

  76. Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google file system. In: ACM SIGOPS Operating Systems Review, vol. 37, pp. 29–43. ACM, New York (2003)

  77. Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: Nikoletseas, S.E. (ed.) Experimental and Efficient Algorithms. Lecture Notes in Computer Science, pp. 606–609. Springer, Berlin (2005)

    MATH  Google Scholar 

  78. Jain, N., Liao, G., Willke, T.L.: Graphbuilder: scalable graph ETL framework. In: First International Workshop on Graph Data Management Experiences and Systems, GRADES ’13, pp. 1–6. ACM, New York (2013)

  79. Chonbodeechalermroong, A., Hewett, R.: Towards visualizing big data with large-scale edge constraint graph drawing. Big Data Res. 10, 21–32 (2017)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hamilton Wilfried Yves Adoni.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Adoni, H.W.Y., Nahhal, T., Krichen, M. et al. A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems. Distrib Parallel Databases 38, 495–530 (2020). https://doi.org/10.1007/s10619-019-07276-9

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10619-019-07276-9

Keywords

Navigation