Abstract
Block modeling algorithms are used to discover important latent structures in graphs. They are the graph equivalent of clustering algorithms. However, existing block modeling algorithms work directly on the given graphs, making them computationally expensive and less effective on large complex graphs. In this paper, we propose a FastMap-based algorithm for block modeling on single-view undirected graphs. FastMap embeds a given undirected graph into a Euclidean space in near-linear time such that the pairwise Euclidean distances between vertices approximate a desired graph-based distance function between them. In the first phase, our FastMap-based block modeling (FMBM) algorithm uses FastMap with a probabilistically-amplified shortest-path distance function between vertices. In the second phase, it uses Gaussian Mixture Models (GMMs) for identifying clusters (blocks) in the resulting Euclidean space. FMBM outperforms other state-of-the-art methods on many benchmark and synthetic instances, both in efficiency and solution quality. It also enables a perspicuous visualization of clusters (blocks) in the graphs, not provided by other methods.
This work at the University of Southern California is supported by DARPA under grant number HR001120C0157 and by NSF under grant numbers 1409987, 1724392, 1817189, 1837779, 1935712, and 2112533. The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the sponsoring organizations, agencies, or the U.S. Government. This research was partially supported by the OPTIMA ARC training centre IC200100009.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
The conditions used in community detection prevent the proper identification of peripheral groups.
- 2.
In a single-view graph, there is at most one edge between any two vertices.
- 3.
i.e., linear time after ignoring logarithmic factors
- 4.
i.e., a pair of vertices not connected by an edge
- 5.
unless \(|E| = O(|V|)\), in which case the complexity is near-linear in the size of the input because of the \(\log |V|\) factor
- 6.
The complement graph \(\bar{G}\) has the same vertices as the original graph G but represents every edge in G as a non-edge and every non-edge in G as an edge.
- 7.
An edge-induced subgraph of G has the same vertices as G but a subset of its edges.
- 8.
M can be computed from A and C in \(O(|E| + k^2)\) time while evaluating the objective function in Eq. 2.
- 9.
The domain of each \(c_i\) is \(\{1, 2 \ldots k\}\). Block \(\mathcal {B}_h\) refers to the collection of all vertices \(v_i \in V\) such that \(c_i = h\).
- 10.
These values are only important as ballpark estimates. We observed that the performance of FMBM often stays stable within broad ranges of hyperparameter values, imparting robustness to FMBM. Moreover, only a few different hyperparameter settings had to be examined to determine the best one.
- 11.
Although Graph-Tool does not require a user-specified value of k, it has a tendency to produce trivial solutions with \(k = 1\), resulting in 0 NMI values when the value of k is not explicitly specified.
- 12.
- 13.
DANMF did not assign any block membership to a few vertices in some synthetic test cases. We assign Block \(\mathcal {B}_1\) by default to such vertices.
- 14.
The spread here refers to how a block extends from its center to its periphery.
- 15.
References
Abbe, E.: Community detection and stochastic block models: recent developments. J. Mach. Learn. Res. 18, 6446–6531 (2017)
Antonopoulos, C.G.: Dynamic range in the C. elegans brain network. Chaos: Interdisc. J. Nonlinear Sci. 26(1), 013102 (2016)
Chan, J., Liu, W., Kan, A., Leckie, C., Bailey, J., Ramamohanarao, K.: Discovering latent blockmodels in sparse and noisy graphs using non-negative matrix factorisation. In: Proceedings of the ACM International Conference on Information & Knowledge Management (2013)
Cohen, L., Uras, T., Jahangiri, S., Arunasalam, A., Koenig, S., Kumar, T.K.S.: The FastMap algorithm for shortest path computations. In: Proceedings of the International Joint Conference on Artificial Intelligence (2018)
Davis, T.: USAir97 (2014). https://www.cise.ufl.edu/research/sparse/matrices/Pajek/USAir97
Faloutsos, C., Lin, K.I.: FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (1995)
Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM (JACM) 34, 596–615 (1987)
Girvan, M., Newman, M.E.: Community structure in social and biological networks. Natl. Acad. Sci. 99, 7821–7826 (2002)
Gopalakrishnan, S., Cohen, L., Koenig, S., Kumar, T.K.S.: Embedding directed graphs in potential fields using FastMap-D. In: Proceedings of the International Symposium on Combinatorial Search (2020)
Hagberg, A., Swart, P., Chult, D.S.: Exploring network structure, dynamics, and function using NetworkX. Technical report, Los Alamos National Lab, Los Alamos, NM (United States) (2008)
Lee, J., Gross, S.P., Lee, J.: Improved network community structure improves function prediction. Sci. Rep. 3, 1–9 (2013)
Li, J., Felner, A., Koenig, S., Kumar, T.K.S.: Using FastMap to solve graph problems in a Euclidean space. In: Proceedings of the International Conference on Automated Planning and Scheduling (2019)
Lin, S., Hu, Q., Wang, G., Yu, P.S.: Understanding community effects on information diffusion. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds.) PAKDD 2015. LNCS (LNAI), vol. 9077, pp. 82–95. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18038-0_7
Mattenet, A., Davidson, I., Nijssen, S., Schaus, P.: Generic constraint-based block modeling using constraint programming. J. Artif. Intell. Res. 70, 597–630 (2021)
Murphy, K.P.: Machine Learning: A probabilistic perspective. The MIT Press, Cambridge (2012)
Newman, M.E.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74(3), 036104 (2006)
Peixoto, T.P.: Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Phys. Rev. E 89(1), 012804 (2014)
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2014)
Ramteke, R., et al.: Improving single and multi-view blockmodelling by algebraic simplification. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN) (2020)
Ye, F., Chen, C., Zheng, Z.: Deep autoencoder-like nonnegative matrix factorization for community detection. In: Proceedings of the ACM International Conference on Information and Knowledge Management (2018)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, A., Stuckey, P., Koenig, S., Kumar, T.K.S. (2022). A FastMap-Based Algorithm for Block Modeling. In: Schaus, P. (eds) Integration of Constraint Programming, Artificial Intelligence, and Operations Research. CPAIOR 2022. Lecture Notes in Computer Science, vol 13292. Springer, Cham. https://doi.org/10.1007/978-3-031-08011-1_16
Download citation
DOI: https://doi.org/10.1007/978-3-031-08011-1_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08010-4
Online ISBN: 978-3-031-08011-1
eBook Packages: Computer ScienceComputer Science (R0)