Abstract
How can we find communities in dynamic networks of social interactions, such as who calls whom, who emails whom, or who sells to whom? How do we store a large volume of IP network source–destination connection graphs, which grow over time? In this chapter, we study these two fundamental problems on time-evolving graphs and exploit the subtle connection between pattern mining and compression. We propose a pattern mining method, GraphScope, that automatically reveals the underlying communities in the graphs, as well as the change points in time. Our method needs no human intervention, and it is carefully designed to operate in a streaming fashion. Moreover, it is based on lossless compression principles. Therefore, in addition to revealing the fundamental structure of the graphs, the discovered patterns naturally lead to an excellent storage scheme for graph streams. Thus, our proposed GraphScope method unifies and solves both the mining and the compression problem (1) by producing meaningful time-evolving patterns agreeing with human intuition and (2) by identifying key change points in several real large time-evolving graphs. We demonstrate its efficiency and effectiveness on real data sets from several domains.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
To encode a positive integer x, we need \({\log^\star} x\approx \log_2 x+\log_2\log_2 x+\cdots \), where only the positive terms are retained and this is the optimal length, if the range of x is unknown [25]
- 2.
One edge from 4 to 3 in \({\ensuremath{G^{(1)}_{}}}\), two edges from 4 to 2 and 3 in \({\ensuremath{G^{(2)}_{}}}\) in Fig. 3.2.
- 3.
\((t_{s+1}-t_s)n\) is the total number of possible edges of a source node in the graph segment
- 4.
Two graphs are superimposed together by taking the union of their edges.
- 5.
The campus network is constantly running some port-scanning program to identify potential vulnerability of the in-network hosts.
- 6.
Due anonymity requirements, the account types are obfuscated.
References
C. C. Aggarwal and P. S. Yu. Online analysis of community evolution in data streams. In SDM, 2005.
R. Rantzau, and R. Srikant. Auditing compliance with a hippocratic database. In VLDB, Toronto, Ontario, Canada, 2004.
B. Babcock, M. Datar, and R. Motwani. Sampling from a moving window over streaming data. In SODA, 2002.
L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan. Group formation in large social networks: Membership, growth, and evolution. In KDD, pages 44–54, 2006.
D. Chakrabarti. Autopart: Parameter-free graph partitioning and outlier detection. In PKDD, pages 112–124, 2004.
D. Chakrabarti, S. Papadimitriou, D. S. Modha, and C. Faloutsos. Fully automatic cross-associations. In KDD, pages 79–88. ACM Press, 2004.
G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan. Comparing data streams using hamming norms (how to zero in). In TKDE, 15(3), 2003.
T. M. Cover and J. A. Thomas. Elements of information theory. Wiley-Interscience, New York, NY, 1991.
I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In KDD, pages 89–98, 2003.
P. Drineas, R. Kannan, and M. W. Mahoney. Fast monte carlo algorithms for matrices iii: Computing a compressed approximate matrix decomposition. In SIAM Journal on Computing, 36(1):184–206, 2006.
M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, 1999.
V. Ganti, J. Gehrke, and R. Ramakrishnan. Mining data streams under block evolution. SIGKDD Explorations Newsletter, 3(2), 2002.
M. Garofalakis and P. B. Gibbons. Probabilistic wavelet synopses. ACM Transactions on Database System, 29(1), 2004.
G. H. Golub and C. F. V. Loan. Matrix Computation. Johns Hopkins University Press, Baltimore, MD, 3rd edition, 1996.
P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In FOCS, 2000.
G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. Journal of Parallel and Distributed Computing, 48(1):96–129, 1998.
E. Keogh, S. Lonardi, and C. A. Ratanamahatana. Towards parameter-free data mining. In KDD, pages 206–215, New York, NY ACM Press, 2004.
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large-scale knowledge bases from the web. In VLDB, Edinburgh, Scotland, 1999.
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. In SIGKDD, 2005.
Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, and B. Tseng. Analyzing communities and their evolutions in dynamics networks. ACM Transactions on Knowledge Discovery from Data (TKDD), special issue on Social Computing, Behavioral Modeling, 3, 2009.
Y.-R. Lin, J. Sun, P. Castro, R. Konuru, H. Sundaram, and A. Kelliher. Metafac: Community discovery via relational hypergraph factorization. In KDD, 2009.
S. Muthukrishnan. Data streams: algorithms and applications, Foundations and Trends. in Theoretical Computer Science, 1, 2005.
A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, pages 849–856, 2001.
S. Papadimitriou, J. Sun, and C. Faloutsos. Streaming pattern discovery in multiple time-series. In VLDB, pages 697–708, Trondheim, Norway, 2005.
J. Rissanen. A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11(2):416–431, 1983.
J. Sun, D. Tao, and C. Faloutsos. Beyond streams and graphs: Dynamic tensor analysis. In KDD, pages 374–383, 2006.
J. Sun, Y. Xie, H. Zhang, and C. Faloutsos. Less is more: Compact matrix decomposition for large sparse graphs. In Proceedings of the 2007 SIAM International Conference on Data Mining (SDM), 2007.
Y. Zhu and D. Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In VLDB, pages 358–369, Hong Kong, China, 2002.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Sun, J., Papadimitriou, S., Yu, P.S., Faloutsos, C. (2010). Community Evolution and Change Point Detection in Time-Evolving Graphs. In: Yu, P., Han, J., Faloutsos, C. (eds) Link Mining: Models, Algorithms, and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6515-8_3
Download citation
DOI: https://doi.org/10.1007/978-1-4419-6515-8_3
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6514-1
Online ISBN: 978-1-4419-6515-8
eBook Packages: Biomedical and Life SciencesBiomedical and Life Sciences (R0)