Abstract
Relation graphs, in which multi-type (or single type) nodes are related to each other, frequently arise in many important applications, such as Web mining, information retrieval, bioinformatics, and epidemiology. In this study, We propose a general framework for clustering on relation graphs. Under this framework, we derive a family of clustering algorithms including both hard and soft versions, which are capable of learning cluster patterns from relation graphs with various structures and statistical properties. A number of classic approaches on special cases of relation graphs, such as traditional graphs with singly-type nodes and bi-type relation graphs with two types of nodes, can be viewed as special cases of the proposed framework. The theoretic analysis and experiments demonstrate the great potential and effectiveness of the proposed framework and algorithm.
Similar content being viewed by others
References
Banerjee A, Dhillon I, Ghosh J, Merugu S, Modha D (2004) A generalized maximum entropy approach to bregman co-clustering and matrix approximation. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 509–514
Banerjee A, Merugu S, Dhillon I, Ghosh J (2005) Clustering with Bregman divergences. J Mach Learn Res 6: 1705–1749
Bui T, Jones C (1993) A heuristic for reducing fill in sparse matrix factorization. In: 6th SIAM conference on parallel processing for scientific computing, pp 445–452
Chan P, Schlag M, Zien J (1993) Spectral k-way ratio-cut partitioning and clustering. In: Proceedings of the 30th international conference on design automation. ACM, New York, NY, USA, pp 749–754
Della Pietra S, Della Pietra V, Lafferty J (2002) Duality and auxiliary functions for Bregman distances. Technical report, Technical Report CMU-CS-01-109, School of Computer Science, Carnegie Mellon University, 2001
Dhillon I (2001) Co-clustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 269–274
Dhillon I, Guan Y, Kulis B (2005) A fast kernel-based multilevel algorithm for graph clustering. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 629–634
Dhillon I, Guan Y, Kulis B (2005) A unified view of kernel k-means, spectral clustering and graph cuts. The University of Texas at Austin, Department of Computer Sciences, Technical Report TR-04
Dhillon I, Mallela S, Modha D (2003) Information-theoretic co-clustering. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 89–98
Ding C, He X, Zha H, Gu M, Simon H (2001) A min-max cut algorithm for graph partitioning and data clustering. In: Proceedings IEEE international conference on data mining, 2001, ICDM 2001, pp 107–114
El-Yaniv R, Souroujon O (2001) Iterative double clustering for unsupervised and semi-supervised learning. In: EMCL’01: Proceedings of the 12th European conference on machine learning. Springer, London, UK, pp 121–132. ISBN:3-540-42536-5
Gao B, Liu T, Zheng X, Cheng Q, Ma W (2005) Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 41–50
Golub G, Van Loan C (1996) Matrix computations. Johns Hopkins University Press, Baltimore
Hendrickson B, Leland R (1995) A multilevel algorithm for partitioning graphs. In: Supercomputing ’95: proceedings of the 1995 ACM/IEEE conference on supercomputing (CDROM). ACM, New York, NY, USA, p 28
Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, USA, pp 50–57
Hofmann T, Puzicha J (1999), Latent class models for collaborative filtering. In: IJCAI ’99: proceedings of the sixteenth international joint conference on artificial intelligence. Morgan Kaufmann, San Francisco, CA, USA, pp 688–693
Karypis G, Kumar V (1999) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1): 359
Kernighan B, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49(2): 291–307
Lang K (1995) Newsweeder: learning to filter netnews. In: Proceedings of the twelfth international conference on machine learning, pp 331–339
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. Adv Neural Inf Process Syst 13: 556–562
Li T (2005) A general model for clustering binary data. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 188–197
Long B, Binghamton S, Wu X, Zhang Z, Yu P (2007) Community learning by graph approximation. In: Proceedings of the 2007 seventh IEEE international conference on data mining. IEEE Computer Society, Washington, DC, USA, pp 232–241
Long B, Wu X, Zhang Z, Yu P (2006) Unsupervised learning on k-partite graphs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 317–326
Long B, Zhang Z, Yu P (2005) Co-clustering by block value decomposition. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM, New York, NY, USA, pp 635–640
Long B, Zhang Z, Wu X, Yu P (2006) Spectral clustering for multi-type relational data. In: Proceedings of the 23rd international conference on machine learning. ACM, New York, NY, USA, pp 585–592
Neville J, Adler M, Jensen D (2003) Clustering relational data using attribute and link information. In: Proceedings of the IJCAI text mining and link analysis workshop
Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2: 849–856
Salakhutdinov R, Roweis S (2003) Adaptive overrelaxed bound optimization methods. In: Machine learning-international workshop then conference, vol 20, p 664
Shental N, Zomet A, Hertz T, Weiss Y (2003) Pairwise clustering and graphical models. Adv Neural Inf Process Syst 16
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8): 888–905
Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3: 583–617
Tishby N, Pereira FC, Bialek W (1999) The information bottleneck method. In: Proceedings of the 37th annual allerton conference on communication, control and computing, pp 368–377
Van Dongen SM (2000) Graph clustering by flow simulation. PhD thesis, University of Utrecht
Wang J, Zeng H, Chen Z, Lu H, Tao L, Ma W (2003) ReCoM: reinforcement clustering of multi-type interrelated data objects. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in informaion retrieval. ACM, New York, NY, USA, pp 274–281
Yu K, Yu S, Tresp V (2006) Soft clustering on graphs. Adv Neural Inf Process Syst 18: 1553
Yu SX, Shi J (2003) Multiclass spectral clustering. In: ICCV ’03: Proceedings of the ninth IEEE international conference on computer vision. IEEE Computer Society, Washington, DC, USA, p 313
Zeng H-J, Chen Z, Ma W-Y (2002) A unified framework for clustering heterogeneous web objects. In: WISE’02: Proceedings of the 3rd international conference on web information systems engineering. IEEE Computer Society, Washington, DC, USA, pp 161–172. ISBN:0-7695-1766-8
Zha H, He X, Ding C, Simon H, Gu M (2001) Bipartite graph partitioning and data clustering. In: Proceedings of the tenth international conference on information and knowledge management. ACM, New York, NY, USA, pp 25–32
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Long, B., Zhang, Z. & Yu, P.S. A general framework for relation graph clustering. Knowl Inf Syst 24, 393–413 (2010). https://doi.org/10.1007/s10115-009-0255-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-009-0255-6