Abstract
We focus on the problem of clustering with soft instance-level constraints. Recently, the CVQE algorithm was proposed in this context. It modifies the objective function of traditional K-means to include penalties for violated constraints. CVQE was shown to efficiently produce high-quality clustering of UCI data. In this work, we examine the properties of CVQE and propose a modification that results in a more intuitive objective function, with lower computational complexity. We present our extensive experimentation, which provides insight into CVQE and shows that our new variant can dramatically improve clustering quality while reducing run time. We show its superiority in a large-scale surveillance scenario with noisy constraints.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: Brodley, C., Danyluk, A. (eds.) Proceeding of the 17th International Conference on Machine Learning, Morgan Kaufmann, San Francisco (2001)
Lin, W.-H., Hauptmann, A.: Structuring continuous video recordings of everyday life using time-constrained clustering. In: IS&T/SPIE Symposium on Electronic Imaging, San Jose, CA (January 2006)
Hertz, T., Shental, N., Bar-Hillel, A., Weinshall, D.: Enhancing image and video retrieval: Learning via equivalence constraints. In: Proc. of IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society Press, Los Alamitos (2003)
Davidson, I., Ravi, S.S.: Clustering with constraints: Feasibility issues and the k-means algorithm. In: 5th SIAM Data Mining Conference (2005)
Yu, S.X., Shi, J.: Grouping with directed relationships. In: Figueiredo, M., Zerubia, J., Jain, A.K. (eds.) EMMCVPR 2001. LNCS, vol. 2134, Springer, Heidelberg (2001)
Cohn, D., Caruana, R., McCallum, A.: Semi-supervised clustering with user feedback. Technical report, Cornell University, TR2003-1892 (2003)
Basu, S., Bilenko, M., Mooney, R.J.: A probabilistic framework for semi-supervised clustering. In: Kim, W., Kohavi, R., Gehrke, J., DuMouchel, W. (eds.) Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, pp. 59–68. ACM Press, New York (2004)
Basu, S., Davidson, I.: Clustering with constraints: Theory and practice. In: Online Proceedings of a KDD tutorial (2006), http://www.ai.sri.com/~basu/kdd-tutorial-2006/
Davidson, I., Ravi, S.S.: Identifying and generating easy sets of constraints for clustering. In: AAAI, AAAI Press, Stanford (2006)
Davidson, I., Wagstaff, K., Basu, S.: Measuring constraint-set utility for partitional clustering algorithms. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 115–126. Springer, Heidelberg (2006)
Pelleg, D., Baras, D.: k-means with large and noisy constraint sets. Technical Report H-0253, IBM (2007)
Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. In: Brodley, C.E. (ed.) ICML, ACM, New York (2004)
Davidson, I., Ravi, S.S.: Intractability and clustering with constraints. In: ICML (2007)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pelleg, D., Baras, D. (2007). K-Means with Large and Noisy Constraint Sets. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. ECML 2007. Lecture Notes in Computer Science(), vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_67
Download citation
DOI: https://doi.org/10.1007/978-3-540-74958-5_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer ScienceComputer Science (R0)