Abstract
Given a set of binary vectors drawn from a finite multiple Bernoulli mixture model, an important problem is to determine which vectors are outliers and which features are relevant. This paper proposes a model for clustering binary vectors that accommodates outliers and simultaneously incorporates feature selection into the clustering process. We derive an EM algorithm to fit the proposed model. Through simulation studies and experiments on handwritten digit recognition and visual scene categorization, we demonstrate the usefulness and effectiveness of our method.
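The abstract does not state the update equations, so the following is only a rough illustration of the general idea: EM for a plain mixture of independent Bernoulli distributions, written in Python. It is not the proposed model, since it has neither an outlier component nor feature relevance weights. The function name `fit_bernoulli_mixture`, its parameters, and the toy data are hypothetical choices made for this sketch.

```python
import math
import random


def fit_bernoulli_mixture(X, k, n_iter=50, seed=0):
    """EM for a plain k-component mixture of independent Bernoullis.

    Assumption: this is the standard textbook model, not the robust
    model with outliers and feature selection described in the paper.
    X is a list of equal-length lists of 0/1 values.
    Returns (weights, theta, resp).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    eps = 1e-6  # keeps probabilities strictly inside (0, 1)
    weights = [1.0 / k] * k
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = [[0.0] * k for _ in range(n)]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each vector,
        # computed in log space and normalised with the max-shift trick.
        for i, x in enumerate(X):
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for l in range(d):
                    p = theta[j][l]
                    lp += math.log(p) if x[l] else math.log(1.0 - p)
                logp.append(lp)
            m = max(logp)
            unnorm = [math.exp(v - m) for v in logp]
            s = sum(unnorm)
            resp[i] = [u / s for u in unnorm]

        # M-step: re-estimate mixing weights and Bernoulli parameters
        # from the responsibility-weighted counts.
        counts = [sum(resp[i][j] for i in range(n)) for j in range(k)]
        raw = [max(c / n, eps) for c in counts]
        total = sum(raw)
        weights = [w / total for w in raw]
        for j in range(k):
            for l in range(d):
                num = sum(resp[i][j] * X[i][l] for i in range(n))
                est = num / max(counts[j], eps)
                theta[j][l] = min(max(est, eps), 1.0 - eps)

    return weights, theta, resp


# Toy usage: two groups of binary vectors with opposite patterns.
data = [[1, 1, 1, 0, 0, 0]] * 20 + [[0, 0, 0, 1, 1, 1]] * 20
weights, theta, resp = fit_bernoulli_mixture(data, k=2)
```

Each row of `resp` holds the posterior component probabilities for one vector. A robust variant in the spirit of the abstract would need at least one extra mechanism for vectors that fit no cluster well, and per-feature relevance parameters; neither is attempted here.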
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mashrgy, M.A., Bouguila, N., Daoudi, K. (2011). A Robust Approach for Multivariate Binary Vectors Clustering and Feature Selection. In: Lu, BL., Zhang, L., Kwok, J. (eds) Neural Information Processing. ICONIP 2011. Lecture Notes in Computer Science, vol 7063. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24958-7_15
DOI: https://doi.org/10.1007/978-3-642-24958-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24957-0
Online ISBN: 978-3-642-24958-7
eBook Packages: Computer Science, Computer Science (R0)