Abstract
Conventional classification uses a training set and a test set: classifiers such as Naïve Bayes, artificial neural networks, and support vector machines are each trained on the entire training set. This study explores whether a condensed form of the training set can yield comparable classification accuracy. The technique examined here applies a clustering algorithm to determine how far the data can be compressed: for example, can 50 records be represented by a single record, and can that single record train a classifier nearly as well as the original 50? This paper explores how data compression can be achieved through clustering, which concepts capture the qualities of a compressed dataset, and how to measure information gain to verify the integrity and quality of the compression. Specifically, we compress categorical data with Affinity Propagation, use entropy within the resulting cluster sets to quantify integrity and quality, and evaluate a cosine-similarity classifier trained on the compressed dataset against one trained on the uncompressed dataset.
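The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `AffinityPropagation` as the clustering step, uses the numeric Wisconsin breast-cancer dataset as a stand-in for the paper's categorical data, labels each exemplar by majority vote over its cluster, and classifies test records by cosine similarity to the nearest exemplar.

```python
# Sketch: compress a training set to its Affinity Propagation exemplars,
# then classify new records by cosine similarity against the exemplars
# rather than the full training set.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Cluster the training set; each cluster is summarized by one exemplar,
# so the compressed training set is just the exemplar records.
ap = AffinityPropagation(damping=0.9, random_state=0).fit(X_train)
exemplars = ap.cluster_centers_

# Label each exemplar by majority vote of the records it represents.
exemplar_labels = np.array([
    np.bincount(y_train[ap.labels_ == k]).argmax()
    for k in range(len(exemplars))
])

# Classify each test record by its most cosine-similar exemplar.
sims = cosine_similarity(X_test, exemplars)
y_pred = exemplar_labels[sims.argmax(axis=1)]
accuracy = float((y_pred == y_test).mean())
print(f"{len(exemplars)} exemplars stand in for {len(X_train)} records; "
      f"accuracy = {accuracy:.3f}")
```

The compression ratio here is simply the number of exemplars over the number of training records; the question the paper poses is how much accuracy survives as that ratio shrinks.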
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Klecker, C., Saad, A. (2019). Building a Classification Model Using Affinity Propagation. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science(), vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29858-6
Online ISBN: 978-3-030-29859-3
eBook Packages: Computer Science, Computer Science (R0)