Abstract
We introduce a novel ensemble model based on random projections. The contribution of random projections is two-fold. First, their randomness provides the diversity that is required for the construction of an ensemble model. Second, random projections embed the original dataset into a space of lower dimension while preserving its geometrical structure up to a given distortion. This reduces the computational complexity of both the model construction and the classification. Furthermore, the dimensionality reduction removes noisy features from the data and represents the information inherent in the raw data with a small number of features. This noise removal increases the accuracy of the classifier.
The proposed scheme was tested using WEKA-based procedures applied to 16 benchmark datasets from the UCI repository.
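The following Python sketch illustrates the general idea described above; it is not the authors' exact procedure (the paper's experiments were run with WEKA), and the class name, the decision-tree base learner, and the parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class RandomProjectionEnsemble:
    """Illustrative sketch: each ensemble member trains a base classifier
    on a randomly projected copy of the data; a sample is classified by
    majority vote over the members. Assumes non-negative integer labels."""

    def __init__(self, n_members=10, target_dim=5, random_state=0):
        self.n_members = n_members        # illustrative ensemble size
        self.target_dim = target_dim      # illustrative reduced dimension
        self.rng = np.random.default_rng(random_state)
        self.members = []                 # (projection matrix, fitted tree)

    def fit(self, X, y):
        n_features = X.shape[1]
        for _ in range(self.n_members):
            # Gaussian random projection; the 1/sqrt(target_dim) scaling
            # preserves pairwise distances in expectation
            # (a Johnson-Lindenstrauss-style embedding).
            R = self.rng.normal(size=(n_features, self.target_dim))
            R /= np.sqrt(self.target_dim)
            clf = DecisionTreeClassifier().fit(X @ R, y)
            self.members.append((R, clf))
        return self

    def predict(self, X):
        # Each member classifies in its own low-dimensional space.
        votes = np.stack([clf.predict(X @ R) for R, clf in self.members])
        # Per-sample majority vote across the ensemble members.
        return np.apply_along_axis(
            lambda v: np.bincount(v).argmax(), axis=0, arr=votes)
```

Usage follows the familiar fit/predict pattern, e.g. `RandomProjectionEnsemble(n_members=10, target_dim=5).fit(X_train, y_train).predict(X_test)`. The random draw of a fresh projection matrix per member is what supplies the diversity, while the reduced dimension is what lowers the training and classification cost.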
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schclar, A., Rokach, L. (2009). Random Projection Ensemble Classifiers. In: Filipe, J., Cordeiro, J. (eds) Enterprise Information Systems. ICEIS 2009. Lecture Notes in Business Information Processing, vol 24. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01347-8_26
DOI: https://doi.org/10.1007/978-3-642-01347-8_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01346-1
Online ISBN: 978-3-642-01347-8
eBook Packages: Computer Science (R0)