Pruning boxes in a box-based classification method

Spinelli, Vincenzo

doi:10.1007/s11634-014-0193-3

Regular Article
Published: 17 December 2014

Volume 10, pages 285–304, (2016)

Advances in Data Analysis and Classification

Vincenzo Spinelli¹


Abstract

In this work we address an extension of box clustering in supervised classification problems that makes use of optimization problems to refine the results obtained by agglomerative techniques. The central concept of box clustering is that of homogeneous boxes, which give rise to overtrained classifiers under some conditions. Thus, we focus our attention on the issue of pruning out redundant boxes, using the information gleaned from the other boxes generated, under the hypothesis that such a choice would identify simpler models with good predictive power. We propose a pruning method based on an integer optimization problem and a family of subproblems derived from the main one. The overall performance is then compared to the accuracy levels of competing methods on a wide range of real data sets. The method has proven to be robust, making it possible to derive a more compact system of boxes in the instance space with good performance on training and test data.
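To make the notions of homogeneous box and redundant box concrete, the following is a minimal sketch under stated assumptions, not the method of the article. The abstract does not give the integer optimization model, so the sketch swaps in a plain greedy set-cover heuristic: it keeps a subset of boxes that still covers every training point covered by the original system. The names `Box`, `is_homogeneous`, and `prune_boxes`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in the instance space, tagged with a class label."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]
    label: int

    def contains(self, p: Sequence[float]) -> bool:
        # A point lies in the box if every coordinate is within the bounds.
        return all(lo <= x <= hi for lo, x, hi in zip(self.lower, p, self.upper))


def is_homogeneous(box: Box, points, labels) -> bool:
    """True if every training point inside the box has the box's label."""
    return all(y == box.label for p, y in zip(points, labels) if box.contains(p))


def prune_boxes(boxes: List[Box], points) -> List[Box]:
    """Greedy set-cover heuristic (an assumption, not the article's model).

    Repeatedly keeps the box covering the most still-uncovered points until
    every point covered by the original system is covered again.
    """
    cover = [{i for i, p in enumerate(points) if b.contains(p)} for b in boxes]
    uncovered = set().union(*cover)
    kept = []
    while uncovered:
        best = max(range(len(boxes)), key=lambda j: len(cover[j] & uncovered))
        kept.append(best)
        uncovered -= cover[best]
    return [boxes[j] for j in sorted(kept)]


# Toy data (hypothetical): three points of class 0, two points of class 1.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (6.0, 6.0)]
labels = [0, 0, 0, 1, 1]
boxes = [
    Box((0.0, 0.0), (1.0, 1.0), 0),  # covers points 0 and 1
    Box((1.0, 1.0), (2.0, 2.0), 0),  # covers points 1 and 2
    Box((0.0, 0.0), (2.0, 2.0), 0),  # covers points 0, 1, 2
    Box((5.0, 5.0), (6.0, 6.0), 1),  # covers points 3 and 4
]
pruned = prune_boxes(boxes, points)
```

In this toy system the third box makes the first two redundant, so the pruned system keeps one box per class. A greedy cover gives no optimality guarantee; an exact formulation would minimize the number of kept boxes subject to the same coverage constraints.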

Author information

Authors and Affiliations

Istat, Istituto Nazionale di Statistica, Via Tuscolana 1788, 00173, Rome, Italy
Vincenzo Spinelli


Corresponding author

Correspondence to Vincenzo Spinelli.

About this article

Cite this article

Spinelli, V. Pruning boxes in a box-based classification method. Adv Data Anal Classif 10, 285–304 (2016). https://doi.org/10.1007/s11634-014-0193-3


Received: 14 October 2013
Revised: 26 November 2014
Accepted: 27 November 2014
Published: 17 December 2014
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11634-014-0193-3
