Abstract
We construct classifiers for multivariate and functional data. Our approach is based on a measure of distance between data points and classes. The distance measure needs to be robust to outliers and invariant to linear transformations of the data. For this purpose we can use the bagdistance, which is based on halfspace depth. It satisfies most of the properties of a norm but is able to reflect asymmetry when the class is skewed. Alternatively, we can compute a measure of outlyingness based on the skew-adjusted projection depth. In either case we propose the DistSpace transform, which maps each data point to the vector of its distances to all classes, followed by k-nearest neighbor (kNN) classification of the transformed data points. This combines invariance and robustness with the simplicity and wide applicability of kNN. The proposal is compared with other methods in experiments with real and simulated data.
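The two-step idea in the abstract (transform each point into its vector of distances to all classes, then run kNN on the transformed points) can be illustrated with a minimal sketch. This is not the authors' implementation: as a loudly labeled assumption, the classical Mahalanobis distance stands in for the bagdistance and the skew-adjusted outlyingness, so the sketch is neither robust to outliers nor sensitive to skewness. The function names `class_distances`, `knn_predict`, and `distspace_knn`, and the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def class_distances(X, X_train, y_train, classes):
    """Map each row of X to the vector of its distances to every class.

    ASSUMPTION: Mahalanobis distance is a stand-in for the bagdistance or
    the skew-adjusted outlyingness; it is not robust and ignores skewness.
    """
    columns = []
    for c in classes:
        Xc = X_train[y_train == c]
        center = Xc.mean(axis=0)
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        diff = X - center
        # Solve cov @ z = diff.T rather than inverting cov explicitly.
        z = np.linalg.solve(cov, diff.T).T
        columns.append(np.sqrt(np.einsum("ij,ij->i", diff, z)))
    return np.column_stack(columns)


def knn_predict(D_train, y_train, D_test, k=5):
    """Plain kNN with Euclidean distance in the transformed space."""
    # Squared distances between every test and training point.
    d2 = ((D_test[:, None, :] - D_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    predictions = []
    for neighbor_labels in y_train[nearest]:
        labels, counts = np.unique(neighbor_labels, return_counts=True)
        predictions.append(labels[np.argmax(counts)])  # majority vote
    return np.array(predictions)


def distspace_knn(X_train, y_train, X_test, k=5):
    """Distance-to-class transform followed by kNN on the transformed data."""
    classes = np.unique(y_train)
    D_train = class_distances(X_train, X_train, y_train, classes)
    D_test = class_distances(X_test, X_train, y_train, classes)
    return knn_predict(D_train, y_train, D_test, k)


# Toy data (hypothetical): two well-separated Gaussian classes in 2D.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(60, 2)),
    rng.normal(loc=[6.0, 6.0], scale=1.0, size=(60, 2)),
])
y_train = np.repeat([0, 1], 60)
X_test = np.array([[0.0, 0.0], [6.0, 6.0]])

D_train = class_distances(X_train, X_train, y_train, np.unique(y_train))
pred = distspace_knn(X_train, y_train, X_test, k=5)
print(pred)
```

Whatever the original dimension of the data, the transformed points have one coordinate per class, so the kNN step always runs in a space whose dimension equals the number of classes. Replacing `class_distances` with a robust, affine invariant distance such as those named in the abstract would leave the kNN step unchanged.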
Additional information
This work was supported by the Internal Funds KU Leuven under Grant C16/15/068. We are grateful to two referees for constructive remarks which improved the presentation.
About this article
Cite this article
Hubert, M., Rousseeuw, P. & Segaert, P. Multivariate and functional classification using depth and distance. Adv Data Anal Classif 11, 445–466 (2017). https://doi.org/10.1007/s11634-016-0269-3
DOI: https://doi.org/10.1007/s11634-016-0269-3