
Multivariate and functional classification using depth and distance

  • Regular Article
Advances in Data Analysis and Classification

Abstract

We construct classifiers for multivariate and functional data. Our approach is based on a kind of distance between data points and classes. The distance measure needs to be robust to outliers and invariant to linear transformations of the data. For this purpose we can use the bagdistance which is based on halfspace depth. It satisfies most of the properties of a norm but is able to reflect asymmetry when the class is skewed. Alternatively we can compute a measure of outlyingness based on the skew-adjusted projection depth. In either case we propose the DistSpace transform which maps each data point to the vector of its distances to all classes, followed by k-nearest neighbor (kNN) classification of the transformed data points. This combines invariance and robustness with the simplicity and wide applicability of kNN. The proposal is compared with other methods in experiments with real and simulated data.
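The DistSpace idea described in the abstract can be illustrated with a minimal sketch: map each point to the vector of its distances to all classes, then run kNN on those vectors. The sketch below does not implement the bagdistance or the skew-adjusted projection depth. As a labeled stand-in it uses a Euclidean distance to the coordinatewise median, scaled by the coordinatewise median absolute deviation, which is robust to outliers but not invariant to linear transformations. The function names `class_distance`, `distspace_transform` and `knn_predict` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def class_distance(X_class, points):
    """Stand-in distance from each row of `points` to one class.

    ASSUMPTION: this is NOT the bagdistance from the article. It is a
    simple robust substitute: Euclidean distance to the coordinatewise
    median, with each coordinate scaled by its median absolute deviation.
    """
    center = np.median(X_class, axis=0)
    scale = np.median(np.abs(X_class - center), axis=0)
    scale = np.where(scale > 0, scale, 1.0)  # guard against zero spread
    return np.linalg.norm((points - center) / scale, axis=1)


def distspace_transform(X_train, y_train, points):
    """Map each point to the vector of its distances to all classes."""
    classes = np.unique(y_train)
    columns = [class_distance(X_train[y_train == g], points) for g in classes]
    return np.column_stack(columns)


def knn_predict(Z_train, y_train, Z_new, k=5):
    """Plain majority-vote kNN on the transformed (DistSpace) points."""
    classes, y_idx = np.unique(y_train, return_inverse=True)
    # Pairwise Euclidean distances between new and training points.
    d = np.linalg.norm(Z_new[:, None, :] - Z_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = np.array(
        [np.bincount(row, minlength=len(classes)) for row in y_idx[nearest]]
    )
    return classes[np.argmax(votes, axis=1)]
```

To use it, transform both the training data and the new points with `distspace_transform` (always measuring distances to the training classes), then pass the transformed arrays to `knn_predict`. Swapping `class_distance` for a depth-based distance would bring the sketch closer to the approach the abstract describes.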

Author information

Corresponding author

Correspondence to Mia Hubert.

Additional information

This work was supported by the Internal Funds KU Leuven under Grant C16/15/068. We are grateful to two referees for constructive remarks which improved the presentation.

About this article

Cite this article

Hubert, M., Rousseeuw, P. & Segaert, P. Multivariate and functional classification using depth and distance. Adv Data Anal Classif 11, 445–466 (2017). https://doi.org/10.1007/s11634-016-0269-3

  • DOI: https://doi.org/10.1007/s11634-016-0269-3
