Abstract
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection task is typically unknown. Furthermore, the biases of commonly used evaluation measures are not fully understood. It is thus difficult to ascertain the extent to which newly proposed outlier detection methods improve over established methods. In this paper, we perform an extensive experimental study on the performance of a representative set of standard k-nearest-neighbor-based methods for unsupervised outlier detection, across a wide variety of datasets prepared for this purpose. Based on the overall performance of the outlier detection methods, we provide a characterization of the datasets themselves, and discuss their suitability as outlier detection benchmark sets. We also examine the most commonly used measures for comparing the performance of different methods, and suggest adaptations that are more suitable for the evaluation of outlier detection results.







Notes
While only recently defined formally, SimplifiedLOF has been implicitly used (and adapted) in many earlier variants of LOF, often presumably unintentionally [i.e., without awareness of the special definition of the reachability distance (Eq. 2)]. Here, for the first time, it is evaluated explicitly.
In fact, the number of true outliers expected to be ranked by chance among the top n positions is a fraction n / N of |O|, which yields \(P@n = \frac{n \cdot |O|}{N}\big / n = \frac{|O|}{N}\).
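The baseline \(P@n = |O|/N\) for a random ranking can be checked with a small simulation (a hypothetical sketch; the sizes `N`, `n`, and the outlier set are illustrative, not taken from the experiments):

```python
import random

def precision_at_n(ranking, outliers, n):
    """Fraction of true outliers among the top-n ranked objects."""
    return sum(1 for obj in ranking[:n] if obj in outliers) / n

# Illustrative setup: N = 1000 objects, |O| = 50 true outliers, n = 100.
N, n = 1000, 100
objects = list(range(N))
outliers = set(range(50))  # the first 50 objects play the role of O

# Averaging P@n over many uniformly random rankings approaches |O| / N = 0.05,
# matching the expected number n * |O| / N of outliers in the top n, divided by n.
random.seed(0)
trials = [precision_at_n(random.sample(objects, N), outliers, n)
          for _ in range(2000)]
print(sum(trials) / len(trials))  # close to 0.05
```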
Available at: http://www.ipd.kit.edu/~muellere/HiCS/realworld.zip. Note that we have supplemented our collection with some of these datasets, without further preprocessing.
For unsupervised learning, both training and test sets can be used together, and we assume this is the case unless otherwise specified.
FastABOD requires a set of at least 3 neighbors, as it computes variances of angles to neighbors. LDOF, KDEOS, and ODIN require at least 2 neighbors.
We see the same overall tendency (although much weaker due to overall low values) if we use \(P@n\) and \({{\mathrm{AP}}}\) (both adjusted and unadjusted) instead of ROC AUC. This is expected since (Adjusted) \(P@n\) and (Adjusted) \({{\mathrm{AP}}}\) can yield additional insights when comparing results that are very good in terms of ROC AUC. In this aggregated evaluation, however, many results with weak scores are included. The corresponding plots are available online.
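The measures compared here can be stated compactly. The sketch below computes ROC AUC via its rank-sum (Mann-Whitney) formulation and a chance-adjusted \(P@n\), where the adjustment subtracts the expected value \(|O|/N\) of a random ranking and rescales by the maximum minus that expectation. The scores and labels are illustrative toy values, not results from the study:

```python
def roc_auc(scores, labels):
    """ROC AUC via the rank-sum formulation: the probability that a
    randomly chosen outlier is scored above a randomly chosen inlier."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def adjusted_precision_at_n(scores, labels, n):
    """P@n rescaled so that 0 corresponds to a random ranking and 1 to a
    perfect one: (P@n - |O|/N) / (1 - |O|/N)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    p_at_n = sum(y for _, y in ranked[:n]) / n
    expected = sum(labels) / len(labels)  # |O| / N
    return (p_at_n - expected) / (1 - expected)

# Toy example: higher score = more outlying; label 1 marks a true outlier.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   0,   1,   0,   0,   0,   0,   0]
print(roc_auc(scores, labels))                     # 11/12, about 0.917
print(adjusted_precision_at_n(scores, labels, 2))  # (0.5 - 0.25) / 0.75 = 1/3
```

This also illustrates why the adjusted measure separates good rankings more sharply: the unadjusted \(P@2 = 0.5\) looks mediocre even though only one inlier outranks an outlier, while the adjustment accounts for the base rate \(|O|/N = 0.25\).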
Therefore, as a side effect, such heat maps can also serve to visualize the profile of performance in terms of \(P@(x \cdot n)\) for \(x=1,\ldots ,9\).
This is not surprising given the relatively large proportion of outliers (\(\approx \)75 %) in the base dataset.
Prima facie, this conclusion is valid, based on our experiments, for the dependency of related methods on the choice of local neighborhood size. Common sense suggests that we can have a similar expectation, mutatis mutandis, for other types of parameters in other kinds of methods.
Acknowledgments
This project was partially funded by FAPESP (Brazil—Grant #2013/18698-4), CNPq (Brazil—Grants #304137/2013-8 and #400772/2014-0), NSERC (Canada), and the Danish Council for Independent Research—Technology and Production Sciences (FTP) (Denmark—Grant 10-081972).
Responsible editor: Johannes Fuernkranz.
Cite this article
Campos, G.O., Zimek, A., Sander, J. et al. On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Disc 30, 891–927 (2016). https://doi.org/10.1007/s10618-015-0444-8