A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data | SpringerLink

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Abstract

Detecting outliers that are grossly different from, or inconsistent with, the rest of a dataset is a major challenge in real-world KDD applications. Existing outlier detection methods are ineffective on scattered real-world datasets because of implicit data patterns and parameter-setting issues. We define a novel Local Distance-based Outlier Factor (LDOF) that measures the outlier-ness of objects in scattered datasets and addresses these issues. LDOF uses the relative location of an object to its neighbours to determine the degree to which the object deviates from its neighbourhood. We present theoretical bounds on LDOF’s false-detection probability. Experimentally, LDOF compares favourably to classical KNN- and LOF-based outlier detection; in particular, it is less sensitive to parameter values.
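The abstract describes LDOF only informally. The Python sketch below assumes that LDOF for an object x_p is the average distance from x_p to its k nearest neighbours divided by the average pairwise distance among those neighbours; that formula is not stated in the abstract above, so treat it as an assumption. Under it, a value well above 1 means the object sits farther from its neighbours than they sit from each other. For comparison the sketch also computes one common KNN-style baseline, the distance to the k-th nearest neighbour; LOF is not implemented. The function names (`ldof`, `knn_score`, `top_n`) and the toy dataset are hypothetical, and the brute-force neighbour search is quadratic and meant for illustration only.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: a sketch under an assumed LDOF definition,
# not the authors' implementation.
Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_indices(data: List[Point], p: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of data[p], excluding p itself.
    Ties are broken by original index because Python's sort is stable."""
    others = [i for i in range(len(data)) if i != p]
    others.sort(key=lambda i: euclidean(data[p], data[i]))
    return others[:k]


def ldof(data: List[Point], p: int, k: int) -> float:
    """Assumed LDOF: mean distance from x_p to its k nearest neighbours,
    divided by the mean pairwise distance among those neighbours."""
    if not 2 <= k < len(data):
        raise ValueError("need 2 <= k < number of objects")
    nbrs = knn_indices(data, p, k)
    # KNN distance of x_p: average distance to its k neighbours.
    d_bar = sum(euclidean(data[p], data[i]) for i in nbrs) / k
    # Inner distance: average over the k*(k-1)/2 unordered neighbour pairs.
    pair_sum = sum(euclidean(data[i], data[j])
                   for a, i in enumerate(nbrs) for j in nbrs[a + 1:])
    inner = pair_sum / (k * (k - 1) / 2)
    if inner == 0:
        # Degenerate case: all neighbours coincide.
        return 0.0 if d_bar == 0 else math.inf
    return d_bar / inner


def knn_score(data: List[Point], p: int, k: int) -> float:
    """KNN baseline: distance from x_p to its k-th nearest neighbour."""
    if not 1 <= k < len(data):
        raise ValueError("need 1 <= k < number of objects")
    return euclidean(data[p], data[knn_indices(data, p, k)[-1]])


def top_n(scores: List[float], n: int) -> List[int]:
    """Indices of the n highest scores, most outlying first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


# Usage: a tight unit-square cluster plus one distant point (toy data).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
k = 3
ldof_scores = [ldof(data, p, k) for p in range(len(data))]
knn_scores = [knn_score(data, p, k) for p in range(len(data))]
print(top_n(ldof_scores, 1), top_n(knn_scores, 1))
```

On this toy data both scores rank the point (10, 10) first. Because the assumed LDOF is a ratio of distances, it is unchanged by a uniform rescaling of the data, whereas the KNN baseline is expressed in the data's own units.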

References

  • Barnett, V.: Outliers in Statistical Data. John Wiley, Chichester (1994)

  • Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: OPTICS-OF: Identifying local outliers. In: Żytkow, J.M., Rauch, J. (eds.) PKDD 1999. LNCS, vol. 1704, pp. 262–270. Springer, Heidelberg (1999)

  • Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: LOF: Identifying density-based local outliers. In: SIGMOD Conference, pp. 93–104 (2000)

  • Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226–231 (1996)

  • Fan, H., Zaïane, O.R., Foss, A., Wu, J.: A non-parametric outlier detection for effectively discovering top-n outliers from engineering data. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS, vol. 3918, pp. 557–566. Springer, Heidelberg (2006)

  • Hawkins, D.: Identification of Outliers. Chapman and Hall, London (1980)

  • Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: VLDB, pp. 392–403 (1998)

  • Kriegel, H.-P., Schubert, M., Zimek, A.: Angle-based outlier detection in high-dimensional data. In: KDD, pp. 444–452 (2008)

  • Mardia, K.V., Kent, J.T., Bibby, J.M.: Multivariate Analysis. Academic Press, New York (1979)

  • Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. In: SIGMOD Conference, pp. 427–438 (2000)

  • Tang, J., Chen, Z., Fu, A.W.-C., Cheung, D.W.-L.: Enhancing effectiveness of outlier detections for low density patterns. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS, vol. 2336, pp. 535–548. Springer, Heidelberg (2002)

  • Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley, Reading (1977)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, K., Hutter, M., Jin, H. (2009). A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_84

  • DOI: https://doi.org/10.1007/978-3-642-01307-2_84

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01306-5

  • Online ISBN: 978-3-642-01307-2

  • eBook Packages: Computer Science, Computer Science (R0)