Abstract
Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given data set. Existing OAM algorithms use traditional distance/density-based outlier scores to rank subspaces. Because these scores depend on the dimensionality of the subspace, they cannot be compared directly between subspaces of different dimensionality. Z-score normalisation has been used to make them comparable, but it requires computing the outlier scores of all instances in each subspace. This adds significant computational overhead on top of already expensive density estimation, making OAM algorithms infeasible to run on large and/or high-dimensional data sets. We also find that Z-score normalisation is inappropriate for OAM in some cases. In this paper, we introduce a new score called Simple Isolation score using Nearest Neighbor Ensemble (SiNNE), which is independent of the dimensionality of subspaces. This enables the scores in subspaces of different dimensionality to be compared directly without any additional normalisation. Our experimental results show that SiNNE produces results that are better than, or at least as good as, those of existing scores, and that it significantly improves the runtime of an existing OAM algorithm based on beam search.
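To illustrate why Z-score normalisation is costly, the following minimal Python sketch ranks subspaces for one query by a Z-score normalised outlier score. It is an illustration under stated assumptions, not the implementation used in the paper: the outlier score is a plain k-th nearest neighbour distance standing in for the distance/density-based scores mentioned above, and the function names, the synthetic data and the choice k = 5 are hypothetical. Note that the score of every instance must be computed in every candidate subspace just to normalise the score of a single query.

```python
import numpy as np


def knn_distance_scores(data, k):
    """Outlier score of every row: distance to its k-th nearest neighbour.

    Assumption: this simple score stands in for any distance/density-based
    outlier score; it is not the score proposed in the paper.
    """
    diffs = data[:, None, :] - data[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # After sorting, column 0 of each row is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    return np.sort(dists, axis=1)[:, k]


def z_score_of_query(data, query_idx, subspace, k=5):
    """Z-score normalised outlier score of one query in one subspace."""
    # Scores of ALL instances are needed to obtain the mean and the
    # standard deviation, even though only one query is of interest.
    scores = knn_distance_scores(data[:, subspace], k)
    std = scores.std()
    if std == 0:
        return 0.0
    return float((scores[query_idx] - scores.mean()) / std)


# Hypothetical synthetic data: 200 instances, 3 attributes.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[0] = [6.0, 0.0, 0.0]  # the query (row 0) is unusual in attribute 0 only

subspaces = [[0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]
z = {tuple(s): z_score_of_query(data, 0, s) for s in subspaces}

# Rank subspaces from most to least outlying for the query.
for subspace, value in sorted(z.items(), key=lambda item: item[1], reverse=True):
    print(subspace, round(value, 2))
```

In this sketch the pairwise distance computation is repeated for each of the seven subspaces; a search over many subspaces of a large data set repeats it far more often, which is the overhead a dimensionality-independent score avoids.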
Notes
1. The synthetic data set was downloaded from https://www.ipd.kit.edu/~muellere/HiCS/.
2.
3. We report results for only 10 of the 19 queries because of the page limit.
4. We used the implementation of LOF available in Weka [6] with parameter k = 50.
5. We present the results of the top-ranked query only because of the page limit.
References
Aryal, S., Ting, K.M., Haffari, G.: Revisiting attribute independence assumption in probabilistic unsupervised anomaly detection. In: Chau, M., Wang, G.A., Chen, H. (eds.) PAISI 2016. LNCS, vol. 9650, pp. 73–86. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31863-9_6
Bandaragoda, T.R., Ting, K.M., Albrecht, D., Liu, F.T., Wells, J.R.: Efficient anomaly detection by isolation using nearest neighbour ensemble. In: 2014 IEEE International Conference on Data Mining Workshop, pp. 698–705 (Dec 2014)
Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, pp. 93–104. ACM, New York (2000)
Campos, G.O., et al.: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discovery 30(4), 891–927 (2016). https://doi.org/10.1007/s10618-015-0444-8
Duan, L., Tang, G., Pei, J., Bailey, J., Campbell, A., Tang, C.: Mining outlying aspects on numeric data. Data Min. Knowl. Discovery 29(5), 1116–1151 (2015). https://doi.org/10.1007/s10618-014-0398-2
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
Härdle, W.: Smoothing Techniques: with Implementation in S. Springer, New York (2012). https://doi.org/10.1007/978-1-4612-4432-5
Keller, F., Muller, E., Bohm, K.: HiCS: high contrast subspaces for density-based outlier ranking. In: Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, ICDE 2012, pp. 1037–1048. IEEE Computer Society, Washington, DC (2012)
Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422 (2008)
Vinh, N.X., et al.: Discovering outlying aspects in large datasets. Data Min. Knowl. Discovery 30(6), 1520–1555 (2016). https://doi.org/10.1007/s10618-016-0453-2
Wells, J.R., Ting, K.M.: A new simple and efficient density estimator that enables fast systematic search. Pattern Recogn. Lett. 122, 92–98 (2019)
Zhang, J., Lou, M., Ling, T.W., Wang, H.: HOS-miner: a system for detecting outlying subspaces of high-dimensional data. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, VLDB 2004, pp. 1265–1268. VLDB Endowment, Toronto (2004)
Acknowledgments
This work is supported by a Federation University Research Priority Area (RPA) scholarship awarded to Durgesh Samariya.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Samariya, D., Aryal, S., Ting, K.M., Ma, J. (2020). A New Effective and Efficient Measure for Outlying Aspect Mining. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. WISE 2020. Lecture Notes in Computer Science, vol 12343. Springer, Cham. https://doi.org/10.1007/978-3-030-62008-0_32
DOI: https://doi.org/10.1007/978-3-030-62008-0_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62007-3
Online ISBN: 978-3-030-62008-0
eBook Packages: Computer Science, Computer Science (R0)