Abstract
Outlier detection, also known as outlier mining, is a central task in data mining and machine learning. It matters in medical data because outliers may carry valuable information; however, outlier detection can perform very poorly on high-dimensional data. In this paper, we propose a new outlier detection technique, named Infinite Feature Selection DBSCAN, that relies on a feature selection strategy to avoid the curse of dimensionality. The main purpose of the proposed method is to reduce the dimensionality of a high-dimensional data set so that outliers can be identified efficiently using clustering techniques. Experiments on real databases demonstrate the effectiveness of our method in terms of accuracy, error rate, F-score, and retrieval time.
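The two-stage idea described above (reduce the feature set, then flag the points a density-based clustering leaves unassigned) can be illustrated with a minimal sketch. It is not the authors' implementation: the variance-based ranking in `select_features` is only a simple stand-in for Infinite Feature Selection, and the toy data, the `eps` and `min_pts` values, and all function names are assumptions made for illustration.

```python
import math
import statistics

def select_features(rows, k):
    """Keep the k highest-variance columns (a stand-in ranking, NOT Inf-FS)."""
    n_cols = len(rows[0])
    variances = [statistics.pvariance([r[j] for r in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])
    return [[r[j] for j in keep] for r in rows], keep

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one label per point, -1 marks noise (outliers)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_neighbors)
    return labels

def evaluate(predicted, actual):
    """Accuracy, error rate and F-score for the outlier (positive) class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "error_rate": 1 - accuracy, "f_score": f_score}

# Hypothetical toy data: columns 0 and 2 are informative, columns 1 and 3 are near-constant.
rows = [
    [1.0, 0.50, 1.0, 0.30], [1.2, 0.51, 0.9, 0.31], [0.9, 0.50, 1.1, 0.30],
    [1.1, 0.51, 1.2, 0.31], [1.0, 0.50, 0.8, 0.30],
    [8.0, 0.51, 8.0, 0.31], [8.1, 0.50, 7.9, 0.30], [7.9, 0.51, 8.2, 0.31],
    [8.2, 0.50, 8.1, 0.30], [8.0, 0.51, 7.8, 0.31],
    [4.5, 0.50, 15.0, 0.30],  # isolated point, treated as the true outlier
]
truth = [False] * 10 + [True]

reduced, kept = select_features(rows, k=2)
labels = dbscan(reduced, eps=1.0, min_pts=3)
outliers = [i for i, label in enumerate(labels) if label == -1]
scores = evaluate([label == -1 for label in labels], truth)
print(kept, outliers, scores)
```

A variance ranking can discard a column in which an outlier is the only unusual value, which is one reason a more careful feature selection step matters in practice. The naive neighbor search is quadratic in the number of points and is only meant to keep the sketch short.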
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Messaoud, T.A., Smiti, A., Louati, A. (2019). A Novel Density-Based Clustering Approach for Outlier Detection in High-Dimensional Data. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_28
DOI: https://doi.org/10.1007/978-3-030-29859-3_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29858-6
Online ISBN: 978-3-030-29859-3
eBook Packages: Computer Science, Computer Science (R0)