Abstract
DBSCAN is a well-established density-based clustering algorithm that can discover clusters of arbitrary shape and has numerous practical applications. Despite the significant advances achieved by optimized variants of DBSCAN, these methods still struggle with data whose density is unevenly distributed. In addition, they do not distribute the computational load well across parallel architectures, and they depend on fixed threshold parameter settings. These limitations are the key bottlenecks in existing DBSCAN variants. To address them, we propose a Parallel Density-peak-based DBSCAN clustering algorithm, called PaD-DBSCAN. This approach dynamically detects changes in density peaks, thereby enhancing parallel processing capabilities and eliminating the drawbacks of fixed parameter settings. Extensive experiments on various datasets demonstrate the effectiveness and superiority of PaD-DBSCAN and support our design choices.
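The fixed-threshold limitation mentioned above can be made concrete with a small sketch. The code below is a plain, sequential implementation of classic DBSCAN, not PaD-DBSCAN, and the toy dataset, the helper names (`region_query`, `dbscan`, `summarize`), and the parameter values are assumptions chosen purely for illustration. The data contains two dense groups that lie close together and one sparse group far away.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Classic DBSCAN with one global (eps, min_pts) pair; returns a label per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    return len({l for l in labels if l != NOISE}), labels.count(NOISE)

# Hypothetical toy data: two dense 4x4 grids (spacing 0.1) separated by a gap
# of 0.5, plus one sparse 3x3 grid (spacing 1.0) far away.
dense_a = [(i / 10, j / 10) for i in range(4) for j in range(4)]
dense_b = [((8 + i) / 10, j / 10) for i in range(4) for j in range(4)]
sparse_c = [(10.0 + i, 10.0 + j) for i in range(3) for j in range(3)]
points = dense_a + dense_b + sparse_c

labels_small = dbscan(points, eps=0.15, min_pts=4)
labels_large = dbscan(points, eps=1.05, min_pts=4)
print(summarize(labels_small))
print(summarize(labels_large))
```

With the small radius, the two dense groups are separated but every point of the sparse group is labelled noise; with the large radius, the sparse group is recovered but the two dense groups merge into one. On this toy data no single global radius yields all three groups, which is the kind of behaviour under uneven density that motivates adaptive approaches.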
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61802273), Jiangsu Higher Education Institutions of China (No. 23KJA520011), and the China Science and Technology Plan Project of Suzhou (No. SYG202139).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, Y., Fang, J., Fu, R., Chao, P. (2025). PaD-DBSCAN: Enhancing Parallel DBSCAN Clustering with Density Peak Detection. In: Sheng, Q.Z., et al. Advanced Data Mining and Applications. ADMA 2024. Lecture Notes in Computer Science, vol 15387. Springer, Singapore. https://doi.org/10.1007/978-981-96-0811-9_21
DOI: https://doi.org/10.1007/978-981-96-0811-9_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0810-2
Online ISBN: 978-981-96-0811-9
eBook Packages: Computer Science, Computer Science (R0)