Abstract
This paper presents a nearest base-neighbor (NBN) search that can be applied to a clustered nearest neighbor problem on spatial datasets with static properties. Given two sets of data points R and S, a query point q, distance threshold δ and cardinality threshold k, the NBN query retrieves a nearest point r (called the base-point) in R where more than k points in S are located within the distance δ. In this paper, we formally define a base-point and NBN problem. As the brute-force approach to this problem in massive datasets has large computational and I/O costs, we propose in-memory and external memory processing techniques for NBN queries. In particular, our proposed in-memory algorithms are used to minimize I/Os in the external memory algorithms. Furthermore, we devise a solution-based index, which we call the neighborhood-augmented grid, to dramatically reduce the search space. A performance study is conducted both on synthetic and real datasets. Our experimental results show the efficiency of our proposed approach.
Similar content being viewed by others
References
Aly AM, Aref WG, Ouzzani M (2012) Spatial queries with two kNN predicates. PVLDB 5(11):1100–1111. https://doi.org/10.14778/2350229.2350231
Aly AM, Aref WG, Ouzzani M (2015) Spatial queries with k-nearest-neighbor and relational predicates. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems. https://doi.org/10.1145/2820783.2820815
Cao X, Cong G, Jensen CS, Ooi BC (2011) Collective spatial keyword querying. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 373–384. https://doi.org/10.1145/1989323.1989363
Chen Y, Patel JM (2007) Efficient evaluation of all-nearest-neighbor queries. In: Proceedings of IEEE international conference on data engineering, pp 1056–1065. https://doi.org/10.1109/ICDE.2007.368964
Cheung K, Fu AWC (1998) Enhanced nearest neighbor search on the R-tree. ACM SIGMOD Record 27(3):16–21. https://doi.org/10.1145/290593.290596
Choi DW, Chung CW (2015) Nearest neighborhood search in spatial databases. In: Proceedings of IEEE international conference on data engineering, pp 699–710. https://doi.org/10.1109/ICDE.2015.7113326
Deng K, Sadiq SW, Zhou X, Xu H, Fung GPC, Lu Y (2012) On group nearest group query processing. IEEE Trans Knowl Data Eng 24(2):295–308. https://doi.org/10.1109/TKDE.2010.230
Deng K, Zhou X, Shen HT, Xu K, Lin X (2006) Surface k-NN query processing. In: Proceedings of IEEE international conference on data engineering, p 78. https://doi.org/10.1109/ICDE.2006.152
Gan J, Tao Y (2015) DBSCAN Revisited: Mis-Claim, Un-fixability, and approximation. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 519–530. https://doi.org/10.1145/2723372.2737792
Gao Y, Liu Q, Miao X, Yang J (2016) Reverse k-nearest neighbor search in the presence of obstacles. Inf Sci 330(10):274–292. https://doi.org/10.1016/j.ins.2015.10.022
Gunawan A (2013) A faster algorithm for DBSCAN. Master’s Thesis. Technische University Eindhoven, Eindhoven
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 47–57. https://doi.org/10.1145/971697.602266
Hjaltason G, Samet H (1999) Distance browsing in spatial databases. ACM Trans Database Syst 24(2):265–318. https://doi.org/10.1145/320248.320255
Kolahdouzan MR, Shahabi C (2004) Voronoi-based k nearest neighbor search for spatial network databases. In: International conference on very large data bases, pp 840–851
Korn F, Muthukrishnan S (2000) Influence sets based on reverse nearest neighbor queries. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 201–212. https://doi.org/10.1145/335191.335415
Lee EY, Cho HJ, Chung TS, Ryu KY (2015) Moving range k nearest neighbor queries with quality guarantee over uncertain moving objects. Inf Sci 325(20):324–341. https://doi.org/10.1016/j.ins.2015.07.034
Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouse. In: Proceedings of the international symposium on spatial and temporal databases, pp 443–459
Papadias D, Shen Q, Tao Y, Mouratidis K (2004) Group nearest neighbor queries. In: Proceedings of IEEE international conference on data engineering. https://doi.org/10.1109/ICDE.2004.1320006
Papadias D, Tao Y, Mouratidis K, Hui CK (2005) Aggregate nearest neighbor queries in spatial databases. ACM Trans Database Syst 30(2):529–576. https://doi.org/10.1145/1071610.1071616
Roussopoulos N, Kelly S, Vincent F (1995) Nearest neighbor queries. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 71–79
Tao Y, Papadias D, Lian X (2002) Continuous nearest neighbor search. In: International conference on very large data bases, pp 287–298
Yi S, Ryu H, Son J, Chung YD (2014) View field nearest neighbor: a novel type of spatial queries. Inf Sci 275:68–82. https://doi.org/10.1016/j.ins.2014.02.022
Zhang D, Chan CY, Tan KL (2013) Nearest group queries. In: Proceeding of the conference on scientific and statistical database management. https://doi.org/10.1145/2484838.2484866
Zhang D, Chee YM, Mondal A, Tung AKH, Kitsuregawa M (2009) Keyword search in spatial databases: Towards searching by document. In: Proceedings of IEEE international conference on data engineering, pp 688–699. https://doi.org/10.1109/ICDE.2009.77
Zhang J, Mamoulis N, Papadias D, Tao Y (2004) All-nearest-neighbors queries in spatial databases. In: Proceedings of the 16th international conference on scientific and statistical database management, pp 297–306. https://doi.org/10.1109/SSDM.2004.1311221
U.S. Cencus Bureau. Tiger/Line Shapefiles, https://www.census.gov/geo/maps-data/data/tiger.html. Accessed on 26 Mar 2019
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) Grant Funded by the Korean Government (MSIP) (NRF-2016R1A2B1014013) and by Basic Science Research Program through the National Research Foundation of Korea (NRF) Funded by the Ministry of Education (2016R1D1A1B03930907)
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
1.1 A. Construction of grid
Algorithm 8 shows the construction procedure of the grid used in DPA. The algorithm takes two data sets (RS and SS) and δ as input. We initialize variables for grid construction and calculate the number of cells (Lines 1–6). We initialize each cell ci iteratively by the number of cells and put ci with the key value i into the grid map (Lines 7–13). In the example of Fig. 17, the key value of c10 is 10. Finally, we find the cell c where the point is located for each data point of RS and SS through Algorithm 9 and update the SCR(c) or MCR(c) or c.SS (Lines 14–20). Figure 17 shows an example of a grid construction. nCols is 5, nRows is 5 and the number of cells is 25. RN(c13) represents 21 gray cells including c13.
Rights and permissions
About this article
Cite this article
Jang, HJ., Hyun, KS., Chung, J. et al. Nearest base-neighbor search on spatial datasets. Knowl Inf Syst 62, 867–897 (2020). https://doi.org/10.1007/s10115-019-01360-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-019-01360-3