Abstract
Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy–speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
The gradient descent consistently converges with the learning rate \(\gamma =0.01\) in all our experiments; we did not observe further tuning of the learning rate to be necessary.
- 2.
- 3.
References
Bilen, H., Pedersoli, M., Tuytelaars, T.: Weakly supervised object detection with convex clustering. In: CVPR, pp. 1081–1089. IEEE (2015)
Boytsov, L., Naidan, B.: Engineering efficient and effective non-metric space library. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds.) SISAP 2013. LNCS, vol. 8199, pp. 280–293. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41062-8_28
Dasgupta, S., Sinha, K.: Randomized partition trees for nearest neighbor search. Algorithmica 72(1), 237–263 (2015)
Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling LSH for performance tuning. In: CIKM, pp. 669–678. ACM (2008)
Hassan, H., Elaraby, M., Tawfik, A.Y.: Synthetic data for neural machine translation of spoken-dialects. Small 16, 17–33 (2017)
Hyvönen, V., et al.: Fast nearest neighbor search through sparse random projections and voting. In: IEEE International Conference on Big Data, pp. 881–888 (2016)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC, pp. 604–613. ACM (1998)
Jégou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. TPAMI 33(1), 117–128 (2011)
Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734 (2017)
Malkov, Y.A., Yashunin, D.A.: Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. arXiv:1603.09320 (2016)
McBryde, C.R.: Spacecraft visual navigation using appearance matching and multi-spectral sensor fusion. Ph.D. thesis, Georgia Institute of Technology (2018)
McCartin-Lim, M., McGregor, A., Wang, R.: Approximate principal direction trees. In: ICML, pp. 1611–1618 (2012)
Muja, M., Lowe, D.G.: Scalable nearest neighbor algorithms for high dimensional data. TPAMI 36(11), 2227–2240 (2014)
Silpa-Anan, C., Hartley, R.: Optimised KD-trees for fast image descriptor matching. In: CVPR, pp. 1–8. IEEE (2008)
Theil, H.: A rank-invariant method of linear and polynomial regression analysis. In: Henri Theil’s Contributions to Economics and Econometrics, pp. 345–381 (1992)
Verma, N., Kpotufe, S., Dasgupta, S.: Which spatial partition trees are adaptive to intrinsic dimension? In: UAI, pp. 565–574. AUAI Press (2009)
Wang, L., Tasoulis, S., Roos, T., Kangasharju, J.: Kvasir: scalable provision of semantically relevant web content on big data framework. IEEE Trans. Big Data 2(3), 219–233 (2016)
Yianilos, P.N.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: SODA, vol. 93, pp. 311–321 (1993)
Acknowledgments
This project was supported by Business Finland (project 3662/31/2018 Advanced Machine Learning for Industrial Applications) and the Academy of Finland (project 311277 TensorML).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Jääsaari, E., Hyvönen, V., Roos, T. (2019). Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science(), vol 11440. Springer, Cham. https://doi.org/10.1007/978-3-030-16145-3_46
Download citation
DOI: https://doi.org/10.1007/978-3-030-16145-3_46
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16144-6
Online ISBN: 978-3-030-16145-3
eBook Packages: Computer ScienceComputer Science (R0)