Abstract
Scale-invariant keypoint detection is a fundamental problem in low-level vision. To accelerate keypoint detectors (e.g., DoG, Harris-Laplace, and Hessian-Laplace) developed in the Gaussian scale-space, various fast detectors (e.g., SURF, CenSurE, and BRISK) have been devised by approximating Gaussian filters with simple box filters. However, there is no principled way to design the shapes and scales of these box filters. Moreover, the integral-image technique they rely on makes it difficult to identify the continuous kernels that correspond to the discrete ones actually used, so there is no guarantee that desirable properties of the original Gaussian scale-space, such as causality, are inherited. To address these issues, in this paper we propose a unified B-spline framework for scale-invariant keypoint detection. Owing to an approximate relationship between B-spline and Gaussian kernels, the framework provides a mathematical interpretation of existing fast detectors based on integral images. In addition, drawing on B-spline theory, we expose a problem in repeated integration, the generalized version of the integral-image technique. Finally, following the dominant measures for keypoint detection and automatic scale selection, we develop the B-spline determinant of Hessian (B-DoH) and the B-spline Laplacian-of-Gaussian (B-LoG) as two instantiations within the unified B-spline framework. For efficient computation, we propose to convolve images with fixed-order B-spline kernels using repeated running-sums, which avoids the problem of integral images by introducing an extra interpolation kernel. Our B-spline detectors can be designed in a principled way, without heuristic choices of kernel shape and scale, and naturally extend the popular SURF and CenSurE detectors with more complex kernels. Extensive experiments on a standard benchmark dataset demonstrate that the proposed detectors outperform existing ones in terms of repeatability and efficiency.
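For concreteness, the integral-image (summed-area-table) technique that fast detectors such as SURF and CenSurE build on can be sketched in a few lines. This is a minimal NumPy sketch for illustration only; the helper names `integral_image` and `box_sum` are assumptions, not code from the paper.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1], in O(1) regardless of the box size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

Once the table is built in one pass, any axis-aligned box filter costs four lookups, which is why box-filter approximations of Gaussian derivatives are so fast; the catch, as argued above, is that the corresponding continuous kernels are hard to characterize.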

















Notes
Available from http://www.robots.ox.ac.uk/~vgg/research/affine/
References
Afonso, M. V., Nascimento, J. C., & Marques, J. S. (2014). Automatic estimation of multiple motion fields from video sequences using a region matching based approach. IEEE Transactions on Multimedia, 16(1), 1–14.
Agrawal, M., Konolige, K., & Blas, M. R. (2008). CenSurE: Center surround extremas for realtime feature detection and matching. In European Conference on Computer Vision (pp. 102–115). Springer, Berlin, Heidelberg.
Awrangjeb, M., Lu, G., & Fraser, C. S. (2012). Performance comparisons of contour-based corner detectors. IEEE Transactions on Image Processing, 21(9), 4167–4179.
Babaud, J., Witkin, A. P., Baudin, M., & Duda, R. O. (1986). Uniqueness of the gaussian kernel for scale-space filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 26–33.
Balntas, V., Lenc, K., Vedaldi, A., & Mikolajczyk, K. (2017). Hpatches: A benchmark and evaluation of handcrafted and learned local descriptors. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5173–5182).
Barroso-Laguna, A., Riba, E., Ponsa, D., & Mikolajczyk, K. (2019). Key. net: Keypoint detection by handcrafted and learned cnn filters. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 5836–5844).
Bay, H., Tuytelaars, T., & Van Gool, L. (2006). Surf: Speeded up robust features. In European Conference on Computer Vision (pp. 404-417). Springer, Berlin, Heidelberg.
Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3), 346–359.
Benbihi, A., Geist, M., & Pradalier, C. (2019). Elf: Embedded localisation of features in pre-trained cnn. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 7940–7949).
Bouma, H., Vilanova, A., Bescós, J. O., ter Haar Romeny, B. M., & Gerritsen, F. A. (2007). Fast and accurate Gaussian derivatives based on B-splines. In International Conference on Scale Space and Variational Methods in Computer Vision (pp. 406–417). Springer, Berlin, Heidelberg.
Bretzner, L., Laptev, I., & Lindeberg, T. (2002). Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering. In Proceedings of fifth IEEE international conference on automatic face gesture recognition (pp. 423–428). IEEE.
Brown, M., & Lowe, D. (2002). Invariant features from interest point groups. In British Machine Vision Conference. Citeseer.
Canny, J. (1987). A computational approach to edge detection. In Readings in computer vision (pp. 184–203). Elsevier, Amsterdam.
Chaudhury, K., Muñoz-Barrutia, A., & Unser, M. (2010). Fast space-variant elliptical filtering using box splines. IEEE Transactions on Image Processing, 19(9), 2290–2306. https://doi.org/10.1109/TIP.2010.2046953.
Crow, F. (1984). Summed-area tables for texture mapping. ACM SIGGRAPH Computer Graphics, 18(3), 207–212.
Csurka, G., Dance, C., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Workshop on statistical learning in computer vision, ECCV 1, pp. 1–2. Prague.
Deselaers, T., Keysers, D., & Ney, H. (2008). Features for image retrieval: an experimental comparison. Information retrieval, 11(2), 77–107.
DeTone, D., Malisiewicz, T., & Rabinovich, A. (2018). Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 224–236).
Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., & Sattler, T. (2019). D2-net: A trainable cnn for joint description and detection of local features. In Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 8092–8101).
Fauqueur, J., Brostow, G., & Cipolla, R. (2007). Assisted video object labeling by joint tracking of regions and keypoints. In 2007 IEEE 11th International Conference on Computer Vision (pp. 1–7). IEEE.
Goh, S., Goodman, T., & Lee, S. (2007). Causality properties of refinable functions and sequences. Advances in Computational Mathematics, 26(1), 231–250.
Harris, C. G., Stephens, M., et al. (1988). A combined corner and edge detector. In Alvey vision conference 15, pp. 10–5244. Citeseer.
Heckbert, P. S. (1986). Filtering by repeated integration. In ACM SIGGRAPH Computer Graphics, 20, pp. 315–321. ACM.
Herman, G., Zhang, B., Wang, Y., Ye, G., & Chen, F. (2013). Mutual information-based method for selecting informative feature sets. Pattern Recognition, 46(12), 3315–3327.
Kadir, T., & Brady, M. (2001). Saliency, scale and image description. International Journal of Computer Vision, 45(2), 83–105.
Kienzle, W., Wichmann, F., Scholkopf, B., & Franz, M. (2007). A nonparametric approach to bottom-up visual saliency. Advances in Neural Information Processing Systems, 19, 689.
Koenderink, J. J. (1984). The structure of images. Biological cybernetics, 50(5), 363–370.
Krig, S. (2016). Interest point detector and feature descriptor survey. In Computer vision metrics (pp. 187–246). Springer, Cham.
Łągiewka, M., Korytkowski, M., & Scherer, R. (2017). Distributed image retrieval with color and keypoint features. In 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA) (pp. 45–50). IEEE.
Lawton, W., Lee, S., & Shen, Z. (1995). Characterization of compactly supported refinable splines. Advances in computational mathematics, 3(1–2), 137–145.
Ledwich, L., & Williams, S. (2004). Reduced sift features for image retrieval and indoor localisation. In Australian conference on robotics and automation 322, pp. 3. Citeseer.
Leutenegger, S., Chli, M., & Siegwart, R. (2011). Brisk: Binary robust invariant scalable keypoints. In 2011 IEEE international conference on computer vision (ICCV) (pp. 2548–2555). IEEE.
Li, Y., Wang, S., Tian, Q., & Ding, X. (2015). A survey of recent advances in visual feature detection. Neurocomputing, 149, 736–751.
Lindeberg, T. (1994). Scale-space theory: A basic tool for analyzing structures at different scales. Journal of applied statistics, 21(1–2), 225–270.
Lindeberg, T. (1998). Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2), 79–116.
Lindeberg, T. (2009). Scale-space. Encyclopedia of Computer Science and Engineering, (B. Wah, ed), IV: 2495–2504.
Lindeberg, T. (2013). Scale selection properties of generalized scale-space interest point detectors. Journal of Mathematical Imaging and vision, 46(2), 177–210.
Lindeberg, T. (2014). Scale selection. Computer Vision: A Reference Guide, (K. Ikeuchi, ed.) (pp. 701–713).
Lindeberg, T. (2015). Image matching using generalized scale-space interest points. Journal of mathematical Imaging and Vision, 52(1), 3–36.
Lindeberg, T. (2018). Spatio-temporal scale selection in video data. Journal of Mathematical Imaging and Vision, 60(4), 525–562.
Lindeberg, T., & Bretzner, L. (2003). Real-time scale selection in hybrid multi-scale representations. In International Conference on Scale-Space Theories in Computer Vision (pp. 148-163). Springer, Berlin, Heidelberg.
Lorenz, C., Carlsen, I., Buzug, T., Fassnacht, C., & Weese, J. (1997). Multi-scale line segmentation with automatic estimation of width, contrast and tangential direction in 2d and 3d medical images. In CVRMed-MRCAS’97. (pp. 233-242). Springer, Berlin, Heidelberg.
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Mair, E., Hager, G. D., Burschka, D., Suppa, M., & Hirzinger, G. (2010). Adaptive and generic corner detection based on the accelerated segment test. In European conference on Computer vision (pp. 183-196). Springer, Berlin, Heidelberg.
Mikolajczyk, K. (2002). Detection of local features invariant to affine transformations. PhD thesis, Institut National Polytechnique de Grenoble, France.
Mikolajczyk, K., & Schmid, C. (2004). Scale & affine invariant interest point detectors. International Journal of Computer Vision, 60(1), 63–86.
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., & Van Gool, L. (2005). A comparison of affine region detectors. International Journal of Computer Vision, 65(1), 43–72.
Mortensen, E., Deng, H., & Shapiro, L. (2005). A sift descriptor with global context. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, 1, pp. 184–190. IEEE.
Muñoz, A., Ertlé, R., & Unser, M. (2002). Continuous wavelet transform with arbitrary scales and O(N) complexity. Signal Processing, 82(5), 749–757.
Muñoz-Barrutia, A., Artaechevarria, X., & Ortiz-de-Solorzano, C. (2010). Spatially variant convolution with scaled B-splines. IEEE Transactions on Image Processing, 19(1), 11–24. https://doi.org/10.1109/TIP.2009.2031235.
Ono, Y., Trulls Fortuny, E., Fua, P., & Yi, K. M. (2018). Lf-net: Learning local features from images. In Neural Information Processing Systems (NIPS), number CONF.
Revaud, J., Weinzaepfel, P., De Souza, C., Pion, N., Csurka, G., Cabon, Y., & Humenberger, M. (2019). R2d2: repeatable and reliable detector and descriptor. In Neural Information Processing Systems (NIPS).
Rosten, E., Porter, R., & Drummond, T. (2010). Faster and better: A machine learning approach to corner detection. IEEE transactions on pattern analysis and machine intelligence, 32(1), 105–119.
Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. R. (2011). Orb: An efficient alternative to sift or surf. In ICCV, 11, pp. 2. Citeseer.
Savinov, N., Seki, A., Ladicky, L., Sattler, T., & Pollefeys, M. (2017). Quad-networks: unsupervised learning to rank for interest point detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1822–1830).
Tola, E., Lepetit, V., & Fua, P. (2009). Daisy: An efficient dense descriptor applied to wide-baseline stereo. IEEE transactions on pattern analysis and machine intelligence, 32(5), 815–830.
Tuytelaars, T., Mikolajczyk, K., et al. (2008). Local invariant feature detectors: a survey. Foundations and trends® in computer graphics and vision, 3(3), 177–280.
Unser, M., Aldroubi, A., & Eden, M. (1992). On the asymptotic convergence of B-spline wavelets to Gabor functions. IEEE Transactions on Information Theory, 38(2), 864–872. https://doi.org/10.1109/18.119742.
Unser, M., Aldroubi, A., & Eden, M. (1993a). B-spline signal processing. I. Theory. IEEE Transactions on Signal Processing, 41(2), 821–833. https://doi.org/10.1109/78.193220.
Unser, M., Aldroubi, A., & Eden, M. (1993b). The L2-polynomial spline pyramid. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4), 364–379.
Unser, M., Aldroubi, A., & Schiff, S. (1994). Fast implementation of the continuous wavelet transform with integer scales. IEEE Transactions on Signal Processing, 42(12), 3519–3523. https://doi.org/10.1109/78.340787.
van den Boomgaard, R., & van der Weij, R. (2006). Gaussian convolutions: Numerical approximations based on interpolation. In Scale-Space and Morphology in Computer Vision (pp. 205–214).
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 1, 2001. https://doi.org/10.1109/CVPR.2001.990517
Wang, Y.-P., & Lee, S. (1998). Scale-space derived from B-splines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(10), 1040–1055. https://doi.org/10.1109/34.722612.
Wang, Z., Xiao, H., He, W., Wen, F., & Yuan, K. (2013). Real-time sift-based object recognition system. In 2013 IEEE International Conference on Mechatronics and Automation (pp. 1361–1366). IEEE.
Witkin, A. P. (1987). Scale-space filtering. In Readings in Computer Vision (pp. 329–332). Elsevier, Amsterdam.
Zhang, J., Marszałek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 213–238.
Acknowledgements
This research was supported by the NSFC (No. 61772220 and 62172177).
Additional information
Communicated by Tinne Tuytelaars.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Proof of Proposition 1
Proof
It is obvious that the first derivative of the scaled zeroth-order B-spline \(\varphi ^0_s(x)\) is \(\varDelta _s(x+\frac{s}{2})\). Using (2), the scaled B-spline of degree n can be expressed as
Thus, the \((n+1)^{th}\) derivative of \(\varphi ^n_s(x)\) can be calculated using (36), i.e.,
where \(D^{n+1}\) is the \((n+1)\)-fold iteration of the differential operator \(Df(x)=\frac{\partial {f(x)}}{\partial {x}}\). Finally, using (37)
\(\square \)
Appendix B: Proof of Theorem 1
Proof
Using (6), the \((n+1)^{th}\) finite difference of the discrete B-spline \(\phi _s^n(k)\) can be calculated as
Replacing \(D^{-(n+1-d)}\) with the \((n+1-d)\)-fold iteration of the running-sum operator \(\varDelta ^{-(n+1-d)}\), (21) is reformulated as
\(\square \)
Appendix C: Proof of Proposition 2
Proof
From (9), we can establish the relation between \(\varphi _{s,d}^n(k)\) and \(\phi _{s,d}^n(k)\):
Similar to the idea of repeated integration, the results remain unchanged if the finite difference operator \(\varDelta \) is applied to the kernel and running-sum operator \(\varDelta ^{-1}\) is applied to the original signal, i.e.,
where the third equality follows from (41), and the last from (39). \(\square \)
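As a concrete 1-D illustration of the repeated running-sum scheme used in the proof, the sketch below convolves a signal with an unnormalized discrete B-spline of degree n and integer scale s by applying (n+1) running sums (the operator written \(\varDelta ^{-(n+1)}\) above) followed by (n+1) finite differences of step s. The function name, the unnormalized kernel, and the restriction to integer scales are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def bspline_filter(f, s, n):
    """Causally convolve f with an unnormalized discrete B-spline of
    degree n and integer scale s via repeated running-sums.

    Equivalent to convolving with an s-wide box filter (n+1) times,
    but each output sample costs O(n) regardless of the scale s.
    """
    g = np.asarray(f, dtype=np.float64)
    # (n+1)-fold running sum: each pass is a cumulative sum.
    for _ in range(n + 1):
        g = np.cumsum(g)
    # (n+1)-fold finite difference with step s: g[k] - g[k-s].
    for _ in range(n + 1):
        d = g.copy()
        d[s:] -= g[:-s]
        g = d
    return g
```

Because the running sums accumulate before the differences cancel them, a fixed-precision accumulator can overflow on long signals, which is one face of the numerical problem with repeated integration discussed in the paper.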
Cite this article
Zheng, Q., Gong, M., You, X. et al. A Unified B-Spline Framework for Scale-Invariant Keypoint Detection. Int J Comput Vis 130, 777–799 (2022). https://doi.org/10.1007/s11263-021-01568-3