Abstract
We propose a way to train deep-learning-based keypoint descriptors that makes them approximately equivariant to locally affine transformations of the image plane. The main idea is to use the representation theory of GL(2) to generalize the recently introduced concept of steerers from rotations to affine transformations. Affine steerers give a high degree of control over how keypoint descriptions transform under image transformations. We demonstrate the potential of using this control for image matching. Finally, we propose a way to fine-tune keypoint descriptors with a set of steerers on upright images and obtain state-of-the-art results on several standard benchmarks. Code will be published at github.com/georg-bn/affine-steerers.
Notes
1. In practice, image warps corresponding to camera motions are piecewise continuous and piecewise differentiable. The discontinuities stem from motion boundaries, which we ignore in the theoretical part of this paper.
2. An irrep on \(V\) is a representation that leaves no subspace \(W\subset V\) invariant other than \(\{0\}\) and \(V\) itself. Irreps can be thought of as fundamental building blocks of representations, as many general representations can be decomposed into irreps. However, for \(\textrm{GL}(2)\), not all representations can be built out of its irreps. The standard counterexample is [63, Example 4.11].
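To make the last point of Note 2 concrete, the following minimal sketch checks numerically that the map \(A \mapsto \begin{pmatrix} 1 & \log|\det A| \\ 0 & 1 \end{pmatrix}\) is a representation of \(\textrm{GL}(2)\) that leaves the line spanned by \(e_1\) invariant but admits no invariant complement, so it cannot be written as a direct sum of irreps. This is a textbook-style illustration chosen here as an assumption; it is not claimed to be identical to the cited example, and the helper names (`rho`, `matmul`, `maps_line_to_itself`) are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def rho(A):
    """Illustrative representation of GL(2): A -> [[1, log|det A|], [0, 1]]."""
    return [[1.0, math.log(abs(det(A)))], [0.0, 1.0]]

def close(X, Y, tol=1e-9):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

# Two arbitrary invertible matrices (det A = 5.5, det B = 8.5).
A = [[2.0, 1.0], [0.5, 3.0]]
B = [[1.0, -2.0], [4.0, 0.5]]

# Homomorphism property: rho(AB) = rho(A) rho(B), since log|det| is additive.
is_homomorphism = close(rho(matmul(A, B)), matmul(rho(A), rho(B)))

# The line spanned by e1 = (1, 0) is invariant: rho(A) e1 is the first column.
fixes_e1 = [rho(A)[0][0], rho(A)[1][0]] == [1.0, 0.0]

def maps_line_to_itself(t, A):
    """Check whether rho(A) maps the line spanned by v = (t, 1) to itself."""
    c = math.log(abs(det(A)))
    w = (t + c, 1.0)                 # rho(A) applied to v
    cross = w[0] * 1.0 - w[1] * t    # zero iff w is parallel to v; equals c
    return abs(cross) < 1e-12

# Every line other than span{e1} is spanned by some (t, 1). The cross product
# above equals log|det A| whatever t is, so no such line is invariant when
# |det A| != 1. A few sample values of t are tested for illustration.
has_invariant_complement = any(
    maps_line_to_itself(t, A) for t in (-3.0, 0.0, 0.7, 10.0)
)

print(is_homomorphism, fixes_e1, has_invariant_complement)
```

The representation is thus reducible (it has the invariant line spanned by \(e_1\)) yet indecomposable, which is the kind of behaviour the note refers to.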
References
Balntas, V., Lenc, K., Vedaldi, A., Mikolajczyk, K.: HPatches: a benchmark and evaluation of handcrafted and learned local descriptors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5173–5182 (2017)
Balntas, V., Riba, E., Ponsa, D., Mikolajczyk, K.: Learning local feature descriptors with triplets and shallow convolutional neural networks. In: BMVC (2016)
Barath, D., Mishkin, D., Cavalli, L., Sarlin, P.E., Hruby, P., Pollefeys, M.: Affineglue: joint matching and robust estimation. arXiv preprint arXiv:2307.15381 (2023)
Barath, D., Polic, M., Förstner, W., Sattler, T., Pajdla, T., Kukelova, Z.: Making affine correspondences work in camera geometry computation. In: European Conference on Computer Vision (ECCV), pp. 723–740 (2020)
Barroso-Laguna, A., Riba, E., Ponsa, D., Mikolajczyk, K.: Key.Net: keypoint detection by handcrafted and learned CNN filters. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5836–5844 (2019)
Bentolila, J., Francos, J.M.: Conic epipolar constraints from affine correspondences. Comput. Vis. Image Underst. (CVIU) 122, 105–114 (2014)
Bruintjes, R.J., Motyka, T., van Gemert, J.: What affects learned equivariance in deep image recognition models? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 4838–4846 (2023)
Brynte, L., Iglesias, J.P., Olsson, C., Kahl, F.: Learning structure-from-motion with graph attention networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Bökman, G., Edstedt, J., Felsberg, M., Kahl, F.: Steerers: a framework for rotation equivariant keypoint descriptors. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Bökman, G., Kahl, F.: A case for using rotation invariant features in state of the art feature matchers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5110–5119 (2022)
Bökman, G., Kahl, F.: Investigating how ReLU-networks encode symmetries. In: Thirty-Seventh Conference on Neural Information Processing Systems (2023). https://openreview.net/forum?id=8lbFwpebeu
Bökman, G., Kahl, F., Flinth, A.: ZZ-Net: a universal rotation equivariant architecture for 2D point clouds. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Cao, C., Fu, Y.: Improving transformer-based image matching by cascaded capturing spatially informative keypoints. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12129–12139 (2023)
Caron, M., et al.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9650–9660 (2021)
Chen, H., et al.: Learning to match features with seeded graph matching network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6301–6310 (2021)
Cohen, T., Welling, M.: Group equivariant convolutional networks. In: International Conference on Machine Learning, pp. 2990–2999. PMLR (2016)
Cohen, T.S., Welling, M.: Transformation properties of learned visual representations. ICLR 2015 arXiv:1412.7659 (2014)
DeTone, D., Malisiewicz, T., Rabinovich, A.: Superpoint: self-supervised interest point detection and description. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 224–236 (2018)
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=YicbFdNTTy
Dusmanu, M., et al.: D2-Net: a trainable CNN for joint detection and description of local features. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019)
Edstedt, J., Athanasiadis, I., Wadenbäck, M., Felsberg, M.: DKM: dense kernelized feature matching for geometry estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (2023)
Edstedt, J., Bökman, G., Wadenbäck, M., Felsberg, M.: DeDoDe: detect, don’t describe – describe, don’t detect for local feature matching. In: 2024 International Conference on 3D Vision (3DV). IEEE (2024)
Edstedt, J., Sun, Q., Bökman, G., Wadenbäck, M., Felsberg, M.: RoMa: robust dense feature matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Felsberg, M., Sommer, G.: Image features based on a new approach to 2D rotation invariant quadrature filters. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 369–383. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47969-4_25
Forssén, P.E., Lowe, D.G.: Shape descriptors for maximally stable extremal regions. In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8. IEEE (2007)
Garrido, Q., Assran, M., Ballas, N., Bardes, A., Najman, L., LeCun, Y.: Learning and leveraging world models in visual representation learning. arXiv preprint arXiv:2403.00504 (2024)
Garrido, Q., Najman, L., LeCun, Y.: Self-supervised learning of split invariant equivariant representations. In: International Conference on Machine Learning. PMLR (2023)
Giang, K.T., Song, S., Jo, S.: TopicFM: robust and interpretable topic-assisted feature matching. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37 (2023)
Gleize, P., Wang, W., Feiszli, M.: SiLK: simple learned keypoints. In: ICCV (2023)
Grill, J.B., et al.: Bootstrap your own latent-a new approach to self-supervised learning. In: Advances in Neural Information Processing Systems, vol. 33, pp. 21271–21284 (2020)
Gruver, N., Finzi, M.A., Goldblum, M., Wilson, A.G.: The Lie derivative for measuring learned equivariance. In: The Eleventh International Conference on Learning Representations (2023). https://openreview.net/forum?id=JL7Va5Vy15J
Gupta, S., Robinson, J., Lim, D., Villar, S., Jegelka, S.: Structuring representation geometry with rotationally equivariant contrastive learning. arXiv preprint arXiv:2306.13924 (2023)
Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), vol. 2, pp. 1735–1742. IEEE (2006)
Han, J., Ding, J., Xue, N., Xia, G.S.: Redet: a rotation-equivariant detector for aerial object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2786–2795 (2021)
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000–16009 (2022)
Howard, A., Trulls, E., Yi, K.M., Mishkin, D., Dane, S., Jin, Y.: Image matching challenge 2022 (2022). https://kaggle.com/competitions/image-matching-challenge-2022
Huang, D., et al.: Adaptive assignment for geometry aware local feature matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5425–5434 (2023)
Jonsson, E., Felsberg, M.: Efficient computation of channel-coded feature maps through piecewise polynomials. Image Vis. Comput. 27(11), 1688–1694 (2009)
Koyama, M., Fukumizu, K., Hayashi, K., Miyato, T.: Neural Fourier transform: a general approach to equivariant representation learning. In: The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=eOCvA8iwXH
Lawrence, H., Harris, M.T.: Learning polynomial problems with \(sl(2, \mathbb{R})\) equivariance. In: The Twelfth International Conference on Learning Representations (2023)
Lee, J., Jeong, Y., Cho, M.: Self-supervised learning of image scale and orientation. In: 31st British Machine Vision Conference 2021, BMVC 2021, Virtual Event, UK. BMVA Press (2021). https://www.bmvc2021-virtualconference.com/programme/accepted-papers/
Lee, J., Kim, B., Cho, M.: Self-supervised equivariant learning for oriented keypoint detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4847–4857 (2022)
Lee, J., Kim, B., Kim, S., Cho, M.: Learning rotation-equivariant features for visual correspondence. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21887–21897 (2023)
Lenc, K., Vedaldi, A.: Understanding image representations by measuring their equivariance and equivalence. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Li, Z., Snavely, N.: Megadepth: learning single-view depth prediction from internet photos. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2041–2050 (2018)
Lindenberger, P., Sarlin, P.E., Pollefeys, M.: LightGlue: local feature matching at light speed. In: IEEE International Conference on Computer Vision (ICCV) (2023)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision (IJCV) 60, 91–110 (2004)
Luo, Z., et al.: Contextdesc: local descriptor augmentation with cross-modality context. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2527–2536 (2019)
MacDonald, L.E., Ramasinghe, S., Lucey, S.: Enabling equivariance for arbitrary Lie groups. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8183–8192 (2022)
Mao, R., Bai, C., An, Y., Zhu, F., Lu, C.: 3DG-STFM: 3D geometric guided student-teacher feature matching. In: Proceedings of European Conference on Computer Vision (ECCV) (2022)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004)
Matas, J., Obdrzalek, T., Chum, O.: Local affine frames for wide-baseline stereo. In: 2002 International Conference on Pattern Recognition, vol. 4, pp. 363–366. IEEE (2002)
Melnyk, P., Felsberg, M., Wadenbäck, M.: Steerable 3D spherical neurons. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. (eds.) Proceedings of the 39th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 162, pp. 15330–15339. PMLR (2022). https://proceedings.mlr.press/v162/melnyk22a.html
Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int. J. Comput. Vision (IJCV) 60, 63–86 (2004)
Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. (T-PAMI) 27(10), 1615–1630 (2005)
Mironenco, M., Forré, P.: Lie group decompositions for equivariant neural networks. In: The Twelfth International Conference on Learning Representations (2024). https://openreview.net/forum?id=p34fRKp8qA
Mishchuk, A., Mishkin, D., Radenovic, F., Matas, J.: Working hard to know your neighbor’s margins: local descriptor learning loss. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Mishkin, D., Matas, J., Perdoch, M., Lenc, K.: WxBS: wide baseline stereo generalizations. arXiv preprint arXiv:1504.06603 (2015)
Mishkin, D., Radenovic, F., Matas, J.: Repeatability is not enough: learning affine regions via discriminability. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 284–300 (2018)
Mishkin, D., Radenović, F., Matas, J.: Repeatability is not enough: learning affine regions via discriminability. In: European Conference on Computer Vision (ECCV), pp. 287–304 (2018)
Ni, J., et al.: Pats: patch area transportation with subdivision for local feature matching. In: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) (2023)
Obdržálek, Š, Matas, J.: Local affine frames for image retrieval. In: Lew, M.S., Sebe, N., Eakins, J.P. (eds.) CIVR 2002. LNCS, vol. 2383, pp. 318–327. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45479-9_34
Olver, P.J.: Classical invariant theory. No. 44 in London Mathematical Society Student Texts, Cambridge University Press (1999)
Olver, P.J., Qu, C., Yang, Y.: Feature matching and heat flow in centro-affine geometry. SIGMA. Symmetry Integrability Geom. Methods Appl. 16, 093 (2020). https://doi.org/10.3842/SIGMA.2020.093. https://www.emis.de/journals/SIGMA/2020/093/
Oquab, M., et al.: DINOv2: learning robust visual features without supervision. arXiv:2304.07193 (2023)
Park, J.Y., Biza, O., Zhao, L., van de Meent, J.W., Walters, R.: Learning symmetric embeddings for equivariant world models. arXiv preprint arXiv:2204.11371 (2022)
Potje, G., Cadar, F., Araujo, A., Martins, R., Nascimento, E.R.: Xfeat: accelerated features for lightweight image matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2682–2691 (2024)
Revaud, J., De Souza, C., Humenberger, M., Weinzaepfel, P.: R2D2: reliable and repeatable detector and descriptor. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 32 (2019)
Santellani, E., Sormann, C., Rossi, M., Kuhn, A., Fraundorfer, F.: S-trek: sequential translation and rotation equivariant keypoints for local feature extraction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9728–9737 (2023)
Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: learning feature matching with graph neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: Superglue: learning feature matching with graph neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4938–4947 (2020)
Shakerinava, M., Mondal, A.K., Ravanbakhsh, S.: Structuring representations using group invariants. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems, vol. 35, pp. 34162–34174. Curran Associates, Inc. (2022). https://proceedings.neurips.cc/paper_files/paper/2022/file/dcd297696d0bb304ba426b3c5a679c37-Paper-Conference.pdf
Shi, Y., Cai, J.X., Shavit, Y., Mu, T.J., Feng, W., Zhang, K.: Clustergnn: cluster-based coarse-to-fine graph neural network for efficient feature matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12517–12526 (2022)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015, Conference Track Proceedings (2015)
Stoken, A., Fisher, K.: Find my astronaut photo: automated localization and georectification of astronaut photography. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 6196–6205 (2023)
Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: LoFTR: detector-free local feature matching with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8922–8931 (2021)
Tian, Y., Barroso Laguna, A., Ng, T., Balntas, V., Mikolajczyk, K.: HyNet: learning local descriptor with hybrid similarity measure and triplet loss. In: Advances in Neural Information Processing Systems, vol. 33, pp. 7401–7412 (2020)
Tian, Y., Yu, X., Fan, B., Wu, F., Heijnen, H., Balntas, V.: Sosnet: second order similarity regularization for local descriptor learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11016–11025 (2019)
Truong, P., Danelljan, M., Gool, L.V., Timofte, R.: GOCor: bringing globally optimized correspondence volumes into your neural network. In: Advances in Neural Information Processing Systems, vol. 33 (2020)
Truong, P., Danelljan, M., Timofte, R.: GLU-Net: global-local universal network for dense flow and correspondences. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6258–6268 (2020)
Truong, P., Danelljan, M., Timofte, R., Van Gool, L.: PDC-Net+: enhanced probabilistic dense correspondence network. IEEE Trans. Pattern Anal. Mach. Intell. 45(8), 10247–10266 (2023)
Truong, P., Danelljan, M., Van Gool, L., Timofte, R.: Learning accurate dense correspondences and when to trust them. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5714–5724 (2021)
Tuznik, S.L., Olver, P.J., Tannenbaum, A.: Equi-affine differential invariants for invariant feature point detection. Eur. J. Appl. Math. 31(2), 277–296 (2020). https://doi.org/10.1017/S0956792519000020
Tyszkiewicz, M., Fua, P., Trulls, E.: Disk: learning local features with policy gradient. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 14254–14265 (2020)
Wang, Q., Zhang, J., Yang, K., Peng, K., Stiefelhagen, R.: MatchFormer: interleaving attention in transformers for feature matching. In: Asian Conference on Computer Vision (2022)
Weiler, M., Cesa, G.: General E(2)-equivariant steerable CNNs. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Yan, P., Tan, Y., Xiong, S., Tai, Y., Li, Y.: Learning soft estimator of keypoint scale and orientation with probabilistic covariant loss. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19406–19415 (2022)
Yi, K.M., Trulls, E., Lepetit, V., Fua, P.: LIFT: learned invariant feature transform. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 467–483. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_28
Yu, G., Morel, J.M.: ASIFT: an algorithm for fully affine invariant comparison. Image Process. On Line 1, 11–38 (2011)
Yu, J., Chang, J., He, J., Zhang, T., Yu, J., Feng, W.: ASTR: adaptive spot-guided transformer for consistent local feature matching. In: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) (2023)
Zhao, X., Wu, X., Chen, W., Chen, P.C.Y., Xu, Q., Li, Z.: Aliked: a lighter keypoint and descriptor extraction network via deformable transformation. IEEE Trans. Instrum. Meas. 72, 1–16 (2023)
Zhao, X., Wu, X., Miao, J., Chen, W., Chen, P.C., Li, Z.: Alike: accurate and lightweight keypoint detection and descriptor extraction. IEEE Trans. Multimedia 25, 3101–3112 (2022)
Zhou, J., et al.: Image BERT pre-training with online tokenizer. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ydopy-e6Dg
Zhu, S., Liu, X.: PMatch: paired masked image modeling for dense geometric matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Acknowledgements
This work was supported by the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation and by the strategic research environment ELLIIT, funded by the Swedish government. The computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at C3SE, partially funded by the Swedish Research Council through grant agreement no. 2022-06725, and by the Berzelius resource, provided by the Knut and Alice Wallenberg Foundation at the National Supercomputer Centre.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bökman, G., Edstedt, J., Felsberg, M., Kahl, F. (2025). Affine Steerers for Structured Keypoint Description. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15144. Springer, Cham. https://doi.org/10.1007/978-3-031-73016-0_26
DOI: https://doi.org/10.1007/978-3-031-73016-0_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73015-3
Online ISBN: 978-3-031-73016-0
eBook Packages: Computer Science (R0)