Joint features-guided linear transformer and CNN for efficient image super-resolution | International Journal of Machine Learning and Cybernetics

Joint features-guided linear transformer and CNN for efficient image super-resolution

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Integrating convolutional neural networks (CNNs) and transformers has notably improved lightweight single-image super-resolution (SISR). However, existing methods cannot fully exploit multi-level contextual information, and transformer computation inherently incurs quadratic complexity. To address these issues, we propose a Joint features-Guided Linear Transformer and CNN Network (JGLTN) for efficient SISR, constructed by cascading modules composed of CNN layers and linear transformer layers. Specifically, in the CNN layer, our approach employs an inter-scale feature integration module (IFIM) to extract critical latent information across scales. Then, in the linear transformer layer, we design a joint feature-guided linear attention (JGLA). It jointly considers adjacent and extended regional features, dynamically assigning weights to convolutional kernels for contextual feature selection. This process gathers multi-level contextual information, which is used to guide the linear attention for effective information interaction. Moreover, we redesign the computation of feature similarity within self-attention, reducing its complexity to linear. Extensive experiments show that our method outperforms state-of-the-art models while balancing performance and computational cost.
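To make the linear-complexity claim concrete, the sketch below shows the standard kernelized linear-attention trick: mapping queries and keys through a positive feature map and exploiting matrix-multiplication associativity so the cost drops from O(N²·d) to O(N·d²). The ELU(x)+1 feature map and the function names here are illustrative assumptions for exposition, not the paper's actual JGLA similarity redesign.

```python
import numpy as np

def feature_map(x):
    # ELU(x) + 1: keeps features positive so attention weights stay
    # non-negative (a common kernel choice; the paper's redesigned
    # similarity may differ).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Compute phi(Q) @ (phi(K)^T V): O(N*d^2), vs O(N^2*d) for softmax.

    Associativity lets us form the (d, d) summary phi(K)^T V once,
    instead of the (N, N) score matrix.
    """
    Qf, Kf = feature_map(Q), feature_map(K)        # (N, d) each
    kv = Kf.T @ V                                  # (d, d) key-value summary
    z = Qf @ Kf.sum(axis=0, keepdims=True).T       # (N, 1) normalizer
    return (Qf @ kv) / (z + 1e-6)

rng = np.random.default_rng(0)
N, d = 1024, 32                                    # N tokens, d channels
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 32)
```

By associativity, this produces the same result as first materializing the full N×N score matrix phi(Q)phi(K)^T and row-normalizing it, which is how the quadratic cost is avoided for large token counts such as high-resolution feature maps.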


Data availability

Data will be made available on request.


Acknowledgements

This work was supported by the Natural Science Research Project of the Guizhou Provincial Department of Education, China (QianJiaoJi[2022]029, QianJiaoHeKY[2021]022).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yongjun Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, B., Zhang, Y., Long, W. et al. Joint features-guided linear transformer and CNN for efficient image super-resolution. Int. J. Mach. Learn. & Cyber. 15, 5765–5780 (2024). https://doi.org/10.1007/s13042-024-02277-2



  • DOI: https://doi.org/10.1007/s13042-024-02277-2
