Abstract
This paper explores learned image compression based on traditional and learned discrete wavelet transform (DWT) architectures and learned entropy models for coding DWT subband coefficients. A learned DWT is obtained through the lifting scheme with learned nonlinear predict and update filters. Several learned entropy models with varying computational complexities are explored to exploit inter- and intra-subband dependencies of the DWT coefficients, akin to the traditional EZW, SPIHT, and EBCOT algorithms. Experimental results show that combining the explored learned entropy models with traditional wavelet filters, such as the CDF 9/7 filters, yields compression performance that far exceeds that of JPEG2000. Combining the learned entropy models with the learned DWT improves compression performance further. The computations in the learned DWT and in all entropy models except one are easily parallelized, and thus the systems provide practical encoding and decoding times on GPUs, unlike other DWT-based learned compression systems in the literature.
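To make the lifting-based construction concrete, the sketch below shows one learned lifting step in PyTorch: the signal is split into even and odd samples, a small CNN predicts the odd samples from the even ones to form the high-pass (detail) band, and a second CNN updates the even samples with the detail band to form the low-pass (approximation) band. The layer widths, kernel sizes, and the restriction to a single horizontal pass are illustrative assumptions, not the exact architecture used in the paper.

```python
import torch
import torch.nn as nn


class LearnedLiftingStep(nn.Module):
    """One lifting step with learned nonlinear predict/update filters (illustrative sketch)."""

    def __init__(self, ch=1, hidden=16):
        super().__init__()
        # Small placeholder CNNs standing in for the learned predict and update filters.
        self.predict = nn.Sequential(
            nn.Conv2d(ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, ch, 3, padding=1))
        self.update = nn.Sequential(
            nn.Conv2d(ch, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, ch, 3, padding=1))

    def forward(self, x):
        # Lazy wavelet split along the width dimension (even/odd columns).
        even, odd = x[..., 0::2], x[..., 1::2]
        detail = odd - self.predict(even)    # high-pass band (prediction residual)
        approx = even + self.update(detail)  # low-pass band (updated even samples)
        return approx, detail

    def inverse(self, approx, detail):
        # Reverse the two lifting steps; the structure is invertible by construction.
        even = approx - self.update(detail)
        odd = detail + self.predict(even)
        x = torch.zeros(*even.shape[:-1], even.shape[-1] * 2,
                        dtype=even.dtype, device=even.device)
        x[..., 0::2], x[..., 1::2] = even, odd
        return x


# Usage: one horizontal pass and its inverse (exact up to floating-point rounding).
step = LearnedLiftingStep()
x = torch.randn(1, 1, 64, 64)
lowpass, highpass = step(x)
assert torch.allclose(step.inverse(lowpass, highpass), x, atol=1e-5)
```

Because the inverse simply retraces the predict and update steps, reconstruction holds regardless of how nonlinear the learned filters are, which is the property that lets them stand in for the linear CDF 9/7 lifting filters.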
Notes
Encoding/decoding times for JPEG2000 and the two systems with IISCEM are obtained on a CPU (due to their sequential encoding/decoding requirement), while all others are obtained on a GPU.
References
Jiao, L., Zhao, J.: A survey on the new generation of deep learning in image processing. IEEE Access 7, 172231–172263 (2019)
Steinmetz, R.: Data compression in multimedia computing-standards and systems. Multimed. Syst. 1(5), 187–204 (1994)
Pennebaker, W.B., Mitchell, J.L.: JPEG: Still image data compression standard. Springer (1992)
Rabbani, M., Joshi, R.: An overview of the JPEG 2000 still image compression standard. Signal Process. Image Commun. 17(1), 3–48 (2002)
Christopoulos, C., Skodras, A., Ebrahimi, T.: The JPEG2000 still image coding system: an overview. IEEE Trans. Consum. Electron. 46(4), 1103–1127 (2000). https://doi.org/10.1109/30.920468
Lainema, J., Hannuksela, M.M., Vadakital, V.K., Aksu, E.B.: HEVC still image coding and high efficiency image file format. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 71–75 (2016). https://doi.org/10.1109/ICIP.2016.7532321
Concolato, C. (Netflix): AV1 Image File Format (AVIF). Last accessed 26 February 2023 (2023). http://www.aomediacodec.github.io
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
Goyal, V.K.: Theoretical foundations of transform coding. IEEE Signal Process. Mag. 18(5), 9–21 (2001)
Ahmed, N., Natarajan, T., Rao, K.R.: Discrete cosine transform. IEEE Trans. Comput. 100(1), 90–93 (1974)
Han, J., Saxena, A., Melkote, V., Rose, K.: Jointly optimized spatial prediction and block transform for video and image coding. IEEE Trans. Image Process. 21(4), 1874–1884 (2011)
Kamisli, F.: Block-based spatial prediction and transforms based on 2D Markov processes for image and video compression. IEEE Trans. Image Process. 24(4), 1247–1260 (2015)
Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimized image compression. arXiv preprint arXiv:1611.01704 (2016)
Ballé, J., Minnen, D., Singh, S., Hwang, S.J., Johnston, N.: Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436 (2018)
Hilton, M.L., Jawerth, B.D., Sengupta, A.: Compressing still and moving images with wavelets. Multimed. Syst. 2, 218–227 (1994)
Geetha, V., Anbumani, V., Murugesan, G., Gomathi, S.: Hybrid optimal algorithm-based 2D discrete wavelet transform for image compression using fractional KCA. Multimed. Syst. 26, 687–702 (2020)
Buccigrossi, R.W., Simoncelli, E.P.: Image compression via joint statistical characterization in the wavelet domain. IEEE Trans. Image Process. 8(12), 1688–1701 (1999)
Liu, Z., Karam, L.J.: Quantifying the intra and inter subband correlations in the zerotree-based wavelet image coders. In: Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1730–1734 (2002). https://doi.org/10.1109/ACSSC.2002.1197071
Shapiro, J.M.: Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. Signal Process. 41(12), 3445–3462 (1993)
Said, A., Pearlman, W.A.: A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. Circuits Syst. Video Technol. 6(3), 243–250 (1996)
Taubman, D.: High performance scalable image compression with EBCOT. IEEE Trans. Image Process. 9(7), 1158–1170 (2000)
Ma, H., Liu, D., Yan, N., Li, H., Wu, F.: End-to-end optimized versatile image compression with wavelet-like transform. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1247 (2020)
Minnen, D., Ballé, J., Toderici, G.D.: Joint autoregressive and hierarchical priors for learned image compression. Adv. Neural Inform. Process. Syst. (2018). https://doi.org/10.48550/arXiv.1809.02736
Sweldens, W.: The lifting scheme: A construction of second generation wavelets. SIAM J. Math. Anal. 29(2), 511–546 (1998). https://doi.org/10.1137/S0036141095289051
Daubechies, I., Sweldens, W.: Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl. 4(3), 247–269 (1998)
Cohen, A., Daubechies, I., Feauveau, J.-C.: Biorthogonal bases of compactly supported wavelets. Commun. Pure Appl. Math. 45(5), 485–560 (1992)
Dragotti, P.L., Vetterli, M.: Wavelet footprints: theory, algorithms, and applications. IEEE Trans. Signal Process. 51(5), 1306–1323 (2003)
Dragotti, P.L., Vetterli, M.: Footprints and edgeprints for image denoising and compression. In: Proceedings 2001 International Conference on Image Processing (Cat. No. 01CH37205), vol. 2, pp. 237–240 (2001). IEEE
Dragotti, P.L., Vetterli, M.: Deconvolution with wavelet footprints for ill-posed inverse problems. IEEE Int. Conf. Acoust. Speech Signal Process. 2, 1257 (2002)
Zhao, X., Huang, P., Shu, X.: Wavelet-attention cnn for image classification. Multimed. Syst. 28(3), 915–924 (2022)
Brahimi, T., Khelifi, F., Laouir, F., Kacha, A.: A new, enhanced ezw image codec with subband classification. Multimed. Syst. 28(1), 1–19 (2022)
Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Learned image compression with discretized gaussian mixture likelihoods and attention modules. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7939–7948 (2020)
Yılmaz, M.A., Keleş, O., Güven, H., Tekalp, A.M., Malik, J., Kıranyaz, S.: Self-organized variational autoencoders (self-VAE) for learned image compression. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 3732–3736 (2021). IEEE
Lu, M., Guo, P., Shi, H., Cao, C., Ma, Z.: Transformer-based image compression. arXiv preprint arXiv:2111.06707 (2021)
Minnen, D., Singh, S.: Channel-wise autoregressive entropy models for learned image compression. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 3339–3343 (2020). IEEE
He, D., Yang, Z., Peng, W., Ma, R., Qin, H., Wang, Y.: ELIC: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5718–5727 (2022)
Kim, J.-H., Heo, B., Lee, J.-S.: Joint global and local hierarchical priors for learned image compression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5992–6001 (2022)
Ma, H., Liu, D., Xiong, R., Wu, F.: iWave: CNN-based wavelet-like transform for image compression. IEEE Trans. Multimed. 22(7), 1667–1679 (2019)
Kodak, E.: Kodak Lossless True Color Image Suite (PhotoCD PCD0992). Last accessed 2 February 2023 (2023). http://r0k.us/graphics/kodak
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Ballé, J.: Efficient nonlinear transforms for lossy image compression. In: 2018 Picture Coding Symposium (PCS), pp. 248–252 (2018). IEEE
Marcellin, M.W., Lepley, M.A., Bilgin, A., Flohr, T.J., Chinen, T.T., Kasner, J.H.: An overview of quantization in JPEG 2000. Signal Process. Image Commun. 17(1), 73–84 (2002)
Ballé, J., Laparra, V., Simoncelli, E.P.: Density modeling of images using a generalized normalization transformation. arXiv preprint arXiv:1511.06281 (2015)
Bégaint, J., Racapé, F., Feltman, S., Pushparaja, A.: CompressAI: a PyTorch library and evaluation platform for end-to-end compression research. arXiv preprint arXiv:2011.03029 (2020)
Chilinski, P., Silva, R.: Neural likelihoods via cumulative distribution functions. In: Conference on Uncertainty in Artificial Intelligence, pp. 420–429 (2020). PMLR
Van den Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., et al.: Conditional image generation with PixelCNN decoders. Adv. Neural Inform. Process. Syst. 29 (2016)
Salimans, T., Karpathy, A., Chen, X., Kingma, D.P.: PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications. arXiv preprint arXiv:1701.05517 (2017)
Xue, T., Chen, B., Wu, J., Wei, D., Freeman, W.T.: Video enhancement with task-oriented flow. arXiv preprint arXiv:1711.09078 (2017)
Sahin, U.B., Kamisli, F.: Learned-DWT-and-Tree-based-Entropy-Models. Last accessed 26 February 2023 (2023). https://github.com/uberkk/ImageCompressionLearnedLiftingandLearnedTreeBasedModels
Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimization of nonlinear transform codes for perceptual quality. In: 2016 Picture Coding Symposium (PCS), pp. 1–5 (2016). IEEE
Pakdaman, F., Gabbouj, M.: Comprehensive complexity assessment of emerging learned image compression on cpu and gpu. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5 (2023). IEEE
Sovrasov, V.: ptflops: a FLOPs counting tool for neural networks in PyTorch framework. https://github.com/sovrasov/flops-counter.pytorch
Funding
No funding was received for conducting this study.
Ethics declarations
Code availability
The code to reproduce the results in this paper is available from the authors on GitHub [49].
Additional information
Communicated by Q. Shen.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
The following parameters are used in the experimental results. In Fig. 7a, \(ch=32\) is used for processing the LL subband and \(ch=96\) is used for processing the LH, HL, and HH subbands together. In Fig. 7b, \(ch=32\) is used for each subband. In Fig. 8, \(ch=243\) is used for jointly processing the LH, HL, and HH subbands, and the output gives the mean and scale for the corresponding three channels. In Fig. 9, \(ch=243\) is used on the right-hand side for jointly processing the LH, HL, and HH subbands, and \(ch/3\) is used on the left-hand side for processing each of the LH, HL, and HH subbands separately (243 channels in total). In Fig. 10, \(ch=162\) is used for processing each of the LH, HL, and HH subbands separately. In Fig. 11, \(ch=81\) is used for processing each of the LL, LH, HL, and HH subbands. In Fig. 13, \(ch=32\) is used. Our code is available on GitHub [49].
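For quick reference, the channel widths listed above can be collected into a single configuration mapping. The sketch below merely restates those numbers; the key names are illustrative placeholders, not identifiers from the released repository [49].

```python
# Channel widths (ch) per figure, restated from the paragraph above.
# Key names are illustrative placeholders, not identifiers from the repository [49].
ENTROPY_MODEL_CHANNELS = {
    "fig7a": {"LL": 32, "LH_HL_HH_joint": 96},
    "fig7b": {"per_subband": 32},
    "fig8":  {"LH_HL_HH_joint": 243},                 # outputs mean and scale per channel
    "fig9":  {"LH_HL_HH_joint": 243, "per_subband": 243 // 3},
    "fig10": {"per_highband": 162},                   # LH, HL, HH processed separately
    "fig11": {"per_subband": 81},                     # LL, LH, HL, HH processed separately
    "fig13": {"default": 32},
}
```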
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sahin, U.B., Kamisli, F. Image compression with learned lifting-based DWT and learned tree-based entropy models. Multimedia Systems 29, 3369–3384 (2023). https://doi.org/10.1007/s00530-023-01192-w