Abstract
Machine visual intelligence has advanced rapidly in recent years. Large-scale, high-quality image and video datasets have greatly empowered learning-based machine vision models, especially deep-learning models. In practice, however, images and videos are usually compressed before being analyzed when transmission bandwidth or storage is limited, causing a noticeable performance loss in vision models. In this work, we broadly investigate how image and video coding affects machine vision performance. Based on this investigation, we propose Just Recognizable Distortion (JRD), the maximum compression-induced distortion beyond which the performance of a machine vision model drops to an unacceptable level. We build a large-scale JRD-annotated dataset containing over 340,000 images for various machine vision tasks and study the factors that lead to different JRDs. Furthermore, we establish an ensemble-learning-based framework, consisting of multiple binary classifiers, to predict JRD for diverse vision tasks under few-reference and no-reference conditions, which improves prediction accuracy. Experiments demonstrate that the proposed JRD-guided image and video coding significantly improves both compression efficiency and machine vision performance: applying the predicted JRD achieves remarkably higher task accuracy while saving a large number of bits.
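The core idea in the abstract can be illustrated with a minimal sketch: for a fixed vision model, the JRD of an image is the most severe compression level at which the model's output is still acceptable (e.g., the predicted label stays correct). The code below is a hedged toy illustration of that search, not the paper's method; all names (`find_jrd`, the stand-in `compress` and `predict` functions) are hypothetical, and a real setting would use an actual codec and a trained model.

```python
# Toy illustration of the JRD concept: scan compression levels from mild to
# severe and return the last level at which recognition still succeeds.
# All functions here are illustrative stand-ins, not the paper's pipeline.

def find_jrd(image, label, compress, predict, levels):
    """Return the last level in `levels` (ordered mild -> severe) at which
    `predict` still yields `label`, or None if even the mildest level
    breaks recognition."""
    jrd = None
    for level in levels:
        if predict(compress(image, level)) == label:
            jrd = level
        else:
            break  # recognition fails; the previous level was the JRD
    return jrd

# Stand-ins: "compression" removes detail; the "model" recognizes the
# image only while enough detail remains (threshold is arbitrary).
compress = lambda img, lvl: max(img - lvl, 0)
predict = lambda img: "cat" if img >= 40 else "unknown"

print(find_jrd(100, "cat", compress, predict, levels=[10, 30, 50, 70]))  # -> 50
```

In the paper's actual framework, this per-image search is replaced by prediction: an ensemble of binary classifiers estimates, for each candidate compression level, whether recognition would still succeed, so the JRD can be obtained without exhaustively encoding and evaluating every level.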












Notes
That is, for some images, lower image quality can lead to more accurate category predictions from the machine classifier. Nevertheless, these images are still ignored as explained in Sect. 4, and they do not affect the conclusions of this section or the paper.
That is, these person objects cannot be detected at any of the selected compression levels.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (62072008, 62025101), the PKU-Baidu Fund (2019BD003), and the High-performance Computing Platform of Peking University, all of which are gratefully acknowledged.
Additional information
Communicated by Dong Xu.
Cite this article
Zhang, Q., Wang, S., Zhang, X. et al. Just Recognizable Distortion for Machine Vision Oriented Image and Video Coding. Int J Comput Vis 129, 2889–2906 (2021). https://doi.org/10.1007/s11263-021-01505-4