Smoothed dilated convolutions for improved dense prediction | Data Mining and Knowledge Discovery Skip to main content
Log in

Smoothed dilated convolutions for improved dense prediction

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Dilated convolutions, also known as atrous convolutions, have been widely explored in deep convolutional neural networks (DCNNs) for various dense prediction tasks. However, dilated convolutions suffer from the gridding artifacts, which hampers the performance. In this work, we propose two simple yet effective degridding methods by studying a decomposition of dilated convolutions. Unlike existing models, which explore solutions by focusing on a block of cascaded dilated convolutional layers, our methods address the gridding artifacts by smoothing the dilated convolution itself. In addition, we point out that the two degridding approaches are intrinsically related and define separable and shared (SS) operations, which generalize the proposed methods. We further explore SS operations in view of operations on graphs and propose the SS output layer, which is able to smooth the entire DCNNs by only replacing the output layer. We evaluate our degridding methods and the SS output layer thoroughly, and visualize the smoothing effect through effective receptive field analysis. Results show that our methods degridding yield consistent improvements on the performance of dense prediction tasks, while adding negligible amounts of extra training parameters. And the SS output layer improves the performance by 3.3% and contains only 9% training parameters of the original output layer.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Notes

  1. https://github.com/divelab/dilated/.

References

  • Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. In: Proceedings of the 12th \(\{\)USENIX\(\}\) symposium on operating systems design and implementation. pp 265–283

  • Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2017a) Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848

  • Chen LC, Papandreou G, Schroff F, Adam H (2017b) Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587

  • Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 1251–1258

  • Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 3213–3223

  • Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Advances in neural information processing systems. pp 379–387

  • Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255

  • Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338

    Article  Google Scholar 

  • Gao H, Wang Z, Ji S (2018) Large-scale learnable graph convolutional networks. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1416–1424

  • Gao H, Yuan H, Wang Z, Ji S (2019) Pixel transposed convolutional networks. IEEE Trans Pattern Anal Mach Intell 42(5):1218–1227

    Google Scholar 

  • Giusti A, Cireşan DC, Masci J, Gambardella LM, Schmidhuber J (2013) Fast image scanning with deep max-pooling convolutional neural networks. In: Proceedings of the IEEE international conference on image processing. IEEE, pp 4034–4038

  • Hamaguchi R, Fujita A, Nemoto K, Imaizumi T, Hikosaka S (2018) Effective use of dilated convolutions for segmenting small object instances in remote sensing imagery. In: Proceedings of the IEEE winter conference on applications of computer vision. IEEE, pp 1442–1450

  • Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. In: Advances in neural information processing systems. pp 1024–1034

  • Hariharan B, Arbeláez P, Bourdev L, Maji S, Malik J (2011) Semantic contours from inverse detectors. In: Proceedings of the IEEE international conference on computer vision. IEEE, pp 991–998

  • He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 770–778

  • Holschneider M, Kronland-Martinet R, Morlet J, Tchamitchian P (1990) A real-time algorithm for signal analysis with the help of the wavelet transform. In: Wavelets. Springer, pp 286–297

  • Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S et al (2017) Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 7310–7311

  • Kalchbrenner N, Espeholt L, Simonyan K, Oord Avd, Graves A, Kavukcuoglu K (2016) Neural machine translation in linear time. arXiv preprint arXiv:1610.10099

  • Kalchbrenner N, van den Oord A, Simonyan K, Danihelka I, Vinyals O, Graves A, Kavukcuoglu K (2017) Video pixel networks. In: Proceedings of the international conference on machine learning. pp 1771–1779

  • Li H, Zhao R, Wang X (2014) Highly efficient forward and backward propagation of convolutional neural networks for pixelwise classification. arXiv preprint arXiv:1412.4526

  • Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: Proceedings of the European conference on computer vision. Springer, pp 740–755

  • Liu W, Rabinovich A, Berg AC (2015) Parsenet: Looking wider to see better. arXiv preprint arXiv:1506.04579

  • Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 3431–3440

  • Luo W, Li Y, Urtasun R, Zemel R (2016) Understanding the effective receptive field in deep convolutional neural networks. In: Advances in neural information processing systems. pp 4898–4906

  • Mamalet F, Garcia C (2012) Simplifying convnets for fast learning. In: Proceedings of the international conference on artificial neural networks. Springer, pp 58–65

  • Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 5115–5124

  • Oord Avd, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499

  • Papandreou G, Kokkinos I, Savalle PA (2015) Modeling local and global deformations in deep learning: epitomic convolution, multiple instance learning, and sliding window detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 390–399

  • Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, Lecun Y (2014) Overfeat: Integrated recognition, localization and detection using convolutional networks. In: Proceedings of the international conference on learning representations

  • Shensa MJ (1992) The discrete wavelet transform: wedding the a trous and Mallat algorithms. IEEE Trans Signal Process 40(10):2464–2482

    Article  Google Scholar 

  • Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2018) Graph attention networks. In: Proceedings of the international conference on learning representations

  • Wang Z, Ji S (2018) Smoothed dilated convolutions for improved dense prediction. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 2486–2495

  • Wang P, Chen P, Yuan Y, Liu D, Huang Z, Hou X, Cottrell G (2018) Understanding convolution for semantic segmentation. In: Proceedings of the IEEE winter conference on applications of computer vision. IEEE, pp 1451–1460

  • Yu F, Koltun V (2016) Multi-scale context aggregation by dilated convolutions. In: Proceedings of the international conference on learning representations

  • Yu F, Koltun V, Funkhouser T (2017) Dilated residual networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 472–480

  • Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, pp 2881–2890

  • Ziegler T, Fritsche M, Kuhn L, Donhauser K (2019) Efficient smoothing of dilated convolutions for image segmentation. arXiv preprint arXiv:1903.07992

Download references

Acknowledgements

This work was supported in part by National Science Foundation grant IIS-1633359 and Defense Advanced Research Projects Agency grant N66001-17-2-4031.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shuiwang Ji.

Additional information

Responsible editor: Pierre Baldi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, Z., Ji, S. Smoothed dilated convolutions for improved dense prediction. Data Min Knowl Disc 35, 1470–1496 (2021). https://doi.org/10.1007/s10618-021-00765-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-021-00765-5

Keywords

Navigation