Enhancing Performance of Occlusion-Based Explanation Methods by a Hierarchical Search Method on Input Images

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)

Abstract

In this work, we address some drawbacks of back-propagation-based and perturbation-based visualization methods by proposing an explanation method called Fast Multi-resolution Occlusion (FMO). Unlike back-propagation-based methods, which cannot be applied to all types of Convolutional Neural Networks (CNNs), FMO highlights the important input features independently of the architecture. FMO also introduces a novel fast occlusion strategy, multi-resolution occlusion, which not only addresses the time-consumption issue of the traditional Occlusion Test method but also outperforms well-known perturbation-based methods. We assess the methods on the CNNs DenseNet121, InceptionV3, InceptionResNetV2, MobileNet, and ResNet50 using three datasets: ILSVRC2012, PASCAL VOC07, and COCO14.

Corresponding author

Correspondence to Hamed Behzadi-Khormouji.

Appendices

Appendix A: More Details of the Equations

The index j in Eq. (3) runs over the elements of the probability matrix \(P_{n_i*n_i}^{R_i}\). Z is the probability that the model assigns the original, unoccluded image I to the class index Y, and \(\hat{Z}\) is the probability for the same class index Y when the occluded version of I is passed through the model. Unlike the Occlusion Test method, which records only the output probability of the occluded image, we record the normalized change of probability. As a result, each cell \([h_i^j, w_i^j]\) of the probability matrix \(P_{n_i*n_i}^{R_i}\) holds the normalized change of probability for one region of the original image, and this value expresses the importance of that region.
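The per-cell bookkeeping described above can be sketched in a few lines of Python. Eq. (3) is not reproduced in this excerpt, so the normalization used here, (Z − Ẑ)/Z, is an assumption, as are the function name `probability_change_matrix`, the `predict` interface, the constant fill value used for occlusion, and the toy model. It is a minimal illustration for a single resolution, not the authors' implementation.

```python
import numpy as np

def probability_change_matrix(predict, image, class_index, n, fill_value=0.0):
    """Occlude each cell of an n x n grid and record the relative drop in
    the probability of `class_index`.

    `predict` is a hypothetical callable mapping an H x W x C array to a
    1-D vector of class probabilities.
    """
    height, width = image.shape[:2]
    z = float(predict(image)[class_index])  # Z: unoccluded probability
    rows = np.linspace(0, height, n + 1).astype(int)  # cell boundaries
    cols = np.linspace(0, width, n + 1).astype(int)
    p = np.zeros((n, n))
    for r in range(n):
        for c in range(n):
            occluded = image.copy()
            occluded[rows[r]:rows[r + 1], cols[c]:cols[c + 1]] = fill_value
            z_hat = float(predict(occluded)[class_index])  # Z-hat: occluded
            # Assumed normalization: change relative to the original probability.
            p[r, c] = (z - z_hat) / max(z, 1e-12)
    return p

def toy_predict(img):
    """Toy stand-in for a CNN: only the top-left 4 x 4 patch drives class 1."""
    score = float(img[:4, :4].mean())
    return np.array([1.0 - score, score])

image = np.ones((8, 8, 3))
P = probability_change_matrix(toy_predict, image, class_index=1, n=2)
print(P)
```

In this toy setting only the top-left cell changes the prediction, so it is the only cell with a non-zero entry. Recording the relative change rather than the raw occluded probability keeps the values comparable across images whose unoccluded probabilities differ.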

\(\gamma^{R_i}\) in Eq. (4) is the weight of the probability matrix at resolution \(R_i\). To view the heatmap of a single resolution \(R_i\), the weight of that resolution is set to 1 and the weights of all others are set to 0. Before the weighted sum is computed, every probability matrix \(P_{n_i*n_i}^{R_i}\) is resized to the shape of the original image.
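As a rough illustration of this resize-then-sum step, the sketch below combines two probability matrices of different resolutions into one heatmap. Eq. (4) is not reproduced in this excerpt, so the plain weighted sum, the nearest-neighbour resizing, and the helper names `resize_nearest` and `combine_resolutions` are all assumptions. The interpolation actually used by the authors is not stated here.

```python
import numpy as np

def resize_nearest(matrix, height, width):
    """Nearest-neighbour upsampling of a 2-D matrix to (height, width).
    The interpolation scheme is an assumption, chosen for simplicity."""
    rows = np.arange(height) * matrix.shape[0] // height
    cols = np.arange(width) * matrix.shape[1] // width
    return matrix[np.ix_(rows, cols)]

def combine_resolutions(matrices, weights, height, width):
    """Weighted sum of per-resolution matrices after resizing each one
    to the shape of the original image."""
    heatmap = np.zeros((height, width))
    for matrix, weight in zip(matrices, weights):
        heatmap += weight * resize_nearest(matrix, height, width)
    return heatmap

# Two made-up probability matrices: a coarse 2 x 2 and a finer 4 x 4 one.
coarse = np.array([[1.0, 0.0],
                   [0.0, 0.0]])
fine = np.zeros((4, 4))
fine[0, 1] = 1.0

# Weight 1 for one resolution and 0 for the other shows that resolution alone.
only_coarse = combine_resolutions([coarse, fine], [1.0, 0.0], 8, 8)
# Equal weights blend both resolutions into one heatmap.
blended = combine_resolutions([coarse, fine], [0.5, 0.5], 8, 8)
```

Setting the weights to `[1.0, 0.0]` reproduces the single-resolution view described above, while `[0.5, 0.5]` makes pixels that both resolutions mark as important stand out most.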

Appendix B: Details of Time Consumption

Table 1 shows the average time consumption of the FMO, RISE, LIME, Extremal Perturbation, and Meaningful Perturbation methods on the models DenseNet121, InceptionResNetV2, InceptionV3, MobileNet, and ResNet50. The proposed method, FMO, had the lowest time consumption on all models, whereas the Occlusion Test method had the highest. For example, FMO takes 1.90 s, 4.86 s, 2.71 s, 0.59 s, and 2.70 s on DenseNet121, InceptionResNetV2, InceptionV3, MobileNet, and ResNet50, respectively, which is far less than the Occlusion Test, RISE, LIME, Extremal Perturbation, and Meaningful Perturbation methods on all five models.

Table 1. Average time consumption of the FMO, RISE, LIME, Extremal Perturbation and Meaningful Perturbation methods

Appendix C: Details of Visual Accuracy

Table 2 shows the localization accuracy of the methods on DenseNet121 and ResNet50. FMO outperforms the other methods in both localization accuracy and time consumption on the two datasets, VOC07 and COCO14.

Table 2. Localization accuracy of each method on two hard datasets.

Figures 2 and 3 illustrate the visualization results on the two datasets, VOC07 and COCO14. According to these figures, the FMO and Meaningful Perturbation methods highlight the regions of interest more accurately than the other methods.

Fig. 2. The visualization output of the methods with the PASCAL VOC dataset on model DenseNet121.

Fig. 3. The visualization output of the methods with the COCO dataset on the ResNet50 model.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Behzadi-Khormouji, H., Rostami, H. (2021). Enhancing Performance of Occlusion-Based Explanation Methods by a Hierarchical Search Method on Input Images. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_9

  • DOI: https://doi.org/10.1007/978-3-030-93736-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93735-5

  • Online ISBN: 978-3-030-93736-2

  • eBook Packages: Computer Science (R0)
