In this work, we address some drawbacks of back-propagation-based and perturbation-based visualization methods by proposing an explanation method called Fast Multi-resolution Occlusion (FMO). Unlike back-propagation-based methods, which cannot be applied to all types of Convolutional Neural Networks (CNNs), FMO highlights the important input features independently of the architecture. FMO also introduces a novel fast occlusion strategy, called multi-resolution occlusion, which not only efficiently addresses the high time consumption of the traditional Occlusion Test method but also outperforms well-known perturbation-based methods. We evaluate the methods on the CNNs DenseNet121, InceptionV3, InceptionResNetV2, MobileNet and ResNet50 using three datasets: ILSVRC2012, PASCAL VOC07 and COCO14.
References
Behzadi-khormouji, H., et al.: Deep learning, reusable and problem-based architectures for detection of consolidation on chest X-ray images. Comput. Meth. Program. Biomed. 185, 105162 (2020). ISSN 0169-2607. https://doi.org/10.1016/j.cmpb.2019.105162
Gupta, A., Anpalagan, A., Guan, L., Khwaja, A.S.: Deep learning for object detection and scene perception in self-driving cars: survey, challenges, and open issues. Array 10, 100057 (2021). ISSN 2590-0056. https://doi.org/10.1016/j.array.2021.100057
Xiao, D., Yang, X., Li, J., Islam, M.: Attention deep neural network for lane marking detection. Knowl. Based Syst. 194, 105584 (2020). https://doi.org/10.1016/j.knosys.2020.105584
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. arXiv:1512.04150 (2015)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: 2nd International Conference on Learning Representations, ICLR 2014 (Workshop Track Proceedings) (2014)
José Oramas, M., Wang, K., Tuytelaars, T.: Visual explanation by interpretation: improving visual feedback capabilities of deep neural networks. In: 7th International Conference on Learning Representations, ICLR 2019 (2019)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. arXiv:1610.02391 (2017)
Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. arXiv:1704.02685 (2017)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. arXiv:1311.2901 (2014)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” Explaining the predictions of any classifier. arXiv:1602.04938 (2016)
Petsiuk, V., Das, A., Saenko, K.: RISE: randomized input sampling for explanation of black-box models. arXiv:1806.07421 (2018)
Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. arXiv:1704.03296 (2018)
Fong, R., Patrick, M., Vedaldi, A.: Understanding deep networks via extremal perturbations and smooth masks. In: 2019 Proceedings of the IEEE International Conference on Computer Vision, pp. 2950–2958 (2019). https://doi.org/10.1109/ICCV.2019.00304
Behzadi-Khormouji, H., Rostami, H.: Fast multi-resolution occlusion: a method for explaining and understanding deep neural networks. Appl. Intell. 51(4), 2431–2455 (2020). https://doi.org/10.1007/s10489-020-01946-3
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2–9 (2009). https://doi.org/10.1109/CVPR.2009.5206848
Everingham, M., Ali Eslami, S.M., Van Gool, L., Williams, C.K.I., Winn, J.M., Zisserman, A.: The Pascal visual object classes challenge - a retrospective. Int. J. Comput. Vis. 111, 98–136 (2014)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Shakeel, M.S., Lam, K.M.: Deep-feature encoding-based discriminative model for age-invariant face recognition. Pattern Recogn. 93, 442–457 (2019). https://doi.org/10.1016/j.patcog.2019.04.028
Szegedy, C., Vanhoucke, V., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. arXiv:1512.00567 (2015)
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv:1602.07261 (2016)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018). https://doi.org/10.1109/CVPR.2018.00474
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). https://doi.org/10.1109/CVPR.2016.90
Zhang, J., Bargal, S.A., Lin, Z., Brandt, J., Shen, X., Sclaroff, S.: Top-down neural attention by excitation backprop. Int. J. Comput. Vis. 126(10), 1084–1102 (2017). https://doi.org/10.1007/s11263-017-1059-x
Appendix A: More Details of the Equations
In Eq. (3), the index j runs over the elements of the probability matrix \(P_{n_i \times n_i}^{R_i}\). Z is the probability that the original, unoccluded image I belongs to the class index Y, and \(\hat{Z}\) is the probability of the same class index Y when the occluded version of I is passed through the model. Unlike the Occlusion Test method, which records only the output probability of the occluded image, we therefore record the normalized change of probability. As a result, each cell \([h_i^j, w_i^j]\) in the probability matrix \(P_{n_i \times n_i}^{R_i}\) holds the normalized change of probability for one region of the original image, and its value indicates the importance of that region.
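To make the construction of one probability matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the normalized change of probability is \((Z - \hat{Z})/Z\), that each resolution \(R_i\) is an \(n_i \times n_i\) grid of equal-sized cells, and that occluded pixels are set to a constant `fill` value; `probability_matrix`, `predict` and `toy_predict` are hypothetical names introduced for illustration.

```python
import numpy as np

def probability_matrix(image, predict, class_idx, n, fill=0.0):
    """Occlusion map at one resolution R_i using an n x n grid.

    Assumption: the normalized change of probability is (Z - Z_hat) / Z.
    `predict` is a hypothetical model function returning class probabilities.
    """
    h, w = image.shape[:2]
    z = predict(image)[class_idx]                 # Z: class-Y probability, unoccluded
    rows = np.linspace(0, h, n + 1).astype(int)   # cell boundaries along the height
    cols = np.linspace(0, w, n + 1).astype(int)   # cell boundaries along the width
    p = np.zeros((n, n))
    for r in range(n):
        for c in range(n):
            occluded = image.copy()
            occluded[rows[r]:rows[r + 1], cols[c]:cols[c + 1]] = fill
            z_hat = predict(occluded)[class_idx]  # Z_hat: class-Y probability, occluded
            p[r, c] = (z - z_hat) / z             # normalized change for this region
    return p


# Hypothetical toy model: class 0 depends only on the top-left 2 x 2 pixels.
def toy_predict(x):
    m = x[:2, :2].mean()
    return np.array([0.5 + 0.5 * m, 0.5 - 0.5 * m])

p2 = probability_matrix(np.ones((4, 4)), toy_predict, class_idx=0, n=2)
print(p2)  # only the top-left cell changes the class-0 probability
```

Recording a relative change rather than the raw occluded probability makes cell values comparable across images whose unoccluded confidence Z differs.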
In Eq. (4), \(\gamma^{R_i}\) is the weight of the probability matrix at resolution \(R_i\). Before the weighted sum is computed, every probability matrix \(P_{n_i \times n_i}^{R_i}\) is resized to the shape of the original image. To view the heatmap of a single resolution \(R_i\), the weight of \(R_i\) is set to 1 and the weights of all other resolutions are set to 0.
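The weighted combination can be sketched as follows, assuming nearest-neighbour resizing and image sizes divisible by every \(n_i\); the paper may use a different interpolation, and `upsample` and `combine_resolutions` are hypothetical helper names. The example also shows how setting one weight to 1 and the rest to 0 isolates a single resolution.

```python
import numpy as np

def upsample(p, h, w):
    """Nearest-neighbour resize of an n x n matrix to h x w.

    Assumption: n divides both h and w (keeps the sketch dependency-free).
    """
    n = p.shape[0]
    assert h % n == 0 and w % n == 0, "sketch assumes image size divisible by n"
    return np.repeat(np.repeat(p, h // n, axis=0), w // n, axis=1)

def combine_resolutions(prob_matrices, weights, h, w):
    """Weighted sum of resized probability matrices, one weight gamma per resolution."""
    heatmap = np.zeros((h, w))
    for p, gamma in zip(prob_matrices, weights):
        heatmap += gamma * upsample(p, h, w)
    return heatmap


# Hypothetical probability matrices at two resolutions for an 8 x 8 image.
p2 = np.array([[1.0, 0.0], [0.0, 0.0]])
p4 = np.arange(16).reshape(4, 4) / 16.0

only_coarse = combine_resolutions([p2, p4], [1.0, 0.0], 8, 8)  # heatmap of R_1 alone
mixed = combine_resolutions([p2, p4], [0.5, 0.5], 8, 8)        # equal-weight mix
```

Because every matrix is brought to the image size before summing, coarse resolutions contribute broad regions and fine resolutions sharpen the boundaries within them.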
Appendix B: Details of Time Consumption
Table 1 reports the average time consumption of the FMO, Occlusion Test, RISE, LIME, Extremal Perturbation and Meaningful Perturbation methods on the models DenseNet121, InceptionResNetV2, InceptionV3, MobileNet and ResNet50. FMO has the lowest time consumption on every model, whereas the Occlusion Test method has the highest. For example, FMO takes 1.90 s, 4.86 s, 2.71 s, 0.59 s and 2.70 s on DenseNet121, InceptionResNetV2, InceptionV3, MobileNet and ResNet50, respectively, which is far less than the Occlusion Test, RISE, LIME, Extremal Perturbation and Meaningful Perturbation methods on all five models.
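The running time of occlusion-based methods is dominated by the number of forward passes through the model. The sketch below counts passes for a square sliding-window occlusion test and for occluding every cell of a grid at several resolutions. The image size, patch size, stride and resolutions are hypothetical values, not the experimental settings, and the count does not model FMO's hierarchical search, so it only illustrates why coarse grids are cheap.

```python
def sliding_window_passes(size, patch, stride):
    """Forward passes for a square sliding-window occlusion test, plus one unoccluded pass."""
    positions = (size - patch) // stride + 1   # patch positions along one axis
    return positions * positions + 1

def grid_passes(resolutions):
    """Forward passes if every cell of an n x n grid is occluded at each resolution,
    plus one unoccluded pass."""
    return sum(n * n for n in resolutions) + 1

# Hypothetical settings, chosen only for illustration.
print(sliding_window_passes(224, 16, 4))  # 2810
print(grid_passes([2, 4, 8, 16]))         # 341
```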
Appendix C: Details of Visual Accuracy
Table 2 reports the localization accuracy of the methods on DenseNet121 and ResNet50. FMO outperforms the other methods in terms of both localization accuracy and time consumption on the two datasets VOC07 and COCO14.
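The localization protocol is not specified in this section. One common choice on VOC07 and COCO14 is a pointing-game style test in the spirit of Zhang et al., where a heatmap scores a hit if its maximum falls on a ground-truth object. The sketch below is a simplified, per-image version under that assumption: boxes are `(x0, y0, x1, y1)` pixel coordinates (inclusive), `tolerance` expands each box by a fixed number of pixels, and `pointing_hit` and `pointing_accuracy` are hypothetical names.

```python
import numpy as np

def pointing_hit(heatmap, boxes, tolerance=0):
    """True if the heatmap maximum lies inside any box expanded by `tolerance` pixels."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    for x0, y0, x1, y1 in boxes:
        if x0 - tolerance <= x <= x1 + tolerance and y0 - tolerance <= y <= y1 + tolerance:
            return True
    return False

def pointing_accuracy(heatmaps, boxes_per_image, tolerance=0):
    """Fraction of images whose heatmap maximum hits a ground-truth box."""
    hits = sum(pointing_hit(h, b, tolerance) for h, b in zip(heatmaps, boxes_per_image))
    return hits / len(heatmaps)


# Hypothetical heatmaps: one peak inside the box, one outside it.
h1 = np.zeros((10, 10)); h1[2, 3] = 1.0
h2 = np.zeros((10, 10)); h2[8, 8] = 1.0
boxes = [[(0, 0, 4, 4)], [(0, 0, 4, 4)]]

strict = pointing_accuracy([h1, h2], boxes)                # one hit out of two
loose = pointing_accuracy([h1, h2], boxes, tolerance=4)    # tolerance reaches (8, 8)
```

The original protocol aggregates hits per class rather than per image; the per-image average here keeps the sketch short.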
Figures 2 and 3 illustrate the visualization results on the two datasets VOC07 and COCO14. In these figures, FMO and Meaningful Perturbation highlight the regions of interest more accurately than the other methods.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Behzadi-Khormouji, H., Rostami, H. (2021). Enhancing Performance of Occlusion-Based Explanation Methods by a Hierarchical Search Method on Input Images. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_9
DOI: https://doi.org/10.1007/978-3-030-93736-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93735-5
Online ISBN: 978-3-030-93736-2
eBook Packages: Computer Science, Computer Science (R0)