Online continual learning with saliency-guided experience replay using tiny episodic memory

  • Original Paper
  • Published in: Machine Vision and Applications (2023)

Abstract

Artificial learning systems aspire to mimic human intelligence by continually learning from a stream of tasks without forgetting past knowledge. One way to enable such learning is to store past experiences as input examples in an episodic memory and replay them when learning new tasks. However, the performance of such methods suffers as the memory size becomes smaller. In this paper, we propose a new approach for experience replay in which we select past experiences by looking at saliency maps, which provide visual explanations for the model’s decisions. Guided by these saliency maps, we pack the memory with only those parts, or patches, of the input images that are important for the model’s prediction. While learning a new task, we replay these memory patches with appropriate zero-padding to remind the model of its past decisions. We evaluate our algorithm on the CIFAR-100, miniImageNet, and CUB datasets and report better performance than state-of-the-art approaches. We perform a detailed study to show the effectiveness of zero-padded patch replay compared to other candidate approaches. Moreover, with qualitative and quantitative analyses, we show that our method captures richer summaries of past experiences without any memory increase and hence performs well with a small episodic memory.
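To make the replay idea concrete, below is a minimal PyTorch-style sketch (not the authors’ released code) of how stored memory patches could be zero-padded back to their recorded locations and trained on jointly with the current-task batch. The function names, the memory tuple layout, and the training loop are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def replay_batch(memory, image_shape):
    """Hypothetical helper: rebuild full-resolution images from stored patches
    by zero-padding each patch back to its recorded location.

    `memory` is assumed to hold (task_id, label, patch, (row, col)) tuples;
    these names are illustrative, not taken from the paper's implementation.
    """
    images, labels = [], []
    for _, label, patch, (r, c) in memory:
        canvas = torch.zeros(image_shape)            # (C, H, W) of zeros
        _, ph, pw = patch.shape
        canvas[:, r:r + ph, c:c + pw] = patch        # paste patch at its original spot
        images.append(canvas)
        labels.append(label)
    return torch.stack(images), torch.tensor(labels)

def train_step(model, optimizer, batch, memory, image_shape):
    """One online update: current-task batch plus zero-padded replay patches."""
    x, y = batch
    if memory:
        x_mem, y_mem = replay_batch(memory, image_shape)
        x = torch.cat([x, x_mem])
        y = torch.cat([y, y_mem])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the patches are padded rather than resized, the memory footprint per stored example stays small while the spatial location of the salient evidence is preserved.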

References

  1. Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)

  2. McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 104–169 (1989)

  3. Ratcliff, R.: Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychol. Rev. 97(2), 285–308 (1990)

  4. Ring, M.B.: Child: a first step towards continual learning (1998)

  5. Thrun, S., Mitchell, T.M.: Lifelong robot learning. Robotics Auton. Syst. 15, 25–46 (1995)

  6. Chaudhry, A., et al.: Continual learning with tiny episodic memories. arXiv:1902.10486 (2019)

  7. Chaudhry, A., Gordo, A., Dokania, P.K., Torr, P.H., Lopez-Paz, D.: Using hindsight to anchor past knowledge in continual learning (2021)

  8. Riemer, M., et al.: Learning to learn without forgetting by maximizing transfer and minimizing interference (2019)

  9. Buzzega, P., Boschini, M., Porrello, A., Abati, D., Calderara, S.: Dark Experience for General Continual Learning: A Strong, Simple Baseline, vol. 33, pp. 15920–15930. Curran Associates, Inc. (2020)

  10. Mai, Z., Li, R., Jeong, J., Quispe, R., Kim, H., Sanner, S.: Online continual learning in image classification: An empirical survey. Neurocomputing 469, 28–51 (2022)

  11. Knoblauch, J., Husain, H., Diethe, T.: Optimal continual learning has perfect memory and is np-hard (2020)

  12. Prabhu, A., Torr, P., Dokania, P.: Gdumb: a simple approach that questions our progress in continual learning (2020)

  13. Verwimp, E., De Lange, M., Tuytelaars, T.: Rehearsal revealed: the limits and merits of revisiting samples in continual learning, pp. 9385–9394 (2021)

  14. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps (2014)

  15. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization, pp. 2921–2929 (2016)

  16. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)

  17. Selvaraju, R.R., et al.: Grad-CAM: visual explanations from deep networks via gradient-based localization (2017)

  18. Zhang, J., et al.: Top-down neural attention by excitation backprop, 126 (2018)

  19. De Lange, M., Tuytelaars, T.: Continual prototype evolution: learning online from non-stationary data streams, pp. 8250–8259 (2021)

  20. Shim, D., et al.: Online class-incremental continual learning with adversarial Shapley value, 35, 9630–9638 (2021)

  21. Saha, G., Roy, K.: Saliency guided experience packing for replay in continual learning, pp. 5273–5283 (2023)

  22. Delange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 1–1 (2021)

  23. Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114, 3521–3526 (2017)

  24. Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: Proceedings of Machine Learning Research, vol. 70, pp. 3987–3995. PMLR (2017)

  25. Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: learning what (not) to forget, pp. 139–154 (2018)

  26. Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40, 2935–2947 (2018)

  27. Dhar, P., Singh, R.V., Peng, K.-C., Wu, Z., Chellappa, R.: Learning without memorizing, pp. 5133–5141 (2019)

  28. Nguyen, C.V., Li, Y., Bui, T.D., Turner, R.E.: Variational continual learning (2018)

  29. Rusu, A.A. et al.: Progressive neural networks. arXiv:1606.04671 (2016)

  30. Yoon, J., Yang, E., Lee, J., Hwang, S.J.: Lifelong learning with dynamically expandable networks (2018)

  31. Mallya, A., Lazebnik, S.: Packnet: adding multiple tasks to a single network by iterative pruning. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7765–7773 (2018)

  32. Serrà, J., Surís, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: Proceedings of Machine Learning Research, vol. 80, pp. 4548–4557. PMLR (2018)

  33. Saha, G., Garg, I., Ankit, A., Roy, K.: SPACE: structured compression and sharing of representational space for continual learning. IEEE Access 1–1 (2021)

  34. Robins, A.V.: Catastrophic forgetting, rehearsal and pseudorehearsal. Connect. Sci. 7, 123–146 (1995)

  35. Rebuffi, S.-A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5533–5542 (2017)

  36. Shin, H., Lee, J.K., Kim, J., Kim, J.: Continual learning with deep generative replay, vol. 30 (2017)

  37. Farajtabar, M., Azizan, N., Mott, A., Li, A.: Orthogonal gradient descent for continual learning. In: Chiappa, S., Calandra, R. (eds.) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 108, pp. 3762–3773. PMLR (2020)

  38. Saha, G., Garg, I., Roy, K.: Gradient projection memory for continual learning (2021)

  39. Saha, G., Roy, K.: Continual learning with scaled gradient projection. arXiv:2302.01386 (2023). https://ojs.aaai.org/index.php/AAAI/article/view/26157

  40. Aljundi, R., Lin, M., Goujaud, B., Bengio, Y.: Gradient based sample selection for online continual learning. In: Wallach, H., et al. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019)

  41. Aljundi, R., et al.: Online continual learning with maximal interfered retrieval. In: Wallach, H., et al. (eds.) Advances in Neural Information Processing Systems, vol. 32 (2019)

  42. Mai, Z., Li, R., Kim, H., Sanner, S.: Supervised contrastive replay: revisiting the nearest class mean classifier in online class-incremental continual learning, pp. 3589–3599 (2021)

  43. Sun, S., Calandriello, D., Hu, H., Li, A., Titsias, M.: Information-theoretic online memory selection for continual learning (2022). https://openreview.net/forum?id=IpctgL7khPp

  44. Zhang, Y., et al.: A simple but strong baseline for online continual learning: repeated augmented rehearsal. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022)

  45. Lopez-Paz, D., Ranzato, M.A.: Gradient episodic memory for continual learning, vol. 30 (2017)

  46. Chaudhry, A., Ranzato, M., Rohrbach, M., Elhoseiny, M.: Efficient lifelong learning with A-GEM (2019)

  47. Guo, Y., Liu, M., Yang, T., Rosing, T.: Improved schemes for episodic memory-based lifelong learning. In: Advances in Neural Information Processing Systems, vol. 33 (2020)

  48. Ebrahimi, S., et al.: Remembering for the right reasons: explanations reduce catastrophic forgetting (2021). https://openreview.net/forum?id=tHgJoMfy6nI

  49. Chrysakis, A., Moens, M.-F.: Online continual learning from imbalanced data. In: Proceedings of Machine Learning Research, vol. 119, pp. 1952–1961. PMLR (2020)

  50. Zhao, R., Ouyang, W., Li, H., Wang, X.: Saliency detection by multi-context deep learning, pp. 1265–1274 (2015)

  51. Wang, L., Lu, H., Ruan, X., Yang, M.-H.: Deep networks for saliency detection via local estimation and global search, pp. 3183–3192 (2015)

  52. Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks, pp. 839–847. IEEE (2018)

  53. Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017)

  54. Srinivas, S., Fleuret, F.: Full-gradient representation for neural network visualization (2019)

  55. Rao, S., Böhle, M., Schiele, B.: Towards better understanding attribution methods, pp. 10223–10232 (2022)

  56. Li, L., et al.: Scouter: slot attention-based classifier for explainable image recognition, pp. 1046–1055 (2021)

  57. Hayes, T.L., Cahill, N.D., Kanan, C.: Memory efficient experience replay for streaming learning. IEEE (2019)

  58. Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. Rep. (2009)

  59. Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning, vol. 29 (2016)

  60. Welinder, P., et al.: Caltech-UCSD Birds 200. Tech. Rep. CNS-TR-2010-001, California Institute of Technology (2010)

  61. Hsu, Y.-C., Liu, Y.-C., Ramasamy, A., Kira, Z.: Re-evaluating continual learning scenarios: a categorization and case for strong baselines. arXiv:1810.12488 (2018)

  62. Yun, S., et al.: Cutmix: regularization strategy to train strong classifiers with localizable features, pp. 6023–6032 (2019)

  63. Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T., Wayne, G.: Experience replay for continual learning. In: Wallach, H., et al. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019)

  64. Saha, G., Wang, C., Raghunathan, A., Roy, K.: A cross-layer approach to cognitive computing, pp. 1327–1330 (2022)

  65. Mahendran, A., Vedaldi, A.: Visualizing deep convolutional neural networks using natural pre-images. Int. J. Comput. Vis. 120, 233–255 (2016)

Acknowledgements

This work was supported in part by the National Science Foundation, the Vannevar Bush Faculty Fellowship, the Army Research Office, MURI, and by the Center for Brain-Inspired Computing (C-BRIC), one of six centers in JUMP, an SRC program sponsored by DARPA.

Author information

Corresponding author

Correspondence to Gobinda Saha.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Algorithm

Pseudocode for the episodic memory (\({\mathcal {M}}_E\)) update algorithm in EPR is given in Algorithm 2. For each example \(I\) in \({\mathcal {M}}_T\), we generate the corresponding saliency map \(I_{sm}\) using the chosen saliency method, \(\texttt {XAI}\) (Line 5). Then, we identify the square region (of size \(W_p\times W_p\)) in that map with the maximum average intensity and extract the corresponding image patch \(I_p\) (Lines 6–7). We then zero-pad the image patch and record the model prediction on it (Lines 8–11). We also record the true image label and the patch coordinates (Lines 12–14). Given the memory budget (\(|{\mathcal {M}}_E|\)) and the EPF, we then perform the memory selection procedure (Line 16), which prioritizes memory patches whose zero-padded versions are correctly predicted (see Sect. 4 for details). Each selected image patch is then added to \({\mathcal {M}}_E\) together with its task-ID, class label, and localizable coordinates in the original image (Lines 17–19).
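As a companion to Algorithm 2, the following is a minimal PyTorch-style sketch of this memory update. It assumes a classifier `model`, a saliency routine `xai` returning an \(H\times W\) map (e.g., the Grad-CAM procedure of Appendix B), and a task buffer of (image, label) pairs; the per-class balancing implied by the EPF is omitted for brevity, and all names are illustrative rather than taken from the authors’ implementation.

```python
import torch
import torch.nn.functional as F

def update_episodic_memory(model, xai, task_buffer, budget, patch_size, task_id):
    """Sketch of the EPR-style memory update described above (names are assumptions)."""
    candidates = []
    for image, label in task_buffer:                       # image: (C, H, W)
        sal = xai(model, image, label)                     # saliency map, (H, W)
        # Average saliency over every patch_size x patch_size window and
        # locate the window with the highest mean intensity.
        avg = F.avg_pool2d(sal[None, None], patch_size, stride=1)[0, 0]
        idx = torch.argmax(avg)
        r, c = divmod(idx.item(), avg.shape[1])            # top-left corner of best window
        patch = image[:, r:r + patch_size, c:c + patch_size]

        # Zero-pad the patch back to full image resolution at its original
        # location and record the model's prediction on the padded image.
        padded = torch.zeros_like(image)
        padded[:, r:r + patch_size, c:c + patch_size] = patch
        with torch.no_grad():
            pred = model(padded[None]).argmax(dim=1).item()
        candidates.append((patch, label, (r, c), pred == label))

    # Memory selection: prefer patches whose zero-padded versions are still
    # classified correctly, then fill the remaining budget with the rest.
    candidates.sort(key=lambda x: x[3], reverse=True)
    return [(task_id, lbl, patch, coords)
            for patch, lbl, coords, _ in candidates[:budget]]
```

Only the patch, its coordinates, the label, and the task-ID are stored, so each memory slot holds a \(W_p\times W_p\) crop rather than a full image.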

Appendix B: Grad-CAM

Gradient-weighted Class Activation Mapping (Grad-CAM) [17] is a saliency method that uses gradients to determine the impact of specific feature map activations on a given prediction. Since later layers in a convolutional neural network capture high-level semantics [65], taking gradients of a model output with respect to the feature map activations from one such layer identifies which high-level semantics are important for the model’s prediction. In our analysis, we select this layer and refer to it as the target layer [48].

Consider a target layer with M feature maps, where each feature map \(A^m \in {\mathbb {R}}^{u\times v}\) has width u and height v. Also consider that, for a given image \(I \in {\mathbb {R}}^{W\times H\times C}\) belonging to class c, the pre-softmax score of the image classifier is \(y_c\). To obtain the class-discriminative saliency map, Grad-CAM first takes the derivative of \(y_c\) with respect to each feature map \(A^m\). These gradients are then global-average-pooled over u and v to obtain the importance weight \(\alpha _m^c\) for each feature map:

$$\begin{aligned} \alpha _m^c = \frac{1}{uv} \sum _{i=1}^u\sum _{j=1}^v \frac{\partial y_c}{\partial A_{ij}^m}, \end{aligned}$$
(B1)

where \(A^m_{ij}\) denotes location (i, j) in the feature map \(A^m\). Next, these weights are used to compute a linear combination of the feature map activations, which is then passed through a ReLU to obtain the localization map:

$$\begin{aligned} L^c_{{\textit{Grad-CAM}}} = \text {ReLU}\left( \sum _{m=1}^M \alpha _m^cA^m\right) \end{aligned}$$
(B2)

This map has the same size (\(u\times v\)) as \(A^m\). Finally, the saliency map \(I_{sm} \in {\mathbb {R}}^{W\times H}\) is generated by upsampling \(L^c_{{\textit{Grad-CAM}}}\) to the input image resolution using bilinear interpolation:

$$\begin{aligned} I_{sm} = \texttt {Upsample}~(L^c_{{\textit{Grad-CAM}}}) \end{aligned}$$
(B3)
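For reference, Eqs. (B1)–(B3) can be realized with a short hook-based PyTorch sketch such as the one below. Here `model` and `target_layer` are assumed to be a classifier producing pre-softmax scores and one of its later convolutional layers; the function name and arguments are illustrative.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Minimal Grad-CAM sketch following Eqs. (B1)-(B3)."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations['A'] = output                  # feature maps A^m, shape (1, M, u, v)

    def bwd_hook(_, __, grad_output):
        gradients['dA'] = grad_output[0]           # dy_c / dA^m, shape (1, M, u, v)

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        scores = model(image[None])                # pre-softmax scores y
        model.zero_grad()
        scores[0, class_idx].backward()            # gradient of y_c
    finally:
        h1.remove()
        h2.remove()

    # Eq. (B1): global-average-pool the gradients to get the weights alpha_m^c.
    alpha = gradients['dA'].mean(dim=(2, 3), keepdim=True)                 # (1, M, 1, 1)
    # Eq. (B2): weighted combination of the feature maps followed by ReLU.
    cam = F.relu((alpha * activations['A']).sum(dim=1, keepdim=True))      # (1, 1, u, v)
    # Eq. (B3): upsample to the input resolution with bilinear interpolation.
    sal = F.interpolate(cam, size=image.shape[-2:], mode='bilinear',
                        align_corners=False)
    return sal[0, 0]                               # (H, W) saliency map I_sm
```

The resulting map can then serve as the \(\texttt {XAI}\) routine in the memory update of Appendix A.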

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saha, G., Roy, K. Online continual learning with saliency-guided experience replay using tiny episodic memory. Machine Vision and Applications 34, 65 (2023). https://doi.org/10.1007/s00138-023-01420-3
