Online continual learning with saliency-guided experience replay using tiny episodic memory

  • Original Paper
  • Published in: Machine Vision and Applications (2023)

Abstract

Artificial learning systems aspire to mimic human intelligence by continually learning from a stream of tasks without forgetting past knowledge. One way to enable such learning is to store past experiences as input examples in an episodic memory and replay them when learning new tasks. However, the performance of such methods suffers as the memory size becomes smaller. In this paper, we propose a new approach for experience replay in which we select past experiences by looking at saliency maps, which provide visual explanations for the model’s decisions. Guided by these saliency maps, we pack the memory with only those parts, or patches, of the input images that are important for the model’s prediction. While learning a new task, we replay these memory patches with appropriate zero-padding to remind the model of its past decisions. We evaluate our algorithm on the CIFAR-100, miniImageNet, and CUB datasets and report better performance than state-of-the-art approaches. We perform a detailed study to show the effectiveness of zero-padded patch replay compared to other candidate approaches. Moreover, with qualitative and quantitative analyses, we show that our method captures richer summaries of past experiences without any memory increase and hence performs well with a small episodic memory.
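To make the replay idea concrete, below is a minimal PyTorch-style sketch (not the authors’ released code) of how stored memory patches could be zero-padded back to their recorded locations and trained on jointly with the current-task batch. The function names, the memory tuple layout, and the training loop are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def replay_batch(memory, image_shape):
    """Hypothetical helper: rebuild full-resolution images from stored patches
    by zero-padding each patch back to its recorded location.

    `memory` is assumed to hold (task_id, label, patch, (row, col)) tuples;
    these names are illustrative, not taken from the paper's implementation.
    """
    images, labels = [], []
    for _, label, patch, (r, c) in memory:
        canvas = torch.zeros(image_shape)            # (C, H, W) of zeros
        _, ph, pw = patch.shape
        canvas[:, r:r + ph, c:c + pw] = patch        # paste patch at its original spot
        images.append(canvas)
        labels.append(label)
    return torch.stack(images), torch.tensor(labels)

def train_step(model, optimizer, batch, memory, image_shape):
    """One online update: current-task batch plus zero-padded replay patches."""
    x, y = batch
    if memory:
        x_mem, y_mem = replay_batch(memory, image_shape)
        x = torch.cat([x, x_mem])
        y = torch.cat([y, y_mem])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the patches are padded rather than resized, the memory footprint per stored example stays small while the spatial location of the salient evidence is preserved.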

References

  1. Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)

  2. McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 104–169 (1989)

  3. Ratcliff, R.: Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychol. Rev. 97(2), 285–308 (1990)

  4. Ring, M.B.: Child: a first step towards continual learning (1998)

  5. Thrun, S., Mitchell, T.M.: Lifelong robot learning. Robotics Auton. Syst. 15, 25–46 (1995)

  6. Chaudhry, A., et al.: Continual learning with tiny episodic memories. arXiv:1902.10486 (2019)

  7. Chaudhry, A., Gordo, A., Dokania, P.K., Torr, P.H., Lopez-Paz, D.: Using hindsight to anchor past knowledge in continual learning (2021)

  8. Riemer, M., et al.: Learning to learn without forgetting by maximizing transfer and minimizing interference (2019)

  9. Buzzega, P., Boschini, M., Porrello, A., Abati, D., Calderara, S.: Dark Experience for General Continual Learning: A Strong, Simple Baseline, vol. 33, pp. 15920–15930. Curran Associates, Inc. (2020)

  10. Mai, Z., Li, R., Jeong, J., Quispe, R., Kim, H., Sanner, S.: Online continual learning in image classification: An empirical survey. Neurocomputing 469, 28–51 (2022)

  11. Knoblauch, J., Husain, H., Diethe, T.: Optimal continual learning has perfect memory and is np-hard (2020)

  12. Prabhu, A., Torr, P., Dokania, P.: Gdumb: a simple approach that questions our progress in continual learning (2020)

  13. Verwimp, E., De Lange, M., Tuytelaars, T.: Rehearsal revealed: the limits and merits of revisiting samples in continual learning, pp. 9385–9394 (2021)

  14. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps (2014)

  15. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization, pp. 2921–2929 (2016)

  16. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)

  17. Selvaraju, R.R., et al.: Grad-CAM: visual explanations from deep networks via gradient-based localization (2017)

  18. Zhang, J., et al.: Top-down neural attention by excitation backprop, 126 (2018)

  19. De Lange, M., Tuytelaars, T.: Continual prototype evolution: learning online from non-stationary data streams, pp. 8250–8259 (2021)

  20. Shim, D., et al.: Online class-incremental continual learning with adversarial Shapley value, 35, 9630–9638 (2021)

  21. Saha, G., Roy, K.: Saliency guided experience packing for replay in continual learning, pp. 5273–5283 (2023)

  22. Delange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 1–1 (2021)

  23. Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114, 3521–3526 (2017)

  24. Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: Proceedings of Machine Learning Research, vol. 70, pp. 3987–3995. PMLR (2017)

  25. Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: learning what (not) to forget, pp. 139–154 (2018)

  26. Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40, 2935–2947 (2018)

  27. Dhar, P., Singh, R.V., Peng, K.-C., Wu, Z., Chellappa, R.: Learning without memorizing, pp. 5133–5141 (2019)

  28. Nguyen, C.V., Li, Y., Bui, T.D., Turner, R.E.: Variational continual learning (2018)

  29. Rusu, A.A. et al.: Progressive neural networks. arXiv:1606.04671 (2016)

  30. Yoon, J., Yang, E., Lee, J., Hwang, S.J.: Lifelong learning with dynamically expandable networks (2018)

  31. Mallya, A., Lazebnik, S.: Packnet: adding multiple tasks to a single network by iterative pruning. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7765–7773 (2018)

  32. Serrà, J., Surís, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: Proceedings of Machine Learning Research, vol. 80, pp. 4548–4557. PMLR (2018)

  33. Saha, G., Garg, I., Ankit, A., Roy, K.: SPACE: structured compression and sharing of representational space for continual learning. IEEE Access 1–1 (2021)

  34. Robins, A.V.: Catastrophic forgetting, rehearsal and pseudorehearsal. Connect. Sci. 7, 123–146 (1995)

  35. Rebuffi, S.-A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5533–5542 (2017)

  36. Shin, H., Lee, J.K., Kim, J., Kim, J.: Continual learning with deep generative replay, vol. 30 (2017)

  37. Farajtabar, M., Azizan, N., Mott, A., Li, A.: Orthogonal gradient descent for continual learning. In: Chiappa, S., Calandra, R. (eds.) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 108, pp. 3762–3773. PMLR (2020)

  38. Saha, G., Garg, I., Roy, K.: Gradient projection memory for continual learning (2021)

  39. Saha, G., Roy, K.: Continual learning with scaled gradient projection. arXiv:2302.01386 (2023). https://ojs.aaai.org/index.php/AAAI/article/view/26157

  40. Aljundi, R., Lin, M., Goujaud, B., Bengio, Y.: Gradient based sample selection for online continual learning. In: Wallach, H., et al. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019)

  41. Aljundi, R., et al.: Online continual learning with maximal interfered retrieval. In: Wallach, H., et al. (eds.) Advances in Neural Information Processing Systems, vol. 32 (2019)

  42. Mai, Z., Li, R., Kim, H., Sanner, S.: Supervised contrastive replay: revisiting the nearest class mean classifier in online class-incremental continual learning, pp. 3589–3599 (2021)

  43. Sun, S., Calandriello, D., Hu, H., Li, A., Titsias, M.: Information-theoretic online memory selection for continual learning (2022). https://openreview.net/forum?id=IpctgL7khPp

  44. Zhang, Y., et al.: A simple but strong baseline for online continual learning: repeated augmented rehearsal. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022)

  45. Lopez-Paz, D., Ranzato, M.A.: Gradient episodic memory for continual learning, vol. 30 (2017)

  46. Chaudhry, A., Ranzato, M., Rohrbach, M., Elhoseiny, M.: Efficient lifelong learning with A-GEM (2019)

  47. Guo, Y., Liu, M., Yang, T., Rosing, T.: Improved schemes for episodic memory-based lifelong learning. In: Advances in Neural Information Processing Systems, vol. 33 (2020)

  48. Ebrahimi, S., et al.: Remembering for the right reasons: explanations reduce catastrophic forgetting (2021). https://openreview.net/forum?id=tHgJoMfy6nI

  49. Chrysakis, A., Moens, M.-F.: Online continual learning from imbalanced data. In: Proceedings of Machine Learning Research, vol. 119, pp. 1952–1961. PMLR (2020)

  50. Zhao, R., Ouyang, W., Li, H., Wang, X.: Saliency detection by multi-context deep learning, pp. 1265–1274 (2015)

  51. Wang, L., Lu, H., Ruan, X., Yang, M.-H.: Deep networks for saliency detection via local estimation and global search, pp. 3183–3192 (2015)

  52. Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks, pp. 839–847. IEEE (2018)

  53. Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017)

  54. Srinivas, S., Fleuret, F.: Full-gradient representation for neural network visualization (2019)

  55. Rao, S., Böhle, M., Schiele, B.: Towards better understanding attribution methods, pp. 10223–10232 (2022)

  56. Li, L., et al.: Scouter: slot attention-based classifier for explainable image recognition, pp. 1046–1055 (2021)

  57. Hayes, T.L., Cahill, N.D., Kanan, C.: Memory efficient experience replay for streaming learning. IEEE (2019)

  58. Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. Rep. (2009)

  59. Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning, vol. 29 (2016)

  60. Welinder, P., et al.: Caltech-UCSD Birds 200. Tech. Rep. CNS-TR-2010-001, California Institute of Technology (2010)

  61. Hsu, Y.-C., Liu, Y.-C., Ramasamy, A., Kira, Z.: Re-evaluating continual learning scenarios: a categorization and case for strong baselines. arXiv:1810.12488 (2018)

  62. Yun, S., et al.: Cutmix: regularization strategy to train strong classifiers with localizable features, pp. 6023–6032 (2019)

  63. Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T., Wayne, G.: Experience replay for continual learning. In: Wallach, H., et al. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019)

  64. Saha, G., Wang, C., Raghunathan, A., Roy, K.: A cross-layer approach to cognitive computing, pp. 1327–1330 (2022)

  65. Mahendran, A., Vedaldi, A.: Visualizing deep convolutional neural networks using natural pre-images. Int. J. Comput. Vis. 120, 233–255 (2016)

Acknowledgements

This work was supported in part by the National Science Foundation, the Vannevar Bush Faculty Fellowship, the Army Research Office, MURI, and by the Center for Brain-Inspired Computing (C-BRIC), one of six centers in JUMP, an SRC program sponsored by DARPA.

Author information

Corresponding author

Correspondence to Gobinda Saha.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Algorithm

Pseudocode for the episodic memory (\({\mathcal {M}}_E\)) update algorithm in EPR is given in Algorithm 2. For each example \(I\) in \({\mathcal {M}}_T\), we generate the corresponding saliency map \(I_{sm}\) using the chosen saliency method, \(\texttt {XAI}\) (Line 5). Then, we identify the square region (of size \(W_p\times W_p\)) in that map with the maximum average intensity and extract the corresponding image patch \(I_p\) (Lines 6–7). We then zero-pad the image patch and record the model prediction on it (Lines 8–11). We also record the true image label and the patch coordinates (Lines 12–14). Given the memory budget (\(|{\mathcal {M}}_E|\)) and the EPF, we then perform the memory selection procedure (Line 16), which prioritizes memory patches whose zero-padded versions are correctly predicted (see Sect. 4 for details). Each selected image patch is then added to \({\mathcal {M}}_E\) together with its task-ID, class label, and localizable coordinates in the original image (Lines 17–19).
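As a companion to Algorithm 2, the following is a minimal PyTorch-style sketch of this memory update. It assumes a classifier `model`, a saliency routine `xai` returning an \(H\times W\) map (e.g., the Grad-CAM procedure of Appendix B), and a task buffer of (image, label) pairs; the per-class balancing implied by the EPF is omitted for brevity, and all names are illustrative rather than taken from the authors’ implementation.

```python
import torch
import torch.nn.functional as F

def update_episodic_memory(model, xai, task_buffer, budget, patch_size, task_id):
    """Sketch of the EPR-style memory update described above (names are assumptions)."""
    candidates = []
    for image, label in task_buffer:                       # image: (C, H, W)
        sal = xai(model, image, label)                     # saliency map, (H, W)
        # Average saliency over every patch_size x patch_size window and
        # locate the window with the highest mean intensity.
        avg = F.avg_pool2d(sal[None, None], patch_size, stride=1)[0, 0]
        idx = torch.argmax(avg)
        r, c = divmod(idx.item(), avg.shape[1])            # top-left corner of best window
        patch = image[:, r:r + patch_size, c:c + patch_size]

        # Zero-pad the patch back to full image resolution at its original
        # location and record the model's prediction on the padded image.
        padded = torch.zeros_like(image)
        padded[:, r:r + patch_size, c:c + patch_size] = patch
        with torch.no_grad():
            pred = model(padded[None]).argmax(dim=1).item()
        candidates.append((patch, label, (r, c), pred == label))

    # Memory selection: prefer patches whose zero-padded versions are still
    # classified correctly, then fill the remaining budget with the rest.
    candidates.sort(key=lambda x: x[3], reverse=True)
    return [(task_id, lbl, patch, coords)
            for patch, lbl, coords, _ in candidates[:budget]]
```

Only the patch, its coordinates, the label, and the task-ID are stored, so each memory slot holds a \(W_p\times W_p\) crop rather than a full image.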

Appendix B: Grad-CAM

Gradient-weighted Class Activation Mapping (Grad-CAM) [17] is a saliency method that uses gradients to determine the impact of specific feature map activations on a given prediction. Since later layers in a convolutional neural network capture high-level semantics [65], taking gradients of a model output with respect to the feature map activations from one such layer identifies which high-level semantics are important for the model’s prediction. In our analysis, we select this layer and refer to it as the target layer [48].

Consider a target layer with M feature maps, where each feature map \(A^m \in {\mathbb {R}}^{u\times v}\) has width u and height v. Also consider that, for a given image \(I \in {\mathbb {R}}^{W\times H\times C}\) belonging to class c, the pre-softmax score of the image classifier is \(y_c\). To obtain the class-discriminative saliency map, Grad-CAM first takes the derivative of \(y_c\) with respect to each feature map \(A^m\). These gradients are then global-average-pooled over u and v to obtain the importance weight \(\alpha _m^c\) for each feature map:

$$\begin{aligned} \alpha _m^c = \frac{1}{uv} \sum _{i=1}^u\sum _{j=1}^v \frac{\partial y_c}{\partial A_{ij}^m}, \end{aligned}$$
(B1)

where \(A^m_{ij}\) denotes location (i, j) in the feature map \(A^m\). Next, these weights are used to compute a linear combination of the feature map activations, which is then passed through a ReLU to obtain the localization map:

$$\begin{aligned} L^c_{{\textit{Grad-CAM}}} = \text {ReLU}\left( \sum _{m=1}^M \alpha _m^cA^m\right) \end{aligned}$$
(B2)

This map has the same size (\(u\times v\)) as \(A^m\). Finally, the saliency map \(I_{sm} \in {\mathbb {R}}^{W\times H}\) is generated by upsampling \(L^c_{{\textit{Grad-CAM}}}\) to the input image resolution using bilinear interpolation:

$$\begin{aligned} I_{sm} = \texttt {Upsample}~(L^c_{{\textit{Grad-CAM}}}) \end{aligned}$$
(B3)
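For reference, Eqs. (B1)–(B3) can be realized with a short hook-based PyTorch sketch such as the one below. Here `model` and `target_layer` are assumed to be a classifier producing pre-softmax scores and one of its later convolutional layers; the function name and arguments are illustrative.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Minimal Grad-CAM sketch following Eqs. (B1)-(B3)."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations['A'] = output                  # feature maps A^m, shape (1, M, u, v)

    def bwd_hook(_, __, grad_output):
        gradients['dA'] = grad_output[0]           # dy_c / dA^m, shape (1, M, u, v)

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        scores = model(image[None])                # pre-softmax scores y
        model.zero_grad()
        scores[0, class_idx].backward()            # gradient of y_c
    finally:
        h1.remove()
        h2.remove()

    # Eq. (B1): global-average-pool the gradients to get the weights alpha_m^c.
    alpha = gradients['dA'].mean(dim=(2, 3), keepdim=True)                 # (1, M, 1, 1)
    # Eq. (B2): weighted combination of the feature maps followed by ReLU.
    cam = F.relu((alpha * activations['A']).sum(dim=1, keepdim=True))      # (1, 1, u, v)
    # Eq. (B3): upsample to the input resolution with bilinear interpolation.
    sal = F.interpolate(cam, size=image.shape[-2:], mode='bilinear',
                        align_corners=False)
    return sal[0, 0]                               # (H, W) saliency map I_sm
```

The resulting map can then serve as the \(\texttt {XAI}\) routine in the memory update of Appendix A.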

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Saha, G., Roy, K. Online continual learning with saliency-guided experience replay using tiny episodic memory. Machine Vision and Applications 34, 65 (2023). https://doi.org/10.1007/s00138-023-01420-3
