Abstract
Artificial learning systems aspire to mimic human intelligence by continually learning from a stream of tasks without forgetting past knowledge. One way to enable such learning is to store past experiences, in the form of input examples, in an episodic memory and replay them when learning new tasks. However, the performance of such methods degrades as the memory becomes smaller. In this paper, we propose a new approach to experience replay in which past experiences are selected by inspecting saliency maps, which provide visual explanations for the model's decisions. Guided by these saliency maps, we pack the memory with only those parts, or patches, of the input images that are important for the model's predictions. While learning a new task, we replay these memory patches with appropriate zero-padding to remind the model of its past decisions. We evaluate our algorithm on the CIFAR-100, miniImageNet, and CUB datasets and report better performance than state-of-the-art approaches. We perform a detailed study showing the effectiveness of zero-padded patch replay compared to other candidate approaches. Moreover, with qualitative and quantitative analyses we show that our method captures richer summaries of past experiences without any increase in memory and hence performs well with a small episodic memory.





References
Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. Psychol. Learn. Motiv. 24, 104–169 (1989)
Ratcliff, R.: Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychol. Rev. 97(2), 285–308 (1990)
Ring, M.B.: CHILD: a first step towards continual learning (1998)
Thrun, S., Mitchell, T.M.: Lifelong robot learning. Robotics Auton. Syst. 15, 25–46 (1995)
Chaudhry, A., et al.: Continual learning with tiny episodic memories. arXiv:1902.10486 (2019)
Chaudhry, A., Gordo, A., Dokania, P.K., Torr, P.H., Lopez-Paz, D.: Using hindsight to anchor past knowledge in continual learning (2021)
Riemer, M., et al.: Learning to learn without forgetting by maximizing transfer and minimizing interference (2019)
Buzzega, P., Boschini, M., Porrello, A., Abati, D., Calderara, S.: Dark experience for general continual learning: a strong, simple baseline. In: Advances in Neural Information Processing Systems, vol. 33, pp. 15920–15930. Curran Associates, Inc. (2020)
Mai, Z., Li, R., Jeong, J., Quispe, R., Kim, H., Sanner, S.: Online continual learning in image classification: An empirical survey. Neurocomputing 469, 28–51 (2022)
Knoblauch, J., Husain, H., Diethe, T.: Optimal continual learning has perfect memory and is NP-hard (2020)
Prabhu, A., Torr, P., Dokania, P.: GDumb: a simple approach that questions our progress in continual learning (2020)
Verwimp, E., De Lange, M., Tuytelaars, T.: Rehearsal revealed: the limits and merits of revisiting samples in continual learning, pp. 9385–9394 (2021)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps (2014)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization, pp. 2921–2929 (2016)
Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
Selvaraju, R.R., et al.: Grad-CAM: visual explanations from deep networks via gradient-based localization (2017)
Zhang, J., et al.: Top-down neural attention by excitation backprop. Int. J. Comput. Vis. 126 (2018)
De Lange, M., Tuytelaars, T.: Continual prototype evolution: learning online from non-stationary data streams, pp. 8250–8259 (2021)
Shim, D., et al.: Online class-incremental continual learning with adversarial shapley value. Proc. AAAI Conf. Artif. Intell. 35, 9630–9638 (2021)
Saha, G., Roy, K.: Saliency guided experience packing for replay in continual learning, pp. 5273–5283 (2023)
De Lange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. (2021)
Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114, 3521–3526 (2017)
Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: Proceedings of Machine Learning Research, vol. 70, pp. 3987–3995. PMLR (2017)
Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: learning what (not) to forget, pp. 139–154 (2018)
Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40, 2935–2947 (2018)
Dhar, P., Singh, R.V., Peng, K.-C., Wu, Z., Chellappa, R.: Learning without memorizing, pp. 5133–5141 (2019)
Nguyen, C.V., Li, Y., Bui, T.D., Turner, R.E.: Variational continual learning (2018)
Rusu, A.A. et al.: Progressive neural networks. arXiv:1606.04671 (2016)
Yoon, J., Yang, E., Lee, J., Hwang, S.J.: Lifelong learning with dynamically expandable networks (2018)
Mallya, A., Lazebnik, S.: PackNet: adding multiple tasks to a single network by iterative pruning. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7765–7773 (2018)
Serrà, J., Surís, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: Proceedings of Machine Learning Research, vol. 80, pp. 4548–4557. PMLR (2018)
Saha, G., Garg, I., Ankit, A., Roy, K.: SPACE: structured compression and sharing of representational space for continual learning. IEEE Access (2021)
Robins, A.V.: Catastrophic forgetting, rehearsal and pseudorehearsal. Connect. Sci. 7, 123–146 (1995)
Rebuffi, S.-A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5533–5542 (2017)
Shin, H., Lee, J.K., Kim, J., Kim, J.: Continual learning with deep generative replay. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Farajtabar, M., Azizan, N., Mott, A., Li, A.: Orthogonal gradient descent for continual learning. In: Chiappa, S., Calandra, R. (eds.) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 108, pp. 3762–3773. PMLR (2020)
Saha, G., Garg, I., Roy, K.: Gradient projection memory for continual learning (2021)
Saha, G., Roy, K.: Continual learning with scaled gradient projection. arXiv:2302.01386 (2023). https://ojs.aaai.org/index.php/AAAI/article/view/26157
Aljundi, R., Lin, M., Goujaud, B., Bengio, Y.: Gradient based sample selection for online continual learning. In: Wallach, H., et al. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019)
Aljundi, R., et al.: Online continual learning with maximal interfered retrieval. In: Wallach, H., et al. (eds.) Advances in Neural Information Processing Systems, vol. 32 (2019)
Mai, Z., Li, R., Kim, H., Sanner, S.: Supervised contrastive replay: revisiting the nearest class mean classifier in online class-incremental continual learning, pp. 3589–3599 (2021)
Sun, S., Calandriello, D., Hu, H., Li, A., Titsias, M.: Information-theoretic online memory selection for continual learning (2022). https://openreview.net/forum?id=IpctgL7khPp
Zhang, Y., et al.: A simple but strong baseline for online continual learning: repeated augmented rehearsal. In: Oh, A.H., Agarwal, A., Belgrave, D., Cho, K. (eds.) Advances in Neural Information Processing Systems (2022)
Lopez-Paz, D., Ranzato, M.A.: Gradient episodic memory for continual learning. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Chaudhry, A., Ranzato, M., Rohrbach, M., Elhoseiny, M.: Efficient lifelong learning with A-GEM (2019)
Guo, Y., Liu, M., Yang, T., Rosing, T.: Improved schemes for episodic memory-based lifelong learning. In: Advances in Neural Information Processing Systems, vol. 33 (2020)
Ebrahimi, S., et al.: Remembering for the right reasons: explanations reduce catastrophic forgetting (2021). https://openreview.net/forum?id=tHgJoMfy6nI
Chrysakis, A., Moens, M.-F.: Online continual learning from imbalanced data. In: Proceedings of Machine Learning Research, vol. 119, pp. 1952–1961. PMLR (2020)
Zhao, R., Ouyang, W., Li, H., Wang, X.: Saliency detection by multi-context deep learning, pp. 1265–1274 (2015)
Wang, L., Lu, H., Ruan, X., Yang, M.-H.: Deep networks for saliency detection via local estimation and global search, pp. 3183–3192 (2015)
Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks, pp. 839–847. IEEE (2018)
Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017)
Srinivas, S., Fleuret, F.: Full-gradient representation for neural network visualization (2019)
Rao, S., Böhle, M., Schiele, B.: Towards better understanding attribution methods, pp. 10223–10232 (2022)
Li, L., et al.: Scouter: slot attention-based classifier for explainable image recognition, pp. 1046–1055 (2021)
Hayes, T.L., Cahill, N.D., Kanan, C.: Memory efficient experience replay for streaming learning. IEEE (2019)
Krizhevsky, A.: Learning multiple layers of features from tiny images. Tech. Rep. (2009)
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
Welinder, P., et al.: Caltech-UCSD Birds 200. Tech. Rep. CNS-TR-2010-001, California Institute of Technology (2010)
Hsu, Y.-C., Liu, Y.-C., Ramasamy, A., Kira, Z.: Re-evaluating continual learning scenarios: a categorization and case for strong baselines. arXiv:1810.12488 (2018)
Yun, S., et al.: Cutmix: regularization strategy to train strong classifiers with localizable features, pp. 6023–6032 (2019)
Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T., Wayne, G.: Experience replay for continual learning. In: Wallach, H., et al. (eds.) Advances in Neural Information Processing Systems, vol. 32. Curran Associates, Inc. (2019)
Saha, G., Wang, C., Raghunathan, A., Roy, K.: A cross-layer approach to cognitive computing, pp. 1327–1330 (2022)
Mahendran, A., Vedaldi, A.: Visualizing deep convolutional neural networks using natural pre-images. Int. J. Comput. Vis. 120, 233–255 (2016)
Acknowledgements
This work was supported in part by the National Science Foundation, Vannevar Bush Faculty Fellowship, Army Research Office, MURI, and by Center for Brain-Inspired Computing (C-BRIC), one of six centers in JUMP, a SRC program sponsored by DARPA.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendices
Appendix A: Algorithm
Pseudocode for the episodic memory (\({\mathcal {M}}_E\)) update algorithm in EPR is given in Algorithm 2. For each example \(I\) in \({\mathcal {M}}_T\), we generate the corresponding saliency map \(I_{sm}\) using the chosen saliency method, \(\texttt {XAI}\) (Line 5). Then, we identify the square region (of size \(W_p\times W_p\)) in that map with the maximum average intensity and extract the corresponding image patch \(I_p\) (Lines 6–7). We then zero-pad the image patch and record the model's prediction on it (Lines 8–11). We also record the true image label and the patch coordinates (Lines 12–14). Given the memory budget (\(|{\mathcal {M}}_E|\)) and EPF, we then perform the memory selection procedure (Line 16), in which memory patches that the model predicts correctly after zero-padding are selected with priority (see Sect. 4 for details). Each selected image patch is then added to \({\mathcal {M}}_E\) together with its task-ID, class label, and localizable coordinates in the original image (Lines 17–19).
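For concreteness, the following Python sketch mirrors these steps. It is an illustration of the update loop under stated assumptions, not the authors' implementation: the helpers \(\texttt {xai(model, image)}\) (returning a 2-D saliency map), \(\texttt {model.predict}\), and the \(\texttt {(image, label, task\_id)}\) memory format are hypothetical, and the per-class balancing via EPF is omitted.

```python
import numpy as np

def best_patch(sal, W_p):
    """Top-left corner of the W_p x W_p window with maximal mean saliency."""
    H, W = sal.shape
    best_score, best_rc = -np.inf, (0, 0)
    for r in range(H - W_p + 1):
        for c in range(W - W_p + 1):
            score = sal[r:r + W_p, c:c + W_p].mean()
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc

def epr_memory_update(model, task_memory, xai, W_p, budget):
    """Sketch of the episodic-memory update (cf. Algorithm 2); `xai` and
    `model.predict` are assumed placeholder helpers."""
    candidates = []
    for image, label, task_id in task_memory:        # image: H x W x C array
        sal = xai(model, image)                      # saliency map I_sm
        r, c = best_patch(sal, W_p)                  # max-avg-intensity region
        patch = image[r:r + W_p, c:c + W_p]          # image patch I_p
        padded = np.zeros_like(image)
        padded[r:r + W_p, c:c + W_p] = patch         # zero-padded patch
        correct = model.predict(padded) == label     # prediction after padding
        candidates.append((patch, label, task_id, (r, c), correct))
    # Select up to `budget` patches, prioritizing those the model still
    # classifies correctly after zero-padding.
    candidates.sort(key=lambda entry: entry[-1], reverse=True)
    return [entry[:4] for entry in candidates[:budget]]
```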
Appendix B: Grad-CAM
Gradient-weighted Class Activation Mapping (Grad-CAM) [17] is a saliency method that uses gradients to determine the impact of specific feature map activations on a given prediction. Since later layers in a convolutional neural network capture high-level semantics [65], taking gradients of the model output with respect to the feature map activations of one such layer identifies which high-level semantics are important for the model's prediction. In our analysis, we select this layer and refer to it as the target layer [48].
Suppose the target layer has M feature maps, where each feature map \(A^m \in {\mathbb {R}}^{u\times v}\) is of width u and height v. Also suppose that, for a given image \(I \in {\mathbb {R}}^{W\times H\times C}\) belonging to class c, the pre-softmax score of the image classifier is \(y_c\). To obtain the class-discriminative saliency map, Grad-CAM first takes the derivative of \(y_c\) with respect to each feature map \(A^m\). These gradients are then global-average-pooled over u and v to obtain an importance weight \(\alpha _m^c\) for each feature map:
$$\begin{aligned} \alpha _m^c = \frac{1}{uv}\sum _{i=1}^{u}\sum _{j=1}^{v} \frac{\partial y_c}{\partial A^m_{ij}}, \end{aligned}$$
where \(A^m_{ij}\) denotes location (i, j) in the feature map \(A^m\). Next, these weights are used to compute a linear combination of the feature map activations, which is followed by a ReLU to obtain the localization map:
$$\begin{aligned} L^c_{{\textit{Grad-CAM}}} = \text {ReLU}\left( \sum _{m=1}^{M} \alpha _m^c A^m \right) . \end{aligned}$$
This map has the same size (\(u\times v\)) as \(A^m\). Finally, the saliency map \(I_{sm} \in {\mathbb {R}}^{W\times H}\) is generated by upsampling \(L^c_{{\textit{Grad-CAM}}}\) to the input image resolution using bilinear interpolation.
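The two equations above translate into a short routine. The PyTorch sketch below is a minimal illustration of the Grad-CAM computation, not the paper's setup: \(\texttt {model}\) (a CNN returning pre-softmax scores), \(\texttt {target\_layer}\) (a late convolutional module), and the C x H x W input tensor \(\texttt {image}\) are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Grad-CAM saliency map for one image; arguments are assumed placeholders."""
    activations, gradients = [], []
    fwd = target_layer.register_forward_hook(
        lambda mod, inp, out: activations.append(out))
    bwd = target_layer.register_full_backward_hook(
        lambda mod, gin, gout: gradients.append(gout[0]))

    scores = model(image.unsqueeze(0))       # pre-softmax scores
    scores[0, class_idx].backward()          # d y_c / d A^m, captured by the hook
    fwd.remove()
    bwd.remove()

    A = activations[0]                                    # (1, M, u, v) feature maps
    alpha = gradients[0].mean(dim=(2, 3), keepdim=True)   # GAP over (i, j)
    cam = F.relu((alpha * A).sum(dim=1, keepdim=True))    # ReLU(sum_m alpha_m^c A^m)
    cam = F.interpolate(cam, size=image.shape[1:],        # upsample (u, v) -> (H, W)
                        mode='bilinear', align_corners=False)
    return cam.squeeze().detach()                         # saliency map I_sm
```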
About this article
Cite this article
Saha, G., Roy, K. Online continual learning with saliency-guided experience replay using tiny episodic memory. Machine Vision and Applications 34, 65 (2023). https://doi.org/10.1007/s00138-023-01420-3