Abstract
New random forest models that jointly use attention and self-attention mechanisms are proposed for solving regression problems. The models can be regarded as extensions of the attention-based random forest, whose idea stems from applying a combination of Nadaraya–Watson kernel regression and Huber's contamination model to random forests. The self-attention aims to capture dependencies among the tree predictions and to remove noisy or anomalous predictions from the random forest. The self-attention module is trained jointly with the attention module that computes the weights. It is shown that training the attention weights reduces to solving a single quadratic or linear optimization problem. Three modifications of the self-attention are proposed and compared. A specific multi-head self-attention for the random forest is also considered; its heads are obtained by varying the tuning parameters, including the kernel parameters and the contamination parameter of the models. The proposed combinations of attention and self-attention are verified on several datasets and compared with other random forest models. The code implementing the corresponding algorithms is publicly available.
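The attention part described above can be illustrated with a minimal sketch. It assumes, as in our reading of the attention-based random forest, that the weight of tree k for an input x is α_k(x) = (1 − ε)·softmax_k(−‖x − A_k(x)‖²/τ) + ε·w_k, where A_k(x) is the mean training vector of the leaf of tree k that contains x, B_k(x) is that tree's prediction, τ is the kernel parameter, ε is the contamination parameter, and w is a trainable vector in the unit simplex. Because the prediction Σ_k α_k(x)B_k(x) is linear in w, the squared loss is quadratic in w, which is why training reduces to a single quadratic program. The self-attention module of the paper is not reproduced here. All function names are hypothetical, the toy data are synthetic, and the QP is solved by projected gradient descent rather than by a dedicated QP solver.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def kernel_scores(X, A, tau):
    """Nadaraya-Watson part: softmax(-||x - A_k(x)||^2 / tau) over the T trees."""
    dist2 = ((X[:, None, :] - A) ** 2).sum(axis=-1)  # shape (n, T)
    return softmax(-dist2 / tau, axis=1)


def predict(X, A, B, w, tau, eps):
    """Weighted average of tree predictions B with contaminated attention weights."""
    alpha = (1 - eps) * kernel_scores(X, A, tau) + eps * w[None, :]
    return (alpha * B).sum(axis=1)


def fit_w(X, A, B, y, tau, eps, n_iter=500):
    """Train w by minimising squared error over the unit simplex.

    The loss ||r - eps * B @ w||^2 is quadratic in w, so this is a QP;
    here it is solved by projected gradient descent instead of a QP solver.
    """
    # Part of the target not explained by the fixed kernel (softmax) term.
    r = y - (1 - eps) * (kernel_scores(X, A, tau) * B).sum(axis=1)
    M = eps * B
    step = 1.0 / (2 * np.linalg.norm(M, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    w = np.full(B.shape[1], 1.0 / B.shape[1])  # start from uniform tree weights
    for _ in range(n_iter):
        grad = -2 * M.T @ (r - M @ w)
        w = project_to_simplex(w - step * grad)
    return w


# Toy usage with three synthetic "trees": one accurate, one noisy, one inverted.
rng = np.random.default_rng(0)
n, d = 200, 2
X = rng.normal(size=(n, d))
y = X[:, 0]
A = X[:, None, :] + 0.1 * rng.normal(size=(n, 3, d))  # hypothetical leaf means of x
B = np.stack([y, y + rng.normal(size=n), -y], axis=1)  # hypothetical tree predictions

w = fit_w(X, A, B, y, tau=1.0, eps=0.9)
mse_learned = np.mean((predict(X, A, B, w, 1.0, 0.9) - y) ** 2)
mse_uniform = np.mean((predict(X, A, B, np.full(3, 1 / 3), 1.0, 0.9) - y) ** 2)
print(np.round(w, 3), mse_learned, mse_uniform)
```

On this toy problem the learned w should concentrate on the accurate tree and give a lower error than uniform weights. With a real forest, A and B could be built from a fitted scikit-learn `RandomForestRegressor`, whose `apply` method returns the leaf index of each sample in each tree; replacing the squared loss with an absolute loss would instead give a linear program.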
Data availability
Data are available from open sources.
Acknowledgements
The authors would like to express their appreciation to the anonymous referees whose very valuable comments have improved the paper.
Funding
This work is supported by the Russian Science Foundation under grant 21-11-00116.
Ethics declarations
Conflict of interest
I certify that no party having a direct interest in the results of the research supporting this article has conferred or will confer a benefit on me or on any organization with which I am associated, and I certify that all financial and material support for this research and work is clearly identified on the title page of the manuscript.
Code availability
The code implementing the method is publicly available at https://github.com/andruekonst/forest-self-attention.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Utkin, L.V., Konstantinov, A.V. & Kirpichenko, S.R. Attention and self-attention in random forests. Prog Artif Intell 12, 257–273 (2023). https://doi.org/10.1007/s13748-023-00301-0
DOI: https://doi.org/10.1007/s13748-023-00301-0