Abstract
The natural gradient boosting method for probabilistic regression (NGBoost) predicts not only point estimates but also the full conditional distribution of the target given the input, thereby quantifying prediction uncertainty. However, NGBoost is designed only for the batch setting, which is ill-suited to learning from data streams. In this paper, we present an incremental natural gradient boosting method for probabilistic regression (INGBoost). The proposed method uses scoring-rule reduction as its split metric and applies the Hoeffding inequality to construct, incrementally, decision trees that fit the natural gradient, thereby achieving incremental natural gradient boosting. Experimental results demonstrate that INGBoost performs well on both point regression and probabilistic regression tasks while retaining the interpretability of the tree model. Furthermore, the model size of INGBoost is significantly smaller than that of NGBoost.
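To make the two mechanisms in the abstract concrete: in NGBoost each boosting stage fits a base learner to the natural gradient of a proper scoring rule with respect to the predicted distribution's parameters, and the incremental variant described above grows those trees with a Hoeffding-bound split test, so a node splits as soon as enough stream examples have been seen. Below is a minimal sketch of both ingredients, assuming the log score and a Normal output distribution parameterized by (mu, log sigma); all function and parameter names are illustrative assumptions, not the authors' implementation.

```python
import math

def natural_gradient_normal(y, mu, log_sigma):
    """Natural gradient of the log score (negative log-likelihood) for
    Normal(mu, sigma) in the (mu, log sigma) parameterization: the ordinary
    gradient preconditioned by the inverse Fisher information diag(sigma^2, 1/2)."""
    sigma2 = math.exp(2.0 * log_sigma)
    d_mu = mu - y  # sigma^2 * (mu - y) / sigma^2
    d_log_sigma = 0.5 * (1.0 - (y - mu) ** 2 / sigma2)
    return d_mu, d_log_sigma

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n samples of a
    variable with the given range lies within this bound of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n, tie_threshold=0.05):
    """Commit a split once the best candidate's scoring-rule reduction beats
    the runner-up by more than the Hoeffding bound, or once the bound is so
    small that the two candidates are effectively tied."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold
```

The tie threshold mirrors standard Hoeffding-tree practice (Domingos and Hulten's VFDT): once the bound falls below it, the top two candidate splits are treated as equivalent and the better one is taken rather than waiting indefinitely.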
Acknowledgements
This work was supported by the provincial scientific research institutes' achievement transformation project of the Science and Technology Department of Sichuan Province, China (2023JDZH0011).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, W., Zhang, H., Yang, C., Li, B., Zhao, X. (2023). Incremental Natural Gradient Boosting for Probabilistic Regression. In: Yang, X., et al. (eds.) Advanced Data Mining and Applications. ADMA 2023. Lecture Notes in Computer Science, vol. 14176. Springer, Cham. https://doi.org/10.1007/978-3-031-46661-8_32
DOI: https://doi.org/10.1007/978-3-031-46661-8_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46660-1
Online ISBN: 978-3-031-46661-8
eBook Packages: Computer Science, Computer Science (R0)