Extreme Gradient Boosting over Decision Trees (XGBoost or XGBDT) is a powerful tool for modeling a wide range of processes. We propose a new approach to building a global total electron content (TEC) model using machine learning techniques, in particular gradient boosting. The model is based on the Global Ionospheric Maps (GIMs) computed by the Universitat Politècnica de Catalunya with a combined tomographic-kriging technique (UQRG). To reduce the complexity of the problem, we used empirical orthogonal functions (EOFs); the resulting model involves the first 16 spatial EOFs. We used the 1998–2016 data for training and validation and the 2017 data as the test set. The model is driven by the following features: (1) geomagnetic activity indexes (Kp, Ap, AE, AU, AL) and solar activity indexes (R, F10.7); (2) values derived from these indexes, such as their means and standard deviations over the last 12 h, 11 days, and 40 days; (3) day of the year (DOY); (4) averaged EOFs for a given Kp and UT, and those for a given DOY and UT. Tuning on the validation data set yielded the following XGBoost hyperparameters: 100 trees, a tree depth of 6, and a learning rate of 0.1. Comparisons with the NeQuick2, Klobuchar, and GEMTEC models show that the machine learning model achieves higher accuracy on the 2017 test data set. The globally averaged root-mean-square errors (RMSE) and mean absolute percentage errors (MAPE) were about 2.5 TECU and 19% for the nonlinear GIMLi-XGBDT model, about 4 TECU and 30–40% for NeQuick2, GEMTEC, and the linear model GIMLi-LM, and about 5.2 TECU and 73% for the Klobuchar model. An artificial neural network with four fully connected layers gave a higher error (3.28 TECU and 27.7%) than GIMLi-XGBDT. For all of these models, the error peaked in the equatorial anomaly region. An increase in solar activity does not affect the error of the nonlinear GIMLi-XGBDT model, whereas an increase in geomagnetic activity strongly affects it.
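
The EOF reduction and the two error measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the synthetic low-rank "maps", and the grid size are all assumptions made for illustration, and the EOFs are obtained here with a plain singular value decomposition of the mean-removed data. The gradient-boosting step that would predict the EOF coefficients from the driving features is omitted.

```python
import numpy as np


def eof_decompose(maps, n_modes=16):
    """Split maps (n_times, n_cells) into a mean map, spatial EOFs, and coefficients."""
    mean_map = maps.mean(axis=0)
    anomalies = maps - mean_map
    # anomalies = U @ diag(s) @ Vt; the rows of Vt are orthonormal spatial patterns.
    _, _, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_modes]            # (n_modes, n_cells)
    coeffs = anomalies @ eofs.T    # (n_times, n_modes), one set per map
    return mean_map, eofs, coeffs


def reconstruct(mean_map, eofs, coeffs):
    """Rebuild full maps from the mean map and truncated EOF coefficients."""
    return mean_map + coeffs @ eofs


def rmse(pred, truth):
    """Root-mean-square error, in the units of the input."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def mape(pred, truth):
    """Mean absolute percentage error, in percent (truth must be nonzero)."""
    return float(100.0 * np.mean(np.abs(pred - truth) / np.abs(truth)))


# Synthetic stand-in for a stack of maps (assumption: NOT real GIM data).
# Five hidden spatial patterns plus weak noise, offset to stay positive.
rng = np.random.default_rng(0)
n_times, n_cells = 200, 120
maps = (
    20.0
    + 0.5 * rng.standard_normal((n_times, 5)) @ rng.standard_normal((5, n_cells))
    + 0.01 * rng.standard_normal((n_times, n_cells))
)

mean_map, eofs, coeffs = eof_decompose(maps, n_modes=16)
recon = reconstruct(mean_map, eofs, coeffs)
print("truncation RMSE:", rmse(recon, maps))
print("truncation MAPE (%):", mape(recon, maps))
```

In a setup like the one described above, each map is represented by its 16 coefficients, a regressor predicts those coefficients from the features, and `reconstruct` turns the predictions back into a global map that can be scored with `rmse` and `mape`.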

GIM data are available through ftp://cddis.gsfc.nasa.gov/gps/products/ionex. Indexes of solar and geomagnetic activity are available through the OMNI database (https://omniweb.gsfc.nasa.gov/).
A Correction to this paper has been published: https://doi.org/10.1007/s10291-020-01063-1
The authors thank D.A. Zatolokin for his help in preparing the GEMTEC and Klobuchar model data, and Dr. B. Nava for making the source code of NeQuick2 available. We acknowledge the Universitat Politècnica de Catalunya and the International GNSS Service for the GIM data and OMNI database on solar and geomagnetic activities. The study is supported by the Russian Foundation for Basic Research Grant No. 18-35-20038 and partly by the Ministry of Education and Science (Basic Research program II.16).
Zhukov, A.V., Yasyukevich, Y.V. & Bykov, A.E. GIMLi: Global Ionospheric total electron content model based on machine learning. GPS Solut 25, 19 (2021). https://doi.org/10.1007/s10291-020-01055-1
