Abstract
Time series prediction, which uses historical data of multiple features to predict future values of the features of interest, is widely used in many fields. A critical issue in time series prediction is how to choose appropriate input features. This paper proposes a novel approach that automatically selects a near-optimal feature combination. The proposed method is model-agnostic and can be integrated with any prediction model. The basic idea is to use a Genetic Algorithm (GA) to discover a near-optimal feature combination; the fitness of a solution is calculated from the accuracy obtained by the prediction model. In addition, to reduce the running time, we introduce a strategy for generating the training data used in the fitness calculation. The strategy pursues two objectives simultaneously: minimizing the amount of training data, thereby saving the model’s training time, and ensuring the diversity of the data to preserve prediction accuracy. Experimental results show that the proposed GA-based feature selection method improves prediction accuracy by an average of 28.32% compared to existing approaches. Moreover, the proposed training data generation strategy reduces the running time by 25.67% to 85.34%, while degrading prediction accuracy by only 2.97% on average.
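The GA-based selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the genetic operators or the prediction model, so the least-squares model, the truncation selection, the uniform crossover, the bit-flip mutation, and every name and parameter below (`fitness`, `ga_select`, `pop_size`, `p_mut`, and so on) are assumptions chosen only for illustration. Each candidate solution is a boolean mask over the input features, and its fitness is the negative validation error of a model trained on the selected features.

```python
import numpy as np


def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Negative validation MSE of a least-squares model on the selected features.

    The linear model is a stand-in (assumption) for any prediction model.
    """
    if not mask.any():
        return -np.inf  # an empty feature set cannot predict anything
    # Append a constant column so the model can fit an intercept.
    A_tr = np.column_stack([X_tr[:, mask], np.ones(len(X_tr))])
    A_va = np.column_stack([X_va[:, mask], np.ones(len(X_va))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return -float(np.mean((A_va @ coef - y_va) ** 2))


def ga_select(X_tr, y_tr, X_va, y_va,
              pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search for a good feature mask with a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]
    pop = rng.random((pop_size, n)) < 0.5   # random boolean masks
    pop[0] = True                           # include the all-features baseline
    for _ in range(generations):
        scores = np.array([fitness(ind, X_tr, y_tr, X_va, y_va) for ind in pop])
        pop = pop[np.argsort(scores)[::-1]]  # best individuals first
        elite = pop[: pop_size // 2]         # truncation selection (assumption)
        children = []
        while len(children) < pop_size - len(elite):
            pa, pb = elite[rng.integers(len(elite), size=2)]
            child = np.where(rng.random(n) < 0.5, pa, pb)  # uniform crossover
            child ^= rng.random(n) < p_mut                 # bit-flip mutation
            children.append(child)
        # Elites survive unchanged, so the best fitness never decreases.
        pop = np.vstack([elite, np.array(children)])
    scores = np.array([fitness(ind, X_tr, y_tr, X_va, y_va) for ind in pop])
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])
```

Calling `ga_select(X_train, y_train, X_val, y_val)` returns the best mask found and its fitness. Because the fitness function only needs to train a model and report a validation score, swapping in a different prediction model changes `fitness` alone, which is one way to read the model-agnostic claim.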
Acknowledgement
This research is funded by Hanoi University of Science and Technology under grant number T2021-PC-019.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, M.H., Nguyen, V.H., Huynh, T.T., Nguyen, T.H., Nguyen, Q.V.H., Nguyen, P.L. (2022). A Lightweight and Efficient GA-Based Model-Agnostic Feature Selection Scheme for Time Series Forecasting. In: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, TP., Trawiński, B., Szczerbicki, E. (eds) Intelligent Information and Database Systems. ACIIDS 2022. Lecture Notes in Computer Science, vol 13758. Springer, Cham. https://doi.org/10.1007/978-3-031-21967-2_3
DOI: https://doi.org/10.1007/978-3-031-21967-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21966-5
Online ISBN: 978-3-031-21967-2
eBook Packages: Computer Science, Computer Science (R0)