Abstract
The rapid development of machine learning has spurred its application across many industries, where prediction models are built to forecast sales and help enterprises and governments plan better. In 2018, Alibaba Cloud and the Yancheng Municipal Government held a competition calling for global efforts to build machine learning models that accurately forecast vehicle sales from large-scale datasets. This paper presents the design, implementation and evaluation of ForeXGBoost, our proposed model, which won first place in the competition. ForeXGBoost uses carefully designed algorithms to fill in missing values and thereby improve data quality. It uses a sliding window to extract features from historical sales and production data, which improves prediction accuracy. We conduct an extensive study of how different attributes influence vehicle sales, measured via information gain and data correlation, and based on this study we select the most indicative features from the feature set for prediction. Furthermore, we leverage the XGBoost algorithm to achieve high prediction accuracy with a short running time. Extensive experiments confirm that ForeXGBoost achieves high prediction accuracy with low overhead.
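The abstract does not specify how ForeXGBoost fills missing values. Purely to illustrate the general idea, the sketch below fills gaps in one monthly sales series by linear interpolation, a generic stand-in technique rather than the authors' algorithm. The function name `fill_missing_linear` and the use of `None` to mark a missing month are assumptions.

```python
from typing import List, Optional, Sequence


def fill_missing_linear(series: Sequence[Optional[float]]) -> List[float]:
    """Fill missing (None) months in one sales series.

    Generic stand-in, not ForeXGBoost's actual filling algorithm:
    interior gaps are linearly interpolated between the nearest observed
    months; leading/trailing gaps copy the nearest observed value.
    """
    known = {i: float(v) for i, v in enumerate(series) if v is not None}
    if not known:
        raise ValueError("series contains no observed values")
    idx = sorted(known)
    # Placeholder 0.0 for gaps; every gap is overwritten below.
    filled = [known.get(i, 0.0) for i in range(len(series))]
    # Edge gaps: carry the first/last observation outward.
    for i in range(idx[0]):
        filled[i] = known[idx[0]]
    for i in range(idx[-1] + 1, len(series)):
        filled[i] = known[idx[-1]]
    # Interior gaps: straight line between consecutive observed points.
    for left, right in zip(idx, idx[1:]):
        span = right - left
        for i in range(left + 1, right):
            filled[i] = known[left] + (known[right] - known[left]) * (i - left) / span
    return filled


if __name__ == "__main__":
    # Hypothetical toy series; None marks a missing month.
    print(fill_missing_linear([None, 10, None, None, 40, None]))
```

Interpolation keeps the local trend of a series, which matters when the filled values later feed lagged features; a real pipeline would likely also use information from related car models, which this sketch ignores.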
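Sliding-window feature extraction can be sketched as follows, assuming one monthly sales (or production) series per car model. Each training row holds the previous `window` months as lagged features plus their mean, and the current month as the regression target. The window size, the feature names (`lag_k`, `window_mean`) and the function `sliding_window_features` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def sliding_window_features(
    series: Sequence[float], window: int
) -> List[Dict[str, float]]:
    """Build one feature row per month from the previous `window` months.

    Row for month t (t >= window) contains the lagged values
    series[t-1], ..., series[t-window] and their mean; the target is
    series[t]. Months without a full history are skipped.
    Hypothetical feature layout, for illustration only.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    rows: List[Dict[str, float]] = []
    for t in range(window, len(series)):
        past = series[t - window:t]  # the `window` months before month t
        row = {f"lag_{k}": float(series[t - k]) for k in range(1, window + 1)}
        row["window_mean"] = sum(past) / window
        row["target"] = float(series[t])
        rows.append(row)
    return rows


if __name__ == "__main__":
    monthly_sales = [120, 135, 150, 160, 155, 170]  # hypothetical toy data
    for r in sliding_window_features(monthly_sales, window=3):
        print(r)
```

Sliding the window one month at a time turns a single series of length n into n − window supervised examples, which is what lets a tree model such as XGBoost learn from sales history.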
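Information gain and correlation, the two measures the abstract names for feature selection, can be computed as in this sketch. It assumes that, for information gain, the feature and the sales target have already been discretized into categories, and that both are numeric for Pearson correlation. The paper's binning scheme and selection thresholds are not given here, so the function names and toy data are illustrative only.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(Y), in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum_x P(X = x) * H(Y | X = x) for discrete X, Y."""
    if len(feature) != len(target):
        raise ValueError("feature and target must have the same length")
    n = len(target)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, target) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(target) - conditional


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data: discretized sales level per record.
    sales_level = ["high", "high", "low", "low"]
    body_type = ["suv", "suv", "sedan", "sedan"]  # separates the levels
    colour = ["red", "blue", "red", "blue"]       # independent of them
    print(information_gain(body_type, sales_level))
    print(information_gain(colour, sales_level))
    print(pearson([1, 2, 3], [2, 4, 6]))
```

In this toy example, the attribute that perfectly separates high from low sales has a gain of 1 bit and the independent one has 0; ranking candidate features by such scores and keeping the top ones is one simple way to "select the most indicative features".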
Acknowledgements
Funding was provided by the National Natural Science Foundation of China (Nos. 61772377, 61572370, 91746206), the Natural Science Foundation of Hubei Province of China (No. 2017CFA007), and the Science and Technology Planning Project of ShenZhen (JCYJ20170818112550194).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Xia, Z., Xue, S., Wu, L. et al. ForeXGBoost: passenger car sales prediction based on XGBoost. Distrib Parallel Databases 38, 713–738 (2020). https://doi.org/10.1007/s10619-020-07294-y
DOI: https://doi.org/10.1007/s10619-020-07294-y