1 Introduction

A time series (or time sequence) is a significant type of data, defined as a series of numerical values of some phenomenon recorded in time order. Time series are widely found in many fields, such as clinical medicine [1], power science [2], meteorology [3] and finance [4]. They affect every aspect of our lives, and their analysis and prediction have become a research hotspot.

Time series prediction models the regular pattern between historical data and future values, on the premise of the continuity assumption, so as to achieve a quantitative estimate of the predicted values. Time series forecasting is of great importance in industrial applications and daily life. For example, accurate prediction of power load data not only facilitates the rational scheduling of energy production, but also avoids the waste caused by overproduction of electricity [5]. Short-term traffic flow prediction can be used to optimize an urban traffic system and reduce the likelihood of traffic accidents [6]. An accurate time sequence prediction is therefore of great value in many application scenarios. Current prediction models face two challenges. First, there are temporal correlations and dependencies between variables in the input sequence, and how to fully extract and map such complex sequence dependencies with a high-precision prediction model remains to be solved. Second, in the era of big data, time series are becoming larger in scale and more diverse, and the data present complex characteristics such as nonlinearity, non-stationarity and irregularity, which makes it harder for a model to extract the dependencies between data.

The basic prediction models can be roughly divided into three types: traditional statistical methods, machine learning methods and deep learning methods. Among the statistical methods, the Autoregressive Integrated Moving Average (ARIMA) is one of the most classic models. Although ARIMA is applicable to the prediction of relatively stationary and linear data, it cannot accurately forecast real-world field data. Later, machine learning methods, such as the Bayesian method, the Support Vector Machine (SVM) and Support Vector Regression (SVR), were used for prediction. Machine learning methods can learn mapping rules from a large amount of historical data to build nonlinear prediction models. Therefore, through iterative training and learned approximations, they can achieve a higher prediction accuracy than the statistical methods. However, their prediction performance depends heavily on delicate and tedious feature engineering, which prevents them from reaching an ideal prediction accuracy in many application scenarios.

In recent years, deep learning has become increasingly popular in time series prediction tasks because of its powerful feature extraction ability and excellent generalization performance. The recurrent neural network (RNN) is considered one of the most classical sequence processing networks in deep learning. An RNN is a neural network with a feedback structure; it obtains the characteristic information of sequence data in the time direction, that is, the dependence information between data points, by passing down the state of its hidden layer. Because of this special structure, RNN models perform well in processing text, speech and other sequence data. However, as the length of the input sequence grows, the unrolled RNN effectively becomes deeper, which makes it prone to problems such as vanishing gradients and degrades the prediction accuracy of the model. To address this deficiency of the RNN, an improved model, the long short-term memory (LSTM) network, was proposed. The LSTM uses three gate structures to process information and introduces the Sigmoid activation function to limit the range of the gate outputs, which alleviates the vanishing gradient problem and improves prediction accuracy. In general, LSTM is considered one of the state-of-the-art methods for sequence problems. However, its defects are also obvious. Although the gate structures can alleviate the vanishing and exploding gradient problems, these problems persist when dealing with very long sequences. Moreover, each cell in an LSTM is equivalent to a fully connected (FC) layer, so each LSTM unit needs four FC layers; when the time span is large, the amount of computation is enormous, which raises the difficulty of model training. Recently, a special CNN, the temporal convolutional network (TCN), has overcome these shortcomings and performs better than LSTM in temporal sequence prediction tasks.

However, the aforementioned methods emphasize only the prediction module of the model, ignoring the influence of the data itself and lacking effective data preprocessing. To alleviate the effect of data fluctuations, an appropriate data preprocessing step is necessary.

In view of the above-mentioned challenges, we propose a new hybrid model based on the CEEMDAN algorithm and multiple TCNs, called the CEEMDAN-TCN model, for the time series prediction task. The CEEMDAN algorithm can adaptively process nonlinear and non-stationary time sequences. The original time sequence is decomposed by CEEMDAN into a number of components with various frequencies, and these components are then used as the inputs of multiple TCNs. Finally, the outputs of all the TCNs are combined to form the final forecast. The simulation results indicate that, in contrast with the LSTM model and other hybrid models, the proposed CEEMDAN-TCN model performs better in both univariate and multivariate forecasting tasks.

2 Related Work

In the field of time sequence forecasting, researchers have done a great deal of work, proposed many different prediction schemes, and conducted experimental verification on various time series data, such as wind speed data and traffic flow data. With continuous research and practice, the prediction accuracy has been greatly improved. Generally speaking, the research can be roughly divided into four categories: statistical methods, machine learning methods, deep learning methods and hybrid methods.

Among the statistical methods, ARIMA and its variants are widely studied. For instance, Haneen et al. [7] proposed an ARIMA-based forecasting model to predict the spread of COVID-19. The accuracy of the ARIMA model was investigated and validated over a relatively long period using Kuwait as a case study, and the experimental results showed that the actual values fell within the 95% confidence bounds of the predictions. Mao et al. [8] applied a seasonal autoregressive integrated moving average (SARIMA) model to forecast the incidence of tuberculosis in China, and the results showed that SARIMA is a useful tool for monitoring epidemics. However, the ability of statistical methods to deal with complex and nonlinear data is limited.

Compared with the statistical methods, machine learning methods can process and model more complex data, among which SVR and SVM are commonly used models. For example, Nieto et al. [9] utilized four different models, including SVM, ARIMA, the vector autoregressive moving-average (VARMA) and the multilayer perceptron (MLP), to predict the value of PM10 in a northern Spanish city. The simulations showed that the SVM model performs better than the other models when forecasting one month ahead, as well as for the following seven months. Liu et al. [10] proposed a network traffic forecasting model based on SVR optimized by the global artificial fish swarm algorithm (GAFSA). In this method, GAFSA computes the optimal training parameters for SVR, which makes the forecast more accurate; the prediction results of the proposed model were also more stable, with a precision improved to more than 89%. Although machine learning methods perform well in some prediction tasks and are supported by a solid mathematical foundation, their effectiveness depends on expert experience and manual feature selection, which limits their practical value.

In recent years, models based on deep learning have gained popularity, among which the long short-term memory (LSTM) network and its related models are of prominent importance. Sun et al. [11] adopted an LSTM model to predict the ionospheric vertical total electron content (TEC) sequence over Beijing, and the obtained root mean square error (RMSE) between the predicted values and the ground truth was 30% lower than that of a multilayer perceptron (MLP) network. Hu et al. [12] proposed a model based on a variant of LSTM, in which the forget and input gates were combined into an update gate and a Sigmoid layer was used to control the information update; in this way, the parameters of the model were reduced and the memory module became simpler. Li et al. [13] applied an evolutionary attention-based LSTM (EA-LSTM), trained with competitive random search, to the multivariate prediction task. This model introduces an evolutionary attention mechanism to transfer the shared parameters, which confirms the pattern for importance-based attention sampling during temporal relationship mining; the experimental results illustrated that it achieves competitive prediction performance compared with other baseline methods. Lin et al. [14] proposed a novel end-to-end hybrid neural network, called TreNet, to learn local and global contextual features for predicting the trend of a time series. TreNet consists of a convolutional neural network (CNN), which extracts local information, and an LSTM network, which captures the long-term dependence of the data. In [15], Ding et al. addressed the need for flood prediction with an interpretable Spatio-Temporal Attention LSTM (STA-LSTM) model, which enhances the ability of LSTM to capture data features. After the LSTM, the TCN model proposed by Bai et al. [16] has received extensive attention due to its excellent performance. Yin et al. [17] proposed a multi-temporal-spatial-scale convolutional network for power load forecasting tasks, and the experimental results proved that the model obtains a higher accuracy than other short-term power load prediction models. Zhu et al. [18] explored the TCN model to predict wind power data sequences; the designed method solved the problem of long-term dependencies and, through dilated causal convolutions and residual connections, reduced the performance degradation of deep convolutional models in sequence prediction. Lin et al. [19] investigated the application of the TCN to solar power forecasting, and the simulation results showed improved performance in contrast with the LSTM and gated recurrent unit (GRU) models. Wang et al. [20] applied a TCN model to short-term load forecasting; the experimental results showed that the TCN achieved a higher forecast precision with less training time and computational memory, with its RMSE decreasing by 51% and 28% compared with SVM and LSTM, respectively.

Studies have found that building a prediction model directly on the raw time series data is subject to substantial errors, and that the characteristics of the original data must be fully analyzed and preprocessed. Therefore, the ensemble empirical mode decomposition (EEMD) and its advanced version, the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) [21, 22], were successively proposed, and these algorithms were further combined with deep learning methods to form hybrid models. Chen et al. [23] applied a combined model to improve the forecast accuracy of financial time sequences. It consisted of empirical mode decomposition (EMD) and an attention-based LSTM model, and obtained an RMSE 36% lower than that of the original LSTM. Xuan et al. [24] established a novel prediction model based on EMD, LSTM and Cubic Spline Interpolation (CSI) to improve the accuracy and efficiency of short-term stock price trend prediction. Cao et al. [25] used the CEEMDAN algorithm and LSTM to forecast stock market prices; compared with the support vector machine (SVM) [26,27,28] and MLP networks [29, 30], the experimental results indicated that Cao's algorithm obtained better predictions in one-step-ahead forecasting tasks.

3 Methodology

3.1 CEEMDAN Algorithm

The fluctuation and complexity of the original time sequence affect the function fitting and subsequent convergence of the TCN model, thus limiting its prediction accuracy. To address this difficulty, the original nonlinear and non-stationary time sequence is preprocessed by the CEEMDAN algorithm.

As early as 1998, Huang et al. [31] proposed an adaptive time–frequency signal processing method, the EMD algorithm, which decomposes a time sequence according to its time-scale information without presetting any basis function. In EMD, the original time series is broken up into a certain number of intrinsic mode functions (IMFs) and a residual (Res). Different IMF components represent different features of the original data at different time scales [32]. Each IMF is regarded as a specific factor that affects the time series, while the Res determines the overall trend of the signal. Although the EMD algorithm has an edge in dealing with time series, “mode mixing” is its defect [33]. Mode mixing refers to the presence of very similar oscillations in different IMF components, or of very different amplitudes within a single IMF. It can cause adjacent IMF waveforms to alias, so that they no longer represent the real physical process. By adding white Gaussian noise to the signal, the EEMD algorithm is able to eliminate the mode mixing of EMD [34]. However, eliminating the noise requires repeated averaging, which consumes a considerable amount of computation time. Moreover, residual noise left in the reconstructed signal causes reconstruction errors, and the noisy signal may produce a number of spurious additional IMF components.

To solve the aforementioned problems, Torres et al. put forward the CEEMDAN algorithm [35]. CEEMDAN is a modified form of EEMD with several advantages: it eliminates mode mixing more effectively, its reconstruction error is almost zero, and it considerably reduces the computing time [36]. The flowchart of CEEMDAN is shown in Fig. 1 and the decomposition procedure is as follows:

Fig. 1 Flowchart of the CEEMDAN algorithm

(1) Suppose one has a signal \(x\left( t \right), t = 0,1, \ldots ,T - 1\), to which a white Gaussian noise with zero mean and unit variance, \(N(0,1)\), is added:

$$ y_{i} \left( t \right) = x\left( t \right) + \varepsilon_{0} \delta_{i} \left( t \right), \quad i = 1,2, \ldots ,K $$
    (1)

where \(K\) is the number of added white Gaussian noise realizations, \({\varepsilon }_{0}\) is an intensity coefficient, and \({\delta }_{i}(t)\) represents the \(i\mathrm{th}\) realization of a random Gaussian process. From (1), \(K\) noisy versions of the signal are obtained.

(2) The first IMF, \(\overline{{Imf }_{1}}(t)\), is calculated by averaging all the first decomposition components of EMD:

    $$ \overline{{Imf_{1} }} \left( t \right) = \frac{1}{K}\mathop \sum \limits_{i = 1}^{K} EMD_{1} \left( {y_{i} \left( t \right)} \right) $$
    (2)
    $$ r_{1} \left( t \right) = x\left( t \right) - \overline{{Imf_{1} }} \left( t \right) $$
    (3)

where \(EMD_{1} \left( \cdot \right)\) represents the first IMF component generated by the EMD algorithm, and \(r_{1} \left( t \right)\) is the residual of the first stage.

(3) The signal \({r}_{1}\left(t\right)+{\varepsilon }_{1}{EMD}_{1}({\delta }_{i}(t))\) can be further decomposed by EMD and the results combined to obtain the second IMF component:

    $$ \left\{ \begin{gathered} \overline{{Imf_{2} }} \left( t \right) = \frac{1}{K}\mathop \sum \limits_{i = 1}^{K} EMD_{1} \left( {r_{1} \left( t \right) + \varepsilon_{1} EMD_{1} \left( {\delta_{i} \left( t \right)} \right)} \right) \hfill \\ \;\;\;\;\;\;\;\;\;\;\;\;\;r_{2} \left( t \right) = r_{1} \left( t \right) - \overline{{Imf_{2} }} \left( t \right) \hfill \\ \end{gathered} \right. $$
    (4)

    where \(r_{2} \left( t \right)\) is the residual of the second stage.

(4) The \(j{\text{th}}\) IMF component and the \(j{\text{th}}\) residual can be computed as:

$$ \left\{ \begin{gathered} \overline{{Imf_{j} }} \left( t \right) = \frac{1}{K}\mathop \sum \limits_{i = 1}^{K} EMD_{1} \left( {r_{j - 1} \left( t \right) + \varepsilon_{j - 1} EMD_{j - 1} \left( {\delta_{i} \left( t \right)} \right)} \right) \hfill \\ \;\;\;\;\;\;\;\;\;\;\;\;\;r_{j} \left( t \right) = r_{j - 1} \left( t \right) - \overline{{Imf_{j} }} \left( t \right) \hfill \\ \end{gathered} \right. $$
(5)

where \(EMD_{j - 1} \left( \cdot \right)\) denotes the \(\left( {j - 1} \right){\text{th}}\) IMF component obtained by the EMD algorithm, and \(r_{j} \left( t \right)\) is the residual after the \(j{\text{th}}\) decomposition.

(5) Let \(j\to j+1\); Eq. (5) is repeated until the residual \({r}_{j}\left(t\right)\) satisfies a preset stopping criterion:

$$ \mathop \sum \limits_{t = 0}^{T - 1} \frac{{\left| {r_{j - 1} \left( t \right) - r_{j} \left( t \right)} \right|^{2} }}{{r_{j - 1}^{2} \left( t \right)}} \le D_{s} $$
(6)

where \({D}_{s}\) is an empirical threshold value [37]. Suppose \(j=N\) when the decomposition procedure stops.

Finally, the time sequence \(x\left(t\right)\) can be expressed as:

$$ x\left( t \right) = \mathop \sum \limits_{n = 1}^{N} \overline{{Imf_{n} }} \left( t \right) + r_{N} \left( t \right) $$
(7)
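The decomposition above can be reproduced with an off-the-shelf implementation. The following is a minimal sketch using the open-source PyEMD package (installed as `EMD-signal`); the 30 trials play the role of the \(K\) noise realizations in Eq. (1), while the toy signal and the exact noise strength are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from PyEMD import CEEMDAN  # pip install EMD-signal

t = np.linspace(0, 1, 2000)
x = np.sin(2 * np.pi * 25 * t) + 0.5 * np.random.randn(t.size)  # toy signal

ceemdan = CEEMDAN(trials=30, epsilon=0.005)  # K = 30 noise realizations
imfs = ceemdan(x)                    # rows are Imf_1(t), ..., Imf_N(t)
residual = x - imfs.sum(axis=0)      # r_N(t), so Eq. (7) holds by construction
print(imfs.shape)                    # (N, 2000)
```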

3.2 Temporal Convolutional Network

In time sequence prediction, the objective is to learn a mapping function \(F\) from the input series \(x_{1} ,x_{2} , \ldots ,x_{t}\) to the output value \({y}_{t}\). Formally, in the univariate prediction task:

$$ \hat{y}_{t} = F\left( {x_{1} ,x_{2} , \ldots ,x_{t} } \right) $$
(8)

where \(x_{t}, y_{t} \in {\mathbb{R}}\), and \(\hat{y}_{t}\) is an estimate of the real target value \(y_{t}\).

Similarly, in the multivariate prediction task:

$$ \hat{y}_{t} = F\left( {x_{1} ,x_{2} , \ldots ,x_{t} } \right) $$
(9)

where \({x}_{t},{y}_{t}\in {\mathbb{R}}^{n}\), and \({\widehat{y}}_{t}\) is an estimate of the real target value \({y}_{t}\).

The mapping function \(F\) is obtained by supervised learning. Additionally, it must satisfy the causal constraint that \({\widehat{y}}_{t}\) depends only on \(x_{1} ,x_{2} , \ldots ,x_{t}\) and not on any “future” inputs \(x_{t + 1} ,x_{t + 2} , \ldots ,x_{T}\).

A traditional convolutional neural network does not satisfy this causal constraint. Moreover, it is not well suited to time series modeling because its receptive field is small, which is not conducive to capturing long-term dependencies. To solve these two problems, a special convolutional neural network, the TCN, was proposed. A TCN model consists of three parts: causal convolution, dilated convolution and residual connections.

3.2.1 Causal Convolution

When handling sequential problems, the temporal order of the data, that is, the causality between data points, needs to be respected. Therefore, an improvement is made on the ordinary CNN: causal convolution. Causal convolution means that the present output of an upper network layer depends only on the historical and present inputs of the lower network layer. Causal convolution is visualized in Fig. 2, and a minimal code sketch of it follows the figure. Causal convolution also uses padding to keep the shape of the output the same as that of the input; however, compared with a traditional convolution network, it uses one-sided padding because of the causal constraint [38]. Although causal convolution satisfies the causal restriction, it has a serious problem: the receptive field is greatly limited. In Fig. 2, with a convolution filter size of 2 and 4 network layers, the receptive field of the causal convolution is only 4. To obtain a larger receptive field, one must increase the filter size or add more layers, which results in a huge computational load. Dilated convolution can effectively overcome this insufficient receptive field of causal convolution.

Fig. 2 Visualization of causal convolution with filter size \(k=2\)
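The following is a minimal PyTorch sketch of causal convolution, assuming a (batch, channels, time) layout: left-only padding of \((k-1)\cdot d\) keeps the output length equal to the input length while preventing any leakage from future time steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1-D convolution with one-sided (left) padding: output t sees only inputs <= t."""
    def __init__(self, in_ch, out_ch, k=2, dilation=1):
        super().__init__()
        self.pad = (k - 1) * dilation            # pad the past side only
        self.conv = nn.Conv1d(in_ch, out_ch, k, dilation=dilation)

    def forward(self, x):                        # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))

y = CausalConv1d(1, 8, k=2)(torch.randn(4, 1, 100))
print(y.shape)  # torch.Size([4, 8, 100]) -- same time length as the input
```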

3.2.2 Dilated Convolution

The solution to this problem of causal convolution is dilated convolution. Oord et al. [39] proposed dilated convolution, which enlarges the receptive field of the network by introducing the dilation factor as a hyperparameter. More formally, for a time sequence \(x\in {\mathbb{R}}^{n}\) and a filter \(f:\left\{ {0, \ldots ,k - 1} \right\} \to {\mathbb{R}}\), the dilated convolution operation \(F\) on element \(s\) of the sequence is defined as:

$$ F\left( s \right) = \left( {x{*}f} \right)\left( s \right) = \mathop \sum \limits_{i = 0}^{k - 1} f\left( i \right) \cdot x_{s - d \cdot i} $$
(10)

where \(*\) denotes the convolution operation, \(d\) is the dilation factor, \(k\) is the filter size, and \(s - d \cdot i\) points in the direction of the past.
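As a concrete reading of Eq. (10), the sketch below computes a single output value directly from the definition; treating out-of-range indices as zero corresponds to the zero padding discussed in Sect. 3.2.1.

```python
import numpy as np

def dilated_causal_output(x, f, s, d):
    """F(s) = sum_{i=0}^{k-1} f(i) * x[s - d*i]; out-of-range terms count as zero."""
    return sum(f[i] * x[s - d * i] for i in range(len(f)) if s - d * i >= 0)

x = np.arange(10, dtype=float)   # toy sequence
f = np.array([0.5, 0.5])         # filter of size k = 2
print(dilated_causal_output(x, f, s=9, d=4))  # combines x[9] and x[5]
```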

In dilated convolution, dilation means that a fixed step, controlled by the dilation factor \(d\), is inserted between adjacent filter taps. The receptive field of an \(n\)-layer network can be calculated by the following formula:

$$ \left( {k - 1} \right) \cdot \frac{{b^{n} - 1}}{b - 1} + 1 $$
(11)

where \(b\) is the dilation base and \(n\) is the number of layers, the dilation factor of the \(i\)th layer being \(d = b^{i-1}\).

In conclusion, the receptive field can be enlarged in two ways: increasing the dilation factor \(d\) or using a larger filter size \(k\). Moreover, \(d\) increases exponentially with the depth of the network, which ensures that some filter hits each input within the effective history. The construction of dilated causal convolution is shown in Fig. 3, and a quick numerical check of Eq. (11) follows the figure.

Fig. 3 Dilated causal convolution with dilation factors \(d=1, 2, 4\) and a filter size of \(k=2\)
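Equation (11) can be checked numerically. The sketch below assumes the usual TCN schedule in which the dilation of layer \(i\) is \(b^{i-1}\), with base \(b=2\):

```python
def receptive_field(k, b, n):
    """Eq. (11): receptive field of n layers with filter size k and dilation base b."""
    return (k - 1) * (b**n - 1) // (b - 1) + 1

# 4 layers with k = 2: doubling dilations widen the field far beyond plain causal conv
print(receptive_field(k=2, b=2, n=4))  # 16
```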

3.3 Proposed Model

The proposed model consists of two parts: the CEEMDAN decomposition algorithm and TCN residual blocks. The original time sequence is decomposed by CEEMDAN into several IMF components and a residual. As the order of the components increases, the frequency of the IMFs decreases gradually and the components flatten. The impact of sequence fluctuations is thereby effectively suppressed, further reducing the complexity of the temporal data; this reduction in data complexity is conducive to training the prediction model. On the basis of dilated causal convolution, dropout is used; moreover, layer normalization and a skip connection, both recognized as effective methods for training deep networks, are added into the block. The TCN residual block applied in the proposed model is illustrated in Fig. 4, and a code sketch of the block follows the figure.

Fig. 4 TCN residual block
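A hedged PyTorch sketch of the block in Fig. 4 is given below: two dilated causal convolutions, each followed by layer normalization, a ReLU activation and dropout, plus a skip connection with a 1×1 convolution when channel counts differ. The channel widths, activation choice and dropout rate are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TCNResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=2, dilation=1, dropout=0.1):
        super().__init__()
        self.pad = (k - 1) * dilation            # causal (left-only) padding
        self.conv1 = nn.Conv1d(in_ch, out_ch, k, dilation=dilation)
        self.conv2 = nn.Conv1d(out_ch, out_ch, k, dilation=dilation)
        self.norm1 = nn.LayerNorm(out_ch)
        self.norm2 = nn.LayerNorm(out_ch)
        self.drop = nn.Dropout(dropout)
        # 1x1 convolution matches channels for the skip connection when needed
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def _layer(self, x, conv, norm):
        y = conv(F.pad(x, (self.pad, 0)))              # dilated causal convolution
        y = norm(y.transpose(1, 2)).transpose(1, 2)    # LayerNorm over channels
        return self.drop(F.relu(y))

    def forward(self, x):                              # x: (batch, channels, time)
        y = self._layer(x, self.conv1, self.norm1)
        y = self._layer(y, self.conv2, self.norm2)
        return F.relu(y + self.skip(x))                # residual (skip) connection

out = TCNResidualBlock(1, 32, k=3, dilation=2)(torch.randn(8, 1, 64))
print(out.shape)  # torch.Size([8, 32, 64])
```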

To be suitable for supervised learning, the input time sequence is divided into several overlapping samples with corresponding labels, as shown in Fig. 5; a code sketch of this windowing follows the figure. When the length of the time series is \(L\), the size of the time window is \(W\) and the length of the output is \(O\) \(\left( {O \ge 1} \right)\), the input of the \(t\)th sample is \(x_{t} ,x_{t + 1} , \ldots ,x_{t + W - 1}\) and its label is \(x_{t + W} ,x_{t + W + 1} , \ldots ,x_{t + W + O - 1}\).

Fig. 5 Supervised learning in time series prediction
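The windowing of Fig. 5 can be sketched as follows; the function cuts a series of length \(L\) into \(L-W-O+1\) overlapping input windows of width \(W\), each labelled with the next \(O\) values.

```python
import numpy as np

def make_samples(series, W, O=1):
    """Split a 1-D series into inputs x_t..x_{t+W-1} and labels x_{t+W}..x_{t+W+O-1}."""
    X, Y = [], []
    for t in range(len(series) - W - O + 1):
        X.append(series[t : t + W])
        Y.append(series[t + W : t + W + O])
    return np.asarray(X), np.asarray(Y)

X, Y = make_samples(np.arange(100, dtype=float), W=12, O=1)
print(X.shape, Y.shape)  # (88, 12) (88, 1)
```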

After the TCN blocks, an FC layer is added whose number of neurons equals the length of the output, with a linear function as the activation function. The mean absolute error (MAE) is chosen as the loss function and the Adam optimizer is used for back-propagation. The overall process of the proposed model is shown in Fig. 6, and a sketch of this training setup follows the figure.

Fig. 6 Flowchart of the proposed model
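A hedged sketch of this setup is shown below: a stack of the residual blocks from the previous sketch, a linear FC head with \(O\) neurons, the MAE (L1) loss and the Adam optimizer. The depth, channel width and learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

O = 1                                    # output length
tcn = nn.Sequential(                     # TCNResidualBlock from the sketch above
    TCNResidualBlock(1, 32, k=3, dilation=1),
    TCNResidualBlock(32, 32, k=3, dilation=2),
)
head = nn.Linear(32, O)                  # FC layer with linear activation
optimizer = torch.optim.Adam(list(tcn.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.L1Loss()                    # mean absolute error

x = torch.randn(16, 1, 24)               # (batch, 1 variable, window W)
y = torch.randn(16, O)
pred = head(tcn(x)[:, :, -1])            # features at the last time step feed the FC head
loss = loss_fn(pred, y)                  # one training step
loss.backward()
optimizer.step()
```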

4 Experimental Results and Analysis

4.1 Datasets

The performance of the model was tested in two prediction tasks on three datasets: univariate sequence prediction and multivariate sequence prediction. Of the three datasets, the first is the ionospheric TEC over Nanchang, Jiangxi Province, P. R. China; the second is the air quality (AQ) data of Nanchang city; and the third is an artificial dataset. The TEC data is a univariate sequence and the AQ data is a multivariate sequence, corresponding to the univariate and multivariate prediction tasks, respectively, as shown in Figs. 7 and 8. The global TEC grid data was downloaded from the International GNSS Service (IGS) center and spans January 2003 to August 2017, with a time resolution of 2 h and a spatial resolution of 2.5° (latitude) × 5° (longitude). The AQ data of Nanchang was downloaded from the Air Quality Historical Data Platform [40] and covers January 2015 to March 2021, with a time resolution of 1 day and a total of 6 interdependent variables: PM2.5, PM10, CO, NO2, O3 and SO2. Short-term prediction of ionospheric TEC data can, to a certain extent, improve the accuracy of navigation and positioning based on electromagnetic wave propagation, as well as the performance of radio communication systems, while research on air quality is of great significance to local and global public health. In addition, both datasets are historical data from which future sequential values are to be predicted. The chosen datasets are therefore appropriate for testing the proposed CEEMDAN-TCN model.

Fig. 7 Samples of TEC data of Nanchang

Fig. 8 Samples of air quality data of Nanchang

In addition, an artificial dataset was generated. The artificial data is deterministic and facilitates an exact evaluation of the model. Specifically, four sine signals, with frequencies of 25, 30, 40 and 50 Hz, initial phases of 0, \(\frac{\pi }{8}\), \(\frac{\pi }{5}\) and \(\frac{\pi }{3}\), and a common sampling rate of 2000 Hz, were generated, superimposed, and mixed with a Gaussian noise sequence \(N\left( n \right)\) of zero mean and unit variance:

$$ y_{1} \left( t \right) = 6{\text{sin}}\left( {50\pi t} \right) $$
(12)
$$ y_{2} \left( t \right) = 4{\text{sin}}\left( {60\pi t + \frac{\pi }{8}} \right) $$
(13)
$$ y_{3} \left( t \right) = 2{\text{sin}}\left( {80\pi t + \frac{\pi }{5}} \right) $$
(14)
$$ y_{4} \left( t \right) = {\text{sin}}\left( {100\pi t + \frac{\pi }{3}} \right) $$
(15)

resulting in the artificial sequence \(y\left( n \right) = y_{1} \left( n \right) + y_{2} \left( n \right) + y_{3} \left( n \right) + y_{4} \left( n \right) + N\left( n \right)\), as shown in Fig. 9 and transcribed in the sketch after the figure.

Fig. 9 Samples of the artificial data
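The artificial sequence is a direct transcription of Eqs. (12)-(15); the 1-second duration below is an illustrative assumption.

```python
import numpy as np

fs = 2000                                       # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)
y = (6 * np.sin(50 * np.pi * t)                 # Eq. (12): 25 Hz
     + 4 * np.sin(60 * np.pi * t + np.pi / 8)   # Eq. (13): 30 Hz
     + 2 * np.sin(80 * np.pi * t + np.pi / 5)   # Eq. (14): 40 Hz
     + np.sin(100 * np.pi * t + np.pi / 3)      # Eq. (15): 50 Hz
     + np.random.randn(t.size))                 # N(n): zero mean, unit variance
```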

4.2 Data Preprocessing

Before the original time sequence is decomposed by the CEEMDAN algorithm, a sequential preprocessing is conducted, including data sorting, data cleaning (removing duplicate data), data normalization (alleviating the impact of different data scales) and data partitioning. In data partitioning, each dataset is divided into three parts: a training set (for model learning), a validation set (for model tuning) and a test set (for model evaluation), as shown in Table 1. The artificial dataset is divided into training, validation and test sets at a ratio of 7:2:1; a code sketch of this split follows Table 1.

Table 1 Dataset partitioning details (Date format: yyyy/mm/dd)
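A minimal sketch of the chronological 7:2:1 split is given below; since the data is a time series, the order is preserved and no shuffling is performed before partitioning.

```python
def split_721(series):
    """Chronological 70/20/10 split into training, validation and test sets."""
    n = len(series)
    i, j = int(0.7 * n), int(0.9 * n)
    return series[:i], series[i:j], series[j:]

train, val, test = split_721(y)   # y: the artificial sequence from the Sect. 4.1 sketch
```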

4.3 Parameter Setting of the Model

There are three hyperparameters in the basic LSTM model and five in the basic TCN model. These hyperparameters are the length of the time window \(W\), the number of hidden units \(n\) in the LSTM, the batch size \(b\), the number of filters \(f\), the filter size \(k\), the dilation factor \(d\) and the dropout rate \(r\), as listed in Table 2.

Table 2 Hyperparameters for LSTM and TCN

4.4 Univariate Experiment

After the preprocessing described above, the CEEMDAN algorithm is applied to decompose the artificial data and the TEC data. In the decomposition, the operation of adding white Gaussian noise was performed 30 times to construct 30 noisy signals; the decomposition results for the artificial data and the TEC data are presented in Figs. 10 and 11. As can be seen in these figures, the frequency of the IMFs gradually decreases with increasing order. Therefore, predicting each IMF and the residual separately can enhance the forecast precision. A forecasting model is constructed for each component, and the final prediction is obtained by summing the outputs of all components; a sketch of this per-component pipeline follows Fig. 11.

Fig. 10 Results of CEEMDAN decomposition of the artificial data

Fig. 11 Results of CEEMDAN decomposition of TEC data of Nanchang
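The per-component pipeline can be sketched as follows. Here `imfs` and `residual` come from the CEEMDAN sketch in Sect. 3.1, `make_samples` from Sect. 3.3, while `train_tcn` and `predict` are hypothetical helpers standing in for the training loop sketched in Sect. 3.3; the window size is an illustrative assumption.

```python
components = list(imfs) + [residual]          # one sub-sequence per IMF plus the residual

models = []
for c in components:
    Xc, Yc = make_samples(c, W=12, O=1)       # per-component supervised samples
    models.append(train_tcn(Xc, Yc))          # hypothetical helper: fit one TCN per component

# one-step-ahead forecast: sum the component forecasts, mirroring Eq. (7)
forecast = sum(predict(m, c[-12:]) for m, c in zip(models, components))
```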

To evaluate the performance of the proposed model numerically, four evaluation metrics are adopted, including root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and determination coefficient \({\mathrm{R}}^{2}\). They are calculated as follows:

$$ RMSE = \sqrt {\frac{1}{N}\mathop \sum \limits_{t = 1}^{N} \left( {y_{t} - \hat{y}_{t} } \right)^{2} } $$
(16)
$$ MAE = \frac{1}{N}\mathop \sum \limits_{t = 1}^{N} \left| {y_{t} - \hat{y}_{t} } \right| $$
(17)
$$ MAPE = 100 \times \frac{1}{N}\mathop \sum \limits_{t = 1}^{N} \left| {\frac{{y_{t} - \hat{y}_{t} }}{{y_{t} }}} \right| $$
(18)
$$ R^{2} = 1 - \frac{{\mathop \sum \nolimits_{t = 1}^{N} \left( {y_{t} - \hat{y}_{t} } \right)^{2} }}{{\mathop \sum \nolimits_{t = 1}^{N} \left( {y_{t} - \overline{y}} \right)^{2} }} $$
(19)

where \(\overline{y}\) is the average of the sequence values \({y}_{t}\). Smaller values of RMSE, MAE and MAPE indicate a smaller deviation of the predicted values from the ground truth. The coefficient of determination \({R}^{2}\in (0, 1)\) represents the degree to which the model fits the data; an \({R}^{2}\) value close to 1 means the predictions are very close to the ground truth.
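The four metrics are direct transcriptions of Eqs. (16)-(19); `y` denotes the ground-truth values and `p` the predictions.

```python
import numpy as np

def rmse(y, p): return np.sqrt(np.mean((y - p) ** 2))                          # Eq. (16)
def mae(y, p):  return np.mean(np.abs(y - p))                                  # Eq. (17)
def mape(y, p): return 100 * np.mean(np.abs((y - p) / y))                      # Eq. (18)
def r2(y, p):   return 1 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)  # Eq. (19)
```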

In the univariate experiment, 9 models are selected for comparison with the proposed CEEMDAN-TCN, namely ARIMA, EA-LSTM, TreNet, BPNN, LSTM, TCN, EMD-LSTM, EMD-TCN and CEEMDAN-LSTM, among which EA-LSTM and TreNet are state-of-the-art models.

The evaluation metrics of the different models on the artificial data and TEC data are shown in Tables 3 and 4, respectively. Focusing on the RMSE, MAE and MAPE in Tables 3 and 4, it can be seen that ARIMA and BPNN were the poorest performers, while all combination models (the last four) achieved better prediction performance. Furthermore, the proposed CEEMDAN-TCN obtained the best forecast results among all ten prediction models in terms of every evaluation metric. It can therefore be concluded that CEEMDAN-TCN has a more stable and accurate prediction ability.

Table 3 Evaluation metrics of different models on artificial data
Table 4 Evaluation metrics of different models on TEC data

To analyze the cost of the proposed model, the numbers of model parameters are compared, as shown in Table 5. It can be seen that the TCN has fewer parameters than the LSTM, which indicates that the TCN is structurally simpler than the LSTM network. The number of parameters of CEEMDAN-TCN is 9 times that of TCN because the original data is decomposed into 9 sub-sequences by CEEMDAN, each of which is modeled separately. Considering its significant performance gain, the proposed CEEMDAN-TCN has an acceptable structural complexity.

Table 5 Parameters of different models on TEC data

Figure 12 shows the prediction curves and the bar chart of evaluation metrics on the artificial data. From Fig. 12, it can be seen that the prediction curves of all models fit the original data well, and the superiority of the proposed CEEMDAN-TCN is clearly visible.

Fig. 12 Prediction curves (upper) and bar chart (lower) of evaluation metrics on the artificial data (AR: ARIMA; EA: EA-LSTM; Tre: TreNet; EL: EMD-LSTM; ET: EMD-TCN; CL: CEEMDAN-LSTM; CT: CEEMDAN-TCN)

Figure 13 shows the short-term prediction curves of all models on the TEC data for a clearer assessment of performance, and Fig. 14 shows the prediction curves and the bar chart of evaluation metrics on the TEC data to visually demonstrate the accuracy of CEEMDAN-TCN. It can be seen from Figs. 13 and 14 that the prediction error of BPNN is the largest and its curve deviates from the actual data. Moreover, the prediction curves of the combination models, especially CEEMDAN-TCN, are closer to the actual data than those of the single models. Therefore, the combination models outperform the single models, and among all combination models the proposed CEEMDAN-TCN performs best.

Fig. 13 Short-term prediction curves on the TEC data

Fig. 14 Prediction curves (upper) and bar chart (lower) of evaluation metrics on the TEC data

4.5 Multivariate Experiment

There are 6 sequences in the AQ dataset, and the PM2.5 sequence is chosen as the target series for prediction. The CEEMDAN operation is carried out on each of the 6 sequences, yielding the corresponding IMF components and residuals; only the decomposition results for PM2.5 are shown in Fig. 15. The IMF components of the same order decomposed from the different sequences are combined into six 6-dimensional inputs, which are fed respectively into TCN models for training, and the final prediction result is obtained by summing the outputs for all inputs. Figure 16 shows the first-order IMF components of the 6 sequences of Nanchang; a sketch of this input construction follows Fig. 16.

Fig. 15 Results of CEEMDAN decomposition of AQ data of Nanchang

Fig. 16 First-order IMF components of the 6 AQ sequences of Nanchang
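A hedged sketch of this input construction is given below: CEEMDAN is run on each of the 6 AQ variables, and same-order IMFs are stacked into 6-channel inputs for the corresponding TCNs. The random matrix stands in for the real (T, 6) AQ data and is a placeholder assumption.

```python
import numpy as np
from PyEMD import CEEMDAN

aq = np.random.randn(2000, 6)                       # placeholder for the (T, 6) AQ matrix
ceemdan = CEEMDAN(trials=30)
per_var = [ceemdan(aq[:, v]) for v in range(6)]     # IMFs of each variable: (N_v, T)

n = min(m.shape[0] for m in per_var)                # common IMF order count
# one (6, T) multichannel input per IMF order, fed to one TCN each
inputs = [np.stack([m[j] for m in per_var]) for j in range(n)]
print(len(inputs), inputs[0].shape)                 # n  (6, 2000)
```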

In this experiment, RMSE, MAE, MAPE and \({\mathrm{R}}^{2}\) are again selected as the performance evaluation metrics, and 7 models are evaluated, namely LSTM, TCN, EMD-LSTM, EMD-TCN, CEEMDAN-LSTM, EA-LSTM and the proposed CEEMDAN-TCN. As can be seen in Table 6, the RMSE, MAE and MAPE of the proposed model are the smallest among these models and its \({\mathrm{R}}^{2}\) is the closest to 1, which indicates that the proposed model performs better than all the others. In addition, the fit of each model's predictions to the real data is drawn in Fig. 17; as can be seen intuitively, the fitting performance of the proposed CEEMDAN-TCN model is superior to that of all other models.

Table 6 Evaluation metrics of different models on AQ data

Fig. 17 Prediction curves (upper) and bar chart (lower) of evaluation metrics on the AQ data

5 Conclusion

In this paper, a novel hybrid model combining the CEEMDAN algorithm and the TCN is proposed. The CEEMDAN algorithm, an advanced data-adaptive time series processing technique, is applied to decompose the complicated time series into several relatively simple IMF components, facilitating deliberate analysis and accurate prediction. Meanwhile, considering the characteristics of time series, such as the causal constraint, the TCN, a powerful deep learning network, is introduced. Comparing CEEMDAN-LSTM and CEEMDAN-TCN with ARIMA, BPNN, LSTM, TCN, EMD-LSTM, EMD-TCN, EA-LSTM and TreNet shows that time series decomposed by CEEMDAN are more favorable to function fitting and subsequent model convergence. Comparing CEEMDAN-TCN with CEEMDAN-LSTM shows that the TCN preserves temporal features better than the LSTM. Moreover, the proposed model performs well in both univariate and multivariate prediction, and its number of parameters is acceptable, which reflects its flexibility and extensibility. Finally, the proposed model could be employed in other time series prediction tasks, such as the prediction of financial and weather sequence data.