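Time series models: strictly speaking, a time series has four components: Trend (T), Cycle (C), Seasonal (S), and an irregular component. In practice C and S often nearly coincide, so the model usually has three components.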

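A time series may contain T and S, and either one makes the data non-stationary: T makes the mean change over time, and S can make the variance change over time.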

Differencing is a widely used and popular method for making time series data stationary.

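The purpose of these notes is to understand:
  • the difference between stationary and non-stationary time series data,
  • what differencing is,
  • how to use differencing to remove a linear trend component from the data,
  • how to use differencing to remove a seasonal component from the data.
The notes are organized in four parts.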
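1) Stationarity
  1. What a stationary time series is: the observations do not depend on time.
    If the data are stationary, there is no T or S, and the mean and variance do not change over time. Many time series models are built on the assumption that the data are stationary.
  2. How to tell whether time series data are stationary:
    1) plot the data with plt and inspect it directly; 2) a more accurate method is the Dickey-Fuller statistical test (usually the augmented version, the ADF test, e.g. adfuller in statsmodels).

"If we fit a stationary model to data, we assume our data are a realization of a stationary process. So our first step in an analysis should be to check whether there is any evidence of a trend or seasonal effects and, if there is, remove them."
-- Page 122, Introductory Time Series with R.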
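
Between eyeballing a plot and running a formal test, a quick numeric check can help: split the series in half and compare the mean and variance of each half. The sketch below is only a rough heuristic, not a substitute for the ADF test; the helper name `split_summary` and both datasets are made up for illustration.

```python
from statistics import mean, variance


def split_summary(series):
    """Return (mean, variance) for the first and second half of a series."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return (mean(first), variance(first)), (mean(second), variance(second))


# Hypothetical data: one series with a linear trend, one that only
# repeats a short pattern around a fixed level.
trending = [i + 1 for i in range(20)]
flat = [5, 6, 5, 4] * 6

trend_first, trend_second = split_summary(trending)
flat_first, flat_second = split_summary(flat)

# For the trending series the two means are far apart.
print(trend_first, trend_second)
# For the flat series the two halves have the same mean and variance.
print(flat_first, flat_second)
```

A large gap between the two halves hints at non-stationarity; similar values do not prove stationarity, which is why a statistical test is still the more reliable option.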

2) The difference transform
  1. Differencing is a method for transforming time series data.

“Differencing can help stabilize the mean of the time series by removing changes in the level of a time series, and so eliminating (or reducing) trend and seasonality.”
-- Page 215, Forecasting: Principles and Practice.

First-order difference / lag-1 difference:
difference(t) = observation(t) - observation(t-1)
Inverting the difference:
inverted(t) = differenced(t) + observation(t-1)

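# Difference function
from pandas import Series

def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        # Subtract the observation `interval` steps earlier
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# Function that inverts the difference
def inverse_difference(last_ob, value):
    return value + last_ob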

Difference order     Suitable model                 Shape of the data trend
First-order          Linear (degree-1) model        [figure]
Second-order         Quadratic (degree-2) model     [figure]
Third-order          Cubic (degree-3) model         [figure]
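
The pairing of difference order and trend shape can be checked on small synthetic series. This is a minimal sketch in plain Python; `nth_order_difference` is a helper named here for illustration, and the three datasets are invented polynomial trends with no noise.

```python
def difference(dataset, interval=1):
    """Lag-`interval` difference of a list of numbers."""
    return [dataset[i] - dataset[i - interval] for i in range(interval, len(dataset))]


def nth_order_difference(dataset, order):
    """Apply the lag-1 difference `order` times."""
    result = list(dataset)
    for _ in range(order):
        result = difference(result)
    return result


linear = [2 * t + 1 for t in range(8)]
quadratic = [t ** 2 for t in range(8)]
cubic = [t ** 3 for t in range(8)]

print(nth_order_difference(linear, 1))     # [2, 2, 2, 2, 2, 2, 2]
print(nth_order_difference(quadratic, 1))  # [1, 3, 5, 7, 9, 11, 13], still trending
print(nth_order_difference(quadratic, 2))  # [2, 2, 2, 2, 2, 2]
print(nth_order_difference(cubic, 3))      # [6, 6, 6, 6, 6]
```

Each extra round of differencing lowers the degree of a polynomial trend by one, so a degree-d trend becomes a constant after d rounds. Real data carry noise, so the result will fluctuate around a constant rather than equal it.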

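In pandas, the first-order difference is df = df.diff(), the second-order difference is df = df.diff().diff(), and so on for higher orders. A lag-n difference is a different operation: it subtracts the value n steps earlier and is written df = df.diff(periods=n).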
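
Order and lag are easy to mix up, so here is a small sketch of the distinction using plain Python lists instead of pandas. The dataset is invented; the comments name the pandas calls that play the same role (pandas keeps NaN placeholders at the start of the result, while this list version simply drops those positions).

```python
def difference(dataset, interval=1):
    """Lag-`interval` difference of a list of numbers."""
    return [dataset[i] - dataset[i - interval] for i in range(interval, len(dataset))]


data = [1, 4, 9, 16, 25, 36]

# Second-order difference: difference the lag-1 differences again
# (the role played by Series.diff().diff()).
second_order = difference(difference(data))

# Lag-2 difference: subtract the value two steps earlier
# (the role played by Series.diff(periods=2)).
lag_two = difference(data, interval=2)

print(second_order)  # [2, 2, 2, 2]
print(lag_two)       # [8, 12, 16, 20]
```

Raising the order is what removes higher-degree trends; changing the lag is what targets a seasonal period.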

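3) Removing T with differencing

T makes a time series non-stationary because the mean is different at different times. Straight to an example: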

# First define a difference function
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return diff

# Then define a function that inverts the difference
def inverse_difference(last_ob, value):
    return value + last_ob

# Define a dataset with a linear trend
data = [i+1 for i in range(20)]
print(data)
# Difference the data
diff = difference(data)
print(diff)
# Invert the difference
inverted = [inverse_difference(data[i], diff[i]) for i in range(len(diff))]
print(inverted)

# Output
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
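
The inversion above adds each difference back to the actual previous observation. When only the first observation and the differences are available (for example, when the differences come from a model), the series can instead be rebuilt with a running sum. The sketch below assumes Python 3.8+ for the `initial` argument of `itertools.accumulate`; `invert_from_start` and the dataset are illustrative names and values.

```python
from itertools import accumulate


def difference(dataset, interval=1):
    """Lag-`interval` difference of a list of numbers."""
    return [dataset[i] - dataset[i - interval] for i in range(interval, len(dataset))]


def invert_from_start(first_ob, diffs):
    """Rebuild a series from its first observation and its lag-1 differences."""
    return list(accumulate(diffs, initial=first_ob))


data = [3, 5, 4, 8, 9, 13]
diff = difference(data)
restored = invert_from_start(data[0], diff)

print(diff)      # [2, -1, 4, 1, 4]
print(restored)  # [3, 5, 4, 8, 9, 13]
```

Because each restored value depends on the one before it, any error in an early difference carries through to every later value.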

4) Removing S with differencing

S (seasonal variation, or seasonality) is a fluctuation that recurs periodically over time.

"A repeating pattern within each year is known as seasonal variation, although the term is applied more generally to repeating patterns within any fixed period."
-- Page 6, Introductory Time Series with R.

An example:

from numpy import sin, radians
import matplotlib.pyplot as plt

def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return diff

def inverse_difference(last_ob, value):
    return value + last_ob

# Two identical sine cycles, so the seasonal period is 360 observations
data = [sin(radians(i)) for i in range(360)] + [sin(radians(i)) for i in range(360)]
# Seasonal differencing: subtract the value one full period earlier
diff = difference(data, 360)
# Invert the difference to recover the second cycle
inverted = [inverse_difference(data[i], diff[i]) for i in range(len(diff))]

fig, axes = plt.subplots(3, 1)
axes[0].plot(data)
axes[0].title.set_text('data')

axes[1].plot(diff)
axes[1].title.set_text('diff')

axes[2].plot(inverted)
axes[2].title.set_text('inverted')

plt.tight_layout()
plt.show()

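The output is a figure with three stacked panels: 'data' shows two full cycles of the sine wave, 'diff' is a flat line at zero because the seasonal component has been removed, and 'inverted' shows one restored cycle of the sine wave.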
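
A series can contain both T and S at once. One possible approach, sketched below on invented integer data, is to take a seasonal difference first and then a lag-1 difference. The period of 4 and the seasonal `pattern` are assumptions chosen so that every result is exact.

```python
def difference(dataset, interval=1):
    """Lag-`interval` difference of a list of numbers."""
    return [dataset[i] - dataset[i - interval] for i in range(interval, len(dataset))]


period = 4
pattern = [0, 3, 1, -2]  # hypothetical seasonal pattern, repeats every 4 steps

# Linear trend (2 per step) plus the repeating seasonal pattern.
data = [2 * t + pattern[t % period] for t in range(16)]

# Seasonal difference: the pattern cancels and the trend becomes a constant.
seasonal_diff = difference(data, interval=period)

# Lag-1 difference of the result: the remaining constant level is removed.
both_removed = difference(seasonal_diff)

print(seasonal_diff)  # [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
print(both_removed)   # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With a purely linear trend the seasonal difference alone already yields a constant (slope times period, here 2 * 4 = 8), so the extra lag-1 difference is only needed if that constant level should also be removed.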