StemGNN
文章目录
- StemGNN
- 一、项目背景和概念介绍
- 动机:
- 二、应用场景
- 2.1 问题定义
- 三、前提知识
- 自注意力机制
- 四、 项目概念
- 4.1 关于数据
- 4.2 模型架构
- 4.3 自定义参数
- 五、 代码部分
- 总括
- 1、main.py(控制调用顺序和主体框架)
- 2、handler.py(训练、测试、验证等关键环节)
- (1)训练函数
- (2)验证函数
- (3)测试函数
- (4)
- (5)保存模型
- (6)加载模型
- 3、forecast_dataloader.py(数据集处理)
- (1)ForecastDataset类
- 4、base_model.py(模型设计)
- (1)class Model(nn.Module):
- (2)class StockBlockLayer(nn.Module):
- (3)class GLU(nn.Module):
一、项目背景和概念介绍
动机:
对于每一支股票:用t时刻收盘价预测t+1时刻的收盘价
这些所有的股票价格序列之间是否存在某种联系?
首先得到一个先验矩阵(dependency graph),他们认为事先计算出的东西可以用另一种方式代替,在神经网络可以自动学出这种关系。
时间和空间维度能否转到谱的维度上,进行卷积计算?
GFT是将空间输入转到谱域
DFT单变量的时间序列转移到谱域(为什么会出现单变量?因为图经过GFT就变成了一个值)
二、应用场景
谱域GCN:时间+空域——>傅里叶变换——>谱域
多变量依赖问题
2.1 问题定义
给了一系列的时间预测数据X
X = {xit} 是属于RN*T是一个时间序列的
T是时间戳长度,N是节点或者特征的个数(认为节点更方便)
通过一个方法来自动生成N个节点的相互关系——>W:维度是N*N的(adjacency matrix)
工作内容是,利用t-K–>t-1的序列,预测t -->t+H-1的序列
三、前提知识
自注意力机制
来自于NLP中的Transformer
四、 项目概念
4.1 关于数据
训练/测试数据 ECG_data 140列(属性),n行代表的是n个时间戳
模型输入
数据集预处理
4.2 模型架构
Input可以是多个属性的时间序列
Laten Correlation Layer中可以自己学习出一个权重矩阵W,通过W其实就构成了一个图,(矩阵中的每一个值表示两点之间连边的权重)
StemGNNBlock1展开为上面的一张图, 先经过GFT和DFT(时域加空域转为谱域),谱域数据通过GConv和IGFT最后产生了预测值
GFT就是图傅里叶变换
IGFT就是逆傅里叶变换
可以看到在两个StemGNNBlock1之间,StemGNNBlock2接受的X实际上是X-X‘的输入,类似于残差的效果
最后的输出为X1+X2和Y1+Y2
4.3 自定义参数
windowsize:用t时刻前多少(int)个时间戳作为输入数据
horizon:未来预测多少天(int)的数据
五、 代码部分
总括
这是整个项目的结构,可以通过文件夹的名字了解到各个部分的功能。
1、main.py(控制调用顺序和主体框架)
main函数的功能是规定参数、数据集处理、创建和训练模型
第一部分如下,就是初始化参数
parser = argparse.ArgumentParser()
parser.add_argument('--train', type=bool, default=True)
parser.add_argument('--evaluate', type=bool, default=True)
parser.add_argument('--dataset', type=str, default='ECG_data')
parser.add_argument('--window_size', type=int, default=12)# 用t时刻前多少(int)个时间戳作为输入数据
parser.add_argument('--horizon', type=int, default=3)# 未来预测多少天(int)的数据
parser.add_argument('--train_length', type=float, default=7)
parser.add_argument('--valid_length', type=float, default=2)
parser.add_argument('--test_length', type=float, default=1)
parser.add_argument('--epoch', type=int, default=50)
parser.add_argument('--lr', type=float, default=1e-4)
parser.add_argument('--multi_layer', type=int, default=5)
parser.add_argument('--device', type=str, default='cpu')
parser.add_argument('--validate_freq', type=int, default=1)
parser.add_argument('--batch_size', type=int, default=32)
parser.add_argument('--norm_method', type=str, default='z_score')
parser.add_argument('--optimizer', type=str, default='RMSProp')
parser.add_argument('--early_stop', type=bool, default=False)
parser.add_argument('--exponential_decay_step', type=int, default=5)
parser.add_argument('--decay_rate', type=float, default=0.5)
parser.add_argument('--dropout_rate', type=float, default=0.5)
parser.add_argument('--leakyrelu_rate', type=int, default=0.2)
args = parser.parse_args()
print(f'Training configs: {args}')
接下来是读入数据
data_file = os.path.join('dataset', args.dataset + '.csv')
result_train_file = os.path.join('output', args.dataset, 'train')
result_test_file = os.path.join('output', args.dataset, 'test')
if not os.path.exists(result_train_file):
os.makedirs(result_train_file)
if not os.path.exists(result_test_file):
os.makedirs(result_test_file)
data = pd.read_csv(data_file).values
分割数据,产生训练数据、验证数据和测试数据
# split data
train_ratio = args.train_length / (args.train_length + args.valid_length + args.test_length)
valid_ratio = args.valid_length / (args.train_length + args.valid_length + args.test_length)
test_ratio = 1 - train_ratio - valid_ratio
train_data = data[:int(train_ratio * len(data))]
valid_data = data[int(train_ratio * len(data)):int((train_ratio + valid_ratio) * len(data))]
test_data = data[int((train_ratio + valid_ratio) * len(data)):]
torch.manual_seed(0)
最后一部分是执行训练和测试
if __name__ == '__main__':
if args.train:
try:
before_train = datetime.now().timestamp()
_, normalize_statistic = train(train_data, valid_data, args, result_train_file) # 开始训练
after_train = datetime.now().timestamp()
print(f'Training took {(after_train - before_train) / 60} minutes')
except KeyboardInterrupt:
print('-' * 99)
print('Exiting from training early')
if args.evaluate:
before_evaluation = datetime.now().timestamp()
test(test_data, args, result_train_file, result_test_file)
after_evaluation = datetime.now().timestamp()
print(f'Evaluation took {(after_evaluation - before_evaluation) / 60} minutes')
print('done')
2、handler.py(训练、测试、验证等关键环节)
包含以下6个函数:
(1)训练函数
这是我们在main.py中最先会调用的部分
def train(train_data, valid_data, args, result_file):
首先构建模型
node_cnt = train_data.shape[1] # 节点数或属性(140)
model = Model(node_cnt, 2, args.window_size, args.multi_layer, horizon=args.horizon)
model.to(args.device)
if len(train_data) == 0:
raise Exception('Cannot organize enough training data')
if len(valid_data) == 0:
raise Exception('Cannot organize enough validation data')
归一化并存储
# 下面是归一化步骤,对每一列的属性时间序列计算均值和方差,并保存到norm_stat.json中
if args.norm_method == 'z_score':
train_mean = np.mean(train_data, axis=0)
train_std = np.std(train_data, axis=0)
normalize_statistic = {"mean": train_mean.tolist(), "std": train_std.tolist()}
elif args.norm_method == 'min_max':
train_min = np.min(train_data, axis=0)
train_max = np.max(train_data, axis=0)
normalize_statistic = {"min": train_min.tolist(), "max": train_max.tolist()}
else:
normalize_statistic = None
if normalize_statistic is not None:
with open(os.path.join(result_file, 'norm_stat.json'), 'w') as f:
json.dump(normalize_statistic, f)
定义优化器
if args.optimizer == 'RMSProp':
my_optim = torch.optim.RMSprop(params=model.parameters(), lr=args.lr, eps=1e-08)
else:
my_optim = torch.optim.Adam(params=model.parameters(), lr=args.lr, betas=(0.9, 0.999))
my_lr_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer=my_optim, gamma=args.decay_rate)
利用ForecastDataset对数据进行实质的划分
关于class ForecastDataset,会在forecast_dataloader.py进行介绍
然后定义loss函数为MSELoss
# 进行数据的划分
train_set = ForecastDataset(train_data, window_size=args.window_size, horizon=args.horizon,
normalize_method=args.norm_method, norm_statistic=normalize_statistic)
valid_set = ForecastDataset(valid_data, window_size=args.window_size, horizon=args.horizon,
normalize_method=args.norm_method, norm_statistic=normalize_statistic)
train_loader = torch_data.DataLoader(train_set, batch_size=args.batch_size, drop_last=False, shuffle=True,
num_workers=0)
valid_loader = torch_data.DataLoader(valid_set, batch_size=args.batch_size, shuffle=False, num_workers=0)
forecast_loss = nn.MSELoss(reduction='mean').to(args.device)
total_params = 0
for name, parameter in model.named_parameters():
if not parameter.requires_grad: continue
param = parameter.numel()
total_params += param
print(f"Total Trainable Params: {total_params}")
best_validate_mae = np.inf # 存下最小的mae结果
validate_score_non_decrease_count = 0 # 累计多少次验证集得分不降低就终止,防止过拟合
performance_metrics = {}
接下来开始训练,进行epoch次,对于每一批batch对数据进行训练
如果本次训练产生的误差是小于之前的最好的一次训练,则更新矩阵
for epoch in range(args.epoch):
epoch_start_time = time.time()
model.train()
loss_total = 0
cnt = 0
for i, (inputs, target) in enumerate(train_loader):
# inputs.shape = [32,12,140]
# target.shape = [32,3,140]
# 其中32是每次batch的大小
inputs = inputs.to(args.device)
target = target.to(args.device)
model.zero_grad() # 不同的batch不要累积梯度
forecast, _ = model(inputs)
loss = forecast_loss(forecast, target)
cnt += 1
loss.backward()
my_optim.step()
loss_total += float(loss)
print('| end of epoch {:3d} | time: {:5.2f}s | train_total_loss {:5.4f}'.format(epoch, (
time.time() - epoch_start_time), loss_total / cnt))
save_model(model, result_file, epoch)
if (epoch+1) % args.exponential_decay_step == 0:
my_lr_scheduler.step()
if (epoch + 1) % args.validate_freq == 0:
is_best_for_now = False
print('------ validate on data: VALIDATE ------')
performance_metrics = \
validate(model, valid_loader, args.device, args.norm_method, normalize_statistic,
node_cnt, args.window_size, args.horizon,
result_file=result_file)
if best_validate_mae > performance_metrics['mae']:
best_validate_mae = performance_metrics['mae']
is_best_for_now = True
validate_score_non_decrease_count = 0
else:
validate_score_non_decrease_count += 1
# save model
if is_best_for_now:
save_model(model, result_file)
# early stop
if args.early_stop and validate_score_non_decrease_count >= args.early_stop_step:
break
return performance_metrics, normalize_statistic
(2)验证函数
def validate(model, dataloader, device, normalize_method, statistic,
node_cnt, window_size, horizon,
result_file=None):
(3)测试函数
def test(test_data, args, result_train_file, result_test_file):
(4)
def inference(model, dataloader, device, node_cnt, window_size, horizon):
(5)保存模型
def save_model(model, model_dir, epoch=None):
(6)加载模型
def load_model(model_dir, epoch=None):
3、forecast_dataloader.py(数据集处理)
(1)ForecastDataset类
不作详细介绍,__getitem__是根据传入的参数得到划分出的训练集和测试集
get_x_end_idx是得到所有区间开始位置的索引
class ForecastDataset(torch_data.Dataset):
def __init__(self, df, window_size, horizon, normalize_method=None, norm_statistic=None, interval=1):
self.window_size = window_size
self.interval = interval
self.horizon = horizon
self.normalize_method = normalize_method
self.norm_statistic = norm_statistic
df = pd.DataFrame(df)
df = df.fillna(method='ffill', limit=len(df)).fillna(method='bfill', limit=len(df)).values
self.data = df
self.df_length = len(df)
self.x_end_idx = self.get_x_end_idx()
if normalize_method:
self.data, _ = normalized(self.data, normalize_method, norm_statistic)
def __getitem__(self, index):
hi = self.x_end_idx[index]
lo = hi - self.window_size
train_data = self.data[lo: hi]
target_data = self.data[hi:hi + self.horizon]
x = torch.from_numpy(train_data).type(torch.float)
y = torch.from_numpy(target_data).type(torch.float)
return x, y
def __len__(self):
return len(self.x_end_idx)
def get_x_end_idx(self):
# each element `hi` in `x_index_set` is an upper bound for get training data
# training data range: [lo, hi), lo = hi - window_size
# 从window_size开始(因为第一组预测是在window_size的下一个),到length-horizon+1结束,因为最后一组预测的第一个元素的位置就是self.df_length - self.horizon,因为有预测宽度
x_index_set = range(self.window_size, self.df_length - self.horizon + 1)
# 这里x_end_idx存储着所有从开始到结束的索引
x_end_idx = [x_index_set[j * self.interval] for j in range((len(x_index_set)) // self.interval)]
return x_end_idx
4、base_model.py(模型设计)
便于理解,将框图再次放到这里
(1)class Model(nn.Module):
初始化部分
class Model(nn.Module):
def __init__(self, units, stack_cnt, time_step, multi_layer, horizon=1, dropout_rate=0.5, leaky_rate=0.2,
device='cpu'):
super(Model, self).__init__()
self.unit = units # 输出的维度,一般也是140,因为要算140*140的矩阵
self.stack_cnt = stack_cnt # StemGNNBlock的块数(2)
self.alpha = leaky_rate
self.time_step = time_step # 引用窗口大小
self.horizon = horizon # 预测窗口大小
self.weight_key = nn.Parameter(torch.zeros(size=(self.unit, 1))) # 自注意力机制的Wk
nn.init.xavier_uniform_(self.weight_key.data, gain=1.414)
self.weight_query = nn.Parameter(torch.zeros(size=(self.unit, 1))) # 自注意力机制的Wq
nn.init.xavier_uniform_(self.weight_query.data, gain=1.414)
# 直接调用GRU模型
self.GRU = nn.GRU(self.time_step, self.unit)
self.multi_layer = multi_layer
self.stock_block = nn.ModuleList()
# 初始化两个StemGNNBlock
self.stock_block.extend(
[StockBlockLayer(self.time_step, self.unit, self.multi_layer, stack_cnt=i) for i in range(self.stack_cnt)])
self.fc = nn.Sequential(
nn.Linear(int(self.time_step), int(self.time_step)),
nn.LeakyReLU(),
nn.Linear(int(self.time_step), self.horizon),
)
self.leakyrelu = nn.LeakyReLU(self.alpha)
self.dropout = nn.Dropout(p=dropout_rate)
self.to(device)
def get_laplacian(self, graph, normalize):
"""
return the laplacian of the graph.
:param graph: the graph structure without self loop, [N, N].
:param normalize: whether to used the normalized laplacian.
:return: graph laplacian.
"""
if normalize:
D = torch.diag(torch.sum(graph, dim=-1) ** (-1 / 2))
L = torch.eye(graph.size(0), device=graph.device, dtype=graph.dtype) - torch.mm(torch.mm(D, graph), D)
else:
D = torch.diag(torch.sum(graph, dim=-1))
L = D - graph
return L
返回出切比雪夫多项式
def cheb_polynomial(self, laplacian):
"""
Compute the Chebyshev Polynomial, according to the graph laplacian.
:param laplacian: the graph laplacian, [N, N].
:return: the multi order Chebyshev laplacian, [K, N, N].
"""
N = laplacian.size(0) # [N, N]
laplacian = laplacian.unsqueeze(0)
first_laplacian = torch.zeros([1, N, N], device=laplacian.device, dtype=torch.float)
second_laplacian = laplacian
third_laplacian = (2 * torch.matmul(laplacian, second_laplacian)) - first_laplacian
forth_laplacian = 2 * torch.matmul(laplacian, third_laplacian) - second_laplacian
multi_order_laplacian = torch.cat([first_laplacian, second_laplacian, third_laplacian, forth_laplacian], dim=0)
return multi_order_laplacian
下面这个部分对数据进行处理,然后用GRU进行时序学习,进而通过自注意力机制(self_graph_attention)得到attention
def latent_correlation_layer(self, x):
# 原本[32, 12, 140]的输入变成了[140, 32, 12]的数据
# 因为我们要用GRU对时间序列学习,所以最后的维度应该是时间的序列
input, _ = self.GRU(x.permute(2, 0, 1).contiguous())
input = input.permute(1, 0, 2).contiguous()
# 下一步调用本模型函数得到attention
attention = self.self_graph_attention(input)
# attention.shape:[32,140,140]
attention = torch.mean(attention, dim=0)
# attention.shape:[140,140]
degree = torch.sum(attention, dim=1)
# degree.shape:[140]
# laplacian is sym or not
# 接下来产生一个对称的attention
attention = 0.5 * (attention + attention.T)
degree_l = torch.diag(degree)
# 返回一个以度的逆为对角线的二维矩阵,
diagonal_degree_hat = torch.diag(1 / (torch.sqrt(degree) + 1e-7))
# 对度的逆矩阵开根号
laplacian = torch.matmul(diagonal_degree_hat,
torch.matmul(degree_l - attention, diagonal_degree_hat))
# 得到拉普拉斯矩阵
mul_L = self.cheb_polynomial(laplacian)
#利用切比雪夫公式得到四阶的拉普拉斯矩阵
# mul_L的维度为[4, 140, 140]
# 返回到forward函数中
return mul_L, attention
def self_graph_attention(self, input):
input = input.permute(0, 2, 1).contiguous()
bat, N, fea = input.size()
key = torch.matmul(input, self.weight_key)
# self.weight_key.shape:[140,1]
# key.shape:[32,140,1]
query = torch.matmul(input, self.weight_query)
# repeat就是将每个对应维度长度乘倍数,下面就是将第三个维度乘N
# key就变成了[32,140,140]的维度
data = key.repeat(1, 1, N).view(bat, N * N, 1) + query.repeat(1, N, 1)
# data.shape:[32,140*140,1]
data = data.squeeze(2)
data = data.view(bat, N, -1)
data = self.leakyrelu(data)
# data.shape:[32,140,140]
attention = F.softmax(data, dim=2)
attention = self.dropout(attention)
# attention.shape:[32,140,140]
return attention
def graph_fft(self, input, eigenvectors):
return torch.matmul(eigenvectors, input)
def forward(self, x):
mul_L, attention = self.latent_correlation_layer(x)
X = x.unsqueeze(1).permute(0, 1, 3, 2).contiguous()
result = []
# 开始进入到两层的StockBlockLayer中,传入X和四阶拉欧拉斯矩阵
for stack_i in range(self.stack_cnt):
forecast, X = self.stock_block[stack_i](X, mul_L)
result.append(forecast)
forecast = result[0] + result[1]
forecast = self.fc(forecast)
if forecast.size()[-1] == 1:
return forecast.unsqueeze(1).squeeze(-1), attention
else:
return forecast.permute(0, 2, 1).contiguous(), attention
(2)class StockBlockLayer(nn.Module):
def __init__(self, time_step, unit, multi_layer, stack_cnt=0):
super(StockBlockLayer, self).__init__()
self.time_step = time_step
self.unit = unit
self.stack_cnt = stack_cnt
self.multi = multi_layer
self.weight = nn.Parameter(
torch.Tensor(1, 3 + 1, 1, self.time_step * self.multi,
self.multi * self.time_step)) # [K+1, 1, in_c, out_c]
nn.init.xavier_normal_(self.weight)
self.forecast = nn.Linear(self.time_step * self.multi, self.time_step * self.multi)
self.forecast_result = nn.Linear(self.time_step * self.multi, self.time_step)
if self.stack_cnt == 0:
self.backcast = nn.Linear(self.time_step * self.multi, self.time_step)
self.backcast_short_cut = nn.Linear(self.time_step, self.time_step)
self.relu = nn.ReLU()
self.GLUs = nn.ModuleList()
self.output_channel = 4 * self.multi
for i in range(3):
if i == 0:
self.GLUs.append(GLU(self.time_step * 4, self.time_step * self.output_channel))
self.GLUs.append(GLU(self.time_step * 4, self.time_step * self.output_channel))
elif i == 1:
self.GLUs.append(GLU(self.time_step * self.output_channel, self.time_step * self.output_channel))
self.GLUs.append(GLU(self.time_step * self.output_channel, self.time_step * self.output_channel))
else:
self.GLUs.append(GLU(self.time_step * self.output_channel, self.time_step * self.output_channel))
self.GLUs.append(GLU(self.time_step * self.output_channel, self.time_step * self.output_channel))
def spe_seq_cell(self, input):
batch_size, k, input_channel, node_cnt, time_step = input.size()
input = input.view(batch_size, -1, node_cnt, time_step)
ffted = torch.rfft(input, 1, onesided=False)
real = ffted[..., 0].permute(0, 2, 1, 3).contiguous().reshape(batch_size, node_cnt, -1)
img = ffted[..., 1].permute(0, 2, 1, 3).contiguous().reshape(batch_size, node_cnt, -1)
for i in range(3):
real = self.GLUs[i * 2](real)
img = self.GLUs[2 * i + 1](img)
real = real.reshape(batch_size, node_cnt, 4, -1).permute(0, 2, 1, 3).contiguous()
img = img.reshape(batch_size, node_cnt, 4, -1).permute(0, 2, 1, 3).contiguous()
time_step_as_inner = torch.cat([real.unsqueeze(-1), img.unsqueeze(-1)], dim=-1)
iffted = torch.irfft(time_step_as_inner, 1, onesided=False)
return iffted
输入的X和mul_L相乘
x.shape:[32,1,140,12]
mul_L.shape:[4,140,140]
gfted.shape:[32,4,1,140,12]
def forward(self, x, mul_L):
mul_L = mul_L.unsqueeze(1)
# mul_L.shape:[4,1,140,140]
x = x.unsqueeze(1)
# x.shape:[32,1,1,140,12]
gfted = torch.matmul(mul_L, x)
gconv_input = self.spe_seq_cell(gfted).unsqueeze(2)
igfted = torch.matmul(gconv_input, self.weight)
igfted = torch.sum(igfted, dim=1)
forecast_source = torch.sigmoid(self.forecast(igfted).squeeze(1))
forecast = self.forecast_result(forecast_source)
if self.stack_cnt == 0:
backcast_short = self.backcast_short_cut(x).squeeze(1)
backcast_source = torch.sigmoid(self.backcast(igfted) - backcast_short)
else:
backcast_source = None
return forecast, backcast_source
(3)class GLU(nn.Module):
门控单元:
作用:1. 序列深度建模 2. 减轻梯度弥散,加速收敛
class GLU(nn.Module):
def __init__(self, input_channel, output_channel):
super(GLU, self).__init__()
self.linear_left = nn.Linear(input_channel, output_channel)
self.linear_right = nn.Linear(input_channel, output_channel)
def forward(self, x):
return torch.mul(self.linear_left(x), torch.sigmoid(self.linear_right(x)))