pytorch-3
- Data Loading and Preprocessing
- Creating a Dataset Class
- DataLoader
- The Neural Network Package
- nn.Module
- nn.functional
- Model Containers (Containers)
- Weight Initialization
- Network Construction
- Optimizers
- Loss Functions
Data Loading and Preprocessing
PyTorch wraps data loading in torch.utils.data, which makes it easy to implement parallel (multi-process) data prefetching and batched loading.
DataLoader
torch.utils.data.DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0,
    drop_last=False)
dataset: a Dataset instance that determines where and how the data is read
batch_size: number of samples per batch
num_workers: number of worker processes used to load the data (0 means loading in the main process)
shuffle: whether to reshuffle the data at every epoch
drop_last: whether to drop the last incomplete batch when the number of samples is not divisible by batch_size
Dataset
class Dataset(object):
    def __getitem__(self, index):
        raise NotImplementedError

    def __add__(self, other):
        return ConcatDataset([self, other])
Dataset is an abstract class; every custom dataset must inherit from it and override __getitem__().
__getitem__()
__getitem__: receives an index and returns the corresponding sample.
Epoch: one complete pass of all samples through the model.
Iteration: one batch of samples passed through the model.
Batch size: the number of samples per batch; it determines how many iterations make up one epoch.
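A quick sanity check of that relationship (a minimal sketch; the numbers are made up for illustration):
import math

num_samples = 100   # hypothetical dataset size
batch_size = 8      # hypothetical batch size
print(num_samples // batch_size)            # 12 iterations per epoch if drop_last=True
print(math.ceil(num_samples / batch_size))  # 13 iterations per epoch if drop_last=False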
Creating a Dataset Class
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data_path, data, label):
        self.data_path = data_path
        self.data = data
        self.label = label

    def __len__(self):
        return len(self.label)

    def __getitem__(self, index):
        return self.data[index], self.label[index]
DataLoader
DataLoader wraps a Dataset and returns an iterable, so the data can be read batch by batch through an iterator.
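A minimal usage sketch, assuming the MyDataset class defined above and some toy in-memory tensors (made up for illustration):
import torch
from torch.utils.data import DataLoader

# toy data: 100 samples with 10 features each, plus 100 integer labels
data = torch.randn(100, 10)
label = torch.randint(0, 2, (100,))
dataset = MyDataset(data_path=None, data=data, label=label)

loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=0, drop_last=True)
print(len(loader))  # 12 batches per epoch (100 // 8, last incomplete batch dropped)

for batch_data, batch_label in loader:
    print(batch_data.shape, batch_label.shape)  # torch.Size([8, 10]) torch.Size([8])
    break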
The Neural Network Package
torch.nn is a modular interface designed specifically for neural networks. nn is built on top of autograd and can be used to define and run networks.
nn.Module
nn.Module maintains several internal ordered dictionaries:
parameters: stores and manages nn.Parameter objects
modules: stores and manages submodules (nn.Module objects)
buffers: stores and manages buffer attributes (e.g. the running statistics of BatchNorm)
***_hooks: store and manage hook functions
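A short sketch of how these collections can be inspected on a small module (the layer sizes are arbitrary):
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU())

for name, p in net.named_parameters():   # entries of the parameters dict
    print(name, p.shape)
for name, b in net.named_buffers():      # entries of the buffers dict (BatchNorm running stats)
    print(name, b.shape)
for name, m in net.named_modules():      # entries of the modules dict
    print(name, type(m).__name__)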
nn.functional
nn.functional contains common functions used in neural networks; these functions have no learnable parameters.
import torch.nn.functional as F
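For instance, parameter-free operations such as activations can be called either as modules or as plain functions; a small sketch:
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 5)
out_module = nn.ReLU()(x)   # module form: holds no parameters, but fits into containers
out_func = F.relu(x)        # functional form: a plain stateless function call
print(torch.equal(out_module, out_func))  # True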
Model Containers (Containers)
nn.Sequential is a container for nn.Module that wraps a group of layers and runs them in order.
Ordered: the layers are built and executed strictly in the given order
Built-in forward(): the provided forward() runs the forward pass by looping over the layers in sequence
nn.ModuleList is a container for nn.Module that wraps a group of layers and calls them by iteration.
append(): appends a layer at the end of the ModuleList
extend(): concatenates two ModuleLists
insert(): inserts a layer at a given position in the ModuleList
nn.ModuleDict is a container for nn.Module that wraps a group of layers and calls them by key.
clear(): empties the ModuleDict
items(): returns an iterable of key-value pairs
keys(): returns the keys of the dict
values(): returns the values of the dict
pop(key): removes the given key from the dict and returns the corresponding module
Container summary (a combined sketch follows this list):
- nn.Sequential: ordered; layers run strictly in sequence; commonly used to build blocks
- nn.ModuleList: iterable; commonly used to build many repeated layers via a for loop
- nn.ModuleDict: indexable; commonly used for selectable layers
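A minimal sketch of the three containers in one module (the layer sizes and keys are arbitrary):
import torch
import torch.nn as nn

class ContainerDemo(nn.Module):
    def __init__(self):
        super(ContainerDemo, self).__init__()
        # Sequential: fixed order, built-in forward
        self.block = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
        # ModuleList: repeated layers, iterated manually in forward
        self.layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])
        # ModuleDict: selectable layers, indexed by key
        self.acts = nn.ModuleDict({'relu': nn.ReLU(), 'tanh': nn.Tanh()})

    def forward(self, x, act='relu'):
        x = self.block(x)
        for layer in self.layers:
            x = layer(x)
        return self.acts[act](x)

net = ContainerDemo()
print(net(torch.randn(2, 8), act='tanh').shape)  # torch.Size([2, 8])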
Weight Initialization
The weight initialization methods live in torch.nn.init. In practice, create the layer first and then call the init function directly on its parameters.
- Constant initialization
import torch.nn as nn
nn.init.constant_(w, a)  # w is a weight tensor of the network, a is the constant
- Uniform distribution
import torch.nn as nn
nn.init.uniform_(tensor, a=0, b=1)
- Normal distribution
import torch.nn as nn
nn.init.normal_(tensor, mean=0, std=1)
- Xavier initialization
The basic idea is to keep the variance of the activations the same across layers, for both the forward and the backward pass.
PyTorch provides a uniform and a normal variant of Xavier initialization.
import torch.nn as nn
nn.init.xavier_uniform_(tensor, gain=1)  # uniform variant
nn.init.xavier_normal_(tensor, gain=1)   # normal variant
- Kaiming (He) initialization
import torch.nn as nn
nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')  # uniform variant
nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')   # normal variant
Example:
# a single layer
import torch
layer1 = torch.nn.Linear(10, 20)
torch.nn.init.xavier_uniform_(layer1.weight)
torch.nn.init.constant_(layer1.bias, 0)

# initializing a whole model with apply()
def weight_init(m):
    classname = m.__class__.__name__       # class name of the layer, e.g. ConvTranspose2d
    if classname.find('Conv') != -1:       # find() returns -1 when the substring is absent
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

model = Net()              # Net stands for whatever nn.Module subclass is being built
model.apply(weight_init)   # apply() calls weight_init recursively on every submodule
Network Construction
Building a 1D Convolutional Network
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyNet(nn.Module):
    def __init__(self, config):
        super(MyNet, self).__init__()
        self.config = config
        if config.pretrained:
            self.emb = nn.Embedding.from_pretrained(config.emb)
        else:
            self.emb = nn.Embedding(config.vocab_len, config.emb_size)
        self.conv1 = nn.Conv1d(in_channels=config.emb_size,
                               out_channels=config.out_channels,
                               kernel_size=config.kernel,
                               stride=1)
        self.maxpool = nn.MaxPool1d(kernel_size=config.length - config.kernel + 1, stride=1)
        self.avgpool = nn.AvgPool1d(kernel_size=config.length - config.kernel + 1, stride=1)
        self.fc = nn.Linear(config.out_channels, config.classes)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        x_emb = self.emb(x)                      # (batch, length, emb_size)
        x_emb = x_emb.permute(0, 2, 1)           # (batch, emb_size, length) for Conv1d
        x_out = self.conv1(x_emb)                # (batch, out_channels, length - kernel + 1)
        x_out = self.maxpool(x_out)              # or self.avgpool(x_out); (batch, out_channels, 1)
        x_out = self.dropout(x_out.squeeze(-1))  # (batch, out_channels)
        x_out = self.fc(x_out)                   # (batch, classes)
        return x_out
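A hedged usage sketch, assuming a simple namespace-style config with the attribute names used above (all values are made up for illustration; the same kind of instantiation applies to the 2D-convolution and RNN variants below):
import torch
from types import SimpleNamespace

config = SimpleNamespace(pretrained=False, emb=None, vocab_len=5000, emb_size=128,
                         out_channels=64, kernel=3, length=50, classes=2)
model = MyNet(config)
tokens = torch.randint(0, config.vocab_len, (4, config.length))  # a batch of 4 token sequences
logits = model(tokens)
print(logits.shape)  # torch.Size([4, 2])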
2D Convolutional Network
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyNet(nn.Module):
    def __init__(self, config):
        super(MyNet, self).__init__()
        self.config = config
        if config.pretrained:
            self.emb = nn.Embedding.from_pretrained(config.emb)
        else:
            self.emb = nn.Embedding(config.vocab_len, config.emb_size)
        self.conv2 = nn.Conv2d(in_channels=1,
                               out_channels=config.filter_num,
                               kernel_size=(config.filter_size, config.emb_size),
                               stride=1)
        self.maxpool = nn.MaxPool1d(kernel_size=config.length - config.filter_size + 1, stride=1)
        self.avgpool = nn.AvgPool1d(kernel_size=config.length - config.filter_size + 1, stride=1)
        self.fc = nn.Linear(config.filter_num, config.classes)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        x_emb = self.emb(x)                      # (batch, length, emb_size)
        x_emb = x_emb.unsqueeze(1)               # (batch, 1, length, emb_size) for Conv2d
        x_out = self.conv2(x_emb)                # (batch, filter_num, length - filter_size + 1, 1)
        x_out = x_out.squeeze(3)                 # (batch, filter_num, length - filter_size + 1)
        x_out = self.maxpool(x_out)              # or self.avgpool(x_out); (batch, filter_num, 1)
        x_out = self.dropout(x_out.squeeze(-1))  # (batch, filter_num)
        x_out = self.fc(x_out)                   # (batch, classes)
        return x_out
Recurrent Neural Network
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyNet(nn.Module):
    def __init__(self, config):
        super(MyNet, self).__init__()
        self.config = config
        if config.pretrained:
            self.emb = nn.Embedding.from_pretrained(config.emb)
        else:
            self.emb = nn.Embedding(config.vocab_len, config.emb_size)
        if config.lstm:
            self.rnn = nn.LSTM(input_size=config.emb_size,
                               hidden_size=config.hidden_size,
                               bidirectional=config.bidirectional,
                               batch_first=True,
                               num_layers=config.lstm_layers)
        else:
            self.rnn = nn.GRU(input_size=config.emb_size,
                              hidden_size=config.hidden_size,
                              bidirectional=config.bidirectional,
                              batch_first=True,
                              num_layers=config.lstm_layers)
        fc_in = config.hidden_size * 2 if config.bidirectional else config.hidden_size
        self.fc = nn.Linear(fc_in, config.classes)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        x_emb = self.emb(x)                    # (batch, length, emb_size)
        x_out, _ = self.rnn(x_emb)             # (batch, length, hidden_size * num_directions)
        x_out = self.dropout(x_out[:, -1, :])  # take the last time step
        x_out = self.fc(x_out)                 # (batch, classes)
        return x_out
Transformer
import torch
import torch.nn as nn
import torch.nn.functional as F
import copy

class Positional_Encoding(nn.Module):
    def __init__(self, config):
        super(Positional_Encoding, self).__init__()
        pe = torch.tensor([[pos / (10000.0 ** (i // 2 * 2.0 / config.embed_size))
                            for i in range(config.embed_size)]
                           for pos in range(config.pad_size)])
        pe[:, 0::2] = torch.sin(pe[:, 0::2])  # even dimensions use sin
        pe[:, 1::2] = torch.cos(pe[:, 1::2])  # odd dimensions use cos
        self.register_buffer('pe', pe)        # fixed, non-learnable positional table
        self.dropout = nn.Dropout(config.dropout)

    def forward(self, x):
        out = x + self.pe                     # add positional encoding to the embeddings
        out = self.dropout(out)
        return out
class Scaled_Dot_Product_Attention(nn.Module):
    def __init__(self):
        super(Scaled_Dot_Product_Attention, self).__init__()

    def forward(self, Q, K, V, scale=None):
        attention = torch.matmul(Q, K.permute(0, 2, 1))
        if scale:
            attention = attention * scale
        attention = F.softmax(attention, dim=-1)
        context = torch.matmul(attention, V)
        return context
class Multi_Head_Attention(nn.Module):
    def __init__(self, config, dropout=0):
        super(Multi_Head_Attention, self).__init__()
        self.num_head = config.Transformer_num_head
        assert config.Transformer_dim_model % config.Transformer_num_head == 0
        self.dim_head = config.Transformer_dim_model // self.num_head
        self.fc_Q = nn.Linear(config.Transformer_dim_model, config.Transformer_num_head * self.dim_head)
        self.fc_K = nn.Linear(config.Transformer_dim_model, config.Transformer_num_head * self.dim_head)
        self.fc_V = nn.Linear(config.Transformer_dim_model, config.Transformer_num_head * self.dim_head)
        self.attention = Scaled_Dot_Product_Attention()
        self.fc = nn.Linear(config.Transformer_num_head * self.dim_head, config.Transformer_dim_model)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(config.Transformer_dim_model)

    def forward(self, x):
        batch_size = x.size(0)
        Q = self.fc_Q(x)
        K = self.fc_K(x)
        V = self.fc_V(x)
        # split into heads: (batch * num_head, len, dim_head)
        Q = Q.view(batch_size * self.num_head, -1, self.dim_head)
        K = K.view(batch_size * self.num_head, -1, self.dim_head)
        V = V.view(batch_size * self.num_head, -1, self.dim_head)
        scale = K.size(-1) ** -0.5      # scaling factor for the dot products
        context = self.attention(Q, K, V, scale)
        context = context.view(batch_size, -1, self.dim_head * self.num_head)
        out = self.fc(context)
        out = self.dropout(out)
        out = out + x                   # residual connection
        out = self.layer_norm(out)
        return out
class Position_wise_Feed_Forward(nn.Module):
    def __init__(self, config, dropout=0):
        super(Position_wise_Feed_Forward, self).__init__()
        self.fc1 = nn.Linear(config.Transformer_dim_model, config.Transformer_hidden)
        self.fc2 = nn.Linear(config.Transformer_hidden, config.Transformer_dim_model)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(config.Transformer_dim_model)

    def forward(self, x):
        out = self.fc1(x)
        out = F.relu(out)
        out = self.fc2(out)
        out = self.dropout(out)
        out = out + x                   # residual connection
        out = self.layer_norm(out)
        return out
class Encoder(nn.Module):
    def __init__(self, config):
        super(Encoder, self).__init__()
        self.attention = Multi_Head_Attention(config, config.dropout)
        self.feed_forward = Position_wise_Feed_Forward(config)

    def forward(self, x):
        out = self.attention(x)
        out = self.feed_forward(out)
        return out
class TRModel(nn.Module):
    def __init__(self, config):
        super(TRModel, self).__init__()
        if config.embedding_pretrained is not None:
            self.embedding = nn.Embedding.from_pretrained(config.embedding_pretrained, freeze=False)
        else:
            self.embedding = nn.Embedding(config.vocab_size, config.embed_size, padding_idx=config.vocab_size - 1)
        self.position_embedding = Positional_Encoding(config)
        self.encoder = Encoder(config)
        self.encoders = nn.ModuleList([copy.deepcopy(self.encoder) for _ in range(config.Transformer_num_encoder)])
        self.fc1 = nn.Linear(config.pad_size * config.Transformer_dim_model, config.label_num)

    def forward(self, x):
        out = self.embedding(x)             # (batch, pad_size, embed_size); assumes embed_size == Transformer_dim_model
        out = self.position_embedding(out)  # add positional encoding
        for encoder in self.encoders:
            out = encoder(out)
        out = out.view(out.size(0), -1)     # flatten: (batch, pad_size * dim_model)
        out = self.fc1(out)                 # (batch, label_num)
        return out
Optimizers
An optimizer manages and updates the learnable parameters of the model so that its output gets closer to the ground truth.
Commonly used optimizers are Adam and SGD.
import torch.optim as optim
optimizer = optim.Adam(model.parameters(), lr=0.01)

loss = loss_f(out, y)
optimizer.zero_grad()   # clear the accumulated gradients first
loss.backward()         # backpropagation
optimizer.step()        # update the parameters
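Putting the pieces together, a minimal training-loop sketch, assuming a model, the loader from the DataLoader section, and cross-entropy as the loss (all names here are placeholders):
import torch.nn as nn
import torch.optim as optim

loss_f = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for epoch in range(10):                     # number of epochs chosen arbitrarily
    for batch_data, batch_label in loader:  # one iteration per batch
        out = model(batch_data)
        loss = loss_f(out, batch_label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()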
Loss Functions
A loss function measures the discrepancy between the model output and the ground-truth labels.
- 1 Cross-entropy loss
import torch.nn as nn
nn.CrossEntropyLoss(
    weight=None,
    size_average=None,
    ignore_index=-100,
    reduce=None,
    reduction='mean')
weight: per-class weights applied to the loss
ignore_index: a target class index that is ignored
reduction: the reduction mode, one of 'none' / 'sum' / 'mean'
'none': compute the loss element-wise without reduction
'sum': sum over all elements, returns a scalar
'mean': (weighted) mean over all elements, returns a scalar
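A short sketch of the reduction modes (the logits and targets are made up):
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])  # (batch=2, classes=3)
target = torch.tensor([0, 2])

print(nn.CrossEntropyLoss(reduction='none')(logits, target))  # per-sample losses, shape (2,)
print(nn.CrossEntropyLoss(reduction='sum')(logits, target))   # scalar sum
print(nn.CrossEntropyLoss(reduction='mean')(logits, target))  # scalar (weighted) mean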
- 2 Negative log-likelihood loss
nn.NLLLoss(
    weight=None,
    size_average=None,
    ignore_index=-100,
    reduce=None,
    reduction='mean')
- 3 Binary cross-entropy
nn.BCELoss(
    weight=None,
    size_average=None,
    reduce=None,
    reduction='mean')
The inputs must lie in [0, 1] (i.e. probabilities, typically produced by a sigmoid).
- 4 Sigmoid combined with binary cross-entropy
nn.BCEWithLogitsLoss(
    weight=None,
    size_average=None,
    reduce=None,
    reduction='mean',
    pos_weight=None)
The loss applies the sigmoid internally, so no extra sigmoid should be added at the network output.
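A small sketch showing that BCEWithLogitsLoss on raw logits matches an explicit sigmoid followed by BCELoss (the values are made up):
import torch
import torch.nn as nn

logits = torch.tensor([0.8, -1.2, 2.5])
target = torch.tensor([1.0, 0.0, 1.0])

loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)   # sigmoid applied internally
loss_manual = nn.BCELoss()(torch.sigmoid(logits), target)   # explicit sigmoid + BCE
print(torch.allclose(loss_with_logits, loss_manual))        # True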
- 5 Mean absolute difference between output and target
nn.L1Loss()
- 6 Mean squared difference between output and target
nn.MSELoss()
- 7 SmoothL1Loss
nn.SmoothL1Loss()
- 8 Negative log-likelihood loss for a Poisson distribution
nn.PoissonNLLLoss()
- 9 KL divergence (relative entropy)
nn.KLDivLoss()
Note: the network output must be provided as log-probabilities,
e.g. via nn.LogSoftmax().
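A minimal sketch of the log-probability requirement (the distributions are made up):
import torch
import torch.nn as nn

log_p = nn.LogSoftmax(dim=-1)(torch.randn(2, 4))  # model output as log-probabilities
q = torch.softmax(torch.randn(2, 4), dim=-1)      # target distribution as probabilities
loss = nn.KLDivLoss(reduction='batchmean')(log_p, q)
print(loss)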
- 10 Ranking loss between two inputs; used for ranking tasks
nn.MarginRankingLoss()
- 11 Multi-label margin loss
nn.MultiLabelMarginLoss()
- 12 Two-class logistic loss
nn.SoftMarginLoss()
- 13 Multi-label version of SoftMarginLoss
nn.MultiLabelSoftMarginLoss()
- 14 Multi-class hinge (margin) loss
nn.MultiMarginLoss()
- 15 Triplet loss (commonly used in face recognition)
nn.TripletMarginLoss()
- 16 Measures the similarity between two inputs (hinge embedding loss)
nn.HingeEmbeddingLoss()
- 17 Measures the similarity between two inputs via cosine similarity
nn.CosineEmbeddingLoss()