pytorch-3

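  • Data loading and preprocessing
  • Creating a Dataset class
  • DataLoader
  • The neural network package
  • nn.Module
  • nn.functional
  • Model containers (Containers)
  • Weight initialization
  • Building networks
  • Optimizers
  • Loss functions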


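Data loading and preprocessing

PyTorch wraps data loading in torch.utils.data, which makes it easy to prefetch data in parallel worker processes and to load it in batches.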
DataLoader

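torch.utils.data.DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0,
    drop_last=False)
dataset: a Dataset instance; determines where the data is read from and how
batch_size: number of samples per batch
num_workers: number of worker subprocesses used to load data; 0 loads everything in the main process
shuffle: whether to reshuffle the samples at every epoch
drop_last: whether to drop the last batch when the number of samples is not divisible by batch_size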
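To see what batch_size and drop_last do in practice, here is a minimal sketch. The data is made up, and TensorDataset is used only as a convenient ready-made Dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 toy samples with 3 features each, plus integer labels (made-up data).
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# 10 samples with batch_size=4 give batches of 4, 4 and 2.
keep_last = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=False)
drop_last = DataLoader(dataset, batch_size=4, shuffle=False, drop_last=True)

batch_sizes_keep = [x.shape[0] for x, _ in keep_last]
batch_sizes_drop = [x.shape[0] for x, _ in drop_last]
print(batch_sizes_keep)  # [4, 4, 2]
print(batch_sizes_drop)  # [4, 4]
```

With drop_last=True the incomplete final batch is simply skipped, which keeps every batch the same size.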

Dataset

class Dataset(object):
	def __getitem__(self, index):
		raise NotImplementedError
	def __add__(self, other):
		return ConcatDataset([self, other])
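Dataset is an abstract class. Every custom dataset must inherit from it and override __getitem__().
__getitem__(): receives an index and returns one sample.

Epoch: one pass in which every sample has been fed to the model.
Iteration: one batch of samples fed to the model.
Batch size: the number of samples per batch; it determines how many iterations make up one epoch.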
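The relation between these three terms is plain arithmetic. The helper below is a hypothetical name written for this note, not a PyTorch function, and the sample counts are made up.

```python
import math

def iterations_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches (iterations) needed to see every sample once."""
    if drop_last:
        return num_samples // batch_size        # incomplete final batch is discarded
    return math.ceil(num_samples / batch_size)  # incomplete final batch is kept

print(iterations_per_epoch(87, 8))                  # 11
print(iterations_per_epoch(87, 8, drop_last=True))  # 10
```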

Creating a Dataset class

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data_path, data, label):
        self.data_path = data_path
        self.data = data
        self.label = label

    def __len__(self):
        # number of samples; DataLoader uses this to build batches
        return len(self.label)

    def __getitem__(self, index):
        # return one (sample, label) pair
        return self.data[index], self.label[index]

DataLoader

DataLoader reads from a Dataset: it wraps the Dataset and returns an iterable, so the data can be read batch by batch with an iterator.
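A minimal sketch of that wrapping, using a hypothetical toy dataset (SquaresDataset is invented for illustration) in which sample i is the pair (i, i squared):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is (i, i**2)."""
    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

loader = DataLoader(SquaresDataset(6), batch_size=3, shuffle=False)

# Each iteration yields one batch: the samples are stacked into tensors.
batches = [(x.tolist(), y.tolist()) for x, y in loader]
print(batches)
```

The DataLoader calls __getitem__ once per sample and stacks the results, so each batch is a pair of tensors of length 3.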

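The neural network package

torch.nn is a modular interface designed specifically for neural networks. It is built on top of Autograd and is used to define and run networks.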


nn.Module



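parameters: stores and manages nn.Parameter objects

modules: stores and manages nn.Module submodules

buffers: stores and manages buffers (non-trainable state, such as the running mean in BatchNorm)

*_hooks: stores and manages hook functions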
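These four kinds of state can be inspected on any module. The sketch below uses a hypothetical TinyNet, invented for illustration, and lists what ends up registered where.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)                      # stored as a submodule
        self.scale = nn.Parameter(torch.ones(1))       # stored as a parameter
        self.register_buffer("calls", torch.zeros(1))  # stored as a buffer

    def forward(self, x):
        return self.fc(x) * self.scale

net = TinyNet()
param_names = [name for name, _ in net.named_parameters()]
buffer_names = [name for name, _ in net.named_buffers()]
child_names = [name for name, _ in net.named_children()]

# A forward hook runs after forward(); this one records the output shape.
seen_shapes = []
handle = net.register_forward_hook(
    lambda module, inputs, output: seen_shapes.append(tuple(output.shape)))
net(torch.randn(3, 4))
handle.remove()

print(param_names, buffer_names, child_names, seen_shapes)
```

Assigning an nn.Parameter or nn.Module as an attribute is enough to register it; buffers need the explicit register_buffer call.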

nn.functional

nn.functional: contains functions commonly used in neural networks. These functions hold no learnable parameters of their own.

import torch.nn.functional as F
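A short sketch of the difference: for a stateless operation such as ReLU the module and the function are interchangeable, while for a layer with weights the functional form needs the weights passed in explicitly. The input values are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.5, 2.0]])

# Stateless activation: the module and the function give the same result.
same_relu = torch.equal(nn.ReLU()(x), F.relu(x))

# A layer with weights: nn.Linear owns its parameters,
# while F.linear expects them as arguments.
layer = nn.Linear(3, 2)
same_linear = torch.allclose(layer(x), F.linear(x, layer.weight, layer.bias))

print(same_relu, same_linear)  # True True
```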

模型容器(Containers)



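nn.Sequential is a container for nn.Module objects that wraps a group of layers in a fixed order.

Ordered: the layers are built and run strictly in sequence.

Built-in forward(): the built-in forward runs each layer in turn inside a for loop.

nn.ModuleList is a container for nn.Module objects that wraps a group of layers and is used by iterating over them.

append(): adds a layer to the end of the ModuleList

extend(): appends the layers of another ModuleList (or any iterable of modules)

insert(): inserts a layer at a given position in the ModuleList

nn.ModuleDict is a container for nn.Module objects that wraps a group of layers and is used by looking them up by key.

clear(): empties the ModuleDict

items(): returns an iterable of key/value pairs

keys(): returns the keys of the dictionary

values(): returns the values of the dictionary

pop(key): removes the given key from the ModuleDict and returns its module

Container summary:

  • nn.Sequential: ordered; the layers run strictly in sequence; commonly used to build blocks
  • nn.ModuleList: iterable; commonly used to build many repeated layers with a for loop
  • nn.ModuleDict: indexed by key; commonly used for selectable layers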
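The three containers side by side, as a minimal sketch. The layer sizes and dictionary keys are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 8)

# Sequential: forward() is provided and calls the layers in order.
block = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
y_seq = block(x)

# ModuleList: build repeated layers in a loop; the forward loop is written by hand.
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])
y_list = x
for layer in layers:
    y_list = torch.relu(layer(y_list))

# ModuleDict: pick a layer by name at call time.
activations = nn.ModuleDict({"relu": nn.ReLU(), "tanh": nn.Tanh()})
y_dict = activations["tanh"](x)

print(y_seq.shape, y_list.shape, y_dict.shape)
```

Unlike a plain Python list or dict, ModuleList and ModuleDict register their contents, so the layers show up in parameters() and are moved by .to(device).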

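Weight initialization

The weight initialization methods live in torch.nn.init. In practice, create the layer first and then call the initializer directly on its parameters.

  • Constant initialization
import torch.nn as nn
nn.init.constant_(w, a)   # w is a weight tensor of the network, a is the constant
  • Uniform distribution
import torch.nn as nn
nn.init.uniform_(tensor, a=0, b=1)
  • Normal distribution
import torch.nn as nn
nn.init.normal_(tensor, mean=0, std=1)
  • Xavier initialization
    The idea is to keep the variance of a layer's input and output the same, in both the forward and the backward pass.
    PyTorch provides a uniform and a normal variant of Xavier initialization.
import torch.nn as nn
nn.init.xavier_uniform_(tensor, gain=1)   # uniform distribution
nn.init.xavier_normal_(tensor, gain=1)    # normal distribution
  • Kaiming (He initialization)
import torch.nn as nn
nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')  # uniform distribution
nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')   # normal distribution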
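As a quick sanity check on Xavier initialization, the sketch below compares the initialized weights with the bound given in the PyTorch documentation for xavier_uniform_, gain * sqrt(6 / (fan_in + fan_out)). The layer size is an arbitrary choice.

```python
import math
import torch
import torch.nn as nn

layer = nn.Linear(10, 20)  # weight shape is (20, 10): fan_in=10, fan_out=20
nn.init.xavier_uniform_(layer.weight, gain=1.0)

fan_in, fan_out = 10, 20
bound = math.sqrt(6.0 / (fan_in + fan_out))  # bound for gain=1
max_abs = layer.weight.abs().max().item()

print(f"bound={bound:.4f}, largest |w|={max_abs:.4f}")
```

Every weight falls inside [-bound, bound], so wider layers start with smaller weights.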

Example:

import torch

# a single-layer network
layer1 = torch.nn.Linear(10, 20)
torch.nn.init.xavier_uniform_(layer1.weight)
torch.nn.init.constant_(layer1.bias, 0)

# using apply
def weight_init(m):
    classname = m.__class__.__name__  # class name of the layer, e.g. ConvTranspose2d
    if classname.find('Conv') != -1:  # find() returns -1 when the substring is absent
        m.weight.data.normal_(0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1.0, 0.02)
        m.bias.data.fill_(0)

model = net()  # net is your own nn.Module subclass
model.apply(weight_init)  # apply() calls weight_init on every submodule, recursively
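A variant of the same idea that checks layer types with isinstance instead of matching class names, which avoids accidental matches on unrelated classes. This is a sketch: the toy model is hypothetical, and the type checks cover only the two layer types it contains.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # isinstance checks only match the listed layer types.
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.constant_(m.bias, 0.0)

# Hypothetical toy model, only here to have something to initialize.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU())
model.apply(init_weights)

bn_bias_is_zero = bool((model[1].bias == 0).all())
print(bn_bias_is_zero)  # True
```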

Building networks

Building a 1D convolutional network

import torch
import torch.nn as nn
import torch.nn.functional as F


class MyNet(nn.Module):
    def __init__(self, config):
        super(MyNet, self).__init__()
        self.config = config
        if config.pretrained:
            self.emb = nn.Embedding.from_pretrained(config.emb)
        else:
            self.emb = nn.Embedding(config.vocav_len, config.emb_size)

        self.conv1 = nn.Conv1d(in_channels=config.emb_size,
                               out_channels=config.outchannles,
                               kernel_size=config.kernel,
                               stride=1)
        # pooling over the whole conv output (length - kernel + 1) leaves one value per channel
        self.maxpool = nn.MaxPool1d(kernel_size=config.length-config.kernel+1, stride=1)
        self.avgpool = nn.AvgPool1d(kernel_size=config.length-config.kernel+1, stride=1)

        self.fc = nn.Linear(config.outchannles, config.classes)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        x_emb = self.emb(x)             # (batch, length, emb_size)
        x_emb = x_emb.permute(0, 2, 1)  # (batch, emb_size, length), as Conv1d expects
        x_out = self.conv1(x_emb)       # (batch, outchannles, length - kernel + 1)
        x_out = self.maxpool(x_out)     # (batch, outchannles, 1); or self.avgpool(x_out)
        x_out = x_out.squeeze(-1)       # (batch, outchannles)
        x_out = self.dropout(x_out)
        x_out = self.fc(x_out)          # (batch, classes)
        return x_out
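To follow the tensor shapes through such a network, the sketch below runs the same sequence of layers on random token ids. All sizes are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch, length, vocab, emb_size, out_channels, kernel, classes = 4, 20, 100, 16, 32, 3, 5

emb = nn.Embedding(vocab, emb_size)
conv = nn.Conv1d(emb_size, out_channels, kernel_size=kernel)
pool = nn.MaxPool1d(kernel_size=length - kernel + 1)
fc = nn.Linear(out_channels, classes)

tokens = torch.randint(0, vocab, (batch, length))
h = emb(tokens).permute(0, 2, 1)  # (4, 16, 20): channels first for Conv1d
conv_out = conv(h)                # (4, 32, 18): length shrinks to 20 - 3 + 1
pooled = pool(conv_out).squeeze(-1)  # (4, 32): one value per channel
logits = fc(pooled)               # (4, 5)

print(conv_out.shape, pooled.shape, logits.shape)
```

The squeeze after pooling matters: without it the linear layer would receive a trailing dimension of size 1 instead of the channel dimension.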

A 2D convolutional network

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyNet(nn.Module):
    def __init__(self, config):
        super(MyNet, self).__init__()
        self.config = config
        if config.pretrained:
            self.emb = nn.Embedding.from_pretrained(config.emb)
        else:
            self.emb = nn.Embedding(config.vocav_len, config.emb_size)

        # each filter spans filter_size tokens and the full embedding width
        self.conv2 = nn.Conv2d(in_channels=1,
                               out_channels=config.filter_num,
                               kernel_size=(config.filter_size, config.emb_size),
                               stride=1)
        self.maxpool = nn.MaxPool1d(kernel_size=config.length-config.filter_size+1, stride=1)
        self.avgpool = nn.AvgPool1d(kernel_size=config.length-config.filter_size+1, stride=1)

        self.fc = nn.Linear(config.filter_num, config.classes)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        x_emb = self.emb(x)                     # (batch, length, emb_size)
        x_emb = x_emb.unsqueeze(1)              # (batch, 1, length, emb_size)
        x_out = self.conv2(x_emb).squeeze(3)    # (batch, filter_num, length - filter_size + 1)
        x_out = self.maxpool(x_out).squeeze(2)  # (batch, filter_num); or self.avgpool(x_out)
        x_out = self.dropout(x_out)
        x_out = self.fc(x_out)                  # (batch, classes)
        return x_out
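The 2D version treats the embedded sentence as a one-channel image and slides a filter that covers the full embedding width. A minimal shape trace, with hypothetical sizes and a random tensor standing in for the embedding output:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, length, emb_size, filter_num, filter_size = 4, 20, 16, 8, 3  # hypothetical sizes

x_emb = torch.randn(batch, length, emb_size)  # stands in for the embedding output
conv = nn.Conv2d(1, filter_num, kernel_size=(filter_size, emb_size))

h = conv(x_emb.unsqueeze(1))  # (4, 8, 18, 1): the width collapses to 1
conv_shape = tuple(h.shape)
h = h.squeeze(3)              # (4, 8, 18)
pooled = F.max_pool1d(h, kernel_size=h.size(2)).squeeze(2)  # (4, 8): max over time

print(conv_shape, tuple(pooled.shape))  # (4, 8, 18, 1) (4, 8)
```

Because the filter is as wide as the embedding, the result is equivalent in shape to a 1D convolution over the token axis.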

A recurrent neural network

import torch
import torch.nn as nn
import torch.nn.functional as F


class MyNet(nn.Module):
    def __init__(self, config):
        super(MyNet, self).__init__()
        self.config = config
        if config.pretrained:
            self.emb = nn.Embedding.from_pretrained(config.emb)
        else:
            self.emb = nn.Embedding(config.vocav_len, config.emb_size)

        if config.lstm:
            self.rnn = nn.LSTM(input_size=config.emb_size,
                               hidden_size=config.hidden_size,
                               bidirectional=config.bidirectional,
                               batch_first=True,
                               num_layers=config.lstm_layers)
        else:
            self.rnn = nn.GRU(input_size=config.emb_size,
                              hidden_size=config.hidden_size,
                              bidirectional=config.bidirectional,
                              batch_first=True,
                              num_layers=config.lstm_layers)

        # a bidirectional RNN concatenates both directions, doubling the feature size
        num_directions = 2 if config.bidirectional else 1
        self.fc = nn.Linear(config.hidden_size * num_directions, config.classes)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        x_emb = self.emb(x)         # (batch, length, emb_size)
        x_out, _ = self.rnn(x_emb)  # the RNN returns (output, hidden state); keep the output
        x_out = x_out[:, -1, :]     # one common choice: use the last time step as the sequence representation
        x_out = self.dropout(x_out)
        x_out = self.fc(x_out)      # (batch, classes)
        return x_out
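nn.LSTM and nn.GRU both return a tuple, and the output width depends on the number of directions. The sketch below prints the shapes involved; the sizes are hypothetical.

```python
import torch
import torch.nn as nn

batch, length, emb_size, hidden = 3, 10, 16, 32  # hypothetical sizes
x = torch.randn(batch, length, emb_size)

lstm = nn.LSTM(emb_size, hidden, num_layers=2, bidirectional=True, batch_first=True)
gru = nn.GRU(emb_size, hidden, num_layers=2, bidirectional=True, batch_first=True)

out_lstm, (h_n, c_n) = lstm(x)  # LSTM returns output plus a (hidden, cell) pair
out_gru, h_gru = gru(x)         # GRU returns output plus the hidden state only

print(out_lstm.shape)  # torch.Size([3, 10, 64]): hidden * 2 directions
print(h_n.shape)       # torch.Size([4, 3, 32]): (num_layers * 2, batch, hidden)
```

Note that batch_first=True changes the layout of the output only; the hidden state keeps the layer dimension first.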

Transformer

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import copy


class Positional_Encoding(nn.Module):
    def __init__(self, config):
        super(Positional_Encoding, self).__init__()
        # angle for position pos and dimension i: pos / 10000^(2 * (i // 2) / embed_size)
        pe = torch.tensor([[pos / (10000.0 ** (i // 2 * 2.0 / config.embed_size)) for i in range(config.embed_size)] for pos in range(config.pad_size)])
        pe[:, 0::2] = torch.sin(pe[:, 0::2])  # even dimensions use sine
        pe[:, 1::2] = torch.cos(pe[:, 1::2])  # odd dimensions use cosine
        # a buffer is not trained, but it follows the module across devices
        self.register_buffer('pe', pe)
        self.dropout = nn.Dropout(config.dropout)

    def forward(self, x):
        out = x + self.pe
        out = self.dropout(out)
        return out


class Scaled_Dot_Product_Attention(nn.Module):
    def __init__(self):
        super(Scaled_Dot_Product_Attention, self).__init__()

    def forward(self, Q, K, V, scale=None):
        attention = torch.matmul(Q, K.permute(0, 2, 1))
        if scale:
            attention = attention * scale
        attention = F.softmax(attention, dim=-1)
        context = torch.matmul(attention, V)
        return context


class Multi_Head_Attention(nn.Module):
    def __init__(self, config, dropout=0):
        super(Multi_Head_Attention, self).__init__()
        self.num_head = config.Transformer_num_head
        assert config.Transformer_dim_model % config.Transformer_num_head == 0
        self.dim_head = config.Transformer_dim_model // self.num_head
        self.fc_Q = nn.Linear(config.Transformer_dim_model, config.Transformer_num_head * self.dim_head)
        self.fc_K = nn.Linear(config.Transformer_dim_model, config.Transformer_num_head * self.dim_head)
        self.fc_V = nn.Linear(config.Transformer_dim_model, config.Transformer_num_head * self.dim_head)
        self.attention = Scaled_Dot_Product_Attention()
        self.fc = nn.Linear(config.Transformer_num_head * self.dim_head, config.Transformer_dim_model)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(config.Transformer_dim_model)

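    def forward(self, x):
        batch_size = x.size(0)
        Q = self.fc_Q(x)
        K = self.fc_K(x)
        V = self.fc_V(x)
        # split heads: (batch, len, heads * dim) -> (batch * heads, len, dim)
        Q = Q.view(batch_size, -1, self.num_head, self.dim_head).transpose(1, 2).reshape(batch_size * self.num_head, -1, self.dim_head)
        K = K.view(batch_size, -1, self.num_head, self.dim_head).transpose(1, 2).reshape(batch_size * self.num_head, -1, self.dim_head)
        V = V.view(batch_size, -1, self.num_head, self.dim_head).transpose(1, 2).reshape(batch_size * self.num_head, -1, self.dim_head)
        scale = K.size(-1) ** -0.5  # 1 / sqrt(dim_head)
        context = self.attention(Q, K, V, scale)
        # merge heads: (batch * heads, len, dim) -> (batch, len, heads * dim)
        context = context.view(batch_size, self.num_head, -1, self.dim_head).transpose(1, 2).reshape(batch_size, -1, self.dim_head * self.num_head)
        out = self.fc(context)
        out = self.dropout(out)
        out = out + x  # residual connection
        out = self.layer_norm(out)
        return out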


class Position_wise_Feed_Forward(nn.Module):
    def __init__(self, config, dropout=0):
        super(Position_wise_Feed_Forward, self).__init__()
        self.fc1 = nn.Linear(config.Transformer_dim_model, config.Transformer_hidden)
        self.fc2 = nn.Linear(config.Transformer_hidden, config.Transformer_dim_model)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(config.Transformer_dim_model)

    def forward(self, x):
        out = self.fc1(x)
        out = F.relu(out)
        out = self.fc2(out)
        out = self.dropout(out)
        out = out + x
        out = self.layer_norm(out)
        return out


class Encoder(nn.Module):
    def __init__(self, config):
        super(Encoder, self).__init__()
        self.attention = Multi_Head_Attention(config, config.dropout)
        self.feed_forward = Position_wise_Feed_Forward(config)

    def forward(self, x):
        out = self.attention(x)
        out = self.feed_forward(out)
        return out


class TRModel(nn.Module):
    def __init__(self, config):
        super(TRModel, self).__init__()
        if config.embedding_pretrained is not None:
            self.embedding = nn.Embedding.from_pretrained(config.embedding_pretrained, freeze=False)
        else:
            # the last vocabulary index is reserved for padding
            self.embedding = nn.Embedding(config.vocab_size, config.embed_size, padding_idx=config.vocab_size-1)
        self.position_embedding = Positional_Encoding(config)
        # one template encoder layer, deep-copied so that every layer has its own weights
        encoder = Encoder(config)
        self.encoders = nn.ModuleList([copy.deepcopy(encoder) for _ in range(config.Transformer_num_encoder)])
        self.fc1 = nn.Linear(config.pad_size * config.Transformer_dim_model, config.label_num)

    def forward(self, x):
        out = self.embedding(x)           # (batch, pad_size, embed_size)
        out = self.position_embedding(out)
        for encoder in self.encoders:
            out = encoder(out)
        out = out.view(out.size(0), -1)   # flatten to (batch, pad_size * dim_model)
        out = self.fc1(out)
        return out
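The attention step can be traced on random tensors. The sketch below is written independently of the classes above: it shows one common way to split a tensor into heads (view, then transpose), runs scaled dot-product attention on each head, and merges the heads back. The sizes are hypothetical and the function name is chosen for this note.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V for tensors shaped (batch, length, d)."""
    scale = K.size(-1) ** -0.5
    weights = F.softmax(torch.matmul(Q, K.transpose(1, 2)) * scale, dim=-1)
    return torch.matmul(weights, V), weights

batch, length, num_head, dim_head = 2, 5, 4, 8  # hypothetical sizes
x = torch.randn(batch, length, num_head * dim_head)

# Split heads: (batch, length, heads * dim) -> (batch * heads, length, dim).
heads = (x.view(batch, length, num_head, dim_head)
          .transpose(1, 2)
          .reshape(batch * num_head, length, dim_head))
context, weights = scaled_dot_product_attention(heads, heads, heads)

# Merge heads: (batch * heads, length, dim) -> (batch, length, heads * dim).
merged = (context.view(batch, num_head, length, dim_head)
                 .transpose(1, 2)
                 .reshape(batch, length, num_head * dim_head))

print(weights.shape, merged.shape)
```

Each row of the attention weights sums to 1, so every output position is a weighted average of the value vectors within its own head.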

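Optimizers

Optimizer: manages and updates the learnable parameters of the model so that the model output moves closer to the ground truth.
Commonly used optimization methods are Adam and SGD.

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=0.01)
loss = loss_f(out, y)

optimizer.zero_grad()    # clear the gradients first
loss.backward()          # backpropagate
optimizer.step()         # update the parameters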
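Put into a loop, those three calls make up a complete training step. The sketch below fits a single linear layer to made-up data that follows y = 2x + 1; SGD with lr=0.1 and 200 steps are arbitrary choices for this toy problem.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy regression data following y = 2x + 1 (made up for illustration).
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1

model = nn.Linear(1, 1)
loss_f = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(200):
    out = model(x)
    loss = loss_f(out, y)
    optimizer.zero_grad()  # gradients accumulate, so clear them every step
    loss.backward()        # compute gradients of the loss
    optimizer.step()       # update weight and bias
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
print(f"weight {model.weight.item():.3f}, bias {model.bias.item():.3f}")
```

Forgetting zero_grad() is a common mistake: gradients from earlier steps would then be added to the current ones.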

Loss functions

Loss function: measures the difference between the model output and the ground-truth label.


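  • 1 Cross-entropy loss
import torch.nn as nn
nn.CrossEntropyLoss(
    weight=None,
    size_average=None,
    ignore_index=-100,
    reduce=None,
    reduction='mean')
weight: per-class weights applied to the loss
ignore_index: a target value that is ignored and contributes nothing to the loss
reduction: how the per-element losses are combined; one of 'none', 'sum', 'mean'
    'none': return the loss of each element
    'sum': sum over all elements and return a scalar
    'mean': take the weighted average and return a scalar
size_average and reduce are deprecated in favor of reduction.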
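The three reduction modes are easiest to compare on a small example. The sketch below also recomputes the loss by hand, as the negative log-softmax at the target index, and checks that nn.NLLLoss on log-probabilities gives the same values. The logits are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 3.0]])
target = torch.tensor([0, 1, 2])

per_sample = nn.CrossEntropyLoss(reduction='none')(logits, target)  # shape (3,)
total = nn.CrossEntropyLoss(reduction='sum')(logits, target)        # scalar
mean = nn.CrossEntropyLoss(reduction='mean')(logits, target)        # scalar

# Cross-entropy by hand: negative log-softmax at the target index.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(3), target]

# NLLLoss expects log-probabilities and then matches CrossEntropyLoss.
nll = nn.NLLLoss(reduction='none')(log_probs, target)

print(torch.allclose(per_sample, manual), torch.allclose(per_sample, nll))  # True True
```

In other words, CrossEntropyLoss takes raw logits and applies log-softmax itself, while NLLLoss expects that step to have been done already.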


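  • 2 Negative log-likelihood loss
nn.NLLLoss(
    weight=None,
    size_average=None,
    ignore_index=-100,
    reduce=None,
    reduction='mean')
The input is expected to contain log-probabilities, e.g. the output of nn.LogSoftmax.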
  • 3 Binary cross-entropy
nn.BCELoss(
    weight=None,
    size_average=None,
    reduce=None,
    reduction='mean')
Input values must lie in [0, 1].
  • 4 Sigmoid combined with binary cross-entropy
nn.BCEWithLogitsLoss(
    weight=None,
    size_average=None,
    reduce=None,
    reduction='mean',
    pos_weight=None)
This loss applies the sigmoid itself, so the network's final output should not go through a sigmoid.
  • 5 Absolute value of the difference between out and target
nn.L1Loss()
  • 6 Square of the difference between out and target
nn.MSELoss()
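Two quick checks for the losses above, as a sketch on made-up numbers: BCEWithLogitsLoss on raw logits matches BCELoss on sigmoid outputs, and L1Loss and MSELoss are the mean absolute and mean squared difference.

```python
import torch
import torch.nn as nn

logits = torch.tensor([1.2, -0.7, 0.3, 2.5])
target = torch.tensor([1.0, 0.0, 0.0, 1.0])

bce_on_probs = nn.BCELoss()(torch.sigmoid(logits), target)  # expects probabilities in [0, 1]
bce_on_logits = nn.BCEWithLogitsLoss()(logits, target)      # applies the sigmoid internally
print(torch.allclose(bce_on_probs, bce_on_logits, atol=1e-6))  # True

out = torch.tensor([1.0, 2.0, 4.0])
tgt = torch.tensor([1.5, 2.0, 2.0])
l1 = nn.L1Loss()(out, tgt)    # mean(|out - tgt|)   = (0.5 + 0 + 2) / 3
mse = nn.MSELoss()(out, tgt)  # mean((out - tgt)^2) = (0.25 + 0 + 4) / 3
print(l1.item(), mse.item())
```

The squared error penalizes the one large difference far more heavily than the absolute error does, which is why MSE is more sensitive to outliers.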


  • 7 SmoothL1Loss
nn.SmoothL1Loss()


  • 8 Negative log-likelihood loss for a Poisson distribution
nn.PoissonNLLLoss()


  • 9 KL divergence (relative entropy)
nn.KLDivLoss()
Note: the network output must be converted to log-probabilities first,
e.g. with nn.LogSoftmax()
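A small sketch of that requirement: the prediction goes in as log-probabilities, the target as plain probabilities. The values are made up, and reduction='batchmean' is chosen here so that the result equals the KL divergence averaged over the batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])
target = torch.tensor([[0.1, 0.2, 0.7],
                       [1 / 3, 1 / 3, 1 / 3]])  # target probabilities

log_probs = F.log_softmax(logits, dim=1)  # KLDivLoss expects log-probabilities as input
kl = nn.KLDivLoss(reduction='batchmean')(log_probs, target)

# KL(target || prediction) by hand, averaged over the batch.
manual = (target * (target.log() - log_probs)).sum(dim=1).mean()
print(torch.allclose(kl, manual))  # True
```

The second row contributes nothing: uniform logits give a uniform prediction, which matches its uniform target exactly.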


  • 10 Ranks two inputs against each other; used for ranking tasks
nn.MarginRankingLoss()


  • 11 Multi-label margin loss
nn.MultiLabelMarginLoss()


  • 12 Logistic loss for binary classification
nn.SoftMarginLoss()


  • 13 Multi-label version of SoftMarginLoss
nn.MultiLabelSoftMarginLoss()


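  • 14 Multi-class hinge loss
nn.MultiMarginLoss()
  • 15 Triplet loss (commonly used in face recognition)
nn.TripletMarginLoss()
  • 16 Similarity of two inputs
nn.HingeEmbeddingLoss()
  • 17 Similarity of two inputs, measured with cosine similarity
nn.CosineEmbeddingLoss()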