Table of Contents

  • 1. Saving and loading PyTorch Tensors
  • 2. Saving and loading neural networks
  • 2.1 Save and load only the model parameters (recommended, but the model structure must be re-created)
  • 2.2 Save and load the entire model
  • 2.3 Model file extensions
  • 3. Saving and loading a model in practice: a language model (predicting the next word of a sentence)
  • 3.1 Code
  • 3.1.1 Data preparation
  • 3.1.2 Define the model
  • 3.1.3 Train and save the model
  • 3.1.4 Load the trained model
  • 4. References


1. Saving and loading PyTorch Tensors

Saving and loading are done mainly with torch.save and torch.load.

import torch

test_data = torch.FloatTensor(2,3)   # an uninitialized 2x3 float tensor
# save the data
torch.save(test_data, "test_data.pkl")

print(test_data)
# load the data
print(torch.load("test_data.pkl"))

2. Saving and loading neural networks

2.1 Save and load only the model parameters (recommended, but the model structure must be re-created)

# save
torch.save(the_model.state_dict(), PATH)

# load
# first re-create the model structure
the_model = TheModelClass(*args, **kwargs)
# then load the parameters
the_model.load_state_dict(torch.load(PATH))
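
As a concrete illustration, here is a minimal, self-contained sketch of this workflow; the TinyNet class and the file name tiny_net_params.pth are made up for this example.

import torch
import torch.nn as nn

# a hypothetical toy model, used only to illustrate the save/load workflow
class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
torch.save(model.state_dict(), "tiny_net_params.pth")       # save only the parameters

restored = TinyNet()                                         # re-create the structure first
restored.load_state_dict(torch.load("tiny_net_params.pth")) # then load the parameters
restored.eval()                                              # switch to eval mode before inference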

2.2 Save and load the entire model

The following saves the entire network. For a large model this takes a long time and produces a larger file, and the serialized file is also bound to the exact class definitions used when saving, so this approach is not recommended.

# save
torch.save(the_model, PATH)
# load
the_model = torch.load(PATH)
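
If the model was saved on a GPU but is loaded on a CPU-only machine, torch.load accepts a map_location argument; a brief sketch (PATH is a placeholder, as above):

# map tensors saved from the GPU onto the CPU when loading
the_model = torch.load(PATH, map_location=torch.device("cpu"))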

2.3 Model file extensions

When saving models you will see files named ***.pt, ***.pth, or ***.pkl.
These files behave exactly the same; only the extension differs.
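
To make this concrete, the sketch below (reusing the_model from section 2.1; the file names are made up) saves the same state_dict under all three extensions and loads each one back:

state = the_model.state_dict()
for path in ["demo.pt", "demo.pth", "demo.pkl"]:
    torch.save(state, path)                        # identical content, different extension
    the_model.load_state_dict(torch.load(path))    # loads the same way regardless of suffix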

3. Saving and loading a model in practice: a language model (predicting the next word of a sentence)

Learning objectives

  • Learn what a language model is and how to train one
  • Learn the basics of torchtext
      • building a vocabulary
      • word to index and index to word
  • Learn some basic modules in torch.nn (a short shape-check sketch follows this list)
      • Linear
      • RNN
      • LSTM
      • GRU
  • RNN training tricks
      • Gradient Clipping
  • How to save and load a model
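
As a quick orientation for the torch.nn modules listed above, here is a small standalone sketch (the sizes are arbitrary and unrelated to the language model below) that checks their input and output shapes:

import torch
import torch.nn as nn

x = torch.randn(5, 3, 10)            # seq_length=5, batch_size=3, input_size=10

linear = nn.Linear(10, 7)
print(linear(x).shape)               # torch.Size([5, 3, 7]); Linear acts on the last dimension

lstm = nn.LSTM(input_size=10, hidden_size=20)
out, (h, c) = lstm(x)                # LSTM returns the outputs plus a (hidden, cell) tuple
print(out.shape, h.shape, c.shape)   # [5, 3, 20], [1, 3, 20], [1, 3, 20]

gru = nn.GRU(input_size=10, hidden_size=20)
out, h = gru(x)                      # GRU returns the outputs plus only a hidden state
print(out.shape, h.shape)            # [5, 3, 20], [1, 3, 20]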

3.1 Code

3.1.1 Data preparation

We use torchtext to build the vocabulary and then read the data in batches. Please read the torchtext README on your own to learn the library.

import torchtext
from torchtext.vocab import Vectors
import torch
import numpy as np
import random

# is CUDA available?
EXIST_CUDA=torch.cuda.is_available()

# fix the random seeds so the results are reproducible
random.seed(1)
np.random.seed(1)
torch.manual_seed(1)
if EXIST_CUDA:
    torch.cuda.manual_seed(1)

BATCH_SIZE=32    # 32 sequences per batch
EMBEDDING_SIZE=100 # embed each word into a 100-dimensional vector
MAX_VOCAB_SIZE=50000 # maximum size of the vocabulary (most frequent words)
  • We continue to use the text8 corpus from last time as our training, validation, and test data
  • torchtext provides the LanguageModelingDataset class to help us handle language-modeling datasets
  • BPTTIterator yields contiguous, coherent chunks of text

TEXT=torchtext.data.Field(lower=True)
# create the datasets
train,val,test=torchtext.datasets.LanguageModelingDataset.splits(
    path="datas/3/text8/",
    train="text8.train.txt",
    validation="text8.dev.txt",
    test="text8.test.txt",
    text_field=TEXT
)
# build the vocabulary from the most frequent words
TEXT.build_vocab(train,max_size=MAX_VOCAB_SIZE)
# build the iterators; as defined above, each batch holds 32 sequences
device=torch.device("cuda" if EXIST_CUDA else "cpu")
train_iter,val_iter,test_iter=torchtext.data.BPTTIterator.splits(
    (train,val,test),
    batch_size=BATCH_SIZE,
    device=device,
    bptt_len=50,
    repeat=False,
    shuffle=True
)
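
As an optional sanity check (not part of the original flow), you can look at one batch from the iterator and map a few indices back to words to confirm that the target is the input shifted by one position:

# peek at one batch from the training iterator
batch = next(iter(train_iter))
print(batch.text.shape, batch.target.shape)   # both are bptt_len x batch_size, i.e. 50 x 32
print(" ".join(TEXT.vocab.itos[i] for i in batch.text[:, 0].tolist()[:10]))
print(" ".join(TEXT.vocab.itos[i] for i in batch.target[:, 0].tolist()[:10]))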

3.1.2 Define the model

  • inherit from nn.Module
  • implement the __init__ function
  • implement the forward function
  • other functions can be added as the model requires

The model's input is a sequence of words and its output is also a sequence of words, shifted by one position, because the goal of a language model is to predict the next word given the previous words.

import torch.nn as nn
class RNNModel(nn.Module):
    def __init__(self,vocab_size,embed_size,hidden_size):
        super(RNNModel,self).__init__()
        self.embed=nn.Embedding(vocab_size,embed_size)
        self.lstm=nn.LSTM(embed_size,hidden_size)
        self.linear=nn.Linear(hidden_size,vocab_size)
        self.hidden_size=hidden_size
        
    def forward(self,text,hidden):
        #forward pass
        # the shape of text:seq_length * batch_size  
        emb=self.embed(text)#seq_length * batch_size * embed_size
        output,hidden=self.lstm(emb,hidden)
        out_vocab=self.linear(output.view(-1,output.shape[2]))
        out_vocab=out_vocab.view(output.size(0),output.size(1),out_vocab.size(-1))
        return out_vocab,hidden
    
    def init_hidden(self,bsz,requires_grad=True):
        # use an existing parameter so the new zero tensors share its device and dtype
        weight=next(self.parameters())
        return (weight.new_zeros((1,bsz,self.hidden_size),requires_grad=requires_grad),
                weight.new_zeros((1,bsz,self.hidden_size),requires_grad=requires_grad))
# instantiate the model
model=RNNModel(vocab_size=len(TEXT.vocab),embed_size=EMBEDDING_SIZE,hidden_size=100)
if EXIST_CUDA:
    model=model.to(device)
loss_fn=nn.CrossEntropyLoss()
learning_rate=0.001
optimizer=torch.optim.Adam(model.parameters(),lr=learning_rate)
NUM_EPOCHS=1 # number of passes over the full training data
VOCAB_SIZE=len(TEXT.vocab)
GRAD_CLIP=5.0
scheduler=torch.optim.lr_scheduler.ExponentialLR(optimizer,0.5) # each scheduler.step() halves the learning rate
val_losses=[10] # seeded with a large value so the first validation loss always triggers a save
  • First we define the code that evaluates the model.
  • Evaluation follows essentially the same logic as training; the only difference is that we only need the forward pass, not the backward pass
# evaluation
def evaluate(model,data):
    model.eval()
    it=iter(data)
    total_loss=0.
    total_count=0.
    with torch.no_grad():
        hidden=model.init_hidden(BATCH_SIZE,requires_grad=False)
        for i,batch in enumerate(it):
            data,target=batch.text,batch.target
            hidden=repackage_hidden(hidden)
            output,hidden=model(data,hidden)
            loss=loss_fn(output.view(-1,VOCAB_SIZE),target.view(-1))
            total_loss+=loss.item()*np.multiply(*data.size()) # weight the mean loss by the number of tokens in this batch
            total_count+=np.multiply(*data.size())
    loss=total_loss/total_count        
    model.train()
    return loss

# we need the following function to detach a hidden state from the history of the computation graph
def repackage_hidden(h):
    if isinstance(h,torch.Tensor):
        return h.detach()
    else:
        return tuple(repackage_hidden(v) for v in h)
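
A tiny optional check (not in the original code) shows the effect: detaching returns tensors that no longer require gradients, so backpropagation cannot flow into earlier batches.

h = torch.zeros(1, 2, 3, requires_grad=True)
h_detached = repackage_hidden((h, h))
print(h_detached[0].requires_grad)   # False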

3.1.3 Train and save the model

  • A model is usually trained for several epochs
  • In each epoch we split all of the data into batches
  • Wrap each batch's inputs and targets as cuda tensors
  • Forward pass: predict the next word at every position of the input sentences
  • Compute the cross entropy loss between the predictions and the true next words
  • Zero the model's current gradients
  • Backward pass
  • Gradient clipping, to prevent exploding gradients
  • Update the model parameters
  • Every so many iterations, print the current training loss and evaluate the model on the validation set
for epoch in range(NUM_EPOCHS):
    model.train()
    it=iter(train_iter)
    hidden=model.init_hidden(BATCH_SIZE)
    for i,batch in enumerate(it):
        data,target=batch.text,batch.target
        hidden=repackage_hidden(hidden)
        output,hidden=model(data,hidden)
        
        loss=loss_fn(output.view(-1,VOCAB_SIZE),target.view(-1))
        optimizer.zero_grad()
        
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(),GRAD_CLIP) # prevent exploding gradients
        optimizer.step()
        
        if i%100==0:
            print(i," loss:",loss.item())
            
        # every 1900 iterations, check the validation loss; if it is lower than the best so far, save the model parameters
        if i%1900==0:
            val_loss=evaluate(model,val_iter) # current model's loss on the validation set
            if val_loss < min(val_losses):
                print("best model saved to 03lstm.pth")
                torch.save(model.state_dict(),"model/3/03lstm.pth")
            # otherwise the loss is not improving, so reduce the learning rate with the scheduler defined above
            else:
                # decay the learning rate
                print("learning_rate decay")
                scheduler.step()
            val_losses.append(val_loss)
0  loss: 5.700478553771973
best model saved to 03lstm.pth
100  loss: 5.474410533905029
200  loss: 5.539557456970215
300  loss: 5.759274482727051
400  loss: 5.686248779296875
500  loss: 5.632628917694092
600  loss: 5.481709003448486
700  loss: 5.584092617034912
800  loss: 5.7943501472473145
900  loss: 5.541199207305908
1000  loss: 5.437957763671875
1100  loss: 5.763401031494141
1200  loss: 5.438232898712158
1300  loss: 5.728765487670898
1400  loss: 5.6812005043029785
1500  loss: 5.331437587738037
1600  loss: 5.531680107116699
1700  loss: 5.482674598693848
1800  loss: 5.578347206115723
1900  loss: 5.531027317047119
learning_rate decay
2000  loss: 5.6362833976745605
2100  loss: 5.604646682739258
2200  loss: 5.438443183898926
2300  loss: 5.304264068603516
2400  loss: 5.690061092376709
2500  loss: 5.453220367431641
2600  loss: 5.441572189331055
2700  loss: 5.776185512542725
2800  loss: 5.629850387573242
2900  loss: 5.619969367980957
3000  loss: 5.5757222175598145
3100  loss: 5.772238731384277
3200  loss: 5.692197322845459
3300  loss: 5.51469612121582
3400  loss: 5.358908176422119
3500  loss: 5.429351806640625
3600  loss: 5.5990190505981445
3700  loss: 5.883382797241211
3800  loss: 5.582748889923096
learning_rate decay
3900  loss: 5.5894575119018555
4000  loss: 5.436612606048584
4100  loss: 5.603799819946289
4200  loss: 5.246464729309082
4300  loss: 5.7568840980529785
4400  loss: 5.332048416137695
4500  loss: 5.250970840454102
4600  loss: 5.414524555206299
4700  loss: 5.852789878845215
4800  loss: 5.710803031921387
4900  loss: 5.4412336349487305
5000  loss: 5.87037467956543
5100  loss: 5.393296718597412
5200  loss: 5.630399703979492
5300  loss: 5.1652703285217285
5400  loss: 5.573890209197998
5500  loss: 5.438013076782227
5600  loss: 5.229452610015869
5700  loss: 5.355339527130127
best model saved to 03lstm.pth
5800  loss: 5.6232757568359375
5900  loss: 5.606210708618164
6000  loss: 5.606449604034424
6100  loss: 5.649041652679443
6200  loss: 5.638283729553223
6300  loss: 5.740434169769287
6400  loss: 5.819083213806152
6500  loss: 5.349177837371826
6600  loss: 5.7113494873046875
6700  loss: 5.720933437347412
6800  loss: 5.368650913238525
6900  loss: 5.252537250518799
7000  loss: 5.532567977905273
7100  loss: 5.527868270874023
7200  loss: 5.364249229431152
7300  loss: 5.634284496307373
7400  loss: 5.607549667358398
7500  loss: 5.378734111785889
7600  loss: 5.748443126678467
best model saved to 03lstm.pth
7700  loss: 5.56899356842041
7800  loss: 5.3647565841674805
7900  loss: 5.424122333526611
8000  loss: 5.5352325439453125
8100  loss: 5.26278018951416
8200  loss: 5.719631195068359
8300  loss: 5.376105308532715
8400  loss: 5.5696845054626465
8500  loss: 5.4810261726379395
8600  loss: 5.4345703125
8700  loss: 5.505951404571533
8800  loss: 5.745686054229736
8900  loss: 5.7545599937438965
9000  loss: 5.610304355621338
9100  loss: 5.596979141235352
9200  loss: 5.378000259399414
9300  loss: 5.5428948402404785
9400  loss: 5.66567325592041
9500  loss: 5.3651909828186035
best model saved to 03lstm.pth

3.1.4 Load the trained model

best_model=RNNModel(vocab_size=len(TEXT.vocab),embed_size=EMBEDDING_SIZE,hidden_size=100)
if EXIST_CUDA:
    best_model=best_model.to(device)
best_model.load_state_dict(torch.load("model/3/03lstm.pth"))
<All keys matched successfully>
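
As a general habit when loading a model for inference, you can switch it to evaluation mode. It makes no difference for this particular model (it has no dropout or batch-norm layers), and evaluate() already calls model.eval() internally:

best_model.eval()   # disables dropout/batch-norm training behaviour where present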

Use the trained model to generate a sentence

hidden = best_model.init_hidden(1)   # batch size 1 for generation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# start from a random word index
input = torch.randint(VOCAB_SIZE, (1, 1), dtype=torch.long).to(device)
words = []
for i in range(100):
    output, hidden = best_model(input, hidden)
    # exponentiate the logits; multinomial normalizes the weights, so this samples from the softmax distribution
    word_weights = output.squeeze().exp().cpu()
    word_idx = torch.multinomial(word_weights, 1)[0]
    input.fill_(word_idx)            # feed the sampled word back in as the next input
    word = TEXT.vocab.itos[word_idx] # map the index back to a word
    words.append(word)
print(" ".join(words))
unfair trees have some perfect secret use only the boy period per years the density of the pyruvate of steam bass operators often odd this article superior s point can transform its superpower state of parent and the responsibility for each other to who is identical to the royal society against adventurers and is an thrust to form decree of fundamental musicians are cross climate to poland than to be blinded meters and hence for ftp breeding to be defined is presenting strong consequences of the confession include java problem solomon <unk> minuet razor algorithm or opponents dispute by mina

Use the trained model to compute perplexity on the test data

test_loss = evaluate(best_model, test_iter)
print("perplexity: ", np.exp(test_loss))
perplexity:  261.8622638740215
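
For reference, perplexity is the exponential of the average per-token cross-entropy loss, which is exactly what np.exp(test_loss) computes: a perplexity of about 261.86 corresponds to a test loss of roughly ln(261.86) ≈ 5.57.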

4. References


https://www.zhihu.com/question/274533811?sort=created