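41 Activation Functions and GPU Acceleration
sigmoid/tanh suffer from the vanishing gradient problem: once the input saturates, the gradient (derivative) approaches 0.
relu is continuous at x=0 but not differentiable there. For x<0 the gradient is 0, and for x>0 the gradient is a constant 1, so the gradient passes through stacked layers without being shrunk or amplified by the activation. This greatly reduces the vanishing and exploding gradients that the activation itself would otherwise cause.
leaky relu: because relu has zero gradient for x<0, the x<0 side is set to y=a*x, which gives a small gradient a instead of 0. The slope a is small; PyTorch's nn.LeakyReLU defaults to negative_slope=0.01.
selu = relu + an exponential function: the x<0 side is replaced by a scaled exponential curve, so the output changes smoothly below x=0 instead of being clamped to 0.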
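softplus: a smooth approximation of relu, log(1 + e^x), which rounds off the corner at x=0 so the curve is differentiable there.
GPU acceleration
Use the .to(device) method.
Note that for a tensor, data and data.to(device) are different objects, because one is the CPU copy and the other is the GPU copy. For an nn.Module, net.to(device) moves the parameters in place and returns the same object.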
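The difference between moving a module and moving a tensor can be checked directly. The snippet below is a minimal sketch, not part of the original notes: the tiny nn.Linear layer and the variable names are made up for illustration, and it falls back to the CPU when no GPU is present (in that case data.to(device) simply returns the same tensor).

```python
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

net = nn.Linear(4, 2)            # hypothetical tiny model
net_on_device = net.to(device)   # Module.to moves the parameters in place
same_module = net_on_device is net

data = torch.randn(3, 4)         # created on the CPU
data_on_device = data.to(device) # a new tensor when the device actually changes

out = net(data_on_device)        # input and parameters now live on the same device
print(same_module, data.device, data_on_device.device, out.shape)
```

Feeding the CPU tensor data to a model that lives on the GPU raises a device-mismatch error, which is why the training loop moves both data and target on every batch.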
Switching the activation to LeakyReLU and moving the computation to the GPU raised the accuracy from 83% with the original relu to 94%.
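The gradient behaviour of the activations described in this section can be inspected with autograd. This is a small sketch under assumptions: the helper grad_at and the sample input values are chosen here for illustration only.

```python
import torch
import torch.nn.functional as F

def grad_at(fn, value):
    """Return d fn(x) / dx evaluated at x = value, using autograd."""
    x = torch.tensor(value, requires_grad=True)
    fn(x).backward()
    return x.grad.item()

sigmoid_saturated = grad_at(torch.sigmoid, 10.0)  # tiny: sigmoid is saturated here
relu_negative = grad_at(F.relu, -2.0)             # zero gradient for x < 0
relu_positive = grad_at(F.relu, 3.0)              # constant gradient of 1 for x > 0
leaky_negative = grad_at(F.leaky_relu, -2.0)      # small slope (default negative_slope=0.01)
softplus_zero = grad_at(F.softplus, 0.0)          # smooth at 0: the derivative is sigmoid(0)

print(sigmoid_saturated, relu_negative, relu_positive, leaky_negative, softplus_zero)
```

The leaky relu value is the point of the comparison: a unit whose input is negative still receives a small gradient, whereas with plain relu it receives none.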
import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms

# Hyperparameters
batch_size = 200
learning_rate = 0.01
epochs = 10

# Training data
train_db = datasets.MNIST('../data', train=True, download=True,  # train=True selects the training set
                          transform=transforms.Compose([  # preprocessing pipeline
                              transforms.ToTensor(),  # convert to a Tensor
                              transforms.Normalize((0.1307,), (0.3081,))  # standardize: subtract the mean, divide by the standard deviation
                          ]))
# DataLoader splits the data into mini-batches and yields one batch per iteration until the data is exhausted
train_loader = torch.utils.data.DataLoader(train_db, batch_size=batch_size, shuffle=True)

# Test data
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(test_db, batch_size=batch_size, shuffle=True)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(  # Sequential chains the layers together
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


# Train
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')  # fall back to the CPU when no GPU is present
net = MLP().to(device)  # the network structure, i.e. what forward() runs
optimizer = optim.SGD(net.parameters(), lr=learning_rate)  # net.parameters() replaces the hand-written [w1, b1, w2, b2, ...] list
criteon = nn.CrossEntropyLoss().to(device)

for epoch in range(epochs):
    for batch_ind, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)  # equivalent to target.cuda() on a GPU
        logits = net(data)  # do not add softmax here: CrossEntropyLoss applies it internally, so the logits serve as pred
        loss = criteon(logits, target)  # compute the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch_ind % 100 == 0:
            print('Train Epoch:{} [{}/{} ({:.0f}%)]\t Loss:{:.6f}'.format(
                epoch, batch_ind * len(data), len(train_loader.dataset),
                100. * batch_ind / len(train_loader), loss.item()
            ))

# Test once, after all epochs have finished
test_loss = 0
correct = 0
with torch.no_grad():  # no gradients are needed for evaluation
    for data, target in test_loader:
        data = data.view(-1, 28 * 28)  # -1 keeps the first (batch) dimension unchanged
        data, target = data.to(device), target.to(device)
        logits = net(data)
        test_loss += criteon(logits, target).item()
        pred = logits.argmax(dim=1)  # the index of the largest logit along dim=1 is the predicted class
        correct += pred.eq(target).sum().item()
# Kept as in the logged run: this divides the sum of per-batch mean losses by the
# training-set size, so the printed value is not a true per-sample test loss.
test_loss /= len(train_loader.dataset)
print('\n test set:average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
    test_loss, correct, len(test_loader.dataset),
    100. * correct / len(test_loader.dataset)
))
'''
F:\anaconda\envs\pytorch\python.exe F:/pythonProject1/pythonProject3/ll.py
Train Epoch:0 [0/60000 (0%)] Loss:2.315502
Train Epoch:0 [20000/60000 (33%)] Loss:2.117644
Train Epoch:0 [40000/60000 (67%)] Loss:1.659186
Train Epoch:1 [0/60000 (0%)] Loss:1.290930
Train Epoch:1 [20000/60000 (33%)] Loss:1.049087
Train Epoch:1 [40000/60000 (67%)] Loss:0.872082
Train Epoch:2 [0/60000 (0%)] Loss:0.528612
Train Epoch:2 [20000/60000 (33%)] Loss:0.402818
Train Epoch:2 [40000/60000 (67%)] Loss:0.400452
Train Epoch:3 [0/60000 (0%)] Loss:0.318432
Train Epoch:3 [20000/60000 (33%)] Loss:0.344411
Train Epoch:3 [40000/60000 (67%)] Loss:0.443066
Train Epoch:4 [0/60000 (0%)] Loss:0.310835
Train Epoch:4 [20000/60000 (33%)] Loss:0.263893
Train Epoch:4 [40000/60000 (67%)] Loss:0.292117
Train Epoch:5 [0/60000 (0%)] Loss:0.331171
Train Epoch:5 [20000/60000 (33%)] Loss:0.192741
Train Epoch:5 [40000/60000 (67%)] Loss:0.396357
Train Epoch:6 [0/60000 (0%)] Loss:0.363707
Train Epoch:6 [20000/60000 (33%)] Loss:0.225204
Train Epoch:6 [40000/60000 (67%)] Loss:0.218652
Train Epoch:7 [0/60000 (0%)] Loss:0.209941
Train Epoch:7 [20000/60000 (33%)] Loss:0.210056
Train Epoch:7 [40000/60000 (67%)] Loss:0.296629
Train Epoch:8 [0/60000 (0%)] Loss:0.361880
Train Epoch:8 [20000/60000 (33%)] Loss:0.213277
Train Epoch:8 [40000/60000 (67%)] Loss:0.170169
Train Epoch:9 [0/60000 (0%)] Loss:0.301176
Train Epoch:9 [20000/60000 (33%)] Loss:0.175931
Train Epoch:9 [40000/60000 (67%)] Loss:0.214820
test set:average loss:0.0002,Accuracy:9370/10000 (94%)
Process finished with exit code 0
'''
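42 Testing
Training for a long time on the training set can drive the loss very low and the accuracy very high, yet the model may only have memorized superficial patterns instead of learning the underlying structure. This is overfitting. Testing on a validation set reveals it: in the later stages the accuracy becomes unstable or even drops, and the loss becomes unstable or even rises. So more training is not always better; the amount of data and the architecture are what matter most.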
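import torch
import torch.nn.functional as F

logits = torch.rand(4, 10)
# 4 images, each with a 10-dimensional output vector (one score per class 0-9)
pred = F.softmax(logits, dim=1)
# softmax over dim=1 normalizes the scores of each image; softmax over dim=0 also returns a [4, 10] tensor, but with different values
pred_label = pred.argmax(dim=1)
logits.argmax(dim=1)
# argmax of pred and argmax of logits both return a tensor of shape [b], and the two agree:
# softmax preserves the ordering of the scores within each image, so either one can be used
label = torch.randint(0, 10, (4,))  # placeholder ground-truth labels; the notes do not define label
correct = torch.eq(pred_label, label)  # check whether each prediction is correct
correct.sum().float().item() / 4  # the accuracy; correct is a tensor, and .item() extracts the scalar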
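When to test: testing more often gives finer-grained monitoring of the accuracy, while testing less often leaves more of the run time for training.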
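The scripts in these notes add up the mean loss of each batch and then divide by a dataset size, so the printed "average loss" is not a true per-sample average. One possible way to compute a per-sample average is sketched below; the function name evaluate, the use of reduction='sum', and the random toy data are assumptions made for this illustration, not the course's code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(net, loader, device):
    """Return (average loss per sample, accuracy) over every sample in loader."""
    net.eval()
    criterion = nn.CrossEntropyLoss(reduction='sum')  # sum per batch, divide once at the end
    total_loss, correct = 0.0, 0
    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            logits = net(data.view(data.size(0), -1))  # flatten each image to a vector
            total_loss += criterion(logits, target).item()
            correct += logits.argmax(dim=1).eq(target).sum().item()
    n = len(loader.dataset)
    return total_loss / n, correct / n

# Toy stand-in for MNIST: 50 random 28x28 "images" with random labels.
torch.manual_seed(0)
device = torch.device('cpu')
toy_set = TensorDataset(torch.randn(50, 1, 28, 28), torch.randint(0, 10, (50,)))
toy_loader = DataLoader(toy_set, batch_size=20)
toy_net = nn.Linear(784, 10)

avg_loss, acc = evaluate(toy_net, toy_loader, device)
print('average loss: {:.4f}, accuracy: {:.2f}'.format(avg_loss, acc))
```

Calling such a function once per epoch, or every few hundred batches, is how the test frequency would be controlled.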
# -*- coding: utf-8 -*-
# @Time :2021/5/14 21:06
# @Author:sueong
# @File:ll.py
# @Software:PyCharm
import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms

# Hyperparameters
batch_size = 200
learning_rate = 0.01
epochs = 10

# Training data
train_db = datasets.MNIST('../data', train=True, download=True,  # train=True selects the training set
                          transform=transforms.Compose([  # preprocessing pipeline
                              transforms.ToTensor(),  # convert to a Tensor
                              transforms.Normalize((0.1307,), (0.3081,))  # standardize: subtract the mean, divide by the standard deviation
                          ]))
# DataLoader splits the data into mini-batches and yields one batch per iteration until the data is exhausted
train_loader = torch.utils.data.DataLoader(train_db, batch_size=batch_size, shuffle=True)

# Test data
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(test_db, batch_size=batch_size, shuffle=True)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(  # Sequential chains the layers together
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


# Train
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')  # fall back to the CPU when no GPU is present
net = MLP().to(device)  # the network structure, i.e. what forward() runs
optimizer = optim.SGD(net.parameters(), lr=learning_rate)  # net.parameters() replaces the hand-written [w1, b1, w2, b2, ...] list
criteon = nn.CrossEntropyLoss().to(device)

for epoch in range(epochs):
    for batch_ind, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)  # equivalent to target.cuda() on a GPU
        logits = net(data)  # do not add softmax here: CrossEntropyLoss applies it internally, so the logits serve as pred
        loss = criteon(logits, target)  # compute the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch_ind % 100 == 0:
            print('Train Epoch:{} [{}/{} ({:.0f}%)]\t Loss:{:.6f}'.format(
                epoch, batch_ind * len(data), len(train_loader.dataset),
                100. * batch_ind / len(train_loader), loss.item()
            ))

    # Test once per epoch; the accuracy can be seen increasing from epoch to epoch
    test_loss = 0
    correct = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for data, target in test_loader:
            data = data.view(-1, 28 * 28)  # -1 keeps the first (batch) dimension unchanged
            data, target = data.to(device), target.to(device)
            logits = net(data)
            test_loss += criteon(logits, target).item()
            pred = logits.argmax(dim=1)  # the index of the largest logit along dim=1 is the predicted class
            correct += pred.eq(target).sum().item()
    # Kept as in the logged run: this divides the sum of per-batch mean losses by the
    # training-set size, so the printed value is not a true per-sample test loss.
    test_loss /= len(train_loader.dataset)
    print('\n test set:average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)
    ))
'''
F:\anaconda\envs\pytorch\python.exe F:/pythonProject1/pythonProject3/ll.py
Train Epoch:0 [0/60000 (0%)] Loss:2.308717
Train Epoch:0 [20000/60000 (33%)] Loss:2.017611
Train Epoch:0 [40000/60000 (67%)] Loss:1.563952
test set:average loss:0.0011,Accuracy:6175/10000 (62%)
Train Epoch:1 [0/60000 (0%)] Loss:1.301144
Train Epoch:1 [20000/60000 (33%)] Loss:1.313298
Train Epoch:1 [40000/60000 (67%)] Loss:1.184744
test set:average loss:0.0008,Accuracy:7102/10000 (71%)
Train Epoch:2 [0/60000 (0%)] Loss:0.946402
Train Epoch:2 [20000/60000 (33%)] Loss:0.762401
Train Epoch:2 [40000/60000 (67%)] Loss:0.697880
test set:average loss:0.0004,Accuracy:8841/10000 (88%)
Train Epoch:3 [0/60000 (0%)] Loss:0.579781
Train Epoch:3 [20000/60000 (33%)] Loss:0.480412
Train Epoch:3 [40000/60000 (67%)] Loss:0.347749
test set:average loss:0.0003,Accuracy:9047/10000 (90%)
Train Epoch:4 [0/60000 (0%)] Loss:0.363675
Train Epoch:4 [20000/60000 (33%)] Loss:0.304079
Train Epoch:4 [40000/60000 (67%)] Loss:0.401550
test set:average loss:0.0003,Accuracy:9118/10000 (91%)
Train Epoch:5 [0/60000 (0%)] Loss:0.324268
Train Epoch:5 [20000/60000 (33%)] Loss:0.269142
Train Epoch:5 [40000/60000 (67%)] Loss:0.284855
test set:average loss:0.0002,Accuracy:9195/10000 (92%)
Train Epoch:6 [0/60000 (0%)] Loss:0.181122
Train Epoch:6 [20000/60000 (33%)] Loss:0.214253
Train Epoch:6 [40000/60000 (67%)] Loss:0.310929
test set:average loss:0.0002,Accuracy:9229/10000 (92%)
Train Epoch:7 [0/60000 (0%)] Loss:0.233558
Train Epoch:7 [20000/60000 (33%)] Loss:0.345559
Train Epoch:7 [40000/60000 (67%)] Loss:0.240973
test set:average loss:0.0002,Accuracy:9286/10000 (93%)
Train Epoch:8 [0/60000 (0%)] Loss:0.197916
Train Epoch:8 [20000/60000 (33%)] Loss:0.368038
Train Epoch:8 [40000/60000 (67%)] Loss:0.367101
test set:average loss:0.0002,Accuracy:9310/10000 (93%)
Train Epoch:9 [0/60000 (0%)] Loss:0.221928
Train Epoch:9 [20000/60000 (33%)] Loss:0.190280
Train Epoch:9 [40000/60000 (67%)] Loss:0.183632
test set:average loss:0.0002,Accuracy:9351/10000 (94%)
Process finished with exit code 0
'''
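43 Visualization
TensorBoard
visdom
1 Install: pip install visdom
2 Run the server daemon: python -m visdom.server
The legend option holds the labels for the y1 and y2 curves.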
# -*- coding: utf-8 -*-
# @Time :2021/5/14 21:06
# @Author:sueong
# @File:ll.py
# @Software:PyCharm
import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms
from visdom import Visdom

# Hyperparameters
batch_size = 200
learning_rate = 0.01
epochs = 10

# Training data
train_db = datasets.MNIST('../data', train=True, download=True,  # train=True selects the training set
                          transform=transforms.Compose([  # preprocessing pipeline
                              transforms.ToTensor(),  # convert to a Tensor
                              transforms.Normalize((0.1307,), (0.3081,))  # standardize: subtract the mean, divide by the standard deviation
                          ]))
# DataLoader splits the data into mini-batches and yields one batch per iteration until the data is exhausted
train_loader = torch.utils.data.DataLoader(train_db, batch_size=batch_size, shuffle=True)

# Test data
test_db = datasets.MNIST('../data', train=False,
                         transform=transforms.Compose([
                             transforms.ToTensor(),
                             transforms.Normalize((0.1307,), (0.3081,))
                         ]))
test_loader = torch.utils.data.DataLoader(test_db, batch_size=batch_size, shuffle=True)


class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(  # Sequential chains the layers together
            nn.Linear(784, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 200),
            nn.LeakyReLU(inplace=True),
            nn.Linear(200, 10),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x


# Train
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')  # fall back to the CPU when no GPU is present
net = MLP().to(device)  # the network structure, i.e. what forward() runs
optimizer = optim.SGD(net.parameters(), lr=learning_rate)  # net.parameters() replaces the hand-written [w1, b1, w2, b2, ...] list
criteon = nn.CrossEntropyLoss().to(device)

# Before the train/test loop starts, define two plots. These act as placeholders;
# points are appended during training and testing so the curves grow dynamically.
'''
Step 1 (creating Visdom) accepts env='xxx' to name the environment; nothing is passed here,
so everything is drawn in the default main environment.
Steps 2 and 3 (the two viz.line calls) take the Y and then the X coordinates as their first
two arguments (the vertical axis comes first, the horizontal axis second).
Both are set to 0 here as placeholders. (Starting the loss curve at Y=0 produces an ugly
jump at the beginning of the plot, because the loss always moves from large to small.)
Different win arguments place the plots in different windows.
Step 3 defines two curves, the test loss and the test acc, so Y gets two initial values at X=0.
'''
viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train loss'))
viz.line([[0.0, 0.0]], [0.], win='test', opts=dict(title='test loss&acc.', legend=['loss', 'acc.']))
# A global counter that tracks how many batches have been trained so far
global_step = 0

for epoch in range(epochs):
    for batch_ind, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28 * 28)
        data, target = data.to(device), target.to(device)  # equivalent to target.cuda() on a GPU
        logits = net(data)  # do not add softmax here: CrossEntropyLoss applies it internally, so the logits serve as pred
        loss = criteon(logits, target)  # compute the loss
        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()
        # After each batch, append a point to the training curve so it grows in real time.
        # win selects the plot, update='append' adds the point; Y comes first, then X.
        global_step += 1
        viz.line([loss.item()], [global_step], win='train_loss', update='append')
        if batch_ind % 100 == 0:
            print('Train Epoch:{} [{}/{} ({:.0f}%)]\t Loss:{:.6f}'.format(
                epoch, batch_ind * len(data), len(train_loader.dataset),
                100. * batch_ind / len(train_loader), loss.item()
            ))

    # Test once per epoch; the accuracy can be seen increasing from epoch to epoch
    test_loss = 0
    correct = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for data, target in test_loader:
            data = data.view(-1, 28 * 28)  # -1 keeps the first (batch) dimension unchanged
            data, target = data.to(device), target.to(device)
            logits = net(data)
            test_loss += criteon(logits, target).item()
            pred = logits.argmax(dim=1)  # index of the maximum along dim=1
            correct += pred.eq(target).float().sum().item()
    # After each test pass, append to the test plot, and show the last batch of images (.images)
    # and its predicted labels (.text) in two further windows selected with win.
    viz.line([[test_loss, correct / len(test_loader.dataset)]], [global_step], win='test', update='append')
    viz.images(data.view(-1, 1, 28, 28), win='x')
    # The instructor's code calls .detach() and moves the data to the CPU so that it can be displayed.
    viz.text(str(pred.detach().cpu().numpy()), win='pred',
             opts=dict(title='pred'))
    test_loss /= len(test_loader.dataset)  # divide by the test-set size (the notes originally used train_loader here)
    print('\n test set:average loss:{:.4f},Accuracy:{}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)
    ))
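44 Underfitting and Overfitting
The real data distribution matches our intuition, but we know neither the true distribution nor the parameters of the function behind it.
Moreover, these functions are generally not linear, and the observations may contain noise.
The higher the polynomial degree, the larger the swings and the more complex the curve.
The degree measures the learning capacity of the model: as it increases, the expressive power grows, more complex distributions can be represented, and more complex mappings can be learned. In other words, the model capacity increases.
Estimated: the expressive power of the model being used.
Ground-truth: the complexity of the true model.
case1: Estimated < Ground-truth: under-fitting. The model is not expressive enough, which leads to underfitting.
Signs of underfitting: check whether increasing the model complexity or the number of layers improves the results.
case2: Ground-truth < Estimated: over-fitting. The model is too complex and fits the noise in the limited dataset, so it generalizes poorly.
It performs well on train but poorly on test.
In practice, overfitting is the more common case.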
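The link between polynomial degree and model capacity can be tried out on synthetic data. The sketch below is an illustration under assumptions: the cubic ground_truth function, the noise level, and the chosen degrees are invented here and do not come from the notes. A higher degree never fits the training points worse; with a true cubic, degree 1 typically underfits, while a degree far above 3 tends to start fitting the noise.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def ground_truth(x):
    # Assumed "true" function for this illustration: a cubic.
    return x ** 3 - 0.5 * x

# Small noisy training set and a separate noisy test set.
x_train = np.linspace(-1.0, 1.0, 20)
y_train = ground_truth(x_train) + rng.normal(scale=0.1, size=x_train.shape)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = ground_truth(x_test) + rng.normal(scale=0.1, size=x_test.shape)

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse[degree] = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse[degree] = float(np.mean((model(x_test) - y_test) ** 2))

for degree in (1, 3, 9):
    print('degree={}  train MSE={:.4f}  test MSE={:.4f}'.format(
        degree, train_mse[degree], test_mse[degree]))
```

Comparing the two columns of the printout is the same check as comparing train and test performance for a neural network.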
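45 Cross-Validation 1: Train-Val-Test Split
The purpose of testing during training is to check for overfitting and to pick the best parameters from before overfitting sets in. The test_loader used so far is therefore really a validation set.
We usually pick the point where the test acc is highest, stop training there, and take that point as the final state of the model.
Save the parameters w and b that performed best before overfitting.
val set: used to select the model parameters and to stop training before overfitting.
test set: used for the final evaluation, the way a customer would check performance at acceptance time. The test set must be data the model has never seen and must be kept apart from train and val. If the customer tested on the val set, the model would already have been tuned on it, the score would look very good, and that would be cheating. Because the test set is meant to stay unseen, adjusting parameters based on the acc reported on it would make test play the same role as val and contaminate the data.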
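A train/val split with "keep the best parameters" can be sketched as follows. This is a minimal illustration under assumptions: random tensors stand in for MNIST, a single nn.Linear layer stands in for the MLP, and the split sizes and epoch count are arbitrary. torch.utils.data.random_split carves a validation set out of the training data, and the state_dict with the highest validation accuracy is kept.

```python
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Stand-in for the MNIST training set: 600 random flattened "images".
full_train = TensorDataset(torch.randn(600, 784), torch.randint(0, 10, (600,)))
train_db, val_db = random_split(full_train, [500, 100])  # hold out 100 samples for validation
train_loader = DataLoader(train_db, batch_size=50, shuffle=True)
val_loader = DataLoader(val_db, batch_size=50)

net = nn.Linear(784, 10)  # hypothetical tiny model
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

best_acc, best_state = -1.0, None
for epoch in range(3):
    net.train()
    for data, target in train_loader:
        loss = criterion(net(data), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Validate after every epoch and remember the best parameters seen so far.
    net.eval()
    correct = 0
    with torch.no_grad():
        for data, target in val_loader:
            correct += net(data).argmax(dim=1).eq(target).sum().item()
    val_acc = correct / len(val_db)
    if val_acc > best_acc:
        best_acc, best_state = val_acc, copy.deepcopy(net.state_dict())

net.load_state_dict(best_state)  # restore the parameters that did best on the val set
print('best val acc: {:.2f}'.format(best_acc))
```

The real test set would be evaluated only once, after this loop, using the restored parameters.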