A Very Basic PyTorch Tutorial

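Both the MNIST and ImageNet examples there are worth studying. The command-line argument handling is largely incidental and can be skipped; focus on the standard training pattern. See also [Learning PyTorch with Examples](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html).

The official tutorials cover the same material, so read the two together.

After working through the above you will probably want to start coding. If that is not enough, [yunjey/pytorch-tutorial](https://github.com/yunjey/pytorch-tutorial) has several more introductory examples.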

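Step 3: Use PyTorch with the documentation open

The official [PyTorch documentation](https://pytorch.org/docs/master/index.html) has some shortcomings: many key concepts and principles are not explained clearly. As an API reference, however, it is quite good. Skim it once to get an impression of what PyTorch can do, then start on your own task and look up the API whenever you need a specific operation.

At that point you are past the beginner stage, so go ahead and use PyTorch for your own work.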

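Resources: commonly used resources

Once past the basics, a few resources come up again and again in day-to-day use:

[bharathgs/Awesome-pytorch-list](https://github.com/bharathgs/Awesome-pytorch-list):
Part of the Awesome series, a curated collection of PyTorch resources. Look here when you need something: models, interesting applications, more tutorials, paper reproductions, and so on.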

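1. Main contents of the Awesome list:

(1) PyTorch & related libraries: this part has a single resource, the official PyTorch website.
(2) NLP & speech processing: currently 26 resources, mainly covering speech processing, NLP, multi-speaker speech processing, speech synthesis, machine translation, and so on.
(3) Computer vision: currently 14 resources, mainly covering image augmentation, semantic segmentation, style transfer, and so on.
(4) Probabilistic/generative libraries: currently 7 resources, mainly covering probabilistic programming, statistical inference, and generative models.
(5) Other libraries: currently 78 resources, covering PyTorch libraries outside the areas above.
(6) Tutorials & examples: currently 53 resources, including the official tutorials, write-ups by individual developers, and tutorials in Chinese.
(7) Paper implementations: the largest part, currently 273 resources. It covers most top-tier papers; if you are interested, bookmark it and work through them one by one.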

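2. Related links:

(1) [PyTorch Forums](https://discuss.pytorch.org/):
The official PyTorch forum. Besides searching Google or Baidu and opening an issue on GitHub, you can ask questions here; I have found answers to quite a few problems on it.

(2) [Cadene/pretrained-models.pytorch](https://github.com/Cadene/pretrained-models.pytorch):
Finally, for building custom networks, this repository provides pretrained weights for Inception, ResNet, ResNeXt, and many other models that you can modify as a starting point.

(3) A video course on Bilibili by a lecturer at Hebei University of Technology: https://www.bilibili.com/video/BV1Y7411d7Ys/?spm_id_from=333.788.recommend_more_video.1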

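Summary: PyTorch is great, but how to use many specific features is not obvious: how to control loading weights from different models, how to run on multiple GPUs in parallel, how to set a per-layer learning rate and weight decay, how to adjust the learning rate during training, and so on. You have to work these out yourself, since the official support is not very user-friendly yet. Later posts may cover these topics.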
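
Two of the points above, a per-layer learning rate and weight decay and adjusting the learning rate over time, can be sketched with optimizer parameter groups and a scheduler. This is only a minimal sketch: the two-layer model and all hyperparameter values are placeholders I chose for illustration.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical two-layer model, used only for illustration.
model = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 2))

# Each dict is one parameter group; keys given in a dict override the defaults below.
optimizer = optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 1e-3, "weight_decay": 1e-4},
        {"params": model[2].parameters()},  # falls back to the defaults
    ],
    lr=1e-2,
    momentum=0.9,
    weight_decay=0.0,
)

# Multiply every group's learning rate by 0.1 every 10 scheduler steps.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

group_lrs = [group["lr"] for group in optimizer.param_groups]
print(group_lrs)
```

In a training loop, scheduler.step() is usually called once per epoch, after optimizer.step().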

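Part 0. Introduction to PyTorch

1. Advantages of PyTorch

PyTorch is one of the mainstream deep learning frameworks. Its advantages:

(1) Tensors (similar to NumPy arrays) can be accelerated on a GPU.

(2) Deep neural networks are built on top of autograd.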

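2. The process of training a DNN with PyTorch

Neural networks are built with torch.nn; the nn package relies on autograd to define models and compute gradients. An nn.Module object contains the network's layers and a forward(input) method that computes and returns the output.

Training a neural network usually involves the following steps:

Define a neural network, usually with some trainable parameters
Iterate over a dataset (Dataset)
Process the input through the network
Compute the loss (this calls the Module object's forward() method)
Compute the gradient of the loss function with respect to the parameters
Update the parameters, typically with a gradient descent rule such as: weight = weight - learning_rate × gradient.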
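
The steps above can be sketched with plain tensors and autograd, without torch.nn. The toy data (y = 2x + 1), the learning rate, and the number of steps are assumptions made for this illustration.

```python
import torch

# Toy data for y = 2x + 1 (an assumed example).
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2 * x + 1

# Trainable parameters.
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
learning_rate = 0.1

losses = []
for step in range(200):
    pred = x * w + b                   # forward pass
    loss = ((pred - y) ** 2).mean()    # mean squared error
    loss.backward()                    # gradients of loss w.r.t. w and b
    with torch.no_grad():              # the update itself must not be tracked
        w -= learning_rate * w.grad    # weight = weight - learning_rate * gradient
        b -= learning_rate * b.grad
    w.grad.zero_()                     # gradients accumulate, so reset them
    b.grad.zero_()
    losses.append(loss.item())

print(w.item(), b.item(), losses[-1])
```

After training, w and b should be close to 2 and 1 respectively.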

Part 1. Data operations (Tensor)

1.1 Creating a Tensor

(1) Create an uninitialized Tensor

import torch

# Create an uninitialized Tensor
x = torch.empty(5, 3)
print(x)

#### Output: ####

tensor([[-7.9905e+25,  8.1556e-43, -7.9905e+25],
        [ 8.1556e-43, -7.9899e+25,  8.1556e-43],
        [-7.9899e+25,  8.1556e-43, -7.9884e+25],
        [ 8.1556e-43, -7.9884e+25,  8.1556e-43],
        [-7.9900e+25,  8.1556e-43, -7.9900e+25]])

(2) Create a randomly initialized Tensor

# Create a randomly initialized Tensor
x = torch.rand(5, 3)
print(x)

#### Output: ####

tensor([[0.1757, 0.9102, 0.0980],
        [0.0969, 0.6846, 0.5546],
        [0.3665, 0.2245, 0.2967],
        [0.5773, 0.4293, 0.5060],
        [0.0633, 0.2833, 0.2325]])

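To get a random permutation, torch.randperm(10) returns the integers 0 to 9 in random order.
To generate values over a range, use torch.arange(10, 30, 5) for a fixed step, or torch.linspace for a fixed number of points.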

torch.linspace(2, 10, steps = 9)
Out[5]: tensor([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

torch.arange(10, 30, 5)
Out[6]: tensor([10, 15, 20, 25])
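
A small sketch contrasting these range and random helpers. The call to torch.randint is my addition, shown for the case where independent random integers (with possible repeats) are wanted instead of a permutation.

```python
import torch

perm = torch.randperm(10)                # a random ordering of 0..9, each exactly once
print(perm)

steps = torch.arange(10, 30, 5)          # start, end (exclusive), step
grid = torch.linspace(2, 10, steps=9)    # 9 evenly spaced points, both ends included
print(steps, grid)

# Independent random integers in [0, 10), repeats allowed.
draws = torch.randint(low=0, high=10, size=(10,))
print(draws)
```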

(3) Create an all-zeros Tensor

# Create an all-zeros Tensor
x = torch.zeros(5, 3, dtype = torch.long)
print(x)

#### Output: ####

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

(4) Create a Tensor from data

# Create a Tensor from data
x = torch.tensor([5.5, 3])
print(x)

Output:

tensor([5.5000, 3.0000])

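(5) Create an all-ones Tensor from an existing Tensor

# new_ones returns a new all-ones Tensor; dtype and device default to those of x
x = x.new_ones(5, 3, dtype = torch.float64)
print(x)

# rand_like returns a random Tensor with the same shape as x
x = torch.rand_like(x, dtype = torch.float64)
print(x)

#### Output: ####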

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[0.3330, 0.9622, 0.9146],
        [0.2841, 0.9874, 0.3035],
        [0.2449, 0.2221, 0.1693],
        [0.2697, 0.7510, 0.7994],
        [0.1660, 0.9774, 0.4102]], dtype=torch.float64)

(6) Get the shape of a Tensor

# Get the shape of a Tensor
print(x.size())
print(x.shape)
# Note: the returned torch.Size is a tuple subclass and supports all tuple operations

#### Output: ####
torch.Size([5, 3])
torch.Size([5, 3])

(7) Initialize from evenly spaced values

# Evenly spaced values with linspace
torch.linspace(2, 10, steps = 9)
# tensor([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

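1.2 Basic operations (arithmetic, indexing, reshaping)

1.2.1 Arithmetic operations

The same operation can often be written in several ways. Take addition as an example:
(1) Form 1:

# The same operation can be written in several forms
# Form 1: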
y = torch.rand(5, 3)
print(x + y)

tensor([[0.6024, 1.9602, 0.9764],
        [1.2583, 1.6134, 0.6532],
        [0.6273, 0.4975, 0.4529],
        [1.1975, 0.8352, 1.5810],
        [0.2917, 1.4789, 1.1978]], dtype=torch.float64)

(2) Form 2:

# Form 2
print(torch.add(x, y))
# The output tensor can also be specified
result = torch.empty(5, 3)
torch.add(x, y, out = result)
print(result)

tensor([[0.6024, 1.9602, 0.9764],
        [1.2583, 1.6134, 0.6532],
        [0.6273, 0.4975, 0.4529],
        [1.1975, 0.8352, 1.5810],
        [0.2917, 1.4789, 1.1978]], dtype=torch.float64)
tensor([[0.6024, 1.9602, 0.9764],
        [1.2583, 1.6134, 0.6532],
        [0.6273, 0.4975, 0.4529],
        [1.1975, 0.8352, 1.5810],
        [0.2917, 1.4789, 1.1978]])

(3) Form 3:

# Form 3: in-place addition (methods ending in _ modify the tensor itself)
y.add_(x)
print(y)

tensor([[0.6024, 1.9602, 0.9764],
        [1.2583, 1.6134, 0.6532],
        [0.6273, 0.4975, 0.4529],
        [1.1975, 0.8352, 1.5810],
        [0.2917, 1.4789, 1.1978]])
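
The difference between the out-of-place forms and the in-place form can be checked by looking at where the data lives. A minimal sketch with small tensors of my own choosing; data_ptr() returns the address of the first element of a tensor.

```python
import torch

x = torch.ones(2, 3)
y = torch.full((2, 3), 2.0)

ptr_before = y.data_ptr()      # address of y's data before any addition
z = y + x                      # out-of-place: allocates a new tensor
y.add_(x)                      # in-place: writes the result into y's own memory

print(z.data_ptr() == ptr_before)  # False, z has its own memory
print(y.data_ptr() == ptr_before)  # True, y was updated in place
```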

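1.2.2 Indexing

NumPy-style indexing can be used to access part of a Tensor.
Note: the result of indexing shares memory with the original data (modifying one also modifies the other).

# Access part of a Tensor with NumPy-style indexing
# Note: the indexed result shares memory with the original data
y = x[0, :]
y += 1
print(y)
print(x[0, :]) # check whether x has changed as well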

tensor([1.3330, 1.9622, 1.9146], dtype=torch.float64)
tensor([1.3330, 1.9622, 1.9146], dtype=torch.float64)
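
One caveat worth a quick check: memory sharing holds for basic indexing (integers and slices), while indexing with a list or tensor of indices returns a copy. A minimal sketch on a small tensor chosen for illustration:

```python
import torch

x = torch.zeros(3, 3)

row_view = x[0, :]          # basic (slice) indexing: a view on x's memory
row_view += 1
print(x[0])                 # the first row of x changed as well

picked = x[[1, 2]]          # indexing with a list of indices: a copy
picked += 5
print(x[1:])                # rows 1 and 2 of x are unchanged
```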

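1.2.3 Changing the shape

view() returns a new tensor that shares memory with the source tensor, so changing one also changes the other.
In other words, view only changes the angle from which the tensor is observed.

y = x.view(15)
z = x.view(-1, 5) # the dimension given as -1 is inferred from the others
print(x.size(), y.size(), z.size())

Output: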

torch.Size([5, 3]) torch.Size([15]) torch.Size([3, 5])

x += 1
print(x)
print(y)

Output:

tensor([[2.3330, 2.9622, 2.9146],
        [1.2841, 1.9874, 1.3035],
        [1.2449, 1.2221, 1.1693],
        [1.2697, 1.7510, 1.7994],
        [1.1660, 1.9774, 1.4102]], dtype=torch.float64)
tensor([2.3330, 2.9622, 2.9146, 1.2841, 1.9874, 1.3035, 1.2449, 1.2221, 1.1693,
        1.2697, 1.7510, 1.7994, 1.1660, 1.9774, 1.4102], dtype=torch.float64)

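reshape() also changes the shape, but it may return either a view or a copy, so it does not guarantee an independent copy and is not recommended for that purpose.
To get a truly independent copy (one that does not share memory), create it with clone() and then call view().

x_cp = x.clone().view(15) # clone creates an independent copy
x -= 1
print(x)
print(x_cp)

Output: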

tensor([[1.3330, 1.9622, 1.9146],
        [0.2841, 0.9874, 0.3035],
        [0.2449, 0.2221, 0.1693],
        [0.2697, 0.7510, 0.7994],
        [0.1660, 0.9774, 0.4102]], dtype=torch.float64)
tensor([2.3330, 2.9622, 2.9146, 1.2841, 1.9874, 1.3035, 1.2449, 1.2221, 1.1693,
        1.2697, 1.7510, 1.7994, 1.1660, 1.9774, 1.4102], dtype=torch.float64)
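
Why reshape() gives no guarantee can be seen on a small example: for a contiguous tensor it returns a view, for a non-contiguous one (here, a transpose) it has to copy. The tensors below are made up for illustration.

```python
import torch

x = torch.arange(6).view(2, 3)

flat = x.reshape(6)                 # x is contiguous, so this is a view
print(flat.data_ptr() == x.data_ptr())

xt = x.t()                          # transposed view, no longer contiguous
flat_t = xt.reshape(6)              # cannot be a view here, so the data is copied
print(flat_t.data_ptr() == x.data_ptr())

independent = x.clone().view(6)     # always a copy, whatever the layout
x += 10
print(flat[0].item(), flat_t[0].item(), independent[0].item())
```

Only flat follows the later change to x; flat_t and independent keep the old values.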

Another commonly used function is item(), which converts a single-element Tensor into a Python number.

# item() converts a single-element Tensor into a Python number
x = torch.randn(1)
print(x)
print(x.item())

Output:

tensor([0.2603])
0.2603132724761963

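1.3 Broadcasting

When an elementwise operation is applied to two Tensors of different shapes, broadcasting may be triggered: elements are replicated as needed so that the two Tensors have the same shape, and the operation is then applied elementwise. For example: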

x = torch.arange(1, 3).view(1, 2)
print(x)
y = torch.arange(1, 4).view(3, 1)
print(y)
print(x + y)

Output:

tensor([[1, 2]])
tensor([[1],
        [2],
        [3]])
tensor([[2, 3],
        [3, 4],
        [4, 5]])
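
The replication step can be made explicit with expand(), which stretches size-1 dimensions without copying data. A sketch under the same shapes as the example above:

```python
import torch

x = torch.arange(1, 3).view(1, 2)   # shape (1, 2)
y = torch.arange(1, 4).view(3, 1)   # shape (3, 1)

# Size-1 dimensions are stretched to the other operand's size.
x_big = x.expand(3, 2)              # the single row is repeated 3 times
y_big = y.expand(3, 2)              # the single column is repeated 2 times

print(torch.equal(x + y, x_big + y_big))
print((x + y).shape)
```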

1.4 Converting between Tensor and NumPy

numpy() and from_numpy() convert between a Tensor and a NumPy array. Note that the Tensor and the NumPy array produced by these two functions share the same memory.

a = torch.ones(5)
b = a.numpy()
print(a, b)

Output:

tensor([1., 1., 1., 1., 1.]) [1. 1. 1. 1. 1.]

a += 1
print(a, b)

Output:

tensor([2., 2., 2., 2., 2.]) [2. 2. 2. 2. 2.]

b += 1
print(a, b)

Output:

tensor([3., 3., 3., 3., 3.]) [3. 3. 3. 3. 3.]

Use from_numpy() to convert a NumPy array into a Tensor:

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
print(a, b)

Output:

[1. 1. 1. 1. 1.] tensor([1., 1., 1., 1., 1.], dtype=torch.float64)

a += 1
print(a, b)
b += 1
print(a, b)

Output:

[2. 2. 2. 2. 2.] tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
[3. 3. 3. 3. 3.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)
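
When shared memory is not wanted, torch.tensor() copies the data instead. A short sketch contrasting the two on a made-up array:

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # always copies the data

a += 1
print(shared)   # follows the change to a
print(copied)   # still all ones
```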

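Part 2. Automatic differentiation (important)

For reference, see: automatic differentiation of Tensors (AutoGrad)

Some background on how automatic differentiation works
The autograd package is at the core of all neural networks in PyTorch. Let us take a brief look at it first, and then train our first neural network.

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code runs, and every iteration can be different.

For background on numerical differentiation, numerical integration, and automatic differentiation, see Chapter 4, Section 5 of Qiu Xipeng's "Neural Networks and Deep Learning" (《神经网络与深度学习》):
Download: https://nndl.github.io/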

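A brief word on how automatic differentiation works. The goal is to compute the derivative of

[formula: the function being differentiated]

at

[formula: the point of evaluation]

Using the chain rule, the computation is decomposed into a sequence of elementary operations:

[figure: decomposition into elementary operations and their derivatives]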
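
As a concrete illustration of decomposing a derivative with the chain rule, here is a sketch that builds a function from elementary operations and lets autograd differentiate it. The function f(x; w, b) = 1 / (exp(-(wx + b)) + 1) and the point x = 1, w = 0, b = 0 are assumptions made for this illustration.

```python
import math
import torch

x = torch.tensor(1.0)
w = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

# Forward pass, one elementary operation per line.
h1 = x * w
h2 = h1 + b
h3 = -h2
h4 = torch.exp(h3)
h5 = h4 + 1
f = 1 / h5

f.backward()   # autograd applies the chain rule from h5 back to h1

# The same derivative by hand: df/dw = f * (1 - f) * x
s = 1 / (1 + math.exp(-(0.0 * 1.0 + 0.0)))
manual_dw = s * (1 - s) * 1.0
print(w.grad.item(), manual_dw)
```

Both values are 0.25 here, since f = 0.5 at the chosen point.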

2.1 Tensors and their gradients (Tensor)

# Passing requires_grad=True makes autograd track operations on this tensor
x = torch.ones(2, 2, requires_grad=True)
print(x)
print(x.grad_fn)

Output:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None

# Apply an operation
y = x + 2 # this creates an addition node
print(y)
print(y.grad_fn)

Output:

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x00000246EA421460>

Tensors created directly, such as x, are called leaf nodes; the grad_fn of a leaf node is None.

print(x.is_leaf, y.is_leaf)

Output:

True False

# A slightly more complex operation
z = y * y * 3
out = z.mean()
print(z, out)

Output:

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

requires_grad_( … ) changes a tensor's requires_grad attribute in place.

a = torch.randn(2, 2) # requires_grad defaults to False when not specified
a = ((a * 3)/(a - 1))
print(a.requires_grad) # False
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

Output:

False
True
<SumBackward0 object at 0x00000246E6851FD0>

2.2 Gradients

Backpropagation: because out holds a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

out.backward()
print(x.grad)

Output:

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
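
The value 4.5 can be checked by hand. A short derivation, using only the definitions of y, z and out from the code above (x has four elements, all equal to 1):

```latex
\begin{aligned}
\mathrm{out} &= \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,y_i^2 = 3\,(x_i + 2)^2 \\
\frac{\partial\,\mathrm{out}}{\partial x_i} &= \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2) \\
\left.\frac{\partial\,\mathrm{out}}{\partial x_i}\right|_{x_i = 1} &= \frac{3}{2}\cdot 3 = 4.5
\end{aligned}
```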

# Backpropagate once more; note that grad accumulates
out2 = x.sum()
out2.backward()
print(x.grad)

out3 = x.sum()
x.grad.zero_() # reset the accumulated gradient before the next backward pass
out3.backward()
print(x.grad)

Output:

tensor([[5.5000, 5.5000],
        [5.5000, 5.5000]])
tensor([[1., 1.],
        [1., 1.]])
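
When the tensor being differentiated is not a scalar, backward() needs an explicit gradient argument of the same shape. A minimal sketch, with a vector y and weights v that I made up for illustration:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                             # a vector, not a scalar

v = torch.tensor([1.0, 0.1, 0.01])    # one weight per component of y
y.backward(v)                         # computes d(sum(v * y)) / dx
print(x.grad)                         # elementwise 2 * x * v
```

Passing torch.tensor(1.) for a scalar output is the special case of this with a single weight of 1.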

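Part 3. Designing a neural network in PyTorch

Consider a simple feed-forward network. It takes the input, feeds it through several layers one after another, and finally produces the output. A typical training procedure for a neural network is as follows:
(1) Define a neural network with some learnable parameters (weights)
(2) Iterate over a dataset of inputs
(3) Process the input through the network
(4) Compute the loss (how far the output is from being correct)
(5) Propagate gradients back to the network's parameters

The network's weights are usually updated with a simple rule: weight = weight - learning_rate * gradient

3.1 Defining the network

# Define the network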
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3 x 3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16*6*6, 120) # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) 
        # cf. torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        print(num_features)
        return num_features
    
net = Net()
print(net)

Output:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

# The model's learnable parameters are returned by net.parameters()
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's weight

Output:

10
torch.Size([6, 1, 3, 3])

# Try a random 32 x 32 input
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

Output:

576
tensor([[ 0.0496, -0.1179, -0.0271, -0.0818, -0.1386, -0.1017, -0.0374,  0.1208,
          0.0532,  0.0830]], grad_fn=<AddmmBackward>)
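
Where the 576 (16 * 6 * 6) input size of fc1 comes from can be traced by following the tensor shape through the convolution and pooling stages. A sketch that reuses the layer settings of conv1 and conv2 outside the Net class; the weights are random and play no role here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same layer settings as conv1 and conv2 in Net.
conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)
shapes = [tuple(x.shape)]

x = F.relu(conv1(x))        # 3x3 convolution, no padding: 32 - 3 + 1 = 30
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)      # 2x2 pooling: 30 // 2 = 15
shapes.append(tuple(x.shape))
x = F.relu(conv2(x))        # 15 - 3 + 1 = 13
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2)      # 13 // 2 = 6 (the last row and column are dropped)
shapes.append(tuple(x.shape))

for shape in shapes:
    print(shape)
print(x.flatten(1).shape[1])  # 16 * 6 * 6 = 576
```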

# Zero the gradient buffers of all parameters, then backpropagate with a random gradient
net.zero_grad()
out.backward(torch.randn(1, 10))

3.2 Loss function

output = net(input)
target = torch.randn(10)     # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output, (1, 10)
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Output:

576
tensor(0.9183, grad_fn=<MseLossBackward>)

The computation graph at this point:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

# Follow loss backwards through its .grad_fn attribute to inspect the graph
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of Linear (a leaf parameter)

Output:

<MseLossBackward object at 0x00000246EB9CDC10>
<AddmmBackward object at 0x00000246EB9CD1C0>
<AccumulateGrad object at 0x00000246EB9CDC10>

3.3 Updating the weights

The simplest update rule used in practice is stochastic gradient descent (SGD):
weight = weight - learning_rate * gradient

# The simplest update rule used in practice is stochastic gradient descent (SGD)
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr = 0.01)

# in your training loop
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()
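
That optim.SGD without momentum or weight decay applies exactly the rule weight = weight - learning_rate * gradient can be checked on a small layer. The layer size and the random data below are placeholders for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
layer = nn.Linear(3, 1)
optimizer = optim.SGD(layer.parameters(), lr=0.01)

inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(layer(inputs), targets)
loss.backward()

# What the plain rule predicts for the weight after one step.
expected = layer.weight.detach() - 0.01 * layer.weight.grad
optimizer.step()

print(torch.allclose(layer.weight.detach(), expected))
```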

Part 4. Loading datasets

1. Dataset

The Dataset source code shows that Dataset defines the __add__ special method, so Dataset objects can be concatenated with the + operator. For more, see: https://zhuanlan.zhihu.com/p/222772996
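
A minimal sketch of concatenating datasets with +, using two small TensorDataset objects that I made up for illustration:

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

first = TensorDataset(torch.arange(3))
second = TensorDataset(torch.arange(10, 12))

combined = first + second            # Dataset.__add__ builds a ConcatDataset
print(type(combined).__name__, len(combined))
print(combined[3])                   # the first item of `second`
```

Indices 0 to 2 map to first and indices 3 to 4 map to second.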

2. DataLoader
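
A minimal sketch of how a DataLoader is typically used to iterate over a Dataset in shuffled mini-batches. The toy TensorDataset and the batch size are assumptions made for this illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.arange(20, dtype=torch.float32).view(10, 2)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_sizes = []
for batch_features, batch_labels in loader:
    batch_sizes.append(len(batch_labels))
print(batch_sizes)   # the last batch is smaller because 10 is not a multiple of 4
```

Passing drop_last=True would discard the incomplete final batch instead.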

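Part 5. Running deep learning on a GPU

Hands-on video tutorial by Mu Li: https://www.zhihu.com/zvideo/1363284223420436480

(1) Run dxdiag from cmd to check the machine's hardware configuration.

(2) Download CUDA: https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local

(3) Check from the cmd command line [screenshot: the cursor marks the relevant position]

(4) Download the GPU build of PyTorch from the official site: https://pytorch.org/get-started/locally/

Copy the command shown at the bottom of that page into the Anaconda Prompt.

(5) The download needs a fair amount of disk space. It stalled and dropped several times for me, and although I never resumed it, the test below still reported a GPU-enabled torch, possibly because it had been installed before.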

import torch
flag = torch.cuda.is_available()
if flag:
    print("CUDA is available")
else:
    print("CUDA is not available")

ngpu = 1
# Decide which device we want to run on
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
print("Device:", device)
if device.type == "cuda":
    # get_device_name raises an error when no GPU is present, so guard the call
    print("GPU model:", torch.cuda.get_device_name(0))

Test output:

CUDA is available
Device: cuda:0
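
Once a device has been chosen, both the model and its inputs have to live on it. A minimal sketch that falls back to the CPU when no GPU is available; the layer and batch sizes are placeholders.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)        # moves the parameters to the device
batch = torch.randn(8, 4, device=device)  # created directly on the device
output = model(batch)
print(output.device)
```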

Part 6. Other questions

(1) Usage of torch.nn.Linear(a, b)

The linear layers provided by torch.nn are listed in the official PyTorch documentation: https://pytorch.org/docs/master/nn.html#linear-layers

import torch

x = torch.randn(128, 20)  # the input has shape (128, 20)
m = torch.nn.Linear(20, 30)  # in_features=20, out_features=30
output = m(x)
print('m.weight.shape:\n ', m.weight.shape)
print('m.bias.shape:\n', m.bias.shape)
print('output.shape:\n', output.shape)

# Equivalent to: ans = torch.mm(x, torch.t(m.weight)) + m.bias
ans = torch.mm(x, m.weight.t()) + m.bias
print('ans.shape:\n', ans.shape)

print(torch.equal(ans, output))

Output:

m.weight.shape:
  torch.Size([30, 20])
m.bias.shape:
 torch.Size([30])
output.shape:
 torch.Size([128, 30])
ans.shape:
 torch.Size([128, 30])
True

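Why is m.weight.shape = (30, 20)?
Because the linear transformation is defined as:
$y = xW^T + b$
A weight of shape (30, 20) is created first and transposed during the actual computation, so that it can be matrix-multiplied with x.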
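
The same formula also applies when the input has extra leading dimensions: nn.Linear transforms only the last dimension. A sketch with an input shape chosen for illustration; torch.allclose is used instead of torch.equal to tolerate floating-point rounding.

```python
import torch

m = torch.nn.Linear(20, 30)
x = torch.randn(4, 7, 20)            # two leading batch-like dimensions

output = m(x)                        # only the last dimension is transformed
manual = x @ m.weight.t() + m.bias   # same formula, broadcast over leading dims

print(output.shape)
print(torch.allclose(output, manual, atol=1e-5))
```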

