一. PyTorch Basics

(一) Introduction

PyTorch is an open-source machine learning library for Python, similar to TensorFlow and Keras, developed by Facebook's AI team. It is used for applications such as natural language processing, and with CUDA loaded it can use the GPU to accelerate computation.

1. tensor basics

A tensor can be thought of simply as a container for multi-dimensional data: a 0-dimensional tensor is a scalar, a 1-dimensional tensor is a vector, a 2-dimensional tensor is a matrix, and anything above 2 dimensions is generically called a tensor.
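
A quick check of these dimensionalities with dim() (a minimal sketch; the values are arbitrary):

import torch

print(torch.tensor(3.14).dim())      # 0 -> scalar
print(torch.tensor([1., 2.]).dim())  # 1 -> vector
print(torch.ones(2, 3).dim())        # 2 -> matrix
print(torch.ones(2, 3, 4).dim())     # 3 -> higher-dimensional tensor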

1.1 create a torch tensor

import torch

# create a torch tensor
t = torch.tensor([[1,2,3],[4,5,6]])
t

output:

tensor([[1, 2, 3],
        [4, 5, 6]])

1.2 two ways to transpose

First way: t.t()

t.t()

output:

tensor([[1, 4],
        [2, 5],
        [3, 6]])

Second way: t.permute(-1,0)

t.permute(-1,0)

output:

tensor([[1, 4],
        [2, 5],
        [3, 6]])
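
permute generalizes to any number of dimensions by taking one index per dimension; a minimal sketch with a 3-d tensor:

x = torch.randn(2, 3, 4)
print(x.permute(2, 0, 1).shape)  # torch.Size([4, 2, 3]) -- dimensions reordered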

1.3 reshape a tensor with view()

t.view(3,2)

output:

tensor([[1, 2],
        [3, 4],
        [5, 6]])

try another one:

a = t.view(6,1)
a

output:

tensor([[1],
        [2],
        [3],
        [4],
        [5],
        [6]])
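
view can also infer one dimension when you pass -1 for it; a minimal sketch, reusing t from above:

print(t.view(-1))     # flattened: tensor([1, 2, 3, 4, 5, 6])
print(t.view(2, -1))  # the -1 is inferred as 3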

1.4 create tensor of zeros

t = torch.zeros(3,3)
t

output:

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

1.5 create a tensor of normally distributed random numbers

t = torch.randn(3,3)
t

output:

tensor([[ 0.2511, -0.7670, -1.2358],
        [-0.9764, -0.1060,  0.4308],
        [-2.2955,  0.3311, -1.0970]])
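
Since randn draws fresh samples, the numbers above will differ from run to run; for reproducible output you can seed the generator first (a minimal sketch):

torch.manual_seed(42)
t = torch.randn(3, 3)  # same values on every run with the same seed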

1.6 some tensor information

print('tensor shape:',t.shape)
print('number of dimension:',t.dim())
print('tensor type:',t.type())

output:

tensor shape: torch.Size([3, 3])
number of dimension: 2
tensor type: torch.FloatTensor

1.7 slicing like numpy

t = torch.tensor([[1,2,3],[4,5,6],[7,8,9]])
# every row, only the last column
print(t[:,-1])
# first 2 rows, all columns
print(t[:2,:])
# lower right most corner
print(t[-1:,-1:])

output:

tensor([3, 6, 9])
tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[9]])

1.8 pytorch tensor to and from numpy ndarray

1) ndarray to tensor

import numpy as np

# ndarray to tensor
a = np.random.randn(2,3)
t = torch.from_numpy(a)
print(a)
print(t)
print(type(a))
print(type(t))

output:

[[ 0.65463612 -1.85520278  0.28951441]
 [-1.11854953  0.92410894  1.71107649]]
tensor([[ 0.6546, -1.8552,  0.2895],
        [-1.1185,  0.9241,  1.7111]], dtype=torch.float64)
<class 'numpy.ndarray'>
<class 'torch.Tensor'>

2) tensor to ndarray

# tensor to ndarray
t = torch.randn(2,3)
a = t.numpy()
print(t)
print(a)
print(type(t))
print(type(a))

output:

tensor([[ 0.1747, -0.2457,  2.4347],
        [ 1.5476,  0.5925, -2.5421]])
[[ 0.17465861 -0.24565548  2.434704  ]
 [ 1.5475734   0.59250295 -2.5421169 ]]
<class 'torch.Tensor'>
<class 'numpy.ndarray'>
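
One caveat worth knowing: from_numpy and numpy() share the underlying memory rather than copying it, so modifying one side is visible on the other. A minimal sketch:

a = np.ones(3)
t = torch.from_numpy(a)
a[0] = 100
print(t)  # tensor([100., 1., 1.], dtype=torch.float64) -- same buffer as a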

1.9 basic tensor operations

1) cross product (outer product): |a × b| = |a||b|·sin θ

# compute cross product
t1 = torch.tensor([[1,2,3],[1,2,3]])
t2 = torch.tensor([[1,2,3],[4,5,6]])
t1.cross(t2)

output:

tensor([[ 0,  0,  0],
        [-3,  6, -3]])

2) matrix product

# compute matrix product
t1 = torch.tensor([[2,4],[5,10]])
t2 = torch.tensor([[10],[20]])
t1.mm(t2)

output:

tensor([[100],
        [250]])
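
For 2-d tensors, the @ operator and torch.matmul give the same result as mm; a minimal sketch, reusing t1 and t2 from above:

print(t1 @ t2)               # tensor([[100], [250]])
print(torch.matmul(t1, t2))  # same result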

3) element-wise multiplication

# elementwise multiplication
t = torch.tensor([[1,2],[3,4]])
t.mul(7)

output1:

tensor([[ 7, 14],
        [21, 28]])

t.mul(t)

output2:

tensor([[ 1,  4],
        [ 9, 16]])
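
The * operator is shorthand for mul, so the two calls above can also be written as (a minimal sketch):

print(t * 7)  # same as t.mul(7)
print(t * t)  # same as t.mul(t)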

1.10 GPU support

1) is a cuda gpu available    torch.cuda.is_available()

2) how many cuda devices    torch.cuda.device_count()

3) move to gpu   t.cuda()

torch.cuda.is_available()

output:

False

Explanation: my machine has no Nvidia GPU, so CUDA computation is unavailable. When the workload is heavy and running time gets long, buying a graphics card is worth considering to speed up the computation.
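
A common pattern is to write device-agnostic code that uses the GPU only when one is present; a minimal sketch:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t = torch.randn(3, 3).to(device)  # lands on the GPU if available, else stays on CPU
print(t.device)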

 

二. Back Propagation

Back propagation is a convenient algorithm for computing the gradient ∂loss/∂w, and the basic idea it uses is chain-rule differentiation. In the previous post, the model was simple: ŷ = x · w, with loss = (ŷ − y)².

Taking the derivative by hand gives ∂loss/∂w = 2x(xw − y); for example, with w = 1, x = 1, y = 2 the gradient is 2 · 1 · (1 − 2) = −2, which matches the first grad printed below.

Each update: w ← w − α · ∂loss/∂w (with learning rate α = 0.01).

But when the model is complex, taking derivatives by hand clearly becomes tedious, and it is time to let the machine replace the manual labor.

PyTorch's automatic differentiation machinery computes gradients and stores them for us: just call the backward method and the gradient is ready to use.
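
A minimal sketch of this mechanism, before applying it to the full example:

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2     # y = x^2
y.backward()   # computes dy/dx via the chain rule and stores it in x.grad
print(x.grad)  # tensor(4.) since dy/dx = 2x = 4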

With this, we can improve the simple study-hours-versus-GPA example from the previous post.

(一) Computing the gradient with back propagation

1. Complete program

import torch

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = torch.tensor([1.0])  # initial w = 1
w.requires_grad = True  # requires_grad = False by default


# our model forward pass
def forward(x):
    return x*w

# loss function
def loss(x,y):
    y_pred = forward(x)
    return (y_pred-y)*(y_pred-y)


# before training
print("predict (before training)", "x = 4 ", "y =", forward(4).item())

# training loop
for epoch in range(10):
    for x_val,y_val in zip(x_data,y_data):
        l = loss(x_val,y_val)
        l.backward()
        print("\tgrad:",x_val,y_val,w.grad.item())
        w.data = w.data - 0.01*w.grad.item()

        # Manually zero the gradients after updating weights
        w.grad.data.zero_()
       
    print("progress: epoch:",epoch, "loss =", l.item())


#after training
print("predict(after training)", "x = 4 ", "y =", forward(4).item())

output:

predict (before training) x = 4  y = 4.0
	grad: 1.0 2.0 -2.0
	grad: 2.0 4.0 -7.840000152587891
	grad: 3.0 6.0 -16.228801727294922
progress: epoch: 0 loss = 7.315943717956543
	grad: 1.0 2.0 -1.478623867034912
	grad: 2.0 4.0 -5.796205520629883
	grad: 3.0 6.0 -11.998146057128906
progress: epoch: 1 loss = 3.9987640380859375
:
:
:
	grad: 1.0 2.0 -0.1319713592529297
	grad: 2.0 4.0 -0.5173273086547852
	grad: 3.0 6.0 -1.070866584777832
progress: epoch: 9 loss = 0.03185431286692619
predict(after training) x = 4  y = 7.804864406585693

2. Step-by-step walkthrough

1) Input data and parameter initialization.

The initial w = 1. requires_grad is False by default, but this model needs gradients, so we turn on requires_grad = True.

The commented-out lines that create w with Variable show the style of older PyTorch versions; in newer versions there is no need for Variable, you can simply create a tensor and set requires_grad = True.

At this point, we can try printing w, w.data, w.data[0], and w.item() and compare them, which makes the later references easier to follow.

import torch

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = torch.tensor([1.0])  # initial w = 1
w.requires_grad = True  # requires_grad = False by default
# from torch.autograd import Variable
# w = Variable(torch.Tensor([1.0]),requires_grad = True)

# print(w)          # tensor([1.], requires_grad=True)
# print(w.data)     # tensor([1.])  the data part
# print(w.data[0])  # tensor(1.)  a 0-d tensor holding 1.0
# print(w.item())   # 1.0  a Python float

2) Building the model

A simple forward model:

# our model forward pass
def forward(x):
    return x*w

3) Loss function

A simple MSE (mean squared error) loss function:

# loss function
def loss(x,y):
    y_pred = forward(x)
    return (y_pred-y)*(y_pred-y)

4) Training loop

Clearly, with the initial w = 1, the prediction at x = 4 is y = 4.

In each training step we first compute the loss, then call the backward method to obtain the gradient, and immediately update the weight. Once a step's update is done, the gradient is reset to zero. Repeating this process for 10 epochs yields a fairly good w.

After training, the prediction at x = 4 is y = 7.8048...

The result is decent.

# before training
print("predict (before training)", "x = 4 ", "y =", forward(4).item())

# training loop
for epoch in range(10):
    for x_val,y_val in zip(x_data,y_data):
        l = loss(x_val,y_val)
        l.backward()
        print("\tgrad:",x_val,y_val,w.grad.item())
        w.data = w.data - 0.01*w.grad.item()

        # Manually zero the gradients after updating weights
        w.grad.data.zero_()
       
    print("progress: epoch:",epoch, "loss =", l.item())


#after training
print("predict(after training)", "x = 4 ", "y =", forward(4).item())

(二) Building the model the standard PyTorch way

In the example above, only PyTorch's backward method was used to compute the gradient; everything else, such as the model and the loss function, was set up by hand. PyTorch provides a family of modules and helpers that let us build models quickly and in a uniform way.

The process has three steps:

Step 1: design the model as a class

Step 2: construct the loss function and optimizer (chosen from the PyTorch API)

Step 3: the training loop (forward, backward, update)

The complete code:

import torch

x_data = torch.Tensor([[1.0],[2.0],[3.0]])
y_data = torch.Tensor([[2.0],[4.0],[6.0]])

#################################################
# 01 design your model with class
class Model(torch.nn.Module):
   def __init__(self):
       super(Model, self).__init__()
       self.linear = torch.nn.Linear(1, 1)  # One in and one out

   def forward(self, x):
       y_pred = self.linear(x)
       return y_pred


# our model
model = Model()

###################################################
# 02 construct loss and optimizer  (select from Pytorch API)

criterion = torch.nn.MSELoss(reduction='mean')  # MSE loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # stochastic gradient descent

###################################################
# 03 training loop (forward, backward, update)
for epoch in range(500):
   # Forward pass: Compute predicted y by passing x to the model
   y_pred = model(x_data)

   # Compute and print loss
   loss = criterion(y_pred, y_data)
   print(epoch, loss.item())

   # Zero gradients, perform a backward pass, and update the weights.
   optimizer.zero_grad()  # initial grad with 0
   loss.backward()   # compute gradient w.grad
   optimizer.step()  # update w with w.grad, eg: w = w - 0.01*w.grad

   
# After training
hour_var = torch.Tensor([[4.0]])
print("predict(after training) ", 4.0, model.forward(hour_var).item())