Machine learning tasks can be divided into two broad categories according to their goal: regression and classification. Despite its name, logistic regression is a model for the classification task.

Here we mostly discuss binary and multi-class classification problems.


Binary classification problems can be divided into the linearly separable and the linearly non-separable case. For regression we usually build deterministic discriminative models, but in classification such a hard decision function is not differentiable, so what we see far more often are probabilistic discriminative models.
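For intuition, here is a minimal sketch (my own illustration, not part of the original text) of a probabilistic discriminative model for binary classification: a linear score is passed through the sigmoid, so the output is a differentiable probability rather than a hard 0/1 decision.

import torch

def binary_prob(x, w, b):
    # Map the linear score x @ w + b to P(y = 1 | x) with the sigmoid;
    # unlike a hard threshold, this mapping is differentiable everywhere.
    return torch.sigmoid(x @ w + b)

x = torch.randn(4, 3)                   # 4 samples, 3 features
w = torch.zeros(3, requires_grad=True)  # parameters start at zero
b = torch.zeros(1, requires_grad=True)
print(binary_prob(x, w, b))             # all 0.5 before any training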


When we talk about evaluation metrics for classification, for example in a binary classification problem:

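As a hedged illustration (my own sketch, not from the original), the common binary-classification metrics can be computed from confusion-matrix counts; the snippet assumes labels and predictions are 0/1 tensors and the helper name binary_metrics is hypothetical.

import torch

def binary_metrics(y_true, y_pred):
    # confusion-matrix counts for the positive class (label 1)
    tp = ((y_pred == 1) & (y_true == 1)).sum().item()
    fp = ((y_pred == 1) & (y_true == 0)).sum().item()
    fn = ((y_pred == 0) & (y_true == 1)).sum().item()
    tn = ((y_pred == 0) & (y_true == 0)).sum().item()
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall    = tp / (tp + fn) if tp + fn > 0 else 0.0
    return accuracy, precision, recall

y_true = torch.tensor([1, 0, 1, 1, 0])
y_pred = torch.tensor([1, 0, 0, 1, 1])
print(binary_metrics(y_true, y_pred))  # (0.6, 0.667, 0.667)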

Implementing the softmax regression model from scratch: we build a model that classifies the image data in the Fashion-MNIST training set.

Implementation of softmax from scratch

In [47]:


import torch
import torchvision
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

print(torch.__version__)
print(torchvision.__version__)
 
1.3.0
0.4.1a0+d94043a


Loading the training and test data

In [48]:
 
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)


Initializing the model parameters

In [49]:
 
num_inputs = 784
print(28*28)
num_outputs = 10

W = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_outputs)), dtype=torch.float)
b = torch.zeros(num_outputs, dtype=torch.float)
 
784
 
In [50]:
 
W.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)
 
Out[50]:
 
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)


Operating on a multi-dimensional Tensor along a dimension

In [51]:
 
X = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(X.sum(dim=0, keepdim=True))  # dim=0: sum within each column, keeping the dimension in the result
print(X.sum(dim=1, keepdim=True))  # dim=1: sum within each row, keeping the dimension in the result
print(X.sum(dim=0, keepdim=False)) # dim=0: sum within each column, dropping the dimension
print(X.sum(dim=1, keepdim=False)) # dim=1: sum within each row, dropping the dimension
 
tensor([[5, 7, 9]])
tensor([[ 6],
        [15]])
tensor([5, 7, 9])
tensor([ 6, 15])


Defining the softmax operation

$$\hat{y}_j = \frac{\exp(o_j)}{\sum_{i=1}^{3} \exp(o_i)}$$
In [52]:
 
def softmax(X):
    X_exp = X.exp()
    partition = X_exp.sum(dim=1, keepdim=True)
    # print("X size is ", X_exp.size())
    # print("partition size is ", partition, partition.size())
    return X_exp / partition  # broadcasting is applied here
 
In [53]:
 
X = torch.rand((2, 5))
X_prob = softmax(X)
print(X_prob, '\n', X_prob.sum(dim=1))
 
tensor([[0.1927, 0.2009, 0.1823, 0.1887, 0.2355],
        [0.1274, 0.1843, 0.2536, 0.2251, 0.2096]])
 tensor([1., 1.])
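One caveat worth noting (my addition, not part of the original): calling exp() directly on large raw scores can overflow. A common, numerically safer variant subtracts the per-row maximum before exponentiating, which leaves the softmax output unchanged; the sketch below follows the same input conventions as the softmax above.

import torch

def softmax_stable(X):
    # Subtracting the per-row max does not change the result,
    # but keeps exp() from overflowing for large scores.
    X_shifted = X - X.max(dim=1, keepdim=True)[0]
    X_exp = X_shifted.exp()
    return X_exp / X_exp.sum(dim=1, keepdim=True)

print(softmax_stable(torch.tensor([[1000., 1001., 1002.]])))  # finite values that sum to 1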


The softmax regression model

$$o^{(i)} = x^{(i)} W + b, \qquad \hat{y}^{(i)} = \text{softmax}(o^{(i)}).$$
In [54]:
 
def net(X):
    return softmax(torch.mm(X.view((-1, num_inputs)), W) + b)


Defining the loss function

$$H\left(y^{(i)}, \hat{y}^{(i)}\right) = -\sum_{j=1}^{q} y_j^{(i)} \log \hat{y}_j^{(i)},$$
$$\ell(\Theta) = \frac{1}{n} \sum_{i=1}^{n} H\left(y^{(i)}, \hat{y}^{(i)}\right),$$
$$\ell(\Theta) = -\frac{1}{n} \sum_{i=1}^{n} \log \hat{y}_{y^{(i)}}^{(i)}$$
In [55]:
 
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
y = torch.LongTensor([0, 2])
y_hat.gather(1, y.view(-1, 1))
 
Out[55]:
 
tensor([[0.1000],
        [0.5000]])
 
In [56]:
 
def cross_entropy(y_hat, y):
    return - torch.log(y_hat.gather(1, y.view(-1, 1)))


Defining accuracy

The accuracy defined here will be used later, when we make predictions with the trained model.
In [57]:
 
def accuracy(y_hat, y):
    return (y_hat.argmax(dim=1) == y).float().mean().item()
 
In [58]:
 
print(accuracy(y_hat, y))
 
0.5
 
In [59]:
 
# This function is saved in the d2lzh_pytorch package for later use. It will be improved step by step:
# its full implementation is described in the "Image Augmentation" section.
def evaluate_accuracy(data_iter, net):
    acc_sum, n = 0.0, 0
    for X, y in data_iter:
        acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
        n += y.shape[0]
    return acc_sum / n
 
In [60]:
 
print(evaluate_accuracy(test_iter, net))
 
0.1457


Training the model

In [61]:
 
num_epochs, lr = 5, 0.1

# This function is saved in the d2lzh_pytorch package for convenient later use
def train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size,
              params=None, lr=None, optimizer=None):
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n = 0.0, 0.0, 0
        for X, y in train_iter:
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            
            # zero the gradients
            if optimizer is not None:
                optimizer.zero_grad()
            elif params is not None and params[0].grad is not None:
                for param in params:
                    param.grad.data.zero_()
            
            l.backward()
            if optimizer is None:
                d2l.sgd(params, lr, batch_size)
            else:
                optimizer.step() 
            
            
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().item()
            n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc))

train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, batch_size, [W, b], lr)
 
epoch 1, loss 0.7870, train acc 0.751, test acc 0.794
epoch 2, loss 0.5702, train acc 0.813, test acc 0.809
epoch 3, loss 0.5254, train acc 0.826, test acc 0.814
epoch 4, loss 0.5009, train acc 0.832, test acc 0.822
epoch 5, loss 0.4853, train acc 0.837, test acc 0.828


Model prediction

Now that training has finished, we can use the model to make predictions and see how accurate it really is. Let's demonstrate how to classify images: given a series of images (the images shown in the third row of the output), we compare their true labels (the first row of text output) with the model's predictions (the second row of text output).

In [62]:
 
X, y = next(iter(test_iter))
true_labels = d2l.get_fashion_mnist_labels(y.numpy())
pred_labels = d2l.get_fashion_mnist_labels(net(X).argmax(dim=1).numpy())
titles = [true + '\n' + pred for true, pred in zip(true_labels, pred_labels)]

d2l.show_fashion_mnist(X[0:9], titles[0:9])


Concise implementation of softmax

In [63]:
 
# load the required packages and modules
import torch
from torch import nn
from torch.nn import init
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l

print(torch.__version__)
 
1.3.0


Initializing parameters and loading the data

In [64]:
 
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)


Defining the network model

In [65]:
 
num_inputs = 784
num_outputs = 10

class LinearNet(nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(num_inputs, num_outputs)
    def forward(self, x): # x shape: (batch, 1, 28, 28)
        y = self.linear(x.view(x.shape[0], -1))
        return y
    
# net = LinearNet(num_inputs, num_outputs)

class FlattenLayer(nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()
    def forward(self, x): # x shape: (batch, *, *, ...)
        return x.view(x.shape[0], -1)

from collections import OrderedDict
net = nn.Sequential(
        # FlattenLayer(),
        # LinearNet(num_inputs, num_outputs) 
        OrderedDict([
           ('flatten', FlattenLayer()),
           ('linear', nn.Linear(num_inputs, num_outputs))]) # or our own LinearNet(num_inputs, num_outputs) defined above would work as well
        )


Initializing the model parameters

In [66]:
 
init.normal_(net.linear.weight, mean=0, std=0.01)
init.constant_(net.linear.bias, val=0)
 
Out[66]:
 
Parameter containing:
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)


Defining the loss function

In [67]:
 
loss = nn.CrossEntropyLoss()
# The signature of this class is:
# class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
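Note that nn.CrossEntropyLoss applies log-softmax internally and then takes the negative log-likelihood, so it expects raw, unnormalized scores; this is why the network above outputs the linear layer directly. Below is a small check of this equivalence (my own example, not from the original):

import torch
from torch import nn

logits = torch.tensor([[0.2, 1.5, -0.3]])
target = torch.tensor([1])
# pick out -log p(target) from the log-softmax by hand
manual = -torch.log_softmax(logits, dim=1)[0, target.item()]
print(nn.CrossEntropyLoss()(logits, target), manual)  # the two values match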


Defining the optimizer

In [68]:
 
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
# The signature of this class is:
# class torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)


Training

In [69]:
 
num_epochs = 5
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)
 
epoch 1, loss 0.0031, train acc 0.749, test acc 0.794
epoch 2, loss 0.0022, train acc 0.814, test acc 0.800
epoch 3, loss 0.0021, train acc 0.826, test acc 0.811
epoch 4, loss 0.0020, train acc 0.833, test acc 0.826
epoch 5, loss 0.0019, train acc 0.837, test acc 0.825