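Contents

  • I. The Concept of a Loss Function
  • II. Cross-Entropy Loss Function
  • Relationship between cross-entropy and relative entropy
  • III. Loss Functions in PyTorch
  • 1. nn.CrossEntropyLoss
  • 1.1 Testing the three reduction modes
  • 1.2 Testing weight
  • 2. nn.NLLLoss
  • 3. nn.BCELoss
  • 4. nn.BCEWithLogitsLoss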



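Task overview:

Learn the principles of weight initialization; introduce how loss functions, cost functions, and objective functions relate to one another; and study the cross-entropy loss function.

Details:

This section covers the connections and differences among loss, cost, and objective functions, then studies the cross-entropy loss used in the RMB binary-classification task. While explaining cross-entropy loss, it also analyzes the relationships among self-information, information entropy, relative entropy, and cross-entropy. Finally, it covers four loss functions: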

  1. nn.CrossEntropyLoss
  2. nn.NLLLoss
  3. nn.BCELoss
  4. nn.BCEWithLogitsLoss

I. The Concept of a Loss Function



The two parameters size_average and reduce are deprecated; do not use them.

Test code:


......
# parameter settings
......
# ============================ step 1/5 data ============================
......

# build MyDataset instances
......
# build DataLoader
......
# ============================ step 2/5 model ============================
......
# ============================ step 3/5 loss function ============================
loss_functoin = nn.CrossEntropyLoss()                                                   # choose the loss function
# ============================ step 4/5 optimizer ============================
# choose the optimizer
# set the learning-rate decay schedule
......
# ============================ step 5/5 training ============================
......
for epoch in range(MAX_EPOCH):
    ......
    for i, data in enumerate(train_loader):
        # forward
        ......
        # backward
        optimizer.zero_grad()
        loss = loss_functoin(outputs, labels)
        loss.backward()
        # update weights
        ......
        # collect classification statistics
        # print training info
        ......

Set breakpoints at loss_functoin = nn.CrossEntropyLoss() and at loss = loss_functoin(outputs, labels), then debug to see how the mechanism works.

First, run the debugger to loss_functoin = nn.CrossEntropyLoss() and "step into" it.

class CrossEntropyLoss(_WeightedLoss):
    r"""This criterion combines :func:`nn.LogSoftmax` and 
	.......
    Examples::

        >>> loss = nn.CrossEntropyLoss()
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.empty(3, dtype=torch.long).random_(5)
        >>> output = loss(input, target)
        >>> output.backward()
    """
    __constants__ = ['weight', 'ignore_index', 'reduction']

    def __init__(self, weight=None, size_average=None, ignore_index=-100,
                 reduce=None, reduction='mean'):
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index

    def forward(self, input, target):
        return F.cross_entropy(input, target, weight=self.weight,
                               ignore_index=self.ignore_index, reduction=self.reduction)

"Step into": super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)

class _WeightedLoss(_Loss):
    def __init__(self, weight=None, size_average=None, reduce=None, reduction='mean'):
        super(_WeightedLoss, self).__init__(size_average, reduce, reduction)
        self.register_buffer('weight', weight)

_WeightedLoss inherits from _Loss.

"Step into": super(_WeightedLoss, self).__init__(size_average, reduce, reduction)

class _Loss(Module):
    def __init__(self, size_average=None, reduce=None, reduction='mean'):
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction

_Loss in turn inherits from Module.

Next, continue debugging and "step into": loss = loss_functoin(outputs, labels)



"Step into": result = self.forward(*input, **kwargs)



F.cross_entropy is called here. "Step into" cross_entropy at this point; inside, the value of reduction is checked and the loss is computed accordingly.
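Since forward simply delegates to F.cross_entropy, the module and the functional call should produce the same values for every reduction mode. The following is a small sketch of that check; the tensors logits and labels and the dictionary results are illustrative and not part of the original training script.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)            # 3 samples, 5 classes (illustrative data)
labels = torch.tensor([1, 0, 4])

results = {}
for reduction in ("none", "sum", "mean"):
    # Module form: reduction is stored at construction time ...
    via_module = nn.CrossEntropyLoss(reduction=reduction)(logits, labels)
    # ... and passed on to the functional form inside forward().
    via_functional = F.cross_entropy(logits, labels, reduction=reduction)
    results[reduction] = (via_module, via_functional)
    print(reduction, via_module, via_functional)
```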


II. Cross-Entropy Loss Function

Relationship between cross-entropy and relative entropy

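$$H(P, Q) = D_{KL}(P \,\|\, Q) + H(P)$$

Here H(P, Q) is the cross-entropy, D_{KL}(P || Q) is the relative entropy (KL divergence), and H(P) is the information entropy of P. P is the true probability distribution (the training set, i.e. the sample distribution) and Q is the model's distribution. Because the training set is fixed, H(P) is fixed as well; it is a constant, and constants can be ignored during optimization. Optimizing the cross-entropy is therefore equivalent to optimizing the relative entropy.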
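The statement that cross-entropy equals relative entropy plus a constant entropy term can be checked numerically. This is a minimal sketch with two made-up distributions; the values of P and Q and the variable names are assumptions chosen only for illustration.

```python
import torch

# Hypothetical distributions over three classes (illustrative values).
P = torch.tensor([0.7, 0.2, 0.1])   # "true" distribution
Q = torch.tensor([0.5, 0.3, 0.2])   # model distribution

entropy_P = -(P * P.log()).sum()              # H(P)
cross_entropy = -(P * Q.log()).sum()          # H(P, Q)
kl_divergence = (P * (P / Q).log()).sum()     # D_KL(P || Q)

# H(P, Q) should equal D_KL(P || Q) + H(P).
print(cross_entropy, kl_divergence + entropy_P)
```

Because H(P) does not depend on Q, changing Q moves the cross-entropy and the relative entropy by exactly the same amount.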

III. Loss Functions in PyTorch

1. nn.CrossEntropyLoss


1.1 Testing the three reduction modes

Test code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# fake data
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# ----------------------------------- CrossEntropy loss: reduction -----------------------------------
# flag = 0
flag = 1
if flag:
    # def loss function
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

    # forward
    loss_none = loss_f_none(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)

Output:

Cross Entropy Loss:
  tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)
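To see where these numbers come from, the per-sample loss can be recomputed by hand with the formula loss(x, class) = -x[class] + log(sum_j exp(x[j])). The snippet below is a sketch that reuses the same fake data; the names picked, manual_none, manual_sum, and manual_mean are illustrative.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# x[class]: the logit of the target class for each sample.
picked = inputs[torch.arange(len(target)), target]

# loss(x, class) = -x[class] + log(sum_j exp(x[j]))
manual_none = -picked + torch.logsumexp(inputs, dim=1)

# 'sum' adds the per-sample losses; 'mean' averages them.
manual_sum = manual_none.sum()
manual_mean = manual_none.mean()

print(manual_none, manual_sum, manual_mean)
```

The printed values should agree with the three tensors in the output above, up to rounding.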

1.2 Testing weight

Test code:

# ----------------------------------- weight -----------------------------------
# flag = 0
flag = 1
if flag:
    # def loss function
    weights = torch.tensor([1, 2], dtype=torch.float)
    # weights = torch.tensor([0.7, 0.3], dtype=torch.float)

    loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)

Output:

weights:  tensor([1., 2.])
tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)

Without weight, the loss values are:

Cross Entropy Loss:
  tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)

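Calculation: 1.3133 × 1 = 1.3133, 0.1269 × 2 ≈ 0.2539, 0.1269 × 2 ≈ 0.2539

This is because 1.3133 belongs to class 0, whose weight is 1, while 0.1269 and 0.1269 belong to class 1, whose weight is 2.

For the mean, the weighted sum is divided by the sum of the sample weights rather than by the number of samples: 1.8210 / (1 + 2 + 2) = 0.3642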
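The weighted calculation can be reproduced step by step. The sketch below reuses the same data and weights; the names per_sample, w, manual_none, manual_sum, and manual_mean are illustrative. The point to notice is that in 'mean' mode the divisor is the sum of the weights of the samples' target classes, not the sample count.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
weights = torch.tensor([1, 2], dtype=torch.float)

# Unweighted per-sample losses.
per_sample = nn.CrossEntropyLoss(reduction='none')(inputs, target)

# Weight of each sample's target class: classes [0, 1, 1] -> weights [1, 2, 2].
w = weights[target]

manual_none = w * per_sample
manual_sum = manual_none.sum()
manual_mean = manual_sum / w.sum()   # divide by 1 + 2 + 2 = 5, not by 3

print(manual_none, manual_sum, manual_mean)
```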

2. nn.NLLLoss



Test:

# ----------------------------------- 2 NLLLoss -----------------------------------
# flag = 0
flag = 1
if flag:

    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("\nweights: ", weights)
    print("NLL Loss", loss_none_w, loss_sum, loss_mean)

Output:

weights:  tensor([1., 1.])
NLL Loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
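The negative values above arise because nn.NLLLoss does nothing more than pick the target-class entry of each row and negate it; it expects log-probabilities, whereas the test feeds it raw scores. The sketch below shows both behaviours on the same data; names such as picked_neg and nll_on_log_probs are illustrative.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# On raw scores, NLLLoss just returns -inputs[n, target[n]].
picked_neg = -inputs[torch.arange(len(target)), target]
nll_raw = nn.NLLLoss(reduction='none')(inputs, target)
print(picked_neg, nll_raw)

# Fed with log-probabilities from LogSoftmax, it reproduces CrossEntropyLoss.
log_probs = nn.LogSoftmax(dim=1)(inputs)
nll_on_log_probs = nn.NLLLoss(reduction='none')(log_probs, target)
ce = nn.CrossEntropyLoss(reduction='none')(inputs, target)
print(nll_on_log_probs, ce)
```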

3. nn.BCELoss



Test code:

# ----------------------------------- 3 BCE Loss -----------------------------------
# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # squash inputs into the range 0-1 with the sigmoid function
    inputs = torch.sigmoid(inputs)

    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\nweights: ", weights)
    print("BCE Loss", loss_none_w, loss_sum, loss_mean)

Output:

weights:  tensor([1., 1.])
BCE Loss tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
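Each entry of this output can be reproduced element by element with the binary cross-entropy formula l = -[y * log(p) + (1 - y) * log(1 - p)], where p is the sigmoid of the raw score. The following is a sketch on the same data; the names logits, probs, manual_none, and builtin_none are illustrative.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

# BCELoss expects probabilities, so the raw scores go through sigmoid first.
probs = torch.sigmoid(logits)

# l = -[y * log(p) + (1 - y) * log(1 - p)], computed element-wise.
manual_none = -(target * probs.log() + (1 - target) * (1 - probs).log())
builtin_none = nn.BCELoss(reduction='none')(probs, target)

print(manual_none)
print(manual_none.sum(), manual_none.mean())
```

Note that the target has the same shape as the input and is a float tensor, unlike the class-index targets used with nn.CrossEntropyLoss.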

4. nn.BCEWithLogitsLoss



Test code:

# ----------------------------------- 4 BCE with Logits Loss -----------------------------------
# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # inputs = torch.sigmoid(inputs)

    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)


# --------------------------------- pos weight

# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # itarget
    # inputs = torch.sigmoid(inputs)

    weights = torch.tensor([1], dtype=torch.float)
    pos_w = torch.tensor([1], dtype=torch.float)        # 3

    loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
    loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
    loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\npos_weights: ", pos_w)
    print(loss_none_w, loss_sum, loss_mean)

Output:

weights:  tensor([1., 1.])
tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

pos_weights:  tensor([1.])
tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

At this point the two sets of loss values are identical. When pos_w = torch.tensor([3], dtype=torch.float) is set instead:

weights:  tensor([1., 1.])
tensor([[0.3133, 2.1269],
        [0.1269, 2.1269],
        [3.0486, 0.0181],
        [4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)

pos_weights:  tensor([3.])
tensor([[0.9398, 2.1269],
        [0.3808, 2.1269],
        [3.0486, 0.0544],
        [4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)

The loss at the positive-sample positions (where target is 1) is scaled up by a factor of 3.
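The effect of pos_weight can be written out by hand as l = -[pos_weight * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]. The sketch below checks this against nn.BCEWithLogitsLoss on the same data, and also checks that, without pos_weight, the result matches sigmoid followed by nn.BCELoss. The variable names are illustrative; F.logsigmoid is used for log(sigmoid(x)), and log(1 - sigmoid(x)) is rewritten as logsigmoid(-x).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
pos_w = torch.tensor([3], dtype=torch.float)

# Manual formula: only the y = 1 term is multiplied by pos_weight.
manual_none = -(pos_w * target * F.logsigmoid(logits)
                + (1 - target) * F.logsigmoid(-logits))
builtin_none = nn.BCEWithLogitsLoss(reduction='none', pos_weight=pos_w)(logits, target)

# Without pos_weight, BCEWithLogitsLoss matches sigmoid + BCELoss.
plain = nn.BCEWithLogitsLoss(reduction='none')(logits, target)
via_bce = nn.BCELoss(reduction='none')(torch.sigmoid(logits), target)

print(builtin_none)
print(plain)
```

Working on raw scores lets the loss fold the sigmoid and the logarithm together, which is generally more numerically stable than applying torch.sigmoid and nn.BCELoss separately.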