Contents
- I. Loss Function Concepts
- II. Cross-Entropy Loss
- The relationship between cross entropy and relative entropy
- III. Loss Functions in PyTorch
- 1. nn.CrossEntropyLoss
- 1.1 Testing the three reduction modes
- 1.2 Testing weight
- 2. nn.NLLLoss
- 3. nn.BCELoss
- 4. nn.BCEWithLogitsLoss
Task overview:
Learn the principles of weight initialization; introduce the relationship between loss functions, cost functions, and objective functions, and study the cross-entropy loss.
Details:
This section covers how loss functions, cost functions, and objective functions relate to and differ from one another, then studies the cross-entropy loss used in the RMB binary classification task. Along the way it analyzes the relationships among self-information, entropy, relative entropy, and cross entropy. Finally, four loss functions are studied:
- nn.CrossEntropyLoss
- nn.NLLLoss
- nn.BCELoss
- nn.BCEWithLogitsLoss
I. Loss Function Concepts
A loss function measures the difference between a model's output and the ground truth for a single sample: $Loss = f(\hat{y}, y)$. A cost function averages the loss over the whole training set: $Cost = \frac{1}{N}\sum_{i=1}^{N} f(\hat{y}_i, y_i)$. An objective function adds a regularization term on top: $Obj = Cost + Regularization$. What PyTorch provides as "loss functions" compute the per-sample loss and then reduce it according to the reduction argument.
The size_average and reduce parameters are deprecated; do not use them.
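To make the distinction concrete, a minimal sketch (the logits and labels are made up for illustration): reduction='none' returns the per-sample loss, while reduction='mean' averages it into the cost.

import torch
import torch.nn as nn

# two fabricated samples: logits and class labels
outputs = torch.tensor([[1.0, 2.0], [1.0, 3.0]])
labels = torch.tensor([0, 1])

loss_each = nn.CrossEntropyLoss(reduction='none')(outputs, labels)   # loss: one value per sample
cost = nn.CrossEntropyLoss(reduction='mean')(outputs, labels)        # cost: batch average

print(loss_each)   # tensor([1.3133, 0.1269])
print(cost)        # tensor(0.7201)
# an objective function would add a regularization term on top of this cost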
Test code:
......
# set hyperparameters
......
# ============================ step 1/5 data ============================
......
# build MyDataset instance
......
# build DataLoader
......
# ============================ step 2/5 model ============================
......
# ============================ step 3/5 loss function ============================
loss_function = nn.CrossEntropyLoss()   # choose loss function
# ============================ step 4/5 optimizer ============================
# choose optimizer
# set learning-rate decay schedule
......
# ============================ step 5/5 training ============================
......
for epoch in range(MAX_EPOCH):
    ......
    for i, data in enumerate(train_loader):
        # forward
        ......
        # backward
        optimizer.zero_grad()
        loss = loss_function(outputs, labels)
        loss.backward()
        # update weights
        ......
        # count classification results
        # print training information
        ......
Set breakpoints at loss_function = nn.CrossEntropyLoss() and loss = loss_function(outputs, labels), then debug to understand the mechanism.
First, debug to loss_function = nn.CrossEntropyLoss() and "step into" it.
class CrossEntropyLoss(_WeightedLoss):
    r"""This criterion combines :func:`nn.LogSoftmax` and
    .......
    Examples::
        >>> loss = nn.CrossEntropyLoss()
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.empty(3, dtype=torch.long).random_(5)
        >>> output = loss(input, target)
        >>> output.backward()
    """
    __constants__ = ['weight', 'ignore_index', 'reduction']

    def __init__(self, weight=None, size_average=None, ignore_index=-100,
                 reduce=None, reduction='mean'):
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index

    def forward(self, input, target):
        return F.cross_entropy(input, target, weight=self.weight,
                               ignore_index=self.ignore_index, reduction=self.reduction)
"step into"进入:super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
class _WeightedLoss(_Loss):
    def __init__(self, weight=None, size_average=None, reduce=None, reduction='mean'):
        super(_WeightedLoss, self).__init__(size_average, reduce, reduction)
        self.register_buffer('weight', weight)
_WeightedLoss inherits from _Loss.
"Step into" super(_WeightedLoss, self).__init__(size_average, reduce, reduction):
class _Loss(Module):
    def __init__(self, size_average=None, reduce=None, reduction='mean'):
        super(_Loss, self).__init__()
        if size_average is not None or reduce is not None:
            self.reduction = _Reduction.legacy_get_string(size_average, reduce)
        else:
            self.reduction = reduction
_Loss in turn inherits from Module.
Next, continue debugging to loss = loss_function(outputs, labels) and "step into" it, which lands in Module.__call__; "step into" result = self.forward(*input, **kwargs).
This calls F.cross_entropy; "step into" cross_entropy there, where reduction is checked and the loss is computed.
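As a quick sanity check of this call chain, a minimal sketch (with made-up random inputs) showing that the module path produces exactly the result of calling F.cross_entropy directly:

import torch
import torch.nn as nn
import torch.nn.functional as F

# fabricated logits for 3 samples over 5 classes, plus labels
inputs = torch.randn(3, 5)
target = torch.tensor([0, 1, 4])

# module path: __call__ -> forward -> F.cross_entropy
loss_module = nn.CrossEntropyLoss()(inputs, target)
# functional path, called directly
loss_functional = F.cross_entropy(inputs, target)

print(torch.allclose(loss_module, loss_functional))   # True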
II. Cross-Entropy Loss
The relationship between cross entropy and relative entropy
Self-information is $I(x) = -\log P(x)$; entropy is its expectation, $H(P) = -\sum_x P(x)\log P(x)$; relative entropy (KL divergence) is $D_{KL}(P\|Q) = \sum_x P(x)\log\frac{P(x)}{Q(x)}$; and cross entropy is $H(P, Q) = -\sum_x P(x)\log Q(x)$. Expanding the KL divergence gives $H(P, Q) = H(P) + D_{KL}(P\|Q)$. Here $P$ is the true probability distribution (the training set / sample distribution) and $Q$ is the model's distribution. Since the training set is fixed, $H(P)$ is fixed, a constant, and constants can be ignored during optimization. Therefore, optimizing the cross entropy is equivalent to optimizing the relative entropy.
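The identity $H(P, Q) = H(P) + D_{KL}(P\|Q)$ is easy to verify numerically; a minimal sketch with two made-up distributions:

import torch

# arbitrary example distributions, chosen only for illustration
P = torch.tensor([0.7, 0.2, 0.1])   # "true" distribution
Q = torch.tensor([0.5, 0.3, 0.2])   # model distribution

H_P  = -(P * P.log()).sum()         # entropy H(P)
D_KL = (P * (P / Q).log()).sum()    # relative entropy D_KL(P||Q)
H_PQ = -(P * Q.log()).sum()         # cross entropy H(P, Q)

print(torch.allclose(H_PQ, H_P + D_KL))   # True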
III. Loss Functions in PyTorch
1. nn.CrossEntropyLoss
1.1 Testing the three reduction modes
Test code:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
# fake data
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
# ----------------------------------- CrossEntropy loss: reduction -----------------------------------
# flag = 0
flag = 1
if flag:
    # define loss functions with the three reduction modes
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')

    # forward
    loss_none = loss_f_none(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)
Output:
Cross Entropy Loss:
tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)
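These values can be reproduced by hand from $loss_i = -x[class] + \log\sum_j \exp(x[j])$; a minimal sketch:

import torch

x0 = torch.tensor([1.0, 2.0])   # sample 0, target class 0
x1 = torch.tensor([1.0, 3.0])   # samples 1 and 2, target class 1

l0 = -x0[0] + torch.log(torch.exp(x0).sum())   # -1 + log(e^1 + e^2) = 1.3133
l1 = -x1[1] + torch.log(torch.exp(x1).sum())   # -3 + log(e^1 + e^3) = 0.1269

print(l0, l1)               # tensor(1.3133) tensor(0.1269)
print(l0 + l1 + l1)         # 'sum' reduction: tensor(1.5671)
print((l0 + l1 + l1) / 3)   # 'mean' reduction: tensor(0.5224)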
1.2 Testing weight
Test code:
# ----------------------------------- weight -----------------------------------
# flag = 0
flag = 1
if flag:
    # define loss functions with per-class weights
    weights = torch.tensor([1, 2], dtype=torch.float)
    # weights = torch.tensor([0.7, 0.3], dtype=torch.float)

    loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)
Output:
weights: tensor([1., 2.])
tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)
For comparison, the loss values without weight were:
Cross Entropy Loss:
tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)
Computation: 1.3133 corresponds to class 0, whose weight is 1, so it is unchanged (1.3133 × 1). The two 0.1269 values correspond to class 1, whose weight is 2, so they become 0.1269 × 2 = 0.2539. Note also that with weight set, the 'mean' reduction divides by the total weight of the samples rather than by the sample count: the targets are [0, 1, 1], so the divisor is 1 + 2 + 2 = 5, and 1.8210 / 5 = 0.3642. A sketch verifying this follows.
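A minimal sketch of that divisor logic, reusing the weighted per-sample values printed above:

import torch

# weighted per-sample losses from the output above; targets are [0, 1, 1]
loss_none_w = torch.tensor([1.3133, 0.2539, 0.2539])
total_weight = 1 + 2 + 2   # sum of each sample's class weight

print(loss_none_w.sum())                  # ~1.8210, the 'sum' result
print(loss_none_w.sum() / total_weight)   # ~0.3642, the 'mean' result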
2. nn.NLLLoss
Test code:
# ----------------------------------- 2 NLLLoss -----------------------------------
# flag = 0
flag = 1
if flag:
    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
    loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("\nweights: ", weights)
    print("NLL Loss", loss_none_w, loss_sum, loss_mean)
Output:
weights: tensor([1., 1.])
NLL Loss tensor([-1., -3., -3.]) tensor(-7.) tensor(-2.3333)
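The outputs are negative because nn.NLLLoss only picks out the input value at the target index and negates it ($loss_i = -w_{y_i} \cdot x[i, y_i]$); it assumes the inputs are already log-probabilities. Here the raw logits are passed in, so we simply get -1 for sample 0 (target 0) and -3 for samples 1 and 2 (target 1). Combined with nn.LogSoftmax, it reproduces nn.CrossEntropyLoss exactly; a minimal sketch:

import torch
import torch.nn as nn

inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# LogSoftmax followed by NLLLoss is exactly CrossEntropyLoss
log_probs = nn.LogSoftmax(dim=1)(inputs)
loss_nll = nn.NLLLoss()(log_probs, target)
loss_ce = nn.CrossEntropyLoss()(inputs, target)

print(torch.allclose(loss_nll, loss_ce))   # True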
3. nn.BCELoss
Test code:
# ----------------------------------- 3 BCE Loss -----------------------------------
# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # BCELoss expects probabilities, so squash the inputs into (0, 1) with sigmoid
    inputs = torch.sigmoid(inputs)

    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\nweights: ", weights)
    print("BCE Loss", loss_none_w, loss_sum, loss_mean)
Output:
weights: tensor([1., 1.])
BCE Loss tensor([[0.3133, 2.1269],
[0.1269, 2.1269],
[3.0486, 0.0181],
[4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
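Each of the 8 elements is computed independently as $-[y \cdot \log x + (1-y)\log(1-x)]$; e.g., the first element has $x = \sigma(1) \approx 0.7311$ and $y = 1$, giving $-\log(0.7311) \approx 0.3133$, and the 'mean' result is 11.7856 / 8 = 1.4732. A minimal sketch verifying the first element by hand:

import torch

x = torch.sigmoid(torch.tensor(1.0))   # first input element after sigmoid: 0.7311
y = 1.0                                # corresponding target

loss_00 = -(y * torch.log(x) + (1 - y) * torch.log(1 - x))
print(loss_00)                         # tensor(0.3133), matching loss_none_w[0][0]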
4. nn.BCEWithLogitsLoss
Test code:
# ----------------------------------- 4 BCE with Logits Loss -----------------------------------
# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    # no sigmoid here: BCEWithLogitsLoss applies it internally
    # inputs = torch.sigmoid(inputs)

    weights = torch.tensor([1, 1], dtype=torch.float)

    loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none')
    loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)

# --------------------------------- pos_weight ---------------------------------
# flag = 0
flag = 1
if flag:
    inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
    target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

    target_bce = target

    weights = torch.tensor([1], dtype=torch.float)
    pos_w = torch.tensor([1], dtype=torch.float)   # try 3 to see the scaling effect

    loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
    loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
    loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

    # forward
    loss_none_w = loss_f_none_w(inputs, target_bce)
    loss_sum = loss_f_sum(inputs, target_bce)
    loss_mean = loss_f_mean(inputs, target_bce)

    # view
    print("\npos_weights: ", pos_w)
    print(loss_none_w, loss_sum, loss_mean)
Output:
weights: tensor([1., 1.])
tensor([[0.3133, 2.1269],
[0.1269, 2.1269],
[3.0486, 0.0181],
[4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
pos_weights: tensor([1.])
tensor([[0.3133, 2.1269],
[0.1269, 2.1269],
[3.0486, 0.0181],
[4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
At this point the two sets of loss values are identical. When we instead set pos_w = torch.tensor([3], dtype=torch.float):
weights: tensor([1., 1.])
tensor([[0.3133, 2.1269],
[0.1269, 2.1269],
[3.0486, 0.0181],
[4.0181, 0.0067]]) tensor(11.7856) tensor(1.4732)
pos_weights: tensor([3.])
tensor([[0.9398, 2.1269],
[0.3808, 2.1269],
[3.0486, 0.0544],
[4.0181, 0.0201]]) tensor(12.7158) tensor(1.5895)
The loss at the positions where the target is positive is scaled up by a factor of 3: 0.3133 × 3 ≈ 0.9398, 0.1269 × 3 ≈ 0.3808, 0.0181 × 3 ≈ 0.0544, 0.0067 × 3 ≈ 0.0201, while positions with target 0 are untouched.
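A minimal sketch confirming that pos_weight scales exactly (and only) the positions where the target is 1:

import torch
import torch.nn as nn

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)

loss_plain = nn.BCEWithLogitsLoss(reduction='none')(inputs, target)
loss_pos3 = nn.BCEWithLogitsLoss(reduction='none',
                                 pos_weight=torch.tensor([3.0]))(inputs, target)

# positive positions are scaled by 3, negative positions are unchanged
print(torch.allclose(loss_pos3[target == 1], loss_plain[target == 1] * 3))   # True
print(torch.allclose(loss_pos3[target == 0], loss_plain[target == 0]))       # True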