Object Detection YOLOv5 - Early Stopping

flyfish

Early Stopping but when? The YOLOv5 v5 release had no early stopping; it was added in an update after September 5, 2021:
EarlyStopper updates #4679(Sep 5, 2021)

Parameters

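patience: the number of epochs to keep training without improvement before stopping early. Setting patience to 0 disables early stopping, because the code replaces it with infinity.
fitness: the monitored value, which must be one that increases as the model gets better, e.g. mAP. An epoch counts as an improvement when its fitness is greater than or equal to the best so far; once patience consecutive epochs pass without an improvement, training stops.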
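To make the patience rule concrete, here is a small stand-alone replay of the comparison logic used by the class below. It assumes the fitness values of all epochs are known in advance (in real training they arrive one epoch at a time). The function name stop_epoch and the sample values are illustrative only; the peaks 0.4378 and 0.4824 at epochs 0 and 2 mirror the best_fitness printed by the test further down.

```python
import math


def stop_epoch(fitness_history, patience):
    """Return the epoch at which the YOLOv5-style rule would stop, or None (hypothetical helper)."""
    patience = patience or math.inf  # patience=0 disables early stopping
    best_fitness, best_epoch = 0.0, 0
    for epoch, fitness in enumerate(fitness_history):
        if fitness >= best_fitness:  # a tie also counts as an improvement
            best_fitness, best_epoch = fitness, epoch
        if epoch - best_epoch >= patience:  # patience epochs without improvement
            return epoch
    return None


# Illustrative fitness values for epochs 0-5
history = [0.4378, 0.1537, 0.4824, 0.1952, 0.1944, 0.3492]
print(stop_epoch(history, patience=3))  # 5: three epochs after the best at epoch 2
print(stop_epoch(history, patience=0))  # None: early stopping disabled
print(stop_epoch([0.2, 0.2, 0.2, 0.2], patience=1))  # None: equal values reset the counter
```

The last call shows why the comparison is >= rather than >: a run whose fitness is stuck at the same value (for example 0 early in training) keeps resetting the counter instead of being stopped.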

How to use it

Using it takes two steps.

Step 1: create the stopper and set patience

stopper = EarlyStopping(patience=3)

Step 2: during training, check after every epoch whether to stop early

stop = stopper(epoch=epoch, fitness=mAP)  # mAP of the current epoch; stop is True when training should end
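In the step above, fitness is simply mAP. As far as I know, YOLOv5's own train.py passes a weighted combination instead, produced by fitness() in utils/metrics.py with weights 0.0, 0.0, 0.1, 0.9 for precision, recall, mAP@0.5 and mAP@0.5:0.95. The plain-Python sketch below reproduces that weighting so you can see which number the stopper actually watches; yolo_fitness is a hypothetical name, and the original works on a NumPy array rather than four scalars.

```python
# Weights for [precision, recall, mAP@0.5, mAP@0.5:0.95], mirroring YOLOv5's fitness()
FITNESS_WEIGHTS = (0.0, 0.0, 0.1, 0.9)


def yolo_fitness(precision, recall, map50, map50_95):
    """Weighted sum of validation metrics (hypothetical helper); this is the value passed to the stopper."""
    metrics = (precision, recall, map50, map50_95)
    return sum(w * m for w, m in zip(FITNESS_WEIGHTS, metrics))


fi = yolo_fitness(precision=0.7, recall=0.6, map50=0.5, map50_95=0.3)
print(round(fi, 4))  # 0.32 = 0.1 * 0.5 + 0.9 * 0.3
```

Because precision and recall carry zero weight, the stopper effectively watches mAP, with mAP@0.5:0.95 dominating.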
The example below uses a random number in place of the mAP value and includes some test code.

import random
class EarlyStopping:
    # YOLOv5 simple early stopper
    def __init__(self, patience=30):
        self.best_fitness = 0.0  # i.e. mAP
        self.best_epoch = 0
        self.patience = patience or float('inf')  # epochs to wait after fitness stops improving to stop
        self.possible_stop = False  # possible stop may occur next epoch

    def __call__(self, epoch, fitness):
        if fitness >= self.best_fitness:  # >= 0 to allow for early zero-fitness stage of training
            self.best_epoch = epoch
            self.best_fitness = fitness
        delta = epoch - self.best_epoch  # epochs without improvement
        print("delta:",delta)
        print("best_fitness:", self.best_fitness)
        self.possible_stop = delta >= (self.patience - 1)  # possible stop may occur next epoch
        stop = delta >= self.patience  # stop training if patience exceeded
        if stop:
            print(f'EarlyStopping patience {self.patience} exceeded, stopping training.')
        return stop

# Test code
stopper = EarlyStopping(patience=3)
epochs=10
start_epoch=0
for epoch in range(start_epoch, epochs):
    random.seed(epoch)
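    stop = stopper(epoch=epoch, fitness=random.uniform(0.1, 0.5))  # random value stands in for mAP
    print("function:", stop)
    print("possible_stop:", stopper.possible_stop)
    if stop:  # leave the training loop, as YOLOv5's train.py does
        break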

Output

# delta: 0
# best_fitness: 0.43776874061001925
# function: False
# possible_stop: False
# delta: 1
# best_fitness: 0.43776874061001925
# function: False
# possible_stop: False

# delta: 0
# best_fitness: 0.4824137087556998
# function: False
# possible_stop: False
# delta: 1
# best_fitness: 0.4824137087556998
# function: False
# possible_stop: False
# delta: 2
# best_fitness: 0.4824137087556998
# function: False
# possible_stop: True

# delta: 3
# best_fitness: 0.4824137087556998
# EarlyStopping patience 3 exceeded, stopping training.
# function: True
# possible_stop: True

Possible improvements

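YOLOv5's built-in early stopper can only monitor a value that increases. It could be improved as follows:
(1) Add a mode parameter: max monitors a value that should increase and min one that should decrease, so the same class can watch an increasing metric such as mAP or a decreasing one such as loss.
(2) Add a minimum-change parameter diff: if the value changes by less than diff, the model is treated as not having improved. The check is then no longer a plain comparison such as if fitness >= self.best_fitness but a subtraction, e.g. fitness - self.best_fitness > diff.
(3) When comparing several models, set a baseline: if mAP still has not surpassed the baseline after a given number of epochs, stop early instead of wasting resources.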
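A minimal sketch of all three improvements in one class. ImprovedEarlyStopping and its parameters (mode, diff, baseline, baseline_epochs) are hypothetical names, not part of YOLOv5; the options resemble mode, min_delta and baseline in Keras's EarlyStopping callback, but the exact semantics here are a simplified assumption. One design choice differs from YOLOv5: an improvement must exceed the best value by more than diff, so with diff=0 a tie no longer resets the counter.

```python
import math


class ImprovedEarlyStopping:
    """Hypothetical extension of YOLOv5's EarlyStopping with mode, diff and baseline."""

    def __init__(self, patience=30, mode="max", diff=0.0, baseline=None, baseline_epochs=None):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.patience = patience or math.inf  # 0 disables patience-based stopping, as in YOLOv5
        self.mode = mode                      # 'max' for mAP, 'min' for loss
        self.diff = diff                      # minimum change that counts as an improvement
        self.baseline = baseline              # value the metric must beat...
        self.baseline_epochs = baseline_epochs  # ...within this many epochs
        self.best = -math.inf if mode == "max" else math.inf
        self.best_epoch = 0

    def _improved(self, value):
        # Subtract instead of comparing directly, so changes smaller than diff do not count
        if self.mode == "max":
            return value - self.best > self.diff
        return self.best - value > self.diff

    def _beats_baseline(self, value):
        return value > self.baseline if self.mode == "max" else value < self.baseline

    def __call__(self, epoch, value):
        if self._improved(value):
            self.best, self.best_epoch = value, epoch
        if epoch - self.best_epoch >= self.patience:
            return True  # patience exceeded
        # Baseline check: after baseline_epochs epochs, give up if the best value is still not good enough
        if (self.baseline is not None and self.baseline_epochs is not None
                and epoch + 1 >= self.baseline_epochs
                and not self._beats_baseline(self.best)):
            return True
        return False


# Monitoring a validation loss: improvements smaller than 0.05 are ignored
stopper = ImprovedEarlyStopping(patience=2, mode="min", diff=0.05)
for epoch, loss in enumerate([1.0, 0.8, 0.79, 0.795]):
    if stopper(epoch, loss):
        print(f"stopping at epoch {epoch}, best loss {stopper.best}")  # epoch 3, best loss 0.8
        break
```

Calling it works like YOLOv5's stopper: call it once per epoch and break out of the training loop when it returns True.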