TensorFlow 2 Cat vs. Dog Recognition: Training Image Dataset Download and Cat/Dog Classification with TensorFlow

Reposted

mob64ca1414c613 2024-06-03 10:58:54

Tags: network, classification, neural networks, computer vision, tensorflow. Category: machine learning, artificial intelligence

Preface

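Depth is the hallmark of deep neural networks, but greater depth means more sequential computation and therefore higher latency. This raises a question: is it possible to build high-performing "non-deep" neural networks? The authors build a network with a depth of only 12 that reaches over 80% top-1 accuracy on ImageNet. They analyze the scaling rules of the design and show how to improve performance without changing the network's depth.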

Let's look at what the authors say in the paper.

Paper: https://arxiv.org/abs/2110.07641

1. Introduction

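It is widely believed that large depth is an essential ingredient of high-performing networks, because depth increases a network's representational capacity and helps it learn increasingly abstract features. But is large depth always necessary? The question is worth asking because depth has drawbacks. Deeper networks require more sequential processing and have higher latency. They are harder to parallelize and less suitable for applications that need fast responses.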

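To investigate this, the authors propose ParNet. ParNet can be parallelized efficiently and outperforms ResNet in both speed and accuracy. This holds even though communication between processing units adds latency. If that communication latency can be reduced further, ParNet-like architectures could be used to build very fast recognition systems.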

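Moreover, ParNet can be scaled effectively by increasing width, resolution, and the number of branches while keeping depth constant. The authors observe that ParNet's performance does not saturate but keeps rising as compute throughput increases. This suggests that higher performance could be reached with more compute while keeping depth small (~10) and latency low.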

The figure below, taken from the paper, compares ParNet with other networks.

[Figure: ParNet compared with other networks]

Contributions of the paper:

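For the first time, a neural network with a depth of only 12 is shown to achieve high performance on very competitive benchmarks (80.7% on ImageNet).
It shows how the parallel structure of ParNet can be exploited for fast, low-latency inference.
It studies the scaling rules of ParNet and demonstrates effective scaling at a constant, low depth.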

2. Related Work

2.1 Analyzing importance of depth

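Many studies have confirmed the advantages of deep networks. A network with a single hidden layer and sigmoid activations can approximate any continuous function with arbitrarily small error, but only if it is wide enough. Deep networks with non-linearities need fewer parameters than shallow ones to approximate a function. For a fixed parameter budget, deep networks also outperform shallow ones, which is usually regarded as one of the main advantages of large depth.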

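However, these analyses only studied shallow networks with a linear, sequential structure, so it is unclear whether their conclusions hold for other designs. In this work, the authors show that shallow networks can also perform very well, provided they have parallel substructures.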

2.2 Scaling DNNs

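Prior work has shown that increasing depth, width, and resolution scales convolutional networks effectively. The authors also study scaling rules, but focus on the low-depth regime. They find that ParNet can be scaled effectively by increasing the number of branches, the width, and the resolution while keeping depth constant and low.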

2.3 Shallow networks

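Shallow networks have attracted wide attention in theoretical machine learning. In the infinite-width limit, a single-layer neural network behaves like a Gaussian process, and its training can be understood through kernel methods. However, such models are not competitive with state-of-the-art networks. The authors provide empirical evidence that non-deep networks can compete with deep ones.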

2.4 Multi-stream networks

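Multi-stream neural networks have been used for many computer vision tasks, such as segmentation, detection, and video classification. ParNet also uses streams that process different resolutions. However, the network is much shallower, and the streams are fused only once, at the end, which makes parallelization easier.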

3. METHOD

3.1 PARNET BLOCK

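RepVGG introduced structural re-parameterization. In short, a 3×3 convolution branch and a 1×1 convolution branch running in parallel (each followed by BN) can be merged algebraically into a single equivalent 3×3 convolution.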
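The following NumPy sketch makes the merge concrete. It is not RepVGG's or ParNet's code. It folds inference-mode BatchNorm into each bias-free convolution, then zero-pads the 1×1 kernel to 3×3 so the two branches collapse into one kernel and one bias. It assumes stride 1, 'same' padding, and Keras's default BN epsilon of 1e-3. It ignores the identity branch that RepVGG also merges. The helpers `conv2d_same`, `bn_inference`, `fold_bn` and `merge_branches` are hypothetical names. The pair of branches corresponds to `FuseBlock` in this article's code (a 1×1 `conv_bn` plus a 3×3 `conv_bn`).

```python
import numpy as np

def conv2d_same(x, w, b):
    """Naive stride-1 'same' convolution (cross-correlation, as in Keras Conv2D).
    x: (H, W, C_in), w: (k, k, C_in, C_out), b: (C_out,)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))           # zero padding keeps H and W
    h, wd, _ = x.shape
    out = np.empty((h, wd, w.shape[3]))
    for i in range(h):
        for j in range(wd):
            patch = xp[i:i + k, j:j + k, :]             # (k, k, C_in) window
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return out

def bn_inference(y, gamma, beta, mean, var, eps=1e-3):
    """BatchNorm with frozen statistics, applied per output channel."""
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

def fold_bn(w, gamma, beta, mean, var, eps=1e-3):
    """Fold BN into the preceding bias-free conv; returns (kernel, bias)."""
    scale = gamma / np.sqrt(var + eps)                  # shape (C_out,)
    return w * scale, beta - mean * scale

def merge_branches(w3, bn3, w1, bn1):
    """Merge conv3x3+BN and conv1x1+BN branches into a single 3x3 conv."""
    k3, b3 = fold_bn(w3, *bn3)
    k1, b1 = fold_bn(w1, *bn1)
    k1 = np.pad(k1, ((1, 1), (1, 1), (0, 0), (0, 0)))  # place the 1x1 kernel at the 3x3 centre
    return k3 + k1, b3 + b1

rng = np.random.default_rng(0)
c_in, c_out = 3, 4
x = rng.normal(size=(6, 6, c_in))
w3 = rng.normal(size=(3, 3, c_in, c_out))
w1 = rng.normal(size=(1, 1, c_in, c_out))

def random_bn():
    # gamma, beta, moving mean, moving variance (variance kept positive)
    return (rng.uniform(0.5, 1.5, c_out), rng.normal(size=c_out),
            rng.normal(size=c_out), rng.uniform(0.5, 1.5, c_out))

bn3, bn1 = random_bn(), random_bn()
zero = np.zeros(c_out)
two_branch = (bn_inference(conv2d_same(x, w3, zero), *bn3)
              + bn_inference(conv2d_same(x, w1, zero), *bn1))
k, b = merge_branches(w3, bn3, w1, bn1)
merged = conv2d_same(x, k, b)
print(np.allclose(two_branch, merged))
```

In principle, the two-branch block could therefore be replaced at inference time by one 3×3 `Conv2D` whose weights come from such a merge. The code in this article does not do this; it keeps both branches for training and prediction.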

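The authors borrow RepVGG's initial block design and modify it to better suit a non-deep architecture. One challenge for a non-deep network built only from 3×3 convolutions is that its receptive field is rather limited. To address this, the authors add a Skip-Squeeze-Excitation (SSE) branch, based on the Squeeze-and-Excitation design, as shown in the figure: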

[Figure: the RepVGG-SSE block]

The authors call the block in the figure RepVGG-SSE.

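On large-scale datasets such as ImageNet, a non-deep network may not have enough non-linearity, which limits its representational power. The authors therefore replace ReLU with SiLU activations.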
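For reference, SiLU (also called Swish) is x·sigmoid(x). The `Silu` layer in this article's code wraps `tf.nn.silu`, which computes this function. The small NumPy comparison below contrasts it with ReLU. Unlike ReLU, SiLU is smooth and lets small negative values through, dipping to a minimum of about -0.278 near x ≈ -1.28. The function names are illustrative only.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def silu(x):
    # x * sigmoid(x), written as x / (1 + e^-x)
    return x / (1.0 + np.exp(-x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print("relu:", relu(x))
print("silu:", np.round(silu(x), 4))

# SiLU is non-monotonic: it has a single minimum on the negative side
grid = np.linspace(-5.0, 5.0, 100001)
i_min = int(np.argmin(silu(grid)))
print("min silu ~", round(float(silu(grid)[i_min]), 4), "at x ~", round(float(grid[i_min]), 3))
```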

The code is as follows:

def SSEblock(x, filters):
    bn = BatchNormalization()(x)
    x = GlobalAveragePool2D()(bn)
    x = Conv2D(filters=filters, kernel_size=(1, 1))(x)
    x = Activation('sigmoid')(x)
    x = Multiply()([bn, x])
    return x
def FuseBlock(x, filters):
    a = conv_bn(x, filters, kernel_size=1, padding='valid')
    b = conv_bn(x, filters, kernel_size=3, stride=1)
    c = Add()([a, b])
    return c
def Stream(x, filters):
    a = SSEblock(x, filters)
    b = FuseBlock(x, filters)
    c = Add()([a, b])
    
    c = Silu()(c)
    return c
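The SSE branch gates each channel with a single weight computed from the globally pooled feature map. As a result, every output position depends on the whole input, which is how the branch widens the receptive field of an otherwise 3×3-only block. The NumPy sketch below mirrors `SSEblock` for one image. It assumes inference-mode BatchNorm with gamma=1 and beta=0. The weights and statistics are random placeholders, and `sse_forward` is a hypothetical name.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sse_forward(x, w, b, moving_mean, moving_var, eps=1e-3):
    """Mirror of SSEblock for a single (H, W, C) feature map.
    w: (C, C) weights of the 1x1 Conv2D, b: (C,) its bias."""
    bn = (x - moving_mean) / np.sqrt(moving_var + eps)  # BatchNormalization (inference, gamma=1, beta=0)
    pooled = bn.mean(axis=(0, 1))                       # GlobalAveragePool2D -> one value per channel
    gate = sigmoid(pooled @ w + b)                      # a 1x1 conv on a 1x1 map acts like a dense layer
    return bn * gate, gate                              # Multiply: same gate at every (h, w) position

rng = np.random.default_rng(1)
c = 4
x = rng.normal(size=(8, 8, c))
w = rng.normal(size=(c, c))
b = np.zeros(c)
mm, mv = rng.normal(size=c), rng.uniform(0.5, 1.5, c)

out, gate = sse_forward(x, w, b, mm, mv)
print(out.shape, np.round(gate, 3))

# Changing one corner pixel changes the channel gates, and therefore the output everywhere
x2 = x.copy()
x2[0, 0, :] += 5.0
_, gate2 = sse_forward(x2, w, b, mm, mv)
print(np.round(gate2 - gate, 4))
```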

3.2 DOWNSAMPLING AND FUSION BLOCK

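The input and output of a RepVGG-SSE block have the same size. Besides this block, ParNet also contains downsampling blocks and fusion blocks.
A downsampling block reduces resolution and increases width to enable multi-scale processing; a fusion block merges information from multiple resolutions.
Specifically: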

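A single-layer SE module is added to the downsampling block, in parallel with the convolution layers.
A 2D average-pooling layer is added to the 1×1 convolution branch.
The fusion block additionally contains a concatenation layer. Because of the concatenation, a fusion block has twice as many input channels as a downsampling block.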
The structure is shown in the paper's figure:

Left: fusion block; right: downsampling block.
The code is as follows:

def Fusion(input1, input2, filters):
    group = input1.shape[-1]
    input1 = BatchNormalization()(input1)
    input2 = BatchNormalization()(input2)
    a = Concatenate(axis=-1)([input1, input2])
    a = channel_shuffle(a, group)
    x = AveragePooling2D(pool_size=(2, 2))(a)
    x = conv_bn(x, filters, kernel_size=1, stride=1, groups=2, padding='valid')
    y = conv_bn(a, filters, kernel_size=3, stride=2, groups=2)
    z = GlobalAveragePool2D()(a)
    z = Conv2D(filters=filters, kernel_size=1, groups=2)(z)
    z = Activation('sigmoid')(z)
    a = Add()([x, y])
    b = Multiply()([a, z])
    out = Silu()(b)
    return out
def Downsampling_block(inputs, filters):
    x = AveragePooling2D(pool_size=(2, 2))(inputs)
    x = conv_bn(x, filters, kernel_size=1, padding='valid')
    
    y = conv_bn(inputs, filters, kernel_size=3, stride=2)

    z = GlobalAveragePool2D()(inputs)
    z = Conv2D(filters=filters, kernel_size=1, use_bias=False)(z)
    z = Activation('sigmoid')(z)
    
    a = Add()([x, y])
    b = Multiply()([a, z])
    
    out = Silu()(b)
    return out
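In `Fusion`, `group = input1.shape[-1]`. When both inputs have C channels, the concatenation therefore has 2C channels, and `channel_shuffle` reshapes it to (H, W, 2, C) before transposing. The NumPy re-implementation below illustrates the effect; `channel_shuffle_np` is a hypothetical name. The channels of the two inputs end up interleaved, so each half seen by the following `groups=2` convolutions contains channels from both inputs.

```python
import numpy as np

def channel_shuffle_np(x, group):
    """Same reshape/transpose as channel_shuffle in the full code, for one (H, W, C) array."""
    h, w, num_channels = x.shape
    assert num_channels % group == 0
    group_channels = num_channels // group
    x = x.reshape(h, w, group_channels, group)
    x = x.transpose(0, 1, 3, 2)     # the Keras version uses perm=[0, 1, 2, 4, 3] because of the batch axis
    return x.reshape(h, w, num_channels)

input1 = np.arange(4, dtype=float).reshape(1, 1, 4)          # channels labelled 0..3
input2 = 100 + np.arange(4, dtype=float).reshape(1, 1, 4)    # channels labelled 100..103
a = np.concatenate([input1, input2], axis=-1)
shuffled = channel_shuffle_np(a, group=input1.shape[-1])
print(a[0, 0].tolist())
print(shuffled[0, 0].tolist())
first_group, second_group = np.split(shuffled[0, 0], 2)      # what the groups=2 convs each see
print(first_group.tolist(), second_group.tolist())
```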

3.3 NETWORK ARCHITECTURE

The ParNet architecture schematic is shown below:

[Figure: schematic of the ParNet architecture]

The network structure is as follows:

[Figure: ParNet network structure]
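Following the `ParNetEncoder` code in this article, the spatial size is halved six times. Each downsampling or fusion block adds two branches. One is an `AveragePooling2D` (2×2, 'valid') branch, which outputs floor(H/2). The other is a stride-2 'same' 3×3 convolution branch, which outputs ceil(H/2). The two agree only for even sizes, so by this arithmetic the implementation needs square inputs whose size is divisible by 64. The pure-Python trace below checks this and computes the flattened feature size fed to the `Dense` layer. The helper names are hypothetical, and the shapes are derived by hand, not taken from `model.summary()`.

```python
def halve(size, where):
    pool = (size - 2) // 2 + 1      # AveragePooling2D(pool_size=(2, 2)), padding='valid'
    conv = -(-size // 2)            # Conv2D(kernel_size=3, strides=2, padding='same')
    if pool != conv:
        raise ValueError(f"{where}: branches give {pool} vs {conv} for input size {size}")
    return pool

def trace_encoder(img_size):
    """Spatial size at each stage of ParNetEncoder for a square input."""
    s = {}
    x = halve(img_size, "downsampling 0")
    x = halve(x, "downsampling 1")
    s["stream 1"] = x
    y = halve(x, "stream 1 downsampling")
    x = halve(x, "downsampling 2")
    assert y == x                   # Fusion concatenates maps of equal size
    s["stream 2"] = x
    z = halve(x, "fusion 1")
    x = halve(x, "downsampling 3")
    assert z == x
    s["stream 3"] = x
    b = halve(x, "fusion 2")
    s["encoder output"] = halve(b, "final downsampling")
    return s

sizes = trace_encoder(256)
print(sizes)
last_channels = 1280                # block_channels[4] of parnet_s
print("Flatten features:", sizes["encoder output"] ** 2 * last_channels)
try:
    trace_encoder(224)
except ValueError as err:
    print("224:", err)
```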

4. RESULTS

[Figures: results reported in the paper]

Thanks to the blogger:

Code walkthrough

Reference code: https://github.com/murufeng/awesome_lightweight_networks/blob/main/light_cnns/mobile_real_time_network/parnet.py
Dataset download:
Link: https://pan.baidu.com/s/1zs9U76OmGAIwbYr91KQxgg
Extraction code: bhjx

Create a train.py file, a logs folder (for the saved model weights), and a logs1 folder (for TensorBoard logs).
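Below is a small helper for this setup step. It assumes the dataset archive is unpacked so that `./dataset/data1_dog_cat/train/<class>/` and `./dataset/data1_dog_cat/test/<class>/` exist, as the paths in this article's code imply. `prepare_dirs` and `class_folders` are hypothetical names.

```python
import os

def prepare_dirs(root="."):
    # logs/  -> weight files written by ModelCheckpoint
    # logs1/ -> event files written by TensorBoard
    for name in ("logs", "logs1"):
        os.makedirs(os.path.join(root, name), exist_ok=True)

def class_folders(datasets):
    """Return {'train': [...], 'test': [...]} listing the class sub-folders of each split."""
    found = {}
    for split in ("train", "test"):
        path = os.path.join(datasets, split)
        found[split] = sorted(d for d in os.listdir(path)
                              if os.path.isdir(os.path.join(path, d)))
    return found
```

Running `prepare_dirs()` from the project root and then `class_folders('./dataset/data1_dog_cat')` should list the same class folders for both splits. If the lists differ, train and test labels will not line up.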

1. Import libraries

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, LearningRateScheduler, TensorBoard
import tensorflow as tf
from tensorflow.keras.layers import (
    Conv2D, BatchNormalization, AveragePooling2D, Activation,
    Multiply, Add, Concatenate, Dense, Input, Flatten, Reshape
)
from tensorflow.keras.models import Model

2. Set hyperparameters

classes = 2
batch_size = 16
epochs = 100
img_size = 256
lr = 1e-3
datasets = './dataset/data1_dog_cat'
# Allocate GPU memory on demand instead of reserving all of it up front
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

3. Data preprocessing

def data_process_func():
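    # ---------------------------------- #
    #   Data augmentation for the training set
    #   1. rotation_range -> random rotation (degrees)
    #   2. width_shift_range -> random horizontal shift
    #   3. height_shift_range -> random vertical shift
    #   4. rescale -> normalization to [0, 1]
    #   5. shear_range -> random shear (degrees)
    #   6. zoom_range -> random zoom
    #   7. horizontal_flip -> random horizontal flip
    #   8. brightness_range -> random brightness change
    #   9. fill_mode -> how pixels created by the transforms are filled
    # ---------------------------------- #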
    train_data = ImageDataGenerator(
        rotation_range=50, 
        width_shift_range=0.1, 
        height_shift_range=0.1,
        rescale=1/255.0,
        shear_range=10,
        zoom_range=0.1,
        horizontal_flip=True,
        brightness_range=(0.7, 1.3),
        fill_mode='nearest'
    )
    # ---------------------------------- #
    #   Test set: no augmentation,
    #   normalization only
    # ---------------------------------- #
    test_data = ImageDataGenerator(
        rescale=1/255
    )
    # ---------------------------------- #
    #   Training-set generator
    #   Test-set generator
    # ---------------------------------- #
    train_generator = train_data.flow_from_directory(
        f'{datasets}/train',
        target_size=(img_size, img_size),
        # class_mode defaults to 'categorical' (one-hot labels), matching categorical_crossentropy
        batch_size=batch_size
    )
    
    test_generator = test_data.flow_from_directory(
        f'{datasets}/test',
        target_size=(img_size, img_size),
        batch_size=batch_size
    )
    
    return train_generator, test_generator
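`flow_from_directory` infers labels from the sub-folder names and, by default, numbers them in alphanumeric order. With the default `class_mode='categorical'`, each label is a one-hot vector, which is why the model is compiled with `categorical_crossentropy`. The NumPy sketch below mimics that mapping so that a softmax output can be decoded later. The folder names are placeholders, the helper names are hypothetical, and the real mapping can be read from `train_generator.class_indices`.

```python
import numpy as np

def infer_class_indices(class_dirs):
    """Mimic flow_from_directory's default: sorted folder names -> 0, 1, 2, ..."""
    return {name: i for i, name in enumerate(sorted(class_dirs))}

def one_hot(index, num_classes):
    """With class_mode='categorical', each label arrives as a one-hot vector like this."""
    v = np.zeros(num_classes, dtype=np.float32)
    v[index] = 1.0
    return v

# Hypothetical folder names, deliberately out of order (os.listdir guarantees no order)
class_indices = infer_class_indices(["dog", "cat"])
index_to_name = {i: name for name, i in class_indices.items()}
print(class_indices)
print(one_hot(class_indices["dog"], 2))

probs = np.array([0.9, 0.1])             # a made-up softmax output
print(index_to_name[int(np.argmax(probs))])
```

For the same reason, a prediction script should map `argmax` indices through the sorted folder names, not through the raw `os.listdir` order.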

4. Build ParNet

import tensorflow as tf
from tensorflow.keras.layers import (
    Conv2D, BatchNormalization, AveragePooling2D, Activation,
    Multiply, Add, Concatenate, Dense, Input, Flatten, Reshape
)
from tensorflow.keras.models import Model



# Average pooling over height and width; keepdims=True gives an (N, 1, 1, C) output
class GlobalAveragePool2D(tf.keras.layers.Layer):
    def __init__(self):
        super(GlobalAveragePool2D, self).__init__()
        self.keepdim = True

    def call(self, inputs):
        return tf.reduce_mean(inputs, axis=[1, 2], keepdims=self.keepdim)

class Silu(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(Silu, self).__init__(**kwargs)
        self.activation = tf.nn.silu

    def call(self, inputs):
        return self.activation(inputs)

    def compute_output_shape(self, input_shape):
        return input_shape

    def get_config(self):
        config = {'activation': tf.keras.activations.serialize(self.activation)}
        base_config = super(Silu, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

def conv_bn(x,out_channels,kernel_size, stride=1, groups=1, padding='same'):
    x = Conv2D(filters=out_channels, kernel_size=kernel_size,
               strides=stride, groups=groups, use_bias=False, padding=padding)(x)
    x = BatchNormalization()(x)
    return x
    
def SSEblock(x, filters):
    bn = BatchNormalization()(x)
    x = GlobalAveragePool2D()(bn)
    x = Conv2D(filters=filters, kernel_size=(1, 1))(x)
    x = Activation('sigmoid')(x)
    x = Multiply()([bn, x])
    return x

def Downsampling_block(inputs, filters):
    x = AveragePooling2D(pool_size=(2, 2))(inputs)
    x = conv_bn(x, filters, kernel_size=1, padding='valid')
    
    y = conv_bn(inputs, filters, kernel_size=3, stride=2)

    z = GlobalAveragePool2D()(inputs)
    z = Conv2D(filters=filters, kernel_size=1, use_bias=False)(z)
    z = Activation('sigmoid')(z)
    
    a = Add()([x, y])
    b = Multiply()([a, z])
    
    out = Silu()(b)
    return out

def channel_shuffle(x, group):
    batchsize, height, width, num_channels = x.shape
    assert num_channels % group == 0
    group_channels = int(num_channels // group)
    x = Reshape((height, width,group_channels, group))(x)
    x = tf.transpose(x, perm=[0,1,2,4,3])
    x = Reshape((height, width, num_channels))(x)
    return x

def Fusion(input1, input2, filters):
    group = input1.shape[-1]
    input1 = BatchNormalization()(input1)
    input2 = BatchNormalization()(input2)
    a = Concatenate(axis=-1)([input1, input2])
    a = channel_shuffle(a, group)
    x = AveragePooling2D(pool_size=(2, 2))(a)
    x = conv_bn(x, filters, kernel_size=1, stride=1, groups=2, padding='valid')
    y = conv_bn(a, filters, kernel_size=3, stride=2, groups=2)
    z = GlobalAveragePool2D()(a)
    z = Conv2D(filters=filters, kernel_size=1, groups=2)(z)
    z = Activation('sigmoid')(z)
    a = Add()([x, y])
    b = Multiply()([a, z])
    out = Silu()(b)
    return out

def FuseBlock(x, filters):
    a = conv_bn(x, filters, kernel_size=1, padding='valid')
    b = conv_bn(x, filters, kernel_size=3, stride=1)
    c = Add()([a, b])
    return c

# RepVGG-SSE
def Stream(x, filters):
    a = SSEblock(x, filters)
    b = FuseBlock(x, filters)
    c = Add()([a, b])
    
    c = Silu()(c)
    return c

def ParNetEncoder(inputs, block_channels, depth):
    x = Downsampling_block(inputs, block_channels[0])
    # First parallel stream (1/4 of the input resolution)
    x = Downsampling_block(x, block_channels[1])
    y = Stream(x, block_channels[1])
    for _ in range(depth[0]-1):
        y = Stream(y, block_channels[1])
    y = Downsampling_block(y, block_channels[2])
    
    # Second parallel stream (1/8 of the input resolution)
    x = Downsampling_block(x, block_channels[2])
    z = Stream(x, block_channels[2])
    for _ in range(depth[1]-1):
        z = Stream(z, block_channels[2])

    z = Fusion(y, z, block_channels[3])
    # Third parallel stream (1/16 of the input resolution)
    x = Downsampling_block(x, block_channels[3])
    a = Stream(x, block_channels[3])
    for _ in range(depth[2]-1):
        a = Stream(a, block_channels[3])
    b = Fusion(z, a, block_channels[3])
    x = Downsampling_block(b, block_channels[4])
    return x

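def ParNetDecoder(x, n_classes):
    # Note: pool_size=(1, 1) leaves the feature map unchanged, so Flatten keeps every
    # spatial position (4 x 4 x C for a 256 x 256 input) rather than a global average.
    x = AveragePooling2D(pool_size=(1, 1))(x)
    x = Flatten()(x)
    x = Dense(n_classes, activation='softmax')(x)
    return x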

def ParNet(x, n_classes, block_channels=[64, 128, 256, 512, 2048], depth=[4, 5, 5]):
    x = ParNetEncoder(x, block_channels=block_channels, depth=depth)
    x = ParNetDecoder(x, n_classes)
    
    return x

# Four model sizes: small, medium, large, extra-large
def parnet_s(inputs, classes):
    return ParNet(inputs, classes, block_channels=[64, 96, 192, 384, 1280])

def parnet_m(in_channels, classes):
    return ParNet(in_channels, classes, block_channels=[64, 128, 256, 512, 2048])

def parnet_l(in_channels, classes):
    return ParNet(in_channels, classes, block_channels=[64, 160, 320, 640, 2560])

def parnet_xl(in_channels, classes):
    return ParNet(in_channels, classes, block_channels=[64, 200, 400, 800, 3200])

5. Set up callbacks

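# Learning-rate schedule: multiply the current learning rate by 0.95 at the start of every epoch
def adjust_lr(epoch, lr):
    new_lr = lr * 0.95
    print("Setting learning rate to %s" % new_lr)
    return new_lr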
callbackss = [
            EarlyStopping(monitor='val_loss', patience=15, verbose=1),
            ModelCheckpoint('logs/ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5', monitor='val_loss',
                            save_weights_only=True, save_best_only=False, save_freq='epoch'),
            LearningRateScheduler(adjust_lr),
            TensorBoard(log_dir='./logs1')
        ]
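`LearningRateScheduler` calls the schedule at the start of every epoch with the optimizer's current learning rate, so `adjust_lr` compounds: epoch e trains at lr·0.95^(e+1). The plain-Python replay below uses a copy of `adjust_lr` without the print, plus a hypothetical `simulate_schedule` loop. With lr = 1e-3 and 100 epochs, the last epoch would run at roughly 6e-6, unless `EarlyStopping` ends training first.

```python
def adjust_lr(epoch, lr):
    return lr * 0.95

def simulate_schedule(initial_lr, epochs):
    """Replay LearningRateScheduler: at each epoch start, lr = schedule(epoch, current lr)."""
    lr, used = initial_lr, []
    for epoch in range(epochs):
        lr = adjust_lr(epoch, lr)
        used.append(lr)            # learning rate used during this epoch
    return used

used = simulate_schedule(1e-3, 100)
print(f"epoch 0: {used[0]:.2e}, epoch 99: {used[-1]:.2e}")
```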

6. Train the model

The parnet_s variant is trained here.

inputs = Input(shape=(img_size,img_size,3))
train_generator, test_generator = data_process_func()
model = Model(inputs=inputs, outputs=parnet_s(inputs=inputs, classes=classes))
# model.summary()
model.compile(optimizer=Adam(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(
           x = train_generator,
           validation_data = test_generator,
           epochs = epochs,
           callbacks = callbackss
          )

After training, run the following command in the current directory to open TensorBoard: tensorboard --logdir=./logs1

7. Predict on images

Create a predict.py file.

import os

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from PIL import Image
from tensorflow.keras.layers import (
    Conv2D, BatchNormalization, AveragePooling2D, Activation,
    Multiply, Add, Concatenate, Dense, Input, Flatten, Reshape
)
from tensorflow.keras.models import Model

# Average pooling over height and width; keepdims=True gives an (N, 1, 1, C) output
class GlobalAveragePool2D(tf.keras.layers.Layer):
    def __init__(self):
        super(GlobalAveragePool2D, self).__init__()
        self.keepdim = True

    def call(self, inputs):
        return tf.reduce_mean(inputs, axis=[1, 2], keepdims=self.keepdim)

class Silu(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(Silu, self).__init__(**kwargs)
        self.activation = tf.nn.silu

    def call(self, inputs):
        return self.activation(inputs)

    def compute_output_shape(self, input_shape):
        return input_shape

    def get_config(self):
        config = {'activation': tf.keras.activations.serialize(self.activation)}
        base_config = super(Silu, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

def conv_bn(x,out_channels,kernel_size, stride=1, groups=1, padding='same'):
    x = Conv2D(filters=out_channels, kernel_size=kernel_size,
               strides=stride, groups=groups, use_bias=False, padding=padding)(x)
    x = BatchNormalization()(x)
    return x

def SSEblock(x, filters):
    bn = BatchNormalization()(x)
    x = GlobalAveragePool2D()(bn)
    x = Conv2D(filters=filters, kernel_size=(1, 1))(x)
    x = Activation('sigmoid')(x)
    x = Multiply()([bn, x])
    return x

def Downsampling_block(inputs, filters):
    x = AveragePooling2D(pool_size=(2, 2))(inputs)
    x = conv_bn(x, filters, kernel_size=1, padding='valid')

    y = conv_bn(inputs, filters, kernel_size=3, stride=2)

    z = GlobalAveragePool2D()(inputs)
    z = Conv2D(filters=filters, kernel_size=1, use_bias=False)(z)
    z = Activation('sigmoid')(z)

    a = Add()([x, y])
    b = Multiply()([a, z])

    out = Silu()(b)
    return out

def channel_shuffle(x, group):
    batchsize, height, width, num_channels = x.shape
    assert num_channels % group == 0
    group_channels = int(num_channels // group)
    x = Reshape((height, width,group_channels, group))(x)
    x = tf.transpose(x, perm=[0,1,2,4,3])
    x = Reshape((height, width, num_channels))(x)
    return x

def Fusion(input1, input2, filters):
    group = input1.shape[-1]
    input1 = BatchNormalization()(input1)
    input2 = BatchNormalization()(input2)
    a = Concatenate(axis=-1)([input1, input2])
    a = channel_shuffle(a, group)
    x = AveragePooling2D(pool_size=(2, 2))(a)
    x = conv_bn(x, filters, kernel_size=1, stride=1, groups=2, padding='valid')
    y = conv_bn(a, filters, kernel_size=3, stride=2, groups=2)
    z = GlobalAveragePool2D()(a)
    z = Conv2D(filters=filters, kernel_size=1, groups=2)(z)
    z = Activation('sigmoid')(z)
    a = Add()([x, y])
    b = Multiply()([a, z])
    out = Silu()(b)
    return out

def FuseBlock(x, filters):
    a = conv_bn(x, filters, kernel_size=1, padding='valid')
    b = conv_bn(x, filters, kernel_size=3, stride=1)
    c = Add()([a, b])
    return c

# RepVGG-SSE
def Stream(x, filters):
    a = SSEblock(x, filters)
    b = FuseBlock(x, filters)
    c = Add()([a, b])

    c = Silu()(c)
    return c

def ParNetEncoder(inputs, block_channels, depth):
    x = Downsampling_block(inputs, block_channels[0])
    # First parallel stream (1/4 of the input resolution)
    x = Downsampling_block(x, block_channels[1])
    y = Stream(x, block_channels[1])
    for _ in range(depth[0]-1):
        y = Stream(y, block_channels[1])
    y = Downsampling_block(y, block_channels[2])

    # Second parallel stream (1/8 of the input resolution)
    x = Downsampling_block(x, block_channels[2])
    z = Stream(x, block_channels[2])
    for _ in range(depth[1]-1):
        z = Stream(z, block_channels[2])

    z = Fusion(y, z, block_channels[3])
    # Third parallel stream (1/16 of the input resolution)
    x = Downsampling_block(x, block_channels[3])
    a = Stream(x, block_channels[3])
    for _ in range(depth[2]-1):
        a = Stream(a, block_channels[3])
    b = Fusion(z, a, block_channels[3])
    x = Downsampling_block(b, block_channels[4])
    return x

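def ParNetDecoder(x, n_classes):
    # Note: pool_size=(1, 1) leaves the feature map unchanged, so Flatten keeps every
    # spatial position (4 x 4 x C for a 256 x 256 input) rather than a global average.
    x = AveragePooling2D(pool_size=(1, 1))(x)
    x = Flatten()(x)
    x = Dense(n_classes, activation='softmax')(x)
    return x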

def ParNet(x, n_classes, block_channels=[64, 128, 256, 512, 2048], depth=[4, 5, 5]):
    x = ParNetEncoder(x, block_channels=block_channels, depth=depth)
    x = ParNetDecoder(x, n_classes)

    return x

# Four model sizes: small, medium, large, extra-large
def parnet_s(inputs, classes):
    return ParNet(inputs, classes, block_channels=[64, 96, 192, 384, 1280])

def parnet_m(in_channels, classes):
    return ParNet(in_channels, classes, block_channels=[64, 128, 256, 512, 2048])

def parnet_l(in_channels, classes):
    return ParNet(in_channels, classes, block_channels=[64, 160, 320, 640, 2560])

def parnet_xl(in_channels, classes):
    return ParNet(in_channels, classes, block_channels=[64, 200, 400, 800, 3200])

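datasets = './dataset/data1_dog_cat/test'
# flow_from_directory numbers classes in sorted (alphanumeric) order, so sort here too
names = sorted(os.listdir(datasets))
weight = './model_data/val_loss0.145_test_acc0.947_parnet_dog.h5'  # path to the trained weights file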
net = parnet_s
classes = 2
img_size = 256

# Normalize to [0, 1], matching rescale=1/255 used during training
def preprocess_input(x):
    x /= 255

    return x
inputs = Input(shape=(img_size,img_size,3))
model = Model(inputs=inputs, outputs=parnet_s(inputs=inputs, classes=classes))
model.load_weights(weight)
while True:

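    img_path = input('input img_path:')
    try:
        # convert('RGB') drops an alpha channel or expands grayscale so the input has 3 channels
        img = Image.open(img_path).convert('RGB')
        img = img.resize((img_size, img_size))
        image_data = np.expand_dims(preprocess_input(np.array(img, np.float32)), 0)
    except Exception:
        print('Could not open an image at that path!')
        continue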
    else:
        plt.imshow(img)
        plt.axis('off')
        p =model.predict(image_data)[0]
        pred_name = names[np.argmax(p)]
        plt.title('%s:%.3f'%(pred_name, np.max(p)))
        plt.show()

Enter the path of the image to classify.

Result:

Predicted class: cat, with a probability of 100%.

[Figure: prediction example]