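Deep learning has reshaped computer vision, and image classification remains one of its classic tasks. In some special scenarios, though, there are too few labeled samples to train a deep model from scratch, and even fine-tuning through transfer learning struggles with so little data. Fortunately, many researchers have open-sourced pretrained models: the classic MobileNet, VGG, and ResNet families, as well as the recently popular large models such as CLIP, DINOv2, and their derivatives. These models have proven themselves across several generations of hardware. In this post we use a pretrained model as a feature extractor for image classification.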

First, here are download links for some classic pretrained models:

VGG = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
    'vgg13': 'https://download.pytorch.org/models/vgg13-c768596a.pth',
    'vgg16': 'https://download.pytorch.org/models/vgg16-397923af.pth',
    'vgg19': 'https://download.pytorch.org/models/vgg19-dcbb9e9d.pth',
    'vgg11_bn': 'https://download.pytorch.org/models/vgg11_bn-6002323d.pth',
    'vgg13_bn': 'https://download.pytorch.org/models/vgg13_bn-abd245e5.pth',
    'vgg16_bn': 'https://download.pytorch.org/models/vgg16_bn-6c64b313.pth',
    'vgg19_bn': 'https://download.pytorch.org/models/vgg19_bn-c79401a0.pth',
}

ResNet = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
}

SqueezeNet = {
    'squeezenet1_0': 'https://download.pytorch.org/models/squeezenet1_0-a815701f.pth',
    'squeezenet1_1': 'https://download.pytorch.org/models/squeezenet1_1-f364aa15.pth',
}

MobileNet = {
    'mobilenet_v2': 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth',
    'mobilenet_v3_small': 'https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth',
    'mobilenet_v3_large': 'https://download.pytorch.org/models/mobilenet_v3_large-5c1a4163.pth',
}

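# Hosted on the hf-mirror.com mirror, so no access to huggingface.co is needed.
# These are repository pages (file listings), not direct links to weight files.
Dinov2 = {
    'dinov2-small': 'https://hf-mirror.com/facebook/dinov2-small/tree/main',
    'dinov2-base': 'https://hf-mirror.com/facebook/dinov2-base/tree/main',
    'dinov2-large': 'https://hf-mirror.com/facebook/dinov2-large/tree/main',
    'dinov2-giant': 'https://hf-mirror.com/facebook/dinov2-giant/tree/main',
}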
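If the machine that runs the model has unreliable network access, the `.pth` links above can be fetched once and reused. The following is a minimal sketch using only the standard library; `fetch_weights` and the `weights` cache directory are hypothetical names introduced here, not part of torchvision. It applies to the direct `.pth` links, not to the DINOv2 repository pages.

```python
import os
import urllib.request


def fetch_weights(name, url_table, cache_dir="weights"):
    """Return the local path of checkpoint `name`, downloading it only if missing.

    `url_table` is one of the name -> URL dictionaries listed above.
    """
    url = url_table[name]
    os.makedirs(cache_dir, exist_ok=True)
    # Keep the file name from the URL, e.g. resnet18-5c106cde.pth
    local_path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(url, local_path)
    return local_path
```

With the file on disk, the weights can be loaded without `pretrained=True`, for example by building the network with `models.resnet18()` and calling `model.load_state_dict(torch.load(path, map_location='cpu'))`.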

We take ResNet18 as the example. Without further ado, here is the code:

from sklearn import svm
import numpy as np
import torch
import torch.nn as nn
import os
import cv2
import torchvision.models as models
from torchsummary import summary
from datetime import datetime
import torchvision.transforms as transforms
import torchvision
from PIL import Image

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # falls back to the CPU when no GPU is available

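# Load the model
# pretrained=True downloads the official pretrained weights and loads them, so the machine must be online
# (newer torchvision versions prefer the weights= argument instead)
model = models.resnet18(pretrained=True)
classifier = nn.Sequential()  # an empty Sequential passes its input through unchanged
model.fc = classifier  # drop the final fully connected layer so the model returns the 512-D pooled feature
model.to(device)
model.eval()  # inference mode, so BatchNorm uses its stored running statistics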

# Normalize with the ImageNet training-set statistics: (img - mean) / std
transform_image = transforms.Compose([transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])  # mean and std are both in RGB order
                                              
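img = cv2.imread('xxx.png')  # cv2.imread returns None when the file cannot be read
if img is None:
    raise FileNotFoundError('xxx.png could not be read')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads images in BGR order
img = cv2.resize(img, (224, 224))  # ResNet18 was trained on ImageNet with 224x224 inputs

# ToTensor converts the HWC uint8 array to a CHW float tensor in [0, 1];
# unsqueeze(0) adds the batch dimension, giving shape (1, 3, 224, 224)
img = transform_image(img).unsqueeze(0)
with torch.no_grad():  # no gradients are needed for feature extraction
    output = model(img.to(device))  # shape (1, 512): the 512-D feature of the image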

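The steps above give a one-dimensional feature vector for a single image. Next comes classification. We can compute the cosine similarity between the feature vectors of different images. Its value lies in [-1, 1]: the closer it is to 1, the more closely the two vectors point in the same direction; the closer it is to -1, the more they point in opposite directions; a value near 0 means the two vectors are almost orthogonal.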
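To make the similarity step concrete, here is a minimal NumPy sketch. It assumes the features have already been extracted and moved to the CPU as arrays with one row per image. The names `cosine_similarity_matrix` and `classify_by_similarity`, and the nearest-reference rule itself, are illustrative assumptions rather than the only way to turn similarities into labels.

```python
import numpy as np


def cosine_similarity_matrix(a, b, eps=1e-12):
    """Cosine similarity between every row of a (n, d) and every row of b (m, d)."""
    # Scale each row to unit length; eps guards against division by zero
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T  # shape (n, m), values in [-1, 1]


def classify_by_similarity(query_feats, ref_feats, ref_labels):
    """Give each query the label of its most similar reference feature."""
    sims = cosine_similarity_matrix(query_feats, ref_feats)
    best = sims.argmax(axis=1)
    return [ref_labels[i] for i in best], sims


# Toy 3-D vectors standing in for the 512-D ResNet18 features
ref_feats = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
ref_labels = ["cat", "dog"]
query_feats = np.array([[0.9, 0.1, 0.0], [-1.0, 0.0, 0.0]])
labels, sims = classify_by_similarity(query_feats, ref_feats, ref_labels)
```

With real features, `ref_feats` would hold the vectors of the few labeled images and `query_feats` those of the images to classify. A tensor returned by the model can be converted with `output.cpu().numpy()`.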