深度学习中常见的图像处理任务

原创

Hong_Youth 2022-01-14 14:32:39 博主文章分类：神经网络和深度学习 ©著作权

文章标签 深度学习计算机视觉图像处理 xml 目标检测 文章分类 代码人生

©著作权归作者所有：来自51CTO博客作者Hong_Youth的原创作品，请联系作者获取转载授权，否则将追究法律责任

`常见图像处理的任务`

1、分类

给定一副图像，我们用计算机模型预测图片中有什么对象。

深度学习中常见的图像处理任务_xml

2、分类与定位

我们不仅要知道图片中的对象是什么，还要在对象的附近画一个边框，确定该对象所处的位置。

深度学习中常见的图像处理任务_目标检测_02

3、语义分割

区分到图中每一点像素点，而不仅仅是矩形框框住。

深度学习中常见的图像处理任务_计算机视觉_03

4、目标检测

目标检测简单来说就是回答图片里面有什么？分别在哪里？（并把它们使用矩形框框住）

深度学习中常见的图像处理任务_图像处理_04

5、实例分割

实例分割是目标检测和语义分割的结合。相对目标检测的边界框，实例分割可精确到物体的边缘；相对语义分割，实例分割需要标出图上同一物体的不同个体。

深度学习中常见的图像处理任务_计算机视觉_05

`图像定位`

对于单纯的分类问题，比较容易理解，给定一副图片，我们输出一个标签类别，已经很熟悉。

定位比较复杂，需要输出四个数字（x，y，w，h），图像中某一个点的坐标（x，y），以及图像的宽度和高度，有了这四个数字，我们可以很容易地找到物体的边框。

深度学习中常见的图像处理任务_计算机视觉_06

Oxford-IIIT数据集

The Oxford-IIIT Pet Dataset是一个宠物图像数据集，包含37种宠物，每种宠物200张左右宠物图片，该数据集同时包含宠物分类、头部轮廓标注和语义分割信息。

深度学习中常见的图像处理任务_深度学习_07

先来看看图片定位

# -*- coding: UTF-8 -*-"""
Author: LGD
FileName: image_position
DateTime: 2020/12/25 09:45 
SoftWare: PyCharm
"""import tensorflow as tf
import matplotlib.pyplot as plt
from lxml import etree
import numpy as np
import glob
from matplotlib.patches import Rectangle

print(tf.__version__)
print(tf.test.is_gpu_available())

img = tf.io.read_file('images/Abyssinian_1.jpg')

img = tf.image.decode_jpeg(img)
print(img.shape)

plt.imshow(img)
plt.show()

xml = open('annotations/xmls/Abyssinian_1.xml').read()
sel = etree.HTML(xml)
width = int(sel.xpath('//size/width/text()')[0])
print(width)
height = int(sel.xpath('//size/height/text()')[0])

xmin = int(sel.xpath('//bndbox/xmin/text()')[0])
xmax = int(sel.xpath('//bndbox/xmax/text()')[0])
ymin = int(sel.xpath('//bndbox/ymin/text()')[0])
ymax = int(sel.xpath('//bndbox/ymax/text()')[0])

print(height, xmin, xmax, ymin, ymax)

plt.imshow(img)
rect = Rectangle((xmin, ymin), (xmax-xmin), (ymax-ymin), fill=False, color='red')
ax = plt.gca()
ax.axes.add_patch(rect)
plt.show()

图片缩放与目标值的规范

img = tf.image.resize(img, [224, 224])
img = img / 255
plt.imshow(img)
plt.show()

xmin = (xmin/width)*224
xmax = (xmax/width)*224
ymin = (ymin/height)*224
ymax = (ymax/height)*224


plt.imshow(img)
rect = Rectangle((xmin, ymin), (xmax - xmin), (ymax - ymin), fill=False, color='red')
ax = plt.gca()
ax.axes.add_patch(rect)
plt.show()

模型训练

# -*- coding: UTF-8 -*-"""
Author: LGD
FileName: image_position
DateTime: 2020/12/25 09:45 
SoftWare: PyCharm
"""import tensorflow as tf
import matplotlib.pyplot as plt
from lxml import etree
import numpy as np
import glob
from matplotlib.patches import Rectangle

print(tf.__version__)
print(tf.test.is_gpu_available())

# img = tf.io.read_file('images/Abyssinian_1.jpg')## img = tf.image.decode_jpeg(img)# print(img.shape)## plt.imshow(img)# plt.show()## xml = open('annotations/xmls/Abyssinian_1.xml').read()# sel = etree.HTML(xml)# width = int(sel.xpath('//size/width/text()')[0])# height = int(sel.xpath('//size/height/text()')[0])## xmin = int(sel.xpath('//bndbox/xmin/text()')[0])# xmax = int(sel.xpath('//bndbox/xmax/text()')[0])# ymin = int(sel.xpath('//bndbox/ymin/text()')[0])# ymax = int(sel.xpath('//bndbox/ymax/text()')[0])## print(height, xmin, xmax, ymin, ymax)## plt.imshow(img)# rect = Rectangle((xmin, ymin), (xmax - xmin), (ymax - ymin), fill=False, color='red')# ax = plt.gca()# ax.axes.add_patch(rect)# plt.show()## img = tf.image.resize(img, [224, 224])# img = img / 255# plt.imshow(img)# plt.show()## xmin = (xmin/width)*224# xmax = (xmax/width)*224# ymin = (ymin/height)*224# ymax = (ymax/height)*224### plt.imshow(img)# rect = Rectangle((xmin, ymin), (xmax - xmin), (ymax - ymin), fill=False, color='red')# ax = plt.gca()# ax.axes.add_patch(rect)# plt.show()

# 创建输入管道
images = glob.glob('images/*.jpg')
print(images[:5])
print(images[-5:])
print(len(images))

xmls = glob.glob('annotations/xmls/*.xml')
print(xmls[:5])
print(xmls[-5:])
print(len(xmls))

names = [x.split('\\')[-1].split('.xml')[0] for x in xmls]
print(names, len(names))

imgs_train = [img for img in images if img.split('\\')[-1].split('.jpg')[0] in names]
print(imgs_train, len(imgs_train))

imgs_test = [img for img in images if img.split('\\')[-1].split('.jpg')[0] not in names]

imgs_train.sort(key=lambda x: x.split('\\')[-1].split('.jpg')[0])
xmls.sort(key=lambda x: x.split('\\')[-1].split('.jpg')[0])

print(imgs_train[-5:])
print(xmls[-5:])


def to_labels(path):
    xml = open('{}'.format(path)).read()
    sel = etree.HTML(xml)
    width = int(sel.xpath('//size/width/text()')[0])
    height = int(sel.xpath('//size/height/text()')[0])
    xmin = int(sel.xpath('//bndbox/xmin/text()')[0])
    xmax = int(sel.xpath('//bndbox/xmax/text()')[0])
    ymin = int(sel.xpath('//bndbox/ymin/text()')[0])
    ymax = int(sel.xpath('//bndbox/ymax/text()')[0])

    return [xmin / width, ymin / height, xmax / width, ymax / height]


labels = [to_labels(path) for path in xmls]

print(labels[:3])
out1, out2, out3, out4 = list(zip(*labels))
print(len(out1), len(out2), len(out3), len(out4))

out1 = np.array(out1)
out2 = np.array(out2)
out3 = np.array(out3)
out4 = np.array(out4)

label_datasets = tf.data.Dataset.from_tensor_slices((out1, out2, out3, out4))
print('label_datasets: ', label_datasets)


def load_image(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (224, 224))
    img = img / 127.5 - 1
    return img


image_dataset = tf.data.Dataset.from_tensor_slices(imgs_train)
image_dataset = image_dataset.map(load_image)
print(image_dataset)

dataset = tf.data.Dataset.zip((image_dataset, label_datasets))
dataset = dataset.repeat().shuffle(len(imgs_train)).batch(32)
for img, label in dataset.take(1):
    plt.imshow(tf.keras.preprocessing.image.array_to_img(img[0]))
    out1, out2, out3, out4 = label
    xmin, ymin, xmax, ymax = out1[0].numpy() * 224, out2[0].numpy() * 224, out3[0].numpy() * 224, out4[0].numpy() * 224
    rect = Rectangle((xmin, ymin), (xmax - xmin), (ymax - ymin), fill=False, color='red')
    ax = plt.gca()
    ax.axes.add_patch(rect)
    plt.show()

# 模型创建阶段
xception = tf.keras.applications.Xception(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

inputs = tf.keras.layers.Input(shape=(224, 224, 3))

x = xception(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(2048, activation='relu')(x)
x = tf.keras.layers.Dense(256, activation='relu')(x)
out1 = tf.keras.layers.Dense(1)(x)
out2 = tf.keras.layers.Dense(1)(x)
out3 = tf.keras.layers.Dense(1)(x)
out4 = tf.keras.layers.Dense(1)(x)

prediction = [out1, out2, out3, out4]

model = tf.keras.models.Model(inputs=inputs, outputs=prediction)

model.compile(
    tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss='mse',
    metrics=['mae']
)

EPOCHS = 50

history = model.fit(
    dataset,
    epochs=EPOCHS,
    steps_per_epoch=len(imgs_train) // 32
)