Transfer Learning with Keras

Overview

Transfer learning with Keras, built up from working code, clear and to the point.

 

[Figure: Inception-V3 architecture, Google Research]

 

What is transfer learning?

Transfer learning is a research problem in machine learning that focuses on storing the knowledge gained while solving one problem and applying it to a different but related problem.

Why use transfer learning?

 

  • In practice, very few people train a convolutional network from scratch (with random initialization), because it is rare to have a dataset of sufficient size. Using pre-trained network weights, either as an initialization or as a fixed feature extractor, helps solve most of the problems at hand.

  • Very deep networks are expensive to train. The most complex models take weeks to train, using hundreds of machines equipped with expensive GPUs.

  • Determining the architecture, tuning, training method, and hyperparameters of a deep network is a black art with not much theory to guide you.

In my experience:

"DON'T TRY TO BE A HERO" ~ Andrej Karpathy

Most of the computer vision problems I have faced do not have very big datasets (5,000 to 40,000 images). Even with extreme data augmentation strategies, it is difficult to reach decent accuracy; training networks with millions of parameters on such data usually makes the model overfit. This is where transfer learning helps us.

How does transfer learning help?

 

When you look at what these deep learning networks learn, they try to detect edges in the earlier layers, shapes in the middle layers, and high-level, data-specific features in the later layers. These trained networks are generally useful for solving other computer vision problems. Let's look at how to do transfer learning using Keras and the various scenarios it covers.
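As a quick way to see this layer hierarchy for yourself, the short sketch below loads a pre-trained model and prints its layer names and output shapes. It is purely illustrative of where the early, middle, and late layers live:

from keras import applications

# Load VGG19 pre-trained on ImageNet, without the classification head
vgg = applications.VGG19(weights="imagenet", include_top=False)

# block1/block2 layers respond to edges, middle blocks to shapes,
# and block5 layers to high-level, dataset-specific features
for layer in vgg.layers:
    print(layer.name, layer.output_shape)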

[Figure: Inception V3 architecture, Google Research]

 

A simple implementation using Keras:

 

from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras import backend as k
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping

img_width, img_height = 256, 256
train_data_dir = "data/train"
validation_data_dir = "data/val"
nb_train_samples = 4125
nb_validation_samples = 466
batch_size = 16
epochs = 50

model = applications.VGG19(weights="imagenet", include_top=False, input_shape=(img_width, img_height, 3))

"""
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 64, 64, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 64, 64, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 64, 64, 256)       590080
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 64, 64, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 32, 32, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 32, 32, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 32, 32, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 32, 32, 512)       2359808
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 32, 32, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 16, 16, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 16, 16, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 8, 8, 512)         0
=================================================================
Total params: 20,024,384.0
Trainable params: 20,024,384.0
Non-trainable params: 0.0
"""

# Freeze the layers which you don't want to train. Here I am freezing the first 5 layers.
for layer in model.layers[:5]:
    layer.trainable = False

# Adding custom layers
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(16, activation="softmax")(x)

# Creating the final model
model_final = Model(inputs=model.input, outputs=predictions)

# Compile the model
model_final.compile(loss="categorical_crossentropy",
                    optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
                    metrics=["accuracy"])

# Initiate the train and test generators with data augmentation
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    horizontal_flip=True,
    fill_mode="nearest",
    zoom_range=0.3,
    width_shift_range=0.3,
    height_shift_range=0.3,
    rotation_range=30)

test_datagen = ImageDataGenerator(
    rescale=1. / 255,
    horizontal_flip=True,
    fill_mode="nearest",
    zoom_range=0.3,
    width_shift_range=0.3,
    height_shift_range=0.3,
    rotation_range=30)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode="categorical")

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_height, img_width),
    class_mode="categorical")

# Save the model according to the conditions
checkpoint = ModelCheckpoint("vgg16_1.h5", monitor='val_acc', verbose=1,
                             save_best_only=True, save_weights_only=False,
                             mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=10, verbose=1, mode='auto')

# Train the model
model_final.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    callbacks=[checkpoint, early])
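Once training stops, the ModelCheckpoint callback has written the best-performing model to vgg16_1.h5. A minimal inference sketch follows; the image path is a placeholder, everything else reuses the script above:

import numpy as np
from keras.models import load_model
from keras.preprocessing import image

best_model = load_model("vgg16_1.h5")  # the full model saved by ModelCheckpoint

img = image.load_img("data/val/example.jpg", target_size=(img_width, img_height))  # placeholder path
x = image.img_to_array(img) / 255.0    # match the generators' rescale=1./255
x = np.expand_dims(x, axis=0)
probs = best_model.predict(x)
print(np.argmax(probs, axis=1))        # index of the predicted class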

Remember that convnet features are more generic in the early layers and more specific to the original dataset in the later layers. Here are some common rules of thumb for the four major scenarios:

1. The new dataset is small and similar to the original dataset:

If we try to train the entire network, overfitting becomes a problem. Since the data is similar to the original data, we expect the higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best approach is to train a linear classifier on the CNN codes.

So let's freeze all the VGG19 layers and train only the classifier (a fuller training sketch follows the snippet):

for layer in model.layers:
    layer.trainable = False
# Now we will be training only the classifiers (FC layers)
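Concretely, a minimal sketch of that training step, reusing model_final, the generators, and the callbacks from the first example (hyperparameters kept as above). Recompiling is required because trainable flags are only applied at compile time:

# Recompile after freezing so the trainable flags take effect
model_final.compile(loss="categorical_crossentropy",
                    optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
                    metrics=["accuracy"])

# Only the Dense/Dropout head receives gradient updates now
model_final.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    callbacks=[checkpoint, early])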

2. The new dataset is large and similar to the original dataset:

Since we have more data, we can be more confident that we won't overfit if we try to fine-tune through the entire network.

for layer in model.layers:
    layer.trainable = True
# The default is already set to True. I have mentioned it here to make things clear.

If you want to freeze the first few layers, since those layers will be detecting edges and blobs, you can freeze them with the following code (a note on recompiling follows the snippet):

# Here I am freezing the first 5 layers
for layer in model.layers[:5]:
    layer.trainable = False
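One practical note, offered as general advice rather than part of the original recipe: changes to layer.trainable only take effect when the model is compiled, so recompile after freezing or unfreezing, and keep the learning rate small when fine-tuning so the pre-trained ImageNet features are not destroyed:

# Recompile after changing trainable flags; a small learning rate
# protects the pre-trained weights during fine-tuning
model_final.compile(loss="categorical_crossentropy",
                    optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
                    metrics=["accuracy"])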

3. The new dataset is small but very different from the original dataset

Since the dataset is very small, we may want to extract features from an earlier layer and train a classifier on top of them. This requires a little knowledge of h5py.

from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D, Input, Conv2D, MaxPooling2D
from keras import backend as k
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping

img_width, img_height = 256, 256

### Build the network
img_input = Input(shape=(256, 256, 3))
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

# Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

model = Model(inputs=img_input, outputs=x)

model.summary()
"""
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 256, 256, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 256, 256, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 128, 128, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 128, 128, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 128, 128, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 64, 64, 128)       0
=================================================================
Total params: 260,160.0
Trainable params: 260,160.0
Non-trainable params: 0.0
"""

layer_dict = dict([(layer.name, layer) for layer in model.layers])
[layer.name for layer in model.layers]
"""
['input_1',
 'block1_conv1',
 'block1_conv2',
 'block1_pool',
 'block2_conv1',
 'block2_conv2',
 'block2_pool']
"""

import h5py
# Weights file from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels.h5
weights_path = 'vgg19_weights.h5'
f = h5py.File(weights_path, 'r')

list(f["model_weights"].keys())
"""
['block1_conv1',
 'block1_conv2',
 'block1_pool',
 'block2_conv1',
 'block2_conv2',
 'block2_pool',
 'block3_conv1',
 'block3_conv2',
 'block3_conv3',
 'block3_conv4',
 'block3_pool',
 'block4_conv1',
 'block4_conv2',
 'block4_conv3',
 'block4_conv4',
 'block4_pool',
 'block5_conv1',
 'block5_conv2',
 'block5_conv3',
 'block5_conv4',
 'block5_pool',
 'dense_1',
 'dense_2',
 'dense_3',
 'dropout_1',
 'global_average_pooling2d_1',
 'input_1']
"""

# List all the layer names which are in the model.
layer_names = [layer.name for layer in model.layers]

"""
# Here we are extracting model_weights for each and every layer from the .h5 file
>>> f["model_weights"]["block1_conv1"].attrs["weight_names"]
array([b'block1_conv1/kernel:0', b'block1_conv1/bias:0'],
      dtype='|S21')
# We are assigning this array to weight_names below
>>> f["model_weights"]["block1_conv1"]["block1_conv1/kernel:0"]
<HDF5 dataset "kernel:0": shape (3, 3, 3, 64), type "<f4">
# The list comprehension (weights) stores the kernel and bias of each layer
>>> layer_names.index("block1_conv1")
1
>>> model.layers[1].set_weights(weights)
# This will set the weights for that particular layer.
# With a for loop we can set_weights for the entire network.
"""
for i in layer_dict.keys():
    weight_names = f["model_weights"][i].attrs["weight_names"]
    weights = [f["model_weights"][i][j] for j in weight_names]
    index = layer_names.index(i)
    model.layers[index].set_weights(weights)


import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
import itertools
import glob

# files_location was not defined in the original; assume a glob over the image folder
files_location = glob.glob("data/train/*/*.jpg")

features = []
for i in tqdm(files_location):
    im = cv2.imread(i)
    im = cv2.resize(cv2.cvtColor(im, cv2.COLOR_BGR2RGB), (256, 256)).astype(np.float32) / 255.0
    im = np.expand_dims(im, axis=0)
    outcome = model.predict(im)   # the truncated model built above (the original said model_final, which is undefined here)
    features.append(outcome)

## Collect these features, create a dataframe and train a classifier on top of it.

The code above should help. It extracts the "block2_pool" features. On its own this is generally not useful, because that layer has 64 x 64 x 128 features, and training a classifier directly on top of them may not help us. Instead, we can add a few FC layers and train a neural network on top of them. That should be straightforward:

  • Add a few FC layers and an output layer.

  • Set the weights for the early layers and freeze them.

  • Train the network (a sketch follows below).
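A minimal sketch of those three steps on top of the truncated two-block model from above. The head sizes and the name head_model are illustrative, and the generators and sample counts are assumed to be defined as in the first example:

# Add a few FC layers and an output layer on top of block2_pool;
# global average pooling shrinks the 64x64x128 maps to a 128-d vector
x = GlobalAveragePooling2D()(model.output)
x = Dense(512, activation="relu")(x)               # illustrative size
x = Dropout(0.5)(x)
predictions = Dense(16, activation="softmax")(x)   # 16 classes, as in the first example

# The early layers already carry the VGG19 weights we set above; freeze them
for layer in model.layers:
    layer.trainable = False

head_model = Model(inputs=model.input, outputs=predictions)
head_model.compile(loss="categorical_crossentropy",
                   optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
                   metrics=["accuracy"])

# Train the network: only the new FC head is updated
head_model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)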

4. The new dataset is large and very different from the original dataset

This one is straightforward. Since you have a large dataset, you can design your own network or use an existing one.

  • Train the network with random initialization, or use the pre-trained network weights as an initializer. The second approach is usually preferred.

  • If you are using a different network, or making small modifications to an existing network, pay attention to the naming conventions (see the sketch below).
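As a sketch of the naming point (hedged, since it depends on your architecture): if your custom layers reuse the original VGG19 layer names, as the block1_*/block2_* names in scenario 3 do, Keras can initialize just those layers from the released weights file and leave everything else randomly initialized:

# Loads weights only into layers whose names match entries in the .h5 file;
# all other layers keep their random initialization.
model.load_weights('vgg19_weights.h5', by_name=True)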

     

References

 

  1. cs231n.github.io/transfer-learning/

  2. keras.io

  3. https://github.com/fchollet/keras
