
WIDER Face is used for face detection and CelebA for landmark detection.


The two folders hold images from different sources: the dataset contains 5,590 LFW images and 7,876 other images downloaded from the web. The training set and validation set are defined in trainImageList.txt and testImageList.txt.

Cropping: apply translation, scaling and similar transforms to the target region to obtain the crop region. (Since there is less training data for landmarks, I use transforms, random rotation and random flipping for data augmentation.)


Positive samples: IoU >= 0.65, label 1
Negative samples: IoU < 0.3, label 0
Part samples: 0.4 <= IoU < 0.65, label -1
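
As a quick illustration of the thresholds above, here is a minimal sketch of the IoU-to-label mapping. The function name `label_from_iou` is my own and does not appear in the repo; crops in the gap 0.3 <= IoU < 0.4 are simply not used.

```python
def label_from_iou(iou):
    """Map an IoU value to the class label used in the annotation files.

    Returns 1 (positive), -1 (part), 0 (negative), or None for crops that
    fall in the unused gap 0.3 <= IoU < 0.4.
    """
    if iou >= 0.65:
        return 1
    if iou >= 0.4:
        return -1
    if iou < 0.3:
        return 0
    return None


for value in (0.8, 0.5, 0.35, 0.1):
    print(value, label_from_iou(value))
```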

Since MTCNN is a multi-task network, we should pay attention to the format of the training data. The format is:
[path to image][cls_label][bbox_label][landmark_label]
For a pos sample: cls_label=1, bbox_label is calculated, landmark_label=[0,0,0,0,0,0,0,0,0,0].
For a part sample: cls_label=-1, bbox_label is calculated, landmark_label=[0,0,0,0,0,0,0,0,0,0].
For a landmark sample: cls_label=-2, bbox_label=[0,0,0,0], landmark_label is calculated.
For a neg sample: cls_label=0, bbox_label=[0,0,0,0], landmark_label=[0,0,0,0,0,0,0,0,0,0].
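
To make the format concrete, here is a rough sketch of a parser for one annotation line. It assumes the fields are whitespace-separated and that zero-valued fields are left out of the file (the generation code in this post writes "path 0" for negatives and "path 1 x1 y1 x2 y2" for positives; a landmark line with 10 values is my assumption). The name `parse_annotation_line` is hypothetical.

```python
def parse_annotation_line(line):
    """Split one annotation line into (path, cls_label, bbox, landmark).

    Fields that are absent from the line are filled with zeros.
    """
    fields = line.split()
    path = fields[0]
    cls_label = int(fields[1])
    values = [float(v) for v in fields[2:]]
    bbox = [0.0] * 4
    landmark = [0.0] * 10
    if len(values) == 4:
        bbox = values          # pos / part sample: bbox offsets only
    elif len(values) == 10:
        landmark = values      # landmark sample: 5 (x, y) pairs (assumed layout)
    elif values:
        raise ValueError("unexpected number of label values: %d" % len(values))
    return path, cls_label, bbox, landmark


print(parse_annotation_line("DATA/12/positive/0.jpg 1 0.05 -0.05 0.05 -0.05"))
print(parse_annotation_line("DATA/12/negative/0.jpg 0"))
```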

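PNet: 12 x 12. Does the coarse selection that produces candidate boxes. Tasks: classification, regression.
RNet: 24 x 24. Filters PNet's coarse results and refines the boxes to make them more accurate. Tasks: classification, regression.
ONet: 48 x 48. Makes the final selection, refines the boxes, and regresses the keypoint positions. Tasks: classification, regression, keypoints.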

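c. Network size: during training, the input image has the size the network specifies, e.g. 12 x 12. Because PNet has no fully connected layer and is fully convolutional, prediction imposes no size requirement on the input. PNet can therefore run on an input of any size and produce k bounding boxes with confidences; filtering them by a threshold completes candidate extraction. Because the network is small, this is very efficient.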
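
To see what "no size requirement" means in numbers, here is a small sketch that computes the size of the PNet output map for an arbitrary input. The layer sequence is read off the shapes printed in the training log further down (12 -> 10 -> 5 -> 3 -> 1); that the pooling layer rounds up (SAME padding) is my assumption, and the stride of 2 in `cell_to_window` is taken from the `stride=2` argument shown in the gen_hard_example log. Both function names are hypothetical.

```python
def pnet_output_size(n):
    """Spatial size of the PNet output map for an n x n input.

    Layers: 3x3 conv (valid) -> 2x2 max pool, stride 2 -> 3x3 conv -> 3x3 conv.
    The pool is assumed to round up (SAME padding).
    """
    n = n - 2            # conv1, 3x3 valid
    n = (n + 1) // 2     # pool1, 2x2 stride 2, rounded up (assumption)
    n = n - 2            # conv2, 3x3 valid
    n = n - 2            # conv3, 3x3 valid
    return max(n, 0)


def cell_to_window(row, col, stride=2, cell=12):
    """Input window (x1, y1, x2, y2) that output cell (row, col) looks at."""
    x1 = col * stride
    y1 = row * stride
    return x1, y1, x1 + cell - 1, y1 + cell - 1


print(pnet_output_size(12))   # the training size gives a single output cell
print(pnet_output_size(24))
print(cell_to_window(1, 2))
```

Each output cell is one 12 x 12 candidate window, so a larger input yields a whole grid of candidates in a single forward pass.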


  1. Run prepare_data/gen_12net_data.py to generate training data(Face Detection Part) for PNet.
  2. Run gen_landmark_aug_12.py to generate training data(Face Landmark Detection Part) for PNet.
  3. Run gen_imglist_pnet.py to merge two parts of training data.
  4. Run gen_PNet_tfrecords.py to generate tfrecord for PNet.

Generating data (for face detection)


12880 pics in total
12800 images done, pos: 458655 part: 1125289 neg: 995342

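a. Loop 5 times: take crops near the face box with IoU < 0.3 as negative samples; if the crop extends beyond the original image, discard it.

b. Loop 20 times: take crops near the face box; IoU >= 0.65 gives a positive sample, 0.4 <= IoU < 0.65 gives a part sample.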
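
The `IoU` helper that the generation code calls is not shown in this post. The following is a pure-Python sketch of what I assume it computes: the IoU of one crop box against every ground-truth box, with inclusive pixel coordinates (hence the "+ 1"). The name `iou_one_to_many` is my own.

```python
def iou_one_to_many(box, boxes):
    """IoU between one box and a list of boxes, all as (x1, y1, x2, y2).

    Coordinates are inclusive pixel indices, so widths and heights use
    the same "+ 1" convention as w and h in the generation code.
    """
    bx1, by1, bx2, by2 = box
    box_area = (bx2 - bx1 + 1) * (by2 - by1 + 1)
    result = []
    for x1, y1, x2, y2 in boxes:
        area = (x2 - x1 + 1) * (y2 - y1 + 1)
        # width and height of the overlap, clamped at 0 for disjoint boxes
        iw = max(0, min(bx2, x2) - max(bx1, x1) + 1)
        ih = max(0, min(by2, y2) - max(by1, y1) + 1)
        inter = iw * ih
        result.append(inter / float(box_area + area - inter))
    return result


print(iou_one_to_many((0, 0, 9, 9), [(0, 0, 9, 9), (5, 0, 14, 9), (10, 10, 19, 19)]))
```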



import os

import cv2
import numpy as np
import numpy.random as npr

neg_num = 0
# keep cropping random patches until we have 50 negative examples per image
while neg_num < 50:
        # size is a random integer in [12, min(width, height) / 2)
        size = npr.randint(12, min(width, height) // 2)
        #top_left coordinate
        nx = npr.randint(0, width - size)
        ny = npr.randint(0, height - size)
        #random crop
        crop_box = np.array([nx, ny, nx + size, ny + size])
        #calculate iou
        Iou = IoU(crop_box, boxes)

        #crop a part from the initial image
        cropped_im = img[ny : ny + size, nx : nx + size, :]
        #resize the cropped image to size 12*12
        resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)

        if np.max(Iou) < 0.3:
            # Iou with all gts must below 0.3
            save_file = os.path.join(neg_save_dir, "%s.jpg"%n_idx)
            f2.write("DATA/12/negative/%s.jpg"%n_idx + ' 0\n')
            cv2.imwrite(save_file, resized_im)
            n_idx += 1
            neg_num += 1

#for every bounding boxes
for box in boxes:
    # box (x_left, y_top, x_right, y_bottom)
    x1, y1, x2, y2 = box
    #gt's width
    w = x2 - x1 + 1
    #gt's height
    h = y2 - y1 + 1

    # ignore small faces and faces whose top-left corner is outside the image,
    # in case the ground truth boxes of small faces are not accurate
    if max(w, h) < 20 or x1 < 0 or y1 < 0:
        continue

    # crop another 5 images near the bounding box; keep those with IoU < 0.3 as negative samples
    for i in range(5):
        #size of the image to be cropped
        size = npr.randint(12, min(width, height) // 2)
        # delta_x and delta_y are offsets of (x1, y1)
        # max can make sure if the delta is a negative number , x1+delta_x >0
        # parameter high of randint make sure there will be intersection between bbox and cropped_box
        delta_x = npr.randint(max(-size, -x1), w)
        delta_y = npr.randint(max(-size, -y1), h)
        # max here not really necessary
        nx1 = int(max(0, x1 + delta_x))
        ny1 = int(max(0, y1 + delta_y))
        # if the right bottom point is out of image then skip
        if nx1 + size > width or ny1 + size > height:
            continue
        crop_box = np.array([nx1, ny1, nx1 + size, ny1 + size])
        Iou = IoU(crop_box, boxes)

        cropped_im = img[ny1: ny1 + size, nx1: nx1 + size, :]
        #resize cropped image to be 12 * 12
        resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)

        if np.max(Iou) < 0.3:
            # Iou with all gts must below 0.3
            save_file = os.path.join(neg_save_dir, "%s.jpg" % n_idx)
            f2.write("DATA/12/negative/%s.jpg" % n_idx + ' 0\n')
            cv2.imwrite(save_file, resized_im)
            n_idx += 1

    #generate positive examples and part faces
    for i in range(20):
        # pos and part face size [minsize*0.8,maxsize*1.25]
        size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h)))

        # delta here is the offset of box center
        if w<5:
            print (w)
        delta_x = npr.randint(-w * 0.2, w * 0.2)
        delta_y = npr.randint(-h * 0.2, h * 0.2)

        # nx1 = max(x1 + w/2 + delta_x - size/2, 0)
        # x1 + w/2 is the box center; add the offset, then subtract size/2
        # to get the top-left corner of the crop, clamped at 0
        nx1 = int(max(x1 + w / 2 + delta_x - size / 2, 0))
        # likewise: ny1 = max(y1 + h/2 + delta_y - size/2, 0)
        ny1 = int(max(y1 + h / 2 + delta_y - size / 2, 0))
        nx2 = nx1 + size
        ny2 = ny1 + size

        if nx2 > width or ny2 > height:
            continue
        crop_box = np.array([nx1, ny1, nx2, ny2])
        # offsets relative to the ground-truth box
        offset_x1 = (x1 - nx1) / float(size)
        offset_y1 = (y1 - ny1) / float(size)
        offset_x2 = (x2 - nx2) / float(size)
        offset_y2 = (y2 - ny2) / float(size)
        cropped_im = img[ny1 : ny2, nx1 : nx2, :]
        resized_im = cv2.resize(cropped_im, (12, 12), interpolation=cv2.INTER_LINEAR)

        box_ = box.reshape(1, -1)
        iou = IoU(crop_box, box_)
        if iou  >= 0.65:
            save_file = os.path.join(pos_save_dir, "%s.jpg"%p_idx)
            f1.write("DATA/12/positive/%s.jpg"%p_idx + ' 1 %.2f %.2f %.2f %.2f\n'%(offset_x1, offset_y1, offset_x2, offset_y2))
            cv2.imwrite(save_file, resized_im)
            p_idx += 1
        elif iou >= 0.4:
            save_file = os.path.join(part_save_dir, "%s.jpg"%d_idx)
            f3.write("DATA/12/part/%s.jpg"%d_idx + ' -1 %.2f %.2f %.2f %.2f\n'%(offset_x1, offset_y1, offset_x2, offset_y2))
            cv2.imwrite(save_file, resized_im)
            d_idx += 1
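
The four offsets written to the label file are the bbox regression targets: the distance from each crop corner to the matching ground-truth corner, divided by the crop size. A small sketch of that computation and its inverse (how a predicted offset would be turned back into a box) follows. Both function names are my own, and `apply_offsets` is an assumption about how the targets are meant to be used, not code from the repo.

```python
def bbox_offsets(gt_box, crop_box):
    """Regression targets for a square crop, as in the generation code."""
    x1, y1, x2, y2 = gt_box
    nx1, ny1, nx2, ny2 = crop_box
    size = float(nx2 - nx1)
    return ((x1 - nx1) / size, (y1 - ny1) / size,
            (x2 - nx2) / size, (y2 - ny2) / size)


def apply_offsets(crop_box, offsets):
    """Inverse of bbox_offsets: recover the box from a crop and its offsets."""
    nx1, ny1, nx2, ny2 = crop_box
    size = float(nx2 - nx1)
    dx1, dy1, dx2, dy2 = offsets
    return (nx1 + dx1 * size, ny1 + dy1 * size,
            nx2 + dx2 * size, ny2 + dy2 * size)


gt = (10, 10, 50, 50)
crop = (8, 12, 48, 52)
offsets = bbox_offsets(gt, crop)
print(offsets)
print(apply_offsets(crop, offsets))
```

Dividing by the crop size keeps the targets independent of how large the face is in the original image, which matters because every crop is resized to 12 x 12 before training.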

Generating data (for landmark)




gt_box = np.array([bbox.left,bbox.top,bbox.right,bbox.bottom])
#initialize the landmark
landmark = np.zeros((5, 2))
for index, one in enumerate(landmarkGt):
    # ((x - bbox.left) / width of bounding box, (y - bbox.top) / height of bounding box): landmark coordinates normalized to the box
    rv = ((one[0]-gt_box[0])/(gt_box[2]-gt_box[0]), (one[1]-gt_box[1])/(gt_box[3]-gt_box[1]))
    # put the normalized value into the new list landmark
    landmark[index] = rv

2. Augment the data (rotation, flipping, etc.; see prepare_data/gen_landmark_aug_12.py for details).
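
As an illustration of the flip augmentation, here is a sketch of mirroring box-normalized landmarks horizontally. It is not the code from gen_landmark_aug_12.py. I assume the five points are ordered left eye, right eye, nose, left mouth corner, right mouth corner; after mirroring, left and right have to be swapped so that each slot still means the same facial point. The name `flip_landmarks` is hypothetical.

```python
def flip_landmarks(landmarks):
    """Mirror five normalized (x, y) landmarks horizontally.

    Assumed order: left eye, right eye, nose, left mouth, right mouth.
    """
    flipped = [(1.0 - x, y) for x, y in landmarks]
    # after mirroring, the former left eye sits on the right, so swap the pairs
    flipped[0], flipped[1] = flipped[1], flipped[0]
    flipped[3], flipped[4] = flipped[4], flipped[3]
    return flipped


points = [(0.25, 0.25), (0.75, 0.375), (0.5, 0.5), (0.25, 0.75), (0.875, 0.75)]
print(flip_landmarks(points))
```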


When training PNet, I merge four parts of data (pos, part, landmark, neg) into one tfrecord, since their total number ratio is almost 1:1:1:3.



with tf.python_io.TFRecordWriter(tf_filename) as tfrecord_writer:
        for i, image_example in enumerate(dataset):
            if (i+1) % 100 == 0:
                sys.stdout.write('\r>> %d/%d images has been converted' % (i+1, len(dataset)))
                #sys.stdout.write('\r>> Converting image %d/%d' % (i + 1, len(dataset)))
            filename = image_example['filename']
            _add_to_tfrecord(filename, image_example, tfrecord_writer)

def _add_to_tfrecord(filename, image_example, tfrecord_writer):
    """Loads data from image and annotations files and add them to a TFRecord.

    Args:
      filename: Path of the image file;
      image_example: dict with the label, bbox and landmark of the image;
      tfrecord_writer: The TFRecord writer to use for writing.
    """
    # _process_image_withoutcoder and _convert_to_example_simple are defined in tfrecord_utils.py
    image_data, height, width = _process_image_withoutcoder(filename)
    example = _convert_to_example_simple(image_example, image_data)
    tfrecord_writer.write(example.SerializeToString())


def _process_image_withoutcoder(filename):
    image = cv2.imread(filename)
    # transform data into string format
    image_data = image.tostring()
    assert len(image.shape) == 3
    height = image.shape[0]
    width = image.shape[1]
    assert image.shape[2] == 3
    # return string data and initial height and width of the image
    return image_data, height, width

def _convert_to_example_simple(image_example, image_buffer):
    """
    Convert one image example to a tf.train.Example proto.
    :param image_example: dict, an image example
    :param image_buffer: string, raw pixel bytes of the image
    :return: Example proto
    """
    # filename = str(image_example['filename'])

    # class label for the whole image
    class_label = image_example['label']
    bbox = image_example['bbox']
    roi = [bbox['xmin'],bbox['ymin'],bbox['xmax'],bbox['ymax']]
    # NOTE: this list was cut off after the nose in the source; the two
    # mouth-corner key names below are assumed, to reach the 10 values
    landmark = [bbox['xlefteye'],bbox['ylefteye'],bbox['xrighteye'],bbox['yrighteye'],bbox['xnose'],bbox['ynose'],
                bbox['xleftmouth'],bbox['yleftmouth'],bbox['xrightmouth'],bbox['yrightmouth']]
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': _bytes_feature(image_buffer),
        'image/label': _int64_feature(class_label),
        'image/roi': _float_feature(roi),
        'image/landmark': _float_feature(landmark)
    }))
    return example

def _int64_feature(value):
    """Wrapper for insert int64 feature into Example proto."""
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def _float_feature(value):
    """Wrapper for insert float features into Example proto."""
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

def _bytes_feature(value):
    """Wrapper for insert bytes features into Example proto."""
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))

prepare_data/read_tfrecord_v2.py: the tfrecord file has to be parsed at training time.

def read_single_tfrecord(tfrecord_file, batch_size, net):
    # generate a input queue
    # each epoch shuffle
    filename_queue = tf.train.string_input_producer([tfrecord_file],shuffle=True)
    # read tfrecord
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    image_features = tf.parse_single_example(
        serialized_example,
        features={
            'image/encoded': tf.FixedLenFeature([], tf.string),#one image  one record
            'image/label': tf.FixedLenFeature([], tf.int64),
            'image/roi': tf.FixedLenFeature([4], tf.float32),
            'image/landmark': tf.FixedLenFeature([10],tf.float32)
        }
    )
    if net == 'PNet':
        image_size = 12
    elif net == 'RNet':
        image_size = 24
    else:
        image_size = 48
    image = tf.decode_raw(image_features['image/encoded'], tf.uint8)
    image = tf.reshape(image, [image_size, image_size, 3])
    image = (tf.cast(image, tf.float32)-127.5) / 128
    # image = tf.image.per_image_standardization(image)
    label = tf.cast(image_features['image/label'], tf.float32)
    roi = tf.cast(image_features['image/roi'],tf.float32)
    landmark = tf.cast(image_features['image/landmark'],tf.float32)
    image, label,roi,landmark = tf.train.batch(
        [image, label,roi,landmark],
        batch_size=batch_size,
        capacity=1 * batch_size
    )
    label = tf.reshape(label, [batch_size])
    roi = tf.reshape(roi,[batch_size,4])
    landmark = tf.reshape(landmark,[batch_size,10])
    return image, label, roi,landmark
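
The decode step above relies on the record holding raw pixel bytes, not an encoded JPEG, so the reader must know the image size to reshape them. Here is a TensorFlow-free sketch of the same decode, reshape and normalize steps on plain Python bytes. The function name `decode_and_normalize` is my own, and it is only meant to show the arithmetic.

```python
def decode_and_normalize(raw, image_size):
    """Turn raw uint8 bytes into rows of pixels with values near [-1, 1]."""
    expected = image_size * image_size * 3
    if len(raw) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(raw)))
    # same normalization as in read_single_tfrecord: (x - 127.5) / 128
    values = [(b - 127.5) / 128 for b in bytearray(raw)]
    image = []
    for row in range(image_size):
        pixels = []
        for col in range(image_size):
            start = (row * image_size + col) * 3
            pixels.append(values[start:start + 3])
        image.append(pixels)
    return image


raw = bytes(bytearray([0, 255, 128] * 4))   # a 2 x 2 image, 3 channels
print(decode_and_normalize(raw, 2)[0][0])
```

A record written for the wrong network size fails the length check here; in the TensorFlow version the equivalent failure shows up in `tf.reshape`.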

[root@node5 MTCNN-Tensorflow]# python train_models/train_PNet.py 
['/ssd/yuansaijie/MTCNN-Tensorflow/train_models', '/ssd/yuansaijie/MTCNN-Tensorflow', '/usr/lib64/python27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages/pika-0.9.14-py2.7.egg', '/usr/lib/python2.7/site-packages/elasticsearch-1.4.0-py2.7.egg', '../prepare_data']
('Total size of the dataset is: ', 1260000)
('dataset dir is:', 'DATA/imglists/PNet/train_PNet_landmark.tfrecord_shuffle')
('Total size of the dataset is: ', 1260000)
('dataset dir is:', 'DATA/imglists/PNet/train_PNet_landmark.tfrecord_shuffle')
(384, 12, 12, 3)
('load summary for : ', u'conv1/add')
(384, 10, 10, 10)
('load summary for : ', u'pool1/MaxPool')
(384, 5, 5, 10)
('load summary for : ', u'conv2/add')
(384, 3, 3, 16)
('load summary for : ', u'conv3/add')
(384, 1, 1, 32)
('load summary for : ', u'conv4_1/Reshape_1')
(384, 1, 1, 2)
('load summary for : ', u'conv4_2/BiasAdd')
(384, 1, 1, 4)
('load summary for : ', u'conv4_3/BiasAdd')
(384, 1, 1, 10)
WARNING:tensorflow:From /ssd/yuansaijie/MTCNN-Tensorflow/train_models/mtcnn_model.py:235: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_regularization_losses instead.
2018-10-19 11:44:15.160774: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled t.......................................
2018-10-19 10:23:49.778847 : Step: 97900/98460, accuracy: 0.934169, cls loss: 0.223913, bbox loss: 0.065459,Landmark loss :0.018630,L2 loss: 0.016533, Total Loss: 0.282490 ,lr:0.000001 
2018-10-19 10:23:52.010314 : Step: 98000/98460, accuracy: 0.916667, cls loss: 0.278652, bbox loss: 0.075655,Landmark loss :0.016387,L2 loss: 0.016533, Total Loss: 0.341207 ,lr:0.000001 
2018-10-19 10:23:54.169109 : Step: 98100/98460, accuracy: 0.961039, cls loss: 0.175593, bbox loss: 0.071169,Landmark loss :0.032753,L2 loss: 0.016533, Total Loss: 0.244087 ,lr:0.000001 
2018-10-19 10:23:56.376758 : Step: 98200/98460, accuracy: 0.890365, cls loss: 0.327316, bbox loss: 0.073061,Landmark loss :0.018354,L2 loss: 0.016533, Total Loss: 0.389556 ,lr:0.000001 
2018-10-19 10:23:58.548301 : Step: 98300/98460, accuracy: 0.918919, cls loss: 0.286136, bbox loss: 0.072269,Landmark loss :0.030357,L2 loss: 0.016533, Total Loss: 0.353982 ,lr:0.000001 
2018-10-19 10:24:00.754086 : Step: 98400/98460, accuracy: 0.920000, cls loss: 0.247473, bbox loss: 0.062291,Landmark loss :0.030228,L2 loss: 0.016533, Total Loss: 0.310266 ,lr:0.000001 
('path prefix is :', 'mymodel/MTCNN_model/PNet_landmark/PNet-30')
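
The step count in the log can be reproduced from the dataset size. A short sketch, using the MAX_STEP formula from the train function shown later in this post; the batch size of 384 is read off the tensor shapes in the log, and end_epoch=30 is my inference from the checkpoint name PNet-30, so treat both as assumptions.

```python
def max_step(num, batch_size, end_epoch):
    """Total training steps: (steps per epoch, rounded up) * epochs.

    Mirrors int(num / BATCH_SIZE + 1) * end_epoch from the train function.
    """
    return (num // batch_size + 1) * end_epoch


print(max_step(1260000, 384, 30))   # PNet
print(max_step(1895256, 384, 22))   # RNet, epochs inferred from RNet-22
print(max_step(1395806, 384, 22))   # ONet, epochs inferred from ONet-22
```

The three results match the totals printed in the PNet, RNet and ONet logs (98460, 108592 and 79970).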

[root@node5 MTCNN-Tensorflow]# tensorboard --logdir=logs/
TensorBoard 0.4.0rc3 at http://node5:6006 (Press CTRL+C to quit)

def train(net_factory, prefix, end_epoch, base_dir,
          display=200, base_lr=0.01):
    """
    train PNet/RNet/ONet
    :param net_factory: one of the three network structures defined in mtcnn_model.py
    :param prefix: model path, where the model is saved
    :param end_epoch:
    :param base_dir: location of the training data
    :param display:
    :param base_lr:
    """
    net = prefix.split('/')[-1]
    #label file
    label_file = os.path.join(base_dir,'train_%s_landmark.txt' % net)
    #label_file = os.path.join(base_dir,'landmark_12_few.txt')
    f = open(label_file, 'r')
    # get number of training examples
    num = len(f.readlines())
    print("Total size of the dataset is: ", num)

    #PNet use this method to get data (read the training data)
    if net == 'PNet':
        #dataset_dir = os.path.join(base_dir,'train_%s_ALL.tfrecord_shuffle' % net)
        dataset_dir = os.path.join(base_dir,'train_%s_landmark.tfrecord_shuffle' % net)
        print('dataset dir is:',dataset_dir)
        image_batch, label_batch, bbox_batch,landmark_batch = read_single_tfrecord(dataset_dir, config.BATCH_SIZE, net)
    #RNet and ONet read from 4 tfrecords (pos, part, neg, landmark)
    else:
        pos_dir = os.path.join(base_dir,'pos_landmark.tfrecord_shuffle')
        part_dir = os.path.join(base_dir,'part_landmark.tfrecord_shuffle')
        neg_dir = os.path.join(base_dir,'neg_landmark.tfrecord_shuffle')
        #landmark_dir = os.path.join(base_dir,'landmark_landmark.tfrecord_shuffle')
        landmark_dir = os.path.join('DATA/imglists/RNet','landmark_landmark.tfrecord_shuffle')
        dataset_dirs = [pos_dir,part_dir,neg_dir,landmark_dir]
        pos_radio = 1.0/6;part_radio = 1.0/6;landmark_radio=1.0/6;neg_radio=3.0/6
        pos_batch_size = int(np.ceil(config.BATCH_SIZE*pos_radio))
        assert pos_batch_size != 0,"Batch Size Error "
        part_batch_size = int(np.ceil(config.BATCH_SIZE*part_radio))
        assert part_batch_size != 0,"Batch Size Error "
        neg_batch_size = int(np.ceil(config.BATCH_SIZE*neg_radio))
        assert neg_batch_size != 0,"Batch Size Error "
        landmark_batch_size = int(np.ceil(config.BATCH_SIZE*landmark_radio))
        assert landmark_batch_size != 0,"Batch Size Error "
        batch_sizes = [pos_batch_size,part_batch_size,neg_batch_size,landmark_batch_size]
        #print('batch_size is:', batch_sizes)
        image_batch, label_batch, bbox_batch,landmark_batch = read_multi_tfrecords(dataset_dirs,batch_sizes, net)        
    # define the loss weights, since the total loss combines the three task losses
    if net == 'PNet':
        image_size = 12
        radio_cls_loss = 1.0;radio_bbox_loss = 0.5;radio_landmark_loss = 0.5;
    elif net == 'RNet':
        image_size = 24
        radio_cls_loss = 1.0;radio_bbox_loss = 0.5;radio_landmark_loss = 0.5;
    else:
        radio_cls_loss = 1.0;radio_bbox_loss = 0.5;radio_landmark_loss = 1;
        image_size = 48
    #define placeholders for the input data and the labels
    input_image = tf.placeholder(tf.float32, shape=[config.BATCH_SIZE, image_size, image_size, 3], name='input_image')
    label = tf.placeholder(tf.float32, shape=[config.BATCH_SIZE], name='label')
    bbox_target = tf.placeholder(tf.float32, shape=[config.BATCH_SIZE, 4], name='bbox_target')
    landmark_target = tf.placeholder(tf.float32,shape=[config.BATCH_SIZE,10],name='landmark_target')
    #get loss and accuracy
    input_image = image_color_distort(input_image)
    cls_loss_op,bbox_loss_op,landmark_loss_op,L2_loss_op,accuracy_op = net_factory(input_image, label, bbox_target,landmark_target,training=True)   #net_factory is PNet here; returns the loss of each part
    #train,update learning rate(3 loss)
    total_loss_op  = radio_cls_loss*cls_loss_op + radio_bbox_loss*bbox_loss_op + radio_landmark_loss*landmark_loss_op + L2_loss_op
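    # NOTE: this call was truncated in the source; the remaining arguments (total_loss_op, num) are assumed
    train_op, lr_op = train_model(base_lr, total_loss_op, num)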
    # init
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)

    #save model
    saver = tf.train.Saver(max_to_keep=0)

    #visualize some variables
    tf.summary.scalar("total_loss",total_loss_op)#cls_loss, bbox loss, landmark loss and L2 loss add together
    summary_op = tf.summary.merge_all()
    logs_dir = "logs/%s" %(net)
    if not os.path.exists(logs_dir):
        os.makedirs(logs_dir)
    writer = tf.summary.FileWriter(logs_dir,sess.graph)
    projector_config = projector.ProjectorConfig()
    coord = tf.train.Coordinator()
    #begin enqueue thread
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    i = 0
    #total steps
    MAX_STEP = int(num / config.BATCH_SIZE + 1) * end_epoch
    epoch = 0
    try:
        for step in range(MAX_STEP):
            i = i + 1
            if coord.should_stop():
                break
            image_batch_array, label_batch_array, bbox_batch_array,landmark_batch_array = sess.run([image_batch, label_batch, bbox_batch,landmark_batch])
            #random flip
            image_batch_array,landmark_batch_array = random_flip_images(image_batch_array,label_batch_array,landmark_batch_array)

            _,_,summary = sess.run([train_op, lr_op ,summary_op], feed_dict={input_image: image_batch_array, label: label_batch_array, bbox_target: bbox_batch_array,landmark_target:landmark_batch_array})

            if (step+1) % display == 0:
                #acc = accuracy(cls_pred, labels_batch)
                cls_loss, bbox_loss,landmark_loss,L2_loss,lr,acc = sess.run([cls_loss_op, bbox_loss_op,landmark_loss_op,L2_loss_op,lr_op,accuracy_op],
                                                             feed_dict={input_image: image_batch_array, label: label_batch_array, bbox_target: bbox_batch_array, landmark_target: landmark_batch_array})

                total_loss = radio_cls_loss*cls_loss + radio_bbox_loss*bbox_loss + radio_landmark_loss*landmark_loss + L2_loss
                # landmark loss: %4f,
                print("%s : Step: %d/%d, accuracy: %3f, cls loss: %4f, bbox loss: %4f,Landmark loss :%4f,L2 loss: %4f, Total Loss: %4f ,lr:%f " % (
                datetime.now(), step+1,MAX_STEP, acc, cls_loss, bbox_loss,landmark_loss, L2_loss,total_loss, lr))

            #save every two epochs
            if i * config.BATCH_SIZE > num*2:
                epoch = epoch + 1
                i = 0
                path_prefix = saver.save(sess, prefix, global_step=epoch*2)
                print('path prefix is :', path_prefix)
    except tf.errors.OutOfRangeError:
        print("Finished")
    finally:
        coord.request_stop()
        writer.close()
    coord.join(threads)
    sess.close()
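
For RNet and ONet the batch is assembled from four tfrecords in a fixed 1:1:3:1 proportion (pos, part, neg, landmark). A minimal sketch of the batch-size split computed in the train function, using `math.ceil` in place of `np.ceil`; the function name `split_batch` is my own.

```python
import math


def split_batch(batch_size, ratios=(1.0 / 6, 1.0 / 6, 3.0 / 6, 1.0 / 6)):
    """Per-source batch sizes in the order pos, part, neg, landmark."""
    return [int(math.ceil(batch_size * r)) for r in ratios]


print(split_batch(384))
print(split_batch(100))
```

With a batch size of 384 the parts are 64, 64, 192 and 64, which are the shapes printed at the start of the RNet and ONet training logs. Because each part is rounded up, the parts can add up to slightly more than the requested batch size when it is not divisible by 6.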

  1. After training PNet, run gen_hard_example to generate training data(Face Detection Part) for RNet.
  2. Run gen_landmark_aug_24.py to generate training data(Face Landmark Detection Part) for RNet.
  3. Run gen_imglist_rnet.py to merge two parts of training data.
  4. Run gen_RNet_tfrecords.py to generate tfrecords for RNet.(you should run this script four times to generate tfrecords of neg,pos,part and landmark respectively)

Generating data (for face detection)


[root@node5 MTCNN-Tensorflow]# python prepare_data/gen_hard_example.py 
Called with argument:
Namespace(batch_size=[2048, 256, 16], epoch=[18, 14, 16], min_face=20, prefix=['data/MTCNN_model/PNet_landmark/PNet', 'data/MTCNN_model/RNet_No_Landmark/RNet', 'data/MTCNN_model/ONet_No_Landmark/ONet'], shuffle=False, slide_window=False, stride=2, test_mode='PNet', thresh=[0.3, 0.1, 0.7], vis=False)
('Test model: ', 'PNet')
(1, ?, ?, 3)
('load summary for : ', u'conv1/add')
(1, ?, ?, 10)
('load summary for : ', u'pool1/MaxPool')
(1, ?, ?, 10)
('load summary for : ', u'conv2/add')
(1, ?, ?, 16)
('load summary for : ', u'conv3/add')
(1, ?, ?, 32)
('load summary for : ', u'conv4_1/Reshape_1')
(1, ?, ?, 2)
('load summary for : ', u'conv4_2/BiasAdd')
(1, ?, ?, 4)
('load summary for : ', u'conv4_3/BiasAdd')
(1, ?, ?, 10)
2018-10-19 14:55:32.129731: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
restore models' param
load test data
finish loading
start detecting....
100 out of 12880 images done
0.735359 seconds for each image
200 out of 12880 images done
0.703251 seconds for each image
300 out of 12880 images done
12700 out of 12880 images done
0.733344 seconds for each image
12800 out of 12880 images done
0.669486 seconds for each image
('num of images', 12880)
time cost in average0.637  pnet 0.637  rnet 0.000  onet 0.000
('boxes length:', 12880)
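finish detecting ---------------------------------------- everything above is PNet inference; the predictions are saved as detections.pkl
save_path is :
processing 12880 images in total  ----------------------- compare predictions with the ground truth to generate the three kinds of RNet training samples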
0 images done
100 images done
200 images done

# im_idx_list and gt_boxes_list are the images and bounding boxes of the original training set; det_boxes holds the detections of the previous network
for im_idx, dets, gts in zip(im_idx_list, det_boxes, gt_boxes_list):
    gts = np.array(gts, dtype=np.float32).reshape(-1, 4)

    if dets.shape[0] == 0:
        continue
    img = cv2.imread(im_idx)
    #change to square
    dets = convert_to_square(dets)
    dets[:, 0:4] = np.round(dets[:, 0:4])
    neg_num = 0
    for box in dets:
        x_left, y_top, x_right, y_bottom, _ = box.astype(int)
        width = x_right - x_left + 1
        height = y_bottom - y_top + 1

        # ignore box that is too small or beyond image border
        if width < 20 or x_left < 0 or y_top < 0 or x_right > img.shape[1] - 1 or y_bottom > img.shape[0] - 1:
            continue

        # compute intersection over union(IoU) between current box and all gt boxes
        Iou = IoU(box, gts)
        cropped_im = img[y_top:y_bottom + 1, x_left:x_right + 1, :]
        resized_im = cv2.resize(cropped_im, (image_size, image_size), interpolation=cv2.INTER_LINEAR)

        # save negative images and write label
        # Iou with all gts must below 0.3
        if np.max(Iou) < 0.3 and neg_num < 60:
            #save the examples
            save_file = get_path(neg_dir, "%s.jpg" % n_idx)
            # print(save_file)
            neg_file.write(save_file + ' 0\n')
            cv2.imwrite(save_file, resized_im)
            n_idx += 1
            neg_num += 1
        else:
            # find gt_box with the highest iou
            idx = np.argmax(Iou)
            assigned_gt = gts[idx]
            x1, y1, x2, y2 = assigned_gt

            # compute bbox reg label
            offset_x1 = (x1 - x_left) / float(width)
            offset_y1 = (y1 - y_top) / float(height)
            offset_x2 = (x2 - x_right) / float(width)
            offset_y2 = (y2 - y_bottom) / float(height)

            # save positive and part-face images and write labels
            if np.max(Iou) >= 0.65:
                save_file = get_path(pos_dir, "%s.jpg" % p_idx)
                pos_file.write(save_file + ' 1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2))
                cv2.imwrite(save_file, resized_im)
                p_idx += 1

            elif np.max(Iou) >= 0.4:
                save_file = os.path.join(part_dir, "%s.jpg" % d_idx)
                part_file.write(save_file + ' -1 %.2f %.2f %.2f %.2f\n' % (offset_x1, offset_y1, offset_x2, offset_y2))
                cv2.imwrite(save_file, resized_im)
                d_idx += 1
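
The `convert_to_square` helper is not shown in this post. The sketch below is what I assume it does for a single box: keep the center, and grow the shorter side to the length of the longer one, so the crop can be resized to 24 x 24 or 48 x 48 without distortion. The name `to_square` is hypothetical, and the real helper works on a whole array of detections.

```python
def to_square(box):
    """Expand an (x1, y1, x2, y2) box to a square around the same center."""
    x1, y1, x2, y2 = box
    w = x2 - x1 + 1
    h = y2 - y1 + 1
    side = max(w, h)
    sx1 = x1 + w * 0.5 - side * 0.5
    sy1 = y1 + h * 0.5 - side * 0.5
    return (sx1, sy1, sx1 + side - 1, sy1 + side - 1)


print(to_square((0, 0, 9, 19)))
```

A squared box can stick out of the image, as the negative x1 in this example shows, which is why the loop above skips boxes that go beyond the image border.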

Generating data (for landmark)


[root@node5 MTCNN-Tensorflow]# python train_models/train_RNet.py
('Total size of the dataset is: ', 1895256)
on27.zip', '/usr/lib64/python2.7', '/usr/lib64/python2.7/plat-linux2', '/usr/lib64/python2.7/lib-tk', '/usr/lib64/python2.7/lib-old', '/usr/lib64/python2.7/lib-dynload', '/usr/lib64/python2.7/site-packages', '/usr/lib/python2.7/site-packages', '/usr/lib/python2.7/site-packages/pika-0.9.14-py2.7.egg', '/usr/lib/python2.7/site-packages/elasticsearch-1.4.0-py2.7.egg', '../prepare_data']
('Total size of the dataset is: ', 1895256)
(64, 24, 24, 3)
(64, 24, 24, 3)
(192, 24, 24, 3)
(64, 24, 24, 3)
(384, 24, 24, 3)

(384, 4)
(384, 24, 24, 3)
(384, 22, 22, 28)
(384, 11, 11, 28)
(384, 9, 9, 48)
(384, 4, 4, 48)
(384, 3, 3, 64)
(384, 576)
(384, 128)
(384, 2)
(384, 4)
(384, 10)
WARNING:tensorflow:From /ssd/yuansaijie/MTCNN-Tensorflow/train_models/mtcnn_model.py:282: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_regularization_losses instead.
2018-10-22 11:00:52.810807: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-22 11:01:05.694332 : Step: 100/108592, accuracy: 0.750000, cls loss: 0.657524, bbox loss: 0.112904,Landmark loss :0.150184,L2 loss: 0.023872, Total Loss: 0.812940 ,lr:0.001000
2018-10-22 11:01:17.431871 : Step: 200/108592, accuracy: 0.750000, cls loss: 0.648712, bbox loss: 0.093683,Landmark loss :0.141217,L2 loss: 0.023827, Total Loss: 0.789989 ,lr:0.001000
2018-10-22 14:33:03.275786 : Step: 108500/108592, accuracy: 0.976562, cls loss: 0.130488, bbox loss: 0.086588,Landmark loss :0.023444,L2 loss: 0.024208, Total Loss: 0.209711 ,lr:0.000001
('path prefix is :', 'mymodel/MTCNN_model/RNet_landmark/RNet-22')
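
The "Total Loss" column in the log is the weighted sum defined in the train function. A quick check against the last RNet log line, assuming the RNet weights 1.0 / 0.5 / 0.5 for cls / bbox / landmark; the function name `total_loss` is my own.

```python
def total_loss(cls_loss, bbox_loss, landmark_loss, l2_loss,
               weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the three task losses plus the L2 term."""
    w_cls, w_bbox, w_landmark = weights
    return (w_cls * cls_loss + w_bbox * bbox_loss
            + w_landmark * landmark_loss + l2_loss)


# values from Step 108500/108592 of the RNet log
print(total_loss(0.130488, 0.086588, 0.023444, 0.024208))
```

The result agrees with the logged 0.209711 up to the rounding of the printed values. For ONet the landmark weight is 1 instead of 0.5.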

  1. After training RNet, run gen_hard_example to generate training data(Face Detection Part) for ONet.
  2. Run gen_landmark_aug_48.py to generate training data(Face Landmark Detection Part) for ONet.
  3. Run gen_imglist_onet.py to merge two parts of training data.
  4. Run gen_ONet_tfrecords.py to generate tfrecords for ONet.(you should run this script four times to generate tfrecords of neg,pos,part and landmark respectively)

Generating data (for face detection)


[root@node5 MTCNN-Tensorflow]# python prepare_data/gen_hard_example.py
Called with argument:
Namespace(batch_size=[2048, 256, 16], epoch=[18, 14, 16], min_face=20, prefix=['data/MTCNN_model/PNet_landmark/PNet', 'data/MTCNN_model/RNet_landmark/RNet', 'data/MTCNN_model/ONet_No_Landmark/ONet'], shuffle=False, slide_window=False, stride=2, test_mode='RNet', thresh=[0.3, 0.1, 0.7], vis=False)
('Test model: ', 'RNet')
(1, ?, ?, 3)
('load summary for : ', u'conv1/add')
(1, ?, ?, 10)
('load summary for : ', u'pool1/MaxPool')
(1, ?, ?, 10)
('load summary for : ', u'conv2/add')
(1, ?, ?, 16)
('load summary for : ', u'conv3/add')
(1, ?, ?, 32)
('load summary for : ', u'conv4_1/Reshape_1')
(1, ?, ?, 2)
('load summary for : ', u'conv4_2/BiasAdd')
(1, ?, ?, 4)
('load summary for : ', u'conv4_3/BiasAdd')
(1, ?, ?, 10)
2018-10-22 14:56:35.504447: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
restore models' param
('==================================', 'RNet')
(256, 24, 24, 3)
(256, 22, 22, 28)
(256, 11, 11, 28)
(256, 9, 9, 48)
(256, 4, 4, 48)
(256, 3, 3, 64)
(256, 576)
(256, 128)
(256, 2)
(256, 4)
(256, 10)
restore models' param
load test data
finish loading
start detecting....
100 out of 12880 images done
0.969146 seconds for each image
200 out of 12880 images done
0.954468 seconds for each image
300 out of 12880 images done
0.880505 seconds for each image
400 out of 12880 images done
12800 out of 12880 images done
0.826616 seconds for each image
('num of images', 12880)
time cost in average0.839  pnet 0.598  rnet 0.240  onet 0.000
('boxes length:', 12880)
finish detecting
save_path is :
processing 12880 images in total
0 images done
100 images done
200 images done
300 images done
400 images done

Generating data (for landmark)


[root@node5 MTCNN-Tensorflow]# python train_models/train_ONet.py
('Total size of the dataset is: ', 1395806)
('Total size of the dataset is: ', 1395806)
(64, 48, 48, 3)
(64, 48, 48, 3)
(192, 48, 48, 3)
(64, 48, 48, 3)
(384, 48, 48, 3)

(384, 4)
(384, 48, 48, 3)
(384, 46, 46, 32)
(384, 23, 23, 32)
(384, 21, 21, 64)
(384, 10, 10, 64)
(384, 8, 8, 64)
(384, 4, 4, 64)
(384, 3, 3, 128)
(384, 1152)
(384, 256)
(384, 2)
(384, 4)
(384, 10)
WARNING:tensorflow:From /ssd/yuansaijie/MTCNN-Tensorflow/train_models/mtcnn_model.py:328: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_regularization_losses instead.
2018-10-23 09:44:37.292322: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-23 09:44:44.016103 : Step: 10/79970, accuracy: 0.746094, cls loss: 0.683990, bbox loss: 0.171421,Landmark loss :0.382090,L2 loss: 0.049354, Total Loss: 1.201144 ,lr:0.001000
2018-10-23 09:44:50.052537 : Step: 20/79970, accuracy: 0.750000, cls loss: 0.663642, bbox loss: 0.098265,Landmark loss :0.368318,L2 loss: 0.049314, Total Loss: 1.130407 ,lr:0.001000
2018-10-24 06:15:42.631526 : Step: 79970/79970, accuracy: 0.972656, cls loss: 0.115991, bbox loss: 0.059060,Landmark loss :0.017580,L2 loss: 0.043284, Total Loss: 0.206384 ,lr:0.000001 
('path prefix is :', 'mymodel/MTCNN_model/ONet_landmark/ONet-22')

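# The training duration shown here is no longer accurate, because the run was restarted in the middle of the night; it took roughly 12 hours.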

