What is a CNN?

First of all, what is a CNN? Here we imitate the way children learn: when a child encounters something unfamiliar, the learning usually starts with questions, so we will introduce what a CNN is in the same question-driven way.

From the dialogue above, we know that CNN stands for "Convolutional Neural Network". A neural network is a mathematical or computational model that mimics the structure and function of biological neural networks (the central nervous system of animals, in particular the brain). A neural network consists of a large number of artificial neurons, and different networks are built by connecting them in different ways. The CNN is one kind of neural network; others include the GAN (Generative Adversarial Network), the RNN (Recurrent Neural Network), and so on. Neural networks can make simple decisions and judgments somewhat like humans do, and they give better results in areas such as image and speech recognition.

How a CNN works

CNNs are widely used for image recognition, so how does a CNN actually recognize an image? We explain how a CNN works using the example in the figure.

A CNN is an artificial neural network whose structure can be divided into three kinds of layers:

  • Convolutional Layer - its main job is to extract features.
  • Max Pooling Layer - its main job is downsampling, without damaging the recognition result.
  • Fully Connected Layer - its main job is classification.

We can use ourselves as an analogy. Looking at the little bird in the figure above, how do you recognize that it is a bird? First you notice that its beak is pointed and that it has feathers, wings and a tail, and then you combine these observations and conclude that it is a bird. A CNN works in a similar way: the convolutional layers look for features, the fully connected layers combine them to classify the image as a bird, and the pooling layers are there so that fewer parameters have to be trained, discarding some information while keeping the sampled features essentially unchanged.

Convolutional Layer

So how does a convolutional layer extract features? A convolution combines two functions by sliding one over the other and accumulating their products. Applied to an image, it can be thought of as placing a filter over the image to pick out a particular feature. A single feature is not enough to tell objects apart, so we need many filters; by combining the responses of these filters we obtain many features.

First of all, an image is stored in a computer as individual pixels. For example, an image 1080 pixels long and 1024 pixels wide contains 1080 * 1024 pixels in total. If it is an RGB image, it is composed of three color channels (red, green and blue), so a 1080 * 1024 * 3 array is needed to represent it.
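For example, in NumPy (a concrete representation that the article does not prescribe) such an RGB image is simply a three-dimensional array:

```python
import numpy as np

# A 1080 x 1024 RGB image: one 8-bit value per channel for every pixel.
image = np.zeros((1080, 1024, 3), dtype=np.uint8)

image[0, 0] = [255, 0, 0]   # set the top-left pixel to pure red
print(image.shape)          # (1080, 1024, 3) -> 1080 * 1024 * 3 values in total
```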

Let us start from a simple case and assume we have grayscale images, so each image can be represented by a single matrix. Suppose the image is 5 * 5, giving a 5 * 5 matrix. Next we filter the image with a set of filters, and this filtering is exactly the convolution operation. Suppose each filter is 3 * 3: we move the filter over the image starting from the top-left corner and record the result of each element-wise multiplication. The process can be illustrated as follows.

The filter starts at the top-left corner of the matrix and moves with a stride of 1, from left to right and from top to bottom, until it reaches the end of the matrix. At every position the filter is multiplied element-wise with the corresponding region of the matrix and the products are summed, yielding a new matrix. This is exactly the convolution operation. The choice of filter is crucial: the filter decides what is being filtered for, and different filters yield different features, as the two-filter example below shows.
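A minimal sketch of the sliding-window computation just described, in plain NumPy (the 5 * 5 image and the 3 * 3 filter are made-up values, chosen so that the filter responds to a vertical "|" stroke):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image with stride 1 and record the sum of
    the element-wise products at each position."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A made-up 5x5 grayscale image containing a vertical stroke, and a 3x3
# filter that responds to "|"-shaped features.
image = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])
vertical_filter = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
])

print(conv2d(image, vertical_filter))
# The value 3 appears wherever the "|" feature matches best.
```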

We chose two filters and convolved each of them with the matrix in the figure. The larger a value in the result, the better the feature matches at that position; the smaller the value, the more it deviates. Filter1 looks for "|"-shaped features: it finds one match, and the cell whose products sum to 3 after the convolution indicates a "|" in the original pattern. Filter2 looks for "\"-shaped features, of which two matches can be found in the figure. Taking a real image as an example, after the convolutional layer we obtain feature maps such as the following:

Max Pooling Layer

Can the features produced by the convolutional layers be used directly for classification? The answer is no. Suppose an image is 500 * 500; after a convolutional layer with 50 filters, the output is 500 * 500 * 50, which is very large. We need to reduce the amount of data without affecting the recognition result, i.e. downsample the output of the convolutional layer, and this is where the pooling layer comes in. The idea behind pooling is simple; let us look at an example first:

Start from the right-hand side: a 4 * 4 matrix is split into 2 * 2 blocks, and within each 2 * 2 block only the maximum value is kept. The maximum of the red block is 6, so the output is 6; the maximum of the green block is 8, so the output is 8; the yellow block gives 3 and the blue block gives 4. In this way the original 4 * 4 matrix becomes a 2 * 2 matrix. Looking at the left-hand side, the original 224 * 224 matrix shrinks to 112 * 112, halving each dimension.
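A minimal sketch of 2 * 2 max pooling in NumPy (the 4 * 4 values are made up, chosen so that the block maxima reproduce the 6, 8, 3 and 4 from the example):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Split the map into 2x2 blocks and keep only the maximum of each block."""
    h, w = feature_map.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return out

feature_map = np.array([
    [1, 3, 2, 1],
    [4, 6, 7, 8],
    [3, 1, 1, 0],
    [1, 2, 4, 2],
])
print(max_pool_2x2(feature_map))
# [[6. 8.]
#  [3. 4.]]
```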

Why is this acceptable? Does the discarded data affect the result? In practice, pooling does not hurt the result, because the value kept in each block is the locally most salient response, so the most salient features are not lost after pooling. We keep only what we consider the most salient features, throw away the rest, and thereby reduce the computation. Pooling also provides a degree of translation invariance: the same image, slightly shifted or deformed, still produces a similar result after the pooling layer.

Since pooling is just downsampling, are there other ways of downsampling that achieve the same effect? Of course there are: other downsampling methods, for example a convolution with a stride greater than 1, can produce comparable results, and such a method can then be used in place of the pooling layer.
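As a small PyTorch sketch (the article does not name a specific alternative; the layer sizes are assumptions), a stride-2 convolution halves the spatial size just like 2 * 2 max pooling:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 224, 224)   # a batch with one 16-channel feature map

pool = nn.MaxPool2d(kernel_size=2)                                # pooling-based downsampling
strided = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)   # learned, strided downsampling

print(pool(x).shape)     # torch.Size([1, 16, 112, 112])
print(strided(x).shape)  # torch.Size([1, 16, 112, 112])
```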

Convolutional and pooling layers are usually repeated several times, forming a network with many hidden layers, commonly known as a deep neural network.

Fully Connected Layer

The main job of the fully connected layer is classification. The features extracted by the preceding convolutional and pooling layers are classified in the fully connected layer. The fully connected layer is simply a fully connected neural network: each neuron's feedback is weighted differently, and the classification result is obtained by adjusting the weights and the network.

Because the fully connected layers account for roughly 80% of the parameters of the network, optimizing them is particularly important; nowadays some networks instead use averaging (global average pooling) for the final classification.
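Putting the pieces together, a minimal sketch of the conv / pool / fully-connected structure described above (using PyTorch, which the article does not prescribe; all layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

# A small CNN in the spirit of the article: convolutions extract features,
# max pooling downsamples, and a fully connected layer classifies.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer: 16 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling layer: 224 -> 112
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling layer: 112 -> 56
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 10),                  # fully connected layer: 10 classes
)

x = torch.randn(1, 3, 224, 224)   # one 224 x 224 RGB image
print(model(x).shape)             # torch.Size([1, 10])
```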

Basic concepts

  • Convolution kernel (kernel) - In image processing, given an input image, each pixel of the output image is a weighted average of the pixels in a small region of the input image; the weights are defined by a function, and this function is called the convolution kernel.
  • Convolution - A convolution combines two functions, so sliding a filter over an image tells us how the filter's pattern is distributed over the whole image; this can be used for edge detection, sharpening, blurring and so on.

Paper list

The following is taken from A Beginner's Guide to Convolutional Neural Networks (CNNs), reorganized here mainly to collect and study the related material.

ImageNet Classification

  • Microsoft (Deep Residual Learning) Paper Slide
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, arXiv:1512.03385.
  • Microsoft (PReLu/Weight initialization) Paper
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, arXiv:1502.01852.
  • Batch Normalization Paper
  • Sergey Ioffe, Christian Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, arXiv:1502.03167.
  • GoogLeNet Paper
  • Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going Deeper with Convolutions, CVPR, 2015.
  • VGG-Net Web Paper
  • Karen Simonyan and Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR, 2015.
  • AlexNet Paper (NIPS 2012 Proceedings)
  • Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012.

Object Detection

  • PVANET paper Code
  • Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park, PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection, arXiv:1608.08021
  • OverFeat, NYU Paper
  • OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, ICLR, 2014.
  • R-CNN, UC Berkeley Paper-CVPR14 Paper-arXiv14
  • Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.
  • SPP, Microsoft Research Paper
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV, 2014.
  • Fast R-CNN, Microsoft Research Paper
  • Ross Girshick, Fast R-CNN, arXiv:1504.08083.
  • Faster R-CNN, Microsoft Research Paper
  • Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, arXiv:1506.01497.
  • R-CNN minus R, Oxford Paper
  • Karel Lenc, Andrea Vedaldi, R-CNN minus R, arXiv:1506.06981.
  • End-to-end people detection in crowded scenes Paper
  • Russell Stewart, Mykhaylo Andriluka, End-to-end people detection in crowded scenes, arXiv:1506.04878.
  • You Only Look Once: Unified, Real-Time Object Detection Paper, Paper Version 2, C Code, Tensorflow Code
  • Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You Only Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640
  • Joseph Redmon, Ali Farhadi, YOLO9000: Better, Faster, Stronger (Version 2)
  • Inside-Outside Net Paper
  • Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick, Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
  • Deep Residual Network (Current State-of-the-Art) Paper
  • Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition
  • Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning Paper
  • R-FCN Paper Code
  • Jifeng Dai, Yi Li, Kaiming He, Jian Sun, R-FCN: Object Detection via Region-based Fully Convolutional Networks
  • SSD Paper Code
  • Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single Shot MultiBox Detector, arXiv:1512.02325
  • Speed/accuracy trade-offs for modern convolutional object detectors Paper
  • Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy, Google Research, arXiv:1611.10012

Video Classification

  • Nicolas Ballas, Li Yao, Pal Chris, Aaron Courville, “Delving Deeper into Convolutional Networks for Learning Video Representations”, ICLR 2016. Paper
  • Michael Mathieu, Camille Couprie, Yann LeCun, “Deep Multi Scale Video Prediction Beyond Mean Square Error”, ICLR 2016. Paper

Object Tracking

  • Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han, Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network, arXiv:1502.06796. Paper
  • Hanxi Li, Yi Li and Fatih Porikli, DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking, BMVC, 2014. Paper
  • N Wang, DY Yeung, Learning a Deep Compact Image Representation for Visual Tracking, NIPS, 2013. Paper
  • Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang, Hierarchical Convolutional Features for Visual Tracking, ICCV 2015 Paper Code
  • Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu, Visual Tracking with Fully Convolutional Networks, ICCV 2015 Paper Code
  • Hyeonseob Nam and Bohyung Han, Learning Multi-Domain Convolutional Neural Networks for Visual Tracking, Paper Code Project Page

Low-Level Vision

Super-Resolution

  • Iterative Image Reconstruction
  • Sven Behnke: Learning Iterative Image Reconstruction. IJCAI, 2001. Paper
  • Sven Behnke: Learning Iterative Image Reconstruction in the Neural Abstraction Pyramid. International Journal of Computational Intelligence and Applications, vol. 1, no. 4, pp. 427-438, 2001. Paper
  • Super-Resolution (SRCNN) Web Paper-ECCV14 Paper-arXiv15
  • Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Learning a Deep Convolutional Network for Image Super-Resolution, ECCV, 2014.
  • Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Image Super-Resolution Using Deep Convolutional Networks, arXiv:1501.00092.
  • Very Deep Super-Resolution
  • Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Accurate Image Super-Resolution Using Very Deep Convolutional Networks, arXiv:1511.04587, 2015. Paper
  • Deeply-Recursive Convolutional Network
  • Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Deeply-Recursive Convolutional Network for Image Super-Resolution, arXiv:1511.04491, 2015. Paper
  • Cascade-Sparse-Coding-Network
  • Zhaowen Wang, Ding Liu, Wei Han, Jianchao Yang and Thomas S. Huang, Deep Networks for Image Super-Resolution with Sparse Prior. ICCV, 2015. Paper Code
  • Perceptual Losses for Super-Resolution
  • Justin Johnson, Alexandre Alahi, Li Fei-Fei, Perceptual Losses for Real-Time Style Transfer and Super-Resolution, arXiv:1603.08155, 2016. Paper Supplementary
  • SRGAN
  • Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, arXiv:1609.04802v3, 2016. Paper
  • Others
  • Osendorfer, Christian, Hubert Soyer, and Patrick van der Smagt, Image Super-Resolution with Fast Approximate Convolutional Sparse Coding, ICONIP, 2014. Paper ICONIP-2014

Other Applications

  • Optical Flow (FlowNet) Paper
  • Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox, FlowNet: Learning Optical Flow with Convolutional Networks, arXiv:1504.06852.
  • Compression Artifacts Reduction Paper-arXiv15
  • Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression Artifacts Reduction by a Deep Convolutional Network, arXiv:1504.06993.
  • Blur Removal
  • Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf, Learning to Deblur, arXiv:1406.7444 Paper
  • Jian Sun, Wenfei Cao, Zongben Xu, Jean Ponce, Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal, CVPR, 2015 Paper
  • Image Deconvolution Web Paper
  • Li Xu, Jimmy SJ. Ren, Ce Liu, Jiaya Jia, Deep Convolutional Neural Network for Image Deconvolution, NIPS, 2014.
  • Deep Edge-Aware Filter Paper
  • Li Xu, Jimmy SJ. Ren, Qiong Yan, Renjie Liao, Jiaya Jia, Deep Edge-Aware Filters, ICML, 2015.
  • Computing the Stereo Matching Cost with a Convolutional Neural Network Paper
  • Jure Žbontar, Yann LeCun, Computing the Stereo Matching Cost with a Convolutional Neural Network, CVPR, 2015.
  • Colorful Image Colorization Richard Zhang, Phillip Isola, Alexei A. Efros, ECCV, 2016 Paper, Code
  • Ryan Dahl, Automatic Colorization, Blog
  • Feature Learning by Inpainting Paper Code
  • Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, Alexei A. Efros, Context Encoders: Feature Learning by Inpainting, CVPR, 2016

Edge Detection

  • Holistically-Nested Edge Detection Paper Code
  • Saining Xie, Zhuowen Tu, Holistically-Nested Edge Detection, arXiv:1504.06375.
  • DeepEdge Paper
  • Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection, CVPR, 2015.
  • DeepContour Paper
  • Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, Zhijiang Zhang, DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection, CVPR, 2015.

Semantic Segmentation

  • PASCAL VOC2012 Challenge Leaderboard (01 Sep. 2016)