Deep learning has changed the way visual problems are addressed in machine learning. Convolutional neural networks (CNNs) have become the standard architecture for areas such as image recognition and computer vision. Research in these areas has produced many new theoretical concepts and practical implementations that practitioners can draw on.
In this article, we discuss saliency maps, one of the most talked-about image recognition concepts now used in deep learning. Saliency maps have long been used in the image recognition space. We explore the logic behind the concept and see how it is implemented in deep learning.
An Introduction To Saliency Maps
Saliency maps in deep learning first appeared in the paper titled Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. The paper, by researchers from the Visual Geometry Group at the University of Oxford, presented techniques for visualising what image classification models have learned, saliency maps being one of them.
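The idea in that paper is that a pixel's saliency can be read from the gradient of a class score with respect to the input image, taking the largest absolute gradient across the colour channels. The following is a minimal sketch of that idea, not the authors' implementation: class_score is a hypothetical stand-in for a trained CNN, and its gradient is written out by hand, whereas a deep learning framework would obtain it by backpropagation.

```python
import numpy as np

def class_score(image, weights):
    """Hypothetical stand-in for a CNN class score: tanh of a weighted pixel sum."""
    return np.tanh(np.sum(weights * image))

def score_gradient(image, weights):
    """Analytic gradient of class_score with respect to every input value."""
    pre_activation = np.sum(weights * image)
    return (1.0 - np.tanh(pre_activation) ** 2) * weights

def gradient_saliency(image, weights):
    """Per-pixel saliency: largest absolute gradient across the colour channels."""
    grad = score_gradient(image, weights)   # shape (H, W, 3)
    return np.abs(grad).max(axis=-1)        # shape (H, W)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))               # toy RGB image, values in [0, 1)
weights = np.zeros((8, 8, 3))
weights[2:5, 3:6, :] = 0.1                  # the toy model only "looks at" this patch

saliency = gradient_saliency(image, weights)
```

In this toy case the map is non-zero only over the patch the model depends on, which is the behaviour a saliency map is meant to expose.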
This method is derived from the concept of saliency in images. Saliency refers to the unique features (pixels, resolution, etc.) of an image that stand out in the context of visual processing. These features mark the visually conspicuous locations in an image, and a saliency map is a topographical representation of them.
A widely cited computational model of these maps was proposed by neuroscientists Laurent Itti, Christof Koch and Ernst Niebur in their study on feature extraction in images. They describe the saliency map as follows:
“The purpose of the saliency map is to represent the conspicuity— or ‘saliency’—at every location in the visual field by a scalar quantity and to guide the selection of attended locations, based on the spatial distribution of saliency. A combination of the feature maps provides bottom-up input to the saliency map, modelled as a dynamical neural network.”
Saliency maps are computed by processing an image so that its visual features can be told apart. For example, a colour image may be converted to grayscale so that the regions of strongest intensity stand out. Similar ideas appear in infrared imaging, where temperature is shown as colour (red is hot and blue is cold), and in night vision, where light sources are shown as brightness (green is bright and black is dark).
How To Create Saliency Maps?
This section presents the steps generally followed to create saliency maps for learning algorithms such as neural networks. The saliency model of Itti, Koch and Niebur is borrowed here for explanation. The model considers three features of an image: colour, intensity and orientation. These features are combined in the saliency map. The model then uses a winner-take-all neural network to select the most salient location from the map.
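The winner-take-all stage can be pictured as repeatedly picking the most salient location and then suppressing its neighbourhood so that attention moves on. The sketch below replaces the neural network with a plain argmax loop. The function name attended_locations and the parameters n_locations and inhibit_radius are assumptions made for illustration, not part of the original model.

```python
import numpy as np

def attended_locations(saliency, n_locations=3, inhibit_radius=2):
    """Pick locations in decreasing saliency, suppressing each winner's surroundings."""
    remaining = saliency.astype(float).copy()
    rows, cols = np.indices(remaining.shape)
    picks = []
    for _ in range(n_locations):
        r, c = np.unravel_index(np.argmax(remaining), remaining.shape)
        picks.append((int(r), int(c)))
        # Mask a disc around the winner so the next pick lands somewhere else.
        disc = (rows - r) ** 2 + (cols - c) ** 2 <= inhibit_radius ** 2
        remaining[disc] = -np.inf
    return picks

saliency = np.zeros((10, 10))
saliency[2, 3] = 0.9
saliency[2, 4] = 0.8   # neighbour of the strongest peak, suppressed along with it
saliency[7, 7] = 0.6
saliency[5, 1] = 0.3

picks = attended_locations(saliency)
```

The second-highest value at (2, 4) is never selected because it lies inside the suppressed disc of the first winner, so the selection spreads over distinct regions of the image.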
The steps are given below:
- The three features mentioned above are extracted from the input image.
- For colour, the image is converted into red, green, blue and yellow colour channels. For intensity, the image is converted to grayscale.
- The orientation feature is extracted using Gabor filters at four angles (0, 45, 90 and 135 degrees).
- Gaussian pyramids are built from each of these processed images, and feature maps are computed from the pyramids as differences between fine and coarse levels.
- Feature maps are created for each of the three features. The saliency map is the mean of the feature maps once they have been normalised and combined per feature.
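The steps above can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions rather than a faithful reimplementation of the model: filtering uses circular FFT convolution, the pyramid has only four levels, min-max scaling stands in for the model's normalisation operator, and the image sides are assumed to be divisible by 8. All function names and kernel parameters are hypothetical.

```python
import numpy as np

def filter2d(image, kernel):
    """Circular 2-D convolution via FFT, with the output aligned to the input."""
    kh, kw = kernel.shape
    padded = np.zeros_like(image, dtype=float)
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the result is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def gabor_kernel(theta, size=9, sigma=2.0, wavelength=4.0):
    ax = np.arange(size) - size // 2
    y, x = np.meshgrid(ax, ax, indexing="ij")
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    k = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * x_rot / wavelength)
    return k - k.mean()   # zero mean, so flat regions give no response

def pyramid(channel, levels=4):
    """Gaussian pyramid: blur, then keep every second row and column."""
    out = [channel]
    for _ in range(levels - 1):
        out.append(filter2d(out[-1], gaussian_kernel())[::2, ::2])
    return out

def centre_surround(channel):
    """Feature maps |centre - surround|, with the surround two pyramid levels coarser."""
    pyr = pyramid(channel)
    maps = []
    for c in (0, 1):
        surround = np.kron(pyr[c + 2], np.ones((4, 4)))         # upsample to centre size
        diff = np.abs(pyr[c] - surround)
        maps.append(np.kron(diff, np.ones((2 ** c, 2 ** c))))   # back to full resolution
    return maps

def normalise(m):
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def saliency_map(rgb):
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    intensity = (r + g + b) / 3
    # Broadly tuned colour channels; negative values are set to zero.
    R = np.clip(r - (g + b) / 2, 0, None)
    G = np.clip(g - (r + b) / 2, 0, None)
    B = np.clip(b - (r + g) / 2, 0, None)
    Y = np.clip((r + g) / 2 - np.abs(r - g) / 2 - b, 0, None)
    features = {
        "intensity": [intensity],
        "colour": [R - G, B - Y],
        "orientation": [np.abs(filter2d(intensity, gabor_kernel(t)))
                        for t in np.deg2rad([0, 45, 90, 135])],
    }
    combined = []
    for channels in features.values():
        maps = [normalise(m) for ch in channels for m in centre_surround(ch)]
        combined.append(normalise(np.mean(maps, axis=0)))
    return np.mean(combined, axis=0)   # mean over the three features

image = np.full((64, 64, 3), 0.5)       # uniform grey background
image[24:32, 40:48] = [1.0, 0.0, 0.0]   # one red patch that should stand out
saliency = saliency_map(image)
```

On this toy image the highest saliency values fall on and around the red patch, since it differs from the background in both colour and intensity.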
Improvements In Saliency
Recent studies in deep learning have brought improvements as well as newer variations of saliency maps. One study focused on performance and high-level detail in images. Its model, dubbed Deep Gaze I, reuses features from a CNN trained for object recognition to predict where people fixate in an image. Saliency models built on this technique showed significant improvements over standard saliency models. The study also indicates that CNNs have pushed saliency prediction to a new level.
Another study proposed a framework for saliency detection that works at two contextual levels, known as 'global context' and 'local context'. For saliency, these refer to image features taken from the full image and from a partial (processed) image respectively. With extensive pre-training and testing on five datasets, the researchers show that their saliency detection performs consistently well against standard models.
Developments in saliency models have also led to practical applications ranging from video surveillance to traffic light detection. In automated video surveillance, object detection is done using a modified principal component analysis (PCA) technique that analyses objects behaving dynamically against the background, and saliency maps are created from these objects in the captured images.
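The surveillance idea can be illustrated with ordinary PCA: the leading principal components of a stack of frames capture the slowly varying background, and whatever they fail to explain is treated as salient motion. The modification used in the surveillance work is not described here, so the sketch below uses plain PCA via SVD as a stand-in. The function name pca_motion_saliency and the synthetic frames are assumptions for illustration.

```python
import numpy as np

def pca_motion_saliency(frames, n_components=1):
    """Per-frame saliency as the residual left by a low-rank PCA background model."""
    t, h, w = frames.shape
    data = frames.reshape(t, h * w).astype(float)
    mean = data.mean(axis=0)
    centred = data - mean
    # Rows of vt are the principal directions of the frame stack.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]
    background = mean + (centred @ basis.T) @ basis   # projection onto the background subspace
    return np.abs(data - background).reshape(t, h, w)

rng = np.random.default_rng(0)
scene = rng.random((24, 24))                            # static textured background
frames = np.empty((40, 24, 24))
for i in range(40):
    gain = 1.0 + 0.3 * np.sin(2 * np.pi * i / 10)       # slow global brightness change
    frames[i] = gain * scene
    col = i // 2 + 1
    frames[i, 10:12, col:col + 2] += 1.0                # small bright object drifting right

saliency = pca_motion_saliency(frames)
```

Here the first component absorbs the global brightness change, so the residual in each frame is concentrated on the moving object rather than on the flickering background.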
In traffic light detection, a sensor is placed alongside the camera to capture areas of interest. An illumination algorithm then creates saliency maps based on the state of the traffic lights in the images.
Conclusion
Saliency detection has developed rapidly and is approaching human-like precision in recognising features. Whether in terms of datasets, learning models or performance, saliency maps look set to play a major role in computer vision and image processing projects.