SUN: A Bayesian framework for saliency using natural statistics

Lingyun Zhang et al.

J Vis. 2008 Dec 16;8(7):32.1-20. doi: 10.1167/8.7.32.

Comparative Study

Abstract

We propose a definition of saliency by considering what the visual system is trying to optimize when directing attention. The resulting model is a Bayesian framework from which bottom-up saliency emerges naturally as the self-information of visual features, and overall saliency (incorporating top-down information with bottom-up saliency) emerges as the pointwise mutual information between the features and the target when searching for a target. An implementation of our framework demonstrates that our model's bottom-up saliency maps perform as well as or better than existing algorithms in predicting people's fixations in free viewing. Unlike existing saliency measures, which depend on the statistics of the particular image being viewed, our measure of saliency is derived from natural image statistics, obtained in advance from a collection of natural images. For this reason, we call our model SUN (Saliency Using Natural statistics). A measure of saliency based on natural image statistics, rather than based on a single test image, provides a straightforward explanation for many search asymmetries observed in humans; the statistics of a single test image lead to predictions that are not consistent with these asymmetries. In our model, saliency is computed locally, which is consistent with the neuroanatomy of the early visual system and results in an efficient algorithm with few free parameters.
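
To make the abstract's claim concrete, here is a sketch of the Bayesian decomposition it describes; the notation below is ours, not quoted from the paper. Write F for the visual features at a point z, L for its location, and C = 1 for the event that a search target is present there, and define saliency as s_z = p(C = 1 | F = f_z, L = l_z). Under the independence assumptions the framework makes, taking logs gives:

```latex
\log s_z
  = \underbrace{-\log p(F = f_z)}_{\text{self-information (bottom-up)}}
  + \underbrace{\log p(F = f_z \mid C = 1)}_{\text{target appearance (top-down)}}
  + \underbrace{\log p(C = 1 \mid L = l_z)}_{\text{location prior}}
```

The sum of the first two terms is the pointwise mutual information log[p(F = f_z, C = 1) / (p(F = f_z) p(C = 1))] between the features and the target; in free viewing, with no target defined, only the self-information term remains, which is the bottom-up saliency the abstract refers to.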


Figures

Figure 1.
Four scales of difference of Gaussians (DoG) filters were applied to each channel of a set of 138 images of natural scenes. Top: the four scales of DoG filters applied to each channel. Bottom: the probability distribution of filter responses for these four filters (with σ increasing from left to right) on the intensity (I) channel, collected from the set of natural images (blue line), and the fitted generalized Gaussian distributions (red line). Aside from the natural statistics in this training set being slightly sparser than the fit (the blue peak over 0 is slightly higher than the red peak of the fitted function), the generalized Gaussian distributions provide an excellent fit.
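
As a concrete illustration of the procedure in this caption, the Python sketch below filters an image channel at four DoG scales and fits a generalized Gaussian to the pooled responses. The filter construction, the 1.6 center/surround ratio, the scale values, and the random stand-in channel are our assumptions, not the paper's exact settings.

```python
# A sketch of the Figure 1 procedure, not the authors' exact code:
# filter a channel with difference-of-Gaussians (DoG) filters at four
# scales and fit a generalized Gaussian to the response distribution.
import numpy as np
from scipy import ndimage, stats

def dog_responses(channel, sigma):
    """DoG response via two Gaussian blurs; the 1.6 center/surround
    ratio is a common convention, assumed here."""
    center = ndimage.gaussian_filter(channel, sigma)
    surround = ndimage.gaussian_filter(channel, 1.6 * sigma)
    return (center - surround).ravel()

rng = np.random.default_rng(0)
channel = rng.random((128, 128))        # stand-in for an intensity channel

for sigma in (1, 2, 4, 8):              # four scales, as in the figure
    r = dog_responses(channel, sigma)
    beta, loc, scale = stats.gennorm.fit(r)   # generalized Gaussian fit
    print(f"sigma={sigma}: shape={beta:.2f}, loc={loc:.4f}, scale={scale:.4f}")
```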
Figure 2.
The 362 linear features learned by applying a complete independent component analysis (ICA) algorithm to 11 × 11 patches of color natural images from the Kyoto data set.
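
The feature-learning step behind this figure can be sketched as follows. FastICA stands in for the complete ICA algorithm the authors used, and random super-Gaussian "patches" stand in for 11 × 11 color patches drawn from the Kyoto data set.

```python
# A sketch of the feature learning behind Figure 2. FastICA stands in for
# the paper's complete ICA algorithm; random Laplacian "patches" stand in
# for 11x11 color patches from the Kyoto natural-image data set.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_patches, dim = 20_000, 11 * 11 * 3    # dim = 363
patches = rng.laplace(size=(n_patches, dim))

# Removing the mean (DC) component of each patch leaves 362 informative
# dimensions, matching the 362 features in the caption.
patches -= patches.mean(axis=1, keepdims=True)

ica = FastICA(n_components=362, whiten="unit-variance", max_iter=500)
ica.fit(patches)
filters = ica.components_               # each row is a learned linear filter
print(filters.shape)                    # (362, 363)
```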
Figure 3.
Examples of saliency maps for qualitative comparison. Each row contains, from left to right: an original test image with human fixations (from Bruce & Tsotsos, 2006) shown as red crosses; the saliency map produced by our SUN algorithm with DoG filters (Method 1); the saliency map produced by SUN with ICA features (Method 2); and, for comparison, the maps of Itti et al. (1998), as reported in Bruce and Tsotsos (2006).
Figure 4.
Plots of all human eye fixation locations in three data sets. Left: subjects viewing color images (Bruce & Tsotsos, 2006). Middle: subjects viewing grayscale images (Einhäuser et al., 2006). Right: subjects viewing color videos (Itti & Baldi, 2006).
Figure 5.
The average saliency maps of three recent algorithms on the stimuli used in collecting human fixation data by Bruce and Tsotsos (2006). Averages were taken across the saliency maps for the 120 color images. The algorithms used are, from left to right, Bruce and Tsotsos (2006), Gao and Vasconcelos (2007), and Itti et al. (1998). All three algorithms exhibit decreased saliency at the image borders, an artifact of the way they deal with filters that lie partially off the edge of the images.
Figure 6.
Illustration of edge effects on performance. Left: a saliency map of size 120 × 160 that consists of all 1’s except for a four-pixel-wide border of 0’s. Center: a saliency map of size 120 × 160 that consists of all 1’s except for an eight-pixel-wide border of 0’s. Right: the ROC curves of these two dummy saliency maps, as well as for a baseline saliency map (all 1’s). The ROC areas for these two curves are 0.62 and 0.73, respectively. (The baseline ROC area is 0.5.)
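
The border artifact quantified here is easy to reproduce. The sketch below builds the two dummy maps from the caption and scores them against synthetic, centrally biased fixations (a stand-in for real eye-tracking data), using ROC area as in the figure.

```python
# Reproducing the Figure 6 edge-effect demonstration with synthetic data.
# Fixations are sampled from a central Gaussian as a stand-in for real
# eye-tracking data, so the ROC areas will differ from the paper's
# 0.62 and 0.73, but the qualitative effect is the same.
import numpy as np
from sklearn.metrics import roc_auc_score

H, W = 120, 160
rng = np.random.default_rng(0)

def bordered_map(border):
    m = np.ones((H, W))
    m[:border, :] = 0
    m[-border:, :] = 0
    m[:, :border] = 0
    m[:, -border:] = 0
    return m

# Centrally biased fixation locations (illustrative assumption).
ys = np.clip(rng.normal(H / 2, H / 6, size=1000), 0, H - 1).astype(int)
xs = np.clip(rng.normal(W / 2, W / 6, size=1000), 0, W - 1).astype(int)

labels = np.zeros(H * W, dtype=int)
labels[ys * W + xs] = 1                 # fixated pixels are positives

for border in (4, 8):
    auc = roc_auc_score(labels, bordered_map(border).ravel())
    print(f"{border}-pixel border: ROC area = {auc:.2f}")
```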
Figure 7.
Illustration of the “prototypes do not pop out” visual search asymmetry (Treisman & Gormican, 1988). Here we see both the original image (left) and a saliency map for that image computed using SUN with ICA features (right). Top row: a tilted bar in a sea of vertical bars pops out; the tilted bar can be found almost instantaneously. Bottom row: a vertical bar in a sea of tilted bars does not pop out; subjects need more time and effort to find the odd-one-out orientation here than in the top row. This psychophysical asymmetry agrees with the computed saliency maps on the right, in which the tilted bars are more salient than the vertical bars.
Figure 8.
Illustration of a visual search asymmetry with line segments of two different lengths (Treisman & Gormican, 1988). Shown are the original image (left) and the corresponding saliency map (right) computed using SUN with ICA features (Method 2). Top row: a long bar is easy to locate in a sea of short bars. Bottom row: a short bar in a sea of long bars is harder to find. This matches the prediction of SUN’s saliency map, in which the longer bars have higher saliency.
Figure 9.
Demonstration that non-linear features can capture texture discontinuities without a statistical model that explicitly measures local statistics. Left: the input image, adapted from Bruce and Tsotsos (2006). Middle: the response of a linear DoG filter. Right: the response of a non-linear feature. The non-linear feature is constructed by applying a DoG filter and non-linearly transforming the output before a second DoG filter is applied (see Shan et al., 2007, for details on the non-linear transformation). Whereas the linear feature has zero response to the white hole in the image, the non-linear feature responds strongly in this region, consistent with the white region’s perceptual salience.
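
A minimal sketch of the two-stage feature from this figure, with an absolute-value rectification standing in for the non-linear transformation of Shan et al. (2007); the filter scales and test image are illustrative assumptions.

```python
# A sketch of Figure 9's two-stage feature: DoG, a pointwise non-linearity,
# then a second DoG. Absolute-value rectification is a stand-in for the
# transformation of Shan et al. (2007); sigmas are illustrative.
import numpy as np
from scipy import ndimage

def dog(img, sigma):
    return ndimage.gaussian_filter(img, sigma) - ndimage.gaussian_filter(img, 1.6 * sigma)

def nonlinear_feature(img, sigma1=2.0, sigma2=6.0):
    linear = dog(img, sigma1)           # ~0 inside any uniform region
    rectified = np.abs(linear)          # stand-in pointwise non-linearity
    return dog(rectified, sigma2)       # contrasts local "texture energy"

# Textured field with a uniform white hole, as in the figure.
rng = np.random.default_rng(0)
img = rng.random((128, 128))
img[48:80, 48:80] = 1.0                 # the white hole

hole = (slice(56, 72), slice(56, 72))
print(np.abs(dog(img, 2.0))[hole].mean())             # near zero in the hole
print(np.abs(nonlinear_feature(img))[hole].mean())    # clearly larger
```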


References

    1. Barlow H (1994). What is the computational goal of the neocortex? In Koch C (Ed.), Large scale neuronal theories of the brain (pp. 1–22). Cambridge, MA: MIT Press.
    2. Bell AJ, & Sejnowski TJ (1997). The “independent components” of natural scenes are edge filters. Vision Research, 37, 3327–3338.
    3. Bruce N, & Tsotsos J (2006). Saliency based on information maximization. In Weiss Y, Schölkopf B, & Platt J (Eds.), Advances in neural information processing systems 18 (pp. 155–162). Cambridge, MA: MIT Press.
    4. Bundesen C (1990). A theory of visual attention. Psychological Review, 97, 523–547.
    5. Carmi R, & Itti L (2006). The role of memory in guiding attention during natural vision. Journal of Vision, 6(9):4, 898–914, http://journalofvision.org/6/9/4/, doi:10.1167/6.9.4.
