NATAN JACOBSON
Graduate Student Researcher
Video Processing Lab
ECE @ UCSD
In what ways does stereoscopic depth affect bottom-up human visual saliency? Wolfe and Horowitz (2004) classified stereoscopic depth as a "probable guiding attribute for visual attention" (Wolfe & Horowitz, 2004, p. 24). This classification is fairly inconclusive, as the other categories are "undoubted attribute", "possible attribute", "doubtful case" and "probable non-attribute"; a rating of "probable" implies that more data may help to resolve the ambiguity. Liu, Cormack, and Bovik (2009, 2011) later explored the relationship between luminance and stereoscopic depth in natural images and found a high correlation between the two features (i.e., between the wavelet coefficients of co-located luminance and depth/disparity). Jansen, Onat, and Konig (2009) analyzed the depth feature further, investigating mean disparity and disparity contrast with respect to human attention. They found that, for scenes containing objects, subjects are more likely to fixate closer objects than farther objects, but that for images composed of white and black pixels (e.g., noise-based images), disparity contrast has only a small effect on the location of fixations. Jansen et al. also determined experimentally that subjects are more likely to fixate towards the center of objects in a scene without a stereoscopic depth cue.
Liu, Cormack, and Bovik (2010) reported an eye-tracking experiment that investigated the relationship between stereoscopic depth and human fixations. They found that the disparity contrast and gradient measures at human fixations were lower than at randomly sampled positions. This finding is surprising, as the opposite trend is observed for luminance: the luminance contrast and gradient measures are higher at fixation locations than at randomly sampled positions. One possible explanation given by Liu et al. is that bottom-up saliency is inherently linked with low-bandwidth visual processing. Inferring depth ordering at regions of high disparity contrast is a more demanding visual processing task, for which the bandwidth may not be available at this stage.
In this work, we extend the research of Liu et al. (2010) by performing a set of experiments using a mirror stereoscope and an eye-tracking system to measure fixations. Subjects were shown a collection of images from two separate stimulus sets, where each image was shown with or without a stereoscopic depth cue. The first stimulus set is the same one used by Liu et al. (2010). This commonality allows us to validate the previous results using a larger set of subjects. Because those results are counter-intuitive, we examined a second stimulus set to determine whether they generalize. The second stimulus set was constructed very differently. For the first stimulus set, depth information was calculated post hoc and is therefore subject to error, whereas depth for the second stimulus set was determined using structured light and is more accurate. In addition, the second stimulus set contains color images while the first is monochrome. Finally, the second stimulus set contains man-made scenes while the first consists solely of "natural" scenes. Because the second stimulus set contains man-made scenes, it is possible that attention-drawing, high-contrast luminance patches are co-located with high depth gradients more frequently than in the first stimulus set. In the analyses to follow, we examined the relationship between luminance and depth features at human fixations, as well as at randomly sampled positions, for the two stimulus sets. Our goal was to develop a richer understanding of the connection between stereoscopic depth and bottom-up saliency.
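The comparison between features at fixations and at randomly sampled positions can be illustrated with a minimal sketch. This is not the analysis code used in this work: the function names, the central-difference gradient, the square patch of a given radius and the uniform random sampling are all assumptions made for illustration. The same routine could be applied to a luminance map or to a disparity map; a ratio above 1 means the feature is stronger at fixations than at random positions, and a ratio below 1 means it is weaker.

```python
import random
from statistics import mean


def gradient_magnitude(img):
    """Gradient magnitude of a 2-D list of floats (hypothetical helper).

    Uses central differences; at the borders the neighbor index is clamped,
    so the border estimate is a halved one-sided difference.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def patch_mean(feature, x, y, radius):
    """Mean of a feature map over a square patch centered on (x, y)."""
    h, w = len(feature), len(feature[0])
    values = [
        feature[j][i]
        for j in range(max(y - radius, 0), min(y + radius + 1, h))
        for i in range(max(x - radius, 0), min(x + radius + 1, w))
    ]
    return mean(values)


def fixation_to_random_ratio(feature, fixations, n_random=1000, radius=1, seed=0):
    """Ratio of the mean patch feature at fixations to that at random positions.

    `fixations` is a list of (x, y) pixel coordinates. Random positions are
    drawn uniformly over the image, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    h, w = len(feature), len(feature[0])
    at_fixations = mean(patch_mean(feature, x, y, radius) for x, y in fixations)
    at_random = mean(
        patch_mean(feature, rng.randrange(w), rng.randrange(h), radius)
        for _ in range(n_random)
    )
    if at_random == 0:
        raise ValueError("feature is zero at every sampled random position")
    return at_fixations / at_random


# Toy example: an 8x8 map with a vertical step edge between columns 3 and 4.
step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
grad = gradient_magnitude(step)
print(fixation_to_random_ratio(grad, [(3, 4), (4, 2)], radius=0))
```

In the toy example the two hypothetical fixations lie on the step edge, so the gradient there exceeds the image-wide average and the ratio comes out above 1. The finding of Liu et al. (2010) corresponds to a ratio below 1 for the disparity features and above 1 for the luminance features.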
[1] Jansen, L., Onat, S., & Konig, P. (2009, January). Influence of disparity on fixation and saccades in free viewing of natural scenes. J. Vis., 9(1), 1-19.
[2] Liu, Y., Cormack, L., & Bovik, A. (2011, September). Statistical modeling of 3-D natural scenes with application to Bayesian stereopsis. IEEE Trans. Image Process., 20(9), 2515-2530.
[3] Liu, Y., Cormack, L. K., & Bovik, A. C. (2009). Luminance, disparity, and range statistics in 3D natural scenes. In Human vision and electronic imaging.
[4] Liu, Y., Cormack, L. K., & Bovik, A. C. (2010, October). Dichotomy between luminance and disparity features at binocular fixations. J. Vis., 10(12), 1-17.
[5] Wolfe, J. M., & Horowitz, T. S. (2004). What attributes guide the deployment of visual attention and how do they do it? Nat. Rev. Neurosci., 5(6), 495-501.