Detections and Applications of Saliency on 3D Surfaces by Using Retinex Theory

  • Yitian Zhao

Student thesis: Doctoral Thesis, Doctor of Philosophy


Unlike traditional 2D images, which are projections of the real world onto a two-dimensional surface, 3D images express the geometry of the objects of interest directly as a set of points, a mesh, or a surface composed of points with three-dimensional coordinates. The size or shape of an object may be computed almost directly from this three-dimensional representation. 3D imaging geometry essentially simulates human binocular vision and enables direct acquisition of the depth information from the camera to the object of interest. It finds a variety of applications, ranging from reverse engineering, urban planning and simulation to computer games. With the evolution in recent years of more modern technologies and devices, there has been enormous growth in the number of 3D models/3D images and their availability to various communities. Examples include the National Design Repository, which stores 3D computer-aided design (CAD) models for tens of thousands of mechanical parts, and the Princeton Shape Benchmark (PSB), with 36,000 everyday objects represented as polygonal surface models. Most of the latest scanners can generate a huge number of data points within a limited time (a matter of minutes). Even a single scan might contain millions of points, which often leads to expensive computation and storage. The development of relevant software has not matched that of 3D hardware. As the complexity of these data points has increased, the digital representation of real-world objects has become more accurate, but there is a trade-off between the degree of accuracy and the cost of processing and storing these models. Therefore, reduction of information content or simplification of the 3D data points is useful for efficient processing, and necessary for visualization in some cases.
In the course of a thorough review of the relevant literature, we found that the existing simplification algorithms perform inadequately, especially at very high simplification rates. In recent years, the notion of human visual perception has been explored with a view to aiding simplification. To retain the important surface features and details, the selection of samples is now guided both by geometric properties and by the visual attention properties of the surface. Thus, as a criterion for simplification or interest point detection, salient and non-salient regions can be processed separately, preserving more vertices or facets from salient regions while selecting fewer vertices or facets from non-salient regions (our proposed interest point detection method selects points only from salient regions). The estimation of the perceptual properties/saliency of the target object is thus a very important pre-process for simplifying highly complicated 3D models. In this dissertation, a surface smoothing method and two novel saliency detection methods for 3D models are proposed. The acquired data usually contain imaging noise, due to low reflection or specular reflection, occlusion and depth discontinuity. Sometimes a rough surface is generated due to the rapid changes of orientation and vertex locations of reconstructed surfaces caused by noise introduced in the process of surface scanning, image registration and integration. Hence, an extended non-local means filter is proposed for 3D surfaces. To the best of our knowledge, there is no previous work on non-local means filtering of meshes with B-spline optimization. The non-local means filter takes advantage of the high degree of redundancy of any natural image.
For a given pixel, the restored grey value is obtained as the weighted average of the grey values of all pixels in the image; each weight is proportional to the similarity between the local neighbourhood of the pixel being processed and the neighbourhood of each other image pixel. With this filter, a smoother version can be robustly obtained, since it defines the similarity between patches of pixels rather than between the individual pixels themselves. However, when extending the 2D non-local means filter to the processing of a 3D mesh, a problem arises in the determination of the similarity neighbourhood. 2D images usually have a regular structure, which in most cases is not true for a mesh, due to variations of sampling density in the range scanning process. In this work, the B-spline is employed to determine the similarity neighbourhood, which in turn generates the control net for the input mesh. The advantage of using B-spline surfaces is that the underlying control net is topologically similar to the image grid structure. The first saliency detection approach adapts Retinex from a 2D image enhancement technique to the analysis of geometry or shape variation in 3D models. Retinex investigates the theory behind the constancy of colour. It explains from a psychological perspective why the colours perceived by human beings are relatively stable, usually irrespective of illumination conditions. Retinex has also been imported into the computer vision field, in which the captured data are often unsatisfactory due to low contrast (either locally or globally) caused by too weak or too strong illumination, or even shadow. Retinex is extended here to enhance 3D shape information and aid analysis of global shape and local geometrical details. Normally, human perception and objective information with respect to vision are not in agreement.
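The 2D non-local means weighting described above can be sketched as follows. This is only a minimal illustration of the classical image filter, not the thesis's mesh extension with B-spline control nets. The Gaussian weighting kernel, the smoothing parameter `h`, the `patch_radius`, and the border clamping are assumptions made for the example.

```python
import math


def nlm_pixel(image, r, c, patch_radius=1, h=10.0):
    """Restore one pixel as a patch-similarity-weighted average of all pixels."""
    rows, cols = len(image), len(image[0])

    def patch(pr, pc):
        # Square neighbourhood around (pr, pc); indices are clamped at the border.
        return [
            image[min(max(pr + dr, 0), rows - 1)][min(max(pc + dc, 0), cols - 1)]
            for dr in range(-patch_radius, patch_radius + 1)
            for dc in range(-patch_radius, patch_radius + 1)
        ]

    ref = patch(r, c)
    weighted_sum = 0.0
    weight_total = 0.0
    for i in range(rows):
        for j in range(cols):
            other = patch(i, j)
            # Mean squared difference between the two neighbourhoods.
            dist2 = sum((a - b) ** 2 for a, b in zip(ref, other)) / len(ref)
            # Assumed Gaussian kernel: similar patches receive weights close to 1.
            w = math.exp(-dist2 / (h * h))
            weighted_sum += w * image[i][j]
            weight_total += w
    return weighted_sum / weight_total


def nlm_filter(image, patch_radius=1, h=10.0):
    """Apply nlm_pixel to every pixel of a small grey-level image (list of lists)."""
    return [
        [nlm_pixel(image, r, c, patch_radius, h) for c in range(len(image[0]))]
        for r in range(len(image))
    ]
```

Because the weights compare whole neighbourhoods rather than single grey values, an isolated noisy pixel is pulled towards pixels whose surroundings look alike. The brute-force search over all pixels is quadratic in the image size, so the sketch suits only small inputs.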
The human brain interprets an image of a 3D shape differently from how photo-sensors or scanners may sense it, by consciously correcting brightness and removing noise, shadows, glare, or reflections. After the application of Retinex, the 3D shape, component or surface may be represented more faithfully to the original, simulating the effect of human visual systems. After using Retinex to enhance the surface, a random centre-surround saliency detection is proposed. The main structure of our saliency system is based on the general layout of psychological attention models, and it improves and extends the concept of mesh saliency, integrated for more accurate detection of the importance/saliency of points. While the first saliency detection approach is powerful for characterizing the importance/saliency of points, it may be affected by imaging noise or depth discontinuity, leading to the salient regions being only partially detected. To overcome this shortcoming, a second method is proposed that measures similarity based on patches rather than individual points. This saliency detection approach is an extension of the first saliency detection method. Based on observations from studies of biological vision, we know that the human vision system is sensitive to contrast in visual signals. It is widely believed that human cortical cells may be hard-wired to respond preferentially to high-contrast stimuli in their receptive fields. Therefore, if a specific contrast for the 3D surface is generated, it may also be used to illustrate the difference in the geometry or topology that makes the local details or global shape distinctive. In this study, by combining the Retinex-based Importance Feature and Relative Distance, a weighted dissimilarity map is obtained to generate the ‘surface contrast’.
The dissimilarity map is estimated as the sum of the differences in geometric invariance between points inside two patches, weighted inversely by their Euclidean distance. Subsequently, the global nature of salient regions is captured by considering the symmetric surround saliency. As noted above, humans pay more attention to those image regions that contrast strongly with their neighbours. To determine the region-based saliency, a region-growing segmentation is employed to segment the surface. The results show that the proposed approach is able to locate the distinctive regions faithfully. In order to validate the proposed saliency detection methods, the detected salient regions have been applied to simplification and interest point detection. A large number of experiments based on real data captured by a Minolta Vivid 700 range camera show that more details are retained in the process of surface simplification and that the detected interest points are more repeatable, which is useful for the representation of the geometry and detail of the object of interest. In addition, the comparative studies also show that the proposed techniques outperform the state-of-the-art methods and have clear advantages.
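The patch dissimilarity at the start of this paragraph can be sketched roughly as below. The abstract does not give the exact formula, so everything specific here is an assumption: each point carries a single scalar feature, points in the two patches are paired by index, and the distance weighting takes the form `1 / (1 + c * distance)` to avoid division by zero. The names `patch_dissimilarity` and `c` are hypothetical.

```python
import math


def patch_dissimilarity(features_a, features_b, centre_a, centre_b, c=1.0):
    """Feature difference between two patches, down-weighted by their separation.

    features_a, features_b: per-point scalar features (assumed representation).
    centre_a, centre_b: 3D coordinates of the patch centres.
    c: assumed constant controlling how strongly distance reduces the score.
    """
    # Sum of absolute feature differences over index-paired points.
    feature_diff = sum(abs(fa - fb) for fa, fb in zip(features_a, features_b))
    # Euclidean distance between the patch centres (math.dist needs Python 3.8+).
    distance = math.dist(centre_a, centre_b)
    return feature_diff / (1.0 + c * distance)
```

Under these assumptions, two patches with identical features score zero at any separation, and a given feature difference counts for less the farther apart the patches lie.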
Date of Award: 07 Apr 2014
Original language: English
Awarding Institution
  • Aberystwyth University
Supervisors: Yonghuai Liu (Supervisor) & Fred Labrosse (Supervisor)
