Lecture 9: Perceptual Grouping

Problems with the Hough Technique

Although widely used, the Hough technique has inherent problems, which we have already discussed. In particular, because of side lobes, it cannot distinguish individual lines when several lines appear in the image window being transformed. The extracted edge points frequently form two lines that can be clearly perceived, as shown in diagram 9.1, yet under certain quantisations the Hough transform may indicate only one main peak in the histogram. Humans will easily separate the two lines, because the picture contains information that the Hough transform ignores, e.g.:

the slopes of the lines

the colour of the lines

the intensity (brightness) of the lines

and human perception can easily make use of this. None of these properties has anything to do with the contextual information discussed in the previous lecture. A further limitation of line extraction by the Hough technique lies in the identification of the feature points. Line segments can frequently be perceived by humans even though their edge strength is low compared with other features. Colour, level of intensity, curvature and contextual information may all provide cues for feature recognition. A further complication is noise, which is often present in real images: if we reduce the edge point threshold in order to identify features with low edge magnitude, the number of erroneous feature points caused by the noise increases. It is therefore attractive to look for ways of incorporating perceptual criteria into the Hough technique, so that we can distinguish real feature points from noise and separate line segments with different features. In particular, we are interested in domain independent methods for doing this.
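The quantisation problem described at the start of this section can be made concrete with a small sketch. It assumes the normal parameterisation rho = x cos(theta) + y sin(theta); the function name hough_votes, the bin widths and the test lines are all illustrative choices, not taken from the lecture. It shows only the merging of two close parallel lines under coarse quantisation, not the side lobe effect.

```python
import math
from collections import Counter

def hough_votes(points, rho_step, n_theta=180):
    """Accumulate votes in a (rho bin, theta index) histogram.

    Assumes the normal form rho = x*cos(theta) + y*sin(theta);
    rho_step is the width of one rho bin (a hypothetical parameter).
    """
    acc = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = k * math.pi / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(math.floor(rho / rho_step), k)] += 1
    return acc

# Two horizontal lines, two pixels apart, 20 edge points each.
points = [(x, 11) for x in range(20)] + [(x, 13) for x in range(20)]

fine = hough_votes(points, rho_step=1.0)    # bins narrower than the line spacing
coarse = hough_votes(points, rho_step=5.0)  # both lines fall into one rho bin

# With fine bins no single bin collects more than one line's points;
# with coarse bins the strongest bin collects the points of both lines.
print(max(fine.values()), max(coarse.values()))
```

The coarse histogram reports a single strong peak, and nothing in the vote count alone reveals that two lines contributed to it.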

Perceptual Grouping

It is clear that humans can easily partition scenes into meaningful subsets in a highly reliable and automatic way. It has been proposed that one important part of this organisation is the division of the visual field into two parts, figure and ground. The figure usually appears to be nearer than the ground, which extends uniformly behind it. This figure-ground organisation is one form of perceptual organisation. At first sight it would appear that figure-ground organisation comes about mainly through the viewer's knowledge and contextual information. However, it is also clear that no scene will ever be viewed twice with the same illumination or viewpoint. It therefore follows that some discrimination takes place which is independent of the scene being viewed.

Research into how this domain independent feature extraction comes about was carried out by the Gestalt psychologists. Their fundamental principles of perceptual organisation are a set of generic criteria which they suggest underlie the natural mechanisms for partitioning the visual field. One of the earliest and intuitively most acceptable collections of such laws was proposed by Wertheimer in 1923. These laws of organisation were formulated on the basis of their use in identifying ambiguous patterns. They include:

The Law of Proximity: Stimulus elements which are closer tend to be perceived as one entity. It will be observed that the closer elements in diagram 9.2 can be perceived as groups forming vertical columns.

The Law of Similarity: Similar elements of a stimulus tend to be part of a unit. This similarity may be in grey level, colour, orientation or shape, as shown in diagram 9.3, which is perceived as horizontal rows.

The Law of Good Continuity: Stimuli tend to form a group which minimizes a change or discontinuity as demonstrated in diagram 9.4, which is perceived as two lines with first order continuity.

The Law of Closure: Stimulus elements tend to be grouped into a commonly known complete figure. The stimulus in diagram 9.5 will be perceived as a circle despite the fact that some part of it is missing.

The Law of Symmetry: Regions which are surrounded by symmetrical borders are perceived as coherent figures, as in the scenes shown in diagram 9.6.

The Law of Simplicity: In a stimulus where more than one figure can be perceived, the ambiguity is resolved in favour of the simplest alternative. For example, if a smaller number of different angles or lines is required to interpret a figure as three-dimensional instead of two-dimensional, the observer will normally choose the three-dimensional alternative. This effect is shown in diagram 9.7.

The Law of Common Fate: If a group of elements is moving with a common uniform velocity through a field of similar elements, the moving elements are perceived as part of a coherent group.

The Perceptual Hough Transform

The Gestalt laws are simple and of great generality. In order to apply them to the line extraction process, we need to formulate grouping criteria from them in terms of the image parameters that we can readily measure. The following properties of individual points in the raster image are measured readily by standard methods, and can be used to form criteria based on the similarity law:

grey level intensity

edge point magnitude

edge point direction

edge point colour

The proximity of points within any group can easily be obtained by checking distance in a pairwise fashion. For points which are to be aggregated into line segments, it is possible to check for continuity of the line, and for points which are to be aggregated into curves we can check for low curvature, which gives us a curvilinear continuity criterion under the simplicity law.

In principle it is very simple to incorporate these into the Hough transform. We proceed as before to create the histogram in parameter space, and then post-process the histogram to remove any voting edge points which do not conform to the perceptual criterion that we are applying. In the simplest case, if we apply the criterion of similarity in grey level, then for each point of the histogram where there are sufficient voting points to assume that there is a line segment, we check for clusters in the grey levels of the points. This may most conveniently be done by sorting the points in order of grey level, and looking for wide gaps between adjacent points. Any point isolated from its neighbours in grey level may be removed, and only the largest cluster is retained. Thus, for each point on the histogram, the number of voting points is reduced, and this may result in a different selection being made of the most significant line.
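The sort-and-gap procedure for grey level similarity can be sketched as follows. This is a minimal illustration under stated assumptions: the voting points of one histogram cell are taken to be available as (x, y, grey) tuples, and the function name and the max_gap value are hypothetical and would need tuning for real images.

```python
def largest_grey_level_cluster(points, max_gap=10):
    """Keep the largest cluster of voting points that are similar in grey level.

    points  : list of (x, y, grey) tuples that voted for one histogram cell
    max_gap : widest grey level step allowed inside a cluster (assumed value)
    """
    if not points:
        return []
    # Sort by grey level, then start a new cluster at every wide gap.
    ordered = sorted(points, key=lambda p: p[2])
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[2] - prev[2] > max_gap:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    # Isolated points end up in small clusters and are discarded.
    return max(clusters, key=len)

votes = [(0, 0, 50), (1, 1, 52), (2, 2, 55), (3, 3, 200), (4, 4, 53), (5, 5, 120)]
kept = largest_grey_level_cluster(votes)
print(len(votes), "->", len(kept))
```

The vote count for the cell then becomes len(kept), so a cell that collected many dissimilar points may lose its place as the strongest peak.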

A similar process can be applied to edge point magnitude and to colour. For colour, it is reasonable to use the psychological parameters that we introduced previously (diagram 3.2), which include hue and saturation as well as intensity. The simplest measure then becomes the vector from the white point to the colour point for the image point being tested. For many applications it may be sufficient to consider the direction only, which defines the pure colour. If saturation information is required, then it may be measured by the vector magnitude, and treated as another perceptual criterion.
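The white point vector can be sketched as below. Diagram 3.2 is not reproduced here, so the sketch substitutes a simple stand-in: normalised (r, g) chromaticity computed from RGB values, in which any grey with R = G = B maps to the point (1/3, 1/3). The function name and this choice of colour space are assumptions made for illustration only.

```python
import math

# In normalised (r, g) chromaticity, R = G = B maps to (1/3, 1/3).
WHITE = (1.0 / 3.0, 1.0 / 3.0)

def colour_vector(red, green, blue):
    """Return (direction, magnitude) of the vector from white to the colour point.

    direction approximates the pure colour (hue) as an angle in radians;
    magnitude approximates saturation. Returns None for black, where the
    chromaticity is undefined, and a direction of None for greys.
    """
    total = red + green + blue
    if total == 0:
        return None
    dx = red / total - WHITE[0]
    dy = green / total - WHITE[1]
    magnitude = math.hypot(dx, dy)
    direction = math.atan2(dy, dx) if magnitude > 1e-12 else None
    return direction, magnitude

pure_red = colour_vector(255, 0, 0)
pale_red = colour_vector(255, 128, 128)
print(pure_red, pale_red)
```

The two reds share a direction but differ in magnitude, so a pure colour criterion groups them together while a saturation criterion separates them.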

Proximity and continuity can be handled in a variety of ways, depending on the information that we seek to recover from the image. If we are seeking to extract straight lines, then proximity only applies in a local area, and reduces to the same criterion as continuity, which we can check simply by finding the distance between adjacent points on the line. For detection of other shapes defined by templates, proximity can be measured by, for example, finding the smallest bounding circle which encloses all the points.
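For straight lines, the adjacent distance check can be sketched as follows. The sketch assumes the normal parameterisation x cos(theta) + y sin(theta) = rho, so that position along the line is -x sin(theta) + y cos(theta). The function name and the max_gap value are hypothetical.

```python
import math

def longest_continuous_run(points, theta, max_gap=2.0):
    """Keep the longest run of points with no large gap along the line.

    points  : list of (x, y) tuples that voted for a line with normal angle theta
    max_gap : largest distance allowed between adjacent points (assumed value)
    """
    if not points:
        return []

    def along(p):
        # Signed position of the point measured along the line direction.
        return -p[0] * math.sin(theta) + p[1] * math.cos(theta)

    ordered = sorted(points, key=along)
    runs = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if along(cur) - along(prev) > max_gap:
            runs.append([cur])
        else:
            runs[-1].append(cur)
    return max(runs, key=len)

# Horizontal line y = 5 (theta = pi/2) with a break between x = 3 and x = 10.
line_points = [(x, 5) for x in (0, 1, 2, 3, 10, 11)]
segment = longest_continuous_run(line_points, math.pi / 2)
print(sorted(segment))
```

Returning every run rather than only the longest would be the natural variation if all collinear segments are wanted, not just the dominant one.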

We have already noted that edge point direction can be used to filter unwanted information from the Hough transform. However, some care must be taken in the way in which this property is made into a perceptual criterion. One heuristic is to eliminate edge points if their edge point direction does not correspond to the line direction, θ, for which they vote. This is a reasonable procedure for many edges; however, it assumes that the features to be extracted are straight lines, whereas in general the Hough transform will be used to identify piecewise straight line approximations to curved boundaries. Moreover, it excludes lines organised by the theta aggregation principle noted by Marr. This states that boundaries are perceived in cases where the edge directions are parallel, though not necessarily in the same direction as the tangent to the boundary. An example is shown in diagram 9.8. For curved boundaries, even in the case where we are using the Hough transform to detect line segments which form a piecewise approximation to the curve, a curvature criterion can be applied by allowing a slow drift in the edge directions. We can therefore deduce two perceptual criteria. For theta aggregation, we can process the edge directions at a histogram point by clustering them, and selecting the largest cluster. For curvilinear continuity, we order the edge points according to their position on the curve, and check for gaps in edge direction between adjacent points. As long as adjacent points are close in edge orientation we retain them, allowing a drift along the curve segment.
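The two criteria can be sketched as below, taking the quantity being compared to be the edge direction at each point (an interpretation of the text, stated here as an assumption). Points are assumed to be (x, y, direction) tuples with the direction in radians; the function names and tolerances are hypothetical, and the clustering used for theta aggregation is a simple stand-in that groups points around the best supported direction.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def theta_aggregation(points, tol=math.radians(10)):
    """Keep the largest group of points whose directions lie within tol
    of a common centre direction, whatever the voted line direction is."""
    best = []
    for centre in points:
        group = [p for p in points if angle_diff(p[2], centre[2]) <= tol]
        if len(group) > len(best):
            best = group
    return best

def curvilinear_continuity(ordered_points, max_step=math.radians(15)):
    """Keep the longest run in which adjacent directions differ by at most
    max_step. The input must already be ordered by position on the curve,
    so the direction may drift a long way over the whole run."""
    if not ordered_points:
        return []
    runs = [[ordered_points[0]]]
    for prev, cur in zip(ordered_points, ordered_points[1:]):
        if angle_diff(cur[2], prev[2]) <= max_step:
            runs[-1].append(cur)
        else:
            runs.append([cur])
    return max(runs, key=len)

parallel = [(i, 0, math.radians(d)) for i, d in enumerate([88, 90, 92, 10, 200])]
curve = [(i, i, math.radians(d)) for i, d in enumerate([0, 10, 20, 30, 40, 120])]
print(len(theta_aggregation(parallel)), len(curvilinear_continuity(curve)))
```

Note the difference between the two: theta aggregation bounds the total spread of directions in the group, whereas curvilinear continuity bounds only the step between neighbours.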

In the preceding paragraphs we have identified eight criteria that are readily applicable for filtering unwanted line segments from a Hough transform, and there are more possibilities. These are powerful criteria, since they are domain independent, and equally applicable to any image. We need now to consider how we apply them in practical cases. Here, domain dependence becomes unavoidable, because the choice of criterion depends on the features that we wish to extract from the image. For example, if features have lines of identifiable colour, then we simply apply the pure colour criterion to the histogram. On the other hand, if specularity is a property of the boundaries we wish to detect, then colour saturation becomes the dominant criterion. For general purpose boundary detection, it may be possible to apply several criteria. Thus we may filter the histogram first on similarity in either edge magnitude or intensity, depending on whether the edges are formed by distinct large steps caused by shadows, or by slower, higher order changes in intensity, caused by ridges or valleys. Then we could apply a second criterion, either theta aggregation or curvilinear continuity. Clearly, the choices we make depend on the higher level of the vision system into which we introduce the perceptual criteria. Despite this, the domain independence of the criteria means that they can be parameterised, and implemented in hardware as a constant component of any vision system.
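One way to picture the parameterisation is a fixed filtering routine driven by a list of criteria chosen by the higher level. The sketch below is only an illustration: it assumes each voting point is a dictionary of measured properties, and the key names, gap values and function names are all hypothetical. Only similarity criteria on scalar properties are covered.

```python
def largest_cluster(points, key, max_gap):
    """Sort points on one measured property and keep the largest group
    that contains no gap wider than max_gap between adjacent values."""
    if not points:
        return []
    ordered = sorted(points, key=key)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if key(cur) - key(prev) > max_gap:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return max(clusters, key=len)

def apply_criteria(points, criteria):
    """Apply each (property name, max_gap) criterion in turn.
    The routine is fixed; only the criteria list is domain dependent."""
    for name, max_gap in criteria:
        points = largest_cluster(points, lambda p, n=name: p[n], max_gap)
    return points

cell_votes = [
    {"grey": 50, "magnitude": 10},
    {"grey": 52, "magnitude": 11},
    {"grey": 54, "magnitude": 40},
    {"grey": 200, "magnitude": 12},
    {"grey": 51, "magnitude": 10},
]
# First similarity in intensity, then similarity in edge magnitude.
survivors = apply_criteria(cell_votes, [("grey", 10), ("magnitude", 5)])
print(len(survivors))
```

Changing the application then means changing only the criteria list, which is the sense in which the filtering stage itself stays constant.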

One important feature of this method is that we need not threshold the edge detector output, since unwanted edges are removed at a later stage. All points in the edge map are retained unless their magnitude is too small to provide accurate edge direction information. Thus more information is retained and passed on to the higher level. We therefore retain the capability of detecting weak, but perceptually significant, edge features in the presence of noise or stronger edge features. We also obviate the need to choose a threshold, and so lower the domain dependence. This retention of information is a practical example of the application of Marr's principle of least commitment mentioned in lecture 1.
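The retention rule can be sketched as follows. The sketch uses central differences as a stand-in for whatever edge detector is in use, reports the gradient direction (which is perpendicular to the edge itself), and applies only a small floor below which the direction is treated as unreliable. The function name and the min_magnitude value are assumptions.

```python
import math

def edge_points(image, min_magnitude=1.0):
    """Return (x, y, magnitude, direction) for every interior pixel whose
    gradient is large enough to give a usable direction.

    image         : list of rows of grey levels
    min_magnitude : floor for reliable direction, not a detection threshold
    """
    points = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude >= min_magnitude:
                points.append((x, y, magnitude, math.atan2(gy, gx)))
    return points

# A faint step (10 to 12) next to a strong step (12 to 100).
row = [10, 10, 10, 12, 100, 100, 100]
retained = edge_points([row, row, row])
print([(x, magnitude) for x, _, magnitude, _ in retained])
```

The faint step survives alongside the strong one, so the later perceptual filtering, rather than an early threshold, decides which of them matters.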