Lecture 7: The Hough Transform

 

We saw in the last lecture that tracing edge points to form contours, although effective for some classes of problems, is not widely applicable as a solution for extracting boundaries. Moreover, we saw that, for success, a number of heuristics needed to be introduced. An alternative to heuristics is the use of prior knowledge. In general, we will require our vision system to find objects about which we can make some statements. Moreover, we must be able to encode the prior knowledge in a usable way.

 

One fundamental way we can describe our objects is by the mathematical functions that describe the boundary curves. A highly effective method of utilising this kind of information is the Hough Transform, named after Paul Hough who patented the method in 1962. Complex mathematical functions, although theoretically possible to utilise, make unrealisable computational demands, so the method is normally restricted to first and second order boundary equations (straight lines and conic sections). We will consider straight line extraction first.

 

In the original Hough transform, much of the information in the edge map is not used, and as a first step the edge image is converted to a bi-level form by selecting a threshold. Any point whose gradient is above the threshold is considered to belong to a boundary. These pixels are henceforth referred to as edge points. This technique works well for images with strong contrast, but reduces the method's applicability to general shaded images.
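As a sketch of this thresholding step (the image, the threshold value, and the function name are all illustrative choices, not part of the original method):

```python
# A minimal sketch of converting a gradient-magnitude image (a plain
# list of rows here) into a set of edge points.  The threshold is a
# free choice; the method itself does not prescribe a value.

def edge_points(gradient_image, threshold):
    """Return [(x, y), ...] for every pixel whose gradient magnitude
    exceeds the threshold."""
    points = []
    for y, row in enumerate(gradient_image):
        for x, g in enumerate(row):
            if g > threshold:
                points.append((x, y))
    return points

# Example: a 3x3 gradient image with two strong responses.
grad = [[0, 10, 0],
        [0, 90, 5],
        [80, 0, 0]]
print(edge_points(grad, 50))  # [(1, 1), (0, 2)]
```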

The Hough transform is a mapping between Cartesian space and some parameter space in which the straight line (or other boundary formulation) can be defined. If we consider the normal Cartesian formulation of a straight line:

 

y - mx - c = 0

(Eq 1)

 

we can regard the two constants m and c as parameters defining the line. If we choose one point [x,y] in the Cartesian plane, it can be considered to belong to a whole family of lines defined by different values of m and c. One point [xi,yi] in Cartesian space will correspond to a line in the m-c space with equation c = -xi m + yi. Thus, if we take a set of points in Cartesian space, this corresponds to a set of lines in m-c space, as shown in Diagram 7.1. If those points were collinear, it is easy to see that all the lines meet in a single point, and that point defines the slope and offset of the line. In practice, we may have many lines, so the technique is to divide up the m-c space into small areas and count the number of lines which cross each. The [m,c] value at the centre of the area with the largest number of lines is used as the estimate of the most likely line in Cartesian space. Unfortunately, if we consider all the possible lines that can appear in an image, the slope parameter m covers an infinite range. For that reason the [m,c] parameterisation is difficult to use.
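A small numerical illustration of the intersection property (the function name is hypothetical): for collinear points, the m-c lines obtained from any pair of points meet at the same [m,c].

```python
# For each edge point (xi, yi), the corresponding line in m-c space is
# c = -xi*m + yi.  Equating the lines from two points gives their
# intersection:
#   -x1*m + y1 = -x2*m + y2   =>   m = (y2 - y1) / (x2 - x1)

def mc_intersection(p1, p2):
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)   # undefined when x1 == x2 (vertical line)
    c = y1 - m * x1
    return m, c

# Three collinear points on y = 2x + 1: every pair meets at (2, 1).
pts = [(0, 1), (1, 3), (2, 5)]
print(mc_intersection(pts[0], pts[1]))  # (2.0, 1.0)
print(mc_intersection(pts[1], pts[2]))  # (2.0, 1.0)
```

The division by (x2 - x1) also shows the weakness noted above: as the Cartesian line approaches vertical, m grows without bound.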

 

A second, more useful parameterisation is shown in Diagram 7.2. A line can be represented by its shortest distance from the origin (r) and its orientation (θ). From this diagram we can obtain an equation for the line equivalent to equation 1:

 

r = x cos(θ) + y sin(θ)

(Eq. 2)

 

A mathematical interlude, perhaps of more use later than now:

 

To see what a point [xi,yi] corresponds to in the r-θ space, we make a substitution:

 

M = √(xi² + yi²)

 

This allows us to write equation 2 in the form:

 

r = M { cos(φ) cos(θ) + sin(φ) sin(θ) } = M cos(θ - φ)

 

where φ is a constant defined by:

 

cos(φ) = xi/√(xi² + yi²) , sin(φ) = yi/√(xi² + yi²)

 

So, this time a point in Cartesian space corresponds to a sine wave in r-θ space, and we are searching for the point where the majority of these sinusoids intersect (see Diagram 7.3).
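A quick numerical check of this interlude (the point (3, 4) and the sample angles are arbitrary):

```python
import math

# For a point (xi, yi), r(theta) = xi*cos(theta) + yi*sin(theta) is the
# same function as M*cos(theta - phi), with M = sqrt(xi^2 + yi^2) and
# phi defined by cos(phi) = xi/M, sin(phi) = yi/M, i.e. phi = atan2(yi, xi).

xi, yi = 3.0, 4.0
M = math.hypot(xi, yi)          # sqrt(3^2 + 4^2) = 5.0
phi = math.atan2(yi, xi)

for theta in [0.0, 0.7, 1.9, 3.1, 5.5]:
    direct = xi * math.cos(theta) + yi * math.sin(theta)
    sinusoid = M * math.cos(theta - phi)
    assert abs(direct - sinusoid) < 1e-12

print("r(theta) is a sinusoid of amplitude", M)
```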

 

Equation 2 can be considered a relation between the co-ordinates [x,y] of some point in the edge image, and the values of the parameters [r,θ] which define a line. As before, in any practical case, we are seeking not to identify a single intersection, but a cluster of intersections. We therefore must quantise the parameters into discrete values.

 

The main advantage of the r-θ parameterisation is that quantisation is easy. Looking at Diagram 7.3, it should be clear that not all the parameter space need be considered. The sinusoids all have the same period, and therefore we can limit θ to the range [0..2π] without loss of generality. The range can be divided into equal angles denoted:

 

θi, 2θi, 3θi, . . . , 2π

 

It is further possible to limit the range over which we consider r. If the edge image has resolution n×m, it is not necessary to consider values of r greater than √(n² + m²). As with θ, we can divide the range of r, [0..rmax], into a number of discrete steps:

 

0, ri, 2ri, 3ri, . . . , rmax

 

From Diagram 7.3 it is clear that only positive values of r need be considered if we are taking θ over the range [0..2π].

 

The Hough transform can now be computed. First we set up a histogram array in which the rows correspond to the quantised values of r, and the columns to the values of θ. For each edge point [xi,yi] of the image we can use equation 2 to compute a value of r for each θ in our range. This is then rounded to the nearest value represented in the array, and the corresponding entry in the histogram array is incremented. Thus each point in the edge map transforms into a number of points in the quantised [r,θ] space, or we can say that each point "votes" for a number of lines.

 

Once we have processed all the points in the edge image, we look through the histogram array, and each place which contains a sufficiently large number of votes defines the equation of a line in the image. Note that the accuracy and the computation time for the Hough transform are dependent on the size of the accumulator array actually used.
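The accumulation and peak-finding steps above can be sketched as follows. The quantisation counts, the function name, and the test line are illustrative choices; a real implementation would tune them to the image.

```python
import math

# A minimal sketch of the r-theta accumulation.  Quantisation: n_theta
# equal steps of theta over [0, 2*pi) and n_r steps of r over
# [0, r_max]; only positive r is kept, as in the text.

def hough_lines(edge_points, r_max, n_theta=180, n_r=100):
    accumulator = [[0] * n_theta for _ in range(n_r)]
    d_theta = 2 * math.pi / n_theta
    d_r = r_max / n_r
    for x, y in edge_points:
        for j in range(n_theta):
            theta = j * d_theta
            r = x * math.cos(theta) + y * math.sin(theta)   # equation 2
            if r >= 0:
                i = min(int(round(r / d_r)), n_r - 1)       # round to a bin
                accumulator[i][j] += 1                      # one vote
    return accumulator, d_r, d_theta

# Points on the horizontal line y = 5: the peak should sit near
# theta = pi/2, r = 5.
points = [(x, 5) for x in range(20)]
acc, d_r, d_theta = hough_lines(points, r_max=30.0)
votes, i, j = max((v, i, j) for i, row in enumerate(acc)
                  for j, v in enumerate(row))
print("r ~", round(i * d_r, 1), "theta ~", round(j * d_theta, 2))
```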

 

The Wallace Hough Transform

 

An interesting parameterisation for raster images has been suggested by Wallace. If the space which is to be transformed is already quantised, we can use that quantisation directly rather than introduce another. Hence, lines on a raster screen are defined in terms of the end points as shown in Diagram 7.4. The pixels on the edge of the boundary window are numbered anti-clockwise from the bottom left hand corner. The histogram is a triangular array, since the finish point can always be taken beyond the start point.

 

For each line in the space, it is possible to compute which pixels belong to it by using Bresenham's algorithm. This information can be used to determine the number of edge points that vote for the line. Notice that it is not necessary to fill a square accumulator array in this case: only a triangular array is needed, since there is a symmetry in the end points that can be exploited.
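The counting step can be sketched as below. The Bresenham routine is the standard integer line-drawing algorithm; the helper names and the toy edge set are illustrative.

```python
# For a candidate line defined by two boundary pixels, enumerate the
# pixels it passes through with Bresenham's algorithm, then count how
# many edge points lie on that path.

def bresenham(x0, y0, x1, y1):
    """Integer pixels on the segment from (x0, y0) to (x1, y1)."""
    pixels = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        pixels.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return pixels

def votes_for_line(edge_points, start, end):
    path = set(bresenham(*start, *end))
    return sum(1 for p in edge_points if p in path)

edges = {(0, 0), (2, 1), (4, 2), (5, 5)}
print(votes_for_line(edges, (0, 0), (4, 2)))  # 3
```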

 

Side Lobes and Bias

 

One inherent problem comes from the fact that each point in the edge map votes for more than one line in the parameter space. In fact, for the ideal case where we have one straight line, there will be one main peak in the histogram, and then several smaller peaks, reducing in magnitude as the distance from the main peak increases. These secondary peaks are called side lobes. They cause a problem when there are several lines in the image. The combined effect of the side lobes can be to create a false peak in the histogram.

 

A similar problem to side lobes is inherent bias in the parameterisation. This results from the fact that a finite space is used, rather than an infinite one. Using the r-θ method, it is easy to see that lines with small values of r in a finite image will usually be longer than those with larger values of r, which are more readily clipped by the edges of the image (Diagram 7.5). Hence, for large r, a histogram point for a significant line will have fewer votes than one with small r. Bias may be removed either by adopting non-linear quantisation of the parameter space, or by measuring the histogram for a random sample of edge points and normalising the image histogram accordingly.

 

Heuristics in the Hough Transform

 

Some workers have proposed heuristics for reducing the effects of noise and of side lobes. One common idea is to eliminate points from the parameter space if they do not agree with the edge point direction. Thus, for a point [xi,yi] with edge point direction φ(xi,yi), we assume that it will belong to a line with direction θ = 90° - φ(xi,yi), and we consider only points in the range θ ± Δθ, where Δθ is a small quantity corresponding to three or four quantised levels of θ. This appealing idea is really only successful if the image consists of well-defined straight lines, as, say, an engineering drawing or a photograph of the Barbican Centre does. Any more complex image will be over-filtered by this method.
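This restriction can be sketched as below. The bin count and the window of three steps follow the text; the function name is hypothetical, and in practice the edge directions would come from the gradient operator.

```python
import math

# Restricted voting: instead of voting over every quantised theta, an
# edge point votes only within a window of +/- `window` bins around the
# orientation predicted from its edge direction phi (theta = 90deg - phi).

def restricted_votes(point, edge_dir, n_theta=180, window=3):
    """Return the theta-bin indices the point is allowed to vote in."""
    d_theta = 2 * math.pi / n_theta
    theta_est = math.pi / 2 - edge_dir          # theta = 90 deg - phi
    centre = int(round(theta_est / d_theta)) % n_theta
    return [(centre + k) % n_theta for k in range(-window, window + 1)]

# An edge point on a horizontal boundary (edge direction phi = 0)
# should vote only near theta = pi/2, i.e. around bin 45 of 180.
print(restricted_votes((10, 5), edge_dir=0.0))
# [42, 43, 44, 45, 46, 47, 48]
```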