Lecture 5: Active Contours
In the last lecture we considered how to obtain an edge map from a raster image. The edge map consists of, for each pixel in the image, a specification of the gradient magnitude and direction. Edge points are in themselves not very useful entities, and so we now consider some methods of joining them up to form closed boundaries.
Graph Searching
The simplest, and also the least effective method of grouping edges is to use heuristic search. That is to say, the algorithm starts from a single boundary pixel, and then attempts to join neighbouring pixels on the basis of their edge strength and direction. At the end of the process the boundary contours are thinned by removing pixels at points where it is more than one pixel thick. This strategy is greatly enhanced if use is made of the edge magnitude and direction. For a given image, let the gradient magnitude be specified by g(x,y) and the gradient direction by angle q(x,y). We can immediatly define heuristics based on gradient to indicate when two adjacent pixels can be joined. A threshold, f, can be defined and a restriction introduced that two adjacent pixels can only be joined if: ( q(xi,yi) - q(xj,yj) ) mod 2p < f
A second heuristic is to take into account the absolute value of the gradient. When two or more pixels are joined, a curve is created which has a tangent defined at each pixel. Looking at Diagram 5.1 we only consider joining arcs to pixels if the implied curve gradient is orthogonal to the edge point direction. This method does work well when the boundaries being recovered are straight lines from, for example, an old engineering drawing. However, it is a poor heuristic for shaded images as we shall see later.
Heuristic restrictions allow us to reduce the problem to the search of a directed graph, in which the pixels are the nodes, and the arcs are those adjacencies allowable under the above restrictions. Now a boundary determination can be made by weighting the nodes according to their gradient magnitude. For any two pixels for which there is more than one path, a graph search can be initiated to find the most likely path (Diagram 5.2). A third heuristic can be used, which is that, given two competing paths, the shorter is the more likely. A weighting factor is defined based on the gradient. If the maximum gradient available is gmax, a suitable weighting function is:
w(x,y) = g(x,y)/gmax
NB the gradient magnitude is always in the range 0 to gmax, so the weight simply gives a measure between 0 and 1 as to how likely the pixel is to be a boundary point. The problem now becomes a simple cost minimisation one which can be solved by any of the standard search techniques (eg depth first search with backtracking and keeping as many expanded paths as memory allows, branch and bound etc.). Other heuristic measures such as the difference in gradient, or the direction of the target node can also be incorporated.
Dynamic Programming Solution
Boundary following can be considered an optimisation problem which can be solved by a variety of techniques. One early method suggested by Ballard is to use dynamic programming. To use optimisation techniques we need to associate a function to otptimise with a boundary. A simple solution is to sum the differences in gradient between adjacent pixels and subtract the edge strengths around the boundary hence we are seeking to minimise:
n |
||
f(p1,p2, . . pn) = |
S |
{ ( |q(pi-1) - q(pi)| ) - a g(pi)}/n |
i=1 |
where, the pi are the boundary points [xi,yi] with the constraint that pi is adjacent to pi+1. For a closed boundary, we have that p1=pn, and for a boundary segment between p1 and pn we define q(p0)=q(p1). a is a constant defining the relative importance of the two criteria. The problem is now to find the boundary which minimises this function. In practice, we shall use the method to join up boundary points rather than to search for a complete boundary. Applied without any a-priori knowledge, the method comes down to exhaustive search. Starting from some pixel p1=ps, we compute a value of f(ps,p2) for p2 equal to each of its possible eight neighbours. The set of all possible paths of length 3 pixels beginning from the start, ps, fall in the square of 25 pixels around ps. For each of these we compute a value for f(ps,p2,p3). Then we proceed to the paths of length 4 pixels, which can go from ps to any pixel in the square of 49 pixels around ps. In total, for a path of length N, there are 1 + 4 N (N+1) pixels in the square around the start pixel. For each of these pixels, to compute the maximum of a path of length n, we need only consider the paths of length n-1 to its eight neighbours, or three neighbours for a pixel on the edge of the square. Note also we must continue until the square includes the end point, and possibly goes some way beyond it. Clearly, this process will soon become too expensive in both memory requirements (since we must store all of the optimal paths found) and computing time. It should be noted that the dynamic programming method merely means that we store one optimal path to each end pixel, rather than all the non looping paths.
It is possible to limit the search space by imposing constraints, as shown in diagram 5.3. For example that the direction does not change by more than 45o at each pixel. If this is the case, then there are only three possible successors to each candidate pixel when boundary following. Another possibility is to prune the search tree. Thus when evaluating paths of a certain length, we eliminate those whose maximum is too small, or those whose distance from the end point is too large. Limitations of this kind, like heuristics, will mean that we cannot guarentee that the path found is an optimal one.
Snakes
A more recent idea is to use the geometric properties of the curve in the function to be minimised. For example, we may wish to constrain the curve so that it has no sharp changes (first order discontinuities), or, in other words, that it is its curvature is low. This can be estimated from the second differential. If we write the curve (using b for boundary) as a parametric equation: b(µ) ={X(µ),Y(µ)} then we write the second differential as b''(µ) = {¶ 2x/¶ µ2,¶ 2y/¶ µ2} and we can add the magnitude of the second differential vector (or its square) to our objective function, which now becomes:
n |
||
f(p1,p2, . . pn) = |
S |
{ ( |q(pi-1) - q(pi)| ) - a g(pi) + b |b''(pi)|}/n |
i=1 |
The second differential b'' could be found from the positions of the two neighbours of a pixel, though, at this resolution, this is rather a crude measure. (See Diagram 5.4).
Boundary refinement
The computational demands of the method make it only suitable for filling up the gaps in a partially completed boundary. Another idea is to refine a given boundary segment. We can again use the function which we choose in the dynamic programming method, but instead of searching the complete space we use only a few control points and minimise the function for these. The strategy is to consider each control point in turn and move it to the pixel in its local neighbourhood which gives us the minimum. (See diagram 5.5) In order to do this some strategy must be adopted for moving the control points in the event of their being no or little other information in the image. For a closed boundary we could make the inital estimate surround the object of interest, and add in another term to the objective function to penalise the total length. Thus, the contour will close like an elastic band around the object until the high gradient term stops the process. If a gap exists then a control point could drift into the object until stopped by the continuity condition. A difficulty with this type of strategy is that the control points can drift apart, and so another term, which preserves continuity, is to penalise badly distributed points. A possible term to do this is the sum over all control points of |dave - |pi-pi+1||. Various authors have suggested variants of this which work more or less sucessfully in different applications.
Once a good approximation has been found for the control points a divide and conquer algorithm could be invoked (diagram 5.6). Starting with the straight line joining the two end points of a segment of the boundary, we pick its midpoint, and search its neighbourhood until we find the optimum pixel. This point is then taken as a boundary point, and the two resulting segments are refined, until sufficient accuracy is obtained, or no suitable edge points can be found. Alternatively, we could then use dynamic programming type search to fill the gaps.
Types of edges
The success of the tracing algorithm will be dependent, to some extent, on the type of edges that describe the object. If they are clearly defined, such as digitizations of black and white objects with thick lines, then simple tracing will produce the correct results. However, if there are degrees of shading in the object, together with some random noise components, then it is not clear where to take the edge point thresholds, and the resulting boundaries from simple tracing will show gaps, which can be filled by dynamic programming or other optimisation methods.
Edge tracing methods do depend on there being simple boundaries with a proportion of edge points that can be determined easily. In some cases the initial edge points could be identified by an operator, and the boundaries found using dynamic programming or divide and conquor. It is however rather harder to find these seed edge points automatically. Thresholding tends to eliminate vital information, and the ordering of points such that they can be joined by boundary segments is a non trivial problem. For pictures which contain many objects, with lots of detail, there is the further problem of deciding which boundary segment goes with which object. Consequently, these types of method often fail.