Lecture 14: Computational Stereo

We turn now to the question of constructing a three dimensional representation of a scene from one or more two dimensional images. General methods which work from a single two dimensional image have been published, but unfortunately they are only effective for very simple scenes in which the edges are well defined lines and the surfaces planar. In reality we are faced with arbitrarily shaped boundaries and incomplete image information. To deal with these problems a variety of methods has been developed in which the three dimensional geometry of a scene is reconstructed from at least two images. The majority of such techniques depend on obtaining two images from different, but known, camera positions.

There is an analogy between two camera stereo systems and the binocular nature of natural vision systems. However, the analogy can be overstated. Humans can function perfectly well with one eye, and, perhaps more surprisingly, a frog's ability to catch flies is only slightly impaired if it is blinded in one eye. This suggests that much of our three dimensional information may be derived from learned context, or from eye and head movements. We noted earlier the major advantage human vision gains by centring the object of interest in the field of view before recognition. Perhaps it is better to consider computational stereo as a compensating capability of artificial systems.

 

Stereo imaging geometry

We will concentrate on two camera methods, the principle of which is shown in Diagram 14.1. The two camera positions (centres of projection) are labelled by position vectors Cl and Cr. The problem is to measure, in the three dimensional co-ordinate system, the unit direction vectors pl and pr from the centres of projection to the point on the object whose position vector P we wish to find. The direction is known in each camera co-ordinate system, since the pixel co-ordinates and the focal length are known. Consider the left hand camera, and suppose that the camera co-ordinate system is defined by unit vectors (u,v,w) expressed in the real world co-ordinate system that we are using. We will take the direction of view to be the w direction. Suppose further that a projected point in one of the cameras has real world position Q. Looking at Diagram 14.2 we see that the co-ordinates of the point, as measured in the camera co-ordinate system, are:

((Q-Cl).u, (Q-Cl).v, (Q-Cl).w)

We can solve for Q simply by equating these co-ordinates with the measured pixel co-ordinates in the camera:

Xpix = (Q-Cl).u,  Ypix = (Q-Cl).v,  f = (Q-Cl).w

where f is the focal length of the camera in pixel units. Thus we obtain a simple set of linear equations which we can solve for the co-ordinates Q = (x, y, z):

x ux + y uy + z uz = Cl.u + Xpix

x vx + y vy + z vz = Cl.v + Ypix

x wx + y wy + z wz = Cl.w + f

Having solved the above equations for the co-ordinates of Q, we now note that:
pl = (Q-Cl)/|(Q-Cl)|

By a similar process we can calculate pr.
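As a concrete illustration, here is a minimal sketch of this calculation in Python (assuming numpy; the function and variable names are illustrative, not part of the notes):

import numpy as np

def view_direction(C, u, v, w, x_pix, y_pix, f):
    # Rows of the system matrix are the camera basis vectors u, v, w;
    # the right-hand sides follow the three linear equations above.
    A = np.array([u, v, w], dtype=float)
    b = np.array([np.dot(C, u) + x_pix,
                  np.dot(C, v) + y_pix,
                  np.dot(C, w) + f])
    Q = np.linalg.solve(A, b)        # real world co-ordinates of the projected point
    d = Q - np.asarray(C, dtype=float)
    return d / np.linalg.norm(d)     # the unit direction vector (pl or pr)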

Looking again at the absolute co-ordinate system we see that we can now write the vector equation:

a pl - b pr = Cr - Cl

which we can solve in Cartesian co-ordinates for the scalars a and b, yielding the absolute co-ordinates of P as Cl + a pl (equivalently Cr + b pr). Note that the vector equation gives three scalar equations in the two unknowns a and b; with real, noisy measurements the two rays will not intersect exactly, so in practice a least-squares solution is taken.
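A minimal sketch of this triangulation (again assuming numpy, with illustrative names):

import numpy as np

def triangulate(Cl, Cr, pl, pr):
    Cl, Cr, pl, pr = map(np.asarray, (Cl, Cr, pl, pr))
    # Solve a*pl - b*pr = Cr - Cl: three equations in the two unknowns a, b.
    A = np.column_stack([pl, -pr])
    rhs = Cr - Cl
    (a, b), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    # Return the midpoint of closest approach of the two rays as P.
    return 0.5 * ((Cl + a * pl) + (Cr + b * pr))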

There are ways in which this problem can be simplified for practical systems. For example, a typical arrangement places the cameras at the same height with parallel directions of view. In this case the cameras are termed ‘in correspondence’, and the resulting geometry is much simpler: the reconstruction reduces to two two dimensional calculations.
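To make the ‘in correspondence’ case concrete (this is the standard disparity result, stated here without derivation): if the optical axes are parallel and the centres of projection are separated by a baseline B along the image x axis, a point at depth z appears at pixel x co-ordinates xl and xr with

d = xl - xr = f B / z,   so that   z = f B / d

where f is again the focal length in pixel units and d is called the disparity. Depth recovery then reduces to measuring the disparity along corresponding scan lines.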

 

Epipolar lines

The above analysis is fine as long as a match can be made between corresponding points in the two images. The difficulty of point matching varies with the problem in hand. In some applications there may be obvious feature points; in others a number of possibilities may exist. One construction that helps to match points is the epipolar line. Consider the projection line from a feature point to one of the image planes. This line projects into a line on the other image plane, called the epipolar line, and the corresponding feature point in the second image must lie on that line.

Computation of the epipolar line is simple, since the line is the intersection of the plane through the two centres of projection and the feature point with the image plane of the second camera. The arrangement is shown in Diagram 14.3. We know the vector (Cr-Cl), and we can calculate the direction vector pl by the method above. Since both of these vectors lie in the plane, the unit normal vector to the epipolar plane is:

n = pl x (Cr-Cl) / |pl x (Cr-Cl)|

and the plane equation is:

n. (P-Cl) = 0

n . P = n . Cl

where P = [x,y,z] is any point on the plane. A similar equation can be written down for the right image plane, whose unit normal is dr, the right camera's direction of view:

dr . (P-(Cr + f dr)) = 0

dr . P = dr . Cr + f

where f is the scalar distance from the centre of projection to the image plane, as defined above.

The line of intersection can be found by standard vector methods. Its direction is given by n x dr, and a point L on the line can be found by taking the Cartesian equations of the two planes, choosing a convenient value of z, say z=0, and solving for x and y. Notice that the epipolar line may have a vanishing point, or indeed may not appear in the image at all (Diagram 14.4).
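The whole construction can be sketched as follows (assuming numpy; dr is the unit viewing direction of the right camera, and the names are illustrative):

import numpy as np

def epipolar_line(Cl, Cr, pl, dr, f):
    Cl, Cr, pl, dr = map(np.asarray, (Cl, Cr, pl, dr))
    # Unit normal to the epipolar plane through Cl, Cr and the feature point.
    n = np.cross(pl, Cr - Cl)
    n = n / np.linalg.norm(n)
    # The line lies in both planes, so its direction is n x dr.
    direction = np.cross(n, dr)
    # A point on the line: set z = 0 in the two plane equations
    #   n.P = n.Cl   and   dr.P = dr.Cr + f
    # and solve for x and y (assumes this 2x2 system is non-singular).
    A = np.array([[n[0], n[1]], [dr[0], dr[1]]])
    b = np.array([np.dot(n, Cl), np.dot(dr, Cr) + f])
    x, y = np.linalg.solve(A, b)
    return np.array([x, y, 0.0]), direction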

Having found the epipolar line, it is now possible to derive a matching algorithm for feature points:

Compute and list all feature points in each image
(eg vertices of buildings etc.)

Repeat
    for each feature point in the left image
    {
        calculate the epipolar line in the right image;
        if the epipolar line passes through exactly one feature point
        then match those points and remove them from both lists
    }
Until no feature point can be matched uniquely
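A sketch of this loop in Python (the helper arguments epipolar_line_in_right and on_line are assumptions, not defined in these notes):

def match_features(left_pts, right_pts, epipolar_line_in_right, on_line):
    # epipolar_line_in_right(p): the epipolar line in the right image
    #     for left feature point p (assumed helper);
    # on_line(q, line): whether right point q lies on that line,
    #     within some pixel tolerance (assumed helper).
    matches = []
    progress = True
    while progress:                          # Repeat ... Until
        progress = False
        for p in list(left_pts):
            line = epipolar_line_in_right(p)
            candidates = [q for q in right_pts if on_line(q, line)]
            if len(candidates) == 1:         # line meets exactly one point
                matches.append((p, candidates[0]))
                left_pts.remove(p)
                right_pts.remove(candidates[0])
                progress = True
    return matches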

 

If unmatched feature points remain, prior knowledge must be used to separate them. For simple scenes the algorithm may successfully match all the feature points; if not, further information such as colour or brightness, or some reasoning about the scene, can be used. One simple heuristic is that points are most likely to lie at a depth similar to that of their neighbours, or of points to which they are closely connected.

 

Interest Operators

The stereo technique described above has been applied successfully in robot vision and in vision systems for manufacturing, where the images can often be reduced to line drawings comparatively easily. For more general scenes the matching problem becomes more acute. One approach to such problems is to apply interest operators, which select pixels that are good candidates for matching in the two images. These can be based on gradient, colour and so on, and applied quite generally. The most widely used is the Moravec interest operator, which measures how ‘distinct’ a piece of the image is from its surround. Suppose that we take a local variance measure for each pixel using:

VL(pi) = Σ {I(pi) - I(pj)}^2        (summing over all j neighbouring i)

Now the interest operator is computed as:

IO(pi) = min {VL(pi), VL(pj)}        (minimum over all j neighbouring i)

ie the minimum of the variance of the pixel itself and of its neighbours. Finally, all values of the interest operator that are not local maxima are set to zero:

IO(pi) := 0  if  IO(pi) < IO(pj) for any j neighbouring i

The points with non-zero interest operator values are then taken as the set to match. The minimum operation removes isolated points which could have high variance due to noise alone.
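As an illustration, a direct (unoptimised) sketch of the three steps (assuming numpy and 4-connected neighbours; border pixels are simply skipped):

import numpy as np

def interest_points(I):
    I = I.astype(float)
    h, w = I.shape
    nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # 4-connected neighbours
    VL = np.zeros((h, w))
    IO = np.zeros((h, w))
    # Step 1: local variance measure VL(pi).
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            VL[r, c] = sum((I[r, c] - I[r + di, c + dj]) ** 2
                           for di, dj in nbrs)
    # Step 2: minimum of VL over the pixel and its neighbours.
    for r in range(2, h - 2):
        for c in range(2, w - 2):
            IO[r, c] = min(VL[r + di, c + dj] for di, dj in nbrs + [(0, 0)])
    # Step 3: suppress values that are not local maxima of IO.
    out = IO.copy()
    for r in range(2, h - 2):
        for c in range(2, w - 2):
            if any(IO[r, c] < IO[r + di, c + dj] for di, dj in nbrs):
                out[r, c] = 0.0
    return out        # non-zero pixels are the candidate match points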