Lecture 15: Photometric Stereo
Last lecture we looked at stereo vision from two or more cameras, and the computational methods for matching points and reconstructing the three-dimensional scene. An alternative is to use one camera and one or more light sources to illuminate the object. By inverting the illumination equations, it is possible to obtain a three-dimensional reconstruction. It is, however, necessary to make assumptions about the surfaces being reconstructed. The method is appropriate for applications where the lighting can be easily controlled, such as a factory. We use gradient space for the exposition of the method, in which we assume that the surface normal at any point in 3D space can be written in the form [p,q,-1]. If we describe a surface by defining z in terms of x and y:
z = f(x,y)
and we seek to identify the surface (spatial) gradient at each pixel in the image we find that:
p = ∂f/∂x
q = ∂f/∂y
This information can be transformed into real z values by knowing one fixed point on the surface. Note that the p-q space can equally be defined with p = -∂z/∂x and q = -∂z/∂y, in which case the surface normal is [p,q,1].
Reflectance Map
The light emitted from a surface depends on the surface properties. Surfaces which are matt are usually assumed to obey Lambert's cosine law, which states that the emitted light is proportional to the cosine of the angle between the surface normal and the direction of the light source. This angle is normally called i, the incidence angle, as shown in diagram 15.1. Surfaces which have this property are called Lambertian, and we will confine ourselves to a discussion of them. More complex surfaces, with specular (mirror-like) properties, have illumination equations depending on the emittance angle e and the phase angle between the view direction and the light source direction, and require more complex analysis.
The reflectance map gives the relationship between the brightness and the gradient space parameters. Referring again to diagram 15.1 we know that the reflectance map is defined by:
R(p,q) = ρ I_inc n·s / (|n| |s|)
where I_inc is the incident light intensity and ρ is a surface constant called the albedo. To see the effect of this equation we need to make some simplifying assumptions. At any one pixel we can lump the incident light and the albedo into one constant. It is usual to take the light source as uniform, in which case we can choose to define s as a unit vector. Noting that the surface normal is defined, in the left-handed axis system, as [p,q,-1], we can then write:
R(p,q) = K (s_x p + s_y q - s_z) / √(p² + q² + 1)
which is called the reflectance equation, and is a second order equation in p and q. To visualise the function R it can be drawn as a series of contours R(p,q) = const. For the special case where s = [0,0,-1], i.e. the light direction is coincident with the viewing direction, the contours of constant R(p,q) are simply circles centred on the origin, as shown in diagram 15.2. Where the light direction is not coincident with the view direction, the equation for constant R is a general second order conic section, giving contours of the form of diagram 15.3.
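As a quick numerical illustration of this behaviour, the reflectance equation can be sketched in a few lines of Python (the helper name and the chosen values of K and s are our own, not from the lecture):

```python
import math

def reflectance(p, q, s, K=1.0):
    """Lambertian reflectance for a surface with normal [p, q, -1] and a
    distant unit light source s; K lumps the albedo and source intensity."""
    return K * (s[0] * p + s[1] * q - s[2]) / math.sqrt(p * p + q * q + 1.0)

# With the source along the viewing direction, s = [0, 0, -1], brightness
# depends only on p^2 + q^2, so the iso-brightness contours are circles
# centred on the origin of the p-q plane.  These two gradients both have
# p^2 + q^2 = 0.25, and so give the same brightness:
print(reflectance(0.3, 0.4, (0.0, 0.0, -1.0)))
print(reflectance(0.5, 0.0, (0.0, 0.0, -1.0)))
```

For a tilted source the two values would differ, which is exactly why a single image cannot pin down (p,q) on its own.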
If we now take an image pixel, and ask what its p and q values are, we see that any single intensity corresponds to a contour in the p-q space. Consider the case where we have a fixed object, and we have three independent light sources. If we measure the light intensity at one of the pixels for each light source, then we will obtain three contours in the p-q space, and there will be exactly one point where the contours all coincide, as shown in diagram 15.4. This intersection gives the actual p-q value for the pixel.
In the case of simple Lambertian surfaces, an analytical solution for p and q can be found. At first sight it may look as if there is redundancy in the system, since three light sources are used to determine just the two unknowns p and q. However, in general we do not know the albedo ρ or the incident light intensity, and so these must be eliminated. The easiest way to do this is to take the ratio of the light intensities. Using subscripts to indicate the three different light sources we have that:
R1(p,q)/R2(p,q) = (s_x1 p + s_y1 q - s_z1)/(s_x2 p + s_y2 q - s_z2)
and
R1(p,q)/R3(p,q) = (s_x1 p + s_y1 q - s_z1)/(s_x3 p + s_y3 q - s_z3)
which, after cross-multiplying, are two linear equations in p and q that can be solved easily.
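The solution can be sketched directly: cross-multiplying each ratio gives a 2x2 linear system, solvable by Cramer's rule. The sketch below (function names and light directions are illustrative, not from the lecture) recovers (p,q) from three simulated intensities:

```python
import math

def solve_pq(I, S):
    """Recover the gradient (p, q) at one pixel from three intensities
    I = (I1, I2, I3) measured under three known unit source directions
    S = (s1, s2, s3).  Cross-multiplying the ratios I1/I2 and I1/I3
    eliminates the albedo and incident intensity, leaving two linear
    equations in p and q."""
    I1, I2, I3 = I
    s1, s2, s3 = S
    # From I1 * (s2 . [p, q, -1]) = I2 * (s1 . [p, q, -1]):
    a1 = I1 * s2[0] - I2 * s1[0]
    b1 = I1 * s2[1] - I2 * s1[1]
    c1 = I1 * s2[2] - I2 * s1[2]
    # From I1 * (s3 . [p, q, -1]) = I3 * (s1 . [p, q, -1]):
    a2 = I1 * s3[0] - I3 * s1[0]
    b2 = I1 * s3[1] - I3 * s1[1]
    c2 = I1 * s3[2] - I3 * s1[2]
    # Solve the 2x2 system a*p + b*q = c by Cramer's rule.
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

def brightness(p, q, s):
    # Lambertian reflectance with the constant K lumped to 1.
    return (s[0] * p + s[1] * q - s[2]) / math.sqrt(p * p + q * q + 1.0)

# Synthetic check: a surface patch with p = 0.5, q = -0.3, three unit sources.
S = ((0.0, 0.0, -1.0),
     (0.5, 0.0, -math.sqrt(3) / 2),
     (0.0, 0.5, -math.sqrt(3) / 2))
I = tuple(brightness(0.5, -0.3, s) for s in S)
p, q = solve_pq(I, S)   # recovers (0.5, -0.3)
```

Note that the unknown constant K cancels in the ratios, so the same (p,q) is recovered whatever the albedo or source intensity.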
This method has several limitations. The most serious is that the light sources must be uniform, or at least a large distance from the object. The second restriction is that the surfaces should be Lambertian. In practice all surfaces will have some specular content, which complicates the equations. Other formulations have been made in which close point sources of light can be utilised. Often graphical solutions, rather than analytical ones, are the only possibility.
Global methods of extracting shape from shading
The above method is defined as local, since it calculates each pixel independently. A different method, called the global method, has been formulated, in which only one light source is used, but a relationship is established between adjacent pixels. It is usually assumed that the objects are smooth, and this smoothness condition is formulated in a relaxation process for estimating [p,q].
The method starts from a pixel (x,y), for which it is assumed that an estimate of the surface gradient (p,q) has been made. There will, in general, be an error between the measured intensity at that pixel, I(x,y), and the intensity estimated from the reflectance map, R(p,q). We therefore want to find a process that minimises (I(x,y) - R(p,q))². We know from the definition of the reflectance map that there is no unique solution to this minimisation problem, and therefore we need to add a further constraint. This is where the smoothness comes in. We can get a measure of smoothness by taking the derivatives of p and q. To prevent derivatives of differing sign cancelling, the smoothness function is taken to be a sum of squares:
S(p,q,x,y) = (∂p/∂x)² + (∂p/∂y)² + (∂q/∂x)² + (∂q/∂y)²
Thus, we can write an error function to minimise as:
E(p,q,x,y) = (I(x,y) - R(p,q))² + λ S(p,q,x,y)
where λ is a factor setting the relative importance of the two quantities to be minimised. To solve the global problem we sum the error over the whole image:
E = ∫∫ E(p,q,x,y) dx dy
To find the minimum of the function E we need to differentiate it and set the result to zero. Unfortunately this is not a straightforward process, since p and q are themselves derivatives with respect to x and y. The minimisation is carried out by a method called the calculus of variations, originally developed by Lagrange and Euler. (Descriptions of the process may be found in Robot Vision by B. Horn, and other texts; we will not go through it here.) The result is the following two equations:
p = p_av + (1/λ) [I(x,y) - R(p,q)] ∂R/∂p
q = q_av + (1/λ) [I(x,y) - R(p,q)] ∂R/∂q
where p_av and q_av are the averages of the four values of p and q at the neighbouring pixels [x+1,y], [x-1,y], [x,y+1] and [x,y-1].
This now gives us the basis of a relaxation process. We estimate an initial [p,q] value for each pixel, then compute an update using the two equations in the form:
p^(k+1) = p_av^(k) + (1/λ) [I(x,y) - R(p^(k), q^(k))] ∂R^(k)/∂p
In order to make this algorithm work correctly, it is necessary to specify some boundary conditions. For example, we do not wish the relaxation process to cross an occluding contour, since at those points the smoothness criterion does not apply.
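A minimal sketch of the relaxation scheme is given below, assuming a source along the view direction so that R(p,q) = 1/√(p² + q² + 1); the function names and the choice of λ are our own. Boundary pixels are simply held fixed, a crude stand-in for the boundary conditions discussed above:

```python
import math

def R(p, q):
    # Reflectance map for a source along the view direction, with
    # constants lumped to 1: R(p, q) = 1 / sqrt(p^2 + q^2 + 1).
    return 1.0 / math.sqrt(p * p + q * q + 1.0)

def dR_dp(p, q):
    return -p / (p * p + q * q + 1.0) ** 1.5

def dR_dq(p, q):
    return -q / (p * p + q * q + 1.0) ** 1.5

def relax(I, p, q, lam=10.0, iters=80):
    """Apply the update p = p_av + (1/lam)(I - R) dR/dp (and likewise
    for q) to every interior pixel of an n x n grid; the boundary values
    of p and q are held fixed, acting as the boundary condition."""
    n = len(I)
    for _ in range(iters):
        pn = [row[:] for row in p]
        qn = [row[:] for row in q]
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                pav = (p[i+1][j] + p[i-1][j] + p[i][j+1] + p[i][j-1]) / 4.0
                qav = (q[i+1][j] + q[i-1][j] + q[i][j+1] + q[i][j-1]) / 4.0
                err = I[i][j] - R(p[i][j], q[i][j])
                pn[i][j] = pav + err * dR_dp(p[i][j], q[i][j]) / lam
                qn[i][j] = qav + err * dR_dq(p[i][j], q[i][j]) / lam
        p, q = pn, qn
    return p, q
```

On a synthetic image of a smooth surface with the true gradients supplied along the boundary, the interior estimates relax towards the true p-q values.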
Local Linear Shape from Shading
An ingenious algorithm for reconstructing p-q values, using only one light source located at the same place as the viewpoint, was devised by Haroon Rashid (PhD thesis at IC). It is based on combining the reflectance equation with its first derivative. Starting again from the reflectance equation:
R(p,q) = ρ I n·s / (|n| |s|)
where I is the incident light intensity at the object point. We note that if the light source is coincident with the camera, and the coordinates we are using are centred at the camera, then the light source vector can be written as
s = [x,y,f]
where x and y are the pixel coordinates of the point we are looking at, and f is the focal length of the camera expressed in pixel units. However, since the light source is now close to the object, we need to take into account the fall off of the light with distance from the object. This can be expressed as:
I = K / (s·s)
Thus lumping the constants together,
R(p,q) = C n·s / (|n| |s|³)
This yields:
R(p,q) = C (px + qy - f) / {√(p² + q² + 1) (x² + y² + f²)^(3/2)}
Since we are considering just one pixel, we can treat √(p² + q² + 1) as a constant, yielding:
R(p,q) = D (px + qy - f) / (x² + y² + f²)^(3/2)
If we differentiate this with respect to x
∂R/∂x = P_x(R) = D p / (x² + y² + f²)^(3/2) - 3 D (px + qy - f) x / (x² + y² + f²)^(5/2)
= D {p (x² + y² + f²) - 3 (px + qy - f) x} / (x² + y² + f²)^(5/2)
dividing this by the actual reflectance we get
P_x(R)/R = p / (px + qy - f) - 3x / (x² + y² + f²)
Differentiating with respect to y gives another similar equation. We know x, y and f (the pixel coordinates and the focal length), and all the lumped constants (D), which included the distance of the point from the camera, the light source intensity and the local surface albedo, have been eliminated. We can obtain R(p,q) as the pixel intensity, and its derivatives from the Sobel (or a similar) operator, so we are left with just a pair of linear equations to solve in p and q. This means that the p-q map can be calculated very fast. The problem is that estimating the gradient requires looking at the neighbouring pixels, and so the method is subject to error where the surface albedo changes.
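The per-pixel solve can be sketched as follows. The function name is our own, and for the synthetic check we use the analytic derivatives of R in place of the Sobel estimates that would be used on a real image:

```python
def rashid_pq(R, Rx, Ry, x, y, f):
    """Solve for (p, q) at one pixel given its intensity R and image
    gradients (Rx, Ry), for a point source at the camera centre.
    Rearranges Px(R)/R = p/(px + qy - f) - 3x/(x^2 + y^2 + f^2), and the
    analogous equation in y, into two linear equations in p and q."""
    w = x * x + y * y + f * f
    u = Rx / R + 3.0 * x / w          # u = p / (px + qy - f)
    v = Ry / R + 3.0 * y / w          # v = q / (px + qy - f)
    # Rearranged: p = u(px + qy - f) and q = v(px + qy - f), i.e.
    #   (1 - ux) p - uy q = -uf
    #   -vx p + (1 - vy) q = -vf
    a1, b1, c1 = 1.0 - u * x, -u * y, -u * f
    a2, b2, c2 = -v * x, 1.0 - v * y, -v * f
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

# Synthetic check: a pixel with known gradient (p, q) = (0.3, -0.2), using
# the analytic derivatives of R (in practice these come from the image).
x, y, f = 10.0, 20.0, 500.0
p0, q0 = 0.3, -0.2
w = x * x + y * y + f * f
num = p0 * x + q0 * y - f
Rv = num / w ** 1.5
Rx = p0 / w ** 1.5 - 3.0 * num * x / w ** 2.5
Ry = q0 / w ** 1.5 - 3.0 * num * y / w ** 2.5
p, q = rashid_pq(Rv, Rx, Ry, x, y, f)   # recovers (0.3, -0.2)
```

The constant D cancels in the ratio Rx/R, which is why neither the albedo nor the source intensity appears in the solve.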