Lecture 12: The 2D Fourier Transform

We now look at the Fourier transform in two dimensions. The equations are a simple extension of the one-dimensional case, and their proof rests, as before, on the orthogonality of the sine and cosine functions.

 

F(u,v) = (1/MN) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x,y) exp(-2πjux/M) exp(-2πjvy/N)

f(x,y) = Σ_{u=0}^{M-1} Σ_{v=0}^{N-1} F(u,v) exp(2πjux/M) exp(2πjvy/N)
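As a concrete illustration of these equations, here is a minimal sketch (in Python with NumPy, an assumption of these notes) that evaluates the forward transform directly from the double sum. Note that NumPy's np.fft.fft2 omits the 1/MN factor on the forward transform, so we divide by MN to match the convention above.

import numpy as np

def dft2_direct(f):
    """Forward 2D DFT computed straight from the double summation (slow, O((MN)^2))."""
    M, N = f.shape
    x = np.arange(M).reshape(M, 1)                     # spatial index x (rows)
    y = np.arange(N).reshape(1, N)                     # spatial index y (columns)
    F = np.zeros((M, N), dtype=complex)
    for u in range(M):
        for v in range(N):
            F[u, v] = np.sum(f * np.exp(-2j * np.pi * u * x / M)
                               * np.exp(-2j * np.pi * v * y / N))
    return F / (M * N)                                 # the 1/MN factor in the equation above

# np.fft.fft2 computes the same sum, but without the 1/MN normalisation
f = np.random.rand(8, 8)
assert np.allclose(dft2_direct(f), np.fft.fft2(f) / f.size)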

 

Again, a fast algorithm (the FFT) is available for computing this transform, provided that M and N are powers of 2. In fact, a two-dimensional transform can be separated into a series of one-dimensional transforms. In other words, we transform each horizontal line of the image individually to yield an intermediate form in which the horizontal axis is frequency (u) and the vertical axis is space (y). We then transform each vertical line of this intermediate image individually to obtain each vertical line of the transformed image. Hence a two-dimensional transform of an n by n image consists of 2n one-dimensional transforms (see Diagram 12.1).

The same assumption used in the 1D transform is made, namely that the M samples in x represent a fundamental period which is repeated ad infinitum, and similarly for the N samples in y. Hence the transform is of an infinite matrix of 'tiles', each a copy of the image, as shown in Diagram 12.2.

The difficulties encountered in understanding the one-dimensional transform are more than doubled, since the transform is now a bivariate complex function. We have a real part (the cosine component) and an imaginary part (the sine component), each a function of u and v, giving in total a four-dimensional space to visualise. For this reason, most illustrative examples are confined to the magnitude of the transform.
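The separation into one-dimensional transforms can be verified directly: transforming every horizontal line with a 1D FFT and then every vertical line of the result gives the same answer as a single 2D FFT. A minimal NumPy sketch (the function name is illustrative):

import numpy as np

def fft2_rows_then_columns(f):
    """2D transform as a series of 1D transforms: every row first, then every column."""
    intermediate = np.fft.fft(f, axis=1)     # horizontal lines: x becomes frequency u
    return np.fft.fft(intermediate, axis=0)  # vertical lines: y becomes frequency v

f = np.random.rand(16, 16)
assert np.allclose(fft2_rows_then_columns(f), np.fft.fft2(f))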

Before we turn to applications of the two-dimensional transform, two important properties should be noted. First, the magnitude of the transform is symmetric through the origin. (This is not true of the sine component, which is antisymmetric.) Secondly, rotation of the image in the x-y plane about the origin causes a corresponding rotation in the u-v plane. This property can be shown by substituting

x := x cos φ - y sin φ

y := x sin φ + y cos φ

into the transform equation and manipulating the exponentials into the canonical form. This second property applies to the magnitude and to both the cosine and sine components.
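The first property, the symmetry of the magnitude through the origin, can be checked numerically for any real image: F(-u,-v), with the indices taken modulo M and N, is the complex conjugate of F(u,v), so their magnitudes are equal. A small sketch, assuming NumPy and a real-valued image:

import numpy as np

f = np.random.rand(32, 32)                     # any real image
F = np.fft.fft2(f)

# Build F(-u, -v), with the indices taken modulo M and N
F_neg = np.roll(F[::-1, ::-1], 1, axis=(0, 1))

assert np.allclose(F_neg, np.conj(F))          # conjugate symmetry for real images
assert np.allclose(np.abs(F_neg), np.abs(F))   # hence the magnitude is symmetric through the origin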

Windows

One of the fundamental properties that we used in the proof of the one-dimensional transform is that the signal is periodic. When we take the transform of an image, however, this assumption clearly does not hold: in practice there will always be a discontinuity between the two ends of each line of the image. In frequency terms, this discontinuity leads to the appearance of spurious high-frequency components in the spectrum. The usual way round the problem is to taper the image intensities so that they fall off to zero at the edges. One way to achieve this is to multiply the image by a Gaussian (yes, wherever you go in computer vision you cannot avoid the dreaded Gaussian) with its maximum at the centre, and the standard deviation set so that the window is effectively zero at the edges of the image. This is a Gaussian window; the better-known Hamming window achieves a similar tapering with a raised cosine.

The justification for using a Gaussian lies in the important property that the continuous Fourier transform of a Gaussian is a Gaussian. (Note that the continuous transform is defined over the space from -∞ to +∞, so the Gaussian can be considered periodic over that space.) The variance is inverted by the transform, so a broad Gaussian transforms into a narrow Gaussian and vice versa, as shown in Diagram 12.3. Notice that a constant (DC) level in the spatial domain can be considered a Gaussian of infinite variance, and transforms to a point at the origin in the frequency domain. Using the convolution theorem introduced in the last lecture, we see that multiplication in the spatial domain by a wide Gaussian is equivalent to convolution in the frequency domain with a narrow Gaussian, which amounts to smoothing the spectrum. Hence, it is argued, we can remove the effects of the non-periodicity without changing the frequency information significantly.
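A sketch of the tapering step, assuming NumPy; the image size and the choice of standard deviation (one sixth of the image width, giving a window that is small but not exactly zero at the edges) are illustrative rather than prescribed here:

import numpy as np

def gaussian_window(rows, cols, sigma_fraction=1/6):
    """Separable 2D Gaussian taper: maximum 1.0 at the centre, small at the edges."""
    r = np.arange(rows) - (rows - 1) / 2
    c = np.arange(cols) - (cols - 1) / 2
    wr = np.exp(-0.5 * (r / (sigma_fraction * rows)) ** 2)
    wc = np.exp(-0.5 * (c / (sigma_fraction * cols)) ** 2)
    return np.outer(wr, wc)

image = np.random.rand(64, 64)
windowed = image * gaussian_window(*image.shape)   # taper intensities towards zero at the edges
spectrum = np.fft.fft2(windowed)                   # far less spurious high-frequency energy than
                                                   # the spectrum of the unwindowed image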

Anti-aliasing

The two-dimensional Fourier transform can be used for many image-processing tasks. It is not the intention of this course to enumerate them, but one will serve as an illustration. Alias frequencies in images show up as jagged lines, especially where lines approach the horizontal or vertical. The effect is caused by there being too few discrete samples to represent the higher frequency components of the ideal image. The solution is to remove all frequencies above a certain threshold. This will degrade the image, since high frequencies result from step changes; however, it will also remove the alias. The greater the distance from the origin, the higher the frequency, so multiplying the spectrum by a Gaussian effectively tapers the frequencies to zero beyond a distance from the centre that depends on the variance. When the spectrum has been filtered in this way it is inverse transformed back to the spatial domain, as shown in Diagram 12.4. Now consider the convolution theorem again. Since a broad Gaussian in the frequency domain transforms into a narrow Gaussian in the spatial domain, the process is equivalent to spatial convolution with the mask:

1/36   1/9   1/36
1/9    4/9   1/9
1/36   1/9   1/36

which is used in the standard algorithm, and which we saw in tutorial 3 is a very close approximation to a Gaussian.

Notice also that the simpler form of filter, in which we simply use a box (or top-hat) function in two dimensions, transforms into a sinc function in the spatial domain, and its oscillating side lobes produce curious ringing artefacts in the image (see Diagram 12.5). For this reason the Gaussian filter is preferred.
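A sketch of the frequency-domain filtering step, assuming NumPy. A Gaussian centred on the zero-frequency point multiplies the spectrum, and the result is inverse transformed back to the spatial domain; the standard deviation, which sets the cut-off, is an illustrative parameter here:

import numpy as np

def gaussian_lowpass(image, sigma=0.1):
    """Low-pass filter by multiplying the spectrum with a Gaussian centred on the origin.
    sigma is the standard deviation as a fraction of the sampling frequency."""
    F = np.fft.fft2(image)
    fu = np.fft.fftfreq(image.shape[0]).reshape(-1, 1)   # frequencies along u
    fv = np.fft.fftfreq(image.shape[1]).reshape(1, -1)   # frequencies along v
    G = np.exp(-(fu**2 + fv**2) / (2 * sigma**2))        # tapers high frequencies towards zero
    return np.real(np.fft.ifft2(F * G))                  # back to the spatial domain

filtered = gaussian_lowpass(np.random.rand(128, 128))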

Texture Features

It is frequently suggested that spatial frequencies provide a convenient way to extract characteristics from textured images. This is so in cases where there are definite recurring patterns, such as in the examples we used to illustrate the co-occurrence matrix method, though it should be noted that not all textures have these. The general method is to divide the image into a number of small regions, say 16 by 16 pixels, apply a window function to taper the edges to zero, and then take the transform. Next, a filter bank is applied in the frequency domain to find the energy in specific regions of the spectrum. Two types of filter are often justified in terms of the information content that humans extract from images. The first is the annular ring filter, in which the spectrum is divided into a number of rings as shown in Diagram 12.6, and for each ring we simply sum the magnitudes of every point inside. Here we are searching for the presence of repeated patterns without restricting their direction. The second is the wedge filter, in which we divide the spectrum into angular slices as shown in Diagram 12.7. Here, by contrast, we are looking for directional features without any reference to their fundamental frequencies. Clearly the two can be combined, as shown in Diagram 12.8.

The output of the filter bank forms a characteristic vector, in exactly the same way that the properties of the co-occurrence matrices (energy, entropy, etc.) do, and this characteristic vector is compared with the characteristic vectors of the textures we are trying to find. Notice that there is a restriction on scaling. If the scale of an image is increased (zooming in), the frequencies are generally lowered, so if we are depending on annular ring filter outputs the energies will all shift towards the low-frequency end, though the wedge filter outputs remain unaffected. Similarly, rotating the image shifts energy between the wedge filters, but does not affect the annular rings.
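A sketch of the ring and wedge features, assuming NumPy; the number of rings and wedges, and the 16 by 16 patch size, are illustrative choices. The spectrum is shifted so that the zero frequency lies at the centre, and the magnitudes are summed inside each annulus and each angular slice:

import numpy as np

def ring_and_wedge_features(patch, n_rings=4, n_wedges=4):
    """Sum of spectrum magnitudes in annular rings (radial frequency bands)
    and in wedges (orientation bands): a simple texture feature vector."""
    F = np.fft.fftshift(np.fft.fft2(patch))               # zero frequency at the centre
    mag = np.abs(F)
    M, N = patch.shape
    v, u = np.meshgrid(np.arange(N) - N // 2, np.arange(M) - M // 2)
    radius = np.hypot(u, v)                               # distance from the origin (frequency)
    angle = np.mod(np.arctan2(u, v), np.pi)               # orientation folded into [0, pi)
    r_edges = np.linspace(0, radius.max() + 1e-9, n_rings + 1)
    a_edges = np.linspace(0, np.pi, n_wedges + 1)
    rings = [mag[(radius >= r_edges[i]) & (radius < r_edges[i + 1])].sum()
             for i in range(n_rings)]                     # DC falls in the innermost ring
    wedges = [mag[(angle >= a_edges[i]) & (angle < a_edges[i + 1])].sum()
              for i in range(n_wedges)]
    return np.array(rings + wedges)                       # the characteristic vector

features = ring_and_wedge_features(np.random.rand(16, 16))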

2D Correlations

One important property of the Fourier transform is that the magnitude of the spectrum is invariant under translation of the image. By the correlation theorem, correlation with a template reduces to a single pointwise multiplication of the image spectrum with the complex conjugate of the template spectrum, followed by an inverse transform. Thus for a set of templates, whose spectra we can pre-calculate, we can very quickly find the best match, and the height of the correlation peak tells us whether the template matches at all. This is computationally much more efficient than trying the template at every possible position in the spatial domain. Moreover, the position of a matched template in the spatial domain can be deduced from the phase (the ratio of sine to cosine components in the spectrum) of the fundamental.
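A sketch of correlation in the frequency domain, assuming NumPy (the function and variable names are illustrative). The image spectrum is multiplied pointwise by the complex conjugate of the template spectrum, which is zero-padded to the image size by the transform call, and the inverse transform gives the correlation surface; the height and position of its peak give the quality and location of the match:

import numpy as np

def correlate_via_fft(image, template):
    """Cross-correlation by pointwise multiplication in the frequency domain."""
    F = np.fft.fft2(image)
    T = np.fft.fft2(template, s=image.shape)          # pre-computable for a set of templates
    corr = np.real(np.fft.ifft2(F * np.conj(T)))      # correlation theorem
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return corr[peak], peak                           # match strength and its position

# The position comes from the phase information; the magnitudes alone tell us only
# how well the template matches, independently of where it lies.
image = np.zeros((64, 64)); image[20:28, 30:38] = 1.0
template = np.ones((8, 8))
score, position = correlate_via_fft(image, template)  # position is approximately (20, 30)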