Aliasing

In statistics, signal processing, and related disciplines, aliasing is an effect that causes different continuous signals to become indistinguishable (or aliases of one another) when sampled. When this happens, the original signal cannot be uniquely reconstructed from the sampled signal. Aliasing can take place either in time (temporal aliasing) or in space (spatial aliasing).

Aliasing is a major concern in the analog-to-digital conversion of video and audio signals: improper sampling of the analog signal will cause high-frequency components to be aliased with genuine low-frequency ones, and be incorrectly reconstructed as such during the subsequent digital-to-analog conversion. To prevent this problem, the signals must be appropriately filtered before sampling.

It is also a major concern in digital imaging and computer graphics, where it may give rise to moiré patterns (when the original image is finely textured) or jagged outlines (when the original has sharp contrasting edges, e.g. screen fonts). Anti-aliasing techniques are used to reduce such artifacts.

Aliasing in periodic phenomena
The sun moves east to west in the sky, with 24 hours between sunrises. If one were to take a picture of the sky every 23 hours, the sun would appear to move west to east, with 24 &times; 23 = 552 hours between sunrises. Note that both motions produce exactly the same sequence of pictures. The same phenomenon causes spoked wheels to apparently turn at the wrong speed or in the wrong direction when filmed, or when illuminated with a flashing light source &mdash; such as a fluorescent lamp, a CRT, or a strobe light. These are examples of temporal aliasing.
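
The arithmetic behind this can be checked directly by folding the true frequency into the band (-s/2, s/2] determined by the sampling rate. The following sketch (the function name is my own) uses exact rational arithmetic:

```python
from fractions import Fraction

def apparent_frequency(f, s):
    """Fold a true frequency f into the band (-s/2, s/2] determined by
    the sampling rate s; the result is the frequency an observer of the
    samples would perceive."""
    return f - round(f / s) * s

f = Fraction(1, 24)   # sun: one cycle per 24 hours
s = Fraction(1, 23)   # one photograph every 23 hours
print(apparent_frequency(f, s))   # -1/552: one apparent cycle per 552 hours
```

The negative sign encodes the apparent reversed (west-to-east) motion, and the apparent period of 552 = 24 &times; 23 hours matches the text.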

If someone wearing a tweed jacket with a pronounced herringbone pattern were videoed, and the video played on a TV screen with fewer lines than the image of the pattern, or on a computer monitor with pixels larger than the elements of the pattern, then one would see large areas of darkness and lightness over the image of the jacket rather than the herringbone pattern itself. This is an example of spatial aliasing, also known as a moiré pattern; how it is produced is illustrated next.

Sampling a sinusoidal signal
In the same way, when one measures a sinusoidal signal at regular intervals, one may obtain the same sequence of samples that one would get from a sinusoid with a lower frequency. Specifically, if a sinusoid of frequency f (in cycles per second for a time-varying signal, or in cycles per centimeter for a space-varying signal) is sampled s times per second or at s intervals per centimeter, with s &le; 2 f, the resulting samples are also compatible with a sinusoid of frequency s - f. In the field's jargon, each sinusoid gets aliased to (becomes an alias for) the other.



Therefore, if we sample at frequency $$s$$ a continuous signal that may contain both sinusoids, we will not be able to reconstruct the original signal from the samples, because it is mathematically impossible to tell how much of each component we should take.
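
A quick numerical check of this claim (the values are chosen purely for illustration): a sinusoid at f = 7 Hz sampled s = 10 times per second produces exactly the samples of one at s - f = 3 Hz.

```python
import numpy as np

s, f = 10, 7                      # sampling rate and signal frequency, s < 2f
t = np.arange(20) / s             # 20 sample instants
high = np.cos(2 * np.pi * f * t)
low = np.cos(2 * np.pi * (s - f) * t)
print(np.allclose(high, low))     # True: the two sinusoids are aliases
```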

The Nyquist criterion
One way to avoid such aliasing is to make sure that the signal does not contain any sinusoidal component with a frequency greater than s/2. More generally, it suffices to ensure that the signal is appropriately band-limited, namely that the difference between the frequencies of any two of its sinusoidal components is strictly less than s/2.

This condition is called the Nyquist criterion, and is equivalent to saying that the sampling frequency (s) must be strictly greater than twice the signal's bandwidth, the difference between the maximum and minimum frequencies of its sinusoidal components.

Origin of the term
The term "aliasing" derives from radio engineering, where a radio signal could be picked up at two different positions on the dial of a superheterodyne radio: one where the local oscillator was above the radio frequency, and one where it was below. This is analogous to the frequency-space "wraparound" that is one way of understanding aliasing.

An audio example
The qualitative effects of aliasing can be heard in the following audio demonstration. Six sawtooth waves are played in succession: the first two have a fundamental frequency of 440 Hz (A4), the next two 880 Hz (A5), and the final two 1760 Hz (A6). The sawtooths alternate between bandlimited (non-aliased) and aliased versions, and the sampling rate is 22.05 kHz. The bandlimited sawtooths are synthesized from the sawtooth waveform's Fourier series such that no harmonics above the Nyquist frequency are present.

The aliasing distortion in the lower frequencies is increasingly obvious with higher fundamental frequencies, and while the bandlimited sawtooth is still clear at 1760 Hz, the aliased sawtooth is degraded and harsh with a buzzing audible at frequencies lower than the fundamental. Note that the audio file has been coded using Ogg's Vorbis codec, and as such the audio is somewhat degraded.


 * [[media:Sawtooth-aliasingdemo.ogg|Sawtooth aliasing demo]] {440 Hz bandlimited, 440 Hz aliased, 880 Hz bandlimited, 880 Hz aliased, 1760 Hz bandlimited, 1760 Hz aliased}
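
The bandlimited synthesis described above can be sketched as follows (a hedged reconstruction, not the code used to produce the audio file): the sawtooth's Fourier series is summed only up to the highest harmonic below the Nyquist frequency.

```python
import numpy as np

def highest_harmonic(f0, sr):
    """Index of the highest harmonic of f0 strictly below Nyquist (sr/2)."""
    return int(np.ceil(sr / (2 * f0))) - 1

def bandlimited_sawtooth(f0, sr, duration=1.0):
    """Sum the sawtooth's Fourier series, dropping every harmonic that
    would alias (i.e. all harmonics at or above sr/2)."""
    t = np.arange(int(sr * duration)) / sr
    out = np.zeros_like(t)
    for k in range(1, highest_harmonic(f0, sr) + 1):
        out += (-1) ** (k + 1) * np.sin(2 * np.pi * k * f0 * t) / k
    return 2 / np.pi * out

# At sr = 22050 Hz the three fundamentals keep 25, 12 and 6 harmonics:
print([highest_harmonic(f0, 22050) for f0 in (440, 880, 1760)])
```

With only six harmonics left at 1760 Hz, the bandlimited wave sounds noticeably duller than an ideal sawtooth, but it contains no aliased partials.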

Mathematical explanation of aliasing
The preceding explanation and the Nyquist criterion are somewhat idealised, because they assume instantaneous sampling and other slightly unrealistic hypotheses, although useful approximations to these things do exist. The following is a more detailed explanation of the phenomenon in terms of function approximation theory.

Continuous signals
For the purposes of this analysis, we define a (continuous) signal to be a real- or complex-valued function whose domain is the interval [0,1]. To quantify the "magnitude" of a signal (and, in particular, to measure the difference between two signals), we will use the root mean square norm (see Lp spaces for some details), namely


 * $$||f||^2 := \int_0^1|f(t)|^2\,dt.$$

Accordingly, we will consider only signals that have finite norm, i.e. the square-integrable functions


 * $$L^2=L^2([0,1]):=\left\{ f:[0,1] \rightarrow \Bbb C : ||f||<\infty\right\}.\,$$

Note that these signals need not be continuous as functions; the adjective "continuous" refers only to the domain.
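
Numerically, this norm is just a root mean square over [0,1]; a minimal sketch using midpoint-rule quadrature (the helper name is my own):

```python
import numpy as np

def l2_norm(f, m=200000):
    """Root-mean-square norm of f on [0, 1], midpoint-rule quadrature."""
    t = (np.arange(m) + 0.5) / m
    return np.sqrt(np.mean(np.abs(f(t)) ** 2))

# For f(t) = sin(2*pi*t) the squared norm is exactly 1/2:
print(l2_norm(lambda t: np.sin(2 * np.pi * t)))   # ~0.70710678
```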

To be precise, we do not distinguish between functions that differ only on sets of zero measure. This technicality turns || || into a norm, and explains some of the difficulties (see the $$S_0$$ sampling method below). For details, see Lp spaces.

Point sampling
The conversion of a continuous signal f to an n-dimensional vector of equally spaced samples (a sampled signal) can be modeled as a point sampling operator $$S_0$$, defined by $$S_0 f := (f(t_1), f(t_2), \dots, f(t_n))$$, where $$t_i = i/n$$. That is, the function is sampled at the points $$1/n, 2/n, \dots, 1.\,$$

Note that $$S_0$$ is a linear map: for any two signals f and g, and any scalar a, then $$S_0(af+g)=aS_0(f)+S_0(g).\,$$

Unfortunately, while $$S_0(f)$$ is well defined when f is, say, continuous, it is not well defined on the space $$L^2$$ defined above. A symptom of this is that, even if we restrict our attention to continuous functions f, the map $$S_0$$ is not continuous in the $$L^2$$ norm.

In many physically significant settings, the $$L^2$$ norm, or a similar norm, is an appropriate measure of similarity between signals. Two signals f and g that are deemed very similar to begin with can then sample to two vectors $$S_0f$$ and $$S_0g$$ that are very dissimilar.
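
A concrete sketch of this failure (the names are my own): a narrow triangular spike centred on a sample point has arbitrarily small $$L^2$$ norm, yet point sampling always reports its full height.

```python
import numpy as np

n = 10
t = np.arange(1, n + 1) / n          # the sample points 1/n, ..., 1

def S0(f):
    """Point sampling at 1/n, 2/n, ..., 1."""
    return f(t)

def spike(eps):
    """Triangle of height 1 and half-width eps centred on the sample
    point 0.5; its exact L2 norm is sqrt(2*eps/3)."""
    return lambda x: np.clip(1 - np.abs(x - 0.5) / eps, 0, None)

for eps in (1e-2, 1e-4, 1e-6):
    # The L2 distance from the zero signal shrinks; the samples never do.
    print(np.sqrt(2 * eps / 3), np.max(np.abs(S0(spike(eps)))))
```

So f &equiv; 0 and g = spike(eps) are as close as we like in $$L^2$$, while $$S_0f$$ and $$S_0g$$ stay a fixed distance apart.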

A better sampling method (filtering)
In order to preserve closeness of signals after sampling (in other words, to get a sampling method which varies smoothly as a function of the signal f) we need to modify our sampling strategy $$S_0$$. An improved method is as follows:

$$S_1f(k)=n \int_{(k-1)/n}^{k/n} f(t)\,dt,\qquad k=1,\dots,n.$$

This is a better-behaved sampling method, as $$S_1$$ is now a continuous linear map from $$L^2$$ to $$\Bbb C^n$$.
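
The improvement can be seen on the same kind of nearly-zero spike that defeats point sampling; a sketch using midpoint quadrature (the names and the quadrature choice are my own):

```python
import numpy as np

n, m = 10, 1000                      # n cells, m quadrature points per cell

def S0(f):
    """Point sampling at 1/n, ..., 1."""
    return f(np.arange(1, n + 1) / n)

def S1(f):
    """n times the integral of f over each cell [(k-1)/n, k/n]."""
    return np.array([np.mean(f(((k - 1) + (np.arange(m) + 0.5) / m) / n))
                     for k in range(1, n + 1)])

# A narrow spike of height 1 at the sample point 0.5 (tiny L2 norm):
spike = lambda x: np.clip(1 - np.abs(x - 0.5) / 1e-2, 0, None)

print(np.max(np.abs(S0(spike))))     # 1.0: point sampling sees the spike
print(np.max(np.abs(S1(spike))))     # ~0.05: averaging nearly ignores it
```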

Reconstruction
Given a sampled signal $$f(k)\in \Bbb C^n$$ one would like to reconstruct the original signal $$f(x)\in L^2$$. This is obviously impossible in general, as $$L^2$$ is an infinite-dimensional vector space, while $$\Bbb C^n$$ is a finite-dimensional vector space (of dimension n).

In practice, one picks a subspace $$H \subset L^2$$ of dimension n and a reconstruction linear map R from $$\Bbb C^n$$ to H. The purpose of R is to turn a sampled signal into a continuous one in a way that makes sense to us.

An example reconstruction map would be


 * $$R_1s=\sum_{k=1}^n s(k) 1_{[(k-1)/n,k/n)}$$

where $$1_E(x)$$ is 1 if $$x\in E$$ and 0 otherwise.

Ideally, we would have $$S(R(s))=s$$ for all $$s\in \Bbb C^n$$. When this holds, R and S share the same picture of how signals in $$L^2$$ and in $$\Bbb C^n$$ behave, and we say that S and R are coherent. Here, $$R_1$$ and $$S_1$$ are in fact coherent, but $$R_1$$ and $$S_0$$ are not.

Another way of saying that R and S are coherent is that R is a right-inverse for S (or S is a left-inverse for R).
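
The coherence of $$S_1$$ and $$R_1$$, and its failure for $$S_0$$, can be verified numerically; a sketch with hypothetical helper names:

```python
import numpy as np

n, m = 8, 500
rng = np.random.default_rng(0)
s = rng.normal(size=n)               # an arbitrary sampled signal

def R1(s):
    """Piecewise-constant reconstruction: value s[k] on [(k-1)/n, k/n)."""
    def f(x):
        idx = np.floor(np.asarray(x) * n).astype(int)
        return np.where(idx < n, s[np.minimum(idx, n - 1)], 0.0)
    return f

def S1(f):
    """Cell averages (midpoint quadrature, exact for piecewise constants)."""
    return np.array([np.mean(f(((k - 1) + (np.arange(m) + 0.5) / m) / n))
                     for k in range(1, n + 1)])

def S0(f):
    """Point sampling at 1/n, ..., 1."""
    return f(np.arange(1, n + 1) / n)

print(np.allclose(S1(R1(s)), s))     # True: S1 is a left-inverse of R1
print(np.allclose(S0(R1(s)), s))     # False: S0 hits each cell's right edge
```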

Aliasing
For any sampled signal $$v\in \Bbb C^n$$ the set of continuous signals $$f\in L^2$$ which sample to the same $$v$$ are called aliases of one another. The fact that there are many aliases for any one given sampled signal is called aliasing. As previously mentioned, the large quantity of aliasing is caused by $$L^2$$ being infinite dimensional while $$\Bbb C^n$$ is finite dimensional.

Optimal filtering
In certain physical situations, the choice of R, H or S is somehow constrained. For instance, it is usual to choose H to be the linear span of low-degree trigonometric polynomials:


 * $$H=\left\{\sum_{k=-n}^n \alpha_k \exp 2\pi ikx\,;\, \alpha_k \in \Bbb C \right\}.\,$$

Further restrictions are that, for instance, S should coincide with $$S_0$$ on H. If sufficiently many of these demands are put forward, we eventually conclude that the sampling algorithm must take a very special shape:


 * $$S_\mathrm{opt}f=S_0(\mathrm{sinc}*f)$$

where $$\mathrm{sinc}*f$$ denotes the convolution of f with a sinc kernel; that is, f is passed through a sinc filter before being point-sampled.

The reconstruction formula R is chosen so that R and S are coherent.
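
In the periodic setting of this article ([0,1] with trigonometric polynomials), "convolve with sinc, then point-sample" amounts to truncating the Fourier series below the Nyquist frequency before sampling. A sketch (the names and quadrature choices are my own):

```python
import numpy as np

n, M = 8, 1024                        # n samples; M-point quadrature grid
tq = np.arange(M) / M                 # fine grid on [0, 1)
ts = np.arange(1, n + 1) / n          # the n sample points

def S_opt(f, K=3):
    """Sample the truncation of f's Fourier series to |k| <= K < n/2 --
    one realisation of S0(sinc * f) for periodic signals."""
    fv = f(tq)
    out = np.zeros(n, dtype=complex)
    for k in range(-K, K + 1):
        ck = np.mean(fv * np.exp(-2j * np.pi * k * tq))   # Fourier coeff.
        out += ck * np.exp(2j * np.pi * k * ts)
    return out.real

f = lambda t: np.cos(2 * np.pi * 6 * t)   # frequency 6, above Nyquist n/2 = 4
print(np.max(np.abs(S_opt(f))))           # ~0: the filter removes it entirely
print(np.allclose(f(ts), np.cos(2 * np.pi * 2 * ts)))
# True: the raw point samples instead alias to frequency 2 = n - 6
```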

Caveats
It is extremely important to keep in mind a point repeated throughout the above discussion: the Nyquist theorem, the optimality of the sinc filter, the choice of the error norm (we chose $$L^2$$), and so on are all assumptions we are making about the underlying physical problem.

In many problems these assumptions do not hold, and in those cases the Nyquist theorem needs to be modified so that it makes a true statement pertinent to the situation at hand.

For instance, when solving certain nonlinear partial differential equations, one finds that low and high frequencies are difficult to compute; then the filters used need to remove high and low frequency components. The sinc filter only removes high frequency components, so it is not optimal for this particular problem.

In some other problems, the space of signals is not a vector space. In this case, the Nyquist theorem needs to be rephrased very carefully, as it no longer speaks of vector spaces.

For instance, consider observing a signal known to be of the form f(t)=(exp(it)-a)/|exp(it)-a|, where a is some unit complex number (see Figure 1, where the trajectory of f(t) is plotted with a=exp(i)). The set of possible signals can then be viewed as a one-dimensional manifold. Since a has unit modulus, f has a singularity somewhere and therefore contains very high-frequency information. Yet the only necessary information is a: from it we can reconstruct the entire signal with arbitrary precision. To recover a, it suffices to have any single Fourier coefficient of f, except the zeroth one. In other words, although f has extremely wide bandwidth (and the Nyquist theorem would seem to demand many samples), a single sample of f, taken with the $$S_0$$ method, suffices to recover f fully, to arbitrary precision. The reconstruction filter that recovers f from $$S_0f$$ in this case is not any of the reconstruction formulae discussed above.

Figure 1 shows the function described above, as well as a filtered version in which we kept 21 Fourier coefficients. The filtered version corresponds to applying a sinc filter to the function f. As we can see, the sinc-filtered version is an extremely poor approximation of the unfiltered function. Figure 2 illustrates the problem: the Fourier coefficients of f decay roughly as fast as 1/frequency. Therefore, the assumptions used to justify the optimality of the sinc filter are false in this case.
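
The slow coefficient decay and the resulting poor truncation can be checked numerically (the grid size and the choice to measure a relative $$L^2$$ error are my own):

```python
import numpy as np

N = 4096
t = 2 * np.pi * np.arange(N) / N
a = np.exp(1j)                        # the unit complex number from the text
f = np.exp(1j * t) - a
f = f / np.abs(f)                     # the unimodular signal f(t)

c = np.fft.fft(f) / N                 # approximate Fourier coefficients c_k
energy = np.sum(np.abs(c) ** 2)       # ~ ||f||^2 = 1, since |f| = 1

# Relative L2 error after keeping the 21 coefficients |k| <= 10,
# i.e. the sinc-filtered version shown in Figure 1:
kept = np.sum(np.abs(c[:11]) ** 2) + np.sum(np.abs(c[-10:]) ** 2)
err = np.sqrt(1 - kept / energy)
print(err)    # substantially nonzero: the truncation is a poor approximation
```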

An example in astronomy


An interesting problem in astronomy is to compute the apparent radius of a star. Stars look like points in most cases, but a large, nearby star could have a measurable apparent radius in an extremely powerful telescope. The problem is that earthbound optical telescopes have to deal with the distortions of the atmosphere, and so the images of stars we obtain are frequently of very poor quality (we might say that there is a lot of aliasing). We could try to improve them by filtering with a sinc filter, as above; the trouble is that we are trying to find information smaller than the smallest detail that the atmosphere allows through (the diameter of the star is much less than the "sample rate" of a telescope peering through the atmosphere, so we are fighting against the Nyquist theorem). However, in this particular problem, some simplifying assumptions make it tractable.

With a good telescope and in an ideal world, the star is completely spherical and completely white against a completely black background. The star may not be centered. It is physically reasonable to presume that the pure image f and the image degraded by the atmosphere g are related via the relation


 * $$g=\Phi*f$$,

where $$\Phi$$ is an unknown convolution kernel. We know that f is precisely the indicator function of some disc,

$$f(x,y)=1$$ if $$(x-a)^2+(y-b)^2\le r^2$$, and 0 otherwise,

and the only unknown quantities are a, b and r. The apparent radius of the star is r. Fourier analysis then tells us that


 * $$\hat g=\hat \Phi \cdot \hat f$$,

where $$\hat{\cdot}$$ denotes the Fourier transform. It turns out that the radius r of the indicator function of a disc (in our example, f) is completely determined by the smallest zero $$w_0=(w,z)$$ of $$\hat f(w,z)$$, via the relation


 * $$|(w,z)| r = \sqrt{w^2+z^2} r = Z_0$$,

where $$Z_0=3.83170597...$$ is the first positive zero of the Bessel function $$J_1(x)$$.
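
This relation can be tested on a synthetic, atmosphere-free disc (a simplified sketch of the Figure 3 experiment; the image size and disc position are my own choices):

```python
import numpy as np

Z0 = 3.83170597                       # first positive zero of J1, as above
N, r_true = 256, 15.0

# Indicator function of an off-centre disc (the ideal star image f):
y, x = np.mgrid[0:N, 0:N]
f = ((x - 140.0) ** 2 + (y - 120.0) ** 2 <= r_true ** 2).astype(float)

# |f_hat| is unchanged by translation, so the zeros of the transform do
# not depend on the disc's centre.  Scan along the k_x axis for the
# first local minimum of the magnitude:
F = np.abs(np.fft.fft2(f))[0, :N // 2]
k = 1
while not (F[k] < F[k - 1] and F[k] < F[k + 1]):
    k += 1

# Refine the minimum's position parabolically, then invert
# (2*pi*k/N) * r = Z0 for the radius:
dk = 0.5 * (F[k - 1] - F[k + 1]) / (F[k - 1] - 2 * F[k] + F[k + 1])
r_est = Z0 * N / (2 * np.pi * (k + dk))
print(r_est)                          # close to the true radius of 15 pixels
```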

In Figure 3, we have simulated such an experiment. The top-left image is f, an off-center purely circular star. The bottom-left image is g, the image of f after being disturbed randomly by the atmosphere; it is the image an earthbound astronomer might observe in his telescope. The top-right and bottom-right images are the Fourier transforms of f and g, respectively. We have circled in blue the radius of the smallest zeros of $$\hat f$$ and $$\hat g$$; the circle for $$\hat g$$ was used to compute $$w_0$$, from which we calculated an approximation of r. The star in this example has a radius of 15 pixels, and the measured radius from $$\hat g$$ is 14.6935, which is an error of less than half a pixel. The reconstructed star image is almost identical to the ideal star image, as seen in Figure 4. The only different pixels are colored black. The red cross marks the origin of the coordinate system as in the top-left image of Figure 3; only the area near the star is shown and is zoomed by a factor of four for clarity.

The origin of the star was calculated from g by simple averaging.

In this case, the reconstruction map is nonlinear but very accurate for expected inputs. This is an example where black-box filtering with a sinc filter is not able to recover the desired data. The algorithm presented here is able to compute the radius very precisely even when the measured data is orders of magnitude worse than pictured above.

External link

 * Sampling Theory white paper by Dan Lavry