Compression of data using Fourier methods


The JPEG algorithm

The JPEG algorithm is brutally simple: an image is divided into blocks of 8x8 pixels and each is Fourier transformed. The smallest coefficients are set to zero and not stored. There is no attempt to enforce continuity between blocks. This can be seen in the following images (256x256, with exactly 8x8 pixel chequerboard tiles). In all cases, severe JPEG compression was applied, reducing the filesize by a factor 10. In the first image, this loses no information, since the black and white squares all occupy a single JPEG block each. But when the same image is shifted by half a block, artefacts appear - shown more clearly in the third image, which is an amplification of the difference between the JPEG and the exact image.

Now consider how all this works with aliasing. The first image has progressively finer structure, down to the 1-pixel level, as we move to the top right. In JPEG form, as shown, we again see artefacts. But more severe artefacts appear if the pattern is sampled onto pixels half and then 1/4 the original size (these images are shown without JPEG compression). This is a vivid illustration of the sampling theorem at work: the new pixel grids fail to cope with the high-frequency information in the original image.

All of this can be ameliorated if the image is smoothed before downsampling. Here is the effect of a Gaussian with a sigma equal to 1.5 pixels. This removes artefacts for the half-size image, although they are starting to reappear for the 1/4-size image. Note that the Gaussian never sets high frequencies to exactly zero, so it does not yield something that satisfies the sampling theorem exactly. But in practice the errors are unimportant when sigma is around the pixel scale.

The MP3 algorithm

The MP3 algorithm is similar in concept to the JPEG, but significantly more complicated. The data timestream (normally samples at 44100 Hz) is divided into 'frames' of 1152 samples, which are treated by Fourier means. But the frames are not dealt with in complete isolation, as this would risk discontinuities and thus audible clicks between frames.

As an example, here is a squarewave with a period of 96 samples (exactly 12 oscillations per frame - about 459 Hz). In the zoomed-in frame, you can see the samples connected by the continuous function generated via sinc interpolation. At the step, you see oscillations in that interpolation: this is the Gibbs phenomenon, reflecting the fact that the step doesn't obey the sampling theorem.

Now here's the same thing rendered as a 256 kb/s (thousand bits per second) MP3 (using the Audacity package). This is normally considered to be indistinguishable from the original. That may be true by ear, but the eye sees quite a difference, even so. Notice that the MP3 takes a while to 'get going': even where the original signal is (by design) an integral number of frames, the data are padded at the start with (almost) zero signal lasting around 2 frames. This is presumably to avoid any Gibbs-like transient from starting the signal abruptly from zero (see the LAME website for more information). As a result (presumably) the quality of the first frames is not as good as the others. When MP3s are played, this opening silence is removed, but not all players do a good job of this, hence sometimes you get audible clicks at the track joins (made worse if the original file was not an integral number of frames in length - then you get a final frame padded out with zeroes).

Now here's the same thing at 32 kb/s. This is below the normal quality threshold (things start to become acceptable around 128 kb/s, depending on the music and on your ears), and one can see why. The waveform is heavily distorted, and in a way that varies strangely with position within the file (beyond the first few frames shown here).

Finally, here's what iTunes makes of the squarewave at 256 kb/s. This is strikingly different to Audacity's version at the same bitrate: there is less noise in the parts of the squarewave where the signal should be constant - but the result has some slow tilt. This doesn't look right. Again, it's surprising that such visually striking differences aren't more audible.