Fractal Image Compression: What's it all About?
-----------------------------------------------
Seven things you should know about Fractal Image Compression (assuming
that you want to know about it).
1. It is a promising new technology, arguably superior to JPEG --
but only with an argument.
2. It is a lossy compression method.
3. The fractals in Fractal Image Compression are Iterated Function
Systems.
4. It is a form of Vector Quantization, one that employs a virtual
codebook.
5. Resolution enhancement is a powerful feature but is not some
magical way of achieving 1000:1 compression.
6. Compression is slow, decompression is fast.
7. The technology is patented.
That's the scoop in condensed form. Now to elaborate, beginning with a
little background.
A Brief History of Fractal Image Compression
--------------------------------------------
The birth of fractal geometry (or rebirth, rather) is usually traced to
IBM mathematician Benoit B. Mandelbrot and the 1977 publication of his
seminal book The Fractal Geometry of Nature. The book put forth a power-
ful thesis: traditional geometry with its straight lines and smooth
surfaces does not resemble the geometry of trees and clouds and moun-
tains. Fractal geometry, with its convoluted coastlines and detail ad
infinitum, does.
This insight opened vast possibilities. Computer scientists, for one,
found a mathematics capable of generating artificial and yet realistic
looking landscapes, and the trees that sprout from the soil. And mathe-
maticians had at their disposal a new world of geometric entities.
It was not long before mathematicians asked if there was a unity among
this diversity. There is, as John Hutchinson demonstrated in 1981, it is
the branch of mathematics now known as Iterated Function Theory. Later
in the decade Michael Barnsley, a leading researcher from Georgia Tech,
wrote the popular book Fractals Everywhere. The book presents the mathe-
matics of Iterated Functions Systems (IFS), and proves a result known as
the Collage Theorem. The Collage Theorem states what an Iterated Func-
tion System must be like in order to represent an image.
This presented an intriguing possibility. If, in the forward direction,
fractal mathematics is good for generating natural looking images, then,
in the reverse direction, could it not serve to compress images? Going
from a given image to an Iterated Function System that can generate the
original (or at least closely resemble it), is known as the inverse
problem. This problem remains unsolved.
Barnsley, however, armed with his Collage Theorem, thought he had it
solved. He applied for and was granted a software patent and left acade-
mia to found Iterated Systems Incorporated (US patent 4,941,193. Alan
Sloan is the co-grantee of the patent and co-founder of Iterated Sys-
tems.) Barnsley announced his success to the world in the January 1988
issue of BYTE magazine. This article did not address the inverse problem
but it did exhibit several images purportedly compressed in excess of
10,000:1. Alas, it was a slight of hand. The images were given sugges-
tive names such as "Black Forest" and "Monterey Coast" and "Bolivian
Girl" but they were all manually constructed. Barnsley's patent has come
to be derisively referred to as the "graduate student algorithm."
Graduate Student Algorithm
o Acquire a graduate student.
o Give the student a picture.
o And a room with a graphics workstation.
o Lock the door.
o Wait until the student has reverse engineered the picture.
o Open the door.
Attempts to automate this process have continued to this day, but the
situation remains bleak. As Barnsley admitted in 1988: "Complex color
images require about 100 hours each to encode and 30 minutes to decode
on the Masscomp [dual processor workstation]." That's 100 hours with a
_person_ guiding the process.
Ironically, it was one of Barnsley's PhD students that made the graduate
student algorithm obsolete. In March 1988, according to Barnsley, he
arrived at a modified scheme for representing images called Partitioned
Iterated Function Systems (PIFS). Barnsley applied for and was granted a
second patent on an algorithm that can automatically convert an image
into a Partitioned Iterated Function System, compressing the image in
the process. (US patent 5,065,447. Granted on Nov. 12 1991.) For his PhD
thesis, Arnaud Jacquin implemented the algorithm in software, a descrip-
tion of which appears in his landmark paper "Image Coding Based on a
Fractal Theory of Iterated Contractive Image Transformations." The
algorithm was not sophisticated, and not speedy, but it was fully au-
tomatic. This came at price: gone was the promise of 10,000:1 compres-
sion. A 24-bit color image could typically be compressed from 8:1 to
50:1 while still looking "pretty good." Nonetheless, all contemporary
fractal image compression programs are based upon Jacquin's paper.
That is not to say there are many fractal compression programs avail-
able. There are not. Iterated Systems sell the only commercial compres-
sor/decompressor, an MS-Windows program called "Images Incorporated."
There are also an increasing number of academic programs being made
freely available. Unfortunately, these programs are -- how should I put
it? -- of merely academic quality.
This scarcity has much to do with Iterated Systems' tight lipped policy
about their compression technology. They do, however, sell a Windows DLL
for programming. In conjunction with independent development by re-
searchers elsewhere, therefore, fractal compression will gradually
become more pervasive. Whether it becomes all-pervasive remains to be
seen.
Historical Highlights:
1977 -- Benoit Mandelbrot finishes the first edition of The Fractal
Geometry of Nature.
1981 -- John Hutchinson publishes "Fractals and Self-Similarity."
1983 -- Revised edition of The Fractal Geometry of Nature is
published.
1985 -- Michael Barnsley and Stephen Demko introduce Iterated
Function Theory in "Iterated Function Systems and the Global
Construction of Fractals."
1987 -- Iterated Systems Incorporated is founded.
1988 -- Barnsley publishes the book Fractals Everywhere.
1990 -- Barnsley's first patent is granted.
1991 -- Barnsley's second patent is granted.
1992 -- Arnaud Jacquin publishes describes the first practical
fractal image compression method.
1993 -- The Iterated Systems' product line matures. It gets plenty
of press but not so much market share.
1994 -- Put your name here.
On the Inside
-------------
The fractals that lurk within fractal image compression are not those of
the complex plane (Mandelbrot Set, Julia sets), but of Iterated Function
Theory. When lecturing to lay audiences, the mathematician Heinz-Otto
Peitgen introduces the notion of Iterated Function Systems with the
alluring metaphor of a Multiple Reduction Copying Machine. A MRCM is
imagined to be a regular copying machine except that:
1. There are multiple lens arrangements to create multiple overlapping
copies of the original.
2. Each lens arrangement reduces the size of the original.
3. The copier operates in a feedback loop, with the output of one
stage the input to the next. The initial input may be anything.
The first point is what makes an IFS a system. The third is what makes
it iterative. As for the second, it is implicitly understood that the
functions of an Iterated Function Systems are contractive.
An IFS, then, is a set of contractive transformations that map from a
defined rectangle of the real plane to smaller portions of that rectan-
gle. Almost invariably, affine transformations are used. Affine trans-
formations act to translate, scale, shear, and rotate points in the
plane. Here is a simple example:
|---------------| |-----|
|x | |1 |
| | | |
| | |---------------|
| | |2 |3 |
| | | | |
|---------------| |---------------|
Before After
Figure 1. IFS for generating Sierpinski's Triangle.
This IFS contains three component transformations (three separate lens
arrangements in the MRCM metaphor). Each one shrinks the original by a
factor of 2, and then translates the result to a new location. It may
optionally scale and shift the luminance values of the rectangle, in a
manner similar to the contrast and brightness knobs on a TV.
The amazing property of an IFS is that when the set is evaluated by
iteration, (i.e. when the copy machine is run), a unique image emerges.
This latent image is called the fixed point or attractor of the IFS. As
guaranteed by a result known as the Contraction Theorem, it is complete-
ly independent of the initial image. Two famous examples are Sierpins-
ki's Triangle and Barnsley's Fern. Because these IFSs are contractive,
self-similar detail is created at every resolution down to the infini-
tesimal. That is why the images are fractal.
The promise of using fractals for image encoding rests on two supposi-
tions: 1. many natural scenes possess this detail within detail struc-
ture (e.g. clouds), and 2. an IFS can be found that generates a close
approximation of a scene using only a few transformations. Barnsley's
fern, for example, needs but four. Because only a few numbers are re-
quired to describe each transformation, an image can be represented very
compactly. Given an image to encode, finding the optimal IFS from all
those possible is known as the inverse problem.
The inverse problem -- as mentioned above -- remains unsolved. Even if
it were, it may be to no avail. Everyday scenes are very diverse in
subject matter; on whole, they do not obey fractal geometry. Real ferns
do not branch down to infinity. They are distorted, discolored, perfo-
rated and torn. And the ground on which they grow looks very much dif-
ferent.
To capture the diversity of real images, then, Partitioned IFSs are
employed. In a PIFS, the transformations do not map from the whole image
to the parts, but from larger parts to smaller parts. An image may vary
qualitatively from one area to the next (e.g. clouds then sky then
clouds again). A PIFS relates those areas of the original image that are
similar in appearance. In the literature, the big areas are called
domain blocks and the small areas are called range blocks. It is neces-
sary that every pixel of the original image belong to (at least) one
range block. The pattern of range blocks is called the partitioning of
an image.
Because this system of mappings is still contractive, when iterated it
will quickly converge to its latent fixed point image. Constructing a
PIFS amounts to pairing each range block to the domain block that it
most closely resembles under some to-be-determined affine transforma-
tion. Done properly, the PIFS encoding of an image will be much smaller
than the original, while still resembling it closely.
Therefore, a fractal compressed image is an encoding that describes:
1. The grid partitioning (the range blocks).
2. The affine transforms (one per each range block).
The decompression process begins with a flat gray background. Then the
set of transformations is repeatedly applied. After about four itera-
tions the attractor stabilizes. The result will not (usually) be an
exact replica of the original, but reasonably close.
Scalelessnes and Resolution Enhancement
---------------------------------------
When an image is captured by an acquisition device, such as a camera or
scanner, it has an inherent scale determined by the sampling resolution
of that device -- 300 dots per inch is common for scanners. If software
is used to zoom in on the image, you don't see additional detail, just
bigger pixels.
A fractal image is different. Because the individual transformations of
a PIFS are spatially contractive, detail is created at finer and finer
resolutions with each iteration. In the limit, self-similar detail is
created at all levels of resolution, down the infinitesimal. Because
there is no level that 'bottoms out' fractal images are considered to be
scaleless.
What this means in practice is that as you zoom in on a fractal image,
it will still look 'as it should.' In particular, there won't be the
staircase effect of pixel replication. The significance of this is cause
of some misconception, so here is the right spot for a public service
announcement.
/--- READER BEWARE ---\
Iterated Systems is fond of the following argument. Take a portrait that
is, let us say, a grayscale image 250x250 pixels in size, 1 byte per
pixel. You run it through their software and get a 2500 byte file. A
compression ration of 25:1, wonderful. Now zoom in on the person's hair
at 4x magnification. What do you see? A texture that still looks like
hair. Well then, it's as if you had an image 1000x1000 pixels in size.
So your _effective_ compression ratio is 25 x 16 = 400. Amazing! Buy our
software today!
The sleight of hand is apparent. The detail has not been retained, but
generated. With a little luck it will look as it should, but don't count
on it. Zooming in on a person's face will not reveal the pores.
Objectively, what fractal image compression offers is an advanced form
of interpolation. This is a useful and attractive property. Useful to
graphic artists, for example, or for printing on a high resolution
device. But you can't use it to claim fantastically high compression
ratios.
\--- READER BEWARE ---/
That said, what is resolution enhancement? It is the process of com-
pressing an image, expanding it to a higher resolution, saving it, then
discarding the iterated function system. In other words, the compressed
fractal image is the means to an end, not the end itself.
The Speed Problem
-----------------
The essence of the compression process is the pairing of each range
block to a domain block such that the difference between the two, under
an affine transformation, is minimal. This involves a lot of searching.
As a matter of fact, there is nothing that says the blocks have to be
squares or even rectangles. That is just an imposition made to keep the
problem tractable.
More generally, the method of finding a good PIFS for any given image
involves five main issues:
1. Partitioning the image into range blocks.
2. Forming the set of domain blocks.
3. Choosing type of transformations that will be considered.
4. Selecting a distance metric between blocks.
5. Specifying a method for pairing range blocks to domain blocks.
Many possibilities exist for each of these. The choices that Jacquin
offered in his paper are:
1. A two-level regular square grid with 8x8 pixels for the large
range blocks and 4x4 for the small ones.
2. Domain blocks are 16x16 and 8x8 pixels in size with a subsampling
step size of four. The 8 isometric symmetries (four rotations,
four mirror flips) expand the domain pool to a virtual domain
pool eight times larger.
3. The choices in the last point imply a shrinkage by two in each
direction, with a possible rotation or flip, and then a trans-
lation in the image plane.
4. Mean square error is used.
5. The blocks are categorized as of type smooth, midrange, simple
edge, and complex edge. For a given range block the respective
category is searched for the best match.
The importance of categorization can be seen by calculating the size of
the total domain pool. Suppose the image is partitioned into 4x4 range
blocks. A 256x256 image contains a total of (256-8+1)^2 = 62,001 dif-
ferent 8x8 domain blocks. Including the 8 isometric symmetries increases
this total to 496,008. There are (256-4+1)^2 = 64,009 4x4 range blocks,
which makes for a maximum of 31,748,976,072 possible pairings to test.
Even on a fast workstation an exhaustive search is prohibitively slow.
You can start the program before departing work Friday afternoon; Monday
morning, it will still be churning away.
Increasing the search speed is the main challenge facing fractal image
compression.
Similarity to Vector Quantization
---------------------------------
To the VQ community, a "vector" is a small rectangular block of pixels.
The premise of vector quantization is that some patterns occur much more
frequently than others. So the clever idea is to store only a few of
these common patterns in a separate file called the codebook. Some
codebook vectors are flat, some are sloping, some contain tight texture,
some sharp edges, and so on -- there is a whole corpus on how to con-
struct a codebook. Each codebook entry (each domain block) is assigned
an index number. A given image, then, is partitioned into a regular grid
array. Each grid element (each range block) is represented by an index
into the codebook. Decompressing a VQ file involves assembling an image
out of the codebook entries. Brick by brick, so to speak.
The similarity to fractal image compression is apparent, with some
notable differences.
1. In VQ the range blocks and domain blocks are the same size; in an
IFS the domain blocks are always larger.
2. In VQ the domain blocks are copied straight; in an IFS each domain
block undergoes a luminance scaling and offset.
3. In VQ the codebook is stored apart from the image being coded; in
an IFS the codebook is not explicitly stored. It is comprised of
portions of the attractor as it emerges during iteration. For that
reason it is called a "virtual codebook." It has no existence
independent of the affine transformations that define an IFS.
4. In VQ the codebook is shared among many images; in an IFS the
virtual codebook is specific to each image.
There is a more refined version of VQ called gain-shape vector quantiza-
tion in which a luminance scaling and offset is also allowed. This makes
the similarity to fractal image compression as close as can be.
Compression Ratios
------------------
Exaggerated claims not withstanding, compression ratios typically range
from 4:1 to 100:1. All other things equal, color images can be com-
pressed to a better extent that grayscale images.
The size of a fractal image file is largely determined by the number of
transformations of the IFS. For the sake of simplicity, and for the sake
of comparison to JPEG, assume that a 256x256x8 image is partitioned into
a regular partitioning of 8x8 blocks. There are 1024 range blocks and
thus 1024 transformations to store. How many bits are required per
transformation?
In most implementations the domain blocks are twice the size of the
range blocks. So the spatial contraction is constant and can be hard
coded into the decompression program. What needs to be stored are:
x position of domain block 8 6
y position of domain block 8 6
luminance scaling 8 5
luminance offset 8 6
symmetry indicator 3 3
-- --
35 26 bits
In the first scheme, a byte is allocated to each number except for the
symmetry indicator. The upper bound on the compression ratio is thus
(8x8x8)/35 = 14.63. In the second scheme, domain blocks are restricted
to coordinates modulo 4. And experiments have revealed that 5 bits per
scale factor and 6 bits per offset still give good visual results. So
the compression ratio limit is now 19.69. Respectable but not outstand-
ing.
There are other, more complicated, schemes to reduce the bit rate fur-
ther. The most common is to use a three or four level quadtree structure
for the range partitioning. That way, smooth areas can be represented
with large range blocks (high compression), while smaller blocks are
used as necessary to capture the details. In addition, entropy coding
can be applied as a back-end step to gain an extra 20% or so.
Quality: Fractal vs. JPEG
-------------------------
The greatest irony of the coding community is that great pains are taken
to precisely measure and quantify the error present in a compressed
image, and great effort is expended toward minimizing an error measure
that most often is -- let us be gentle -- of dubious value. These meas-
ure include signal-to-noise ratio, root mean square error, and (less
often) mean absolute error. A simple counter example is systematic
error: add a value of 10 to every pixel. Standard error measures in-
dicate a large distortion, but the image has merely been brightened.
With respect to those dubious error measures, the results of tests
reveal the following: for low compression ratios JPEG is better, for
high compression ratios fractal encoding is better. The crossover point
varies but is often around 40:1 to 50:1. This figure bodes well for JPEG
since beyond the crossover point images so severely distorted that they
are seldom worth using.
Proponents of fractal compression counter that signal-to-noise is not a
good error measure and that the distortions present are much more 'natu-
ral looking' than the blockiness of JPEG, at both low and high bit
rates. This is a valid point but is by no means universally accepted.
What the coding community desperately needs is an easy to compute error
measure that accurately captures subjective impression of human viewers.
Until then, your eyes are the best judge.