Quite OK Image format


QOI

There is a new image format in town. It compresses about as well as PNG, and only encodes RGB and RGBA. What is interesting about it is how simple it is and, as a result of that simplicity, how fast it is. Encoding and decoding each make a single pass over the pixels, so the work is essentially O(n) in the number of pixels, which is the best scaling you can hope for when every pixel has to be touched. That green line above in the image from Wikipedia shows O(n).

What is QOI?

QOI is the “Quite OK Image” format. The author, Dominic Szablewski, openly admits he isn’t a compression guy, but I think that might be in his favor.

He starts out talking about the image and video formats that we have today. His comments ring true. They are complex and stink of design by committee. I’ve talked about this before myself. These image formats are overly complex and force us to rely on complex libraries rather than understand the format. They are also slow for the same reasons. We just haven’t noticed as our hardware got faster and included silicon optimizations for many of the formats we use for multimedia.

QOI is the exact opposite. He started with a simple goal: encode an RGB image more compactly while keeping the method simple. Now the file format isn’t done yet (the last change was a week ago and made the design simpler and smaller), but we can look at the format and understand what is going on.

He starts with a 14 byte header.

Field name    Type                                 Description
“qoif”        4 byte ASCII string                  magic value
Width         big endian unsigned 32 bit integer   image width in pixels
Height        big endian unsigned 32 bit integer   image height in pixels
Channels      unsigned 8 bit integer               3 for RGB, 4 for RGBA, anything else is illegal
Color Space   unsigned 8 bit mask                  0000rgba, per channel: 0 is sRGB, 1 is linear
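
To make the layout concrete, here is a minimal sketch in C of reading those 14 bytes. The names (qoi_header, read_be32, parse_qoi_header) are my own for illustration and are not taken from the reference code, and I am assuming the fields sit back to back in exactly the order listed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct mirroring the header fields listed above. */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint8_t  channels;    /* 3 = RGB, 4 = RGBA */
    uint8_t  colorspace;  /* bit mask 0000rgba */
} qoi_header;

/* Read a big endian unsigned 32 bit integer from 4 bytes. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the 14 byte header. Returns 0 on success, or -1 if the buffer
 * is too short, the magic is wrong, or the channel count is illegal. */
int parse_qoi_header(const uint8_t *buf, size_t len, qoi_header *out)
{
    if (len < 14 || memcmp(buf, "qoif", 4) != 0)
        return -1;
    out->width      = read_be32(buf + 4);
    out->height     = read_be32(buf + 8);
    out->channels   = buf[12];
    out->colorspace = buf[13];
    if (out->channels != 3 && out->channels != 4)
        return -1;
    return 0;
}
```

Fed the bytes "qoif", 00 00 01 00, 00 00 00 02, 04, 00, this reports a 256 by 2 RGBA image.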

Fairly straightforward. There is no place to store metadata. That is a double-edged sword. Throwing in a variable length field just after the header is an easy way to include metadata, but it also means one more thing to process. It could be placed after the image data itself, but what is it? What rules does it follow? The author left it out for all of these reasons, but I believe it is also because he wants to head toward a video format eventually, and all of the metadata options get stickier in that case.

Things built at run time

Next we establish a standard “previous pixel” for the very first pixel, where no real previous pixel exists. This way there is no initial pixel to worry about encoding. He wisely chooses

{r: 0, g: 0, b: 0, a: 255}

There is also a small cache of 64 pixels that is kept live based on all the pixels previously seen. Each pixel is “hashed” by taking the XOR of its channels and keeping only the lower 6 bits, which gives a slot number from 0 to 63. This would look like

(r^g^b^a) % 64

in C. When two pixels hash to the same slot, the new value replaces the old one.
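
Here is a rough sketch of that run time state in C. The type and function names are hypothetical, and I am assuming the cache starts out zeroed, which the description above does not actually say.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t r, g, b, a; } qoi_px;

/* Cache slot for a pixel: XOR the channels, keep the low 6 bits. */
unsigned qoi_hash(qoi_px p)
{
    return (unsigned)(p.r ^ p.g ^ p.b ^ p.a) % 64;
}

/* Hypothetical state shared by an encoder or a decoder. */
typedef struct {
    qoi_px cache[64];  /* recently seen pixels, indexed by qoi_hash */
    qoi_px prev;       /* the "previous pixel" */
} qoi_state;

/* Assumption: cache zeroed; previous pixel is {0, 0, 0, 255}. */
void qoi_state_init(qoi_state *s)
{
    memset(s, 0, sizeof *s);
    s->prev.a = 255;
}

/* Record a pixel: it overwrites whatever was in its slot and
 * becomes the new previous pixel. */
void qoi_remember(qoi_state *s, qoi_px p)
{
    s->cache[qoi_hash(p)] = p;
    s->prev = p;
}
```

Note that {0, 0, 0, 255} and {10, 20, 30, 255} both land in slot 63, so whichever was seen last is the one that stays in the cache.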

Pixel encoding

Each pixel is then encoded as one of four types of chunk, as the following list copied from GitHub describes.

  • a run of the previous pixel
  • an index into a previously seen pixel
  • a difference to the previous pixel value in r,g,b,a
  • full r,g,b,a values

The encoding is chosen in the order of the list above. You use the first one that works. Let’s go through them in order of their tag bits though.

0 0 - Index

0 0 for the first 2 bits means it is an index. That means recover the pixel from the cache described above. The remaining 6 bits are the slot number of the pixel value in the cache. For encoding, if the pixel in that cache slot is the same as the one you are encoding, you use the index encoding. Run should be used before index if it fits.
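
As a sketch of both directions, assuming the hash from earlier and a cache passed in as a plain array (the function names are made up for this post):

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } qoi_px;

/* Encoder side: if px is sitting in its cache slot, write the one
 * byte chunk 00xxxxxx and return 1. Otherwise return 0. */
int qoi_try_index(const qoi_px cache[64], qoi_px px, uint8_t *out)
{
    unsigned slot = (unsigned)(px.r ^ px.g ^ px.b ^ px.a) % 64;
    const qoi_px c = cache[slot];
    if (c.r != px.r || c.g != px.g || c.b != px.b || c.a != px.a)
        return 0;
    *out = (uint8_t)slot;  /* slot < 64, so the top two bits are 00 */
    return 1;
}

/* Decoder side: the low 6 bits select the cached pixel. */
qoi_px qoi_read_index(const qoi_px cache[64], uint8_t byte)
{
    return cache[byte & 0x3f];
}
```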

0 1 0 - Small Run

The next 5 bits are the count of how many times the previous pixel is repeated, from 1 to 32. If a run works then it should be the encoding used.

0 1 1 - Large Run

The next 13 bits are the count, from 33 to 8224, of how many times the previous pixel is repeated.

Note, early statistics showed this is almost never used and it will probably be dropped for a simpler version or another feature.
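
The two run chunks can be sketched like this. The ranges come straight from the text: 5 bits cover 32 values starting at 1, and 13 bits cover 8192 values starting at 33, which is where 8224 comes from. Putting the high bits of the 13 bit count in the first byte is my assumption, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a run chunk for `run` repeats of the previous pixel.
 * Returns bytes written (1 or 2), or 0 if run is outside 1..8224. */
size_t qoi_write_run(uint8_t *out, unsigned run)
{
    if (run >= 1 && run <= 32) {
        out[0] = (uint8_t)(0x40 | (run - 1));   /* 010xxxxx */
        return 1;
    }
    if (run >= 33 && run <= 8224) {
        unsigned v = run - 33;                  /* 0..8191 fits 13 bits */
        out[0] = (uint8_t)(0x60 | (v >> 8));    /* 011xxxxx */
        out[1] = (uint8_t)(v & 0xff);
        return 2;
    }
    return 0;
}

/* Read a run chunk. Returns bytes consumed, or 0 if the first
 * byte does not carry a run tag. */
size_t qoi_read_run(const uint8_t *in, unsigned *run)
{
    if ((in[0] & 0xe0) == 0x40) {
        *run = (in[0] & 0x1fu) + 1u;
        return 1;
    }
    if ((in[0] & 0xe0) == 0x60) {
        *run = (((in[0] & 0x1fu) << 8) | in[1]) + 33u;
        return 2;
    }
    return 0;
}
```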

1 0 - 8 bit diff

The next 6 bits are three 2 bit differences for R, G, and B (in that order). Each one can range from -1 to 2 relative to the value of the last pixel.
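
A minimal sketch of packing and unpacking this chunk. To fit -1 to 2 into 2 unsigned bits I am assuming each difference is stored with 1 added to it; the text above does not spell out the bias, and the function names are invented for illustration.

```c
#include <stdint.h>

/* Try to pack three channel differences into one byte: 10RRGGBB.
 * Returns 1 on success, 0 if any difference is outside -1..2. */
int qoi_pack_diff8(int dr, int dg, int db, uint8_t *out)
{
    if (dr < -1 || dr > 2 || dg < -1 || dg > 2 || db < -1 || db > 2)
        return 0;
    /* Assumed bias: store diff + 1 so the field holds 0..3. */
    *out = (uint8_t)(0x80 | ((dr + 1) << 4) | ((dg + 1) << 2) | (db + 1));
    return 1;
}

/* Recover the three differences from a 10RRGGBB byte. */
void qoi_unpack_diff8(uint8_t byte, int *dr, int *dg, int *db)
{
    *dr = ((byte >> 4) & 0x03) - 1;
    *dg = ((byte >> 2) & 0x03) - 1;
    *db = (byte & 0x03) - 1;
}
```

With that bias, differences of (-1, 0, 2) pack to the single byte 0x87.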

1 1 0 - 16 bit diff

The next 13 bits are the difference from the last pixel. The first 5 bits are the difference for Red (-15 to 16), followed by 4 bits each for Green and then Blue (-7 to 8).
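
The same idea spread over two bytes, as a sketch. I am assuming a bias of 15 for the 5 bit field and 7 for the 4 bit fields, which is what the stated ranges imply, and that Red shares the first byte with the tag. The names are hypothetical.

```c
#include <stdint.h>

/* Try to pack into two bytes: 110RRRRR GGGGBBBB.
 * Returns 1 on success, 0 if a difference is out of range. */
int qoi_pack_diff16(int dr, int dg, int db, uint8_t out[2])
{
    if (dr < -15 || dr > 16 || dg < -7 || dg > 8 || db < -7 || db > 8)
        return 0;
    out[0] = (uint8_t)(0xc0 | (dr + 15));            /* 5 bits, 0..31 */
    out[1] = (uint8_t)(((dg + 7) << 4) | (db + 7));  /* 4 bits each   */
    return 1;
}

/* Recover the three differences from the two bytes. */
void qoi_unpack_diff16(const uint8_t in[2], int *dr, int *dg, int *db)
{
    *dr = (in[0] & 0x1f) - 15;
    *dg = (in[1] >> 4) - 7;
    *db = (in[1] & 0x0f) - 7;
}
```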

1 1 1 0 - 24 bit diff

The next 20 bits are the 5 bit difference for R, G, B, and A in that order (-15 to 16). Notice this is the only difference that supports the alpha channel. If there is a smooth gradient in the Alpha this will be the most common encoding for that area I think.
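
Here the fields no longer line up with byte boundaries, so one way to sketch it is to build the whole 24 bit value and then split it into bytes. The bias of 15 and the most significant byte first ordering are assumptions on my part, and the function names are made up.

```c
#include <stdint.h>

/* Try to pack four differences into three bytes laid out as
 * 1110RRRR RGGGGGBB BBBAAAAA. Returns 1 on success, 0 if any
 * difference is outside -15..16. */
int qoi_pack_diff24(int dr, int dg, int db, int da, uint8_t out[3])
{
    const int d[4] = {dr, dg, db, da};
    uint32_t bits = 0x0e;                        /* 4 bit tag 1110 */
    for (int i = 0; i < 4; i++) {
        if (d[i] < -15 || d[i] > 16)
            return 0;
        bits = (bits << 5) | (uint32_t)(d[i] + 15);
    }
    out[0] = (uint8_t)(bits >> 16);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)bits;
    return 1;
}

/* Recover the differences into d[0..3] in R, G, B, A order. */
void qoi_unpack_diff24(const uint8_t in[3], int d[4])
{
    uint32_t bits = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
    for (int i = 3; i >= 0; i--) {
        d[i] = (int)(bits & 0x1f) - 15;          /* alpha comes off first */
        bits >>= 5;
    }
}
```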

1 1 1 1 - Full RGBA value

If none of the previous encodings work, you use this one. After the 4 header bits, the next 4 are a bitmask for r, g, b, and a. If a bit in the mask is 1 then a byte is allocated to store the value of that channel. Those bytes are stored in the order of the mask.
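
A sketch of this last chunk type. I am assuming that a channel whose mask bit is 0 simply keeps the previous pixel's value, so the encoder only spends a byte on channels that changed. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } qoi_px;

/* Write 1111RGBA followed by one byte per changed channel.
 * `out` needs room for 5 bytes. Returns bytes written (1..5). */
size_t qoi_write_color(uint8_t *out, qoi_px prev, qoi_px px)
{
    size_t n = 1;
    uint8_t tag = 0xf0;
    if (px.r != prev.r) { tag |= 0x08; out[n++] = px.r; }
    if (px.g != prev.g) { tag |= 0x04; out[n++] = px.g; }
    if (px.b != prev.b) { tag |= 0x02; out[n++] = px.b; }
    if (px.a != prev.a) { tag |= 0x01; out[n++] = px.a; }
    out[0] = tag;
    return n;
}

/* Read the chunk back. Channels without a mask bit are copied
 * from prev (the assumption above). Returns bytes consumed. */
size_t qoi_read_color(const uint8_t *in, qoi_px prev, qoi_px *px)
{
    size_t n = 1;
    *px = prev;
    if (in[0] & 0x08) px->r = in[n++];
    if (in[0] & 0x04) px->g = in[n++];
    if (in[0] & 0x02) px->b = in[n++];
    if (in[0] & 0x01) px->a = in[n++];
    return n;
}
```

Going from {0, 0, 0, 255} to {200, 0, 100, 255} only red and blue change, so the chunk is three bytes: 0xfa, 200, 100.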

Results

The initial results look roughly 20x faster than the next fastest PNG library for encoding and 3x faster for decode. Compared to the reference library libpng it is closer to 50x in encoding and 4x in decode. These are nothing to sneeze at.

Now we could beat this with raw pixel values stored in a memory-mapped file, since that needs no decoding at all. The catch is I/O, which is a real problem with larger images: reading from a disk or over a network often takes longer for a raw bitmap than for the same image encoded as PNG. So how does QOI handle that? By compressing roughly as well as PNG.

In the limited tests the author has published the images are roughly between the size encoded by stb_image and libpng and leaning fairly close to PNG. So a QOI will read over I/O at about the speed of a PNG, but decode 3-4 times faster, thus saving CPU cycles.

All of this with almost no optimization. Keeping the format simple (we fully described it above and I think it could be fully described on a single sheet of paper) pays off in several ways. From an anticryptography standpoint, this is a format that would be easy to push through space and time while being fairly efficient.

Now someone somewhere is probably crowing about JPEG and the other more advanced formats. Stop right now. They are slower, more complex and they lose data. QOI is a lossless format. Sure, JPEG and others result in a smaller file, but sometimes an image has to be lossless, and other times JPEG makes a mess of things (look at a scanned document converted to JPEG sometime and tell me that isn’t ugly).

Where this can go

There is a large area where lossless images are a requirement and this new format kind of throws shade on using PNG at all. With some additional tweaks in QOI it might do better than PNG in many cases, but it is already in the ballpark and the author may not care.

For scientific results or legal evidence, QOI may be the format of choice in the near future. It is so simple that archiving data in it makes sense. In 100 years someone could blow the dust off the short format description and be decoding images in a few hours.

It could also be a real replacement for MJPEG. Encoding video as a stream of compressed images is normally very lossy. MPEG and MJPEG offer compression by throwing out things that the human eye is less likely to see. MPEG is what we use for entertainment, while MJPEG is often used where video needs to be encoded quickly and a larger file is ok.

A stack of QOI images could be encoded on a very low end processor and sent over a network, or stored easily. This means less power, and potentially much smaller hardware. At the same time QOI preserves the original pixels, which neither MJPEG nor MPEG is going to do. For scientific or legal purposes this may be the only approved video format some day.

There are of course some interesting variants that could be applied. Maybe denote if a frame is standalone or the difference from the one before. If you do have a little more time and CPU this could start being competitive in size to early versions of MPEG and still be lossless running on a minimal CPU.

Final thoughts

I’m really excited about QOI. I’m going to watch for the final version of the spec on 12/20/2021. This could change images, document scans and video in the long run. If the author keeps it simple, that is. If not, the format is so simple that anyone could fork it and have their own in an afternoon.

How are you going to use QOI?