How Westwood VQA works

November 14th, 2024

In which I investigate a 30-year-old video format.

Imagine if you could watch movies on your computer. With the arrival of CD-ROM discs in the 90s, many game developers were dreaming of this. With hundreds of megabytes to spare, you could finally make your game a cinematic multimedia masterpiece.

One of those game developers was Westwood Studios. Their games such as Command & Conquer entertained their players with carefully produced but unintentionally campy intermission cutscenes.

A still from Command & Conquer’s opening cutscene embedded above. Encoded in 8-bit color at 320x156. Scaled here a bit vertically to compensate for the non-square pixels of the original.

What concerns us here however, is their in-house “VQA” video format. I got into it in 2020 when the C&C remaster was released with its somewhat strange-looking upscaled high-definition videos. I learned more about VQA when extending its support in the popular ffmpeg command-line tool. Later I also wrote a standalone encoder for it.

A still from Command & Conquer: Tiberian Sun’s cutscene. This one is encoded in 15-bit color at 640x400. We’ll use it as an example in this article. — A still from *Command & Conquer: Tiberian Sun*’s cutscene. This one is encoded in 15-bit color at 640x400. We’ll use it as an example in this article.

In this article I’ll use the VQA format as a vehicle to talk about more general data compression related concepts. I think VQA is an interesting example since it is optimized for a fast decoding on 90’s CPUs, slow CD-ROM drives and cheap graphics chips. It has shipped in games played by millions of people but is simple by necessity.

A high-level overview

There isn’t anything really difficult here. Every frame gets split into blocks of 4x2 or 4x4 pixels in size, depending on the game. Common blocks are shared by nearby frames and are collected to a small set known as the codebook. Each frame can then be represented using those common blocks instead of the original pixels. Occasionally the codebook blocks get a refresh, for example when a scene cut happens.

The image can been seen to consist of 4x4 pixel blocks.

VQA videos use both lossy and lossless compression. The lossy techniques are

vector quantization,
color palettes, and
delta encoding also known as “conditional replenishment”.

The lossless ones compress the already smaller lossy stream further. They are

sliding-window compression,
data reordering, and
run-length encoding.

Compressors often squeeze the final bitstream even further with an “entropy coding” step such as Huffman coding. The coding uses fewer bits to represent commonly occurring sequences. However, in VQA there’s no entropy coding at all. I suppose it keeps the decoder simple.

OK let’s talk about the details now. We’ll begin with the most important concept.

The “VQ” in the name

At its core VQA is a vector quantization codec. Vector quantization is a technical term for a compression scheme where you replace many similar parts with a single representative. It’s the block reuse scheme I just described in the high-level overview section earlier. In compression jargon the blocks are called vectors and the indices that refer to codebook blocks are called codes.

The codebooks can look something like below:

The roughly 3000 codebook blocks used in our example frame. They seem to be ordered by contrast but I don’t know why. Magnified by 3x to make the 4x4 pixel blocks easier to see.

So why call blocks “vectors?” Let’s say we have 4x4 pixel RGB blocks. They consist of 4*4*3 = 48 values each. We can consider the blocks as 48-dimensional vectors if we put all the RGB values of every pixel one after another. Now it’s possible to apply vector operations to the blocks!

So for example the block on the left would become this vector (the exact order of doesn’t matter as long as its consistent):

[140 140 140 173 173 156 206 206 173 214 214 181 140 140 140 173 173 156
 214 206 181 222 222 181 140 140 140 173 173 156 214 214 181 231 231 189
 140 140 140 173 173 156 214 214 181 231 222 189]

In practice the only vector operation we are interested in is computing a distance between two points. Now a clustering algorithm such as Lloyd’s algorithm a.k.a. k-means can build the codebook for us. Finding the nearest codebook entry for some block is can be done by finding the nearest cluster centroid. In my ClusterVQA encoder I used scikit-learn’s MiniBatchKMeans class and it worked OK.

The connection to color quantization

Compressing a high number of colors to a small palette is called color quantization but if you look at it, it’s also vector quantization! The vectors are 3-dimensional RGB colors and the codebook is now the palette. This is why for example Apple Video can be considered to be a VQ-based video codec as well, even though it doesn’t use a codebook. It effectively creates small 4-color palettes for 4x4 pixel blocks.

Frame updates

Each new frame is a 2D array of 16-bit codes that represent block indices to the codebook. These take most of the video stream so it makes sense to compress them.

Sliding-window compression

Both codebooks and frame updates are compressed with sliding-window compression. It’s LZ77-style and called “LCW”, and also known as “format80” because the byte 0x80 marks the end of data. It’s the same idea as in zip files: if some byte sequence repeats, just refer to it with an (offset, byte_length) pair instead of writing out the whole thing. The only unique LCW feature is a special command that writes a repeated constant instead of copying it from the stream.

In Command & Conquer’s frame update data the sliding-window compression seems to result in sizes 55–75% of the original.

Data reordering

Sometimes it’s worthwhile to reorder your data to be easier to compress. In 8-bit VQA files, the frame updates are not stored just as a sequence of 16-bit integers but as two streams. The 16-bit values are split in two bytes and the low bytes come first in the file, followed by the high ones. In the below hex dump taken right at the middle of a chunk, you can see how different the value distributions are and how the high bytes have more repetition:

000017b0: 5e5e b1a4 d57c 811c 02ba 5248 411c d2a3  ^^...|....RHA...
000017c0: 84c3 82c2 7a4b b59a 500f e6af f138 ed72  ....zK..P....8.r
000017d0: c34e 46b4 29ce 7bab 9508 9fe1 4dbd b948  .NF.).{.....M..H
000017e0: 5ab2 ea48 076a f9f9 ee90 fef9 5916 02ee  Z..H.j......Y...
000017f0: 8bf2 69b0 13d6 7ebd de5f 7b35 2a68 abdb  ..i...~.._{5*h..
00001800: e67e 2f97 407d 2797 7b97 cbbc 0a88 0a5c  .~/.@}'.{......\
00001810: 0b7a d0bc 4a94 1a9c 9a3d 15af ce8b b41d  .z..J....=......
00001820: 3c1b 3c36 91bb 69ce d20a eb37 b913 a83b  <.<6..i....7...;
00001830: a348 ea48 07f7 f9f9 ee90 1bf9 59de 02ee  .H.H........Y...
00001840: 8b7c 18b0 4622 d6bd de24 ef35 389a abab  .|..F"...$.58...
00001850: 9446 5697 1a1d 2797 7b7d 6600 5c1d 0a7b  .FV...'.{}f.\..{
00001860: 0f01 0000 0500 0004 0003 0001 0005 0202  ................
00001870: 0f02 0401 0004 0000 0000 0001 0000 0000  ................
00001880: 0505 0206 0301 0100 0001 0100 000f 0000  ................
00001890: 010f 0f00 0001 0f00 0f0f 0f0f 0f0f 010f  ................
000018a0: 0f0f 0f0f 0505 0101 0001 0101 0103 0100  ................
000018b0: 0f00 070f 0000 0502 0205 0206 0000 0200  ................
000018c0: 0f00 0004 0000 0400 0201 0001 0001 0100  ................
000018d0: 0605 0205 0301 0102 0000 0701 010f 000f  ................
000018e0: 0000 0000 0f00 0000 0f0f 0f0f 0f0f 0f0f  ................
000018f0: 0f01 0f0f 0603 0002 0401 0000 0106 0401  ................
00001900: 0f00 0106 0001 0706 0502 0000 0402 0f0f  ................

Something similar is supported in Apache Parquet, for example.

Run-length and delta encoding

In 15-bit Hi-Color VQAs the frame updates use a run-length encoding (RLE) scheme. For repeated “runs” of the same code they write out (code, length) pairs instead of the original data. Note that this is simpler than sliding-window compression because it can repeat only a single byte.

Now frame updates can also skip over some blocks. This way static image areas screen can be kept unchanged. This is what I refer to as delta encoding. In practice almost all blocks get updated anyway:

Changed (red) and unchanged (blue) blocks.

To be honest, I was surprised to learn that 8-bit VQAs don’t do any delta encoding and always update every block. Also, the run-length encoded code updates are also compressed via LCW like before.

Effectiveness

The video bitrate seems to be about 1 Mbit/s for the 8-bit videos and 2.5 Mbit/s for the 15-bit ones. For comparison, DVD video bitrates are 3–6 Mbit/s.

How much smaller are the compressed videos than the originals? Assuming each 8-bit video frame would have its own palette, let’s compute the uncompressed bit rates.

Uncompressed bitrate (8-bit)
((256 * 2 + 320*156)*15*8) / 1e6 = 6.05 Mbit/s

Uncompressed bitrate (15-bit)
((640*400*2)*15*8) / 1e6         = 61.44 Mbit/s

So we get 5.9x and 25x compression ratios for 8-bit and 15-bit videos, respectively:

Compression Ratio          = Uncompressed data rate / Compressed data rate
Compression Ratio (8-bit)  = 6.05  Mbit/s / 1.02 Mbit/s = 5.9x
Compression Ratio (15-bit) = 61.44 Mbit/s / 2.42 Mbit/s = 25x

You are also supposed to measure image distortion with some measure such as PSNR but we don’t have access to the originals so it can’t be done. Of course I could encode my own VQAs and compare but that’s too much work :)

Where the bits are spent

Here are some more details on the six test videos I used to compute the above numbers. Apparently the 15-bit “VQA Hi-Color” videos consist of 96% code indices while on 8-bit ones that portion doesn’t get much higher than 80%.

filename      resolution  frames  colors  Mbit/s  codebooks  indices  palettes  size (MiB)
AKIRA.VQA     320x156     845     8-bit   1.01    23.12 %    76.83 %  0.05 %    6.81
SPYCRASH.VQA  320x156     245     8-bit   0.82    24.84 %    75.11 %  0.04 %    1.61
INTRO.VQA     320x156     1933    8-bit   1.22    18.95 %    80.52 %  0.54 %    18.74
nowcnot.vqa   640x400     286     15-bit  2.35    3.05 %     96.95 %            5.34
hideseek.vqa  640x400     255     15-bit  2.29    3.37 %     96.63 %            4.64
iceskate.vqa  640x400     247     15-bit  2.63    3.90 %     96.10 %            5.17

All the videos played at 15 frames per second.

Looking back

While VQA served it purpose, in Red Alert 2 the videos were encoded with the industry-standard Bink Video.

Ray Wise pacing around the Oval Office in a Red Alert 2 cutscene. Screenshot via MobyGames.

The last game that shipped with VQA was the NoX action RPG, also from 2000 like RA2. It is rumored that NoX accidentally shipped with the original codebook construction code in it. I haven’t seen it myself but it would be a neat swan song for the format if it were true.

Resources

VQA format details that get very technical.
The VQA Enc encoder by Gordan Ugarkovic from 2005.
The ClusterVQA encoder by yours truly.
ffmpeg’s VQA decoder source code: vqavideo.c and westwood_vqa.c.
ffmpeg’s test data from various Westwood games.
My Image vector quantization is just like creating a tilemap Jupyter notebook that shows how to use k-means to construct an image codebook.
The original LCW encoder written the MASM assembly language.
Denzil Long’s interview that touches on video streaming for C&C.

Some unassorted details

The VQA container format is structured as RIFF-style chunks. This is useful because readers can skip any data they are not interested in. For example ffmpeg skips many chunk types:

 /* catch any unknown header tags, for curiosity */
         switch (chunk_tag) {
         case CINF_TAG:
         case CINH_TAG:
         case CIND_TAG:
         case LINF_TAG:
         case PINF_TAG:
         case PINH_TAG:
         case PIND_TAG:
         case FINF_TAG:
         case CMDS_TAG:
         case VIEW_TAG:
         case ZBUF_TAG:
             break;

Interestingly, Blade Runner from 1997 also includes Z-buffers in its videos ZBUF chunks mentioned above. The Z-buffers are used to correctly occlude characters behind the looping video environments.
- The game consists basically solely of looping VQA videos!
There at least three versions of VQA files. VQA1 (Legend of Kyrandia III) and VQA2 (Command & Conquer) use 8-bit colors and 4x2 pixel blocks. In VQA2 videos the codebooks are sent in parts one 1/8 at a time for bitrate control. The “Hi Color VQA” version 3 has 15-bit colors and 4x4 pixel blocks. In theory you could mix & match the block sizes between versions.
Frame rate seems to be 15 except in the first VQA-enabled game Legend of Kyrandia III where it’s 10.
All codebooks also include solid-colored blocks by default without them needing to be sent.
For the Command & Conquer remaster they had to upscale the compressed videos because the original recordings were lost. As a source material I think they used the MJPEG-like data from the PlayStation release though.