July 17, 2025
The first principal component of a set of points measures their main axis of variation. It’s a unit vector that points in the direction where the points are “most spread”, if you will. In computer graphics, this concept is very useful in color quantization, in texture compression, and when fitting oriented bounding boxes to meshes.
Let’s think of pixel colors as 3D points inside the RGB cube. Color quantization can then be formulated as the task of splitting this 3D space into mutually exclusive subspaces that each represent a palette color. This description doesn’t fit all possible approaches, but it’s close enough. Hierarchical top-down methods such as median cut recursively split the space in two with a plane, like in a BSP tree, and bin the colors based on which side of the plane they end up on.
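To make the split step concrete, here is a rough NumPy sketch of binning colors by a plane (the function and parameter names are my own illustration, not code from any of the libraries mentioned): classifying a color amounts to checking the sign of its signed distance to the plane.

import numpy as np

def split_by_plane(colors, plane_point, plane_normal):
    # Signed distance (up to the normal's length) of each color to the plane.
    side = (colors - plane_point) @ plane_normal
    # Bin colors by which side of the plane they fall on.
    return colors[side < 0.0], colors[side >= 0.0]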
Choosing an axis-aligned split plane is easy to do: for example, compute the per-channel variances (the mean squared deviation from the channel mean), sort the points along the axis with the maximum variance, and then split the set of points in the middle. We can interpret this operation as a split based on a crude, axis-aligned guess of the main axis of variation, the first principal component.
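In NumPy that axis-aligned split might look roughly like this (again my own sketch, not taken from a particular quantizer):

import numpy as np

def axis_aligned_split(colors):
    # Pick the channel with the largest variance...
    axis = np.argmax(np.var(colors, axis=0))
    # ...sort the colors along it and cut the set in half at the middle.
    order = np.argsort(colors[:, axis])
    half = len(colors) // 2
    return colors[order[:half]], colors[order[half:]]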
So of course, instead of using that crude guess, we can put on our Numerical Computing Expert hat and do a full Principal Component Analysis (PCA) to get the real thing. For example, the “Binary Split” method of 1991 and Xiaolin Wu’s 1992 dynamic programming approach compute the “principal eigenvectors” and use those for splits.
Applying PCA isn’t that bad, but it’s a bit annoying since you need a 3x3 covariance matrix plus some routine to extract its eigenvectors. For example, look at how the libsquish texture compressor computes the matrix and the vectors. It’s not exactly free.
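For comparison, in NumPy the full PCA route only takes a few lines (a sketch of the usual covariance-plus-eigendecomposition recipe, not libsquish’s actual code); in C or C++ you’d have to write or pull in the eigensolver yourself.

import numpy as np

def first_pc_exact(colors):
    centered = colors - colors.mean(axis=0)
    cov = centered.T @ centered / len(colors)  # 3x3 covariance matrix
    # For a symmetric matrix, eigh returns eigenvalues in ascending order,
    # so the principal eigenvector is the last column.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]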
But isn’t there an approximation that’s in between the “crude axis-aligned” and “going full PCA” solutions? A simple approach to get only the “principal eigenvector”?
When studying Dennis Ranke’s exoquant-rs color quantization library, I came across its “primary vector” computation. It estimates the direction of largest variation in a set of 3D colors, which is then used to divide the set in half along that vector. Sound familiar?
The code looked like this:
// Histogram has been sorted by the channel with the largest variance
// The mean color is stored in 'avg'
// Determine primary vector of distribution in the histogram
let mut dir = Colorf::zero();
for entry in &histogram {
    let mut tmp = (entry.color - avg) * entry.count as f64;
    if tmp.dot(&dir) < 0.0 {
        tmp *= -1.0;
    }
    dir += tmp;
}
dir *= {
    let s = dir.dot(&dir).sqrt();
    if s < 0.000000001 { 1.0 } else { 1.0 / s }
};
Source: quantizer.rs
If we ignore the “histogram” concept, it’s really simple: for each point, compute its difference from the mean point, flip the difference if it points away from the accumulator dir, add it to dir, and normalize dir at the end.
Done.
Note that the data is still sorted along the coordinate axis with the largest variance. I don’t know how important this detail is, but at least it makes the summation deterministic. Therefore you still have to compute per-channel variances like in the axis-aligned case, but you’d need that loop for the mean point anyway.
I haven’t seen this approximation before, but given it’s so basic (a special case of incremental PCA?), I’m sure it has been independently discovered multiple times. When I asked the author what it is based on, the answer was “school level vector math” :) For reference, the original exoquant.c is from 2004.
Well, all the better! Below is my direct translation to Python using NumPy.
import numpy as np

def approx_first_pc(data, sort_axis=None):
    # Sort the data for determinism, possibly also helps estimation.
    # The axis of max variance is compatible with exoquant.
    if sort_axis is None:
        var = np.var(data, axis=0)
        sort_axis = np.argmax(var)
    data = data[np.argsort(data[:, sort_axis])]

    avg = np.mean(data, axis=0)  # per-channel mean, i.e. the mean point
    dir = np.zeros(data.shape[1])
    for i in range(data.shape[0]):
        delta = data[i] - avg
        # Flip the delta if it points away from the accumulator.
        if dir.dot(delta) < 0:
            delta *= -1
        dir += delta
    # Normalize, guarding against a near-zero vector.
    s = np.linalg.norm(dir)
    if s < 1e-9:
        return dir
    else:
        return dir / s
I tested it against sklearn.decomposition.PCA on some random Gaussians in 2D and it seems to work well; see the plots at the beginning of this article.
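The comparison itself is just a sanity check. Here’s a minimal sketch of the kind of test (the data generation and seed below are made up for illustration; since the sign of either vector is arbitrary, I compare absolute cosine similarity):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Anisotropic 2D Gaussian: stretch unit-variance samples, then rotate them.
points = rng.normal(size=(1000, 2)) @ np.diag([3.0, 0.5])
theta = 0.6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
points = points @ rot.T

approx = approx_first_pc(points)
exact = PCA(n_components=1).fit(points).components_[0]

print(abs(np.dot(approx, exact)))  # close to 1.0 for a good estimate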
Since the data is sorted by the coordinate axis of maximum variance (I left that sorting out of the Rust snippet), the points below the mean along that axis come first and seed the sum with negative-pointing deltas. Therefore the resulting vector tends to point in the negative direction too. Also, there might be a hint of a bias toward zero (see the annotation in the plot below), but I didn’t try to measure it.
I also tried adding some Gaussian noise to see if the approximation performs worse than PCA. They seem to be roughly on par:
Since the original C version of exoquant with the same approximation has proven reliable in practice, the probability of a hidden catastrophic defect seems quite low. At least I’ll be adding this to my proverbial “numerical coding bag of tricks” in the future.
I’m writing a book on computer graphics. Sign up here if you’re interested.
Links to materials mentioned in the text.