Sort a sphere BVH with split planes

January 10, 2024

Hierarchical frustum culling is quick to do with a bounding volume hierarchy (BVH). Possibly the simplest of such is the sphere BVH where each node is represented by a sphere that is large enough to hold all of its children’s bounding spheres:

Two objects A and B wrapped in a bounding sphere parent node.

Once you know the camera position and orientation, a traversal of a BVH produces a list of all possibly visible objects. But you knew all that. OK now the new part: Wouldn’t it be great to already have them sorted in a rough front-to-back order? That way the GPU’s early-Z culling could do its job and you’d gain performance.

A simple approach is to pick an axis-aligned plane for each sphere, divide its children based on which side of the plane their origins are, and then at runtime visit the front-side children first. Just like traversing a BSP or kd-tree. Like this:

At BVH build time object A was stored in the left child of its parent node (see the tree on the right) because its origin lies on the front side of the plane (red). The box represents the extents of all children inside the node. The plane was chosen to split the box along its longest axis.

The plane can be stored in two bits (normal vector’s axis) if you assume the plane always crosses the sphere origin. Of course now you may need to make the bounding sphere larger if you wish to choose its origin to lie in the middle of all objects. If you were building just a BVH you could make its bounds tighter. It’s a tradeoff.

At runtime this change is pretty much free. One check of the split plane axis, like in any kd-tree:

// In function find(Node* n)
int axis = n->axis;
bool left_first = camera_pos[axis] < n->pos[axis];

// Recurse first to the front side?
if (left_first) {
} else {

Is this a new idea? I don’t think so since it’s effectively just a kd-tree with also a bounding sphere stored per node. For example in Quake they had a bounding box per BSP node and used them for frustum culling. Pretty much the same thing. However I tried this sphere-BVH-with-axis-aligned-planes-tree in practice on my Nintendo 64 project; it worked fine and was easy to implement. And that’s what counts, right?

There’s a small performance gotcha here though. In a compact BVH the left child is always stored just after its parent, so this data structure will guarantee cache misses if the camera happens to lie in the positive quadrant of the world coordinate space. In that region every split plane will face away from camera and the right child will always be visited first.

This doesn’t matter in the case your nodes are exactly a cache line long like in the linked example.

Thanks to Branch for feedback on a draft of this article.


For completeness, I’ve copied my BVH building function below. It uses GCC extensions to manage memory and declare nested functions so it’s not standard C. It appears somewhat dense but is nothing out of ordinary. It uses object AABBs to compute the box seen in diagrams above but stores only the bounding spheres, and of course the split plane bits.

BVH builder excerpt

// This code snippet is licensed under CC0 and placed in the public domain.

bool bvh_build(const float* origins, const float* radiuses, const aabb_t* aabbs, uint32_t num, sphere_bvh_t* out_bvh)
    sphere_bvh_t bvh = {};

    // the number of nodes for a full binary tree with 'num' leaves
    const uint32_t max_nodes = 2 * num - 1;
    debugf("max_nodes: %lu\n", max_nodes);
    // temporary array where we add nodes as we go and truncate at the end
    bvh_node_t* nodes = malloc(max_nodes * sizeof(bvh_node_t));
    aabb_t* node_aabbs = malloc(max_nodes * sizeof(aabb_t));
    // references to data arrays origins[] and radiuses[]
    uint32_t* inds = malloc(max_nodes * sizeof(uint32_t));

    // Start at identity mapping that gets permuted by sorting of recursive subranges
    for (uint32_t i = 0; i < num; i++) {
        inds[i] = i;

    // Convention: 
    // variable 'i'    indexes the 'inds[]' array.
    // variable 'idx'  indexes data arrays with idx=inds[i]
    // variable 'n_id' indexes nodes[]

    // Free temporary data at the end of this scope.

    bvh.num_nodes = 0;

    uint32_t bvh_build_range(uint32_t start, uint32_t num) {
        debugf("[%lu,num=%lu] node=%lu\n", start, num, bvh.num_nodes);
        assert(num > 0);
        assert(num != UINT32_MAX);

        uint32_t n_id = bvh.num_nodes++;
        assert(n_id <= max_nodes);

        bvh_node_t* n = &nodes[n_id];
        *n = (bvh_node_t){};

        bool is_leaf = num == 1;
        if (is_leaf) {
            uint32_t idx = inds[start];
            n->pos[0] = origins[idx*3+0];
            n->pos[1] = origins[idx*3+1];
            n->pos[2] = origins[idx*3+2];
            n->radius_sqr = radiuses[idx] * radiuses[idx];
            n->flags = 0;
            n->idx = idx; // represents an index to data array
            node_aabbs[n_id] = aabbs[idx];
            debugf("  ");
        } else {
            aabb_t bounds = (aabb_t){.lo = {__FLT_MAX__, __FLT_MAX__, __FLT_MAX__}, .hi={-__FLT_MAX__, -__FLT_MAX__, -__FLT_MAX__}};

            int count = 0;
            for (uint32_t i = start; i < start + num; i++) {
                aabb_union_(&bounds, &aabbs[inds[i]]);

            node_aabbs[n_id] = bounds;

            float dims[3];
            aabb_get_size(&bounds, dims);
            uint32_t axis = 0;
            if (dims[0] < dims[1]) {
                axis = 1;
            if (dims[1] < dims[2] && dims[0] < dims[2]) {
                axis = 2;

            debugf("dims: (%f, %f, %f) --> axis=%lu\n", dims[0], dims[1], dims[2], axis);

            int compare(const void *a, const void *b)
                float fa = origins[3 * (*(int *)a) + axis];
                float fb = origins[3 * (*(int *)b) + axis];

                if (fa < fb) {
                    return -1;
                else if (fa > fb) {
                    return 1;
                else {
                    return 0;

            qsort(&inds[start], num, sizeof(inds[0]), compare);

            float pos[3];
            aabb_get_center(&bounds, pos);

            uint32_t split = start + num/2;  // median cut by default

            // See if there's an split index that splits the bounding box roughly in two.
            // We ignore the first and last elements because that would be a degenerate split.
            // We'll fall back to default 50/50 split if no a good one was found.
            assert(num >= 1);
            assert(start < max_nodes - 1);
            for (uint32_t i = start + 1; i < start + num - 1; i++) {
                uint32_t idx = inds[i];
                const float* p = &origins[3*idx];
                if (p[axis] >= pos[axis]) {
                    split = i;

            // then compute max radius that holds all spheres inside
            float max_radius = 0.0f;
            for (uint32_t i = start; i < start + num; i++) {
                uint32_t idx = inds[i];
                const float* p = &origins[3*idx];
                float r = radiuses[idx];
                float delta[3] = {p[0] - pos[0], p[1] - pos[1], p[2] - pos[2]};
                float dist = sqrtf(delta[0]*delta[0] + delta[1]*delta[1] + delta[2]*delta[2]);
                max_radius = max(max_radius, dist + r); // must contain sphere origin + its farthest point

            n->pos[0] = pos[0];
            n->pos[1] = pos[1];
            n->pos[2] = pos[2];
            n->radius_sqr = max_radius * max_radius;
            n->flags = 0;
            n->flags |= axis << 2;

            uint32_t left_num = split - start;
            uint32_t right_num = num - left_num;

            if (left_num > 0) {
                n->flags |= BVH_FLAG_LEFT_CHILD;
            if (right_num > 0) {
                n->flags |= BVH_FLAG_RIGHT_CHILD;

            if (left_num > 0) {
                uint32_t left_id = bvh_build_range(start, left_num);
                assert(left_id == n_id + 1);
            if (right_num > 0) {
                // represents an index to nodes[] array
                n->idx = bvh_build_range(split, right_num);

            debugf("  ");
            debugf("   (n_id=%lu)", n_id);

        return n_id;

    // Call the inner function recursively
    bvh_build_range(0, num);

    const uint32_t size = bvh.num_nodes * sizeof(bvh_node_t);
    bvh.nodes = calloc(size, 1);
    memcpy(bvh.nodes, nodes, size);
    bvh.num_leaves = num;
    bvh.node_aabbs = calloc(bvh.num_nodes, sizeof(bvh.node_aabbs[0]));
    memcpy(bvh.node_aabbs, node_aabbs, bvh.num_nodes * sizeof(bvh.node_aabbs[0]));

    *out_bvh = bvh;
    return true;

A more complete example: It doesn’t build as-is though but is standalone.