January 4th, 2024
I wrote a software rasterizer for occlusion culling and hit many small speed bumps along the way. Here I reveal what I’ve learned in the hope of you writing your own with more ease than I did.
This is what the system was for:
This page talks about a depth-only rasterizer for small resolutions, like 160x120 at most. It’s not that different from doing tiled rendering at a higher resolution though.
This stuff will probably only make sense if you’re working on something similar yourself right now. Marcus Svensson’s 2016 thesis is a good place to start if you wish to familiarize yourself with the problem. It has a good overview of this particular technique, of conservative rasterization, and discusses the issue of silhouette extraction.
On to the implementation details.
During triangle setup you need to compute per-pixel differences for Z in X and Y axes. These are used to increment an interpolated depth value when moving from one pixel to the next. I call them the deltas below.
You need to compute the deltas in snapped integer coordinates. If you do it in floating point, the interpolated Z may overflow because you are rendering a different triangle than you set up.
The formula for computing the deltas can be derived in two ways:
The first point of view is commonly taken because it’s simpler. In the end the code ends up being something like this:
// Floating point coordinates but snapped to subpixels already.
= vec3f(v1.x, v1.y, z1) - vec3f(v0.x, v0.y, z0);
vec3f v01 = vec3f(v2.x, v2.y, z2) - vec3f(v0.x, v0.y, z0);
vec3f v02
// Compute the triangle normal (not a unit vector)
= cross3d(v01, v02);
vec3f N
// Now "the deltas" for X and Y increments
float dZdX = -N.x / N.z;
float dZdY = -N.y / N.z;
In which vec3f
is a floating point 3-element vector.
The deltas above can become very large in small triangles. As a workaround I made the triangle setup code use a constant Z if the triangle area was small enough. For rendering occluders it uses the maximum of the vertex depths, and for testing occludees against the Z-buffer I used the minimum respectively. This way the depth tests stay conservative, i.e. never hide objects that should be visible.
The divison in the setup code above is hard to do in fixed point. I just did it with floats and converted to fixed point afterwards. I’m sure there’s some clever way to do it with integers though, for example using a piece-wise linear approximation is suggested in “Small Triangle Barycentrics – Fixed Point” section in Nicolas Guillemot’s 2016 blog post.
If your Z buffer resolution is tiny like mine and you do frustum culling then the guard-band is effectively huge so you can get away with just clipping to near and far planes. See “Guard-band clipping” in this post if you don’t know what I’m talking about.
It’s tempting to skimp on clipping and just discard any occluder that touches the near plane. This is not a profitable strategy because you’ll discard perfectly valid occluders that just happen to pass through the near plane outside the camera view, like this:
The same applies to large objects: you can’t declare them visible just because their bounding box touched the near plane.
Triangles share edges with each other. You should order the edge vertices consistently before intersection with a clipping plane, otherwise you get cracks due to inaccuracies. In practice it’s enough to always flip the edge so that the vertex inside the viewport comes first. See section 8.3.5 “More on Polygon Splitting Robustness” in the Realtime Collision Detection book.
This means either inner-conservative or outer-conservative rasterization where you fill pixels fully inside or all pixels touching the triangle, respectively. See the 2016 thesis mentioned in the beginning.
For occluders shrinking silhouttes is needed for reliability because otherwise they might cover objects that are slightly peeking behind them in the actual render. If this is done with conservative rasterization and not with shrinking occluder geometry, finding silhouette edges is a necessity. See the comparison below.
This feature was an endless source of tiny problems.
In order to shrink or inflate silhouettes, some kind of logic is needed to detect them. It’s straightforward for convex objects if you know how – find all edges that connect a front-facing triangle to a back-facing one – but of course you need a data structure to store this info in. More work for the graphics coder!
Also, clipping often produces new edges, so the triangle neighbor info computed at startup is not valid anymore. In other words, even if an edge was part of the silhouette before clipping, it may not be afterwards and you need to deal with it somehow. I dealt with it by simply not doing conservative rasterization of any clipped triangles :)
Also, outer-conservative rasterization (inflation) will cause interpolants to “overshoot” if you will. Since it by definition fills pixels with their centroids outside the exact triangle bounds, you will end up computing values that are outside of value range of your interpolant. In practice this means a near clipped triangle with a vertex at Z=0 will produce values with negative depths. Good luck storing that in your UINT16 Z-buffer!
The third one is a small one that took me a while to track down. Two-sided triangles can be implemented by flipping vertex order of back-facing triangles but edge identities also change when you do that. If the code earlier had already marked silhouette edges, you need to rotate those flags accordingly. Otherwise back-facing triangles will show cracks because the wrong edges get drawn conservatively.
// Returns a normal vector pointing to the left of line segment ab. In screen space where y axis grows downwards.
(vec2 a, vec2 b)
vec2 get_edge_normal{
return (vec2){ b.y - a.y, -(b.x - a.x) };
}
// See Tomas Akenine-Möller and Timo Aila, "A Simple Algorithm for Conservative and Tiled Rasterization", 2005.
int compute_conservative_edge_bias(vec2 a, vec2 b, bool shrink)
{
// normal points inside the triangle, or the left of line segment 'ab'
= get_edge_normal(a, b);
vec2 n // The division by 2 is here because the results looked better.
int edge_bias = (abs(n.x) + abs(n.y)) / 2;
if (shrink) {
return -edge_bias;
} else {
return edge_bias;
}
}
with
typedef struct vec2_s {
int x, y;
} vec2;
Finally, reverse-Z will not give you better precision you if you have an integer Z-buffer. It’s just a Zreverse = 1 − Z transform and helps to distribute floating point precision more evenly.
Thanks to mankeli for reading the first draft of this essay.
I refer to Fabian Giesen’s blog post listed above. The function I mean is the λ(p)k function for each triangle edge k that returns the interpolated Z for a screen coordinate p. Then the X delta for the first edge would be dZ/dX = λ(p+(1,0))0 − λ(p)0, that’s a finite difference between two pixels.
I wrote some code the verify that I get the same results as with the plane equation route and didn’t simplify the equation algebraically.
Here’s an excerpt with the draw_tri
and
occ_draw_indexed_mesh_flags
functions of my renderer. It’s
in C, for the Nintendo 64, and a bit naive but I left it here to answer
any extremely detailed questions that may have arisen in the critical
Internet reader that is the today’s youth.
#define SUBPIXEL_BITS (2)
#if SUBPIXEL_BITS == 0
#define SUBPIXEL_ROUND_BIAS (0)
#else
#define SUBPIXEL_ROUND_BIAS (1<<(SUBPIXEL_BITS-1))
#endif
#define SUBPIXEL_SCALE (1 << SUBPIXEL_BITS)
#define DELTA_BITS (24) // It seems 24 bits give about 1/8 mean per-pixel error of 16 bits deltas.
#define DELTA_SCALE (1<<DELTA_BITS)
#define FLOAT_TO_FIXED32(f) (int32_t)(f * 0x10000)
#define FLOAT_TO_FIXED32_ROUND(f) (int32_t)(f * 0x10000 + 0.5f)
#define FLOAT_TO_U16(f) (uint16_t)(f * 0x10000)
#define U16_TO_FLOAT(u) ((float)u * 0.0002442002f) // Approximately f/0xffff
#define OCC_MAX_Z (0xffff)
#define ZBUFFER_UINT_PTR_AT(zbuffer, x, y) ((u_uint16_t *)(zbuffer->buffer + (zbuffer->stride * y + x * sizeof(uint16_t))))
#define DUMP_WHEN_Z_OVERFLOWS 1
#define SCREENSPACE_BIAS (-0.50f) // an empirical bias for (x,y) screenspace coordinates to make them cover OpenGL drawn pixels
extern bool config_shrink_silhouettes; // detect edges with flipped viewspace Z signs in each neighbor and add inner conservative flags
extern bool config_discard_based_on_tr_code;
extern bool config_inflate_rough_bounds;
extern bool config_report_near_clip_as_visible; // if queried polygons clip the near plane, always report them as visible
extern int config_force_rough_test_only; // return screen space bounding box test result directly
enum {
= 1,
RASTER_FLAG_BACKFACE_CULL = 1 << 1, // Return when the first depth pass is found
RASTER_FLAG_EARLY_OUT = 1 << 2, // Negative slope bias, nudge closer, minimum per-pixel depth
RASTER_FLAG_NEG_SLOPE_BIAS = 1 << 3, // Positive slope bias, nudge farther, maximum per-pixel depth
RASTER_FLAG_POS_SLOPE_BIAS = 1 << 4, // Round depths to the next higher 16-bit integer. Default is rounding down.
RASTER_FLAG_ROUND_DEPTH_UP = 1 << 5, // Discard any triangle that touches the far plane.
RASTER_FLAG_DISCARD_FAR = 1 << 6, // Actually write to depth buffer
RASTER_FLAG_WRITE_DEPTH = 1 << 7, // Write out visible pixel, combined with EARLY_OUT for testing-only mode
RASTER_FLAG_REPORT_VISIBILITY = 1 << 8, // Inner-conservative rasterization
RASTER_FLAG_SHRINK_EDGE_01 = 1 << 9,
RASTER_FLAG_SHRINK_EDGE_12 = 1 << 10,
RASTER_FLAG_SHRINK_EDGE_20 = 1 << 11, // Outer-conservative rasterization
RASTER_FLAG_EXPAND_EDGE_01 = 1 << 12,
RASTER_FLAG_EXPAND_EDGE_12 = 1 << 13,
RASTER_FLAG_EXPAND_EDGE_20 = 1 << 14,
RASTER_FLAG_RESERVED14 = 1 << 15,
RASTER_FLAG_RESERVED15 };
typedef uint32_t occ_raster_flags_t;
#define RASTER_FLAG_MASK_SHRINK (RASTER_FLAG_SHRINK_EDGE_01 | RASTER_FLAG_SHRINK_EDGE_12 | RASTER_FLAG_SHRINK_EDGE_20)
enum {
= 1,
OCCLUDER_TWO_SIDED = 1 << 1,
OCCLUDER_TEST_FIRST };
typedef uint32_t occ_occluder_flags_t;
#define OCC_RASTER_FLAGS_DRAW (RASTER_FLAG_BACKFACE_CULL | RASTER_FLAG_WRITE_DEPTH |RASTER_FLAG_ROUND_DEPTH_UP)
#define OCC_RASTER_FLAGS_QUERY (RASTER_FLAG_BACKFACE_CULL | RASTER_FLAG_EARLY_OUT | RASTER_FLAG_REPORT_VISIBILITY | RASTER_FLAG_EXPAND_EDGE_01 | RASTER_FLAG_EXPAND_EDGE_12 | RASTER_FLAG_EXPAND_EDGE_20)
static bool isTopLeftEdge(vec2 a, vec2 b)
{
// "In a counter-clockwise triangle, a top edge is an edge that is exactly horizontal
// and goes towards the left, i.e. its end point is left of its start point."
if (a.y == b.y && b.x < a.x) return true;
// "In a counter-clockwise triangle, a left edge is an edge that goes down,
// i.e. its end point is strictly below its start point."
// But note that on the screen Y axis grows downwards so we check if the end point
// Y coordinate is greater than start point Y.
if (b.y > a.y) return true;
return false;
}
static float orient2df(float* a, float* b, float* c)
{
return ((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]));
}
static int orient2d(vec2 a, vec2 b, vec2 c)
{
return ((b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x));
}
static int orient2d_subpixel(vec2 a, vec2 b, vec2 c)
{
// We multiply two I.F fixed point numbers resulting in (I-F).2F format,
// so we shift by F to the right to get the the result in I.F format again.
return orient2d(a, b, c) >> SUBPIXEL_BITS;
}
// Returns a normal vector pointing to the left of line segment ab. In screen space where y axis grows downwards.
static vec2 get_edge_normal(vec2 a, vec2 b)
{
return (vec2){ b.y - a.y, -(b.x - a.x) };
}
static int compute_conservative_edge_bias(vec2 a, vec2 b, bool shrink)
{
// See Tomas Akenine-Möller and Timo Aila, "A Simple Algorithm for Conservative and Tiled Rasterization", 2005.
= get_edge_normal(a, b); // normal points inside the triangle, or the left of line segment 'ab'
vec2 n int edge_bias = (abs(n.x) + abs(n.y)) * 0.75; // factor chosen empirically to keep low-rez pixels inside high-rez RDP bounds
if (shrink) {
return -edge_bias;
} else {
return edge_bias;
}
}
void draw_tri(
,
vec2f v0f,
vec2f v1f,
vec2f v2ffloat Z0f,
float Z1f,
float Z2f,
,
occ_raster_flags_t flags* result,
occ_raster_query_result_t*zbuffer)
surface_t {
const bool super_verbose = false;
// The screenspace bias is added to empirically match SW depth map pixels to hi-rez RDP picture.
= {SUBPIXEL_SCALE * (v0f.x+SCREENSPACE_BIAS) + 0.5f, SUBPIXEL_SCALE * (v0f.y+SCREENSPACE_BIAS) + 0.5f};
vec2 v0 = {SUBPIXEL_SCALE * (v1f.x+SCREENSPACE_BIAS) + 0.5f, SUBPIXEL_SCALE * (v1f.y+SCREENSPACE_BIAS) + 0.5f};
vec2 v1 = {SUBPIXEL_SCALE * (v2f.x+SCREENSPACE_BIAS) + 0.5f, SUBPIXEL_SCALE * (v2f.y+SCREENSPACE_BIAS) + 0.5f};
vec2 v2
int area2x = -orient2d_subpixel(v0, v1, v2);
if (false) {
("area2x int: %d\n", area2x);
debugf// debugf("minb: (%d, %d), maxb: (%d, %d), size: %dx%d\n", minb.x, minb.y, maxb.x, maxb.y, maxb.x-minb.x, maxb.y-minb.y);
("v0f: (%f, %f), v1f: (%f, %f), v2f: (%f, %f)\n",
debugf.x, v0f.y, v1f.x, v1f.y, v2f.x, v2f.y);
v0f("v0: (%d, %d), v1: (%d, %d), v2: (%d, %d)\n",
debugf.x, v0.y, v1.x, v1.y, v2.x, v2.y);
v0
// debugf("v01: (%f, %f, %f), v02: (%f, %f, %f)\n", v01.x, v01.y, v01.z, v02.x, v02.y, v02.z);
}
if (area2x <= 0) {
if (flags & RASTER_FLAG_BACKFACE_CULL) {
return;
} else if (area2x != 0) {
(v1, v2);
SWAP(Z1f, Z2f);
SWAP= -area2x;
area2x // Swap shrink flags now that edge identities are different
= flags & ~RASTER_FLAG_MASK_SHRINK;
occ_raster_flags_t new |= (flags & RASTER_FLAG_SHRINK_EDGE_01) << 2; // swap 01 -> 02=20
new |= (flags & RASTER_FLAG_SHRINK_EDGE_20) >> 2; // swap 20 -> 10=01
new = new;
flags } else {
return;
}
}
= {
vec2 minb (min(v0.x, min(v1.x, v2.x)) >> SUBPIXEL_BITS),
(min(v0.y, min(v1.y, v2.y)) >> SUBPIXEL_BITS)
};
= {
vec2 maxb ((max(v0.x, max(v1.x, v2.x)) + SUBPIXEL_SCALE-1) >> SUBPIXEL_BITS),
((max(v0.y, max(v1.y, v2.y)) + SUBPIXEL_SCALE-1) >> SUBPIXEL_BITS)
};
if (minb.x < 0) minb.x = 0;
if (minb.y < 0) minb.y = 0;
if (maxb.x > zbuffer->width - 1) maxb.x = zbuffer->width - 1;
if (maxb.y > zbuffer->height - 1) maxb.y = zbuffer->height - 1;
if (minb.x >= maxb.x || minb.y >= maxb.y) return;
= { minb.x << SUBPIXEL_BITS, minb.y << SUBPIXEL_BITS };
vec2 p_start
int A01 = -(v0.y - v1.y), B01 = -(v1.x - v0.x);
int A12 = -(v1.y - v2.y), B12 = -(v2.x - v1.x);
int A20 = -(v2.y - v0.y), B20 = -(v0.x - v2.x);
int w0_row = -orient2d_subpixel(v1, v2, p_start);
int w1_row = -orient2d_subpixel(v2, v0, p_start);
int w2_row = -orient2d_subpixel(v0, v1, p_start);
int bias0 = isTopLeftEdge(v1, v2) ? 0 : -1;
int bias1 = isTopLeftEdge(v2, v0) ? 0 : -1;
int bias2 = isTopLeftEdge(v0, v1) ? 0 : -1;
// Adjust edge functions based on triangle edge normals.
(!((flags & RASTER_FLAG_SHRINK_EDGE_12) && (flags & RASTER_FLAG_EXPAND_EDGE_12)));
assert(!((flags & RASTER_FLAG_SHRINK_EDGE_20) && (flags & RASTER_FLAG_EXPAND_EDGE_20)));
assert(!((flags & RASTER_FLAG_SHRINK_EDGE_01) && (flags & RASTER_FLAG_EXPAND_EDGE_01)));
assert
if (flags & (RASTER_FLAG_SHRINK_EDGE_12 | RASTER_FLAG_EXPAND_EDGE_12)) {
+= compute_conservative_edge_bias(v1, v2, flags & RASTER_FLAG_SHRINK_EDGE_12);
bias0 }
if (flags & (RASTER_FLAG_SHRINK_EDGE_20 | RASTER_FLAG_EXPAND_EDGE_20)) {
+= compute_conservative_edge_bias(v2, v0, flags & RASTER_FLAG_SHRINK_EDGE_20);
bias1 }
if (flags & (RASTER_FLAG_SHRINK_EDGE_01 | RASTER_FLAG_EXPAND_EDGE_01)) {
+= compute_conservative_edge_bias(v0, v1, flags & RASTER_FLAG_SHRINK_EDGE_01);
bias2 }
+= bias0;
w0_row += bias1;
w1_row += bias2;
w2_row
// Prepare Z deltas
// Prepare inputs to a formula solved via a 3D plane equation with subpixel XY coords and Z.
// See https://tutorial.math.lamar.edu/classes/calciii/eqnsofplanes.aspx
// and the "Interpolated values" section at
// https://fgiesen.wordpress.com/2011/07/08/a-trip-through-the-graphics-pipeline-2011-part-7/
= vec3f_sub((vec3f){v1.x, v1.y, Z1f}, (vec3f){v0.x, v0.y, Z0f});
vec3f v01 = vec3f_sub((vec3f){v2.x, v2.y, Z2f}, (vec3f){v0.x, v0.y, Z0f});
vec3f v02
= cross3df(v01, v02);
vec3f N .z *= inv_subpixel_scale; // Scale back the fixed point scale multiply inside cross3df
N
// N is now in subpixel scale, divide again to bring it to per-pixel scale
.x *= inv_subpixel_scale;
N.y *= inv_subpixel_scale;
N.z *= inv_subpixel_scale;
Nfloat dZdx = -N.x / N.z;
float dZdy = -N.y / N.z;
// Compute Z value at the starting pixel at (minb.x, minb.y). It's computed by extrapolating the Z value at vertex 0.
float Zf_row = Z0f
+ ((minb.x * SUBPIXEL_SCALE - v0.x) / SUBPIXEL_SCALE) * dZdx
+ ((minb.y * SUBPIXEL_SCALE - v0.y) / SUBPIXEL_SCALE) * dZdy;
// Write a constant depth for small triangles.
// It's a a workaround for small triangle delta precision problems.
// Small triangle area is less than one pixel^2.
const int min_triangle_area = SUBPIXEL_SCALE * SUBPIXEL_SCALE * 2;
const bool is_small = abs(area2x) < min_triangle_area;
if (is_small) {
= 0.f;
dZdx = 0.f;
dZdy if (flags & RASTER_FLAG_WRITE_DEPTH) {
// Writing a depth? Make it conservative by writing farthest depth.
= max(Z0f, max(Z1f, Z2f));
Zf_row } else {
// Testing the depth? Test only the closest vertex value.
= min(Z0f, min(Z1f, Z2f));
Zf_row }
}
if (false) {
("area2x: %d\n", area2x);
debugf("Z0f: %f, Z1f: %f, Z2f: %f\n", Z0f, Z1f, Z2f);
debugf("Zf_row: %f, dZdx: %f, dZdy: %f\n", Zf_row, dZdx, dZdy);
debugf("minb: (%d, %d), maxb: (%d, %d), size: %dx%d\n", minb.x, minb.y, maxb.x, maxb.y, maxb.x - minb.x, maxb.y - minb.y);
debugf("v0f: (%f, %f), v1f: (%f, %f), v2f: (%f, %f)\n", v0f.x, v0f.y, v1f.x, v1f.y, v2f.x, v2f.y);
debugf("v0: (%d, %d), v1: (%d, %d), v2: (%d, %d)\n", v0.x, v0.y, v1.x, v1.y, v2.x, v2.y);
debugf("v01: (%f, %f, %f), v02: (%f, %f, %f)\n", v01.x, v01.y, v01.z, v02.x, v02.y, v02.z);
debugf}
// Fixed point deltas for the integer-only inner loop. We use DELTA_BITS of precision.
int32_t Z_row_fixed = (int32_t)(DELTA_SCALE * Zf_row);
int32_t dZdx_fixed = (int32_t)(DELTA_SCALE * dZdx);
int32_t dZdy_fixed = (int32_t)(DELTA_SCALE * dZdy);
int32_t max_Z_fixed = (int32_t)(DELTA_SCALE * 1.0f) - 1;
// Problem: Negative biases make queries super conservative and you can see objects behind walls!
if (flags & RASTER_FLAG_NEG_SLOPE_BIAS) {
// We can make sure each interpolated depth value is less or equal to the actual triangle min depth with this bias.
// Note: this will produce values one-pixel's worth under the triangle's original depth range.
// See MaxDepthSlope in https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#DepthBias
-= max(abs(dZdx_fixed), abs(dZdy_fixed));
Z_row_fixed } else if (flags & RASTER_FLAG_POS_SLOPE_BIAS) {
+= max(abs(dZdx_fixed), abs(dZdy_fixed));
Z_row_fixed }
(!((flags & RASTER_FLAG_NEG_SLOPE_BIAS) && (flags & RASTER_FLAG_POS_SLOPE_BIAS))); // Mutually exclusive flags
assert
if (super_verbose) {
("Z_row = %f, Z_row_fixed = %ld\n", Zf_row, Z_row_fixed);
debugf}
// Only 'p', 'minb' and 'maxb' are in whole-pixel coordinates here. Others are all in sub-pixel coordinates.
= {-1, -1};
vec2 p
for (p.y = minb.y; p.y <= maxb.y; p.y++) {
int32_t w0 = w0_row;
int32_t w1 = w1_row;
int32_t w2 = w2_row;
int32_t Z_incr_fixed = Z_row_fixed;
for (p.x = minb.x; p.x <= maxb.x; p.x++) {
if (super_verbose) {
("| %s (%-2d, %-2d) %-8f ", (w0 | w1 | w2) >= 0 ? "X" : ".", p.x, p.y, Z_incr_fixed/(float)DELTA_SCALE);
debugf}
if ((w0 | w1 | w2) >= 0) {
int bias = 0;
if (flags & RASTER_FLAG_ROUND_DEPTH_UP) {
= (1 << DELTA_BITS) - 1;
bias }
uint16_t depth;
if (Z_incr_fixed < max_Z_fixed) {
= (Z_incr_fixed + bias) >> (DELTA_BITS - 16);
depth } else {
= 0xffff - 1;
depth }
*buf = ZBUFFER_UINT_PTR_AT(zbuffer, p.x, p.y);
u_uint16_t
if (depth < *buf) {
if (flags & RASTER_FLAG_WRITE_DEPTH) {
*buf = depth;
}
if (flags & RASTER_FLAG_REPORT_VISIBILITY) {
(result);
assert->visible = true;
result->x = p.x;
result->y = p.y;
result->depth = depth;
resultif (g_verbose_early_out) {
("\nvisible (%d < %d) at (%d, %d), v0=(%d,%d)\n", depth, *buf, p.x, p.y, v0.x>>SUBPIXEL_BITS, v0.y>>SUBPIXEL_BITS);
debugf}
if (super_verbose) {
("Z_incr_fixed: %ld\n", Z_incr_fixed);
debugf("relative: (%d, %d)\n", p.x - minb.x, p.y - minb.y);
debugf("minb: (%d, %d), maxb: (%d, %d), size: %dx%d\n", minb.x, minb.y, maxb.x, maxb.y, maxb.x-minb.x, maxb.y-minb.y);
debugf("z0f: %f, z1f: %f, z2f: %f\n", Z0f, Z1f, Z2f);
debugf("v0f: (%f, %f), v1f: (%f, %f), v2f: (%f, %f)\n",
debugf.x, v0f.y, v1f.x, v1f.y, v2f.x, v2f.y);
v0f("v0: (%d, %d), v1: (%d, %d), v2: (%d, %d)\n",
debugf.x, v0.y, v1.x, v1.y, v2.x, v2.y);
v0
("v01: (%f, %f, %f), v02: (%f, %f, %f)\n",
debugf.x, v01.y, v01.z, v02.x, v02.y, v02.z);
v01("N: (%f, %f, %f)\n",
debugf.x, N.y, N.z);
N("dZdx, dZdy: (%f, %f)\n", dZdx, dZdy);
debugf("dZdx_fixed, dZdy_fixed: (%ld, %ld)\n", dZdx_fixed, dZdy_fixed);
debugf
("zf_row2: %f\n", Zf_row);
debugffloat lambda0 = (float)(w0 - bias0) / area2x;
float lambda1 = (float)(w1 - bias1) / area2x;
float lambda2 = (float)(w2 - bias2) / area2x;
float Zf_bary = lambda0 * Z0f + lambda1 * Z1f + lambda2 * Z2f;
("Zf_bary = %f; L0=%f; L1=%f; L2=%f;\n", Zf_bary, lambda0, lambda1, lambda2);
debugf("w0=%ld; w1=%ld; w2=%ld;\n", w0, w1, w2);
debugf
("A01 = %d; B01 = %d\n", A01, B01);
debugf("A12 = %d; B12 = %d\n", A12, B12);
debugf("A20 = %d; B20 = %d\n", A20, B20);
debugf("area2x: %d\n", area2x);
debugf("\n");
debugf("set_grid(min=(%d, %d), max=(%d, %d))\n", minb.x, minb.y, maxb.x+1, maxb.y+1);
debugf("draw_grid()\n");
debugf("draw_tri([(%f, %f), (%f, %f), (%f, %f)])\n",
debugf.x, v0f.y, v1f.x, v1f.y, v2f.x, v2f.y);
v0f
while (true) {}; // HACK
}
}
if (flags & RASTER_FLAG_EARLY_OUT) {
return;
}
}
}
+= A12;
w0 += A20;
w1 += A01;
w2 += dZdx_fixed;
Z_incr_fixed }
+= B12;
w0_row += B20;
w1_row += B01;
w2_row += dZdy_fixed;
Z_row_fixed
if (super_verbose) { debugf("\n"); }
}
}
#define OCC_MAX_MESH_VERTEX_COUNT (24) // enough for a cube with duplicated verts
#define OCC_MAX_MESH_INDEX_COUNT (30)
float matrix_mult_z_only(const matrix_t *m, const float *v)
{
return m->m[0][2] * v[0] + m->m[1][2] * v[1] + m->m[2][2] * v[2] + m->m[3][2] * v[3];
}
void occ_draw_indexed_mesh_flags(occ_culler_t *occ, surface_t *zbuffer, const matrix_t *model_xform, const occ_mesh_t* mesh,
* cached_cs_verts,
cpu_vtx_tuint16_t* tri_neighbors, occ_target_t* target, const occ_raster_flags_t flags,
* query_result)
occ_raster_query_result_t{
(zbuffer->width == occ->viewport.width);
assert(zbuffer->height == occ->viewport.height);
assert
if (query_result) {
->visible = false;
query_result}
// Transform all vertices first
(REGION_TRANSFORM);
prof_begin
(REGION_TRANSFORM_MVP);
prof_begin* mvp = &occ->mvp;
matrix_t* modelview = NULL;
matrix_t, modelview_new;
matrix_t mvp_new
if (!model_xform) {
// No per-object global transform: use defaults
//debugf("using default for mesh %p\n", mesh);
= &occ->mvp;
mvp = &occ->view_matrix;
modelview } else {
// Otherwise compute new ModelView and MVP matrices
//debugf("computing for mesh %p\n", mesh);
//print_matrix(model_xform);
= &mvp_new;
mvp = &modelview_new;
modelview (mvp, &occ->mvp, model_xform);
matrix_mult_full(modelview, &occ->view_matrix, model_xform);
matrix_mult_full}
(REGION_TRANSFORM_MVP);
prof_end
(REGION_TRANSFORM_DRAW);
prof_begin
[OCC_MAX_MESH_VERTEX_COUNT];
cpu_vtx_t all_vertsbool tri_faces_camera[OCC_MAX_MESH_INDEX_COUNT];
bool tri_faces_camera_has_data = false;
for (uint32_t i = 0; i < mesh->num_vertices; i++) {
[i].obj_attributes.position[0] = mesh->vertices[i].position[0];
all_verts[i].obj_attributes.position[1] = mesh->vertices[i].position[1];
all_verts[i].obj_attributes.position[2] = mesh->vertices[i].position[2];
all_verts[i].obj_attributes.position[3] = 1.0f; // Q: where does cpu pipeline set this?
all_verts
if (cached_cs_verts) {
[i] = cached_cs_verts[i];
all_verts} else {
(&all_verts[i], mvp);
cpu_vertex_pre_tr}
(&all_verts[i]);
cpu_vertex_calc_screenspace}
int num_tris = mesh->num_indices/3;
// Compute signed triangle areas only for occluders
if (tri_neighbors && config_shrink_silhouettes) {
for (int i = 0; i < num_tris; i++) {
float *a = &all_verts[mesh->indices[i * 3 + 0]].screen_pos[0];
float *b = &all_verts[mesh->indices[i * 3 + 1]].screen_pos[0];
float *c = &all_verts[mesh->indices[i * 3 + 2]].screen_pos[0];
float area = -orient2df(a, b, c);
[i] = area > 0;
tri_faces_camera// debugf("tri %d facing: %d with area: %f\n", i, tri_faces_camera[i], area);
}
= true;
tri_faces_camera_has_data }
(REGION_TRANSFORM_DRAW);
prof_end(REGION_TRANSFORM);
prof_end
int num_tris_drawn = 0;
int ofs = 0;
if (target) {
= target->last_visible_idx; // start from where we last found a visible pixel
ofs ->last_visible_idx = 0;
target}
for (int draw_idx = 0; draw_idx < mesh->num_indices; draw_idx += 3) {
int idx = (draw_idx + ofs) % mesh->num_indices; // start from 'ofs' but render the whole mesh
const uint16_t *inds = &mesh->indices[idx];
[3] = {all_verts[inds[0]], all_verts[inds[1]], all_verts[inds[2]]};
cpu_vtx_t verts[CLIPPING_CACHE_SIZE];
cpu_vtx_t clipping_cache= {.count = 0};
cpu_clipping_list_t clipping_list
if (g_verbose_setup) {
("pos=(%f, %f, %f, %f), cs_pos=(%f, %f, %f, %f), tr_code=%d\n",
debugf[0].obj_attributes.position[0],
verts[0].obj_attributes.position[1],
verts[0].obj_attributes.position[2],
verts[0].obj_attributes.position[3],
verts[0].cs_pos[0],
verts[0].cs_pos[1],
verts[0].cs_pos[2],
verts[0].cs_pos[3],
verts[0].tr_code);
verts("screen_pos: (%f, %f), depth=%f, inv_w=%f\n",
debugf[0].screen_pos[0],
verts[0].screen_pos[1],
verts[0].depth,
verts[0].inv_w);
verts}
uint8_t tr_codes = 0xff;
&= verts[0].tr_code;
tr_codes &= verts[1].tr_code;
tr_codes &= verts[2].tr_code;
tr_codes
// Trivial rejection
if (tr_codes) {
continue;
}
uint32_t edge_flag_mask = 0;
int tri_idx = idx/3;
if (false) {
// Workaround: Disable early backface culling because it seems unreliable near the screen edges.
if (flags & RASTER_FLAG_BACKFACE_CULL) {
(tri_faces_camera_has_data);
assertif (!tri_faces_camera[tri_idx]) {
continue;
}
}
if (!(flags & RASTER_FLAG_BACKFACE_CULL)) {
("is=%d, tri_idx=%d, faces=%d\n", draw_idx, tri_idx, tri_faces_camera[tri_idx]);
debugf}
}
if (tri_neighbors && config_shrink_silhouettes) {
(tri_faces_camera_has_data);
assert
// Silhouette edges join triangles that face different directions
bool this_facing = tri_faces_camera[tri_idx];
for (int j = 0; j < 3; j++) {
uint16_t other = tri_neighbors[tri_idx * 3 + j];
if (other != OCC_NO_EDGE_NEIGHBOR) {
if (this_facing != tri_faces_camera[other]) {
|= (RASTER_FLAG_SHRINK_EDGE_01 << j);
edge_flag_mask break;
}
}
}
}
const bool clips_near = (verts[0].tr_code | verts[1].tr_code | verts[2].tr_code) & (1 << NEAR_PLANE_INDEX);
const bool clips_far = (verts[0].tr_code | verts[1].tr_code | verts[2].tr_code) & (1 << FAR_PLANE_INDEX);
if (config_report_near_clip_as_visible) {
// If we are drawing for a precise query and we hit a near clip: report as visible
if (clips_near && (flags & RASTER_FLAG_REPORT_VISIBILITY) && !(flags & RASTER_FLAG_WRITE_DEPTH)) {
(query_result && "must pass in a non-NULL query_result if asking for a visibility result");
assertif (query_result) {
->visible = true;
query_result->num_tris_drawn = num_tris_drawn;
query_resultreturn;
}
}
}
if (clips_near && g_near_clipping_action == CLIP_ACTION_REJECT) {
continue;
}
(g_near_clipping_action == CLIP_ACTION_DO_IT);
assert
if (clips_near || clips_far) {
// tr_code = clip against screen bounds, used for rejection
// clip_code = clipped against guard bands, used for actual clipping
//
// We clip only against the near and far plane so they are the same.
[0].clip_code = verts[0].tr_code;
verts[1].clip_code = verts[1].tr_code;
verts[2].clip_code = verts[2].tr_code;
verts
const uint8_t plane_mask = (1 << NEAR_PLANE_INDEX) | (1 << FAR_PLANE_INDEX);
(&verts[0], &verts[1], &verts[2], plane_mask, clipping_cache, &clipping_list);
cpu_gl_clip_triangle
// Workaround: Don't shrink edges. Clipping invalidates silhoutte checks we made above so just rasterize normally
// to avoid cracks. This hack makes occluders cover too many pixels which may lead to culling errors.
= 0;
edge_flag_mask }
if (clips_far && (flags & RASTER_FLAG_DISCARD_FAR)) {
// Reject triangles that touch the far clip plane when rendering occluders. We can't store their farthest depth anyway.
continue;
}
if (clipping_list.count == 0) {
(REGION_RASTERIZATION);
prof_begin(
draw_tri(vec2f){verts[0].screen_pos[0], verts[0].screen_pos[1]},
(vec2f){verts[1].screen_pos[0], verts[1].screen_pos[1]},
(vec2f){verts[2].screen_pos[0], verts[2].screen_pos[1]},
[0].depth, verts[1].depth, verts[2].depth,
verts| edge_flag_mask, query_result,
flags );
zbuffer(REGION_RASTERIZATION);
prof_end++;
num_tris_drawn} else {
(REGION_RASTERIZATION);
prof_beginfor (uint32_t i = 1; i < clipping_list.count; i++) {
[3];
vec2f sv[0].x = clipping_list.vertices[0]->screen_pos[0];
sv[0].y = clipping_list.vertices[0]->screen_pos[1];
sv[1].x = clipping_list.vertices[i - 1]->screen_pos[0];
sv[1].y = clipping_list.vertices[i - 1]->screen_pos[1];
sv[2].x = clipping_list.vertices[i]->screen_pos[0];
sv[2].y = clipping_list.vertices[i]->screen_pos[1];
sv
(
draw_tri[0], sv[1], sv[2],
sv.vertices[0]->depth, clipping_list.vertices[i - 1]->depth, clipping_list.vertices[i]->depth,
clipping_list| edge_flag_mask, query_result,
flags );
zbuffer++;
num_tris_drawn
if ((flags & RASTER_FLAG_EARLY_OUT) && query_result && query_result->visible) {
break;
}
}
(REGION_RASTERIZATION);
prof_end}
if (query_result) {
->num_tris_drawn = num_tris_drawn;
query_resultif (g_verbose_visibility_tracking) {
("visible: %d\n", query_result->visible);
debugf}
}
// Early out from all triangles even if only one of them was visible
if ((flags & RASTER_FLAG_EARLY_OUT) && query_result && query_result->visible) {
->last_visible_idx = idx;
targetif (target && g_verbose_visibility_tracking) {
("was visible at wrapped_is = %d = (%d+%d) %% %lu\n", idx, draw_idx, ofs, mesh->num_indices);
debugf}
return;
}
}
}