Acceleration Datastructures for Raytracing
"Raytracing is the future . . . and always will be!"
- attributed to David Kirk (wording varies[1])
In raytracing, we trace a ray out into the scene and figure out what it hits. The problem is that there are usually many rays to trace and many objects in the scene that each could potentially hit, and so it would be far too slow to test every ray against every object.
Instead, we use an acceleration structure: a datastructure that organizes the scene so that each ray needs to be tested against only a small fraction of the objects.
Many acceleration structures are essentially search trees, often binary search trees in theory if not in practice. Just as a binary search reduces the number of tests on \(O(n)\) items down to \(O(\log(n))\), an acceleration structure can often also give a logarithmic theoretical speedup.
In fact, one of the main reasons people say that raytracing is the future is that rendering \(n\) primitives with raytracing can require as little as \(O(\log(n))\) time, while rasterization always requires \(O(n)\) time. It's currently a bit difficult to see this advantage in practice, because traversal requires incoherent, out-of-order memory access, which is slow, and because production rasterization systems are both highly optimized and use software culling, but the potential is there.
There are many kinds of acceleration structures used for raytracing, but it's a bit hard to keep track of them, what properties they have, and what variants are interesting. This page aims to list at least the main categories and primary variants of raytracing acceleration structures, to get the graphics practitioner up-to-speed. (Ideally, it would be an exhaustive list of all variants, but this is a much taller task[2].) Corrections and contributions are welcome.
Practical Advice
First, some practical advice. If you are considering implementing your own acceleration structure . . . don't. They are difficult to get correct, and much harder still to make performant[3]. Moreover, it's not the fun part of writing a renderer and, because there are mature, off-the-shelf solutions for both CPU and GPU, there's no point.
If you're doing it for the learning experience, then well and good—just so long as you realize you'll later have to scrap it and use something real to get normal performance. Also, I suppose, you should implement your own acceleration structures if you're doing academic research in the field—but then, in such a case, you would already be an expert and you wouldn't need this article 🙂
On the CPU, the best option is currently the Intel Embree library. You give it your primitives and it builds an advanced BVH over them (they're a little vague on precisely what kind, but it seems to be a multi-level SAH-heuristic BVH, with tree rotations and collapsed to wider (4 or 8 arity) nodes). The traversal supports SIMD kernels, ray packets, and even GPU execution through SYCL.
On the GPU, there are several viable options. You can get cross-platform hardware support for raytracing through Vulkan. Direct3D too, if you only care about Windows. NVIDIA's OptiX[4] is probably the easiest, but it is NVIDIA-only. The particulars of how each vendor's hardware accelerates the ray / primitive intersections are not publicly known, and in any case probably change slightly from GPU to GPU, but all of these APIs give you access to it.
Without further ado, a taxonomy of acceleration structures!
Regular Grid
Perhaps the oldest and simplest acceleration structure: you partition the primitives of the world into the cells of a 3D grid, and then walk the ray through the grid, cell by cell, testing all the primitives contained in each cell.
Although they have fallen out of favor, regular grids are actually surprisingly good. The stepping through cells is coherent and streamable in memory, which is the main deficiency of tree-based methods. They are especially convenient on the GPU; one reason is that rebuilding them is easy (you can build a grid by rasterizing the scene[5], using the raster hardware).
There are a few disadvantages. Without hardware support, the stepping is a little obnoxious to implement. One issue there is that primitives usually span multiple cells. You have to account for this to avoid duplicate primitive tests (one reasonable way for static scenes and certain primitive types is to slice any such primitives so each new sub-primitive falls only in a single cell).
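To make the stepping concrete, here is a minimal sketch of the cell walk in Python, a 3D DDA in the style of Amanatides & Woo. The function name is mine, and it assumes for simplicity that the ray origin already lies inside the grid:

```python
import math

def grid_traverse(origin, direction, grid_min, cell_size, grid_res):
    """Walk a ray through a regular grid, yielding cell indices in visit order.
    Assumes the ray origin is already inside the grid bounds."""
    # Index of the cell containing the origin.
    cell = [int((origin[i] - grid_min[i]) / cell_size[i]) for i in range(3)]
    step, t_max, t_delta = [0] * 3, [math.inf] * 3, [math.inf] * 3
    for i in range(3):
        if direction[i] > 0:
            step[i] = 1
            next_bound = grid_min[i] + (cell[i] + 1) * cell_size[i]
        elif direction[i] < 0:
            step[i] = -1
            next_bound = grid_min[i] + cell[i] * cell_size[i]
        else:
            continue  # ray parallel to this axis; it never crosses a boundary
        t_max[i] = (next_bound - origin[i]) / direction[i]   # t of first crossing
        t_delta[i] = cell_size[i] / abs(direction[i])        # t between crossings
    while all(0 <= cell[i] < grid_res[i] for i in range(3)):
        yield tuple(cell)
        # Advance along the axis whose cell boundary is crossed first.
        axis = min(range(3), key=lambda i: t_max[i])
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
```

Note that this visits cells strictly in ray order, which is what lets you stop at the first confirmed hit.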
The main issue with regular grids is the 'teapot in a stadium' problem: you have a complicated object (the teapot) placed into a large scene (the stadium) with, say, the camera being close to the complicated object. Most of the rays hit the complicated object, but any grid of reasonable resolution that is spatially large enough to be able to handle the full scene will also have cells too large to accelerate hit tests on the complicated object. Fortunately, this can be addressed by nesting.
Nested Regular Grids
A nested regular grid is what it sounds like: inside some of the cells of a regular grid, you have another regular grid, of finer resolution. This fits into the common trope of having a coarse/broadphase acceleration structure and a fine/narrowphase acceleration structure. You can continue recursively, of course.
Octree
An octree is the extreme case of a recursive nested grid, where the grid resolution is fixed to \(2\) on a side.
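Because the split point is always the cell center, finding which child contains a point is just one comparison per axis. A hypothetical helper (the name and bit layout are my own convention):

```python
def octree_child(point, center):
    """Index (0-7) of the octree child containing `point`, given the
    parent cell's center: one bit per axis (x high, z low)."""
    return (int(point[0] >= center[0]) << 2) | \
           (int(point[1] >= center[1]) << 1) | \
            int(point[2] >= center[2])
```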
Octrees were popular in the early days of raytracing, but have since become unpopular for production rendering, basically because they combine the worst aspects of trees and grids: they reduce the (linear) space only by a factor of \(2\) with each level, so you usually need many recursion levels to cut the space down enough, and the grids don't align nicely to primitives so you still need splitting. For some reason, they do remain a popular topic in research graphics, however.
The basic \(2\times 2\times 2\) octree wastes the fewest nodes, but the nodes are near-trivial and the high recursion depth is really detrimental to performance. Collapsing a level (i.e., a tree whose nodes each subdivide into \(4\times 4\times 4\) cells) seems too wide for brute-force testing in software, but looks like a good choice for hardware.
Binary Space Partition Tree
A binary space partition tree (BSP tree) is the general category for a binary tree where the child nodes are divided by a splitting plane. For example, we can select a triangle from the scene, slice the scene in half by the triangle's plane, and sort the remaining triangles into one half or the other based on which side of the plane they fall on. Then proceed recursively, slicing until you run out of primitives.
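The sorting step can be sketched in a few lines of Python. The plane representation and names here are my own; a real builder would also actually split the straddling triangles rather than just flagging them:

```python
def classify(plane, point, eps=1e-6):
    """Signed side of `point` relative to the plane (n, d), n.x + d = 0."""
    n, d = plane
    s = sum(a * b for a, b in zip(n, point)) + d
    return 0 if abs(s) < eps else (1 if s > 0 else -1)

def sort_triangle(plane, tri):
    """Which child does a triangle belong to? 'front', 'back',
    'coplanar', or 'straddling' (the last needs splitting)."""
    sides = {classify(plane, v) for v in tri}
    if sides <= {0}:
        return "coplanar"
    if 1 in sides and -1 in sides:
        return "straddling"
    return "front" if 1 in sides else "back"
```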
Although BSP trees were developed for use in graphics, and were made infamous for that purpose by e.g. Quake, they are not very popular today in this general form. One problem is that refitting a dynamic scene is difficult. Another is that, like grids, primitives can straddle the splitting planes, needing to be split.
\(k\)-d Tree
A \(k\)-dimensional tree (\(k\)-d tree) is basically a BSP tree with the restriction that the planes must be axis-aligned. This resolves some of the major problems of BSP trees (to the extent that people who say "BSP tree" in graphics usually mean "\(k\)-d tree"[6]).
For example, to separate coplanar polygons, a vanilla BSP tree must pick a splitting plane not belonging to any of them. This goes against the typical build algorithm of a BSP tree, and at the very least it requires potentially expensive detection of this case. Whereas, a \(k\)-d tree will always pick an axis-aligned plane, and will eventually be able to separate at least some of the polygons. The node size is smaller, too, because the plane's orientation is simpler.
\(k\)-d trees were once very popular in high-performance raytracing. In the scientific literature, \(k\)-d trees and BVHes (see below) were in competition. According to my source, BVHes had a low build time but were less performant to raytrace, while \(k\)-d trees had a high build time and were fast to raytrace. Whether this is true or not, BVHes eventually pulled ahead, and are now preferred.
However, the \(k\)-d tree has not been forgotten, and it remains useful for related rendering tasks. The main application today is for storing photons in photon mapping (and related / descended techniques), because \(k\)-d trees support efficient \(k\)-nearest neighbors (\(k\)-NN) lookup[7].
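To show why the \(k\)-NN case is a good fit, here is a toy median-split build and query in Python. The dictionary-based nodes are purely for illustration (real implementations use flat arrays), and the pruning rule is the standard one: skip a subtree when the splitting plane is farther than the current \(k\)-th best distance.

```python
import heapq

def build_kdtree(points, depth=0):
    """Toy k-d tree over 3D points: median split, cycling axes."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def knn(node, query, k, heap=None):
    """Return (neg_dist2, point) pairs for the k nearest stored points.
    Uses a max-heap (negated distances) to track the current best k."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    d2 = sum((a - b) ** 2 for a, b in zip(node["point"], query))
    if len(heap) < k:
        heapq.heappush(heap, (-d2, node["point"]))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, node["point"]))
    axis = node["axis"]
    delta = query[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if delta < 0 else \
                (node["right"], node["left"])
    knn(near, query, k, heap)
    # Only cross the splitting plane if it could hide a closer point.
    if len(heap) < k or delta * delta < -heap[0][0]:
        knn(far, query, k, heap)
    return heap
```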
Bounding Volume Hierarchy (BVH)
A bounding volume hierarchy (BVH) is by far the most popular basic category of raytracing acceleration structure. As you'd expect from the name, a BVH is a hierarchy of bounding volumes. Each node comprises a bounding volume and a list of children, such that the node's children are bounded by the node's bounding volume.
The idea is best explained from a bottom-up approach, by example: you wrap each primitive with an axis-aligned bounding box (AABB). Then you group pairs of nearby and potentially overlapping AABBs into a new (probably larger) AABB. You continue upward until you only have one AABB, the root. At runtime, starting from the root AABB, you check if the current AABB is hit. If so, which of its children are hit, and so on recursively until you get down to primitives.
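The bottom-up grouping can be sketched directly. This toy Python version greedily merges the pair of nodes whose combined box has the smallest surface area; it is \(O(n^3)\) and for illustration only (real builders are top-down or Morton-based), and the node layout is my own:

```python
import itertools

def union(a, b):
    """Union of two AABBs, each stored as (min_corner, max_corner)."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))

def area(box):
    """Surface area of an AABB."""
    dx, dy, dz = (box[1][i] - box[0][i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def build_bottom_up(leaf_bounds):
    """Toy agglomerative BVH build: repeatedly merge the two nodes
    whose combined box has the smallest surface area."""
    nodes = [{"bounds": b, "prims": [i]} for i, b in enumerate(leaf_bounds)]
    while len(nodes) > 1:
        i, j = min(itertools.combinations(range(len(nodes)), 2),
                   key=lambda ij: area(union(nodes[ij[0]]["bounds"],
                                             nodes[ij[1]]["bounds"])))
        merged = {"bounds": union(nodes[i]["bounds"], nodes[j]["bounds"]),
                  "children": [nodes[i], nodes[j]]}
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]  # the root
```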
It doesn't technically have to be AABBs (for example, bounding spheres are somewhat common). However, AABBs are basically ubiquitous for raytracing purposes because they are very fast to test rays against using the standard ray / box algorithm.
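That standard ray / box algorithm is the slab test, which clips the ray against each pair of parallel box faces. A minimal Python version, with the common trick of precomputing the reciprocal direction per ray (in Python the caller must map zero direction components to a suitably large value, since 1/0.0 raises rather than producing an IEEE infinity):

```python
def ray_aabb(origin, inv_dir, box_min, box_max, t_min=0.0, t_max=float("inf")):
    """Slab test: does the ray segment [t_min, t_max] hit the AABB?
    `inv_dir` is the precomputed component-wise reciprocal direction."""
    for i in range(3):
        t0 = (box_min[i] - origin[i]) * inv_dir[i]
        t1 = (box_max[i] - origin[i]) * inv_dir[i]
        if t0 > t1:
            t0, t1 = t1, t0  # order the two slab crossings
        t_min = max(t_min, t0)
        t_max = min(t_max, t1)
        if t_min > t_max:
            return False  # the slab intervals no longer overlap
    return True
```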
Indeed, BVHes are very flexible. It's fairly easy to collapse nodes: nodes with \(4\) children are common, and more (e.g. \(8\)) are well-known too; you can pull them out and test them with vector hardware. Precision reduction has also been a popular approach, at least in research. BVHes can even be combined with \(k\)-d tree techniques in hybrid datastructures, as in my paper here.
A major advantage of BVHes is that they work very simply with animated scenes. You can keep the existing tree topology and simply recalculate the bounds above it. Though, tree rotations (below) are often also applied to maintain quality.
Constructing the Tree
For constructing the tree, there are two popular methods. The older is the surface area heuristic (SAH), which basically tries to minimize the surface area of the grouped nodes, on the theory that an AABB with a smaller surface area is less likely to be hit, excluding more rays with less work. SAH usually produces a high-quality tree in a moderate build time.
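The heuristic itself is a one-liner: the chance of a ray hitting a child is approximated by its surface area relative to the parent's. A sketch, with illustrative traversal/intersection cost constants:

```python
def sah_cost(left_count, left_area, right_count, right_area, parent_area,
             c_trav=1.0, c_isect=1.0):
    """SAH cost of one candidate split: traversal cost plus the expected
    intersection work in each child, weighted by relative surface area.
    The cost constants here are placeholders, tuned per implementation."""
    return c_trav + c_isect * (left_area / parent_area * left_count +
                               right_area / parent_area * right_count)
```

A builder evaluates this for many candidate splits and keeps the cheapest one.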
The other popular method is LBVH, which sorts primitives by a 3D Z-order curve and then combines them by partitioning that order (for more technical details, the best explanation I've found is on an NVIDIA technical blog). The Z fractal tends to group spatially close objects together, so combined nodes do tend to be smaller. The LBVH doesn't give as high a quality BVH as SAH, but it is easy to parallelize and simple, so it is very fast to build and maps nicely to the GPU.
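The key primitive of that sort is the Morton (Z-order) code, made by interleaving the bits of quantized coordinates. The classic bit-spreading version for 10-bit coordinates looks like this (a well-known technique; the function names are mine):

```python
def expand_bits(v):
    """Spread the low 10 bits of v so that two zero bits separate each
    original bit, via the standard multiply-and-mask trick."""
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v

def morton3(x, y, z):
    """Interleave three 10-bit coordinates (each in [0, 1023]) into a
    30-bit Morton / Z-order code; sorting primitives by this key is the
    first step of an LBVH build."""
    return (expand_bits(x) << 2) | (expand_bits(y) << 1) | expand_bits(z)
```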
Both methods can be improved by tree rotations, basically small permutations of the tree. For example, a good way to make a BVH is to start with an LBVH, and then use the SAH to drive tree rotations and node merging, improving its quality. As another example, when animating, the bounds change, degrading the quality of the tree, but tree rotations can be applied to spruce it up again.
One potentially non-obvious design consideration is that the tree should be unbalanced. The usual way a scene works, both artistically and in real life, is that you have a few primary shapes, smaller secondary shapes, and then even smaller tertiary shapes. That is, detail is concentrated. Most of the rays hit low-detail objects, so it makes sense to optimize for that case: the tree should be shallow in those areas. The consequence is that the tree depth in regions of high detail is greater.
Other Approaches
Ray Classification
Rays can be described by their 3D origin and 2D direction, forming a 5D space. The problem, then, is to find the set of objects a particular 5D ray can intersect. In the original work by Arvo and Kirk, they subdivide that 5D space (with what I take to be[8] a \(k\)-d tree), so that queries of rays return sets of candidate objects. An advantage is being able to exploit the similarity of (especially primary) rays.
It is not clear how well this method has stood up over time. Probably the sets get unwieldy with the large number of primitives in modern scenes. The algorithm can also be viewed as similar to beam tracing techniques, but as a preprocess.
Potentially Applicable Datastructures
B-Tree Variants
: a B-tree basically generalizes a binary search tree to multiple children. A B+ tree pushes the leaves out of the tree itself, which seems like a good idea to reduce tree traversal cost. UB-trees improve further with a Z-order curve. B*-trees change splitting behavior. B-trees are balanced, though, which as above is disadvantageous.
R-Tree Variants
: an R-tree is basically a B-tree with a spatial meaning similar to a BVH. (As above, B-trees are balanced, however.) Hilbert R-trees take similar ideas as LBVH, while R*-trees take after B*-trees. X-trees and R+ trees try to avoid overlap of child nodes (a problem in higher dimensions), the latter by allowing insertion into multiple nodes.
M-Trees
: appear to build a tree structure using a distance metric directly, and are optimized for nearest neighbor lookup.
Notes
There are a ton of resources. For example, Jacco Bikker has an excellent series on BVHes. There's also one from Scratchapixel. PBRT 3 has a whole chapter (4) on acceleration structures, especially BVHes, from an implementation-driven view. A more accessible overview is here. Wikipedia has been cited throughout. And then there is the copious graphics literature itself. Relevant PhD theses include Ingo Wald's and Thiago Ize's.