CS 6620.001 "Ray Tracing for Graphics" Fall 2014by Ian Mallett (ian AT geometrian.com)Welcome to my ray tracing site for the course CS 6620.001, being taught at the University of Utah in Fall 2014 by Cem Yuksel. You may have been redirected from an HTML file because (though web designer I am not (how dare you)) I prefer PHP, with which the rest of my site is built. Ask me in person how it works. I'm told it's hilarious. Welcome fellow students! I have lots of experience ray tracing, and with graphics in general, and so I'll be pleased to help by giving constructive tips throughout (and I'll also try very hard to get the correct image, or say why particular images are correct as opposed to others). If you shoot me an email with a link to your project, I'm pretty good at guessing what the issues in raytracers are from looking at wrong images. Hardware specifications, see bottom of page. Timing information will look like "(#t #s ##:##:##)" and corresponds to the number of threads used, the number of samples (per pixel, per light, possibly explained in context), and the timing information rounded to the nearest second. Without further ado, the projects, grouped below by number:
I am choosing to use my own code entirely. This is mainly because my graphics codebase is tremendously extensive. As a note to graders/TAs/other-people-with-access-to-the-source, I will try to provide all of the code needed for it to run, and while it is quite organized, I might leave something out. I used tinyxml2 instead of tinyxml, which is cleaner in my opinion. It's a bit tricky to find resources on how to use it, but it's pretty foolproof. I had some trouble getting the matrix order to match what the file was expecting. In particular, the way the nested transformations are handled is a bit awkward. For debugging, I recommend drawing only "sphere1" and "sphere2" and then working on adding "sphere3" later. One incorrect way to do the transformations is to switch the order in which you apply them for one hierarchy level versus another. If you've got it backwards for this file, you might get this (16t 100s 00:01:21): Here's a Python (with PyGame and PyOpenGL) script to help you visualize the scene and try out different transformation matrices: "prj1_draw.py". Press "v" to toggle between flyaround mode (where you can move around the scene with a simple camera) and render mode (a GL view of what your raytracer should make). If you just want the short version, the correct (final, world space) matrices to use are:
If you do everything right, you should get the following result (yours may have jaggies; this one uses 100 random samples per pixel) (16t 100s 00:01:16): Here's the depth image, with a minimum depth of 30 and a maximum depth of 55 (16t 100s 00:01:17):
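Going back to the transformation-order pitfall: conceptually, each node's world-space matrix is its parent's world-space matrix times its own local matrix, applied consistently at every level of the hierarchy. Here is a minimal C++ sketch of that composition; the types and names are hypothetical illustrations, not the actual scene-loader code.

#include <vector>

// Minimal 4x4 matrix (row-major) with composition via multiplication.
struct Mat4 {
    double m[4][4];
};
static Mat4 operator*(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

struct Node {
    Mat4 local;                  // transform relative to the parent (from the XML)
    std::vector<Node> children;  // e.g. "sphere3" nested under "sphere2"
    Mat4 world;                  // final, world-space matrix the raytracer uses
};

// The parent's matrix is applied after (to the left of) the child's local matrix.
// Swapping this order at one hierarchy level but not another is exactly the
// "backwards" bug shown in the incorrect render above.
static void compute_world(Node& node, const Mat4& parent_world) {
    node.world = parent_world * node.local;
    for (Node& child : node.children)
        compute_world(child, node.world);
}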
I got excited and added physically-based BSDFs to my raytracer, but I had a hard time working that in with this Blinn-Phong nonsense. So, I had to restructure a bunch of stuff to get it to work. One interesting feature of my BSDF handling is that it handles multi-layer surfaces very naturally. Surfaces are represented as at least one, and up to any number of, layers. For example, car paint might be modeled as a specular layer over a diffuse layer. The interaction of light with the layer (its BSDF) is baked into a 3D table and stored with the layer. The material itself is computed as the combination of these layers along all possible modes of light transport. I wasn't finished implementing all this when this assignment came up, though, and restructuring broke a lot of things. In point of fact, the implementation below uses a single layer precomputed into the 3D table. The table is sampled using trilinear filtering. Other than that, it uses Blinn-Phong (of various varieties). When one says "Blinn-Phong", it's hard to know exactly what is meant. These non-physical BRDFs mainly come from hackish approximations. My renderer, which tries to be physically based, doesn't always react in the way these non-physical BRDFs expect. In particular, there is the rendering equation (I prefer an explicit formulation; the implicit formulation, which everyone else uses, is easily Google-able), which has a cosine term in it. This cosine term is the same cosine term you see in some formulations of the diffuse calculation. The diffuse BRDF is actually a constant; that cosine term is part of the rendering equation! Anyway, the term is important for all BRDFs, but for these hackish sort of things, no one cares. Plus there's "ambient light", which is really hackish . . . Here's what the Blinn-Phong model looks like with ambient light added directly to the pixel color and using the (correct) cosine term (16t 100s 00:03:01):
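For concreteness, here is a minimal sketch of the kind of Blinn-Phong-plus-cosine evaluation described above, with the \(\vec{N}\cdot\vec{L}\) factor written explicitly. The vector type and parameter names are hypothetical; this is not my table-based implementation.

#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };
static double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 normalize(const Vec3& v) { double l = std::sqrt(dot(v, v)); return { v.x/l, v.y/l, v.z/l }; }

// One light sample's contribution for a Blinn-Phong surface (per color channel).
// The N.L factor belongs to the rendering equation, not to the (constant) diffuse
// BRDF, and it multiplies the specular term too.
static double shade_blinn_phong(const Vec3& N, const Vec3& L, const Vec3& V,
                                double kd, double ks, double shininess,
                                double light_intensity)
{
    double n_dot_l = std::max(0.0, dot(N, L));
    if (n_dot_l == 0.0) return 0.0;                          // light at or below the horizon
    Vec3 H = normalize({ L.x + V.x, L.y + V.y, L.z + V.z }); // Blinn's half vector
    double specular = ks * std::pow(std::max(0.0, dot(N, H)), shininess);
    double diffuse  = kd;                                    // Lambertian BRDF is a constant
    return light_intensity * (diffuse + specular) * n_dot_l;
}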
As I mentioned on the discussion board for this class, I decided to implement z-fractal order rendering. The aim of doing this is to shade pixels that are close to each other spatially in a manner close together temporally. By doing this, there's a better chance the rays will hit the same data, and cached data in the CPU can be reused. Here's an example of the z-fractal in action: As it happens, making that .gif was an ordeal. A fifth thread outputted .png files of the current render at 10 fps (the render was also slowed down so this could capture it). To make the .gif, I first tried an ancient animation package. Its GUI had a bug that wouldn't allow it to import all 1074 frames conveniently. So, I tried Adobe After Effects, before I realized that Photoshop was actually the program that would output .gifs. Unfortunately, Photoshop places a (as far as I can tell, completely arbitrary) limit of 500 frames. After bumbling around, I finally gave up and tried an internet site (which timed out) and eventually downloaded ImageMagick. After installing it, I tried adapting a command I found, but the ImageMagick program "convert" shares its name with a system program. So, I had to fully qualify the path to get a command like:
<path/to/install/dir/convert> -delay 20 -loop 0 *.png zfractalrender.gif
Regrettably, this didn't work because it got the frames out of order: apparently, ImageMagick was sorting files by filename length first, rather than by number. I could have fixed the images' names, but the prospect of writing some script was something I really wasn't feeling. So, I changed the filename generation, ran the raytracer again, and generated a fresh batch of images (this time 1058). The result was 8-something MB (which was interesting; the first 500 frames outputted by Photoshop were 117KB, so at that rate, you'd expect 248KB or so). After some more stumbling, I applied the following command:
<path/to/install/dir/convert> zfractalrender.gif -coalesce -layers optimize zfractalrender2.gif
That seemed to do a good enough job. Shadowing had actually been long since implemented, but I ran into one sticking point when reenabling it. My BRDFs are based on table representations, but these have significant discretization issues near the ecliptic plane. The problem is that my raytracer is designed to be physically based, so every sample is weighted by the rendering equation's \(\vec{N}\cdot\vec{L}\) term. Unfortunately, this conflicts with OpenGL-style shading, which doesn't include it. To get around this, I have an option that multiplies a divide-by-N-dot-L factor into the BRDF table. This is where the most objectionable issues come from; since \(\vec{N}\cdot\vec{L}\rightarrow 0\) as the light approaches the ecliptic plane, the table value blows up and kills precision. Fortunately, it seems that multiplying by \(\vec{N}\cdot\vec{L}\) is actually expected, so the problem is now moot. (16t 100s 00:04:24): I took it upon myself to finish up my microfacet BRDF code. It turns out that it's still not done, but the architecture required is now at least present. The only problem, I think, is how Fresnel is handled. I also basically rewrote my scene loader. It was horrifically written last time. This update allows it to check all parameters and ensure everything is loaded. Basically, it should find most errors, in a way that my older code very much wouldn't. The error messages are a bit less clear, since doing this with a reasonable amount of code required some generalization (fancy C++ templates! Woohoo!), but I think it's worth it.
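The z-fractal traversal itself is just a Morton (Z-order) curve over the pixels: consecutive indices along the curve decode to pixels that are also close together in 2D, which is what gives the cache reuse. A minimal sketch of the decoding, assuming a power-of-two image for simplicity (hypothetical, standalone code):

#include <cstdint>
#include <cstdio>

// Extract the even-indexed bits of a 32-bit Morton code (the x coordinate);
// pass code >> 1 to get the odd-indexed bits (the y coordinate).
static uint32_t compact_bits(uint32_t v) {
    v &= 0x55555555u;
    v = (v ^ (v >> 1)) & 0x33333333u;
    v = (v ^ (v >> 2)) & 0x0F0F0F0Fu;
    v = (v ^ (v >> 4)) & 0x00FF00FFu;
    v = (v ^ (v >> 8)) & 0x0000FFFFu;
    return v;
}

int main() {
    const uint32_t size = 8; // power-of-two image for simplicity
    for (uint32_t i = 0; i < size * size; ++i) {
        uint32_t x = compact_bits(i);      // pixels that are close in i ...
        uint32_t y = compact_bits(i >> 1); // ... are close in (x, y) as well
        std::printf("sample %2u -> pixel (%u, %u)\n", i, x, y);
    }
}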
I also resurrected some distance estimation rendering code. I fixed a number of simple problems with it and started getting some good results. Here's a level 4 Menger Sponge rendered at double the previous resolution (16t 10s 00:08:47): I had more difficulty with a Mandelbulb. The following image was rendered with too high a precision, and as such shadow rays couldn't escape (16t 10s 00:02:59):
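Distance-estimation rendering boils down to sphere tracing: march the ray forward by the estimated distance to the nearest surface until you get within some epsilon or give up. A minimal sketch, with a trivial stand-in distance estimator (a Menger sponge or Mandelbulb estimator drops in unchanged); the epsilon here is exactly the "precision" knob implicated in the stuck shadow rays above:

#include <cmath>

struct Vec3 { double x, y, z; };

// Hypothetical distance estimator: signed distance to a unit sphere at the origin.
static double estimate_distance(const Vec3& p) {
    return std::sqrt(p.x*p.x + p.y*p.y + p.z*p.z) - 1.0;
}

// Returns true (and the hit distance t) if the ray (origin o, unit direction d) hits.
static bool sphere_trace(const Vec3& o, const Vec3& d, double& t) {
    const double epsilon   = 1e-4;   // too small, and shadow rays can't escape the surface
    const double max_dist  = 100.0;
    const int    max_steps = 256;
    t = 0.0;
    for (int i = 0; i < max_steps; ++i) {
        Vec3 p { o.x + t*d.x, o.y + t*d.y, o.z + t*d.z };
        double dist = estimate_distance(p);
        if (dist < epsilon) return true;   // close enough: call it a hit
        t += dist;                         // safe to march this far without overshooting
        if (t > max_dist) break;
    }
    return false;
}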
Well, I did it again: I ripped apart my materials code and rewrote a lot of it. My last redesign got a lot of things right, but it wasn't perfect. This redesign still isn't perfect, but I think it's a major improvement yet again. The main thing that's lacking is support for multilayer materials and tabular BSDFs. Adding these will take some work, and I think it was a mistake to try adding them so soon. The solution to my architectural woes turned out to be a quite deep inheritance hierarchy (I wanted to say "mother of all inheritance hierarchies", and then I realized that that overloads the "parent"/"child" terminology that inheritance is already using—so you'd get only one inheritance hierarchy, heh). This is quite possibly the deepest inheritance hierarchy (and one of the largest) I've ever used in production code. "Material"s have a std::vector of "Layer"s. Each "Layer" has a "BRDF_Base" brdf, "BTDF_Base" btdf, and "EDF_Base" emission distribution function. The hierarchy for those is (bold is currently implemented):
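Here is a minimal sketch of how the pieces named above fit together (just the types mentioned in this paragraph; everything else is omitted):

#include <memory>
#include <vector>

// Base classes for the three distribution functions a layer can have.
struct BRDF_Base { virtual ~BRDF_Base() = default; /* evaluate(), sample(), pdf(), ... */ };
struct BTDF_Base { virtual ~BTDF_Base() = default; };
struct EDF_Base  { virtual ~EDF_Base()  = default; };

// One layer of a surface: e.g. a specular clear coat over a diffuse base.
struct Layer {
    std::unique_ptr<BRDF_Base> brdf;      // reflection
    std::unique_ptr<BTDF_Base> btdf;      // transmission
    std::unique_ptr<EDF_Base>  emission;  // emission distribution function
};

// A material is at least one, and up to any number of, layers.
struct Material {
    std::vector<Layer> layers;
};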
I'm mostly happy with the way this is structured. There will probably be a few changes, but mostly it will stay this way from now on. I hope. One of the nice features about doing all this is that it lets me represent a huge variety of materials quite accurately. For the given images, for example, the material needs to be non-physical. This is implemented with "BRDF_Composite" combining BRDFs "BRDF_Lambert", "BRDF_BlinnPhong", "BRDF_IdealMirror", and "BTDF_IdealDielectric" as necessary. An important consequence is that my tracer supports images with nested indices of refraction (in the given images, the rays are actually "inside" an air material, which has an IOR of \(1.000277\)). Attenuation is also computed for all rays, although air varies too much to justify having a nonzero default.

At this point I'm going to have to say it: I'm breaking conformity with the expected images. It's just too difficult/degrading to try to work in these nonphysical things. For example, from what I understood in class, the given images are supposed to be rendered with diffuse, specular, a delta transmission, and a delta reflection—all at the same time? That doesn't even make sense! I worked it in nevertheless (in an additive sortof manner, which is the only half-sane way to even try implementing such a ludicrous BSDF), but I can't render it using a Whitted renderer because my system treats all BSDFs equally (I can't send a refraction ray and a reflection ray and sample the lights only for particular materials). I probably could fix that, but at some point I've got to draw the line at crufting a workaround to make something that doesn't make sense work. So, my Whitted renderer instead chooses an importance-sampled ray that either reflects perfectly or refracts perfectly. This is the only way it is allowed to recurse. As for loading, Fresnel is disabled on all materials' layers, except if the layer contains a refractive BTDF. It's wrong wrong wrong wrong wrong, but that's how it is. I may consider breaking further in future projects and just using my tracer in all its physically accurate glory. Without further ado, here's the test image, sans diffuse and specular cruft on the spheres (16t 100s 00:11:42): It looks quite different (and the high-variance Fresnel makes it look worse). Oh well—at least it's not wrong. Well, it has the ambient light, I guess . . . Well, that's depressing. On to better things. Like: It's time to resurrect my path tracing backends! Woohoo!

I made this part a while ago, and now I think it's time to share. I immediately recognized the box scene as the Cornell Box variant from SmallPT. My renderer can't handle the walls being the insides of the spheres (it means air has a higher priority, and so they need to have negative priorities; this means that refraction out/intersection is meaningless for anything except one of the walls), so I worked out that the scene is best approximated by an axis-aligned box starting at \(<1,0,0>\) and going to \(<99,81.6,170>\), with camera position / center / up being \(<50,52,295.6>\) / \(<50,51.957388,294.6>\) / \(<0,1,0>\), an image plane distance of \(1.0\), and a vertical sensor size of \(0.5135\). I represented the walls and floor with triangles, but I made the ceiling a plane (otherwise the large spherical light leaks around the front, I think). Let's render it (16t 100s 00:11:02): I spent an embarrassingly long time on this one—although perhaps not, considering it was a subtle problem.
The problem was that the importance sampling method for choosing reflection versus refraction rays was broken. I was selecting reflection versus refraction according to the reflectance, but I was then setting the PDF for the entire ray to be that reflectance. In actuality, this isn't importance sampling at all! Well, it kindof is, but not that way. I changed it to a simpler 50/50 PDF; then it made this image of just the refractive sphere (16t 1000s 00:04:55) (and a double-resolution version, shrunk; view the original for full size (16t 1000s 00:20:10)): I was going to have my tracer render the larger version, but I decided that would be a pointless exercise. Instead, it's time to render something new. (16t 500s 05:43:30), maximum ray depth \(8\): The Stanford Lucy model is \(\sim 28{,}055{,}742\) triangles. While trying to open it, Blender nearly crashed my computer by running it out of memory (my computer, remember, has 12GB of RAM). It also took me a while to find the mesh: the model is also spatially large; I scaled it by \(0.01\), and that made it at least fit on Blender's grid. I figured that \(28\) million polys was still too much, but when I tried Blender's decimate, it gobbled down so much memory so quickly that I had to hard reset. Meshlab is much better for this task. It, reasonably quickly, and without touching my memory bound (it looked like it was respectfully keeping its distance), decimated the mesh to \(10\%\), \(1\%\), \(0.1\%\), and \(0.01\%\). My renderer could probably handle the larger versions, but most of that detail is superfluous (the render above is \(1\%\)). Since this was such a long render, I had my tracer save its progress. It dumped \(12{,}288\) files (together \(18{,}874{,}368\) bytes) comprising chunked HDR image data in a temporary directory.

Another thing that's striking is how much the maximum ray depth affects render time. You're not supposed to stop rays; theoretically, a ray could bounce forever. So you use Russian roulette to kill the rays. The way it works is that you kill each ray with a certain probability each bounce (say, one minus the reflection coefficient). Thus, the ray will always stop at some point, but the estimator remains unbiased over the entire space of infinitely many reflections. Setting a maximum depth adds bias, and is technically wrong. The images for this project used up to eight bounces. For diffuse interreflection, this is often plenty.

The above image was rendered using naïve path tracing—which means the ray bounces around until it just happens to hit the light. Clearly, this is inefficient. The Whitted ray tracer was getting results in seconds for similar scenes that were much better looking in many ways (there wasn't any radiosity, but there was very little variance). The common solution is to use explicit light sampling, where at each step you send a ray toward the light source directly. The problem with this is that for certain kinds of objects (e.g. perfect specular surfaces) the chance of that light ray being along the perfect specular direction is literally \(0\). One consequence is that caustics from perfect specular surfaces are not sampled at all. This has been a longstanding issue for my tracer, but it can be corrected by treating delta function surfaces individually. I hadn't done this before because my tracer was really only originally built to handle real BRDFs (which aren't delta functions). It's a surprisingly easy modification to make (although you need to be a little careful to get it right).
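Since Russian roulette is easy to get subtly wrong, here is a minimal, self-contained sketch that demonstrates the unbiasedness claim numerically: a "path" keeps a fraction \(a\) of its energy each bounce, roulette kills it with probability \(1-p\), and dividing surviving contributions by \(p\) recovers the exact infinite sum \(1/(1-a)\) with no depth limit. Hypothetical demo code, not my tracer:

#include <cstdio>
#include <random>

int main() {
    const double a = 0.7;   // per-bounce reflectance (fraction of energy kept)
    const double p = 0.7;   // survival probability (often chosen equal to the reflectance)
    std::mt19937 rngEngine(42);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    const int trials = 1000000;
    double total = 0.0;
    for (int i = 0; i < trials; ++i) {
        double throughput = 1.0, estimate = 0.0;
        for (;;) {
            estimate += throughput;              // contribution of this bounce
            if (uniform(rngEngine) >= p) break;  // roulette: the path dies here
            throughput *= a / p;                 // survive: weight by 1/p to stay unbiased
        }
        total += estimate;
    }
    std::printf("RR estimate: %f   exact: %f\n", total / trials, 1.0 / (1.0 - a));
}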
Here's Lucy again, with just \(15\) samples per pixel (16t 15s 00:04:38): Of course, the point of this scene was to have pretty pretty caustics. Part of the problem is the light; it's too large. The other problem is undersampling (See those bright cyan flecks in the shadow? Those are supposed to be largely smoothly-varying refracted light). Although my explicit light sampling code helps, this is really a job for bidirectional path tracing or photon mapping—so that snazzy image will just have to wait a little more.
To properly implement bidirectional path tracing (BDPT) (which was one of my stated goals prior to taking this class), I need to be able to sample points on the image plane (currently, points on the image plane, and in the optical geometry of the camera generally, are privileged). To do this right, the camera—all of the camera, not just the lens—should be part of the scene. It's okay if the rest of it doesn't interact, but, for example, all the lenses should be modeled with first-class geometry. This will also have nice effects (e.g. physically based lens flares!). The reason this is important is a sampling issue. While I could implement BDPT with the "eye" point being at the lens, this is stupid since the BTDF of the lens is a delta function—any light path point you try to connect to the eye point will have a contribution of zero. This makes it useless for, e.g., better sampling of directly viewed caustics on a diffuse surface—which, by the way, is a major attraction for implementing the algorithm in the first place. To do all that, I should implement the lenses as CSG objects, which my raytracer doesn't support. Yet. After implementing CSG, I'll try to implement BDPT. The point is, all this will take time, and since I've already had triangle-mesh support and a good acceleration data structure for a while now, I effectively have two weeks to implement it (I also want to leave some time to get ahead, since the project after that is texturing, which my tracer currently has no support for at all). These changes will probably break everything for a little bit, so I want to get these updates out of the way. If I can produce snazzy images of the changes I'm making, I might edit in updates as I go. Until then, you might check out the previous project for pretty pictures. Here's the given scene. Again, the refraction and reflection are not using broken BRDFs. Not too broken, anyway. Images are (16t 1s 00:00:08) and (16t 10s 00:01:20), respectively. Both have been shrunk; view originals as desired:
Recall that these updates are mainly lame placeholders that satisfy requirements while I'm implementing several large new features. See project four for pretty pictures. Recall the image from project five. This image rendered in a matter of seconds using my BVH implementation. Rather than wait for my tracer to finish tracing without an acceleration datastructure, I'm going to repost the \(10\) samples/pixel image from project four and give the extrapolated total time my renderer predicted without the datastructure (based on rendering \(\sim 0.50\%\) of the image). In reality, since the pixels that were rendered for the extrapolation comprised only simple shading, the timing would probably be somewhat longer. My BVH is extremely dumb. Algorithmically:
I also implemented an octree, which is substantially smarter (although it's a worse datastructure). You basically just add objects into it and it subdivides itself iff it can separate at least two elements in a node. There are some tricks to make the building algorithm run in linear time. In retrospect, I think the renders below used a maximum depth of \(5\). The project five image (rendered using a BVH):
Here are some heatmaps of the acceleration datastructures running. "Scalar" is the amount by which the visualization parameter is scaled before the range \([0.0,1.0)\) is converted to the heatmap.
Of note is that the acceleration datastructures are brute force in the sense that they don't terminate early when a ray-primitive intersection is found. This is terrible. After doing some other things (see project \(6.5\)), I came back and fixed it. The octree maximum depth here is \(10\). Here are some revised pictures:
I later made a tweak to both datastructures that should improve performance. It's not worth redoing the above experiments.
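For reference, here is a minimal sketch of the octree's core rule as described above: a node subdivides only if at least two of its elements would land in different child octants (one reading of "separate at least two elements"). Hypothetical types; the linear-time build tricks, the depth limit, and the traversal are all omitted:

#include <array>
#include <memory>
#include <vector>

struct AABB { double lo[3], hi[3]; };
struct Object { AABB bounds; /* geometry ... */ };

struct OctreeNode {
    AABB bounds;
    std::vector<const Object*> objects;                   // objects stored at this node
    std::array<std::unique_ptr<OctreeNode>, 8> children;  // all null if this is a leaf
};

// Which child octant would an object's bounds fall entirely into? Returns -1 if it
// straddles a splitting plane and therefore has to stay at this node.
static int octant_of(const AABB& node, const AABB& b) {
    int oct = 0;
    for (int axis = 0; axis < 3; ++axis) {
        double mid = 0.5 * (node.lo[axis] + node.hi[axis]);
        if      (b.lo[axis] >= mid) oct |= (1 << axis);
        else if (b.hi[axis] >  mid) return -1;   // straddles: can't push it down
    }
    return oct;
}

// Subdivide iff at least two objects would end up in different children.
static bool should_subdivide(const OctreeNode& node) {
    int first = -2;
    for (const Object* o : node.objects) {
        int oct = octant_of(node.bounds, o->bounds);
        if (oct < 0) continue;                 // stays at this node either way
        if (first == -2) first = oct;
        else if (oct != first) return true;    // two objects land in different octants
    }
    return false;
}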
I know I said I would implement certain features first. I didn't. First: photon mapping. I coded it up pretty quickly, but getting even half-passable results was difficult. I had a bunch of problems with the kNN (k nearest neighbors) photon sampling, mainly because `std::priority_queue<...>` is stupid and useless, and the `std::` functions for heap operations work oddly. My very first photon mapped images (lots of problems, \(10^5\) photons and \(10^6\) photons) (16t 1s 00:00:20), (16t 1s 00:00:29): After I fixed a problem with the priority queue, it seemed to be working decently. There are obviously still some problems, though. Resized images with timing information:
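For the curious, here is a minimal sketch of the kNN photon gather using the `std::` heap functions directly: keep a max-heap of the k best candidates keyed on squared distance, so the current farthest photon sits at the front and gets replaced whenever a closer one shows up. The Photon struct is hypothetical, and the scan is linear for clarity (a real gather walks the photon KD-tree):

#include <algorithm>
#include <cstddef>
#include <vector>

struct Photon { double pos[3]; /* power, incident direction, ... */ };

struct Candidate {
    double dist2;           // squared distance to the query point
    const Photon* photon;
    bool operator<(const Candidate& o) const { return dist2 < o.dist2; } // max-heap on dist2
};

// Keep the k nearest photons to 'query' out of 'photons'.
std::vector<Candidate> k_nearest(const std::vector<Photon>& photons,
                                 const double query[3], std::size_t k)
{
    std::vector<Candidate> heap;
    for (const Photon& p : photons) {
        double dx = p.pos[0]-query[0], dy = p.pos[1]-query[1], dz = p.pos[2]-query[2];
        Candidate c { dx*dx + dy*dy + dz*dz, &p };
        if (heap.size() < k) {
            heap.push_back(c);
            std::push_heap(heap.begin(), heap.end());    // heap.front() = farthest so far
        } else if (c.dist2 < heap.front().dist2) {
            std::pop_heap(heap.begin(), heap.end());     // move the farthest to the back
            heap.back() = c;                             // replace it with the closer one
            std::push_heap(heap.begin(), heap.end());
        }
    }
    return heap;
}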
Part of the way I do graphics—or really anything—is to stumble around until I discover things. Then I push on these until I understand what they do and why. I'm a fast and good enough coder that this is almost as fast as just looking it up—but a whole lot more fun and lasting in memory. Of course, I also look stuff up occasionally—if only to check my progress. Having pushed on photon mapping a while, I have learned the following:
The main thing is the first point above. When fixing that (and a few other glitches that made their way in during refactoring) one gets (16t 10s 00:05:10) with \(10^6\) photons and \(100\) in the gather. Note: the photon map was cached for this render, so it didn't need to be generated. Alright, well that's nice. Time for some more interesting caustics! Let's revisit Lucy. In retrospect, one of the reasons the refraction project's Lucy looks weird was because her base was sticking out the bottom of the scene. Because of refractive priorities, rays could escape. I'm not sure of that, but just in case, I moved her up \(0.3\) units. Since last time, I also improved the way triangles are handled: there is no transformation cost anymore. I could have sworn I rendered an image (16t 10s 00:14:54) with a low resolution version of the model and cached photonmap, but I can't find it. My conclusion: Blech. Lucy! You need stronger caustics, girl! A smaller and more powerful light and less absorption in its spectrum should help. Also, I realized that the radiosity is actually being calculated twice: once by the path trace, and once by the diffusely reflected photons in the photonmap. I tweaked the photon map so that it doesn't continue any trace that hits a diffuse surface—so this is only a "caustics" photon map. This also allows one to use more photons, since the only photons that have any cost for rendering are the ones that hit Lucy (and those are the ones we want!). I tried all this, and I still couldn't get really visible caustics. Short of shooting a light through her face right up against a wall, I don't think this model is going to generate anything. Boom. Bunny. \(100{,}000\) photons, \(10^5\) gather (16t 10s 00:01:30), including photon map generation this time:

My .obj file loader, which was one of the oldest parts of my graphics library, having gone through several major revisions/rewrites basically unscathed, hadn't aged well. In particular, for large files it would take a long time to load. This was mainly due to two related factors. First, the file was loaded into a `std::list<...>` of lines, incurring an allocation cost and the list overhead for each. Second, when the stack was popped, this list had to be deallocated, which could take on the order of minutes! Clearly unacceptable. So, on Friday the 26th and Saturday the 27th, I rewrote the parsing code, mostly from scratch. When it loads the file, it copies the data into a flat buffer with embedded metadata that implements a linked list. This required a lot of pointer arithmetic, `reinterpret_cast`-ing, and even revealed a compiler bug. When reading, there's no real indirection that happens; you're just skipping around a flat buffer. Once this datastructure is built, it actually makes the actual parsing easier. To incorporate the new structure, I had to rewrite my parsers for ".stl" and ".obj" files. The ".obj" one in particular was painful, but I implemented some sophisticated features. For example, it only stores unique vertex data and unique vertex records. The old version tried to do this, poorly. This time, there's some fancy precached hashing going on that allows duplicate marking to proceed efficiently. After profiling and a few mostly minor optimizations, the average load time for \(1\%\) decimated Lucy (\(140{,}311\) vertices and \(841{,}668\) triangles) is \(0.4162\) seconds and \(1.248\) seconds, without and with duplicate removal, respectively. For my production code, duplicate removal is enabled.
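A minimal sketch of the duplicate-vertex-record removal described above: hash the whole position/texcoord/normal index triple, and let a hash map decide in (expected) constant time whether a record has been seen before. Hypothetical types; the flat-buffer parsing machinery is omitted:

#include <cstdint>
#include <unordered_map>
#include <vector>

// One ".obj" vertex record: indices into the position / texcoord / normal pools.
struct VertexRecord {
    std::int32_t position, texcoord, normal;
    bool operator==(const VertexRecord& o) const {
        return position == o.position && texcoord == o.texcoord && normal == o.normal;
    }
};
struct VertexRecordHash {
    std::size_t operator()(const VertexRecord& v) const {
        std::size_t h = std::size_t(v.position);
        h = h * 1000003u ^ std::size_t(v.texcoord);   // cheap hash combine
        h = h * 1000003u ^ std::size_t(v.normal);
        return h;
    }
};

// Returns the index of the unique copy of 'rec', appending it if it's new.
std::uint32_t deduplicate(const VertexRecord& rec,
                          std::vector<VertexRecord>& unique_records,
                          std::unordered_map<VertexRecord, std::uint32_t, VertexRecordHash>& seen)
{
    auto it = seen.find(rec);
    if (it != seen.end()) return it->second;
    std::uint32_t index = std::uint32_t(unique_records.size());
    unique_records.push_back(rec);
    seen.emplace(rec, index);
    return index;
}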
Those load times are a massive speedup over the old loader. I reworked a bunch of the photonmapping code. The main part had been written at stupid-o'clock in the morning, so the architecture needed a reboot. I added some snazzy (mostly) cross-platform font coloring and reworked the integrators' rendering phases into distinct queues. I also changed the minimum sample count to \(8\) (Jensen's default) and tweaked the algorithm so that it would only render one diffuse bounce (i.e. no radiosity). The result (in a little under \(40\) minutes with eye and light specular depth \(64\) (sorry, no precision timing available)): I strongly suspect that the slowest part is the photonmap traversal, since a Whitted render of the same scene to the same eye depth only takes a minute or so.
I decided to sort out all the problems in photonmapping, and I learned more things. Here are some of the bugs I found and killed: Lucy, (16t ?s 00:12:35): Here's a visualization of an issue related to sampling the KD-tree. The photon adding was slightly messed up. Here's the raw photon map and then different maximum radii. Timing unknown, but pretty fast: I discovered that the sampling of the photons wasn't what I wanted. Not necessarily incorrect, mind you (although I didn't work out whether it was or not), but just somewhat strange. I altered it to be more logical. One nagging issue for the past renders was that the photonmap looks "too dark". I thought about it, and realized that the easiest way to (dis)prove this is to render the caustic from a transparent object with the refractive index of air using photonmapping. So, I implemented an AABB primitive (it was mostly done, but rendering them wasn't quite working right), set it to be completely transparent, and 'lo, the caustics were indeed too dark! I quickly traced it down to a missing factor of \(2\) when calculating the flux (my emissive materials emit on both sides). I played around with it a bit, but the first one I actually let sit for a good long while produced this simply beautiful render in about \(40\) minutes with 50 samples: Here's a much larger version (1920x1080) rendered using the same data (16t 50s 02:25:10). Be sure to view the original!
I have been hard at work on a paper for i3D 2015. I did implement the assignment, mostly, but it's not at a point where I'm happy with it (treating it like a BRDF is kindof weird, and also there's some difference in the scene I need to figure out). I'm going to defer snazzy pictures until next time, and in the meantime I'll leave you with some texture images from some of my previous work (lots more images at that link!).
Our paper deadline got extended by two days, so I pretty much lost all of last week (since I was up until early Friday morning, and that day wasn't very productive either). We did submit our paper, though, and I've been playing catchup ever since. Hence, I don't have anything very fancy to show. I wanted to implement reconstruction filtering, and I got part of the way through implementing adaptive subdivision, but my brain wasn't having it yesterday, and I have a "graphics lunch" presentation to prepare for on Tuesday besides. Here are some pictures. I removed the background for clarity. The left sphere is supposed to be glass, but I didn't feel like replacing it with my physically based glass material. The right sphere is another delta+Blinn model that doesn't make sense in real life / my framework. When I implement multiple layers eventually, I'll be able to do something similar, but it will be correct. Since I didn't provide new images last time, I chose the checkered BRDF instead of the delta. The variance is calculated correctly (i.e., as an unbiased sample variance, not as a biased variance or standard deviation or something). I found \(0.001\) to be a reasonable variance threshold for this scene. I'm updating my notation for timing to split the minimum and maximum number of samples iff they are different. Basic render (16t 8s 00:01:01): You can see little speckles in the adaptive sampling and in the sample visualization. These correspond to places where all \(8\) samples just happened to be the same, but really more samples should have been used.
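A minimal sketch of the adaptive-sampling decision described above, using the unbiased sample variance (Bessel's correction, i.e. dividing by \(n-1\)); the \(0.001\) threshold is the one quoted for this scene, and the rest is hypothetical:

#include <vector>

// Unbiased sample variance of one channel of a pixel's samples.
static double sample_variance(const std::vector<double>& samples) {
    std::size_t n = samples.size();
    if (n < 2) return 1e30;                      // can't estimate variance yet: keep sampling
    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= double(n);
    double sum_sq = 0.0;
    for (double s : samples) sum_sq += (s - mean) * (s - mean);
    return sum_sq / double(n - 1);
}

// Subdivide / add more samples only where the pixel's estimate is still noisy.
static bool needs_more_samples(const std::vector<double>& samples,
                               double threshold = 0.001) {
    return sample_variance(samples) > threshold;
}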
I've had physically accurate depth of field implemented for a long time. In fact, my tracer supports any number of optical elements lined up in front of each other (unfortunately, these elements are not first-class tracing objects, so effects like chromatic aberration from the lens(es) aren't yet possible. It has been on my TODO list for a while to support arbitrary primitives with CSG). I instead used most of the time getting my sampler working the way I wanted. Last time, I had some nasty cruft of an implementation of adaptive sampling, which worked, but not well enough for my liking. I reimplemented most of it. Now, it is an adaptive sampler, with adaptive subdivision and n-rooks (Latin hypercube) sampling. Correct reconstruction filtering is, unfortunately, still not implemented—although I did make a sample visualizer to help ensure correctness. The uniform random generator is pretty bad, but it's what I had been using. Interestingly, I found Halton to actually be worse. This happens because, for the low minimum number of samples you shoot initially, the Halton pattern doesn't vary among pixels (at least in my implementation) and is not low-discrepancy enough in general. I chose n-rooks because, like the other two, it is simple to compute and easy to implement—but also, it has excellent randomness. Even with four samples per pixel, I am getting great results.

Parallel to all this, I also profiled my code pretty hard. The bottleneck has been figuring out which surfaces the ray is inside. Since each object has a refractive priority, this really amounts to a priority queue. The class that handles this was fairly simple and inefficient. So, I hacked it apart and rewrote it so that it only allocates dynamically if it must. This cut the bottleneck from something around \(80\%\) of the rendering time to more like \(5\%\). My tracer is much faster now.

This gave me the idea to optimize the memory (de)allocation patterns of the framebuffer. This has been a major problem: for a static allocation, the framebuffer is the number of pixels (\(\sim 10^6\)) times the worst-case samples per pixel (\(n \cdot 4^{maxsubdivs}\)). Since each sample requires at minimum a position and a color (which could have more than three channels, of course), this gets into the gigabytes range. Conversely, using a dynamic allocation requires allocating each sample, each of which has a small dynamic array. This allocation pattern is semi-pathological for memory allocators, which prefer fewer, larger allocations. It's not really a problem on modern operating systems, but for memory checkers and debuggers, it's pretty bad. A smarter thing to do is to allocate each pixel with placement new, but this really doesn't solve the fundamental problem of needing to allocate a dynamic number of samples. The first thing I tried was giving each pixel a static array, and then transitioning over to a dynamic array if too many samples were needed. This helps a lot, but it's still pretty crufty. For example, since adaptive subdivision fills each given area with a certain number of samples, you actually know in advance whether an area will need the dynamic store (is the number of samples greater than the preallocation size?), and moreover, this is the same for every pixel! So I rewrote the memory allocator for the framebuffer to be a stack-based thing. It allocates pages of several megabytes at a time, and then parcels out pieces of them in a LIFO manner to new allocation requests.
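A minimal sketch of that kind of stack (LIFO) page allocator: grab a big page up front, hand out pieces by bumping an offset, and give them back in reverse order. Hypothetical and stripped down: no alignment, no chaining of additional pages, no thread handling.

#include <cstddef>
#include <cstdlib>

class StackAllocator {
public:
    explicit StackAllocator(std::size_t page_size = 4 * 1024 * 1024)
        : page_(static_cast<char*>(std::malloc(page_size))), size_(page_size), top_(0) {}
    ~StackAllocator() { std::free(page_); }

    // Hand out the next piece of the page: allocation is just a pointer bump, and
    // consecutive allocations end up adjacent in memory (locality for free).
    void* allocate(std::size_t bytes) {
        if (top_ + bytes > size_) return nullptr;   // a real version would grab another page
        void* result = page_ + top_;
        top_ += bytes;
        return result;
    }

    // Deallocation must happen in reverse (LIFO) order of allocation.
    void deallocate(std::size_t bytes) { top_ -= bytes; }

private:
    char*       page_;
    std::size_t size_;
    std::size_t top_;
};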
The new allocator makes things a whole lot faster, even if it's not completely done yet. Notice that we get free locality, too! Here are some pictures from various points in the story. Very first adaptive subdivision image visualization (16t 16s 00:03:54):
Here are some more renders, made after fully fixing the memory allocator. Allocation and deallocation time is negligible! The sampling is still minorly incorrect. I had wanted to fix that and add depth of field, but meetings ran long on Tuesday. (16t 4,^2s 00:01:07):
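Since n-rooks (Latin hypercube) sampling came up above, here is a minimal sketch of one common way to generate such samples: stratify each dimension into n strata, place one jittered sample per stratum, and shuffle one dimension's strata against the other so that every row and every column of the unit square contains exactly one sample. Hypothetical, standalone code:

#include <algorithm>
#include <random>
#include <vector>

struct Sample2D { double x, y; };

// Generate n samples such that each of the n rows and n columns of the unit square
// contains exactly one sample (the "n-rooks" property).
std::vector<Sample2D> n_rooks(int n, std::mt19937& rng) {
    std::uniform_real_distribution<double> jitter(0.0, 1.0);
    std::vector<int> perm(n);
    for (int i = 0; i < n; ++i) perm[i] = i;
    std::shuffle(perm.begin(), perm.end(), rng);   // decouple the y strata from the x strata

    std::vector<Sample2D> samples(n);
    for (int i = 0; i < n; ++i) {
        samples[i].x = (i + jitter(rng)) / n;        // one sample in x-stratum i ...
        samples[i].y = (perm[i] + jitter(rng)) / n;  // ... and in a unique y-stratum
    }
    return samples;
}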
My renderer has supported correct glossy reflections, complete with correct importance sampling, for a very, very long time. In fact, the difficult part heretofore has been not using them (much)! The reflections are based on a Phong lobe and can be varied so as to conserve energy with this nasty formula I once derived for a paper:\[ \begin{align*} \gamma &= \arccos\left(\vec{V}\cdot\vec{N}\right)\\ T_i &= \begin{cases} \pi & \text{if } i=0\\ 2 & \text{if } i=1\\ \frac{i-1}{i}T_{i-2} & \text{otherwise} \end{cases}\\ F_o&=\frac{1}{n+1}\left[ \pi +\cos(\gamma)\sum\limits_{m=0}^{n-1}\sin^{2m}(\gamma)T_{2m}\right]\\ F_e&=\frac{1}{n+1}\left[2(\pi-\gamma)+\cos(\gamma)\sum\limits_{m=0}^{n-1}\sin^{2m+1}(\gamma)T_{2m+1}\right] \end{align*} \]In the above, \(\gamma\) is the angle \(\vec{V}\) makes with \(\vec{N}\) and \(n\) is the (integer-valued) specular exponent. \(F_o\) and \(F_e\) are the normalization terms for \(n\) odd or even, respectively. This allows the following images to exist; without them, energy would be lost as the specular lobe approaches the ecliptic. People generally consider this loss to be good because it models shadowing in a microfacet BRDF. Unfortunately, it ignores multiple scattering. The correct answer is somewhere in-between. These images have been resized by a half; view the originals for full detail! Specular exponent \(n:=3\); (16t 16,^2s 00:03:26): Specular exponent \(n:=30\); (16t 16,^2s 00:03:49): Specular exponent \(n:=300\); (16t 16,^2s 00:02:45): Specular exponent \(n:=3000\); (16t 16,^2s 00:01:20): Specular exponent \(n:=30000\); (16t 16,^2s 00:00:55): Specular exponent \(n:=300000\); (16t 16,^2s 00:00:51): Specular exponent \(n:=3000000\); (16t 16,^2s 00:00:51): The last one, in particular, took a long time to set up. This is because the computation of the normalization factors is not memoized, and is therefore \(O(n^2)\). Notice that, because we're normalizing, these images converge nicely toward a delta function, ignoring the geometry term (and each reflection has the same energy—a handy feature). The edges are very faintly darker too; this comes from using a correct camera model.

The other thing to notice is that triangle lights are supported. So are spheres. Other implicit surfaces (e.g. the fractals from project 3) can't be lights because there isn't a nice way to find their total radiant flux (which I require of all lights). Planes are infinite, and so would emit infinite flux (which is a problem for light-emitting algorithms). I do support axis-aligned boxes, but I'm having some issues integrating them fully. I just need to give them a bit of attention, so I'm omitting the (incorrect) renders that show these objects being used as lights. These renders needed to use naïve path tracing (i.e. no explicit light sampling), since every single triangle is an individual light source! You could sample toward a given triangle (light), but a lot of the time it would be occluded by another light and so would be in the shadow of the first light. Explicit light sampling can't reuse the shadow ray to do lighting coming from the shadowing surface (since that would be biased), so a lot of rays would be wasted. Instead, the ray just bounces randomly until it hits a surface, adding from the surface's emission function as necessary. This works wonderfully since the new ray direction is importance-sampled. The other main thing I did was implement motion blur like I said I was going to.
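Before moving on to motion blur: because the recurrence for \(T_i\) above only ever steps by two, the normalization terms can be accumulated in \(O(n)\) by carrying the current \(T_{2m}\) and \(T_{2m+1}\) along with the sums, rather than the \(O(n^2)\) recomputation mentioned for the large exponents. A minimal sketch (hypothetical, standalone):

#include <cmath>

// Normalization terms F_o (n odd) and F_e (n even) for a Phong lobe of exponent n
// at viewing angle gamma, computed in O(n) by updating T_{2m} and T_{2m+1} in place.
void phong_normalization(int n, double gamma, double& F_o, double& F_e) {
    const double pi = 3.14159265358979323846;
    double c = std::cos(gamma), s = std::sin(gamma);
    double sum_o = 0.0, sum_e = 0.0;
    double T_even = pi;        // T_0
    double T_odd  = 2.0;       // T_1
    double s_pow_2m = 1.0;     // sin^(2m)(gamma), starting at m = 0
    for (int m = 0; m < n; ++m) {
        sum_o += s_pow_2m     * T_even;   // sin^(2m)(gamma)   * T_{2m}
        sum_e += s_pow_2m * s * T_odd;    // sin^(2m+1)(gamma) * T_{2m+1}
        // Advance T_i = ((i-1)/i) T_{i-2} from (2m, 2m+1) to (2m+2, 2m+3).
        T_even *= double(2*m + 1) / double(2*m + 2);
        T_odd  *= double(2*m + 2) / double(2*m + 3);
        s_pow_2m *= s * s;
    }
    F_o = (pi                 + c * sum_o) / double(n + 1);
    F_e = (2.0 * (pi - gamma) + c * sum_e) / double(n + 1);
}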
My motion-blur implementation operates by interpolating transformation matrices, which is horrible but seems to be de rigueur. It supports a stack of matrices, interpolated as frequently as you like, but the loader only supports two (the endpoints of the interval). Only a square-wave shutter is currently implemented. Here's a simple test. The timing is what I remember (16t 4,^2s 00:00:47): I loaded an airplane model (I finally found one that's kindof okay from here). I had to manually adjust the .obj file's .mtl file export, since it got all the textures wrong. Also, glass has two sides, so I had to fix some normals and extrude the canopy inward. I figured out a nice camera position in Blender, but it took me several hours to figure out how to map it correctly to our scenefiles. The algorithm is to set the camera in "XYZ Euler" and then write transforms in your scenefile that look something like this:
<rotate x="1" degrees="90"/>
<!-- Negative of "Location" -->
<translate x="-0.73644" y="1.39943" z="-0.55620"/>
<!-- Negative of "Rotation"'s z-component -->
<rotate z="1" degrees="-37.463"/>
<!-- Negative of "Rotation"'s y-component -->
<rotate y="1" degrees="32.076"/>
<!-- Negative of "Rotation"'s x-component -->
<rotate x="1" degrees="-63.645"/>
<!-- [Object Follows] -->
Part of the problem was that Blender, like much other modeling software, uses \(\vec{z}\) for up, which is morally wrong (TL;DR: if \(\vec{y}\) is "up" in 2D (clue: it is), then it should also be "up" in 3D; it's demented to redefine it to mean something literally orthogonal to its original meaning when you simply add a new coordinate). Several tests were rendered, but the first of the airplane with a significant number of samples looked like this (16t 4,^2s 01:19:11): To get the motion blur on the propeller, it was exported as a separate .obj, and also given a rotational blur of \(30^{\circ}\). Unfortunately, this winds up being too slow relative to the rest of the airplane. The canopy is a third .obj, but the .mtl material is overridden with my own physically-based glass material. There's full environment mapping here. There's a fill light off to the side, but much of the illumination comes from the environment (which is importance sampled, as all lights in my renderer can be, of course). You can see this on some lighter parts of the aircraft, such as the wing. The most objectionable artifact is the propeller rim, which was caused by an "int" overflowing from an extremely strong specular highlight. A simple fix. The rest was more problematic. I even had dreams about it. I eventually figured out that the transform to object space doesn't work for linearly blended rays (it actually magically becomes nonlinear), so instead you need to do the intersection in world space. I'll think about it more later, but doing it this way at least fixes the nonlinear effects (like the doubled roundel insignia on the side of the airplane). I also patched the loader to be able to handle any number of interpolation points. After fixing some glitches (which took way too long) I got this simple test (16t 2,^2s 00:00:38): This uses three points of interpolation (so two steps). With that, it's time for another render. Here's the airplane again, but with only one sample per pixel. I tripled the speed of the propeller (up to \(90^{\circ}\)) using the newly improved rotations (using eight points; seven steps). (16t 1,^2s 00:00:32): For the next render, I adjusted the sample count and resolution and halved the linear velocity.
This render is resized by half. View the original in full HD! (16t 16,^0s 00:42:25): More samples! This render is also resized by half. View the original in full HD! (16t 32,^0s 01:24:07): Bleh . . . I think these used my Whitted renderer instead of path tracing with explicit light sampling . . . Here's the actual assignment, using ELS path tracing (16t 16,^0s 00:02:17):
As with many of the previous projects, I've had indirect illumination implemented for a long time. I suppose I don't actually have a special configuration set up for just the single bounce expected, but that's because it's sortof trivial and boring. My implementation does full path tracing already. So, I spent this project working on improving various other features of my raytracer. First, I implemented spectral rendering. Spectral rendering (where each ray additionally has a wavelength associated with it) is fairly straightforward to think about, but more complex to actually implement in a renderer that's based on non-spectral rendering. Fortunately, I had done a lot of the implementation already, and I just had to finish it up. The first problem I encountered after implementing it was that, when cranking up the samples, my framebuffer implementation couldn't handle it past a certain point. This is because the memory required can become ridiculous. For example, one of my test scenes (from SmallPT) for dispersion (an effect only possible with spectral rendering, caused by the index of refraction varying with wavelength) is \(1024 \times 768 \times 1000\) (XGA at \(1000\) samples per pixel). With each sample being stored in "double"s and with associated metadata for each sample, even with only three wavelength buckets, it works out to \(29\) GiB of data! This clearly doesn't fit in my laptop's RAM. So, the first thing I did was rewrite the framebuffer implementation, again. This turned out to be surprisingly easy (although it took longer than it should have; I kept procrastinating because I thought it wasn't going to be easy). The implementation is now cruftier than I'd like, but it works. I also fixed the black background issue (I wasn't setting alpha) and implemented reconstruction filtering (currently supporting box, Mitchell-Netravali, and Lanczos windowed sinc).

Here's a rendering in progress. The checker pattern is untouched pixels, the magenta pixels are being sampled, the yellow pixels have enough samples but haven't been reconstructed, the green-overlaid pixels are reconstructed pixels awaiting deallocation of their sample data (so that everything fits in RAM), and the remaining pixels (showing the rendered color) are entirely completed: The implementation here is kindof protracted. Since the deallocation of a pixel's samples can't be done until all of the other pixels that depend on that pixel have been reconstructed, and since the reconstruction of a pixel can't be done until all of the pixels it depends on have been sampled, a naïve implementation would be ridiculously complex. The way I solve this is by delegating the reconstruction and deallocation to two different threads that constantly cycle over all the pixels in a scene. In practice, this eats a surprisingly small amount of CPU time, and the render threads never get far ahead. It's messy and suboptimal, but it works quite well in practice. Note: I shall not count these threads in the thread count for timing statistics.

Here are some different filters on the scene for projects 7 and 8. All were rendered using path tracing with explicit light sampling. Box (16t 4,^0s 00:00:42): Note that there is some faint ringing, especially on the windowed sinc. From these results, I am selecting Mitchell-Netravali as the default filter. A side-effect of all this is that the deallocation is now taking a long time again. It will have to do for now.
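For reference, here is a minimal sketch of the 1D Mitchell-Netravali kernel (with the usual \(B = C = 1/3\)); a separable 2D reconstruction filter just multiplies the 1D kernel evaluated in x and in y. This is standalone illustration code, not my renderer's filter interface:

#include <cmath>

// Mitchell-Netravali reconstruction kernel, nonzero for |x| < 2.
// B = C = 1/3 are the authors' recommended values.
double mitchell_netravali(double x, double B = 1.0/3.0, double C = 1.0/3.0) {
    x = std::fabs(x);
    if (x < 1.0) {
        return (( 12.0 -  9.0*B - 6.0*C) * x*x*x +
                (-18.0 + 12.0*B + 6.0*C) * x*x +
                (  6.0 -  2.0*B)) / 6.0;
    } else if (x < 2.0) {
        return ((     -B -  6.0*C) * x*x*x +
                (  6.0*B + 30.0*C) * x*x +
                (-12.0*B - 48.0*C) * x +
                (  8.0*B + 24.0*C)) / 6.0;
    }
    return 0.0;
}

// A 2D filter weight is the product of the 1D kernel in each axis:
// weight = mitchell_netravali(dx) * mitchell_netravali(dy).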
Without further ado, here's some with/without (cropped to rendered area) dispersion images. Note the extra bluish hue around the edge of the caustic:
Those are wrong. They aren't being scaled by the PDF of choosing the wavelength, and I'm pretty sure they aren't colorimetric. Colorimetry is the study of how radiometry is perceived. I have a tutorial on the basics. Long story short, after finding the radiant flux at a pixel, you convert it to CIE XYZ (defining how it appears) and then from there into sRGB (defining how to display it to get that appearance). I had already implemented a lot of this, but it took some work to get right. In particular, you need to be careful when combining colors: it's the radiometric spectra that must be averaged, not the final sRGB values. Also, since we're now no longer visualizing radiometric spectra, things that were "white" or "red" or whatever are now much more complicated. Here's a white plane rendered under a flat emission spectrum (i.e. constant spectral radiant flux from \(390nm\) to \(700nm\)), discretized over six spectral buckets (no timing data available, but I think it was around a minute or so for each):
Also, notice that the plane appears somewhat reddish. I suspect this is because most emission spectra aren't constant—and blackbody radiation, which approximates many light sources (especially the sun and incandescent bulbs) reasonably well, emits more toward the blue side. If anyone has some actual, reproducible data on how various surfaces look under various simple emission spectra, I'd love to see it! I tried to set up a prism to get good dispersion effects. The following is the first one I threw a decent number of samples at (16t 1000,^0s 00:27:17), 16 spectral buckets: I let this render run to completion just for show, but it's really not worth much. Here's a rerender with narrower, nonreflective walls and fewer samples (16t 100,^0s 00:01:38):

It's time to improve importance sampling. Currently, Blinn-Phong BRDFs aren't importance sampled (I didn't know how), and importance sampling generally wasn't taking into account the geometry term. Blinn-Phong can be importance sampled by importance sampling a Phong lobe centered around the normal, and then using that as the half vector \(\vec{\omega}_\vec{H}\). When paired with the first vector \(\vec{\omega}_i\), you can find the second vector \(\vec{\omega}_o\). You need to be careful, though. The PDF you get from sampling \(\vec{\omega}_\vec{H}\) is not the same as the one you'd get from sampling \(\vec{\omega}_o\)! You need to transform the PDF. I found this, which states that the transformation function needs to be monotonic (which doesn't make sense in spherical coordinates, really!). I presume they meant "injective", in which case we're fine because the transformation mapping \[ \vec{g}(\vec{\omega}_i,\vec{\omega}_\vec{H}):=\vec{R}(\vec{\omega}_i,\vec{\omega}_\vec{H}) = -\vec{\omega}_i+2\left(\vec{\omega}_i\cdot\vec{\omega}_\vec{H}\right)\vec{\omega}_\vec{H}= \vec{\omega}_o \]. . . happens to be bijective in \(\vec{\omega}_i \times \vec{\omega}_o\) space! I worked on figuring out what the transformation should be for a while (way too long), but I couldn't get anything useful. However, it turns out that Physically Based Rendering has the final answer, which they derive through a trigonometric argument (pg. 697–698) as:\[ pdf_{\vec{\Omega}_i}\left(\vec{\omega}_i\right) = \frac{ pdf_{\vec{H}}\left(\vec{\omega}_{\vec{H}}\right) }{ 4\left(\vec{\omega}_i\cdot\vec{\omega}_{\vec{H}}\right) } = \frac{ pdf_{\vec{H}}\left(\vec{\omega}_{\vec{H}}\right) }{ 4\left(\vec{\omega}_o\cdot\vec{\omega}_{\vec{H}}\right) } \] As far as the geometry term goes, the only BRDF I know of where you can actually take it into account directly is the Lambert BRDF. Let me know if you know of others!

Here's a table of renders from the various BRDFs showing the effect of importance sampling. The scene is a "perfectly reflective" plane with the given BRDF covered by an emitting dome. Note that the BRDFs are not in general "perfectly reflective". The Blinn-Phong model loses energy, and so does the Phong model without the correction I invented (see project 10). The Lambertian BRDF is simple enough to lose no energy.
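Going back to the half-vector sampling above, here is a minimal sketch in a local shading frame with the normal along \(+z\): sample \(\vec{\omega}_\vec{H}\) from a Phong lobe about the normal, reflect \(\vec{\omega}_i\) about it to get \(\vec{\omega}_o\), and convert the PDF with the factor of \(4(\vec{\omega}_i\cdot\vec{\omega}_\vec{H})\). Hypothetical, standalone code (a real implementation also has to reject \(\vec{\omega}_o\) directions that end up below the surface):

#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };
static double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Sample an outgoing direction for a Blinn-Phong lobe with exponent n, given the
// incident direction w_i (pointing away from the surface) in a frame where the
// shading normal is (0,0,1). Returns w_o and the PDF with respect to w_o.
Vec3 sample_blinn_phong(const Vec3& w_i, double n, double u1, double u2, double& pdf_o) {
    const double pi = 3.14159265358979323846;

    // 1. Sample the half vector from a Phong lobe about the normal.
    double cos_theta = std::pow(u1, 1.0 / (n + 1.0));
    double sin_theta = std::sqrt(std::max(0.0, 1.0 - cos_theta*cos_theta));
    double phi = 2.0 * pi * u2;
    Vec3 w_h { sin_theta * std::cos(phi), sin_theta * std::sin(phi), cos_theta };
    double pdf_h = (n + 1.0) / (2.0 * pi) * std::pow(cos_theta, n);

    // 2. Reflect w_i about w_h to get w_o (the mapping quoted above).
    double i_dot_h = dot(w_i, w_h);
    Vec3 w_o { 2.0*i_dot_h*w_h.x - w_i.x,
               2.0*i_dot_h*w_h.y - w_i.y,
               2.0*i_dot_h*w_h.z - w_i.z };

    // 3. Change of variables from the half-vector PDF to the outgoing-direction PDF.
    pdf_o = (i_dot_h > 0.0) ? pdf_h / (4.0 * i_dot_h) : 0.0;
    return w_o;
}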
Here's the first image of the assignment (16t 100,^0s 00:17:53): Here's the second image of the assignment (16t 100,^0s 00:05:29): Here's one using the "outdoor" environment map (which I thought was from Debevec, but apparently isn't; the only place I could find it was here—which might just be some site mirroring someone's work for a profit, since I didn't have to pay for the original). I also changed the algorithm from pathtracing to pathtracing with explicit light sampling (which importance samples the environment map) (16t 100,^0s 00:05:29): This looks really realistic, especially from a distance—and it should! A real-world environment light is illuminating a Lambertian surface (reasonably realistic). The light is bouncing around potentially a large number of times, and then reflecting into a physically based camera. After being modulated by the camera's response curve, the signal is converted to CIE XYZ, which defines how it should appear to humans, and then out to an sRGB image, thus setting the pixels in the correct way to induce that appearance. The only ways this could be improved are by using more samples with infinite (or RR) depth, using DOF (there was none here), or by using measured BRDFs. None of these are very important for this scene. I spent a lot of Monday rebuilding a broken computer, but in the evening I had my computer render these tests:
The refraction for both of these renders used "Dense Flint" (N-SF66) (data available from refractiveindex.info). I picked this because it has a very low Abbe number and a very high refractive index (criteria for maximum dispersion). For the no-dispersion case, a constant approximation is used. Here's a longer rerender I left going overnight (16t 5000,^0s 10:32:50):
My raytracer has been doing path tracing for a very long time. The same goes for importance sampling. Gamma correction is subsumed in proper colorimetry; the image is being corrected for perception (using gamma-like functions, among other things) according to the updated sRGB standard. I implemented light tracing a long time ago, but it's not presentable now. I used most of the time to work on my final project (which is going to require some extra attention, since it's going to be complex). I used some of the time to steal^H^H^H^H^Hgraciously borrow an idea for a better implementation of a framebuffer from Will Usher (whose raytracer is very nice, and quite a bit faster than mine). The idea is something I kindof thought of before, but dismissed: simply, each area keeps a list of samples while it's rendering (this is necessary to compute the estimator variance), but otherwise the general procedure is to splat the sample into all of the pixels it affects and then forget about it. The major problems with this are that the colorimetry needs to be evaluated once for each sample (instead of once for each pixel; in practice this effect looks interesting while rendering) and that there's a penalty for atomicity that gets worse with more samples. I think this way is somewhat less wasteful than having dedicated reconstruction and deallocation threads, and it has the major, major advantage in my book of being simpler. I decided to set up just a few scenes, since I'm working on actually improving the thing instead. Here's a glossy Cornell box with a glass bunny (16t 64,^0s 01:49:05): One thing to notice is that everything looks all wonky. I realized in retrospect that this is partially caused by my BRDFs conserving energy. The scene uses my normalization term, which prevents the specular BRDF from losing extra energy, so bounces are quite strong. Try to think of this box as being made out of colored aluminum foil. The weirdest part is the bright area in the bottom left. This is light bouncing off the far side of the bunny, hitting either the wall or the floor, then hitting the floor or the wall, and then going to the eye—in other words, it's a double-reflected caustic. That's also why it looks symmetrical. Here's Lucy (I forgot to bump her poly count). I changed the floor, walls, and ceiling back to diffuse so that it's easier to see what's going on, but this reduces the effect (16t 16,^0s 00:03:16): The model here is a higher-resolution one; Lucy is less polygonal, and so reflects less light in sharp ways. This accounts for the major qualitative difference (16t 512,^0s 07:17:26):
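A minimal sketch of the splatting framebuffer described above: each finished sample is weighted into every pixel whose filter support covers it (accumulating weighted color and total weight), and is then forgotten; the final pixel color is the weighted color divided by the total weight. Hypothetical structure, with a simple tent filter standing in for the real reconstruction kernel and the atomicity concern ignored:

#include <algorithm>
#include <cmath>
#include <vector>

struct Pixel { double r = 0, g = 0, b = 0, weight = 0; };

struct SplatFramebuffer {
    int width, height;
    double filter_radius;          // e.g. 1.0 for the tent filter below
    std::vector<Pixel> pixels;     // width * height entries

    // Stand-in reconstruction kernel (a tent); a real renderer would use something
    // like Mitchell-Netravali here.
    static double filter(double dx, double dy) {
        return std::max(0.0, 1.0 - std::fabs(dx)) * std::max(0.0, 1.0 - std::fabs(dy));
    }

    // Splat one sample at continuous image position (sx, sy), then forget it.
    void splat(double sx, double sy, double r, double g, double b) {
        int x0 = std::max(0,          int(std::floor(sx - filter_radius)));
        int x1 = std::min(width  - 1, int(std::ceil (sx + filter_radius)));
        int y0 = std::max(0,          int(std::floor(sy - filter_radius)));
        int y1 = std::min(height - 1, int(std::ceil (sy + filter_radius)));
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                double w = filter((x + 0.5) - sx, (y + 0.5) - sy);
                Pixel& p = pixels[std::size_t(y) * width + x];
                p.r += w * r;  p.g += w * g;  p.b += w * b;  p.weight += w;
            }
    }
    // Final pixel color: (r, g, b) / weight, once all samples have been splatted.
};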
I was torn between rewriting my entire rendering pipeline or trying to fix the one I have. Since the one I have isn't actually bad, I eventually decided to try to fix it. I was successful, tweaking my implementation to support caustic and indirect photonmaps with a Monte Carlo final gather. I . . . might be holding back some other features for next week :D My implementation traces photons into a per-thread vector to avoid synchronization overhead, and then merges these vectors into a single flattened representation of a KD-tree. The tree is constructed to be optimally balanced. I discovered something interesting. Here's a low-photon count visualization of only the caustic map for the SmallPT scene modified to have a point light source (timing unavailable): There's that strange ring at the top. At first I thought it might be a double caustic (i.e. an image of the caustic produced on the floor). Some other website I found seemed to support this. However, since caustic map photons are terminated at diffuse surfaces (like the floor), this isn't possible. Indeed, removing the floor entirely doesn't remove it. I thought maybe it was a double reflection/refraction-ish kind of thing using the other ball, but it wasn't this either. I decided to visualize the light paths, so I wrote code that produced (timing unavailable): I've been working a lot on the final project (and I'm deliberately holding back some photon mapping images for that purpose). Here's a simple render of a sphere (timing neglects photon trace and build) (16t ?,^0s 00:06:01): Late Monday I got the idea to try to hack in a relativistic rendering. Here's the first result of relativistic photon mapping (16t 1,^0s 00:07:49): What this really needs is more photons, maybe some color in the reflections, and more samples. But, there's only an hour and a half left, so this is the last image for this week. For more photon mapping, check out project 6.5 and project 6.75.
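A minimal sketch of balancing a photon KD-tree: recursively split the photon range at the median along its axis of greatest extent, storing nodes in a flat vector with explicit child indices. This is an illustrative simplification of the flattened, optimally balanced tree described above (hypothetical types; the per-thread merge and the traversal are omitted):

#include <algorithm>
#include <vector>

struct Photon { float pos[3]; /* power, incident direction, ... */ };
struct KDNode { Photon photon; int left; int right; int axis; };

// Builds the subtree for photons[begin, end) and returns its node index (or -1).
static int build_balanced(std::vector<Photon>& photons, int begin, int end,
                          std::vector<KDNode>& tree)
{
    if (begin >= end) return -1;

    // Split along the axis over which this range of photons is most spread out.
    float lo[3] = {  1e30f,  1e30f,  1e30f };
    float hi[3] = { -1e30f, -1e30f, -1e30f };
    for (int i = begin; i < end; ++i)
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], photons[i].pos[a]);
            hi[a] = std::max(hi[a], photons[i].pos[a]);
        }
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (hi[a] - lo[a] > hi[axis] - lo[axis]) axis = a;

    // Put the median photon at this node so the two halves stay the same size.
    int mid = begin + (end - begin) / 2;
    std::nth_element(photons.begin() + begin, photons.begin() + mid, photons.begin() + end,
        [axis](const Photon& a, const Photon& b) { return a.pos[axis] < b.pos[axis]; });

    int node = int(tree.size());
    tree.push_back(KDNode{ photons[mid], -1, -1, axis });
    int left  = build_balanced(photons, begin,   mid, tree);
    int right = build_balanced(photons, mid + 1, end, tree);
    tree[node].left  = left;
    tree[node].right = right;
    return node;
}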
First, my final results.
For many things, Blender is great—by which I mean it's a powerful tool in capable hands. Unfortunately, mine are not capable. To render the teapot shatter scene, I had to get a teapot. I used the standard one from class, but, since my renderer prefers glass to be volumetric (because, um, that's what it is in real life), I needed a teapot with actual thickness. I couldn't find one online that worked, so instead I painstakingly created a solid object that's not topologically dumb (and doesn't have source URLs baked into the surface as geometry defects (like one example I saw)). From there, I used Blender's massively crufty but still highly winning "Cell Fracture" tool to shatter my beautiful teapot into \(5000\) shards. This took a very long time (most people max out the settings at \(250\)). I basically followed the tutorials (of which there are several; this video was the most helpful, but I sourced some tips from this series, whose user seemed to be more adept). There are various things I hate about Blender. Like many other modeling tools, it adopts the wrong \(\vec{z}\)-up convention. Also, it adopts the wrong gravity of \(9.8\frac{m}{s^2}\). I could live with these failures if they were actually consistent. Turns out, they're not. In Blender, you cannot, as far as I can tell, vary the physics \(\Delta t\). This makes it, for example, very difficult to render things in slow-motion. You can kindof distort time during the render. And you can scale everything up so as to make the forces look like they're acting more slowly, but these are broken hacks (that didn't work anyway). You can't disable gravity either. You can switch to "Game Mode" and set the gravity to zero, but this actually doesn't do anything. The accepted solution seems to be to add a force to counteract gravity exactly, so that gravity has no effect. This is not a solution. This is a horrific workaround to a broken system. Therefore, after shattering the teapot, I exported it as a .obj to my own physics system (also based on Bullet Physics—except, since it's not wrapped in a feature-deficient GUI, I can set important things, like gravity, to be correct (in this case, gravity is \(-9.80665 \frac{m}{s^2}\) and the timestep is \(0.005\frac{s}{frame}\))). How did I export? Slowly and painfully. The Blender .obj exporter works absolutely terribly for this, evidently. Blender started crashing randomly during the save (writing only six-hundred-some shards). I disabled the triangulate-faces option (which seemed to improve things). However, my loader doesn't support non-triangular faces (because that would mean doing something ill-defined). So, I decided to do the triangulation in Blender. The triangulate method worked very nicely (to do this, I had to join everything into one object (fast) and then split it again by connectedness (several hours)). I then tried to export it, and Blender started crashing again. I was eventually able to export parts of the object at a time, but Blender couldn't get the back part of the teapot for some reason. Since Blender's crashes terminate immediately, I had to run it from a shell to be able to see the output (I didn't even know it had output, since it doesn't display the terminal unless you tell it to). The output told me to look in (on Windows) "C:/Users/IANMAL~1/AppData/Local/Temp/shatter2.crash.txt". Apart from a few lines of stuff (e.g. such-and-such a file,
Apart from a few lines of stuff (e.g. such-and-such a file, etc.), the only error within the file was this: bpy.data.window_managers["WinMan"].(null) = False # Property. I know Python, so maybe I could have fixed the script. But no. No backtrace. After futzing around some more, I gave up and downloaded 3ds Max.

Of course, "download" isn't the right word. You actually download a download manager, which downloads an installer, which downloads 3ds Max. Each stage of the chain is broken, bloated, and stupid. It failed several times before demanding I restart my computer before I'd even started installing. Four IDEs, a web browser, miscellaneous files and configured search windows, and a couple of uninterruptible processes running in the background? Oh sure, I'd love to restart my computer just for you. It's so considerate that you're making sure that the registry-entries-you're-writing-without-my-permission-to-install-rootkits-to-ensure-I-really-am-a-grad-student aren't conflicting with Windows Update. I'm almost morally obligated to sign over all my rights to you, since even though this is my computer, it's still your application I'm running on it! Of course, after rebooting, the connection to the internet had been interrupted, which caused the install to crash yet again. So far, so good. Just the quality I'd expect for a \(\$3,\!675\) piece of software (or, through their subscription plan, only \(\$185\) per month!). At least it's "free*" for students.

So I started the download again. Then I did something stupid. I closed the browser window (since it's 2014, and only idiots who never outgrew AOL don't understand downloading in the background). Except that Autodesk makes no kind of sense. Closing the browser tab not only aborted the download, it crashed the download manager. On investigation, I kid you not: there are scripts running on that page that actively kill the download when you close the window. After pkilling all the hung processes, I sat down and thought this through. Eventually, I derived a cryptographic proof (a real, honest-to-Zeus mathematical proof) that killing downloads is literally never necessary. You open a connection, do two-way hashed identity verification, set up an encrypted tunnel. Blah blah. But the simian stand-in for a coder who evidently wrote this cruft without getting fired will likely never know the extent of my fury.

There was nothing to do but start the download again. This time, it crashed for literally no reason. I was about to start it yet again when I realized that, somehow, it had put a start icon on my desktop. I figured maybe at least the application had installed, and so I tried it. Things then started happening very quickly. In about \(30\) seconds, the program started up, told me it was going to monitor application usage, told me it was checking my license, and sent an email confirming my student license. Then the start icon disappeared and the program claimed it had encountered a problem. I figured some temporaries hadn't been deleted, so I set about deleting them myself and reinstalling. And then my computer shut itself down. I realized what was happening, and by sheer luck terminated, just in time, an admin process that would have massively corrupted the filesystem (seriously: two seconds later, and I wouldn't have been able to boot). (I lost several paragraphs of this text, though; about a page has been reconstructed from memory.) This is the programmer equivalent of assault. A computer is your mental extension. It is the most powerful tool you own, and when you put applications on it, it's a matter of trust.
When your computer does something unexpected because of what someone else did, it hurts. And spontaneous, forced reboots are the strongest such gesture. I don't know what happened, but I have my suspicions. Remember, at this point, the problem is still just writing a bunch of triangles to a file. I'm (failing at) downloading a software package that costs as much as a used car just so that, somewhere within its \(6.72\)GB hulk, a subroutine can call "fopen" and write \(100\)MB of ASCII text. Four hours wasted, so far.

I finally got 3ds Max started. Then, it crashed on loading Blender's Collada export. So, I tried exporting .fbx from Blender (this crashed Blender). Blender can export .x3d, but 3ds Max doesn't support it. My first success came from .stl, but then each object was welded to its neighbors. I figured I'd have another go at Blender, but then I realized that some of the shards were split into nonmanifold pieces, so all the data was worthless anyway. After a few successful tests with fewer breaks and different settings (such as, in particular, triangulating before shattering), I ran the shatter simulation again, which took another three hours. This time, the .obj export worked perfectly and I was able to quickly load it into my library. After some related tweaking, I was finally able to get a simulation I liked.

To render the simulation, I needed to export the objects from my simulator, calculating different transforms for the start and end of each frame for motion blur. In addition, it had to export a scenefile for each frame, which was a refreshing bit of coding to implement. Getting my renderer to render frames based on calls from a Python script was also fairly simple. I tweaked the shatter simulation so as to simulate a supersonic shockwave from the bullet (which is simulated traveling at \(1250\frac{m}{s}\)). The effect was originally just a simple lateral force added behind the bullet, which produces this:

Time to render the first video that actually has a chance of being good. For this, I set up my script to have multiple renders running in parallel (a rough sketch of such a driver script appears below). Also, the frames were being generated at the same time (I eventually terminated the simulation at \(501\) frames): I tweaked the refractive index of the shockwave (down to \(1.6\)) and bumped the resolution to make it actual HD. I also bumped the sample count from \(4\) to \(64\). These changes had to be done in the renderer itself with nasty hacks, since the frames were already generated. The sample count and resolution changes drastically increase rendering time. The previous videos took maybe \(23\) seconds per frame (half of that for loading). The first frame of the final version, by contrast, took \(9\) seconds to load and \(35\) minutes, \(7\) seconds to render! At this rate, rendering the other \(500\) frames would take \(12\) days, \(4\) hours, and \(38\frac{1}{3}\) minutes! To help, I enlisted my lab computer, which renders (the admittedly much easier) frame \(200\) in \(17\) minutes, \(54\) seconds. I also tried setting up my tracer on other computers. A render server for HWRT couldn't be used because its OS is too outdated, and I can't compile on it besides. I couldn't use my undergrad CS account's computer for the same reason. To get the final result, I composited all the frames in After Effects and added some screen-space effects, including time warp. I calculated that the final render took around \(72\) CPU-core-days (with parallelism across a total of \(24\) logical cores on two computers, this worked out to several days of wall-clock time).
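As a sketch of the kind of driver script mentioned above (the renderer name, CLI flags, and file patterns here are made up; the actual scenefile format and invocation differ), one might run several frame renders in parallel like this:

```python
# Hypothetical frame-render driver: launches several renders at once, one
# subprocess per frame. Executable name, flags, and paths are assumptions.
import subprocess
from concurrent.futures import ProcessPoolExecutor

RENDERER = "./raytracer"                      # assumed CLI name
SCENE_PATTERN = "frames/frame_{:04d}.xml"     # per-frame scenefiles (start/end transforms baked in)
OUTPUT_PATTERN = "out/frame_{:04d}.png"

def render_frame(i):
    scene = SCENE_PATTERN.format(i)
    output = OUTPUT_PATTERN.format(i)
    subprocess.run([RENDERER, scene, "-o", output], check=True)
    return i

if __name__ == "__main__":
    # A few renders in flight at once; each render is itself multithreaded.
    with ProcessPoolExecutor(max_workers=4) as pool:
        for done in pool.map(render_frame, range(501)):
            print("finished frame", done)
```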
For the teapot glow scene, I quickly implemented a hackish volume integrator. The first okay result (16t 16,^0s 00:02:40): After a minor improvement to the math and a touch of depth of field (16t 4,^0s 00:00:46):
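The integrator itself isn't described above, so as a guess at the flavor of a "hackish" approach (this is not the actual code), here is a toy emission-plus-absorption ray march through the glow region:

```python
# Toy emission/absorption ray march: accumulate emission along the ray,
# attenuated by extinction (Beer-Lambert). density_at / emission_at / point_at
# are hypothetical callables supplied by the surrounding renderer.
import math

def integrate_glow(density_at, emission_at, point_at, t_min, t_max,
                   sigma_t=4.0, step=0.02):
    radiance = [0.0, 0.0, 0.0]
    transmittance = 1.0
    t = t_min
    while t < t_max:
        p = point_at(t)
        d = density_at(p)              # scalar density at this sample point
        e = emission_at(p)             # RGB emission per unit length
        for i in range(3):
            radiance[i] += transmittance * e[i] * d * step
        transmittance *= math.exp(-sigma_t * d * step)
        t += step
    return radiance, transmittance     # add radiance; multiply background by transmittance
```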
This one was essentially an accident. I was picturing a blue teapot with some ocean-wave-like caustics on a simple floor, and just wanted to see what it would look like. Since I had other renders, I figured it might be a nice addition. But the first result was so amazing that I felt compelled to throw more compute at it. The light emits \(100,\!000\) photons, leading to about a quarter billion caustic photons being stored. Heavily compressed, this is still almost \(650\)MB! I had a hard time fitting this into memory (and, in retrospect, it probably didn't fit). I set up the final render around \(02\!:\!00\), thinking it would be done in the morning. When I woke up, it was only \(39\%\) done. I figured it would be done by the end of the day. But the glass rendered absurdly slowly. I rendered the top part of the scene on my lab computer. Even so, the entire render took days. This was probably due to the photon map being read back from disk multiple times for every glass pixel. In the end, there were a few incomplete scanlines that didn't get as many samples as they should have, but I called it done anyway.

The final version is the original render with swapped red and blue channels. Here are all possible permutations (click to view full HD) (top left was the original; a tiny script for generating these permutations appears below): Late the day before the competition (around \(03\!:\!00\)), I decided to try adding volume caustics, which I had been meaning to add but hadn't. After some work, I got the caustics buffer, which handles single scattering (fourish hours render time):
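For the channel permutations mentioned above, a tiny script along these lines does the job (filenames are hypothetical; this shows the idea, not the exact script used):

```python
# Generate all six RGB channel permutations of a render; the chosen final
# image corresponds to the red/blue-swapped (BGR) variant.
from itertools import permutations
from PIL import Image

img = Image.open("caustic_teapot.png").convert("RGB")   # hypothetical filename
bands = dict(zip("RGB", img.split()))
for perm in permutations("RGB"):
    out = Image.merge("RGB", [bands[c] for c in perm])
    out.save("caustic_teapot_{}.png".format("".join(perm)))
```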
The rendering competition is open to anyone who has taken the class in previous years, so I participated again. It is worth noting that the raytracer I used is a nearly complete rewrite of the raytracer used last year. Describing the changes is inconvenient (as I have forgotten them), and also well beyond the scope of this page. Suffice it to say, the rewritten version is much faster, and its capabilities, while different, are nearly a superset of the old ones.

A few days before the competition, I was asked officially if I wanted to participate. I did, but there was a problem: I TAed the class this second year, so entering would mean the TA competing against his own students. Cem acknowledged that that would be unfair, but didn't think it was a problem. On the other hand, if I refused the invitation, I would effectively be declining because I thought my raytracer would trounce every student's, which is more than a bit uppity. I finally resolved the problem by deciding that I would enter the competition, but, if I won, I would give the prize to the first-place student (I talked it over with the other two second-round contenders, but I don't know what eventually happened).

A secondary (and perhaps more important, if you think about it) problem was that I didn't have any scene in mind. The first scene I came up with was making a teapot out of teapots. This necessitated implementing instancing. Here's one of the earliest working tests of instancing. The teapot is stored exactly once. Notice that transformations are supported. (Timing unavailable.) From here, it is pretty easy to make the teapot out of teapots. I wrote a six-line Python script to parse the OBJ file and output a "recursion point" at each vertex position (a rough sketch of such a script appears below). (Timing unavailable.) (Click for full res.) The problem is that this is really boring. But then I got an idea. Remember the relativistic photon mapping one-off hack from project 13? In that scene, there is a single Schwarzschild black hole in the Cornell box. What if I made the entire teapot out of black holes!
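The vertex-to-recursion-point script is roughly the following (the output syntax is a guess, not the literal lines my scenefile format uses, and the filename and scale factor are made up):

```python
# Emit a "recursion point" (instanced sub-scene) at every vertex of the OBJ.
scale = 0.02   # made-up shrink factor for each recursive teapot
with open("teapot.obj") as f:
    for line in f:
        if line.startswith("v "):
            x, y, z = line.split()[1:4]
            print('<recurse translate="{} {} {}" scale="{}"/>'.format(x, y, z, scale))
```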
You can actually calculate the orbit around a single source in closed form, and if the black hole is not rotating (the case here), you could use that to find the curved path. Ultimately, I just used raymarching: for the black-hole teapot, there's more than one source, so closed-form evaluation becomes impossible anyway. There are several thousand vertices in the teapot mesh, and ideally we shouldn't calculate the gravitational interaction with each one at every raymarch step. Instead, I came up with a "distortion volume". This is just a big, vector-valued, 3D table. You plunk it down in the scene and compute the gravitational interaction at each grid cell. Then, when raymarching, you do an oct-linear lookup (a rough sketch of this appears at the end of this section). This (statistically consistent!) approximation works splendidly. Here's an early test (no volume yet) of recreating the original work (timing unavailable):

At this point, I discovered that the raymarching I was doing was incorrect. While the ray's position was being updated correctly, the direction was not. Rays would always point in the same direction, even if they orbited around the black hole! When I fixed this, I got the more realistic result: Using the forest environment map instead, I get (??t 81,^0s ??:??:??) (click for full res.): I added some basic postprocessing. I did two "despeckle" operations, changed saturation \(+80\) and lightness \(+18\), and then did another despeckle. This produces the final image submitted to the competition (click for full res.): During the break, I set the lab machine to rendering a much higher quality image with \(1225\) samples per pixel. This produces (timing unavailable) (click for full res.):

The major difficulty was in getting the transformations right, especially with the volume. Several times, I thought I had gotten them right, before realizing I hadn't. The recursion for instancing was particularly annoying: the BVH needs to be multi-scale and handle transformations gracefully. For speed, in the rewrite, I had removed all transformations from triangles, instead evaluating them at load time. This produces much better BVHs, but it doesn't work once recursion is involved, so I had to undo it. Fortunately, the best of both worlds can be achieved: identity transformations are optimized to no-ops, and triangle transformations are collapsed upward when possible.

A hilarious problem was also discovered. I noticed that scenes were using way too much memory. For example, the shattered teapot model from last year only just barely fit into \(12\)GB of RAM (it doesn't even have a million triangles, IIRC). I attributed this to the fact that I was storing a vector of transform samples, each of which is three \(3\!\times\!4\) matrices, for each triangle. But once I had undone this (as above, so that each object (e.g. a whole triangle mesh) instead stores a transform stack), the problem was not much improved. Things came to a head on the night before the competition, as I was furiously implementing the distortion code. Suddenly everything just started crashing. Finally, I was able to trace the problem. Since the rewrite uses my improved math library, which now vectorizes everything, alignment becomes not just important to performance but absolutely critical to correctness, so I was using my aligned allocator. However, my aligned allocator had two problems. One was that it was allocating the size of the base type, not the derived type. Since derived types are usually larger than their base types, this is a segfault waiting to happen.
However, a second error caused it to allocate a factor of sizeof(T) too much memory. This had the effect of massively increasing memory use, and also hiding the first error except in pathological cases.
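Returning to the distortion volume described earlier in this section, here's a minimal sketch of the idea (my illustration, not the competition code; it uses a simple inverse-square pull toward each source rather than a proper lensing formula): deflections are baked into a vector-valued 3D grid once, and the raymarcher queries that grid with trilinear ("oct-linear") interpolation instead of summing over every source at every step.

```python
# Bake per-cell deflection vectors from the point sources, then look them up
# with trilinear interpolation while raymarching.
import numpy as np

def build_distortion_volume(sources, grid_min, grid_max, res):
    """sources: (M, 3) source positions (equal strength assumed).
    Returns a (res, res, res, 3) array of deflection vectors."""
    axes = [np.linspace(grid_min[i], grid_max[i], res) for i in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    cells = np.stack([gx, gy, gz], axis=-1)
    volume = np.zeros_like(cells)
    for s in np.asarray(sources, dtype=float):
        d = s - cells                                      # toward the source
        r2 = np.sum(d * d, axis=-1, keepdims=True) + 1e-6  # avoid divide-by-zero
        volume += d / (r2 * np.sqrt(r2))                   # ~1/r^2 falloff
    return volume

def lookup_deflection(volume, grid_min, grid_max, p):
    """Oct-linear (trilinear) lookup of the deflection at world position p."""
    res = volume.shape[0]
    u = (np.asarray(p, dtype=float) - grid_min) / (np.asarray(grid_max) - grid_min) * (res - 1)
    u = np.clip(u, 0.0, res - 1 - 1e-6)
    i = u.astype(int)
    f = u - i
    out = np.zeros(3)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (f[0] if dx else 1 - f[0]) * \
                    (f[1] if dy else 1 - f[1]) * \
                    (f[2] if dz else 1 - f[2])
                out += w * volume[i[0] + dx, i[1] + dy, i[2] + dz]
    return out
```

During the march, each step would then nudge the (normalized) ray direction by something like step_size * lookup_deflection(...) and renormalize.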
Despite the fact that there was no CS 6620 in Fall 2016, certain contestants pushed for another teapot rendering competition. I was not especially thrilled by the prospect; I had way too much to do, and no really great scene ideas. Nevertheless, it was scheduled, and so of course I had to enter. I had a lot of ideas, but it turns out that my modeling skills are still abysmal: I rejected several ideas, one after another, attempting to model each and failing. But my last (and most practical) idea started coming together. This was to just take a bunch of stock 3D models and try to make a room scene. I talked about it with Will Usher (who also entered), and the basic idea was to make a hyper-realistic scene.

Unfortunately, it wasn't until the Sunday before the Tuesday competition that I finally got to modeling. I didn't have enough time, and some last-minute issues popped up, so the scene had fewer details than I'd have liked. Specifically, rays started going through triangles. After some poking around, I found that this was not an acceleration structure problem (it occurs with brute force as well). It seems to be a triangle intersection problem, which should just not happen; I'm using a robust triangle intersection method. For now, I converted the tracer to double precision (which necessitated fixing some minor broken compile paths).

For rendering, I tried to render a proof on my new desktop. This thing has an i7-6850K overclocked to \(4.343\)GHz and \(64\)GB of DDR4 at \(3400\)MHz. I went to sleep after starting it up, and in the morning the computer was off, its OS was trashed, and the render hadn't finished. While I reinstalled the OS, I just had the scene render on my somewhat slower lab machine. While composing the scene, I tried to obey artistic rules (and experience with previous teapot rendering competitions has taught me that people prefer artsy entries anyway). I called the scene "SIGGRAPH". The final result, which has a bit of light post-processing (timing unavailable, click for full resolution):
I rewrote my renderer again. This time, I spared no expense in making it flexible, powerful, fast, usable, and so on, which is also why it took months. In fact, I had a great idea for a submission, and even though I started working on it, I realized I just didn't have time, even if rendering were free. I'll save it for next year, though. Instead, I came up with a simple scene in my emerging "still life" style. I decided to make it a tribute to graphics at the University of Utah. To that end, I put in:
Click for full resolution! For rendering, my new renderer is a spectrally correct, unbiased path tracer, striving for ultimate physical accuracy. It reconstructs CIE XYZ radiance values at each pixel using Mitchell-Netravali reconstruction filtering, and it traces up to \(8\) wavelengths at a time, completely eliminating quantization error due to spectral binning (which, to be clear, I don't have). Indexed triangle meshes for all loaded objects are vital for memory efficiency, especially for the larger models, like the hairball and the bust. The fiber-level yarn is loaded from Cem's ".bcc" format (after I converted Kui's output to that format by script). This produces many tessellated spline curves, which I load as cylinder primitives. The renderer intersects the cylinder primitives directly; tessellating them to triangles would use a ridiculous amount of memory. You can see a better view of it in a test render. For some of the models, like the prism and the table, I deliberately rounded sharp corners to give them a feeling of realism. I also added normal maps, a new feature, to add gritty realism to most objects. If you zoom in on e.g. the teapot, you can see fine scratch marks altering the refraction. The cloth wrinkles came out of a simulation in Blender (the texture is a Crytek Sponza curtain). Unlike last year, I did not use Blender to compose the scene; my renderer turned out to be fast enough to prototype just by test rendering (though I wouldn't do it again). Positioning e.g. the PCB and the prism turned out to require careful fiddling with the camera to get a useful view. There is a subtle (but important) meniscus on the water inside the teapot. This is modeled as actual geometry, and it was a pain, since Blender has an apparent bug that prevented its CSG from working correctly.

For rendering, I used my laptop and my lab machine for proof renders, with long-running jobs going to the lab machine. I was pleased that my raytracer's performance allowed me as many \(1000\)+ sample-per-pixel renders as I really wanted. For some configurations, the slowest part of the whole process was loading the scene into memory (even after load, it renders in \(8.7\)GB, most of that triangles and cylinders). All models were either found free for non-commercial use or modeled myself; even in the former case, most models were heavily modified. The final render was done on my lab machine at a resolution of \(4096\!\times\!3072\) with \(10,\!000\) samples per pixel. This works out to \(125,\!829,\!120,\!000\) (about \(126\) billion) primary rays. In practice, we can multiply this by at least \(3\) (one ray inside the lens, one to the scene, and at least one reflection hitting the environment map). However, the maximum ray depth is \(16\), so there could be up to \(2\) trillion rays in this image. My guess is it's more like \(400\)–\(600\) billion. My renderer reports that the render took \(266,\!419.539796\) seconds (a bit over three days).

My image won first place in the rendering competition this year. There were many interesting entries; I especially liked Nathan Morrical's. I was disappointed that Laura Lediaev and Will Usher didn't compete this year, as their entries are always exceptional. Hardware: except as mentioned, renders are done on my laptop, which has: