Kristof Beets, Senior Director of Technical Product Management at Imagination Technologies, shared some insights into the ray tracing tech and the development of their ray tracing IP.
I’m Kristof Beets, Senior Director of Technical Product Management here at Imagination Technologies. I have worked for over 20 years in the IP development space on GPU technology across the full spectrum from general purpose to high-end gaming. In this time, I have worked in developer support, demo development, and also business development.
I have been in the mobile GPU space since its inception and have also been with Imagination since the beginning of their ray tracing development dating back to 2014 with our Plato Boards. I am now in charge of heading up power-efficient IP to deliver ray tracing to a wide variety of markets and platforms.
Imagination Technologies have been an IP-creating company for over 35 years; in that time we have been at the forefront of graphics technology. We currently specialise in the mobile, embedded, and cloud computing space, working out how to bring the best gaming experience to new platforms through intelligent hardware design.
What's Ray Tracing?
We have a number of blog posts and whitepapers that discuss Ray Tracing, but fundamentally Ray Tracing builds on reality: light comes from a light source, those light rays bounce around many surfaces and interact with them (absorption, changing colour, changing direction), and ultimately such a ray may hit our eye, where it is seen as a specific colour. This whole process generates the visuals we see. As you can imagine, starting from the light source and hoping that we hit the camera is speculative and very brute force, so generally this process is turned upside down, i.e. we start from the eye and bounce around. An even more effective approach today is Hybrid Rendering, where you render most of the scene traditionally and use ray tracing to query the 3D scene around the pixel you are processing.
Detailed lighting with the RT solution enabled:
Basically, for a reflective surface, you run a shader and that shader can now emit a ray based on the view direction and search the 3D scene for the object(s) that reflect into that pixel. Fundamentally, Ray Tracing adds a spatial query as a new capability in the GPU, and this is a great fit with many effects in the real world. Lighting is about checking whether you can see (directly or via some bounces) a light source to determine if your pixel is lit or in shadow, and reflections I already touched upon above.
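As a rough illustration of the first step above, a reflective-surface shader derives the outgoing ray direction from the view direction and the surface normal using the standard mirror-reflection formula. This sketch is purely illustrative (plain Python, hypothetical names), not Imagination's shader code:

```python
# Minimal sketch (not Imagination's implementation): compute the direction
# of a reflection ray from the view direction and the unit surface normal,
# as a reflective-surface shader would before emitting the ray.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(view_dir, normal):
    """Mirror-reflect an incoming direction about a unit normal:
    r = d - 2 * (d . n) * n."""
    d = dot(view_dir, normal)
    return tuple(v - 2.0 * d * n for v, n in zip(view_dir, normal))

# A ray looking straight down (-z) at a floor facing up (+z)
# bounces straight back up.
print(reflect((0.0, 0.0, -1.0), (0.0, 0.0, 1.0)))  # -> (0.0, 0.0, 1.0)
```

The resulting direction is what gets handed to the ray tracing hardware as the spatial query described above.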
The key in GPUs today is that shaders are extremely flexible, even more so when combined with general-purpose GPU (GPGPU) style compute. GPUs have always done this, mixing semi-fixed-function hardware with fully programmable shaders to get the best of both worlds: flexibility, but also efficiency where it makes sense.
In graphics, lights, shadows, and reflections have always been popular, and we have relied on approximations for a long time: from no shadows, to blob shadows, to stencil volumes, to shadow maps with ever more complex filters and multi-resolution look-ups. But today we spend so much compute power and so much bandwidth on these that they are becoming very costly. The reality is that running real ray tracing for the effect is now simply faster and cheaper, and still better in quality, than the 100% shader-code approximation.
The approaches are fully detailed in our Ray Tracing Levels whitepaper. Imagination has always started from efficiency for battery-powered devices, so our focus has always been on Level 4 and Level 5 efficiency solutions for Ray Tracing. Basically, we fully offload the tracing of the ray from the shaders (Level 2 and 3 solutions only do this partially), and we use coherency sorting to improve memory access and execution efficiency as we hit objects. In other words, we trace and process bundles of rays, not a spaghetti mix of rays all going in different directions, which would be very inefficient.
An example of hidden coherency across an uneven surface:
Faking the Ray Tracing Effect
We have already talked about a few examples of tricks regarding shadows (e.g. no shadow, a simple blob under the character, stencils, more complex shadow maps) and while the techniques get more complex and offer higher quality as you go along, they all have flaws.
The same applies to reflections, where you can go from no reflections, to fake vague texture look-ups, to pre-baked cube maps, to dynamic cube maps, and even partial software ray tracing, e.g. screen-space reflections. With these, you do a minimal amount of ray tracing in screen space and use direction vector and depth information to reflect nearby screen objects (but this fails for anything not on the screen).
Imagination Technologies' Ray Tracing IP
We had fully integrated a Ray Tracing unit into our GPUs as early as the 2015 time frame with our Plato hardware. Fundamentally, this means that a shader emits rays and they are sent to this dedicated unit, which collects many of these rays and looks for coherency. Effectively, we bundle rays that are travelling in similar directions. Those ray bundles are then processed. We access a Bounding Volume Hierarchy (BVH), where we first check if the bundle of rays intersects with the 3D world (which is basically a very large box). If the rays do hit it, we move down this hierarchy of boxes: the big world box is split into smaller boxes, and we check down the hierarchy.
If our bundle intersects with them, we drill deeper into that box hierarchy until eventually the ray bundle misses a box. Once we know our rays don't hit anything in a box, we save a lot of time and effort by quickly culling large parts of the scene. As we get to ever-smaller boxes, at some point we switch to the actual triangle geometry, moving from box-ray testing to triangle-ray testing. This is, of course, where we find the actual intersections/hits, which we then return to the shader code to process.
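The hierarchy walk described above can be sketched as a recursive descent: test the ray against a box with the standard slab test, cull the whole subtree on a miss, and switch to triangle candidates at a leaf. Structure and names here are illustrative (a single ray, dictionary nodes), not Imagination's hardware design:

```python
# Hedged sketch of the BVH walk described above: test a ray against a box;
# on a hit, descend into the children; at a leaf, hand back triangle
# candidates. (The final triangle-ray intersection test is omitted.)

def ray_hits_box(origin, inv_dir, lo, hi):
    """Slab test: the ray hits the AABB [lo, hi] iff the entry/exit
    intervals along all three axes overlap."""
    tmin, tmax = 0.0, float("inf")
    for o, inv, l, h in zip(origin, inv_dir, lo, hi):
        t1, t2 = (l - o) * inv, (h - o) * inv
        tmin, tmax = max(tmin, min(t1, t2)), min(tmax, max(t1, t2))
    return tmin <= tmax

def traverse(node, origin, inv_dir, hits):
    # Miss the box -> cull this whole subtree cheaply.
    if not ray_hits_box(origin, inv_dir, node["lo"], node["hi"]):
        return
    if "tris" in node:             # leaf: switch to triangle-ray testing
        hits.extend(node["tris"])
    else:
        for child in node["children"]:
            traverse(child, origin, inv_dir, hits)

# World box split into two child boxes, one holding a triangle.
world = {"lo": (0, 0, 0), "hi": (2, 2, 2), "children": [
    {"lo": (0, 0, 0), "hi": (1, 2, 2), "tris": ["tri0"]},
    {"lo": (1, 0, 0), "hi": (2, 2, 2), "tris": []},
]}
hits = []
# Ray from z = -1 travelling along +z (inverse direction ~ (inf, inf, 1),
# approximated with a large finite value).
traverse(world, (0.5, 0.5, -1.0), (1e9, 1e9, 1.0), hits)
print(hits)  # -> ['tri0']: the right-hand child box is culled without descent
```

The early return on a box miss is where the culling payoff comes from: one cheap box test discards every triangle underneath it.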
Integrating into game engines is easy enough as basically emitting rays is part of standard shader code which game engines such as UE4 already support. We also need to capture the scene information and create the BVH. This is, again, part of the API and is a bit like geometry processing. The game engine submits this geometry via the API to the GPU and driver for processing.
Our main unique offering is the coherency, i.e. the bundling of rays for processing, which is just like our Tile Based Rendering, where we also group pixels together and process them on chip. So, effectively, our Tile Based Rendering and our Level 4 RT with Coherency Sorting are very similar in concept, but they make a big difference in bandwidth and processing efficiency, which means higher performance and fancier effects in the game engines.
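As a toy model of the coherency idea (assumed and heavily simplified, not the actual sorting hardware), rays can be bucketed by a quantised version of their direction, e.g. the sign octant, so each bundle traverses the BVH with similar memory access patterns:

```python
# Toy illustration of coherency sorting: group rays into bundles by the
# octant of their direction so each bundle walks the BVH coherently.
# Real hardware uses far finer-grained criteria; this is a sketch.

from collections import defaultdict

def octant(direction):
    """Quantise a direction to one of 8 sign octants (a 3-bit key)."""
    x, y, z = direction
    return (x >= 0) << 2 | (y >= 0) << 1 | (z >= 0)

def bundle_rays(rays):
    bundles = defaultdict(list)
    for ray in rays:
        bundles[octant(ray["dir"])].append(ray)
    return bundles

rays = [
    {"dir": (0.9, 0.1, 0.2)},    # roughly +x
    {"dir": (0.8, 0.2, 0.1)},    # roughly +x -> lands in the same bundle
    {"dir": (-0.7, 0.1, -0.5)},  # different octant -> its own bundle
]
bundles = bundle_rays(rays)
print({k: len(v) for k, v in bundles.items()})  # -> {7: 2, 2: 1}
```

Each bundle is then processed together, which is the ray tracing analogue of processing a tile of pixels on chip.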
Accessibility for Mobile Platforms
Of course, there are always differences in approaching ray tracing, but the key here is that faking the effects has become so expensive that using dedicated hardware is now more power-, bandwidth-, and processing-efficient. This is no different for mobile than for a PC card. Doing brute-force shader-based graphics is simply no longer the right answer, and hence our Level 4 efficiency exceeds the implementations and efficiency of the PC, just as our Tile Based Deferred Rendering (TBDR) did more than 20 years ago; today it is the de facto standard for rendering (e.g. IMR is no longer viable).
With the added efficiency of coherency gathering, we can reduce the time needed to calculate the reflections and the ray interactions. This reduces the number of calls and in turn, means that we can run it on much lower-powered hardware.
In mobile, we always look to do more for less, as power and bandwidth budgets always remain tight. We are excited about using artificial intelligence and neural network processing to help with graphics. Fortunately, we develop our own accelerators for this and look forward to working with developers to get the best out of these units for improving graphics and ray tracing on their specific devices.
Of course, we continue to innovate but we never like to reveal our cards.
Compatibility with Hardware
We are always focused on industry standards, hence our hardware takes into account what the DirectX and Vulkan standards need and expect. This means there is little that hardware developers need to do for us that they do not already do for other vendors. The main difference is that we deliver it at lower power and bandwidth costs and leave more shader cycles for other work.
Room for Improvement
Process node evolution is reaching a problematic space. We get more density and transistors at 3nm and 5nm, for example, but bandwidth and power are not improving at the same rate, so we will have to come up with solutions that run workloads on the most efficient processing logic, e.g. deciding what runs on the CPU and GPU and what can be passed to an AI engine. Hence, we will see more accelerators like our ray tracing engines offering better processing efficiency, and this means managing a more heterogeneous set of processing resources. Bandwidth and data flow are also expensive, and, as with our coherency sorting and Tile Based Rendering, we are always looking for ways to keep critical data on chip and avoid using external, power-hungry DDR memory or, worse, even more power-hungry HBM.
Basically, we always look to do more with less: more FPS and more quality with less processing, less bandwidth, and less power, which means maximal efficiency in everything we do.