Figure 1. Comparison of signal quality using a pathtracer at eight bounces and one sample per pixel. LHS: raw pathtracer output. Centre: pathtracer output terminating early in the NRC. RHS: Accumulated result of the pathtracer without relying on the NRC.
The Neural Radiance Cache (NRC) is an AI technique aimed at improving signal quality and potentially performance in the context of pathtracing. The NRC operates in world space and predicts radiance at any point in the virtual world using a neural network that is trained live on path-traced data.
The NRC is designed to support dynamic scenes and to be independent of scene data such as materials, BRDFs, geometry, etc. As a result, NRC does not depend on any precomputations or manual parameter adjustments. The key to this property is continuous adaptation of the underlying neural network – it is always updated by tracing a certain amount of training paths at full length which optimizes the network to fit the current state of the scene. Moreover, NRC learns radiance, and as a result works well for glossy reflections, as opposed to irradiance caching techniques which cater to diffuse lighting. The NRC library supports D3D12 and Vulkan rendering APIs while relying on Tensor Cores to train the neural network each frame.
The NRC library is currently in experimental mode and is being actively developed.
The process of tracing paths to their full length (until they reach a light source or exit the scene) is expensive and introduces significant noise. To mitigate these drawbacks, the paths are terminated early (shortened) by querying the radiance cache for the amount of light that should be injected into the ends of these short paths (illustrated in Figure 2).
To avoid further limitations such as a precomputation step, only supporting diffuse materials, or introducing light leakage due to misalignment of geometry, the NRC algorithm is agnostic to material definition and light setup. It relies on a neural network that is trained while rendering; the network takes a position and a direction as inputs and returns the predicted radiance leaving that position in the specified direction.
Figure 2. Enhancing signal quality with radiance caching. Top LHS: paths are typically traced until they reach a light source or exit the scene - expensive and noisy. Bottom LHS: shorter paths are obtained by terminating into the NRC. RHS: Query and training points during pathtracing.
The workflow is to first run a path tracer at lower-than-target resolution to write training radiance for the neural network. We refer to this pass as the update pass. It is followed by a second, full-resolution query pathtracer pass, which creates the query points at which predicted radiance will be read. Next, the neural network predicts radiance, which is read at the queried points during a resolve pass. To generate the training data, the NRC library internally propagates the predicted radiance backwards along each training path so that every vertex of that path receives an estimate of reflected light. These estimates are used to train and optimize the network so that it makes accurate predictions.
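The backward propagation step can be pictured with a small CPU-side sketch. It is only an illustration under stated assumptions: the `TrainingVertex` layout, the scalar (single-channel) radiance, and the `PropagateTargets` name are hypothetical and are not part of the NRC library, which performs this step internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-vertex record; the NRC library's real layout is not shown here.
struct TrainingVertex {
    float directRadiance;    // light gathered directly at this vertex (emission + direct lighting)
    float throughputToNext;  // weight (BRDF * cos / pdf) of the segment leaving this vertex
};

// Walks the path from its tail to its head. `tailRadiance` is the cached
// estimate that stands in for the untraced remainder of the path
// (0 for a path that was traced to full length).
// Returns one training target per vertex: the radiance reflected at that vertex.
std::vector<float> PropagateTargets(const std::vector<TrainingVertex>& path, float tailRadiance)
{
    assert(!path.empty());
    std::vector<float> targets(path.size());
    float incoming = tailRadiance;
    for (std::size_t i = path.size(); i-- > 0;) {
        targets[i] = path[i].directRadiance + path[i].throughputToNext * incoming;
        incoming = targets[i];  // becomes the incoming radiance of the previous vertex
    }
    return targets;
}
```

Every vertex ends up with its own target, so a single training path yields several training samples for the network.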
Instead of tracing a ray or a path to get incident radiance at a given point, we query the radiance cache to obtain an accurate estimate of the radiance, effectively terminating paths early in the cache. This improves the quality and potentially the performance.
Figure 3. Termination heuristic employed when querying the cache - the paths are shortened when the path spread(r) is larger than a custom threshold (t).
The decision to terminate the path early into the cache is informed by a path spread (ray cone) heuristic. The cone approximates the ray and spreads along the path as it intersects surfaces with various material properties. An interaction with a diffuse surface significantly increases the cone's radius, whereas smooth/specular surfaces contribute far less to the cone's spread. When the radius of the cone reaches a user-specified threshold, the path is safe to terminate early in the cache. This allows the application to control the trade-off between bias and noise.
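A minimal sketch of such a heuristic is given here, purely to make the idea concrete. The growth formula (footprint widening by sqrt(d² / (pdf · |cos θ|)) per bounce) is an assumption chosen for illustration and is not taken from the NRC library; `PathSpread`, `AddBounce` and `ShouldTerminate` are hypothetical names. The `radius` member plays the role of r and `threshold` the role of t in Figure 3.

```cpp
#include <cassert>
#include <cmath>

// Illustrative spread model (an assumption, not the NRC library's formula):
// each bounce widens the footprint by sqrt(d^2 / (pdf * |cos|)).
// A diffuse lobe has a low sampling pdf and widens the cone quickly;
// a sharp specular lobe has a high pdf and barely widens it.
struct PathSpread {
    float radius = 0.0f;

    void AddBounce(float distance, float pdf, float cosTheta)
    {
        assert(pdf > 0.0f && std::fabs(cosTheta) > 0.0f);
        radius += std::sqrt(distance * distance / (pdf * std::fabs(cosTheta)));
    }

    // True once the accumulated spread exceeds the user-specified threshold.
    bool ShouldTerminate(float threshold) const { return radius > threshold; }
};
```

Raising the threshold keeps paths alive for longer (less bias, more noise); lowering it terminates into the cache sooner.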
📗 Background materials
The NRC library is a binary distribution. The content resides in sdk-libraries/nrc and comprises binaries per API, shader includes, and headers.
Directory | Files |
---|---|
/bin | NRC_D3D12.dll, NRC_Vulkan.dll, CUDA dlls |
/include | Nrc.hlsli, NrcD3d12.h, NrcVk.h, misc helpers |
/lib | NRC_D3D12.lib, NRC_Vulkan.lib |
Assuming the application already has a pathtracer and associated requirements such as an acceleration structure (AS) in place, the most relevant parts for implementing NRC can be conceptually divided into two categories - the CPU-side API and the shader-side API.
CPU-side API. The NrcD3d12.h header and its Vulkan counterpart, NrcVk.h, expose the following functionality:
- Initialization and (re)configuration of the NRC context
- Options regarding resource memory management (app-managed or internally managed by the NRC library)
- Invocations to the underlying neural network to:
  - Create query data - points where we want the network to infer radiance values.
  - Train the network on data collected in the pathtracer. This optimizes the network and allows it to predict radiance.
- Invocation of a Resolve pass. This is a compute pass internal to the NRC library that resolves radiance at the queried points since this cannot be done in-line in the pathtracer. Additionally, this method is used for debug visualization.
Shader-side API. The application's pathtracer relies on Nrc.hlsli, NrcHelpers.hlsli, as well as NrcStructures.h to declare NRC's resources and to write training and query data to them.
Figure 4. An overview of the integration steps. Initialization is carried out before the render loop, the (re)configuration step is only invoked when context settings change, whilst the remaining tasks occur per-frame. The update and query passes rely on the application's provided pathtracer. If the signal is split per BRDF, the application can define a custom resolve pass instead of relying on the NRC library in-built resolve.
At this stage, the application will include the headers inside the provided /include, as well as link against prebuilt binaries available in the same package (see /bin and /lib). The Resolve pass and kernels used internally by the NRC library come packaged as part of the binary and as such it is not necessary to provide or explicitly load external shader files.
💡 Optional integration class.
The application can invoke the public NRC functions in-place, where they are required, or can opt for the creation of an integration class to encapsulate the functionality (as seen in samples/pathtracer/NrcIntegration.h). The latter approach is used in the pathtracer example for ease of readability and demonstration of the library's functionality for D3D12 and Vulkan; it is not enforced as a best-practice.
The NRC library provides a utility function for DLL signature verification: NRC_DECLSPEC Status VerifySignature(const wchar_t* fullPathToFile). For example usage on Windows, see the Initialization methods for D3D12 and Vulkan in NrcIntegration.cpp.
For the Vulkan API, an additional step is required compared to the D3D12 counterpart, namely extension and feature specification at device-creation time. To achieve this, invoke the provided helper functions:
uint32_t GetVulkanDeviceExtensions(char const* const*& outStringArray);
uint32_t GetVulkanInstanceExtensions(char const* const*& outStringArray);
uint32_t GetVulkanDeviceFeatures(char const* const*& outStringArray);
The following features have to be available (already part of core Vulkan 1.2) and enabled:
VK_EXT_SCALAR_BLOCK_LAYOUT_EXTENSION_NAME
VK_KHR_UNIFORM_BUFFER_STANDARD_LAYOUT_EXTENSION_NAME
VK_EXT_scalar_block_layout
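One way to consume the helper output is to merge it into the extension list the application already enables, skipping duplicates. The sketch below assumes exactly that usage. `StubGetVulkanDeviceExtensions` is a stand-in with the same signature as the helper above so that the example compiles on its own; the names it returns are placeholders rather than the library's actual list, and `AppendUnique` is a hypothetical application-side utility.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in with the same shape as the NRC helper, so the sketch compiles on its own.
// The two names it returns are placeholders, not the library's actual list.
inline uint32_t StubGetVulkanDeviceExtensions(char const* const*& outStringArray)
{
    static char const* const kNames[] = {
        "VK_EXT_scalar_block_layout",
        "VK_KHR_uniform_buffer_standard_layout",
    };
    outStringArray = kNames;
    return 2;
}

// Appends every name in `extra` that `enabled` does not already contain.
inline void AppendUnique(std::vector<char const*>& enabled, char const* const* extra, uint32_t count)
{
    assert(extra != nullptr || count == 0);
    for (uint32_t i = 0; i < count; ++i) {
        bool present = false;
        for (char const* name : enabled)
            present = present || std::strcmp(name, extra[i]) == 0;
        if (!present)
            enabled.push_back(extra[i]);
    }
}
```

The merged vector can then be passed through ppEnabledExtensionNames and enabledExtensionCount of VkDeviceCreateInfo. The same pattern applies to the instance-extension helper.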
❗ Resource memory layout specification
NRC relies on -fvk-use-dx-layout. If a custom resolve pass is required, bear this in mind. Read more about alignment in the DirectX Shader Compiler docs.
Initialization is typically carried out in the application's Init() step. First, set up NRC's GlobalSettings. Here the application can hook a logger callback for intercepting NRC library messages and specify which memory management mode should be used (consult the following section for further details).
Initialize the library by invoking:
nrc::Status status = nrc::d3d12::Initialize(globalSettings);
Next, create an NRC context:
status = nrc::d3d12::Context::Create(nativeDevice5, m_nrcContext);
❗ At this point, the neural network has not yet been created. This is achieved in the configuration step.
Selection of memory management. The NRC library caters for two approaches to managing its buffers, selected via the globalSettings.enableGPUMemoryAllocation switch. When the SDK manages the buffers internally, this flag must be enabled at initialization time. The application then calls the helper function GetBuffers() and only creates views (handles) to these buffers. The views are required during path tracing and whenever a custom resolve pass is necessary (more on this in the Resolve section).
If buffers are managed on the application side, the flag is disabled and the application creates the buffers during the configuration step, using GetBuffersAllocationInfo(const ContextSettings& contextSettings, BuffersAllocationInfo& outBuffersAllocationInfo) to obtain properties such as element size and stride, which types of views are allowed, etc.
💡 NrcIntegration.cpp illustrates both approaches depending on the state of the flag. This is available for D3D12 and Vulkan implementations.
(Re)configuration is expected to occur infrequently, only when constituent parts of the ContextSettings have changed, e.g. on level load or when the screen resolution changes. It reloads the neural network configuration and may require buffers to be reallocated.
Considerations when using Configure:
- Configure(const ContextSettings& contextSettings, const Buffers* buffers = nullptr) must be called at least once after NRC is first initialized, because Configure performs the memory allocations.
- If enableGPUMemoryAllocation is switched off and app-side memory management is preferred, this is the point at which the app-managed buffers are passed to Configure.
- If Configure is part of the render loop, it can cause a hitch whenever a context setting has changed and a tear-down is required.
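To keep the potential hitch out of steady-state frames, an application could remember the settings it last configured with and call Configure only when they differ. The following is a sketch of that idea, not library code: `SettingsSnapshot` is a hypothetical mirror of just the two ContextSettings fields named in this guide (the real structure has more members), and `ConfigureGate` is an illustrative helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two ContextSettings fields named in this guide.
struct Dim2 { uint32_t x, y; };
struct SettingsSnapshot {
    Dim2 frameDimensions;
    Dim2 trainingDimensions;
};

inline bool operator==(const Dim2& a, const Dim2& b) { return a.x == b.x && a.y == b.y; }
inline bool operator==(const SettingsSnapshot& a, const SettingsSnapshot& b)
{
    return a.frameDimensions == b.frameDimensions && a.trainingDimensions == b.trainingDimensions;
}

// Remembers the last settings handed to Configure and reports when a new call is due.
class ConfigureGate {
public:
    // Returns true when the caller should invoke Configure with `wanted`.
    bool NeedsConfigure(const SettingsSnapshot& wanted)
    {
        if (m_valid && m_last == wanted)
            return false;  // nothing changed, skip the costly reconfiguration
        m_last = wanted;
        m_valid = true;
        return true;       // first call, or a setting changed
    }

private:
    SettingsSnapshot m_last{};
    bool m_valid = false;
};
```

The first call always returns true, which matches the requirement that Configure runs at least once after initialization.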
The NRC library differentiates between settings that seldom change (ContextSettings) and settings that change per-frame (FrameSettings).
At the start of the frame, invoke:
// This function populates internal data and clears the `Counter` buffer.
Status BeginFrame(ID3D12GraphicsCommandList4* cmdList, const FrameSettings& frameSettings);
Next, populate the NrcConstants structure by calling:
Status PopulateShaderConstants(NrcConstants& outConstants) const;
The NrcConstants structure is intended to be passed to the pathtracer inside a constant buffer. It can be included as part of an existing constant buffer for the pathtracer. Its content is derived from FrameSettings and ContextSettings.
The NRC requires two pathtracer passes - one for updating (writing path data for training the NN), and one for querying (creating query points where the NN will predict radiance).
- The update pass runs at lower-than-target resolution. It is recommended to obtain this resolution from the auxiliary function nrc_uint2 computeIdealTrainingDimensions(nrc_uint2 const& frameDimensions, float avgTrainingVerticesPerPath = 0.f) and to set it on the ContextSettings early on. The dispatch size of this pass has to match contextSettings.trainingDimensions.
- The query pass dispatch size is equal to contextSettings.frameDimensions.
💡 The update pass is independent of the query pass - this offers a potential performance benefit.
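Because the update pass is dispatched at the smaller training resolution, each of its threads has to be mapped back to a pixel of the full frame. The CPU-side sketch below illustrates one such mapping with a per-frame jitter. It is written under assumptions: the `Float2` and `UInt2` types and both function names are hypothetical, and the downscale factor is assumed to be the plain ratio of frame to training dimensions.

```cpp
#include <cassert>
#include <cstdint>

struct Float2 { float x, y; };
struct UInt2 { uint32_t x, y; };

// Ratio between the full frame and the (smaller) training dispatch.
// Assumption: this is the kind of value a training downscale factor holds.
inline Float2 DownscaleFactor(UInt2 frame, UInt2 training)
{
    assert(training.x > 0 && training.y > 0);
    return { float(frame.x) / float(training.x), float(frame.y) / float(training.y) };
}

// Maps a training-pass thread to a full-resolution pixel. `jitter` is a random
// offset in [0, 1) per axis so that successive frames visit different pixels.
inline UInt2 TrainingToFramePixel(UInt2 trainingIndex, Float2 jitter, Float2 factor, UInt2 frame)
{
    uint32_t px = uint32_t((float(trainingIndex.x) + jitter.x) * factor.x);
    uint32_t py = uint32_t((float(trainingIndex.y) + jitter.y) * factor.y);
    // Guard against rounding pushing the last thread past the frame edge.
    if (px >= frame.x) px = frame.x - 1;
    if (py >= frame.y) py = frame.y - 1;
    return { px, py };
}
```

With a 1920 x 1080 frame and a 274 x 154 training dispatch, each training thread covers a block of roughly 7 x 7 frame pixels, and the jitter selects a different pixel within that block over time.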
There are several approaches to a pathtracer's modus operandi. The pathtracer project in RTXGI SDK illustrates a trivial path tracer that does not start from the G-Buffer.
Below is another variant which shows how NRC is integrated if the primary ray is reconstructed from the G-Buffer.
// Prepare NRC buffers: queryPathInfo, trainingPathInfo, trainingPathVertices, queryRadianceParams, countersData, debugTrainingPathInfo.
void RayGenFunc()
{
    // Prepare launchIndex, launchDimensions
    if (NrcIsUpdateMode())
        launchIndex = (float2(DispatchRaysIndex().xy) + Rng::GetFloat2()) * nrcTrainingDownscaleFactor;
    else
        launchIndex = DispatchRaysIndex().xy;
    // Load data from G-Buffer...
    // Flag G-Buffer miss to write it to NRC later
    // Only 1 SPP during NRC Update pass
    const uint samplesPerPixel = NrcIsUpdateMode() ? 1 : nrcConstants.samplesPerPixel;
    // Prepare NRC context
    NrcBuffers nrcBuffers = {queryPathInfo, trainingPathInfo, ...};
    NrcContext nrcContext = NrcCreateContext(nrcConstants, nrcBuffers, DispatchRaysIndex().xy);
    for (int sampleIndex = 0; sampleIndex < samplesPerPixel; sampleIndex++)
    {
        // Initialize NRC data for path and sample index traced in this thread
        NrcSetSampleIndex(nrcContext, sampleIndex);
        NrcPathState nrcPathState = NrcCreatePathState(rand(rngState));
        if (NrcIsUpdateMode()) {/*Add random offset to pixel's coords...*/}
        if (flagGbufferMiss)
        {
            NrcUpdateOnMiss(nrcPathState);
            break;
        }
        else
        {
            NrcSurfaceAttributes surfaceAttributes = gBufferData;
            NrcProgressState nrcProgressState = NrcUpdateOnHit(...); // Update NRC state on hit
            if (nrcProgressState == NrcProgressState::TerminateImmediately)
                break;
            NrcSetBrdfPdf(nrcPathState, brdfPdf);
        }
        // Prepare Payload and other data...
        for (int bounce = 1; bounce < gData.maxPathVertices; bounce++)
        {
            TraceRay(...);
            // Handle miss: record it in the NRC path state and end the path
            if (!payload.hasHit())
            {
                NrcUpdateOnMiss(nrcPathState);
                break;
            }
            // Decode material properties...
            NrcSurfaceAttributes surfaceAttributes = decodedMaterial; // Passed to NrcUpdateOnHit
            NrcProgressState nrcProgressState = NrcUpdateOnHit(...); // Update NRC state on hit
            if (nrcProgressState == NrcProgressState::TerminateImmediately) break;
            // Account for emissives and evaluate NEE with RIS...
            // Terminate loop early on last bounce (don't sample BRDF)
            if (bounce == gData.maxPathVertices - 1)
            {
                NrcSetDebugPathTerminationReason(...);
                break;
            }
            // Terminate loop after emissives and direct light if CreateQuery requests delayed termination.
            // If direct lighting isn't cached (radianceCacheDirect is false)
            // add direct lighting on hit where we query NRC before terminating the loop.
            if (nrcProgressState == NrcProgressState::TerminateAfterDirectLighting) break;
            // Sample BRDF to generate the next ray and run MIS...
            if (!evaluateCombinedBRDF(...))
            {
                // The sample was absorbed: record the reason and end the path
                NrcSetDebugPathTerminationReason(nrcPathState, BRDFAbsorption);
                break;
            }
            NrcSetBrdfPdf(nrcPathState, brdfPdf);
        } // End of path
        NrcWriteFinalPathInfo(nrcContext, nrcPathState, throughput, radiance);
    } // End of SPP loop
}
💡 If the application's pathtracer carries out the work in the Closest-Hit Shader (CHS), then the NrcPathState has to be packed inside the payload and communicated between stages.
Once the pathtracer passes complete, the neural network predicts radiance at the query points. Internally, the NRC library propagates the radiance based on the predicted values and the path data. This is immediately followed by the training of the network, which relies on the training data from the propagation. This optimizes the network to produce accurate radiance predictions.
All the aforementioned steps are achieved in a single exposed API call:
Status QueryAndTrain(ID3D12GraphicsCommandList4* cmdList, float* trainingLossPtr)
The final radiance is not obtained in-line in the pathtracer. As such, a separate pass is required to compute the final result. The NRC library exposes an API call to an in-built resolve pass which assumes the signal is combined. This pass takes the predicted radiance from the query records, modulates by the throughput of the path, and adds the result to the final image.
Status Resolve(ID3D12GraphicsCommandList4* cmdList, ID3D12Resource* outputBuffer);
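What the combined-signal resolve computes can be written down as a small CPU reference. This is a sketch under assumptions rather than the library's kernel: `QueryRecord`, `Rgb` and `ResolveCombined` are hypothetical names, the data is unpacked floating point, and the 0xFFFFFFFF "no query" sentinel mirrors the check used in the custom resolve sample in this guide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { float r, g, b; };

constexpr uint32_t kNoQuery = 0xFFFFFFFFu;  // marks a path that did not query the cache

// Hypothetical CPU-side mirror of one per-pixel query record.
struct QueryRecord {
    uint32_t queryIndex;   // index into the predicted radiance array, or kNoQuery
    Rgb prefixThroughput;  // throughput accumulated before the path was terminated
};

// Adds throughput-weighted predicted radiance to every pixel that issued a query.
inline void ResolveCombined(const std::vector<QueryRecord>& records,
                            const std::vector<Rgb>& predicted,
                            std::vector<Rgb>& image)
{
    assert(records.size() == image.size());
    for (std::size_t i = 0; i < records.size(); ++i) {
        const QueryRecord& rec = records[i];
        if (rec.queryIndex == kNoQuery)
            continue;  // path finished without terminating into the cache
        const Rgb& l = predicted[rec.queryIndex];
        image[i].r += l.r * rec.prefixThroughput.r;
        image[i].g += l.g * rec.prefixThroughput.g;
        image[i].b += l.b * rec.prefixThroughput.b;
    }
}
```

Pixels whose paths did not query the cache keep whatever the pathtracer already wrote for them.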
If the application relies on a split signal, then the library call can be skipped in favor of defining a custom compute pass, for example:
void CustomResolve(int3 DispatchThreadID : SV_DispatchThreadID)
{
    const uint2 launchIndex = DispatchThreadID.xy;
    if (any(launchIndex >= screenResolution))
        return;
    const uint sampleIndex = 0;
    const uint samplesPerPixel = 1;
    const uint pathIndex = NrcGetPathInfoIndex(screenResolution, launchIndex, sampleIndex, samplesPerPixel);
    const NrcQueryPathInfo path = NrcUnpackQueryPathInfo(nrcQueryPathInfo[pathIndex]);
    // 0xFFFFFFFF marks a path that did not create a query
    if (path.queryBufferIndex < 0xFFFFFFFF)
    {
        float3 radiance = NrcUnpackRadiance(nrcQueryRadiance[path.queryBufferIndex], radianceUnpackMultiplier) * path.prefixThroughput;
        uint uBrdfType = brdfTypeTarget[launchIndex];
        if (uBrdfType == BRDF_SPECULAR)
            specularPathTracingTarget[launchIndex] += float4(radiance, 0.0f);
        if (uBrdfType == BRDF_DIFFUSE)
            diffusePathTracingTarget[launchIndex] += float4(radiance, 0.0f);
    }
}
In the above scenario, the pathtracer uses a probabilistic selection of BRDF type per path and writes diffuse and specular results to two separate buffers (specularPathTracingTarget and diffusePathTracingTarget). For the custom resolve pass to work in this case, the pathtracer records the BRDF selection in brdfTypeTarget. This intermediate buffer determines which output buffer the unpacked radiance is written to.
💡 The in-built resolve pass can be used for visually debugging the NRC. See the Debugging section for details.
EndFrame(ID3D12CommandQueue* cmdQueue) is invoked at the end of the frame, once the command list has been submitted. The command queue must be the same one that was used to execute all the previous command lists.
Any created contexts must be destroyed:
Status Destroy(Context& context)
And finally, the NRC library itself must be shut down:
void Shutdown()
The NRC library provides several ways to visually inspect the quality of the cache data and narrow down potential integration errors. This can be achieved at the pathtracer level and at the resolve pass level.
Similar to the path tracer debug views, the in-built Resolve pass can be used for debugging by switching its ResolveMode. This caters for several types of debug cache data visualizations including a result of the query, a training bounce heatmap, training radiance (raw and smoothed) output, and a direct visualization of the cache.
When inspecting the direct visualization of the cache, indirect signal as well as defined shadows should be present. Boiling-like artifacts are expected as the NRC is not intended to be used in this way, only previewed for troubleshooting.
Figure 5. Direct visualization of the cache - querying at vertex index 0.
💡 This debug view should match the pathtracer's intensity closely. If it does not match the intensity, or looks flat and lacks details (such as shadows), it could indicate that the radiance is not correctly recorded in the pathtracer passes, or that the update pass does not trace a small fraction of its paths to full length.
The training bounce heatmap is similar to the pathtracer bounce visualization. It helps verify that, during the update pass, paths are traced to four bounces on average and that a fraction (~1/16 of the paths) is traced at full length. The visualization of the training radiance at the primary path vertex should, when accumulated/smoothed, closely match the ground-truth path tracer output (with the NRC disabled).
Figure 6. Resolve pass debug modes. LHS: query results. Centre: Training bounces heatmap (Red encodes two bounces, yellow-three, green-four, white-eight). RHS: Smoothed training radiance for the primary ray.
During the integration it is important to ensure that the NRC buffers are correctly written to by the application's pathtracer. This can be achieved with Nsight Graphics frame capture in conjunction with custom structure definitions for ease of readability.
Figure 7. Visualizing NRC buffers in Nsight Graphics Frame Capture. This illustrates the path tracing update pass with a focus on the buffer holding NrcRadianceParameters.
The Structured Memory Configuration feature in Nsight comes in handy for inspecting the contents of NRC buffers regardless of packing. It's noteworthy that the neural network calls will not be available in Nsight Graphics Frame Debugging at the present time.
Resource | Element Size (bytes) | Total Count | Total Allocation (bytes) |
---|---|---|---|
Counter | 4 | 8 | 8 |
QueryPathInfo | 8 | 2073600 | 16588800 |
TrainingPathInfo | 8 | 42196 | 337568 |
TrainingPathVertices | 48 | 337568 | 16203264 |
TrainingRadiance | 12 | 337568 | 4050816 |
TrainingRadianceParams | 56 | 337568 | 18903808 |
QueryRadiance | 12 | 2117844 | 25414128 |
QueryRadianceParams | 56 | 2117844 | 118599264 |
DebugTrainingRadiance | 24 | 42196 | 1012704 |
Total (MB) | | | 191.793 |
Table 1. Expected values at frame dimensions 1920 x 1080, with a training resolution of 274 x 154, maximum path length of eight bounces, one SPP.
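As a cross-check when comparing a frame capture against Table 1, the budget can be recomputed from the listed values. The sketch below only sums the "Total Allocation" column and restates a few relationships that the table's counts are consistent with; the `BufferRow` type and function names are hypothetical. The total matches the table when 1 MB is taken as 1024 x 1024 bytes.

```cpp
#include <cassert>
#include <cstdint>

struct BufferRow {
    const char* name;
    uint64_t allocationBytes;  // "Total Allocation" column of Table 1
};

// Values copied from Table 1 (1920 x 1080 frame, 274 x 154 training resolution).
static const BufferRow kTable1[] = {
    { "Counter", 8 },
    { "QueryPathInfo", 16588800 },
    { "TrainingPathInfo", 337568 },
    { "TrainingPathVertices", 16203264 },
    { "TrainingRadiance", 4050816 },
    { "TrainingRadianceParams", 18903808 },
    { "QueryRadiance", 25414128 },
    { "QueryRadianceParams", 118599264 },
    { "DebugTrainingRadiance", 1012704 },
};

inline uint64_t TotalBytes()
{
    uint64_t sum = 0;
    for (const BufferRow& row : kTable1)
        sum += row.allocationBytes;
    return sum;
}

inline double ToMiB(uint64_t bytes) { return double(bytes) / (1024.0 * 1024.0); }

// Counts in the table that follow directly from the two resolutions.
inline void CheckCounts()
{
    assert(1920ull * 1080ull == 2073600ull);  // QueryPathInfo: one entry per frame pixel
    assert(274ull * 154ull == 42196ull);      // TrainingPathInfo: one entry per training pixel
    assert(42196ull * 8ull == 337568ull);     // TrainingPathVertices: eight vertices per training path
}
```

Summing the column gives 201110360 bytes, which is about 191.79 when divided by 1024 x 1024.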