Class NeRF<T>

Namespace
AiDotNet.NeuralRadianceFields.Models
Assembly
AiDotNet.dll

Implements Neural Radiance Fields (NeRF) for novel view synthesis.

public class NeRF<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IRadianceField<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

Inheritance
object → NeuralNetworkBase<T> → NeRF<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Remarks

For Beginners: NeRF is a groundbreaking method for creating photorealistic 3D scenes from 2D images.

What NeRF does:

  • Input: Collection of photos of a scene from different angles
  • Training: Learn a neural network that represents the 3D scene
  • Output: Ability to render the scene from any new viewpoint

Key innovation:

  • Represents the entire 3D scene as a continuous 5D function
  • Input: (x, y, z, θ, φ) - position and viewing direction
  • Output: (r, g, b, σ) - color and volume density
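For example, a single query of this 5D function maps one (position, direction) pair to a color and a density via QueryField (documented below). This is a minimal sketch; the Tensor<float> shape-array constructor and two-index indexer are assumptions about AiDotNet's tensor API:

// Sketch: query the field at one point. The Tensor<float> shape-array
// constructor and indexer used here are assumed, not confirmed API.
var nerf = new NeRF<float>();

var positions = new Tensor<float>(new[] { 1, 3 });   // shape [N=1, 3] (assumed ctor)
positions[0, 0] = 0.5f; positions[0, 1] = 0.0f; positions[0, 2] = 1.2f;

var directions = new Tensor<float>(new[] { 1, 3 });  // 3D unit viewing direction
directions[0, 0] = 0f; directions[0, 1] = 0f; directions[0, 2] = -1f;

var (rgb, density) = nerf.QueryField(positions, directions);  // (r,g,b) and σ per point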

Architecture:

  1. Positional encoding: Transform (x, y, z) to a higher-dimensional space

    • Why: Helps the network learn high-frequency details
    • Example: (x, y, z) → [sin(x), cos(x), sin(2x), cos(2x), ..., sin(2^(L-1)*x), cos(2^(L-1)*x)]
    • A similar encoding is applied to the direction (θ, φ)
  2. Coarse network (8 layers, 256 units):

    • Input: Encoded position
    • Output: Density + intermediate features
    • Input: Intermediate features + encoded direction
    • Output: RGB color
  3. Fine network (same structure):

    • Resamples based on coarse network predictions
    • Focuses samples where density is high
    • Produces final high-quality output

Why positional encoding matters:

  • Neural networks naturally learn low-frequency functions (smooth, blurry)
  • Real scenes have high-frequency details (sharp edges, textures)
  • Positional encoding enables learning those high-frequency details
  • Without it: Blurry reconstructions
  • With it: Sharp, detailed reconstructions
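The encoding itself is simple to write down. A minimal sketch in plain C# (for clarity only, not the library's internal implementation): each coordinate expands into sin/cos pairs at L doubling frequencies, so a 3D position with L = 10 becomes the 60-dimensional vector described above.

// Positional encoding sketch: maps a 3D point to 3 * 2 * L values.
// Mirrors the formula in the Architecture section; not AiDotNet's internals.
static float[] PositionalEncoding(float[] xyz, int levels)
{
    var encoded = new float[xyz.Length * 2 * levels];
    int k = 0;
    foreach (float coord in xyz)
    {
        for (int l = 0; l < levels; l++)
        {
            float freq = (float)Math.Pow(2, l);      // frequencies 2^0 .. 2^(L-1)
            encoded[k++] = (float)Math.Sin(freq * coord);
            encoded[k++] = (float)Math.Cos(freq * coord);
        }
    }
    return encoded;
}

// PositionalEncoding(new[] { x, y, z }, 10) has length 3 * 2 * 10 = 60.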

Training process:

  1. Sample random rays from training images
  2. Sample points along each ray
  3. Query the network at each sample point
  4. Render each ray using volume rendering
  5. Compare the rendered color to the actual pixel color
  6. Backpropagate the error and update network weights
  7. Repeat for thousands of iterations
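A conceptual training loop built on the documented Train method might look like the sketch below. The batch layout (ray data in the input tensor, ground-truth pixel colors in the expected output) and the BuildRayBatch helper are hypothetical, shown only to make the steps concrete:

// Conceptual sketch only; BuildRayBatch is a hypothetical helper that picks
// random pixels from the training images and returns their rays and colors.
var nerf = new NeRF<float>();

for (int iteration = 0; iteration < 200_000; iteration++)
{
    (Tensor<float> rayBatch, Tensor<float> pixelColors) = BuildRayBatch(batchSize: 1024);

    // One optimization step: render the rays, compare to the true pixel
    // colors, backpropagate, and update the weights (steps 3-6 above).
    nerf.Train(rayBatch, pixelColors);
}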

Hierarchical sampling:

  • Coarse sampling: Uniform samples along the ray
  • Analyze coarse results: Where is density high?
  • Fine sampling: More samples where density is high (near surfaces)
  • Final rendering: Use both coarse and fine samples
  • Result: Better quality with fewer total samples
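The "more samples where density is high" step is typically implemented as inverse-CDF sampling over the coarse weights. A self-contained sketch in plain C# (illustrative, not the library's internals):

// Inverse-CDF resampling sketch: given per-bin weights from the coarse pass,
// draw fine sample depths concentrated where the weights are large.
static float[] SampleFromWeights(float[] binDepths, float[] weights, int numFine, Random rng)
{
    // Cumulative distribution over the coarse bins.
    var cdf = new float[weights.Length];
    float total = 0f;
    for (int i = 0; i < weights.Length; i++) { total += weights[i]; cdf[i] = total; }

    var fineDepths = new float[numFine];
    for (int s = 0; s < numFine; s++)
    {
        float u = (float)rng.NextDouble() * total;       // uniform in [0, total)
        int bin = Array.BinarySearch(cdf, u);
        if (bin < 0) bin = ~bin;                         // first bin with cdf >= u
        fineDepths[s] = binDepths[Math.Min(bin, binDepths.Length - 1)];
    }
    return fineDepths;
}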

Rendering equation (volume rendering):

  C(r) = Σ_i T(t_i) * (1 - exp(-σ_i * δ_i)) * c_i

where:

  • C(r): Final color of ray r
  • T(t_i): Transmittance (how much light reaches point i), T(t_i) = exp(-Σ_(j<i) σ_j * δ_j)
  • σ_i: Density at sample point i
  • δ_i: Distance between adjacent sample points
  • c_i: Color at sample point i
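Transcribed directly into plain C#, the sum looks like this (a sketch for clarity, not the library's renderer):

// Composites one ray's samples into a final RGB color using the equation above.
// densities[i] = σ_i, deltas[i] = δ_i, colors[i] = c_i (an RGB triple).
static float[] CompositeRay(float[] densities, float[] deltas, float[][] colors)
{
    var result = new float[3];
    float transmittance = 1f;                        // T(t_0) = 1: no absorption yet
    for (int i = 0; i < densities.Length; i++)
    {
        float alpha = 1f - (float)Math.Exp(-densities[i] * deltas[i]);
        float weight = transmittance * alpha;        // T(t_i) * (1 - exp(-σ_i * δ_i))
        for (int c = 0; c < 3; c++) result[c] += weight * colors[i][c];
        transmittance *= 1f - alpha;                 // fold exp(-σ_i * δ_i) into T
    }
    return result;
}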

Applications:

  • Virtual reality: Create immersive 3D environments from photos
  • Film industry: Digitize real locations for CGI
  • Real estate: Virtual property tours
  • Cultural heritage: Preserve historical sites digitally
  • Robotics: Build 3D maps for navigation
  • Medical imaging: Reconstruct 3D anatomy from scans

Limitations of original NeRF:

  • Slow training: Hours to days per scene
  • Slow rendering: Seconds per image
  • Scene-specific: Must retrain for each new scene
  • Static only: Can't handle moving objects

These limitations led to many improved variants:

  • Instant-NGP: 100x faster training and rendering
  • Plenoxels: No neural network, faster optimization
  • TensoRF: Tensor decomposition for efficiency
  • Dynamic NeRF: Handle time-varying scenes
  • Mip-NeRF: Better handling of scale/blur

Reference: "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis" by Mildenhall et al., ECCV 2020

Constructors

NeRF(int, int, int, int, int, int, bool, int, int, double, double, double, ILossFunction<T>?)

Creates a new NeRF model for 3D scene representation and novel view synthesis.

public NeRF(int positionEncodingLevels = 10, int directionEncodingLevels = 4, int hiddenDim = 256, int numLayers = 8, int colorHiddenDim = 128, int colorNumLayers = 1, bool useHierarchicalSampling = true, int renderSamples = 64, int hierarchicalSamples = 128, double renderNearBound = 2, double renderFarBound = 6, double learningRate = 0.0005, ILossFunction<T>? lossFunction = null)

Parameters

positionEncodingLevels int

Number of frequency levels for position encoding. Higher values enable more high-frequency details but are harder to optimize. Default is 10.

directionEncodingLevels int

Number of frequency levels for direction encoding. Lower than position (view dependence is smoother than geometry). Default is 4.

hiddenDim int

Size of hidden layers. Larger values have more capacity but are slower. Default is 256.

numLayers int

Depth of network. More layers can learn more complex functions. Default is 8.

colorHiddenDim int

Hidden dimension for color prediction network. Default is 128.

colorNumLayers int

Number of layers in color prediction network. Default is 1.

useHierarchicalSampling bool

Whether to use two-stage rendering (coarse + fine). True gives better quality but is slower. Default is true.

renderSamples int

Number of samples per ray for rendering. Default is 64.

hierarchicalSamples int

Additional samples for hierarchical sampling. Default is 128.

renderNearBound double

Near bound for ray sampling. Default is 2.0.

renderFarBound double

Far bound for ray sampling. Default is 6.0.

learningRate double

Learning rate for training. Default is 5e-4.

lossFunction ILossFunction<T>

Loss function for training. If null, MSE loss is used.

Remarks

For Beginners: Creates a NeRF model for 3D scene representation.

Parameters explained:

  • positionEncodingLevels: How many frequencies for position encoding

    • Higher = more high-frequency details (but harder to optimize)
    • Typical: 10 (produces 60-dimensional encoding from 3D position)
    • Formula: 3 * 2 * L = 60 for L=10
  • directionEncodingLevels: Frequencies for viewing direction encoding

    • Lower than position (view dependence is smoother than geometry)
    • Typical: 4 (produces a 24-dimensional encoding from the 3D unit direction vector)
    • Formula: 3 * 2 * L' = 24 for L' = 4
  • hiddenDim: Size of hidden layers

    • Larger = more capacity (can represent more complex scenes)
    • Larger = slower and needs more memory
    • Typical: 256
  • numLayers: Depth of network

    • More layers = can learn more complex functions
    • More layers = slower and harder to train
    • Typical: 8
  • useHierarchicalSampling: Two-stage rendering (coarse + fine)

    • True: Better quality, slower (recommended)
    • False: Faster, lower quality

Standard NeRF configuration:

var nerf = new NeRF<float>(
    positionEncodingLevels: 10,
    directionEncodingLevels: 4,
    hiddenDim: 256,
    numLayers: 8,
    useHierarchicalSampling: true);
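For quick previews, the same documented parameters can be dialed down, trading quality for speed (the specific values below are illustrative, not library recommendations):

var previewNerf = new NeRF<float>(
    positionEncodingLevels: 10,
    directionEncodingLevels: 4,
    hiddenDim: 128,                   // smaller network
    numLayers: 4,
    useHierarchicalSampling: false,   // single-stage rendering
    renderSamples: 32);               // fewer samples per ray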

Properties

SupportsTraining

Gets whether this network supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Methods

Backpropagate(Tensor<T>)

Performs backpropagation to compute gradients.

public override Tensor<T> Backpropagate(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Returns

Tensor<T>

CreateNewInstance()

Creates a new instance of this model for cloning.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

ForwardWithMemory(Tensor<T>)

Performs forward pass with memory for backpropagation.

public override Tensor<T> ForwardWithMemory(Tensor<T> input)

Parameters

input Tensor<T>

Returns

Tensor<T>

GetModelMetadata()

Gets metadata about the model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

InitializeLayers()

Initializes the neural network layers.

protected override void InitializeLayers()

Predict(Tensor<T>)

Makes a prediction using the model.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

Returns

Tensor<T>

QueryField(Tensor<T>, Tensor<T>)

Queries the radiance field at given positions and viewing directions.

public (Tensor<T> rgb, Tensor<T> density) QueryField(Tensor<T> positions, Tensor<T> viewingDirections)

Parameters

positions Tensor<T>

3D positions tensor of shape [N, 3].

viewingDirections Tensor<T>

Viewing direction vectors of shape [N, 3].

Returns

(Tensor<T> rgb, Tensor<T> density)

RGB colors and volume densities for each query point.

RenderImage(Vector<T>, Matrix<T>, int, int, T)

Renders an image from a camera viewpoint.

public Tensor<T> RenderImage(Vector<T> cameraPosition, Matrix<T> cameraRotation, int imageWidth, int imageHeight, T focalLength)

Parameters

cameraPosition Vector<T>

Camera position in world coordinates.

cameraRotation Matrix<T>

Camera rotation matrix (3x3).

imageWidth int

Output image width in pixels.

imageHeight int

Output image height in pixels.

focalLength T

Camera focal length.

Returns

Tensor<T>

Rendered image tensor of shape [height, width, 3].
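A usage sketch for rendering a single view. The Vector<float> and Matrix<float> constructors shown are assumptions about AiDotNet's API, and the identity rotation and focal length are arbitrary example values:

// Sketch only: the Vector<float>/Matrix<float> constructors are assumed,
// and nerf is constructed as shown in the constructor remarks above.
var cameraPosition = new Vector<float>(new[] { 0f, 0f, 4f });   // assumed ctor
var cameraRotation = new Matrix<float>(new float[,]             // assumed ctor
{
    { 1f, 0f, 0f },
    { 0f, 1f, 0f },
    { 0f, 0f, 1f },   // identity: camera looks along its default axis
});

Tensor<float> image = nerf.RenderImage(
    cameraPosition, cameraRotation,
    imageWidth: 400, imageHeight: 400,
    focalLength: 555.0f);
// image has shape [400, 400, 3] per the return description above.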

RenderRays(Tensor<T>, Tensor<T>, int, T, T)

Renders colors for a batch of rays.

public Tensor<T> RenderRays(Tensor<T> rayOrigins, Tensor<T> rayDirections, int numSamples, T nearBound, T farBound)

Parameters

rayOrigins Tensor<T>

Ray origin positions [N, 3].

rayDirections Tensor<T>

Ray direction vectors [N, 3].

numSamples int

Number of samples per ray.

nearBound T

Near clipping plane.

farBound T

Far clipping plane.

Returns

Tensor<T>

Rendered colors for each ray [N, 3].

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

Train(Tensor<T>, Tensor<T>)

Trains the model on input data.

public override void Train(Tensor<T> input, Tensor<T> expectedOutput)

Parameters

input Tensor<T>
expectedOutput Tensor<T>

UpdateParameters(Vector<T>)

Updates model parameters using gradient descent.

public override void UpdateParameters(Vector<T> gradients)

Parameters

gradients Vector<T>