Class NeRF<T>
Namespace: AiDotNet.NeuralRadianceFields.Models
Assembly: AiDotNet.dll
Implements Neural Radiance Fields (NeRF) for novel view synthesis.
public class NeRF<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IRadianceField<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations (e.g., float, double).
Inheritance: NeuralNetworkBase<T> → NeRF<T>
Remarks
For Beginners: NeRF is a groundbreaking method for creating photorealistic 3D scenes from 2D images.
What NeRF does:
- Input: Collection of photos of a scene from different angles
- Training: Learn a neural network that represents the 3D scene
- Output: Ability to render the scene from any new viewpoint
Key innovation:
- Represents the entire 3D scene as a continuous 5D function
- Input: (x, y, z, θ, φ) - position and viewing direction
- Output: (r, g, b, σ) - color and volume density
Architecture:
1. Positional encoding: Transform (x, y, z) to a higher-dimensional space
- Why: Helps the network learn high-frequency details
- Example: (x, y, z) → [sin(x), cos(x), sin(2x), cos(2x), ..., sin(2^(L-1)·x), cos(2^(L-1)·x)]
- A similar encoding is applied to the direction (θ, φ)
2. Coarse network (8 layers, 256 units):
- Density branch takes the encoded position and outputs density + intermediate features
- Color branch takes the intermediate features + encoded direction and outputs RGB color
3. Fine network (same structure):
- Resamples based on coarse network predictions
- Focuses samples where density is high
- Produces final high-quality output
Why positional encoding matters:
- Neural networks naturally learn low-frequency functions (smooth, blurry)
- Real scenes have high-frequency details (sharp edges, textures)
- Positional encoding enables learning high-frequency details
- Without it: blurry reconstructions; with it: sharp, detailed reconstructions
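A minimal sketch of this encoding in plain C# (illustrative only; it follows the 3 · 2 · L layout described here, not the library's internal implementation):

using System;

// Encode a 3D position into a 3 * 2 * L dimensional feature vector.
// With levels = 10 this produces the 60-dimensional encoding above.
static float[] EncodePosition(float x, float y, float z, int levels)
{
    var input = new[] { x, y, z };
    var encoded = new float[input.Length * 2 * levels];
    int k = 0;
    foreach (float v in input)
    {
        for (int l = 0; l < levels; l++)
        {
            float freq = (float)Math.Pow(2, l); // frequencies 2^0 ... 2^(L-1)
            encoded[k++] = (float)Math.Sin(freq * v);
            encoded[k++] = (float)Math.Cos(freq * v);
        }
    }
    return encoded;
}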
Training process:
1. Sample random rays from training images
2. Sample points along each ray
3. Query the network at each sample point
4. Render the ray using volume rendering
5. Compare the rendered color to the actual pixel color
6. Backpropagate the error and update network weights
7. Repeat for thousands of iterations
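Given a constructed NeRF<float> named nerf, the forward half of one iteration can be sketched with the documented RenderRays API; SampleRandomRays and trainingImages are hypothetical stand-ins for steps 1-2:

// Steps 1-4: sample a batch of rays (hypothetical helper) and render them.
var (rayOrigins, rayDirections, trueColors) = SampleRandomRays(trainingImages);
Tensor<float> rendered = nerf.RenderRays(rayOrigins, rayDirections,
    numSamples: 64, nearBound: 2f, farBound: 6f);
// Steps 5-6: compare rendered to trueColors (e.g., MSE) and update weights;
// the Train(input, expectedOutput) method below bundles these steps.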
Hierarchical sampling:
- Coarse sampling: Uniform samples along the ray
- Analyze coarse results: Where is density high?
- Fine sampling: More samples where density is high (near surfaces)
- Final rendering: Use both coarse and fine samples
- Result: Better quality with fewer total samples
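The fine-sampling step can be sketched as inverse-CDF sampling over the coarse weights (plain C#, illustrative only; not the library's internal routine):

using System;

// Draw extra sample depths where the coarse weights are high.
// binEdges has weights.Length + 1 entries (segment boundaries along the ray).
static float[] SampleFine(float[] binEdges, float[] weights, int count, Random rng)
{
    // Build the (unnormalized) CDF of the weights.
    var cdf = new float[weights.Length];
    float total = 0f;
    for (int i = 0; i < weights.Length; i++) { total += weights[i]; cdf[i] = total; }

    var samples = new float[count];
    for (int s = 0; s < count; s++)
    {
        float u = (float)rng.NextDouble() * total; // uniform in [0, total)
        int bin = Array.BinarySearch(cdf, u);
        if (bin < 0) bin = ~bin;                   // first segment with cdf >= u
        float t = (float)rng.NextDouble();         // uniform within the segment
        samples[s] = binEdges[bin] + t * (binEdges[bin + 1] - binEdges[bin]);
    }
    return samples;
}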
Rendering equation (volume rendering):
C(r) = Σ_i T(t_i) · (1 − exp(−σ_i · δ_i)) · c_i, where T(t_i) = exp(−Σ_{j<i} σ_j · δ_j)
- C(r): Final color of ray r
- T(t_i): Transmittance (how much light reaches sample i)
- σ_i: Density at sample point i
- δ_i: Distance between adjacent sample points
- c_i: Color at sample point i
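The same equation evaluated numerically for a single ray (plain C#, illustrative only):

using System;

// Accumulate C(r) from per-sample densities, colors, and segment lengths.
static float[] RenderRayColor(float[] sigma, float[][] color, float[] delta)
{
    var result = new float[3];
    float transmittance = 1f;                 // T(t_0) = 1
    for (int i = 0; i < sigma.Length; i++)
    {
        float alpha = 1f - (float)Math.Exp(-sigma[i] * delta[i]);
        float weight = transmittance * alpha; // T(t_i) * (1 - exp(-σ_i * δ_i))
        for (int c = 0; c < 3; c++)
            result[c] += weight * color[i][c];
        transmittance *= 1f - alpha;          // T(t_{i+1}) = T(t_i) * exp(-σ_i * δ_i)
    }
    return result;
}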
Applications:
- Virtual reality: Create immersive 3D environments from photos
- Film industry: Digitize real locations for CGI
- Real estate: Virtual property tours
- Cultural heritage: Preserve historical sites digitally
- Robotics: Build 3D maps for navigation
- Medical imaging: Reconstruct 3D anatomy from scans
Limitations of the original NeRF:
- Slow training: Hours to days per scene
- Slow rendering: Seconds per image
- Scene-specific: Must retrain for each new scene
- Static only: Can't handle moving objects
These limitations led to many improved variants:
- Instant-NGP: 100x faster training and rendering
- Plenoxels: No neural network, faster optimization
- TensoRF: Tensor decomposition for efficiency
- Dynamic NeRF: Handle time-varying scenes
- Mip-NeRF: Better handling of scale/blur
Reference: "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis" by Mildenhall et al., ECCV 2020
Constructors
NeRF(int, int, int, int, int, int, bool, int, int, double, double, double, ILossFunction<T>?)
Creates a new NeRF model for 3D scene representation and novel view synthesis.
public NeRF(int positionEncodingLevels = 10, int directionEncodingLevels = 4, int hiddenDim = 256, int numLayers = 8, int colorHiddenDim = 128, int colorNumLayers = 1, bool useHierarchicalSampling = true, int renderSamples = 64, int hierarchicalSamples = 128, double renderNearBound = 2, double renderFarBound = 6, double learningRate = 0.0005, ILossFunction<T>? lossFunction = null)
Parameters
positionEncodingLevels (int): Number of frequency levels for position encoding. Higher values enable more high-frequency details but are harder to optimize. Default is 10.
directionEncodingLevels (int): Number of frequency levels for direction encoding. Lower than position (view dependence is smoother than geometry). Default is 4.
hiddenDim (int): Size of hidden layers. Larger values have more capacity but are slower. Default is 256.
numLayers (int): Depth of the network. More layers can learn more complex functions. Default is 8.
colorHiddenDim (int): Hidden dimension for the color prediction network. Default is 128.
colorNumLayers (int): Number of layers in the color prediction network. Default is 1.
useHierarchicalSampling (bool): Whether to use two-stage rendering (coarse + fine). True gives better quality but is slower. Default is true.
renderSamples (int): Number of samples per ray for rendering. Default is 64.
hierarchicalSamples (int): Additional samples for hierarchical sampling. Default is 128.
renderNearBound (double): Near bound for ray sampling. Default is 2.0.
renderFarBound (double): Far bound for ray sampling. Default is 6.0.
learningRate (double): Learning rate for training. Default is 5e-4.
lossFunction (ILossFunction<T>?): Loss function for training. If null, MSE loss is used.
Remarks
For Beginners: Creates a NeRF model for 3D scene representation.
Parameters explained:
positionEncodingLevels: How many frequencies for position encoding
- Higher = more high-frequency details (but harder to optimize)
- Typical: 10 (produces 60-dimensional encoding from 3D position)
- Formula: 3 * 2 * L = 60 for L=10
directionEncodingLevels: Frequencies for viewing direction encoding
- Lower than position (view dependence is smoother than geometry)
- Typical: 4 (the (θ, φ) direction is converted to a 3D unit vector before encoding, producing a 24-dimensional result)
- Formula: 3 * 2 * L' = 24 for L'=4
hiddenDim: Size of hidden layers
- Larger = more capacity (can represent more complex scenes)
- Larger = slower and needs more memory
- Typical: 256
numLayers: Depth of network
- More layers = can learn more complex functions
- More layers = slower and harder to train
- Typical: 8
useHierarchicalSampling: Two-stage rendering (coarse + fine)
- True: Better quality, slower (recommended)
- False: Faster, lower quality
Standard NeRF configuration:
var nerf = new NeRF<float>(
    positionEncodingLevels: 10,
    directionEncodingLevels: 4,
    hiddenDim: 256,
    numLayers: 8,
    useHierarchicalSampling: true);
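For fast previews, the same parameters can trade quality for speed (illustrative values, not tuned defaults):

var previewNerf = new NeRF<float>(
    positionEncodingLevels: 6,
    hiddenDim: 128,
    numLayers: 4,
    useHierarchicalSampling: false,
    renderSamples: 32);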
Properties
SupportsTraining
Gets whether this network supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Methods
Backpropagate(Tensor<T>)
Performs backpropagation to compute gradients.
public override Tensor<T> Backpropagate(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>)
Returns
- Tensor<T>
CreateNewInstance()
Creates a new instance of this model for cloning.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader)
ForwardWithMemory(Tensor<T>)
Performs forward pass with memory for backpropagation.
public override Tensor<T> ForwardWithMemory(Tensor<T> input)
Parameters
input (Tensor<T>)
Returns
- Tensor<T>
GetModelMetadata()
Gets metadata about the model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
InitializeLayers()
Initializes the neural network layers.
protected override void InitializeLayers()
Predict(Tensor<T>)
Makes a prediction using the model.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>)
Returns
- Tensor<T>
QueryField(Tensor<T>, Tensor<T>)
Queries the radiance field at given positions and viewing directions.
public (Tensor<T> rgb, Tensor<T> density) QueryField(Tensor<T> positions, Tensor<T> viewingDirections)
Parameters
positions (Tensor<T>): 3D positions tensor of shape [N, 3].
viewingDirections (Tensor<T>): Viewing direction vectors of shape [N, 3].
Returns
- (Tensor<T> rgb, Tensor<T> density)
RGB colors and volume densities for each queried point.
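A short usage sketch; the shape-based Tensor<float> constructor shown is an assumption about the library's tensor API, so adapt it as needed:

// Query the field at N = 2 points.
var positions = new Tensor<float>(new[] { 2, 3 });  // [N, 3] world positions (assumed ctor)
var directions = new Tensor<float>(new[] { 2, 3 }); // [N, 3] unit view directions
// ... fill both tensors with the points and directions of interest ...
var (rgb, density) = nerf.QueryField(positions, directions);
// rgb holds a color per point; density holds a volume density σ per point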
RenderImage(Vector<T>, Matrix<T>, int, int, T)
Renders an image from a camera viewpoint.
public Tensor<T> RenderImage(Vector<T> cameraPosition, Matrix<T> cameraRotation, int imageWidth, int imageHeight, T focalLength)
Parameters
cameraPosition (Vector<T>): Camera position in world coordinates.
cameraRotation (Matrix<T>): Camera rotation matrix (3x3).
imageWidth (int): Output image width in pixels.
imageHeight (int): Output image height in pixels.
focalLength (T): Camera focal length.
Returns
- Tensor<T>
Rendered image tensor of shape [height, width, 3].
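A usage sketch; the Vector<float> constructor and identity-matrix helper shown are assumptions about the library's API:

// Render a 400 x 300 view from a camera 4 units down the z-axis.
var cameraPosition = new Vector<float>(new float[] { 0f, 0f, 4f }); // assumed ctor
var cameraRotation = Matrix<float>.CreateIdentity(3);               // assumed helper
Tensor<float> image = nerf.RenderImage(cameraPosition, cameraRotation,
    imageWidth: 400, imageHeight: 300, focalLength: 500f);
// image has shape [300, 400, 3]: one RGB triple per pixel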
RenderRays(Tensor<T>, Tensor<T>, int, T, T)
Renders colors for a batch of rays.
public Tensor<T> RenderRays(Tensor<T> rayOrigins, Tensor<T> rayDirections, int numSamples, T nearBound, T farBound)
Parameters
rayOrigins (Tensor<T>): Ray origin positions [N, 3].
rayDirections (Tensor<T>): Ray direction vectors [N, 3].
numSamples (int): Number of samples per ray.
nearBound (T): Near clipping plane.
farBound (T): Far clipping plane.
Returns
- Tensor<T>
Rendered colors for each ray [N, 3].
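A usage sketch (tensor construction assumed, as in QueryField above):

// Render 1024 rays with 64 samples each between the depth bounds.
var origins = new Tensor<float>(new[] { 1024, 3 }); // [N, 3] ray origins (assumed ctor)
var dirs = new Tensor<float>(new[] { 1024, 3 });    // [N, 3] ray directions
Tensor<float> colors = nerf.RenderRays(origins, dirs,
    numSamples: 64, nearBound: 2f, farBound: 6f);   // colors: [N, 3]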
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter)
Train(Tensor<T>, Tensor<T>)
Trains the model on input data.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor<T>)
expectedOutput (Tensor<T>)
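A hedged usage sketch; the exact tensor layout Train expects is not documented here, so the per-ray layout below is an assumption:

// One gradient step on a batch of rays (assumed layout: origin + direction per row).
var rayBatch = new Tensor<float>(new[] { 1024, 6 });    // assumed per-ray encoding
var pixelColors = new Tensor<float>(new[] { 1024, 3 }); // ground-truth RGB per ray
nerf.Train(rayBatch, pixelColors);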
UpdateParameters(Vector<T>)
Updates model parameters using gradient descent.
public override void UpdateParameters(Vector<T> gradients)
Parameters
gradients (Vector<T>)