Class VoxelCNN<T>
- Namespace
- AiDotNet.NeuralNetworks
- Assembly
- AiDotNet.dll
Represents a Voxel-based 3D Convolutional Neural Network for processing volumetric data.
public class VoxelCNN<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable
Type Parameters
T
The numeric type used for calculations (typically float or double).
- Inheritance
- object → NeuralNetworkBase<T> → VoxelCNN<T>
Remarks
A Voxel CNN processes 3D volumetric data using 3D convolutions. This is useful for:
- 3D shape recognition from voxelized point clouds (e.g., ModelNet40)
- Medical image analysis (CT, MRI scans)
- Spatial occupancy prediction
For Beginners: Think of a VoxelCNN as a 3D version of a regular image classifier. Instead of looking at a 2D image, it examines a 3D grid of "blocks" (voxels) to understand 3D shapes. This is similar to how Minecraft represents the world: each block is either filled or empty, and the pattern of blocks creates recognizable objects.
Applications include:
- Recognizing 3D objects from point cloud scans
- Detecting tumors in 3D medical scans
- Understanding room layouts from depth sensors
Constructors
VoxelCNN(NeuralNetworkArchitecture<T>, int, int, int, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, ILossFunction<T>?, double)
Initializes a new instance of the VoxelCNN<T> class.
public VoxelCNN(NeuralNetworkArchitecture<T> architecture, int voxelResolution = 32, int numConvBlocks = 3, int baseFilters = 32, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)
Parameters
architecture (NeuralNetworkArchitecture<T>): The architecture defining the structure of the neural network.
voxelResolution (int): The resolution of the voxel grid (e.g., 32 for 32x32x32). Default is 32.
numConvBlocks (int): Number of convolutional blocks. Default is 3.
baseFilters (int): Base number of filters in the first conv layer. Default is 32.
optimizer (IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?): The optimizer for training. Defaults to Adam if not specified.
lossFunction (ILossFunction<T>?): The loss function. Defaults based on task type if not specified.
maxGradNorm (double): Maximum gradient norm for clipping. Defaults to 1.0.
Remarks
For Beginners: This constructor creates a VoxelCNN with the specified configuration.
Key parameters explained:
- voxelResolution: The size of the 3D input grid (32 = 32x32x32 voxels)
- numConvBlocks: How many conv+pool layers (more = deeper network)
- baseFilters: Starting number of feature detectors (32 is a good default)
Exceptions
- ArgumentNullException
Thrown when architecture is null.
- ArgumentException
Thrown when voxelResolution or numConvBlocks is not positive.
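Construction might look like the following sketch. The NeuralNetworkArchitecture<T> setup is an assumption, since its constructor arguments are not documented on this page.

```csharp
// Hypothetical: architecture construction details depend on your AiDotNet version.
var architecture = new NeuralNetworkArchitecture<float>(/* task type, shapes, ... */);

// A 32x32x32 voxel classifier with 3 conv blocks, 32 base filters,
// the default Adam optimizer, and a task-based default loss function.
var network = new VoxelCNN<float>(
    architecture,
    voxelResolution: 32,
    numConvBlocks: 3,
    baseFilters: 32);
```

Leaving optimizer and lossFunction as null picks the documented defaults; pass explicit instances only when you need non-default behavior.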
Properties
BaseFilters
Gets the base number of filters in the first convolutional layer.
public int BaseFilters { get; }
Property Value
- int
Remarks
This value doubles with each convolutional block. For example, with baseFilters=32 and 3 blocks, the filter counts will be 32, 64, 128.
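The doubling rule described above can be illustrated with a short sketch; block i uses baseFilters * 2^i filters:

```csharp
// Filter count for each conv block under the doubling rule.
int baseFilters = 32;
int numConvBlocks = 3;
for (int i = 0; i < numConvBlocks; i++)
{
    Console.WriteLine($"Block {i + 1}: {baseFilters << i} filters"); // 32, 64, 128
}
```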
NumConvBlocks
Gets the number of convolutional blocks in the network.
public int NumConvBlocks { get; }
Property Value
- int
Remarks
Each convolutional block consists of a Conv3D layer followed by a MaxPool3D layer. More blocks allow the network to learn more hierarchical features but increase computational cost and risk of overfitting.
VoxelResolution
Gets the voxel grid resolution used by this network.
public int VoxelResolution { get; }
Property Value
- int
Remarks
The voxel resolution determines the spatial dimensions of the input 3D grid. A resolution of 32 means the network expects 32x32x32 voxel grids. Higher resolutions capture more detail but require more computation.
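Assuming each MaxPool3D layer halves every spatial dimension (a common default, though not stated on this page), the grid shrinks per block as in this sketch:

```csharp
// Sketch: spatial size after each conv+pool block, assuming stride-2 pooling.
int resolution = 32;
for (int block = 1; block <= 3; block++)
{
    resolution /= 2;
    Console.WriteLine($"After block {block}: {resolution}x{resolution}x{resolution}");
}
// 32 -> 16 -> 8 -> 4
```

This is why higher voxelResolution values tolerate more conv blocks: the grid must stay large enough to survive repeated pooling.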
Methods
Backward(Tensor<T>)
Performs a backward pass through the network to compute gradients.
public Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the output.
Returns
- Tensor<T>
The gradient of the loss with respect to the input.
Remarks
The backward pass propagates gradients from the output back through each layer, computing gradients for all trainable parameters.
CreateNewInstance()
Creates a new instance of this model type for cloning purposes.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new VoxelCNN<T> instance with the same configuration.
Remarks
For Beginners: This creates a blank version of the same type of neural network.
It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data from a binary stream.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader): The binary reader to deserialize from.
Remarks
This method is called at the end of the general deserialization process to allow derived classes to read any additional data specific to their implementation.
For Beginners: If serialization is like packing a suitcase, this is like unpacking its special compartment. After the main deserialization method has unpacked the common items (layers, parameters), this method allows each specific type of neural network to unpack its own unique items that were stored during serialization.
Forward(Tensor<T>)
Performs a forward pass through the network.
public Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input voxel grid tensor with shape [batch, channels, depth, height, width] or [channels, depth, height, width] for single samples.
Returns
- Tensor<T>
The output predictions with shape [batch, numClasses] or [numClasses].
Remarks
The forward pass sequentially applies each layer's transformation to the input, producing class probabilities or scores for 3D shape classification.
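Based on the documented input shape, a forward pass might look like the sketch below. The Tensor<T> constructor shown is an assumption, and `network` is a previously constructed VoxelCNN<float>.

```csharp
// Hypothetical sketch: a single-sample batch with one occupancy channel.
// Shape follows [batch, channels, depth, height, width] from the docs.
var voxels = new Tensor<float>(new[] { 1, 1, 32, 32, 32 });

// Forward returns scores with shape [batch, numClasses].
Tensor<float> scores = network.Forward(voxels);
```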
GetModelMetadata()
Gets metadata about this model for serialization and inspection.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing model information.
InitializeLayers()
Initializes the layers of the VoxelCNN.
protected override void InitializeLayers()
Remarks
If the architecture provides custom layers, those are used. Otherwise, default layers are created using CreateDefaultVoxelCNNLayers(NeuralNetworkArchitecture<T>, int, int, int).
Predict(Tensor<T>)
Generates predictions for the given input.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>): The input voxel grid tensor.
Returns
- Tensor<T>
The predicted class probabilities or scores.
Remarks
For Beginners: This is the main method you'll use to get results from your trained neural network. You provide some input data (such as a voxel grid), and the network processes it through all its layers to produce an output (such as a classification or prediction).
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data to a binary stream.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter): The binary writer to serialize to.
Remarks
This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.
For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.
Train(Tensor<T>, Tensor<T>)
Trains the network on a single batch of input-output pairs.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor<T>): The input voxel grid tensor.
expectedOutput (Tensor<T>): The expected output (ground truth labels).
Remarks
Training involves:
1. Forward pass to compute predictions
2. Loss calculation between predictions and expected output
3. Backward pass to compute gradients
4. Gradient clipping to prevent exploding gradients
5. Parameter update using the optimizer
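A minimal training-loop sketch, assuming `trainingBatches` yields (input, label) tensor pairs and `network` is a constructed VoxelCNN<float>:

```csharp
foreach (var (batchVoxels, batchLabels) in trainingBatches)
{
    // Each Train call runs all five steps listed above: forward pass,
    // loss calculation, backward pass, gradient clipping, and the
    // optimizer's parameter update.
    network.Train(batchVoxels, batchLabels);
}
```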
UpdateParameters(Vector<T>)
Updates the network parameters using a flat parameter vector.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): Vector containing all parameters to set.
Remarks
This method distributes parameters from a flat vector to each layer based on their parameter counts.