Class UNet3D<T>
- Namespace
- AiDotNet.NeuralNetworks
- Assembly
- AiDotNet.dll
Represents a 3D U-Net neural network for volumetric semantic segmentation.
public class UNet3D<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations (typically float or double).
- Inheritance
- NeuralNetworkBase<T> → UNet3D<T>
Remarks
A 3D U-Net extends the classic U-Net architecture to three dimensions for processing volumetric data. It uses an encoder-decoder structure with skip connections to produce dense, per-voxel predictions while preserving both local details and global context.
For Beginners: A 3D U-Net is like an intelligent 3D scanner that can identify and label every single voxel (3D pixel) in a 3D volume.
Think of it like this:
- The encoder (left side of "U") looks at the big picture by progressively zooming out
- The decoder (right side of "U") zooms back in to produce detailed predictions
- Skip connections (horizontal lines in "U") preserve fine details from encoder to decoder
This is useful for:
- Medical imaging: Finding organs or tumors in CT/MRI scans
- 3D scene understanding: Segmenting objects in point clouds
- Part segmentation: Identifying different parts of 3D shapes
The "U" shape comes from the symmetric encoder-decoder design with skip connections.
Constructors
UNet3D(NeuralNetworkArchitecture<T>, int, int, int, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, ILossFunction<T>?, double)
Initializes a new instance of the UNet3D<T> class.
public UNet3D(NeuralNetworkArchitecture<T> architecture, int voxelResolution = 32, int numEncoderBlocks = 4, int baseFilters = 32, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)
Parameters
- architecture (NeuralNetworkArchitecture<T>): The architecture defining the structure of the neural network.
- voxelResolution (int): The resolution of the voxel grid (e.g., 32 for 32×32×32). Default is 32.
- numEncoderBlocks (int): Number of encoder blocks. Default is 4.
- baseFilters (int): Base number of filters in the first encoder block. Default is 32.
- optimizer (IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>): The optimizer used for training. Defaults to Adam if not specified.
- lossFunction (ILossFunction<T>): The loss function. If not specified, a default is chosen based on the task type.
- maxGradNorm (double): Maximum gradient norm for clipping. Defaults to 1.0.
Remarks
For Beginners: This constructor creates a 3D U-Net with the specified configuration.
Key parameters explained:
- voxelResolution: The size of the 3D input grid (32 = 32×32×32 voxels)
- numEncoderBlocks: How many downsampling stages (more = deeper network)
- baseFilters: Starting number of feature detectors (32 is a good default)
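As a sketch, constructing the network might look like the following. The NeuralNetworkArchitecture<T> setup is illustrative only; consult that type's documentation for its actual constructor options.

```csharp
// Illustrative sketch: create a 3D U-Net for 32×32×32 voxel grids.
// How NeuralNetworkArchitecture<T> is configured below is an assumption;
// adapt it to the actual API of that type.
var architecture = new NeuralNetworkArchitecture<float>(/* task/shape configuration */);

var unet = new UNet3D<float>(
    architecture,
    voxelResolution: 32,   // 32×32×32 input grid
    numEncoderBlocks: 4,   // four downsampling stages
    baseFilters: 32);      // filters double per block: 32, 64, 128, 256
```

Leaving optimizer and lossFunction as null uses the documented defaults (Adam, and a task-appropriate loss).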
Exceptions
- ArgumentNullException
Thrown when architecture is null.
- ArgumentException
Thrown when voxelResolution or numEncoderBlocks is not positive.
Properties
BaseFilters
Gets the base number of filters in the first encoder block.
public int BaseFilters { get; }
Property Value
- int
Remarks
This value doubles with each encoder block. For example, with baseFilters=32 and 4 blocks, the filter counts will be 32, 64, 128, 256.
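The doubling rule is simple enough to compute directly; a minimal sketch:

```csharp
// Filter count at encoder block i (0-based): baseFilters * 2^i.
// With baseFilters = 32 and 4 blocks: 32, 64, 128, 256.
int baseFilters = 32;
int numEncoderBlocks = 4;
for (int i = 0; i < numEncoderBlocks; i++)
{
    int filters = baseFilters << i;   // equivalent to baseFilters * 2^i
    Console.WriteLine($"Block {i}: {filters} filters");
}
```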
NumClasses
Gets the number of output classes (segmentation categories).
public int NumClasses { get; }
Property Value
- int
Remarks
For binary segmentation (foreground/background), this is 1. For multi-class segmentation, this equals the number of categories.
NumEncoderBlocks
Gets the number of encoder blocks in the network.
public int NumEncoderBlocks { get; }
Property Value
- int
Remarks
Each encoder block consists of two Conv3D layers followed by a MaxPool3D layer (except the last encoder block). More blocks allow deeper feature extraction but require higher input resolution and more computation.
VoxelResolution
Gets the voxel grid resolution used by this network.
public int VoxelResolution { get; }
Property Value
- int
Remarks
The voxel resolution determines the spatial dimensions of the input and output 3D grids. A resolution of 32 means the network processes 32×32×32 voxel grids. Input and output have the same spatial resolution (dense prediction).
Methods
Backward(Tensor<T>)
Performs a backward pass through the network to compute gradients.
public Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
- outputGradient (Tensor<T>): The gradient of the loss with respect to the output.
Returns
- Tensor<T>
The gradient of the loss with respect to the input.
Remarks
The backward pass propagates gradients from the output back through each layer, computing gradients for all trainable parameters.
CreateNewInstance()
Creates a new instance of this model type for cloning purposes.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new UNet3D<T> instance with the same configuration.
Remarks
For Beginners: This creates a blank version of the same type of neural network.
It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data from a binary stream.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
- reader (BinaryReader): The binary reader to deserialize from.
Remarks
This method is called at the end of the general deserialization process to allow derived classes to read any additional data specific to their implementation.
For Beginners: Continuing the suitcase analogy, this is like unpacking that special compartment. After the main deserialization method has unpacked the common items (layers, parameters), this method allows each specific type of neural network to unpack its own unique items that were stored during serialization.
Forward(Tensor<T>)
Performs a forward pass through the network.
public Tensor<T> Forward(Tensor<T> input)
Parameters
- input (Tensor<T>): The input voxel grid tensor with shape [batch, channels, depth, height, width], or [channels, depth, height, width] for single samples.
Returns
- Tensor<T>
The output segmentation map with shape [batch, numClasses, depth, height, width] or [numClasses, depth, height, width] for single samples.
Remarks
The forward pass sequentially applies each layer's transformation to the input, producing per-voxel class predictions for 3D semantic segmentation.
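A hedged sketch of the expected tensor shapes. The Tensor<T> shape-array constructor shown here is an assumption; the exact construction API may differ.

```csharp
// Single-channel 32×32×32 volume, batch of 2.
// Shape convention from the docs: [batch, channels, depth, height, width].
var input = new Tensor<float>(new[] { 2, 1, 32, 32, 32 });

Tensor<float> output = unet.Forward(input);
// For numClasses = 3, the output shape is [2, 3, 32, 32, 32]:
// one score per class for every voxel, at the same spatial resolution.
```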
GetModelMetadata()
Gets metadata about this model for serialization and inspection.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing model information.
InitializeLayers()
Initializes the layers of the 3D U-Net.
protected override void InitializeLayers()
Remarks
If the architecture provides custom layers, those are used. Otherwise, default layers are created using CreateDefaultUNet3DLayers(NeuralNetworkArchitecture<T>, int, int, int).
Predict(Tensor<T>)
Generates predictions for the given input.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
- input (Tensor<T>): The input voxel grid tensor.
Returns
- Tensor<T>
The predicted segmentation map.
Remarks
For Beginners: This is the main method you'll use to get results from your trained network. You provide an input voxel grid (for example, a 3D medical scan), and the network processes it through all its layers to produce a per-voxel segmentation map.
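An inference sketch: run the trained network on one volume, then reduce the class scores to labels. The argmax step is left as a comment because the tensor utilities available for it are not specified here.

```csharp
// One single-channel 32×32×32 volume (batch of 1), e.g. a CT scan.
var scan = new Tensor<float>(new[] { 1, 1, 32, 32, 32 });

Tensor<float> scores = unet.Predict(scan);   // [1, numClasses, 32, 32, 32]

// To get a hard label per voxel, take the argmax over the class axis:
// labels[b, d, h, w] = index of the largest score in scores[b, :, d, h, w].
```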
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data to a binary stream.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
- writer (BinaryWriter): The binary writer to serialize to.
Remarks
This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.
For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.
Train(Tensor<T>, Tensor<T>)
Trains the network on a single batch of input-output pairs.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
- input (Tensor<T>): The input voxel grid tensor.
- expectedOutput (Tensor<T>): The expected segmentation map (ground-truth labels).
Remarks
Training involves:
1. Forward pass to compute predictions
2. Loss calculation between predictions and the expected output
3. Backward pass to compute gradients
4. Gradient clipping to prevent exploding gradients
5. Parameter update using the optimizer
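The steps above all happen inside a single Train call, so a training loop can be sketched as follows. The batch source (trainingBatches) is hypothetical; substitute your own data pipeline.

```csharp
// Minimal training-loop sketch. Each Train call runs the forward pass,
// loss, backward pass, gradient clipping (maxGradNorm), and the
// optimizer update for one batch.
for (int epoch = 0; epoch < 10; epoch++)
{
    foreach (var (volume, labels) in trainingBatches)   // hypothetical batch source
    {
        unet.Train(volume, labels);
    }
}
```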
UpdateParameters(Vector<T>)
Updates the network parameters using a flat parameter vector.
public override void UpdateParameters(Vector<T> parameters)
Parameters
- parameters (Vector<T>): Vector containing all parameters to set.
Remarks
This method distributes parameters from a flat vector to each layer based on their parameter counts.