Class NoisePredictorBase<T>

Namespace: AiDotNet.Diffusion.NoisePredictors

Assembly: AiDotNet.dll

Base class for noise prediction networks used in diffusion models.

public abstract class NoisePredictorBase<T> : INoisePredictor<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T: The numeric type used for calculations.

Inheritance: object

NoisePredictorBase<T>

Implements: INoisePredictor<T>

IFullModel<T, Tensor<T>, Tensor<T>>

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>

IModelSerializer

ICheckpointableModel

IParameterizable<T, Tensor<T>, Tensor<T>>

IFeatureAware

IFeatureImportance<T>

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>

IGradientComputable<T, Tensor<T>, Tensor<T>>

IJitCompilable<T>

Derived: DiTNoisePredictor<T>

UNetNoisePredictor<T>

VideoUNetPredictor<T>

Inherited Members: object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Extension Methods: DistributedExtensions.AsDistributedForHighBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributedForLowBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Remarks

This abstract base class provides common functionality for all noise predictors, including timestep embedding, parameter management, serialization, and gradient computation.

For Beginners: This is the foundation that all noise prediction networks build upon. Noise predictors are the neural networks at the heart of diffusion models that learn to predict what noise was added to a sample. Different architectures (U-Net, DiT, etc.) extend this base class.

Constructors

NoisePredictorBase(ILossFunction<T>?, int?)

Initializes a new instance of the NoisePredictorBase class.

protected NoisePredictorBase(ILossFunction<T>? lossFunction = null, int? seed = null)

Parameters

lossFunction ILossFunction<T>: Optional custom loss function. Defaults to MSE.
seed int?: Optional random seed for reproducibility.

Fields

LossFunction

The loss function used for training (typically MSE for noise prediction).

protected readonly ILossFunction<T> LossFunction

Field Value

ILossFunction<T>

NumOps

Provides numeric operations for the specific type T.

protected static readonly INumericOperations<T> NumOps

Field Value

INumericOperations<T>

RandomGenerator

Random number generator for initialization and stochastic operations.

protected Random RandomGenerator

Field Value

Random

Properties

BaseChannels

Gets the base channel count used in the network architecture.

public abstract int BaseChannels { get; }

Property Value

int

Remarks

This determines the model capacity. Common values:
- 320 for Stable Diffusion 1.x, 2.x, and XL (base)
- 384 for the Stable Diffusion XL refiner
- 1024 for large DiT models

ContextDimension

Gets the expected context dimension for cross-attention conditioning.

public abstract int ContextDimension { get; }

Property Value

int

Remarks

For CLIP-conditioned models, this is typically 768 or 1024 (2048 for Stable Diffusion XL, which concatenates two CLIP text encoders). For T5-XXL-conditioned models (like SD3), this is typically 4096. Returns 0 if cross-attention is not supported.

DefaultLossFunction

Gets the default loss function used by this model for gradient computation.

public ILossFunction<T> DefaultLossFunction { get; }

Property Value

ILossFunction<T>

Remarks

This loss function is used when calling ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>?) without explicitly providing a loss function. It represents the model's primary training objective.

For Beginners: The loss function tells the model "what counts as a mistake". For example:
- For regression (predicting numbers): Mean Squared Error measures how far predictions are from actual values
- For classification (predicting categories): Cross Entropy measures how confident the model is in the right category

This property provides a sensible default so you don't have to specify the loss function every time, but you can still override it if needed for special cases.
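For noise prediction, the constructor documentation names MSE as the default objective. The following is a minimal sketch of that calculation on plain arrays. It is not the library's ILossFunction<T> implementation, and the class and method names are hypothetical.

```csharp
using System;

public static class MseSketch
{
    // Mean squared error between predicted and target values.
    // Hypothetical helper for illustration only.
    public static double MeanSquaredError(double[] predicted, double[] target)
    {
        if (predicted.Length != target.Length)
            throw new ArgumentException("Arrays must have the same length.", nameof(target));
        if (predicted.Length == 0)
            throw new ArgumentException("Arrays must not be empty.", nameof(predicted));

        double sum = 0.0;
        for (int i = 0; i < predicted.Length; i++)
        {
            double diff = predicted[i] - target[i];
            sum += diff * diff;
        }
        return sum / predicted.Length;
    }
}
```

In a diffusion setting, predicted would hold the network's noise estimate and target the noise that was actually added.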

Distributed Training: In distributed training, all workers use the same loss function to ensure consistent gradient computation. The default loss function is automatically used when workers compute local gradients.

Exceptions

InvalidOperationException: Thrown if accessed before the model has been configured with a loss function.

InputChannels

Gets the number of input channels the predictor expects.

public abstract int InputChannels { get; }

Property Value

int

Remarks

For image models, this is typically:
- 4 for latent diffusion models (VAE latent channels)
- 3 for pixel-space RGB models
- Higher for models with additional conditioning channels

OutputChannels

Gets the number of output channels the predictor produces.

public abstract int OutputChannels { get; }

Property Value

int

Remarks

Usually matches InputChannels since we predict noise of the same shape as input. Some architectures may predict additional outputs like variance.

ParameterCount

Gets the number of parameters in the model.

public abstract int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

SupportsCFG

Gets whether this noise predictor supports classifier-free guidance.

public abstract bool SupportsCFG { get; }

Property Value

bool

Remarks

Classifier-free guidance allows steering generation toward the conditioning (e.g., text prompt) without a separate classifier. Most modern models support this.
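As commonly formulated, classifier-free guidance runs the predictor twice (once with the conditioning and once without) and extrapolates from the unconditional prediction toward the conditional one. The sketch below shows that combination on plain arrays. How this library wires the two passes together is not documented on this page, so the class name, method name, and guidanceScale parameter are assumptions.

```csharp
using System;

public static class CfgSketch
{
    // Combines two noise predictions:
    //   guided = uncond + guidanceScale * (cond - uncond)
    // guidanceScale = 1 returns the conditional prediction unchanged.
    public static double[] Combine(double[] uncond, double[] cond, double guidanceScale)
    {
        if (uncond.Length != cond.Length)
            throw new ArgumentException("Predictions must have the same length.", nameof(cond));

        var guided = new double[uncond.Length];
        for (int i = 0; i < guided.Length; i++)
            guided[i] = uncond[i] + guidanceScale * (cond[i] - uncond[i]);
        return guided;
    }
}
```

Scales above 1 push the result further toward the conditioning, usually at some cost in sample diversity.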

SupportsCrossAttention

Gets whether this noise predictor supports cross-attention conditioning.

public abstract bool SupportsCrossAttention { get; }

Property Value

bool

Remarks

Cross-attention allows the model to attend to conditioning tokens (like text embeddings). This is how text-to-image models incorporate the prompt.

SupportsJitCompilation

Gets whether this model currently supports JIT compilation.

public virtual bool SupportsJitCompilation { get; }

Property Value

bool: True if the model can be JIT compiled, false otherwise.

Remarks

Some models may not support JIT compilation due to:
- Dynamic graph structure (changes based on input)
- Lack of computation graph representation
- Use of operations not yet supported by the JIT compiler

For Beginners: This tells you whether this specific model can benefit from JIT compilation.

Models return false if they:

Use layer-based architecture without graph export (e.g., current neural networks)
Have control flow that changes based on input data
Use operations the JIT compiler doesn't understand yet

In these cases, the model will still work normally, just without JIT acceleration.

TimeEmbeddingDim

Gets the dimension of the time/timestep embedding.

public abstract int TimeEmbeddingDim { get; }

Property Value

int

Remarks

The timestep is embedded into a high-dimensional vector before being injected into the network. Typical values: 256, 512, 1024.

Methods

ApplyGradients(Vector<T>, T)

Applies pre-computed gradients to update the model parameters.

public virtual void ApplyGradients(Vector<T> gradients, T learningRate)

Parameters

gradients Vector<T>: The gradient vector to apply.
learningRate T: The learning rate for the update.

Remarks

Updates parameters using: θ = θ - learningRate * gradients

For Beginners: After computing gradients (seeing which direction to move), this method actually moves the model in that direction. The learning rate controls how big of a step to take.

Distributed Training: In DDP/ZeRO-2, this applies the synchronized (averaged) gradients after communication across workers. Each worker applies the same averaged gradients to keep parameters consistent.
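The update rule θ = θ - learningRate * gradients can be sketched on plain arrays as follows. This is an illustration of the rule only, not the library's implementation, which operates on Vector<T>. The class name is hypothetical.

```csharp
using System;

public static class GradientStepSketch
{
    // In-place gradient descent step: parameters[i] -= learningRate * gradients[i].
    public static void ApplyGradients(double[] parameters, double[] gradients, double learningRate)
    {
        if (parameters.Length != gradients.Length)
            throw new ArgumentException("Gradient length must match parameter length.", nameof(gradients));

        for (int i = 0; i < parameters.Length; i++)
            parameters[i] -= learningRate * gradients[i];
    }
}
```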

Clone()

Creates a deep copy of the noise predictor.

public abstract INoisePredictor<T> Clone()

Returns

INoisePredictor<T>: A new instance with the same parameters.

ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>?)

Computes gradients of the loss function with respect to model parameters for the given data, WITHOUT updating the model parameters.

public virtual Vector<T> ComputeGradients(Tensor<T> input, Tensor<T> target, ILossFunction<T>? lossFunction = null)

Parameters

input Tensor<T>: The input data.
target Tensor<T>: The target/expected output.
lossFunction ILossFunction<T>: The loss function to use for gradient computation. If null, uses the model's default loss function.

Returns

Vector<T>: A vector containing gradients with respect to all model parameters.

Remarks

This method performs a forward pass, computes the loss, and back-propagates to compute gradients, but does NOT update the model's parameters. The parameters remain unchanged after this call.

Distributed Training: In DDP/ZeRO-2, each worker calls this to compute local gradients on its data batch. These gradients are then synchronized (averaged) across workers before applying updates. Because every worker applies the same averaged gradients, all workers make the same parameter update even though each one saw different data.

For Meta-Learning: After adapting a model on a support set, you can use this method to compute gradients on the query set. These gradients become the meta-gradients for updating the meta-parameters.

For Beginners: Think of this as "dry run" training:
- The model sees what direction it should move (the gradients)
- But it doesn't actually move (parameters stay the same)
- You get to decide what to do with this information (average with others, inspect, modify, etc.)
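The distributed pattern described above (each worker computes local gradients, then the gradients are averaged before any update) can be sketched as follows. It uses plain arrays in a single process and stands in for the communication step. It is an assumption-laden illustration, not the library's synchronization code, and the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class GradientAveragingSketch
{
    // Element-wise average of the gradient vectors produced by each worker.
    public static double[] Average(IReadOnlyList<double[]> workerGradients)
    {
        if (workerGradients.Count == 0)
            throw new ArgumentException("At least one gradient vector is required.", nameof(workerGradients));

        int length = workerGradients[0].Length;
        var average = new double[length];

        foreach (double[] gradients in workerGradients)
        {
            if (gradients.Length != length)
                throw new ArgumentException("All gradient vectors must have the same length.", nameof(workerGradients));
            for (int i = 0; i < length; i++)
                average[i] += gradients[i];
        }

        for (int i = 0; i < length; i++)
            average[i] /= workerGradients.Count;

        return average;
    }
}
```

Because ComputeGradients leaves the parameters untouched, the averaged vector is the only thing that changes the model once it is passed to ApplyGradients.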

Exceptions

InvalidOperationException: If lossFunction is null and the model has no default loss function.

DeepCopy()

Creates a deep copy of this object.

public abstract IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

Deserialize(byte[])

Loads a previously serialized model from binary data.

public virtual void Deserialize(byte[] data)

Parameters

data byte[]: The byte array containing the serialized model data.

Remarks

This method takes binary data created by the Serialize method and uses it to restore a model to its previous state.

For Beginners: This is like opening a saved file to continue your work.

When you call this method:

You provide the binary data (bytes) that was previously created by Serialize
The model rebuilds itself using this data
After deserializing, the model is exactly as it was when serialized
It's ready to make predictions without needing to be trained again

For example:

You download a pre-trained model file for detecting spam emails
You deserialize this file into your application
Immediately, your application can detect spam without any training
The model has all the knowledge that was built into it by its original creator

This is particularly useful when:

You want to use a model that took days to train
You need to deploy the same model across multiple devices
You're creating an application that non-technical users will use

Think of it like installing the brain of a trained expert directly into your application.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the model's computation graph for JIT compilation.

public virtual ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>: List to populate with input computation nodes (parameters).

Returns

ComputationNode<T>: The output computation node representing the model's prediction.

Remarks

This method should construct a computation graph representing the model's forward pass. The graph should use placeholder input nodes that will be filled with actual data during execution.

For Beginners: This method creates a "recipe" of your model's calculations that the JIT compiler can optimize.

The method should:

Create placeholder nodes for inputs (features, parameters)
Add all input nodes to the inputNodes list (in order)
Build the computation graph using TensorOperations
Return the final output node

Example for a simple linear model (y = Wx + b):

public ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
{
    // Create placeholder inputs
    var x = TensorOperations<T>.Variable(new Tensor<T>(InputShape), "x");
    var W = TensorOperations<T>.Variable(Weights, "W");
    var b = TensorOperations<T>.Variable(Bias, "b");

    // Add inputs in order
    inputNodes.Add(x);
    inputNodes.Add(W);
    inputNodes.Add(b);

    // Build graph: y = Wx + b
    var matmul = TensorOperations<T>.MatMul(x, W);
    var output = TensorOperations<T>.Add(matmul, b);

    return output;
}

The JIT compiler will then:

Optimize the graph (fuse operations, eliminate dead code)
Compile it to fast native code
Cache the compiled version for reuse

GetActiveFeatureIndices()

Gets the indices of features that are actively used by this model.

public virtual IEnumerable<int> GetActiveFeatureIndices()

Returns

IEnumerable<int>

GetFeatureImportance()

Gets the feature importance scores.

public virtual Dictionary<string, T> GetFeatureImportance()

Returns

Dictionary<string, T>

GetModelMetadata()

Retrieves metadata and performance metrics about the trained model.

public virtual ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>: An object containing metadata and performance metrics about the trained model.

Remarks

This method provides information about the model's structure, parameters, and performance metrics.

For Beginners: Model metadata is like a report card for your machine learning model.

Just as a report card shows how well a student is performing in different subjects, model metadata shows how well your model is performing and provides details about its structure.

This information typically includes:

Accuracy measures: How well do the model's predictions match actual values?
Error metrics: How far off are the model's predictions on average?
Model parameters: What patterns did the model learn from the data?
Training information: How long did training take? How many iterations were needed?

For example, in a house price prediction model, metadata might include:

Average prediction error (e.g., off by $15,000 on average)
How strongly each feature (bedrooms, location) influences the prediction
How well the model fits the training data

This information helps you understand your model's strengths and weaknesses, and decide if it's ready to use or needs more training.

GetParameters()

Gets the parameters that can be optimized.

public abstract Vector<T> GetParameters()

Returns

Vector<T>

GetTimestepEmbedding(int)

Computes the timestep embedding for a given timestep.

public virtual Tensor<T> GetTimestepEmbedding(int timestep)

Parameters

timestep int: The timestep to embed.

Returns

Tensor<T>: The timestep embedding vector [timeEmbeddingDim].

Remarks

Timesteps are typically embedded using sinusoidal positional encodings (like in Transformers) followed by a small MLP.
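A minimal sketch of the sinusoidal part of such an embedding is shown below. It follows the common Transformer-style formulation with the sine terms in the first half of the vector and the cosine terms in the second half. The ordering, the maxPeriod value of 10000, and the omission of the MLP are assumptions, so the output will not necessarily match what GetTimestepEmbedding returns.

```csharp
using System;

public static class TimestepEmbeddingSketch
{
    // Returns a vector of length dim: [sin(t * f_0..f_{h-1}), cos(t * f_0..f_{h-1})],
    // where h = dim / 2 and f_i = maxPeriod^(-i / h).
    public static double[] Embed(int timestep, int dim, double maxPeriod = 10000.0)
    {
        if (dim <= 0 || dim % 2 != 0)
            throw new ArgumentException("dim must be a positive even number.", nameof(dim));

        int half = dim / 2;
        var embedding = new double[dim];

        for (int i = 0; i < half; i++)
        {
            // Frequencies fall geometrically from 1 toward 1 / maxPeriod.
            double frequency = Math.Exp(-Math.Log(maxPeriod) * i / half);
            embedding[i] = Math.Sin(timestep * frequency);
            embedding[half + i] = Math.Cos(timestep * frequency);
        }

        return embedding;
    }
}
```

Low-index entries change quickly with the timestep and high-index entries change slowly, which lets the network tell nearby and distant timesteps apart.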

IsFeatureUsed(int)

Checks if a specific feature is used by this model.

public virtual bool IsFeatureUsed(int featureIndex)

Parameters

featureIndex int

Returns

bool

LoadModel(string)

Loads the model from a file.

public virtual void LoadModel(string filePath)

Parameters

filePath string: The path to the file containing the saved model.

Remarks

This method provides a convenient way to load a model directly from disk. It combines file I/O operations with deserialization.

For Beginners: This is like clicking "Open" in a document editor. Instead of manually reading from a file and then calling Deserialize(), this method does both steps for you.

Exceptions

FileNotFoundException: Thrown when the specified file does not exist.
IOException: Thrown when an I/O error occurs while reading from the file or when the file contains corrupted or invalid model data.

LoadState(Stream)

Loads the model's state (parameters and configuration) from a stream.

public virtual void LoadState(Stream stream)

Parameters

stream Stream: The stream to read the model state from.

Remarks

This method deserializes model state that was previously saved with SaveState, restoring all parameters and configuration to recreate the saved model state.

For Beginners: This is like loading a saved game.

When you call LoadState:

All the parameters are read from the stream
The model is configured to match the saved architecture
The model becomes identical to when SaveState was called

After loading, the model can make predictions using the restored parameters.

Stream Handling:
- The stream position will be advanced by the number of bytes read
- The stream is not closed (caller must dispose)
- Stream data must match the format written by SaveState

Versioning: Implementations should consider:
- Including format version number in serialized data
- Validating compatibility before deserialization
- Providing migration paths for old formats when possible

Usage:

// Load from file
using var stream = File.OpenRead("model.bin");
model.LoadState(stream);

Important: The stream must contain state data saved by SaveState from a compatible model (same architecture and numeric type).
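The stream-handling and versioning notes above can be illustrated with a small, self-contained round trip. The binary layout here (a version number, a count, then the values) is invented for the example and is not the format SaveState actually writes. The class name and FormatVersion constant are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class StateStreamSketch
{
    private const int FormatVersion = 1; // hypothetical format version

    public static void SaveState(Stream stream, double[] parameters)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanWrite) throw new ArgumentException("Stream must be writable.", nameof(stream));

        // leaveOpen: true keeps the caller's stream open after the writer is disposed.
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(FormatVersion);
            writer.Write(parameters.Length);
            foreach (double value in parameters)
                writer.Write(value);
            writer.Flush();
        }
    }

    public static double[] LoadState(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanRead) throw new ArgumentException("Stream must be readable.", nameof(stream));

        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            // Validate the format version before reading anything else.
            int version = reader.ReadInt32();
            if (version != FormatVersion)
                throw new InvalidOperationException("Unsupported state format version " + version + ".");

            int count = reader.ReadInt32();
            if (count < 0)
                throw new InvalidOperationException("Corrupt state: negative parameter count.");

            var parameters = new double[count];
            for (int i = 0; i < count; i++)
                parameters[i] = reader.ReadDouble();
            return parameters;
        }
    }
}
```

With a MemoryStream, the caller would write the state, reset Position to 0, and read it back. The stream remains usable afterwards because neither method closes it.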

Exceptions

ArgumentNullException: Thrown when stream is null.
ArgumentException: Thrown when stream is not readable or contains invalid data.
InvalidOperationException: Thrown when deserialization fails or data is incompatible with model architecture.

Predict(Tensor<T>)

Uses the trained model to make predictions for new input data.

public virtual Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>: A matrix where each row represents a new example to predict and each column represents a feature.

Returns

Tensor<T>: A vector containing the predicted values for each input example.

Remarks

After training, this method applies the learned patterns to new data to predict outcomes.

For Beginners: Prediction is when the model uses what it learned to make educated guesses about new information.

Using the fruit identification example from Train(Tensor<T>, Tensor<T>):

After learning from many examples, the child (model) can now identify new fruits they haven't seen before
They look at the color, shape, and size to make their best guess

In machine learning:

You give the model new data it hasn't seen during training
The model applies the patterns it learned to make predictions
The output is the model's best estimate based on its training

For example, in a house price prediction model:

You provide features of a new house (square footage, bedrooms, location)
The model predicts what price that house might sell for

This method is used after training is complete, when you want to apply your model to real-world data.

PredictNoise(Tensor<T>, int, Tensor<T>?)

Predicts the noise in a noisy sample at a given timestep.

public abstract Tensor<T> PredictNoise(Tensor<T> noisySample, int timestep, Tensor<T>? conditioning = null)

Parameters

noisySample Tensor<T>: The noisy input sample [batch, channels, height, width].
timestep int: The current timestep in the diffusion process.
conditioning Tensor<T>: Optional conditioning tensor (e.g., text embeddings).

Returns

Tensor<T>: The predicted noise tensor with the same shape as noisySample.

Remarks

This is the main forward pass of the noise predictor. Given a noisy sample at timestep t, it predicts what noise was added.

For Beginners: This is where the actual denoising happens:
1. The network looks at the noisy image
2. It considers how noisy it should be at this timestep
3. It predicts the noise pattern
4. This prediction is subtracted to get a cleaner image
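The subtraction in the last step can be made precise under the standard DDPM parameterization. The equations below assume a cumulative noise schedule \bar{\alpha}_t supplied by a scheduler (not by this class), with \epsilon_\theta standing for the value returned by PredictNoise and c for the optional conditioning. These symbols are assumptions and do not appear elsewhere on this page.

```latex
% Forward process: how a clean sample x_0 becomes the noisy sample x_t
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Estimate of the clean sample obtained by removing the predicted noise
\hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t, c)}{\sqrt{\bar{\alpha}_t}}
```

The second equation is the first one solved for x_0, with the true noise replaced by the network's prediction. In practice the sampler takes a partial step toward this estimate rather than jumping to it directly.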

PredictNoiseWithEmbedding(Tensor<T>, Tensor<T>, Tensor<T>?)

Predicts noise from an explicit timestep embedding (for batches whose samples use different timesteps).

public virtual Tensor<T> PredictNoiseWithEmbedding(Tensor<T> noisySample, Tensor<T> timeEmbedding, Tensor<T>? conditioning = null)

Parameters

noisySample Tensor<T>: The noisy input sample [batch, channels, height, width].
timeEmbedding Tensor<T>: Pre-computed timestep embeddings [batch, timeEmbeddingDim].
conditioning Tensor<T>: Optional conditioning tensor (e.g., text embeddings).

Returns

Tensor<T>: The predicted noise tensor with the same shape as noisySample.

Remarks

This overload is useful when you want to use different timesteps per sample in a batch, or when you have pre-computed timestep embeddings for efficiency.

SampleNoise(int[], Random?)

Samples random noise from a standard normal distribution.

protected virtual Tensor<T> SampleNoise(int[] shape, Random? rng = null)

Parameters

shape int[]: The shape of the noise tensor.
rng Random: Optional random number generator.

Returns

Tensor<T>: A tensor of random noise values.
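One common way to draw standard normal values from a uniform generator is the Box-Muller transform, sketched below on a flat array. Whether SampleNoise uses this particular method is not stated on this page, so treat the algorithm choice, the class name, and the flat double[] return type as assumptions.

```csharp
using System;

public static class NoiseSamplingSketch
{
    // Fills a flat buffer of shape[0] * shape[1] * ... values with N(0, 1) samples.
    public static double[] Sample(int[] shape, Random rng)
    {
        if (shape == null) throw new ArgumentNullException(nameof(shape));
        if (rng == null) throw new ArgumentNullException(nameof(rng));

        int count = 1;
        foreach (int dimension in shape)
        {
            if (dimension < 0)
                throw new ArgumentException("Dimensions must be non-negative.", nameof(shape));
            count *= dimension;
        }

        var noise = new double[count];
        for (int i = 0; i < count; i += 2)
        {
            // 1 - NextDouble() lies in (0, 1], which avoids Math.Log(0).
            double u1 = 1.0 - rng.NextDouble();
            double u2 = rng.NextDouble();
            double radius = Math.Sqrt(-2.0 * Math.Log(u1));

            // Each pair of uniforms yields two independent normal samples.
            noise[i] = radius * Math.Cos(2.0 * Math.PI * u2);
            if (i + 1 < count)
                noise[i + 1] = radius * Math.Sin(2.0 * Math.PI * u2);
        }
        return noise;
    }
}
```

Passing a Random constructed from a fixed seed makes the output repeatable, which mirrors the purpose of the seed parameter on the constructor.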

SaveModel(string)

Saves the model to a file.

public virtual void SaveModel(string filePath)

Parameters

filePath string: The path where the model should be saved.

Remarks

This method provides a convenient way to save the model directly to disk. It combines serialization with file I/O operations.

For Beginners: This is like clicking "Save As" in a document editor. Instead of manually calling Serialize() and then writing to a file, this method does both steps for you.

Exceptions

IOException: Thrown when an I/O error occurs while writing to the file.
UnauthorizedAccessException: Thrown when the caller does not have the required permission to write to the specified file path.

SaveState(Stream)

Saves the model's current state (parameters and configuration) to a stream.

public virtual void SaveState(Stream stream)

Parameters

stream Stream: The stream to write the model state to.

Remarks

This method serializes all the information needed to recreate the model's current state, including trained parameters, layer configurations, and any internal state variables.

For Beginners: This is like creating a snapshot of your trained model.

When you call SaveState:

All the learned parameters (weights and biases) are written to the stream
The model's architecture information is saved
Any other internal state (like normalization statistics) is preserved

You can later use LoadState to restore the model to this exact state.

Stream Handling:
- The stream position will be advanced by the number of bytes written
- The stream is flushed but not closed (caller must dispose)
- For file-based persistence, wrap in File.Create/FileStream

Usage:

// Save to file
using var stream = File.Create("model.bin");
model.SaveState(stream);

Exceptions

ArgumentNullException: Thrown when stream is null.
ArgumentException: Thrown when stream is not writable.
InvalidOperationException: Thrown when model state cannot be serialized (e.g., uninitialized model).

Serialize()

Converts the current state of a machine learning model into a binary format.

public virtual byte[] Serialize()

Returns

byte[]: A byte array containing the serialized model data.

Remarks

This method captures all the essential information about a trained model and converts it into a sequence of bytes that can be stored or transmitted.

For Beginners: This is like exporting your work to a file.

When you call this method:

The model's current state (all its learned patterns and parameters) is captured
This information is converted into a compact binary format (bytes)
You can then save these bytes to a file, database, or send them over a network

For example:

After training a model to recognize cats vs. dogs in images
You can serialize the model to save all its learned knowledge
Later, you can use this saved data to recreate the model exactly as it was
The recreated model will make the same predictions as the original

Think of it like taking a snapshot of your model's brain at a specific moment in time.

SetActiveFeatureIndices(IEnumerable<int>)

Sets the active feature indices for this model.

public virtual void SetActiveFeatureIndices(IEnumerable<int> featureIndices)

Parameters

featureIndices IEnumerable<int>

SetParameters(Vector<T>)

Sets the model parameters.

public abstract void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>: The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters. This is useful for optimization algorithms that need to update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException should be thrown.

Exceptions

ArgumentException: Thrown when the length of parameters does not match ParameterCount.

Train(Tensor<T>, Tensor<T>)

Trains the model using input features and their corresponding target values.

public virtual void Train(Tensor<T> input, Tensor<T> expectedOutput)

Parameters

input Tensor<T>
expectedOutput Tensor<T>

Remarks

This method takes training data and adjusts the model's internal parameters to learn patterns in the data.

For Beginners: Training is like teaching the model by showing it examples.

Imagine teaching a child to identify fruits:

You show them many examples of apples, oranges, and bananas (input features x)
You tell them the correct name for each fruit (target values y)
Over time, they learn to recognize the patterns that distinguish each fruit

In machine learning:

The input parameter (x) contains features (characteristics) of your data
The expectedOutput parameter (y) contains the correct answers you want the model to learn
During training, the model adjusts its internal calculations to get better at predicting y from x

For example, in a house price prediction model:

x would contain features like square footage, number of bedrooms, location
y would contain the actual sale prices of those houses

WithParameters(Vector<T>)

Creates a new instance with the specified parameters.

public virtual IFullModel<T, Tensor<T>, Tensor<T>> WithParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Returns

IFullModel<T, Tensor<T>, Tensor<T>>
