Interface IDiffusionModel<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Interface for diffusion-based generative models.

public interface IDiffusionModel<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.


Remarks

Diffusion models are a class of generative models that learn to create data by reversing a gradual noising process. They have achieved state-of-the-art results in image generation, audio synthesis, and other generative tasks.

For Beginners: Diffusion models are like learning to reverse a process of adding static to a TV signal.

How diffusion works:

  1. Forward process (training): Start with real data, gradually add noise until it's pure static
  2. Reverse process (generation): Start with pure static, gradually remove noise to create new data

The model learns: "Given this noisy version, what did the original look like?"

This is different from other generative models:

  • GANs: Two networks competing (generator vs discriminator)
  • VAEs: Compress and decompress through a bottleneck
  • Diffusion: Iteratively denoise from random noise

Diffusion models are known for:

  • High quality outputs (often better than GANs)
  • Stable training (no mode collapse)
  • Good diversity (produces varied outputs)
  • Slower generation (many denoising steps needed)
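The forward and reverse processes above can be sketched on a single scalar value. This is the standard DDPM formulation, not part of this interface; `alphaBar` stands for the cumulative signal fraction (often written ᾱ_t), and the concrete values are illustrative:

```csharp
using System;

// Scalar sketch of the DDPM forward (noising) process. alphaBar is the
// cumulative signal fraction at timestep t: near 1 early in the schedule
// (mostly signal), near 0 late (mostly noise).
double x0 = 1.0;              // clean data value
double eps = 0.25;            // sampled Gaussian noise, eps ~ N(0, 1)
double alphaBar = 0.5;        // example mid-schedule value

// x_t = sqrt(alphaBar_t) * x0 + sqrt(1 - alphaBar_t) * eps
double xt = Math.Sqrt(alphaBar) * x0 + Math.Sqrt(1.0 - alphaBar) * eps;
Console.WriteLine(xt);        // the noised value the model learns to denoise
```

The reverse process inverts this: given x_t and a prediction of eps, the model recovers an estimate of x0 one step at a time.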

Key components:

  • Noise prediction model: A neural network that predicts noise in images
  • Noise scheduler: Controls the noise schedule (see INoiseScheduler<T>)
  • Loss function: Measures how well the model predicts noise (usually MSE)

This interface extends IFullModel<T, TInput, TOutput> to provide a consistent API for diffusion models while inheriting all the standard model capabilities (training, saving, loading, gradients, checkpointing, etc.).
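As a rough sketch of how the pieces fit together. MyDiffusionModel, cleanBatch, noiseBatch, and timesteps below are illustrative placeholders, not types or values shipped by AiDotNet:

```csharp
// Hypothetical usage — MyDiffusionModel stands in for any concrete
// implementation of IDiffusionModel<T>.
IDiffusionModel<float> model = new MyDiffusionModel();

// Training: compare the model's noise prediction against known added noise.
float loss = model.ComputeLoss(cleanBatch, noiseBatch, timesteps);

// Generation: iteratively denoise from random noise into new samples.
Tensor<float> samples = model.Generate(
    shape: new[] { 4, 3, 64, 64 },   // 4 RGB images at 64x64
    numInferenceSteps: 50,
    seed: 1234);

// The scheduler that drives the denoising steps is exposed directly:
INoiseScheduler<float> scheduler = model.Scheduler;
```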

Properties

Scheduler

Gets the noise scheduler used for the diffusion process.

INoiseScheduler<T> Scheduler { get; }

Property Value

INoiseScheduler<T>

Remarks

The scheduler controls the noise schedule and denoising steps during generation. Different schedulers offer different tradeoffs between quality and speed:

  • DDPM: Original scheduler, high quality but slow (1000 steps)
  • DDIM: Deterministic, allows fewer steps (20-100)
  • PNDM: Fast multi-step scheduler (20-50 steps)

Methods

ComputeLoss(Tensor<T>, Tensor<T>, int[])

Computes the training loss for a batch of samples.

T ComputeLoss(Tensor<T> cleanSamples, Tensor<T> noise, int[] timesteps)

Parameters

cleanSamples Tensor<T>

The original clean samples.

noise Tensor<T>

The noise to add (typically sampled from standard normal).

timesteps int[]

The timesteps at which to compute loss (one per sample).

Returns

T

The computed loss value.

Remarks

The standard diffusion training loss is the mean squared error between the actual noise and the model's predicted noise.

For Beginners: During training:

  1. Take a clean image
  2. Add known noise to it at a random timestep
  3. Ask the model to predict what noise was added
  4. Compare the prediction to the actual noise (this is the loss)
  5. Update the model to make better predictions
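A hedged training-loop sketch using this method. SampleGaussianLike, RandomTimesteps, dataLoader, batchSize, and maxT are assumed helpers and values for illustration, not AiDotNet APIs:

```csharp
// One epoch of diffusion training against an IDiffusionModel<float>.
var rng = new Random();
foreach (Tensor<float> cleanBatch in dataLoader)
{
    // Sample fresh Gaussian noise with the same shape as the batch.
    Tensor<float> noise = SampleGaussianLike(cleanBatch, rng);

    // Pick one random timestep per sample in the batch.
    int[] timesteps = RandomTimesteps(batchSize, maxT, rng);

    // MSE between actual noise and the model's predicted noise.
    float loss = model.ComputeLoss(cleanBatch, noise, timesteps);

    // ...backpropagate and update parameters via the inherited
    // IGradientComputable<T, ...> / IFullModel<T, ...> members...
}
```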

Generate(int[], int, int?)

Generates samples by iteratively denoising from random noise.

Tensor<T> Generate(int[] shape, int numInferenceSteps = 50, int? seed = null)

Parameters

shape int[]

The shape of samples to generate (e.g., [batchSize, channels, height, width]).

numInferenceSteps int

Number of denoising steps. More steps = higher quality, slower.

seed int?

Optional random seed for reproducibility. If null, uses system random.

Returns

Tensor<T>

Generated samples as a tensor.

Remarks

This is the main generation method. It starts with random noise and applies the reverse diffusion process to generate new samples.

For Beginners: This is how you create new images/data:

  1. Start with pure random noise (like TV static)
  2. Ask the model "what does this look like minus some noise?"
  3. Repeat many times, each time removing a bit more noise
  4. End with a clean generated sample

More inference steps = cleaner results but slower generation. Typical values: 20-50 for fast generation, 100-200 for high quality.
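For example, the same model and seed can be run at different step counts to trade speed for quality ("model" is any IDiffusionModel<float> implementation; shapes and counts are illustrative):

```csharp
var shape = new[] { 1, 3, 64, 64 };

// Fast draft: fewer denoising steps, lower quality.
Tensor<float> draft = model.Generate(shape, numInferenceSteps: 20, seed: 7);

// Higher quality: more steps, slower; the fixed seed keeps runs reproducible.
Tensor<float> refined = model.Generate(shape, numInferenceSteps: 150, seed: 7);
```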

PredictNoise(Tensor<T>, int)

Predicts the noise in a noisy sample at a given timestep.

Tensor<T> PredictNoise(Tensor<T> noisySample, int timestep)

Parameters

noisySample Tensor<T>

The noisy input sample.

timestep int

The current timestep in the diffusion process.

Returns

Tensor<T>

The predicted noise tensor.

Remarks

This is the core prediction that the model learns. Given a noisy sample at timestep t, predict what noise was added to create it.

For Beginners: The model looks at a noisy image and guesses "what noise was added to make it look like this?" This prediction is then used to remove that noise and get a cleaner image.
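One reverse step can be sketched in scalar form. In practice PredictNoise returns a Tensor<T> and INoiseScheduler<T> applies the update element-wise; the DDIM-style estimate and values below are an assumed illustration, not part of this interface:

```csharp
using System;

// Using a noise prediction to estimate the clean sample at timestep t.
double alphaBar = 0.5;        // cumulative signal fraction at timestep t
double xt = 0.9;              // noisy value at timestep t
double epsHat = 0.3;          // model's predicted noise for xt

// x0_hat = (x_t - sqrt(1 - alphaBar_t) * epsHat) / sqrt(alphaBar_t)
double x0Hat = (xt - Math.Sqrt(1.0 - alphaBar) * epsHat) / Math.Sqrt(alphaBar);
Console.WriteLine(x0Hat);     // one step closer to a clean sample
```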