Interface IDiffusionModel<T>
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Interface for diffusion-based generative models.
public interface IDiffusionModel<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T
The numeric type used for calculations.
Remarks
Diffusion models are a class of generative models that learn to create data by reversing a gradual noising process. They have achieved state-of-the-art results in image generation, audio synthesis, and other generative tasks.
For Beginners: Diffusion models are like learning to reverse a process of adding static to a TV signal.
How diffusion works:
- Forward process (training): Start with real data, gradually add noise until it's pure static
- Reverse process (generation): Start with pure static, gradually remove noise to create new data
The model learns: "Given this noisy version, what did the original look like?"
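As background, in the standard DDPM formulation (a general reference equation, not necessarily this library's exact parameterization) the noising step has a simple closed form, where x_0 is the clean sample, ε is Gaussian noise, and ᾱ_t is the cumulative noise-schedule coefficient at timestep t:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

The network is trained to recover ε from x_t and t, which is exactly what PredictNoise below exposes.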
This is different from other generative models:
- GANs: Two networks competing (generator vs discriminator)
- VAEs: Compress and decompress through a bottleneck
- Diffusion: Iteratively denoise from random noise
Diffusion models are known for:
- High quality outputs (often better than GANs)
- Stable training (no mode collapse)
- Good diversity (produces varied outputs)
- Slower generation (many denoising steps needed)
Key components:
- Noise prediction model: A neural network that predicts noise in images
- Noise scheduler: Controls the noise schedule (see INoiseScheduler<T>)
- Loss function: Measures how well the model predicts noise (usually MSE)
This interface extends IFullModel<T, TInput, TOutput> to provide a consistent API for diffusion models while inheriting all the standard model capabilities (training, saving, loading, gradients, checkpointing, etc.).
Properties
Scheduler
Gets the noise scheduler used for the diffusion process.
INoiseScheduler<T> Scheduler { get; }
Property Value
- INoiseScheduler<T>
Remarks
The scheduler controls the noise schedule and denoising steps during generation. Different schedulers offer different tradeoffs between quality and speed:
- DDPM: Original scheduler, high quality but slow (1000 steps)
- DDIM: Deterministic, allows fewer steps (20-100)
- PNDM: Fast multi-step scheduler (20-50 steps)
Methods
ComputeLoss(Tensor<T>, Tensor<T>, int[])
Computes the training loss for a batch of samples.
T ComputeLoss(Tensor<T> cleanSamples, Tensor<T> noise, int[] timesteps)
Parameters
cleanSamples Tensor<T>
The original clean samples.
noise Tensor<T>
The noise to add (typically sampled from standard normal).
timesteps int[]
The timesteps at which to compute loss (one per sample).
Returns
- T
The computed loss value.
Remarks
The standard diffusion training loss is the mean squared error between the actual noise and the model's predicted noise.
For Beginners: During training:
1. Take a clean image
2. Add known noise to it at a random timestep
3. Ask the model to predict what noise was added
4. Compare the prediction to the actual noise (this is the loss)
5. Update the model to make better predictions
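A minimal training-loop sketch following those steps (assumes `model` is an existing IDiffusionModel<double> and `cleanBatch` holds a batch of training samples; SampleStandardNormal and SampleTimesteps are hypothetical helpers used for illustration, not AiDotNet APIs):

```csharp
int[] batchShape = { 16, 3, 64, 64 };   // example: 16 RGB images at 64x64

// Steps 1-2: take clean samples, then draw the noise to add and one random
// timestep per sample; ComputeLoss applies the noising internally.
Tensor<double> noise = SampleStandardNormal(batchShape);          // hypothetical helper
int[] timesteps = SampleTimesteps(count: 16, maxTimestep: 1000);  // hypothetical helper

// Steps 3-4: the model predicts the added noise and the returned value is the
// mean squared error between that prediction and the actual noise.
double loss = model.ComputeLoss(cleanBatch, noise, timesteps);

// Step 5: update the parameters using the training facilities inherited from
// IFullModel<T, Tensor<T>, Tensor<T>> (not shown here).
```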
Generate(int[], int, int?)
Generates samples by iteratively denoising from random noise.
Tensor<T> Generate(int[] shape, int numInferenceSteps = 50, int? seed = null)
Parameters
shape int[]
The shape of samples to generate (e.g., [batchSize, channels, height, width]).
numInferenceSteps int
Number of denoising steps. More steps = higher quality, slower.
seed int?
Optional random seed for reproducibility. If null, uses system random.
Returns
- Tensor<T>
Generated samples as a tensor.
Remarks
This is the main generation method. It starts with random noise and applies the reverse diffusion process to generate new samples.
For Beginners: This is how you create new images/data:
1. Start with pure random noise (like TV static)
2. Ask the model "what does this look like minus some noise?"
3. Repeat many times, each time removing a bit more noise
4. End with a clean generated sample
More inference steps = cleaner results but slower generation. Typical values: 20-50 for fast generation, 100-200 for high quality.
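For example (a usage sketch assuming `model` is an existing IDiffusionModel<double>; the shape follows the [batchSize, channels, height, width] convention shown above):

```csharp
// Generate 4 RGB images at 64x64 resolution, seeded for reproducibility.
Tensor<double> samples = model.Generate(
    shape: new[] { 4, 3, 64, 64 },
    numInferenceSteps: 50,   // 20-50 for fast drafts, 100-200 for higher quality
    seed: 42);
```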
PredictNoise(Tensor<T>, int)
Predicts the noise in a noisy sample at a given timestep.
Tensor<T> PredictNoise(Tensor<T> noisySample, int timestep)
Parameters
noisySample Tensor<T>
The noisy input sample.
timestep int
The current timestep in the diffusion process.
Returns
- Tensor<T>
The predicted noise tensor.
Remarks
This is the core prediction that the model learns. Given a noisy sample at timestep t, predict what noise was added to create it.
For Beginners: The model looks at a noisy image and guesses "what noise was added to make it look like this?" This prediction is then used to remove that noise and get a cleaner image.
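A sketch of where this call sits in a single denoising step (assumes `model` and a current `noisySample`; the update from predicted noise to a cleaner sample is handled by the scheduler, whose API is not documented on this page, so it is only indicated in a comment):

```csharp
// During generation, timesteps count down toward 0; pick one step as an example.
int t = 500;

// Ask the model what noise it believes is present in the sample at timestep t.
Tensor<double> predictedNoise = model.PredictNoise(noisySample, t);

// model.Scheduler then uses this prediction to produce the slightly cleaner
// sample for timestep t - 1; Generate runs this loop over all timesteps for you.
```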