Class LatentDiffusionModelBase<T>

Namespace
AiDotNet.Diffusion
Assembly
AiDotNet.dll

Base class for latent diffusion models that operate in a compressed latent space.

public abstract class LatentDiffusionModelBase<T> : DiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
object
DiffusionModelBase<T>
LatentDiffusionModelBase<T>

Implements
ILatentDiffusionModel<T>
IDiffusionModel<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>

Remarks

This abstract base class provides common functionality for all latent diffusion models, including encoding/decoding, text-to-image generation, image-to-image transformation, and inpainting.

For Beginners: This is the foundation for latent diffusion models like Stable Diffusion. It combines a VAE (for compression), a noise predictor (for denoising), and optional conditioning (for guided generation from text or images).

Constructors

LatentDiffusionModelBase(DiffusionModelOptions<T>?, INoiseScheduler<T>?)

Initializes a new instance of the LatentDiffusionModelBase class.

protected LatentDiffusionModelBase(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null)

Parameters

options DiffusionModelOptions<T>

Configuration options for the diffusion model.

scheduler INoiseScheduler<T>

Optional custom scheduler.
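
Example

Because the class is abstract, you work with a concrete subclass. Below is a minimal sketch of what a derived model might look like, assuming the noise predictor and VAE are supplied by the caller; any remaining abstract members inherited from the base classes would also need overrides.

// Hypothetical minimal subclass, for illustration only; not part of the library.
public sealed class MyLatentModel : LatentDiffusionModelBase<float>
{
    public MyLatentModel(
        INoisePredictor<float> predictor,
        IVAEModel<float> vae,
        DiffusionModelOptions<float>? options = null)
        : base(options) // the default scheduler is used when none is supplied
    {
        NoisePredictor = predictor;
        VAE = vae;
    }

    public override INoisePredictor<float> NoisePredictor { get; }
    public override IVAEModel<float> VAE { get; }
    public override IConditioningModule<float>? Conditioner => null; // unconditioned
    public override int LatentChannels => 4; // typical for Stable Diffusion VAEs
}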

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public abstract IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

GuidanceScale

Gets the default guidance scale for classifier-free guidance.

public virtual double GuidanceScale { get; }

Property Value

double

Remarks

Higher values make generation more closely follow the conditioning. Typical values: 7.5 for Stable Diffusion, 5.0 for SDXL.

LatentChannels

Gets the number of latent channels.

public abstract int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public abstract INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

SupportsInpainting

Gets whether this model supports inpainting.

public virtual bool SupportsInpainting { get; }

Property Value

bool

SupportsNegativePrompt

Gets whether this model supports negative prompts.

public virtual bool SupportsNegativePrompt { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

public abstract IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

ApplyGuidance(Tensor<T>, Tensor<T>, double)

Applies classifier-free guidance to combine conditional and unconditional predictions.

protected virtual Tensor<T> ApplyGuidance(Tensor<T> unconditional, Tensor<T> conditional, double scale)

Parameters

unconditional Tensor<T>

The unconditional noise prediction.

conditional Tensor<T>

The conditional noise prediction.

scale double

The guidance scale.

Returns

Tensor<T>

The guided noise prediction.
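
Example

The standard classifier-free guidance combination is a linear extrapolation from the unconditional prediction toward the conditional one. A float-array sketch of the per-element arithmetic (the real method operates on Tensor<T>):

// guided = unconditional + scale * (conditional - unconditional)
// scale = 1.0 reproduces the conditional prediction exactly;
// scale > 1.0 pushes the result further toward the conditioning.
static float[] ApplyCfg(float[] uncond, float[] cond, double scale)
{
    var guided = new float[uncond.Length];
    for (int i = 0; i < uncond.Length; i++)
        guided[i] = uncond[i] + (float)scale * (cond[i] - uncond[i]);
    return guided;
}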

BlendLatentsWithMask(Tensor<T>, Tensor<T>, Tensor<T>, int)

Blends generated latents with original latents based on mask for inpainting.

protected virtual Tensor<T> BlendLatentsWithMask(Tensor<T> generated, Tensor<T> original, Tensor<T> mask, int timestep)

Parameters

generated Tensor<T>

The generated latents.

original Tensor<T>

The original latents.

mask Tensor<T>

The mask (1 = inpaint, 0 = keep original).

timestep int

Current timestep for noise addition to original.

Returns

Tensor<T>

Blended latents.
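
Example

A float-array sketch of the usual per-element masked blend. Re-noising the original latents to the current timestep (so they match the noise level of the generated latents) is abstracted here as an already-noised input:

// mask = 1 keeps the generated value,
// mask = 0 keeps the (re-noised) original value.
static float[] Blend(float[] generated, float[] noisedOriginal, float[] mask)
{
    var blended = new float[generated.Length];
    for (int i = 0; i < generated.Length; i++)
        blended[i] = mask[i] * generated[i] + (1f - mask[i]) * noisedOriginal[i];
    return blended;
}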

DecodeFromLatent(Tensor<T>)

Decodes a latent representation back to an image.

public virtual Tensor<T> DecodeFromLatent(Tensor<T> latent)

Parameters

latent Tensor<T>

The latent tensor.

Returns

Tensor<T>

The decoded image tensor [batch, channels, height, width].

Remarks

For Beginners: This decompresses a latent back to an image:

  • Input: small latent (e.g., 64x64x4)
  • Output: full-size image (e.g., 512x512x3)

EncodeToLatent(Tensor<T>, bool)

Encodes an image into latent space.

public virtual Tensor<T> EncodeToLatent(Tensor<T> image, bool sampleMode = true)

Parameters

image Tensor<T>

The input image tensor [batch, channels, height, width].

sampleMode bool

Whether to sample from the VAE distribution.

Returns

Tensor<T>

The latent representation.

Remarks

For Beginners: This compresses an image for processing:

  • Input: full-size image (e.g., 512x512)
  • Output: small latent (e.g., 64x64x4)

Use sampleMode=true during training for VAE regularization, and sampleMode=false for deterministic encoding during editing.
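
Example

A usage sketch of an encode/decode round trip, exercising both EncodeToLatent and DecodeFromLatent; it assumes model is a concrete LatentDiffusionModelBase<float> and image is a [1, 3, 512, 512] tensor:

// Deterministic encode (useful for editing), then decode back to pixels.
Tensor<float> latent = model.EncodeToLatent(image, sampleMode: false);
// With a typical 8x-downsampling, 4-channel VAE the latent would be
// [1, 4, 64, 64]; the exact shape depends on the concrete VAE.
Tensor<float> reconstructed = model.DecodeFromLatent(latent);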

Generate(int[], int, int?)

Generates samples by iteratively denoising from random noise.

public override Tensor<T> Generate(int[] shape, int numInferenceSteps = 50, int? seed = null)

Parameters

shape int[]

The shape of samples to generate (e.g., [batchSize, channels, height, width]).

numInferenceSteps int

Number of denoising steps. More steps = higher quality, slower.

seed int?

Optional random seed for reproducibility. If null, uses system random.

Returns

Tensor<T>

Generated samples as a tensor.

Remarks

This is the main generation method. It starts with random noise and applies the reverse diffusion process to generate new samples.

For Beginners: This is how you create new images/data:

  1. Start with pure random noise (like TV static).
  2. Ask the model "what does this look like minus some noise?"
  3. Repeat many times, each time removing a bit more noise.
  4. End with a clean generated sample.

More inference steps = cleaner results but slower generation. Typical values: 20-50 for fast generation, 100-200 for high quality.
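
Example

A usage sketch, assuming model is a concrete LatentDiffusionModelBase<float>. The shape is given in latent dimensions on the assumption that this override denoises in latent space; check the concrete model for the expected shape:

// Generate one sample with 50 denoising steps and a fixed seed
// so the result is reproducible.
Tensor<float> sample = model.Generate(
    shape: new[] { 1, 4, 64, 64 },
    numInferenceSteps: 50,
    seed: 42);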

GenerateFromText(string, string?, int, int, int, double?, int?)

Generates images from text prompts using classifier-free guidance.

public virtual Tensor<T> GenerateFromText(string prompt, string? negativePrompt = null, int width = 512, int height = 512, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

prompt string

The text prompt describing the desired image.

negativePrompt string

Optional negative prompt (what to avoid).

width int

Image width in pixels (should be divisible by the VAE downsample factor).

height int

Image height in pixels (should be divisible by the VAE downsample factor).

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

How closely to follow the prompt (higher = closer).

seed int?

Optional random seed for reproducibility.

Returns

Tensor<T>

The generated image tensor.

Remarks

This is the main text-to-image generation method. It performs:

  1. Encode the text prompts to conditioning embeddings.
  2. Generate random latent noise.
  3. Iteratively denoise with classifier-free guidance.
  4. Decode the latent to an image.

For Beginners: This is how you generate images from text:

  • prompt: what you want ("a cat in a spacesuit")
  • negativePrompt: what to avoid ("blurry, low quality")
  • guidanceScale: how strictly to follow the prompt (7.5 is typical)
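
Example

A usage sketch, assuming model is a concrete LatentDiffusionModelBase<float>:

// Text-to-image with a negative prompt and a typical guidance scale.
Tensor<float> image = model.GenerateFromText(
    prompt: "a cat in a spacesuit",
    negativePrompt: "blurry, low quality",
    width: 512,
    height: 512,
    numInferenceSteps: 50,
    guidanceScale: 7.5,
    seed: 42);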

ImageToImage(Tensor<T>, string, string?, double, int, double?, int?)

Performs image-to-image generation (style transfer, editing).

public virtual Tensor<T> ImageToImage(Tensor<T> inputImage, string prompt, string? negativePrompt = null, double strength = 0.8, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

inputImage Tensor<T>

The input image to transform.

prompt string

The text prompt describing the desired transformation.

negativePrompt string

Optional negative prompt.

strength double

How much to transform (0.0 = no change, 1.0 = full regeneration).

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

The transformed image tensor.

Remarks

Image-to-image works by:

  1. Encode the input image to a latent.
  2. Add noise to the latent (controlled by strength).
  3. Denoise with text guidance.
  4. Decode back to an image.

For Beginners: This transforms an existing image based on a prompt:

  • strength=0.3: Minor changes, keeps most of the original
  • strength=0.7: Major changes, but composition remains
  • strength=1.0: Complete regeneration, original is just a starting point
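
Example

A usage sketch, assuming photo is a [1, 3, 512, 512] tensor holding the source image:

// Restyle an existing image while keeping its composition.
Tensor<float> stylized = model.ImageToImage(
    inputImage: photo,
    prompt: "oil painting, impressionist style",
    strength: 0.7, // major changes, composition preserved
    numInferenceSteps: 50,
    seed: 42);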

Inpaint(Tensor<T>, Tensor<T>, string, string?, int, double?, int?)

Performs inpainting (filling in masked regions).

public virtual Tensor<T> Inpaint(Tensor<T> inputImage, Tensor<T> mask, string prompt, string? negativePrompt = null, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

inputImage Tensor<T>

The input image with areas to inpaint.

mask Tensor<T>

Binary mask where 1 = inpaint, 0 = keep original.

prompt string

Text prompt describing what to generate in the masked area.

negativePrompt string

Optional negative prompt.

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

The inpainted image tensor.

Remarks

Inpainting fills in masked regions while keeping unmasked areas intact. The mask should be the same spatial size as the image.

For Beginners: This is like a smart "fill" tool:

  • Draw a mask over what you want to replace.
  • Describe what should go there.
  • The model generates content that blends naturally.
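
Example

A usage sketch, assuming photo and skyMask are pre-built tensors; the mask is [1, 1, height, width] with 1 where content should be regenerated:

// Replace the masked region (here, the sky) with generated content.
Tensor<float> result = model.Inpaint(
    inputImage: photo,
    mask: skyMask,
    prompt: "dramatic sunset clouds",
    negativePrompt: "artifacts, seams",
    numInferenceSteps: 50,
    seed: 42);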

PredictNoise(Tensor<T>, int)

Predicts the noise in a noisy sample at a given timestep.

public override Tensor<T> PredictNoise(Tensor<T> noisySample, int timestep)

Parameters

noisySample Tensor<T>

The noisy input sample.

timestep int

The current timestep in the diffusion process.

Returns

Tensor<T>

The predicted noise tensor.

Remarks

This is the core prediction that the model learns. Given a noisy sample at timestep t, predict what noise was added to create it.

For Beginners: The model looks at a noisy image and guesses "what noise was added to make it look like this?" This prediction is then used to remove that noise and get a cleaner image.
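
Example

A sketch of how the prediction is consumed in the reverse loop. Note that scheduler.Step is a hypothetical stand-in for whatever update rule the configured INoiseScheduler<T> exposes; this page does not document its exact signature:

// Conceptual reverse-diffusion loop (simplified).
foreach (int t in timesteps) // e.g., 999, 979, ..., 0
{
    Tensor<float> predictedNoise = model.PredictNoise(latents, t);
    latents = scheduler.Step(predictedNoise, t, latents); // remove a little noise
}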

ResizeMaskToLatent(Tensor<T>, int[])

Resizes a mask tensor to match latent dimensions.

protected virtual Tensor<T> ResizeMaskToLatent(Tensor<T> mask, int[] latentShape)

Parameters

mask Tensor<T>

The original mask [batch, 1, height, width].

latentShape int[]

The target latent shape.

Returns

Tensor<T>

The resized mask matching latent dimensions.

SampleNoiseTensor(int[], Random)

Samples a noise tensor from standard normal distribution.

protected virtual Tensor<T> SampleNoiseTensor(int[] shape, Random rng)

Parameters

shape int[]

The shape of the tensor.

rng Random

Random number generator.

Returns

Tensor<T>

A tensor filled with Gaussian noise.
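
Example

One common way such a helper can be implemented is the Box-Muller transform, which turns two uniform samples into one standard normal sample. A float-array sketch (the real method fills a Tensor<T>; the actual implementation may differ):

static float[] SampleStandardNormal(int count, Random rng)
{
    var values = new float[count];
    for (int i = 0; i < count; i++)
    {
        double u1 = 1.0 - rng.NextDouble(); // avoid Log(0)
        double u2 = rng.NextDouble();
        values[i] = (float)(Math.Sqrt(-2.0 * Math.Log(u1)) *
                            Math.Cos(2.0 * Math.PI * u2));
    }
    return values;
}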

SetGuidanceScale(double)

Sets the guidance scale for classifier-free guidance.

public virtual void SetGuidanceScale(double scale)

Parameters

scale double

The guidance scale (typically 1.0-20.0).
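
Example

A usage sketch: raising the default scale makes later generations follow their prompts more strictly.

// Applies to subsequent calls that do not pass an explicit guidanceScale.
model.SetGuidanceScale(12.0);
Tensor<float> strict = model.GenerateFromText("a red bicycle");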