Interface ILatentDiffusionModel<T>

Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll

Interface for latent diffusion models that operate in a compressed latent space.

public interface ILatentDiffusionModel<T> : IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Remarks

Latent diffusion models are a highly efficient variant of diffusion models that perform the denoising process in a compressed latent space rather than pixel space. This is the architecture behind Stable Diffusion and many other state-of-the-art generative models.

For Beginners: Latent diffusion combines the power of diffusion models with the efficiency of autoencoders.

How it works:

  1. A VAE compresses images (512x512) into small latents (64x64)
  2. Diffusion happens in this compressed space (much faster!)
  3. The VAE decompresses the result back to a full image

Benefits:

  • Training is ~50x faster than pixel-space diffusion
  • Generation is ~50x faster
  • Quality remains very high
  • Enables practical high-resolution generation

Key components:

  • VAE: Compresses and decompresses images
  • Noise Predictor (U-Net/DiT): Predicts noise in latent space
  • Scheduler: Controls the denoising process
  • Conditioner: Encodes text/images for guided generation

This interface extends IDiffusionModel<T> with latent-space specific operations.
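
Example

A minimal sketch of how these components fit together, assuming you already have an ILatentDiffusionModel<float> instance from a concrete implementation (the variable name model is illustrative, not part of the API):

// Assumes: using System; using AiDotNet.Interfaces;
// Inspect the key components described above.
IVAEModel<float> vae = model.VAE;                             // compresses and decompresses images
INoisePredictor<float> predictor = model.NoisePredictor;      // predicts noise in latent space
IConditioningModule<float>? conditioner = model.Conditioner;  // encodes prompts (may be null)

Console.WriteLine($"Latent channels: {model.LatentChannels}");        // typically 4
Console.WriteLine($"Default guidance scale: {model.GuidanceScale}");  // e.g., 7.5
Console.WriteLine($"Supports inpainting: {model.SupportsInpainting}");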

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

GuidanceScale

Gets the default guidance scale for classifier-free guidance.

double GuidanceScale { get; }

Property Value

double

Remarks

Higher values make generation more closely follow the conditioning. Typical values: 7.5 for Stable Diffusion, 5.0 for SDXL.

LatentChannels

Gets the number of latent channels.

int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

SupportsInpainting

Gets whether this model supports inpainting.

bool SupportsInpainting { get; }

Property Value

bool

SupportsNegativePrompt

Gets whether this model supports negative prompts.

bool SupportsNegativePrompt { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

DecodeFromLatent(Tensor<T>)

Decodes a latent representation back to an image.

Tensor<T> DecodeFromLatent(Tensor<T> latent)

Parameters

latent Tensor<T>

The latent tensor.

Returns

Tensor<T>

The decoded image tensor [batch, channels, height, width].

Remarks

For Beginners: This decompresses a latent back to an image:

  • Input: Small latent (e.g., 64x64x4)
  • Output: Full-size image (e.g., 512x512x3)
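
Example

A short sketch, assuming model is an ILatentDiffusionModel<float> and latent is a Tensor<float> obtained earlier (for example from EncodeToLatent or from the denoising loop), with shape [1, 4, 64, 64]:

// Decompress the latent back to pixel space.
Tensor<float> image = model.DecodeFromLatent(latent);
// With a 4-channel 64x64 latent and an 8x VAE downsample factor,
// the decoded image is typically [1, 3, 512, 512] (batch, channels, height, width).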

EncodeToLatent(Tensor<T>, bool)

Encodes an image into latent space.

Tensor<T> EncodeToLatent(Tensor<T> image, bool sampleMode = true)

Parameters

image Tensor<T>

The input image tensor [batch, channels, height, width].

sampleMode bool

Whether to sample from the VAE distribution.

Returns

Tensor<T>

The latent representation.

Remarks

For Beginners: This compresses an image for processing:

  • Input: Full-size image (e.g., 512x512)
  • Output: Small latent (e.g., 64x64x4)

Use sampleMode=true during training for VAE regularization, and sampleMode=false for deterministic encoding during editing.
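
Example

A small sketch, assuming model is an ILatentDiffusionModel<float> and image is a Tensor<float> with shape [batch, channels, height, width], e.g. [1, 3, 512, 512]:

// Deterministic encoding (no sampling) - the usual choice for editing workflows.
Tensor<float> latent = model.EncodeToLatent(image, sampleMode: false);

// Stochastic encoding (sample from the VAE distribution) - typically used during training.
Tensor<float> sampledLatent = model.EncodeToLatent(image, sampleMode: true);

// Round trip: decoding the latent yields an approximation of the original image.
Tensor<float> reconstruction = model.DecodeFromLatent(latent);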

GenerateFromText(string, string?, int, int, int, double?, int?)

Generates images from text prompts using classifier-free guidance.

Tensor<T> GenerateFromText(string prompt, string? negativePrompt = null, int width = 512, int height = 512, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

prompt string

The text prompt describing the desired image.

negativePrompt string

Optional negative prompt (what to avoid).

width int

Image width in pixels (should be divisible by the VAE downsample factor).

height int

Image height in pixels (should be divisible by the VAE downsample factor).

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

How closely to follow the prompt (higher = closer).

seed int?

Optional random seed for reproducibility.

Returns

Tensor<T>

The generated image tensor.

Remarks

This is the main text-to-image generation method. It performs:

  1. Encode the text prompt to conditioning embeddings
  2. Generate random latent noise
  3. Iteratively denoise with classifier-free guidance
  4. Decode the latent to an image

For Beginners: This is how you generate images from text:

  • prompt: What you want ("a cat in a spacesuit")
  • negativePrompt: What to avoid ("blurry, low quality")
  • guidanceScale: How strictly to follow the prompt (7.5 is typical)
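
Example

A usage sketch, assuming model is an ILatentDiffusionModel<float> instance:

// Text-to-image with a negative prompt and a fixed seed for reproducibility.
Tensor<float> image = model.GenerateFromText(
    prompt: "a cat in a spacesuit, highly detailed",
    negativePrompt: "blurry, low quality",
    width: 512,
    height: 512,
    numInferenceSteps: 50,
    guidanceScale: 7.5,
    seed: 42);
// The result is an image tensor [batch, channels, height, width], ready to post-process or save.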

ImageToImage(Tensor<T>, string, string?, double, int, double?, int?)

Performs image-to-image generation (style transfer, editing).

Tensor<T> ImageToImage(Tensor<T> inputImage, string prompt, string? negativePrompt = null, double strength = 0.8, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

inputImage Tensor<T>

The input image to transform.

prompt string

The text prompt describing the desired transformation.

negativePrompt string

Optional negative prompt.

strength double

How much to transform (0.0 = no change, 1.0 = full regeneration).

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

The transformed image tensor.

Remarks

Image-to-image works by:

  1. Encode the input image to a latent
  2. Add noise to the latent (controlled by strength)
  3. Denoise with text guidance
  4. Decode back to an image

For Beginners: This transforms an existing image based on a prompt:

  • strength=0.3: Minor changes, keeps most of the original
  • strength=0.7: Major changes, but composition remains
  • strength=1.0: Complete regeneration, original is just a starting point
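
Example

A sketch of image-to-image editing, assuming model is an ILatentDiffusionModel<float> and photo is a Tensor<float> holding the input image (e.g., [1, 3, 512, 512]):

// Restyle an existing photo while keeping its composition (strength 0.7).
Tensor<float> stylized = model.ImageToImage(
    inputImage: photo,
    prompt: "oil painting, impressionist style",
    negativePrompt: "blurry, low quality",
    strength: 0.7,
    numInferenceSteps: 50,
    seed: 123);

// A lower strength (e.g., 0.3) would keep much more of the original image.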

Inpaint(Tensor<T>, Tensor<T>, string, string?, int, double?, int?)

Performs inpainting (filling in masked regions).

Tensor<T> Inpaint(Tensor<T> inputImage, Tensor<T> mask, string prompt, string? negativePrompt = null, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

inputImage Tensor<T>

The input image with areas to inpaint.

mask Tensor<T>

Binary mask where 1 = inpaint, 0 = keep original.

prompt string

Text prompt describing what to generate in the masked area.

negativePrompt string

Optional negative prompt.

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

The inpainted image tensor.

Remarks

Inpainting fills in masked regions while keeping unmasked areas intact. The mask should be the same spatial size as the image.

For Beginners: This is like a smart "fill" tool:

  • Draw a mask over what you want to replace
  • Describe what should go there
  • The model generates content that blends naturally
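
Example

A sketch of inpainting, assuming model is an ILatentDiffusionModel<float>, photo is the input image tensor, and mask is a binary Tensor<float> of the same spatial size (1 = inpaint, 0 = keep):

if (model.SupportsInpainting)
{
    // Replace the masked region with generated content that blends with the rest of the image.
    Tensor<float> result = model.Inpaint(
        inputImage: photo,
        mask: mask,
        prompt: "a wooden park bench",
        negativePrompt: "blurry, distorted",
        numInferenceSteps: 50,
        seed: 7);
}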

SetGuidanceScale(double)

Sets the guidance scale for classifier-free guidance.

void SetGuidanceScale(double scale)

Parameters

scale double

The guidance scale (typically 1.0-20.0).
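
Example

A small sketch, assuming model is an ILatentDiffusionModel<float>:

// Raise the default guidance scale so generations follow the prompt more strictly.
model.SetGuidanceScale(9.0);
Console.WriteLine(model.GuidanceScale); // presumably reflects the new default (9.0)

// Later calls that leave guidanceScale null would be expected to fall back to this default.
Tensor<float> image = model.GenerateFromText("a lighthouse at sunset");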