Class SDXLModel<T>
Stable Diffusion XL (SDXL) model for high-resolution image generation.
public class SDXLModel<T> : LatentDiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
Examples
// Create an SDXL model
var sdxl = new SDXLModel<float>();

// Generate a high-resolution image
var image = sdxl.GenerateFromText(
    prompt: "A majestic dragon perched on a mountain peak at sunset, highly detailed",
    negativePrompt: "blurry, low quality, distorted",
    width: 1024,
    height: 1024,
    numInferenceSteps: 30,
    guidanceScale: 7.5,
    seed: 42);

// Generate with micro-conditioning for aspect ratio
var wideImage = sdxl.GenerateWithMicroCondition(
    prompt: "Panoramic landscape with mountains and lake",
    width: 1536,
    height: 640,
    originalWidth: 1536,
    originalHeight: 640,
    cropTop: 0,
    cropLeft: 0);

// Use the refiner for enhanced details
if (sdxl.SupportsRefiner)
{
    var refined = sdxl.RefineImage(image, "enhance details");
}
Remarks
SDXL is Stability AI's flagship text-to-image model, designed for high-quality 1024x1024 image generation with improved prompt understanding and visual fidelity compared to earlier Stable Diffusion versions.
For Beginners: SDXL is like Stable Diffusion 2.0 but significantly upgraded:
Key improvements over SD 1.5/2.0:
- Roughly 3x larger U-Net (2.6B vs 865M parameters)
- Dual text encoders (better prompt understanding)
- Native 1024x1024 resolution (vs 512x512)
- Optional refiner model for enhanced details
How SDXL works:
- Your prompt goes through TWO text encoders (CLIP + OpenCLIP)
- These embeddings guide a much larger U-Net during denoising
- The base model generates at 1024x1024
- (Optional) A refiner model enhances fine details
Example prompt flow: "A majestic dragon" -> [CLIP] + [OpenCLIP] -> Combined embedding -> Large U-Net denoises -> 1024x1024 image -> (Optional) Refiner -> Enhanced details
Use SDXL when you need:
- High resolution output
- Better text rendering in images
- More detailed and coherent images
- Following complex prompts accurately
Technical specifications:
- Base model: 2.6B parameter U-Net
- Text encoders: CLIP ViT-L/14 + OpenCLIP ViT-bigG/14
- Native resolution: 1024x1024
- Latent space: 4 channels, 8x spatial downsampling
- Guidance scale: 5.0-9.0 recommended (7.5 default)
- Scheduler: DDPM/DPM++/Euler with 20-50 steps
Architecture details:
- Micro-conditioning: Size and crop coordinates for multi-aspect training
- Dual text encoding: Concatenated CLIP + OpenCLIP embeddings
- Channel multipliers: [1, 2, 4, 4] (vs [1, 2, 4, 8] in SD 2.x)
- Cross-attention dimension: 2048 (vs 768 in SD 1.x and 1024 in SD 2.x)
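The dual-encoding arithmetic can be sketched as follows. The per-encoder widths (768 for CLIP ViT-L/14, 1280 for OpenCLIP ViT-bigG/14) are assumptions based on the standard SDXL architecture; the library may combine embeddings differently:

```csharp
// Sketch: SDXL's 2048-wide cross-attention dimension arises from
// concatenating the two text encoders' embedding widths.
const int ClipDim = 768;       // CLIP ViT-L/14 width (assumed)
const int OpenClipDim = 1280;  // OpenCLIP ViT-bigG/14 width (assumed)
int crossAttentionDim = ClipDim + OpenClipDim;  // 2048
```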
Constructors
SDXLModel()
Initializes a new instance of SDXLModel with default parameters.
public SDXLModel()
Remarks
Creates an SDXL model with standard parameters:
- 1024x1024 native resolution
- 2048 cross-attention dimension
- Dual text encoder support
- DDIM scheduler with 50 steps
SDXLModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, UNetNoisePredictor<T>?, StandardVAE<T>?, IConditioningModule<T>?, IConditioningModule<T>?, SDXLRefiner<T>?, bool, int, int?)
Initializes a new instance of SDXLModel with custom parameters.
public SDXLModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, UNetNoisePredictor<T>? unet = null, StandardVAE<T>? vae = null, IConditioningModule<T>? conditioner1 = null, IConditioningModule<T>? conditioner2 = null, SDXLRefiner<T>? refiner = null, bool useDualEncoder = true, int crossAttentionDim = 2048, int? seed = null)
Parameters
- options (DiffusionModelOptions<T>): Configuration options for the diffusion model.
- scheduler (INoiseScheduler<T>): Optional custom scheduler.
- unet (UNetNoisePredictor<T>): Optional custom U-Net noise predictor.
- vae (StandardVAE<T>): Optional custom VAE.
- conditioner1 (IConditioningModule<T>): Optional primary text encoder (CLIP).
- conditioner2 (IConditioningModule<T>): Optional secondary text encoder (OpenCLIP).
- refiner (SDXLRefiner<T>): Optional refiner model.
- useDualEncoder (bool): Whether to use dual text encoders.
- crossAttentionDim (int): Cross-attention dimension (2048 for SDXL).
- seed (int?): Optional random seed for reproducibility.
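A sketch of constructing the model with a custom scheduler and a fixed seed, using only the parameters documented above. The DDIMScheduler<float> type name is an assumption; substitute whichever INoiseScheduler<T> implementation your project provides:

```csharp
// All components are optional; any left unspecified fall back to defaults.
var sdxl = new SDXLModel<float>(
    options: null,                          // default DiffusionModelOptions
    scheduler: new DDIMScheduler<float>(),  // hypothetical INoiseScheduler<T> implementation
    useDualEncoder: true,                   // CLIP + OpenCLIP
    crossAttentionDim: 2048,                // SDXL standard
    seed: 1234);                            // reproducible generation
```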
Fields
DefaultHeight
Default height for SDXL generation.
public const int DefaultHeight = 1024
Field Value
- int
DefaultWidth
Default width for SDXL generation.
public const int DefaultWidth = 1024
Field Value
- int
Properties
Conditioner
Gets the conditioning module (optional, for conditioned generation).
public override IConditioningModule<T>? Conditioner { get; }
Property Value
- IConditioningModule<T>
CrossAttentionDim
Gets the cross-attention dimension (2048 for SDXL).
public int CrossAttentionDim { get; }
Property Value
- int
LatentChannels
Gets the number of latent channels.
public override int LatentChannels { get; }
Property Value
- int
Remarks
Typically 4 for Stable Diffusion models.
NoisePredictor
Gets the noise predictor model (U-Net, DiT, etc.).
public override INoisePredictor<T> NoisePredictor { get; }
Property Value
- INoisePredictor<T>
ParameterCount
Gets the number of parameters in the model.
public override int ParameterCount { get; }
Property Value
- int
Remarks
This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
Refiner
Gets the refiner model if available.
public SDXLRefiner<T>? Refiner { get; }
Property Value
- SDXLRefiner<T>
SecondaryConditioner
Gets the secondary text encoder if available.
public IConditioningModule<T>? SecondaryConditioner { get; }
Property Value
- IConditioningModule<T>
SupportsRefiner
Gets whether this model has a refiner available.
public bool SupportsRefiner { get; }
Property Value
- bool
UsesDualEncoder
Gets whether this model uses dual text encoders.
public bool UsesDualEncoder { get; }
Property Value
- bool
VAE
Gets the VAE model used for encoding and decoding.
public override IVAEModel<T> VAE { get; }
Property Value
- IVAEModel<T>
Methods
Clone()
Creates a deep copy of the model.
public override IDiffusionModel<T> Clone()
Returns
- IDiffusionModel<T>
A new instance with the same parameters.
DeepCopy()
Creates a deep copy of this object.
public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
GenerateWithMicroCondition(string, string?, int, int, int?, int?, int, int, int, double?, int?)
Generates an image with micro-conditioning for multi-aspect ratio support.
public virtual Tensor<T> GenerateWithMicroCondition(string prompt, string? negativePrompt = null, int width = 1024, int height = 1024, int? originalWidth = null, int? originalHeight = null, int cropTop = 0, int cropLeft = 0, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)
Parameters
- prompt (string): The text prompt describing the desired image.
- negativePrompt (string): Optional negative prompt to guide away from.
- width (int): Output image width.
- height (int): Output image height.
- originalWidth (int?): Original target width for conditioning.
- originalHeight (int?): Original target height for conditioning.
- cropTop (int): Top crop coordinate for conditioning.
- cropLeft (int): Left crop coordinate for conditioning.
- numInferenceSteps (int): Number of denoising steps.
- guidanceScale (double?): Classifier-free guidance scale.
- seed (int?): Optional random seed.
Returns
- Tensor<T>
Generated image tensor.
Remarks
For Beginners: Micro-conditioning helps SDXL generate better images at various aspect ratios by telling the model about the target size and any cropping applied during training.
When generating at non-square resolutions:
- Set originalWidth/originalHeight to your target size
- Set cropTop/cropLeft to 0 for centered generation
- The model adjusts its generation accordingly
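For example, a portrait-orientation generation following this guidance might look like the sketch below (the prompt and parameter values are illustrative):

```csharp
var sdxl = new SDXLModel<float>();

// Portrait 832x1216 with centered (uncropped) conditioning.
var portrait = sdxl.GenerateWithMicroCondition(
    prompt: "Studio portrait of an astronaut, dramatic lighting",
    width: 832,
    height: 1216,
    originalWidth: 832,      // tell the model the intended size...
    originalHeight: 1216,
    cropTop: 0,              // ...and that nothing was cropped
    cropLeft: 0,
    numInferenceSteps: 30,
    guidanceScale: 7.5,
    seed: 7);
```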
GetParameters()
Gets the parameters that can be optimized.
public override Vector<T> GetParameters()
Returns
- Vector<T>
RefineImage(Tensor<T>, string, string?, int, double, int?)
Refines an image using the SDXL refiner model.
public virtual Tensor<T> RefineImage(Tensor<T> image, string prompt, string? negativePrompt = null, int numInferenceSteps = 25, double denoiseStrength = 0.3, int? seed = null)
Parameters
- image (Tensor<T>): The base image to refine.
- prompt (string): The text prompt (should match base generation).
- negativePrompt (string): Optional negative prompt.
- numInferenceSteps (int): Number of refiner steps (typically 20-30).
- denoiseStrength (double): How much to denoise (0.2-0.4 typical for refining).
- seed (int?): Optional random seed.
Returns
- Tensor<T>
Refined image tensor.
Remarks
For Beginners: The refiner is a specialized model that takes an already-generated image and enhances fine details:
Without refiner:
- Base SDXL generates good overall structure
- Some fine details may be slightly soft
With refiner:
- Details like skin texture, fabric, hair are enhanced
- Overall coherence is preserved
- Image looks more "finished"
Best practices:
- Use denoiseStrength 0.2-0.4 (higher = more change)
- Use 20-30 refiner steps
- Keep the same prompt as base generation
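The best practices above can be put together in a sketch like this (the prompt and numeric values are illustrative; the refiner only runs if one was supplied at construction):

```csharp
var sdxl = new SDXLModel<float>();
string prompt = "A knight in ornate armor, intricate engraving";

// Base pass at native resolution.
var baseImage = sdxl.GenerateFromText(
    prompt: prompt,
    width: 1024,
    height: 1024,
    numInferenceSteps: 30,
    guidanceScale: 7.5,
    seed: 42);

// Refine only if a refiner model was wired in.
if (sdxl.SupportsRefiner)
{
    baseImage = sdxl.RefineImage(
        image: baseImage,
        prompt: prompt,            // keep the same prompt as the base pass
        numInferenceSteps: 25,     // 20-30 recommended
        denoiseStrength: 0.3,      // 0.2-0.4 typical; higher = more change
        seed: 42);
}
```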
SetParameters(Vector<T>)
Sets the model parameters.
public override void SetParameters(Vector<T> parameters)
Parameters
- parameters (Vector<T>): The parameter vector to set.
Remarks
This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively.
If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
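A round-trip sketch combining GetParameters and SetParameters, e.g. for an external optimizer (the Length property on Vector<T> is an assumption; the optimizer step is elided):

```csharp
var sdxl = new SDXLModel<float>();

// Read the current parameter vector.
var parameters = sdxl.GetParameters();

// ... let an optimizer update `parameters` in place ...

// Write it back; the length must equal ParameterCount,
// otherwise an ArgumentException is thrown.
if (parameters.Length == sdxl.ParameterCount)
{
    sdxl.SetParameters(parameters);
}
```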
Exceptions
- ArgumentException
Thrown when the length of parameters does not match ParameterCount.