Class SDXLModel<T>

Namespace
AiDotNet.Diffusion.Models
Assembly
AiDotNet.dll

Stable Diffusion XL (SDXL) model for high-resolution image generation.

public class SDXLModel<T> : LatentDiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
LatentDiffusionModelBase<T>
SDXLModel<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Examples

// Create an SDXL model
var sdxl = new SDXLModel<float>();

// Generate a high-resolution image
var image = sdxl.GenerateFromText(
    prompt: "A majestic dragon perched on a mountain peak at sunset, highly detailed",
    negativePrompt: "blurry, low quality, distorted",
    width: 1024,
    height: 1024,
    numInferenceSteps: 30,
    guidanceScale: 7.5,
    seed: 42);

// Generate with micro-conditioning for aspect ratio
var wideImage = sdxl.GenerateWithMicroCondition(
    prompt: "Panoramic landscape with mountains and lake",
    width: 1536,
    height: 640,
    originalWidth: 1536,
    originalHeight: 640,
    cropTop: 0,
    cropLeft: 0);

// Use the refiner for enhanced details
if (sdxl.SupportsRefiner)
{
    var refined = sdxl.RefineImage(image, "enhance details");
}

Remarks

SDXL is Stability AI's flagship text-to-image model, designed for high-quality 1024x1024 image generation with improved prompt understanding and visual fidelity compared to earlier Stable Diffusion versions.

For Beginners: SDXL is like Stable Diffusion 2.0 but significantly upgraded:

Key improvements over SD 1.5/2.0:

  • ~3x larger U-Net (2.6B vs 865M parameters)
  • Dual text encoders (better prompt understanding)
  • Native 1024x1024 resolution (vs 512x512)
  • Optional refiner model for enhanced details

How SDXL works:

  1. Your prompt goes through TWO text encoders (CLIP + OpenCLIP)
  2. These embeddings guide a much larger U-Net during denoising
  3. The base model generates at 1024x1024
  4. (Optional) A refiner model enhances fine details

Example prompt flow: "A majestic dragon" -> [CLIP] + [OpenCLIP] -> Combined embedding -> Large U-Net denoises -> 1024x1024 image -> (Optional) Refiner -> Enhanced details

Use SDXL when you need:

  • High resolution output
  • Better text rendering in images
  • More detailed and coherent images
  • Following complex prompts accurately

Technical specifications:

  • Base model: 2.6B parameter U-Net
  • Text encoders: CLIP ViT-L/14 + OpenCLIP ViT-bigG/14
  • Native resolution: 1024x1024
  • Latent space: 4 channels, 8x spatial downsampling
  • Guidance scale: 5.0-9.0 recommended (7.5 default)
  • Scheduler: DDPM/DPM++/Euler with 20-50 steps

Architecture details:

  • Micro-conditioning: Size and crop coordinates for multi-aspect training
  • Dual text encoding: Concatenated CLIP + OpenCLIP embeddings
  • Channel multipliers: [1, 2, 4, 4] (vs [1, 2, 4, 8] in SD 2.x)
  • Cross-attention dimension: 2048 (vs 1024 in SD 1.x)

Constructors

SDXLModel()

Initializes a new instance of SDXLModel with default parameters.

public SDXLModel()

Remarks

Creates an SDXL model with standard parameters:

  • 1024x1024 native resolution
  • 2048 cross-attention dimension
  • Dual text encoder support
  • DDIM scheduler with 50 steps
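
A minimal usage sketch of the default constructor, mirroring the call shape shown in the Examples section (prompt and generation settings are illustrative):

```csharp
using AiDotNet.Diffusion.Models;

// Default SDXL: 1024x1024, dual encoders, DDIM scheduler with 50 steps.
var sdxl = new SDXLModel<float>();

var image = sdxl.GenerateFromText(
    prompt: "A lighthouse on a rocky coast at dawn",
    negativePrompt: "blurry, low quality",
    width: SDXLModel<float>.DefaultWidth,    // 1024
    height: SDXLModel<float>.DefaultHeight,  // 1024
    numInferenceSteps: 30,
    guidanceScale: 7.5,
    seed: 42);
```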

SDXLModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, UNetNoisePredictor<T>?, StandardVAE<T>?, IConditioningModule<T>?, IConditioningModule<T>?, SDXLRefiner<T>?, bool, int, int?)

Initializes a new instance of SDXLModel with custom parameters.

public SDXLModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, UNetNoisePredictor<T>? unet = null, StandardVAE<T>? vae = null, IConditioningModule<T>? conditioner1 = null, IConditioningModule<T>? conditioner2 = null, SDXLRefiner<T>? refiner = null, bool useDualEncoder = true, int crossAttentionDim = 2048, int? seed = null)

Parameters

options DiffusionModelOptions<T>

Configuration options for the diffusion model.

scheduler INoiseScheduler<T>

Optional custom scheduler.

unet UNetNoisePredictor<T>

Optional custom U-Net noise predictor.

vae StandardVAE<T>

Optional custom VAE.

conditioner1 IConditioningModule<T>

Optional primary text encoder (CLIP).

conditioner2 IConditioningModule<T>

Optional secondary text encoder (OpenCLIP).

refiner SDXLRefiner<T>

Optional refiner model.

useDualEncoder bool

Whether to use dual text encoders.

crossAttentionDim int

Cross-attention dimension (2048 for SDXL).

seed int?

Optional random seed for reproducibility.
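
A hedged sketch of the custom constructor, setting only a few of the optional parameters by name and letting the rest fall back to defaults:

```csharp
using AiDotNet.Diffusion.Models;

// All parameters are optional; unspecified components (scheduler, U-Net,
// VAE, encoders, refiner) fall back to the standard SDXL defaults.
var sdxl = new SDXLModel<float>(
    useDualEncoder: true,     // CLIP + OpenCLIP, as in standard SDXL
    crossAttentionDim: 2048,  // SDXL's documented cross-attention width
    seed: 123);               // reproducible generation
```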

Fields

DefaultHeight

Default height for SDXL generation.

public const int DefaultHeight = 1024

Field Value

int

DefaultWidth

Default width for SDXL generation.

public const int DefaultWidth = 1024

Field Value

int

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

CrossAttentionDim

Gets the cross-attention dimension (2048 for SDXL).

public int CrossAttentionDim { get; }

Property Value

int

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

Refiner

Gets the refiner model if available.

public SDXLRefiner<T>? Refiner { get; }

Property Value

SDXLRefiner<T>

SecondaryConditioner

Gets the secondary text encoder if available.

public IConditioningModule<T>? SecondaryConditioner { get; }

Property Value

IConditioningModule<T>

SupportsRefiner

Gets whether this model has a refiner available.

public bool SupportsRefiner { get; }

Property Value

bool

UsesDualEncoder

Gets whether this model uses dual text encoders.

public bool UsesDualEncoder { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

GenerateWithMicroCondition(string, string?, int, int, int?, int?, int, int, int, double?, int?)

Generates an image with micro-conditioning for multi-aspect ratio support.

public virtual Tensor<T> GenerateWithMicroCondition(string prompt, string? negativePrompt = null, int width = 1024, int height = 1024, int? originalWidth = null, int? originalHeight = null, int cropTop = 0, int cropLeft = 0, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

prompt string

The text prompt describing the desired image.

negativePrompt string

Optional negative prompt to guide away from.

width int

Output image width.

height int

Output image height.

originalWidth int?

Original target width for conditioning.

originalHeight int?

Original target height for conditioning.

cropTop int

Top crop coordinate for conditioning.

cropLeft int

Left crop coordinate for conditioning.

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Generated image tensor.

Remarks

For Beginners: Micro-conditioning helps SDXL generate better images at various aspect ratios by telling the model about the target size and any cropping applied during training.

When generating at non-square resolutions:

  • Set originalWidth/originalHeight to your target size
  • Set cropTop/cropLeft to 0 for centered generation
  • The model adjusts its generation accordingly
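
Following that advice, a portrait-orientation call might look like this (the 832x1216 size is an illustrative SDXL-friendly aspect bucket, not a value mandated by this API):

```csharp
// Tall portrait generation with centered micro-conditioning.
var portrait = sdxl.GenerateWithMicroCondition(
    prompt: "Full-length portrait of an astronaut in a wheat field",
    width: 832,
    height: 1216,
    originalWidth: 832,    // match the target size...
    originalHeight: 1216,
    cropTop: 0,            // ...and signal centered, uncropped generation
    cropLeft: 0);
```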

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

RefineImage(Tensor<T>, string, string?, int, double, int?)

Refines an image using the SDXL refiner model.

public virtual Tensor<T> RefineImage(Tensor<T> image, string prompt, string? negativePrompt = null, int numInferenceSteps = 25, double denoiseStrength = 0.3, int? seed = null)

Parameters

image Tensor<T>

The base image to refine.

prompt string

The text prompt (should match base generation).

negativePrompt string

Optional negative prompt.

numInferenceSteps int

Number of refiner steps (typically 20-30).

denoiseStrength double

How much to denoise (0.2-0.4 typical for refining).

seed int?

Optional random seed.

Returns

Tensor<T>

Refined image tensor.

Remarks

For Beginners: The refiner is a specialized model that takes an already-generated image and enhances fine details:

Without refiner:

  • Base SDXL generates good overall structure
  • Some fine details may be slightly soft

With refiner:

  • Details like skin texture, fabric, hair are enhanced
  • Overall coherence is preserved
  • Image looks more "finished"

Best practices:

  • Use denoiseStrength 0.2-0.4 (higher = more change)
  • Use 20-30 refiner steps
  • Keep the same prompt as base generation
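
Those best practices translate into a call like the following (assuming image came from a prior base generation with the same prompt):

```csharp
if (sdxl.SupportsRefiner)
{
    var refined = sdxl.RefineImage(
        image,
        prompt: "A majestic dragon perched on a mountain peak at sunset, highly detailed",
        numInferenceSteps: 25,   // within the recommended 20-30 range
        denoiseStrength: 0.3);   // 0.2-0.4 preserves structure while sharpening detail
}
```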

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.
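
A sketch of the optimizer-style round-trip that GetParameters and SetParameters enable (the update step is elided, since it depends on the optimizer in use):

```csharp
var sdxl = new SDXLModel<float>();

// Snapshot the current trainable parameters as a flat vector.
var parameters = sdxl.GetParameters();

// ...an optimizer would update the vector here...

// Write the parameters back. The vector length must equal
// sdxl.ParameterCount, or an ArgumentException is thrown.
sdxl.SetParameters(parameters);
```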