Class SDXLModel<T>
Stable Diffusion XL (SDXL) model for high-resolution image generation.
public class SDXLModel<T> : LatentDiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
Examples
// Create an SDXL model
var sdxl = new SDXLModel<float>();

// Generate a high-resolution image
var image = sdxl.GenerateFromText(
    prompt: "A majestic dragon perched on a mountain peak at sunset, highly detailed",
    negativePrompt: "blurry, low quality, distorted",
    width: 1024,
    height: 1024,
    numInferenceSteps: 30,
    guidanceScale: 7.5,
    seed: 42);

// Generate with micro-conditioning for aspect ratio
var wideImage = sdxl.GenerateWithMicroCondition(
    prompt: "Panoramic landscape with mountains and lake",
    width: 1536,
    height: 640,
    originalWidth: 1536,
    originalHeight: 640,
    cropTop: 0,
    cropLeft: 0);

// Use the refiner for enhanced details
if (sdxl.SupportsRefiner)
{
    var refined = sdxl.RefineImage(image, "enhance details");
}
Remarks
SDXL is Stability AI's flagship text-to-image model, designed for high-quality 1024x1024 image generation with improved prompt understanding and visual fidelity compared to earlier Stable Diffusion versions.
For Beginners: SDXL is like Stable Diffusion 2.0 but significantly upgraded:
Key improvements over SD 1.5/2.0:
- Roughly 3x larger U-Net (2.6B vs 865M parameters)
- Dual text encoders (better prompt understanding)
- Native 1024x1024 resolution (vs 512x512)
- Optional refiner model for enhanced details
How SDXL works:
- Your prompt goes through TWO text encoders (CLIP + OpenCLIP)
- These embeddings guide a much larger U-Net during denoising
- The base model generates at 1024x1024
- (Optional) A refiner model enhances fine details
Example prompt flow: "A majestic dragon" -> [CLIP] + [OpenCLIP] -> Combined embedding -> Large U-Net denoises -> 1024x1024 image -> (Optional) Refiner -> Enhanced details
Use SDXL when you need:
- High resolution output
- Better text rendering in images
- More detailed and coherent images
- Following complex prompts accurately
Technical specifications:
- Base model: 2.6B parameter U-Net
- Text encoders: CLIP ViT-L/14 + OpenCLIP ViT-bigG/14
- Native resolution: 1024x1024
- Latent space: 4 channels, 8x spatial downsampling
- Guidance scale: 5.0-9.0 recommended (7.5 default)
- Scheduler: DDPM/DPM++/Euler with 20-50 steps
Architecture details:
- Micro-conditioning: Size and crop coordinates for multi-aspect training
- Dual text encoding: Concatenated CLIP + OpenCLIP embeddings
- Channel multipliers: [1, 2, 4, 4] (vs [1, 2, 4, 8] in SD 2.x)
- Cross-attention dimension: 2048 (vs 768 in SD 1.x and 1024 in SD 2.x)
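The dual-encoding arithmetic can be sketched as follows. The per-encoder widths (768 for CLIP ViT-L/14, 1280 for OpenCLIP ViT-bigG/14) are assumptions based on the standard SDXL architecture; the library may combine embeddings differently:

```csharp
// Sketch: SDXL's 2048-wide cross-attention dimension arises from
// concatenating the two text encoders' embedding widths.
const int ClipDim = 768;       // CLIP ViT-L/14 width (assumed)
const int OpenClipDim = 1280;  // OpenCLIP ViT-bigG/14 width (assumed)
int crossAttentionDim = ClipDim + OpenClipDim;  // 2048
```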
Constructors
SDXLModel()
Initializes a new instance of SDXLModel with default parameters.
public SDXLModel()
Remarks
Creates an SDXL model with standard parameters:
- 1024x1024 native resolution
- 2048 cross-attention dimension
- Dual text encoder support
- DDIM scheduler with 50 steps
SDXLModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, UNetNoisePredictor<T>?, StandardVAE<T>?, IConditioningModule<T>?, IConditioningModule<T>?, SDXLRefiner<T>?, bool, int, int?)
Initializes a new instance of SDXLModel with custom parameters.
public SDXLModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, UNetNoisePredictor<T>? unet = null, StandardVAE<T>? vae = null, IConditioningModule<T>? conditioner1 = null, IConditioningModule<T>? conditioner2 = null, SDXLRefiner<T>? refiner = null, bool useDualEncoder = true, int crossAttentionDim = 2048, int? seed = null)
Parameters
- options (DiffusionModelOptions<T>): Configuration options for the diffusion model.
- scheduler (INoiseScheduler<T>): Optional custom scheduler.
- unet (UNetNoisePredictor<T>): Optional custom U-Net noise predictor.
- vae (StandardVAE<T>): Optional custom VAE.
- conditioner1 (IConditioningModule<T>): Optional primary text encoder (CLIP).
- conditioner2 (IConditioningModule<T>): Optional secondary text encoder (OpenCLIP).
- refiner (SDXLRefiner<T>): Optional refiner model.
- useDualEncoder (bool): Whether to use dual text encoders.
- crossAttentionDim (int): Cross-attention dimension (2048 for SDXL).
- seed (int?): Optional random seed for reproducibility.
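A sketch of constructing the model with a custom scheduler and a fixed seed, using only the parameters documented above. The DDIMScheduler<float> type name is an assumption; substitute whichever INoiseScheduler<T> implementation your project provides:

```csharp
// All components are optional; any left unspecified fall back to defaults.
var sdxl = new SDXLModel<float>(
    options: null,                          // default DiffusionModelOptions
    scheduler: new DDIMScheduler<float>(),  // hypothetical INoiseScheduler<T> implementation
    useDualEncoder: true,                   // CLIP + OpenCLIP
    crossAttentionDim: 2048,                // SDXL standard
    seed: 1234);                            // reproducible generation
```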
Fields
DefaultHeight
Default height for SDXL generation.
public const int DefaultHeight = 1024
Field Value
- int
DefaultWidth
Default width for SDXL generation.
public const int DefaultWidth = 1024
Field Value
- int
Properties
Conditioner
Gets the conditioning module (optional, for conditioned generation).
public override IConditioningModule<T>? Conditioner { get; }
Property Value
- IConditioningModule<T>
CrossAttentionDim
Gets the cross-attention dimension (2048 for SDXL).
public int CrossAttentionDim { get; }
Property Value
- int
LatentChannels
Gets the number of latent channels.
public override int LatentChannels { get; }
Property Value
- int
Remarks
Typically 4 for Stable Diffusion models.
NoisePredictor
Gets the noise predictor model (U-Net, DiT, etc.).
public override INoisePredictor<T> NoisePredictor { get; }
Property Value
- INoisePredictor<T>
ParameterCount
Gets the number of parameters in the model.
public override int ParameterCount { get; }
Property Value
- int
Remarks
This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
Refiner
Gets the refiner model if available.
public SDXLRefiner<T>? Refiner { get; }
Property Value
- SDXLRefiner<T>
SecondaryConditioner
Gets the secondary text encoder if available.
public IConditioningModule<T>? SecondaryConditioner { get; }
Property Value
- IConditioningModule<T>
SupportsRefiner
Gets whether this model has a refiner available.
public bool SupportsRefiner { get; }
Property Value
- bool
UsesDualEncoder
Gets whether this model uses dual text encoders.
public bool UsesDualEncoder { get; }
Property Value
- bool
VAE
Gets the VAE model used for encoding and decoding.
public override IVAEModel<T> VAE { get; }
Property Value
- IVAEModel<T>
Methods
Clone()
Creates a deep copy of the model.
public override IDiffusionModel<T> Clone()
Returns
- IDiffusionModel<T>
A new instance with the same parameters.
DeepCopy()
Creates a deep copy of this object.
public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
GenerateWithMicroCondition(string, string?, int, int, int?, int?, int, int, int, double?, int?)
Generates an image with micro-conditioning for multi-aspect ratio support.
public virtual Tensor<T> GenerateWithMicroCondition(string prompt, string? negativePrompt = null, int width = 1024, int height = 1024, int? originalWidth = null, int? originalHeight = null, int cropTop = 0, int cropLeft = 0, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)
Parameters
- prompt (string): The text prompt describing the desired image.
- negativePrompt (string): Optional negative prompt to guide away from.
- width (int): Output image width.
- height (int): Output image height.
- originalWidth (int?): Original target width for conditioning.
- originalHeight (int?): Original target height for conditioning.
- cropTop (int): Top crop coordinate for conditioning.
- cropLeft (int): Left crop coordinate for conditioning.
- numInferenceSteps (int): Number of denoising steps.
- guidanceScale (double?): Classifier-free guidance scale.
- seed (int?): Optional random seed.
Returns
- Tensor<T>
Generated image tensor.
Remarks
For Beginners: Micro-conditioning helps SDXL generate better images at various aspect ratios by telling the model about the target size and any cropping applied during training.
When generating at non-square resolutions:
- Set originalWidth/originalHeight to your target size
- Set cropTop/cropLeft to 0 for centered generation
- The model adjusts its generation accordingly
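For example, a portrait-orientation generation following this guidance might look like the sketch below (the prompt and parameter values are illustrative):

```csharp
var sdxl = new SDXLModel<float>();

// Portrait 832x1216 with centered (uncropped) conditioning.
var portrait = sdxl.GenerateWithMicroCondition(
    prompt: "Studio portrait of an astronaut, dramatic lighting",
    width: 832,
    height: 1216,
    originalWidth: 832,      // tell the model the intended size...
    originalHeight: 1216,
    cropTop: 0,              // ...and that nothing was cropped
    cropLeft: 0,
    numInferenceSteps: 30,
    guidanceScale: 7.5,
    seed: 7);
```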
GetParameters()
Gets the parameters that can be optimized.
public override Vector<T> GetParameters()
Returns
- Vector<T>
RefineImage(Tensor<T>, string, string?, int, double, int?)
Refines an image using the SDXL refiner model.
public virtual Tensor<T> RefineImage(Tensor<T> image, string prompt, string? negativePrompt = null, int numInferenceSteps = 25, double denoiseStrength = 0.3, int? seed = null)
Parameters
- image (Tensor<T>): The base image to refine.
- prompt (string): The text prompt (should match base generation).
- negativePrompt (string): Optional negative prompt.
- numInferenceSteps (int): Number of refiner steps (typically 20-30).
- denoiseStrength (double): How much to denoise (0.2-0.4 typical for refining).
- seed (int?): Optional random seed.
Returns
- Tensor<T>
Refined image tensor.
Remarks
For Beginners: The refiner is a specialized model that takes an already-generated image and enhances fine details:
Without refiner:
- Base SDXL generates good overall structure
- Some fine details may be slightly soft
With refiner:
- Details like skin texture, fabric, hair are enhanced
- Overall coherence is preserved
- Image looks more "finished"
Best practices:
- Use denoiseStrength 0.2-0.4 (higher = more change)
- Use 20-30 refiner steps
- Keep the same prompt as base generation
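The best practices above can be put together in a sketch like this (the prompt and numeric values are illustrative; the refiner only runs if one was supplied at construction):

```csharp
var sdxl = new SDXLModel<float>();
string prompt = "A knight in ornate armor, intricate engraving";

// Base pass at native resolution.
var baseImage = sdxl.GenerateFromText(
    prompt: prompt,
    width: 1024,
    height: 1024,
    numInferenceSteps: 30,
    guidanceScale: 7.5,
    seed: 42);

// Refine only if a refiner model was wired in.
if (sdxl.SupportsRefiner)
{
    baseImage = sdxl.RefineImage(
        image: baseImage,
        prompt: prompt,            // keep the same prompt as the base pass
        numInferenceSteps: 25,     // 20-30 recommended
        denoiseStrength: 0.3,      // 0.2-0.4 typical; higher = more change
        seed: 42);
}
```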
SetParameters(Vector<T>)
Sets the model parameters.
public override void SetParameters(Vector<T> parameters)
Parameters
- parameters (Vector<T>): The parameter vector to set.
Remarks
This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively.
If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
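A round-trip sketch combining GetParameters and SetParameters, e.g. for an external optimizer (the Length property on Vector<T> is an assumption; the optimizer step is elided):

```csharp
var sdxl = new SDXLModel<float>();

// Read the current parameter vector.
var parameters = sdxl.GetParameters();

// ... let an optimizer update `parameters` in place ...

// Write it back; the length must equal ParameterCount,
// otherwise an ArgumentException is thrown.
if (parameters.Length == sdxl.ParameterCount)
{
    sdxl.SetParameters(parameters);
}
```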
Exceptions
- ArgumentException
Thrown when the length of parameters does not match ParameterCount.