Class PixArtModel<T>

Namespace
AiDotNet.Diffusion.Models
Assembly
AiDotNet.dll

PixArt-α model for efficient high-quality text-to-image generation using DiT architecture.

public class PixArtModel<T> : LatentDiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
PixArtModel<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
Inherited Members
Extension Methods

Examples

// Create a PixArt-α model
var pixart = new PixArtModel<float>();

// Generate an image efficiently
var image = pixart.GenerateFromText(
    prompt: "A serene Japanese garden with cherry blossoms",
    negativePrompt: "blurry, low quality",
    width: 1024,
    height: 1024,
    numInferenceSteps: 20,
    guidanceScale: 4.5,
    seed: 42);

// Use different resolutions
var portrait = pixart.GenerateFromText(
    prompt: "Portrait of an astronaut",
    width: 768,
    height: 1024);

// Generate multiple images with different seeds
var variations = pixart.GenerateVariations(
    prompt: "Abstract art with vibrant colors",
    count: 4);

Remarks

PixArt-α is an efficient text-to-image diffusion model that uses a Diffusion Transformer (DiT) architecture. It achieves comparable quality to larger models like Stable Diffusion XL while being significantly faster and more resource-efficient.

For Beginners: PixArt-α is like a sports car version of image generation: it delivers comparable quality to much larger models while being faster and lighter to run.

Key advantages over traditional models:

  • 10x faster training than Stable Diffusion
  • Much more parameter-efficient
  • Uses transformer blocks instead of U-Net
  • T5-XXL text encoder for better prompt understanding

How PixArt-α works:

  1. Your prompt goes through a T5-XXL text encoder (larger = better understanding)
  2. The DiT (Diffusion Transformer) denoises using attention blocks
  3. Each block uses cross-attention to the text embedding
  4. The output is decoded by a VAE into an image
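The four steps can be sketched as pseudocode. None of the names below (textEncoder, dit, vae, and their methods) are part of the public API; this only illustrates the data flow:

```csharp
// Pseudocode sketch of the PixArt-α pipeline (illustrative names only):
var embedding = textEncoder.Encode(prompt);         // 1. T5-XXL encodes the prompt
var latent = Tensor<float>.RandomNormal(shape);     // start from pure noise
foreach (var t in scheduler.Timesteps)              // 2-3. DiT denoises step by step,
    latent = dit.DenoiseStep(latent, t, embedding); //      cross-attending to the text
var image = vae.Decode(latent);                     // 4. VAE decodes latent -> pixels
```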

Example use cases:

  • Fast prototyping (quick iterations)
  • Resource-constrained environments (smaller models)
  • High-quality generation without massive GPU requirements
  • Applications requiring many generations

When to choose PixArt-α:

  • You need faster generation than SDXL
  • You want good quality without the overhead of much larger models
  • Your prompts are complex (T5 encoder helps)
  • You're doing many generations in batch

Technical specifications:

  • Architecture: Diffusion Transformer (DiT) with AdaLN-single
  • Text encoder: T5-XXL (4.3B parameters, optional smaller variants)
  • Native resolutions: 256x256 to 1024x1024
  • Latent space: 4 channels, 8x spatial downsampling
  • Training: Decomposed training strategy for efficiency

Architecture innovations:

  • Cross-attention in every DiT block
  • AdaLN-single for timestep conditioning (not AdaLN-Zero)
  • Efficient attention patterns
  • Multi-resolution training support

Constructors

PixArtModel()

Initializes a new instance of PixArtModel with default parameters.

public PixArtModel()

Remarks

Creates a PixArt-α model with:

  • 1024x1024 default resolution
  • 1152 hidden dimension
  • 16 attention heads
  • 28 transformer layers

PixArtModel(string, IConditioningModule<T>?, INoiseScheduler<T>?, int?)

Initializes a new instance of PixArtModel with specified model size.

public PixArtModel(string modelSize = "alpha", IConditioningModule<T>? conditioner = null, INoiseScheduler<T>? scheduler = null, int? seed = null)

Parameters

modelSize string

Model variant: "alpha" (1024px), "sigma" (512px), or "delta" (256px).

conditioner IConditioningModule<T>

Optional conditioning module for text encoding.

scheduler INoiseScheduler<T>

Optional custom scheduler.

seed int?

Optional random seed for reproducibility.
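A minimal construction example using only the parameters documented above; conditioner and scheduler are left null so the built-in defaults are used:

```csharp
// Create the 512px "sigma" variant with a fixed seed for reproducible output.
// Passing null for conditioner and scheduler selects the model's defaults.
var pixart = new PixArtModel<float>(
    modelSize: "sigma",
    conditioner: null,
    scheduler: null,
    seed: 1234);
```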

Fields

DefaultModelSize

Default model size variant.

public const string DefaultModelSize = "alpha"

Field Value

string

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

DefaultResolution

Gets the default resolution for this model.

public int DefaultResolution { get; }

Property Value

int

HiddenDimension

Gets the hidden dimension of the transformer.

public int HiddenDimension { get; }

Property Value

int

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

ModelSize

Gets the model size variant.

public string ModelSize { get; }

Property Value

string

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

NumAttentionHeads

Gets the number of attention heads.

public int NumAttentionHeads { get; }

Property Value

int

NumLayers

Gets the number of transformer layers.

public int NumLayers { get; }

Property Value

int

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

SupportsVariableAspectRatio

Gets whether this model supports variable aspect ratios.

public bool SupportsVariableAspectRatio { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

GenerateFromText(string, string?, int, int, int, double?, int?)

Generates an image with PixArt-α's efficient DiT architecture.

public override Tensor<T> GenerateFromText(string prompt, string? negativePrompt = null, int width = 1024, int height = 1024, int numInferenceSteps = 20, double? guidanceScale = null, int? seed = null)

Parameters

prompt string

The text prompt describing the desired image.

negativePrompt string

Optional negative prompt for things to avoid.

width int

Image width (should be divisible by 8).

height int

Image height (should be divisible by 8).

numInferenceSteps int

Number of denoising steps (20-50 recommended).

guidanceScale double?

Classifier-free guidance scale (4.0-7.5 recommended).

seed int?

Optional random seed for reproducibility.

Returns

Tensor<T>

The generated image tensor.

Remarks

PixArt-α typically uses fewer steps than SDXL due to its efficient architecture. A guidance scale of 4.5 is commonly used (lower than SDXL's typical 7.5).

GenerateVariations(string, string?, int, int, int, int, double?, int?)

Generates multiple image variations with different seeds.

public virtual List<Tensor<T>> GenerateVariations(string prompt, string? negativePrompt = null, int count = 4, int width = 1024, int height = 1024, int numInferenceSteps = 20, double? guidanceScale = null, int? baseSeed = null)

Parameters

prompt string

The text prompt describing the desired images.

negativePrompt string

Optional negative prompt for things to avoid.

count int

Number of variations to generate.

width int

Image width.

height int

Image height.

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

baseSeed int?

Optional base seed (variations will use baseSeed, baseSeed+1, etc.).

Returns

List<Tensor<T>>

List of generated image tensors.
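A sketch of the seeding behavior described above — with baseSeed set, variation i uses baseSeed + i:

```csharp
var pixart = new PixArtModel<float>();

// Four variations seeded 100, 101, 102, 103 respectively.
var variations = pixart.GenerateVariations(
    prompt: "Abstract art with vibrant colors",
    count: 4,
    width: 512,
    height: 512,
    baseSeed: 100);

// variations is a List<Tensor<float>> with one tensor per seed.
```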

GenerateWithAspectRatio(string, string, string?, int, int, double?, int?)

Generates an image with specified aspect ratio preset.

public virtual Tensor<T> GenerateWithAspectRatio(string prompt, string aspectRatio = "1:1", string? negativePrompt = null, int baseResolution = 1024, int numInferenceSteps = 20, double? guidanceScale = null, int? seed = null)

Parameters

prompt string

The text prompt describing the desired image.

aspectRatio string

Aspect ratio preset (e.g., "16:9", "4:3", "1:1", "9:16").

negativePrompt string

Optional negative prompt.

baseResolution int

Base resolution for calculation (default 1024).

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

The generated image tensor.
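A short usage sketch; the exact pixel dimensions produced for a given ratio depend on how the implementation rounds to its supported sizes, so they are not asserted here:

```csharp
var pixart = new PixArtModel<float>();

// Widescreen generation from the default 1024 base resolution.
var widescreen = pixart.GenerateWithAspectRatio(
    prompt: "A mountain lake at sunrise",
    aspectRatio: "16:9",
    numInferenceSteps: 20,
    seed: 7);
```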

GetModelMetadata()

Retrieves metadata and performance metrics about the trained model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

An object containing metadata and performance metrics about the trained model.

Remarks

This method provides information about the model's structure, parameters, and performance metrics.

For Beginners: Model metadata is like a report card for your machine learning model.

Just as a report card shows how well a student is performing in different subjects, model metadata shows how well your model is performing and provides details about its structure.

This information typically includes:

  • Accuracy measures: How well does the model's predictions match actual values?
  • Error metrics: How far off are the model's predictions on average?
  • Model parameters: What patterns did the model learn from the data?
  • Training information: How long did training take? How many iterations were needed?

For example, in a house price prediction model, metadata might include:

  • Average prediction error (e.g., off by $15,000 on average)
  • How strongly each feature (bedrooms, location) influences the prediction
  • How well the model fits the training data

This information helps you understand your model's strengths and weaknesses, and decide if it's ready to use or needs more training.

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

GetRecommendedSettings()

Gets the recommended settings for this model variant.

public (int inferenceSteps, double guidanceScale, int resolution) GetRecommendedSettings()

Returns

(int inferenceSteps, double guidanceScale, int resolution)

A tuple containing (inferenceSteps, guidanceScale, resolution).
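Because the tuple elements are named, they can be deconstructed and passed straight into generation:

```csharp
var pixart = new PixArtModel<float>("sigma");

// Use the variant's own recommended settings rather than hard-coding them.
var (steps, guidance, resolution) = pixart.GetRecommendedSettings();
var image = pixart.GenerateFromText(
    prompt: "A lighthouse in a storm",
    width: resolution,
    height: resolution,
    numInferenceSteps: steps,
    guidanceScale: guidance);
```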

GetSupportedResolutions()

Gets supported resolutions for this model variant.

public List<(int width, int height, string name)> GetSupportedResolutions()

Returns

List<(int width, int height, string name)>

List of supported resolution presets.
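Iterating the presets, for example to present resolution choices to a user:

```csharp
var pixart = new PixArtModel<float>();

// List every resolution preset this variant supports.
foreach (var (width, height, name) in pixart.GetSupportedResolutions())
{
    Console.WriteLine($"{name}: {width}x{height}");
}
```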

ImageToImage(Tensor<T>, string, string?, double, int, double?, int?)

Performs image-to-image transformation with PixArt-α.

public override Tensor<T> ImageToImage(Tensor<T> inputImage, string prompt, string? negativePrompt = null, double strength = 0.8, int numInferenceSteps = 20, double? guidanceScale = null, int? seed = null)

Parameters

inputImage Tensor<T>

The source image to transform.

prompt string

The text prompt for the transformation.

negativePrompt string

Optional negative prompt.

strength double

How much to transform (0.0 = keep original, 1.0 = full generation).

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

The transformed image tensor.
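A usage sketch; LoadImageTensor is a hypothetical helper standing in for whatever image-loading code your application uses:

```csharp
var pixart = new PixArtModel<float>();
Tensor<float> photo = LoadImageTensor("garden.png"); // hypothetical loader

// strength 0.6 keeps the composition but restyles the content;
// values near 1.0 discard more of the input image.
var stylized = pixart.ImageToImage(
    inputImage: photo,
    prompt: "The same scene as a watercolor painting",
    strength: 0.6,
    seed: 3);
```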

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters. This is useful for optimization algorithms that need to update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException should be thrown.

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.
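A typical round-trip with GetParameters, e.g. for an external optimizer — the vector written back must have exactly ParameterCount elements:

```csharp
var pixart = new PixArtModel<float>();

// Read the current parameter vector, let an optimizer update it, write it back.
Vector<float> parameters = pixart.GetParameters();
// ... optimizer mutates `parameters` here ...
pixart.SetParameters(parameters); // throws ArgumentException on a length mismatch
```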