Class PixArtModel<T>

Namespace
AiDotNet.Diffusion.Models
Assembly
AiDotNet.dll

PixArt-α model for efficient high-quality text-to-image generation using DiT architecture.

public class PixArtModel<T> : LatentDiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
PixArtModel<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
Inherited Members
Extension Methods

Examples

// Create a PixArt-α model
var pixart = new PixArtModel<float>();

// Generate an image efficiently
var image = pixart.GenerateFromText(
    prompt: "A serene Japanese garden with cherry blossoms",
    negativePrompt: "blurry, low quality",
    width: 1024,
    height: 1024,
    numInferenceSteps: 20,
    guidanceScale: 4.5,
    seed: 42);

// Use different resolutions
var portrait = pixart.GenerateFromText(
    prompt: "Portrait of an astronaut",
    width: 768,
    height: 1024);

// Generate multiple images with different seeds
var variations = pixart.GenerateVariations(
    prompt: "Abstract art with vibrant colors",
    count: 4);

Remarks

PixArt-α is an efficient text-to-image diffusion model that uses a Diffusion Transformer (DiT) architecture. It achieves comparable quality to larger models like Stable Diffusion XL while being significantly faster and more resource-efficient.

For Beginners: PixArt-α is like a sports car version of image generation: it delivers comparable quality to much larger models while being faster and lighter to run.

Key advantages over traditional models:

  • 10x faster training than Stable Diffusion
  • Much more parameter-efficient
  • Uses transformer blocks instead of U-Net
  • T5-XXL text encoder for better prompt understanding

How PixArt-α works:

  1. Your prompt goes through a T5-XXL text encoder (larger = better understanding)
  2. The DiT (Diffusion Transformer) denoises using attention blocks
  3. Each block uses cross-attention to the text embedding
  4. The output is decoded by a VAE into an image
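The four steps can be sketched as pseudocode. None of the names below (textEncoder, dit, vae, and their methods) are part of the public API; this only illustrates the data flow:

```csharp
// Pseudocode sketch of the PixArt-α pipeline (illustrative names only):
var embedding = textEncoder.Encode(prompt);         // 1. T5-XXL encodes the prompt
var latent = Tensor<float>.RandomNormal(shape);     // start from pure noise
foreach (var t in scheduler.Timesteps)              // 2-3. DiT denoises step by step,
    latent = dit.DenoiseStep(latent, t, embedding); //      cross-attending to the text
var image = vae.Decode(latent);                     // 4. VAE decodes latent -> pixels
```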

Example use cases:

  • Fast prototyping (quick iterations)
  • Resource-constrained environments (smaller models)
  • High-quality generation without massive GPU requirements
  • Applications requiring many generations

When to choose PixArt-α:

  • You need faster generation than SDXL
  • You want good quality without the overhead of much larger models
  • Your prompts are complex (T5 encoder helps)
  • You're doing many generations in batch

Technical specifications:

  • Architecture: Diffusion Transformer (DiT) with AdaLN-single
  • Text encoder: T5-XXL (4.3B parameters, optional smaller variants)
  • Native resolutions: 256x256 to 1024x1024
  • Latent space: 4 channels, 8x spatial downsampling
  • Training: Decomposed training strategy for efficiency

Architecture innovations:

  • Cross-attention in every DiT block
  • AdaLN-single for timestep conditioning (not AdaLN-Zero)
  • Efficient attention patterns
  • Multi-resolution training support

Constructors

PixArtModel()

Initializes a new instance of PixArtModel with default parameters.

public PixArtModel()

Remarks

Creates a PixArt-α model with:

  • 1024x1024 default resolution
  • 1152 hidden dimension
  • 16 attention heads
  • 28 transformer layers

PixArtModel(string, IConditioningModule<T>?, INoiseScheduler<T>?, int?)

Initializes a new instance of PixArtModel with specified model size.

public PixArtModel(string modelSize = "alpha", IConditioningModule<T>? conditioner = null, INoiseScheduler<T>? scheduler = null, int? seed = null)

Parameters

modelSize string

Model variant: "alpha" (1024px), "sigma" (512px), or "delta" (256px).

conditioner IConditioningModule<T>

Optional conditioning module for text encoding.

scheduler INoiseScheduler<T>

Optional custom scheduler.

seed int?

Optional random seed for reproducibility.
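A minimal construction example using only the parameters documented above; conditioner and scheduler are left null so the built-in defaults are used:

```csharp
// Create the 512px "sigma" variant with a fixed seed for reproducible output.
// Passing null for conditioner and scheduler selects the model's defaults.
var pixart = new PixArtModel<float>(
    modelSize: "sigma",
    conditioner: null,
    scheduler: null,
    seed: 1234);
```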

Fields

DefaultModelSize

Default model size variant.

public const string DefaultModelSize = "alpha"

Field Value

string

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

DefaultResolution

Gets the default resolution for this model.

public int DefaultResolution { get; }

Property Value

int

HiddenDimension

Gets the hidden dimension of the transformer.

public int HiddenDimension { get; }

Property Value

int

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

ModelSize

Gets the model size variant.

public string ModelSize { get; }

Property Value

string

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

NumAttentionHeads

Gets the number of attention heads.

public int NumAttentionHeads { get; }

Property Value

int

NumLayers

Gets the number of transformer layers.

public int NumLayers { get; }

Property Value

int

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

SupportsVariableAspectRatio

Gets whether this model supports variable aspect ratios.

public bool SupportsVariableAspectRatio { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

GenerateFromText(string, string?, int, int, int, double?, int?)

Generates an image with PixArt-α's efficient DiT architecture.

public override Tensor<T> GenerateFromText(string prompt, string? negativePrompt = null, int width = 1024, int height = 1024, int numInferenceSteps = 20, double? guidanceScale = null, int? seed = null)

Parameters

prompt string

The text prompt describing the desired image.

negativePrompt string

Optional negative prompt for things to avoid.

width int

Image width (should be divisible by 8).

height int

Image height (should be divisible by 8).

numInferenceSteps int

Number of denoising steps (20-50 recommended).

guidanceScale double?

Classifier-free guidance scale (4.0-7.5 recommended).

seed int?

Optional random seed for reproducibility.

Returns

Tensor<T>

The generated image tensor.

Remarks

PixArt-α typically uses fewer steps than SDXL due to its efficient architecture. A guidance scale of 4.5 is commonly used (lower than SDXL's typical 7.5).

GenerateVariations(string, string?, int, int, int, int, double?, int?)

Generates multiple image variations with different seeds.

public virtual List<Tensor<T>> GenerateVariations(string prompt, string? negativePrompt = null, int count = 4, int width = 1024, int height = 1024, int numInferenceSteps = 20, double? guidanceScale = null, int? baseSeed = null)

Parameters

prompt string

The text prompt describing the desired images.

negativePrompt string

Optional negative prompt for things to avoid.

count int

Number of variations to generate.

width int

Image width.

height int

Image height.

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

baseSeed int?

Optional base seed (variations will use baseSeed, baseSeed+1, etc.).

Returns

List<Tensor<T>>

List of generated image tensors.
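A sketch of the seeding behavior described above — with baseSeed set, variation i uses baseSeed + i:

```csharp
var pixart = new PixArtModel<float>();

// Four variations seeded 100, 101, 102, 103 respectively.
var variations = pixart.GenerateVariations(
    prompt: "Abstract art with vibrant colors",
    count: 4,
    width: 512,
    height: 512,
    baseSeed: 100);

// variations is a List<Tensor<float>> with one tensor per seed.
```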

GenerateWithAspectRatio(string, string, string?, int, int, double?, int?)

Generates an image with specified aspect ratio preset.

public virtual Tensor<T> GenerateWithAspectRatio(string prompt, string aspectRatio = "1:1", string? negativePrompt = null, int baseResolution = 1024, int numInferenceSteps = 20, double? guidanceScale = null, int? seed = null)

Parameters

prompt string

The text prompt describing the desired image.

aspectRatio string

Aspect ratio preset (e.g., "16:9", "4:3", "1:1", "9:16").

negativePrompt string

Optional negative prompt.

baseResolution int

Base resolution for calculation (default 1024).

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

The generated image tensor.
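A short usage sketch; the exact pixel dimensions produced for a given ratio depend on how the implementation rounds to its supported sizes, so they are not asserted here:

```csharp
var pixart = new PixArtModel<float>();

// Widescreen generation from the default 1024 base resolution.
var widescreen = pixart.GenerateWithAspectRatio(
    prompt: "A mountain lake at sunrise",
    aspectRatio: "16:9",
    numInferenceSteps: 20,
    seed: 7);
```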

GetModelMetadata()

Retrieves metadata and performance metrics about the trained model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

An object containing metadata and performance metrics about the trained model.

Remarks

This method provides information about the model's structure, parameters, and performance metrics.

For Beginners: Model metadata is like a report card for your machine learning model.

Just as a report card shows how well a student is performing in different subjects, model metadata shows how well your model is performing and provides details about its structure.

This information typically includes:

  • Accuracy measures: How well does the model's predictions match actual values?
  • Error metrics: How far off are the model's predictions on average?
  • Model parameters: What patterns did the model learn from the data?
  • Training information: How long did training take? How many iterations were needed?

For example, in a house price prediction model, metadata might include:

  • Average prediction error (e.g., off by $15,000 on average)
  • How strongly each feature (bedrooms, location) influences the prediction
  • How well the model fits the training data

This information helps you understand your model's strengths and weaknesses, and decide if it's ready to use or needs more training.

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

GetRecommendedSettings()

Gets the recommended settings for this model variant.

public (int inferenceSteps, double guidanceScale, int resolution) GetRecommendedSettings()

Returns

(int inferenceSteps, double guidanceScale, int resolution)

A tuple containing (inferenceSteps, guidanceScale, resolution).
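Because the tuple elements are named, they can be deconstructed and passed straight into generation:

```csharp
var pixart = new PixArtModel<float>("sigma");

// Use the variant's own recommended settings rather than hard-coding them.
var (steps, guidance, resolution) = pixart.GetRecommendedSettings();
var image = pixart.GenerateFromText(
    prompt: "A lighthouse in a storm",
    width: resolution,
    height: resolution,
    numInferenceSteps: steps,
    guidanceScale: guidance);
```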

GetSupportedResolutions()

Gets supported resolutions for this model variant.

public List<(int width, int height, string name)> GetSupportedResolutions()

Returns

List<(int width, int height, string name)>

List of supported resolution presets.
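Iterating the presets, for example to present resolution choices to a user:

```csharp
var pixart = new PixArtModel<float>();

// List every resolution preset this variant supports.
foreach (var (width, height, name) in pixart.GetSupportedResolutions())
{
    Console.WriteLine($"{name}: {width}x{height}");
}
```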

ImageToImage(Tensor<T>, string, string?, double, int, double?, int?)

Performs image-to-image transformation with PixArt-α.

public override Tensor<T> ImageToImage(Tensor<T> inputImage, string prompt, string? negativePrompt = null, double strength = 0.8, int numInferenceSteps = 20, double? guidanceScale = null, int? seed = null)

Parameters

inputImage Tensor<T>

The source image to transform.

prompt string

The text prompt for the transformation.

negativePrompt string

Optional negative prompt.

strength double

How much to transform (0.0 = keep original, 1.0 = full generation).

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

The transformed image tensor.
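A usage sketch; LoadImageTensor is a hypothetical helper standing in for whatever image-loading code your application uses:

```csharp
var pixart = new PixArtModel<float>();
Tensor<float> photo = LoadImageTensor("garden.png"); // hypothetical loader

// strength 0.6 keeps the composition but restyles the content;
// values near 1.0 discard more of the input image.
var stylized = pixart.ImageToImage(
    inputImage: photo,
    prompt: "The same scene as a watercolor painting",
    strength: 0.6,
    seed: 3);
```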

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters. This is useful for optimization algorithms that need to update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException should be thrown.

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.
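A typical round-trip with GetParameters, e.g. for an external optimizer — the vector written back must have exactly ParameterCount elements:

```csharp
var pixart = new PixArtModel<float>();

// Read the current parameter vector, let an optimizer update it, write it back.
Vector<float> parameters = pixart.GetParameters();
// ... optimizer mutates `parameters` here ...
pixart.SetParameters(parameters); // throws ArgumentException on a length mismatch
```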