Class PixArtModel<T>
PixArt-α model for efficient high-quality text-to-image generation using DiT architecture.
public class PixArtModel<T> : LatentDiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
- Inheritance: LatentDiffusionModelBase<T> → PixArtModel<T>
Examples
// Create a PixArt-α model
var pixart = new PixArtModel<float>();
// Generate an image efficiently
var image = pixart.GenerateFromText(
prompt: "A serene Japanese garden with cherry blossoms",
negativePrompt: "blurry, low quality",
width: 1024,
height: 1024,
numInferenceSteps: 20,
guidanceScale: 4.5,
seed: 42);
// Use different resolutions
var portrait = pixart.GenerateFromText(
prompt: "Portrait of an astronaut",
width: 768,
height: 1024);
// Generate multiple images with different seeds
var variations = pixart.GenerateVariations(
prompt: "Abstract art with vibrant colors",
count: 4);
Remarks
PixArt-α is an efficient text-to-image diffusion model that uses a Diffusion Transformer (DiT) architecture. It achieves comparable quality to larger models like Stable Diffusion XL while being significantly faster and more resource-efficient.
For Beginners: PixArt-α is like a sports car version of image generation: it delivers speed and efficiency from a lighter build without sacrificing output quality.
Key advantages over traditional models:
- 10x faster training than Stable Diffusion
- Much more parameter-efficient
- Uses transformer blocks instead of U-Net
- T5-XXL text encoder for better prompt understanding
How PixArt-α works:
- Your prompt goes through a T5-XXL text encoder (larger = better understanding)
- The DiT (Diffusion Transformer) denoises using attention blocks
- Each block uses cross-attention to the text embedding
- The output is decoded by a VAE into an image
Example use cases:
- Fast prototyping (quick iterations)
- Resource-constrained environments (smaller models)
- High-quality generation without massive GPU requirements
- Applications requiring many generations
When to choose PixArt-α:
- You need faster generation than SDXL
- You want good quality without the memory and compute overhead of much larger models
- Your prompts are complex (T5 encoder helps)
- You're doing many generations in batch
Technical specifications:
- Architecture: Diffusion Transformer (DiT) with AdaLN-single
- Text encoder: T5-XXL (4.3B parameters, optional smaller variants)
- Native resolutions: 256x256 to 1024x1024
- Latent space: 4 channels, 8x spatial downsampling
- Training: Decomposed training strategy for efficiency
Architecture innovations:
- Cross-attention in every DiT block
- AdaLN-single for timestep conditioning (not AdaLN-Zero)
- Efficient attention patterns
- Multi-resolution training support
Constructors
PixArtModel()
Initializes a new instance of PixArtModel with default parameters.
public PixArtModel()
Remarks
Creates a PixArt-α model with:
- 1024x1024 default resolution
- 1152 hidden dimension
- 16 attention heads
- 28 transformer layers
PixArtModel(string, IConditioningModule<T>?, INoiseScheduler<T>?, int?)
Initializes a new instance of PixArtModel with specified model size.
public PixArtModel(string modelSize = "alpha", IConditioningModule<T>? conditioner = null, INoiseScheduler<T>? scheduler = null, int? seed = null)
Parameters
modelSize (string): Model variant: "alpha" (1024px), "sigma" (512px), or "delta" (256px).
conditioner (IConditioningModule<T>?): Optional conditioning module for text encoding.
scheduler (INoiseScheduler<T>?): Optional custom scheduler.
seed (int?): Optional random seed for reproducibility.
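As a sketch of how these parameters combine (the variant names follow the documented defaults above; any INoiseScheduler<T> implementation from your project could be passed as the scheduler):

```csharp
// Default "alpha" variant at 1024px
var pixart = new PixArtModel<float>();

// "sigma" variant (512px) with a fixed seed for reproducibility
var pixartSigma = new PixArtModel<float>(modelSize: "sigma", seed: 42);
```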
Fields
DefaultModelSize
Default model size variant.
public const string DefaultModelSize = "alpha"
Field Value
- string
Properties
Conditioner
Gets the conditioning module (optional, for conditioned generation).
public override IConditioningModule<T>? Conditioner { get; }
Property Value
- IConditioningModule<T>?
DefaultResolution
Gets the default resolution for this model.
public int DefaultResolution { get; }
Property Value
- int
HiddenDimension
Gets the hidden dimension of the transformer.
public int HiddenDimension { get; }
Property Value
- int
LatentChannels
Gets the number of latent channels.
public override int LatentChannels { get; }
Property Value
- int
Remarks
Typically 4 for Stable Diffusion models.
ModelSize
Gets the model size variant.
public string ModelSize { get; }
Property Value
- string
NoisePredictor
Gets the noise predictor model (U-Net, DiT, etc.).
public override INoisePredictor<T> NoisePredictor { get; }
Property Value
- INoisePredictor<T>
NumAttentionHeads
Gets the number of attention heads.
public int NumAttentionHeads { get; }
Property Value
- int
NumLayers
Gets the number of transformer layers.
public int NumLayers { get; }
Property Value
- int
ParameterCount
Gets the number of parameters in the model.
public override int ParameterCount { get; }
Property Value
- int
Remarks
This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
SupportsVariableAspectRatio
Gets whether this model supports variable aspect ratios.
public bool SupportsVariableAspectRatio { get; }
Property Value
- bool
VAE
Gets the VAE model used for encoding and decoding.
public override IVAEModel<T> VAE { get; }
Property Value
- IVAEModel<T>
Methods
Clone()
Creates a deep copy of the model.
public override IDiffusionModel<T> Clone()
Returns
- IDiffusionModel<T>
A new instance with the same parameters.
DeepCopy()
Creates a deep copy of this object.
public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
GenerateFromText(string, string?, int, int, int, double?, int?)
Generates an image with PixArt-α's efficient DiT architecture.
public override Tensor<T> GenerateFromText(string prompt, string? negativePrompt = null, int width = 1024, int height = 1024, int numInferenceSteps = 20, double? guidanceScale = null, int? seed = null)
Parameters
prompt (string): The text prompt describing the desired image.
negativePrompt (string?): Optional negative prompt for things to avoid.
width (int): Image width (should be divisible by 8).
height (int): Image height (should be divisible by 8).
numInferenceSteps (int): Number of denoising steps (20-50 recommended).
guidanceScale (double?): Classifier-free guidance scale (4.0-7.5 recommended).
seed (int?): Optional random seed for reproducibility.
Returns
- Tensor<T>
The generated image tensor.
Remarks
PixArt-α typically uses fewer steps than SDXL due to its efficient architecture. A guidance scale of 4.5 is commonly used (lower than SDXL's typical 7.5).
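A minimal sketch using the recommended settings described above (prompt text and seed here are illustrative; the same prompt and seed should reproduce the same image):

```csharp
var image = pixart.GenerateFromText(
    prompt: "A misty mountain lake at dawn",
    negativePrompt: "blurry, low quality",
    width: 1024,
    height: 1024,
    numInferenceSteps: 20,   // fewer steps than SDXL typically needs
    guidanceScale: 4.5,      // lower than SDXL's typical 7.5
    seed: 123);
```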
GenerateVariations(string, string?, int, int, int, int, double?, int?)
Generates multiple image variations with different seeds.
public virtual List<Tensor<T>> GenerateVariations(string prompt, string? negativePrompt = null, int count = 4, int width = 1024, int height = 1024, int numInferenceSteps = 20, double? guidanceScale = null, int? baseSeed = null)
Parameters
prompt (string): The text prompt describing the desired images.
negativePrompt (string?): Optional negative prompt for things to avoid.
count (int): Number of variations to generate.
width (int): Image width.
height (int): Image height.
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double?): Classifier-free guidance scale.
baseSeed (int?): Optional base seed (variations will use baseSeed, baseSeed+1, etc.).
Returns
- List<Tensor<T>>
List of generated image tensors.
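Because variations use sequential seeds (baseSeed + index), a favorite result can be reproduced or refined later; a sketch with illustrative prompt text:

```csharp
// Four variations seeded 100, 101, 102, 103
var variations = pixart.GenerateVariations(
    prompt: "Abstract art with vibrant colors",
    count: 4,
    baseSeed: 100);

// Reuse the seed of the third variation (baseSeed + 2) with a refined prompt
var refined = pixart.GenerateFromText(
    prompt: "Abstract art with vibrant colors, oil painting style",
    seed: 102);
```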
GenerateWithAspectRatio(string, string, string?, int, int, double?, int?)
Generates an image with specified aspect ratio preset.
public virtual Tensor<T> GenerateWithAspectRatio(string prompt, string aspectRatio = "1:1", string? negativePrompt = null, int baseResolution = 1024, int numInferenceSteps = 20, double? guidanceScale = null, int? seed = null)
Parameters
prompt (string): The text prompt describing the desired image.
aspectRatio (string): Aspect ratio preset (e.g., "16:9", "4:3", "1:1", "9:16").
negativePrompt (string?): Optional negative prompt.
baseResolution (int): Base resolution for calculation (default 1024).
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double?): Classifier-free guidance scale.
seed (int?): Optional random seed.
Returns
- Tensor<T>
The generated image tensor.
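A usage sketch; the exact pixel dimensions derived from the preset and base resolution are determined by the implementation:

```csharp
// Widescreen image computed from the 1024 base resolution
var wide = pixart.GenerateWithAspectRatio(
    prompt: "Cinematic desert landscape at sunset",
    aspectRatio: "16:9",
    baseResolution: 1024,
    seed: 7);
```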
GetModelMetadata()
Retrieves metadata and performance metrics about the trained model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
An object containing metadata and performance metrics about the trained model.
Remarks
This method provides information about the model's structure, parameters, and performance metrics.
For Beginners: Model metadata is like a report card for your machine learning model.
Just as a report card shows how well a student is performing in different subjects, model metadata shows how well your model is performing and provides details about its structure.
This information typically includes:
- Accuracy measures: How well does the model's predictions match actual values?
- Error metrics: How far off are the model's predictions on average?
- Model parameters: What patterns did the model learn from the data?
- Training information: How long did training take? How many iterations were needed?
For example, in a house price prediction model, metadata might include:
- Average prediction error (e.g., off by $15,000 on average)
- How strongly each feature (bedrooms, location) influences the prediction
- How well the model fits the training data
This information helps you understand your model's strengths and weaknesses, and decide if it's ready to use or needs more training.
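A minimal sketch of retrieving the metadata alongside the parameter count (the specific fields exposed by ModelMetadata<T> are not shown here):

```csharp
var metadata = pixart.GetModelMetadata();
Console.WriteLine($"Trainable parameters: {pixart.ParameterCount}");
```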
GetParameters()
Gets the parameters that can be optimized.
public override Vector<T> GetParameters()
Returns
- Vector<T>
GetRecommendedSettings()
Gets the recommended settings for this model variant.
public (int inferenceSteps, double guidanceScale, int resolution) GetRecommendedSettings()
Returns
- (int inferenceSteps, double guidanceScale, int resolution)
A tuple containing (inferenceSteps, guidanceScale, resolution).
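The returned tuple can be destructured and fed straight into generation; a sketch with an illustrative prompt:

```csharp
var (steps, guidance, resolution) = pixart.GetRecommendedSettings();
var image = pixart.GenerateFromText(
    prompt: "A lighthouse in a storm",
    width: resolution,
    height: resolution,
    numInferenceSteps: steps,
    guidanceScale: guidance);
```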
GetSupportedResolutions()
Gets supported resolutions for this model variant.
public List<(int width, int height, string name)> GetSupportedResolutions()
Returns
- List<(int width, int height, string name)>
List of (width, height, name) tuples supported by this variant.
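A sketch of enumerating the supported resolutions, using the tuple element names from the signature:

```csharp
foreach (var (width, height, name) in pixart.GetSupportedResolutions())
{
    Console.WriteLine($"{name}: {width}x{height}");
}
```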
ImageToImage(Tensor<T>, string, string?, double, int, double?, int?)
Performs image-to-image transformation with PixArt-α.
public override Tensor<T> ImageToImage(Tensor<T> inputImage, string prompt, string? negativePrompt = null, double strength = 0.8, int numInferenceSteps = 20, double? guidanceScale = null, int? seed = null)
Parameters
inputImage (Tensor<T>): The source image to transform.
prompt (string): The text prompt for the transformation.
negativePrompt (string?): Optional negative prompt.
strength (double): How much to transform (0.0 = keep original, 1.0 = full generation).
numInferenceSteps (int): Number of denoising steps.
guidanceScale (double?): Classifier-free guidance scale.
seed (int?): Optional random seed.
Returns
- Tensor<T>
The transformed image tensor.
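A sketch of a partial transformation; LoadImageAsTensor is a hypothetical helper standing in for whatever image-loading utility your project uses:

```csharp
// Hypothetical loader producing a Tensor<float> from an image file
Tensor<float> sketch = LoadImageAsTensor("sketch.png");

// Lower strength keeps more of the original composition
var painted = pixart.ImageToImage(
    inputImage: sketch,
    prompt: "Detailed watercolor painting",
    strength: 0.6,
    seed: 42);
```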
SetParameters(Vector<T>)
Sets the model parameters.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The parameter vector to set.
Remarks
This method allows direct modification of the model's internal parameters.
This is useful for optimization algorithms that need to update parameters iteratively.
If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
Exceptions
- ArgumentException
Thrown when the length of parameters does not match ParameterCount.
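A sketch of the round-trip an iterative optimizer would perform with GetParameters and SetParameters:

```csharp
// Read the current parameter vector
var parameters = pixart.GetParameters();

// ... apply an optimization step that updates 'parameters' in place ...

// Write the updated vector back; a length mismatch throws ArgumentException
pixart.SetParameters(parameters);
```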