Class VAEDecoder<T>

Namespace
AiDotNet.Diffusion.VAE
Assembly
AiDotNet.dll

Convolutional decoder for VAE that reconstructs images from latent space.

public class VAEDecoder<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations.

Inheritance
LayerBase<T>
VAEDecoder<T>
Implements
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>
IDisposable

Remarks

This implements the decoder portion of a VAE following the Stable Diffusion architecture:

  • Post-quant convolution to expand latent channels
  • Middle blocks at the bottleneck
  • Multiple UpBlocks with transposed conv upsampling and ResBlocks
  • Output convolution to produce final image channels

For Beginners: The VAE decoder is like an intelligent image decompressor.

What it does step by step:

  1. Takes a compressed latent (e.g., 64x64x4)
  2. Post-quant conv: Expands channels (4 -> 512)
  3. Middle blocks: Extra processing at the bottleneck
  4. UpBlocks: Progressively doubles resolution while decreasing channels
    • Block 1: 512 channels, 64x64 -> 64x64 (no upsample at start)
    • Block 2: 512 channels, 64x64 -> 128x128
    • Block 3: 256 channels, 128x128 -> 256x256
    • Block 4: 128 channels, 256x256 -> 512x512
  5. Output: Produces 3-channel RGB image with tanh activation

The result is a high-resolution image reconstructed from the compressed latent.
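The block-by-block progression above can be reproduced with a little arithmetic. The following is a minimal sketch, not the library's implementation: `TraceUpBlocks` is a hypothetical helper, and it assumes that each level's channel count is `baseChannels` times its multiplier, that the multipliers are walked from last to first, and that every block except the first doubles the resolution.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of AiDotNet): lists the channel count and
// resolution after each UpBlock under the assumptions stated in the lead-in.
static List<(int Channels, int Size)> TraceUpBlocks(int latentSize, int baseChannels, int[] channelMults)
{
    var result = new List<(int Channels, int Size)>();
    int size = latentSize;
    // Walk the multipliers from the deepest level to the shallowest.
    for (int level = channelMults.Length - 1; level >= 0; level--)
    {
        bool isFirstBlock = level == channelMults.Length - 1;
        if (!isFirstBlock)
        {
            size *= 2; // assumed: 2x upsample in every block except the first
        }
        result.Add((baseChannels * channelMults[level], size));
    }
    return result;
}

var stages = TraceUpBlocks(latentSize: 64, baseChannels: 128, channelMults: new[] { 1, 2, 4, 4 });
for (int i = 0; i < stages.Count; i++)
{
    Console.WriteLine($"Block {i + 1}: {stages[i].Channels} channels, {stages[i].Size}x{stages[i].Size}");
}
// Block 1: 512 channels, 64x64
// Block 2: 512 channels, 128x128
// Block 3: 256 channels, 256x256
// Block 4: 128 channels, 512x512
```

With the default constructor values this matches the four blocks listed in step 4.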

Constructors

VAEDecoder(int, int, int, int[]?, int, int, int)

Initializes a new instance of the VAEDecoder class.

public VAEDecoder(int outputChannels = 3, int latentChannels = 4, int baseChannels = 128, int[]? channelMults = null, int numResBlocks = 2, int numGroups = 32, int outputSpatialSize = 512)

Parameters

outputChannels int

Number of output image channels (default: 3 for RGB).

latentChannels int

Number of latent channels (default: 4).

baseChannels int

Base channel count (default: 128).

channelMults int[]?

Channel multipliers per level. When null, the default [1, 2, 4, 4] is used.

numResBlocks int

Number of residual blocks per UpBlock (default: 2).

numGroups int

Number of groups for GroupNorm (default: 32).

outputSpatialSize int

Spatial size of output images (default: 512).
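To see how baseChannels, channelMults, and numGroups interact, the sketch below computes per-level channel counts and checks them against the group count. `LevelChannels` and `IsGroupNormCompatible` are hypothetical helpers, not library members; the sketch assumes each level uses `baseChannels * multiplier` channels and relies on the general GroupNorm requirement that the channel count be divisible by the number of groups. Whether the constructor itself validates this is not stated in this reference.

```csharp
using System;
using System.Linq;

// Hypothetical helper: per-level channel counts, assuming channels = baseChannels * multiplier.
static int[] LevelChannels(int baseChannels, int[]? channelMults)
{
    // Assumed to mirror the documented default of [1, 2, 4, 4] when null is passed.
    int[] mults = channelMults ?? new[] { 1, 2, 4, 4 };
    return mults.Select(m => baseChannels * m).ToArray();
}

// GroupNorm splits channels into equal groups, so every count must be divisible by numGroups.
static bool IsGroupNormCompatible(int[] channels, int numGroups) =>
    channels.All(c => c % numGroups == 0);

int[] defaults = LevelChannels(128, null);
Console.WriteLine(string.Join(", ", defaults));                         // 128, 256, 512, 512
Console.WriteLine(IsGroupNormCompatible(defaults, 32));                 // True
Console.WriteLine(IsGroupNormCompatible(LevelChannels(48, null), 32));  // False
```

A check like this before construction can catch a baseChannels value that does not divide evenly into numGroups.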

Properties

LatentChannels

Gets the number of latent channels.

public int LatentChannels { get; }

Property Value

int

OutputChannels

Gets the number of output channels.

public int OutputChannels { get; }

Property Value

int

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

true if the layer can be JIT compiled; otherwise, false.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

true if the layer has trainable parameters and supports backpropagation; otherwise, false.

Remarks

This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer has parameters that can be adjusted during training
  • It will improve its performance as it sees more data
  • It participates in the learning process

A value of false means:

  • The layer doesn't have any adjustable parameters
  • It performs the same operation regardless of training
  • It doesn't need to learn (but may still be useful)

UpsampleFactor

Gets the upsampling factor (spatial expansion from input to output).

public int UpsampleFactor { get; }

Property Value

int

Methods

Backward(Tensor<T>)

Performs the backward pass through the decoder.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Returns

Tensor<T>

Deserialize(BinaryReader)

Loads the decoder's state from a binary reader.

public override void Deserialize(BinaryReader reader)

Parameters

reader BinaryReader

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code, which can make inference 5-10x faster.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.

Forward(Tensor<T>)

Decodes a latent representation to an image.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Latent tensor [batch, latentChannels, H, W].

Returns

Tensor<T>

Decoded image [batch, outputChannels, H*f, W*f], where f is the upsample factor.
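The shape contract of Forward can be checked ahead of time. The sketch below is illustrative only: `DecodedShape` is a hypothetical helper that works on plain shape arrays rather than Tensor<T>, and the factor of 8 is taken from the 64x64 to 512x512 walkthrough in the class remarks, not from a documented constant.

```csharp
using System;

// Hypothetical helper: expected Forward output shape for a [batch, channels, H, W] latent.
static int[] DecodedShape(int[] latentShape, int outputChannels, int upsampleFactor)
{
    if (latentShape.Length != 4)
    {
        throw new ArgumentException("Expected a 4D shape [batch, channels, H, W].", nameof(latentShape));
    }
    return new[]
    {
        latentShape[0],                   // batch is unchanged
        outputChannels,                   // latent channels are replaced by image channels
        latentShape[2] * upsampleFactor,  // H * f
        latentShape[3] * upsampleFactor   // W * f
    };
}

int[] shape = DecodedShape(new[] { 1, 4, 64, 64 }, outputChannels: 3, upsampleFactor: 8);
Console.WriteLine(string.Join(", ", shape)); // 1, 3, 512, 512
```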

GetParameters()

Gets all trainable parameters as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

ResetState()

Resets the internal state of the decoder.

public override void ResetState()

Serialize(BinaryWriter)

Saves the decoder's state to a binary writer.

public override void Serialize(BinaryWriter writer)

Parameters

writer BinaryWriter
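Serialize and Deserialize pair naturally for an in-memory round trip. The sketch below uses only System.IO types; `RoundTrip` is a hypothetical helper, and the integer payload stands in for a decoder so the example runs without the library. With real decoders the call would presumably be `RoundTrip(source.Serialize, target.Deserialize)`, with both instances constructed using the same arguments (an assumption; this reference does not state the requirement).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes state with `save`, rewinds the stream, then reads it back with `load`.
static void RoundTrip(Action<BinaryWriter> save, Action<BinaryReader> load)
{
    using var stream = new MemoryStream();
    using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
    {
        save(writer);
    } // disposing the writer flushes it; leaveOpen keeps the stream usable

    stream.Position = 0; // rewind before reading
    using var reader = new BinaryReader(stream);
    load(reader);
}

// Stand-in payload instead of a real decoder.
int restored = 0;
RoundTrip(w => w.Write(42), r => restored = r.ReadInt32());
Console.WriteLine(restored); // 42
```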

SetParameters(Vector<T>)

Sets all trainable parameters from a single vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

UpdateParameters(T)

Updates all learnable parameters using gradient descent.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T
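
For reference, a plain gradient descent step subtracts the gradient, scaled by the learning rate, from each parameter. The sketch below shows that rule on plain arrays; `GradientDescentStep` is a hypothetical helper, and whether UpdateParameters applies exactly this rule (with no momentum or weight decay), using gradients from the most recent Backward call, is an assumption.

```csharp
using System;

// Hypothetical helper: one plain gradient descent step,
// parameters[i] -= learningRate * gradients[i].
static void GradientDescentStep(double[] parameters, double[] gradients, double learningRate)
{
    if (parameters.Length != gradients.Length)
    {
        throw new ArgumentException("Parameter and gradient counts must match.");
    }
    for (int i = 0; i < parameters.Length; i++)
    {
        parameters[i] -= learningRate * gradients[i];
    }
}

double[] weights = { 1.0, -2.0 };
double[] grads = { 0.5, -0.5 };
GradientDescentStep(weights, grads, learningRate: 0.1);
// Each weight moves against its gradient: the first decreases, the second increases.
Console.WriteLine(string.Join(", ", weights));
```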