Class VariationalAutoencoder<T>
Namespace: AiDotNet.NeuralNetworks
Assembly: AiDotNet.dll
Represents a Variational Autoencoder (VAE) neural network architecture, which is used for generating new data similar to the training data and learning compressed representations.
public class VariationalAutoencoder<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider
Type Parameters
T: The data type used for calculations (typically float or double).
Inheritance: NeuralNetworkBase&lt;T&gt; → VariationalAutoencoder&lt;T&gt;
Remarks
A Variational Autoencoder is a type of generative model that learns to encode input data into a probabilistic latent space and then decode samples from that space back into the original data space. Unlike standard autoencoders, VAEs ensure the latent space has good properties for generating new samples by learning a distribution rather than a fixed encoding.
VAEs consist of:
- An encoder network that maps input data to a probability distribution in latent space
- A sampling mechanism that draws samples from this distribution
- A decoder network that maps samples from latent space back to the original data space
For Beginners: A Variational Autoencoder is like a creative compression system.
Imagine you have a folder full of photos of cats:
- The encoder is like a person who studies all these photos and learns to describe any cat using just a few key attributes (like fur color, ear shape, size)
- These few attributes are the "latent space" - a much smaller representation of the data
- The special thing about a VAE is that instead of exact values, it describes each attribute as a range of possible values (a probability distribution)
- The decoder is like an artist who can take these attribute descriptions and draw a new cat based on them
This ability to work with probability distributions means:
- You can generate new, never-before-seen cats by sampling from these distributions
- The generated cats will look realistic because they follow the patterns learned from real cats
- You can smoothly transition between different types of cats by moving through the latent space
VAEs are used for image generation, data compression, anomaly detection, and other creative applications.
Constructors
VariationalAutoencoder(NeuralNetworkArchitecture<T>, int, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, ILossFunction<T>?, double)
Initializes a new instance of the VariationalAutoencoder<T> class with the specified architecture, latent space size, and optional optimizer and loss function.
public VariationalAutoencoder(NeuralNetworkArchitecture<T> architecture, int latentSize, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)
Parameters
architecture (NeuralNetworkArchitecture&lt;T&gt;): The neural network architecture that defines the overall structure.
latentSize (int): The size of the latent space dimension.
optimizer (IGradientBasedOptimizer&lt;T, Tensor&lt;T&gt;, Tensor&lt;T&gt;&gt;): The gradient optimizer to use for training (optional).
lossFunction (ILossFunction&lt;T&gt;): The loss function to use for reconstruction loss (optional).
maxGradNorm (double): The maximum gradient norm used for gradient clipping. Defaults to 1.
Remarks
This constructor creates a new VAE with the provided architecture, latent size, and optional optimizer and loss function. If no optimizer or loss function is provided, default ones will be used.
For Beginners: This is where you set up your VAE with basic settings and optional advanced configurations.
When creating a VAE, you need to specify:
- The overall architecture (like how many layers, their sizes, etc.)
- The latent size (how many attributes or features to use in the compressed representation)
You can also optionally specify:
- An optimizer (a method for adjusting the network's internal values during training)
- A loss function (a way to measure how well the VAE is performing)
If you don't specify an optimizer or loss function, the VAE will use default options that work well in most cases.
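For example, a minimal construction sketch. The VariationalAutoencoder arguments follow the signature above; how NeuralNetworkArchitecture&lt;T&gt; itself is configured is not shown in this section, so its constructor call below is a placeholder assumption:

```csharp
// Sketch only: the NeuralNetworkArchitecture<double> setup is a placeholder
// assumption; see the NeuralNetworkArchitecture<T> documentation for its
// actual constructor arguments.
var architecture = new NeuralNetworkArchitecture<double>(/* layer configuration */);

// 16 latent dimensions; null optimizer/loss fall back to the library defaults.
var vae = new VariationalAutoencoder<double>(
    architecture,
    latentSize: 16,
    optimizer: null,       // use the default gradient-based optimizer
    lossFunction: null,    // use the default reconstruction loss
    maxGradNorm: 1.0);     // clip gradient norms above 1.0
```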
Properties
AuxiliaryLossWeight
Gets or sets the weight (beta parameter) for the KL divergence auxiliary loss. Default is 1.0. Can be adjusted for beta-VAE variants.
public T AuxiliaryLossWeight { get; set; }
Property Value
- T
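For beta-VAE experiments, the weight can simply be reassigned. A one-line sketch, assuming a vae instance created with T = double:

```csharp
// Beta > 1 weights the KL term more heavily, as in beta-VAE variants that
// trade some reconstruction quality for a more regular latent space.
vae.AuxiliaryLossWeight = 4.0;
```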
LatentSize
Gets the size of the latent space dimension in the Variational Autoencoder.
public int LatentSize { get; }
Property Value
- int
Remarks
The latent size determines the dimensionality of the compressed representation that the VAE learns. Smaller values create more compressed representations but may lose more information, while larger values preserve more details but may be less efficient for compression or generation.
For Beginners: The latent size is like the number of describing words you can use.
Think of it as how many attributes you can use to describe the data:
- A small latent size (e.g., 2-10) means using very few attributes, creating a highly compressed but possibly less detailed representation
- A larger latent size (e.g., 32-256) allows for more detailed representations but requires more computation
For example, if you're working with face images:
- A small latent size might only capture basic features like hair color and face shape
- A larger latent size could capture more subtle details like wrinkles, lighting, and expressions
The right latent size depends on your specific task - smaller for simple datasets, larger for complex ones.
UseAuxiliaryLoss
Gets or sets whether to use auxiliary loss (KL divergence) during training. For VAEs, this should always be true as KL divergence is required for proper functioning.
public bool UseAuxiliaryLoss { get; set; }
Property Value
- bool
Methods
ComputeAuxiliaryLoss()
Computes the auxiliary loss for the VAE, which is the KL divergence between the learned latent distribution and a standard normal distribution.
public T ComputeAuxiliaryLoss()
Returns
- T
The KL divergence loss value.
Remarks
The KL divergence is computed as -0.5 * Σ(1 + log(σ²) - μ² - σ²). This regularizes the latent space to follow a standard normal distribution, which is essential for VAEs to generate new samples and ensure a smooth, continuous latent space.
For Beginners: This computes how different the VAE's compression is from an ideal "standard" compression.
The KL divergence measures:
- How much the learned latent space differs from a standard normal distribution
- This difference acts as a penalty to encourage the VAE to organize its latent space properly
Without this loss:
- The VAE might create "holes" in the latent space where nothing meaningful exists
- Generated samples might not look realistic
- The latent space might not be smooth or continuous
The KL divergence ensures the latent space has good properties for generation and interpolation.
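To make the formula concrete, the following standalone sketch computes the same quantity on plain arrays (it illustrates the math, not the library's internal implementation):

```csharp
using System;

internal static class KlSketch
{
    // KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions:
    // -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    public static double KlDivergence(double[] mean, double[] logVariance)
    {
        double sum = 0.0;
        for (int i = 0; i < mean.Length; i++)
        {
            double variance = Math.Exp(logVariance[i]);
            sum += 1.0 + logVariance[i] - mean[i] * mean[i] - variance;
        }
        return -0.5 * sum;
    }
}
```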
CreateNewInstance()
Creates a new instance of the Variational Autoencoder with the same architecture and configuration.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance of the Variational Autoencoder with the same configuration as the current instance.
Remarks
This method creates a new Variational Autoencoder with the same architecture, latent size, optimizer, loss function, and gradient clipping settings as the current instance. The new instance has freshly initialized parameters, making it useful for creating separate instances with identical configurations or for resetting the network while preserving its structure.
For Beginners: This creates a brand new VAE with the same setup as the current one.
Think of it like creating a copy of your VAE's blueprint:
- It has the same overall structure
- It uses the same latent size (compression level)
- It has the same optimizer (learning method)
- It uses the same loss function (way of measuring performance)
- But it starts with fresh parameters (internal values)
This is useful when you want to:
- Start over with a fresh network but keep the same design
- Create multiple networks with identical settings for comparison
- Reset a network to its initial state
The new VAE will need to be trained from scratch, as it doesn't inherit any of the learned knowledge from the original network.
Decode(Vector<T>)
Decodes a vector from the latent space back to the original data space.
public Vector<T> Decode(Vector<T> latentVector)
Parameters
latentVector (Vector&lt;T&gt;): The vector in latent space to decode.
Returns
- Vector<T>
The reconstructed vector in the original data space.
Remarks
This method passes a latent vector through the decoder portion of the VAE to generate a reconstruction in the original data space. The decoder learns to map points in the latent space back to the format of the original input data.
For Beginners: This method recreates the original-style data from the compressed representation.
When you decode a latent vector:
- The vector passes through the second half of the network (the decoder)
- The decoder transforms the compact representation back to the original data format
For example, with a face image:
- The latent vector might contain compressed information about features like hair color, face shape, etc.
- The decoder uses this information to generate a complete face image
The amazing thing is that you can:
- Decode latent vectors that didn't come from real inputs
- Generate new, never-before-seen but realistic-looking data
- Smoothly transition between different types of outputs by moving through the latent space
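For example, a generation sketch that decodes a random latent vector. It assumes a trained vae instance and assumes Vector&lt;double&gt; can be constructed from a double[] (that constructor is not documented in this section):

```csharp
var rng = new Random();
var z = new double[vae.LatentSize];
for (int i = 0; i < z.Length; i++)
{
    // Box-Muller: draw each coordinate from N(0, 1), the prior that the
    // KL divergence term pushes the latent space toward during training.
    double u1 = 1.0 - rng.NextDouble();
    double u2 = rng.NextDouble();
    z[i] = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
}

// Assumed Vector<double>(double[]) constructor; adjust to the actual API.
Vector<double> newSample = vae.Decode(new Vector<double>(z));
```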
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data for the Variational Autoencoder.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader): The BinaryReader to read the data from.
Remarks
This method reads the specific configuration and state of the VAE from a binary stream. It reconstructs the network-specific parameters to match the state of the network when it was serialized.
Encode(Vector<T>)
Encodes an input vector into mean and log variance parameters in the latent space.
public (Vector<T> Mean, Vector<T> LogVariance) Encode(Vector<T> input)
Parameters
input (Vector&lt;T&gt;): The input vector to encode.
Returns
- (Vector&lt;T&gt; Mean, Vector&lt;T&gt; LogVariance)
A tuple containing the mean and log variance vectors of the latent distribution.
Remarks
This method passes the input through the encoder portion of the VAE to produce the parameters of the latent distribution (mean and log variance). These parameters define a probability distribution in the latent space from which samples can be drawn.
For Beginners: This method compresses your input data into a compact representation.
When you encode data:
- The input passes through the first half of the network (the encoder)
- The encoder produces two sets of values for each dimension in the latent space:
- Mean values (the central or most likely value for each feature)
- Log variance values (how much uncertainty or flexibility there is around each feature)
For example, if encoding a face image:
- The means might represent the most likely values for features like hair color, face shape, etc.
- The log variances represent how certain the model is about these values
This compressed representation captures the essential information about the input in a much smaller form.
Exceptions
- InvalidOperationException
Thrown when the mean layer or log variance layer have not been properly initialized.
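The sketch below chains Encode with Reparameterize and Decode to run the encode/sample/decode pipeline by hand, assuming a trained vae and an input vector of the correct size; Predict performs the equivalent steps internally:

```csharp
// 1. Encode the input into latent distribution parameters.
(Vector<double> mean, Vector<double> logVariance) = vae.Encode(input);

// 2. Draw one latent sample via the reparameterization trick.
Vector<double> z = vae.Reparameterize(mean, logVariance);

// 3. Decode the sample back into the original data space.
Vector<double> reconstruction = vae.Decode(z);
```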
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the auxiliary loss computation.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about KL divergence and latent space statistics.
Remarks
This method provides insights into the VAE's latent space behavior, including:
- The current KL divergence value
- The beta weight parameter
- Statistics about the mean and variance of the latent distribution
For Beginners: This gives you information to help understand and debug your VAE.
The diagnostics include:
- KL Divergence: How much the latent space differs from ideal (lower is more "standard")
- Beta Weight: How much the KL divergence is weighted in training
- Latent Mean Norm: How far the average latent values are from zero
- Latent Std Mean: The average uncertainty in the latent space
These values help you:
- Understand if training is progressing well
- Detect problems like "posterior collapse" (when the VAE ignores the latent space)
- Tune hyperparameters like the beta weight
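For example, the diagnostics can be printed periodically during training to watch the KL divergence (a value collapsing toward zero can indicate posterior collapse):

```csharp
foreach (var entry in vae.GetAuxiliaryLossDiagnostics())
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}
```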
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.
public Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
GetModelMetadata()
Gets metadata about the Variational Autoencoder model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata&lt;T&gt; object containing information about the model.
Remarks
This method returns metadata about the VAE, including the model type, input/output dimensions, latent size, and layer configuration. This information is useful for model management, serialization, and transfer learning.
InitializeLayers()
Sets up the layers of the Variational Autoencoder based on the provided architecture.
protected override void InitializeLayers()
Remarks
This method either uses custom layers provided by the user or creates default VAE layers. It then sets up specific references to key layers like the mean and log variance layers that are essential for the VAE's functioning.
For Beginners: This method builds the structure of your VAE.
It works in one of two ways:
- If you've provided your own custom layers, it uses those
- Otherwise, it creates a standard set of VAE layers based on your settings
Then it identifies and sets up the special layers that make a VAE work:
- The mean layer (for calculating the average values in the latent space)
- The log variance layer (for calculating the uncertainty ranges)
It's like assembling a machine based on either your custom blueprint or a standard design.
Predict(Tensor<T>)
Makes a prediction using the Variational Autoencoder by encoding the input, sampling from the latent space, and decoding.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor&lt;T&gt;): The input tensor to make a prediction for.
Returns
- Tensor<T>
The reconstructed output tensor after passing through the VAE.
Remarks
This method performs a full forward pass through the VAE:
1. Encodes the input to get the mean and log variance of the latent distribution.
2. Samples a point from this distribution using the reparameterization trick.
3. Decodes the sampled point to produce a reconstruction of the input.
For Beginners: This method takes your input data, compresses it, and then tries to recreate it.
The process:
- The input is compressed into a small representation (encoding)
- A random point is chosen from this compressed space (sampling)
- This point is then expanded back into the original data format (decoding)
The output is the VAE's attempt to recreate the input. It won't be exactly the same, but it should capture the important features of the original input.
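A reconstruction sketch, assuming a trained vae and an inputBatch tensor of the expected shape:

```csharp
// Reconstruct a batch of inputs. A latent sample is drawn internally, so
// two Predict calls on the same input can return slightly different outputs.
Tensor<double> reconstructed = vae.Predict(inputBatch);
```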
Reparameterize(Vector<T>, Vector<T>)
Implements the reparameterization trick to sample from the latent distribution in a way that allows gradient flow.
public Vector<T> Reparameterize(Vector<T> mean, Vector<T> logVariance)
Parameters
mean (Vector&lt;T&gt;): The mean vector of the latent distribution.
logVariance (Vector&lt;T&gt;): The log variance vector of the latent distribution.
Returns
- Vector<T>
A sampled vector from the latent distribution.
Remarks
This method implements the reparameterization trick, which is a key innovation in VAEs. It allows the model to sample from the latent distribution while still enabling gradient flow during training. The trick works by sampling from a standard normal distribution and then transforming those samples using the mean and variance parameters.
For Beginners: This method generates a random sample from your compressed representation.
The "reparameterization trick" is a clever technique that:
- Takes the mean and log variance values from the encoder
- Adds the right amount of randomness to create a sample point in the latent space
- Does this in a way that still allows the network to learn effectively
It's like having a recipe (the mean) and some flexibility (the variance):
- If you're very certain about a feature (low variance), the sample will be close to the mean
- If you're less certain (high variance), the sample could be further from the mean
The randomness is important because:
- It lets the VAE generate different outputs even for the same input
- It forces the VAE to learn a smooth, continuous latent space
- It allows for creative generation of new, unique examples
Exceptions
- ArgumentException
Thrown when the mean and log variance vectors don't have the same length.
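The transformation itself is z = μ + σ·ε with σ = exp(0.5·logVariance) and ε drawn from N(0, 1). The standalone sketch below illustrates that math on plain arrays (not the library's internal implementation):

```csharp
using System;

internal static class ReparameterizationSketch
{
    // z = mu + sigma * epsilon, where sigma = exp(0.5 * logVariance) and
    // epsilon ~ N(0, 1). Sampling epsilon instead of z keeps the path from
    // mean and logVariance differentiable, so gradients can flow in training.
    public static double[] Sample(double[] mean, double[] logVariance, Random rng)
    {
        var z = new double[mean.Length];
        for (int i = 0; i < z.Length; i++)
        {
            double sigma = Math.Exp(0.5 * logVariance[i]);
            // Box-Muller draw from a standard normal.
            double u1 = 1.0 - rng.NextDouble();
            double u2 = rng.NextDouble();
            double epsilon = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
            z[i] = mean[i] + sigma * epsilon;
        }
        return z;
    }
}
```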
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data for the Variational Autoencoder.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter): The BinaryWriter to write the data to.
Remarks
This method writes the specific configuration and state of the VAE to a binary stream. It includes network-specific parameters that are essential for later reconstruction of the network.
Train(Tensor<T>, Tensor<T>)
Trains the Variational Autoencoder using the provided input data.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor&lt;T&gt;): The input tensor used for training.
expectedOutput (Tensor&lt;T&gt;): The expected output tensor (typically the same as the input for VAEs).
Remarks
This method implements the training process for the VAE:
1. Performs a forward pass to get the reconstructed output.
2. Calculates the reconstruction loss and the KL divergence.
3. Computes the total loss (reconstruction loss + KL divergence).
4. Backpropagates the error and updates the network parameters using the specified optimizer.
For Beginners: This method teaches the VAE to compress and reconstruct data effectively.
The training process:
- The VAE tries to reconstruct the input
- It measures how well it did (reconstruction error) using the specified loss function
- It also measures how well it's using the latent space (KL divergence)
- It combines these measurements into a total score
- It then adjusts its internal settings to do better next time, using the specified optimizer
This process is repeated many times with different inputs to improve the VAE's performance.
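A minimal training-loop sketch, assuming batches is a sequence of Tensor&lt;double&gt; training batches; note that the target passed to Train is the input itself:

```csharp
for (int epoch = 0; epoch < 50; epoch++)
{
    foreach (Tensor<double> batch in batches)
    {
        // For autoencoders the expected output is the input: the VAE
        // learns to reconstruct what it was given.
        vae.Train(batch, batch);
    }
}
```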
UpdateParameters(Vector<T>)
Updates the parameters of all layers in the VAE network.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector&lt;T&gt;): A vector containing all parameters for the network.
Remarks
This method distributes the parameters to each layer based on their parameter counts. It updates both the standard network layers and the specialized mean and log variance layers. This is typically used during training when applying gradient updates.
For Beginners: This method updates all the internal values of the VAE during training.
Think of parameters as the "settings" of the VAE:
- Each layer needs a certain number of parameters to function
- During training, these parameters are constantly adjusted to improve performance
- This method takes a big list of new parameter values and gives each layer its share
It makes sure to update:
- All the regular layers in the network
- The special mean and log variance layers that make the VAE work
It's like distributing updated parts to each section of a machine so it works better. Each layer gets exactly the number of parameters it needs.
Exceptions
- InvalidOperationException
Thrown when the mean layer or log variance layer have not been properly initialized.
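The distribution logic can be pictured with plain arrays, as in this standalone sketch (illustrative only; the actual method works on the network's layers and Vector&lt;T&gt;):

```csharp
using System;

internal static class ParameterDistributionSketch
{
    // Split one flat parameter array into per-layer shares, where
    // layerCounts[i] is how many parameters layer i needs.
    public static double[][] Split(double[] flat, int[] layerCounts)
    {
        var shares = new double[layerCounts.Length][];
        int offset = 0;
        for (int i = 0; i < layerCounts.Length; i++)
        {
            shares[i] = new double[layerCounts[i]];
            Array.Copy(flat, offset, shares[i], 0, layerCounts[i]);
            offset += layerCounts[i];
        }
        return shares;
    }
}
```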
ValidateCustomLayers(List<ILayer<T>>)
Ensures that custom layers provided for the VAE meet the minimum requirements.
protected override void ValidateCustomLayers(List<ILayer<T>> layers)
Parameters
layers (List&lt;ILayer&lt;T&gt;&gt;): The list of custom layers to validate.
Remarks
A valid VAE must include a mean layer, a log variance layer, and a pooling layer for the reparameterization trick. This method checks for these required components and throws an exception if any are missing.
For Beginners: This method checks if your custom layers will actually work as a VAE.
For a VAE to function properly, it needs at minimum:
- A mean layer (to calculate the central values in the latent space)
- A log variance layer (to calculate the uncertainty ranges)
- A pooling layer (for the "reparameterization trick" - a special technique that makes training possible)
If any of these essential components are missing, it's like trying to build a car without wheels or an engine - it won't work!
This method checks for these essential components and raises an error if they're missing.
Exceptions
- InvalidOperationException
Thrown when the custom layers don't include required layer types.