Class SAGAN<T>
- Namespace
- AiDotNet.NeuralNetworks
- Assembly
- AiDotNet.dll
Self-Attention GAN (SAGAN) implementation that uses self-attention mechanisms to model long-range dependencies in generated images.
For Beginners: Traditional CNNs in GANs only look at nearby pixels (local receptive fields). This works well for textures and local patterns, but struggles with global structure and long-range relationships (like making sure both eyes of a face look similar, or ensuring consistent geometric patterns).
Self-Attention solves this by letting each pixel "attend to" all other pixels, similar to how Transformers work in NLP. Think of it as:
- CNN: "I can only see my immediate neighbors"
- Self-Attention: "I can see the entire image and decide what's important"
Example: When generating a dog's face:
- CNN: Might make one ear pointy and one floppy (inconsistent)
- SAGAN: Notices both ears and makes them match (consistent)
Key innovations:
- Self-Attention Layers: Allow modeling of long-range dependencies
- Spectral Normalization: Stabilizes training for both G and D
- Hinge Loss: More stable than standard GAN loss
- Two Time-Scale Update Rule (TTUR): Different learning rates for G and D
- Conditional Batch Normalization: For class-conditional generation
Based on "Self-Attention Generative Adversarial Networks" by Zhang et al. (2019)
public class SAGAN<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable
Type Parameters
T: The numeric type for computations (e.g., double, float).
- Inheritance
- NeuralNetworkBase<T> → SAGAN<T>
Constructors
SAGAN(NeuralNetworkArchitecture<T>, NeuralNetworkArchitecture<T>, int, int, int, int, int, int, int, int[]?, InputType, ILossFunction<T>?, double)
Initializes a new instance of Self-Attention GAN.
public SAGAN(NeuralNetworkArchitecture<T> generatorArchitecture, NeuralNetworkArchitecture<T> discriminatorArchitecture, int latentSize = 128, int imageChannels = 3, int imageHeight = 64, int imageWidth = 64, int numClasses = 0, int generatorChannels = 64, int discriminatorChannels = 64, int[]? attentionLayers = null, InputType inputType = InputType.TwoDimensional, ILossFunction<T>? lossFunction = null, double initialLearningRate = 0.0001)
Parameters
generatorArchitecture (NeuralNetworkArchitecture<T>): Architecture for the generator network.
discriminatorArchitecture (NeuralNetworkArchitecture<T>): Architecture for the discriminator network.
latentSize (int): Size of the latent vector (typically 128).
imageChannels (int): Number of image channels (1 for grayscale, 3 for RGB).
imageHeight (int): Height of generated images.
imageWidth (int): Width of generated images.
numClasses (int): Number of classes (0 for unconditional).
generatorChannels (int): Base number of feature maps in the generator (default 64).
discriminatorChannels (int): Base number of feature maps in the discriminator (default 64).
attentionLayers (int[]): Indices of layers where self-attention is applied.
inputType (InputType): The type of input.
lossFunction (ILossFunction<T>): Loss function for training (defaults to hinge loss).
initialLearningRate (double): Initial learning rate (default 0.0001).
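As a usage sketch, constructing a class-conditional SAGAN might look like the following (the two architecture values, `genArch` and `discArch`, are assumed to have been configured elsewhere for 64x64 RGB images; their construction is library-specific):

```csharp
// Hypothetical usage sketch: genArch and discArch are assumed
// NeuralNetworkArchitecture<double> instances built elsewhere.
var sagan = new SAGAN<double>(
    generatorArchitecture: genArch,
    discriminatorArchitecture: discArch,
    latentSize: 128,                       // noise vector length
    imageChannels: 3,                      // RGB
    imageHeight: 64,
    imageWidth: 64,
    numClasses: 10,                        // class-conditional; use 0 for unconditional
    attentionLayers: new[] { 2, 3 });      // insert self-attention at mid-level layers
```

Leaving lossFunction as null selects the hinge loss described above, which is the configuration used in the original SAGAN paper.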
Properties
AttentionLayers
Gets the positions where self-attention layers are inserted. Typically at mid-level feature maps (e.g., 32x32 or 64x64 resolution).
public int[] AttentionLayers { get; }
Property Value
- int[]
Discriminator
Gets the discriminator network with self-attention layers.
public ConvolutionalNeuralNetwork<T> Discriminator { get; }
Property Value
- ConvolutionalNeuralNetwork<T>
Generator
Gets the generator network with self-attention layers.
public ConvolutionalNeuralNetwork<T> Generator { get; }
Property Value
- ConvolutionalNeuralNetwork<T>
LatentSize
Gets the size of the latent vector (noise input).
public int LatentSize { get; }
Property Value
- int
NumClasses
Gets the number of classes for conditional generation. Set to 0 for unconditional generation.
public int NumClasses { get; }
Property Value
- int
ParameterCount
Gets the total number of trainable parameters in the SAGAN.
public override int ParameterCount { get; }
Property Value
- int
Remarks
This includes all parameters from both the Generator and Discriminator networks.
UseSpectralNormalization
Gets or sets whether to use spectral normalization. Spectral normalization stabilizes GAN training by constraining the Lipschitz constant of the discriminator.
public bool UseSpectralNormalization { get; set; }
Property Value
- bool
Methods
CreateNewInstance()
Creates a new instance of the same type as this neural network.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance of the same neural network type.
Remarks
For Beginners: This creates a blank version of the same type of neural network.
It's used internally by methods like DeepCopy and Clone to create the right type of network before copying the data into it.
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data that was not covered by the general deserialization process.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader): The BinaryReader to read the data from.
Remarks
This method is called at the end of the general deserialization process to allow derived classes to read any additional data specific to their implementation.
For Beginners: Think of serialization as packing a suitcase and deserialization as unpacking it. After the main deserialization method has unpacked the common items (layers, parameters), this method allows each specific type of neural network to unpack its own unique items that were stored during serialization.
Generate(Tensor<T>, int[]?)
Generates images from specific latent codes.
public Tensor<T> Generate(Tensor<T> latentCodes, int[]? classIndices = null)
Parameters
latentCodes (Tensor<T>): Latent codes to use.
classIndices (int[]): Optional class indices for conditional generation.
Returns
- Tensor<T>
Generated images tensor
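One reason to supply latent codes explicitly is reproducibility: feeding the same codes with different class indices isolates the effect of class conditioning. A sketch (how the `latentCodes` tensor of shape [batch, latentSize] is built is library-specific and assumed here):

```csharp
// Sketch: generate the same latent codes as two different classes.
// 'latentCodes' is an assumed [batch, latentSize] Tensor<double>
// obtained elsewhere (e.g., saved from a previous sampling run).
Tensor<double> classA = sagan.Generate(latentCodes, classIndices: new[] { 0, 0, 0, 0 });
Tensor<double> classB = sagan.Generate(latentCodes, classIndices: new[] { 1, 1, 1, 1 });
// Because the latent codes are identical, any difference between the two
// batches comes only from the class conditioning.
```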
Generate(int, int[]?)
Generates images from random latent codes.
public Tensor<T> Generate(int numImages, int[]? classIndices = null)
Parameters
numImages (int): Number of images to generate.
classIndices (int[]): Optional class indices for conditional generation.
Returns
- Tensor<T>
Generated images tensor
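A minimal sampling sketch, assuming a `sagan` instance that has already been trained:

```csharp
// Sketch: sample images from random latent codes after training.
sagan.SetTrainingMode(false);                  // switch to inference mode first
Tensor<double> samples = sagan.Generate(16);   // 16 unconditional images

// For conditional models (NumClasses > 0), pass one class index per image:
Tensor<double> conditioned = sagan.Generate(4, classIndices: new[] { 0, 1, 2, 3 });
```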
GetModelMetadata()
Gets the metadata for this neural network model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing information about the model.
GetParameters()
Gets all trainable parameters of the network as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all parameters of the network.
Remarks
For Beginners: Neural networks learn by adjusting their "parameters" (also called weights and biases). This method collects all those adjustable values into a single list so they can be updated during training.
InitializeLayers()
Initializes the layers of the neural network based on the architecture.
protected override void InitializeLayers()
Remarks
For Beginners: This method sets up all the layers in your neural network according to the architecture you've defined. It's like assembling the parts of your network before you can use it.
Predict(Tensor<T>)
Makes a prediction using the neural network.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>): The input data to process.
Returns
- Tensor<T>
The network's prediction.
Remarks
For Beginners: This is the main method you'll use to get results from your trained neural network. You provide some input data (like an image or text), and the network processes it through all its layers to produce an output (like a classification or prediction).
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data that is not covered by the general serialization process.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter): The BinaryWriter to write the data to.
Remarks
This method is called at the end of the general serialization process to allow derived classes to write any additional data specific to their implementation.
For Beginners: Think of this as packing a special compartment in your suitcase. While the main serialization method packs the common items (layers, parameters), this method allows each specific type of neural network to pack its own unique items that other networks might not have.
SetTrainingMode(bool)
Sets the neural network to either training or inference mode.
public override void SetTrainingMode(bool isTraining)
Parameters
isTraining (bool): True to enable training mode; false to enable inference mode.
Remarks
For Beginners: Neural networks behave differently during training versus when making predictions.
When in training mode (isTraining = true):
- The network keeps track of intermediate calculations needed for learning
- Certain layers like Dropout and BatchNormalization behave differently
- The network uses more memory but can learn from its mistakes
When in inference/prediction mode (isTraining = false):
- The network only performs forward calculations
- It uses less memory and runs faster
- It cannot learn or update its parameters
Think of it like the difference between taking a practice test (training mode) where you can check your answers and learn from mistakes, versus taking the actual exam (inference mode) where you just give your best answers based on what you've already learned.
Train(Tensor<T>, Tensor<T>)
Trains the neural network on a single input-output pair.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor<T>): The input data.
expectedOutput (Tensor<T>): The expected output for the given input.
Remarks
This method performs one training step on the neural network using the provided input and expected output. It updates the network's parameters to reduce the error between the network's prediction and the expected output.
For Beginners: This is how your neural network learns. You provide:
- An input (what the network should process)
- The expected output (what the correct answer should be)
The network then:
- Makes a prediction based on the input
- Compares its prediction to the expected output
- Calculates how wrong it was (the loss)
- Adjusts its internal values to do better next time
After training, you can get the loss value using the GetLastLoss() method to see how well the network is learning.
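A sketch of a single step, assuming `input` and `target` are Tensor<double> values prepared elsewhere:

```csharp
// Sketch: one training step plus a loss readout.
sagan.SetTrainingMode(true);
sagan.Train(input, target);
var loss = sagan.GetLastLoss();  // how well the last step did (lower is better)
```

Note that for adversarial training of the GAN as a whole, TrainStep (below) is the more natural entry point, since it updates the generator and discriminator together.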
TrainStep(Tensor<T>, int, int[]?)
Performs a single training step on a batch of real images. Uses hinge loss for improved stability.
public (T discriminatorLoss, T generatorLoss) TrainStep(Tensor<T> realImages, int batchSize, int[]? realLabels = null)
Parameters
realImages (Tensor<T>): Batch of real images.
batchSize (int): Number of images in the batch.
realLabels (int[]): Optional class labels for conditional training.
Returns
- (T discriminatorLoss, T generatorLoss)
A tuple containing the discriminator loss and the generator loss for this training step.
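A minimal training-loop sketch. The data source `LoadImageBatches` is a hypothetical placeholder, not part of the library; it is assumed to yield pairs of a real-image Tensor<double> and its per-image class labels:

```csharp
// Sketch: adversarial training loop driven by TrainStep.
sagan.SetTrainingMode(true);
for (int epoch = 0; epoch < 50; epoch++)
{
    foreach (var (realImages, labels) in LoadImageBatches())  // hypothetical data source
    {
        int batchSize = labels.Length;
        // One TrainStep updates both networks using the hinge loss internally.
        var (dLoss, gLoss) = sagan.TrainStep(realImages, batchSize, realLabels: labels);
        Console.WriteLine($"epoch {epoch}: D={dLoss}, G={gLoss}");
    }
}
```

With hinge loss, healthy training typically shows both losses fluctuating rather than either one collapsing toward zero, which would indicate one network overpowering the other.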
UpdateParameters(Vector<T>)
Updates the network's parameters with new values.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The new parameter values to set.
Remarks
For Beginners: During training, a neural network's internal values (parameters) get adjusted to improve its performance. This method allows you to update all those values at once by providing a complete set of new parameters.
This is typically used by optimization algorithms that calculate better parameter values based on training data.
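GetParameters and UpdateParameters are complementary, which enables a simple snapshot-and-restore pattern. A sketch (the quality check is a placeholder for whatever evaluation you use, e.g. inspecting generated samples):

```csharp
// Sketch: snapshot all parameters (generator + discriminator), train further,
// and roll back if sample quality degrades.
Vector<double> snapshot = sagan.GetParameters();

// ... run more training steps here ...

bool qualityDropped = EvaluateSamples(sagan);  // hypothetical evaluation helper
if (qualityDropped)
{
    sagan.UpdateParameters(snapshot);          // restore the saved state
}
```

This pattern is useful for GANs in particular, since training can destabilize late and a recent good checkpoint is often worth keeping in memory.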