Class ResidualNeuralNetwork<T>
- Namespace: AiDotNet.NeuralNetworks
- Assembly: AiDotNet.dll
Represents a Residual Neural Network, which is a type of neural network that uses skip connections to address the vanishing gradient problem in deep networks.
public class ResidualNeuralNetwork<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- object → NeuralNetworkBase<T> → ResidualNeuralNetwork<T>
Remarks
A Residual Neural Network (ResNet) is an advanced neural network architecture that introduces "skip connections" or "shortcuts" that allow information to bypass one or more layers. These residual connections help address the vanishing gradient problem that occurs in very deep networks, enabling the training of networks with many more layers than previously possible. ResNets were a breakthrough in deep learning that significantly improved performance on image recognition and other tasks.
For Beginners: A Residual Neural Network is like a highway system for information in a neural network.
Think of it like this:
- In a traditional neural network, information must pass through every layer sequentially
- In a ResNet, there are "shortcut paths" or "highways" that let information skip ahead
For example, imagine trying to pass a message through a line of 100 people:
- In a regular network, each person must whisper to the next person in line
- In a ResNet, some people can also shout directly to someone 5 positions ahead
This design solves a major problem: in very deep networks (many layers), information and learning signals tend to fade away or "vanish" as they travel through many layers. The shortcuts in ResNets help information flow more easily through the network, allowing for much deeper networks (some with over 100 layers!) that can learn more complex patterns.
ResNets revolutionized image recognition and are now used in many AI systems that need to identify complex patterns in data.
Constructors
ResidualNeuralNetwork(NeuralNetworkArchitecture<T>, T?, int, int, ILossFunction<T>?)
Initializes a new instance of the ResidualNeuralNetwork<T> class with the specified architecture.
public ResidualNeuralNetwork(NeuralNetworkArchitecture<T> architecture, T? learningRate = default, int epochs = 10, int batchSize = 32, ILossFunction<T>? lossFunction = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture to use for the ResNet.
learningRate (T?): The learning rate for training. Default is 0.01 converted to type T.
epochs (int): The number of training epochs. Default is 10.
batchSize (int): The batch size for training. Default is 32.
lossFunction (ILossFunction<T>?): Optional custom loss function. If null, a default will be chosen based on the task type.
Remarks
This constructor creates a new Residual Neural Network with the specified architecture. It initializes the network layers based on the architecture, or creates default ResNet layers if no specific layers are provided.
For Beginners: This sets up the Residual Neural Network with its basic structure.
When creating a new ResNet:
- The architecture defines what the network looks like - how many layers it has, how they're connected, etc.
- The constructor prepares the network by either:
- Using the specific layers provided in the architecture, or
- Creating default layers designed for ResNets if none are specified
The default ResNet layers include special residual blocks that have both:
- A main path where information is processed through multiple layers
- A shortcut path that allows information to skip these layers
This combination of paths is what gives ResNets their special ability to train very deep networks.
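For example, a minimal construction sketch. The NeuralNetworkArchitecture<float> setup is illustrative only; its constructor arguments are assumptions and depend on your version of AiDotNet:

// Illustrative: the architecture's constructor arguments are assumptions.
var architecture = new NeuralNetworkArchitecture<float>(/* input shape, layer configuration, task type */);

// Create a ResNet with a custom learning rate and training schedule.
var resnet = new ResidualNeuralNetwork<float>(
    architecture,
    learningRate: 0.001f,
    epochs: 20,
    batchSize: 64);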
Properties
AuxiliaryLossWeight
Gets or sets the weight for the deep supervision auxiliary loss.
public T AuxiliaryLossWeight { get; set; }
Property Value
- T
Remarks
This weight controls how much the intermediate auxiliary classifiers contribute to the total loss. The total loss is: main_loss + (auxiliary_weight * auxiliary_loss). Typical values range from 0.1 to 0.5.
For Beginners: This controls how much the network should care about intermediate predictions.
The weight determines the balance between:
- Final output accuracy (main loss)
- Intermediate prediction accuracy (auxiliary loss)
Common values:
- 0.3 (default): Balanced contribution from intermediate classifiers
- 0.1-0.2: Less emphasis on intermediate predictions
- 0.4-0.5: More emphasis on intermediate predictions
Higher values make the network focus more on getting intermediate predictions correct, which can help with gradient flow but may slow convergence.
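A short sketch of tuning this weight on an existing network instance (resnet here is assumed to be a ResidualNeuralNetwork<float>):

// Enable deep supervision and give intermediate classifiers a modest influence.
resnet.UseAuxiliaryLoss = true;
resnet.AuxiliaryLossWeight = 0.3f; // total loss = main_loss + 0.3 * auxiliary_loss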
SupportsTraining
Indicates whether this network supports training (learning from data).
public override bool SupportsTraining { get; }
Property Value
- bool
Remarks
This property indicates whether the network is capable of learning from data through training. For ResidualNeuralNetwork, this property always returns true since the network is designed for training.
For Beginners: This tells you if the network can learn from data.
The Residual Neural Network supports training, which means:
- It can adjust its internal values based on examples
- It can improve its performance over time
- It can learn to recognize patterns in data
This property always returns true because ResNets are specifically designed to be trainable, even when they're very deep (many layers).
UseAuxiliaryLoss
Gets or sets a value indicating whether deep supervision auxiliary loss is enabled during training.
public bool UseAuxiliaryLoss { get; set; }
Property Value
- bool
Methods
AddAuxiliaryClassifier(ILayer<T>, int)
Adds an auxiliary classifier at the specified layer position for deep supervision.
public void AddAuxiliaryClassifier(ILayer<T> classifier, int layerPosition)
Parameters
classifier (ILayer<T>): The classifier layer to add for intermediate predictions.
layerPosition (int): The layer index where this classifier should be applied.
Remarks
Auxiliary classifiers enable deep supervision by providing additional training signals at intermediate layers. This helps with gradient flow and can improve training stability.
For Beginners: Think of auxiliary classifiers as "checkpoints" in your network. They make predictions at intermediate stages, helping the network learn better representations at each layer rather than only at the final output.
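For example (DenseLayer<float> is a hypothetical layer type standing in for any ILayer<T> that maps intermediate features to the output shape):

// Hypothetical classifier layer; substitute any suitable ILayer<float> implementation.
ILayer<float> auxClassifier = new DenseLayer<float>(/* intermediate feature size, output size */);

// Attach it at layer index 10 so the network receives a training signal mid-way through.
resnet.AddAuxiliaryClassifier(auxClassifier, layerPosition: 10);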
ComputeAuxiliaryLoss()
Computes the auxiliary loss for deep supervision from intermediate auxiliary classifiers.
public T ComputeAuxiliaryLoss()
Returns
- T
The computed deep supervision auxiliary loss.
Remarks
This method computes the auxiliary loss from intermediate classifiers placed at strategic positions in the network. For very deep ResNets, these intermediate classifiers help maintain strong gradient signals throughout the network during backpropagation.
For Beginners: This calculates how well the network's intermediate layers are learning.
Deep supervision works by:
- Adding small classifiers at intermediate points in the network
- Each classifier tries to predict the final output from intermediate features
- Computing loss for each intermediate prediction
- Averaging these losses to get the auxiliary loss
This helps because:
- It provides learning signals to earlier layers
- It prevents gradients from becoming too weak in deep networks
- It encourages intermediate layers to learn meaningful features
The auxiliary loss is combined with the main loss during training to guide learning.
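Train(Tensor<T>, Tensor<T>) applies this combination automatically; the sketch below only illustrates the formula from the remarks (mainLoss is a stand-in for your main loss value):

// Combine the main loss with the deep supervision signal.
float auxLoss = resnet.ComputeAuxiliaryLoss();
float totalLoss = mainLoss + resnet.AuxiliaryLossWeight * auxLoss; // main_loss + weight * auxiliary_loss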
CreateNewInstance()
Creates a new instance of the residual neural network with the same configuration.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A new instance of ResidualNeuralNetwork<T> with the same configuration as the current instance.
Remarks
This method creates a new residual neural network that has the same configuration as the current instance. It's used for model persistence, cloning, and transferring the model's configuration to new instances. The new instance will have the same architecture, learning rate, epochs, batch size, and loss function as the original, but will not share parameter values unless they are explicitly copied after creation.
For Beginners: This method makes a fresh copy of the current model with the same settings.
It's like creating a blueprint copy of your network that can be used to:
- Save your model's settings
- Create a new identical model
- Transfer your model's configuration to another system
This is useful when you want to:
- Create multiple similar residual neural networks
- Save a model's configuration for later use
- Reset a model while keeping its settings
Note that while the settings are copied, the learned parameters (like the weights for detecting features) are not automatically transferred, so the new instance will need training or parameter copying to match the performance of the original.
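Because CreateNewInstance() is protected, user code typically goes through the ICloneable implementation instead. A sketch, assuming ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>> exposes a Clone() method:

// Produce a structurally identical network; learned weights are not copied automatically.
IFullModel<float, Tensor<float>, Tensor<float>> fresh = resnet.Clone();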
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data for the Residual Neural Network.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader): The BinaryReader to read the data from.
Remarks
This method reads the training parameters specific to the Residual Neural Network from the provided BinaryReader. It restores the number of epochs, learning rate, and batch size, ensuring that the network's training configuration is accurately reconstructed during deserialization.
For Beginners: This method loads the special settings for training this ResNet.
It reads:
- The number of times to train on the entire dataset (epochs)
- How quickly the network learns from its mistakes (learning rate)
- How many examples the network looks at before updating (batch size)
Loading these settings ensures that you can continue training or use the network with the exact same configuration it had when it was saved.
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the deep supervision auxiliary loss.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about auxiliary losses.
Remarks
This method returns detailed diagnostics about the deep supervision system, including the number of auxiliary classifiers, their positions in the network, and the computed losses. This information is useful for monitoring training progress and debugging.
For Beginners: This provides information about how deep supervision is working.
The diagnostics include:
- Total auxiliary loss from all intermediate classifiers
- Weight applied to the auxiliary loss
- Number of auxiliary classifiers in the network
- Whether deep supervision is enabled
This helps you:
- Monitor if auxiliary classifiers are contributing to training
- Debug issues with deep supervision
- Understand the impact of intermediate supervision on learning
You can use this information to adjust the auxiliary loss weight or the placement of auxiliary classifiers for better training results.
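A minimal usage sketch (the exact key names in the returned dictionary may vary by version):

// Print every deep supervision diagnostic as a key/value pair.
foreach (var entry in resnet.GetAuxiliaryLossDiagnostics())
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}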
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Includes auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
public Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including auxiliary loss diagnostics.
GetModelMetadata()
Gets metadata about the Residual Neural Network model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing information about the model.
Remarks
This method returns metadata that describes the Residual Neural Network, including its type, architecture details, and training parameters. This information can be useful for model management, documentation, and versioning.
For Beginners: This provides a summary of your network's configuration.
The metadata includes:
- The type of model (Residual Neural Network)
- The number of layers in the network
- Information about the network's structure
- Training parameters like learning rate and epochs
This is useful for:
- Documenting your model
- Comparing different model configurations
- Reproducing your model setup later
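A usage sketch (the individual members of ModelMetadata<T> are not enumerated here, since they depend on your AiDotNet version):

// Retrieve the model summary for documentation or comparison.
ModelMetadata<float> metadata = resnet.GetModelMetadata();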
InitializeLayers()
Initializes the neural network layers based on the provided architecture or default configuration.
protected override void InitializeLayers()
Remarks
This method sets up the neural network layers for the Residual Neural Network. If the architecture provides specific layers, those are used. Otherwise, a default configuration optimized for ResNets is created. In a typical ResNet, this involves creating residual blocks that combine a main path with a shortcut path, allowing information to either pass through layers or bypass them.
For Beginners: This method sets up the building blocks of the neural network.
When initializing layers:
- If the user provided specific layers, those are used
- Otherwise, default layers suitable for ResNets are created automatically
- The system checks that any custom layers will work properly with the ResNet
A typical ResNet has specialized building blocks called "residual blocks" that contain:
- Convolutional layers that process the input
- Batch normalization layers that stabilize learning
- Activation layers that introduce non-linearity
- Shortcut connections that allow information to bypass these layers
These blocks are then stacked together, often with increasing complexity as you go deeper into the network.
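The computation a residual block performs can be summarized as output = F(x) + x, where F is the main path. A conceptual sketch, not the actual AiDotNet layer code (Convolution, BatchNorm, Activation, and Add are hypothetical helpers):

// Conceptual residual block: F(x) is the main path, x is the shortcut.
Tensor<float> ResidualBlock(Tensor<float> x)
{
    var mainPath = Activation(BatchNorm(Convolution(x))); // hypothetical main-path helpers
    return Add(mainPath, x);                              // element-wise sum with the shortcut
}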
Predict(Tensor<T>)
Makes a prediction using the Residual Neural Network.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to make a prediction for.
Returns
- Tensor<T>
The predicted output tensor.
Remarks
This method performs a forward pass through the network to generate a prediction based on the input tensor. The input flows through all layers sequentially, with residual connections allowing information to bypass certain layers where applicable. The output represents the network's prediction, which depends on the task (e.g., class probabilities for classification or continuous values for regression).
For Beginners: This method uses the network to make a prediction based on input data.
The prediction process works like this:
- Input data enters the network at the first layer
- The data passes through each layer in sequence
- At residual blocks, there are two paths:
- A main path through multiple processing layers
- A shortcut path that bypasses these layers
- The outputs from both paths are combined at the end of each block
- The final layer produces the prediction result
For example, in an image recognition task:
- The input might be an image
- Each layer detects increasingly complex patterns
- The shortcuts help information flow through the entire network
- The output tells you what the image contains
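A usage sketch (how you build the input Tensor<T> depends on your data pipeline; LoadImageAsTensor is a hypothetical helper):

// Hypothetical helper that converts an image file into a tensor.
Tensor<float> image = LoadImageAsTensor("cat.jpg");

// Run a forward pass; for classification the output typically holds one score per class.
Tensor<float> prediction = resnet.Predict(image);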
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data for the Residual Neural Network.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter): The BinaryWriter to write the data to.
Remarks
This method writes the training parameters specific to the Residual Neural Network to the provided BinaryWriter. These parameters include the number of epochs, learning rate, and batch size, which are crucial for reconstructing the network's training configuration during deserialization.
For Beginners: This method saves the special settings for training this ResNet.
It writes:
- The number of times to train on the entire dataset (epochs)
- How quickly the network learns from its mistakes (learning rate)
- How many examples the network looks at before updating (batch size)
These settings are important because they affect how the network learns and performs. Saving them allows you to recreate the exact same training setup later.
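A conceptual sketch of the payload this method writes and DeserializeNetworkSpecificData(BinaryReader) reads back. The field order and encoding shown are assumptions; the real layout is internal to AiDotNet:

// Assumed write order, mirrored on the read side:
writer.Write(epochs);                         // int: passes over the dataset
writer.Write(Convert.ToDouble(learningRate)); // learning rate, stored as a double
writer.Write(batchSize);                      // int: examples per parameter update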
Train(Tensor<T>, Tensor<T>)
Trains the Residual Neural Network on the provided data.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor<T>): The input training data.
expectedOutput (Tensor<T>): The expected output for the given input.
Remarks
This method trains the Residual Neural Network on the provided data for the specified number of epochs. It divides the data into batches and trains on each batch using backpropagation and gradient descent. The method tracks and reports the average loss for each epoch to monitor training progress. If deep supervision is enabled and auxiliary classifiers are configured, auxiliary losses from intermediate classifiers are included.
For Beginners: This method teaches the ResNet to recognize patterns in your data.
The training process works like this:
- Divides your data into smaller batches for efficient processing
- For each batch:
- Feeds the input data through the network
- Compares the prediction with the expected output
- Calculates how wrong the prediction was (the "loss")
- If deep supervision is enabled, also computes losses from intermediate classifiers
- Adjusts the network's parameters to reduce errors
- Repeats this process for multiple epochs (complete passes through the data)
The special residual connections in the ResNet help the error signals flow backward through the network more effectively, making it possible to train very deep networks that would otherwise suffer from the vanishing gradient problem.
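A usage sketch (LoadTrainingInputs and LoadTrainingLabels are hypothetical helpers for building the batched tensors):

// Hypothetical helpers that produce the training tensors.
Tensor<float> trainingInputs = LoadTrainingInputs();
Tensor<float> trainingLabels = LoadTrainingLabels();

// Runs the configured number of epochs with the configured batch size.
resnet.Train(trainingInputs, trainingLabels);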
UpdateParameters(Vector<T>)
Updates the parameters of the residual neural network layers.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The vector of parameter updates to apply.
Remarks
This method updates the parameters of each layer in the residual neural network based on the provided parameter updates. The parameters vector is divided into segments corresponding to each layer's parameter count, and each segment is applied to its respective layer. In a ResNet, these parameters typically include weights for convolutional layers, as well as parameters for batch normalization and other operations within residual blocks.
For Beginners: This method updates how the ResNet makes decisions based on training.
During training:
- The network learns by adjusting its internal parameters
- This method applies those adjustments
- Each layer gets the portion of updates meant specifically for it
For a ResNet, these adjustments might include:
- How each convolutional filter detects patterns
- How the batch normalization layers stabilize learning
- How information should flow through both the main and shortcut paths
The residual connections (shortcuts) make it easier for these updates to flow backward through the network during training, which helps very deep networks learn effectively.
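UpdateParameters is normally invoked by the training loop or an external optimizer rather than by hand. A sketch of direct use (ComputeParameterUpdates is a hypothetical optimizer step):

// The update vector's length must match the network's total parameter count.
Vector<float> updates = ComputeParameterUpdates(); // hypothetical optimizer step
resnet.UpdateParameters(updates);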