Class ActivationLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
A layer that applies an activation function to transform the input data.
Activation functions introduce non-linearity to neural networks. Non-linearity means the output isn't simply proportional to the input (like y = 2x); instead, it can follow curves or more complex patterns. Without an activation function, a network can only represent linear relationships, severely limiting what it can learn.
Common activation functions include:
- ReLU: Returns 0 for negative inputs, or the input value for positive inputs
- Sigmoid: Squashes values between 0 and 1, useful for probabilities
- Tanh: Similar to sigmoid but outputs values between -1 and 1
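For reference, the math behind these three functions can be written in plain C# (a standalone sketch, independent of AiDotNet's own activation types):

```csharp
using System;

static class ActivationMath
{
    // ReLU: 0 for negative inputs, the input itself for positive inputs.
    public static double ReLU(double x) => Math.Max(0.0, x);

    // Sigmoid: squashes any real value into the range (0, 1).
    public static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Tanh: squashes any real value into the range (-1, 1).
    public static double Tanh(double x) => Math.Tanh(x);
}
```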
public class ActivationLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations (like float, double, etc.)
- Inheritance
- LayerBase<T>
- ActivationLayer<T>
- Implements
- ILayer<T>
- IJitCompilable<T>
- IDiagnosticsProvider
- IWeightLoadable<T>
- IDisposable
Constructors
ActivationLayer(int[], IActivationFunction<T>)
public ActivationLayer(int[] inputShape, IActivationFunction<T> activationFunction)
Parameters
inputShape (int[]): The shape of the input data this layer receives.
activationFunction (IActivationFunction<T>): The scalar activation function this layer applies element-wise.
ActivationLayer(int[], IVectorActivationFunction<T>)
public ActivationLayer(int[] inputShape, IVectorActivationFunction<T> vectorActivationFunction)
Parameters
inputShape (int[]): The shape of the input data this layer receives.
vectorActivationFunction (IVectorActivationFunction<T>): The vector activation function this layer applies to the entire tensor at once.
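A minimal construction sketch. The ReLUActivation<double> type name below is hypothetical; substitute whichever IActivationFunction<T> implementation your build of AiDotNet provides:

```csharp
// Shape of the data this layer will receive, e.g. 128 features per sample.
int[] inputShape = { 128 };

// Hypothetical scalar ReLU implementation of IActivationFunction<double>;
// replace with the concrete activation type available in your build.
IActivationFunction<double> relu = new ReLUActivation<double>();

var layer = new ActivationLayer<double>(inputShape, relu);
```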
Properties
SupportsGpuExecution
Gets whether this layer's activation function supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
GPU execution is supported for common scalar activation functions that have dedicated GPU kernels: ReLU, LeakyReLU, Sigmoid, Tanh, GELU, and Swish.
For Beginners: This tells you if the activation function can run on GPU. Most common activations like ReLU and Sigmoid have GPU support. Exotic or vector activations (like Softmax) may not support GPU execution yet.
SupportsJitCompilation
Gets whether this activation layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the activation function supports JIT compilation, false otherwise.
Remarks
This property checks whether the configured activation function supports JIT compilation. Returns false if no activation is configured or if the activation doesn't support JIT.
For Beginners: This tells you if this layer can use JIT compilation for faster inference.
The layer can be JIT compiled if:
- The activation function (ReLU, Sigmoid, etc.) has JIT support implemented
- The activation's gradient computation is available
Common activations like ReLU, Sigmoid, and Tanh typically support JIT. Custom or exotic activations may not support it yet.
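A rough sketch of how calling code might use this flag (the JIT pipeline itself is assumed and not shown):

```csharp
if (layer.SupportsJitCompilation)
{
    // Hand the layer to the JIT path, e.g. via ExportComputationGraph (see below).
}
else
{
    // Fall back to the regular Forward/Backward execution path.
}
```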
SupportsTraining
Indicates whether this layer has trainable parameters.
Always returns false because activation layers don't have parameters to train. Unlike layers such as Dense/Convolutional layers which have weights and biases that need updating during training, activation layers simply apply a fixed mathematical function to their inputs.
public override bool SupportsTraining { get; }
Property Value
- bool
Remarks
This property overrides the base class property to specify that activation layers do not have trainable parameters. Trainable parameters are values within a layer that are adjusted during the training process to minimize the loss function. Since activation layers simply apply a fixed mathematical function to their inputs without any adjustable parameters, this property always returns false.
For Beginners: This tells you that activation layers don't learn or change during training.
While layers like Dense layers have weights that get updated during training, activation layers just apply a fixed mathematical formula that never changes.
Think of it like this:
- Dense layers are like adjustable knobs that the network learns to tune
- Activation layers are like fixed functions (like f(x) = max(0, x) for ReLU)
This property helps the training system know that it doesn't need to update anything in this layer during the training process.
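As an illustration, a hypothetical training-loop fragment could use this flag to skip layers that have nothing to update:

```csharp
// layers: a collection of LayerBase<double> instances (assumed to exist).
double learningRate = 0.01;
foreach (var layer in layers)
{
    if (layer.SupportsTraining)
    {
        layer.UpdateParameters(learningRate);
    }
    // ActivationLayer<double>.SupportsTraining is false, so it is skipped here.
}
```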
Methods
Backward(Tensor<T>)
Calculates how changes in the output affect the input during training.
This is called during the backward pass (backpropagation) when training the neural network. Backpropagation is the algorithm that determines how much each neuron contributed to the error in the network's prediction, allowing the network to adjust its parameters to reduce future errors.
For activation layers, the backward pass calculates how the gradient (rate of change) of the error with respect to the layer's output should be modified to get the gradient with respect to the layer's input. This involves applying the derivative of the activation function.
For example, with ReLU activation, the derivative is 1 for inputs that were positive, and 0 for inputs that were negative or zero. This means the gradient flows unchanged through positive activations but gets blocked (multiplied by zero) for negative activations.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): How much the network's error changes with respect to this layer's output
Returns
- Tensor<T>
How much the network's error changes with respect to this layer's input
Remarks
This method implements the backward pass for the activation layer. It checks that a forward pass has been performed and that the output gradient has the same shape as the input. Then it applies either the scalar or vector activation derivative based on the layer's configuration. For scalar activation, the derivative is applied element-wise and multiplied by the output gradient. For vector activation, the derivative tensor is multiplied by the output gradient.
For Beginners: This method calculates how the error gradient flows backward through this layer.
During backpropagation, the network calculates how each part contributed to the error. This method:
- Checks that Forward() was called first (we need the saved input)
- Verifies the gradient has the correct shape
- Calculates how the gradient changes as it passes through this layer
- Returns the modified gradient
For example, with ReLU activation:
- If the input was positive, the gradient passes through unchanged
- If the input was negative, the gradient is blocked (becomes 0)
This is because ReLU's derivative is 1 for positive inputs and 0 for negative inputs.
This process helps the network understand which neurons to adjust during training.
Exceptions
- ForwardPassRequiredException
Thrown if this method is called before the Forward method
- TensorShapeMismatchException
Thrown if the gradient shape doesn't match the input shape
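A sketch of a forward/backward round trip with a ReLU-configured layer. CreateTensor is a hypothetical helper for building a Tensor<double> from values; use whatever constructor or factory your Tensor<T> type actually exposes:

```csharp
// Hypothetical helper: builds a Tensor<double> holding the given values.
Tensor<double> input = CreateTensor(new double[] { -2, 0, 3, -1, 5 });

// Forward caches the input so Backward can use it later.
Tensor<double> output = layer.Forward(input);

// Gradient of the loss with respect to this layer's output
// (all ones here, purely for illustration; same shape as the input).
Tensor<double> outputGradient = CreateTensor(new double[] { 1, 1, 1, 1, 1 });

Tensor<double> inputGradient = layer.Backward(outputGradient);
// With ReLU: positions where the input was positive keep their gradient (1);
// positions where it was zero or negative receive 0 -> [0, 0, 1, 0, 1].
```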
BackwardGpu(IGpuTensor<T>)
Performs GPU-resident backward pass for the activation layer. Computes gradient with respect to input entirely on GPU - no CPU roundtrip.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): GPU-resident gradient from the next layer.
Returns
- IGpuTensor<T>
GPU-resident gradient to pass to the previous layer.
Exceptions
- InvalidOperationException
Thrown if ForwardGpu was not called first.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the activation layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the activation function applied to the input.
Remarks
This method constructs a computation graph representation of the activation layer by:
1. Validating input parameters and layer configuration
2. Creating a symbolic input node with proper batch dimension
3. Applying the activation function to the symbolic input
For Beginners: This method converts the activation layer into a computation graph for JIT compilation.
The computation graph describes:
- Input: A symbolic tensor with batch size = 1 plus the layer's input shape
- Operation: Apply the activation function (ReLU, Sigmoid, etc.)
- Output: The activated tensor
JIT compilation can make inference 5-10x faster by optimizing this graph into native code.
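A sketch of calling the documented export method; what happens to the resulting graph afterwards depends on the JIT infrastructure and is not shown:

```csharp
using System.Collections.Generic;

var inputNodes = new List<ComputationNode<double>>();
ComputationNode<double> outputNode = layer.ExportComputationGraph(inputNodes);

// inputNodes now contains the symbolic input node(s);
// outputNode represents the activation applied to that symbolic input.
```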
Forward(Tensor<T>)
Processes the input data by applying the activation function.
This is called during the forward pass of the neural network, which is when data flows from the input layer through all hidden layers to the output layer. The forward pass is used both during training and when making predictions with a trained model.
For example, if using ReLU activation, this method would replace all negative values in the input with zeros while keeping positive values unchanged.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input data to process
Returns
- Tensor<T>
The transformed data after applying the activation function
Remarks
This method implements the forward pass for the activation layer. It stores the input tensor for later use in the backward pass, then applies either a scalar or vector activation function based on the layer's configuration. For scalar activation, the function is applied to each element independently. For vector activation, the function is applied to the entire tensor at once.
For Beginners: This method applies the activation function to transform the input data.
During the forward pass, data flows through the network from input to output. This method:
- Saves the input for later use in backpropagation
- Applies the activation function to transform the data
- Returns the transformed data
For example, with ReLU activation:
- Input: [-2, 0, 3, -1, 5]
- Output: [0, 0, 3, 0, 5] (negative values become 0)
This transformation adds non-linearity to the network, which is essential for learning complex patterns in the data.
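A sketch mirroring the ReLU example above (CreateTensor is again a hypothetical helper for building a Tensor<double>):

```csharp
Tensor<double> input = CreateTensor(new double[] { -2, 0, 3, -1, 5 });

Tensor<double> output = reluLayer.Forward(input);
// Expected contents: [0, 0, 3, 0, 5] - negative values are clamped to zero.
```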
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass on GPU using GPU-accelerated activation kernels.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensor(s) to transform.
Returns
- IGpuTensor<T>
A GPU tensor with the activation function applied.
Remarks
This method applies the activation function entirely on the GPU using optimized kernels. Supported activations: ReLU, LeakyReLU, Sigmoid, Tanh, GELU, Swish.
For Beginners: The GPU version of activation is much faster for large tensors because GPUs can process thousands of values in parallel.
Exceptions
- InvalidOperationException
Thrown when GPU execution is not supported for this activation type.
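A sketch of a GPU-resident round trip. Creating and uploading IGpuTensor<T> values depends on the GPU backend and is not shown; gpuInput and gpuOutputGradient are assumed to already live on the GPU:

```csharp
// Forward and backward both stay on the GPU - no CPU roundtrip.
IGpuTensor<double> gpuOutput = layer.ForwardGpu(gpuInput);
IGpuTensor<double> gpuInputGradient = layer.BackwardGpu(gpuOutputGradient);

// An InvalidOperationException is thrown if the configured activation has no
// GPU kernel, or if BackwardGpu is called before ForwardGpu.
```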
GetParameters()
Gets all trainable parameters of this layer as a flat vector.
This method is useful for operations that need to work with all parameters at once, such as certain optimization algorithms, regularization techniques, or when saving a model.
Returns an empty vector since activation layers have no trainable parameters. Other layer types like Dense layers would return their weights and biases.
public override Vector<T> GetParameters()
Returns
- Vector<T>
An empty vector representing the layer's parameters
Remarks
This method returns all trainable parameters of the layer as a flat vector. For layers with trainable parameters, this would involve reshaping multi-dimensional parameters (like weight matrices) into a one-dimensional vector. However, since activation layers have no trainable parameters, this method returns an empty vector.
For Beginners: This method returns all the layer's trainable values as a single list, but activation layers have none.
Some operations in neural networks need to work with all parameters at once:
- Saving and loading models
- Applying regularization (techniques to prevent overfitting)
- Using advanced optimization algorithms
This method provides those parameters as a single vector, but since activation layers don't have any trainable parameters, it returns an empty vector.
For comparison:
- A Dense layer with 100 inputs and 10 outputs would return a vector with 1,010 values (1,000 weights + 10 biases)
- This ActivationLayer returns an empty vector with 0 values
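For example:

```csharp
Vector<double> parameters = layer.GetParameters();
// For an ActivationLayer<T> this vector is always empty,
// because the layer exposes no weights or biases.
```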
ResetState()
Clears the layer's memory of previous inputs.
Neural networks maintain state between operations, especially during training. This method resets that state, which is useful in several scenarios:
- When starting to process a new batch of data
- Between training epochs
- When switching from training to evaluation mode
- When you want to ensure the layer behaves deterministically
For activation layers, this means forgetting the last input that was processed, which was stored to help with the backward pass calculations.
public override void ResetState()
Remarks
This method resets the internal state of the layer by clearing the cached input tensor. The activation layer stores the input from the most recent forward pass to use during the backward pass for calculating gradients. Resetting this state is useful when starting to process new data or when you want to ensure the layer behaves deterministically.
For Beginners: This method clears the layer's memory of previous calculations.
During training, the layer remembers the last input it processed to help with backpropagation calculations. This method makes the layer "forget" that input.
You might need to reset state:
- When starting a new batch of training data
- Between training epochs
- When switching from training to testing
- When you want to ensure consistent behavior
For activation layers, this is simple - it just clears the saved input tensor. Other layer types might have more complex state to reset.
This helps ensure that processing one batch doesn't accidentally affect the processing of the next batch.
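A sketch of where a reset typically fits in a batch loop (the data pipeline is assumed and is not part of this layer's API):

```csharp
foreach (Tensor<double> batch in batches)   // batches: your own data source, assumed
{
    Tensor<double> output = layer.Forward(batch);
    // ... backward pass and parameter updates for other layers ...
    layer.ResetState();                     // forget the cached input before the next batch
}
```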
UpdateParameters(T)
Updates the layer's internal parameters during training.
This method is part of the training process where layers adjust their parameters (weights and biases) based on the gradients calculated during backpropagation.
For activation layers, this method does nothing because they have no trainable parameters. Unlike layers such as Dense layers which need to update their weights and biases, activation layers simply apply a fixed mathematical function.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): How quickly the network should learn from new data. Higher values mean bigger parameter updates.
Remarks
This method is called during the training process after the forward and backward passes have been completed. For layers with trainable parameters, this method would update those parameters based on the gradients calculated during backpropagation and the provided learning rate. However, since activation layers have no trainable parameters, this method does nothing.
For Beginners: This method would update the layer's internal values during training, but activation layers have nothing to update.
In neural networks, training involves adjusting parameters to reduce errors. This method is where those adjustments happen, but activation layers don't have any adjustable parameters, so this method is empty.
For comparison:
- In a Dense layer, this would update weights and biases
- In a BatchNorm layer, this would update scale and shift parameters
- In this ActivationLayer, there's nothing to update
The learning rate parameter controls how big the updates would be if there were any parameters to update - higher values mean bigger changes.
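For completeness, calling the method is harmless but has no effect:

```csharp
double learningRate = 0.01;
layer.UpdateParameters(learningRate);   // no trainable parameters, so nothing changes
```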