Class ActivationLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
A layer that applies an activation function to transform the input data.
Activation functions introduce non-linearity to neural networks. Non-linearity means the output isn't simply proportional to the input (like y = 2x); instead, it can follow curves or more complex patterns. Without an activation function, a network can only represent linear relationships, severely limiting what it can learn.
Common activation functions include:
- ReLU: Returns 0 for negative inputs, or the input value for positive inputs
- Sigmoid: Squashes values between 0 and 1, useful for probabilities
- Tanh: Similar to sigmoid but outputs values between -1 and 1
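For reference, the math behind these three functions can be written in plain C# (a standalone sketch, independent of AiDotNet's own activation types):

```csharp
using System;

static class ActivationMath
{
    // ReLU: 0 for negative inputs, the input itself for positive inputs.
    public static double ReLU(double x) => Math.Max(0.0, x);

    // Sigmoid: squashes any real value into the range (0, 1).
    public static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Tanh: squashes any real value into the range (-1, 1).
    public static double Tanh(double x) => Math.Tanh(x);
}
```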
public class ActivationLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations (like float, double, etc.)
- Inheritance
- LayerBase<T>
- ActivationLayer<T>
- Implements
- ILayer<T>
- IJitCompilable<T>
- IDiagnosticsProvider
- IWeightLoadable<T>
- IDisposable
Constructors
ActivationLayer(int[], IActivationFunction<T>)
public ActivationLayer(int[] inputShape, IActivationFunction<T> activationFunction)
Parameters
inputShape (int[]): The shape of the input data this layer receives.
activationFunction (IActivationFunction<T>): The scalar activation function this layer applies element-wise.
ActivationLayer(int[], IVectorActivationFunction<T>)
public ActivationLayer(int[] inputShape, IVectorActivationFunction<T> vectorActivationFunction)
Parameters
inputShape (int[]): The shape of the input data this layer receives.
vectorActivationFunction (IVectorActivationFunction<T>): The vector activation function this layer applies to the entire tensor at once.
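A minimal construction sketch. The ReLUActivation<double> type name below is hypothetical; substitute whichever IActivationFunction<T> implementation your build of AiDotNet provides:

```csharp
// Shape of the data this layer will receive, e.g. 128 features per sample.
int[] inputShape = { 128 };

// Hypothetical scalar ReLU implementation of IActivationFunction<double>;
// replace with the concrete activation type available in your build.
IActivationFunction<double> relu = new ReLUActivation<double>();

var layer = new ActivationLayer<double>(inputShape, relu);
```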
Properties
SupportsGpuExecution
Gets whether this layer's activation function supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
GPU execution is supported for common scalar activation functions that have dedicated GPU kernels: ReLU, LeakyReLU, Sigmoid, Tanh, GELU, and Swish.
For Beginners: This tells you if the activation function can run on GPU. Most common activations like ReLU and Sigmoid have GPU support. Exotic or vector activations (like Softmax) may not support GPU execution yet.
SupportsJitCompilation
Gets whether this activation layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the activation function supports JIT compilation, false otherwise.
Remarks
This property checks whether the configured activation function supports JIT compilation. Returns false if no activation is configured or if the activation doesn't support JIT.
For Beginners: This tells you if this layer can use JIT compilation for faster inference.
The layer can be JIT compiled if:
- The activation function (ReLU, Sigmoid, etc.) has JIT support implemented
- The activation's gradient computation is available
Common activations like ReLU, Sigmoid, and Tanh typically support JIT. Custom or exotic activations may not support it yet.
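A rough sketch of how calling code might use this flag (the JIT pipeline itself is assumed and not shown):

```csharp
if (layer.SupportsJitCompilation)
{
    // Hand the layer to the JIT path, e.g. via ExportComputationGraph (see below).
}
else
{
    // Fall back to the regular Forward/Backward execution path.
}
```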
SupportsTraining
Indicates whether this layer has trainable parameters.
Always returns false because activation layers don't have parameters to train. Unlike layers such as Dense/Convolutional layers which have weights and biases that need updating during training, activation layers simply apply a fixed mathematical function to their inputs.
public override bool SupportsTraining { get; }
Property Value
- bool
Remarks
This property overrides the base class property to specify that activation layers do not have trainable parameters. Trainable parameters are values within a layer that are adjusted during the training process to minimize the loss function. Since activation layers simply apply a fixed mathematical function to their inputs without any adjustable parameters, this property always returns false.
For Beginners: This tells you that activation layers don't learn or change during training.
While layers like Dense layers have weights that get updated during training, activation layers just apply a fixed mathematical formula that never changes.
Think of it like this:
- Dense layers are like adjustable knobs that the network learns to tune
- Activation layers are like fixed functions (like f(x) = max(0, x) for ReLU)
This property helps the training system know that it doesn't need to update anything in this layer during the training process.
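As an illustration, a hypothetical training-loop fragment could use this flag to skip layers that have nothing to update:

```csharp
// layers: a collection of LayerBase<double> instances (assumed to exist).
double learningRate = 0.01;
foreach (var layer in layers)
{
    if (layer.SupportsTraining)
    {
        layer.UpdateParameters(learningRate);
    }
    // ActivationLayer<double>.SupportsTraining is false, so it is skipped here.
}
```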
Methods
Backward(Tensor<T>)
Calculates how changes in the output affect the input during training.
This is called during the backward pass (backpropagation) when training the neural network. Backpropagation is the algorithm that determines how much each neuron contributed to the error in the network's prediction, allowing the network to adjust its parameters to reduce future errors.
For activation layers, the backward pass calculates how the gradient (rate of change) of the error with respect to the layer's output should be modified to get the gradient with respect to the layer's input. This involves applying the derivative of the activation function.
For example, with ReLU activation, the derivative is 1 for inputs that were positive, and 0 for inputs that were negative or zero. This means the gradient flows unchanged through positive activations but gets blocked (multiplied by zero) for negative activations.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): How much the network's error changes with respect to this layer's output
Returns
- Tensor<T>
How much the network's error changes with respect to this layer's input
Remarks
This method implements the backward pass for the activation layer. It checks that a forward pass has been performed and that the output gradient has the same shape as the input. Then it applies either the scalar or vector activation derivative based on the layer's configuration. For scalar activation, the derivative is applied element-wise and multiplied by the output gradient. For vector activation, the derivative tensor is multiplied by the output gradient.
For Beginners: This method calculates how the error gradient flows backward through this layer.
During backpropagation, the network calculates how each part contributed to the error. This method:
- Checks that Forward() was called first (we need the saved input)
- Verifies the gradient has the correct shape
- Calculates how the gradient changes as it passes through this layer
- Returns the modified gradient
For example, with ReLU activation:
- If the input was positive, the gradient passes through unchanged
- If the input was negative, the gradient is blocked (becomes 0)
This is because ReLU's derivative is 1 for positive inputs and 0 for negative inputs.
This process helps the network understand which neurons to adjust during training.
Exceptions
- ForwardPassRequiredException
Thrown if this method is called before the Forward method
- TensorShapeMismatchException
Thrown if the gradient shape doesn't match the input shape
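A sketch of a forward/backward round trip with a ReLU-configured layer. CreateTensor is a hypothetical helper for building a Tensor<double> from values; use whatever constructor or factory your Tensor<T> type actually exposes:

```csharp
// Hypothetical helper: builds a Tensor<double> holding the given values.
Tensor<double> input = CreateTensor(new double[] { -2, 0, 3, -1, 5 });

// Forward caches the input so Backward can use it later.
Tensor<double> output = layer.Forward(input);

// Gradient of the loss with respect to this layer's output
// (all ones here, purely for illustration; same shape as the input).
Tensor<double> outputGradient = CreateTensor(new double[] { 1, 1, 1, 1, 1 });

Tensor<double> inputGradient = layer.Backward(outputGradient);
// With ReLU: positions where the input was positive keep their gradient (1);
// positions where it was zero or negative receive 0 -> [0, 0, 1, 0, 1].
```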
BackwardGpu(IGpuTensor<T>)
Performs GPU-resident backward pass for the activation layer. Computes gradient with respect to input entirely on GPU - no CPU roundtrip.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): GPU-resident gradient from the next layer.
Returns
- IGpuTensor<T>
GPU-resident gradient to pass to the previous layer.
Exceptions
- InvalidOperationException
Thrown if ForwardGpu was not called first.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the activation layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the activation function applied to the input.
Remarks
This method constructs a computation graph representation of the activation layer by:
1. Validating input parameters and layer configuration
2. Creating a symbolic input node with proper batch dimension
3. Applying the activation function to the symbolic input
For Beginners: This method converts the activation layer into a computation graph for JIT compilation.
The computation graph describes:
- Input: A symbolic tensor with batch size = 1 plus the layer's input shape
- Operation: Apply the activation function (ReLU, Sigmoid, etc.)
- Output: The activated tensor
JIT compilation can make inference 5-10x faster by optimizing this graph into native code.
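A sketch of calling the documented export method; what happens to the resulting graph afterwards depends on the JIT infrastructure and is not shown:

```csharp
using System.Collections.Generic;

var inputNodes = new List<ComputationNode<double>>();
ComputationNode<double> outputNode = layer.ExportComputationGraph(inputNodes);

// inputNodes now contains the symbolic input node(s);
// outputNode represents the activation applied to that symbolic input.
```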
Forward(Tensor<T>)
Processes the input data by applying the activation function.
This is called during the forward pass of the neural network, which is when data flows from the input layer through all hidden layers to the output layer. The forward pass is used both during training and when making predictions with a trained model.
For example, if using ReLU activation, this method would replace all negative values in the input with zeros while keeping positive values unchanged.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input data to process
Returns
- Tensor<T>
The transformed data after applying the activation function
Remarks
This method implements the forward pass for the activation layer. It stores the input tensor for later use in the backward pass, then applies either a scalar or vector activation function based on the layer's configuration. For scalar activation, the function is applied to each element independently. For vector activation, the function is applied to the entire tensor at once.
For Beginners: This method applies the activation function to transform the input data.
During the forward pass, data flows through the network from input to output. This method:
- Saves the input for later use in backpropagation
- Applies the activation function to transform the data
- Returns the transformed data
For example, with ReLU activation:
- Input: [-2, 0, 3, -1, 5]
- Output: [0, 0, 3, 0, 5] (negative values become 0)
This transformation adds non-linearity to the network, which is essential for learning complex patterns in the data.
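A sketch mirroring the ReLU example above (CreateTensor is again a hypothetical helper for building a Tensor<double>):

```csharp
Tensor<double> input = CreateTensor(new double[] { -2, 0, 3, -1, 5 });

Tensor<double> output = reluLayer.Forward(input);
// Expected contents: [0, 0, 3, 0, 5] - negative values are clamped to zero.
```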
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass on GPU using GPU-accelerated activation kernels.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensor(s) to transform.
Returns
- IGpuTensor<T>
A GPU tensor with the activation function applied.
Remarks
This method applies the activation function entirely on the GPU using optimized kernels. Supported activations: ReLU, LeakyReLU, Sigmoid, Tanh, GELU, Swish.
For Beginners: The GPU version of activation is much faster for large tensors because GPUs can process thousands of values in parallel.
Exceptions
- InvalidOperationException
Thrown when GPU execution is not supported for this activation type.
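A sketch of a GPU-resident round trip. Creating and uploading IGpuTensor<T> values depends on the GPU backend and is not shown; gpuInput and gpuOutputGradient are assumed to already live on the GPU:

```csharp
// Forward and backward both stay on the GPU - no CPU roundtrip.
IGpuTensor<double> gpuOutput = layer.ForwardGpu(gpuInput);
IGpuTensor<double> gpuInputGradient = layer.BackwardGpu(gpuOutputGradient);

// An InvalidOperationException is thrown if the configured activation has no
// GPU kernel, or if BackwardGpu is called before ForwardGpu.
```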
GetParameters()
Gets all trainable parameters of this layer as a flat vector.
This method is useful for operations that need to work with all parameters at once, such as certain optimization algorithms, regularization techniques, or when saving a model.
Returns an empty vector since activation layers have no trainable parameters. Other layer types like Dense layers would return their weights and biases.
public override Vector<T> GetParameters()
Returns
- Vector<T>
An empty vector representing the layer's parameters
Remarks
This method returns all trainable parameters of the layer as a flat vector. For layers with trainable parameters, this would involve reshaping multi-dimensional parameters (like weight matrices) into a one-dimensional vector. However, since activation layers have no trainable parameters, this method returns an empty vector.
For Beginners: This method returns all the layer's trainable values as a single list, but activation layers have none.
Some operations in neural networks need to work with all parameters at once:
- Saving and loading models
- Applying regularization (techniques to prevent overfitting)
- Using advanced optimization algorithms
This method provides those parameters as a single vector, but since activation layers don't have any trainable parameters, it returns an empty vector.
For comparison:
- A Dense layer with 100 inputs and 10 outputs would return a vector with 1,010 values (1,000 weights + 10 biases)
- This ActivationLayer returns an empty vector with 0 values
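For example:

```csharp
Vector<double> parameters = layer.GetParameters();
// For an ActivationLayer<T> this vector is always empty,
// because the layer exposes no weights or biases.
```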
ResetState()
Clears the layer's memory of previous inputs.
Neural networks maintain state between operations, especially during training. This method resets that state, which is useful in several scenarios:
- When starting to process a new batch of data
- Between training epochs
- When switching from training to evaluation mode
- When you want to ensure the layer behaves deterministically
For activation layers, this means forgetting the last input that was processed, which was stored to help with the backward pass calculations.
public override void ResetState()
Remarks
This method resets the internal state of the layer by clearing the cached input tensor. The activation layer stores the input from the most recent forward pass to use during the backward pass for calculating gradients. Resetting this state is useful when starting to process new data or when you want to ensure the layer behaves deterministically.
For Beginners: This method clears the layer's memory of previous calculations.
During training, the layer remembers the last input it processed to help with backpropagation calculations. This method makes the layer "forget" that input.
You might need to reset state:
- When starting a new batch of training data
- Between training epochs
- When switching from training to testing
- When you want to ensure consistent behavior
For activation layers, this is simple - it just clears the saved input tensor. Other layer types might have more complex state to reset.
This helps ensure that processing one batch doesn't accidentally affect the processing of the next batch.
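A sketch of where a reset typically fits in a batch loop (the data pipeline is assumed and is not part of this layer's API):

```csharp
foreach (Tensor<double> batch in batches)   // batches: your own data source, assumed
{
    Tensor<double> output = layer.Forward(batch);
    // ... backward pass and parameter updates for other layers ...
    layer.ResetState();                     // forget the cached input before the next batch
}
```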
UpdateParameters(T)
Updates the layer's internal parameters during training.
This method is part of the training process where layers adjust their parameters (weights and biases) based on the gradients calculated during backpropagation.
For activation layers, this method does nothing because they have no trainable parameters. Unlike layers such as Dense layers which need to update their weights and biases, activation layers simply apply a fixed mathematical function.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): How quickly the network should learn from new data. Higher values mean bigger parameter updates.
Remarks
This method is called during the training process after the forward and backward passes have been completed. For layers with trainable parameters, this method would update those parameters based on the gradients calculated during backpropagation and the provided learning rate. However, since activation layers have no trainable parameters, this method does nothing.
For Beginners: This method would update the layer's internal values during training, but activation layers have nothing to update.
In neural networks, training involves adjusting parameters to reduce errors. This method is where those adjustments happen, but activation layers don't have any adjustable parameters, so this method is empty.
For comparison:
- In a Dense layer, this would update weights and biases
- In a BatchNorm layer, this would update scale and shift parameters
- In this ActivationLayer, there's nothing to update
The learning rate parameter controls how big the updates would be if there were any parameters to update - higher values mean bigger changes.
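For completeness, calling the method is harmless but has no effect:

```csharp
double learningRate = 0.01;
layer.UpdateParameters(learningRate);   // no trainable parameters, so nothing changes
```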