Interface IActivationFunction<T>
- Namespace: AiDotNet.Interfaces
- Assembly: AiDotNet.dll
Defines an interface for activation functions used in neural networks and other machine learning algorithms.
public interface IActivationFunction<T>
Type Parameters
- T: The numeric type used for calculations (e.g., double, float).
Remarks
For Beginners: An activation function is like a decision-maker in a neural network.
Imagine each neuron (node) in a neural network receives a number as input. The activation function decides how strongly that neuron should "fire" or activate based on that input.
For example:
- If the input is very negative, the neuron might not activate at all (output = 0)
- If the input is very positive, the neuron might activate fully (output = 1)
- If the input is around zero, the neuron might activate partially
Different activation functions create different patterns of activation, which helps neural networks learn different types of patterns in data. Common activation functions include Sigmoid, ReLU (Rectified Linear Unit), and Tanh (Hyperbolic Tangent).
This interface defines the standard methods that all activation functions must implement.
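As a standalone illustration (plain C#, not AiDotNet code), the sketch below shows the rules that ReLU and Sigmoid implementations of this interface would encode for a single value; the class and method names are illustrative only.

using System;

static class ActivationRules
{
    // ReLU: negative inputs are clamped to zero, positive inputs pass through unchanged.
    static double ReLU(double x) => Math.Max(0.0, x);

    // Sigmoid: any input is squashed into the (0, 1) range.
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    static void Main()
    {
        Console.WriteLine(ReLU(-2.0));   // 0
        Console.WriteLine(ReLU(3.5));    // 3.5
        Console.WriteLine(Sigmoid(0.0)); // 0.5
    }
}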
Properties
SupportsGpuTraining
Gets whether this activation function supports GPU-resident training.
bool SupportsGpuTraining { get; }
Property Value
- bool
True if the activation can perform forward and backward passes entirely on GPU.
Remarks
Activation functions return false if:
- The GPU backend does not have kernels for this activation type
- The activation has dynamic behavior that cannot be executed on GPU
For Beginners: GPU-resident training keeps all data on the graphics card during training, avoiding slow data transfers between CPU and GPU memory. This property indicates whether this activation function can participate in GPU-resident training.
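For example, a layer can use this flag to choose its execution path. In the sketch below, activation is any IActivationFunction<T> instance; the branch bodies are comments because buffer handling is backend-specific and is covered by the GPU methods documented later on this page.

// Choosing an execution path based on the flag (sketch; only documented members are referenced).
if (activation.SupportsGpuTraining)
{
    // Keep tensors on the device and use ForwardGpu / BackwardGpu (documented below).
}
else
{
    // Use the CPU overloads: Activate(Tensor<T>) and Backward(Tensor<T>, Tensor<T>).
}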
SupportsJitCompilation
Gets whether this activation function supports JIT compilation.
bool SupportsJitCompilation { get; }
Property Value
- bool
True if the activation can be applied to computation graphs for JIT compilation.
Remarks
Activation functions return false if:
- Gradient computation (backward pass) is not yet implemented
- The activation uses operations not supported by TensorOperations
- The activation has dynamic behavior that cannot be represented in a static graph
Once gradient computation is implemented and tested, an implementation should set this property to true.
For Beginners: JIT (Just-In-Time) compilation is an advanced optimization technique that pre-compiles the neural network's operations into a faster execution graph. This property indicates whether this activation function is ready to be part of that optimized execution. If false, the activation will fall back to the standard execution path.
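In the same spirit, graph-building code can consult this flag before trying to add the activation to a computation graph. The sketch below uses only members documented on this page; node construction and using directives are not shown.

// Sketch: add the activation to a JIT computation graph only when it is supported.
ComputationNode<T> AddActivation<T>(IActivationFunction<T> activation, ComputationNode<T> node)
{
    if (activation.SupportsJitCompilation)
    {
        return activation.ApplyToGraph(node); // becomes part of the compiled graph
    }
    return node; // caller falls back to the standard (eager) execution path instead
}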
Methods
Activate(Tensor<T>)
Applies the activation function to each element in a tensor.
Tensor<T> Activate(Tensor<T> input)
Parameters
- input (Tensor<T>): The input tensor.
Returns
- Tensor<T>
A new tensor with the activation function applied to each element.
Activate(Vector<T>)
Applies the activation function to each element in a vector.
Vector<T> Activate(Vector<T> input)
Parameters
- input (Vector<T>): The input vector.
Returns
- Vector<T>
A new vector with the activation function applied to each element.
Activate(T)
Applies the activation function to the input value.
T Activate(T input)
Parameters
- input (T): The input value to the activation function.
Returns
- T
The activated output value.
Remarks
For Beginners: This method takes a number (which could be positive, negative, or zero) and transforms it according to the specific activation function's rule.
For example, with the ReLU activation function:
- If input is negative, output is 0
- If input is positive, output is the same as the input
This transformation helps neural networks model complex, non-linear relationships in data, which is essential for tasks like image recognition or language processing.
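Hypothetical usage (the concrete class name ReLUActivation<T> is an assumption for illustration; the actual implementation names in AiDotNet may differ):

IActivationFunction<double> relu = new ReLUActivation<double>(); // hypothetical concrete type
double a = relu.Activate(-2.0); // 0.0  (negative input does not activate)
double b = relu.Activate(3.0);  // 3.0  (positive input passes through)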
ApplyToGraph(ComputationNode<T>)
Applies this activation function to a computation graph node.
ComputationNode<T> ApplyToGraph(ComputationNode<T> input)
Parameters
- input (ComputationNode<T>): The computation node to apply the activation to.
Returns
- ComputationNode<T>
A new computation node with the activation applied.
Remarks
This method maps the activation to the corresponding TensorOperations method. For example, ReLU returns TensorOperations<T>.ReLU(input).
For Beginners: This method adds the activation function to the computation graph, which is a data structure that represents all the operations in the neural network. The graph can then be optimized and executed more efficiently through JIT compilation.
Exceptions
- NotSupportedException
Thrown if SupportsJitCompilation is false.
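Based on the mapping described above, a ReLU implementation's ApplyToGraph might look like the sketch below. The TensorOperations<T>.ReLU call is the mapping this page cites; the guard clause and surrounding structure are illustrative.

public ComputationNode<T> ApplyToGraph(ComputationNode<T> input)
{
    if (!SupportsJitCompilation)
        throw new NotSupportedException("This activation does not support JIT compilation.");

    // Map the activation to its TensorOperations equivalent so it becomes a graph node.
    return TensorOperations<T>.ReLU(input);
}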
Backward(Tensor<T>, Tensor<T>)
Calculates the backward pass gradient for this activation function.
Tensor<T> Backward(Tensor<T> input, Tensor<T> outputGradient)
Parameters
- input (Tensor<T>): The input tensor that was used in the forward pass.
- outputGradient (Tensor<T>): The gradient flowing back from the next layer.
Returns
- Tensor<T>
The gradient with respect to the input.
Remarks
This method computes dL/dx = dL/dy * dy/dx. For element-wise activations (ReLU, Sigmoid), this is an element-wise multiplication of the output gradient by the activation's derivative. For vector activations (Softmax), it involves multiplication by the Jacobian matrix.
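For an element-wise activation, the chain rule reduces to multiplying the incoming gradient by the local derivative at each position. A standalone sketch over plain arrays (not AiDotNet types), using ReLU as the example:

using System;

static class ElementwiseBackward
{
    // dL/dx[i] = dL/dy[i] * dy/dx[i]; for ReLU, dy/dx is 1 where x > 0 and 0 elsewhere.
    static double[] Backward(double[] input, double[] outputGradient)
    {
        var gradInput = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            double derivative = input[i] > 0 ? 1.0 : 0.0;
            gradInput[i] = outputGradient[i] * derivative;
        }
        return gradInput;
    }

    static void Main()
    {
        double[] x = { -1.0, 2.0, 0.5 };
        double[] gradOut = { 0.3, 0.3, 0.3 };
        Console.WriteLine(string.Join(", ", Backward(x, gradOut))); // 0, 0.3, 0.3
    }
}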
BackwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer?, IGpuBuffer?, IGpuBuffer, int)
Calculates the backward pass gradient on GPU.
void BackwardGpu(IDirectGpuBackend backend, IGpuBuffer gradOutput, IGpuBuffer? input, IGpuBuffer? output, IGpuBuffer gradInput, int size)
Parameters
- backend (IDirectGpuBackend): The GPU backend to use for execution.
- gradOutput (IGpuBuffer): The gradient flowing back from the next layer.
- input (IGpuBuffer?): The input buffer from the forward pass (needed for ReLU, GELU, Swish, LeakyReLU).
- output (IGpuBuffer?): The output buffer from the forward pass (needed for Sigmoid, Tanh).
- gradInput (IGpuBuffer): The output buffer in which to store the input gradient.
- size (int): The number of elements to process.
Remarks
This method computes the activation gradient entirely on GPU. Different activation functions require different cached values from the forward pass:
- ReLU, LeakyReLU, GELU, Swish: need the input from the forward pass
- Sigmoid, Tanh: need the output from the forward pass
For Beginners: During training, we need to compute gradients to update the network. This method computes the gradient of the activation function on GPU, which is essential for efficient GPU-resident training.
Exceptions
- NotSupportedException
Thrown if SupportsGpuTraining is false.
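A hedged call sketch, showing which cached buffer each activation family needs. The backend, buffers, and element count are assumed to already exist and are not created here; only the documented signature is used.

// ReLU-family activations need the forward-pass input; pass null for output:
activation.BackwardGpu(backend, gradOutput, forwardInput, null, gradInput, size);

// Sigmoid/Tanh-family activations need the forward-pass output instead:
// activation.BackwardGpu(backend, gradOutput, null, forwardOutput, gradInput, size);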
Derivative(Tensor<T>)
Calculates the derivative for each element in a tensor.
Tensor<T> Derivative(Tensor<T> input)
Parameters
- input (Tensor<T>): The input tensor.
Returns
- Tensor<T>
A new tensor containing derivatives for each input element.
Derivative(Vector<T>)
Calculates the derivative matrix for a vector input.
Matrix<T> Derivative(Vector<T> input)
Parameters
- input (Vector<T>): The input vector.
Returns
- Matrix<T>
A diagonal matrix containing derivatives for each input element.
Derivative(T)
Calculates the derivative (slope) of the activation function at the given input value.
T Derivative(T input)
Parameters
- input (T): The input value at which to calculate the derivative.
Returns
- T
The derivative value of the activation function at the input point.
Remarks
For Beginners: The derivative tells us how quickly the activation function's output changes when we make a small change to the input.
Think of it as the "slope" or "steepness" at a particular point on the activation function's curve.
This is crucial for training neural networks because:
- It helps determine how much to adjust the network's weights during learning
- A higher derivative means a stronger signal for learning
- A derivative of zero means no learning signal (which can be a problem known as "vanishing gradient")
During training, the neural network uses this derivative to figure out how to adjust its internal parameters to improve its predictions.
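As a standalone illustration (plain C#, not AiDotNet code): the ReLU derivative is a step function, while the Sigmoid derivative can be expressed in terms of the Sigmoid output itself.

using System;

static class DerivativeRules
{
    // ReLU'(x): slope 1 for positive inputs, 0 for negative inputs.
    static double ReLUDerivative(double x) => x > 0 ? 1.0 : 0.0;

    // Sigmoid'(x) = s * (1 - s), where s = Sigmoid(x); it is steepest at x = 0.
    static double SigmoidDerivative(double x)
    {
        double s = 1.0 / (1.0 + Math.Exp(-x));
        return s * (1.0 - s);
    }

    static void Main()
    {
        Console.WriteLine(ReLUDerivative(-3.0));   // 0
        Console.WriteLine(ReLUDerivative(2.0));    // 1
        Console.WriteLine(SigmoidDerivative(0.0)); // 0.25
    }
}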
ForwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int)
Applies the activation function on GPU.
void ForwardGpu(IDirectGpuBackend backend, IGpuBuffer input, IGpuBuffer output, int size)
Parameters
- backend (IDirectGpuBackend): The GPU backend to use for execution.
- input (IGpuBuffer): The input GPU buffer.
- output (IGpuBuffer): The output GPU buffer to store the activated values.
- size (int): The number of elements to process.
Remarks
This method applies the activation function entirely on GPU, avoiding CPU-GPU data transfers. The input and output buffers may be the same for in-place operations if supported.
For Beginners: This is the GPU-accelerated version of the Activate method. Instead of processing data on the CPU, this runs thousands of calculations in parallel on the GPU, making it much faster for large tensors.
Exceptions
- NotSupportedException
Thrown if SupportsGpuTraining is false.
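A hedged call sketch (the backend and device buffers are assumed to have been created elsewhere; only the documented signature and the SupportsGpuTraining property are used):

if (activation.SupportsGpuTraining)
{
    // Apply the activation to `size` elements entirely on the GPU; input and output
    // may be the same buffer when in-place execution is supported.
    activation.ForwardGpu(backend, inputBuffer, outputBuffer, size);
}
else
{
    // Fall back to the CPU tensor overload: activation.Activate(tensor).
}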