Interface IVectorActivationFunction<T>
- Namespace: AiDotNet.Interfaces
- Assembly: AiDotNet.dll
Defines activation functions that operate on vectors and tensors in neural networks.
public interface IVectorActivationFunction<T>
Type Parameters
T: The numeric data type used for calculations (e.g., float, double).
Remarks
Activation functions introduce non-linearity into neural networks, allowing them to learn complex patterns in data. This interface provides methods to apply activation functions to vectors and tensors, as well as calculate their derivatives for backpropagation.
For Beginners: Activation functions are like "decision makers" in neural networks.
Imagine you're deciding whether to go outside based on the temperature:
- If it's below 60°F, you definitely won't go (output = 0)
- If it's above 75°F, you definitely will go (output = 1)
- If it's between 60°F and 75°F, you're somewhat likely to go (output between 0 and 1)
This is similar to how activation functions work. They take the input from previous calculations in the neural network and transform it into an output that determines how strongly a neuron "fires" or activates. Without activation functions, neural networks would just be doing simple linear calculations and couldn't learn complex patterns.
Common activation functions include:
- Sigmoid: Outputs values between 0 and 1 (like our temperature example)
- ReLU: Outputs the input if positive, or zero if negative
- Tanh: Outputs values between -1 and 1
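For illustration, here is a minimal, self-contained sketch of these three formulas, using plain doubles rather than the library's numeric types:
using System;

class ActivationDemo
{
    // Sigmoid squashes any input into the range (0, 1).
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // ReLU passes positive inputs through unchanged and clamps negatives to zero.
    static double ReLU(double x) => Math.Max(0.0, x);

    // Tanh squashes any input into the range (-1, 1).
    static double Tanh(double x) => Math.Tanh(x);

    static void Main()
    {
        double[] inputs = { -2.0, -0.5, 0.0, 0.5, 2.0 };
        foreach (double x in inputs)
        {
            Console.WriteLine($"x = {x,4:F1}  sigmoid = {Sigmoid(x):F3}  relu = {ReLU(x):F3}  tanh = {Tanh(x):F3}");
        }
    }
}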
Properties
SupportsJitCompilation
Gets whether this activation function supports JIT compilation.
bool SupportsJitCompilation { get; }
Property Value
- bool
True if the activation can be applied to computation graphs for JIT compilation.
Remarks
Activation functions return false if:
- Gradient computation (backward pass) is not yet implemented
- The activation uses operations not supported by TensorOperations
- The activation has dynamic behavior that cannot be represented in a static graph
Once gradient computation is implemented and tested, set this to true.
For Beginners: JIT (Just-In-Time) compilation is an advanced optimization technique that pre-compiles the neural network's operations into a faster execution graph. This property indicates whether this activation function is ready to be part of that optimized execution. If false, the activation will fall back to the standard execution path.
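For example, a caller building a computation graph might check this property before calling ApplyToGraph. The helper below is a hypothetical sketch; it uses only the members declared on this interface:
static ComputationNode<T>? TryAddToGraph<T>(
    IVectorActivationFunction<T> activation,
    ComputationNode<T> graphNode)
{
    if (!activation.SupportsJitCompilation)
    {
        // Not ready for the JIT path: the caller should fall back to the
        // standard execution path (activation.Activate on tensors) instead.
        return null;
    }

    return activation.ApplyToGraph(graphNode);
}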
Methods
Activate(Tensor<T>)
Applies the activation function to each element in a tensor.
Tensor<T> Activate(Tensor<T> input)
Parameters
input (Tensor<T>): The tensor to apply the activation function to.
Returns
- Tensor<T>
A new tensor with the activation function applied to each element.
Remarks
This method transforms each value in the input tensor according to the activation function.
For Beginners: A tensor is like a multi-dimensional array - think of it as a cube or higher-dimensional block of numbers. This method applies the same transformation to every number in that block.
For example, if you have image data (which can be represented as a 3D tensor with dimensions for height, width, and color channels), this method would apply the activation function to every pixel value in the image.
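As an illustrative sketch, with a plain 3-D array standing in for Tensor<T>, applying ReLU element-wise to a tiny 2 x 2 image with 3 color channels looks like this:
using System;

class TensorActivationSketch
{
    static void Main()
    {
        // A tiny "image": 2 x 2 pixels with 3 color channels, stored as a 3-D array.
        double[,,] image =
        {
            { { -0.5, 0.2, 1.3 }, { 0.0, -1.1, 0.7 } },
            { {  2.0, -0.3, 0.4 }, { -2.2, 0.9, 0.1 } }
        };

        // Apply ReLU to every element, the same element-wise transformation
        // that Activate(Tensor<T>) performs.
        for (int h = 0; h < image.GetLength(0); h++)
            for (int w = 0; w < image.GetLength(1); w++)
                for (int c = 0; c < image.GetLength(2); c++)
                    image[h, w, c] = Math.Max(0.0, image[h, w, c]);

        Console.WriteLine(image[0, 0, 0]); // -0.5 becomes 0
    }
}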
Activate(Vector<T>)
Applies the activation function to each element in a vector.
Vector<T> Activate(Vector<T> input)
Parameters
input (Vector<T>): The vector to apply the activation function to.
Returns
- Vector<T>
A new vector with the activation function applied to each element.
Remarks
This method transforms each value in the input vector according to the activation function.
For Beginners: This method takes a list of numbers (the input vector) and applies the same transformation to each number. For example, if using the ReLU activation function:
Input vector: [-2, 0, 3, -1, 5]
Output vector: [0, 0, 3, 0, 5]
The ReLU function keeps positive values unchanged but changes negative values to zero. Different activation functions will transform the values differently.
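The same transformation, written as a runnable sketch against a plain array instead of Vector<T>:
using System;
using System.Linq;

class ReLUVectorSketch
{
    static void Main()
    {
        // ReLU keeps positive values and replaces negative values with zero.
        double[] input = { -2, 0, 3, -1, 5 };
        double[] output = input.Select(x => Math.Max(0.0, x)).ToArray();

        Console.WriteLine(string.Join(", ", output)); // 0, 0, 3, 0, 5
    }
}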
ApplyToGraph(ComputationNode<T>)
Applies this activation function to a computation graph node.
ComputationNode<T> ApplyToGraph(ComputationNode<T> input)
Parameters
input (ComputationNode<T>): The computation node to apply the activation to.
Returns
- ComputationNode<T>
A new computation node with the activation applied.
Remarks
This method maps the activation to the corresponding TensorOperations method. For example, Softmax returns TensorOperations<T>.Softmax(input).
For Beginners: This method adds the activation function to the computation graph, which is a data structure that represents all the operations in the neural network. The graph can then be optimized and executed more efficiently through JIT compilation.
Exceptions
- NotSupportedException
Thrown if SupportsJitCompilation is false.
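As an implementer-side sketch, a hypothetical softmax activation could realize this method as follows. The class name and guard are illustrative and the remaining interface members are omitted; the TensorOperations<T>.Softmax mapping is the one described in the remarks above:
using System;

public class SoftmaxActivationSketch<T>
{
    public bool SupportsJitCompilation => true;

    public ComputationNode<T> ApplyToGraph(ComputationNode<T> input)
    {
        if (!SupportsJitCompilation)
        {
            // Matches the documented contract: not JIT-capable activations throw.
            throw new NotSupportedException("This activation does not support JIT compilation.");
        }

        // Map the activation to the corresponding TensorOperations method.
        return TensorOperations<T>.Softmax(input);
    }
}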
Backward(Tensor<T>, Tensor<T>)
Calculates the backward pass gradient for this activation function.
Tensor<T> Backward(Tensor<T> input, Tensor<T> outputGradient)
Parameters
input (Tensor<T>): The input tensor that was used in the forward pass.
outputGradient (Tensor<T>): The gradient flowing back from the next layer.
Returns
- Tensor<T>
The gradient with respect to the input.
Remarks
For .NET 8.0+, a default implementation is provided that computes the element-wise product of the activation derivative and the incoming output gradient: inputGradient = derivative(input) * outputGradient. For .NET Framework 4.7.1, implementers must provide this method explicitly.
This default behavior is appropriate for most element-wise activation functions where the chain rule simplifies to element-wise multiplication. Implementations that require different behavior (e.g., softmax, which has cross-element dependencies) should override this method.
For Beginners: During backpropagation, we need to calculate how much each input contributed to the final error. This is done by multiplying the derivative of the activation function at each point by the gradient flowing back from the next layer. The default implementation handles this automatically for most activation functions.
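A runnable sketch of that element-wise chain rule, using sigmoid and plain arrays in place of Tensor<T>:
using System;
using System.Linq;

class BackwardSketch
{
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Derivative of sigmoid: s(x) * (1 - s(x)).
    static double SigmoidDerivative(double x)
    {
        double s = Sigmoid(x);
        return s * (1.0 - s);
    }

    static void Main()
    {
        double[] input = { -1.0, 0.0, 2.0 };          // values from the forward pass
        double[] outputGradient = { 0.5, -0.2, 1.0 }; // gradient from the next layer

        // Element-wise chain rule: inputGradient[i] = derivative(input[i]) * outputGradient[i]
        double[] inputGradient = input
            .Zip(outputGradient, (x, g) => SigmoidDerivative(x) * g)
            .ToArray();

        Console.WriteLine(string.Join(", ", inputGradient.Select(v => v.ToString("F4"))));
    }
}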
Derivative(Tensor<T>)
Calculates the derivative of the activation function for each element in a tensor.
Tensor<T> Derivative(Tensor<T> input)
Parameters
input (Tensor<T>): The tensor to calculate derivatives for.
Returns
- Tensor<T>
A tensor containing the derivatives of the activation function.
Remarks
This method computes the derivatives of the activation function for all elements in the input tensor.
For Beginners: Similar to the vector version, this calculates how sensitive the activation function is to changes in each element of the input tensor. The difference is that this works with multi-dimensional data.
For example, with image data, this would tell us how a small change in each pixel's value would affect the output of the activation function. This information is used during the learning process to adjust the neural network's parameters.
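As a sketch with a plain 2-D array standing in for Tensor<T>, the ReLU derivative (1 for positive inputs, 0 otherwise) is computed independently for every element:
using System;

class TensorDerivativeSketch
{
    static void Main()
    {
        // A small 2 x 3 block of pre-activation values.
        double[,] input = { { -1.5, 0.2, 3.0 }, { 0.0, -0.7, 2.1 } };
        double[,] derivative = new double[2, 3];

        // ReLU derivative: 1 where the input is positive, 0 otherwise.
        for (int i = 0; i < input.GetLength(0); i++)
            for (int j = 0; j < input.GetLength(1); j++)
                derivative[i, j] = input[i, j] > 0 ? 1.0 : 0.0;

        Console.WriteLine(derivative[0, 0]); // 0
        Console.WriteLine(derivative[0, 2]); // 1
    }
}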
Derivative(Vector<T>)
Calculates the derivative of the activation function for each element in a vector.
Matrix<T> Derivative(Vector<T> input)
Parameters
input (Vector<T>): The vector to calculate derivatives for.
Returns
- Matrix<T>
A matrix containing the derivatives of the activation function.
Remarks
This method computes how the activation function's output changes with respect to small changes in its input. This is essential for the backpropagation algorithm in neural networks.
For Beginners: The derivative tells us how sensitive the activation function is to changes in its input. This is crucial for the "learning" part of neural networks.
Think of it like this: If you slightly increase the temperature in our earlier example, how much more likely are you to go outside? The derivative gives us this rate of change.
For a vector input of length n, this method returns an n-by-n matrix where entry (i, j) describes how output element i changes with respect to input element j. For element-wise activation functions, only the diagonal entries are non-zero, because each output depends only on the input at the same position.
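A sketch of what that matrix looks like for ReLU, with plain arrays standing in for Vector<T> and Matrix<T>:
using System;

class VectorDerivativeSketch
{
    static void Main()
    {
        double[] input = { -2.0, 0.5, 3.0 };
        int n = input.Length;

        // Build the n-by-n derivative matrix. For an element-wise activation the
        // off-diagonal entries are zero because output i depends only on input i.
        double[,] derivative = new double[n, n];
        for (int i = 0; i < n; i++)
        {
            derivative[i, i] = input[i] > 0 ? 1.0 : 0.0; // ReLU derivative
        }

        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++) Console.Write($"{derivative[i, j]} ");
            Console.WriteLine();
        }
        // Prints:
        // 0 0 0
        // 0 1 0
        // 0 0 1
    }
}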