Interface IVectorActivationFunction<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Defines activation functions that operate on vectors and tensors in neural networks.

public interface IVectorActivationFunction<T>

Type Parameters

T

The numeric data type used for calculations (e.g., float, double).

Remarks

Activation functions introduce non-linearity into neural networks, allowing them to learn complex patterns in data. This interface provides methods to apply activation functions to vectors and tensors, as well as calculate their derivatives for backpropagation.

For Beginners: Activation functions are like "decision makers" in neural networks.

Imagine you're deciding whether to go outside based on the temperature:

  • If it's below 60°F, you definitely won't go (output = 0)
  • If it's above 75°F, you definitely will go (output = 1)
  • If it's between 60°F and 75°F, you're somewhat likely to go (output between 0 and 1)

This is similar to how activation functions work. They take the input from previous calculations in the neural network and transform it into an output that determines how strongly a neuron "fires" or activates. Without activation functions, neural networks would just be doing simple linear calculations and couldn't learn complex patterns.

Common activation functions include:

  • Sigmoid: Outputs values between 0 and 1 (like our temperature example)
  • ReLU: Outputs the input if positive, or zero if negative
  • Tanh: Outputs values between -1 and 1
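
To make these three functions concrete, the following standalone sketch evaluates them on a few sample values using only System.Math; it does not depend on any AiDotNet types.

using System;

class ActivationDemo
{
    // Sigmoid squashes any real number into the (0, 1) range.
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // ReLU passes positive values through unchanged and clamps negatives to zero.
    static double ReLU(double x) => Math.Max(0.0, x);

    static void Main()
    {
        double[] inputs = { -2.0, 0.0, 3.0 };
        foreach (double x in inputs)
        {
            Console.WriteLine(
                $"x = {x,4}: sigmoid = {Sigmoid(x):F3}, relu = {ReLU(x):F3}, tanh = {Math.Tanh(x):F3}");
        }
    }
}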

Properties

SupportsJitCompilation

Gets whether this activation function supports JIT compilation.

bool SupportsJitCompilation { get; }

Property Value

bool

True if the activation can be applied to computation graphs for JIT compilation.

Remarks

Activation functions return false if:

  • Gradient computation (backward pass) is not yet implemented
  • The activation uses operations not supported by TensorOperations
  • The activation has dynamic behavior that cannot be represented in a static graph

Once gradient computation is implemented and tested, set this to true.

For Beginners: JIT (Just-In-Time) compilation is an advanced optimization technique that pre-compiles the neural network's operations into a faster execution graph. This property indicates whether this activation function is ready to be part of that optimized execution. If false, the activation will fall back to the standard execution path.
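
A minimal consumer-side sketch of this property: the helper class and method names are hypothetical, but the check relies only on the SupportsJitCompilation member documented here.

using AiDotNet.Interfaces;

static class JitReadinessCheck
{
    // Hypothetical helper: report at setup time which execution path an activation
    // will take, based only on the SupportsJitCompilation flag.
    public static string DescribeExecutionPath<T>(IVectorActivationFunction<T> activation)
    {
        return activation.SupportsJitCompilation
            ? "activation can be folded into the JIT-compiled computation graph"
            : "activation falls back to the standard (eager) execution path";
    }
}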

Methods

Activate(Tensor<T>)

Applies the activation function to each element in a tensor.

Tensor<T> Activate(Tensor<T> input)

Parameters

input Tensor<T>

The tensor to apply the activation function to.

Returns

Tensor<T>

A new tensor with the activation function applied to each element.

Remarks

This method transforms each value in the input tensor according to the activation function.

For Beginners: A tensor is like a multi-dimensional array - think of it as a cube or higher-dimensional block of numbers. This method applies the same transformation to every number in that block.

For example, if you have image data (which can be represented as a 3D tensor with dimensions for height, width, and color channels), this method would apply the activation function to every pixel value in the image.
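
The element-wise behavior described above can be illustrated without any AiDotNet types at all. The sketch below uses a plain 3D array as a stand-in for an image-shaped tensor and applies ReLU to every element; Activate(Tensor<T>) is documented to perform the same per-element transformation on a real Tensor<T>.

using System;

class ElementwiseTensorDemo
{
    static void Main()
    {
        // A tiny 2x2 "image" with 3 color channels, standing in for a Tensor<double>.
        double[,,] image = new double[2, 2, 3];
        image[0, 0, 0] = -1.5;
        image[1, 1, 2] = 2.0;

        var activated = new double[2, 2, 3];
        for (int h = 0; h < 2; h++)
            for (int w = 0; w < 2; w++)
                for (int c = 0; c < 3; c++)
                    activated[h, w, c] = Math.Max(0.0, image[h, w, c]); // ReLU per element

        Console.WriteLine(activated[0, 0, 0]); // 0 (negative value clamped)
        Console.WriteLine(activated[1, 1, 2]); // 2 (positive value passes through)
    }
}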

Activate(Vector<T>)

Applies the activation function to each element in a vector.

Vector<T> Activate(Vector<T> input)

Parameters

input Vector<T>

The vector to apply the activation function to.

Returns

Vector<T>

A new vector with the activation function applied to each element.

Remarks

This method transforms each value in the input vector according to the activation function.

For Beginners: This method takes a list of numbers (the input vector) and applies the same transformation to each number. For example, if using the ReLU activation function:

Input vector: [-2, 0, 3, -1, 5]
Output vector: [0, 0, 3, 0, 5]

The ReLU function keeps positive values unchanged but changes negative values to zero. Different activation functions will transform the values differently.
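
A minimal sketch mirroring the ReLU example above. The activation and the input vector are passed in as parameters, so nothing beyond the Activate(Vector<T>) overload documented here is assumed; the class and method names are illustrative only, and an additional using directive for the namespace that defines Vector<T> may be required.

using AiDotNet.Interfaces;

static class VectorActivationExample
{
    // 'relu' is any IVectorActivationFunction<double> whose Activate implements ReLU;
    // 'input' might hold [-2, 0, 3, -1, 5].
    public static Vector<double> Apply(
        IVectorActivationFunction<double> relu,
        Vector<double> input)
    {
        // For a ReLU implementation, the result would be [0, 0, 3, 0, 5].
        return relu.Activate(input);
    }
}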

ApplyToGraph(ComputationNode<T>)

Applies this activation function to a computation graph node.

ComputationNode<T> ApplyToGraph(ComputationNode<T> input)

Parameters

input ComputationNode<T>

The computation node to apply the activation to.

Returns

ComputationNode<T>

A new computation node with the activation applied.

Remarks

This method maps the activation to the corresponding TensorOperations method. For example, Softmax returns TensorOperations<T>.Softmax(input).

For Beginners: This method adds the activation function to the computation graph, which is a data structure that represents all the operations in the neural network. The graph can then be optimized and executed more efficiently through JIT compilation.

Exceptions

NotSupportedException

Thrown if SupportsJitCompilation is false.
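
The sketch below shows one way a graph builder might guard against the NotSupportedException documented above. The helper name and the fallback policy (returning the input node unchanged) are illustrative assumptions; ApplyToGraph and SupportsJitCompilation come from this interface, and a using directive for the namespace that defines ComputationNode<T> may also be required.

using AiDotNet.Interfaces;

static class GraphBuildingExample
{
    // Attach the activation to an existing graph node only when it is JIT-ready.
    public static ComputationNode<T> TryAttach<T>(
        IVectorActivationFunction<T> activation,
        ComputationNode<T> input)
    {
        if (!activation.SupportsJitCompilation)
        {
            // ApplyToGraph would throw NotSupportedException here, so leave the graph
            // unchanged and let the caller fall back to eager Activate calls instead.
            return input;
        }

        return activation.ApplyToGraph(input);
    }
}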

Backward(Tensor<T>, Tensor<T>)

Calculates the backward pass gradient for this activation function.

Tensor<T> Backward(Tensor<T> input, Tensor<T> outputGradient)

Parameters

input Tensor<T>

The input tensor that was used in the forward pass.

outputGradient Tensor<T>

The gradient flowing back from the next layer.

Returns

Tensor<T>

The gradient with respect to the input.

Remarks

For .NET 8.0+, a default implementation is provided that computes the element-wise product of the activation derivative and the incoming output gradient: inputGradient = derivative(input) * outputGradient. For .NET Framework 4.7.1, implementers must provide this method explicitly.

This default behavior is appropriate for most element-wise activation functions where the chain rule simplifies to element-wise multiplication. Implementations that require different behavior (e.g., softmax, which has cross-element dependencies) should override this method.

For Beginners: During backpropagation, we need to calculate how much each input contributed to the final error. This is done by multiplying the derivative of the activation function at each point by the gradient flowing back from the next layer. The default implementation handles this automatically for most activation functions.
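
The default formula is easiest to see with plain arrays. The sketch below spells out the element-wise chain rule using a ReLU-style derivative; it is a conceptual illustration only and does not call any AiDotNet types.

using System;

class BackwardDemo
{
    static void Main()
    {
        // inputGradient[i] = derivative(input[i]) * outputGradient[i]
        double[] input          = { -2.0, 0.5, 3.0 };
        double[] outputGradient = {  0.4, 0.4, 0.4 };

        var inputGradient = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            // ReLU derivative: 1 for positive inputs, 0 otherwise.
            double derivative = input[i] > 0 ? 1.0 : 0.0;
            inputGradient[i] = derivative * outputGradient[i];
        }

        Console.WriteLine(string.Join(", ", inputGradient)); // 0, 0.4, 0.4
    }
}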

Derivative(Tensor<T>)

Calculates the derivative of the activation function for each element in a tensor.

Tensor<T> Derivative(Tensor<T> input)

Parameters

input Tensor<T>

The tensor to calculate derivatives for.

Returns

Tensor<T>

A tensor containing the derivatives of the activation function.

Remarks

This method computes the derivatives of the activation function for all elements in the input tensor.

For Beginners: Similar to the vector version, this calculates how sensitive the activation function is to changes in each element of the input tensor. The difference is that this works with multi-dimensional data.

For example, with image data, this would tell us how a small change in each pixel's value would affect the output of the activation function. This information is used during the learning process to adjust the neural network's parameters.
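
As a concrete, library-independent illustration of what these per-element derivatives look like, the sketch below uses the sigmoid function, whose derivative at a value x is s * (1 - s), where s = sigmoid(x).

using System;

class SigmoidDerivativeDemo
{
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    static void Main()
    {
        // For sigmoid, the element-wise derivative is s * (1 - s).
        double[] pixelValues = { -2.0, 0.0, 2.0 };
        foreach (double x in pixelValues)
        {
            double s = Sigmoid(x);
            Console.WriteLine($"x = {x,4}: sigmoid = {s:F3}, derivative = {s * (1 - s):F3}");
        }
    }
}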

Derivative(Vector<T>)

Calculates the derivative of the activation function for each element in a vector.

Matrix<T> Derivative(Vector<T> input)

Parameters

input Vector<T>

The vector to calculate derivatives for.

Returns

Matrix<T>

A matrix containing the derivatives of the activation function.

Remarks

This method computes how the activation function's output changes with respect to small changes in its input. This is essential for the backpropagation algorithm in neural networks.

For Beginners: The derivative tells us how sensitive the activation function is to changes in its input. This is crucial for the "learning" part of neural networks.

Think of it like this: If you slightly increase the temperature in our earlier example, how much more likely are you to go outside? The derivative gives us this rate of change.

For a vector input, this method returns a matrix of derivatives rather than a vector. In the usual Jacobian convention, entry (i, j) holds the derivative of output i with respect to input j: for simple element-wise activation functions only the diagonal is nonzero, while functions such as softmax (whose outputs depend on every input) also fill in off-diagonal entries. The sketch below illustrates the element-wise case.
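
A worked sketch of the element-wise case under the Jacobian convention described above; it uses a plain 2D array rather than Matrix<T>, so it illustrates the math rather than AiDotNet's own matrix layout.

using System;

class VectorDerivativeDemo
{
    static void Main()
    {
        // For an element-wise activation such as ReLU, the derivative of a length-n
        // vector is an n-by-n matrix that is zero everywhere except the diagonal.
        double[] input = { -2.0, 0.5, 3.0 };
        int n = input.Length;
        var jacobian = new double[n, n];

        for (int i = 0; i < n; i++)
        {
            // d relu(x_i) / d x_j is nonzero only when i == j.
            jacobian[i, i] = input[i] > 0 ? 1.0 : 0.0;
        }

        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++) Console.Write($"{jacobian[i, j]:F0} ");
            Console.WriteLine();
        }
        // Prints:
        // 0 0 0
        // 0 1 0
        // 0 0 1
    }
}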