Interface IActivationFunction<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Defines an interface for activation functions used in neural networks and other machine learning algorithms.

public interface IActivationFunction<T>

Type Parameters

T

The numeric type used for calculations (e.g., double, float).

Remarks

For Beginners: An activation function is like a decision-maker in a neural network.

Imagine each neuron (node) in a neural network receives a number as input. The activation function decides how strongly that neuron should "fire" or activate based on that input.

For example:

  • If the input is very negative, the neuron might not activate at all (output = 0)
  • If the input is very positive, the neuron might activate fully (output = 1)
  • If the input is around zero, the neuron might activate partially

Different activation functions create different patterns of activation, which helps neural networks learn different types of patterns in data. Common activation functions include Sigmoid, ReLU (Rectified Linear Unit), and Tanh (Hyperbolic Tangent).

This interface defines the standard methods that all activation functions must implement.
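
As a minimal sketch of how calling code might use the interface, the helper below exercises the scalar and vector overloads documented on this page. ActivationDemo and Apply are illustrative names only, not AiDotNet types.

    public static class ActivationDemo
    {
        // Apply any activation through the interface members documented below.
        public static (T Scalar, Vector<T> Elementwise, T Slope) Apply<T>(
            IActivationFunction<T> activation, T x, Vector<T> v)
        {
            T y = activation.Activate(x);            // scalar overload
            Vector<T> yv = activation.Activate(v);   // element-wise over a vector
            T slope = activation.Derivative(x);      // slope at x, used during training
            return (y, yv, slope);
        }
    }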

Properties

SupportsGpuTraining

Gets whether this activation function supports GPU-resident training.

bool SupportsGpuTraining { get; }

Property Value

bool

True if the activation can perform forward and backward passes entirely on GPU.

Remarks

Activation functions return false if:

  • The GPU backend does not have kernels for this activation type
  • The activation has dynamic behavior that cannot be executed on GPU

For Beginners: GPU-resident training keeps all data on the graphics card during training, avoiding slow data transfers between CPU and GPU memory. This property indicates whether this activation function can participate in GPU-resident training.
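
As a rough sketch, a training loop might branch on this property to decide whether the activation can stay GPU-resident. The helper and delegate parameters below are illustrative, not part of AiDotNet.

    static void RunTrainingStep<T>(IActivationFunction<T> activation, Action gpuResidentPath, Action cpuPath)
    {
        if (activation.SupportsGpuTraining)
            gpuResidentPath();   // data stays in GPU buffers (see ForwardGpu/BackwardGpu below)
        else
            cpuPath();           // fall back to the CPU Activate/Backward overloads
    }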

SupportsJitCompilation

Gets whether this activation function supports JIT compilation.

bool SupportsJitCompilation { get; }

Property Value

bool

True if the activation can be applied to computation graphs for JIT compilation.

Remarks

Activation functions return false if:

  • Gradient computation (backward pass) is not yet implemented
  • The activation uses operations not supported by TensorOperations
  • The activation has dynamic behavior that cannot be represented in a static graph

Implementations should set this property to true once gradient computation has been implemented and tested.

For Beginners: JIT (Just-In-Time) compilation is an advanced optimization technique that pre-compiles the neural network's operations into a faster execution graph. This property indicates whether this activation function is ready to be part of that optimized execution. If false, the activation will fall back to the standard execution path.
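
A hedged sketch of how graph-building code might honor this flag; the helper below is illustrative and uses only members defined on this interface.

    // Only route the activation through the JIT graph when it opts in; a null
    // return signals the caller to use the standard Activate path instead.
    static ComputationNode<T>? TryApplyToGraph<T>(IActivationFunction<T> activation, ComputationNode<T> node)
    {
        return activation.SupportsJitCompilation
            ? activation.ApplyToGraph(node)   // becomes part of the compiled graph
            : null;                           // caller falls back to eager Activate(Tensor<T>)
    }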

Methods

Activate(Tensor<T>)

Applies the activation function to each element in a tensor.

Tensor<T> Activate(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor.

Returns

Tensor<T>

A new tensor with the activation function applied to each element.

Activate(Vector<T>)

Applies the activation function to each element in a vector.

Vector<T> Activate(Vector<T> input)

Parameters

input Vector<T>

The input vector.

Returns

Vector<T>

A new vector with the activation function applied to each element.

Activate(T)

Applies the activation function to the input value.

T Activate(T input)

Parameters

input T

The input value to the activation function.

Returns

T

The activated output value.

Remarks

For Beginners: This method takes a number (which could be positive, negative, or zero) and transforms it according to the specific activation function's rule.

For example, with the ReLU activation function:

  • If input is negative, output is 0
  • If input is positive, output is the same as the input

This transformation helps neural networks model complex, non-linear relationships in data, which is essential for tasks like image recognition or language processing.
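
For instance, the ReLU rule above amounts to the following computation (a standalone illustration, not an AiDotNet implementation):

    // ReLU as described above: negative inputs become 0, positive inputs pass through.
    static double ReluActivate(double input)
        => input > 0 ? input : 0.0;

    // ReluActivate(-2.5) returns 0.0
    // ReluActivate( 3.0) returns 3.0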

ApplyToGraph(ComputationNode<T>)

Applies this activation function to a computation graph node.

ComputationNode<T> ApplyToGraph(ComputationNode<T> input)

Parameters

input ComputationNode<T>

The computation node to apply the activation to.

Returns

ComputationNode<T>

A new computation node with the activation applied.

Remarks

This method maps the activation to the corresponding TensorOperations method. For example, ReLU returns TensorOperations<T>.ReLU(input).

For Beginners: This method adds the activation function to the computation graph, which is a data structure that represents all the operations in the neural network. The graph can then be optimized and executed more efficiently through JIT compilation.

Exceptions

NotSupportedException

Thrown if SupportsJitCompilation is false.
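
Putting the remarks and the exception contract together, a ReLU-style implementation might look roughly like the sketch below; it is based only on what is documented above, not on AiDotNet source.

    public ComputationNode<T> ApplyToGraph(ComputationNode<T> input)
    {
        if (!SupportsJitCompilation)
            throw new NotSupportedException("This activation does not support JIT compilation.");

        // Per the remark above: ReLU maps to the corresponding TensorOperations method.
        return TensorOperations<T>.ReLU(input);
    }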

Backward(Tensor<T>, Tensor<T>)

Calculates the backward pass gradient for this activation function.

Tensor<T> Backward(Tensor<T> input, Tensor<T> outputGradient)

Parameters

input Tensor<T>

The input tensor that was used in the forward pass.

outputGradient Tensor<T>

The gradient flowing back from the next layer.

Returns

Tensor<T>

The gradient with respect to the input.

Remarks

This method computes dL/dx = dL/dy * dy/dx. For element-wise activations (ReLU, Sigmoid), this is element-wise multiplication. For vector activations (Softmax), this involves Jacobian multiplication.
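
Written per element for an element-wise activation, the chain rule above reduces to a single multiplication. The helper below is purely illustrative and uses double for clarity.

    // dL/dx[i] = dL/dy[i] * f'(x[i]) for an element-wise activation.
    static double BackwardElement(IActivationFunction<double> f, double input, double outputGradient)
        => outputGradient * f.Derivative(input);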

BackwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer?, IGpuBuffer?, IGpuBuffer, int)

Calculates the backward pass gradient on GPU.

void BackwardGpu(IDirectGpuBackend backend, IGpuBuffer gradOutput, IGpuBuffer? input, IGpuBuffer? output, IGpuBuffer gradInput, int size)

Parameters

backend IDirectGpuBackend

The GPU backend to use for execution.

gradOutput IGpuBuffer

The gradient flowing back from the next layer.

input IGpuBuffer?

The input buffer from the forward pass (needed for ReLU, GELU, Swish, LeakyReLU).

output IGpuBuffer?

The output buffer from the forward pass (needed for Sigmoid, Tanh).

gradInput IGpuBuffer

The output buffer to store the input gradient.

size int

The number of elements to process.

Remarks

This method computes the activation gradient entirely on GPU. Different activation functions require different cached values from the forward pass:

  • ReLU, LeakyReLU, GELU, Swish: need the input from the forward pass
  • Sigmoid, Tanh: need the output from the forward pass

For Beginners: During training, we need to compute gradients to update the network. This method computes the gradient of the activation function on GPU, which is essential for efficient GPU-resident training.

Exceptions

NotSupportedException

Thrown if SupportsGpuTraining is false.
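
A hedged caller-side sketch for a ReLU-family activation, which caches the forward input per the remarks above. The backend and buffers are assumed to come from the surrounding GPU training loop.

    // GPU-resident backward step. Sigmoid/Tanh-style activations would pass the
    // forward *output* instead of the input, and null for the other buffer.
    static void BackwardOnGpu<T>(
        IActivationFunction<T> activation,
        IDirectGpuBackend backend,
        IGpuBuffer gradOutput,
        IGpuBuffer cachedForwardInput,
        IGpuBuffer gradInput,
        int size)
    {
        if (!activation.SupportsGpuTraining)
            throw new NotSupportedException("This activation does not support GPU-resident training.");

        activation.BackwardGpu(backend, gradOutput, cachedForwardInput, output: null, gradInput: gradInput, size: size);
    }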

Derivative(Tensor<T>)

Calculates the derivative for each element in a tensor.

Tensor<T> Derivative(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor.

Returns

Tensor<T>

A new tensor containing derivatives for each input element.

Derivative(Vector<T>)

Calculates the derivative matrix for a vector input.

Matrix<T> Derivative(Vector<T> input)

Parameters

input Vector<T>

The input vector.

Returns

Matrix<T>

A diagonal matrix containing derivatives for each input element.
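
Because the matrix is diagonal, entry (i, i) is simply the scalar derivative at input[i]. The sketch below computes just those diagonal entries for illustration, using double and plain arrays rather than AiDotNet's Vector<T> and Matrix<T>.

    // The diagonal of the derivative matrix for an element-wise activation.
    static double[] JacobianDiagonal(IActivationFunction<double> f, double[] input)
    {
        var diagonal = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
            diagonal[i] = f.Derivative(input[i]);   // off-diagonal entries are zero
        return diagonal;
    }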

Derivative(T)

Calculates the derivative (slope) of the activation function at the given input value.

T Derivative(T input)

Parameters

input T

The input value at which to calculate the derivative.

Returns

T

The derivative value of the activation function at the input point.

Remarks

For Beginners: The derivative tells us how quickly the activation function's output changes when we make a small change to the input.

Think of it as the "slope" or "steepness" at a particular point on the activation function's curve.

This is crucial for training neural networks because:

  • It helps determine how much to adjust the network's weights during learning
  • A higher derivative means a stronger signal for learning
  • A derivative of zero means no learning signal (which can be a problem known as "vanishing gradient")

During training, the neural network uses this derivative to figure out how to adjust its internal parameters to improve its predictions.
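
Continuing the ReLU example from Activate(T): the slope is 0 for negative inputs and 1 for positive inputs (a standalone illustration, not an AiDotNet implementation).

    // ReLU derivative: zero slope for negative inputs (no learning signal), one otherwise.
    static double ReluDerivative(double input)
        => input > 0 ? 1.0 : 0.0;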

ForwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int)

Applies the activation function on GPU.

void ForwardGpu(IDirectGpuBackend backend, IGpuBuffer input, IGpuBuffer output, int size)

Parameters

backend IDirectGpuBackend

The GPU backend to use for execution.

input IGpuBuffer

The input GPU buffer.

output IGpuBuffer

The output GPU buffer to store the activated values.

size int

The number of elements to process.

Remarks

This method applies the activation function entirely on GPU, avoiding CPU-GPU data transfers. The input and output buffers may be the same buffer when in-place operation is supported.

For Beginners: This is the GPU-accelerated version of the Activate method. Instead of processing data on the CPU, this runs thousands of calculations in parallel on the GPU, making it much faster for large tensors.

Exceptions

NotSupportedException

Thrown if SupportsGpuTraining is false.
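
A hedged caller-side sketch; the backend and buffers are assumed to be provided by the surrounding GPU training loop, and the same buffer could be passed for both input and output where in-place operation is supported.

    // Run the forward pass entirely on GPU, avoiding CPU round-trips.
    static void ForwardOnGpu<T>(
        IActivationFunction<T> activation,
        IDirectGpuBackend backend,
        IGpuBuffer input,
        IGpuBuffer output,
        int size)
    {
        if (!activation.SupportsGpuTraining)
            throw new NotSupportedException("This activation does not support GPU-resident training.");

        activation.ForwardGpu(backend, input, output, size);
    }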