Class MaskingLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a layer that masks specified values in the input tensor, typically used to ignore padding in sequential data.

public class MaskingLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LayerBase<T> → MaskingLayer<T>
Implements
ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

The MaskingLayer is used to skip certain time steps in sequential data by masking out specific values. During the forward pass, time steps with values equal to the mask value are multiplied by zero, effectively removing them from consideration by subsequent layers. This is particularly useful for handling variable-length sequences where padding is used to make all sequences the same length.

For Beginners: This layer helps the network ignore certain parts of your data.

Think of it like a highlighter that marks which parts of your data are important:

  • Any value matching the "mask value" (usually 0) gets ignored
  • All other values pass through unchanged
  • This is especially useful for sequences of different lengths

For example, if you have sentences of different lengths:

  • Short sentences might be padded with zeros to match longer ones
  • The masking layer tells the network to ignore those zeros
  • This helps the network focus only on the real data

Without masking, the network would try to learn patterns from the padding values, which would confuse the learning process.
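The effect on padded sequences can be sketched in NumPy (the `apply_mask` helper and `mask_value` name mirror the behavior described above; they are illustrative, not part of AiDotNet):

```python
import numpy as np

def apply_mask(batch, mask_value=0.0):
    # Build a binary mask: 0 where the input equals mask_value, 1 elsewhere.
    mask = (batch != mask_value).astype(batch.dtype)
    # Element-wise multiply zeroes out the masked positions.
    return batch * mask

# Two sequences of different lengths, zero-padded to length 5.
padded = np.array([
    [4.0, 7.0, 2.0, 0.0, 0.0],   # real length 3, last two are padding
    [1.0, 3.0, 5.0, 9.0, 8.0],   # real length 5, no padding
])
masked = apply_mask(padded)
```

The padded positions come out as exact zeros, while sequences without padding pass through unchanged.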

Constructors

MaskingLayer(int[], double)

Initializes a new instance of the MaskingLayer<T> class.

public MaskingLayer(int[] inputShape, double maskValue = 0)

Parameters

inputShape int[]

The shape of the input tensor.

maskValue double

The value to be masked out. Defaults to 0.

Remarks

This constructor creates a MaskingLayer that will mask out all values equal to the specified mask value. The output shape is the same as the input shape, as the masking operation doesn't change the dimensions.

For Beginners: This creates a new masking layer with your desired settings.

When setting up this layer:

  • inputShape defines the expected size and dimensions of your data
  • maskValue is the specific value you want to ignore (typically 0)

For example, if you have sequences padded with zeros, you would set maskValue to 0 so that the network ignores those padding values.
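The constructor's two arguments can be mirrored in a small Python sketch (the class and field names here are illustrative stand-ins, not the actual AiDotNet implementation):

```python
class MaskingLayerSketch:
    """Illustrative stand-in for MaskingLayer<T>: stores a shape and a mask value."""
    def __init__(self, input_shape, mask_value=0.0):
        self.input_shape = list(input_shape)
        self.mask_value = mask_value
        # Masking never changes dimensions, so output shape == input shape.
        self.output_shape = list(input_shape)

# batch of 32, 10 time steps, 64 features per step; mask value defaults to 0
layer = MaskingLayerSketch([32, 10, 64])
```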

Properties

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsGpuTraining

Gets a value indicating whether this layer has full GPU training support (forward, backward, and parameter updates).

public override bool SupportsGpuTraining { get; }

Property Value

bool

Remarks

This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:

  • ForwardGpu is implemented
  • BackwardGpu is implemented
  • UpdateParametersGpu is implemented (for layers with trainable parameters)
  • GPU weight/bias/gradient buffers are properly managed

For Beginners: This tells you if training can happen entirely on GPU.

GPU-resident training is much faster because:

  • Data stays on GPU between forward and backward passes
  • No expensive CPU-GPU transfers during each training step
  • GPU kernels handle all gradient computation

Only layers that return true here can participate in fully GPU-resident training.

SupportsJitCompilation

Gets a value indicating whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

Remarks

Always true, because masking is a simple element-wise operation that can be JIT compiled.

SupportsTraining

Gets a value indicating whether this layer supports training through backpropagation.

public override bool SupportsTraining { get; }

Property Value

bool

Remarks

This property returns false because the MaskingLayer does not have any trainable parameters, though it does support backward pass for gradient propagation through the network.

For Beginners: This tells you if the layer can learn from training data.

A value of false means:

  • This layer doesn't have any values that get updated during training
  • It performs a fixed operation (masking)
  • However, during training, it still helps gradients flow backward through the network

The masking layer doesn't need to learn anything - it just follows a simple rule: mask out specific values and pass everything else through.

Methods

Backward(Tensor<T>)

Performs the backward pass of the masking layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the masking layer, which is used during training to propagate error gradients back through the network. It applies the same mask to the output gradient that was used in the forward pass, ensuring that gradients for masked values remain zero.

For Beginners: This method handles the flow of error information during training.

During the backward pass:

  • The layer receives information about how its output affected the overall error
  • It applies the same mask to this gradient information
  • This ensures that no gradient flows back through the masked values

This process is important because:

  • We don't want the network to learn from the masked (padding) values
  • The mask stops error information from flowing back through those values
  • This helps keep the training focused only on the real data
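The gradient rule above is the same element-wise multiply reapplied to the incoming gradient; a NumPy sketch, assuming the forward mask was cached (names are illustrative):

```python
import numpy as np

def masking_backward(output_gradient, cached_mask):
    # Masked positions contributed nothing in the forward pass,
    # so their gradients are forced to zero as well.
    return output_gradient * cached_mask

grad = np.array([0.5, -1.2, 0.3, 0.9])
mask = np.array([1.0, 0.0, 1.0, 0.0])   # cached from the forward pass
input_grad = masking_backward(grad, mask)
```

Positions 1 and 3 were masked, so no error signal flows back through them.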

BackwardGpu(IGpuTensor<T>)

Performs the backward pass of the layer on GPU.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

The GPU-resident gradient of the loss with respect to the layer's output.

Returns

IGpuTensor<T>

The GPU-resident gradient of the loss with respect to the layer's input.

Remarks

This method performs the layer's backward computation entirely on GPU, including:

  • Computing input gradients to pass to previous layers
  • Computing and storing weight gradients on GPU (for layers with trainable parameters)
  • Computing and storing bias gradients on GPU

For Beginners: This is like Backward() but runs entirely on GPU.

During GPU training:

  1. Output gradients come in (on GPU)
  2. Input gradients are computed (stay on GPU)
  3. Weight/bias gradients are computed and stored (on GPU)
  4. Input gradients are returned for the previous layer

All data stays on GPU - no CPU round-trips needed!

Exceptions

NotSupportedException

Thrown when the layer does not support GPU training.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the masking layer's forward pass as a JIT-compilable computation graph.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the masked result.

Remarks

This method builds a computation graph for the masking operation. The mask is applied element-wise: masked_output = input * mask. For JIT compilation, we assume a pre-computed mask or identity (no masking).

Forward(Tensor<T>)

Performs the forward pass of the masking layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor after masking.

Remarks

This method implements the forward pass of the masking layer. It creates a binary mask where values equal to the mask value are set to 0 and other values are set to 1. This mask is then applied to the input tensor by element-wise multiplication, effectively removing the masked values from consideration.

For Beginners: This method processes your data through the masking layer.

During the forward pass:

  1. The layer creates a "mask" - a matching array where:
    • Values equal to the mask value (usually 0) become 0 in the mask
    • All other values become 1 in the mask
  2. The original input is multiplied by this mask
    • Where the mask is 1, the original value passes through
    • Where the mask is 0, the result becomes 0

For example, if you have data [5, 0, 7, 0, 9] and a mask value of 0:

  • The mask would be [1, 0, 1, 0, 1]
  • After applying the mask: [5, 0, 7, 0, 9] * [1, 0, 1, 0, 1] = [5, 0, 7, 0, 9]
  • The output looks unchanged here only because the masked values were already 0; the guarantee is that masked positions are exactly zero, so they contribute nothing to subsequent layers' computations
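The walkthrough above maps directly onto two array operations (a NumPy sketch of the described behavior, not the library's actual code):

```python
import numpy as np

x = np.array([5.0, 0.0, 7.0, 0.0, 9.0])
mask_value = 0.0

# Step 1: binary mask — 0 where input == mask_value, 1 elsewhere.
mask = (x != mask_value).astype(x.dtype)   # [1, 0, 1, 0, 1]

# Step 2: element-wise multiply applies the mask.
output = x * mask
```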

ForwardGpu(params IGpuTensor<T>[])

Performs the GPU-resident forward pass of the masking layer.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The GPU input tensors.

Returns

IGpuTensor<T>

The GPU output tensor after masking.

Remarks

All computations stay on the GPU. Uses NotEqualScalar to create the mask and Multiply for element-wise application.
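The same two-kernel decomposition can be sketched on CPU with NumPy, with `np.not_equal` standing in for the NotEqualScalar kernel and `np.multiply` for Multiply (an illustration of the operation order described above, not GPU code):

```python
import numpy as np

x = np.array([[3.0, 0.0], [0.0, 6.0]])

# Kernel 1: NotEqualScalar — compare every element against the mask value.
mask = np.not_equal(x, 0.0).astype(x.dtype)

# Kernel 2: Multiply — element-wise product of input and mask.
out = np.multiply(x, mask)
```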

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

An empty vector since this layer has no trainable parameters.

Remarks

This method returns an empty vector because the MaskingLayer has no trainable parameters. However, it must be implemented to satisfy the base class contract.

For Beginners: This method would normally return all the values that can be learned during training.

Since this layer has no learnable values:

  • It returns an empty list (vector with length 0)
  • This is expected for layers that perform fixed operations
  • Other layers, like those with weights, would return those weights

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

This method clears any cached data from previous forward passes, essentially resetting the layer to its initial state. This is useful when starting to process a new batch of data.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • Stored inputs and masks are cleared
  • The layer forgets any information from previous data
  • This is important when processing a new, unrelated batch of data

Think of it like wiping a slate clean before writing new information.

UpdateParameters(T)

Updates the parameters of the layer based on the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method is empty because the MaskingLayer has no trainable parameters to update. However, it must be implemented to satisfy the base class contract.

For Beginners: This method would normally update the layer's internal values during training.

However, since this layer doesn't have any trainable parameters:

  • There's nothing to update
  • The method exists but doesn't do anything
  • This is normal for layers that perform fixed operations