Class MaskingLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a layer that masks specified values in the input tensor, typically used to ignore padding in sequential data.
public class MaskingLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
-
LayerBase<T> → MaskingLayer<T>
- Implements
-
ILayer<T>
- Inherited Members
Remarks
The MaskingLayer is used to skip certain time steps in sequential data by masking out specific values. During the forward pass, time steps with values equal to the mask value are multiplied by zero, effectively removing them from consideration by subsequent layers. This is particularly useful for handling variable-length sequences where padding is used to make all sequences the same length.
For Beginners: This layer helps the network ignore certain parts of your data.
Think of it like a highlighter that marks which parts of your data are important:
- Any value matching the "mask value" (usually 0) gets ignored
- All other values pass through unchanged
- This is especially useful for sequences of different lengths
For example, if you have sentences of different lengths:
- Short sentences might be padded with zeros to match longer ones
- The masking layer tells the network to ignore those zeros
- This helps the network focus only on the real data
Without masking, the network would try to learn patterns from the padding values, which would confuse the learning process.
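The idea above can be sketched in a few lines. This is a minimal, language-agnostic illustration of the masking rule (written in Python for brevity, not the AiDotNet API); the function names are illustrative only:

```python
def make_mask(values, mask_value=0):
    """Build a binary mask: 0 where the value equals mask_value, 1 elsewhere."""
    return [0 if v == mask_value else 1 for v in values]

def apply_mask(values, mask):
    """Element-wise multiply: masked positions become 0, the rest pass through."""
    return [v * m for v, m in zip(values, mask)]

# A padded sequence: the trailing zeros are padding, not real data.
sequence = [5, 3, 7, 0, 0]
mask = make_mask(sequence)           # [1, 1, 1, 0, 0]
masked = apply_mask(sequence, mask)  # padding positions are guaranteed zero
```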
Constructors
MaskingLayer(int[], double)
Initializes a new instance of the MaskingLayer<T> class.
public MaskingLayer(int[] inputShape, double maskValue = 0)
Parameters
inputShape int[]: The shape of the input tensor.
maskValue double: The value to be masked out. Defaults to 0.
Remarks
This constructor creates a MaskingLayer that will mask out all values equal to the specified mask value. The output shape is the same as the input shape, as the masking operation doesn't change the dimensions.
For Beginners: This creates a new masking layer with your desired settings.
When setting up this layer:
- inputShape defines the expected size and dimensions of your data
- maskValue is the specific value you want to ignore (typically 0)
For example, if you have sequences padded with zeros, you would set maskValue to 0 so that the network ignores those padding values.
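The constructor's semantics (output shape equals input shape, mask value defaults to 0) can be sketched as follows. This is a hypothetical Python mirror of the behavior described above, not the C# API:

```python
class MaskingLayer:
    """Illustrative sketch of the constructor semantics described above."""

    def __init__(self, input_shape, mask_value=0):
        self.input_shape = list(input_shape)
        self.mask_value = mask_value

    @property
    def output_shape(self):
        # Masking is element-wise, so the dimensions are unchanged.
        return self.input_shape

# e.g. sequences of 10 time steps with 32 features, padded with zeros
layer = MaskingLayer([10, 32], mask_value=0)
```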
Properties
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
SupportsGpuTraining
Gets whether this layer has full GPU training support (forward, backward, and parameter updates).
public override bool SupportsGpuTraining { get; }
Property Value
Remarks
This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:
- ForwardGpu is implemented
- BackwardGpu is implemented
- UpdateParametersGpu is implemented (for layers with trainable parameters)
- GPU weight/bias/gradient buffers are properly managed
For Beginners: This tells you if training can happen entirely on GPU.
GPU-resident training is much faster because:
- Data stays on GPU between forward and backward passes
- No expensive CPU-GPU transfers during each training step
- GPU kernels handle all gradient computation
Only layers that return true here can participate in fully GPU-resident training.
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
Always true, because masking is a simple element-wise operation that can be JIT compiled.
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
Remarks
This property returns false because the MaskingLayer does not have any trainable parameters, though it does support backward pass for gradient propagation through the network.
For Beginners: This tells you if the layer can learn from training data.
A value of false means:
- This layer doesn't have any values that get updated during training
- It performs a fixed operation (masking)
- However, during training, it still helps gradients flow backward through the network
The masking layer doesn't need to learn anything - it just follows a simple rule: mask out specific values and pass everything else through.
Methods
Backward(Tensor<T>)
Performs the backward pass of the masking layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient Tensor<T>: The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the masking layer, which is used during training to propagate error gradients back through the network. It applies the same mask to the output gradient that was used in the forward pass, ensuring that gradients for masked values remain zero.
For Beginners: This method handles the flow of error information during training.
During the backward pass:
- The layer receives information about how its output affected the overall error
- It applies the same mask to this gradient information
- This ensures that no gradient flows back through the masked values
This process is important because:
- We don't want the network to learn from the masked (padding) values
- The mask stops error information from flowing back through those values
- This helps keep the training focused only on the real data
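The gradient-masking step above can be sketched as a single element-wise multiply. A minimal Python illustration of the described behavior (not the AiDotNet implementation), assuming the mask was saved during the forward pass:

```python
def backward(output_gradient, mask):
    """Apply the forward-pass mask to the incoming gradient:
    masked positions contribute no gradient to earlier layers."""
    return [g * m for g, m in zip(output_gradient, mask)]

# Gradients at masked positions (mask == 0) are zeroed out.
input_gradient = backward([0.5, 0.2, -0.3, 0.8], [1, 0, 1, 0])
```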
BackwardGpu(IGpuTensor<T>)
Performs the backward pass of the layer on GPU.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient IGpuTensor<T>: The GPU-resident gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The GPU-resident gradient of the loss with respect to the layer's input.
Remarks
This method performs the layer's backward computation entirely on GPU, including:
- Computing input gradients to pass to previous layers
- Computing and storing weight gradients on GPU (for layers with trainable parameters)
- Computing and storing bias gradients on GPU
For Beginners: This is like Backward() but runs entirely on GPU.
During GPU training:
- Output gradients come in (on GPU)
- Input gradients are computed (stay on GPU)
- Weight/bias gradients are computed and stored (on GPU)
- Input gradients are returned for the previous layer
All data stays on GPU - no CPU round-trips needed!
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU training.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the masking layer's forward pass as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes List<ComputationNode<T>>: List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the masked result.
Remarks
This method builds a computation graph for the masking operation. The mask is applied element-wise: masked_output = input * mask. For JIT compilation, we assume a pre-computed mask or identity (no masking).
Forward(Tensor<T>)
Performs the forward pass of the masking layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input Tensor<T>: The input tensor to process.
Returns
- Tensor<T>
The output tensor after masking.
Remarks
This method implements the forward pass of the masking layer. It creates a binary mask where values equal to the mask value are set to 0 and other values are set to 1. This mask is then applied to the input tensor by element-wise multiplication, effectively removing the masked values from consideration.
For Beginners: This method processes your data through the masking layer.
During the forward pass:
- The layer creates a "mask" - a matching array where:
  - Values equal to the mask value (usually 0) become 0 in the mask
  - All other values become 1 in the mask
- The original input is multiplied by this mask:
  - Where the mask is 1, the original value passes through
  - Where the mask is 0, the result becomes 0
For example, if you have data [5, 0, 7, 0, 9] and a mask value of 0:
- The mask would be [1, 0, 1, 0, 1]
- After applying the mask: [5, 0, 7, 0, 9] * [1, 0, 1, 0, 1] = [5, 0, 7, 0, 9]
- The result looks numerically unchanged here because the masked values were already zero, but the mask guarantees those positions stay zero and marks them to be ignored by subsequent layers
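With a non-zero mask value the effect of the forward pass is easier to see, because masked positions visibly change. A small Python sketch of the described behavior (illustrative, not the AiDotNet API), using -1 as a hypothetical padding marker:

```python
def forward(values, mask_value):
    """Forward pass: build the binary mask and apply it element-wise."""
    mask = [0 if v == mask_value else 1 for v in values]
    output = [v * m for v, m in zip(values, mask)]
    return output, mask

# Positions holding -1 (the padding marker) are zeroed out.
output, mask = forward([5, -1, 7, -1, 9], mask_value=-1)
```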
ForwardGpu(params IGpuTensor<T>[])
Performs the GPU-resident forward pass of the masking layer.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs IGpuTensor<T>[]: The GPU input tensors.
Returns
- IGpuTensor<T>
The GPU output tensor after masking.
Remarks
All computations stay on the GPU. Uses NotEqualScalar to create the mask and Multiply for element-wise application.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
An empty vector since this layer has no trainable parameters.
Remarks
This method returns an empty vector because the MaskingLayer has no trainable parameters. However, it must be implemented to satisfy the base class contract.
For Beginners: This method would normally return all the values that can be learned during training.
Since this layer has no learnable values:
- It returns an empty list (vector with length 0)
- This is expected for layers that perform fixed operations
- Other layers, like those with weights, would return those weights
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method clears any cached data from previous forward passes, essentially resetting the layer to its initial state. This is useful when starting to process a new batch of data.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and masks are cleared
- The layer forgets any information from previous data
- This is important when processing a new, unrelated batch of data
Think of it like wiping a slate clean before writing new information.
UpdateParameters(T)
Updates the parameters of the layer based on the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate T: The learning rate to use for the parameter updates.
Remarks
This method is empty because the MaskingLayer has no trainable parameters to update. However, it must be implemented to satisfy the base class contract.
For Beginners: This method would normally update the layer's internal values during training.
However, since this layer doesn't have any trainable parameters:
- There's nothing to update
- The method exists but doesn't do anything
- This is normal for layers that perform fixed operations