Class MaskingLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a layer that masks specified values in the input tensor, typically used to ignore padding in sequential data.
public class MaskingLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
-
LayerBase<T> → MaskingLayer<T>
- Implements
-
ILayer<T>
- Inherited Members
Remarks
The MaskingLayer is used to skip certain time steps in sequential data by masking out specific values. During the forward pass, time steps with values equal to the mask value are multiplied by zero, effectively removing them from consideration by subsequent layers. This is particularly useful for handling variable-length sequences where padding is used to make all sequences the same length.
For Beginners: This layer helps the network ignore certain parts of your data.
Think of it like a highlighter that marks which parts of your data are important:
- Any value matching the "mask value" (usually 0) gets ignored
- All other values pass through unchanged
- This is especially useful for sequences of different lengths
For example, if you have sentences of different lengths:
- Short sentences might be padded with zeros to match longer ones
- The masking layer tells the network to ignore those zeros
- This helps the network focus only on the real data
Without masking, the network would try to learn patterns from the padding values, which would confuse the learning process.
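The idea above can be sketched in a few lines. This is a minimal, language-agnostic illustration of the masking rule (written in Python for brevity, not the AiDotNet API); the function names are illustrative only:

```python
def make_mask(values, mask_value=0):
    """Build a binary mask: 0 where the value equals mask_value, 1 elsewhere."""
    return [0 if v == mask_value else 1 for v in values]

def apply_mask(values, mask):
    """Element-wise multiply: masked positions become 0, the rest pass through."""
    return [v * m for v, m in zip(values, mask)]

# A padded sequence: the trailing zeros are padding, not real data.
sequence = [5, 3, 7, 0, 0]
mask = make_mask(sequence)           # [1, 1, 1, 0, 0]
masked = apply_mask(sequence, mask)  # padding positions are guaranteed zero
```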
Constructors
MaskingLayer(int[], double)
Initializes a new instance of the MaskingLayer<T> class.
public MaskingLayer(int[] inputShape, double maskValue = 0)
Parameters
inputShape int[]: The shape of the input tensor.
maskValue double: The value to be masked out. Defaults to 0.
Remarks
This constructor creates a MaskingLayer that will mask out all values equal to the specified mask value. The output shape is the same as the input shape, as the masking operation doesn't change the dimensions.
For Beginners: This creates a new masking layer with your desired settings.
When setting up this layer:
- inputShape defines the expected size and dimensions of your data
- maskValue is the specific value you want to ignore (typically 0)
For example, if you have sequences padded with zeros, you would set maskValue to 0 so that the network ignores those padding values.
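The constructor's semantics (output shape equals input shape, mask value defaults to 0) can be sketched as follows. This is a hypothetical Python mirror of the behavior described above, not the C# API:

```python
class MaskingLayer:
    """Illustrative sketch of the constructor semantics described above."""

    def __init__(self, input_shape, mask_value=0):
        self.input_shape = list(input_shape)
        self.mask_value = mask_value

    @property
    def output_shape(self):
        # Masking is element-wise, so the dimensions are unchanged.
        return self.input_shape

# e.g. sequences of 10 time steps with 32 features, padded with zeros
layer = MaskingLayer([10, 32], mask_value=0)
```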
Properties
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
SupportsGpuTraining
Gets whether this layer has full GPU training support (forward, backward, and parameter updates).
public override bool SupportsGpuTraining { get; }
Property Value
Remarks
This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:
- ForwardGpu is implemented
- BackwardGpu is implemented
- UpdateParametersGpu is implemented (for layers with trainable parameters)
- GPU weight/bias/gradient buffers are properly managed
For Beginners: This tells you if training can happen entirely on GPU.
GPU-resident training is much faster because:
- Data stays on GPU between forward and backward passes
- No expensive CPU-GPU transfers during each training step
- GPU kernels handle all gradient computation
Only layers that return true here can participate in fully GPU-resident training.
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
Always true, because masking is a simple element-wise operation that can be JIT compiled.
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
Remarks
This property returns false because the MaskingLayer does not have any trainable parameters, though it does support backward pass for gradient propagation through the network.
For Beginners: This tells you if the layer can learn from training data.
A value of false means:
- This layer doesn't have any values that get updated during training
- It performs a fixed operation (masking)
- However, during training, it still helps gradients flow backward through the network
The masking layer doesn't need to learn anything - it just follows a simple rule: mask out specific values and pass everything else through.
Methods
Backward(Tensor<T>)
Performs the backward pass of the masking layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient Tensor<T>: The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the masking layer, which is used during training to propagate error gradients back through the network. It applies the same mask to the output gradient that was used in the forward pass, ensuring that gradients for masked values remain zero.
For Beginners: This method handles the flow of error information during training.
During the backward pass:
- The layer receives information about how its output affected the overall error
- It applies the same mask to this gradient information
- This ensures that no gradient flows back through the masked values
This process is important because:
- We don't want the network to learn from the masked (padding) values
- The mask stops error information from flowing back through those values
- This helps keep the training focused only on the real data
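The gradient-masking step above can be sketched as a single element-wise multiply. A minimal Python illustration of the described behavior (not the AiDotNet implementation), assuming the mask was saved during the forward pass:

```python
def backward(output_gradient, mask):
    """Apply the forward-pass mask to the incoming gradient:
    masked positions contribute no gradient to earlier layers."""
    return [g * m for g, m in zip(output_gradient, mask)]

# Gradients at masked positions (mask == 0) are zeroed out.
input_gradient = backward([0.5, 0.2, -0.3, 0.8], [1, 0, 1, 0])
```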
BackwardGpu(IGpuTensor<T>)
Performs the backward pass of the layer on GPU.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient IGpuTensor<T>: The GPU-resident gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The GPU-resident gradient of the loss with respect to the layer's input.
Remarks
This method performs the layer's backward computation entirely on GPU, including:
- Computing input gradients to pass to previous layers
- Computing and storing weight gradients on GPU (for layers with trainable parameters)
- Computing and storing bias gradients on GPU
For Beginners: This is like Backward() but runs entirely on GPU.
During GPU training:
- Output gradients come in (on GPU)
- Input gradients are computed (stay on GPU)
- Weight/bias gradients are computed and stored (on GPU)
- Input gradients are returned for the previous layer
All data stays on GPU - no CPU round-trips needed!
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU training.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the masking layer's forward pass as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes List<ComputationNode<T>>: List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the masked result.
Remarks
This method builds a computation graph for the masking operation. The mask is applied element-wise: masked_output = input * mask. For JIT compilation, we assume a pre-computed mask or identity (no masking).
Forward(Tensor<T>)
Performs the forward pass of the masking layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input Tensor<T>: The input tensor to process.
Returns
- Tensor<T>
The output tensor after masking.
Remarks
This method implements the forward pass of the masking layer. It creates a binary mask where values equal to the mask value are set to 0 and other values are set to 1. This mask is then applied to the input tensor by element-wise multiplication, effectively removing the masked values from consideration.
For Beginners: This method processes your data through the masking layer.
During the forward pass:
- The layer creates a "mask" - a matching array where:
  - Values equal to the mask value (usually 0) become 0 in the mask
  - All other values become 1 in the mask
- The original input is multiplied by this mask:
  - Where the mask is 1, the original value passes through
  - Where the mask is 0, the result becomes 0
For example, if you have data [5, 0, 7, 0, 9] and a mask value of 0:
- The mask would be [1, 0, 1, 0, 1]
- After applying the mask: [5, 0, 7, 0, 9] * [1, 0, 1, 0, 1] = [5, 0, 7, 0, 9]
- The result looks numerically unchanged here because the masked values were already zero, but the mask guarantees those positions stay zero and marks them to be ignored by subsequent layers
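With a non-zero mask value the effect of the forward pass is easier to see, because masked positions visibly change. A small Python sketch of the described behavior (illustrative, not the AiDotNet API), using -1 as a hypothetical padding marker:

```python
def forward(values, mask_value):
    """Forward pass: build the binary mask and apply it element-wise."""
    mask = [0 if v == mask_value else 1 for v in values]
    output = [v * m for v, m in zip(values, mask)]
    return output, mask

# Positions holding -1 (the padding marker) are zeroed out.
output, mask = forward([5, -1, 7, -1, 9], mask_value=-1)
```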
ForwardGpu(params IGpuTensor<T>[])
Performs the GPU-resident forward pass of the masking layer.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs IGpuTensor<T>[]: The GPU input tensors.
Returns
- IGpuTensor<T>
The GPU output tensor after masking.
Remarks
All computations stay on the GPU. Uses NotEqualScalar to create the mask and Multiply for element-wise application.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
An empty vector since this layer has no trainable parameters.
Remarks
This method returns an empty vector because the MaskingLayer has no trainable parameters. However, it must be implemented to satisfy the base class contract.
For Beginners: This method would normally return all the values that can be learned during training.
Since this layer has no learnable values:
- It returns an empty list (vector with length 0)
- This is expected for layers that perform fixed operations
- Other layers, like those with weights, would return those weights
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method clears any cached data from previous forward passes, essentially resetting the layer to its initial state. This is useful when starting to process a new batch of data.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and masks are cleared
- The layer forgets any information from previous data
- This is important when processing a new, unrelated batch of data
Think of it like wiping a slate clean before writing new information.
UpdateParameters(T)
Updates the parameters of the layer based on the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate T: The learning rate to use for the parameter updates.
Remarks
This method is empty because the MaskingLayer has no trainable parameters to update. However, it must be implemented to satisfy the base class contract.
For Beginners: This method would normally update the layer's internal values during training.
However, since this layer doesn't have any trainable parameters:
- There's nothing to update
- The method exists but doesn't do anything
- This is normal for layers that perform fixed operations