Class SplitLayer<T>
- Namespace: AiDotNet.NeuralNetworks.Layers
- Assembly: AiDotNet.dll
Represents a layer that splits the input tensor along a specific dimension into multiple equal parts.
public class SplitLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → SplitLayer<T>
- Implements
- ILayer<T>
Remarks
A split layer divides the input tensor into multiple equal parts along a specified dimension. This is useful for parallel processing of data or for implementing multi-headed attention mechanisms. The layer ensures that the input size is divisible by the number of splits to maintain consistency.
For Beginners: This layer breaks up your input data into smaller, equal-sized chunks.
Think of it like cutting a pizza into equal slices:
- Your input data is the whole pizza
- The number of splits determines how many slices you want
- Each slice has the same size and shape
Benefits include:
- Processing different parts of the input in parallel
- Allowing different operations on different parts of the input
- Creating multi-stream architectures where each stream handles a portion of the data
For example, in natural language processing, you might split word embeddings to create multiple "attention heads" that each focus on different aspects of the text.
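As a concrete illustration of that idea, here is a minimal usage sketch based on the constructor and Forward method documented below. The shape-based Tensor<float> constructor is an assumption made for illustration; substitute whatever tensor factory the library actually provides.
using AiDotNet.NeuralNetworks.Layers;

// Split a batch of 8 embeddings with 64 features each into 4 "heads" of 16 features.
var split = new SplitLayer<float>(new[] { 8, 64 }, numSplits: 4);

// Hypothetical tensor construction from a shape array.
var input = new Tensor<float>(new[] { 8, 64 });

// The output gains a split dimension: [8, 64] becomes [8, 4, 16].
var output = split.Forward(input);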
Constructors
SplitLayer(int[], int)
Initializes a new instance of the SplitLayer<T> class.
public SplitLayer(int[] inputShape, int numSplits)
Parameters
inputShape (int[]): The shape of the input tensor.
numSplits (int): The number of parts to split the input tensor into.
Remarks
This constructor creates a split layer with the specified input shape and number of splits. It verifies that the input size is divisible by the number of splits to ensure all splits have the same size.
For Beginners: This sets up a new layer that will divide the input into equal parts.
When creating a split layer, you need to specify:
- inputShape: The dimensions of the data going into the layer
- numSplits: How many equal pieces to divide the input into
The constructor checks that the input can be divided equally by the number of splits. For example, if your input has 100 features and you want 4 splits, that works (100 ÷ 4 = 25). But if your input has 100 features and you want 3 splits, that won't work, because you'd get splits of size 33.33..., which isn't a whole number.
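The divisibility rule can be sketched in plain C#, independent of the library (the variable names here are just for illustration):
using System;

int inputFeatures = 100;
int numSplits = 4;

// The layer requires an even division: 100 % 4 == 0, so each split holds 25 features.
if (inputFeatures % numSplits != 0)
    throw new ArgumentException($"Input size {inputFeatures} is not divisible into {numSplits} equal splits.");

int splitSize = inputFeatures / numSplits; // 25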
Properties
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution. SplitLayer uses GPU Reshape operations.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsGpuTraining
Gets whether this layer has full GPU training support (forward, backward, and parameter updates).
public override bool SupportsGpuTraining { get; }
Property Value
- bool
Remarks
This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:
- ForwardGpu is implemented
- BackwardGpu is implemented
- UpdateParametersGpu is implemented (for layers with trainable parameters)
- GPU weight/bias/gradient buffers are properly managed
For Beginners: This tells you if training can happen entirely on GPU.
GPU-resident training is much faster because:
- Data stays on GPU between forward and backward passes
- No expensive CPU-GPU transfers during each training step
- GPU kernels handle all gradient computation
Only layers that return true here can participate in fully GPU-resident training.
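As a sketch, a training loop might use this property to decide whether fully GPU-resident training is possible. Only the SupportsGpuTraining property is documented here; the helper function below is hypothetical.
using System.Collections.Generic;
using System.Linq;
using AiDotNet.NeuralNetworks.Layers;

// Fully GPU-resident training requires every layer in the network to opt in.
static bool CanTrainFullyOnGpu(IEnumerable<LayerBase<float>> layers)
    => layers.All(layer => layer.SupportsGpuTraining);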
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
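A sketch of the dispatch this property enables, assuming a layer and an input tensor already in scope. ExportComputationGraph and Forward are documented members; the CompileGraph call stands in for whatever JIT entry point the library exposes and is purely hypothetical.
if (layer.SupportsJitCompilation)
{
    var inputNodes = new List<ComputationNode<float>>();
    var graph = layer.ExportComputationGraph(inputNodes);
    var compiled = CompileGraph(graph); // hypothetical JIT compile step
}
else
{
    var output = layer.Forward(input); // standard path
}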
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
Always returns true, as split layers can propagate gradients.
Remarks
This property indicates that the split layer can participate in the training process by propagating gradients. Although the layer has no trainable parameters itself, it can pass gradients back to previous layers.
For Beginners: This property tells you that the layer can be used during training.
Even though this layer doesn't have any parameters that need to be adjusted:
- It can still pass error information backward to previous layers during training
- It participates in the backpropagation process
This allows the layer to be included in networks that learn from data.
Methods
Backward(Tensor<T>)
Performs the backward pass of the split layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the split layer, which is used during training to propagate error gradients back through the network. It recombines the gradients from all splits into a single gradient tensor matching the original input shape.
For Beginners: This method reverses the splitting process for training.
During the backward pass:
- The method throws an error if the forward pass hasn't been called first
- It calculates how big each split is
- It creates a gradient tensor matching the original input shape
- It copies the gradient values from each split back to their original positions
This process ensures that error information flows backward through the network properly, allowing layers before the split to learn from the training process.
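A sketch of the gradient round-trip, reusing the [10, 5, 20] shape example from the Forward remarks and assuming a layer built for input shape [10, 100] with 5 splits. The shape-based Tensor<double> constructor is an assumed placeholder.
// Forward must run first so the layer caches its input.
var output = layer.Forward(input);                   // [10, 100] -> [10, 5, 20]

// The upstream gradient has the output's shape (hypothetical constructor) ...
var outputGradient = new Tensor<double>(new[] { 10, 5, 20 });

// ... and Backward recombines it into the original input shape.
var inputGradient = layer.Backward(outputGradient);  // [10, 5, 20] -> [10, 100]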
BackwardGpu(IGpuTensor<T>)
Performs the backward pass of the layer on GPU.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The GPU-resident gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The GPU-resident gradient of the loss with respect to the layer's input.
Remarks
This method performs the layer's backward computation entirely on GPU, including:
- Computing input gradients to pass to previous layers
- Computing and storing weight gradients on GPU (for layers with trainable parameters)
- Computing and storing bias gradients on GPU
For Beginners: This is like Backward() but runs entirely on GPU.
During GPU training:
- Output gradients come in (on GPU)
- Input gradients are computed (stay on GPU)
- Weight/bias gradients are computed and stored (on GPU)
- Input gradients are returned for the previous layer
All data stays on GPU - no CPU round-trips needed!
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU training.
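A defensive sketch around the documented exception; checking SupportsGpuTraining first avoids triggering the throw. Here outputGrad and outputGradCpu stand for gradients already produced by the next layer.
if (layer.SupportsGpuTraining)
{
    // Gradients stay GPU-resident end to end.
    IGpuTensor<float> inputGrad = layer.BackwardGpu(outputGrad);
}
else
{
    // Fall back to the CPU path instead of catching NotSupportedException.
    Tensor<float> inputGradCpu = layer.Backward(outputGradCpu);
}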
ExportComputationGraph(List<ComputationNode<T>>)
Exports the split layer as a computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to which the input node will be added.
Returns
- ComputationNode<T>
The output computation node representing the split operation.
Remarks
The split layer is implemented as a reshape operation that adds a new dimension. Input shape [batch, inputSize] is reshaped to [batch, numSplits, splitSize].
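The reshape arithmetic described above, as a plain sketch:
// [batch, inputSize] is reshaped to [batch, numSplits, splitSize].
int batch = 32, inputSize = 128, numSplits = 4;
int splitSize = inputSize / numSplits;                 // 128 / 4 = 32
int[] outputShape = { batch, numSplits, splitSize };   // [32, 4, 32]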
Forward(Tensor<T>)
Performs the forward pass of the split layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after splitting.
Remarks
This method implements the forward pass of the split layer. It divides the input tensor into multiple equal-sized parts along the specified dimension and returns a new tensor containing all the splits.
For Beginners: This method does the actual work of splitting the input data.
During the forward pass:
- The input is saved for later use in training
- The method calculates how big each split should be
- It creates a new tensor with an additional dimension to hold all the splits
- It copies the data from the input into the appropriate positions in the output
After splitting, the data will have a new dimension that indicates which split each piece belongs to. For example, if you split a batch of 10 samples with 100 features into 5 splits, you'll get an output with shape [10, 5, 20], where 20 is the size of each split.
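The shape example above, sketched in code; as before, the shape-based Tensor<double> constructor is an assumption.
var layer = new SplitLayer<double>(new[] { 10, 100 }, numSplits: 5);
var input = new Tensor<double>(new[] { 10, 100 }); // hypothetical constructor

var output = layer.Forward(input);
// output shape: [10, 5, 20] - batch of 10, 5 splits, 20 features per split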
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass using GPU-resident tensors.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensors.
Returns
- IGpuTensor<T>
A GPU-resident output tensor after splitting.
Remarks
SplitLayer is implemented as a reshape operation that stays entirely GPU-resident. No data is downloaded to CPU during inference.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
An empty vector since this layer has no trainable parameters.
Remarks
This method returns an empty vector since the split layer has no trainable parameters. It is implemented to satisfy the interface requirements of LayerBase.
For Beginners: This method returns an empty list because the layer has no parameters.
Since the split layer doesn't modify the data in any way that requires learning:
- There are no weights or biases to adjust
- This method returns an empty vector (a list with no elements)
Other layers would return their weights and biases here, which would be used for saving the model or applying optimization techniques.
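A brief sketch; the Length property on Vector<T> is an assumption used only to show that the result is empty.
var parameters = layer.GetParameters();
// Split layers are parameter-free, so the returned vector is empty.
Console.WriteLine(parameters.Length); // expected: 0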
ResetState()
Resets the internal state of the split layer.
public override void ResetState()
Remarks
This method resets the internal state of the split layer, clearing the cached input. This is useful when starting to process a new batch or when implementing stateful networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- The stored input from the previous forward pass is cleared
This is important for:
- Processing a new batch of unrelated data
- Preventing information from one batch affecting another
- Starting a new training episode
Think of it like clearing your workspace before starting a new project - it ensures that old information doesn't interfere with new processing.
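A sketch of resetting between unrelated batches (batchA, batchB, and gradA are placeholder tensors assumed to be in scope):
var outputA = layer.Forward(batchA);
var gradInA = layer.Backward(gradA);

// Clear the cached input before processing an unrelated batch.
layer.ResetState();

var outputB = layer.Forward(batchB);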
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method is a no-op for the split layer since it has no trainable parameters to update. It is implemented to satisfy the interface requirements of LayerBase.
For Beginners: This method doesn't do anything in the split layer.
Since the split layer doesn't have any trainable parameters:
- There's nothing to update during training
- This method exists just to fulfill the requirements of being a layer
Other layers would use this method to update their weights and biases, but the split layer simply passes data through without modification.