Class SplitLayer<T>
- Namespace: AiDotNet.NeuralNetworks.Layers
- Assembly: AiDotNet.dll
Represents a layer that splits the input tensor along a specific dimension into multiple equal parts.
public class SplitLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → SplitLayer<T>
- Implements
- ILayer<T>
Remarks
A split layer divides the input tensor into multiple equal parts along a specified dimension. This is useful for parallel processing of data or for implementing multi-headed attention mechanisms. The layer ensures that the input size is divisible by the number of splits to maintain consistency.
For Beginners: This layer breaks up your input data into smaller, equal-sized chunks.
Think of it like cutting a pizza into equal slices:
- Your input data is the whole pizza
- The number of splits determines how many slices you want
- Each slice has the same size and shape
Benefits include:
- Processing different parts of the input in parallel
- Allowing different operations on different parts of the input
- Creating multi-stream architectures where each stream handles a portion of the data
For example, in natural language processing, you might split word embeddings to create multiple "attention heads" that each focus on different aspects of the text.
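As a concrete illustration of that idea, here is a minimal usage sketch based on the constructor and Forward method documented below. The shape-based Tensor<float> constructor is an assumption made for illustration; substitute whatever tensor factory the library actually provides.
using AiDotNet.NeuralNetworks.Layers;

// Split a batch of 8 embeddings with 64 features each into 4 "heads" of 16 features.
var split = new SplitLayer<float>(new[] { 8, 64 }, numSplits: 4);

// Hypothetical tensor construction from a shape array.
var input = new Tensor<float>(new[] { 8, 64 });

// The output gains a split dimension: [8, 64] becomes [8, 4, 16].
var output = split.Forward(input);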
Constructors
SplitLayer(int[], int)
Initializes a new instance of the SplitLayer<T> class.
public SplitLayer(int[] inputShape, int numSplits)
Parameters
inputShape (int[]): The shape of the input tensor.
numSplits (int): The number of parts to split the input tensor into.
Remarks
This constructor creates a split layer with the specified input shape and number of splits. It verifies that the input size is divisible by the number of splits to ensure all splits have the same size.
For Beginners: This sets up a new layer that will divide the input into equal parts.
When creating a split layer, you need to specify:
- inputShape: The dimensions of the data going into the layer
- numSplits: How many equal pieces to divide the input into
The constructor checks that the input can be divided equally by the number of splits. For example, if your input has 100 features and you want 4 splits, that works (100 ÷ 4 = 25). But if your input has 100 features and you want 3 splits, that won't work, because you'd get splits of size 33.33..., which isn't a whole number.
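The divisibility rule can be sketched in plain C#, independent of the library (the variable names here are just for illustration):
using System;

int inputFeatures = 100;
int numSplits = 4;

// The layer requires an even division: 100 % 4 == 0, so each split holds 25 features.
if (inputFeatures % numSplits != 0)
    throw new ArgumentException($"Input size {inputFeatures} is not divisible into {numSplits} equal splits.");

int splitSize = inputFeatures / numSplits; // 25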
Properties
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution. SplitLayer uses GPU Reshape operations.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsGpuTraining
Gets whether this layer has full GPU training support (forward, backward, and parameter updates).
public override bool SupportsGpuTraining { get; }
Property Value
- bool
Remarks
This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:
- ForwardGpu is implemented
- BackwardGpu is implemented
- UpdateParametersGpu is implemented (for layers with trainable parameters)
- GPU weight/bias/gradient buffers are properly managed
For Beginners: This tells you if training can happen entirely on GPU.
GPU-resident training is much faster because:
- Data stays on GPU between forward and backward passes
- No expensive CPU-GPU transfers during each training step
- GPU kernels handle all gradient computation
Only layers that return true here can participate in fully GPU-resident training.
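As a sketch, a training loop might use this property to decide whether fully GPU-resident training is possible. Only the SupportsGpuTraining property is documented here; the helper function below is hypothetical.
using System.Collections.Generic;
using System.Linq;
using AiDotNet.NeuralNetworks.Layers;

// Fully GPU-resident training requires every layer in the network to opt in.
static bool CanTrainFullyOnGpu(IEnumerable<LayerBase<float>> layers)
    => layers.All(layer => layer.SupportsGpuTraining);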
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
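A sketch of the dispatch this property enables, assuming a layer and an input tensor already in scope. ExportComputationGraph and Forward are documented members; the CompileGraph call stands in for whatever JIT entry point the library exposes and is purely hypothetical.
if (layer.SupportsJitCompilation)
{
    var inputNodes = new List<ComputationNode<float>>();
    var graph = layer.ExportComputationGraph(inputNodes);
    var compiled = CompileGraph(graph); // hypothetical JIT compile step
}
else
{
    var output = layer.Forward(input); // standard path
}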
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
Always returns true, as split layers can propagate gradients.
Remarks
This property indicates that the split layer can participate in the training process by propagating gradients. Although the layer has no trainable parameters itself, it can pass gradients back to previous layers.
For Beginners: This property tells you that the layer can be used during training.
Even though this layer doesn't have any parameters that need to be adjusted:
- It can still pass error information backward to previous layers during training
- It participates in the backpropagation process
This allows the layer to be included in networks that learn from data.
Methods
Backward(Tensor<T>)
Performs the backward pass of the split layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the split layer, which is used during training to propagate error gradients back through the network. It recombines the gradients from all splits into a single gradient tensor matching the original input shape.
For Beginners: This method reverses the splitting process for training.
During the backward pass:
- The method throws an error if the forward pass hasn't been called first
- It calculates how big each split is
- It creates a gradient tensor matching the original input shape
- It copies the gradient values from each split back to their original positions
This process ensures that error information flows backward through the network properly, allowing layers before the split to learn from the training process.
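A sketch of the gradient round-trip, reusing the [10, 5, 20] shape example from the Forward remarks and assuming a layer built for input shape [10, 100] with 5 splits. The shape-based Tensor<double> constructor is an assumed placeholder.
// Forward must run first so the layer caches its input.
var output = layer.Forward(input);                   // [10, 100] -> [10, 5, 20]

// The upstream gradient has the output's shape (hypothetical constructor) ...
var outputGradient = new Tensor<double>(new[] { 10, 5, 20 });

// ... and Backward recombines it into the original input shape.
var inputGradient = layer.Backward(outputGradient);  // [10, 5, 20] -> [10, 100]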
BackwardGpu(IGpuTensor<T>)
Performs the backward pass of the layer on GPU.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The GPU-resident gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The GPU-resident gradient of the loss with respect to the layer's input.
Remarks
This method performs the layer's backward computation entirely on GPU, including:
- Computing input gradients to pass to previous layers
- Computing and storing weight gradients on GPU (for layers with trainable parameters)
- Computing and storing bias gradients on GPU
For Beginners: This is like Backward() but runs entirely on GPU.
During GPU training:
- Output gradients come in (on GPU)
- Input gradients are computed (stay on GPU)
- Weight/bias gradients are computed and stored (on GPU)
- Input gradients are returned for the previous layer
All data stays on GPU - no CPU round-trips needed!
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU training.
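A defensive sketch around the documented exception; checking SupportsGpuTraining first avoids triggering the throw. Here outputGrad and outputGradCpu stand for gradients already produced by the next layer.
if (layer.SupportsGpuTraining)
{
    // Gradients stay GPU-resident end to end.
    IGpuTensor<float> inputGrad = layer.BackwardGpu(outputGrad);
}
else
{
    // Fall back to the CPU path instead of catching NotSupportedException.
    Tensor<float> inputGradCpu = layer.Backward(outputGradCpu);
}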
ExportComputationGraph(List<ComputationNode<T>>)
Exports the split layer as a computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to which the input node will be added.
Returns
- ComputationNode<T>
The output computation node representing the split operation.
Remarks
The split layer is implemented as a reshape operation that adds a new dimension. Input shape [batch, inputSize] is reshaped to [batch, numSplits, splitSize].
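The reshape arithmetic described above, as a plain sketch:
// [batch, inputSize] is reshaped to [batch, numSplits, splitSize].
int batch = 32, inputSize = 128, numSplits = 4;
int splitSize = inputSize / numSplits;                 // 128 / 4 = 32
int[] outputShape = { batch, numSplits, splitSize };   // [32, 4, 32]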
Forward(Tensor<T>)
Performs the forward pass of the split layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after splitting.
Remarks
This method implements the forward pass of the split layer. It divides the input tensor into multiple equal-sized parts along the specified dimension and returns a new tensor containing all the splits.
For Beginners: This method does the actual work of splitting the input data.
During the forward pass:
- The input is saved for later use in training
- The method calculates how big each split should be
- It creates a new tensor with an additional dimension to hold all the splits
- It copies the data from the input into the appropriate positions in the output
After splitting, the data will have a new dimension that indicates which split each piece belongs to. For example, if you split a batch of 10 samples with 100 features into 5 splits, you'll get an output with shape [10, 5, 20], where 20 is the size of each split.
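The shape example above, sketched in code; as before, the shape-based Tensor<double> constructor is an assumption.
var layer = new SplitLayer<double>(new[] { 10, 100 }, numSplits: 5);
var input = new Tensor<double>(new[] { 10, 100 }); // hypothetical constructor

var output = layer.Forward(input);
// output shape: [10, 5, 20] - batch of 10, 5 splits, 20 features per split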
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass using GPU-resident tensors.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensors.
Returns
- IGpuTensor<T>
A GPU-resident output tensor after splitting.
Remarks
SplitLayer is implemented as a reshape operation that stays entirely GPU-resident. No data is downloaded to CPU during inference.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
An empty vector since this layer has no trainable parameters.
Remarks
This method returns an empty vector since the split layer has no trainable parameters. It is implemented to satisfy the interface requirements of LayerBase.
For Beginners: This method returns an empty list because the layer has no parameters.
Since the split layer doesn't modify the data in any way that requires learning:
- There are no weights or biases to adjust
- This method returns an empty vector (a list with no elements)
Other layers would return their weights and biases here, which would be used for saving the model or applying optimization techniques.
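A brief sketch; the Length property on Vector<T> is an assumption used only to show that the result is empty.
var parameters = layer.GetParameters();
// Split layers are parameter-free, so the returned vector is empty.
Console.WriteLine(parameters.Length); // expected: 0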
ResetState()
Resets the internal state of the split layer.
public override void ResetState()
Remarks
This method resets the internal state of the split layer, clearing the cached input. This is useful when starting to process a new batch or when implementing stateful networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- The stored input from the previous forward pass is cleared
This is important for:
- Processing a new batch of unrelated data
- Preventing information from one batch affecting another
- Starting a new training episode
Think of it like clearing your workspace before starting a new project - it ensures that old information doesn't interfere with new processing.
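A sketch of resetting between unrelated batches (batchA, batchB, and gradA are placeholder tensors assumed to be in scope):
var outputA = layer.Forward(batchA);
var gradInA = layer.Backward(gradA);

// Clear the cached input before processing an unrelated batch.
layer.ResetState();

var outputB = layer.Forward(batchB);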
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method is a no-op for the split layer since it has no trainable parameters to update. It is implemented to satisfy the interface requirements of LayerBase.
For Beginners: This method doesn't do anything in the split layer.
Since the split layer doesn't have any trainable parameters:
- There's nothing to update during training
- This method exists just to fulfill the requirements of being a layer
Other layers would use this method to update their weights and biases, but the split layer simply passes data through without modification.