Class TimeDistributedLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a wrapper layer that applies an inner layer to each time step of a sequence independently.

public class TimeDistributedLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object → LayerBase<T> → TimeDistributedLayer<T>

Implements
ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

A time distributed layer applies the same inner layer to each time step of a sequence independently. This is particularly useful for processing sequential data where the same transformation needs to be applied to each element of the sequence. The layer preserves the temporal structure of the data while letting the inner layer process each time step on its own.

For Beginners: This layer helps process sequences of data by applying the same operation to each step.

Think of it like an assembly line worker who performs the same task on each item that passes by:

  • You have a sequence of items (like frames in a video or words in a sentence)
  • You want to apply the same operation to each item independently
  • This layer automates that process while preserving the original sequence order

For example, if you have a video with 30 frames per second, and you want to detect objects in each frame:

  • A normal layer would need to process all frames together
  • This time distributed layer would apply your object detection layer to each frame separately
  • The result would be object detections for each frame, still organized as a sequence

This makes it much easier to work with sequential data like videos, sentences, or time series.
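
To make the idea concrete, here is a minimal, framework-free sketch of time-distributed processing: one shared transformation (a stand-in for the inner layer) is applied to every time step of a [batch, time, features] array. The shapes and the innerLayer delegate are illustrative assumptions, not AiDotNet types.

using System;

class TimeDistributedSketch
{
    static void Main()
    {
        // A toy sequence: 2 samples, 3 time steps, 4 features per step.
        var input = new double[2, 3, 4];

        // Stand-in for the inner layer: one shared transformation
        // (here just doubling) applied identically at every step.
        Func<double[], double[]> innerLayer = step =>
        {
            var result = new double[step.Length];
            for (int i = 0; i < step.Length; i++)
                result[i] = step[i] * 2.0;
            return result;
        };

        int batch = input.GetLength(0), time = input.GetLength(1), features = input.GetLength(2);
        var output = new double[batch, time, features];

        // Apply the same operation to each time step independently,
        // preserving the original sequence order.
        for (int b = 0; b < batch; b++)
        {
            for (int t = 0; t < time; t++)
            {
                var step = new double[features];
                for (int f = 0; f < features; f++) step[f] = input[b, t, f];

                var processed = innerLayer(step);

                for (int f = 0; f < features; f++) output[b, t, f] = processed[f];
            }
        }

        Console.WriteLine($"Processed {time} time steps with one shared transformation.");
    }
}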

Constructors

TimeDistributedLayer(LayerBase<T>, IActivationFunction<T>?, int[]?)

Initializes a new instance of the TimeDistributedLayer<T> class with scalar activation function.

public TimeDistributedLayer(LayerBase<T> innerLayer, IActivationFunction<T>? activationFunction = null, int[]? inputShape = null)

Parameters

innerLayer LayerBase<T>

The layer to apply to each time step.

activationFunction IActivationFunction<T>

The activation function to apply after processing. Defaults to ReLU if not specified.

inputShape int[]

Optional explicit input shape. If not provided, derived from the inner layer.

Remarks

This constructor creates a time distributed layer that applies the specified inner layer to each time step of a sequence. It also applies the specified scalar activation function to the output. The input shape can be explicitly provided or derived from the inner layer's input shape.

For Beginners: This constructor creates a new time distributed layer.

The parameters you provide determine:

  • innerLayer: What operation to apply to each time step in the sequence
  • activationFunction: What mathematical function to apply to the results (ReLU by default)
  • inputShape: The expected shape of incoming data (optional, can be figured out automatically)

For example, if processing a sequence of images, you might wrap a convolutional layer with this time distributed layer to apply the same convolutional operations to each image frame independently.
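
For illustration, construction might look like the following. DenseLayer<float> and ReLUActivation<float> are placeholder names for this example; substitute whatever inner layer and activation types your build of AiDotNet provides.

// Hypothetical inner layer: a dense layer to apply at every time step.
var inner = new DenseLayer<float>(inputSize: 64, outputSize: 32);

// Wrap it so the same dense transformation runs on each step of the sequence.
var timeDistributed = new TimeDistributedLayer<float>(
    innerLayer: inner,
    activationFunction: new ReLUActivation<float>(), // optional; ReLU is the default
    inputShape: null);                               // derived from the inner layer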

TimeDistributedLayer(LayerBase<T>, IVectorActivationFunction<T>?, int[]?)

Initializes a new instance of the TimeDistributedLayer<T> class with vector activation function.

public TimeDistributedLayer(LayerBase<T> innerLayer, IVectorActivationFunction<T>? vectorActivationFunction = null, int[]? inputShape = null)

Parameters

innerLayer LayerBase<T>

The layer to apply to each time step.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to apply after processing. Defaults to ReLU if not specified.

inputShape int[]

Optional explicit input shape. If not provided, derived from the inner layer.

Remarks

This constructor creates a time distributed layer that applies the specified inner layer to each time step of a sequence. It also applies the specified vector activation function to the output. The input shape can be explicitly provided or derived from the inner layer's input shape.

For Beginners: This constructor is similar to the previous one, but uses vector activations.

Vector activations:

  • Process entire groups of numbers at once, rather than one at a time
  • Can capture relationships between different elements
  • Allow for more complex transformations

This version is useful when you need more sophisticated processing that considers how different features relate to each other, rather than treating each feature independently.
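
Usage mirrors the scalar overload; only the activation argument changes. SoftmaxActivation<float> stands in for any IVectorActivationFunction<T> implementation, and DenseLayer<float> is again a placeholder name.

// Placeholder names; substitute real types from your AiDotNet build.
var inner = new DenseLayer<float>(inputSize: 64, outputSize: 10);

// Softmax is a classic vector activation: each output value depends on
// the whole vector, not just on its own input element.
var timeDistributed = new TimeDistributedLayer<float>(
    innerLayer: inner,
    vectorActivationFunction: new SoftmaxActivation<float>());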

Properties

SupportsGpuExecution

Gets whether this layer has a GPU execution implementation for inference.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

Remarks

Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.

For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.

SupportsJitCompilation

Gets a value indicating whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

true if the inner layer supports JIT compilation; otherwise, false.

Remarks

JIT compilation for TimeDistributed delegates to the inner layer. The time distributed behavior is achieved by reshaping the input so that time steps are treated as batch samples, allowing the inner layer to process all time steps in parallel.
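
The reshaping trick can be sketched with plain arrays: a [batch, time, features] input is viewed as [batch * time, features], the inner operation runs once over that flattened batch, and the result is reshaped back afterwards. The helper below illustrates the idea and is not AiDotNet's actual implementation.

using System;

class ReshapeSketch
{
    // Fold [batch, time, features] into [batch * time, features] so an
    // operation written for 2-D batches covers every time step at once.
    static double[,] FoldTimeIntoBatch(double[,,] x)
    {
        int b = x.GetLength(0), t = x.GetLength(1), f = x.GetLength(2);
        var flat = new double[b * t, f];
        for (int i = 0; i < b; i++)
            for (int j = 0; j < t; j++)
                for (int k = 0; k < f; k++)
                    flat[i * t + j, k] = x[i, j, k];
        return flat;
    }

    static void Main()
    {
        var input = new double[2, 3, 4];      // [batch=2, time=3, features=4]
        var flat = FoldTimeIntoBatch(input);  // [6, 4]: time steps now act as batch samples
        Console.WriteLine($"{flat.GetLength(0)} rows of {flat.GetLength(1)} features");
        // After the inner layer processes the six "samples", the output is
        // reshaped back to [2, 3, outFeatures], restoring the sequence axis.
    }
}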

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

true if the inner layer supports training; otherwise, false.

Remarks

This property indicates whether the time distributed layer can be trained. It simply forwards the value of the inner layer's SupportsTraining property, as the time distributed layer's trainability depends entirely on whether its inner layer can be trained.

For Beginners: This property tells you if the layer can learn from data.

Rather than having its own answer, this layer checks if the inner layer can learn:

  • If the inner layer supports training, this layer also supports training
  • If the inner layer doesn't support training, this layer also doesn't support training

This makes sense because:

  • The time distributed layer doesn't have its own trainable parameters
  • It just organizes how the inner layer is applied to sequences
  • The actual learning happens in the inner layer
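
The delegation described above presumably amounts to a one-line forward to the wrapped layer; GetParameters() and UpdateParameters(T) follow the same pattern. A sketch, assuming a private _innerLayer field:

// Sketch of the wrapper pattern described above; _innerLayer is an
// assumed private field holding the wrapped layer.
public override bool SupportsTraining => _innerLayer.SupportsTraining;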

Methods

Backward(Tensor<T>)

Performs the backward pass of the time distributed layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the time distributed layer, which is used during training to propagate error gradients back through the network. It first propagates the incoming gradient through the activation function, then iterates over each time step, applies the inner layer's backward pass to that time step's gradient, and collects the results into an input gradient sequence.

For Beginners: This method is used during training to calculate how the layer's input should change to reduce errors.

During the backward pass:

  1. The layer receives information about how its output should change (outputGradient)
  2. It first adjusts this gradient based on the activation function
  3. For each step in the sequence:
    • It extracts just that step's gradient
    • It passes that gradient backward through the inner layer
    • It collects the resulting input gradient
  4. All the individual input gradients are combined back into a sequence

This process tells the layer how its inputs should change to reduce errors, while maintaining the same time-step-by-time-step processing as the forward pass.
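
A schematic version of those steps, using plain arrays and stand-in delegates for the activation and inner layer, might look like this (names and shapes are illustrative; the inner layer is assumed to preserve the feature count):

using System;

static class BackwardSketch
{
    public static double[,,] Backward(
        double[,,] outputGradient,
        Func<double[,,], double[,,]> activationBackward,
        Func<double[,], double[,]> innerBackward)
    {
        // Step 2: propagate the gradient through the activation function first.
        var grad = activationBackward(outputGradient);

        int batch = grad.GetLength(0), time = grad.GetLength(1), features = grad.GetLength(2);
        var inputGradient = new double[batch, time, features];

        // Step 3: for each time step, slice out its gradient, run the inner
        // layer's backward pass on it, and collect the result.
        for (int t = 0; t < time; t++)
        {
            var step = new double[batch, features];
            for (int b = 0; b < batch; b++)
                for (int f = 0; f < features; f++)
                    step[b, f] = grad[b, t, f];

            var stepInputGrad = innerBackward(step);

            // Step 4: write the step's input gradient back into the sequence.
            for (int b = 0; b < batch; b++)
                for (int f = 0; f < features; f++)
                    inputGradient[b, t, f] = stepInputGrad[b, f];
        }
        return inputGradient;
    }
}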

Exceptions

InvalidOperationException

Thrown when trying to perform a backward pass before a forward pass.

BackwardGpu(IGpuTensor<T>)

Computes the gradient of the loss with respect to the input on the GPU.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

The gradient of the loss with respect to the layer's output.

Returns

IGpuTensor<T>

The gradient of the loss with respect to the layer's input.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.

Forward(Tensor<T>)

Performs the forward pass of the time distributed layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor after processing each time step.

Remarks

This method implements the forward pass of the time distributed layer. It iterates over each time step in the input sequence, applies the inner layer to that time step, and collects the results into an output sequence. Finally, it applies the activation function to the entire output.

For Beginners: This method processes the input sequence through the layer.

During the forward pass:

  1. The layer receives a sequence of inputs
  2. For each step in the sequence:
    • It extracts just that step's data
    • It passes that data through the inner layer
    • It collects the result
  3. All the individual results are combined back into a sequence
  4. The activation function is applied to the entire output

For example, with a video input, this would:

  • Process each frame individually through the inner layer
  • Maintain the original frame order in the output
  • Apply the activation function to all processed frames
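
A schematic version of the forward pass, mirroring the backward sketch above (stand-in delegates, feature count assumed preserved):

using System;

static class ForwardSketch
{
    public static double[,,] Forward(
        double[,,] input,
        Func<double[,], double[,]> innerForward,
        Func<double, double> activation)
    {
        int batch = input.GetLength(0), time = input.GetLength(1), features = input.GetLength(2);
        var output = new double[batch, time, features];

        // Steps 1-3: slice each time step, pass it through the inner
        // layer, and stack the results back into a sequence.
        for (int t = 0; t < time; t++)
        {
            var step = new double[batch, features];
            for (int b = 0; b < batch; b++)
                for (int f = 0; f < features; f++)
                    step[b, f] = input[b, t, f];

            var processed = innerForward(step);

            for (int b = 0; b < batch; b++)
                for (int f = 0; f < features; f++)
                    output[b, t, f] = processed[b, f];
        }

        // Step 4: apply the activation function to the entire output.
        for (int b = 0; b < batch; b++)
            for (int t = 0; t < time; t++)
                for (int f = 0; f < features; f++)
                    output[b, t, f] = activation(output[b, t, f]);

        return output;
    }
}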

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass of the layer on GPU.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The GPU-resident input tensor(s).

Returns

IGpuTensor<T>

The GPU-resident output tensor.

Remarks

This method performs the layer's forward computation entirely on GPU. The input and output tensors remain in GPU memory, avoiding expensive CPU-GPU transfers.

For Beginners: This is like Forward() but runs on the graphics card.

The key difference:

  • Forward() uses CPU tensors that may be copied to/from GPU
  • ForwardGpu() keeps everything on GPU the whole time

Override this in derived classes that support GPU acceleration.

Exceptions

NotSupportedException

Thrown when the layer does not support GPU execution.

GetParameters()

Gets all trainable parameters of the inner layer.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters from the inner layer.

Remarks

This method retrieves all trainable parameters from the inner layer. The time distributed layer itself doesn't have trainable parameters; it simply delegates to the inner layer.

For Beginners: This method collects all the learnable values from the inner layer.

Since the time distributed layer:

  • Doesn't have its own parameters to learn
  • Simply applies the inner layer multiple times

It returns the inner layer's parameters, which are:

  • The numbers that the neural network learns during training
  • The same across all time steps (parameter sharing)

This parameter sharing is a key feature - it means the layer learns patterns that can be applied to any time step, rather than learning different patterns for different positions in the sequence.
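
One observable consequence of parameter sharing: the wrapper's parameter vector is exactly the inner layer's, so its size does not grow with sequence length. In the sketch below, DenseLayer<float> is a placeholder name.

// Placeholder inner layer; the wrapper contributes no parameters of its own.
var inner = new DenseLayer<float>(inputSize: 8, outputSize: 8);
var wrapped = new TimeDistributedLayer<float>(inner);

// Returns the inner layer's parameter vector: one shared set of weights,
// reused at every time step, regardless of how many steps a sequence has.
Vector<float> parameters = wrapped.GetParameters();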

ResetState()

Resets the internal state of the layer and its inner layer.

public override void ResetState()

Remarks

This method resets the internal state of the time distributed layer and its inner layer. It clears the cached input and output tensors and delegates to the inner layer to reset its state as well.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • The layer forgets what inputs and outputs it recently processed
  • It also tells its inner layer to reset its own state
  • This prepares the layer to process new, unrelated sequences

This is important when:

  • Starting to process a new, unrelated sequence
  • Testing the layer with fresh inputs
  • Beginning a new training episode

Think of it like clearing your mind before starting a completely new task.

UpdateParameters(T)

Updates the parameters of the inner layer.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for parameter updates.

Remarks

This method updates the parameters of the inner layer based on the gradients calculated during the backward pass. The time distributed layer itself doesn't have trainable parameters; it simply delegates the update to the inner layer.

For Beginners: This method updates the inner layer's parameters during training.

The time distributed layer:

  • Doesn't have its own parameters to update
  • Simply passes the learning rate to the inner layer
  • Lets the inner layer adjust its own parameters

This works because the time distributed layer is just a wrapper that changes how the inner layer is applied to sequences, but doesn't change the inner layer's learning process.
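
In a typical training step the calls run Forward, then Backward with the loss gradient, then UpdateParameters; the wrapper relays the update to the inner layer's weights. A sketch (ComputeLossGradient is a hypothetical helper standing in for your loss function):

// Illustrative training step for a TimeDistributedLayer<float> instance.
Tensor<float> output = timeDistributed.Forward(input);
Tensor<float> outputGradient = ComputeLossGradient(output, target); // hypothetical helper
timeDistributed.Backward(outputGradient);
timeDistributed.UpdateParameters(0.01f); // relayed to the inner layer's weights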