Class TimeEmbeddingLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a time embedding layer that encodes timesteps using sinusoidal embeddings for diffusion models.
public class TimeEmbeddingLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
- T
The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → TimeEmbeddingLayer<T>
- Implements
- ILayer<T>
Remarks
The time embedding layer converts scalar timesteps into high-dimensional embeddings using sinusoidal functions, similar to positional encodings in transformers. This embedding is then projected through a small MLP to produce the final time conditioning vector used in diffusion U-Net blocks.
For Beginners: In diffusion models, the network needs to know which timestep it is currently at:
- At early timesteps (t near 0), images are clean and noise is minimal
- At late timesteps (t near T), images are mostly noise
- The network needs this information to know how much denoising to apply
This layer encodes the timestep number into a rich vector representation that:
- Uses sine and cosine functions at different frequencies (sinusoidal encoding)
- Passes through a small neural network (MLP) to learn task-specific representations
- Gets injected into every ResNet block of the U-Net
The sinusoidal encoding is inspired by transformer positional encodings:
- Low frequencies capture coarse time information
- High frequencies capture fine-grained time details
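The sinusoidal encoding described above can be sketched in plain C#. This is a minimal illustration, not the library's implementation: the `SinusoidalSketch` class, the `Encode` helper, the base period of 10000, and the sine-first, cosine-second layout are all assumptions borrowed from common diffusion codebases. The layer itself may scale or order the components differently, for example by using maxTimestep for normalization.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class SinusoidalSketch
{
    // Encodes one timestep as [sin(t * f_0), ..., sin(t * f_{h-1}),
    //                          cos(t * f_0), ..., cos(t * f_{h-1})], where h = embeddingDim / 2.
    // The default maxPeriod of 10000 is an assumed convention, not a documented library value.
    public static double[] Encode(double timestep, int embeddingDim, double maxPeriod = 10000.0)
    {
        if (embeddingDim < 2 || embeddingDim % 2 != 0)
            throw new ArgumentException("embeddingDim must be a positive even number.", nameof(embeddingDim));

        int half = embeddingDim / 2;
        var embedding = new double[embeddingDim];

        for (int i = 0; i < half; i++)
        {
            // Frequencies decay geometrically from 1 (at i = 0) toward 1 / maxPeriod.
            double frequency = Math.Exp(-Math.Log(maxPeriod) * i / half);
            embedding[i] = Math.Sin(timestep * frequency);
            embedding[half + i] = Math.Cos(timestep * frequency);
        }

        return embedding;
    }
}
```

Calling `SinusoidalSketch.Encode(0.0, 8)` returns eight values in which every sine component is 0 and every cosine component is 1. Larger timesteps rotate each sine/cosine pair at its own rate, which is what lets the downstream MLP tell timesteps apart.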
Constructors
TimeEmbeddingLayer(int, int, int)
Initializes a new instance of the TimeEmbeddingLayer<T> class.
public TimeEmbeddingLayer(int embeddingDim, int outputDim, int maxTimestep = 1000)
Parameters
- embeddingDim int
The dimension of the sinusoidal embedding (typically model_dim, i.e. outputDim / 4).
- outputDim int
The dimension of the output embedding (typically model_dim * 4).
- maxTimestep int
Maximum timestep value used for normalization. Default: 1000.
Remarks
For Beginners: Create a time embedding layer with specified dimensions.
Common configurations:
- embeddingDim = 64, outputDim = 256 for small models
- embeddingDim = 128, outputDim = 512 for medium models
- embeddingDim = 320, outputDim = 1280 for Stable Diffusion scale
The layer consists of:
- Sinusoidal encoding: timestep -> [embeddingDim] features
- Linear1 + SiLU: [embeddingDim] -> [outputDim]
- Linear2: [outputDim] -> [outputDim]
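The three stages listed above amount to a two-layer MLP applied to the sinusoidal features. The sketch below shows that data flow with plain arrays. `TimeMlpSketch`, its method names, and the [outputs, inputs] weight layout are hypothetical choices made for illustration; the layer's internal storage and initialization are not documented here.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class TimeMlpSketch
{
    // SiLU (also called Swish): x * sigmoid(x).
    public static double SiLU(double x) => x / (1.0 + Math.Exp(-x));

    // Dense layer y = W x + b, with W stored as [outputs, inputs] (assumed layout).
    public static double[] Linear(double[,] weights, double[] bias, double[] input)
    {
        int outputs = weights.GetLength(0);
        int inputs = weights.GetLength(1);
        if (input.Length != inputs || bias.Length != outputs)
            throw new ArgumentException("Shape mismatch between weights, bias, and input.");

        var result = new double[outputs];
        for (int o = 0; o < outputs; o++)
        {
            double sum = bias[o];
            for (int i = 0; i < inputs; i++)
                sum += weights[o, i] * input[i];
            result[o] = sum;
        }
        return result;
    }

    // [embeddingDim] -> Linear1 + SiLU -> [outputDim] -> Linear2 -> [outputDim]
    public static double[] Project(double[] sinusoidal,
                                   double[,] w1, double[] b1,
                                   double[,] w2, double[] b2)
    {
        double[] hidden = Linear(w1, b1, sinusoidal);
        for (int i = 0; i < hidden.Length; i++)
            hidden[i] = SiLU(hidden[i]);
        return Linear(w2, b2, hidden);
    }
}
```

For embeddingDim = 64 and outputDim = 256, `w1` would be 256 x 64 and `w2` would be 256 x 256 under this layout, so the output always has outputDim entries regardless of the sinusoidal dimension.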
Properties
SupportsGpuExecution
Gets whether this layer has a GPU execution implementation for inference.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.
For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Methods
Backward(Tensor<T>)
Performs the backward pass of the time embedding layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
- outputGradient Tensor<T>
The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input (timesteps).
BackwardGpu(IGpuTensor<T>)
Performs the GPU-resident backward pass of the time embedding layer.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
- outputGradient IGpuTensor<T>
The GPU tensor containing the gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The gradient of the loss with respect to the timestep input (typically zeros since sinusoidal embedding is fixed).
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer as a computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
- inputNodes List<ComputationNode<T>>
List of input nodes (expects one node containing timesteps).
Returns
- ComputationNode<T>
A computation node representing the time embedding output.
Remarks
This method builds a computation graph for the time embedding:
1. Sinusoidal embedding of timesteps
2. First linear layer (matrix multiply + bias)
3. SiLU/Swish activation
4. Second linear layer (matrix multiply + bias)
Forward(Tensor<T>)
Performs the forward pass of the time embedding layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
- input Tensor<T>
Input tensor containing timesteps. Shape: [batch] or [batch, 1].
Returns
- Tensor<T>
Time embedding tensor with shape [batch, outputDim].
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass of the layer on GPU.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
- inputs IGpuTensor<T>[]
The GPU-resident input tensor(s).
Returns
- IGpuTensor<T>
The GPU-resident output tensor.
Remarks
This method performs the layer's forward computation entirely on GPU. The input and output tensors remain in GPU memory, avoiding expensive CPU-GPU transfers.
For Beginners: This is like Forward() but runs on the graphics card.
The key difference:
- Forward() uses CPU tensors that may be copied to/from GPU
- ForwardGpu() keeps everything on GPU the whole time
Override this in derived classes that support GPU acceleration.
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU execution.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
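As a rough sanity check on the length of the returned vector, the trainable parameter count can be estimated from the layer structure. The sketch assumes that both linear layers carry a bias term and that the sinusoidal stage contributes no trainable parameters. `ParameterCountSketch` is a hypothetical helper, and neither the actual length nor the ordering of the vector returned by GetParameters() has been verified against the library.

```csharp
// Hypothetical helper for illustration only; not part of AiDotNet.
public static class ParameterCountSketch
{
    // Counts weights and biases of Linear1 ([embeddingDim] -> [outputDim])
    // and Linear2 ([outputDim] -> [outputDim]), assuming both layers have a bias.
    public static long Count(int embeddingDim, int outputDim)
    {
        long linear1 = (long)embeddingDim * outputDim + outputDim;
        long linear2 = (long)outputDim * outputDim + outputDim;
        return linear1 + linear2;
    }
}
```

Under these assumptions, `ParameterCountSketch.Count(64, 256)` gives 82432 and `ParameterCountSketch.Count(320, 1280)` gives 2050560, so the second linear layer dominates the count as outputDim grows.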
ResetState()
Resets the internal state of the layer.
public override void ResetState()
SetParameters(Vector<T>)
Sets all trainable parameters of the layer from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
- parameters Vector<T>
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
- learningRate T
The learning rate to use for the parameter updates.