Class HighwayLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a Highway Neural Network layer that allows information to flow unchanged through the network.

public class HighwayLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
HighwayLayer<T>

Remarks

A Highway Layer enables networks to effectively train even when they are very deep by introducing "gating units" which learn to selectively pass or transform information. Unlike regular feed-forward layers, highway layers have two "lanes": the transform lane that processes input data and the bypass lane that passes information unchanged. The balance between these two lanes is controlled by a learned gating mechanism.

For Beginners: This layer helps solve a common problem: very deep neural networks are difficult to train.

Think of the Highway Layer like a road with two lanes:

  • The "transform lane" processes the data like a regular neural network layer
  • The "bypass lane" lets information pass through unchanged
  • A "gate" controls how much information flows through each lane

For example, when processing an image, the gate might let basic features like edges pass through directly while sending more complex features through the transform lane for further processing.

This helps the network train more effectively because important information can flow more easily through many layers without being lost or distorted.

Constructors

HighwayLayer(int, IActivationFunction<T>?, IActivationFunction<T>?)

Initializes a new instance of the HighwayLayer<T> class with the specified dimensions and element-wise activation functions.

public HighwayLayer(int inputDimension, IActivationFunction<T>? transformActivation = null, IActivationFunction<T>? gateActivation = null)

Parameters

inputDimension int

The dimension of the input and output vectors.

transformActivation IActivationFunction<T>

The activation function for the transform path. Defaults to tanh if not specified.

gateActivation IActivationFunction<T>

The activation function for the gate values. Defaults to sigmoid if not specified.

Remarks

This constructor creates a new Highway layer with the specified dimension and element-wise activation functions. The weights are initialized randomly with a scale factor. The transform biases are initialized to zero, while the gate biases are initialized to negative values. Because the output is gate * transform_output + (1 - gate) * input, negative gate biases keep the initial gate values below 0.5 with the default sigmoid gate, so more information flows through the bypass path at the start of training.

For Beginners: This creates a new Highway layer with standard activation functions.

When creating a Highway layer, you specify:

  • inputDimension: How many features your data has (same for input and output)
  • transformActivation: How to shape transformed data (default is tanh, values between -1 and 1)
  • gateActivation: How to control the gates (default is sigmoid, values between 0 and 1)

The layer automatically initializes with:

  • Random weights for both transform and gate paths
  • Zero biases for the transform path
  • Negative biases for the gates (initially favoring the bypass path, because gate values start below 0.5)
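The effect of a negative gate bias can be checked with a small, self-contained sketch. It assumes the default sigmoid gate and the mixing rule output = gate * transform_output + (1 - gate) * input; the class name GateBiasSketch and the example bias values are hypothetical and are not taken from the library.

```csharp
using System;

public static class GateBiasSketch
{
    // Logistic sigmoid, the documented default gate activation.
    public static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    // Share of the output taken from the transform path when the gate
    // pre-activation equals the bias alone (the weighted input term is zero).
    public static double InitialTransformShare(double gateBias) => Sigmoid(gateBias);

    // Share of the output taken from the unchanged input (bypass path).
    public static double InitialBypassShare(double gateBias) => 1.0 - Sigmoid(gateBias);
}
```

For any negative bias, InitialTransformShare returns a value below 0.5 and InitialBypassShare a value above 0.5, so the layer starts out close to an identity mapping.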

HighwayLayer(int, IVectorActivationFunction<T>?, IVectorActivationFunction<T>?)

Initializes a new instance of the HighwayLayer<T> class with the specified dimensions and vector activation functions.

public HighwayLayer(int inputDimension, IVectorActivationFunction<T>? transformActivation = null, IVectorActivationFunction<T>? gateActivation = null)

Parameters

inputDimension int

The dimension of the input and output vectors.

transformActivation IVectorActivationFunction<T>

The vector activation function for the transform path. Defaults to tanh if not specified.

gateActivation IVectorActivationFunction<T>

The vector activation function for the gate values. Defaults to sigmoid if not specified.

Remarks

This constructor creates a new Highway layer with the specified dimension and vector activation functions. Vector activation functions operate on entire vectors rather than individual elements, which can capture dependencies between different elements of the vectors.

For Beginners: This creates a new Highway layer with more advanced vector-based activation functions.

Vector activation functions:

  • Process entire groups of numbers together, not just one at a time
  • Can capture relationships between different features
  • May be more powerful for complex patterns

This constructor is useful when you need the layer to understand how different features interact with each other, rather than treating each feature independently.
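The difference between element-wise and vector activations can be illustrated without the library. In this sketch, tanh stands in for an element-wise function and softmax for a vector function; softmax is chosen only as a familiar example and is not claimed to be what this layer uses. The class name ActivationSketch is hypothetical.

```csharp
using System;
using System.Linq;

public static class ActivationSketch
{
    // Element-wise: each output depends only on the matching input element.
    public static double[] TanhElementwise(double[] v) =>
        v.Select(x => Math.Tanh(x)).ToArray();

    // Vector: softmax divides by a sum over the whole vector,
    // so every output depends on every input element.
    public static double[] Softmax(double[] v)
    {
        double max = v.Max(); // subtract the maximum for numerical stability
        double[] exps = v.Select(x => Math.Exp(x - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

Changing one input element changes only one output of TanhElementwise, but it changes every output of Softmax.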

Properties

AuxiliaryLossWeight

Gets or sets the weight for the auxiliary loss contribution.

public T AuxiliaryLossWeight { get; set; }

Property Value

T

ParameterCount

Gets the total number of trainable parameters in this layer.

public override int ParameterCount { get; }

Property Value

int

The sum of elements in all weight and bias tensors (transform weights, transform bias, gate weights, gate bias).

Remarks

This property returns the total count of learnable parameters across all four parameter tensors: transform weights, transform biases, gate weights, and gate biases.

For Beginners: This tells you how many numbers the layer can adjust during training. For a Highway layer with 100 input/output dimensions, you would have:

  • 10,000 transform weights (100 x 100)
  • 100 transform biases
  • 10,000 gate weights (100 x 100)
  • 100 gate biases
  • Total: 20,200 parameters
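The arithmetic above reduces to a single expression. The helper below is a hypothetical illustration, not a library member; it assumes square n x n weight matrices and length-n bias vectors for both paths, as described.

```csharp
public static class ParameterCountSketch
{
    // Two paths (transform and gate), each with an n x n weight matrix
    // and a length-n bias vector.
    public static int Count(int inputDimension) =>
        2 * (inputDimension * inputDimension + inputDimension);
}
```

ParameterCountSketch.Count(100) evaluates to 20,200, matching the breakdown above.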

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsJitCompilation

Gets a value indicating whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

true when weights are initialized and activation functions support JIT.

Remarks

Highway layers support JIT compilation when:

  • Transform and gate weights are initialized
  • The transform activation function (typically Tanh) supports JIT
  • The gate activation function (typically Sigmoid) supports JIT

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

true because this layer has trainable parameters (weights and biases).

Remarks

This property indicates whether the layer can be trained through backpropagation. The HighwayLayer always returns true because it contains trainable weights and biases.

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer can adjust its internal values during training
  • It can improve its performance as it sees more data
  • It participates in the learning process

The Highway layer always supports training because it has weights and biases that can be updated.

UseAuxiliaryLoss

Gets or sets a value indicating whether auxiliary loss is enabled for this layer.

public bool UseAuxiliaryLoss { get; set; }

Property Value

bool

Methods

Backward(Tensor<T>)

Performs the backward pass of the highway layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the highway layer, which is used during training to propagate error gradients back through the network. It calculates the gradients for all the weights and biases, and returns the gradient with respect to the input for further backpropagation.

For Beginners: This method is used during training to calculate how the layer's input and parameters should change to reduce errors.

During the backward pass:

  1. The layer receives information about how its output should change to reduce the overall error
  2. It calculates how the gate values should change to better control the mix of transformed vs. bypassed data
  3. It calculates how the transform parameters should change to better process the input
  4. It determines how the input should change, which will be used by earlier layers

This process involves complex calculations that essentially run the layer's logic in reverse, figuring out how each component contributed to errors in the output.
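As a rough illustration of steps 2 to 4, the sketch below differentiates only the mixing rule output = gate * transform + (1 - gate) * input, element by element. It is a simplified sketch, not the library's implementation: it omits the chain rule through the activations and weight matrices, and all names (HighwayBackwardSketch, MixGradients) are hypothetical.

```csharp
public static class HighwayBackwardSketch
{
    // Gradients of the mixing rule with respect to its three operands,
    // given the upstream gradient of the loss with respect to the output.
    public static (double[] DGate, double[] DTransform, double[] DInputDirect) MixGradients(
        double[] outputGradient, double[] gate, double[] transform, double[] input)
    {
        int n = outputGradient.Length;
        var dGate = new double[n];
        var dTransform = new double[n];
        var dInputDirect = new double[n];

        for (int i = 0; i < n; i++)
        {
            dGate[i] = outputGradient[i] * (transform[i] - input[i]); // d(output)/d(gate)
            dTransform[i] = outputGradient[i] * gate[i];              // d(output)/d(transform)
            dInputDirect[i] = outputGradient[i] * (1.0 - gate[i]);    // bypass path only
        }

        return (dGate, dTransform, dInputDirect);
    }
}
```

The full input gradient would add the contributions that reach the input through the transform and gate weights; DInputDirect covers only the bypass term.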

Exceptions

InvalidOperationException

Thrown when Forward has not been called before Backward.

BackwardGpu(IGpuTensor<T>)

Performs the backward pass on GPU for the highway layer.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

The GPU tensor containing the gradient of the loss with respect to the output.

Returns

IGpuTensor<T>

The GPU tensor containing the gradient of the loss with respect to the input.

ComputeAuxiliaryLoss()

Computes the auxiliary loss for this layer based on gate balance regularization.

public T ComputeAuxiliaryLoss()

Returns

T

The computed auxiliary loss value.

Remarks

This method computes a gate-balance regularization loss that encourages the gates to maintain a balanced value around 0.5, preventing degenerate gating where all gates collapse to 0 or 1. The loss is computed as the squared deviation of the mean gate value from 0.5, averaged across all dimensions and batch samples.

For Beginners: This prevents the highway layer from "cheating" by always using only one lane (transform or bypass). By penalizing gates that drift too far from 0.5, we ensure the network learns to use both lanes effectively, making the highway mechanism meaningful.
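One possible reading of this description is sketched below: take the mean of all gate values over the batch and all dimensions, then square its deviation from 0.5. This is an assumption about the computation, not the library's code, and the names GateBalanceSketch and Loss are hypothetical.

```csharp
using System.Linq;

public static class GateBalanceSketch
{
    // gates[sample][dimension] holds gate activations in the range [0, 1].
    public static double Loss(double[][] gates)
    {
        double mean = gates.SelectMany(row => row).Average();
        double deviation = mean - 0.5;
        return deviation * deviation;
    }
}
```

Under this reading, the loss is zero when the average gate value is exactly 0.5 and reaches 0.25 when every gate is fully open or fully closed.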

ExportComputationGraph(List<ComputationNode<T>>)

Exports the highway layer's forward pass as a JIT-compilable computation graph.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the gated highway output.

Remarks

The highway layer computation graph implements: output = gate * transform(input) + (1 - gate) * input

Where:

  • transform = activation(input @ transformWeights + transformBias)
  • gate = sigmoid(input @ gateWeights + gateBias)

For Beginners: This creates an optimized version of the highway layer. The gate controls how much information flows through the transform path vs. the bypass path.

Forward(Tensor<T>)

Performs the forward pass of the highway layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process. Shape should be [batchSize, inputDimension].

Returns

Tensor<T>

The output tensor with shape [batchSize, inputDimension].

Remarks

This method implements the forward pass of the highway layer according to the formula: output = gate * transform_output + (1 - gate) * input. The gate values control how much of the transformed output versus the original input is used for each feature.

For Beginners: This method processes your data through the highway layer.

During the forward pass:

  1. The transform path processes the input data using weights and activation
  2. The gate controller computes values between 0 and 1 for each feature
  3. The final output mixes the original input and transformed data according to the gate values

For example, if a gate value is 0.7, the output will be 70% from the transform path and 30% directly from the input. This allows the layer to learn which features should be transformed and which should pass through unchanged.
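The three steps above can be sketched for a single sample with plain arrays. This is a minimal illustration that assumes the default tanh and sigmoid activations and weights indexed as [inputIndex, outputIndex]; it does not use the library's Tensor<T> type, and the name HighwayForwardSketch is hypothetical.

```csharp
using System;

public static class HighwayForwardSketch
{
    private static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    public static double[] Forward(
        double[] input,
        double[,] transformWeights, double[] transformBias,
        double[,] gateWeights, double[] gateBias)
    {
        int n = input.Length;
        var output = new double[n];

        for (int j = 0; j < n; j++)
        {
            // Pre-activations: input @ weights + bias for both paths.
            double transformSum = transformBias[j];
            double gateSum = gateBias[j];
            for (int i = 0; i < n; i++)
            {
                transformSum += input[i] * transformWeights[i, j];
                gateSum += input[i] * gateWeights[i, j];
            }

            double transformed = Math.Tanh(transformSum); // step 1: transform path
            double gate = Sigmoid(gateSum);               // step 2: gate in (0, 1)

            // Step 3: mix transformed data and the original input.
            output[j] = gate * transformed + (1.0 - gate) * input[j];
        }

        return output;
    }
}
```

With all weights and biases set to zero, the transform output is 0 and every gate is 0.5, so each output element is half of the corresponding input element.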

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass on GPU using FusedLinearGpu for efficient computation.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The GPU input tensors.

Returns

IGpuTensor<T>

The GPU output tensor.

GetAuxiliaryLossDiagnostics()

Gets diagnostic information about the auxiliary loss computation.

public Dictionary<string, string> GetAuxiliaryLossDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic information about the auxiliary loss.

GetDiagnostics()

Gets diagnostic information about this component's state and behavior. Overrides the base GetDiagnostics() to include auxiliary loss diagnostics.

public override Dictionary<string, string> GetDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This method retrieves all trainable parameters (weights and biases) and combines them into a single vector. The parameters are arranged in the following order: transform weights, transform biases, gate weights, gate biases. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the layer.

The parameters:

  • Are the numbers that the neural network learns during training
  • Include weights and biases from both transform and gate paths
  • Are combined into a single long list (vector)

This is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters
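The documented ordering (transform weights, transform biases, gate weights, gate biases) can be illustrated with plain arrays. The sketch assumes the weight matrices are flattened in row-major order, which the documentation does not state; the names ParameterPackingSketch and Pack are hypothetical.

```csharp
using System.Collections.Generic;

public static class ParameterPackingSketch
{
    public static double[] Pack(
        double[,] transformWeights, double[] transformBias,
        double[,] gateWeights, double[] gateBias)
    {
        var flat = new List<double>();

        // foreach over a rectangular array visits elements in row-major order.
        foreach (double value in transformWeights) flat.Add(value);
        flat.AddRange(transformBias);
        foreach (double value in gateWeights) flat.Add(value);
        flat.AddRange(gateBias);

        return flat.ToArray();
    }
}
```

For an input dimension of n, the resulting array has 2 * (n * n + n) elements.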

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

This method resets the internal state of the layer, clearing cached values from forward and backward passes. This includes the last input, output, transform output, gate output, and all gradients.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • All stored information about previous inputs and outputs is removed
  • All calculated gradients are cleared
  • The layer is ready for new data without being influenced by previous data

This is important for:

  • Processing a new, unrelated batch of data
  • Preventing information from one batch affecting another
  • Starting a new training episode

For example, if you've processed one batch of images and want to start with a new batch, you should reset the state to prevent the new processing from being influenced by the previous batch.

SetParameters(Vector<T>)

Sets the trainable parameters of the layer.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets all the weight matrices and bias vectors of the highway layer from a single vector of parameters. The parameters should be arranged in the following order: transform weights, transform biases, gate weights, gate biases. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the learnable values in the layer.

When setting parameters:

  • The input must be a vector with the correct length
  • The parameters must be in the right order: transform weights, transform biases, gate weights, gate biases
  • This maintains the same structure used by GetParameters

This is useful for:

  • Loading a previously saved model
  • Transferring parameters from another model
  • Testing different parameter values

An error is thrown if the input vector doesn't have the expected number of parameters.
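The length check and the documented ordering can be sketched as follows. The sketch works on plain arrays rather than Vector<T>, assumes row-major weight layout, and uses hypothetical names (ParameterUnpackingSketch, Unpack); only the ordering and the ArgumentException on a length mismatch come from the description above.

```csharp
using System;

public static class ParameterUnpackingSketch
{
    public static (double[,] TransformWeights, double[] TransformBias,
                   double[,] GateWeights, double[] GateBias) Unpack(double[] parameters, int n)
    {
        int expected = 2 * (n * n + n);
        if (parameters.Length != expected)
            throw new ArgumentException(
                $"Expected {expected} parameters but received {parameters.Length}.",
                nameof(parameters));

        // Read the four blocks in the documented order.
        int offset = 0;
        double[,] transformWeights = ReadMatrix(parameters, n, ref offset);
        double[] transformBias = ReadVector(parameters, n, ref offset);
        double[,] gateWeights = ReadMatrix(parameters, n, ref offset);
        double[] gateBias = ReadVector(parameters, n, ref offset);
        return (transformWeights, transformBias, gateWeights, gateBias);
    }

    private static double[,] ReadMatrix(double[] source, int n, ref int offset)
    {
        var matrix = new double[n, n];
        for (int row = 0; row < n; row++)
            for (int col = 0; col < n; col++)
                matrix[row, col] = source[offset++];
        return matrix;
    }

    private static double[] ReadVector(double[] source, int n, ref int offset)
    {
        var vector = new double[n];
        Array.Copy(source, offset, vector, 0, n);
        offset += n;
        return vector;
    }
}
```

Validating the length before reading any values means a malformed vector cannot leave the layer with partially updated parameters.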

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

UpdateParameters(T)

Updates the parameters of the layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates all the weight matrices and bias vectors of the highway layer based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. This is typically called after the backward pass during training.

For Beginners: This method updates the layer's internal values during training.

When updating parameters:

  • All weights and biases are adjusted to reduce prediction errors
  • The learning rate controls how big each update step is
  • Smaller learning rates mean slower but more stable learning
  • Larger learning rates mean faster but potentially unstable learning

This is how the layer "learns" from data over time, gradually improving its ability to decide what information should be transformed and what should pass through unchanged.
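The role of the learning rate can be seen in a plain gradient descent step. Whether UpdateParameters uses exactly this rule is an assumption; the sketch shows only the basic idea, and the names SgdUpdateSketch and Step are hypothetical.

```csharp
public static class SgdUpdateSketch
{
    // Moves each parameter against its gradient, scaled by the learning rate.
    public static void Step(double[] parameters, double[] gradients, double learningRate)
    {
        for (int i = 0; i < parameters.Length; i++)
        {
            parameters[i] -= learningRate * gradients[i];
        }
    }
}
```

In a highway layer, the same kind of step would apply to each of the four parameter groups: transform weights, transform biases, gate weights, and gate biases.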

Exceptions

InvalidOperationException

Thrown when Backward has not been called before UpdateParameters.