
Class ConditionalRandomFieldLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a Conditional Random Field (CRF) layer for sequence labeling tasks.

public class ConditionalRandomFieldLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LayerBase<T>
ConditionalRandomFieldLayer<T>
Implements
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>
IDisposable

Remarks

A Conditional Random Field (CRF) layer is a specialized neural network layer designed for sequence labeling tasks such as named entity recognition, part-of-speech tagging, and activity recognition. Unlike standard classification layers, which predict each element of a sequence independently, a CRF layer models the dependencies between neighboring labels, which leads to more coherent predictions. The layer uses the Viterbi algorithm to find the most likely sequence of labels given the input features and the learned transition scores between labels.

For Beginners: A Conditional Random Field (CRF) layer is used when you need to label each item in a sequence while considering how labels relate to each other.

In many sequence tasks, the label for an item depends not just on the item itself, but also on nearby items:

For example, in a sentence like "John Smith lives in New York":

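  • Without CRF: Each word is labeled independently, which can produce invalid sequences (for example, a "continues a location" label with no location starting before it)
  • With CRF: The model scores the whole label sequence, so it can learn that "New" followed by "York" is likely a single location name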

Think of it like:

  • Standard layers ask, "What's the best label for this word on its own?"
  • CRF layers ask, "What's the best sequence of labels for the whole sentence?"

CRFs are especially useful for tasks like:

  • Named entity recognition (finding names of people, organizations, locations)
  • Part-of-speech tagging (labeling words as nouns, verbs, etc.)
  • Any task where the correct labels form patterns or follow rules
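The idea of scoring a whole label sequence rather than each position on its own can be sketched as follows. This is a minimal illustration, not the AiDotNet implementation: the class name CrfScoring, the use of plain double arrays, and the convention that a transition row is the previous label and a column is the next label are all assumptions made for the example.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class CrfScoring
{
    // Score of one complete label path under a linear-chain CRF:
    // start score + emission score at every position
    // + transition score between neighboring labels + end score.
    public static double ScorePath(
        double[,] emissions,   // [sequenceLength, numClasses]
        double[,] transitions, // [numClasses, numClasses]; assumed row = previous label, column = next label
        double[] start,        // [numClasses]
        double[] end,          // [numClasses]
        int[] labels)          // one label index per position
    {
        int length = labels.Length;
        double score = start[labels[0]] + emissions[0, labels[0]];
        for (int t = 1; t < length; t++)
        {
            score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]];
        }
        return score + end[labels[length - 1]];
    }
}
```

Two candidate label sequences for the same input can be compared by calling ScorePath on each. A sequence with strong per-position scores can still lose to another if it contains a heavily penalized transition.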

Constructors

ConditionalRandomFieldLayer(int, int, IActivationFunction<T>?)

Initializes a new instance of the ConditionalRandomFieldLayer<T> class with a scalar activation function.

public ConditionalRandomFieldLayer(int numClasses, int sequenceLength, IActivationFunction<T>? scalarActivation = null)

Parameters

numClasses int

The number of possible label classes.

sequenceLength int

The length of the input sequences.

scalarActivation IActivationFunction<T>

The scalar activation function to apply to inputs. Defaults to identity if not specified.

Remarks

This constructor creates a new CRF layer with the specified number of classes and sequence length. It initializes the transition matrix, start scores, and end scores with small random values. The scalar activation function is applied to the input features before the CRF processing.

For Beginners: This constructor creates a new CRF layer with a standard activation function.

When creating a CRF layer, you need to specify:

  • How many different labels (classes) there are (e.g., 9 parts of speech)
  • How long each input sequence is (e.g., maximum sentence length)
  • Optionally, an activation function to transform the input features

The layer creates and initializes:

  • A transition matrix that learns how likely one label is to follow another
  • Start scores that learn which labels commonly appear at the beginning
  • End scores that learn which labels commonly appear at the end

These values start as small random numbers and are refined during training.
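The three parameter groups described above can be pictured with the following sketch. It is an assumption-laden illustration rather than the library's code: the class name CrfParameters, the uniform distribution, the default scale of 0.1, and the seed parameter are all hypothetical, since the documentation only states that the values start as small random numbers.

```csharp
using System;

// Hypothetical container for illustration only; not part of AiDotNet.
public sealed class CrfParameters
{
    public double[,] Transitions { get; } // [numClasses, numClasses]
    public double[] StartScores { get; }  // [numClasses]
    public double[] EndScores { get; }    // [numClasses]

    public CrfParameters(int numClasses, int seed = 0, double scale = 0.1)
    {
        if (numClasses <= 0) throw new ArgumentOutOfRangeException(nameof(numClasses));

        var rng = new Random(seed);
        Transitions = new double[numClasses, numClasses];
        StartScores = new double[numClasses];
        EndScores = new double[numClasses];

        for (int i = 0; i < numClasses; i++)
        {
            StartScores[i] = Small(rng, scale);
            EndScores[i] = Small(rng, scale);
            for (int j = 0; j < numClasses; j++)
            {
                Transitions[i, j] = Small(rng, scale);
            }
        }
    }

    // Uniform value in [-scale, scale); the choice of distribution is an assumption.
    private static double Small(Random rng, double scale) => (rng.NextDouble() * 2.0 - 1.0) * scale;

    // Total number of trainable values: numClasses * numClasses + 2 * numClasses.
    public int Count => Transitions.Length + StartScores.Length + EndScores.Length;
}
```

Under these assumptions, a layer with 9 classes would hold 9 × 9 + 2 × 9 = 99 trainable values.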

ConditionalRandomFieldLayer(int, int, IVectorActivationFunction<T>?)

Initializes a new instance of the ConditionalRandomFieldLayer<T> class with a vector activation function.

public ConditionalRandomFieldLayer(int numClasses, int sequenceLength, IVectorActivationFunction<T>? vectorActivation = null)

Parameters

numClasses int

The number of possible label classes.

sequenceLength int

The length of the input sequences.

vectorActivation IVectorActivationFunction<T>

The vector activation function to apply to inputs. Defaults to identity if not specified.

Remarks

This constructor creates a new CRF layer with the specified number of classes and sequence length. It initializes the transition matrix, start scores, and end scores with small random values. This overload accepts a vector activation function, which operates on entire vectors rather than individual elements when transforming the input features.

For Beginners: This constructor creates a new CRF layer with a vector-based activation function.

A vector activation function:

  • Operates on entire groups of numbers at once, rather than one at a time
  • Can capture relationships between different elements in the input
  • Defaults to the Identity function, which doesn't change the values

This constructor works the same way as the other one, but it's useful when you need more complex activation patterns that consider the relationships between different inputs.
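Both constructors take the same first two parameters and an optional third parameter that defaults to null. Under standard C# overload resolution, a call that passes only numClasses and sequenceLength matches both overloads equally well and is reported as ambiguous (compiler error CS0121). The sketch below shows two ways to pick an overload explicitly. It uses stand-in types (CrfLayerStandIn and empty interface declarations) that mirror only the documented signatures, so that it compiles without the library. The stand-ins and the Overload property are hypothetical.

```csharp
#nullable enable
using System;

// Stand-in types that mirror only the constructor signatures documented above.
// They are hypothetical placeholders, not the AiDotNet implementations.
public interface IActivationFunction<T> { }
public interface IVectorActivationFunction<T> { }

public class CrfLayerStandIn<T>
{
    public string Overload { get; }

    public CrfLayerStandIn(int numClasses, int sequenceLength,
        IActivationFunction<T>? scalarActivation = null)
    {
        Overload = "scalar";
    }

    public CrfLayerStandIn(int numClasses, int sequenceLength,
        IVectorActivationFunction<T>? vectorActivation = null)
    {
        Overload = "vector";
    }
}

public static class OverloadDemo
{
    public static string PickScalar()
    {
        // new CrfLayerStandIn<double>(9, 50) would not compile here:
        // both overloads apply once the default is substituted (CS0121).
        // A named argument selects the overload that declares that parameter name.
        var layer = new CrfLayerStandIn<double>(9, 50, scalarActivation: null);
        return layer.Overload;
    }

    public static string PickVector()
    {
        // A cast on the null literal selects the overload by parameter type.
        var layer = new CrfLayerStandIn<double>(9, 50, (IVectorActivationFunction<double>?)null);
        return layer.Overload;
    }
}
```

Passing an actual activation instance as the third argument also resolves the call, because its static type matches only one of the two parameter types.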

Properties

SupportsGpuExecution

Gets whether this layer has a GPU execution implementation for inference.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

Remarks

Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.

For Beginners: This flag indicates whether the layer has GPU code for the forward pass. Derived classes that implement ForwardGpu should override it to return true.

SupportsJitCompilation

Gets a value indicating whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

Always true. CRF uses the forward algorithm for differentiable training.

Remarks

JIT compilation for CRF uses the forward algorithm to compute the log partition function, which is differentiable with respect to emissions and transitions. This enables gradient-based optimization of CRF parameters. For inference, Viterbi decoding is used at runtime, but the JIT-compiled graph supports training.
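The forward algorithm mentioned above can be sketched on plain arrays as follows. This is an illustrative version under stated assumptions, not the computation graph the library exports: the class name CrfForwardAlgorithm, the array layout, and the row-is-previous-label convention for transitions are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class CrfForwardAlgorithm
{
    // Numerically stable log(sum(exp(values))).
    private static double LogSumExp(double[] values)
    {
        double max = double.NegativeInfinity;
        foreach (double v in values) max = Math.Max(max, v);
        if (double.IsNegativeInfinity(max)) return max;

        double sum = 0.0;
        foreach (double v in values) sum += Math.Exp(v - max);
        return max + Math.Log(sum);
    }

    // log Z: the log of the sum of exp(path score) over every possible label path.
    public static double LogPartition(
        double[,] emissions,   // [sequenceLength, numClasses]
        double[,] transitions, // [numClasses, numClasses]; assumed row = previous label
        double[] start,
        double[] end)
    {
        int length = emissions.GetLength(0);
        int numClasses = emissions.GetLength(1);

        // alpha[j] = log-sum of scores of all partial paths ending in label j.
        var alpha = new double[numClasses];
        for (int j = 0; j < numClasses; j++) alpha[j] = start[j] + emissions[0, j];

        var scratch = new double[numClasses];
        for (int t = 1; t < length; t++)
        {
            var next = new double[numClasses];
            for (int j = 0; j < numClasses; j++)
            {
                for (int i = 0; i < numClasses; i++) scratch[i] = alpha[i] + transitions[i, j];
                next[j] = LogSumExp(scratch) + emissions[t, j];
            }
            alpha = next;
        }

        for (int j = 0; j < numClasses; j++) scratch[j] = alpha[j] + end[j];
        return LogSumExp(scratch);
    }
}
```

For a given gold label path, the negative log-likelihood is this value minus the score of that path. Every step is built from additions, exponentials, and logarithms, which is why the result is differentiable with respect to both emissions and transitions.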

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Always true as CRF layers have trainable parameters.

Remarks

This property returns true because ConditionalRandomFieldLayer has trainable parameters (transition matrix, start scores, and end scores) that are adjusted during the training process to minimize the network's error.

For Beginners: This property tells you that this layer can learn from data.

A value of true means:

  • The layer contains values (parameters) that will change during training
  • It will improve its performance as it sees more examples
  • It participates in the learning process of the neural network

CRF layers always support training because they need to learn:

  • How likely one label is to follow another (transition scores)
  • Which labels are likely to appear at the start of a sequence
  • Which labels are likely to appear at the end of a sequence

Methods

Backward(Tensor<T>)

Performs the backward pass of the CRF layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the CRF layer, which is used during training to propagate error gradients back through the network. It computes the gradients of the loss with respect to the layer's parameters (transition matrix, start scores, and end scores) and the layer's input.

For Beginners: This method is used during training to calculate how the layer's inputs and parameters should change to reduce errors.

During the backward pass:

  1. The layer receives error gradients from the next layer
  2. It calculates how much each parameter contributed to the error:
    • How transition scores between labels should change
    • How start and end scores should change
  3. It calculates how the input features contributed to the error
  4. If an activation function was used, its derivative is applied

This lets the network learn:

  • Which label is likely to follow another
  • Which labels commonly appear at the start or end of sequences
  • How input features relate to labels

This is part of the "backpropagation" algorithm that helps neural networks learn from their mistakes and improve over time.

Exceptions

InvalidOperationException

Thrown when Backward is called before Forward.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code, which can make inference roughly 5-10x faster; the actual gain depends on the model and hardware.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Return true from SupportsJitCompilation
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if SupportsJitCompilation returns false.

Forward(Tensor<T>)

Performs the forward pass of the CRF layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor containing sequence features.

Returns

Tensor<T>

The output tensor containing the most likely sequence labels.

Remarks

This method implements the forward pass of the CRF layer using the Viterbi algorithm to find the most likely sequence of labels. It first applies any activation function to transform the input features, then uses dynamic programming to find the optimal label sequence considering the transition scores between labels, start scores, and end scores. The output is a one-hot encoded tensor representing the best label at each position in the sequence.

For Beginners: This method finds the best sequence of labels for the input data.

The forward pass has several steps:

  1. Transform the input features using the activation function (if specified)
  2. For each sequence in the batch, run the Viterbi algorithm:
    • Start with the initial scores and input features
    • For each position in the sequence, calculate the best previous label
    • Keep track of the best path using "backpointers"
    • Find the best final label considering the end scores
    • Trace backwards to find the optimal sequence of labels
  3. Convert the best label sequence to a one-hot encoded output

The Viterbi algorithm is like finding the best-scoring path through a grid: each step adds the current position's score to the transition score from the previous position, and only the best way of reaching each cell is kept.

This approach ensures that the entire sequence of labels makes sense together, rather than just picking the best label at each position independently.
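The Viterbi steps listed above can be sketched for a single sequence as follows. This is a minimal version on plain arrays, not the layer's actual code: the class name CrfViterbi, the array layout, and the row-is-previous-label convention for transitions are assumptions, and batching and the activation step are left out.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class CrfViterbi
{
    // Returns the highest-scoring label sequence for one input sequence.
    public static int[] Decode(
        double[,] emissions,   // [sequenceLength, numClasses]
        double[,] transitions, // [numClasses, numClasses]; assumed row = previous label
        double[] start,
        double[] end)
    {
        int length = emissions.GetLength(0);
        int numClasses = emissions.GetLength(1);
        var score = new double[numClasses];
        var backpointers = new int[length, numClasses];

        // Start with the initial scores and the first position's features.
        for (int j = 0; j < numClasses; j++) score[j] = start[j] + emissions[0, j];

        // For each later position, find the best previous label for every current label.
        for (int t = 1; t < length; t++)
        {
            var next = new double[numClasses];
            for (int j = 0; j < numClasses; j++)
            {
                int best = 0;
                for (int i = 1; i < numClasses; i++)
                {
                    if (score[i] + transitions[i, j] > score[best] + transitions[best, j]) best = i;
                }
                backpointers[t, j] = best;
                next[j] = score[best] + transitions[best, j] + emissions[t, j];
            }
            score = next;
        }

        // Pick the best final label, taking the end scores into account.
        int last = 0;
        for (int j = 1; j < numClasses; j++)
        {
            if (score[j] + end[j] > score[last] + end[last]) last = j;
        }

        // Trace the backpointers from the end to recover the full path.
        var path = new int[length];
        path[length - 1] = last;
        for (int t = length - 1; t > 0; t--) path[t - 1] = backpointers[t, path[t]];
        return path;
    }

    // One-hot encoding: row t has a 1 in the column of the chosen label.
    public static double[,] OneHot(int[] path, int numClasses)
    {
        var output = new double[path.Length, numClasses];
        for (int t = 0; t < path.Length; t++) output[t, path[t]] = 1.0;
        return output;
    }
}
```

With emissions {{1.0, 0.9}, {0.0, 0.1}}, transitions {{0, -5}, {0, 2}}, and zero start and end scores, picking the best label at each position independently gives labels 0 then 1, but that path pays the -5 transition. Decode returns labels 1 then 1, whose total score of 3.0 is the highest of the four possible paths.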

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass of the layer on GPU.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The GPU-resident input tensor(s).

Returns

IGpuTensor<T>

The GPU-resident output tensor.

Remarks

This method performs the layer's forward computation entirely on GPU. The input and output tensors remain in GPU memory, avoiding expensive CPU-GPU transfers.

For Beginners: This is like Forward() but runs on the graphics card.

The key difference:

  • Forward() uses CPU tensors that may be copied to/from GPU
  • ForwardGpu() keeps everything on GPU the whole time

Override this in derived classes that support GPU acceleration.

Exceptions

NotSupportedException

Thrown when the layer does not support GPU execution.

GetParameters()

Gets all trainable parameters from the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This method retrieves all trainable parameters from the layer (transition matrix, start scores, and end scores) and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the layer into a single list.

The parameters include:

  • The transition matrix (shows how likely one label is to follow another)
  • The start scores (shows which labels are likely at sequence beginnings)
  • The end scores (shows which labels are likely at sequence endings)

All these values are flattened into a single long list (vector).

This is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters
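Flattening the three parameter groups into one vector can be sketched as follows. The order (transition matrix, then start scores, then end scores) follows the order described for SetParameters on this page. The row-major layout of the transition matrix, the class name CrfParameterPacking, and the use of a plain double array in place of Vector<T> are assumptions made for the example.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class CrfParameterPacking
{
    // Concatenates transitions (assumed row-major), start scores, and end scores.
    public static double[] Flatten(double[,] transitions, double[] start, double[] end)
    {
        int n = start.Length;
        var flat = new double[n * n + 2 * n];

        int k = 0;
        for (int i = 0; i < n; i++)
        {
            for (int j = 0; j < n; j++)
            {
                flat[k++] = transitions[i, j];
            }
        }

        Array.Copy(start, 0, flat, k, n);
        Array.Copy(end, 0, flat, k + n, n);
        return flat;
    }
}
```

For numClasses labels the resulting vector has numClasses × numClasses + 2 × numClasses entries under this layout.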

ResetState()

Resets the internal state of the CRF layer.

public override void ResetState()

Remarks

This method resets the internal state of the CRF layer, including the cached inputs, outputs, and parameter gradients. This is useful when starting to process a new sequence or batch of data.

For Beginners: This method clears the layer's temporary memory to start fresh.

When resetting the state:

  • Stored inputs and outputs are cleared
  • Calculated gradients are cleared
  • The layer forgets any information from previous batches

This is important for:

  • Processing a new, unrelated batch of data
  • Preventing information from one batch affecting another
  • Freeing up memory that's no longer needed

Note that this doesn't reset the learned parameters (transition matrix, start scores, end scores), just the temporary information used during a single forward/backward pass.
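The distinction between temporary state and learned parameters can be illustrated with a toy layer. The sketch below is not a CRF and not AiDotNet code. CachingLayerSketch, its single Weight, and the HasCachedState property are hypothetical and exist only to show the lifecycle: Forward caches its input, Backward needs that cache, and ResetState clears the caches while leaving the learned value alone.

```csharp
#nullable enable
using System;

// Toy one-weight layer used only to illustrate the caching lifecycle; it is not a CRF.
public sealed class CachingLayerSketch
{
    private double[]? _lastInput;    // set by Forward, needed by Backward
    private double? _weightGradient; // set by Backward

    // Learned value; ResetState leaves it untouched.
    public double Weight { get; private set; } = 2.0;

    public bool HasCachedState => _lastInput != null || _weightGradient != null;

    public double[] Forward(double[] input)
    {
        _lastInput = (double[])input.Clone();
        var output = new double[input.Length];
        for (int i = 0; i < input.Length; i++) output[i] = Weight * input[i];
        return output;
    }

    public double[] Backward(double[] outputGradient)
    {
        if (_lastInput == null)
        {
            throw new InvalidOperationException("Forward must be called before Backward.");
        }

        double weightGradient = 0.0;
        var inputGradient = new double[outputGradient.Length];
        for (int i = 0; i < outputGradient.Length; i++)
        {
            weightGradient += outputGradient[i] * _lastInput[i];
            inputGradient[i] = outputGradient[i] * Weight;
        }
        _weightGradient = weightGradient;
        return inputGradient;
    }

    // Clears cached inputs and gradients; the learned Weight is kept.
    public void ResetState()
    {
        _lastInput = null;
        _weightGradient = null;
    }
}
```

In this sketch, calling Backward after ResetState throws, because the cached input is gone. That matches the documented behavior of Backward being invalid before Forward.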

SetParameters(Vector<T>)

Sets the trainable parameters for the layer.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets the trainable parameters for the layer (transition matrix, start scores, and end scores) from a single vector. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the learnable values in the layer.

When setting parameters:

  • The input must be a vector with the correct length
  • The first part is used for the transition matrix
  • The next part is used for the start scores
  • The final part is used for the end scores

This is useful for:

  • Loading a previously saved model
  • Transferring parameters from another model
  • Testing different parameter values

An ArgumentException is thrown if the input vector doesn't have the expected number of parameters.
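Splitting a flat vector back into the three groups, with the length check described above, can be sketched as follows. The order of the parts follows the list above. The row-major layout of the transition matrix, the class name CrfParameterUnpacking, the exception message, and the use of a plain double array in place of Vector<T> are assumptions.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class CrfParameterUnpacking
{
    // Splits a flat vector into (transitions, start scores, end scores), in that order.
    public static (double[,] Transitions, double[] Start, double[] End) Unflatten(
        double[] parameters, int numClasses)
    {
        int expected = numClasses * numClasses + 2 * numClasses;
        if (parameters.Length != expected)
        {
            throw new ArgumentException(
                $"Expected {expected} parameters but received {parameters.Length}.",
                nameof(parameters));
        }

        // First part: the transition matrix (assumed row-major).
        var transitions = new double[numClasses, numClasses];
        int k = 0;
        for (int i = 0; i < numClasses; i++)
        {
            for (int j = 0; j < numClasses; j++)
            {
                transitions[i, j] = parameters[k++];
            }
        }

        // Next part: start scores. Final part: end scores.
        var start = new double[numClasses];
        var end = new double[numClasses];
        Array.Copy(parameters, k, start, 0, numClasses);
        Array.Copy(parameters, k + numClasses, end, 0, numClasses);
        return (transitions, start, end);
    }
}
```

With numClasses = 2 the vector must have exactly 8 entries; any other length fails the check before any value is copied.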

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

UpdateParameters(T)

Updates the layer's parameters using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the layer's parameters (transition matrix, start scores, and end scores) based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. The update is performed by subtracting the scaled gradients from the current parameters.

For Beginners: This method updates the layer's internal values during training.

After calculating the gradients in the backward pass:

  • This method applies those changes to the transition matrix, start scores, and end scores
  • The learning rate controls how big each update step is
  • Smaller learning rates mean slower but more stable learning
  • Larger learning rates mean faster but potentially unstable learning

The formula is simple: new_value = old_value - (gradient * learning_rate)

This is how the layer "learns" from data over time, gradually improving its ability to predict the correct sequence of labels.
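The update rule stated above can be sketched for one flat group of values. In the layer it would apply in the same way to the transition matrix, start scores, and end scores. The class name CrfSgd and the use of plain double arrays are assumptions for the example.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class CrfSgd
{
    // In-place gradient descent step: value = value - gradient * learningRate.
    public static void Step(double[] values, double[] gradients, double learningRate)
    {
        if (values.Length != gradients.Length)
        {
            throw new ArgumentException("Values and gradients must have the same length.");
        }

        for (int i = 0; i < values.Length; i++)
        {
            values[i] -= gradients[i] * learningRate;
        }
    }
}
```

For example, a value of 1.0 with gradient 0.5 and learning rate 0.1 moves to 0.95, and a value of -2.0 with gradient -1.0 moves to -1.9.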

Exceptions

InvalidOperationException

Thrown when UpdateParameters is called before Backward.