Class ConditionalRandomFieldLayer<T>
- Namespace: AiDotNet.NeuralNetworks.Layers
- Assembly: AiDotNet.dll
Represents a Conditional Random Field (CRF) layer for sequence labeling tasks.
public class ConditionalRandomFieldLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance: LayerBase<T> → ConditionalRandomFieldLayer<T>
- Implements: ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
A Conditional Random Field (CRF) layer is a specialized neural network layer designed for sequence labeling tasks such as named entity recognition, part-of-speech tagging, and activity recognition. Unlike standard classification layers that make independent predictions for each element in a sequence, CRF layers model the dependencies between labels in a sequence, leading to more coherent predictions. The layer uses the Viterbi algorithm to find the most likely sequence of labels given the input features and learned transition probabilities between labels.
For Beginners: A Conditional Random Field (CRF) layer is used when you need to label each item in a sequence while considering how labels relate to each other.
In many sequence tasks, the label for an item depends not just on the item itself, but also on nearby items. For example, in the sentence "John Smith lives in New York":
- Without CRF: Each word might be labeled independently, potentially creating invalid sequences
- With CRF: The model considers that "New" followed by "York" is likely a location name
Think of it like:
- Standard layers ask, "What's the best label for this word on its own?"
- CRF layers ask, "What's the best sequence of labels for the whole sentence?"
CRFs are especially useful for tasks like:
- Named entity recognition (finding names of people, organizations, locations)
- Part-of-speech tagging (labeling words as nouns, verbs, etc.)
- Any task where the correct labels form patterns or follow rules
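To make the transition idea concrete, here is an illustrative (entirely made-up) transition-score table for three BIO-style tags; a CRF layer learns a table like this during training:

```csharp
// Illustrative only: hand-written transition scores for three BIO-style tags.
// A real CRF layer learns these values; here, the large negative score makes
// the invalid transition O -> I-LOC unlikely, which is how "New" followed by
// "York" ends up labeled as a single coherent location.
string[] tags = { "O", "B-LOC", "I-LOC" };
double[,] transitionScores =
{
    //   O    B-LOC  I-LOC       (next tag)
    {  1.2,   0.8,  -3.0 },  // previous tag: O     (I-LOC cannot follow O)
    {  0.5,  -0.4,   1.5 },  // previous tag: B-LOC ("York" continues "New")
    {  0.7,  -0.2,   0.9 },  // previous tag: I-LOC
};
```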
Constructors
ConditionalRandomFieldLayer(int, int, IActivationFunction<T>?)
Initializes a new instance of the ConditionalRandomFieldLayer<T> class with a scalar activation function.
public ConditionalRandomFieldLayer(int numClasses, int sequenceLength, IActivationFunction<T>? scalarActivation = null)
Parameters
numClasses (int): The number of possible label classes.
sequenceLength (int): The length of the input sequences.
scalarActivation (IActivationFunction<T>?): The scalar activation function to apply to inputs. Defaults to identity if not specified.
Remarks
This constructor creates a new CRF layer with the specified number of classes and sequence length. It initializes the transition matrix, start scores, and end scores with appropriate random values. The scalar activation function is applied to the input features before the CRF processing.
For Beginners: This constructor creates a new CRF layer with a standard activation function.
When creating a CRF layer, you need to specify:
- How many different labels (classes) there are (e.g., 9 parts of speech)
- How long each input sequence is (e.g., maximum sentence length)
- Optionally, an activation function to transform the input features
The layer creates and initializes:
- A transition matrix that learns how likely one label is to follow another
- Start scores that learn which labels commonly appear at the beginning
- End scores that learn which labels commonly appear at the end
These values start as small random numbers and are refined during training.
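For example, a minimal construction sketch using only the signature above (the label count and sequence length are illustrative):

```csharp
using AiDotNet.NeuralNetworks.Layers;

// Hedged sketch: a CRF layer for 9 label classes over sequences of 32 tokens.
// With no activation supplied, the identity default described above is used.
var crf = new ConditionalRandomFieldLayer<double>(numClasses: 9, sequenceLength: 32);
```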
ConditionalRandomFieldLayer(int, int, IVectorActivationFunction<T>?)
Initializes a new instance of the ConditionalRandomFieldLayer<T> class with a vector activation function.
public ConditionalRandomFieldLayer(int numClasses, int sequenceLength, IVectorActivationFunction<T>? vectorActivation = null)
Parameters
numClasses (int): The number of possible label classes.
sequenceLength (int): The length of the input sequences.
vectorActivation (IVectorActivationFunction<T>?): The vector activation function to apply to inputs. Defaults to identity if not specified.
Remarks
This constructor creates a new CRF layer with the specified number of classes and sequence length. It initializes the transition matrix, start scores, and end scores with appropriate random values. This overload accepts a vector activation function, which operates on entire vectors rather than individual elements when transforming the input features.
For Beginners: This constructor creates a new CRF layer with a vector-based activation function.
A vector activation function:
- Operates on entire groups of numbers at once, rather than one at a time
- Can capture relationships between different elements in the input
- Defaults to the Identity function, which doesn't change the values
This constructor works the same way as the other one, but it's useful when you need more complex activation patterns that consider the relationships between different inputs.
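A hedged sketch of this overload; passing null (the default) falls back to the identity function, and a concrete IVectorActivationFunction<T> implementation would be supplied in real code:

```csharp
using AiDotNet.NeuralNetworks.Layers;

// Hedged sketch: the vector-activation overload with the identity default.
var crf = new ConditionalRandomFieldLayer<double>(
    numClasses: 9, sequenceLength: 32, vectorActivation: null);
```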
Properties
SupportsGpuExecution
Gets whether this layer has a GPU execution implementation for inference.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.
For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
Always true. CRF uses the forward algorithm for differentiable training.
Remarks
JIT compilation for CRF uses the forward algorithm to compute the log partition function, which is differentiable with respect to emissions and transitions. This enables gradient-based optimization of CRF parameters. For inference, Viterbi decoding is used at runtime, but the JIT-compiled graph supports training.
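For reference, the standard linear-chain CRF forward recursion (notation assumed here, not taken from the library: $e_t(j)$ is the emission score of label $j$ at position $t$, $A_{ij}$ the transition score from label $i$ to label $j$, and $s_j$, $f_j$ the start and end scores) computes the log partition function as:

$$
\alpha_1(j) = s_j + e_1(j), \qquad
\alpha_t(j) = e_t(j) + \log\sum_i \exp\bigl(\alpha_{t-1}(i) + A_{ij}\bigr),
$$

$$
\log Z = \log\sum_j \exp\bigl(\alpha_T(j) + f_j\bigr)
$$

Because $\log Z$ is built entirely from additions and log-sum-exp, it is smooth in the emissions and transitions, which is what makes the JIT-compiled training graph differentiable.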
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Always true, as CRF layers have trainable parameters.
Remarks
This property returns true because ConditionalRandomFieldLayer has trainable parameters (transition matrix, start scores, and end scores) that are adjusted during the training process to minimize the network's error.
For Beginners: This property tells you that this layer can learn from data.
A value of true means:
- The layer contains values (parameters) that will change during training
- It will improve its performance as it sees more examples
- It participates in the learning process of the neural network
CRF layers always support training because they need to learn:
- How likely one label is to follow another (transition probabilities)
- Which labels are likely to appear at the start of a sequence
- Which labels are likely to appear at the end of a sequence
Methods
Backward(Tensor<T>)
Performs the backward pass of the CRF layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the CRF layer, which is used during training to propagate error gradients back through the network. It computes the gradients of the loss with respect to the layer's parameters (transition matrix, start scores, and end scores) and the layer's input.
For Beginners: This method is used during training to calculate how the layer's inputs and parameters should change to reduce errors.
During the backward pass:
- The layer receives error gradients from the next layer
- It calculates how much each parameter contributed to the error:
- How transition scores between labels should change
- How start and end scores should change
- It calculates how the input features contributed to the error
- If an activation function was used, its derivative is applied
This lets the network learn:
- Which label is likely to follow another
- Which labels commonly appear at the start or end of sequences
- How input features relate to labels
This is part of the "backpropagation" algorithm that helps neural networks learn from their mistakes and improve over time.
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
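A hedged calling sketch using only the signature above (how the returned node is compiled and executed depends on AiDotNet's JIT engine, which is not documented on this page):

```csharp
using System.Collections.Generic;

// Assumes `crf` was constructed as in the earlier sketch.
var inputNodes = new List<ComputationNode<double>>();
ComputationNode<double> graphRoot = crf.ExportComputationGraph(inputNodes);
// The call populates inputNodes; graphRoot is the output node of the graph.
```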
Forward(Tensor<T>)
Performs the forward pass of the CRF layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor containing sequence features.
Returns
- Tensor<T>
The output tensor containing the most likely sequence labels.
Remarks
This method implements the forward pass of the CRF layer using the Viterbi algorithm to find the most likely sequence of labels. It first applies any activation function to transform the input features, then uses dynamic programming to find the optimal label sequence considering the transition scores between labels, start scores, and end scores. The output is a one-hot encoded tensor representing the best label at each position in the sequence.
For Beginners: This method finds the best sequence of labels for the input data.
The forward pass has several steps:
- Transform the input features using the activation function (if specified)
- For each sequence in the batch, run the Viterbi algorithm:
  - Start with the initial scores and input features
  - For each position in the sequence, calculate the best previous label
  - Keep track of the best path using "backpointers"
  - Find the best final label considering the end scores
  - Trace backwards to find the optimal sequence of labels
- Convert the best label sequence to a one-hot encoded output
The Viterbi algorithm is like finding the shortest path through a grid, where each step considers both the current position's score and the transition cost from the previous position.
This approach ensures that the entire sequence of labels makes sense together, rather than just picking the best label at each position independently.
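To make the algorithm concrete, here is a minimal, self-contained Viterbi decoder over plain arrays. It mirrors the steps above but is an illustrative sketch, not the layer's internal implementation; all names and shapes are assumptions:

```csharp
// Viterbi over plain arrays: emissions[t, j] is the score of label j at
// position t; transitions[i, j] is the score of label i followed by label j.
static int[] ViterbiDecode(
    double[,] emissions, double[,] transitions,
    double[] startScores, double[] endScores)
{
    int T = emissions.GetLength(0), K = emissions.GetLength(1);
    var score = new double[T, K];
    var backpointer = new int[T, K];

    // Initialization: start score plus the first position's emission.
    for (int j = 0; j < K; j++)
        score[0, j] = startScores[j] + emissions[0, j];

    // Recursion: for each label, pick the best previous label.
    for (int t = 1; t < T; t++)
        for (int j = 0; j < K; j++)
        {
            int bestPrev = 0;
            double best = score[t - 1, 0] + transitions[0, j];
            for (int i = 1; i < K; i++)
            {
                double cand = score[t - 1, i] + transitions[i, j];
                if (cand > best) { best = cand; bestPrev = i; }
            }
            score[t, j] = best + emissions[t, j];
            backpointer[t, j] = bestPrev;
        }

    // Termination: fold in the end scores, then trace the backpointers.
    int bestLast = 0;
    double bestFinal = score[T - 1, 0] + endScores[0];
    for (int j = 1; j < K; j++)
        if (score[T - 1, j] + endScores[j] > bestFinal)
        {
            bestFinal = score[T - 1, j] + endScores[j];
            bestLast = j;
        }

    var labels = new int[T];
    labels[T - 1] = bestLast;
    for (int t = T - 1; t > 0; t--)
        labels[t - 1] = backpointer[t, labels[t]];
    return labels;
}
```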
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass of the layer on GPU.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensor(s).
Returns
- IGpuTensor<T>
The GPU-resident output tensor.
Remarks
This method performs the layer's forward computation entirely on GPU. The input and output tensors remain in GPU memory, avoiding expensive CPU-GPU transfers.
For Beginners: This is like Forward() but runs on the graphics card.
The key difference:
- Forward() uses CPU tensors that may be copied to/from GPU
- ForwardGpu() keeps everything on GPU the whole time
Override this in derived classes that support GPU acceleration.
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU execution.
GetParameters()
Gets all trainable parameters from the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters from the layer (transition matrix, start scores, and end scores) and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer into a single list.
The parameters include:
- The transition matrix (shows how likely one label is to follow another)
- The start scores (shows which labels are likely at sequence beginnings)
- The end scores (shows which labels are likely at sequence endings)
All these values are flattened into a single long list (vector).
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
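A hedged round-trip sketch using the documented signatures; persisting the vector between the two calls is where saving and loading would happen:

```csharp
// Flatten all trainable values, then restore the exact same state.
Vector<double> parameters = crf.GetParameters();
crf.SetParameters(parameters);
```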
ResetState()
Resets the internal state of the CRF layer.
public override void ResetState()
Remarks
This method resets the internal state of the CRF layer, including the cached inputs, outputs, and parameter gradients. This is useful when starting to process a new sequence or batch of data.
For Beginners: This method clears the layer's temporary memory to start fresh.
When resetting the state:
- Stored inputs and outputs are cleared
- Calculated gradients are cleared
- The layer forgets any information from previous batches
This is important for:
- Processing a new, unrelated batch of data
- Preventing information from one batch affecting another
- Freeing up memory that's no longer needed
Note that this doesn't reset the learned parameters (transition matrix, start scores, end scores), just the temporary information used during a single forward/backward pass.
SetParameters(Vector<T>)
Sets the trainable parameters for the layer.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets the trainable parameters for the layer (transition matrix, start scores, and end scores) from a single vector. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The first part is used for the transition matrix
- The next part is used for the start scores
- The final part is used for the end scores
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Testing different parameter values
An error is thrown if the input vector doesn't have the expected number of parameters.
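As a hedged sanity check on the expected length: the layout described above implies a numClasses × numClasses transition matrix followed by numClasses start scores and numClasses end scores, so the total should be K*K + 2*K for K classes. This size is inferred from the documented layout, not a stated API guarantee:

```csharp
// Hypothetical length check before calling SetParameters,
// assuming the layout described above.
int k = 9;                           // numClasses
int expectedLength = k * k + 2 * k;  // 99 for k = 9
```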
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the layer's parameters using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method updates the layer's parameters (transition matrix, start scores, and end scores) based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. The update is performed by subtracting the scaled gradients from the current parameters.
For Beginners: This method updates the layer's internal values during training.
After calculating the gradients in the backward pass:
- This method applies those changes to the transition matrix, start scores, and end scores
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
The formula is simple: new_value = old_value - (gradient * learning_rate)
This is how the layer "learns" from data over time, gradually improving its ability to predict the correct sequence of labels.
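Putting the training methods together, here is a hedged single training step built from the documented signatures (how the output gradient is obtained depends on the loss function and is assumed here):

```csharp
// Assumes `crf`, `input`, and `outputGradient` already exist.
Tensor<double> prediction = crf.Forward(input);
Tensor<double> inputGradient = crf.Backward(outputGradient);
crf.UpdateParameters(0.01); // learning rate as T = double
```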
Exceptions
- InvalidOperationException
Thrown when update is called before backward.