Class CTCLoss<T>
Namespace: AiDotNet.LossFunctions
Assembly: AiDotNet.dll
Implements the Connectionist Temporal Classification (CTC) loss function for sequence-to-sequence learning.
public class CTCLoss<T> : ISequenceLossFunction<T>
Type Parameters
T: The numeric type used for calculations (e.g., float, double).
Inheritance: CTCLoss<T>
Implements: ISequenceLossFunction<T>
Remarks
For Beginners: Connectionist Temporal Classification (CTC) is a loss function designed for sequence-to-sequence learning problems where the alignment between input and output sequences is unknown.
For example, in speech recognition, we have:
- Input: An audio waveform (long sequence of sound samples)
- Output: Text transcript (shorter sequence of characters)
The key challenge is that we don't know exactly which parts of the audio correspond to each character. CTC solves this by considering all possible alignments between the input and output sequences.
CTC introduces a special "blank" token to handle:
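- Repeated characters: consecutive identical predictions are merged, so "hheellloo" on its own collapses to "helo"; a blank between the two l's keeps both, giving "hello"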
- Silence or transitions between sounds
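The role of the blank token can be sketched with the standard CTC collapse rule: merge consecutive duplicate symbols, then drop blanks. The helper below is hypothetical and not part of AiDotNet. It assumes class index 0 is the blank and uses a made-up vocabulary (1 = h, 2 = e, 3 = l, 4 = o) purely for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CtcCollapseSketch
{
    // Hypothetical helper (not an AiDotNet API): maps a frame-level path
    // to the label sequence it represents under the CTC collapse rule.
    public static int[] Collapse(int[] path, int blankIndex = 0)
    {
        var labels = new List<int>();
        int previous = -1; // sentinel: no frame seen yet (class indices are >= 0)

        foreach (int symbol in path)
        {
            // Keep a symbol only if it starts a new run and is not the blank.
            if (symbol != previous && symbol != blankIndex)
            {
                labels.Add(symbol);
            }
            previous = symbol;
        }

        return labels.ToArray();
    }
}
```

With this vocabulary, the path h h e e l l l o o (1, 1, 2, 2, 3, 3, 3, 4, 4) collapses to h e l o, while h e l blank l o (1, 2, 3, 0, 3, 4) collapses to h e l l o, because the blank separates the two runs of l.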
This loss function is commonly used in:
- Speech recognition
- Handwriting recognition
- Any task where input and output sequences have different lengths and unknown alignment
Constructors
CTCLoss(int, bool)
Initializes a new instance of the CTCLoss class.
public CTCLoss(int blankIndex = 0, bool inputsAreLogProbs = true)
Parameters
blankIndex (int): The index of the blank symbol in the vocabulary. Default is 0.
inputsAreLogProbs (bool): Whether inputs are already in log space. Default is true.
Exceptions
ArgumentNullException: Thrown when numericOperations is null.
ArgumentOutOfRangeException: Thrown when blankIndex is negative.
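The inputsAreLogProbs flag states whether the inputs are already in log space. This page does not describe what the class does when the flag is false, so the sketch below only shows one common way to produce log-space inputs yourself: a numerically stable log-softmax over the raw scores of a single time step. LogSoftmaxSketch and its method are hypothetical names, not AiDotNet APIs.

```csharp
using System;

public static class LogSoftmaxSketch
{
    // Hypothetical helper (not an AiDotNet API): converts one frame of raw
    // scores (logits) into normalized log probabilities.
    public static double[] LogSoftmax(double[] logits)
    {
        // Subtract the maximum before exponentiating to avoid overflow.
        double max = double.NegativeInfinity;
        foreach (double v in logits)
        {
            max = Math.Max(max, v);
        }

        double sumExp = 0.0;
        foreach (double v in logits)
        {
            sumExp += Math.Exp(v - max);
        }
        double logSumExp = max + Math.Log(sumExp);

        var result = new double[logits.Length];
        for (int i = 0; i < logits.Length; i++)
        {
            result[i] = logits[i] - logSumExp;
        }
        return result;
    }
}
```

Every returned value is at most 0, and exponentiating the values gives probabilities that sum to 1 for that frame.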
Methods
CalculateGradient(Tensor<T>, int[][], int[], int[])
Calculates the gradient of the CTC loss with respect to the inputs.
public Tensor<T> CalculateGradient(Tensor<T> logProbs, int[][] targets, int[] inputLengths, int[] targetLengths)
Parameters
logProbs (Tensor<T>): Log-probability tensor of shape [batch, time, classes].
targets (int[][]): Target label sequences for each batch item.
inputLengths (int[]): Actual lengths of each input sequence.
targetLengths (int[]): Actual lengths of each target sequence.
Returns
Tensor<T>: The gradient tensor, with the same shape as logProbs.
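The three array arguments have to agree with each other before a loss or gradient is meaningful. In CTC, a target can only be emitted if the input has at least as many frames as the target has labels, plus one extra frame for every pair of adjacent identical labels (a blank must sit between them). The check below is a minimal sketch of that rule. CtcBatchCheckSketch is a hypothetical helper, and this page does not say whether or how CTCLoss<T> validates its inputs.

```csharp
using System;

public static class CtcBatchCheckSketch
{
    // Minimum number of input frames needed to emit the first
    // `targetLength` labels of `target` under CTC.
    public static int MinInputLength(int[] target, int targetLength)
    {
        int repeats = 0;
        for (int i = 1; i < targetLength; i++)
        {
            // Adjacent identical labels need a blank frame between them.
            if (target[i] == target[i - 1])
            {
                repeats++;
            }
        }
        return targetLength + repeats;
    }

    // Returns true when every batch item has consistent lengths and
    // enough input frames for its target.
    public static bool IsFeasible(int[][] targets, int[] inputLengths, int[] targetLengths)
    {
        if (targets.Length != inputLengths.Length || targets.Length != targetLengths.Length)
        {
            return false;
        }

        for (int b = 0; b < targets.Length; b++)
        {
            // Assumption: targets may be padded, so only the first
            // targetLengths[b] entries of targets[b] are real labels.
            if (targetLengths[b] < 0 || targetLengths[b] > targets[b].Length)
            {
                return false;
            }
            if (inputLengths[b] < MinInputLength(targets[b], targetLengths[b]))
            {
                return false;
            }
        }
        return true;
    }
}
```

For example, the target 1, 2, 3, 3, 4 has five labels and one adjacent repeat, so it needs at least six input frames.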
CalculateLoss(Tensor<T>, int[][], int[], int[])
Calculates the CTC loss for a batch of sequences.
public T CalculateLoss(Tensor<T> logProbs, int[][] targets, int[] inputLengths, int[] targetLengths)
Parameters
logProbs (Tensor<T>): Log-probability tensor of shape [batch, time, classes].
targets (int[][]): Target label sequences for each batch item.
inputLengths (int[]): Actual lengths of each input sequence.
targetLengths (int[]): Actual lengths of each target sequence.
Returns
T: The average CTC loss value across the batch.
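To make the quantity being averaged concrete, here is a minimal sketch of the textbook CTC forward (alpha) recursion in log space. It sums the probability of every alignment that collapses to the target and returns the negative log of that sum. It is not the AiDotNet implementation. It uses plain double[][] arrays in place of Tensor<T>, the names CtcForwardSketch, NegLogLikelihood, and BatchMean are hypothetical, and it assumes each inputLength is at least 1. Whether the library applies any further normalization (for example by target length) is not stated on this page.

```csharp
using System;

public static class CtcForwardSketch
{
    // Stable log(exp(a) + exp(b)); treats negative infinity as log(0).
    private static double LogAdd(double a, double b)
    {
        if (double.IsNegativeInfinity(a)) return b;
        if (double.IsNegativeInfinity(b)) return a;
        double max = Math.Max(a, b);
        return max + Math.Log(Math.Exp(a - max) + Math.Exp(b - max));
    }

    // Negative log-likelihood of one target, given logProbs[t][c]
    // for t < inputLength (inputLength >= 1 is assumed).
    public static double NegLogLikelihood(double[][] logProbs, int inputLength,
        int[] target, int targetLength, int blankIndex = 0)
    {
        // Extended sequence: blank, y1, blank, y2, ..., yL, blank.
        int size = 2 * targetLength + 1;
        var ext = new int[size];
        for (int s = 0; s < size; s++)
        {
            ext[s] = (s % 2 == 0) ? blankIndex : target[s / 2];
        }

        var alpha = new double[size];
        for (int s = 0; s < size; s++) alpha[s] = double.NegativeInfinity;
        alpha[0] = logProbs[0][ext[0]];
        if (size > 1) alpha[1] = logProbs[0][ext[1]];

        for (int t = 1; t < inputLength; t++)
        {
            var next = new double[size];
            for (int s = 0; s < size; s++)
            {
                double sum = alpha[s];                        // stay on the same symbol
                if (s >= 1) sum = LogAdd(sum, alpha[s - 1]);  // advance by one
                // Skipping a blank is allowed only between two different labels.
                if (s >= 2 && ext[s] != blankIndex && ext[s] != ext[s - 2])
                {
                    sum = LogAdd(sum, alpha[s - 2]);
                }
                next[s] = sum + logProbs[t][ext[s]];
            }
            alpha = next;
        }

        // Valid alignments end on the last label or the trailing blank.
        double total = alpha[size - 1];
        if (size > 1) total = LogAdd(total, alpha[size - 2]);
        return -total;
    }

    // Unweighted mean of the per-sequence losses over the batch.
    public static double BatchMean(double[][][] logProbs, int[][] targets,
        int[] inputLengths, int[] targetLengths, int blankIndex = 0)
    {
        double sum = 0.0;
        for (int b = 0; b < targets.Length; b++)
        {
            sum += NegLogLikelihood(logProbs[b], inputLengths[b],
                targets[b], targetLengths[b], blankIndex);
        }
        return sum / targets.Length;
    }
}
```

As a sanity check, take two frames with uniform probability 0.5 over two classes (blank and one label) and the single-label target 1. Three of the four possible paths collapse to that target (label-label, blank-label, label-blank), so the sequence probability is 0.75 and the loss is -ln(0.75).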