
Class CTCLoss<T>

Namespace
AiDotNet.LossFunctions
Assembly
AiDotNet.dll

Implements the Connectionist Temporal Classification (CTC) loss function for sequence-to-sequence learning.

public class CTCLoss<T> : ISequenceLossFunction<T>

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

Inheritance
object
CTCLoss<T>
Implements
ISequenceLossFunction<T>

Remarks

For Beginners: Connectionist Temporal Classification (CTC) is a loss function designed for sequence-to-sequence learning problems where the alignment between input and output sequences is unknown.

For example, in speech recognition, we have:

  • Input: An audio waveform (long sequence of sound samples)
  • Output: Text transcript (shorter sequence of characters)

The key challenge is that we don't know exactly which parts of the audio correspond to each character. CTC solves this by considering all possible alignments between the input and output sequences.
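The phrase "all possible alignments" is commonly written as the sum below. The notation is introduced here for illustration and does not appear elsewhere on this page: x is an input of T frames, y is the target sequence, π is a frame-level path over the labels plus the blank token described in the next paragraph, B is the map that merges consecutive duplicates and then removes blanks, and p_t(k) is the predicted probability of class k at frame t.

```latex
p(y \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p_t(\pi_t),
\qquad
\mathcal{L}_{\mathrm{CTC}} = -\log p(y \mid x)
```

Minimizing the loss therefore raises the total probability of every path that decodes to the target, without committing to any single alignment.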

CTC introduces a special "blank" token to handle:

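  • Repeated characters (e.g., with "-" as the blank, the path "hheel-llo" decodes to "hello", while "hheellloo" decodes to "helo" because consecutive duplicates are merged)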
  • Silence or transitions between sounds
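The role of the blank token can be made concrete with a small decoding sketch. The class and method names below (CtcCollapseSketch, Collapse) are hypothetical and not part of AiDotNet, and the use of '-' as the blank character is an assumption made for readability.

```csharp
using System.Text;

// Illustrative helper only; not an AiDotNet type.
public static class CtcCollapseSketch
{
    // Collapses a frame-level path into an output string:
    // consecutive duplicates are merged first, then blanks are dropped.
    public static string Collapse(string path, char blank = '-')
    {
        var result = new StringBuilder();
        bool hasPrevious = false;
        char previous = '\0';

        foreach (char c in path)
        {
            bool repeated = hasPrevious && c == previous;

            // Emit a character only when it starts a new run and is not the blank.
            if (!repeated && c != blank)
            {
                result.Append(c);
            }

            previous = c;
            hasPrevious = true;
        }

        return result.ToString();
    }
}
```

Under these assumptions, Collapse("hheel-llo") yields "hello", whereas Collapse("hheellloo") yields "helo": without a blank between them, the two l's are treated as one run.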

This loss function is commonly used in:

  • Speech recognition
  • Handwriting recognition
  • Any task where input and output sequences have different lengths and unknown alignment

Constructors

CTCLoss(int, bool)

Initializes a new instance of the CTCLoss class.

public CTCLoss(int blankIndex = 0, bool inputsAreLogProbs = true)

Parameters

blankIndex int

The index of the blank symbol in the vocabulary. Default is 0.

inputsAreLogProbs bool

Whether the inputs are already log probabilities. Default is true.
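As an illustration of what "already in log space" can mean, the sketch below turns one frame of raw scores into log probabilities with a numerically stable log-softmax. It assumes that log space refers to the natural logarithm of per-frame class probabilities; this page does not state how the loss treats its inputs when inputsAreLogProbs is false. LogSpaceSketch and LogSoftmax are hypothetical names, not AiDotNet APIs.

```csharp
using System;

// Illustrative helper only; not an AiDotNet type.
public static class LogSpaceSketch
{
    // Converts raw scores for one time step into log probabilities.
    // Subtracting the maximum first keeps Math.Exp from overflowing.
    public static double[] LogSoftmax(double[] scores)
    {
        double max = double.NegativeInfinity;
        foreach (double s in scores)
        {
            max = Math.Max(max, s);
        }

        double sum = 0.0;
        foreach (double s in scores)
        {
            sum += Math.Exp(s - max);
        }

        double logSum = max + Math.Log(sum);

        var result = new double[scores.Length];
        for (int i = 0; i < scores.Length; i++)
        {
            result[i] = scores[i] - logSum;
        }

        return result;
    }
}
```

Exponentiating the returned values gives probabilities that sum to 1 for the frame.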

Exceptions

ArgumentNullException

Thrown when numericOperations is null.

ArgumentOutOfRangeException

Thrown when blankIndex is negative.

Methods

CalculateGradient(Tensor<T>, int[][], int[], int[])

Calculates the gradient of the CTC loss with respect to the inputs.

public Tensor<T> CalculateGradient(Tensor<T> logProbs, int[][] targets, int[] inputLengths, int[] targetLengths)

Parameters

logProbs Tensor<T>

Log-probability tensor of shape [batch, time, classes].

targets int[][]

Target label sequences for each batch item.

inputLengths int[]

Actual lengths of each input sequence.

targetLengths int[]

Actual lengths of each target sequence.

Returns

Tensor<T>

The gradient tensor, with the same shape as logProbs.
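The targets, inputLengths, and targetLengths arguments are shared by CalculateGradient and CalculateLoss. The sketch below shows one way to prepare them and to check that each pair is feasible for CTC: a target needs at least one frame per label, plus one blank frame between every pair of identical neighbouring labels. CtcBatchSketch and its methods are hypothetical helpers, and the sketch assumes unpadded target arrays; whether AiDotNet performs a similar validation is not stated on this page.

```csharp
using System;

// Illustrative helper only; not an AiDotNet type.
public static class CtcBatchSketch
{
    // Minimum number of input frames that can emit the target:
    // one frame per label plus one blank between identical neighbours.
    public static int MinInputLength(int[] target, int targetLength)
    {
        int repeats = 0;
        for (int i = 1; i < targetLength; i++)
        {
            if (target[i] == target[i - 1])
            {
                repeats++;
            }
        }

        return targetLength + repeats;
    }

    // Derives targetLengths from the jagged targets array (assumed unpadded)
    // and throws if any input sequence is too short to align with its target.
    public static int[] BuildTargetLengths(int[][] targets, int[] inputLengths)
    {
        if (targets.Length != inputLengths.Length)
        {
            throw new ArgumentException("One input length is required per target sequence.");
        }

        var targetLengths = new int[targets.Length];
        for (int b = 0; b < targets.Length; b++)
        {
            targetLengths[b] = targets[b].Length;

            int needed = MinInputLength(targets[b], targetLengths[b]);
            if (inputLengths[b] < needed)
            {
                throw new ArgumentException($"Batch item {b} needs at least {needed} frames.");
            }
        }

        return targetLengths;
    }
}
```

For example, the target { 1, 2, 2, 3 } needs at least five frames, because a blank must separate the two consecutive 2s.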

CalculateLoss(Tensor<T>, int[][], int[], int[])

Calculates the CTC loss for a batch of sequences.

public T CalculateLoss(Tensor<T> logProbs, int[][] targets, int[] inputLengths, int[] targetLengths)

Parameters

logProbs Tensor<T>

Log-probability tensor of shape [batch, time, classes].

targets int[][]

Target label sequences for each batch item.

inputLengths int[]

Actual lengths of each input sequence.

targetLengths int[]

Actual lengths of each target sequence.

Returns

T

The average CTC loss value across the batch.
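For readers who want to see what a batch-averaged CTC loss involves, the following is a minimal sketch of the standard forward (alpha) recursion in log space. It uses plain jagged arrays in place of Tensor<T>, and CtcForwardSketch, SequenceLoss, and BatchAverageLoss are hypothetical names. It is not taken from the AiDotNet source, which may differ in details such as normalization or how infeasible sequences are handled; the plain mean over batch items is an assumption based on the description above.

```csharp
using System;

// Illustrative reference computation only; not an AiDotNet type.
public static class CtcForwardSketch
{
    // log(exp(a) + exp(b)), where negative infinity represents probability zero.
    private static double LogAddExp(double a, double b)
    {
        if (double.IsNegativeInfinity(a)) return b;
        if (double.IsNegativeInfinity(b)) return a;
        double max = Math.Max(a, b);
        return max + Math.Log(Math.Exp(a - max) + Math.Exp(b - max));
    }

    // Negative log-likelihood of one target, given log probabilities indexed [time][class].
    public static double SequenceLoss(
        double[][] logProbs, int[] target, int inputLength, int targetLength, int blankIndex = 0)
    {
        if (inputLength < 1) throw new ArgumentException("At least one frame is required.");

        // Extended target: blank, y1, blank, y2, ..., blank.
        int size = 2 * targetLength + 1;
        var ext = new int[size];
        for (int s = 0; s < size; s++)
        {
            ext[s] = (s % 2 == 0) ? blankIndex : target[s / 2];
        }

        // A path may start with the leading blank or with the first label.
        var alpha = new double[size];
        for (int s = 0; s < size; s++) alpha[s] = double.NegativeInfinity;
        alpha[0] = logProbs[0][blankIndex];
        if (size > 1) alpha[1] = logProbs[0][ext[1]];

        for (int t = 1; t < inputLength; t++)
        {
            var next = new double[size];
            for (int s = 0; s < size; s++)
            {
                double total = alpha[s];                          // stay on the same symbol
                if (s >= 1) total = LogAddExp(total, alpha[s - 1]); // advance by one position

                // Skipping a blank is allowed only between two different labels.
                if (s >= 2 && ext[s] != blankIndex && ext[s] != ext[s - 2])
                {
                    total = LogAddExp(total, alpha[s - 2]);
                }

                next[s] = total + logProbs[t][ext[s]];
            }
            alpha = next;
        }

        // A valid path ends on the last label or on the trailing blank.
        double logLikelihood = alpha[size - 1];
        if (size > 1) logLikelihood = LogAddExp(logLikelihood, alpha[size - 2]);
        return -logLikelihood;
    }

    // Mean of the per-sequence losses; logProbs is indexed [batch][time][class].
    public static double BatchAverageLoss(
        double[][][] logProbs, int[][] targets, int[] inputLengths, int[] targetLengths, int blankIndex = 0)
    {
        double sum = 0.0;
        for (int b = 0; b < targets.Length; b++)
        {
            sum += SequenceLoss(logProbs[b], targets[b], inputLengths[b], targetLengths[b], blankIndex);
        }
        return sum / targets.Length;
    }
}
```

As a sanity check, take two frames, two classes (blank at index 0 and one label at index 1), uniform probabilities of 0.5, and the target { 1 }. Three paths decode to the target (label-label, blank-label, label-blank), each with probability 0.25, so the loss is -ln(0.75), roughly 0.288.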