Interface ITextRecognizer<T>

Namespace: AiDotNet.Document.Interfaces

Assembly: AiDotNet.dll

Interface for text recognition models that read text from cropped image regions.

public interface ITextRecognizer<T> : IDocumentModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T: The numeric type used for calculations.

Inherited Members

IDocumentModel<T>.ExpectedImageSize

IDocumentModel<T>.RequiresOCR

IDocumentModel<T>.SupportedDocumentTypes

IDocumentModel<T>.IsOnnxMode

IDocumentModel<T>.EncodeDocument(Tensor<T>)

IDocumentModel<T>.ValidateInputShape(Tensor<T>)

IDocumentModel<T>.GetModelSummary()

IFullModel<T, Tensor<T>, Tensor<T>>.DefaultLossFunction

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Train(Tensor<T>, Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Predict(Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.GetModelMetadata()

IModelSerializer.Serialize()

IModelSerializer.Deserialize(byte[])

IModelSerializer.SaveModel(string)

IModelSerializer.LoadModel(string)

ICheckpointableModel.SaveState(Stream)

ICheckpointableModel.LoadState(Stream)

IParameterizable<T, Tensor<T>, Tensor<T>>.GetParameters()

IParameterizable<T, Tensor<T>, Tensor<T>>.SetParameters(Vector<T>)

IParameterizable<T, Tensor<T>, Tensor<T>>.ParameterCount

IParameterizable<T, Tensor<T>, Tensor<T>>.WithParameters(Vector<T>)

IFeatureAware.GetActiveFeatureIndices()

IFeatureAware.SetActiveFeatureIndices(IEnumerable<int>)

IFeatureAware.IsFeatureUsed(int)

IFeatureImportance<T>.GetFeatureImportance()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.DeepCopy()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.Clone()

IGradientComputable<T, Tensor<T>, Tensor<T>>.ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>)

IGradientComputable<T, Tensor<T>, Tensor<T>>.ApplyGradients(Vector<T>, T)

IJitCompilable<T>.ExportComputationGraph(List<ComputationNode<T>>)

IJitCompilable<T>.SupportsJitCompilation

Extension Methods

DistributedExtensions.AsDistributedForHighBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributedForLowBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Remarks

Text recognition models convert cropped images of text into character sequences. They work on pre-detected text regions (from a text detector).

For Beginners: Text recognition is the second step in reading text from images. Given a small image containing only text (like a single word or line), the recognizer outputs the actual characters. This is like reading what's written in a highlighted region.

Example usage:

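// `architecture` and `croppedTextImage` (a Tensor<float>) are assumed to have been created earlier.
var recognizer = new TrOCR<float>(architecture);
// Run recognition on one pre-cropped text region.
var result = recognizer.RecognizeText(croppedTextImage);
Console.WriteLine($"Recognized: {result.Text} (confidence: {result.Confidence})");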

Properties

MaxSequenceLength

Gets the maximum sequence length this recognizer can output.

int MaxSequenceLength { get; }

Property Value

int

SupportedCharacters

Gets the supported character set (alphabet) for this recognizer.

string SupportedCharacters { get; }

Property Value

string
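One way these two properties might be used together is to check, before training or evaluation, whether a ground-truth label can be produced by the recognizer at all. The sketch below is illustrative only: the helper name CanRepresent is hypothetical, the values would be read from SupportedCharacters and MaxSequenceLength, and it assumes that MaxSequenceLength counts characters, which the interface does not state (some models count tokens instead).

```csharp
using System;
using System.Linq;

public static class LabelCheckSketch
{
    // Hypothetical helper, not part of AiDotNet.
    // supportedCharacters: value read from ITextRecognizer<T>.SupportedCharacters.
    // maxSequenceLength:   value read from ITextRecognizer<T>.MaxSequenceLength,
    //                      assumed here to be measured in characters.
    public static bool CanRepresent(string label, string supportedCharacters, int maxSequenceLength)
    {
        // A label longer than the maximum output length can never be emitted in full.
        if (label.Length > maxSequenceLength)
        {
            return false;
        }

        // Every character of the label must appear in the recognizer's alphabet.
        return label.All(ch => supportedCharacters.IndexOf(ch) >= 0);
    }
}
```

Labels that fail this check could be filtered out or normalized before they reach the model.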

SupportsAttentionVisualization

Gets whether this recognizer supports attention visualization.

bool SupportsAttentionVisualization { get; }

Property Value

bool

Methods

GetAttentionWeights()

Gets the attention weights for visualization (if supported).

Tensor<T>? GetAttentionWeights()

Returns

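Tensor<T>?: Attention tensor showing which image regions influenced each character, or null when no attention weights are available (for example, when SupportsAttentionVisualization is false).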

GetCharacterProbabilities()

Gets the character-level probabilities for the last recognition.

Tensor<T> GetCharacterProbabilities()

Returns

Tensor<T>: Tensor of shape [sequence_length, vocab_size] with probabilities.
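To illustrate how a [sequence_length, vocab_size] probability table relates to recognized text, the sketch below picks the most likely character at each step (greedy decoding). It is a simplified illustration under stated assumptions: a plain float[,] stands in for Tensor<T>, column i is assumed to correspond to the i-th character of SupportedCharacters, and special symbols such as blank, padding, or end-of-sequence tokens are ignored. A concrete recognizer may decode differently.

```csharp
using System;
using System.Text;

public static class GreedyDecodeSketch
{
    // probabilities: [sequence_length, vocab_size]; a plain array standing in for Tensor<T>.
    // alphabet: assumed to map column i to alphabet[i]; the interface does not confirm this layout.
    public static string Decode(float[,] probabilities, string alphabet)
    {
        int steps = probabilities.GetLength(0);
        int vocab = probabilities.GetLength(1);
        if (vocab != alphabet.Length)
        {
            throw new ArgumentException("vocab_size must match the alphabet length.");
        }

        var text = new StringBuilder(steps);
        for (int t = 0; t < steps; t++)
        {
            // Find the column with the highest probability at this step.
            int best = 0;
            for (int c = 1; c < vocab; c++)
            {
                if (probabilities[t, c] > probabilities[t, best])
                {
                    best = c;
                }
            }
            text.Append(alphabet[best]);
        }
        return text.ToString();
    }
}
```

For example, with the alphabet "ab" and rows {0.1, 0.9} and {0.8, 0.2}, the first step selects 'b' and the second selects 'a'.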

RecognizeText(Tensor<T>)

Recognizes text from a cropped image region.

TextRecognitionResult<T> RecognizeText(Tensor<T> croppedImage)

Parameters

croppedImage Tensor<T>: Cropped image containing text (from text detector).

Returns

TextRecognitionResult<T>: Recognition result with text and confidence.

RecognizeTextBatch(IEnumerable<Tensor<T>>)

Recognizes text from multiple cropped image regions (batch processing).

IEnumerable<TextRecognitionResult<T>> RecognizeTextBatch(IEnumerable<Tensor<T>> croppedImages)

Parameters

croppedImages IEnumerable<Tensor<T>>: Sequence of cropped images containing text.

Returns

IEnumerable<TextRecognitionResult<T>>: Sequence of recognition results.
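Because the return type is IEnumerable, the results may be produced lazily, so it can be worth materializing them once before reuse. The sketch below shows one way to pair each crop with its result. It is generic so that it stays self-contained: the recognizeBatch delegate stands in for RecognizeTextBatch, the helper name Pair is hypothetical, and it assumes one result per input in input order, which the interface does not explicitly guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchPairingSketch
{
    // Hypothetical helper, not part of AiDotNet.
    // recognizeBatch stands in for ITextRecognizer<T>.RecognizeTextBatch.
    public static List<(TImage Crop, TResult Result)> Pair<TImage, TResult>(
        IReadOnlyList<TImage> crops,
        Func<IEnumerable<TImage>, IEnumerable<TResult>> recognizeBatch)
    {
        // Materialize once: enumerating a lazy sequence twice could rerun inference.
        List<TResult> results = recognizeBatch(crops).ToList();

        // Assumption: one result per crop, returned in input order.
        if (results.Count != crops.Count)
        {
            throw new InvalidOperationException("Expected one result per crop.");
        }

        return crops.Zip(results, (crop, result) => (Crop: crop, Result: result)).ToList();
    }
}
```

With a real recognizer, the call might look like BatchPairingSketch.Pair(crops, recognizer.RecognizeTextBatch), where crops is a list of Tensor<T> regions.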
