Interface ITextRecognizer<T>

Namespace
AiDotNet.Document.Interfaces
Assembly
AiDotNet.dll

Interface for text recognition models that read text from cropped image regions.

public interface ITextRecognizer<T> : IDocumentModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Remarks

Text recognition models convert cropped images of text into character sequences. They operate on text regions that a text detector has already located.

For Beginners: Text recognition is the second step in reading text from images. Given a small image containing only text (like a single word or line), the recognizer outputs the actual characters. This is like reading what's written in a highlighted region.

Example usage:

var recognizer = new TrOCR<float>(architecture);
var result = recognizer.RecognizeText(croppedTextImage);
Console.WriteLine($"Recognized: {result.Text} (confidence: {result.Confidence})");

Properties

MaxSequenceLength

Gets the maximum sequence length this recognizer can output.

int MaxSequenceLength { get; }

Property Value

int

SupportedCharacters

Gets the supported character set (alphabet) for this recognizer.

string SupportedCharacters { get; }

Property Value

string
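As an illustration of how the alphabet might be used, the sketch below checks whether an expected label contains characters the recognizer cannot emit. The `AlphabetCheck` class and `FindUnsupported` method are hypothetical helpers, not part of AiDotNet. The sketch assumes each `char` in `SupportedCharacters` is one supported symbol.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AlphabetCheck
{
    // Hypothetical helper: returns the distinct characters of `expected`
    // that do not appear in `alphabet`. `alphabet` stands in for the value
    // of ITextRecognizer<T>.SupportedCharacters.
    public static IReadOnlyList<char> FindUnsupported(string alphabet, string expected)
    {
        var supported = new HashSet<char>(alphabet);
        return expected.Where(c => !supported.Contains(c)).Distinct().ToList();
    }
}
```

A call such as `AlphabetCheck.FindUnsupported(recognizer.SupportedCharacters, label)` could flag labels that the model has no way to reproduce exactly.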

SupportsAttentionVisualization

Gets whether this recognizer supports attention visualization.

bool SupportsAttentionVisualization { get; }

Property Value

bool
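One way to use this flag is to gate the call to `GetAttentionWeights()`, whose return type is nullable. The sketch below is an assumption-laden illustration: `IAttentionSource`, `StubRecognizer`, and `AttentionGuard` are hypothetical stand-ins, and `double[,]` replaces `Tensor<T>` so the example stays self-contained.

```csharp
#nullable enable
using System;

// Hypothetical stand-in exposing only the two members used here; the real
// ITextRecognizer<T> returns Tensor<T>? rather than double[,]?.
public interface IAttentionSource
{
    bool SupportsAttentionVisualization { get; }
    double[,]? GetAttentionWeights();
}

// Hypothetical stub so the sketch can run without a real model.
public sealed class StubRecognizer : IAttentionSource
{
    private readonly double[,]? _weights;

    public StubRecognizer(bool supported, double[,]? weights)
    {
        SupportsAttentionVisualization = supported;
        _weights = weights;
    }

    public bool SupportsAttentionVisualization { get; }

    public double[,]? GetAttentionWeights() => _weights;
}

public static class AttentionGuard
{
    // Returns true and sets `weights` only when the recognizer reports
    // support and actually returns a non-null result.
    public static bool TryGetAttention(IAttentionSource recognizer, out double[,]? weights)
    {
        weights = recognizer.SupportsAttentionVisualization
            ? recognizer.GetAttentionWeights()
            : null;
        return weights is not null;
    }
}
```

Checking both the flag and the returned value keeps callers safe even if a recognizer reports support but has not yet run a recognition.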

Methods

GetAttentionWeights()

Gets the attention weights for visualization (if supported).

Tensor<T>? GetAttentionWeights()

Returns

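Tensor<T>?

Attention tensor showing which image regions influenced each character. The return type is nullable, so check for null before use, for example when SupportsAttentionVisualization is false.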

GetCharacterProbabilities()

Gets the character-level probabilities for the last recognition.

Tensor<T> GetCharacterProbabilities()

Returns

Tensor<T>

Tensor of shape [sequence_length, vocab_size] with probabilities.
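To make the [sequence_length, vocab_size] layout concrete, the following sketch performs a greedy decode: at each step it picks the most probable column and maps it to a character. It is a sketch under stated assumptions, not the library's decoder. `GreedyDecode` is a hypothetical helper, a `double[,]` array stands in for `Tensor<T>`, and the column-to-character mapping `vocab` is assumed. A real model may reserve columns for blank or special tokens.

```csharp
using System;
using System.Text;

public static class GreedyDecode
{
    // probs has shape [sequence_length, vocab_size].
    // vocab[j] is assumed to be the character for column j.
    public static (string Text, double MinConfidence) Decode(double[,] probs, string vocab)
    {
        int steps = probs.GetLength(0);
        int vocabSize = probs.GetLength(1);
        if (vocabSize != vocab.Length)
            throw new ArgumentException("vocab length must match the second dimension of probs.");

        var text = new StringBuilder(steps);
        double minConfidence = 1.0;
        for (int t = 0; t < steps; t++)
        {
            // Find the most probable column for this step.
            int best = 0;
            for (int j = 1; j < vocabSize; j++)
            {
                if (probs[t, j] > probs[t, best]) best = j;
            }
            text.Append(vocab[best]);
            // Track the weakest per-character probability as a rough confidence.
            minConfidence = Math.Min(minConfidence, probs[t, best]);
        }
        return (text.ToString(), minConfidence);
    }
}
```

Using the minimum per-step probability is only one possible summary. How the library derives the Confidence of a recognition result is not stated here.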

RecognizeText(Tensor<T>)

Recognizes text from a cropped image region.

TextRecognitionResult<T> RecognizeText(Tensor<T> croppedImage)

Parameters

croppedImage Tensor<T>

Cropped image containing text (from text detector).

Returns

TextRecognitionResult<T>

Recognition result with text and confidence.

RecognizeTextBatch(IEnumerable<Tensor<T>>)

Recognizes text from multiple cropped image regions (batch processing).

IEnumerable<TextRecognitionResult<T>> RecognizeTextBatch(IEnumerable<Tensor<T>> croppedImages)

Parameters

croppedImages IEnumerable<Tensor<T>>

Sequence of cropped images containing text.

Returns

IEnumerable<TextRecognitionResult<T>>

Sequence of recognition results.
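Because the batch method returns an `IEnumerable`, which in C# may be lazily evaluated, callers may want to materialize the results once before filtering or reusing them. The sketch below shows one way to keep only confident results. `Recognition` is a hypothetical stand-in for `TextRecognitionResult<T>` that assumes only the Text and Confidence members, with Confidence modeled as a `double`. `BatchResults.KeepConfident` is likewise hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for TextRecognitionResult<T>; only Text and
// Confidence are assumed, and Confidence is modeled as a double.
public sealed record Recognition(string Text, double Confidence);

public static class BatchResults
{
    // Enumerates `results` exactly once and returns the entries whose
    // confidence is at or above `threshold`, in their original order.
    public static List<Recognition> KeepConfident(IEnumerable<Recognition> results, double threshold)
    {
        return results.Where(r => r.Confidence >= threshold).ToList();
    }
}
```

Calling `ToList()` once avoids re-running recognition if the underlying sequence happens to be deferred.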