Interface ITextRecognizer<T>
- Namespace
- AiDotNet.Document.Interfaces
- Assembly
- AiDotNet.dll
Interface for text recognition models that read text from cropped image regions.
public interface ITextRecognizer<T> : IDocumentModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
- T
The numeric type used for calculations.
Remarks
Text recognition models convert cropped images of text into character sequences. They work on pre-detected text regions (from a text detector).
For Beginners: Text recognition is the second step in reading text from images. Given a small image containing only text (like a single word or line), the recognizer outputs the actual characters. This is like reading what's written in a highlighted region.
Example usage:
// Assumes `architecture` and `croppedTextImage` have been created elsewhere.
var recognizer = new TrOCR<float>(architecture);
var result = recognizer.RecognizeText(croppedTextImage);
Console.WriteLine($"Recognized: {result.Text} (confidence: {result.Confidence})");
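The remarks describe recognition as the second step of a two-step flow: a detector locates text regions, then a recognizer reads each one. The sketch below shows only that control flow and does not depend on AiDotNet. `OcrPipelineSketch`, `ReadAll`, and the `detect` and `recognize` delegates are hypothetical names introduced for illustration; in real code `recognize` would presumably wrap a call such as `RecognizeText`.

```csharp
using System;
using System.Collections.Generic;

public static class OcrPipelineSketch
{
    // Runs a detector over the image, then a recognizer over each detected region.
    // Both delegates are hypothetical stand-ins, not AiDotNet APIs.
    public static List<TText> ReadAll<TImage, TRegion, TText>(
        TImage image,
        Func<TImage, IEnumerable<TRegion>> detect,
        Func<TRegion, TText> recognize)
    {
        var results = new List<TText>();
        foreach (var region in detect(image))
        {
            // Step 2: read the text inside one pre-detected region.
            results.Add(recognize(region));
        }
        return results;
    }
}
```

Keeping the two steps behind separate delegates means either side can be swapped out (for example, a different detector) without touching the other.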
Properties
MaxSequenceLength
Gets the maximum sequence length this recognizer can output.
int MaxSequenceLength { get; }
Property Value
- int
SupportedCharacters
Gets the supported character set (alphabet) for this recognizer.
string SupportedCharacters { get; }
Property Value
- string
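One possible use of MaxSequenceLength and SupportedCharacters is to check up front whether a given string (for example, a ground-truth label) could be produced by the recognizer at all. The helper below is a minimal sketch that takes the two property values as plain arguments. `LabelCheckSketch` and `IsRepresentable` are hypothetical names, and the sketch assumes the length limit is counted in characters, which may not match how a particular model counts tokens.

```csharp
using System;
using System.Linq;

public static class LabelCheckSketch
{
    // Returns true when `text` is no longer than maxSequenceLength and uses only
    // characters found in supportedCharacters.
    // Assumption: the limit is measured in C# chars, not model-specific tokens.
    public static bool IsRepresentable(string text, string supportedCharacters, int maxSequenceLength)
    {
        if (text.Length > maxSequenceLength)
        {
            return false;
        }
        return text.All(ch => supportedCharacters.IndexOf(ch) >= 0);
    }
}
```

With a recognizer instance this could be called as `LabelCheckSketch.IsRepresentable(label, recognizer.SupportedCharacters, recognizer.MaxSequenceLength)`.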
SupportsAttentionVisualization
Gets whether this recognizer supports attention visualization.
bool SupportsAttentionVisualization { get; }
Property Value
- bool
Methods
GetAttentionWeights()
Gets the attention weights for visualization (if supported).
Tensor<T>? GetAttentionWeights()
Returns
- Tensor<T>
Attention tensor showing which image regions influenced each character.
GetCharacterProbabilities()
Gets the character-level probabilities for the last recognition.
Tensor<T> GetCharacterProbabilities()
Returns
- Tensor<T>
Tensor of shape [sequence_length, vocab_size] with probabilities.
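To make the [sequence_length, vocab_size] layout concrete, the sketch below picks the most probable character at each position (greedy decoding). It uses a plain `double[,]` in place of `Tensor<T>` and is not taken from the library. `GreedyDecodeSketch` and `Decode` are hypothetical names. It also assumes that column `i` corresponds to the `i`-th character of the alphabet string, which this page does not confirm, and it does no CTC-style blank removal or special-token handling, which some recognizers would need.

```csharp
using System;
using System.Text;

public static class GreedyDecodeSketch
{
    // probs stands in for a [sequence_length, vocab_size] probability tensor.
    // Assumption: column i maps to alphabet[i].
    public static string Decode(double[,] probs, string alphabet)
    {
        int steps = probs.GetLength(0);
        int vocab = probs.GetLength(1);
        if (vocab != alphabet.Length)
        {
            throw new ArgumentException("vocab_size must match the alphabet length.");
        }

        var sb = new StringBuilder(steps);
        for (int t = 0; t < steps; t++)
        {
            // Find the column with the highest probability at position t.
            int best = 0;
            for (int c = 1; c < vocab; c++)
            {
                if (probs[t, c] > probs[t, best])
                {
                    best = c;
                }
            }
            sb.Append(alphabet[best]);
        }
        return sb.ToString();
    }
}
```

For example, `Decode(new double[,] { { 0.1, 0.9 }, { 0.8, 0.2 } }, "ab")` returns `"ba"`: the second column wins at the first position and the first column wins at the second.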
RecognizeText(Tensor<T>)
Recognizes text from a cropped image region.
TextRecognitionResult<T> RecognizeText(Tensor<T> croppedImage)
Parameters
- croppedImage Tensor<T>
Cropped image containing text (from a text detector).
Returns
- TextRecognitionResult<T>
Recognition result with text and confidence.
RecognizeTextBatch(IEnumerable<Tensor<T>>)
Recognizes text from multiple cropped image regions (batch processing).
IEnumerable<TextRecognitionResult<T>> RecognizeTextBatch(IEnumerable<Tensor<T>> croppedImages)
Parameters
- croppedImages IEnumerable<Tensor<T>>
List of cropped images containing text.
Returns
- IEnumerable<TextRecognitionResult<T>>
List of recognition results.