Interface IOCRModel<T>
Namespace: AiDotNet.Document.Interfaces
Assembly: AiDotNet.dll
Interface for OCR (Optical Character Recognition) models.
public interface IOCRModel<T> : IDocumentModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
Remarks
OCR models convert images containing text into machine-readable text strings, along with position information for each text element.
For Beginners: OCR is like teaching a computer to read. Given an image of text, the model outputs the text content and the location of each word or character.
Example usage:
var result = ocrModel.RecognizeText(documentImage);
Console.WriteLine($"Full text: {result.FullText}");
foreach (var word in result.Words)
{
    Console.WriteLine($"'{word.Text}' at position ({word.BoundingBox})");
}
Properties
IsOCRFree
Gets whether this is an OCR-free model (end-to-end pixel-to-text).
bool IsOCRFree { get; }
Property Value
- bool
Remarks
OCR-free models such as Donut convert pixels directly to text, without explicit text detection or recognition stages. Traditional OCR pipelines run detection and recognition as separate stages.
SupportedLanguages
Gets the languages supported by this OCR model.
IReadOnlyList<string> SupportedLanguages { get; }
Property Value
- IReadOnlyList<string>
Remarks
Languages are specified using ISO 639-1 codes (e.g., "en", "zh", "ja"). Some models support multiple languages simultaneously.
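Before running OCR, it can help to confirm that the model covers the languages in a document. The following is a minimal sketch, not part of AiDotNet: `LanguageCheck` and `FindUnsupported` are hypothetical names, and the case-insensitive comparison is an assumption, since this page does not say how codes are cased.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AiDotNet.
public static class LanguageCheck
{
    // Returns the requested ISO 639-1 codes that are missing from the
    // supported list. Codes are compared case-insensitively (an assumption).
    public static IReadOnlyList<string> FindUnsupported(
        IReadOnlyList<string> supported, IEnumerable<string> requested)
    {
        var known = new HashSet<string>(supported, StringComparer.OrdinalIgnoreCase);
        return requested.Where(code => !known.Contains(code)).ToList();
    }
}
```

A caller would pass `ocrModel.SupportedLanguages` as the first argument and treat a non-empty result as a reason to pick a different model.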
Methods
RecognizeText(Tensor<T>)
Performs full OCR on a document image.
OCRResult<T> RecognizeText(Tensor<T> documentImage)
Parameters
documentImage (Tensor<T>): The document image tensor.
Returns
- OCRResult<T>
OCR result with text, positions, and confidence scores.
RecognizeTextInRegion(Tensor<T>, Vector<T>)
Performs OCR on a specific region of the document.
OCRResult<T> RecognizeTextInRegion(Tensor<T> documentImage, Vector<T> region)
Parameters
documentImage (Tensor<T>): The document image tensor.
region (Vector<T>): The region to process, given as normalized coordinates [x1, y1, x2, y2], where each value lies between 0 and 1.
Returns
- OCRResult<T>
OCR result for the specified region.
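Because the region is given in normalized coordinates, a box measured in pixels has to be divided by the image dimensions first. The sketch below shows one way to do that under stated assumptions: `RegionHelper` and `ToNormalizedRegion` are hypothetical names, not part of AiDotNet, and clamping out-of-range values to 0-1 is a design choice made here, not documented behavior.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class RegionHelper
{
    // Converts a pixel-space box to normalized [x1, y1, x2, y2] values.
    // Values outside the image are clamped to the range 0-1 (an assumption).
    public static double[] ToNormalizedRegion(
        int left, int top, int right, int bottom, int imageWidth, int imageHeight)
    {
        if (imageWidth <= 0) throw new ArgumentOutOfRangeException(nameof(imageWidth));
        if (imageHeight <= 0) throw new ArgumentOutOfRangeException(nameof(imageHeight));
        if (right <= left || bottom <= top)
            throw new ArgumentException("Region must have positive width and height.");

        return new[]
        {
            Clamp01((double)left / imageWidth),
            Clamp01((double)top / imageHeight),
            Clamp01((double)right / imageWidth),
            Clamp01((double)bottom / imageHeight)
        };
    }

    private static double Clamp01(double value) => Math.Max(0.0, Math.Min(1.0, value));
}
```

The resulting four values would still need to be converted to `T` and wrapped in a `Vector<T>` before calling `RecognizeTextInRegion`. How a `Vector<T>` is constructed is not covered on this page, so that step is left out.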