Interface IOCRModel<T>

Namespace: AiDotNet.Document.Interfaces

Assembly: AiDotNet.dll

Interface for OCR (Optical Character Recognition) models.

public interface IOCRModel<T> : IDocumentModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T: The numeric type used for calculations.

Inherited Members

IDocumentModel<T>.ExpectedImageSize

IDocumentModel<T>.MaxSequenceLength

IDocumentModel<T>.RequiresOCR

IDocumentModel<T>.SupportedDocumentTypes

IDocumentModel<T>.IsOnnxMode

IDocumentModel<T>.EncodeDocument(Tensor<T>)

IDocumentModel<T>.ValidateInputShape(Tensor<T>)

IDocumentModel<T>.GetModelSummary()

IFullModel<T, Tensor<T>, Tensor<T>>.DefaultLossFunction

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Train(Tensor<T>, Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Predict(Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.GetModelMetadata()

IModelSerializer.Serialize()

IModelSerializer.Deserialize(byte[])

IModelSerializer.SaveModel(string)

IModelSerializer.LoadModel(string)

ICheckpointableModel.SaveState(Stream)

ICheckpointableModel.LoadState(Stream)

IParameterizable<T, Tensor<T>, Tensor<T>>.GetParameters()

IParameterizable<T, Tensor<T>, Tensor<T>>.SetParameters(Vector<T>)

IParameterizable<T, Tensor<T>, Tensor<T>>.ParameterCount

IParameterizable<T, Tensor<T>, Tensor<T>>.WithParameters(Vector<T>)

IFeatureAware.GetActiveFeatureIndices()

IFeatureAware.SetActiveFeatureIndices(IEnumerable<int>)

IFeatureAware.IsFeatureUsed(int)

IFeatureImportance<T>.GetFeatureImportance()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.DeepCopy()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.Clone()

IGradientComputable<T, Tensor<T>, Tensor<T>>.ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>)

IGradientComputable<T, Tensor<T>, Tensor<T>>.ApplyGradients(Vector<T>, T)

IJitCompilable<T>.ExportComputationGraph(List<ComputationNode<T>>)

IJitCompilable<T>.SupportsJitCompilation

Extension Methods

DistributedExtensions.AsDistributedForHighBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributedForLowBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Remarks

OCR models convert images containing text into machine-readable text strings, along with position information for each text element.

For Beginners: OCR is like teaching a computer to read. Given an image of text, the model outputs the actual text content and where each word/character is located.

Example usage:

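// ocrModel is an IOCRModel<T>; documentImage is a Tensor<T> holding the page image.
var result = ocrModel.RecognizeText(documentImage);

// The full recognized text of the document.
Console.WriteLine($"Full text: {result.FullText}");

// Each recognized word, together with its position in the image.
foreach (var word in result.Words)
{
    Console.WriteLine($"'{word.Text}' at position ({word.BoundingBox})");
}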

Properties

IsOCRFree

Gets whether this is an OCR-free model (end-to-end pixel-to-text).

bool IsOCRFree { get; }

Property Value

bool

Remarks

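OCR-free models such as Donut convert pixels directly to text in a single end-to-end pass, without explicit text detection or recognition stages. Traditional OCR pipelines run those stages separately: they first locate text regions, then recognize the characters in each region.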

SupportedLanguages

Gets the languages supported by this OCR model.

IReadOnlyList<string> SupportedLanguages { get; }

Property Value

IReadOnlyList<string>

Remarks

Languages are specified using ISO 639-1 codes (e.g., "en", "zh", "ja"). Some models support multiple languages simultaneously.
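
Before running recognition, a caller may want to confirm that a target language is in this list. The following is a minimal sketch of such a check, not part of the AiDotNet API: the OcrLanguageCheck class and its Supports method are hypothetical helpers, and the case-insensitive comparison is an assumption about how codes should be matched. The array in Main stands in for the value of ocrModel.SupportedLanguages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AiDotNet.
public static class OcrLanguageCheck
{
    // Returns true when `code` appears in `supportedLanguages`.
    // Assumption: codes are compared case-insensitively ("EN" matches "en").
    public static bool Supports(IReadOnlyList<string> supportedLanguages, string code)
    {
        if (supportedLanguages is null)
            throw new ArgumentNullException(nameof(supportedLanguages));
        if (string.IsNullOrWhiteSpace(code))
            return false;

        return supportedLanguages.Contains(code.Trim(), StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Stand-in for ocrModel.SupportedLanguages.
        IReadOnlyList<string> supported = new[] { "en", "zh", "ja" };

        Console.WriteLine(Supports(supported, "EN")); // True
        Console.WriteLine(Supports(supported, "fr")); // False
    }
}
```

With a real model, the first argument would be ocrModel.SupportedLanguages, which has the same IReadOnlyList<string> type.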

Methods

RecognizeText(Tensor<T>)

Performs full OCR on a document image.

OCRResult<T> RecognizeText(Tensor<T> documentImage)

Parameters

documentImage Tensor<T>: The document image tensor.

Returns

OCRResult<T>: OCR result with text, positions, and confidence scores.

RecognizeTextInRegion(Tensor<T>, Vector<T>)

Performs OCR on a specific region of the document.

OCRResult<T> RecognizeTextInRegion(Tensor<T> documentImage, Vector<T> region)

Parameters

documentImage Tensor<T>: The document image tensor.
region Vector<T>: The region to process, given as normalized coordinates [x1, y1, x2, y2], with each value in the range 0 to 1.

Returns

OCRResult<T>: OCR result for the specified region.
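
Because the region is expressed in normalized coordinates, a rectangle measured in pixels has to be scaled by the image size first. The sketch below shows one way to do that. OcrRegion and Normalize are hypothetical helpers, not part of AiDotNet, and the code assumes that x values scale with image width and y values with image height. It returns a plain double[] in [x1, y1, x2, y2] order; wrapping those values in a Vector<T> is left to the caller, since this page does not show how a Vector<T> is constructed.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class OcrRegion
{
    // Converts a pixel rectangle into [x1, y1, x2, y2] with each value in 0-1.
    // Assumption: x is divided by the image width and y by the image height.
    // Coordinates that fall outside the image are clamped to the 0-1 range.
    public static double[] Normalize(int left, int top, int right, int bottom,
                                     int imageWidth, int imageHeight)
    {
        if (imageWidth <= 0) throw new ArgumentOutOfRangeException(nameof(imageWidth));
        if (imageHeight <= 0) throw new ArgumentOutOfRangeException(nameof(imageHeight));
        if (right <= left || bottom <= top)
            throw new ArgumentException("Region must have positive width and height.");

        return new[]
        {
            Clamp01((double)left / imageWidth),
            Clamp01((double)top / imageHeight),
            Clamp01((double)right / imageWidth),
            Clamp01((double)bottom / imageHeight),
        };
    }

    private static double Clamp01(double value) => Math.Max(0.0, Math.Min(1.0, value));

    public static void Main()
    {
        // A 400x200 pixel box inside a 1000x500 pixel image.
        double[] region = Normalize(100, 50, 500, 250, imageWidth: 1000, imageHeight: 500);
        Console.WriteLine(string.Join(", ", region));
    }
}
```

The four resulting values can then be placed in the region argument of RecognizeTextInRegion in the same order.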
