Class OCRBase<T>

Namespace
AiDotNet.ComputerVision.OCR
Assembly
AiDotNet.dll

Base class for OCR models.

public abstract class OCRBase<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
OCRBase<T>

Constructors

OCRBase(OCROptions<T>)

Creates a new OCR model.

protected OCRBase(OCROptions<T> options)

Parameters

options OCROptions<T>

Fields

CharToIndex

Character to index mapping.

protected readonly Dictionary<char, int> CharToIndex

Field Value

Dictionary<char, int>

DefaultCharacterSet

Default character set for recognition.

protected static readonly string DefaultCharacterSet

Field Value

string

IndexToChar

Index to character mapping.

protected readonly Dictionary<int, char> IndexToChar

Field Value

Dictionary<int, char>
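
The two mappings are inverses of each other. As a minimal sketch of how such a pair could be built from a character set string (the `CharMapSketch` class and the choice to reserve index 0, for example for a CTC blank, are assumptions for illustration and not taken from the library):

```csharp
using System;
using System.Collections.Generic;

public static class CharMapSketch
{
    // Builds forward and reverse lookups from a character set.
    // Assumption: index 0 is reserved (e.g. for a blank token),
    // so character indices start at 1.
    public static (Dictionary<char, int> CharToIndex, Dictionary<int, char> IndexToChar)
        Build(string characterSet)
    {
        if (characterSet == null) throw new ArgumentNullException(nameof(characterSet));

        var charToIndex = new Dictionary<char, int>();
        var indexToChar = new Dictionary<int, char>();
        foreach (char c in characterSet)
        {
            if (charToIndex.ContainsKey(c)) continue; // skip duplicate characters
            int index = charToIndex.Count + 1;
            charToIndex[c] = index;
            indexToChar[index] = c;
        }
        return (charToIndex, indexToChar);
    }
}
```

With this scheme, `CharMapSketch.Build("abc")` maps 'a' to 1, 'b' to 2, and 'c' to 3, and the reverse dictionary maps those indices back to the same characters.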

NumOps

Numeric operations for the type T.

protected readonly INumericOperations<T> NumOps

Field Value

INumericOperations<T>

Options

Options for this OCR model.

protected readonly OCROptions<T> Options

Field Value

OCROptions<T>

Properties

Name

Name of this OCR model.

public abstract string Name { get; }

Property Value

string

VocabularySize

Gets the vocabulary size (number of classes).

public int VocabularySize { get; }

Property Value

int

Methods

ComputeConfidence(Tensor<T>, string)

Computes confidence from logits.

protected T ComputeConfidence(Tensor<T> logits, string decodedText)

Parameters

logits Tensor<T>
decodedText string

Returns

T
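
This page does not state how the confidence is derived. One common approach, shown here only as a sketch, is to average the highest softmax probability over all time steps. The `ConfidenceSketch` class, the `double[,]` layout `[timeSteps, classes]`, and the formula itself are assumptions; the sketch also ignores the `decodedText` argument that the real method accepts.

```csharp
using System;

public static class ConfidenceSketch
{
    // Mean of the per-step maximum softmax probability.
    // Assumption: logits are laid out as [timeSteps, classes].
    public static double MeanMaxProbability(double[,] logits)
    {
        int steps = logits.GetLength(0);
        int classes = logits.GetLength(1);
        if (steps == 0 || classes == 0) return 0.0;

        double total = 0.0;
        for (int t = 0; t < steps; t++)
        {
            // Find the largest logit for numerical stability.
            double max = double.NegativeInfinity;
            for (int c = 0; c < classes; c++) max = Math.Max(max, logits[t, c]);

            // softmax(max) = exp(0) / sum(exp(l_c - max)) = 1 / sumExp
            double sumExp = 0.0;
            for (int c = 0; c < classes; c++) sumExp += Math.Exp(logits[t, c] - max);

            total += 1.0 / sumExp;
        }
        return total / steps;
    }
}
```

For uniform logits over two classes, every step contributes a probability of 0.5, so the result is 0.5.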

DecodeAttention(Tensor<T>, int)

Decodes attention-based output to text.

protected string DecodeAttention(Tensor<T> logits, int endTokenId)

Parameters

logits Tensor<T>
endTokenId int

Returns

string
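
As an illustration of what attention-style decoding typically involves, the sketch below takes the arg-max class at each step and stops at the end token. It is a greedy variant written under stated assumptions: the `AttentionDecodeSketch` class, the `double[,]` layout `[timeSteps, classes]`, and the explicit `indexToChar` parameter are hypothetical and may differ from the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AttentionDecodeSketch
{
    // Greedy decode: pick the best class per step, stop at endTokenId.
    // Assumption: logits are laid out as [timeSteps, classes].
    public static string Decode(
        double[,] logits, int endTokenId, IReadOnlyDictionary<int, char> indexToChar)
    {
        int steps = logits.GetLength(0);
        int classes = logits.GetLength(1);
        var text = new StringBuilder();

        for (int t = 0; t < steps; t++)
        {
            int best = 0;
            for (int c = 1; c < classes; c++)
            {
                if (logits[t, c] > logits[t, best]) best = c;
            }

            if (best == endTokenId) break; // end of sequence reached

            // Indices without a character (e.g. padding) are skipped.
            if (indexToChar.TryGetValue(best, out char ch)) text.Append(ch);
        }
        return text.ToString();
    }
}
```

Anything predicted after the first end token is discarded, which is why the output can be shorter than the number of time steps.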

DecodeCTC(Tensor<T>)

Decodes CTC output to text.

protected string DecodeCTC(Tensor<T> logits)

Parameters

logits Tensor<T>

Returns

string
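
Greedy (best-path) CTC decoding is usually described in three steps: take the arg-max class at each time step, collapse consecutive repeats, and drop blank tokens. The sketch below follows that recipe. The `CtcDecodeSketch` class, the `double[,]` layout `[timeSteps, classes]`, the `blankIndex` default of 0, and the explicit `indexToChar` parameter are assumptions; the page does not say whether the library uses greedy or beam-search decoding.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CtcDecodeSketch
{
    // Greedy CTC decode: arg-max per step, collapse repeats, remove blanks.
    // Assumption: logits are [timeSteps, classes] and the blank class is index 0.
    public static string Decode(
        double[,] logits, IReadOnlyDictionary<int, char> indexToChar, int blankIndex = 0)
    {
        int steps = logits.GetLength(0);
        int classes = logits.GetLength(1);
        var text = new StringBuilder();
        int previous = -1; // no class seen yet

        for (int t = 0; t < steps; t++)
        {
            int best = 0;
            for (int c = 1; c < classes; c++)
            {
                if (logits[t, c] > logits[t, best]) best = c;
            }

            // Emit only when the class changes and is not the blank.
            if (best != previous && best != blankIndex &&
                indexToChar.TryGetValue(best, out char ch))
            {
                text.Append(ch);
            }
            previous = best;
        }
        return text.ToString();
    }
}
```

A blank between two identical classes separates them, so the per-step sequence `a a _ a b b _` decodes to "aab" rather than "ab".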

GetParameterCount()

Gets the total parameter count.

public abstract long GetParameterCount()

Returns

long

LoadWeightsAsync(string, CancellationToken)

Loads pretrained weights.

public abstract Task LoadWeightsAsync(string pathOrUrl, CancellationToken cancellationToken = default)

Parameters

pathOrUrl string
cancellationToken CancellationToken

Returns

Task

PreprocessCrop(Tensor<T>)

Preprocesses a text crop for recognition.

protected virtual Tensor<T> PreprocessCrop(Tensor<T> crop)

Parameters

crop Tensor<T>

Returns

Tensor<T>

Recognize(Tensor<T>)

Recognizes text in an image.

public abstract OCRResult<T> Recognize(Tensor<T> image)

Parameters

image Tensor<T>

Input image tensor [batch, channels, height, width].

Returns

OCRResult<T>

OCR result with recognized text.

RecognizeText(Tensor<T>)

Recognizes text in a cropped text region.

public abstract (string text, T confidence) RecognizeText(Tensor<T> croppedImage)

Parameters

croppedImage Tensor<T>

Cropped text region tensor.

Returns

(string text, T confidence)

Recognized text and confidence.

ResizeBilinear(Tensor<T>, int, int)

Resizes tensor using bilinear interpolation.

protected Tensor<T> ResizeBilinear(Tensor<T> input, int targetH, int targetW)

Parameters

input Tensor<T>
targetH int
targetW int

Returns

Tensor<T>
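
To make the interpolation concrete, here is a sketch of bilinear resizing on a single-channel 2D array. The `BilinearSketch` class, the use of `double[,]` in place of `Tensor<T>`, and the half-pixel-center coordinate mapping with edge clamping are assumptions; the library may use a different sampling convention.

```csharp
using System;

public static class BilinearSketch
{
    // Resizes a [height, width] array with bilinear interpolation.
    // Assumption: half-pixel centers, with coordinates clamped to the image edges.
    public static double[,] Resize(double[,] input, int targetH, int targetW)
    {
        int inH = input.GetLength(0);
        int inW = input.GetLength(1);
        if (inH == 0 || inW == 0) throw new ArgumentException("Input must be non-empty.", nameof(input));
        if (targetH <= 0) throw new ArgumentOutOfRangeException(nameof(targetH));
        if (targetW <= 0) throw new ArgumentOutOfRangeException(nameof(targetW));

        var output = new double[targetH, targetW];
        for (int y = 0; y < targetH; y++)
        {
            // Map the output row center back into input coordinates.
            double srcY = Math.Clamp((y + 0.5) * inH / targetH - 0.5, 0.0, inH - 1);
            int y0 = (int)Math.Floor(srcY);
            int y1 = Math.Min(y0 + 1, inH - 1);
            double wy = srcY - y0;

            for (int x = 0; x < targetW; x++)
            {
                double srcX = Math.Clamp((x + 0.5) * inW / targetW - 0.5, 0.0, inW - 1);
                int x0 = (int)Math.Floor(srcX);
                int x1 = Math.Min(x0 + 1, inW - 1);
                double wx = srcX - x0;

                // Blend horizontally on both rows, then vertically.
                double top = input[y0, x0] * (1 - wx) + input[y0, x1] * wx;
                double bottom = input[y1, x0] * (1 - wx) + input[y1, x1] * wx;
                output[y, x] = top * (1 - wy) + bottom * wy;
            }
        }
        return output;
    }
}
```

Resizing to the input's own dimensions returns the original values unchanged under this convention, which is a quick sanity check for the coordinate mapping.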

SaveWeights(string)

Saves model weights.

public abstract void SaveWeights(string path)

Parameters

path string