
Class TrOCR<T>

Namespace
AiDotNet.ComputerVision.OCR.Recognition
Assembly
AiDotNet.dll

TrOCR (Transformer-based OCR) for text recognition.

public class TrOCR<T> : OCRBase<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
TrOCR<T>

Remarks

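For Beginners: TrOCR uses a Vision Transformer (ViT) encoder to extract visual features from the image and a Transformer decoder to generate the text autoregressively, one token at a time. Because both halves are standard Transformers, the encoder can be initialized from a pre-trained image model and the decoder from a pre-trained language model.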

Key features:
- Vision Transformer encoder for image understanding
- Transformer decoder with attention for text generation
- Autoregressive decoding with beam search
- Can leverage pre-trained models

Reference: Li et al., "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models", AAAI 2023
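
The autoregressive beam search mentioned above can be illustrated with a small, self-contained sketch. It is not the decoder used inside TrOCR<T>: the `BeamSearchSketch` class, the `NextTokenLogProbs` delegate, and all parameter names are hypothetical, and the scoring function stands in for a real Transformer decoder step.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BeamSearchSketch
{
    // Hypothetical scorer: given the tokens generated so far, returns one
    // log-probability per vocabulary entry for the next token.
    public delegate double[] NextTokenLogProbs(IReadOnlyList<int> prefix);

    public static List<int> Decode(NextTokenLogProbs step, int beamWidth, int eosToken, int maxLength)
    {
        // Start with a single empty hypothesis with log-probability 0.
        var beams = new List<(List<int> Tokens, double Score, bool Done)>
        {
            (new List<int>(), 0.0, false)
        };

        for (int t = 0; t < maxLength && beams.Any(b => !b.Done); t++)
        {
            var candidates = new List<(List<int> Tokens, double Score, bool Done)>();
            foreach (var beam in beams)
            {
                // Finished hypotheses are carried over unchanged.
                if (beam.Done) { candidates.Add(beam); continue; }

                double[] logProbs = step(beam.Tokens);
                for (int token = 0; token < logProbs.Length; token++)
                {
                    var tokens = new List<int>(beam.Tokens) { token };
                    candidates.Add((tokens, beam.Score + logProbs[token], token == eosToken));
                }
            }

            // Keep only the beamWidth highest-scoring hypotheses.
            beams = candidates.OrderByDescending(c => c.Score).Take(beamWidth).ToList();
        }

        return beams[0].Tokens;
    }
}
```

A beam width of 1 reduces this to greedy decoding. Real implementations usually add length normalization so that longer hypotheses are not penalized; the sketch omits it for brevity.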

Constructors

TrOCR(OCROptions<T>)

Creates a new TrOCR text recognizer.

public TrOCR(OCROptions<T> options)

Parameters

options OCROptions<T>

Properties

Name

Gets the name of this OCR model.

public override string Name { get; }

Property Value

string

NumHeads

Gets the number of attention heads in the transformer.

public int NumHeads { get; }

Property Value

int

Methods

GetParameterCount()

Gets the total parameter count.

public override long GetParameterCount()

Returns

long

LoadWeightsAsync(string, CancellationToken)

Loads pretrained weights.

public override Task LoadWeightsAsync(string pathOrUrl, CancellationToken cancellationToken = default)

Parameters

pathOrUrl string
cancellationToken CancellationToken

Returns

Task
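
Because LoadWeightsAsync accepts a CancellationToken, a caller can bound how long a load may take. The sketch below shows one way to do that with a timeout. The `WeightLoadingSketch` helper is hypothetical, and the `load` delegate stands in for `model.LoadWeightsAsync`; whether the library observes the token and throws OperationCanceledException on cancellation is an assumption, not something this page states.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WeightLoadingSketch
{
    // `load` has the same shape as LoadWeightsAsync(string, CancellationToken).
    // Returns true if loading finished, false if the timeout cancelled it.
    public static async Task<bool> TryLoadWithTimeoutAsync(
        Func<string, CancellationToken, Task> load,
        string pathOrUrl,
        TimeSpan timeout)
    {
        // The token source cancels itself once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await load(pathOrUrl, cts.Token);
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;
        }
    }
}
```

With a model instance in scope, the call would look like `await WeightLoadingSketch.TryLoadWithTimeoutAsync(model.LoadWeightsAsync, path, TimeSpan.FromMinutes(2))`, where `model` and `path` are placeholders.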

Recognize(Tensor<T>)

Recognizes text in an image.

public override OCRResult<T> Recognize(Tensor<T> image)

Parameters

image Tensor<T>

Input image tensor [batch, channels, height, width].

Returns

OCRResult<T>

OCR result with recognized text.
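
The [batch, channels, height, width] layout expected by Recognize differs from the interleaved pixel order that most image decoders produce. The following sketch reorders such a buffer into a flat array for a single image (batch of 1). The `ImageLayoutSketch` helper is hypothetical, the scaling to the 0..1 range is an assumption about preprocessing, and wrapping the result in a Tensor<T> is left out because the tensor constructor is not documented on this page.

```csharp
using System;

public static class ImageLayoutSketch
{
    // Converts interleaved height-width-channel bytes (e.g. RGBRGB...) into a
    // flat array ordered as [batch = 1, channels, height, width].
    public static float[] ToNchw(byte[] hwc, int height, int width, int channels)
    {
        if (hwc.Length != height * width * channels)
            throw new ArgumentException("Buffer size does not match the given dimensions.", nameof(hwc));

        var nchw = new float[channels * height * width];
        for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        for (int c = 0; c < channels; c++)
        {
            int src = (y * width + x) * channels + c;   // interleaved source index
            int dst = (c * height + y) * width + x;     // planar destination index
            nchw[dst] = hwc[src] / 255f;                // assumed 0..1 scaling
        }
        return nchw;
    }
}
```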

RecognizeText(Tensor<T>)

Recognizes text in a cropped text region.

public override (string text, T confidence) RecognizeText(Tensor<T> croppedImage)

Parameters

croppedImage Tensor<T>

Cropped text region tensor.

Returns

(string text, T confidence)

Recognized text and confidence.
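
The returned tuple can be deconstructed directly, which makes it easy to drop low-confidence results. The sketch below assumes T = float and a confidence where higher means more certain; the `ConfidenceFilterSketch` helper and the threshold are illustrative, and the `recognize` delegate stands in for `model.RecognizeText`.

```csharp
using System;
using System.Collections.Generic;

public static class ConfidenceFilterSketch
{
    // Runs `recognize` on each region and keeps only the text whose
    // confidence reaches `minConfidence`.
    public static List<string> KeepConfident<TRegion>(
        IEnumerable<TRegion> regions,
        Func<TRegion, (string text, float confidence)> recognize,
        float minConfidence)
    {
        var kept = new List<string>();
        foreach (var region in regions)
        {
            // Deconstruct the (text, confidence) tuple into two locals.
            var (text, confidence) = recognize(region);
            if (confidence >= minConfidence)
                kept.Add(text);
        }
        return kept;
    }
}
```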

SaveWeights(string)

Saves model weights.

public override void SaveWeights(string path)

Parameters

path string