Interface IDocumentModel<T>
Namespace: AiDotNet.Document.Interfaces
Assembly: AiDotNet.dll
Base interface for all document AI models in AiDotNet.
public interface IDocumentModel<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T
The numeric type used for calculations (e.g., float, double).
Remarks
This interface extends IFullModel<T, TInput, TOutput> to provide the core contract for document AI models, inheriting standard methods for training, inference, model persistence, and gradient computation.
For Beginners: A document AI model processes document images (scanned pages, PDFs, photos of text) to extract information, understand layout, or answer questions.
Key concepts:
- Document images have shape [batch, channels, height, width]
- Models can run in Native mode (pure C#) or ONNX mode (optimized runtime)
- All models support both training and inference
- Many document models combine vision and language understanding
Example usage:
var model = new LayoutLMv3<double>(architecture);
var layout = model.DetectLayout(documentImage);
var text = model.ExtractText(documentImage);
Properties
ExpectedImageSize
Gets the expected input image size (assumes square images).
int ExpectedImageSize { get; }
Property Value
- int
Remarks
Common values: 224 (ViT base), 384, 448, 512, 768, 1024. Input images will be resized to [ExpectedImageSize x ExpectedImageSize] before processing.
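For instance, a caller can use ExpectedImageSize to allocate a correctly shaped input tensor before inference. A minimal sketch; it assumes Tensor&lt;T&gt; can be constructed from a shape array, and the variable names are illustrative:

```csharp
// Build an input tensor matching the model's expected square size.
// Assumes a Tensor<double> constructor taking a shape array.
int size = model.ExpectedImageSize;                    // e.g., 224
var input = new Tensor<double>(new[] { 1, 3, size, size });
// ... fill `input` with normalized pixel values, then:
var features = model.EncodeDocument(input);
```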
IsOnnxMode
Gets whether this model is running in ONNX inference mode.
bool IsOnnxMode { get; }
Property Value
- bool
Remarks
When true, the model uses pre-trained ONNX weights for fast inference. When false, the model uses native layers and can be trained.
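A typical guard before calling training APIs. A sketch, assuming a Train method inherited from IFullModel; the exact training signature is an assumption:

```csharp
if (model.IsOnnxMode)
{
    // ONNX mode: inference only; training calls would be invalid here.
    var features = model.EncodeDocument(documentImage);
}
else
{
    // Native mode: the model uses native layers and can be trained.
    model.Train(trainingInputs, trainingTargets);  // hypothetical signature
}
```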
MaxSequenceLength
Gets the maximum sequence length for text processing.
int MaxSequenceLength { get; }
Property Value
- int
Remarks
For layout-aware models, this is the maximum number of text tokens. Common values: 512, 1024, 2048.
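OCR token lists longer than MaxSequenceLength must be truncated (or split into windows) before encoding. A minimal head-truncation sketch, assuming `tokens` is a List&lt;string&gt; produced by an earlier OCR step:

```csharp
int maxLen = model.MaxSequenceLength;  // e.g., 512
if (tokens.Count > maxLen)
{
    // Simple head truncation; sliding-window chunking is an
    // alternative for long documents where tail content matters.
    tokens = tokens.GetRange(0, maxLen);
}
```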
RequiresOCR
Gets whether this model requires OCR preprocessing.
bool RequiresOCR { get; }
Property Value
- bool
Remarks
OCR-free models (Donut, Pix2Struct) return false - they process raw pixels directly. Layout-aware models (LayoutLM) return true - they need text and bounding boxes from OCR.
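This flag lets a generic pipeline decide whether an OCR pass is needed before invoking the model. A sketch, where RunOcr is a hypothetical helper returning tokens and bounding boxes:

```csharp
if (model.RequiresOCR)
{
    // Layout-aware models (e.g., LayoutLM) need text + bounding boxes.
    var (tokens, boxes) = RunOcr(documentImage);   // hypothetical helper
    // ... pass tokens/boxes to the model's task-specific API.
}
else
{
    // OCR-free models (Donut, Pix2Struct) consume raw pixels directly.
    var features = model.EncodeDocument(documentImage);
}
```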
SupportedDocumentTypes
Gets the supported document types for this model.
DocumentType SupportedDocumentTypes { get; }
Property Value
- DocumentType
Methods
EncodeDocument(Tensor<T>)
Processes a document image and returns encoded features.
Tensor<T> EncodeDocument(Tensor<T> documentImage)
Parameters
documentImage Tensor<T>
The document image tensor with shape [batch, channels, height, width] or [channels, height, width].
Returns
- Tensor<T>
Encoded document features suitable for downstream tasks.
Remarks
For Beginners: This method converts a document image into a numerical representation (feature vector) that captures the document's content and structure. These features can then be used for tasks like classification, QA, or information extraction.
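A typical call, validating the input shape first; tensor shapes follow the conventions above:

```csharp
// Encode a single page; a leading batch dimension is also accepted.
model.ValidateInputShape(documentImage);           // throws on bad shape
Tensor<double> features = model.EncodeDocument(documentImage);
// `features` can now feed a classifier, QA head, or extraction head.
```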
GetModelSummary()
Gets a summary of the model architecture.
string GetModelSummary()
Returns
- string
A string describing the model's architecture, parameters, and capabilities.
ValidateInputShape(Tensor<T>)
Validates that an input tensor has the correct shape for this model.
void ValidateInputShape(Tensor<T> documentImage)
Parameters
documentImage Tensor<T>
The tensor to validate.
Exceptions
- ArgumentException
Thrown if the tensor shape is invalid.
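Because validation signals failure by throwing rather than returning a flag, callers that want a soft check can wrap the call. A sketch:

```csharp
try
{
    model.ValidateInputShape(candidate);
}
catch (ArgumentException ex)
{
    // Shape mismatch, e.g. wrong rank or channel count.
    Console.WriteLine($"Rejected input: {ex.Message}");
    return;
}
```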