Interface IDocumentQA<T>

Namespace: AiDotNet.Document.Interfaces

Assembly: AiDotNet.dll

Interface for document question answering models.

public interface IDocumentQA<T> : IDocumentModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T: The numeric type used for calculations.

Inherited Members

IDocumentModel<T>.ExpectedImageSize

IDocumentModel<T>.MaxSequenceLength

IDocumentModel<T>.RequiresOCR

IDocumentModel<T>.SupportedDocumentTypes

IDocumentModel<T>.IsOnnxMode

IDocumentModel<T>.EncodeDocument(Tensor<T>)

IDocumentModel<T>.ValidateInputShape(Tensor<T>)

IDocumentModel<T>.GetModelSummary()

IFullModel<T, Tensor<T>, Tensor<T>>.DefaultLossFunction

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Train(Tensor<T>, Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Predict(Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.GetModelMetadata()

IModelSerializer.Serialize()

IModelSerializer.Deserialize(byte[])

IModelSerializer.SaveModel(string)

IModelSerializer.LoadModel(string)

ICheckpointableModel.SaveState(Stream)

ICheckpointableModel.LoadState(Stream)

IParameterizable<T, Tensor<T>, Tensor<T>>.GetParameters()

IParameterizable<T, Tensor<T>, Tensor<T>>.SetParameters(Vector<T>)

IParameterizable<T, Tensor<T>, Tensor<T>>.ParameterCount

IParameterizable<T, Tensor<T>, Tensor<T>>.WithParameters(Vector<T>)

IFeatureAware.GetActiveFeatureIndices()

IFeatureAware.SetActiveFeatureIndices(IEnumerable<int>)

IFeatureAware.IsFeatureUsed(int)

IFeatureImportance<T>.GetFeatureImportance()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.DeepCopy()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.Clone()

IGradientComputable<T, Tensor<T>, Tensor<T>>.ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>)

IGradientComputable<T, Tensor<T>, Tensor<T>>.ApplyGradients(Vector<T>, T)

IJitCompilable<T>.ExportComputationGraph(List<ComputationNode<T>>)

IJitCompilable<T>.SupportsJitCompilation

Extension Methods

DistributedExtensions.AsDistributedForHighBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributedForLowBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Remarks

Document QA models answer natural language questions about document content, combining visual understanding with text comprehension.

For Beginners: Document QA is like having a smart assistant that can read a document and answer your questions about it. You show it a document image and ask questions like "What is the total amount?" or "Who signed this contract?"

Example usage:

var result = documentQA.AnswerQuestion(invoiceImage, "What is the invoice number?");
Console.WriteLine($"Answer: {result.Answer} (confidence: {result.Confidence:P0})");

Methods

AnswerQuestion(Tensor<T>, string)

Answers a question about a document.

DocumentQAResult<T> AnswerQuestion(Tensor<T> documentImage, string question)

Parameters

documentImage Tensor<T>: The document image tensor.
question string: The question to answer in natural language.

Returns

DocumentQAResult<T>: The answer with confidence and evidence information.

AnswerQuestion(Tensor<T>, string, int, double)

Answers a question about a document using the given generation parameters (maximum answer length and sampling temperature).

DocumentQAResult<T> AnswerQuestion(Tensor<T> documentImage, string question, int maxAnswerLength, double temperature = 0)

Parameters

documentImage Tensor<T>: The document image tensor.
question string: The question to answer.
maxAnswerLength int: Maximum length of the generated answer.
temperature double: Sampling temperature for generation (0 = deterministic).

Returns

DocumentQAResult<T>: The answer with confidence and evidence information.
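
As a minimal sketch of this overload, the helper below caps the answer length and leaves temperature at its default of 0. The class name, method name, question text and the value 32 are illustrative, not part of the library. The using directive for Tensor<T> is an assumption and may differ in your AiDotNet version. This page does not state the unit of maxAnswerLength (tokens or characters), so treat 32 as a placeholder.

```csharp
using System;
using AiDotNet.Document.Interfaces;    // IDocumentQA<T> (namespace taken from this page)
using AiDotNet.Tensors.LinearAlgebra;  // ASSUMPTION: namespace of Tensor<T>; adjust if needed

public static class GenerationParameterExample
{
    // Hypothetical helper: asks one question with a bounded answer length.
    public static void AskDeterministic<T>(IDocumentQA<T> documentQA, Tensor<T> contractImage)
    {
        // Naming maxAnswerLength selects the four-parameter overload;
        // temperature is omitted and so takes its default of 0 (deterministic).
        var result = documentQA.AnswerQuestion(
            contractImage,
            "Who signed this contract?",
            maxAnswerLength: 32);

        Console.WriteLine($"Answer: {result.Answer} (confidence: {result.Confidence:P0})");
    }
}
```

Passing a temperature above 0 would request sampled rather than deterministic generation; how strongly that affects answers depends on the concrete model.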

AnswerQuestions(Tensor<T>, IEnumerable<string>)

Answers multiple questions about a document in a batch.

IEnumerable<DocumentQAResult<T>> AnswerQuestions(Tensor<T> documentImage, IEnumerable<string> questions)

Parameters

documentImage Tensor<T>: The document image tensor.
questions IEnumerable<string>: The questions to answer.

Returns

IEnumerable<DocumentQAResult<T>>: Answers for each question in order.

Remarks

Batching multiple questions is more efficient than calling AnswerQuestion repeatedly because the document encoding can be reused.
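
A short sketch of batch usage, assuming an already constructed model and image tensor. The helper and its names are hypothetical, and the using directive for Tensor<T> is an assumption. Because the results are returned in question order, each question can be paired with its answer by position.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiDotNet.Document.Interfaces;    // IDocumentQA<T> (namespace taken from this page)
using AiDotNet.Tensors.LinearAlgebra;  // ASSUMPTION: namespace of Tensor<T>; adjust if needed

public static class BatchQuestionExample
{
    // Hypothetical helper: asks several questions about one document in a single call.
    public static void AskBatch<T>(IDocumentQA<T> documentQA, Tensor<T> invoiceImage)
    {
        var questions = new List<string>
        {
            "What is the invoice number?",
            "What is the total amount?"
        };

        var results = documentQA.AnswerQuestions(invoiceImage, questions);

        // Results are documented as being in question order, so Zip pairs them by position.
        var pairs = questions.Zip(results, (q, r) => new { Question = q, Result = r });
        foreach (var pair in pairs)
        {
            Console.WriteLine($"{pair.Question} -> {pair.Result.Answer} (confidence: {pair.Result.Confidence:P0})");
        }
    }
}
```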

ExtractFields(Tensor<T>, IEnumerable<string>)

Extracts specific fields from a document using natural language prompts.

Dictionary<string, DocumentQAResult<T>> ExtractFields(Tensor<T> documentImage, IEnumerable<string> fieldPrompts)

Parameters

documentImage Tensor<T>: The document image tensor.
fieldPrompts IEnumerable<string>: Field names or extraction prompts (e.g., "invoice_number", "total_amount").

Returns

Dictionary<string, DocumentQAResult<T>>: Dictionary mapping field names to their extracted values and confidence.

Remarks

For Beginners: This is a convenient way to extract multiple pieces of information at once. Instead of asking separate questions, you provide a list of field names and the model extracts all of them from the document.
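
The field extraction described above might be used as in the following sketch. The helper is hypothetical, the field prompts are the ones given as examples on this page, and the using directive for Tensor<T> is an assumption. The loop iterates over the returned dictionary rather than indexing it by prompt, since this page does not specify exactly how keys are derived from the prompts.

```csharp
using System;
using AiDotNet.Document.Interfaces;    // IDocumentQA<T> (namespace taken from this page)
using AiDotNet.Tensors.LinearAlgebra;  // ASSUMPTION: namespace of Tensor<T>; adjust if needed

public static class FieldExtractionExample
{
    // Hypothetical helper: extracts two invoice fields and prints each one.
    public static void PrintInvoiceFields<T>(IDocumentQA<T> documentQA, Tensor<T> invoiceImage)
    {
        var fieldPrompts = new[] { "invoice_number", "total_amount" };

        var fields = documentQA.ExtractFields(invoiceImage, fieldPrompts);

        // Each entry is a KeyValuePair<string, DocumentQAResult<T>>.
        foreach (var field in fields)
        {
            Console.WriteLine($"{field.Key}: {field.Value.Answer} (confidence: {field.Value.Confidence:P0})");
        }
    }
}
```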
