Interface IDocumentQA<T>
Namespace: AiDotNet.Document.Interfaces
Assembly: AiDotNet.dll
Interface for document question answering models.
public interface IDocumentQA<T> : IDocumentModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
Remarks
Document QA models answer natural language questions about document content, combining visual understanding with text comprehension.
For Beginners: Document QA is like having a smart assistant that can read a document and answer your questions about it. You show it a document image and ask questions like "What is the total amount?" or "Who signed this contract?"
Example usage:
var result = documentQA.AnswerQuestion(invoiceImage, "What is the invoice number?");
Console.WriteLine($"Answer: {result.Answer} (confidence: {result.Confidence:P0})");
Methods
AnswerQuestion(Tensor<T>, string)
Answers a question about a document.
DocumentQAResult<T> AnswerQuestion(Tensor<T> documentImage, string question)
Parameters
documentImage (Tensor<T>): The document image tensor.
question (string): The question to answer in natural language.
Returns
DocumentQAResult<T>: The answer with confidence and evidence information.
AnswerQuestion(Tensor<T>, string, int, double)
Answers a question with generation parameters.
DocumentQAResult<T> AnswerQuestion(Tensor<T> documentImage, string question, int maxAnswerLength, double temperature = 0)
Parameters
documentImage (Tensor<T>): The document image tensor.
question (string): The question to answer.
maxAnswerLength (int): Maximum length of the generated answer.
temperature (double): Sampling temperature for generation (0 = deterministic).
Returns
DocumentQAResult<T>: The answer result.
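The following is a minimal sketch of calling this overload with named arguments, not the library's implementation. To compile without AiDotNet it declares stand-in versions of Tensor<T>, DocumentQAResult<T>, and IDocumentQA<T> that contain only the members used here. StubQA, AskShort, AskVaried, and the canned answer are hypothetical. The stub measures maxAnswerLength in characters, which is an assumption, since this reference does not state the unit. Confidence is assumed to be a double.

```csharp
using System;

// Stand-ins (assumptions) so the sketch compiles without AiDotNet.
public sealed class Tensor<T> { }

public sealed class DocumentQAResult<T>
{
    public string Answer { get; set; } = "";
    public double Confidence { get; set; }
}

// Only the overload discussed above, copied from the signature in this reference.
public interface IDocumentQA<T>
{
    DocumentQAResult<T> AnswerQuestion(Tensor<T> documentImage, string question,
                                       int maxAnswerLength, double temperature = 0);
}

// Hypothetical stub: truncates a canned answer to maxAnswerLength characters
// and records the temperature it received.
public sealed class StubQA : IDocumentQA<float>
{
    public double LastTemperature { get; private set; }

    public DocumentQAResult<float> AnswerQuestion(Tensor<float> documentImage, string question,
                                                  int maxAnswerLength, double temperature = 0)
    {
        LastTemperature = temperature;
        const string canned = "INV-0001 issued to Example Corp";
        string answer = canned.Substring(0, Math.Min(maxAnswerLength, canned.Length));
        return new DocumentQAResult<float> { Answer = answer, Confidence = 0.9 };
    }
}

public static class GenerationParamsSketch
{
    // Omitting temperature uses the declared default of 0 (deterministic).
    public static string AskShort(IDocumentQA<float> model, Tensor<float> image, string question)
    {
        return model.AnswerQuestion(image, question, maxAnswerLength: 8).Answer;
    }

    // A non-zero temperature requests sampled, potentially varied output.
    public static string AskVaried(IDocumentQA<float> model, Tensor<float> image, string question)
    {
        return model.AnswerQuestion(image, question, maxAnswerLength: 64, temperature: 0.7).Answer;
    }
}
```

Named arguments keep the two numeric parameters from being confused at the call site, and leaving temperature out relies on the default value declared in the signature.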
AnswerQuestions(Tensor<T>, IEnumerable<string>)
Answers multiple questions about a document in a batch.
IEnumerable<DocumentQAResult<T>> AnswerQuestions(Tensor<T> documentImage, IEnumerable<string> questions)
Parameters
documentImage (Tensor<T>): The document image tensor.
questions (IEnumerable<string>): The questions to answer.
Returns
IEnumerable<DocumentQAResult<T>>: Answers for each question in order.
Remarks
Batching multiple questions is more efficient than calling AnswerQuestion once per question because the document encoding can be computed once and reused across all questions.
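Because the results come back in question order, a caller can pair each question with its answer by position. The sketch below illustrates that pairing and is not the library's implementation. It declares stand-in versions of Tensor<T>, DocumentQAResult<T>, and IDocumentQA<T> so it compiles without AiDotNet. StubBatchQA and AskAll are hypothetical names, and the one-result-per-question check is an assumption about the contract rather than something this reference states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins (assumptions) so the sketch compiles without AiDotNet.
public sealed class Tensor<T> { }

public sealed class DocumentQAResult<T>
{
    public string Answer { get; set; } = "";
    public double Confidence { get; set; }
}

// Only the member discussed above, copied from the signature in this reference.
public interface IDocumentQA<T>
{
    IEnumerable<DocumentQAResult<T>> AnswerQuestions(Tensor<T> documentImage, IEnumerable<string> questions);
}

// Hypothetical stub that counts calls; each call stands in for one document encoding.
public sealed class StubBatchQA : IDocumentQA<float>
{
    public int Calls { get; private set; }

    public IEnumerable<DocumentQAResult<float>> AnswerQuestions(
        Tensor<float> documentImage, IEnumerable<string> questions)
    {
        Calls++;
        return questions
            .Select(q => new DocumentQAResult<float> { Answer = "answer to: " + q, Confidence = 0.5 })
            .ToList();
    }
}

public static class BatchQaSketch
{
    public static List<KeyValuePair<string, DocumentQAResult<T>>> AskAll<T>(
        IDocumentQA<T> model, Tensor<T> documentImage, IReadOnlyList<string> questions)
    {
        // Materialize once so a lazy result sequence is not evaluated twice.
        List<DocumentQAResult<T>> results = model.AnswerQuestions(documentImage, questions).ToList();

        // Assumption: exactly one result per question.
        if (results.Count != questions.Count)
            throw new InvalidOperationException("Expected one result per question.");

        // Results are documented as being in question order, so pair by index.
        return questions
            .Select((q, i) => new KeyValuePair<string, DocumentQAResult<T>>(q, results[i]))
            .ToList();
    }
}
```

Asking several questions through AskAll makes a single AnswerQuestions call, so the stub's Calls counter stays at 1 regardless of how many questions are passed.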
ExtractFields(Tensor<T>, IEnumerable<string>)
Extracts specific fields from a document using natural language prompts.
Dictionary<string, DocumentQAResult<T>> ExtractFields(Tensor<T> documentImage, IEnumerable<string> fieldPrompts)
Parameters
documentImage (Tensor<T>): The document image tensor.
fieldPrompts (IEnumerable<string>): Field names or extraction prompts (e.g., "invoice_number", "total_amount").
Returns
Dictionary<string, DocumentQAResult<T>>: Dictionary mapping field names to their extracted values and confidence.
Remarks
For Beginners: This is a convenient way to extract multiple pieces of information at once. Instead of asking separate questions, you provide a list of field names and the model extracts all of them from the document.
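One way to consume the returned dictionary is to keep only the fields whose confidence clears a threshold. The sketch below shows that pattern and is not the library's implementation. It declares stand-in versions of Tensor<T>, DocumentQAResult<T>, and IDocumentQA<T> so it compiles without AiDotNet. StubFieldQA, ConfidentFields, the canned values, and the threshold are hypothetical. Confidence is assumed to be a double in the range 0 to 1.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins (assumptions) so the sketch compiles without AiDotNet.
public sealed class Tensor<T> { }

public sealed class DocumentQAResult<T>
{
    public string Answer { get; set; } = "";
    public double Confidence { get; set; }
}

// Only the member discussed above, copied from the signature in this reference.
public interface IDocumentQA<T>
{
    Dictionary<string, DocumentQAResult<T>> ExtractFields(Tensor<T> documentImage, IEnumerable<string> fieldPrompts);
}

// Hypothetical stub that returns canned values for the prompts it recognizes.
public sealed class StubFieldQA : IDocumentQA<float>
{
    public Dictionary<string, DocumentQAResult<float>> ExtractFields(
        Tensor<float> documentImage, IEnumerable<string> fieldPrompts)
    {
        var canned = new Dictionary<string, DocumentQAResult<float>>
        {
            ["invoice_number"] = new DocumentQAResult<float> { Answer = "INV-0001", Confidence = 0.95 },
            ["total_amount"] = new DocumentQAResult<float> { Answer = "120.00", Confidence = 0.40 },
        };

        var found = new Dictionary<string, DocumentQAResult<float>>();
        foreach (string prompt in fieldPrompts)
        {
            if (canned.TryGetValue(prompt, out var result))
                found[prompt] = result;
        }
        return found;
    }
}

public static class FieldExtractionSketch
{
    // Returns field name -> answer text for fields at or above minConfidence.
    public static Dictionary<string, string> ConfidentFields<T>(
        IDocumentQA<T> model, Tensor<T> documentImage,
        IEnumerable<string> fieldPrompts, double minConfidence)
    {
        var accepted = new Dictionary<string, string>();
        foreach (KeyValuePair<string, DocumentQAResult<T>> entry in model.ExtractFields(documentImage, fieldPrompts))
        {
            if (entry.Value.Confidence >= minConfidence)
                accepted[entry.Key] = entry.Value.Answer;
        }
        return accepted;
    }
}
```

Fields that fall below the threshold are simply left out of the result, so a caller can route them to manual review instead of trusting a low-confidence value.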