Interface ITableExtractor<T>

Namespace: AiDotNet.Document.Interfaces

Assembly: AiDotNet.dll

Interface for table detection and structure recognition models.

public interface ITableExtractor<T> : IDocumentModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T: The numeric type used for calculations.

Inherited Members: IDocumentModel<T>.ExpectedImageSize

IDocumentModel<T>.MaxSequenceLength

IDocumentModel<T>.RequiresOCR

IDocumentModel<T>.SupportedDocumentTypes

IDocumentModel<T>.IsOnnxMode

IDocumentModel<T>.EncodeDocument(Tensor<T>)

IDocumentModel<T>.ValidateInputShape(Tensor<T>)

IDocumentModel<T>.GetModelSummary()

IFullModel<T, Tensor<T>, Tensor<T>>.DefaultLossFunction

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Train(Tensor<T>, Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Predict(Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.GetModelMetadata()

IModelSerializer.Serialize()

IModelSerializer.Deserialize(byte[])

IModelSerializer.SaveModel(string)

IModelSerializer.LoadModel(string)

ICheckpointableModel.SaveState(Stream)

ICheckpointableModel.LoadState(Stream)

IParameterizable<T, Tensor<T>, Tensor<T>>.GetParameters()

IParameterizable<T, Tensor<T>, Tensor<T>>.SetParameters(Vector<T>)

IParameterizable<T, Tensor<T>, Tensor<T>>.ParameterCount

IParameterizable<T, Tensor<T>, Tensor<T>>.WithParameters(Vector<T>)

IFeatureAware.GetActiveFeatureIndices()

IFeatureAware.SetActiveFeatureIndices(IEnumerable<int>)

IFeatureAware.IsFeatureUsed(int)

IFeatureImportance<T>.GetFeatureImportance()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.DeepCopy()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.Clone()

IGradientComputable<T, Tensor<T>, Tensor<T>>.ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>)

IGradientComputable<T, Tensor<T>, Tensor<T>>.ApplyGradients(Vector<T>, T)

IJitCompilable<T>.ExportComputationGraph(List<ComputationNode<T>>)

IJitCompilable<T>.SupportsJitCompilation

Extension Methods: DistributedExtensions.AsDistributedForHighBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributedForLowBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Remarks

Table extraction models detect tables in documents and extract their structure (rows, columns, cells) along with the content in each cell.

For Beginners: Documents often contain tables with important data. Table extraction helps computers understand where tables are, how they're structured, and what data they contain. This is useful for extracting financial data, product catalogs, or any tabular information.

Example usage:

var tables = tableExtractor.DetectTables(documentImage);
foreach (var table in tables)
{
    var structure = tableExtractor.RecognizeStructure(table.Image);
    Console.WriteLine($"Table with {structure.NumRows} rows and {structure.NumColumns} columns");
}

Properties

SupportsBorderedTables

Gets whether this model supports bordered tables.

bool SupportsBorderedTables { get; }

Property Value

bool

SupportsBorderlessTables

Gets whether this model supports borderless tables.

bool SupportsBorderlessTables { get; }

Property Value

bool

SupportsMergedCells

Gets whether this model can detect merged cells (row/column spans).

bool SupportsMergedCells { get; }

Property Value

bool

Methods

DetectTables(Tensor<T>)

Detects tables in a document image.

IEnumerable<TableRegion<T>> DetectTables(Tensor<T> documentImage)

Parameters

documentImage Tensor<T>: The document image tensor.

Returns

IEnumerable<TableRegion<T>>: List of detected table regions with bounding boxes.

ExportTables(Tensor<T>, TableExportFormat)

Exports detected tables to a specific format.

string ExportTables(Tensor<T> documentImage, TableExportFormat format)

Parameters

documentImage Tensor<T>: The document image tensor.
format TableExportFormat: Output format (CSV, JSON, HTML, Markdown, Excel).

Returns

string: Serialized table data in the specified format.

ExtractTableContent(Tensor<T>)

Extracts table content as structured data.

IEnumerable<List<List<string>>> ExtractTableContent(Tensor<T> documentImage)

Parameters

documentImage Tensor<T>: The document image tensor.

Returns

IEnumerable<List<List<string>>>: Tables as lists of rows, where each row is a list of cell contents.

Remarks

This is a convenience method that combines table detection, structure recognition, and OCR to return the final table content.

RecognizeStructure(Tensor<T>)

Recognizes the structure of a table (rows, columns, cells).

TableStructureResult<T> RecognizeStructure(Tensor<T> tableImage)

Parameters

tableImage Tensor<T>: Cropped table image tensor (from DetectTables).

Returns

TableStructureResult<T>: Table structure with cell positions and spans.

Table of Contents

Interface ITableExtractor<T>

Type Parameters

Remarks

Properties

SupportsBorderedTables

Property Value

SupportsBorderlessTables

Property Value

SupportsMergedCells

Property Value

Methods

DetectTables(Tensor<T>)

Parameters

Returns

ExportTables(Tensor<T>, TableExportFormat)

Parameters

Returns

ExtractTableContent(Tensor<T>)

Parameters

Returns

Remarks

RecognizeStructure(Tensor<T>)

Parameters

Returns