Interface ITableExtractor<T>
- Namespace
- AiDotNet.Document.Interfaces
- Assembly
- AiDotNet.dll
Interface for table detection and structure recognition models.
public interface ITableExtractor<T> : IDocumentModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
TThe numeric type used for calculations.
- Inherited Members
- Extension Methods
Remarks
Table extraction models detect tables in documents and extract their structure (rows, columns, cells) along with the content in each cell.
For Beginners: Documents often contain tables with important data. Table extraction helps computers understand where tables are, how they're structured, and what data they contain. This is useful for extracting financial data, product catalogs, or any tabular information.
Example usage:
var tables = tableExtractor.DetectTables(documentImage);
foreach (var table in tables)
{
var structure = tableExtractor.RecognizeStructure(table.Image);
Console.WriteLine($"Table with {structure.NumRows} rows and {structure.NumColumns} columns");
}
Properties
SupportsBorderedTables
Gets whether this model supports bordered tables.
bool SupportsBorderedTables { get; }
Property Value
SupportsBorderlessTables
Gets whether this model supports borderless tables.
bool SupportsBorderlessTables { get; }
Property Value
SupportsMergedCells
Gets whether this model can detect merged cells (row/column spans).
bool SupportsMergedCells { get; }
Property Value
Methods
DetectTables(Tensor<T>)
Detects tables in a document image.
IEnumerable<TableRegion<T>> DetectTables(Tensor<T> documentImage)
Parameters
documentImageTensor<T>The document image tensor.
Returns
- IEnumerable<TableRegion<T>>
List of detected table regions with bounding boxes.
ExportTables(Tensor<T>, TableExportFormat)
Exports detected tables to a specific format.
string ExportTables(Tensor<T> documentImage, TableExportFormat format)
Parameters
documentImageTensor<T>The document image tensor.
formatTableExportFormatOutput format (CSV, JSON, HTML, Markdown, Excel).
Returns
- string
Serialized table data in the specified format.
ExtractTableContent(Tensor<T>)
Extracts table content as structured data.
IEnumerable<List<List<string>>> ExtractTableContent(Tensor<T> documentImage)
Parameters
documentImageTensor<T>The document image tensor.
Returns
- IEnumerable<List<List<string>>>
Tables as lists of rows, where each row is a list of cell contents.
Remarks
This is a convenience method that combines table detection, structure recognition, and OCR to return the final table content.
RecognizeStructure(Tensor<T>)
Recognizes the structure of a table (rows, columns, cells).
TableStructureResult<T> RecognizeStructure(Tensor<T> tableImage)
Parameters
tableImageTensor<T>Cropped table image tensor (from DetectTables).
Returns
- TableStructureResult<T>
Table structure with cell positions and spans.