Class Document<T>

Namespace
AiDotNet.RetrievalAugmentedGeneration.Models
Assembly
AiDotNet.dll

Represents a document with content, metadata, and optional relevance scoring.

public class Document<T>

Type Parameters

T

The numeric data type used for relevance scoring.

Inheritance
Document<T>

Remarks

A document is the fundamental unit of information in a retrieval-augmented generation system. It contains the actual text content, metadata for filtering and tracking, and optional relevance scores assigned during retrieval or reranking processes.

For Beginners: A document is like a file or article in your system.

Think of it like a book entry in a library catalog:

  • Id: The unique catalog number
  • Content: The actual text from the book
  • Metadata: Information about the book (author, date, category, etc.)
  • RelevanceScore: How well this book matches what you're looking for

For example, when you search for "climate change", documents about environmental science get high relevance scores, while documents about sports get low scores.

Constructors

Document()

Initializes a new instance of the Document<T> class.

public Document()

Document(string, string)

Initializes a new instance of the Document<T> class with the specified ID and content.

public Document(string id, string content)

Parameters

id string

The unique identifier for the document.

content string

The text content of the document.

Document(string, string, Dictionary<string, object>)

Initializes a new instance of the Document<T> class with the specified ID, content, and metadata.

public Document(string id, string content, Dictionary<string, object> metadata)

Parameters

id string

The unique identifier for the document.

content string

The text content of the document.

metadata Dictionary<string, object>

Metadata associated with the document.
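
The three constructors above can be exercised as in the following minimal sketch. It assumes the AiDotNet package is referenced and uses double for T. The IDs, text, and metadata keys are hypothetical values chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using AiDotNet.RetrievalAugmentedGeneration.Models;

// Parameterless constructor: fill in the properties afterwards.
var draft = new Document<double>();
draft.Id = "draft-1";
draft.Content = "Text added after construction.";

// Two-argument constructor: ID and content.
var faq = new Document<double>("faq-42", "Returns are accepted within 30 days.");

// Three-argument constructor: ID, content, and metadata.
// The keys "source" and "year" are illustrative, not required by the library.
var metadata = new Dictionary<string, object>
{
    ["source"] = "manual.pdf",
    ["year"] = 2024,
};
var manual = new Document<double>(
    "manual-7",
    "Hold the power button for five seconds.",
    metadata);

Console.WriteLine($"{draft.Id}: {draft.Content}");
Console.WriteLine($"{faq.Id}: {faq.Content}");
Console.WriteLine($"{manual.Id}: {manual.Content}");
```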

Properties

Content

Gets or sets the text content of the document.

public string Content { get; set; }

Property Value

string

Remarks

This is the main textual content that will be searched, retrieved, and used to generate answers. The content can range from a single sentence to multiple paragraphs, depending on the chunking strategy employed.

For Beginners: This is the actual text from the document.

For example:

  • A product description
  • A paragraph from a research paper
  • An answer from a FAQ
  • A section from a technical manual

Embedding

Gets or sets the embedding vector for this document.

public Vector<T>? Embedding { get; set; }

Property Value

Vector<T>

Remarks

The embedding is a dense vector representation of the document's semantic meaning, typically generated by an embedding model. This vector enables semantic similarity search and is used by dense retrievers.
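
To make "semantic similarity" concrete, the sketch below computes cosine similarity, a measure commonly used to compare embeddings. Plain double[] arrays stand in for Vector<T>, whose API is not described on this page. The three-dimensional vectors are made-up values, and CosineSimilarity is a hypothetical helper rather than a library function.

```csharp
using System;

// Hypothetical helper: cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero length to avoid dividing by zero.
static double CosineSimilarity(double[] a, double[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    if (normA == 0 || normB == 0)
        return 0;

    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Made-up embeddings: the query points in nearly the same direction
// as the climate document and in a very different one from the sports document.
double[] query = { 1.0, 0.0, 1.0 };
double[] climateDoc = { 0.9, 0.1, 0.8 };
double[] sportsDoc = { 0.0, 1.0, 0.1 };

double climateScore = CosineSimilarity(query, climateDoc);
double sportsScore = CosineSimilarity(query, sportsDoc);

Console.WriteLine($"climate: {climateScore:F3}, sports: {sportsScore:F3}");
```

A dense retriever would typically rank the climate document first here, because its embedding is closer in direction to the query.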

HasRelevanceScore

Gets or sets whether this document has a relevance score assigned.


public bool HasRelevanceScore { get; set; }

Property Value

bool

Id

Gets or sets the unique identifier for this document.

public string Id { get; set; }

Property Value

string

Remarks

The document ID should be unique within a document collection and persistent across sessions to enable consistent referencing and citation.

For Beginners: This is like a barcode or ISBN that uniquely identifies this document. No two documents should have the same ID.

Metadata

Gets or sets metadata associated with this document.

public Dictionary<string, object> Metadata { get; set; }

Property Value

Dictionary<string, object>

Remarks

Metadata provides additional information about the document that can be used for filtering, categorization, and source attribution. Common metadata includes:

  • Source file or URL
  • Author or creator
  • Creation or modification date
  • Document type or category
  • Section or chapter information

For Beginners: Metadata is information *about* the document, not the content itself.

Think of it like tags on a YouTube video:

  • Title, description, upload date (metadata)
  • The actual video content (stored in the Content property)

Metadata helps you filter documents, like "show me only documents from 2024" or "only documents written by Dr. Smith".
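
The kind of filtering described above might look like the following sketch, which uses LINQ over an in-memory list. It assumes the AiDotNet package is referenced. The metadata keys "year" and "author" and all document values are hypothetical, since the library does not prescribe specific keys on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiDotNet.RetrievalAugmentedGeneration.Models;

var docs = new List<Document<double>>
{
    new Document<double>("a", "Sea levels rose again this year.",
        new Dictionary<string, object> { ["year"] = 2024, ["author"] = "Dr. Smith" }),
    new Document<double>("b", "Ice core samples from the last decade.",
        new Dictionary<string, object> { ["year"] = 2019, ["author"] = "Dr. Smith" }),
    new Document<double>("c", "A survey of coastal flooding.",
        new Dictionary<string, object> { ["year"] = 2024, ["author"] = "Dr. Jones" }),
};

// Metadata values are typed as object, so check both the key and the
// runtime type before comparing.
var from2024 = docs
    .Where(d => d.Metadata.TryGetValue("year", out var y) && y is int year && year == 2024)
    .Select(d => d.Id)
    .ToList();

var bySmith = docs
    .Where(d => d.Metadata.TryGetValue("author", out var a) && a is string s && s == "Dr. Smith")
    .Select(d => d.Id)
    .ToList();

Console.WriteLine(string.Join(", ", from2024));
Console.WriteLine(string.Join(", ", bySmith));
```

Using TryGetValue with a type pattern skips documents that lack the key or store the value under a different type, instead of throwing.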

RelevanceScore

Gets or sets the relevance score assigned to this document by a retriever or reranker.

public T RelevanceScore { get; set; }

Property Value

T

Remarks

The relevance score indicates how well this document matches a query. Higher scores indicate stronger relevance. The score scale and interpretation depend on the retrieval or reranking algorithm used. Use HasRelevanceScore to check if a score has been assigned before accessing this value.

For Beginners: This is like a match percentage showing how relevant this document is.

Think of it like search results from a retriever that scores between 0 and 1:

  • Score 0.95: Almost perfect match, highly relevant
  • Score 0.50: Somewhat relevant
  • Score 0.10: Barely relevant

Check HasRelevanceScore first to confirm that a score has actually been assigned.

Documents with higher scores are more likely to contain the answer to your question.
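
As a rough illustration of how the two properties work together, the sketch below keeps only scored documents and sorts them from most to least relevant. It assumes the AiDotNet package is referenced and uses double for T. The scores are invented, and both properties are set by hand because this page does not say whether assigning RelevanceScore updates HasRelevanceScore automatically.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiDotNet.RetrievalAugmentedGeneration.Models;

var candidates = new List<Document<double>>
{
    new Document<double>("sports-1", "The home team won the final."),
    new Document<double>("climate-1", "Global temperatures are rising."),
    new Document<double>("unscored-1", "This document was never scored."),
};

// Hypothetical scores, as a retriever or reranker might assign them.
candidates[0].RelevanceScore = 0.10;
candidates[0].HasRelevanceScore = true;
candidates[1].RelevanceScore = 0.95;
candidates[1].HasRelevanceScore = true;

// Mark the third document as unscored explicitly rather than relying on a default.
candidates[2].HasRelevanceScore = false;

// Ignore unscored documents, then rank the rest from highest to lowest score.
var ranked = candidates
    .Where(d => d.HasRelevanceScore)
    .OrderByDescending(d => d.RelevanceScore)
    .ToList();

foreach (var doc in ranked)
    Console.WriteLine($"{doc.RelevanceScore:F2}  {doc.Id}");
```

Sorting works directly here because double has a built-in ordering. Code that stays generic over T would need its own way to compare scores.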