Class Document<T>
- Namespace: AiDotNet.RetrievalAugmentedGeneration.Models
- Assembly: AiDotNet.dll
Represents a document with content, metadata, and optional relevance scoring.
public class Document<T>
Type Parameters
- T: The numeric data type used for relevance scoring.
- Inheritance: object → Document<T>
Remarks
A document is the fundamental unit of information in a retrieval-augmented generation system. It contains the actual text content, metadata for filtering and tracking, and optional relevance scores assigned during retrieval or reranking processes.
For Beginners: A document is like a file or article in your system.
Think of it like a book entry in a library catalog:
- Id: The unique catalog number
- Content: The actual text from the book
- Metadata: Information about the book (author, date, category, etc.)
- RelevanceScore: How well this book matches what you're looking for
For example, when you search for "climate change", documents about environmental science get high relevance scores, while documents about sports get low scores.
Constructors
Document()
Initializes a new instance of the Document class.
public Document()
Document(string, string)
Initializes a new instance of the Document class with specified content.
public Document(string id, string content)
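Parameters
- id (string): The unique identifier for the document.
- content (string): The text content of the document.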
Document(string, string, Dictionary<string, object>)
Initializes a new instance of the Document class with content and metadata.
public Document(string id, string content, Dictionary<string, object> metadata)
Parameters
- id (string): The unique identifier for the document.
- content (string): The text content of the document.
- metadata (Dictionary<string, object>): Metadata associated with the document.
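The three constructors above can be sketched as follows. This is a minimal illustration, assuming the AiDotNet package is referenced and that each constructor stores its arguments in the matching properties. This page does not state what the parameterless constructor assigns by default, so the sketch sets every property explicitly in that case. The IDs, text, and metadata keys are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using AiDotNet.RetrievalAugmentedGeneration.Models;

// Parameterless constructor: assign the properties explicitly afterwards.
var empty = new Document<double>
{
    Id = "doc-001",
    Content = "Solar panels convert sunlight into electricity.",
    Metadata = new Dictionary<string, object>()
};

// Id and content only.
var basic = new Document<double>(
    "doc-002",
    "Wind turbines convert kinetic energy into power.");

// Id, content, and metadata.
var tagged = new Document<double>(
    "doc-003",
    "Heat pumps move heat rather than generate it.",
    new Dictionary<string, object>
    {
        ["source"] = "energy-handbook.pdf", // hypothetical key and value
        ["year"] = 2024                      // hypothetical key and value
    });

Console.WriteLine($"{empty.Id}, {basic.Id}, {tagged.Id}");
```

The type argument `double` is only one possible choice for T; any numeric type supported by the library should work the same way.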
Properties
Content
Gets or sets the text content of the document.
public string Content { get; set; }
Property Value
- string
Remarks
This is the main textual content that will be searched, retrieved, and used to generate answers. The content can range from a single sentence to multiple paragraphs, depending on the chunking strategy employed.
For Beginners: This is the actual text from the document.
For example:
- A product description
- A paragraph from a research paper
- An answer from a FAQ
- A section from a technical manual
Embedding
Gets or sets the embedding vector for this document.
public Vector<T>? Embedding { get; set; }
Property Value
- Vector<T>
Remarks
The embedding is a dense vector representation of the document's semantic meaning, typically generated by an embedding model. This vector enables semantic similarity search and is used by dense retrievers.
HasRelevanceScore
Gets or sets whether this document has a relevance score assigned.
public bool HasRelevanceScore { get; set; }
Property Value
- bool
Id
Gets or sets the unique identifier for this document.
public string Id { get; set; }
Property Value
- string
Remarks
The document ID should be unique within a document collection and persistent across sessions to enable consistent referencing and citation.
For Beginners: This is like a barcode or ISBN that uniquely identifies this document. No two documents should have the same ID.
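The class itself is not documented as enforcing uniqueness, so a collection may need its own check. The sketch below shows one way to find reused IDs with LINQ, assuming the AiDotNet package is referenced and that the two-argument constructor stores `id` in the Id property. The sample IDs and text are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiDotNet.RetrievalAugmentedGeneration.Models;

var docs = new List<Document<double>>
{
    new Document<double>("faq-1", "How do I reset my password?"),
    new Document<double>("faq-2", "How do I change my email address?"),
    new Document<double>("faq-1", "Duplicate entry with a reused ID.")
};

// Group by Id and keep only the IDs that occur more than once.
var duplicateIds = docs
    .GroupBy(d => d.Id)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .ToList();

foreach (var id in duplicateIds)
{
    Console.WriteLine($"Duplicate document ID: {id}");
}
```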
Metadata
Gets or sets metadata associated with this document.
public Dictionary<string, object> Metadata { get; set; }
Property Value
- Dictionary<string, object>
Remarks
Metadata provides additional information about the document that can be used for filtering, categorization, and source attribution. Common metadata includes:
- Source file or URL
- Author or creator
- Creation or modification date
- Document type or category
- Section or chapter information
For Beginners: Metadata is information *about* the document, not the content itself.
Think of it like tags on a YouTube video:
- Title, description, upload date (metadata)
- The actual video content (stored in Content property)
Metadata helps you filter documents, like "show me only documents from 2024" or "only documents written by Dr. Smith".
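The two filters mentioned above can be sketched with LINQ. This is an illustration only, assuming the AiDotNet package is referenced and that the three-argument constructor stores the dictionary in the Metadata property. The keys `year` and `author` are hypothetical: the library does not prescribe any key names on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiDotNet.RetrievalAugmentedGeneration.Models;

var docs = new List<Document<double>>
{
    new Document<double>("a", "Report on sea level rise.",
        new Dictionary<string, object> { ["year"] = 2024, ["author"] = "Dr. Smith" }),
    new Document<double>("b", "Older survey of glacier data.",
        new Dictionary<string, object> { ["year"] = 2019, ["author"] = "Dr. Jones" }),
    new Document<double>("c", "Notes without a year.",
        new Dictionary<string, object> { ["author"] = "Dr. Smith" })
};

// Values are stored as object, so check both presence and runtime type.
var from2024 = docs
    .Where(d => d.Metadata.TryGetValue("year", out var y)
                && y is int year && year == 2024)
    .ToList();

var bySmith = docs
    .Where(d => d.Metadata.TryGetValue("author", out var a)
                && a is string name && name == "Dr. Smith")
    .ToList();

Console.WriteLine($"From 2024: {from2024.Count}, by Dr. Smith: {bySmith.Count}");
```

Using TryGetValue with a type pattern avoids both a KeyNotFoundException for documents that lack the key and an InvalidCastException when a value has an unexpected type.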
RelevanceScore
Gets or sets the relevance score assigned to this document by a retriever or reranker.
public T RelevanceScore { get; set; }
Property Value
- T
Remarks
The relevance score indicates how well this document matches a query. Higher scores indicate stronger relevance. The score scale and interpretation depend on the retrieval or reranking algorithm used. Use HasRelevanceScore to check if a score has been assigned before accessing this value.
For Beginners: This is like a match percentage showing how relevant this document is.
Think of it like search results (assuming a retriever that produces scores between 0 and 1):
- Score 0.95: Almost perfect match, highly relevant
- Score 0.50: Somewhat relevant
- Score 0.10: Barely relevant
Check HasRelevanceScore first to confirm that the document has actually been scored.
Documents with higher scores are more likely to contain the answer to your question.
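The check-then-read pattern can be sketched as follows, assuming the AiDotNet package is referenced. The scores are assigned by hand here to stand in for a retriever or reranker. Both RelevanceScore and HasRelevanceScore are set explicitly, because this page does not say whether assigning a score updates the flag automatically.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiDotNet.RetrievalAugmentedGeneration.Models;

var docs = new List<Document<double>>
{
    new Document<double>("climate", "Rising CO2 levels drive climate change."),
    new Document<double>("sports", "The home team won the final."),
    new Document<double>("unscored", "This document was never scored.")
};

// Stand-in for a retriever: assign a score and mark the document as scored.
docs[0].RelevanceScore = 0.95;
docs[0].HasRelevanceScore = true;
docs[1].RelevanceScore = 0.10;
docs[1].HasRelevanceScore = true;
docs[2].HasRelevanceScore = false; // explicit, so the sketch does not rely on defaults

// Skip unscored documents, then rank the rest from most to least relevant.
var ranked = docs
    .Where(d => d.HasRelevanceScore)
    .OrderByDescending(d => d.RelevanceScore)
    .ToList();

foreach (var doc in ranked)
{
    Console.WriteLine($"{doc.Id}: {doc.RelevanceScore:F2}");
}
```

Filtering on HasRelevanceScore before sorting keeps unscored documents from being ranked by whatever value RelevanceScore happens to hold.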