Class VectorDocument<T>
Namespace: AiDotNet.RetrievalAugmentedGeneration.Models
Assembly: AiDotNet.dll
Represents a document paired with its vector embedding for storage and retrieval.
public class VectorDocument<T>
Type Parameters
T: The numeric data type used for the vector embedding (typically float or double).
Inheritance: object → VectorDocument<T>
Remarks
A VectorDocument combines a Document with its vector embedding, creating a complete unit ready for indexing in a vector store. The vector embedding captures the semantic meaning of the document's content in a numerical form suitable for similarity calculations.
For Beginners: A VectorDocument is like a book with its catalog card.
Think of it as two pieces working together:
- Document: The actual book (content, title, author, etc.)
- Embedding: The numerical "fingerprint" describing what the book is about
Why combine them? When you add documents to a search system, you need both:
- The vector (for finding similar documents through math)
- The document (for returning the actual content to users)
For example:
- Document: "Climate change affects global temperatures..."
- Embedding: [0.23, -0.45, 0.78, ..., 0.12] (768 numbers)
The system uses the numbers to search, then returns the text.
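The search-then-return flow described above can be sketched in a few lines. This is a minimal illustration, not AiDotNet's implementation: `StoredItem` and `TinyVectorSearch` are hypothetical stand-ins for a vector document and a vector store, and cosine similarity is assumed as the ranking measure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a stored item: the text plus its embedding.
// In AiDotNet this role is played by VectorDocument<T>.
public sealed record StoredItem(string Content, double[] Embedding);

public static class TinyVectorSearch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Higher means more similar.
    // Assumes both vectors are non-zero and have the same length.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Rank the items by their numbers, then return the text of the best match.
    public static string FindBestMatch(IEnumerable<StoredItem> items, double[] query)
    {
        return items
            .OrderByDescending(item => CosineSimilarity(item.Embedding, query))
            .First()
            .Content;
    }
}
```

The embedding is used only for ranking. The caller gets back the content, which is why the two pieces are stored together.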
Constructors
VectorDocument()
Initializes a new instance of the VectorDocument class.
public VectorDocument()
VectorDocument(Document<T>, Vector<T>)
Initializes a new instance of the VectorDocument class with a document and embedding.
public VectorDocument(Document<T> document, Vector<T> embedding)
Parameters
document (Document<T>): The document containing content and metadata.
embedding (Vector<T>): The vector embedding of the document.
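The two constructors suggest two ways to build an instance: pass both parts up front, or create an empty instance and assign the properties afterwards. The sketch below mirrors only the shape documented on this page. `SketchDocument<T>`, `SketchVector<T>`, and `SketchVectorDocument<T>` are hypothetical stand-ins so the example compiles on its own. They are not the AiDotNet types, and how the real `Document<T>` and `Vector<T>` are constructed is not covered here.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins, NOT the AiDotNet types.
public class SketchDocument<T>
{
    public string Content { get; set; } = string.Empty;
}

public class SketchVector<T>
{
    private readonly T[] _values;
    public SketchVector(T[] values) => _values = values;
    public int Length => _values.Length;
}

// Mirrors the documented surface: two constructors and two settable properties.
public class SketchVectorDocument<T>
{
    public SketchVectorDocument() { }

    public SketchVectorDocument(SketchDocument<T> document, SketchVector<T> embedding)
    {
        Document = document;
        Embedding = embedding;
    }

    public SketchDocument<T>? Document { get; set; }
    public SketchVector<T>? Embedding { get; set; }
}

public static class ConstructionDemo
{
    private const string Text = "Climate change affects global temperatures...";

    // Route 1: pass the document and its embedding to the two-argument constructor.
    public static SketchVectorDocument<float> BuildWithArguments()
    {
        var document = new SketchDocument<float> { Content = Text };
        var embedding = new SketchVector<float>(new[] { 0.23f, -0.45f, 0.78f });
        return new SketchVectorDocument<float>(document, embedding);
    }

    // Route 2: use the parameterless constructor, then assign both properties.
    public static SketchVectorDocument<float> BuildWithProperties()
    {
        return new SketchVectorDocument<float>
        {
            Document = new SketchDocument<float> { Content = Text },
            Embedding = new SketchVector<float>(new[] { 0.23f, -0.45f, 0.78f })
        };
    }
}
```

The second route is convenient with object initializers, but in this sketch it leaves a window in which the instance has no document or embedding. The first route avoids that.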
Properties
Document
Gets or sets the document containing the text content and metadata.
public Document<T> Document { get; set; }
Property Value
- Document<T>
Remarks
For Beginners: This is the actual document with all its information. It contains the text, ID, and metadata: everything except the vector.
Embedding
Gets or sets the vector embedding representing the document's semantic meaning.
public Vector<T> Embedding { get; set; }
Property Value
- Vector<T>
Remarks
The embedding is a dense vector that encodes the document's content as numbers. Its dimension must match the output dimension of the embedding model that produced it. Embeddings enable efficient similarity search through measures such as cosine similarity or Euclidean distance.
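One way to enforce the dimension requirement is to check the length before a document is indexed. The helper below is a hypothetical sketch, not part of AiDotNet. It works on a plain `float[]`, and `expectedDimension` is assumed to be known from the embedding model in use (for example, 768).

```csharp
using System;

public static class EmbeddingGuard
{
    // Throws if the embedding length differs from the dimension the index expects.
    public static void EnsureDimension(float[] embedding, int expectedDimension)
    {
        if (embedding is null)
            throw new ArgumentNullException(nameof(embedding));

        if (embedding.Length != expectedDimension)
        {
            throw new ArgumentException(
                $"Embedding has {embedding.Length} values, expected {expectedDimension}.",
                nameof(embedding));
        }
    }
}
```

Failing early at insertion time tends to be easier to diagnose than a length mismatch surfacing later inside a similarity calculation.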
For Beginners: This is the numerical "fingerprint" of the document.
Think of it like a GPS coordinate:
- Just as GPS uses (latitude, longitude) to represent a location
- An embedding uses hundreds of numbers to represent meaning
- Documents with similar meanings have similar numbers (close GPS coordinates)
- Different meanings have different numbers (far apart coordinates)
For example, embeddings for:
- "cat" and "kitten" would be close together (similar meaning)
- "cat" and "democracy" would be far apart (different meaning)
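The "close together" and "far apart" idea can be made concrete with a distance calculation. The sketch below uses Euclidean distance on made-up 3-dimensional vectors. The values are invented for illustration and are not output from any real embedding model, which would typically produce hundreds of dimensions.

```csharp
using System;

public static class EmbeddingDistanceDemo
{
    // Invented toy vectors, for illustration only.
    public static readonly double[] Cat = { 0.90, 0.80, 0.10 };
    public static readonly double[] Kitten = { 0.85, 0.75, 0.20 };
    public static readonly double[] Democracy = { 0.10, 0.20, 0.95 };

    // Euclidean distance: square root of the summed squared differences.
    // Smaller means the two vectors are closer together.
    public static double EuclideanDistance(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.Sqrt(sum);
    }
}
```

With these toy values, `EuclideanDistance(Cat, Kitten)` is much smaller than `EuclideanDistance(Cat, Democracy)`, matching the GPS analogy of nearby and distant coordinates.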