Table of Contents

Interface IDocumentStore<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Defines the contract for document stores that index and retrieve vectorized documents.

public interface IDocumentStore<T>

Type Parameters

T

The numeric data type used for vector calculations (typically float or double).

Remarks

A document store manages a collection of documents with their vector embeddings, enabling efficient similarity-based retrieval. Implementations can range from simple in-memory storage to distributed vector databases. The interface supports adding documents, similarity search, and metadata-based filtering.

For Beginners: A document store is like a smart library that organizes information by meaning.

Think of it like a special filing cabinet:

  • Regular filing cabinet: Organized alphabetically or by date
  • Document store: Organized by meaning using math

When you search for "climate change", it finds documents about environmental issues even if they don't contain those exact words, because it understands the meaning.

It's like having a librarian who truly understands what each book is about and can find exactly what you need based on your question, not just keywords.

Properties

DocumentCount

Gets the number of documents currently stored in the document store.

int DocumentCount { get; }

Property Value

int

Remarks

For Beginners: This tells you how many documents are in the store, like counting how many books are in a library.

VectorDimension

Gets the dimensionality of the vectors stored in this document store.

int VectorDimension { get; }

Property Value

int

Remarks

All vectors in the store must have the same dimension, which is determined by the embedding model used. This ensures consistent similarity calculations.

For Beginners: This is the size of the number lists representing each document. All documents must use the same size so they can be fairly compared.

Methods

Add(VectorDocument<T>)

Adds a single vectorized document to the store.

void Add(VectorDocument<T> vectorDocument)

Parameters

vectorDocument VectorDocument<T>

Remarks

This method indexes a document along with its vector embedding for later retrieval. If a document with the same ID already exists, the behavior depends on the implementation (typically either update or throw an exception).

For Beginners: This adds a new document to the library.

Like adding a new book:

  • document: The book itself (with its content and info)
  • embedding: A numeric "fingerprint" of what the book is about

The fingerprint lets the system quickly find similar books later.

AddBatch(IEnumerable<VectorDocument<T>>)

Adds multiple vectorized documents to the store in a batch operation.

void AddBatch(IEnumerable<VectorDocument<T>> vectorDocuments)

Parameters

vectorDocuments IEnumerable<VectorDocument<T>>

The documents to add.

Remarks

Batch addition is more efficient than adding documents individually. The embeddings matrix should have dimensions [documentCount, VectorDimension], with each row representing one document's embedding in the same order as the documents enumerable.

For Beginners: This adds many documents at once, which is faster.

Like processing a whole shipment of new books to the library:

  • Instead of cataloging one book at a time
  • You process the entire box together
  • Much more efficient for large collections

Clear()

Removes all documents from the store.

void Clear()

Remarks

For Beginners: This empties the entire store, removing all documents. Use with caution - this cannot be undone!

GetAll()

Gets all documents currently stored in the document store.

IEnumerable<Document<T>> GetAll()

Returns

IEnumerable<Document<T>>

An enumerable of all documents in the store.

Remarks

This method retrieves all documents without any filtering or sorting. Use with caution on large document stores as it may be memory-intensive. For production systems with large document collections, consider using pagination or streaming approaches.

For Beginners: This gets every single document from the store.

Like asking the librarian: "Show me every book in the library"

Careful: If you have millions of documents, this could take a while and use a lot of memory!

GetById(string)

Retrieves a document by its unique identifier.

Document<T>? GetById(string documentId)

Parameters

documentId string

The unique identifier of the document to retrieve.

Returns

Document<T>

The document if found; otherwise, null.

Remarks

For Beginners: This gets a specific document if you know its ID.

Like asking the librarian: "Give me the book with catalog number ABC123"

GetSimilar(Vector<T>, int)

Retrieves the top-k most similar documents to a given query vector.

IEnumerable<Document<T>> GetSimilar(Vector<T> queryVector, int topK)

Parameters

queryVector Vector<T>

The vector to search for similar documents.

topK int

The number of most similar documents to return.

Returns

IEnumerable<Document<T>>

An enumerable of documents ordered by similarity (most similar first), with relevance scores populated.

Remarks

This method performs similarity search to find documents whose vector embeddings are closest to the query vector. The similarity metric (e.g., cosine similarity, Euclidean distance) is implementation-specific. Results are ordered by decreasing similarity/relevance.

For Beginners: This finds documents most similar to your search.

Think of it like asking the librarian: "Find me the 5 books most similar to this topic"

The system:

  • Compares your query's "fingerprint" to all document fingerprints
  • Finds the closest matches using math (measuring distances in number-space)
  • Returns the top matches, ordered from best to worst match

topK = 5 means "give me the 5 best matches"

GetSimilarWithFilters(Vector<T>, int, Dictionary<string, object>)

Retrieves similar documents with additional metadata filtering.

IEnumerable<Document<T>> GetSimilarWithFilters(Vector<T> queryVector, int topK, Dictionary<string, object> metadataFilters)

Parameters

queryVector Vector<T>

The vector to search for similar documents.

topK int

The number of most similar documents to return.

metadataFilters Dictionary<string, object>

Metadata filters to apply before similarity search.

Returns

IEnumerable<Document<T>>

An enumerable of filtered documents ordered by similarity, with relevance scores populated.

Remarks

This method combines similarity search with metadata filtering. First, documents are filtered based on metadata criteria, then similarity search is performed on the remaining candidates. This enables queries like "find similar documents from 2024" or "find similar documents by author X".

For Beginners: This finds similar documents but only from a specific subset.

Think of asking the librarian: "Find me the 5 most relevant books about climate change, but only books published after 2020 and only in the Science section"

The filters narrow down which documents to search:

  • metadata["year"] >= 2020
  • metadata["section"] == "Science"

Then similarity search runs only on documents that pass these filters.

Remove(string)

Removes a document from the store by its identifier.

bool Remove(string documentId)

Parameters

documentId string

The unique identifier of the document to remove.

Returns

bool

True if the document was found and removed; otherwise, false.

Remarks

For Beginners: This removes a document from the store.

Like removing a book from the library catalog - it's no longer searchable.