Class BTreeIndex
- Namespace
- AiDotNet.RetrievalAugmentedGeneration.Graph
- Assembly
- AiDotNet.dll
Simple file-based index for mapping string keys to file offsets.
public class BTreeIndex : IDisposable
- Inheritance
-
BTreeIndex
- Implements
- Inherited Members
Remarks
This class provides a persistent index structure that maps string keys (e.g., node IDs) to byte offsets in data files. The index is stored on disk and reloaded on restart, enabling fast lookups without scanning entire data files.
For Beginners: Think of this like a book's index at the back.
Without an index:
- To find "photosynthesis", you'd read every page from start to finish
- Very slow for large books (or large data files)
With an index:
- Look up "photosynthesis" → find it's on page 157
- Jump directly to page 157
- Much faster!
This class does the same for graph data:
- Key: "node_alice_001"
- Value: byte offset 45678 in nodes.dat file
- We can jump directly to byte 45678 to read Alice's data
The index itself is stored in a file so it survives application restarts.
Implementation Note: This is a simplified index using a sorted dictionary. For production systems with millions of entries, consider implementing a true B-Tree structure with splitting/merging nodes, or use an embedded database like SQLite or LevelDB.
Constructors
BTreeIndex(string)
Initializes a new instance of the BTreeIndex class.
public BTreeIndex(string indexFilePath)
Parameters
indexFilePathstringThe path to the index file on disk.
Properties
Count
Gets the number of entries in the index.
public int Count { get; }
Property Value
Methods
Add(string, long)
Adds or updates a key-offset mapping in the index.
public void Add(string key, long offset)
Parameters
Remarks
For Beginners: This adds an entry to the index.
Example:
- index.Add("alice", 1024)
- This means: "Alice's data starts at byte 1024 in the file"
If "alice" already exists, it updates to the new offset. The index is marked as "dirty" and will be saved to disk later.
Clear()
Removes all entries from the index.
public void Clear()
Contains(string)
Checks if a key exists in the index.
public bool Contains(string key)
Parameters
keystringThe key to check.
Returns
- bool
True if the key exists; otherwise, false.
Dispose()
Disposes the index, ensuring all changes are saved to disk.
public void Dispose()
Dispose(bool)
Releases resources used by the index.
protected virtual void Dispose(bool disposing)
Parameters
disposingboolTrue if called from Dispose(), false if called from finalizer.
Flush()
Saves the index to disk if it has been modified.
public void Flush()
Remarks
This method writes the index to disk in a simple text format: - Each line: "key|offset" - Example: "alice|1024"
The file is only written if changes have been made (_isDirty flag). This is called automatically by Dispose() to ensure data isn't lost.
For Beginners: This saves the index to a file.
Think of it like saving your work:
- You've been adding entries to the index (in memory)
- Flush() writes everything to disk
- Now the index survives if the program crashes or restarts
Format example (nodes_index.db):
alice|1024
bob|2048
charlie|3072
Get(string)
Retrieves the file offset associated with a key.
public long Get(string key)
Parameters
keystringThe key to look up.
Returns
- long
The byte offset if found; otherwise, -1.
Remarks
For Beginners: This looks up where data is stored.
Example:
- var offset = index.Get("alice")
- Returns: 1024 (meaning Alice's data is at byte 1024)
- Or returns -1 if "alice" is not in the index
GetAllKeys()
Gets all keys in the index.
public IEnumerable<string> GetAllKeys()
Returns
- IEnumerable<string>
Collection of all keys.
Remove(string)
Removes a key from the index.
public bool Remove(string key)
Parameters
keystringThe key to remove.
Returns
- bool
True if the key was found and removed; otherwise, false.
Remarks
For Beginners: This removes an entry from the index.
Example:
- index.Remove("alice")
- Now we can no longer look up where Alice's data is
- The actual data file is NOT modified - just the index
Note: For production systems, you'd typically mark entries as deleted rather than removing them, to avoid fragmenting the data file.