
Interface IVocabulary

Namespace
AiDotNet.Tokenization.Interfaces
Assembly
AiDotNet.dll

Interface for vocabulary management: a bidirectional mapping between string tokens and integer token IDs.

public interface IVocabulary
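The interface does not prescribe a storage strategy. Below is a hypothetical dictionary-backed implementation, `DictionaryVocabulary`, written against a local copy of the documented signatures so that it compiles without AiDotNet.dll. It is not AiDotNet's own class. Three behaviors are assumptions rather than documented guarantees:

- IDs are assigned densely in insertion order.
- Adding an existing token returns its current ID.
- A reserved `<unk>` token with ID 0 backs the unknown-token fallback of `GetTokenId`.

```csharp
#nullable enable
using System.Collections.Generic;

// Local copy of the documented IVocabulary members so the sketch compiles
// without referencing AiDotNet.dll.
public interface IVocabulary
{
    IReadOnlyDictionary<int, string> IdToToken { get; }
    int Size { get; }
    IReadOnlyDictionary<string, int> TokenToId { get; }
    int AddToken(string token);
    void AddTokens(IEnumerable<string> tokens);
    void Clear();
    bool ContainsId(int id);
    bool ContainsToken(string token);
    IEnumerable<string> GetAllTokens();
    string? GetToken(int id);
    int GetTokenId(string token);
}

// Hypothetical dictionary-backed implementation (not AiDotNet's own class).
public sealed class DictionaryVocabulary : IVocabulary
{
    private readonly Dictionary<string, int> _tokenToId = new();
    private readonly Dictionary<int, string> _idToToken = new();
    private readonly string _unknownToken;

    // Assumption: a reserved unknown token is registered first, so it gets ID 0.
    public DictionaryVocabulary(string unknownToken = "<unk>")
    {
        _unknownToken = unknownToken;
        AddToken(unknownToken);
    }

    public IReadOnlyDictionary<int, string> IdToToken => _idToToken;
    public int Size => _tokenToId.Count;
    public IReadOnlyDictionary<string, int> TokenToId => _tokenToId;

    public int AddToken(string token)
    {
        // Re-adding a known token returns its existing ID instead of a new one.
        if (_tokenToId.TryGetValue(token, out int existing))
            return existing;

        int id = _tokenToId.Count; // dense IDs in insertion order: 0, 1, 2, ...
        _tokenToId[token] = id;
        _idToToken[id] = token;
        return id;
    }

    public void AddTokens(IEnumerable<string> tokens)
    {
        foreach (string token in tokens)
            AddToken(token);
    }

    public void Clear()
    {
        _tokenToId.Clear();
        _idToToken.Clear();
        // Assumption: re-register the unknown token so GetTokenId keeps a fallback ID.
        AddToken(_unknownToken);
    }

    public bool ContainsId(int id) => _idToToken.ContainsKey(id);

    public bool ContainsToken(string token) => _tokenToId.ContainsKey(token);

    // Yields tokens in ID order; this relies on the IDs being dense.
    public IEnumerable<string> GetAllTokens()
    {
        for (int id = 0; id < _idToToken.Count; id++)
            yield return _idToToken[id];
    }

    // Returns null for an unknown ID, matching the documented contract.
    public string? GetToken(int id) =>
        _idToToken.TryGetValue(id, out string? token) ? token : null;

    // Falls back to the unknown token's ID when the token is missing.
    public int GetTokenId(string token) =>
        _tokenToId.TryGetValue(token, out int id) ? id : _tokenToId[_unknownToken];
}
```

Here is an example of the sketch in use. After `var vocab = new DictionaryVocabulary(); vocab.AddTokens(new[] { "hello", "world" });`, the call `vocab.GetTokenId("hello")` returns 1, because ID 0 is already taken by `<unk>`. A token that was never added maps to 0.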

Properties

IdToToken

Gets the ID-to-token mapping.

IReadOnlyDictionary<int, string> IdToToken { get; }

Property Value

IReadOnlyDictionary<int, string>

Size

Gets the vocabulary size, that is, the number of tokens in the vocabulary.

int Size { get; }

Property Value

int

TokenToId

Gets the token-to-ID mapping.

IReadOnlyDictionary<string, int> TokenToId { get; }

Property Value

IReadOnlyDictionary<string, int>
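The reference does not state how the three properties relate to each other. The helper below assumes that `TokenToId` and `IdToToken` are inverse views of the same vocabulary and that `Size` counts its entries. Under those assumptions, a consistency check could look like the hypothetical `AreConsistent`. With a real `IVocabulary` you would call it as `VocabularyInvariants.AreConsistent(vocab.TokenToId, vocab.IdToToken, vocab.Size)`.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical helper: checks invariants one would expect between the three
// documented properties (assumed, not stated by the reference).
public static class VocabularyInvariants
{
    public static bool AreConsistent(
        IReadOnlyDictionary<string, int> tokenToId,
        IReadOnlyDictionary<int, string> idToToken,
        int size)
    {
        // Both maps should hold exactly Size entries.
        if (tokenToId.Count != size || idToToken.Count != size)
            return false;

        // Every token must round-trip: token -> ID -> same token. Together with
        // the equal counts above, this makes the two maps exact inverses.
        foreach (KeyValuePair<string, int> pair in tokenToId)
        {
            if (!idToToken.TryGetValue(pair.Value, out string? token) || token != pair.Key)
                return false;
        }
        return true;
    }
}
```

Checking only one direction is enough. If two tokens shared an ID, `IdToToken` could not map that ID back to both of them. So the IDs are distinct, and with equal counts they cover every key of `IdToToken`.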

Methods

AddToken(string)

Adds a token to the vocabulary.

int AddToken(string token)

Parameters

token string

The token to add.

Returns

int

The token ID.

AddTokens(IEnumerable<string>)

Adds multiple tokens to the vocabulary.

void AddTokens(IEnumerable<string> tokens)

Parameters

tokens IEnumerable<string>

The tokens to add.
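`AddTokens` accepts any `IEnumerable<string>`, so the caller decides which tokens enter the vocabulary and in what order. One common approach is sketched below as an assumption rather than anything AiDotNet prescribes: keep the words that occur at least `minCount` times, most frequent first, and pass the resulting list to `AddTokens`. The class and method names (`VocabularyBuilder.SelectFrequentTokens`) and the split-on-spaces rule are illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VocabularyBuilder
{
    // Hypothetical helper: picks the tokens to hand to IVocabulary.AddTokens.
    // Splits each line on spaces, keeps words seen at least minCount times, and
    // orders them by descending frequency. Ties are broken by ordinal string
    // order so the resulting IDs are deterministic.
    public static List<string> SelectFrequentTokens(IEnumerable<string> lines, int minCount)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (string line in lines)
        {
            foreach (string word in line.Split(' ', StringSplitOptions.RemoveEmptyEntries))
            {
                counts.TryGetValue(word, out int n); // n is 0 for a new word
                counts[word] = n + 1;
            }
        }

        return counts
            .Where(pair => pair.Value >= minCount)
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .Select(pair => pair.Key)
            .ToList();
    }
}
```

With a real vocabulary, the result would be passed on as `vocab.AddTokens(VocabularyBuilder.SelectFrequentTokens(lines, minCount: 2));`.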

Clear()

Clears the vocabulary.

void Clear()

ContainsId(int)

Checks if a token ID exists in the vocabulary.

bool ContainsId(int id)

Parameters

id int

The token ID to check.

Returns

bool

True if the token ID exists, false otherwise.

ContainsToken(string)

Checks if a token exists in the vocabulary.

bool ContainsToken(string token)

Parameters

token string

The token to check.

Returns

bool

True if the token exists, false otherwise.

GetAllTokens()

Gets all tokens in the vocabulary.

IEnumerable<string> GetAllTokens()

Returns

IEnumerable<string>

All tokens.

GetToken(int)

Gets the token for a given token ID.

string? GetToken(int id)

Parameters

id int

The token ID to look up.

Returns

string?

The token, or null if not found.
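`GetToken` returns `null` for an unknown ID rather than throwing, so a decoder has to decide what to do with such IDs. The hypothetical `Decode` helper below takes the lookup as a `Func<int, string?>`, which means a real vocabulary can be passed as `vocab.GetToken`. Two behaviors are assumptions suited to word-level vocabularies, not AiDotNet behavior: unknown IDs are skipped, and tokens are joined with single spaces. Subword vocabularies would need their own joining rule.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class TokenDecoding
{
    // Hypothetical helper: maps IDs back to tokens via a GetToken-style lookup.
    // IDs for which the lookup returns null are skipped; the remaining tokens
    // are joined with single spaces.
    public static string Decode(IEnumerable<int> ids, Func<int, string?> getToken)
    {
        var parts = new List<string>();
        foreach (int id in ids)
        {
            string? token = getToken(id);
            if (token is not null)
                parts.Add(token);
        }
        return string.Join(" ", parts);
    }
}
```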

GetTokenId(string)

Gets the token ID for a given token.

int GetTokenId(string token)

Parameters

token string

The token to look up.

Returns

int

The token ID, or the ID of the unknown token if the token is not in the vocabulary.
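`GetTokenId` maps every missing token to the unknown-token ID, so the returned IDs alone do not reveal which inputs were out of vocabulary. Pairing it with `ContainsToken` keeps that information. The hypothetical `Encode` helper below takes both lookups as delegates, so a real vocabulary can be passed as `TokenEncoding.Encode(tokens, vocab.GetTokenId, vocab.ContainsToken, out var missing)`. The helper name and the out-parameter design are illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class TokenEncoding
{
    // Hypothetical helper: converts tokens to IDs with a GetTokenId-style lookup
    // and records which tokens were absent according to a ContainsToken-style check.
    public static List<int> Encode(
        IEnumerable<string> tokens,
        Func<string, int> getTokenId,
        Func<string, bool> containsToken,
        out List<string> outOfVocabulary)
    {
        var ids = new List<int>();
        outOfVocabulary = new List<string>();
        foreach (string token in tokens)
        {
            if (!containsToken(token))
                outOfVocabulary.Add(token); // still encoded, as the unknown-token ID
            ids.Add(getTokenId(token));
        }
        return ids;
    }
}
```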