Class AutoTokenizer

Namespace
AiDotNet.Tokenization.HuggingFace
Assembly
AiDotNet.dll

AutoTokenizer provides HuggingFace-style automatic tokenizer loading: it detects the tokenizer type from the model configuration and loads the matching implementation.

public static class AutoTokenizer
Inheritance
AutoTokenizer

Remarks

Usage mirrors the HuggingFace transformers library:

// Load from HuggingFace Hub
var tokenizer = AutoTokenizer.FromPretrained("bert-base-uncased");

// Load from a local directory (distinct variable name so both calls compile in one scope)
var localTokenizer = AutoTokenizer.FromPretrained("./my-model");

Methods

ClearCache(string?, string?)

Clears the cache for a specific model or all models.

public static void ClearCache(string? modelName = null, string? cacheDir = null)

Parameters

modelName string

Optional model name to clear. If null, clears all cached tokenizers.

cacheDir string

Optional cache directory. Uses default if not specified.
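
As a minimal sketch of the two modes described above, the calls below use only the documented signature. The model name and the "./tokenizer-cache" path are illustrative assumptions, and the snippet assumes the AiDotNet package is referenced by the project.

```csharp
using AiDotNet.Tokenization.HuggingFace;

// Clear the cached files for a single model in the default cache directory.
AutoTokenizer.ClearCache("bert-base-uncased");

// Clear every cached tokenizer in a custom cache directory
// ("./tokenizer-cache" is a hypothetical path used for illustration).
AutoTokenizer.ClearCache(modelName: null, cacheDir: "./tokenizer-cache");

// With no arguments, both parameters default to null:
// all models, default cache directory.
AutoTokenizer.ClearCache();
```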

FromPretrained(string, string?)

Loads a tokenizer from a pretrained model name or path.

public static ITokenizer FromPretrained(string modelNameOrPath, string? cacheDir = null)

Parameters

modelNameOrPath string

Either a HuggingFace model name (e.g., "bert-base-uncased", "gpt2") or a local directory path containing tokenizer files.

cacheDir string

Optional cache directory for downloaded files. Defaults to ~/.cache/huggingface/tokenizers.

Returns

ITokenizer

The loaded tokenizer.

Exceptions

ArgumentException

Thrown when modelNameOrPath is empty.

InvalidOperationException

Thrown when the tokenizer cannot be loaded.
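
The two documented exceptions can be handled separately. The following is a sketch rather than a prescribed pattern: it assumes the AiDotNet package is referenced, and that loading a Hub model name may need network access the first time. Only members documented on this page are called on AutoTokenizer; GetType() is the standard System.Object method.

```csharp
using System;
using AiDotNet.Tokenization.HuggingFace;

try
{
    // Either a Hub model name or a local directory path is accepted.
    var tokenizer = AutoTokenizer.FromPretrained("bert-base-uncased");
    Console.WriteLine($"Loaded tokenizer of type {tokenizer.GetType().Name}");
}
catch (ArgumentException ex)
{
    // Documented case: modelNameOrPath was empty.
    Console.Error.WriteLine($"Invalid model name or path: {ex.Message}");
}
catch (InvalidOperationException ex)
{
    // Documented case: the tokenizer could not be loaded.
    Console.Error.WriteLine($"Tokenizer could not be loaded: {ex.Message}");
}
```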

FromPretrainedAsync(string, string?)

Asynchronously loads a tokenizer from a pretrained model name or path.

public static Task<ITokenizer> FromPretrainedAsync(string modelNameOrPath, string? cacheDir = null)

Parameters

modelNameOrPath string

Either a HuggingFace model name (e.g., "bert-base-uncased", "gpt2") or a local directory path containing tokenizer files.

cacheDir string

Optional cache directory for downloaded files. Defaults to ~/.cache/huggingface/tokenizers.

Returns

Task<ITokenizer>

A task whose result is the loaded tokenizer.
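
A minimal sketch of awaiting the asynchronous overload, assuming the AiDotNet package is referenced and a C# version that supports top-level statements (C# 9 or later). The model name is illustrative.

```csharp
using System;
using AiDotNet.Tokenization.HuggingFace;

// Await the load instead of blocking the calling thread.
var tokenizer = await AutoTokenizer.FromPretrainedAsync("gpt2");

Console.WriteLine($"Loaded tokenizer of type {tokenizer.GetType().Name}");
```

This page does not list exceptions for the asynchronous overload, so the sketch makes no assumption about which ones it may throw.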

GetDefaultCacheDir()

Gets the default cache directory for tokenizer files.

public static string GetDefaultCacheDir()

Returns

string

The default cache directory path.

IsCached(string, string?)

Checks if a tokenizer is cached locally.

public static bool IsCached(string modelName, string? cacheDir = null)

Parameters

modelName string

The model name to check.

cacheDir string

Optional cache directory. Uses default if not specified.

Returns

bool

true if the tokenizer is cached; otherwise, false.
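
One possible use, sketched under the assumption that an uncached Hub model triggers a download on first load (this page does not state that explicitly), is to report the cache state before calling FromPretrained. The model name is illustrative.

```csharp
using System;
using AiDotNet.Tokenization.HuggingFace;

const string modelName = "gpt2";

// Check the default cache directory for this model.
if (AutoTokenizer.IsCached(modelName))
{
    Console.WriteLine($"{modelName} is already cached locally.");
}
else
{
    // Assumption: loading an uncached Hub model may require a download.
    Console.WriteLine($"{modelName} is not cached yet.");
}

var tokenizer = AutoTokenizer.FromPretrained(modelName);
```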

ListCachedModels(string?)

Lists all cached tokenizer models.

public static string[] ListCachedModels(string? cacheDir = null)

Parameters

cacheDir string

Optional cache directory. Uses default if not specified.

Returns

string[]

Array of cached model names.
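
The cache helpers can be combined to inspect what is stored locally. The sketch below assumes the AiDotNet package is referenced and uses only the signatures documented on this page.

```csharp
using System;
using AiDotNet.Tokenization.HuggingFace;

// Resolve the directory used when cacheDir is not specified.
string cacheDir = AutoTokenizer.GetDefaultCacheDir();
Console.WriteLine($"Cache directory: {cacheDir}");

// Enumerate the models cached in the default directory.
string[] cachedModels = AutoTokenizer.ListCachedModels();
Console.WriteLine($"{cachedModels.Length} cached model(s):");

foreach (string model in cachedModels)
{
    Console.WriteLine($"  {model}");
}
```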