Namespace AiDotNet.Tokenization.Algorithms
Classes
- BpeTokenizer
Byte-Pair Encoding (BPE) tokenizer implementation for subword tokenization.
- CharacterTokenizer
Character-level tokenizer that splits text into individual characters. Useful for character-based language models and some RNN architectures.
- SentencePieceTokenizer
SentencePiece tokenizer implementation using the Unigram language model. Used for multilingual models and language-agnostic tokenization.
- UnigramTokenizer
Unigram Language Model tokenizer using probabilistic segmentation.
- WordPieceTokenizer
WordPiece tokenizer implementation. Used by BERT and similar models.
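
As a rough illustration of the subword idea behind WordPiece, the sketch below shows greedy longest-match-first segmentation of a single word, written in Python for brevity. It is not AiDotNet's implementation or API: the function name `wordpiece_tokenize`, the toy vocabulary, and the parameter names are hypothetical, and the `##` continuation prefix and `[UNK]` token are assumed from BERT's convention.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first segmentation of one word (illustrative only)."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that do not begin the word carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry matches at this position: the whole word is unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, not taken from any real model.
vocab = {"un", "##aff", "##able", "play", "##ing"}

print(wordpiece_tokenize("unaffable", vocab))
print(wordpiece_tokenize("playing", vocab))
print(wordpiece_tokenize("xyz", vocab))
```

In this sketch a word with no matching piece at some position collapses to a single unknown token instead of being split partially; a real tokenizer may handle that case differently.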