Class PhonemeTokenizer

Namespace
AiDotNet.Tokenization.Specialized
Assembly
AiDotNet.dll

Phoneme-based tokenizer for speech synthesis (TTS) applications.

public class PhonemeTokenizer : TokenizerBase, ITokenizer
Inheritance
TokenizerBase
PhonemeTokenizer
Implements
ITokenizer

Constructors

PhonemeTokenizer(IVocabulary, Dictionary<string, string>, SpecialTokens, PhonemeSet)

Creates a new phoneme tokenizer.

public PhonemeTokenizer(IVocabulary vocabulary, Dictionary<string, string> g2pRules, SpecialTokens specialTokens, PhonemeTokenizer.PhonemeSet phonemeSet = PhonemeSet.IPA)

Parameters

vocabulary IVocabulary
g2pRules Dictionary<string, string>
Grapheme-to-phoneme (G2P) rules.
specialTokens SpecialTokens
phonemeSet PhonemeTokenizer.PhonemeSet
Optional. Defaults to PhonemeSet.IPA.
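
This page does not document the format of g2pRules, so the following is only a minimal, self-contained sketch of one plausible reading: keys are grapheme sequences and values are phoneme symbols, applied by greedy longest match. The G2pSketch class, its Rules table, and the Apply method are hypothetical names introduced for illustration. They are not part of AiDotNet and may not reflect how PhonemeTokenizer applies its rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class G2pSketch
{
    // Hypothetical rules: grapheme sequence -> ARPAbet symbol.
    // This is an illustrative table, not AiDotNet's built-in rule set.
    public static readonly Dictionary<string, string> Rules = new Dictionary<string, string>
    {
        ["sh"] = "SH",
        ["ch"] = "CH",
        ["s"] = "S",
        ["h"] = "HH",
        ["i"] = "IH",
        ["p"] = "P",
        ["c"] = "K",
    };

    // Greedy longest-match lookup (an assumed strategy): at each position,
    // try the longest grapheme sequence that has a rule, then shorter ones.
    // Expects a non-empty rule table.
    public static List<string> Apply(string word, IReadOnlyDictionary<string, string> rules)
    {
        var phonemes = new List<string>();
        int maxLen = rules.Keys.Max(k => k.Length);
        string lower = word.ToLowerInvariant();
        int i = 0;

        while (i < lower.Length)
        {
            bool matched = false;
            for (int len = Math.Min(maxLen, lower.Length - i); len >= 1; len--)
            {
                if (rules.TryGetValue(lower.Substring(i, len), out var phoneme))
                {
                    phonemes.Add(phoneme);
                    i += len;
                    matched = true;
                    break;
                }
            }

            // Characters without any matching rule are skipped in this sketch.
            if (!matched)
            {
                i++;
            }
        }

        return phonemes;
    }
}
```

With this table, G2pSketch.Apply("ship", G2pSketch.Rules) yields SH, IH, P, because the two-character rule "sh" is tried before the single-character rules "s" and "h".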

Methods

CleanupTokens(List<string>)

Cleans up tokens and converts them back to text. Overrides the method that TokenizerBase requires derived classes to implement.

protected override string CleanupTokens(List<string> tokens)

Parameters

tokens List<string>

Returns

string

CreateARPAbet(SpecialTokens?)

Creates a phoneme tokenizer with ARPAbet phonemes.

public static PhonemeTokenizer CreateARPAbet(SpecialTokens? specialTokens = null)

Parameters

specialTokens SpecialTokens?
Optional. Defaults to null.

Returns

PhonemeTokenizer

Tokenize(string)

Tokenizes text into phonemes.

public override List<string> Tokenize(string text)

Parameters

text string

Returns

List<string>
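
A short usage sketch that combines the CreateARPAbet factory with Tokenize, using only the signatures listed on this page. It assumes the project references the AiDotNet assembly. The PhonemeTokenizerExample class name and the input string are illustrative, and the exact phonemes returned depend on the library's built-in rules, so no output is shown.

```csharp
using System;
using System.Collections.Generic;
using AiDotNet.Tokenization.Specialized;

public static class PhonemeTokenizerExample
{
    public static void Main()
    {
        // Factory documented above; omitting specialTokens passes the default (null).
        PhonemeTokenizer tokenizer = PhonemeTokenizer.CreateARPAbet();

        // Tokenize(string) returns the phoneme tokens as a List<string>.
        List<string> phonemes = tokenizer.Tokenize("hello world");

        // Print the tokens separated by spaces for inspection.
        Console.WriteLine(string.Join(" ", phonemes));
    }
}
```

To use the IPA phoneme set or custom rules instead, call the constructor directly and supply a vocabulary, G2P rules, and special tokens.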