Class WhisperTokenizer

Namespace
AiDotNet.Audio.Whisper
Assembly
AiDotNet.dll

Tokenizer for the Whisper speech recognition model.

public class WhisperTokenizer
Inheritance
WhisperTokenizer
Inherited Members

Remarks

Whisper uses a BPE (Byte Pair Encoding) tokenizer extended with special tokens that control transcription behavior (language, task, timestamps).

For Beginners: A tokenizer converts text to numbers (tokens) and back. Whisper's tokenizer has special tokens for:

  • Language codes (to specify which language to transcribe)
  • Task tokens (transcribe vs translate)
  • Timestamp tokens (to mark when stretches of speech start and end, in 0.02-second steps)
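
As a rough illustration of how these special tokens fit together, the sketch below assembles a decoder prompt in the order used by OpenAI's reference Whisper decoder: start of transcript, language, task, and an optional no-timestamps marker. `WhisperPromptSketch` and `BuildPrompt` are hypothetical names introduced only for this example and are not part of AiDotNet. The caller supplies the token IDs, and this page does not state that AiDotNet builds its prompts this way.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class WhisperPromptSketch
{
    // Builds the control-token prefix in the order used by OpenAI's reference
    // Whisper decoder: start of transcript, language, task, and then an
    // optional no-timestamps marker. All IDs are supplied by the caller.
    public static List<long> BuildPrompt(
        int startOfTranscript,
        int languageToken,
        int taskToken,
        int? noTimestampsToken = null)
    {
        var prompt = new List<long> { startOfTranscript, languageToken, taskToken };

        // Append the marker only when timestamps should be suppressed.
        if (noTimestampsToken.HasValue)
        {
            prompt.Add(noTimestampsToken.Value);
        }

        return prompt;
    }
}
```

With a tokenizer instance, the arguments would presumably come from StartOfTranscript, GetLanguageToken("en"), TranscribeToken (or TranslateToken), and NoTimestampsToken.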

Properties

EndOfText

Gets the end-of-text token ID.

public int EndOfText { get; }

Property Value

int

NoSpeechToken

Gets the no-speech token ID.

public int NoSpeechToken { get; }

Property Value

int

NoTimestampsToken

Gets the no-timestamps token ID.

public int NoTimestampsToken { get; }

Property Value

int

StartOfTranscript

Gets the start-of-transcript token ID.

public int StartOfTranscript { get; }

Property Value

int

SupportedLanguages

Gets all supported language codes.

public static IReadOnlyList<string> SupportedLanguages { get; }

Property Value

IReadOnlyList<string>

TranscribeToken

Gets the transcribe task token ID.

public int TranscribeToken { get; }

Property Value

int

TranslateToken

Gets the translate task token ID.

public int TranslateToken { get; }

Property Value

int

Methods

Decode(IEnumerable<long>)

Decodes a sequence of token IDs to text.

public string Decode(IEnumerable<long> tokenIds)

Parameters

tokenIds IEnumerable<long>

The token IDs to decode.

Returns

string

The decoded text.

Remarks

This is a simplified decoder. A full implementation would use the actual BPE vocabulary from the Whisper model.

Encode(string)

Encodes text to token IDs.

public List<long> Encode(string text)

Parameters

text string

The text to encode.

Returns

List<long>

The encoded token IDs.

Remarks

This is a placeholder. A full implementation would use BPE encoding.

GetLanguageToken(string)

Gets the token ID for a language code.

public int GetLanguageToken(string languageCode)

Parameters

languageCode string

Two-letter language code (e.g., "en", "es").

Returns

int

The token ID for the language.

GetTimeFromToken(int)

Converts a timestamp token ID to time in seconds.

public double GetTimeFromToken(int tokenId)

Parameters

tokenId int

The timestamp token ID.

Returns

double

Time in seconds.

GetTimestampToken(double)

Gets the timestamp token ID for a given time in seconds.

public int GetTimestampToken(double timeSeconds)

Parameters

timeSeconds double

Time in seconds (must be a multiple of 0.02).

Returns

int

The timestamp token ID.
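
The arithmetic behind GetTimestampToken and GetTimeFromToken can be sketched as follows, assuming timestamp tokens are consecutive IDs spaced 0.02 seconds apart. `TimestampTokenSketch` and its members are hypothetical names, not AiDotNet API. The base ID 50364 is an assumption taken from OpenAI's original multilingual vocabulary, and the value used by this class may differ. The sketch also rounds to the nearest step, whereas the documented method requires a multiple of 0.02.

```csharp
using System;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class TimestampTokenSketch
{
    // Assumption: ID of the <|0.00|> token in OpenAI's original multilingual
    // vocabulary. The base ID used by WhisperTokenizer may differ.
    public const int AssumedTimestampBegin = 50364;

    // Each consecutive timestamp token advances time by 0.02 seconds.
    public const double SecondsPerStep = 0.02;

    public static int ToToken(double timeSeconds)
    {
        if (timeSeconds < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(timeSeconds));
        }

        // Round rather than truncate, so a quotient that lands just below a
        // whole number because of floating-point error maps to the right step.
        int steps = (int)Math.Round(timeSeconds / SecondsPerStep);
        return AssumedTimestampBegin + steps;
    }

    public static double ToSeconds(int tokenId)
    {
        if (tokenId < AssumedTimestampBegin)
        {
            throw new ArgumentOutOfRangeException(nameof(tokenId));
        }

        return (tokenId - AssumedTimestampBegin) * SecondsPerStep;
    }
}
```

Under these assumptions, a time of 1.5 seconds is 75 steps past the base ID, and converting that token back yields 1.5 seconds again.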

IsSpecialToken(int)

Checks if a token ID is a special token.

public bool IsSpecialToken(int tokenId)

Parameters

tokenId int

The token ID to check.

Returns

bool

True if the token is a special token.

IsTimestampToken(int)

Checks if a token ID is a timestamp token.

public bool IsTimestampToken(int tokenId)

Parameters

tokenId int

The token ID to check.

Returns

bool

True if the token is a timestamp token.
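
One possible use of IsSpecialToken and IsTimestampToken is to drop control tokens from a model's output before decoding it to text. The sketch below takes the two checks as delegates so that it stays self-contained. `TokenFilterSketch` and `KeepTextTokens` are hypothetical names, not AiDotNet API, and this page does not say whether Decode already skips such tokens itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class TokenFilterSketch
{
    // Returns only the IDs that neither predicate flags, preserving order.
    public static List<long> KeepTextTokens(
        IEnumerable<long> tokenIds,
        Func<int, bool> isSpecial,
        Func<int, bool> isTimestamp)
    {
        return tokenIds
            // checked(...) throws if an ID does not fit in an int,
            // instead of silently wrapping around.
            .Where(id => !isSpecial(checked((int)id)) && !isTimestamp(checked((int)id)))
            .ToList();
    }
}
```

With a tokenizer instance, the method groups tokenizer.IsSpecialToken and tokenizer.IsTimestampToken could be passed as the two delegates, and the filtered list handed to Decode.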