Class WhisperOptions

Namespace
AiDotNet.Audio.Whisper
Assembly
AiDotNet.dll

Configuration options for the Whisper speech recognition model.

public class WhisperOptions
Inheritance
WhisperOptions
Remarks

Whisper is a speech recognition model developed by OpenAI that can transcribe audio in multiple languages and perform translation.

For Beginners: Whisper comes in several sizes, from tiny to large. Smaller models run faster but are less accurate; larger models are more accurate but slower.

  • Tiny: ~39M parameters, fastest, good for quick transcription
  • Base: ~74M parameters, balanced speed/accuracy
  • Small: ~244M parameters, good accuracy
  • Medium: ~769M parameters, high accuracy
  • Large: ~1.5B parameters, best accuracy, slowest
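
The trade-off above could be expressed in configuration roughly as follows. This is a minimal sketch, not the library's documented usage: it assumes WhisperModelSize lives in the same namespace as WhisperOptions and exposes members named Tiny and Large matching the list above. Check the enum's own reference page for the actual names.

```csharp
using AiDotNet.Audio.Whisper;

// Assumption: WhisperModelSize.Tiny and WhisperModelSize.Large exist with
// these names; only the ModelSize property itself is taken from this page.

// Favors speed: smallest model in the list above.
var quickOptions = new WhisperOptions
{
    ModelSize = WhisperModelSize.Tiny
};

// Favors accuracy: largest model in the list above, at the cost of speed.
var accurateOptions = new WhisperOptions
{
    ModelSize = WhisperModelSize.Large
};
```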

Properties

BeamSize

Gets or sets the beam size for beam search decoding. Higher values keep more candidate transcriptions during decoding, which usually improves results but makes decoding slower.

public int BeamSize { get; set; }

Property Value

int

DecoderModelPath

Gets or sets the path to the decoder ONNX model. If null, the model will be downloaded automatically.

public string? DecoderModelPath { get; set; }

Property Value

string

EncoderModelPath

Gets or sets the path to the encoder ONNX model. If null, the model will be downloaded automatically.

public string? EncoderModelPath { get; set; }

Property Value

string

Language

Gets or sets the language code for transcription (e.g., "en", "es", "fr"). Set to null for automatic language detection.

public string? Language { get; set; }

Property Value

string

MaxAudioLengthSeconds

Gets or sets the maximum length of audio to process in seconds. Whisper processes 30-second chunks.

public int MaxAudioLengthSeconds { get; set; }

Property Value

int
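
To make the chunking arithmetic concrete, here is a small self-contained sketch that estimates how many 30-second chunks a clip needs. The helper name WhisperChunkMath is hypothetical, and the assumption that audio beyond MaxAudioLengthSeconds is simply cut off is ours; this page does not say how the library handles longer input.

```csharp
using System;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class WhisperChunkMath
{
    // Whisper's chunk length in seconds, as stated in the property summary.
    public const int ChunkSeconds = 30;

    // Number of 30-second chunks needed to cover the audio.
    // Assumption: audio past maxAudioLengthSeconds is truncated.
    public static int CountChunks(double audioSeconds, int maxAudioLengthSeconds)
    {
        if (audioSeconds <= 0 || maxAudioLengthSeconds <= 0)
        {
            return 0;
        }

        double processedSeconds = Math.Min(audioSeconds, maxAudioLengthSeconds);
        return (int)Math.Ceiling(processedSeconds / ChunkSeconds);
    }
}
```

Under these assumptions, a 95-second clip with a 300-second limit needs four chunks, while the same clip with a 60-second limit needs two.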

MaxTokens

Gets or sets the maximum number of tokens to generate.

public int MaxTokens { get; set; }

Property Value

int

ModelSize

Gets or sets the model size to use.

public WhisperModelSize ModelSize { get; set; }

Property Value

WhisperModelSize

NumMels

Gets or sets the number of mel filterbank channels. Whisper uses 80 mel channels.

public int NumMels { get; set; }

Property Value

int

OnnxOptions

Gets or sets the ONNX execution options.

public OnnxModelOptions OnnxOptions { get; set; }

Property Value

OnnxModelOptions

ReturnTimestamps

Gets or sets whether to return timestamps with the transcription.

public bool ReturnTimestamps { get; set; }

Property Value

bool

SampleRate

Gets or sets the sample rate expected by the model. Whisper expects 16 kHz audio.

public int SampleRate { get; set; }

Property Value

int
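
Audio recorded at another rate has to be converted to 16 kHz before it matches this expectation. The sketch below shows one simple way to do that with linear interpolation. ResampleSketch is a hypothetical helper, not an AiDotNet API, and this page does not say whether the library resamples on its own. Linear interpolation applies no low-pass filter, so treat it as an illustration rather than a production-quality resampler.

```csharp
using System;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class ResampleSketch
{
    // Converts mono samples from sourceRate to targetRate by linear interpolation.
    public static float[] ResampleLinear(float[] input, int sourceRate, int targetRate = 16000)
    {
        if (sourceRate <= 0 || targetRate <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(sourceRate), "Sample rates must be positive.");
        }

        if (input.Length == 0 || sourceRate == targetRate)
        {
            return (float[])input.Clone();
        }

        int outputLength = (int)((long)input.Length * targetRate / sourceRate);
        var output = new float[outputLength];
        double step = (double)sourceRate / targetRate;

        for (int i = 0; i < outputLength; i++)
        {
            // Position of this output sample on the input timeline.
            double position = i * step;
            int left = (int)position;
            int right = Math.Min(left + 1, input.Length - 1);
            double fraction = position - left;

            output[i] = (float)(input[left] * (1 - fraction) + input[right] * fraction);
        }

        return output;
    }
}
```

For example, one second of 48 kHz audio (48,000 samples) becomes 16,000 samples.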

Temperature

Gets or sets the temperature for sampling. Lower values make output more deterministic.

public double Temperature { get; set; }

Property Value

double
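
The effect of temperature can be illustrated with the standard softmax-with-temperature formula, in which logits are divided by the temperature before being normalized into probabilities. This is a generic sketch of that technique, not AiDotNet's decoder code; TemperatureSketch is a hypothetical name, and how the library treats a temperature of zero is not stated on this page.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class TemperatureSketch
{
    // Turns raw logits into probabilities after scaling them by 1 / temperature.
    public static double[] SoftmaxWithTemperature(double[] logits, double temperature)
    {
        if (temperature <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(temperature), "Temperature must be positive in this sketch.");
        }

        // Subtract the maximum logit first for numerical stability.
        double max = logits.Max();
        double[] scaled = logits.Select(l => Math.Exp((l - max) / temperature)).ToArray();
        double sum = scaled.Sum();

        return scaled.Select(e => e / sum).ToArray();
    }
}
```

With logits of 2, 1, and 0, lowering the temperature from 1.0 to 0.5 shifts more probability onto the highest-scoring token, which is why sampling at lower temperatures is more deterministic.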

Translate

Gets or sets whether to translate the output to English. If true, non-English audio is translated to English instead of being transcribed in its original language.

public bool Translate { get; set; }

Property Value

bool
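
A configuration for translating French speech into English text might look like the following. It uses only the Language and Translate properties documented on this page; how the options object is then passed to a model is not covered here, so that step is left out.

```csharp
using AiDotNet.Audio.Whisper;

var translationOptions = new WhisperOptions
{
    Language = "fr",   // spoken language of the input; null requests auto-detection
    Translate = true   // produce English text rather than a French transcript
};
```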

WordTimestamps

Gets or sets whether to include word-level timestamps.

public bool WordTimestamps { get; set; }

Property Value

bool