Class WhisperOptions

Namespace: AiDotNet.Audio.Whisper

Assembly: AiDotNet.dll

Configuration options for the Whisper speech recognition model.

public class WhisperOptions

Inheritance: object → WhisperOptions

Inherited Members:

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

Whisper is a speech recognition model developed by OpenAI that can transcribe audio in multiple languages and perform translation.

For Beginners: Whisper comes in five sizes, from tiny to large. Smaller models run faster but are less accurate; larger models are more accurate but slower.

Tiny: ~39M parameters, fastest, good for quick transcription
Base: ~74M parameters, balanced speed/accuracy
Small: ~244M parameters, good accuracy
Medium: ~769M parameters, high accuracy
Large: ~1.5B parameters, best accuracy, slowest
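
As a minimal sketch of the speed/accuracy trade-off described above, the options below pick the smallest model for quick transcription. The WhisperModelSize member name used here (Tiny) is an assumption based on the size names in the list; check the enum's own reference page for the actual members.

```csharp
using AiDotNet.Audio.Whisper;

// Favor speed: smallest model, narrow beam.
// Assumption: WhisperModelSize has a member named Tiny (not confirmed by this page).
var fastOptions = new WhisperOptions
{
    ModelSize = WhisperModelSize.Tiny,
    BeamSize = 1
};
```

Switching ModelSize to a larger member trades speed for accuracy without changing any other setting.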

Properties

BeamSize

Gets or sets the beam size for beam search decoding. Higher values explore more candidate transcriptions, which usually improves accuracy at the cost of speed.

public int BeamSize { get; set; }

Property Value

int

DecoderModelPath

Gets or sets the path to the decoder ONNX model. If null, the model will be downloaded automatically.

public string? DecoderModelPath { get; set; }

Property Value

string

EncoderModelPath

Gets or sets the path to the encoder ONNX model. If null, the model will be downloaded automatically.

public string? EncoderModelPath { get; set; }

Property Value

string
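
The following sketch shows how the two model path properties might be set together to use local ONNX files instead of the automatic download. The file paths are hypothetical placeholders, not names the library requires.

```csharp
using AiDotNet.Audio.Whisper;

// Hypothetical local paths; replace with the location of your own ONNX files.
var localOptions = new WhisperOptions
{
    EncoderModelPath = "models/whisper/encoder.onnx",
    DecoderModelPath = "models/whisper/decoder.onnx"
};

// Leaving either property null falls back to the automatic download
// described in the property summaries.
```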

Language

Gets or sets the language code for transcription (e.g., "en", "es", "fr"). Null for auto-detection.

public string? Language { get; set; }

Property Value

string

MaxAudioLengthSeconds

Gets or sets the maximum length of audio to process, in seconds. Whisper processes audio in 30-second chunks.

public int MaxAudioLengthSeconds { get; set; }

Property Value

int
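
To make the 30-second chunking concrete, here is a small stand-alone calculation. It uses only the figures stated on this page (30-second chunks, 16 kHz audio); the 95-second input is a made-up example, and the assumption that a partial final chunk still counts as one chunk is mine, not something this page confirms about the library.

```csharp
using System;

// Figures stated in the property descriptions on this page.
const int ChunkSeconds = 30;   // Whisper processes 30-second chunks
const int SampleRate = 16000;  // Whisper expects 16 kHz audio

// Hypothetical input: a 95-second recording.
int audioSeconds = 95;

// Ceiling division: a partial final chunk is counted as a full chunk (assumption).
int chunkCount = (audioSeconds + ChunkSeconds - 1) / ChunkSeconds;

// Number of audio samples in one full chunk.
int samplesPerChunk = ChunkSeconds * SampleRate;

Console.WriteLine($"{chunkCount} chunks of {samplesPerChunk} samples each");
// prints "4 chunks of 480000 samples each"
```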

MaxTokens

Gets or sets the maximum number of tokens to generate.

public int MaxTokens { get; set; }

Property Value

int

ModelSize

Gets or sets the model size to use.

public WhisperModelSize ModelSize { get; set; }

Property Value

WhisperModelSize

NumMels

Gets or sets the number of mel filterbank channels. Whisper uses 80 mel channels.

public int NumMels { get; set; }

Property Value

int

OnnxOptions

Gets or sets the ONNX execution options.

public OnnxModelOptions OnnxOptions { get; set; }

Property Value

OnnxModelOptions

ReturnTimestamps

Gets or sets whether to return timestamps with the transcription.

public bool ReturnTimestamps { get; set; }

Property Value

bool

SampleRate

Gets or sets the sample rate expected by the model. Whisper expects 16 kHz audio.

public int SampleRate { get; set; }

Property Value

int

Temperature

Gets or sets the temperature for sampling. Lower values make output more deterministic.

public double Temperature { get; set; }

Property Value

double
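
As an illustration of the decoding settings, the sketch below combines BeamSize and Temperature for output that is as repeatable as possible. The specific values are illustrative choices, not documented defaults or recommendations from the library.

```csharp
using AiDotNet.Audio.Whisper;

// Illustrative values only: a wider beam with temperature 0 to keep
// output as deterministic as the model allows.
var stableOptions = new WhisperOptions
{
    BeamSize = 5,
    Temperature = 0.0
};
```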

Translate

Gets or sets whether to translate to English. If true, non-English audio will be translated to English.

public bool Translate { get; set; }

Property Value

bool
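
A short sketch of how Language and Translate might be used together: the audio is declared as Spanish and the output is requested in English. Only the property names come from this page; the choice of "es" is an example.

```csharp
using AiDotNet.Audio.Whisper;

// Spanish audio in, English text out.
var translateOptions = new WhisperOptions
{
    Language = "es",   // set to null to let the model auto-detect the language
    Translate = true
};
```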

WordTimestamps

Gets or sets whether to include word-level timestamps.

public bool WordTimestamps { get; set; }

Property Value

bool
