Class VoiceActivityDetectorBase<T>

Namespace
AiDotNet.Audio.VoiceActivity
Assembly
AiDotNet.dll

Base class for algorithmic voice activity detection implementations (non-neural network).

public abstract class VoiceActivityDetectorBase<T> : IVoiceActivityDetector<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
object → VoiceActivityDetectorBase<T>
Implements
IVoiceActivityDetector<T>

Remarks

Voice Activity Detection (VAD) determines whether audio contains speech or silence. This is fundamental to many audio applications including speech recognition, communication systems, and noise reduction.

For Beginners: VAD answers a simple question: "Is someone speaking right now?"

Common uses:

  • Skip silence during transcription
  • Reduce transmission bandwidth in VoIP
  • Trigger recording only when speech is detected
  • Segment audio into speaker turns

This base class provides:

  • Frame-based processing with hangover logic (the speech/silence state persists through brief pauses, so segments aren't split mid-word)
  • Streaming mode with state management
  • Segment detection across entire audio files

For neural network-based VAD (like Silero), see classes that extend AudioNeuralNetworkBase.
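
As a sketch of how a concrete detector plugs into this base class, the following hypothetical subclass implements ComputeFrameProbability with a simple short-term energy heuristic. The EnergyVad name and its RMS-to-probability mapping are illustrative assumptions; only VoiceActivityDetectorBase<T> and the abstract ComputeFrameProbability come from the documented API.

```csharp
using System;

// Hypothetical example: a minimal energy-based VAD built on the base class.
// The energy-to-probability mapping below is a crude placeholder; real
// detectors would use spectral features or a trained model.
public class EnergyVad : VoiceActivityDetectorBase<double>
{
    public EnergyVad(int sampleRate = 16000, int frameSize = 480)
        : base(sampleRate, frameSize)
    {
    }

    protected override double ComputeFrameProbability(double[] frame)
    {
        // Root-mean-square energy of the frame.
        double sumSquares = 0.0;
        for (int i = 0; i < frame.Length; i++)
            sumSquares += frame[i] * frame[i];
        double rms = Math.Sqrt(sumSquares / frame.Length);

        // Squash RMS into the required 0-1 range; the base class then
        // applies the threshold, hangover, and duration logic.
        return Math.Min(1.0, rms * 10.0);
    }
}
```

The base class handles thresholding, hangover, and minimum-duration filtering, so a subclass only needs to score individual frames.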

Constructors

VoiceActivityDetectorBase(int, int, double, int, int)

Initializes a new instance of VoiceActivityDetectorBase.

protected VoiceActivityDetectorBase(int sampleRate = 16000, int frameSize = 480, double threshold = 0.5, int minSpeechDurationMs = 250, int minSilenceDurationMs = 300)

Parameters

sampleRate int

Audio sample rate in Hz (default 16000).

frameSize int

Frame size in samples (default 480, i.e. 30 ms at 16 kHz).

threshold double

Detection threshold between 0 and 1 (default 0.5).

minSpeechDurationMs int

Minimum speech duration in milliseconds (default 250).

minSilenceDurationMs int

Minimum silence duration in milliseconds (default 300).

Fields

NumOps

Numeric operations for type T.

protected readonly INumericOperations<T> NumOps

Field Value

INumericOperations<T>

_inSpeech

Current speech state.

protected bool _inSpeech

Field Value

bool

_silenceFrameCount

Number of consecutive silence frames.

protected int _silenceFrameCount

Field Value

int

_speechFrameCount

Number of consecutive speech frames.

protected int _speechFrameCount

Field Value

int

Properties

FrameSize

Gets the frame size in samples used for detection.

public int FrameSize { get; protected set; }

Property Value

int

MinSilenceDurationMs

Gets or sets the minimum silence duration in milliseconds.

public int MinSilenceDurationMs { get; set; }

Property Value

int

Remarks

Silence gaps shorter than this don't split speech segments.

MinSpeechDurationMs

Gets or sets the minimum speech duration in milliseconds.

public int MinSpeechDurationMs { get; set; }

Property Value

int

Remarks

Speech segments shorter than this are ignored (reduces false triggers).

SampleRate

Gets the sample rate this VAD operates at.

public int SampleRate { get; protected set; }

Property Value

int

Threshold

Gets or sets the detection threshold (0.0 to 1.0).

public double Threshold { get; set; }

Property Value

double

Remarks

Higher threshold = fewer false positives but may miss quiet speech. Lower threshold = catches more speech but may trigger on noise. Default is typically 0.5.

Methods

ComputeFrameProbability(T[])

Computes speech probability for a single frame.

protected abstract T ComputeFrameProbability(T[] frame)

Parameters

frame T[]

Audio frame data.

Returns

T

Speech probability (0-1).

DetectSpeech(Tensor<T>)

Detects whether speech is present in an audio frame.

public virtual bool DetectSpeech(Tensor<T> audioFrame)

Parameters

audioFrame Tensor<T>

Audio frame with shape [samples] or [channels, samples].

Returns

bool

True if speech is detected, false otherwise.

DetectSpeechSegments(Tensor<T>)

Detects speech segments in a longer audio recording.

public virtual IReadOnlyList<(int StartSample, int EndSample)> DetectSpeechSegments(Tensor<T> audio)

Parameters

audio Tensor<T>

Full audio recording.

Returns

IReadOnlyList<(int StartSample, int EndSample)>

List of (startSample, endSample) tuples for each speech segment.

Remarks

For Beginners: This finds all the parts where someone is talking.

Example result for a 10-second recording at 16 kHz: [(8000, 36800), (65600, 108800), (128000, 152000)], meaning speech from 0.5-2.3 s, silence, speech from 4.1-6.8 s, then speech from 8.0-9.5 s. Divide the sample indices by SampleRate to convert them to seconds.
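
A hedged usage sketch: EnergyVad stands in for any concrete subclass of this base class, and LoadAudio is an assumed helper that returns a Tensor<double> of samples (neither name is part of the documented API).

```csharp
using System;

// Hypothetical usage of DetectSpeechSegments on a full recording.
var vad = new EnergyVad(sampleRate: 16000, frameSize: 480);

Tensor<double> audio = LoadAudio("recording.wav"); // assumed helper

foreach (var (start, end) in vad.DetectSpeechSegments(audio))
{
    // Segment boundaries are sample indices; divide by the sample
    // rate to convert them to seconds for display.
    double startSec = start / (double)vad.SampleRate;
    double endSec = end / (double)vad.SampleRate;
    Console.WriteLine($"Speech: {startSec:F2}s - {endSec:F2}s");
}
```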

GetFrameProbabilities(Tensor<T>)

Gets frame-by-frame speech probabilities for the entire audio.

public virtual T[] GetFrameProbabilities(Tensor<T> audio)

Parameters

audio Tensor<T>

Full audio recording.

Returns

T[]

Array of speech probabilities, one per frame.

GetSpeechProbability(Tensor<T>)

Gets the speech probability for an audio frame.

public virtual T GetSpeechProbability(Tensor<T> audioFrame)

Parameters

audioFrame Tensor<T>

Audio frame to analyze.

Returns

T

Probability of speech (0.0 = definitely not speech, 1.0 = definitely speech).

ProcessChunk(Tensor<T>)

Processes audio in streaming mode, maintaining state between calls.

public virtual (bool IsSpeech, T Probability) ProcessChunk(Tensor<T> audioChunk)

Parameters

audioChunk Tensor<T>

A chunk of audio for real-time processing.

Returns

(bool IsSpeech, T Probability)

Speech detection result with probability.

ResetState()

Resets internal state for streaming mode.

public virtual void ResetState()
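
For real-time pipelines, ProcessChunk and ResetState can be combined as sketched below. EnergyVad is a hypothetical concrete subclass, and CaptureAudioChunks and StartOrContinueRecording are assumed application-side functions, not part of the library.

```csharp
// Hypothetical streaming loop: audio chunks arrive from a capture source.
var vad = new EnergyVad(sampleRate: 16000, frameSize: 480);
vad.ResetState(); // start from a clean state before a new stream

foreach (Tensor<double> chunk in CaptureAudioChunks()) // assumed source
{
    // ProcessChunk keeps hangover and duration state between calls.
    var (isSpeech, probability) = vad.ProcessChunk(chunk);
    if (isSpeech)
        StartOrContinueRecording(chunk); // assumed sink
}

// When the stream ends or the audio source changes, reset before reuse
// so leftover state doesn't leak into the next stream.
vad.ResetState();
```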