Class EnergyBasedVad<T>
- Namespace: AiDotNet.Audio.VoiceActivity
- Assembly: AiDotNet.dll
Simple energy-based voice activity detector (algorithmic, no neural network).
public class EnergyBasedVad<T> : VoiceActivityDetectorBase<T>, IVoiceActivityDetector<T>
Type Parameters
T: The numeric type used for calculations.
- Inheritance: VoiceActivityDetectorBase<T> → EnergyBasedVad<T>
- Implements: IVoiceActivityDetector<T>
Remarks
This is a basic VAD that detects speech based on signal energy (loudness). For more robust detection, it combines multiple features:
- Short-time energy
- Zero-crossing rate
- Spectral flatness
For Beginners: This is the simplest type of VAD:
Basic idea: Speech is louder than silence!
- Compute the "energy" (sum of squared samples) for each frame
- If energy exceeds a threshold, it's probably speech
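The basic rule above can be sketched in a few lines. This is a minimal illustration, not the class's implementation: it uses `double` samples instead of the generic `T`, divides the sum of squares by the frame length (a common variant that makes the value independent of frame size), and the class name `EnergyRuleSketch` and the threshold value are hypothetical.

```csharp
using System;

// Hypothetical sketch of the basic "energy above threshold" rule.
// Uses double samples for simplicity; EnergyBasedVad<T> itself is generic.
public static class EnergyRuleSketch
{
    // Short-time energy: sum of squared samples, divided by the frame length
    // so the value does not depend on how many samples the frame holds.
    public static double ShortTimeEnergy(double[] frame)
    {
        if (frame.Length == 0) return 0.0;
        double sum = 0.0;
        foreach (double s in frame) sum += s * s;
        return sum / frame.Length;
    }

    // Flags a frame as speech when its energy exceeds a fixed threshold.
    // The threshold is supplied by the caller; it is not a library default.
    public static bool IsSpeech(double[] frame, double energyThreshold)
        => ShortTimeEnergy(frame) > energyThreshold;
}
```

A fixed threshold like this is exactly what makes the pure energy rule fragile: the right value depends on microphone gain and background level, which motivates the extra features and the adaptive threshold described below.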
Enhanced features used here:
- Energy: How loud is the signal?
- Zero-crossing rate: How often does the signal cross zero?
  - Speech: low to moderate zero-crossing rate (voiced sounds are the lowest)
  - Noise: high zero-crossing rate (for example, hiss and other broadband noise)
- Spectral flatness: Is the spectrum tonal or noise-like?
  - Speech: low flatness (it has harmonic structure)
  - Noise: high flatness (energy is spread evenly across frequencies)
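The two extra features can be computed as follows. This is a sketch under stated assumptions, not the library's code: zero-crossing rate is taken as the fraction of adjacent sample pairs whose signs differ, and spectral flatness as the geometric mean divided by the arithmetic mean of the power spectrum, using a naive O(N²) DFT over bins 1 to N/2 (the DC bin is skipped) for readability rather than speed. The class name `FrameFeatureSketch` is hypothetical.

```csharp
using System;

// Hypothetical sketch of the zero-crossing-rate and spectral-flatness features.
public static class FrameFeatureSketch
{
    // Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
    public static double ZeroCrossingRate(double[] frame)
    {
        if (frame.Length < 2) return 0.0;
        int crossings = 0;
        for (int i = 1; i < frame.Length; i++)
        {
            // Treat 0 as non-negative so exact zeros are handled consistently.
            bool prevNonNegative = frame[i - 1] >= 0;
            bool currNonNegative = frame[i] >= 0;
            if (prevNonNegative != currNonNegative) crossings++;
        }
        return (double)crossings / (frame.Length - 1);
    }

    // Spectral flatness: geometric mean / arithmetic mean of the power spectrum.
    // Close to 1 for flat (noise-like) spectra, close to 0 for tonal spectra.
    public static double SpectralFlatness(double[] frame)
    {
        int n = frame.Length;
        int bins = n / 2;
        if (bins < 1) return 0.0;
        const double eps = 1e-12; // avoids log(0) for empty bins
        double logSum = 0.0, linSum = 0.0;
        for (int k = 1; k <= bins; k++)
        {
            // Naive DFT of bin k.
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++)
            {
                double angle = 2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.Cos(angle);
                im -= frame[t] * Math.Sin(angle);
            }
            double power = re * re + im * im + eps;
            logSum += Math.Log(power);
            linSum += power;
        }
        double geometricMean = Math.Exp(logSum / bins);
        double arithmeticMean = linSum / bins;
        return geometricMean / arithmeticMean;
    }
}
```

A pure tone concentrates its power in one bin, so its flatness is near 0, while uniform random noise spreads power across all bins and typically scores around 0.5. Real speech sits between these extremes, closer to the tonal end during voiced sounds.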
Pros:
- Very fast (no neural network)
- Low latency
- Works well in quiet environments
Cons:
- Struggles with background noise
- May trigger on loud non-speech sounds
- Requires threshold tuning for different environments
For better robustness to background noise, use a neural-network-based VAD such as SileroVad.
Constructors
EnergyBasedVad(int, int, double, double, double, double, bool, int, int)
Creates an energy-based VAD. All parameters are optional; any that are omitted take the defaults listed below.
public EnergyBasedVad(int sampleRate = 16000, int frameSize = 480, double threshold = 0.5, double energyWeight = 0.5, double zcrWeight = 0.25, double flatnessWeight = 0.25, bool adaptiveThreshold = true, int minSpeechDurationMs = 250, int minSilenceDurationMs = 300)
Parameters
sampleRate (int): Audio sample rate in Hz (default: 16000).
frameSize (int): Frame size in samples (default: 480, which is 30 ms at 16 kHz).
threshold (double): Detection threshold in the range 0 to 1 (default: 0.5).
energyWeight (double): Weight of the short-time energy feature (default: 0.5).
zcrWeight (double): Weight of the zero-crossing-rate feature (default: 0.25).
flatnessWeight (double): Weight of the spectral-flatness feature (default: 0.25).
adaptiveThreshold (bool): Enables adaptive thresholding (default: true).
minSpeechDurationMs (int): Minimum speech duration in milliseconds (default: 250).
minSilenceDurationMs (int): Minimum silence duration in milliseconds (default: 300).
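The timing parameters interact through simple arithmetic: a frame lasts frameSize / sampleRate seconds, so the millisecond durations correspond to a whole number of frames. The sketch below shows that conversion; rounding up to whole frames is an assumption for illustration, and the class may convert durations differently. `VadTimingSketch` is a hypothetical name.

```csharp
using System;

// Hypothetical helper showing how frame size, sample rate and
// millisecond durations relate. Not part of the AiDotNet API.
public static class VadTimingSketch
{
    // Duration of one analysis frame in milliseconds.
    public static double FrameDurationMs(int frameSize, int sampleRate)
        => 1000.0 * frameSize / sampleRate;

    // Number of whole frames needed to cover a duration.
    // Rounding up is an assumption made for this sketch.
    public static int DurationToFrames(int durationMs, int frameSize, int sampleRate)
        => (int)Math.Ceiling(durationMs / FrameDurationMs(frameSize, sampleRate));
}
```

With the defaults, one frame is 30 ms, so minSpeechDurationMs = 250 spans 9 frames (rounding up) and minSilenceDurationMs = 300 spans exactly 10. If you change sampleRate, scale frameSize with it (for example 240 samples at 8 kHz) to keep the same 30 ms frame length.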
Methods
ComputeFrameProbability(T[])
Computes speech probability for a single frame.
protected override T ComputeFrameProbability(T[] frame)
Parameters
frame (T[]): Audio frame data.
Returns
T: Speech probability in the range 0 to 1.
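One plausible way to turn the three features into a single probability is a weighted average using the constructor's weights. The sketch below assumes each feature has already been mapped to a score in [0, 1] where 1 means "speech-like"; how EnergyBasedVad<T> performs that mapping is not documented here, and dividing by the weight sum is an assumption that keeps the result in [0, 1] even when the weights do not add up to 1. `FeatureFusionSketch` is a hypothetical name.

```csharp
using System;

// Hypothetical weighted fusion of per-feature speech scores.
public static class FeatureFusionSketch
{
    // Each score is expected in [0, 1] (1 = speech-like); out-of-range
    // inputs are clamped. Default weights mirror the constructor defaults.
    public static double CombineScores(
        double energyScore, double zcrScore, double flatnessScore,
        double energyWeight = 0.5, double zcrWeight = 0.25, double flatnessWeight = 0.25)
    {
        double weightSum = energyWeight + zcrWeight + flatnessWeight;
        if (weightSum <= 0)
            throw new ArgumentException("Weights must sum to a positive value.");

        return (energyWeight * Clamp01(energyScore)
              + zcrWeight * Clamp01(zcrScore)
              + flatnessWeight * Clamp01(flatnessScore)) / weightSum;
    }

    private static double Clamp01(double x) => Math.Max(0.0, Math.Min(1.0, x));
}
```

For example, scores of 0.8 (energy), 0.4 (zero-crossing rate) and 0.4 (flatness) combine to 0.6 with the default weights, which is above the default threshold of 0.5.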
ResetState()
Resets the VAD state including adaptive thresholds.
public override void ResetState()
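This page does not describe how the adaptive threshold works, only that ResetState() clears it. A common approach, shown here purely as an illustration and not as the class's actual algorithm, is to track a running noise-floor estimate from non-speech frames and flag frames whose energy exceeds that floor by a fixed margin. The class `AdaptiveEnergyThresholdSketch` and its parameters (initialNoiseFloor, margin, alpha) are hypothetical.

```csharp
using System;

// Hypothetical adaptive energy threshold with a resettable noise floor.
public sealed class AdaptiveEnergyThresholdSketch
{
    private readonly double _initialNoiseFloor;
    private readonly double _margin; // energy must exceed NoiseFloor * margin
    private readonly double _alpha;  // smoothing factor of the moving average

    public double NoiseFloor { get; private set; }

    public AdaptiveEnergyThresholdSketch(
        double initialNoiseFloor = 1e-4, double margin = 3.0, double alpha = 0.05)
    {
        _initialNoiseFloor = initialNoiseFloor;
        _margin = margin;
        _alpha = alpha;
        NoiseFloor = initialNoiseFloor;
    }

    // Classifies one frame by its energy and updates the noise estimate.
    public bool Update(double frameEnergy)
    {
        bool isSpeech = frameEnergy > NoiseFloor * _margin;
        // Only non-speech frames update the floor, so loud speech
        // does not drag the threshold upward.
        if (!isSpeech)
            NoiseFloor = (1 - _alpha) * NoiseFloor + _alpha * frameEnergy;
        return isSpeech;
    }

    // Plays the role of ResetState(): forget what was learned from earlier audio.
    public void Reset() => NoiseFloor = _initialNoiseFloor;
}
```

Resetting matters when the same detector instance is reused for a new, unrelated stream: without it, a noise floor learned in a loud room would suppress detection in a quiet one.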