Class AudioAugmenterBase<T>
- Namespace
- AiDotNet.Augmentation.Audio
- Assembly
- AiDotNet.dll
Base class for audio data augmentations.
public abstract class AudioAugmenterBase<T> : AugmentationBase<T, Tensor<T>>, IAugmentation<T, Tensor<T>>
Type Parameters
- T
The numeric type for calculations.
- Inheritance
- AugmentationBase<T, Tensor<T>> → AudioAugmenterBase<T>
- Implements
- IAugmentation<T, Tensor<T>>
Remarks
For Beginners: Audio augmentation transforms sound data to improve model robustness to variations in recording conditions, speaking styles, and environmental noise. Common techniques include:
- Time stretching (faster/slower without pitch change)
- Pitch shifting (higher/lower without speed change)
- Adding background noise
- Volume changes
- Time shifting (moving audio forward/backward)
Audio data is typically represented as a 1D waveform tensor or 2D spectrogram.
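To make two of the techniques listed above concrete, here is a minimal sketch of a volume change and a time shift applied to a 1D waveform. It is an illustration only, not the library's implementation: it uses a plain float[] in place of Tensor<T>, and the class and method names (WaveformSketch, ApplyGain, TimeShift) are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class WaveformSketch
{
    // Volume change: scale every sample by a constant gain factor.
    public static float[] ApplyGain(float[] waveform, float gain)
    {
        var result = new float[waveform.Length];
        for (int i = 0; i < waveform.Length; i++)
        {
            result[i] = waveform[i] * gain;
        }
        return result;
    }

    // Time shift: move samples by `shift` positions.
    // A positive shift delays the audio, a negative shift advances it.
    // Samples pushed past either end are dropped; vacated positions stay zero.
    public static float[] TimeShift(float[] waveform, int shift)
    {
        var result = new float[waveform.Length]; // zero-initialized by default
        for (int i = 0; i < waveform.Length; i++)
        {
            int target = i + shift;
            if (target >= 0 && target < waveform.Length)
            {
                result[target] = waveform[i];
            }
        }
        return result;
    }
}
```

Both methods return a new array and leave the input untouched, so the original recording can be reused for other augmentations.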
Constructors
AudioAugmenterBase(double, int)
Initializes a new audio augmentation.
protected AudioAugmenterBase(double probability = 1, int sampleRate = 16000)
Parameters
- probability (double)
The probability of applying this augmentation (0.0 to 1.0).
- sampleRate (int)
The sample rate of the audio data in Hz.
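A probability parameter like this is commonly implemented as a random gate: draw a uniform number in [0, 1) and apply the augmentation only when the draw falls below the probability. The sketch below shows that pattern under this assumption; this page does not describe how the base class actually makes the decision, and ProbabilityGateSketch and ShouldApply are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class ProbabilityGateSketch
{
    // Returns true when the augmentation should be applied for this sample.
    // Random.NextDouble() returns a value in [0.0, 1.0), so a probability of
    // 1.0 always applies and a probability of 0.0 never applies.
    public static bool ShouldApply(double probability, Random rng)
    {
        if (probability < 0.0 || probability > 1.0)
        {
            throw new ArgumentOutOfRangeException(nameof(probability));
        }
        return rng.NextDouble() < probability;
    }
}
```

Passing the Random instance in, rather than creating one per call, makes the gate reproducible when a fixed seed is used.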
Properties
SampleRate
Gets or sets the sample rate of the audio data in Hz.
public int SampleRate { get; set; }
Property Value
- int
Remarks
Default: 16000 Hz (common for speech recognition)
Other common values: 22050 Hz (music), 44100 Hz (CD quality), 48000 Hz (professional audio)
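One reason the sample rate matters is the Nyquist limit: a signal sampled at a given rate can only represent frequencies up to half that rate. The sketch below computes that limit for the rates listed above; SampleRateSketch and NyquistFrequency are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class SampleRateSketch
{
    // Highest frequency (in Hz) representable at the given sample rate.
    public static double NyquistFrequency(int sampleRate)
    {
        if (sampleRate <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(sampleRate));
        }
        return sampleRate / 2.0;
    }
}
```

For example, NyquistFrequency(16000) is 8000 Hz and NyquistFrequency(44100) is 22050 Hz.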
Methods
GetDuration(Tensor<T>)
Gets the duration of the audio in seconds.
protected double GetDuration(Tensor<T> waveform)
Parameters
- waveform (Tensor<T>)
The audio waveform tensor.
Returns
- double
The duration in seconds.
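The duration of a waveform is conventionally its sample count divided by the sample rate. The sketch below shows that arithmetic on plain integers; it is an assumption about how a method like GetDuration would typically work, not a copy of the library's code, and DurationSketch and DurationSeconds are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class DurationSketch
{
    // Duration in seconds = number of samples / samples per second.
    public static double DurationSeconds(int sampleCount, int sampleRate)
    {
        if (sampleCount < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(sampleCount));
        }
        if (sampleRate <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(sampleRate));
        }
        // Cast before dividing to avoid integer division.
        return (double)sampleCount / sampleRate;
    }
}
```

For example, 24000 samples at 16000 Hz give a duration of 1.5 seconds.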
GetParameters()
Gets the parameters of this augmentation.
public override IDictionary<string, object> GetParameters()
Returns
- IDictionary<string, object>
A dictionary of parameter names to values.
GetSampleCount(Tensor<T>)
Gets the number of audio samples.
protected int GetSampleCount(Tensor<T> waveform)
Parameters
- waveform (Tensor<T>)
The audio waveform tensor.
Returns
- int
The number of samples.