Class AudioAugmenterBase<T>

Namespace
AiDotNet.Augmentation.Audio
Assembly
AiDotNet.dll

Base class for audio data augmentations.

public abstract class AudioAugmenterBase<T> : AugmentationBase<T, Tensor<T>>, IAugmentation<T, Tensor<T>>

Type Parameters

T

The numeric type for calculations.

Inheritance
AugmentationBase<T, Tensor<T>>
AudioAugmenterBase<T>
Implements
IAugmentation<T, Tensor<T>>

Remarks

For Beginners: Audio augmentation transforms sound data to improve model robustness to variations in recording conditions, speaking styles, and environmental noise. Common techniques include:

  • Time stretching (faster/slower without pitch change)
  • Pitch shifting (higher/lower without speed change)
  • Adding background noise
  • Volume changes
  • Time shifting (moving audio forward/backward)

Audio data is typically represented as a 1D waveform tensor or a 2D spectrogram.
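To make a few of the techniques listed above concrete, here is a minimal sketch that works on a plain `float[]` waveform rather than on `Tensor<T>`. The class `WaveformSketch` and its methods are hypothetical names chosen for illustration; they are not part of AiDotNet, and the library's own augmenters may work differently. Time stretching and pitch shifting are left out because they need resampling or spectral processing.

```csharp
using System;

// Illustrative helpers on a plain float[] waveform; these are NOT AiDotNet APIs.
public static class WaveformSketch
{
    // Volume change: multiply every sample by a gain factor.
    public static float[] ApplyGain(float[] samples, float gain)
    {
        var result = new float[samples.Length];
        for (int i = 0; i < samples.Length; i++)
            result[i] = samples[i] * gain;
        return result;
    }

    // Time shift: move the audio later by `shift` samples (earlier if negative),
    // filling the vacated region with silence (zeros).
    public static float[] TimeShift(float[] samples, int shift)
    {
        var result = new float[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            int source = i - shift;
            if (source >= 0 && source < samples.Length)
                result[i] = samples[source];
        }
        return result;
    }

    // Background noise: add zero-mean Gaussian noise with the given standard deviation.
    public static float[] AddNoise(float[] samples, float stdDev, Random rng)
    {
        var result = new float[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            // Box-Muller transform; 1 - NextDouble() keeps the log argument in (0, 1].
            double u1 = 1.0 - rng.NextDouble();
            double u2 = rng.NextDouble();
            double gaussian = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
            result[i] = samples[i] + (float)(gaussian * stdDev);
        }
        return result;
    }
}
```

Each helper returns a new array and leaves its input untouched, so the original waveform stays available for applying a different augmentation.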

Constructors

AudioAugmenterBase(double, int)

Initializes a new audio augmentation.

protected AudioAugmenterBase(double probability = 1, int sampleRate = 16000)

Parameters

probability double

The probability of applying this augmentation (0.0 to 1.0).

sampleRate int

The sample rate of the audio data in Hz.
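This page does not say how the base class uses `probability` internally. One common way to implement such a gate, shown here as an assumption rather than as AiDotNet's actual logic, is to draw a uniform random number and apply the augmentation only when it falls below the probability. `ProbabilityGateSketch` and `ShouldApply` are hypothetical names.

```csharp
using System;

// Hypothetical illustration of a probability gate; not AiDotNet's implementation.
public static class ProbabilityGateSketch
{
    // Returns true when the augmentation should run for this sample.
    public static bool ShouldApply(double probability, Random rng)
    {
        if (probability < 0.0 || probability > 1.0)
            throw new ArgumentOutOfRangeException(nameof(probability));

        // NextDouble() returns a value in [0, 1), so 1.0 always applies and 0.0 never does.
        return rng.NextDouble() < probability;
    }
}
```

With the constructor's default of `probability = 1`, a gate like this would apply the augmentation to every sample.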

Properties

SampleRate

Gets or sets the sample rate of the audio data in Hz.

public int SampleRate { get; set; }

Property Value

int

Remarks

Default: 16000 Hz (common for speech recognition)

Other common values: 22050 Hz (music), 44100 Hz (CD quality), 48000 Hz (professional audio)

Methods

GetDuration(Tensor<T>)

Gets the duration of the audio in seconds.

protected double GetDuration(Tensor<T> waveform)

Parameters

waveform Tensor<T>

The audio waveform tensor.

Returns

double

The duration in seconds.
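The page does not give the formula behind `GetDuration`. The conventional relationship for a waveform is duration = sample count / sample rate, and the sketch below assumes exactly that. `DurationSketch` and `DurationSeconds` are hypothetical names, not AiDotNet members.

```csharp
using System;

// Hypothetical illustration of the usual duration formula; not AiDotNet's implementation.
public static class DurationSketch
{
    // Duration in seconds of a waveform with `sampleCount` samples at `sampleRate` Hz.
    public static double DurationSeconds(int sampleCount, int sampleRate)
    {
        if (sampleRate <= 0)
            throw new ArgumentOutOfRangeException(nameof(sampleRate));

        // Cast to double first so the division is not truncated to an integer.
        return (double)sampleCount / sampleRate;
    }
}
```

Under this assumption, 32000 samples at the default 16000 Hz correspond to 2 seconds of audio, and 44100 samples at 44100 Hz correspond to 1 second.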

GetParameters()

Gets the parameters of this augmentation.

public override IDictionary<string, object> GetParameters()

Returns

IDictionary<string, object>

A dictionary of parameter names to values.

GetSampleCount(Tensor<T>)

Gets the number of audio samples.

protected int GetSampleCount(Tensor<T> waveform)

Parameters

waveform Tensor<T>

The audio waveform tensor.

Returns

int

The number of samples.