Class AudioAugmenterBase<T>

Namespace: AiDotNet.Augmentation.Audio

Assembly: AiDotNet.dll

Base class for audio data augmentations.

public abstract class AudioAugmenterBase<T> : AugmentationBase<T, Tensor<T>>, IAugmentation<T, Tensor<T>>

Type Parameters

T: The numeric type for calculations.

Inheritance: object

AugmentationBase<T, Tensor<T>>

AudioAugmenterBase<T>

Implements: IAugmentation<T, Tensor<T>>

Derived: AudioNoise<T>

PitchShift<T>

TimeShift<T>

TimeStretch<T>

VolumeChange<T>

Inherited Members: AugmentationBase<T, Tensor<T>>.NumOps

AugmentationBase<T, Tensor<T>>.OnAugmentationApplied

AugmentationBase<T, Tensor<T>>.Name

AugmentationBase<T, Tensor<T>>.Probability

AugmentationBase<T, Tensor<T>>.IsTrainingOnly

AugmentationBase<T, Tensor<T>>.IsEnabled

AugmentationBase<T, Tensor<T>>.Apply(Tensor<T>, AugmentationContext<T>)

AugmentationBase<T, Tensor<T>>.ApplyAugmentation(Tensor<T>, AugmentationContext<T>)

AugmentationBase<T, Tensor<T>>.GetParameters()

AugmentationBase<T, Tensor<T>>.RaiseAugmentationApplied(AugmentationContext<T>, bool)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

For Beginners: Audio augmentation transforms sound data to improve model robustness to variations in recording conditions, speaking styles, and environmental noise. Common techniques include:

Time stretching (faster/slower without pitch change)
Pitch shifting (higher/lower without speed change)
Adding background noise
Volume changes
Time shifting (moving audio forward/backward)

Audio data is typically represented as a 1D waveform tensor or a 2D spectrogram.
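Two of the techniques listed above, volume change and time shifting, can be sketched on a plain 1D waveform. This is a minimal illustration and not AiDotNet's implementation: `WaveformSketch`, `ChangeVolume`, and `ShiftTime` are hypothetical names, a `float[]` stands in for `Tensor<T>`, and samples are assumed to be normalized to the range -1 to 1.

```csharp
using System;

// Hypothetical helpers on a plain float[] waveform; not part of AiDotNet.
public static class WaveformSketch
{
    // Volume change: scale every sample by a gain factor, then clamp to [-1, 1]
    // (assumes normalized samples) so loud inputs clip rather than leave the range.
    public static float[] ChangeVolume(float[] waveform, float gain)
    {
        var result = new float[waveform.Length];
        for (int i = 0; i < waveform.Length; i++)
        {
            float scaled = waveform[i] * gain;
            result[i] = Math.Max(-1f, Math.Min(1f, scaled));
        }
        return result;
    }

    // Time shift: a positive shift moves audio later, a negative shift moves it
    // earlier. Vacated positions stay at zero (silence); the length is unchanged.
    public static float[] ShiftTime(float[] waveform, int shift)
    {
        var result = new float[waveform.Length];
        for (int i = 0; i < waveform.Length; i++)
        {
            int source = i - shift;
            if (source >= 0 && source < waveform.Length)
            {
                result[i] = waveform[source];
            }
        }
        return result;
    }
}
```

Both helpers return a new array and leave the input untouched, which keeps the original sample available if the augmentation is skipped.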

Constructors

AudioAugmenterBase(double, int)

Initializes a new audio augmentation.

protected AudioAugmenterBase(double probability = 1, int sampleRate = 16000)

Parameters

probability double: The probability of applying this augmentation (0.0 to 1.0).
sampleRate int: The sample rate of the audio data in Hz.
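As a rough illustration of what a probability between 0.0 and 1.0 means in practice, the sketch below decides per sample whether an augmentation should run. `ProbabilityGateSketch` and `ShouldApply` are hypothetical names; how `AugmentationBase<T, Tensor<T>>` actually makes this decision is not documented on this page.

```csharp
using System;

// Hypothetical stand-in; the library's own gating logic may differ.
public static class ProbabilityGateSketch
{
    // Returns true when the augmentation should be applied to the current sample.
    // Random.NextDouble() yields a value in [0.0, 1.0), so a probability of 1.0
    // always applies the augmentation and a probability of 0.0 never does.
    public static bool ShouldApply(double probability, Random random)
    {
        return random.NextDouble() < probability;
    }
}
```

Passing the `Random` instance in, rather than creating one inside the method, makes the behavior reproducible when a fixed seed is used.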

Properties

SampleRate

Gets or sets the sample rate of the audio data in Hz.

public int SampleRate { get; set; }

Property Value

int

Remarks

Default: 16000 Hz (common for speech recognition)

Other common values: 22050 Hz (music), 44100 Hz (CD quality), 48000 Hz (professional audio)

Methods

GetDuration(Tensor<T>)

Gets the duration of the audio in seconds.

protected double GetDuration(Tensor<T> waveform)

Parameters

waveform Tensor<T>: The audio waveform tensor.

Returns

double: The duration in seconds.
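The relationship between sample count, sample rate, and duration is simple arithmetic: duration in seconds equals the number of samples divided by the sample rate in Hz. The sketch below shows that calculation on plain integers. `DurationSketch` and `DurationSeconds` are hypothetical names, and how `GetDuration` reads the sample count from the tensor is an implementation detail this page does not describe.

```csharp
using System;

// Hypothetical helper illustrating the arithmetic only; not part of AiDotNet.
public static class DurationSketch
{
    // duration (s) = sampleCount / sampleRate (Hz)
    public static double DurationSeconds(int sampleCount, int sampleRate)
    {
        if (sampleRate <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(sampleRate), "Sample rate must be positive.");
        }

        // Cast before dividing so the result is not truncated by integer division.
        return (double)sampleCount / sampleRate;
    }
}
```

For example, 16000 samples at the default 16000 Hz correspond to 1 second of audio, while 22050 samples at 44100 Hz correspond to 0.5 seconds.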

GetParameters()

Gets the parameters of this augmentation.

public override IDictionary<string, object> GetParameters()

Returns

IDictionary<string, object>: A dictionary of parameter names to values.

GetSampleCount(Tensor<T>)

Gets the number of audio samples in the waveform.

protected int GetSampleCount(Tensor<T> waveform)

Parameters

waveform Tensor<T>: The audio waveform tensor.

Returns

int: The number of samples.
