Class AudioFeatureExtractorBase<T>

Namespace: AiDotNet.Audio.Features

Assembly: AiDotNet.dll

Base class for audio feature extractors providing common functionality.

public abstract class AudioFeatureExtractorBase<T> : IAudioFeatureExtractor<T>

Type Parameters

T: The numeric type used for calculations.

Inheritance: object → AudioFeatureExtractorBase<T>

Implements: IAudioFeatureExtractor<T>

Derived: ChromaExtractor<T>, MfccExtractor<T>, SpectralFeatureExtractor<T>

Inherited Members: object.Equals(object), object.Equals(object, object), object.GetHashCode(), object.GetType(), object.MemberwiseClone(), object.ReferenceEquals(object, object), object.ToString()

Remarks

This base class provides:

Common audio processing utilities (windowing, framing)
Numeric operations through INumericOperations<T>
Sample rate and FFT configuration

Constructors

AudioFeatureExtractorBase(AudioFeatureOptions?)

Initializes a new instance of the AudioFeatureExtractorBase<T> class.

protected AudioFeatureExtractorBase(AudioFeatureOptions? options = null)

Parameters

options AudioFeatureOptions: The feature extraction options. Optional; defaults to null.

Fields

NumOps

Numeric operations for the current type.

protected readonly INumericOperations<T> NumOps

Field Value

INumericOperations<T>

Options

The audio feature extraction options.

protected readonly AudioFeatureOptions Options

Field Value

AudioFeatureOptions

Properties

FeatureDimension

Gets the number of features produced per frame.

public abstract int FeatureDimension { get; }

Property Value

int

FftSize

Gets the FFT size.

protected int FftSize { get; }

Property Value

int

HopLength

Gets the hop length between frames.

protected int HopLength { get; }

Property Value

int

Name

Gets the name of this feature extractor.

public abstract string Name { get; }

Property Value

string

SampleRate

Gets the sample rate expected by this extractor.

public int SampleRate { get; }

Property Value

int

WindowLength

Gets the window length.

protected int WindowLength { get; }

Property Value

int

Methods

ComputeNumFrames(int)

Computes the number of frames that will be produced for a given audio length.

protected int ComputeNumFrames(int audioLength)

Parameters

audioLength int: The number of audio samples.

Returns

int: The number of frames.
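This page does not state the formula behind ComputeNumFrames. As a minimal sketch, assuming the common convention that only complete windows are counted and that no padding is applied, the frame count could be computed as below. The class name FrameCountSketch and the explicit windowLength and hopLength parameters are hypothetical; the real method reads these from the extractor's own configuration, and a center-padded implementation would return a different count.

```csharp
using System;

public static class FrameCountSketch
{
    // Assumed convention: one frame every hopLength samples, and only
    // complete windows are counted (no padding at either end).
    public static int ComputeNumFrames(int audioLength, int windowLength, int hopLength)
    {
        if (windowLength <= 0) throw new ArgumentOutOfRangeException(nameof(windowLength));
        if (hopLength <= 0) throw new ArgumentOutOfRangeException(nameof(hopLength));

        // Audio shorter than one window yields no complete frame.
        if (audioLength < windowLength) return 0;

        // The first frame starts at sample 0; each later frame needs hopLength more samples.
        return 1 + (audioLength - windowLength) / hopLength;
    }
}
```

Under these assumptions, 1000 samples with a 400-sample window and a 160-sample hop give 1 + 600 / 160 = 4 frames (integer division).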

CreateHammingWindow(int)

Creates a Hamming window of the specified length.

protected T[] CreateHammingWindow(int length)

Parameters

length int: The window length.

Returns

T[]: The Hamming window coefficients.
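For orientation, the textbook symmetric Hamming window is w[n] = 0.54 - 0.46 cos(2πn / (N - 1)). The sketch below implements that formula for double; whether the library uses the symmetric or the periodic variant, and how it handles T generically, is not stated here, so treat HammingWindowSketch as a hypothetical stand-in.

```csharp
using System;

public static class HammingWindowSketch
{
    // Symmetric Hamming window (an assumption; a periodic variant would divide by length).
    public static double[] Create(int length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));

        var window = new double[length];
        if (length == 1)
        {
            window[0] = 1.0; // a single-sample window leaves the sample unchanged
            return window;
        }

        for (int n = 0; n < length; n++)
        {
            window[n] = 0.54 - 0.46 * Math.Cos(2.0 * Math.PI * n / (length - 1));
        }
        return window;
    }
}
```

The end points of this window are 0.08 rather than 0, which is the main practical difference from the Hann window.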

CreateHannWindow(int)

Creates a Hann window of the specified length.

protected T[] CreateHannWindow(int length)

Parameters

length int: The window length.

Returns

T[]: The Hann window coefficients.
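The textbook symmetric Hann window is w[n] = 0.5 - 0.5 cos(2πn / (N - 1)). The following sketch implements that formula for double. The class name HannWindowSketch is hypothetical, and the choice of the symmetric form is an assumption rather than something this page confirms.

```csharp
using System;

public static class HannWindowSketch
{
    // Symmetric Hann window (an assumption; a periodic variant would divide by length).
    public static double[] Create(int length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));

        var window = new double[length];
        if (length == 1)
        {
            window[0] = 1.0; // a single-sample window leaves the sample unchanged
            return window;
        }

        for (int n = 0; n < length; n++)
        {
            window[n] = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / (length - 1));
        }
        return window;
    }
}
```

In this symmetric form the first and last coefficients are zero, so each frame tapers fully to silence at its edges.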

CreateMelFilterbank(int, int, int, double, double?)

Creates a mel filterbank.

protected T[,] CreateMelFilterbank(int numMels, int fftSize, int sampleRate, double fMin = 0, double? fMax = null)

Parameters

numMels int: Number of mel filters.
fftSize int: FFT size.
sampleRate int: Sample rate.
fMin double: Minimum frequency. Defaults to 0.
fMax double?: Maximum frequency. Optional; defaults to null.

Returns

T[,]: The mel filterbank matrix [numMels x (fftSize/2+1)].
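To illustrate the shape of the returned matrix, here is a minimal sketch of a triangular mel filterbank for double. Several details are assumptions rather than documented behavior: the HTK-style mel formula, treating a null fMax as the Nyquist frequency (sampleRate / 2), and the absence of any area normalization. MelFilterbankSketch is a hypothetical name.

```csharp
using System;

public static class MelFilterbankSketch
{
    // HTK-style conversions (an assumption; other mel definitions exist).
    private static double HzToMel(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);
    private static double MelToHz(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

    public static double[,] Create(int numMels, int fftSize, int sampleRate,
                                   double fMin = 0, double? fMax = null)
    {
        double high = fMax ?? sampleRate / 2.0; // assumption: null means Nyquist
        int numBins = fftSize / 2 + 1;

        // numMels + 2 edge frequencies, equally spaced on the mel scale.
        double melLow = HzToMel(fMin);
        double melHigh = HzToMel(high);
        var edgesHz = new double[numMels + 2];
        for (int i = 0; i < edgesHz.Length; i++)
        {
            edgesHz[i] = MelToHz(melLow + (melHigh - melLow) * i / (numMels + 1));
        }

        var filterbank = new double[numMels, numBins];
        for (int m = 0; m < numMels; m++)
        {
            double left = edgesHz[m], center = edgesHz[m + 1], right = edgesHz[m + 2];
            for (int k = 0; k < numBins; k++)
            {
                double freq = (double)k * sampleRate / fftSize; // center frequency of FFT bin k
                double rising = (freq - left) / (center - left);
                double falling = (right - freq) / (right - center);
                // Triangle: rises from left to center, falls from center to right, zero elsewhere.
                filterbank[m, k] = Math.Max(0.0, Math.Min(rising, falling));
            }
        }
        return filterbank;
    }
}
```

Multiplying this [numMels x (fftSize/2+1)] matrix by a power spectrum of length fftSize/2+1 yields one mel-band energy per filter.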

Extract(Tensor<T>)

Extracts features from an audio waveform.

public abstract Tensor<T> Extract(Tensor<T> audio)

Parameters

audio Tensor<T>: The audio waveform as a 1D tensor [samples].

Returns

Tensor<T>: Features as a 2D tensor [frames, features].

Extract(Vector<T>)

Extracts features from an audio waveform.

public virtual Matrix<T> Extract(Vector<T> audio)

Parameters

audio Vector<T>: The audio waveform as a Vector.

Returns

Matrix<T>: Features as a Matrix [frames x features].

ExtractAsync(Tensor<T>, CancellationToken)

Extracts features from an audio waveform asynchronously.

public virtual Task<Tensor<T>> ExtractAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>: The audio waveform.
cancellationToken CancellationToken: Cancellation token.

Returns

Task<Tensor<T>>: Features as a 2D tensor [frames, features].
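One plausible way for a virtual asynchronous overload to build on the abstract synchronous one is to run it on the thread pool and honor the cancellation token before starting work. The sketch below shows that pattern with plain double[] arrays in place of Tensor<T>. ExtractorSketch and EnergySketch are hypothetical types, and nothing on this page confirms that the library's default implementation works this way.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public abstract class ExtractorSketch
{
    // Stand-in for the abstract synchronous Extract overload.
    public abstract double[] Extract(double[] audio);

    // Assumed default: offload the synchronous call to the thread pool.
    public virtual Task<double[]> ExtractAsync(double[] audio,
                                               CancellationToken cancellationToken = default)
    {
        return Task.Run(() =>
        {
            // Stop early if cancellation was requested before the work began.
            cancellationToken.ThrowIfCancellationRequested();
            return Extract(audio);
        }, cancellationToken);
    }
}

// Toy extractor that returns the total signal energy as a single feature.
public sealed class EnergySketch : ExtractorSketch
{
    public override double[] Extract(double[] audio)
    {
        double energy = 0.0;
        foreach (double sample in audio) energy += sample * sample;
        return new[] { energy };
    }
}
```

Because the method is virtual, a derived extractor with a genuinely asynchronous or chunked pipeline can override it and check the token between frames.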

ExtractFrame(T[], int, T[])

Extracts a single frame from the audio signal.

protected T[] ExtractFrame(T[] audio, int startIndex, T[] window)

Parameters

audio T[]: The audio data.
startIndex int: The start index of the frame.
window T[]: The window function to apply.

Returns

T[]: The windowed frame.
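Conceptually, extracting a windowed frame means copying window.Length samples starting at startIndex and multiplying each by the matching window coefficient. The sketch below does this for double and treats samples outside the signal as zero; that zero-fill behavior is an assumption, since this page does not say how the library handles frames that run past the end of the audio. FrameSketch is a hypothetical name.

```csharp
using System;

public static class FrameSketch
{
    public static double[] ExtractFrame(double[] audio, int startIndex, double[] window)
    {
        if (audio == null) throw new ArgumentNullException(nameof(audio));
        if (window == null) throw new ArgumentNullException(nameof(window));

        var frame = new double[window.Length];
        for (int i = 0; i < window.Length; i++)
        {
            int index = startIndex + i;
            // Assumption: positions outside the signal contribute silence.
            double sample = (index >= 0 && index < audio.Length) ? audio[index] : 0.0;
            frame[i] = sample * window[i];
        }
        return frame;
    }
}
```

The frame length is taken from the window, so the same routine works for any window length the extractor is configured with.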

HzToMel(double)

Converts frequency in Hz to mel scale.

protected static double HzToMel(double hz)

Parameters

hz double: Frequency in Hz.

Returns

double: Frequency in mel scale.

MelToHz(double)

Converts mel scale frequency to Hz.

protected static double MelToHz(double mel)

Parameters

mel double: Frequency in mel scale.

Returns

double: Frequency in Hz.
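The page does not say which mel definition HzToMel and MelToHz use. A widely used choice is the HTK-style formula mel = 2595 log10(1 + hz / 700), with the inverse hz = 700 (10^(mel / 2595) - 1). The sketch below implements that pair under this assumption; MelScaleSketch is a hypothetical name, and a Slaney-style implementation would give different values.

```csharp
using System;

public static class MelScaleSketch
{
    // HTK-style mel scale (an assumption; not confirmed by this page).
    public static double HzToMel(double hz)
    {
        return 2595.0 * Math.Log10(1.0 + hz / 700.0);
    }

    // Exact algebraic inverse of HzToMel above.
    public static double MelToHz(double mel)
    {
        return 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);
    }
}
```

With this definition 0 Hz maps to 0 mel and 1000 Hz maps to roughly 1000 mel, and converting to mel and back returns the original frequency up to floating-point error.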

PadAudioCenter(T[])

Pads audio for center-aligned frames.

protected T[] PadAudioCenter(T[] audio)

Parameters

audio T[]: The audio data.

Returns

T[]: The padded audio.
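Center-aligned framing usually pads both ends of the signal by half the FFT size so that frame t is centered on sample t * HopLength. The sketch below uses reflect padding with an explicit pad amount. Both the padding mode and the amount are assumptions, because this page specifies neither (zero padding is an equally common choice). CenterPadSketch and the pad parameter are hypothetical.

```csharp
using System;

public static class CenterPadSketch
{
    // Pads `pad` samples on each side by mirroring the signal around its end points.
    public static double[] PadReflect(double[] audio, int pad)
    {
        if (audio == null) throw new ArgumentNullException(nameof(audio));
        if (pad < 0) throw new ArgumentOutOfRangeException(nameof(pad));

        int n = audio.Length;
        var padded = new double[n + 2 * pad];
        if (n == 0) return padded; // nothing to mirror; the result is all zeros

        for (int i = 0; i < padded.Length; i++)
        {
            int src = i - pad;
            if (n == 1)
            {
                src = 0; // a single sample can only repeat itself
            }
            else
            {
                // Fold out-of-range indices back into [0, n - 1] without repeating the edge sample.
                while (src < 0 || src >= n)
                {
                    src = src < 0 ? -src : 2 * (n - 1) - src;
                }
            }
            padded[i] = audio[src];
        }
        return padded;
    }
}
```

For the input 1, 2, 3, 4 with pad = 2, this produces 3, 2, 1, 2, 3, 4, 3, 2.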
