
Class AudioFeatureExtractorBase<T>

Namespace
AiDotNet.Audio.Features
Assembly
AiDotNet.dll

Base class for audio feature extractors providing common functionality.

public abstract class AudioFeatureExtractorBase<T> : IAudioFeatureExtractor<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
object
AudioFeatureExtractorBase<T>
Implements
IAudioFeatureExtractor<T>

Remarks

This base class provides:

  • Common audio processing utilities (windowing, framing)
  • Numeric operations through INumericOperations<T>
  • Sample rate and FFT configuration

Constructors

AudioFeatureExtractorBase(AudioFeatureOptions?)

Initializes a new instance of the AudioFeatureExtractorBase<T> class.

protected AudioFeatureExtractorBase(AudioFeatureOptions? options = null)

Parameters

options AudioFeatureOptions

The feature extraction options. Optional; defaults to null.

Fields

NumOps

Numeric operations for the current type.

protected readonly INumericOperations<T> NumOps

Field Value

INumericOperations<T>

Options

The audio feature extraction options.

protected readonly AudioFeatureOptions Options

Field Value

AudioFeatureOptions

Properties

FeatureDimension

Gets the number of features produced per frame.

public abstract int FeatureDimension { get; }

Property Value

int

FftSize

Gets the FFT size.

protected int FftSize { get; }

Property Value

int

HopLength

Gets the hop length between frames.

protected int HopLength { get; }

Property Value

int

Name

Gets the name of this feature extractor.

public abstract string Name { get; }

Property Value

string

SampleRate

Gets the sample rate expected by this extractor.

public int SampleRate { get; }

Property Value

int

WindowLength

Gets the window length.

protected int WindowLength { get; }

Property Value

int

Methods

ComputeNumFrames(int)

Computes the number of frames that will be produced for a given audio length.

protected int ComputeNumFrames(int audioLength)

Parameters

audioLength int

The number of audio samples.

Returns

int

The number of frames.
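
The exact framing rule is not stated on this page. As a rough sketch, assuming the common convention that every frame must fit entirely inside the signal, the count follows from the window and hop lengths. The function name and its explicit windowLength and hopLength parameters are hypothetical; the real method presumably reads WindowLength and HopLength from the instance and may account for center padding.

```csharp
using System;

// Hypothetical stand-in for ComputeNumFrames: counts frames that fit fully
// inside the signal (no padding). The library's own rule may differ.
int ComputeNumFramesSketch(int audioLength, int windowLength, int hopLength)
{
    if (hopLength <= 0) throw new ArgumentOutOfRangeException(nameof(hopLength));
    if (audioLength < windowLength) return 0; // not enough samples for one frame
    return 1 + (audioLength - windowLength) / hopLength; // integer division floors
}

// One second of 16 kHz audio, 400-sample windows, 160-sample hop.
int frameCount = ComputeNumFramesSketch(16000, 400, 160);
Console.WriteLine(frameCount); // 98
```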

CreateHammingWindow(int)

Creates a Hamming window of the specified length.

protected T[] CreateHammingWindow(int length)

Parameters

length int

The window length.

Returns

T[]

The Hamming window coefficients.
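
For orientation, the textbook symmetric Hamming window is w[n] = 0.54 - 0.46 cos(2πn / (N - 1)). The sketch below computes it with double rather than the generic T so that it stands alone; whether the library uses the symmetric or the periodic form is not documented here, and the function name is hypothetical.

```csharp
using System;

// Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)).
// Uses double instead of the generic T to stay self-contained.
double[] HammingSketch(int length)
{
    if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
    var window = new double[length];
    if (length == 1) { window[0] = 1.0; return window; } // avoid division by zero
    for (int n = 0; n < length; n++)
        window[n] = 0.54 - 0.46 * Math.Cos(2.0 * Math.PI * n / (length - 1));
    return window;
}

double[] hamming = HammingSketch(5); // approx. 0.08, 0.54, 1, 0.54, 0.08
Console.WriteLine(string.Join(" ", hamming));
```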

CreateHannWindow(int)

Creates a Hann window of the specified length.

protected T[] CreateHannWindow(int length)

Parameters

length int

The window length.

Returns

T[]

The Hann window coefficients.
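
The Hann window has the same shape as a raised cosine that reaches zero at its ends. The sketch below shows both common conventions: the symmetric form (divisor N - 1) and the periodic form (divisor N) often paired with an FFT. Which one CreateHannWindow uses is not stated here; the function name and the periodic flag are hypothetical, and double stands in for T.

```csharp
using System;

// Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / D), where D = N - 1 for the
// symmetric form and D = N for the periodic form.
double[] HannSketch(int length, bool periodic = false)
{
    if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
    var window = new double[length];
    int denominator = periodic ? length : length - 1;
    for (int n = 0; n < length; n++)
        window[n] = denominator == 0
            ? 1.0 // single-sample symmetric window
            : 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / denominator);
    return window;
}

double[] symmetricHann = HannSketch(5);                // approx. 0, 0.5, 1, 0.5, 0
double[] periodicHann = HannSketch(4, periodic: true); // approx. 0, 0.5, 1, 0.5
Console.WriteLine(string.Join(" ", symmetricHann));
```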

CreateMelFilterbank(int, int, int, double, double?)

Creates a mel filterbank.

protected T[,] CreateMelFilterbank(int numMels, int fftSize, int sampleRate, double fMin = 0, double? fMax = null)

Parameters

numMels int

Number of mel filters.

fftSize int

FFT size.

sampleRate int

Sample rate in Hz.

fMin double

Minimum frequency in Hz. Defaults to 0.

fMax double?

Maximum frequency in Hz. Defaults to null.

Returns

T[,]

The mel filterbank matrix [numMels x (fftSize/2+1)].
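
To make the output shape concrete, here is a minimal sketch of one common way to build such a matrix: triangular filters whose edges are evenly spaced on the mel scale. It is not the library's implementation. It assumes the HTK mel formula, no area normalization, and that a null fMax means the Nyquist frequency (sampleRate / 2); all function names are hypothetical and double stands in for T.

```csharp
using System;

// HTK-style conversions (an assumption; other variants exist).
double HzToMelSketch(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);
double MelToHzSketch(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

double[,] MelFilterbankSketch(int numMels, int fftSize, int sampleRate,
                              double fMin = 0, double? fMax = null)
{
    double upper = fMax ?? sampleRate / 2.0; // assumption: null means Nyquist
    int numBins = fftSize / 2 + 1;
    double melMin = HzToMelSketch(fMin), melMax = HzToMelSketch(upper);

    // numMels + 2 edge frequencies, evenly spaced on the mel scale.
    var edges = new double[numMels + 2];
    for (int i = 0; i < edges.Length; i++)
        edges[i] = MelToHzSketch(melMin + (melMax - melMin) * i / (numMels + 1));

    var weights = new double[numMels, numBins];
    for (int m = 0; m < numMels; m++)
    {
        double lo = edges[m], center = edges[m + 1], hi = edges[m + 2];
        for (int k = 0; k < numBins; k++)
        {
            double f = (double)k * sampleRate / fftSize; // bin frequency in Hz
            double rising = (f - lo) / (center - lo);
            double falling = (hi - f) / (hi - center);
            weights[m, k] = Math.Max(0.0, Math.Min(rising, falling)); // triangle
        }
    }
    return weights;
}

double[,] bank = MelFilterbankSketch(numMels: 40, fftSize: 512, sampleRate: 16000);
Console.WriteLine($"{bank.GetLength(0)} x {bank.GetLength(1)}"); // 40 x 257
```

Multiplying this matrix by a power spectrum of length fftSize/2+1 yields numMels band energies per frame.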

Extract(Tensor<T>)

Extracts features from an audio waveform.

public abstract Tensor<T> Extract(Tensor<T> audio)

Parameters

audio Tensor<T>

The audio waveform as a 1D tensor [samples].

Returns

Tensor<T>

Features as a 2D tensor [frames, features].

Extract(Vector<T>)

Extracts features from an audio waveform.

public virtual Matrix<T> Extract(Vector<T> audio)

Parameters

audio Vector<T>

The audio waveform as a Vector.

Returns

Matrix<T>

Features as a Matrix [frames x features].

ExtractAsync(Tensor<T>, CancellationToken)

Extracts features from an audio waveform asynchronously.

public virtual Task<Tensor<T>> ExtractAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>

The audio waveform as a 1D tensor [samples].

cancellationToken CancellationToken

Cancellation token.

Returns

Task<Tensor<T>>

Features as a 2D tensor [frames, features].
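
How the default implementation schedules the work is not documented on this page. One plausible pattern, sketched below under that assumption, is to run the synchronous extraction on the thread pool and check the token before starting. ExtractSketch and ExtractAsyncSketch are hypothetical stand-ins that use double[] in place of Tensor<T>.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical synchronous extractor standing in for Extract(Tensor<T>).
double[] ExtractSketch(double[] audio)
{
    var energy = new double[audio.Length];
    for (int i = 0; i < audio.Length; i++) energy[i] = audio[i] * audio[i];
    return energy;
}

// Runs the synchronous call on the thread pool and honours cancellation
// that was requested before the work starts.
Task<double[]> ExtractAsyncSketch(double[] audio, CancellationToken cancellationToken = default)
{
    return Task.Run(() =>
    {
        cancellationToken.ThrowIfCancellationRequested();
        return ExtractSketch(audio);
    }, cancellationToken);
}

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
double[] features = await ExtractAsyncSketch(new[] { 1.0, -2.0, 3.0 }, cts.Token);
Console.WriteLine(string.Join(" ", features)); // 1 4 9
```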

ExtractFrame(T[], int, T[])

Extracts a single frame from the audio signal.

protected T[] ExtractFrame(T[] audio, int startIndex, T[] window)

Parameters

audio T[]

The audio data.

startIndex int

The start index of the frame.

window T[]

The window function to apply.

Returns

T[]

The windowed frame.
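
A windowed frame is typically the element-wise product of a slice of the signal and the window. The sketch below illustrates that idea with double in place of T. Treating samples beyond the end of the signal as zero is an assumption, as is the function name; the library may handle the tail differently.

```csharp
using System;

// Copies window.Length samples starting at startIndex and multiplies them by
// the window. Samples outside the signal are treated as zero (an assumption).
double[] ExtractFrameSketch(double[] audio, int startIndex, double[] window)
{
    var result = new double[window.Length];
    for (int i = 0; i < window.Length; i++)
    {
        int src = startIndex + i;
        double sample = (src >= 0 && src < audio.Length) ? audio[src] : 0.0;
        result[i] = sample * window[i];
    }
    return result;
}

double[] signal = { 1, 2, 3, 4, 5 };
double[] taper = { 0.5, 1.0, 0.5 };
double[] lastFrame = ExtractFrameSketch(signal, 3, taper);
Console.WriteLine(string.Join(" ", lastFrame)); // 2 5 0
```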

HzToMel(double)

Converts frequency in Hz to mel scale.

protected static double HzToMel(double hz)

Parameters

hz double

Frequency in Hz.

Returns

double

Frequency in mel scale.

MelToHz(double)

Converts mel scale frequency to Hz.

protected static double MelToHz(double mel)

Parameters

mel double

Frequency in mel scale.

Returns

double

Frequency in Hz.
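
The page does not say which mel formula HzToMel and MelToHz use. As an illustration, the sketch below uses the widely cited HTK variant, mel = 2595 · log10(1 + hz / 700), and its inverse; the library may use another variant (for example, Slaney), so treat the constants and the function names as assumptions.

```csharp
using System;

// HTK-style formulas; the library may use a different variant.
double HzToMelSketch(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);
double MelToHzSketch(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

double melAt1k = HzToMelSketch(1000.0);    // close to 1000 mel by construction
double roundTrip = MelToHzSketch(melAt1k); // back to roughly 1000 Hz
Console.WriteLine($"{melAt1k} mel -> {roundTrip} Hz");
```

The two functions are exact inverses of each other, which is what lets a filterbank be laid out evenly in mel and then mapped back to Hz.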

PadAudioCenter(T[])

Pads audio for center-aligned frames.

protected T[] PadAudioCenter(T[] audio)

Parameters

audio T[]

The audio data.

Returns

T[]

The padded audio.
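
The padding mode and amount are not specified here. A common approach in STFT pipelines is to reflect-pad the signal by half the FFT size on each side so that the first frame is centered on sample 0. The sketch below shows that approach as an assumption, not as the library's behavior; the function name and the explicit pad parameter are hypothetical, and double stands in for T.

```csharp
using System;

// Reflect-pads by `pad` samples on each side. The reflection excludes the
// edge sample itself, matching NumPy's "reflect" mode.
double[] PadCenterSketch(double[] audio, int pad)
{
    if (pad < 0 || (pad > 0 && audio.Length <= pad))
        throw new ArgumentException("Reflect padding needs audio.Length > pad.");
    var padded = new double[audio.Length + 2 * pad];
    Array.Copy(audio, 0, padded, pad, audio.Length);
    for (int i = 0; i < pad; i++)
    {
        padded[pad - 1 - i] = audio[i + 1];                           // left mirror
        padded[pad + audio.Length + i] = audio[audio.Length - 2 - i]; // right mirror
    }
    return padded;
}

double[] paddedSignal = PadCenterSketch(new double[] { 1, 2, 3, 4 }, 2);
Console.WriteLine(string.Join(" ", paddedSignal)); // 3 2 1 2 3 4 3 2
```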