Class MfccExtractor<T>

Namespace: AiDotNet.Audio.Features

Assembly: AiDotNet.dll

Extracts Mel-Frequency Cepstral Coefficients (MFCCs) from audio signals.

public class MfccExtractor<T> : AudioFeatureExtractorBase<T>, IAudioFeatureExtractor<T>

Type Parameters

T: The numeric type used for calculations.

Inheritance: object → AudioFeatureExtractorBase<T> → MfccExtractor<T>

Implements: IAudioFeatureExtractor<T>

Inherited Members: AudioFeatureExtractorBase<T>.NumOps

AudioFeatureExtractorBase<T>.Options

AudioFeatureExtractorBase<T>.SampleRate

AudioFeatureExtractorBase<T>.FftSize

AudioFeatureExtractorBase<T>.HopLength

AudioFeatureExtractorBase<T>.WindowLength

AudioFeatureExtractorBase<T>.Extract(Vector<T>)

AudioFeatureExtractorBase<T>.ExtractAsync(Tensor<T>, CancellationToken)

AudioFeatureExtractorBase<T>.ComputeNumFrames(int)

AudioFeatureExtractorBase<T>.CreateHannWindow(int)

AudioFeatureExtractorBase<T>.CreateHammingWindow(int)

AudioFeatureExtractorBase<T>.ExtractFrame(T[], int, T[])

AudioFeatureExtractorBase<T>.PadAudioCenter(T[])

AudioFeatureExtractorBase<T>.HzToMel(double)

AudioFeatureExtractorBase<T>.MelToHz(double)

AudioFeatureExtractorBase<T>.CreateMelFilterbank(int, int, int, double, double?)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

MFCCs are a compact representation of the spectral envelope of an audio signal. They are widely used in speech recognition, speaker identification, and music analysis.

For Beginners: MFCCs capture the "shape" of the audio's frequency content, similar to how humans perceive sound. The process:

1. Compute the Mel spectrogram (the power spectrum mapped onto a perceptual frequency scale)
2. Take the logarithm (approximates human loudness perception)
3. Apply the discrete cosine transform (DCT), which decorrelates and compresses the information
4. Keep only the first N coefficients (typically 13-40)
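
The log, DCT, and truncation steps above can be sketched in a few lines of plain C#. This is a minimal illustration, not AiDotNet's implementation: the class name MfccSketch and the method LogDct are hypothetical, the mel formula shown is the common HTK variant (the library's HzToMel may use a different one), and the DCT-II is left unnormalized.

```csharp
using System;

// Illustrative sketch only; names and conventions here are assumptions,
// not the AiDotNet implementation.
public static class MfccSketch
{
    // HTK-style mel scale (assumption: the library may use another variant).
    public static double HzToMel(double hz) =>
        2595.0 * Math.Log10(1.0 + hz / 700.0);

    // Inverse of HzToMel above.
    public static double MelToHz(double mel) =>
        700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

    // Takes mel-band energies for one frame, log-compresses them,
    // applies a DCT-II, and keeps the first numCoefficients values.
    public static double[] LogDct(double[] melEnergies, int numCoefficients)
    {
        int numBands = melEnergies.Length;
        if (numCoefficients < 1 || numCoefficients > numBands)
            throw new ArgumentOutOfRangeException(nameof(numCoefficients));

        // Natural log with a small floor so zero energy does not give -infinity.
        var logEnergies = new double[numBands];
        for (int m = 0; m < numBands; m++)
            logEnergies[m] = Math.Log(Math.Max(melEnergies[m], 1e-10));

        // Unnormalized DCT-II, computing only the coefficients that are kept.
        var coeffs = new double[numCoefficients];
        for (int k = 0; k < numCoefficients; k++)
        {
            double sum = 0.0;
            for (int m = 0; m < numBands; m++)
                sum += logEnergies[m] * Math.Cos(Math.PI * k * (m + 0.5) / numBands);
            coeffs[k] = sum;
        }
        return coeffs;
    }
}
```

Coefficient 0 is the sum of the log energies, so it tracks overall loudness; higher coefficients describe progressively finer detail of the spectral envelope, which is why truncating to the first N keeps the coarse shape.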

Why MFCCs work well for speech:

They capture formant frequencies (vocal tract resonances)
They're reasonably stable under small spectral variations, though heavy background noise can still degrade them
They compress audio information efficiently

Usage:

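// audioTensor: a 1D Tensor<float> of audio samples, shape [samples] (assumed to be loaded elsewhere).
var mfcc = new MfccExtractor<float>(new MfccOptions { NumCoefficients = 13 });
var features = mfcc.Extract(audioTensor);
// features has shape [numFrames, 13]: one row of 13 coefficients per frame.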

Constructors

MfccExtractor(MfccOptions?)

Initializes a new MFCC extractor.

public MfccExtractor(MfccOptions? options = null)

Parameters

options MfccOptions?: MFCC extraction options. Optional; defaults to null.

Properties

FeatureDimension

Gets the number of features produced per frame.

public override int FeatureDimension { get; }

Property Value

int

Name

Gets the name of this feature extractor.

public override string Name { get; }

Property Value

string

Methods

Extract(Tensor<T>)

Extracts features from an audio waveform.

public override Tensor<T> Extract(Tensor<T> audio)

Parameters

audio Tensor<T>: The audio waveform as a 1D tensor [samples].

Returns

Tensor<T>: Features as a 2D tensor [frames, features].
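
To give a feel for how the frames dimension of the returned tensor relates to the input length, here is a small sketch of two common framing conventions. It is an assumption, not the documented behavior: this page does not state which formula ComputeNumFrames uses, and the class FrameCountSketch and its methods are hypothetical.

```csharp
using System;

// Illustrative sketch only; the actual ComputeNumFrames formula is not
// documented on this page, so both conventions below are assumptions.
public static class FrameCountSketch
{
    // No padding: one frame for every full window that fits in the signal.
    public static int NumFramesUnpadded(int numSamples, int windowLength, int hopLength)
    {
        if (windowLength <= 0) throw new ArgumentOutOfRangeException(nameof(windowLength));
        if (hopLength <= 0) throw new ArgumentOutOfRangeException(nameof(hopLength));
        if (numSamples < windowLength) return 0;
        return 1 + (numSamples - windowLength) / hopLength;
    }

    // Centered padding (half a window added on each side): frame i is
    // centered on sample i * hopLength.
    public static int NumFramesCentered(int numSamples, int hopLength)
    {
        if (hopLength <= 0) throw new ArgumentOutOfRangeException(nameof(hopLength));
        if (numSamples < 0) throw new ArgumentOutOfRangeException(nameof(numSamples));
        return 1 + numSamples / hopLength;
    }
}
```

For one second of 16 kHz audio with a 400-sample window and a 160-sample hop, the unpadded convention gives 98 frames and the centered convention gives 101, so the exact row count of the output depends on which convention the extractor applies.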
