Class MfccExtractor<T>

Namespace
AiDotNet.Audio.Features
Assembly
AiDotNet.dll

Extracts Mel-Frequency Cepstral Coefficients (MFCCs) from audio signals.

public class MfccExtractor<T> : AudioFeatureExtractorBase<T>, IAudioFeatureExtractor<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
AudioFeatureExtractorBase<T>
MfccExtractor<T>
Implements
IAudioFeatureExtractor<T>

Remarks

MFCCs are a compact representation of the spectral envelope of an audio signal. They are widely used in speech recognition, speaker identification, and music analysis.

For Beginners: MFCCs summarize the overall "shape" of a sound's frequency content on a scale that approximates how humans perceive pitch. The extraction process:

  1. Compute the Mel spectrogram (the power spectrum mapped onto the perceptual Mel scale)
  2. Take the log (approximates human loudness perception)
  3. Apply the DCT (decorrelates and compresses the information)
  4. Keep only the first N coefficients (typically 13 to 40)
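Steps 2 to 4 can be sketched for a single frame as follows. This is a minimal illustration, not the code MfccExtractor<T> uses: MfccSketch, HzToMel, and FromMelEnergies are hypothetical names, the Hz-to-Mel formula is the common HTK-style variant, and the DCT-II is left unnormalized. The library may differ on each of these points.

```csharp
using System;

public static class MfccSketch
{
    // Common HTK-style Hz-to-Mel mapping (an assumption; other variants exist).
    public static double HzToMel(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);

    // Steps 2-4 for one frame: log of the Mel-band energies, DCT-II,
    // and truncation to the first numCoefficients values.
    public static double[] FromMelEnergies(double[] melEnergies, int numCoefficients)
    {
        int numBands = melEnergies.Length;
        if (numCoefficients > numBands)
            throw new ArgumentException("numCoefficients cannot exceed the number of Mel bands.");

        // Step 2: log compression; a small floor avoids log(0) on silent bands.
        var logMel = new double[numBands];
        for (int m = 0; m < numBands; m++)
            logMel[m] = Math.Log(Math.Max(melEnergies[m], 1e-10));

        // Steps 3 and 4: unnormalized DCT-II, computing only the coefficients that are kept.
        var coeffs = new double[numCoefficients];
        for (int k = 0; k < numCoefficients; k++)
        {
            double sum = 0.0;
            for (int m = 0; m < numBands; m++)
                sum += logMel[m] * Math.Cos(Math.PI * k * (m + 0.5) / numBands);
            coeffs[k] = sum;
        }
        return coeffs;
    }
}
```

For a perfectly flat Mel spectrum, every coefficient except the first comes out as zero (up to rounding), which shows how the DCT concentrates the spectral shape into a few low-order values.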

Why MFCCs work well for speech:

  • They capture formant frequencies (vocal tract resonances)
  • They smooth over pitch and fine harmonic detail, although additive background noise can still degrade them
  • They compress audio information efficiently

Usage:

var mfcc = new MfccExtractor<float>(new MfccOptions { NumCoefficients = 13 });
var features = mfcc.Extract(audioTensor);
// features.Shape = [numFrames, 13]

Constructors

MfccExtractor(MfccOptions?)

Initializes a new MFCC extractor.

public MfccExtractor(MfccOptions? options = null)

Parameters

options MfccOptions

MFCC extraction options. Optional; defaults to null.

Properties

FeatureDimension

Gets the number of features produced per frame.

public override int FeatureDimension { get; }

Property Value

int

Name

Gets the name of this feature extractor.

public override string Name { get; }

Property Value

string

Methods

Extract(Tensor<T>)

Extracts features from an audio waveform.

public override Tensor<T> Extract(Tensor<T> audio)

Parameters

audio Tensor<T>

The audio waveform as a 1D tensor [samples].

Returns

Tensor<T>

MFCC features as a 2D tensor [frames, features], where the second dimension equals FeatureDimension.
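The size of the frames dimension depends on how the waveform is split into frames. As a rough guide, the sketch below counts the full frames that fit when no padding is applied. FrameMath, frameLength, and hopLength are hypothetical names used for illustration; they are not confirmed members of MfccOptions, and the extractor may pad or center frames, which would change the count.

```csharp
using System;

public static class FrameMath
{
    // Number of full frames that fit into numSamples without padding.
    // frameLength and hopLength are illustrative parameters, not library API.
    public static int CountFrames(int numSamples, int frameLength, int hopLength)
    {
        if (frameLength <= 0 || hopLength <= 0)
            throw new ArgumentException("frameLength and hopLength must be positive.");

        // Too short for even one full frame.
        if (numSamples < frameLength)
            return 0;

        // One frame at offset 0, plus one for every whole hop that still fits.
        return 1 + (numSamples - frameLength) / hopLength;
    }
}
```

Under these assumptions, one second of 16 kHz audio (16000 samples) with a 400-sample frame and a 160-sample hop gives 98 frames, so 13 coefficients per frame would correspond to a [98, 13] result.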