Class MfccExtractor<T>
Extracts Mel-Frequency Cepstral Coefficients (MFCCs) from audio signals.
public class MfccExtractor<T> : AudioFeatureExtractorBase<T>, IAudioFeatureExtractor<T>
Type Parameters
- T
The numeric type used for calculations.
- Inheritance
- AudioFeatureExtractorBase<T>
MfccExtractor<T>
- Implements
- IAudioFeatureExtractor<T>
Remarks
MFCCs are a compact representation of the spectral envelope of an audio signal. They are widely used in speech recognition, speaker identification, and music analysis.
For Beginners: MFCCs capture the "shape" of the audio's frequency content, similar to how humans perceive sound. The process:
- Compute the Mel spectrogram (the power spectrum mapped onto a perceptual frequency scale)
- Take the log (matches human loudness perception)
- Apply DCT (decorrelates and compresses the information)
- Keep only the first N coefficients (typically 13-40)
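The last three steps above can be sketched for a single frame as follows. This is a minimal illustration, not the code inside MfccExtractor<T>: the class MfccSketch and the method FromMelEnergies are hypothetical names, the Mel-band energies from the first step are assumed to be already computed, and the DCT is left unnormalized (the library may scale or lifter its coefficients differently).

```csharp
using System;

public static class MfccSketch
{
    // Hypothetical helper, not part of MfccExtractor<T>: turns one frame of
    // Mel-band energies into its first numCoefficients cepstral coefficients.
    public static double[] FromMelEnergies(double[] melEnergies, int numCoefficients)
    {
        int numBands = melEnergies.Length;
        if (numCoefficients < 1 || numCoefficients > numBands)
            throw new ArgumentOutOfRangeException(nameof(numCoefficients));

        // Log compression. The small floor avoids log(0) on silent bands.
        var logMel = new double[numBands];
        for (int m = 0; m < numBands; m++)
            logMel[m] = Math.Log(Math.Max(melEnergies[m], 1e-10));

        // Unnormalized DCT-II, evaluated only for the first numCoefficients
        // outputs, which is the same as computing all of them and truncating.
        var mfcc = new double[numCoefficients];
        for (int k = 0; k < numCoefficients; k++)
        {
            double sum = 0.0;
            for (int m = 0; m < numBands; m++)
                sum += logMel[m] * Math.Cos(Math.PI * k * (m + 0.5) / numBands);
            mfcc[k] = sum;
        }
        return mfcc;
    }
}
```

With this formulation, a frame whose Mel bands all carry the same energy puts everything into coefficient 0 and leaves the higher coefficients at zero, which is why the higher coefficients describe the shape of the spectrum rather than its overall level.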
Why MFCCs work well for speech:
- They capture formant frequencies (vocal tract resonances)
- They tolerate small spectral variations well, although strong additive background noise can still degrade them
- They compress audio information efficiently
Usage:
var mfcc = new MfccExtractor<float>(new MfccOptions { NumCoefficients = 13 });
var features = mfcc.Extract(audioTensor);
// features.Shape = [numFrames, 13]
Constructors
MfccExtractor(MfccOptions?)
Initializes a new MFCC extractor.
public MfccExtractor(MfccOptions? options = null)
Parameters
- options MfccOptions
MFCC extraction options.
Properties
FeatureDimension
Gets the number of features produced per frame.
public override int FeatureDimension { get; }
Property Value
- int
Name
Gets the name of this feature extractor.
public override string Name { get; }
Property Value
- string
Methods
Extract(Tensor<T>)
Extracts features from an audio waveform.
public override Tensor<T> Extract(Tensor<T> audio)
Parameters
- audio Tensor<T>
The audio waveform as a 1D tensor [samples].
Returns
- Tensor<T>
Features as a 2D tensor [frames, features].