Class AudioFeatureExtractorBase<T>
Base class for audio feature extractors providing common functionality.
public abstract class AudioFeatureExtractorBase<T> : IAudioFeatureExtractor<T>
Type Parameters
T: The numeric type used for calculations.
- Inheritance
- object
AudioFeatureExtractorBase<T>
- Implements
- IAudioFeatureExtractor<T>
Remarks
This base class provides:
- Common audio processing utilities (windowing, framing)
- Numeric operations through INumericOperations<T>
- Sample rate and FFT configuration
Constructors
AudioFeatureExtractorBase(AudioFeatureOptions?)
Initializes a new instance of the AudioFeatureExtractorBase class.
protected AudioFeatureExtractorBase(AudioFeatureOptions? options = null)
Parameters
options (AudioFeatureOptions): The feature extraction options.
Fields
NumOps
Numeric operations for the current type.
protected readonly INumericOperations<T> NumOps
Field Value
- INumericOperations<T>
Options
The audio feature extraction options.
protected readonly AudioFeatureOptions Options
Field Value
- AudioFeatureOptions
Properties
FeatureDimension
Gets the number of features produced per frame.
public abstract int FeatureDimension { get; }
Property Value
- int
FftSize
Gets the FFT size.
protected int FftSize { get; }
Property Value
- int
HopLength
Gets the hop length between frames.
protected int HopLength { get; }
Property Value
- int
Name
Gets the name of this feature extractor.
public abstract string Name { get; }
Property Value
- string
SampleRate
Gets the sample rate expected by this extractor.
public int SampleRate { get; }
Property Value
- int
WindowLength
Gets the window length.
protected int WindowLength { get; }
Property Value
- int
Methods
ComputeNumFrames(int)
Computes the number of frames that will be produced for a given audio length.
protected int ComputeNumFrames(int audioLength)
Parameters
audioLength (int): The number of audio samples.
Returns
- int
The number of frames.
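The reference does not state which formula ComputeNumFrames uses. As a rough sketch only, the two conventions most often seen in audio framing are shown below: counting frames that fit entirely inside the signal, and counting frames after half a window of padding has been added on each side (assuming the window length equals the FFT size). The class and method names are hypothetical and not part of the library.

```csharp
using System;

// Hypothetical helper, not part of the library: two common frame-count conventions.
public static class FrameCountSketch
{
    // Counts frames that fit entirely inside the signal (no padding).
    public static int WithoutPadding(int audioLength, int windowLength, int hopLength)
    {
        if (hopLength <= 0) throw new ArgumentOutOfRangeException(nameof(hopLength));
        if (audioLength < windowLength) return 0;
        return 1 + (audioLength - windowLength) / hopLength;
    }

    // Counts frames when the signal is padded by half a window on each side,
    // so the padded length is audioLength + windowLength.
    public static int WithCenterPadding(int audioLength, int hopLength)
    {
        if (hopLength <= 0) throw new ArgumentOutOfRangeException(nameof(hopLength));
        if (audioLength < 0) throw new ArgumentOutOfRangeException(nameof(audioLength));
        return 1 + audioLength / hopLength;
    }
}
```

For one second of 16 kHz audio with a 400-sample window and a 160-sample hop, the first convention gives 98 frames and the second gives 101. Which one matches this class depends on whether the extractor pads its input.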
CreateHammingWindow(int)
Creates a Hamming window of the specified length.
protected T[] CreateHammingWindow(int length)
Parameters
length (int): The window length.
Returns
- T[]
The Hamming window coefficients.
CreateHannWindow(int)
Creates a Hann window of the specified length.
protected T[] CreateHannWindow(int length)
Parameters
length (int): The window length.
Returns
- T[]
The Hann window coefficients.
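Hamming and Hann windows are both raised-cosine windows that differ only in one coefficient. The sketch below shows the textbook symmetric form using double instead of the generic T; whether the library uses the symmetric or the periodic variant is not stated in this reference, so treat the denominator (length - 1) as an assumption. WindowSketch and its members are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of the library: textbook symmetric cosine windows.
public static class WindowSketch
{
    // w[n] = a0 - (1 - a0) * cos(2*pi*n / (length - 1))
    public static double[] CosineWindow(int length, double a0)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
        var w = new double[length];
        if (length == 1) { w[0] = 1.0; return w; } // avoid division by zero
        for (int n = 0; n < length; n++)
        {
            w[n] = a0 - (1.0 - a0) * Math.Cos(2.0 * Math.PI * n / (length - 1));
        }
        return w;
    }

    // Hann: a0 = 0.5, so the window tapers to exactly zero at both ends.
    public static double[] Hann(int length) => CosineWindow(length, 0.5);

    // Hamming: a0 = 0.54, so the endpoints stay at 0.08 rather than zero.
    public static double[] Hamming(int length) => CosineWindow(length, 0.54);
}
```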
CreateMelFilterbank(int, int, int, double, double?)
Creates a mel filterbank matrix.
protected T[,] CreateMelFilterbank(int numMels, int fftSize, int sampleRate, double fMin = 0, double? fMax = null)
Parameters
numMels (int): Number of mel filters.
fftSize (int): FFT size.
sampleRate (int): Sample rate.
fMin (double): Minimum frequency.
fMax (double?): Maximum frequency.
Returns
- T[,]
The mel filterbank matrix [numMels x (fftSize/2+1)].
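To make the returned shape concrete, here is a minimal sketch of one common way to build a triangular mel filterbank with the same parameters and the same [numMels x (fftSize/2+1)] layout. It uses double instead of T and makes several assumptions that this reference does not confirm: the HTK-style mel formula, a default fMax equal to the Nyquist frequency when null is passed, and unnormalised triangles that peak at 1. MelFilterbankSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper, not part of the library: unnormalised triangular mel filters.
public static class MelFilterbankSketch
{
    // HTK-style conversions (an assumption; other mel variants exist).
    private static double HzToMel(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);
    private static double MelToHz(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

    public static double[,] Create(int numMels, int fftSize, int sampleRate,
                                   double fMin = 0, double? fMax = null)
    {
        double maxHz = fMax ?? sampleRate / 2.0; // assumed default: Nyquist
        int numBins = fftSize / 2 + 1;

        // numMels + 2 edge frequencies, equally spaced on the mel scale.
        var edgesHz = new double[numMels + 2];
        double melMin = HzToMel(fMin);
        double melMax = HzToMel(maxHz);
        for (int i = 0; i < edgesHz.Length; i++)
        {
            edgesHz[i] = MelToHz(melMin + (melMax - melMin) * i / (numMels + 1));
        }

        var filterbank = new double[numMels, numBins];
        for (int m = 0; m < numMels; m++)
        {
            double left = edgesHz[m], center = edgesHz[m + 1], right = edgesHz[m + 2];
            for (int k = 0; k < numBins; k++)
            {
                double freq = (double)k * sampleRate / fftSize; // centre frequency of bin k
                double rising = (freq - left) / (center - left);
                double falling = (right - freq) / (right - center);
                // Triangle: the lower of the two slopes, clipped at zero.
                filterbank[m, k] = Math.Max(0.0, Math.Min(rising, falling));
            }
        }
        return filterbank;
    }
}
```

Multiplying this matrix by a power spectrum of length fftSize/2+1 yields numMels band energies per frame.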
Extract(Tensor<T>)
Extracts features from an audio waveform.
public abstract Tensor<T> Extract(Tensor<T> audio)
Parameters
audio (Tensor<T>): The audio waveform as a 1D tensor [samples].
Returns
- Tensor<T>
Features as a 2D tensor [frames, features].
Extract(Vector<T>)
Extracts features from an audio waveform.
public virtual Matrix<T> Extract(Vector<T> audio)
Parameters
audio (Vector<T>): The audio waveform as a Vector.
Returns
- Matrix<T>
Features as a Matrix [frames x features].
ExtractAsync(Tensor<T>, CancellationToken)
Extracts features from an audio waveform asynchronously.
public virtual Task<Tensor<T>> ExtractAsync(Tensor<T> audio, CancellationToken cancellationToken = default)
Parameters
audio (Tensor<T>): The audio waveform.
cancellationToken (CancellationToken): Cancellation token.
Returns
- Task<Tensor<T>>
Features as a 2D tensor [frames, features].
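Because ExtractAsync is virtual, derived classes can replace it. The reference does not describe the default behaviour; one plausible pattern, sketched here with hypothetical names and generic placeholder types rather than the library's Tensor<T>, is to run a synchronous extraction function on the thread pool and honour the cancellation token before the work starts.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the library: wraps a synchronous extractor in a Task.
public static class AsyncWrapSketch
{
    public static Task<TResult> RunAsync<TInput, TResult>(
        Func<TInput, TResult> extract,
        TInput audio,
        CancellationToken cancellationToken = default)
    {
        if (extract == null) throw new ArgumentNullException(nameof(extract));
        return Task.Run(() =>
        {
            // Give up early if cancellation was requested before the work began.
            cancellationToken.ThrowIfCancellationRequested();
            return extract(audio);
        }, cancellationToken);
    }
}
```

This only checks for cancellation once; a long-running extractor would need to check the token inside its frame loop to cancel promptly.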
ExtractFrame(T[], int, T[])
Extracts a single frame from the audio signal.
protected T[] ExtractFrame(T[] audio, int startIndex, T[] window)
Parameters
audio (T[]): The audio data.
startIndex (int): The start index of the frame.
window (T[]): The window function to apply.
Returns
- T[]
The windowed frame.
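As an illustration of what extracting a windowed frame typically involves, the sketch below copies window.Length samples starting at startIndex and multiplies each by the matching window coefficient. Treating samples past the end of the audio as zero is an assumption; the reference does not say how the library handles a frame that runs off the end. FrameSketch is a hypothetical name and double stands in for T.

```csharp
using System;

// Hypothetical helper, not part of the library: slice a frame and apply a window.
public static class FrameSketch
{
    public static double[] ExtractFrame(double[] audio, int startIndex, double[] window)
    {
        if (audio == null) throw new ArgumentNullException(nameof(audio));
        if (window == null) throw new ArgumentNullException(nameof(window));

        var frame = new double[window.Length];
        for (int i = 0; i < window.Length; i++)
        {
            int index = startIndex + i;
            // Assumed behaviour: samples outside the signal count as silence.
            double sample = (index >= 0 && index < audio.Length) ? audio[index] : 0.0;
            frame[i] = sample * window[i];
        }
        return frame;
    }
}
```

With audio {1, 2, 3, 4, 5}, startIndex 3 and window {0.5, 1, 0.5}, the result is {2, 5, 0}: the last element falls outside the signal and is zero-filled.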
HzToMel(double)
Converts frequency in Hz to mel scale.
protected static double HzToMel(double hz)
Parameters
hz (double): Frequency in Hz.
Returns
- double
Frequency in mel scale.
MelToHz(double)
Converts mel scale frequency to Hz.
protected static double MelToHz(double mel)
Parameters
mel (double): Frequency in mel scale.
Returns
- double
Frequency in Hz.
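Several mel-scale definitions are in use, and this reference does not say which one HzToMel and MelToHz implement. The sketch below uses the widely cited HTK-style formula, mel = 2595 * log10(1 + hz / 700), together with its inverse; treat the choice of formula as an assumption. MelScaleSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper, not part of the library: HTK-style mel conversions.
public static class MelScaleSketch
{
    public static double HzToMel(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);

    // Exact inverse of HzToMel.
    public static double MelToHz(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);
}
```

Under this formula 0 Hz maps to 0 mel and 1000 Hz maps to roughly 1000 mel, and converting to mel and back returns the original frequency up to floating-point error.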
PadAudioCenter(T[])
Pads audio for center-aligned frames.
protected T[] PadAudioCenter(T[] audio)
Parameters
audio (T[]): The audio data.
Returns
- T[]
The padded audio.
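Center padding extends the signal on both sides so that each frame is centred on its nominal sample position. The reference does not state the padding amount or mode; the sketch below assumes reflect padding with an explicit pad length (half the FFT size is a typical choice), which is one common option alongside zero padding. CenterPadSketch is a hypothetical name and double stands in for T.

```csharp
using System;

// Hypothetical helper, not part of the library: reflect-pad a signal on both sides.
public static class CenterPadSketch
{
    public static double[] PadReflect(double[] audio, int pad)
    {
        if (audio == null) throw new ArgumentNullException(nameof(audio));
        if (pad < 0) throw new ArgumentOutOfRangeException(nameof(pad));
        if (pad > 0 && audio.Length <= pad)
            throw new ArgumentException("Reflect padding needs more than 'pad' samples.", nameof(audio));

        var padded = new double[audio.Length + 2 * pad];
        Array.Copy(audio, 0, padded, pad, audio.Length);
        for (int i = 0; i < pad; i++)
        {
            // Mirror around the first and last samples without repeating them.
            padded[pad - 1 - i] = audio[i + 1];
            padded[pad + audio.Length + i] = audio[audio.Length - 2 - i];
        }
        return padded;
    }
}
```

Padding {1, 2, 3, 4, 5} with pad = 2 gives {3, 2, 1, 2, 3, 4, 5, 4, 3}.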