Class AudioEnhancerBase<T>
- Namespace
- AiDotNet.Audio.Enhancement
- Assembly
- AiDotNet.dll
Base class for algorithmic (non-neural-network) audio enhancement.
public abstract class AudioEnhancerBase<T> : IAudioEnhancer<T>
Type Parameters
- T: The numeric type used for calculations.
- Inheritance
- object
- AudioEnhancerBase<T>
- Implements
- IAudioEnhancer<T>
Remarks
Provides common functionality for all audio enhancers including:
- Frame-based processing with overlap-add
- Streaming mode with state management
- STFT-based analysis/synthesis
Constructors
AudioEnhancerBase(int, int, int, int, double)
Initializes a new instance of the AudioEnhancerBase class.
protected AudioEnhancerBase(int sampleRate = 16000, int numChannels = 1, int fftSize = 512, int hopSize = 128, double enhancementStrength = 0.7)
Parameters
- sampleRate (int): Audio sample rate in Hz.
- numChannels (int): Number of audio channels.
- fftSize (int): FFT size for spectral analysis.
- hopSize (int): Hop size between frames.
- enhancementStrength (double): Enhancement strength, from 0.0 to 1.0.
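The default arguments fix the time and frequency resolution of the analysis. The short calculation below is only arithmetic on the values in the signature above; the variable names are illustrative and not part of the library.

```csharp
using System;

// Default constructor arguments, copied from the signature above.
int sampleRate = 16000;
int fftSize = 512;
int hopSize = 128;

// Duration of one analysis frame and of one hop, in milliseconds.
double frameMs = 1000.0 * fftSize / sampleRate;    // 32 ms
double hopMs = 1000.0 * hopSize / sampleRate;      // 8 ms

// Fraction of each frame shared with the next one.
double overlap = 1.0 - (double)hopSize / fftSize;  // 0.75

// Width of one FFT bin in Hz.
double binHz = (double)sampleRate / fftSize;       // 31.25 Hz

Console.WriteLine($"frame={frameMs} ms, hop={hopMs} ms, overlap={overlap}, bin={binHz} Hz");
```

A larger fftSize narrows the bins at the cost of longer frames; a smaller hopSize increases overlap and the number of frames processed per second.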
Fields
NumOps
Numeric operations for type T.
protected readonly INumericOperations<T> NumOps
Field Value
- INumericOperations<T>
_bufferPosition
Current position in input buffer.
protected int _bufferPosition
Field Value
- int
_fftSize
FFT size for spectral analysis.
protected readonly int _fftSize
Field Value
- int
_hopSize
Hop size between frames.
protected readonly int _hopSize
Field Value
- int
_inputBuffer
Input buffer for streaming mode.
protected T[]? _inputBuffer
Field Value
- T[]
_noiseProfile
Estimated noise profile for spectral subtraction.
protected T[]? _noiseProfile
Field Value
- T[]
_outputBuffer
Output buffer for overlap-add.
protected T[]? _outputBuffer
Field Value
- T[]
_window
Window function coefficients.
protected readonly T[] _window
Field Value
- T[]
Properties
EnhancementStrength
Gets or sets the enhancement strength (0.0 = no enhancement, 1.0 = maximum).
public double EnhancementStrength { get; set; }
Property Value
- double
Remarks
Higher values provide more noise reduction but may introduce artifacts. Start with 0.5-0.7 for natural-sounding results.
LatencySamples
Gets the processing latency in samples.
public int LatencySamples { get; }
Property Value
- int
Remarks
Important for real-time applications. Lower latency means a faster response but potentially lower-quality enhancement.
NumChannels
Gets the number of audio channels supported.
public int NumChannels { get; protected set; }
Property Value
- int
SampleRate
Gets the audio sample rate in Hz.
public int SampleRate { get; protected set; }
Property Value
- int
Methods
ComputeFFT(T[])
Computes the FFT of an audio frame using the FftSharp library (an O(N log N) algorithm).
protected virtual (T[] Magnitudes, T[] Phases) ComputeFFT(T[] frame)
Parameters
- frame (T[])
Returns
- (T[] Magnitudes, T[] Phases)
ComputeIFFT(T[], T[])
Computes the inverse FFT using the FftSharp library.
protected virtual T[] ComputeIFFT(T[] magnitudes, T[] phases)
Parameters
- magnitudes (T[])
- phases (T[])
Returns
- T[]
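The magnitude/phase split used by ComputeFFT and ComputeIFFT can be illustrated with a dependency-free round trip. This is a minimal sketch, not the library's code: it swaps in a naive O(N^2) DFT for FftSharp, works on double rather than T, and the helper names Dft, ToPolar and FromPolar are hypothetical.

```csharp
using System;
using System.Numerics;

// Naive O(N^2) DFT, used here only to keep the sketch dependency-free.
static Complex[] Dft(Complex[] x, bool inverse)
{
    int n = x.Length;
    double sign = inverse ? 1.0 : -1.0;
    var y = new Complex[n];
    for (int k = 0; k < n; k++)
    {
        Complex sum = Complex.Zero;
        for (int t = 0; t < n; t++)
        {
            double angle = sign * 2.0 * Math.PI * k * t / n;
            sum += x[t] * Complex.FromPolarCoordinates(1.0, angle);
        }
        // The inverse transform carries the 1/N scaling.
        y[k] = inverse ? sum / (double)n : sum;
    }
    return y;
}

// Analysis: split each complex bin into magnitude and phase.
static (double[] Magnitudes, double[] Phases) ToPolar(double[] frame)
{
    var input = Array.ConvertAll(frame, v => new Complex(v, 0.0));
    Complex[] spectrum = Dft(input, inverse: false);
    return (Array.ConvertAll(spectrum, c => c.Magnitude),
            Array.ConvertAll(spectrum, c => c.Phase));
}

// Synthesis: rebuild complex bins from magnitude and phase, then invert.
static double[] FromPolar(double[] magnitudes, double[] phases)
{
    var spectrum = new Complex[magnitudes.Length];
    for (int k = 0; k < spectrum.Length; k++)
        spectrum[k] = Complex.FromPolarCoordinates(magnitudes[k], phases[k]);
    Complex[] time = Dft(spectrum, inverse: true);
    return Array.ConvertAll(time, c => c.Real);
}

double[] samples = { 0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, -0.25 };
var (mags, phs) = ToPolar(samples);
double[] rebuilt = FromPolar(mags, phs);
Console.WriteLine(string.Join(", ", rebuilt));
```

Keeping the phases unchanged and modifying only the magnitudes before synthesis is the pattern that ProcessSpectralFrame's signature suggests.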
CreateHannWindow(int)
Creates a Hann window of the specified size.
protected T[] CreateHannWindow(int size)
Parameters
- size (int)
Returns
- T[]
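For reference, a Hann window can be computed as follows. This sketch uses the periodic form with double samples; whether CreateHannWindow uses the periodic (size) or symmetric (size - 1) denominator is not stated on this page, so treat that choice as an assumption. HannWindow is a hypothetical stand-alone helper.

```csharp
using System;

// Periodic Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / N)).
// Assumption: the periodic form; the library may use N - 1 instead.
static double[] HannWindow(int size)
{
    var w = new double[size];
    for (int n = 0; n < size; n++)
        w[n] = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * n / size));
    return w;
}

double[] window = HannWindow(512);
Console.WriteLine($"w[0]={window[0]}, w[256]={window[256]}");
```

With a hop of size / 4 (the 512/128 defaults), the four shifted copies of this window that cover any sample sum to a constant 2, which is what lets overlapping frames be summed without amplitude ripple.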
Enhance(Tensor<T>)
Enhances audio quality by reducing noise and artifacts.
public virtual Tensor<T> Enhance(Tensor<T> audio)
Parameters
- audio (Tensor<T>): Input audio tensor with shape [channels, samples] or [samples].
Returns
- Tensor<T>
Enhanced audio tensor with the same shape as input.
EnhanceWithReference(Tensor<T>, Tensor<T>)
Enhances audio with a reference signal for echo cancellation.
public virtual Tensor<T> EnhanceWithReference(Tensor<T> audio, Tensor<T> reference)
Parameters
- audio (Tensor<T>): Input audio (microphone signal).
- reference (Tensor<T>): Reference audio (speaker playback signal).
Returns
- Tensor<T>
Enhanced audio with echo removed.
Remarks
For Beginners: This is for video calls!
The problem: Your microphone picks up sound from your speakers, creating an echo for the other person.
Solution: We know what's playing from the speakers (reference), so we can subtract it from what the microphone picks up.
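The "subtract what the speaker played" idea is commonly implemented with an adaptive filter, because the echo reaches the microphone delayed and attenuated. The sketch below uses NLMS (normalized least mean squares) on plain double arrays. This page does not say which algorithm EnhanceWithReference uses, so NLMS, the CancelEcho helper, and its taps and mu parameters are all assumptions made for illustration.

```csharp
using System;

// NLMS adaptive filter: estimates the echo path from the reference
// (speaker) signal and subtracts the predicted echo from the microphone.
static double[] CancelEcho(double[] mic, double[] reference, int taps = 32, double mu = 0.5)
{
    var weights = new double[taps];   // estimated echo path
    var history = new double[taps];   // most recent reference samples
    var output = new double[mic.Length];
    const double eps = 1e-8;          // avoids division by zero

    for (int n = 0; n < mic.Length; n++)
    {
        // Shift the newest reference sample into the history buffer.
        Array.Copy(history, 0, history, 1, taps - 1);
        history[0] = reference[n];

        // Predict the echo and measure the energy of the history.
        double echo = 0.0, energy = eps;
        for (int i = 0; i < taps; i++)
        {
            echo += weights[i] * history[i];
            energy += history[i] * history[i];
        }

        // The residual is both the output and the adaptation error.
        double error = mic[n] - echo;
        output[n] = error;

        for (int i = 0; i < taps; i++)
            weights[i] += mu * error * history[i] / energy;
    }
    return output;
}

// Synthetic check: the microphone hears only a delayed, attenuated speaker.
var rng = new Random(42);
int length = 4000, delay = 3;
var speaker = new double[length];
var microphone = new double[length];
for (int s = 0; s < length; s++)
{
    speaker[s] = rng.NextDouble() * 2.0 - 1.0;
    microphone[s] = s >= delay ? 0.6 * speaker[s - delay] : 0.0;
}
double[] residual = CancelEcho(microphone, speaker);
Console.WriteLine($"last residual sample: {residual[length - 1]}");
```

In a real call the microphone also carries the near-end talker, so the residual would contain that speech rather than decaying toward zero.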
EstimateNoiseProfile(Tensor<T>)
Estimates the noise profile from a segment of audio.
public virtual void EstimateNoiseProfile(Tensor<T> noiseOnlyAudio)
Parameters
- noiseOnlyAudio (Tensor<T>): Audio containing only noise (no signal).
Remarks
For Beginners: Some enhancers work better if you tell them what the noise sounds like. Record a few seconds of "silence" (just the background noise) and pass it here.
EstimateNoiseSpectrum(T[])
Estimates noise spectrum from noise-only audio.
protected T[] EstimateNoiseSpectrum(T[] noiseAudio)
Parameters
- noiseAudio (T[])
Returns
- T[]
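One common way to turn a noise-only recording into a noise profile is to average the magnitude spectrum over all of its frames. The sketch below does that with a naive DFT on double samples. Whether the library windows the frames, averages magnitudes or powers, or returns fftSize or fftSize / 2 + 1 bins is not documented here, so those details and the helper names are assumptions.

```csharp
using System;

// Magnitudes of the non-negative-frequency bins of one frame,
// via a naive DFT (a stand-in for a real FFT).
static double[] MagnitudeSpectrum(double[] frame)
{
    int n = frame.Length;
    var mags = new double[n / 2 + 1];
    for (int k = 0; k < mags.Length; k++)
    {
        double re = 0.0, im = 0.0;
        for (int t = 0; t < n; t++)
        {
            double angle = -2.0 * Math.PI * k * t / n;
            re += frame[t] * Math.Cos(angle);
            im += frame[t] * Math.Sin(angle);
        }
        mags[k] = Math.Sqrt(re * re + im * im);
    }
    return mags;
}

// Average the per-frame magnitude spectra of a noise-only recording.
static double[] AverageNoiseSpectrum(double[] noise, int fftSize, int hopSize)
{
    var profile = new double[fftSize / 2 + 1];
    var frame = new double[fftSize];
    int frames = 0;
    for (int start = 0; start + fftSize <= noise.Length; start += hopSize)
    {
        Array.Copy(noise, start, frame, 0, fftSize);
        double[] mags = MagnitudeSpectrum(frame);
        for (int k = 0; k < profile.Length; k++)
            profile[k] += mags[k];
        frames++;
    }
    if (frames == 0)
        throw new ArgumentException("Need at least one full frame of noise.", nameof(noise));
    for (int k = 0; k < profile.Length; k++)
        profile[k] /= frames;
    return profile;
}

// A steady hum that sits exactly on bin 2 of a 16-point transform.
var hum = new double[64];
for (int s = 0; s < hum.Length; s++)
    hum[s] = Math.Sin(2.0 * Math.PI * 2 * s / 16);
double[] humProfile = AverageNoiseSpectrum(hum, fftSize: 16, hopSize: 4);
Console.WriteLine(string.Join(", ", humProfile));
```

Averaging over more frames gives a smoother estimate, which is why a few seconds of background noise works better than a single frame.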
ProcessChunk(Tensor<T>)
Processes audio in real-time streaming mode.
public virtual Tensor<T> ProcessChunk(Tensor<T> audioChunk)
Parameters
- audioChunk (Tensor<T>): A small chunk of audio for real-time processing.
Returns
- Tensor<T>
Enhanced audio chunk (may have latency).
Remarks
For real-time applications like video calls. The enhancer maintains internal state between calls for continuity.
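The state carried between calls can be pictured as a buffer of samples that have arrived but do not yet fill a frame. The sketch below shows only that buffering step, with small illustrative sizes; it is an assumption about the general pattern, not the library's implementation, and PushChunk and pending are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

int frameSize = 8;
int hopSize = 4;
var pending = new List<double>();   // samples carried over between calls

// Returns every complete frame the buffered samples allow. After each
// frame the buffer advances by one hop, so consecutive frames overlap.
List<double[]> PushChunk(double[] chunk)
{
    pending.AddRange(chunk);
    var frames = new List<double[]>();
    while (pending.Count >= frameSize)
    {
        frames.Add(pending.GetRange(0, frameSize).ToArray());
        pending.RemoveRange(0, hopSize);
    }
    return frames;
}

var first = PushChunk(new double[] { 0, 1, 2, 3, 4 });   // no full frame yet
var second = PushChunk(new double[] { 5, 6, 7, 8, 9 });  // one frame: samples 0..7
Console.WriteLine($"{first.Count} then {second.Count} frame(s), {pending.Count} samples waiting");
```

Because a frame can only be produced once enough samples have arrived, output lags input, which is the latency that LatencySamples reports. Clearing a buffer like this between unrelated streams is the kind of work ResetState exists for.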
ProcessOverlapAdd(T[])
Processes audio using the overlap-add method.
protected T[] ProcessOverlapAdd(T[] input)
Parameters
- input (T[])
Returns
- T[]
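Overlap-add cuts the signal into windowed, overlapping frames, processes each one, and sums the results back at their original positions. The sketch below is one minimal variant under stated assumptions: a periodic Hann analysis window, no synthesis window, and normalization by the summed window weights. The library's exact windowing and normalization are not documented on this page, and OverlapAdd and its processFrame callback are hypothetical.

```csharp
using System;

static double[] OverlapAdd(double[] input, int frameSize, int hopSize,
                           Func<double[], double[]> processFrame)
{
    // Periodic Hann analysis window.
    var window = new double[frameSize];
    for (int n = 0; n < frameSize; n++)
        window[n] = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * n / frameSize));

    var output = new double[input.Length];
    var norm = new double[input.Length];   // summed window weight per sample
    var frame = new double[frameSize];

    for (int start = 0; start + frameSize <= input.Length; start += hopSize)
    {
        for (int n = 0; n < frameSize; n++)
            frame[n] = input[start + n] * window[n];

        double[] processed = processFrame(frame);

        // Add the processed frame back at its original position.
        for (int n = 0; n < frameSize; n++)
        {
            output[start + n] += processed[n];
            norm[start + n] += window[n];
        }
    }

    // Undo the window gain wherever at least one frame contributed.
    for (int i = 0; i < output.Length; i++)
        if (norm[i] > 1e-12)
            output[i] /= norm[i];
    return output;
}

var signal = new double[64];
for (int s = 0; s < signal.Length; s++)
    signal[s] = Math.Sin(0.3 * s);

// With an identity "enhancer", the covered samples are reconstructed.
double[] restored = OverlapAdd(signal, frameSize: 16, hopSize: 4, processFrame: f => f);
Console.WriteLine($"max sample index checked: {restored.Length - 1}");
```

In the base class, the per-frame step would be the FFT, ProcessSpectralFrame, and the inverse FFT rather than an arbitrary callback.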
ProcessSpectralFrame(T[], T[])
Processes a single spectral frame.
protected abstract T[] ProcessSpectralFrame(T[] magnitudes, T[] phases)
Parameters
- magnitudes (T[]): Magnitude spectrum.
- phases (T[]): Phase spectrum.
Returns
- T[]
Enhanced magnitude spectrum.
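Because this method is abstract, each derived enhancer supplies its own per-bin rule. As one illustration, the sketch below applies basic spectral subtraction: a noise estimate scaled by a strength value is removed from each magnitude bin, with a floor to avoid negative magnitudes. The SubtractNoise helper, the floor parameter, and the way strength enters the formula are assumptions and are not taken from any AiDotNet enhancer.

```csharp
using System;

// Spectral subtraction on one frame: remove a scaled noise estimate
// from each magnitude bin and never let the result fall below a floor.
static double[] SubtractNoise(double[] magnitudes, double[] noiseProfile,
                              double strength, double floor = 0.05)
{
    var enhanced = new double[magnitudes.Length];
    for (int k = 0; k < magnitudes.Length; k++)
    {
        double reduced = magnitudes[k] - strength * noiseProfile[k];
        // Keep a small fraction of the original bin instead of clamping to zero.
        enhanced[k] = Math.Max(reduced, floor * magnitudes[k]);
    }
    return enhanced;
}

double[] noisy = { 1.0, 0.2, 0.5 };
double[] noiseEstimate = { 0.1, 0.3, 0.1 };
double[] cleaned = SubtractNoise(noisy, noiseEstimate, strength: 0.7);
Console.WriteLine(string.Join(", ", cleaned));
```

A strength of 0 leaves the magnitudes untouched, while larger values remove more of the estimated noise and push more bins onto the floor, which matches the artifact trade-off described for EnhancementStrength.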
ProcessStreamingChunk(T[])
Processes a streaming chunk of audio.
protected T[] ProcessStreamingChunk(T[] chunk)
Parameters
- chunk (T[])
Returns
- T[]
ResetState()
Resets internal state for streaming mode.
public virtual void ResetState()