Class AudioEnhancerBase<T>

Namespace
AiDotNet.Audio.Enhancement
Assembly
AiDotNet.dll

Base class for algorithmic (non-neural-network) audio enhancement.

public abstract class AudioEnhancerBase<T> : IAudioEnhancer<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
AudioEnhancerBase<T>
Implements
IAudioEnhancer<T>

Remarks

Provides common functionality for all audio enhancers including:

  • Frame-based processing with overlap-add
  • Streaming mode with state management
  • STFT-based analysis/synthesis

Constructors

AudioEnhancerBase(int, int, int, int, double)

Initializes a new instance of the AudioEnhancerBase class.

protected AudioEnhancerBase(int sampleRate = 16000, int numChannels = 1, int fftSize = 512, int hopSize = 128, double enhancementStrength = 0.7)

Parameters

sampleRate int

Audio sample rate in Hz.

numChannels int

Number of audio channels.

fftSize int

FFT size for spectral analysis.

hopSize int

Hop size between frames.

enhancementStrength double

Enhancement strength, from 0.0 (no enhancement) to 1.0 (maximum).

Fields

NumOps

Numeric operations for type T.

protected readonly INumericOperations<T> NumOps

Field Value

INumericOperations<T>

_bufferPosition

Current position in the input buffer.

protected int _bufferPosition

Field Value

int

_fftSize

FFT size for spectral analysis.

protected readonly int _fftSize

Field Value

int

_hopSize

Hop size between frames.

protected readonly int _hopSize

Field Value

int

_inputBuffer

Input buffer for streaming mode.

protected T[]? _inputBuffer

Field Value

T[]

_noiseProfile

Estimated noise profile for spectral subtraction.

protected T[]? _noiseProfile

Field Value

T[]

_outputBuffer

Output buffer for overlap-add.

protected T[]? _outputBuffer

Field Value

T[]

_window

Window function coefficients.

protected readonly T[] _window

Field Value

T[]

Properties

EnhancementStrength

Gets or sets the enhancement strength (0.0 = no enhancement, 1.0 = maximum).

public double EnhancementStrength { get; set; }

Property Value

double

Remarks

Higher values provide more noise reduction but may introduce artifacts. Start with 0.5-0.7 for natural-sounding results.

LatencySamples

Gets the processing latency in samples.

public int LatencySamples { get; }

Property Value

int

Remarks

Latency matters for real-time applications: lower latency gives a faster response but potentially lower-quality enhancement.
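
To judge whether a given latency is acceptable, it helps to convert it from samples to milliseconds. The helper below is a minimal sketch; `LatencySketch` and `SamplesToMilliseconds` are hypothetical names and not part of AiDotNet. This page does not state how `LatencySamples` is derived from `fftSize` or `hopSize`, so the example treats it as an opaque sample count.

```csharp
using System;

public static class LatencySketch
{
    // Converts a latency expressed in samples to milliseconds.
    // latencySamples would typically come from the LatencySamples property,
    // sampleRate from the SampleRate property.
    public static double SamplesToMilliseconds(int latencySamples, int sampleRate)
    {
        if (sampleRate <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(sampleRate));
        }
        return 1000.0 * latencySamples / sampleRate;
    }
}
```

For example, a latency of 512 samples at 16000 Hz works out to 1000 × 512 / 16000 = 32 ms.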

NumChannels

Gets the number of audio channels supported.

public int NumChannels { get; protected set; }

Property Value

int

SampleRate

Gets the audio sample rate in Hz.

public int SampleRate { get; protected set; }

Property Value

int

Methods

ComputeFFT(T[])

Computes the FFT of an audio frame using the FftSharp library (an O(N log N) algorithm).

protected virtual (T[] Magnitudes, T[] Phases) ComputeFFT(T[] frame)

Parameters

frame T[]

Returns

(T[] Magnitudes, T[] Phases)
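
The magnitude/phase split returned here can be illustrated without FftSharp. The sketch below uses a naive O(N²) DFT from `System.Numerics` purely for clarity; it is not the library's implementation. `SpectrumSketch` is a hypothetical name, and returning only the N/2 + 1 non-redundant bins is an assumption, since this page does not say how many bins `ComputeFFT` produces.

```csharp
using System;
using System.Numerics;

public static class SpectrumSketch
{
    // Naive DFT of a real-valued frame, split into magnitude and phase per bin.
    // Assumption: only bins 0..N/2 are returned (the rest mirror them for real input).
    public static (double[] Magnitudes, double[] Phases) Analyze(double[] frame)
    {
        int n = frame.Length;
        int bins = n / 2 + 1;
        var magnitudes = new double[bins];
        var phases = new double[bins];

        for (int k = 0; k < bins; k++)
        {
            Complex sum = Complex.Zero;
            for (int t = 0; t < n; t++)
            {
                double angle = -2.0 * Math.PI * k * t / n;
                sum += frame[t] * Complex.FromPolarCoordinates(1.0, angle);
            }
            magnitudes[k] = sum.Magnitude; // |X[k]|
            phases[k] = sum.Phase;         // arg(X[k]) in radians
        }
        return (magnitudes, phases);
    }
}
```

A single-cycle cosine over 8 samples concentrates its energy in bin 1, which makes a convenient sanity check for any override of this method.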

ComputeIFFT(T[], T[])

Computes the inverse FFT using the FftSharp library.

protected virtual T[] ComputeIFFT(T[] magnitudes, T[] phases)

Parameters

magnitudes T[]

phases T[]

Returns

T[]
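
Before an inverse transform can run, each magnitude/phase pair has to be recombined into a complex bin. The sketch below shows only that recombination step, using `System.Numerics.Complex`; `PolarSketch` is a hypothetical helper, and how the base class performs this step internally is not documented on this page.

```csharp
using System;
using System.Numerics;

public static class PolarSketch
{
    // Rebuilds complex spectrum bins from polar form:
    // X[k] = magnitude[k] * (cos(phase[k]) + i * sin(phase[k])).
    public static Complex[] ToComplex(double[] magnitudes, double[] phases)
    {
        if (magnitudes.Length != phases.Length)
        {
            throw new ArgumentException("Arrays must have equal length.");
        }
        var bins = new Complex[magnitudes.Length];
        for (int k = 0; k < bins.Length; k++)
        {
            bins[k] = Complex.FromPolarCoordinates(magnitudes[k], phases[k]);
        }
        return bins;
    }
}
```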

CreateHannWindow(int)

Creates a Hann window of the specified size.

protected T[] CreateHannWindow(int size)

Parameters

size int

Returns

T[]
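
For reference, a Hann window follows w[n] = 0.5 · (1 − cos(2πn / N)). The sketch below uses `double` rather than the generic `T`, and `HannWindowSketch` is a hypothetical name. Whether `CreateHannWindow` uses the periodic form (divide by N) or the symmetric form (divide by N − 1) is not stated on this page; the periodic form is assumed here.

```csharp
using System;

public static class HannWindowSketch
{
    // Periodic Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / N)).
    // Assumption: periodic form (denominator N), not symmetric (N - 1).
    public static double[] Create(int size)
    {
        if (size <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(size));
        }
        var window = new double[size];
        for (int n = 0; n < size; n++)
        {
            window[n] = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * n / size));
        }
        return window;
    }
}
```

The window starts at zero and peaks at 1.0 in the middle of the frame, which tapers each frame's edges before spectral analysis.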

Enhance(Tensor<T>)

Enhances audio quality by reducing noise and artifacts.

public virtual Tensor<T> Enhance(Tensor<T> audio)

Parameters

audio Tensor<T>

Input audio tensor with shape [channels, samples] or [samples].

Returns

Tensor<T>

Enhanced audio tensor with the same shape as input.

EnhanceWithReference(Tensor<T>, Tensor<T>)

Enhances audio with a reference signal for echo cancellation.

public virtual Tensor<T> EnhanceWithReference(Tensor<T> audio, Tensor<T> reference)

Parameters

audio Tensor<T>

Input audio (microphone signal).

reference Tensor<T>

Reference audio (speaker playback signal).

Returns

Tensor<T>

Enhanced audio with echo removed.

Remarks

For Beginners: This is for video calls!

The problem: Your microphone picks up sound from your speakers, creating an echo for the other person.

Solution: We know what's playing from the speakers (reference), so we can subtract it from what the microphone picks up.
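
The subtraction idea can be sketched in its simplest form: estimate how strongly the reference leaks into the microphone signal, then subtract that scaled copy. This is a deliberately simplified illustration with a single least-squares gain. Practical echo cancellers usually rely on adaptive filters that also model delay and room response, and this page does not say which approach the base class takes. `EchoSketch` is a hypothetical name.

```csharp
using System;

public static class EchoSketch
{
    // Removes a scaled copy of the reference from the microphone signal.
    // gain = <mic, ref> / <ref, ref> is the least-squares estimate of the leak.
    public static double[] SubtractReference(double[] microphone, double[] reference)
    {
        if (microphone.Length != reference.Length)
        {
            throw new ArgumentException("Signals must have equal length.");
        }

        double cross = 0.0;
        double energy = 0.0;
        for (int i = 0; i < reference.Length; i++)
        {
            cross += microphone[i] * reference[i];
            energy += reference[i] * reference[i];
        }

        // A silent reference means there is nothing to subtract.
        double gain = energy > 0.0 ? cross / energy : 0.0;

        var output = new double[microphone.Length];
        for (int i = 0; i < output.Length; i++)
        {
            output[i] = microphone[i] - gain * reference[i];
        }
        return output;
    }
}
```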

EstimateNoiseProfile(Tensor<T>)

Estimates the noise profile from a segment of audio.

public virtual void EstimateNoiseProfile(Tensor<T> noiseOnlyAudio)

Parameters

noiseOnlyAudio Tensor<T>

Audio containing only noise (no signal).

Remarks

For Beginners: Some enhancers work better if you tell them what the noise sounds like. Record a few seconds of "silence" (just the background noise) and pass it here.

EstimateNoiseSpectrum(T[])

Estimates noise spectrum from noise-only audio.

protected T[] EstimateNoiseSpectrum(T[] noiseAudio)

Parameters

noiseAudio T[]

Returns

T[]
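
One common way to estimate a noise spectrum is to average the magnitude spectra of several noise-only frames, bin by bin. The sketch below shows that averaging step on precomputed magnitude frames. Whether `EstimateNoiseSpectrum` uses a plain mean is an assumption, and `NoiseProfileSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class NoiseProfileSketch
{
    // Averages magnitude spectra across frames to get one value per frequency bin.
    public static double[] AverageMagnitudes(IReadOnlyList<double[]> frames)
    {
        if (frames.Count == 0)
        {
            throw new ArgumentException("At least one frame is required.", nameof(frames));
        }

        int bins = frames[0].Length;
        var profile = new double[bins];
        foreach (double[] frame in frames)
        {
            if (frame.Length != bins)
            {
                throw new ArgumentException("All frames must have the same number of bins.", nameof(frames));
            }
            for (int k = 0; k < bins; k++)
            {
                profile[k] += frame[k];
            }
        }

        for (int k = 0; k < bins; k++)
        {
            profile[k] /= frames.Count;
        }
        return profile;
    }
}
```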

ProcessChunk(Tensor<T>)

Processes audio in real-time streaming mode.

public virtual Tensor<T> ProcessChunk(Tensor<T> audioChunk)

Parameters

audioChunk Tensor<T>

A small chunk of audio for real-time processing.

Returns

Tensor<T>

Enhanced audio chunk; the output may lag the input by the processing latency.

Remarks

Intended for real-time applications such as video calls. The enhancer maintains internal state between calls so that consecutive chunks join without discontinuities.
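
The state kept between calls can be pictured as a buffer that collects incoming samples until a full analysis frame is available. The class below is a minimal sketch of that idea, not the base class's implementation; `ChunkFramerSketch`, `Push`, and `Reset` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class ChunkFramerSketch
{
    private readonly int _frameSize;
    private readonly int _hopSize;
    private readonly List<double> _pending = new List<double>();

    public ChunkFramerSketch(int frameSize, int hopSize)
    {
        if (frameSize <= 0) throw new ArgumentOutOfRangeException(nameof(frameSize));
        if (hopSize <= 0 || hopSize > frameSize) throw new ArgumentOutOfRangeException(nameof(hopSize));
        _frameSize = frameSize;
        _hopSize = hopSize;
    }

    // Appends a chunk and returns every complete frame that is now available.
    // Samples that do not yet fill a frame stay buffered for the next call.
    public List<double[]> Push(double[] chunk)
    {
        _pending.AddRange(chunk);
        var frames = new List<double[]>();
        while (_pending.Count >= _frameSize)
        {
            frames.Add(_pending.GetRange(0, _frameSize).ToArray());
            _pending.RemoveRange(0, _hopSize); // advance by one hop, keep the overlap
        }
        return frames;
    }

    // Drops buffered samples, e.g. when a new stream starts.
    public void Reset() => _pending.Clear();
}
```

Because a frame is emitted only once enough samples have arrived, output necessarily trails input, which is why a streaming enhancer reports a latency.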

ProcessOverlapAdd(T[])

Processes audio using the overlap-add method.

protected T[] ProcessOverlapAdd(T[] input)

Parameters

input T[]

Returns

T[]
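
Overlap-add in general works by windowing overlapping frames, processing each one, and summing the results back at their original offsets. The sketch below is self-contained and works on `double[]`; `OverlapAddSketch` and the `processFrame` callback are hypothetical, and normalizing by the accumulated window sum is one common choice rather than a documented detail of this class.

```csharp
using System;

public static class OverlapAddSketch
{
    public static double[] Process(double[] input, int frameSize, int hopSize,
                                   Func<double[], double[]> processFrame)
    {
        // Periodic Hann analysis window.
        var window = new double[frameSize];
        for (int n = 0; n < frameSize; n++)
        {
            window[n] = 0.5 * (1.0 - Math.Cos(2.0 * Math.PI * n / frameSize));
        }

        var output = new double[input.Length];
        var windowSum = new double[input.Length];

        // Trailing samples that do not fill a whole frame are left as zero.
        for (int start = 0; start + frameSize <= input.Length; start += hopSize)
        {
            var frame = new double[frameSize];
            for (int n = 0; n < frameSize; n++)
            {
                frame[n] = input[start + n] * window[n];
            }

            double[] processed = processFrame(frame);

            // Sum the processed frame back at its original offset.
            for (int n = 0; n < frameSize; n++)
            {
                output[start + n] += processed[n];
                windowSum[start + n] += window[n];
            }
        }

        // Undo the gain introduced by the overlapping windows.
        for (int i = 0; i < output.Length; i++)
        {
            if (windowSum[i] > 1e-9)
            {
                output[i] /= windowSum[i];
            }
        }
        return output;
    }
}
```

With an identity `processFrame`, the output reproduces the input everywhere the accumulated window weight is non-zero, which is a useful check before plugging in real spectral processing.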

ProcessSpectralFrame(T[], T[])

Processes a single spectral frame.

protected abstract T[] ProcessSpectralFrame(T[] magnitudes, T[] phases)

Parameters

magnitudes T[]

Magnitude spectrum.

phases T[]

Phase spectrum.

Returns

T[]

Enhanced magnitude spectrum.
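
As an illustration of what an override might compute, the sketch below applies basic spectral subtraction: subtract a scaled noise estimate from each bin and clamp the result to a small fraction of the original magnitude. It is written as a standalone helper on `double[]` so that it runs without AiDotNet. `SpectralSubtractionSketch`, the `floor` parameter, and the way `strength` scales the noise estimate are all assumptions, not documented behavior.

```csharp
using System;

public static class SpectralSubtractionSketch
{
    // enhanced[k] = max(magnitude[k] - strength * noise[k], floor * magnitude[k])
    // The floor keeps a little of the original signal to limit "musical noise".
    public static double[] Apply(double[] magnitudes, double[] noiseProfile,
                                 double strength, double floor = 0.05)
    {
        if (magnitudes.Length != noiseProfile.Length)
        {
            throw new ArgumentException("Spectrum and noise profile must have equal length.");
        }

        var enhanced = new double[magnitudes.Length];
        for (int k = 0; k < magnitudes.Length; k++)
        {
            double subtracted = magnitudes[k] - strength * noiseProfile[k];
            enhanced[k] = Math.Max(subtracted, floor * magnitudes[k]);
        }
        return enhanced;
    }
}
```

In a derived class, logic along these lines would sit inside `ProcessSpectralFrame`, reading the noise estimate from `_noiseProfile` and the scaling from `EnhancementStrength`, with arithmetic routed through `NumOps` instead of `double` operators.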

ProcessStreamingChunk(T[])

Processes a streaming chunk of audio.

protected T[] ProcessStreamingChunk(T[] chunk)

Parameters

chunk T[]

Returns

T[]

ResetState()

Resets internal state for streaming mode.

public virtual void ResetState()