Class GriffinLim<T>

Namespace
AiDotNet.Diffusion.Audio
Assembly
AiDotNet.dll

Griffin-Lim algorithm for audio reconstruction from magnitude spectrograms.

public class GriffinLim<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
GriffinLim<T>
Remarks

The Griffin-Lim algorithm iteratively estimates the phase of a signal given only its magnitude spectrogram. This is useful when you have a spectrogram (e.g., from a generative model like Riffusion) but need to reconstruct audio.

For Beginners: When we compute a spectrogram, we get both magnitude (how loud each frequency is) and phase (where in its cycle each frequency is). For visualization and some ML tasks, we often discard phase and keep only magnitude.

But to play back audio, we need phase information! Griffin-Lim solves this by:

  1. Starting with random phase
  2. Converting to audio (ISTFT)
  3. Converting back to spectrogram (STFT)
  4. Keeping the new phase but forcing the original magnitude
  5. Repeating until convergence

With momentum acceleration, convergence is typically achieved in 30-60 iterations.
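Step 4 in the list above is the core of each iteration. The following is a minimal sketch of that step for a single spectrogram bin, using System.Numerics.Complex. MagnitudeProjectionSketch and ForceMagnitude are hypothetical names chosen for illustration and are not part of AiDotNet; the library's internal implementation may differ.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of AiDotNet.
public static class MagnitudeProjectionSketch
{
    // Keeps the phase of the re-analysed bin (from the STFT in step 3)
    // and replaces its magnitude with the original target magnitude.
    public static Complex ForceMagnitude(Complex estimate, double targetMagnitude)
    {
        // For a zero-valued bin, Complex.Phase is 0, so the result is purely real.
        return Complex.FromPolarCoordinates(targetMagnitude, estimate.Phase);
    }
}
```

For example, ForceMagnitude(new Complex(3, 4), 10.0) yields a value with magnitude 10 (up to rounding) and the same phase as 3 + 4i.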

Usage:

var griffinLim = new GriffinLim<float>(
    nFft: 2048,
    hopLength: 512,
    iterations: 60
);

// From a magnitude spectrogram (e.g., generated by AI)
var audio = griffinLim.Reconstruct(magnitudeSpectrogram);

Constructors

GriffinLim(ShortTimeFourierTransform<T>, int, double, int?)

Initializes Griffin-Lim with an existing STFT processor.

public GriffinLim(ShortTimeFourierTransform<T> stft, int iterations = 60, double momentum = 0.99, int? seed = null)

Parameters

stft ShortTimeFourierTransform<T>

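STFT processor to use for the forward and inverse transforms.

iterations int

Number of iterations (default: 60).

momentum double

Momentum factor for faster convergence (default: 0.99).

seed int?

Random seed for reproducibility (default: null, which uses a non-reproducible random seed).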

GriffinLim(int, int?, IWindowFunction<T>?, int, double, int?)

Initializes a new Griffin-Lim processor.

public GriffinLim(int nFft = 2048, int? hopLength = null, IWindowFunction<T>? windowFunction = null, int iterations = 60, double momentum = 0.99, int? seed = null)

Parameters

nFft int

FFT size (default: 2048).

hopLength int?

Hop length between frames (default: nFft/4).

windowFunction IWindowFunction<T>

Window function (default: Hann, also called Hanning, a common choice for audio STFT).

iterations int

Number of iterations (default: 60).

momentum double

Momentum factor for faster convergence (default: 0.99).

seed int?

Random seed for reproducibility (default: null, which uses a non-reproducible random seed).
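The default hopLength of nFft/4 listed above corresponds to 75% overlap between consecutive frames. The sketch below works through that arithmetic. StftShapeSketch and its methods are hypothetical names, not AiDotNet APIs, and the frequency-bin count assumes a one-sided spectrum, which this page does not confirm.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class StftShapeSketch
{
    // Documented default for the constructor: hopLength = nFft / 4.
    public static int ResolveHopLength(int nFft, int? hopLength = null)
        => hopLength ?? nFft / 4;

    // Fraction of each frame that is shared with the next frame.
    public static double Overlap(int nFft, int hopLength)
        => 1.0 - (double)hopLength / nFft;

    // ASSUMPTION: one-sided spectrum, the usual convention for real-valued audio.
    public static int FrequencyBins(int nFft) => nFft / 2 + 1;
}
```

With the default nFft of 2048, ResolveHopLength returns 512, Overlap returns 0.75, and FrequencyBins returns 1025.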

Remarks

For Beginners:

  • iterations: More iterations give better quality but take longer. 60 is usually enough.
  • momentum: Higher values (0.9-0.99) converge faster. Set to 0 for the original algorithm.

Typical quality at different iterations:

  • 10 iterations: Noticeable artifacts
  • 30 iterations: Acceptable quality
  • 60 iterations: Good quality
  • 100+ iterations: Diminishing returns
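To illustrate why a momentum of 0 gives the original algorithm, here is a sketch of one common momentum formulation from the fast Griffin-Lim literature, applied to a single spectrogram bin. This page does not state the exact update AiDotNet uses, so treat the formula as an assumption. MomentumSketch and Extrapolate are hypothetical names.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of AiDotNet.
public static class MomentumSketch
{
    // current:  bin value after this iteration's ISTFT/STFT round trip
    // previous: the same bin after the previous iteration's round trip
    // The estimate is pushed further along the direction it just moved.
    // With momentum = 0 the result equals 'current', i.e. the plain update.
    public static Complex Extrapolate(Complex current, Complex previous, double momentum)
        => current + momentum * (current - previous);
}
```

The extrapolated value would then go through the magnitude-forcing step before the next iteration.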

Properties

Iterations

Gets the number of iterations.

public int Iterations { get; }

Property Value

int

Momentum

Gets the momentum factor.

public double Momentum { get; }

Property Value

double

STFT

Gets the STFT processor.

public ShortTimeFourierTransform<T> STFT { get; }

Property Value

ShortTimeFourierTransform<T>

Methods

ComputeSpectralConvergence(Tensor<T>, Tensor<T>)

Estimates the spectral convergence error.

public double ComputeSpectralConvergence(Tensor<T> targetMagnitude, Tensor<T> signal)

Parameters

targetMagnitude Tensor<T>

Target magnitude spectrogram.

signal Tensor<T>

Reconstructed signal.

Returns

double

Spectral convergence error (lower is better).
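Spectral convergence is commonly defined as the Frobenius norm of the difference between the target and estimated magnitudes, divided by the Frobenius norm of the target. The sketch below implements that common definition on plain arrays. It is an assumption that ComputeSpectralConvergence uses the same formula, and the library method takes the reconstructed signal rather than a second magnitude array. SpectralConvergenceSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class SpectralConvergenceSketch
{
    // Returns ||target - estimate||_F / ||target||_F (lower is better).
    public static double Compute(double[,] target, double[,] estimate)
    {
        if (target.GetLength(0) != estimate.GetLength(0) ||
            target.GetLength(1) != estimate.GetLength(1))
        {
            throw new ArgumentException("Spectrograms must have the same shape.");
        }

        double diffSq = 0.0, targetSq = 0.0;
        for (int i = 0; i < target.GetLength(0); i++)
        {
            for (int j = 0; j < target.GetLength(1); j++)
            {
                double d = target[i, j] - estimate[i, j];
                diffSq += d * d;
                targetSq += target[i, j] * target[i, j];
            }
        }

        // Guard against an all-zero target to avoid dividing by zero.
        return targetSq > 0.0 ? Math.Sqrt(diffSq / targetSq) : 0.0;
    }
}
```

Under this definition, a perfect match gives 0, and an all-zero estimate gives 1.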

Reconstruct(Tensor<T>, int?)

Reconstructs audio from a magnitude spectrogram using Griffin-Lim.

public Tensor<T> Reconstruct(Tensor<T> magnitude, int? length = null)

Parameters

magnitude Tensor<T>

Magnitude spectrogram with shape [numFrames, numFreqs].

length int?

Expected output length (optional).

Returns

Tensor<T>

Reconstructed audio signal.

Remarks

For Beginners: This method takes a magnitude spectrogram (e.g., from a generative AI model) and creates an audio waveform that, when analyzed with STFT, produces a similar magnitude spectrogram.

ReconstructFromMel(Tensor<T>, MelSpectrogram<T>, int?)

Reconstructs audio from a Mel spectrogram.

public Tensor<T> ReconstructFromMel(Tensor<T> melSpec, MelSpectrogram<T> melProcessor, int? length = null)

Parameters

melSpec Tensor<T>

Mel spectrogram from MelSpectrogram processor.

melProcessor MelSpectrogram<T>

The MelSpectrogram processor used to create the Mel spectrogram.

length int?

Expected output length (optional).

Returns

Tensor<T>

Reconstructed audio signal.

Remarks

For Beginners: This is a convenience method that first inverts the Mel spectrogram to a linear magnitude spectrogram, then applies Griffin-Lim.

ReconstructWithProgress(Tensor<T>, Action<int, double>, int?)

Reconstructs audio from a magnitude spectrogram, invoking a progress callback after each iteration.

public Tensor<T> ReconstructWithProgress(Tensor<T> magnitude, Action<int, double> progressCallback, int? length = null)

Parameters

magnitude Tensor<T>

Magnitude spectrogram.

progressCallback Action<int, double>

Callback invoked after each iteration with (iteration, convergenceMetric).

length int?

Expected output length (optional).

Returns

Tensor<T>

Reconstructed audio signal.
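A callback for ReconstructWithProgress only needs to match Action<int, double>. The sketch below shows one possible shape: a small class that records the metric and reports when it stops improving. ConvergenceLog is a hypothetical name, not an AiDotNet type. The sketch assumes a lower metric is better and makes no assumption about whether iteration numbering starts at 0 or 1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of AiDotNet.
public sealed class ConvergenceLog
{
    private readonly List<double> _metrics = new List<double>();

    public IReadOnlyList<double> Metrics => _metrics;

    // Signature matches Action<int, double>: (iteration, convergenceMetric).
    public void OnIteration(int iteration, double metric)
    {
        _metrics.Add(metric);
        if (iteration % 10 == 0)
        {
            Console.WriteLine($"iteration {iteration}: metric {metric:F4}");
        }
    }

    // True when the metric improved by less than 'tolerance'
    // over the last 'window' recorded iterations.
    public bool HasPlateaued(int window, double tolerance)
    {
        if (_metrics.Count <= window) return false;
        double before = _metrics[_metrics.Count - 1 - window];
        double now = _metrics[_metrics.Count - 1];
        return before - now < tolerance;
    }
}
```

An instance's OnIteration method can be passed directly as the progressCallback argument, and Metrics or HasPlateaued can be inspected once reconstruction returns.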