Class GriffinLim<T>

Namespace: AiDotNet.Diffusion.Audio

Assembly: AiDotNet.dll

Griffin-Lim algorithm for audio reconstruction from magnitude spectrograms.

public class GriffinLim<T>

Type Parameters

T: The numeric type used for calculations.

Inheritance: object → GriffinLim<T>

Inherited Members:

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

The Griffin-Lim algorithm iteratively estimates the phase of a signal given only its magnitude spectrogram. This is useful when you have a spectrogram (e.g., from a generative model like Riffusion) but need to reconstruct audio.

For Beginners: When we compute a spectrogram, we get both magnitude (how loud each frequency is) and phase (where in its cycle each frequency is). For visualization and some ML tasks, we often discard phase and keep only magnitude.

But to play back audio, we need phase information! Griffin-Lim solves this by:

1. Starting with random phase
2. Converting to audio (ISTFT)
3. Converting back to spectrogram (STFT)
4. Keeping the new phase but forcing the original magnitude
5. Repeating until convergence

With momentum acceleration, convergence is typically achieved in 30-60 iterations.
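Step 4 is the heart of the loop: each bin keeps the phase it picked up from the STFT round trip but has its magnitude reset to the target. A minimal sketch of that projection follows. It uses plain System.Numerics.Complex arrays rather than the library's Tensor<T>, and the class and method names are hypothetical, not part of AiDotNet.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of AiDotNet: illustrates step 4 of the loop.
public static class PhaseProjectionSketch
{
    // Keeps the phase of each estimated bin but replaces its magnitude
    // with the corresponding target magnitude.
    public static Complex[] ForceMagnitude(Complex[] estimate, double[] targetMagnitude)
    {
        if (estimate.Length != targetMagnitude.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new Complex[estimate.Length];
        for (int i = 0; i < estimate.Length; i++)
        {
            // Complex.Phase is atan2(imaginary, real); a zero bin yields phase 0.
            result[i] = Complex.FromPolarCoordinates(targetMagnitude[i], estimate[i].Phase);
        }
        return result;
    }
}
```

For example, an estimated bin of 3 + 4i (magnitude 5) with a target magnitude of 10 becomes 6 + 8i: the direction in the complex plane is unchanged and only the length is rescaled.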

Usage:

var griffinLim = new GriffinLim<float>(
    nFft: 2048,
    hopLength: 512,
    iterations: 60
);

// From a magnitude spectrogram (e.g., generated by AI)
var audio = griffinLim.Reconstruct(magnitudeSpectrogram);

Constructors

GriffinLim(ShortTimeFourierTransform<T>, int, double, int?)

Initializes Griffin-Lim with an existing STFT processor.

public GriffinLim(ShortTimeFourierTransform<T> stft, int iterations = 60, double momentum = 0.99, int? seed = null)

Parameters

stft ShortTimeFourierTransform<T>: STFT processor to use.
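iterations int: Number of Griffin-Lim iterations (default: 60).
momentum double: Momentum factor for faster convergence (default: 0.99).
seed int?: Random seed for reproducibility (default: null for random).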

GriffinLim(int, int?, IWindowFunction<T>?, int, double, int?)

Initializes a new Griffin-Lim processor.

public GriffinLim(int nFft = 2048, int? hopLength = null, IWindowFunction<T>? windowFunction = null, int iterations = 60, double momentum = 0.99, int? seed = null)

Parameters

nFft int: FFT size (default: 2048).
hopLength int?: Hop length between frames (default: nFft/4).
windowFunction IWindowFunction<T>?: Window function (default: Hann, also called Hanning, a common choice for audio STFT).
iterations int: Number of iterations (default: 60).
momentum double: Momentum factor for faster convergence (default: 0.99).
seed int?: Random seed for reproducibility (default: null for random).
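The sizes implied by these defaults can be worked out by hand. The sketch below resolves the documented default hop length (nFft/4) and, assuming a one-sided STFT of real-valued audio, the number of frequency bins per frame. The helper is hypothetical and not part of AiDotNet; the bin count in particular is an assumption about the spectrogram layout, not something this page states.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class StftSizeSketch
{
    // Default hop length documented for the constructor: nFft / 4.
    public static int ResolveHopLength(int nFft, int? hopLength = null)
        => hopLength ?? nFft / 4;

    // Assumption: a one-sided (real-input) STFT with nFft / 2 + 1 frequency bins.
    public static int OneSidedBinCount(int nFft) => nFft / 2 + 1;
}
```

With the default nFft of 2048, this gives a hop length of 512 samples and, under the one-sided assumption, 1025 frequency bins per frame.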

Remarks

For Beginners:

- iterations: More iterations = better quality but slower. 60 is usually enough.
- momentum: Higher values (0.9-0.99) converge faster. Set to 0 for the original algorithm.

Typical quality at different iterations:

10 iterations: Noticeable artifacts
30 iterations: Acceptable quality
60 iterations: Good quality
100+ iterations: Diminishing returns
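To illustrate what the momentum factor does, here is one common formulation of the accelerated ("fast") Griffin-Lim update: the new estimate is pushed further along the direction it moved in the last iteration. Whether GriffinLim<T> uses exactly this update is an assumption; the helper below is hypothetical and works on plain System.Numerics.Complex arrays.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of AiDotNet.
public static class MomentumSketch
{
    // Extrapolates from the previous estimate through the current one:
    // accelerated = current + momentum * (current - previous).
    public static Complex[] Accelerate(Complex[] current, Complex[] previous, double momentum)
    {
        if (current.Length != previous.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var accelerated = new Complex[current.Length];
        for (int i = 0; i < current.Length; i++)
        {
            accelerated[i] = current[i] + momentum * (current[i] - previous[i]);
        }
        return accelerated;
    }
}
```

With momentum set to 0 the extrapolation term vanishes and the update returns the current estimate unchanged, which matches the note that 0 gives the original algorithm.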

Properties

Iterations

Gets the number of iterations.

public int Iterations { get; }

Property Value

int

Momentum

Gets the momentum factor.

public double Momentum { get; }

Property Value

double

STFT

Gets the STFT processor.

public ShortTimeFourierTransform<T> STFT { get; }

Property Value

ShortTimeFourierTransform<T>

Methods

ComputeSpectralConvergence(Tensor<T>, Tensor<T>)

Estimates the spectral convergence error.

public double ComputeSpectralConvergence(Tensor<T> targetMagnitude, Tensor<T> signal)

Parameters

targetMagnitude Tensor<T>: Target magnitude spectrogram.
signal Tensor<T>: Reconstructed signal.

Returns

double: Spectral convergence error (lower is better).
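Spectral convergence is commonly defined as the Frobenius norm of the difference between the estimated and target magnitudes, divided by the Frobenius norm of the target. The sketch below computes that ratio on plain double[,] arrays. It is a hypothetical helper, not the library's implementation: the documented method takes the reconstructed signal and presumably computes its magnitude spectrogram internally, and its exact metric is not specified on this page.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class SpectralConvergenceSketch
{
    // Returns ||estimated - target||_F / ||target||_F for two magnitude spectrograms.
    public static double Compute(double[,] targetMagnitude, double[,] estimatedMagnitude)
    {
        int frames = targetMagnitude.GetLength(0);
        int bins = targetMagnitude.GetLength(1);
        if (estimatedMagnitude.GetLength(0) != frames || estimatedMagnitude.GetLength(1) != bins)
            throw new ArgumentException("Spectrograms must have the same shape.");

        double diffEnergy = 0.0;
        double targetEnergy = 0.0;
        for (int f = 0; f < frames; f++)
        {
            for (int k = 0; k < bins; k++)
            {
                double diff = estimatedMagnitude[f, k] - targetMagnitude[f, k];
                diffEnergy += diff * diff;
                targetEnergy += targetMagnitude[f, k] * targetMagnitude[f, k];
            }
        }

        // An all-zero target has no meaningful relative error unless the estimate is also zero.
        if (targetEnergy == 0.0)
            return diffEnergy == 0.0 ? 0.0 : double.PositiveInfinity;

        return Math.Sqrt(diffEnergy / targetEnergy);
    }
}
```

Under this definition a perfect match scores 0, and an all-zero estimate scores 1.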

Reconstruct(Tensor<T>, int?)

Reconstructs audio from a magnitude spectrogram using Griffin-Lim.

public Tensor<T> Reconstruct(Tensor<T> magnitude, int? length = null)

Parameters

magnitude Tensor<T>: Magnitude spectrogram [numFrames, numFreqs].
length int?: Expected output length (optional).

Returns

Tensor<T>: Reconstructed audio signal.

Remarks

For Beginners: This method takes a magnitude spectrogram (e.g., from a generative AI model) and creates an audio waveform that, when analyzed with STFT, produces a similar magnitude spectrogram.

ReconstructFromMel(Tensor<T>, MelSpectrogram<T>, int?)

Reconstructs audio from a Mel spectrogram.

public Tensor<T> ReconstructFromMel(Tensor<T> melSpec, MelSpectrogram<T> melProcessor, int? length = null)

Parameters

melSpec Tensor<T>: Mel spectrogram from MelSpectrogram processor.
melProcessor MelSpectrogram<T>: The MelSpectrogram processor used to create the Mel spectrogram.
length int?: Expected output length (optional).

Returns

Tensor<T>: Reconstructed audio signal.

Remarks

For Beginners: This is a convenience method that first inverts the Mel spectrogram to a linear magnitude spectrogram, then applies Griffin-Lim.

ReconstructWithProgress(Tensor<T>, Action<int, double>, int?)

Reconstructs audio, invoking a progress callback after each iteration.

public Tensor<T> ReconstructWithProgress(Tensor<T> magnitude, Action<int, double> progressCallback, int? length = null)

Parameters

magnitude Tensor<T>: Magnitude spectrogram.
progressCallback Action<int, double>: Callback invoked after each iteration with (iteration, convergenceMetric).
length int?: Expected output length (optional).

Returns

Tensor<T>: Reconstructed audio signal.
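A callback for this method only needs to match Action<int, double>. The sketch below is a small, hypothetical logger (not part of AiDotNet) that prints each iteration's metric and keeps the values for later inspection, for example to plot a convergence curve.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of AiDotNet.
public sealed class ConvergenceLog
{
    private readonly List<double> _metrics = new List<double>();

    // Metrics in the order the callback received them.
    public IReadOnlyList<double> Metrics => _metrics;

    // Matches the documented callback shape: (iteration, convergenceMetric).
    public void OnIteration(int iteration, double convergenceMetric)
    {
        _metrics.Add(convergenceMetric);
        Console.WriteLine($"Iteration {iteration}: convergence metric = {convergenceMetric:F4}");
    }
}
```

An instance method like this can be passed directly as the callback, for example griffinLim.ReconstructWithProgress(magnitudeSpectrogram, log.OnIteration), where log is a ConvergenceLog and the other two names are the objects from the Usage example near the top of this page.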
