Class GriffinLim<T>
Griffin-Lim algorithm for audio reconstruction from magnitude spectrograms.
public class GriffinLim<T>
Type Parameters
T: The numeric type used for calculations.
- Inheritance
GriffinLim<T>
Remarks
The Griffin-Lim algorithm iteratively estimates the phase of a signal given only its magnitude spectrogram. This is useful when you have a spectrogram (e.g., from a generative model like Riffusion) but need to reconstruct audio.
For Beginners: When we compute a short-time Fourier transform (STFT), we get both magnitude (how loud each frequency is) and phase (where in its cycle each frequency is). For visualization and some ML tasks, we often discard the phase and keep only the magnitude, which is the spectrogram.
But to play back audio, we need phase information! Griffin-Lim solves this by:
- Starting with random phase
- Converting to audio (ISTFT)
- Converting back to spectrogram (STFT)
- Keeping the new phase but forcing the original magnitude
- Repeating the previous three steps for the configured number of iterations
With momentum acceleration, convergence is typically achieved in 30-60 iterations.
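The first and fourth steps above (starting with random phase, then keeping the new phase while forcing the original magnitude) can be sketched on a handful of frequency bins. This is a minimal illustration built on System.Numerics.Complex, not the library's implementation: the helper names InitializeWithRandomPhase and ImposeMagnitude are hypothetical, and the reanalyzedBins values are made-up stand-ins for the result of STFT(ISTFT(...)).

```csharp
using System;
using System.Numerics;

// Step 1 (hypothetical helper): random initial phase, uniform in [-pi, pi).
// A fixed seed makes the starting point repeatable.
static Complex[] InitializeWithRandomPhase(double[] targetMagnitude, int seed)
{
    var rng = new Random(seed);
    var bins = new Complex[targetMagnitude.Length];
    for (int i = 0; i < bins.Length; i++)
    {
        double phase = (rng.NextDouble() * 2.0 - 1.0) * Math.PI;
        bins[i] = Complex.FromPolarCoordinates(targetMagnitude[i], phase);
    }
    return bins;
}

// Step 4 (hypothetical helper): keep the phase of the re-analyzed bins,
// but replace their magnitude with the target magnitude.
static Complex[] ImposeMagnitude(Complex[] estimate, double[] targetMagnitude)
{
    var result = new Complex[estimate.Length];
    for (int i = 0; i < result.Length; i++)
    {
        result[i] = Complex.FromPolarCoordinates(targetMagnitude[i], estimate[i].Phase);
    }
    return result;
}

double[] target = { 1.0, 0.5, 2.0 };
Complex[] start = InitializeWithRandomPhase(target, seed: 42);
Console.WriteLine($"start has {start.Length} bins with random phase");

// Made-up stand-in for STFT(ISTFT(start)); a real loop gets these from the transforms.
Complex[] reanalyzedBins = { new Complex(0.0, 3.0), new Complex(-4.0, 0.0), new Complex(1.0, 1.0) };
Complex[] next = ImposeMagnitude(reanalyzedBins, target);

for (int bin = 0; bin < next.Length; bin++)
{
    Console.WriteLine($"bin {bin}: magnitude {next[bin].Magnitude:F3}, phase {next[bin].Phase:F3} rad");
}
```

Each output bin has the target magnitude and the phase of the re-analyzed bin, which is the constraint the iteration enforces on every pass.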
Usage:
var griffinLim = new GriffinLim<float>(
    nFft: 2048,
    hopLength: 512,
    iterations: 60
);

// From a magnitude spectrogram (e.g., generated by AI)
var audio = griffinLim.Reconstruct(magnitudeSpectrogram);
Constructors
GriffinLim(ShortTimeFourierTransform<T>, int, double, int?)
Initializes Griffin-Lim with an existing STFT processor.
public GriffinLim(ShortTimeFourierTransform<T> stft, int iterations = 60, double momentum = 0.99, int? seed = null)
Parameters
stft (ShortTimeFourierTransform<T>): STFT processor to use.
iterations (int): Number of iterations.
momentum (double): Momentum factor.
seed (int?): Random seed.
GriffinLim(int, int?, IWindowFunction<T>?, int, double, int?)
Initializes a new Griffin-Lim processor.
public GriffinLim(int nFft = 2048, int? hopLength = null, IWindowFunction<T>? windowFunction = null, int iterations = 60, double momentum = 0.99, int? seed = null)
Parameters
nFft (int): FFT size (default: 2048).
hopLength (int?): Hop length between frames (default: nFft/4).
windowFunction (IWindowFunction<T>?): Window function (default: Hanning, the industry standard for audio STFT).
iterations (int): Number of iterations (default: 60).
momentum (double): Momentum factor for faster convergence (default: 0.99).
seed (int?): Random seed for reproducibility (default: null for random).
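As a small worked example of the defaults listed above, the following sketch resolves the hop length the way the parameter description states (nFft/4 when no value is given). The helper names ResolveHopLength and FrequencyBins are hypothetical, and the bin count assumes the usual one-sided spectrum of a real signal (nFft/2 + 1 bins), which this page does not state explicitly.

```csharp
using System;

// Hypothetical helper mirroring the documented default: nFft / 4 when hopLength is null.
static int ResolveHopLength(int nFft, int? hopLength) => hopLength ?? nFft / 4;

// Assumption: a one-sided spectrum of a real signal has nFft / 2 + 1 frequency bins.
static int FrequencyBins(int nFft) => nFft / 2 + 1;

int fftSize = 2048;
Console.WriteLine(ResolveHopLength(fftSize, null)); // 512
Console.WriteLine(ResolveHopLength(fftSize, 256));  // 256
Console.WriteLine(FrequencyBins(fftSize));          // 1025
```

With the default hop of nFft/4, consecutive frames overlap by 75 percent.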
Remarks
For Beginners:
- iterations: More iterations give better quality but take longer. 60 is usually enough.
- momentum: Higher values (0.9-0.99) converge faster. Set to 0 for the original algorithm.
Typical quality at different iterations:
- 10 iterations: Noticeable artifacts
- 30 iterations: Acceptable quality
- 60 iterations: Good quality
- 100+ iterations: Diminishing returns
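To illustrate what a momentum factor can do, here is a sketch of one common accelerated update: subtract a fraction of the previous re-analyzed spectrogram from the current one, then keep only the phase. This update rule is an assumption borrowed from widely used "fast Griffin-Lim" formulations; this page does not confirm which rule the class uses. The helper name AcceleratedPhase and the sample values are hypothetical.

```csharp
using System;
using System.Numerics;

// Assumed update rule (not confirmed by this page): extrapolate away from the previous
// re-analyzed bins, then normalize each bin to unit modulus so only the phase remains.
static Complex[] AcceleratedPhase(Complex[] rebuilt, Complex[] previousRebuilt, double momentum)
{
    double weight = momentum / (1.0 + momentum);
    var phase = new Complex[rebuilt.Length];
    for (int i = 0; i < rebuilt.Length; i++)
    {
        Complex extrapolated = rebuilt[i] - weight * previousRebuilt[i];
        double magnitude = extrapolated.Magnitude;
        // Fall back to zero phase when the bin is (numerically) zero.
        phase[i] = magnitude > 1e-16 ? extrapolated / magnitude : Complex.One;
    }
    return phase;
}

// Made-up values standing in for two consecutive STFT(ISTFT(...)) results.
Complex[] previous = { new Complex(1.0, 0.0), new Complex(0.0, 2.0) };
Complex[] current = { new Complex(0.0, 1.0), new Complex(0.0, 3.0) };

// momentum = 0 ignores the previous estimate, which gives the plain phase update.
Complex[] plain = AcceleratedPhase(current, previous, momentum: 0.0);
Complex[] accelerated = AcceleratedPhase(current, previous, momentum: 0.99);

Console.WriteLine($"plain phase of bin 0: {plain[0].Phase:F3} rad");
Console.WriteLine($"accelerated phase of bin 0: {accelerated[0].Phase:F3} rad");
```

In a full loop, the returned unit-modulus phase would be multiplied by the target magnitude before the next inverse transform.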
Properties
Iterations
Gets the number of iterations.
public int Iterations { get; }
Property Value
- int
Momentum
Gets the momentum factor.
public double Momentum { get; }
Property Value
- double
STFT
Gets the STFT processor.
public ShortTimeFourierTransform<T> STFT { get; }
Property Value
- ShortTimeFourierTransform<T>
Methods
ComputeSpectralConvergence(Tensor<T>, Tensor<T>)
Estimates the spectral convergence error.
public double ComputeSpectralConvergence(Tensor<T> targetMagnitude, Tensor<T> signal)
Parameters
targetMagnitude (Tensor<T>): Target magnitude spectrogram.
signal (Tensor<T>): Reconstructed signal.
Returns
- double
Spectral convergence error (lower is better).
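This page does not define the metric. A commonly used definition of spectral convergence is the Frobenius norm of the difference between the target magnitude and the magnitude of the re-analyzed signal, divided by the Frobenius norm of the target. The sketch below assumes that definition and works on plain arrays; the helper name SpectralConvergence is hypothetical, and the reanalyzed argument stands in for the STFT magnitude of the reconstructed signal, which the method presumably computes internally.

```csharp
using System;

// Assumed definition: ||target - reanalyzed||_F / ||target||_F,
// where both arrays have shape [numFrames, numFreqs].
static double SpectralConvergence(double[,] target, double[,] reanalyzed)
{
    double errorEnergy = 0.0;
    double targetEnergy = 0.0;
    for (int frame = 0; frame < target.GetLength(0); frame++)
    {
        for (int bin = 0; bin < target.GetLength(1); bin++)
        {
            double diff = target[frame, bin] - reanalyzed[frame, bin];
            errorEnergy += diff * diff;
            targetEnergy += target[frame, bin] * target[frame, bin];
        }
    }
    // An all-zero target has no meaningful relative error; report 0 in that case.
    return targetEnergy > 0.0 ? Math.Sqrt(errorEnergy / targetEnergy) : 0.0;
}

double[,] targetMagnitude = { { 3.0, 0.0 }, { 0.0, 4.0 } };
double[,] closeEstimate = { { 3.0, 0.0 }, { 0.0, 3.0 } };

Console.WriteLine(SpectralConvergence(targetMagnitude, targetMagnitude)); // 0
Console.WriteLine(SpectralConvergence(targetMagnitude, closeEstimate));   // about 0.2
```

A value of 0 means the re-analyzed magnitude matches the target exactly; larger values mean a worse match.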
Reconstruct(Tensor<T>, int?)
Reconstructs audio from a magnitude spectrogram using Griffin-Lim.
public Tensor<T> Reconstruct(Tensor<T> magnitude, int? length = null)
Parameters
magnitude (Tensor<T>): Magnitude spectrogram with shape [numFrames, numFreqs].
length (int?): Expected output length (optional).
Returns
- Tensor<T>
Reconstructed audio signal.
Remarks
For Beginners: This method takes a magnitude spectrogram (e.g., from a generative AI model) and creates an audio waveform that, when analyzed with STFT, produces a similar magnitude spectrogram.
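The page does not say how the optional length argument is applied. A common convention for inverse-STFT routines is to trim or zero-pad the output to the requested number of samples, and the sketch below assumes that behaviour. FitToLength is a hypothetical helper, not part of the library.

```csharp
using System;

// Hypothetical helper: force a signal to an expected sample count (assumed behaviour).
static float[] FitToLength(float[] signal, int? length)
{
    if (length is null || length.Value == signal.Length)
    {
        return signal; // nothing requested, or already the right size
    }

    var result = new float[length.Value]; // new arrays start zero-filled
    Array.Copy(signal, result, Math.Min(signal.Length, length.Value));
    return result;
}

float[] reconstructed = { 0.1f, 0.2f, 0.3f, 0.4f };
Console.WriteLine(FitToLength(reconstructed, 2).Length);    // 2 (trimmed)
Console.WriteLine(FitToLength(reconstructed, 6).Length);    // 6 (zero-padded)
Console.WriteLine(FitToLength(reconstructed, null).Length); // 4 (unchanged)
```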
ReconstructFromMel(Tensor<T>, MelSpectrogram<T>, int?)
Reconstructs audio from a Mel spectrogram.
public Tensor<T> ReconstructFromMel(Tensor<T> melSpec, MelSpectrogram<T> melProcessor, int? length = null)
Parameters
melSpec (Tensor<T>): Mel spectrogram from MelSpectrogram processor.
melProcessor (MelSpectrogram<T>): The MelSpectrogram processor used to create the Mel spectrogram.
length (int?): Expected output length (optional).
Returns
- Tensor<T>
Reconstructed audio signal.
Remarks
For Beginners: This is a convenience method that first inverts the Mel spectrogram to a linear magnitude spectrogram, then applies Griffin-Lim.
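How the library inverts the Mel spectrogram is not documented on this page. As a rough illustration of the idea, the sketch below spreads each Mel band's value back over the frequency bins in proportion to the filter weights. It assumes the forward mapping is mel = melFilters * linear with melFilters of shape [numMels, numFreqs]. The result is exact only when the filters do not overlap; with real, overlapping triangular filters it is a crude estimate, and practical implementations often use a pseudo-inverse or non-negative least squares instead. The helper name ApproximateLinearFromMel and the toy filter bank are hypothetical.

```csharp
using System;

// Assumption: mel = melFilters * linear, with melFilters of shape [numMels, numFreqs].
// Each bin receives the filter-weighted share of every Mel band's value.
static double[] ApproximateLinearFromMel(double[,] melFilters, double[] mel)
{
    int numMels = melFilters.GetLength(0);
    int numFreqs = melFilters.GetLength(1);
    var linear = new double[numFreqs];

    for (int m = 0; m < numMels; m++)
    {
        double filterEnergy = 0.0;
        for (int f = 0; f < numFreqs; f++)
        {
            filterEnergy += melFilters[m, f] * melFilters[m, f];
        }
        if (filterEnergy == 0.0) continue; // skip empty filters

        for (int f = 0; f < numFreqs; f++)
        {
            linear[f] += melFilters[m, f] * mel[m] / filterEnergy;
        }
    }

    // Magnitudes cannot be negative.
    for (int f = 0; f < numFreqs; f++)
    {
        linear[f] = Math.Max(0.0, linear[f]);
    }
    return linear;
}

// Toy filter bank with two non-overlapping bands over four frequency bins.
double[,] filters = { { 1.0, 1.0, 0.0, 0.0 }, { 0.0, 0.0, 1.0, 1.0 } };
double[] melFrame = { 4.0, 6.0 }; // produced by linear magnitudes { 2, 2, 3, 3 }

Console.WriteLine(string.Join(", ", ApproximateLinearFromMel(filters, melFrame))); // 2, 2, 3, 3
```

The recovered linear magnitudes would then be passed to the ordinary Griffin-Lim reconstruction.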
ReconstructWithProgress(Tensor<T>, Action<int, double>, int?)
Reconstructs audio and reports progress through a callback.
public Tensor<T> ReconstructWithProgress(Tensor<T> magnitude, Action<int, double> progressCallback, int? length = null)
Parameters
magnitude (Tensor<T>): Magnitude spectrogram.
progressCallback (Action<int, double>): Callback invoked after each iteration with (iteration, convergenceMetric).
length (int?): Expected output length (optional).
Returns
- Tensor<T>
Reconstructed audio signal.
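A callback for this method only needs to match the documented Action<int, double> shape. The sketch below records each reported value and prints every tenth one. So that it runs without the library, a stand-in loop with made-up error values calls the callback directly; whether the library counts iterations from 0 or 1 is not stated here, so the loop's numbering is an assumption.

```csharp
using System;
using System.Collections.Generic;

var history = new List<double>();

// Matches the documented shape: (iteration, convergenceMetric).
Action<int, double> progressCallback = (iteration, convergenceMetric) =>
{
    history.Add(convergenceMetric);
    if (iteration % 10 == 0)
    {
        Console.WriteLine($"iteration {iteration}: error {convergenceMetric:F4}");
    }
};

// With the library available, the callback would be passed as the second argument:
// var audio = griffinLim.ReconstructWithProgress(magnitudeSpectrogram, progressCallback);

// Stand-in for the reconstruction loop (made-up, steadily decreasing error values).
for (int step = 1; step <= 30; step++)
{
    progressCallback(step, 1.0 / step);
}

Console.WriteLine($"recorded {history.Count} values");
```

Keeping the full history makes it easy to plot the error curve afterwards or to judge whether fewer iterations would have been enough.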