Class ShortTimeFourierTransform<T>

Namespace
AiDotNet.Diffusion.Audio
Assembly
AiDotNet.dll

Short-Time Fourier Transform (STFT) for analyzing audio signals over time.

public class ShortTimeFourierTransform<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
ShortTimeFourierTransform<T>

Remarks

The STFT breaks a signal into short overlapping segments and computes the Fourier transform of each segment. This reveals how the frequency content of a signal changes over time.

For Beginners: Audio signals like music or speech change over time. While a regular FFT tells you which frequencies are in the entire signal, it doesn't tell you WHEN those frequencies occur.

The STFT solves this by:

  1. Cutting the audio into small overlapping pieces (frames)
  2. Applying a window function to each frame (reduces edge artifacts)
  3. Computing FFT on each windowed frame
  4. Stacking the results to form a spectrogram (time vs. frequency)
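The four steps above can be sketched in standalone C#. This is a minimal illustration under stated assumptions, not the AiDotNet implementation: it uses a naive O(N^2) DFT instead of an FFT, skips centering and padding, and works on plain double arrays with System.Numerics.Complex rather than Tensor<T>. The helper names HannWindow, RealDft, and Stft are hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: periodic Hann window of the given length.
static double[] HannWindow(int length)
{
    var w = new double[length];
    for (int n = 0; n < length; n++)
        w[n] = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / length);
    return w;
}

// Naive O(N^2) DFT of one real frame, keeping only bins 0..N/2.
static Complex[] RealDft(double[] frame)
{
    int size = frame.Length;
    var bins = new Complex[size / 2 + 1];
    for (int k = 0; k < bins.Length; k++)
    {
        Complex sum = Complex.Zero;
        for (int t = 0; t < size; t++)
            sum += frame[t] * Complex.FromPolarCoordinates(1.0, -2.0 * Math.PI * k * t / size);
        bins[k] = sum;
    }
    return bins;
}

// Uncentered STFT: assumes signal.Length >= nFft.
static Complex[][] Stft(double[] signal, int nFft, int hopLength)
{
    double[] window = HannWindow(nFft);
    int numFrames = 1 + (signal.Length - nFft) / hopLength;   // step 1: how many frames fit
    var spectrogram = new Complex[numFrames][];
    for (int f = 0; f < numFrames; f++)
    {
        var frame = new double[nFft];
        for (int n = 0; n < nFft; n++)
            frame[n] = signal[f * hopLength + n] * window[n]; // step 2: apply the window
        spectrogram[f] = RealDft(frame);                      // step 3: transform the frame
    }
    return spectrogram;                                       // step 4: [numFrames][nFft/2 + 1]
}

// Demo: a pure tone that sits exactly on bin 4 of a 32-point transform.
int frameSize = 32, hopSize = 8;
var demoSignal = new double[128];
for (int idx = 0; idx < demoSignal.Length; idx++)
    demoSignal[idx] = Math.Sin(2.0 * Math.PI * 4 * idx / frameSize);

Complex[][] demoSpec = Stft(demoSignal, frameSize, hopSize);
int peakBin = 0;
for (int b = 1; b < demoSpec[0].Length; b++)
    if (demoSpec[0][b].Magnitude > demoSpec[0][peakBin].Magnitude) peakBin = b;

Console.WriteLine($"{demoSpec.Length} frames x {demoSpec[0].Length} bins, peak bin {peakBin}");
// prints: 13 frames x 17 bins, peak bin 4
```

Each row of the result is one time frame and each column one frequency bin, which is the [numFrames, nFft/2 + 1] layout described for the spectrogram.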

Usage:

// audioSignal is a Tensor<float> of shape [length] or [batch, length]
var stft = new ShortTimeFourierTransform<float>(nFft: 2048, hopLength: 512);
var spectrogram = stft.Forward(audioSignal);
// spectrogram.Shape = [numFrames, nFft/2 + 1] (complex values)

// To reconstruct audio from spectrogram:
var reconstructed = stft.Inverse(spectrogram);

Constructors

ShortTimeFourierTransform(int, int?, int?, IWindowFunction<T>?, bool, PaddingMode)

Initializes a new STFT processor.

public ShortTimeFourierTransform(int nFft = 2048, int? hopLength = null, int? windowLength = null, IWindowFunction<T>? windowFunction = null, bool center = true, PaddingMode padMode = PaddingMode.Reflect)

Parameters

nFft int

FFT size (default: 2048). Should be a power of 2.

hopLength int?

Hop length between frames (default: nFft/4).

windowLength int?

Window length (default: nFft).

windowFunction IWindowFunction<T>

Window function to use (default: HanningWindow, the industry standard for audio).

center bool

Whether to pad signal so frames are centered (default: true).

padMode PaddingMode

Padding mode when centering (default: Reflect).

Remarks

For Beginners:

  - nFft: Determines frequency resolution. Larger = more frequency detail but less time detail.
  - hopLength: How much to slide between frames. Smaller = more overlap = smoother output. Common: hopLength = nFft/4 gives 75% overlap.
  - windowFunction: Reduces spectral leakage. Hann (default) is the industry standard for audio. Other options: HammingWindow, BlackmanWindow, KaiserWindow, etc.
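The trade-off between nFft and hopLength can be put into numbers. The sketch below is standalone arithmetic, not an AiDotNet API: the helper names are hypothetical, and the 16 kHz sample rate is an assumed example value (the constructor takes no sample rate).

```csharp
using System;

// Hypothetical helpers for illustration only.

// Fraction of each window shared with the next one.
static double OverlapFraction(int windowLength, int hopLength) =>
    1.0 - (double)hopLength / windowLength;

// Spacing between adjacent frequency bins, in Hz.
static double BinSpacingHz(int sampleRate, int nFft) =>
    (double)sampleRate / nFft;

// Time between the starts of consecutive frames, in milliseconds.
static double HopDurationMs(int sampleRate, int hopLength) =>
    1000.0 * hopLength / sampleRate;

int exampleRate = 16000;  // assumed sample rate for this example
foreach (int size in new[] { 512, 2048 })
{
    int hopSize = size / 4;  // the documented default hop: nFft / 4
    Console.WriteLine(
        $"nFft={size}: {size / 2 + 1} bins, " +
        $"{BinSpacingHz(exampleRate, size)} Hz per bin, " +
        $"{HopDurationMs(exampleRate, hopSize)} ms per hop, " +
        $"overlap fraction {OverlapFraction(size, hopSize)}");
}
```

At the assumed 16 kHz, moving from nFft = 512 to nFft = 2048 narrows the bin spacing from 31.25 Hz to 7.8125 Hz but widens the hop from 8 ms to 32 ms, while the overlap stays at 75% as long as hopLength = nFft/4 and the window spans the full nFft.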

GPU Acceleration: This class automatically uses GPU-accelerated FFT operations when available through AiDotNetEngine.Current.

Properties

HopLength

Gets the hop length.

public int HopLength { get; }

Property Value

int

NFft

Gets the FFT size.

public int NFft { get; }

Property Value

int

NumFrequencyBins

Gets the number of frequency bins (nFft / 2 + 1).

public int NumFrequencyBins { get; }

Property Value

int

WindowTensor

Gets the window tensor for GPU operations.

public Tensor<T>? WindowTensor { get; }

Property Value

Tensor<T>

Methods

CalculateNumFrames(int)

Calculates the number of frames for a given signal length.

public int CalculateNumFrames(int signalLength)

Parameters

signalLength int

Length of the input signal.

Returns

int

Number of STFT frames.

CalculateSignalLength(int)

Calculates signal length from number of frames.

public int CalculateSignalLength(int numFrames)

Parameters

numFrames int

Number of STFT frames.

Returns

int

Approximate signal length.
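This page does not state the formulas behind CalculateNumFrames and CalculateSignalLength. The sketch below assumes the convention used by many STFT implementations (centering pads nFft/2 samples on each side, and a frame is emitted for every full window), which may differ from what this class does. The function names are hypothetical. It also shows why a length recovered from a frame count can only be approximate: the integer division drops the samples that did not fill a whole hop.

```csharp
using System;

// Assumed convention, not confirmed by the AiDotNet documentation.
static int NumFrames(int signalLength, int nFft, int hopLength, bool center)
{
    // Centering pads nFft/2 samples on both sides of the signal.
    int padded = center ? signalLength + 2 * (nFft / 2) : signalLength;
    if (padded < nFft) return 0;
    return 1 + (padded - nFft) / hopLength;
}

// Inverse of the formula above; the remainder lost to integer division is not recoverable.
static int SignalLength(int numFrames, int nFft, int hopLength, bool center)
{
    if (numFrames <= 0) return 0;
    int padded = nFft + hopLength * (numFrames - 1);
    return center ? padded - 2 * (nFft / 2) : padded;
}

int frames = NumFrames(22050, 2048, 512, center: true);
int recovered = SignalLength(frames, 2048, 512, center: true);
Console.WriteLine($"{frames} frames, recovered length {recovered}");
// prints: 44 frames, recovered length 22016
```

Under this assumed convention, a 22050-sample signal comes back as 22016 samples, which is one reason Inverse accepts an explicit length argument.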

ExtractPhase(Tensor<Complex<T>>)

Extracts phase from complex spectrogram.

public static Tensor<T> ExtractPhase(Tensor<Complex<T>> complex)

Parameters

complex Tensor<Complex<T>>

Complex spectrogram.

Returns

Tensor<T>

Phase tensor in radians.
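Phase in radians is conventionally the angle atan2(imaginary, real) of each complex bin, in the range (-pi, pi]. A minimal standalone sketch, using System.Numerics.Complex and a 2D array as stand-ins for the library's Complex<T> and Tensor<T> (PhaseOf is a hypothetical name):

```csharp
using System;
using System.Numerics;

// Element-wise phase angle of a [frames, bins] spectrogram.
static double[,] PhaseOf(Complex[,] spectrogram)
{
    int rows = spectrogram.GetLength(0), cols = spectrogram.GetLength(1);
    var phase = new double[rows, cols];
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            phase[r, c] = Math.Atan2(spectrogram[r, c].Imaginary, spectrogram[r, c].Real);
    return phase;
}

var demo = new Complex[,]
{
    { new Complex(1, 0),  new Complex(0, 1)  },   // angles 0 and pi/2
    { new Complex(-1, 0), new Complex(1, -1) },   // angles pi and -pi/4
};
double[,] angles = PhaseOf(demo);
Console.WriteLine($"{angles[0, 0]}, {angles[0, 1]}, {angles[1, 0]}, {angles[1, 1]}");
```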

Forward(Tensor<T>)

Computes the Short-Time Fourier Transform of a signal.

public Tensor<Complex<T>> Forward(Tensor<T> signal)

Parameters

signal Tensor<T>

Input signal as a tensor [length] or [batch, length].

Returns

Tensor<Complex<T>>

Complex spectrogram tensor [numFrames, numFreqs] or [batch, numFrames, numFreqs].

Remarks

For Beginners: This method takes your audio waveform and produces a spectrogram showing which frequencies are present at each point in time.

Inverse(Tensor<Complex<T>>, int?)

Computes the Inverse Short-Time Fourier Transform (overlap-add reconstruction).

public Tensor<T> Inverse(Tensor<Complex<T>> spectrogram, int? length = null)

Parameters

spectrogram Tensor<Complex<T>>

Complex spectrogram [numFrames, numFreqs] or [batch, numFrames, numFreqs].

length int?

Expected output length (optional; if omitted, it is computed from the spectrogram).

Returns

Tensor<T>

Reconstructed signal tensor.

Remarks

For Beginners: This method takes a spectrogram and converts it back to an audio waveform. It uses the "overlap-add" method, where each frame is inverse-FFT'd and the overlapping portions are added together.

Note: Perfect reconstruction requires the same STFT parameters (nFft, hopLength, windowLength, window function, center, and padMode) that were used for analysis.
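The overlap-add step can be illustrated on its own. The sketch below skips the FFT and inverse FFT entirely (they cancel out) and works on time-domain frames, so it only shows how overlapping windowed frames are summed and normalized. Dividing by the summed squared window is a common normalization and an assumption here; this page does not say which scheme the library uses. All names are hypothetical.

```csharp
using System;

// Periodic Hann window.
static double[] Hann(int length)
{
    var w = new double[length];
    for (int n = 0; n < length; n++)
        w[n] = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / length);
    return w;
}

// Stand-in for "Forward, then inverse-FFT each frame": windowed time-domain frames.
static double[][] WindowedFrames(double[] signal, double[] window, int hopLength)
{
    int count = 1 + (signal.Length - window.Length) / hopLength;
    var frames = new double[count][];
    for (int f = 0; f < count; f++)
    {
        frames[f] = new double[window.Length];
        for (int n = 0; n < window.Length; n++)
            frames[f][n] = signal[f * hopLength + n] * window[n];
    }
    return frames;
}

// Overlap-add with a synthesis window, normalized by the summed squared window.
static double[] OverlapAdd(double[][] frames, double[] window, int hopLength)
{
    int length = window.Length + hopLength * (frames.Length - 1);
    var output = new double[length];
    var norm = new double[length];
    for (int f = 0; f < frames.Length; f++)
        for (int n = 0; n < window.Length; n++)
        {
            output[f * hopLength + n] += frames[f][n] * window[n];
            norm[f * hopLength + n] += window[n] * window[n];
        }
    for (int i = 0; i < length; i++)
        if (norm[i] > 1e-10) output[i] /= norm[i];  // skip samples no window covers with weight
    return output;
}

var original = new double[64];
for (int idx = 0; idx < original.Length; idx++)
    original[idx] = Math.Sin(0.3 * idx) + 0.5;

double[] hannWin = Hann(16);
double[][] pieces = WindowedFrames(original, hannWin, 4);
double[] rebuilt = OverlapAdd(pieces, hannWin, 4);

// Sample 0 is excluded: the Hann window is zero there and no other frame covers it.
double maxError = 0.0;
for (int idx = 1; idx < original.Length; idx++)
    maxError = Math.Max(maxError, Math.Abs(rebuilt[idx] - original[idx]));
Console.WriteLine($"max reconstruction error below 1e-9: {maxError < 1e-9}");
```

The very first sample cannot be recovered in this uncentered sketch because its only window weight is zero. Padding the signal so that frames are centered, as the center option describes, is the usual way to give edge samples nonzero weight.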

InverseFromMagnitudeAndPhase(Tensor<T>, Tensor<T>, int?)

Reconstructs audio signal from magnitude and phase spectrograms.

public Tensor<T> InverseFromMagnitudeAndPhase(Tensor<T> magnitude, Tensor<T> phase, int? length = null)

Parameters

magnitude Tensor<T>

Magnitude spectrogram.

phase Tensor<T>

Phase spectrogram in radians.

length int?

Expected output length (optional).

Returns

Tensor<T>

Reconstructed audio signal.

Remarks

GPU Acceleration: This method uses IEngine.ISTFT for hardware-accelerated audio reconstruction when GPU is available.

Magnitude(Tensor<T>)

Computes the magnitude spectrogram.

public Tensor<T> Magnitude(Tensor<T> signal)

Parameters

signal Tensor<T>

Input signal.

Returns

Tensor<T>

Magnitude spectrogram [numFrames, numFreqs].

Remarks

For Beginners: The magnitude spectrogram shows how loud each frequency is at each time, discarding phase information. This is often used for visualization and audio processing where phase isn't needed.

GPU Acceleration: When GPU is available, this method uses hardware-accelerated STFT operations through IEngine for significantly faster processing.
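What "discarding phase information" means can be shown on single complex bins. The sketch below is standalone and uses System.Numerics.Complex as a stand-in for Complex<T>; MagnitudeOf and PowerOf are hypothetical names. It also covers the power spectrogram, which is the magnitude squared.

```csharp
using System;
using System.Numerics;

// Magnitude is the length of the complex value; power is its square.
static double MagnitudeOf(Complex z) => Math.Sqrt(z.Real * z.Real + z.Imaginary * z.Imaginary);
static double PowerOf(Complex z) => z.Real * z.Real + z.Imaginary * z.Imaginary;

var bin = new Complex(3, 4);
var rotated = new Complex(-4, 3);  // same length, different phase angle

Console.WriteLine($"magnitude {MagnitudeOf(bin)}, power {PowerOf(bin)}");
// prints: magnitude 5, power 25
Console.WriteLine(MagnitudeOf(bin) == MagnitudeOf(rotated));
// prints: True
```

The two bins differ only in phase, so they are indistinguishable in a magnitude or power spectrogram. That is why reconstructing audio needs the phase as well.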

MagnitudeAndPhase(Tensor<T>, out Tensor<T>, out Tensor<T>)

Computes magnitude and phase spectrograms simultaneously.

public void MagnitudeAndPhase(Tensor<T> signal, out Tensor<T> magnitude, out Tensor<T> phase)

Parameters

signal Tensor<T>

Input signal.

magnitude Tensor<T>

Output magnitude spectrogram.

phase Tensor<T>

Output phase spectrogram in radians.

Remarks

GPU Acceleration: This method uses IEngine.STFT directly for optimal GPU utilization, returning both magnitude and phase in a single pass.

PolarToComplex(Tensor<T>, Tensor<T>)

Creates complex spectrogram from magnitude and phase.

public static Tensor<Complex<T>> PolarToComplex(Tensor<T> magnitude, Tensor<T> phase)

Parameters

magnitude Tensor<T>

Magnitude tensor.

phase Tensor<T>

Phase tensor in radians.

Returns

Tensor<Complex<T>>

Complex spectrogram.
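The polar-to-rectangular conversion is real = magnitude * cos(phase) and imaginary = magnitude * sin(phase). A minimal standalone sketch on plain arrays, with System.Numerics.Complex standing in for Complex<T> (FromPolar is a hypothetical name):

```csharp
using System;
using System.Numerics;

// Element-wise polar to complex conversion; phase is in radians.
static Complex[] FromPolar(double[] magnitude, double[] phase)
{
    if (magnitude.Length != phase.Length)
        throw new ArgumentException("magnitude and phase must have the same length");
    var result = new Complex[magnitude.Length];
    for (int i = 0; i < result.Length; i++)
        result[i] = new Complex(magnitude[i] * Math.Cos(phase[i]),
                                magnitude[i] * Math.Sin(phase[i]));
    return result;
}

double[] mags = { 2.0, 1.0 };
double[] phases = { Math.PI / 2, Math.PI };
Complex[] bins = FromPolar(mags, phases);

// Round trip: magnitude and phase of the result match the inputs (up to rounding).
Console.WriteLine($"{bins[0].Magnitude}, {bins[0].Phase}");
```

System.Numerics also provides Complex.FromPolarCoordinates(magnitude, phase), which performs the same conversion for a single value.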

Power(Tensor<T>)

Computes the power spectrogram (magnitude squared).

public Tensor<T> Power(Tensor<T> signal)

Parameters

signal Tensor<T>

Input signal.

Returns

Tensor<T>

Power spectrogram [numFrames, numFreqs].

Remarks

GPU Acceleration: When GPU is available, this method uses hardware-accelerated STFT operations through IEngine for significantly faster processing.