Class ShortTimeFourierTransform<T>

Namespace: AiDotNet.Diffusion.Audio

Assembly: AiDotNet.dll

Short-Time Fourier Transform (STFT) for analyzing audio signals over time.

public class ShortTimeFourierTransform<T>

Type Parameters

T: The numeric type used for calculations.

Inheritance: object → ShortTimeFourierTransform<T>

Inherited Members: object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

The STFT breaks a signal into short overlapping segments and computes the Fourier transform of each segment. This reveals how the frequency content of a signal changes over time.

For Beginners: Audio signals like music or speech change over time. While a regular FFT tells you which frequencies are in the entire signal, it doesn't tell you WHEN those frequencies occur.

The STFT solves this by:

1. Cutting the audio into small overlapping pieces (frames).
2. Applying a window function to each frame (reduces edge artifacts).
3. Computing an FFT on each windowed frame.
4. Stacking the results to form a spectrogram (time vs. frequency).
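The four steps above can be sketched in plain C#. This is an illustration only, not the AiDotNet implementation: NaiveStft and HannWindow are hypothetical helpers, System.Numerics.Complex stands in for Complex<T>, the DFT is the slow O(n²) form rather than an FFT, and no centering or padding is applied.

```csharp
using System;
using System.Numerics;

// Periodic Hann window of the given length.
static double[] HannWindow(int length)
{
    var w = new double[length];
    for (int n = 0; n < length; n++)
        w[n] = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / length);
    return w;
}

// Naive STFT: frame, window, transform, stack. Hypothetical helper for illustration.
static Complex[][] NaiveStft(double[] signal, int nFft, int hopLength)
{
    if (signal.Length < nFft)
        throw new ArgumentException("Signal is shorter than one frame.");

    double[] window = HannWindow(nFft);
    int numFrames = 1 + (signal.Length - nFft) / hopLength;
    int numBins = nFft / 2 + 1;
    var spectrogram = new Complex[numFrames][];

    for (int frame = 0; frame < numFrames; frame++)
    {
        int start = frame * hopLength;                    // first step: cut a frame
        spectrogram[frame] = new Complex[numBins];
        for (int k = 0; k < numBins; k++)                 // third step: transform (plain DFT)
        {
            Complex sum = Complex.Zero;
            for (int n = 0; n < nFft; n++)
            {
                double sample = signal[start + n] * window[n];   // second step: window
                double angle = -2.0 * Math.PI * k * n / nFft;
                sum += sample * new Complex(Math.Cos(angle), Math.Sin(angle));
            }
            spectrogram[frame][k] = sum;                  // fourth step: stack
        }
    }
    return spectrogram;
}

// A sine that completes 4 cycles every 32 samples should peak in bin 4 of a 32-point frame.
var tone = new double[128];
for (int i = 0; i < tone.Length; i++)
    tone[i] = Math.Sin(2.0 * Math.PI * 4 * i / 32.0);

Complex[][] spec = NaiveStft(tone, nFft: 32, hopLength: 8);
Console.WriteLine($"{spec.Length} frames x {spec[0].Length} bins");   // 13 frames x 17 bins
```

Each row of the result is one frame and each column one frequency bin, which matches the [numFrames, nFft/2 + 1] layout described for Forward.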

Usage:

// audioSignal is a Tensor<float> of shape [length] or [batch, length].
var stft = new ShortTimeFourierTransform<float>(nFft: 2048, hopLength: 512);
var spectrogram = stft.Forward(audioSignal);
// For a [length] input, spectrogram.Shape is [numFrames, nFft/2 + 1] (complex values)

// To reconstruct audio from spectrogram:
var reconstructed = stft.Inverse(spectrogram);

Constructors

ShortTimeFourierTransform(int, int?, int?, IWindowFunction<T>?, bool, PaddingMode)

Initializes a new STFT processor.

public ShortTimeFourierTransform(int nFft = 2048, int? hopLength = null, int? windowLength = null, IWindowFunction<T>? windowFunction = null, bool center = true, PaddingMode padMode = PaddingMode.Reflect)

Parameters

nFft int: FFT size (default: 2048). Should be a power of 2.
hopLength int?: Hop length between frames (default: nFft/4).
windowLength int?: Window length (default: nFft).
windowFunction IWindowFunction<T>: Window function to use (default: HanningWindow - industry standard for audio).
center bool: Whether to pad signal so frames are centered (default: true).
padMode PaddingMode: Padding mode when centering (default: Reflect).

Remarks

For Beginners:

- nFft: Determines frequency resolution. Larger = more frequency detail but less time detail.
- hopLength: How much to slide between frames. Smaller = more overlap = smoother output. Common: hopLength = nFft/4 gives 75% overlap.
- windowFunction: Reduces spectral leakage. Hann (default) is the industry standard for audio. Other options: HammingWindow, BlackmanWindow, KaiserWindow, etc.
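The trade-off between nFft and hopLength can be made concrete with a little arithmetic. The sketch below is not part of the library: StftResolution is a hypothetical helper, the 44100 Hz sample rate is an assumed value, and the overlap figure assumes windowLength equals nFft.

```csharp
using System;

// Hypothetical helper: quantities implied by an (nFft, hopLength) choice at a given sample rate.
static (double binWidthHz, double frameDurationMs, double overlapPercent) StftResolution(
    int sampleRate, int nFft, int hopLength)
{
    double binWidthHz = (double)sampleRate / nFft;            // spacing between frequency bins
    double frameDurationMs = 1000.0 * nFft / sampleRate;      // time span covered by one frame
    double overlapPercent = 100.0 * (nFft - hopLength) / nFft;
    return (binWidthHz, frameDurationMs, overlapPercent);
}

// Larger frames narrow the bins (finer frequency detail) but cover more time per frame.
foreach (int size in new[] { 512, 2048, 8192 })
{
    var r = StftResolution(sampleRate: 44100, nFft: size, hopLength: size / 4);
    Console.WriteLine(
        $"nFft={size}: {r.binWidthHz:F2} Hz/bin, {r.frameDurationMs:F1} ms/frame, {r.overlapPercent:F0}% overlap");
}
```

With hopLength = nFft/4 the overlap is 75% for every frame size, while the bin width and frame duration move in opposite directions as nFft grows.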

GPU Acceleration: This class automatically uses GPU-accelerated FFT operations when available through AiDotNetEngine.Current.

Properties

HopLength

Gets the hop length.

public int HopLength { get; }

Property Value

int

NFft

Gets the FFT size.

public int NFft { get; }

Property Value

int

NumFrequencyBins

Gets the number of frequency bins (nFft / 2 + 1).

public int NumFrequencyBins { get; }

Property Value

int
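The nFft / 2 + 1 count reflects the fact that the spectrum of a real-valued signal is conjugate-symmetric, so only the bins from DC up to the Nyquist frequency carry independent information. A small sketch, assuming a 16000 Hz sample rate (the class itself does not take a sample rate) and a hypothetical BinFrequencyHz helper, shows the frequency each bin represents:

```csharp
using System;

// Hypothetical helper: centre frequency in Hz of bin k, for k in [0, nFft / 2].
static double BinFrequencyHz(int k, int nFft, int sampleRate) => (double)k * sampleRate / nFft;

int fftSize = 2048;
int numBins = fftSize / 2 + 1;   // 1025 bins for an FFT size of 2048
int rate = 16000;                // assumed sample rate, for illustration only

Console.WriteLine(numBins);                                      // 1025
Console.WriteLine(BinFrequencyHz(0, fftSize, rate));             // 0 (DC)
Console.WriteLine(BinFrequencyHz(numBins - 1, fftSize, rate));   // 8000 (Nyquist, rate / 2)
```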

WindowTensor

Gets the window tensor for GPU operations.

public Tensor<T>? WindowTensor { get; }

Property Value

Tensor<T>

Methods

CalculateNumFrames(int)

Calculates the number of frames for a given signal length.

public int CalculateNumFrames(int signalLength)

Parameters

signalLength int: Length of the input signal.

Returns

int: Number of STFT frames.
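The page does not state the formula behind this method. A common convention, used here purely as an assumption, pads nFft / 2 samples on each side when center is true and then counts how many hops fit. NumFrames below is a hypothetical helper, and the library's result may differ at the edges.

```csharp
using System;

// Assumed frame-count convention; AiDotNet's exact formula may differ.
static int NumFrames(int signalLength, int nFft, int hopLength, bool center)
{
    // Centering pads nFft / 2 samples on each side before framing.
    int padded = center ? signalLength + 2 * (nFft / 2) : signalLength;
    if (padded < nFft) return 0;                 // not enough samples for a single frame
    return 1 + (padded - nFft) / hopLength;      // integer division drops a partial last hop
}

Console.WriteLine(NumFrames(22050, 2048, 512, center: true));    // 44
Console.WriteLine(NumFrames(22050, 2048, 512, center: false));   // 40
```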

CalculateSignalLength(int)

Calculates signal length from number of frames.

public int CalculateSignalLength(int numFrames)

Parameters

numFrames int: Number of STFT frames.

Returns

int: Approximate signal length.
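The result is approximate because a frame count does not determine a unique signal length. Under the assumed centered convention where the frame count is 1 + signalLength / hopLength (integer division), every length within one hop maps to the same count. ApproximateSignalLength is a hypothetical helper illustrating that assumption, not the library's formula.

```csharp
using System;

// Assumed inverse of the centered frame count: recovers a multiple of hopLength.
static int ApproximateSignalLength(int numFrames, int hopLength) =>
    numFrames > 0 ? hopLength * (numFrames - 1) : 0;

int hop = 512;
int originalLength = 22050;
int frameCount = 1 + originalLength / hop;                   // 44
int recovered = ApproximateSignalLength(frameCount, hop);    // 22016

// Prints: 22050 samples -> 44 frames -> 22016 samples
Console.WriteLine($"{originalLength} samples -> {frameCount} frames -> {recovered} samples");
```

When the exact length matters, the optional length argument of Inverse lets the caller state it explicitly.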

ExtractPhase(Tensor<Complex<T>>)

Extracts phase from complex spectrogram.

public static Tensor<T> ExtractPhase(Tensor<Complex<T>> complex)

Parameters

complex Tensor<Complex<T>>: Complex spectrogram.

Returns

Tensor<T>: Phase tensor in radians.
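Phase is the angle of each complex value, atan2(imaginary, real), which falls in the range (-π, π]. The sketch below illustrates the calculation with System.Numerics.Complex standing in for Complex<T>; Phases is a hypothetical helper, not the library method.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: per-bin phase in radians.
static double[] Phases(Complex[] bins)
{
    var phase = new double[bins.Length];
    for (int i = 0; i < bins.Length; i++)
        phase[i] = Math.Atan2(bins[i].Imaginary, bins[i].Real);   // same value as bins[i].Phase
    return phase;
}

// Four points on the unit circle: angles 0, π/2, π and -π/2.
var unitCircle = new[]
{
    new Complex(1, 0), new Complex(0, 1), new Complex(-1, 0), new Complex(0, -1)
};
Console.WriteLine(string.Join(", ", Phases(unitCircle)));
```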

Forward(Tensor<T>)

Computes the Short-Time Fourier Transform of a signal.

public Tensor<Complex<T>> Forward(Tensor<T> signal)

Parameters

signal Tensor<T>: Input signal as a tensor [length] or [batch, length].

Returns

Tensor<Complex<T>>: Complex spectrogram tensor [numFrames, numFreqs] or [batch, numFrames, numFreqs].

Remarks

For Beginners: This method takes your audio waveform and produces a spectrogram showing which frequencies are present at each point in time.

Inverse(Tensor<Complex<T>>, int?)

Computes the Inverse Short-Time Fourier Transform (overlap-add reconstruction).

public Tensor<T> Inverse(Tensor<Complex<T>> spectrogram, int? length = null)

Parameters

spectrogram Tensor<Complex<T>>: Complex spectrogram [numFrames, numFreqs] or [batch, numFrames, numFreqs].
length int?: Expected output length (optional, otherwise computed from spectrogram).

Returns

Tensor<T>: Reconstructed signal tensor.

Remarks

For Beginners: This method takes a spectrogram and converts it back to an audio waveform. It uses the "overlap-add" method, where each frame is inverse-FFT'd and the overlapping portions are added together.

Note: Perfect reconstruction requires that Inverse run with the same STFT parameters (FFT size, hop length, window, centering) that were used for analysis.
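The overlap-add step can be sketched without any FFT by working on time-domain frames, which is what the inverse FFT of each spectrogram frame would yield. This is a minimal illustration under stated assumptions, not the library's code: Hann, WindowedFrames and OverlapAdd are hypothetical helpers, no centering is applied, and the sum is normalised by the accumulated squared window, which is one common choice.

```csharp
using System;

// Periodic Hann window.
static double[] Hann(int length)
{
    var w = new double[length];
    for (int n = 0; n < length; n++)
        w[n] = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / length);
    return w;
}

// Stand-in for analysis followed by a per-frame inverse FFT: windowed time-domain frames.
static double[][] WindowedFrames(double[] signal, double[] window, int hopLength)
{
    int numFrames = 1 + (signal.Length - window.Length) / hopLength;
    var frames = new double[numFrames][];
    for (int f = 0; f < numFrames; f++)
    {
        frames[f] = new double[window.Length];
        for (int n = 0; n < window.Length; n++)
            frames[f][n] = signal[f * hopLength + n] * window[n];
    }
    return frames;
}

// Overlap-add: window each frame again, add it at its hop offset, then normalise.
static double[] OverlapAdd(double[][] frames, double[] window, int hopLength)
{
    int length = window.Length + hopLength * (frames.Length - 1);
    var output = new double[length];
    var norm = new double[length];
    for (int f = 0; f < frames.Length; f++)
    {
        for (int n = 0; n < window.Length; n++)
        {
            int i = f * hopLength + n;
            output[i] += frames[f][n] * window[n];   // synthesis window, then add
            norm[i] += window[n] * window[n];        // accumulated window energy
        }
    }
    for (int i = 0; i < length; i++)
        if (norm[i] > 1e-8) output[i] /= norm[i];    // skip samples no window covers
    return output;
}

var tone = new double[64];
for (int i = 0; i < tone.Length; i++)
    tone[i] = Math.Sin(2.0 * Math.PI * i / 16.0);

double[] win = Hann(16);
double[] rebuilt = OverlapAdd(WindowedFrames(tone, win, hopLength: 4), win, hopLength: 4);

double maxError = 0.0;
for (int i = 1; i < tone.Length; i++)                // sample 0 has zero window weight
    maxError = Math.Max(maxError, Math.Abs(tone[i] - rebuilt[i]));
Console.WriteLine($"max reconstruction error: {maxError:E2}");
```

The first sample is excluded from the check because the Hann window is zero there and nothing else covers it. Padding the signal so that frames are centered, as the center option does, is the usual way to avoid such uncovered edges.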

InverseFromMagnitudeAndPhase(Tensor<T>, Tensor<T>, int?)

Reconstructs audio signal from magnitude and phase spectrograms.

public Tensor<T> InverseFromMagnitudeAndPhase(Tensor<T> magnitude, Tensor<T> phase, int? length = null)

Parameters

magnitude Tensor<T>: Magnitude spectrogram.
phase Tensor<T>: Phase spectrogram in radians.
length int?: Expected output length (optional).

Returns

Tensor<T>: Reconstructed audio signal.

Remarks

GPU Acceleration: This method uses IEngine.ISTFT for hardware-accelerated audio reconstruction when GPU is available.

Magnitude(Tensor<T>)

Computes the magnitude spectrogram.

public Tensor<T> Magnitude(Tensor<T> signal)

Parameters

signal Tensor<T>: Input signal.

Returns

Tensor<T>: Magnitude spectrogram [numFrames, numFreqs].

Remarks

For Beginners: The magnitude spectrogram shows how loud each frequency is at each time, discarding phase information. This is often used for visualization and audio processing where phase isn't needed.

GPU Acceleration: When GPU is available, this method uses hardware-accelerated STFT operations through IEngine for significantly faster processing.
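For a single complex bin, magnitude is sqrt(re² + im²), and the power value described under Power(Tensor<T>) is its square. The sketch below shows both per-bin calculations using System.Numerics.Complex as a stand-in for Complex<T>; Magnitudes and Powers are hypothetical helpers rather than the library methods, which take the raw signal as input.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: per-bin magnitude.
static double[] Magnitudes(Complex[] bins)
{
    var mag = new double[bins.Length];
    for (int i = 0; i < bins.Length; i++)
        mag[i] = bins[i].Magnitude;   // sqrt(re^2 + im^2)
    return mag;
}

// Hypothetical helper: per-bin power, computed without the square root.
static double[] Powers(Complex[] bins)
{
    var pow = new double[bins.Length];
    for (int i = 0; i < bins.Length; i++)
        pow[i] = bins[i].Real * bins[i].Real + bins[i].Imaginary * bins[i].Imaginary;
    return pow;
}

var column = new[] { new Complex(3, 4), new Complex(0, -2) };
Console.WriteLine(string.Join(", ", Magnitudes(column)));   // 5, 2
Console.WriteLine(string.Join(", ", Powers(column)));       // 25, 4
```

Both results discard the sign and angle of each value, which is why phase has to be kept separately if the audio is to be reconstructed later.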

MagnitudeAndPhase(Tensor<T>, out Tensor<T>, out Tensor<T>)

Computes magnitude and phase spectrograms simultaneously.

public void MagnitudeAndPhase(Tensor<T> signal, out Tensor<T> magnitude, out Tensor<T> phase)

Parameters

signal Tensor<T>: Input signal.
magnitude Tensor<T>: Output magnitude spectrogram.
phase Tensor<T>: Output phase spectrogram in radians.

Remarks

GPU Acceleration: This method uses IEngine.STFT directly for optimal GPU utilization, returning both magnitude and phase in a single pass.

PolarToComplex(Tensor<T>, Tensor<T>)

Creates complex spectrogram from magnitude and phase.

public static Tensor<Complex<T>> PolarToComplex(Tensor<T> magnitude, Tensor<T> phase)

Parameters

magnitude Tensor<T>: Magnitude tensor.
phase Tensor<T>: Phase tensor in radians.

Returns

Tensor<Complex<T>>: Complex spectrogram.
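The conversion is the standard polar-to-rectangular identity: real = magnitude · cos(phase) and imaginary = magnitude · sin(phase). A minimal sketch, with System.Numerics.Complex standing in for Complex<T> and FromPolar as a hypothetical helper, round-trips one value to show that magnitude and phase together preserve the original complex number:

```csharp
using System;
using System.Numerics;

// Hypothetical helper: element-wise polar-to-rectangular conversion.
static Complex[] FromPolar(double[] magnitude, double[] phase)
{
    if (magnitude.Length != phase.Length)
        throw new ArgumentException("Magnitude and phase must have the same length.");

    var result = new Complex[magnitude.Length];
    for (int i = 0; i < result.Length; i++)
        result[i] = new Complex(magnitude[i] * Math.Cos(phase[i]),
                                magnitude[i] * Math.Sin(phase[i]));
    return result;
}

// Round trip: split 3 + 4i into magnitude and phase, then rebuild it.
var original = new Complex(3, 4);
Complex[] restored = FromPolar(new[] { original.Magnitude }, new[] { original.Phase });
Console.WriteLine($"difference after round trip: {(restored[0] - original).Magnitude:E2}");
```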

Power(Tensor<T>)

Computes the power spectrogram (magnitude squared).

public Tensor<T> Power(Tensor<T> signal)

Parameters

signal Tensor<T>: Input signal.

Returns

Tensor<T>: Power spectrogram [numFrames, numFreqs].

Remarks

GPU Acceleration: When GPU is available, this method uses hardware-accelerated STFT operations through IEngine for significantly faster processing.
