Class ShortTimeFourierTransform<T>
Short-Time Fourier Transform (STFT) for analyzing audio signals over time.
public class ShortTimeFourierTransform<T>
Type Parameters
T: The numeric type used for calculations.
Inheritance
- object
- ShortTimeFourierTransform<T>
Remarks
The STFT breaks a signal into short overlapping segments and computes the Fourier transform of each segment. This reveals how the frequency content of a signal changes over time.
For Beginners: Audio signals like music or speech change over time. While a regular FFT tells you which frequencies are in the entire signal, it doesn't tell you WHEN those frequencies occur.
The STFT solves this by:
- Cutting the audio into small overlapping pieces (frames)
- Applying a window function to each frame (reduces edge artifacts)
- Computing FFT on each windowed frame
- Stacking the results to form a spectrogram (time vs. frequency)
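The four steps above can be sketched in a few lines of plain C#. This is a minimal illustration, not the library's implementation: the `StftSketch` class is hypothetical, it uses `System.Numerics.Complex` and `double` rather than `Tensor<T>`, it computes a naive DFT instead of an FFT, and it assumes no centering or padding.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not part of the documented API.
public static class StftSketch
{
    // Periodic Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / length).
    public static double[] HannWindow(int length)
    {
        var w = new double[length];
        for (int n = 0; n < length; n++)
            w[n] = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / length);
        return w;
    }

    // Returns spectrogram[frame][bin] with nFft / 2 + 1 bins per frame.
    // Assumption: no centering, so only frames that fit inside the signal are used.
    public static Complex[][] Forward(double[] signal, int nFft, int hopLength)
    {
        if (signal.Length < nFft)
            throw new ArgumentException("Signal is shorter than nFft.");

        double[] window = HannWindow(nFft);
        int numFrames = 1 + (signal.Length - nFft) / hopLength;
        int numBins = nFft / 2 + 1;
        var spectrogram = new Complex[numFrames][];

        for (int frame = 0; frame < numFrames; frame++)
        {
            int start = frame * hopLength;                  // step 1: cut a frame
            var bins = new Complex[numBins];
            for (int k = 0; k < numBins; k++)
            {
                Complex sum = Complex.Zero;
                for (int n = 0; n < nFft; n++)
                {
                    double sample = signal[start + n] * window[n];      // step 2: window
                    double angle = -2.0 * Math.PI * k * n / nFft;
                    sum += new Complex(sample * Math.Cos(angle),
                                       sample * Math.Sin(angle));       // step 3: DFT
                }
                bins[k] = sum;
            }
            spectrogram[frame] = bins;                      // step 4: stack frames
        }
        return spectrogram;
    }
}
```

The naive DFT costs O(nFft²) per frame, which is fine for reading the algorithm but not for real audio; a production implementation would use an FFT for step 3.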
Usage:
var stft = new ShortTimeFourierTransform<float>(nFft: 2048, hopLength: 512);
var spectrogram = stft.Forward(audioSignal);
// spectrogram.Shape = [numFrames, nFft/2 + 1] (complex values)
// To reconstruct audio from spectrogram:
var reconstructed = stft.Inverse(spectrogram);
Constructors
ShortTimeFourierTransform(int, int?, int?, IWindowFunction<T>?, bool, PaddingMode)
Initializes a new STFT processor.
public ShortTimeFourierTransform(int nFft = 2048, int? hopLength = null, int? windowLength = null, IWindowFunction<T>? windowFunction = null, bool center = true, PaddingMode padMode = PaddingMode.Reflect)
Parameters
nFft (int): FFT size (default: 2048). Should be a power of 2.
hopLength (int?): Hop length between frames (default: nFft/4).
windowLength (int?): Window length (default: nFft).
windowFunction (IWindowFunction<T>): Window function to use (default: HanningWindow, the industry standard for audio).
center (bool): Whether to pad the signal so that frames are centered (default: true).
padMode (PaddingMode): Padding mode when centering (default: Reflect).
Remarks
For Beginners:
- nFft: Determines frequency resolution. Larger = more frequency detail but less time detail.
- hopLength: How far to slide between frames. Smaller = more overlap = smoother output. Common: hopLength = nFft/4 gives 75% overlap.
- windowFunction: Reduces spectral leakage. Hann (default) is the industry standard for audio. Other options: HammingWindow, BlackmanWindow, KaiserWindow, etc.
GPU Acceleration: This class automatically uses GPU-accelerated FFT operations when available through AiDotNetEngine.Current.
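The trade-offs between nFft and hopLength are easier to judge with concrete numbers. The sketch below is plain arithmetic, not library code: the `StftParameterMath` class is hypothetical, and the sample rate is an assumption, since the class itself takes no sample rate.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the documented API.
public static class StftParameterMath
{
    // Width of one frequency bin in Hz: larger nFft gives narrower bins.
    public static double BinWidthHz(int sampleRate, int nFft)
        => (double)sampleRate / nFft;

    // Time between the starts of successive frames, in milliseconds.
    public static double HopMilliseconds(int sampleRate, int hopLength)
        => 1000.0 * hopLength / sampleRate;

    // Fraction of each window shared with the next one.
    public static double OverlapFraction(int windowLength, int hopLength)
        => 1.0 - (double)hopLength / windowLength;
}
```

For an assumed 16 kHz signal with the defaults (nFft = 2048, hopLength = 512, windowLength = 2048), `BinWidthHz(16000, 2048)` is 7.8125 Hz, `HopMilliseconds(16000, 512)` is 32 ms, and `OverlapFraction(2048, 512)` is 0.75, which is the 75% overlap mentioned above.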
Properties
HopLength
Gets the hop length.
public int HopLength { get; }
Property Value
- int
NFft
Gets the FFT size.
public int NFft { get; }
Property Value
- int
NumFrequencyBins
Gets the number of frequency bins (nFft / 2 + 1).
public int NumFrequencyBins { get; }
Property Value
- int
WindowTensor
Gets the window tensor for GPU operations.
public Tensor<T>? WindowTensor { get; }
Property Value
- Tensor<T>
Methods
CalculateNumFrames(int)
Calculates the number of frames for a given signal length.
public int CalculateNumFrames(int signalLength)
Parameters
signalLength (int): Length of the input signal.
Returns
- int
Number of STFT frames.
CalculateSignalLength(int)
Calculates the signal length from the number of frames.
public int CalculateSignalLength(int numFrames)
Parameters
numFrames (int): Number of STFT frames.
Returns
- int
Approximate signal length.
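The relationship between signal length and frame count, and why the reverse direction is only approximate, can be sketched as follows. The formulas are an assumption based on a common framing convention (pad nFft/2 samples on each side when centering, then take every frame that fits); the source does not confirm that CalculateNumFrames and CalculateSignalLength use exactly these. The `FrameCountSketch` class is hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only; the exact formulas used by
// CalculateNumFrames / CalculateSignalLength are not stated in this page.
public static class FrameCountSketch
{
    // Assumed convention: centering pads nFft / 2 samples on each side.
    public static int NumFrames(int signalLength, int nFft, int hopLength, bool center)
    {
        int padded = center ? signalLength + 2 * (nFft / 2) : signalLength;
        if (padded < nFft) return 0;
        return 1 + (padded - nFft) / hopLength;   // integer division drops the remainder
    }

    // Inverse of the formula above. The dropped remainder cannot be recovered,
    // so the result is a lower bound on the original length.
    public static int SignalLength(int numFrames, int nFft, int hopLength, bool center)
    {
        if (numFrames <= 0) return 0;
        int padded = nFft + (numFrames - 1) * hopLength;
        return center ? padded - 2 * (nFft / 2) : padded;
    }
}
```

Under these assumptions, a 22050-sample signal with nFft = 2048, hopLength = 512, and centering gives 44 frames, and 44 frames map back to 22016 samples. The 34-sample difference is the remainder lost to integer division, which is why passing an explicit length to Inverse can be useful.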
ExtractPhase(Tensor<Complex<T>>)
Extracts the phase from a complex spectrogram.
public static Tensor<T> ExtractPhase(Tensor<Complex<T>> complex)
Parameters
complex (Tensor<Complex<T>>): Complex spectrogram.
Returns
- Tensor<T>
Phase tensor in radians.
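The phase of a complex value is the angle atan2(imaginary, real), which lies in the range (-π, π]. A minimal sketch of that computation follows, using `System.Numerics.Complex` and a 2D array in place of the library's `Complex<T>` and `Tensor<T>` types; the `PhaseSketch` class is hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not part of the documented API.
public static class PhaseSketch
{
    // Returns the phase in radians for every [frame, bin] entry.
    public static double[,] ExtractPhase(Complex[,] spectrogram)
    {
        int frames = spectrogram.GetLength(0);
        int bins = spectrogram.GetLength(1);
        var phase = new double[frames, bins];
        for (int t = 0; t < frames; t++)
            for (int k = 0; k < bins; k++)
                phase[t, k] = Math.Atan2(spectrogram[t, k].Imaginary,
                                         spectrogram[t, k].Real);
        return phase;
    }
}
```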
Forward(Tensor<T>)
Computes the Short-Time Fourier Transform of a signal.
public Tensor<Complex<T>> Forward(Tensor<T> signal)
Parameters
signal (Tensor<T>): Input signal as a tensor [length] or [batch, length].
Returns
- Tensor<Complex<T>>
Complex spectrogram tensor [numFrames, numFreqs] or [batch, numFrames, numFreqs].
Remarks
For Beginners: This method takes your audio waveform and produces a spectrogram showing which frequencies are present at each point in time.
Inverse(Tensor<Complex<T>>, int?)
Computes the Inverse Short-Time Fourier Transform (overlap-add reconstruction).
public Tensor<T> Inverse(Tensor<Complex<T>> spectrogram, int? length = null)
Parameters
spectrogram (Tensor<Complex<T>>): Complex spectrogram [numFrames, numFreqs] or [batch, numFrames, numFreqs].
length (int?): Expected output length (optional; otherwise computed from the spectrogram).
Returns
- Tensor<T>
Reconstructed signal tensor.
Remarks
For Beginners: This method takes a spectrogram and converts it back to an audio waveform. It uses the "overlap-add" method, where each frame is inverse-FFT'd and the overlapping portions are added together.
Note: Perfect reconstruction requires the same STFT parameters (FFT size, hop length, window) that were used for the forward analysis.
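The overlap-add idea can be shown in isolation. The sketch below is a simplification under stated assumptions: it skips the FFT and inverse FFT entirely (they cancel each other), uses a Hann window for both analysis and synthesis, and normalizes by the summed squared window. The `OverlapAddSketch` class is hypothetical, and the library's actual normalization is not described in this page.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the documented API.
public static class OverlapAddSketch
{
    public static double[] HannWindow(int length)
    {
        var w = new double[length];
        for (int n = 0; n < length; n++)
            w[n] = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / length);
        return w;
    }

    // Analysis: cut overlapping frames and window them (FFT omitted).
    public static double[][] Analyze(double[] signal, int frameLength, int hopLength)
    {
        double[] w = HannWindow(frameLength);
        int numFrames = 1 + (signal.Length - frameLength) / hopLength;
        var frames = new double[numFrames][];
        for (int f = 0; f < numFrames; f++)
        {
            frames[f] = new double[frameLength];
            for (int n = 0; n < frameLength; n++)
                frames[f][n] = signal[f * hopLength + n] * w[n];
        }
        return frames;
    }

    // Synthesis: window each frame again, add overlapping samples, then divide
    // by the accumulated squared window so the window weighting cancels.
    public static double[] Synthesize(double[][] frames, int frameLength, int hopLength)
    {
        double[] w = HannWindow(frameLength);
        int length = frameLength + (frames.Length - 1) * hopLength;
        var output = new double[length];
        var norm = new double[length];
        for (int f = 0; f < frames.Length; f++)
            for (int n = 0; n < frameLength; n++)
            {
                int i = f * hopLength + n;
                output[i] += frames[f][n] * w[n];
                norm[i] += w[n] * w[n];
            }
        for (int i = 0; i < length; i++)
            if (norm[i] > 1e-12) output[i] /= norm[i];   // skip samples no window covers
        return output;
    }
}
```

In this sketch the very first sample cannot be recovered, because the Hann window is zero there and no other frame covers it. Changing the hop length or window between `Analyze` and `Synthesize` breaks the cancellation, which illustrates the note about matching parameters.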
InverseFromMagnitudeAndPhase(Tensor<T>, Tensor<T>, int?)
Reconstructs an audio signal from magnitude and phase spectrograms.
public Tensor<T> InverseFromMagnitudeAndPhase(Tensor<T> magnitude, Tensor<T> phase, int? length = null)
Parameters
magnitude (Tensor<T>): Magnitude spectrogram.
phase (Tensor<T>): Phase spectrogram in radians.
length (int?): Expected output length (optional).
Returns
- Tensor<T>
Reconstructed audio signal.
Remarks
GPU Acceleration: This method uses IEngine.ISTFT for hardware-accelerated audio reconstruction when GPU is available.
Magnitude(Tensor<T>)
Computes the magnitude spectrogram.
public Tensor<T> Magnitude(Tensor<T> signal)
Parameters
signal (Tensor<T>): Input signal.
Returns
- Tensor<T>
Magnitude spectrogram [numFrames, numFreqs].
Remarks
For Beginners: The magnitude spectrogram shows how loud each frequency is at each time, discarding phase information. This is often used for visualization and audio processing where phase isn't needed.
GPU Acceleration: When GPU is available, this method uses hardware-accelerated STFT operations through IEngine for significantly faster processing.
MagnitudeAndPhase(Tensor<T>, out Tensor<T>, out Tensor<T>)
Computes magnitude and phase spectrograms simultaneously.
public void MagnitudeAndPhase(Tensor<T> signal, out Tensor<T> magnitude, out Tensor<T> phase)
Parameters
signal (Tensor<T>): Input signal.
magnitude (Tensor<T>): Output magnitude spectrogram.
phase (Tensor<T>): Output phase spectrogram in radians.
Remarks
GPU Acceleration: This method uses IEngine.STFT directly for optimal GPU utilization, returning both magnitude and phase in a single pass.
PolarToComplex(Tensor<T>, Tensor<T>)
Creates a complex spectrogram from magnitude and phase.
public static Tensor<Complex<T>> PolarToComplex(Tensor<T> magnitude, Tensor<T> phase)
Parameters
magnitude (Tensor<T>): Magnitude tensor.
phase (Tensor<T>): Phase tensor in radians.
Returns
- Tensor<Complex<T>>
Complex spectrogram.
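Converting from polar form means computing real = magnitude · cos(phase) and imaginary = magnitude · sin(phase) for each element. A minimal sketch follows, using `System.Numerics.Complex` and flat arrays in place of the library's `Complex<T>` and `Tensor<T>` types; the `PolarSketch` class is hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not part of the documented API.
public static class PolarSketch
{
    // Combines element-wise magnitude and phase (radians) into complex values.
    public static Complex[] PolarToComplex(double[] magnitude, double[] phase)
    {
        if (magnitude.Length != phase.Length)
            throw new ArgumentException("Magnitude and phase must have the same length.");

        var result = new Complex[magnitude.Length];
        for (int i = 0; i < magnitude.Length; i++)
            result[i] = new Complex(magnitude[i] * Math.Cos(phase[i]),
                                    magnitude[i] * Math.Sin(phase[i]));
        return result;
    }
}
```

This is the inverse of taking the magnitude and phase of each complex value, so a spectrogram split into magnitude and phase and then recombined this way returns to its original values up to floating-point rounding.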
Power(Tensor<T>)
Computes the power spectrogram (magnitude squared).
public Tensor<T> Power(Tensor<T> signal)
Parameters
signal (Tensor<T>): Input signal.
Returns
- Tensor<T>
Power spectrogram [numFrames, numFreqs].
Remarks
GPU Acceleration: When GPU is available, this method uses hardware-accelerated STFT operations through IEngine for significantly faster processing.
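For a single complex bin, the magnitude is sqrt(re² + im²) and the power is re² + im², so the power can be computed without the square root. The sketch below shows that relationship on plain arrays; the `SpectrumMath` class is hypothetical and uses `System.Numerics.Complex` rather than the library's types.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; not part of the documented API.
public static class SpectrumMath
{
    // Magnitude of each bin: sqrt(re^2 + im^2).
    public static double[] Magnitude(Complex[] bins)
    {
        var result = new double[bins.Length];
        for (int k = 0; k < bins.Length; k++)
            result[k] = Math.Sqrt(bins[k].Real * bins[k].Real +
                                  bins[k].Imaginary * bins[k].Imaginary);
        return result;
    }

    // Power of each bin: re^2 + im^2, i.e. magnitude squared without the sqrt.
    public static double[] Power(Complex[] bins)
    {
        var result = new double[bins.Length];
        for (int k = 0; k < bins.Length; k++)
            result[k] = bins[k].Real * bins[k].Real +
                        bins[k].Imaginary * bins[k].Imaginary;
        return result;
    }
}
```

For example, a bin with value 3 + 4i has magnitude 5 and power 25.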