Class ConstantQTransform<T>

Namespace
AiDotNet.Audio.Features
Assembly
AiDotNet.dll

Constant-Q Transform (CQT) for music analysis with logarithmic frequency resolution.

public class ConstantQTransform<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
ConstantQTransform<T>

Remarks

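Unlike the FFT, whose bins are linearly spaced in frequency, the CQT spaces its bins logarithmically: each bin's center frequency is a constant ratio (2^(1/binsPerOctave)) above the previous one, so the ratio of center frequency to bandwidth (the Q factor) is the same for every bin. This matches how humans perceive pitch: octaves are equally spaced.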
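The logarithmic spacing described above can be sketched with the standard geometric formula f_k = fMin * 2^(k / binsPerOctave). The helper name CenterFrequencies is hypothetical and not part of AiDotNet; the library's Frequencies property may be computed differently.

```csharp
using System;

// Hypothetical helper (not part of AiDotNet): geometric center frequencies,
// f_k = fMin * 2^(k / binsPerOctave).
static double[] CenterFrequencies(double fMin, int binsPerOctave, int numOctaves)
{
    int count = binsPerOctave * numOctaves;
    var result = new double[count];
    for (int k = 0; k < count; k++)
    {
        result[k] = fMin * Math.Pow(2.0, (double)k / binsPerOctave);
    }
    return result;
}

double[] centers = CenterFrequencies(32.7, 12, 7);

// Adjacent bins share one ratio; bins one octave apart differ by a factor of two.
Console.WriteLine($"bin 0:  {centers[0]:F2} Hz");
Console.WriteLine($"bin 1:  {centers[1]:F2} Hz (ratio {centers[1] / centers[0]:F4})");
Console.WriteLine($"bin 12: {centers[12]:F2} Hz (ratio {centers[12] / centers[0]:F4})");
```

With 12 bins per octave the step between neighboring bins is one semitone, which is why the default configuration lines up with the notes of the equal-tempered scale.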

For Beginners: The CQT is well suited to music analysis because:

  • Musical notes are logarithmically spaced (each octave doubles frequency)
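  • Low notes get narrow bins (in Hz) and high notes get wide ones, so every bin spans the same musical interval (matches perception)
  • Chord and key detection become simpler than with a linearly spaced FFT, because each bin can line up with one note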

Usage:

var cqt = new ConstantQTransform<float>(
    sampleRate: 22050,
    fMin: 32.7, // C1
    binsPerOctave: 12, // Semitones
    numOctaves: 7);

var audio = LoadAudio("music.wav"); // LoadAudio is a placeholder for any code that yields a Tensor<float> waveform
var cqtSpectrum = cqt.Transform(audio);
// cqtSpectrum has shape [time_frames, num_bins] where num_bins = 12 * 7 = 84
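Before constructing the transform it can be useful to check that the chosen range fits below the Nyquist frequency. The following is a minimal arithmetic sketch for the configuration above. It assumes the standard geometric bin spacing and does not call into AiDotNet; whether the library validates this itself is not stated here.

```csharp
using System;

int sampleRate = 22050;
double fMin = 32.7;        // C1
int binsPerOctave = 12;
int numOctaves = 7;

// Total bin count, as in the usage example: 12 * 7 = 84.
int numBins = binsPerOctave * numOctaves;

// Center frequency of the highest bin under geometric spacing (assumption).
double topFrequency = fMin * Math.Pow(2.0, (double)(numBins - 1) / binsPerOctave);
double nyquist = sampleRate / 2.0;

Console.WriteLine($"bins: {numBins}, top bin: {topFrequency:F1} Hz, Nyquist: {nyquist:F1} Hz");
if (topFrequency >= nyquist)
{
    Console.WriteLine("Top bin is at or above Nyquist: lower numOctaves or raise sampleRate.");
}
```

For these defaults the top bin sits near 3.95 kHz, comfortably under the 11.025 kHz Nyquist limit.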

Constructors

ConstantQTransform(int, double, int, int, int, WindowType)

Creates a new Constant-Q Transform instance.

public ConstantQTransform(int sampleRate = 22050, double fMin = 32.7, int binsPerOctave = 12, int numOctaves = 7, int hopLength = 512, WindowType windowType = WindowType.Hann)

Parameters

sampleRate int

Audio sample rate in Hz.

fMin double

Minimum frequency in Hz (default: C1 = 32.7 Hz).

binsPerOctave int

Number of bins per octave (default: 12 for semitones).

numOctaves int

Number of octaves to cover (default: 7).

hopLength int

Hop length in samples between successive frames (default: 512).

windowType WindowType

Window type for analysis (default: Hann).
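To relate hopLength and sampleRate to time, the sketch below converts a frame index to seconds. It assumes frames are spaced exactly hopLength samples apart starting at sample 0; the library's padding and centering conventions are not documented here, and FrameTimeSeconds is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: start time of a frame, assuming frame i begins at sample i * hopLength.
static double FrameTimeSeconds(int frameIndex, int hopLength, int sampleRate)
    => (double)frameIndex * hopLength / sampleRate;

double hopSeconds = FrameTimeSeconds(1, 512, 22050);
Console.WriteLine($"hop: {hopSeconds * 1000.0:F1} ms, frames per second: {1.0 / hopSeconds:F1}");
```

A smaller hop gives finer time resolution at the cost of more frames to compute and store.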

Properties

Frequencies

Gets the center frequencies for each bin.

public double[] Frequencies { get; }

Property Value

double[]

NumBins

Gets the number of frequency bins in the CQT output.

public int NumBins { get; }

Property Value

int

QFactor

Gets the Q factor (quality factor) for this CQT: the ratio of a bin's center frequency to its bandwidth, which is the same for every bin.

public double QFactor { get; }

Property Value

double
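As an illustration of what a Q factor implies, the sketch below uses the textbook definition Q = 1 / (2^(1/binsPerOctave) - 1). This is an assumption: the value returned by QFactor may include an additional scaling. All helper names are hypothetical.

```csharp
using System;

// Textbook definition (assumed, not confirmed for this library).
static double ConstantQ(int binsPerOctave)
    => 1.0 / (Math.Pow(2.0, 1.0 / binsPerOctave) - 1.0);

// Bandwidth in Hz of a bin centered at `frequency`.
static double Bandwidth(double frequency, double q) => frequency / q;

// Window length in samples needed to resolve that bandwidth.
static int WindowLength(double frequency, double q, int sampleRate)
    => (int)Math.Ceiling(q * sampleRate / frequency);

double qFactor = ConstantQ(12);
Console.WriteLine($"Q for 12 bins per octave: {qFactor:F2}");

foreach (double hz in new[] { 32.7, 440.0, 3520.0 })
{
    Console.WriteLine(
        $"{hz,7:F1} Hz: bandwidth {Bandwidth(hz, qFactor):F2} Hz, " +
        $"window {WindowLength(hz, qFactor, 22050)} samples");
}
```

Because Q is fixed, low bins end up with narrow bandwidths and long analysis windows, while high bins get wide bandwidths and short windows.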

SampleRate

Gets the audio sample rate in Hz.

public int SampleRate { get; }

Property Value

int

Methods

GetMidiNote(int)

Gets the MIDI note number for a given bin index.

public double GetMidiNote(int binIndex)

Parameters

binIndex int

The CQT bin index.

Returns

double

The corresponding MIDI note number.
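A bin-to-MIDI mapping can be sketched with the usual convention that A4 = 440 Hz is MIDI note 69. The sketch assumes geometric bin spacing from fMin; BinToMidi is a hypothetical helper and the library's own mapping may differ in tuning reference or rounding.

```csharp
using System;

// Hypothetical helper: fractional MIDI note for a bin, assuming
// f_k = fMin * 2^(k / binsPerOctave) and A4 = 440 Hz = MIDI 69.
static double BinToMidi(int binIndex, double fMin, int binsPerOctave)
{
    double frequency = fMin * Math.Pow(2.0, (double)binIndex / binsPerOctave);
    return 69.0 + 12.0 * Math.Log(frequency / 440.0, 2.0);
}

// With fMin = 32.7 Hz (C1) and 12 bins per octave, each bin is one semitone.
Console.WriteLine($"bin 0:  MIDI {BinToMidi(0, 32.7, 12):F2}");
Console.WriteLine($"bin 12: MIDI {BinToMidi(12, 32.7, 12):F2}");
```

The result is a double rather than an int because bins need not fall exactly on semitones, for example when binsPerOctave is 24 or fMin is not a note frequency.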

GetNoteName(int)

Gets the note name for a given bin index.

public string GetNoteName(int binIndex)

Parameters

binIndex int

The CQT bin index.

Returns

string

The note name (e.g., "C4", "A#3").
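Note names in the format shown above can be derived from a MIDI note number. The sketch assumes sharps for accidentals and the convention that MIDI 60 is C4; MidiToNoteName is a hypothetical helper, not the library's implementation.

```csharp
using System;

// Hypothetical helper: nearest note name for a (possibly fractional) MIDI note.
// Assumes MIDI 60 = C4 and sharp spellings.
static string MidiToNoteName(double midiNote)
{
    string[] names = { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };
    int rounded = (int)Math.Round(midiNote);
    int pitchClass = ((rounded % 12) + 12) % 12;        // keep the index non-negative
    int octave = (int)Math.Floor(rounded / 12.0) - 1;   // MIDI 0..11 is octave -1
    return names[pitchClass] + octave;
}

foreach (double midi in new[] { 24.0, 58.0, 60.0, 69.0 })
{
    Console.WriteLine($"MIDI {midi}: {MidiToNoteName(midi)}");
}
```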

Transform(Tensor<T>)

Computes the Constant-Q Transform of an audio signal.

public Tensor<T> Transform(Tensor<T> audio)

Parameters

audio Tensor<T>

Input audio waveform.

Returns

Tensor<T>

CQT magnitude spectrogram with shape [time_frames, num_bins].
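To show what one row of such a spectrogram represents, here is a naive direct-evaluation sketch of a single CQT frame on plain double arrays. It is not AiDotNet's implementation, which may use a faster kernel-based method. The helpers, the Hann window choice, and the Q definition are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper: magnitude of one CQT bin for the samples beginning at `start`.
static double CqtBinMagnitude(double[] signal, int start, double frequency, double q, int sampleRate)
{
    // Lower frequencies get longer windows, which keeps Q constant.
    int windowLength = (int)Math.Ceiling(q * sampleRate / frequency);
    double re = 0.0, im = 0.0;
    for (int n = 0; n < windowLength && start + n < signal.Length; n++)
    {
        double hann = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / windowLength);
        double angle = 2.0 * Math.PI * frequency * n / sampleRate;
        re += signal[start + n] * hann * Math.Cos(angle);
        im -= signal[start + n] * hann * Math.Sin(angle);
    }
    return Math.Sqrt(re * re + im * im) / windowLength;
}

// Hypothetical helper: magnitudes of all bins for one frame.
static double[] CqtFrame(double[] signal, int start, double fMin, int binsPerOctave, int numBins, int sampleRate)
{
    double q = 1.0 / (Math.Pow(2.0, 1.0 / binsPerOctave) - 1.0);
    var magnitudes = new double[numBins];
    for (int k = 0; k < numBins; k++)
    {
        double frequency = fMin * Math.Pow(2.0, (double)k / binsPerOctave);
        magnitudes[k] = CqtBinMagnitude(signal, start, frequency, q, sampleRate);
    }
    return magnitudes;
}

// One second of a 440 Hz sine (A4) sampled at 22050 Hz.
const int rate = 22050;
var sine = new double[rate];
for (int i = 0; i < sine.Length; i++)
{
    sine[i] = Math.Sin(2.0 * Math.PI * 440.0 * i / rate);
}

// fMin = 55 Hz (A1), so 440 Hz sits three octaves up, at bin 3 * 12 = 36.
double[] frame = CqtFrame(sine, 0, 55.0, 12, 60, rate);
int peakBin = 0;
for (int k = 1; k < frame.Length; k++)
{
    if (frame[k] > frame[peakBin]) peakBin = k;
}
Console.WriteLine($"strongest bin: {peakBin}");
```

Repeating CqtFrame at start offsets of 0, hopLength, 2 * hopLength, and so on would yield the [time_frames, num_bins] layout described above. The direct evaluation costs one full correlation per bin per frame, so it is only suitable for small experiments.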

TransformComplex(Tensor<T>)

Computes the complex CQT (with phase information).

public Tensor<T> TransformComplex(Tensor<T> audio)

Parameters

audio Tensor<T>

Input audio waveform.

Returns

Tensor<T>

Complex CQT with shape [time_frames, num_bins, 2] where last dim is [real, imag].
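Given the [real, imag] layout of the last dimension, magnitude and phase follow from the usual polar conversion. The sketch below uses a plain double[,,] array as a stand-in for the returned tensor, since tensor indexing is not described here; ToPolar is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: polar form of one complex coefficient.
static (double Magnitude, double Phase) ToPolar(double real, double imag)
    => (Math.Sqrt(real * real + imag * imag), Math.Atan2(imag, real));

// Stand-in data with shape [1 frame, 2 bins, 2]: last axis is [real, imag].
double[,,] complexCqt = { { { 3.0, 4.0 }, { 0.0, -1.0 } } };

for (int t = 0; t < complexCqt.GetLength(0); t++)
{
    for (int b = 0; b < complexCqt.GetLength(1); b++)
    {
        var (magnitude, phase) = ToPolar(complexCqt[t, b, 0], complexCqt[t, b, 1]);
        Console.WriteLine($"frame {t}, bin {b}: magnitude {magnitude:F3}, phase {phase:F3} rad");
    }
}
```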

TransformDb(Tensor<T>, double, double)

Computes CQT in decibels (log scale).

public Tensor<T> TransformDb(Tensor<T> audio, double refValue = 1, double minDb = -80)

Parameters

audio Tensor<T>

Input audio waveform.

refValue double

Reference value for dB conversion; a value equal to refValue maps to 0 dB.

minDb double

Minimum dB value (floor).

Returns

Tensor<T>

CQT spectrogram in decibels.
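The roles of refValue and minDb can be illustrated with a generic decibel conversion. This sketch assumes an amplitude convention (20 * log10) and an absolute floor at minDb. The library might instead use 10 * log10 on power values or a floor relative to the maximum, so treat ToDecibels as a hypothetical helper only.

```csharp
using System;

// Hypothetical helper: amplitude-to-dB conversion with a hard floor.
static double ToDecibels(double magnitude, double refValue = 1.0, double minDb = -80.0)
{
    if (magnitude <= 0.0) return minDb;              // log10 is undefined at or below zero
    double db = 20.0 * Math.Log10(magnitude / refValue);
    return Math.Max(db, minDb);                      // clamp very quiet values to the floor
}

foreach (double magnitude in new[] { 1.0, 0.1, 1e-10, 0.0 })
{
    Console.WriteLine($"{magnitude}: {ToDecibels(magnitude):F1} dB");
}
```

The floor keeps silent or near-silent bins from producing negative infinity or extreme values that would dominate a plot or a downstream model's input range.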

TransformPower(Tensor<T>, double)

Computes the CQT magnitude raised to the given power (magnitude squared, the power spectrum, by default).

public Tensor<T> TransformPower(Tensor<T> audio, double power = 2)

Parameters

audio Tensor<T>

Input audio waveform.

power double

Power exponent (default: 2.0 for power spectrum).

Returns

Tensor<T>

Power CQT spectrogram.
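The effect of the power parameter can be sketched as an element-wise exponent applied to magnitudes. ApplyPower is a hypothetical helper working on a plain array, not the library's tensor code.

```csharp
using System;

// Hypothetical helper: raise each magnitude to `power` (2.0 gives the power spectrum).
static double[] ApplyPower(double[] magnitudes, double power = 2.0)
{
    var result = new double[magnitudes.Length];
    for (int i = 0; i < magnitudes.Length; i++)
    {
        result[i] = Math.Pow(magnitudes[i], power);
    }
    return result;
}

double[] sampleMagnitudes = { 0.5, 1.0, 3.0 };
double[] powerSpectrum = ApplyPower(sampleMagnitudes);        // squares each value
double[] unchanged = ApplyPower(sampleMagnitudes, 1.0);       // power 1 keeps magnitudes
Console.WriteLine(string.Join(", ", powerSpectrum));
```

Under this reading, a power of 1 reproduces the plain magnitude spectrogram, and values between 0 and 1 compress the dynamic range.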