Class AudioLDMOptions

Namespace: AiDotNet.Audio.AudioLDM

Assembly: AiDotNet.dll

Configuration options for AudioLDM text-to-audio generation.

public class AudioLDMOptions

Inheritance: object

AudioLDMOptions

Inherited Members: object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

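AudioLDM is a latent diffusion model for text-to-audio generation. Instead of denoising raw audio, it runs diffusion in a compressed latent space learned by a variational autoencoder (VAE), which makes generation efficient while maintaining high audio quality. A CLAP text encoder embeds the prompt, a U-Net denoises the latent under that conditioning, the VAE decoder turns the latent into a mel spectrogram, and a HiFi-GAN vocoder converts the spectrogram into a waveform.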

For Beginners: AudioLDM generates realistic audio from descriptions:

Example prompts:

"A dog barking followed by children laughing"
"Rain falling on a tin roof with distant thunder"
"Footsteps on gravel approaching and stopping"
"Piano music in a concert hall with audience applause"

Tips for good prompts:

Be specific about the sound source and environment
Include temporal information (before, after, while)
Mention acoustic properties (loud, soft, distant, echoing)

Properties

ClapEmbeddingDim

Gets or sets the CLAP embedding dimension.

public int ClapEmbeddingDim { get; set; }

Property Value

int

ClapEncoderPath

Gets or sets the path to the CLAP text encoder ONNX model.

public string? ClapEncoderPath { get; set; }

Property Value

string

DropoutRate

Gets or sets the dropout rate for training.

public double DropoutRate { get; set; }

Property Value

double

DurationSeconds

Gets or sets the default duration of generated audio in seconds.

public double DurationSeconds { get; set; }

Property Value

double

GuidanceScale

Gets or sets the classifier-free guidance scale.

public double GuidanceScale { get; set; }

Property Value

double

Remarks

Controls how closely the model follows the text prompt:

Low (1.0-2.0): More variation, less prompt adherence
Default (2.5): Good balance
High (4.0-7.0): Stricter prompt following
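Classifier-free guidance is conventionally applied at each denoising step by running the U-Net twice, once with the text embedding and once with an empty (unconditional) embedding, then extrapolating from the unconditional prediction toward the conditional one. The sketch below applies that standard formula to plain arrays. The helper `ApplyGuidance` is hypothetical and not part of the AiDotNet API, and it is an assumption that AudioLDMOptions.GuidanceScale is consumed exactly this way internally.

```csharp
using System;

// Hypothetical helper (not part of AiDotNet): standard classifier-free guidance,
// guided = uncond + scale * (cond - uncond), applied element-wise to the
// U-Net's noise predictions for one denoising step.
float[] ApplyGuidance(float[] uncond, float[] cond, double scale)
{
    if (uncond.Length != cond.Length)
        throw new ArgumentException("Predictions must have the same length.");

    var guided = new float[uncond.Length];
    for (int i = 0; i < guided.Length; i++)
    {
        guided[i] = (float)(uncond[i] + scale * (cond[i] - uncond[i]));
    }
    return guided;
}

// Toy noise predictions for a 3-element latent.
float[] uncondPred = { 0.0f, 1.0f, -1.0f };
float[] condPred = { 1.0f, 1.0f, 0.0f };

// scale = 1.0 reproduces the conditional prediction; larger values push
// further along the (cond - uncond) direction, i.e. toward the prompt.
float[] atOne = ApplyGuidance(uncondPred, condPred, 1.0);
float[] atDefault = ApplyGuidance(uncondPred, condPred, 2.5);

Console.WriteLine(string.Join(", ", atDefault));
```

Where the two predictions agree (the middle element), the scale has no effect; guidance only amplifies the difference the prompt makes.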

HopLength

Gets or sets the hop length for spectrogram computation.

public int HopLength { get; set; }

Property Value

int

Remarks

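Controls the time resolution of the spectrogram: the hop length is the number of samples between the starts of consecutive STFT frames. Smaller values give finer time resolution but produce more frames and require more compute.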
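The relationship between hop length, frame spacing, and frame count can be sketched as follows. The values 16 000 Hz and 160 samples are assumptions chosen because they are commonly used with AudioLDM, not necessarily this class's defaults, and the frame-count formula assumes centered STFT framing (1 + samples / hop).

```csharp
using System;

// Time between the starts of consecutive spectrogram frames, in milliseconds.
double FrameSpacingMs(int sampleRate, int hopLength) => 1000.0 * hopLength / sampleRate;

// Approximate frame count for a clip, assuming centered STFT framing
// (one frame per hop, plus one), a convention used by many audio libraries.
int FrameCount(double durationSeconds, int sampleRate, int hopLength)
{
    int numSamples = (int)Math.Round(durationSeconds * sampleRate);
    return 1 + numSamples / hopLength;
}

// Assumed values for illustration; not necessarily AudioLDMOptions defaults.
int rate = 16000;
int hop = 160;

double spacingMs = FrameSpacingMs(rate, hop);         // 10 ms between frames
int frames = FrameCount(10.0, rate, hop);             // frames for a 10 s clip
int framesHalfHop = FrameCount(10.0, rate, hop / 2);  // halving the hop roughly doubles the frames

Console.WriteLine($"{spacingMs} ms per frame; 10 s -> {frames} frames (hop {hop}), {framesHalfHop} frames (hop {hop / 2})");
```

Every extra frame enlarges the spectrogram, and therefore the latent, that the model must process, which is where the additional compute comes from.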

LatentDimension

Gets or sets the VAE latent dimension.

public int LatentDimension { get; set; }

Property Value

int

Remarks

The dimension (number of channels) of the compressed latent representation produced by the VAE. The default of 8 matches the standard AudioLDM architecture.

LatentDownsampleFactor

Gets or sets the latent downsampling factor.

public int LatentDownsampleFactor { get; set; }

Property Value

int

Remarks

The factor by which the VAE downsamples the mel spectrogram along both the time and frequency axes. The default of 4 provides a good trade-off between compression and quality.
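Together with LatentDimension, the downsampling factor determines the shape of the latent that the U-Net denoises. The sketch below assumes the AudioLDM convention that a mel spectrogram of T frames by F mel bins maps to a latent of LatentDimension channels by T/r by F/r, with T and F divisible by r. The helper `LatentShape` is hypothetical, and 1024 frames (10.24 s at an assumed 10 ms hop) is an illustrative input size.

```csharp
using System;

// Hypothetical helper (not part of AiDotNet): latent shape (channels, time, frequency)
// for a mel spectrogram, assuming the VAE downsamples both axes by the same factor.
(int Channels, int Time, int Freq) LatentShape(int frames, int melBins, int latentDim, int downsample)
{
    if (frames % downsample != 0 || melBins % downsample != 0)
        throw new ArgumentException("Spectrogram size must be divisible by the downsample factor.");
    return (latentDim, frames / downsample, melBins / downsample);
}

var shape = LatentShape(frames: 1024, melBins: 64, latentDim: 8, downsample: 4);

int spectrogramValues = 1024 * 64;                            // 65 536 values
int latentValues = shape.Channels * shape.Time * shape.Freq;  // 8 * 256 * 16 values

Console.WriteLine($"latent {shape.Channels}x{shape.Time}x{shape.Freq}, {spectrogramValues / (double)latentValues}x fewer values than the spectrogram");
```

Note that the channel count partly offsets the spatial compression: a factor of 4 on each axis shrinks the grid 16-fold, but 8 latent channels leave a net reduction of 2x in raw values.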

MaxDurationSeconds

Gets or sets the maximum duration in seconds.

public double MaxDurationSeconds { get; set; }

Property Value

double

Remarks

AudioLDM can generate up to 30 seconds of audio. Longer durations require more memory and compute time.
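Because cost grows with duration, a caller may want to validate a requested duration before starting generation. The sketch below is a hypothetical caller-side check, not documented AiDotNet behavior: it rejects durations that are non-positive or above the maximum and reports how many samples a mono clip of that length contains. The 16 000 Hz sample rate is an assumed value.

```csharp
using System;

// Hypothetical caller-side check; how AiDotNet itself handles out-of-range
// durations (clamping, throwing, or otherwise) is not documented on this page.
long OutputSamples(double durationSeconds, double maxDurationSeconds, int rate)
{
    if (durationSeconds <= 0 || durationSeconds > maxDurationSeconds)
        throw new ArgumentOutOfRangeException(nameof(durationSeconds),
            $"Duration must be greater than 0 and at most {maxDurationSeconds} seconds.");
    return (long)Math.Round(durationSeconds * rate);
}

long samples = OutputSamples(30.0, 30.0, 16000);  // longest allowed clip

bool rejected = false;
try { OutputSamples(45.0, 30.0, 16000); }
catch (ArgumentOutOfRangeException) { rejected = true; }

Console.WriteLine($"30 s -> {samples} samples; 45 s rejected: {rejected}");
```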

MaxTextLength

Gets or sets the maximum text sequence length.

public int MaxTextLength { get; set; }

Property Value

int

ModelSize

Gets or sets the model size variant.

public AudioLDMModelSize ModelSize { get; set; }

Property Value

AudioLDMModelSize

Remarks

Different sizes trade off quality against speed: larger variants generally produce higher-quality audio but generate more slowly. The default, Base, balances the two.

NumInferenceSteps

Gets or sets the number of diffusion steps.

public int NumInferenceSteps { get; set; }

Property Value

int

Remarks

More steps give higher quality but slower generation:

25 steps: Fast, lower quality
50 steps: Good balance (default)
100+ steps: Best quality, slow
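Diffusion models are typically trained with a fixed number of noise levels (often 1000) and sampled with a smaller set of evenly spaced timesteps, as in DDIM. The sketch below assumes that scheme to illustrate what NumInferenceSteps controls. It is not necessarily the scheduler AiDotNet uses; the helper name and the 1000 training steps are assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical helper: evenly spaced timesteps from high noise to low noise, DDIM-style.
// Assumes the model was trained with numTrainSteps noise levels.
int[] InferenceTimesteps(int numInferenceSteps, int numTrainSteps = 1000)
{
    if (numInferenceSteps <= 0 || numInferenceSteps > numTrainSteps)
        throw new ArgumentOutOfRangeException(nameof(numInferenceSteps));
    int stride = numTrainSteps / numInferenceSteps;
    return Enumerable.Range(0, numInferenceSteps)
        .Select(i => i * stride)
        .Reverse()
        .ToArray();
}

int[] fast = InferenceTimesteps(25);
int[] balanced = InferenceTimesteps(50);

// Each step typically runs the U-Net (twice with classifier-free guidance),
// so generation time grows roughly linearly with the step count.
Console.WriteLine($"25 steps: {fast[0]}..{fast[^1]} (stride {fast[0] - fast[1]})");
Console.WriteLine($"50 steps: {balanced[0]}..{balanced[^1]} (stride {balanced[0] - balanced[1]})");
```

Fewer steps mean larger jumps between noise levels, which is why quality tends to drop when the step count is reduced.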

NumMelBins

Gets or sets the number of mel spectrogram bins.

public int NumMelBins { get; set; }

Property Value

int

Remarks

Standard AudioLDM uses 64 mel bins. Higher values capture more spectral detail.
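Mel filters are spaced evenly on the mel scale, which is roughly linear below about 1 kHz and logarithmic above, so adding bins mainly narrows each band. The sketch below computes filter band edges with the HTK mel formula, m = 2595 log10(1 + f / 700). Whether AiDotNet uses the HTK or the Slaney mel variant is an assumption, as is the 8 kHz upper limit (the Nyquist frequency at 16 kHz); the helper names are hypothetical.

```csharp
using System;

// HTK-style mel conversions (assumed variant).
double HzToMel(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);
double MelToHz(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

// Hypothetical helper: edges for numMelBins triangular filters, i.e.
// numMelBins + 2 points spaced evenly in mel between fMin and fMax.
double[] MelBandEdges(int numMelBins, double fMin, double fMax)
{
    double melMin = HzToMel(fMin), melMax = HzToMel(fMax);
    var edges = new double[numMelBins + 2];
    for (int i = 0; i < edges.Length; i++)
    {
        double mel = melMin + (melMax - melMin) * i / (numMelBins + 1);
        edges[i] = MelToHz(mel);
    }
    return edges;
}

double[] edges64 = MelBandEdges(64, 0.0, 8000.0);
double[] edges128 = MelBandEdges(128, 0.0, 8000.0);

// Spacing between the first two edges: doubling the bin count roughly halves it,
// i.e. finer spectral detail in each band.
Console.WriteLine($"64 bins: first edge spacing {edges64[1] - edges64[0]:F1} Hz");
Console.WriteLine($"128 bins: first edge spacing {edges128[1] - edges128[0]:F1} Hz");
```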

OnnxOptions

Gets or sets the ONNX execution options.

public OnnxModelOptions OnnxOptions { get; set; }

Property Value

OnnxModelOptions

SampleRate

Gets or sets the output sample rate in Hz.

public int SampleRate { get; set; }

Property Value

int

Remarks

AudioLDM uses 16 kHz by default, which suits speech and environmental sounds. It can be set to 48 kHz for higher-quality music generation.
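A waveform sampled at rate fs can only represent frequencies up to fs / 2 (the Nyquist frequency), which is why a higher rate helps material such as music with energy above 8 kHz, at the cost of more samples per second. The sketch below simply evaluates those relationships for the two rates mentioned above; it does not call AiDotNet, and the helper names are hypothetical.

```csharp
using System;

// Highest frequency a signal sampled at `rate` Hz can represent.
double NyquistHz(int rate) => rate / 2.0;

// Samples in one channel of a clip of the given duration.
long SampleCount(double durationSeconds, int rate) => (long)Math.Round(durationSeconds * rate);

foreach (int candidate in new[] { 16000, 48000 })
{
    Console.WriteLine($"{candidate} Hz: content up to {NyquistHz(candidate)} Hz, {SampleCount(10.0, candidate)} samples per channel for 10 s");
}

double nyquist16k = NyquistHz(16000);          // 8000 Hz
long samples48k = SampleCount(10.0, 48000);
```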

Seed

Gets or sets the random seed for reproducibility.

public int? Seed { get; set; }

Property Value

int?

Remarks

Set to a specific value to generate the same audio each time; set to null for random generation.
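The seed matters because diffusion starts from pure Gaussian noise in the latent space. The sketch below shows the general idea with `System.Random` and a Box-Muller transform; AiDotNet's actual random number generator and noise sampling are not documented here, so the helper `InitialNoise` and the 8 x 256 x 16 buffer size are assumptions. Note that .NET does not guarantee the same `System.Random` sequence for a given seed across major runtime versions.

```csharp
using System;
using System.Linq;

// Hypothetical helper: fill a latent-sized buffer with standard normal noise.
// A fixed seed gives the same noise (and, with identical settings and models,
// the same audio); a null seed draws a fresh random starting point.
float[] InitialNoise(int length, int? seed)
{
    var rng = seed.HasValue ? new Random(seed.Value) : new Random();
    var noise = new float[length];
    for (int i = 0; i < length; i++)
    {
        // Box-Muller transform: two uniform samples -> one standard normal sample.
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids Log(0)
        double u2 = rng.NextDouble();
        noise[i] = (float)(Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2));
    }
    return noise;
}

float[] a = InitialNoise(8 * 256 * 16, seed: 42);
float[] b = InitialNoise(8 * 256 * 16, seed: 42);
float[] c = InitialNoise(8 * 256 * 16, seed: null); // different on each run

Console.WriteLine($"same seed gives identical noise: {a.SequenceEqual(b)}");
```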

Stereo

Gets or sets whether to generate stereo audio.

public bool Stereo { get; set; }

Property Value

bool

Remarks

When true, produces two-channel output. The model generates mono audio, which is duplicated into both channels for compatibility with stereo playback.
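Duplicating a mono signal into two channels can be sketched as below. The interleaved left/right layout and the helper name are assumptions for illustration, since this page does not document how AiDotNet lays out multi-channel output.

```csharp
using System;

// Hypothetical helper: copy each mono sample into both channels of an
// interleaved buffer (L, R, L, R, ...).
float[] MonoToInterleavedStereo(float[] samples)
{
    var output = new float[samples.Length * 2];
    for (int i = 0; i < samples.Length; i++)
    {
        output[2 * i] = samples[i];     // left
        output[2 * i + 1] = samples[i]; // right
    }
    return output;
}

float[] mono = { 0.1f, -0.2f, 0.3f };
float[] stereo = MonoToInterleavedStereo(mono);

Console.WriteLine(string.Join(" ", stereo));
```

Because both channels carry identical samples, the result sounds centered: it doubles the output size without adding any spatial information.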

UNetPath

Gets or sets the path to the U-Net denoiser ONNX model.

public string? UNetPath { get; set; }

Property Value

string

VaePath

Gets or sets the path to the VAE ONNX model.

public string? VaePath { get; set; }

Property Value

string

VocoderPath

Gets or sets the path to the HiFi-GAN vocoder ONNX model.

public string? VocoderPath { get; set; }

Property Value

string

WindowSize

Gets or sets the FFT window size.

public int WindowSize { get; set; }

Property Value

int
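The window size is the number of samples in each FFT frame. It sets the spectrogram's frequency resolution (sample rate / window size hertz between adjacent FFT bins) and, together with the hop length, how much consecutive frames overlap. The sketch below assumes a periodic Hann window, a common STFT choice; the window type and the 1024-sample window, 160-sample hop, and 16 000 Hz rate are assumed values, not documented defaults.

```csharp
using System;

// Periodic Hann window of the given size, a common choice for STFT analysis.
double[] HannWindow(int size)
{
    var w = new double[size];
    for (int n = 0; n < size; n++)
        w[n] = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / size);
    return w;
}

int windowSize = 1024; // assumed value
int sr = 16000;        // assumed sample rate
int hop = 160;         // assumed hop length

double binSpacingHz = (double)sr / windowSize;    // 15.625 Hz between FFT bins
int fftBins = windowSize / 2 + 1;                 // non-negative frequency bins
double overlap = 1.0 - (double)hop / windowSize;  // fraction shared by adjacent frames
double[] window = HannWindow(windowSize);

Console.WriteLine($"{fftBins} bins, {binSpacingHz} Hz apart, {overlap:P1} frame overlap");
```

A larger window sharpens frequency resolution but blurs timing within each frame, which is the usual STFT time/frequency trade-off.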
