Class AudioLDMOptions
Configuration options for AudioLDM text-to-audio generation.
public class AudioLDMOptions
- Inheritance: AudioLDMOptions
Remarks
AudioLDM is a latent diffusion model for text-to-audio generation. It operates in a compressed latent space learned by a VAE, making generation efficient while maintaining high audio quality.
For Beginners: AudioLDM generates realistic audio from text descriptions.
Example prompts:
- "A dog barking followed by children laughing"
- "Rain falling on a tin roof with distant thunder"
- "Footsteps on gravel approaching and stopping"
- "Piano music in a concert hall with audience applause"
Tips for good prompts:
- Be specific about the sound source and environment
- Include temporal information (before, after, while)
- Mention acoustic properties (loud, soft, distant, echoing)
Properties
ClapEmbeddingDim
Gets or sets the CLAP embedding dimension.
public int ClapEmbeddingDim { get; set; }
Property Value
- int
ClapEncoderPath
Gets or sets the path to the CLAP text encoder ONNX model.
public string? ClapEncoderPath { get; set; }
Property Value
- string?
DropoutRate
Gets or sets the dropout rate for training.
public double DropoutRate { get; set; }
Property Value
- double
DurationSeconds
Gets or sets the default duration of generated audio in seconds.
public double DurationSeconds { get; set; }
Property Value
- double
GuidanceScale
Gets or sets the classifier-free guidance scale.
public double GuidanceScale { get; set; }
Property Value
- double
Remarks
Controls how closely the model follows the text prompt:
- Low (1.0-2.0): more variation, less prompt adherence
- Default (2.5): good balance
- High (4.0-7.0): stricter prompt following
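A minimal sketch of the standard classifier-free guidance combination, assuming the denoiser is run once with the text embedding and once unconditionally, and that both noise predictions are flattened to float arrays. `GuidanceSketch.ApplyGuidance` is a hypothetical helper, not part of this class's API.

```csharp
using System;

// Hypothetical helper: combines conditional and unconditional noise predictions
// with the classifier-free guidance formula eps = eps_uncond + s * (eps_cond - eps_uncond).
public static class GuidanceSketch
{
    public static float[] ApplyGuidance(float[] uncond, float[] cond, double guidanceScale)
    {
        if (uncond.Length != cond.Length)
            throw new ArgumentException("Predictions must have the same length.");

        var result = new float[cond.Length];
        for (int i = 0; i < cond.Length; i++)
        {
            // Extrapolate from the unconditional prediction toward the conditional one.
            result[i] = (float)(uncond[i] + guidanceScale * (cond[i] - uncond[i]));
        }
        return result;
    }
}
```

With a scale of 1.0 the result equals the conditional prediction; larger values push further away from the unconditional one, which is why high settings follow the prompt more strictly.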
HopLength
Gets or sets the hop length for spectrogram computation.
public int HopLength { get; set; }
Property Value
- int
Remarks
Controls the time resolution of the spectrogram. Smaller values give finer time resolution but produce more frames, and therefore require more compute.
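The arithmetic below shows how HopLength sets the number of spectrogram frames. It is a sketch assuming centered STFT framing (the signal is padded by half a window on each side, as with `center=True` in librosa or PyTorch); the 16 kHz and 160-sample values in the test are illustrative, not this class's documented defaults, and `FrameMath` is a hypothetical helper.

```csharp
using System;

public static class FrameMath
{
    // Frame count for numSamples samples under centered framing:
    // padding by windowSize / 2 on both sides leaves 1 + numSamples / hopLength frames.
    public static int CenteredFrameCount(int numSamples, int hopLength)
    {
        if (hopLength <= 0) throw new ArgumentOutOfRangeException(nameof(hopLength));
        return 1 + numSamples / hopLength;
    }

    // Spectrogram frames produced per second of audio.
    public static double FramesPerSecond(int sampleRate, int hopLength) =>
        (double)sampleRate / hopLength;
}
```

At 16 kHz a hop of 160 samples gives 100 frames per second, so a 10-second clip yields 1001 frames; halving the hop roughly doubles the frame count and the work downstream.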
LatentDimension
Gets or sets the VAE latent dimension.
public int LatentDimension { get; set; }
Property Value
- int
Remarks
The number of channels in the compressed (latent) audio representation. The default of 8 matches the standard AudioLDM architecture.
LatentDownsampleFactor
Gets or sets the latent downsampling factor.
public int LatentDownsampleFactor { get; set; }
Property Value
- int
Remarks
How much the VAE compresses the spectrogram along its time and frequency axes. The default of 4 provides a good trade-off between compression and quality.
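The latent tensor shape follows from LatentDimension and LatentDownsampleFactor. This sketch assumes the VAE downsamples the time and mel axes by the same factor and that the spectrogram is already padded to a multiple of it; `LatentShapeSketch.LatentShape` and the example sizes are hypothetical.

```csharp
using System;

public static class LatentShapeSketch
{
    // Returns (channels, latent frames, latent mel bins) for a melFrames x numMelBins
    // spectrogram, assuming both axes are downsampled by the same factor.
    public static (int Channels, int Frames, int MelBins) LatentShape(
        int melFrames, int numMelBins, int latentDimension, int downsampleFactor)
    {
        if (downsampleFactor <= 0)
            throw new ArgumentOutOfRangeException(nameof(downsampleFactor));
        if (melFrames % downsampleFactor != 0 || numMelBins % downsampleFactor != 0)
            throw new ArgumentException("Spectrogram size must be divisible by the downsample factor.");

        return (latentDimension, melFrames / downsampleFactor, numMelBins / downsampleFactor);
    }
}
```

For a 1024-frame, 64-bin spectrogram with the defaults of 8 and 4, the latent is 8 x 256 x 16, which holds half as many values as the input spectrogram.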
MaxDurationSeconds
Gets or sets the maximum duration of generated audio in seconds.
public double MaxDurationSeconds { get; set; }
Property Value
- double
Remarks
AudioLDM can generate up to 30 seconds of audio. Longer durations require more memory and compute time.
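One way a caller might enforce MaxDurationSeconds is to cap the requested duration before generation. Whether the library clamps or rejects over-long requests is not stated here, so the clamping behavior and the `DurationSketch.EffectiveDuration` helper are assumptions.

```csharp
using System;

public static class DurationSketch
{
    // Hypothetical validation: reject non-positive requests and cap the rest
    // at the configured maximum (clamping is an assumption, not documented behavior).
    public static double EffectiveDuration(double requestedSeconds, double maxDurationSeconds)
    {
        if (requestedSeconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(requestedSeconds), "Duration must be positive.");
        return Math.Min(requestedSeconds, maxDurationSeconds);
    }
}
```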
MaxTextLength
Gets or sets the maximum text sequence length.
public int MaxTextLength { get; set; }
Property Value
- int
ModelSize
Gets or sets the model size variant.
public AudioLDMModelSize ModelSize { get; set; }
Property Value
- AudioLDMModelSize
Remarks
Different sizes trade quality against speed. The default, Base, balances the two.
NumInferenceSteps
Gets or sets the number of diffusion steps.
public int NumInferenceSteps { get; set; }
Property Value
- int
Remarks
More steps give higher quality but slower generation:
- 25 steps: fast, lower quality
- 50 steps: good balance (default)
- 100+ steps: best quality, slow
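Samplers usually run on a subset of the timesteps the model was trained on. The sketch below assumes 1000 training timesteps and DDIM-style even spacing; the scheduler this class actually uses is not specified, and `ScheduleSketch.Timesteps` is a hypothetical helper.

```csharp
using System;

public static class ScheduleSketch
{
    // Picks numInferenceSteps evenly spaced timesteps out of trainTimesteps,
    // ordered from noisiest to cleanest.
    public static int[] Timesteps(int numInferenceSteps, int trainTimesteps = 1000)
    {
        if (numInferenceSteps <= 0 || numInferenceSteps > trainTimesteps)
            throw new ArgumentOutOfRangeException(nameof(numInferenceSteps));

        int stepRatio = trainTimesteps / numInferenceSteps;
        var steps = new int[numInferenceSteps];
        for (int i = 0; i < numInferenceSteps; i++)
            steps[i] = (numInferenceSteps - 1 - i) * stepRatio;
        return steps;
    }
}
```

With 50 steps the schedule visits 980, 960, ..., 0. Each entry costs one U-Net evaluation (two when classifier-free guidance is active), so generation time grows roughly linearly with NumInferenceSteps.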
NumMelBins
Gets or sets the number of mel spectrogram bins.
public int NumMelBins { get; set; }
Property Value
- int
Remarks
Standard AudioLDM uses 64 mel bins. Higher values capture more spectral detail.
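Mel bins are spaced evenly on the mel scale rather than in hertz, so low frequencies get finer resolution. This sketch uses the HTK mel formula m = 2595 log10(1 + f / 700) for simplicity; some front ends (librosa's default, for example) use the Slaney variant instead. The helpers and the 0 to 8000 Hz range in the test are illustrative assumptions.

```csharp
using System;

public static class MelSketch
{
    // HTK-style conversions between hertz and mels.
    public static double HzToMel(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);
    public static double MelToHz(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

    // Center frequencies (Hz) of numMelBins triangular filters spaced evenly in mels
    // between fMin and fMax (the two edge points are excluded).
    public static double[] MelCenterFrequencies(int numMelBins, double fMin, double fMax)
    {
        if (numMelBins <= 0) throw new ArgumentOutOfRangeException(nameof(numMelBins));

        double melMin = HzToMel(fMin), melMax = HzToMel(fMax);
        double step = (melMax - melMin) / (numMelBins + 1);
        var centers = new double[numMelBins];
        for (int i = 0; i < numMelBins; i++)
            centers[i] = MelToHz(melMin + (i + 1) * step);
        return centers;
    }
}
```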
OnnxOptions
Gets or sets the ONNX execution options.
public OnnxModelOptions OnnxOptions { get; set; }
Property Value
- OnnxModelOptions
SampleRate
Gets or sets the output sample rate in Hz.
public int SampleRate { get; set; }
Property Value
- int
Remarks
AudioLDM uses 16 kHz by default, which suits speech and environmental sounds. It can be set to 48 kHz for higher-quality music generation.
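The number of waveform samples the vocoder must produce scales linearly with SampleRate. `SampleCountSketch.SampleCount` is a hypothetical helper for that arithmetic.

```csharp
using System;

public static class SampleCountSketch
{
    // Samples per channel for a clip of the given duration; checked() throws
    // OverflowException if the result does not fit in an int.
    public static int SampleCount(double durationSeconds, int sampleRate) =>
        checked((int)Math.Round(durationSeconds * sampleRate));
}
```

A 10-second clip is 160,000 samples per channel at 16 kHz and 480,000 at 48 kHz.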
Seed
Gets or sets the random seed for reproducibility.
public int? Seed { get; set; }
Property Value
- int?
Remarks
Set to a specific value to generate the same audio on every run with the same settings. Leave as null for random generation.
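Seeding typically fixes the initial latent noise that diffusion starts from. The sketch below uses `System.Random` and Box-Muller sampling as stand-ins; the library's actual random number generator is not documented here, and both helpers are hypothetical.

```csharp
using System;

public static class SeedSketch
{
    // A seeded Random reproduces the same sequence; null falls back to an unseeded generator.
    public static Random CreateRandom(int? seed) =>
        seed.HasValue ? new Random(seed.Value) : new Random();

    // Standard-normal samples via the Box-Muller transform.
    public static double[] GaussianNoise(Random rng, int count)
    {
        var noise = new double[count];
        for (int i = 0; i < count; i++)
        {
            double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids Log(0)
            double u2 = rng.NextDouble();
            noise[i] = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        }
        return noise;
    }
}
```

Two generators created with the same seed produce identical noise, while null leaves the noise unseeded. Identical audio also assumes the same model files, settings, and execution provider, since some GPU kernels are not bit-for-bit deterministic.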
Stereo
Gets or sets whether to generate stereo audio.
public bool Stereo { get; set; }
Property Value
- bool
Remarks
When true, the output has two channels; mono output is duplicated into both channels for compatibility.
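Duplicating a mono signal into two channels is a simple interleave. `StereoSketch.MonoToInterleavedStereo` is a hypothetical helper that assumes samples are stored as interleaved left/right pairs.

```csharp
public static class StereoSketch
{
    // Copies each mono sample into the left and right slots (L, R, L, R, ...).
    public static float[] MonoToInterleavedStereo(float[] mono)
    {
        var stereo = new float[mono.Length * 2];
        for (int i = 0; i < mono.Length; i++)
        {
            stereo[2 * i] = mono[i];     // left channel
            stereo[2 * i + 1] = mono[i]; // right channel
        }
        return stereo;
    }
}
```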
UNetPath
Gets or sets the path to the U-Net denoiser ONNX model.
public string? UNetPath { get; set; }
Property Value
- string?
VaePath
Gets or sets the path to the VAE ONNX model.
public string? VaePath { get; set; }
Property Value
- string?
VocoderPath
Gets or sets the path to the HiFi-GAN vocoder ONNX model.
public string? VocoderPath { get; set; }
Property Value
- string?
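The four path properties correspond to the stages of the AudioLDM pipeline: the CLAP encoder embeds the prompt, the U-Net denoises the latent, the VAE decodes the latent into a mel spectrogram, and the HiFi-GAN vocoder turns the spectrogram into a waveform. A caller might check that every path is set before loading; `ModelPathSketch.MissingModels` is a hypothetical helper, not part of this class.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ModelPathSketch
{
    // Returns the names of model paths that are unset or point to files that do not exist.
    public static List<string> MissingModels(IReadOnlyDictionary<string, string?> paths)
    {
        var missing = new List<string>();
        foreach (var (name, path) in paths)
        {
            if (string.IsNullOrWhiteSpace(path) || !File.Exists(path))
                missing.Add(name);
        }
        return missing;
    }
}
```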
WindowSize
Gets or sets the FFT window size.
public int WindowSize { get; set; }
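WindowSize sets the frequency resolution of each STFT frame. This sketch assumes the FFT size equals the window size and uses a periodic Hann window, a common choice for this kind of front end but an assumption here; both helpers are hypothetical.

```csharp
using System;

public static class WindowSketch
{
    // Periodic Hann window: w[n] = 0.5 - 0.5 * cos(2 * pi * n / size).
    public static double[] HannWindow(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var w = new double[size];
        for (int n = 0; n < size; n++)
            w[n] = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * n / size);
        return w;
    }

    // A real-valued FFT of fftSize points yields fftSize / 2 + 1 frequency bins.
    public static int FrequencyBins(int fftSize) => fftSize / 2 + 1;
}
```

A 1024-sample window yields 513 frequency bins per frame, which the mel filterbank then reduces to NumMelBins values.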