Class StableAudioOptions
- Namespace
- AiDotNet.Audio.StableAudio
- Assembly
- AiDotNet.dll
Configuration options for Stable Audio generation.
public class StableAudioOptions
- Inheritance
- object
- StableAudioOptions
- Inherited Members
Remarks
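Stable Audio is Stability AI's latent diffusion model for audio generation. A variational autoencoder (VAE) compresses audio into a compact latent representation. A Diffusion Transformer (DiT) then denoises those latents, conditioned on T5 text embeddings and timing information, and the VAE decoder turns the result back into a waveform. The model generates music and sound effects with variable-length output.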
For Beginners: Stable Audio turns a text description into music or sound effects.
Example prompts:
- "Upbeat electronic dance track with synth leads and heavy bass drop"
- "Peaceful ambient soundscape with soft pads and nature sounds"
- "Epic orchestral trailer music with dramatic brass and percussion"
- "Lo-fi hip hop beat with jazzy piano chords and vinyl crackle"
Tips for good prompts:
- Be specific about genre, instruments, mood, and tempo
- Mention audio characteristics (stereo width, dynamics)
- Include style references when appropriate
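A minimal configuration sketch using the properties on this page might look like the following. It assumes the project references AiDotNet.dll and that the `StableAudioModelSize` enum lives in the same `AiDotNet.Audio.StableAudio` namespace. The model paths are hypothetical placeholders. The prompt itself is not part of this options class, so it is omitted.

```csharp
using AiDotNet.Audio.StableAudio;

// Illustrative values only; see each property's remarks for the trade-offs.
var options = new StableAudioOptions
{
    ModelSize = StableAudioModelSize.Base,      // documented default size
    DurationSeconds = 30.0,                     // keep within MaxDurationSeconds
    NumInferenceSteps = 100,                    // documented default
    GuidanceScale = 7.0,                        // documented default
    SampleRate = 44100,                         // 44.1 kHz
    Stereo = true,
    Seed = 42,                                  // fixed seed -> reproducible output
    TextEncoderPath = "models/t5_encoder.onnx", // hypothetical paths
    DitPath = "models/dit.onnx",
    VaePath = "models/vae.onnx"
};
```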
Properties
DitHiddenDim
Gets or sets the DiT hidden dimension.
public int DitHiddenDim { get; set; }
Property Value
- int
Remarks
Hidden dimension of the Diffusion Transformer blocks. The default of 1024 corresponds to the Base model size.
DitPath
Gets or sets the path to the DiT denoiser ONNX model.
public string? DitPath { get; set; }
Property Value
- string
DropoutRate
Gets or sets the dropout rate for training.
public double DropoutRate { get; set; }
Property Value
- double
DurationSeconds
Gets or sets the default duration of generated audio in seconds.
public double DurationSeconds { get; set; }
Property Value
- double
GuidanceScale
Gets or sets the classifier-free guidance scale.
public double GuidanceScale { get; set; }
Property Value
- double
Remarks
Controls how closely the model follows the text prompt:
- Low (1.0-3.0): More variation, less prompt adherence
- Default (7.0): Good balance
- High (10.0-15.0): Stricter prompt following, may reduce quality
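Classifier-free guidance is commonly implemented by running the denoiser twice per step, once with the prompt and once without, and then extrapolating from the unconditional prediction toward the conditional one. The sketch below shows that standard formula. The class and method names are hypothetical, and whether this library's DiT applies `GuidanceScale` in exactly this form is an assumption.

```csharp
using System;

public static class GuidanceSketch
{
    // Hypothetical helper: standard classifier-free guidance.
    // scale = 0 -> unconditional prediction, scale = 1 -> conditional prediction,
    // scale > 1 -> pushes past the conditional prediction (stronger prompt adherence).
    public static float[] Combine(float[] unconditional, float[] conditional, double guidanceScale)
    {
        if (unconditional.Length != conditional.Length)
            throw new ArgumentException("Predictions must have the same length.");

        var guided = new float[conditional.Length];
        for (int i = 0; i < guided.Length; i++)
        {
            float u = unconditional[i];
            guided[i] = u + (float)guidanceScale * (conditional[i] - u);
        }
        return guided;
    }
}
```

With a high scale the difference between the two predictions is amplified, which is why very large values can overshoot and degrade quality.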
LatentDimension
Gets or sets the latent dimension.
public int LatentDimension { get; set; }
Property Value
- int
Remarks
The dimension of the compressed audio representation. The default of 64 matches the standard Stable Audio architecture.
MaxAudioLength
Gets or sets the maximum audio latent length.
public int MaxAudioLength { get; set; }
Property Value
- int
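Because `MaxAudioLength` is measured in latent frames rather than seconds, it helps to see how a duration maps to a latent length. In a latent diffusion model the VAE shrinks the waveform in time by a fixed downsampling ratio, giving a latent tensor of roughly `LatentDimension` x frames. The sketch below assumes a ratio of 2048, the value used by the publicly released Stable Audio Open autoencoder. This page does not state the ratio, so verify it against the VAE you load. The helper names are hypothetical.

```csharp
using System;

public static class LatentLengthSketch
{
    // Assumption: the VAE downsamples audio in time by this factor
    // (2048 for the public Stable Audio Open autoencoder; verify for your model).
    public const int DefaultDownsamplingRatio = 2048;

    // Hypothetical helper: latent frames needed to cover durationSeconds of audio.
    public static int LatentFrames(double durationSeconds, int sampleRate,
                                   int downsamplingRatio = DefaultDownsamplingRatio)
    {
        if (downsamplingRatio <= 0) throw new ArgumentOutOfRangeException(nameof(downsamplingRatio));
        long samples = (long)Math.Ceiling(durationSeconds * sampleRate);
        // Ceiling division so a partial frame still gets a latent slot.
        return (int)((samples + downsamplingRatio - 1) / downsamplingRatio);
    }

    // Hypothetical helper: true if the request fits within MaxAudioLength latent frames.
    public static bool Fits(double durationSeconds, int sampleRate, int maxAudioLength,
                            int downsamplingRatio = DefaultDownsamplingRatio)
        => LatentFrames(durationSeconds, sampleRate, downsamplingRatio) <= maxAudioLength;
}
```

Under this assumption, 47 seconds at 44.1 kHz needs 1013 latent frames.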
MaxDurationSeconds
Gets or sets the maximum duration in seconds.
public double MaxDurationSeconds { get; set; }
Property Value
- double
Remarks
Stable Audio 2.0 can generate up to 180 seconds (3 minutes) of audio. The Open variant supports up to 47 seconds. Longer durations require more memory and compute time.
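This page does not say whether a `DurationSeconds` value above `MaxDurationSeconds` is clamped or rejected. The hypothetical helper below shows one reasonable policy: it rejects non-positive durations and clamps long ones to the configured maximum.

```csharp
using System;

public static class DurationSketch
{
    // Hypothetical helper: resolve a requested duration against MaxDurationSeconds.
    // Non-positive (or NaN) requests are rejected; over-long requests are clamped.
    public static double Resolve(double requestedSeconds, double maxDurationSeconds)
    {
        if (double.IsNaN(requestedSeconds) || requestedSeconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(requestedSeconds), "Duration must be positive.");
        return Math.Min(requestedSeconds, maxDurationSeconds);
    }
}
```

Clamping keeps a batch job running, while throwing makes a misconfiguration visible. Which policy fits better depends on the caller.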
MaxTextLength
Gets or sets the maximum text sequence length.
public int MaxTextLength { get; set; }
Property Value
- int
ModelSize
Gets or sets the model size variant.
public StableAudioModelSize ModelSize { get; set; }
Property Value
- StableAudioModelSize
Remarks
Different sizes trade quality against speed. The default, Base, balances the two.
NumAttentionHeads
Gets or sets the number of attention heads.
public int NumAttentionHeads { get; set; }
Property Value
- int
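In standard multi-head attention, the hidden dimension is split evenly across heads, so `DitHiddenDim` must be divisible by `NumAttentionHeads`. The hypothetical check below assumes the DiT blocks follow that standard layout.

```csharp
using System;

public static class AttentionShapeSketch
{
    // Hypothetical helper: per-head dimension for standard multi-head attention.
    // Throws if the hidden dimension cannot be split evenly across the heads.
    public static int HeadDim(int hiddenDim, int numHeads)
    {
        if (numHeads <= 0) throw new ArgumentOutOfRangeException(nameof(numHeads));
        if (hiddenDim % numHeads != 0)
            throw new ArgumentException($"Hidden dim {hiddenDim} is not divisible by {numHeads} heads.");
        return hiddenDim / numHeads;
    }
}
```

For example, the default `DitHiddenDim` of 1024 with 16 heads gives 64 dimensions per head.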
NumDitBlocks
Gets or sets the number of DiT blocks.
public int NumDitBlocks { get; set; }
Property Value
- int
Remarks
Number of Diffusion Transformer blocks. More blocks increase model capacity but make each denoising step slower.
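To show how `NumDitBlocks` and `DitHiddenDim` drive model size, the sketch below uses a common rule of thumb for a standard transformer block with hidden size d. The attention projections (Q, K, V, output) contribute about 4d^2 parameters, and an MLP with 4x expansion contributes about 8d^2. This is an approximation, not this library's exact count: it ignores biases, norms, embeddings, and any cross-attention to the text embeddings.

```csharp
public static class DitSizeSketch
{
    // Hypothetical rule-of-thumb estimate: ~12 * d^2 parameters per block
    // (4 * d^2 attention projections + 8 * d^2 MLP with 4x expansion).
    public static long ApproxBlockParams(int hiddenDim) => 12L * hiddenDim * hiddenDim;

    // Parameter count (and roughly per-step compute) scales linearly with block count.
    public static long ApproxDitParams(int numBlocks, int hiddenDim)
        => numBlocks * ApproxBlockParams(hiddenDim);
}
```

Under this estimate, 24 blocks at a hidden size of 1024 come to roughly 302 million parameters. Doubling the block count roughly doubles both parameters and time per denoising step.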
NumInferenceSteps
Gets or sets the number of diffusion steps.
public int NumInferenceSteps { get; set; }
Property Value
- int
Remarks
More steps give higher quality but slower generation:
- 25 steps: Fast, lower quality
- 50 steps: Good balance
- 100 steps: High quality (default)
- 200+ steps: Best quality, slow
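Each inference step runs the DiT denoiser once, or twice when classifier-free guidance evaluates conditional and unconditional predictions separately. Generation time therefore grows roughly linearly with `NumInferenceSteps`. The sketch below builds an evenly spaced noise schedule from 1.0 (pure noise) to 0.0 (clean latent). The linear spacing is a simplifying assumption: the sampler and schedule this library actually uses are not documented here, and production samplers often space steps non-uniformly.

```csharp
using System;

public static class ScheduleSketch
{
    // Hypothetical helper: evenly spaced noise levels from 1.0 down to 0.0.
    // numSteps denoising steps need numSteps + 1 boundary points.
    public static double[] LinearTimesteps(int numSteps)
    {
        if (numSteps <= 0) throw new ArgumentOutOfRangeException(nameof(numSteps));
        var t = new double[numSteps + 1];
        for (int i = 0; i <= numSteps; i++)
            t[i] = 1.0 - (double)i / numSteps;
        return t;
    }
}
```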
OnnxOptions
Gets or sets the ONNX execution options.
public OnnxModelOptions OnnxOptions { get; set; }
Property Value
- OnnxModelOptions
SampleRate
Gets or sets the output sample rate in Hz.
public int SampleRate { get; set; }
Property Value
- int
Remarks
Stable Audio uses 44.1 kHz by default, the CD-quality sample rate that is standard for music.
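The sample rate and duration together determine how large the output buffer is, which is one reason longer clips need more memory. The hypothetical helpers below compute samples per channel and the raw size of the output. They assume the audio is held as 32-bit float PCM, which this page does not confirm.

```csharp
using System;

public static class SampleCountSketch
{
    // Hypothetical helper: samples per channel for a clip of the given length.
    public static long SamplesPerChannel(double durationSeconds, int sampleRate)
        => (long)Math.Round(durationSeconds * sampleRate);

    // Hypothetical helper: raw size in bytes, assuming 32-bit float samples.
    public static long FloatPcmBytes(double durationSeconds, int sampleRate, int channels)
        => SamplesPerChannel(durationSeconds, sampleRate) * channels * sizeof(float);
}
```

For example, 180 seconds of stereo audio at 44.1 kHz is 7,938,000 samples per channel, or about 63.5 MB as uncompressed float PCM.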
Seed
Gets or sets the random seed for reproducibility.
public int? Seed { get; set; }
Property Value
- int?
Remarks
Set a specific value to generate the same audio on every run with the same prompt and settings. Leave it null to generate with a random seed.
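Seeding works because the diffusion process starts from random Gaussian noise. Fixing the random generator's seed fixes that starting point. The hypothetical helper below draws standard-normal noise with the Box-Muller transform from `System.Random`. How this library actually seeds its noise is an assumption here. Note also that .NET does not guarantee the seeded `System.Random` sequence stays identical across runtime versions.

```csharp
using System;

public static class SeedSketch
{
    // Hypothetical helper: standard-normal initial noise via Box-Muller.
    // A fixed seed reproduces the same sequence; null uses an unseeded Random.
    public static float[] InitialNoise(int count, int? seed)
    {
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        var noise = new float[count];
        for (int i = 0; i < count; i++)
        {
            double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids Log(0)
            double u2 = rng.NextDouble();
            noise[i] = (float)(Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2));
        }
        return noise;
    }
}
```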
Stereo
Gets or sets whether to generate stereo audio.
public bool Stereo { get; set; }
Property Value
- bool
Remarks
When true, generates two-channel stereo output. Stable Audio natively supports stereo generation.
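This page does not say whether stereo output comes back as separate (planar) channel buffers or as one interleaved buffer. Many PCM formats, such as WAV, store samples interleaved as L, R, L, R. If you receive separate channels, the hypothetical helper below interleaves them.

```csharp
using System;

public static class StereoSketch
{
    // Hypothetical helper: interleave planar left/right buffers into L, R, L, R, ... order.
    public static float[] Interleave(float[] left, float[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Channels must have the same length.");
        var output = new float[left.Length * 2];
        for (int i = 0; i < left.Length; i++)
        {
            output[2 * i] = left[i];
            output[2 * i + 1] = right[i];
        }
        return output;
    }
}
```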
TextEmbeddingDim
Gets or sets the T5 embedding dimension.
public int TextEmbeddingDim { get; set; }
Property Value
- int
TextEncoderPath
Gets or sets the path to the T5 text encoder ONNX model.
public string? TextEncoderPath { get; set; }
Property Value
- string
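The three path properties (`TextEncoderPath`, `DitPath`, `VaePath`) correspond to the three stages of a latent diffusion pipeline: encode the prompt, denoise the latents, and decode them to audio. That mapping is inferred from the property names. The hypothetical pre-flight check below reports which paths are unset or point to missing files before any model is loaded. This page does not document what the library does when a path is null.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;

public static class ModelPathSketch
{
    // Hypothetical helper: pipeline order is T5 text encoder -> DiT denoiser -> VAE decoder.
    // Returns the names of any paths that are unset or do not exist on disk.
    public static List<string> MissingModels(string? textEncoderPath, string? ditPath, string? vaePath)
    {
        var missing = new List<string>();
        Check("TextEncoderPath", textEncoderPath, missing);
        Check("DitPath", ditPath, missing);
        Check("VaePath", vaePath, missing);
        return missing;
    }

    private static void Check(string name, string? path, List<string> missing)
    {
        if (string.IsNullOrWhiteSpace(path) || !File.Exists(path))
            missing.Add(name);
    }
}
```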
TimingConditioningScale
Gets or sets the conditioning scale for timing information.
public double TimingConditioningScale { get; set; }
Property Value
- double
Remarks
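Stable Audio is conditioned on timing information (the start time and total duration of the requested audio) in addition to the text prompt. This scale controls how strongly the model follows that timing information.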
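Timing conditioning is what makes variable-length output possible. The model generates a fixed-length window and, conditioned on the requested total duration, places content in the first part of it. The remainder can then be trimmed. The hypothetical helper below performs that trim. How this library applies `TimingConditioningScale` to the timing values is not documented here, so the sketch covers only the trimming step.

```csharp
using System;

public static class TimingSketch
{
    // Hypothetical helper: keep only the first secondsTotal seconds of a generated window.
    // Never returns more samples than the window contains.
    public static float[] TrimToDuration(float[] windowSamples, double secondsTotal, int sampleRate)
    {
        int keep = (int)Math.Max(0, Math.Min(windowSamples.Length, Math.Round(secondsTotal * sampleRate)));
        var trimmed = new float[keep];
        Array.Copy(windowSamples, trimmed, keep);
        return trimmed;
    }
}
```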
VaePath
Gets or sets the path to the VAE ONNX model.
public string? VaePath { get; set; }