
Class StableAudioOptions

Namespace
AiDotNet.Audio.StableAudio
Assembly
AiDotNet.dll

Configuration options for Stable Audio generation.

public class StableAudioOptions
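Inheritance
object
StableAudioOptions
Inherited Members
object.Equals(object)
object.Equals(object, object)
object.GetHashCode()
object.GetType()
object.MemberwiseClone()
object.ReferenceEquals(object, object)
object.ToString()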

Remarks

Stable Audio is Stability AI's audio generation model. It uses latent diffusion with a Diffusion Transformer (DiT) denoiser and generates high-quality music and sound effects of variable length.

For Beginners: Stable Audio turns a short text description (a prompt) into music or sound effects.

Example prompts:

  • "Upbeat electronic dance track with synth leads and heavy bass drop"
  • "Peaceful ambient soundscape with soft pads and nature sounds"
  • "Epic orchestral trailer music with dramatic brass and percussion"
  • "Lo-fi hip hop beat with jazzy piano chords and vinyl crackle"

Tips for good prompts:

  • Be specific about genre, instruments, mood, and tempo
  • Mention audio characteristics (stereo width, dynamics)
  • Include style references when appropriate
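A minimal configuration sketch using only the properties documented on this page, assuming the class's implicit parameterless constructor. The ONNX file paths are hypothetical placeholders, and the call that actually runs generation is not documented here, so it is omitted.

```csharp
using AiDotNet.Audio.StableAudio;

// Hypothetical file paths; point these at your exported ONNX models.
var options = new StableAudioOptions
{
    TextEncoderPath = "models/t5_encoder.onnx",
    DitPath = "models/dit_denoiser.onnx",
    VaePath = "models/vae.onnx",
    DurationSeconds = 30.0,   // must stay within MaxDurationSeconds
    NumInferenceSteps = 100,  // quality vs. speed trade-off
    GuidanceScale = 7.0,      // how closely to follow the prompt
    SampleRate = 44100,       // 44.1 kHz output
    Stereo = true,            // two-channel output
    Seed = 42                 // fixed seed for reproducible output
};
```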

Properties

DitHiddenDim

Gets or sets the DiT hidden dimension.

public int DitHiddenDim { get; set; }

Property Value

int

Remarks

The width of each Diffusion Transformer block. The default of 1024 corresponds to the Base model size.

DitPath

Gets or sets the path to the DiT denoiser ONNX model.

public string? DitPath { get; set; }

Property Value

string

DropoutRate

Gets or sets the dropout rate for training.

public double DropoutRate { get; set; }

Property Value

double

DurationSeconds

Gets or sets the default duration of generated audio in seconds.

public double DurationSeconds { get; set; }

Property Value

double

GuidanceScale

Gets or sets the classifier-free guidance scale.

public double GuidanceScale { get; set; }

Property Value

double

Remarks

Controls how closely the model follows the text prompt:

  • Low (1.0-3.0): more variation, weaker prompt adherence
  • Default (7.0): a good balance
  • High (10.0-15.0): stricter prompt following, may reduce audio quality
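Classifier-free guidance is conventionally applied by running the denoiser once with and once without the text condition and extrapolating from the unconditional prediction toward the conditional one. A minimal sketch of that standard combination, assuming plain `double` arrays stand in for the DiT outputs; it is not taken from AiDotNet's source.

```csharp
using System;

// Standard classifier-free guidance: uncond + scale * (cond - uncond).
// scale = 1 returns the conditional prediction unchanged; larger values
// push the result further in the direction the prompt suggests.
static double[] ApplyGuidance(double[] uncond, double[] cond, double scale)
{
    if (uncond.Length != cond.Length)
        throw new ArgumentException("Predictions must have the same length.");

    var guided = new double[cond.Length];
    for (int i = 0; i < cond.Length; i++)
        guided[i] = uncond[i] + scale * (cond[i] - uncond[i]);
    return guided;
}

double[] uncondPred = { 0.0, 1.0 };
double[] condPred = { 1.0, 1.5 };
double[] result = ApplyGuidance(uncondPred, condPred, 7.0);
Console.WriteLine(string.Join(", ", result));
```

At the default scale of 7.0 the result lands well beyond the conditional prediction, which is why very high values can start to degrade quality.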

LatentDimension

Gets or sets the latent dimension.

public int LatentDimension { get; set; }

Property Value

int

Remarks

The size of each vector in the compressed (latent) audio representation that the DiT denoises. The default of 64 matches the standard Stable Audio architecture.
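The latent for a clip can be pictured as a grid of `LatentDimension` values per latent frame. A small sketch of the resulting shape, assuming the VAE shortens the time axis by a factor of 2048 (the ratio reported for Stable Audio Open's autoencoder; the factor AiDotNet uses is not stated on this page). `LatentFrames` is a hypothetical helper.

```csharp
using System;

// Assumed temporal downsampling factor of the VAE (hypothetical here;
// 2048 is the ratio reported for Stable Audio Open's autoencoder).
const int DownsamplingFactor = 2048;

// Number of latent frames needed to cover a clip, rounded up.
static int LatentFrames(double seconds, int sampleRate, int factor)
{
    long samples = (long)Math.Ceiling(seconds * sampleRate);
    return (int)((samples + factor - 1) / factor);
}

int latentDimension = 64; // LatentDimension default
int frames = LatentFrames(30.0, 44100, DownsamplingFactor);
Console.WriteLine($"Latent shape: [{latentDimension}, {frames}]"); // Latent shape: [64, 646]
```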

MaxAudioLength

Gets or sets the maximum audio latent length.

public int MaxAudioLength { get; set; }

Property Value

int

MaxDurationSeconds

Gets or sets the maximum duration in seconds.

public double MaxDurationSeconds { get; set; }

Property Value

double

Remarks

Stable Audio 2.0 can generate up to 180 seconds (3 minutes) of audio. The Open variant supports up to 47 seconds. Longer durations require more memory and compute time.
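A minimal validation sketch, assuming the caller prefers to reject an over-long request rather than silently truncate it. The 47-second and 180-second limits come from the remark above; `ValidateDuration` is a hypothetical helper, not part of AiDotNet.

```csharp
using System;

// Hypothetical helper: check a requested duration against MaxDurationSeconds.
// Limits from the remarks: 47 s for Stable Audio Open, 180 s for Stable Audio 2.0.
static double ValidateDuration(double requested, double maxSeconds)
{
    if (double.IsNaN(requested) || requested <= 0)
        throw new ArgumentOutOfRangeException(nameof(requested), "Duration must be positive.");
    if (requested > maxSeconds)
        throw new ArgumentOutOfRangeException(nameof(requested),
            $"Requested {requested} s exceeds the {maxSeconds} s limit.");
    return requested;
}

Console.WriteLine(ValidateDuration(30.0, 47.0)); // 30
```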

MaxTextLength

Gets or sets the maximum text sequence length.

public int MaxTextLength { get; set; }

Property Value

int
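Text encoders such as T5 consume a fixed-length sequence of token IDs. A minimal sketch of padding or truncating to `MaxTextLength`, assuming tokenization happens elsewhere and that ID 0 is the padding token (as in the standard T5 vocabulary). The token IDs in the example are made up, and `PadOrTruncate` is a hypothetical helper.

```csharp
using System;

// Pads with padId or truncates so the result is exactly maxLength long.
// Returns the IDs plus an attention mask (1 = real token, 0 = padding).
static (int[] Ids, int[] Mask) PadOrTruncate(int[] tokenIds, int maxLength, int padId = 0)
{
    var ids = new int[maxLength];
    var mask = new int[maxLength];
    int n = Math.Min(tokenIds.Length, maxLength);
    for (int i = 0; i < maxLength; i++)
    {
        ids[i] = i < n ? tokenIds[i] : padId;
        mask[i] = i < n ? 1 : 0;
    }
    return (ids, mask);
}

var (paddedIds, attentionMask) = PadOrTruncate(new[] { 37, 1093, 1 }, 8);
Console.WriteLine(string.Join(" ", paddedIds));     // 37 1093 1 0 0 0 0 0
Console.WriteLine(string.Join(" ", attentionMask)); // 1 1 1 0 0 0 0 0
```

A production tokenizer would usually keep the end-of-sequence token when truncating; this sketch simply cuts the sequence off.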

ModelSize

Gets or sets the model size variant.

public StableAudioModelSize ModelSize { get; set; }

Property Value

StableAudioModelSize

Remarks

Larger sizes generally produce higher-quality audio but generate more slowly. The default, Base, balances quality and speed.

NumAttentionHeads

Gets or sets the number of attention heads.

public int NumAttentionHeads { get; set; }

Property Value

int
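In standard multi-head attention the hidden dimension is split evenly across heads, so `DitHiddenDim` is normally expected to be divisible by `NumAttentionHeads`. A minimal check, assuming this library follows that usual convention; 1024 is the documented `DitHiddenDim` default, while 16 heads is only an illustrative value.

```csharp
using System;

// Per-head width in standard multi-head attention: hiddenDim / numHeads.
// Throws if the hidden dimension cannot be split evenly.
static int HeadDimension(int hiddenDim, int numHeads)
{
    if (numHeads <= 0)
        throw new ArgumentOutOfRangeException(nameof(numHeads));
    if (hiddenDim % numHeads != 0)
        throw new ArgumentException(
            $"Hidden dimension {hiddenDim} is not divisible by {numHeads} heads.");
    return hiddenDim / numHeads;
}

Console.WriteLine(HeadDimension(1024, 16)); // 64
```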

NumDitBlocks

Gets or sets the number of DiT blocks.

public int NumDitBlocks { get; set; }

Property Value

int

Remarks

Number of Diffusion Transformer blocks. Adding blocks increases model capacity but slows generation.
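To see why extra blocks cost capacity and speed, a rough parameter estimate helps. This sketch assumes a plain transformer block (self-attention plus an MLP with 4x expansion) and ignores biases, normalization, and the cross-attention that real Stable Audio DiT blocks also contain, so actual counts are higher. The 24-block figure is hypothetical.

```csharp
using System;

// Rough parameter count for a plain transformer block:
// attention projections (Q, K, V, output) = 4 * d^2,
// MLP with 4x expansion = 2 * (d * 4d) = 8 * d^2.
// Biases, norms, and cross-attention are ignored.
static long ApproxBlockParams(int hiddenDim) => 12L * hiddenDim * hiddenDim;

static long ApproxStackParams(int hiddenDim, int numBlocks) =>
    numBlocks * ApproxBlockParams(hiddenDim);

Console.WriteLine(ApproxBlockParams(1024));     // 12582912
Console.WriteLine(ApproxStackParams(1024, 24)); // 301989888
```

The estimate grows linearly with `NumDitBlocks` and quadratically with `DitHiddenDim`, and every block runs once per denoising step.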

NumInferenceSteps

Gets or sets the number of diffusion steps.

public int NumInferenceSteps { get; set; }

Property Value

int

Remarks

More steps give higher quality but slower generation:

  • 25 steps: fast, lower quality
  • 50 steps: good balance
  • 100 steps: high quality (default)
  • 200+ steps: best quality, slow
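Generation time scales with the number of denoiser evaluations. A minimal sketch, assuming evenly spaced timesteps (real samplers often use non-uniform schedules) and that classifier-free guidance computes conditional and unconditional predictions as separate passes whenever the scale differs from 1. Both helpers are hypothetical.

```csharp
using System;

// Evenly spaced timesteps from t = 1 (pure noise) down to t = 0 (clean).
// Real samplers often use non-uniform schedules; this is only illustrative.
static double[] LinearTimesteps(int numSteps)
{
    if (numSteps < 1)
        throw new ArgumentOutOfRangeException(nameof(numSteps));
    var t = new double[numSteps + 1];
    for (int i = 0; i <= numSteps; i++)
        t[i] = 1.0 - (double)i / numSteps;
    return t;
}

// Denoiser calls per clip: one per step, doubled when classifier-free
// guidance needs separate conditional and unconditional predictions.
static int DenoiserCalls(int numSteps, double guidanceScale) =>
    guidanceScale == 1.0 ? numSteps : 2 * numSteps;

Console.WriteLine(string.Join(" ", LinearTimesteps(4)));
Console.WriteLine(DenoiserCalls(100, 7.0)); // 200
```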

OnnxOptions

Gets or sets the ONNX execution options.

public OnnxModelOptions OnnxOptions { get; set; }

Property Value

OnnxModelOptions

SampleRate

Gets or sets the output sample rate in Hz.

public int SampleRate { get; set; }

Property Value

int

Remarks

Stable Audio uses 44.1 kHz by default. This is the standard sample rate for CD audio and is widely used for music.
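At 44.1 kHz each second of audio holds 44,100 samples per channel, and stereo doubles that. A small sketch of the resulting buffer sizes, assuming 32-bit float samples (the storage type AiDotNet uses is not stated on this page). `SampleCounts` is a hypothetical helper.

```csharp
using System;

// Samples per channel and total interleaved samples for a clip.
static (long PerChannel, long Total) SampleCounts(double seconds, int sampleRate, int channels)
{
    long perChannel = (long)Math.Round(seconds * sampleRate);
    return (perChannel, perChannel * channels);
}

var (perCh, total) = SampleCounts(30.0, 44100, 2);
// 32-bit float samples take 4 bytes each.
Console.WriteLine($"{perCh} samples per channel, {total * 4} bytes as float32");
// 1323000 samples per channel, 10584000 bytes as float32
```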

Seed

Gets or sets the random seed for reproducibility.

public int? Seed { get; set; }

Property Value

int?

Remarks

Set a specific value to generate the same audio for the same prompt and settings. Leave it null to use a random seed.
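Diffusion starts from Gaussian noise, so fixing the seed fixes the starting point. A minimal sketch, assuming the seed drives a standard pseudo-random generator; AiDotNet's actual generator is not documented here. It uses the Box-Muller transform over `System.Random`, whose seeded sequence is stable within one .NET runtime but is not guaranteed to stay the same across major .NET versions. `InitialNoise` is a hypothetical helper.

```csharp
using System;

// Gaussian noise via the Box-Muller transform. A fixed seed reproduces the
// same sequence; a null seed lets System.Random pick one.
static double[] InitialNoise(int length, int? seed)
{
    var rng = seed.HasValue ? new Random(seed.Value) : new Random();
    var noise = new double[length];
    for (int i = 0; i < length; i++)
    {
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids log(0)
        double u2 = rng.NextDouble();
        noise[i] = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }
    return noise;
}

double[] a = InitialNoise(4, 42);
double[] b = InitialNoise(4, 42);
Console.WriteLine(a[0] == b[0]); // True
```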

Stereo

Gets or sets whether to generate stereo audio.

public bool Stereo { get; set; }

Property Value

bool

Remarks

When true, generates two-channel stereo output. Stable Audio natively supports stereo generation.
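Many audio formats, including PCM WAV data, store stereo samples interleaved as L, R, L, R. A minimal sketch of converting two separate (planar) channels into that layout, assuming the model hands back one array per channel; the actual output layout in AiDotNet is not documented here. `Interleave` is a hypothetical helper.

```csharp
using System;

// Converts planar channels (all left samples, then all right samples)
// into an interleaved L, R, L, R, ... buffer, as stored in PCM WAV data.
static float[] Interleave(float[] left, float[] right)
{
    if (left.Length != right.Length)
        throw new ArgumentException("Channels must have the same length.");
    var output = new float[left.Length * 2];
    for (int i = 0; i < left.Length; i++)
    {
        output[2 * i] = left[i];
        output[2 * i + 1] = right[i];
    }
    return output;
}

float[] interleaved = Interleave(new[] { 1f, 2f }, new[] { -1f, -2f });
Console.WriteLine(string.Join(" ", interleaved));
```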

TextEmbeddingDim

Gets or sets the T5 embedding dimension.

public int TextEmbeddingDim { get; set; }

Property Value

int

TextEncoderPath

Gets or sets the path to the T5 text encoder ONNX model.

public string? TextEncoderPath { get; set; }

Property Value

string

TimingConditioningScale

Gets or sets the conditioning scale for timing information.

public double TimingConditioningScale { get; set; }

Property Value

double

Remarks

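In addition to the text prompt, Stable Audio is conditioned on timing information: the start offset and the total duration of the clip, in seconds. This value controls how strongly the model follows that timing information.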
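How AiDotNet embeds the timing values and applies `TimingConditioningScale` is not documented here. Purely to illustrate what a larger or smaller scale could mean, this sketch assumes a sinusoidal embedding of each value in seconds that is multiplied by the scale before being combined with the text conditioning. `TimingEmbedding` is hypothetical and is not the library's implementation.

```csharp
using System;

// Hypothetical timing embedding: sinusoidal features of a value in seconds,
// multiplied by a conditioning scale. Not AiDotNet's actual implementation.
static double[] TimingEmbedding(double seconds, int dim, double scale)
{
    if (dim <= 0 || dim % 2 != 0)
        throw new ArgumentException("Embedding dimension must be a positive even number.");
    var emb = new double[dim];
    int half = dim / 2;
    for (int k = 0; k < half; k++)
    {
        double freq = Math.Pow(10000.0, -(double)k / half);
        emb[k] = scale * Math.Sin(seconds * freq);
        emb[half + k] = scale * Math.Cos(seconds * freq);
    }
    return emb;
}

// Start offset and total length of the requested clip.
double[] startEmb = TimingEmbedding(0.0, 8, 1.0);
double[] totalEmb = TimingEmbedding(30.0, 8, 1.0);
Console.WriteLine($"{startEmb.Length} + {totalEmb.Length} timing features");
```

Under this assumption, a scale of 0 removes the timing signal entirely and larger values make it dominate the conditioning.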

VaePath

Gets or sets the path to the VAE ONNX model.

public string? VaePath { get; set; }

Property Value

string
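The three model paths (`TextEncoderPath`, `DitPath`, `VaePath`) correspond to the stages of a latent diffusion pipeline: encode the prompt, repeatedly denoise a latent, then decode it to audio. This sketch shows only that data flow. The delegates are hypothetical stand-ins for ONNX sessions, the constant initial latent replaces seeded noise, and the "denoise" update just halves values; it is not a real sampler and not AiDotNet's implementation.

```csharp
using System;

// Hypothetical stand-ins for the three ONNX models referenced by
// TextEncoderPath, DitPath, and VaePath. Real models would run through
// an ONNX runtime; these lambdas only mimic the data flow.
Func<string, double[]> encodeText = p => new double[] { p.Length };
Func<double[], double, double[], double[]> denoiseStep =
    (z, tNoise, emb) => Array.ConvertAll(z, v => v * 0.5); // placeholder update
Func<double[], double[]> decodeLatent = z => (double[])z.Clone();

double[] Generate(string prompt, int steps, int latentSize)
{
    double[] textEmbedding = encodeText(prompt);        // 1. T5 text encoder
    var latent = new double[latentSize];
    Array.Fill(latent, 1.0);                            // stand-in for seeded noise
    for (int i = 0; i < steps; i++)
    {
        double t = 1.0 - (double)i / steps;             // current noise level
        latent = denoiseStep(latent, t, textEmbedding); // 2. DiT denoiser
    }
    return decodeLatent(latent);                        // 3. VAE decoder
}

double[] audio = Generate("Lo-fi hip hop beat", 3, 4);
Console.WriteLine(audio.Length); // 4
```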