Enum StableAudioModelSize
Namespace: AiDotNet.Audio.StableAudio
Assembly: AiDotNet.dll
Specifies the size variant of the Stable Audio model.
public enum StableAudioModelSize
Fields
Base = 1
Base model variant (800M parameters). Default choice.
- T5 encoder: 768 hidden dim
- DiT: 1024 hidden dim, 24 blocks
- Good balance of quality and speed
Large = 2
Large model variant (1.5B parameters).
- T5 encoder: 1024 hidden dim
- DiT: 1536 hidden dim, 32 blocks
- Highest quality, requires significant GPU memory
Open = 3
Stable Audio Open variant.
- Open-source model with permissive license
- Optimized for music generation
- Based on Base architecture
Small = 0
Small model variant (300M parameters).
- T5 encoder: 256 hidden dim
- DiT: 512 hidden dim, 12 blocks
- Fast inference, suitable for experimentation
V2 = 4
Stable Audio 2.0 variant.
- Improved architecture with better coherence
- Extended duration support (up to 3 minutes)
- Enhanced stereo output
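The dimensions listed above can be collected into a lookup, which is sometimes handy for logging or for rough memory estimates. The following is a minimal sketch, not part of AiDotNet: `VariantDims` and `StableAudioDims.Describe` are hypothetical names, and the enum is redeclared locally with the values from this page so that the block compiles without referencing AiDotNet.dll. Open and V2 return null because this page gives no numeric dimensions for them.

```csharp
#nullable enable
using System;

// Local copy of the enum, using the values listed on this page.
public enum StableAudioModelSize { Small = 0, Base = 1, Large = 2, Open = 3, V2 = 4 }

// Hypothetical helper type (an assumption for illustration, not an AiDotNet type).
public sealed record VariantDims(int T5HiddenDim, int DitHiddenDim, int DitBlocks);

public static class StableAudioDims
{
    // Returns the dimensions documented on this page,
    // or null for variants where the page lists none.
    public static VariantDims? Describe(StableAudioModelSize size) => size switch
    {
        StableAudioModelSize.Small => new VariantDims(256, 512, 12),
        StableAudioModelSize.Base  => new VariantDims(768, 1024, 24),
        StableAudioModelSize.Large => new VariantDims(1024, 1536, 32),
        _ => null // Open and V2: no dimensions given here
    };
}
```

Calling `StableAudioDims.Describe(StableAudioModelSize.Base)` yields a record with a T5 hidden dimension of 768, a DiT hidden dimension of 1024, and 24 DiT blocks.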
Remarks
Stable Audio is a latent diffusion model by Stability AI for high-quality audio generation. It uses a Diffusion Transformer (DiT) rather than a U-Net as its denoising backbone for improved quality, and it supports variable-length audio generation.
For Beginners: Think of the variants as different trade-offs between speed and quality:
- Small: Fast generation, good for experimentation (300M parameters)
- Base: Balanced quality and speed (800M parameters); the default choice
- Large: Best quality, requires more resources (1.5B parameters)
- Open: Open-source variant with permissive license
- V2: Stable Audio 2.0, with longer clips (up to 3 minutes) and enhanced stereo output
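When the variant comes from a configuration file or command-line argument, it can be parsed by name with a fallback to Base, which this page describes as the default choice. The sketch below is an illustration under stated assumptions: `ModelSizeParser.ParseOrDefault` is a hypothetical helper, not an AiDotNet API, and the enum is redeclared locally with the values from this page so that the block is self-contained.

```csharp
#nullable enable
using System;

// Local copy of the enum, using the values listed on this page.
public enum StableAudioModelSize { Small = 0, Base = 1, Large = 2, Open = 3, V2 = 4 }

public static class ModelSizeParser
{
    // Hypothetical helper: parses a variant name case-insensitively and
    // falls back to Base when the text is missing or unrecognized.
    public static StableAudioModelSize ParseOrDefault(string? text)
    {
        // Enum.TryParse also accepts numeric strings such as "7",
        // so Enum.IsDefined is used to reject values outside the enum.
        if (Enum.TryParse<StableAudioModelSize>(text, ignoreCase: true, out var size)
            && Enum.IsDefined(typeof(StableAudioModelSize), size))
        {
            return size;
        }
        return StableAudioModelSize.Base;
    }
}
```

With this helper, `ParseOrDefault("large")` selects `Large`, while `ParseOrDefault(null)` and `ParseOrDefault("7")` both fall back to `Base`.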