Class MusicGenOptions

Namespace
AiDotNet.Audio.MusicGen
Assembly
AiDotNet.dll

Configuration options for MusicGen text-to-music generation.

public class MusicGenOptions
Inheritance
MusicGenOptions

Remarks

MusicGen is Meta's text-to-music model. It generates music from text descriptions using a single-stage autoregressive transformer language model that operates over EnCodec audio codes.

For Beginners: MusicGen turns a written description into audio. You describe the music you want, and the model generates it.

Example prompts:

  • "Upbeat electronic dance music with heavy bass"
  • "Calm acoustic guitar melody with soft drums"
  • "Epic orchestral piece with dramatic strings"
  • "Lo-fi hip hop beats for studying"

Tips for good prompts:

  • Be specific about genre, instruments, and mood
  • Include tempo hints (fast, slow, moderate)
  • Mention energy level (energetic, calm, building)
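
The tips above can be sketched as a small prompt builder. This is only an illustration: `PromptSketch` and `BuildPrompt` are hypothetical names that are not part of AiDotNet, and joining the parts with commas is one reasonable convention, not a format the model requires.

```csharp
using System;
using System.Collections.Generic;

public static class PromptSketch
{
    // Hypothetical helper: joins the non-empty parts into one comma-separated prompt.
    // Pass an empty string for any part you do not want to specify.
    public static string BuildPrompt(string mood, string genre, string instruments, string tempo)
    {
        var parts = new List<string>();
        foreach (var part in new[] { mood, genre, instruments, tempo })
        {
            if (!string.IsNullOrWhiteSpace(part))
            {
                parts.Add(part.Trim());
            }
        }
        return string.Join(", ", parts);
    }
}
```

For example, `PromptSketch.BuildPrompt("calm", "lo-fi hip hop", "soft piano", "slow tempo")` yields a prompt that names mood, genre, instruments, and tempo in one line.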

Properties

CodebookSize

Gets or sets the codebook vocabulary size.

public int CodebookSize { get; set; }

Property Value

int

Remarks

Must match the EnCodec model configuration. Default of 2048 is standard for MusicGen.

DropoutRate

Gets or sets the dropout rate for training.

public double DropoutRate { get; set; }

Property Value

double

DurationSeconds

Gets or sets the default duration of generated music in seconds.

public double DurationSeconds { get; set; }

Property Value

double

EnCodecDecoderPath

Gets or sets the path to the EnCodec decoder ONNX model.

public string? EnCodecDecoderPath { get; set; }

Property Value

string

GuidanceScale

Gets or sets the classifier-free guidance scale.

public double GuidanceScale { get; set; }

Property Value

double

Remarks

Controls how closely the model follows the text prompt:

  • Low (1.0-2.0): More variation, less prompt adherence
  • Default (3.0): Good balance
  • High (4.0-7.0): Stricter prompt following, less creativity
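
To make the effect of the scale concrete, here is a minimal sketch of the usual classifier-free guidance formula, which blends prompt-conditioned and unconditioned logits. `GuidanceSketch` is a hypothetical name, and this page does not show how the library applies guidance internally, so treat the code as an illustration of the general technique only.

```csharp
using System;

public static class GuidanceSketch
{
    // Standard classifier-free guidance: move from the unconditional logits
    // toward (and past) the conditional logits by a factor of `scale`.
    public static double[] ApplyGuidance(double[] conditional, double[] unconditional, double scale)
    {
        if (conditional.Length != unconditional.Length)
        {
            throw new ArgumentException("Both logit arrays must have the same length.");
        }

        var guided = new double[conditional.Length];
        for (int i = 0; i < guided.Length; i++)
        {
            guided[i] = unconditional[i] + scale * (conditional[i] - unconditional[i]);
        }
        return guided;
    }
}
```

With a scale of 1.0 the result equals the conditional logits. Larger values exaggerate the difference between the two predictions, which is why high scales follow the prompt more strictly.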

LanguageModelPath

Gets or sets the path to the language model ONNX model.

public string? LanguageModelPath { get; set; }

Property Value

string

MaxDurationSeconds

Gets or sets the maximum duration in seconds.

public double MaxDurationSeconds { get; set; }

Property Value

double

Remarks

MusicGen can generate up to 30 seconds of audio. Longer durations require more memory and compute time.
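
The cost of longer durations follows from the number of generation steps. The sketch below assumes the 50 Hz frame rate of the 32 kHz EnCodec model used by the published MusicGen checkpoints, and it assumes that a requested duration above the maximum is clamped. `DurationSketch` is a hypothetical name, and the library may instead reject out-of-range values.

```csharp
using System;

public static class DurationSketch
{
    // Assumption: EnCodec at 32 kHz emits 50 token frames per second of audio.
    public const int AssumedFrameRateHz = 50;

    // Number of autoregressive steps for a request, clamped to [0, maxSeconds].
    public static int TokenSteps(double requestedSeconds, double maxSeconds)
    {
        double clamped = Math.Min(Math.Max(requestedSeconds, 0.0), maxSeconds);
        return (int)Math.Ceiling(clamped * AssumedFrameRateHz);
    }

    // Number of PCM samples per channel in the decoded audio.
    public static int SamplesPerChannel(double seconds, int sampleRate)
    {
        return (int)Math.Round(seconds * sampleRate);
    }
}
```

Under these assumptions a 30-second clip needs 1500 generation steps and decodes to 960,000 samples per channel at 32 kHz.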

MaxTextLength

Gets or sets the maximum text sequence length.

public int MaxTextLength { get; set; }

Property Value

int

ModelSize

Gets or sets the model size variant.

public MusicGenModelSize ModelSize { get; set; }

Property Value

MusicGenModelSize

Remarks

Different sizes trade off quality against speed: larger variants generally sound better but generate more slowly. The default is Medium, which balances the two.

NumCodebooks

Gets or sets the number of EnCodec codebooks to use.

public int NumCodebooks { get; set; }

Property Value

int

Remarks

More codebooks give higher quality but slower generation. The default of 4 is used by standard MusicGen models.

OnnxOptions

Gets or sets the ONNX execution options.

public OnnxModelOptions OnnxOptions { get; set; }

Property Value

OnnxModelOptions

SampleRate

Gets or sets the output sample rate in Hz.

public int SampleRate { get; set; }

Property Value

int

Remarks

MusicGen uses 32 kHz by default, matching the EnCodec codec. This produces high-quality audio suitable for music.

Seed

Gets or sets the random seed for reproducibility.

public int? Seed { get; set; }

Property Value

int?

Remarks

Set to a specific value to generate the same music each time. Leave as null to get a different result on each run.
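
The behaviour of a nullable seed can be illustrated with `System.Random`. This is a sketch of the general idea, not the library's sampler: `SeedSketch` and `SampleTokens` are hypothetical names, and uniform token draws stand in for real model sampling.

```csharp
using System;

public static class SeedSketch
{
    // A fixed seed makes the pseudo-random sequence repeatable;
    // a null seed falls back to a differently seeded generator on each call.
    public static int[] SampleTokens(int? seed, int count, int vocabularySize)
    {
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        var tokens = new int[count];
        for (int i = 0; i < count; i++)
        {
            tokens[i] = rng.Next(vocabularySize); // uniform in [0, vocabularySize)
        }
        return tokens;
    }
}
```

Two calls with the same seed return identical token sequences, which is the property that makes seeded generation reproducible.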

Stereo

Gets or sets whether to generate stereo audio.

public bool Stereo { get; set; }

Property Value

bool

Remarks

Requires the Stereo model variant for best results. When using non-stereo models, mono output is duplicated to stereo.
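
The mono-to-stereo fallback amounts to copying each sample into both channels. The sketch below assumes an interleaved left/right sample layout, which this page does not specify, and `StereoSketch` is a hypothetical name.

```csharp
using System;

public static class StereoSketch
{
    // Copies each mono sample into the left and right slots of an
    // interleaved stereo buffer (assumed layout: L0, R0, L1, R1, ...).
    public static float[] DuplicateToStereo(float[] mono)
    {
        var stereo = new float[mono.Length * 2];
        for (int i = 0; i < mono.Length; i++)
        {
            stereo[2 * i] = mono[i];     // left channel
            stereo[2 * i + 1] = mono[i]; // right channel
        }
        return stereo;
    }
}
```

Because both channels carry the same signal, the result plays on stereo devices but has no stereo width, which is why a stereo model variant is preferable.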

Temperature

Gets or sets the sampling temperature.

public double Temperature { get; set; }

Property Value

double

Remarks

Controls randomness in generation:

  • Lower (0.5-0.8): More predictable, stable output
  • Default (1.0): Balanced creativity
  • Higher (1.2-2.0): More creative but potentially less coherent
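
Temperature is conventionally applied by dividing the logits before the softmax. The following sketch shows that standard technique. `TemperatureSketch` is a hypothetical name, and the library's own sampling code is not shown on this page.

```csharp
using System;
using System.Linq;

public static class TemperatureSketch
{
    // Softmax over logits divided by the temperature.
    // Subtracting the maximum first keeps Math.Exp numerically stable.
    public static double[] SoftmaxWithTemperature(double[] logits, double temperature)
    {
        if (temperature <= 0.0)
        {
            throw new ArgumentOutOfRangeException(nameof(temperature), "Temperature must be positive.");
        }

        double max = logits.Max();
        double[] exps = logits.Select(l => Math.Exp((l - max) / temperature)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

Lower temperatures concentrate probability on the most likely token, while higher temperatures flatten the distribution so that unlikely tokens are sampled more often.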

TextEncoderPath

Gets or sets the path to the text encoder ONNX model.

public string? TextEncoderPath { get; set; }

Property Value

string

TopK

Gets or sets the top-k sampling parameter.

public int TopK { get; set; }

Property Value

int

Remarks

Limits sampling to the top K most likely tokens. Default of 250 works well for diverse music generation.
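
A minimal sketch of top-k filtering, assuming it is applied to logits before sampling: everything outside the K largest values is masked to negative infinity so it receives zero probability after the softmax. `TopKSketch` is a hypothetical name.

```csharp
using System;
using System.Linq;

public static class TopKSketch
{
    // Keeps the k largest logits and masks the rest.
    // Ties at the threshold are all kept, so slightly more than k may survive.
    public static double[] KeepTopK(double[] logits, int k)
    {
        if (k <= 0 || k >= logits.Length)
        {
            return (double[])logits.Clone(); // nothing to filter
        }

        double threshold = logits.OrderByDescending(x => x).ElementAt(k - 1);
        return logits.Select(x => x >= threshold ? x : double.NegativeInfinity).ToArray();
    }
}
```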

TopP

Gets or sets the top-p (nucleus) sampling parameter.

public double TopP { get; set; }

Property Value

double

Remarks

If greater than 0, nucleus sampling is used instead of top-k. A value of 0.0 disables nucleus sampling, so only top-k is applied.
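
Nucleus sampling keeps the smallest set of most likely tokens whose cumulative probability reaches the threshold, then renormalizes. The sketch below shows that general technique with 0.0 treated as disabled, as described above. `TopPSketch` is a hypothetical name, and the library's exact tie and boundary handling may differ.

```csharp
using System;
using System.Linq;

public static class TopPSketch
{
    // Zeroes every probability outside the nucleus and renormalizes the rest.
    public static double[] KeepNucleus(double[] probabilities, double topP)
    {
        if (topP <= 0.0 || topP >= 1.0)
        {
            return (double[])probabilities.Clone(); // disabled, or the nucleus is everything
        }

        int[] order = Enumerable.Range(0, probabilities.Length)
            .OrderByDescending(i => probabilities[i])
            .ToArray();

        var kept = new double[probabilities.Length];
        double cumulative = 0.0;
        foreach (int i in order)
        {
            kept[i] = probabilities[i];
            cumulative += probabilities[i];
            if (cumulative >= topP)
            {
                break; // the nucleus now covers at least topP of the mass
            }
        }

        for (int i = 0; i < kept.Length; i++)
        {
            kept[i] /= cumulative;
        }
        return kept;
    }
}
```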

UseDelayPattern

Gets or sets whether to use the delay pattern.

public bool UseDelayPattern { get; set; }

Property Value

bool

Remarks

The delay pattern is MusicGen's key innovation:

  • Generates codebooks with temporal offset
  • Reduces effective sequence length
  • Improves generation efficiency

Should be true for standard MusicGen operation.
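
The temporal offset can be sketched as follows: codebook k is shifted right by k steps, so all codebooks are predicted in parallel at each step. With K codebooks and T frames this takes T + K - 1 steps, compared with K × T if the codebooks were flattened into one sequence. `DelayPatternSketch` and the padding token are illustrative assumptions, not the library's implementation.

```csharp
using System;

public static class DelayPatternSketch
{
    // Input:  codes[k, t] for K codebooks and T frames.
    // Output: delayed[k, t + k] = codes[k, t], with unused slots set to padToken.
    public static int[,] ApplyDelay(int[,] codes, int padToken)
    {
        int codebooks = codes.GetLength(0);
        int steps = codes.GetLength(1);
        int delayedSteps = steps + codebooks - 1;

        var delayed = new int[codebooks, delayedSteps];
        for (int k = 0; k < codebooks; k++)
        {
            for (int t = 0; t < delayedSteps; t++)
            {
                delayed[k, t] = padToken;
            }
            for (int t = 0; t < steps; t++)
            {
                delayed[k, t + k] = codes[k, t]; // shift codebook k by k steps
            }
        }
        return delayed;
    }
}
```

For 4 codebooks and a 30-second clip at an assumed 50 frames per second, this means 1503 steps instead of 6000.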