Class MusicGenModel<T>

Namespace
AiDotNet.Diffusion.Models
Assembly
AiDotNet.dll

MusicGen - Diffusion-based music generation model with advanced musical controls.

public class MusicGenModel<T> : AudioDiffusionModelBase<T>, ILatentDiffusionModel<T>, IAudioDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
AudioDiffusionModelBase<T>
MusicGenModel<T>
Implements
ILatentDiffusionModel<T>
IAudioDiffusionModel<T>
IDiffusionModel<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>

Examples

// Create a MusicGen model
var musicGen = new MusicGenModel<float>();

// Generate electronic music at specific BPM
var edm = musicGen.GenerateMusicWithTempo(
    prompt: "Energetic electronic dance music with synthesizers",
    bpm: 128,
    durationSeconds: 30.0,
    numInferenceSteps: 200);

// Generate melody-conditioned music
// (originalMelody is an audio tensor loaded elsewhere, e.g., from a WAV file)
var variation = musicGen.GenerateFromMelody(
    melodyAudio: originalMelody,
    prompt: "Jazz version with saxophone",
    preservationStrength: 0.7);

Remarks

MusicGenModel is a specialized diffusion model for music generation that provides fine-grained control over musical characteristics including:

  1. Text-to-Music: Generate music from natural language descriptions
  2. Melody Conditioning: Guide generation with a reference melody
  3. Rhythm/Beat Conditioning: Generate music following a specific rhythm pattern
  4. Tempo Control: Generate at specific BPM (beats per minute)
  5. Key/Scale Guidance: Influence the musical key of generated content
  6. Style Transfer: Transform existing music to different styles

For Beginners: This model generates music from natural language descriptions while giving you precise control over tempo, key, instrumentation, and style.

Example prompts:

  • "Upbeat electronic dance music at 128 BPM" -> EDM track
  • "Sad piano ballad in A minor" -> emotional piano piece
  • "Funky bass groove with drums" -> funk rhythm section
  • "Orchestral film score, epic and dramatic" -> cinematic music

Advanced controls:

  • BPM: Set exact tempo (60-200 BPM typical)
  • Key: Major/minor keys (C major, A minor, etc.)
  • Instruments: Specify or exclude instruments
  • Style: Jazz, rock, classical, electronic, etc.

Technical specifications:

  • Sample rate: 32 kHz (high-quality music)
  • Latent channels: 16 (more capacity for musical structure)
  • Mel channels: 128
  • Duration: up to 60 seconds
  • Guidance scale: 3.0-7.0 typical

Constructors

MusicGenModel()

Initializes a new MusicGen model with default parameters.

public MusicGenModel()

MusicGenModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, UNetNoisePredictor<T>?, AudioVAE<T>?, IConditioningModule<T>?, MusicGenSize, int, double, int?)

Initializes a new MusicGen model with custom parameters.

public MusicGenModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, UNetNoisePredictor<T>? unet = null, AudioVAE<T>? musicVAE = null, IConditioningModule<T>? textConditioner = null, MusicGenSize modelSize = MusicGenSize.Medium, int sampleRate = 32000, double defaultDurationSeconds = 30, int? seed = null)

Parameters

options DiffusionModelOptions<T>

Configuration options for the diffusion model.

scheduler INoiseScheduler<T>

Optional custom scheduler.

unet UNetNoisePredictor<T>

Optional custom U-Net noise predictor.

musicVAE AudioVAE<T>

Optional custom music VAE.

textConditioner IConditioningModule<T>

Optional text conditioning module.

modelSize MusicGenSize

Model size variant.

sampleRate int

Audio sample rate in Hz.

defaultDurationSeconds double

Default music duration in seconds.

seed int?

Optional random seed.
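
Examples

A construction sketch based on the signature above. MusicGenSize.Small is an assumed variant alongside the documented Medium default; components left null fall back to their defaults.

// Build a smaller, reproducible model; unspecified components
// (scheduler, U-Net, VAE, conditioner) use their defaults.
var musicGen = new MusicGenModel<float>(
    modelSize: MusicGenSize.Small, // assumed variant; Medium is the default
    sampleRate: 32000,
    defaultDurationSeconds: 15.0,
    seed: 42);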

Fields

DEFAULT_BPM

Default BPM for music generation.

public const int DEFAULT_BPM = 120

Field Value

int

MUSICGEN_BASE_CHANNELS

MusicGen U-Net base channels.

public const int MUSICGEN_BASE_CHANNELS = 512

Field Value

int

MUSICGEN_CONTEXT_DIM

Context dimension for conditioning.

public const int MUSICGEN_CONTEXT_DIM = 1536

Field Value

int

MUSICGEN_LATENT_CHANNELS

MusicGen latent space channels (larger for musical structure).

public const int MUSICGEN_LATENT_CHANNELS = 16

Field Value

int

MUSICGEN_MAX_DURATION

Maximum supported duration in seconds.

public const double MUSICGEN_MAX_DURATION = 60

Field Value

double

MUSICGEN_MEL_CHANNELS

MusicGen mel spectrogram channels.

public const int MUSICGEN_MEL_CHANNELS = 128

Field Value

int

MUSICGEN_SAMPLE_RATE

MusicGen default sample rate for high-quality music.

public const int MUSICGEN_SAMPLE_RATE = 32000

Field Value

int

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

MusicGen uses 16 latent channels (see MUSICGEN_LATENT_CHANNELS), compared with the 4 typical of Stable Diffusion image models, giving the latent space more capacity for musical structure.

MelodyEncoder

Gets the melody encoder for melody conditioning.

public MelodyEncoder<T> MelodyEncoder { get; }

Property Value

MelodyEncoder<T>

ModelSize

Gets the model size variant.

public MusicGenSize ModelSize { get; }

Property Value

MusicGenSize

MusicVAE

Gets the music VAE for direct access.

public AudioVAE<T> MusicVAE { get; }

Property Value

AudioVAE<T>

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

RhythmEncoder

Gets the rhythm encoder for beat conditioning.

public RhythmEncoder<T> RhythmEncoder { get; }

Property Value

RhythmEncoder<T>

SupportsAudioToAudio

Gets whether this model supports audio-to-audio transformation.

public override bool SupportsAudioToAudio { get; }

Property Value

bool

SupportsTextToAudio

Gets whether this model supports text-to-audio generation.

public override bool SupportsTextToAudio { get; }

Property Value

bool

SupportsTextToMusic

Gets whether this model supports text-to-music generation.

public override bool SupportsTextToMusic { get; }

Property Value

bool

SupportsTextToSpeech

Gets whether this model supports text-to-speech generation.

public override bool SupportsTextToSpeech { get; }

Property Value

bool

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

ContinueMusic(Tensor<T>, string?, double, double, int, double, int?)

Generates music continuation from an audio prompt.

public virtual Tensor<T> ContinueMusic(Tensor<T> audioPrompt, string? textPrompt = null, double continuationDurationSeconds = 15, double overlapSeconds = 2, int numInferenceSteps = 150, double guidanceScale = 4, int? seed = null)

Parameters

audioPrompt Tensor<T>

Audio to continue from.

textPrompt string

Optional text guidance for continuation.

continuationDurationSeconds double

Duration of the continuation in seconds.

overlapSeconds double

Overlap with the original audio, in seconds, for a smooth transition.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Continued audio waveform.
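
Examples

A usage sketch; existingTrack is a hypothetical [1, samples] waveform tensor obtained elsewhere.

// Extend an existing clip by 15 seconds, blending over a 2-second overlap
var extended = musicGen.ContinueMusic(
    audioPrompt: existingTrack,
    textPrompt: "Build into an energetic chorus",
    continuationDurationSeconds: 15.0,
    overlapSeconds: 2.0,
    seed: 42);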

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

GenerateFromMelody(Tensor<T>, string, double, string?, int, double, int?)

Generates music conditioned on a reference melody.

public virtual Tensor<T> GenerateFromMelody(Tensor<T> melodyAudio, string prompt, double preservationStrength = 0.6, string? negativePrompt = null, int numInferenceSteps = 200, double guidanceScale = 5, int? seed = null)

Parameters

melodyAudio Tensor<T>

Reference melody audio.

prompt string

Text description for the style/arrangement.

preservationStrength double

How closely to follow the melody (0.0-1.0).

negativePrompt string

Optional negative prompt.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Audio waveform tensor.

Remarks

For Beginners: Melody conditioning lets you:

  • Create covers: Keep melody, change style
  • Add accompaniment: Keep melody, generate instruments
  • Style transfer: Transform melody to different genre

Preservation strength:

  • 0.3-0.5: Use melody as loose guide
  • 0.5-0.7: Balance melody with new elements
  • 0.7-0.9: Closely follow original melody
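
Examples

Two sketches contrasting low and high preservation strength; referenceMelody is a hypothetical audio tensor loaded elsewhere.

// Loose guide: borrow the melodic contour, let the model improvise
var loose = musicGen.GenerateFromMelody(
    melodyAudio: referenceMelody,
    prompt: "Ambient electronic reinterpretation",
    preservationStrength: 0.4);

// Faithful cover: keep the melody, change only the arrangement
var cover = musicGen.GenerateFromMelody(
    melodyAudio: referenceMelody,
    prompt: "Acoustic guitar cover",
    preservationStrength: 0.85);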

GenerateFromRhythm(Tensor<T>, string, double, string?, int, double, int?)

Generates music conditioned on a rhythm/beat pattern.

public virtual Tensor<T> GenerateFromRhythm(Tensor<T> rhythmAudio, string prompt, double rhythmStrength = 0.5, string? negativePrompt = null, int numInferenceSteps = 200, double guidanceScale = 5, int? seed = null)

Parameters

rhythmAudio Tensor<T>

Reference rhythm/percussion audio.

prompt string

Text description for the melody/harmony.

rhythmStrength double

How closely to follow the rhythm (0.0-1.0).

negativePrompt string

Optional negative prompt.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Audio waveform tensor.
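
Examples

A usage sketch; drumLoop is a hypothetical percussion tensor loaded elsewhere.

// Generate melody and harmony that lock onto an existing drum loop
var track = musicGen.GenerateFromRhythm(
    rhythmAudio: drumLoop,
    prompt: "Funky bassline and electric piano",
    rhythmStrength: 0.6,
    negativePrompt: "vocals");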

GenerateMusic(string, string?, double?, int, double, int?)

Generates music from a text prompt.

public override Tensor<T> GenerateMusic(string prompt, string? negativePrompt = null, double? durationSeconds = null, int numInferenceSteps = 200, double guidanceScale = 5, int? seed = null)

Parameters

prompt string

Text description of the desired music.

negativePrompt string

Optional negative prompt.

durationSeconds double?

Duration of music to generate.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Audio waveform tensor [1, samples].
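
Examples

A usage sketch; at the 32 kHz sample rate, 10 seconds yields a [1, 320000] tensor.

// Generate a 10-second orchestral clip with a fixed seed
var clip = musicGen.GenerateMusic(
    prompt: "Orchestral film score, epic and dramatic",
    negativePrompt: "distortion, noise",
    durationSeconds: 10.0,
    seed: 42);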

GenerateMusicWithTempo(string, int, string?, double?, int, double, int?)

Generates music with specific tempo (BPM) control.

public virtual Tensor<T> GenerateMusicWithTempo(string prompt, int bpm, string? negativePrompt = null, double? durationSeconds = null, int numInferenceSteps = 200, double guidanceScale = 5, int? seed = null)

Parameters

prompt string

Text description of the desired music.

bpm int

Target beats per minute (60-200 typical).

negativePrompt string

Optional negative prompt.

durationSeconds double?

Duration of music to generate.

numInferenceSteps int

Number of denoising steps.

guidanceScale double

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Audio waveform tensor.

Remarks

For Beginners: BPM (Beats Per Minute) controls the tempo:

Common BPM ranges:

  • 60-80: Slow ballads, ambient
  • 80-100: Hip-hop, R&B
  • 100-120: Pop, house
  • 120-140: Techno, trance
  • 140-180: Drum and bass, dubstep
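
Examples

Complementing the EDM example at the top of this page, a slow-tempo sketch drawn from the ranges above.

// A slow ballad at 70 BPM
var ballad = musicGen.GenerateMusicWithTempo(
    prompt: "Sad piano ballad in A minor",
    bpm: 70,
    durationSeconds: 20.0);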

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.
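
Examples

A minimal round-trip sketch, assuming Vector<T> exposes Length and an indexer; the perturbation is purely illustrative.

// Read, perturb, and write back the full parameter vector.
// The vector length must equal ParameterCount or SetParameters throws.
var parameters = musicGen.GetParameters();
for (int i = 0; i < parameters.Length; i++)
{
    parameters[i] += 0.001f; // illustrative perturbation only
}
musicGen.SetParameters(parameters);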