Class MusicGenModel<T>
MusicGen - Diffusion-based music generation model with advanced musical controls.
public class MusicGenModel<T> : AudioDiffusionModelBase<T>, ILatentDiffusionModel<T>, IAudioDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T - The numeric type used for calculations.
- Inheritance: AudioDiffusionModelBase<T> -> MusicGenModel<T>
Examples
// Create a MusicGen model
var musicGen = new MusicGenModel<float>();

// Generate electronic music at a specific BPM
var edm = musicGen.GenerateMusicWithTempo(
    prompt: "Energetic electronic dance music with synthesizers",
    bpm: 128,
    durationSeconds: 30.0,
    numInferenceSteps: 200);

// Generate melody-conditioned music
// (originalMelody is a previously loaded reference waveform tensor)
var variation = musicGen.GenerateFromMelody(
    melodyAudio: originalMelody,
    prompt: "Jazz version with saxophone",
    preservationStrength: 0.7);
Remarks
MusicGenModel is a specialized diffusion model for music generation that provides fine-grained control over musical characteristics including:
- Text-to-Music: Generate music from natural language descriptions
- Melody Conditioning: Guide generation with a reference melody
- Rhythm/Beat Conditioning: Generate music following a specific rhythm pattern
- Tempo Control: Generate at specific BPM (beats per minute)
- Key/Scale Guidance: Influence the musical key of generated content
- Style Transfer: Transform existing music to different styles
For Beginners: This model generates music with precise control:
Example prompts:
- "Upbeat electronic dance music at 128 BPM" -> EDM track
- "Sad piano ballad in A minor" -> emotional piano piece
- "Funky bass groove with drums" -> funk rhythm section
- "Orchestral film score, epic and dramatic" -> cinematic music
Advanced controls:
- BPM: Set exact tempo (60-200 BPM typical)
- Key: Major/minor keys (C major, A minor, etc.)
- Instruments: Specify or exclude instruments
- Style: Jazz, rock, classical, electronic, etc.
Technical specifications:
- Sample rate: 32 kHz (high-quality music)
- Latent channels: 16 (more capacity for musical structure)
- Mel channels: 128
- Duration: Up to 60 seconds
- Guidance scale: 3.0-7.0 typical
Constructors
MusicGenModel()
Initializes a new MusicGen model with default parameters.
public MusicGenModel()
MusicGenModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, UNetNoisePredictor<T>?, AudioVAE<T>?, IConditioningModule<T>?, MusicGenSize, int, double, int?)
Initializes a new MusicGen model with custom parameters.
public MusicGenModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, UNetNoisePredictor<T>? unet = null, AudioVAE<T>? musicVAE = null, IConditioningModule<T>? textConditioner = null, MusicGenSize modelSize = MusicGenSize.Medium, int sampleRate = 32000, double defaultDurationSeconds = 30, int? seed = null)
Parameters
options DiffusionModelOptions<T> - Configuration options for the diffusion model.
scheduler INoiseScheduler<T> - Optional custom scheduler.
unet UNetNoisePredictor<T> - Optional custom U-Net noise predictor.
musicVAE AudioVAE<T> - Optional custom music VAE.
textConditioner IConditioningModule<T> - Optional text conditioning module.
modelSize MusicGenSize - Model size variant.
sampleRate int - Audio sample rate in Hz.
defaultDurationSeconds double - Default music duration.
seed int? - Optional random seed.
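Example (a minimal construction sketch; only MusicGenSize.Medium is referenced on this page, and any components left null fall back to the documented defaults):

// Build a MusicGen model with an explicit size, sample rate, and seed;
// the scheduler, U-Net, VAE, and conditioner default when omitted.
var customMusicGen = new MusicGenModel<float>(
    modelSize: MusicGenSize.Medium,
    sampleRate: 32000,            // MUSICGEN_SAMPLE_RATE
    defaultDurationSeconds: 20.0, // shorter default clips
    seed: 42);                    // fixed seed for reproducible output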
Fields
DEFAULT_BPM
Default BPM for music generation.
public const int DEFAULT_BPM = 120
MUSICGEN_BASE_CHANNELS
MusicGen U-Net base channels.
public const int MUSICGEN_BASE_CHANNELS = 512
MUSICGEN_CONTEXT_DIM
Context dimension for conditioning.
public const int MUSICGEN_CONTEXT_DIM = 1536
MUSICGEN_LATENT_CHANNELS
MusicGen latent space channels (larger for musical structure).
public const int MUSICGEN_LATENT_CHANNELS = 16
MUSICGEN_MAX_DURATION
Maximum supported duration in seconds.
public const double MUSICGEN_MAX_DURATION = 60
MUSICGEN_MEL_CHANNELS
MusicGen mel spectrogram channels.
public const int MUSICGEN_MEL_CHANNELS = 128
MUSICGEN_SAMPLE_RATE
MusicGen default sample rate for high-quality music.
public const int MUSICGEN_SAMPLE_RATE = 32000
Properties
Conditioner
Gets the conditioning module (optional, for conditioned generation).
public override IConditioningModule<T>? Conditioner { get; }
LatentChannels
Gets the number of latent channels.
public override int LatentChannels { get; }
Remarks
Typically 4 for Stable Diffusion models; MusicGen uses 16 latent channels (MUSICGEN_LATENT_CHANNELS) to capture additional musical structure.
MelodyEncoder
Gets the melody encoder for melody conditioning.
public MelodyEncoder<T> MelodyEncoder { get; }
ModelSize
Gets the model size variant.
public MusicGenSize ModelSize { get; }
MusicVAE
Gets the music VAE for direct access.
public AudioVAE<T> MusicVAE { get; }
Property Value
- AudioVAE<T>
NoisePredictor
Gets the noise predictor model (U-Net, DiT, etc.).
public override INoisePredictor<T> NoisePredictor { get; }
ParameterCount
Gets the number of parameters in the model.
public override int ParameterCount { get; }
Remarks
This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
RhythmEncoder
Gets the rhythm encoder for beat conditioning.
public RhythmEncoder<T> RhythmEncoder { get; }
SupportsAudioToAudio
Gets whether this model supports audio-to-audio transformation.
public override bool SupportsAudioToAudio { get; }
SupportsTextToAudio
Gets whether this model supports text-to-audio generation.
public override bool SupportsTextToAudio { get; }
SupportsTextToMusic
Gets whether this model supports text-to-music generation.
public override bool SupportsTextToMusic { get; }
SupportsTextToSpeech
Gets whether this model supports text-to-speech generation.
public override bool SupportsTextToSpeech { get; }
VAE
Gets the VAE model used for encoding and decoding.
public override IVAEModel<T> VAE { get; }
Property Value
- IVAEModel<T>
Methods
Clone()
Creates a deep copy of the model.
public override IDiffusionModel<T> Clone()
Returns
- IDiffusionModel<T>
A new instance with the same parameters.
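Example (a brief sketch; the cast assumes the returned copy is the same concrete MusicGenModel<T> type, which the deep-copy contract implies):

var original = new MusicGenModel<float>();

// Clone returns IDiffusionModel<T>; cast back to reach MusicGen-specific members.
var copy = (MusicGenModel<float>)original.Clone();

// The copy carries the same parameters, so later changes to one instance
// (for example via SetParameters) leave the other untouched.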
ContinueMusic(Tensor<T>, string?, double, double, int, double, int?)
Generates music continuation from an audio prompt.
public virtual Tensor<T> ContinueMusic(Tensor<T> audioPrompt, string? textPrompt = null, double continuationDurationSeconds = 15, double overlapSeconds = 2, int numInferenceSteps = 150, double guidanceScale = 4, int? seed = null)
Parameters
audioPrompt Tensor<T> - Audio to continue from.
textPrompt string - Optional text guidance for continuation.
continuationDurationSeconds double - Duration of continuation.
overlapSeconds double - Overlap with original for smooth transition.
numInferenceSteps int - Number of denoising steps.
guidanceScale double - Classifier-free guidance scale.
seed int? - Optional random seed.
Returns
- Tensor<T>
Continued audio waveform.
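Example (a usage sketch, assuming a MusicGenModel<float> instance named musicGen; the intro tensor comes from an earlier GenerateMusic call):

// Generate a 30-second opening section, then extend it by 15 seconds.
var intro = musicGen.GenerateMusic(
    prompt: "Warm lo-fi hip-hop with vinyl crackle",
    durationSeconds: 30.0);

var extended = musicGen.ContinueMusic(
    audioPrompt: intro,
    textPrompt: "Keep the same groove, add a mellow electric piano",
    continuationDurationSeconds: 15.0,
    overlapSeconds: 2.0, // blend region for a smooth transition
    seed: 7);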
DeepCopy()
Creates a deep copy of this object.
public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
GenerateFromMelody(Tensor<T>, string, double, string?, int, double, int?)
Generates music conditioned on a reference melody.
public virtual Tensor<T> GenerateFromMelody(Tensor<T> melodyAudio, string prompt, double preservationStrength = 0.6, string? negativePrompt = null, int numInferenceSteps = 200, double guidanceScale = 5, int? seed = null)
Parameters
melodyAudio Tensor<T> - Reference melody audio.
prompt string - Text description for the style/arrangement.
preservationStrength double - How closely to follow the melody (0.0-1.0).
negativePrompt string - Optional negative prompt.
numInferenceSteps int - Number of denoising steps.
guidanceScale double - Classifier-free guidance scale.
seed int? - Optional random seed.
Returns
- Tensor<T>
Audio waveform tensor.
Remarks
For Beginners: Melody conditioning lets you:
- Create covers: Keep melody, change style
- Add accompaniment: Keep melody, generate instruments
- Style transfer: Transform melody to different genre
Preservation strength:
- 0.3-0.5: Use melody as loose guide
- 0.5-0.7: Balance melody with new elements
- 0.7-0.9: Closely follow original melody
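Example (a sketch of the strength ranges above, assuming a MusicGenModel<float> instance named musicGen and a preloaded melody tensor named melody):

// Loose reinterpretation: the melody is only a guide.
var loose = musicGen.GenerateFromMelody(melody,
    prompt: "Ambient electronic rework",
    preservationStrength: 0.4);

// Faithful cover: keep the tune, change the arrangement.
var faithful = musicGen.GenerateFromMelody(melody,
    prompt: "String quartet arrangement",
    preservationStrength: 0.85);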
GenerateFromRhythm(Tensor<T>, string, double, string?, int, double, int?)
Generates music conditioned on a rhythm/beat pattern.
public virtual Tensor<T> GenerateFromRhythm(Tensor<T> rhythmAudio, string prompt, double rhythmStrength = 0.5, string? negativePrompt = null, int numInferenceSteps = 200, double guidanceScale = 5, int? seed = null)
Parameters
rhythmAudio Tensor<T> - Reference rhythm/percussion audio.
prompt string - Text description for the melody/harmony.
rhythmStrength double - How closely to follow the rhythm (0.0-1.0).
negativePrompt string - Optional negative prompt.
numInferenceSteps int - Number of denoising steps.
guidanceScale double - Classifier-free guidance scale.
seed int? - Optional random seed.
Returns
- Tensor<T>
Audio waveform tensor.
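Example (a usage sketch, assuming a MusicGenModel<float> instance named musicGen and a drum-loop tensor named drumLoop loaded elsewhere):

// Keep the drum loop's groove, but generate new melody and harmony on top.
var track = musicGen.GenerateFromRhythm(
    rhythmAudio: drumLoop,
    prompt: "Funky slap bass and clavinet over the existing beat",
    rhythmStrength: 0.6,      // follow the beat fairly closely
    negativePrompt: "vocals", // steer away from sung content
    seed: 11);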
GenerateMusic(string, string?, double?, int, double, int?)
Generates music from a text prompt.
public override Tensor<T> GenerateMusic(string prompt, string? negativePrompt = null, double? durationSeconds = null, int numInferenceSteps = 200, double guidanceScale = 5, int? seed = null)
Parameters
prompt string - Text description of the desired music.
negativePrompt string - Optional negative prompt.
durationSeconds double? - Duration of music to generate.
numInferenceSteps int - Number of denoising steps.
guidanceScale double - Classifier-free guidance scale.
seed int? - Optional random seed.
Returns
- Tensor<T>
Audio waveform tensor [1, samples].
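Example (a basic text-to-music sketch, assuming a MusicGenModel<float> instance named musicGen; the negative prompt is optional and only steers generation away from unwanted elements):

var score = musicGen.GenerateMusic(
    prompt: "Orchestral film score, epic and dramatic",
    negativePrompt: "distorted, low quality",
    durationSeconds: 45.0, // must not exceed MUSICGEN_MAX_DURATION (60 s)
    numInferenceSteps: 200,
    guidanceScale: 5.0,
    seed: 3);
// The result has shape [1, samples] at the model's 32 kHz sample rate.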
GenerateMusicWithTempo(string, int, string?, double?, int, double, int?)
Generates music with specific tempo (BPM) control.
public virtual Tensor<T> GenerateMusicWithTempo(string prompt, int bpm, string? negativePrompt = null, double? durationSeconds = null, int numInferenceSteps = 200, double guidanceScale = 5, int? seed = null)
Parameters
prompt string - Text description of the desired music.
bpm int - Target beats per minute (60-200 typical).
negativePrompt string - Optional negative prompt.
durationSeconds double? - Duration of music to generate.
numInferenceSteps int - Number of denoising steps.
guidanceScale double - Classifier-free guidance scale.
seed int? - Optional random seed.
Returns
- Tensor<T>
Audio waveform tensor.
Remarks
For Beginners: BPM (Beats Per Minute) controls the tempo:
Common BPM ranges:
- 60-80: Slow ballads, ambient
- 80-100: Hip-hop, R&B
- 100-120: Pop, house
- 120-140: Techno, trance
- 140-180: Drum and bass, dubstep
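Example (a sketch pairing the ranges above with tempo-controlled calls, assuming a MusicGenModel<float> instance named musicGen):

// Slow ambient piece around 70 BPM.
var ambient = musicGen.GenerateMusicWithTempo(
    prompt: "Calm ambient pads with soft piano",
    bpm: 70);

// Driving techno around 130 BPM.
var techno = musicGen.GenerateMusicWithTempo(
    prompt: "Hypnotic techno with a pounding kick",
    bpm: 130);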
GetParameters()
Gets the parameters that can be optimized.
public override Vector<T> GetParameters()
Returns
- Vector<T>
SetParameters(Vector<T>)
Sets the model parameters.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters Vector<T> - The parameter vector to set.
Remarks
This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that need to update parameters iteratively.
If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
Exceptions
- ArgumentException
Thrown when the length of parameters does not match ParameterCount.
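Example (a round-trip sketch, assuming a MusicGenModel<float> instance named musicGen; an external optimizer would modify the vector in between):

// Read the current parameter vector.
var parameters = musicGen.GetParameters();

// ... an optimizer would update parameters here ...

// Write the vector back; its length must still equal ParameterCount,
// otherwise SetParameters throws an ArgumentException.
musicGen.SetParameters(parameters);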