Class MusicGenModel<T>

Namespace
AiDotNet.Audio.MusicGen
Assembly
AiDotNet.dll

Meta's MusicGen model for generating music from text descriptions.

public class MusicGenModel<T> : AudioNeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IAudioGenerator<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
AudioNeuralNetworkBase<T>
MusicGenModel<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Remarks

MusicGen is a state-of-the-art text-to-music generation model from Meta AI Research. It uses a single-stage transformer language model that operates directly on EnCodec audio codes, generating high-quality music from text descriptions.

Architecture components:

  1. Text Encoder: T5-based encoder that converts text prompts to embeddings
  2. Language Model: Transformer decoder that generates audio codes autoregressively
  3. EnCodec Decoder: Neural audio codec that converts discrete codes to waveforms

For Beginners: MusicGen creates original music from your descriptions:

How it works:

  1. You describe the music you want ("upbeat jazz piano")
  2. The text encoder understands your description
  3. The language model generates a sequence of "music tokens"
  4. The EnCodec decoder converts tokens to actual audio

Key features:

  • Up to 30 seconds of high-quality 32 kHz audio
  • Multiple genres and styles
  • Control over instruments, tempo, mood
  • Stereo output option

Usage:

var model = new MusicGenModel<float>(architecture, options);
var audio = model.GenerateMusic("Calm piano melody with soft strings");

Reference: "Simple and Controllable Music Generation" by Copet et al., Meta AI, 2023

Constructors

MusicGenModel(NeuralNetworkArchitecture<T>, MusicGenOptions?, ITokenizer?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, ILossFunction<T>?)

Creates a MusicGen model using native layers for training from scratch.

public MusicGenModel(NeuralNetworkArchitecture<T> architecture, MusicGenOptions? options = null, ITokenizer? tokenizer = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, ILossFunction<T>? lossFunction = null)

Parameters

architecture NeuralNetworkArchitecture<T>

The neural network architecture configuration.

options MusicGenOptions

MusicGen configuration options.

tokenizer ITokenizer

Optional tokenizer. If null, creates T5-compatible tokenizer.

optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>

Optional optimizer. Defaults to AdamW.

lossFunction ILossFunction<T>

Optional loss function. Defaults to CrossEntropy.

Remarks

For Beginners: Use this constructor when:

  • Training MusicGen from scratch (requires significant data)
  • Fine-tuning on custom music styles
  • Research and experimentation

For most use cases, load pretrained ONNX models instead.
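A minimal construction sketch for this path. It assumes a `NeuralNetworkArchitecture<float>` instance (`architecture`) and training tensors have already been prepared elsewhere; the parameterless `MusicGenOptions` constructor shown here is also an assumption.

```csharp
// Sketch only: `architecture`, `inputBatch`, and `expectedOutputBatch`
// are assumed to be prepared elsewhere. Omitted arguments fall back to
// the documented defaults (T5-compatible tokenizer, AdamW, CrossEntropy).
var options = new MusicGenOptions();
var model = new MusicGenModel<float>(architecture, options);

// Train on (input, expectedOutput) tensor pairs.
model.Train(inputBatch, expectedOutputBatch);
```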

MusicGenModel(NeuralNetworkArchitecture<T>, string, string, string, ITokenizer, MusicGenOptions?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, ILossFunction<T>?)

Creates a MusicGen model using pretrained ONNX models for inference.

public MusicGenModel(NeuralNetworkArchitecture<T> architecture, string textEncoderPath, string languageModelPath, string encodecDecoderPath, ITokenizer tokenizer, MusicGenOptions? options = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, ILossFunction<T>? lossFunction = null)

Parameters

architecture NeuralNetworkArchitecture<T>

The neural network architecture configuration.

textEncoderPath string

Path to the T5 text encoder ONNX model.

languageModelPath string

Path to the transformer LM ONNX model.

encodecDecoderPath string

Path to the EnCodec decoder ONNX model.

tokenizer ITokenizer

T5 tokenizer for text processing (REQUIRED).

options MusicGenOptions

MusicGen configuration options.

optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>

Optional optimizer for fine-tuning.

lossFunction ILossFunction<T>

Optional loss function.

Exceptions

ArgumentException

Thrown when required paths are empty.

FileNotFoundException

Thrown when model files don't exist.

ArgumentNullException

Thrown when tokenizer is null.
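A loading sketch for the pretrained path. The ONNX file names and the tokenizer-creation helper are placeholders, not part of this API:

```csharp
// Sketch only: file paths and LoadT5Tokenizer() are hypothetical.
ITokenizer tokenizer = LoadT5Tokenizer();
var model = new MusicGenModel<float>(
    architecture,
    textEncoderPath: "text_encoder.onnx",
    languageModelPath: "language_model.onnx",
    encodecDecoderPath: "encodec_decoder.onnx",
    tokenizer: tokenizer); // required; null throws ArgumentNullException
```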

Properties

MaxDurationSeconds

Gets the maximum duration of audio that can be generated.

public double MaxDurationSeconds { get; }

Property Value

double

SampleRate

Gets the sample rate of generated audio.

public int SampleRate { get; }

Property Value

int

SupportsAudioContinuation

Gets whether this model supports audio continuation.

public bool SupportsAudioContinuation { get; }

Property Value

bool

SupportsAudioInpainting

Gets whether this model supports audio inpainting.

public bool SupportsAudioInpainting { get; }

Property Value

bool

SupportsTextToAudio

Gets whether this model supports text-to-audio generation.

public bool SupportsTextToAudio { get; }

Property Value

bool

SupportsTextToMusic

Gets whether this model supports text-to-music generation.

public bool SupportsTextToMusic { get; }

Property Value

bool
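Because the capability flags are exposed as properties, callers can branch on them before invoking a generation method. A sketch, assuming `model` is an already-constructed `MusicGenModel<float>`:

```csharp
// Sketch only: check capabilities before calling generation methods.
if (model.SupportsTextToMusic)
{
    var audio = model.GenerateMusic("Calm piano melody");
}

if (!model.SupportsAudioInpainting)
{
    // InpaintAudio is not supported by MusicGen; avoid calling it.
}
```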

Methods

ContinueAudio(Tensor<T>, string?, double, int, int?)

Continues existing audio by extending it.

public Tensor<T> ContinueAudio(Tensor<T> inputAudio, string? prompt = null, double extensionSeconds = 5, int numInferenceSteps = 100, int? seed = null)

Parameters

inputAudio Tensor<T>

Audio to continue from.

prompt string

Optional text guidance for continuation.

extensionSeconds double

How many seconds to add.

numInferenceSteps int

Not used in autoregressive generation.

seed int?

Random seed.

Returns

Tensor<T>

Extended audio (original + continuation).
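A usage sketch, assuming `existingAudio` is a `Tensor<float>` already holding audio at the model's sample rate:

```csharp
// Sketch only: `model` and `existingAudio` are assumed to exist.
Tensor<float> extended = model.ContinueAudio(
    existingAudio,
    prompt: "continue with a gentle string section",
    extensionSeconds: 5,
    seed: 42);
// `extended` contains the original audio followed by the continuation.
```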

CreateNewInstance()

Creates a new instance for cloning.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

Dispose(bool)

Disposes of resources.

protected override void Dispose(bool disposing)

Parameters

disposing bool

GenerateAudio(string, string?, double, int, double, int?)

Generates audio from a text description.

public Tensor<T> GenerateAudio(string prompt, string? negativePrompt = null, double durationSeconds = 5, int numInferenceSteps = 100, double guidanceScale = 3, int? seed = null)

Parameters

prompt string
negativePrompt string
durationSeconds double
numInferenceSteps int
guidanceScale double
seed int?

Returns

Tensor<T>

Remarks

MusicGen is optimized for music, not general audio. For best results, use GenerateMusic instead.

GenerateAudioAsync(string, string?, double, int, double, int?, CancellationToken)

Generates audio asynchronously.

public Task<Tensor<T>> GenerateAudioAsync(string prompt, string? negativePrompt = null, double durationSeconds = 5, int numInferenceSteps = 100, double guidanceScale = 3, int? seed = null, CancellationToken cancellationToken = default)

Parameters

prompt string
negativePrompt string
durationSeconds double
numInferenceSteps int
guidanceScale double
seed int?
cancellationToken CancellationToken

Returns

Task<Tensor<T>>
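An async usage sketch with cancellation, assuming an already-constructed `model`:

```csharp
// Sketch only: cancel generation if it runs longer than 60 seconds.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
Tensor<float> audio = await model.GenerateAudioAsync(
    "rain on a tin roof",
    durationSeconds: 5,
    cancellationToken: cts.Token);
```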

GenerateMusic(string, string?, double, int, double, int?)

Generates music from a text description.

public Tensor<T> GenerateMusic(string prompt, string? negativePrompt = null, double durationSeconds = 10, int numInferenceSteps = 100, double guidanceScale = 3, int? seed = null)

Parameters

prompt string

Text description of the desired music.

negativePrompt string

What to avoid in the generated music.

durationSeconds double

Duration of music to generate (max 30s).

numInferenceSteps int

Not used in autoregressive generation.

guidanceScale double

How closely to follow the prompt.

seed int?

Random seed for reproducibility.

Returns

Tensor<T>

Generated music waveform tensor.
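A fuller call sketch showing the optional parameters, assuming an already-constructed `model`:

```csharp
// Sketch only: negativePrompt steers generation away from unwanted
// content; a fixed seed makes the output reproducible.
Tensor<float> music = model.GenerateMusic(
    "upbeat jazz piano with brushed drums",
    negativePrompt: "vocals",
    durationSeconds: 10,      // capped at 30 seconds
    guidanceScale: 3.0,
    seed: 1234);
```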

GetDefaultOptions()

Gets default generation options.

public AudioGenerationOptions<T> GetDefaultOptions()

Returns

AudioGenerationOptions<T>

GetModelMetadata()

Gets model metadata.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

InitializeLayers()

Initializes the neural network layers following the golden standard pattern.

protected override void InitializeLayers()

Remarks

This method follows the AiDotNet golden standard pattern:

  1. First, check if the user provided custom layers via Architecture.Layers
  2. If custom layers exist, use them (allows full customization)
  3. Otherwise, use LayerHelper.CreateDefaultMusicGenLayers() for the standard architecture

For Beginners: This gives you flexibility:

  • Want standard MusicGen? Just create the model; it auto-configures.
  • Want a custom architecture? Pass your own layers in the Architecture.

InpaintAudio(Tensor<T>, Tensor<T>, string?, int, int?)

Inpainting is not supported by MusicGen.

public Tensor<T> InpaintAudio(Tensor<T> audio, Tensor<T> mask, string? prompt = null, int numInferenceSteps = 100, int? seed = null)

Parameters

audio Tensor<T>
mask Tensor<T>
prompt string
numInferenceSteps int
seed int?

Returns

Tensor<T>

PostprocessOutput(Tensor<T>)

Postprocesses model output.

protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)

Parameters

modelOutput Tensor<T>

Returns

Tensor<T>

Predict(Tensor<T>)

Makes a prediction using the model.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

Returns

Tensor<T>

PreprocessAudio(Tensor<T>)

Preprocesses raw audio for model input.

protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)

Parameters

rawAudio Tensor<T>

Returns

Tensor<T>

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

Train(Tensor<T>, Tensor<T>)

Trains the model on input data.

public override void Train(Tensor<T> input, Tensor<T> expectedOutput)

Parameters

input Tensor<T>
expectedOutput Tensor<T>

UpdateParameters(Vector<T>)

Updates model parameters.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>