Class RiffusionModel<T>
Riffusion model for music generation via spectrogram diffusion.
public class RiffusionModel<T> : LatentDiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
- T: The numeric type used for calculations.
- Inheritance
LatentDiffusionModelBase<T> → RiffusionModel<T>
Examples
// Create a Riffusion model
var riffusion = new RiffusionModel<float>();

// Generate music from text
var spectrogram = riffusion.GenerateSpectrogram(
    prompt: "jazz piano solo, smooth and relaxing",
    durationSeconds: 5.0);

// Convert to audio
var audio = riffusion.SpectrogramToAudio(spectrogram);

// Interpolate between two styles
var interpolated = riffusion.InterpolateStyles(
    promptA: "upbeat electronic dance music",
    promptB: "calm ambient soundscape",
    alpha: 0.5);
Remarks
Riffusion generates music by treating audio spectrograms as images and using Stable Diffusion to generate them. The resulting spectrograms are then converted back to audio using the Griffin-Lim algorithm or neural vocoders.
For Beginners: Riffusion creates music by first generating a "picture" of the sound (spectrogram), then converting that picture back into actual audio.
How it works:
- You describe the music you want: "jazz piano solo"
- Riffusion generates a spectrogram (visual representation of sound)
- The spectrogram is converted to playable audio
Key features:
- Text-to-music generation
- Style interpolation (blend two music styles)
- Real-time streaming generation
- Works with any Stable Diffusion checkpoint
What makes it unique:
- Treats audio generation as an image generation problem
- Can leverage all SD techniques: ControlNet, img2img, etc.
- Fast inference compared to autoregressive music models
Technical details:
- Uses mel-spectrograms with specific parameters
- Typically 512x512 spectrogram images
- Griffin-Lim or neural vocoder for audio reconstruction
- Supports seed-based interpolation for smooth transitions
- Compatible with LoRA adapters for style transfer
Reference: Based on the Riffusion project (riffusion.com).
Constructors
RiffusionModel()
Initializes a new instance of RiffusionModel with default parameters.
public RiffusionModel()
RiffusionModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, UNetNoisePredictor<T>?, StandardVAE<T>?, IConditioningModule<T>?, SpectrogramConfig?, int?)
Initializes a new instance of RiffusionModel with custom parameters.
public RiffusionModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, UNetNoisePredictor<T>? unet = null, StandardVAE<T>? vae = null, IConditioningModule<T>? conditioner = null, SpectrogramConfig? spectrogramConfig = null, int? seed = null)
Parameters
- options (DiffusionModelOptions<T>?): Configuration options.
- scheduler (INoiseScheduler<T>?): Optional custom scheduler.
- unet (UNetNoisePredictor<T>?): Optional custom U-Net.
- vae (StandardVAE<T>?): Optional custom VAE.
- conditioner (IConditioningModule<T>?): Optional text conditioning module.
- spectrogramConfig (SpectrogramConfig?): Spectrogram configuration.
- seed (int?): Optional random seed.
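For example, a caller might supply a custom spectrogram configuration and a fixed seed, as in the sketch below. The SpectrogramConfig property names shown (SampleRate, NFft, HopLength) are illustrative assumptions, not confirmed members of the type; consult the SpectrogramConfig reference for the actual settings.

// A minimal sketch, assuming SpectrogramConfig exposes these settable properties.
var config = new SpectrogramConfig
{
    SampleRate = 44100, // assumed property: audio sample rate in Hz
    NFft = 2048,        // assumed property: FFT window size
    HopLength = 512     // assumed property: hop between STFT frames
};

var riffusion = new RiffusionModel<float>(
    spectrogramConfig: config,
    seed: 42); // fixed seed for reproducible generation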
Properties
Conditioner
Gets the conditioning module (optional, for conditioned generation).
public override IConditioningModule<T>? Conditioner { get; }
Property Value
- IConditioningModule<T>?
LatentChannels
Gets the number of latent channels.
public override int LatentChannels { get; }
Property Value
- int
Remarks
Typically 4 for Stable Diffusion models.
NoisePredictor
Gets the noise predictor model (U-Net, DiT, etc.).
public override INoisePredictor<T> NoisePredictor { get; }
Property Value
- INoisePredictor<T>
ParameterCount
Gets the number of parameters in the model.
public override int ParameterCount { get; }
Property Value
- int
Remarks
This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
SpectrogramConfiguration
Gets the spectrogram configuration.
public SpectrogramConfig SpectrogramConfiguration { get; }
Property Value
- SpectrogramConfig
VAE
Gets the VAE model used for encoding and decoding.
public override IVAEModel<T> VAE { get; }
Property Value
- IVAEModel<T>
Methods
Clone()
Creates a deep copy of the model.
public override IDiffusionModel<T> Clone()
Returns
- IDiffusionModel<T>
A new instance with the same parameters.
DeepCopy()
Creates a deep copy of this object.
public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A deep copy of this model.
GenerateAudio(string, string?, double, int, double?, int?)
Generates audio directly from a text prompt.
public virtual Tensor<T> GenerateAudio(string prompt, string? negativePrompt = null, double durationSeconds = 5, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)
Parameters
- prompt (string): Text description of the desired music.
- negativePrompt (string?): Optional negative prompt.
- durationSeconds (double): Desired audio duration in seconds.
- numInferenceSteps (int): Number of denoising steps.
- guidanceScale (double?): Classifier-free guidance scale.
- seed (int?): Optional random seed.
Returns
- Tensor<T>
Audio waveform tensor.
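A minimal usage sketch using only the parameters documented above (the prompt text and guidance value are arbitrary illustrations):

// Sketch: generate a 10-second clip in a single call.
var audio = riffusion.GenerateAudio(
    prompt: "lo-fi hip hop beat, mellow and warm",
    negativePrompt: "distortion, clipping",
    durationSeconds: 10.0,
    numInferenceSteps: 50,
    guidanceScale: 7.5,
    seed: 42);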
GenerateSpectrogram(string, string?, double, int, double?, int?)
Generates a spectrogram from a text prompt.
public virtual Tensor<T> GenerateSpectrogram(string prompt, string? negativePrompt = null, double durationSeconds = 5, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)
Parameters
- prompt (string): Text description of the desired music.
- negativePrompt (string?): Optional negative prompt.
- durationSeconds (double): Desired audio duration in seconds.
- numInferenceSteps (int): Number of denoising steps.
- guidanceScale (double?): Classifier-free guidance scale.
- seed (int?): Optional random seed.
Returns
- Tensor<T>
Generated spectrogram tensor.
GetParameters()
Gets the parameters that can be optimized.
public override Vector<T> GetParameters()
Returns
- Vector<T>
The model's optimizable parameters as a vector.
InterpolateStyles(string, string, double, double, int, double?, int?)
Interpolates between two music styles.
public virtual Tensor<T> InterpolateStyles(string promptA, string promptB, double alpha = 0.5, double durationSeconds = 5, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)
Parameters
- promptA (string): First style description.
- promptB (string): Second style description.
- alpha (double): Interpolation factor (0 = promptA, 1 = promptB).
- durationSeconds (double): Audio duration in seconds.
- numInferenceSteps (int): Number of denoising steps.
- guidanceScale (double?): Guidance scale.
- seed (int?): Optional random seed.
Returns
- Tensor<T>
Interpolated spectrogram.
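Because interpolation is seed-based (see the technical details in the class remarks), sweeping alpha with a fixed seed yields a sequence of smoothly blended clips. A sketch, with arbitrary prompts:

// Sketch: hold the seed constant and sweep alpha so consecutive
// clips transition smoothly from style A to style B.
double[] alphas = { 0.0, 0.25, 0.5, 0.75, 1.0 };
var clips = new Tensor<float>[alphas.Length];
for (int i = 0; i < alphas.Length; i++)
{
    var spec = riffusion.InterpolateStyles(
        promptA: "upbeat electronic dance music",
        promptB: "calm ambient soundscape",
        alpha: alphas[i],
        seed: 42); // same seed keeps the latent trajectory aligned
    clips[i] = riffusion.SpectrogramToAudio(spec);
}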
SetParameters(Vector<T>)
Sets the model parameters.
public override void SetParameters(Vector<T> parameters)
Parameters
- parameters (Vector<T>): The parameter vector to set.
Remarks
This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
Exceptions
- ArgumentException
Thrown when the length of parameters does not match ParameterCount.
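A sketch of the snapshot-and-restore pattern that GetParameters() and SetParameters(Vector<T>) enable (the perturbation step is a placeholder):

// Sketch: snapshot parameters, experiment, then roll back.
Vector<float> snapshot = riffusion.GetParameters();

// ... apply an optimizer step or manual perturbation here ...

riffusion.SetParameters(snapshot); // restore; the vector length
                                   // must equal ParameterCount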
SpectrogramToAudio(Tensor<T>)
Converts a spectrogram to audio waveform.
public virtual Tensor<T> SpectrogramToAudio(Tensor<T> spectrogram)
Parameters
- spectrogram (Tensor<T>): Input spectrogram tensor.
Returns
- Tensor<T>
Audio waveform tensor.