Class RiffusionModel<T>

Namespace
AiDotNet.Diffusion.Models
Assembly
AiDotNet.dll

Riffusion model for music generation via spectrogram diffusion.

public class RiffusionModel<T> : LatentDiffusionModelBase<T>, ILatentDiffusionModel<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
LatentDiffusionModelBase<T>
RiffusionModel<T>
Implements
ILatentDiffusionModel<T>
IDiffusionModel<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>

Examples

// Create a Riffusion model
var riffusion = new RiffusionModel<float>();

// Generate music from text
var spectrogram = riffusion.GenerateSpectrogram(
    prompt: "jazz piano solo, smooth and relaxing",
    durationSeconds: 5.0);

// Convert to audio
var audio = riffusion.SpectrogramToAudio(spectrogram);

// Interpolate between two styles
var interpolated = riffusion.InterpolateStyles(
    promptA: "upbeat electronic dance music",
    promptB: "calm ambient soundscape",
    alpha: 0.5);

Remarks

Riffusion generates music by treating audio spectrograms as images and using Stable Diffusion to generate them. The resulting spectrograms are then converted back to audio using the Griffin-Lim algorithm or neural vocoders.

For Beginners: Riffusion creates music by first generating a "picture" of the sound (spectrogram), then converting that picture back into actual audio.

How it works:

  1. You describe the music you want: "jazz piano solo"
  2. Riffusion generates a spectrogram (visual representation of sound)
  3. The spectrogram is converted to playable audio

Key features:

  • Text-to-music generation
  • Style interpolation (blend two music styles)
  • Real-time streaming generation
  • Works with any Stable Diffusion checkpoint

What makes it unique:

  • Treats audio generation as an image generation problem
  • Can leverage all SD techniques: ControlNet, img2img, etc.
  • Fast inference compared to autoregressive music models

Technical details:

  • Uses mel-spectrograms with specific parameters
  • Typically 512x512 spectrogram images
  • Griffin-Lim or neural vocoder for audio reconstruction
  • Supports seed-based interpolation for smooth transitions
  • Compatible with LoRA adapters for style transfer
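As a worked example of how spectrogram geometry relates to clip length (the sample rate and hop length below are illustrative values, not guaranteed defaults of SpectrogramConfig):

```csharp
// Illustrative arithmetic only: the actual SpectrogramConfig values may differ.
int sampleRate = 44100;  // audio samples per second
int hopLength = 441;     // samples advanced per spectrogram column
int numFrames = 512;     // spectrogram width in columns

double secondsPerFrame = (double)hopLength / sampleRate;  // 0.01 s per column
double clipSeconds = numFrames * secondsPerFrame;         // ~5.12 s per 512-frame image
Console.WriteLine($"{clipSeconds:F2} s per {numFrames}-frame spectrogram");
```

This is why a single 512x512 spectrogram corresponds to a clip of roughly five seconds; longer durations require generating and stitching multiple images.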

Reference: Based on the Riffusion project (riffusion.com).

Constructors

RiffusionModel()

Initializes a new instance of RiffusionModel with default parameters.

public RiffusionModel()

RiffusionModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, UNetNoisePredictor<T>?, StandardVAE<T>?, IConditioningModule<T>?, SpectrogramConfig?, int?)

Initializes a new instance of RiffusionModel with custom parameters.

public RiffusionModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, UNetNoisePredictor<T>? unet = null, StandardVAE<T>? vae = null, IConditioningModule<T>? conditioner = null, SpectrogramConfig? spectrogramConfig = null, int? seed = null)

Parameters

options DiffusionModelOptions<T>

Configuration options.

scheduler INoiseScheduler<T>

Optional custom scheduler.

unet UNetNoisePredictor<T>

Optional custom U-Net.

vae StandardVAE<T>

Optional custom VAE.

conditioner IConditioningModule<T>

Optional text conditioning module.

spectrogramConfig SpectrogramConfig

Spectrogram configuration.

seed int?

Optional random seed.
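A minimal construction sketch using the custom-parameter overload. SpectrogramConfig is shown with its parameterless constructor only, since its members are not documented here; every argument other than seed falls back to the model's defaults when omitted:

```csharp
// Sketch: construct with a fixed seed for reproducible generation.
// All unspecified components (scheduler, U-Net, VAE, conditioner)
// fall back to their defaults.
var model = new RiffusionModel<float>(
    spectrogramConfig: new SpectrogramConfig(),
    seed: 42);
```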

Properties

Conditioner

Gets the conditioning module (optional, for conditioned generation).

public override IConditioningModule<T>? Conditioner { get; }

Property Value

IConditioningModule<T>

LatentChannels

Gets the number of latent channels.

public override int LatentChannels { get; }

Property Value

int

Remarks

Typically 4 for Stable Diffusion models.

NoisePredictor

Gets the noise predictor model (U-Net, DiT, etc.).

public override INoisePredictor<T> NoisePredictor { get; }

Property Value

INoisePredictor<T>

ParameterCount

Gets the number of parameters in the model.

public override int ParameterCount { get; }

Property Value

int

Remarks

This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.

SpectrogramConfiguration

Gets the spectrogram configuration.

public SpectrogramConfig SpectrogramConfiguration { get; }

Property Value

SpectrogramConfig

VAE

Gets the VAE model used for encoding and decoding.

public override IVAEModel<T> VAE { get; }

Property Value

IVAEModel<T>

Methods

Clone()

Creates a deep copy of the model.

public override IDiffusionModel<T> Clone()

Returns

IDiffusionModel<T>

A new instance with the same parameters.

DeepCopy()

Creates a deep copy of this object.

public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

GenerateAudio(string, string?, double, int, double?, int?)

Generates audio directly from a text prompt.

public virtual Tensor<T> GenerateAudio(string prompt, string? negativePrompt = null, double durationSeconds = 5, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

prompt string

Text description of the desired music.

negativePrompt string

Optional negative prompt.

durationSeconds double

Desired audio duration in seconds.

numInferenceSteps int

Denoising steps.

guidanceScale double?

Guidance scale.

seed int?

Random seed.

Returns

Tensor<T>

Audio waveform tensor.
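A usage sketch: GenerateAudio combines spectrogram generation and audio conversion in one call, so no separate SpectrogramToAudio step is needed (the prompt and argument values here are illustrative):

```csharp
// One call from prompt to waveform; fixing the seed makes the
// result reproducible across runs.
var riffusion = new RiffusionModel<float>();
var audio = riffusion.GenerateAudio(
    prompt: "lo-fi hip hop beat with vinyl crackle",
    durationSeconds: 5.0,
    numInferenceSteps: 50,
    seed: 42);
```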

GenerateSpectrogram(string, string?, double, int, double?, int?)

Generates a spectrogram from a text prompt.

public virtual Tensor<T> GenerateSpectrogram(string prompt, string? negativePrompt = null, double durationSeconds = 5, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

prompt string

Text description of the desired music.

negativePrompt string

Optional negative prompt.

durationSeconds double

Desired audio duration in seconds.

numInferenceSteps int

Number of denoising steps.

guidanceScale double?

Classifier-free guidance scale.

seed int?

Optional random seed.

Returns

Tensor<T>

Generated spectrogram tensor.
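A sketch showing the negativePrompt parameter, which the class-level example omits (prompt text is illustrative):

```csharp
// The negative prompt steers generation away from unwanted qualities,
// mirroring classifier-free guidance in image models.
var riffusion = new RiffusionModel<float>();
var spectrogram = riffusion.GenerateSpectrogram(
    prompt: "acoustic guitar fingerpicking",
    negativePrompt: "distortion, noise",
    durationSeconds: 5.0);
var audio = riffusion.SpectrogramToAudio(spectrogram);
```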

GetParameters()

Gets the parameters that can be optimized.

public override Vector<T> GetParameters()

Returns

Vector<T>

InterpolateStyles(string, string, double, double, int, double?, int?)

Interpolates between two music styles.

public virtual Tensor<T> InterpolateStyles(string promptA, string promptB, double alpha = 0.5, double durationSeconds = 5, int numInferenceSteps = 50, double? guidanceScale = null, int? seed = null)

Parameters

promptA string

First style description.

promptB string

Second style description.

alpha double

Interpolation factor (0 = promptA, 1 = promptB).

durationSeconds double

Audio duration in seconds.

numInferenceSteps int

Denoising steps.

guidanceScale double?

Guidance scale.

seed int?

Random seed.

Returns

Tensor<T>

Interpolated spectrogram.
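A sketch of sweeping alpha to build a gradual transition between two styles (prompts and step size are illustrative). Reusing one seed across the sweep keeps each frame structurally aligned with its neighbors, which is what makes the transition smooth:

```csharp
// Sweep alpha from pure style A (0.0) to pure style B (1.0);
// each interpolated spectrogram can be converted to audio and
// the clips concatenated into one evolving track.
var riffusion = new RiffusionModel<float>();
for (double alpha = 0.0; alpha <= 1.0; alpha += 0.25)
{
    var frame = riffusion.InterpolateStyles(
        promptA: "solo cello, melancholic",
        promptB: "bright synth arpeggios",
        alpha: alpha,
        seed: 7); // same seed keeps the transition coherent
    var audio = riffusion.SpectrogramToAudio(frame);
}
```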

SetParameters(Vector<T>)

Sets the model parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

The parameter vector to set.

Remarks

This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.

Exceptions

ArgumentException

Thrown when the length of parameters does not match ParameterCount.
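A sketch of the get/modify/set round trip an optimizer would perform. The vector passed back must have exactly ParameterCount elements or SetParameters throws:

```csharp
// Round trip: read the current parameters, then write them back.
// Any vector whose length differs from model.ParameterCount
// causes SetParameters to throw ArgumentException.
var model = new RiffusionModel<float>();
var parameters = model.GetParameters(); // length == model.ParameterCount
model.SetParameters(parameters);        // valid: lengths match
```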

SpectrogramToAudio(Tensor<T>)

Converts a spectrogram to audio waveform.

public virtual Tensor<T> SpectrogramToAudio(Tensor<T> spectrogram)

Parameters

spectrogram Tensor<T>

Input spectrogram tensor.

Returns

Tensor<T>

Audio waveform tensor.