Class DiffWaveModel<T>
DiffWave model for high-quality audio waveform synthesis using diffusion.
public class DiffWaveModel<T> : DiffusionModelBase<T>, IDiffusionModel<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T
The numeric type used for calculations.
Inheritance
DiffusionModelBase<T> → DiffWaveModel<T>
Examples
// Create a DiffWave model with default settings
var diffWave = new DiffWaveModel<float>();

// Generate unconditional audio
var audio = diffWave.GenerateAudio(
    sampleLength: 16000,    // 1 second at 16 kHz
    numInferenceSteps: 50);

// Generate audio from a mel-spectrogram (vocoder mode);
// ComputeMelSpectrogram is a placeholder for your acoustic model's output
var melSpec = ComputeMelSpectrogram(text);
var vocodedAudio = diffWave.GenerateFromMelSpectrogram(melSpec);
Remarks
DiffWave is a versatile diffusion model for raw audio waveform synthesis. It uses a non-autoregressive architecture with dilated convolutions to achieve high-quality audio generation with fast inference.
For Beginners: DiffWave generates audio (like speech or music) directly as a waveform - the actual audio signal that speakers play.
Unlike spectrograms (visual representations of sound), DiffWave creates:
- Raw audio samples that can be played directly
- High-quality, natural-sounding audio
- Various audio types: speech, music, sound effects
How it works:
- Start with random noise (static)
- Gradually refine it into clear audio
- Use dilated convolutions to understand audio context
- Optionally condition on mel-spectrograms or text
Applications:
- Text-to-speech synthesis
- Music generation
- Audio super-resolution
- Neural vocoders
Technical details:
- Non-autoregressive: generates all samples in parallel
- Dilated convolutions: capture long-range audio dependencies (see the receptive-field sketch below)
- Mel-spectrogram conditioning: enables speech synthesis
- Fast inference compared to autoregressive models
- Supports variable-length audio generation
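To make "long-range dependencies" concrete: with the default residualLayers = 30 and dilationCycle = 10, the dilation cycles through 1, 2, 4, ..., 512 three times. A rough receptive-field calculation (an illustration assuming kernel size 3, as in the DiffWave paper; not library code):

int residualLayers = 30, dilationCycle = 10, kernelSize = 3;
int receptiveField = 1;
for (int i = 0; i < residualLayers; i++)
{
    // Dilations repeat 1, 2, 4, ..., 512 every dilationCycle layers.
    int dilation = 1 << (i % dilationCycle);
    receptiveField += (kernelSize - 1) * dilation;
}
// receptiveField == 1 + 2 * 3 * 1023 = 6139 samples, about 0.28 s at 22,050 Hz.
Console.WriteLine($"Receptive field: {receptiveField} samples");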
Reference: Kong et al., "DiffWave: A Versatile Diffusion Model for Audio Synthesis", 2020
Constructors
DiffWaveModel()
Initializes a new instance of DiffWaveModel with default parameters.
public DiffWaveModel()
DiffWaveModel(DiffusionModelOptions<T>?, INoiseScheduler<T>?, int, int, int, int, int, int?)
Initializes a new instance of DiffWaveModel with custom parameters.
public DiffWaveModel(DiffusionModelOptions<T>? options = null, INoiseScheduler<T>? scheduler = null, int residualChannels = 64, int residualLayers = 30, int dilationCycle = 10, int melChannels = 80, int sampleRate = 22050, int? seed = null)
Parameters
options (DiffusionModelOptions<T>?): Configuration options.
scheduler (INoiseScheduler<T>?): Optional custom noise scheduler.
residualChannels (int): Number of residual channels.
residualLayers (int): Number of residual layers.
dilationCycle (int): Dilation cycle length.
melChannels (int): Number of mel-spectrogram channels.
sampleRate (int): Audio sample rate in Hz.
seed (int?): Optional random seed.
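For example, a configuration for a 16 kHz vocoder (the values here are illustrative, not recommended settings):

var model = new DiffWaveModel<float>(
    residualChannels: 128,   // wider residual blocks than the default 64
    residualLayers: 30,
    dilationCycle: 10,       // dilations cycle through 1, 2, 4, ..., 512
    melChannels: 80,
    sampleRate: 16000,
    seed: 42);               // fixed seed for reproducible sampling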
Properties
ParameterCount
Gets the number of parameters in the model.
public override int ParameterCount { get; }
Property Value
int
Remarks
This property returns the total count of trainable parameters in the model. It's useful for understanding model complexity and memory requirements.
SampleRate
Gets the sample rate in Hz.
public int SampleRate { get; }
Property Value
int
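ParameterCount and SampleRate are convenient for quick sanity checks, for example converting a desired duration into a sample count:

var model = new DiffWaveModel<float>();          // default sample rate: 22,050 Hz
Console.WriteLine($"{model.ParameterCount} trainable parameters");
int twoSeconds = model.SampleRate * 2;           // 44,100 samples
var clip = model.GenerateAudio(sampleLength: twoSeconds);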
Methods
Clone()
Creates a deep copy of the model.
public override IDiffusionModel<T> Clone()
Returns
- IDiffusionModel<T>
A new instance with the same parameters.
DeepCopy()
Creates a deep copy of this object.
public override IFullModel<T, Tensor<T>, Tensor<T>> DeepCopy()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
A deep copy of the model with the same parameters.
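Both copy methods return independent instances; which one to call depends on the interface you need:

var original = new DiffWaveModel<float>();
IDiffusionModel<float> clone = original.Clone();  // diffusion-model view of the copy
var fullCopy = original.DeepCopy();               // full-model view of the copy
// Mutating either copy leaves the original unchanged.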
GenerateAudio(int, int, int?)
Generates unconditional audio.
public virtual Tensor<T> GenerateAudio(int sampleLength, int numInferenceSteps = 50, int? seed = null)
Parameters
sampleLength (int): Length of the audio in samples.
numInferenceSteps (int): Number of denoising steps.
seed (int?): Optional random seed.
Returns
- Tensor<T>
Generated audio waveform tensor [1, sampleLength].
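For example, half a second of audio at the default 22,050 Hz sample rate:

var model = new DiffWaveModel<float>();
var waveform = model.GenerateAudio(
    sampleLength: 11025,      // 0.5 s at 22,050 Hz
    numInferenceSteps: 100,   // more steps trade speed for quality
    seed: 123);               // fixed seed for reproducibility
// waveform has shape [1, 11025]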
GenerateBatch(int, int, int, int?)
Generates a batch of audio samples.
public virtual Tensor<T> GenerateBatch(int batchSize, int sampleLength, int numInferenceSteps = 50, int? seed = null)
Parameters
batchSize (int): Number of samples to generate.
sampleLength (int): Length of each sample in samples.
numInferenceSteps (int): Number of denoising steps.
seed (int?): Optional random seed.
Returns
- Tensor<T>
Batch of audio tensors [batch, sampleLength].
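Generating several clips in one call shares the denoising loop across the batch:

var model = new DiffWaveModel<float>();
// Four independent one-second clips; result shape [4, 22050].
var batch = model.GenerateBatch(
    batchSize: 4,
    sampleLength: 22050,
    numInferenceSteps: 50,
    seed: 7);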
GenerateFromMelSpectrogram(Tensor<T>?, int?, int, int?)
Generates audio from a mel-spectrogram (vocoder mode).
public virtual Tensor<T> GenerateFromMelSpectrogram(Tensor<T>? melSpectrogram = null, int? sampleLength = null, int numInferenceSteps = 50, int? seed = null)
Parameters
melSpectrogram (Tensor<T>?): Mel-spectrogram tensor of shape [batch, melChannels, frames].
sampleLength (int?): Optional target sample length.
numInferenceSteps (int): Number of denoising steps.
seed (int?): Optional random seed.
Returns
- Tensor<T>
Generated audio waveform tensor.
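A vocoder-style sketch. The mel-spectrogram would come from an upstream acoustic model (e.g. a Tacotron-style TTS network); GetMelFromAcousticModel is a hypothetical placeholder, and the 256-sample hop length is an assumption about the upstream features:

var vocoder = new DiffWaveModel<float>(melChannels: 80, sampleRate: 22050);
Tensor<float> mel = GetMelFromAcousticModel();  // hypothetical helper: [1, 80, frames]
int frames = 400;                               // mel frame count from the acoustic model
var audio = vocoder.GenerateFromMelSpectrogram(
    melSpectrogram: mel,
    sampleLength: frames * 256,                 // frames × hop length, assuming hop = 256
    numInferenceSteps: 50);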
GetModelMetadata()
Retrieves metadata and performance metrics about the trained model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
An object containing metadata and performance metrics about the trained model.
Remarks
This method provides information about the model's structure, parameters, and performance metrics.
For Beginners: Model metadata is like a report card for your machine learning model.
Just as a report card shows how well a student is performing in different subjects, model metadata shows how well your model is performing and provides details about its structure.
This information typically includes:
- Accuracy measures: How well do the model's predictions match actual values?
- Error metrics: How far off are the model's predictions on average?
- Model parameters: What patterns did the model learn from the data?
- Training information: How long did training take? How many iterations were needed?
For example, in a house price prediction model, metadata might include:
- Average prediction error (e.g., off by $15,000 on average)
- How strongly each feature (bedrooms, location) influences the prediction
- How well the model fits the training data
This information helps you understand your model's strengths and weaknesses, and decide if it's ready to use or needs more training.
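A minimal usage sketch; the exact fields exposed by ModelMetadata<T> depend on the library version, so only the call itself is shown:

var model = new DiffWaveModel<float>();
ModelMetadata<float> metadata = model.GetModelMetadata();
// Pair the metadata with the raw parameter count for a quick structural summary.
Console.WriteLine($"Trainable parameters: {model.ParameterCount}");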
GetParameters()
Gets the parameters that can be optimized.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing the model's trainable parameters.
PredictNoise(Tensor<T>, int)
Predicts the noise in a noisy sample at a given timestep.
public override Tensor<T> PredictNoise(Tensor<T> noisySample, int timestep)
Parameters
noisySample (Tensor<T>): The noisy input sample.
timestep (int): The current timestep in the diffusion process.
Returns
- Tensor<T>
The predicted noise tensor.
Remarks
This is the core prediction that the model learns. Given a noisy sample at timestep t, predict what noise was added to create it.
For Beginners: The model listens to a noisy audio clip and guesses "what noise was added to make it sound like this?" That prediction is then used to remove the noise and recover a cleaner waveform.
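A conceptual sketch of how a sampler uses this method. Only the PredictNoise call is real API here; CreateGaussianNoise is a hypothetical helper, the scheduler arithmetic is elided because it lives in INoiseScheduler<T>, and 1000 training timesteps are assumed:

var model = new DiffWaveModel<float>();
Tensor<float> sample = CreateGaussianNoise(1, 16000);  // hypothetical: noise of shape [1, 16000]
for (int t = 999; t >= 0; t -= 20)                     // 50 evenly spaced timesteps
{
    Tensor<float> predictedNoise = model.PredictNoise(sample, t);
    // A scheduler would now combine sample, predictedNoise, and t
    // to produce a slightly less noisy sample for the next iteration.
}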
SetParameters(Vector<T>)
Sets the model parameters.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The parameter vector to set.
Remarks
This method allows direct modification of the model's internal parameters, which is useful for optimization algorithms that update parameters iteratively. If the length of parameters does not match ParameterCount, an ArgumentException is thrown.
Exceptions
- ArgumentException
Thrown when the length of parameters does not match ParameterCount.
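A round-trip sketch, e.g. for an external optimizer that works on a flat parameter vector:

var model = new DiffWaveModel<float>();
Vector<float> parameters = model.GetParameters();
// ... an optimizer updates the vector here ...
model.SetParameters(parameters);  // length must equal model.ParameterCount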