Interface IAudioGenerator<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Interface for audio generation models that create audio from text descriptions or other conditions.

public interface IAudioGenerator<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Remarks

Audio generation models create sounds, music, and audio effects from various inputs. Unlike text-to-speech (TTS), which focuses on speech, audio generators can produce any type of sound, including music, environmental sounds, and sound effects.

For Beginners: Audio generation is like having an artist who can create any sound you describe.

How audio generation works:

  1. You provide a description ("A dog barking in a park")
  2. The model generates audio features that match the description
  3. The features are converted to playable audio

Types of audio generation:

  • Text-to-Audio: "Thunder during a storm" creates thunder sounds
  • Text-to-Music: "Upbeat jazz piano" creates music
  • Audio Inpainting: Fill in missing parts of audio
  • Audio Continuation: Extend existing audio naturally

Common use cases:

  • Video game sound effects
  • Film and media production
  • Music composition assistance
  • Podcast and content creation

This interface extends IFullModel<T, TInput, TOutput> for Tensor-based audio processing.

Properties

IsOnnxMode

Gets whether this model is running in ONNX inference mode.

bool IsOnnxMode { get; }

Property Value

bool

Remarks

When true, the model uses pre-trained ONNX weights for inference. When false, the model can be trained from scratch using the neural network infrastructure.

MaxDurationSeconds

Gets the maximum duration of audio that can be generated in seconds.

double MaxDurationSeconds { get; }

Property Value

double

SampleRate

Gets the sample rate of generated audio.

int SampleRate { get; }

Property Value

int

Remarks

Common values: 16000 Hz (low quality), 22050 Hz (medium), 44100 Hz (CD quality).
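The sample rate fixes the number of samples per second of audio, so the length of a generated waveform follows directly from it. A minimal sketch of the arithmetic (plain C#, no AiDotNet types involved):

```csharp
// Samples in a clip = sample rate (samples/second) * duration (seconds).
int sampleRate = 44100;          // CD quality
double durationSeconds = 5.0;
int sampleCount = (int)(sampleRate * durationSeconds); // 220500 samples
```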

SupportsAudioContinuation

Gets whether this model supports audio continuation.

bool SupportsAudioContinuation { get; }

Property Value

bool

SupportsAudioInpainting

Gets whether this model supports audio inpainting.

bool SupportsAudioInpainting { get; }

Property Value

bool

SupportsTextToAudio

Gets whether this model supports text-to-audio generation.

bool SupportsTextToAudio { get; }

Property Value

bool

SupportsTextToMusic

Gets whether this model supports text-to-music generation.

bool SupportsTextToMusic { get; }

Property Value

bool
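Since an implementation may support only a subset of these modes, callers can branch on the capability flags before choosing a generation method. A sketch, assuming `generator` is an `IAudioGenerator<float>` obtained elsewhere:

```csharp
if (generator.SupportsTextToMusic)
{
    // Preferred path: a model trained specifically for music.
    Tensor<float> music = generator.GenerateMusic("Upbeat jazz piano");
}
else if (generator.SupportsTextToAudio)
{
    // Fallback: general text-to-audio generation.
    Tensor<float> audio = generator.GenerateAudio("Upbeat jazz piano");
}
```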

Methods

ContinueAudio(Tensor<T>, string?, double, int, int?)

Continues existing audio to extend it naturally.

Tensor<T> ContinueAudio(Tensor<T> inputAudio, string? prompt = null, double extensionSeconds = 5, int numInferenceSteps = 100, int? seed = null)

Parameters

inputAudio Tensor<T>

The audio to continue from.

prompt string

Optional text guidance for continuation.

extensionSeconds double

How many seconds to add.

numInferenceSteps int

Number of generation steps.

seed int?

Random seed for reproducibility.

Returns

Tensor<T>

Extended audio waveform (original + continuation).

Remarks

For Beginners: This extends audio by generating more audio that follows on naturally.

  • Input: 5 seconds of guitar
  • Output: the original clip + 10 more seconds in the same style

Exceptions

NotSupportedException

Thrown if continuation is not supported.
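A usage sketch based on the signature above, assuming `generator` is an `IAudioGenerator<float>` and `inputAudio` holds a 5-second guitar clip:

```csharp
if (!generator.SupportsAudioContinuation)
    throw new NotSupportedException("This model cannot continue audio.");

// Extend the clip by 10 seconds in the same style.
Tensor<float> extended = generator.ContinueAudio(
    inputAudio,
    prompt: "acoustic guitar, same tempo",   // optional guidance
    extensionSeconds: 10,
    seed: 42);                               // fixed seed for reproducibility
```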

GenerateAudio(string, string?, double, int, double, int?)

Generates audio from a text description.

Tensor<T> GenerateAudio(string prompt, string? negativePrompt = null, double durationSeconds = 5, int numInferenceSteps = 100, double guidanceScale = 3, int? seed = null)

Parameters

prompt string

Text description of the desired audio.

negativePrompt string

What to avoid in the generated audio.

durationSeconds double

Length of audio to generate.

numInferenceSteps int

Number of generation steps (more = higher quality).

guidanceScale double

How closely to follow the prompt (higher = more literal).

seed int?

Random seed for reproducibility.

Returns

Tensor<T>

Generated audio waveform tensor [samples] or [channels, samples].

Remarks

For Beginners: This creates sound effects or ambient audio from descriptions.

  • prompt: "Ocean waves crashing on a beach" creates wave sounds
  • prompt: "Birds chirping in a forest" creates bird sounds
  • negativePrompt: "No human voices" prevents speech in the output
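Putting the parameters together, a hedged usage sketch (the `generator` instance is assumed to be an `IAudioGenerator<float>` obtained elsewhere):

```csharp
// Five seconds of beach ambience, steered away from speech.
Tensor<float> waves = generator.GenerateAudio(
    prompt: "Ocean waves crashing on a beach",
    negativePrompt: "No human voices",
    durationSeconds: 5,
    numInferenceSteps: 100, // more steps = higher quality, slower
    guidanceScale: 3,       // higher = follows the prompt more literally
    seed: 42);              // fixed seed for reproducible output
```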

GenerateAudioAsync(string, string?, double, int, double, int?, CancellationToken)

Generates audio from a text description asynchronously.

Task<Tensor<T>> GenerateAudioAsync(string prompt, string? negativePrompt = null, double durationSeconds = 5, int numInferenceSteps = 100, double guidanceScale = 3, int? seed = null, CancellationToken cancellationToken = default)

Parameters

prompt string

Text description of the desired audio.

negativePrompt string

What to avoid in the generated audio.

durationSeconds double

Length of audio to generate.

numInferenceSteps int

Number of generation steps.

guidanceScale double

How closely to follow the prompt.

seed int?

Random seed for reproducibility.

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task<Tensor<T>>

Generated audio waveform tensor.
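A sketch of the asynchronous variant with cancellation, assuming `generator` is an `IAudioGenerator<float>` and the call runs inside an async method:

```csharp
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));

// Generate without blocking the caller; give up after 2 minutes.
Tensor<float> audio = await generator.GenerateAudioAsync(
    "Birds chirping in a forest",
    cancellationToken: cts.Token);
```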

GenerateMusic(string, string?, double, int, double, int?)

Generates music from a text description.

Tensor<T> GenerateMusic(string prompt, string? negativePrompt = null, double durationSeconds = 10, int numInferenceSteps = 100, double guidanceScale = 3, int? seed = null)

Parameters

prompt string

Text description of the desired music.

negativePrompt string

What to avoid in the generated music.

durationSeconds double

Length of music to generate.

numInferenceSteps int

Number of generation steps.

guidanceScale double

How closely to follow the prompt.

seed int?

Random seed for reproducibility.

Returns

Tensor<T>

Generated music waveform tensor.

Remarks

For Beginners: This creates music from descriptions.

  • prompt: "Relaxing piano melody" creates piano music
  • prompt: "Energetic rock guitar riff" creates rock music
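A usage sketch based on the signature above (`generator` is assumed to be an `IAudioGenerator<float>`):

```csharp
// Ten seconds of piano; the negative prompt steers away from percussion.
Tensor<float> piano = generator.GenerateMusic(
    prompt: "Relaxing piano melody",
    negativePrompt: "drums, percussion",
    durationSeconds: 10,
    seed: 7); // fixed seed so the same melody can be regenerated
```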

GetDefaultOptions()

Gets the default generation options for advanced control.

AudioGenerationOptions<T> GetDefaultOptions()

Returns

AudioGenerationOptions<T>

InpaintAudio(Tensor<T>, Tensor<T>, string?, int, int?)

Fills in missing or masked sections of audio.

Tensor<T> InpaintAudio(Tensor<T> audio, Tensor<T> mask, string? prompt = null, int numInferenceSteps = 100, int? seed = null)

Parameters

audio Tensor<T>

Audio with sections to fill.

mask Tensor<T>

Mask tensor indicating which samples to regenerate (1 = regenerate, 0 = keep).

prompt string

Optional text guidance for inpainting.

numInferenceSteps int

Number of generation steps.

seed int?

Random seed for reproducibility.

Returns

Tensor<T>

Audio with masked sections filled in.

Remarks

For Beginners: This fills in gaps in audio, like photo inpainting but for sound.

  • Input: Audio with a 2-second gap (maybe someone coughed)
  • Output: Audio with the gap filled seamlessly

Exceptions

NotSupportedException

Thrown if inpainting is not supported.
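A sketch of building the mask and calling the method, assuming `generator` is an `IAudioGenerator<float>`, `audio` is a mono [samples] tensor with a gap, and that `Tensor<T>` exposes a shape-based constructor and an element indexer (both assumptions, not confirmed by this page):

```csharp
if (!generator.SupportsAudioInpainting)
    throw new NotSupportedException("This model cannot inpaint audio.");

// Mask has the same length as the audio: 1 = regenerate, 0 = keep.
// Regenerate a 2-second gap starting at the 3-second mark.
var mask = new Tensor<float>(audio.Shape); // assumed zero-initialized
int gapStart = 3 * generator.SampleRate;
int gapEnd = gapStart + 2 * generator.SampleRate;
for (int i = gapStart; i < gapEnd; i++)
    mask[i] = 1f;

Tensor<float> repaired = generator.InpaintAudio(audio, mask,
    prompt: "continuous street ambience"); // optional guidance
```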