Interface IMusicSourceSeparator<T>

Namespace: AiDotNet.Interfaces

Assembly: AiDotNet.dll

Interface for music source separation models that isolate individual instruments/vocals from a mix.

public interface IMusicSourceSeparator<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T: The numeric type used for calculations.

Inherited Members

IFullModel<T, Tensor<T>, Tensor<T>>.DefaultLossFunction

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Train(Tensor<T>, Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.Predict(Tensor<T>)

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>.GetModelMetadata()

IModelSerializer.Serialize()

IModelSerializer.Deserialize(byte[])

IModelSerializer.SaveModel(string)

IModelSerializer.LoadModel(string)

ICheckpointableModel.SaveState(Stream)

ICheckpointableModel.LoadState(Stream)

IParameterizable<T, Tensor<T>, Tensor<T>>.GetParameters()

IParameterizable<T, Tensor<T>, Tensor<T>>.SetParameters(Vector<T>)

IParameterizable<T, Tensor<T>, Tensor<T>>.ParameterCount

IParameterizable<T, Tensor<T>, Tensor<T>>.WithParameters(Vector<T>)

IFeatureAware.GetActiveFeatureIndices()

IFeatureAware.SetActiveFeatureIndices(IEnumerable<int>)

IFeatureAware.IsFeatureUsed(int)

IFeatureImportance<T>.GetFeatureImportance()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.DeepCopy()

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>.Clone()

IGradientComputable<T, Tensor<T>, Tensor<T>>.ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>)

IGradientComputable<T, Tensor<T>, Tensor<T>>.ApplyGradients(Vector<T>, T)

IJitCompilable<T>.ExportComputationGraph(List<ComputationNode<T>>)

IJitCompilable<T>.SupportsJitCompilation

Extension Methods

DistributedExtensions.AsDistributedForHighBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributedForLowBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Remarks

Music source separation (also called audio source separation or "unmixing") takes a mixed audio signal and separates it into individual components like vocals, drums, bass, and other instruments.

For Beginners: Source separation is like un-mixing a smoothie back into its original fruits.

How it works:

1. The mixed audio is converted to a spectrogram (a time-frequency representation).
2. A neural network learns which time-frequency regions belong to which source.
3. Masks are applied to the spectrogram to isolate each source.
4. The individual spectrograms are converted back to audio.
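
The masking step above can be sketched with plain arrays. This is only an illustration of the idea, not the library's implementation: the class name MaskSeparationSketch, its methods, and the [frequencyBin, timeFrame] array layout are assumptions made for the example, and the spectrogram conversion itself is left out.

```csharp
using System;
using System.Collections.Generic;

public static class MaskSeparationSketch
{
    // Multiplies each time-frequency bin of the mixture by the matching mask value.
    // Both arrays are indexed [frequencyBin, timeFrame] (a layout chosen for this sketch).
    public static float[,] ApplyMask(float[,] mixtureMagnitude, float[,] mask)
    {
        int bins = mixtureMagnitude.GetLength(0);
        int frames = mixtureMagnitude.GetLength(1);
        if (mask.GetLength(0) != bins || mask.GetLength(1) != frames)
            throw new ArgumentException("Mask and spectrogram must have the same shape.");

        var isolated = new float[bins, frames];
        for (int b = 0; b < bins; b++)
            for (int t = 0; t < frames; t++)
                isolated[b, t] = mixtureMagnitude[b, t] * mask[b, t];
        return isolated;
    }

    // Applies one mask per source and returns the isolated spectrograms keyed by source name.
    public static Dictionary<string, float[,]> SeparateWithMasks(
        float[,] mixtureMagnitude, IReadOnlyDictionary<string, float[,]> masks)
    {
        var result = new Dictionary<string, float[,]>();
        foreach (var pair in masks)
            result[pair.Key] = ApplyMask(mixtureMagnitude, pair.Value);
        return result;
    }
}
```

A mask value of 1 keeps a bin unchanged, 0 silences it, and values in between keep a proportional share.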

Common separations:

2-stem: Vocals vs Accompaniment
4-stem: Vocals, Drums, Bass, Other
5-stem: Vocals, Drums, Bass, Piano, Other

Use cases:

Karaoke (remove vocals)
Remixing (isolate and rearrange parts)
Music transcription (analyze individual instruments)
Sample extraction (get drum loops, vocal hooks)
Music education (practice with isolated parts)

Popular models:

Demucs (Facebook/Meta)
Spleeter (Deezer)
Open-Unmix

This interface extends IFullModel<T, Tensor<T>, Tensor<T>>, so both the model input and the model output are Tensor<T> values holding audio data.

Properties

IsOnnxMode

Gets whether this model is running in ONNX inference mode.

bool IsOnnxMode { get; }

Property Value

bool

NumStems

Gets the number of stems/sources this model produces.

int NumStems { get; }

Property Value

int

SampleRate

Gets the expected sample rate for input audio.

int SampleRate { get; }

Property Value

int

SupportedSources

Gets the sources this model can separate.

IReadOnlyList<string> SupportedSources { get; }

Property Value

IReadOnlyList<string>

Remarks

Common sources: "vocals", "drums", "bass", "other", "piano".
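
Because the available source names depend on the model, it can help to validate a requested name against SupportedSources before calling a method that takes a source string. The helper below is a minimal sketch: SourceNameSketch and ResolveSource are hypothetical names, and the case-insensitive matching is a choice made for the example (this page does not say whether the interface itself treats names case-sensitively).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SourceNameSketch
{
    // Returns the matching entry from supportedSources, or throws with a helpful message.
    public static string ResolveSource(IReadOnlyList<string> supportedSources, string requested)
    {
        if (string.IsNullOrWhiteSpace(requested))
            throw new ArgumentException("Source name must not be empty.", nameof(requested));

        // Case-insensitive matching is a choice made for this sketch.
        var match = supportedSources.FirstOrDefault(
            s => string.Equals(s, requested.Trim(), StringComparison.OrdinalIgnoreCase));

        if (match is null)
            throw new ArgumentException(
                $"Unknown source '{requested}'. Supported: {string.Join(", ", supportedSources)}.",
                nameof(requested));
        return match;
    }
}
```

With a separator instance, a call might look like separator.ExtractSource(audio, SourceNameSketch.ResolveSource(separator.SupportedSources, "Vocals")).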

Methods

ExtractSource(Tensor<T>, string)

Extracts a specific source from the mix.

Tensor<T> ExtractSource(Tensor<T> audio, string source)

Parameters

audio Tensor<T>: Mixed audio waveform tensor.
source string: The source to extract (e.g., "vocals", "drums").

Returns

Tensor<T>: Isolated audio for the requested source.

Remarks

For Beginners: Use this when you only need one specific part. For example, if you only need the vocals, this can be more efficient than separating every source.

GetSourceMask(Tensor<T>, string)

Gets the soft mask for a specific source.

Tensor<T> GetSourceMask(Tensor<T> audio, string source)

Parameters

audio Tensor<T>: Mixed audio waveform tensor.
source string: The source to get the mask for.

Returns

Tensor<T>: Soft mask tensor indicating how much of each time-frequency bin belongs to this source.

Remarks

For Beginners: A soft mask estimates, for each time-frequency bin of the audio, how much of that bin belongs to a specific source. Values near 1 mean "almost entirely this source", values near 0 mean "almost none of this source".
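
One common way to obtain such values is a ratio mask: each source's estimated magnitude divided by the sum of all sources' magnitudes in the same bin. The sketch below shows that calculation on flattened arrays. It is an assumption about how a soft mask could be computed, not a description of what GetSourceMask does internally; SoftMaskSketch, RatioMasks, and the epsilon parameter are hypothetical.

```csharp
using System;

public static class SoftMaskSketch
{
    // Computes ratio masks: mask[s][i] = magnitude[s][i] / (sum over all sources of magnitude[k][i]).
    // Each inner array holds one source's flattened time-frequency magnitudes.
    public static float[][] RatioMasks(float[][] sourceMagnitudes, float epsilon = 1e-8f)
    {
        if (sourceMagnitudes.Length == 0)
            return Array.Empty<float[]>();

        int sources = sourceMagnitudes.Length;
        int bins = sourceMagnitudes[0].Length;
        var masks = new float[sources][];
        for (int s = 0; s < sources; s++)
        {
            if (sourceMagnitudes[s].Length != bins)
                throw new ArgumentException("All sources must have the same number of bins.");
            masks[s] = new float[bins];
        }

        for (int i = 0; i < bins; i++)
        {
            float total = epsilon; // avoids division by zero in silent bins
            for (int s = 0; s < sources; s++) total += sourceMagnitudes[s][i];
            for (int s = 0; s < sources; s++) masks[s][i] = sourceMagnitudes[s][i] / total;
        }
        return masks;
    }
}
```

With this construction the masks of all sources add up to roughly 1 in every non-silent bin, which is why the values can be read as shares of the mixture.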

Remix(SourceSeparationResult<T>, IReadOnlyDictionary<string, double>)

Remixes the separated sources with custom volumes.

Tensor<T> Remix(SourceSeparationResult<T> separationResult, IReadOnlyDictionary<string, double> sourceVolumes)

Parameters

separationResult SourceSeparationResult<T>: Previous separation result.
sourceVolumes IReadOnlyDictionary<string, double>: Volume multipliers for each source (1.0 = original).

Returns

Tensor<T>: Remixed audio with adjusted source volumes.
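
Conceptually, remixing with volume multipliers is a weighted sum of the separated stems. The sketch below shows that arithmetic on plain sample arrays. RemixSketch and Mix are hypothetical names, the stems are passed as a dictionary only for illustration (the members of SourceSeparationResult<T> are not described on this page), and treating a missing volume entry as 1.0 is an assumption of the sketch rather than documented behavior of Remix.

```csharp
using System;
using System.Collections.Generic;

public static class RemixSketch
{
    // Sums the stems sample by sample, scaling each by its volume multiplier.
    // Stems without an entry in sourceVolumes keep their original level (1.0);
    // that default is a choice made for this sketch.
    public static float[] Mix(
        IReadOnlyDictionary<string, float[]> stems,
        IReadOnlyDictionary<string, double> sourceVolumes)
    {
        int length = -1;
        foreach (var stem in stems.Values)
        {
            if (length == -1) length = stem.Length;
            else if (stem.Length != length)
                throw new ArgumentException("All stems must have the same number of samples.");
        }

        var output = new float[Math.Max(length, 0)];
        foreach (var pair in stems)
        {
            double volume = sourceVolumes.TryGetValue(pair.Key, out double v) ? v : 1.0;
            for (int i = 0; i < output.Length; i++)
                output[i] += (float)(volume * pair.Value[i]);
        }
        return output;
    }
}
```

For the real method, a Dictionary<string, double> such as new Dictionary<string, double> { ["vocals"] = 0.5, ["drums"] = 1.2 } can be passed as the sourceVolumes argument, since Dictionary implements IReadOnlyDictionary.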

RemoveSource(Tensor<T>, string)

Removes a specific source from the mix.

Tensor<T> RemoveSource(Tensor<T> audio, string source)

Parameters

audio Tensor<T>: Mixed audio waveform tensor.
source string: The source to remove (e.g., "vocals" for karaoke).

Returns

Tensor<T>: Audio with the specified source removed.

Remarks

For Beginners: This removes a source instead of extracting it. For example:

RemoveSource(audio, "vocals") = karaoke track
RemoveSource(audio, "drums") = drumless practice track
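
One way to think about removal is as the complement of extraction: subtract the isolated source from the mix, sample by sample. The sketch below shows only that arithmetic. It is an assumption for illustration; an implementation of RemoveSource might instead sum the remaining stems. RemoveSourceSketch and Subtract are hypothetical names.

```csharp
using System;

public static class RemoveSourceSketch
{
    // Returns mix[i] - source[i] for every sample: what is left once the source is taken out.
    public static float[] Subtract(float[] mix, float[] source)
    {
        if (mix.Length != source.Length)
            throw new ArgumentException("Mix and source must have the same number of samples.");

        var remainder = new float[mix.Length];
        for (int i = 0; i < mix.Length; i++)
            remainder[i] = mix[i] - source[i];
        return remainder;
    }
}
```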

Separate(Tensor<T>)

Separates all sources from the audio mix.

SourceSeparationResult<T> Separate(Tensor<T> audio)

Parameters

audio Tensor<T>: Mixed audio waveform tensor [samples] or [channels, samples].

Returns

SourceSeparationResult<T>: Separation result with all isolated sources.

Remarks

For Beginners: This is the main method for separating audio.

Pass in a mixed song
Get back individual tracks for each instrument/voice
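
Audio decoders often deliver multi-channel samples interleaved (L0, R0, L1, R1, ...), while the [channels, samples] shape described above is planar. The sketch below rearranges interleaved samples into that layout using a plain 2-D array. AudioLayoutSketch and Deinterleave are hypothetical names, and how the resulting data is wrapped in a Tensor<T> is not covered on this page, so that step is left out.

```csharp
using System;

public static class AudioLayoutSketch
{
    // Converts interleaved samples (L0, R0, L1, R1, ...) into a [channels, samples] array.
    public static float[,] Deinterleave(float[] interleaved, int channels)
    {
        if (channels <= 0)
            throw new ArgumentOutOfRangeException(nameof(channels));
        if (interleaved.Length % channels != 0)
            throw new ArgumentException("Sample count must be a multiple of the channel count.");

        int samples = interleaved.Length / channels;
        var planar = new float[channels, samples];
        for (int i = 0; i < samples; i++)
            for (int c = 0; c < channels; c++)
                planar[c, i] = interleaved[i * channels + c];
        return planar;
    }
}
```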

SeparateAsync(Tensor<T>, CancellationToken)

Separates all sources asynchronously.

Task<SourceSeparationResult<T>> SeparateAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>: Mixed audio waveform tensor.
cancellationToken CancellationToken: Token that can be used to cancel the asynchronous operation.

Returns

Task<SourceSeparationResult<T>>: A task that completes with the separation result containing all isolated sources.
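
Separation of long recordings can take a while, so callers may want to bound the wait. The helper below is a minimal sketch of combining a caller-supplied token with a timeout, using only standard .NET types. AsyncSeparationSketch and RunWithTimeoutAsync are hypothetical names, and cancellation only takes effect if the underlying operation observes the token it is given.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncSeparationSketch
{
    // Runs an asynchronous operation with a token that is cancelled after the timeout
    // or when the caller's token is cancelled, whichever happens first.
    public static async Task<TResult> RunWithTimeoutAsync<TResult>(
        Func<CancellationToken, Task<TResult>> operation,
        TimeSpan timeout,
        CancellationToken callerToken = default)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        linked.CancelAfter(timeout);
        return await operation(linked.Token).ConfigureAwait(false);
    }
}
```

With a separator instance, usage might look like await AsyncSeparationSketch.RunWithTimeoutAsync(ct => separator.SeparateAsync(audio, ct), TimeSpan.FromMinutes(2)). If the operation honors the token, an OperationCanceledException is thrown when the timeout elapses.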
