
Interface IMusicSourceSeparator<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Interface for music source separation models that isolate individual instruments/vocals from a mix.

public interface IMusicSourceSeparator<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.


Remarks

Music source separation (also called audio source separation or "unmixing") takes a mixed audio signal and separates it into individual components like vocals, drums, bass, and other instruments.

For Beginners: Source separation is like un-mixing a smoothie back into its original fruits.

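How it works (for spectrogram-masking models such as Spleeter and Open-Unmix):

  1. The mixed audio is converted to a time-frequency spectrogram, typically with a short-time Fourier transform (STFT).
  2. A neural network estimates, for each time-frequency bin, how much of the energy belongs to each source.
  3. Each source's mask is multiplied element-wise with the mixture spectrogram to isolate that source.
  4. Each isolated spectrogram is converted back to a waveform with the inverse STFT.

Some models, notably Demucs, work directly on the raw waveform rather than (or in addition to) a spectrogram.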
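The masking step can be sketched in a few lines. This is a minimal illustration, not AiDotNet's implementation: it assumes the mixture magnitude spectrogram and the per-source masks are already available as plain double[,] arrays shaped [frames, bins] (the interface itself works with Tensor<T>), and it omits the STFT and inverse STFT entirely. MaskingSketch and ApplyMasks are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: applies soft masks to a mixture magnitude spectrogram.
// Arrays are [frames, bins]; they stand in for Tensor<T> purely for illustration.
public static class MaskingSketch
{
    public static Dictionary<string, double[,]> ApplyMasks(
        double[,] mixture,
        IReadOnlyDictionary<string, double[,]> masks)
    {
        int frames = mixture.GetLength(0);
        int bins = mixture.GetLength(1);
        var sources = new Dictionary<string, double[,]>();

        foreach (var pair in masks)
        {
            double[,] mask = pair.Value;
            if (mask.GetLength(0) != frames || mask.GetLength(1) != bins)
                throw new ArgumentException($"Mask for '{pair.Key}' does not match the mixture shape.");

            // Element-wise product: each bin keeps the fraction the mask assigns to this source.
            var isolated = new double[frames, bins];
            for (int t = 0; t < frames; t++)
                for (int f = 0; f < bins; f++)
                    isolated[t, f] = mask[t, f] * mixture[t, f];

            sources[pair.Key] = isolated;
        }
        return sources;
    }
}
```

When the masks sum to 1 in every bin, the isolated spectrograms add back up exactly to the mixture spectrogram.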

Common separations:

  • 2-stem: Vocals vs Accompaniment
  • 4-stem: Vocals, Drums, Bass, Other
  • 5-stem: Vocals, Drums, Bass, Piano, Other

Use cases:

  • Karaoke (remove vocals)
  • Remixing (isolate and rearrange parts)
  • Music transcription (analyze individual instruments)
  • Sample extraction (get drum loops, vocal hooks)
  • Music education (practice with isolated parts)

Popular models:

  • Demucs (Facebook/Meta)
  • Spleeter (Deezer)
  • Open-Unmix

This interface extends IFullModel<T, TInput, TOutput> with both TInput and TOutput set to Tensor<T>, for tensor-based audio processing.

Properties

IsOnnxMode

Gets a value indicating whether this model is running in ONNX inference mode.

bool IsOnnxMode { get; }

Property Value

bool

NumStems

Gets the number of stems/sources this model produces.

int NumStems { get; }

Property Value

int

SampleRate

Gets the sample rate, in hertz, that input audio is expected to have.

int SampleRate { get; }

Property Value

int
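If your audio was recorded at a different rate than SampleRate, it has to be resampled before separation. The sketch below is a deliberately naive linear-interpolation resampler for a mono float[] signal, meant only to show the arithmetic: the output length preserves duration (outputLength = inputLength * targetRate / sourceRate). It applies no anti-aliasing filter, so downsampling with it can introduce aliasing; ResampleSketch is a hypothetical name, and a dedicated resampling library is the better choice for real audio.

```csharp
using System;

// Hypothetical, naive linear-interpolation resampler for mono audio.
// No anti-aliasing filter is applied, so downsampling can alias.
public static class ResampleSketch
{
    public static float[] Linear(float[] input, int sourceRate, int targetRate)
    {
        if (sourceRate <= 0 || targetRate <= 0)
            throw new ArgumentOutOfRangeException(nameof(sourceRate), "Sample rates must be positive.");
        if (sourceRate == targetRate || input.Length == 0)
            return (float[])input.Clone();

        // Preserve duration: n_out = n_in * target / source (long avoids overflow).
        int outputLength = (int)((long)input.Length * targetRate / sourceRate);
        var output = new float[outputLength];
        double step = (double)sourceRate / targetRate; // input samples per output sample

        for (int i = 0; i < outputLength; i++)
        {
            double pos = i * step;
            int left = (int)pos;
            int right = Math.Min(left + 1, input.Length - 1); // clamp at the last sample
            double frac = pos - left;
            output[i] = (float)((1.0 - frac) * input[left] + frac * input[right]);
        }
        return output;
    }
}
```

For example, audio decoded at 48000 Hz could be passed through ResampleSketch.Linear(samples, 48000, separator.SampleRate) first, where separator is any IMusicSourceSeparator<float> instance.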

SupportedSources

Gets the sources this model can separate.

IReadOnlyList<string> SupportedSources { get; }

Property Value

IReadOnlyList<string>

Remarks

Common sources: "vocals", "drums", "bass", "other", "piano".
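Because ExtractSource, GetSourceMask, and RemoveSource take the source as a free-form string, it can help to check the name against SupportedSources first and fail with a useful message. A minimal sketch follows; the interface does not say whether source names are case-sensitive, so the case-insensitive, whitespace-trimming match here is an assumption, and SourceNameSketch is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SourceNameSketch
{
    // Returns the canonical entry from supportedSources that matches the request,
    // or throws listing the valid names. Case-insensitive matching is an assumption.
    public static string Resolve(IReadOnlyList<string> supportedSources, string requested)
    {
        if (requested == null) throw new ArgumentNullException(nameof(requested));

        string match = supportedSources.FirstOrDefault(
            s => string.Equals(s, requested.Trim(), StringComparison.OrdinalIgnoreCase));

        if (match == null)
            throw new ArgumentException(
                $"Unknown source '{requested}'. Supported: {string.Join(", ", supportedSources)}",
                nameof(requested));

        return match;
    }
}
```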

Methods

ExtractSource(Tensor<T>, string)

Extracts a specific source from the mix.

Tensor<T> ExtractSource(Tensor<T> audio, string source)

Parameters

audio Tensor<T>

Mixed audio waveform tensor.

source string

The source to extract (e.g., "vocals", "drums").

Returns

Tensor<T>

Isolated audio for the requested source.

Remarks

For Beginners: Use this when you need only one specific part, such as the vocals. Depending on the model, it can be cheaper than calling Separate and discarding the other sources.

GetSourceMask(Tensor<T>, string)

Gets the soft mask for a specific source.

Tensor<T> GetSourceMask(Tensor<T> audio, string source)

Parameters

audio Tensor<T>

Mixed audio waveform tensor.

source string

The source to get the mask for.

Returns

Tensor<T>

Soft mask tensor indicating how much of each time-frequency bin belongs to this source.

Remarks

For Beginners: A soft mask assigns each time-frequency bin a weight between 0 and 1 that indicates how much of that bin belongs to the source. Values near 1 mean "mostly this source"; values near 0 mean "mostly other sources".
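One common way to build such a soft mask is the ratio mask: divide each source's estimated magnitude by the sum of all sources' estimates in the same bin, M_i = |S_i| / (Σ_j |S_j| + ε). The sketch below assumes the estimates are non-negative double[,] arrays shaped [frames, bins]; it illustrates the concept and is not necessarily how a given model computes the tensor GetSourceMask returns. RatioMaskSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds ratio (soft) masks from per-source magnitude
// estimates shaped [frames, bins]. For non-negative inputs, every mask value is in [0, 1].
public static class RatioMaskSketch
{
    public static Dictionary<string, double[,]> Compute(
        IReadOnlyDictionary<string, double[,]> magnitudes, double eps = 1e-10)
    {
        var masks = new Dictionary<string, double[,]>();
        if (magnitudes.Count == 0) return masks;

        double[,] first = magnitudes.Values.First();
        int frames = first.GetLength(0), bins = first.GetLength(1);

        // Denominator: total estimated magnitude per bin across all sources.
        var total = new double[frames, bins];
        foreach (var m in magnitudes.Values)
        {
            if (m.GetLength(0) != frames || m.GetLength(1) != bins)
                throw new ArgumentException("All magnitude estimates must share one shape.");
            for (int t = 0; t < frames; t++)
                for (int f = 0; f < bins; f++)
                    total[t, f] += m[t, f];
        }

        // M_i = |S_i| / (sum_j |S_j| + eps); eps keeps silent bins at 0 instead of NaN.
        foreach (var pair in magnitudes)
        {
            var mask = new double[frames, bins];
            for (int t = 0; t < frames; t++)
                for (int f = 0; f < bins; f++)
                    mask[t, f] = pair.Value[t, f] / (total[t, f] + eps);
            masks[pair.Key] = mask;
        }
        return masks;
    }
}
```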

Remix(SourceSeparationResult<T>, IReadOnlyDictionary<string, double>)

Remixes the separated sources with custom volumes.

Tensor<T> Remix(SourceSeparationResult<T> separationResult, IReadOnlyDictionary<string, double> sourceVolumes)

Parameters

separationResult SourceSeparationResult<T>

A result returned by an earlier call to Separate or SeparateAsync.

sourceVolumes IReadOnlyDictionary<string, double>

Volume multiplier for each source, keyed by source name (1.0 keeps the original level, 0.5 halves the amplitude, 0.0 mutes the source).

Returns

Tensor<T>

Remixed audio with adjusted source volumes.
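Conceptually, a remix is a weighted sum of the separated stems: output[n] = Σ_i volume_i * stem_i[n]. A minimal sketch over plain float[] stems follows; the arrays stand in for the tensors inside SourceSeparationResult<T>, whose members are not documented on this page. RemixSketch is hypothetical, and treating a source that is missing from the volume map as 1.0 is an assumption; the interface does not state what Remix does in that case.

```csharp
using System;
using System.Collections.Generic;

public static class RemixSketch
{
    // Hypothetical helper: output[n] = sum over sources of volume * stem[n].
    public static float[] Remix(
        IReadOnlyDictionary<string, float[]> stems,
        IReadOnlyDictionary<string, double> sourceVolumes)
    {
        // All stems must have the same number of samples to be summed.
        int length = -1;
        foreach (var stem in stems.Values)
        {
            if (length < 0) length = stem.Length;
            else if (stem.Length != length)
                throw new ArgumentException("All stems must have the same length.");
        }

        var output = new float[Math.Max(length, 0)];
        foreach (var pair in stems)
        {
            // Assumption: sources without an explicit volume keep their original level.
            double volume = sourceVolumes.TryGetValue(pair.Key, out double v) ? v : 1.0;
            for (int n = 0; n < output.Length; n++)
                output[n] += (float)(volume * pair.Value[n]);
        }
        return output;
    }
}
```

Note that summing stems can push samples beyond the original peak level when volumes above 1.0 are used, so clipping or normalization may be needed afterwards.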

RemoveSource(Tensor<T>, string)

Removes a specific source from the mix.

Tensor<T> RemoveSource(Tensor<T> audio, string source)

Parameters

audio Tensor<T>

Mixed audio waveform tensor.

source string

The source to remove (e.g., "vocals" for karaoke).

Returns

Tensor<T>

Audio with the specified source removed.

Remarks

For Beginners: This removes a source instead of extracting it. For example:

  • RemoveSource(audio, "vocals") = karaoke track
  • RemoveSource(audio, "drums") = drumless practice track
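Assuming the stems add back up to the mixture, removing one source amounts to summing all the others (equivalently, remixing with that source's volume at 0.0). The sketch below works on plain float[] stems; RemoveSketch is a hypothetical helper, and the interface does not say whether RemoveSource sums the remaining stems or subtracts the extracted source from the mix.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveSketch
{
    // Hypothetical helper: sums every stem except the removed one,
    // e.g. everything but "vocals" for a karaoke track.
    public static float[] SumOthers(IReadOnlyDictionary<string, float[]> stems, string removed)
    {
        if (!stems.TryGetValue(removed, out float[] removedStem))
            throw new ArgumentException($"Unknown source '{removed}'.", nameof(removed));

        var output = new float[removedStem.Length];
        foreach (var pair in stems)
        {
            if (pair.Key == removed) continue;
            if (pair.Value.Length != output.Length)
                throw new ArgumentException("All stems must have the same length.");
            for (int n = 0; n < output.Length; n++)
                output[n] += pair.Value[n];
        }
        return output;
    }
}
```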

Separate(Tensor<T>)

Separates all sources from the audio mix.

SourceSeparationResult<T> Separate(Tensor<T> audio)

Parameters

audio Tensor<T>

Mixed audio waveform tensor [samples] or [channels, samples].

Returns

SourceSeparationResult<T>

Separation result with all isolated sources.

Remarks

For Beginners: This is the main method for separating audio:

  • Pass in a mixed song
  • Get back individual tracks for each instrument or voice
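The [channels, samples] layout stores each channel's samples contiguously, while many audio decoders return interleaved samples (L0, R0, L1, R1, ...). A minimal sketch of converting interleaved samples into that channel-major layout with a plain float[,]; DeinterleaveSketch is hypothetical, and copying the result into a Tensor<T> is not shown because the tensor constructors are not documented on this page.

```csharp
using System;

public static class DeinterleaveSketch
{
    // Hypothetical helper: interleaved [n * channels + c] -> planar [c, n].
    public static float[,] ToPlanar(float[] interleaved, int channels)
    {
        if (channels <= 0)
            throw new ArgumentOutOfRangeException(nameof(channels), "Channel count must be positive.");
        if (interleaved.Length % channels != 0)
            throw new ArgumentException("Sample count is not a multiple of the channel count.");

        int samples = interleaved.Length / channels;
        var planar = new float[channels, samples];
        for (int n = 0; n < samples; n++)
            for (int c = 0; c < channels; c++)
                planar[c, n] = interleaved[n * channels + c];
        return planar;
    }
}
```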

SeparateAsync(Tensor<T>, CancellationToken)

Separates all sources asynchronously.

Task<SourceSeparationResult<T>> SeparateAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>

Mixed audio waveform tensor.

cancellationToken CancellationToken

Token that can be used to cancel the asynchronous operation.

Returns

Task<SourceSeparationResult<T>>

Separation result with all isolated sources.
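Because SeparateAsync accepts a CancellationToken, long-running separations can be bounded with a timeout, assuming the implementation observes the token. The sketch below is a hypothetical stand-in, not the AiDotNet implementation, showing the usual .NET pattern: CPU-bound work runs on Task.Run and checks the token between chunks, so cancellation takes effect promptly and surfaces as an OperationCanceledException. AsyncSketch and ProcessInChunksAsync are hypothetical names, and the per-chunk work is a placeholder.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncSketch
{
    // Hypothetical stand-in for a long-running separation call.
    public static Task<float[]> ProcessInChunksAsync(
        float[] audio, int chunkSize, CancellationToken cancellationToken = default)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize), "Chunk size must be positive.");

        return Task.Run(() =>
        {
            var output = new float[audio.Length];
            for (int start = 0; start < audio.Length; start += chunkSize)
            {
                // Observe cancellation between chunks rather than only at the start.
                cancellationToken.ThrowIfCancellationRequested();
                int end = Math.Min(start + chunkSize, audio.Length);
                for (int n = start; n < end; n++)
                    output[n] = audio[n] * 0.5f; // placeholder for real per-chunk processing
            }
            return output;
        }, cancellationToken);
    }
}
```

A caller would typically create a CancellationTokenSource with a timeout (for example new CancellationTokenSource(TimeSpan.FromSeconds(30))), pass its Token to SeparateAsync, and catch OperationCanceledException to handle the timeout.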