Class MusicSourceSeparator<T>

Namespace
AiDotNet.Audio.SourceSeparation
Assembly
AiDotNet.dll

Music source separation model that splits mixed audio into stems (vocals, drums, bass, other).

public class MusicSourceSeparator<T> : AudioNeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IMusicSourceSeparator<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
MusicSourceSeparator<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Remarks

This class implements a U-Net-based source separation approach similar to Spleeter and Demucs. The model separates mixed audio into individual instrument stems using spectral masking.

For Beginners: Source separation is like unmixing a smoothie:

  • Input: Mixed audio with multiple instruments and vocals
  • Output: Separate tracks for vocals, drums, bass, and other instruments
  • Method: A neural network predicts which parts of the spectrum belong to each source
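
To make the idea of spectral masking concrete, here is a minimal sketch of applying a soft mask to one frame of a mixture spectrum. It is an illustration only, not this class's implementation: the `SpectralMaskingSketch` type and `ApplyMask` method are hypothetical names, and plain arrays stand in for `Tensor<T>`.

```csharp
using System;

// Illustrative only: hypothetical helper, plain arrays stand in for Tensor<T>.
public static class SpectralMaskingSketch
{
    // Multiplies each mixture magnitude bin by the matching mask value in [0, 1].
    // A mask value near 1 keeps the bin for this source; near 0 suppresses it.
    public static float[] ApplyMask(float[] mixtureMagnitude, float[] mask)
    {
        if (mixtureMagnitude.Length != mask.Length)
            throw new ArgumentException("Mask and spectrum must have the same length.");

        var source = new float[mixtureMagnitude.Length];
        for (int i = 0; i < source.Length; i++)
            source[i] = mixtureMagnitude[i] * mask[i];
        return source;
    }
}
```

For example, a mixture frame of { 1.0, 0.5, 2.0 } with a vocal mask of { 0.5, 0.0, 1.0 } yields { 0.5, 0.0, 2.0 } as the vocal estimate for that frame.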

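Usage with an ONNX model:

// Create a separator, downloading models if needed.
var separator = await MusicSourceSeparator<float>.CreateAsync();
// Separate the mix into stems, then pick one stem by name.
var stems = separator.Separate(mixedAudio);
var vocals = stems.GetSource("vocals");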

Usage for training:

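// 1025 input features; the output holds 4 x 1025 values (presumably one 1025-value mask per stem).
var architecture = new NeuralNetworkArchitecture<float>(inputFeatures: 1025, outputSize: 4*1025);
var separator = new MusicSourceSeparator<float>(architecture);
// mixed: the mixed audio input; stems: the ground truth stems for that mix.
separator.Train(mixed, stems);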

Constructors

MusicSourceSeparator(SourceSeparationOptions?)

Creates a MusicSourceSeparator for CPU-based spectral processing.

public MusicSourceSeparator(SourceSeparationOptions? options = null)

Parameters

options SourceSeparationOptions

Optional configuration options.

MusicSourceSeparator(NeuralNetworkArchitecture<T>, SourceSeparationOptions?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?)

Creates a MusicSourceSeparator for native training mode.

public MusicSourceSeparator(NeuralNetworkArchitecture<T> architecture, SourceSeparationOptions? options = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null)

Parameters

architecture NeuralNetworkArchitecture<T>

Neural network architecture.

options SourceSeparationOptions

Optional configuration options.

optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>

Optional custom optimizer.

MusicSourceSeparator(string, SourceSeparationOptions?)

Creates a MusicSourceSeparator for ONNX inference mode.

public MusicSourceSeparator(string modelPath, SourceSeparationOptions? options = null)

Parameters

modelPath string

Path to the ONNX model file.

options SourceSeparationOptions

Optional configuration options.

Fields

FiveStemSources

Source names for 5-stem separation.

public static readonly string[] FiveStemSources

Field Value

string[]

StandardSources

Standard source names for 4-stem separation.

public static readonly string[] StandardSources

Field Value

string[]

TwoStemSources

Source names for 2-stem separation.

public static readonly string[] TwoStemSources

Field Value

string[]

Properties

NumStems

Gets the number of stems/sources this model produces.

public int NumStems { get; }

Property Value

int

SupportedSources

Gets the sources this model can separate.

public IReadOnlyList<string> SupportedSources { get; }

Property Value

IReadOnlyList<string>

Methods

CreateAsync(SourceSeparationOptions?, IProgress<double>?, CancellationToken)

Creates a MusicSourceSeparator asynchronously, downloading models if needed.

public static Task<MusicSourceSeparator<T>> CreateAsync(SourceSeparationOptions? options = null, IProgress<double>? progress = null, CancellationToken cancellationToken = default)

Parameters

options SourceSeparationOptions
progress IProgress<double>
cancellationToken CancellationToken

Returns

Task<MusicSourceSeparator<T>>

CreateCpuOnly(SourceSeparationOptions?)

Creates a MusicSourceSeparator for CPU-based spectral processing without a neural network.

public static MusicSourceSeparator<T> CreateCpuOnly(SourceSeparationOptions? options = null)

Parameters

options SourceSeparationOptions

Returns

MusicSourceSeparator<T>

CreateNewInstance()

Creates a new instance of this network type.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

Dispose(bool)

Disposes of managed resources.

protected override void Dispose(bool disposing)

Parameters

disposing bool

ExtractSource(Tensor<T>, string)

Extracts a specific source from the mix.

public Tensor<T> ExtractSource(Tensor<T> audio, string source)

Parameters

audio Tensor<T>
source string

Returns

Tensor<T>

GetModelMetadata()

Gets model metadata for serialization.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

GetSourceMask(Tensor<T>, string)

Gets the soft mask for a specific source.

public Tensor<T> GetSourceMask(Tensor<T> audio, string source)

Parameters

audio Tensor<T>
source string

Returns

Tensor<T>

InitializeLayers()

Initializes the neural network layers.

protected override void InitializeLayers()

PostprocessOutput(Tensor<T>)

Postprocesses model output (applies sigmoid to mask values).

protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)

Parameters

modelOutput Tensor<T>

Returns

Tensor<T>
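
The sigmoid step mentioned in the summary maps unbounded network outputs to mask values between 0 and 1. The following is a minimal sketch of that mapping on a plain array; `SigmoidSketch` and `ToMask` are hypothetical names used for illustration, and the actual override operates on `Tensor<T>`.

```csharp
using System;

// Illustrative only: hypothetical helper, not the library's PostprocessOutput.
public static class SigmoidSketch
{
    // Logistic sigmoid: maps any real number into the open interval (0, 1).
    public static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Converts raw model outputs (logits) into soft mask values, element by element.
    public static double[] ToMask(double[] logits)
    {
        var mask = new double[logits.Length];
        for (int i = 0; i < logits.Length; i++)
            mask[i] = Sigmoid(logits[i]);
        return mask;
    }
}
```

An output of 0 maps to a mask value of 0.5; large positive outputs approach 1 and large negative outputs approach 0.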

Predict(Tensor<T>)

Predicts source masks from spectrogram magnitude.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

Returns

Tensor<T>
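
When a model emits all source masks in one flat output (as the `outputSize: 4*1025` training example suggests), the output has to be split into one mask per stem. The sketch below shows one way to do that. The stem-major layout (all values of stem 0, then stem 1, and so on) is an assumption of this example, not something this page documents, and `MaskLayoutSketch` is a hypothetical name.

```csharp
using System;

// Illustrative only: hypothetical helper; the real tensor layout may differ.
public static class MaskLayoutSketch
{
    // Splits a flat output into one mask per stem.
    // Assumed layout: all bins of stem 0, then all bins of stem 1, and so on.
    public static double[][] SplitByStem(double[] flatOutput, int numStems)
    {
        if (numStems <= 0 || flatOutput.Length % numStems != 0)
            throw new ArgumentException("Output length must be a multiple of the stem count.");

        int bins = flatOutput.Length / numStems;
        var masks = new double[numStems][];
        for (int s = 0; s < numStems; s++)
        {
            masks[s] = new double[bins];
            Array.Copy(flatOutput, s * bins, masks[s], 0, bins);
        }
        return masks;
    }
}
```

With 4 stems and a flat output of 4 x 1025 values, this produces four masks of 1025 values each.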

PreprocessAudio(Tensor<T>)

Preprocesses raw audio into spectrogram format.

protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)

Parameters

rawAudio Tensor<T>

Returns

Tensor<T>
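
As a rough picture of what turning raw audio into spectrogram format involves, the sketch below computes the magnitude spectrum of a single windowed frame with a naive discrete Fourier transform. It is not the library's preprocessing: the window type, frame size, and the `SpectrumSketch` name are assumptions made for illustration, and a practical implementation would use an FFT over many overlapping frames.

```csharp
using System;

// Illustrative only: hypothetical helper using a naive O(N^2) DFT.
public static class SpectrumSketch
{
    // Returns the magnitudes of bins 0..N/2 for one Hann-windowed frame of length N.
    public static double[] MagnitudeSpectrum(double[] frame)
    {
        int n = frame.Length;
        var magnitudes = new double[n / 2 + 1];
        for (int k = 0; k < magnitudes.Length; k++)
        {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++)
            {
                // Periodic Hann window tapers the frame edges to reduce leakage.
                double w = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * t / n);
                double angle = 2.0 * Math.PI * k * t / n;
                re += frame[t] * w * Math.Cos(angle);
                im -= frame[t] * w * Math.Sin(angle);
            }
            magnitudes[k] = Math.Sqrt(re * re + im * im);
        }
        return magnitudes;
    }
}
```

A frame of N samples produces N/2 + 1 bins, so a 2048-sample frame would give 1025 bins, the same number used as `inputFeatures` in the training example above.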

Remix(SourceSeparationResult<T>, IReadOnlyDictionary<string, double>)

Remixes the separated sources with custom volumes.

public Tensor<T> Remix(SourceSeparationResult<T> separationResult, IReadOnlyDictionary<string, double> sourceVolumes)

Parameters

separationResult SourceSeparationResult<T>
sourceVolumes IReadOnlyDictionary<string, double>

Returns

Tensor<T>
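
Conceptually, remixing with custom volumes is a weighted sum of the separated sources. The sketch below shows that idea with plain arrays keyed by source name. It is not the library's `Remix` implementation: `RemixSketch` is a hypothetical name, and treating a source with no volume entry as unit gain is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: hypothetical helper, plain arrays stand in for Tensor<T>.
public static class RemixSketch
{
    // Sums all sources sample by sample, scaling each by its volume.
    public static double[] Remix(
        IReadOnlyDictionary<string, double[]> sources,
        IReadOnlyDictionary<string, double> volumes)
    {
        var output = Array.Empty<double>();
        bool first = true;
        foreach (var pair in sources)
        {
            if (first)
            {
                output = new double[pair.Value.Length];
                first = false;
            }
            if (pair.Value.Length != output.Length)
                throw new ArgumentException("All sources must have the same length.");

            // Assumption of this sketch: a source without a volume entry keeps unit gain.
            double gain = volumes.TryGetValue(pair.Key, out var v) ? v : 1.0;
            for (int i = 0; i < output.Length; i++)
                output[i] += gain * pair.Value[i];
        }
        return output;
    }
}
```

For instance, setting "vocals" to 0.5 and "drums" to 0.0 halves the vocals and mutes the drums in the resulting mix.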

RemoveSource(Tensor<T>, string)

Removes a specific source from the mix.

public Tensor<T> RemoveSource(Tensor<T> audio, string source)

Parameters

audio Tensor<T>
source string

Returns

Tensor<T>
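
One common way to remove a source in a mask-based system is to apply the complement of that source's soft mask to the mixture, so that everything except the source is kept. The sketch below illustrates this on a single spectrum frame. Whether `RemoveSource` works this way internally is not stated on this page, so treat the approach and the `RemoveSourceSketch` name as assumptions.

```csharp
using System;

// Illustrative only: hypothetical helper, not the library's RemoveSource.
public static class RemoveSourceSketch
{
    // Keeps everything except the masked source by applying (1 - mask) per bin.
    public static double[] ApplyComplementMask(double[] mixtureMagnitude, double[] sourceMask)
    {
        if (mixtureMagnitude.Length != sourceMask.Length)
            throw new ArgumentException("Mask and spectrum must have the same length.");

        var remainder = new double[mixtureMagnitude.Length];
        for (int i = 0; i < remainder.Length; i++)
            remainder[i] = mixtureMagnitude[i] * (1.0 - sourceMask[i]);
        return remainder;
    }
}
```

A bin where the source mask is 1 is removed entirely, while a bin where the mask is 0 passes through unchanged.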

Separate(Tensor<T>)

Separates all sources from the audio mix.

public SourceSeparationResult<T> Separate(Tensor<T> audio)

Parameters

audio Tensor<T>

Returns

SourceSeparationResult<T>

SeparateAsync(Tensor<T>, CancellationToken)

Separates all sources asynchronously.

public Task<SourceSeparationResult<T>> SeparateAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>
cancellationToken CancellationToken

Returns

Task<SourceSeparationResult<T>>

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

Train(Tensor<T>, Tensor<T>)

Trains the model on mixed audio and ground truth stems.

public override void Train(Tensor<T> input, Tensor<T> expected)

Parameters

input Tensor<T>
expected Tensor<T>

UpdateParameters(Vector<T>)

Updates parameters from a flattened parameter vector.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>