Class MusicSourceSeparator<T>
- Namespace
- AiDotNet.Audio.SourceSeparation
- Assembly
- AiDotNet.dll
Music source separation model for separating audio into stems (vocals, drums, bass, other).
public class MusicSourceSeparator<T> : AudioNeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IMusicSourceSeparator<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
- Inheritance
- MusicSourceSeparator<T>
Remarks
This class implements a U-Net-based source separation approach similar to Spleeter and Demucs. The model separates mixed audio into individual instrument stems using spectral masking.
For Beginners: Source separation is like unmixing a smoothie:
- Input: Mixed audio with multiple instruments and vocals
- Output: Separate tracks for vocals, drums, bass, and other instruments
- Uses neural networks to predict which parts of the spectrum belong to each source
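To make the idea of spectral masking concrete, the following is a minimal sketch using plain arrays rather than the library's Tensor<T> type. The helper name ApplySoftMasks and the toy values are assumptions made for illustration only; they are not part of AiDotNet, and the class's real pipeline may differ in detail.

```csharp
using System;

// Hypothetical helper (not part of AiDotNet): multiplies each source's soft mask
// element-wise with the mixture magnitude spectrum to estimate that source.
static double[][] ApplySoftMasks(double[] mixMagnitude, double[][] sourceMasks)
{
    var result = new double[sourceMasks.Length][];
    for (int s = 0; s < sourceMasks.Length; s++)
    {
        if (sourceMasks[s].Length != mixMagnitude.Length)
            throw new ArgumentException("Each mask needs one value per frequency bin.");

        result[s] = new double[mixMagnitude.Length];
        for (int k = 0; k < mixMagnitude.Length; k++)
            result[s][k] = sourceMasks[s][k] * mixMagnitude[k];
    }
    return result;
}

// Toy mixture with three frequency bins and two sources.
double[] mix = { 1.0, 2.0, 4.0 };
double[][] masks =
{
    new[] { 0.75, 0.5, 0.0 }, // mask for "vocals"
    new[] { 0.25, 0.5, 1.0 }, // mask for "other"
};

double[][] stems = ApplySoftMasks(mix, masks);
Console.WriteLine(string.Join(" ", stems[0]));
Console.WriteLine(string.Join(" ", stems[1]));
```

Because the two toy masks sum to one in every bin, the estimated stems add back up to the original mixture magnitudes.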
Usage with ONNX model:
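// Creates the separator, downloading models if needed.
var separator = await MusicSourceSeparator<float>.CreateAsync();
// mixedAudio is a Tensor<float> containing the mixture.
var stems = separator.Separate(mixedAudio);
var vocals = stems.GetSource("vocals");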
Usage for training:
var architecture = new NeuralNetworkArchitecture<float>(inputFeatures: 1025, outputSize: 4 * 1025);
var separator = new MusicSourceSeparator<float>(architecture);
separator.Train(mixed, stems);
Constructors
MusicSourceSeparator(SourceSeparationOptions?)
Creates a MusicSourceSeparator for CPU-based spectral processing.
public MusicSourceSeparator(SourceSeparationOptions? options = null)
Parameters
options (SourceSeparationOptions): Optional configuration options.
MusicSourceSeparator(NeuralNetworkArchitecture<T>, SourceSeparationOptions?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?)
Creates a MusicSourceSeparator for native training mode.
public MusicSourceSeparator(NeuralNetworkArchitecture<T> architecture, SourceSeparationOptions? options = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): Neural network architecture.
options (SourceSeparationOptions): Optional configuration options.
optimizer (IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>): Optional custom optimizer.
MusicSourceSeparator(string, SourceSeparationOptions?)
Creates a MusicSourceSeparator for ONNX inference mode.
public MusicSourceSeparator(string modelPath, SourceSeparationOptions? options = null)
Parameters
modelPath (string): Path to the ONNX model file.
options (SourceSeparationOptions): Optional configuration options.
Fields
FiveStemSources
Source names for 5-stem separation.
public static readonly string[] FiveStemSources
Field Value
- string[]
StandardSources
Standard source names for 4-stem separation.
public static readonly string[] StandardSources
Field Value
- string[]
TwoStemSources
Source names for 2-stem separation.
public static readonly string[] TwoStemSources
Field Value
- string[]
Properties
NumStems
Gets the number of stems/sources this model produces.
public int NumStems { get; }
Property Value
- int
SupportedSources
Gets the sources this model can separate.
public IReadOnlyList<string> SupportedSources { get; }
Property Value
- IReadOnlyList<string>
Methods
CreateAsync(SourceSeparationOptions?, IProgress<double>?, CancellationToken)
Creates a MusicSourceSeparator asynchronously, downloading models if needed.
public static Task<MusicSourceSeparator<T>> CreateAsync(SourceSeparationOptions? options = null, IProgress<double>? progress = null, CancellationToken cancellationToken = default)
Parameters
options (SourceSeparationOptions)
progress (IProgress<double>)
cancellationToken (CancellationToken)
Returns
- Task<MusicSourceSeparator<T>>
CreateCpuOnly(SourceSeparationOptions?)
Creates a MusicSourceSeparator for CPU-based spectral processing without neural network.
public static MusicSourceSeparator<T> CreateCpuOnly(SourceSeparationOptions? options = null)
Parameters
options (SourceSeparationOptions)
Returns
- MusicSourceSeparator<T>
CreateNewInstance()
Creates a new instance of this network type.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader)
Dispose(bool)
Disposes of managed resources.
protected override void Dispose(bool disposing)
Parameters
disposing (bool)
ExtractSource(Tensor<T>, string)
Extracts a specific source from the mix.
public Tensor<T> ExtractSource(Tensor<T> audio, string source)
Parameters
audio (Tensor<T>)
source (string)
Returns
- Tensor<T>
GetModelMetadata()
Gets model metadata for serialization.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
GetSourceMask(Tensor<T>, string)
Gets the soft mask for a specific source.
public Tensor<T> GetSourceMask(Tensor<T> audio, string source)
Parameters
audio (Tensor<T>)
source (string)
Returns
- Tensor<T>
InitializeLayers()
Initializes the neural network layers.
protected override void InitializeLayers()
PostprocessOutput(Tensor<T>)
Postprocesses model output (applies sigmoid to mask values).
protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)
Parameters
modelOutput (Tensor<T>)
Returns
- Tensor<T>
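As an illustration of the sigmoid step mentioned for PostprocessOutput, here is a minimal sketch on plain arrays. The names Sigmoid and LogitsToMask are hypothetical and not part of AiDotNet; the sketch only shows how a sigmoid maps unbounded network outputs into mask values between 0 and 1.

```csharp
using System;

// Logistic sigmoid: maps any real-valued input into the open interval (0, 1).
static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

// Hypothetical helper (not AiDotNet API): converts raw outputs to soft mask values.
static double[] LogitsToMask(double[] logits)
{
    var mask = new double[logits.Length];
    for (int i = 0; i < logits.Length; i++)
        mask[i] = Sigmoid(logits[i]);
    return mask;
}

double[] rawOutput = { -4.0, 0.0, 4.0 };
double[] maskValues = LogitsToMask(rawOutput);
Console.WriteLine(string.Join(" ", maskValues));
```

Strongly negative outputs map close to 0 (the bin is mostly excluded from the source), an output of 0 maps to exactly 0.5, and strongly positive outputs map close to 1.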
Predict(Tensor<T>)
Predicts source masks from spectrogram magnitude.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>)
Returns
- Tensor<T>
PreprocessAudio(Tensor<T>)
Preprocesses raw audio into spectrogram format.
protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)
Parameters
rawAudio (Tensor<T>)
Returns
- Tensor<T>
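The exact transform used by PreprocessAudio is not described on this page. As a rough sketch of what turning raw audio into spectrogram format can involve, the code below computes the magnitude spectrum of a single Hann-windowed frame with a naive DFT. The function name FrameMagnitude, the window choice, and the frame length are assumptions for illustration. A frame of N samples yields N / 2 + 1 bins, so a 2048-sample frame would give 1025 bins, which is consistent with the inputFeatures value in the training example above.

```csharp
using System;

// Hypothetical helper (not AiDotNet API): magnitude spectrum of one audio frame,
// using a periodic Hann window and a naive O(N^2) DFT over the non-negative bins.
static double[] FrameMagnitude(double[] frame)
{
    int n = frame.Length;
    int bins = n / 2 + 1;
    var magnitude = new double[bins];

    for (int k = 0; k < bins; k++)
    {
        double re = 0.0, im = 0.0;
        for (int t = 0; t < n; t++)
        {
            double window = 0.5 - 0.5 * Math.Cos(2 * Math.PI * t / n);
            double angle = 2 * Math.PI * k * t / n;
            re += window * frame[t] * Math.Cos(angle);
            im -= window * frame[t] * Math.Sin(angle);
        }
        magnitude[k] = Math.Sqrt(re * re + im * im);
    }
    return magnitude;
}

// A pure tone that completes exactly three cycles within a 16-sample frame.
const int frameLength = 16;
var tone = new double[frameLength];
for (int i = 0; i < frameLength; i++)
    tone[i] = Math.Cos(2 * Math.PI * 3 * i / frameLength);

double[] spectrum = FrameMagnitude(tone);
Console.WriteLine($"bins: {spectrum.Length}");
```

A full spectrogram repeats this over overlapping frames; a production implementation would use an FFT rather than the quadratic loop shown here.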
Remix(SourceSeparationResult<T>, IReadOnlyDictionary<string, double>)
Remixes the separated sources with custom volumes.
public Tensor<T> Remix(SourceSeparationResult<T> separationResult, IReadOnlyDictionary<string, double> sourceVolumes)
Parameters
separationResult (SourceSeparationResult<T>)
sourceVolumes (IReadOnlyDictionary<string, double>)
Returns
- Tensor<T>
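Remixing with custom volumes can be thought of as a weighted sum of the separated stems. The sketch below shows that idea on plain sample arrays; RemixStems is a hypothetical helper, not the library's implementation, and the choice to treat a source missing from the volume map as unit gain is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not AiDotNet API): sums the stems sample by sample,
// scaling each stem by its requested volume.
static double[] RemixStems(
    IReadOnlyDictionary<string, double[]> stemSamples,
    IReadOnlyDictionary<string, double> sourceVolumes)
{
    var output = Array.Empty<double>();
    bool initialized = false;

    foreach (var stem in stemSamples)
    {
        if (!initialized)
        {
            output = new double[stem.Value.Length];
            initialized = true;
        }
        if (stem.Value.Length != output.Length)
            throw new ArgumentException("All stems must have the same length.");

        // Assumption: a source without an explicit volume keeps unit gain.
        double gain = sourceVolumes.TryGetValue(stem.Key, out double volume) ? volume : 1.0;
        for (int i = 0; i < output.Length; i++)
            output[i] += gain * stem.Value[i];
    }
    return output;
}

var separated = new Dictionary<string, double[]>
{
    ["vocals"] = new[] { 1.0, 2.0 },
    ["drums"] = new[] { 10.0, 20.0 },
};
var volumes = new Dictionary<string, double> { ["vocals"] = 0.5, ["drums"] = 0.0 };

double[] remixed = RemixStems(separated, volumes);
Console.WriteLine(string.Join(" ", remixed));
```

Setting a volume to 0 mutes that source, and values above 1 boost it.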
RemoveSource(Tensor<T>, string)
Removes a specific source from the mix.
public Tensor<T> RemoveSource(Tensor<T> audio, string source)
Parameters
audio (Tensor<T>)
source (string)
Returns
- Tensor<T>
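One common way to remove a source in a mask-based system is to apply the complement of that source's soft mask to the mixture. This page does not say how RemoveSource is implemented, so the sketch below is only an illustration of that general technique; RemoveWithMask is a hypothetical helper and not part of AiDotNet.

```csharp
using System;

// Hypothetical helper (not AiDotNet API): keeps the fraction of each frequency bin
// that the soft mask does NOT attribute to the source being removed.
static double[] RemoveWithMask(double[] mixMagnitude, double[] sourceMask)
{
    if (mixMagnitude.Length != sourceMask.Length)
        throw new ArgumentException("Mask and mixture must have the same number of bins.");

    var residual = new double[mixMagnitude.Length];
    for (int k = 0; k < mixMagnitude.Length; k++)
    {
        // Clamp defensively so the complement also stays within [0, 1].
        double keep = 1.0 - Math.Clamp(sourceMask[k], 0.0, 1.0);
        residual[k] = keep * mixMagnitude[k];
    }
    return residual;
}

double[] mix = { 2.0, 4.0 };
double[] vocalMask = { 0.25, 1.0 };

double[] instrumental = RemoveWithMask(mix, vocalMask);
Console.WriteLine(string.Join(" ", instrumental));
```

A bin that the mask assigns entirely to the removed source (mask value 1) is silenced, while a bin with mask value 0 passes through unchanged.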
Separate(Tensor<T>)
Separates all sources from the audio mix.
public SourceSeparationResult<T> Separate(Tensor<T> audio)
Parameters
audio (Tensor<T>)
Returns
- SourceSeparationResult<T>
SeparateAsync(Tensor<T>, CancellationToken)
Separates all sources asynchronously.
public Task<SourceSeparationResult<T>> SeparateAsync(Tensor<T> audio, CancellationToken cancellationToken = default)
Parameters
audio (Tensor<T>)
cancellationToken (CancellationToken)
Returns
- Task<SourceSeparationResult<T>>
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter)
Train(Tensor<T>, Tensor<T>)
Trains the model on mixed audio and ground truth stems.
public override void Train(Tensor<T> input, Tensor<T> expected)
Parameters
input (Tensor<T>)
expected (Tensor<T>)
UpdateParameters(Vector<T>)
Updates parameters from a flattened parameter vector.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>)