Interface IMusicSourceSeparator<T>
- Namespace
- AiDotNet.Interfaces
- Assembly
- AiDotNet.dll
Interface for music source separation models that isolate individual instruments/vocals from a mix.
public interface IMusicSourceSeparator<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
- T: The numeric type used for calculations.
Remarks
Music source separation (also called audio source separation or "unmixing") takes a mixed audio signal and separates it into individual components like vocals, drums, bass, and other instruments.
For Beginners: Source separation is like un-mixing a smoothie back into its original fruits.
How it works:
- The mixed audio is converted to a spectrogram
- A neural network learns which parts belong to which source
- Masks are applied to isolate each source
- Individual spectrograms are converted back to audio
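The masking step in the list above can be sketched with plain arrays. This is a minimal illustration, not the library's implementation: the helper names `RatioMask` and `ApplyMask` are hypothetical, the spectrogram values are made up, and the per-source estimates stand in for what a trained network would predict.

```csharp
using System;

// Toy magnitude spectrograms, indexed [frame, frequencyBin].
// Assumption: in a real model the per-source estimates come from the network.
float[,] vocalsEstimate = { { 3f, 0f }, { 1f, 1f } };
float[,] drumsEstimate  = { { 1f, 2f }, { 1f, 3f } };
float[,] mixture        = { { 4f, 2f }, { 2f, 4f } };

float[,] vocalsMask = RatioMask(vocalsEstimate, drumsEstimate);
float[,] vocalsOnly = ApplyMask(mixture, vocalsMask);

Console.WriteLine($"mask[0,0] = {vocalsMask[0, 0]}, masked[0,0] = {vocalsOnly[0, 0]}");

// Soft (ratio) mask: the share of each bin's magnitude attributed to the target.
static float[,] RatioMask(float[,] target, float[,] other)
{
    int frames = target.GetLength(0), bins = target.GetLength(1);
    var mask = new float[frames, bins];
    for (int t = 0; t < frames; t++)
    {
        for (int f = 0; f < bins; f++)
        {
            float total = target[t, f] + other[t, f];
            mask[t, f] = total > 0f ? target[t, f] / total : 0f; // silent bins get 0
        }
    }
    return mask;
}

// Element-wise multiplication of the mixture spectrogram by the mask.
static float[,] ApplyMask(float[,] spectrogram, float[,] mask)
{
    int frames = spectrogram.GetLength(0), bins = spectrogram.GetLength(1);
    var result = new float[frames, bins];
    for (int t = 0; t < frames; t++)
        for (int f = 0; f < bins; f++)
            result[t, f] = spectrogram[t, f] * mask[t, f];
    return result;
}
```

Converting the masked spectrogram back to a waveform (the last step in the list) requires an inverse transform, which this sketch leaves out.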
Common separations:
- 2-stem: Vocals vs Accompaniment
- 4-stem: Vocals, Drums, Bass, Other
- 5-stem: Vocals, Drums, Bass, Piano, Other
Use cases:
- Karaoke (remove vocals)
- Remixing (isolate and rearrange parts)
- Music transcription (analyze individual instruments)
- Sample extraction (get drum loops, vocal hooks)
- Music education (practice with isolated parts)
Popular models:
- Demucs (Facebook/Meta)
- Spleeter (Deezer)
- Open-Unmix
This interface extends IFullModel<T, Tensor<T>, Tensor<T>>, so both the model input and the model output are Tensor<T> audio data.
Properties
IsOnnxMode
Gets whether this model is running in ONNX inference mode.
bool IsOnnxMode { get; }
Property Value
- bool
NumStems
Gets the number of stems/sources this model produces.
int NumStems { get; }
Property Value
- int
SampleRate
Gets the expected sample rate for input audio.
int SampleRate { get; }
Property Value
- int
SupportedSources
Gets the sources this model can separate.
IReadOnlyList<string> SupportedSources { get; }
Property Value
- IReadOnlyList<string>
Remarks
Common sources: "vocals", "drums", "bass", "other", "piano".
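This page does not say what happens when a caller requests a source the model does not support, so checking the name against SupportedSources first may be a reasonable precaution. The sketch below uses a stand-in list in place of a real separator. The helper `IsSupported` is hypothetical, and the case-sensitive (ordinal) comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for separator.SupportedSources on a 4-stem model (assumed values).
IReadOnlyList<string> supportedSources = new[] { "vocals", "drums", "bass", "other" };

bool canExtractDrums = IsSupported(supportedSources, "drums");
bool canExtractPiano = IsSupported(supportedSources, "piano");

Console.WriteLine($"drums: {canExtractDrums}, piano: {canExtractPiano}");

// Assumption: source names are matched exactly, including case.
static bool IsSupported(IReadOnlyList<string> sources, string requested) =>
    sources.Contains(requested, StringComparer.Ordinal);
```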
Methods
ExtractSource(Tensor<T>, string)
Extracts a specific source from the mix.
Tensor<T> ExtractSource(Tensor<T> audio, string source)
Parameters
- audio (Tensor<T>): Mixed audio waveform tensor.
- source (string): The source to extract (e.g., "vocals", "drums").
Returns
- Tensor<T>
Isolated audio for the requested source.
Remarks
For Beginners: Use this when you only need one specific part. It can be more efficient than separating every source when all you need is, for example, the vocals.
GetSourceMask(Tensor<T>, string)
Gets the soft mask for a specific source.
Tensor<T> GetSourceMask(Tensor<T> audio, string source)
Parameters
- audio (Tensor<T>): Mixed audio waveform tensor.
- source (string): The source to get the mask for.
Returns
- Tensor<T>
Soft mask tensor indicating how much of each time-frequency bin belongs to this source.
Remarks
For Beginners: A soft mask shows the probability that each part of the audio belongs to a specific source. Values near 1 mean "definitely this source", values near 0 mean "definitely not this source".
Remix(SourceSeparationResult<T>, IReadOnlyDictionary<string, double>)
Remixes the separated sources with custom volumes.
Tensor<T> Remix(SourceSeparationResult<T> separationResult, IReadOnlyDictionary<string, double> sourceVolumes)
Parameters
- separationResult (SourceSeparationResult<T>): Previous separation result.
- sourceVolumes (IReadOnlyDictionary<string, double>): Volume multipliers for each source (1.0 = original).
Returns
- Tensor<T>
Remixed audio with adjusted source volumes.
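Conceptually, remixing with volume multipliers amounts to a weighted sum of the separated stems. The following is a minimal sketch under stated assumptions, using plain arrays rather than SourceSeparationResult<T>: the helper `RemixStems` is hypothetical, and treating a source that is absent from the volume map as 1.0 is an assumption, not documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Two toy stems of equal length (made-up sample values).
var stems = new Dictionary<string, float[]>
{
    ["vocals"] = new[] { 0.5f, -0.5f },
    ["drums"]  = new[] { 0.25f, 0.25f },
};

// Double the vocals and mute the drums.
var volumes = new Dictionary<string, double> { ["vocals"] = 2.0, ["drums"] = 0.0 };

float[] remixed = RemixStems(stems, volumes, sampleCount: 2);
Console.WriteLine(string.Join(", ", remixed));

static float[] RemixStems(
    IReadOnlyDictionary<string, float[]> stems,
    IReadOnlyDictionary<string, double> volumes,
    int sampleCount)
{
    var output = new float[sampleCount];
    foreach (var stem in stems)
    {
        // Assumption: a source missing from the volume map keeps its original level.
        double gain = volumes.TryGetValue(stem.Key, out double v) ? v : 1.0;
        for (int i = 0; i < sampleCount; i++)
            output[i] += (float)(gain * stem.Value[i]);
    }
    return output;
}
```

Multipliers above 1.0 can push samples outside the usual [-1, 1] range, so a real pipeline may need to clip or normalize the result.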
RemoveSource(Tensor<T>, string)
Removes a specific source from the mix.
Tensor<T> RemoveSource(Tensor<T> audio, string source)
Parameters
- audio (Tensor<T>): Mixed audio waveform tensor.
- source (string): The source to remove (e.g., "vocals" for karaoke).
Returns
- Tensor<T>
Audio with the specified source removed.
Remarks
For Beginners: This removes a source instead of extracting it.
- RemoveSource(audio, "vocals") = karaoke track
- RemoveSource(audio, "drums") = drumless practice track
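One simple way to picture source removal is subtracting the extracted source from the mix, sample by sample. This page does not state how implementations compute RemoveSource, so the sketch below is only an illustration of that idea. The helper `Subtract` and the sample values are hypothetical.

```csharp
using System;

// Toy waveforms of equal length (made-up values).
float[] mix    = { 0.75f, -0.25f, 0.5f };
float[] vocals = { 0.5f, -0.5f, 0.25f };

float[] karaoke = Subtract(mix, vocals);
Console.WriteLine(string.Join(", ", karaoke));

// Assumption: the mix is the plain sum of its sources, so removing one
// source is the same as subtracting it from the mix.
static float[] Subtract(float[] mixture, float[] source)
{
    if (mixture.Length != source.Length)
        throw new ArgumentException("Mixture and source must have the same length.");

    var result = new float[mixture.Length];
    for (int i = 0; i < mixture.Length; i++)
        result[i] = mixture[i] - source[i];
    return result;
}
```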
Separate(Tensor<T>)
Separates all sources from the audio mix.
SourceSeparationResult<T> Separate(Tensor<T> audio)
Parameters
- audio (Tensor<T>): Mixed audio waveform tensor with shape [samples] or [channels, samples].
Returns
- SourceSeparationResult<T>
Separation result with all isolated sources.
Remarks
For Beginners: This is the main method for separating audio.
- Pass in a mixed song
- Get back individual tracks for each instrument/voice
SeparateAsync(Tensor<T>, CancellationToken)
Separates all sources asynchronously.
Task<SourceSeparationResult<T>> SeparateAsync(Tensor<T> audio, CancellationToken cancellationToken = default)
Parameters
- audio (Tensor<T>): Mixed audio waveform tensor.
- cancellationToken (CancellationToken): Cancellation token for the async operation.
Returns
- Task<SourceSeparationResult<T>>
A task that completes with the separation result containing all isolated sources.
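The cancellation pattern a caller would use with SeparateAsync can be sketched with a stand-in method. `SeparateStubAsync` is hypothetical: it only copies samples and is not the library's separation logic. The token is cancelled up front so the outcome is deterministic.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();
cts.Cancel(); // cancel before starting so the example always takes the cancelled path

bool wasCancelled = false;
try
{
    await SeparateStubAsync(new float[1024], cts.Token);
}
catch (OperationCanceledException)
{
    // Awaiting a cancelled task surfaces as OperationCanceledException
    // (or its subclass TaskCanceledException).
    wasCancelled = true;
}

Console.WriteLine($"Cancelled: {wasCancelled}");

// Hypothetical stand-in for a long-running separation call.
static Task<float[]> SeparateStubAsync(float[] audio, CancellationToken cancellationToken = default) =>
    Task.Run(() =>
    {
        var output = new float[audio.Length];
        for (int i = 0; i < audio.Length; i++)
        {
            // Check the token periodically so long inputs can be abandoned early.
            if (i % 256 == 0) cancellationToken.ThrowIfCancellationRequested();
            output[i] = audio[i];
        }
        return output;
    }, cancellationToken);
```

In practice the token would more likely be cancelled by a timeout (for example `new CancellationTokenSource(TimeSpan.FromSeconds(30))`) or by a user action while the separation is running.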