Class AudioNeuralNetworkBase<T>
Base class for audio-focused neural networks that can operate in both ONNX inference and native training modes.
public abstract class AudioNeuralNetworkBase<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations.
- Inheritance
- NeuralNetworkBase<T>
- AudioNeuralNetworkBase<T>
Remarks
This class extends NeuralNetworkBase<T> to provide audio-specific functionality while maintaining full integration with the AiDotNet neural network infrastructure.
For Beginners: Audio neural networks process sound data (like speech or music). This base class provides:
- Support for pre-trained ONNX models (fast inference with existing models)
- Full training capability from scratch (like other neural networks)
- Audio preprocessing utilities (mel spectrograms, etc.)
- Sample rate handling
You can use this class in two ways:
- Load a pre-trained ONNX model for quick inference
- Build and train a new model from scratch
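The two usage modes can be pictured as a single prediction pipeline that switches backend. The following is a minimal sketch, not the library's implementation: `DualModeAudioModelSketch`, `DemoModel`, and the `Predict` method are hypothetical stand-ins that use `float[]` instead of `Tensor<T>`, and the order of the steps is an assumption based on the member descriptions on this page.

```csharp
using System;

// Hypothetical stand-in for an audio model; it does not use AiDotNet types.
public abstract class DualModeAudioModelSketch
{
    // Mirrors the idea of IsOnnxMode: true when a pre-trained model is loaded.
    public bool IsOnnxMode { get; protected set; }

    // Training is only meaningful when native layers are in use.
    public bool SupportsTraining => !IsOnnxMode;

    protected abstract float[] PreprocessAudio(float[] rawAudio);
    protected abstract float[] RunOnnxInference(float[] features);
    protected abstract float[] Forward(float[] features);
    protected abstract float[] PostprocessOutput(float[] modelOutput);

    // Assumed pipeline: preprocess, choose the backend, postprocess.
    public float[] Predict(float[] rawAudio)
    {
        float[] features = PreprocessAudio(rawAudio);
        float[] output = IsOnnxMode ? RunOnnxInference(features) : Forward(features);
        return PostprocessOutput(output);
    }
}

// Toy subclass whose two backends return distinguishable values.
public sealed class DemoModel : DualModeAudioModelSketch
{
    public DemoModel(bool onnxMode) { IsOnnxMode = onnxMode; }

    protected override float[] PreprocessAudio(float[] rawAudio) => rawAudio;
    protected override float[] RunOnnxInference(float[] features) => new[] { 1f };
    protected override float[] Forward(float[] features) => new[] { 2f };
    protected override float[] PostprocessOutput(float[] modelOutput) => modelOutput;
}
```

In this sketch, `new DemoModel(true).Predict(...)` goes through the ONNX path and `new DemoModel(false).Predict(...)` goes through the native layers, while the pre- and postprocessing steps are shared by both.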
Constructors
AudioNeuralNetworkBase(NeuralNetworkArchitecture<T>, ILossFunction<T>?, double)
Initializes a new instance of the AudioNeuralNetworkBase<T> class with the specified architecture.
protected AudioNeuralNetworkBase(NeuralNetworkArchitecture<T> architecture, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture.
lossFunction (ILossFunction<T>): The loss function to use. If null, a default MSE loss is used.
maxGradNorm (double): Maximum gradient norm for gradient clipping.
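To illustrate what a maximum gradient norm usually means, here is a minimal sketch of clipping by L2 norm. `GradientClippingSketch` and `ClipByNorm` are hypothetical names, and the page does not state which clipping variant the library applies, so treat this as the common technique rather than the actual behavior.

```csharp
using System;

public static class GradientClippingSketch
{
    // Rescales the gradient in place so its L2 norm does not exceed maxGradNorm.
    // Returns the norm measured before clipping.
    public static double ClipByNorm(double[] gradient, double maxGradNorm = 1.0)
    {
        double sumOfSquares = 0.0;
        foreach (double g in gradient) sumOfSquares += g * g;
        double norm = Math.Sqrt(sumOfSquares);

        // Only shrink gradients that are too large; small ones are left untouched.
        if (norm > maxGradNorm && norm > 0.0)
        {
            double scale = maxGradNorm / norm;
            for (int i = 0; i < gradient.Length; i++) gradient[i] *= scale;
        }
        return norm;
    }
}
```

For example, a gradient of (3, 4) has norm 5; clipping it to a maximum norm of 1 scales it to roughly (0.6, 0.8), which keeps its direction and limits its length.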
Properties
DefaultLossFunction
Gets the default loss function for this model.
public override ILossFunction<T> DefaultLossFunction { get; }
Property Value
- ILossFunction<T>
IsOnnxMode
Gets whether this model is running in ONNX inference mode.
public bool IsOnnxMode { get; }
Property Value
- bool
Remarks
When true, the model uses pre-trained ONNX weights for inference. When false, the model uses native layers and can be trained.
MelSpec
Gets or sets the mel spectrogram extractor used for preprocessing.
protected MelSpectrogram<T>? MelSpec { get; set; }
Property Value
- MelSpectrogram<T>
NumMels
Gets the number of mel spectrogram channels used by this model.
public int NumMels { get; protected set; }
Property Value
- int
Remarks
Mel spectrograms divide the frequency range into perceptual bands. Common values: 64, 80, or 128 mel bins.
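The idea of perceptual bands can be made concrete with the widely used HTK-style mel formula, mel = 2595 · log10(1 + f / 700). The sketch below is an illustration only: `MelScaleSketch` and its methods are hypothetical, and MelSpectrogram<T> may use a different mel variant or different band edges.

```csharp
using System;

public static class MelScaleSketch
{
    // HTK-style conversion from frequency in Hz to mels.
    public static double HzToMel(double hz) => 2595.0 * Math.Log10(1.0 + hz / 700.0);

    // Inverse conversion from mels back to Hz.
    public static double MelToHz(double mel) => 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);

    // Returns numMels band-center frequencies in Hz, evenly spaced on the
    // mel scale strictly between minHz and maxHz.
    public static double[] BandCenters(int numMels, double minHz, double maxHz)
    {
        double minMel = HzToMel(minHz);
        double maxMel = HzToMel(maxHz);
        double step = (maxMel - minMel) / (numMels + 1);

        var centers = new double[numMels];
        for (int i = 0; i < numMels; i++)
        {
            centers[i] = MelToHz(minMel + (i + 1) * step);
        }
        return centers;
    }
}
```

Because the spacing is uniform in mels rather than in Hz, the bands are narrow at low frequencies and progressively wider at high frequencies, which roughly follows how pitch differences are perceived.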
OnnxDecoder
Gets or sets the ONNX decoder model (for encoder-decoder architectures).
protected OnnxModel<T>? OnnxDecoder { get; set; }
Property Value
- OnnxModel<T>
OnnxEncoder
Gets or sets the ONNX encoder model (for encoder-decoder architectures).
protected OnnxModel<T>? OnnxEncoder { get; set; }
Property Value
- OnnxModel<T>
OnnxModel
Gets or sets the ONNX model (for single-model architectures).
protected OnnxModel<T>? OnnxModel { get; set; }
Property Value
- OnnxModel<T>
SampleRate
Gets the sample rate expected by this model.
public int SampleRate { get; protected set; }
Property Value
- int
Remarks
Common values: 16000 Hz (speech), 22050 Hz (music), 44100 Hz (high quality). Input audio should be resampled to match this rate.
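As a rough illustration of resampling input to the expected rate, the sketch below uses linear interpolation. `ResampleSketch.ResampleLinear` is a hypothetical helper, not part of the documented API. Linear interpolation performs no anti-aliasing, so a real pipeline that downsamples would normally apply a low-pass filter or use a dedicated resampler.

```csharp
using System;

public static class ResampleSketch
{
    // Resamples a mono waveform from sourceRate to targetRate by linear interpolation.
    public static float[] ResampleLinear(float[] input, int sourceRate, int targetRate)
    {
        if (input.Length == 0 || sourceRate == targetRate)
        {
            return (float[])input.Clone();
        }

        // Output length scales with the ratio of the two rates.
        int outputLength = (int)((long)input.Length * targetRate / sourceRate);
        var output = new float[outputLength];
        double step = (double)sourceRate / targetRate;

        for (int i = 0; i < outputLength; i++)
        {
            // Fractional position of output sample i in the input signal.
            double position = i * step;
            int left = Math.Min((int)position, input.Length - 1);
            int right = Math.Min(left + 1, input.Length - 1);
            double fraction = position - left;
            output[i] = (float)(input[left] * (1.0 - fraction) + input[right] * fraction);
        }
        return output;
    }
}
```

One second of audio recorded at 44100 Hz (44100 samples) becomes 16000 samples after calling `ResampleLinear(samples, 44100, 16000)`.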
SupportsTraining
Gets whether this network supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Remarks
In ONNX mode, training is not supported; the model is inference-only. In native mode, training is fully supported.
Methods
CreateMelSpectrogram(int, int, int, int)
Creates a mel spectrogram extractor with the model's settings.
protected MelSpectrogram<T> CreateMelSpectrogram(int sampleRate = 16000, int nMels = 80, int nFft = 1024, int hopLength = 256)
Parameters
sampleRate (int): Sample rate of input audio.
nMels (int): Number of mel bands.
nFft (int): FFT window size.
hopLength (int): Hop length between frames.
Returns
- MelSpectrogram<T>
A configured mel spectrogram extractor.
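The relationship between these parameters and the size of the resulting spectrogram can be worked out with simple arithmetic. The sketch below assumes no padding or centering of frames; `SpectrogramShapeSketch` is a hypothetical helper, and the extractor created by this method may pad the signal and therefore produce a different frame count.

```csharp
using System;

public static class SpectrogramShapeSketch
{
    // Number of full analysis frames that fit in the signal, assuming no padding.
    public static int FrameCount(int numSamples, int nFft = 1024, int hopLength = 256)
        => numSamples < nFft ? 0 : 1 + (numSamples - nFft) / hopLength;

    // Number of non-redundant frequency bins of a real-valued FFT of size nFft.
    public static int FrequencyBins(int nFft = 1024) => nFft / 2 + 1;

    // Time between the starts of consecutive frames, in milliseconds.
    public static double HopDurationMs(int sampleRate = 16000, int hopLength = 256)
        => 1000.0 * hopLength / sampleRate;
}
```

With the default arguments, one second of 16000 Hz audio yields 59 frames under this no-padding assumption, each frame covers 1024 samples (64 ms), consecutive frames start 16 ms apart, and the 513 FFT bins of each frame are then reduced to nMels = 80 mel bands.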
Dispose(bool)
Disposes of resources used by this model.
protected override void Dispose(bool disposing)
Parameters
disposing (bool): True if disposing managed resources.
Forward(Tensor<T>)
Performs a forward pass through the native neural network layers.
protected virtual Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Preprocessed input tensor.
Returns
- Tensor<T>
Model output tensor.
PostprocessOutput(Tensor<T>)
Postprocesses model output into the final result format.
protected abstract Tensor<T> PostprocessOutput(Tensor<T> modelOutput)
Parameters
modelOutput (Tensor<T>): Raw output from the model.
Returns
- Tensor<T>
Postprocessed output in the expected format.
PreprocessAudio(Tensor<T>)
Preprocesses raw audio for model input.
protected abstract Tensor<T> PreprocessAudio(Tensor<T> rawAudio)
Parameters
rawAudio (Tensor<T>): Raw audio waveform tensor with shape [samples] or [batch, samples].
Returns
- Tensor<T>
Preprocessed audio features suitable for model input.
Remarks
For Beginners: Raw audio is just a series of numbers representing sound pressure. Neural networks often work better with transformed representations like mel spectrograms. This method converts raw audio into the format the model expects.
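As one possible first stage of such preprocessing, the sketch below removes the DC offset from a mono waveform, scales it to a peak amplitude of 1, and wraps it as a batch of one so that [samples] input ends up in [batch, samples] form. `PreprocessSketch.ToNormalizedBatch` is a hypothetical helper that uses plain arrays instead of `Tensor<T>`; a concrete model would typically continue from here to a mel spectrogram.

```csharp
using System;

public static class PreprocessSketch
{
    // Returns a [1, samples] batch containing the centered, peak-normalized waveform.
    public static float[][] ToNormalizedBatch(float[] waveform)
    {
        var normalized = new float[waveform.Length];
        if (waveform.Length == 0) return new[] { normalized };

        // Step 1: compute the mean so the DC offset can be removed.
        double mean = 0.0;
        foreach (float sample in waveform) mean += sample;
        mean /= waveform.Length;

        // Step 2: subtract the mean and track the largest absolute value.
        double peak = 0.0;
        for (int i = 0; i < waveform.Length; i++)
        {
            normalized[i] = (float)(waveform[i] - mean);
            peak = Math.Max(peak, Math.Abs(normalized[i]));
        }

        // Step 3: scale to a peak of 1, skipping silent input to avoid dividing by zero.
        if (peak > 0.0)
        {
            for (int i = 0; i < normalized.Length; i++)
            {
                normalized[i] = (float)(normalized[i] / peak);
            }
        }
        return new[] { normalized };
    }
}
```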
RunOnnxInference(Tensor<T>)
Runs inference using ONNX model(s).
protected virtual Tensor<T> RunOnnxInference(Tensor<T> input)
Parameters
input (Tensor<T>): Preprocessed input tensor.
Returns
- Tensor<T>
Model output tensor.
Remarks
Override this method to implement ONNX-specific inference logic for models with complex encoder-decoder or multi-model architectures.
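An override for an encoder-decoder model might chain the two stages as in the sketch below. This is an assumption about how the single-model and encoder-decoder properties could be combined, not a description of the default implementation: `OnnxPipelineSketch` is hypothetical and represents each ONNX model as a plain delegate over `float[]`.

```csharp
#nullable enable
using System;

public sealed class OnnxPipelineSketch
{
    // Each delegate stands in for one loaded ONNX model.
    private readonly Func<float[], float[]>? _single;
    private readonly Func<float[], float[]>? _encoder;
    private readonly Func<float[], float[]>? _decoder;

    public OnnxPipelineSketch(
        Func<float[], float[]>? single = null,
        Func<float[], float[]>? encoder = null,
        Func<float[], float[]>? decoder = null)
    {
        _single = single;
        _encoder = encoder;
        _decoder = decoder;
    }

    public float[] Run(float[] input)
    {
        // Single-model architectures need only one call.
        if (_single != null) return _single(input);

        if (_encoder == null)
        {
            throw new InvalidOperationException("No ONNX model is loaded.");
        }

        // Encoder-decoder architectures feed the encoder output to the decoder.
        float[] hidden = _encoder(input);
        return _decoder != null ? _decoder(hidden) : hidden;
    }
}
```

Real encoder-decoder models often need more than a single chained call, for example an autoregressive decoding loop, which is the kind of logic an override would add.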