Class AudioNeuralNetworkBase<T>

Namespace
AiDotNet.Audio
Assembly
AiDotNet.dll

Base class for audio-focused neural networks that can operate in both ONNX inference and native training modes.

public abstract class AudioNeuralNetworkBase<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations.

Inheritance
object
NeuralNetworkBase<T>
AudioNeuralNetworkBase<T>
Implements
INeuralNetworkModel<T>
INeuralNetwork<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>
IInterpretableModel<T>
IInputGradientComputable<T>
IDisposable

Remarks

This class extends NeuralNetworkBase<T> to provide audio-specific functionality while maintaining full integration with the AiDotNet neural network infrastructure.

For Beginners: Audio neural networks process sound data (like speech or music). This base class provides:

  • Support for pre-trained ONNX models (fast inference with existing models)
  • Full training capability from scratch (like other neural networks)
  • Audio preprocessing utilities (mel spectrograms, etc.)
  • Sample rate handling

You can use this class in two ways:

  1. Load a pre-trained ONNX model for quick inference
  2. Build and train a new model from scratch (a minimal subclass sketch follows this list)
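
For the second path, a minimal sketch of a hypothetical subclass is shown below. The class name, tensor handling, and parameter values are illustrative only; using directives for System and the AiDotNet namespaces are omitted, and additional members of NeuralNetworkBase<T> may also need to be overridden depending on your version of the library:

    // Hypothetical subclass; names and values are illustrative, not part of AiDotNet.
    public class KeywordSpotter<T> : AudioNeuralNetworkBase<T>
    {
        public KeywordSpotter(NeuralNetworkArchitecture<T> architecture)
            : base(architecture) // default MSE loss, maxGradNorm = 1
        {
            SampleRate = 16000;  // speech-oriented rate
            NumMels = 80;        // common mel-bin count
            MelSpec = CreateMelSpectrogram(SampleRate, NumMels);
        }

        protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)
        {
            // Convert the waveform to mel features here (see PreprocessAudio below).
            return rawAudio;
        }

        protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)
        {
            // Map raw network output to the final result format.
            return modelOutput;
        }
    }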

Constructors

AudioNeuralNetworkBase(NeuralNetworkArchitecture<T>, ILossFunction<T>?, double)

Initializes a new instance of the AudioNeuralNetworkBase class with the specified architecture.

protected AudioNeuralNetworkBase(NeuralNetworkArchitecture<T> architecture, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)

Parameters

architecture NeuralNetworkArchitecture<T>

The neural network architecture.

lossFunction ILossFunction<T>

The loss function to use. If null, a default MSE loss is used.

maxGradNorm double

Maximum gradient norm for gradient clipping.

Properties

DefaultLossFunction

Gets the default loss function for this model.

public override ILossFunction<T> DefaultLossFunction { get; }

Property Value

ILossFunction<T>

IsOnnxMode

Gets whether this model is running in ONNX inference mode.

public bool IsOnnxMode { get; }

Property Value

bool

Remarks

When true, the model uses pre-trained ONNX weights for inference. When false, the model uses native layers and can be trained.

MelSpec

Gets or sets the mel spectrogram extractor used for preprocessing.

protected MelSpectrogram<T>? MelSpec { get; set; }

Property Value

MelSpectrogram<T>

NumMels

Gets the number of mel spectrogram channels used by this model.

public int NumMels { get; protected set; }

Property Value

int

Remarks

Mel spectrograms divide the frequency range into perceptual bands. Common values: 64, 80, or 128 mel bins.

OnnxDecoder

Gets or sets the ONNX decoder model (for encoder-decoder architectures).

protected OnnxModel<T>? OnnxDecoder { get; set; }

Property Value

OnnxModel<T>

OnnxEncoder

Gets or sets the ONNX encoder model (for encoder-decoder architectures).

protected OnnxModel<T>? OnnxEncoder { get; set; }

Property Value

OnnxModel<T>

OnnxModel

Gets or sets the ONNX model (for single-model architectures).

protected OnnxModel<T>? OnnxModel { get; set; }

Property Value

OnnxModel<T>

SampleRate

Gets the sample rate expected by this model.

public int SampleRate { get; protected set; }

Property Value

int

Remarks

Common values: 16000 Hz (speech), 22050 Hz (music), 44100 Hz (high quality). Input audio should be resampled to match this rate.
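
For example, callers can check incoming audio against this property and resample when it does not match. The small guard below is an illustrative sketch, not part of the library:

    // Illustrative guard: validate incoming audio against the model's expected rate.
    static void EnsureSampleRate<T>(AudioNeuralNetworkBase<T> model, int audioSampleRate)
    {
        if (audioSampleRate != model.SampleRate)
        {
            // Resample with your audio library of choice before calling the model;
            // feeding mismatched rates will degrade results.
            throw new ArgumentException(
                $"Expected {model.SampleRate} Hz audio but received {audioSampleRate} Hz.");
        }
    }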

SupportsTraining

Gets whether this network supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Remarks

In ONNX mode, training is not supported; the model is inference-only. In native mode, training is fully supported.
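
For example, code that intends to train a model can check these properties first. The guard below is an illustrative sketch:

    // Illustrative guard: refuse to start training on an ONNX-backed instance.
    static void EnsureTrainable<T>(AudioNeuralNetworkBase<T> model)
    {
        if (model.IsOnnxMode || !model.SupportsTraining)
        {
            throw new InvalidOperationException(
                "Model is in ONNX inference mode; training is not supported.");
        }
        // Otherwise proceed with the normal NeuralNetworkBase<T> training workflow.
    }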

Methods

CreateMelSpectrogram(int, int, int, int)

Creates a mel spectrogram extractor with the model's settings.

protected MelSpectrogram<T> CreateMelSpectrogram(int sampleRate = 16000, int nMels = 80, int nFft = 1024, int hopLength = 256)

Parameters

sampleRate int

Sample rate of input audio.

nMels int

Number of mel bands.

nFft int

FFT window size.

hopLength int

Hop length between frames.

Returns

MelSpectrogram<T>

A configured mel spectrogram extractor.
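
For example, a derived class targeting 22,050 Hz music might configure the extractor in its constructor; the values below are illustrative only:

    // Inside a hypothetical derived-class constructor (illustrative values).
    SampleRate = 22050;
    NumMels = 128;
    MelSpec = CreateMelSpectrogram(SampleRate, NumMels, nFft: 2048, hopLength: 512);
    // nFft = 2048 gives finer frequency resolution; hopLength = 512 is roughly 23 ms per frame.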

Dispose(bool)

Disposes of resources used by this model.

protected override void Dispose(bool disposing)

Parameters

disposing bool

True if disposing managed resources.
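
A derived class that owns additional disposable resources can follow the standard .NET dispose pattern; the override below is a generic sketch:

    // Standard dispose pattern for a hypothetical subclass with extra resources.
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Release any additional managed resources owned by the subclass here.
        }
        base.Dispose(disposing); // let the base class release its own resources
    }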

Forward(Tensor<T>)

Performs a forward pass through the native neural network layers.

protected virtual Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Preprocessed input tensor.

Returns

Tensor<T>

Model output tensor.

PostprocessOutput(Tensor<T>)

Postprocesses model output into the final result format.

protected abstract Tensor<T> PostprocessOutput(Tensor<T> modelOutput)

Parameters

modelOutput Tensor<T>

Raw output from the model.

Returns

Tensor<T>

Postprocessed output in the expected format.

PreprocessAudio(Tensor<T>)

Preprocesses raw audio for model input.

protected abstract Tensor<T> PreprocessAudio(Tensor<T> rawAudio)

Parameters

rawAudio Tensor<T>

Raw audio waveform tensor [samples] or [batch, samples].

Returns

Tensor<T>

Preprocessed audio features suitable for model input.

Remarks

For Beginners: Raw audio is just a series of numbers representing sound pressure. Neural networks often work better with transformed representations like mel spectrograms. This method converts raw audio into the format the model expects.
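
A sketch of what an override might look like is shown below. The mel-extraction call itself is omitted because the MelSpectrogram<T> API is not documented on this page; the body only shows where that conversion would happen:

    protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)
    {
        // Lazily create the extractor with this model's settings.
        MelSpec ??= CreateMelSpectrogram(SampleRate, NumMels);

        // In a real implementation, run MelSpec over the waveform here to produce
        // mel features; this sketch returns the raw waveform unchanged as a placeholder.
        return rawAudio;
    }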

RunOnnxInference(Tensor<T>)

Runs inference using ONNX model(s).

protected virtual Tensor<T> RunOnnxInference(Tensor<T> input)

Parameters

input Tensor<T>

Preprocessed input tensor.

Returns

Tensor<T>

Model output tensor.

Remarks

Override this method to implement ONNX-specific inference logic for models with complex encoder-decoder or multi-model architectures.
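
For an encoder-decoder model, an override might chain the two ONNX models as sketched below. The Run call is a hypothetical stand-in, since OnnxModel<T>'s inference API is not documented on this page:

    protected override Tensor<T> RunOnnxInference(Tensor<T> input)
    {
        if (OnnxEncoder is null || OnnxDecoder is null)
        {
            throw new InvalidOperationException("ONNX encoder/decoder models are not loaded.");
        }

        // `Run` is a placeholder name for whatever inference method OnnxModel<T> exposes.
        Tensor<T> encoded = OnnxEncoder.Run(input);
        return OnnxDecoder.Run(encoded);
    }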