Class AudioNeuralNetworkBase<T>

Namespace
AiDotNet.Audio
Assembly
AiDotNet.dll

Base class for audio-focused neural networks that can operate in both ONNX inference and native training modes.

public abstract class AudioNeuralNetworkBase<T> : NeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations.

Inheritance
object
NeuralNetworkBase<T>
AudioNeuralNetworkBase<T>
Implements
INeuralNetworkModel<T>
INeuralNetwork<T>
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Tensor<T>, Tensor<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>
IJitCompilable<T>
IInterpretableModel<T>
IInputGradientComputable<T>
IDisposable

Remarks

This class extends NeuralNetworkBase<T> to provide audio-specific functionality while maintaining full integration with the AiDotNet neural network infrastructure.

For Beginners: Audio neural networks process sound data (like speech or music). This base class provides:

  • Support for pre-trained ONNX models (fast inference with existing models)
  • Full training capability from scratch (like other neural networks)
  • Audio preprocessing utilities (mel spectrograms, etc.)
  • Sample rate handling

You can use this class in two ways:

  1. Load a pre-trained ONNX model for quick inference
  2. Build and train a new model from scratch (a minimal subclass sketch follows this list)
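
For the second path, a minimal sketch of a hypothetical subclass is shown below. The class name, tensor handling, and parameter values are illustrative only; using directives for System and the AiDotNet namespaces are omitted, and additional members of NeuralNetworkBase<T> may also need to be overridden depending on your version of the library:

    // Hypothetical subclass; names and values are illustrative, not part of AiDotNet.
    public class KeywordSpotter<T> : AudioNeuralNetworkBase<T>
    {
        public KeywordSpotter(NeuralNetworkArchitecture<T> architecture)
            : base(architecture) // default MSE loss, maxGradNorm = 1
        {
            SampleRate = 16000;  // speech-oriented rate
            NumMels = 80;        // common mel-bin count
            MelSpec = CreateMelSpectrogram(SampleRate, NumMels);
        }

        protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)
        {
            // Convert the waveform to mel features here (see PreprocessAudio below).
            return rawAudio;
        }

        protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)
        {
            // Map raw network output to the final result format.
            return modelOutput;
        }
    }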

Constructors

AudioNeuralNetworkBase(NeuralNetworkArchitecture<T>, ILossFunction<T>?, double)

Initializes a new instance of the AudioNeuralNetworkBase class with the specified architecture.

protected AudioNeuralNetworkBase(NeuralNetworkArchitecture<T> architecture, ILossFunction<T>? lossFunction = null, double maxGradNorm = 1)

Parameters

architecture NeuralNetworkArchitecture<T>

The neural network architecture.

lossFunction ILossFunction<T>

The loss function to use. If null, a default MSE loss is used.

maxGradNorm double

Maximum gradient norm for gradient clipping.

Properties

DefaultLossFunction

Gets the default loss function for this model.

public override ILossFunction<T> DefaultLossFunction { get; }

Property Value

ILossFunction<T>

IsOnnxMode

Gets whether this model is running in ONNX inference mode.

public bool IsOnnxMode { get; }

Property Value

bool

Remarks

When true, the model uses pre-trained ONNX weights for inference. When false, the model uses native layers and can be trained.

MelSpec

Gets or sets the mel spectrogram extractor used for preprocessing.

protected MelSpectrogram<T>? MelSpec { get; set; }

Property Value

MelSpectrogram<T>

NumMels

Gets the number of mel spectrogram channels used by this model.

public int NumMels { get; protected set; }

Property Value

int

Remarks

Mel spectrograms divide the frequency range into perceptual bands. Common values: 64, 80, or 128 mel bins.

OnnxDecoder

Gets or sets the ONNX decoder model (for encoder-decoder architectures).

protected OnnxModel<T>? OnnxDecoder { get; set; }

Property Value

OnnxModel<T>

OnnxEncoder

Gets or sets the ONNX encoder model (for encoder-decoder architectures).

protected OnnxModel<T>? OnnxEncoder { get; set; }

Property Value

OnnxModel<T>

OnnxModel

Gets or sets the ONNX model (for single-model architectures).

protected OnnxModel<T>? OnnxModel { get; set; }

Property Value

OnnxModel<T>

SampleRate

Gets the sample rate expected by this model.

public int SampleRate { get; protected set; }

Property Value

int

Remarks

Common values: 16000 Hz (speech), 22050 Hz (music), 44100 Hz (high quality). Input audio should be resampled to match this rate.
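
For example, callers can check incoming audio against this property and resample when it does not match. The small guard below is an illustrative sketch, not part of the library:

    // Illustrative guard: validate incoming audio against the model's expected rate.
    static void EnsureSampleRate<T>(AudioNeuralNetworkBase<T> model, int audioSampleRate)
    {
        if (audioSampleRate != model.SampleRate)
        {
            // Resample with your audio library of choice before calling the model;
            // feeding mismatched rates will degrade results.
            throw new ArgumentException(
                $"Expected {model.SampleRate} Hz audio but received {audioSampleRate} Hz.");
        }
    }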

SupportsTraining

Gets whether this network supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Remarks

In ONNX mode, training is not supported; the model is inference-only. In native mode, training is fully supported.
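
For example, code that intends to train a model can check these properties first. The guard below is an illustrative sketch:

    // Illustrative guard: refuse to start training on an ONNX-backed instance.
    static void EnsureTrainable<T>(AudioNeuralNetworkBase<T> model)
    {
        if (model.IsOnnxMode || !model.SupportsTraining)
        {
            throw new InvalidOperationException(
                "Model is in ONNX inference mode; training is not supported.");
        }
        // Otherwise proceed with the normal NeuralNetworkBase<T> training workflow.
    }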

Methods

CreateMelSpectrogram(int, int, int, int)

Creates a mel spectrogram extractor with the model's settings.

protected MelSpectrogram<T> CreateMelSpectrogram(int sampleRate = 16000, int nMels = 80, int nFft = 1024, int hopLength = 256)

Parameters

sampleRate int

Sample rate of input audio.

nMels int

Number of mel bands.

nFft int

FFT window size.

hopLength int

Hop length between frames.

Returns

MelSpectrogram<T>

A configured mel spectrogram extractor.
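
For example, a derived class targeting 22,050 Hz music might configure the extractor in its constructor; the values below are illustrative only:

    // Inside a hypothetical derived-class constructor (illustrative values).
    SampleRate = 22050;
    NumMels = 128;
    MelSpec = CreateMelSpectrogram(SampleRate, NumMels, nFft: 2048, hopLength: 512);
    // nFft = 2048 gives finer frequency resolution; hopLength = 512 is roughly 23 ms per frame.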

Dispose(bool)

Disposes of resources used by this model.

protected override void Dispose(bool disposing)

Parameters

disposing bool

True if disposing managed resources.
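
A derived class that owns additional disposable resources can follow the standard .NET dispose pattern; the override below is a generic sketch:

    // Standard dispose pattern for a hypothetical subclass with extra resources.
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Release any additional managed resources owned by the subclass here.
        }
        base.Dispose(disposing); // let the base class release its own resources
    }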

Forward(Tensor<T>)

Performs a forward pass through the native neural network layers.

protected virtual Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Preprocessed input tensor.

Returns

Tensor<T>

Model output tensor.

PostprocessOutput(Tensor<T>)

Postprocesses model output into the final result format.

protected abstract Tensor<T> PostprocessOutput(Tensor<T> modelOutput)

Parameters

modelOutput Tensor<T>

Raw output from the model.

Returns

Tensor<T>

Postprocessed output in the expected format.

PreprocessAudio(Tensor<T>)

Preprocesses raw audio for model input.

protected abstract Tensor<T> PreprocessAudio(Tensor<T> rawAudio)

Parameters

rawAudio Tensor<T>

Raw audio waveform tensor [samples] or [batch, samples].

Returns

Tensor<T>

Preprocessed audio features suitable for model input.

Remarks

For Beginners: Raw audio is just a series of numbers representing sound pressure. Neural networks often work better with transformed representations like mel spectrograms. This method converts raw audio into the format the model expects.
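
A sketch of what an override might look like is shown below. The mel-extraction call itself is omitted because the MelSpectrogram<T> API is not documented on this page; the body only shows where that conversion would happen:

    protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)
    {
        // Lazily create the extractor with this model's settings.
        MelSpec ??= CreateMelSpectrogram(SampleRate, NumMels);

        // In a real implementation, run MelSpec over the waveform here to produce
        // mel features; this sketch returns the raw waveform unchanged as a placeholder.
        return rawAudio;
    }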

RunOnnxInference(Tensor<T>)

Runs inference using ONNX model(s).

protected virtual Tensor<T> RunOnnxInference(Tensor<T> input)

Parameters

input Tensor<T>

Preprocessed input tensor.

Returns

Tensor<T>

Model output tensor.

Remarks

Override this method to implement ONNX-specific inference logic for models with complex encoder-decoder or multi-model architectures.
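
For an encoder-decoder model, an override might chain the two ONNX models as sketched below. The Run call is a hypothetical stand-in, since OnnxModel<T>'s inference API is not documented on this page:

    protected override Tensor<T> RunOnnxInference(Tensor<T> input)
    {
        if (OnnxEncoder is null || OnnxDecoder is null)
        {
            throw new InvalidOperationException("ONNX encoder/decoder models are not loaded.");
        }

        // `Run` is a placeholder name for whatever inference method OnnxModel<T> exposes.
        Tensor<T> encoded = OnnxEncoder.Run(input);
        return OnnxDecoder.Run(encoded);
    }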