Class SceneClassifier<T>

Namespace: AiDotNet.Audio.Classification

Assembly: AiDotNet.dll

Acoustic scene classification model for identifying recording environments.

public class SceneClassifier<T> : AudioClassifierBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, ISceneClassifier<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T: The numeric type used for calculations.

Inheritance: object

NeuralNetworkBase<T>

AudioNeuralNetworkBase<T>

AudioClassifierBase<T>

SceneClassifier<T>

Implements: INeuralNetworkModel<T>

INeuralNetwork<T>

IInterpretableModel<T>

IInputGradientComputable<T>

IDisposable

ISceneClassifier<T>

IFullModel<T, Tensor<T>, Tensor<T>>

IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>

IModelSerializer

ICheckpointableModel

IParameterizable<T, Tensor<T>, Tensor<T>>

IFeatureAware

IFeatureImportance<T>

ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>

IGradientComputable<T, Tensor<T>, Tensor<T>>

IJitCompilable<T>

Inherited Members: AudioClassifierBase<T>.ClassLabels

AudioClassifierBase<T>.NumClasses

AudioClassifierBase<T>.ApplySoftmax(Vector<T>)

AudioClassifierBase<T>.ApplySoftmax(Tensor<T>)

AudioClassifierBase<T>.GetTopK(Dictionary<string, T>, int)

AudioClassifierBase<T>.GetPrediction(Dictionary<string, T>)

AudioClassifierBase<T>.ApplyThreshold(Dictionary<string, T>, T)

AudioClassifierBase<T>.ComputeClassWeights(Dictionary<string, int>)

AudioNeuralNetworkBase<T>.SampleRate

AudioNeuralNetworkBase<T>.NumMels

AudioNeuralNetworkBase<T>.IsOnnxMode

AudioNeuralNetworkBase<T>.OnnxEncoder

AudioNeuralNetworkBase<T>.OnnxDecoder

AudioNeuralNetworkBase<T>.OnnxModel

AudioNeuralNetworkBase<T>.MelSpec

AudioNeuralNetworkBase<T>.SupportsTraining

AudioNeuralNetworkBase<T>.RunOnnxInference(Tensor<T>)

AudioNeuralNetworkBase<T>.Forward(Tensor<T>)

AudioNeuralNetworkBase<T>.DefaultLossFunction

AudioNeuralNetworkBase<T>.CreateMelSpectrogram(int, int, int, int)

NeuralNetworkBase<T>.Layers

NeuralNetworkBase<T>.LayerCount

NeuralNetworkBase<T>.Architecture

NeuralNetworkBase<T>.NumOps

NeuralNetworkBase<T>.Engine

NeuralNetworkBase<T>._layerInputs

NeuralNetworkBase<T>._layerOutputs

NeuralNetworkBase<T>.Random

NeuralNetworkBase<T>.LossFunction

NeuralNetworkBase<T>.LastLoss

NeuralNetworkBase<T>.IsTrainingMode

NeuralNetworkBase<T>.SupportsGpuTraining

NeuralNetworkBase<T>.CanTrainOnGpu

NeuralNetworkBase<T>.GpuEngine

NeuralNetworkBase<T>.MaxGradNorm

NeuralNetworkBase<T>._mixedPrecisionContext

NeuralNetworkBase<T>._memoryManager

NeuralNetworkBase<T>.IsMemoryManagementEnabled

NeuralNetworkBase<T>.IsGradientCheckpointingEnabled

NeuralNetworkBase<T>.IsMixedPrecisionEnabled

NeuralNetworkBase<T>.ClipGradients(List<Tensor<T>>)

NeuralNetworkBase<T>.ClipGradient(Tensor<T>)

NeuralNetworkBase<T>.ClipGradient(Vector<T>)

NeuralNetworkBase<T>.GetParameters()

NeuralNetworkBase<T>.Backpropagate(Tensor<T>)

NeuralNetworkBase<T>.BackpropagateWithRecompute(Tensor<T>)

NeuralNetworkBase<T>.ForwardGpu(IGpuTensor<T>)

NeuralNetworkBase<T>.BackpropagateGpu(IGpuTensor<T>)

NeuralNetworkBase<T>.BackpropagateGpuDeferred(IGpuTensor<T>, GpuExecutionOptions)

NeuralNetworkBase<T>.UpdateParametersGpu(T, T, T)

NeuralNetworkBase<T>.UpdateParametersGpu(IGpuOptimizerConfig)

NeuralNetworkBase<T>.UpdateParametersGpuDeferred(IGpuOptimizerConfig, GpuExecutionOptions)

NeuralNetworkBase<T>.TrainBatchGpuDeferred(IGpuTensor<T>, IGpuTensor<T>, IGpuOptimizerConfig, GpuExecutionOptions)

NeuralNetworkBase<T>.TrainBatchGpuDeferredAsync(IGpuTensor<T>, IGpuTensor<T>, IGpuOptimizerConfig, GpuExecutionOptions, CancellationToken)

NeuralNetworkBase<T>.UploadWeightsToGpu()

NeuralNetworkBase<T>.DownloadWeightsFromGpu()

NeuralNetworkBase<T>.ZeroGradientsGpu()

NeuralNetworkBase<T>.ExtractSingleExample(Tensor<T>, int)

NeuralNetworkBase<T>.ForwardWithMemory(Tensor<T>)

NeuralNetworkBase<T>.ForwardWithCheckpointing(Tensor<T>)

NeuralNetworkBase<T>.CanUseGpuResidentPath()

NeuralNetworkBase<T>.TryForwardGpuOptimized(Tensor<T>, out Tensor<T>)

NeuralNetworkBase<T>.ForwardGpu(Tensor<T>)

NeuralNetworkBase<T>.ForwardDeferred(Tensor<T>)

NeuralNetworkBase<T>.ForwardDeferredAsync(Tensor<T>, CancellationToken)

NeuralNetworkBase<T>.BeginGpuExecution(GpuExecutionOptions)

NeuralNetworkBase<T>.ForwardWithGpuContext(Tensor<T>)

NeuralNetworkBase<T>.ForwardWithGpuContext(IGpuTensor<T>)

NeuralNetworkBase<T>.GetGpuMemoryStats()

NeuralNetworkBase<T>.ForwardWithFeatures(Tensor<T>, int[])

NeuralNetworkBase<T>.ParameterCount

NeuralNetworkBase<T>.GetParameterCount()

NeuralNetworkBase<T>.InvalidateParameterCountCache()

NeuralNetworkBase<T>.AddLayerToCollection(ILayer<T>)

NeuralNetworkBase<T>.RemoveLayerFromCollection(ILayer<T>)

NeuralNetworkBase<T>.ClearLayers()

NeuralNetworkBase<T>.ValidateCustomLayers(List<ILayer<T>>)

NeuralNetworkBase<T>.ValidateCustomLayersInternal(List<ILayer<T>>)

NeuralNetworkBase<T>.IsValidInputLayer(ILayer<T>)

NeuralNetworkBase<T>.IsValidOutputLayer(ILayer<T>)

NeuralNetworkBase<T>.AreLayersCompatible(ILayer<T>, ILayer<T>)

NeuralNetworkBase<T>.GetParameterGradients()

NeuralNetworkBase<T>.EnsureArchitectureInitialized()

NeuralNetworkBase<T>.SetTrainingMode(bool)

NeuralNetworkBase<T>.EnableMemoryManagement(TrainingMemoryConfig)

NeuralNetworkBase<T>.DisableMemoryManagement()

NeuralNetworkBase<T>.GetMemoryEstimate(int, int)

NeuralNetworkBase<T>.GetLastLoss()

NeuralNetworkBase<T>.ResetState()

NeuralNetworkBase<T>.BackwardWithInputGradient(Tensor<T>)

NeuralNetworkBase<T>.ComputeInputGradient(Vector<T>, Vector<T>)

NeuralNetworkBase<T>.ComputeInputGradient(Tensor<T>, Tensor<T>)

NeuralNetworkBase<T>.SaveModel(string)

NeuralNetworkBase<T>.LoadModel(string)

NeuralNetworkBase<T>.Serialize()

NeuralNetworkBase<T>.Deserialize(byte[])

NeuralNetworkBase<T>.WithParameters(Vector<T>)

NeuralNetworkBase<T>.GetActiveFeatureIndices()

NeuralNetworkBase<T>.IsFeatureUsed(int)

NeuralNetworkBase<T>.DeepCopy()

NeuralNetworkBase<T>.Clone()

NeuralNetworkBase<T>.SetActiveFeatureIndices(IEnumerable<int>)

NeuralNetworkBase<T>._enabledMethods

NeuralNetworkBase<T>._sensitiveFeatures

NeuralNetworkBase<T>._fairnessMetrics

NeuralNetworkBase<T>._baseModel

NeuralNetworkBase<T>.GetGlobalFeatureImportanceAsync()

NeuralNetworkBase<T>.GetLocalFeatureImportanceAsync(Tensor<T>)

NeuralNetworkBase<T>.GetShapValuesAsync(Tensor<T>)

NeuralNetworkBase<T>.GetLimeExplanationAsync(Tensor<T>, int)

NeuralNetworkBase<T>.GetPartialDependenceAsync(Vector<int>, int)

NeuralNetworkBase<T>.GetCounterfactualAsync(Tensor<T>, Tensor<T>, int)

NeuralNetworkBase<T>.GetModelSpecificInterpretabilityAsync()

NeuralNetworkBase<T>.GenerateTextExplanationAsync(Tensor<T>, Tensor<T>)

NeuralNetworkBase<T>.GetFeatureInteractionAsync(int, int)

NeuralNetworkBase<T>.ValidateFairnessAsync(Tensor<T>, int)

NeuralNetworkBase<T>.GetAnchorExplanationAsync(Tensor<T>, T)

NeuralNetworkBase<T>.SetBaseModel<TInput, TOutput>(IFullModel<T, TInput, TOutput>)

NeuralNetworkBase<T>.EnableMethod(params InterpretationMethod[])

NeuralNetworkBase<T>.ConfigureFairness(Vector<int>, params FairnessMetric[])

NeuralNetworkBase<T>.GetNamedLayerActivations(Tensor<T>)

NeuralNetworkBase<T>.GetArchitecture()

NeuralNetworkBase<T>.GetFeatureImportance()

NeuralNetworkBase<T>.SetParameters(Vector<T>)

NeuralNetworkBase<T>.AddLayer(LayerType, int, ActivationFunction)

NeuralNetworkBase<T>.AddConvolutionalLayer(int, int, int, ActivationFunction)

NeuralNetworkBase<T>.AddLSTMLayer(int, bool)

NeuralNetworkBase<T>.AddDropoutLayer(double)

NeuralNetworkBase<T>.AddBatchNormalizationLayer(int, double, double)

NeuralNetworkBase<T>.AddPoolingLayer(int[], PoolingType, int, int?)

NeuralNetworkBase<T>.GetGradients()

NeuralNetworkBase<T>.GetInputShape()

NeuralNetworkBase<T>.GetLayerActivations(Tensor<T>)

NeuralNetworkBase<T>.ComputeGradients(Tensor<T>, Tensor<T>, ILossFunction<T>)

NeuralNetworkBase<T>.ApplyGradients(Vector<T>, T)

NeuralNetworkBase<T>.SaveState(Stream)

NeuralNetworkBase<T>.LoadState(Stream)

NeuralNetworkBase<T>.Dispose()

NeuralNetworkBase<T>.SupportsJitCompilation

NeuralNetworkBase<T>.ExportComputationGraph(List<ComputationNode<T>>)

NeuralNetworkBase<T>.ConvertLayerToGraph(ILayer<T>, ComputationNode<T>)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Extension Methods: DistributedExtensions.AsDistributedForHighBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributedForLowBandwidth<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, ICommunicationBackend<T>)

DistributedExtensions.AsDistributed<T, TInput, TOutput>(IFullModel<T, TInput, TOutput>, IShardingConfiguration<T>)

Remarks

Classifies audio recordings by their acoustic environment or scene context. Based on the DCASE (Detection and Classification of Acoustic Scenes and Events) challenge.

For Beginners: Scene classification answers the question "Where was this recorded?" Typical scene categories:

Indoor: office, home, shopping mall, restaurant, library
Outdoor: street, park, beach, forest, construction site
Transportation: bus, train, metro, airport, car

Usage with ONNX model:

var classifier = new SceneClassifier<float>("model.onnx");
var result = classifier.Classify(audioTensor);
Console.WriteLine($"Scene: {result.PredictedScene} ({result.Confidence})");

Usage for training:

var architecture = new NeuralNetworkArchitecture<float>(inputFeatures: 60, outputSize: 30);
var classifier = new SceneClassifier<float>(architecture);
classifier.Train(features, labels);
var result = classifier.Classify(newAudio);

Constructors

SceneClassifier(SceneClassifierOptions?)

Creates a SceneClassifier with default options for basic classification.

public SceneClassifier(SceneClassifierOptions? options = null)

Parameters

options SceneClassifierOptions: Optional configuration options.

SceneClassifier(NeuralNetworkArchitecture<T>, SceneClassifierOptions?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?)

Creates a SceneClassifier for native training mode.

public SceneClassifier(NeuralNetworkArchitecture<T> architecture, SceneClassifierOptions? options = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null)

Parameters

architecture NeuralNetworkArchitecture<T>: Neural network architecture.
options SceneClassifierOptions: Optional configuration options.
optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>: Optional custom optimizer.

SceneClassifier(string, SceneClassifierOptions?)

Creates a SceneClassifier for ONNX inference mode.

public SceneClassifier(string modelPath, SceneClassifierOptions? options = null)

Parameters

modelPath string: Path to the ONNX model file.
options SceneClassifierOptions: Optional configuration options.

Fields

StandardScenes

Standard acoustic scene labels (DCASE-style).

public static readonly string[] StandardScenes

Field Value

string[]

Properties

MinimumDurationSeconds

Gets the minimum audio duration required for reliable classification.

public double MinimumDurationSeconds { get; }

Property Value

double
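A clip can be compared against this minimum before it is classified. The following is a minimal sketch of that arithmetic; the helper name IsLongEnough and its parameters are hypothetical and not part of AiDotNet. It assumes you know the clip's sample count and sample rate (for example from the inherited SampleRate property).

```csharp
using System;

public static class DurationSketch
{
    // Hypothetical helper: true when the clip lasts at least the given minimum.
    // Duration in seconds = number of samples / samples per second.
    public static bool IsLongEnough(int sampleCount, int sampleRate, double minimumDurationSeconds)
    {
        if (sampleCount < 0) throw new ArgumentOutOfRangeException(nameof(sampleCount));
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));

        double durationSeconds = sampleCount / (double)sampleRate;
        return durationSeconds >= minimumDurationSeconds;
    }
}
```

In practice the third argument would be the value read from MinimumDurationSeconds.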

Scenes

Gets the scenes (alias for SupportedScenes for legacy API compatibility).

public IReadOnlyList<string> Scenes { get; }

Property Value

IReadOnlyList<string>

SupportedScenes

Gets the list of scenes this model can classify.

public IReadOnlyList<string> SupportedScenes { get; }

Property Value

IReadOnlyList<string>

Methods

Classify(Tensor<T>)

Classifies the acoustic scene of an audio recording.

public SceneClassificationResult<T> Classify(Tensor<T> audio)

Parameters

audio Tensor<T>

Returns

SceneClassificationResult<T>

ClassifyAsync(Tensor<T>, CancellationToken)

Classifies the acoustic scene asynchronously.

public Task<SceneClassificationResult<T>> ClassifyAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>
cancellationToken CancellationToken: Token that can be used to cancel the operation.

Returns

Task<SceneClassificationResult<T>>
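Because ClassifyAsync accepts a CancellationToken, a caller can bound how long it waits. The sketch below is one way to do that with standard .NET types; RunWithTimeoutAsync is a hypothetical helper, not part of AiDotNet, and how promptly the classifier observes the token is not stated in this reference.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Hypothetical helper: runs an operation with a token that is cancelled
    // either by the caller or after the given timeout, whichever comes first.
    public static async Task<TResult> RunWithTimeoutAsync<TResult>(
        Func<CancellationToken, Task<TResult>> operation,
        TimeSpan timeout,
        CancellationToken callerToken = default)
    {
        if (operation is null) throw new ArgumentNullException(nameof(operation));

        using var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        linked.CancelAfter(timeout);
        return await operation(linked.Token).ConfigureAwait(false);
    }
}
```

With a classifier instance, the call would look like TimeoutSketch.RunWithTimeoutAsync(ct => classifier.ClassifyAsync(audioTensor, ct), TimeSpan.FromSeconds(30)).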

ClassifyCategory(Tensor<T>)

Classifies audio and returns category with confidence (legacy API compatibility).

public (string Category, T Confidence) ClassifyCategory(Tensor<T> audio)

Parameters

audio Tensor<T>: Audio waveform tensor.

Returns

(string Category, T Confidence): Tuple of (category, confidence).

CreateAsync(SceneClassifierOptions?, IProgress<double>?, CancellationToken)

Creates a SceneClassifier asynchronously with model download.

public static Task<SceneClassifier<T>> CreateAsync(SceneClassifierOptions? options = null, IProgress<double>? progress = null, CancellationToken cancellationToken = default)

Parameters

options SceneClassifierOptions
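progress IProgress<double>: Optional progress reporter for the model download.
cancellationToken CancellationToken: Token that can be used to cancel the operation.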

Returns

Task<SceneClassifier<T>>

CreateNewInstance()

Creates a new instance of this network type.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

Dispose(bool)

Disposes of managed resources.

protected override void Dispose(bool disposing)

Parameters

disposing bool: true when called from Dispose(); false when called from a finalizer.

ExtractAcousticFeatures(Tensor<T>)

Extracts acoustic features used for scene classification.

public Tensor<T> ExtractAcousticFeatures(Tensor<T> audio)

Parameters

audio Tensor<T>

Returns

Tensor<T>

GetModelMetadata()

Gets model metadata for serialization.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

GetSceneProbabilities(Tensor<T>)

Gets scene probabilities for all supported scenes.

public IReadOnlyDictionary<string, T> GetSceneProbabilities(Tensor<T> audio)

Parameters

audio Tensor<T>

Returns

IReadOnlyDictionary<string, T>
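A common way to turn raw per-class scores into probabilities is a softmax, and the base class exposes ApplySoftmax among its inherited members. Whether GetSceneProbabilities uses it internally is an assumption here. The sketch below shows the numerically stable form of the computation on plain double values rather than the library's generic T; SoftmaxSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SoftmaxSketch
{
    // Converts raw scores into probabilities that sum to 1.
    // Subtracting the maximum first avoids overflow in Math.Exp.
    public static double[] Softmax(IReadOnlyList<double> scores)
    {
        if (scores is null) throw new ArgumentNullException(nameof(scores));
        if (scores.Count == 0) return Array.Empty<double>();

        double max = scores.Max();
        double[] exps = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```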

GetTopScenes(Tensor<T>, int)

Gets top-K scene predictions.

public IReadOnlyList<ScenePrediction<T>> GetTopScenes(Tensor<T> audio, int k = 5)

Parameters

audio Tensor<T>
k int: Number of top predictions to return (default 5).

Returns

IReadOnlyList<ScenePrediction<T>>
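Top-K selection amounts to sorting the scene probabilities in descending order and keeping the first k entries. The following is a minimal sketch of that idea over a plain dictionary; TopKSketch is a hypothetical name, it uses double instead of the generic T, and it returns key/value pairs because the members of ScenePrediction<T> are not listed in this reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopKSketch
{
    // Returns the k highest-probability (label, probability) pairs, best first.
    public static IReadOnlyList<KeyValuePair<string, double>> TopK(
        IReadOnlyDictionary<string, double> probabilities, int k = 5)
    {
        if (probabilities is null) throw new ArgumentNullException(nameof(probabilities));
        if (k < 0) throw new ArgumentOutOfRangeException(nameof(k));

        return probabilities
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal) // deterministic tie-break
            .Take(k)
            .ToList();
    }
}
```

If k exceeds the number of scenes, the sketch simply returns all of them.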

InitializeLayers()

Initializes the neural network layers.

protected override void InitializeLayers()

PostprocessOutput(Tensor<T>)

Postprocesses model output into final predictions.

protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)

Parameters

modelOutput Tensor<T>

Returns

Tensor<T>

Predict(Tensor<T>)

Predicts scene probabilities from audio features.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

Returns

Tensor<T>

PreprocessAudio(Tensor<T>)

Preprocesses raw audio into model input format.

protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)

Parameters

rawAudio Tensor<T>

Returns

Tensor<T>

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

TrackSceneChanges(Tensor<T>, double)

Tracks scene changes over time in longer audio.

public SceneTrackingResult<T> TrackSceneChanges(Tensor<T> audio, double segmentDuration = 10)

Parameters

audio Tensor<T>
segmentDuration double

Returns

SceneTrackingResult<T>
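Tracking scene changes implies splitting a long recording into windows and classifying each one. The sketch below shows one plausible way to compute window boundaries from a duration; it assumes segmentDuration is in seconds and that segments do not overlap, neither of which is confirmed by this reference. SegmentSketch and SplitIntoSegments are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SegmentSketch
{
    // Splits [0, totalSamples) into consecutive windows of segmentDuration seconds.
    // The final window may be shorter than the others.
    public static List<(int Start, int Length)> SplitIntoSegments(
        int totalSamples, int sampleRate, double segmentDuration = 10)
    {
        if (totalSamples < 0) throw new ArgumentOutOfRangeException(nameof(totalSamples));
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));
        if (segmentDuration <= 0) throw new ArgumentOutOfRangeException(nameof(segmentDuration));

        int samplesPerSegment = Math.Max(1, (int)Math.Round(segmentDuration * sampleRate));
        var segments = new List<(int Start, int Length)>();
        for (int start = 0; start < totalSamples; start += samplesPerSegment)
        {
            segments.Add((start, Math.Min(samplesPerSegment, totalSamples - start)));
        }
        return segments;
    }
}
```

Each (Start, Length) pair could then be sliced from the audio and passed to Classify; a short trailing segment may fall below MinimumDurationSeconds and need special handling.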

Train(Tensor<T>, Tensor<T>)

Trains the model on labeled audio samples.

public override void Train(Tensor<T> input, Tensor<T> expected)

Parameters

input Tensor<T>
expected Tensor<T>

UpdateParameters(Vector<T>)

Updates parameters from a flattened parameter vector.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>
