Interface ISceneClassifier<T>

Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll

Interface for acoustic scene classification models that identify the environment/context of audio.

public interface ISceneClassifier<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.


Remarks

Acoustic scene classification (ASC) identifies the environment or context where audio was recorded. Unlike event detection, which finds specific sounds, scene classification characterizes the overall acoustic atmosphere.

For Beginners: Scene classification is like asking "Where was this recording made?"

How it works:

  1. Audio features capture the overall acoustic character
  2. A classifier matches these features to known scene types
  3. The most likely scene (along with alternatives) is returned

Example scenes:

  • Indoor: Office, restaurant, kitchen, library, shopping mall
  • Outdoor: Park, street, beach, forest, construction site
  • Transportation: Car, bus, train, metro, airport

How scenes differ from events:

  • Event: "A dog barked" (specific sound)
  • Scene: "This was recorded in a park" (overall environment)

Use cases:

  • Context-aware devices (adjust phone behavior based on location)
  • Audio organization (group recordings by location)
  • Surveillance (detect unusual environments)
  • AR/VR (match virtual audio to real environment)
  • Assistive technology (describe environment to blind users)

This interface extends IFullModel<T, TInput, TOutput> for Tensor-based audio processing.
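
The sketch below ties these ideas to the members documented on this page. It is only an illustration: how you construct the classifier and how you load audio into a Tensor<T> are not covered here, so both are taken as parameters.

    using System;
    using AiDotNet.Interfaces;
    // Add a using directive for the namespace that contains Tensor<T> in your project.

    public static class SceneClassificationExample
    {
        public static void DescribeScene(ISceneClassifier<double> classifier, Tensor<double> audio)
        {
            // Steps 1-2 happen inside Classify: features are extracted and matched to the known scenes.
            var result = classifier.Classify(audio);              // SceneClassificationResult<double>

            // Step 3: the most likely scene plus alternatives.
            var topScenes = classifier.GetTopScenes(audio, k: 3); // IReadOnlyList<ScenePrediction<double>>
            Console.WriteLine($"GetTopScenes returned {topScenes.Count} candidate scenes.");
        }
    }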

Properties

IsOnnxMode

Gets whether this model is running in ONNX inference mode.

bool IsOnnxMode { get; }

Property Value

bool

MinimumDurationSeconds

Gets the minimum audio duration required for reliable classification.

double MinimumDurationSeconds { get; }

Property Value

double

SampleRate

Gets the expected sample rate for input audio.

int SampleRate { get; }

Property Value

int

SupportedScenes

Gets the list of scenes this model can classify.

IReadOnlyList<string> SupportedScenes { get; }

Property Value

IReadOnlyList<string>
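
The sketch below shows one way the four properties above might be used together to sanity-check input before classifying. The clip's sample rate and duration are assumed to come from your own audio-loading code; they are not part of this interface.

    using System;
    using AiDotNet.Interfaces;

    public static class InputChecks
    {
        // clipSampleRate and clipDurationSeconds are assumed to come from your own audio loader.
        public static void CheckInput(ISceneClassifier<double> classifier, int clipSampleRate, double clipDurationSeconds)
        {
            if (clipSampleRate != classifier.SampleRate)
                Console.WriteLine($"Resample the clip to {classifier.SampleRate} Hz before classifying.");

            if (clipDurationSeconds < classifier.MinimumDurationSeconds)
                Console.WriteLine($"Clip is shorter than the {classifier.MinimumDurationSeconds} s minimum; results may be unreliable.");

            Console.WriteLine($"Model supports {classifier.SupportedScenes.Count} scenes.");
            Console.WriteLine(classifier.IsOnnxMode ? "Running in ONNX inference mode." : "Not running in ONNX mode.");
        }
    }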

Methods

Classify(Tensor<T>)

Classifies the acoustic scene of audio.

SceneClassificationResult<T> Classify(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio waveform tensor of shape [samples] (mono) or [channels, samples] (multi-channel).

Returns

SceneClassificationResult<T>

Scene classification result.

Remarks

For Beginners: This is the main method for identifying the scene.

  • Pass in a recording
  • Get back where it was likely recorded (office, park, etc.)
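
Example

A minimal call sketch, assuming a classifier (ISceneClassifier<double>) and an audio tensor (Tensor<double> audio) are already in scope. The member names in the commented-out line are illustrative placeholders, not taken from this page.

    // 'audio' is shaped [samples] for mono or [channels, samples] for multi-channel input.
    var result = classifier.Classify(audio);   // SceneClassificationResult<double>

    // The member names in the next line are illustrative placeholders; see the
    // SceneClassificationResult<T> documentation for the actual properties.
    // Console.WriteLine($"Scene: {result.PredictedScene} ({result.Confidence:P0})");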

ClassifyAsync(Tensor<T>, CancellationToken)

Classifies the acoustic scene asynchronously.

Task<SceneClassificationResult<T>> ClassifyAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>

Audio waveform tensor.

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task<SceneClassificationResult<T>>

Scene classification result.
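
Example

A sketch of asynchronous use with a timeout, assuming the classifier and audio tensor are obtained elsewhere. Whether cancellation surfaces as an OperationCanceledException depends on the implementation honoring the token.

    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using AiDotNet.Interfaces;
    // Add a using directive for the namespace that contains Tensor<T> in your project.

    public static class AsyncClassificationExample
    {
        public static async Task ClassifyWithTimeoutAsync(ISceneClassifier<double> classifier, Tensor<double> audio)
        {
            // Request cancellation automatically if classification takes longer than 5 seconds.
            using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
            try
            {
                var result = await classifier.ClassifyAsync(audio, cts.Token);
                Console.WriteLine("Classification finished.");
            }
            catch (OperationCanceledException)
            {
                // Reached only if the implementation observes the cancellation token.
                Console.WriteLine("Classification was cancelled before it completed.");
            }
        }
    }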

ExtractAcousticFeatures(Tensor<T>)

Extracts acoustic features used for scene classification.

Tensor<T> ExtractAcousticFeatures(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio waveform tensor.

Returns

Tensor<T>

Feature tensor capturing acoustic characteristics.
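
Example

A one-line sketch, assuming a classifier (ISceneClassifier<double>) and an audio tensor (Tensor<double> audio) are in scope. How you inspect or reuse the returned feature tensor depends on Tensor<T>'s API, which is documented separately.

    // Compute the feature tensor once and reuse it (for caching or your own analysis)
    // instead of re-deriving it on every call.
    Tensor<double> features = classifier.ExtractAcousticFeatures(audio);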

GetSceneProbabilities(Tensor<T>)

Gets scene probabilities for all supported scenes.

IReadOnlyDictionary<string, T> GetSceneProbabilities(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio waveform tensor.

Returns

IReadOnlyDictionary<string, T>

Dictionary mapping scene names to probability scores.
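
Example

A sketch that prints every supported scene with its score, highest first, assuming a classifier (ISceneClassifier<double>) and an audio tensor are in scope.

    // Probabilities for every supported scene, keyed by scene name.
    var probabilities = classifier.GetSceneProbabilities(audio);   // IReadOnlyDictionary<string, double>

    // Print all scenes, highest score first (OrderByDescending needs a 'using System.Linq;').
    foreach (var entry in probabilities.OrderByDescending(p => p.Value))
    {
        Console.WriteLine($"{entry.Key}: {entry.Value:F3}");
    }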

GetTopScenes(Tensor<T>, int)

Gets top-K scene predictions.

IReadOnlyList<ScenePrediction<T>> GetTopScenes(Tensor<T> audio, int k = 5)

Parameters

audio Tensor<T>

Audio waveform tensor.

k int

Number of top scenes to return.

Returns

IReadOnlyList<ScenePrediction<T>>

List of top scene predictions.
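
Example

A sketch assuming a classifier (ISceneClassifier<double>) and an audio tensor are in scope. ScenePrediction<T>'s own members are documented on its page and are not accessed here.

    // Ask for the three most likely scenes instead of the default five.
    var topScenes = classifier.GetTopScenes(audio, k: 3);   // IReadOnlyList<ScenePrediction<double>>
    Console.WriteLine($"Received {topScenes.Count} scene predictions.");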

TrackSceneChanges(Tensor<T>, double)

Tracks scene changes over time in longer audio.

SceneTrackingResult<T> TrackSceneChanges(Tensor<T> audio, double segmentDuration = 10)

Parameters

audio Tensor<T>

Audio waveform tensor.

segmentDuration double

Duration of each analysis segment in seconds.

Returns

SceneTrackingResult<T>

Scene tracking result showing how the scene changes over time.

Remarks

For Beginners: For longer recordings that move between places (like walking from the street into a building), this method tracks how the scene changes over time.
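
Example

A sketch, assuming a classifier (ISceneClassifier<double>) and a long-form audio tensor are in scope. The structure of the returned timeline is documented on SceneTrackingResult<T>.

    // Analyse the recording in 30-second segments instead of the default 10 seconds.
    var timeline = classifier.TrackSceneChanges(audio, segmentDuration: 30.0);   // SceneTrackingResult<double>
    // See SceneTrackingResult<T> for how the per-segment scene labels are exposed.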