Interface ISceneClassifier<T>

Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll

Interface for acoustic scene classification models that identify the environment/context of audio.

public interface ISceneClassifier<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.


Remarks

Acoustic scene classification (ASC) identifies the environment or context where audio was recorded. Unlike event detection, which finds specific sounds, scene classification characterizes the overall acoustic atmosphere.

For Beginners: Scene classification is like asking "Where was this recording made?"

How it works:

  1. Audio features capture the overall acoustic character
  2. A classifier matches these features to known scene types
  3. The most likely scene (along with alternatives) is returned

Example scenes:

  • Indoor: Office, restaurant, kitchen, library, shopping mall
  • Outdoor: Park, street, beach, forest, construction site
  • Transportation: Car, bus, train, metro, airport

How scenes differ from events:

  • Event: "A dog barked" (specific sound)
  • Scene: "This was recorded in a park" (overall environment)

Use cases:

  • Context-aware devices (adjust phone behavior based on location)
  • Audio organization (group recordings by location)
  • Surveillance (detect unusual environments)
  • AR/VR (match virtual audio to real environment)
  • Assistive technology (describe environment to blind users)

This interface extends IFullModel<T, TInput, TOutput> for Tensor-based audio processing.
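
The sketch below ties these ideas to the members documented on this page. It is only an illustration: how you construct the classifier and how you load audio into a Tensor<T> are not covered here, so both are taken as parameters.

    using System;
    using AiDotNet.Interfaces;
    // Add a using directive for the namespace that contains Tensor<T> in your project.

    public static class SceneClassificationExample
    {
        public static void DescribeScene(ISceneClassifier<double> classifier, Tensor<double> audio)
        {
            // Steps 1-2 happen inside Classify: features are extracted and matched to the known scenes.
            var result = classifier.Classify(audio);              // SceneClassificationResult<double>

            // Step 3: the most likely scene plus alternatives.
            var topScenes = classifier.GetTopScenes(audio, k: 3); // IReadOnlyList<ScenePrediction<double>>
            Console.WriteLine($"GetTopScenes returned {topScenes.Count} candidate scenes.");
        }
    }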

Properties

IsOnnxMode

Gets whether this model is running in ONNX inference mode.

bool IsOnnxMode { get; }

Property Value

bool

MinimumDurationSeconds

Gets the minimum audio duration required for reliable classification.

double MinimumDurationSeconds { get; }

Property Value

double

SampleRate

Gets the expected sample rate for input audio.

int SampleRate { get; }

Property Value

int

SupportedScenes

Gets the list of scenes this model can classify.

IReadOnlyList<string> SupportedScenes { get; }

Property Value

IReadOnlyList<string>
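
The sketch below shows one way the four properties above might be used together to sanity-check input before classifying. The clip's sample rate and duration are assumed to come from your own audio-loading code; they are not part of this interface.

    using System;
    using AiDotNet.Interfaces;

    public static class InputChecks
    {
        // clipSampleRate and clipDurationSeconds are assumed to come from your own audio loader.
        public static void CheckInput(ISceneClassifier<double> classifier, int clipSampleRate, double clipDurationSeconds)
        {
            if (clipSampleRate != classifier.SampleRate)
                Console.WriteLine($"Resample the clip to {classifier.SampleRate} Hz before classifying.");

            if (clipDurationSeconds < classifier.MinimumDurationSeconds)
                Console.WriteLine($"Clip is shorter than the {classifier.MinimumDurationSeconds} s minimum; results may be unreliable.");

            Console.WriteLine($"Model supports {classifier.SupportedScenes.Count} scenes.");
            Console.WriteLine(classifier.IsOnnxMode ? "Running in ONNX inference mode." : "Not running in ONNX mode.");
        }
    }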

Methods

Classify(Tensor<T>)

Classifies the acoustic scene of audio.

SceneClassificationResult<T> Classify(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio waveform tensor of shape [samples] (mono) or [channels, samples] (multi-channel).

Returns

SceneClassificationResult<T>

Scene classification result.

Remarks

For Beginners: This is the main method for identifying the scene.

  • Pass in a recording
  • Get back where it was likely recorded (office, park, etc.)
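
Example

A minimal call sketch, assuming a classifier (ISceneClassifier<double>) and an audio tensor (Tensor<double> audio) are already in scope. The member names in the commented-out line are illustrative placeholders, not taken from this page.

    // 'audio' is shaped [samples] for mono or [channels, samples] for multi-channel input.
    var result = classifier.Classify(audio);   // SceneClassificationResult<double>

    // The member names in the next line are illustrative placeholders; see the
    // SceneClassificationResult<T> documentation for the actual properties.
    // Console.WriteLine($"Scene: {result.PredictedScene} ({result.Confidence:P0})");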

ClassifyAsync(Tensor<T>, CancellationToken)

Classifies the acoustic scene asynchronously.

Task<SceneClassificationResult<T>> ClassifyAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>

Audio waveform tensor.

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task<SceneClassificationResult<T>>

Scene classification result.
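
Example

A sketch of asynchronous use with a timeout, assuming the classifier and audio tensor are obtained elsewhere. Whether cancellation surfaces as an OperationCanceledException depends on the implementation honoring the token.

    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using AiDotNet.Interfaces;
    // Add a using directive for the namespace that contains Tensor<T> in your project.

    public static class AsyncClassificationExample
    {
        public static async Task ClassifyWithTimeoutAsync(ISceneClassifier<double> classifier, Tensor<double> audio)
        {
            // Request cancellation automatically if classification takes longer than 5 seconds.
            using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
            try
            {
                var result = await classifier.ClassifyAsync(audio, cts.Token);
                Console.WriteLine("Classification finished.");
            }
            catch (OperationCanceledException)
            {
                // Reached only if the implementation observes the cancellation token.
                Console.WriteLine("Classification was cancelled before it completed.");
            }
        }
    }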

ExtractAcousticFeatures(Tensor<T>)

Extracts acoustic features used for scene classification.

Tensor<T> ExtractAcousticFeatures(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio waveform tensor.

Returns

Tensor<T>

Feature tensor capturing acoustic characteristics.
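
Example

A one-line sketch, assuming a classifier (ISceneClassifier<double>) and an audio tensor (Tensor<double> audio) are in scope. How you inspect or reuse the returned feature tensor depends on Tensor<T>'s API, which is documented separately.

    // Compute the feature tensor once and reuse it (for caching or your own analysis)
    // instead of re-deriving it on every call.
    Tensor<double> features = classifier.ExtractAcousticFeatures(audio);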

GetSceneProbabilities(Tensor<T>)

Gets scene probabilities for all supported scenes.

IReadOnlyDictionary<string, T> GetSceneProbabilities(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio waveform tensor.

Returns

IReadOnlyDictionary<string, T>

Dictionary mapping scene names to probability scores.
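
Example

A sketch that prints every supported scene with its score, highest first, assuming a classifier (ISceneClassifier<double>) and an audio tensor are in scope.

    // Probabilities for every supported scene, keyed by scene name.
    var probabilities = classifier.GetSceneProbabilities(audio);   // IReadOnlyDictionary<string, double>

    // Print all scenes, highest score first (OrderByDescending needs a 'using System.Linq;').
    foreach (var entry in probabilities.OrderByDescending(p => p.Value))
    {
        Console.WriteLine($"{entry.Key}: {entry.Value:F3}");
    }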

GetTopScenes(Tensor<T>, int)

Gets top-K scene predictions.

IReadOnlyList<ScenePrediction<T>> GetTopScenes(Tensor<T> audio, int k = 5)

Parameters

audio Tensor<T>

Audio waveform tensor.

k int

Number of top scenes to return.

Returns

IReadOnlyList<ScenePrediction<T>>

List of top scene predictions.
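
Example

A sketch assuming a classifier (ISceneClassifier<double>) and an audio tensor are in scope. ScenePrediction<T>'s own members are documented on its page and are not accessed here.

    // Ask for the three most likely scenes instead of the default five.
    var topScenes = classifier.GetTopScenes(audio, k: 3);   // IReadOnlyList<ScenePrediction<double>>
    Console.WriteLine($"Received {topScenes.Count} scene predictions.");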

TrackSceneChanges(Tensor<T>, double)

Tracks scene changes over time in longer audio.

SceneTrackingResult<T> TrackSceneChanges(Tensor<T> audio, double segmentDuration = 10)

Parameters

audio Tensor<T>

Audio waveform tensor.

segmentDuration double

Duration of each analysis segment in seconds.

Returns

SceneTrackingResult<T>

Scene tracking result showing how the scene changes over time.

Remarks

For Beginners: For longer recordings that move between places (like walking from the street into a building), this method tracks how the scene changes over time.
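
Example

A sketch, assuming a classifier (ISceneClassifier<double>) and a long-form audio tensor are in scope. The structure of the returned timeline is documented on SceneTrackingResult<T>.

    // Analyse the recording in 30-second segments instead of the default 10 seconds.
    var timeline = classifier.TrackSceneChanges(audio, segmentDuration: 30.0);   // SceneTrackingResult<double>
    // See SceneTrackingResult<T> for how the per-segment scene labels are exposed.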