Interface ISceneClassifier<T>
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Interface for acoustic scene classification models that identify the environment/context of audio.
public interface ISceneClassifier<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T
The numeric type used for calculations.
Remarks
Acoustic scene classification (ASC) identifies the environment or context where audio was recorded. Unlike event detection which finds specific sounds, scene classification characterizes the overall acoustic atmosphere.
For Beginners: Scene classification is like asking "Where was this recording made?"
How it works:
- Audio features capture the overall acoustic character
- A classifier matches these features to known scene types
- The most likely scene (and alternatives) are returned
Example scenes:
- Indoor: Office, restaurant, kitchen, library, shopping mall
- Outdoor: Park, street, beach, forest, construction site
- Transportation: Car, bus, train, metro, airport
How scenes differ from events:
- Event: "A dog barked" (specific sound)
- Scene: "This was recorded in a park" (overall environment)
Use cases:
- Context-aware devices (adjust phone behavior based on location)
- Audio organization (group recordings by location)
- Surveillance (detect unusual environments)
- AR/VR (match virtual audio to real environment)
- Assistive technology (describe environment to blind users)
This interface extends IFullModel<T, TInput, TOutput> for Tensor-based audio processing.
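A minimal usage sketch is shown below. The classifier instance, the LoadAudio helper, and the PredictedScene/Confidence member names on SceneClassificationResult&lt;T&gt; are illustrative assumptions; consult the result type's documentation for the actual members.

```csharp
using AiDotNet.Interfaces;

// Assumes an ISceneClassifier<float> implementation is available.
ISceneClassifier<float> classifier = /* obtain a trained model */;

// Load mono audio resampled to the model's expected rate (LoadAudio is a
// hypothetical helper returning a Tensor<float> of shape [samples]).
Tensor<float> audio = LoadAudio("recording.wav", classifier.SampleRate);

// Check the clip meets the minimum duration before classifying.
double durationSeconds = audio.Shape[0] / (double)classifier.SampleRate;
if (durationSeconds >= classifier.MinimumDurationSeconds)
{
    SceneClassificationResult<float> result = classifier.Classify(audio);
    // PredictedScene and Confidence are illustrative member names.
    Console.WriteLine($"Scene: {result.PredictedScene} ({result.Confidence:P1})");
}
```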
Properties
IsOnnxMode
Gets whether this model is running in ONNX inference mode.
bool IsOnnxMode { get; }
Property Value
- bool
MinimumDurationSeconds
Gets the minimum audio duration required for reliable classification.
double MinimumDurationSeconds { get; }
Property Value
- double
SampleRate
Gets the expected sample rate for input audio.
int SampleRate { get; }
Property Value
- int
SupportedScenes
Gets the list of scenes this model can classify.
IReadOnlyList<string> SupportedScenes { get; }
Property Value
- IReadOnlyList<string>
Methods
Classify(Tensor<T>)
Classifies the acoustic scene of audio.
SceneClassificationResult<T> Classify(Tensor<T> audio)
Parameters
audio Tensor<T>
Audio waveform tensor [samples] or [channels, samples].
Returns
- SceneClassificationResult<T>
Scene classification result.
Remarks
For Beginners: This is the main method for identifying the scene.
- Pass in a recording
- Get back where it was likely recorded (office, park, etc.)
ClassifyAsync(Tensor<T>, CancellationToken)
Classifies acoustic scene asynchronously.
Task<SceneClassificationResult<T>> ClassifyAsync(Tensor<T> audio, CancellationToken cancellationToken = default)
Parameters
audio Tensor<T>
Audio waveform tensor.
cancellationToken CancellationToken
Cancellation token for the async operation.
Returns
- Task<SceneClassificationResult<T>>
Scene classification result.
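A sketch of asynchronous use with a timeout (the classifier and audio variables are assumed to be set up as in the synchronous example):

```csharp
// Cancel classification automatically if it takes longer than 30 seconds.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
try
{
    SceneClassificationResult<float> result =
        await classifier.ClassifyAsync(audio, cts.Token);
}
catch (OperationCanceledException)
{
    // Classification did not finish within the timeout.
}
```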
ExtractAcousticFeatures(Tensor<T>)
Extracts acoustic features used for scene classification.
Tensor<T> ExtractAcousticFeatures(Tensor<T> audio)
Parameters
audio Tensor<T>
Audio waveform tensor.
Returns
- Tensor<T>
Feature tensor capturing acoustic characteristics.
GetSceneProbabilities(Tensor<T>)
Gets scene probabilities for all supported scenes.
IReadOnlyDictionary<string, T> GetSceneProbabilities(Tensor<T> audio)
Parameters
audio Tensor<T>
Audio waveform tensor.
Returns
- IReadOnlyDictionary<string, T>
Dictionary mapping scene names to probability scores.
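A sketch of inspecting the full probability distribution, cross-referencing the SupportedScenes property (classifier and audio as set up above):

```csharp
// Every scene in SupportedScenes should appear as a key in the dictionary.
IReadOnlyDictionary<string, float> probs = classifier.GetSceneProbabilities(audio);
foreach (string scene in classifier.SupportedScenes)
{
    Console.WriteLine($"{scene}: {probs[scene]:F3}");
}
```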
GetTopScenes(Tensor<T>, int)
Gets top-K scene predictions.
IReadOnlyList<ScenePrediction<T>> GetTopScenes(Tensor<T> audio, int k = 5)
Parameters
audio Tensor<T>
Audio waveform tensor.
k int
Number of top scenes to return.
Returns
- IReadOnlyList<ScenePrediction<T>>
List of top scene predictions.
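A sketch of retrieving the three most likely scenes. The SceneName and Probability member names on ScenePrediction&lt;T&gt; are illustrative assumptions:

```csharp
// Predictions are expected in descending order of probability.
foreach (ScenePrediction<float> prediction in classifier.GetTopScenes(audio, k: 3))
{
    // SceneName and Probability are illustrative member names.
    Console.WriteLine($"{prediction.SceneName}: {prediction.Probability:F3}");
}
```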
TrackSceneChanges(Tensor<T>, double)
Tracks scene changes over time in longer audio.
SceneTrackingResult<T> TrackSceneChanges(Tensor<T> audio, double segmentDuration = 10)
Parameters
audio Tensor<T>
Audio waveform tensor.
segmentDuration double
Duration of each analysis segment in seconds.
Returns
- SceneTrackingResult<T>
Scene tracking result showing the detected scene over time.
Remarks
For Beginners: For longer recordings that might move between places (like walking from street to inside a building), this tracks the scene changes.
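A sketch of tracking scene changes across a long recording. The Segments, StartTime, and Scene member names on SceneTrackingResult&lt;T&gt; are illustrative assumptions:

```csharp
// Analyze the recording in 10-second segments (the default).
SceneTrackingResult<float> tracking =
    classifier.TrackSceneChanges(audio, segmentDuration: 10);

// Segments, StartTime, and Scene are illustrative member names.
foreach (var segment in tracking.Segments)
{
    Console.WriteLine($"{segment.StartTime:F0}s: {segment.Scene}");
}
```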