Class SceneClassifier<T>
- Namespace
- AiDotNet.Audio.Classification
- Assembly
- AiDotNet.dll
Acoustic scene classification model for identifying recording environments.
public class SceneClassifier<T> : AudioClassifierBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, ISceneClassifier<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T
The numeric type used for calculations.
- Inheritance
AudioClassifierBase<T> → SceneClassifier<T>
Remarks
Classifies audio recordings by their acoustic environment or scene context. The approach is based on the DCASE (Detection and Classification of Acoustic Scenes and Events) challenge.
For Beginners: Scene classification answers the question "Where was this recorded?" Example scenes include:
- Indoor: office, home, shopping mall, restaurant, library
- Outdoor: street, park, beach, forest, construction site
- Transportation: bus, train, metro, airport, car
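The three groups above can be written as a simple lookup table. This is a standalone C# sketch that does not use AiDotNet. The scene strings come from the list above. The helper `CategoryOf` is a hypothetical name, and the actual contents of StandardScenes may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical grouping of the example scenes listed above into broad categories.
// The real StandardScenes labels may use different spellings or a different set.
var categories = new Dictionary<string, string[]>
{
    ["Indoor"] = new[] { "office", "home", "shopping mall", "restaurant", "library" },
    ["Outdoor"] = new[] { "street", "park", "beach", "forest", "construction site" },
    ["Transportation"] = new[] { "bus", "train", "metro", "airport", "car" },
};

// Returns the broad category for a scene label (case-insensitive), or "Unknown".
string CategoryOf(string scene) =>
    categories.FirstOrDefault(pair => pair.Value.Contains(scene, StringComparer.OrdinalIgnoreCase)).Key
    ?? "Unknown";

Console.WriteLine(CategoryOf("metro"));  // Transportation
```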
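Usage with a pretrained ONNX model (inference only):
// Load the ONNX model from disk.
var classifier = new SceneClassifier<float>("model.onnx");
// audioTensor is a Tensor<float> holding the audio waveform.
var result = classifier.Classify(audioTensor);
Console.WriteLine($"Scene: {result.PredictedScene} ({result.Confidence})");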
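Usage for native training:
// Build an architecture with 60 input features and 30 outputs.
var architecture = new NeuralNetworkArchitecture<float>(inputFeatures: 60, outputSize: 30);
var classifier = new SceneClassifier<float>(architecture);
// features: training inputs; labels: expected outputs for each input.
classifier.Train(features, labels);
var result = classifier.Classify(newAudio);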
Constructors
SceneClassifier(SceneClassifierOptions?)
Creates a SceneClassifier for basic classification. Default options are used when options is null.
public SceneClassifier(SceneClassifierOptions? options = null)
Parameters
options SceneClassifierOptions
Optional configuration options.
SceneClassifier(NeuralNetworkArchitecture<T>, SceneClassifierOptions?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?)
Creates a SceneClassifier for native training mode.
public SceneClassifier(NeuralNetworkArchitecture<T> architecture, SceneClassifierOptions? options = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null)
Parameters
architecture NeuralNetworkArchitecture<T>
Neural network architecture.
options SceneClassifierOptions
Optional configuration options.
optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>
Optional custom optimizer.
SceneClassifier(string, SceneClassifierOptions?)
Creates a SceneClassifier for ONNX inference mode.
public SceneClassifier(string modelPath, SceneClassifierOptions? options = null)
Parameters
modelPath string
Path to the ONNX model file.
options SceneClassifierOptions
Optional configuration options.
Fields
StandardScenes
Standard acoustic scene labels (DCASE-style).
public static readonly string[] StandardScenes
Field Value
- string[]
Properties
MinimumDurationSeconds
Gets the minimum audio duration required for reliable classification.
public double MinimumDurationSeconds { get; }
Property Value
- double
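Before classifying a clip, a caller can check it against MinimumDurationSeconds. The duration of a waveform is its sample count divided by its sample rate. The sketch below uses a 16 kHz sample rate and a 1-second minimum only for illustration. Neither value comes from this class, and `MeetsMinimumDuration` is a hypothetical helper.

```csharp
using System;

// Returns true if a clip of sampleCount samples at sampleRate Hz lasts
// at least minimumSeconds. Durations are computed in seconds.
bool MeetsMinimumDuration(int sampleCount, int sampleRate, double minimumSeconds)
{
    if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));
    double durationSeconds = (double)sampleCount / sampleRate;
    return durationSeconds >= minimumSeconds;
}

// Illustrative values only: 16 kHz audio and a 1 s minimum (not taken from SceneClassifier).
const int SampleRate = 16000;
Console.WriteLine(MeetsMinimumDuration(8000, SampleRate, 1.0));   // False (0.5 s)
Console.WriteLine(MeetsMinimumDuration(32000, SampleRate, 1.0));  // True (2 s)
```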
Scenes
Gets the supported scenes. This is an alias for SupportedScenes, kept for legacy API compatibility.
public IReadOnlyList<string> Scenes { get; }
Property Value
- IReadOnlyList<string>
SupportedScenes
Gets the list of scenes this model can classify.
public IReadOnlyList<string> SupportedScenes { get; }
Property Value
- IReadOnlyList<string>
Methods
Classify(Tensor<T>)
Classifies the acoustic scene of an audio recording.
public SceneClassificationResult<T> Classify(Tensor<T> audio)
Parameters
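audio Tensor<T>
Audio waveform tensor.
Returns
- SceneClassificationResult<T>
The classification result, including the predicted scene and its confidence.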
ClassifyAsync(Tensor<T>, CancellationToken)
Classifies the acoustic scene asynchronously.
public Task<SceneClassificationResult<T>> ClassifyAsync(Tensor<T> audio, CancellationToken cancellationToken = default)
Parameters
audio Tensor<T>
Audio waveform tensor.
cancellationToken CancellationToken
Token used to cancel the operation.
Returns
- Task<SceneClassificationResult<T>>
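ClassifyAsync accepts a CancellationToken. In the standard .NET pattern, the token is passed along to the work, and a canceled token surfaces as an OperationCanceledException. The sketch below applies that pattern to a stand-in synchronous classifier. `ClassifySync` and `ClassifyAsyncSketch` are hypothetical names, and this is not the library's implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a synchronous classifier.
string ClassifySync(float[] audio) => audio.Length > 0 ? "street" : "unknown";

// Runs the synchronous classifier on the thread pool. Task.Run checks the token
// before scheduling, and the work checks it again when it starts.
Task<string> ClassifyAsyncSketch(float[] audio, CancellationToken cancellationToken = default) =>
    Task.Run(() =>
    {
        cancellationToken.ThrowIfCancellationRequested();
        return ClassifySync(audio);
    }, cancellationToken);

// Give up automatically after 30 seconds.
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
string scene = await ClassifyAsyncSketch(new float[16000], cts.Token);
Console.WriteLine(scene);  // street
```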
ClassifyCategory(Tensor<T>)
Classifies audio and returns the predicted category with its confidence. Provided for legacy API compatibility.
public (string Category, T Confidence) ClassifyCategory(Tensor<T> audio)
Parameters
audio Tensor<T>
Audio waveform tensor.
Returns
- (string Category, T Confidence)
Tuple of (category, confidence).
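ClassifyCategory reduces the per-scene scores to a single (Category, Confidence) pair. A common way to do this is argmax: pick the scene with the highest probability. The sketch below shows that reduction over plain arrays. It is an assumption about how such a tuple can be produced, not the library's implementation, and the scene names are examples.

```csharp
using System;
using System.Collections.Generic;

// Picks the label with the highest probability and returns it with that probability.
(string Category, double Confidence) ArgMax(IReadOnlyList<string> labels, IReadOnlyList<double> probabilities)
{
    if (labels.Count == 0 || labels.Count != probabilities.Count)
        throw new ArgumentException("labels and probabilities must be non-empty and the same length");

    int best = 0;
    for (int i = 1; i < probabilities.Count; i++)
    {
        if (probabilities[i] > probabilities[best]) best = i;
    }
    return (labels[best], probabilities[best]);
}

var (category, confidence) = ArgMax(new[] { "park", "street", "bus" }, new[] { 0.1, 0.7, 0.2 });
Console.WriteLine(category);    // street
Console.WriteLine(confidence);  // its probability
```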
CreateAsync(SceneClassifierOptions?, IProgress<double>?, CancellationToken)
Creates a SceneClassifier asynchronously, downloading the model it needs.
public static Task<SceneClassifier<T>> CreateAsync(SceneClassifierOptions? options = null, IProgress<double>? progress = null, CancellationToken cancellationToken = default)
Parameters
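options SceneClassifierOptions
Optional configuration options.
progress IProgress<double>
Optional receiver for download progress updates.
cancellationToken CancellationToken
Token used to cancel the download and creation.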
Returns
- Task<SceneClassifier<T>>
CreateNewInstance()
Creates a new instance of this network type.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader BinaryReader
The reader to deserialize the network-specific data from.
Dispose(bool)
Disposes of managed resources.
protected override void Dispose(bool disposing)
Parameters
disposing bool
true when called from Dispose(); false when called from a finalizer.
ExtractAcousticFeatures(Tensor<T>)
Extracts acoustic features used for scene classification.
public Tensor<T> ExtractAcousticFeatures(Tensor<T> audio)
Parameters
audio Tensor<T>
Audio waveform tensor.
Returns
- Tensor<T>
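This reference does not list which acoustic features are extracted. To illustrate frame-level feature extraction in general, the sketch below computes two classic descriptors over fixed-size frames: root-mean-square (RMS) energy and zero-crossing rate (ZCR). The frame and hop sizes are arbitrary. These are not necessarily the features ExtractAcousticFeatures computes, and `FrameFeatures` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Splits a waveform into frames of frameLength samples, advancing hopLength samples
// each time, and returns (RMS, zero-crossing rate) for every complete frame.
List<(double Rms, double Zcr)> FrameFeatures(float[] signal, int frameLength, int hopLength)
{
    if (frameLength < 2 || hopLength < 1) throw new ArgumentOutOfRangeException();
    var features = new List<(double Rms, double Zcr)>();
    for (int start = 0; start + frameLength <= signal.Length; start += hopLength)
    {
        double sumSquares = 0;
        int crossings = 0;
        for (int i = start; i < start + frameLength; i++)
        {
            sumSquares += signal[i] * signal[i];
            // Count a crossing when consecutive samples have opposite signs.
            if (i > start && (signal[i - 1] >= 0) != (signal[i] >= 0)) crossings++;
        }
        double rms = Math.Sqrt(sumSquares / frameLength);
        double zcr = (double)crossings / (frameLength - 1);
        features.Add((rms, zcr));
    }
    return features;
}

// First frame alternates sign (high ZCR); second frame is constant (zero ZCR).
var frames = FrameFeatures(new float[] { 1f, -1f, 1f, -1f, 0.5f, 0.5f, 0.5f, 0.5f }, 4, 4);
foreach (var (frameRms, frameZcr) in frames) Console.WriteLine($"rms={frameRms:F2} zcr={frameZcr:F2}");
```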
GetModelMetadata()
Gets model metadata for serialization.
public override ModelMetadata<T> GetModelMetadata()
Returns
GetSceneProbabilities(Tensor<T>)
Gets scene probabilities for all supported scenes.
public IReadOnlyDictionary<string, T> GetSceneProbabilities(Tensor<T> audio)
Parameters
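audio Tensor<T>
Audio waveform tensor.
Returns
- IReadOnlyDictionary<string, T>
The probability of each supported scene, keyed by scene name.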
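Scene probabilities are commonly obtained by applying a softmax to the network's raw scores (logits), which makes them positive and makes them sum to 1. The sketch below shows a numerically stable softmax keyed by scene name. It is an assumption that SceneClassifier applies exactly this transformation (for example in PostprocessOutput). The scene names and the helper `SceneProbabilities` are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Converts raw scores into probabilities with a softmax. Subtracting the maximum
// score before exponentiating avoids overflow without changing the result.
IReadOnlyDictionary<string, double> SceneProbabilities(IReadOnlyList<string> scenes, IReadOnlyList<double> logits)
{
    if (scenes.Count != logits.Count) throw new ArgumentException("one logit per scene is required");
    double max = logits.Max();
    double[] exps = logits.Select(x => Math.Exp(x - max)).ToArray();
    double sum = exps.Sum();
    var result = new Dictionary<string, double>();
    for (int i = 0; i < scenes.Count; i++) result[scenes[i]] = exps[i] / sum;
    return result;
}

var probs = SceneProbabilities(new[] { "office", "park", "bus" }, new[] { 2.0, 1.0, 0.1 });
foreach (var (scene, p) in probs) Console.WriteLine($"{scene}: {p:F3}");
```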
GetTopScenes(Tensor<T>, int)
Gets top-K scene predictions.
public IReadOnlyList<ScenePrediction<T>> GetTopScenes(Tensor<T> audio, int k = 5)
Parameters
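audio Tensor<T>
Audio waveform tensor.
k int
Number of top predictions to return (default 5).
Returns
- IReadOnlyList<ScenePrediction<T>>
The top k scene predictions.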
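Given per-scene probabilities, a top-K list contains the K most probable entries in descending order. The sketch below uses LINQ over a plain dictionary. It returns (Scene, Probability) tuples instead of ScenePrediction<T>, whose members are not documented here. It mirrors the default k = 5 and assumes that asking for more scenes than exist returns all of them. `TopScenes` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the k most probable scenes, highest first. Take(k) returns fewer
// items when the dictionary has fewer than k entries.
IReadOnlyList<(string Scene, double Probability)> TopScenes(
    IReadOnlyDictionary<string, double> probabilities, int k = 5)
{
    if (k < 0) throw new ArgumentOutOfRangeException(nameof(k));
    return probabilities
        .OrderByDescending(pair => pair.Value)
        .Take(k)
        .Select(pair => (Scene: pair.Key, Probability: pair.Value))
        .ToList();
}

var probabilities = new Dictionary<string, double>
{
    ["park"] = 0.10, ["street"] = 0.55, ["bus"] = 0.30, ["office"] = 0.05,
};
foreach (var (scene, _) in TopScenes(probabilities, k: 2)) Console.WriteLine(scene);  // street, then bus
```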
InitializeLayers()
Initializes the neural network layers.
protected override void InitializeLayers()
PostprocessOutput(Tensor<T>)
Postprocesses model output into final predictions.
protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)
Parameters
modelOutput Tensor<T>
Raw output of the model.
Returns
- Tensor<T>
Predict(Tensor<T>)
Predicts scene probabilities from audio features.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input Tensor<T>
Audio feature tensor.
Returns
- Tensor<T>
PreprocessAudio(Tensor<T>)
Preprocesses raw audio into model input format.
protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)
Parameters
rawAudio Tensor<T>
Raw audio waveform tensor.
Returns
- Tensor<T>
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer BinaryWriter
The writer to serialize the network-specific data to.
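SerializeNetworkSpecificData and DeserializeNetworkSpecificData have to mirror each other: the reader must consume exactly the fields the writer produced, in the same order. The sketch below shows that symmetry for a list of scene labels, using BinaryWriter and BinaryReader over a MemoryStream. Storing the label list is an assumption made for illustration, since this reference does not say which fields the class actually serializes. `WriteScenes` and `ReadScenes` are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Writes a count prefix followed by each label.
void WriteScenes(BinaryWriter writer, IReadOnlyList<string> scenes)
{
    writer.Write(scenes.Count);
    foreach (string scene in scenes) writer.Write(scene);
}

// Reads the same fields back in the same order they were written.
List<string> ReadScenes(BinaryReader reader)
{
    int count = reader.ReadInt32();
    var scenes = new List<string>(count);
    for (int i = 0; i < count; i++) scenes.Add(reader.ReadString());
    return scenes;
}

using var stream = new MemoryStream();
using (var sceneWriter = new BinaryWriter(stream, System.Text.Encoding.UTF8, leaveOpen: true))
{
    WriteScenes(sceneWriter, new[] { "airport", "bus", "park" });
}
stream.Position = 0;  // rewind before reading
using var sceneReader = new BinaryReader(stream);
List<string> restored = ReadScenes(sceneReader);
Console.WriteLine(string.Join(", ", restored));  // airport, bus, park
```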
TrackSceneChanges(Tensor<T>, double)
Tracks scene changes over time in longer audio.
public SceneTrackingResult<T> TrackSceneChanges(Tensor<T> audio, double segmentDuration = 10)
Parameters
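audio Tensor<T>
Audio waveform tensor to track.
segmentDuration double
Duration of each analysis segment (default 10).
Returns
- SceneTrackingResult<T>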
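TrackSceneChanges splits longer audio into segments of segmentDuration (10 by default) and follows how the predicted scene changes. Assume the duration is in seconds and each segment has already been classified. Finding the change points then reduces to comparing consecutive labels, as in the sketch below. The helper names and tuple layout are hypothetical, and SceneTrackingResult<T> may expose this information differently.

```csharp
using System;
using System.Collections.Generic;

// Number of complete segments of segmentSeconds in a clip of sampleCount samples.
int SegmentCount(int sampleCount, int sampleRate, double segmentSeconds) =>
    (int)(sampleCount / (sampleRate * segmentSeconds));

// Given one predicted label per consecutive segment, returns the time (in seconds)
// at which each change starts, together with the old and new scene.
List<(double StartSeconds, string From, string To)> FindSceneChanges(
    IReadOnlyList<string> segmentLabels, double segmentSeconds)
{
    var result = new List<(double StartSeconds, string From, string To)>();
    for (int i = 1; i < segmentLabels.Count; i++)
    {
        if (segmentLabels[i] != segmentLabels[i - 1])
            result.Add((i * segmentSeconds, segmentLabels[i - 1], segmentLabels[i]));
    }
    return result;
}

// Example: 50 s of 16 kHz audio in 10 s segments gives 5 segments.
int segments = SegmentCount(50 * 16000, 16000, 10);
Console.WriteLine(segments);  // 5
var labels = new[] { "street", "street", "bus", "bus", "park" };  // one label per segment
foreach (var change in FindSceneChanges(labels, 10))
    Console.WriteLine($"{change.StartSeconds}s: {change.From} -> {change.To}");  // 20s: street -> bus, then 40s: bus -> park
```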
Train(Tensor<T>, Tensor<T>)
Trains the model on labeled audio samples.
public override void Train(Tensor<T> input, Tensor<T> expected)
Parameters
input Tensor<T>
Training inputs.
expected Tensor<T>
Expected outputs (labels) for each input.
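For classification, the expected tensor often holds one row per sample, with a 1 in the column of the correct scene and 0 elsewhere (one-hot encoding). This reference does not say whether SceneClassifier expects one-hot targets or class indices, so the sketch below only illustrates the encoding itself. It uses plain arrays rather than Tensor<T>, and `OneHot` is a hypothetical helper.

```csharp
using System;

// Builds a [sampleCount, classCount] one-hot matrix from class indices.
float[,] OneHot(int[] classIndices, int classCount)
{
    var targets = new float[classIndices.Length, classCount];
    for (int row = 0; row < classIndices.Length; row++)
    {
        int cls = classIndices[row];
        if (cls < 0 || cls >= classCount) throw new ArgumentOutOfRangeException(nameof(classIndices));
        targets[row, cls] = 1f;
    }
    return targets;
}

// Three samples labeled with scene indices 2, 0 and 1 out of 3 scenes.
float[,] expected = OneHot(new[] { 2, 0, 1 }, 3);
Console.WriteLine($"{expected[0, 2]} {expected[1, 0]} {expected[2, 1]}");  // 1 1 1
```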
UpdateParameters(Vector<T>)
Updates parameters from a flattened parameter vector.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters Vector<T>
Flattened vector containing all model parameters.
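UpdateParameters receives every trainable parameter as one flattened vector. To apply such a vector, it must be sliced back into per-layer pieces, in the same order that was used when flattening. The sketch below shows that slicing with plain arrays. The layer sizes are hypothetical, and `SplitParameters` is not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Splits a flat parameter vector into consecutive chunks whose lengths match
// each layer's parameter count. Throws if the total length does not match.
List<double[]> SplitParameters(double[] flat, IReadOnlyList<int> layerSizes)
{
    int total = 0;
    foreach (int size in layerSizes) total += size;
    if (total != flat.Length)
        throw new ArgumentException($"expected {total} parameters but got {flat.Length}");

    var chunks = new List<double[]>();
    int offset = 0;
    foreach (int size in layerSizes)
    {
        chunks.Add(flat[offset..(offset + size)]);
        offset += size;
    }
    return chunks;
}

// Two hypothetical layers with 2 and 4 parameters.
var layers = SplitParameters(new double[] { 1, 2, 3, 4, 5, 6 }, new[] { 2, 4 });
Console.WriteLine(string.Join(" | ", layers.ConvertAll(l => string.Join(",", l))));  // 1,2 | 3,4,5,6
```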