Class AudioEventDetector<T>
Namespace: AiDotNet.Audio.Classification
Assembly: AiDotNet.dll
Audio event detection model for identifying sounds in audio (AudioSet-style).
public class AudioEventDetector<T> : AudioClassifierBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IAudioEventDetector<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
Inheritance: AudioClassifierBase<T> → AudioEventDetector<T>
Remarks
Detects various audio events like speech, music, environmental sounds, and more. Based on AudioSet ontology with 527+ event classes organized hierarchically.
Architecture: This model extends AudioClassifierBase<T> and implements IAudioEventDetector<T> for multi-label event detection. Unlike single-label classification, event detection identifies multiple overlapping events with their temporal boundaries.
For Beginners: Audio event detection answers "What sounds are in this audio?":
- Human sounds: speech, laughter, coughing, footsteps
- Animal sounds: dog barking, bird singing, cat meowing
- Music: instruments, genres, singing
- Environmental: traffic, rain, wind, construction
Usage with ONNX:
var options = new AudioEventDetectorOptions { ModelPath = "audio-events.onnx" };
var detector = new AudioEventDetector<float>(options);
var result = detector.Detect(audio);
foreach (var evt in result.Events)
{
Console.WriteLine($"{evt.EventType}: {evt.Confidence} at {evt.StartTime:F2}s");
}
Usage with training:
var architecture = new NeuralNetworkArchitecture<float>(inputFeatures: 64, outputSize: 50);
var detector = new AudioEventDetector<float>(architecture, new AudioEventDetectorOptions());
detector.Train(features, labels);
Constructors
AudioEventDetector(AudioEventDetectorOptions?)
Creates an AudioEventDetector with legacy options only (native mode).
public AudioEventDetector(AudioEventDetectorOptions? options = null)
Parameters
options (AudioEventDetectorOptions): Detection options.
Remarks
This constructor creates a native mode detector. For ONNX inference mode, use CreateAsync(AudioEventDetectorOptions?, IProgress<double>?, CancellationToken) or the constructor that accepts a model path.
Example for ONNX mode:
var detector = await AudioEventDetector<float>.CreateAsync(options);
AudioEventDetector(NeuralNetworkArchitecture<T>, AudioEventDetectorOptions?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?)
Creates an AudioEventDetector for native training mode.
public AudioEventDetector(NeuralNetworkArchitecture<T> architecture, AudioEventDetectorOptions? options = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture.
options (AudioEventDetectorOptions): Detection options.
optimizer (IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>): Optional custom optimizer (defaults to AdamW).
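Example (a minimal sketch of native training construction; the feature and class counts are illustrative, and the optimizer is left at its AdamW default):
var architecture = new NeuralNetworkArchitecture<float>(inputFeatures: 64, outputSize: 527);
var detector = new AudioEventDetector<float>(architecture, new AudioEventDetectorOptions());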
AudioEventDetector(NeuralNetworkArchitecture<T>, string, AudioEventDetectorOptions?)
Creates an AudioEventDetector for ONNX inference mode.
public AudioEventDetector(NeuralNetworkArchitecture<T> architecture, string modelPath, AudioEventDetectorOptions? options = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): Neural network architecture.
modelPath (string): Path to ONNX model file.
options (AudioEventDetectorOptions): Detection options.
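Example (a sketch of ONNX inference construction; the model path is a placeholder):
var architecture = new NeuralNetworkArchitecture<float>(inputFeatures: 64, outputSize: 527);
var detector = new AudioEventDetector<float>(architecture, "audio-events.onnx", new AudioEventDetectorOptions());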
Fields
CommonEventLabels
Common audio event categories from AudioSet.
public static readonly string[] CommonEventLabels
Field Value
- string[]
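Example (listing the built-in label set):
foreach (var label in AudioEventDetector<float>.CommonEventLabels)
{
    Console.WriteLine(label);
}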
Properties
EventLabels
Gets the event labels (alias for SupportedEvents for legacy API compatibility).
public IReadOnlyList<string> EventLabels { get; }
Property Value
- IReadOnlyList<string>
SupportedEvents
Gets the list of event types this model can detect.
public IReadOnlyList<string> SupportedEvents { get; }
Property Value
- IReadOnlyList<string>
TimeResolution
Gets the time resolution for event detection in seconds.
public double TimeResolution { get; }
Property Value
- double
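Example (a sketch that reports what a constructed detector can recognize; detector is assumed to be an existing instance):
Console.WriteLine($"Time resolution: {detector.TimeResolution:F3}s");
Console.WriteLine($"Supported events: {detector.SupportedEvents.Count}");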
Methods
CreateAsync(AudioEventDetectorOptions?, IProgress<double>?, CancellationToken)
Creates an AudioEventDetector asynchronously with model download.
public static Task<AudioEventDetector<T>> CreateAsync(AudioEventDetectorOptions? options = null, IProgress<double>? progress = null, CancellationToken cancellationToken = default)
Parameters
options (AudioEventDetectorOptions)
progress (IProgress<double>)
cancellationToken (CancellationToken)
Returns
- Task<AudioEventDetector<T>>
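Example (a sketch of asynchronous creation with download progress reporting; passing null options is assumed to download the default model):
var progress = new Progress<double>(p => Console.WriteLine($"Download: {p:P0}"));
var detector = await AudioEventDetector<float>.CreateAsync(options: null, progress: progress);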
CreateNewInstance()
Creates a new instance for deserialization.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader)
Detect(Tensor<T>)
Detects audio events in the audio stream.
public AudioEventResult<T> Detect(Tensor<T> audio)
Parameters
audio (Tensor<T>)
Returns
- AudioEventResult<T>
Detect(Tensor<T>, T)
Detects audio events with custom threshold.
public AudioEventResult<T> Detect(Tensor<T> audio, T threshold)
Parameters
audio (Tensor<T>)
threshold (T)
Returns
- AudioEventResult<T>
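Example (a sketch with an illustrative confidence threshold; audio is assumed to be a loaded waveform tensor):
var result = detector.Detect(audio, 0.3f);
foreach (var evt in result.Events)
{
    Console.WriteLine($"{evt.EventType}: {evt.Confidence}");
}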
DetectAsync(Tensor<T>, CancellationToken)
Detects audio events asynchronously.
public Task<AudioEventResult<T>> DetectAsync(Tensor<T> audio, CancellationToken cancellationToken = default)
Parameters
audio (Tensor<T>)
cancellationToken (CancellationToken)
Returns
- Task<AudioEventResult<T>>
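Example (a sketch of asynchronous detection with a cancellation timeout; the timeout is illustrative):
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
var result = await detector.DetectAsync(audio, cts.Token);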
DetectFrame(Tensor<T>)
Detects events in a single frame without windowing (legacy API).
public Dictionary<string, double> DetectFrame(Tensor<T> audio)
Parameters
audio (Tensor<T>)
Returns
- Dictionary<string, double>
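Example (a sketch that prints per-label scores for a single frame; audio is assumed to hold one frame of samples):
var scores = detector.DetectFrame(audio);
foreach (var kvp in scores)
{
    Console.WriteLine($"{kvp.Key}: {kvp.Value:F3}");
}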
DetectLegacy(Tensor<T>)
Detects audio events in the given audio (legacy API).
public List<AudioEvent> DetectLegacy(Tensor<T> audio)
Parameters
audio (Tensor<T>): Audio waveform.
Returns
- List<AudioEvent>
List of detected audio events.
DetectLegacyAsync(Tensor<T>, CancellationToken)
Detects audio events asynchronously (legacy API).
public Task<List<AudioEvent>> DetectLegacyAsync(Tensor<T> audio, CancellationToken cancellationToken = default)
Parameters
audio (Tensor<T>)
cancellationToken (CancellationToken)
Returns
- Task<List<AudioEvent>>
DetectSpecific(Tensor<T>, IReadOnlyList<string>)
Detects specific events only.
public AudioEventResult<T> DetectSpecific(Tensor<T> audio, IReadOnlyList<string> eventTypes)
Parameters
audio (Tensor<T>)
eventTypes (IReadOnlyList<string>)
Returns
- AudioEventResult<T>
DetectSpecific(Tensor<T>, IReadOnlyList<string>, T)
Detects specific events only with custom threshold.
public AudioEventResult<T> DetectSpecific(Tensor<T> audio, IReadOnlyList<string> eventTypes, T threshold)
Parameters
audio (Tensor<T>)
eventTypes (IReadOnlyList<string>)
threshold (T)
Returns
- AudioEventResult<T>
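Example (a sketch restricting detection to a few event types; the labels and threshold are illustrative and must match entries in SupportedEvents):
var targets = new[] { "Speech", "Music", "Dog" };
var result = detector.DetectSpecific(audio, targets, 0.4f);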
DetectTopK(Tensor<T>, int)
Gets the top K events for a single frame (legacy API).
public List<(string Label, double Confidence)> DetectTopK(Tensor<T> audio, int topK = 5)
Parameters
audio (Tensor<T>)
topK (int)
Returns
- List<(string Label, double Confidence)>
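Example (listing the five most likely events in a frame):
foreach (var (label, confidence) in detector.DetectTopK(audio, topK: 5))
{
    Console.WriteLine($"{label}: {confidence:F3}");
}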
Dispose(bool)
Disposes managed resources.
protected override void Dispose(bool disposing)
Parameters
disposing (bool)
GetEventProbabilities(Tensor<T>)
Gets frame-level event probabilities.
public Tensor<T> GetEventProbabilities(Tensor<T> audio)
Parameters
audio (Tensor<T>)
Returns
- Tensor<T>
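Example (a minimal sketch that retrieves the raw frame-level probabilities for custom post-processing such as thresholding or smoothing):
var frameProbabilities = detector.GetEventProbabilities(audio);
// frameProbabilities holds one score per event class for each analysis frame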
GetModelMetadata()
Gets model metadata.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
InitializeLayers()
Initializes the neural network layers.
protected override void InitializeLayers()
PostprocessOutput(Tensor<T>)
Post-processes model output.
protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)
Parameters
modelOutput (Tensor<T>)
Returns
- Tensor<T>
Predict(Tensor<T>)
Predicts output for the given input.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>)
Returns
- Tensor<T>
PreprocessAudio(Tensor<T>)
Preprocesses audio for the model.
protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)
Parameters
rawAudio (Tensor<T>)
Returns
- Tensor<T>
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter)
StartStreamingSession()
Starts a streaming event detection session.
public IStreamingEventDetectionSession<T> StartStreamingSession()
Returns
- IStreamingEventDetectionSession<T>
StartStreamingSession(int, T)
Starts a streaming event detection session with custom settings.
public IStreamingEventDetectionSession<T> StartStreamingSession(int sampleRate, T threshold)
Parameters
sampleRate (int)
threshold (T)
Returns
- IStreamingEventDetectionSession<T>
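Example (a sketch of starting a streaming session with illustrative settings; audio chunks are then fed through the returned IStreamingEventDetectionSession<T>):
var session = detector.StartStreamingSession(sampleRate: 16000, threshold: 0.5f);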
Train(Tensor<T>, Tensor<T>)
Trains the model on a single example.
public override void Train(Tensor<T> input, Tensor<T> expected)
Parameters
input (Tensor<T>)
expected (Tensor<T>)
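Example (a sketch of a training loop; trainingPairs is a hypothetical collection of matching feature/label tensors prepared elsewhere):
foreach (var (features, labels) in trainingPairs)
{
    // one gradient step per (input, expected) pair
    detector.Train(features, labels);
}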
UpdateParameters(Vector<T>)
Updates network parameters from a flattened vector.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>)