Class AudioEventDetector<T>

Namespace
AiDotNet.Audio.Classification
Assembly
AiDotNet.dll

Audio event detection model for identifying sounds in audio (AudioSet-style).

public class AudioEventDetector<T> : AudioClassifierBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, IAudioEventDetector<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
object
AudioClassifierBase<T>
AudioEventDetector<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Remarks

Detects a wide range of audio events, such as speech, music, and environmental sounds. Based on the AudioSet ontology, with 527+ event classes organized hierarchically.

Architecture: This model extends AudioClassifierBase<T> and implements IAudioEventDetector<T> for multi-label event detection. Unlike single-label classification, event detection identifies multiple overlapping events with their temporal boundaries.

For Beginners: Audio event detection answers "What sounds are in this audio?":

  • Human sounds: speech, laughter, coughing, footsteps
  • Animal sounds: dog barking, bird singing, cat meowing
  • Music: instruments, genres, singing
  • Environmental: traffic, rain, wind, construction

Usage with ONNX:

var options = new AudioEventDetectorOptions { ModelPath = "audio-events.onnx" };
var detector = new AudioEventDetector<float>(options);
var result = detector.Detect(audio);
foreach (var evt in result.Events)
{
    Console.WriteLine($"{evt.EventType}: {evt.Confidence} at {evt.StartTime:F2}s");
}

Usage with training:

var architecture = new NeuralNetworkArchitecture<float>(inputFeatures: 64, outputSize: 50);
var detector = new AudioEventDetector<float>(architecture, new AudioEventDetectorOptions());
detector.Train(features, labels);

Constructors

AudioEventDetector(AudioEventDetectorOptions?)

Creates an AudioEventDetector with legacy options only (native mode).

public AudioEventDetector(AudioEventDetectorOptions? options = null)

Parameters

options AudioEventDetectorOptions

Detection options.

Remarks

This constructor creates a native mode detector. For ONNX inference mode, use CreateAsync(AudioEventDetectorOptions?, IProgress<double>?, CancellationToken) or the constructor that accepts a model path.

Example for ONNX mode:

var detector = await AudioEventDetector<float>.CreateAsync(options);

AudioEventDetector(NeuralNetworkArchitecture<T>, AudioEventDetectorOptions?, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?)

Creates an AudioEventDetector for native training mode.

public AudioEventDetector(NeuralNetworkArchitecture<T> architecture, AudioEventDetectorOptions? options = null, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null)

Parameters

architecture NeuralNetworkArchitecture<T>

The neural network architecture.

options AudioEventDetectorOptions

Detection options.

optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>

Optional custom optimizer (defaults to AdamW).

AudioEventDetector(NeuralNetworkArchitecture<T>, string, AudioEventDetectorOptions?)

Creates an AudioEventDetector for ONNX inference mode.

public AudioEventDetector(NeuralNetworkArchitecture<T> architecture, string modelPath, AudioEventDetectorOptions? options = null)

Parameters

architecture NeuralNetworkArchitecture<T>

Neural network architecture.

modelPath string

Path to ONNX model file.

options AudioEventDetectorOptions

Detection options.

Fields

CommonEventLabels

Common audio event categories from AudioSet.

public static readonly string[] CommonEventLabels

Field Value

string[]

Properties

EventLabels

Gets the event labels (alias for SupportedEvents for legacy API compatibility).

public IReadOnlyList<string> EventLabels { get; }

Property Value

IReadOnlyList<string>

SupportedEvents

Gets the list of event types this model can detect.

public IReadOnlyList<string> SupportedEvents { get; }

Property Value

IReadOnlyList<string>

TimeResolution

Gets the time resolution for event detection in seconds.

public double TimeResolution { get; }

Property Value

double
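As an illustrative sketch, TimeResolution can be used to map frame indices from the frame-level probability tensor back to timestamps. The tensor shape and indexing shown here are assumptions, not guaranteed by the API:

```csharp
// Illustrative sketch: map frame-level probabilities back to timestamps.
// Assumes an initialized detector and a loaded `audio` tensor; the
// [frames, numEvents] layout of the output is an assumption.
var probs = detector.GetEventProbabilities(audio);
for (int frame = 0; frame < probs.Shape[0]; frame++)
{
    double timeSec = frame * detector.TimeResolution;
    // inspect the probabilities for this frame at timeSec
}
```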

Methods

CreateAsync(AudioEventDetectorOptions?, IProgress<double>?, CancellationToken)

Creates an AudioEventDetector asynchronously with model download.

public static Task<AudioEventDetector<T>> CreateAsync(AudioEventDetectorOptions? options = null, IProgress<double>? progress = null, CancellationToken cancellationToken = default)

Parameters

options AudioEventDetectorOptions
progress IProgress<double>
cancellationToken CancellationToken

Returns

Task<AudioEventDetector<T>>
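A minimal usage sketch, assuming a float detector and default options; the progress callback receives a fraction of the model download completed:

```csharp
// Illustrative: create a detector asynchronously, reporting download progress.
var progress = new Progress<double>(p => Console.WriteLine($"Download: {p:P0}"));
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(5));
var detector = await AudioEventDetector<float>.CreateAsync(
    new AudioEventDetectorOptions(), progress, cts.Token);
```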

CreateNewInstance()

Creates a new instance for deserialization.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

Detect(Tensor<T>)

Detects audio events in the audio stream.

public AudioEventResult<T> Detect(Tensor<T> audio)

Parameters

audio Tensor<T>

Returns

AudioEventResult<T>

Detect(Tensor<T>, T)

Detects audio events with custom threshold.

public AudioEventResult<T> Detect(Tensor<T> audio, T threshold)

Parameters

audio Tensor<T>
threshold T

Returns

AudioEventResult<T>
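A sketch of how the threshold overload might be used, assuming a float detector; lowering the threshold trades precision for recall:

```csharp
// Illustrative: a lower threshold surfaces more (less confident) events.
var strict = detector.Detect(audio, 0.7f);   // high-precision pass
var loose  = detector.Detect(audio, 0.2f);   // high-recall pass
Console.WriteLine($"{strict.Events.Count} strict vs {loose.Events.Count} loose");
```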

DetectAsync(Tensor<T>, CancellationToken)

Detects audio events asynchronously.

public Task<AudioEventResult<T>> DetectAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>
cancellationToken CancellationToken

Returns

Task<AudioEventResult<T>>

DetectFrame(Tensor<T>)

Detects a single frame (no windowing) - legacy API.

public Dictionary<string, double> DetectFrame(Tensor<T> audio)

Parameters

audio Tensor<T>

Returns

Dictionary<string, double>
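A sketch of the legacy single-frame API, assuming a loaded `audio` tensor; the 0.5 cutoff is illustrative and requires System.Linq:

```csharp
// Illustrative: per-class probabilities for one frame (legacy API).
Dictionary<string, double> frameProbs = detector.DetectFrame(audio);
foreach (var kvp in frameProbs.Where(kvp => kvp.Value > 0.5))
{
    Console.WriteLine($"{kvp.Key}: {kvp.Value:F2}");
}
```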

DetectLegacy(Tensor<T>)

Detects audio events in the given audio (legacy API).

public List<AudioEvent> DetectLegacy(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio waveform.

Returns

List<AudioEvent>

List of detected audio events.

DetectLegacyAsync(Tensor<T>, CancellationToken)

Detects audio events asynchronously (legacy API).

public Task<List<AudioEvent>> DetectLegacyAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>
cancellationToken CancellationToken

Returns

Task<List<AudioEvent>>

DetectSpecific(Tensor<T>, IReadOnlyList<string>)

Detects specific events only.

public AudioEventResult<T> DetectSpecific(Tensor<T> audio, IReadOnlyList<string> eventTypes)

Parameters

audio Tensor<T>
eventTypes IReadOnlyList<string>

Returns

AudioEventResult<T>
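An illustrative sketch of restricting detection to a few event types. The label strings and the `EndTime` property shown here are assumptions; the actual label set is exposed via SupportedEvents:

```csharp
// Illustrative: restrict detection to a few event types of interest.
// Label names are assumed — check detector.SupportedEvents for real ones.
var eventTypes = new[] { "Speech", "Dog", "Siren" };
var result = detector.DetectSpecific(audio, eventTypes);
foreach (var evt in result.Events)
{
    Console.WriteLine($"{evt.EventType} at {evt.StartTime:F2}s");
}
```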

DetectSpecific(Tensor<T>, IReadOnlyList<string>, T)

Detects specific events only with custom threshold.

public AudioEventResult<T> DetectSpecific(Tensor<T> audio, IReadOnlyList<string> eventTypes, T threshold)

Parameters

audio Tensor<T>
eventTypes IReadOnlyList<string>
threshold T

Returns

AudioEventResult<T>

DetectTopK(Tensor<T>, int)

Gets the top K events for a single frame (legacy API).

public List<(string Label, double Confidence)> DetectTopK(Tensor<T> audio, int topK = 5)

Parameters

audio Tensor<T>
topK int

Returns

List<(string Label, double Confidence)>
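A minimal sketch of the top-K legacy API, useful for a quick "what is this clip?" summary:

```csharp
// Illustrative: summarize a clip by its three most likely event labels.
foreach (var (label, confidence) in detector.DetectTopK(audio, topK: 3))
{
    Console.WriteLine($"{label}: {confidence:F3}");
}
```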

Dispose(bool)

Disposes managed resources.

protected override void Dispose(bool disposing)

Parameters

disposing bool

GetEventProbabilities(Tensor<T>)

Gets frame-level event probabilities.

public Tensor<T> GetEventProbabilities(Tensor<T> audio)

Parameters

audio Tensor<T>

Returns

Tensor<T>

GetModelMetadata()

Gets model metadata.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

InitializeLayers()

Initializes the neural network layers.

protected override void InitializeLayers()

PostprocessOutput(Tensor<T>)

Post-processes model output.

protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)

Parameters

modelOutput Tensor<T>

Returns

Tensor<T>

Predict(Tensor<T>)

Predicts output for the given input.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

Returns

Tensor<T>

PreprocessAudio(Tensor<T>)

Preprocesses audio for the model.

protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)

Parameters

rawAudio Tensor<T>

Returns

Tensor<T>

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

StartStreamingSession()

Starts a streaming event detection session.

public IStreamingEventDetectionSession<T> StartStreamingSession()

Returns

IStreamingEventDetectionSession<T>

StartStreamingSession(int, T)

Starts a streaming event detection session with custom settings.

public IStreamingEventDetectionSession<T> StartStreamingSession(int sampleRate, T threshold)

Parameters

sampleRate int
threshold T

Returns

IStreamingEventDetectionSession<T>
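A streaming sketch under stated assumptions: the session member names (`ProcessChunk`) and disposability shown here are hypothetical, since the IStreamingEventDetectionSession<T> surface is not documented on this page:

```csharp
// Illustrative streaming sketch; ProcessChunk and IDisposable support
// are assumptions about IStreamingEventDetectionSession<T>.
using var session = detector.StartStreamingSession(sampleRate: 16000, threshold: 0.5f);
foreach (var chunk in microphoneChunks)   // your chunked audio source
{
    var events = session.ProcessChunk(chunk);
    // react to events as they arrive
}
```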

Train(Tensor<T>, Tensor<T>)

Trains the model on a single example.

public override void Train(Tensor<T> input, Tensor<T> expected)

Parameters

input Tensor<T>
expected Tensor<T>
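Because event detection is multi-label (see Remarks), a training target is a multi-hot vector rather than a single class index. A sketch, assuming the Tensor<T> constructor and indexer shown and a detector built with the training constructor:

```csharp
// Illustrative: multi-hot target — several events can be active at once.
// Tensor<float> construction/indexing shown here is an assumption.
var labels = new Tensor<float>(new[] { 1, 50 });  // 50 event classes
labels[0, 3] = 1f;    // e.g. class 3 ("Speech") present
labels[0, 17] = 1f;   // e.g. class 17 ("Music") also present
detector.Train(features, labels);
```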

UpdateParameters(Vector<T>)

Updates network parameters from a flattened vector.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>