Interface ISoundLocalizer<T>

Namespace: AiDotNet.Interfaces

Assembly: AiDotNet.dll

Interface for sound localization models that estimate the spatial position of sound sources.

public interface ISoundLocalizer<T>

Type Parameters

T: The numeric type used for calculations.

Remarks

Sound localization estimates where sound is coming from in 3D space. This requires multi-channel audio (stereo or more) and uses differences in timing, loudness, and spectral content between channels to determine direction.

For Beginners: Sound localization is like closing your eyes and pointing to where a sound is coming from.

How it works (like human hearing):

Sound reaches one ear slightly before the other (ITD, interaural time difference).
Sound is slightly louder in the ear closer to the source (ILD, interaural level difference).
The shape of the head and outer ears filters high frequencies differently for each ear (spectral cues).
The brain combines all of these cues to determine the direction.
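
As a rough sketch of the timing cue, the C# below estimates the ITD between two channels with plain cross-correlation and converts it to an azimuth using the far-field relation sin(azimuth) = speedOfSound × delay / micSpacing. It assumes two free-field microphones (no head shadowing), plain double[] arrays instead of Tensor<T>, and a speed of sound of 343 m/s. The class and method names are hypothetical, and this is not a description of how any ISoundLocalizer<T> implementation works; more robust estimators such as GCC-PHAT whiten the spectrum before correlating.

```csharp
using System;

// Hypothetical helper: two-microphone time-difference-of-arrival sketch.
public static class TdoaSketch
{
    // Returns the lag (in samples) by which `left` trails `right`, found by
    // maximizing the cross-correlation over lags in [-maxLag, maxLag].
    public static int EstimateLagSamples(double[] left, double[] right, int maxLag)
    {
        int bestLag = 0;
        double bestScore = double.NegativeInfinity;
        for (int lag = -maxLag; lag <= maxLag; lag++)
        {
            double score = 0;
            for (int n = 0; n < right.Length; n++)
            {
                int m = n + lag; // compare left[n + lag] with right[n]
                if (m >= 0 && m < left.Length)
                    score += left[m] * right[n];
            }
            if (score > bestScore)
            {
                bestScore = score;
                bestLag = lag;
            }
        }
        return bestLag;
    }

    // Far-field model: delay = spacing * sin(azimuth) / speedOfSound.
    // A positive lag (left channel hears the sound later) maps to a positive,
    // rightward azimuth, matching the 90° = right convention.
    public static double LagToAzimuthDegrees(int lagSamples, int sampleRate,
        double micSpacingMeters, double speedOfSound = 343.0)
    {
        double delaySeconds = (double)lagSamples / sampleRate;
        double s = speedOfSound * delaySeconds / micSpacingMeters;
        // Keep asin's argument in range: noise can produce lags beyond the physical maximum.
        s = Math.Clamp(s, -1.0, 1.0);
        return Math.Asin(s) * 180.0 / Math.PI;
    }
}
```

A sensible maxLag is about micSpacing / speedOfSound × sampleRate, the largest delay the geometry allows. Because the delay is measured in whole samples, angular resolution depends on the sample rate and spacing: at 48 kHz with 0.2 m spacing, a 5-sample lag corresponds to roughly 10.3°.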

What's measured:

Azimuth: Left-right angle (0° = front, 90° = right, -90° = left)
Elevation: Up-down angle (0° = level, 90° = above)
Distance: How far away the source is (harder to estimate from audio alone)
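
The three quantities above form spherical coordinates. The hypothetical helper below converts them to Cartesian coordinates using the azimuth and elevation conventions listed; the axis assignment (+X right, +Y front, +Z up) is an assumption made for this sketch, not something this interface specifies. When a model cannot estimate distance, passing a distance of 1 yields a unit direction vector.

```csharp
using System;

// Hypothetical helper for turning a direction (and optional distance) into a 3D point.
public static class DirectionMath
{
    // Azimuth: 0° = front, +90° = right, -90° = left.
    // Elevation: 0° = level, +90° = above.
    // Assumed axes for this sketch: +X = right, +Y = front, +Z = up.
    public static (double X, double Y, double Z) ToCartesian(
        double azimuthDeg, double elevationDeg, double distance)
    {
        double az = azimuthDeg * Math.PI / 180.0;
        double el = elevationDeg * Math.PI / 180.0;
        double horizontal = distance * Math.Cos(el); // projection onto the level plane
        return (horizontal * Math.Sin(az),   // right component
                horizontal * Math.Cos(az),   // front component
                distance * Math.Sin(el));    // up component
    }
}
```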

Use cases:

Spatial audio for VR/AR (place sounds correctly in 3D)
Smart speakers (know which direction user is speaking from)
Security (detect where intruder sounds come from)
Robotics (navigate toward or away from sounds)
Audio surveillance (track moving sound sources)
Hearing aids (enhance sounds from specific directions)

Properties

ArrayConfig

Gets the microphone array geometry, or null if the model does not rely on a known array layout.

MicrophoneArrayConfig<T>? ArrayConfig { get; }

Property Value

MicrophoneArrayConfig<T>

RequiredChannels

Gets the number of audio channels required.

int RequiredChannels { get; }

Property Value

int

Remarks

The minimum is 2 (stereo). More channels, such as arrays of 4 or more microphones, generally enable better accuracy.

SampleRate

Gets the expected sample rate of the input audio, in Hz.

int SampleRate { get; }

Property Value

int

SupportsDistanceEstimation

Gets whether this model can estimate distance (not just direction).

bool SupportsDistanceEstimation { get; }

Property Value

bool

SupportsMultipleSourceTracking

Gets whether this model can track multiple simultaneous sources.

bool SupportsMultipleSourceTracking { get; }

Property Value

bool

Methods

Beamform(Tensor<T>, double, double)

Beamforms audio to focus on a specific direction.

Tensor<T> Beamform(Tensor<T> audio, double targetAzimuth, double targetElevation = 0)

Parameters

audio Tensor<T>: Multi-channel audio tensor.
targetAzimuth double: Target azimuth angle in degrees.
targetElevation double: Target elevation angle in degrees (default 0, level).

Returns

Tensor<T>: Beamformed single-channel audio focused on the target direction.

Remarks

For Beginners: Beamforming is like a "zoom" for audio: it enhances sounds from one direction while reducing sounds from other directions.
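
One classic way to implement this idea is delay-and-sum beamforming, sketched below for a linear microphone array. It assumes a far-field source, microphone positions given along a single axis (+X = right) in meters, integer-sample delays, a speed of sound of 343 m/s, and a double[,] buffer laid out as [channel, sample] in place of Tensor<T>. The names are hypothetical, and an ISoundLocalizer<T> implementation may use a different beamformer entirely.

```csharp
using System;

// Hypothetical delay-and-sum beamformer for a linear array (sketch only).
public static class DelayAndSum
{
    // audio[channel, sample]; micX[channel] = microphone position along the
    // array axis in meters (+X = right). Returns one steered output channel.
    public static double[] Beamform(double[,] audio, double[] micX, int sampleRate,
        double azimuthDeg, double elevationDeg = 0, double speedOfSound = 343.0)
    {
        int channels = audio.GetLength(0);
        int samples = audio.GetLength(1);
        double az = azimuthDeg * Math.PI / 180.0;
        double el = elevationDeg * Math.PI / 180.0;
        var output = new double[samples];

        for (int ch = 0; ch < channels; ch++)
        {
            // Arrival-time offset of this mic relative to the array origin:
            // mics farther toward the source hear it earlier (negative tau).
            double tau = -micX[ch] * Math.Cos(el) * Math.Sin(az) / speedOfSound;
            int shift = (int)Math.Round(tau * sampleRate);
            for (int n = 0; n < samples; n++)
            {
                int m = n + shift; // undo this mic's delay so all channels line up
                if (m >= 0 && m < samples)
                    output[n] += audio[ch, m];
            }
        }
        for (int n = 0; n < samples; n++)
            output[n] /= channels; // average so a perfectly aligned source keeps its amplitude
        return output;
    }
}
```

Rounding delays to whole samples keeps the sketch short; practical beamformers usually apply fractional delays, for example in the frequency domain. A linear array also cannot distinguish a source in front from its mirror image behind the array.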

ComputeSpatialSpectrum(Tensor<T>, double)

Computes the spatial power spectrum (power as a function of direction), for example for visualization.

Tensor<T> ComputeSpatialSpectrum(Tensor<T> audio, double azimuthResolution = 5)

Parameters

audio Tensor<T>: Multi-channel audio tensor.
azimuthResolution double: Azimuth step between scanned directions, in degrees (default 5).

Returns

Tensor<T>: Power value for each scanned direction [num_directions].

Remarks

For Beginners: This creates a "heat map" of where sounds are coming from. Peaks in the spectrum indicate sound source directions.
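
A common way to build such a spectrum is steered response power: steer a delay-and-sum beam toward each candidate azimuth and record the output power. The sketch below does this for a linear array, scanning -90° to +90° in azimuthResolution steps because a linear array cannot resolve front from back. The scan range, the far-field integer-sample delays, the double[,] [channel, sample] input, and all names are assumptions for illustration, not the documented behavior of ComputeSpatialSpectrum.

```csharp
using System;

// Hypothetical steered-response-power spectrum for a linear array (sketch only).
public static class SpatialSpectrumSketch
{
    // For each candidate azimuth, align the channels with integer-sample
    // far-field delays, average them, and measure the mean output power.
    public static double[] Compute(double[,] audio, double[] micX, int sampleRate,
        double azimuthResolution = 5, double speedOfSound = 343.0)
    {
        int channels = audio.GetLength(0);
        int samples = audio.GetLength(1);
        int numDirections = (int)Math.Floor(180.0 / azimuthResolution) + 1;
        var power = new double[numDirections];

        for (int d = 0; d < numDirections; d++)
        {
            double az = IndexToAzimuth(d, azimuthResolution) * Math.PI / 180.0;
            var shifts = new int[channels];
            for (int ch = 0; ch < channels; ch++)
                shifts[ch] = (int)Math.Round(-micX[ch] * Math.Sin(az) / speedOfSound * sampleRate);

            double energy = 0;
            for (int n = 0; n < samples; n++)
            {
                double sum = 0;
                for (int ch = 0; ch < channels; ch++)
                {
                    int m = n + shifts[ch];
                    if (m >= 0 && m < samples) sum += audio[ch, m];
                }
                double y = sum / channels;
                energy += y * y;
            }
            power[d] = energy / samples; // mean power of the steered output
        }
        return power;
    }

    // Azimuth in degrees for spectrum index d (index 0 = -90°).
    public static double IndexToAzimuth(int d, double azimuthResolution = 5) =>
        -90.0 + d * azimuthResolution;
}
```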

EstimateDirections(Tensor<T>, int)

Estimates the direction of arrival (DOA) of the dominant sources.

IReadOnlyList<DirectionEstimate<T>> EstimateDirections(Tensor<T> audio, int maxSources = 3)

Parameters

audio Tensor<T>: Multi-channel audio tensor.
maxSources int: Maximum number of sources to detect.

Returns

IReadOnlyList<DirectionEstimate<T>>: List of direction estimates for the detected sources (at most maxSources).
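
A spatial power spectrum can be turned into a short list of directions by peak picking. The hypothetical helper below keeps local maxima, sorts them by power, and returns at most maxSources of them. Whether EstimateDirections works this way is not documented here, and practical implementations often add a power threshold or a minimum angular separation between peaks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical peak picker over a sampled spatial spectrum (sketch only).
public static class PeakPicking
{
    // power[i] is the spectrum value at azimuth startAzimuth + i * resolution.
    // Returns up to maxSources local maxima, strongest first.
    public static List<(double AzimuthDeg, double Power)> PickPeaks(
        double[] power, double startAzimuth, double resolution, int maxSources = 3)
    {
        var peaks = new List<(double AzimuthDeg, double Power)>();
        for (int i = 0; i < power.Length; i++)
        {
            double left = i > 0 ? power[i - 1] : double.NegativeInfinity;
            double right = i < power.Length - 1 ? power[i + 1] : double.NegativeInfinity;
            // Local maximum: strictly above the left neighbor and at least the right one,
            // so a flat plateau contributes only its first sample.
            if (power[i] > left && power[i] >= right)
                peaks.Add((startAzimuth + i * resolution, power[i]));
        }
        return peaks.OrderByDescending(p => p.Power).Take(maxSources).ToList();
    }
}
```

For a spectrum sampled from -90° in 5° steps, the call would be PickPeaks(power, -90, 5, maxSources: 3).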

Localize(Tensor<T>)

Localizes sound sources in multi-channel audio.

LocalizationResult<T> Localize(Tensor<T> audio)

Parameters

audio Tensor<T>: Multi-channel audio tensor [channels, samples].

Returns

LocalizationResult<T>: Localization result with source positions.

Remarks

For Beginners: This is the main method for finding where sounds are:

Pass in stereo (or multi-channel) audio.
Get back the direction(s) the sounds are coming from.
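
Audio decoded from files or capture devices is often interleaved (L0 R0 L1 R1 ... for stereo), while Localize expects a [channels, samples] layout. The hypothetical helper below performs that reordering into a float[,] array; converting the result into a Tensor<T> is not shown because the tensor constructor is not documented on this page. Before calling Localize, it is also worth checking the channel count against RequiredChannels and the audio's sample rate against SampleRate.

```csharp
using System;

// Hypothetical layout helper (sketch only).
public static class AudioLayout
{
    // Converts interleaved PCM samples into a channel-major [channels, samples] array.
    public static float[,] Deinterleave(float[] interleaved, int channels)
    {
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        if (interleaved.Length % channels != 0)
            throw new ArgumentException("Sample count is not a multiple of the channel count.", nameof(interleaved));

        int samplesPerChannel = interleaved.Length / channels;
        var result = new float[channels, samplesPerChannel];
        for (int n = 0; n < samplesPerChannel; n++)
            for (int ch = 0; ch < channels; ch++)
                result[ch, n] = interleaved[n * channels + ch]; // frame n, channel ch
        return result;
    }
}
```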

LocalizeAsync(Tensor<T>, CancellationToken)

Localizes sound sources asynchronously.

Task<LocalizationResult<T>> LocalizeAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>: Multi-channel audio tensor.
cancellationToken CancellationToken: Token used to cancel the asynchronous operation.

Returns

Task<LocalizationResult<T>>: A task that completes with the localization result.
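
A common pattern with the cancellationToken parameter is to bound how long localization may run. The generic helper below (a hypothetical name, not part of AiDotNet) links an optional caller token with a timeout and reports whether the operation finished in time; it relies only on standard .NET types.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical timeout wrapper for any token-aware async call (sketch only).
public static class TimeoutHelper
{
    // Returns (true, result) if the operation completes within the timeout,
    // or (false, default) if the timeout cancels it. Cancellation requested
    // through outerToken is not swallowed; it propagates to the caller.
    public static async Task<(bool Completed, TResult? Result)> RunWithTimeoutAsync<TResult>(
        Func<CancellationToken, Task<TResult>> operation,
        TimeSpan timeout,
        CancellationToken outerToken = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(outerToken);
        cts.CancelAfter(timeout);
        try
        {
            TResult result = await operation(cts.Token).ConfigureAwait(false);
            return (true, result);
        }
        catch (OperationCanceledException) when (!outerToken.IsCancellationRequested)
        {
            return (false, default); // our timeout fired
        }
    }
}
```

With a localizer instance and an audio tensor, a call might look like var (completed, result) = await TimeoutHelper.RunWithTimeoutAsync(ct => localizer.LocalizeAsync(audio, ct), TimeSpan.FromSeconds(2)); this only helps if the implementation actually observes the token.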

TrackSources(Tensor<T>, double)

Tracks sound source positions over time.

SoundTrackingResult<T> TrackSources(Tensor<T> audio, double windowDuration = 0.1)

Parameters

audio Tensor<T>: Multi-channel audio tensor.
windowDuration double: Duration of each analysis window in seconds (default 0.1).

Returns

SoundTrackingResult<T>: Tracking result showing source positions over time.

Remarks

For Beginners: For moving sound sources (like a person walking while talking), this tracks how the position changes over time.
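
Conceptually, tracking over time starts by cutting the recording into analysis windows of windowDuration seconds and localizing each one. The sketch below shows only that windowing step, with timestamps, for a double[,] [channel, sample] buffer. Non-overlapping windows and dropping a trailing partial window are assumptions; TrackSources may instead use overlapping windows or smooth estimates across windows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical windowing helper for frame-by-frame localization (sketch only).
public static class Framing
{
    // Splits [channels, samples] audio into consecutive, non-overlapping windows
    // and returns each window with its start time in seconds.
    public static List<(double StartSeconds, double[,] Window)> SplitIntoWindows(
        double[,] audio, int sampleRate, double windowDuration = 0.1)
    {
        int channels = audio.GetLength(0);
        int samples = audio.GetLength(1);
        int windowSamples = (int)Math.Round(windowDuration * sampleRate);
        if (windowSamples <= 0) throw new ArgumentOutOfRangeException(nameof(windowDuration));

        var windows = new List<(double StartSeconds, double[,] Window)>();
        // Only full windows are produced; leftover samples at the end are dropped.
        for (int start = 0; start + windowSamples <= samples; start += windowSamples)
        {
            var window = new double[channels, windowSamples];
            for (int ch = 0; ch < channels; ch++)
                for (int n = 0; n < windowSamples; n++)
                    window[ch, n] = audio[ch, start + n];
            windows.Add(((double)start / sampleRate, window));
        }
        return windows;
    }
}
```

At a 16 kHz sample rate, the default 0.1 s window corresponds to 1,600 samples per window.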
