Interface ISoundLocalizer<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Interface for sound localization models that estimate the spatial position of sound sources.

public interface ISoundLocalizer<T>

Type Parameters

T

The numeric type used for calculations.

Remarks

Sound localization estimates where sound is coming from in 3D space. This requires multi-channel audio (stereo or more) and uses differences in timing, loudness, and spectral content between channels to determine direction.

For Beginners: Sound localization is like closing your eyes and pointing to where a sound is coming from.

How it works (like human hearing):

  1. Sound reaches one ear slightly before the other (ITD - Interaural Time Difference)
  2. Sound is slightly louder in the closer ear (ILD - Interaural Level Difference)
  3. Head shape affects high frequencies differently for each ear
  4. Brain combines all cues to determine direction

What's measured:

  • Azimuth: Left-right angle (0° = front, 90° = right, -90° = left)
  • Elevation: Up-down angle (0° = level, 90° = above)
  • Distance: How far away (harder to estimate from audio alone)

Use cases:

  • Spatial audio for VR/AR (place sounds correctly in 3D)
  • Smart speakers (know which direction user is speaking from)
  • Security (detect where intruder sounds come from)
  • Robotics (navigate toward or away from sounds)
  • Audio surveillance (track moving sound sources)
  • Hearing aids (enhance sounds from specific directions)
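
The angle convention above maps to a 3D direction. The sketch below is illustrative only: it assumes a frame with x = right, y = up, z = front, which matches the description of azimuth and elevation in this section but is not taken from the library itself.

// Convert an (azimuth, elevation) pair in degrees to a unit direction vector,
// assuming x = right, y = up, z = front and the conventions listed above.
static (double X, double Y, double Z) DirectionToVector(double azimuthDeg, double elevationDeg)
{
    double az = azimuthDeg * Math.PI / 180.0;
    double el = elevationDeg * Math.PI / 180.0;

    double x = Math.Cos(el) * Math.Sin(az); // 90° azimuth  -> x = 1 (right)
    double y = Math.Sin(el);                // 90° elevation -> y = 1 (above)
    double z = Math.Cos(el) * Math.Cos(az); // 0°/0°         -> z = 1 (straight ahead)
    return (x, y, z);
}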

Properties

ArrayConfig

Gets the microphone array geometry, if applicable; may be null for models that do not use a fixed microphone array.

MicrophoneArrayConfig<T>? ArrayConfig { get; }

Property Value

MicrophoneArrayConfig<T>

RequiredChannels

Gets the number of audio channels required.

int RequiredChannels { get; }

Property Value

int

Remarks

Minimum 2 for stereo. More channels (e.g., 4+ for arrays) enable better accuracy.
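
Examples

A minimal sketch of checking this requirement before localizing. It assumes the audio tensor is laid out [channels, samples] (as documented for Localize) and that Tensor<T> exposes a Shape indexer; adjust to the actual tensor API.

// Reject audio that has fewer channels than the model requires.
LocalizationResult<double> LocalizeChecked(ISoundLocalizer<double> localizer, Tensor<double> audio)
{
    int channels = audio.Shape[0]; // assumed accessor; layout [channels, samples]
    if (channels < localizer.RequiredChannels)
    {
        throw new ArgumentException(
            $"Model requires {localizer.RequiredChannels} channels but audio has {channels}.");
    }
    return localizer.Localize(audio);
}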

SampleRate

Gets the expected sample rate for input audio.

int SampleRate { get; }

Property Value

int

SupportsDistanceEstimation

Gets whether this model can estimate distance (not just direction).

bool SupportsDistanceEstimation { get; }

Property Value

bool

SupportsMultipleSourceTracking

Gets whether this model can track multiple simultaneous sources.

bool SupportsMultipleSourceTracking { get; }

Property Value

bool

Methods

Beamform(Tensor<T>, double, double)

Beamforms audio to focus on a specific direction.

Tensor<T> Beamform(Tensor<T> audio, double targetAzimuth, double targetElevation = 0)

Parameters

audio Tensor<T>

Multi-channel audio tensor.

targetAzimuth double

Target azimuth angle in degrees.

targetElevation double

Target elevation angle in degrees.

Returns

Tensor<T>

Beamformed single-channel audio focused on target direction.

Remarks

For Beginners: Beamforming is like a "zoom" for audio - it enhances sounds from one direction while reducing sounds from other directions.
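
Examples

A minimal sketch of steering toward a fixed direction, assuming localizer and a suitable multi-channel audio tensor are already in scope.

// Enhance sound arriving from 30° to the right at ear level;
// sounds from other directions are attenuated in the returned channel.
Tensor<double> focused = localizer.Beamform(audio, targetAzimuth: 30, targetElevation: 0);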

ComputeSpatialSpectrum(Tensor<T>, double)

Computes spatial power spectrum for visualization.

Tensor<T> ComputeSpatialSpectrum(Tensor<T> audio, double azimuthResolution = 5)

Parameters

audio Tensor<T>

Multi-channel audio tensor.

azimuthResolution double

Resolution in degrees for azimuth.

Returns

Tensor<T>

Power values for each direction [num_directions].

Remarks

For Beginners: This creates a "heat map" of where sounds are coming from. Peaks in the spectrum indicate sound source directions.
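
Examples

A sketch of scanning the spectrum for its peak, assuming localizer and audio are in scope. The mapping from bin index to azimuth (start angle and ordering), element access via [i], and a Length property on Tensor<T> are assumptions; check the implementation you use.

double resolution = 5;
Tensor<double> spectrum = localizer.ComputeSpatialSpectrum(audio, azimuthResolution: resolution);

// Find the bin with the highest power; its angle is the dominant source direction.
int peak = 0;
for (int i = 1; i < spectrum.Length; i++)
{
    if (spectrum[i] > spectrum[peak]) peak = i;
}
double peakAzimuth = -180 + peak * resolution; // assumes bins cover -180°..180° in order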

EstimateDirections(Tensor<T>, int)

Estimates direction of arrival (DOA) for dominant sources.

IReadOnlyList<DirectionEstimate<T>> EstimateDirections(Tensor<T> audio, int maxSources = 3)

Parameters

audio Tensor<T>

Multi-channel audio tensor.

maxSources int

Maximum number of sources to detect.

Returns

IReadOnlyList<DirectionEstimate<T>>

List of direction estimates.
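
Examples

A minimal sketch of the call, assuming localizer and audio are in scope. The fields of DirectionEstimate<T> are not listed on this page, so only the number of detected sources is used.

IReadOnlyList<DirectionEstimate<double>> sources = localizer.EstimateDirections(audio, maxSources: 3);
Console.WriteLine($"Detected {sources.Count} source(s).");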

Localize(Tensor<T>)

Localizes sound sources in multi-channel audio.

LocalizationResult<T> Localize(Tensor<T> audio)

Parameters

audio Tensor<T>

Multi-channel audio tensor [channels, samples].

Returns

LocalizationResult<T>

Localization result with source positions.

Remarks

For Beginners: This is the main method for finding where sounds are.

  • Pass in stereo (or multi-channel) audio
  • Get back the direction(s) sounds are coming from
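
Examples

A minimal sketch of the basic flow, assuming localizer is in scope. LoadMultiChannelAudio is a hypothetical helper that returns a [channels, samples] tensor recorded at the model's sample rate; the members of LocalizationResult<T> are not documented here and are not accessed.

Tensor<double> audio = LoadMultiChannelAudio("recording.wav", localizer.SampleRate); // hypothetical helper
LocalizationResult<double> result = localizer.Localize(audio);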

LocalizeAsync(Tensor<T>, CancellationToken)

Localizes sound sources asynchronously.

Task<LocalizationResult<T>> LocalizeAsync(Tensor<T> audio, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>

Multi-channel audio tensor.

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task<LocalizationResult<T>>

Localization result.
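
Examples

A sketch of cancelling a long-running localization after two seconds, written for use inside an async method with localizer and audio in scope. Only standard Task and CancellationToken patterns are used beyond the documented signature.

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(2));
try
{
    LocalizationResult<double> result = await localizer.LocalizeAsync(audio, cts.Token);
}
catch (OperationCanceledException)
{
    // Localization did not finish before the timeout.
}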

TrackSources(Tensor<T>, double)

Tracks sound source positions over time.

SoundTrackingResult<T> TrackSources(Tensor<T> audio, double windowDuration = 0.1)

Parameters

audio Tensor<T>

Multi-channel audio tensor.

windowDuration double

Duration of each analysis window in seconds.

Returns

SoundTrackingResult<T>

Tracking result showing source positions over time.

Remarks

For Beginners: For moving sound sources (like a person walking while talking), this tracks how the position changes over time.
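
Examples

A sketch of tracking with the default 100 ms windows, assuming localizer and audio are in scope. Each window spans roughly SampleRate × windowDuration samples, so a 10-second clip yields about 100 position estimates. SoundTrackingResult<T>'s members are not documented on this page, so the result is only captured here.

double windowDuration = 0.1;
int samplesPerWindow = (int)(localizer.SampleRate * windowDuration); // e.g. 1600 samples at 16 kHz
Console.WriteLine($"Each analysis window covers {samplesPerWindow} samples.");
SoundTrackingResult<double> tracking = localizer.TrackSources(audio, windowDuration);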