Interface ISoundLocalizer<T>
- Namespace: AiDotNet.Interfaces
- Assembly: AiDotNet.dll
Interface for sound localization models that estimate the spatial position of sound sources.
public interface ISoundLocalizer<T>
Type Parameters
- T: The numeric type used for calculations.
Remarks
Sound localization estimates where sound is coming from in 3D space. This requires multi-channel audio (stereo or more) and uses differences in timing, loudness, and spectral content between channels to determine direction.
For Beginners: Sound localization is like closing your eyes and pointing to where a sound is coming from.
How it works (like human hearing):
- Sound reaches one ear slightly before the other (ITD - Interaural Time Difference)
- Sound is slightly louder in the closer ear (ILD - Interaural Level Difference)
- Head shape affects high frequencies differently at each ear (spectral cues)
- The brain combines all of these cues to determine direction
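The timing cue above can be sketched in a few lines. This is a minimal illustration, not how any AiDotNet model is implemented: it estimates the ITD by brute-force cross-correlation of two channels and converts it to an azimuth with a far-field model, sin(azimuth) = c · ITD / d. The class `ItdSketch`, its method names, the plain `double[]` inputs, the microphone spacing parameter, and the 343 m/s speed of sound are all assumptions made for this example.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class ItdSketch
{
    // Returns the lag (in samples) by which `left` is delayed relative to `right`,
    // found by maximizing the cross-correlation sum over lags in [-maxLag, maxLag].
    // A positive result means the sound reached the right channel first.
    public static int EstimateLagSamples(double[] left, double[] right, int maxLag)
    {
        int bestLag = 0;
        double bestScore = double.NegativeInfinity;
        for (int lag = -maxLag; lag <= maxLag; lag++)
        {
            double score = 0.0;
            for (int n = 0; n < left.Length; n++)
            {
                int m = n - lag; // index into the right channel
                if (m >= 0 && m < right.Length)
                    score += left[n] * right[m];
            }
            if (score > bestScore)
            {
                bestScore = score;
                bestLag = lag;
            }
        }
        return bestLag;
    }

    // Far-field model (assumed): sin(azimuth) = speedOfSound * itd / micSpacing.
    // Positive azimuth = right, matching the convention in this page.
    public static double LagToAzimuthDegrees(
        int lagSamples, int sampleRate, double micSpacingMeters, double speedOfSound = 343.0)
    {
        double itdSeconds = (double)lagSamples / sampleRate;
        double s = speedOfSound * itdSeconds / micSpacingMeters;
        s = Math.Max(-1.0, Math.Min(1.0, s)); // clamp so Asin stays defined
        return Math.Asin(s) * 180.0 / Math.PI;
    }
}
```

With two microphones 0.2 m apart at 48 kHz, a 3-sample lag corresponds to an azimuth of roughly 6° to the right under this model. Real localizers typically combine this cue with level and spectral differences rather than relying on timing alone.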
What's measured:
- Azimuth: Left-right angle (0° = front, 90° = right, -90° = left)
- Elevation: Up-down angle (0° = level, 90° = above)
- Distance: How far away (harder to estimate from audio alone)
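To place an estimated source in 3D (for example, for spatial audio rendering), the three quantities above can be converted to Cartesian coordinates. The sketch below assumes a coordinate frame that this page does not specify: x points right, y points to the front, z points up. `DirectionSketch` and `ToCartesian` are hypothetical names, and the distance defaults to 1 to cover models that estimate direction only.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class DirectionSketch
{
    // Converts azimuth/elevation (degrees) and distance to Cartesian coordinates.
    // Assumed frame: X = right, Y = front, Z = up, so that
    // azimuth 0 = front, azimuth 90 = right, elevation 90 = above.
    public static (double X, double Y, double Z) ToCartesian(
        double azimuthDeg, double elevationDeg, double distance = 1.0)
    {
        double az = azimuthDeg * Math.PI / 180.0;
        double el = elevationDeg * Math.PI / 180.0;
        double x = distance * Math.Cos(el) * Math.Sin(az);
        double y = distance * Math.Cos(el) * Math.Cos(az);
        double z = distance * Math.Sin(el);
        return (x, y, z);
    }
}
```

Under these assumptions, a source at azimuth 90°, elevation 0°, distance 2 lands at approximately (2, 0, 0), directly to the right.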
Use cases:
- Spatial audio for VR/AR (place sounds correctly in 3D)
- Smart speakers (know which direction the user is speaking from)
- Security (detect where an intruder's sounds come from)
- Robotics (navigate toward or away from sounds)
- Audio surveillance (track moving sound sources)
- Hearing aids (enhance sounds from specific directions)
Properties
ArrayConfig
Gets the microphone array geometry if applicable.
MicrophoneArrayConfig<T>? ArrayConfig { get; }
Property Value
- MicrophoneArrayConfig<T>?
RequiredChannels
Gets the number of audio channels required.
int RequiredChannels { get; }
Property Value
- int
Remarks
At least 2 channels (stereo) are required. More channels (e.g., microphone arrays with 4 or more) generally enable better accuracy.
SampleRate
Gets the expected sample rate for input audio.
int SampleRate { get; }
Property Value
- int
SupportsDistanceEstimation
Gets whether this model can estimate distance (not just direction).
bool SupportsDistanceEstimation { get; }
Property Value
- bool
SupportsMultipleSourceTracking
Gets whether this model can track multiple simultaneous sources.
bool SupportsMultipleSourceTracking { get; }
Property Value
- bool
Methods
Beamform(Tensor<T>, double, double)
Beamforms audio to focus on a specific direction.
Tensor<T> Beamform(Tensor<T> audio, double targetAzimuth, double targetElevation = 0)
Parameters
- audio (Tensor<T>): Multi-channel audio tensor.
- targetAzimuth (double): Target azimuth angle in degrees.
- targetElevation (double): Target elevation angle in degrees.
Returns
- Tensor<T>
Beamformed single-channel audio focused on target direction.
Remarks
For Beginners: Beamforming is like a "zoom" for audio - it enhances sounds from one direction while reducing sounds from other directions.
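One common way to get this "zoom" effect is delay-and-sum beamforming: shift each channel so that sound from the target direction lines up, then average. The sketch below is only an illustration of that general technique and makes no claim about what a given `Beamform` implementation does internally. It assumes a linear array along the left-right axis, plain `double[][]` audio laid out as [channel][sample], integer-sample delays, and a far-field source; it ignores elevation, which a linear array cannot resolve. `BeamformSketch` and its parameters are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class BeamformSketch
{
    // Delay-and-sum for a linear array on the left-right axis.
    // micPositionsMeters[m] is the position of microphone m (positive = right).
    // Azimuth follows this page's convention: 0 = front, 90 = right.
    public static double[] DelayAndSum(
        double[][] channels, double[] micPositionsMeters,
        double targetAzimuthDeg, int sampleRate, double speedOfSound = 343.0)
    {
        int numChannels = channels.Length;
        int numSamples = channels[0].Length;
        double sinAz = Math.Sin(targetAzimuthDeg * Math.PI / 180.0);
        var output = new double[numSamples];

        for (int m = 0; m < numChannels; m++)
        {
            // Arrival offset of a plane wave from the target direction at mic m,
            // relative to the array origin, rounded to whole samples.
            int shift = (int)Math.Round(
                -micPositionsMeters[m] * sinAz / speedOfSound * sampleRate);

            for (int n = 0; n < numSamples; n++)
            {
                int src = n + shift; // undo the arrival offset
                if (src >= 0 && src < numSamples)
                    output[n] += channels[m][src] / numChannels;
            }
        }
        return output;
    }
}
```

When the steering direction matches the source, the channels add coherently and the source keeps its full amplitude; for other directions the shifted copies no longer line up and partially cancel or spread out. Practical beamformers usually use fractional delays or frequency-domain weights instead of whole-sample shifts.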
ComputeSpatialSpectrum(Tensor<T>, double)
Computes spatial power spectrum for visualization.
Tensor<T> ComputeSpatialSpectrum(Tensor<T> audio, double azimuthResolution = 5)
Parameters
- audio (Tensor<T>): Multi-channel audio tensor.
- azimuthResolution (double): Resolution in degrees for azimuth.
Returns
- Tensor<T>
Power values for each direction [num_directions].
Remarks
For Beginners: This creates a "heat map" of where sounds are coming from. Peaks in the spectrum indicate sound source directions.
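Reading source directions off such a spectrum amounts to peak picking. The following sketch works on a plain `double[]` rather than a `Tensor<T>`, and it assumes details this page does not state: that bin `i` corresponds to azimuth `startAzimuthDeg + i * azimuthResolutionDeg`, and that the scan covers a full circle so the first and last bins are neighbors. `SpectrumPeaksSketch` and `FindPeaks` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class SpectrumPeaksSketch
{
    // Finds local maxima in a circular azimuth spectrum and returns the
    // strongest ones, sorted by descending power.
    public static List<(double AzimuthDeg, double Power)> FindPeaks(
        double[] spectrum, double azimuthResolutionDeg,
        double startAzimuthDeg, int maxPeaks)
    {
        var peaks = new List<(double AzimuthDeg, double Power)>();
        int n = spectrum.Length;

        for (int i = 0; i < n; i++)
        {
            // Wrap around at the ends (assumes a full 360-degree scan).
            double prev = spectrum[(i - 1 + n) % n];
            double next = spectrum[(i + 1) % n];
            if (spectrum[i] > prev && spectrum[i] >= next)
                peaks.Add((startAzimuthDeg + i * azimuthResolutionDeg, spectrum[i]));
        }

        peaks.Sort((a, b) => b.Power.CompareTo(a.Power));
        if (peaks.Count > maxPeaks)
            peaks.RemoveRange(maxPeaks, peaks.Count - maxPeaks);
        return peaks;
    }
}
```

With a 5° resolution and a scan starting at -180°, a full circle has 72 bins, and a peak in bin 36 maps to 0° (straight ahead).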
EstimateDirections(Tensor<T>, int)
Estimates direction of arrival (DOA) for dominant sources.
IReadOnlyList<DirectionEstimate<T>> EstimateDirections(Tensor<T> audio, int maxSources = 3)
Parameters
- audio (Tensor<T>): Multi-channel audio tensor.
- maxSources (int): Maximum number of sources to detect.
Returns
- IReadOnlyList<DirectionEstimate<T>>
List of direction estimates.
Localize(Tensor<T>)
Localizes sound sources in multi-channel audio.
LocalizationResult<T> Localize(Tensor<T> audio)
Parameters
- audio (Tensor<T>): Multi-channel audio tensor [channels, samples].
Returns
- LocalizationResult<T>
Localization result with source positions.
Remarks
For Beginners: This is the main method for finding where sounds are.
- Pass in stereo (or multi-channel) audio
- Get back the direction(s) sounds are coming from
LocalizeAsync(Tensor<T>, CancellationToken)
Localizes sound sources asynchronously.
Task<LocalizationResult<T>> LocalizeAsync(Tensor<T> audio, CancellationToken cancellationToken = default)
Parameters
- audio (Tensor<T>): Multi-channel audio tensor.
- cancellationToken (CancellationToken): Cancellation token for async operation.
Returns
- Task<LocalizationResult<T>>
Localization result.
TrackSources(Tensor<T>, double)
Tracks sound source positions over time.
SoundTrackingResult<T> TrackSources(Tensor<T> audio, double windowDuration = 0.1)
Parameters
- audio (Tensor<T>): Multi-channel audio tensor.
- windowDuration (double): Duration of each analysis window in seconds.
Returns
- SoundTrackingResult<T>
Tracking result showing source positions over time.
Remarks
For Beginners: For moving sound sources (like a person walking while talking), this tracks how the position changes over time.
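To get a feel for what `windowDuration` means in samples, the sketch below converts a duration to a window length and splits a recording into consecutive windows. It assumes non-overlapping windows and drops a trailing partial window; whether an actual implementation overlaps windows or pads the end is not stated on this page. `WindowSketch` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class WindowSketch
{
    // Number of samples in one analysis window (at least 1).
    public static int SamplesPerWindow(double windowDurationSeconds, int sampleRate)
        => Math.Max(1, (int)Math.Round(windowDurationSeconds * sampleRate));

    // Yields (start, length) for consecutive, non-overlapping windows.
    // A trailing partial window is dropped (an assumption of this sketch).
    public static IEnumerable<(int Start, int Length)> Windows(
        int totalSamples, int samplesPerWindow)
    {
        for (int start = 0; start + samplesPerWindow <= totalSamples; start += samplesPerWindow)
            yield return (start, samplesPerWindow);
    }
}
```

At a 16 kHz sample rate, the default 0.1 s window is 1600 samples, so one second of audio yields 10 position estimates under these assumptions. Shorter windows follow fast-moving sources more closely but give each estimate less audio to work with.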