
Interface ISpeakerVerifier<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Interface for speaker verification models that determine if audio matches a claimed identity.

public interface ISpeakerVerifier<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.


Remarks

Speaker verification (also called speaker authentication) determines whether a speech sample matches a claimed identity. It answers the question "Is this person who they claim to be?" This is a 1-to-1 comparison task.

For Beginners: Speaker verification is like a voice-based password check.

How it works:

  1. User enrolls by providing voice samples
  2. System creates a voiceprint (speaker embedding) for that user
  3. Later, user provides a new voice sample
  4. System compares new sample to stored voiceprint
  5. Decision: Accept (same person) or Reject (different person)
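The five steps above can be sketched in a few lines of C#. This is a minimal illustration, not the AiDotNet implementation. It assumes embeddings are plain double[] vectors, that the voiceprint is the element-wise mean of the enrollment embeddings, that the score is cosine similarity, and that 0.7 is an acceptable threshold. The embedding values and the Cosine helper are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical embeddings; a real system would get these from a speaker
// embedding extractor rather than hard-coding them.
double[][] enrollmentEmbeddings =
{
    new[] { 0.9, 0.1, 0.2 },
    new[] { 0.8, 0.2, 0.1 },
};

// Steps 1-2: build the voiceprint as the element-wise mean of the enrollment embeddings.
double[] voiceprint = Enumerable.Range(0, enrollmentEmbeddings[0].Length)
    .Select(i => enrollmentEmbeddings.Average(e => e[i]))
    .ToArray();

// Steps 3-4: score new samples against the stored voiceprint.
double claimantScore = Cosine(new[] { 0.7, 0.3, 0.2 }, voiceprint);
double impostorScore = Cosine(new[] { 0.1, 0.9, 0.3 }, voiceprint);

// Step 5: accept only when the score exceeds the threshold (value assumed for illustration).
const double threshold = 0.7;
bool accepted = claimantScore > threshold;
bool impostorAccepted = impostorScore > threshold;
Console.WriteLine($"claimant {claimantScore:F3} -> {accepted}, impostor {impostorScore:F3} -> {impostorAccepted}");

// Cosine similarity ranges from -1 to 1; higher means the vectors point in more similar directions.
static double Cosine(double[] a, double[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
```

With these values the claimant scores about 0.97 and is accepted, while the impostor scores about 0.32 and is rejected.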

Common use cases:

  • Voice banking authentication
  • Phone-based customer verification
  • Smart speaker personalization
  • Access control systems

Key metrics:

  • False Accept Rate (FAR): How often impostors are wrongly accepted
  • False Reject Rate (FRR): How often legitimate users are wrongly rejected
  • Equal Error Rate (EER): The error rate at the threshold where FAR equals FRR (lower is better)
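Given scores from labeled same-speaker (genuine) and different-speaker (impostor) trials, FAR and FRR at a given threshold are simple counts, and the EER can be approximated by sweeping thresholds. The sketch below assumes the acceptance rule "score above threshold", uses hypothetical scores, and approximates the EER as the mean of FAR and FRR at the observed score where the two are closest. A real evaluation would use many more trials and a finer threshold search.

```csharp
using System;
using System.Linq;

// Hypothetical scores from labeled trials.
double[] genuine = { 0.91, 0.85, 0.78, 0.66, 0.95 };  // same-speaker trials
double[] impostor = { 0.12, 0.35, 0.52, 0.70, 0.28 }; // different-speaker trials

// FAR: fraction of impostor trials accepted (score > threshold).
double Far(double t) => impostor.Count(s => s > t) / (double)impostor.Length;
// FRR: fraction of genuine trials rejected (score <= threshold).
double Frr(double t) => genuine.Count(s => s <= t) / (double)genuine.Length;

// Try every observed score as a candidate threshold and keep the one
// where FAR and FRR are closest; approximate the EER as their mean there.
double bestThreshold = genuine.Concat(impostor)
    .OrderBy(t => Math.Abs(Far(t) - Frr(t)))
    .First();
double eer = (Far(bestThreshold) + Frr(bestThreshold)) / 2;
Console.WriteLine($"threshold={bestThreshold}, FAR={Far(bestThreshold)}, FRR={Frr(bestThreshold)}, EER~{eer}");
```

Here the sweep lands on 0.66, where one impostor is accepted and one genuine trial is rejected, giving FAR = FRR = 0.2.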

This interface extends IFullModel<T, TInput, TOutput> with both TInput and TOutput set to Tensor<T>, for tensor-based audio processing.

Properties

DefaultThreshold

Gets the default decision threshold for verification.

T DefaultThreshold { get; }

Property Value

T

Remarks

Scores above this threshold indicate the same speaker. The optimal threshold depends on the security requirements:

  • Higher threshold: More secure (fewer false accepts) but more false rejects
  • Lower threshold: More convenient (fewer false rejects) but more false accepts
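The trade-off can be seen by counting errors at a lenient and a strict threshold on the same trials. The scores are hypothetical, the Errors helper is illustrative only, and the rule "accept when score > threshold" is taken from the description above.

```csharp
using System;
using System.Linq;

// Hypothetical scores from labeled trials.
double[] genuine = { 0.91, 0.85, 0.78, 0.66, 0.95 };  // same-speaker trials
double[] impostor = { 0.12, 0.35, 0.52, 0.70, 0.28 }; // different-speaker trials

// Impostors scoring above the threshold are false accepts;
// genuine speakers scoring at or below it are false rejects.
(int FalseAccepts, int FalseRejects) Errors(double threshold) =>
    (impostor.Count(s => s > threshold), genuine.Count(s => s <= threshold));

var lenient = Errors(0.5);
var strict = Errors(0.8);
Console.WriteLine($"threshold 0.5: {lenient}; threshold 0.8: {strict}");
```

Raising the threshold from 0.5 to 0.8 removes both false accepts but introduces two false rejects.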

EmbeddingExtractor

Gets the underlying speaker embedding extractor.

ISpeakerEmbeddingExtractor<T> EmbeddingExtractor { get; }

Property Value

ISpeakerEmbeddingExtractor<T>

IsOnnxMode

Gets whether this model is running in ONNX inference mode.

bool IsOnnxMode { get; }

Property Value

bool

SampleRate

Gets the expected sample rate for input audio.

int SampleRate { get; }

Property Value

int

Methods

ComputeScore(Tensor<T>, Tensor<T>)

Computes the verification score between audio and a reference speaker embedding.

T ComputeScore(Tensor<T> audio, Tensor<T> referenceEmbedding)

Parameters

audio Tensor<T>

Audio to verify.

referenceEmbedding Tensor<T>

Reference speaker embedding.

Returns

T

Verification score (higher = more likely same speaker).

Enroll(Tensor<T>)

Enrolls a speaker by creating a reference embedding from a single audio sample.

SpeakerProfile<T> Enroll(Tensor<T> enrollmentAudio)

Parameters

enrollmentAudio Tensor<T>

Single audio sample from the speaker.

Returns

SpeakerProfile<T>

Enrolled speaker profile.

Enroll(IReadOnlyList<Tensor<T>>)

Enrolls a speaker by creating a reference embedding from multiple audio samples.

SpeakerProfile<T> Enroll(IReadOnlyList<Tensor<T>> enrollmentAudio)

Parameters

enrollmentAudio IReadOnlyList<Tensor<T>>

Audio samples from the speaker.

Returns

SpeakerProfile<T>

Enrolled speaker profile containing the aggregated embedding.

Remarks

For Beginners: Enrollment is like setting up a new voice password. The more audio you provide, the better the system can recognize the speaker.
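One common way to aggregate several enrollment samples is to length-normalize each speaker embedding, average them, and normalize the result. The interface does not say how Enroll(IReadOnlyList<Tensor<T>>) aggregates its inputs, so the following is only a sketch of that technique on plain double[] vectors. Normalize and the embedding values are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical per-utterance embeddings for one speaker.
double[][] utteranceEmbeddings =
{
    new[] { 3.0, 4.0 },   // large magnitude, same direction as the next one
    new[] { 0.6, 0.8 },
    new[] { 0.8, 0.6 },
};

// Normalize each embedding to unit length so no single sample dominates
// the mean because of its magnitude.
double[][] normalized = utteranceEmbeddings.Select(Normalize).ToArray();

// Average element-wise, then re-normalize to get the enrolled voiceprint.
double[] voiceprint = Normalize(
    Enumerable.Range(0, normalized[0].Length)
        .Select(i => normalized.Average(e => e[i]))
        .ToArray());

Console.WriteLine($"[{string.Join(", ", voiceprint.Select(x => x.ToString("F4")))}]");

static double[] Normalize(double[] v)
{
    double norm = Math.Sqrt(v.Sum(x => x * x));
    return v.Select(x => x / norm).ToArray();
}
```

Without the per-sample normalization, the first embedding would pull the mean strongly toward its direction simply because its values are five times larger.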

GetThresholdForFAR(double)

Gets the recommended threshold for a target false accept rate.

T GetThresholdForFAR(double targetFAR)

Parameters

targetFAR double

Target false accept rate (e.g., 0.01 for 1%).

Returns

T

Recommended threshold value.
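One way to derive such a threshold is from the scores of impostor trials in a calibration set: if at most k = floor(targetFAR × n) of the n impostor scores may lie above the threshold, the (k+1)-th highest score works. This sketch assumes the "score > threshold means accept" rule and uses a hypothetical ThresholdForFar helper with made-up scores; it is not necessarily how GetThresholdForFAR is implemented. With only ten scores the smallest meaningful target is 10%; targets such as 0.01 need far more impostor trials.

```csharp
using System;
using System.Linq;

// Hypothetical impostor (different-speaker) scores from a calibration set.
double[] impostorScores = { 0.12, 0.35, 0.52, 0.70, 0.28, 0.44, 0.61, 0.19, 0.33, 0.47 };

static double ThresholdForFar(double[] scores, double targetFar)
{
    // Largest number of impostor trials that may be accepted.
    int allowed = (int)Math.Floor(targetFar * scores.Length);
    double[] descending = scores.OrderByDescending(s => s).ToArray();
    // At most `allowed` scores lie strictly above descending[allowed].
    return descending[Math.Min(allowed, descending.Length - 1)];
}

double threshold = ThresholdForFar(impostorScores, 0.10);
double achievedFar = impostorScores.Count(s => s > threshold) / (double)impostorScores.Length;
Console.WriteLine($"threshold={threshold}, achieved FAR={achievedFar}");
```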

UpdateProfile(SpeakerProfile<T>, Tensor<T>)

Updates an existing speaker profile with additional audio.

SpeakerProfile<T> UpdateProfile(SpeakerProfile<T> existingProfile, Tensor<T> newAudio)

Parameters

existingProfile SpeakerProfile<T>

The existing speaker profile.

newAudio Tensor<T>

New audio to incorporate.

Returns

SpeakerProfile<T>

Updated speaker profile.
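If a profile keeps the mean embedding together with the number of samples behind it, new audio can be folded in without re-processing the old recordings, using the incremental mean update mean + (x - mean) / (n + 1). Whether SpeakerProfile<T> stores a sample count is not documented here, so the profile fields and the Update helper below are assumptions for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical profile state: the mean embedding and how many samples built it.
double[] mean = { 0.6, 0.8 };
int count = 2;

// Embedding of the newly provided audio (hypothetical values).
double[] newEmbedding = { 0.9, 0.3 };

(double[] updatedMean, int updatedCount) = Update(mean, count, newEmbedding);
Console.WriteLine($"[{string.Join(", ", updatedMean)}] from {updatedCount} samples");

// Incremental mean: equivalent to averaging all samples, including the new one.
static (double[] Mean, int Count) Update(double[] prevMean, int prevCount, double[] x)
{
    int n = prevCount + 1;
    double[] result = prevMean.Select((m, i) => m + (x[i] - m) / n).ToArray();
    return (result, n);
}
```

The result matches averaging all three samples directly: (0.6 × 2 + 0.9) / 3 = 0.7 in the first dimension.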

Verify(Tensor<T>, Tensor<T>)

Verifies if audio matches a reference speaker embedding.

SpeakerVerificationResult<T> Verify(Tensor<T> audio, Tensor<T> referenceEmbedding)

Parameters

audio Tensor<T>

Audio to verify.

referenceEmbedding Tensor<T>

Pre-computed speaker embedding of the claimed identity.

Returns

SpeakerVerificationResult<T>

Verification result with decision and confidence score.

Remarks

For Beginners: This checks whether the voice in the audio matches a known voiceprint. The decision in the returned result is:

  • true: Same person (accept)
  • false: Different person (reject)

Verify(Tensor<T>, Tensor<T>, T)

Verifies if audio matches a reference speaker embedding using a custom threshold.

SpeakerVerificationResult<T> Verify(Tensor<T> audio, Tensor<T> referenceEmbedding, T threshold)

Parameters

audio Tensor<T>

Audio to verify.

referenceEmbedding Tensor<T>

Pre-computed speaker embedding of the claimed identity.

threshold T

Custom decision threshold.

Returns

SpeakerVerificationResult<T>

Verification result with decision and confidence score.

VerifyAsync(Tensor<T>, Tensor<T>, CancellationToken)

Verifies if audio matches a reference speaker embedding asynchronously.

Task<SpeakerVerificationResult<T>> VerifyAsync(Tensor<T> audio, Tensor<T> referenceEmbedding, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>

Audio to verify.

referenceEmbedding Tensor<T>

Pre-computed speaker embedding of the claimed identity.

cancellationToken CancellationToken

Cancellation token for the asynchronous operation.

Returns

Task<SpeakerVerificationResult<T>>

Verification result with decision and confidence score.
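The documentation does not say how VerifyAsync performs its work. As a hedged illustration of the CancellationToken contract from a caller's point of view, the sketch below offloads a stand-in synchronous check with Task.Run. In this sketch, a token cancelled before the call makes the awaited task throw an OperationCanceledException. VerifySync and VerifyAsyncSketch are hypothetical stand-ins, not members of ISpeakerVerifier<T>.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical synchronous check standing in for the real model call.
static bool VerifySync(double score, double threshold) => score > threshold;

// One way an async wrapper could offload the work and honor cancellation.
static Task<bool> VerifyAsyncSketch(double score, double threshold, CancellationToken ct = default) =>
    Task.Run(() =>
    {
        ct.ThrowIfCancellationRequested();
        return VerifySync(score, threshold);
    }, ct);

bool ok = await VerifyAsyncSketch(0.9, 0.7);

// A token that is already cancelled produces a cancelled task.
using var cts = new CancellationTokenSource();
cts.Cancel();
bool cancelled = false;
try { await VerifyAsyncSketch(0.9, 0.7, cts.Token); }
catch (OperationCanceledException) { cancelled = true; }
Console.WriteLine($"ok={ok}, cancelled={cancelled}");
```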

VerifyWithReferenceAudio(Tensor<T>, Tensor<T>)

Verifies if audio matches reference audio of a claimed speaker.

SpeakerVerificationResult<T> VerifyWithReferenceAudio(Tensor<T> audio, Tensor<T> referenceAudio)

Parameters

audio Tensor<T>

Audio to verify.

referenceAudio Tensor<T>

Reference audio of the claimed identity.

Returns

SpeakerVerificationResult<T>

Verification result with decision and confidence score.