Interface ISpeakerVerifier<T>
- Namespace: AiDotNet.Interfaces
- Assembly: AiDotNet.dll
Interface for speaker verification models that determine if audio matches a claimed identity.
public interface ISpeakerVerifier<T> : IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
Remarks
Speaker verification (also called speaker authentication) determines whether a speech sample matches a claimed identity. It answers the question "Is this person who they claim to be?" This is a 1-to-1 comparison task.
For Beginners: Speaker verification is like a voice-based password check.
How it works:
- User enrolls by providing voice samples
- System creates a voiceprint (speaker embedding) for that user
- Later, user provides a new voice sample
- System compares new sample to stored voiceprint
- Decision: Accept (same person) or Reject (different person)
Common use cases:
- Voice banking authentication
- Phone-based customer verification
- Smart speaker personalization
- Access control systems
Key metrics:
- False Accept Rate (FAR): How often impostors are wrongly accepted
- False Reject Rate (FRR): How often legitimate users are wrongly rejected
- Equal Error Rate (EER): The error rate at the threshold where FAR equals FRR (lower is better)
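As a rough illustration of these metrics, the sketch below computes FAR and FRR at a given threshold from two lists of trial scores (genuine trials from the claimed speaker, impostor trials from other speakers) and approximates the EER by scanning the observed scores as candidate thresholds. It is a standalone sketch, not part of AiDotNet: it uses double instead of T, assumes higher scores mean "more likely the same speaker", and accepts a trial only when its score is above the threshold, as described under DefaultThreshold. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

// Standalone helpers for evaluating verification scores (not AiDotNet API).
// Assumption: higher score = more likely the same speaker, and a trial is
// accepted only when its score is strictly above the threshold.
public static class VerificationMetrics
{
    // FAR: fraction of impostor trials that would be accepted.
    public static double FalseAcceptRate(double[] impostorScores, double threshold) =>
        impostorScores.Count(s => s > threshold) / (double)impostorScores.Length;

    // FRR: fraction of genuine trials that would be rejected.
    public static double FalseRejectRate(double[] genuineScores, double threshold) =>
        genuineScores.Count(s => s <= threshold) / (double)genuineScores.Length;

    // Approximates the EER: try every observed score as a threshold, keep the
    // one where FAR and FRR are closest, and report their average there.
    public static (double Threshold, double Eer) EqualErrorRate(
        double[] genuineScores, double[] impostorScores)
    {
        double bestThreshold = double.NaN, bestEer = double.NaN, bestGap = double.MaxValue;
        foreach (double t in genuineScores.Concat(impostorScores).Distinct().OrderBy(s => s))
        {
            double far = FalseAcceptRate(impostorScores, t);
            double frr = FalseRejectRate(genuineScores, t);
            double gap = Math.Abs(far - frr);
            if (gap < bestGap)
            {
                bestGap = gap;
                bestThreshold = t;
                bestEer = (far + frr) / 2;
            }
        }
        return (bestThreshold, bestEer);
    }
}
```

With genuine scores {0.9, 0.8, 0.7, 0.4} and impostor scores {0.1, 0.2, 0.3, 0.6}, the scan settles on a threshold of 0.4, where FAR and FRR are both 0.25, giving an EER of 0.25.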
This interface extends IFullModel<T, TInput, TOutput> with both TInput and TOutput set to Tensor<T>, for tensor-based audio processing.
Properties
DefaultThreshold
Gets the default decision threshold for verification.
T DefaultThreshold { get; }
Property Value
- T
Remarks
Scores above this threshold indicate the same speaker. The optimal threshold depends on the security requirements:
- Higher threshold: More secure (fewer false accepts) but more false rejects
- Lower threshold: More convenient (fewer false rejects) but more false accepts
EmbeddingExtractor
Gets the underlying speaker embedding extractor.
ISpeakerEmbeddingExtractor<T> EmbeddingExtractor { get; }
Property Value
- ISpeakerEmbeddingExtractor<T>
IsOnnxMode
Gets whether this model is running in ONNX inference mode.
bool IsOnnxMode { get; }
Property Value
- bool
SampleRate
Gets the expected sample rate for input audio.
int SampleRate { get; }
Property Value
- int
Methods
ComputeScore(Tensor<T>, Tensor<T>)
Computes the verification score between audio and a reference.
T ComputeScore(Tensor<T> audio, Tensor<T> referenceEmbedding)
Parameters
audio (Tensor<T>): Audio to verify.
referenceEmbedding (Tensor<T>): Reference speaker embedding.
Returns
- T
Verification score (higher means more likely the same speaker).
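The interface does not say how the score is computed. A common choice in speaker verification is cosine similarity between the embedding of the test audio (produced by the embedding extractor) and the reference embedding. The sketch below shows only that comparison step, on plain double[] vectors; it is an assumption about a typical implementation, not AiDotNet's documented behavior, and EmbeddingScoring is a hypothetical name.

```csharp
using System;

// Hypothetical scoring helper; assumes both inputs are speaker embeddings
// of the same dimension (the audio-to-embedding step is omitted).
public static class EmbeddingScoring
{
    // Cosine similarity in [-1, 1]: 1 = same direction, 0 = orthogonal.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same dimension.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0)
            throw new ArgumentException("Embeddings must be non-zero.");
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Cosine similarity ignores vector length, so two recordings of the same speaker at different loudness or duration are not penalized merely because their embeddings have different magnitudes.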
Enroll(Tensor<T>)
Enrolls a speaker by creating a reference embedding from a single audio sample.
SpeakerProfile<T> Enroll(Tensor<T> enrollmentAudio)
Parameters
enrollmentAudio (Tensor<T>): Single audio sample from the speaker.
Returns
- SpeakerProfile<T>
Enrolled speaker profile.
Enroll(IReadOnlyList<Tensor<T>>)
Enrolls a speaker by creating a reference embedding from audio samples.
SpeakerProfile<T> Enroll(IReadOnlyList<Tensor<T>> enrollmentAudio)
Parameters
enrollmentAudio (IReadOnlyList<Tensor<T>>): Audio samples from the speaker.
Returns
- SpeakerProfile<T>
Enrolled speaker profile containing the aggregated embedding.
Remarks
For Beginners: Enrollment is like setting up a new voice password. The more audio you provide, the better the system can recognize the speaker.
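The remarks do not specify how multiple samples are aggregated. One common approach, sketched here as an assumption, is to L2-normalize each sample's embedding, combine them, and renormalize the result, so that every sample contributes equally regardless of its magnitude. Averaging also smooths out per-recording noise, which is one reason more enrollment audio tends to help. The Enrollment class and its methods are hypothetical and operate on double[] rather than Tensor<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enrollment helpers operating on plain double[] embeddings.
public static class Enrollment
{
    // Scales a vector to unit L2 norm.
    public static double[] Normalize(double[] v)
    {
        double sumSquares = 0;
        foreach (double x in v) sumSquares += x * x;
        double norm = Math.Sqrt(sumSquares);
        if (norm == 0) throw new ArgumentException("Cannot normalize a zero vector.");
        var result = new double[v.Length];
        for (int i = 0; i < v.Length; i++) result[i] = v[i] / norm;
        return result;
    }

    // Aggregates per-sample embeddings into one voiceprint: normalize each,
    // sum them, then normalize the sum. Dividing by the count first would
    // not change the direction, so the mean and the sum give the same result.
    public static double[] AggregateEmbeddings(IReadOnlyList<double[]> embeddings)
    {
        if (embeddings.Count == 0) throw new ArgumentException("Need at least one embedding.");
        int dim = embeddings[0].Length;
        var sum = new double[dim];
        foreach (double[] e in embeddings)
        {
            if (e.Length != dim)
                throw new ArgumentException("All embeddings must have the same dimension.");
            double[] unit = Normalize(e);
            for (int i = 0; i < dim; i++) sum[i] += unit[i];
        }
        return Normalize(sum);
    }
}
```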
GetThresholdForFAR(double)
Gets the recommended threshold for a target false accept rate.
T GetThresholdForFAR(double targetFAR)
Parameters
targetFAR (double): Target false accept rate (e.g., 0.01 for 1%).
Returns
- T
Recommended threshold value.
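How an implementation derives this threshold is not documented. One straightforward approach, shown here as an assumption, is empirical calibration: score a held-out set of impostor trials and return the lowest observed score at which the fraction of impostor scores above it is at most targetFAR. ThresholdCalibration and ThresholdForFar are hypothetical names; the sketch uses double and accepts only scores strictly above the threshold.

```csharp
using System;
using System.Linq;

// Hypothetical calibration helper (not AiDotNet API). Assumes a trial is
// accepted only when its score is strictly above the threshold.
public static class ThresholdCalibration
{
    // Returns the lowest observed impostor score t such that the fraction of
    // impostor scores above t is at most targetFar.
    public static double ThresholdForFar(double[] impostorScores, double targetFar)
    {
        if (impostorScores.Length == 0)
            throw new ArgumentException("Need at least one impostor score.");
        if (targetFar < 0 || targetFar > 1)
            throw new ArgumentOutOfRangeException(nameof(targetFar));

        foreach (double t in impostorScores.Distinct().OrderBy(s => s))
        {
            double far = impostorScores.Count(s => s > t) / (double)impostorScores.Length;
            if (far <= targetFar) return t;
        }
        // Unreachable in practice: at the maximum score nothing is above it, so FAR is 0.
        throw new InvalidOperationException("No threshold found.");
    }
}
```

For impostor scores 1 through 10, a target FAR of 0.1 yields a threshold of 9 (only the score 10 lies above it), and 0.3 yields 7. With n impostor trials, FAR values finer than 1/n cannot be resolved, so a target such as 0.01 needs well over 100 impostor trials to be meaningful.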
UpdateProfile(SpeakerProfile<T>, Tensor<T>)
Updates an existing speaker profile with additional audio.
SpeakerProfile<T> UpdateProfile(SpeakerProfile<T> existingProfile, Tensor<T> newAudio)
Parameters
existingProfile (SpeakerProfile<T>): The existing speaker profile.
newAudio (Tensor<T>): New audio to incorporate.
Returns
- SpeakerProfile<T>
Updated speaker profile.
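The update strategy is not documented. A simple option, sketched here under that assumption, is to keep the number of samples behind the profile and fold each new embedding into a running mean; if the stored embedding is a plain mean, the result equals the mean over all samples seen so far without storing them. ToyProfile is a hypothetical stand-in for SpeakerProfile<T>, whose members are not listed on this page, and the new audio is assumed to have already been converted to an embedding.

```csharp
using System;

// Hypothetical stand-in for a speaker profile; not AiDotNet's SpeakerProfile<T>.
public sealed record ToyProfile(double[] Embedding, int SampleCount);

public static class ProfileUpdates
{
    // Folds one new embedding into the profile's running mean and returns a
    // new profile, leaving the existing one unchanged.
    public static ToyProfile Update(ToyProfile existing, double[] newEmbedding)
    {
        if (newEmbedding.Length != existing.Embedding.Length)
            throw new ArgumentException("Embedding dimension mismatch.");
        int n = existing.SampleCount;
        var mean = new double[newEmbedding.Length];
        for (int i = 0; i < mean.Length; i++)
            mean[i] = (existing.Embedding[i] * n + newEmbedding[i]) / (n + 1);
        return new ToyProfile(mean, n + 1);
    }
}
```

Returning a new profile rather than mutating the old one mirrors the signature above, which returns the updated SpeakerProfile<T>.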
Verify(Tensor<T>, Tensor<T>)
Verifies if audio matches a reference speaker embedding.
SpeakerVerificationResult<T> Verify(Tensor<T> audio, Tensor<T> referenceEmbedding)
Parameters
audio (Tensor<T>): Audio to verify.
referenceEmbedding (Tensor<T>): Pre-computed speaker embedding of the claimed identity.
Returns
- SpeakerVerificationResult<T>
Verification result with decision and confidence score.
Remarks
For Beginners: This checks whether the voice in the audio matches a known voiceprint. The result's decision is:
- true: Same person (accept)
- false: Different person (reject)
Verify(Tensor<T>, Tensor<T>, T)
Verifies if audio matches a reference speaker embedding using a custom threshold.
SpeakerVerificationResult<T> Verify(Tensor<T> audio, Tensor<T> referenceEmbedding, T threshold)
Parameters
audio (Tensor<T>): Audio to verify.
referenceEmbedding (Tensor<T>): Pre-computed speaker embedding of the claimed identity.
threshold (T): Custom decision threshold.
Returns
- SpeakerVerificationResult<T>
Verification result with decision and confidence score.
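Both Verify overloads reduce to the same decision rule once a score is available: the first uses DefaultThreshold and the second uses the caller's threshold. The sketch below assumes the rule stated under DefaultThreshold (accept when the score is above the threshold). ToyVerificationResult and ToyVerifier are hypothetical, since the members of SpeakerVerificationResult<T> are not documented here, and the default of 0.5 is a placeholder value.

```csharp
using System;

// Hypothetical result type; member names are illustrative only.
public sealed record ToyVerificationResult(bool IsSameSpeaker, double Score, double Threshold);

public static class ToyVerifier
{
    // Placeholder default; real systems calibrate this (for example from a target FAR).
    public const double DefaultThreshold = 0.5;

    // Mirrors Verify(audio, referenceEmbedding): uses the default threshold.
    public static ToyVerificationResult Decide(double score) => Decide(score, DefaultThreshold);

    // Mirrors Verify(audio, referenceEmbedding, threshold): accept only when
    // the score is strictly above the threshold.
    public static ToyVerificationResult Decide(double score, double threshold) =>
        new ToyVerificationResult(score > threshold, score, threshold);
}
```

A score of 0.6 is accepted under the placeholder default of 0.5 but rejected with a stricter custom threshold of 0.7.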
VerifyAsync(Tensor<T>, Tensor<T>, CancellationToken)
Verifies if audio matches a reference speaker embedding asynchronously.
Task<SpeakerVerificationResult<T>> VerifyAsync(Tensor<T> audio, Tensor<T> referenceEmbedding, CancellationToken cancellationToken = default)
Parameters
audio (Tensor<T>): Audio to verify.
referenceEmbedding (Tensor<T>): Pre-computed speaker embedding of the claimed identity.
cancellationToken (CancellationToken): Token for canceling the asynchronous operation.
Returns
- Task<SpeakerVerificationResult<T>>
Verification result with decision and confidence score.
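This page does not describe how VerifyAsync schedules its work. For CPU-bound scoring, one common .NET pattern is to run the synchronous computation on the thread pool with Task.Run and honor the CancellationToken before starting. The sketch below shows that pattern with a hypothetical computeScore delegate standing in for embedding extraction and scoring; it returns a bool rather than SpeakerVerificationResult<T> to stay self-contained.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncVerification
{
    // Runs a synchronous scoring function on the thread pool so the caller
    // (for example a UI or request handler) is not blocked. The token is
    // passed to Task.Run and checked again before the work starts.
    public static Task<bool> VerifyAsync(
        Func<double> computeScore,
        double threshold,
        CancellationToken cancellationToken = default)
    {
        return Task.Run(() =>
        {
            cancellationToken.ThrowIfCancellationRequested();
            return computeScore() > threshold;
        }, cancellationToken);
    }
}
```

If the token is already canceled, the returned task ends in the Canceled state and awaiting it throws a TaskCanceledException, which derives from OperationCanceledException.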
VerifyWithReferenceAudio(Tensor<T>, Tensor<T>)
Verifies if audio matches reference audio of a claimed speaker.
SpeakerVerificationResult<T> VerifyWithReferenceAudio(Tensor<T> audio, Tensor<T> referenceAudio)
Parameters
audio (Tensor<T>): Audio to verify.
referenceAudio (Tensor<T>): Reference audio of the claimed identity.
Returns
- SpeakerVerificationResult<T>
Verification result with decision and confidence score.
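Conceptually, this overload is the embedding-based Verify with one extra step: the reference audio is embedded as well. The sketch below assumes both clips pass through the same embedding function and are compared with cosine similarity; the embed delegate is a hypothetical stand-in for EmbeddingExtractor, and audio is represented as double[] rather than Tensor<T>.

```csharp
using System;

public static class ReferenceAudioVerification
{
    // Both clips go through the same (hypothetical) embedding function, then
    // the embeddings are compared with cosine similarity and thresholded.
    public static bool VerifyWithReferenceAudio(
        double[] audio,
        double[] referenceAudio,
        Func<double[], double[]> embed,
        double threshold)
    {
        double[] a = embed(audio);
        double[] b = embed(referenceAudio);
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same dimension.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double score = dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
        return score > threshold;
    }
}
```

Because the reference is re-embedded on every call, this form suits one-off comparisons; when the same speaker is checked repeatedly, enrolling once and passing the stored embedding to Verify avoids the repeated extraction.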