Class SpeakerVerifier<T>

Namespace
AiDotNet.Audio.Speaker
Assembly
AiDotNet.dll

Verifies speaker identity by comparing embeddings against enrolled speakers.

public class SpeakerVerifier<T> : SpeakerRecognitionBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, ISpeakerVerifier<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
SpeakerVerifier<T>
Implements
IFullModel<T, Tensor<T>, Tensor<T>>
IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>
IParameterizable<T, Tensor<T>, Tensor<T>>
ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>
IGradientComputable<T, Tensor<T>, Tensor<T>>

Remarks

Speaker verification answers the question "Is this the person they claim to be?" by comparing a test utterance against enrolled speaker embeddings.

For Beginners: Speaker verification is like voice-based password checking:

1. First, you "enroll" a speaker by recording their voice samples.
2. Later, when someone claims to be that person, you record them and compare.
3. If the voices match closely enough, the identity is verified.
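
The compare-and-decide step above can be illustrated with a small standalone sketch. It assumes the score is the cosine similarity between two embeddings and that a claim is accepted when the score reaches the threshold; this page does not state which metric SpeakerVerifier<T> uses internally, so the class name VerificationSketch, its methods, and the use of plain float arrays are all hypothetical and not part of AiDotNet.

```csharp
using System;

// Illustrative only: not the AiDotNet implementation.
public static class VerificationSketch
{
    // Cosine similarity between two embeddings of equal length.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // A zero vector has no direction; treat it as "no similarity".
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Accepts the identity claim when the score reaches the threshold.
    // The 0.6 default mirrors the documented defaultThreshold value.
    public static bool IsSameSpeaker(float[] test, float[] enrolled, double threshold = 0.6)
        => CosineSimilarity(test, enrolled) >= threshold;
}
```

Raising the threshold makes the check stricter (fewer false accepts, more false rejects); lowering it does the opposite.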

Usage (ONNX Mode):

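// ONNX mode: load a pretrained speaker embedding model for inference.
var verifier = new SpeakerVerifier<float>(
    architecture,
    embeddingModelPath: "speaker_model.onnx");
// Compare the test audio against a reference speaker embedding.
var result = verifier.Verify(audio, referenceEmbedding);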

Usage (Native Training Mode):

// Native mode: no ONNX model path is given, so the model is trained directly.
var verifier = new SpeakerVerifier<float>(architecture);
verifier.Train(audioInput, expectedOutput);

Constructors

SpeakerVerifier()

Creates a SpeakerVerifier with default settings for native training mode.

public SpeakerVerifier()

Remarks

For Beginners: This is the simplest way to create a speaker verifier. It uses default settings suitable for most use cases.

SpeakerVerifier(NeuralNetworkArchitecture<T>, int, int, double, int, int, int, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, ILossFunction<T>?)

Creates a SpeakerVerifier for native training mode.

public SpeakerVerifier(NeuralNetworkArchitecture<T> architecture, int sampleRate = 16000, int embeddingDimension = 256, double defaultThreshold = 0.6, int hiddenDim = 256, int numEncoderLayers = 3, int numHeads = 4, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, ILossFunction<T>? lossFunction = null)

Parameters

architecture NeuralNetworkArchitecture<T>

The neural network architecture configuration.

sampleRate int

Expected sample rate for input audio. Default is 16000.

embeddingDimension int

Dimension of speaker embeddings. Default is 256.

defaultThreshold double

Default verification threshold. Default is 0.6.

hiddenDim int

Hidden dimension for encoder layers. Default is 256.

numEncoderLayers int

Number of encoder layers. Default is 3.

numHeads int

Number of attention heads. Default is 4.

optimizer IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>

Optimizer for training. If null, AdamW is used.

lossFunction ILossFunction<T>

Loss function for training. If null, MSE loss is used.

SpeakerVerifier(NeuralNetworkArchitecture<T>, string, int, int, double, OnnxModelOptions?)

Creates a SpeakerVerifier for ONNX inference with a pretrained model.

public SpeakerVerifier(NeuralNetworkArchitecture<T> architecture, string embeddingModelPath, int sampleRate = 16000, int embeddingDimension = 256, double defaultThreshold = 0.6, OnnxModelOptions? onnxOptions = null)

Parameters

architecture NeuralNetworkArchitecture<T>

The neural network architecture configuration.

embeddingModelPath string

Path to the speaker embedding ONNX model. Required.

sampleRate int

Expected sample rate for input audio. Default is 16000.

embeddingDimension int

Dimension of speaker embeddings. Default is 256.

defaultThreshold double

Default verification threshold. Default is 0.6.

onnxOptions OnnxModelOptions

ONNX runtime options.

Properties

DefaultThreshold

Gets the default verification threshold.

public T DefaultThreshold { get; }

Property Value

T

EmbeddingExtractor

Gets the underlying speaker embedding extractor.

public ISpeakerEmbeddingExtractor<T> EmbeddingExtractor { get; }

Property Value

ISpeakerEmbeddingExtractor<T>

EnrolledCount

Gets the number of enrolled speakers.

public int EnrolledCount { get; }

Property Value

int

IsOnnxMode

Gets whether the model is in ONNX inference mode.

public bool IsOnnxMode { get; }

Property Value

bool

VerificationThreshold

Gets the verification threshold.

public double VerificationThreshold { get; }

Property Value

double

Methods

ComputeScore(Tensor<T>, Tensor<T>)

Computes the verification score between audio and a reference.

public T ComputeScore(Tensor<T> audio, Tensor<T> referenceEmbedding)

Parameters

audio Tensor<T>
referenceEmbedding Tensor<T>

Returns

T

CreateNewInstance()

Creates a new instance of this model for cloning.

protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()

Returns

IFullModel<T, Tensor<T>, Tensor<T>>

DeserializeNetworkSpecificData(BinaryReader)

Deserializes network-specific data.

protected override void DeserializeNetworkSpecificData(BinaryReader reader)

Parameters

reader BinaryReader

Dispose(bool)

Disposes the model and releases resources.

protected override void Dispose(bool disposing)

Parameters

disposing bool

Enroll(Tensor<T>)

Enrolls a speaker by creating a reference embedding from a single audio sample.

public SpeakerProfile<T> Enroll(Tensor<T> enrollmentAudio)

Parameters

enrollmentAudio Tensor<T>

Returns

SpeakerProfile<T>

Enroll(IReadOnlyList<Tensor<T>>)

Enrolls a speaker by creating a reference embedding from multiple audio samples.

public SpeakerProfile<T> Enroll(IReadOnlyList<Tensor<T>> enrollmentAudio)

Parameters

enrollmentAudio IReadOnlyList<Tensor<T>>

Returns

SpeakerProfile<T>
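
One common way to build a single reference embedding from several samples is to average the per-sample embeddings and scale the result to unit length. The sketch below shows that idea only; how Enroll actually combines samples is not documented here, and EnrollmentSketch, AverageEmbeddings, and the float-array representation are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: not the AiDotNet implementation.
public static class EnrollmentSketch
{
    // Averages per-sample embeddings element-wise, then scales to unit length.
    public static float[] AverageEmbeddings(IReadOnlyList<float[]> embeddings)
    {
        if (embeddings.Count == 0)
            throw new ArgumentException("At least one embedding is required.");

        int dim = embeddings[0].Length;
        var mean = new double[dim];
        foreach (var e in embeddings)
        {
            if (e.Length != dim)
                throw new ArgumentException("All embeddings must have the same length.");
            for (int i = 0; i < dim; i++) mean[i] += e[i];
        }

        double norm = 0;
        for (int i = 0; i < dim; i++)
        {
            mean[i] /= embeddings.Count;
            norm += mean[i] * mean[i];
        }
        norm = Math.Sqrt(norm);

        // Guard against a zero-length mean to avoid dividing by zero.
        var result = new float[dim];
        for (int i = 0; i < dim; i++)
            result[i] = norm > 0 ? (float)(mean[i] / norm) : 0f;
        return result;
    }
}
```

Averaging several recordings tends to smooth out per-recording noise, which is the usual motivation for enrolling with more than one sample.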

Enroll(string, params SpeakerEmbedding<T>[])

Enrolls a speaker with one or more embeddings (legacy API).

public void Enroll(string speakerId, params SpeakerEmbedding<T>[] embeddings)

Parameters

speakerId string
embeddings SpeakerEmbedding<T>[]

GetEnrolledSpeakers()

Gets all enrolled speaker IDs.

public IReadOnlyList<string> GetEnrolledSpeakers()

Returns

IReadOnlyList<string>

GetModelMetadata()

Gets metadata about the model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

GetThresholdForFAR(double)

Gets the recommended threshold for a target false accept rate.

public T GetThresholdForFAR(double targetFAR)

Parameters

targetFAR double

Returns

T
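
To make the idea of a threshold for a target false accept rate concrete, the sketch below picks a threshold from a set of impostor scores (scores from comparisons between different speakers). It assumes a claim is accepted when the score is greater than or equal to the threshold. The page does not describe how GetThresholdForFAR derives its value, so ThresholdSketch and its methods are hypothetical, and the data must come from your own evaluation set.

```csharp
using System;
using System.Linq;

// Illustrative only: not the AiDotNet implementation.
public static class ThresholdSketch
{
    // Fraction of impostor scores that would be (wrongly) accepted.
    public static double FalseAcceptRate(double[] impostorScores, double threshold)
        => impostorScores.Count(s => s >= threshold) / (double)impostorScores.Length;

    // Returns the lowest observed score whose false accept rate does not
    // exceed targetFar. Quadratic in the number of scores, which is fine
    // for a sketch but not for large evaluation sets.
    public static double ThresholdForFar(double[] impostorScores, double targetFar)
    {
        if (impostorScores.Length == 0)
            throw new ArgumentException("At least one impostor score is required.");

        foreach (double candidate in impostorScores.Distinct().OrderBy(s => s))
        {
            if (FalseAcceptRate(impostorScores, candidate) <= targetFar)
                return candidate;
        }

        // No observed score is strict enough: reject everything.
        return double.PositiveInfinity;
    }
}
```

Choosing the lowest threshold that still meets the target keeps false rejects of genuine speakers as low as the target allows.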

Identify(SpeakerEmbedding<T>)

Identifies the most likely speaker from the enrolled set (legacy API).

public IdentificationResult Identify(SpeakerEmbedding<T> testEmbedding)

Parameters

testEmbedding SpeakerEmbedding<T>

Returns

IdentificationResult
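
Identification differs from verification in that the test embedding is compared against every enrolled speaker and the best match is reported. The sketch below shows that search under stated assumptions: the scoring function is passed in by the caller, a match below the threshold is reported as "no speaker", and all names (IdentificationSketch, FindBestMatch) are hypothetical rather than taken from AiDotNet.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: not the AiDotNet implementation.
public static class IdentificationSketch
{
    // Returns the enrolled ID with the highest score, or null when the
    // best score is below the threshold (no enrolled speaker matches).
    public static string? FindBestMatch(
        IReadOnlyDictionary<string, float[]> enrolled,
        float[] test,
        Func<float[], float[], double> score,
        double threshold)
    {
        string? bestId = null;
        double bestScore = double.NegativeInfinity;

        foreach (var pair in enrolled)
        {
            double s = score(test, pair.Value);
            if (s > bestScore)
            {
                bestScore = s;
                bestId = pair.Key;
            }
        }

        return bestScore >= threshold ? bestId : null;
    }
}
```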

InitializeLayers()

Initializes the layers for the speaker verifier.

protected override void InitializeLayers()

IsEnrolled(string)

Checks if a speaker is enrolled.

public bool IsEnrolled(string speakerId)

Parameters

speakerId string

Returns

bool

PostprocessOutput(Tensor<T>)

Postprocesses model output into the final result format.

protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)

Parameters

modelOutput Tensor<T>

Returns

Tensor<T>

Predict(Tensor<T>)

Makes a prediction using the model.

public override Tensor<T> Predict(Tensor<T> input)

Parameters

input Tensor<T>

Returns

Tensor<T>

PreprocessAudio(Tensor<T>)

Preprocesses raw audio for model input.

protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)

Parameters

rawAudio Tensor<T>

Returns

Tensor<T>

SerializeNetworkSpecificData(BinaryWriter)

Serializes network-specific data.

protected override void SerializeNetworkSpecificData(BinaryWriter writer)

Parameters

writer BinaryWriter

Train(Tensor<T>, Tensor<T>)

Trains the model on input data.

public override void Train(Tensor<T> input, Tensor<T> expectedOutput)

Parameters

input Tensor<T>
expectedOutput Tensor<T>

Unenroll(string)

Removes a speaker's enrollment.

public bool Unenroll(string speakerId)

Parameters

speakerId string

Returns

bool

UpdateParameters(Vector<T>)

Updates model parameters.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

UpdateProfile(SpeakerProfile<T>, Tensor<T>)

Updates an existing speaker profile with additional audio.

public SpeakerProfile<T> UpdateProfile(SpeakerProfile<T> existingProfile, Tensor<T> newAudio)

Parameters

existingProfile SpeakerProfile<T>
newAudio Tensor<T>

Returns

SpeakerProfile<T>
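
Updating a profile with one more recording can be done without keeping the earlier recordings, by maintaining a running mean of the embeddings. The sketch below shows that update rule only. Whether UpdateProfile works this way is not stated on this page, and ProfileUpdateSketch, AddSample, and the (mean, count) representation are hypothetical.

```csharp
using System;

// Illustrative only: not the AiDotNet implementation.
public static class ProfileUpdateSketch
{
    // Folds one new embedding into a running mean built from `count` samples.
    // Returns the updated mean and the new sample count.
    public static (float[] Mean, int Count) AddSample(float[] mean, int count, float[] sample)
    {
        if (mean.Length != sample.Length)
            throw new ArgumentException("Embeddings must have the same length.");
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));

        int newCount = count + 1;
        var updated = new float[mean.Length];
        for (int i = 0; i < mean.Length; i++)
        {
            // Incremental mean: m' = m + (x - m) / n'
            updated[i] = mean[i] + (sample[i] - mean[i]) / newCount;
        }
        return (updated, newCount);
    }
}
```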

Verify(Tensor<T>, Tensor<T>)

Verifies whether audio matches a reference speaker embedding.

public SpeakerVerificationResult<T> Verify(Tensor<T> audio, Tensor<T> referenceEmbedding)

Parameters

audio Tensor<T>
referenceEmbedding Tensor<T>

Returns

SpeakerVerificationResult<T>

Verify(Tensor<T>, Tensor<T>, T)

Verifies whether audio matches a reference speaker embedding, using a custom threshold.

public SpeakerVerificationResult<T> Verify(Tensor<T> audio, Tensor<T> referenceEmbedding, T threshold)

Parameters

audio Tensor<T>
referenceEmbedding Tensor<T>
threshold T

Returns

SpeakerVerificationResult<T>

Verify(string, SpeakerEmbedding<T>)

Verifies if a test embedding matches an enrolled speaker (legacy API).

public VerificationResult Verify(string speakerId, SpeakerEmbedding<T> testEmbedding)

Parameters

speakerId string
testEmbedding SpeakerEmbedding<T>

Returns

VerificationResult

VerifyAsync(Tensor<T>, Tensor<T>, CancellationToken)

Asynchronously verifies whether audio matches a reference speaker embedding.

public Task<SpeakerVerificationResult<T>> VerifyAsync(Tensor<T> audio, Tensor<T> referenceEmbedding, CancellationToken cancellationToken = default)

Parameters

audio Tensor<T>
referenceEmbedding Tensor<T>
cancellationToken CancellationToken

Returns

Task<SpeakerVerificationResult<T>>
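
Because VerifyAsync accepts a CancellationToken, a caller can bound how long it is willing to wait. The helper below is a generic sketch of that pattern using only .NET base class library types; RunWithTimeoutAsync is a hypothetical name, and whether VerifyAsync observes the token promptly is an assumption that this page does not confirm.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only: a generic helper, not part of AiDotNet.
public static class AsyncVerificationSketch
{
    // Runs a cancellable operation with a time limit. If the operation
    // observes the token, an OperationCanceledException is thrown once
    // the timeout elapses.
    public static async Task<TResult> RunWithTimeoutAsync<TResult>(
        Func<CancellationToken, Task<TResult>> operation,
        TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        return await operation(cts.Token);
    }
}
```

A call might look like AsyncVerificationSketch.RunWithTimeoutAsync(token => verifier.VerifyAsync(audio, referenceEmbedding, token), TimeSpan.FromSeconds(5)), wrapped in a try/catch for OperationCanceledException.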

VerifyWithReferenceAudio(Tensor<T>, Tensor<T>)

Verifies whether audio matches reference audio from the claimed speaker.

public SpeakerVerificationResult<T> VerifyWithReferenceAudio(Tensor<T> audio, Tensor<T> referenceAudio)

Parameters

audio Tensor<T>
referenceAudio Tensor<T>

Returns

SpeakerVerificationResult<T>