Class SpeakerVerifier<T>
Verifies speaker identity by comparing embeddings against enrolled speakers.
public class SpeakerVerifier<T> : SpeakerRecognitionBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable, ISpeakerVerifier<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations.
Inheritance
- SpeakerRecognitionBase<T>
- SpeakerVerifier<T>
Remarks
Speaker verification answers the question "Is this the person they claim to be?" by comparing a test utterance against enrolled speaker embeddings.
For Beginners: Speaker verification is like voice-based password checking:
1. First, you "enroll" a speaker by recording their voice samples.
2. Later, when someone claims to be that person, you record them and compare.
3. If the voices match closely enough, the identity is verified.
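The compare-and-decide steps above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: it assumes embeddings are plain double arrays and that the match score is cosine similarity, neither of which this page confirms. The class and method names (VerificationSketch, CosineSimilarity, IsVerified) are hypothetical; only the 0.6 default threshold comes from the constructor documentation.

```csharp
using System;

// Hypothetical helper, not part of the documented API.
public static class VerificationSketch
{
    // Assumption: the match score is cosine similarity between two embeddings.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Embeddings must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector carries no direction, so treat it as "no match".
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Accept the identity claim only if the score reaches the threshold.
    public static bool IsVerified(double[] test, double[] enrolled, double threshold = 0.6)
        => CosineSimilarity(test, enrolled) >= threshold;
}
```

Raising the threshold makes false accepts rarer at the cost of rejecting more genuine speakers; lowering it does the opposite.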
Usage (ONNX Mode):
var verifier = new SpeakerVerifier<float>(
    architecture,
    embeddingModelPath: "speaker_model.onnx");
var result = verifier.Verify(audio, referenceEmbedding);
Usage (Native Training Mode):
var verifier = new SpeakerVerifier<float>(architecture);
verifier.Train(audioInput, expectedOutput);
Constructors
SpeakerVerifier()
Creates a SpeakerVerifier with default settings for native training mode.
public SpeakerVerifier()
Remarks
For Beginners: This is the simplest way to create a speaker verifier. It uses default settings suitable for most use cases.
SpeakerVerifier(NeuralNetworkArchitecture<T>, int, int, double, int, int, int, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>?, ILossFunction<T>?)
Creates a SpeakerVerifier for native training mode.
public SpeakerVerifier(NeuralNetworkArchitecture<T> architecture, int sampleRate = 16000, int embeddingDimension = 256, double defaultThreshold = 0.6, int hiddenDim = 256, int numEncoderLayers = 3, int numHeads = 4, IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>? optimizer = null, ILossFunction<T>? lossFunction = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture configuration.
sampleRate (int): Expected sample rate for input audio. Default is 16000.
embeddingDimension (int): Dimension of speaker embeddings. Default is 256.
defaultThreshold (double): Default verification threshold. Default is 0.6.
hiddenDim (int): Hidden dimension for encoder layers. Default is 256.
numEncoderLayers (int): Number of encoder layers. Default is 3.
numHeads (int): Number of attention heads. Default is 4.
optimizer (IGradientBasedOptimizer<T, Tensor<T>, Tensor<T>>): Optimizer for training. If null, AdamW is used.
lossFunction (ILossFunction<T>): Loss function for training. If null, MSE loss is used.
SpeakerVerifier(NeuralNetworkArchitecture<T>, string, int, int, double, OnnxModelOptions?)
Creates a SpeakerVerifier for ONNX inference with a pretrained model.
public SpeakerVerifier(NeuralNetworkArchitecture<T> architecture, string embeddingModelPath, int sampleRate = 16000, int embeddingDimension = 256, double defaultThreshold = 0.6, OnnxModelOptions? onnxOptions = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture configuration.
embeddingModelPath (string): Required path to the speaker embedding ONNX model.
sampleRate (int): Expected sample rate for input audio. Default is 16000.
embeddingDimension (int): Dimension of speaker embeddings. Default is 256.
defaultThreshold (double): Default verification threshold. Default is 0.6.
onnxOptions (OnnxModelOptions): ONNX runtime options.
Properties
DefaultThreshold
Gets the default verification threshold.
public T DefaultThreshold { get; }
Property Value
- T
EmbeddingExtractor
Gets the underlying speaker embedding extractor.
public ISpeakerEmbeddingExtractor<T> EmbeddingExtractor { get; }
Property Value
- ISpeakerEmbeddingExtractor<T>
EnrolledCount
Gets the number of enrolled speakers.
public int EnrolledCount { get; }
Property Value
- int
IsOnnxMode
Gets whether the model is in ONNX inference mode.
public bool IsOnnxMode { get; }
Property Value
- bool
VerificationThreshold
Gets the verification threshold.
public double VerificationThreshold { get; }
Property Value
- double
Methods
ComputeScore(Tensor<T>, Tensor<T>)
Computes the verification score between audio and a reference.
public T ComputeScore(Tensor<T> audio, Tensor<T> referenceEmbedding)
Parameters
audio (Tensor<T>)
referenceEmbedding (Tensor<T>)
Returns
- T
CreateNewInstance()
Creates a new instance of this model for cloning.
protected override IFullModel<T, Tensor<T>, Tensor<T>> CreateNewInstance()
Returns
- IFullModel<T, Tensor<T>, Tensor<T>>
DeserializeNetworkSpecificData(BinaryReader)
Deserializes network-specific data.
protected override void DeserializeNetworkSpecificData(BinaryReader reader)
Parameters
reader (BinaryReader)
Dispose(bool)
Disposes the model and releases resources.
protected override void Dispose(bool disposing)
Parameters
disposing (bool)
Enroll(Tensor<T>)
Enrolls a speaker by creating a reference embedding from a single audio sample.
public SpeakerProfile<T> Enroll(Tensor<T> enrollmentAudio)
Parameters
enrollmentAudio (Tensor<T>)
Returns
- SpeakerProfile<T>
Enroll(IReadOnlyList<Tensor<T>>)
Enrolls a speaker by creating a reference embedding from audio samples.
public SpeakerProfile<T> Enroll(IReadOnlyList<Tensor<T>> enrollmentAudio)
Parameters
enrollmentAudio (IReadOnlyList<Tensor<T>>)
Returns
- SpeakerProfile<T>
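One common way to build a single reference embedding from several samples is to average the per-sample embeddings. The sketch below assumes that approach and uses plain double arrays; this page does not state how Enroll combines samples, and EnrollmentSketch and AverageEmbedding are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the documented API.
public static class EnrollmentSketch
{
    // Assumption: the reference embedding is the element-wise mean of the sample embeddings.
    public static double[] AverageEmbedding(IReadOnlyList<double[]> embeddings)
    {
        if (embeddings.Count == 0)
            throw new ArgumentException("At least one embedding is required.");

        int dim = embeddings[0].Length;
        var mean = new double[dim];

        foreach (var embedding in embeddings)
        {
            if (embedding.Length != dim)
                throw new ArgumentException("All embeddings must have the same length.");
            for (int i = 0; i < dim; i++)
                mean[i] += embedding[i];
        }

        for (int i = 0; i < dim; i++)
            mean[i] /= embeddings.Count;

        return mean;
    }
}
```

Averaging over several recordings tends to smooth out per-recording noise, which is why multi-sample enrollment is usually preferred over a single sample.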
Enroll(string, params SpeakerEmbedding<T>[])
Enrolls a speaker with one or more embeddings (legacy API).
public void Enroll(string speakerId, params SpeakerEmbedding<T>[] embeddings)
Parameters
speakerId (string)
embeddings (SpeakerEmbedding<T>[])
GetEnrolledSpeakers()
Gets all enrolled speaker IDs.
public IReadOnlyList<string> GetEnrolledSpeakers()
Returns
- IReadOnlyList<string>
GetModelMetadata()
Gets metadata about the model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
GetThresholdForFAR(double)
Gets the recommended threshold for a target false accept rate.
public T GetThresholdForFAR(double targetFAR)
Parameters
targetFAR (double)
Returns
- T
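The false accept rate (FAR) is the fraction of impostor trials that are wrongly accepted. This page does not say how GetThresholdForFAR derives its recommendation, so the following is only a sketch of one standard empirical approach: pick the threshold from a set of impostor scores so that at most the target fraction exceeds it. The names FarSketch and ThresholdForFar are hypothetical, and the accept rule is assumed to be "score strictly greater than threshold".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the documented API.
public static class FarSketch
{
    // Returns a threshold such that at most targetFar of the impostor scores
    // are strictly greater than it (accept rule assumed: score > threshold).
    public static double ThresholdForFar(IReadOnlyList<double> impostorScores, double targetFar)
    {
        if (impostorScores.Count == 0)
            throw new ArgumentException("At least one impostor score is required.");
        if (targetFar < 0.0 || targetFar > 1.0)
            throw new ArgumentOutOfRangeException(nameof(targetFar));

        double[] sorted = impostorScores.OrderByDescending(s => s).ToArray();

        // Number of impostor trials we are allowed to accept.
        int allowed = (int)Math.Floor(targetFar * sorted.Length);

        // Every impostor may be accepted: any threshold works.
        if (allowed >= sorted.Length) return double.NegativeInfinity;

        // Only the 'allowed' highest scores can lie strictly above this value.
        return sorted[allowed];
    }
}
```

With impostor scores 0.9, 0.1, 0.5, 0.3 and a target FAR of 0.25, one of the four trials may be accepted, so the sketch returns 0.5 and only the 0.9 trial passes.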
Identify(SpeakerEmbedding<T>)
Identifies the most likely speaker from enrolled set (legacy API).
public IdentificationResult Identify(SpeakerEmbedding<T> testEmbedding)
Parameters
testEmbedding (SpeakerEmbedding<T>)
Returns
- IdentificationResult
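Identification differs from verification in that no identity is claimed: the test embedding is scored against every enrolled speaker and the best match is reported. A minimal sketch of that idea follows, assuming embeddings are L2-normalized double arrays so that the dot product equals cosine similarity. IdentificationSketch and BestMatch are hypothetical names, and the scoring used by the library itself is not documented on this page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the documented API.
public static class IdentificationSketch
{
    // Assumption: all embeddings are L2-normalized, so dot product == cosine similarity.
    public static (string SpeakerId, double Score) BestMatch(
        IReadOnlyDictionary<string, double[]> enrolled, double[] test)
    {
        if (enrolled.Count == 0)
            throw new InvalidOperationException("No speakers are enrolled.");

        string bestId = "";
        double bestScore = double.NegativeInfinity;

        foreach (var pair in enrolled)
        {
            if (pair.Value.Length != test.Length)
                throw new ArgumentException("Embedding dimensions must match.");

            double score = 0;
            for (int i = 0; i < test.Length; i++)
                score += pair.Value[i] * test[i];

            // Keep the highest-scoring enrolled speaker seen so far.
            if (score > bestScore)
            {
                bestScore = score;
                bestId = pair.Key;
            }
        }

        return (bestId, bestScore);
    }
}
```

In practice the best score is usually still compared against a threshold, so that an unknown speaker is not forced onto the nearest enrolled identity.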
InitializeLayers()
Initializes the layers for the speaker verifier.
protected override void InitializeLayers()
IsEnrolled(string)
Checks if a speaker is enrolled.
public bool IsEnrolled(string speakerId)
Parameters
speakerId (string)
Returns
- bool
PostprocessOutput(Tensor<T>)
Postprocesses model output into the final result format.
protected override Tensor<T> PostprocessOutput(Tensor<T> modelOutput)
Parameters
modelOutput (Tensor<T>)
Returns
- Tensor<T>
Predict(Tensor<T>)
Makes a prediction using the model.
public override Tensor<T> Predict(Tensor<T> input)
Parameters
input (Tensor<T>)
Returns
- Tensor<T>
PreprocessAudio(Tensor<T>)
Preprocesses raw audio for model input.
protected override Tensor<T> PreprocessAudio(Tensor<T> rawAudio)
Parameters
rawAudio (Tensor<T>)
Returns
- Tensor<T>
SerializeNetworkSpecificData(BinaryWriter)
Serializes network-specific data.
protected override void SerializeNetworkSpecificData(BinaryWriter writer)
Parameters
writer (BinaryWriter)
Train(Tensor<T>, Tensor<T>)
Trains the model on input data.
public override void Train(Tensor<T> input, Tensor<T> expectedOutput)
Parameters
input (Tensor<T>)
expectedOutput (Tensor<T>)
Unenroll(string)
Removes a speaker's enrollment.
public bool Unenroll(string speakerId)
Parameters
speakerId (string)
Returns
- bool
UpdateParameters(Vector<T>)
Updates model parameters.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>)
UpdateProfile(SpeakerProfile<T>, Tensor<T>)
Updates an existing speaker profile with additional audio.
public SpeakerProfile<T> UpdateProfile(SpeakerProfile<T> existingProfile, Tensor<T> newAudio)
Parameters
existingProfile (SpeakerProfile<T>)
newAudio (Tensor<T>)
Returns
- SpeakerProfile<T>
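A profile can absorb a new sample without reprocessing the earlier ones if it stores a running mean and a sample count. The sketch below shows that incremental update on plain double arrays. Whether SpeakerProfile<T> works this way is an assumption, and ProfileUpdateSketch and UpdateMean are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of the documented API.
public static class ProfileUpdateSketch
{
    // Assumption: the profile keeps the mean embedding of 'sampleCount' earlier samples.
    // Returns the mean after folding in one new embedding.
    public static double[] UpdateMean(double[] currentMean, int sampleCount, double[] newEmbedding)
    {
        if (sampleCount < 0)
            throw new ArgumentOutOfRangeException(nameof(sampleCount));
        if (currentMean.Length != newEmbedding.Length)
            throw new ArgumentException("Embedding dimensions must match.");

        var updated = new double[currentMean.Length];
        for (int i = 0; i < updated.Length; i++)
        {
            // Undo the old division, add the new sample, divide by the new count.
            updated[i] = (currentMean[i] * sampleCount + newEmbedding[i]) / (sampleCount + 1);
        }
        return updated;
    }
}
```

The input array is left untouched and a new array is returned, mirroring how UpdateProfile returns a SpeakerProfile<T> rather than having a void signature.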
Verify(Tensor<T>, Tensor<T>)
Verifies if audio matches a reference speaker embedding.
public SpeakerVerificationResult<T> Verify(Tensor<T> audio, Tensor<T> referenceEmbedding)
Parameters
audio (Tensor<T>)
referenceEmbedding (Tensor<T>)
Returns
- SpeakerVerificationResult<T>
Verify(Tensor<T>, Tensor<T>, T)
Verifies if audio matches a reference speaker embedding with custom threshold.
public SpeakerVerificationResult<T> Verify(Tensor<T> audio, Tensor<T> referenceEmbedding, T threshold)
Parameters
audio (Tensor<T>)
referenceEmbedding (Tensor<T>)
threshold (T)
Returns
- SpeakerVerificationResult<T>
Verify(string, SpeakerEmbedding<T>)
Verifies if a test embedding matches an enrolled speaker (legacy API).
public VerificationResult Verify(string speakerId, SpeakerEmbedding<T> testEmbedding)
Parameters
speakerId (string)
testEmbedding (SpeakerEmbedding<T>)
Returns
- VerificationResult
VerifyAsync(Tensor<T>, Tensor<T>, CancellationToken)
Verifies if audio matches a reference speaker embedding asynchronously.
public Task<SpeakerVerificationResult<T>> VerifyAsync(Tensor<T> audio, Tensor<T> referenceEmbedding, CancellationToken cancellationToken = default)
Parameters
audio (Tensor<T>)
referenceEmbedding (Tensor<T>)
cancellationToken (CancellationToken)
Returns
- Task<SpeakerVerificationResult<T>>
VerifyWithReferenceAudio(Tensor<T>, Tensor<T>)
Verifies if audio matches reference audio of a claimed speaker.
public SpeakerVerificationResult<T> VerifyWithReferenceAudio(Tensor<T> audio, Tensor<T> referenceAudio)
Parameters
audio (Tensor<T>)
referenceAudio (Tensor<T>)