Class SpeakerRecognitionBase<T>
Base class for speaker recognition models (embedding extraction, verification, diarization).
public abstract class SpeakerRecognitionBase<T> : AudioNeuralNetworkBase<T>, INeuralNetworkModel<T>, INeuralNetwork<T>, IFullModel<T, Tensor<T>, Tensor<T>>, IModel<Tensor<T>, Tensor<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Tensor<T>, Tensor<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Tensor<T>, Tensor<T>>>, IGradientComputable<T, Tensor<T>, Tensor<T>>, IJitCompilable<T>, IInterpretableModel<T>, IInputGradientComputable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations.
- Inheritance: AudioNeuralNetworkBase<T> → SpeakerRecognitionBase<T>
Remarks
Speaker recognition encompasses tasks that identify or verify speakers based on their voice. This base class provides common functionality for:
- Speaker embedding extraction (d-vectors, x-vectors)
- Speaker verification (is this the claimed speaker?)
- Speaker diarization (who spoke when?)
For Beginners: Speaker recognition is like voice fingerprinting. Just as fingerprints are unique to each person, voice characteristics (pitch, speaking style, accent) can identify individuals.
This base class provides:
- Feature extraction utilities (MFCCs, spectral features)
- Embedding dimension management
- Similarity computation methods
Constructors
SpeakerRecognitionBase(NeuralNetworkArchitecture<T>, ILossFunction<T>?)
Initializes a new instance of the SpeakerRecognitionBase class.
protected SpeakerRecognitionBase(NeuralNetworkArchitecture<T> architecture, ILossFunction<T>? lossFunction = null)
Parameters
architecture (NeuralNetworkArchitecture<T>): The neural network architecture.
lossFunction (ILossFunction<T>): The loss function to use. If null, a default MSE loss is used.
Properties
EmbeddingDimension
Gets the dimension of output speaker embeddings.
public int EmbeddingDimension { get; protected set; }
Property Value
- int
Remarks
Common values: 192, 256, or 512. Higher dimensions may capture more nuance but require more storage and computation.
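The storage side of that trade-off can be estimated with simple arithmetic. The sketch below is illustrative only: the helper `EmbeddingStorageBytes` is hypothetical and not part of this class, and it assumes each embedding value is stored as a 32-bit float with one embedding per enrolled speaker.

```csharp
using System;

// Hypothetical helper (not part of SpeakerRecognitionBase<T>).
// Assumption: one embedding per speaker, each value stored as a 32-bit float.
long EmbeddingStorageBytes(long speakerCount, int embeddingDimension, int bytesPerValue = sizeof(float))
    => speakerCount * embeddingDimension * bytesPerValue;

// Compare the common dimensions for one million enrolled speakers.
foreach (int dim in new[] { 192, 256, 512 })
{
    long totalBytes = EmbeddingStorageBytes(1_000_000, dim);
    Console.WriteLine($"{dim} dims: {totalBytes / (1024.0 * 1024.0):F1} MiB");
}
```

Storage grows linearly with the dimension, and so does the cost of each similarity comparison.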
MfccExtractor
Gets or sets the MFCC extractor used for preprocessing.
protected MfccExtractor<T>? MfccExtractor { get; set; }
Property Value
- MfccExtractor<T>?
Methods
AggregateEmbeddings(IReadOnlyList<Tensor<T>>)
Aggregates multiple embeddings into a single representative embedding.
protected Tensor<T> AggregateEmbeddings(IReadOnlyList<Tensor<T>> embeddings)
Parameters
embeddings (IReadOnlyList<Tensor<T>>): Collection of embeddings to aggregate.
Returns
- Tensor<T>
Aggregated embedding (normalized mean).
Remarks
For Beginners: If you have multiple recordings of the same person, this combines them into one stronger voiceprint by averaging and normalizing.
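As a rough illustration of the "normalized mean" described above, the following sketch averages several embeddings element-wise and rescales the result to unit length. It is not the library's implementation: plain `double[]` arrays stand in for `Tensor<T>`, the names `Normalize` and `AggregateByNormalizedMean` are hypothetical, and the handling of empty input and zero vectors is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: double[] stands in for the library's Tensor<T>.

// L2-normalizes a vector. Assumption: a zero vector is returned unchanged
// rather than dividing by zero.
double[] Normalize(double[] v)
{
    double sumSquares = 0.0;
    foreach (double x in v) sumSquares += x * x;
    double norm = Math.Sqrt(sumSquares);
    if (norm == 0.0) return (double[])v.Clone();

    var result = new double[v.Length];
    for (int i = 0; i < v.Length; i++) result[i] = v[i] / norm;
    return result;
}

// Averages the embeddings element-wise, then rescales the mean to unit length.
double[] AggregateByNormalizedMean(IReadOnlyList<double[]> embeddings)
{
    if (embeddings.Count == 0)
        throw new ArgumentException("At least one embedding is required.", nameof(embeddings));

    int dim = embeddings[0].Length;
    var mean = new double[dim];
    foreach (double[] e in embeddings)
    {
        if (e.Length != dim)
            throw new ArgumentException("All embeddings must share one dimension.", nameof(embeddings));
        for (int i = 0; i < dim; i++) mean[i] += e[i] / embeddings.Count;
    }
    return Normalize(mean);
}

// Two made-up 2-dimensional embeddings of the same speaker.
var recordings = new List<double[]>
{
    new[] { 3.0, 0.0 },
    new[] { 0.0, 3.0 },
};
double[] voiceprint = AggregateByNormalizedMean(recordings);
Console.WriteLine(string.Join(", ", voiceprint));
```

Normalizing after averaging keeps the aggregated voiceprint on the same unit-length scale as individually normalized embeddings, so both can be compared with the same similarity measure.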
ComputeCosineSimilarity(Tensor<T>, Tensor<T>)
Computes cosine similarity between two speaker embedding tensors.
protected T ComputeCosineSimilarity(Tensor<T> embedding1, Tensor<T> embedding2)
Parameters
embedding1 (Tensor<T>): First speaker embedding tensor.
embedding2 (Tensor<T>): Second speaker embedding tensor.
Returns
- T
Cosine similarity score between -1 and 1.
ComputeCosineSimilarity(Vector<T>, Vector<T>)
Computes cosine similarity between two speaker embeddings.
protected T ComputeCosineSimilarity(Vector<T> embedding1, Vector<T> embedding2)
Parameters
embedding1 (Vector<T>): First speaker embedding vector.
embedding2 (Vector<T>): Second speaker embedding vector.
Returns
- T
Cosine similarity score between -1 and 1.
Remarks
For Beginners: Cosine similarity measures how similar two embeddings are.
- Score close to 1.0: Very similar (likely same speaker)
- Score close to 0.0: Not similar
- Score close to -1.0: Opposite (very different)
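The three score ranges above can be reproduced with a small stand-alone sketch. It does not call the library: `double[]` stands in for `Vector<T>`, the function name `CosineSimilarity` is illustrative, returning 0 for a zero-length embedding is an assumption, and the 0.7 decision threshold is a made-up value that a real system would tune on held-out data.

```csharp
using System;

// Sketch only: double[] stands in for the library's Vector<T> / Tensor<T>.
double CosineSimilarity(double[] a, double[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Embeddings must have the same dimension.");

    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    // Assumption: treat a zero-length embedding as "no similarity"
    // instead of dividing by zero.
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

double[] enrolled = { 1.0, 2.0, 3.0 };
double[] sameDirection = { 2.0, 4.0, 6.0 };  // scaled copy: score close to 1
double[] orthogonal = { 3.0, 0.0, -1.0 };    // dot product is 0: score 0
double[] opposite = { -1.0, -2.0, -3.0 };    // negated copy: score close to -1

Console.WriteLine(CosineSimilarity(enrolled, sameDirection));
Console.WriteLine(CosineSimilarity(enrolled, orthogonal));
Console.WriteLine(CosineSimilarity(enrolled, opposite));

// Hypothetical verification decision: accept when the score clears a threshold.
const double threshold = 0.7;
bool accepted = CosineSimilarity(enrolled, sameDirection) >= threshold;
Console.WriteLine(accepted ? "accepted" : "rejected");
```

Because the score depends only on direction, scaling an embedding (for example, a louder recording that yields larger activations) does not change the result.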
CreateMfccExtractor(int, int)
Creates an MFCC extractor for preprocessing speaker audio.
protected MfccExtractor<T> CreateMfccExtractor(int sampleRate = 16000, int numCoeffs = 40)
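Parameters
sampleRate (int): The audio sample rate in Hz. Defaults to 16000.
numCoeffs (int): The number of MFCC coefficients to extract. Defaults to 40.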
Returns
- MfccExtractor<T>
A configured MFCC extractor.
NormalizeEmbedding(Tensor<T>)
Normalizes an embedding to unit length (L2 normalization).
protected Tensor<T> NormalizeEmbedding(Tensor<T> embedding)
Parameters
embedding (Tensor<T>): The embedding to normalize.
Returns
- Tensor<T>
Normalized embedding with unit length.
Remarks
For Beginners: Normalizing embeddings makes them easier to compare. After normalization, all embeddings have length 1, so cosine similarity becomes equivalent to a simple dot product.
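The equivalence between cosine similarity and the dot product of unit-length vectors can be checked numerically. This is a minimal sketch under stated assumptions: `double[]` stands in for `Tensor<T>`, the helpers `Dot` and `L2Normalize` are hypothetical names, and returning a zero vector unchanged is an assumption about edge-case handling.

```csharp
using System;

// Sketch only: double[] stands in for the library's Tensor<T>.
double Dot(double[] a, double[] b)
{
    double sum = 0.0;
    for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
    return sum;
}

// Divides every component by the vector's Euclidean length.
// Assumption: a zero vector is returned unchanged.
double[] L2Normalize(double[] v)
{
    double norm = Math.Sqrt(Dot(v, v));
    if (norm == 0.0) return (double[])v.Clone();

    var unit = new double[v.Length];
    for (int i = 0; i < v.Length; i++) unit[i] = v[i] / norm;
    return unit;
}

double[] a = { 3.0, 4.0 };
double[] b = { 4.0, 3.0 };

// Full cosine similarity on the raw vectors.
double cosine = Dot(a, b) / (Math.Sqrt(Dot(a, a)) * Math.Sqrt(Dot(b, b)));

// Plain dot product after normalizing both vectors to unit length.
double dotOfUnits = Dot(L2Normalize(a), L2Normalize(b));

Console.WriteLine($"cosine = {cosine}, dot of unit vectors = {dotOfUnits}");
```

In practice this means embeddings can be normalized once when they are stored, after which each comparison needs only a dot product.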