Interface IEmotionRecognizer<T>
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Defines the contract for speech emotion recognition models.
public interface IEmotionRecognizer<T>
Type Parameters
T: The numeric type used for calculations.
Remarks
Speech Emotion Recognition (SER) identifies emotional states from voice:
- Basic emotions: Happy, Sad, Angry, Fear, Surprise, Disgust, Neutral
- Arousal: Low (calm) to High (excited)
- Valence: Negative to Positive
- Dominance: Submissive to Dominant
For Beginners: This is like reading emotions from someone's voice!
How humans convey emotion in speech:
- Pitch: Higher when excited/happy, lower when sad
- Speed: Faster when angry/excited, slower when sad
- Volume: Louder when angry, softer when sad/fearful
- Voice quality: Breathy, tense, relaxed
Applications:
- Call centers: Detect frustrated customers for escalation
- Mental health: Monitor patient emotional state
- Voice assistants: Respond appropriately to user mood
- Gaming: Adapt game difficulty/story based on player emotion
- Market research: Analyze focus group reactions
Challenges:
- Cultural differences in emotional expression
- Speaker variability (age, gender, accent)
- Context dependency (same words can mean different emotions)
- Mixed emotions (happy but nervous)
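As a rough sketch of how an implementation of this interface might be consumed (obtaining a concrete recognizer and loading the audio tensor happen elsewhere in AiDotNet, and the Emotion and Confidence member names on EmotionResult&lt;T&gt; are assumptions, not documented here):

```csharp
// Sketch only: assumes a concrete IEmotionRecognizer<float> and a
// Tensor<float> of mono speech samples at recognizer.SampleRate.
IEmotionRecognizer<float> recognizer = /* obtain an implementation */;
Tensor<float> audio = /* load speech audio at recognizer.SampleRate */;

EmotionResult<float> result = recognizer.RecognizeEmotion(audio);
// Member names below (Emotion, Confidence) are assumed for illustration.
Console.WriteLine($"Detected {result.Emotion} (confidence {result.Confidence})");
```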
Properties
SampleRate
Gets the sample rate at which this recognizer operates.
int SampleRate { get; }
Property Value
- int
SupportedEmotions
Gets the list of emotions this model can detect.
IReadOnlyList<string> SupportedEmotions { get; }
Property Value
- IReadOnlyList<string>
Methods
ExtractEmotionFeatures(Tensor<T>)
Extracts emotion-relevant features from audio.
Vector<T> ExtractEmotionFeatures(Tensor<T> audio)
Parameters
audio (Tensor<T>): Audio tensor.
Returns
- Vector<T>
Feature vector useful for emotion classification.
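For pipelines that train a custom downstream classifier rather than relying on the built-in emotion labels, the feature vector can be fed to another model directly. A sketch, where the recognizer, audio tensor, and downstream classifier are all assumed to exist:

```csharp
// Sketch: reuse the emotion-relevant features with your own model.
Vector<float> features = recognizer.ExtractEmotionFeatures(audio);
string label = myClassifier.Predict(features); // hypothetical downstream model
```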
GetArousal(Tensor<T>)
Gets arousal (activation) level from speech.
T GetArousal(Tensor<T> audio)
Parameters
audio (Tensor<T>): Audio tensor containing speech.
Returns
- T
Arousal level from -1.0 (calm) to 1.0 (excited).
GetEmotionProbabilities(Tensor<T>)
Gets probabilities for all supported emotions.
IReadOnlyDictionary<string, T> GetEmotionProbabilities(Tensor<T> audio)
Parameters
audio (Tensor<T>): Audio tensor containing speech.
Returns
- IReadOnlyDictionary<string, T>
Dictionary mapping emotion names to probabilities.
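Because the result is a plain read-only dictionary, ranking all emotions by probability is straightforward. A sketch, assuming a recognizer and audio tensor as in the examples above (requires System.Linq):

```csharp
using System.Linq;

// Sketch: list all supported emotions, most probable first.
IReadOnlyDictionary<string, float> probs = recognizer.GetEmotionProbabilities(audio);
foreach (var pair in probs.OrderByDescending(p => p.Value))
    Console.WriteLine($"{pair.Key}: {pair.Value:P1}");
```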
GetValence(Tensor<T>)
Gets valence (positivity) level from speech.
T GetValence(Tensor<T> audio)
Parameters
audio (Tensor<T>): Audio tensor containing speech.
Returns
- T
Valence level from -1.0 (negative) to 1.0 (positive).
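Valence and arousal together form the two-dimensional emotion model described in the remarks above. A sketch that maps the two scores to a coarse mood quadrant (the recognizer and audio tensor are assumed to exist):

```csharp
// Sketch: combine the two dimensional scores into a coarse mood label.
float valence = recognizer.GetValence(audio);  // -1.0 (negative) .. 1.0 (positive)
float arousal = recognizer.GetArousal(audio);  // -1.0 (calm)     .. 1.0 (excited)

string mood = (valence >= 0, arousal >= 0) switch
{
    (true,  true)  => "excited/happy",    // positive, high energy
    (true,  false) => "content/relaxed",  // positive, low energy
    (false, true)  => "angry/fearful",    // negative, high energy
    (false, false) => "sad/bored",        // negative, low energy
};
```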
RecognizeEmotion(Tensor<T>)
Recognizes the primary emotion in speech audio.
EmotionResult<T> RecognizeEmotion(Tensor<T> audio)
Parameters
audio (Tensor<T>): Audio tensor containing speech.
Returns
- EmotionResult<T>
The detected emotion and confidence score.
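The call-center escalation use case from the remarks could be sketched as follows. The Emotion and Confidence member names on EmotionResult&lt;T&gt; and the escalation hook are assumptions for illustration:

```csharp
// Sketch: flag frustrated callers for human escalation.
EmotionResult<float> result = recognizer.RecognizeEmotion(audio);
if (result.Emotion == "Angry" && result.Confidence > 0.8f)
    EscalateToSupervisor(); // hypothetical application hook
```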
RecognizeEmotionTimeSeries(Tensor<T>, int, int)
Recognizes emotions over time (for longer recordings).
IReadOnlyList<TimedEmotionResult<T>> RecognizeEmotionTimeSeries(Tensor<T> audio, int windowSizeMs = 1000, int hopSizeMs = 500)
Parameters
audio (Tensor<T>): Audio tensor containing speech.
windowSizeMs (int): Analysis window size in milliseconds.
hopSizeMs (int): Hop between windows in milliseconds.
Returns
- IReadOnlyList<TimedEmotionResult<T>>
Time-series of emotion predictions.
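For a longer recording, the time-series variant yields one prediction per analysis window. A sketch using the default window and hop sizes (the StartMs and Emotion member names on TimedEmotionResult&lt;T&gt; are assumptions):

```csharp
// Sketch: track how emotion drifts over a long recording using the
// default 1000 ms windows advanced every 500 ms.
var timeline = recognizer.RecognizeEmotionTimeSeries(audio);
foreach (var frame in timeline)
    Console.WriteLine($"{frame.StartMs} ms: {frame.Emotion}");
```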