Interface IEmotionRecognizer<T>

Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll

Defines the contract for speech emotion recognition models.

public interface IEmotionRecognizer<T>

Type Parameters

T

The numeric type used for calculations.

Remarks

Speech Emotion Recognition (SER) identifies emotional states from voice:

  • Basic emotions: Happy, Sad, Angry, Fear, Surprise, Disgust, Neutral
  • Arousal: Low (calm) to High (excited)
  • Valence: Negative to Positive
  • Dominance: Submissive to Dominant

For Beginners: This is like reading emotions from someone's voice!

How humans convey emotion in speech:

  • Pitch: Higher when excited/happy, lower when sad
  • Speed: Faster when angry/excited, slower when sad
  • Volume: Louder when angry, softer when sad/fearful
  • Voice quality: Breathy, tense, relaxed
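The cues above can be approximated with simple signal statistics. As an illustrative sketch (not the library's actual feature extraction), RMS energy is a crude proxy for volume, and zero-crossing rate loosely tracks pitch/brightness; a plain double[] stands in for Tensor&lt;T&gt; here:

```csharp
using System;

static class ProsodyProxies
{
    // Root-mean-square energy: a crude loudness measure.
    public static double Rms(double[] samples)
    {
        double sum = 0;
        foreach (var s in samples) sum += s * s;
        return Math.Sqrt(sum / samples.Length);
    }

    // Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    // Higher values loosely correlate with brighter, higher-pitched speech.
    public static double ZeroCrossingRate(double[] samples)
    {
        int crossings = 0;
        for (int i = 1; i < samples.Length; i++)
            if ((samples[i - 1] >= 0) != (samples[i] >= 0)) crossings++;
        return (double)crossings / (samples.Length - 1);
    }
}
```

Real recognizers use richer features (MFCCs, pitch contours, spectral statistics), but these two statistics illustrate the idea that emotion cues are measurable from the raw waveform.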

Applications:

  • Call centers: Detect frustrated customers for escalation
  • Mental health: Monitor patient emotional state
  • Voice assistants: Respond appropriately to user mood
  • Gaming: Adapt game difficulty/story based on player emotion
  • Market research: Analyze focus group reactions

Challenges:

  • Cultural differences in emotional expression
  • Speaker variability (age, gender, accent)
  • Context dependency (same words can mean different emotions)
  • Mixed emotions (happy but nervous)

Properties

SampleRate

Gets the sample rate, in Hz, at which this recognizer operates.

int SampleRate { get; }

Property Value

int

SupportedEmotions

Gets the list of emotions this model can detect.

IReadOnlyList<string> SupportedEmotions { get; }

Property Value

IReadOnlyList<string>

Methods

ExtractEmotionFeatures(Tensor<T>)

Extracts emotion-relevant features from audio.

Vector<T> ExtractEmotionFeatures(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

Vector<T>

Feature vector useful for emotion classification.

GetArousal(Tensor<T>)

Gets the arousal (activation) level from speech.

T GetArousal(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

T

Arousal level from -1.0 (calm) to 1.0 (excited).

GetEmotionProbabilities(Tensor<T>)

Gets probabilities for all supported emotions.

IReadOnlyDictionary<string, T> GetEmotionProbabilities(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

IReadOnlyDictionary<string, T>

Dictionary mapping emotion names to probabilities.
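The returned dictionary can be reduced to a single predicted label by taking the highest-probability entry. A minimal sketch, assuming T is double and using a plain dictionary in place of the library's return type:

```csharp
using System.Collections.Generic;
using System.Linq;

static class EmotionUtils
{
    // Returns the emotion name with the highest probability.
    public static string TopEmotion(IReadOnlyDictionary<string, double> probs) =>
        probs.OrderByDescending(kv => kv.Value).First().Key;
}
```

In practice you may also want to require a minimum confidence (e.g., only trust the label when the top probability exceeds some threshold) before acting on it.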

GetValence(Tensor<T>)

Gets the valence (positivity) level from speech.

T GetValence(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

T

Valence level from -1.0 (negative) to 1.0 (positive).
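Valence and arousal together place speech on a two-dimensional map of affect, where the four quadrants correspond roughly to familiar emotion families. An illustrative mapping (the quadrant labels below are a common convention, not part of this interface):

```csharp
static class Circumplex
{
    // Maps (valence, arousal), each in [-1, 1], to a coarse quadrant label:
    //   high arousal + positive valence -> happy/excited
    //   high arousal + negative valence -> angry/fearful
    //   low arousal  + positive valence -> calm/content
    //   low arousal  + negative valence -> sad/bored
    public static string Quadrant(double valence, double arousal)
    {
        if (arousal >= 0)
            return valence >= 0 ? "happy/excited" : "angry/fearful";
        return valence >= 0 ? "calm/content" : "sad/bored";
    }
}
```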

RecognizeEmotion(Tensor<T>)

Recognizes the primary emotion in speech audio.

EmotionResult<T> RecognizeEmotion(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

EmotionResult<T>

The detected emotion and confidence score.

RecognizeEmotionTimeSeries(Tensor<T>, int, int)

Recognizes emotions over time (for longer recordings).

IReadOnlyList<TimedEmotionResult<T>> RecognizeEmotionTimeSeries(Tensor<T> audio, int windowSizeMs = 1000, int hopSizeMs = 500)

Parameters

audio Tensor<T>

Audio tensor containing speech.

windowSizeMs int

Analysis window size in milliseconds.

hopSizeMs int

Hop between windows in milliseconds.

Returns

IReadOnlyList<TimedEmotionResult<T>>

Time-series of emotion predictions.
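With the defaults windowSizeMs = 1000 and hopSizeMs = 500, consecutive predictions overlap by half a window. The number of results to expect for a clip follows from simple arithmetic; a sketch of that calculation (assuming only full windows are analyzed, which may differ from a given implementation's edge handling):

```csharp
static class Windowing
{
    // Number of full analysis windows that fit in a clip of the given duration.
    public static int WindowCount(int clipDurationMs, int windowSizeMs = 1000, int hopSizeMs = 500)
    {
        if (clipDurationMs < windowSizeMs) return 0;
        return 1 + (clipDurationMs - windowSizeMs) / hopSizeMs;
    }

    // Start time, in milliseconds, of the i-th window (0-based).
    public static int WindowStartMs(int i, int hopSizeMs = 500) => i * hopSizeMs;
}
```

For example, a 5-second clip with the default parameters yields windows starting at 0, 500, ..., 4000 ms, i.e. nine predictions.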