Interface IEmotionRecognizer<T>

Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll

Defines the contract for speech emotion recognition models.

public interface IEmotionRecognizer<T>

Type Parameters

T

The numeric type used for calculations.

Remarks

Speech Emotion Recognition (SER) identifies emotional states from voice:

  • Basic emotions: Happy, Sad, Angry, Fear, Surprise, Disgust, Neutral
  • Arousal: Low (calm) to High (excited)
  • Valence: Negative to Positive
  • Dominance: Submissive to Dominant

For Beginners: This is like reading emotions from someone's voice!

How humans convey emotion in speech:

  • Pitch: Higher when excited/happy, lower when sad
  • Speed: Faster when angry/excited, slower when sad
  • Volume: Louder when angry, softer when sad/fearful
  • Voice quality: Breathy, tense, relaxed
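The cues above can be approximated with simple signal statistics. As an illustrative sketch (not the library's actual feature extraction), RMS energy is a crude proxy for volume, and zero-crossing rate loosely tracks pitch/brightness; a plain double[] stands in for Tensor&lt;T&gt; here:

```csharp
using System;

static class ProsodyProxies
{
    // Root-mean-square energy: a crude loudness measure.
    public static double Rms(double[] samples)
    {
        double sum = 0;
        foreach (var s in samples) sum += s * s;
        return Math.Sqrt(sum / samples.Length);
    }

    // Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    // Higher values loosely correlate with brighter, higher-pitched speech.
    public static double ZeroCrossingRate(double[] samples)
    {
        int crossings = 0;
        for (int i = 1; i < samples.Length; i++)
            if ((samples[i - 1] >= 0) != (samples[i] >= 0)) crossings++;
        return (double)crossings / (samples.Length - 1);
    }
}
```

Real recognizers use richer features (MFCCs, pitch contours, spectral statistics), but these two statistics illustrate the idea that emotion cues are measurable from the raw waveform.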

Applications:

  • Call centers: Detect frustrated customers for escalation
  • Mental health: Monitor patient emotional state
  • Voice assistants: Respond appropriately to user mood
  • Gaming: Adapt game difficulty/story based on player emotion
  • Market research: Analyze focus group reactions

Challenges:

  • Cultural differences in emotional expression
  • Speaker variability (age, gender, accent)
  • Context dependency (same words can mean different emotions)
  • Mixed emotions (happy but nervous)

Properties

SampleRate

Gets the sample rate, in Hz, at which this recognizer operates.

int SampleRate { get; }

Property Value

int

SupportedEmotions

Gets the list of emotions this model can detect.

IReadOnlyList<string> SupportedEmotions { get; }

Property Value

IReadOnlyList<string>

Methods

ExtractEmotionFeatures(Tensor<T>)

Extracts emotion-relevant features from audio.

Vector<T> ExtractEmotionFeatures(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

Vector<T>

Feature vector useful for emotion classification.

GetArousal(Tensor<T>)

Gets the arousal (activation) level from speech.

T GetArousal(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

T

Arousal level from -1.0 (calm) to 1.0 (excited).

GetEmotionProbabilities(Tensor<T>)

Gets probabilities for all supported emotions.

IReadOnlyDictionary<string, T> GetEmotionProbabilities(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

IReadOnlyDictionary<string, T>

Dictionary mapping emotion names to probabilities.
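The returned dictionary can be reduced to a single predicted label by taking the highest-probability entry. A minimal sketch, assuming T is double and using a plain dictionary in place of the library's return type:

```csharp
using System.Collections.Generic;
using System.Linq;

static class EmotionUtils
{
    // Returns the emotion name with the highest probability.
    public static string TopEmotion(IReadOnlyDictionary<string, double> probs) =>
        probs.OrderByDescending(kv => kv.Value).First().Key;
}
```

In practice you may also want to require a minimum confidence (e.g., only trust the label when the top probability exceeds some threshold) before acting on it.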

GetValence(Tensor<T>)

Gets the valence (positivity) level from speech.

T GetValence(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

T

Valence level from -1.0 (negative) to 1.0 (positive).
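Valence and arousal together place speech on a two-dimensional map of affect, where the four quadrants correspond roughly to familiar emotion families. An illustrative mapping (the quadrant labels below are a common convention, not part of this interface):

```csharp
static class Circumplex
{
    // Maps (valence, arousal), each in [-1, 1], to a coarse quadrant label:
    //   high arousal + positive valence -> happy/excited
    //   high arousal + negative valence -> angry/fearful
    //   low arousal  + positive valence -> calm/content
    //   low arousal  + negative valence -> sad/bored
    public static string Quadrant(double valence, double arousal)
    {
        if (arousal >= 0)
            return valence >= 0 ? "happy/excited" : "angry/fearful";
        return valence >= 0 ? "calm/content" : "sad/bored";
    }
}
```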

RecognizeEmotion(Tensor<T>)

Recognizes the primary emotion in speech audio.

EmotionResult<T> RecognizeEmotion(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

EmotionResult<T>

The detected emotion and confidence score.

RecognizeEmotionTimeSeries(Tensor<T>, int, int)

Recognizes emotions over time (for longer recordings).

IReadOnlyList<TimedEmotionResult<T>> RecognizeEmotionTimeSeries(Tensor<T> audio, int windowSizeMs = 1000, int hopSizeMs = 500)

Parameters

audio Tensor<T>

Audio tensor containing speech.

windowSizeMs int

Analysis window size in milliseconds.

hopSizeMs int

Hop between windows in milliseconds.

Returns

IReadOnlyList<TimedEmotionResult<T>>

Time-series of emotion predictions.
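With the defaults windowSizeMs = 1000 and hopSizeMs = 500, consecutive predictions overlap by half a window. The number of results to expect for a clip follows from simple arithmetic; a sketch of that calculation (assuming only full windows are analyzed, which may differ from a given implementation's edge handling):

```csharp
static class Windowing
{
    // Number of full analysis windows that fit in a clip of the given duration.
    public static int WindowCount(int clipDurationMs, int windowSizeMs = 1000, int hopSizeMs = 500)
    {
        if (clipDurationMs < windowSizeMs) return 0;
        return 1 + (clipDurationMs - windowSizeMs) / hopSizeMs;
    }

    // Start time, in milliseconds, of the i-th window (0-based).
    public static int WindowStartMs(int i, int hopSizeMs = 500) => i * hopSizeMs;
}
```

For example, a 5-second clip with the default parameters yields windows starting at 0, 500, ..., 4000 ms, i.e. nine predictions.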