Interface ILanguageIdentifier<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Defines the contract for spoken language identification from audio.

public interface ILanguageIdentifier<T>

Type Parameters

T

The numeric type used for calculations.

Remarks

Language Identification (LID) determines which language is being spoken in an audio recording. This is different from speech recognition - we're identifying the language, not transcribing the words.

For Beginners: This is like having a friend who can tell you "that's French!" or "that sounds like Mandarin!" just from hearing it.

How it works:

  1. Extract acoustic and phonetic cues from the audio (phoneme patterns, prosody, rhythm)
  2. Score those cues against models trained on many languages
  3. Return the most likely language(s)
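
Steps 2 and 3 above can be illustrated with a minimal sketch. It assumes the model has already produced one raw score per language and uses a softmax to turn those scores into probabilities; the class and method names (LidScoringSketch, ToProbabilities, MostLikely) are hypothetical and not part of AiDotNet, and a real implementation may score languages differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class LidScoringSketch
{
    // Converts raw per-language scores into probabilities that sum to 1 (softmax).
    public static Dictionary<string, double> ToProbabilities(
        IReadOnlyDictionary<string, double> scores)
    {
        if (scores.Count == 0)
            throw new ArgumentException("At least one score is required.", nameof(scores));

        // Subtracting the maximum keeps Math.Exp from overflowing.
        double max = scores.Values.Max();
        var exp = scores.ToDictionary(kv => kv.Key, kv => Math.Exp(kv.Value - max));
        double sum = exp.Values.Sum();
        return exp.ToDictionary(kv => kv.Key, kv => kv.Value / sum);
    }

    // Returns the language with the highest probability.
    public static (string Language, double Probability) MostLikely(
        IReadOnlyDictionary<string, double> probabilities)
    {
        var best = probabilities.OrderByDescending(kv => kv.Value).First();
        return (best.Key, best.Value);
    }
}
```

Calling ToProbabilities and then MostLikely mirrors the "compare, then return the most likely language" flow described above.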

Applications:

  • Call routing in multilingual call centers
  • Automatic subtitle language selection
  • Content moderation (filter by language)
  • Multilingual speech recognition (select correct model)
  • Immigration/border control voice analysis

Challenges:

  • Code-switching (mixing languages mid-sentence)
  • Accented speech (e.g., Spanish spoken with an American accent)
  • Closely related languages (Norwegian vs Swedish)
  • Short utterances (harder to identify with less audio)

Properties

SampleRate

Gets the sample rate this identifier operates at.

int SampleRate { get; }

Property Value

int

SupportedLanguages

Gets the list of languages this model can identify.

IReadOnlyList<string> SupportedLanguages { get; }

Property Value

IReadOnlyList<string>

Remarks

Language codes typically follow ISO 639-1 two-letter codes (e.g., "en", "es", "zh") or ISO 639-3 three-letter codes (e.g., "cmn" for Mandarin Chinese) when a more specific language must be distinguished.

Methods

AreSameLanguage(Tensor<T>, Tensor<T>)

Checks if two audio samples are in the same language.

(bool SameLanguage, T Confidence) AreSameLanguage(Tensor<T> audio1, Tensor<T> audio2)

Parameters

audio1 Tensor<T>

First audio sample.

audio2 Tensor<T>

Second audio sample.

Returns

(bool SameLanguage, T Confidence)

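A tuple whose SameLanguage value is true if both samples are identified as the same language, together with a Confidence score for that decision.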
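
One possible way to reach such a decision is sketched below. It assumes each sample has already been reduced to a dictionary of per-language probabilities, declares a match when both samples share the same top language, and reports the weaker of the two top probabilities as the confidence. Both the rule and the name SameLanguageSketch are assumptions made for illustration; the interface does not specify how implementations compute the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class SameLanguageSketch
{
    public static (bool SameLanguage, double Confidence) Compare(
        IReadOnlyDictionary<string, double> first,
        IReadOnlyDictionary<string, double> second)
    {
        // Most probable language for each sample.
        var top1 = first.OrderByDescending(kv => kv.Value).First();
        var top2 = second.OrderByDescending(kv => kv.Value).First();

        bool same = top1.Key == top2.Key;

        // Assumption: the decision is only as reliable as the weaker prediction.
        double confidence = Math.Min(top1.Value, top2.Value);
        return (same, confidence);
    }
}
```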

GetLanguageDisplayName(string)

Gets the display name for a language code.

string GetLanguageDisplayName(string languageCode)

Parameters

languageCode string

ISO 639 language code (e.g., "en").

Returns

string

Human-readable language name.
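
A minimal sketch of such a lookup is shown below. The table contents, the case-insensitive matching, and the fallback of returning the code itself for unknown languages are all assumptions; LanguageNameSketch is a hypothetical name, and an actual implementation may behave differently for unknown codes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class LanguageNameSketch
{
    // Small illustrative table; a real identifier would cover every supported language.
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["en"] = "English",
            ["es"] = "Spanish",
            ["fr"] = "French",
            ["zh"] = "Chinese"
        };

    public static string GetDisplayName(string languageCode)
    {
        if (string.IsNullOrWhiteSpace(languageCode))
            throw new ArgumentException("A language code is required.", nameof(languageCode));

        string code = languageCode.Trim();

        // Assumption: unknown codes fall back to the code itself rather than throwing.
        return Names.TryGetValue(code, out var name) ? name : code;
    }
}
```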

GetLanguageProbabilities(Tensor<T>)

Gets probabilities for all supported languages.

IReadOnlyDictionary<string, T> GetLanguageProbabilities(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

IReadOnlyDictionary<string, T>

Dictionary mapping language codes to probabilities.

GetTopLanguages(Tensor<T>, int)

Gets the top-N most likely languages.

IReadOnlyList<(string Language, T Probability)> GetTopLanguages(Tensor<T> audio, int topN = 5)

Parameters

audio Tensor<T>

Audio tensor containing speech.

topN int

Number of languages to return.

Returns

IReadOnlyList<(string Language, T Probability)>

List of (language, probability) pairs sorted by probability, most likely first.
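
The relationship between this method and GetLanguageProbabilities can be sketched as a sort followed by a truncation. The example below assumes the probabilities are already available as a dictionary of double values; TopLanguagesSketch and TopN are hypothetical names, and the library's own implementation may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class TopLanguagesSketch
{
    public static IReadOnlyList<(string Language, double Probability)> TopN(
        IReadOnlyDictionary<string, double> probabilities, int topN = 5)
    {
        if (topN < 1)
            throw new ArgumentOutOfRangeException(nameof(topN), "topN must be at least 1.");

        return probabilities
            .OrderByDescending(kv => kv.Value)   // most likely first
            .Take(topN)                          // fewer entries are returned if fewer exist
            .Select(kv => (Language: kv.Key, Probability: kv.Value))
            .ToList();
    }
}
```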

IdentifyLanguage(Tensor<T>)

Identifies the language spoken in audio.

LanguageResult<T> IdentifyLanguage(Tensor<T> audio)

Parameters

audio Tensor<T>

Audio tensor containing speech.

Returns

LanguageResult<T>

Detected language code and confidence.

IdentifyLanguageSegments(Tensor<T>, int)

Identifies the language of each time segment, for audio that may contain more than one language.

IReadOnlyList<LanguageSegment<T>> IdentifyLanguageSegments(Tensor<T> audio, int windowSizeMs = 2000)

Parameters

audio Tensor<T>

Audio tensor that may contain multiple languages.

windowSizeMs int

Analysis window size in milliseconds.

Returns

IReadOnlyList<LanguageSegment<T>>

Time-segmented language predictions.

Remarks

For Beginners: Use this when someone might switch languages mid-recording (code-switching). It tells you which language is spoken at each point in time.
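
The windowing idea can be sketched as follows, under the assumption that the audio is cut into consecutive, non-overlapping windows of windowSizeMs and that adjacent windows with the same predicted language are merged into one segment. SegmentationSketch, Windows, and Merge are hypothetical names; the actual LanguageSegment<T> type and the library's segmentation strategy are not shown here and may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class SegmentationSketch
{
    // Splits totalSamples into consecutive windows; End is exclusive.
    public static List<(long Start, long End)> Windows(
        long totalSamples, int sampleRate, int windowSizeMs = 2000)
    {
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));
        if (windowSizeMs <= 0) throw new ArgumentOutOfRangeException(nameof(windowSizeMs));

        long windowSamples = (long)sampleRate * windowSizeMs / 1000;
        if (windowSamples == 0)
            throw new ArgumentException("Window is shorter than one sample.", nameof(windowSizeMs));

        var windows = new List<(long Start, long End)>();
        for (long start = 0; start < totalSamples; start += windowSamples)
        {
            // The last window may be shorter than the others.
            windows.Add((start, Math.Min(start + windowSamples, totalSamples)));
        }
        return windows;
    }

    // Merges adjacent windows that share a label into (language, startMs, endMs) segments.
    // Times assume every window is a full windowSizeMs long, so the final end time
    // may overshoot the recording if the last window was shorter.
    public static List<(string Language, int StartMs, int EndMs)> Merge(
        IReadOnlyList<string> windowLabels, int windowSizeMs)
    {
        var segments = new List<(string Language, int StartMs, int EndMs)>();
        for (int i = 0; i < windowLabels.Count; i++)
        {
            int startMs = i * windowSizeMs;
            int endMs = startMs + windowSizeMs;
            int last = segments.Count - 1;

            if (last >= 0 && segments[last].Language == windowLabels[i])
                segments[last] = (windowLabels[i], segments[last].StartMs, endMs); // extend
            else
                segments.Add((windowLabels[i], startMs, endMs));                   // new segment
        }
        return segments;
    }
}
```

In this sketch each window would be classified separately (for example with IdentifyLanguage) before Merge is called. Shorter windows locate language switches more precisely, but each prediction is then based on less audio.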