Interface ILanguageIdentifier<T>
- Namespace: AiDotNet.Interfaces
- Assembly: AiDotNet.dll
Defines the contract for spoken language identification from audio.
public interface ILanguageIdentifier<T>
Type Parameters
T: The numeric type used for calculations.
Remarks
Language Identification (LID) determines which language is being spoken in an audio recording. This differs from speech recognition: the goal is to identify the language, not to transcribe the words.
For Beginners: This is like having a friend who can tell you "that's French!" or "that sounds like Mandarin!" just from hearing it.
How it works:
- Extract acoustic features (phonemes, prosody, rhythm)
- Compare to language models trained on many languages
- Return the most likely language(s)
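The last two steps can be sketched in a few lines of C#. This is a minimal illustration, not AiDotNet's implementation: it assumes some model has already produced one raw score (logit) per language, turns those scores into probabilities with a softmax, and picks the highest. `LidPipelineSketch`, `Softmax`, and `MostLikely` are hypothetical names, and `double` stands in for `T`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AiDotNet. Assumes per-language raw scores already exist.
public static class LidPipelineSketch
{
    // Converts raw per-language scores into probabilities with a numerically
    // stable softmax (the maximum score is subtracted before exponentiating).
    public static Dictionary<string, double> Softmax(IReadOnlyDictionary<string, double> scores)
    {
        double max = scores.Values.Max();
        var exps = scores.ToDictionary(kv => kv.Key, kv => Math.Exp(kv.Value - max));
        double sum = exps.Values.Sum();
        return exps.ToDictionary(kv => kv.Key, kv => kv.Value / sum);
    }

    // Returns the language with the highest probability (throws if the dictionary is empty).
    public static (string Language, double Probability) MostLikely(IReadOnlyDictionary<string, double> probabilities)
    {
        var best = probabilities.Aggregate((a, b) => b.Value > a.Value ? b : a);
        return (best.Key, best.Value);
    }
}
```

Feature extraction and the per-language models themselves are out of scope here; they are what a concrete ILanguageIdentifier<T> implementation supplies.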
Applications:
- Call routing in multilingual call centers
- Automatic subtitle language selection
- Content moderation (filter by language)
- Multilingual speech recognition (select correct model)
- Immigration/border control voice analysis
Challenges:
- Code-switching (mixing languages mid-sentence)
- Accented speech (e.g., Spanish spoken with an American accent)
- Closely related languages (Norwegian vs Swedish)
- Short utterances (harder to identify with less audio)
Properties
SampleRate
Gets the sample rate this identifier operates at.
int SampleRate { get; }
Property Value
int: The sample rate, in samples per second (Hz).
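Audio captured at a different rate would presumably need converting to SampleRate before being passed in; this page does not say whether implementations resample internally. The sketch below uses plain linear interpolation on a `float[]` rather than a `Tensor<T>`. `ResampleSketch.ResampleLinear` is a hypothetical helper, and because linear interpolation applies no anti-aliasing filter, it is only suitable as an illustration.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class ResampleSketch
{
    // Resamples mono audio from sourceRate to targetRate by linear interpolation.
    // No anti-aliasing filter is applied, so downsampling can alias.
    public static float[] ResampleLinear(float[] input, int sourceRate, int targetRate)
    {
        if (sourceRate <= 0 || targetRate <= 0)
            throw new ArgumentOutOfRangeException(nameof(sourceRate), "Sample rates must be positive.");
        if (input.Length == 0 || sourceRate == targetRate)
            return (float[])input.Clone();

        // Output length scales with the ratio of the two rates.
        int outputLength = (int)Math.Round(input.Length * (double)targetRate / sourceRate);
        var output = new float[outputLength];
        double step = (double)sourceRate / targetRate; // input samples per output sample

        for (int i = 0; i < outputLength; i++)
        {
            double position = i * step;
            // Clamp to the last sample; when left == right the weights still sum to 1.
            int left = Math.Min((int)position, input.Length - 1);
            int right = Math.Min(left + 1, input.Length - 1);
            double frac = position - left;
            // Weighted average of the two neighbouring input samples.
            output[i] = (float)(input[left] * (1 - frac) + input[right] * frac);
        }
        return output;
    }
}
```

For example, `ResampleSketch.ResampleLinear(samples, 44100, identifier.SampleRate)` would convert CD-rate audio to the identifier's rate, assuming `samples` holds mono audio.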
SupportedLanguages
Gets the list of languages this model can identify.
IReadOnlyList<string> SupportedLanguages { get; }
Property Value
IReadOnlyList<string>: The language codes this model can identify.
Remarks
Language codes typically follow ISO 639-1 (e.g., "en", "es", "zh") or ISO 639-3 for more specific variants.
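Because codes may arrive in either format and in mixed case, it can help to normalize them before comparing against SupportedLanguages. The hypothetical `LanguageCodeSketch` below checks only the shape of a code (two or three ASCII letters); it does not validate against the ISO registries, and a three-letter code could equally be ISO 639-2, which uses the same format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, not part of AiDotNet. Only the shape of a code is checked.
public enum LanguageCodeKind { TwoLetter, ThreeLetter, Other }

public static class LanguageCodeSketch
{
    // Trims whitespace and lower-cases, so " EN" and "en" compare equal.
    public static string Normalize(string code) => code.Trim().ToLowerInvariant();

    // Two ASCII letters look like ISO 639-1; three look like ISO 639-3
    // (or ISO 639-2, which shares the format). Anything else, such as a
    // region-qualified tag like "zh-CN", is reported as Other.
    public static LanguageCodeKind Classify(string code)
    {
        string c = Normalize(code);
        bool allLetters = c.Length > 0 && c.All(ch => ch >= 'a' && ch <= 'z');
        if (allLetters && c.Length == 2) return LanguageCodeKind.TwoLetter;
        if (allLetters && c.Length == 3) return LanguageCodeKind.ThreeLetter;
        return LanguageCodeKind.Other;
    }

    // Case- and whitespace-insensitive membership test against a list such as SupportedLanguages.
    public static bool IsSupported(IReadOnlyList<string> supportedLanguages, string code)
    {
        string wanted = Normalize(code);
        return supportedLanguages.Any(s => Normalize(s) == wanted);
    }
}
```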
Methods
AreSameLanguage(Tensor<T>, Tensor<T>)
Checks if two audio samples are in the same language.
(bool SameLanguage, T Confidence) AreSameLanguage(Tensor<T> audio1, Tensor<T> audio2)
Parameters
audio1 (Tensor<T>): First audio sample.
audio2 (Tensor<T>): Second audio sample.
Returns
- (bool SameLanguage, T Confidence)
SameLanguage is true if both samples are judged to be in the same language; Confidence is the confidence score for that judgment.
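One way to derive both values, assuming each clip's language probability distribution is available (for example from GetLanguageProbabilities), is to treat the two predictions as independent and compute P(same) = Σ p1(l)·p2(l) over all languages l. This is a sketch of that idea, not a description of how AiDotNet implements AreSameLanguage; `SameLanguageSketch.Compare` and the 0.5 decision threshold are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AiDotNet.
public static class SameLanguageSketch
{
    // Estimates P(same language) = sum over l of p1(l) * p2(l), treating the
    // two clips' predictions as independent. Languages absent from p2 count as 0.
    public static (bool SameLanguage, double Confidence) Compare(
        IReadOnlyDictionary<string, double> p1,
        IReadOnlyDictionary<string, double> p2)
    {
        double pSame = p1.Sum(kv => kv.Value * (p2.TryGetValue(kv.Key, out var q) ? q : 0.0));
        bool same = pSame >= 0.5; // assumed decision threshold
        // Report confidence in whichever decision was made.
        return (same, same ? pSame : 1.0 - pSame);
    }
}
```

A simpler alternative is to compare the two top-1 languages, but that ignores how uncertain each prediction is: for two clips that each split 50/50 between Norwegian and Swedish, the product-sum gives 0.5 rather than a confident yes.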
GetLanguageDisplayName(string)
Gets the display name for a language code.
string GetLanguageDisplayName(string languageCode)
Parameters
languageCode (string): ISO language code.
Returns
- string
Human-readable language name.
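A minimal sketch of such a lookup, using a small hard-coded table with the code itself as a fallback. `DisplayNameSketch` and its table are hypothetical; an implementation could instead draw on `System.Globalization.CultureInfo`, whose available cultures depend on the platform and globalization settings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of AiDotNet.
public static class DisplayNameSketch
{
    // Case-insensitive table of a few example codes.
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["en"] = "English",
            ["es"] = "Spanish",
            ["fr"] = "French",
            ["zh"] = "Chinese",
            ["cmn"] = "Mandarin Chinese",
            ["nb"] = "Norwegian Bokmål",
            ["sv"] = "Swedish",
        };

    // Returns the display name, or the trimmed code unchanged if it is unknown.
    public static string DisplayNameFor(string languageCode)
    {
        string code = languageCode.Trim();
        return Names.TryGetValue(code, out var name) ? name : code;
    }
}
```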
GetLanguageProbabilities(Tensor<T>)
Gets probabilities for all supported languages.
IReadOnlyDictionary<string, T> GetLanguageProbabilities(Tensor<T> audio)
Parameters
audio (Tensor<T>): Audio tensor containing speech.
Returns
- IReadOnlyDictionary<string, T>
Dictionary mapping language codes to probabilities.
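Models often score short frames of audio rather than the whole clip. One common way to obtain a single utterance-level distribution, assumed here rather than documented for AiDotNet, is to average the per-frame probabilities (mean pooling). `ProbabilityPoolingSketch.MeanPool` is a hypothetical helper operating on `double` values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AiDotNet.
public static class ProbabilityPoolingSketch
{
    // Averages per-frame language probabilities into one distribution.
    // A language missing from a frame counts as probability 0 for that frame.
    public static Dictionary<string, double> MeanPool(
        IReadOnlyList<IReadOnlyDictionary<string, double>> framePosteriors)
    {
        if (framePosteriors.Count == 0)
            throw new ArgumentException("At least one frame is required.", nameof(framePosteriors));

        var totals = new Dictionary<string, double>();
        foreach (var frame in framePosteriors)
        {
            foreach (var kv in frame)
            {
                totals[kv.Key] = (totals.TryGetValue(kv.Key, out var sum) ? sum : 0.0) + kv.Value;
            }
        }

        // If every frame sums to 1, the averaged values also sum to 1.
        return totals.ToDictionary(kv => kv.Key, kv => kv.Value / framePosteriors.Count);
    }
}
```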
GetTopLanguages(Tensor<T>, int)
Gets the top-N most likely languages.
IReadOnlyList<(string Language, T Probability)> GetTopLanguages(Tensor<T> audio, int topN = 5)
Parameters
audio (Tensor<T>): Audio tensor containing speech.
topN (int): Number of languages to return. Defaults to 5.
Returns
- IReadOnlyList<(string Language, T Probability)>
List of (language, probability) pairs, sorted from most to least likely.
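Given a probability dictionary like the one GetLanguageProbabilities returns, top-N selection is a sort and a take. The hypothetical `TopLanguagesSketch.TopN` below mirrors the shape of GetTopLanguages with `double` in place of `T`; the ordinal tie-break on the language code is an assumption added to make the order deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AiDotNet.
public static class TopLanguagesSketch
{
    // Returns up to topN (language, probability) pairs, most likely first.
    public static IReadOnlyList<(string Language, double Probability)> TopN(
        IReadOnlyDictionary<string, double> probabilities, int topN = 5)
    {
        if (topN < 0) throw new ArgumentOutOfRangeException(nameof(topN));
        return probabilities
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal) // deterministic tie-break
            .Take(topN)
            .Select(kv => (kv.Key, kv.Value))
            .ToList();
    }
}
```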
IdentifyLanguage(Tensor<T>)
Identifies the language spoken in audio.
LanguageResult<T> IdentifyLanguage(Tensor<T> audio)
Parameters
audio (Tensor<T>): Audio tensor containing speech.
Returns
- LanguageResult<T>
Detected language code and confidence.
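For short or ambiguous clips (see the challenges listed above), a caller may prefer to reject low-confidence results instead of trusting the top guess. The sketch below assumes a 0.5 threshold and returns "und", the ISO 639-2/639-3 code for an undetermined language, when the top probability falls short. `ThresholdedIdentificationSketch` is hypothetical and returns a tuple rather than LanguageResult<T>, whose members are not described here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of AiDotNet.
public static class ThresholdedIdentificationSketch
{
    // "und" is the ISO 639-2/639-3 code for "undetermined".
    public const string Undetermined = "und";

    // Returns the top language, or "und" if its probability is below minConfidence.
    public static (string Language, double Confidence) Identify(
        IReadOnlyDictionary<string, double> probabilities, double minConfidence = 0.5)
    {
        if (probabilities.Count == 0) return (Undetermined, 0.0);
        var best = probabilities.Aggregate((a, b) => b.Value > a.Value ? b : a);
        return best.Value >= minConfidence ? (best.Key, best.Value) : (Undetermined, best.Value);
    }
}
```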
IdentifyLanguageSegments(Tensor<T>, int)
Identifies language with time segmentation (for multilingual audio).
IReadOnlyList<LanguageSegment<T>> IdentifyLanguageSegments(Tensor<T> audio, int windowSizeMs = 2000)
Parameters
audio (Tensor<T>): Audio tensor that may contain multiple languages.
windowSizeMs (int): Analysis window size in milliseconds. Defaults to 2000 (2 seconds).
Returns
- IReadOnlyList<LanguageSegment<T>>
Time-segmented language predictions.
Remarks
For Beginners: Use this when someone might switch languages mid-recording (code-switching). It tells you which language is spoken at each point in time.
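Segmentation can be approximated by classifying consecutive fixed-size windows and merging neighbours that receive the same label. The sketch below assumes non-overlapping windows and that the per-window labels already exist; `SegmentSketch` and `SegmentationSketch` are hypothetical and are not AiDotNet's LanguageSegment<T>. Real systems often use overlapping windows and smoothing to avoid spurious one-window switches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types, not part of AiDotNet.
public sealed record SegmentSketch(string Language, int StartMs, int EndMs);

public static class SegmentationSketch
{
    // Number of samples in one analysis window, e.g. 16000 Hz * 2000 ms = 32000.
    public static int WindowSamples(int sampleRate, int windowSizeMs) =>
        (int)((long)sampleRate * windowSizeMs / 1000);

    // Merges consecutive windows with the same predicted language into one segment.
    // Assumes every window is full length (a trailing partial window is not handled).
    public static List<SegmentSketch> MergeWindows(IReadOnlyList<string> windowLanguages, int windowSizeMs)
    {
        var segments = new List<SegmentSketch>();
        for (int i = 0; i < windowLanguages.Count; i++)
        {
            int start = i * windowSizeMs;
            int end = start + windowSizeMs;
            int last = segments.Count - 1;
            if (last >= 0 && segments[last].Language == windowLanguages[i])
                segments[last] = segments[last] with { EndMs = end }; // extend current segment
            else
                segments.Add(new SegmentSketch(windowLanguages[i], start, end));
        }
        return segments;
    }
}
```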