Class LatentDirichletAllocation<T>

Namespace
AiDotNet.Preprocessing.DimensionalityReduction
Assembly
AiDotNet.dll

Latent Dirichlet Allocation for topic modeling.

public class LatentDirichletAllocation<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>

Type Parameters

T

The numeric type for calculations (e.g., float, double).

Inheritance
TransformerBase<T, Matrix<T>, Matrix<T>>
LatentDirichletAllocation<T>
Implements
IDataTransformer<T, Matrix<T>, Matrix<T>>

Remarks

LDA is a generative probabilistic model that discovers latent topics in a collection of documents (or more generally, in count data). Each document is modeled as a mixture of topics, and each topic is a distribution over words (features).

The model assumes:

- Each document has a distribution over topics (theta)
- Each topic has a distribution over words (beta)
- Words in documents are generated by first sampling a topic, then sampling a word
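The generative assumptions above can be sketched in a few lines. This is a minimal illustration, not the library's code: it assumes theta and beta are already known and only shows the "sample a topic, then sample a word" step. The names `LdaGenerativeSketch`, `SampleCategorical`, and `GenerateDocument` are hypothetical.

```csharp
using System;

public static class LdaGenerativeSketch
{
    // Draws an index from a discrete distribution whose probabilities sum to 1.
    public static int SampleCategorical(double[] probabilities, Random rng)
    {
        double u = rng.NextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < probabilities.Length; i++)
        {
            cumulative += probabilities[i];
            if (u < cumulative) return i;
        }
        return probabilities.Length - 1; // guard against rounding error
    }

    // Generates word counts for one document with topic mixture theta.
    // beta[k] is the word distribution of topic k.
    public static int[] GenerateDocument(double[] theta, double[][] beta, int nWords, Random rng)
    {
        int[] counts = new int[beta[0].Length];
        for (int n = 0; n < nWords; n++)
        {
            int topic = SampleCategorical(theta, rng);      // first sample a topic
            int word = SampleCategorical(beta[topic], rng); // then sample a word from it
            counts[word]++;
        }
        return counts;
    }

    public static void Main()
    {
        var rng = new Random(42);
        double[] theta = { 0.8, 0.2 };          // this document is mostly topic 0
        double[][] beta =
        {
            new[] { 0.5, 0.5, 0.0, 0.0 },       // topic 0 uses words 0 and 1
            new[] { 0.0, 0.0, 0.5, 0.5 },       // topic 1 uses words 2 and 3
        };
        int[] counts = GenerateDocument(theta, beta, 100, rng);
        Console.WriteLine(string.Join(", ", counts));
    }
}
```

Fitting LDA runs this process in reverse: given only the count rows, it estimates the theta and beta that most plausibly produced them.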

For Beginners: LDA finds hidden themes (topics) in your data:

- Input: Document-term matrix (rows=documents, columns=word counts)
- Output: Document-topic distribution (what topics each document is about)
- Also learns: Topic-word distribution (what words define each topic)
- Use for: Topic modeling, document clustering, feature extraction

Constructors

LatentDirichletAllocation(int, double?, double?, LdaLearningMethod, int, double, int, int?, int[]?)

Creates a new instance of LatentDirichletAllocation<T>.

public LatentDirichletAllocation(int nComponents = 10, double? docTopicPrior = null, double? topicWordPrior = null, LdaLearningMethod learningMethod = LdaLearningMethod.Online, int maxIter = 10, double tol = 0.0001, int batchSize = 128, int? randomState = null, int[]? columnIndices = null)

Parameters

nComponents int

Number of topics. Defaults to 10.

docTopicPrior double?

Prior for the document-topic distribution (alpha). If null, defaults to 1 / nComponents.

topicWordPrior double?

Prior for the topic-word distribution (beta/eta). If null, defaults to 1 / nComponents.

learningMethod LdaLearningMethod

Learning method to use. Defaults to Online.

maxIter int

Maximum number of iterations. Defaults to 10.

tol double

Convergence tolerance. Defaults to 1e-4.

batchSize int

Batch size for online learning. Defaults to 128.

randomState int?

Random seed for reproducibility.

columnIndices int[]

The column indices to use, or null for all columns.
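The null-prior behaviour described for docTopicPrior and topicWordPrior amounts to a null-coalescing fallback to one over the number of topics. The sketch below only illustrates that documented rule; `ResolvePrior` is a hypothetical helper and is not part of AiDotNet.

```csharp
using System;

public static class LdaPriorDefaults
{
    // Hypothetical helper: returns the supplied prior, or 1 / nComponents when it is null.
    public static double ResolvePrior(double? prior, int nComponents)
    {
        if (nComponents <= 0)
            throw new ArgumentOutOfRangeException(nameof(nComponents), "Need at least one topic.");
        return prior ?? 1.0 / nComponents;
    }

    public static void Main()
    {
        // With the default of 10 topics, a null prior resolves to one tenth.
        Console.WriteLine(ResolvePrior(null, 10));
        // An explicit prior is used unchanged.
        Console.WriteLine(ResolvePrior(0.5, 10));
    }
}
```

Under this rule, a larger nComponents gives a smaller default prior, which favours sparser document-topic and topic-word distributions.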

Properties

Components

Gets the topic-word distribution matrix (topics x vocabulary). Each row is a topic and each column is a vocabulary word, so an entry holds that word's probability under that topic.

public double[,]? Components { get; }

Property Value

double[,]

DocTopicPrior

Gets the document-topic prior (alpha).

public double DocTopicPrior { get; }

Property Value

double

LearningMethod

Gets the learning method.

public LdaLearningMethod LearningMethod { get; }

Property Value

LdaLearningMethod

NComponents

Gets the number of topics.

public int NComponents { get; }

Property Value

int

NVocab

Gets the vocabulary size.

public int NVocab { get; }

Property Value

int

SupportsInverseTransform

Gets whether this transformer supports inverse transformation.

public override bool SupportsInverseTransform { get; }

Property Value

bool

TopicWordPrior

Gets the topic-word prior (beta/eta).

public double TopicWordPrior { get; }

Property Value

double

Methods

FitCore(Matrix<T>)

Fits LDA using variational inference.

protected override void FitCore(Matrix<T> data)

Parameters

data Matrix<T>

GetFeatureNamesOut(string[]?)

Gets the output feature names after transformation.

public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)

Parameters

inputFeatureNames string[]

Returns

string[]

GetTopWordsPerTopic(int, string[]?)

Gets the top words for each topic.

public string[][] GetTopWordsPerTopic(int nTopWords = 10, string[]? featureNames = null)

Parameters

nTopWords int

Number of top words to return per topic.

featureNames string[]

Optional vocabulary names.

Returns

string[][]

A jagged array with one entry per topic. Each entry holds that topic's top words, given as word indices, or as names if featureNames is provided.
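To make the return shape concrete, here is a minimal sketch of how top words could be picked from a topics x vocabulary matrix such as Components. It is an illustration under assumptions, not the library's implementation: `TopWordsSketch.TopWords` is a hypothetical helper, and ranking by descending weight is assumed.

```csharp
using System;
using System.Linq;

public static class TopWordsSketch
{
    // For each topic (row), returns the nTopWords highest-weighted words,
    // as names when featureNames is given and as index strings otherwise.
    public static string[][] TopWords(double[,] components, int nTopWords, string[]? featureNames = null)
    {
        int nTopics = components.GetLength(0);
        int nVocab = components.GetLength(1);
        var result = new string[nTopics][];
        for (int k = 0; k < nTopics; k++)
        {
            result[k] = Enumerable.Range(0, nVocab)
                .OrderByDescending(j => components[k, j])   // highest weight first
                .Take(Math.Min(nTopWords, nVocab))
                .Select(j => featureNames != null ? featureNames[j] : j.ToString())
                .ToArray();
        }
        return result;
    }

    public static void Main()
    {
        double[,] components =
        {
            { 0.1, 0.7, 0.2 },
            { 0.6, 0.3, 0.1 },
        };
        string[] vocabulary = { "cat", "dog", "fish" };
        foreach (string[] words in TopWords(components, 2, vocabulary))
            Console.WriteLine(string.Join(" ", words));
    }
}
```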

InverseTransformCore(Matrix<T>)

Inverse transformation is not supported.

protected override Matrix<T> InverseTransformCore(Matrix<T> data)

Parameters

data Matrix<T>

Returns

Matrix<T>

TransformCore(Matrix<T>)

Transforms documents to topic distributions.

protected override Matrix<T> TransformCore(Matrix<T> data)

Parameters

data Matrix<T>

Returns

Matrix<T>
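As a rough picture of what "transforming documents to topic distributions" means, the sketch below infers one document's topic mixture from its word counts with the topic-word matrix held fixed. It deliberately swaps in a simple maximum-likelihood EM fold-in rather than the variational inference this class uses, so results will differ from the library's. `LdaFoldInSketch` and `InferTopicMixture` are hypothetical names.

```csharp
using System;
using System.Linq;

public static class LdaFoldInSketch
{
    // Estimates theta (topic proportions) for one document.
    // counts[w] is the count of word w; beta[k, w] is P(word w | topic k).
    public static double[] InferTopicMixture(int[] counts, double[,] beta, int iterations = 50)
    {
        int nTopics = beta.GetLength(0);
        int nVocab = beta.GetLength(1);
        double[] theta = Enumerable.Repeat(1.0 / nTopics, nTopics).ToArray();
        if (counts.Sum() == 0) return theta; // empty document: stay uniform

        for (int it = 0; it < iterations; it++)
        {
            var next = new double[nTopics];
            for (int w = 0; w < nVocab; w++)
            {
                if (counts[w] == 0) continue;
                double norm = 0.0;
                for (int k = 0; k < nTopics; k++) norm += theta[k] * beta[k, w];
                if (norm <= 0.0) continue; // word has zero probability under every topic
                // Share this word's count among topics by their responsibility for it.
                for (int k = 0; k < nTopics; k++)
                    next[k] += counts[w] * theta[k] * beta[k, w] / norm;
            }
            double sum = next.Sum();
            if (sum <= 0.0) break;
            for (int k = 0; k < nTopics; k++) theta[k] = next[k] / sum;
        }
        return theta;
    }

    public static void Main()
    {
        double[,] beta =
        {
            { 0.5, 0.5, 0.0, 0.0 },
            { 0.0, 0.0, 0.5, 0.5 },
        };
        double[] theta = InferTopicMixture(new[] { 3, 1, 0, 0 }, beta);
        Console.WriteLine(string.Join(", ", theta));
    }
}
```

Each output row of the transform plays the role of theta here: one non-negative weight per topic, summing to 1.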