Class LatentDirichletAllocation<T>

Namespace: AiDotNet.Preprocessing.DimensionalityReduction

Assembly: AiDotNet.dll

Latent Dirichlet Allocation for topic modeling.

public class LatentDirichletAllocation<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>

Type Parameters

T: The numeric type for calculations (e.g., float, double).

Inheritance: object → TransformerBase<T, Matrix<T>, Matrix<T>> → LatentDirichletAllocation<T>

Implements: IDataTransformer<T, Matrix<T>, Matrix<T>>

Inherited Members: TransformerBase<T, Matrix<T>, Matrix<T>>.NumOps

TransformerBase<T, Matrix<T>, Matrix<T>>.Engine

TransformerBase<T, Matrix<T>, Matrix<T>>.IsFitted

TransformerBase<T, Matrix<T>, Matrix<T>>.ColumnIndices

TransformerBase<T, Matrix<T>, Matrix<T>>.SupportsInverseTransform

TransformerBase<T, Matrix<T>, Matrix<T>>.Fit(Matrix<T>)

TransformerBase<T, Matrix<T>, Matrix<T>>.Transform(Matrix<T>)

TransformerBase<T, Matrix<T>, Matrix<T>>.FitTransform(Matrix<T>)

TransformerBase<T, Matrix<T>, Matrix<T>>.InverseTransform(Matrix<T>)

TransformerBase<T, Matrix<T>, Matrix<T>>.GetFeatureNamesOut(string[])

TransformerBase<T, Matrix<T>, Matrix<T>>.FitCore(Matrix<T>)

TransformerBase<T, Matrix<T>, Matrix<T>>.TransformCore(Matrix<T>)

TransformerBase<T, Matrix<T>, Matrix<T>>.InverseTransformCore(Matrix<T>)

TransformerBase<T, Matrix<T>, Matrix<T>>.ValidateInputData(Matrix<T>)

TransformerBase<T, Matrix<T>, Matrix<T>>.EnsureFitted()

TransformerBase<T, Matrix<T>, Matrix<T>>.GetColumnsToProcess(int)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

LDA is a generative probabilistic model that discovers latent topics in a collection of documents (or more generally, in count data). Each document is modeled as a mixture of topics, and each topic is a distribution over words (features).

The model assumes:
- Each document has a distribution over topics (theta).
- Each topic has a distribution over words (beta).
- Each word in a document is generated by first sampling a topic from the document's topic distribution, then sampling a word from that topic's word distribution.

For Beginners: LDA finds hidden themes (topics) in your data.
- Input: a document-term matrix (rows = documents, columns = vocabulary terms, values = word counts).
- Output: a document-topic distribution (which topics each document is about).
- Also learns: a topic-word distribution (which words define each topic).
- Use for: topic modeling, document clustering, feature extraction.
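The generative story described above can be sketched in a few lines of C#. This illustrates the model only and is not AiDotNet code: the class `LdaGenerativeSketch` and its methods are hypothetical, and for brevity the document's topic mixture `theta` is passed in directly rather than drawn from a Dirichlet(alpha) prior as full LDA would do.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of LDA's generative process (not AiDotNet code).
// theta: one document's topic mixture; beta[k]: word distribution of topic k.
public static class LdaGenerativeSketch
{
    // Draws an index from a discrete distribution whose entries sum to 1.
    public static int SampleCategorical(double[] probs, Random rng)
    {
        double u = rng.NextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < probs.Length; i++)
        {
            cumulative += probs[i];
            if (u < cumulative) return i;
        }
        return probs.Length - 1; // guard against floating-point round-off
    }

    // Generates one document of the given length as (topic, word) index pairs.
    public static List<(int Topic, int Word)> GenerateDocument(
        double[] theta, double[][] beta, int length, Random rng)
    {
        var words = new List<(int Topic, int Word)>(length);
        for (int n = 0; n < length; n++)
        {
            int z = SampleCategorical(theta, rng);   // 1) sample a topic from theta
            int w = SampleCategorical(beta[z], rng); // 2) sample a word from topic z
            words.Add((z, w));
        }
        return words;
    }
}
```

Summing the sampled words per document yields exactly the kind of count row that the input document-term matrix holds.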

Constructors

LatentDirichletAllocation(int, double?, double?, LdaLearningMethod, int, double, int, int?, int[]?)

Creates a new instance of LatentDirichletAllocation<T>.

public LatentDirichletAllocation(int nComponents = 10, double? docTopicPrior = null, double? topicWordPrior = null, LdaLearningMethod learningMethod = LdaLearningMethod.Online, int maxIter = 10, double tol = 0.0001, int batchSize = 128, int? randomState = null, int[]? columnIndices = null)

Parameters

nComponents int: Number of topics. Defaults to 10.
docTopicPrior double?: Prior for the document-topic distribution (alpha). If null, defaults to 1 / nComponents.
topicWordPrior double?: Prior for the topic-word distribution (beta/eta). If null, defaults to 1 / nComponents.
learningMethod LdaLearningMethod: Learning method to use. Defaults to Online.
maxIter int: Maximum number of iterations. Defaults to 10.
tol double: Convergence tolerance. Defaults to 1e-4.
batchSize int: Number of documents per mini-batch when learningMethod is Online. Defaults to 128.
randomState int?: Random seed for reproducibility.
columnIndices int[]: The column indices to use, or null for all columns.
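As a sketch of how the documented prior defaults resolve, the hypothetical helper below applies only the 1 / nComponents rule stated above. It also counts mini-batches per pass for the online method; that part assumes the documents are split into consecutive chunks of at most `batchSize` rows, which is how online variational LDA is usually described but is not confirmed by this page.

```csharp
using System;

// Hypothetical helper (not AiDotNet code) mirroring the documented defaults.
public static class LdaDefaultsSketch
{
    // A null prior falls back to 1 / nComponents, as the parameter docs state.
    public static (double Alpha, double Eta) ResolvePriors(
        int nComponents, double? docTopicPrior, double? topicWordPrior)
    {
        if (nComponents <= 0)
            throw new ArgumentOutOfRangeException(nameof(nComponents));
        double fallback = 1.0 / nComponents;
        return (docTopicPrior ?? fallback, topicWordPrior ?? fallback);
    }

    // Assumption: one pass over the data processes consecutive mini-batches
    // of at most batchSize documents, so the count is a ceiling division.
    public static int MiniBatchesPerPass(int nDocuments, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        return (nDocuments + batchSize - 1) / batchSize;
    }
}
```

With the default nComponents = 10, both priors resolve to 0.1; with 1,000 documents and the default batchSize of 128, a pass would take 8 mini-batch updates under this assumption.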

Properties

Components

Gets the topic-word distribution matrix (topics x vocabulary). Each row corresponds to a topic, and each column holds the probability of one vocabulary word under that topic.

public double[,]? Components { get; }

Property Value

double[,]
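If you need each topic row as an explicit probability distribution, a row normalization like the one below is enough. It is a hypothetical helper, not part of AiDotNet, and it assumes only that `Components` is a topics x vocabulary `double[,]` with non-negative entries; when the rows already sum to 1, it leaves them essentially unchanged.

```csharp
using System;

// Hypothetical helper (not AiDotNet code): normalizes each topic row of a
// topics x vocabulary weight matrix so that it sums to 1.
public static class TopicWordNormalization
{
    public static double[,] NormalizeRows(double[,] components)
    {
        int topics = components.GetLength(0);
        int vocab = components.GetLength(1);
        var result = new double[topics, vocab];
        for (int k = 0; k < topics; k++)
        {
            double rowSum = 0.0;
            for (int w = 0; w < vocab; w++) rowSum += components[k, w];
            if (rowSum <= 0.0)
                throw new ArgumentException($"Topic {k} has no positive weight.");
            for (int w = 0; w < vocab; w++) result[k, w] = components[k, w] / rowSum;
        }
        return result;
    }
}
```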

DocTopicPrior

Gets the document-topic prior (alpha).

public double DocTopicPrior { get; }

Property Value

double

LearningMethod

Gets the learning method.

public LdaLearningMethod LearningMethod { get; }

Property Value

LdaLearningMethod

NComponents

Gets the number of topics.

public int NComponents { get; }

Property Value

int

NVocab

Gets the vocabulary size.

public int NVocab { get; }

Property Value

int

SupportsInverseTransform

Gets whether this transformer supports inverse transformation. LDA does not support inverse transformation, so this returns false.

public override bool SupportsInverseTransform { get; }

Property Value

bool

TopicWordPrior

Gets the topic-word prior (beta/eta).

public double TopicWordPrior { get; }

Property Value

double

Methods

FitCore(Matrix<T>)

Fits LDA using variational inference.

protected override void FitCore(Matrix<T> data)

Parameters

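data Matrix<T>: The document-term matrix to fit (rows = documents, columns = vocabulary terms, values = word counts).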
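This page does not describe the update rule FitCore uses. For orientation, the sketch below shows one step of online variational Bayes for LDA (Hoffman, Blei and Bach, 2010), a common algorithm behind an online learning method; scikit-learn's online LDA is based on it. The step-size parameters `tau0` and `kappa` (defaults borrowed from scikit-learn's `learning_offset` and `learning_decay`) and all names here are assumptions for illustration. AiDotNet's constructor does not expose them, and its internals may differ.

```csharp
using System;

// Hypothetical sketch of one online variational Bayes update of the
// topic-word parameters lambda (topics x vocabulary). Not AiDotNet code.
public static class OnlineLdaUpdateSketch
{
    // Step size rho_t = (tau0 + t)^(-kappa); kappa in (0.5, 1] ensures convergence.
    public static double StepSize(int t, double tau0 = 10.0, double kappa = 0.7)
        => Math.Pow(tau0 + t, -kappa);

    // lambda <- (1 - rho) * lambda + rho * (eta + (D / batchDocs) * sstats)
    //   sstats:    expected topic-word counts from the E-step on this mini-batch
    //   totalDocs: D, the number of documents in the whole corpus
    //   batchDocs: number of documents in this mini-batch
    public static void UpdateLambda(double[,] lambda, double[,] sstats,
        double eta, int totalDocs, int batchDocs, double rho)
    {
        double scale = (double)totalDocs / batchDocs;
        for (int k = 0; k < lambda.GetLength(0); k++)
            for (int w = 0; w < lambda.GetLength(1); w++)
                lambda[k, w] = (1.0 - rho) * lambda[k, w]
                               + rho * (eta + scale * sstats[k, w]);
    }
}
```

The scaling by totalDocs / batchDocs treats each mini-batch as if it were the whole corpus, and the decaying step size blends each new estimate into the running one.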

GetFeatureNamesOut(string[]?)

Gets the output feature names after transformation.

public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)

Parameters

inputFeatureNames string[]: Optional input feature names; may be null.

Returns

string[]: The output feature names, one per topic.

GetTopWordsPerTopic(int, string[]?)

Gets the top words for each topic.

public string[][] GetTopWordsPerTopic(int nTopWords = 10, string[]? featureNames = null)

Parameters

nTopWords int: Number of top words to return per topic. Defaults to 10.
featureNames string[]: Optional vocabulary names, one per vocabulary column. If null, word indices are returned instead.

Returns

string[][]: Array of arrays containing top word indices (or names if provided) for each topic.
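The selection GetTopWordsPerTopic describes can be sketched as a per-row sort over a topics x vocabulary matrix. The class below is hypothetical, not AiDotNet code; sorting by descending weight and breaking ties by lower index are assumptions, since this page does not state the order of the returned words.

```csharp
using System;
using System.Linq;

// Hypothetical sketch (not AiDotNet code): top-weighted words per topic,
// returned as names when featureNames is given, otherwise as index strings.
public static class TopWordsSketch
{
    public static string[][] GetTopWords(double[,] components, int nTopWords = 10,
        string[]? featureNames = null)
    {
        int topics = components.GetLength(0);
        int vocab = components.GetLength(1);
        if (featureNames != null && featureNames.Length != vocab)
            throw new ArgumentException("featureNames must have one entry per column.");
        int take = Math.Min(nTopWords, vocab);

        var result = new string[topics][];
        for (int k = 0; k < topics; k++)
        {
            int topic = k; // copy for the lambdas below
            result[k] = Enumerable.Range(0, vocab)
                .OrderByDescending(w => components[topic, w]) // heaviest words first
                .ThenBy(w => w)                               // tie-break by index
                .Take(take)
                .Select(w => featureNames != null ? featureNames[w] : w.ToString())
                .ToArray();
        }
        return result;
    }
}
```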

InverseTransformCore(Matrix<T>)

Inverse transformation is not supported.

protected override Matrix<T> InverseTransformCore(Matrix<T> data)

Parameters

data Matrix<T>

Returns

Matrix<T>

TransformCore(Matrix<T>)

Transforms documents to topic distributions.

protected override Matrix<T> TransformCore(Matrix<T> data)

Parameters

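data Matrix<T>: The document-term matrix to transform (rows = documents, columns = vocabulary terms, values = word counts).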

Returns

Matrix<T>: The document-topic distribution (rows = documents, columns = topics).
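TransformCore's exact algorithm is not documented here. The standard variational approach, used for example by scikit-learn, fits a Dirichlet parameter gamma for each document with a fixed-point iteration and then normalizes it into a topic distribution. The sketch below does this for one document. It assumes the topic-word parameters are Dirichlet pseudo-counts (lambda) rather than normalized probabilities, and it starts gamma at 1 instead of a random initialization; all names are hypothetical.

```csharp
using System;

// Hypothetical sketch of LDA's variational E-step for a single document.
// Not AiDotNet code; follows the standard mean-field updates.
public static class LdaInferenceSketch
{
    // Digamma via psi(x) = psi(x + 1) - 1/x, then an asymptotic series for x >= 6.
    public static double Digamma(double x)
    {
        if (x <= 0.0) throw new ArgumentOutOfRangeException(nameof(x));
        double result = 0.0;
        while (x < 6.0) { result -= 1.0 / x; x += 1.0; }
        double inv = 1.0 / x, inv2 = inv * inv;
        return result + Math.Log(x) - 0.5 * inv
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252
            - inv2 * (1.0 / 240 - inv2 / 132))));
    }

    // counts: word counts of one document (length = vocabulary size).
    // lambda: topics x vocabulary Dirichlet parameters (pseudo-counts, all > 0).
    // Returns the normalized document-topic distribution.
    public static double[] InferTopics(double[] counts, double[,] lambda,
        double alpha, int maxIter = 100, double tol = 1e-4)
    {
        int K = lambda.GetLength(0), V = lambda.GetLength(1);

        // exp(E[log beta_kw]) under q(beta_k) = Dirichlet(lambda_k).
        var expElogBeta = new double[K, V];
        for (int k = 0; k < K; k++)
        {
            double rowSum = 0.0;
            for (int w = 0; w < V; w++) rowSum += lambda[k, w];
            double psiRow = Digamma(rowSum);
            for (int w = 0; w < V; w++)
                expElogBeta[k, w] = Math.Exp(Digamma(lambda[k, w]) - psiRow);
        }

        var gamma = new double[K];
        Array.Fill(gamma, 1.0); // deterministic start (assumption; often random)
        var expElogTheta = new double[K];
        var phiNorm = new double[V];

        for (int iter = 0; iter < maxIter; iter++)
        {
            // exp(E[log theta_k]) under q(theta) = Dirichlet(gamma).
            double gammaSum = 0.0;
            foreach (double g in gamma) gammaSum += g;
            double psiSum = Digamma(gammaSum);
            for (int k = 0; k < K; k++)
                expElogTheta[k] = Math.Exp(Digamma(gamma[k]) - psiSum);

            // Per-word normalizer of the topic responsibilities phi_wk.
            for (int w = 0; w < V; w++)
            {
                double s = 1e-100; // guards against division by zero
                for (int k = 0; k < K; k++) s += expElogTheta[k] * expElogBeta[k, w];
                phiNorm[w] = s;
            }

            // gamma_k = alpha + sum_w n_w * phi_wk, with phi_wk expanded inline.
            double meanChange = 0.0;
            for (int k = 0; k < K; k++)
            {
                double acc = 0.0;
                for (int w = 0; w < V; w++) acc += counts[w] / phiNorm[w] * expElogBeta[k, w];
                double updated = alpha + expElogTheta[k] * acc;
                meanChange += Math.Abs(updated - gamma[k]) / K;
                gamma[k] = updated;
            }
            if (meanChange < tol) break;
        }

        // Normalize gamma into a probability distribution over topics.
        double total = 0.0;
        foreach (double g in gamma) total += g;
        var theta = new double[K];
        for (int k = 0; k < K; k++) theta[k] = gamma[k] / total;
        return theta;
    }
}
```

Running this for every row of the document-term matrix produces one row of the document-topic output per document.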
