Class LatentDirichletAllocation<T>
Namespace: AiDotNet.Preprocessing.DimensionalityReduction
Assembly: AiDotNet.dll
Latent Dirichlet Allocation for topic modeling.
public class LatentDirichletAllocation<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>
Type Parameters
T: The numeric type used for calculations (e.g., float, double).
Inheritance: TransformerBase<T, Matrix<T>, Matrix<T>> → LatentDirichletAllocation<T>
Implements: IDataTransformer<T, Matrix<T>, Matrix<T>>
Remarks
LDA is a generative probabilistic model that discovers latent topics in a collection of documents (or, more generally, in any count data). Each document is modeled as a mixture of topics, and each topic is a distribution over words (features).
The model assumes:
- Each document has a distribution over topics (theta).
- Each topic has a distribution over words (beta).
- Each word in a document is generated by first sampling a topic from the document's topic distribution, then sampling a word from that topic's word distribution.
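The generative process above can be written compactly as follows. The symbols follow this page: alpha is the document-topic prior, theta_d is document d's topic distribution, and beta_k is topic k's word distribution. The topic-word prior is written eta to keep it distinct from beta_k. K is the number of topics (nComponents), D the number of documents, and N_d the number of words in document d. Treating both priors as symmetric Dirichlet priors is an assumption, consistent with the scalar priors this class exposes.

```latex
\begin{aligned}
\beta_k &\sim \operatorname{Dirichlet}(\eta), && k = 1, \dots, K \\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), && d = 1, \dots, D \\
z_{dn} \mid \theta_d &\sim \operatorname{Categorical}(\theta_d), && n = 1, \dots, N_d \\
w_{dn} \mid z_{dn}, \beta &\sim \operatorname{Categorical}(\beta_{z_{dn}})
\end{aligned}
```

Marginalizing over the topic assignment z_{dn} gives the probability of word w in document d as the mixture p(w | theta_d, beta) = sum over k of theta_{dk} beta_{kw}.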
For Beginners: LDA finds hidden themes (topics) in your data:
- Input: Document-term matrix (rows = documents, columns = word counts)
- Output: Document-topic distribution (what topics each document is about)
- Also learns: Topic-word distribution (what words define each topic)
- Use for: Topic modeling, document clustering, feature extraction
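The input described above is a document-term count matrix. The sketch below shows one way to build such a matrix from already-tokenized documents. `BuildDocumentTermMatrix` is a hypothetical helper, not part of AiDotNet. It uses a plain `double[,]` rather than `Matrix<T>` and assumes tokenization has been done elsewhere.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an AiDotNet API): build a document-term count matrix.
// Rows are documents, columns are vocabulary terms, and each cell is a raw count.
(double[,] Counts, string[] Vocabulary) BuildDocumentTermMatrix(string[][] tokenizedDocs)
{
    // Give each distinct token a column index in order of first appearance.
    var vocabIndex = new Dictionary<string, int>();
    foreach (var doc in tokenizedDocs)
        foreach (var token in doc)
            if (!vocabIndex.ContainsKey(token))
                vocabIndex[token] = vocabIndex.Count;

    // Count how often each token occurs in each document.
    var counts = new double[tokenizedDocs.Length, vocabIndex.Count];
    for (int d = 0; d < tokenizedDocs.Length; d++)
        foreach (var token in tokenizedDocs[d])
            counts[d, vocabIndex[token]] += 1.0;

    // Vocabulary ordered by column index, so it can serve as feature names later.
    var vocabulary = vocabIndex.OrderBy(kv => kv.Value).Select(kv => kv.Key).ToArray();
    return (counts, vocabulary);
}

var docs = new[]
{
    new[] { "goal", "match", "goal" },
    new[] { "vote", "election", "match" },
};
var (dtm, vocab) = BuildDocumentTermMatrix(docs);
Console.WriteLine(string.Join(",", vocab)); // goal,match,vote,election
Console.WriteLine(dtm[0, 0]);               // 2
```

Because the returned vocabulary is ordered by column, it lines up with the columns of the matrix and can be passed as `featureNames` to GetTopWordsPerTopic after fitting.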
Constructors
LatentDirichletAllocation(int, double?, double?, LdaLearningMethod, int, double, int, int?, int[]?)
Creates a new instance of LatentDirichletAllocation<T>.
public LatentDirichletAllocation(int nComponents = 10, double? docTopicPrior = null, double? topicWordPrior = null, LdaLearningMethod learningMethod = LdaLearningMethod.Online, int maxIter = 10, double tol = 0.0001, int batchSize = 128, int? randomState = null, int[]? columnIndices = null)
Parameters
nComponents (int): Number of topics. Defaults to 10.
docTopicPrior (double?): Prior for the document-topic distribution (alpha). If null, uses 1/nComponents.
topicWordPrior (double?): Prior for the topic-word distribution (beta/eta). If null, uses 1/nComponents.
learningMethod (LdaLearningMethod): Learning method to use. Defaults to Online.
maxIter (int): Maximum number of iterations. Defaults to 10.
tol (double): Convergence tolerance. Defaults to 1e-4.
batchSize (int): Batch size for online learning. Defaults to 128.
randomState (int?): Random seed for reproducibility. Defaults to null.
columnIndices (int[]?): The column indices to use, or null for all columns.
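The null-prior rule documented for `docTopicPrior` and `topicWordPrior` can be sketched as follows. `ResolvePriors` is a hypothetical helper that mirrors the documented defaults. It is not AiDotNet's source, and the positivity check on `nComponents` is this sketch's own assumption.

```csharp
using System;

// Hypothetical helper mirroring the documented defaults: a null prior falls back to 1 / nComponents.
(double Alpha, double Eta) ResolvePriors(int nComponents, double? docTopicPrior, double? topicWordPrior)
{
    // Assumed validation for this sketch; a non-positive topic count has no sensible default.
    if (nComponents <= 0)
        throw new ArgumentOutOfRangeException(nameof(nComponents), "Number of topics must be positive.");
    double fallback = 1.0 / nComponents;
    return (docTopicPrior ?? fallback, topicWordPrior ?? fallback);
}

// alpha falls back to 1/10; eta keeps the explicitly supplied value.
var (alpha, eta) = ResolvePriors(10, null, 0.01);
Console.WriteLine($"alpha={alpha}, eta={eta}");
```

Small symmetric priors such as 1/nComponents favor sparse mixtures, where each document leans on a few topics and each topic on a few words. Larger values spread probability mass more evenly.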
Properties
Components
Gets the topic-word distribution matrix (topics × vocabulary). Each row corresponds to a topic, and each entry in that row is the probability of the corresponding vocabulary word under that topic.
public double[,]? Components { get; }
Property Value: double[,]?
DocTopicPrior
Gets the document-topic prior (alpha).
public double DocTopicPrior { get; }
Property Value: double
LearningMethod
Gets the learning method.
public LdaLearningMethod LearningMethod { get; }
Property Value: LdaLearningMethod
NComponents
Gets the number of topics.
public int NComponents { get; }
Property Value: int
NVocab
Gets the vocabulary size.
public int NVocab { get; }
Property Value: int
SupportsInverseTransform
Gets whether this transformer supports inverse transformation. LDA does not support inverse transformation (see InverseTransformCore).
public override bool SupportsInverseTransform { get; }
Property Value: bool
TopicWordPrior
Gets the topic-word prior (beta/eta).
public double TopicWordPrior { get; }
Property Value: double
Methods
FitCore(Matrix<T>)
Fits LDA using variational inference.
protected override void FitCore(Matrix<T> data)
Parameters
data (Matrix<T>): The document-term matrix to fit (rows = documents, columns = word counts).
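With the default `LdaLearningMethod.Online`, variational LDA is typically fitted one mini-batch (`batchSize` documents) at a time. The sketch below shows one common formulation of a single online step, in the style of Hoffman, Blei, and Bach's online variational Bayes. The mini-batch statistics are rescaled to corpus size and blended into the running topic-word variational parameters (often called lambda) with a decaying step size. `OnlineUpdate` is hypothetical. The schedule constants tau0 = 10 and kappa = 0.7 are assumptions, since they are not parameters of this class. AiDotNet's FitCore may differ.

```csharp
using System;

// Hypothetical sketch of one online variational Bayes step (Hoffman-style), not AiDotNet's code.
// lambda:     K x V topic-word variational parameters, updated in place.
// batchStats: K x V expected topic-word counts from the mini-batch E-step.
// tau0, kappa: assumed step-size schedule constants.
void OnlineUpdate(double[,] lambda, double[,] batchStats, double eta,
                  int totalDocs, int batchDocs, int step, double tau0 = 10.0, double kappa = 0.7)
{
    // Step size rho_t = (tau0 + t)^(-kappa) shrinks as more mini-batches are seen.
    double rho = Math.Pow(tau0 + step, -kappa);
    // Rescale batch statistics as if the whole corpus resembled this batch.
    double scale = (double)totalDocs / batchDocs;
    int k = lambda.GetLength(0), v = lambda.GetLength(1);
    for (int i = 0; i < k; i++)
        for (int j = 0; j < v; j++)
        {
            // Estimate of lambda from this batch alone, then a convex blend with the old value.
            double lambdaHat = eta + scale * batchStats[i, j];
            lambda[i, j] = (1.0 - rho) * lambda[i, j] + rho * lambdaHat;
        }
}

var topicWordParams = new double[,] { { 1.0, 1.0 } };
var stats = new double[,] { { 4.0, 0.0 } };
OnlineUpdate(topicWordParams, stats, eta: 0.5, totalDocs: 100, batchDocs: 10, step: 1);
Console.WriteLine($"{topicWordParams[0, 0]:F4} {topicWordParams[0, 1]:F4}");
```

The blend keeps every step cheap (one pass over a mini-batch), while the decaying rho lets early batches move the topics a lot and later batches only refine them.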
GetFeatureNamesOut(string[]?)
Gets the output feature names after transformation.
public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)
Parameters
inputFeatureNames (string[]?): Optional input feature names. Defaults to null.
Returns
string[]: The output feature names, one per topic.
GetTopWordsPerTopic(int, string[]?)
Gets the top words for each topic.
public string[][] GetTopWordsPerTopic(int nTopWords = 10, string[]? featureNames = null)
Parameters
nTopWords (int): Number of top words to return per topic. Defaults to 10.
featureNames (string[]?): Optional vocabulary names. Defaults to null.
Returns
string[][]: One array per topic containing the indices of its top words (as strings), or the corresponding names when featureNames is provided.
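The selection GetTopWordsPerTopic describes amounts to ranking each row of the topic-word matrix. `TopWordsPerTopic` below is a hypothetical re-implementation on a plain `double[,]`. It assumes the matrix is laid out like Components (topics × vocabulary). Ties keep column order because LINQ's `OrderByDescending` is a stable sort.

```csharp
using System;
using System.Linq;

// Hypothetical re-implementation of the idea behind GetTopWordsPerTopic (not AiDotNet's code):
// for each topic row, return the nTopWords columns with the largest weights,
// as names when featureNames is given, otherwise as index strings.
string[][] TopWordsPerTopic(double[,] components, int nTopWords, string[]? featureNames = null)
{
    int nTopics = components.GetLength(0), nVocab = components.GetLength(1);
    int take = Math.Min(nTopWords, nVocab);
    var result = new string[nTopics][];
    for (int k = 0; k < nTopics; k++)
    {
        // ToArray materializes immediately, so capturing k in the lambda is safe here.
        result[k] = Enumerable.Range(0, nVocab)
            .OrderByDescending(w => components[k, w]) // highest weight first
            .Take(take)
            .Select(w => featureNames != null ? featureNames[w] : w.ToString())
            .ToArray();
    }
    return result;
}

var topicWordWeights = new double[,]
{
    { 0.50, 0.10, 0.40 },
    { 0.05, 0.70, 0.25 },
};
var names = new[] { "goal", "vote", "match" };
var top = TopWordsPerTopic(topicWordWeights, 2, names);
Console.WriteLine(string.Join(" ", top[0])); // goal match
Console.WriteLine(string.Join(" ", top[1])); // vote match
```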
InverseTransformCore(Matrix<T>)
Inverse transformation is not supported.
protected override Matrix<T> InverseTransformCore(Matrix<T> data)
Parameters
data (Matrix<T>)
Returns
Matrix<T>
TransformCore(Matrix<T>)
Transforms documents to topic distributions.
protected override Matrix<T> TransformCore(Matrix<T> data)
Parameters
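data (Matrix<T>): The document-term matrix to transform (rows = documents, columns = word counts).
Returns
Matrix<T>: The document-topic distribution, with one row per document and one column per topic.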
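Mapping a document to a topic distribution is usually done with a per-document mean-field variational E-step. The variational Dirichlet parameters gamma are alternately updated with the responsibilities of each topic for each word until they stabilize, then normalized. The sketch below is hypothetical and not AiDotNet's implementation. It makes three simplifying assumptions. It plugs fixed topic-word probabilities (such as rows of Components) in where full variational LDA uses exp(E[log beta]). It initializes gamma to alpha + N_d/K. It supplies its own `Digamma` helper because .NET has no built-in one. `InferDocTopics` and `Digamma` are illustrative names.

```csharp
using System;

// Digamma via the recurrence psi(x) = psi(x + 1) - 1/x plus an asymptotic series (x > 0).
double Digamma(double x)
{
    double result = 0.0;
    while (x < 6.0) { result -= 1.0 / x; x += 1.0; }
    double inv = 1.0 / x, inv2 = inv * inv;
    result += Math.Log(x) - 0.5 * inv
        - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 * (1.0 / 240 - inv2 / 132))));
    return result;
}

// Hypothetical per-document E-step with fixed topic-word probabilities (K x V).
double[] InferDocTopics(double[] wordCounts, double[,] topicWord, double alpha,
                        int maxIter = 100, double tol = 1e-4)
{
    int k = topicWord.GetLength(0), v = topicWord.GetLength(1);
    double total = 0.0;
    foreach (var c in wordCounts) total += c;

    var gamma = new double[k];
    for (int t = 0; t < k; t++) gamma[t] = alpha + total / k; // common initialization
    var expElogTheta = new double[k];

    for (int iter = 0; iter < maxIter; iter++)
    {
        // exp(E[log theta_t]) under q(theta) = Dirichlet(gamma).
        double sum = 0.0;
        foreach (var g in gamma) sum += g;
        double psiSum = Digamma(sum);
        for (int t = 0; t < k; t++) expElogTheta[t] = Math.Exp(Digamma(gamma[t]) - psiSum);

        // gamma_t = alpha + sum_w n_w * phi_wt, with phi_wt proportional to expElogTheta_t * topicWord[t, w].
        var newGamma = new double[k];
        for (int t = 0; t < k; t++) newGamma[t] = alpha;
        for (int w = 0; w < v; w++)
        {
            if (wordCounts[w] == 0) continue;
            double norm = 1e-100; // guards against division by zero
            for (int t = 0; t < k; t++) norm += expElogTheta[t] * topicWord[t, w];
            for (int t = 0; t < k; t++)
                newGamma[t] += wordCounts[w] * expElogTheta[t] * topicWord[t, w] / norm;
        }

        // Stop when the mean absolute change in gamma falls below tol.
        double change = 0.0;
        for (int t = 0; t < k; t++) change += Math.Abs(newGamma[t] - gamma[t]);
        gamma = newGamma;
        if (change / k < tol) break;
    }

    // Normalize the Dirichlet parameters into a topic distribution that sums to 1.
    double gammaSum = 0.0;
    foreach (var g in gamma) gammaSum += g;
    var theta = new double[k];
    for (int t = 0; t < k; t++) theta[t] = gamma[t] / gammaSum;
    return theta;
}

var topicWordProbs = new double[,]
{
    { 0.45, 0.45, 0.05, 0.05 }, // "sports"-like topic
    { 0.05, 0.05, 0.45, 0.45 }, // "politics"-like topic
};
var docTheta = InferDocTopics(new double[] { 3, 2, 0, 1 }, topicWordProbs, alpha: 0.5);
Console.WriteLine(docTheta[0] > docTheta[1]); // True
```

Each row of the transformed output corresponds to one such normalized vector, which is why the result has one column per topic.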