Class UMAP<T>
- Namespace
- AiDotNet.Preprocessing.DimensionalityReduction
- Assembly
- AiDotNet.dll
Uniform Manifold Approximation and Projection for dimensionality reduction.
public class UMAP<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>
Type Parameters
- T: The numeric type for calculations (e.g., float, double).
- Inheritance
- TransformerBase<T, Matrix<T>, Matrix<T>>
- UMAP<T>
- Implements
- IDataTransformer<T, Matrix<T>, Matrix<T>>
Remarks
UMAP is a nonlinear dimensionality reduction technique that constructs a high-dimensional graph representation and optimizes a low-dimensional graph to be as structurally similar as possible. It is based on Riemannian geometry and algebraic topology.
Key advantages over t-SNE:
- Much faster (scales better to large datasets)
- Preserves more global structure
- Supports out-of-sample transformation
- More deterministic results
For Beginners: UMAP creates visualizations similar to t-SNE, but:
- It is faster, especially for large datasets
- Distances between clusters are more meaningful
- You can transform new data points without refitting
- It works well both for visualization and as a preprocessing step for machine learning
Example use cases:
- Visualizing high-dimensional data (gene expression, embeddings)
- Preprocessing features for classification
- Clustering analysis
- Anomaly detection
Constructors
UMAP(int, int, double, double, UMAPMetric, int, double, double, double, double, int?, int[]?)
Creates a new instance of UMAP<T>.
public UMAP(int nComponents = 2, int nNeighbors = 15, double minDist = 0.1, double spread = 1, UMAPMetric metric = UMAPMetric.Euclidean, int nEpochs = 200, double learningRate = 1, double negativeSampleRate = 5, double localConnectivity = 1, double repulsionStrength = 1, int? randomState = null, int[]? columnIndices = null)
Parameters
- nComponents (int): Target dimensionality (usually 2 or 3). Defaults to 2.
- nNeighbors (int): Number of neighbors for manifold approximation. Defaults to 15.
- minDist (double): Minimum distance between points in the embedding. Defaults to 0.1.
- spread (double): Effective scale of embedded points. Defaults to 1.0.
- metric (UMAPMetric): Distance metric to use. Defaults to Euclidean.
- nEpochs (int): Number of training epochs. Defaults to 200.
- learningRate (double): Learning rate for SGD. Defaults to 1.0.
- negativeSampleRate (double): Negative samples per positive sample. Defaults to 5.
- localConnectivity (double): Local connectivity constraint. Defaults to 1.0.
- repulsionStrength (double): Repulsion strength during optimization. Defaults to 1.0.
- randomState (int?): Random seed for reproducibility.
- columnIndices (int[]): The column indices to use, or null for all columns.
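As a minimal sketch of the constructor above, the following call overrides a few defaults by name and reads the values back through the properties documented on this page. It assumes that UMAPMetric lives in the same namespace as UMAP<T>; the parameter values themselves are illustrative only, not recommendations.

```csharp
using System;
using AiDotNet.Preprocessing.DimensionalityReduction; // assumption: UMAPMetric is also defined here

// Named arguments override only the parameters listed;
// every other parameter keeps the default from the signature above.
var umap = new UMAP<double>(
    nComponents: 3,               // embed into 3 dimensions instead of the default 2
    nNeighbors: 30,               // illustrative value
    minDist: 0.05,                // illustrative value
    metric: UMAPMetric.Euclidean, // same as the default, shown for clarity
    randomState: 42);             // fixed seed for reproducibility

// Read the configuration back through the public properties.
Console.WriteLine(umap.NComponents);
Console.WriteLine(umap.NNeighbors);
Console.WriteLine(umap.MinDist);
Console.WriteLine(umap.Metric);
```

Using named arguments keeps the call readable given the twelve optional parameters, and it avoids passing values positionally in the wrong order.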
Properties
Embedding
Gets the embedding result.
public double[,]? Embedding { get; }
Property Value
- double[,]
Metric
Gets the distance metric.
public UMAPMetric Metric { get; }
Property Value
- UMAPMetric
MinDist
Gets the minimum distance parameter.
public double MinDist { get; }
Property Value
- double
NComponents
Gets the number of components (dimensions).
public int NComponents { get; }
Property Value
- int
NNeighbors
Gets the number of neighbors.
public int NNeighbors { get; }
Property Value
- int
SupportsInverseTransform
Gets whether this transformer supports inverse transformation.
public override bool SupportsInverseTransform { get; }
Property Value
- bool
Methods
FitCore(Matrix<T>)
Fits UMAP and computes the embedding.
protected override void FitCore(Matrix<T> data)
Parameters
- data (Matrix<T>)
GetEmbedding()
Gets the embedding computed during Fit for the training data.
public Matrix<T> GetEmbedding()
Returns
- Matrix<T>
The embedding matrix for the training data.
GetFeatureNamesOut(string[]?)
Gets the output feature names after transformation.
public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)
Parameters
- inputFeatureNames (string[])
Returns
- string[]
InverseTransformCore(Matrix<T>)
Inverse transformation is not supported.
protected override Matrix<T> InverseTransformCore(Matrix<T> data)
Parameters
- data (Matrix<T>)
Returns
- Matrix<T>
TransformCore(Matrix<T>)
Transforms data using the fitted UMAP embedding.
protected override Matrix<T> TransformCore(Matrix<T> data)
Parameters
- data (Matrix<T>)
Returns
- Matrix<T>
Remarks
This method always performs out-of-sample transformation using the learned embedding space. To get the original training embedding, use GetEmbedding().
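The difference between the training embedding and an out-of-sample transformation can be sketched as follows. This is an illustration under several assumptions that this page does not confirm: that the base class exposes public Fit and Transform methods that call FitCore and TransformCore, that Matrix<T> has a (rows, columns) constructor and a [row, column] indexer, and that Matrix<T> is defined in the AiDotNet.LinearAlgebra namespace. RandomMatrix is a hypothetical helper written for this example.

```csharp
using System;
using AiDotNet.LinearAlgebra; // assumption: namespace that defines Matrix<T>
using AiDotNet.Preprocessing.DimensionalityReduction;

// 100 training rows and 5 previously unseen rows, each with 10 features.
Matrix<double> train = RandomMatrix(100, 10, seed: 0);
Matrix<double> fresh = RandomMatrix(5, 10, seed: 1);

var umap = new UMAP<double>(nComponents: 2, nNeighbors: 10, randomState: 42);

// Assumption: Fit is the public entry point that invokes FitCore.
umap.Fit(train);

// Embedding of the training rows, as computed during Fit.
Matrix<double> trainEmbedding = umap.GetEmbedding();

// Assumption: Transform is the public entry point that invokes TransformCore.
// It projects the unseen rows into the learned embedding space.
Matrix<double> freshEmbedding = umap.Transform(fresh);

// Hypothetical helper: fills a rows x cols matrix with seeded random values.
// Assumes a Matrix<T>(rows, columns) constructor and a [row, column] indexer.
static Matrix<double> RandomMatrix(int rows, int cols, int seed)
{
    var rng = new Random(seed);
    var m = new Matrix<double>(rows, cols);
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < cols; j++)
        {
            m[i, j] = rng.NextDouble();
        }
    }
    return m;
}
```

Because the transform path is always out-of-sample, passing the training matrix back through it may not reproduce the training embedding exactly; GetEmbedding() is the way to retrieve the result computed during fitting.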