Class ClusteringMetrics<T>
Represents clustering quality metrics for evaluating the performance of clustering algorithms.
public class ClusteringMetrics<T>
Type Parameters
TThe numeric type used for metric values, typically float or double.
- Inheritance
-
ClusteringMetrics<T>
- Inherited Members
Remarks
This class encapsulates various metrics used to assess the quality of clustering results. These metrics help determine how well data points are grouped into clusters and whether the clustering algorithm has produced meaningful, well-separated groups.
For Beginners: This class stores measurements that tell you how good your clustering is.
When you group data into clusters (like organizing customers into segments or grouping similar documents), you need to know if the grouping makes sense. This class provides several scores that help answer questions like:
- Are items in the same cluster similar to each other?
- Are different clusters well-separated from each other?
- How does your clustering compare to known "ground truth" groupings?
The metrics included are:
- Silhouette Score: Measures how well each item fits in its cluster (-1 to 1, higher is better)
- Calinski-Harabasz Index: Measures cluster separation (higher is better)
- Davies-Bouldin Index: Measures cluster compactness and separation (lower is better)
- Adjusted Rand Index: Compares clustering to ground truth labels (-1 to 1, higher is better)
These metrics are automatically calculated during cross-validation when your model produces cluster labels.
Constructors
ClusteringMetrics()
Initializes a new instance of the ClusteringMetrics class with default values (all metrics null).
public ClusteringMetrics()
ClusteringMetrics(T?, T?, T?, T?)
Initializes a new instance of the ClusteringMetrics class with specified metric values.
public ClusteringMetrics(T? silhouetteScore = default, T? calinskiHarabaszIndex = default, T? daviesBouldinIndex = default, T? adjustedRandIndex = default)
Parameters
silhouetteScoreTThe Silhouette Score value.
calinskiHarabaszIndexTThe Calinski-Harabasz Index value.
daviesBouldinIndexTThe Davies-Bouldin Index value.
adjustedRandIndexTThe Adjusted Rand Index value.
Properties
AdjustedRandIndex
Gets or sets the Adjusted Rand Index comparing clustering to ground truth labels.
public T? AdjustedRandIndex { get; set; }
Property Value
- T
The Adjusted Rand Index value, or null if ground truth labels are not available.
Remarks
For Beginners: This index compares your clustering to known "correct" labels, accounting for chance. - Range: -1 to 1 (can be slightly negative) - 1 = perfect match to ground truth - 0 = random clustering - Negative = worse than random
This metric requires ground truth labels to compare against. If your dataset has known categories (like product types, customer segments, etc.), this tells you how well your clustering recovered those categories.
CalinskiHarabaszIndex
Gets or sets the Calinski-Harabasz Index for the clustering.
public T? CalinskiHarabaszIndex { get; set; }
Property Value
- T
The Calinski-Harabasz Index value, or null if not calculated.
Remarks
For Beginners: This index measures how well-separated and compact your clusters are. - Higher values are better - No fixed maximum - higher means more distinct, well-separated clusters - Best used to compare different numbers of clusters on the same data
DaviesBouldinIndex
Gets or sets the Davies-Bouldin Index for the clustering.
public T? DaviesBouldinIndex { get; set; }
Property Value
- T
The Davies-Bouldin Index value, or null if not calculated.
Remarks
For Beginners: This index measures the average similarity between each cluster and its most similar neighbor. - Lower values are better - 0 = perfect clustering - Higher values mean clusters are less distinct
SilhouetteScore
Gets or sets the Silhouette Score for the clustering.
public T? SilhouetteScore { get; set; }
Property Value
- T
The Silhouette Score value, or null if not calculated.
Remarks
For Beginners: This score measures how well each data point fits into its assigned cluster. - Range: -1 to 1 - 1 = perfect clustering (points are very similar to their cluster and very different from other clusters) - 0 = clusters are overlapping or ambiguous - -1 = points might be in the wrong clusters