Class CountEncoder<T>
- Namespace
- AiDotNet.Preprocessing.Encoders
- Assembly
- AiDotNet.dll
Encodes categorical features using frequency counts.
public class CountEncoder<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>
Type Parameters
T
The numeric type for calculations (e.g., float, double).
- Inheritance
- TransformerBase<T, Matrix<T>, Matrix<T>> → CountEncoder<T>
- Implements
- IDataTransformer<T, Matrix<T>, Matrix<T>>
Remarks
CountEncoder replaces each category with its frequency count (number of occurrences) in the training data. This creates a continuous feature that captures category popularity.
Options include normalizing counts to probabilities (0-1 range) or applying a log1p transform, log(1 + count), to compress highly skewed count distributions.
For Beginners: Instead of creating one new column per category (as one-hot encoding does), frequency encoding replaces each category with how often it appears. For example, in a training set of 2,000 rows:
- Category "common" appearing 1000 times → 1000 (or 0.5 if normalized)
- Category "rare" appearing 10 times → 10 (or 0.005 if normalized)
This is useful when the popularity of a category is predictive of the target.
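The idea can be shown without the library. The following is a minimal C# sketch, not AiDotNet's implementation. The helper `CountEncodeColumn` is hypothetical and fits and transforms a single column of numeric category codes. It assumes that normalizing divides each count by the number of training rows. It also assumes that, when both options are set, log1p is applied after normalizing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the AiDotNet API): learns counts from one column of
// category codes and encodes that same column with them.
static double[] CountEncodeColumn(double[] column, bool normalize, bool logTransform)
{
    // Fit: count how often each category occurs.
    var counts = new Dictionary<double, double>();
    foreach (double category in column)
        counts[category] = counts.TryGetValue(category, out double seen) ? seen + 1 : 1;

    // Transform: replace each category with its (optionally rescaled) count.
    var encoded = new double[column.Length];
    for (int i = 0; i < column.Length; i++)
    {
        double value = counts[column[i]];
        if (normalize)
            value /= column.Length;        // count -> share of rows, in [0, 1]
        if (logTransform)
            value = Math.Log(1 + value);   // log1p; assumed to run after normalizing
        encoded[i] = value;
    }
    return encoded;
}

// Category 1 ("common") appears three times, category 2 ("rare") once.
double[] categories = { 1, 1, 2, 1 };
Console.WriteLine(string.Join(" ", CountEncodeColumn(categories, normalize: false, logTransform: false))); // 3 3 1 3
```

Passing `normalize: true` instead yields 0.75 for category 1 and 0.25 for category 2. Passing `logTransform: true` yields ln 4 ≈ 1.386 and ln 2 ≈ 0.693.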
Constructors
CountEncoder(bool, bool, CountEncoderHandleUnknown, double, int[]?)
Creates a new instance of CountEncoder<T>.
public CountEncoder(bool normalize = false, bool logTransform = false, CountEncoderHandleUnknown handleUnknown = CountEncoderHandleUnknown.UseValue, double unknownValue = 1, int[]? columnIndices = null)
Parameters
normalize bool
If true, normalize counts to probabilities (0-1). Defaults to false.
logTransform bool
If true, apply a log1p transform, log(1 + count), to the counts. Defaults to false.
handleUnknown CountEncoderHandleUnknown
How to handle categories that were not seen during fitting. Defaults to UseValue.
unknownValue double
Value to use for unseen categories when handleUnknown is UseValue. Defaults to 1.
columnIndices int[]?
The column indices to encode, or null to encode all columns.
Properties
CountMaps
Gets the count maps for each column.
public Dictionary<int, Dictionary<double, double>>? CountMaps { get; }
Property Value
- Dictionary<int, Dictionary<double, double>>?
HandleUnknown
Gets how unknown categories are handled.
public CountEncoderHandleUnknown HandleUnknown { get; }
Property Value
- CountEncoderHandleUnknown
LogTransform
Gets whether counts are log-transformed.
public bool LogTransform { get; }
Property Value
- bool
Normalize
Gets whether counts are normalized to probabilities.
public bool Normalize { get; }
Property Value
- bool
SupportsInverseTransform
Gets whether this transformer supports inverse transformation. This is false for CountEncoder<T>, because frequency encoding cannot be inverted.
public override bool SupportsInverseTransform { get; }
Property Value
- bool
Methods
FitCore(Matrix<T>)
Learns the frequency counts from the training data.
protected override void FitCore(Matrix<T> data)
Parameters
data Matrix<T>
The training data matrix.
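FitCore's result is exposed through CountMaps. The type of that property, Dictionary<int, Dictionary<double, double>>, suggests one map per encoded column, from category value to count. Below is a hypothetical C# sketch of such a fit step. It uses a plain `double[,]` in place of Matrix<T>. The function name `FitCounts` and the choice to store raw counts are assumptions, and the real encoder may store normalized or log-transformed values instead.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical fit step (not the AiDotNet source): learns one count map per
// selected column, in the same shape as CountMaps:
// column index -> (category value -> number of occurrences).
static Dictionary<int, Dictionary<double, double>> FitCounts(double[,] data, int[]? columnIndices)
{
    // A null columnIndices selects every column, as the constructor documents.
    IEnumerable<int> selected = columnIndices ?? Enumerable.Range(0, data.GetLength(1));

    var result = new Dictionary<int, Dictionary<double, double>>();
    foreach (int col in selected)
    {
        var counts = new Dictionary<double, double>();
        for (int row = 0; row < data.GetLength(0); row++)
        {
            double category = data[row, col];
            counts[category] = counts.TryGetValue(category, out double seen) ? seen + 1 : 1;
        }
        result[col] = counts;
    }
    return result;
}

double[,] training =
{
    { 10, 7 },
    { 10, 8 },
    { 20, 7 },
};
var countMaps = FitCounts(training, columnIndices: new[] { 0 });
Console.WriteLine(countMaps[0][10]);         // 2
Console.WriteLine(countMaps.ContainsKey(1)); // False
```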
GetFeatureNamesOut(string[]?)
Gets the output feature names after transformation.
public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)
Parameters
inputFeatureNames string[]?
Optional names of the input features, or null.
Returns
- string[]
The output feature names.
InverseTransformCore(Matrix<T>)
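Inverse transformation is not supported for frequency encoding. Categories that share the same count map to the same encoded value, so the original category cannot be recovered.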
protected override Matrix<T> InverseTransformCore(Matrix<T> data)
Parameters
data Matrix<T>
Returns
- Matrix<T>
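Frequency encoding discards category identity whenever two categories share a count. The hypothetical C# helper below is not part of AiDotNet. It makes the collision concrete by checking whether a count map could be inverted at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical check (not part of AiDotNet): a count map can be inverted only
// if every category has a different count. Any tie makes an encoded value ambiguous.
static bool IsInvertible(Dictionary<double, double> countMap) =>
    countMap.Values.Distinct().Count() == countMap.Count;

// Categories 10 and 30 both occur twice, so an encoded 2 could mean either one.
var colliding = new Dictionary<double, double> { [10] = 2, [20] = 1, [30] = 2 };
Console.WriteLine(IsInvertible(colliding)); // False
```

Ties like this are common in real data, especially among rare categories, which is why no general inverse exists.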
TransformCore(Matrix<T>)
Transforms the data by replacing categories with their frequency counts.
protected override Matrix<T> TransformCore(Matrix<T> data)
Parameters
data Matrix<T>
The data to transform.
Returns
- Matrix<T>
The frequency-encoded data.
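Below is a hypothetical C# sketch of the transform step. It uses `double[,]` in place of Matrix<T> and assumes the count maps already hold the final encoded values. Unseen categories fall back to `unknownValue`, which mirrors the documented UseValue behavior. The other CountEncoderHandleUnknown options are not modeled here. Columns without a map are copied unchanged, which is an assumption about how columns outside columnIndices are treated.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical transform step (not the AiDotNet source): replaces values in each
// mapped column with their learned counts. Categories unseen during fitting get
// unknownValue (the UseValue strategy, default 1). Unmapped columns are copied as-is.
static double[,] TransformCounts(
    double[,] data,
    Dictionary<int, Dictionary<double, double>> countMaps,
    double unknownValue = 1)
{
    int rows = data.GetLength(0), cols = data.GetLength(1);
    var output = new double[rows, cols];
    for (int row = 0; row < rows; row++)
    {
        for (int col = 0; col < cols; col++)
        {
            double value = data[row, col];
            if (countMaps.TryGetValue(col, out var counts))
                value = counts.TryGetValue(value, out double count) ? count : unknownValue;
            output[row, col] = value;
        }
    }
    return output;
}

var learned = new Dictionary<int, Dictionary<double, double>>
{
    [0] = new Dictionary<double, double> { [10] = 2, [20] = 1 },
};
double[,] incoming = { { 10, 5 }, { 99, 6 } };   // 99 was never seen in training
var encoded = TransformCounts(incoming, learned);
Console.WriteLine($"{encoded[0, 0]} {encoded[1, 0]} {encoded[1, 1]}"); // 2 1 6
```

A default `unknownValue` of 1 treats an unseen category as if it had appeared once, which places it alongside the rarest known categories.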