Class CountEncoder<T>

Namespace
AiDotNet.Preprocessing.Encoders
Assembly
AiDotNet.dll

Encodes categorical features using frequency counts.

public class CountEncoder<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>

Type Parameters

T

The numeric type for calculations (e.g., float, double).

Inheritance
TransformerBase<T, Matrix<T>, Matrix<T>>
CountEncoder<T>
Implements
IDataTransformer<T, Matrix<T>, Matrix<T>>

Remarks

CountEncoder replaces each category with its frequency count (number of occurrences) in the training data. This creates a continuous feature that captures category popularity.

Options include normalizing counts to probabilities (0-1 range) or log-transforming the counts to handle highly skewed distributions.

For Beginners: Instead of creating multiple columns, frequency encoding replaces each category with how often it appears. For example, in a training set of 2000 rows:

- Category "common" appearing 1000 times → 1000 (or 0.5 if normalized)
- Category "rare" appearing 10 times → 10 (or 0.005 if normalized)

This is useful when the popularity of a category is predictive of the target.
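
The counting and normalization described above can be illustrated with a minimal, self-contained sketch in plain C#. It does not use any AiDotNet types; the class name CountEncodingSketch and its methods are hypothetical and only show the idea, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountEncodingSketch
{
    // Count how many times each category code occurs in one column.
    public static Dictionary<double, double> CountCategories(IEnumerable<double> column)
    {
        var counts = new Dictionary<double, double>();
        foreach (var value in column)
        {
            counts[value] = counts.TryGetValue(value, out var n) ? n + 1 : 1;
        }
        return counts;
    }

    // Convert raw counts to relative frequencies that sum to 1.
    public static Dictionary<double, double> Normalize(Dictionary<double, double> counts)
    {
        double total = counts.Values.Sum();
        return counts.ToDictionary(kv => kv.Key, kv => kv.Value / total);
    }
}
```

For a column holding 1000 copies of one code, 10 of a second, and 990 of a third (2000 rows in total), CountCategories yields 1000 and 10 for the first two codes, and Normalize yields 0.5 and 0.005, matching the figures in the beginner example.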

Constructors

CountEncoder(bool, bool, CountEncoderHandleUnknown, double, int[]?)

Creates a new instance of CountEncoder<T>.

public CountEncoder(bool normalize = false, bool logTransform = false, CountEncoderHandleUnknown handleUnknown = CountEncoderHandleUnknown.UseValue, double unknownValue = 1, int[]? columnIndices = null)

Parameters

normalize bool

If true, normalize counts to probabilities (0-1). Defaults to false.

logTransform bool

If true, apply a log1p transform, log(1 + count), to the counts. Defaults to false.

handleUnknown CountEncoderHandleUnknown

How to handle unknown categories. Defaults to UseValue.

unknownValue double

Value to use for unknown categories when HandleUnknown is UseValue. Defaults to 1.

columnIndices int[]

The column indices to encode, or null for all columns.
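
To make the logTransform and unknownValue options more concrete, here is a small sketch in plain C#. It is an assumption-laden illustration, not the library's code: CountOptionsSketch, Log1p, and EncodeOrDefault are hypothetical names, and Math.Log(1.0 + count) stands in for a dedicated log1p routine. The sketch does not show how the encoder combines these options with each other, since this page does not specify that.

```csharp
using System;
using System.Collections.Generic;

public static class CountOptionsSketch
{
    // log1p(count) = ln(1 + count): compresses large counts and maps 0 to 0.
    public static double Log1p(double count) => Math.Log(1.0 + count);

    // Look up a category's count; fall back to unknownValue when the
    // category was not present in the fitted counts.
    public static double EncodeOrDefault(
        IReadOnlyDictionary<double, double> counts,
        double category,
        double unknownValue = 1.0)
    {
        return counts.TryGetValue(category, out var count) ? count : unknownValue;
    }
}
```

The fallback mirrors the documented default of 1 for unknownValue. Members of CountEncoderHandleUnknown other than UseValue are not listed on this page, so they are left out of the sketch.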

Properties

CountMaps

Gets the count maps for each column.

public Dictionary<int, Dictionary<double, double>>? CountMaps { get; }

Property Value

Dictionary<int, Dictionary<double, double>>
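
The property type suggests a mapping from column index to a per-column dictionary of category value to count. The sketch below builds a structure of that shape from a jagged array, under that reading. CountMapsSketch.Build is a hypothetical helper written for illustration; it is not part of AiDotNet and does not use Matrix<T>.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

public static class CountMapsSketch
{
    // rows[r][c] holds the category code in row r, column c.
    // columnIndices selects the columns to count; null means all columns.
    public static Dictionary<int, Dictionary<double, double>> Build(
        double[][] rows, int[]? columnIndices = null)
    {
        int columnCount = rows.Length == 0 ? 0 : rows[0].Length;
        int[] columns = columnIndices ?? Enumerable.Range(0, columnCount).ToArray();

        var maps = new Dictionary<int, Dictionary<double, double>>();
        foreach (int c in columns)
        {
            var counts = new Dictionary<double, double>();
            foreach (var row in rows)
            {
                counts[row[c]] = counts.TryGetValue(row[c], out var n) ? n + 1 : 1;
            }
            maps[c] = counts;
        }
        return maps;
    }
}
```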

HandleUnknown

Gets how unknown categories are handled.

public CountEncoderHandleUnknown HandleUnknown { get; }

Property Value

CountEncoderHandleUnknown

LogTransform

Gets whether counts are log-transformed.

public bool LogTransform { get; }

Property Value

bool

Normalize

Gets whether counts are normalized to probabilities.

public bool Normalize { get; }

Property Value

bool

SupportsInverseTransform

Gets whether this transformer supports inverse transformation.

public override bool SupportsInverseTransform { get; }

Property Value

bool

Methods

FitCore(Matrix<T>)

Learns the frequency counts from the training data.

protected override void FitCore(Matrix<T> data)

Parameters

data Matrix<T>

The training data matrix.

GetFeatureNamesOut(string[]?)

Gets the output feature names after transformation.

public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)

Parameters

inputFeatureNames string[]

The names of the input features, or null if none are provided.

Returns

string[]

InverseTransformCore(Matrix<T>)

Inverse transformation is not supported for frequency encoding.

protected override Matrix<T> InverseTransformCore(Matrix<T> data)

Parameters

data Matrix<T>

Returns

Matrix<T>
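
One plausible reason an inverse cannot exist in general is that count encoding is not one-to-one: two different categories can occur the same number of times and so receive the same encoded value. The hypothetical helper below (plain C#, not part of AiDotNet) lists every category that shares a given encoded count.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class InverseAmbiguitySketch
{
    // Returns every category whose count equals the encoded value, in ascending order.
    public static List<double> CandidatesFor(Dictionary<double, double> counts, double encoded)
    {
        return counts
            .Where(kv => kv.Value == encoded)
            .Select(kv => kv.Key)
            .OrderBy(k => k)
            .ToList();
    }
}
```

When more than one candidate comes back, the original category cannot be recovered from the encoded value alone.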

TransformCore(Matrix<T>)

Transforms the data by replacing categories with their frequency counts.

protected override Matrix<T> TransformCore(Matrix<T> data)

Parameters

data Matrix<T>

The data to transform.

Returns

Matrix<T>

The frequency encoded data.
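
As a rough picture of the transform step, the sketch below replaces each value in the encoded columns with its fitted count. It is written against jagged arrays rather than Matrix<T>, and it makes two assumptions that this page does not confirm: columns without a count map are copied through unchanged, and unseen categories receive unknownValue. TransformSketch.Apply is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class TransformSketch
{
    // maps: column index -> (category value -> count), as produced during fitting.
    public static double[][] Apply(
        double[][] rows,
        IReadOnlyDictionary<int, Dictionary<double, double>> maps,
        double unknownValue = 1.0)
    {
        var result = new double[rows.Length][];
        for (int r = 0; r < rows.Length; r++)
        {
            // Start from a copy so the input rows are not modified.
            result[r] = (double[])rows[r].Clone();
            foreach (var entry in maps)
            {
                int column = entry.Key;
                result[r][column] = entry.Value.TryGetValue(rows[r][column], out var count)
                    ? count
                    : unknownValue;
            }
        }
        return result;
    }
}
```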