Class CatBoostEncoder<T>

Namespace
AiDotNet.Preprocessing.Encoders
Assembly
AiDotNet.dll

Encodes categorical features using ordered (CatBoost-style) target encoding.

public class CatBoostEncoder<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>

Type Parameters

T

The numeric type for calculations (e.g., float, double).

Inheritance
TransformerBase<T, Matrix<T>, Matrix<T>>
CatBoostEncoder<T>
Implements
IDataTransformer<T, Matrix<T>, Matrix<T>>

Remarks

CatBoostEncoder applies an ordered approach to target encoding that prevents target leakage by only using target values from previous samples when encoding. This is the same technique used in the CatBoost gradient boosting library.

For each sample, the encoding is computed as: (sum of targets for previous samples with the same category + prior) / (number of previous samples with the same category + 1)

For Beginners: Regular target encoding can "cheat" by using future information. CatBoost encoding prevents this:

- When encoding row 10, it only uses data from rows 1-9
- Row 1 always gets the prior (global mean) since there's nothing before it
- This prevents overfitting and works better with gradient boosting
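
The ordered formula above can be illustrated with a minimal sketch in plain C#. This is not the AiDotNet implementation: the class name OrderedTargetEncodingSketch, the Encode method, and the use of int category codes with plain arrays are assumptions made for illustration, and the code simply follows the formula as it is stated in these remarks.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only: hypothetical names, not the AiDotNet implementation.
public static class OrderedTargetEncodingSketch
{
    // Encodes one categorical column in row order using the stated formula:
    // (sum of earlier targets in the category + prior) / (earlier count + 1).
    public static double[] Encode(int[] categories, double[] targets, double prior)
    {
        if (categories.Length != targets.Length)
            throw new ArgumentException("categories and targets must have the same length.");

        var sums = new Dictionary<int, double>();
        var counts = new Dictionary<int, int>();
        var encoded = new double[categories.Length];

        for (int i = 0; i < categories.Length; i++)
        {
            int category = categories[i];
            sums.TryGetValue(category, out double sum);   // 0 if the category is unseen
            counts.TryGetValue(category, out int count);  // 0 if the category is unseen

            // At this point sum and count reflect only the rows before row i.
            encoded[i] = (sum + prior) / (count + 1);

            // The current row's target is added only after the row is encoded.
            sums[category] = sum + targets[i];
            counts[category] = count + 1;
        }

        return encoded;
    }
}
```

With categories { 0, 0, 1, 0 }, targets { 1.0, 0.0, 1.0, 1.0 } and a prior of 0.5, this sketch returns 0.5, 0.75, 0.5, 0.5. The first row of each category receives the prior because no earlier rows contribute to it.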

Constructors

CatBoostEncoder(double, CatBoostHandleUnknown, int, int[]?)

Creates a new instance of CatBoostEncoder<T>.

public CatBoostEncoder(double prior = 1, CatBoostHandleUnknown handleUnknown = CatBoostHandleUnknown.UseGlobalMean, int randomState = 0, int[]? columnIndices = null)

Parameters

prior double

Prior weight for regularization. Higher values add more smoothing. Defaults to 1.0.

handleUnknown CatBoostHandleUnknown

How to handle unknown categories. Defaults to UseGlobalMean.

randomState int

Random seed used when shuffling the sample order. Defaults to 0.

columnIndices int[]

The column indices to encode, or null for all columns.
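
Because ordered encoding depends on row order, a fixed seed makes a shuffled order reproducible. The sketch below shows one way a seeded row permutation could be produced. How the encoder actually shuffles is not documented on this page, so the Fisher-Yates approach and the names SeededOrderSketch and ShuffledRowOrder are assumptions for illustration.

```csharp
using System;

// Illustrative sketch only: hypothetical names, not the AiDotNet implementation.
public static class SeededOrderSketch
{
    // Returns a permutation of 0..rowCount-1 built with a Fisher-Yates shuffle.
    // The generator is seeded, so the same seed yields the same order
    // on a given runtime.
    public static int[] ShuffledRowOrder(int rowCount, int seed)
    {
        var order = new int[rowCount];
        for (int i = 0; i < rowCount; i++)
        {
            order[i] = i;
        }

        var rng = new Random(seed);
        for (int i = rowCount - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            (order[i], order[j]) = (order[j], order[i]);
        }

        return order;
    }
}
```

Calling ShuffledRowOrder twice with the same rowCount and seed gives identical permutations, which is what makes an ordered encoding repeatable between runs.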

Properties

GlobalMean

Gets the global target mean.

public double GlobalMean { get; }

Property Value

double

HandleUnknown

Gets how unknown categories are handled.

public CatBoostHandleUnknown HandleUnknown { get; }

Property Value

CatBoostHandleUnknown

Prior

Gets the prior value (regularization).

public double Prior { get; }

Property Value

double

SupportsInverseTransform

Gets whether this transformer supports inverse transformation.

public override bool SupportsInverseTransform { get; }

Property Value

bool

Methods

Fit(Matrix<T>, Vector<T>)

Fits the encoder by computing category statistics.

public void Fit(Matrix<T> data, Vector<T> target)

Parameters

data Matrix<T>

The feature matrix to fit.

target Vector<T>

The target values.

FitCore(Matrix<T>)

Core fit routine. This encoder needs target values, which are supplied through the specialized Fit(Matrix<T>, Vector<T>) method.

protected override void FitCore(Matrix<T> data)

Parameters

data Matrix<T>

FitTransform(Matrix<T>, Vector<T>)

Fits and transforms using ordered target encoding (CatBoost style).

public Matrix<T> FitTransform(Matrix<T> data, Vector<T> target)

Parameters

data Matrix<T>

The feature matrix.

target Vector<T>

The target values.

Returns

Matrix<T>

The encoded data with ordered target statistics.

GetFeatureNamesOut(string[]?)

Gets the output feature names after transformation.

public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)

Parameters

inputFeatureNames string[]

Returns

string[]

InverseTransformCore(Matrix<T>)

Inverse transformation is not supported.

protected override Matrix<T> InverseTransformCore(Matrix<T> data)

Parameters

data Matrix<T>

Returns

Matrix<T>

TransformCore(Matrix<T>)

Transforms test data using full category statistics (for inference).

protected override Matrix<T> TransformCore(Matrix<T> data)

Parameters

data Matrix<T>

Returns

Matrix<T>
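
As a rough illustration of encoding with full category statistics, the sketch below accumulates per-category sums and counts over the whole training column and then encodes new rows from those totals. It is not the AiDotNet implementation. It assumes the same smoothing formula as the ordered case, assumes unseen categories fall back to the global target mean (the documented UseGlobalMean default), and uses hypothetical names (FullStatisticsEncodingSketch, CategoryStatistics) with plain arrays instead of Matrix<T>.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only: hypothetical names, not the AiDotNet implementation.
public static class FullStatisticsEncodingSketch
{
    // Per-category totals gathered from the whole training column.
    public sealed class CategoryStatistics
    {
        public Dictionary<int, double> Sums { get; } = new Dictionary<int, double>();
        public Dictionary<int, int> Counts { get; } = new Dictionary<int, int>();
        public double GlobalMean { get; set; }
    }

    // Accumulates target sums and counts per category, plus the global mean.
    public static CategoryStatistics Fit(int[] categories, double[] targets)
    {
        if (categories.Length != targets.Length || targets.Length == 0)
            throw new ArgumentException("Inputs must be non-empty and of equal length.");

        var stats = new CategoryStatistics();
        double total = 0.0;
        for (int i = 0; i < categories.Length; i++)
        {
            stats.Sums.TryGetValue(categories[i], out double sum);
            stats.Counts.TryGetValue(categories[i], out int count);
            stats.Sums[categories[i]] = sum + targets[i];
            stats.Counts[categories[i]] = count + 1;
            total += targets[i];
        }

        stats.GlobalMean = total / targets.Length;
        return stats;
    }

    // Encodes new rows from the full statistics. No targets are needed here.
    public static double[] Transform(int[] categories, CategoryStatistics stats, double prior)
    {
        var encoded = new double[categories.Length];
        for (int i = 0; i < categories.Length; i++)
        {
            if (stats.Counts.TryGetValue(categories[i], out int count))
            {
                // Assumed smoothing: same form as the ordered formula.
                encoded[i] = (stats.Sums[categories[i]] + prior) / (count + 1);
            }
            else
            {
                // Assumed handling of a category never seen during Fit.
                encoded[i] = stats.GlobalMean;
            }
        }

        return encoded;
    }
}
```

Unlike the ordered pass, every row of a known category receives the same value here, because the statistics no longer depend on row position.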

TransformWithTarget(Matrix<T>, Vector<T>)

Transforms training data using ordered encoding (only uses previous samples).

public Matrix<T> TransformWithTarget(Matrix<T> data, Vector<T> target)

Parameters

data Matrix<T>

The data to transform.

target Vector<T>

The target values (for ordered calculation).

Returns

Matrix<T>

The encoded data.