Class CatBoostEncoder<T>
Namespace
- AiDotNet.Preprocessing.Encoders
Assembly
- AiDotNet.dll
Encodes categorical features using ordered (CatBoost-style) target encoding.
public class CatBoostEncoder<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>
Type Parameters
T: The numeric type for calculations (e.g., float, double).
Inheritance
- TransformerBase<T, Matrix<T>, Matrix<T>>
- CatBoostEncoder<T>
Implements
- IDataTransformer<T, Matrix<T>, Matrix<T>>
Remarks
CatBoostEncoder applies an ordered approach to target encoding that prevents target leakage by only using target values from previous samples when encoding. This is the same technique used in the CatBoost gradient boosting library.
For each sample, the encoding is computed as: (sum of targets for previous samples with the same category + prior) / (count of previous samples with the same category + 1)
For Beginners: Regular target encoding can "cheat" by using future information. CatBoost encoding prevents this:
- When encoding row 10, it only uses data from rows 1-9
- Row 1 always gets the prior (global mean) since there's nothing before it
- This prevents overfitting and works better with gradient boosting
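The ordered scheme described above can be sketched in a few lines of plain C#. This is a minimal illustration for a single column, not the library's implementation: the class name `OrderedTargetEncodingSketch`, the integer category codes, and the use of the input row order (rather than a shuffled order seeded by `randomState`) are all assumptions made for the example. It applies the formula from the remarks exactly as written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class OrderedTargetEncodingSketch
{
    // Encodes one categorical column in row order.
    // Row i only sees the targets of rows 0..i-1 that share its category.
    public static double[] Encode(int[] categories, double[] targets, double prior)
    {
        if (categories.Length != targets.Length)
            throw new ArgumentException("categories and targets must have the same length.");

        var runningSum = new Dictionary<int, double>();
        var runningCount = new Dictionary<int, int>();
        var encoded = new double[categories.Length];

        for (int i = 0; i < categories.Length; i++)
        {
            int category = categories[i];
            runningSum.TryGetValue(category, out double sum);   // stays 0 if unseen so far
            runningCount.TryGetValue(category, out int count);  // stays 0 if unseen so far

            // Formula from the remarks: (sum + prior) / (count + 1).
            encoded[i] = (sum + prior) / (count + 1);

            // Update the statistics only after the current row has been encoded,
            // so a row never contributes to its own encoding.
            runningSum[category] = sum + targets[i];
            runningCount[category] = count + 1;
        }

        return encoded;
    }
}
```

Calling `OrderedTargetEncodingSketch.Encode(new[] { 0, 0, 1, 0 }, new[] { 1.0, 0.0, 1.0, 1.0 }, 0.5)` yields `[0.5, 0.75, 0.5, 0.5]`: the first occurrence of each category receives the prior, and later rows blend in the targets seen so far. In this sketch, passing the global target mean as `prior` reproduces the "row 1 gets the global mean" behaviour from the list above.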
Constructors
CatBoostEncoder(double, CatBoostHandleUnknown, int, int[]?)
Creates a new instance of CatBoostEncoder<T>.
public CatBoostEncoder(double prior = 1, CatBoostHandleUnknown handleUnknown = CatBoostHandleUnknown.UseGlobalMean, int randomState = 0, int[]? columnIndices = null)
Parameters
prior (double): Prior weight for regularization. Higher values add more smoothing. Defaults to 1.0.
handleUnknown (CatBoostHandleUnknown): How to handle unknown categories. Defaults to UseGlobalMean.
randomState (int): Random seed for shuffling order. Defaults to 0.
columnIndices (int[]): The column indices to encode, or null for all columns.
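As a usage sketch, the constructor can be called with defaults or with named arguments matching the signature above. The example assumes that `CatBoostHandleUnknown` lives in the same namespace as the encoder, which this page does not state; the column indices are arbitrary illustration values.

```csharp
using AiDotNet.Preprocessing.Encoders;

// All defaults: prior = 1, UseGlobalMean for unknown categories,
// randomState = 0, every column encoded.
var defaultEncoder = new CatBoostEncoder<double>();

// Stronger smoothing, a fixed seed, and only columns 0 and 3 encoded.
// Assumption: CatBoostHandleUnknown is in AiDotNet.Preprocessing.Encoders.
var customEncoder = new CatBoostEncoder<double>(
    prior: 2.0,
    handleUnknown: CatBoostHandleUnknown.UseGlobalMean,
    randomState: 42,
    columnIndices: new[] { 0, 3 });
```

Named arguments keep the call readable, since two of the four parameters are plain integers.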
Properties
GlobalMean
Gets the global target mean.
public double GlobalMean { get; }
Property Value
- double
HandleUnknown
Gets how unknown categories are handled.
public CatBoostHandleUnknown HandleUnknown { get; }
Property Value
- CatBoostHandleUnknown
Prior
Gets the prior value (regularization).
public double Prior { get; }
Property Value
- double
SupportsInverseTransform
Gets whether this transformer supports inverse transformation.
public override bool SupportsInverseTransform { get; }
Property Value
- bool
Methods
Fit(Matrix<T>, Vector<T>)
Fits the encoder by computing category statistics.
public void Fit(Matrix<T> data, Vector<T> target)
Parameters
data (Matrix<T>): The feature matrix to fit.
target (Vector<T>): The target values.
FitCore(Matrix<T>)
Fits the encoder without target values. This encoder requires a target, so use Fit(Matrix<T>, Vector<T>) instead.
protected override void FitCore(Matrix<T> data)
Parameters
data (Matrix<T>)
FitTransform(Matrix<T>, Vector<T>)
Fits and transforms using ordered target encoding (CatBoost style).
public Matrix<T> FitTransform(Matrix<T> data, Vector<T> target)
Parameters
data (Matrix<T>): The feature matrix.
target (Vector<T>): The target values.
Returns
- Matrix<T>
The encoded data with ordered target statistics.
GetFeatureNamesOut(string[]?)
Gets the output feature names after transformation.
public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)
Parameters
inputFeatureNames (string[])
Returns
- string[]
InverseTransformCore(Matrix<T>)
Inverse transformation is not supported.
protected override Matrix<T> InverseTransformCore(Matrix<T> data)
Parameters
data (Matrix<T>)
Returns
- Matrix<T>
TransformCore(Matrix<T>)
Transforms test data using full category statistics (for inference).
protected override Matrix<T> TransformCore(Matrix<T> data)
Parameters
data (Matrix<T>)
Returns
- Matrix<T>
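To illustrate what "full category statistics" could mean at inference time, the sketch below accumulates the target sum and count per category over the whole training column and then encodes new rows from those totals. It is an assumption-laden illustration, not the library's code: the class `FullStatisticsEncodingSketch` is hypothetical, the smoothing reuses the formula from the remarks, and unseen categories fall back to the global mean in the spirit of the default `UseGlobalMean` option.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class FullStatisticsEncodingSketch
{
    // Collects the target sum and row count per category over all training rows.
    public static Dictionary<int, (double Sum, int Count)> Fit(int[] categories, double[] targets)
    {
        if (categories.Length != targets.Length)
            throw new ArgumentException("categories and targets must have the same length.");

        var stats = new Dictionary<int, (double Sum, int Count)>();
        for (int i = 0; i < categories.Length; i++)
        {
            stats.TryGetValue(categories[i], out var s); // defaults to (0, 0) when missing
            stats[categories[i]] = (s.Sum + targets[i], s.Count + 1);
        }
        return stats;
    }

    // Encodes unseen data from the full statistics. No targets are needed here.
    public static double[] Transform(
        int[] categories,
        Dictionary<int, (double Sum, int Count)> stats,
        double prior,
        double globalMean)
    {
        var encoded = new double[categories.Length];
        for (int i = 0; i < categories.Length; i++)
        {
            encoded[i] = stats.TryGetValue(categories[i], out var s)
                ? (s.Sum + prior) / (s.Count + 1) // assumed: same smoothing as in training
                : globalMean;                      // assumed fallback for unknown categories
        }
        return encoded;
    }
}
```

Unlike the ordered pass used on training data, every test row sees the same per-category totals, so the result does not depend on row order.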
TransformWithTarget(Matrix<T>, Vector<T>)
Transforms training data using ordered encoding (only uses previous samples).
public Matrix<T> TransformWithTarget(Matrix<T> data, Vector<T> target)
Parameters
data (Matrix<T>): The data to transform.
target (Vector<T>): The target values (for ordered calculation).
Returns
- Matrix<T>
The encoded data.
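A typical train/test workflow might look like the sketch below. Two details are assumptions rather than facts from this page: the `AiDotNet.LinearAlgebra` namespace for `Matrix<T>` and `Vector<T>`, and a public `Transform(Matrix<T>)` method inherited from `TransformerBase` that dispatches to `TransformCore`. The wrapper class `CatBoostEncoderWorkflow` is hypothetical.

```csharp
using AiDotNet.LinearAlgebra;            // assumed namespace for Matrix<T> and Vector<T>
using AiDotNet.Preprocessing.Encoders;

// Hypothetical wrapper for illustration only.
public static class CatBoostEncoderWorkflow
{
    public static (Matrix<double> Train, Matrix<double> Test) Encode(
        Matrix<double> trainX, Vector<double> trainY, Matrix<double> testX)
    {
        var encoder = new CatBoostEncoder<double>(prior: 1.0, randomState: 0);

        // Training rows: ordered encoding, each row sees only earlier targets.
        Matrix<double> encodedTrain = encoder.FitTransform(trainX, trainY);

        // Test rows: no targets available, so full category statistics are used.
        // Assumption: Transform is the public entry point provided by the base class.
        Matrix<double> encodedTest = encoder.Transform(testX);

        return (encodedTrain, encodedTest);
    }
}
```

The design point is that targets are passed only for training data; at inference time the encoder relies on the statistics it learned during fitting.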