Class TargetEncoder<T>
- Namespace
- AiDotNet.Preprocessing.Encoders
- Assembly
- AiDotNet.dll
Encodes categorical features using target mean encoding.
public class TargetEncoder<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>
Type Parameters
T: The numeric type for calculations (e.g., float, double).
- Inheritance
- TransformerBase<T, Matrix<T>, Matrix<T>>
- TargetEncoder<T>
- Implements
- IDataTransformer<T, Matrix<T>, Matrix<T>>
Remarks
TargetEncoder replaces each category with the mean of the target variable for that category. This creates a continuous feature that captures the relationship between the category and the target.
To prevent overfitting, especially with rare categories, smoothing is applied:
encoding = (count * category_mean + smoothing * global_mean) / (count + smoothing)
Here count is the number of training rows in the category, category_mean is the target mean over those rows, and global_mean is the target mean over all rows.
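A worked instance of the smoothing formula may make its effect easier to see. The symbols are shorthand introduced here, not names from the library: n is count, s is smoothing, \bar{y}_c is category_mean, and \bar{y} is global_mean. The numbers are made up for illustration, with s = 1 and a global mean of 0.5.

```latex
\[
\text{encoding} = \frac{n \cdot \bar{y}_c + s \cdot \bar{y}}{n + s}
\]

% Rare category: n = 2 rows, category mean 0.9
\[
\frac{2 \cdot 0.9 + 1 \cdot 0.5}{2 + 1} = \frac{2.3}{3} \approx 0.767
\]

% Frequent category: n = 200 rows, same category mean 0.9
\[
\frac{200 \cdot 0.9 + 1 \cdot 0.5}{200 + 1} = \frac{180.5}{201} \approx 0.898
\]
```

The rare category is pulled noticeably toward the global mean, while the frequent one stays close to its own mean.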
For Beginners: Instead of one-hot encoding (many columns), target encoding creates a single column per feature containing the average target value for each category:
- Category "A" with average target 0.8 becomes 0.8
- Category "B" with average target 0.3 becomes 0.3
This is especially useful for high-cardinality features, where one-hot encoding would create too many columns.
Constructors
TargetEncoder(double, double, TargetEncoderHandleUnknown, int[]?)
Creates a new instance of TargetEncoder<T>.
public TargetEncoder(double smoothing = 1, double minSamplesLeaf = 1, TargetEncoderHandleUnknown handleUnknown = TargetEncoderHandleUnknown.UseGlobalMean, int[]? columnIndices = null)
Parameters
smoothing (double): Smoothing parameter. Higher values give more weight to the global mean. Defaults to 1.0.
minSamplesLeaf (double): Minimum number of samples required to compute a category mean. Categories with fewer samples use the global mean. Defaults to 1.
handleUnknown (TargetEncoderHandleUnknown): How to handle unknown categories during transform. Defaults to UseGlobalMean.
columnIndices (int[]?): The column indices to encode, or null for all columns.
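The interaction of these parameters can be sketched for a single column without the library. This is a minimal illustration, not the AiDotNet implementation: the class and method names (TargetEncodingSketch, FitColumn, TransformColumn) are hypothetical, the way minSamplesLeaf and smoothing combine is an assumption based on the parameter descriptions above, and unknown categories are mapped to the global mean as the UseGlobalMean default suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class TargetEncodingSketch
{
    // Learns one smoothed target mean per category value for a single column.
    public static (Dictionary<double, double> Map, double GlobalMean) FitColumn(
        double[] categories, double[] target,
        double smoothing = 1.0, double minSamplesLeaf = 1.0)
    {
        if (categories.Length != target.Length)
            throw new ArgumentException("Target length must match the number of rows.");

        double globalMean = target.Average();
        var map = new Dictionary<double, double>();

        // Pair each category value with its target value, then group by category.
        var groups = categories
            .Zip(target, (c, y) => (Category: c, Target: y))
            .GroupBy(p => p.Category, p => p.Target);

        foreach (var group in groups)
        {
            int count = group.Count();
            double categoryMean = group.Average();

            // Assumption: categories with fewer than minSamplesLeaf rows
            // fall back to the global mean; all others are smoothed.
            map[group.Key] = count < minSamplesLeaf
                ? globalMean
                : (count * categoryMean + smoothing * globalMean) / (count + smoothing);
        }

        return (map, globalMean);
    }

    // Replaces each category with its learned encoding.
    // Categories not seen during fitting get the global mean.
    public static double[] TransformColumn(
        double[] categories, Dictionary<double, double> map, double globalMean)
    {
        return categories
            .Select(c => map.TryGetValue(c, out double encoded) ? encoded : globalMean)
            .ToArray();
    }
}
```

With categories {0, 0, 1, 1, 2, 3}, targets {1, 1, 0, 0, 1, 0}, smoothing 1 and minSamplesLeaf 2, the global mean is 0.5, category 0 encodes to 2.5 / 3 and category 1 to 0.5 / 3, while the single-row categories 2 and 3 fall back to 0.5 under the assumption above.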
Properties
EncodingMaps
Gets the encoding maps for each column.
public Dictionary<int, Dictionary<double, double>>? EncodingMaps { get; }
Property Value
- Dictionary<int, Dictionary<double, double>>
HandleUnknown
Gets how unknown categories are handled during transform.
public TargetEncoderHandleUnknown HandleUnknown { get; }
Property Value
- TargetEncoderHandleUnknown
Smoothing
Gets the smoothing parameter used during encoding.
public double Smoothing { get; }
Property Value
- double
SupportsInverseTransform
Gets whether this transformer supports inverse transformation.
public override bool SupportsInverseTransform { get; }
Property Value
- bool
Methods
Fit(Matrix<T>, Vector<T>)
Fits the encoder by learning the target means for each category.
public void Fit(Matrix<T> data, Vector<T> target)
Parameters
data (Matrix<T>): The feature matrix to fit.
target (Vector<T>): The target values used to compute means.
Exceptions
- ArgumentException
Thrown if the length of target does not match the number of rows in data.
FitCore(Matrix<T>)
Not supported. Target encoding needs target values, so fitting through the base Fit method always throws. Use Fit(Matrix<T>, Vector<T>) instead.
protected override void FitCore(Matrix<T> data)
Parameters
data (Matrix<T>): The feature matrix.
Exceptions
- InvalidOperationException
Always thrown. Use Fit(Matrix<T>, Vector<T>) instead.
FitTransform(Matrix<T>, Vector<T>)
Fits the encoder and transforms the data in one step.
public Matrix<T> FitTransform(Matrix<T> data, Vector<T> target)
Parameters
data (Matrix<T>): The feature matrix to fit and transform.
target (Vector<T>): The target values used to compute means.
Returns
- Matrix<T>
The encoded data.
GetFeatureNamesOut(string[]?)
Gets the output feature names after transformation.
public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)
Parameters
inputFeatureNames (string[]?)
Returns
- string[]
InverseTransformCore(Matrix<T>)
Inverse transformation is not supported for target encoding.
protected override Matrix<T> InverseTransformCore(Matrix<T> data)
Parameters
data (Matrix<T>)
Returns
- Matrix<T>
TransformCore(Matrix<T>)
Transforms the data by replacing categories with their target means.
protected override Matrix<T> TransformCore(Matrix<T> data)
Parameters
data (Matrix<T>): The data to transform.
Returns
- Matrix<T>
The encoded data.