Class WOEEncoder<T>
- Namespace
- AiDotNet.Preprocessing.Encoders
- Assembly
- AiDotNet.dll
Encodes categorical features using Weight of Evidence (WOE).
public class WOEEncoder<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>
Type Parameters
T: The numeric type for calculations (e.g., float, double).
- Inheritance
- TransformerBase<T, Matrix<T>, Matrix<T>>
WOEEncoder<T>
- Implements
- Inherited Members
Remarks
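Weight of Evidence (WOE) is commonly used in credit scoring and binary classification. It measures the strength of the relationship between a category and the binary target:
WOE = ln(Distribution of Events / Distribution of Non-Events)
Here, the distribution of events is the category's share of all positive-class rows, and the distribution of non-events is its share of all negative-class rows.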
Higher WOE values indicate categories more associated with the positive class, while lower (negative) values indicate association with the negative class.
For Beginners: WOE tells you how "good" or "bad" a category is for prediction:
- WOE > 0: Category is more likely to have positive outcomes
- WOE < 0: Category is more likely to have negative outcomes
- WOE ≈ 0: Category has no predictive power
Example in loan default prediction:
- "Employed" might have WOE = -0.5 (less likely to default)
- "Unemployed" might have WOE = +0.8 (more likely to default)
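The formula above can be sketched in plain C# for a single column. This is a minimal illustration and not the library's implementation: the `WoeSketch` class and `ComputeWoe` method are hypothetical names, and the way the regularization value is added to each category's event and non-event counts is an assumption about how smoothing might be applied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WoeSketch
{
    // Computes a smoothed WOE value for every distinct category in one column.
    // target[i] == 1 is treated as an event; any other value as a non-event.
    public static Dictionary<double, double> ComputeWoe(
        double[] categories, int[] target, double regularization = 0.5)
    {
        if (categories.Length != target.Length)
            throw new ArgumentException("categories and target must have the same length.");

        var events = new Dictionary<double, double>();
        var nonEvents = new Dictionary<double, double>();
        foreach (var category in categories.Distinct())
        {
            events[category] = 0.0;
            nonEvents[category] = 0.0;
        }

        for (int i = 0; i < categories.Length; i++)
        {
            if (target[i] == 1) events[categories[i]] += 1.0;
            else nonEvents[categories[i]] += 1.0;
        }

        // Assumption: the regularization value is added to each category's counts,
        // so the totals grow by regularization * (number of categories).
        int categoryCount = events.Count;
        double totalEvents = events.Values.Sum() + regularization * categoryCount;
        double totalNonEvents = nonEvents.Values.Sum() + regularization * categoryCount;

        var woe = new Dictionary<double, double>();
        foreach (var category in events.Keys)
        {
            double distEvents = (events[category] + regularization) / totalEvents;
            double distNonEvents = (nonEvents[category] + regularization) / totalNonEvents;
            woe[category] = Math.Log(distEvents / distNonEvents);
        }
        return woe;
    }
}
```

With a positive regularization value, a category that has no events (or no non-events) still gets a finite WOE instead of negative or positive infinity.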
Constructors
WOEEncoder(double, WOEHandleUnknown, int[]?)
Creates a new instance of WOEEncoder<T>.
public WOEEncoder(double regularization = 0.5, WOEHandleUnknown handleUnknown = WOEHandleUnknown.UseZero, int[]? columnIndices = null)
Parameters
regularization (double): Regularization to add to counts to prevent division by zero. Defaults to 0.5.
handleUnknown (WOEHandleUnknown): How to handle unknown categories. Defaults to UseZero.
columnIndices (int[]): The column indices to encode, or null for all columns.
Properties
HandleUnknown
Gets how unknown categories are handled.
public WOEHandleUnknown HandleUnknown { get; }
Property Value
- WOEHandleUnknown
Regularization
Gets the regularization parameter to prevent infinite WOE values.
public double Regularization { get; }
Property Value
- double
SupportsInverseTransform
Gets whether this transformer supports inverse transformation.
public override bool SupportsInverseTransform { get; }
Property Value
- bool
WOEValues
Gets the WOE values for each category.
public Dictionary<int, Dictionary<double, double>>? WOEValues { get; }
Property Value
- Dictionary<int, Dictionary<double, double>>
Methods
CalculateInformationValue(Matrix<T>, Vector<T>)
Calculates Information Value (IV) for each feature.
public Dictionary<int, double> CalculateInformationValue(Matrix<T> data, Vector<T> target)
Parameters
data (Matrix<T>): The feature matrix.
target (Vector<T>): The binary target.
Returns
- Dictionary<int, double>
Dictionary mapping column index to IV value.
Remarks
IV measures the overall predictive power of a feature:
- IV < 0.02: Not useful for prediction
- 0.02 < IV < 0.1: Weak predictor
- 0.1 < IV < 0.3: Medium predictor
- 0.3 < IV < 0.5: Strong predictor
- IV > 0.5: Suspicious (possible overfitting)
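Information Value is conventionally defined as the sum, over all categories, of (Distribution of Events - Distribution of Non-Events) × WOE. The sketch below computes it for one column and maps the result onto the bands listed above. It is an illustration only: `IvSketch`, `ComputeIv`, and `Describe` are hypothetical names, the count smoothing is an assumption, and the treatment of the exact boundary values (0.02, 0.1, 0.3, 0.5) is a choice made here because the bands above leave them unspecified.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IvSketch
{
    // Computes Information Value for one column against a binary target.
    // target[i] == 1 is treated as an event; any other value as a non-event.
    public static double ComputeIv(double[] categories, int[] target, double regularization = 0.5)
    {
        if (categories.Length != target.Length)
            throw new ArgumentException("categories and target must have the same length.");

        // Per category: index 0 holds the event count, index 1 the non-event count.
        var counts = new Dictionary<double, double[]>();
        for (int i = 0; i < categories.Length; i++)
        {
            if (!counts.TryGetValue(categories[i], out var bucket))
            {
                bucket = new double[2];
                counts[categories[i]] = bucket;
            }
            bucket[target[i] == 1 ? 0 : 1] += 1.0;
        }

        // Assumption: the regularization value is added to every category's counts.
        int categoryCount = counts.Count;
        double totalEvents = counts.Values.Sum(v => v[0]) + regularization * categoryCount;
        double totalNonEvents = counts.Values.Sum(v => v[1]) + regularization * categoryCount;

        double iv = 0.0;
        foreach (var entry in counts.Values)
        {
            double distEvents = (entry[0] + regularization) / totalEvents;
            double distNonEvents = (entry[1] + regularization) / totalNonEvents;
            iv += (distEvents - distNonEvents) * Math.Log(distEvents / distNonEvents);
        }
        return iv;
    }

    // Maps an IV value onto the rule-of-thumb bands; boundary handling is a choice made here.
    public static string Describe(double iv)
    {
        if (iv < 0.02) return "Not useful for prediction";
        if (iv < 0.1) return "Weak predictor";
        if (iv < 0.3) return "Medium predictor";
        if (iv <= 0.5) return "Strong predictor";
        return "Suspicious (possible overfitting)";
    }
}
```

Each term of the sum is non-negative, because the difference of the two distributions and the logarithm of their ratio always share a sign, so IV is zero only when every category has the same share of events and non-events.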
Fit(Matrix<T>, Vector<T>)
Fits the encoder by computing WOE values for each category.
public void Fit(Matrix<T> data, Vector<T> target)
Parameters
data (Matrix<T>): The feature matrix to fit.
target (Vector<T>): The binary target values (0 or 1).
FitCore(Matrix<T>)
Base fitting hook. WOE encoding requires a binary target, so fit the encoder through Fit(Matrix<T>, Vector<T>) instead.
protected override void FitCore(Matrix<T> data)
Parameters
data (Matrix<T>)
FitTransform(Matrix<T>, Vector<T>)
Fits and transforms the data.
public Matrix<T> FitTransform(Matrix<T> data, Vector<T> target)
Parameters
data (Matrix<T>)
target (Vector<T>)
Returns
- Matrix<T>
GetFeatureNamesOut(string[]?)
Gets the output feature names after transformation.
public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)
Parameters
inputFeatureNames (string[])
Returns
- string[]
InverseTransformCore(Matrix<T>)
Inverse transformation is not supported.
protected override Matrix<T> InverseTransformCore(Matrix<T> data)
Parameters
data (Matrix<T>)
Returns
- Matrix<T>
TransformCore(Matrix<T>)
Transforms the data by replacing categories with WOE values.
protected override Matrix<T> TransformCore(Matrix<T> data)
Parameters
data (Matrix<T>)
Returns
- Matrix<T>
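Replacing categories with WOE values amounts to a per-column dictionary lookup. The sketch below illustrates the idea on a plain `double[,]` array using a mapping shaped like the `WOEValues` property (column index, then category value, then WOE). It is not the library's code: `WoeTransformSketch` and `Transform` are hypothetical names, mapping unseen categories to 0 mirrors what the `UseZero` option name suggests, and passing unencoded columns through unchanged is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class WoeTransformSketch
{
    // Returns a copy of data in which every encoded column holds WOE values.
    public static double[,] Transform(
        double[,] data, Dictionary<int, Dictionary<double, double>> woeValues)
    {
        int rows = data.GetLength(0);
        int cols = data.GetLength(1);
        var result = (double[,])data.Clone();

        foreach (var column in woeValues)
        {
            int col = column.Key;
            if (col < 0 || col >= cols) continue; // ignore mappings for columns that do not exist

            for (int row = 0; row < rows; row++)
            {
                // Lookup uses exact equality on the category value.
                // Assumption: categories not seen during fitting map to 0 (neutral evidence).
                result[row, col] = column.Value.TryGetValue(data[row, col], out double woe)
                    ? woe
                    : 0.0;
            }
        }
        return result;
    }
}
```

A WOE of 0 is a natural fallback for unseen categories because it means the category shifts the odds in neither direction.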