Class Winsorizer<T>
- Namespace
- AiDotNet.Preprocessing.OutlierHandling
- Assembly
- AiDotNet.dll
Winsorizes data by replacing extreme values with percentile bounds.
public class Winsorizer<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>
Type Parameters
T
The numeric type for calculations (e.g., float, double).
- Inheritance
- TransformerBase<T, Matrix<T>, Matrix<T>>
- Winsorizer<T>
- Implements
- IDataTransformer<T, Matrix<T>, Matrix<T>>
Remarks
Winsorization is a statistical technique that limits extreme values in the data to reduce the effect of outliers. Unlike trimming (which removes outliers), Winsorization replaces them with less extreme values.
This is equivalent to OutlierClipper but follows the traditional Winsorization terminology where you specify the percentage of data to Winsorize at each tail.
For Beginners: Winsorization is named after biostatistician Charles Winsor. Instead of removing outliers, it replaces them with the nearest "normal" values:
- If you Winsorize at 5%, the bottom 5% of values become equal to the 5th percentile
- The top 5% of values become equal to the 95th percentile
This preserves sample size while reducing outlier impact.
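The idea can be sketched without the library. The following is a minimal standalone illustration, not the AiDotNet implementation: the class `WinsorizeSketch`, its method names, and the linear-interpolation percentile rule are all assumptions made for the example.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class WinsorizeSketch
{
    // Percentile of an already-sorted array, using linear interpolation
    // between the two closest ranks (an assumed convention).
    public static double Percentile(double[] sorted, double p)
    {
        if (sorted.Length == 0)
            throw new ArgumentException("Input must not be empty.", nameof(sorted));

        double rank = p / 100.0 * (sorted.Length - 1);
        int lo = (int)Math.Floor(rank);
        int hi = (int)Math.Ceiling(rank);
        return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
    }

    // Replaces values below the lower percentile or above the upper
    // percentile with the percentile value itself. No value is removed.
    public static double[] Winsorize(double[] values, double lowerPct, double upperPct)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        double lower = Percentile(sorted, lowerPct);
        double upper = Percentile(sorted, upperPct);
        return values.Select(v => Math.Min(Math.Max(v, lower), upper)).ToArray();
    }
}
```

For instance, `WinsorizeSketch.Winsorize(new double[] { 1, 2, 3, 4, 100 }, 0, 75)` returns 1, 2, 3, 4, 4: the outlier 100 is pulled down to the 75th percentile (4) and the array still has five elements.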
Constructors
Winsorizer(double, double, WinsorizerLimitType, int[]?)
Creates a new instance of Winsorizer<T>.
public Winsorizer(double lowerLimit = 5, double upperLimit = 95, WinsorizerLimitType limitType = WinsorizerLimitType.Percentile, int[]? columnIndices = null)
Parameters
lowerLimit double
Lower limit. For percentile type: 0-50. For IQR type: multiplier (e.g., 1.5). Defaults to 5.
upperLimit double
Upper limit. For percentile type: 50-100. For IQR type: multiplier (e.g., 1.5). Defaults to 95.
limitType WinsorizerLimitType
Type of limits to use. Defaults to Percentile.
columnIndices int[]
The column indices to Winsorize, or null for all columns.
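This page does not spell out how the IQR limit type turns a multiplier into bounds. A common convention (Tukey's fences) is Q1 - k × IQR for the lower bound and Q3 + k × IQR for the upper bound. The sketch below assumes that convention; `IqrBoundsSketch` and its members are hypothetical names, and the quartile interpolation rule is also an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class IqrBoundsSketch
{
    // Assumed convention: lower = Q1 - k_lower * IQR, upper = Q3 + k_upper * IQR.
    public static (double Lower, double Upper) Bounds(
        double[] values, double lowerMultiplier = 1.5, double upperMultiplier = 1.5)
    {
        if (values.Length == 0)
            throw new ArgumentException("Input must not be empty.", nameof(values));

        double[] sorted = values.OrderBy(v => v).ToArray();
        double q1 = Quantile(sorted, 0.25);
        double q3 = Quantile(sorted, 0.75);
        double iqr = q3 - q1;
        return (q1 - lowerMultiplier * iqr, q3 + upperMultiplier * iqr);
    }

    // Quantile with linear interpolation between closest ranks (assumed).
    private static double Quantile(double[] sorted, double q)
    {
        double rank = q * (sorted.Length - 1);
        int lo = (int)Math.Floor(rank);
        int hi = (int)Math.Ceiling(rank);
        return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
    }
}
```

With the values 1, 2, 3, 4, 5 and a multiplier of 1.5 on both sides, Q1 is 2, Q3 is 4, the IQR is 2, and the bounds come out as -1 and 7. Since the constructor defaults of 5 and 95 read as percentiles, it is probably safest to pass multipliers explicitly when selecting the IQR limit type.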
Properties
LimitType
Gets the type of limit (percentile or IQR).
public WinsorizerLimitType LimitType { get; }
Property Value
- WinsorizerLimitType
LowerBounds
Gets the computed lower bounds for each feature.
public double[]? LowerBounds { get; }
Property Value
- double[]
LowerLimit
Gets the lower limit value.
public double LowerLimit { get; }
Property Value
- double
SupportsInverseTransform
Gets whether this transformer supports inverse transformation.
public override bool SupportsInverseTransform { get; }
Property Value
- bool
UpperBounds
Gets the computed upper bounds for each feature.
public double[]? UpperBounds { get; }
Property Value
- double[]
UpperLimit
Gets the upper limit value.
public double UpperLimit { get; }
Property Value
- double
Methods
FitCore(Matrix<T>)
Computes the Winsorization bounds for each feature.
protected override void FitCore(Matrix<T> data)
Parameters
data Matrix<T>
The training data matrix.
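A fitting step of this kind can be pictured as one pass per column that records a lower and an upper bound. The sketch below is an assumption-laden stand-in, not the library's code: it uses a plain `double[,]` instead of `Matrix<T>`, a hypothetical `ColumnBoundsSketch` class, linear-interpolation percentiles, and infinite bounds for columns that are not selected.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class ColumnBoundsSketch
{
    // Returns per-column percentile bounds. Columns not listed in
    // columnIndices get infinite bounds, so they would never be clipped
    // (an assumed way to express "leave this column alone").
    public static (double[] Lower, double[] Upper) Fit(
        double[,] data, double lowerPct, double upperPct, int[]? columnIndices = null)
    {
        int rows = data.GetLength(0);
        int cols = data.GetLength(1);
        if (rows == 0)
            throw new ArgumentException("Data must contain at least one row.", nameof(data));

        double[] lower = Enumerable.Repeat(double.NegativeInfinity, cols).ToArray();
        double[] upper = Enumerable.Repeat(double.PositiveInfinity, cols).ToArray();

        foreach (int c in columnIndices ?? Enumerable.Range(0, cols).ToArray())
        {
            // Copy the column out and sort it to read off percentiles.
            double[] column = new double[rows];
            for (int r = 0; r < rows; r++) column[r] = data[r, c];
            Array.Sort(column);

            lower[c] = Percentile(column, lowerPct);
            upper[c] = Percentile(column, upperPct);
        }
        return (lower, upper);
    }

    // Linear interpolation between closest ranks (assumed convention).
    private static double Percentile(double[] sorted, double p)
    {
        double rank = p / 100.0 * (sorted.Length - 1);
        int lo = (int)Math.Floor(rank);
        int hi = (int)Math.Ceiling(rank);
        return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
    }
}
```

Keeping the bounds as per-column arrays mirrors the shape of the `LowerBounds` and `UpperBounds` properties, which are documented as one value per feature.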
GetFeatureNamesOut(string[]?)
Gets the output feature names after transformation.
public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)
Parameters
inputFeatureNames string[]
Returns
- string[]
InverseTransformCore(Matrix<T>)
Inverse transformation is not supported.
protected override Matrix<T> InverseTransformCore(Matrix<T> data)
Parameters
data Matrix<T>
Returns
- Matrix<T>
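One way to see why an inverse cannot exist: clipping maps many different inputs to the same output, so the original value cannot be recovered from the clipped one. The small sketch below demonstrates this with a hypothetical `ClipIsLossySketch` class; it illustrates the general property and makes no claim about what the library's method does when called.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class ClipIsLossySketch
{
    // Clamp a single value into [lower, upper].
    public static double Clip(double value, double lower, double upper) =>
        Math.Min(Math.Max(value, lower), upper);

    // True when two different inputs become indistinguishable after clipping.
    public static bool CollideAfterClipping(double a, double b, double lower, double upper) =>
        a != b && Clip(a, lower, upper) == Clip(b, lower, upper);
}
```

With bounds of 0 and 100, the inputs 150 and 9000 both clip to 100, so `CollideAfterClipping(150, 9000, 0, 100)` is true, while 10 and 20 stay distinct.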
TransformCore(Matrix<T>)
Winsorizes the data by replacing extreme values with bounds.
protected override Matrix<T> TransformCore(Matrix<T> data)
Parameters
data Matrix<T>
The data to transform.
Returns
- Matrix<T>
The Winsorized data.
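Conceptually, the transform step applies previously computed per-column bounds to every cell. The sketch below shows that step on a plain `double[,]`, assuming the bounds were computed earlier; `ClipMatrixSketch` is a hypothetical name and the code is not taken from the library.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class ClipMatrixSketch
{
    // Returns a new matrix in which each cell is clamped to the bounds of
    // its column. The input matrix is left unchanged.
    public static double[,] Transform(double[,] data, double[] lowerBounds, double[] upperBounds)
    {
        int rows = data.GetLength(0);
        int cols = data.GetLength(1);
        if (lowerBounds.Length != cols || upperBounds.Length != cols)
            throw new ArgumentException("Expected one lower and one upper bound per column.");

        var result = new double[rows, cols];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                result[r, c] = Math.Min(Math.Max(data[r, c], lowerBounds[c]), upperBounds[c]);
            }
        }
        return result;
    }
}
```

Because the bounds come from the data seen during fitting, new data containing more extreme values is still clipped to the same limits, which keeps training and inference inputs on a consistent scale.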