Class RobustScaler<T>

Namespace
AiDotNet.Preprocessing.Scalers
Assembly
AiDotNet.dll

Scales features using statistics that are robust to outliers.

public class RobustScaler<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>

Type Parameters

T

The numeric type for calculations (e.g., float, double).

Inheritance
TransformerBase<T, Matrix<T>, Matrix<T>>
RobustScaler<T>
Implements
IDataTransformer<T, Matrix<T>, Matrix<T>>
Inherited Members

Remarks

Robust scaling subtracts the median from each feature and divides the result by the interquartile range (IQR). The IQR is the difference between the 75th percentile (Q3) and the 25th percentile (Q1). Unlike StandardScaler, which relies on the mean and standard deviation, RobustScaler uses statistics that are less affected by outliers.
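Written as a formula, the transformation described above looks like this when both centering and scaling are enabled. The symbols x_{ij} (value in row i, feature column j), x'_{ij}, Q_{1,j} and Q_{3,j} are notation introduced here for illustration and are not names from the library.

```latex
x'_{ij} = \frac{x_{ij} - \operatorname{median}_j}{\mathrm{IQR}_j},
\qquad
\mathrm{IQR}_j = Q_{3,j} - Q_{1,j}
```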

For Beginners: This scaler is like StandardScaler but better handles outliers:

  • Uses median (middle value) instead of mean (average)
  • Uses IQR (spread of middle 50%) instead of standard deviation

Why this matters:

  • The mean and standard deviation are heavily influenced by extreme values
  • The median and IQR are largely unaffected by a few extreme values

Example: If most house prices are $100K-$500K but a few are $10M, RobustScaler won't let those mansions distort the scaling.
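The house-price example can be checked numerically with a small standalone sketch. It does not call AiDotNet. The class `RobustVsStandardSketch` and its methods are hypothetical helpers, and the percentile rule used here (linear interpolation between the closest ranks) is an assumption, since this page does not state which interpolation RobustScaler<T> uses.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class RobustVsStandardSketch
{
    // Percentile p (0-100) with linear interpolation between closest ranks.
    // This interpolation rule is an assumption made for the sketch.
    public static double Percentile(double[] values, double p)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        double rank = p / 100.0 * (sorted.Length - 1);
        int lower = (int)Math.Floor(rank);
        int upper = Math.Min(lower + 1, sorted.Length - 1);
        return sorted[lower] + (rank - lower) * (sorted[upper] - sorted[lower]);
    }

    // Arithmetic mean: sensitive to extreme values.
    public static double Mean(double[] values) => values.Average();

    // Median: the 50th percentile.
    public static double Median(double[] values) => Percentile(values, 50.0);

    // Interquartile range: Q3 minus Q1.
    public static double Iqr(double[] values) =>
        Percentile(values, 75.0) - Percentile(values, 25.0);
}
```

With prices in thousands of dollars, `{ 100, 200, 300, 400, 500 }` has a mean of 300, a median of 300 and an IQR of 200. Adding a single 10000 entry moves the mean to roughly 1916.7, while the median only moves to 350 and the IQR to 250.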

Constructors

RobustScaler(bool, bool, int[]?)

Creates a new instance of RobustScaler<T> that uses the default quantile range (25th to 75th percentile).

public RobustScaler(bool withCentering = true, bool withScaling = true, int[]? columnIndices = null)

Parameters

withCentering bool

If true, center the data by subtracting the median. Default is true.

withScaling bool

If true, scale the data by dividing by the IQR. Default is true.

columnIndices int[]

The column indices to scale, or null for all columns.

RobustScaler(double, double, bool, bool, int[]?)

Creates a new instance of RobustScaler<T> with custom quantile range.

public RobustScaler(double quantileRangeMin, double quantileRangeMax, bool withCentering = true, bool withScaling = true, int[]? columnIndices = null)

Parameters

quantileRangeMin double

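The lower percentile of the scaling range (0-100). This parameter is required; the standard IQR uses 25 (Q1).

quantileRangeMax double

The upper percentile of the scaling range (0-100). This parameter is required; the standard IQR uses 75 (Q3).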

withCentering bool

If true, center the data by subtracting the median. Default is true.

withScaling bool

If true, scale the data by dividing by the range between the lower and upper quantiles. Default is true.

columnIndices int[]

The column indices to scale, or null for all columns.
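To show how the quantile range and the two flags interact, here is a minimal standalone sketch for a single feature column. It mirrors the constructor parameter names but does not call AiDotNet. The class `QuantileRangeSketch`, the linear-interpolation percentile rule, and the fallback to a divisor of 1 when the range is zero are all assumptions of this sketch, not documented library behavior.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class QuantileRangeSketch
{
    // Percentile p (0-100) with linear interpolation (assumed rule).
    public static double Percentile(double[] values, double p)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        double rank = p / 100.0 * (sorted.Length - 1);
        int lower = (int)Math.Floor(rank);
        int upper = Math.Min(lower + 1, sorted.Length - 1);
        return sorted[lower] + (rank - lower) * (sorted[upper] - sorted[lower]);
    }

    // Scales one column using the given quantile range and flags.
    public static double[] ScaleColumn(
        double[] column,
        double quantileRangeMin = 25.0,
        double quantileRangeMax = 75.0,
        bool withCentering = true,
        bool withScaling = true)
    {
        // Centering subtracts the median; otherwise nothing is subtracted.
        double center = withCentering ? Percentile(column, 50.0) : 0.0;

        // Scaling divides by the distance between the two percentiles.
        double range = Percentile(column, quantileRangeMax)
                     - Percentile(column, quantileRangeMin);

        // Sketch-only guard: avoid dividing by zero for constant columns.
        double scale = (withScaling && range != 0.0) ? range : 1.0;

        return column.Select(v => (v - center) / scale).ToArray();
    }
}
```

For the column 1, 2, ..., 11 the median is 6. A 10-90 range spans 2 to 10, so the divisor is 8 and the value 10 maps to 0.5. The default 25-75 range spans 3.5 to 8.5, so the divisor is 5 and the value 11 maps to 1. A wider range produces a larger divisor and therefore smaller scaled values.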

Properties

InterquartileRange

Gets the interquartile range (IQR) of each feature computed during fitting.

public Vector<T>? InterquartileRange { get; }

Property Value

Vector<T>

Median

Gets the median of each feature computed during fitting.

public Vector<T>? Median { get; }

Property Value

Vector<T>

SupportsInverseTransform

Gets whether this transformer supports inverse transformation.

public override bool SupportsInverseTransform { get; }

Property Value

bool

WithCentering

Gets whether this scaler centers the data (subtracts median).

public bool WithCentering { get; }

Property Value

bool

WithScaling

Gets whether this scaler scales the data (divides by IQR).

public bool WithScaling { get; }

Property Value

bool

Methods

FitCore(Matrix<T>)

Computes the median and IQR of each feature from the training data.

protected override void FitCore(Matrix<T> data)

Parameters

data Matrix<T>

The training data matrix where each column is a feature.

GetFeatureNamesOut(string[]?)

Gets the output feature names after transformation.

public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)

Parameters

inputFeatureNames string[]

The input feature names.

Returns

string[]

The same feature names (RobustScaler does not change the number of features).

InverseTransformCore(Matrix<T>)

Reverses the robust scaling transformation.

protected override Matrix<T> InverseTransformCore(Matrix<T> data)

Parameters

data Matrix<T>

The scaled data.

Returns

Matrix<T>

The original-scale data.
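Reversing the scaling amounts to undoing the two steps in the opposite order: multiply by the IQR, then add the median back. The standalone sketch below illustrates that round trip for one column. The class `InverseScalingSketch` and its method signatures are hypothetical and are not the library's implementation; the median and IQR are passed in directly rather than fitted.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class InverseScalingSketch
{
    // Forward step: subtract the median, then divide by the IQR.
    public static double[] Transform(double[] column, double median, double iqr) =>
        column.Select(v => (v - median) / iqr).ToArray();

    // Inverse step: multiply by the IQR, then add the median back.
    public static double[] InverseTransform(double[] scaled, double median, double iqr) =>
        scaled.Select(v => v * iqr + median).ToArray();
}
```

With a median of 350 and an IQR of 250, the value 600 scales to 1, and applying `InverseTransform` to 1 returns 600.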

TransformCore(Matrix<T>)

Transforms the data by applying robust scaling.

protected override Matrix<T> TransformCore(Matrix<T> data)

Parameters

data Matrix<T>

The data to transform.

Returns

Matrix<T>

The scaled data.