Class NormalizationParameters<T>
Represents the parameters used for normalizing a single feature or target variable in a machine learning model.
public class NormalizationParameters<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
object
NormalizationParameters<T>
Remarks
This class encapsulates all the parameters needed to normalize and denormalize a single feature or target variable. It supports multiple normalization methods, such as min-max scaling, z-score normalization, robust scaling, and binning, and stores the relevant parameters for each method. These parameters are typically calculated during training based on the training data and are then used to normalize new data in the same way.
For Beginners: This class stores the information needed to scale a single feature or target variable.
When normalizing data for machine learning:
- Different methods can be used (min-max scaling, z-score normalization, etc.)
- Each method requires specific parameters (like minimum/maximum values or mean/standard deviation)
- These parameters need to be saved to ensure consistent scaling
This class stores all those parameters for a single feature, including:
- Which normalization method is being used
- The specific values needed for that method (min/max, mean/stddev, etc.)
For example, if using min-max scaling to normalize house prices from $100,000-$1,500,000 to a 0-1 range, this class would store the minimum ($100,000) and maximum ($1,500,000) values needed for that conversion.
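For illustration, here is a minimal sketch of how those stored values would be applied. It is not part of the library's API surface; it assumes double as the type argument T and that the namespace defining NormalizationParameters is imported.
var priceParams = new NormalizationParameters<double>
{
    Method = NormalizationMethod.MinMax,
    Min = 100_000,    // cheapest house seen during training
    Max = 1_500_000   // most expensive house seen during training
};

// Min-max scaling: map a new price into the 0-1 range using the stored values.
double price = 400_000;
double normalized = (price - priceParams.Min) / (priceParams.Max - priceParams.Min);
// normalized is about 0.214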
Constructors
NormalizationParameters(INumericOperations<T>?)
Initializes a new instance of the NormalizationParameters class with default values.
public NormalizationParameters(INumericOperations<T>? numOps = null)
Parameters
numOps (INumericOperations<T>): Optional numeric operations provider. If null, a default provider will be used.
Remarks
This constructor creates a new NormalizationParameters instance with default values. It initializes all numeric properties to zero and sets the normalization method to None. The constructor takes an optional numeric operations provider, which is used for mathematical operations on the generic type T. If no provider is specified, a default one is obtained from the MathHelper class.
For Beginners: This constructor creates a new set of normalization parameters with default values.
When creating new normalization parameters:
- All numeric values are initialized to zero
- The normalization method is set to "None"
- A numeric operations provider is set up to handle the math
The numeric operations provider:
- Allows the class to work with different numeric types (float, double, decimal)
- Provides methods for basic math operations on type T
- Is usually obtained automatically from MathHelper
This constructor is typically used when:
- Creating parameters before calculating actual values
- Deserializing parameters from storage
- Creating empty parameters as placeholders
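A minimal usage sketch (illustrative; assumes double as T and that the library's namespace is imported):
// All numeric properties start at zero and Method starts at NormalizationMethod.None.
// Omitting the argument (or passing null) makes the class obtain a default
// INumericOperations<double> provider from MathHelper.
var parameters = new NormalizationParameters<double>();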
Properties
Bins
Gets or sets the bin boundaries used for binning normalization.
public List<T> Bins { get; set; }
Property Value
- List<T>
The bin boundaries, used when the Binning normalization method is applied.
IQR
Gets or sets the interquartile range (IQR) of the data.
public T IQR { get; set; }
Property Value
- T
The interquartile range, used for robust normalization.
Remarks
This property stores the interquartile range (IQR) of the data for the feature or target variable. The IQR is the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of the data. It is primarily used for robust normalization, where values are scaled by subtracting the median and dividing by the IQR. This approach is less sensitive to outliers than z-score normalization, which uses the mean and standard deviation. The IQR is typically calculated during training based on the training data.
For Beginners: This stores the range between the 25th and 75th percentiles of the data.
The interquartile range (IQR):
- Is used primarily for robust normalization
- Measures the spread of the middle 50% of the data
- Is less affected by outliers than standard deviation
For example, if the middle 50% of house prices in your dataset range from $250,000 to $450,000, the IQR would be 200000.
This parameter is important because:
- It provides a robust measure of data spread
- It's less influenced by extreme values than standard deviation
- It's used as a divisor in robust scaling to handle data with outliers
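A minimal sketch of robust scaling with the stored Median and IQR (illustrative only; assumes double as T and that the library's namespace is imported):
var robustParams = new NormalizationParameters<double>
{
    Method = NormalizationMethod.Robust,
    Median = 320_000, // middle house price in the training data
    IQR = 200_000     // spread of the middle 50% of prices
};

// Robust scaling: subtract the median, then divide by the IQR.
double price = 420_000;
double scaled = (price - robustParams.Median) / robustParams.IQR;
// scaled = 0.5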
Max
Gets or sets the maximum value observed in the data.
public T Max { get; set; }
Property Value
- T
The maximum value, used for min-max normalization.
Remarks
This property stores the maximum value observed in the data for the feature or target variable. It is primarily used for min-max normalization, where values are scaled to a range based on the minimum and maximum values. The maximum value is typically calculated during training based on the training data.
For Beginners: This stores the largest value observed in the data.
The maximum value:
- Is used primarily for min-max scaling
- Represents the upper bound of the original data range
- Is typically mapped to 1 or another upper bound in the normalized range
For example, if normalizing house prices and the most expensive house is $1,500,000, this value would be 1500000.
This parameter is important because:
- It defines the upper end of the data range
- It's needed to properly scale new data points
- It helps ensure consistent normalization
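Because the same parameters support denormalization, a prediction made on the 0-1 scale can be mapped back to the original units. A sketch of the inverse min-max calculation (illustrative; plain doubles stand in for the stored Min and Max):
double normalizedPrediction = 0.25;     // model output on the 0-1 scale
double min = 100_000, max = 1_500_000;  // values stored in Min and Max
double dollars = normalizedPrediction * (max - min) + min;
// dollars = 450_000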
MaxAbs
Gets or sets the maximum absolute value observed in the data.
public T MaxAbs { get; set; }
Property Value
- T
The maximum absolute value, used for MaxAbsScaler normalization.
Remarks
This property stores the maximum absolute value observed in the data for the feature or target variable. It is used for MaxAbsScaler normalization, which scales data to the range [-1, 1] by dividing each value by the maximum absolute value. This method preserves the sign of values and maintains zeros (which is important for sparse data). The maximum absolute value is typically calculated during training based on the training data.
For Beginners: This stores the largest absolute value (ignoring the sign) in your data.
The maximum absolute value:
- Is used for MaxAbsScaler normalization
- Represents the farthest distance from zero in either direction
- Is used as a divisor to scale values to the range [-1, 1]
For example, if your data ranges from -75 to 100, the maximum absolute value would be 100, and all values would be divided by 100 to scale them to [-0.75, 1.0].
This parameter is important because:
- It preserves the sign of values (positive stays positive, negative stays negative)
- It keeps zero values as zero (important for sparse data)
- It's simpler than min-max scaling but still effective
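A short sketch of MaxAbs scaling (illustrative; assumes double as T and that System is imported):
var maxAbsParams = new NormalizationParameters<double> { MaxAbs = 100 };

// Divide by the maximum absolute value: signs and zeros are preserved.
double[] values = { -75, 0, 50, 100 };
double[] scaled = Array.ConvertAll(values, v => v / maxAbsParams.MaxAbs);
// scaled: [-0.75, 0.0, 0.5, 1.0]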
Mean
Gets or sets the mean (average) value of the data.
public T Mean { get; set; }
Property Value
- T
The mean value, used for z-score normalization.
Remarks
This property stores the mean (average) value of the data for the feature or target variable. It is primarily used for z-score normalization, where values are scaled by subtracting the mean and dividing by the standard deviation. This centers the data around zero. The mean is typically calculated during training based on the training data.
For Beginners: This stores the average value of the data.
The mean value:
- Is used primarily for z-score normalization
- Represents the center point of the data distribution
- Is subtracted from each value during z-score normalization
For example, if the average house price in your dataset is $350,000, this value would be 350000.
This parameter is important because:
- It defines the center of the data distribution
- It's needed to properly center new data points
- It helps ensure consistent normalization
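A minimal z-score sketch using Mean together with the StdDev property documented below (illustrative; assumes double as T):
var zParams = new NormalizationParameters<double>
{
    Method = NormalizationMethod.ZScore,
    Mean = 350_000,   // average house price
    StdDev = 150_000  // typical spread around the mean
};

// Z-score: how many standard deviations a value lies from the mean.
double price = 500_000;
double z = (price - zParams.Mean) / zParams.StdDev;
// z = 1.0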
Median
Gets or sets the median value of the data.
public T Median { get; set; }
Property Value
- T
The median value, used for robust normalization.
Remarks
This property stores the median value of the data for the feature or target variable. It is primarily used for robust normalization, where values are scaled by subtracting the median and dividing by the interquartile range. This approach is less sensitive to outliers than z-score normalization, which uses the mean and standard deviation. The median is the middle value when the data is sorted and is typically calculated during training based on the training data.
For Beginners: This stores the middle value of the data when sorted.
The median value:
- Is used primarily for robust normalization
- Represents the middle point of the sorted data
- Is less affected by outliers than the mean
For example, if the middle house price in your sorted dataset is $320,000, this value would be 320000.
This parameter is important because:
- It provides a robust measure of central tendency
- It's less influenced by extreme values than the mean
- It's used in robust scaling to handle data with outliers
Method
Gets or sets the normalization method used.
public NormalizationMethod Method { get; set; }
Property Value
- NormalizationMethod
A NormalizationMethod enumeration value indicating which normalization technique is used.
Remarks
This property indicates which normalization method is used for the feature or target variable. Different methods use different parameters and have different characteristics. For example, min-max scaling normalizes values to a specific range (typically 0 to 1), z-score normalization centers the data around zero with a standard deviation of one, and robust scaling uses the median and interquartile range to be less sensitive to outliers.
For Beginners: This indicates which scaling technique is being used.
The normalization method:
- Determines how the data will be scaled
- Affects which parameters are actually used
- Has different properties and use cases
Common methods include:
- None: No normalization is applied
- MinMax: Scales data to a range, typically 0-1 (uses Min and Max)
- ZScore: Centers data around 0 with standard deviation of 1 (uses Mean and StdDev)
- Robust: Similar to ZScore but less affected by outliers (uses Median and IQR)
- Custom: Uses custom Scale and Shift values
- Binning: Divides data into discrete bins
Each method has advantages for different types of data and models.
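To make the link between Method and the other properties concrete, here is an illustrative sketch of how consuming code might select a formula. This is not the library's actual implementation; it assumes double as T and the enumeration member names listed above.
static double Normalize(double value, NormalizationParameters<double> p) =>
    p.Method switch
    {
        NormalizationMethod.None   => value,
        NormalizationMethod.MinMax => (value - p.Min) / (p.Max - p.Min),
        NormalizationMethod.ZScore => (value - p.Mean) / p.StdDev,
        NormalizationMethod.Robust => (value - p.Median) / p.IQR,
        NormalizationMethod.Custom => value * p.Scale + p.Shift,
        // Binning and other methods would need their own handling (e.g. the Bins list).
        _ => throw new NotSupportedException($"Unhandled method: {p.Method}")
    };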
Min
Gets or sets the minimum value observed in the data.
public T Min { get; set; }
Property Value
- T
The minimum value, used for min-max normalization.
Remarks
This property stores the minimum value observed in the data for the feature or target variable. It is primarily used for min-max normalization, where values are scaled to a range based on the minimum and maximum values. The minimum value is typically calculated during training based on the training data.
For Beginners: This stores the smallest value observed in the data.
The minimum value:
- Is used primarily for min-max scaling
- Represents the lower bound of the original data range
- Is typically mapped to 0 or another lower bound in the normalized range
For example, if normalizing house prices and the cheapest house is $100,000, this value would be 100000.
This parameter is important because:
- It defines the lower end of the data range
- It's needed to properly scale new data points
- It helps ensure consistent normalization
OutputDistribution
Gets or sets the target output distribution for quantile transformation.
public OutputDistribution OutputDistribution { get; set; }
Property Value
- OutputDistribution
An OutputDistribution enum indicating either Uniform or Normal distribution.
Remarks
This property specifies whether the QuantileTransformer should map data to a uniform distribution (where all ranges have equal probability) or a normal distribution (bell-shaped curve). This setting determines how the quantiles are mapped during transformation.
For Beginners: This specifies what shape you want your data to have after transformation.
The output distribution:
- Can be Uniform (flat distribution) or Normal (bell curve)
- Affects how values are redistributed
- Depends on what your machine learning algorithm expects
Uniform distribution:
- All value ranges have equal numbers of data points
- Values are spread evenly across the range
- Good for algorithms that don't assume any particular distribution
Normal distribution:
- Creates a bell-shaped curve
- Most values cluster around the center
- Good for algorithms that work best with normally-distributed data
P
Gets or sets a power parameter for certain normalization methods.
public T P { get; set; }
Property Value
- T
The power parameter, used for power transformations.
Remarks
This property stores a power parameter that can be used for certain normalization methods, such as power transformations like Box-Cox or Yeo-Johnson transformations. These transformations can help make skewed data more normally distributed by raising values to a certain power. The optimal power parameter is typically determined during training to maximize the normality of the transformed data.
For Beginners: This stores a power value used for certain advanced normalization techniques.
The power parameter:
- Is used for power transformations like Box-Cox or Yeo-Johnson
- Helps make skewed data more normally distributed
- Can be optimized to find the best transformation
For example, a value of 0.5 would correspond to a square root transformation, which can help normalize right-skewed data.
This parameter is useful when:
- Your data has a skewed distribution
- You want to make the data more normally distributed
- Standard normalization methods don't work well
Power transformations are more advanced techniques but can significantly improve model performance with certain types of data.
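An illustrative sketch of a simple power transformation (the library's exact Box-Cox or Yeo-Johnson formulas may differ; assumes double as T and that System is imported):
var powerParams = new NormalizationParameters<double> { P = 0.5 };

// Raising right-skewed values to a power below 1 compresses large values.
// P = 0.5 corresponds to a square-root transformation.
double[] skewed = { 1, 4, 100, 10_000 };
double[] transformed = Array.ConvertAll(skewed, x => Math.Pow(x, powerParams.P));
// transformed: [1, 2, 10, 100]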
Quantiles
Gets or sets the quantile values used for quantile transformation.
public List<T> Quantiles { get; set; }
Property Value
- List<T>
A list of quantile values representing the empirical distribution.
Remarks
This property stores the quantile values calculated from the training data for QuantileTransformer. These quantiles represent the empirical cumulative distribution function (CDF) of the data and are used to map values to either a uniform or normal distribution. The number of quantiles determines the granularity of the transformation.
For Beginners: This stores the distribution pattern learned from your training data.
The quantiles list:
- Is used for QuantileTransformer normalization
- Contains values that divide your data into equal-sized groups
- Helps map your data to a target distribution (uniform or normal)
For example, with 100 quantiles:
- The 25th quantile is the value below which 25% of the data falls
- The 50th quantile is the median
- The 75th quantile is the value below which 75% of the data falls
This approach is powerful because:
- It can handle any input distribution
- It's very robust to outliers
- It can transform data to match a desired distribution shape
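An illustrative sketch of how stored quantiles can map a value onto a uniform 0-1 scale via a simplified empirical-CDF lookup. This is not the library's actual QuantileTransformer; it assumes double as T and that System.Linq and System.Collections.Generic are imported.
// Quantiles learned from training data (only five here, for brevity).
var qParams = new NormalizationParameters<double>
{
    Quantiles = new List<double> { 100_000, 250_000, 320_000, 450_000, 1_500_000 }
};

// Estimate the value's position in the empirical distribution.
double price = 400_000;
int countBelow = qParams.Quantiles.Count(q => q <= price);
double uniform = (double)countBelow / qParams.Quantiles.Count;
// uniform = 0.6; for a Normal OutputDistribution this value would then be passed
// through an inverse normal CDF (probit) instead of being used directly.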
Scale
Gets or sets the scale factor for custom normalization.
public T Scale { get; set; }
Property Value
- T
The scale factor, used for custom normalization.
Remarks
This property stores a custom scale factor for the feature or target variable. It is used for custom normalization, where values are scaled by multiplying by this factor. Custom normalization allows for more flexibility in how the data is scaled, but requires the scale factor to be specified explicitly rather than being calculated from the data.
For Beginners: This stores a custom multiplication factor for scaling.
The scale factor:
- Is used for custom normalization
- Represents how much to multiply each value by
- Allows for flexible, manual control of scaling
For example, if you want to convert dollars to thousands of dollars, you might use a scale factor of 0.001.
This parameter is useful when:
- You want more control over the normalization process
- You have domain knowledge about the appropriate scaling
- Standard methods don't fit your specific needs
Shift
Gets or sets the shift value for custom normalization.
public T Shift { get; set; }
Property Value
- T
The shift value, used for custom normalization.
Remarks
This property stores a custom shift value for the feature or target variable. It is used for custom normalization, where values are shifted by adding this value (typically after scaling). Custom normalization allows for more flexibility in how the data is transformed, but requires the shift value to be specified explicitly rather than being calculated from the data.
For Beginners: This stores a custom value to add after scaling.
The shift value:
- Is used for custom normalization
- Represents how much to add to each value after scaling
- Allows for flexible, manual control of normalization
For example, if you want to shift temperatures from Celsius to Fahrenheit, you might use a scale of 1.8 and a shift of 32.
This parameter is useful when:
- You want more control over the normalization process
- You have domain knowledge about the appropriate transformation
- Standard methods don't fit your specific needs
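A sketch of the Celsius-to-Fahrenheit example above expressed with Scale and Shift (illustrative; assumes double as T):
var customParams = new NormalizationParameters<double>
{
    Method = NormalizationMethod.Custom,
    Scale = 1.8, // multiply first...
    Shift = 32   // ...then add
};

double celsius = 25;
double fahrenheit = celsius * customParams.Scale + customParams.Shift;
// fahrenheit = 77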
StdDev
Gets or sets the standard deviation of the data.
public T StdDev { get; set; }
Property Value
- T
The standard deviation, used for z-score normalization.
Remarks
This property stores the standard deviation of the data for the feature or target variable. It is primarily used for z-score normalization, where values are scaled by subtracting the mean and dividing by the standard deviation. This scales the data to have a standard deviation of one. The standard deviation is typically calculated during training based on the training data.
For Beginners: This stores how spread out the data values are.
The standard deviation:
- Is used primarily for z-score normalization
- Measures how dispersed the data is around the mean
- Is used as a divisor during z-score normalization
For example, if house prices in your dataset typically vary by about $150,000 from the mean, this value would be approximately 150000.
This parameter is important because:
- It defines the scale of variation in the data
- It's needed to properly scale new data points
- It helps ensure consistent normalization