Class HashingEncoder<T>
- Namespace
- AiDotNet.Preprocessing.Encoders
- Assembly
- AiDotNet.dll
Encodes categorical features using feature hashing (hashing trick).
public class HashingEncoder<T> : TransformerBase<T, Matrix<T>, Matrix<T>>, IDataTransformer<T, Matrix<T>, Matrix<T>>
Type Parameters
T: The numeric type for calculations (e.g., float, double).
- Inheritance
- TransformerBase<T, Matrix<T>, Matrix<T>>
HashingEncoder<T>
- Implements
- IDataTransformer<T, Matrix<T>, Matrix<T>>
Remarks
HashingEncoder uses a hash function to map categories to a fixed number of columns. This is useful for high-cardinality categorical features where one-hot encoding would create too many columns.
Unlike encoders that learn a category-to-column mapping during fitting, HashingEncoder does not store category mappings. This keeps its memory use fixed and lets it handle previously unseen categories.
For Beginners: Instead of creating one column per category:
- Hash encoding creates a fixed number of columns (e.g., 8)
- Each category is hashed to one of these columns
- Multiple categories may share the same column (collision)
Pros: Fixed memory, handles new categories, fast
Cons: Information loss from collisions, not reversible
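The bucket assignment described above can be sketched in a few lines. This is a minimal illustration of the hashing trick, not the library's implementation: the FNV-1a hash and the helper names Fnv1a and Bucket are assumptions chosen for the example, and the hash function HashingEncoder actually uses is not documented on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative 32-bit FNV-1a hash (an assumption; not necessarily what HashingEncoder uses).
static uint Fnv1a(string text)
{
    uint hash = 2166136261;                 // FNV offset basis
    foreach (byte b in Encoding.UTF8.GetBytes(text))
    {
        hash ^= b;
        hash *= 16777619;                   // FNV prime; uint arithmetic wraps on overflow
    }
    return hash;
}

// Maps any category, seen before or not, to one of `buckets` columns.
static int Bucket(string value, int buckets) => (int)(Fnv1a(value) % (uint)buckets);

const int NComponents = 8;
string[] categories =
{
    "red", "green", "blue", "cyan", "magenta", "yellow", "black", "white", "orange"
};

// Group categories by the column they hash to.
var columnToCategories = new Dictionary<int, List<string>>();
foreach (string category in categories)
{
    int column = Bucket(category, NComponents);
    if (!columnToCategories.TryGetValue(column, out var members))
        columnToCategories[column] = members = new List<string>();
    members.Add(category);
}

// Nine categories cannot fit into eight columns without at least one collision.
foreach (var pair in columnToCategories)
    Console.WriteLine($"column {pair.Key}: {string.Join(", ", pair.Value)}");
```

No mapping table is kept between calls: the column is recomputed from the category each time, which is why memory stays fixed and why the original category cannot be recovered from the column index.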
Constructors
HashingEncoder(int, bool, int[]?)
Creates a new instance of HashingEncoder<T>.
public HashingEncoder(int nComponents = 8, bool alternateSign = true, int[]? columnIndices = null)
Parameters
nComponents (int): Number of output features per encoded column. Defaults to 8.
alternateSign (bool): If true, use alternate signs to reduce collision bias. Defaults to true.
columnIndices (int[]): The column indices to encode, or null for all columns.
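To see why alternating signs can reduce collision bias, consider what happens when many colliding categories are summed into the same column, for example inside a downstream dot product. The sketch below is an illustration under stated assumptions, not the library's code: the FNV-1a hash, the helper names Fnv1a and Accumulate, and the choice of the top hash bit as the sign are all hypothetical.

```csharp
using System;
using System.Text;

// Illustrative 32-bit FNV-1a hash (an assumption; not necessarily what HashingEncoder uses).
static uint Fnv1a(string text)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(text))
    {
        hash ^= b;
        hash *= 16777619;
    }
    return hash;
}

// Sums the contribution of every category into its hashed column.
static double[] Accumulate(string[] values, int buckets, bool alternateSign)
{
    var totals = new double[buckets];
    foreach (string value in values)
    {
        uint hash = Fnv1a(value);
        int column = (int)(hash % (uint)buckets);
        // Assumption: the sign comes from the top hash bit, separate from the low bits
        // that select the column when `buckets` is a power of two.
        double sign = alternateSign && (hash >> 31) == 1 ? -1.0 : 1.0;
        totals[column] += sign;
    }
    return totals;
}

var sample = new string[1000];
for (int i = 0; i < sample.Length; i++) sample[i] = $"category-{i}";

double[] unsignedTotals = Accumulate(sample, 8, alternateSign: false);
double[] signedTotals = Accumulate(sample, 8, alternateSign: true);

for (int column = 0; column < 8; column++)
    Console.WriteLine($"column {column}: unsigned {unsignedTotals[column]}, signed {signedTotals[column]}");
```

Without signs, every collision adds to the column, so the totals only grow. With signs, colliding categories tend to cancel, so each signed total is never larger in magnitude than its unsigned counterpart and is usually much closer to zero.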
Properties
AlternateSign
Gets whether alternate signs are used for hash collisions.
public bool AlternateSign { get; }
Property Value
- bool
NComponents
Gets the number of hash components (output features per encoded column).
public int NComponents { get; }
Property Value
- int
SupportsInverseTransform
Gets whether this transformer supports inverse transformation.
public override bool SupportsInverseTransform { get; }
Property Value
- bool
Methods
FitCore(Matrix<T>)
Computes the output feature structure.
protected override void FitCore(Matrix<T> data)
Parameters
data (Matrix<T>): The training data matrix.
GetFeatureNamesOut(string[]?)
Gets the output feature names after transformation.
public override string[] GetFeatureNamesOut(string[]? inputFeatureNames = null)
Parameters
inputFeatureNames (string[])
Returns
- string[]
InverseTransformCore(Matrix<T>)
Inverse transformation is not supported for hash encoding.
protected override Matrix<T> InverseTransformCore(Matrix<T> data)
Parameters
data (Matrix<T>)
Returns
- Matrix<T>
TransformCore(Matrix<T>)
Transforms the data using feature hashing.
protected override Matrix<T> TransformCore(Matrix<T> data)
Parameters
data (Matrix<T>): The data to transform.
Returns
- Matrix<T>
The hash-encoded data.
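As a rough picture of the output shape, the sketch below hash-encodes every column of a small matrix, so each input column expands into a fixed block of hashed columns. It uses a plain double[,] rather than Matrix<T>, and several details are assumptions rather than documented behavior: the FNV-1a hash, the hypothetical helper HashEncode, hashing the invariant-culture text of each numeric category code, and writing +1 or -1 into the selected column.

```csharp
using System;
using System.Globalization;
using System.Text;

// Illustrative 32-bit FNV-1a hash (an assumption; not necessarily what HashingEncoder uses).
static uint Fnv1a(string text)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(text))
    {
        hash ^= b;
        hash *= 16777619;
    }
    return hash;
}

// Encodes all columns: input column c occupies output columns [c * buckets, (c + 1) * buckets).
static double[,] HashEncode(double[,] data, int buckets, bool alternateSign)
{
    int rows = data.GetLength(0);
    int cols = data.GetLength(1);
    var output = new double[rows, cols * buckets];   // zero-initialized

    for (int r = 0; r < rows; r++)
    {
        for (int c = 0; c < cols; c++)
        {
            // Assumption: the numeric category code is hashed via its text form.
            string key = data[r, c].ToString(CultureInfo.InvariantCulture);
            uint hash = Fnv1a(key);
            int offset = (int)(hash % (uint)buckets);
            double cell = alternateSign && (hash >> 31) == 1 ? -1.0 : 1.0;
            output[r, c * buckets + offset] = cell;
        }
    }
    return output;
}

double[,] input = { { 0, 3 }, { 1, 3 }, { 2, 7 } };
double[,] encoded = HashEncode(input, buckets: 4, alternateSign: true);

// Three rows, two input columns times four hashed columns each.
Console.WriteLine($"{encoded.GetLength(0)} x {encoded.GetLength(1)}");   // prints "3 x 8"
```

Rows that share a category in the same input column, such as the value 3 in the first two rows, produce identical entries in that column's block, whether or not the category was present during fitting.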