Class ProductQuantizationCompression<T>

Namespace
AiDotNet.ModelCompression
Assembly
AiDotNet.dll

Implements Product Quantization (PQ) compression for model weights.

public class ProductQuantizationCompression<T> : ModelCompressionBase<T>, IModelCompressionStrategy<T>

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

Inheritance
object
ModelCompressionBase<T>
ProductQuantizationCompression<T>

Implements
IModelCompressionStrategy<T>

Remarks

Product Quantization is a powerful compression technique that divides weight vectors into subvectors and quantizes each subvector separately using its own codebook. This provides a good balance between compression ratio and reconstruction accuracy.

For Beginners: Product Quantization is like organizing a closet using multiple small bins.

Instead of trying to fit all your clothes into one big box:

  1. Divide clothes into categories (shirts, pants, socks)
  2. For each category, pick a few representative items
  3. Store only which representative each item is most similar to

For neural network weights:

  • Divide each weight vector into M smaller pieces (subvectors)
  • For each piece, find K cluster centers (codebook)
  • Replace each subvector with its nearest codebook entry
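
The three steps above can be sketched in a few lines of Python (a language-agnostic illustration of the algorithm, not the library's C# implementation; `pq_encode` and its codebook layout are hypothetical names):

```python
def pq_encode(vector, codebooks):
    """Encode a vector as one codebook index per subvector.

    codebooks: list of M sub-codebooks; codebooks[m] is a list of K
    centroids, each a list of len(vector)//M floats.
    """
    m = len(codebooks)
    d = len(vector) // m  # subvector length; assumes len(vector) % m == 0
    codes = []
    for i in range(m):
        sub = vector[i * d:(i + 1) * d]
        # Replace the subvector with the index of its nearest centroid
        # (smallest squared Euclidean distance).
        best = min(range(len(codebooks[i])),
                   key=lambda k: sum((a - b) ** 2
                                     for a, b in zip(sub, codebooks[i][k])))
        codes.append(best)
    return codes
```

After encoding, only the short list of indices needs to be stored per vector; the centroids themselves live in the shared codebooks.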

Benefits:

  • Better accuracy than global clustering for the same compression ratio
  • Very efficient for high-dimensional weight vectors
  • Commonly used in production systems (e.g., FAISS library)

Example:

  • 1024-dimensional weight vector divided into 8 subvectors of 128 dimensions each
  • Each subvector has 256 possible codes (8-bit quantization)
  • Original: 1024 × 32 bits = 32,768 bits
  • Compressed codes: 8 × 8 bits = 64 bits, plus codebook storage
  • Massive compression with minimal accuracy loss!

Important Limitation: This implementation is designed for compressing a single weight vector. Traditional PQ achieves compression by training codebooks on multiple vectors and amortizing codebook storage. For single-vector compression, the codebook overhead may exceed the original data size.
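
The arithmetic behind this limitation is easy to verify. The sketch below (a hypothetical helper, not part of AiDotNet) counts the bits spent on per-vector codes versus the one-time codebook storage for the example above:

```python
import math

def pq_sizes_bits(dim, m, k, float_bits=32):
    """Bit cost of PQ: per-vector codes vs. shared codebook storage."""
    code_bits = m * math.ceil(math.log2(k))          # one index per subvector
    codebook_bits = m * k * (dim // m) * float_bits  # M codebooks of K centroids
    original_bits = dim * float_bits
    return original_bits, code_bits, codebook_bits

orig, codes, book = pq_sizes_bits(1024, m=8, k=256)
# orig = 32,768 bits; codes = 64 bits; codebook = 8,388,608 bits --
# for a single vector, the codebook dwarfs the original data.
```

Amortized over many vectors sharing the same codebooks, the per-vector cost approaches 64 bits, which is where PQ's compression actually comes from.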

When to use this compressor:

  • When you have very high-dimensional weight vectors (thousands of dimensions)
  • When reconstruction quality is more important than compression ratio
  • When you plan to extend to batch compression of multiple similar vectors

For better single-vector compression, consider strategies that do not require storing a codebook alongside each vector.

Constructors

ProductQuantizationCompression(int, int, int, double, int?)

Initializes a new instance of the ProductQuantizationCompression class.

public ProductQuantizationCompression(int numSubvectors = 8, int numCentroids = 256, int maxIterations = 100, double tolerance = 1E-06, int? randomSeed = null)

Parameters

numSubvectors int

Number of subvectors to divide each weight vector into (default: 8).

numCentroids int

Number of centroids per subvector codebook (default: 256 for 8-bit).

maxIterations int

Maximum K-means iterations per codebook (default: 100).

tolerance double

Convergence tolerance for K-means (default: 1e-6).

randomSeed int?

Random seed for reproducibility.

Remarks

For Beginners: These parameters control the compression behavior:

  • numSubvectors: How many pieces to split each weight vector into

    • More subvectors = more compression but potentially lower accuracy
    • Fewer subvectors = less compression but higher accuracy
    • The weight vector length must be evenly divisible by numSubvectors
  • numCentroids: How many representative values per subvector

    • 256 centroids = 8-bit codes (very common)
    • 16 centroids = 4-bit codes (more aggressive)
    • 65536 centroids = 16-bit codes (higher quality)
  • maxIterations/tolerance: Control the K-means clustering quality

    • Defaults work well for most cases
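
The constraints described above can be expressed as a small validation sketch (a hypothetical Python helper, not the library's API), mapping numCentroids to bits per code and enforcing the divisibility rule:

```python
import math

def check_pq_params(dim, num_subvectors=8, num_centroids=256):
    """Validate the divisibility constraint and report bits per code."""
    if dim % num_subvectors != 0:
        raise ValueError(
            f"vector length {dim} is not divisible by "
            f"numSubvectors={num_subvectors}")
    # 256 centroids -> 8-bit codes, 16 -> 4-bit, 65536 -> 16-bit
    return math.ceil(math.log2(num_centroids))

check_pq_params(1024, num_subvectors=8, num_centroids=256)  # 8-bit codes
```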

Methods

Compress(Vector<T>)

Compresses weights using Product Quantization.

public override (Vector<T> compressedWeights, ICompressionMetadata<T> metadata) Compress(Vector<T> weights)

Parameters

weights Vector<T>

The original model weights.

Returns

(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)

Compressed weights and metadata containing codebooks and codes.

Decompress(Vector<T>, ICompressionMetadata<T>)

Decompresses weights by reconstructing from codebooks and codes.

public override Vector<T> Decompress(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)

Parameters

compressedWeights Vector<T>

The compressed weights (codebook indices).

metadata ICompressionMetadata<T>

The metadata containing codebooks.

Returns

Vector<T>

The decompressed weights.
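
Reconstruction is just a table lookup per subvector: concatenate the centroid selected by each stored code. A sketch in Python (hypothetical names; the actual method operates on Vector<T> and ICompressionMetadata<T>):

```python
def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating chosen centroids."""
    out = []
    for m, code in enumerate(codes):
        out.extend(codebooks[m][code])  # centroid for subvector position m
    return out

codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
pq_decode([1, 0], codebooks)  # -> [1.0, 1.0, 0.0, 0.0]
```

The result approximates the original weights; the error is the quantization error incurred when each subvector was snapped to its nearest centroid.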

GetCompressedSize(Vector<T>, ICompressionMetadata<T>)

Gets the compressed size including codebooks and codes.

public override long GetCompressedSize(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)

Parameters

compressedWeights Vector<T>

The compressed weights (codebook indices).

metadata ICompressionMetadata<T>

The metadata containing codebooks and codes.

Returns

long

The total compressed size, including codes and codebook storage.