Class ProductQuantizationCompression<T>

Namespace
AiDotNet.ModelCompression
Assembly
AiDotNet.dll

Implements Product Quantization (PQ) compression for model weights.

public class ProductQuantizationCompression<T> : ModelCompressionBase<T>, IModelCompressionStrategy<T>

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

Inheritance
object
ModelCompressionBase<T>
ProductQuantizationCompression<T>

Implements
IModelCompressionStrategy<T>

Remarks

Product Quantization is a powerful compression technique that divides weight vectors into subvectors and quantizes each subvector separately using its own codebook. This provides a good balance between compression ratio and reconstruction accuracy.

For Beginners: Product Quantization is like organizing a closet using multiple small bins.

Instead of trying to fit all your clothes into one big box:

  1. Divide clothes into categories (shirts, pants, socks)
  2. For each category, pick a few representative items
  3. Store only which representative each item is most similar to

For neural network weights:

  • Divide each weight vector into M smaller pieces (subvectors)
  • For each piece, find K cluster centers (codebook)
  • Replace each subvector with its nearest codebook entry
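
The three steps above can be sketched in a few lines of Python (a language-agnostic illustration of the algorithm, not the library's C# implementation; `pq_encode` and its codebook layout are hypothetical names):

```python
def pq_encode(vector, codebooks):
    """Encode a vector as one codebook index per subvector.

    codebooks: list of M sub-codebooks; codebooks[m] is a list of K
    centroids, each a list of len(vector)//M floats.
    """
    m = len(codebooks)
    d = len(vector) // m  # subvector length; assumes len(vector) % m == 0
    codes = []
    for i in range(m):
        sub = vector[i * d:(i + 1) * d]
        # Replace the subvector with the index of its nearest centroid
        # (smallest squared Euclidean distance).
        best = min(range(len(codebooks[i])),
                   key=lambda k: sum((a - b) ** 2
                                     for a, b in zip(sub, codebooks[i][k])))
        codes.append(best)
    return codes
```

After encoding, only the short list of indices needs to be stored per vector; the centroids themselves live in the shared codebooks.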

Benefits:

  • Better accuracy than global clustering for the same compression ratio
  • Very efficient for high-dimensional weight vectors
  • Commonly used in production systems (e.g., FAISS library)

Example:

  • 1024-dimensional weight vector divided into 8 subvectors of 128 dimensions each
  • Each subvector has 256 possible codes (8-bit quantization)
  • Original: 1024 × 32 bits = 32,768 bits
  • Compressed codes: 8 × 8 bits = 64 bits, plus codebook storage
  • Massive compression with minimal accuracy loss!

Important Limitation: This implementation is designed for compressing a single weight vector. Traditional PQ achieves compression by training codebooks on multiple vectors and amortizing codebook storage. For single-vector compression, the codebook overhead may exceed the original data size.
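
The arithmetic behind this limitation is easy to verify. The sketch below (a hypothetical helper, not part of AiDotNet) counts the bits spent on per-vector codes versus the one-time codebook storage for the example above:

```python
import math

def pq_sizes_bits(dim, m, k, float_bits=32):
    """Bit cost of PQ: per-vector codes vs. shared codebook storage."""
    code_bits = m * math.ceil(math.log2(k))          # one index per subvector
    codebook_bits = m * k * (dim // m) * float_bits  # M codebooks of K centroids
    original_bits = dim * float_bits
    return original_bits, code_bits, codebook_bits

orig, codes, book = pq_sizes_bits(1024, m=8, k=256)
# orig = 32,768 bits; codes = 64 bits; codebook = 8,388,608 bits --
# for a single vector, the codebook dwarfs the original data.
```

Amortized over many vectors sharing the same codebooks, the per-vector cost approaches 64 bits, which is where PQ's compression actually comes from.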

When to use this compressor:

  • When you have very high-dimensional weight vectors (thousands of dimensions)
  • When reconstruction quality is more important than compression ratio
  • When you plan to extend to batch compression of multiple similar vectors

For better single-vector compression, consider strategies that do not require storing a codebook alongside each vector.

Constructors

ProductQuantizationCompression(int, int, int, double, int?)

Initializes a new instance of the ProductQuantizationCompression class.

public ProductQuantizationCompression(int numSubvectors = 8, int numCentroids = 256, int maxIterations = 100, double tolerance = 1E-06, int? randomSeed = null)

Parameters

numSubvectors int

Number of subvectors to divide each weight vector into (default: 8).

numCentroids int

Number of centroids per subvector codebook (default: 256 for 8-bit).

maxIterations int

Maximum K-means iterations per codebook (default: 100).

tolerance double

Convergence tolerance for K-means (default: 1e-6).

randomSeed int?

Random seed for reproducibility.

Remarks

For Beginners: These parameters control the compression behavior:

  • numSubvectors: How many pieces to split each weight vector into

    • More subvectors = more compression but potentially lower accuracy
    • Fewer subvectors = less compression but higher accuracy
    • The weight vector length must be evenly divisible by numSubvectors
  • numCentroids: How many representative values per subvector

    • 256 centroids = 8-bit codes (very common)
    • 16 centroids = 4-bit codes (more aggressive)
    • 65536 centroids = 16-bit codes (higher quality)
  • maxIterations/tolerance: Control the K-means clustering quality

    • Defaults work well for most cases
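
The constraints described above can be expressed as a small validation sketch (a hypothetical Python helper, not the library's API), mapping numCentroids to bits per code and enforcing the divisibility rule:

```python
import math

def check_pq_params(dim, num_subvectors=8, num_centroids=256):
    """Validate the divisibility constraint and report bits per code."""
    if dim % num_subvectors != 0:
        raise ValueError(
            f"vector length {dim} is not divisible by "
            f"numSubvectors={num_subvectors}")
    # 256 centroids -> 8-bit codes, 16 -> 4-bit, 65536 -> 16-bit
    return math.ceil(math.log2(num_centroids))

check_pq_params(1024, num_subvectors=8, num_centroids=256)  # 8-bit codes
```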

Methods

Compress(Vector<T>)

Compresses weights using Product Quantization.

public override (Vector<T> compressedWeights, ICompressionMetadata<T> metadata) Compress(Vector<T> weights)

Parameters

weights Vector<T>

The original model weights.

Returns

(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)

Compressed weights and metadata containing codebooks and codes.

Decompress(Vector<T>, ICompressionMetadata<T>)

Decompresses weights by reconstructing from codebooks and codes.

public override Vector<T> Decompress(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)

Parameters

compressedWeights Vector<T>

The compressed weights (codebook indices).

metadata ICompressionMetadata<T>

The metadata containing codebooks.

Returns

Vector<T>

The decompressed weights.
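
Reconstruction is just a table lookup per subvector: concatenate the centroid selected by each stored code. A sketch in Python (hypothetical names; the actual method operates on Vector<T> and ICompressionMetadata<T>):

```python
def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating chosen centroids."""
    out = []
    for m, code in enumerate(codes):
        out.extend(codebooks[m][code])  # centroid for subvector position m
    return out

codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
pq_decode([1, 0], codebooks)  # -> [1.0, 1.0, 0.0, 0.0]
```

The result approximates the original weights; the error is the quantization error incurred when each subvector was snapped to its nearest centroid.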

GetCompressedSize(Vector<T>, ICompressionMetadata<T>)

Gets the compressed size including codebooks and codes.

public override long GetCompressedSize(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)

Parameters

compressedWeights Vector<T>

The compressed weights (codebook indices).

metadata ICompressionMetadata<T>

The metadata containing codebooks and codes.

Returns

long

The total compressed size, including codes and codebook storage.