Class ProductQuantizationCompression<T>
Namespace: AiDotNet.ModelCompression
Assembly: AiDotNet.dll
Implements Product Quantization (PQ) compression for model weights.
public class ProductQuantizationCompression<T> : ModelCompressionBase<T>, IModelCompressionStrategy<T>
Type Parameters
T: The numeric type used for calculations (e.g., float, double).
- Inheritance
- ModelCompressionBase<T> → ProductQuantizationCompression<T>
- Implements
- IModelCompressionStrategy<T>
- Inherited Members
Remarks
Product Quantization is a powerful compression technique that divides weight vectors into subvectors and quantizes each subvector separately using its own codebook. This provides a good balance between compression ratio and reconstruction accuracy.
For Beginners: Product Quantization is like organizing a closet using multiple small bins.
Instead of trying to compress all your clothes in one big box:
- Divide clothes into categories (shirts, pants, socks)
- For each category, pick a few representative items
- Store only which representative each item is most similar to
For neural network weights:
- Divide each weight vector into M smaller pieces (subvectors)
- For each piece, find K cluster centers (codebook)
- Replace each subvector with its nearest codebook entry
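The encode step described above can be sketched in Python. This is an illustrative sketch of the technique only, not AiDotNet's implementation; the function name `pq_encode` and the toy codebooks are hypothetical, and a real implementation would first train the codebooks with K-means.

```python
def pq_encode(vector, codebooks):
    """Encode a vector as one codebook index per subvector.

    codebooks[m] is the list of centroids for subvector m; each centroid
    has len(vector) // len(codebooks) dimensions.
    """
    m = len(codebooks)
    d = len(vector) // m  # subvector length (must divide evenly)
    codes = []
    for i, book in enumerate(codebooks):
        sub = vector[i * d:(i + 1) * d]
        # pick the centroid with the smallest squared Euclidean distance
        best = min(range(len(book)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(sub, book[k])))
        codes.append(best)
    return codes

# Toy example: 4-dim vector, 2 subvectors, 2 centroids per codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # codebook for subvector 0
    [[0.0, 1.0], [1.0, 0.0]],   # codebook for subvector 1
]
print(pq_encode([0.9, 1.1, 0.1, 0.8], codebooks))  # -> [1, 0]
```

The stored result is just the list of small integer codes, one per subvector, which is where the compression comes from.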
Benefits:
- Better accuracy than global clustering for the same compression ratio
- Very efficient for high-dimensional weight vectors
- Commonly used in production systems (e.g., FAISS library)
Example:
- 1024-dimensional weight vector divided into 8 subvectors of 128 dimensions each
- Each subvector has 256 possible codes (8-bit quantization)
- Original: 1024 × 32 bits = 32,768 bits
- Compressed codes: 8 × 8 bits = 64 bits, plus codebook storage
- Massive compression with minimal accuracy loss!
Important Limitation: This implementation is designed for compressing a single weight vector. Traditional PQ achieves compression by training codebooks on multiple vectors and amortizing codebook storage. For single-vector compression, the codebook overhead may exceed the original data size.
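The size arithmetic behind this limitation can be worked through directly. A quick sketch for the example above (1024-dim float32 vector, 8 subvectors, 256 centroids per codebook); the variable names are illustrative:

```python
# Worked size arithmetic for single-vector PQ.
dims, bits_per_weight = 1024, 32
num_subvectors, num_centroids = 8, 256
sub_dim = dims // num_subvectors             # 128 dims per subvector

original_bits = dims * bits_per_weight       # 1024 x 32 = 32,768 bits
code_bits = num_subvectors * 8               # 8 codes x 8 bits = 64 bits
codebook_bits = num_subvectors * num_centroids * sub_dim * bits_per_weight

print(original_bits, code_bits, codebook_bits)
# codebook alone: 8 * 256 * 128 * 32 = 8,388,608 bits -- far larger than
# the original vector, which is why single-vector PQ needs the codebook
# cost amortized over many vectors to pay off.
```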
When to use this compressor:
- When you have very high-dimensional weight vectors (thousands of dimensions)
- When reconstruction quality is more important than compression ratio
- When you plan to extend to batch compression of multiple similar vectors
For better single-vector compression:
- Consider WeightClusteringCompression<T> for simpler k-means clustering
- Consider HuffmanEncodingCompression<T> for lossless entropy coding
- Consider DeepCompression<T> for a multi-stage pipeline
Constructors
ProductQuantizationCompression(int, int, int, double, int?)
Initializes a new instance of the ProductQuantizationCompression class.
public ProductQuantizationCompression(int numSubvectors = 8, int numCentroids = 256, int maxIterations = 100, double tolerance = 1E-06, int? randomSeed = null)
Parameters
numSubvectors (int): Number of subvectors to divide each weight vector into (default: 8).
numCentroids (int): Number of centroids per subvector codebook (default: 256 for 8-bit codes).
maxIterations (int): Maximum K-means iterations per codebook (default: 100).
tolerance (double): Convergence tolerance for K-means (default: 1e-6).
randomSeed (int?): Random seed for reproducibility.
Remarks
For Beginners: These parameters control the compression behavior:
numSubvectors: How many pieces to split each weight vector into
- More subvectors = more compression but potentially lower accuracy
- Fewer subvectors = less compression but higher accuracy
- Must divide evenly into your weight vector length
numCentroids: How many representative values per subvector
- 256 centroids = 8-bit codes (very common)
- 16 centroids = 4-bit codes (more aggressive)
- 65536 centroids = 16-bit codes (higher quality)
maxIterations/tolerance: Control the K-means clustering quality
- Defaults work well for most cases
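The two structural constraints above (numSubvectors must divide the vector length; numCentroids determines the bits per code) can be sketched as a small check. The helper `pq_parameter_check` is hypothetical, written for illustration; the library itself may validate these differently:

```python
import math

def pq_parameter_check(vector_length, num_subvectors=8, num_centroids=256):
    """Sanity-check PQ parameters and report the resulting code width."""
    if vector_length % num_subvectors != 0:
        raise ValueError("numSubvectors must divide the weight vector length")
    bits_per_code = math.ceil(math.log2(num_centroids))
    return {
        "subvector_dim": vector_length // num_subvectors,
        "bits_per_code": bits_per_code,           # 256 centroids -> 8 bits
        "code_bits_total": num_subvectors * bits_per_code,
    }

print(pq_parameter_check(1024))
# -> {'subvector_dim': 128, 'bits_per_code': 8, 'code_bits_total': 64}
```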
Methods
Compress(Vector<T>)
Compresses weights using Product Quantization.
public override (Vector<T> compressedWeights, ICompressionMetadata<T> metadata) Compress(Vector<T> weights)
Parameters
weights (Vector<T>): The original model weights.
Returns
- (Vector<T> compressedWeights, ICompressionMetadata<T> metadata)
Compressed weights and metadata containing codebooks and codes.
Decompress(Vector<T>, ICompressionMetadata<T>)
Decompresses weights by reconstructing from codebooks and codes.
public override Vector<T> Decompress(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)
Parameters
compressedWeights (Vector<T>): The compressed weights (codebook indices).
metadata (ICompressionMetadata<T>): The metadata containing codebooks.
Returns
- Vector<T>
The decompressed weights.
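The reconstruction step amounts to one codebook lookup per subvector, concatenated back into a full vector. A minimal Python sketch (illustrative of the technique; `pq_decode` and the toy codebooks are hypothetical, not AiDotNet code):

```python
def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating the centroid
    each stored code points to (one lookup per subvector)."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out

# Same toy setup: 2 subvectors, 2 centroids per codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # codebook for subvector 0
    [[0.0, 1.0], [1.0, 0.0]],   # codebook for subvector 1
]
print(pq_decode([1, 0], codebooks))  # -> [1.0, 1.0, 0.0, 1.0]
```

Note that the output is an approximation: each subvector is replaced by its nearest centroid, so the reconstruction error depends on how well the codebooks cover the data.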
GetCompressedSize(Vector<T>, ICompressionMetadata<T>)
Gets the compressed size including codebooks and codes.
public override long GetCompressedSize(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)
Parameters
compressedWeights (Vector<T>)
metadata (ICompressionMetadata<T>)