Class DeepCompression<T>
- Namespace
- AiDotNet.ModelCompression
- Assembly
- AiDotNet.dll
Implements the Deep Compression algorithm from Han et al. (2015).
public class DeepCompression<T> : ModelCompressionBase<T>, IModelCompressionStrategy<T>
Type Parameters
TThe numeric type used for calculations (e.g., float, double).
- Inheritance
- ModelCompressionBase<T> → DeepCompression<T>
- Implements
- IModelCompressionStrategy<T>
Remarks
Deep Compression is a three-stage compression pipeline that achieves 35-49x compression on neural networks with minimal accuracy loss. The technique was introduced in:
Han, S., Mao, H., & Dally, W. J. (2015). "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." arXiv:1510.00149.
The three stages are applied sequentially:
- Pruning: Remove weights below a magnitude threshold
- Quantization: Cluster remaining weights using k-means (weight sharing)
- Huffman Coding: Apply entropy coding to the sparse, quantized representation
For Beginners: Deep Compression is like a three-step recipe for making neural networks much smaller:
Step 1 - Pruning (Remove the unimportant) Think of it like cleaning out your closet. Many neural network weights are tiny and don't really matter. We set these to zero and don't store them at all. This alone can give ~9x compression!
Step 2 - Quantization (Group similar values) After pruning, we group similar weight values together. Instead of storing the exact values 0.4523, 0.4518, and 0.4531, we store them all as "cluster #7 = 0.4524" and only record which cluster each weight belongs to. This gives another ~4x compression!
Step 3 - Huffman Coding (Efficient storage) Some cluster numbers appear more often than others. We use shorter codes for common values and longer codes for rare values (like Morse code). This gives another ~1.5x compression!
Combined: the stage gains multiply, roughly 9 × 4 × 1.5 ≈ 54x at best, which is how the pipeline reaches the 35-49x compression reported by Han et al. (the stages interact, so the product is only approximate).
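The three stages can be sketched at the algorithm level in a few lines of Python. This is an illustration only, not the AiDotNet implementation; all function names here are hypothetical, and stage 3 (Huffman coding) is omitted for brevity:

```python
# Stages 1 and 2 of the Deep Compression recipe, in miniature (sketch only).

def prune(weights, sparsity):
    """Stage 1: zero out the smallest-magnitude fraction of the weights."""
    mags = sorted(abs(w) for w in weights)
    k = int(len(weights) * sparsity)            # how many weights to drop
    threshold = mags[k - 1] if k > 0 else -1.0
    return [w if abs(w) > threshold else 0.0 for w in weights]

def kmeans_1d(values, k, iters=100, tol=1e-6):
    """Stage 2 helper: 1-D k-means over the surviving weight values."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]  # linear init
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            buckets[min(range(k), key=lambda j: abs(v - centroids[j]))].append(v)
        new = [sum(b) / len(b) if b else centroids[j]
               for j, b in enumerate(buckets)]
        done = max(abs(a - b) for a, b in zip(new, centroids)) < tol
        centroids = new
        if done:
            break
    return centroids

def quantize(weights, centroids):
    """Stage 2: replace each surviving weight by its nearest shared value."""
    return [0.0 if w == 0.0 else min(centroids, key=lambda c: abs(c - w))
            for w in weights]

weights = [0.01, -0.02, 0.45, 0.46, -0.44, 0.03, 0.9, -0.88]
pruned = prune(weights, sparsity=0.375)        # drops the 3 smallest weights
shared = kmeans_1d([w for w in pruned if w != 0.0], k=2)
quantized = quantize(pruned, shared)           # only 2 distinct nonzero values
```

After these two stages, the model only needs to store the small codebook of shared values plus one short index per surviving weight; Huffman coding then shrinks those indices further.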
Example usage:
var deepCompression = new DeepCompression<double>(
pruningSparsity: 0.9, // Remove 90% of weights
numClusters: 32, // 5-bit quantization
huffmanPrecision: 4); // 4 decimal places
var (compressed, metadata) = deepCompression.Compress(weights);
var restored = deepCompression.Decompress(compressed, metadata);
Constructors
DeepCompression(double, double, int, int, double, int, int?, bool)
Initializes a new instance of the DeepCompression class.
public DeepCompression(double pruningSparsity = 0.9, double pruningThreshold = 0, int numClusters = 32, int maxClusteringIterations = 100, double clusteringTolerance = 1E-06, int huffmanPrecision = 4, int? randomSeed = null, bool enableRetraining = false)
Parameters
pruningSparsitydoubleTarget sparsity for pruning stage (default: 0.9 = 90% zeros).
pruningThresholddoubleExplicit magnitude threshold (default: 0 = use sparsity target).
numClustersintNumber of clusters for quantization (default: 32 for 5-bit).
maxClusteringIterationsintMaximum k-means iterations (default: 100).
clusteringTolerancedoubleK-means convergence tolerance (default: 1e-6).
huffmanPrecisionintDecimal precision for Huffman encoding (default: 4).
randomSeedint?Random seed for reproducibility.
enableRetrainingboolWhether to enable fine-tuning hints (default: false).
Remarks
For Beginners: These parameters let you tune the compression:
Pruning parameters:
- pruningSparsity: What fraction of weights to remove (0.9 = remove 90%)
- pruningThreshold: Alternative way to set pruning (by magnitude instead of percentage)
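The two pruning knobs are interchangeable: a sparsity target can be converted into a magnitude threshold by sorting the weight magnitudes. A Python sketch of that conversion (hypothetical helper, not the library's pruning code):

```python
def threshold_for_sparsity(weights, sparsity):
    """Magnitude below which the target fraction of weights falls."""
    mags = sorted(abs(w) for w in weights)
    k = int(len(mags) * sparsity)              # number of weights to remove
    return mags[k - 1] if k > 0 else 0.0

weights = [0.1, -0.5, 0.02, 0.8, -0.03, 0.6, 0.01, -0.9, 0.07, 0.4]
t = threshold_for_sparsity(weights, 0.5)       # median magnitude here
pruned = [w if abs(w) > t else 0.0 for w in weights]
```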
Quantization parameters:
- numClusters: How many unique weight values to allow
- 16 clusters = 4-bit (more compression, less accuracy)
- 32 clusters = 5-bit (Han et al. recommended for fully-connected layers)
- 256 clusters = 8-bit (less compression, higher accuracy)
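The bit widths above follow directly from the cluster count: each weight is stored as an index into the codebook, and addressing k clusters takes ceil(log2(k)) bits. A quick sketch of that arithmetic (illustration only):

```python
import math

def index_bits(num_clusters):
    """Bits needed to store one cluster index."""
    return math.ceil(math.log2(num_clusters))

# Compared with 32-bit floats, the per-weight indices are much smaller:
for k in (16, 32, 256):
    bits = index_bits(k)
    print(f"{k} clusters -> {bits}-bit indices "
          f"({32 / bits:.1f}x smaller than float32)")
```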
Huffman parameters:
- huffmanPrecision: How precisely to encode cluster indices
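The "shorter codes for common values" idea from Step 3 is standard Huffman coding. A self-contained Python sketch (not the library's encoder) that builds a prefix code over cluster indices:

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(symbols):
    """Build a prefix code: frequent symbols receive shorter bit strings."""
    freq = Counter(symbols)
    tie = count()  # unique tiebreaker so the heap never compares dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)      # two least frequent subtrees
        n2, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]

# Cluster index 3 dominates this stream, so it gets the shortest code:
indices = [3] * 50 + [7] * 20 + [1] * 5 + [12] * 2
codes = huffman_codes(indices)
```

Frequent indices end up with one- or two-bit codes while rare ones pay a few extra bits, which is where the final ~1.5x gain comes from.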
Han et al. recommended settings:
- Convolutional layers: 8-bit (256 clusters), 65-70% sparsity
- Fully-connected layers: 5-bit (32 clusters), 90-95% sparsity
Methods
Compress(Vector<T>)
Compresses weights using the three-stage Deep Compression pipeline.
public override (Vector<T> compressedWeights, ICompressionMetadata<T> metadata) Compress(Vector<T> weights)
Parameters
weightsVector<T>The original model weights.
Returns
- (Vector<T> compressedWeights, ICompressionMetadata<T> metadata)
Compressed weights and metadata for all three stages.
Remarks
For Beginners: This method applies all three compression stages in order:
- First, it prunes (removes) small weights
- Then, it clusters the remaining weights into groups
- Finally, it applies Huffman coding for efficient storage
The metadata contains everything needed to reverse this process.
Decompress(Vector<T>, ICompressionMetadata<T>)
Decompresses weights by reversing all three stages.
public override Vector<T> Decompress(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)
Parameters
compressedWeightsVector<T>The compressed weights.
metadataICompressionMetadata<T>The Deep Compression metadata.
Returns
- Vector<T>
The decompressed weights.
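Conceptually, decompression runs the pipeline in reverse: Huffman-decode the cluster indices, look each index up in the codebook, and put zeros back where pruning removed weights. A Python sketch of that reversal (hypothetical names and metadata layout; the real method reads everything from ICompressionMetadata<T>):

```python
def decompress(bits, decode_table, codebook, kept_mask):
    """Huffman-decode indices, look up shared values, restore pruned zeros."""
    indices, code = [], ""
    for b in bits:                  # walk the bitstream, matching prefix codes
        code += b
        if code in decode_table:
            indices.append(decode_table[code])
            code = ""
    values = iter(codebook[i] for i in indices)
    return [next(values) if kept else 0.0 for kept in kept_mask]

# Three surviving weights (codes "0", "11", "10"), two pruned positions:
restored = decompress("01110", {"0": 0, "10": 1, "11": 2},
                      codebook=[-0.5, 0.2, 0.9],
                      kept_mask=[True, False, True, True, False])
# restored == [-0.5, 0.0, 0.9, 0.2, 0.0]
```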
ForConvolutionalLayers(int?)
Creates a DeepCompression instance optimized for convolutional layers.
public static DeepCompression<T> ForConvolutionalLayers(int? randomSeed = null)
Parameters
randomSeedint?Optional random seed for reproducibility.
Returns
- DeepCompression<T>
A DeepCompression instance with Han et al. recommended settings for conv layers.
Remarks
For Beginners: Convolutional layers are typically more sensitive to compression, so we use more conservative settings: 8-bit quantization and lower sparsity.
ForFullyConnectedLayers(int?)
Creates a DeepCompression instance optimized for fully-connected layers.
public static DeepCompression<T> ForFullyConnectedLayers(int? randomSeed = null)
Parameters
randomSeedint?Optional random seed for reproducibility.
Returns
- DeepCompression<T>
A DeepCompression instance with Han et al. recommended settings for FC layers.
Remarks
For Beginners: Fully-connected layers have many redundant weights and can tolerate more aggressive compression: 5-bit quantization and higher sparsity.
GetCompressedSize(Vector<T>, ICompressionMetadata<T>)
Gets the total compressed size from all three stages.
public override long GetCompressedSize(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)
Parameters
compressedWeightsVector<T>The compressed weights.
metadataICompressionMetadata<T>The Deep Compression metadata.