Class DeepCompression<T>

Namespace
AiDotNet.ModelCompression
Assembly
AiDotNet.dll

Implements the Deep Compression algorithm from Han et al. (2015).

public class DeepCompression<T> : ModelCompressionBase<T>, IModelCompressionStrategy<T>

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

Inheritance
object ← ModelCompressionBase<T> ← DeepCompression<T>

Implements
IModelCompressionStrategy<T>

Remarks

Deep Compression is a three-stage compression pipeline that achieves 35-49x compression on neural networks (35x on AlexNet, 49x on VGG-16 in the original paper) with minimal accuracy loss. The technique was introduced in:

Han, S., Mao, H., & Dally, W. J. (2015). "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." arXiv:1510.00149.

The three stages are applied sequentially:

  1. Pruning: Remove weights below a magnitude threshold
  2. Quantization: Cluster remaining weights using k-means (weight sharing)
  3. Huffman Coding: Apply entropy coding to the sparse, quantized representation

For Beginners: Deep Compression is like a three-step recipe for making neural networks much smaller:

Step 1 - Pruning (Remove the unimportant): Think of it like cleaning out your closet. Many neural network weights are tiny and don't really matter. We set these to zero and don't store them at all. This alone can give ~9x compression!
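To make this concrete, here is a minimal, self-contained sketch of magnitude pruning on a plain array (illustrative only; it does not use the library's Vector<T> type):

using System;
using System.Linq;

double[] weights = { 0.01, -0.85, 0.002, 0.43, -0.004, 0.91 };
double sparsity = 0.5; // zero out the smallest 50% of weights by magnitude

// Find the magnitude threshold that hits the target sparsity.
double[] magnitudes = weights.Select(Math.Abs).OrderBy(m => m).ToArray();
double threshold = magnitudes[(int)(sparsity * magnitudes.Length)];

// Weights below the threshold are set to zero and need not be stored.
double[] pruned = weights.Select(w => Math.Abs(w) < threshold ? 0.0 : w).ToArray();
// pruned: { 0, -0.85, 0, 0.43, 0, 0.91 } -- exactly 50% zeros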

Step 2 - Quantization (Group similar values): After pruning, we group similar weight values together. Instead of storing the exact values 0.4523, 0.4518, and 0.4531, we store them all as "cluster #7 = 0.4524". We only need to store which cluster each weight belongs to. This gives another ~4x compression!
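A toy sketch of the weight-sharing idea, assuming the cluster centroids have already been fitted (e.g., by k-means); again this uses plain arrays rather than the library's types:

using System;
using System.Linq;

double[] centroids = { -0.80, 0.45, 0.90 };           // the shared values ("codebook")
double[] survivors = { -0.85, 0.4523, 0.91, 0.4518 }; // non-zero weights after pruning

// Replace each weight with the index of its nearest centroid.
int[] indices = survivors
    .Select(w => Array.IndexOf(centroids,
        centroids.OrderBy(c => Math.Abs(c - w)).First()))
    .ToArray();
// indices: { 0, 1, 2, 1 } -- four doubles become four small integers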

Step 3 - Huffman Coding (Efficient storage): Some cluster numbers appear more often than others. We use shorter codes for common values and longer codes for rare values (like Morse code). This gives another ~1.5x compression!
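And a toy illustration of why entropy coding pays off. The code lengths below are the actual Huffman code lengths for this frequency distribution; building the tree itself is omitted:

using System.Collections.Generic;
using System.Linq;

// How often each cluster index appears, and its Huffman code length in bits.
var frequency = new Dictionary<int, int> { [1] = 60, [0] = 25, [2] = 10, [3] = 5 };
var codeBits  = new Dictionary<int, int> { [1] = 1,  [0] = 2,  [2] = 3,  [3] = 3 };

int fixedBits   = frequency.Values.Sum() * 2; // 2 bits/symbol for 4 clusters = 200
int huffmanBits = frequency.Sum(kv => kv.Value * codeBits[kv.Key]); // = 155
// 200 / 155 ≈ 1.3x on this toy distribution; the more skewed the
// distribution (common in practice), the bigger the savings.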

Combined: 9 × 4 × 1.5 ≈ 54x in the ideal case; because the stages interact, Han et al. measured 35-49x in practice!

Example usage:

var deepCompression = new DeepCompression<double>(
    pruningSparsity: 0.9,    // Remove 90% of weights
    numClusters: 32,          // 5-bit quantization
    huffmanPrecision: 4);     // 4 decimal places

var (compressed, metadata) = deepCompression.Compress(weights);
var restored = deepCompression.Decompress(compressed, metadata);

Constructors

DeepCompression(double, double, int, int, double, int, int?, bool)

Initializes a new instance of the DeepCompression class.

public DeepCompression(double pruningSparsity = 0.9, double pruningThreshold = 0, int numClusters = 32, int maxClusteringIterations = 100, double clusteringTolerance = 1E-06, int huffmanPrecision = 4, int? randomSeed = null, bool enableRetraining = false)

Parameters

pruningSparsity double

Target sparsity for pruning stage (default: 0.9 = 90% zeros).

pruningThreshold double

Explicit magnitude threshold (default: 0 = use sparsity target).

numClusters int

Number of clusters for quantization (default: 32 for 5-bit).

maxClusteringIterations int

Maximum k-means iterations (default: 100).

clusteringTolerance double

K-means convergence tolerance (default: 1e-6).

huffmanPrecision int

Decimal precision for Huffman encoding (default: 4).

randomSeed int?

Random seed for reproducibility.

enableRetraining bool

Whether to enable fine-tuning hints (default: false).

Remarks

For Beginners: These parameters let you tune the compression:

Pruning parameters:

  • pruningSparsity: What fraction of weights to remove (0.9 = remove 90%)
  • pruningThreshold: Alternative way to set pruning (by magnitude instead of percentage)

Quantization parameters:

  • numClusters: How many unique weight values to allow
    • 16 clusters = 4-bit (more compression, less accuracy)
    • 32 clusters = 5-bit (Han et al. recommendation for fully-connected layers)
    • 256 clusters = 8-bit (less compression, higher accuracy)

Huffman parameters:

  • huffmanPrecision: How many decimal places to preserve when encoding values for Huffman coding

Han et al. recommended settings:

  • Convolutional layers: 8-bit (256 clusters), 65-70% sparsity
  • Fully-connected layers: 5-bit (32 clusters), 90-95% sparsity
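These recommendations map directly onto the constructor parameters. A sketch (the sparsity values are picked from the ranges above):

using AiDotNet.ModelCompression;

// Fully-connected layers: 5-bit codebook (2^5 = 32 clusters), 90% sparsity.
var fcCompression = new DeepCompression<double>(
    pruningSparsity: 0.9,
    numClusters: 32);

// Convolutional layers: 8-bit codebook (2^8 = 256 clusters), ~70% sparsity.
var convCompression = new DeepCompression<double>(
    pruningSparsity: 0.7,
    numClusters: 256);

The ForConvolutionalLayers and ForFullyConnectedLayers factory methods below package these presets for you.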

Methods

Compress(Vector<T>)

Compresses weights using the three-stage Deep Compression pipeline.

public override (Vector<T> compressedWeights, ICompressionMetadata<T> metadata) Compress(Vector<T> weights)

Parameters

weights Vector<T>

The original model weights.

Returns

(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)

Compressed weights and metadata for all three stages.

Remarks

For Beginners: This method applies all three compression stages in order:

  1. First, it prunes (removes) small weights
  2. Then, it clusters the remaining weights into groups
  3. Finally, it applies Huffman coding for efficient storage

The metadata contains everything needed to reverse this process.
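A round-trip sketch using only the documented signatures; weights is assumed to be an existing Vector<double>:

// Seed the clustering so results are reproducible across runs.
var dc = new DeepCompression<double>(randomSeed: 42);

var (compressed, metadata) = dc.Compress(weights);
var restored = dc.Decompress(compressed, metadata);

// restored has the original length, but values are approximate: pruned
// weights come back as zero and survivors as their cluster centroids.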

Decompress(Vector<T>, ICompressionMetadata<T>)

Decompresses weights by reversing all three stages.

public override Vector<T> Decompress(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)

Parameters

compressedWeights Vector<T>

The compressed weights.

metadata ICompressionMetadata<T>

The Deep Compression metadata.

Returns

Vector<T>

The decompressed weights.

ForConvolutionalLayers(int?)

Creates a DeepCompression instance optimized for convolutional layers.

public static DeepCompression<T> ForConvolutionalLayers(int? randomSeed = null)

Parameters

randomSeed int?

Optional random seed for reproducibility.

Returns

DeepCompression<T>

A DeepCompression instance with Han et al. recommended settings for conv layers.

Remarks

For Beginners: Convolutional layers are typically more sensitive to compression, so we use more conservative settings: 8-bit quantization and lower sparsity.

ForFullyConnectedLayers(int?)

Creates a DeepCompression instance optimized for fully-connected layers.

public static DeepCompression<T> ForFullyConnectedLayers(int? randomSeed = null)

Parameters

randomSeed int?

Optional random seed for reproducibility.

Returns

DeepCompression<T>

A DeepCompression instance with Han et al. recommended settings for FC layers.

Remarks

For Beginners: Fully-connected layers have many redundant weights and can tolerate more aggressive compression: 5-bit quantization and higher sparsity.
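A sketch combining both factory methods, assuming convWeights and fcWeights are existing Vector<float> instances:

var convCompressor = DeepCompression<float>.ForConvolutionalLayers(randomSeed: 42);
var fcCompressor   = DeepCompression<float>.ForFullyConnectedLayers(randomSeed: 42);

// Compress each layer type with its matching preset.
var (convCompressed, convMeta) = convCompressor.Compress(convWeights);
var (fcCompressed, fcMeta)     = fcCompressor.Compress(fcWeights);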

GetCompressedSize(Vector<T>, ICompressionMetadata<T>)

Gets the total compressed size from all three stages.

public override long GetCompressedSize(Vector<T> compressedWeights, ICompressionMetadata<T> metadata)

Parameters

compressedWeights Vector<T>

The compressed weights.

metadata ICompressionMetadata<T>

The Deep Compression metadata.

Returns

long

The total compressed size across all three stages.
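A sketch of turning this into a compression ratio, continuing the round-trip example above. It assumes the returned long is in bytes, that the uncompressed baseline is one 8-byte double per weight, and that Vector<T> exposes a Length property; none of these are guaranteed by the signatures above:

long compressedBytes = dc.GetCompressedSize(compressed, metadata);
long originalBytes = (long)weights.Length * sizeof(double); // assumes 8 bytes/weight

double ratio = (double)originalBytes / compressedBytes;
Console.WriteLine($"Compression ratio: {ratio:F1}x");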