
Class QuantizationConfig

Namespace
AiDotNet.Deployment.Configuration
Assembly
AiDotNet.dll

Configuration for model quantization - compressing models by using lower-precision numbers.

public class QuantizationConfig
Inheritance
object ← QuantizationConfig

Remarks

For Beginners: Quantization makes your AI model smaller and faster by using smaller numbers. Think of it like compressing a high-quality photo - it takes less space but might lose a little quality.

Why use quantization?

  • Smaller model size (50-75% reduction)
  • Faster inference (2-4x speedup)
  • Lower memory usage
  • Enables deployment on mobile/edge devices

Trade-offs:

  • Slightly lower accuracy (usually 1-5%)
  • Some models are more sensitive than others

Modes:

  • None: No quantization (full precision)
  • Float16: Half precision (50% size reduction, minimal accuracy loss)
  • Int8: 8-bit integers (75% size reduction, small accuracy loss)

For most models, Float16 is a great choice - significant benefits with minimal accuracy loss.
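
As a hedged sketch of how this might look in code, the snippet below builds a Float16 configuration. The property names come from this page, the enum member names are inferred from the modes listed above, and how the config is consumed by the rest of the deployment pipeline is outside the scope of this page.

using AiDotNet.Deployment.Configuration;

// Minimal sketch: request Float16 quantization for a good size/accuracy balance.
// Enum member names are inferred from the modes documented above.
var config = new QuantizationConfig
{
    Mode = QuantizationMode.Float16   // ~50% size reduction, minimal accuracy loss
};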

Properties

CalibrationMethod

Gets or sets the calibration method used to determine optimal scaling factors (default: MinMax).

public CalibrationMethod CalibrationMethod { get; set; }

Property Value

CalibrationMethod

Remarks

For Beginners: Calibration determines how to convert large numbers to small numbers. Only used for Int8 quantization. MinMax is fast and works well for most cases. Ignored for Float16 or None modes.

CalibrationSamples

Gets or sets the number of calibration samples to use (default: 100).

public int CalibrationSamples { get; set; }

Property Value

int

Remarks

For Beginners: More samples give better calibration but take longer. 100 is a good default. Use 1000+ for critical applications where accuracy is paramount.
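
As a sketch (not a verified recipe), an Int8 setup that sets both calibration properties explicitly might look like the following; MinMax is the documented default method, and the higher sample count follows the guidance above.

using AiDotNet.Deployment.Configuration;

// Sketch: Int8 quantization with explicit calibration settings.
// MinMax is the documented default; raising CalibrationSamples trades
// longer calibration time for better scaling factors.
var config = new QuantizationConfig
{
    Mode = QuantizationMode.Int8,
    CalibrationMethod = CalibrationMethod.MinMax,
    CalibrationSamples = 1000   // 1000+ recommended for accuracy-critical applications
};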

Mode

Gets or sets the quantization mode (default: None).

public QuantizationMode Mode { get; set; }

Property Value

QuantizationMode

Remarks

For Beginners: Choose which type of quantization to use:

  • None: Full precision, no compression
  • Float16: Half precision, good balance
  • Int8: Maximum compression, slight accuracy loss
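
The illustrative snippet below picks a mode from a deployment-target flag; the targetIsEdgeDevice variable is hypothetical, and only the QuantizationMode values come from this page.

using AiDotNet.Deployment.Configuration;

// Illustrative only: pick a mode based on where the model will run.
// targetIsEdgeDevice is a hypothetical flag for this example.
bool targetIsEdgeDevice = true;

var config = new QuantizationConfig
{
    Mode = targetIsEdgeDevice
        ? QuantizationMode.Int8      // maximum compression for mobile/edge
        : QuantizationMode.Float16   // good balance for server deployment
};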

QuantizeActivations

Gets or sets whether to quantize activations in addition to weights (default: false, weights only).

public bool QuantizeActivations { get; set; }

Property Value

bool

Remarks

For Beginners: False means only the model parameters (weights) are compressed. True means the intermediate values produced during inference (activations) are also compressed. Quantizing activations gives better compression but requires calibration data.
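
A sketch of enabling activation quantization follows; since activations require calibration data, the calibration properties documented earlier become relevant. The exact mechanism for supplying calibration samples is not covered on this page, and the sample count shown is only an illustrative value.

using AiDotNet.Deployment.Configuration;

// Sketch: quantize both weights and activations.
// Activation quantization requires calibration data, so the
// calibration properties documented above come into play.
var config = new QuantizationConfig
{
    Mode = QuantizationMode.Int8,
    QuantizeActivations = true,
    CalibrationSamples = 500   // illustrative value between the 100 default and 1000+
};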

UseSymmetricQuantization

Gets or sets whether to use symmetric quantization (default: true).

public bool UseSymmetricQuantization { get; set; }

Property Value

bool

Remarks

For Beginners: Symmetric quantization maps positive and negative values with the same scale. It's faster, but asymmetric quantization may be slightly more accurate, especially when values aren't centered around zero. Use true for most cases.
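
As a final sketch, the default can be relaxed for an accuracy-sensitive Int8 model; whether asymmetric quantization actually helps depends on the model, so treat this as a starting point rather than a recommendation.

using AiDotNet.Deployment.Configuration;

// Sketch: try asymmetric quantization when an Int8 model loses more
// accuracy than expected. Symmetric (the default) is faster; asymmetric
// may recover a little accuracy on some models.
var config = new QuantizationConfig
{
    Mode = QuantizationMode.Int8,
    UseSymmetricQuantization = false
};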