Class QuantizationConfig
- Namespace
- AiDotNet.Deployment.Configuration
- Assembly
- AiDotNet.dll
Configuration for model quantization - compressing models by using lower precision numbers.
public class QuantizationConfig
- Inheritance
- object → QuantizationConfig
Remarks
For Beginners: Quantization makes your AI model smaller and faster by using smaller numbers. Think of it like compressing a high-quality photo - it takes less space but might lose a little quality.
Why use quantization?
- Smaller model size (50-75% reduction)
- Faster inference (2-4x speedup)
- Lower memory usage
- Enables deployment on mobile/edge devices
Trade-offs:
- Slightly lower accuracy (usually 1-5%)
- Some models are more sensitive than others
Modes:
- None: No quantization (full precision)
- Float16: Half precision (50% size reduction, minimal accuracy loss)
- Int8: 8-bit integers (75% size reduction, small accuracy loss)
For most models, Float16 is a great choice - significant benefits with minimal accuracy loss.
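As a minimal sketch of selecting a mode, assuming a `QuantizationMode` enum with the values listed above is available alongside this class:

```csharp
using AiDotNet.Deployment.Configuration;

// Float16 is a good default: roughly half the model size
// with minimal accuracy loss, and no calibration required.
var config = new QuantizationConfig
{
    Mode = QuantizationMode.Float16
};
```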
Properties
CalibrationMethod
Gets or sets the calibration method used to determine optimal scaling factors (default: MinMax).
public CalibrationMethod CalibrationMethod { get; set; }
Property Value
Remarks
For Beginners: Calibration determines how full-precision values are mapped to 8-bit integers. It is only used for Int8 quantization and is ignored for the Float16 and None modes. MinMax is fast and works well for most cases.
CalibrationSamples
Gets or sets the number of calibration samples to use (default: 100).
public int CalibrationSamples { get; set; }
Property Value
Remarks
For Beginners: More samples give better calibration but take longer. 100 is a good default. Use 1000+ for critical applications where accuracy is paramount.
Mode
Gets or sets the quantization mode (default: None).
public QuantizationMode Mode { get; set; }
Property Value
Remarks
For Beginners: Choose which type of quantization to use:
- None: Full precision, no compression
- Float16: Half precision, good balance
- Int8: Maximum compression, slight accuracy loss
QuantizeActivations
Gets or sets whether to quantize activations in addition to weights (default: false).
public bool QuantizeActivations { get; set; }
Property Value
Remarks
For Beginners: False means only the model parameters (weights) are compressed. True means the intermediate values computed during inference (activations) are compressed as well, which gives better compression and speed but requires calibration data.
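An Int8 setup with activation quantization might look like the following sketch, assuming `QuantizationMode` and `CalibrationMethod` enums with the values named in these remarks:

```csharp
using AiDotNet.Deployment.Configuration;

// Int8 with activation quantization: maximum compression,
// but calibration data is needed to pick good scaling factors.
var config = new QuantizationConfig
{
    Mode = QuantizationMode.Int8,
    QuantizeActivations = true,
    CalibrationMethod = CalibrationMethod.MinMax,
    CalibrationSamples = 1000 // use 1000+ when accuracy is paramount
};
```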
UseSymmetricQuantization
Gets or sets whether to use symmetric quantization (default: true).
public bool UseSymmetricQuantization { get; set; }
Property Value
Remarks
For Beginners: Symmetric quantization uses a zero-centered range, treating positive and negative values the same way, which keeps the math simple and fast. Asymmetric quantization adds a zero-point offset and can be slightly more accurate when values are skewed toward one sign. Use true for most cases.
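The difference can be illustrated with plain arithmetic, independent of AiDotNet. This is a conceptual sketch of standard int8 quantization formulas, not this library's internal implementation:

```csharp
using System;

class QuantizationDemo
{
    static void Main()
    {
        // Observed value range from calibration, and a sample value.
        double min = -0.5, max = 2.0;
        double x = 1.0;

        // Symmetric: zero point is fixed at 0; one scale covers [-r, r],
        // where r is the larger magnitude of the two range endpoints.
        double r = Math.Max(Math.Abs(min), Math.Abs(max));
        double symScale = r / 127.0;
        sbyte symQ = (sbyte)Math.Round(x / symScale);

        // Asymmetric: the scale spans [min, max] exactly, and a zero-point
        // offset shifts the range so min maps to -128 and max maps to 127.
        double asymScale = (max - min) / 255.0;
        int zeroPoint = (int)Math.Round(-128 - min / asymScale);
        sbyte asymQ = (sbyte)Math.Round(x / asymScale + zeroPoint);

        Console.WriteLine($"symmetric:  q={symQ}, dequantized={symQ * symScale}");
        Console.WriteLine($"asymmetric: q={asymQ}, dequantized={(asymQ - zeroPoint) * asymScale}");
    }
}
```

For the skewed range above, the symmetric scheme wastes part of the negative range (no value reaches -127), while the asymmetric scheme uses all 256 levels — which is why asymmetric can be slightly more accurate for skewed distributions.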