Enum FlashAttentionPrecision

Namespace: AiDotNet.NeuralNetworks.Attention
Assembly: AiDotNet.dll

Precision modes for Flash Attention computation.

public enum FlashAttentionPrecision
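
For example, a caller would pick one of the modes below and pass it to an attention component. Note the FlashAttentionLayer type and its constructor in this sketch are hypothetical placeholders for illustration; only the enum itself is documented here.

using AiDotNet.NeuralNetworks.Attention;

// Choose a speed/stability trade-off for the attention computation.
var precision = FlashAttentionPrecision.Mixed;

// Hypothetical layer; substitute the library's actual attention type.
// var attention = new FlashAttentionLayer(headCount: 8, precision: precision);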

Fields

Float16 = 0

Use 16-bit floating point (half precision). Fastest, but FP16's narrow range (largest finite value is about 65504) and reduced precision can cause overflow or accumulation error on very long sequences.
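
A minimal .NET sketch of the overflow risk (assumes System.Half, available since .NET 5):

using System;

// Accumulate in FP16: each partial sum is rounded back to Half.
Half sum = (Half)60000f;
sum = (Half)((float)sum + 10000f);        // exceeds Half.MaxValue (~65504)
Console.WriteLine(Half.IsInfinity(sum));  // True: the running sum overflowed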

Float32 = 1

Use 32-bit floating point (single precision). Good balance of speed and accuracy.

Mixed = 2

Use mixed precision (FP16 for matmul, FP32 for softmax). Best combination of speed and numerical stability.
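
A minimal sketch of the idea, not AiDotNet's implementation: dot-product scores are rounded through Half to mimic FP16 matmul output, while the softmax itself runs in FP32.

using System;
using System.Linq;

static float[] MixedPrecisionAttentionRow(Half[] query, Half[][] keys)
{
    // "Matmul" in FP16: each score is rounded back to half precision.
    float[] scores = keys
        .Select(k => (float)(Half)Enumerable.Range(0, query.Length)
            .Sum(i => (float)query[i] * (float)k[i]))
        .ToArray();

    // FP32 softmax, stabilized by subtracting the row max before exp.
    float max = scores.Max();
    float[] exps = scores.Select(s => MathF.Exp(s - max)).ToArray();
    float denom = exps.Sum();
    return exps.Select(e => e / denom).ToArray();
}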