Class FusedAttentionOp
- Namespace
- AiDotNet.JitCompiler.IR.Operations
- Assembly
- AiDotNet.dll
Fused attention operation (Q*K^T + softmax + matmul V).
public class FusedAttentionOp : IROp
- Inheritance
- IROp → FusedAttentionOp
Remarks
For Beginners: The core of Transformer models!
Attention:
scores = Q @ K^T / sqrt(d_k)
weights = softmax(scores)
output = weights @ V
This is the most expensive part of a transformer. Fusing these three steps into a single operation enables optimizations such as Flash Attention, which avoids materializing the full score matrix and yields large speedups.
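The fused computation can be illustrated with a small NumPy sketch. This is a reference (unfused) version of the math only; the function name, argument names, and defaults below are illustrative and are not the AiDotNet API.

```python
import numpy as np

def fused_attention(Q, K, V, scale=None, causal_mask=False, softmax_axis=-1):
    """Illustrative sketch of the math FusedAttentionOp fuses.

    Q, K: (seq_len, d_k); V: (seq_len, d_v). Not the actual AiDotNet API.
    """
    d_k = Q.shape[-1]
    if scale is None:
        scale = 1.0 / np.sqrt(d_k)          # typical 1/sqrt(d_k) scaling
    scores = (Q @ K.T) * scale              # (seq, seq) attention scores
    if causal_mask:
        # Position i may only attend to positions j <= i.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax along the chosen axis.
    scores = scores - scores.max(axis=softmax_axis, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=softmax_axis, keepdims=True)
    return weights @ V                      # (seq, d_v) output

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = fused_attention(Q, K, V, causal_mask=True)
```

With the causal mask enabled, the first position can only attend to itself, so the first output row equals the first row of V exactly.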
Properties
CausalMask
Gets or sets whether to apply a causal mask, so each position can attend only to itself and earlier positions.
public bool CausalMask { get; set; }
Property Value
bool
Scale
Gets or sets the scaling factor (typically 1/sqrt(d_k)).
public double Scale { get; set; }
Property Value
double
SoftmaxAxis
Gets or sets the axis along which softmax is applied (typically the last axis, over the keys).
public int SoftmaxAxis { get; set; }
Property Value
int
Methods
Validate()
Validates inputs (Q, K, V).
public override bool Validate()
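The source does not show the body of Validate(), but a shape-compatibility check for the three inputs would plausibly look like the following sketch. The function name and the exact rules here are hypothetical, not the AiDotNet implementation; they only illustrate the constraints Q @ K^T and weights @ V impose.

```python
def validate_attention_inputs(q_shape, k_shape, v_shape):
    """Hypothetical shape checks for Q, K, V (not the AiDotNet code).

    Assumes Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v).
    """
    if len(q_shape) != 2 or len(k_shape) != 2 or len(v_shape) != 2:
        return False
    if q_shape[1] != k_shape[1]:   # Q @ K^T needs matching d_k
        return False
    if k_shape[0] != v_shape[0]:   # weights @ V needs matching seq_k
        return False
    return True
```

For example, Q (4, 8), K (4, 8), V (4, 16) is valid, while a d_k mismatch between Q and K is not.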