Class FusedAttentionOp
- Namespace
- AiDotNet.JitCompiler.IR.Operations
- Assembly
- AiDotNet.dll
Fused attention operation (Q*K^T + softmax + matmul V).
public class FusedAttentionOp : IROp
- Inheritance
- IROp → FusedAttentionOp
Remarks
For Beginners: The core of Transformer models!
Attention:
scores = Q @ K^T / sqrt(d_k)
weights = softmax(scores)
output = weights @ V
This is the most expensive part of a transformer. Fusing these three steps into a single operation enables optimizations such as Flash Attention, which avoids materializing the full score matrix and yields large speedups.
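The fused computation can be illustrated with a small NumPy sketch. This is a reference (unfused) version of the math only; the function name, argument names, and defaults below are illustrative and are not the AiDotNet API.

```python
import numpy as np

def fused_attention(Q, K, V, scale=None, causal_mask=False, softmax_axis=-1):
    """Illustrative sketch of the math FusedAttentionOp fuses.

    Q, K: (seq_len, d_k); V: (seq_len, d_v). Not the actual AiDotNet API.
    """
    d_k = Q.shape[-1]
    if scale is None:
        scale = 1.0 / np.sqrt(d_k)          # typical 1/sqrt(d_k) scaling
    scores = (Q @ K.T) * scale              # (seq, seq) attention scores
    if causal_mask:
        # Position i may only attend to positions j <= i.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax along the chosen axis.
    scores = scores - scores.max(axis=softmax_axis, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=softmax_axis, keepdims=True)
    return weights @ V                      # (seq, d_v) output

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = fused_attention(Q, K, V, causal_mask=True)
```

With the causal mask enabled, the first position can only attend to itself, so the first output row equals the first row of V exactly.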
Properties
CausalMask
Gets or sets whether to apply a causal mask, so each position can attend only to itself and earlier positions.
public bool CausalMask { get; set; }
Property Value
bool
Scale
Gets or sets the scaling factor (typically 1/sqrt(d_k)).
public double Scale { get; set; }
Property Value
double
SoftmaxAxis
Gets or sets the axis along which softmax is applied (typically the last axis, over the keys).
public int SoftmaxAxis { get; set; }
Property Value
int
Methods
Validate()
Validates inputs (Q, K, V).
public override bool Validate()
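The source does not show the body of Validate(), but a shape-compatibility check for the three inputs would plausibly look like the following sketch. The function name and the exact rules here are hypothetical, not the AiDotNet implementation; they only illustrate the constraints Q @ K^T and weights @ V impose.

```python
def validate_attention_inputs(q_shape, k_shape, v_shape):
    """Hypothetical shape checks for Q, K, V (not the AiDotNet code).

    Assumes Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v).
    """
    if len(q_shape) != 2 or len(k_shape) != 2 or len(v_shape) != 2:
        return False
    if q_shape[1] != k_shape[1]:   # Q @ K^T needs matching d_k
        return False
    if k_shape[0] != v_shape[0]:   # weights @ V needs matching seq_k
        return False
    return True
```

For example, Q (4, 8), K (4, 8), V (4, 16) is valid, while a d_k mismatch between Q and K is not.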