Class FusedAttentionOp

Namespace: AiDotNet.JitCompiler.IR.Operations
Assembly: AiDotNet.dll

Fused attention operation (Q*K^T + softmax + matmul V).

public class FusedAttentionOp : IROp
Inheritance
IROp → FusedAttentionOp

Remarks

For Beginners: The core of Transformer models!

Attention:
scores  = Q @ K^T / sqrt(d_k)
weights = softmax(scores)
output  = weights @ V

Attention is typically the most expensive part of a Transformer. Fusing these steps into a single operation enables optimizations such as Flash Attention, which can yield large speedups.
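The three steps above can be sketched in plain Python. This is an illustrative reference only, not the fused kernel this class compiles to; the `attention` function and its parameter names are hypothetical:

```python
import math

def softmax(row):
    # numerically stable softmax over one row
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def matmul(A, B):
    # naive dense matrix multiply
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def attention(Q, K, V, scale=None, causal=False):
    d_k = len(Q[0])
    if scale is None:
        scale = 1.0 / math.sqrt(d_k)  # the typical 1/sqrt(d_k) default
    # scores = Q @ K^T, scaled
    KT = [list(col) for col in zip(*K)]
    scores = [[s * scale for s in row] for row in matmul(Q, KT)]
    if causal:
        # position i may only attend to positions j <= i
        for i, row in enumerate(scores):
            for j in range(i + 1, len(row)):
                row[j] = float("-inf")
    weights = [softmax(row) for row in scores]
    return matmul(weights, V)
```

With causal masking the first position can attend only to itself, so its output row equals the first row of V.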

Properties

CausalMask

Gets or sets whether to apply causal masking, so each position can attend only to itself and earlier positions.

public bool CausalMask { get; set; }

Property Value

bool
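Causal masking zeroes out (sets to negative infinity before softmax) every score above the diagonal, so position i cannot attend to future positions. The allowed pattern can be sketched as a lower-triangular boolean mask (a hypothetical helper, not part of this class):

```python
def causal_mask(n):
    # True where attention is allowed: key position j <= query position i
    return [[j <= i for j in range(n)] for i in range(n)]
```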

Scale

Gets or sets the scaling factor (typically 1/sqrt(d_k)).

public double Scale { get; set; }

Property Value

double
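For example, with an assumed head dimension of d_k = 64 (a common choice, used here only for illustration), the default scale works out to an exact value:

```python
import math

d_k = 64                       # example head dimension (illustrative)
scale = 1.0 / math.sqrt(d_k)   # 1/8 = 0.125
```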

SoftmaxAxis

Gets or sets the axis along which softmax is applied (typically the last axis of the score matrix, i.e. across keys).

public int SoftmaxAxis { get; set; }

Property Value

int
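In attention, softmax normalizes across the key dimension so each query's weights sum to 1. A sketch of axis-selectable softmax over a 2D list (hypothetical helper names, not this library's API):

```python
import math

def softmax_1d(v):
    # numerically stable softmax over a single vector
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]

def softmax_along(x, axis):
    # axis=1: normalize each row; axis=0: normalize each column
    if axis == 0:
        cols = [softmax_1d(col) for col in zip(*x)]
        return [list(row) for row in zip(*cols)]
    return [softmax_1d(row) for row in x]
```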

Methods

Validate()

Validates that the operation's inputs (Q, K, V) are present and well-formed.

public override bool Validate()

Returns

bool