Class UpperConfidenceBoundExploration<T>

Namespace: AiDotNet.ReinforcementLearning.Policies.Exploration
Assembly: AiDotNet.dll

Upper Confidence Bound (UCB) exploration for discrete action spaces. Balances exploration and exploitation using confidence intervals: UCB(a) = Q(a) + c * √(ln(t) / N(a)), where Q(a) is the estimated value of action a, c is the exploration constant, t is the total number of steps taken, and N(a) is the number of times action a has been selected. A minimal sketch of this score follows the Implements section below.

public class UpperConfidenceBoundExploration<T> : ExplorationStrategyBase<T>, IExplorationStrategy<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
object ← ExplorationStrategyBase<T> ← UpperConfidenceBoundExploration<T>

Implements
IExplorationStrategy<T>
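
The score itself is simple to write out. The following is a minimal illustrative sketch of the formula above, not code from AiDotNet; the helper name UcbScore and the convention of returning positive infinity for untried actions are assumptions made for illustration.

using System;

static class UcbMath
{
    // UCB(a) = Q(a) + c * sqrt(ln(t) / N(a))
    // qValue = Q(a), explorationConstant = c, totalSteps = t, actionCount = N(a).
    public static double UcbScore(double qValue, double explorationConstant, int totalSteps, int actionCount)
    {
        if (actionCount == 0)
        {
            // Untried actions receive an unbounded bonus so each action is
            // sampled at least once (a common UCB convention; assumed here).
            return double.PositiveInfinity;
        }

        return qValue + explorationConstant * Math.Sqrt(Math.Log(totalSteps) / actionCount);
    }
}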

Constructors

UpperConfidenceBoundExploration(double)

Initializes a new instance of the Upper Confidence Bound exploration strategy.

public UpperConfidenceBoundExploration(double explorationConstant = 2)

Parameters

explorationConstant double

Exploration constant 'c' that scales the confidence bonus; larger values encourage more exploration (default: 2.0).
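
For intuition (a worked example, not taken from the library's documentation): after t = 100 total steps, an action tried N(a) = 10 times receives a bonus of c · √(ln(100) / 10) ≈ 0.68 · c, i.e. about 1.36 with the default c = 2.0, added on top of its value estimate Q(a).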

Properties

ExplorationConstant

Gets the current exploration constant.

public double ExplorationConstant { get; }

Property Value

double

TotalSteps

Gets the total number of steps taken.

public int TotalSteps { get; }

Property Value

int

Methods

GetExplorationAction(Vector<T>, Vector<T>, int, Random)

Selects the action with the highest UCB score, Q(a) + c * √(ln(t) / N(a)). A hedged usage sketch follows the Returns section below.

public override Vector<T> GetExplorationAction(Vector<T> state, Vector<T> policyAction, int actionSpaceSize, Random random)

Parameters

state Vector<T>

The current environment state.

policyAction Vector<T>

The action proposed by the underlying policy.

actionSpaceSize int

The number of discrete actions available.

random Random

The random number generator to use.

Returns

Vector<T>
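
A hedged usage sketch is shown below. It assumes Vector<double> lives in AiDotNet.LinearAlgebra and can be constructed from a double[], and that policyAction carries the policy's suggested action values; these are assumptions about AiDotNet's API rather than documented behavior, so adjust to the actual Vector<T> constructors.

using System;
using AiDotNet.LinearAlgebra;                               // assumed namespace for Vector<T>
using AiDotNet.ReinforcementLearning.Policies.Exploration;

class UcbUsageSketch
{
    static void Main()
    {
        var ucb = new UpperConfidenceBoundExploration<double>(explorationConstant: 2.0);
        var random = new Random(42);

        // Building Vector<double> from an array is an assumption about the library's API.
        var state = new Vector<double>(new[] { 0.0, 1.0 });
        var policyAction = new Vector<double>(new[] { 0.1, 0.5, 0.3, 0.0 });

        // Select an action over a 4-action discrete space using the UCB rule.
        Vector<double> chosen = ucb.GetExplorationAction(state, policyAction, 4, random);

        ucb.Update();                                       // count-based bookkeeping; no explicit decay
        Console.WriteLine($"Steps so far: {ucb.TotalSteps}");
    }
}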

Reset()

Resets action counts and total steps.

public override void Reset()

Update()

Updates internal parameters (UCB is count-based, no explicit decay).

public override void Update()