Class UpperConfidenceBoundExploration<T>

Namespace: AiDotNet.ReinforcementLearning.Policies.Exploration

Assembly: AiDotNet.dll

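Upper Confidence Bound (UCB) exploration for discrete action spaces. Balances exploration and exploitation using confidence intervals: UCB(a) = Q(a) + c * √(ln(t) / N(a)), where Q(a) is the estimated value of action a, c is the exploration constant, t is the total number of steps taken, and N(a) is the number of times action a has been selected.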
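As an illustration with made-up numbers (not taken from the library), take c = 2, t = 100, and two actions: a_1 with Q = 0.5 selected 10 times, and a_2 with Q = 0.7 selected 90 times. The scores then work out roughly as follows:

```latex
\begin{aligned}
\mathrm{UCB}(a_1) &= 0.5 + 2\sqrt{\frac{\ln 100}{10}} \approx 0.5 + 1.357 = 1.857 \\
\mathrm{UCB}(a_2) &= 0.7 + 2\sqrt{\frac{\ln 100}{90}} \approx 0.7 + 0.452 = 1.152
\end{aligned}
```

The less-tried action a_1 scores higher despite its lower value estimate, because its confidence term is larger. As N(a) grows, that term shrinks and the value estimate dominates.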

public class UpperConfidenceBoundExploration<T> : ExplorationStrategyBase<T>, IExplorationStrategy<T>

Type Parameters

T: The numeric type used for calculations.

Inheritance: object → ExplorationStrategyBase<T> → UpperConfidenceBoundExploration<T>

Implements: IExplorationStrategy<T>

Inherited Members: ExplorationStrategyBase<T>.NumOps

ExplorationStrategyBase<T>.BoxMullerSample(Random)

ExplorationStrategyBase<T>.ClampAction(Vector<T>, double, double)

ExplorationStrategyBase<T>.ValidateActionSize(int, int, string)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Constructors

UpperConfidenceBoundExploration(double)

Initializes a new instance of the Upper Confidence Bound exploration strategy.

public UpperConfidenceBoundExploration(double explorationConstant = 2)

Parameters

explorationConstant double: Exploration constant 'c' that controls the level of exploration (default: 2.0). Larger values weight the confidence term more heavily and so favor less-tried actions.

Properties

ExplorationConstant

Gets the current exploration constant.

public double ExplorationConstant { get; }

Property Value

double

TotalSteps

Gets the total number of steps taken.

public int TotalSteps { get; }

Property Value

int

Methods

GetExplorationAction(Vector<T>, Vector<T>, int, Random)

Selects the action with the highest UCB score, Q(a) + c * √(ln(t) / N(a)).

public override Vector<T> GetExplorationAction(Vector<T> state, Vector<T> policyAction, int actionSpaceSize, Random random)

Parameters

state Vector<T>
policyAction Vector<T>
actionSpaceSize int
random Random

Returns

Vector<T>
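The selection rule can be sketched in plain C#, independent of AiDotNet. This is a minimal illustration, not the library's implementation: the names `UcbSketch`, `SelectAction`, `qValues`, and `counts` are hypothetical, and returning the first untried action is an assumption made here because the formula is undefined when N(a) = 0.

```csharp
using System;

// Standalone sketch of UCB action selection; not the AiDotNet implementation.
public static class UcbSketch
{
    // Returns the index of the action with the highest UCB score.
    //   qValues[a]  estimated value Q(a) of action a (hypothetical input)
    //   counts[a]   number of times action a has been selected, N(a)
    //   totalSteps  total number of selections so far, t
    //   c           exploration constant
    public static int SelectAction(double[] qValues, int[] counts, int totalSteps, double c)
    {
        if (qValues.Length == 0 || qValues.Length != counts.Length)
            throw new ArgumentException("qValues and counts must be non-empty and equal in length.");

        int best = 0;
        double bestScore = double.NegativeInfinity;

        for (int a = 0; a < qValues.Length; a++)
        {
            // Assumption: an untried action is selected immediately,
            // since dividing by N(a) = 0 would make its bonus unbounded.
            if (counts[a] == 0)
                return a;

            // Confidence term: c * sqrt(ln(t) / N(a)); t is clamped to 1 so ln(t) >= 0.
            double bonus = c * Math.Sqrt(Math.Log(Math.Max(totalSteps, 1)) / counts[a]);
            double score = qValues[a] + bonus;

            if (score > bestScore)
            {
                bestScore = score;
                best = a;
            }
        }

        return best;
    }
}
```

For example, `UcbSketch.SelectAction(new[] { 0.5, 0.7 }, new[] { 10, 90 }, 100, 2.0)` picks index 0, because the rarely tried action receives the larger bonus. With `c` set to 0 the same call reduces to a greedy choice and picks index 1.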

Reset()

Resets action counts and total steps.

public override void Reset()

Update()

Updates internal parameters. UCB is count-based, so there is no explicit decay schedule to advance.

public override void Update()
