Class UpperConfidenceBoundExploration<T>
- Namespace
- AiDotNet.ReinforcementLearning.Policies.Exploration
- Assembly
- AiDotNet.dll
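Upper Confidence Bound (UCB) exploration for discrete action spaces. Balances exploration and exploitation by adding a confidence bonus to each action's value estimate: UCB(a) = Q(a) + c * √(ln(t) / N(a)), where Q(a) is the estimated value of action a, c is the exploration constant, t is the total number of steps taken, and N(a) is the number of times action a has been selected.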
public class UpperConfidenceBoundExploration<T> : ExplorationStrategyBase<T>, IExplorationStrategy<T>
Type Parameters
T: The numeric type used for calculations.
- Inheritance
- ExplorationStrategyBase<T>
- UpperConfidenceBoundExploration<T>
- Implements
- IExplorationStrategy<T>
Constructors
UpperConfidenceBoundExploration(double)
Initializes a new instance of the Upper Confidence Bound exploration strategy.
public UpperConfidenceBoundExploration(double explorationConstant = 2)
Parameters
explorationConstant (double): Exploration constant c that controls the exploration level (default: 2.0).
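To illustrate how the exploration constant scales the confidence bonus, here is a minimal standalone sketch of the score given in the class summary. The `UcbDemo` class and its `Score` helper are hypothetical names used only for illustration; they are not part of AiDotNet, and the library's internal computation may differ.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class UcbDemo
{
    // Computes UCB(a) = Q(a) + c * sqrt(ln(t) / N(a)).
    // Assumes t >= 1 and N(a) >= 1 so the logarithm and division are defined.
    public static double Score(double q, double c, int t, int n)
    {
        if (t < 1) throw new ArgumentOutOfRangeException(nameof(t));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        return q + c * Math.Sqrt(Math.Log(t) / n);
    }
}
```

For Q(a) = 0.5, t = 100, and N(a) = 10, `UcbDemo.Score(0.5, 2.0, 100, 10)` gives about 1.86, while `UcbDemo.Score(0.5, 0.5, 100, 10)` gives about 0.84. A larger constant makes rarely tried actions look more attractive, which favors exploration.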
Properties
ExplorationConstant
Gets the current exploration constant.
public double ExplorationConstant { get; }
Property Value
- double
TotalSteps
Gets the total number of steps taken.
public int TotalSteps { get; }
Property Value
- int
Methods
GetExplorationAction(Vector<T>, Vector<T>, int, Random)
Selects an action using UCB: the action with the highest Q(a) + c * √(ln(t) / N(a)).
public override Vector<T> GetExplorationAction(Vector<T> state, Vector<T> policyAction, int actionSpaceSize, Random random)
Parameters
state (Vector<T>)
policyAction (Vector<T>)
actionSpaceSize (int)
random (Random)
Returns
- Vector<T>
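The selection rule above can be sketched without any library types. The following is a standalone illustration, not the AiDotNet implementation: `UcbSelector`, `Select`, and the plain `double[]` of Q-values are hypothetical, and trying each untried action first (to avoid dividing by N(a) = 0) is an assumption about how the zero-count case might be handled.

```csharp
using System;

// Hypothetical standalone sketch of UCB action selection; not the AiDotNet implementation.
public sealed class UcbSelector
{
    private readonly double _c;      // exploration constant c
    private readonly int[] _counts;  // N(a): times each action has been selected
    private int _totalSteps;         // t: total selections so far

    public UcbSelector(int actionCount, double explorationConstant = 2.0)
    {
        if (actionCount < 1) throw new ArgumentOutOfRangeException(nameof(actionCount));
        _c = explorationConstant;
        _counts = new int[actionCount];
    }

    public int TotalSteps => _totalSteps;

    // Returns the index of the action with the highest UCB score.
    // Assumption: untried actions (N(a) == 0) are selected first, lowest index first.
    public int Select(double[] qValues)
    {
        if (qValues == null || qValues.Length != _counts.Length)
            throw new ArgumentException("Expected one Q-value per action.", nameof(qValues));

        int best = 0;
        double bestScore = double.NegativeInfinity;
        for (int a = 0; a < _counts.Length; a++)
        {
            if (_counts[a] == 0) { best = a; break; }
            double score = qValues[a] + _c * Math.Sqrt(Math.Log(_totalSteps) / _counts[a]);
            if (score > bestScore) { bestScore = score; best = a; }
        }

        _counts[best]++;
        _totalSteps++;
        return best;
    }

    // Clears the action counts and the step counter.
    public void Reset()
    {
        Array.Clear(_counts, 0, _counts.Length);
        _totalSteps = 0;
    }
}
```

With three actions and Q-values `{ 1.0, 0.0, 0.0 }`, the first three calls to `Select` return 0, 1, and 2 because each action is still untried. On the fourth call every action has the same bonus, so the highest Q-value decides and action 0 is returned.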
Reset()
Resets action counts and total steps.
public override void Reset()
Update()
Updates internal parameters. UCB is count-based, so there is no explicit decay schedule.
public override void Update()