Class ThompsonSamplingExploration<T>

Namespace
AiDotNet.ReinforcementLearning.Policies.Exploration
Assembly
AiDotNet.dll

Thompson Sampling (Bayesian) exploration for discrete action spaces. Maintains a Beta distribution for each action and selects actions by sampling from these posteriors.

public class ThompsonSamplingExploration<T> : ExplorationStrategyBase<T>, IExplorationStrategy<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
ExplorationStrategyBase<T>
ThompsonSamplingExploration<T>
Implements
IExplorationStrategy<T>

Constructors

ThompsonSamplingExploration(double, double)

Initializes a new instance of the Thompson Sampling exploration strategy.

public ThompsonSamplingExploration(double priorAlpha = 1, double priorBeta = 1)

Parameters

priorAlpha double

Alpha parameter of the Beta prior used for every action (default: 1.0).

priorBeta double

Beta parameter of the Beta prior used for every action (default: 1.0). With both parameters at 1.0, the prior is uniform on [0, 1].
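To make the effect of the two prior parameters concrete, the following sketch computes the mean and variance of a Beta(alpha, beta) distribution from the standard closed-form expressions. It does not use AiDotNet; the helper names `BetaMean` and `BetaVariance` and the non-default prior values are hypothetical and chosen only for illustration.

```csharp
using System;

// Mean of Beta(alpha, beta): the expected reward implied by the prior.
static double BetaMean(double alpha, double beta) => alpha / (alpha + beta);

// Variance of Beta(alpha, beta): smaller values mean a more confident prior.
static double BetaVariance(double alpha, double beta)
{
    double sum = alpha + beta;
    return alpha * beta / (sum * sum * (sum + 1.0));
}

// Default prior from the constructor signature: Beta(1, 1), uniform on [0, 1].
Console.WriteLine($"default:    mean={BetaMean(1, 1):F3}, var={BetaVariance(1, 1):F4}");

// Hypothetical optimistic prior: expects rewards near 0.8 before any data.
Console.WriteLine($"optimistic: mean={BetaMean(4, 1):F3}, var={BetaVariance(4, 1):F4}");

// Hypothetical strong prior: same mean as the default, but far less variance,
// so more observations are needed to move the posterior.
Console.WriteLine($"strong:     mean={BetaMean(20, 20):F3}, var={BetaVariance(20, 20):F4}");
```

Larger values of priorAlpha + priorBeta act like more pseudo-observations, so a strategy built with a strong prior would be expected to explore for longer before committing to an action.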

Methods

GetExplorationAction(Vector<T>, Vector<T>, int, Random)

Selects an action by drawing one sample from each action's Beta posterior and choosing the action with the highest sample.

public override Vector<T> GetExplorationAction(Vector<T> state, Vector<T> policyAction, int actionSpaceSize, Random random)

Parameters

state Vector<T>
policyAction Vector<T>
actionSpaceSize int
random Random

Returns

Vector<T>
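The selection step can be sketched independently of the library as follows. This is a minimal illustration of Thompson sampling over Beta posteriors, not AiDotNet's implementation: the helper names (`SampleNormal`, `SampleGamma`, `SampleBeta`, `SelectAction`) are hypothetical, the posteriors are held in plain arrays, and the result is a plain action index, whereas the documented method returns a Vector<T> whose encoding is not described on this page. Beta draws are built from two Gamma draws, which in turn use the Marsaglia-Tsang method.

```csharp
using System;

// Standard normal draw via the Box-Muller transform.
static double SampleNormal(Random rng)
{
    double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids log(0)
    double u2 = rng.NextDouble();
    return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
}

// Gamma(shape, 1) draw using the Marsaglia-Tsang method.
static double SampleGamma(double shape, Random rng)
{
    if (shape < 1.0)
    {
        // Boost for small shapes: Gamma(a) = Gamma(a + 1) * U^(1/a).
        double boost = 1.0 - rng.NextDouble();
        return SampleGamma(shape + 1.0, rng) * Math.Pow(boost, 1.0 / shape);
    }
    double d = shape - 1.0 / 3.0;
    double c = 1.0 / Math.Sqrt(9.0 * d);
    while (true)
    {
        double x = SampleNormal(rng);
        double v = 1.0 + c * x;
        if (v <= 0.0) continue;
        v = v * v * v;
        double u = 1.0 - rng.NextDouble();
        if (Math.Log(u) < 0.5 * x * x + d - d * v + d * Math.Log(v)) return d * v;
    }
}

// Beta(alpha, beta) draw as a ratio of two independent Gamma draws.
static double SampleBeta(double alpha, double beta, Random rng)
{
    double g1 = SampleGamma(alpha, rng);
    double g2 = SampleGamma(beta, rng);
    return g1 / (g1 + g2);
}

// Thompson sampling step: one posterior draw per action, keep the largest.
static int SelectAction(double[] alphas, double[] betas, Random rng)
{
    int best = 0;
    double bestSample = double.NegativeInfinity;
    for (int a = 0; a < alphas.Length; a++)
    {
        double sample = SampleBeta(alphas[a], betas[a], rng);
        if (sample > bestSample) { bestSample = sample; best = a; }
    }
    return best;
}

// Example: action 1 has a posterior concentrated near 1, the others near 0.
var random = new Random(42);
double[] posteriorAlphas = { 1.0, 50.0, 1.0 };
double[] posteriorBetas = { 50.0, 1.0, 50.0 };
int[] counts = new int[3];
for (int i = 0; i < 1000; i++) counts[SelectAction(posteriorAlphas, posteriorBetas, random)]++;
Console.WriteLine(string.Join(", ", counts));
```

Because each action is chosen with the probability that its sample is the largest, exploration fades naturally as the posteriors sharpen; no separate exploration rate is needed.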

Reset()

Resets all action distributions to the prior.

public override void Reset()

Update()

Updates internal parameters. Posterior updates are made through UpdateDistribution, which must be called separately for each action.

public override void Update()

UpdateDistribution(int, double)

Updates the Beta distribution for a specific action based on the observed reward.

public void UpdateDistribution(int actionIndex, double reward)

Parameters

actionIndex int

Index of the action that was taken.

reward double

The reward received (should be in [0, 1]).
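One common way to update a Beta posterior from a reward in [0, 1] is to treat the reward as a fractional success, adding it to alpha and adding its complement to beta. The sketch below illustrates that rule under this assumption; this page does not state the exact rule UpdateDistribution applies, so it may differ. The helpers `UpdateBeta` and `NormalizeReward` are hypothetical names, and the min-max rescaling is just one possible way to bring raw rewards into the required range.

```csharp
using System;

// Assumed update rule: a reward in [0, 1] counts as a fractional success.
static (double Alpha, double Beta) UpdateBeta(double alpha, double beta, double reward)
{
    if (double.IsNaN(reward) || reward < 0.0 || reward > 1.0)
        throw new ArgumentOutOfRangeException(nameof(reward), "Reward must lie in [0, 1].");
    return (alpha + reward, beta + (1.0 - reward));
}

// Min-max rescaling for environments whose raw rewards fall in a known [min, max] range.
static double NormalizeReward(double raw, double min, double max)
    => Math.Clamp((raw - min) / (max - min), 0.0, 1.0);

var posterior = (Alpha: 1.0, Beta: 1.0); // uniform Beta(1, 1) prior

posterior = UpdateBeta(posterior.Alpha, posterior.Beta, 1.0); // full success
posterior = UpdateBeta(posterior.Alpha, posterior.Beta, 0.0); // full failure

// A raw reward of 5 on a hypothetical 0..10 scale becomes 0.5.
posterior = UpdateBeta(posterior.Alpha, posterior.Beta, NormalizeReward(5.0, 0.0, 10.0));

Console.WriteLine($"Beta({posterior.Alpha}, {posterior.Beta})");
```

Under this rule, alpha + beta grows by exactly one per observation, so the posterior narrows at the same rate regardless of the reward values.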