Table of Contents

Namespace AiDotNet.ReinforcementLearning.Policies.Exploration

Classes

BoltzmannExploration<T>

Boltzmann (softmax) exploration with temperature-based action selection. The temperature parameter τ controls exploration: higher temperatures make action selection closer to uniform (more random), while lower temperatures concentrate probability on the highest-valued actions. Action probability: P(a) = exp(Q(a)/τ) / Σ_a' exp(Q(a')/τ)
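As an illustration of the formula above, here is a minimal standalone C# sketch of softmax action selection over Q-values. The class, method, and parameter names are assumptions for this example only and do not reflect the actual BoltzmannExploration<T> API.

```csharp
using System;

static class BoltzmannDemo
{
    // Sample an action index with probability proportional to exp(Q(a)/tau).
    public static int Sample(double[] q, double tau, Random rng)
    {
        // Subtract the max Q-value before exponentiating for numerical stability;
        // this does not change the resulting probabilities.
        double max = q[0];
        for (int a = 1; a < q.Length; a++) if (q[a] > max) max = q[a];

        var weights = new double[q.Length];
        double total = 0.0;
        for (int a = 0; a < q.Length; a++)
        {
            weights[a] = Math.Exp((q[a] - max) / tau);
            total += weights[a];
        }

        // Roulette-wheel selection over the softmax weights.
        double r = rng.NextDouble() * total;
        double cumulative = 0.0;
        for (int a = 0; a < q.Length; a++)
        {
            cumulative += weights[a];
            if (r <= cumulative) return a;
        }
        return q.Length - 1;
    }
}
```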

EpsilonGreedyExploration<T>

Epsilon-greedy exploration: with probability epsilon, select a random action; otherwise, select the greedy (highest-value) action.
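A minimal sketch of the epsilon-greedy rule described above; names and signatures here are illustrative assumptions, not the EpsilonGreedyExploration<T> API.

```csharp
using System;

static class EpsilonGreedyDemo
{
    // With probability epsilon pick a uniformly random action;
    // otherwise pick the action with the highest estimated Q-value.
    public static int Select(double[] q, double epsilon, Random rng)
    {
        if (rng.NextDouble() < epsilon)
            return rng.Next(q.Length);

        int best = 0;
        for (int a = 1; a < q.Length; a++)
            if (q[a] > q[best]) best = a;
        return best;
    }
}
```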

ExplorationStrategyBase<T>

Abstract base class for exploration strategy implementations. Provides common functionality for noise generation and action clamping.
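As a rough illustration of the "action clamping" responsibility mentioned above, the following hypothetical helper keeps a perturbed continuous action inside its valid bounds; it is an assumption for this example, not a member of ExplorationStrategyBase<T>.

```csharp
using System;

static class ClampDemo
{
    // Clamp each dimension of a perturbed continuous action to its valid range.
    public static double[] Clamp(double[] action, double[] low, double[] high)
    {
        var result = new double[action.Length];
        for (int i = 0; i < action.Length; i++)
            result[i] = Math.Min(Math.Max(action[i], low[i]), high[i]);
        return result;
    }
}
```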

GaussianNoiseExploration<T>

Gaussian noise exploration for continuous action spaces.
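A standalone sketch of Gaussian-noise exploration, assuming the action is a double array and the noise scale is a fixed standard deviation; this is not the GaussianNoiseExploration<T> implementation.

```csharp
using System;

static class GaussianNoiseDemo
{
    // Add zero-mean Gaussian noise to each dimension of a continuous action.
    // System.Random has no Gaussian sampler, so the Box-Muller transform is used.
    public static double[] Perturb(double[] action, double stdDev, Random rng)
    {
        var noisy = new double[action.Length];
        for (int i = 0; i < action.Length; i++)
        {
            double u1 = 1.0 - rng.NextDouble();
            double u2 = rng.NextDouble();
            double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
            noisy[i] = action[i] + stdDev * z;
        }
        return noisy;
    }
}
```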

NoExploration<T>

No exploration - always use the policy's action directly (greedy).

OrnsteinUhlenbeckNoise<T>

Ornstein-Uhlenbeck process noise for temporally correlated exploration. Commonly used in DDPG and other continuous control algorithms. Process equation: dx = θ(μ - x)dt + σdW
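The sketch below shows one way to discretize the process equation dx = θ(μ - x)dt + σdW with an Euler step for a single noise dimension. The class name and the default parameter values (θ = 0.15, σ = 0.2, as in the original DDPG paper) are assumptions for illustration, not the OrnsteinUhlenbeckNoise<T> defaults.

```csharp
using System;

sealed class OuNoiseSketch
{
    private readonly double _theta, _mu, _sigma, _dt;
    private readonly Random _rng = new Random();
    private double _state;

    public OuNoiseSketch(double theta = 0.15, double mu = 0.0, double sigma = 0.2, double dt = 1e-2)
    {
        _theta = theta; _mu = mu; _sigma = sigma; _dt = dt;
        _state = mu;
    }

    // One Euler step of dx = theta * (mu - x) * dt + sigma * dW,
    // where dW is approximated by sqrt(dt) times a standard normal draw.
    public double Sample()
    {
        double u1 = 1.0 - _rng.NextDouble();
        double u2 = _rng.NextDouble();
        double normal = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);

        _state += _theta * (_mu - _state) * _dt + _sigma * Math.Sqrt(_dt) * normal;
        return _state;
    }
}
```

Because successive samples share the same internal state, the noise is temporally correlated, which is the property that makes it useful for continuous control exploration.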

ThompsonSamplingExploration<T>

Thompson Sampling (Bayesian) exploration for discrete action spaces. Maintains Beta distributions for each action and samples from posteriors.
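The following self-contained sketch shows Thompson Sampling for a Bernoulli bandit: one Beta(α, β) posterior per action, a draw from each posterior, and a greedy choice over the draws. All names, the Bernoulli reward assumption, and the gamma-based Beta sampler are assumptions for this example, not the ThompsonSamplingExploration<T> implementation.

```csharp
using System;

sealed class ThompsonSketch
{
    private readonly double[] _alpha, _beta;   // per-action Beta posterior parameters
    private readonly Random _rng = new Random();

    public ThompsonSketch(int numActions)
    {
        _alpha = new double[numActions];
        _beta = new double[numActions];
        for (int a = 0; a < numActions; a++) { _alpha[a] = 1.0; _beta[a] = 1.0; }
    }

    // Draw one sample from each action's Beta posterior and act greedily on the draws.
    public int SelectAction()
    {
        int best = 0;
        double bestSample = double.NegativeInfinity;
        for (int a = 0; a < _alpha.Length; a++)
        {
            double sample = SampleBeta(_alpha[a], _beta[a]);
            if (sample > bestSample) { bestSample = sample; best = a; }
        }
        return best;
    }

    // Update the chosen action's posterior with a binary reward.
    public void Update(int action, bool success)
    {
        if (success) _alpha[action] += 1.0; else _beta[action] += 1.0;
    }

    // Beta(a, b) sampled as X / (X + Y) with X ~ Gamma(a), Y ~ Gamma(b).
    private double SampleBeta(double a, double b)
    {
        double x = SampleGamma(a);
        double y = SampleGamma(b);
        return x / (x + y);
    }

    // Marsaglia-Tsang gamma sampler (shape < 1 handled by boosting).
    private double SampleGamma(double shape)
    {
        if (shape < 1.0)
            return SampleGamma(shape + 1.0) * Math.Pow(_rng.NextDouble(), 1.0 / shape);

        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.Sqrt(9.0 * d);
        while (true)
        {
            double z = StandardNormal();
            double v = Math.Pow(1.0 + c * z, 3);
            if (v <= 0) continue;
            double u = _rng.NextDouble();
            if (Math.Log(u) < 0.5 * z * z + d - d * v + d * Math.Log(v))
                return d * v;
        }
    }

    private double StandardNormal()
    {
        double u1 = 1.0 - _rng.NextDouble();
        double u2 = _rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }
}
```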

UpperConfidenceBoundExploration<T>

Upper Confidence Bound (UCB) exploration for discrete action spaces. Balances exploration and exploitation using confidence intervals: UCB(a) = Q(a) + c * √(ln(t) / N(a))
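A minimal sketch of the UCB rule above, where t is the total step count and N(a) the number of times action a has been taken; names and parameters are illustrative assumptions, not the UpperConfidenceBoundExploration<T> API.

```csharp
using System;

static class UcbDemo
{
    // UCB(a) = Q(a) + c * sqrt(ln(t) / N(a)); untried actions are chosen first.
    public static int Select(double[] q, int[] counts, int totalSteps, double c)
    {
        int best = 0;
        double bestScore = double.NegativeInfinity;
        for (int a = 0; a < q.Length; a++)
        {
            if (counts[a] == 0) return a;   // ensure every action is tried at least once
            double bonus = c * Math.Sqrt(Math.Log(totalSteps) / counts[a]);
            if (q[a] + bonus > bestScore)
            {
                bestScore = q[a] + bonus;
                best = a;
            }
        }
        return best;
    }
}
```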

Interfaces

IExplorationStrategy<T>

Interface for exploration strategies used by policies.
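For orientation only, a hypothetical minimal shape for such an interface is sketched below; the actual members of IExplorationStrategy<T> may differ, so the name and signatures here are assumptions, not the library's contract.

```csharp
// Hypothetical sketch of an exploration-strategy contract; not the real
// AiDotNet IExplorationStrategy<T> definition.
public interface IExplorationStrategySketch<T>
{
    // Given the policy's preferred action, return the (possibly perturbed) action to execute.
    T[] ApplyExploration(T[] policyAction);

    // Advance any internal schedule, e.g. epsilon decay or noise-state reset.
    void Step();
}
```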