Namespace AiDotNet.ReinforcementLearning.Policies.Exploration
Classes
- BoltzmannExploration<T>
Boltzmann (softmax) exploration with temperature-based action selection. The temperature τ controls exploration: higher temperature gives a more uniform (more random) distribution over actions, while lower temperature approaches greedy selection. Action probability: P(a) = exp(Q(a)/τ) / Σ_a' exp(Q(a')/τ). See the softmax-selection sketch after the class list.
- EpsilonGreedyExploration<T>
Epsilon-greedy exploration: with probability ε, select a uniformly random action; otherwise, select the greedy (highest-value) action. See the sketch after the class list.
- ExplorationStrategyBase<T>
Abstract base class for exploration strategy implementations. Provides common functionality for noise generation and action clamping.
- GaussianNoiseExploration<T>
Gaussian noise exploration for continuous action spaces.
- NoExploration<T>
No exploration - always use the policy's action directly (greedy).
- OrnsteinUhlenbeckNoise<T>
Ornstein-Uhlenbeck process noise for temporally correlated exploration. Commonly used in DDPG and other continuous control algorithms. Process equation: dx = θ(μ - x)dt + σdW. See the discretized update sketch after the class list.
- ThompsonSamplingExploration<T>
Thompson Sampling (Bayesian) exploration for discrete action spaces. Maintains a Beta posterior for each action and samples from the posteriors to choose actions. See the sketch after the class list.
- UpperConfidenceBoundExploration<T>
Upper Confidence Bound (UCB) exploration for discrete action spaces. Balances exploration and exploitation using confidence intervals: UCB(a) = Q(a) + c * √(ln(t) / N(a)), where t is the total number of steps and N(a) is the number of times action a has been selected. See the sketch after the class list.
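The sketches below are standalone illustrations of the strategies listed above, not the AiDotNet implementations; all type, method, and parameter names in them (e.g. BoltzmannSketch, SampleAction, qValues) are assumptions made for demonstration. First, temperature-scaled softmax action selection over an array of Q-values:

```csharp
using System;
using System.Linq;

static class BoltzmannSketch
{
    // Samples an action index with probability proportional to exp(Q(a)/tau).
    // Subtracting the max Q-value before exponentiating keeps the computation
    // numerically stable without changing the resulting distribution.
    public static int SampleAction(double[] qValues, double temperature, Random rng)
    {
        double max = qValues.Max();
        double[] weights = qValues.Select(q => Math.Exp((q - max) / temperature)).ToArray();
        double sum = weights.Sum();

        double u = rng.NextDouble() * sum;
        double cumulative = 0.0;
        for (int a = 0; a < weights.Length; a++)
        {
            cumulative += weights[a];
            if (u <= cumulative) return a;
        }
        return weights.Length - 1; // guard against floating-point round-off
    }
}
```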
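A minimal epsilon-greedy selector under the same assumptions (plain double[] Q-values, hypothetical names):

```csharp
using System;

static class EpsilonGreedySketch
{
    // With probability epsilon pick a uniformly random action,
    // otherwise pick the action with the highest estimated value.
    public static int SelectAction(double[] qValues, double epsilon, Random rng)
    {
        if (rng.NextDouble() < epsilon)
            return rng.Next(qValues.Length);

        int best = 0;
        for (int a = 1; a < qValues.Length; a++)
            if (qValues[a] > qValues[best]) best = a;
        return best;
    }
}
```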
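A sketch of the Ornstein-Uhlenbeck noise process using a Euler-Maruyama discretization of dx = θ(μ - x)dt + σdW; the default coefficients (θ = 0.15, σ = 0.2, dt = 0.01) are common DDPG choices, not values taken from AiDotNet:

```csharp
using System;

// Euler-Maruyama discretization of dx = theta*(mu - x)*dt + sigma*dW,
// producing one temporally correlated noise value per action dimension.
class OrnsteinUhlenbeckSketch
{
    private readonly double _theta, _mu, _sigma, _dt;
    private readonly double[] _state;
    private readonly Random _rng = new Random();

    public OrnsteinUhlenbeckSketch(int dimensions, double theta = 0.15,
                                   double mu = 0.0, double sigma = 0.2, double dt = 0.01)
    {
        _theta = theta; _mu = mu; _sigma = sigma; _dt = dt;
        _state = new double[dimensions];
    }

    public double[] Sample()
    {
        for (int i = 0; i < _state.Length; i++)
        {
            // Mean-reverting drift toward mu plus scaled Gaussian increment.
            _state[i] += _theta * (_mu - _state[i]) * _dt
                       + _sigma * Math.Sqrt(_dt) * NextGaussian();
        }
        return (double[])_state.Clone();
    }

    // Box-Muller transform: turns two uniform draws into one standard normal draw.
    private double NextGaussian()
    {
        double u1 = 1.0 - _rng.NextDouble();
        double u2 = _rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }
}
```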
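A Thompson Sampling sketch assuming Bernoulli (success/failure) rewards, which is what Beta posteriors model directly; the reward model and all names here are illustrative assumptions:

```csharp
using System;

// Thompson Sampling for Bernoulli rewards: keep Beta(successes+1, failures+1)
// per action, draw one sample from each posterior, and act greedily on the samples.
class ThompsonSamplingSketch
{
    private readonly int[] _successes;
    private readonly int[] _failures;
    private readonly Random _rng = new Random();

    public ThompsonSamplingSketch(int actionCount)
    {
        _successes = new int[actionCount];
        _failures = new int[actionCount];
    }

    public int SelectAction()
    {
        double best = double.NegativeInfinity;
        int bestAction = 0;
        for (int a = 0; a < _successes.Length; a++)
        {
            double sample = SampleBeta(_successes[a] + 1, _failures[a] + 1);
            if (sample > best) { best = sample; bestAction = a; }
        }
        return bestAction;
    }

    public void Update(int action, bool success)
    {
        if (success) _successes[action]++; else _failures[action]++;
    }

    // Beta(a, b) with integer shape parameters: Gamma(a,1) / (Gamma(a,1) + Gamma(b,1)),
    // where Gamma(n,1) is a sum of n Exp(1) draws.
    private double SampleBeta(int a, int b)
    {
        double ga = SampleGammaInteger(a);
        double gb = SampleGammaInteger(b);
        return ga / (ga + gb);
    }

    private double SampleGammaInteger(int shape)
    {
        double sum = 0.0;
        for (int i = 0; i < shape; i++)
            sum += -Math.Log(1.0 - _rng.NextDouble());
        return sum;
    }
}
```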
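A UCB1-style selector implementing UCB(a) = Q(a) + c * √(ln(t) / N(a)); pulling each untried action once before applying the formula is a common convention and an assumption here, not necessarily the library's behavior:

```csharp
using System;

static class UcbSketch
{
    // qValues: current value estimates; counts: N(a); totalSteps: t; c: exploration weight.
    public static int SelectAction(double[] qValues, int[] counts, int totalSteps, double c)
    {
        int bestAction = 0;
        double bestScore = double.NegativeInfinity;
        for (int a = 0; a < qValues.Length; a++)
        {
            if (counts[a] == 0) return a; // force one pull of each untried action

            double bonus = c * Math.Sqrt(Math.Log(totalSteps) / counts[a]);
            double score = qValues[a] + bonus;
            if (score > bestScore) { bestScore = score; bestAction = a; }
        }
        return bestAction;
    }
}
```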
Interfaces
- IExplorationStrategy<T>
Interface for exploration strategies used by policies.