Class RLTrainingOptions<T>
Namespace: AiDotNet.Configuration
Assembly: AiDotNet.dll
Configuration options for reinforcement learning training loops via AiModelBuilder.
public class RLTrainingOptions<T>
Type Parameters
T: The numeric type used for calculations (e.g., float, double).
Inheritance
object → RLTrainingOptions<T>
Remarks
This class provides comprehensive configuration for RL training loops, following industry-standard patterns from libraries like Stable-Baselines3, RLlib, and CleanRL.
Note: This class is for configuring the training loop (episodes, steps, callbacks). For agent-specific options (learning rate, discount factor), see each agent's options class.
For Beginners: Reinforcement learning trains an agent through trial and error in an environment. This options class lets you customize every aspect of that training process:
- How many episodes to run
- How to explore vs. exploit
- How to store and sample experiences
- When to receive progress updates
Quick Start Example:
var options = new RLTrainingOptions<double>
{
    Environment = new CartPoleEnvironment<double>(),
    Episodes = 1000,
    MaxStepsPerEpisode = 500
};

var result = await new AiModelBuilder<double, Vector<double>, Vector<double>>()
    .ConfigureReinforcementLearning(options)
    .ConfigureModel(new DQNAgent<double>(agentOptions)) // agentOptions: the agent's own options class (learning rate, discount factor, etc.)
    .BuildAsync();
Properties
BatchSize
Gets or sets the batch size for sampling from the replay buffer.
public int BatchSize { get; set; }
Property Value
- int
Remarks
For Beginners: When learning, the agent samples a batch of past experiences. Larger batches give more stable gradients but use more memory. Default: 64 experiences per batch.
CheckpointConfig
Gets or sets the checkpoint configuration for saving models during training.
public RLCheckpointConfig? CheckpointConfig { get; set; }
Property Value
- RLCheckpointConfig
Remarks
For Beginners: Checkpointing saves your model periodically during training. This protects against crashes and lets you resume training later.
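Example (illustrative sketch): the member names below (SaveFrequency, Directory) are assumptions about RLCheckpointConfig, not confirmed API; they are shown only to suggest how checkpointing plugs into the options.
options.CheckpointConfig = new RLCheckpointConfig
{
    // Assumed members, for illustration only:
    SaveFrequency = 100,        // save a checkpoint every 100 episodes
    Directory = "./checkpoints" // where checkpoint files are written
};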
EarlyStoppingConfig
Gets or sets the early stopping configuration.
public RLEarlyStoppingConfig<T>? EarlyStoppingConfig { get; set; }
Property Value
- RLEarlyStoppingConfig<T>
Remarks
For Beginners: Early stopping automatically stops training when the agent stops improving, preventing overfitting and saving time.
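Example (illustrative sketch): Patience and MinImprovement are assumed member names for RLEarlyStoppingConfig<T>, shown only to illustrate the idea of stopping after a stretch of episodes with no meaningful reward improvement.
options.EarlyStoppingConfig = new RLEarlyStoppingConfig<double>
{
    // Assumed members, for illustration only:
    Patience = 50,        // stop if there is no improvement for 50 consecutive checks
    MinImprovement = 0.01 // smallest reward gain that counts as improvement
};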
Environment
Gets or sets the environment for the agent to interact with.
public IEnvironment<T>? Environment { get; set; }
Property Value
- IEnvironment<T>
Remarks
For Beginners: The environment is the "world" where your agent learns. It could be a game, simulation, or any system with states, actions, and rewards.
Episodes
Gets or sets the number of episodes to train for.
public int Episodes { get; set; }
Property Value
- int
Remarks
For Beginners: An episode is one complete run through the environment from start to finish (or until max steps). More episodes generally means better learning. Default: 1000 episodes.
EvaluationConfig
Gets or sets the evaluation configuration for assessing agent performance during training.
public RLEvaluationConfig? EvaluationConfig { get; set; }
Property Value
- RLEvaluationConfig
Remarks
For Beginners: Evaluation runs the agent without exploration to measure true performance. This gives you an unbiased estimate of how well the agent learned.
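Example (illustrative sketch): EvaluationFrequency and EvaluationEpisodes are assumed member names for RLEvaluationConfig; the point is that evaluation typically runs a handful of greedy (no-exploration) episodes at a regular interval and averages the rewards.
options.EvaluationConfig = new RLEvaluationConfig
{
    // Assumed members, for illustration only:
    EvaluationFrequency = 100, // evaluate every 100 training episodes
    EvaluationEpisodes = 10    // average the reward over 10 greedy episodes
};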
ExplorationSchedule
Gets or sets the exploration schedule configuration.
public ExplorationScheduleConfig<T>? ExplorationSchedule { get; set; }
Property Value
- ExplorationScheduleConfig<T>
Remarks
For Beginners: This controls how exploration (trying random actions) decreases over time as the agent learns. Common schedule: start at 1.0 (fully random), decay to 0.01 (mostly learned policy).
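Example (illustrative sketch): the values mirror the common schedule described above (start at 1.0, decay to 0.01), but InitialValue, FinalValue, and DecaySteps are assumed member names for ExplorationScheduleConfig<T>.
options.ExplorationSchedule = new ExplorationScheduleConfig<double>
{
    // Assumed members, for illustration only:
    InitialValue = 1.0, // fully random at the start of training
    FinalValue = 0.01,  // mostly follow the learned policy at the end
    DecaySteps = 10000  // number of steps over which the value decays
};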
ExplorationStrategy
Gets or sets the optional exploration strategy to use during training.
public IExplorationStrategy<T>? ExplorationStrategy { get; set; }
Property Value
- IExplorationStrategy<T>
Remarks
For Beginners: Exploration strategies help the agent try new things instead of always doing what it thinks is best. Common strategies:
- EpsilonGreedy: Random action with probability epsilon
- Boltzmann: Softmax over Q-values
- GaussianNoise: Add noise to continuous actions
If null, the agent's default exploration is used.
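Example (illustrative sketch): the class name EpsilonGreedyExploration<T> and its constructor argument are assumptions used to show where a custom strategy plugs in; check the exploration strategy implementations shipped with the library for the actual types.
options.ExplorationStrategy = new EpsilonGreedyExploration<double>(epsilon: 0.1); // assumed type and constructor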
GradientSteps
Gets or sets the number of gradient steps per training update.
public int GradientSteps { get; set; }
Property Value
- int
Remarks
For Beginners: Each training update can perform multiple gradient descent steps. More gradient steps can speed up learning but may cause instability. Default: 1 gradient step per update.
LogFrequency
Gets or sets how often to log progress (every N episodes).
public int LogFrequency { get; set; }
Property Value
- int
Remarks
For Beginners: Set to 0 to disable automatic console logging. Set to 10 to log every 10 episodes, etc. Default: 10 (log every 10 episodes).
MaxStepsPerEpisode
Gets or sets the maximum steps per episode to prevent infinite loops.
public int MaxStepsPerEpisode { get; set; }
Property Value
- int
Remarks
For Beginners: Some environments might never end naturally. This limit ensures episodes don't run forever. Default: 500 steps per episode.
NormalizeObservations
Gets or sets whether to normalize observations.
public bool NormalizeObservations { get; set; }
Property Value
- bool
Remarks
For Beginners: Normalizing observations (scaling them to similar ranges) often helps neural networks learn faster and more stably. Default: false.
NormalizeRewards
Gets or sets whether to normalize rewards.
public bool NormalizeRewards { get; set; }
Property Value
- bool
Remarks
For Beginners: Normalizing rewards can help when reward scales vary widely during training. Default: false.
OnEpisodeComplete
Gets or sets the callback invoked after each episode completes.
public Action<RLEpisodeMetrics<T>>? OnEpisodeComplete { get; set; }
Property Value
- Action<RLEpisodeMetrics<T>>
Remarks
For Beginners: This callback lets you monitor training progress. It receives detailed metrics about the completed episode.
Example:
options.OnEpisodeComplete = (metrics) =>
{
    Console.WriteLine($"Episode {metrics.Episode}: Reward = {metrics.TotalReward}");
};
OnStepComplete
Gets or sets the callback invoked after each training step.
public Action<RLStepMetrics<T>>? OnStepComplete { get; set; }
Property Value
- Action<RLStepMetrics<T>>
Remarks
For Beginners: This callback fires more frequently (every step or training update). Useful for detailed logging or progress bars.
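Example (illustrative sketch): the metric names Step and Loss are assumptions about RLStepMetrics<T>; the callback shape itself matches the property signature above.
options.OnStepComplete = (metrics) =>
{
    // Step and Loss are assumed property names, for illustration only:
    Console.WriteLine($"Step {metrics.Step}: Loss = {metrics.Loss}");
};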
OnTrainingComplete
Gets or sets the callback invoked when training ends.
public Action<RLTrainingSummary<T>>? OnTrainingComplete { get; set; }
Property Value
- Action<RLTrainingSummary<T>>
Remarks
Receives the final training summary with aggregated metrics.
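Example (illustrative sketch): TotalEpisodes and AverageReward are assumed property names on RLTrainingSummary<T>, used only to show how the summary callback is typically consumed.
options.OnTrainingComplete = (summary) =>
{
    // TotalEpisodes and AverageReward are assumed property names, for illustration only:
    Console.WriteLine($"Trained for {summary.TotalEpisodes} episodes, average reward {summary.AverageReward}");
};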
OnTrainingStart
Gets or sets the callback invoked when training starts.
public Action? OnTrainingStart { get; set; }
Property Value
- Action
PrioritizedReplayConfig
Gets or sets the prioritized replay configuration.
public PrioritizedReplayConfig<T>? PrioritizedReplayConfig { get; set; }
Property Value
- PrioritizedReplayConfig<T>
Remarks
Only used if UsePrioritizedReplay is true.
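Example (illustrative sketch): Alpha and Beta are the standard prioritized-replay exponents, but as member names of PrioritizedReplayConfig<T> they are assumptions.
options.UsePrioritizedReplay = true;
options.PrioritizedReplayConfig = new PrioritizedReplayConfig<double>
{
    // Assumed members, for illustration only:
    Alpha = 0.6, // how strongly priorities skew sampling (0 = uniform)
    Beta = 0.4   // importance-sampling correction, usually annealed toward 1
};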
ReplayBuffer
Gets or sets the optional replay buffer for experience storage.
public IReplayBuffer<T, Vector<T>, Vector<T>>? ReplayBuffer { get; set; }
Property Value
- IReplayBuffer<T, Vector<T>, Vector<T>>
Remarks
For Beginners: A replay buffer stores past experiences for learning. If null, the agent's internal buffer is used. You can provide:
- UniformReplayBuffer: All experiences equally likely
- PrioritizedReplayBuffer: Important experiences sampled more often
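Example (illustrative sketch): UniformReplayBuffer is named above, but its generic arity and the capacity parameter shown here are assumptions; treat this as the shape of the call rather than the exact signature.
options.ReplayBuffer = new UniformReplayBuffer<double, Vector<double>, Vector<double>>(capacity: 100000); // assumed constructor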
RewardClipping
Gets or sets the reward clipping bounds.
public RewardClippingConfig<T>? RewardClipping { get; set; }
Property Value
- RewardClippingConfig<T>
Remarks
For Beginners: Clipping rewards to a range (e.g., -1 to 1) can stabilize training when raw rewards have very different scales; this technique was famously used in the Atari DQN paper. If null, rewards are not clipped.
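Example (illustrative sketch): the -1 to 1 range comes from the remark above; Min and Max are assumed member names for RewardClippingConfig<T>.
options.RewardClipping = new RewardClippingConfig<double>
{
    // Assumed members, for illustration only:
    Min = -1.0,
    Max = 1.0
};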
Seed
Gets or sets the random seed for reproducibility.
public int? Seed { get; set; }
Property Value
- int?
Remarks
For Beginners: Setting a seed makes training reproducible - you'll get the same results if you run training again with the same seed. If null, results will vary between runs.
TargetNetworkConfig
Gets or sets the target network configuration for DQN-family algorithms.
public TargetNetworkConfig<T>? TargetNetworkConfig { get; set; }
Property Value
- TargetNetworkConfig<T>
Remarks
For Beginners: Target networks help stabilize learning in DQN-based algorithms by providing stable Q-value targets. This prevents the "moving target" problem.
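Example (illustrative sketch): UpdateFrequency is an assumed member name for TargetNetworkConfig<T>; the idea is simply that the target network is refreshed on a fixed interval instead of every step.
options.TargetNetworkConfig = new TargetNetworkConfig<double>
{
    // Assumed member, for illustration only:
    UpdateFrequency = 1000 // copy the online network into the target network every 1000 steps
};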
TrainFrequency
Gets or sets the frequency of training updates (every N steps).
public int TrainFrequency { get; set; }
Property Value
- int
Remarks
For Beginners: The agent doesn't have to learn after every single step. Training every N steps can be more efficient. Set to 1 for training every step. Default: 1 (train every step).
UsePrioritizedReplay
Gets or sets whether to use prioritized experience replay.
public bool UsePrioritizedReplay { get; set; }
Property Value
- bool
Remarks
For Beginners: Prioritized replay samples important experiences more often. "Important" usually means experiences with high TD-error (surprising outcomes). This can speed up learning but adds computational overhead. Default: false (uniform sampling).
WarmupSteps
Gets or sets the number of initial random steps before training begins.
public int WarmupSteps { get; set; }
Property Value
- int
Remarks
For Beginners: Before the agent starts learning, it's helpful to fill the replay buffer with some random experiences. This provides diverse starting data. Default: 1000 warmup steps.
Methods
Default(IEnvironment<T>)
Creates default options with sensible values for most use cases.
public static RLTrainingOptions<T> Default(IEnvironment<T> environment)
Parameters
environment (IEnvironment<T>): The environment to train in.
Returns
- RLTrainingOptions<T>
Options with recommended defaults.
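Example: this uses only members documented on this page (the static Default method, the CartPoleEnvironment<double> from the Quick Start, and the Episodes property).
var options = RLTrainingOptions<double>.Default(new CartPoleEnvironment<double>());
options.Episodes = 2000; // start from the defaults, then override what you need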