Class RLTrainingOptions<T>

Namespace
AiDotNet.Configuration
Assembly
AiDotNet.dll

Configuration options for reinforcement learning training loops via AiModelBuilder.

public class RLTrainingOptions<T>

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

Inheritance
object
RLTrainingOptions<T>

Remarks

This class provides comprehensive configuration for RL training loops, following industry-standard patterns from libraries like Stable-Baselines3, RLlib, and CleanRL.

Note: This class is for configuring the training loop (episodes, steps, callbacks). For agent-specific options (learning rate, discount factor), see each agent's options class.

For Beginners: Reinforcement learning trains an agent through trial and error in an environment. This options class lets you customize every aspect of that training process:

  • How many episodes to run
  • How to explore vs exploit
  • How to store and sample experiences
  • When to receive progress updates

Quick Start Example:

var options = new RLTrainingOptions<double>
{
    Environment = new CartPoleEnvironment<double>(),
    Episodes = 1000,
    MaxStepsPerEpisode = 500
};
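// 'agentOptions' is the agent's own options object; agent-specific settings
// (learning rate, discount factor) live there, not on RLTrainingOptions<T>.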

var result = await new AiModelBuilder<double, Vector<double>, Vector<double>>()
    .ConfigureReinforcementLearning(options)
    .ConfigureModel(new DQNAgent<double>(agentOptions))
    .BuildAsync();

Properties

BatchSize

Gets or sets the batch size for sampling from the replay buffer.

public int BatchSize { get; set; }

Property Value

int

Remarks

For Beginners: When learning, the agent samples a batch of past experiences. Larger batches give more stable gradients but use more memory. Default: 64 experiences per batch.
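
For example, the sampling and update cadence can be tuned together. A minimal sketch using only properties documented on this page (the values are illustrative, not recommendations):

var options = new RLTrainingOptions<double>
{
    Environment = new CartPoleEnvironment<double>(),
    Episodes = 1000,
    MaxStepsPerEpisode = 500,
    BatchSize = 64,       // experiences sampled per update
    WarmupSteps = 1000,   // random steps collected before learning starts
    TrainFrequency = 4,   // run a training update every 4 environment steps
    GradientSteps = 1     // gradient descent steps per training update
};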

CheckpointConfig

Gets or sets the checkpoint configuration for saving models during training.

public RLCheckpointConfig? CheckpointConfig { get; set; }

Property Value

RLCheckpointConfig

Remarks

For Beginners: Checkpointing saves your model periodically during training. This protects against crashes and lets you resume training later.

EarlyStoppingConfig

Gets or sets the early stopping configuration.

public RLEarlyStoppingConfig<T>? EarlyStoppingConfig { get; set; }

Property Value

RLEarlyStoppingConfig<T>

Remarks

For Beginners: Early stopping automatically stops training when the agent stops improving, preventing overfitting and saving time.

Environment

Gets or sets the environment for the agent to interact with.

public IEnvironment<T>? Environment { get; set; }

Property Value

IEnvironment<T>

Remarks

For Beginners: The environment is the "world" where your agent learns. It could be a game, simulation, or any system with states, actions, and rewards.

Episodes

Gets or sets the number of episodes to train for.

public int Episodes { get; set; }

Property Value

int

Remarks

For Beginners: An episode is one complete run through the environment from start to finish (or until max steps). Training for more episodes generally leads to better learning. Default: 1000 episodes.

EvaluationConfig

Gets or sets the evaluation configuration for assessing agent performance during training.

public RLEvaluationConfig? EvaluationConfig { get; set; }

Property Value

RLEvaluationConfig

Remarks

For Beginners: Evaluation runs the agent without exploration to measure true performance. This gives you an unbiased estimate of how well the agent learned.

ExplorationSchedule

Gets or sets the exploration schedule configuration.

public ExplorationScheduleConfig<T>? ExplorationSchedule { get; set; }

Property Value

ExplorationScheduleConfig<T>

Remarks

For Beginners: This controls how exploration (trying random actions) decreases over time as the agent learns. Common schedule: start at 1.0 (fully random), decay to 0.01 (mostly learned policy).
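
A sketch of the common decay schedule described above; the property names on ExplorationScheduleConfig<T> (InitialValue, FinalValue, DecaySteps) are assumptions for illustration, not the library's confirmed API:

// Assumed property names -- verify against ExplorationScheduleConfig<T>.
options.ExplorationSchedule = new ExplorationScheduleConfig<double>
{
    InitialValue = 1.0,   // start fully random
    FinalValue = 0.01,    // end mostly greedy
    DecaySteps = 50_000   // steps over which exploration decays
};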

ExplorationStrategy

Gets or sets the optional exploration strategy to use during training.

public IExplorationStrategy<T>? ExplorationStrategy { get; set; }

Property Value

IExplorationStrategy<T>

Remarks

For Beginners: Exploration strategies help the agent try new things instead of always doing what it thinks is best. Common strategies:

  • EpsilonGreedy: Random action with probability epsilon
  • Boltzmann: Softmax over Q-values
  • GaussianNoise: Add noise to continuous actions

If null, the agent's default exploration is used.
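
A sketch of assigning a strategy; the EpsilonGreedyStrategy type name and its constructor argument are assumptions for illustration (check the library for its concrete strategy classes):

// Hypothetical type and constructor -- verify against the AiDotNet exploration strategies.
options.ExplorationStrategy = new EpsilonGreedyStrategy<double>(epsilon: 0.1);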

GradientSteps

Gets or sets the number of gradient steps per training update.

public int GradientSteps { get; set; }

Property Value

int

Remarks

For Beginners: Each training update can perform multiple gradient descent steps. More gradient steps can speed up learning but may cause instability. Default: 1 gradient step per update.

LogFrequency

Gets or sets how often to log progress (every N episodes).

public int LogFrequency { get; set; }

Property Value

int

Remarks

For Beginners: Set to 0 to disable automatic console logging. Set to 10 to log every 10 episodes, etc. Default: 10 (log every 10 episodes).

MaxStepsPerEpisode

Gets or sets the maximum steps per episode to prevent infinite loops.

public int MaxStepsPerEpisode { get; set; }

Property Value

int

Remarks

For Beginners: Some environments might never end naturally. This limit ensures episodes don't run forever. Default: 500 steps per episode.

NormalizeObservations

Gets or sets whether to normalize observations.

public bool NormalizeObservations { get; set; }

Property Value

bool

Remarks

For Beginners: Normalizing observations (scaling them to similar ranges) often helps neural networks learn faster and more stably. Default: false.

NormalizeRewards

Gets or sets whether to normalize rewards.

public bool NormalizeRewards { get; set; }

Property Value

bool

Remarks

For Beginners: Normalizing rewards can help when reward scales vary widely during training. Default: false.

OnEpisodeComplete

Gets or sets the callback invoked after each episode completes.

public Action<RLEpisodeMetrics<T>>? OnEpisodeComplete { get; set; }

Property Value

Action<RLEpisodeMetrics<T>>

Remarks

For Beginners: This callback lets you monitor training progress. It receives detailed metrics about the completed episode.

Example:

options.OnEpisodeComplete = (metrics) =>
{
    Console.WriteLine($"Episode {metrics.Episode}: Reward = {metrics.TotalReward}");
};

OnStepComplete

Gets or sets the callback invoked after each training step.

public Action<RLStepMetrics<T>>? OnStepComplete { get; set; }

Property Value

Action<RLStepMetrics<T>>

Remarks

For Beginners: This callback fires more frequently (every step or training update). Useful for detailed logging or progress bars.
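
For example, a simple step counter that prints periodic progress (the lambda below does not assume any particular members on RLStepMetrics<T>):

long steps = 0;
options.OnStepComplete = metrics =>
{
    steps++;
    if (steps % 10_000 == 0)
        Console.WriteLine($"Completed {steps} steps so far.");
};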

OnTrainingComplete

Gets or sets the callback invoked when training ends.

public Action<RLTrainingSummary<T>>? OnTrainingComplete { get; set; }

Property Value

Action<RLTrainingSummary<T>>

Remarks

Receives the final training summary with aggregated metrics.
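
For example (the summary's members are documented on RLTrainingSummary<T>, so the lambda below only signals completion):

options.OnTrainingComplete = summary =>
{
    Console.WriteLine("Training finished.");
};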

OnTrainingStart

Gets or sets the callback invoked when training starts.

public Action? OnTrainingStart { get; set; }

Property Value

Action

PrioritizedReplayConfig

Gets or sets the prioritized replay configuration.

public PrioritizedReplayConfig<T>? PrioritizedReplayConfig { get; set; }

Property Value

PrioritizedReplayConfig<T>

Remarks

Only used if UsePrioritizedReplay is true.

ReplayBuffer

Gets or sets the optional replay buffer for experience storage.

public IReplayBuffer<T, Vector<T>, Vector<T>>? ReplayBuffer { get; set; }

Property Value

IReplayBuffer<T, Vector<T>, Vector<T>>

Remarks

For Beginners: A replay buffer stores past experiences for learning. If null, the agent's internal buffer is used. You can provide:

  • UniformReplayBuffer: All experiences equally likely
  • PrioritizedReplayBuffer: Important experiences sampled more often
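
A sketch of supplying a buffer explicitly; the capacity constructor argument is an assumption for illustration:

// Hypothetical constructor argument -- verify UniformReplayBuffer's actual signature.
options.ReplayBuffer = new UniformReplayBuffer<double, Vector<double>, Vector<double>>(capacity: 100_000);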

RewardClipping

Gets or sets the reward clipping bounds.

public RewardClippingConfig<T>? RewardClipping { get; set; }

Property Value

RewardClippingConfig<T>

Remarks

For Beginners: Clipping rewards to a range (e.g., -1 to 1) can stabilize training when raw rewards have very different scales. This technique was famously used in the Atari DQN paper. If null, rewards are not clipped.
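
A sketch of clipping rewards to [-1, 1]; the Min and Max property names on RewardClippingConfig<T> are assumptions for illustration:

// Assumed property names -- verify against RewardClippingConfig<T>.
options.RewardClipping = new RewardClippingConfig<double>
{
    Min = -1.0,
    Max = 1.0
};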

Seed

Gets or sets the random seed for reproducibility.

public int? Seed { get; set; }

Property Value

int?

Remarks

For Beginners: Setting a seed makes training reproducible: running training again with the same seed produces the same results. If null, results will vary between runs.

TargetNetworkConfig

Gets or sets the target network configuration for DQN-family algorithms.

public TargetNetworkConfig<T>? TargetNetworkConfig { get; set; }

Property Value

TargetNetworkConfig<T>

Remarks

For Beginners: Target networks help stabilize learning in DQN-based algorithms by providing stable Q-value targets. This prevents the "moving target" problem.

TrainFrequency

Gets or sets the frequency of training updates (every N steps).

public int TrainFrequency { get; set; }

Property Value

int

Remarks

For Beginners: The agent doesn't have to learn after every single step. Training every N steps can be more efficient. Default: 1 (train every step).

UsePrioritizedReplay

Gets or sets whether to use prioritized experience replay.

public bool UsePrioritizedReplay { get; set; }

Property Value

bool

Remarks

For Beginners: Prioritized replay samples important experiences more often. "Important" usually means experiences with high TD-error (surprising outcomes). This can speed up learning but adds computational overhead. Default: false (uniform sampling).
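
Enabling it pairs this flag with PrioritizedReplayConfig (see above). A sketch, where the Alpha and Beta property names follow the standard prioritized-replay formulation and are assumptions for illustration:

options.UsePrioritizedReplay = true;
// Assumed property names -- verify against PrioritizedReplayConfig<T>.
options.PrioritizedReplayConfig = new PrioritizedReplayConfig<double>
{
    Alpha = 0.6,   // how strongly TD-error affects sampling probability
    Beta = 0.4     // importance-sampling correction strength
};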

WarmupSteps

Gets or sets the number of initial random steps before training begins.

public int WarmupSteps { get; set; }

Property Value

int

Remarks

For Beginners: Before the agent starts learning, it's helpful to fill the replay buffer with some random experiences. This provides diverse starting data. Default: 1000 warmup steps.

Methods

Default(IEnvironment<T>)

Creates default options with sensible values for most use cases.

public static RLTrainingOptions<T> Default(IEnvironment<T> environment)

Parameters

environment IEnvironment<T>

The environment to train in.

Returns

RLTrainingOptions<T>

Options with recommended defaults.
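
For example, starting from the defaults and overriding a single setting (CartPoleEnvironment<double> is the environment used in the Quick Start above):

var options = RLTrainingOptions<double>.Default(new CartPoleEnvironment<double>());
options.Episodes = 2000;   // keep the other recommended defaults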