Class SACOptions<T>

Namespace
AiDotNet.Models.Options
Assembly
AiDotNet.dll

Configuration options for Soft Actor-Critic (SAC) agents.

public class SACOptions<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
object
SACOptions<T>

Remarks

SAC is a state-of-the-art off-policy actor-critic algorithm that combines maximum entropy RL with stable off-policy learning. It's particularly effective for continuous control tasks and is known for excellent sample efficiency and robustness.

For Beginners: SAC is one of the best algorithms for continuous control (like robot movement).

Key innovations:

  • Maximum Entropy: Encourages exploration by being "random on purpose"
  • Off-Policy: Learns from old experiences (sample efficient)
  • Twin Q-Networks: Uses two Q-functions to prevent overestimation
  • Automatic Tuning: Adjusts exploration automatically

Think of it like learning to drive while deliberately keeping your driving style varied: rather than settling on a single way to drive, you stay flexible and adaptable.
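
Formally, the maximum-entropy objective that SAC optimizes (the standard formulation from the SAC literature, shown here for context rather than taken from this class) is:

J(π) = E[ Σ_t γ^t · ( r(s_t, a_t) + α · H(π(·|s_t)) ) ]

where α is the temperature (see InitialTemperature and AutoTuneTemperature) and H is the entropy of the policy. A larger α rewards more random behavior, which is what "random on purpose" means above.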

Used by: Robotic manipulation, dexterous control, autonomous systems

Constructors

SACOptions()

public SACOptions()
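
The parameterless constructor initializes all options with defaults (not listed on this page). The sketch below is illustrative only: it simply overrides the documented properties with values inside the typical ranges noted in their remarks, assuming T = double.

using System.Collections.Generic;
using AiDotNet.Models.Options;

var options = new SACOptions<double>
{
    // Environment dimensions (example values for a small continuous-control task)
    StateSize = 17,
    ActionSize = 6,

    // Network architectures
    PolicyHiddenLayers = new List<int> { 256, 256 },
    QHiddenLayers = new List<int> { 256, 256 },

    // Learning rates
    PolicyLearningRate = 3e-4,
    QLearningRate = 3e-4,
    AlphaLearningRate = 3e-4,

    // Core SAC hyperparameters
    DiscountFactor = 0.99,
    TargetUpdateTau = 0.005,
    InitialTemperature = 0.2,
    AutoTuneTemperature = true,

    // Training schedule
    BatchSize = 256,
    ReplayBufferSize = 1_000_000,
    WarmupSteps = 10_000,
    GradientSteps = 1,
    Seed = 42
};

QLossFunction is left at its default here; any ILossFunction<T> implementation (typically MSE, see its remarks below) can be assigned to it.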

Properties

ActionSize

Size of the continuous action space.

public int ActionSize { get; set; }

Property Value

int

AlphaLearningRate

Learning rate for temperature parameter (alpha).

public T AlphaLearningRate { get; set; }

Property Value

T

AutoTuneTemperature

Whether to automatically tune the temperature parameter.

public bool AutoTuneTemperature { get; set; }

Property Value

bool

Remarks

Recommended: true. Automatically adjusts exploration based on entropy target.
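
In the standard SAC formulation (described here for context; this page does not show the library's exact update code), the temperature α is adjusted with AlphaLearningRate to minimize

J(α) = E[ -α · ( log π(a|s) + TargetEntropy ) ]

so α decreases when the policy is already more random than the target entropy and increases when it is too deterministic.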

BatchSize

Mini-batch size for training.

public int BatchSize { get; set; }

Property Value

int

Remarks

Typical values: 256-512.

DiscountFactor

Discount factor (gamma) for future rewards.

public T DiscountFactor { get; set; }

Property Value

T

Remarks

Typical values: 0.95-0.99.
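
The discount factor weights future rewards geometrically in the return:

G_t = r_t + γ · r_{t+1} + γ^2 · r_{t+2} + ...

For example, with DiscountFactor = 0.99 a reward received 100 steps in the future is weighted by 0.99^100 ≈ 0.37.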

GradientSteps

Number of gradient steps per environment step.

public int GradientSteps { get; set; }

Property Value

int

Remarks

Typical value: 1. Can be > 1 for faster learning from collected experiences.
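
For example, GradientSteps = 2 performs two mini-batch updates for every environment step (an update-to-data ratio of 2), trading extra compute for faster learning from the same collected experience.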

InitialTemperature

Initial temperature (alpha) for entropy regularization.

public T InitialTemperature { get; set; }

Property Value

T

Remarks

Typical values: 0.2-1.0. Higher = more exploration. Can be automatically tuned if AutoTuneTemperature is true.

PolicyHiddenLayers

Hidden layer sizes for policy network.

public List<int> PolicyHiddenLayers { get; set; }

Property Value

List<int>

PolicyLearningRate

Learning rate for policy network.

public T PolicyLearningRate { get; set; }

Property Value

T

QHiddenLayers

Hidden layer sizes for Q-networks.

public List<int> QHiddenLayers { get; set; }

Property Value

List<int>

QLearningRate

Learning rate for Q-networks.

public T QLearningRate { get; set; }

Property Value

T

QLossFunction

Loss function for Q-networks (typically MSE).

public ILossFunction<T> QLossFunction { get; set; }

Property Value

ILossFunction<T>

Remarks

MSE (Mean Squared Error) is the standard loss for SAC Q-networks because it directly minimizes the Bellman error: L = E[(Q(s,a) - (r + γ * Q_target(s',a')))^2]. This makes it the usual choice for value-based RL algorithms.
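
In the full SAC update (standard formulation, shown here for context), the target value also includes the entropy bonus and takes the minimum of the two target Q-networks:

y = r + γ · ( min(Q_target1(s',a'), Q_target2(s',a')) - α · log π(a'|s') ),  with a' ~ π(·|s')

and the MSE above is taken between Q(s,a) and this target y.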

ReplayBufferSize

Capacity of the experience replay buffer.

public int ReplayBufferSize { get; set; }

Property Value

int

Remarks

Typical values: 100,000-1,000,000.

Seed

Random seed for reproducibility (optional).

public int? Seed { get; set; }

Property Value

int?

StateSize

Size of the state observation space.

public int StateSize { get; set; }

Property Value

int

TargetEntropy

Target entropy for automatic temperature tuning.

public T? TargetEntropy { get; set; }

Property Value

T?

Remarks

Typically set to -ActionSize (for continuous action spaces). If left null, -ActionSize is used as the default.
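
For example, with ActionSize = 6 and TargetEntropy left null, temperature tuning uses a target entropy of -6.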

TargetUpdateTau

Soft target update coefficient (tau).

public T TargetUpdateTau { get; set; }

Property Value

T

Remarks

Typical values: 0.005-0.01. Controls how quickly target networks track main networks.
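
The soft (Polyak) update applied to each target network is the standard one:

θ_target ← τ · θ + (1 - τ) · θ_target

so with TargetUpdateTau = 0.005 the target networks move about 0.5% of the way toward the main networks on each update.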

WarmupSteps

Number of warmup steps before starting training.

public int WarmupSteps { get; set; }

Property Value

int

Remarks

Typical values: 1,000-10,000. The agent collects random experiences to fill the replay buffer before training begins.