Class SACOptions<T>
Configuration options for Soft Actor-Critic (SAC) agents.
public class SACOptions<T>
Type Parameters
T
The numeric type used for calculations.
- Inheritance
  object → SACOptions<T>
Remarks
SAC is a state-of-the-art off-policy actor-critic algorithm that combines maximum entropy RL with stable off-policy learning. It's particularly effective for continuous control tasks and is known for excellent sample efficiency and robustness.
For Beginners: SAC is one of the best algorithms for continuous control (like robot movement).
Key innovations:
- Maximum Entropy: Encourages exploration by being "random on purpose"
- Off-Policy: Learns from old experiences (sample efficient)
- Twin Q-Networks: Uses two Q-functions to prevent overestimation
- Automatic Tuning: Adjusts exploration automatically
Think of it like learning to drive while deliberately keeping your driving style varied: you don't settle on a single way to drive, so you stay flexible and adaptable.
Used by: Robotic manipulation, dexterous control, autonomous systems
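The sketch below shows one way such a configuration might look for a 6-dimensional continuous-control task. It is illustrative only: the property names come from this page, but the concrete values are typical SAC settings (drawn from the ranges documented under each property), the namespace import is omitted, and QLossFunction is left at whatever default the constructor provides.

```csharp
using System.Collections.Generic;

// Illustrative configuration only; values follow the typical ranges documented below.
var options = new SACOptions<double>
{
    StateSize = 17,                                  // size of the observation vector
    ActionSize = 6,                                  // e.g., six joint torques
    PolicyHiddenLayers = new List<int> { 256, 256 },
    QHiddenLayers = new List<int> { 256, 256 },
    PolicyLearningRate = 3e-4,
    QLearningRate = 3e-4,
    AlphaLearningRate = 3e-4,
    DiscountFactor = 0.99,
    TargetUpdateTau = 0.005,
    BatchSize = 256,
    ReplayBufferSize = 1_000_000,
    WarmupSteps = 10_000,
    GradientSteps = 1,
    InitialTemperature = 0.2,
    AutoTuneTemperature = true,                      // TargetEntropy left null => -ActionSize
    Seed = 42
};
```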
Constructors
SACOptions()
public SACOptions()
Properties
ActionSize
Size of the continuous action space.
public int ActionSize { get; set; }
Property Value
- int
AlphaLearningRate
Learning rate for temperature parameter (alpha).
public T AlphaLearningRate { get; set; }
Property Value
- T
AutoTuneTemperature
Whether to automatically tune the temperature parameter.
public bool AutoTuneTemperature { get; set; }
Property Value
- bool
Remarks
Recommended: true. Automatically adjusts exploration based on entropy target.
BatchSize
Mini-batch size for training.
public int BatchSize { get; set; }
Property Value
- int
Remarks
Typical values: 256-512.
DiscountFactor
Discount factor (gamma) for future rewards.
public T DiscountFactor { get; set; }
Property Value
- T
Remarks
Typical values: 0.95-0.99.
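As a rough rule of thumb (standard RL intuition, not part of this API), the discount factor implies an effective planning horizon of about 1 / (1 - gamma) steps:

```csharp
// gamma = 0.99 weights rewards over a horizon of roughly 100 steps.
double gamma = 0.99;
double effectiveHorizon = 1.0 / (1.0 - gamma);   // = 100
// A reward k steps in the future is weighted by gamma^k, e.g. 0.99^100 ≈ 0.366.
```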
GradientSteps
Number of gradient steps per environment step.
public int GradientSteps { get; set; }
Property Value
- int
Remarks
Typical value: 1. Can be > 1 for faster learning from collected experiences.
InitialTemperature
Initial temperature (alpha) for entropy regularization.
public T InitialTemperature { get; set; }
Property Value
- T
Remarks
Typical values: 0.2-1.0. Higher = more exploration. Can be automatically tuned if AutoTuneTemperature is true.
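To make the role of alpha concrete, here is a small numeric illustration of the entropy-regularized objective SAC maximizes, E[Q(s,a) - alpha * log pi(a|s)] (standard SAC math, not code from this library):

```csharp
// Higher alpha puts more weight on entropy (randomness) relative to the Q-value.
double qValue = 5.0;
double logProb = -2.0;                            // log pi(a|s); more negative = more random action

double lowAlphaScore  = qValue - 0.2 * logProb;   // 5.4 -> mostly driven by the Q-value
double highAlphaScore = qValue - 1.0 * logProb;   // 7.0 -> the entropy bonus matters much more
```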
PolicyHiddenLayers
Hidden layer sizes for policy network.
public List<int> PolicyHiddenLayers { get; set; }
Property Value
- List<int>
PolicyLearningRate
Learning rate for policy network.
public T PolicyLearningRate { get; set; }
Property Value
- T
QHiddenLayers
Hidden layer sizes for Q-networks.
public List<int> QHiddenLayers { get; set; }
Property Value
- List<int>
QLearningRate
Learning rate for Q-networks.
public T QLearningRate { get; set; }
Property Value
- T
QLossFunction
Loss function for Q-networks (typically MSE).
public ILossFunction<T> QLossFunction { get; set; }
Property Value
- ILossFunction<T>
Remarks
MSE (Mean Squared Error) is the standard loss for SAC Q-networks because it minimizes the Bellman error: L = E[(Q(s,a) - (r + γ * Q_target(s',a')))^2]. This squared TD error is the standard objective for value-based updates in RL.
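A one-sample numeric sketch of that loss (standard SAC math, not this library's internals); note that the full SAC target also subtracts the entropy term alpha * log pi(a'|s') from the next-state value:

```csharp
using System;

// One-sample illustration of the squared Bellman error minimized by the Q-networks.
double reward = 1.0, gamma = 0.99, alpha = 0.2;
double qTargetNext = 10.0;   // min of the twin target Q-networks at (s', a')
double logProbNext = -1.5;   // log pi(a'|s') for the action sampled from the current policy

double tdTarget = reward + gamma * (qTargetNext - alpha * logProbNext); // entropy-augmented target
double qCurrent = 10.5;      // Q(s, a) from one of the twin Q-networks
double loss = Math.Pow(qCurrent - tdTarget, 2);                         // averaged over a batch in practice
```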
ReplayBufferSize
Capacity of the experience replay buffer.
public int ReplayBufferSize { get; set; }
Property Value
- int
Remarks
Typical values: 100,000-1,000,000.
Seed
Random seed for reproducibility (optional).
public int? Seed { get; set; }
Property Value
- int?
StateSize
Size of the state observation space.
public int StateSize { get; set; }
Property Value
- int
TargetEntropy
Target entropy for automatic temperature tuning.
public T? TargetEntropy { get; set; }
Property Value
- T?
Remarks
Typical value: -ActionSize (for continuous actions); if left null, -ActionSize is also used as the default.
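For illustration, one common formulation of the automatic tuning that uses this target (many implementations update log alpha rather than alpha directly; all values below are hypothetical):

```csharp
// Alpha is nudged so the policy's entropy tracks TargetEntropy (default: -ActionSize).
double alpha = 0.2;
double alphaLearningRate = 3e-4;
double targetEntropy = -6.0;       // e.g., ActionSize = 6
double currentEntropy = -4.0;      // estimated as E[-log pi(a|s)] over a mini-batch

// Entropy above the target => reduce alpha (less exploration pressure), and vice versa.
alpha -= alphaLearningRate * (currentEntropy - targetEntropy);
```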
TargetUpdateTau
Soft target update coefficient (tau).
public T TargetUpdateTau { get; set; }
Property Value
- T
Remarks
Typical values: 0.005-0.01. Controls how quickly target networks track main networks.
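A one-parameter illustration of the soft (Polyak) update this coefficient controls (conceptual sketch, not this library's internals):

```csharp
// Each target-network parameter moves a fraction tau toward the corresponding online parameter.
double tau = 0.005;
double onlineParam = 0.55;
double targetParam = 0.40;

targetParam = tau * onlineParam + (1.0 - tau) * targetParam;  // 0.40075: slow, stable tracking
```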
WarmupSteps
Number of warmup steps before starting training.
public int WarmupSteps { get; set; }
Property Value
- int
Remarks
Typical values: 1,000-10,000. Collects random experiences before training begins.
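The outline below sketches how WarmupSteps and GradientSteps typically fit into an SAC training loop. It compiles as written, but the environment and agent calls are intentionally left as comments because they depend on the rest of your setup; only the SACOptions properties come from this page.

```csharp
// Conceptual training-loop skeleton; the acting/updating steps are placeholders.
var options = new SACOptions<double> { WarmupSteps = 10_000, GradientSteps = 1 };
int totalSteps = 100_000;

for (int step = 0; step < totalSteps; step++)
{
    // 1. Act in the environment (random actions while step < options.WarmupSteps)
    //    and push the transition into the replay buffer.

    // 2. After warmup, run GradientSteps mini-batch updates per environment step.
    if (step >= options.WarmupSteps)
    {
        for (int g = 0; g < options.GradientSteps; g++)
        {
            // update the twin Q-networks, the policy, and (if AutoTuneTemperature) alpha
        }
    }
}
```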