Class SACOptions<T>
Configuration options for Soft Actor-Critic (SAC) agents.
public class SACOptions<T>
Type Parameters
T
The numeric type used for calculations.
- Inheritance
  object → SACOptions<T>
Remarks
SAC is a state-of-the-art off-policy actor-critic algorithm that combines maximum entropy RL with stable off-policy learning. It's particularly effective for continuous control tasks and is known for excellent sample efficiency and robustness.
For Beginners: SAC is one of the best algorithms for continuous control (like robot movement).
Key innovations:
- Maximum Entropy: Encourages exploration by being "random on purpose"
- Off-Policy: Learns from old experiences (sample efficient)
- Twin Q-Networks: Uses two Q-functions to prevent overestimation
- Automatic Tuning: Adjusts exploration automatically
Think of it like learning to drive while deliberately keeping your driving style varied: you don't settle on a single way to drive, so you stay flexible and adaptable.
Used by: Robotic manipulation, dexterous control, autonomous systems
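The sketch below shows one way such a configuration might look for a 6-dimensional continuous-control task. It is illustrative only: the property names come from this page, but the concrete values are typical SAC settings (drawn from the ranges documented under each property), the namespace import is omitted, and QLossFunction is left at whatever default the constructor provides.

```csharp
using System.Collections.Generic;

// Illustrative configuration only; values follow the typical ranges documented below.
var options = new SACOptions<double>
{
    StateSize = 17,                                  // size of the observation vector
    ActionSize = 6,                                  // e.g., six joint torques
    PolicyHiddenLayers = new List<int> { 256, 256 },
    QHiddenLayers = new List<int> { 256, 256 },
    PolicyLearningRate = 3e-4,
    QLearningRate = 3e-4,
    AlphaLearningRate = 3e-4,
    DiscountFactor = 0.99,
    TargetUpdateTau = 0.005,
    BatchSize = 256,
    ReplayBufferSize = 1_000_000,
    WarmupSteps = 10_000,
    GradientSteps = 1,
    InitialTemperature = 0.2,
    AutoTuneTemperature = true,                      // TargetEntropy left null => -ActionSize
    Seed = 42
};
```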
Constructors
SACOptions()
public SACOptions()
Properties
ActionSize
Size of the continuous action space.
public int ActionSize { get; set; }
Property Value
- int
AlphaLearningRate
Learning rate for temperature parameter (alpha).
public T AlphaLearningRate { get; set; }
Property Value
- T
AutoTuneTemperature
Whether to automatically tune the temperature parameter.
public bool AutoTuneTemperature { get; set; }
Property Value
- bool
Remarks
Recommended: true. Automatically adjusts exploration based on entropy target.
BatchSize
Mini-batch size for training.
public int BatchSize { get; set; }
Property Value
- int
Remarks
Typical values: 256-512.
DiscountFactor
Discount factor (gamma) for future rewards.
public T DiscountFactor { get; set; }
Property Value
- T
Remarks
Typical values: 0.95-0.99.
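As a rough rule of thumb (standard RL intuition, not part of this API), the discount factor implies an effective planning horizon of about 1 / (1 - gamma) steps:

```csharp
// gamma = 0.99 weights rewards over a horizon of roughly 100 steps.
double gamma = 0.99;
double effectiveHorizon = 1.0 / (1.0 - gamma);   // = 100
// A reward k steps in the future is weighted by gamma^k, e.g. 0.99^100 ≈ 0.366.
```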
GradientSteps
Number of gradient steps per environment step.
public int GradientSteps { get; set; }
Property Value
- int
Remarks
Typical value: 1. Can be > 1 for faster learning from collected experiences.
InitialTemperature
Initial temperature (alpha) for entropy regularization.
public T InitialTemperature { get; set; }
Property Value
- T
Remarks
Typical values: 0.2-1.0. Higher = more exploration. Can be automatically tuned if AutoTuneTemperature is true.
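To make the role of alpha concrete, here is a small numeric illustration of the entropy-regularized objective SAC maximizes, E[Q(s,a) - alpha * log pi(a|s)] (standard SAC math, not code from this library):

```csharp
// Higher alpha puts more weight on entropy (randomness) relative to the Q-value.
double qValue = 5.0;
double logProb = -2.0;                            // log pi(a|s); more negative = more random action

double lowAlphaScore  = qValue - 0.2 * logProb;   // 5.4 -> mostly driven by the Q-value
double highAlphaScore = qValue - 1.0 * logProb;   // 7.0 -> the entropy bonus matters much more
```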
PolicyHiddenLayers
Hidden layer sizes for policy network.
public List<int> PolicyHiddenLayers { get; set; }
Property Value
- List<int>
PolicyLearningRate
Learning rate for policy network.
public T PolicyLearningRate { get; set; }
Property Value
- T
QHiddenLayers
Hidden layer sizes for Q-networks.
public List<int> QHiddenLayers { get; set; }
Property Value
- List<int>
QLearningRate
Learning rate for Q-networks.
public T QLearningRate { get; set; }
Property Value
- T
QLossFunction
Loss function for Q-networks (typically MSE).
public ILossFunction<T> QLossFunction { get; set; }
Property Value
- ILossFunction<T>
Remarks
MSE (Mean Squared Error) is the standard loss for SAC Q-networks because it minimizes the Bellman error: L = E[(Q(s,a) - (r + γ * Q_target(s',a')))^2]. This squared TD error is the standard objective for value-based updates in RL.
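A one-sample numeric sketch of that loss (standard SAC math, not this library's internals); note that the full SAC target also subtracts the entropy term alpha * log pi(a'|s') from the next-state value:

```csharp
using System;

// One-sample illustration of the squared Bellman error minimized by the Q-networks.
double reward = 1.0, gamma = 0.99, alpha = 0.2;
double qTargetNext = 10.0;   // min of the twin target Q-networks at (s', a')
double logProbNext = -1.5;   // log pi(a'|s') for the action sampled from the current policy

double tdTarget = reward + gamma * (qTargetNext - alpha * logProbNext); // entropy-augmented target
double qCurrent = 10.5;      // Q(s, a) from one of the twin Q-networks
double loss = Math.Pow(qCurrent - tdTarget, 2);                         // averaged over a batch in practice
```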
ReplayBufferSize
Capacity of the experience replay buffer.
public int ReplayBufferSize { get; set; }
Property Value
- int
Remarks
Typical values: 100,000-1,000,000.
Seed
Random seed for reproducibility (optional).
public int? Seed { get; set; }
Property Value
- int?
StateSize
Size of the state observation space.
public int StateSize { get; set; }
Property Value
- int
TargetEntropy
Target entropy for automatic temperature tuning.
public T? TargetEntropy { get; set; }
Property Value
- T?
Remarks
Typical value: -ActionSize (for continuous actions); if left null, -ActionSize is also used as the default.
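For illustration, one common formulation of the automatic tuning that uses this target (many implementations update log alpha rather than alpha directly; all values below are hypothetical):

```csharp
// Alpha is nudged so the policy's entropy tracks TargetEntropy (default: -ActionSize).
double alpha = 0.2;
double alphaLearningRate = 3e-4;
double targetEntropy = -6.0;       // e.g., ActionSize = 6
double currentEntropy = -4.0;      // estimated as E[-log pi(a|s)] over a mini-batch

// Entropy above the target => reduce alpha (less exploration pressure), and vice versa.
alpha -= alphaLearningRate * (currentEntropy - targetEntropy);
```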
TargetUpdateTau
Soft target update coefficient (tau).
public T TargetUpdateTau { get; set; }
Property Value
- T
Remarks
Typical values: 0.005-0.01. Controls how quickly target networks track main networks.
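A one-parameter illustration of the soft (Polyak) update this coefficient controls (conceptual sketch, not this library's internals):

```csharp
// Each target-network parameter moves a fraction tau toward the corresponding online parameter.
double tau = 0.005;
double onlineParam = 0.55;
double targetParam = 0.40;

targetParam = tau * onlineParam + (1.0 - tau) * targetParam;  // 0.40075: slow, stable tracking
```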
WarmupSteps
Number of warmup steps before starting training.
public int WarmupSteps { get; set; }
Property Value
- int
Remarks
Typical values: 1,000-10,000. Collects random experiences before training begins.
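The outline below sketches how WarmupSteps and GradientSteps typically fit into an SAC training loop. It compiles as written, but the environment and agent calls are intentionally left as comments because they depend on the rest of your setup; only the SACOptions properties come from this page.

```csharp
// Conceptual training-loop skeleton; the acting/updating steps are placeholders.
var options = new SACOptions<double> { WarmupSteps = 10_000, GradientSteps = 1 };
int totalSteps = 100_000;

for (int step = 0; step < totalSteps; step++)
{
    // 1. Act in the environment (random actions while step < options.WarmupSteps)
    //    and push the transition into the replay buffer.

    // 2. After warmup, run GradientSteps mini-batch updates per environment step.
    if (step >= options.WarmupSteps)
    {
        for (int g = 0; g < options.GradientSteps; g++)
        {
            // update the twin Q-networks, the policy, and (if AutoTuneTemperature) alpha
        }
    }
}
```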