Class TRPOOptions<T>
Configuration options for Trust Region Policy Optimization (TRPO) agents.
public class TRPOOptions<T> : ReinforcementLearningOptions<T>
Type Parameters
T
The numeric type used for calculations.
Inheritance
- ReinforcementLearningOptions<T>
- TRPOOptions<T>
Remarks
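TRPO aims for monotonic improvement by constraining each policy update to a "trust region", measured as the KL divergence between the old and new policies. This prevents destructively large updates. The improvement guarantee holds for the theoretical algorithm; practical implementations approximate it.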
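As a reference point, the trust-region problem is usually written in the following form. The symbols are the standard ones from the TRPO literature, not names from this class; it is an assumption here that the bound δ corresponds to the MaxKL option, and that the advantage estimate Â is the quantity shaped by GaeLambda.

```latex
\begin{aligned}
\max_{\theta}\quad & \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[\frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\,\hat{A}(s,a)\right] \\
\text{subject to}\quad & \mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}\!\left[D_{\mathrm{KL}}\!\left(\pi_{\theta_{\text{old}}}(\cdot \mid s)\,\|\,\pi_{\theta}(\cdot \mid s)\right)\right] \le \delta
\end{aligned}
```

A smaller δ gives more conservative updates; a larger δ allows faster but riskier policy changes.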
For Beginners: TRPO is like learning carefully: it never makes a change that's "too big". By limiting how much the policy can change in one update, it is designed so that performance does not get worse from one update to the next (monotonic improvement).
Key features:
- Trust Region: Limits policy change per update (via KL divergence)
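- Monotonic Improvement: Each update is designed not to degrade performance (guaranteed in theory, approximate in practice)
- Conjugate Gradient: Efficiently finds the constrained update direction without inverting a large matrix
- Line Search: Shrinks the step until the KL constraint is satisfied and the objective improves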
Think of it like taking small, safe steps when walking on uncertain terrain rather than making large leaps that might cause you to fall.
Known for: continuous-control and simulated robotics benchmarks; direct predecessor to PPO
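The conjugate gradient step listed among the key features can be sketched as follows. This is a minimal, self-contained illustration of the general technique, not this library's implementation; the class name `ConjugateGradientSketch` and the `multiplyByA` delegate are hypothetical. It solves A x = b using only matrix-vector products, which is why no matrix ever needs to be stored or inverted.

```csharp
using System;

// Illustrative sketch only: names here are hypothetical, not part of the library.
public static class ConjugateGradientSketch
{
    // Approximately solves A x = b for a symmetric positive-definite A,
    // given only a function that computes the product A * v.
    public static double[] Solve(
        Func<double[], double[]> multiplyByA,
        double[] b,
        int iterations,
        double tolerance = 1e-10)
    {
        int n = b.Length;
        var x = new double[n];            // start from x = 0
        var r = (double[])b.Clone();      // residual b - A x (equals b when x = 0)
        var p = (double[])b.Clone();      // current search direction
        double rsOld = Dot(r, r);

        for (int k = 0; k < iterations && rsOld > tolerance; k++)
        {
            double[] ap = multiplyByA(p);
            double alpha = rsOld / Dot(p, ap);   // step length along p
            for (int i = 0; i < n; i++)
            {
                x[i] += alpha * p[i];
                r[i] -= alpha * ap[i];
            }

            double rsNew = Dot(r, r);
            double beta = rsNew / rsOld;
            for (int i = 0; i < n; i++)
            {
                p[i] = r[i] + beta * p[i];       // next conjugate direction
            }
            rsOld = rsNew;
        }
        return x;
    }

    private static double Dot(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }
}
```

In TRPO implementations generally, `multiplyByA` would be a Fisher-vector product, often with a damping term added (`F v + damping * v`) for numerical stability, and `iterations` would play the role of ConjugateGradientIterations. Whether this library maps Damping and ConjugateGradientIterations exactly this way is an assumption.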
Constructors
TRPOOptions()
public TRPOOptions()
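Because the properties use `init` accessors, the options are most naturally set with an object initializer. The fragment below is a sketch: every value is a hypothetical placeholder rather than a documented default, and the `using` directive for the library's namespace is omitted because it is not shown on this page.

```csharp
using System.Collections.Generic;

// All values below are illustrative assumptions, not the library's defaults.
var options = new TRPOOptions<double>
{
    StateSize = 4,
    ActionSize = 2,
    IsContinuous = false,
    MaxKL = 0.01,
    GaeLambda = 0.95,
    Damping = 0.1,
    ConjugateGradientIterations = 10,
    LineSearchSteps = 10,
    LineSearchBacktrackCoeff = 0.8,
    LineSearchAcceptRatio = 0.1,
    StepsPerUpdate = 2048,
    ValueIterations = 5,
    ValueLearningRate = 1e-3,
    PolicyHiddenLayers = new List<int> { 64, 64 },
    ValueHiddenLayers = new List<int> { 64, 64 }
    // Optimizer is left unset; the documentation states Adam is used when it is null.
};
```

Properties not set here, such as ValueLossFunction, are assumed to keep whatever the parameterless constructor assigns.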
Properties
ActionSize
public int ActionSize { get; init; }
Property Value
- int
ConjugateGradientIterations
public int ConjugateGradientIterations { get; init; }
Property Value
- int
Damping
public double Damping { get; init; }
Property Value
- double
GaeLambda
public T GaeLambda { get; init; }
Property Value
- T
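The name GaeLambda suggests the λ parameter of Generalized Advantage Estimation (GAE). The sketch below shows the standard GAE recursion under that assumption; the class `GaeSketch`, its parameters, and the discount factor `gamma` are hypothetical and not taken from this class. It also assumes a single trajectory with no episode boundary inside it.

```csharp
using System;

// Illustrative sketch only: names here are hypothetical, not part of the library.
public static class GaeSketch
{
    // rewards: r_0 .. r_{n-1}
    // values:  V(s_0) .. V(s_n), one longer than rewards (last entry is the bootstrap value)
    public static double[] ComputeAdvantages(
        double[] rewards, double[] values, double gamma, double lambda)
    {
        if (values.Length != rewards.Length + 1)
            throw new ArgumentException("values must have one more entry than rewards.");

        int n = rewards.Length;
        var advantages = new double[n];
        double running = 0.0;

        // Work backwards so each advantage reuses the one after it.
        for (int t = n - 1; t >= 0; t--)
        {
            // One-step temporal-difference error.
            double delta = rewards[t] + gamma * values[t + 1] - values[t];
            running = delta + gamma * lambda * running;
            advantages[t] = running;
        }
        return advantages;
    }
}
```

With λ = 0 each advantage is just the one-step TD error (lower variance, more bias); with λ = 1 it becomes the full discounted return minus the value estimate (higher variance, less bias).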
IsContinuous
public bool IsContinuous { get; init; }
Property Value
- bool
LineSearchAcceptRatio
public double LineSearchAcceptRatio { get; init; }
Property Value
- double
LineSearchBacktrackCoeff
public double LineSearchBacktrackCoeff { get; init; }
Property Value
- double
LineSearchSteps
public int LineSearchSteps { get; init; }
Property Value
- int
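The three line-search options plausibly work together as in a standard backtracking line search. The sketch below is an assumption about their roles based on common TRPO implementations, not a description of this library's code; the class `LineSearchSketch` and its delegates are hypothetical. The full step is tried first, then shrunk by the backtrack coefficient until the objective improves by at least the accept ratio of the predicted gain and the KL bound holds.

```csharp
using System;

// Illustrative sketch only: names here are hypothetical, not part of the library.
public static class LineSearchSketch
{
    // Returns the accepted step scale in (0, 1], or 0 if no step is accepted.
    public static double FindStepScale(
        Func<double, double> actualImprovement,  // measured surrogate gain at a given scale
        Func<double, double> klAtScale,          // KL divergence at a given scale
        double expectedFullStepImprovement,      // predicted gain for scale = 1
        double maxKl,
        int steps,
        double backtrackCoeff,
        double acceptRatio)
    {
        double scale = 1.0;
        for (int i = 0; i < steps; i++)
        {
            double actual = actualImprovement(scale);
            double expected = expectedFullStepImprovement * scale;

            bool improvesEnough =
                actual > 0.0 && expected > 0.0 && actual / expected >= acceptRatio;

            if (improvesEnough && klAtScale(scale) <= maxKl)
            {
                return scale;                    // first scale that passes both checks
            }
            scale *= backtrackCoeff;             // shrink the step and retry
        }
        return 0.0;                              // reject the update entirely
    }
}
```

Returning 0 when every candidate fails means the policy is left unchanged for that update, which is the conservative choice in keeping with the trust-region idea.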
MaxKL
public T MaxKL { get; init; }
Property Value
- T
Optimizer
The optimizer used for updating network parameters. If null, the Adam optimizer is used by default.
public IOptimizer<T, Vector<T>, Vector<T>>? Optimizer { get; init; }
Property Value
- IOptimizer<T, Vector<T>, Vector<T>>
PolicyHiddenLayers
public List<int> PolicyHiddenLayers { get; init; }
Property Value
- List<int>
StateSize
public int StateSize { get; init; }
Property Value
- int
StepsPerUpdate
public int StepsPerUpdate { get; init; }
Property Value
- int
ValueHiddenLayers
public List<int> ValueHiddenLayers { get; init; }
Property Value
- List<int>
ValueIterations
public int ValueIterations { get; init; }
Property Value
- int
ValueLearningRate
public T ValueLearningRate { get; init; }
Property Value
- T
ValueLossFunction
public ILossFunction<T> ValueLossFunction { get; init; }