Class TRPOOptions<T>

Namespace
AiDotNet.Models.Options
Assembly
AiDotNet.dll

Configuration options for Trust Region Policy Optimization (TRPO) agents.

public class TRPOOptions<T> : ReinforcementLearningOptions<T>

Type Parameters

T

The numeric type used for calculations.

Inheritance
TRPOOptions<T>
Inherited Members

Remarks

TRPO aims for monotonic policy improvement by constraining each policy update to a "trust region", measured by the KL divergence between the old and new policies. This prevents destructively large updates.

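For Beginners: TRPO is like learning carefully: it never makes a change that is "too big". By limiting how much the policy can change in a single update, it aims to keep performance from getting worse (monotonic improvement). The guarantee holds in theory; in practice the algorithm relies on approximations, so occasional small regressions are still possible.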

Key features:

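  • Trust Region: Limits how far the policy can move per update, measured by KL divergence
  • Monotonic Improvement: Theoretical guarantee (approximate in practice) that performance does not degrade
  • Conjugate Gradient: Efficiently computes the update direction for the constrained problem without forming the full curvature matrix
  • Line Search: Backtracks along the update direction until the KL constraint is satisfied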

Think of it like taking small, safe steps when walking on uncertain terrain rather than making large leaps that might cause you to fall.
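The trust-region idea above is usually written as a constrained optimization problem. The following is the standard textbook formulation, not something taken from this library's source. The symbols (the policy π_θ, the advantage A, and the bound δ) are notation introduced here for illustration, and reading δ as the MaxKL property is an assumption based on the property name.

```latex
\begin{aligned}
\max_{\theta}\quad & \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\, A^{\pi_{\theta_{\text{old}}}}(s,a) \right] \\
\text{subject to}\quad & \mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}\!\left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s) \right) \right] \le \delta
\end{aligned}
```

A smaller δ means more cautious updates; a larger δ allows faster but riskier policy changes.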

Famous for: continuous-control and simulated robotics benchmarks, and as the direct predecessor to PPO

Constructors

TRPOOptions()

public TRPOOptions()
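Because every property on this page is declared with an init accessor, the options are most naturally set with an object initializer. The following is a minimal sketch, assuming T = double is a supported numeric type and that the base class has no additional required members. All values are illustrative choices, not the library's documented defaults, and the snippet needs a reference to AiDotNet.dll to compile.

```csharp
using System.Collections.Generic;
using AiDotNet.Models.Options;

// Illustrative values only; these are NOT the library's documented defaults.
var options = new TRPOOptions<double>
{
    StateSize = 4,                      // hypothetical 4-dimensional observation
    ActionSize = 2,                     // hypothetical 2 discrete actions
    IsContinuous = false,

    MaxKL = 0.01,                       // trust-region size (assumed meaning)
    GaeLambda = 0.95,
    Damping = 0.1,
    ConjugateGradientIterations = 10,

    LineSearchSteps = 10,
    LineSearchBacktrackCoeff = 0.8,
    LineSearchAcceptRatio = 0.1,

    StepsPerUpdate = 2048,
    ValueIterations = 5,
    ValueLearningRate = 1e-3,

    PolicyHiddenLayers = new List<int> { 64, 64 },
    ValueHiddenLayers = new List<int> { 64, 64 }

    // Optimizer is left unset (null); per this page, Adam is then used by default.
    // ValueLossFunction is left at whatever the constructor assigns.
};
```

Since the setters are init-only, the values cannot be changed after construction; create a new TRPOOptions<T> instance to try a different configuration.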

Properties

ActionSize

public int ActionSize { get; init; }

Property Value

int

ConjugateGradientIterations

public int ConjugateGradientIterations { get; init; }

Property Value

int

Damping

public double Damping { get; init; }

Property Value

double
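This page does not describe how ConjugateGradientIterations and Damping are used. In the standard TRPO algorithm, the conjugate gradient method approximately solves a damped linear system for the update direction, and the step is then scaled to the edge of the trust region. The sketch below shows that standard form under stated assumptions: F is the Fisher information matrix (the curvature of the KL divergence), g is the policy gradient, c is taken to correspond to Damping, δ to MaxKL, and the number of solver iterations to ConjugateGradientIterations. These mappings are inferred from the property names, not confirmed by the source.

```latex
\begin{aligned}
(F + c\,I)\,x &= g \\
\theta_{\text{new}} &= \theta_{\text{old}} + \sqrt{\frac{2\delta}{x^{\top} F x}}\; x
\end{aligned}
```

A larger c makes the linear system better conditioned at the cost of a less exact natural-gradient direction.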

GaeLambda

public T GaeLambda { get; init; }

Property Value

T
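The name GaeLambda suggests the λ parameter of Generalized Advantage Estimation (GAE). As a reference, the standard GAE estimator is sketched below. The symbols are notation introduced here: r_t is the reward, V is the value function, and γ is the discount factor, which is not among the properties listed on this page.

```latex
\begin{aligned}
\delta^{V}_{t} &= r_t + \gamma\, V(s_{t+1}) - V(s_t) \\
\hat{A}_t &= \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\, \delta^{V}_{t+l}
\end{aligned}
```

With λ = 0 the estimate reduces to the one-step TD residual (low variance, more bias); with λ = 1 it becomes the discounted return minus the value baseline (low bias, more variance).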

IsContinuous

public bool IsContinuous { get; init; }

Property Value

bool

LineSearchAcceptRatio

public double LineSearchAcceptRatio { get; init; }

Property Value

double

LineSearchBacktrackCoeff

public double LineSearchBacktrackCoeff { get; init; }

Property Value

double

LineSearchSteps

public int LineSearchSteps { get; init; }

Property Value

int
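The three line-search properties are not described on this page. One common form of the backtracking line search used with TRPO is sketched below, as an assumption about what these settings control rather than a statement of this library's behavior. Here Δθ is the proposed full step, L is the surrogate objective, g is the policy gradient, α is taken to correspond to LineSearchBacktrackCoeff, K to LineSearchSteps, ρ to LineSearchAcceptRatio, and δ to MaxKL.

```latex
\begin{aligned}
\theta_j &= \theta_{\text{old}} + \alpha^{j}\,\Delta\theta, \qquad j = 0, 1, \dots, K-1 \\
\text{accept } \theta_j \text{ if}\quad & \frac{L(\theta_j) - L(\theta_{\text{old}})}{\alpha^{j}\, g^{\top}\Delta\theta} \ge \rho \quad\text{and}\quad \bar{D}_{\mathrm{KL}}(\theta_{\text{old}}, \theta_j) \le \delta
\end{aligned}
```

The first candidate that passes both checks is taken; if none passes within K tries, the usual choice is to keep the old parameters and skip the update.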

MaxKL

public T MaxKL { get; init; }

Property Value

T

Optimizer

The optimizer used for updating network parameters. If null, the Adam optimizer is used by default.

public IOptimizer<T, Vector<T>, Vector<T>>? Optimizer { get; init; }

Property Value

IOptimizer<T, Vector<T>, Vector<T>>

PolicyHiddenLayers

public List<int> PolicyHiddenLayers { get; init; }

Property Value

List<int>

StateSize

public int StateSize { get; init; }

Property Value

int

StepsPerUpdate

public int StepsPerUpdate { get; init; }

Property Value

int

ValueHiddenLayers

public List<int> ValueHiddenLayers { get; init; }

Property Value

List<int>

ValueIterations

public int ValueIterations { get; init; }

Property Value

int

ValueLearningRate

public T ValueLearningRate { get; init; }

Property Value

T

ValueLossFunction

public ILossFunction<T> ValueLossFunction { get; init; }

Property Value

ILossFunction<T>