Class TRPOOptions<T>

Namespace: AiDotNet.Models.Options

Assembly: AiDotNet.dll

Configuration options for Trust Region Policy Optimization (TRPO) agents.

public class TRPOOptions<T> : ReinforcementLearningOptions<T>

Type Parameters

T: The numeric type used for calculations.

Inheritance: object → ReinforcementLearningOptions<T> → TRPOOptions<T>

Inherited Members: ReinforcementLearningOptions<T>.LearningRate

ReinforcementLearningOptions<T>.DiscountFactor

ReinforcementLearningOptions<T>.LossFunction

ReinforcementLearningOptions<T>.Seed

ReinforcementLearningOptions<T>.BatchSize

ReinforcementLearningOptions<T>.ReplayBufferSize

ReinforcementLearningOptions<T>.TargetUpdateFrequency

ReinforcementLearningOptions<T>.UsePrioritizedReplay

ReinforcementLearningOptions<T>.EpsilonStart

ReinforcementLearningOptions<T>.EpsilonEnd

ReinforcementLearningOptions<T>.EpsilonDecay

ReinforcementLearningOptions<T>.WarmupSteps

ReinforcementLearningOptions<T>.MaxGradientNorm

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

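TRPO constrains each policy update to a "trust region", defined by an upper bound on the KL divergence between the old and the new policy. This prevents destructively large updates. In theory the constraint yields monotonic policy improvement; the practical algorithm approximates that guarantee.

For Beginners: TRPO is like learning carefully: it never makes a change that is "too big". By limiting how much the policy can change in a single update, it aims to keep performance from getting worse from one update to the next (monotonic improvement).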

Key features:

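Trust Region: Limits how far the policy can move per update (measured by KL divergence)
Monotonic Improvement: Designed so that performance does not degrade between updates (exact in theory, approximate in practice)
Conjugate Gradient: Computes the update direction by approximately solving a linear system, without forming the full curvature matrix
Line Search: Shrinks the step until the KL constraint is satisfied and the objective improves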

Think of it like taking small, safe steps when walking on uncertain terrain rather than making large leaps that might cause you to fall.
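The following is a sketch of the standard TRPO formulation, included to show where the trust region, conjugate gradient, and line search fit together. This page does not document what each property means, so the mapping from symbols to properties (δ to MaxKL, c to Damping, β to LineSearchBacktrackCoeff, J to LineSearchSteps) is an assumption based on the property names, not a statement about this library's implementation.

```latex
% Trust-region problem: improve the surrogate objective L while
% keeping the average KL divergence from the old policy below delta.
\begin{aligned}
\max_{\theta}\quad & L(\theta) = \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}
  \left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}\,
  A^{\pi_{\theta_{\text{old}}}}(s,a) \right] \\
\text{s.t.}\quad & \mathbb{E}_{s}
  \left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s) \right) \right] \le \delta
\end{aligned}

% Update direction: conjugate gradient approximately solves a damped
% linear system, where H is the Hessian of the KL term at theta_old
% and g is the gradient of L at theta_old.
\tilde{H}\,x = g, \qquad \tilde{H} = H + c\,I, \qquad g = \nabla_{\theta} L(\theta)\big|_{\theta_{\text{old}}}

% Backtracking line search: try j = 0, 1, ..., J-1 and accept the first
% candidate that satisfies the KL constraint and improves L.
\theta = \theta_{\text{old}} + \beta^{\,j}\, \sqrt{\frac{2\delta}{x^{\top} \tilde{H}\, x}}\; x, \qquad 0 < \beta < 1
```

Under this reading, a smaller δ gives more conservative updates, and LineSearchAcceptRatio would plausibly be the minimum ratio of actual to predicted improvement needed to accept a candidate step.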

Famous for: continuous-control and simulated robotics benchmarks; direct predecessor of PPO

Constructors

TRPOOptions()

public TRPOOptions()
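Because every property listed on this page has an init accessor, the options are most naturally set with an object initializer at construction time. The snippet below is a minimal sketch for T = double. All values are illustrative assumptions, not the library's defaults, and it assumes that unset properties (such as Optimizer and ValueLossFunction) keep whatever the parameterless constructor assigns and that no property is declared required.

```csharp
using System.Collections.Generic;
using AiDotNet.Models.Options;

// Illustrative values only; this page does not state the defaults.
var options = new TRPOOptions<double>
{
    // Environment dimensions (hypothetical 4-dimensional state, 2 discrete actions).
    StateSize = 4,
    ActionSize = 2,
    IsContinuous = false,

    // Hidden layer widths for the policy and value networks.
    PolicyHiddenLayers = new List<int> { 64, 64 },
    ValueHiddenLayers = new List<int> { 64, 64 },

    // Trust-region and solver settings (typed T, double, or int as declared above).
    MaxKL = 0.01,
    Damping = 0.1,
    ConjugateGradientIterations = 10,
    LineSearchSteps = 10,
    LineSearchBacktrackCoeff = 0.8,
    LineSearchAcceptRatio = 0.1,

    // Advantage estimation and value-function training.
    GaeLambda = 0.95,
    StepsPerUpdate = 2048,
    ValueIterations = 5,
    ValueLearningRate = 1e-3
};
```

Since the setters are init-only, the instance cannot be modified after construction; to change a setting, build a new TRPOOptions<T> instance.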

Properties

ActionSize

public int ActionSize { get; init; }

Property Value

int

ConjugateGradientIterations

public int ConjugateGradientIterations { get; init; }

Property Value

int

Damping

public double Damping { get; init; }

Property Value

double

GaeLambda

public T GaeLambda { get; init; }

Property Value

T

IsContinuous

public bool IsContinuous { get; init; }

Property Value

bool

LineSearchAcceptRatio

public double LineSearchAcceptRatio { get; init; }

Property Value

double

LineSearchBacktrackCoeff

public double LineSearchBacktrackCoeff { get; init; }

Property Value

double

LineSearchSteps

public int LineSearchSteps { get; init; }

Property Value

int

MaxKL

public T MaxKL { get; init; }

Property Value

T

Optimizer

The optimizer used to update network parameters. If null, the Adam optimizer is used.

public IOptimizer<T, Vector<T>, Vector<T>>? Optimizer { get; init; }

Property Value

IOptimizer<T, Vector<T>, Vector<T>>

PolicyHiddenLayers

public List<int> PolicyHiddenLayers { get; init; }

Property Value

List<int>

StateSize

public int StateSize { get; init; }

Property Value

int

StepsPerUpdate

public int StepsPerUpdate { get; init; }

Property Value

int

ValueHiddenLayers

public List<int> ValueHiddenLayers { get; init; }

Property Value

List<int>

ValueIterations

public int ValueIterations { get; init; }

Property Value

int

ValueLearningRate

public T ValueLearningRate { get; init; }

Property Value

T

ValueLossFunction

public ILossFunction<T> ValueLossFunction { get; init; }

Property Value

ILossFunction<T>
