Class LSPIOptions<T>
Configuration options for LSPI (Least-Squares Policy Iteration) agents.
public class LSPIOptions<T> : ReinforcementLearningOptions<T>
Type Parameters
T
The numeric type used for calculations.
- Inheritance: ReinforcementLearningOptions<T> → LSPIOptions<T>
Remarks
LSPI combines least-squares methods with policy iteration. It alternates between policy evaluation (using LSTDQ) and policy improvement, iteratively refining the policy until convergence.
For Beginners: LSPI is like repeatedly asking "what's the best policy?" and "how good is it?" until the answers stop changing. Each iteration uses LSTDQ to evaluate the current policy, then improves it based on those evaluations. A configuration sketch follows the lists below.
Best for:
- Batch reinforcement learning
- Offline learning from fixed datasets
- Sample-efficient policy learning
- When you need guaranteed convergence
Not suitable for:
- Online/streaming scenarios
- Very large feature spaces
- Continuous action spaces
- Real-time learning requirements
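As a quick illustration, the sketch below configures LSPI for a hypothetical task with 20 state features and 4 discrete actions. The property names and types come from this page; the chosen values and the surrounding setup are assumptions, not recommendations.

```csharp
// A minimal configuration sketch. Property names/types are from this page;
// the containing namespace and the agent that consumes these options are
// assumptions that depend on your project.
var options = new LSPIOptions<double>
{
    FeatureSize = 20,             // length of the state feature vector
    ActionSize = 4,               // number of discrete actions
    MaxIterations = 50,           // cap on policy-iteration rounds
    ConvergenceThreshold = 1e-6,  // stop once weight changes fall below this
    RegularizationParam = 1e-3    // ridge term for numerical stability
};
```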
Properties
ActionSize
Size of the action space (number of possible actions).
public int ActionSize { get; init; }
Property Value
int
ConvergenceThreshold
Weight change threshold for determining convergence.
public double ConvergenceThreshold { get; init; }
Property Value
double
FeatureSize
Number of features in the state representation.
public int FeatureSize { get; init; }
Property Value
int
MaxIterations
Maximum number of policy iteration steps before stopping.
public int MaxIterations { get; init; }
Property Value
int
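To show how MaxIterations and ConvergenceThreshold interact, here is a schematic of the LSPI outer loop. It is a sketch under stated assumptions, not this library's internals: samples and LstdqEvaluate are hypothetical placeholders for the batch of transitions and the LSTDQ policy-evaluation step.

```csharp
// Schematic LSPI outer loop (illustrative only). `samples` and
// `LstdqEvaluate` are hypothetical placeholders, not part of this API.
double[] weights = new double[options.FeatureSize * options.ActionSize];
for (int i = 0; i < options.MaxIterations; i++)
{
    // Policy evaluation: solve for the Q-function weights of the
    // greedy policy implied by the current weights.
    double[] newWeights = LstdqEvaluate(samples, weights, options);

    // Policy improvement is implicit: the next iteration's greedy policy
    // is derived from newWeights. Measure convergence via the Euclidean
    // norm of the weight change.
    double delta = 0.0;
    for (int j = 0; j < weights.Length; j++)
    {
        double d = newWeights[j] - weights[j];
        delta += d * d;
    }
    weights = newWeights;

    if (Math.Sqrt(delta) < options.ConvergenceThreshold)
        break; // weights stable: the policy has (approximately) converged
}
```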
RegularizationParam
Regularization parameter to prevent overfitting and ensure numerical stability.
public double RegularizationParam { get; init; }
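Property Value
double

For intuition, a ridge-style regularizer of this kind typically enters the LSTDQ solve by adding RegularizationParam to the diagonal of the accumulated A matrix before solving A w = b. The sketch below is illustrative, not this library's internals; A and b are hypothetical placeholders for the accumulated least-squares statistics.

```csharp
// Illustrative only: where a ridge term typically enters an LSTDQ-style
// solve. `A` (n x n) and `b` (n) are hypothetical placeholders for the
// accumulated least-squares statistics.
int n = A.GetLength(0);
for (int j = 0; j < n; j++)
{
    A[j, j] += options.RegularizationParam; // A <- A + lambda * I
}
// ... then solve the (now well-conditioned) linear system A w = b
// for the Q-function weights w.
```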