Class DeterministicPolicy<T>
- Namespace
- AiDotNet.ReinforcementLearning.Policies
- Assembly
- AiDotNet.dll
Deterministic policy for continuous action spaces. Directly outputs actions without sampling from a distribution. Commonly used in DDPG, TD3, and other deterministic policy gradient methods.
public class DeterministicPolicy<T> : PolicyBase<T>, IPolicy<T>, IDisposable
Type Parameters
T
The numeric type used for calculations.
- Inheritance
- PolicyBase<T> → DeterministicPolicy<T>
- Implements
- IPolicy<T>, IDisposable
Constructors
DeterministicPolicy(NeuralNetwork<T>, int, IExplorationStrategy<T>, bool, Random?)
Initializes a new instance of the DeterministicPolicy class.
public DeterministicPolicy(NeuralNetwork<T> policyNetwork, int actionSize, IExplorationStrategy<T> explorationStrategy, bool useTanhSquashing = true, Random? random = null)
Parameters
policyNetwork NeuralNetwork<T>
The neural network that outputs actions.
actionSize int
The size of the action space.
explorationStrategy IExplorationStrategy<T>
The exploration strategy for training.
useTanhSquashing bool
Whether to apply tanh squashing to bound actions to [-1, 1]. Defaults to true.
random Random
Optional random number generator.
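A minimal construction sketch follows. BuildActorNetwork and GaussianNoiseStrategy<T> are hypothetical placeholders (they are not AiDotNet APIs shown on this page) for however your project builds the actor network and its exploration strategy; only the DeterministicPolicy<T> constructor call itself comes from the signature above.

using AiDotNet.ReinforcementLearning.Policies;

// Hypothetical builder: a small actor network mapping an 8-dimensional
// state to a 2-dimensional action.
NeuralNetwork<double> actor = BuildActorNetwork(stateSize: 8, actionSize: 2);

// Hypothetical IExplorationStrategy<T> implementation adding Gaussian noise.
IExplorationStrategy<double> noise = new GaussianNoiseStrategy<double>(stdDev: 0.1);

// The policy implements IDisposable, so a using declaration releases it deterministically.
using var policy = new DeterministicPolicy<double>(
    policyNetwork: actor,
    actionSize: 2,
    explorationStrategy: noise,
    useTanhSquashing: true,  // bound each action component to [-1, 1]
    random: new Random(42)); // fixed seed for reproducible exploration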
Methods
ComputeLogProb(Vector<T>, Vector<T>)
Computes the log probability for a deterministic policy. This returns a constant (zero), because a deterministic policy's action distribution is a Dirac delta rather than a density that can be evaluated pointwise.
public override T ComputeLogProb(Vector<T> state, Vector<T> action)
Parameters
state Vector<T>
action Vector<T>
Returns
- T
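A short sketch of this contract, reusing policy from the construction example above; the Vector<double> length constructor is an assumption about the AiDotNet vector type, noted in the comments:

// Assumed: Vector<T> can be created with a length (here, an 8-dimensional zero state).
Vector<double> state = new Vector<double>(8);
Vector<double> action = policy.SelectAction(state, training: false);

// The action distribution is a Dirac delta, so the log probability is the
// constant zero for every (state, action) pair and carries no gradient signal.
double logProb = policy.ComputeLogProb(state, action);
Console.WriteLine(logProb); // 0

DDPG- and TD3-style updates never use log probabilities; the method presumably exists only to satisfy the IPolicy<T> contract.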
Dispose(bool)
Disposes of policy resources.
protected override void Dispose(bool disposing)
Parameters
disposing bool
GetNetworks()
Gets the neural networks used by this policy.
public override IReadOnlyList<INeuralNetwork<T>> GetNetworks()
Returns
- IReadOnlyList<INeuralNetwork<T>>
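A sketch of one common use: polyak-averaging the returned networks into a target copy, as DDPG and TD3 do. SoftUpdate and targetActor are hypothetical, and for a deterministic policy the list presumably contains just the single actor network.

foreach (INeuralNetwork<double> network in policy.GetNetworks())
{
    // Hypothetical helper implementing the soft target update θ' ← τ·θ + (1 − τ)·θ'.
    SoftUpdate(targetActor, network, tau: 0.005);
}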
Reset()
Resets the exploration strategy.
public override void Reset()
SelectAction(Vector<T>, bool)
Selects a deterministic action from the policy network.
public override Vector<T> SelectAction(Vector<T> state, bool training = true)
Parameters
state Vector<T>
training bool
Returns
- Vector<T>
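A sketch of an interaction loop, assuming (per the constructor documentation) that the exploration strategy perturbs the action only when training is true, while training: false returns the raw tanh-squashed network output. The env object and its Reset/Step methods are a hypothetical environment wrapper, not part of this API:

Vector<double> state = env.Reset();          // hypothetical environment API
for (int episode = 0; episode < 100; episode++)
{
    policy.Reset();                          // clear exploration-strategy state between episodes
    state = env.Reset();
    bool done = false;
    while (!done)
    {
        // Noisy action for exploration during training.
        Vector<double> action = policy.SelectAction(state, training: true);
        (state, done) = env.Step(action);    // hypothetical: returns (next state, done flag)
    }
}

// Evaluation: training: false bypasses the exploration strategy.
Vector<double> greedyAction = policy.SelectAction(state, training: false);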