
Class DoubleDQNAgent<T>

Namespace: AiDotNet.ReinforcementLearning.Agents.DoubleDQN
Assembly: AiDotNet.dll

Double Deep Q-Network (Double DQN) agent for reinforcement learning.

public class DoubleDQNAgent<T> : DeepReinforcementLearningAgentBase<T>, IRLAgent<T>, IFullModel<T, Vector<T>, Vector<T>>, IModel<Vector<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Vector<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Vector<T>, Vector<T>>>, IGradientComputable<T, Vector<T>, Vector<T>>, IJitCompilable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations.

Inheritance
DeepReinforcementLearningAgentBase<T>
DoubleDQNAgent<T>
Implements
IRLAgent<T>
IFullModel<T, Vector<T>, Vector<T>>
IModel<Vector<T>, Vector<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Vector<T>, Vector<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Vector<T>, Vector<T>>>
IGradientComputable<T, Vector<T>, Vector<T>>
IJitCompilable<T>
IDisposable

Remarks

Double DQN addresses the overestimation bias in standard DQN by decoupling action selection from action evaluation. It uses the online network to select actions and the target network to evaluate them, leading to more accurate Q-value estimates.

For Beginners: Standard DQN tends to overestimate Q-values because it uses the same network to both select and evaluate actions (the max operator introduces a positive bias).

Double DQN fixes this by:

  • Using the online network to SELECT the best action
  • Using the target network to EVALUATE that action's value

Think of it like getting a second opinion: one expert picks what looks best, another expert judges its actual value. This reduces overoptimistic estimates.

Key Improvement: More stable learning and better performance, especially in noisy or stochastic environments.
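
In equation form, standard DQN bootstraps with y = r + γ · max_a Q_target(s', a), while Double DQN uses y = r + γ · Q_target(s', argmax_a Q_online(s', a)). The sketch below illustrates that target computation in isolation; onlineQNext and targetQNext stand in for the two networks' Q-values at the next state and are not part of this class's public API.

// Illustration only: the Double DQN bootstrap target for a single transition.
// onlineQNext = Q_online(s', ·), targetQNext = Q_target(s', ·).
static double DoubleDqnTarget(double reward, double gamma, bool done,
                              double[] onlineQNext, double[] targetQNext)
{
    if (done) return reward; // no bootstrap at terminal states

    // SELECT: the online network picks the greedy action in the next state.
    int bestAction = 0;
    for (int a = 1; a < onlineQNext.Length; a++)
        if (onlineQNext[a] > onlineQNext[bestAction]) bestAction = a;

    // EVALUATE: the target network supplies that action's value.
    return reward + gamma * targetQNext[bestAction];
}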

Reference: van Hasselt et al., "Deep Reinforcement Learning with Double Q-learning", 2015.

Constructors

DoubleDQNAgent(DoubleDQNOptions<T>)

Initializes a new instance of the DoubleDQNAgent class.

public DoubleDQNAgent(DoubleDQNOptions<T> options)

Parameters

options DoubleDQNOptions<T>

Configuration options for the Double DQN agent.
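
A minimal construction sketch, assuming DoubleDQNOptions<T> exposes a parameterless constructor and object-initializer configuration; the commented property names are placeholders, not confirmed members of DoubleDQNOptions<T>.

using AiDotNet.ReinforcementLearning.Agents.DoubleDQN;

// Placeholder configuration; see DoubleDQNOptions<T> for the actual members
// (state/action sizes, discount factor, exploration schedule, replay settings, ...).
var options = new DoubleDQNOptions<double>
{
    // StateSize = 4,    // hypothetical property name
    // ActionSize = 2,   // hypothetical property name
    // Gamma = 0.99,     // hypothetical property name
};

var agent = new DoubleDQNAgent<double>(options);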

Properties

FeatureCount

Gets the number of input features (state dimensions).

public override int FeatureCount { get; }

Property Value

int

Methods

ApplyGradients(Vector<T>, T)

Not supported for DoubleDQNAgent. Use the agent's internal Train() loop instead.

public override void ApplyGradients(Vector<T> gradients, T learningRate)

Parameters

gradients Vector<T>

Not used.

learningRate T

Not used.

Exceptions

NotSupportedException

Always thrown. DoubleDQN manages gradient computation and parameter updates internally through backpropagation.

Clone()

Clones the agent.

public override IFullModel<T, Vector<T>, Vector<T>> Clone()

Returns

IFullModel<T, Vector<T>, Vector<T>>

ComputeGradients(Vector<T>, Vector<T>, ILossFunction<T>?)

Computes gradients for the agent.

public override Vector<T> ComputeGradients(Vector<T> input, Vector<T> target, ILossFunction<T>? lossFunction = null)

Parameters

input Vector<T>
target Vector<T>
lossFunction ILossFunction<T>

Returns

Vector<T>

Deserialize(byte[])

Deserializes the agent from bytes.

public override void Deserialize(byte[] data)

Parameters

data byte[]

GetMetrics()

Gets the current training metrics.

public override Dictionary<string, T> GetMetrics()

Returns

Dictionary<string, T>

Dictionary of metric names to values.
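
A usage fragment, assuming the usual System and System.Collections.Generic usings and the agent constructed above; the concrete metric keys (e.g. loss or exploration rate) are not specified here, so the loop makes no assumption about them.

// Print whatever metrics the agent currently exposes.
var metrics = agent.GetMetrics();
foreach (var entry in metrics)
    Console.WriteLine($"{entry.Key}: {entry.Value}");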

GetModelMetadata()

Gets model metadata.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

GetParameters()

Gets the agent's parameters.

public override Vector<T> GetParameters()

Returns

Vector<T>

LoadModel(string)

Loads the agent's state from a file.

public override void LoadModel(string filepath)

Parameters

filepath string

Path to load the agent from.

SaveModel(string)

Saves the agent's state to a file.

public override void SaveModel(string filepath)

Parameters

filepath string

Path to save the agent.
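
A save/restore sketch, reusing agent and options from the construction example; the file name is arbitrary and the on-disk format is internal to the library.

agent.SaveModel("double-dqn-agent.bin");

// A fresh instance built from the same options can pick up the saved state.
var restored = new DoubleDQNAgent<double>(options);
restored.LoadModel("double-dqn-agent.bin");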

SelectAction(Vector<T>, bool)

Selects an action given the current state observation.

public override Vector<T> SelectAction(Vector<T> state, bool training = true)

Parameters

state Vector<T>

The current state observation as a Vector.

training bool

Whether the agent is in training mode (affects exploration).

Returns

Vector<T>

The selected action as a Vector (Double DQN acts over a discrete action space).
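
A minimal call sketch; state is assumed to be a Vector<double> observation obtained from your environment (how that vector is built is outside this class).

// Greedy action for evaluation; pass training: true during learning so the
// agent can apply its exploration strategy.
Vector<double> action = agent.SelectAction(state, training: false);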

Serialize()

Serializes the agent to bytes.

public override byte[] Serialize()

Returns

byte[]
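
An in-memory round-trip sketch, pairing Serialize() with Deserialize(byte[]) to copy one agent's learned state into another instance built from the same options.

byte[] snapshot = agent.Serialize();

var copy = new DoubleDQNAgent<double>(options);
copy.Deserialize(snapshot);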

SetParameters(Vector<T>)

Sets the agent's parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>
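
A sketch of syncing two agents via the flattened parameter vector; trainingAgent and evaluationAgent are hypothetical names for two DoubleDQNAgent<double> instances created from the same options.

// Copy learned weights from the training agent into an evaluation copy.
Vector<double> parameters = trainingAgent.GetParameters();
evaluationAgent.SetParameters(parameters);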

StoreExperience(Vector<T>, Vector<T>, T, Vector<T>, bool)

Stores an experience tuple for later learning.

public override void StoreExperience(Vector<T> state, Vector<T> action, T reward, Vector<T> nextState, bool done)

Parameters

state Vector<T>

The state before action.

action Vector<T>

The action taken.

reward T

The reward received.

nextState Vector<T>

The state after action.

done bool

Whether the episode terminated.

Train()

Performs one training step, updating the agent's policy/value function.

public override T Train()

Returns

T

The training loss for monitoring.
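
The methods above combine into the usual act / store / train loop. In the sketch below, env with Reset() and Step() is a hypothetical environment wrapper (not part of this class), and the Vector<double> values it returns are assumed to match the agent's state and action shapes.

Vector<double> state = env.Reset();
bool done = false;
double lastLoss = 0.0;

while (!done)
{
    // Exploration is handled internally when training is true.
    Vector<double> action = agent.SelectAction(state, training: true);

    // Hypothetical environment step returning (next state, reward, terminated).
    var (nextState, reward, terminated) = env.Step(action);

    // Record the transition for replay, then run one learning update.
    agent.StoreExperience(state, action, reward, nextState, terminated);
    lastLoss = agent.Train();

    state = nextState;
    done = terminated;
}

Console.WriteLine($"Episode finished; last training loss: {lastLoss}");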