Class MADDPGAgent<T>

Namespace
AiDotNet.ReinforcementLearning.Agents.MADDPG
Assembly
AiDotNet.dll
public class MADDPGAgent<T> : DeepReinforcementLearningAgentBase<T>, IRLAgent<T>, IFullModel<T, Vector<T>, Vector<T>>, IModel<Vector<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Vector<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Vector<T>, Vector<T>>>, IGradientComputable<T, Vector<T>, Vector<T>>, IJitCompilable<T>, IDisposable

Type Parameters

T

The numeric data type used for calculations (for example, float or double).

Inheritance
DeepReinforcementLearningAgentBase<T>
MADDPGAgent<T>
Implements
IRLAgent<T>
IFullModel<T, Vector<T>, Vector<T>>
IModel<Vector<T>, Vector<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Vector<T>, Vector<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Vector<T>, Vector<T>>>
IGradientComputable<T, Vector<T>, Vector<T>>
IJitCompilable<T>
IDisposable

Constructors

MADDPGAgent(MADDPGOptions<T>, IOptimizer<T, Vector<T>, Vector<T>>?)

Initializes a new MADDPG (Multi-Agent Deep Deterministic Policy Gradient) agent with the specified options and an optional optimizer.

public MADDPGAgent(MADDPGOptions<T> options, IOptimizer<T, Vector<T>, Vector<T>>? optimizer = null)

Parameters

options MADDPGOptions<T>

Configuration options for the MADDPG agent.

optimizer IOptimizer<T, Vector<T>, Vector<T>>

Optional optimizer used for network parameter updates; may be null.
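
The following sketch shows how an agent might be constructed. The MADDPGOptions<T> property names used here (StateSize, ActionSize, NumAgents) are illustrative assumptions and may not match the actual option members.

// Illustrative sketch: the option property names below are assumptions.
var options = new MADDPGOptions<double>
{
    StateSize = 8,
    ActionSize = 2,
    NumAgents = 3
};

// The optimizer argument is optional and defaults to null.
var agent = new MADDPGAgent<double>(options);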

Properties

FeatureCount

Gets the number of input features (state dimensions).

public override int FeatureCount { get; }

Property Value

int

Methods

ApplyGradients(Vector<T>, T)

Not supported for MADDPGAgent. Use the agent's internal Train() loop instead.

public override void ApplyGradients(Vector<T> gradients, T learningRate)

Parameters

gradients Vector<T>

Not used.

learningRate T

Not used.

Exceptions

NotSupportedException

Always thrown. MADDPG manages gradient computation and parameter updates internally through backpropagation.

Clone()

Creates a deep copy of this MADDPG agent including all trained network weights.

public override IFullModel<T, Vector<T>, Vector<T>> Clone()

Returns

IFullModel<T, Vector<T>, Vector<T>>

A new MADDPG agent with the same configuration and trained parameters.

Remarks

Issue #5 fix: Clone now copies all trained weights from the actor and critic networks using GetParameters() and SetParameters(), ensuring the cloned agent has the same learned behavior.
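
A minimal sketch of cloning a trained agent; it assumes agent has already been trained and state holds a valid observation. Because Clone() copies all network parameters, both instances produce the same prediction for the same input.

// Assumes 'agent' is a trained MADDPGAgent<double> and 'state' is a valid observation.
IFullModel<double, Vector<double>, Vector<double>> clone = agent.Clone();

Vector<double> original = agent.Predict(state);
Vector<double> copied = clone.Predict(state);
// The two outputs should match because Clone() copies the trained weights.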

ComputeGradients(Vector<T>, Vector<T>, ILossFunction<T>?)

Computes gradients for the agent.

public override Vector<T> ComputeGradients(Vector<T> input, Vector<T> target, ILossFunction<T>? lossFunction = null)

Parameters

input Vector<T>

The input state vector.

target Vector<T>

The target output vector.

lossFunction ILossFunction<T>

Optional loss function used when computing the gradients.

Returns

Vector<T>

The computed gradient vector.

Deserialize(byte[])

Deserializes a MADDPG agent from a byte array.

public override void Deserialize(byte[] data)

Parameters

data byte[]

Byte array containing the serialized agent data.

Remarks

Expects data created by Serialize() with a compatible configuration.

GetMetrics()

Gets the current training metrics.

public override Dictionary<string, T> GetMetrics()

Returns

Dictionary<string, T>

Dictionary of metric names to values.

GetModelMetadata()

Gets model metadata.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

GetParameters()

Gets the agent's parameters.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing the agent's trainable parameters.

LoadModel(string)

Loads a trained model from a file.

public override void LoadModel(string filepath)

Parameters

filepath string

Path to load the model from.

Remarks

Uses Deserialize(byte[]) to restore network weights.

Predict(Vector<T>)

Makes a prediction using the trained agent.

public override Vector<T> Predict(Vector<T> input)

Parameters

input Vector<T>

The state observation to predict an action for.

Returns

Vector<T>

The predicted action vector.

PredictAsync(Vector<T>)

Asynchronously makes a prediction using the trained agent.

public Task<Vector<T>> PredictAsync(Vector<T> input)

Parameters

input Vector<T>

Returns

Task<Vector<T>>

ResetEpisode()

Resets episode-specific state (if any).

public override void ResetEpisode()

SaveModel(string)

Saves the trained model to a file.

public override void SaveModel(string filepath)

Parameters

filepath string

Path to save the model.

Remarks

Uses Serialize() to persist network weights.
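
A short sketch of persisting an agent to disk and restoring it into a new instance. The file path is illustrative, and the restoring agent is assumed to be constructed with compatible options.

// Save the trained agent's weights (path is illustrative).
agent.SaveModel("maddpg_agent.bin");

// Restore into a new agent constructed with a compatible configuration.
var restored = new MADDPGAgent<double>(options);
restored.LoadModel("maddpg_agent.bin");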

SelectAction(Vector<T>, bool)

Selects an action given the current state observation.

public override Vector<T> SelectAction(Vector<T> state, bool training = true)

Parameters

state Vector<T>

The current state observation as a Vector.

training bool

Whether the agent is in training mode (affects exploration).

Returns

Vector<T>

Action as a Vector (can be discrete or continuous).
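
For example, exploration can be disabled during evaluation by passing training: false; the environment call below is a hypothetical placeholder.

// 'env.GetObservation()' is a hypothetical placeholder for your environment.
Vector<double> state = env.GetObservation();

// training: false selects an action without exploration.
Vector<double> action = agent.SelectAction(state, training: false);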

SelectActionForAgent(int, Vector<T>, bool)

Selects an action for a specific agent in the multi-agent system, given that agent's state observation.

public Vector<T> SelectActionForAgent(int agentId, Vector<T> state, bool training = true)

Parameters

agentId int

The index of the agent to select an action for.

state Vector<T>

The agent's current state observation.

training bool

Whether the agent is in training mode (affects exploration).

Returns

Vector<T>

The selected action for the specified agent.

Serialize()

Serializes the MADDPG agent to a byte array.

public override byte[] Serialize()

Returns

byte[]

Byte array containing the serialized agent data.

Remarks

Serializes configuration values and all actor/critic network weights.
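
A sketch of an in-memory round trip, for example to copy trained weights into another agent; the receiving agent is assumed to use a compatible configuration.

// Serialize the trained agent to bytes.
byte[] data = agent.Serialize();

// A second agent with a compatible configuration can restore those weights.
var other = new MADDPGAgent<double>(options);
other.Deserialize(data);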

SetParameters(Vector<T>)

Sets the agent's parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector of parameter values, in the same layout as returned by GetParameters().

StoreExperience(Vector<T>, Vector<T>, T, Vector<T>, bool)

Stores an experience tuple for later learning.

public override void StoreExperience(Vector<T> state, Vector<T> action, T reward, Vector<T> nextState, bool done)

Parameters

state Vector<T>

The state before action.

action Vector<T>

The action taken.

reward T

The reward received.

nextState Vector<T>

The state after action.

done bool

Whether the episode terminated.

StoreMultiAgentExperience(List<Vector<T>>, List<Vector<T>>, List<T>, List<Vector<T>>, bool)

Store multi-agent experience with per-agent reward tracking.

public void StoreMultiAgentExperience(List<Vector<T>> states, List<Vector<T>> actions, List<T> rewards, List<Vector<T>> nextStates, bool done)

Parameters

states List<Vector<T>>
actions List<Vector<T>>
rewards List<T>
nextStates List<Vector<T>>
done bool

Remarks

Stores individual rewards for each agent to support both cooperative and competitive/mixed-motive scenarios. For backward compatibility, also stores an averaged reward in the experience.

The per-agent rewards are stored keyed by the buffer index where the experience will be placed. This accounts for the circular buffer behavior when capacity is reached.
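
A sketch of one multi-agent step that collects per-agent observations, actions, and rewards before storing them; the env object and its methods are hypothetical placeholders for a multi-agent environment.

// One environment step for three agents; 'env' and its methods are hypothetical.
int numAgents = 3;
var states = new List<Vector<double>>();
var actions = new List<Vector<double>>();

for (int i = 0; i < numAgents; i++)
{
    states.Add(env.GetObservation(i));                                    // hypothetical
    actions.Add(agent.SelectActionForAgent(i, states[i], training: true));
}

// Hypothetical environment step returning per-agent rewards and next states.
(List<double> rewards, List<Vector<double>> nextStates, bool done) = env.Step(actions);

agent.StoreMultiAgentExperience(states, actions, rewards, nextStates, done);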

Train()

Performs one training step, updating the agent's policy/value function.

public override T Train()

Returns

T

The training loss for monitoring.
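
A minimal act-store-train loop sketch tying SelectAction, StoreExperience, and Train() together; the env object and its Reset/Step methods are hypothetical placeholders.

// Minimal training loop; 'env' and its Reset/Step methods are hypothetical.
for (int episode = 0; episode < 100; episode++)
{
    Vector<double> state = env.Reset();
    bool done = false;

    while (!done)
    {
        Vector<double> action = agent.SelectAction(state, training: true);

        // Hypothetical environment transition.
        (Vector<double> nextState, double reward, bool isDone) = env.Step(action);

        agent.StoreExperience(state, action, reward, nextState, isDone);
        double loss = agent.Train();   // training loss, useful for monitoring

        state = nextState;
        done = isDone;
    }

    agent.ResetEpisode();
}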

TrainAsync()

Asynchronously performs one training step.

public Task TrainAsync()

Returns

Task