Class MuZeroAgent<T>
- Namespace
- AiDotNet.ReinforcementLearning.Agents.MuZero
- Assembly
- AiDotNet.dll
MuZero agent combining tree search with learned models.
public class MuZeroAgent<T> : DeepReinforcementLearningAgentBase<T>, IRLAgent<T>, IFullModel<T, Vector<T>, Vector<T>>, IModel<Vector<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Vector<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Vector<T>, Vector<T>>>, IGradientComputable<T, Vector<T>, Vector<T>>, IJitCompilable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations.
- Inheritance
- DeepReinforcementLearningAgentBase<T> -> MuZeroAgent<T>
- Implements
- IRLAgent<T>, IFullModel<T, Vector<T>, Vector<T>>, IModelSerializer, ICheckpointableModel, IDisposable
Remarks
MuZero combines tree search (like AlphaZero) with learned dynamics. It masters games without knowing the rules, learning its own internal model.
For Beginners: MuZero is DeepMind's breakthrough that achieved superhuman performance in Atari, Go, Chess, and Shogi without being told the rules. It learns its own "internal model" of the game and uses tree search to plan ahead.
Three key networks:
- Representation: Observation -> hidden state
- Dynamics: (hidden state, action) -> (next hidden state, reward)
- Prediction: hidden state -> (policy, value)
Plus tree search (MCTS) for planning using the learned model.
Think of it as: Learning chess by watching games, figuring out the rules yourself, then planning moves by mentally simulating the game.
Famous for: Superhuman Atari/board games without knowing rules
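The division of labor between the three networks can be sketched in code. The delegate signatures below are illustrative only and are not part of AiDotNet's API; they show how MCTS can simulate ahead purely inside the learned model, without ever querying the real environment:

```csharp
// Illustrative sketch only: hypothetical delegate signatures, not AiDotNet's API.
using System;

public static class MuZeroSketch
{
    // One simulated step, the core operation MCTS repeats at every tree node:
    // encode the observation once, then imagine transitions in hidden-state space.
    public static (double[] policy, double value) SimulateStep(
        Func<double[], double[]> representation,                      // obs -> hidden state
        Func<double[], int, (double[] next, double reward)> dynamics, // (hidden, action) -> (next hidden, reward)
        Func<double[], (double[] policy, double value)> prediction,   // hidden -> (policy, value)
        double[] observation,
        int action)
    {
        double[] hidden = representation(observation);
        var (nextHidden, _) = dynamics(hidden, action);
        return prediction(nextHidden);
    }
}
```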
Constructors
MuZeroAgent(MuZeroOptions<T>)
public MuZeroAgent(MuZeroOptions<T> options)
Parameters
options (MuZeroOptions<T>): The configuration options for the agent.
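A minimal construction sketch, assuming MuZeroOptions<T> has a parameterless constructor with usable defaults (in practice you would configure it first):

```csharp
using AiDotNet.ReinforcementLearning.Agents.MuZero;

// Assumes default options are usable; configure MuZeroOptions<double> as needed.
var options = new MuZeroOptions<double>();
var agent = new MuZeroAgent<double>(options);
```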
Properties
FeatureCount
Gets the number of input features (state dimensions).
public override int FeatureCount { get; }
Property Value
- int
Methods
ApplyGradients(Vector<T>, T)
Applies gradients to update the agent.
public override void ApplyGradients(Vector<T> gradients, T learningRate)
Parameters
gradients (Vector<T>): The gradient vector to apply.
learningRate (T): The learning rate for the update.
Clone()
Clones the agent.
public override IFullModel<T, Vector<T>, Vector<T>> Clone()
Returns
- IFullModel<T, Vector<T>, Vector<T>>
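Cloning is handy for snapshotting a policy, for example to freeze an opponent during self-play. A minimal sketch, continuing from the construction example above:

```csharp
// Independent copy; the snapshot can be evaluated while `agent` keeps training.
IFullModel<double, Vector<double>, Vector<double>> snapshot = agent.Clone();
```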
ComputeGradients(Vector<T>, Vector<T>, ILossFunction<T>?)
Computes gradients for the agent.
public override Vector<T> ComputeGradients(Vector<T> input, Vector<T> target, ILossFunction<T>? lossFunction = null)
Parameters
input (Vector<T>): The input observation.
target (Vector<T>): The target output.
lossFunction (ILossFunction<T>): Optional loss function; when omitted, a default is used.
Returns
- Vector<T>
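ComputeGradients and ApplyGradients are typically used together for a manual update step. A sketch, with the input and target vectors passed in since their construction is application-specific:

```csharp
static void ManualGradientStep(
    MuZeroAgent<double> agent, Vector<double> input, Vector<double> target)
{
    Vector<double> grads = agent.ComputeGradients(input, target); // default loss
    agent.ApplyGradients(grads, 0.001);                           // learning rate as T
}
```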
Deserialize(byte[])
Deserializes the agent from bytes.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The serialized agent state, as produced by Serialize().
GetMetrics()
Gets the current training metrics.
public override Dictionary<string, T> GetMetrics()
Returns
- Dictionary<string, T>
Dictionary of metric names to values.
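A small sketch for logging the returned metrics after training steps:

```csharp
static void LogMetrics(MuZeroAgent<double> agent)
{
    foreach (var metric in agent.GetMetrics())
        Console.WriteLine($"{metric.Key}: {metric.Value}");
}
```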
GetModelMetadata()
Gets model metadata.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
GetParameters()
Gets the agent's parameters.
public override Vector<T> GetParameters()
Returns
- Vector<T>
LoadModel(string)
Loads the agent's state from a file.
public override void LoadModel(string filepath)
Parameters
filepath (string): Path to load the agent from.
Predict(Vector<T>)
Makes a prediction using the trained agent.
public override Vector<T> Predict(Vector<T> input)
Parameters
input (Vector<T>): The input observation.
Returns
- Vector<T>
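A minimal inference sketch; the observation vector is passed in since its construction depends on your environment:

```csharp
static Vector<double> Infer(MuZeroAgent<double> agent, Vector<double> observation)
{
    // Forward pass through the trained agent.
    return agent.Predict(observation);
}
```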
PredictAsync(Vector<T>)
Makes a prediction asynchronously using the trained agent.
public Task<Vector<T>> PredictAsync(Vector<T> input)
Parameters
input (Vector<T>): The input observation.
Returns
- Task<Vector<T>>
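A sketch of awaiting a prediction, e.g. from a request handler (requires System.Threading.Tasks):

```csharp
static async Task<Vector<double>> InferAsync(
    MuZeroAgent<double> agent, Vector<double> observation)
{
    // Awaitable counterpart of Predict.
    return await agent.PredictAsync(observation);
}
```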
ResetEpisode()
Resets episode-specific state (if any).
public override void ResetEpisode()
SaveModel(string)
Saves the agent's state to a file.
public override void SaveModel(string filepath)
Parameters
filepath (string): Path to save the agent.
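SaveModel and LoadModel pair for disk checkpoints. A sketch with an illustrative path, continuing from the construction example:

```csharp
agent.SaveModel("checkpoints/muzero.bin");

// Later, or in another process: restore into a freshly constructed agent.
var restored = new MuZeroAgent<double>(new MuZeroOptions<double>());
restored.LoadModel("checkpoints/muzero.bin");
```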
SelectAction(Vector<T>, bool)
Selects an action given the current state observation.
public override Vector<T> SelectAction(Vector<T> observation, bool training = true)
Parameters
observation (Vector<T>): The current state observation.
training (bool): Whether the agent is in training mode (affects exploration).
Returns
- Vector<T>
Action as a Vector (can be discrete or continuous).
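A sketch distinguishing training-time and evaluation-time action selection:

```csharp
static Vector<double> Act(
    MuZeroAgent<double> agent, Vector<double> observation, bool evaluating)
{
    // training: false disables exploration for greedy evaluation.
    return agent.SelectAction(observation, training: !evaluating);
}
```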
Serialize()
Serializes the agent to bytes.
public override byte[] Serialize()
Returns
- byte[]
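Serialize and Deserialize give an in-memory round trip, useful for transferring an agent's state without touching disk. A sketch, continuing from the construction example:

```csharp
byte[] blob = agent.Serialize();

var copy = new MuZeroAgent<double>(new MuZeroOptions<double>());
copy.Deserialize(blob); // copy now mirrors agent's learned state
```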
SetParameters(Vector<T>)
Sets the agent's parameters.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The parameter vector to apply.
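GetParameters and SetParameters together allow copying weights between agents, e.g. to keep a frozen evaluation copy in sync. A sketch:

```csharp
static void SyncParameters(MuZeroAgent<double> source, MuZeroAgent<double> target)
{
    // Copy all learnable parameters from source into target.
    target.SetParameters(source.GetParameters());
}
```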
StoreExperience(Vector<T>, Vector<T>, T, Vector<T>, bool)
Stores an experience tuple for later learning.
public override void StoreExperience(Vector<T> observation, Vector<T> action, T reward, Vector<T> nextObservation, bool done)
Parameters
observation (Vector<T>): The observation before the action.
action (Vector<T>): The action taken.
reward (T): The reward received.
nextObservation (Vector<T>): The observation after the action.
done (bool): Whether the episode terminated.
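A sketch of recording one transition; the surrounding environment loop is application-specific:

```csharp
static void OnStep(
    MuZeroAgent<double> agent,
    Vector<double> observation, Vector<double> action,
    double reward, Vector<double> nextObservation, bool done)
{
    agent.StoreExperience(observation, action, reward, nextObservation, done);
    if (done)
        agent.ResetEpisode(); // clear per-episode state between episodes
}
```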
Train()
Performs one training step, updating the agent's policy/value function.
public override T Train()
Returns
- T
The training loss for monitoring.
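Putting the pieces together, one interaction-plus-learning cycle might look like the following sketch. The envStep delegate stands in for your environment and is not part of AiDotNet:

```csharp
static double TrainingStep(
    MuZeroAgent<double> agent,
    Vector<double> observation,
    Func<Vector<double>, (Vector<double> nextObs, double reward, bool done)> envStep)
{
    Vector<double> action = agent.SelectAction(observation, training: true);
    var (nextObs, reward, done) = envStep(action);
    agent.StoreExperience(observation, action, reward, nextObs, done);
    if (done)
        agent.ResetEpisode();
    return agent.Train(); // loss value for monitoring
}
```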
TrainAsync()
Performs one training step asynchronously.
public Task TrainAsync()
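A sketch of running training off the calling thread (requires System.Threading.Tasks):

```csharp
static async Task RunTrainingAsync(MuZeroAgent<double> agent, int steps)
{
    for (int i = 0; i < steps; i++)
        await agent.TrainAsync(); // one asynchronous training step per iteration
}
```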