Class UniformReplayBuffer<T, TState, TAction>

Namespace
AiDotNet.ReinforcementLearning.ReplayBuffers
Assembly
AiDotNet.dll

A replay buffer that samples experiences uniformly at random.

public class UniformReplayBuffer<T, TState, TAction> : IReplayBuffer<T, TState, TAction>

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

TState

The type representing the state observation (e.g., Vector<T>, Tensor<T>).

TAction

The type representing the action (e.g., Vector<T> for continuous, int for discrete).

Inheritance
object → UniformReplayBuffer<T, TState, TAction>
Implements
IReplayBuffer<T, TState, TAction>

Remarks

This is the standard replay buffer used in algorithms like DQN. Experiences are stored in a circular buffer and sampled uniformly at random for training. All experiences have an equal probability of being selected, regardless of their importance or recency.

For Beginners: This replay buffer treats all experiences equally - it's like having a bag of memories and pulling out random ones to learn from. When the buffer is full, the oldest memories get replaced with new ones.

Key Properties:

  • Uniform Sampling: Every experience has an equal chance of being picked
  • Circular Buffer: Old experiences are automatically removed when capacity is reached
  • No Prioritization: Unlike prioritized replay, it does not favor "important" experiences

When to Use:

  • Good starting point for most RL algorithms
  • Works well when all experiences are roughly equally valuable
  • Simpler and faster than prioritized variants
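
Examples

A minimal end-to-end sketch. The generic arguments (double, double[], int) are illustrative choices, and the Experience constructor shape and namespace are assumptions to verify against the actual Experience<T, TState, TAction> type:

using System;
using AiDotNet.ReinforcementLearning.ReplayBuffers;

// Numeric type double, states as double[] feature vectors, discrete int actions.
var buffer = new UniformReplayBuffer<double, double[], int>(capacity: 10_000, seed: 42);

// Assumption: Experience exposes a (state, action, reward, nextState, done) constructor.
buffer.Add(new Experience<double, double[], int>(
    new[] { 0.1, 0.2 },   // state
    1,                    // action
    1.0,                  // reward
    new[] { 0.3, 0.4 },   // next state
    false));              // episode not done

if (buffer.CanSample(32))
{
    var batch = buffer.Sample(32);     // 32 uniformly sampled experiences
    Console.WriteLine(batch.Count);    // 32
}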

Constructors

UniformReplayBuffer(int, int?)

Initializes a new instance of the UniformReplayBuffer class.

public UniformReplayBuffer(int capacity, int? seed = null)

Parameters

capacity int

Maximum number of experiences to store.

seed int?

Optional random seed for reproducibility.

Remarks

For Beginners: Capacity determines how many experiences the buffer remembers. Larger buffers have more diverse experiences but use more memory. Common values: 10,000 for simple problems, 100,000-1,000,000 for complex ones.
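
Examples

A sketch of the two common construction patterns (the generic arguments are illustrative):

using AiDotNet.ReinforcementLearning.ReplayBuffers;

// Unseeded: sampling order differs from run to run.
var buffer = new UniformReplayBuffer<double, double[], int>(capacity: 100_000);

// Seeded: the same sampling sequence on every run, useful for debugging and tests.
var reproducible = new UniformReplayBuffer<double, double[], int>(capacity: 100_000, seed: 7);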

Properties

Capacity

Gets the maximum capacity of the buffer.

public int Capacity { get; }

Property Value

int

Remarks

For Beginners: This is the maximum number of experiences the buffer can hold. Once full, the oldest experience is replaced by each new one (FIFO). Common values: 10,000 to 1,000,000 depending on available memory.
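
Examples

A small sketch showing that Capacity is fixed at construction while Count grows up to it:

var buffer = new UniformReplayBuffer<double, double[], int>(capacity: 1_000);
Console.WriteLine(buffer.Capacity);   // 1000, fixed for the buffer's lifetime
// After more than 1,000 calls to Add, Count stays pinned at 1,000
// while the oldest experiences are overwritten.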

Count

Gets the current number of experiences in the buffer.

public int Count { get; }

Property Value

int

Remarks

For Beginners: How many experiences are currently stored. Training usually starts only when the buffer has enough experiences (e.g., Count >= BatchSize).
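
Examples

A sketch of the usual warm-up gate; CollectStep is a hypothetical helper that runs one environment step and stores the result with Add:

const int batchSize = 64;
var buffer = new UniformReplayBuffer<double, double[], int>(capacity: 10_000);

// Warm-up: collect experiences until one full batch can be sampled.
while (buffer.Count < batchSize)
{
    CollectStep(buffer);   // hypothetical: one environment step + buffer.Add
}
// From here on, buffer.Sample(batchSize) is safe.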

Methods

Add(Experience<T, TState, TAction>)

Adds an experience to the buffer.

public void Add(Experience<T, TState, TAction> experience)

Parameters

experience Experience<T, TState, TAction>

The experience to add.

Remarks

For Beginners: Call this after each step in the environment to store what happened for later learning. If the buffer is full, the oldest experience is removed to make room.
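
Examples

A sketch of the per-step storage pattern. Here env, state, and SelectAction are hypothetical stand-ins for your environment loop, and the Experience constructor shape is an assumption:

// One environment step: act, observe, store.
int action = SelectAction(state);                    // hypothetical policy query
var (nextState, reward, done) = env.Step(action);    // hypothetical environment API

// Assumption: Experience takes (state, action, reward, nextState, done).
buffer.Add(new Experience<double, double[], int>(state, action, reward, nextState, done));

state = done ? env.Reset() : nextState;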

CanSample(int)

Checks if the buffer has enough experiences to sample a batch.

public bool CanSample(int batchSize)

Parameters

batchSize int

The desired batch size.

Returns

bool

True if buffer contains at least batchSize experiences.

Remarks

For Beginners: Check this before sampling to avoid errors. Training should wait until CanSample(batchSize) returns true.
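
Examples

A sketch of gating one training update; TrainOnBatch is a hypothetical stand-in for a single gradient step:

var buffer = new UniformReplayBuffer<double, double[], int>(capacity: 10_000);
const int batchSize = 32;

if (buffer.CanSample(batchSize))
{
    var batch = buffer.Sample(batchSize);
    TrainOnBatch(batch);   // hypothetical: one gradient update on the batch
}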

Clear()

Clears all experiences from the buffer.

public void Clear()

Remarks

For Beginners: Removes all stored experiences. Use this when starting fresh training or when the environment changes significantly.
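
Examples

For instance, before restarting training on a changed environment:

buffer.Clear();                      // drop all stored transitions
Console.WriteLine(buffer.Count);     // 0 - the buffer is empty again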

Sample(int)

Samples a batch of experiences from the buffer.

public List<Experience<T, TState, TAction>> Sample(int batchSize)

Parameters

batchSize int

Number of experiences to sample.

Returns

List<Experience<T, TState, TAction>>

List of sampled experiences.

Remarks

For Beginners: Randomly selects experiences for training. Random sampling breaks temporal correlations (consecutive experiences tend to be very similar), which helps the neural network learn more stable patterns.
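
Examples

A sketch of consuming a sampled batch to build DQN-style targets. The Experience field names (Reward, Done, NextState) and MaxQValue are assumptions, and 0.99 is an illustrative discount factor:

var batch = buffer.Sample(32);
foreach (var exp in batch)
{
    // Assumed field names; check the actual Experience<T, TState, TAction> API.
    double target = exp.Done
        ? exp.Reward
        : exp.Reward + 0.99 * MaxQValue(exp.NextState);   // hypothetical Q-value query
}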

SampleWithIndices(int)

Samples a batch of experiences with their buffer indices.

public (List<Experience<T, TState, TAction>> Experiences, List<int> Indices) SampleWithIndices(int batchSize)

Parameters

batchSize int

Number of experiences to sample.

Returns

(List<Experience<T, TState, TAction>> Experiences, List<int> Indices)

A tuple containing the list of sampled experiences and their corresponding buffer indices.

Remarks

This method is useful for multi-agent scenarios where additional per-agent data is stored separately and needs to be retrieved using the buffer index.

For Beginners: Sometimes you need to know where in the buffer each sampled experience came from. This method returns both the experiences and their positions, which is useful for advanced techniques like updating priorities in prioritized replay.
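
Examples

A sketch of joining sampled experiences with per-slot side data kept outside the buffer; agentIds and Process are hypothetical:

var (experiences, indices) = buffer.SampleWithIndices(32);
for (int i = 0; i < experiences.Count; i++)
{
    int slot = indices[i];
    int agentId = agentIds[slot];          // hypothetical parallel array keyed by buffer index
    Process(experiences[i], agentId);      // hypothetical per-agent handling
}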