Class SamplingHelper

Namespace
AiDotNet.Helpers
Assembly
AiDotNet.dll

Provides methods for sampling data, which is essential for many AI and machine learning techniques.

public static class SamplingHelper
Inheritance
SamplingHelper

Remarks

For Beginners: Sampling is like picking random items from a collection. It is important in AI for creating training and validation sets, and for implementing techniques such as bootstrapping that help you measure and improve model accuracy and reliability.

Methods

ClearSeed()

Clears the seed and restores thread-safe random number generation.

public static void ClearSeed()

Remarks

For Beginners: After calling SetSeed for reproducible experiments, you can call this method to go back to using the default thread-safe random generation.
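
As a minimal usage sketch, assuming the AiDotNet package is referenced and using only the signatures documented on this page, a seeded section can be wrapped in try/finally so that the default generator is restored even if the seeded code throws. The seed value 42 and the sizes 20 and 5 are arbitrary choices for illustration.

```csharp
using System;
using AiDotNet.Helpers;

// Seed the generator for a reproducible section (42 is an arbitrary value).
SamplingHelper.SetSeed(42);
try
{
    // Draw 5 distinct indices from a population of 20 while the seed is active.
    int[] indices = SamplingHelper.SampleWithoutReplacement(20, 5);
    Console.WriteLine(string.Join(", ", indices));
}
finally
{
    // Restore the default thread-safe generator, even if the code above threw.
    SamplingHelper.ClearSeed();
}
```

Putting ClearSeed() in the finally block keeps the seed from leaking into unrelated code that runs later.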

CreateBootstrapSamples<T>(T[], int, int?)

Creates bootstrap samples from the given data. Each bootstrap sample is drawn at random with replacement and can be used to estimate statistical properties.

public static List<T[]> CreateBootstrapSamples<T>(T[] data, int numberOfSamples, int? sampleSize = null)

Parameters

data T[]

The original data array to sample from.

numberOfSamples int

The number of bootstrap samples to create.

sampleSize int?

The size of each bootstrap sample. If null, each sample has the same length as the original data array.

Returns

List<T[]>

A list of bootstrap samples, where each sample is an array of data elements.

Type Parameters

T

The type of the data elements.

Remarks

For Beginners: Bootstrapping is a powerful technique where you create multiple "synthetic" datasets by randomly sampling from your original data (with replacement).

For example, if you have 100 data points:

  1. You might create 50 different bootstrap samples
  2. Each sample contains 100 randomly selected data points (some repeated, some missing)
  3. You can train 50 different models on these samples
  4. The variation in these models helps you understand how reliable your predictions are

This is especially useful when you have limited data but need to understand the uncertainty in your model's predictions.
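
The idea above can be sketched as follows, assuming the AiDotNet package is referenced. The data values and the choice of 200 resamples are hypothetical, and the standard-error calculation is ordinary LINQ code, not part of SamplingHelper. Instead of training models, the sketch computes the mean of each bootstrap sample and uses the spread of those means as an estimate of how uncertain the mean is.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiDotNet.Helpers;

// Hypothetical measurements; any T[] can be resampled the same way.
double[] data = { 2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7 };

// 200 bootstrap samples; sampleSize is left as null, so each sample
// should have the same length as data.
List<double[]> samples = SamplingHelper.CreateBootstrapSamples(data, 200);

// Compute the mean of every bootstrap sample.
double[] means = samples.Select(s => s.Average()).ToArray();

// The sample standard deviation of the bootstrap means is an
// estimate of the standard error of the mean.
double grandMean = means.Average();
double standardError = Math.Sqrt(
    means.Sum(m => (m - grandMean) * (m - grandMean)) / (means.Length - 1));

Console.WriteLine($"Mean of original data: {data.Average():F3}");
Console.WriteLine($"Bootstrap standard error: {standardError:F3}");
```

The exact numbers vary from run to run unless a seed is set with SetSeed first.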

SampleWithReplacement(int, int)

Performs sampling with replacement, meaning the same item can be selected multiple times.

public static int[] SampleWithReplacement(int populationSize, int sampleSize)

Parameters

populationSize int

The size of the population to sample from.

sampleSize int

The number of indices to draw.

Returns

int[]

An array of indices representing the sampled items.

Remarks

For Beginners: This is like rolling a die multiple times: you can get the same number more than once. In data terms, if you have 100 data points and draw 10 of them, some data points might be selected more than once while others might not be selected at all.

This approach is useful for techniques like bootstrapping, where repeated sampling helps estimate the reliability of your model.
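
A short usage sketch, assuming the AiDotNet package is referenced. The items array is hypothetical, and the code assumes the returned indices are zero-based positions in the range 0 to populationSize - 1, which this page does not state explicitly.

```csharp
using System;
using System.Linq;
using AiDotNet.Helpers;

// Hypothetical population of five items.
string[] items = { "a", "b", "c", "d", "e" };

// Draw 8 indices from a population of 5. Because 8 > 5,
// at least one index must appear more than once.
int[] indices = SamplingHelper.SampleWithReplacement(items.Length, 8);

// Assumption: indices are zero-based positions into the population.
string[] resample = indices.Select(i => items[i]).ToArray();

Console.WriteLine(string.Join(", ", resample));
```

Working with indices rather than the items themselves lets the same draw be applied to several parallel arrays, such as features and labels.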

SampleWithoutReplacement(int, int)

Performs sampling without replacement, meaning once an item is selected, it cannot be selected again.

public static int[] SampleWithoutReplacement(int populationSize, int sampleSize)

Parameters

populationSize int

The size of the population to sample from.

sampleSize int

The number of indices to draw.

Returns

int[]

An array of indices representing the sampled items.

Remarks

For Beginners: Think of this like drawing lottery numbers where each ball can only be drawn once. For example, if you have 100 data points and need a random subset of 10, this method ensures you get 10 different data points.

Exceptions

ArgumentException

Thrown when sampleSize is greater than populationSize.
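
One common use is holding out a validation subset. The sketch below assumes the AiDotNet package is referenced and that the returned indices are zero-based; the rows array and the 7/3 split are hypothetical. It also shows the documented ArgumentException when more items are requested than the population contains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using AiDotNet.Helpers;

// Hypothetical dataset of 10 rows.
string[] rows = Enumerable.Range(0, 10).Select(i => $"row-{i}").ToArray();

// Pick 3 distinct row indices for validation (assumed zero-based).
int[] validationIdx = SamplingHelper.SampleWithoutReplacement(rows.Length, 3);
var held = new HashSet<int>(validationIdx);

// Rows at the drawn indices form the validation set; the rest are training data.
string[] validation = validationIdx.Select(i => rows[i]).ToArray();
string[] training = rows.Where((row, i) => !held.Contains(i)).ToArray();

Console.WriteLine($"Training rows: {training.Length}, validation rows: {validation.Length}");

// Requesting more items than the population holds is rejected.
try
{
    SamplingHelper.SampleWithoutReplacement(rows.Length, rows.Length + 1);
}
catch (ArgumentException ex)
{
    Console.WriteLine($"Rejected: {ex.Message}");
}
```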

SetSeed(int)

Sets the seed for the random number generator to ensure reproducible results.

public static void SetSeed(int seed)

Parameters

seed int

The seed value to initialize the random number generator.

Remarks

For Beginners: Random number generators in software are usually not truly random: they follow mathematical formulas that produce numbers that only appear random. The "seed" is the starting point for this formula.

Setting a specific seed means you'll get the same sequence of "random" numbers every time. This is crucial in AI/ML when you want your experiments to be reproducible - so you can get the same results when you run your code again, or when someone else runs your code.

For example, setting seed=42 before training a model ensures that random operations like data shuffling happen the same way each time.

Note: Setting a seed replaces the default thread-safe random generation for as long as the seed is set. Call ClearSeed() to restore thread-safe random generation.
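
A minimal sketch of a reproducibility check, assuming the AiDotNet package is referenced and that calling SetSeed again with the same value restarts the sequence, as the remarks above describe. The seed 42 and the sizes 100 and 10 are arbitrary.

```csharp
using System;
using System.Linq;
using AiDotNet.Helpers;

// First run with seed 42.
SamplingHelper.SetSeed(42);
int[] first = SamplingHelper.SampleWithReplacement(100, 10);

// Re-seed with the same value and repeat the same call.
SamplingHelper.SetSeed(42);
int[] second = SamplingHelper.SampleWithReplacement(100, 10);

// Return to the default thread-safe generator once the comparison is done.
SamplingHelper.ClearSeed();

// Under the assumption above, both runs should contain the same indices in the same order.
Console.WriteLine(first.SequenceEqual(second));
```

If the two arrays differ, some other code is likely drawing random numbers between the two seeded calls, for example on another thread.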