Class SamplingHelper
Provides methods for sampling data, which is essential for many AI and machine learning techniques.
public static class SamplingHelper
Inheritance
object → SamplingHelper
Remarks
For Beginners: Sampling is like picking random items from a collection. This is important in AI for creating training sets and validation sets, and for techniques like bootstrapping, which help you estimate how reliable a model's results are.
Methods
ClearSeed()
Clears the seed and restores thread-safe random number generation.
public static void ClearSeed()
Remarks
For Beginners: After calling SetSeed for reproducible experiments, you can call this method to go back to using the default thread-safe random generation.
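The relationship between SetSeed and ClearSeed can be pictured with a small sketch. This is not the library's implementation: SeedableRandomSketch and its members are hypothetical names, and the sketch assumes .NET 6 or later for Random.Shared. It only illustrates one plausible way a helper might fall back to a thread-safe generator when no seed is set.

```csharp
#nullable enable
using System;

// Hypothetical sketch, not SamplingHelper's actual code.
public static class SeedableRandomSketch
{
    // null means "no seed set": use the thread-safe shared generator.
    private static Random? _seeded;

    // Switch to a single seeded generator for reproducible sequences.
    // A single Random instance is not safe for concurrent use, which is
    // why seeding trades thread safety for reproducibility in this sketch.
    public static void SetSeed(int seed) => _seeded = new Random(seed);

    // Drop the seeded generator and go back to the default behavior.
    public static void ClearSeed() => _seeded = null;

    // Returns a value in [0, maxExclusive).
    public static int Next(int maxExclusive)
    {
        // Random.Shared (.NET 6+) can be called from any thread.
        Random rng = _seeded ?? Random.Shared;
        return rng.Next(maxExclusive);
    }
}
```

In this sketch, calling SetSeed with the same value twice restarts the same sequence, and ClearSeed returns to unseeded, thread-safe draws.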
CreateBootstrapSamples<T>(T[], int, int?)
Creates bootstrap samples from the given data. Bootstrap samples are random samples drawn with replacement and are used to estimate statistical properties.
public static List<T[]> CreateBootstrapSamples<T>(T[] data, int numberOfSamples, int? sampleSize = null)
Parameters
data (T[]): The original data array to sample from.
numberOfSamples (int): The number of bootstrap samples to create.
sampleSize (int?): The size of each bootstrap sample. If null, it will be the same as the original data size.
Returns
- List<T[]>
A list of bootstrap samples, where each sample is an array of data elements.
Type Parameters
T: The type of the data elements.
Remarks
For Beginners: Bootstrapping is a powerful technique where you create multiple "synthetic" datasets by randomly sampling from your original data (with replacement).
For example, if you have 100 data points:
- You might create 50 different bootstrap samples
- Each sample contains 100 randomly selected data points (some repeated, some missing)
- You can train 50 different models on these samples
- The variation in these models helps you understand how reliable your predictions are
This is especially useful when you have limited data but need to understand the uncertainty in your model's predictions.
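The bootstrap procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code behind CreateBootstrapSamples: BootstrapSketch and CreateSamples are hypothetical names, and the optional seed parameter is added here only to make the example repeatable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of bootstrap sampling, not the library's code.
public static class BootstrapSketch
{
    public static List<T[]> CreateSamples<T>(
        T[] data, int numberOfSamples, int? sampleSize = null, int? seed = null)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length == 0) throw new ArgumentException("Data must not be empty.", nameof(data));
        if (numberOfSamples < 0) throw new ArgumentOutOfRangeException(nameof(numberOfSamples));

        // Default to the original data size when no sample size is given.
        int size = sampleSize ?? data.Length;
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(sampleSize));

        // The seed is an assumption of this sketch, used for repeatability.
        Random rng = seed.HasValue ? new Random(seed.Value) : new Random();

        var samples = new List<T[]>(numberOfSamples);
        for (int i = 0; i < numberOfSamples; i++)
        {
            var sample = new T[size];
            for (int j = 0; j < size; j++)
            {
                // Each draw picks from the full data set, so an element
                // can appear several times or not at all.
                sample[j] = data[rng.Next(data.Length)];
            }
            samples.Add(sample);
        }
        return samples;
    }
}
```

With 100 data points, CreateSamples(data, 50) would return 50 arrays of 100 elements each, matching the example in the remarks.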
SampleWithReplacement(int, int)
Performs sampling with replacement, meaning the same item can be selected multiple times.
public static int[] SampleWithReplacement(int populationSize, int sampleSize)
Parameters
populationSize (int): The size of the population to sample from.
sampleSize (int): The number of samples to take.
Returns
- int[]
An array of indices representing the sampled items.
Remarks
For Beginners: This is like rolling a die multiple times - you can get the same number more than once. In data terms, if you have 100 data points and need 10 samples, some data points might be selected multiple times while others might not be selected at all.
This approach is useful for techniques like bootstrapping, where repeated sampling helps estimate the reliability of your model.
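As a rough illustration of the idea, sampling with replacement amounts to drawing each index independently from the full range. The sketch below is an assumption about how such a method could look, not the library's implementation; ReplacementSketch and its explicit Random parameter are hypothetical.

```csharp
using System;

// Hypothetical sketch of sampling with replacement.
public static class ReplacementSketch
{
    public static int[] Sample(int populationSize, int sampleSize, Random rng)
    {
        if (rng == null) throw new ArgumentNullException(nameof(rng));
        if (populationSize <= 0) throw new ArgumentOutOfRangeException(nameof(populationSize));
        if (sampleSize < 0) throw new ArgumentOutOfRangeException(nameof(sampleSize));

        var indices = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++)
        {
            // Every draw covers the whole range [0, populationSize),
            // so the same index may be returned more than once.
            indices[i] = rng.Next(populationSize);
        }
        return indices;
    }
}
```

Because draws are independent, sampleSize may exceed populationSize here, which is what lets a bootstrap sample be as large as the original data.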
SampleWithoutReplacement(int, int)
Performs sampling without replacement, meaning once an item is selected, it cannot be selected again.
public static int[] SampleWithoutReplacement(int populationSize, int sampleSize)
Parameters
populationSize (int): The size of the population to sample from.
sampleSize (int): The number of samples to take.
Returns
- int[]
An array of indices representing the sampled items.
Remarks
For Beginners: Think of this like drawing lottery numbers where each ball can only be drawn once. For example, if you have 100 data points and need a random subset of 10, this method ensures you get 10 different data points.
Exceptions
- ArgumentException
Thrown when sample size is greater than population size.
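One common way to sample without replacement is a partial Fisher-Yates shuffle, sketched below. Whether SampleWithoutReplacement uses this approach is not stated in this page, so treat it as an illustrative assumption; NoReplacementSketch and its explicit Random parameter are hypothetical.

```csharp
using System;

// Hypothetical sketch using a partial Fisher-Yates shuffle.
public static class NoReplacementSketch
{
    public static int[] Sample(int populationSize, int sampleSize, Random rng)
    {
        if (rng == null) throw new ArgumentNullException(nameof(rng));
        if (populationSize < 0) throw new ArgumentOutOfRangeException(nameof(populationSize));
        if (sampleSize < 0) throw new ArgumentOutOfRangeException(nameof(sampleSize));
        if (sampleSize > populationSize)
            throw new ArgumentException("Sample size cannot exceed population size.", nameof(sampleSize));

        // Start with every index 0..populationSize-1 exactly once.
        var indices = new int[populationSize];
        for (int i = 0; i < populationSize; i++) indices[i] = i;

        // Shuffle only the first sampleSize positions: position i receives
        // a random element from the not-yet-chosen tail [i, populationSize).
        for (int i = 0; i < sampleSize; i++)
        {
            int j = rng.Next(i, populationSize);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        var result = new int[sampleSize];
        Array.Copy(indices, result, sampleSize);
        return result;
    }
}
```

Each index is moved into the result at most once, so the returned indices are always distinct, and asking for more items than the population holds raises ArgumentException as documented above.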
SetSeed(int)
Sets the seed for the random number generator to ensure reproducible results.
public static void SetSeed(int seed)
Parameters
seed (int): The seed value to initialize the random number generator.
Remarks
For Beginners: The random number generators used in software are usually pseudo-random - they follow mathematical formulas that produce numbers that only appear random. The "seed" is the starting point for this formula.
Setting a specific seed means you'll get the same sequence of "random" numbers every time. This is crucial in AI/ML when you want your experiments to be reproducible - so you can get the same results when you run your code again, or when someone else runs your code.
For example, setting seed=42 before training a model ensures that random operations like data shuffling happen the same way each time.
Note: Setting a seed overrides the thread-safe behavior. Call ClearSeed() to restore thread-safe random generation.
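The effect of a fixed seed on data shuffling can be shown with plain System.Random. The sketch below does not call SamplingHelper; SeededShuffleSketch is a hypothetical helper that shuffles a copy of an array with a generator created from the given seed.

```csharp
using System;

// Hypothetical sketch: a seeded Fisher-Yates shuffle.
public static class SeededShuffleSketch
{
    public static int[] Shuffled(int[] items, int seed)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));

        // Work on a copy so the caller's array is left unchanged.
        var copy = (int[])items.Clone();
        var rng = new Random(seed);

        // Walk from the end, swapping each position with a random
        // position at or before it.
        for (int i = copy.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (copy[i], copy[j]) = (copy[j], copy[i]);
        }
        return copy;
    }
}
```

Calling Shuffled(data, 42) twice in the same program returns the same ordering both times, which is the reproducibility that a fixed seed provides.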