Class DataPipelineExtensions
Extension methods for creating data pipelines from various sources.
public static class DataPipelineExtensions
Inheritance
- object
- DataPipelineExtensions
Methods
PaddedBatch<T>(DataPipeline<T>, int, T)
Creates batches with padding to ensure uniform batch sizes.
public static DataPipeline<T[]> PaddedBatch<T>(this DataPipeline<T> source, int batchSize, T padValue)
Parameters
source (DataPipeline<T>): The source pipeline.
batchSize (int): Number of elements per batch.
padValue (T): Value to use for padding.
Returns
- DataPipeline<T[]>
A new DataPipeline with padded batches.
Type Parameters
T: The element type.
Remarks
For Beginners: PaddedBatch ensures all batches have the same size. When the number of elements is not a multiple of batchSize, the last batch is filled up with padValue. This is useful when your model requires fixed batch sizes.
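The padding behaviour described above can be illustrated with a minimal standalone sketch. It runs over a plain IEnumerable<T> instead of a DataPipeline<T>, and PaddedBatchSketch is a hypothetical helper written for this illustration, not the library's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for PaddedBatch, written against IEnumerable<T>.
// The real method operates on DataPipeline<T> and may differ in detail.
static IEnumerable<T[]> PaddedBatchSketch<T>(IEnumerable<T> source, int batchSize, T padValue)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

    var batch = new T[batchSize];
    var count = 0;
    foreach (var item in source)
    {
        batch[count++] = item;
        if (count == batchSize)
        {
            yield return batch;
            batch = new T[batchSize]; // fresh array so earlier batches are not overwritten
            count = 0;
        }
    }

    if (count > 0)
    {
        // Fill the unused tail of the final batch with the pad value.
        for (var i = count; i < batchSize; i++) batch[i] = padValue;
        yield return batch;
    }
}

// Seven elements with batchSize = 3: the last batch has one real element and two pads.
foreach (var b in PaddedBatchSketch(new[] { 1, 2, 3, 4, 5, 6, 7 }, 3, 0))
    Console.WriteLine(string.Join(", ", b));
// prints:
// 1, 2, 3
// 4, 5, 6
// 7, 0, 0
```

Choose a padValue that the model can ignore or mask, since padded entries are indistinguishable from real data once they are inside the batch.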
Sample<T>(DataPipeline<T>, IReadOnlyList<double>, int, int?)
Samples elements with replacement using the given weights.
public static DataPipeline<T> Sample<T>(this DataPipeline<T> source, IReadOnlyList<double> weights, int numSamples, int? seed = null)
Parameters
source (DataPipeline<T>): The source pipeline.
weights (IReadOnlyList<double>): Weight for each element.
numSamples (int): Number of samples to draw.
seed (int?): Optional random seed.
Returns
- DataPipeline<T>
A new DataPipeline with sampled elements.
Type Parameters
T: The element type.
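As an illustration of weighted sampling with replacement, the sketch below draws from an in-memory list using cumulative weights. WeightedSampleSketch is a hypothetical helper and its algorithm (cumulative sums plus a linear scan) is an assumption; the documentation does not say how Sample is implemented or how it validates weights.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Sample: draws numSamples items with replacement,
// each draw picking item i with probability weights[i] / sum(weights).
static IEnumerable<T> WeightedSampleSketch<T>(
    IReadOnlyList<T> items, IReadOnlyList<double> weights, int numSamples, int? seed = null)
{
    if (items.Count != weights.Count)
        throw new ArgumentException("One weight is required per item.");

    // Running totals: item i owns the interval [cumulative[i-1], cumulative[i]).
    var cumulative = new double[weights.Count];
    var total = 0.0;
    for (var i = 0; i < weights.Count; i++)
    {
        if (weights[i] < 0) throw new ArgumentException("Weights must be non-negative.");
        total += weights[i];
        cumulative[i] = total;
    }
    if (total <= 0) throw new ArgumentException("At least one weight must be positive.");

    var rng = seed.HasValue ? new Random(seed.Value) : new Random();
    for (var n = 0; n < numSamples; n++)
    {
        var target = rng.NextDouble() * total;
        var index = 0;
        // Linear scan keeps the sketch short; a binary search would scale better.
        while (index < cumulative.Length - 1 && target >= cumulative[index]) index++;
        yield return items[index];
    }
}

var items = new[] { "rare", "common" };
var weights = new[] { 1.0, 9.0 };
var first = string.Join(", ", WeightedSampleSketch(items, weights, 10, seed: 42));
var second = string.Join(", ", WeightedSampleSketch(items, weights, 10, seed: 42));
Console.WriteLine(first);
Console.WriteLine(first == second); // prints True: the same seed gives the same draws
```

Because sampling is with replacement, the same element can appear more than once and numSamples may exceed the number of elements.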
ToAsyncPipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)
Creates an async DataPipeline from a streaming data loader.
public static AsyncDataPipeline<(TInput[] Inputs, TOutput[] Outputs)> ToAsyncPipeline<T, TInput, TOutput>(this IStreamingDataLoader<T, TInput, TOutput> loader, bool shuffle = true, int? seed = null)
Parameters
loader (IStreamingDataLoader<T, TInput, TOutput>): The streaming data loader.
shuffle (bool): Whether to shuffle the data.
seed (int?): Optional random seed for reproducibility.
Returns
- AsyncDataPipeline<(TInput[] Inputs, TOutput[] Outputs)>
A new async DataPipeline of input/output tuples.
Type Parameters
T: The numeric type.
TInput: The input type.
TOutput: The output type.
Remarks
For Beginners: This creates an async pipeline from a streaming data loader, enabling efficient prefetching and parallel processing.
Example:
await foreach (var batch in streamingLoader.ToAsyncPipeline().Prefetch(2))
{
    await model.TrainOnBatchAsync(batch);
}
ToPipeline<TBatch>(IBatchIterable<TBatch>)
Creates a DataPipeline from a batch iterable.
public static DataPipeline<TBatch> ToPipeline<TBatch>(this IBatchIterable<TBatch> source)
Parameters
source (IBatchIterable<TBatch>): The batch iterable source.
Returns
- DataPipeline<TBatch>
A new DataPipeline.
Type Parameters
TBatch: The batch type.
ToPipeline<T>(IEnumerable<T>)
Creates a DataPipeline from an enumerable.
public static DataPipeline<T> ToPipeline<T>(this IEnumerable<T> source)
Parameters
source (IEnumerable<T>): The enumerable source.
Returns
- DataPipeline<T>
A new DataPipeline.
Type Parameters
T: The element type.
ToPipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)
Creates a DataPipeline from a streaming data loader.
public static DataPipeline<(TInput[] Inputs, TOutput[] Outputs)> ToPipeline<T, TInput, TOutput>(this IStreamingDataLoader<T, TInput, TOutput> loader, bool shuffle = true, int? seed = null)
Parameters
loader (IStreamingDataLoader<T, TInput, TOutput>): The streaming data loader.
shuffle (bool): Whether to shuffle the data.
seed (int?): Optional random seed for reproducibility.
Returns
- DataPipeline<(TInput[] Inputs, TOutput[] Outputs)>
A new DataPipeline of input/output tuples.
Type Parameters
T: The numeric type.
TInput: The input type.
TOutput: The output type.
Remarks
For Beginners: This creates a pipeline from a streaming data loader, so you can chain operations such as Map, Filter, and Batch.
Example:
var pipeline = streamingLoader.ToPipeline()
    .Map(batch => NormalizeBatch(batch))
    .Shuffle(1000)
    .Prefetch(2);
ToSamplePipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)
Creates a DataPipeline of individual samples from a streaming data loader.
public static DataPipeline<(TInput Input, TOutput Output)> ToSamplePipeline<T, TInput, TOutput>(this IStreamingDataLoader<T, TInput, TOutput> loader, bool shuffle = true, int? seed = null)
Parameters
loader (IStreamingDataLoader<T, TInput, TOutput>): The streaming data loader.
shuffle (bool): Whether to shuffle the data.
seed (int?): Optional random seed for reproducibility.
Returns
- DataPipeline<(TInput Input, TOutput Output)>
A new DataPipeline of individual input/output samples.
Type Parameters
T: The numeric type.
TInput: The input type.
TOutput: The output type.
Remarks
For Beginners: Unlike ToPipeline, which yields batches, this method yields individual samples. It is useful when you want to apply per-sample operations before re-batching.
Example:
var pipeline = streamingLoader.ToSamplePipeline()
    .Map(sample => AugmentSample(sample))
    .Shuffle(5000)
    .Batch(64);
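To make the difference between batch tuples and sample tuples concrete, the following sketch flattens (Inputs, Outputs) batches into individual (Input, Output) pairs. UnbatchSketch is a hypothetical helper over IEnumerable; how ToSamplePipeline produces its samples internally is not documented here, so treat this only as a model of the output shape.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration: turns batch tuples into per-sample tuples,
// pairing inputs[i] with outputs[i] within each batch.
static IEnumerable<(TInput Input, TOutput Output)> UnbatchSketch<TInput, TOutput>(
    IEnumerable<(TInput[] Inputs, TOutput[] Outputs)> batches)
{
    foreach (var (inputs, outputs) in batches)
    {
        // Assumption: both arrays have the same length; any extra entries are ignored.
        var count = Math.Min(inputs.Length, outputs.Length);
        for (var i = 0; i < count; i++)
            yield return (inputs[i], outputs[i]);
    }
}

var batches = new[]
{
    (Inputs: new[] { 1, 2 }, Outputs: new[] { "a", "b" }),
    (Inputs: new[] { 3 }, Outputs: new[] { "c" }),
};

foreach (var sample in UnbatchSketch(batches))
    Console.WriteLine($"{sample.Input} -> {sample.Output}");
// prints:
// 1 -> a
// 2 -> b
// 3 -> c
```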
Window<T>(DataPipeline<T>, int, int?)
Groups consecutive elements of the pipeline into fixed-size windows.
public static DataPipeline<T[]> Window<T>(this DataPipeline<T> source, int windowSize, int? shift = null)
Parameters
source (DataPipeline<T>): The source pipeline.
windowSize (int): Size of each window.
shift (int?): Number of elements to shift between windows. Default is windowSize (non-overlapping).
Returns
- DataPipeline<T[]>
A new DataPipeline with windowed elements.
Type Parameters
T: The element type.
Remarks
For Beginners: Window groups consecutive elements together. With windowSize=5 and shift=1, each window starts one element after the previous one, so you get overlapping windows that are useful for sequence models.
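The interaction between windowSize and shift can be sketched over a plain IEnumerable<T>. WindowSketch is a hypothetical helper, and it assumes that a trailing window with fewer than windowSize elements is dropped; the documentation does not state how the library treats incomplete windows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Window. Assumption: incomplete trailing windows are dropped.
static IEnumerable<T[]> WindowSketch<T>(IEnumerable<T> source, int windowSize, int? shift = null)
{
    if (windowSize <= 0) throw new ArgumentOutOfRangeException(nameof(windowSize));
    var step = shift ?? windowSize; // default: non-overlapping windows
    if (step <= 0) throw new ArgumentOutOfRangeException(nameof(shift));

    var buffer = new List<T>(windowSize);
    var skip = 0; // elements to discard when step > windowSize
    foreach (var item in source)
    {
        if (skip > 0) { skip--; continue; }

        buffer.Add(item);
        if (buffer.Count < windowSize) continue;

        yield return buffer.ToArray();
        if (step >= windowSize)
        {
            skip = step - windowSize;
            buffer.Clear();
        }
        else
        {
            // Keep the overlap: drop only the first `step` elements.
            buffer.RemoveRange(0, step);
        }
    }
}

// windowSize = 5, shift = 1 over the numbers 1..7 gives three overlapping windows.
foreach (var w in WindowSketch(Enumerable.Range(1, 7), 5, 1))
    Console.WriteLine(string.Join(", ", w));
// prints:
// 1, 2, 3, 4, 5
// 2, 3, 4, 5, 6
// 3, 4, 5, 6, 7
```

Omitting shift in the same call would yield the single window 1, 2, 3, 4, 5 under this sketch's drop-incomplete assumption, because the remaining two elements cannot fill a second window.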