
Class DataPipelineExtensions

Namespace
AiDotNet.Data.Pipeline
Assembly
AiDotNet.dll

Extension methods for creating data pipelines from various sources.

public static class DataPipelineExtensions
Inheritance
DataPipelineExtensions

Methods

PaddedBatch<T>(DataPipeline<T>, int, T)

Groups elements into batches of batchSize elements, padding the final batch with padValue so that every batch has the same size.

public static DataPipeline<T[]> PaddedBatch<T>(this DataPipeline<T> source, int batchSize, T padValue)

Parameters

source DataPipeline<T>

The source pipeline.

batchSize int

Number of elements per batch.

padValue T

Value to use for padding.

Returns

DataPipeline<T[]>

A new DataPipeline with padded batches.

Type Parameters

T

The element type.

Remarks

For Beginners: PaddedBatch ensures all batches have the same size by adding padding values to the last batch. This is useful when your model requires fixed batch sizes.
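The padding behavior described above can be illustrated on a plain IEnumerable<T>. This is only a minimal sketch of the semantics, not AiDotNet's implementation: PaddedBatchSketch is a hypothetical helper introduced here, and it assumes that only the final, incomplete batch is padded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: five elements in batches of two; the last batch is padded with 0.
foreach (var batch in PaddedBatchSketch.PaddedBatch(new[] { 1, 2, 3, 4, 5 }, 2, 0))
{
    Console.WriteLine(string.Join(", ", batch));
}
// Prints:
// 1, 2
// 3, 4
// 5, 0

// Hypothetical helper for illustration; not part of AiDotNet.
public static class PaddedBatchSketch
{
    public static IEnumerable<T[]> PaddedBatch<T>(IEnumerable<T> source, int batchSize, T padValue)
    {
        // Because this is an iterator, the check runs on first enumeration.
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var buffer = new List<T>(batchSize);
        foreach (var item in source)
        {
            buffer.Add(item);
            if (buffer.Count == batchSize)
            {
                yield return buffer.ToArray();
                buffer.Clear();
            }
        }

        // Assumption: only a trailing partial batch is padded up to batchSize.
        if (buffer.Count > 0)
        {
            while (buffer.Count < batchSize) buffer.Add(padValue);
            yield return buffer.ToArray();
        }
    }
}
```

When the element count is an exact multiple of batchSize, this sketch emits no padding at all.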

Sample<T>(DataPipeline<T>, IReadOnlyList<double>, int, int?)

Samples elements with replacement using the given weights.

public static DataPipeline<T> Sample<T>(this DataPipeline<T> source, IReadOnlyList<double> weights, int numSamples, int? seed = null)

Parameters

source DataPipeline<T>

The source pipeline.

weights IReadOnlyList<double>

Sampling weight for each element. Elements with larger weights are drawn more often.

numSamples int

Number of samples to draw.

seed int?

Optional random seed.

Returns

DataPipeline<T>

A new DataPipeline with sampled elements.

Type Parameters

T

The element type.
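Weighted sampling with replacement, as Sample describes it, can be sketched with a cumulative-weight lookup. This is an illustration only, not the library's code: WeightedSampleSketch is a hypothetical helper, and it assumes that weights are non-negative, line up by index with the elements, and do not need to sum to 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IReadOnlyList<string> items = new[] { "a", "b", "c" };
IReadOnlyList<double> weights = new[] { 0.0, 1.0, 0.0 }; // only "b" has non-zero weight

var drawn = WeightedSampleSketch.Sample(items, weights, 4, seed: 42).ToList();
Console.WriteLine(string.Join(", ", drawn)); // b, b, b, b

// Hypothetical helper for illustration; not part of AiDotNet.
public static class WeightedSampleSketch
{
    public static IEnumerable<T> Sample<T>(
        IReadOnlyList<T> items, IReadOnlyList<double> weights, int numSamples, int? seed = null)
    {
        if (items.Count != weights.Count)
            throw new ArgumentException("items and weights must have the same length.");

        double total = weights.Sum();
        if (total <= 0) throw new ArgumentException("At least one weight must be positive.");

        // A fixed seed makes the sequence of draws reproducible.
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();

        for (int n = 0; n < numSamples; n++)
        {
            // Pick a point in [0, total) and find the element whose
            // cumulative-weight interval contains it.
            double target = rng.NextDouble() * total;
            double cumulative = 0;
            int chosen = items.Count - 1;
            for (int i = 0; i < weights.Count; i++)
            {
                cumulative += weights[i];
                if (target < cumulative) { chosen = i; break; }
            }
            yield return items[chosen]; // the same element may be drawn again
        }
    }
}
```

Because each draw is independent, numSamples may exceed the number of elements.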

ToAsyncPipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)

Creates an AsyncDataPipeline from a streaming data loader.

public static AsyncDataPipeline<(TInput[] Inputs, TOutput[] Outputs)> ToAsyncPipeline<T, TInput, TOutput>(this IStreamingDataLoader<T, TInput, TOutput> loader, bool shuffle = true, int? seed = null)

Parameters

loader IStreamingDataLoader<T, TInput, TOutput>

The streaming data loader.

shuffle bool

Whether to shuffle the data.

seed int?

Optional random seed for reproducibility.

Returns

AsyncDataPipeline<(TInput[] Inputs, TOutput[] Outputs)>

A new AsyncDataPipeline of batched input/output tuples.

Type Parameters

T

The numeric type.

TInput

The input type.

TOutput

The output type.

Remarks

For Beginners: This creates an async pipeline from a streaming data loader, enabling efficient prefetching and parallel processing.

Example:

await foreach (var batch in streamingLoader.ToAsyncPipeline().Prefetch(2))
{
    await model.TrainOnBatchAsync(batch);
}

ToPipeline<TBatch>(IBatchIterable<TBatch>)

Creates a DataPipeline from a batch iterable.

public static DataPipeline<TBatch> ToPipeline<TBatch>(this IBatchIterable<TBatch> source)

Parameters

source IBatchIterable<TBatch>

The batch iterable source.

Returns

DataPipeline<TBatch>

A new DataPipeline.

Type Parameters

TBatch

The batch type.

ToPipeline<T>(IEnumerable<T>)

Creates a DataPipeline from an enumerable.

public static DataPipeline<T> ToPipeline<T>(this IEnumerable<T> source)

Parameters

source IEnumerable<T>

The enumerable source.

Returns

DataPipeline<T>

A new DataPipeline.

Type Parameters

T

The element type.

ToPipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)

Creates a DataPipeline from a streaming data loader.

public static DataPipeline<(TInput[] Inputs, TOutput[] Outputs)> ToPipeline<T, TInput, TOutput>(this IStreamingDataLoader<T, TInput, TOutput> loader, bool shuffle = true, int? seed = null)

Parameters

loader IStreamingDataLoader<T, TInput, TOutput>

The streaming data loader.

shuffle bool

Whether to shuffle the data.

seed int?

Optional random seed for reproducibility.

Returns

DataPipeline<(TInput[] Inputs, TOutput[] Outputs)>

A new DataPipeline of input/output tuples.

Type Parameters

T

The numeric type.

TInput

The input type.

TOutput

The output type.

Remarks

For Beginners: This creates a pipeline from a streaming data loader, allowing you to chain operations such as Map, Filter, and Batch.

Example:

var pipeline = streamingLoader.ToPipeline()
    .Map(batch => NormalizeBatch(batch))
    .Shuffle(1000)
    .Prefetch(2);

ToSamplePipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)

Creates a DataPipeline of individual samples from a streaming data loader.

public static DataPipeline<(TInput Input, TOutput Output)> ToSamplePipeline<T, TInput, TOutput>(this IStreamingDataLoader<T, TInput, TOutput> loader, bool shuffle = true, int? seed = null)

Parameters

loader IStreamingDataLoader<T, TInput, TOutput>

The streaming data loader.

shuffle bool

Whether to shuffle the data.

seed int?

Optional random seed for reproducibility.

Returns

DataPipeline<(TInput Input, TOutput Output)>

A new DataPipeline of individual input/output samples.

Type Parameters

T

The numeric type.

TInput

The input type.

TOutput

The output type.

Remarks

For Beginners: Unlike ToPipeline, which yields batches, this method yields individual samples. This is useful when you want to apply per-sample operations before re-batching.

Example:

var pipeline = streamingLoader.ToSamplePipeline()
    .Map(sample => AugmentSample(sample))
    .Shuffle(5000)
    .Batch(64);
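The difference between a batch pipeline and a sample pipeline can be sketched as an "unbatch" step over plain tuples. This is a hedged illustration, not how AiDotNet implements ToSamplePipeline: UnbatchSketch is a hypothetical helper, and it assumes each batch pairs Inputs[i] with Outputs[i].

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two batches in the same tuple shape that ToPipeline is documented to yield.
var batches = new[]
{
    (Inputs: new[] { 1, 2 }, Outputs: new[] { "a", "b" }),
    (Inputs: new[] { 3 }, Outputs: new[] { "c" }),
};

foreach (var (input, output) in UnbatchSketch.Unbatch(batches))
{
    Console.WriteLine($"{input} -> {output}");
}
// Prints:
// 1 -> a
// 2 -> b
// 3 -> c

// Hypothetical helper for illustration; not part of AiDotNet.
public static class UnbatchSketch
{
    public static IEnumerable<(TInput Input, TOutput Output)> Unbatch<TInput, TOutput>(
        IEnumerable<(TInput[] Inputs, TOutput[] Outputs)> batches)
    {
        foreach (var (inputs, outputs) in batches)
        {
            if (inputs.Length != outputs.Length)
                throw new ArgumentException("Inputs and Outputs must have the same length.");

            // Emit one (input, output) pair per position in the batch.
            for (int i = 0; i < inputs.Length; i++)
            {
                yield return (inputs[i], outputs[i]);
            }
        }
    }
}
```

Once the data is flattened this way, per-sample steps such as augmentation can run before the samples are grouped into new batches.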

Window<T>(DataPipeline<T>, int, int?)

Groups consecutive elements into windows of windowSize elements, advancing by shift elements between windows.

public static DataPipeline<T[]> Window<T>(this DataPipeline<T> source, int windowSize, int? shift = null)

Parameters

source DataPipeline<T>

The source pipeline.

windowSize int

Size of each window.

shift int?

Number of elements to shift between windows. Default is windowSize (non-overlapping).

Returns

DataPipeline<T[]>

A new DataPipeline with windowed elements.

Type Parameters

T

The element type.

Remarks

For Beginners: Window groups consecutive elements together. With windowSize=5 and shift=1, each window shares four elements with the previous one, which gives overlapping windows useful for sequence models.
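The interaction of windowSize and shift can be sketched on a plain IEnumerable<T>. This is a minimal illustration rather than the library's implementation: WindowSketch is a hypothetical helper, and it assumes that an incomplete trailing window is dropped, which the documentation above does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Overlapping windows: size 3, advancing one element at a time.
foreach (var window in WindowSketch.Window(Enumerable.Range(1, 6), 3, shift: 1))
{
    Console.WriteLine(string.Join(", ", window));
}
// Prints:
// 1, 2, 3
// 2, 3, 4
// 3, 4, 5
// 4, 5, 6

// Hypothetical helper for illustration; not part of AiDotNet.
public static class WindowSketch
{
    public static IEnumerable<T[]> Window<T>(IEnumerable<T> source, int windowSize, int? shift = null)
    {
        int step = shift ?? windowSize; // default: non-overlapping windows
        if (windowSize <= 0) throw new ArgumentOutOfRangeException(nameof(windowSize));
        if (step <= 0) throw new ArgumentOutOfRangeException(nameof(shift));

        var buffer = new List<T>(windowSize);
        int toSkip = 0; // elements to discard when step > windowSize

        foreach (var item in source)
        {
            if (toSkip > 0) { toSkip--; continue; }

            buffer.Add(item);
            if (buffer.Count < windowSize) continue;

            yield return buffer.ToArray();

            if (step >= windowSize)
            {
                // No overlap: start a fresh window, skipping any gap.
                toSkip = step - windowSize;
                buffer.Clear();
            }
            else
            {
                // Overlap: keep the last (windowSize - step) elements.
                buffer.RemoveRange(0, step);
            }
        }
        // Assumption: a trailing partial window is not emitted.
    }
}
```

Omitting shift in this sketch yields the non-overlapping windows 1, 2, 3 and 4, 5, 6 for the same input.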