Class DataPipelineExtensions

Namespace: AiDotNet.Data.Pipeline

Assembly: AiDotNet.dll

Extension methods for creating data pipelines from various sources.

public static class DataPipelineExtensions

Inheritance: object → DataPipelineExtensions

Inherited Members:

object.Equals(object)
object.Equals(object, object)
object.GetHashCode()
object.GetType()
object.MemberwiseClone()
object.ReferenceEquals(object, object)
object.ToString()

Methods

PaddedBatch<T>(DataPipeline<T>, int, T)

Creates batches with padding to ensure uniform batch sizes.

public static DataPipeline<T[]> PaddedBatch<T>(this DataPipeline<T> source, int batchSize, T padValue)

Parameters

source DataPipeline<T>: The source pipeline.
batchSize int: Number of elements per batch.
padValue T: Value to use for padding.

Returns

DataPipeline<T[]>: A new DataPipeline with padded batches.

Type Parameters

T: The element type.

Remarks

For Beginners: PaddedBatch ensures every batch has the same size by filling the last batch with padValue when the source runs out of elements. This is useful when your model requires fixed batch sizes.
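The padding behavior described above can be sketched without the library. The example below is a minimal illustration written against plain IEnumerable<T>, not DataPipeline<T>; the helper name PadBatches is hypothetical, and the real PaddedBatch may differ in details such as how it handles an empty source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for PaddedBatch (not the AiDotNet implementation):
// groups items into arrays of exactly batchSize elements and fills the
// final short batch with padValue.
static IEnumerable<T[]> PadBatches<T>(IEnumerable<T> source, int batchSize, T padValue)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

    var buffer = new List<T>(batchSize);
    foreach (var item in source)
    {
        buffer.Add(item);
        if (buffer.Count == batchSize)
        {
            yield return buffer.ToArray();
            buffer.Clear();
        }
    }

    // Leftover elements form the last batch; pad it up to batchSize.
    if (buffer.Count > 0)
    {
        while (buffer.Count < batchSize) buffer.Add(padValue);
        yield return buffer.ToArray();
    }
}

var batches = PadBatches(new[] { 1, 2, 3, 4, 5 }, batchSize: 2, padValue: 0).ToList();
foreach (var b in batches) Console.WriteLine(string.Join(", ", b));
// prints "1, 2", then "3, 4", then "5, 0"
```

With the library itself, the equivalent call would presumably look like pipeline.PaddedBatch(2, 0), following the signature documented above.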

Sample<T>(DataPipeline<T>, IReadOnlyList<double>, int, int?)

Samples elements with replacement using the given weights.

public static DataPipeline<T> Sample<T>(this DataPipeline<T> source, IReadOnlyList<double> weights, int numSamples, int? seed = null)

Parameters

source DataPipeline<T>: The source pipeline.
weights IReadOnlyList<double>: Weight for each element.
numSamples int: Number of samples to draw.
seed int?: Optional random seed.

Returns

DataPipeline<T>: A new DataPipeline with sampled elements.

Type Parameters

T: The element type.
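Weighted sampling with replacement can be illustrated with a cumulative-weight lookup. This is a sketch under assumptions, not the library's implementation: the helper name WeightedSample is hypothetical, it works on an in-memory list instead of a DataPipeline<T>, and it assumes weights are non-negative with one weight per element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Sample (not the AiDotNet implementation):
// draws numSamples items with replacement, each with probability
// proportional to its weight.
static List<T> WeightedSample<T>(IReadOnlyList<T> items, IReadOnlyList<double> weights,
                                 int numSamples, int? seed = null)
{
    if (items.Count != weights.Count)
        throw new ArgumentException("One weight per item is required.");

    var rng = seed.HasValue ? new Random(seed.Value) : new Random();

    // Running totals: cumulative[i] is the sum of weights[0..i].
    var cumulative = new double[weights.Count];
    double total = 0;
    for (int i = 0; i < weights.Count; i++)
    {
        total += weights[i];
        cumulative[i] = total;
    }
    if (total <= 0) throw new ArgumentException("Weights must sum to a positive value.");

    var result = new List<T>(numSamples);
    for (int n = 0; n < numSamples; n++)
    {
        double r = rng.NextDouble() * total;
        int index = weights.Count - 1; // fallback guards against rounding at the upper edge
        for (int i = 0; i < cumulative.Length; i++)
        {
            if (r < cumulative[i]) { index = i; break; }
        }
        result.Add(items[index]);
    }
    return result;
}

var labels = new[] { "a", "b", "c" };
var labelWeights = new[] { 3.0, 0.0, 1.0 }; // "b" has zero weight and is never drawn
var draws = WeightedSample(labels, labelWeights, numSamples: 8, seed: 42);
var again = WeightedSample(labels, labelWeights, numSamples: 8, seed: 42);
Console.WriteLine(string.Join(", ", draws));
```

Because the same element can be drawn more than once, numSamples may exceed the number of source elements. Passing the same seed reproduces the same draws.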

ToAsyncPipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)

Creates an async DataPipeline from a streaming data loader.

public static AsyncDataPipeline<(TInput[] Inputs, TOutput[] Outputs)> ToAsyncPipeline<T, TInput, TOutput>(this IStreamingDataLoader<T, TInput, TOutput> loader, bool shuffle = true, int? seed = null)

Parameters

loader IStreamingDataLoader<T, TInput, TOutput>: The streaming data loader.
shuffle bool: Whether to shuffle the data.
seed int?: Optional random seed for reproducibility.

Returns

AsyncDataPipeline<(TInput[] Inputs, TOutput[] Outputs)>: A new async DataPipeline of input/output tuples.

Type Parameters

T: The numeric type.
TInput: The input type.
TOutput: The output type.

Remarks

For Beginners: This creates an async pipeline from a streaming data loader, enabling efficient prefetching and parallel processing.

Example:

await foreach (var batch in streamingLoader.ToAsyncPipeline().Prefetch(2))
{
    await model.TrainOnBatchAsync(batch);
}

ToPipeline<TBatch>(IBatchIterable<TBatch>)

Creates a DataPipeline from a batch iterable.

public static DataPipeline<TBatch> ToPipeline<TBatch>(this IBatchIterable<TBatch> source)

Parameters

source IBatchIterable<TBatch>: The batch iterable source.

Returns

DataPipeline<TBatch>: A new DataPipeline.

Type Parameters

TBatch: The batch type.

ToPipeline<T>(IEnumerable<T>)

Creates a DataPipeline from an enumerable.

public static DataPipeline<T> ToPipeline<T>(this IEnumerable<T> source)

Parameters

source IEnumerable<T>: The enumerable source.

Returns

DataPipeline<T>: A new DataPipeline.

Type Parameters

T: The element type.

ToPipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)

Creates a DataPipeline from a streaming data loader.

public static DataPipeline<(TInput[] Inputs, TOutput[] Outputs)> ToPipeline<T, TInput, TOutput>(this IStreamingDataLoader<T, TInput, TOutput> loader, bool shuffle = true, int? seed = null)

Parameters

loader IStreamingDataLoader<T, TInput, TOutput>: The streaming data loader.
shuffle bool: Whether to shuffle the data.
seed int?: Optional random seed for reproducibility.

Returns

DataPipeline<(TInput[] Inputs, TOutput[] Outputs)>: A new DataPipeline of input/output tuples.

Type Parameters

T: The numeric type.
TInput: The input type.
TOutput: The output type.

Remarks

For Beginners: This creates a pipeline from a streaming data loader, allowing you to chain operations like Map, Filter, Batch, etc.

Example:

var pipeline = streamingLoader.ToPipeline()
    .Map(batch => NormalizeBatch(batch))
    .Shuffle(1000)
    .Prefetch(2);

ToSamplePipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)

Creates a DataPipeline of individual samples from a streaming data loader.

public static DataPipeline<(TInput Input, TOutput Output)> ToSamplePipeline<T, TInput, TOutput>(this IStreamingDataLoader<T, TInput, TOutput> loader, bool shuffle = true, int? seed = null)

Parameters

loader IStreamingDataLoader<T, TInput, TOutput>: The streaming data loader.
shuffle bool: Whether to shuffle the data.
seed int?: Optional random seed for reproducibility.

Returns

DataPipeline<(TInput Input, TOutput Output)>: A new DataPipeline of individual input/output samples.

Type Parameters

T: The numeric type.
TInput: The input type.
TOutput: The output type.

Remarks

For Beginners: Unlike ToPipeline, which yields batches, this yields individual samples. This is useful when you want to apply per-sample operations before re-batching.

Example:

var pipeline = streamingLoader.ToSamplePipeline()
    .Map(sample => AugmentSample(sample))
    .Shuffle(5000)
    .Batch(64);

Window<T>(DataPipeline<T>, int, int?)

Groups consecutive elements of the pipeline into fixed-size windows, optionally overlapping.

public static DataPipeline<T[]> Window<T>(this DataPipeline<T> source, int windowSize, int? shift = null)

Parameters

source DataPipeline<T>: The source pipeline.
windowSize int: Size of each window.
shift int?: Number of elements to shift between windows. Default is windowSize (non-overlapping).

Returns

DataPipeline<T[]>: A new DataPipeline with windowed elements.

Type Parameters

T: The element type.

Remarks

For Beginners: Window groups consecutive elements together. With windowSize=5 and shift=1, you get overlapping windows useful for sequence models.
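The interaction of windowSize and shift can be seen in a small standalone sketch. The helper name SlidingWindows is hypothetical and operates on an in-memory list, not a DataPipeline<T>. It also assumes that a trailing group with fewer than windowSize elements is dropped; the documentation above does not say how Window treats an incomplete final window.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Window (not the AiDotNet implementation).
// Assumption: incomplete trailing windows are dropped.
static IEnumerable<T[]> SlidingWindows<T>(IReadOnlyList<T> items, int windowSize, int? shift = null)
{
    int step = shift ?? windowSize; // default: non-overlapping windows
    if (windowSize <= 0 || step <= 0)
        throw new ArgumentOutOfRangeException(nameof(windowSize));

    for (int start = 0; start + windowSize <= items.Count; start += step)
    {
        var window = new T[windowSize];
        for (int i = 0; i < windowSize; i++) window[i] = items[start + i];
        yield return window;
    }
}

var data = new[] { 1, 2, 3, 4, 5, 6, 7 };

// Overlapping: each window starts one element after the previous one.
var overlapping = SlidingWindows(data, windowSize: 5, shift: 1).ToList();
foreach (var w in overlapping) Console.WriteLine(string.Join(", ", w));
// prints "1, 2, 3, 4, 5", then "2, 3, 4, 5, 6", then "3, 4, 5, 6, 7"

// Non-overlapping: shift defaults to windowSize.
var disjoint = SlidingWindows(data, windowSize: 3).ToList();
foreach (var w in disjoint) Console.WriteLine(string.Join(", ", w));
// prints "1, 2, 3", then "4, 5, 6" (the trailing 7 is dropped in this sketch)
```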
