Class FileStreamingDataLoader<T, TInput, TOutput>
A streaming data loader that reads from files in a directory.
public class FileStreamingDataLoader<T, TInput, TOutput> : StreamingDataLoaderBase<T, TInput, TOutput>, IStreamingDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable
Type Parameters
T: The numeric type used for calculations.
TInput: The type of input data.
TOutput: The type of output/label data.
Inheritance
StreamingDataLoaderBase<T, TInput, TOutput> → FileStreamingDataLoader<T, TInput, TOutput>
Implements
IStreamingDataLoader<T, TInput, TOutput>, IDataLoader<T>
Remarks
FileStreamingDataLoader automatically discovers files in a directory and streams them during training. This is ideal for image datasets where each file is a sample.
For Beginners: If you have a folder full of images with labels in the filename or a separate label file, this loader will read them one by one as needed.
Example:
// LoadImagePixels and ExtractLabelFromPath are user-supplied helpers (not shown).
var loader = new FileStreamingDataLoader<float, float[], int>(
    directory: "path/to/images",
    filePattern: "*.png",
    fileProcessor: async (filePath, ct) =>
    {
        // Turn one file into an (input, label) pair.
        var pixels = await LoadImagePixels(filePath, ct);
        var label = ExtractLabelFromPath(filePath);
        return (pixels, label);
    },
    batchSize: 32
);
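The example above calls two helpers that this page does not define. As a minimal sketch, they could look like the following. Both are hypothetical and not part of the library: `LoadImagePixels` here only scales raw file bytes to the range [0, 1] as a stand-in for real image decoding, and `ExtractLabelFromPath` assumes file names of the form `<label>_<anything>.<ext>`.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helpers for illustration; neither is part of the documented API.
public static class SampleFileHelpers
{
    // Reads the raw bytes of a file and scales each byte to [0, 1].
    // A real pipeline would decode the image format first; this stands in for that step.
    public static async Task<float[]> LoadImagePixels(string filePath, CancellationToken ct)
    {
        byte[] bytes = await File.ReadAllBytesAsync(filePath, ct);
        var pixels = new float[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            pixels[i] = bytes[i] / 255f;
        }
        return pixels;
    }

    // Assumes file names such as "7_00042.png", where the text before
    // the first underscore is the integer class label.
    public static int ExtractLabelFromPath(string filePath)
    {
        string name = Path.GetFileNameWithoutExtension(filePath);
        int separator = name.IndexOf('_');
        string labelText = separator >= 0 ? name.Substring(0, separator) : name;
        return int.Parse(labelText);
    }
}
```

With helpers of this shape in scope, the `fileProcessor` lambda returns a `(float[], int)` tuple, which matches `TInput = float[]` and `TOutput = int` in the example.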
Constructors
FileStreamingDataLoader(string, string, Func<string, CancellationToken, Task<(TInput, TOutput)>>, int, SearchOption, int, int)
Initializes a new instance of the FileStreamingDataLoader class.
public FileStreamingDataLoader(string directory, string filePattern, Func<string, CancellationToken, Task<(TInput, TOutput)>> fileProcessor, int batchSize, SearchOption searchOption = SearchOption.TopDirectoryOnly, int prefetchCount = 2, int numWorkers = 4)
Parameters
directory (string): The directory containing the data files.
filePattern (string): The file pattern to match (e.g., "*.png").
fileProcessor (Func<string, CancellationToken, Task<(TInput, TOutput)>>): Function that processes a file and returns (input, output).
batchSize (int): Number of samples per batch.
searchOption (SearchOption): Whether to search subdirectories. Defaults to SearchOption.TopDirectoryOnly.
prefetchCount (int): Number of batches to prefetch. Defaults to 2.
numWorkers (int): Number of parallel workers. Defaults to 4.
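To illustrate how `directory`, `filePattern`, and `searchOption` interact, the sketch below discovers files with the standard `Directory.EnumerateFiles` API. This page does not show the loader's actual discovery logic, so treat this as an assumption about the behavior; `FileDiscoverySketch` and `DiscoverFiles` are hypothetical names, and the ordinal sort is added here only to make the result deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sketch; not the library's implementation.
public static class FileDiscoverySketch
{
    // Lists files under `directory` that match `filePattern`.
    // SearchOption.TopDirectoryOnly ignores subdirectories;
    // SearchOption.AllDirectories descends into them.
    public static IReadOnlyList<string> DiscoverFiles(
        string directory,
        string filePattern,
        SearchOption searchOption = SearchOption.TopDirectoryOnly)
    {
        return Directory
            .EnumerateFiles(directory, filePattern, searchOption)
            .OrderBy(path => path, StringComparer.Ordinal) // stable order across runs
            .ToList();
    }
}
```

For a dataset laid out as one subfolder per class, passing `SearchOption.AllDirectories` would be the natural choice, since the default only sees files directly inside `directory`.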
Properties
FilePaths
Gets all file paths in the dataset.
public IReadOnlyList<string> FilePaths { get; }
Property Value
IReadOnlyList<string>
Name
Gets the human-readable name of this data loader.
public override string Name { get; }
Property Value
string
Remarks
Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"
SampleCount
Gets the total number of samples in the dataset.
public override int SampleCount { get; }
Property Value
int
Remarks
This may be known upfront (e.g., from file metadata) or estimated. For truly streaming sources where the count is unknown, this may return -1.
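Because the count may be -1, code that derives values from `SampleCount` should handle the unknown case explicitly. A minimal sketch follows; `BatchMath` and `BatchesPerEpoch` are hypothetical names, and counting a final partial batch is an assumption, since this page does not say whether the loader keeps or drops it.

```csharp
using System;

// Hypothetical helper; not part of the documented API.
public static class BatchMath
{
    // Returns the number of batches needed to cover `sampleCount` samples,
    // or null when the count is unknown (reported as a negative value such as -1).
    public static int? BatchesPerEpoch(int sampleCount, int batchSize)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }
        if (sampleCount < 0)
        {
            return null; // unknown length: the caller must iterate until the stream ends
        }
        // Ceiling division, so a trailing partial batch is counted.
        return (sampleCount + batchSize - 1) / batchSize;
    }
}
```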
Methods
ReadSampleAsync(int, CancellationToken)
Reads a single sample by index.
protected override Task<(TInput Input, TOutput Output)> ReadSampleAsync(int index, CancellationToken cancellationToken = default)
Parameters
index (int): The index of the sample to read.
cancellationToken (CancellationToken): Cancellation token.
Returns
Task<(TInput Input, TOutput Output)>
Remarks
Derived classes must implement this to read a single sample from the data source. This method is called by the batching infrastructure to build batches.
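For a file-backed loader, reading a sample by index plausibly means looking up the path at that index and passing it to the file processor. The sketch below shows that idea in a standalone class. It is an assumption about the approach, not the library's code, and `IndexedFileReaderSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of index-based reading; not the library's implementation.
public sealed class IndexedFileReaderSketch<TInput, TOutput>
{
    private readonly IReadOnlyList<string> _filePaths;
    private readonly Func<string, CancellationToken, Task<(TInput, TOutput)>> _fileProcessor;

    public IndexedFileReaderSketch(
        IReadOnlyList<string> filePaths,
        Func<string, CancellationToken, Task<(TInput, TOutput)>> fileProcessor)
    {
        _filePaths = filePaths ?? throw new ArgumentNullException(nameof(filePaths));
        _fileProcessor = fileProcessor ?? throw new ArgumentNullException(nameof(fileProcessor));
    }

    // Maps the sample index to a file path and delegates to the processor.
    public async Task<(TInput Input, TOutput Output)> ReadSampleAsync(
        int index,
        CancellationToken cancellationToken = default)
    {
        if (index < 0 || index >= _filePaths.Count)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }
        cancellationToken.ThrowIfCancellationRequested();
        return await _fileProcessor(_filePaths[index], cancellationToken);
    }
}
```

Under this design each sample is independent of the others, which is what would let a batching layer read several indices in parallel.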