In PyTorch, a `Dataset` is a class exposing two methods:

- `__len__()`, which returns the number of instances in the dataset (as an `int`);
- `__getitem__(idx)`, which returns the data point at index `idx`.

In other words, a `Dataset` behaves like a list: its length can be obtained through the `len(dataset)` function, and a data point can be retrieved using the `x, y = dataset[idx]` syntax, which under the hood calls `__getitem__(idx)`. The way those things are managed is specific to each dataset implementation.

PyTorch ships with some commonly used implementations:

- `Dataset`: an interface defining the `__len__` and `__getitem__` methods.
- `TensorDataset`: instantiated by passing X and Y tensors. Each row of the X and Y tensors is interpreted as a data point, so the `__getitem__(idx)` method simply returns the `idx`-th row of the X and Y tensors.
- `ConcatDataset`: instantiated by passing a list of datasets. The resulting dataset is the concatenation of those datasets.
- `Subset`: instantiated by passing a dataset and a list of indices. The resulting dataset contains only the data points selected by that list of indices.
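The built-in implementations above compose naturally. A small sketch (the tensor shapes and index choices are made up for illustration):

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset, Subset

# Two toy datasets built from tensors: each row of X and Y is one data point.
X1, Y1 = torch.randn(10, 3), torch.zeros(10, dtype=torch.long)
X2, Y2 = torch.randn(6, 3), torch.ones(6, dtype=torch.long)
ds1 = TensorDataset(X1, Y1)
ds2 = TensorDataset(X2, Y2)

x, y = ds1[4]        # __getitem__(4): the 4th row of X1 and Y1
print(len(ds1))      # __len__(): 10

# Concatenate the two datasets: 10 + 6 = 16 data points.
concat = ConcatDataset([ds1, ds2])
print(len(concat))   # 16

# Keep only the data points at the given indices.
sub = Subset(concat, [0, 2, 11])
print(len(sub))      # 3
```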
Many datasets also accept a `transformation` function passed to their constructor. Supporting transformations is not mandatory for a dataset, but it is quite common. The transformation is used to process the X value of a data point before returning it, which is useful to normalize values, apply augmentations, and so on. The `AvalancheDataset` class implements a very rich and powerful set of functionalities for managing transformations.
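As a sketch of the pattern described above, here is a hypothetical dataset (`SquaresDataset` is an invented name, not part of any library) that applies an optional transformation to the X value before returning it:

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Toy dataset with optional transformation support (illustrative only)."""

    def __init__(self, n, transform=None):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2
        self.transform = transform

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        x = self.x[idx]
        if self.transform is not None:
            x = self.transform(x)  # process X before returning it
        return x, self.y[idx]

# A simple normalization transform passed to the constructor.
ds = SquaresDataset(5, transform=lambda x: x / 4.0)
x, y = ds[4]
print(x, y)  # x is normalized (4 / 4.0 = 1.0); y is untouched (16.0)
```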
A different kind of dataset also exists in PyTorch: the `IterableDataset`. With an `IterableDataset`, data points can only be loaded sequentially (in a tape-like fashion): the `dataset[idx]` syntax and the `len(dataset)` function are not allowed. Avalanche does NOT support `IterableDataset`s. You shouldn't worry about this because, realistically, you will never encounter such datasets.
A `Dataset` is a very simple object that only returns one data point given its index. In order to create minibatches and speed up the data loading process, a `DataLoader` is required. The `DataLoader` class is a very efficient mechanism that, given a `Dataset`, returns minibatches by optionally shuffling the data before each epoch and by loading data in parallel using multiple workers. Below, we create a `TensorDataset` and then load it in minibatches using a `DataLoader`.
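A minimal version of that example (the tensor shapes and batch size are made up for illustration):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 100 data points with 5 features each, and one integer label per point.
X = torch.randn(100, 5)
Y = torch.randint(0, 2, (100,))
dataset = TensorDataset(X, Y)

# Shuffle the data before each epoch and build minibatches of 10 points.
# num_workers > 0 would load data in parallel worker processes; it is
# kept at 0 here so the snippet runs anywhere.
loader = DataLoader(dataset, batch_size=10, shuffle=True, num_workers=0)

for x_mb, y_mb in loader:
    print(x_mb.shape, y_mb.shape)  # torch.Size([10, 5]) torch.Size([10])
```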