Creation and manipulation of AvalancheDataset and its subclasses.
The AvalancheDataset is an implementation of the PyTorch Dataset class which comes with many out-of-the-box functionalities. The AvalancheDataset (and its few subclasses) is used extensively throughout the Avalanche library as the reference way to manipulate datasets:
The dataset carried by the experience.dataset field is always an AvalancheDataset.
Benchmark creation functions accept AvalancheDatasets to create benchmarks where a finer control over task labels is required.
Internally, benchmarks are created by manipulating AvalancheDatasets.
This first Mini How-To will guide you through the main ways to instantiate an AvalancheDataset, while the other Mini How-Tos (complete list here) will show how to use its functionalities.
It is warmly recommended to run this page as a notebook using Colab (info at the bottom of this page).
Let's start by installing avalanche:
This mini How-To will guide you through the main ways used to instantiate an AvalancheDataset.
First thing: the base class AvalancheDataset is a wrapper for existing datasets. Only two things must be considered when wrapping an existing dataset:
Apart from the x and y values, the resulting AvalancheDataset will also return a third value: the task label (which defaults to 0).
The wrapped dataset must contain a valid targets field.
The targets field is available in nearly all torchvision datasets. It must be a list containing the label for each data point (usually the y value). In this way, Avalanche can use that field when instantiating benchmarks like the Class/Task-Incremental and Domain-Incremental ones.
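To make the second requirement concrete, here is a minimal sketch of a dataset that exposes a targets list with one label per data point. ToyDataset and has_valid_targets are hypothetical names introduced for illustration; they are not part of Avalanche, and the check is only an assumption about what "valid" means here (one label per data point), not the library's actual validation logic.

```python
class ToyDataset:
    """Hypothetical map-style dataset exposing a `targets` field."""

    def __init__(self, samples, labels):
        if len(samples) != len(labels):
            raise ValueError("samples and labels must have the same length")
        self.samples = list(samples)
        # One label per data point, in the same order as the samples.
        self.targets = list(labels)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx], self.targets[idx]


def has_valid_targets(dataset):
    """Return True if `dataset.targets` holds one label per data point."""
    targets = getattr(dataset, "targets", None)
    return targets is not None and len(targets) == len(dataset)


toy = ToyDataset(samples=[0.1, 0.2, 0.3], labels=[0, 1, 1])
print(has_valid_targets(toy))                    # True
print(has_valid_targets([(0.1, 0), (0.2, 1)]))   # False: a plain list has no `targets`
```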
Avalanche exposes four AvalancheDataset classes, which map exactly onto the four Dataset classes offered by PyTorch:
AvalancheDataset: the base class, which acts as a wrapper to existing Dataset instances.
AvalancheTensorDataset: equivalent to PyTorch TensorDataset.
AvalancheSubset: equivalent to PyTorch Subset.
AvalancheConcatDataset: equivalent to PyTorch ConcatDataset.
Given a dataset (like MNIST), an AvalancheDataset can be instantiated as follows:
Just like any other Dataset, a data point can be obtained using the x, y = dataset[idx] syntax. When obtaining a data point from an AvalancheDataset, an additional third value (the task label) will be returned:
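The difference in return shape can be sketched without Avalanche installed. TaskLabelWrapper below is a hypothetical stand-in written for this example, not Avalanche's implementation; it only mimics the documented behaviour of returning a task label (defaulting to 0) as a third value.

```python
class TaskLabelWrapper:
    """Hypothetical stand-in: appends a constant task label to each data point."""

    def __init__(self, dataset, task_label=0):
        self.dataset = dataset
        self.task_label = task_label

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        x, y = self.dataset[idx]
        return x, y, self.task_label


# Toy (x, y) pairs standing in for a real dataset such as MNIST.
plain = [("img_a", 3), ("img_b", 7)]
wrapped = TaskLabelWrapper(plain)

x, y = plain[0]        # plain dataset: two values
x, y, t = wrapped[0]   # wrapped dataset: three values
print(x, y, t)         # img_a 3 0
```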
Useful tip: if you are not sure if you are dealing with a PyTorch Dataset or an AvalancheDataset, or if you want to ignore task labels, you can use this syntax:
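One way to write this is Python's extended unpacking, x, y, *_ = dataset[idx], which discards anything after the first two values. The short sketch below uses plain tuples in place of real data points; unpack_xy is a hypothetical helper name used only for this example.

```python
def unpack_xy(item):
    """Return (x, y) from a data point, ignoring any trailing values."""
    x, y, *_ = item
    return x, y


plain_item = ("img_a", 3)          # shape of an (x, y) data point
avalanche_item = ("img_a", 3, 0)   # shape of an (x, y, task_label) data point

print(unpack_xy(plain_item))       # ('img_a', 3)
print(unpack_xy(avalanche_item))   # ('img_a', 3)
```

The same line works for both kinds of dataset, so code written this way does not need to know which one it received.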
The PyTorch TensorDataset is one of the most useful Dataset classes as it can be used to quickly prototype the data loading part of your code.
A TensorDataset can be wrapped in an AvalancheDataset just like any Dataset, but this is not very convenient, as shown below:
Instead, it is recommended to use the AvalancheTensorDataset class to get the same result. In this way, you can just skip one intermediate step.
In both cases, AvalancheDataset will automatically populate its targets field by using the values from the second Tensor (which usually contains the Y values). This behaviour can be customized by passing a custom targets constructor parameter (by either passing a list of targets or the index of the Tensor to use).
The cell below shows the content of the targets field of the dataset created in the cell above. Notice that the targets field has been filled with the content of the second Tensor (y_data).
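As a rough illustration of that default, the sketch below builds a plain PyTorch TensorDataset and derives a targets list from its second tensor by hand. It uses only PyTorch; targets_from_tensor is a hypothetical helper that mimics the "index of the Tensor to use" option described above and is not an Avalanche function.

```python
import torch
from torch.utils.data import TensorDataset


def targets_from_tensor(dataset, tensor_index=1):
    """Hypothetical helper: read the targets from one of the wrapped tensors."""
    return dataset.tensors[tensor_index].tolist()


x_data = torch.rand(5, 3)                # 5 toy samples with 3 features each
y_data = torch.tensor([0, 1, 2, 1, 0])   # one label per sample

tensor_dataset = TensorDataset(x_data, y_data)

# Default described above: the targets come from the second tensor (y_data).
targets = targets_from_tensor(tensor_dataset)
print(targets)  # [0, 1, 2, 1, 0]
```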
Avalanche offers the AvalancheSubset and AvalancheConcatDataset implementations that extend the functionalities of PyTorch Subset and ConcatDataset.
Regarding the subsetting operation, AvalancheSubset behaves in the same way the PyTorch Subset class does: both implementations accept a dataset and a list of indices as parameters. The resulting Subset is not a copy of the dataset, it's just a view. The indexing is analogous to the numpy_array[list_of_indices] syntax, with one difference: NumPy advanced indexing returns a copy, whereas a Subset remains a view of the original dataset. This can be used both to create a smaller dataset and to change the order of data points in the dataset.
Here we create a toy dataset in which each X and Y value is an int. We then obtain a subset of it by creating an AvalancheSubset:
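Since AvalancheSubset is described as taking the same parameters as PyTorch Subset, the behaviour can be sketched with the PyTorch class alone. This is an illustration of the shared semantics (selection, reordering, view rather than copy) under that assumption, not the Avalanche call itself; the variable names are chosen for this example.

```python
import torch
from torch.utils.data import Subset, TensorDataset

x_data = torch.arange(10)       # X values: 0..9
y_data = torch.arange(10, 20)   # Y values: 10..19
toy_dataset = TensorDataset(x_data, y_data)

# Keep three data points and list them in reverse order.
subset = Subset(toy_dataset, [7, 5, 3])

print(len(subset))  # 3
xs = [int(subset[i][0]) for i in range(len(subset))]
print(xs)           # [7, 5, 3]

# The subset is a view: an in-place change to the underlying data is visible through it.
x_data[5] = 100
print(int(subset[1][0]))  # 100
```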
Concatenation is even simpler. Just like with PyTorch ConcatDataset, one can easily concatenate datasets with AvalancheConcatDataset.
Both AvalancheConcatDataset and PyTorch ConcatDataset accept a list of datasets to concatenate.
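The list-of-datasets interface can be sketched with PyTorch ConcatDataset. The example assumes AvalancheConcatDataset follows the same indexing convention, as the text above suggests; only PyTorch calls are used, and the toy datasets are made up for illustration.

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

first = TensorDataset(torch.arange(0, 3), torch.arange(10, 13))   # 3 data points
second = TensorDataset(torch.arange(3, 5), torch.arange(13, 15))  # 2 data points

concat = ConcatDataset([first, second])
print(len(concat))  # 5

# Index 3 falls past the end of `first`, so it maps to the first element of `second`.
x, y = concat[3]
print(int(x), int(y))  # 3 13
```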
This Mini How-To showed you how to create instances of AvalancheDataset (and its subclasses).
Other Mini How-Tos will guide you through the functionalities offered by AvalancheDataset. The list of Mini How-Tos can be found here.