AvalancheDataset

Dealing with AvalancheDatasets

The AvalancheDataset is an implementation of the PyTorch Dataset class that comes with many useful out-of-the-box functionalities. For most users, the AvalancheDataset can be used as a plain PyTorch Dataset that will return x, y, t elements. However, the AvalancheDataset is much more powerful than a simple PyTorch Dataset.

A series of Mini How-Tos will guide you through the functionalities of the AvalancheDataset and its subclasses.

Before jumping to the actual Mini How-Tos, we recommend having a look at the basic notions of Dataset and DataLoader by reading the Preamble page.

Preamble: PyTorch Datasets

A few words about PyTorch Datasets

This short preamble briefly goes through the basic notions of Dataset offered natively by PyTorch. A solid grasp of these notions is needed to understand:

How PyTorch data loading works in general
How AvalancheDatasets differ from PyTorch Datasets

📚 Dataset: general definition

In PyTorch, a Dataset is a class exposing two methods:

__len__(), which returns the number of instances in the dataset (as an int).
__getitem__(idx), which returns the data point at index idx.

In other words, a Dataset instance is just an object for which, similarly to a list, one can simply:

Obtain its length using the Python len(dataset) function.
Obtain a single data point using the x, y = dataset[idx] syntax.

The content of the dataset can either be loaded in memory when the dataset is instantiated (as the torchvision MNIST dataset does) or, for big datasets like ImageNet, be kept on disk, with the dataset only keeping the list of files in an internal field. In the latter case, data is loaded from storage on-the-fly when __getitem__(idx) is called. How these things are managed is specific to each dataset implementation.
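The two storage strategies can be sketched without PyTorch at all, since only __len__ and __getitem__ matter. The class names and the text-file layout below are hypothetical and chosen for illustration; a real dataset would normally subclass torch.utils.data.Dataset and decode images rather than read small text files.

```python
import os
import tempfile


class InMemoryDataset:
    """Keeps every (x, y) pair in RAM from the moment it is created."""

    def __init__(self, xs, ys):
        assert len(xs) == len(ys)
        self.xs, self.ys = list(xs), list(ys)

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        return self.xs[idx], self.ys[idx]


class LazyFileDataset:
    """Keeps only file paths; content is read when a data point is requested."""

    def __init__(self, paths, ys):
        self.paths, self.ys = list(paths), list(ys)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        with open(self.paths[idx]) as f:  # loaded from storage on the fly
            return f.read(), self.ys[idx]


in_memory = InMemoryDataset([50, 51, 52], [0, 1, 0])

# Write three tiny files to a temporary folder to stand in for an on-disk dataset
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(tmp_dir, f'sample_{i}.txt')
    with open(path, 'w') as f:
        f.write(f'content {i}')
    paths.append(path)
lazy = LazyFileDataset(paths, [0, 1, 0])

print(len(in_memory), in_memory[1])
print(len(lazy), lazy[2])
```

Both objects are used in exactly the same way from the outside; only the moment at which the data is read differs.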

PyTorch Datasets

The PyTorch library offers 4 Dataset implementations:

Dataset: an interface defining the __len__ and __getitem__ methods.
TensorDataset: instantiated by passing X and Y tensors. Each row of the X and Y tensors is interpreted as a data point. The __getitem__(idx) method will simply return the idx-th row of X and Y tensors.
ConcatDataset: instantiated by passing a list of datasets. The resulting dataset is a concatenation of those datasets.
Subset: instantiated by passing a dataset and a list of indices. The resulting dataset will only contain the data points described by that list of indices.

As explained in the mini How-Tos, Avalanche offers a customized version of each of these 4 dataset classes.

Transformations

Most datasets from the torchvision library (as well as datasets found "in the wild") allow a transformation function to be passed to the dataset constructor. Support for transformations is not mandatory for a dataset, but it is quite common. The transformation is used to process the X value of a data point before returning it, for example to normalize values or to apply augmentations.
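As a minimal sketch of this convention, the toy class below accepts an optional transform in its constructor and applies it to X inside __getitem__. The class and the scale_to_unit function are hypothetical stand-ins, not torchvision code.

```python
class TransformedDataset:
    """Toy dataset that applies an optional transform to X in __getitem__."""

    def __init__(self, xs, ys, transform=None):
        self.xs, self.ys = list(xs), list(ys)
        self.transform = transform

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        x, y = self.xs[idx], self.ys[idx]
        if self.transform is not None:
            x = self.transform(x)  # only X is processed; Y is returned as-is
        return x, y


def scale_to_unit(x):
    """Normalize a list of 0-255 pixel values to the [0, 1] range."""
    return [value / 255 for value in x]


raw = TransformedDataset([[0, 255], [51, 102]], [3, 7])
normalized = TransformedDataset([[0, 255], [51, 102]], [3, 7], transform=scale_to_unit)

print(raw[1])
print(normalized[1])
```

The stored data is never modified: the transformation is re-applied every time a data point is requested, which is what makes random augmentations possible.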

As explained in the mini How-Tos, the AvalancheDataset class implements a very rich and powerful set of functionalities for managing transformations.

Quick note on the IterableDataset class

A variation of the standard Dataset exists in PyTorch: the IterableDataset. When using an IterableDataset, one can load the data points in a sequential way only (using a tape-like approach). The dataset[idx] syntax and the len(dataset) function are generally not available. Avalanche does NOT support IterableDatasets. This is rarely a limitation in practice, because most datasets, including the torchvision ones used throughout these How-Tos, are regular index-based Datasets.
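The difference can be illustrated with a plain Python object that only defines __iter__. This is a simplified stand-in (the CountingStream name is hypothetical); the real class is torch.utils.data.IterableDataset.

```python
class CountingStream:
    """Iterable-style toy dataset: defines __iter__ but not __getitem__ or __len__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield i, i % 2  # (x, y) pairs produced one after the other


stream = CountingStream(3)
print(list(stream))

# Random access and length queries both fail on this kind of object
for operation in (lambda: stream[0], lambda: len(stream)):
    try:
        operation()
    except TypeError as error:
        print('Not supported:', error)
```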

DataLoader

The Dataset is a very simple object that only returns one data point given its index. In order to create minibatches and speed up the data loading process, a DataLoader is required.

The PyTorch DataLoader class is a very efficient mechanism that, given a Dataset, returns minibatches, optionally shuffling the data before each epoch and loading data in parallel using multiple workers.
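To make the idea concrete, here is a rough pure-Python sketch of the core loop only: shuffle the indices, then cut them into fixed-size chunks. The iterate_minibatches helper is hypothetical; the real DataLoader additionally collates data points into tensors and can delegate loading to worker processes, both of which are left out here.

```python
import random


def iterate_minibatches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of data points, visiting every index exactly once per call."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # a new order for this epoch
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# A list of (x, y) tuples already supports len() and indexing, like a Dataset
toy_dataset = [(x, x % 5) for x in range(25)]
batches = list(iterate_minibatches(toy_dataset, batch_size=10, shuffle=True, seed=0))
print([len(batch) for batch in batches])
```

With 25 data points and a batch size of 10, the last minibatch is smaller than the others, which is also the default behaviour of the PyTorch DataLoader.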

Preamble wrap-up

To wrap up, let's see how the native, non-Avalanche, PyTorch components work in practice. In the following code we create a TensorDataset and then we load it in minibatches using a DataLoader.

import torch
from torch.utils.data.dataset import TensorDataset
from torch.utils.data.dataloader import DataLoader

# Create a dataset of 100 data points described by 22 features + 1 class label
x_data = torch.rand(100, 22)
y_data = torch.randint(0, 5, (100,))

# Create the Dataset
my_dataset = TensorDataset(x_data, y_data)

# Create the DataLoader
my_dataloader = DataLoader(my_dataset, batch_size=10, shuffle=True, num_workers=4)

# Run one epoch
for x_minibatch, y_minibatch in my_dataloader:
    print('Loaded minibatch of', len(x_minibatch), 'instances')
# Output: "Loaded minibatch of 10 instances" x10 times

Next steps

With these notions in mind, you can start your journey into the functionalities offered by AvalancheDatasets by going through the Mini How-Tos.

Please refer to the list of the Mini How-Tos regarding AvalancheDatasets for a complete list. It is recommended to start with the "Creating AvalancheDatasets" Mini How-To.

🤝 Run it on Google Colab

Creating AvalancheDatasets

Creation and manipulation of AvalancheDataset and its subclasses.

The AvalancheDataset is an implementation of the PyTorch Dataset class which comes with many out-of-the-box functionalities. The AvalancheDataset (and its few subclasses) is used extensively throughout the Avalanche library as the reference way to manipulate datasets:

The dataset carried by the experience.dataset field is always an AvalancheDataset.
Benchmark creation functions accept AvalancheDatasets to create benchmarks where a finer control over task labels is required.
Internally, benchmarks are created by manipulating AvalancheDatasets.

This first Mini How-To will guide you through the main ways to instantiate an AvalancheDataset, while the other Mini How-Tos will show how to use its functionalities.

It is warmly recommended to run this page as a notebook using Colab (info at the bottom of this page).

Let's start by installing avalanche:

!pip install avalanche-lib

AvalancheDataset vs PyTorch Dataset

This mini How-To will guide you through the main ways used to instantiate an AvalancheDataset.

First thing: the base class AvalancheDataset is a wrapper for existing datasets. Only two things must be considered when wrapping an existing dataset:

Apart from the x and y values, the resulting AvalancheDataset will also return a third value: the task label (which defaults to 0).
The wrapped dataset must contain a valid targets field.

The targets field is available in nearly all torchvision datasets. It must be a list containing the label for each data point (usually the y value). In this way, Avalanche can use that field when instantiating benchmarks like the "Class/Task-Incremental" and "Domain-Incremental" ones.
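To see why a list of targets is so useful for benchmark creation, consider the sketch below. It groups data point indices by class using the targets alone, without loading a single data point, and then assigns disjoint sets of classes to two experiences. The helper name and the class split are hypothetical; this only illustrates the idea and is not Avalanche's benchmark code.

```python
from collections import defaultdict


def indices_by_class(targets):
    """Map each class label to the indices of the data points carrying it."""
    groups = defaultdict(list)
    for idx, label in enumerate(targets):
        groups[int(label)].append(idx)
    return dict(groups)


# Hypothetical content of the targets field of a 10-instance dataset
targets = [4, 3, 3, 2, 0, 1, 3, 3, 3, 2]
class_to_indices = indices_by_class(targets)

# Two "experiences", each holding a disjoint set of classes
experience_classes = [[0, 1, 2], [3, 4]]
experience_indices = [
    sorted(i for c in classes for i in class_to_indices[c])
    for classes in experience_classes
]
print(experience_indices)
```

Each list of indices could then be used to build a subset of the original dataset, one per experience.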

Avalanche exposes 4 classes of AvalancheDatasets which map exactly onto the 4 Dataset classes offered by PyTorch:

AvalancheDataset: the base class, which acts as a wrapper for existing Dataset instances.
AvalancheTensorDataset: equivalent to PyTorch TensorDataset.
AvalancheSubset: equivalent to PyTorch Subset.
AvalancheConcatDataset: equivalent to PyTorch ConcatDataset.

🛠️ Create an AvalancheDataset

Given a dataset (like MNIST), an AvalancheDataset can be instantiated as follows:

from avalanche.benchmarks.utils import AvalancheDataset
from torchvision.datasets import MNIST

# Instantiate the MNIST train dataset from torchvision
mnist_dataset = MNIST('mnist_data', download=True)

# Create the AvalancheDataset
mnist_avalanche_dataset = AvalancheDataset(mnist_dataset)

Just like any other Dataset, a data point can be obtained using the x, y = dataset[idx] syntax. When obtaining a data point from an AvalancheDataset, an additional third value (the task label) will be returned:

# Obtain the first instance from the original dataset
x, y = mnist_dataset[0]
print(f'x={x}, y={y}')
# Output: "x=<PIL.Image.Image image mode=L size=28x28 at 0x7FBEDFDB2430>, y=5"

# Obtain the first instance from the AvalancheDataset
x, y, t = mnist_avalanche_dataset[0]
print(f'x={x}, y={y}, t={t}')
# Output: "x=<PIL.Image.Image image mode=L size=28x28 at 0x7FBEEFD3A850>, y=5, t=0"

Useful tip: if you are not sure if you are dealing with a PyTorch Dataset or an AvalancheDataset, or if you want to ignore task labels, you can use this syntax:

# You can use "x, y, *_" to manage both kinds of Datasets
x, y, *_ = mnist_dataset[0]  # OK
x, y, *_ = mnist_avalanche_dataset[0]  # OK

The AvalancheTensorDataset

The PyTorch TensorDataset is one of the most useful Dataset classes as it can be used to quickly prototype the data loading part of your code.

A TensorDataset can be wrapped in an AvalancheDataset just like any Dataset, but this is not very convenient, as shown below:

import torch
from torch.utils.data import TensorDataset


# Create 10 instances described by 7 features 
x_data = torch.rand(10, 7)

# Create the class labels for the 10 instances
y_data = torch.randint(0, 5, (10,))

# Create the tensor dataset
tensor_dataset = TensorDataset(x_data, y_data)

# Wrap it in an AvalancheDataset
wrapped_tensor_dataset = AvalancheDataset(tensor_dataset)

# Obtain the first instance from the dataset
x, y, t = wrapped_tensor_dataset[0]
print(f'x={x}, y={y}, t={t}')
# Output: "x=tensor([0.6329, 0.8495, 0.1853, 0.7254, 0.7893, 0.8079, 0.1106]), y=4, t=0"

Instead, it is recommended to use the AvalancheTensorDataset class to get the same result. In this way, you can just skip one intermediate step.

from avalanche.benchmarks.utils import AvalancheTensorDataset

# Create the tensor dataset
avl_tensor_dataset = AvalancheTensorDataset(x_data, y_data)

# Obtain the first instance from the AvalancheTensorDataset
x, y, t = avl_tensor_dataset[0]
print(f'x={x}, y={y}, t={t}')
# Output: "x=tensor([0.6329, 0.8495, 0.1853, 0.7254, 0.7893, 0.8079, 0.1106]), y=4, t=0"

In both cases, AvalancheDataset will automatically populate its targets field by using the values from the second Tensor (which usually contains the Y values). This behaviour can be customized by passing a custom targets constructor parameter (by either passing a list of targets or the index of the Tensor to use).

The cell below shows the content of the targets field of the dataset created in the cell above. Notice that the targets field has been filled with the content of the second Tensor (y_data).

# Check the targets field
print('y_data=', y_data)
# Output: "y_data= tensor([4, 3, 3, 2, 0, 1, 3, 3, 3, 2])"

print('targets field=', avl_tensor_dataset.targets)
# Output: "targets field= [tensor(4), tensor(3), tensor(3), tensor(2), 
#          tensor(0), tensor(1), tensor(3), tensor(3), tensor(3), tensor(2)]"

The AvalancheSubset and AvalancheConcatDataset classes

Avalanche offers the AvalancheSubset and AvalancheConcatDataset implementations that extend the functionalities of PyTorch Subset and ConcatDataset.

Regarding the subsetting operation, AvalancheSubset behaves in the same way the PyTorch Subset class does: both implementations accept a dataset and a list of indices as parameters. The resulting Subset is not a copy of the dataset, it's just a view. The selection is similar to indexing a NumPy array with a list of indices (the numpy_array[list_of_indices] syntax), with one difference: NumPy returns a copy in that case, while a Subset only keeps a reference to the original dataset together with the list of indices. This can be used both to create a smaller dataset and to change the order of data points in the dataset.
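The "view" behaviour can be sketched in a few lines of plain Python. SubsetView is a simplified, hypothetical stand-in for Subset; it stores only a reference to the wrapped dataset and the list of indices.

```python
class SubsetView:
    """Simplified stand-in for Subset: a reference plus a list of indices."""

    def __init__(self, dataset, indices):
        self.dataset = dataset  # no data is copied
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        return self.dataset[self.indices[idx]]


base = [[50, 10], [51, 11], [52, 12], [53, 13]]  # a list of mutable [x, y] pairs
view = SubsetView(base, [3, 0])  # smaller and re-ordered

print(len(view), view[0], view[1])

# Because the view holds a reference, changes to the base dataset show through
base[3][0] = 99
print(view[0])
```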

Here we create a toy dataset in which each X and Y value is an int. We then obtain a subset of it by creating an AvalancheSubset:

from avalanche.benchmarks.utils import AvalancheSubset

# Define the X values of 10 instances (each instance is an int)
x_data_toy = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]

# Define the class labels for the 10 instances
y_data_toy = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

# Create the tensor dataset
# Note: AvalancheSubset can also be applied to PyTorch TensorDataset directly!
# However, note that PyTorch TensorDataset doesn't support Python lists...
# ... (it only supports Tensors) while AvalancheTensorDataset does.
toy_dataset = AvalancheTensorDataset(x_data_toy, y_data_toy) 

# Define the indices for the subset
# Here we want to obtain a subset containing only the data points...
# ... at indices 0, 5, 8, 2 (in this specific order)
subset_indices = [0, 5, 8, 2]

# Create the subset
avl_subset = AvalancheSubset(toy_dataset, indices=subset_indices)
print('The subset contains', len(avl_subset), 'instances.')
# Output: "The subset contains 4 instances."

# Obtain instances from the AvalancheSubset
for x, y, t in avl_subset:
    print(f'x={x}, y={y}, t={t}')
# Output:
# x=50, y=10, t=0
# x=55, y=15, t=0
# x=58, y=18, t=0
# x=52, y=12, t=0

Concatenation is even simpler. Just like with PyTorch ConcatDataset, one can easily concatenate datasets with AvalancheConcatDataset.

Both AvalancheConcatDataset and PyTorch ConcatDataset accept a list of datasets to concatenate.

from avalanche.benchmarks.utils import AvalancheConcatDataset

# Define the 2 datasets to be concatenated
x_data_toy_1 = [50, 51, 52, 53, 54]
y_data_toy_1 = [10, 11, 12, 13, 14]
x_data_toy_2 = [60, 61, 62, 63, 64]
y_data_toy_2 = [20, 21, 22, 23, 24]

# Create the datasets
toy_dataset_1 = AvalancheTensorDataset(x_data_toy_1, y_data_toy_1) 
toy_dataset_2 = AvalancheTensorDataset(x_data_toy_2, y_data_toy_2) 

# Create the concat dataset
avl_concat = AvalancheConcatDataset([toy_dataset_1, toy_dataset_2])
print('The concat dataset contains', len(avl_concat), 'instances.')
# Output: "The concat dataset contains 10 instances."

# Obtain instances from the AvalancheConcatDataset
for x, y, t in avl_concat:
    print(f'x={x}, y={y}, t={t}')
# Output:
# x=50, y=10, t=0
# x=51, y=11, t=0
# x=52, y=12, t=0
# x=53, y=13, t=0
# x=54, y=14, t=0
# x=60, y=20, t=0
# x=61, y=21, t=0
# x=62, y=22, t=0
# x=63, y=23, t=0
# x=64, y=24, t=0
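For readers curious about how a concatenated dataset maps a global index to the right underlying dataset, here is a simplified sketch based on cumulative sizes and bisection, which is the approach PyTorch's ConcatDataset takes. ConcatView is a hypothetical stand-in and leaves out details such as negative indices.

```python
import bisect
from itertools import accumulate


class ConcatView:
    """Simplified stand-in for ConcatDataset based on cumulative sizes."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # e.g. sizes 5 and 5 give cumulative sizes [5, 10]
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self):
            raise IndexError(idx)
        # Find which dataset the global index falls into...
        dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        # ... and turn the global index into an index local to that dataset
        offset = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][idx - offset]


first = [(50, 10), (51, 11), (52, 12), (53, 13), (54, 14)]
second = [(60, 20), (61, 21), (62, 22), (63, 23), (64, 24)]
concat = ConcatView([first, second])

print(len(concat), concat[0], concat[4], concat[5], concat[9])
```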

Dataset Creation wrap-up

This Mini How-To showed you how to create instances of AvalancheDataset (and its subclasses).

Other Mini How-Tos will guide you through the functionalities offered by AvalancheDataset. The list of Mini How-Tos can be found here.

🤝 Run it on Google Colab

Advanced Transformations

Dealing with transformations (groups, appending, replacing, freezing).

AvalancheDataset (and its subclasses, like the AvalancheTensor/Subset/ConcatDataset) allows for finer control over transformations. While torchvision (and other) datasets offer only a minimal mechanism to apply transformations, with AvalancheDataset one can:

Have multiple transformation "groups" in the same dataset (like separated train and test transformations).
Append, replace and remove transformations, even by using nested Subset/Concat Datasets.
Freeze transformations, so that they can't be changed.

The following sub-sections show examples of how to use these features. Please note that all the constructor parameters and the methods described in this How-To can be used on AvalancheDataset subclasses as well. For more info on all the available subclasses, refer to the "Creating AvalancheDatasets" Mini How-To.

It is warmly recommended to run this page as a notebook using Colab (info at the bottom of this page).

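Let's start by installing Avalanche, using the same !pip install avalanche-lib command shown in the "Creating AvalancheDatasets" Mini How-To.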

Transformation groups

AvalancheDatasets can contain multiple transformation groups. This can be useful to keep train and test transformations in the same dataset and to have different sets of transformations available. This may come in handy in many situations (for instance, to apply ad-hoc transformations to replay data).

As in torchvision datasets, AvalancheDataset supports two kinds of transformations: the transform, which is applied to X values, and the target_transform, which is applied to Y values. The latter is rarely used. This means that a transformation group is a pair of transformations to be applied to the X and Y values of each instance returned by the dataset. In both torchvision and Avalanche implementations, a transformation must be a function (or other callable object) that accepts one input (the X or Y value) and outputs its transformed version. This pair of functions is stored in the transform and target_transform fields of the dataset. A comprehensive guide on transformations can be found in the torchvision documentation.

In the following example, a MNIST dataset is created and then wrapped in an AvalancheDataset. When creating the AvalancheDataset, we can set train and eval transformations by passing a transform_groups parameter. Train transformations usually include some form of random augmentation, while eval transformations usually include a sequence of deterministic transformations only. Here we define the sequence of train transformations as a random rotation followed by the ToTensor operation. The eval transformations only include the ToTensor operation.

Of course, one can also just use the transform and target_transform constructor parameters to set the transformations for both the train and the eval groups. However, it is recommended to use the approach based on transform_groups (shown in the code above) as it is much more flexible.

Using `.train()` and `.eval()`

The default behaviour of the AvalancheDataset is to use transformations from the train group. However, one can easily obtain a version of the dataset where the eval group is used. Note: when obtaining the dataset of experiences from the test stream, those datasets will already be using the eval group of transformations so you don't need to switch to the eval group ;).

As noted before, transformations for the current group are loaded in the transform and target_transform fields. These fields can be changed directly, but this is NOT recommended, as this will not create a copy of the dataset and may affect other parts of the code in which the dataset is used.

The recommended way to switch between the train and eval groups is to use the .train() and .eval() methods to obtain a copy (view) of the dataset with the proper transformations enabled. This is another very handy feature of the AvalancheDataset: methods that manipulate the AvalancheDataset fields (and transformations) always create a view of the dataset. The original dataset is never changed.

In the following cell we use the avl_mnist_transform dataset created in the cells above. We first obtain a view of it in which eval transformations are enabled. Then, starting from this view, we obtain a version of it in which train transformations are enabled. We want to double-stress that .train() and .eval() never change the group of the dataset on which they are called: they always create a view.

One can check that the correct transformation group is in use by looking at the content of the transform/target_transform fields.
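The view semantics of .train() and .eval() can be illustrated with a small library-free sketch. GroupedDataset is a hypothetical toy class, not the AvalancheDataset implementation: it only assumes that each group is a (transform, target_transform) pair and that switching group returns a shallow copy. For brevity it returns (x, y) pairs without the task label.

```python
import copy


class GroupedDataset:
    """Toy dataset holding one (transform, target_transform) pair per group."""

    def __init__(self, data, transform_groups, current_group='train'):
        self.data = data
        self.transform_groups = transform_groups
        self.current_group = current_group

    def _view(self, group_name):
        view = copy.copy(self)  # shallow copy: the underlying data is shared
        view.current_group = group_name
        return view

    def train(self):
        return self._view('train')

    def eval(self):
        return self._view('eval')

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x, y = self.data[idx]
        transform, target_transform = self.transform_groups[self.current_group]
        if transform is not None:
            x = transform(x)
        if target_transform is not None:
            y = target_transform(y)
        return x, y


groups = {
    'train': (lambda x: x + 1000, None),  # stands in for a random augmentation
    'eval': (lambda x: x, None),          # stands in for a deterministic pipeline
}
dataset = GroupedDataset([(50, 10), (51, 11)], groups)
eval_view = dataset.eval()
train_view = eval_view.train()

print(dataset[0], eval_view[0], train_view[0])
print(dataset.current_group)  # calling .eval() did not change the original
```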

Custom transformation groups

In AvalancheDatasets the train and eval transformation groups are always available. However, AvalancheDataset also supports custom transformation groups.

The following example shows how to create an AvalancheDataset with an additional group named replay. We define the replay transformation as a random crop followed by the ToTensor operation.

However, once created, the dataset will still use the train group by default. There are two ways to switch to our custom group:

Set the group when creating the dataset using the initial_transform_group constructor parameter
Switch to the group using the .with_transforms(group_name) method

The .with_transforms(group_name) method behaves in the same way .train() and .eval() do by creating a view of the original dataset.

The following example shows how to use both methods:
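As a rough, library-free illustration of these two options, the sketch below uses a hypothetical toy class whose parameter and method names mirror the ones mentioned above (initial_transform_group and with_transforms). It is a sketch of the idea only, not Avalanche code, and string-building lambdas stand in for real image transformations.

```python
import copy


class GroupedDataset:
    """Toy dataset with named transformation groups, including custom ones."""

    def __init__(self, data, transform_groups, initial_transform_group='train'):
        self.data = data
        self.transform_groups = transform_groups
        self.current_group = initial_transform_group

    def with_transforms(self, group_name):
        if group_name not in self.transform_groups:
            raise ValueError(f'Unknown transformation group: {group_name}')
        view = copy.copy(self)  # the original dataset keeps its own group
        view.current_group = group_name
        return view

    def __getitem__(self, idx):
        x, y = self.data[idx]
        transform, target_transform = self.transform_groups[self.current_group]
        x = x if transform is None else transform(x)
        y = y if target_transform is None else target_transform(y)
        return x, y


groups = {
    'train': (lambda x: f'augmented({x})', None),
    'eval': (None, None),
    'replay': (lambda x: f'cropped({x})', None),  # the custom group
}
data = [(50, 10), (51, 11)]

# Way 1: choose the group at creation time
replay_from_start = GroupedDataset(data, groups, initial_transform_group='replay')

# Way 2: create the dataset (train group by default), then obtain a view
default_dataset = GroupedDataset(data, groups)
replay_view = default_dataset.with_transforms('replay')

print(replay_from_start[0], replay_view[0], default_dataset[0])
```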

Appending transformations

In the standard torchvision datasets the only way to append a transformation (that is, add a new transformation step to the list of existing ones) is to change the transform field directly by doing something like this:

This solution has several serious drawbacks:

The transformation field of the dataset is changed directly. This will affect other parts of the code that use that dataset instance.
If the initial transform is None, then Compose will not complain, but the process will crash later (try it yourself: replace the first element of Compose in the cell above with None, then try obtaining a data point from the dataset).
If you need to change transformations only temporarily to do some specific things in a limited part of the code, then you need to store the previous set of transformations in some variable in order to switch back to them later.
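The first two drawbacks can be reproduced without torchvision. In the sketch below, Compose is a minimal stand-in that chains callables the way a Compose-style container does (it stores the list and calls each element in turn); ToyDataset and to_append_transform are hypothetical names used for illustration.

```python
class Compose:
    """Stand-in for a Compose-style container that chains callables."""

    def __init__(self, transforms):
        self.transforms = transforms  # nothing is validated here

    def __call__(self, x):
        for transform in self.transforms:
            x = transform(x)
        return x


class ToyDataset:
    def __init__(self, data, transform=None):
        self.data, self.transform = data, transform

    def __getitem__(self, idx):
        x, y = self.data[idx]
        return (x if self.transform is None else self.transform(x)), y


def to_append_transform(x):
    return x * 2


dataset = ToyDataset([(50, 10)], transform=None)
alias = dataset  # another part of the code holding the same instance

# Appending by assigning to the field: accepted silently,
# even though the first step of the pipeline is None
dataset.transform = Compose([dataset.transform, to_append_transform])

try:
    alias[0]  # the alias is affected too, and the None step fails only now
except TypeError as error:
    print('Crash when fetching a data point:', error)
```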

AvalancheDataset offers a very simple method to append transformations without incurring those issues. The .add_transforms(transform=None, target_transform=None) method appends the given transform(s) to the currently enabled transform group and returns a new dataset (actually a view) with the given transformations appended to the existing ones. The original dataset is not affected. One can also use .add_transforms_to_group(group_name, transform, target_transform) to change the transformations of a different group.

The next cell shows how to use .add_transforms(...) to append the to_append_transform transform defined in the cell above.

Note that by using .add_transforms(...):

The original dataset is not changed, which means that other parts of the code that use that dataset instance are not affected.
You don't need to worry about None transformations.
In order to revert to the original transformations you don't need to keep a copy of them: the original dataset is not affected!
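These three properties can be sketched with a toy class whose add_transforms method returns a view instead of mutating the dataset. AppendableDataset is hypothetical and handles only the X transform; it illustrates the non-mutating pattern and makes no claim about how AvalancheDataset is implemented internally.

```python
import copy


class AppendableDataset:
    """Toy dataset whose add_transforms returns a new view."""

    def __init__(self, data, transform=None):
        self.data = data
        # A missing transform is stored as an empty pipeline, so None is harmless
        self.transforms = [] if transform is None else [transform]

    def add_transforms(self, transform=None):
        view = copy.copy(self)
        # Build a new list so the original dataset keeps its own pipeline
        view.transforms = self.transforms + ([] if transform is None else [transform])
        return view

    def __getitem__(self, idx):
        x, y = self.data[idx]
        for transform in self.transforms:
            x = transform(x)
        return x, y


def to_append_transform(x):
    return x * 2


original = AppendableDataset([(50, 10)])  # starts with no transformation at all
appended = original.add_transforms(to_append_transform)
twice = appended.add_transforms(to_append_transform)

print(original[0], appended[0], twice[0])
```

Reverting to the original behaviour simply means using the original object again.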

Replacing transformations

The replacement operation follows the same idea (and offers the same benefits) as the append one. By using .replace_transforms(transform, target_transform) one can obtain a view of the original dataset in which the transformations for the current group are replaced with the given ones. One may also change transformations for other groups by passing the name of the group as the optional parameter group. As with any transform-related operation, the original dataset is not affected.

Note: one can use .replace_transforms(...) to remove previous transformations (by passing None as the new transform).

The following cell shows how to use .replace_transforms(...) to replace the transformations of the current group:

Freezing transformations

One last functionality regarding transformations is the ability to "freeze" transformations. Freezing transformations means permanently gluing transformations to the dataset so that they can't be replaced or changed in any way (which usually happens by mistake). Frozen transformations cannot be changed by using .replace_transforms(...) or even by changing the transform field directly.

One may wonder when this may come in handy... in fact, you will probably rarely need to freeze transformations. However, imagine having to instantiate the PermutedMNIST benchmark. You want the permutation transformation to not be changed by mistake. However, the end users do not know how the internal implementation of the benchmark works, so they may end up messing with those transformations. By freezing the permutation transformation, users cannot mess with it.

Transformations for all transform groups can be frozen at once by using .freeze_transforms(). Transformations can be frozen for a single group by using .freeze_group_transforms(group_name). As always, those methods return a view of the original dataset.

In this way, that transform can't be removed. However, remember that one can always append other transforms on top of frozen transforms.

The cell below shows that replace_transforms can't remove frozen transformations:
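A minimal pure-Python sketch of this behaviour follows. It assumes one possible mechanism (frozen transformations kept in a separate list that replacement never touches); this is an assumption made for illustration, and AvalancheDataset's actual implementation may differ. FreezableDataset, permute and shout are hypothetical names, with a string reversal standing in for a pixel permutation.

```python
import copy


class FreezableDataset:
    """Toy dataset separating frozen transforms from replaceable ones."""

    def __init__(self, data, transform=None):
        self.data = data
        self.frozen = []  # can no longer be replaced
        self.current = [] if transform is None else [transform]

    def _view(self, frozen, current):
        view = copy.copy(self)
        view.frozen, view.current = frozen, current
        return view

    def freeze_transforms(self):
        return self._view(self.frozen + self.current, [])

    def replace_transforms(self, transform):
        # Only the non-frozen part of the pipeline is replaced
        return self._view(self.frozen, [] if transform is None else [transform])

    def add_transforms(self, transform):
        return self._view(self.frozen, self.current + [transform])

    def __getitem__(self, idx):
        x, y = self.data[idx]
        for transform in self.frozen + self.current:
            x = transform(x)
        return x, y


def permute(x):
    return x[::-1]  # stands in for a fixed pixel permutation


def shout(x):
    return x.upper()


plain = FreezableDataset([('abc', 0)], transform=permute)
frozen = plain.freeze_transforms()

print(plain.replace_transforms(None)[0])   # permutation removed
print(frozen.replace_transforms(None)[0])  # permutation survives
print(frozen.add_transforms(shout)[0])     # appended on top of the frozen one
```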

Transformations wrap-up

This completes the Mini How-To for the functionalities of the AvalancheDataset related to transformations.

Here you learned how to use transformation groups and how to append/replace/freeze transformations in a simple way.

🤝 Run it on Google Colab

Preamble: PyTorch Datasets

Few words about PyTorch Datasets

This short preamble will briefly go through the basic notions of Dataset offered natively by PyTorch. A solid grasp of these notions are needed to understand:

How PyTorch data loading works in general
How AvalancheDatasets differs from PyTorch Datasets

📚 Dataset: general definition

In PyTorch, a Dataset is a class exposing two methods:

__len__(), which returns the amount of instances in the dataset (as an int).
__getitem__(idx), which returns the data point at index idx.

In other words, a Dataset instance is just an object for which, similarly to a list, one can simply:

Obtain its length using the Python len(dataset) function.
Obtain a single data point using the x, y = dataset[idx] syntax.

PyTorch Datasets

The PyTorch library offers 4 Dataset implementations:

Dataset: an interface defining the __len__ and __getitem__ methods.
TensorDataset: instantiated by passing X and Y tensors. Each row of the X and Y tensors is interpreted as a data point. The __getitem__(idx) method will simply return the idx-th row of X and Y tensors.
ConcatDataset: instantiated by passing a list of datasets. The resulting dataset is a concatenation of those datasets.
Subset: instantiated by passing a dataset and a list of indices. The resulting dataset will only contain the data points described by that list of indices.

As explained in the mini How-Tos, Avalanche offers a customized version for all these 4 datasets.

Transformations

As explained in the mini How-Tos, the AvalancheDataset class implements a very rich and powerful set of functionalities for managing transformations.

Quick note on the IterableDataset class

DataLoader

The Dataset is a very simple object that only returns one data point given its index. In order to create minibatches and speed-up the data loading process, a DataLoader is required.

Preamble wrap-up

To wrap-up, let's see how the native, non-Avalanche, PyTorch components work in practice. In the following code we create a TensorDataset and then we load it in minibatches using a DataLoader.

import torch
from torch.utils.data.dataset import TensorDataset
from torch.utils.data.dataloader import DataLoader

# Create a dataset of 100 data points described by 22 features + 1 class label
x_data = torch.rand(100, 22)
y_data = torch.randint(0, 5, (100,))

# Create the Dataset
my_dataset = TensorDataset(x_data, y_data)

# Create the DataLoader
my_dataloader = DataLoader(my_dataset, batch_size=10, shuffle=True, num_workers=4)

# Run one epoch
for x_minibatch, y_minibatch in my_dataloader:
    print('Loaded minibatch of', len(x_minibatch), 'instances')
# Output: "Loaded minibatch of 10 instances" x10 times

Next steps

With these notions in mind, you can start start your journey on understanding the functionalities offered by the AvalancheDatasets by going through the Mini How-Tos.

Please refer to the list of the Mini How-Tos regarding AvalancheDatasets for a complete list. It is recommended to start with the "Creating AvalancheDatasets" Mini How-To.

🤝 Run it on Google Colab

You can run this chapter and play with it on Google Colaboratory by clicking here:

Creating AvalancheDatasets

Creation and manipulation of AvalancheDatasets and its subclasses.

The dataset carried by the experience.dataset field is always an AvalancheDataset.
Benchmark creation functions accept AvalancheDatasets to create benchmarks where a finer control over task labels is required.
Internally, benchmarks are created by manipulating AvalancheDatasets.

It is warmly recommended to run this page as a notebook using Colab (info at the bottom of this page).

Let's start by installing avalanche:

!pip install avalanche-lib

AvalancheDataset vs PyTorch Dataset

This mini How-To will guide you through the main ways used to instantiate an AvalancheDataset.

First thing: the base class AvalancheDataset is a wrapper for existing datasets. Only two things must be considered when wrapping an existing dataset:

Apart from the x and y values, the resulting AvalancheDataset will also return a third value: the task label (which defaults to 0).
The wrapped dataset must contain a valid targets field.

Avalanche exposes 4 classes of AvalancheDatasets which map exactly the 4 Dataset classes offered by PyTorch:

AvalancheDataset: the base class, which acts a wrapper to existing Dataset instances.
AvalancheTensorDataset: equivalent to PyTorch TesnsorDataset.
AvalancheSubset: equivalent to PyTorch Subset.
AvalancheConcatDataset: equivalent to PyTorch ConcatDataset.

🛠️ Create an AvalancheDataset

Given a dataset (like MNIST), an AvalancheDataset can be instantiated as follows:

from avalanche.benchmarks.utils import AvalancheDataset
from torchvision.datasets import MNIST

# Instantiate the MNIST train dataset from torchvision
mnist_dataset = MNIST('mnist_data', download=True)

# Create the AvalancheDataset
mnist_avalanche_dataset = AvalancheDataset(mnist_dataset)

# Obtain the first instance from the original dataset
x, y = mnist_dataset[0]
print(f'x={x}, y={y}')
# Output: "x=<PIL.Image.Image image mode=L size=28x28 at 0x7FBEDFDB2430>, y=5"

# Obtain the first instance from the AvalancheDataset
x, y, t = mnist_avalanche_dataset[0]
print(f'x={x}, y={y}, t={t}')
# Output: "x=<PIL.Image.Image image mode=L size=28x28 at 0x7FBEEFD3A850>, y=5, t=0"

Useful tip: if you are not sure if you are dealing with a PyTorch Dataset or an AvalancheDataset, or if you want to ignore task labels, you can use this syntax:

# You can use "x, y, *_" to manage both kinds of Datasets
x, y, *_ = mnist_dataset[0]  # OK
x, y, *_ = mnist_avalanche_dataset[0]  # OK

The AvalancheTensorDataset

The PyTorch TensorDataset is one of the most useful Dataset classes as it can be used to quickly prototype the data loading part of your code.

A TensorDataset can be wrapped in an AvalancheDataset just like any Dataset, but this is not much convenient, as shown below:

import torch
from torch.utils.data import TensorDataset


# Create 10 instances described by 7 features 
x_data = torch.rand(10, 7)

# Create the class labels for the 10 instances
y_data = torch.randint(0, 5, (10,))

# Create the tensor dataset
tensor_dataset = TensorDataset(x_data, y_data)

# Wrap it in an AvalancheDataset
wrapped_tensor_dataset = AvalancheDataset(tensor_dataset)

# Obtain the first instance from the dataset
x, y, t = wrapped_tensor_dataset[0]
print(f'x={x}, y={y}, t={t}')
# Output: "x=tensor([0.6329, 0.8495, 0.1853, 0.7254, 0.7893, 0.8079, 0.1106]), y=4, t=0"

Instead, it is recommended to use the AvalancheTensorDataset class to get the same result. In this way, you can just skip one intermediate step.

from avalanche.benchmarks.utils import AvalancheTensorDataset

# Create the tensor dataset
avl_tensor_dataset = AvalancheTensorDataset(x_data, y_data)

# Obtain the first instance from the AvalancheTensorDataset
x, y, t = avl_tensor_dataset[0]
print(f'x={x}, y={y}, t={t}')
# Output: "x=tensor([0.6329, 0.8495, 0.1853, 0.7254, 0.7893, 0.8079, 0.1106]), y=4, t=0"

The cell below shows the content of the target field of the dataset created in the cell above. Notice that the targets field has been filled with the content of the second Tensor (y_data).

# Check the targets field
print('y_data=', y_data)
 # Output: "y_data= tensor([4, 3, 3, 2, 0, 1, 3, 3, 3, 2])"

print('targets field=', avl_tensor_dataset.targets)
# Output: "targets field= [tensor(4), tensor(3), tensor(3), tensor(2), 
#          tensor(0), tensor(1), tensor(3), tensor(3), tensor(3), tensor(2)]"

The AvalancheSubset and AvalancheConcatDataset classes

Avalanche offers the AvalancheSubset and AvalancheConcatDataset implementations that extend the functionalities of PyTorch Subset and ConcatDataset.

Here we create a toy dataset in which each X and Y values are ints. We then obtain a subset of it by creating an AvalancheSubset:

from avalanche.benchmarks.utils import AvalancheSubset

# Define the X values of 10 instances (each instance is an int)
x_data_toy = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]

# Define the class labels for the 10 instances
y_data_toy = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

# Create the tensor dataset
# Note: AvalancheSubset can also be applied to PyTorch TensorDataset directly!
# However, note that PyTorch TensorDataset doesn't support Python lists...
# ... (it only supports Tensors) while AvalancheTensorDataset does.
toy_dataset = AvalancheTensorDataset(x_data_toy, y_data_toy) 

# Define the indices for the subset
# Here we want to obtain a subset containing only the data points...
# ... at indices 0, 5, 8, 2 (in this specific order)
subset_indices = [0, 5, 8, 2]

# Create the subset
avl_subset = AvalancheSubset(toy_dataset, indices=subset_indices)
print('The subset contains', len(avl_subset), 'instances.')
# Output: "The subset contains 4 instances."

# Obtain instances from the AvalancheSubset
for x, y, t in avl_subset:
    print(f'x={x}, y={y}, t={t}')
# Output:
# x=50, y=10, t=0
# x=55, y=15, t=0
# x=58, y=18, t=0
# x=52, y=12, t=0
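
The index remapping performed by a subset can be pictured with the pure-Python sketch below. It is only an illustration under simplifying assumptions, not the way AvalancheSubset is implemented: ToySubset is a hypothetical class, and the parent dataset is a plain list of (x, y, t) tuples holding the same values used above.

```python
class ToySubset:
    """Hypothetical stand-in: a reordered view over a parent dataset."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        # Translate the position in the subset into a position in the parent
        return self.dataset[self.indices[idx]]


# Parent dataset: a list of (x, y, t) tuples (assumed layout for this sketch)
parent = [(x, y, 0) for x, y in zip(range(50, 60), range(10, 20))]
toy_subset = ToySubset(parent, [0, 5, 8, 2])

subset_items = [toy_subset[i] for i in range(len(toy_subset))]
print(subset_items)
```

The parent data is never copied: the subset only stores the list of indices, which is why the order given in the indices list is the order in which instances come back.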

Concatenation is even simpler. Just like with PyTorch ConcatDataset, one can easily concatenate datasets with AvalancheConcatDataset.

Both AvalancheConcatDataset and PyTorch ConcatDataset accept a list of datasets to concatenate.

from avalanche.benchmarks.utils import AvalancheConcatDataset

# Define the 2 datasets to be concatenated
x_data_toy_1 = [50, 51, 52, 53, 54]
y_data_toy_1 = [10, 11, 12, 13, 14]
x_data_toy_2 = [60, 61, 62, 63, 64]
y_data_toy_2 = [20, 21, 22, 23, 24]

# Create the datasets
toy_dataset_1 = AvalancheTensorDataset(x_data_toy_1, y_data_toy_1) 
toy_dataset_2 = AvalancheTensorDataset(x_data_toy_2, y_data_toy_2) 

# Create the concat dataset
avl_concat = AvalancheConcatDataset([toy_dataset_1, toy_dataset_2])
print('The concat dataset contains', len(avl_concat), 'instances.')
# Output: "The concat dataset contains 10 instances."

# Obtain instances from the AvalancheConcatDataset
for x, y, t in avl_concat:
    print(f'x={x}, y={y}, t={t}')
# Output:
# x=50, y=10, t=0
# x=51, y=11, t=0
# x=52, y=12, t=0
# x=53, y=13, t=0
# x=54, y=14, t=0
# x=60, y=20, t=0
# x=61, y=21, t=0
# x=62, y=22, t=0
# x=63, y=23, t=0
# x=64, y=24, t=0
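
To see how a global index can be routed to the right underlying dataset, here is a minimal pure-Python sketch. ToyConcat is a hypothetical class written for this page, not the Avalanche implementation; it assumes the common approach of keeping cumulative sizes and looking them up with a binary search.

```python
from bisect import bisect_right
from itertools import accumulate


class ToyConcat:
    """Hypothetical stand-in: chains several datasets one after another."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # cumulative_sizes[i] = total number of items in datasets[0..i]
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find the dataset that owns the global index
        dataset_idx = bisect_right(self.cumulative_sizes, idx)
        # Convert the global index into an index local to that dataset
        offset = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][idx - offset]


first = [(x, y, 0) for x, y in zip(range(50, 55), range(10, 15))]
second = [(x, y, 0) for x, y in zip(range(60, 65), range(20, 25))]
toy_concat = ToyConcat([first, second])

print(len(toy_concat))
print(toy_concat[4], toy_concat[5])
```

Index 4 is the last element of the first dataset, while index 5 is resolved to local index 0 of the second one.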

Dataset Creation wrap-up

This Mini How-To showed you how to create instances of AvalancheDataset (and its subclasses).

Other Mini How-Tos will guide you through the functionalities offered by AvalancheDataset. The list of Mini How-Tos can be found here.

🤝 Run it on Google Colab

You can run this chapter and play with it on Google Colaboratory by clicking here:

Advanced Transformations

Dealing with transformations (groups, appending, replacing, freezing).

Have multiple transformation "groups" in the same dataset (like separate train and test transformations).
Append, replace and remove transformations, even by using nested Subset/Concat Datasets.
Freeze transformations, so that they can't be changed.

It is strongly recommended to run this page as a notebook using Colab (info at the bottom of this page).

Let's start by installing Avalanche:

Transformation groups

# Not recommended: use transform_groups instead
avl_mnist_same_transforms = AvalancheDataset(mnist_dataset, transform=train_transformation)

Using `.train()` and `.eval()`

One can check that the correct transformation group is in use by looking at the content of the transform/target_transform fields.

# Obtain a view of the dataset in which eval transformations are enabled
avl_mnist_eval = avl_mnist_transform.eval()

# Obtain a view of the dataset in which we get back to train transforms
# Basically, avl_mnist_transform ~= avl_mnist_train
avl_mnist_train = avl_mnist_eval.train()

# Check the current transformations function for the 3 datasets
print('Original dataset transformation:', avl_mnist_transform.transform)
# Output:
# Original dataset transformation: Compose(
#     RandomRotation(degrees=[-45.0, 45.0], interpolation=nearest, expand=False, fill=0)
#     ToTensor()
# )
print('--------------------------------')
print('Eval version of the dataset:', avl_mnist_eval.transform)
# Output: "Eval version of the dataset: ToTensor()"
print('--------------------------------')
print('Back to train transformations:', avl_mnist_train.transform)
# Output:
# Back to train transformations: Compose(
#     RandomRotation(degrees=[-45.0, 45.0], interpolation=nearest, expand=False, fill=0)
#     ToTensor()
# )
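
The idea of switching groups by creating views can be sketched in plain Python. The class below is a hypothetical toy, not Avalanche code, and it assumes one transformation per group (the target transformation is left out for brevity). Each call returns a new lightweight object that shares the data with the original.

```python
class ToyGroupedDataset:
    """Hypothetical dataset with named transformation groups."""

    def __init__(self, data, transform_groups, current_group="train"):
        self.data = data
        self.transform_groups = transform_groups
        self.current_group = current_group

    @property
    def transform(self):
        # The active transformation depends only on the current group
        return self.transform_groups[self.current_group]

    def with_group(self, group_name):
        if group_name not in self.transform_groups:
            raise KeyError(f"Unknown transformation group: {group_name}")
        # Return a new view; the underlying data is shared, not copied
        return ToyGroupedDataset(self.data, self.transform_groups, group_name)

    def train(self):
        return self.with_group("train")

    def eval(self):
        return self.with_group("eval")

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        return x if self.transform is None else self.transform(x)


groups = {"train": lambda x: x * 2, "eval": None}
base = ToyGroupedDataset([1, 2, 3], groups)

eval_view = base.eval()          # view using the eval group
train_again = eval_view.train()  # view back on the train group

print(base[1], eval_view[1], train_again[1])
print(base.current_group)  # the original object is left untouched
```

Because every switch produces a new object, code that still holds a reference to the original dataset keeps seeing the group it started with.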

Custom transformation groups

In AvalancheDatasets the train and eval transformation groups are always available. However, AvalancheDataset also supports custom transformation groups.

The following example shows how to create an AvalancheDataset with an additional group named replay. We define the replay transformation as a random crop followed by the ToTensor operation.

replay_transform = transforms.Compose([
    transforms.RandomCrop(28, padding=4),
    transforms.ToTensor()
])

replay_target_transform = None

transform_groups_with_replay = {
    'train': (None, None),
    'eval': (None, None),
    'replay': (replay_transform, replay_target_transform)
}

AvalancheDataset(mnist_dataset, transform_groups=transform_groups_with_replay)

However, once created, the dataset will use the train group by default. There are two ways to switch to our custom group:

Set the group when creating the dataset using the initial_transform_group constructor parameter
Switch to the group using the .with_transforms(group_name) method

The .with_transforms(group_name) method behaves in the same way .train() and .eval() do by creating a view of the original dataset.

The following example shows how to use both methods:

# Method 1: create the dataset with "replay" as the default group
avl_mnist_custom_transform_1 = AvalancheDataset(
    mnist_dataset,
    transform_groups=transform_groups_with_replay,
    initial_transform_group='replay')

print(avl_mnist_custom_transform_1.transform)

# Method 2: switch to "replay" using `.with_transforms(group_name)`
avl_mnist_custom_transform_not_enabled = AvalancheDataset(
    mnist_dataset,
    transform_groups=transform_groups_with_replay)

avl_mnist_custom_transform_2 = avl_mnist_custom_transform_not_enabled.with_transforms('replay')
print(avl_mnist_custom_transform_2.transform)

# Both prints output:
# Compose(
#     RandomCrop(size=(28, 28), padding=4)
#     ToTensor()
# )

Appending transformations

# Append a transform by using torchvision datasets (>>> DON'T DO THIS! <<<)

# Create the dataset
mnist_dataset_w_totensor = MNIST('mnist_data', download=True, transform=transforms.ToTensor())

# Append a transform
to_append_transform = transforms.RandomCrop(size=(28, 28), padding=4)
mnist_dataset_w_totensor.transform = transforms.Compose(
    [mnist_dataset_w_totensor.transform, to_append_transform]
)
print(mnist_dataset_w_totensor.transform)
# Prints:
# Compose(
#     ToTensor()
#     RandomCrop(size=(28, 28), padding=4)
# )

This solution has several significant drawbacks:

The transformation field of the dataset is changed directly. This will affect other parts of the code that use that dataset instance.
If the initial transform is None, then Compose will not complain, but the process will crash later (try it yourself: replace the first element of Compose in the cell above with None, then try obtaining a data point from the dataset).
If you need to change transformations only temporarily to do some specific things in a limited part of the code, then you need to store the previous set of transformations in some variable in order to switch back to them later.
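
The second drawback can be reproduced without downloading any data. The sketch below uses ToyCompose, a hypothetical stand-in that, like the Compose behavior described above, stores the list without validating it and applies the elements in order. The error only appears when a data point is actually transformed.

```python
class ToyCompose:
    """Hypothetical stand-in that applies a list of callables in order."""

    def __init__(self, transforms):
        # No validation happens here, so a None entry goes unnoticed
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x


# Construction succeeds even though the first element is None
broken = ToyCompose([None, lambda x: x + 1])

try:
    broken(41)
    failure = None
except TypeError as err:
    # Calling None raises TypeError only at this point
    failure = err

print("Failed at call time:", failure is not None)
```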

The next cell shows how to use .add_transforms(...) to append the to_append_transform transform defined in the cell above.

# Create the dataset
avl_mnist = AvalancheDataset(MNIST('mnist_data', download=True), transform=transforms.ToTensor())

# Append a transformation. Simple as:
avl_mnist_appended_transform = avl_mnist.add_transforms(to_append_transform)

print('With appended transforms:', avl_mnist_appended_transform.transform)
# Prints:
# With appended transforms: Compose(
#     ToTensor()
#     RandomCrop(size=(28, 28), padding=4)
# )

# Check that the original dataset was not affected:
print('Original dataset:', avl_mnist.transform)
# Prints: "Original dataset: ToTensor()"

Note that by using .add_transforms(...):

The original dataset is not changed, which means that other parts of the code that use that dataset instance are not affected.
You don't need to worry about None transformations.
In order to revert to the original transformations you don't need to keep a copy of them: the original dataset is not affected!
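
The three points above follow from one design choice: appending returns a new object instead of editing the existing one. The following pure-Python sketch illustrates that choice under simplifying assumptions. ToyDataset, add_transform and chain are hypothetical names made up for this example and are not part of Avalanche.

```python
def chain(first, second):
    """Combine two optional callables; None means 'no transformation'."""
    if first is None:
        return second
    if second is None:
        return first
    return lambda x: second(first(x))


class ToyDataset:
    """Hypothetical dataset whose transformations are never edited in place."""

    def __init__(self, data, transform=None):
        self.data = data
        self.transform = transform

    def add_transform(self, new_transform):
        # Build a new view instead of changing self.transform
        return ToyDataset(self.data, chain(self.transform, new_transform))

    def __getitem__(self, idx):
        x = self.data[idx]
        return x if self.transform is None else self.transform(x)


original = ToyDataset([1, 2, 3])  # starts with no transformation (None)
doubled = original.add_transform(lambda x: x * 2)
doubled_plus_one = doubled.add_transform(lambda x: x + 1)

print(original[2], doubled[2], doubled_plus_one[2])
```

Reverting is just a matter of using the original object again, and the None case is handled once inside chain rather than by every caller.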

Replacing transformations

Note: one can use .replace_transforms(...) to remove previous transformations (by passing None as the new transform).

The following cell shows how to use .replace_transforms(...) to replace the transformations of the current group:

new_transform = transforms.RandomCrop(size=(28, 28), padding=4)

# Replace the transformation. Simple as:
avl_mnist_replaced_transform = avl_mnist.replace_transforms(new_transform, None)

print('With replaced transform:', avl_mnist_replaced_transform.transform)
# Prints: "With replaced transform: RandomCrop(size=(28, 28), padding=4)"

# Check that the original dataset was not affected:
print('Original dataset:', avl_mnist.transform)
# Prints: "Original dataset: ToTensor()"

Freezing transformations

The cell below shows a simplified excerpt from the Avalanche code that builds the permuted MNIST benchmark. First, a PixelsPermutation instance is created. That instance is a transformation that permutes the pixels of the input image. We then create the train and test sets. Once created, the transformations of those datasets are frozen using .freeze_transforms().

from avalanche.benchmarks.classic.cmnist import PixelsPermutation
import numpy as np
import torch

# Instantiate MNIST train and test sets
mnist_train = MNIST('mnist_data', train=True, download=True)
mnist_test = MNIST('mnist_data', train=False, download=True)
    
# Define the transformation used to permute the pixels
rng_seed = 4321
rng_permute = np.random.RandomState(rng_seed)
idx_permute = torch.from_numpy(rng_permute.permutation(784)).type(torch.int64)
permutation_transform = PixelsPermutation(idx_permute)

# Define the transforms group
perm_group_transforms = dict(
    train=(permutation_transform, None),
    eval=(permutation_transform, None)
)

# Create the datasets and freeze transforms
# Note: one can call "freeze_transforms" directly on the constructor result
# or do it in 2 separate steps. The result is the same.
# The next part shows both ways:

# Train set
permuted_train_set = AvalancheDataset(
    mnist_train, 
    transform_groups=perm_group_transforms).freeze_transforms()

# Test set
permuted_test_set = AvalancheDataset(
    mnist_test, transform_groups=perm_group_transforms, 
    initial_transform_group='eval')
permuted_test_set = permuted_test_set.freeze_transforms()

In this way, the permutation transform can't be removed. However, remember that one can always append other transforms on top of frozen transforms.

The cell below shows that replace_transforms can't remove frozen transformations:

# First, show that the image pixels are permuted
print('Before replace_transforms:')
display(permuted_train_set[0][0].resize((192, 192), 0))

# Try to remove the permutation
with_removed_transforms = permuted_train_set.replace_transforms(None, None)

print('After replace_transforms:')
display(permuted_train_set[0][0].resize((192, 192), 0))
display(with_removed_transforms[0][0].resize((192, 192), 0))
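
One way to picture freezing is a dataset that keeps two lists of transformations: a frozen one that no method can modify, and a regular one that can still be replaced or extended. The sketch below follows that assumption; it is a hypothetical toy model, not the Avalanche implementation, and it uses string reversal as a stand-in for the pixel permutation.

```python
class ToyFreezableDataset:
    """Hypothetical dataset with frozen and replaceable transformations."""

    def __init__(self, data, frozen=(), transforms=()):
        self.data = data
        self.frozen = tuple(frozen)          # can no longer be removed
        self.transforms = tuple(transforms)  # can still be replaced

    def freeze_transforms(self):
        # Move the current transformations into the frozen list
        return ToyFreezableDataset(self.data, self.frozen + self.transforms, ())

    def replace_transforms(self, new_transform):
        # Only the non-frozen list is replaced
        new = () if new_transform is None else (new_transform,)
        return ToyFreezableDataset(self.data, self.frozen, new)

    def add_transforms(self, new_transform):
        return ToyFreezableDataset(
            self.data, self.frozen, self.transforms + (new_transform,))

    def __getitem__(self, idx):
        x = self.data[idx]
        # Frozen transformations run first, then the replaceable ones
        for t in self.frozen + self.transforms:
            x = t(x)
        return x


def reverse_text(s):
    """Stand-in for the pixel permutation."""
    return s[::-1]


frozen_ds = ToyFreezableDataset(["abc"], transforms=[reverse_text]).freeze_transforms()
after_replace = frozen_ds.replace_transforms(None)  # the reversal survives
with_extra = frozen_ds.add_transforms(str.upper)    # appending still works

print(frozen_ds[0], after_replace[0], with_extra[0])
```

In this toy model, replacing with None leaves the frozen reversal in place, while a newly appended transformation runs after it.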

Transformations wrap-up

This completes the Mini How-To for the functionalities of the AvalancheDataset related to transformations.

Here you learned how to use transformation groups and how to append/replace/freeze transformations in a simple way.

Other Mini How-Tos will guide you through the other functionalities offered by the AvalancheDataset class. The list of Mini How-Tos can be found here.

🤝 Run it on Google Colab

You can run this chapter and play with it on Google Colaboratory by clicking here:

AvalancheDataset

Preamble: PyTorch Datasets

📚 Dataset: general definition

PyTorch Datasets

Transformations

Quick note on the IterableDataset class

DataLoader

Preamble wrap-up

Next steps

🤝 Run it on Google Colab

Creating AvalancheDatasets

AvalancheDataset vs PyTorch Dataset

🛠️ Create an AvalancheDataset

The AvalancheTensorDataset

The AvalancheSubset and AvalancheConcatDataset classes

Dataset Creation wrap-up

🤝 Run it on Google Colab

Advanced Transformations

Transformation groups

Using .train() and .eval()

Custom transformation groups

Appending transformations

Replacing transformations

Freezing transformations

Transformations wrap-up

🤝 Run it on Google Colab
