Benchmarks
Create your Continual Learning Benchmark and Start Prototyping
Welcome to the "benchmarks" tutorial of the "From Zero to Hero" series. In this part we will present the functionalities offered by the Benchmarks
module.
🎯 Nomenclature
First off, let's clarify the nomenclature we are going to use by introducing the following terms: Datasets, Scenarios, Benchmarks and Generators.
- By Dataset we mean a collection of examples that can be used for training or testing purposes, but not yet organized to be processed as a stream of batches or tasks. Since Avalanche is based on PyTorch, our datasets are torch.utils.data.Dataset objects.
- By Scenario we mean a particular setting, i.e. the specificities of the continual stream of data, that a continual learning algorithm will face.
- By Benchmark we mean a well-defined and carefully thought-out combination of a scenario with one or multiple datasets that we can use to assess our continual learning algorithms.
- By Generator we mean a function that, given a specific scenario and a dataset, can generate a Benchmark.
📚 The Benchmarks Module
The benchmarks module offers three types of utils:
Datasets: all the PyTorch datasets, plus additional ones prepared by our community that are particularly interesting for continual learning.
Classic Benchmarks: classic benchmarks used in the CL literature, ready to be used with great flexibility.
Benchmark Generators: a set of functions you can use to create your own benchmark starting from any kind of data and scenario. In particular, we distinguish two types of generators: Specific and Generic. The former let you create a benchmark based on clearly defined scenarios and PyTorch dataset(s); the latter are more generic and flexible, both in terms of scenario definition and in terms of the type of data they can manage.

Specific:
nc_benchmark: given one or multiple datasets it creates a benchmark instance based on scenarios where New Classes (NC) are encountered over time. Notable scenarios that can be created using this utility include Class-Incremental, Task-Incremental and Task-Agnostic scenarios.
ni_benchmark: it creates a benchmark instance based on scenarios where New Instances (NI), i.e. new examples of the same classes, are encountered over time. Notable scenarios that can be created using this utility include Domain-Incremental scenarios.
Generic:
filelist_benchmark: It creates a benchmark instance given a list of filelists.
paths_benchmark: It creates a benchmark instance given a list of file paths and class labels.
tensors_benchmark: It creates a benchmark instance given a list of tensors.
dataset_benchmark: It creates a benchmark instance given a list of pytorch datasets.
But let's see how we can use this module in practice!
🖼️ Datasets
Let's start with the Datasets. As we previously hinted, in Avalanche you'll find all the standard PyTorch datasets available in the torchvision package, as well as a few others that are useful for continual learning but not yet officially available within the PyTorch ecosystem.
Of course, the basic utilities ImageFolder and DatasetFolder can also be used. These are two classes that you can use to create a PyTorch dataset directly from your files (following a particular structure). You can read more about them in the official PyTorch documentation here.
We also provide the additional FilelistDataset and AvalancheDataset classes: the former constructs a dataset from a filelist (Caffe style) pointing to files anywhere on disk; the latter augments the basic PyTorch Dataset functionalities with an extension to better deal with a stack of transformations to be used during train and test.
🛠️ Benchmarks Basics
The Avalanche benchmarks (instances of the Scenario class) contain several attributes that characterize the benchmark. However, the most important ones are the train and test streams.
In Avalanche we often assume access to these two parallel streams of data (even though some benchmarks may not provide such a feature, but contain just a unique test set).
Each of these streams is an iterable, indexable and sliceable object composed of unique experiences. Experiences are batches of data (or "tasks") that can be provided with or without a specific task label.
Efficiency
It is worth mentioning that the data belonging to a stream is not loaded into RAM beforehand. Avalanche loads the data only when specific mini-batches are requested at training/test time, based on the policy defined by each Dataset implementation.
This means that memory requirements are very low, while speed is guaranteed by a multi-processing data-loading system based on the one defined in PyTorch.
Scenarios
So, as we have seen, each scenario object in Avalanche has several useful attributes that characterize the benchmark, including the two important train and test streams. Let's check in more detail what you can get from a scenario object:
Train and Test Streams
The train and test streams can be used for training and testing purposes, respectively. This is what you can do with these streams:
Experiences
Each stream can in turn be treated as an iterator that produces a unique experience, containing all the useful data regarding a batch or task in the continual stream our algorithms will face. Check out how you can use these experiences below:
🏛️ Classic Benchmarks
Now that we know how our benchmarks work in general through scenarios, streams and experiences objects, in this section we are going to explore the common benchmarks already available for you. Each can be created with one line of code, yet is flexible enough to allow proper tuning based on your needs:
Many of the classic benchmarks will automatically download the original datasets they are based on and put them under the "~/.avalanche/data" directory.
How to Use the Benchmarks
Let's see now how we can use the classic benchmarks, or the ones that you can create through the generators (see next section). For example, let's try out the classic PermutedMNIST benchmark (Task-Incremental scenario).
🐣 Benchmarks Generators
What if we want to create a new benchmark that is not present among the "Classic" ones? Well, in that case Avalanche offers a number of utilities that you can use to create your own benchmark with maximum flexibility: the benchmark generators!
Specific Generators
The specific scenario generators are useful when, starting from one or multiple PyTorch datasets, you want to create a "New Instances" or "New Classes" benchmark: i.e. they support the easy and flexible creation of Domain-Incremental, Class-Incremental or Task-Incremental scenarios, among others.
For the New Classes scenario you can use nc_benchmark, while for New Instances you can use ni_benchmark.
Let's start by creating the MNIST dataset object as we would normally do in Pytorch:
Then we can, for example, create a new benchmark based on MNIST and the classic Domain-Incremental scenario:
Or, we can create a benchmark based on MNIST and the Class-Incremental (what's commonly referred to as "Split-MNIST" benchmark):
Generic Generators
Finally, if you cannot create your ideal benchmark because it does not fit well into the aforementioned New Classes or New Instances scenarios, you can always use our generic generators:
filelist_benchmark
paths_benchmark
dataset_benchmark
tensors_benchmark
Let's start with the filelist_benchmark utility. This function is particularly useful when it is important to preserve a particular order of the patterns to be processed (for example, if they are frames of a video), or in general if we have data scattered around our drive and want to create a sequence of batches/tasks by providing only a txt file containing the list of their paths.
In Avalanche we follow the same format as the Caffe filelists ("path class_label"):

```
/path/to/a/file.jpg 0
/path/to/another/file.jpg 0
...
/path/to/another/file.jpg M
/path/to/another/file.jpg M
...
/path/to/another/file.jpg N
/path/to/another/file.jpg N
```
So let's download the classic "Cats vs Dogs" dataset as an example:
You can now see, in the content directory on Colab, the images we downloaded. We are now going to create the filelists and then use the filelist_benchmark function to create our benchmark:
In the previous cell we created a benchmark instance starting from file lists. However, paths_benchmark
is a better choice if you already have the list of paths directly loaded in memory:
Let us see how we can use the dataset_benchmark utility, where we can use several PyTorch datasets as different batches or tasks. This utility expects a list of datasets for the train, test (and other custom) streams. Each dataset will be used to create an experience:
Adding task labels can be achieved by wrapping each dataset in an AvalancheDataset. Apart from task labels, AvalancheDataset allows for more control over transformations and offers an ever-growing set of utilities (check the documentation for more details).
And finally, the tensors_benchmark
generator:
This completes the "Benchmark" tutorial for the "From Zero to Hero" series. We hope you enjoyed it!
🤝 Run it on Google Colab
You can run this chapter and play with it on Google Colaboratory: