Benchmarks

Create your Continual Learning Benchmark and Start Prototyping

Welcome to the "benchmarks" tutorial of the "From Zero to Hero" series. In this part we will present the functionalities offered by the Benchmarks module.

%pip install avalanche-lib==0.5

🎯 Nomenclature

Avalanche Benchmarks provide the data that you will use for training and evaluating your model. Benchmarks have the following structure:

  • A Benchmark is a collection of streams. Most benchmarks have at least a train_stream and a test_stream;

  • A Stream is a sequence of Experiences. It can be a list or a generator;

  • An Experience contains all the information available at a certain time t;

  • AvalancheDataset is a wrapper of PyTorch datasets. It provides functionalities used by the training module, such as concatenation, subsampling, and management of augmentations.

πŸ“š The Benchmarks Module

The benchmarks module offers:

  • Datasets: PyTorch datasets are wrapped in an AvalancheDataset to provide additional functionality.

  • Classic Benchmarks: classic benchmarks used in the CL literature, ready to be used with great flexibility.

  • Benchmarks Generators: a set of functions you can use to create your own benchmark and streams starting from any kind of data and scenario, such as class-incremental or task-incremental streams.

But let's see how we can use this module in practice!

πŸ–ΌοΈ Datasets

Let's start with the Datasets. When using Avalanche, your code will manipulate AvalancheDataset objects, a wrapper compatible with PyTorch and torchvision map-style datasets.

In this example we created a classification dataset. Avalanche expects an attribute targets for classification datasets, which is provided by MNIST and most other classification datasets. Avalanche provides concatenation and subsampling, which also keep the dataset attributes consistent.

πŸ›οΈ Classic Benchmarks

Most benchmarks will provide two streams: the train_stream and test_stream. Often, these are two parallel streams of the same length, where each experience is sampled from the same distribution (e.g. same set of classes). Some benchmarks may have a single test experience with the whole test dataset.

Experiences provide all the information needed to update the model, such as the new batch of data, and they may be decorated with attributes that are helpful for training or logging purposes. Long streams can be generated on-the-fly to reduce memory requirements and avoid long preprocessing times during the benchmark creation step.

We will use SplitMNIST, a popular CL benchmark which is the class-incremental version of MNIST.

🐣 Benchmarks Generators

The most basic way to create a benchmark is the benchmark_from_datasets function. It takes a list of datasets for each stream and returns a benchmark with the specified streams.

We can also split a validation stream off the training stream.

Experience Attributes

The Continual Learning nomenclature is overloaded and quite confusing. Avalanche has its own nomenclature to provide consistent naming across the library. For example:

  • Task-awareness: a model is task-aware if it requires task labels. Avalanche benchmarks can have task labels to support this use case;

  • Online: online streams are streams with small experiences (e.g. 10 samples). They look exactly like their "large batches" counterpart, except for the fact that len(experience.dataset) is small;

  • Boundary-awareness: a model is boundary-aware if it requires boundary labels. Boundary-free models are also called task-free in the literature (there is no accepted nomenclature for "boundary-aware" models). We don't use this nomenclature because tasks and boundaries are different concepts in Avalanche. Avalanche benchmarks can have boundary labels to support this use case. Even for boundary-free models, Avalanche benchmarks can provide boundary labels to support evaluation metrics that require them;

  • Classification: classification is the most common CL setting. Avalanche adds class labels to experiences to simplify the user's code. Similarly, Avalanche datasets keep track of targets to support this use case.

Avalanche experiences can be decorated with different attributes depending on the specific setting. Classic benchmarks already provide the attributes you need. We will see some examples of attributes and generators in the remaining part of this tutorial.

One general aspect of experience attributes is that they may not always be available. Sometimes, a model can use task labels during training but not at evaluation time. Other times, the model should never use task labels, but you may still need them for evaluation purposes (to compute task-aware metrics). Avalanche experiences have different modalities:

  • training mode

  • evaluation mode

  • logging mode

Each modality can provide access or mask some of the experience attributes. This mechanism allows you to easily add private attributes to the experience for logging purposes while ensuring that the model will not cheat by using that information.

Classification

Classification benchmarks follow the ClassesTimeline protocol and provide attributes about the classes in the stream.

Task Labels

Task-aware benchmarks add task labels, following the TaskAware protocol.

Online

To define online streams we need two things:

  • a mechanism to split a larger stream

  • attributes that indicate the boundaries (if necessary)

This is how you do it in Avalanche:

This completes the "Benchmark" tutorial for the "From Zero to Hero" series. We hope you enjoyed it!

🀝 Run it on Google Colab

You can run this chapter and play with it on Google Colaboratory: Open In Colab
