By **Dataset** we mean a collection of examples that can be used for training or testing purposes, but not already organized to be processed as a stream of batches or tasks. Since Avalanche is based on PyTorch, our datasets are `torch.utils.data.Dataset` objects.
By **Scenario** we mean a particular setting, i.e. the specificities about the continual stream of data that a continual learning algorithm will face.
By **Benchmark** we mean a well-defined and carefully thought out combination of a scenario with one or multiple datasets that we can use to assess our continual learning algorithms.
By **Generator** we mean a function that, given a specific scenario and a dataset, can generate a Benchmark.
The `benchmarks` module offers three types of utils:
**Generic**. The first ones let you create a benchmark based on clear scenarios and PyTorch dataset(s); the latter, instead, are more generic and flexible, both in terms of scenario definition and in terms of the type of data they can manage.
**Datasets**. As we previously hinted, in Avalanche you'll find all the standard PyTorch datasets available in the torchvision package, as well as a few others that are useful for continual learning but not already officially available within the PyTorch ecosystem.
`DatasetFolder` can be used. These are two classes that you can use to create a PyTorch Dataset directly from your files (following a particular structure). You can read more about them in the official PyTorch documentation here.
`AvalancheDataset` classes. The former constructs a dataset from a filelist (Caffe style) pointing to files anywhere on the disk. The latter augments the basic PyTorch Dataset functionalities with an extension to better deal with a stack of transformations to be used during train and test.
streams are iterable, indexable and sliceable objects that are composed of unique experiences. Experiences are batches of data (or "tasks") that can be provided with or without a specific task label.
The scenario object in Avalanche has several useful attributes that characterize the benchmark, including the two important train and test streams. Let's check what you can get from a scenario object in more detail:
`experience`, containing all the useful data regarding a batch or task in the continual stream our algorithms will face. Check out how you can use these experiences below:
`PermutedMNIST` benchmark (Task-Incremental scenario).
`filelist_benchmark` utility. This function is particularly useful when it is important to preserve a particular order of the patterns to be processed (for example, if they are frames of a video), or, in general, if we have data scattered around our drive and we want to create a sequence of batches/tasks providing only a txt file containing the list of their paths.
`content` directory on Colab the images we downloaded. We are now going to create the filelists and then use the `filelist_benchmark` function to create our benchmark:
`paths_benchmark` is a better choice if you already have the list of paths directly loaded in memory:
`dataset_benchmark` utility, where we can use several PyTorch datasets as different batches or tasks. This utility expects a list of datasets for the train, test (and other custom) streams. Each dataset will be used to create an experience:
`AvalancheDataset`. Apart from task labels, `AvalancheDataset` allows for more control over transformations and offers an ever-growing set of utilities (check the documentation for more details).