By **Dataset** we mean a collection of examples that can be used for training or testing purposes, but not already organized to be processed as a stream of batches or tasks. Since Avalanche is based on PyTorch, our datasets are `torch.utils.data.Dataset` objects.
By **Scenario** we mean a particular setting, i.e. the specificities about the continual stream of data that a continual learning algorithm will face.
By **Benchmark** we mean a well-defined and carefully thought out combination of a scenario with one or multiple datasets that we can use to assess our continual learning algorithms.
By **Generator** we mean a function that, given a specific scenario and a dataset, can generate a Benchmark.
The `benchmarks` module offers three types of utils:
**Generic**. The first ones let you create a benchmark based on clear scenarios and PyTorch dataset(s); the latter, instead, are more generic and flexible, both in terms of scenario definition and in terms of the type of data they can manage.
**Datasets**. As we previously hinted, in Avalanche you'll find all the standard PyTorch datasets available in the torchvision package, as well as a few others that are useful for continual learning but not already officially available within the PyTorch ecosystem.
`DatasetFolder` can be used. These are two classes that you can use to create a PyTorch Dataset directly from your files (following a particular structure). You can read more about them in the official PyTorch documentation here.
`AvalancheDataset` classes. The former constructs a dataset from a filelist (Caffe style) pointing to files anywhere on the disk. The latter augments the basic PyTorch Dataset functionalities with an extension to better deal with a stack of transformations to be used during train and test.
streams are iterable, indexable and sliceable objects that are composed of unique experiences. Experiences are batches of data (or "tasks") that can be provided with or without a specific task label.
The scenario object in Avalanche has several useful attributes that characterize the benchmark, including the two important train and test streams. Let's check what you can get from a scenario object in more detail:
`experience`, containing all the useful data regarding a batch or task in the continual stream our algorithms will face. Check out how you can use these experiences below:
`PermutedMNIST` benchmark (Task-Incremental scenario).
`filelist_benchmark` utility. This function is particularly useful when it is important to preserve a particular order of the patterns to be processed (for example, if they are frames of a video), or, in general, if we have data scattered around our drive and we want to create a sequence of batches/tasks providing only a txt file containing the list of their paths.
`content` directory on Colab the images we downloaded. We are now going to create the filelists and then use the `filelist_benchmark` function to create our benchmark:
`paths_benchmark` is a better choice if you already have the list of paths directly loaded in memory:
`dataset_benchmark` utility, where we can use several PyTorch datasets as different batches or tasks. This utility expects a list of datasets for the train, test (and other custom) streams. Each dataset will be used to create an experience:
`AvalancheDataset`. Apart from task labels, `AvalancheDataset` allows for more control over transformations and offers an ever-growing set of utilities (check the documentation for more details).