
AvalancheDataset

Dealing with AvalancheDatasets

The AvalancheDataset is an implementation of the PyTorch Dataset class that comes with many useful out-of-the-box functionalities. For most users, the AvalancheDataset can be used as a plain PyTorch Dataset. For classification problems, an AvalancheDataset returns x, y, t elements (input, target, task label). However, the AvalancheDataset can be easily extended for any custom needs.

A series of Mini How-Tos will guide you through the functionalities of the AvalancheDataset and its subclasses:

avalanche-datasets

Converting PyTorch Datasets to Avalanche Dataset

Datasets are a fundamental data structure for continual learning. Unlike offline training, in continual learning we often need to manipulate datasets to create streams, benchmarks, or to manage replay buffers. High-level utilities and predefined benchmarks already take care of the details for you, but you can easily manipulate the data yourself if you need to. This how-to explains:

PyTorch datasets and data loading
How to instantiate Avalanche Datasets
AvalancheDataset features

In Avalanche, the AvalancheDataset is everywhere:

The dataset carried by the experience.dataset field is always an AvalancheDataset.
Many benchmark creation functions accept AvalancheDatasets to create benchmarks.
Avalanche benchmarks are created by manipulating AvalancheDatasets.
Replay buffers also use AvalancheDataset to easily concatenate data and handle transformations.

📚 PyTorch Dataset: general definition

In PyTorch, a Dataset is a class exposing two methods:

__len__(), which returns the number of instances in the dataset (as an int).
__getitem__(idx), which returns the data point at index idx.

In other words, a Dataset instance is just an object for which, similarly to a list, one can simply:

Obtain its length using the Python len(dataset) function.
Obtain a single data point using the x, y = dataset[idx] syntax.
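The two-method protocol above can be sketched in plain Python. This is only an illustration: SquaresDataset is a hypothetical toy class, and it does not subclass torch.utils.data.Dataset so that the snippet has no dependencies (in real code you would normally subclass it).

```python
class SquaresDataset:
    """A toy map-style dataset: item idx is the pair (idx, idx ** 2)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Number of instances in the dataset, as an int
        return self.size

    def __getitem__(self, idx):
        # Return the data point (x, y) stored at position idx
        if not 0 <= idx < self.size:
            raise IndexError(idx)
        return idx, idx ** 2


dataset = SquaresDataset(5)
print(len(dataset))   # 5
x, y = dataset[3]
print(x, y)           # 3 9
```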

The content of the dataset can be either loaded in memory when the dataset is instantiated (like the torchvision MNIST dataset does) or, for big datasets like ImageNet, the content is kept on disk, with the dataset keeping the list of files in an internal field. In this case, data is loaded from the storage on-the-fly when __getitem__(idx) is called. The way those things are managed is specific to each dataset implementation.
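As a rough sketch of the on-disk strategy described above, the hypothetical LazyTextDataset below keeps only a list of file paths in memory and reads a file when __getitem__ is called. Small text files stand in for images here; the class name and file layout are assumptions made for the example.

```python
import os
import tempfile


class LazyTextDataset:
    """Keeps only file paths in memory; file content is read in __getitem__."""

    def __init__(self, file_paths):
        self.file_paths = list(file_paths)

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # The file is opened only now, when the data point is requested
        with open(self.file_paths[idx], "r") as f:
            return f.read()


# Build three small files to stand in for a large on-disk dataset
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(3):
    path = os.path.join(tmp_dir, f"sample_{i}.txt")
    with open(path, "w") as f:
        f.write(f"content of sample {i}")
    paths.append(path)

lazy_data = LazyTextDataset(paths)
print(len(lazy_data))  # 3
print(lazy_data[1])    # content of sample 1
```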

Quick note on the IterableDataset class

A variation of the standard Dataset exists in PyTorch: the IterableDataset. When using an IterableDataset, one can load the data points in a sequential way only (a tape-like approach). The dataset[idx] syntax and the len(dataset) function are not allowed. Avalanche does NOT support IterableDatasets. You shouldn't worry about this because, realistically, you will never encounter such datasets (at least in torchvision). If you need IterableDataset, let us know and we will consider adding support for them.
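To illustrate the difference, here is a minimal plain-Python sketch of an iterable-style dataset. CountingStream is a hypothetical class that defines only __iter__; it is not the PyTorch IterableDataset itself, but it shows why indexing and len() are unavailable with sequential-only access.

```python
class CountingStream:
    """A toy iterable-style dataset: it defines __iter__ only."""

    def __init__(self, size):
        self.size = size

    def __iter__(self):
        # Data points can only be produced one after the other
        for i in range(self.size):
            yield i, i % 2


stream = CountingStream(4)
points = list(stream)
print(points)  # [(0, 0), (1, 1), (2, 0), (3, 1)]

# Random access and len() are not available on this object
for operation in (lambda: stream[0], lambda: len(stream)):
    try:
        operation()
    except TypeError as err:
        print("TypeError:", err)
```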

How to Create an AvalancheDataset

To create an AvalancheDataset from a PyTorch dataset, you only need to pass the original data to the constructor, as follows:

!pip install avalanche-lib

import torch
from torch.utils.data.dataset import TensorDataset
from avalanche.benchmarks.utils import AvalancheDataset

# Create a dataset of 100 data points described by 22 features + 1 class label
x_data = torch.rand(100, 22)
y_data = torch.randint(0, 5, (100,))

# Create the Dataset
torch_data = TensorDataset(x_data, y_data)

avl_data = AvalancheDataset(torch_data)

The dataset is equivalent to the original one:

print(torch_data[0])
print(avl_data[0])

Classification Datasets

Most of the time, you can also use one of the utility functions in the benchmark utils, which also add attributes such as class and task labels to the dataset. For example, you can create a classification dataset using make_classification_dataset.

Classification dataset

returns triplets of the form <x, y, t>, where t is the task label (which defaults to 0).
The wrapped dataset must contain a valid targets field.

Avalanche provides some utility functions to create supervised classification datasets such as:

make_tensor_classification_dataset for tensor datasets

All of these will automatically create the targets and targets_task_labels attributes.

from avalanche.benchmarks.utils import make_classification_dataset

# First, we add a targets field to the dataset; AvalancheDataset will use it.
# Whenever possible, Avalanche tries to extract the targets from the dataset.
# Most torchvision datasets already have a targets field, so you can skip this step.
# Here we reuse y_data so that targets match the labels returned by the dataset.
torch_data.targets = y_data.tolist()
tls = [0 for _ in range(100)]  # one task label for each sample
sup_data = make_classification_dataset(torch_data, task_labels=tls)
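To make the <x, y, t> triplet format concrete, here is a dependency-free sketch. TaskLabelledDataset is a hypothetical wrapper written for this example, not Avalanche's implementation; it only mimics the behaviour described above, where each sample gets a task label that defaults to 0.

```python
class TaskLabelledDataset:
    """Hypothetical wrapper: turns (x, y) pairs into (x, y, t) triplets."""

    def __init__(self, data, task_labels=None):
        self.data = data
        # One task label per sample; defaults to 0 for every sample
        self.task_labels = task_labels if task_labels is not None else [0] * len(data)
        if len(self.task_labels) != len(data):
            raise ValueError("expected one task label per sample")

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x, y = self.data[idx]
        return x, y, self.task_labels[idx]


pairs = [([0.1, 0.2], 1), ([0.3, 0.4], 0)]
single_task = TaskLabelledDataset(pairs)
multi_task = TaskLabelledDataset(pairs, task_labels=[0, 1])
print(single_task[1])  # ([0.3, 0.4], 0, 0)
print(multi_task[1])   # ([0.3, 0.4], 0, 1)
```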

DataLoader

Avalanche provides some custom dataloaders to sample in a task-balanced way or to balance the replay buffer and current data, but you can also use the standard PyTorch DataLoader.

from torch.utils.data.dataloader import DataLoader

my_dataloader = DataLoader(avl_data, batch_size=10, shuffle=True)

# Run one epoch
for x_minibatch, y_minibatch in my_dataloader:
    print('Loaded minibatch of', len(x_minibatch), 'instances')
# Output: "Loaded minibatch of 10 instances" x10 times
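When a dataset returns <x, y, t> triplets, as classification datasets do, the loop has to unpack three values per minibatch. The sketch below shows this using only PyTorch. TripletDataset is a hypothetical stand-in for a classification dataset, used so that the example does not depend on Avalanche.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TripletDataset(Dataset):
    """Hypothetical stand-in for a classification dataset returning (x, y, t)."""

    def __init__(self, x, y, t):
        self.x, self.y, self.t = x, y, t

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx], self.t[idx]


triplets = TripletDataset(
    torch.rand(100, 22),
    torch.randint(0, 5, (100,)),
    torch.zeros(100, dtype=torch.long),
)
loader = DataLoader(triplets, batch_size=10, shuffle=True)

batch_shapes = []
for x_mb, y_mb, t_mb in loader:
    # Each field of the triplet is collated into its own tensor
    batch_shapes.append((tuple(x_mb.shape), tuple(y_mb.shape), tuple(t_mb.shape)))

print(len(batch_shapes))  # 10
print(batch_shapes[0])    # ((10, 22), (10,), (10,))
```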

Dataset Operations: Concatenation and SubSampling

While PyTorch provides two different classes for concatenation and subsampling (ConcatDataset and Subset), Avalanche implements them as dataset methods. These operations return a new dataset, leaving the original one unchanged.

cat_data = avl_data.concat(avl_data)
print(len(cat_data))  # 100 + 100 = 200
print(len(avl_data))  # 100, original data stays the same

sub_data = avl_data.subset(list(range(50)))
print(len(sub_data))  # 50
print(len(avl_data))  # 100, original data stays the same
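For comparison, the same two operations in plain PyTorch go through the ConcatDataset and Subset wrapper classes mentioned above. This sketch uses only torch; the variable names are chosen for the example.

```python
import torch
from torch.utils.data import ConcatDataset, Subset, TensorDataset

base = TensorDataset(torch.rand(100, 22), torch.randint(0, 5, (100,)))

# Concatenation is a wrapper class around a list of datasets
cat_torch = ConcatDataset([base, base])
# Subsampling is a wrapper class around a dataset and a list of indices
sub_torch = Subset(base, list(range(50)))

print(len(cat_torch))  # 200
print(len(sub_torch))  # 50
print(len(base))       # 100, the wrapped dataset is unchanged

# Index 100 of the concatenation maps back to index 0 of the second copy
x_cat, y_cat = cat_torch[100]
x_base, y_base = base[0]
print(torch.equal(x_cat, x_base))  # True
```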

Dataset Attributes

AvalancheDataset lets you add attributes to datasets. Attributes are named arrays that carry some information that is propagated by concatenation and subsampling operations. For example, classification datasets use this functionality to manage class and task labels.

tls = [0 for _ in range(100)] # one task label for each sample
sup_data = make_classification_dataset(torch_data, task_labels=tls)
print(sup_data.targets.name, len(sup_data.targets._data))
print(sup_data.targets_task_labels.name, len(sup_data.targets_task_labels._data))

# after subsampling
sub_data = sup_data.subset(range(10))
print(sub_data.targets.name, len(sub_data.targets._data))
print(sub_data.targets_task_labels.name, len(sub_data.targets_task_labels._data))

# after concat
cat_data = sup_data.concat(sup_data)
print(cat_data.targets.name, len(cat_data.targets._data))
print(cat_data.targets_task_labels.name, len(cat_data.targets_task_labels._data))

Thanks to DataAttributes, you can freely operate on your data (e.g. to manage a replay buffer) without losing class or task labels. This makes it easy to manage multi-task datasets or to balance datasets by class.
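The propagation idea can be sketched without Avalanche. AttributedData below is a hypothetical toy class, not the library's DataAttribute implementation: it stores named per-sample lists and applies every subset or concat operation to the samples and to each attribute in the same way.

```python
class AttributedData:
    """Hypothetical sketch: samples plus named per-sample attribute lists."""

    def __init__(self, samples, **attributes):
        for name, values in attributes.items():
            if len(values) != len(samples):
                raise ValueError(f"attribute {name!r} needs one value per sample")
        self.samples = list(samples)
        self.attributes = {name: list(values) for name, values in attributes.items()}

    def __len__(self):
        return len(self.samples)

    def subset(self, indices):
        # Select the same indices from the samples and from every attribute
        indices = list(indices)
        return AttributedData(
            [self.samples[i] for i in indices],
            **{name: [values[i] for i in indices] for name, values in self.attributes.items()},
        )

    def concat(self, other):
        # Concatenate the samples and every attribute, in the same order
        return AttributedData(
            self.samples + other.samples,
            **{name: values + other.attributes[name] for name, values in self.attributes.items()},
        )


data = AttributedData(
    ["a", "b", "c", "d"],
    targets=[0, 1, 0, 1],
    targets_task_labels=[0, 0, 1, 1],
)
sub = data.subset([1, 3])
cat = sub.concat(data)

print(sub.attributes["targets"])              # [1, 1]
print(sub.attributes["targets_task_labels"])  # [0, 1]
print(cat.attributes["targets"])              # [1, 1, 0, 1, 0, 1]

# Balancing by class becomes a lookup on the attribute
class_1 = [i for i, y in enumerate(cat.attributes["targets"]) if y == 1]
print(class_1)  # [0, 1, 3, 5]
```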

Transformations

Most datasets from the torchvision libraries (as well as datasets found "in the wild") allow for a transformation function to be passed to the dataset constructor. The support for transformations is not mandatory for a dataset, but it is quite common to support them. The transformation is used to process the X value of a data point before returning it. This is used to normalize values, apply augmentations, etcetera.
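As a minimal illustration of a constructor-level transformation, the sketch below applies a callable to the X value only. TransformedDataset and scale_to_unit are hypothetical names introduced for this example; torchvision datasets follow the same pattern with their transform parameter.

```python
class TransformedDataset:
    """Hypothetical dataset that accepts an optional transform for X values."""

    def __init__(self, pairs, transform=None):
        self.pairs = pairs
        self.transform = transform

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        x, y = self.pairs[idx]
        if self.transform is not None:
            # Only the X value is processed; the label is returned as-is
            x = self.transform(x)
        return x, y


def scale_to_unit(pixels):
    # Map 8-bit pixel intensities from [0, 255] to [0.0, 1.0]
    return [p / 255 for p in pixels]


raw = [([0, 51, 255], 7), ([102, 204, 0], 3)]
plain = TransformedDataset(raw)
normalized = TransformedDataset(raw, transform=scale_to_unit)

print(plain[0])       # ([0, 51, 255], 7)
print(normalized[0])  # ([0.0, 0.2, 1.0], 7)
```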

AvalancheDataset implements a very rich and powerful set of functionalities for managing transformations. You can learn more about them in the Advanced Transformations How-To.

Next steps

With these notions in mind, you can start your journey on understanding the functionalities offered by the AvalancheDatasets by going through the Mini How-Tos.

Please refer to the list of the Mini How-Tos regarding AvalancheDatasets for a complete list. It is recommended to start with the "Creating AvalancheDatasets" Mini How-To.

🤝 Run it on Google Colab

avalanche-transformations

Dealing with transformations (groups, appending, replacing, freezing).

While torchvision (and other) datasets typically have a fixed set of transformations, AvalancheDataset also provides some additional functionalities. AvalancheDatasets can:

Have multiple transformation "groups" in the same dataset (like separate train and eval transformations).
Manipulate transformations by freezing, replacing and removing them.

The following sub-sections show examples on how to use these features. It is warmly recommended to run this page as a notebook using Colab (info at the bottom of this page).

Let's start by installing Avalanche:

!pip install avalanche-lib

Transformation groups

AvalancheDatasets can contain multiple transformation groups. This can be useful to keep train and test transformations in the same dataset and to have different sets of transformations. For instance, you can easily add ad-hoc transformations to use for replay data.

For classification datasets, we follow torchvision conventions. Therefore, make_classification_dataset supports transform, which is applied to input (X) values, and target_transform, which is applied to class labels (Y). The latter is rarely used. This means that a transformation group is a pair of transformations to be applied to the X and Y values of each instance returned by the dataset. In both torchvision and Avalanche implementations, a transformation must be a function (or other callable object) that accepts one input (the X or Y value) and outputs its transformed version. A comprehensive guide on transformations can be found in the torchvision documentation.
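The "pair of callables" idea can be shown in a few lines of plain Python. The helper apply_group and the dictionary toy_groups are hypothetical names used only for this sketch; they illustrate how a (transform, target_transform) pair acts on one (x, y) instance, not how Avalanche applies groups internally.

```python
def apply_group(group, x, y):
    """Apply a (transform, target_transform) pair to one (x, y) instance."""
    transform, target_transform = group
    if transform is not None:
        x = transform(x)
    if target_transform is not None:
        y = target_transform(y)
    return x, y


# Two toy groups: each callable takes one value and returns its transformed version
toy_groups = {
    "train": (lambda x: [v * 2 for v in x], None),
    "eval": (None, lambda y: y + 10),
}

instance = ([1, 2, 3], 4)
print(apply_group(toy_groups["train"], *instance))  # ([2, 4, 6], 4)
print(apply_group(toy_groups["eval"], *instance))   # ([1, 2, 3], 14)
```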

In the following example, a MNIST dataset is created and then wrapped in an AvalancheDataset. When creating the AvalancheDataset, we can set train and eval transformations by passing a transform_groups parameter. Train transformations usually include some form of random augmentation, while eval transformations usually include a sequence of deterministic transformations only. Here we define the sequence of train transformations as a random rotation followed by the ToTensor operation. The eval transformations only include the ToTensor operation.

from torchvision import transforms
from torchvision.datasets import MNIST
from avalanche.benchmarks.utils import make_classification_dataset

mnist_dataset = MNIST('mnist_data', download=True)

# Define the training transformation for X values
train_transformation = transforms.Compose([
    transforms.RandomRotation(45),
    transforms.ToTensor(),
])
# Define the training transformation for Y values (rarely used)
train_target_transformation = None

# Define the test transformation for X values
eval_transformation = transforms.ToTensor()
# Define the test transformation for Y values (rarely used)
eval_target_transformation = None

transform_groups = {
    'train': (train_transformation, train_target_transformation),
    'eval': (eval_transformation, eval_target_transformation)
}

avl_mnist_transform = make_classification_dataset(mnist_dataset, transform_groups=transform_groups)

Of course, one can also just use the transform and target_transform constructor parameters to set the transformations for both the train and the eval groups. However, it is recommended to use the approach based on transform_groups (shown in the code above) as it is much more flexible.

# Not recommended: use transform_groups instead
avl_mnist_same_transforms = make_classification_dataset(mnist_dataset, transform=train_transformation)

Using `.train()` and `.eval()`

The default behaviour of the AvalancheDataset is to use transformations from the train group. However, one can easily obtain a version of the dataset where the eval group is used. Note: when obtaining the dataset of experiences from the test stream, those datasets will already be using the eval group of transformations so you don't need to switch to the eval group ;).

You can switch between the train and eval groups using the .train() and .eval() methods to obtain a copy (view) of the dataset with the proper transformations enabled. As a general rule, methods that manipulate the AvalancheDataset fields (and transformations) always create a view of the dataset. The original dataset is never changed.

In the following cell we use the avl_mnist_transform dataset created in the cells above. We first obtain a view of it in which eval transformations are enabled. Then, starting from this view, we obtain a version of it in which train transformations are enabled. We want to double-stress that .train() and .eval() never change the group of the dataset on which they are called: they always create a view.

One can check that the correct transformation group is in use by looking at the content of the transform/target_transform fields.

# Obtain a view of the dataset in which eval transformations are enabled
avl_mnist_eval = avl_mnist_transform.eval()

# Obtain a view of the dataset in which we get back to train transforms
# Basically, avl_mnist_transform ~= avl_mnist_train
avl_mnist_train = avl_mnist_eval.train()

# we are looking inside the dataset to check the transformations.
# in real code, you never need to do this ;)
cgroup = avl_mnist_transform._transform_groups.current_group
print("Original dataset transformations: (train group by default)")
# notice that the original transforms are unchanged.
print(avl_mnist_transform._transform_groups.transform_groups[cgroup])

print("\neval mode dataset transformations:")
cgroup = avl_mnist_eval._transform_groups.current_group
print(avl_mnist_eval._transform_groups.transform_groups[cgroup])

print("\ntrain mode dataset transformations:")
cgroup = avl_mnist_train._transform_groups.current_group
print(avl_mnist_train._transform_groups.transform_groups[cgroup])

Custom transformation groups

In AvalancheDatasets the train and eval transformation groups are always available. However, AvalancheDataset also supports custom transformation groups.

The following example shows how to create an AvalancheDataset with an additional group named replay. We define the replay transformation as a random crop followed by the ToTensor operation.

from avalanche.benchmarks.utils import AvalancheDataset

replay_transform = transforms.Compose([
    transforms.RandomCrop(28, padding=4),
    transforms.ToTensor()
])

replay_target_transform = None

transform_groups_with_replay = {
    'train': (None, None),
    'eval': (None, None),
    'replay': (replay_transform, replay_target_transform)
}

AvalancheDataset(mnist_dataset, transform_groups=transform_groups_with_replay)

However, once created, the dataset will use the train group. You can switch to the custom group using the .with_transforms(group_name) method. The .with_transforms(group_name) method behaves in the same way .train() and .eval() do, by creating a view of the original dataset.

avl_mnist_custom_transform_not_enabled = AvalancheDataset(
    mnist_dataset,
    transform_groups=transform_groups_with_replay)

avl_mnist_custom_transform_2 = avl_mnist_custom_transform_not_enabled.with_transforms('replay')
cgroup = avl_mnist_custom_transform_2._transform_groups.current_group
print(avl_mnist_custom_transform_2._transform_groups.transform_groups[cgroup])

# prints output:
# Compose(
#     RandomCrop(size=(28, 28), padding=4)
#     ToTensor()
# )
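The view semantics shared by .train(), .eval() and .with_transforms(group_name) can be sketched in plain Python. GroupedDataset is a hypothetical toy class written for this example, not Avalanche's code: switching groups returns a shallow copy that shares the data and differs only in which group is active.

```python
import copy


class GroupedDataset:
    """Hypothetical sketch of a dataset holding several transformation groups."""

    def __init__(self, pairs, transform_groups, current_group="train"):
        self.pairs = pairs
        self.transform_groups = transform_groups
        self.current_group = current_group

    def with_transforms(self, group_name):
        if group_name not in self.transform_groups:
            raise KeyError(group_name)
        # Shallow copy: the view shares the data but has its own current group
        view = copy.copy(self)
        view.current_group = group_name
        return view

    def train(self):
        return self.with_transforms("train")

    def eval(self):
        return self.with_transforms("eval")

    def __getitem__(self, idx):
        x, y = self.pairs[idx]
        transform, target_transform = self.transform_groups[self.current_group]
        if transform is not None:
            x = transform(x)
        if target_transform is not None:
            y = target_transform(y)
        return x, y


groups = {"train": (lambda x: x + 0.5, None), "eval": (None, None)}
original = GroupedDataset([(1.0, 0), (2.0, 1)], groups)
eval_view = original.eval()

print(original.current_group, original[0])    # train (1.5, 0)
print(eval_view.current_group, eval_view[0])  # eval (1.0, 0)
print(eval_view.pairs is original.pairs)      # True
```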

Replacing transformations

The replacement operation follows the same idea as the other transform-related operations. By using .replace_current_transform_group(transform, target_transform) one can obtain a view of the original dataset in which the transformations for the current group are replaced with the given ones. One may also change transformations for other groups by passing the name of the group as the optional parameter group. As with any transform-related operation, the original dataset is not affected.

Note: one can use .replace_current_transform_group(...) to remove previous transformations (by passing None as the new transform).

The following cell shows how to use .replace_current_transform_group(...) to replace the transformations of the current group:

avl_mnist = make_classification_dataset(mnist_dataset, transform_groups=transform_groups)
new_transform = transforms.RandomCrop(size=(28, 28), padding=4)

# Replace the transformations of the current group. Simple as:
transform = (new_transform, None)
avl_mnist_replaced_transform = avl_mnist.replace_current_transform_group(transform)

cgroup = avl_mnist_replaced_transform._transform_groups.current_group
print('With replaced transform:', avl_mnist_replaced_transform._transform_groups.transform_groups[cgroup])
# Prints "With replaced transform:" followed by the new group (the RandomCrop)

# Check that the original dataset was not affected:
cgroup = avl_mnist._transform_groups.current_group
print('Original dataset:', avl_mnist._transform_groups.transform_groups[cgroup])
# Prints "Original dataset:" followed by the original train group

Freezing transformations

One last functionality regarding transformations is the ability to "freeze" transformations. Freezing transformations means permanently gluing transformations to the dataset so that they can't be replaced or changed in any way (usually by mistake). Frozen transformations cannot be changed by using .replace_current_transform_group(...).

One may wonder when this may come in handy... in fact, you will probably rarely need to freeze transformations. However, imagine having to instantiate the PermutedMNIST benchmark. You want the permutation transformation to not be changed by mistake. However, the end users do not know how the internal implementation of the benchmark works, so they may end up messing with those transformations. By freezing the permutation transformation, users cannot mess with it.

Transformations for all transform groups can be frozen at once by using .freeze_transforms(). As always, this method returns a view of the original dataset.

The cell below shows a simplified excerpt from the PermutedMNIST benchmark implementation. First, a PixelsPermutation instance is created. That instance is a transformation that will permute the pixels of the input image. We then create the train and test sets. Once created, transformations for those datasets are frozen using .freeze_transforms().

from avalanche.benchmarks.classic.cmnist import PixelsPermutation
import numpy as np
import torch

# Instantiate MNIST train and test sets
mnist_train = MNIST('mnist_data', train=True, download=True)
mnist_test = MNIST('mnist_data', train=False, download=True)
    
# Define the transformation used to permute the pixels
rng_seed = 4321
rng_permute = np.random.RandomState(rng_seed)
idx_permute = torch.from_numpy(rng_permute.permutation(784)).type(torch.int64)
permutation_transform = PixelsPermutation(idx_permute)

# Define the transforms group
perm_group_transforms = dict(
    train=(permutation_transform, None),
    eval=(permutation_transform, None)
)

# Create the datasets and freeze transforms
# Note: one can call "freeze_transforms" directly on the constructor result
# or do this in 2 steps. The result is the same.
# The next part shows both ways:

# Train set
permuted_train_set = AvalancheDataset(
    mnist_train, 
    transform_groups=perm_group_transforms).freeze_transforms()

# Test set
permuted_test_set = AvalancheDataset(mnist_test, transform_groups=perm_group_transforms).eval()
permuted_test_set = permuted_test_set.freeze_transforms()

In this way, that transform can't be removed. However, remember that one can always append other transforms on top of frozen transforms.

The cell below shows that .replace_current_transform_group(...) can't remove frozen transformations:

# First, show that the image pixels are permuted
print('Before replace_transforms:')
display(permuted_train_set[0][0].resize((192, 192), 0))

# Try to remove the permutation
with_removed_transforms = permuted_train_set.replace_current_transform_group((None, None))

print('After replace_transforms:')
display(permuted_train_set[0][0].resize((192, 192), 0))
display(with_removed_transforms[0][0].resize((192, 192), 0))
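The freezing behaviour shown above can be approximated with a small plain-Python model. FreezableDataset and its replace_transform method are hypothetical names for this sketch and do not reflect how Avalanche stores frozen transformations. The idea is only that frozen transforms live in a separate list that replacement never touches.

```python
import copy


class FreezableDataset:
    """Hypothetical sketch: frozen transforms always run before the replaceable one."""

    def __init__(self, values, transform=None):
        self.values = values
        self.frozen = []            # transforms that can no longer be replaced
        self.transform = transform  # transform that can still be replaced

    def freeze_transforms(self):
        # Return a view whose current transform has moved to the frozen list
        view = copy.copy(self)
        if self.transform is not None:
            view.frozen = self.frozen + [self.transform]
        view.transform = None
        return view

    def replace_transform(self, transform):
        # Only the replaceable slot is touched; the frozen list is kept as-is
        view = copy.copy(self)
        view.transform = transform
        return view

    def __getitem__(self, idx):
        x = self.values[idx]
        for frozen_transform in self.frozen:
            x = frozen_transform(x)
        if self.transform is not None:
            x = self.transform(x)
        return x


def reverse(x):
    # Stand-in for a pixel permutation
    return x[::-1]


frozen = FreezableDataset(["abc"], transform=reverse).freeze_transforms()
removed = frozen.replace_transform(None)
extended = frozen.replace_transform(str.upper)

print(frozen[0])    # cba
print(removed[0])   # cba, the frozen permutation is still applied
print(extended[0])  # CBA, new transforms run on top of the frozen one
```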

Transformations wrap-up

This completes the Mini How-To for the functionalities of the AvalancheDataset related to transformations.

Here you learned how to use transformation groups and how to append/replace/freeze transformations in a simple way.

Other Mini How-Tos will guide you through the other functionalities offered by the AvalancheDataset class. The list of Mini How-Tos can be found here.

🤝 Run it on Google Colab
