How to implement replay and data loading
Avalanche provides several components that help you balance data loading and implement rehearsal strategies.
Dataloaders are used to provide balancing between groups (e.g. tasks/classes/experiences). This is especially useful when you have unbalanced data.
Buffers are used to store data from the previous experiences. They are dynamic datasets with a fixed maximum size, and they can be updated with new data continuously.
Finally, Replay strategies implement rehearsal by using Avalanche's plugin system. Most rehearsal strategies use a custom dataloader to balance the buffer with the current experience and a buffer that is updated for each experience.
First, let's install Avalanche. You can skip this step if you have installed it already.
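Avalanche is published on PyPI under the package name avalanche-lib, so a standard pip install is enough:

```shell
pip install avalanche-lib
```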
Avalanche dataloaders are simple iterators, located under avalanche.benchmarks.utils.data_loader. Their interface is equivalent to PyTorch's dataloaders. For example, GroupBalancedDataLoader takes a sequence of datasets and iterates over them by providing balanced mini-batches, where the number of samples is split equally among groups. Internally, it instantiates a DataLoader for each separate group. More specialized dataloaders exist, such as TaskBalancedDataLoader.
All the dataloaders accept keyword arguments (**kwargs) that are passed directly to the dataloaders for each group.
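The balancing idea behind GroupBalancedDataLoader can be sketched in plain Python: give each group an equal share of the mini-batch and draw from every group in lockstep, cycling over the smaller groups. This is an illustrative sketch with made-up names, not the Avalanche implementation (which wraps a PyTorch DataLoader per group):

```python
from itertools import cycle

def group_balanced_batches(groups, batch_size):
    """Yield mini-batches with an equal number of samples per group.

    `groups` is a list of lists (one per task/class/experience).
    Shorter groups are cycled so every batch stays balanced.
    """
    per_group = batch_size // len(groups)
    iters = [cycle(g) for g in groups]
    n_batches = max(len(g) for g in groups) // per_group
    for _ in range(n_batches):
        batch = []
        for it in iters:
            batch.extend(next(it) for _ in range(per_group))
        yield batch

# Two unbalanced groups still yield balanced mini-batches:
a = list(range(10))        # group 0: 10 samples
b = list(range(100, 104))  # group 1: only 4 samples
for batch in group_balanced_batches([a, b], batch_size=4):
    assert sum(x >= 100 for x in batch) == 2  # half the batch from each group
```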
Memory buffers store data up to a maximum capacity, and they implement policies to select which data to store and which to remove when the buffer is full. They are available in the module avalanche.training.storage_policy. The base class is ExemplarsBuffer, which implements two methods:
update(strategy) - given the strategy's state, it updates the buffer (using the data in strategy.experience.dataset).
resize(strategy, new_size) - updates the maximum size and updates the buffer accordingly.
The data can be accessed using the attribute buffer.
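The interface can be sketched with a toy policy that simply keeps the most recent samples. This is not Avalanche code; it only mirrors the update/resize/buffer contract, and it assumes `strategy` is any object exposing experience.dataset:

```python
class SketchBuffer:
    """Illustrative sketch of the ExemplarsBuffer interface (not Avalanche code).

    Keeps the most recent samples up to max_size; real policies such as
    reservoir sampling choose what to keep more carefully.
    """
    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = []  # exposed like ExemplarsBuffer.buffer

    def update(self, strategy):
        # Real buffers read strategy.experience.dataset; here `strategy`
        # is assumed to be any object with that attribute.
        new_data = list(strategy.experience.dataset)
        self.buffer = (self.buffer + new_data)[-self.max_size:]

    def resize(self, strategy, new_size):
        # Update the maximum size and drop excess samples accordingly.
        self.max_size = new_size
        self.buffer = self.buffer[-new_size:]
```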
At first, the buffer is empty. We can update it with data from a new experience.
Notice that we use a SimpleNamespace because we want to use the buffer standalone, without instantiating an Avalanche strategy. Reservoir sampling requires only the experience attribute from the strategy's state.
Notice that after each update some samples are replaced with new data: reservoir sampling selects these samples at random.
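The reservoir-sampling update rule can be sketched standalone, with a SimpleNamespace standing in for the strategy state. This is a from-scratch sketch of the classic algorithm, not Avalanche's ReservoirSamplingBuffer:

```python
import random
from types import SimpleNamespace

class ReservoirSketch:
    """Sketch of reservoir sampling over a stream of experiences.

    Every sample seen so far ends up in the buffer with equal probability,
    regardless of when it arrived.
    """
    def __init__(self, max_size, seed=0):
        self.max_size = max_size
        self.buffer = []
        self._seen = 0
        self._rng = random.Random(seed)

    def update(self, strategy):
        for x in strategy.experience.dataset:
            self._seen += 1
            if len(self.buffer) < self.max_size:
                self.buffer.append(x)
            else:
                # Keep x with probability max_size / samples_seen.
                j = self._rng.randrange(self._seen)
                if j < self.max_size:
                    self.buffer[j] = x

# Standalone use: a SimpleNamespace mimics strategy.experience.dataset.
buf = ReservoirSketch(max_size=5)
for exp_data in ([0, 1, 2, 3], [4, 5, 6, 7], [8, 9]):
    fake_strategy = SimpleNamespace(experience=SimpleNamespace(dataset=exp_data))
    buf.update(fake_strategy)
assert len(buf.buffer) == 5  # capped at max_size
```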
Avalanche offers many more storage policies. For example, ParametricBuffer is a buffer split into several groups according to the groupby parameter (None, 'class', 'task', 'experience') and an optional ExemplarsSelectionStrategy (random selection is the default).
The advantage of grouped buffers is that you get a balanced rehearsal buffer. You can even access the groups separately with the buffer_groups attribute. Combined with balanced dataloaders, you can ensure that the mini-batches stay balanced during training.
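The grouping idea can be sketched for the groupby='class' case: total capacity is split equally among the classes seen so far. All names here are illustrative, and "keep the most recent" stands in for a real ExemplarsSelectionStrategy:

```python
from collections import defaultdict

class ClassBalancedSketch:
    """Sketch of a class-grouped buffer (in the spirit of ParametricBuffer
    with groupby='class'); not Avalanche code.

    Total capacity is split equally among the classes seen so far;
    within each group we simply keep the most recent samples.
    """
    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer_groups = defaultdict(list)  # class id -> samples

    def update(self, labeled_samples):
        # `labeled_samples` is a list of (x, class_id) pairs.
        for x, c in labeled_samples:
            self.buffer_groups[c].append(x)
        # Re-split the capacity equally among all known classes.
        quota = self.max_size // max(1, len(self.buffer_groups))
        for c in self.buffer_groups:
            self.buffer_groups[c] = self.buffer_groups[c][-quota:]

    @property
    def buffer(self):
        # Flat view over all groups, like the buffer attribute.
        return [x for g in self.buffer_groups.values() for x in g]
```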
Avalanche's strategy plugins can be used to update the rehearsal buffer and set the dataloader. This makes it easy to implement replay strategies:
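The shape of such a plugin can be sketched as follows. The hook names mirror Avalanche's plugin callbacks, but the class itself is illustrative, not the real ReplayPlugin: it assumes `buffer` is any object with update(strategy) and a buffer attribute, and it concatenates datasets instead of installing a balanced dataloader:

```python
class ReplaySketchPlugin:
    """Illustrative replay plugin (not Avalanche's ReplayPlugin).

    `buffer` is assumed to expose .update(strategy) and .buffer, like a
    storage policy; `strategy` is any object with experience.dataset.
    """
    def __init__(self, buffer):
        self.buffer = buffer

    def before_training_exp(self, strategy):
        # Train on the current data together with the rehearsal buffer
        # (Avalanche instead installs a balanced dataloader here).
        strategy.train_data = list(strategy.experience.dataset) + list(self.buffer.buffer)

    def after_training_exp(self, strategy):
        # Store samples from the experience we just trained on.
        self.buffer.update(strategy)
```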
And of course, we can use the plugin to train our continual model:
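How a strategy drives such a plugin over a stream of experiences can be sketched end to end. Everything here is illustrative (a tiny stand-in plugin and a loop that just collects batches in place of SGD), not Avalanche's Naive strategy:

```python
from types import SimpleNamespace

class NaiveReplay:
    """Tiny stand-in plugin: rehearse everything stored so far."""
    def __init__(self):
        self.buffer = []
    def before_training_exp(self, strategy):
        strategy.train_data = list(strategy.experience.dataset) + self.buffer
    def after_training_exp(self, strategy):
        self.buffer += list(strategy.experience.dataset)

def train_stream(experiences, plugin):
    """Call the plugin hooks around each experience, like a strategy does.

    "Training" is just collecting the batch the plugin prepared; a real
    strategy would run SGD on strategy.train_data here.
    """
    seen = []
    for dataset in experiences:
        strategy = SimpleNamespace(experience=SimpleNamespace(dataset=dataset))
        plugin.before_training_exp(strategy)
        seen.append(list(strategy.train_data))
        plugin.after_training_exp(strategy)
    return seen

batches = train_stream([[0, 1], [2], [3]], NaiveReplay())
# Later experiences are trained together with replayed samples.
assert batches == [[0, 1], [2, 0, 1], [3, 0, 1, 2]]
```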