Evaluation
Automatic Evaluation with Pre-implemented Metrics
Welcome to the "Evaluation" tutorial of the "From Zero to Hero" series. In this part we will present the functionalities offered by the evaluation module.
!pip install avalanche-lib==0.2.1
The Evaluation Module
The evaluation module is quite straightforward: it offers all the basic functionalities to evaluate and keep track of a continual learning experiment.
This is mostly done through the Metrics: a set of classes which implement the main continual learning metrics, such as Accuracy, Forgetting, Memory Usage, Running Times, etc. At the moment, Avalanche offers a number of pre-implemented metrics you can use for your own experiments. We made sure to include all the major accuracy-based metrics, as well as the ones related to computation and memory.
Each metric comes with a standalone class and a set of plugin classes aimed at emitting metric values at specific moments during training and evaluation.
Standalone metric
As an example, the standalone Accuracy class can be used to monitor the average accuracy over a stream of <input,target> pairs. The class provides an update method to update the current average accuracy, a result method to return the current average accuracy, and a reset method to clear the accumulated values. Calling result does not change the metric state.
The Accuracy metric requires the task_labels parameter, which specifies which task is associated with the current patterns. The metric returns a dictionary mapping task labels to accuracy values.
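To make the update / result / reset contract concrete, here is a minimal, library-independent sketch of a per-task running accuracy. It is not Avalanche's implementation: the class name RunningTaskAccuracy is hypothetical, it works on plain Python lists instead of tensors, and it assumes the average is weighted by the number of samples seen for each task.

```python
from collections import defaultdict


class RunningTaskAccuracy:
    """Illustrative stand-in for a per-task running accuracy (not Avalanche code)."""

    def __init__(self):
        self.reset()

    def update(self, predicted_y, true_y, task_label):
        # Accumulate the number of correct predictions and of seen samples per task.
        if len(predicted_y) != len(true_y):
            raise ValueError("predictions and targets must have the same length")
        self._correct[task_label] += sum(p == t for p, t in zip(predicted_y, true_y))
        self._seen[task_label] += len(true_y)

    def result(self):
        # Reading the value leaves the internal counters untouched.
        return {
            task: self._correct[task] / seen
            for task, seen in self._seen.items()
            if seen > 0
        }

    def reset(self):
        # Forget everything: result() returns an empty dictionary afterwards.
        self._correct = defaultdict(int)
        self._seen = defaultdict(int)


metric = RunningTaskAccuracy()
metric.update([1, 0], [1, 2], task_label=0)        # 1 of 2 correct on task 0
metric.update([1, 2], [1, 2], task_label=0)        # 2 of 2 correct on task 0
metric.update([3, 3, 3], [3, 3, 0], task_label=1)  # 2 of 3 correct on task 1
print(metric.result())  # one accuracy value per task label
metric.reset()
print(metric.result())  # {}
```

The returned dictionary has one entry per task label, which is the same shape the Accuracy metric is described as returning.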
import torch
from avalanche.evaluation.metrics import Accuracy
task_labels = 0 # we will work with a single task
# create an instance of the standalone Accuracy metric
# no task has been seen yet, so the initial result is an empty dictionary
acc_metric = Accuracy()
print("Initial Accuracy: ", acc_metric.result()) # output {}
# two consecutive metric updates
real_y = torch.tensor([1, 2]).long()
predicted_y = torch.tensor([1, 0]).float()
acc_metric.update(real_y, predicted_y, task_labels)
acc = acc_metric.result()
print("Average Accuracy: ", acc) # output 0.5 on task 0
predicted_y = torch.tensor([1,2]).float()
acc_metric.update(real_y, predicted_y, task_labels)
acc = acc_metric.result()
print("Average Accuracy: ", acc) # output 0.75 on task 0
# reset accuracy
acc_metric.reset()
print("After reset: ", acc_metric.result()) # output {}
Plugin metric
If you want to integrate the available metrics automatically in the training and evaluation flow, you can use plugin metrics, like EpochAccuracy, which logs the accuracy after each training epoch, or ExperienceAccuracy, which logs the accuracy after each evaluation experience. Each of these metrics emits a curve composed of its values at different points in time (e.g. at different training epochs). To simplify the use of these metrics, we provide utility functions with which you can create several plugin metrics in one shot. The results of these functions can be passed as parameters directly to the EvaluationPlugin (see below).
Evaluation Plugin
The Evaluation Plugin is the object in charge of configuring and controlling the evaluation procedure. This object can be passed to a Strategy as a "special" plugin through the evaluator attribute.
The Evaluation Plugin accepts as inputs the plugin metrics you want to track. In addition, you can add one or more loggers to print the metrics in different ways (to a file, to standard output, to TensorBoard, ...).
It is also recommended to pass to the Evaluation Plugin the benchmark instance used in the experiment. This allows the plugin to check for consistency during metrics computation. For example, the Evaluation Plugin checks that the strategy.eval calls are performed on the same stream or sub-stream. Otherwise, the same metric could refer to different portions of the stream.
These checks can be configured to raise errors (stopping computation) or only warnings.
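The configuration described above can be sketched as follows. This is a hedged example, not the tutorial's original code: make_evaluator is a hypothetical helper, and the argument names (loggers, benchmark, strict_checks) and the accuracy_metrics / loss_metrics flags are assumed to match the 0.2.x API, so please check them against your installed version.

```python
def make_evaluator(benchmark=None, strict_checks=False):
    """Build an EvaluationPlugin tracking accuracy and loss (needs avalanche-lib)."""
    # Imported inside the function so this file can be loaded without Avalanche.
    from avalanche.evaluation.metrics import accuracy_metrics, loss_metrics
    from avalanche.logging import InteractiveLogger
    from avalanche.training.plugins import EvaluationPlugin

    return EvaluationPlugin(
        # Each helper returns a list of plugin metrics, one per enabled flag.
        accuracy_metrics(epoch=True, experience=True, stream=True),
        loss_metrics(epoch=True, stream=True),
        loggers=[InteractiveLogger()],  # print metrics on standard output
        benchmark=benchmark,            # lets the plugin run its consistency checks
        # Assumed semantics: True raises errors (and needs a benchmark),
        # False only emits warnings.
        strict_checks=strict_checks,
    )
```

The returned object would then be handed to a strategy through its evaluator argument, for example evaluator=make_evaluator(benchmark).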
Implement your own metric
To implement a standalone metric, you have to subclass the Metric class.
To implement a plugin metric, you have to subclass the PluginMetric class.
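As a minimal sketch of the standalone case, the class below tracks the largest value it has been given. MaxValue is a hypothetical name chosen for illustration, and the sketch assumes that update, result and reset are the methods a standalone metric is expected to provide, as with the Accuracy class. The fallback base class exists only so that the example also runs where Avalanche is not installed. A plugin metric would additionally hook into the training and evaluation flow, which this sketch does not cover.

```python
try:
    # Real base class, available when avalanche-lib is installed.
    from avalanche.evaluation import Metric
except ImportError:
    # Stand-in with the same subscript syntax, so the sketch runs without Avalanche.
    from typing import Generic, TypeVar

    _TResult = TypeVar("_TResult")

    class Metric(Generic[_TResult]):
        pass


class MaxValue(Metric[float]):
    """Hypothetical standalone metric: the largest value seen so far."""

    def __init__(self):
        self._max = None

    def update(self, value: float) -> None:
        # Keep the new value only if it exceeds the current maximum.
        if self._max is None or value > self._max:
            self._max = value

    def result(self):
        # Returns None until the first update; does not change the state.
        return self._max

    def reset(self) -> None:
        self._max = None


metric = MaxValue()
for value in (0.2, 0.9, 0.4):
    metric.update(value)
print(metric.result())  # 0.9
```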
Accessing metric values
If you want to access all the metrics computed during training and evaluation, you have to make sure that collect_all=True is set when creating the EvaluationPlugin (the default is True). This option maintains an updated version of all metric results in the plugin, which can be retrieved by calling evaluation_plugin.get_all_metrics(). You can call this method whenever you need the metrics.
The result is a dictionary with full metric names as keys and a tuple of two lists as values. The first list stores all the x values recorded for that metric. Each x value represents the time step at which the corresponding metric value was computed. The second list stores the metric values associated with the corresponding x values.
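The sketch below shows one way to read a dictionary with this shape. The all_metrics literal is a hand-written placeholder, not output from a real run: the metric names and numbers are made up for illustration, and the helper functions latest_values and curve_points are hypothetical.

```python
# Placeholder shaped like the dictionary described above:
# full metric name -> (x values, metric values). Names and numbers are made up.
all_metrics = {
    "Top1_Acc_Epoch/train_phase/train_stream/Task000": ([10, 20, 30], [0.61, 0.74, 0.80]),
    "Loss_Epoch/train_phase/train_stream/Task000": ([10, 20, 30], [1.20, 0.85, 0.62]),
}


def latest_values(metrics):
    """Return {metric name: (last x, last value)} for every non-empty curve."""
    return {name: (xs[-1], ys[-1]) for name, (xs, ys) in metrics.items() if xs}


def curve_points(metrics, name):
    """Return the curve of one metric as a list of (x, value) pairs."""
    xs, ys = metrics[name]
    return list(zip(xs, ys))


for name, (x, value) in latest_values(all_metrics).items():
    print(f"{name}: {value} at step {x}")
```

With a real experiment, all_metrics would be replaced by the value returned from evaluation_plugin.get_all_metrics().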
Alternatively, the train and eval methods of every strategy return a dictionary storing, for each metric, the last value recorded for that metric. You can use these dictionaries to incrementally accumulate metrics.
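Incremental accumulation can be sketched as below. The accumulate helper is hypothetical, and the two small dictionaries are placeholders standing in for the values returned by successive train or eval calls; their keys and numbers are made up.

```python
from collections import defaultdict


def accumulate(history, last_values):
    """Append the latest value of each metric to its running list."""
    for name, value in last_values.items():
        history[name].append(value)
    return history


history = defaultdict(list)

# Placeholder dictionaries standing in for the results of successive eval calls.
for step_result in ({"acc": 0.5, "loss": 1.1}, {"acc": 0.7, "loss": 0.8}):
    accumulate(history, step_result)

print(dict(history))  # {'acc': [0.5, 0.7], 'loss': [1.1, 0.8]}
```

In a real training loop, each placeholder would be replaced by the dictionary returned from the strategy's train or eval method for the current experience.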
This completes the "Evaluation" tutorial for the "From Zero to Hero" series. We hope you enjoyed it!
Run it on Google Colab
You can run this chapter and play with it on Google Colaboratory: