Composer CLI
Since version 0.3, the toolkit includes support for running Experiments.
An Experiment represents a high-level use case, such as training a neural
network, in a compact form that makes it easy to run the experiment, and
variations of it, both locally and in the cloud.
Experiments
The following types of Experiments are available:
Experiment class | Description
---|---
BasicTraining | Simple training of a neural network
BasicInferencing | Simple inference of a neural network
Creating an Experiment
A BasicTraining Experiment can be created just by creating an instance of its class:
from torchvision.datasets import FashionMNIST
from torch.nn import Flatten, LogSoftmax, Sigmoid
from aihwkit.nn import AnalogLinear, AnalogSequential
from aihwkit.experiments import BasicTraining
my_experiment = BasicTraining(
    dataset=FashionMNIST,
    model=AnalogSequential(
        Flatten(),
        AnalogLinear(784, 256, bias=True),
        Sigmoid(),
        AnalogLinear(256, 128, bias=True),
        Sigmoid(),
        AnalogLinear(128, 10, bias=True),
        LogSoftmax(dim=1)
    )
)
Similarly, a BasicInferencing Experiment can be created by creating an instance of its class:
from torch.nn import (
    Flatten, LogSoftmax, MaxPool2d, Module, Tanh
)
from torchvision.datasets import FashionMNIST
from aihwkit.nn import AnalogConv2dMapped, AnalogLinearMapped, AnalogSequential
from aihwkit.experiments.experiments.inferencing import BasicInferencing

DATASET = FashionMNIST
# Helper that builds an analog LeNet5 model (defined in the example notebooks).
MODEL = create_analog_lenet5_network()
BATCH_SIZE = 8
REPEATS = 2
I_TIMES = 86400
TEMPLATE_ID = 'hwa-trained-lenet5-mapped'
my_experiment = BasicInferencing(
    dataset=DATASET,
    model=MODEL,
    batch_size=BATCH_SIZE,
    weight_template_id=TEMPLATE_ID,
    inference_repeats=REPEATS,
    inference_time=I_TIMES
)
Each Experiment has its own attributes, providing sensible defaults as needed.
For example, the BasicTraining Experiment allows setting attributes that
define the characteristics of the training (dataset, model, batch_size,
loss_function, epochs, learning_rate).
Similarly, the BasicInferencing Experiment allows setting attributes
that define the characteristics of the Inferencing experiment (dataset,
model, batch_size, inference_repeats, inference_time).
The created Experiment contains the definition of the operation to be performed,
but is not executed automatically. That is the role of the Runners.
Runners
A Runner is the object that controls the execution of an Experiment, setting up the environment and providing a convenient way of starting it and retrieving its results.
The following types of Runners are available:
Runner class | Description
---|---
LocalRunner | Runner for executing training experiments locally
CloudRunner | Runner for executing training experiments in the cloud
InferenceLocalRunner | Runner for executing inference experiments locally
InferenceCloudRunner | Runner for executing inference experiments in the cloud
Running an Experiment Locally
In order to run an Experiment, the first step is creating the appropriate
runner. For executing a training experiment locally, we create a LocalRunner:
from aihwkit.experiments.runners import LocalRunner
my_runner = LocalRunner()
Similarly, for executing an Inferencing Experiment locally, we create an InferenceLocalRunner:
from aihwkit.experiments.runners import InferenceLocalRunner
my_runner = InferenceLocalRunner()
Note
Each runner has different configuration options depending on its type.
For example, the LocalRunner has an option for setting the device on which
the model will be executed, which can be used to run on a GPU:
from torch import device as torch_device
my_runner = LocalRunner(device=torch_device('cuda'))
Similarly, the InferenceLocalRunner also has an option for setting the device
used for inference, for example to use an available GPU:
from torch import device as torch_device
my_runner = InferenceLocalRunner(device=torch_device('cuda'))
Once the runner is created, for either a Training or an Inferencing
experiment, the Experiment can be executed via:
result = my_runner.run(my_experiment)
This will start the desired experiment and return its results. In the training case, the result is a list of dictionaries containing the metrics for each epoch:
print(result)
[{
'epoch': 0,
'accuracy': 0.8289,
'train_loss': 0.4497026850991666,
'valid_loss': 0.07776954893999771
},
{
'epoch': 1,
'accuracy': 0.8299,
'train_loss': 0.43052176381352103,
'valid_loss': 0.07716381718227858
},
{
'epoch': 2,
'accuracy': 0.8392,
'train_loss': 0.41551961805393445,
'valid_loss': 0.07490375201140385
},
...
]
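Since the result is a plain list of per-epoch dictionaries, it can be post-processed with ordinary Python. As a sketch (using the sample metrics above, rounded for brevity), the epoch with the highest accuracy can be picked like this:

```python
# Per-epoch metrics in the same shape as the runner's output above.
result = [
    {'epoch': 0, 'accuracy': 0.8289, 'train_loss': 0.4497, 'valid_loss': 0.0778},
    {'epoch': 1, 'accuracy': 0.8299, 'train_loss': 0.4305, 'valid_loss': 0.0772},
    {'epoch': 2, 'accuracy': 0.8392, 'train_loss': 0.4155, 'valid_loss': 0.0749},
]

# Select the entry with the highest accuracy.
best = max(result, key=lambda entry: entry['accuracy'])
print(f"best epoch: {best['epoch']} (accuracy {best['accuracy']:.4f})")
# best epoch: 2 (accuracy 0.8392)
```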
Both the LocalRunner (for Training experiments) and the InferenceLocalRunner
(for Inferencing experiments) will also print information by default while the
experiment is being executed (for example, as a way of tracking progress when
running the experiment in an interactive session). This can be turned off via
the stdout argument to the run() function:
result = my_runner.run(my_experiment, stdout=False)
Note
The local runner for both Training and Inferencing experiments
will automatically attempt to download the dataset into a temporary
folder if it is FashionMNIST or SVHN. For other datasets,
please ensure that the dataset has been downloaded previously, using the
dataset_root argument to indicate the location of the data files:
result = my_runner.run(my_experiment, dataset_root='/some/path')
Cloud Runner
Experiments can also be run in the cloud at our companion AIHW Composer application, which allows executing the experiments remotely using hardware acceleration and inspecting the experiments and their results visually, among other features.
Setting up your account
The integration is provided by a Python client included in aihwkit that
allows connecting to the AIHW Composer platform. In order to be able to
run experiments in the cloud:
1. Register in the platform and generate an API token in your user page. This token acts as the credentials for connecting with the application.
2. Store your credentials by creating a ~/.config/aihwkit.conf file with the following contents, replacing YOUR_API_TOKEN with the string from the previous step:

[cloud]
api_token = YOUR_API_TOKEN

3. You may need to download the SSL certificates and add them to the certificate store:
https://cacerts.digicert.com/DigiCertTLSRSASHA2562020CA1-1.crt.pem
Append the certificates to the cacert.pem file.
Note
You can run the following command to find the location of the cacert.pem file:
$ python -c "import certifi; print(certifi.where())"
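To double-check that the credentials file is well formed, it can be parsed with the standard configparser module. This is just a hypothetical sanity check, not part of the aihwkit API; the sample below uses an in-memory string in the same shape as ~/.config/aihwkit.conf:

```python
import configparser

# Contents in the same shape as ~/.config/aihwkit.conf (the token is a placeholder).
sample = """
[cloud]
api_token = YOUR_API_TOKEN
"""

config = configparser.ConfigParser()
config.read_string(sample)

# A well-formed file has a [cloud] section with an api_token key.
print(config['cloud']['api_token'])
# YOUR_API_TOKEN
```

To check the real file, replace read_string() with config.read() pointing at the path of your configuration file.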
Running an Experiment in the cloud
Once your credentials are configured, training experiments can be run in the
cloud by using the CloudRunner, in an analogous way to running
experiments locally:
from aihwkit.experiments.runners import CloudRunner
my_cloud_runner = CloudRunner()
cloud_experiment = my_cloud_runner.run(my_experiment)
Similarly, an Inferencing experiment can also be performed in the cloud by using
the InferenceCloudRunner, in an analogous way to running experiments locally:
from aihwkit.experiments.runners import InferenceCloudRunner
cloud_runner = InferenceCloudRunner()
cloud_experiment = cloud_runner.run(my_experiment, analog_info,
noise_model_info, name=NAME, device='gpu')
Instead of waiting for the experiment to be completed, the run()
method
returns an object that represents a job in the cloud. As such, it has several
convenience methods:
Checking the status of a cloud experiment
The status of a cloud experiment, for both Training and Inferencing
experiments, can be retrieved via:
cloud_experiment.status()
The response will provide information about the cloud experiment:
- WAITING: the experiment is waiting to be processed.
- RUNNING: the experiment is being executed in the cloud.
- COMPLETED: the experiment was executed successfully.
- FAILED: there was an error during the execution of the experiment.
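One way to wait for a terminal state is a small polling loop around the status call. The helper below is only a sketch: the wait_for_completion name and the stubbed status sequence are illustrative, not part of the toolkit.

```python
import time

# COMPLETED and FAILED are the terminal states described above.
TERMINAL_STATES = {'COMPLETED', 'FAILED'}

def wait_for_completion(get_status, poll_seconds=0.0, max_polls=100):
    """Poll get_status() until it returns a terminal state."""
    for _ in range(max_polls):
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError('experiment did not finish within the polling budget')

# Stand-in for the cloud experiment's status call: a canned sequence of states.
states = iter(['WAITING', 'RUNNING', 'RUNNING', 'COMPLETED'])
final = wait_for_completion(lambda: next(states))
print(final)
# COMPLETED
```

In real use, one would pass the cloud experiment's status method instead of the stub, together with a longer poll interval (for example, tens of seconds).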
Note
Some actions are only possible if the cloud experiment has finished successfully, for example, retrieving its results. Please also be mindful that some experiments can take a sizeable amount of time to execute, especially during the initial versions of the platform.
Retrieving the results of a cloud experiment
Once the cloud experiment (Training or Inferencing) completes its execution,
its results can be retrieved using:
result = cloud_experiment.get_result()
This will display the result of executing the experiment, in a similar form as the output of running an Experiment locally.
Retrieving the content of the experiment
The Experiment can be retrieved using:
experiment = cloud_experiment.get_experiment()
This will return a local Experiment (for example, a BasicTraining or
BasicInferencing) that can be used locally and whose properties can be
inspected. In particular, the weights of the model will reflect the results of
the experiment.
Retrieving a previous cloud experiment
The list of experiments previously executed in the cloud can be retrieved via:
cloud_experiments = my_cloud_runner.list_experiments()
Please see https://github.com/IBM/aihwkit/tree/master/notebooks/cli for the experiment example notebooks.