Composer CLI
=================

Since version ``0.3``, the toolkit includes support for running ``Experiments``.
An **Experiment** represents a high-level use case, such as training a neural
network, in a compact form that allows for easily running the experiment and
variations of it with ease both locally, in the cloud and its variations.

Experiments
-----------

The following types of Experiments are available:

=====================================================================  ========
Tile class                                                             Description
=====================================================================  ========
:class:`~aihwkit.experiments.experiments.training.BasicTraining`       Simple training of a neural network
:class:`~aihwkit.experiments.experiments.training.BasicInferencing`    Simple inference of a neural network
=====================================================================  ========

Creating an Experiment
^^^^^^^^^^^^^^^^^^^^^^

A :class:`~aihwkit.experiments.experiments.training.BasicTraining`  Experiment can be created just by creating an instance of its class::

    from torchvision.datasets import FashionMNIST

    from torch.nn import Flatten, LogSoftmax, Sigmoid
    from aihwkit.nn import AnalogLinear, AnalogSequential

    from aihwkit.experiments import BasicTraining


    my_experiment = BasicTraining(
        dataset=FashionMNIST,
        model=AnalogSequential(
            Flatten(),
            AnalogLinear(784, 256, bias=True),
            Sigmoid(),
            AnalogLinear(256, 128, bias=True),
            Sigmoid(),
            AnalogLinear(128, 10, bias=True),
            LogSoftmax(dim=1)
        )

Similarly a :class:`~aihwkit.experiments.experiments.training.BasicInferencing`  Experiment can also be created by creating an instance of the class ::

    from torch.nn import (
    Flatten, LogSoftmax, MaxPool2d, Module, Tanh
    )
    from torchvision.datasets import FashionMNIST

    from aihwkit.nn import AnalogConv2dMapped, AnalogLinearMapped, AnalogSequential

    from aihwkit.experiments.experiments.inferencing import BasicInferencing

    DATASET = FashionMNIST
    MODEL = create_analog_lenet5_network()
    BATCH_SIZE = 8
    REPEATS = 2
    I_TIMES = 86400
    TEMPLATE_ID = 'hwa-trained-lenet5-mapped'

    my_experiment = BasicInferencing(
      dataset=DATASET,
      model = MODEL,
      batch_size = BATCH_SIZE,
      weight_template_id = TEMPLATE_ID,
      inference_repeats = REPEATS,
      inference_time = I_TIMES
    )


Each Experiment has its own attributes, providing sensible defaults as needed.
For example, the ``BasicTraining`` Experiment allows setting attributes that
define the characteristics of the training (``dataset``, ``model``,
``batch_size``, ``loss_function``, ``epochs``, ``learning_rate``).

Similarly the ``BasicInferencing`` Experiment allows setting attributes
that define the characteristics of the Inferencing experiment (``dataset``,
``model``, ``batch_size`` , ``inference_repeats`` , ``inference_time``)

The created Experiment contains the definition of the operation to be performed,
but is not executed automatically.  That is the role of the ``Runners``.

Runners
-------

A **Runner** is the object that controls the execution of an Experiment,
setting up the environment and providing a convenient way of starting it and
retrieving its results.

The following types of Runners are available:

===================================================================  ========
Tile class                                                           Description
===================================================================  ========
:class:`~aihwkit.experiments.runners.local.LocalRunner`              Runner for executing training experiments locally
:class:`~aihwkit.experiments.runners.cloud.CloudRunner`              Runner for executing training experiments in the cloud
:class:`~aihwkit.experiments.runners.i_local.InferenceLocalRunner`   Runner for executing inference experiments locally
:class:`~aihwkit.experiments.runners.i_cloud.InferenceCloudRunner`   Runner for executing inference experiments in the cloud
===================================================================  ========

Running an Experiment Locally
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to run an Experiment, the first step is creating the appropriate
runner, for executing a ``training exepriment`` locally we create :class:`~aihwkit.experiments.runners.local.LocalRunner`  ::

    from aihwkit.experiments.runners import LocalRunner

    my_runner = LocalRunner()

Similarly for executing a ``Inferencing Experimnet`` locally  we create :class:`~aihwkit.experiments.runners.i_local.InferenceLocalRunner` ::

    from aihwkit.experiments.runners import InferenceLocalRunner

    my_runner = InferenceLocalRunner()

.. note::

    Each runner has different configurations options depending on their type.
    For example, the ``LocalRunner`` has an option for setting the device where
    the model will be executed into, that can be used for using GPU::

        from torch import device as torch_device

        my_runner = LocalRunner(device=torch_device('cuda'))

    Similarly , the ``InferenceLocalRunner`` has also an option for setting the device
    when the model would be used for inferencing , for using the available GPU ::

        from torch import device as torch_device

        my_runner - InferenceLocalRunner(device=torch_device('cuda'))

Once the runner is created for either ``Training`` or ``Inferencing`` experiment , the Experiment can be executed via::

    result = my_runner.run(my_experiment)

This will start the desired experiment, and return the results of the
experiment - in the training case, a dictionary containing the metrics for each
epoch::

     print(result)

    [{
      'epoch': 0,
      'accuracy': 0.8289,
      'train_loss': 0.4497026850991666,
      'valid_loss': 0.07776954893999771
     },
     {
      'epoch': 1,
      'accuracy': 0.8299,
      'train_loss': 0.43052176381352103,
      'valid_loss': 0.07716381718227858
     },
     {
      'epoch': 2,
      'accuracy': 0.8392,
      'train_loss': 0.41551961805393445,
      'valid_loss': 0.07490375201140385
     },
     ...
    ]

The ``LocalRunner`` for ``Training`` experiment and ``InferenceLocalRunner`` for ``Inferencing`` experiment
will also print information by default while the experiment is being executed (for example, if running
the experiment in an interactive session, as a way of tracking progress). This can be turned off by the
``stdout`` argument to the ``run()`` function::

    result = my_runner.run(my_experiment, stdout=False)

.. note::

    The local runner for both ``Training`` and ``Inferencing`` type of experiments
    will automatically attempt to download the dataset if it
    is ``FashionMNIST`` or ``SVHN`` into a temporary folder. For other datasets,
    please ensure that the dataset is downloaded previously, using the
    ``dataset_root`` argument to indicate the location of the data files::

        result = my_runner.run(my_experiment, dataset_root='/some/path')

Cloud Runner
------------

Experiments can also be run in the cloud at our companion `AIHW Composer`_
application, that allows for executing the experiments remotely using hardware
acceleration and inspect the experiments and their results visually, along
other features.

Setting up your account
^^^^^^^^^^^^^^^^^^^^^^^

The integration is provided by a Python client included in ``aihwkit`` that
allows connecting to the `AIHW Composer`_ platform. In order to be able to
run experiments in the cloud:

1. Register in the platform and generate an `API token`_ in your user page.
   This token acts as the credentials for connecting with the application.

2. Store your credentials by creating a ``~/.config/aihwkit.conf`` file with
   the following contents, replacing ``YOUR_API_TOKEN`` with the string
   from the previous step::

    [cloud]
    api_token = YOUR_API_TOKEN

3. You may need to download the SSL certificates and add them to the certificate store.

   - https://cacerts.digicert.com/DigiCertGlobalRootCA.crt.pem
   - https://cacerts.digicert.com/DigiCertTLSRSASHA2562020CA1-1.crt.pem
   - Append the certificates to the cacert.pem file

.. note::

   You can run the following command to find the location of the cacert.pem file

   $ python -c "import certifi; print(certifi.where())“

Running an Experiment in the cloud
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once your credentials are configured, running ``training`` experiments in the cloud can
be performed by using the ``CloudRunner``, in an analogous way as running
experiments locally::

    from aihwkit.experiments.runners import CloudRunner

    my_cloud_runner = CloudRunner()
    cloud_experiment = my_cloud_runner.run(my_experiment)

Similarly ``Inferencing`` experiment can also be performed in the cloud by using
the ``InferenceCloudRunner`` , in an analogous way as running experiments locally ::

    from aihwkit.experiments.runners import InferenceCloudRunner

    cloud_runner = InferenceCloudRunner()
    cloud_experiment = cloud_runner.run(my_experiment, analog_info,
                                    noise_model_info, name=NAME, device='gpu')

Instead of waiting for the experiment to be completed, the ``run()`` method
returns an object that represents a job in the cloud. As such, it has several
convenience methods:

Checking the status of a cloud experiment
"""""""""""""""""""""""""""""""""""""""""

The status of a cloud experiment for both ``Training`` and ``Inferencing`` experiments can be retrieved via::

    cloud_experiment.status()

The response will provide information about the cloud experiment:
    * ``WAITING``: if the experiment is waiting to be processed.
    * ``RUNNING``: when the experiment is being executed in the cloud.
    * ``COMPLETED``: if the experiment was executed successfully.
    * ``FAILED``: if there was an error during the execution of the experiment.

.. note::

    Some actions are only possible if the cloud experiment has finished
    successfully, for example, retrieving its results. Please also be mindful
    that some experiments can take a sizeable amount of time to be executed,
    specially during the initial versions of the platform.

Retrieving the results of a cloud experiment
""""""""""""""""""""""""""""""""""""""""""""

Once the cloud experiment (``Training`` or ``Inferencing``) completes its execution, its results can be retrieved
using::

    result = cloud_experiment.get_result()

This will display the result of executing the experiment, in a similar form as
the output of running an Experiment locally.

Retrieving the content of the experiment
""""""""""""""""""""""""""""""""""""""""

The Experiment can be retrieved using::

    experiment = cloud_experiment.get_experiment()

This will return a local Experiment (for example, a ``BasicTraining`` or ``BasicInferencing``) that
can be used locally and their properties inspected. In particular, the weights
of the model will reflect the results of the experiment.

Retrieving a previous cloud experiment
""""""""""""""""""""""""""""""""""""""

The list of experiments previously executed in the cloud can be retrieved via::

    cloud_experiments = my_cloud_runner.list_experiments()

Please see https://github.com/IBM/aihwkit/tree/master/notebooks/cli for the experiment example notebooks.


.. _AIHW Composer: https://aihw-composer.draco.res.ibm.com/
.. _API token: https://aihw-composer.draco.res.ibm.com/account