aihwkit.simulator.tiles.transfer module
High-level analog transfer tiles (analog).
- class aihwkit.simulator.tiles.transfer.TorchTransferTile(out_size, in_size, rpu_config, bias=False, in_trans=False, out_trans=False)[source]
Bases: TileModule, TileWithPeriphery, SimulatorTileWrapper
Transfer tile for in-memory gradient accumulation algorithms.
This is a (mostly) Python re-implementation of the AnalogTile with ChoppedTransferCompound that uses the C++ RPUCuda library.
Only a subset of the parameters is implemented here. However, both ChoppedTransferCompound and DynamicTransferCompound are supported. Thus, the TTv2, c-TTv2, and AGAD learning algorithms are all available here.
Note
This implementation is mostly for instructive use. The C++ implementation has a large speed advantage if the batch size is large and the transfer is done multiple times per batch. For the torch implementation, transfer_every needs to be larger than or equal to the batch size, so that only one transfer is made per batch.
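A minimal sketch of this constraint (assuming an rpu_config built as in the Usage below; the transfer_every attribute path on the device compound is taken from ChoppedTransferCompound):

    # one transfer per mini-batch: keep transfer_every >= batch size
    batch_size = 32
    rpu_config.device.transfer_every = batch_size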
Caution
When using the model.analog_tiles() generators, this parent tile as well as its children tiles will be looped over, which might result in e.g. getting the same weight twice. This is because TorchTransferTile is registered separately as a TileModule to support the periphery, while internally two additional tiles are instantiated.
Usage:
    rpu_config = build_config('agad', device_config)

    # use the torch implementation tile instead of the default RPUCuda-backed AnalogTile
    rpu_config.tile_class = TorchTransferTile
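A fuller hedged sketch (the build_config import path and the SoftBoundsDevice stand-in are assumptions drawn from typical aihwkit examples; verify them against your installed version):

    from aihwkit.nn import AnalogLinear
    from aihwkit.optim import AnalogSGD
    from aihwkit.simulator.configs import build_config  # assumed import path
    from aihwkit.simulator.configs.devices import SoftBoundsDevice  # stand-in device
    from aihwkit.simulator.tiles.transfer import TorchTransferTile

    # build an AGAD config around the chosen device material model
    rpu_config = build_config('agad', SoftBoundsDevice())
    rpu_config.tile_class = TorchTransferTile  # switch to the torch tile

    # analog bias columns are not supported by this tile, so bias=False
    layer = AnalogLinear(4, 2, bias=False, rpu_config=rpu_config)
    optimizer = AnalogSGD(layer.parameters(), lr=0.1)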
- Parameters:
out_size (int) – output vector size of the tile, i.e. the dimension of \(\mathbf{y}\) in case of \(\mathbf{y} = W\mathbf{x}\) (or equivalently the dimension of \(\boldsymbol{\delta}\) in the backward pass).
in_size (int) – input vector size, i.e. the dimension of the vector \(\mathbf{x}\) in case of \(\mathbf{y} = W\mathbf{x}\).
rpu_config (InferenceRPUConfig | SingleRPUConfig | UnitCellRPUConfig | TorchInferenceRPUConfig | DigitalRankUpdateRPUConfig) – resistive processing unit configuration. This has to be of type UnitCellRPUConfig with a device compound derived from ChoppedTransferCompound.
bias (bool) – whether to add a bias column to the tile, i.e. \(W\) has an extra column to code the biases. This is not supported here.
in_trans (bool) – whether to assume a transposed input (batch first). Not supported.
out_trans (bool) – whether to assume a transposed output (batch first). Not supported.
- Raises:
ConfigError – if one of the unsupported cases is used.
- forward(x_input, tensor_view=None)[source]
Torch forward function that calls the analog forward.
- Parameters:
x_input (Tensor) –
tensor_view (Tuple | None) –
- Return type:
Tensor
- post_update_step()[source]
Operators that need to be called once per mini-batch.
Note
This function is called by the analog optimizer.
Caution
If no analog optimizer is used, the post update steps will not be performed.
- Return type:
None
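For context, a hedged sketch of how this hook is reached in a training step (layer, optimizer, loss_fn, x, and y are placeholders):

    # an analog optimizer such as AnalogSGD calls post_update_step()
    # on its tiles once per mini-batch; a plain torch optimizer would not
    optimizer.zero_grad()
    loss = loss_fn(layer(x), y)
    loss.backward()
    optimizer.step()  # post-update steps (e.g. the transfer) happen here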
- replace_with(rpu_config)[source]
Replacing the current RPUConfig is not supported.
- Parameters:
rpu_config (RPUConfigBase) – New RPUConfig to check against
- Raises:
TileModuleError – always
- Return type:
None
- supports_ddp: bool = False
- supports_indexed: bool = False
- class aihwkit.simulator.tiles.transfer.TransferSimulatorTile(x_size, d_size, rpu_config, dtype)[source]
Bases: SimulatorTile, Module
SimulatorTile for transfer.
The RPUCuda library is used only for the single-tile forward / backward / pulsed update, but not for the transfer from the gradient tile to the actual weight tile. The transfer part is implemented in Python, mostly for illustrative purposes and to allow flexible adjustments and the development of new algorithms based on the Tiki-taka approach.
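To illustrate the idea, a conceptual sketch of a single Tiki-taka transfer step (not the actual implementation; the low-level get_x_size / forward / update calls and the update sign convention are assumptions about the tile interface):

    import torch

    def transfer_column(fast_tile, slow_tile, col_idx, transfer_lr=1.0):
        """Read one column of the fast (gradient) tile and write it to
        the slow (weight) tile with a pulsed update."""
        one_hot = torch.zeros(fast_tile.get_x_size())
        one_hot[col_idx] = 1.0
        column = fast_tile.forward(one_hot)  # analog read of one column
        # pulsed write, assuming update() implements W -= lr * d * x^T
        slow_tile.update(one_hot, -transfer_lr * column)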
Note
Only a subset of parameter settings are supported.
Caution
The construction seed that is applied to both tiles when using ChoppedTransferCompound is not applied here, unless given explicitly for the unit cell devices.
- Parameters:
x_size (int) – input size
d_size (int) – output size
rpu_config (UnitCellRPUConfig) – resistive processing unit configuration.
dtype (RPUDataType) – data type to use for the tiles.
- Raises:
ConfigError – in case a setting is not supported.
- backward(d_input, bias=False, in_trans=False, out_trans=False, non_blocking=False)[source]
Backward pass.
Only needs to be implemented if torch autograd is not used.
- Parameters:
d_input (Tensor) –
bias (bool) –
in_trans (bool) –
out_trans (bool) –
non_blocking (bool) –
- Return type:
Tensor
- dump_extra()[source]
Dumps any extra states / attributes necessary for checkpointing.
For Tiles based on Modules, this should normally be handled by torch automatically.
- Return type:
Dict | None
- forward(x_input, bias=False, in_trans=False, out_trans=False, is_test=False, non_blocking=False)[source]
General simulator tile forward.
- Parameters:
x_input (Tensor) –
bias (bool) –
in_trans (bool) –
out_trans (bool) –
is_test (bool) –
non_blocking (bool) –
- Return type:
Tensor
- get_hidden_parameter_names()[source]
Get the hidden parameter names.
Each name corresponds to a slice of the tensor returned by get_hidden_parameters.
- Returns:
List of names.
- Return type:
List[str]
- get_hidden_parameters()[source]
Get the hidden parameters of the tile.
- Returns:
Hidden parameter tensor.
- Return type:
Tensor
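A short usage sketch (tile is a constructed instance; that the first tensor dimension indexes the names is an assumption about the slicing convention):

    names = tile.get_hidden_parameter_names()
    params = tile.get_hidden_parameters()
    for i, name in enumerate(names):
        print(name, params[i].shape)  # one slice per hidden parameter name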
- get_learning_rate()[source]
Get the learning rate of the tile.
- Returns:
learning rate if exists.
- Return type:
float | None
- load_extra(extra, strict=False)[source]
Load any extra states / attributes necessary for loading from a checkpoint.
For Tiles based on Modules, this should normally be handled by torch automatically.
Note
Expects the exact same RPUConfig / device etc. for applying the states. Cross-loading of state dicts is not supported for extra states; they will just be ignored.
- Parameters:
extra (Dict) – dictionary of states from dump_extra.
strict (bool) – Whether to throw an error if keys are not found.
- Raises:
RuntimeError – in case keys are wrong
- Return type:
None
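For example, a minimal round trip (tile is a placeholder; the loading side must use the exact same RPUConfig / device):

    extra = tile.dump_extra()  # collect extra states for checkpointing
    # ... later, on an identically configured tile:
    tile.load_extra(extra, strict=True)  # raises RuntimeError on wrong keys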
- post_update_step()[source]
Operators that need to be called once per mini-batch.
Note
This function is called by the analog optimizer.
Caution
If no analog optimizer is used, the post update steps will not be performed.
- Return type:
None
- set_hidden_parameters(params)[source]
Set the hidden parameters of the tile.
- Parameters:
params (Tensor) –
- Return type:
None
- set_learning_rate(learning_rate)[source]
Set the learning rate of the tile.
No-op for tiles that do not need a learning rate.
- Parameters:
learning_rate (float | None) – learning rate to set
- Return type:
None
- set_weights(weight)[source]
Sets the analog weights.
- Parameters:
weight (Tensor) –
- Return type:
None