aihwkit.simulator.tiles.transfer module
High-level analog transfer tiles (analog).
- class aihwkit.simulator.tiles.transfer.TorchTransferTile(out_size, in_size, rpu_config, bias=False, in_trans=False, out_trans=False)[source]
Bases: TileModule, TileWithPeriphery, SimulatorTileWrapper
Transfer tile for in-memory gradient accumulation algorithms.
This is a (mostly) Python re-implementation of the AnalogTile with ChoppedTransferCompound that uses the C++ RPUCuda library.
Only a subset of the parameters is implemented here. However, both ChoppedTransferCompound and DynamicTransferCompound are supported. Thus, the TTv2, c-TTv2, and AGAD learning algorithms are all available here.
Note
This implementation is mostly for instructive use. The C++ implementation has a large speed advantage if the batch size is large and the transfer is done multiple times per batch. For the torch implementation, transfer_every needs to be larger than or equal to the batch size, so that only one transfer is made per batch.
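A minimal sketch of this constraint (assuming an rpu_config built as in the Usage below; the transfer_every attribute path on the device compound is taken from ChoppedTransferCompound):

    # one transfer per mini-batch: keep transfer_every >= batch size
    batch_size = 32
    rpu_config.device.transfer_every = batch_size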
Caution
When using the model.analog_tiles() generators, this parent tile as well as its children tiles will be looped over, which might result in e.g. getting the same weight twice. This is because TorchTransferTile is registered separately as a TileModule to support the periphery, while internally two additional tiles are instantiated.
Usage:
    rpu_config = build_config('agad', device_config)

    # use the torch implementation tile instead of the default RPUCuda-backed AnalogTile
    rpu_config.tile_class = TorchTransferTile
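A fuller hedged sketch (the build_config import path and the SoftBoundsDevice stand-in are assumptions drawn from typical aihwkit examples; verify them against your installed version):

    from aihwkit.nn import AnalogLinear
    from aihwkit.optim import AnalogSGD
    from aihwkit.simulator.configs import build_config  # assumed import path
    from aihwkit.simulator.configs.devices import SoftBoundsDevice  # stand-in device
    from aihwkit.simulator.tiles.transfer import TorchTransferTile

    # build an AGAD config around the chosen device material model
    rpu_config = build_config('agad', SoftBoundsDevice())
    rpu_config.tile_class = TorchTransferTile  # switch to the torch tile

    # analog bias columns are not supported by this tile, so bias=False
    layer = AnalogLinear(4, 2, bias=False, rpu_config=rpu_config)
    optimizer = AnalogSGD(layer.parameters(), lr=0.1)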
- Parameters:
out_size (int) – output vector size of the tile, i.e. the dimension of \(\mathbf{y}\) in case of \(\mathbf{y} = W\mathbf{x}\) (or equivalently the dimension of \(\boldsymbol{\delta}\) in the backward pass).
in_size (int) – input vector size, i.e. the dimension of the vector \(\mathbf{x}\) in case of \(\mathbf{y} = W\mathbf{x}\).
rpu_config (InferenceRPUConfig | SingleRPUConfig | UnitCellRPUConfig | TorchInferenceRPUConfig | DigitalRankUpdateRPUConfig) – resistive processing unit configuration. This has to be of type UnitCellRPUConfig with a device compound derived from ChoppedTransferCompound.
bias (bool) – whether to add a bias column to the tile, i.e. \(W\) has an extra column to code the biases. This is not supported here.
in_trans (bool) – whether to assume a transposed input (batch first). Not supported.
out_trans (bool) – whether to assume a transposed output (batch first). Not supported.
- Raises:
ConfigError – if one of the unsupported cases is used.
- forward(x_input, tensor_view=None)[source]
Torch forward function that calls the analog forward.
- Parameters:
x_input (Tensor) –
tensor_view (Tuple | None) –
- Return type:
Tensor
- post_update_step()[source]
Operators that need to be called once per mini-batch.
Note
This function is called by the analog optimizer.
Caution
If no analog optimizer is used, the post update steps will not be performed.
- Return type:
None
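For context, a hedged sketch of how this hook is reached in a training step (layer, optimizer, loss_fn, x, and y are placeholders):

    # an analog optimizer such as AnalogSGD calls post_update_step()
    # on its tiles once per mini-batch; a plain torch optimizer would not
    optimizer.zero_grad()
    loss = loss_fn(layer(x), y)
    loss.backward()
    optimizer.step()  # post-update steps (e.g. the transfer) happen here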
- replace_with(rpu_config)[source]
Replacing the current RPUConfig is not supported.
- Parameters:
rpu_config (RPUConfigBase) – New RPUConfig to check against
- Raises:
TileModuleError – always
- Return type:
None
- supports_ddp: bool = False
- supports_indexed: bool = False
- class aihwkit.simulator.tiles.transfer.TransferSimulatorTile(x_size, d_size, rpu_config, dtype)[source]
Bases: SimulatorTile, Module
SimulatorTile for transfer.
The RPUCuda library is used only for the single-tile forward / backward / pulsed update, but not for the transfer from the gradient tile to the actual weight tile. The transfer part is implemented in Python, mostly for illustrative purposes and to allow flexible adjustments and the development of new algorithms based on the Tiki-taka approach.
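To illustrate the idea, a conceptual sketch of a single Tiki-taka transfer step (not the actual implementation; the low-level get_x_size / forward / update calls and the update sign convention are assumptions about the tile interface):

    import torch

    def transfer_column(fast_tile, slow_tile, col_idx, transfer_lr=1.0):
        """Read one column of the fast (gradient) tile and write it to
        the slow (weight) tile with a pulsed update."""
        one_hot = torch.zeros(fast_tile.get_x_size())
        one_hot[col_idx] = 1.0
        column = fast_tile.forward(one_hot)  # analog read of one column
        # pulsed write, assuming update() implements W -= lr * d * x^T
        slow_tile.update(one_hot, -transfer_lr * column)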
Note
Only a subset of parameter settings are supported.
Caution
The construction seed that is applied to both tiles when using ChoppedTransferCompound is not applied here, unless given explicitly for the unit cell devices.
- Parameters:
x_size (int) – input size
d_size (int) – output size
rpu_config (UnitCellRPUConfig) – resistive processing unit configuration.
dtype (RPUDataType) – data type to use for the tiles.
- Raises:
ConfigError – in case a setting is not supported.
- backward(d_input, bias=False, in_trans=False, out_trans=False, non_blocking=False)[source]
Backward pass.
Only needs to be implemented if torch autograd is not used.
- Parameters:
d_input (Tensor) –
bias (bool) –
in_trans (bool) –
out_trans (bool) –
non_blocking (bool) –
- Return type:
Tensor
- dump_extra()[source]
Dumps any extra states / attributes necessary for checkpointing.
For Tiles based on Modules, this should normally be handled by torch automatically.
- Return type:
Dict | None
- forward(x_input, bias=False, in_trans=False, out_trans=False, is_test=False, non_blocking=False)[source]
General simulator tile forward.
- Parameters:
x_input (Tensor) –
bias (bool) –
in_trans (bool) –
out_trans (bool) –
is_test (bool) –
non_blocking (bool) –
- Return type:
Tensor
- get_hidden_parameter_names()[source]
Get the hidden parameter names.
Each name corresponds to a slice of the tensor returned by get_hidden_parameters.
- Returns:
List of names.
- Return type:
List[str]
- get_hidden_parameters()[source]
Get the hidden parameters of the tile.
- Returns:
Hidden parameter tensor.
- Return type:
Tensor
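A short usage sketch (tile is a constructed instance; that the first tensor dimension indexes the names is an assumption about the slicing convention):

    names = tile.get_hidden_parameter_names()
    params = tile.get_hidden_parameters()
    for i, name in enumerate(names):
        print(name, params[i].shape)  # one slice per hidden parameter name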
- get_learning_rate()[source]
Get the learning rate of the tile.
- Returns:
learning rate if exists.
- Return type:
float | None
- load_extra(extra, strict=False)[source]
Load any extra states / attributes necessary for loading from a checkpoint.
For Tiles based on Modules, this should normally be handled by torch automatically.
Note
Expects the exact same RPUConfig / device etc. for applying the states. Cross-loading of state dicts is not supported for extra states; they will just be ignored.
- Parameters:
extra (Dict) – dictionary of states from dump_extra.
strict (bool) – Whether to throw an error if keys are not found.
- Raises:
RuntimeError – in case keys are wrong
- Return type:
None
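For example, a minimal round trip (tile is a placeholder; the loading side must use the exact same RPUConfig / device):

    extra = tile.dump_extra()  # collect extra states for checkpointing
    # ... later, on an identically configured tile:
    tile.load_extra(extra, strict=True)  # raises RuntimeError on wrong keys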
- post_update_step()[source]
Operators that need to be called once per mini-batch.
Note
This function is called by the analog optimizer.
Caution
If no analog optimizer is used, the post update steps will not be performed.
- Return type:
None
- set_hidden_parameters(params)[source]
Set the hidden parameters of the tile.
- Parameters:
params (Tensor) –
- Return type:
None
- set_learning_rate(learning_rate)[source]
Set the learning rate of the tile.
No-op for tiles that do not need a learning rate.
- Parameters:
learning_rate (float | None) – learning rate to set
- Return type:
None
- set_weights(weight)[source]
Sets the analog weights.
- Parameters:
weight (Tensor) –
- Return type:
None