aihwkit.simulator.tiles.periphery module
Base tile with added periphery and common utility methods.
- class aihwkit.simulator.tiles.periphery.TileWithPeriphery[source]
Bases: BaseTile, SimulatorTileWrapper
Partial class for tile modules with periphery.
The function joint_forward should be called from the TileModule level.
The class also implements the digital bias and adds output scales as well as mapping / reading / programming functionality. Additionally, input range and output scale learning is implemented.
Note
This is only a partial class implementation for the periphery. All classes that inherit from this class also need to inherit from TileModule.
All the module buffers and parameters will be handled by the TileModule.
- apply_input_range(values, update_from_data=False)[source]
Apply the input clipping.
- Parameters:
values (Tensor) – tensor to clip
update_from_data (bool) – whether to update from data if applicable
- Returns:
clipped output tensor
- Return type:
Tensor
- apply_out_scaling(values, tensor_view=None)[source]
Apply the learned out scaling to the given tensor.
- Parameters:
values (Tensor) – tensor to apply scaling to.
tensor_view (Tuple[int, ...] | None) – view to cast the out scalings before multiplication
- Returns:
output tensor with applied out scaling factors
- Return type:
Tensor
- apply_weight_scaling(combined_weights, weight_scaling_omega=None)[source]
Set the tile weights (and biases) in a scaled fashion.
Scales the weights by a layerwise scale or columnwise scale (if weight_scaling_columnwise is set), which is then applied in digital at the output of the forward and backward pass. The learning rate for this tile is adjusted accordingly.
If layerwise scale is chosen, weights are scaled by \(\omega/\max_{ij} |w_{ij}|\) and the global digital factor \(\alpha\) is set to \(\max_{ij} |w_{ij}|/\omega\).
It can be shown that such a constant factor greatly improves the SNR and training accuracy, as the full weight range of the analog devices is used. See also Rasch, Gokmen & Haensch (2019) for more details.
- Parameters:
combined_weights (Tensor) – [d_size, x_size] weight matrix.
weight_scaling_omega (float | None) – where the weight max should be mapped in terms of the weight range. Note that for omega larger than the maximal weight of the device, weights will get clipped for most devices. If this parameter is not given, it will default to the weight_scaling_omega value set in the MappingParameter of the rpu_config.
- Returns:
scaled weights.
- Return type:
Tensor
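The layerwise case described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the aihwkit implementation: the helper name layerwise_scale is hypothetical, and nested lists stand in for tensors.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def layerwise_scale(weights: Matrix, omega: float) -> Tuple[Matrix, float]:
    """Scale weights so that max |w_ij| maps to omega (hypothetical helper).

    Returns the scaled weights and the digital factor alpha,
    chosen so that alpha * scaled reproduces the original weights.
    """
    w_max = max(abs(w) for row in weights for w in row)
    if w_max == 0.0 or omega <= 0.0:
        # Nothing to scale: keep the weights and use a unit factor.
        return [row[:] for row in weights], 1.0
    alpha = w_max / omega  # digital factor: max|w| / omega
    scaled = [[w / alpha for w in row] for row in weights]  # w * omega / max|w|
    return scaled, alpha


weights = [[0.2, -0.5], [0.1, 0.25]]
scaled, alpha = layerwise_scale(weights, omega=1.0)
```

With omega set to 1.0, the largest absolute entry of scaled is 1.0, and multiplying the tile output by alpha in digital recovers the effect of the original weights.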
- backward(d_input, ctx=None)[source]
Perform the backward pass.
- Parameters:
d_input (Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
ctx (Any | None) – torch auto-grad context [Optional]
- Returns:
[N, in_size] tensor. If in_trans is set, transposed.
- Return type:
torch.Tensor
- backward_indexed(d_input, ctx=None)[source]
Perform the backward pass for convolutions.
Depending on the input tensor size it performs the backward pass for a 2D image or a 3D one.
- Parameters:
d_input (Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
ctx (Any | None) – torch auto-grad context [Optional]
- Returns:
[N, in_size] tensor. If in_trans is set, transposed.
- Return type:
torch.Tensor
- Raises:
TileError – if the indexed tile has not been initialized, or if self.images_sizes does not have a valid dimension.
- cpu()[source]
Return a copy of this tile in CPU memory.
- Returns:
Self with the underlying buffers moved to CPU memory.
- Return type:
- cuda(device=None)[source]
Return a copy of this tile in CUDA memory.
- Parameters:
device (str | device | int | None) – CUDA device
- Returns:
Self with the underlying buffers moved to CUDA memory.
- Return type:
- get_learned_out_scales()[source]
Get the learned_out_scales, which can be used to add a learned output scale to the weights.
- Returns:
learned_out_scales
- Return type:
tensor
- get_learning_rate()[source]
Return the tile learning rate.
- Returns:
the tile learning rate.
- Return type:
float
- get_mapping_scales()[source]
Get the scales used for the weight mapping.
- Returns:
the vector (or scalar) that is used to determine the mapping into (norm) conductance units. These scales are used at the output of the analog MVM.
- Return type:
Mapping scales
- get_scales()[source]
Get all scales of the tile.
- Returns:
Scale tensor if any scale exists, else None.
- Return type:
Tensor | None
- get_weights(apply_weight_scaling=True, realistic=False)[source]
Get the tile weights (and biases).
Gets the tile weights and extracts the mathematical weight matrix and biases (if present, as determined by the self.analog_bias parameter).
Note
The returned weight is a copy of the internal weights (not a pointer) and is always on CPU and detached.
Note
By default this is not a hardware realistic weight readout. Use read_weights() for a realistic transfer.
- Parameters:
apply_weight_scaling (bool) – Whether to return the weights with the (digital) output scaling factors applied. Note the “logical” weights of the layer which the DNN is effectively using are those with the output scales applied. If apply_weight_scaling is set to False, then only the weight values that are programmed onto the crossbar array are returned, without applying the digital scales.
realistic (bool) – whether to enable realistic read/write for getting the weights. Internally calls read_weights.
- Returns:
a tuple where the first item is the [out_size, in_size] weight matrix; and the second item is either the [out_size] bias vector or None if the tile is set not to use bias.
- Return type:
Tuple[Tensor, Tensor | None]
- init_input_processing()[source]
Helper function to initialize the input processing.
Note
This method is called from the constructor.
- Returns:
whether input processing is enabled
- Raises:
ConfigError – in case manage_output_clipping is enabled but not supported.
- Return type:
bool
- init_learned_out_scales()[source]
Helper function to initialize the learned out scaling used to scale the weights in digital.
Note
This method is called from the constructor.
- Return type:
None
- init_mapping_scales()[source]
Helper function to initialize the mapping scales used to scale the weights in digital and determine the conductance conversion.
Note
This method is called from the constructor.
- Return type:
None
- is_indexed()[source]
Returns whether index matrix for convolutions has been set.
- Returns:
Whether index matrix has been set
- Raises:
TileError – if has_matrix_indices method is not available
- Return type:
bool
- joint_forward(x_input, is_test=False, ctx=None)[source]
Perform the forward pass.
Calls first the pre_forward, then the tile forward, and finally the post_forward step.
Caution
This will apply the (digital) mapping scales, but not the learnable out-scales, which are handled in the forward pass of the module.
Note
The full forward pass is not using autograd, thus all pre and post functions need to be handled appropriately in the pre/post backward functions.
- Parameters:
x_input (Tensor) – [N, in_size] tensor. If in_trans is set, transposed.
is_test (bool) – whether to assume testing mode.
ctx (Any | None) – torch auto-grad context [Optional]
- Returns:
[N, out_size] tensor. If out_trans is set, transposed.
- Return type:
torch.Tensor
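The three-stage order of joint_forward can be sketched as a simple function composition. This is a minimal illustration under stated assumptions, not the library code: joint_forward_sketch and the stand-in stages (input clipping, a noise-free matrix-vector product, and a scalar mapping scale) are hypothetical, and the real stages depend on the tile configuration.

```python
from typing import Callable, List

Vector = List[float]
Stage = Callable[[Vector], Vector]


def joint_forward_sketch(
    x_input: Vector, pre_forward: Stage, tile_forward: Stage, post_forward: Stage
) -> Vector:
    """Chain the stages in the documented order: pre, tile forward, post."""
    return post_forward(tile_forward(pre_forward(x_input)))


# Hypothetical stand-ins for the three stages (assumed values).
input_range = 1.0
mapping_scale = 0.5
weights = [[1.0, 2.0], [0.0, -1.0]]  # [out_size, in_size]


def clip(x: Vector) -> Vector:
    """Pre-processing stand-in: clip inputs to [-input_range, input_range]."""
    return [max(-input_range, min(input_range, v)) for v in x]


def mvm(x: Vector) -> Vector:
    """Tile forward stand-in: ideal (noise-free) matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def scale(y: Vector) -> Vector:
    """Post-processing stand-in: apply a digital mapping scale."""
    return [mapping_scale * v for v in y]


y = joint_forward_sketch([3.0, 0.5], clip, mvm, scale)  # y == [1.0, -0.25]
```

Because the chain is not tracked by autograd, a matching backward sketch would have to undo or mirror each stage explicitly, which is what the pre/post backward functions are for.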
- joint_forward_indexed(x_input, is_test=False, ctx=None)[source]
Perform the forward pass for convolutions.
Depending on the input tensor size it performs the forward pass for a 2D image or a 3D one.
- Parameters:
x_input (Tensor) – [N, in_size] tensor. If in_trans is set, transposed.
is_test (bool) – whether to assume testing mode.
ctx (Any) – torch auto-grad context [Optional]
- Returns:
[N, out_size] tensor. If out_trans is set, transposed.
- Return type:
torch.Tensor
- Raises:
TileError – if the indexed tile has not been initialized, or if self.images_sizes does not have a valid dimension.
- post_backward(d_output, dim, ctx=None)[source]
Operations after the actual backward step for post processing.
Here, the mapping scales are applied if they exist.
- Parameters:
d_output (Tensor) – The output tensor from the analog MVM of the tile.
dim (int) – the dim of the x_size dimension
ctx (Any | None) – torch auto-grad context [Optional]
- Returns:
The postprocessed tensor of the same shape
- Return type:
Tensor
- post_forward(x_output, dim, is_test=False, ctx=None)[source]
Operations after the actual forward step for post processing.
- Parameters:
x_output (Tensor) – tensor that is the output from the forward pass of the tile
dim (int) – output channel dimension, i.e. the d_size dimension
is_test (bool) – whether in eval mode
ctx (Any | None) – torch auto-grad context [Optional]
- Returns:
Output tensor of the same shape
- Return type:
Tensor
- pre_backward(d_input, dim, ctx=None)[source]
Operations before the actual backward step for pre processing.
By default, this is a no-op. However, it could be overridden in derived tile classes.
- Parameters:
d_input (Tensor) – The input tensor to the analog MVM of the tile.
dim (int) – the dim of the d_size dimension
ctx (Any | None) – torch auto-grad context [Optional]
- Returns:
The preprocessed tensor of the same shape
- Return type:
Tensor
- pre_forward(x_input, dim, is_test=False, ctx=None)[source]
Operations before the actual forward step for pre processing.
By default, this is a no-op. However, it could be overridden in derived tile classes.
- Parameters:
x_input (Tensor) – input tensor for the analog MVM of the tile.
dim (int) – input channel dimension, i.e. the x_size dimension
is_test (bool) – whether in eval mode
ctx (Any | None) – torch auto-grad context [Optional]
- Returns:
Output tensor of the same shape
- Return type:
Tensor
- pre_update(x_input, x_dim, d_input, d_dim)[source]
Operations before the actual update step for pre processing.
By default, if the mapping scales are used, the d_input will be divided by the mapping scales to compensate for the conductance mapping.
Caution
The x_input and d_input here are the original inputs to the forward and backward methods, thus the pre_forward and pre_backward functions are not applied, and might need to be applied again here.
- Parameters:
x_input (Tensor) – The forward input tensor.
x_dim (int) – the dim of the x_size dimension of the forward input.
d_input (Tensor) – The backward (gradient) input tensor.
d_dim (int) – the dim of the d_size dimension of the backward input.
- Returns:
Tuple of the preprocessed x_input and d_input tensors of the same shape
- Return type:
Tuple[Tensor, Tensor]
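The default behavior described for pre_update can be sketched as follows. This is a hedged illustration only: pre_update_sketch is a hypothetical helper, nested lists stand in for [N, size] tensors, and the mapping scales are assumed to hold one value per output channel.

```python
from typing import List, Optional, Tuple

Batch = List[List[float]]  # [N, size]


def pre_update_sketch(
    x_input: Batch, d_input: Batch, mapping_scales: Optional[List[float]]
) -> Tuple[Batch, Batch]:
    """Divide d_input by the mapping scales, leaving x_input untouched."""
    if mapping_scales is None:
        # No mapping scales in use: nothing to compensate.
        return x_input, d_input
    d_scaled = [
        [d / s for d, s in zip(row, mapping_scales)] for row in d_input
    ]
    return x_input, d_scaled


x = [[1.0, 2.0]]        # one sample, in_size = 2
d = [[0.5, -1.0, 2.0]]  # one sample, out_size = 3
x_out, d_out = pre_update_sketch(x, d, mapping_scales=[0.5, 2.0, 1.0])
# d_out == [[1.0, -0.5, 2.0]]
```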
- program_weights(from_reference=True, x_values=None, learning_rate=0.1, max_iter=10000, tolerance=0.01, w_init=0.01)[source]
Program the target weights into the conductances using the pulse update defined.
Programming is done using the defined tile-update (e.g. SGD) and matching inputs (x_values, by default the identity matrix).
- Parameters:
from_reference (bool) – Whether to use weights from reference (those that were initially set with set_weights) or the current weights.
x_values (Tensor | None) – Values to use for the read-and-verify. If none are given, unit vectors are used
learning_rate (float) – Learning rate of the optimization
max_iter (int) – max number of batches for the iterative programming
tolerance (float | None) – Stop the iteration loop early if the mean output deviation is below this number. Given in relation to the max output.
w_init (float | Tensor) – initial weight matrix to start from. If given as float, weights are set uniform random in [-w_init, w_init]. This init weight is given directly in normalized conductance units and should include the bias row if existing.
- Return type:
None
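The iterative programming loop can be sketched in plain Python. This is a simplified illustration, not the library routine: it assumes unit-vector inputs (so each read returns the current weights directly), an ideal noise-free update instead of a pulsed device update, and the largest absolute target weight as the "max output". The name program_sketch is hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def program_sketch(
    target: Matrix,
    learning_rate: float = 0.1,
    max_iter: int = 10000,
    tolerance: float = 0.01,
    w_init: float = 0.01,
    seed: int = 0,
) -> Tuple[Matrix, int]:
    """Iteratively move randomly initialized weights towards the target."""
    rng = random.Random(seed)
    # Start uniform random in [-w_init, w_init], as described for a float w_init.
    current = [[rng.uniform(-w_init, w_init) for _ in row] for row in target]
    max_out = max(abs(w) for row in target for w in row) or 1.0
    count = sum(len(row) for row in target)
    for iteration in range(max_iter):
        # Read-and-verify step: deviation of the read-out from the target.
        errors = [
            [c - t for c, t in zip(c_row, t_row)]
            for c_row, t_row in zip(current, target)
        ]
        mean_dev = sum(abs(e) for row in errors for e in row) / count
        if mean_dev / max_out < tolerance:
            return current, iteration  # converged early
        # Gradient step on the squared read-out error (ideal update).
        current = [
            [c - learning_rate * e for c, e in zip(c_row, e_row)]
            for c_row, e_row in zip(current, errors)
        ]
    return current, max_iter


target = [[0.3, -0.6], [0.1, 0.45]]
programmed, iterations = program_sketch(target)
```

On a real tile the update is noisy and quantized, so convergence is slower and the tolerance may not be reached before max_iter.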
- read_weights(apply_weight_scaling=False, x_values=None, over_sampling=10)[source]
Reads the weights (and biases) in a realistic manner by using the forward pass for weights readout.
Gets the tile weights and extracts the mathematical weight matrix and biases (if present, as determined by the self.analog_bias parameter).
The weights will not be directly read, but linearly estimated from random inputs using the analog forward pass.
Note
If the tile includes digital periphery (e.g. out scaling), these will be applied. Thus this weight is the logical weights that correspond to the weights in an FP network.
Note
Weights are estimated using the lstsq solver from torch.
- Parameters:
apply_weight_scaling (bool) – Whether to rescale the given weight matrix and populate the digital output scaling factors as specified in the configuration MappingParameter. A new weight_scaling_omega can be given. Note that this will overwrite the existing digital out scaling factors.
x_values (Tensor | None) – Values to use for estimating the matrix. If not given, inputs are standard normal vectors.
over_sampling (int) – If x_values is not given, over_sampling * in_size random vectors are used for the estimation
- Returns:
a tuple where the first item is the [out_size, in_size] weight matrix; and the second item is either the [out_size] bias vector or None if the tile is set not to use bias.
- Raises:
TileError – in case of wrong code usage of TileWithPeriphery
- Return type:
Tuple[Tensor, Tensor | None]
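The readout idea (probe the forward pass with random inputs, then fit a linear map) can be sketched without any dependencies. The library is documented to use the lstsq solver from torch; the sketch below swaps in the normal equations with a small Gauss-Jordan solver so that it stays self-contained. The names solve and estimate_weights are hypothetical, and the forward pass is assumed to be noise-free.

```python
import random
from typing import Callable, List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [a_row[:] + b_row[:] for a_row, b_row in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def estimate_weights(
    forward: Callable[[Vector], Vector],
    in_size: int,
    over_sampling: int = 10,
    seed: int = 0,
) -> Matrix:
    """Estimate the [out_size, in_size] matrix behind a linear forward pass."""
    rng = random.Random(seed)
    # over_sampling * in_size standard normal probe vectors.
    xs = [
        [rng.gauss(0.0, 1.0) for _ in range(in_size)]
        for _ in range(over_sampling * in_size)
    ]
    ys = [forward(x) for x in xs]
    out_size = len(ys[0])
    # Normal equations: (X^T X) W^T = X^T Y.
    xtx = [
        [sum(x[i] * x[j] for x in xs) for j in range(in_size)]
        for i in range(in_size)
    ]
    xty = [
        [sum(x[i] * y[k] for x, y in zip(xs, ys)) for k in range(out_size)]
        for i in range(in_size)
    ]
    w_t = solve(xtx, xty)  # [in_size, out_size]
    return [[w_t[i][k] for i in range(in_size)] for k in range(out_size)]


true_weights = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]


def ideal_forward(x: Vector) -> Vector:
    """Noise-free stand-in for the analog forward pass."""
    return [sum(w * v for w, v in zip(row, x)) for row in true_weights]


estimated = estimate_weights(ideal_forward, in_size=3)
```

With a noisy analog forward pass the estimate is only approximate, which is why more probe vectors than unknowns (over_sampling) are used.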
- remap_weights(weight_scaling_omega=1.0)[source]
Gets and re-sets the weights in case of using the weight scaling.
This re-sets the weights with applied mapping scales, so that the weight mapping scales are updated.
In case of hardware-aware training, this would update the weight mapping scales so that the absolute max analog weights are set to 1 (as specified in the weight_scaling configuration of MappingParameter).
Note
By default the weight scaling omega factor is set to 1 here (overriding any setting in the rpu_config). This means that the max weight value is set to 1 internally for the analog weights.
Caution
This should typically not be called for analog. Use program_weights to re-program.
- Parameters:
weight_scaling_omega (float | None) – The weight scaling omega factor (see MappingParameter). If set to None here, it will take the value in the mapping parameters. The default, however, is 1.0.
- Return type:
None
- set_indexed(indices, image_sizes)[source]
Set the index matrix for convolutions and switches to indexed forward/backward/update versions.
- Parameters:
indices (Tensor) – torch.tensor with int indices
image_sizes (List) – [C_in, H_in, W_in, H_out, W_out] sizes
- Raises:
ValueError – if image_sizes does not have valid dimensions
TileError – if the tile uses transposition or indexed is not supported.
- Return type:
None
- set_input_range(value)[source]
Sets the input range.
- Parameters:
value (Tensor | float) – input range value
- Raises:
ConfigError – in case input range is None
- Return type:
None
- set_learned_out_scales(alpha)[source]
Helper function to set the out scaling alpha used to scale the weights in digital.
Note
Will be a no-op in case init_learned_out_scales() was not called.
Caution
Will not check the correct size of the given alpha.
- Parameters:
alpha (Tensor | float) – out scales as a parameter that is learned.
- Return type:
None
- set_learning_rate(learning_rate)[source]
Set the tile learning rate.
Set the tile learning rate to -learning_rate. Note that the learning rate is always taken to be negative (because of the meaning in gradient descent) and positive learning rates are not supported.
- Parameters:
learning_rate (float | None) – the desired learning rate.
- Return type:
None
- set_mapping_scales(mapping_scales)[source]
Set the scales used for the weight mapping.
- Parameters:
mapping_scales (Tensor | float | None) – Vector (or scalar) used for the mapping of weights into conductance units. This mapping is never in the SGD graph but might get initialized when weight_scaling_omega is used or remapping is enforced.
- Return type:
None
- set_scales(scales)[source]
Set all scales with a new scale.
This will set the mapping scales to scales and set all other scales to 1.
- Parameters:
scales (Tensor | float) – scales to set.
- Return type:
None
- set_weights(weight, bias=None, apply_weight_scaling=True, realistic=False, weight_scaling_omega=None)[source]
Set the tile weights (and biases).
Sets the internal tile weights (and biases) to the specified values.
Note
By default this is not a hardware realistic weight write but an exact copy of the given weights into the internal weights.
Caution
By default the peripheral digital scales are applied to the weights, so that the weight is scaled (in case weight_scaling_omega is set accordingly).
- Parameters:
weight (Tensor) – [out_size, in_size] weight matrix.
bias (Tensor | None) – [out_size] bias vector. This parameter is required if self.analog_bias is True, and ignored otherwise.
apply_weight_scaling (bool) – Whether to rescale the given weight matrix and populate the digital output scaling factors as specified in the configuration MappingParameter. A new weight_scaling_omega can be given. Note that this will overwrite the existing digital out scaling factors.
realistic (bool) – whether to enable realistic write for setting the weights. Internally calls program_weights.
weight_scaling_omega (float | None) – The weight scaling omega factor (see MappingParameter). If given explicitly here, it will overwrite the value in the mapping field.
- Return type:
None
- supports_indexed = True
- update(x_input, d_input)[source]
Perform the update pass.
Calls the pre_update method to pre-process the inputs.
- Parameters:
x_input (Tensor) – [..., in_size] tensor. If in_trans is set, [in_size, ...].
d_input (Tensor) – [..., out_size] tensor. If out_trans is set, [out_size, ...].
- Returns:
None
- Return type:
None
- update_indexed(x_input, d_input)[source]
Perform the update pass for convolutions.
Calls the pre_update method to pre-process the inputs.
- Parameters:
x_input (Tensor) – [N, in_size] tensor. If in_trans is set, transposed.
d_input (Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
- Returns:
None
- Return type:
None