aihwkit.simulator.tiles.base module
High level analog tiles (base).
- class aihwkit.simulator.tiles.base.AnalogTileStateNames[source]
Bases:
object
Class defining analog tile state name constants.
Caution
Do not edit. Some names are attribute names of the tile.
- CLASS = 'analog_tile_class'
- CONTEXT = 'analog_ctx'
- HIDDEN_PARAMETERS = 'analog_tile_hidden_parameters'
- HIDDEN_PARAMETER_NAMES = 'analog_tile_hidden_parameter_names'
- LR = 'analog_lr'
- MAPPING_SCALES = 'mapping_scales'
- OUT_SCALING = 'out_scaling_alpha'
- RPU_CONFIG = 'rpu_config'
- SHARED_WEIGHTS = 'shared_weights'
- WEIGHTS = 'analog_tile_weights'
- class aihwkit.simulator.tiles.base.BaseTile(out_size, in_size, rpu_config, bias=True, in_trans=False, out_trans=False)[source]
Bases:
Generic[aihwkit.simulator.tiles.base.RPUConfigGeneric]
Base class for tiles.
- Parameters
out_size – output size
in_size – input size
rpu_config – resistive processing unit configuration.
bias – whether to add a bias column to the tile.
in_trans – whether to assume a transposed input (batch first)
out_trans – whether to assume a transposed output (batch first)
- apply_input_range(values, update_from_data=False)[source]
Apply the input clipping.
- Parameters
values (torch.Tensor) – tensor to clip
update_from_data (bool) – whether to update from data if applicable
- Returns
clipped output tensor
- Return type
torch.Tensor
- apply_out_scaling(values, tensor_view=None)[source]
Apply the learned out scaling to the given tensor.
- Parameters
values (torch.Tensor) – tensor to apply scaling to.
tensor_view (Optional[Tuple[int, ...]]) – view to cast the out scalings before multiplication
- Returns
output tensor with applied out scaling factors
- Return type
torch.Tensor
- apply_weight_scaling(combined_weights, weight_scaling_omega=None)[source]
Set the tile weights (and biases) in a scaled fashion.
Scales the weights by a layerwise scale or columnwise scale (if weight_scaling_columnwise is set), that is then applied in digital at the output of forward and backward pass, and the learning rate for this tile is adjusted accordingly.
If layerwise scale is chosen, weights are scaled by \(\omega/\max_{ij} |w_{ij}|\) and the global digital factor \(\alpha\) is set to \(\max_{ij} |w_{ij}|/\omega\).
It can be shown that such a constant factor greatly improves the SNR and training accuracy as the full weight range of the analog devices is used. See also Rasch, Gokmen & Haensch (2019) for more details.
- Parameters
combined_weights (torch.Tensor) – [d_size, x_size] weight matrix.
weight_scaling_omega (Optional[float]) – where the weight max should be mapped in terms of the weight range. Note that for omega larger than the maximal weight of the device, weights will get clipped for most devices. If this parameter is not given, it will default to the weight_scaling_omega value set in the MappingParameter of the rpu_config.
- Returns
scaled weights.
- Return type
torch.Tensor
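The layerwise case described above can be illustrated with a minimal pure-Python sketch. It is not the library implementation: the helper name scale_layerwise and the nested-list representation are assumptions made for illustration, and the fallback for an all-zero matrix is likewise an assumption.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def scale_layerwise(weights: Matrix, omega: float) -> Tuple[Matrix, float]:
    """Hypothetical helper: map the largest absolute weight onto omega.

    Returns the scaled weights and the digital output factor alpha, so that
    alpha * scaled reproduces the original weights.
    """
    w_max = max(abs(w) for row in weights for w in row)
    if w_max == 0.0 or omega <= 0.0:
        # Assumption: nothing sensible to scale, so fall back to the identity.
        return [row[:] for row in weights], 1.0
    alpha = w_max / omega  # alpha = max|w| / omega
    scaled = [[w / alpha for w in row] for row in weights]  # w * omega / max|w|
    return scaled, alpha


weights = [[0.2, -0.05], [0.1, 0.4]]
scaled, alpha = scale_layerwise(weights, omega=0.8)
```

Multiplying the scaled weights by alpha in digital recovers the original matrix, which is why the scaling leaves the mathematical result of the layer unchanged while using more of the device weight range.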
- backward(d_input, ctx=None)[source]
Perform the backward pass.
- Parameters
d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
ctx (Any) – torch auto-grad context [Optional]
- Returns
[N, in_size] tensor. If in_trans is set, transposed.
- Return type
torch.Tensor
- backward_indexed(d_input, ctx=None)[source]
Perform the backward pass for convolutions.
Depending on the input tensor size it performs the backward pass for a 2D image or a 3D one.
- Parameters
d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
ctx (Any) – torch auto-grad context [Optional]
- Returns
[N, in_size] tensor. If in_trans is set, transposed.
- Return type
torch.Tensor
- Raises
TileError – if the indexed tile has not been initialized, or if self.images_sizes does not have a valid dimension.
- cpu()[source]
Return a copy of this tile in CPU memory.
- Returns
self in case of CPU
- Return type
aihwkit.simulator.tiles.base.BaseTile
- cuda(device=None)[source]
Return a copy of this tile in CUDA memory.
- Parameters
device (Optional[Union[torch.device, str, int]]) –
- Return type
aihwkit.simulator.tiles.base.BaseTile
- decay_weights(alpha=1.0)[source]
Decays the weights once according to the decay parameters of the tile.
- Parameters
alpha (float) – additional decay scale (such as LR). The base decay rate is set during tile init.
- Returns
None.
- Return type
None
- diffuse_weights()[source]
Diffuses the weights once according to the diffusion parameters of the tile.
The base diffusion rate is set during tile init.
- Returns
None
- Return type
None
- drift_weights(delta_t=1.0)[source]
Drifts the weights once according to the drift parameters of the tile.
See also DriftParameter.
- Parameters
delta_t (float) – Time since last drift call.
- Returns
None.
- Return type
None
- ensure_shared_weights(shared_weights=None)[source]
Ensure that the shared_weights is set properly.
Caution
This is only called from analog function.
No-op if shared weights is not used.
- Parameters
shared_weights (Optional[torch.Tensor]) –
- Return type
None
- forward(x_input, is_test=False, ctx=None)[source]
Perform the forward pass.
Calls first the pre_forward, then the tile forward, and finally the post_forward step.
Note
The full forward pass is not using autograd, thus all pre and post functions need to be handled appropriately in the pre/post backward functions.
- Parameters
x_input (torch.Tensor) – [N, in_size] tensor. If in_trans is set, transposed.
is_test (bool) – whether to assume testing mode.
ctx (Any) – torch auto-grad context [Optional]
- Returns
[N, out_size] tensor. If out_trans is set, transposed.
- Return type
torch.Tensor
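The call order of the forward pass (pre_forward, then the tile forward, then post_forward) can be sketched with a toy stand-in. ToyTile, tile_forward and out_scale are hypothetical names used only for this illustration; the real tile operates on torch tensors and delegates the matrix-vector product to the simulator.

```python
from typing import List


class ToyTile:
    """Hypothetical stand-in that mimics only the call order of a forward pass."""

    def __init__(self, weights: List[List[float]], out_scale: float = 1.0) -> None:
        self.weights = weights      # [out_size][in_size]
        self.out_scale = out_scale  # digital scale applied after the MVM
        self.calls: List[str] = []  # records the order of the steps

    def pre_forward(self, x: List[float]) -> List[float]:
        self.calls.append("pre_forward")
        return x  # no-op in this sketch

    def tile_forward(self, x: List[float]) -> List[float]:
        self.calls.append("tile_forward")
        # Plain matrix-vector product standing in for the analog MVM.
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]

    def post_forward(self, y: List[float]) -> List[float]:
        self.calls.append("post_forward")
        return [self.out_scale * v for v in y]

    def forward(self, x: List[float]) -> List[float]:
        return self.post_forward(self.tile_forward(self.pre_forward(x)))


tile = ToyTile([[1.0, 2.0], [0.0, -1.0]], out_scale=0.5)
output = tile.forward([3.0, 4.0])
```

Because the real forward pass does not use autograd, any transformation placed in the pre or post step needs a matching counterpart in the backward path.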
- forward_indexed(x_input, is_test=False, ctx=None)[source]
Perform the forward pass for convolutions.
Depending on the input tensor size it performs the forward pass for a 2D image or a 3D one.
- Parameters
x_input (torch.Tensor) – [N, in_size] tensor. If in_trans is set, transposed.
is_test (bool) – whether to assume testing mode.
ctx (Any) – torch auto-grad context [Optional]
- Returns
[N, out_size] tensor. If out_trans is set, transposed.
- Return type
torch.Tensor
- Raises
TileError – if the indexed tile has not been initialized, or if self.images_sizes does not have a valid dimension.
- get_analog_ctx()[source]
Return the analog context of the tile to be used in AnalogFunction.
- Return type
- get_brief_info()[source]
Return short info about the underlying C++ tile.
- Return type
str
- get_hidden_parameters()[source]
Get the hidden parameters of the tile.
- Returns
Ordered dictionary of hidden parameter tensors.
- Return type
collections.OrderedDict
- get_hidden_update_index()[source]
Get the current updated device index of the hidden devices.
Usually this is 0 as only one device is present per cross-point for many tile RPU configs. However, some RPU configs maintain internally multiple devices per cross-point (e.g. VectorUnitCell).
- Returns
The next mini-batch updated device index.
- Return type
int
Note
Depending on the update and learning policy implemented in the tile, updated devices might switch internally as well.
- get_learned_out_scales()[source]
Get the learned out scales, which can be used to add a learned output scale to the weights.
- Returns
learned_out_scales
- Return type
tensor
- get_learning_rate()[source]
Return the tile learning rate.
- Returns
the tile learning rate.
- Return type
float
- get_mapping_scales()[source]
Get the scales used for the weight mapping.
- Returns
the vector (or scalar) that is used to determine the mapping into (norm) conductance units. These scales are used at the output of the analog MVM.
- Return type
Mapping scales
- get_scales()[source]
Get the scales of the tile.
- Returns
Scale tensor if any scale exists, else None.
- Return type
Optional[torch.Tensor]
- get_weights(apply_weight_scaling=False)[source]
Get the tile weights (and biases).
Gets the tile weights and extracts the mathematical weight matrix and biases (if present, as determined by the self.bias parameter).
Note
The returned weight is a copy of the internal weights (not a pointer) and is always on CPU and detached.
Note
This is not a hardware realistic weight readout. Use get_weights_realistic() for a realistic transfer.
- Parameters
apply_weight_scaling (bool) – Whether to return the weights with the (digital) output scaling factors applied. Note the “logical” weights of the layer which the DNN is effectively using are those with the output scales applied. If apply_weight_scaling is set to False, then only the weight values that are programmed onto the crossbar array are returned, without applying the digital scales.
- Returns
a tuple where the first item is the [out_size, in_size] weight matrix; and the second item is either the [out_size] bias vector or None if the tile is set not to use bias.
- Return type
Tuple[torch.Tensor, Optional[torch.Tensor]]
- get_weights_realistic(apply_weight_scaling=False)[source]
Get the tile weights (and biases) in a realistic manner by using the forward pass for weights readout.
Gets the tile weights and extracts the mathematical weight matrix and biases (if present, as determined by the self.bias parameter).
Note
The returned weight is a copy of the internal weights (not a pointer) and is always on CPU and detached.
- Parameters
apply_weight_scaling (bool) – Whether to return the weights with the (digital) output scaling factors applied. Note the “logical” weights of the layer which the DNN is effectively using are those with the output scales applied. If apply_weight_scaling is set to False, then only the weight values that are programmed onto the crossbar array are returned, without applying the digital scales.
- Returns
a tuple where the first item is the [out_size, in_size] weight matrix; and the second item is either the [out_size] bias vector or None if the tile is set not to use bias.
- Return type
Tuple[torch.Tensor, Optional[torch.Tensor]]
- init_input_processing()[source]
Helper function to initialize the input processing.
Note
This method is called from the constructor.
- Raises
ConfigError – if manage_output_clipping is enabled but not supported.
- Return type
None
- init_learned_out_scales()[source]
Helper function to initialize the learned out scaling used to scale the weights in digital.
Note
This method is called from the constructor.
- Return type
None
- init_mapping_scales()[source]
Helper function to initialize the mapping scales used to scale the weights in digital and determine the conductance conversion.
Note
This method is called from the constructor.
- Return type
None
- is_indexed()[source]
Returns whether index matrix for convolutions has been set.
- Returns
Whether index matrix has been set
- Return type
bool
- post_backward(d_output, dim, ctx=None)[source]
Operations after the actual backward step for post processing.
Here, the mapping scales are applied if they exist.
- Parameters
d_output (torch.Tensor) – The output tensor from the analog MVM of the tile.
dim (int) – the dim of the x_size dimension
ctx (Any) – torch auto-grad context [Optional]
- Returns
The postprocessed tensor of the same shape
- Return type
torch.Tensor
- post_forward(x_output, dim, is_test=False, ctx=None)[source]
Operations after the actual forward step for post processing.
- Parameters
x_output (torch.Tensor) – tensor that is the output from the forward pass of the tile
dim (int) – output channel dimension, i.e. the d_size dimension
is_test (bool) – whether in eval mode
ctx (Any) – torch auto-grad context [Optional]
- Returns
Output tensor of the same shape
- Return type
torch.Tensor
- post_update_step()[source]
Operators that need to be called once per mini-batch.
Note
This function is called by the analog optimizer.
Caution
If no analog optimizer is used, the post update steps will not be performed.
- Return type
None
- pre_backward(d_input, dim, ctx=None)[source]
Operations before the actual backward step for pre processing.
By default, this is a no-op. However, it could be overridden in derived tile classes.
- Parameters
d_input (torch.Tensor) – The input tensor to the analog MVM of the tile.
dim (int) – the dim of the d_size dimension
ctx (Any) – torch auto-grad context [Optional]
- Returns
The preprocessed tensor of the same shape
- Return type
torch.Tensor
- pre_forward(x_input, dim, is_test=False, ctx=None)[source]
Operations before the actual forward step for pre processing.
By default, this is a no-op. However, it could be overridden in derived tile classes.
- Parameters
x_input (torch.Tensor) – input tensor for the analog MVM of the tile.
dim (int) – input channel dimension, i.e. the x_size dimension
is_test (bool) – whether in eval mode
ctx (Any) – torch auto-grad context [Optional]
- Returns
Output tensor of the same shape
- Return type
torch.Tensor
- pre_update(x_input, x_dim, d_input, d_dim)[source]
Operations before the actual update step for pre processing.
By default, if the mapping scales are used, the d_input will be divided by the mapping scales to compensate for the conductance mapping.
Caution
The x_input and d_input here are the original inputs to the forward and backward methods, thus the pre_forward and pre_backward functions are not applied, and might need to be applied again here.
- Parameters
x_input (torch.Tensor) – The forward input tensor.
x_dim (int) – the dim of the x_size dimension of the forward input.
d_input (torch.Tensor) – The backward (gradient) input tensor.
d_dim (int) – the dim of the d_size dimension of the backward input.
- Returns
Tuple of the preprocessed x_input and d_input tensors of the same shape
- Return type
Tuple[torch.Tensor, torch.Tensor]
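The default behavior (dividing the backward input by the mapping scales) can be sketched as follows. This is a simplified illustration on nested lists, assuming one mapping scale per output; pre_update_sketch is a hypothetical name and the dimension arguments of the real method are omitted.

```python
from typing import List, Optional, Tuple

Batch = List[List[float]]


def pre_update_sketch(
    x_input: Batch,
    d_input: Batch,
    mapping_scales: Optional[List[float]] = None,
) -> Tuple[Batch, Batch]:
    """Hypothetical helper: compensate d_input for the conductance mapping.

    x_input is [N][in_size], d_input is [N][out_size]; mapping_scales is
    assumed to hold one scale per output (length out_size).
    """
    if mapping_scales is None:
        # No mapping scales in use: both inputs pass through unchanged.
        return x_input, d_input
    scaled_d = [[d / s for d, s in zip(row, mapping_scales)] for row in d_input]
    return x_input, scaled_d


x_batch = [[1.0, 2.0]]
d_batch = [[0.5, -1.0, 2.0]]
x_out, d_out = pre_update_sketch(x_batch, d_batch, mapping_scales=[0.5, 2.0, 4.0])
```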
- reset(reset_prob=1.0)[source]
Reset the updated device tile according to the reset parameters of the tile.
Resets the weights with device-to-device and cycle-to-cycle variability (depending on device type), typically:
\[W_{ij} = \xi*\sigma_\text{reset} + b^\text{reset}_{ij}\]
The reset parameters are set during tile init.
- Parameters
reset_prob (float) – individual probability of reset.
- Returns
None
- Return type
None
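The reset rule above can be sketched in pure Python. This is only an illustration under stated assumptions: \(\xi\) is taken to be standard normal noise, reset_prob is applied independently per element, and reset_sketch is a hypothetical name. The actual noise model depends on the device type.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def reset_sketch(
    weights: Matrix,
    reset_bias: Matrix,
    sigma_reset: float,
    reset_prob: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Hypothetical helper: W_ij = xi * sigma_reset + b_ij with probability reset_prob."""
    rng = rng if rng is not None else random.Random()
    result = []
    for w_row, b_row in zip(weights, reset_bias):
        new_row = []
        for w, b in zip(w_row, b_row):
            if rng.random() < reset_prob:
                xi = rng.gauss(0.0, 1.0)  # assumption: standard normal noise
                new_row.append(xi * sigma_reset + b)
            else:
                new_row.append(w)  # element keeps its current weight
        result.append(new_row)
    return result


weights = [[0.5, -0.3], [0.1, 0.7]]
reset_bias = [[0.0, 0.1], [-0.1, 0.0]]
after_reset = reset_sketch(weights, reset_bias, sigma_reset=0.01, rng=random.Random(0))
```

With sigma_reset set to 0 every reset element lands exactly on its reset bias, and with reset_prob set to 0 the weights are left untouched.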
- reset_columns(start_column_idx=0, num_columns=1, reset_prob=1.0)[source]
Reset (a number of) columns according to the reset parameters of the tile.
Resets the weights with device-to-device and cycle-to-cycle variability (depending on device type), typically:
\[W_{ij} = \xi*\sigma_\text{reset} + b^\text{reset}_{ij}\]
The reset parameters are set during tile init.
- Parameters
start_column_idx (int) – a start index of columns (0..x_size-1)
num_columns (int) – how many consecutive columns to reset (with circular wrapping)
reset_prob (float) – individual probability of reset.
- Returns
None
- Return type
None
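The wrap-around of the column range can be sketched with modular arithmetic. The helper name columns_to_reset is hypothetical, and the exact index handling inside the simulator may differ; the sketch only shows which columns a given start index and count would select if indices wrap at x_size.

```python
from typing import List


def columns_to_reset(start_column_idx: int, num_columns: int, x_size: int) -> List[int]:
    """Hypothetical helper: consecutive column indices, wrapping around at x_size."""
    return [(start_column_idx + k) % x_size for k in range(num_columns)]


# Starting at column 6 of an 8-column tile, four columns wrap back to 0 and 1.
indices = columns_to_reset(start_column_idx=6, num_columns=4, x_size=8)
```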
- reset_delta_weights()[source]
Reset the weight grad tensor to default update behavior (i.e. adding the update directly to the weight).
No-op if shared weights is not used.
- Return type
None
- set_delta_weights(delta_weights=None)[source]
Set the weight grad tensor and direct the update to it (instead of adding the update directly to the weight).
No-op if shared weights is not used.
- Parameters
delta_weights (Optional[torch.Tensor]) –
- Return type
None
- set_hidden_parameters(ordered_parameters)[source]
Set the hidden parameters of the tile.
Caution
Usually the hidden parameters are drawn according to the parameter definitions (those given in the RPU config). If the hidden parameters are set arbitrarily by the user, this correspondence might be broken. This might cause problems in the learning; in particular, the weight granularity (usually dw_min, depending on the device) is needed for the dynamic adjustment of the bit length (update_bl_management, see UpdateParameters).
Currently, if the discrepancy with the dw_min from the definition is too large, a new dw_min is estimated from the average of the hidden parameters.
- Parameters
ordered_parameters (collections.OrderedDict) – Ordered dictionary of hidden parameter tensors.
- Raises
TileError – In case the ordered dict keys do not conform with the current rpu config tile structure of the hidden parameters
- Return type
None
- set_hidden_update_index(index)[source]
Set the current updated hidden device index.
Usually this is ignored and fixed to 0 as only one device is present per cross-point. Other devices might not allow explicit setting as it would interfere with the implemented learning rule. However, some tiles have internally multiple devices per cross-point (e.g. unit cell) that can be chosen depending on the update policy.
- Parameters
index (int) – device index to be updated in the next mini-batch
- Return type
None
Note
Depending on the update and learning policy implemented in the tile, updated devices might switch internally as well.
- set_indexed(indices, image_sizes)[source]
Set the index matrix for convolutions and switches to indexed forward/backward/update versions.
- Parameters
indices (torch.Tensor) – torch.tensor with int indices
image_sizes (List) – [C_in, H_in, W_in, H_out, W_out] sizes
- Raises
ValueError – if image_sizes does not have valid dimensions.
TileError – if the tile uses transposition.
- Return type
None
- set_learned_out_scales(alpha)[source]
Helper function to set the out scaling alpha used to scale the weights in digital.
Note
Will be a no-op in case init_learned_out_scales() was not called.
Caution
Will not check the correct size of the given alpha.
- Parameters
alpha (Union[torch.Tensor, float]) – out scales as a parameter that is learned.
- Return type
None
- set_learning_rate(learning_rate)[source]
Set the tile learning rate.
Set the tile learning rate to -learning_rate. Note that the learning rate is always taken to be negative (because of the meaning in gradient descent) and positive learning rates are not supported.
- Parameters
learning_rate (float) – the desired learning rate.
- Returns
None.
- Return type
None
- set_mapping_scales(mapping_scales)[source]
Set the scales used for the weight mapping.
- Parameters
mapping_scales (Optional[Union[torch.Tensor, float]]) – Vector (or scalar) used for the mapping of weights into conductance units. This mapping is never in the SGD graph but might get initialized when weight_scaling_omega is used or remapping is enforced.
- Return type
None
- set_scales(scales)[source]
Set all scales with a new scale.
This will set the mapping scales to scales and set all other scales to 1.
- Parameters
scales (Union[torch.Tensor, float]) – scales to set.
- Return type
None
- set_weights(weights, biases=None, apply_weight_scaling=False, weight_scaling_omega=None)[source]
Set the tile weights (and biases).
Sets the internal tile weights to the specified values, and also the internal tile biases if the tile was set to use bias (via self.bias).
Note
This setting is not hardware realistic. Use set_weights_realistic() for a realistic weight transfer.
- Parameters
weights (torch.Tensor) – [out_size, in_size] weight matrix.
biases (Optional[torch.Tensor]) – [out_size] bias vector. This parameter is required if self.bias is True, and ignored otherwise.
apply_weight_scaling (bool) – Whether to rescale the given weight matrix and populate the digital output scaling factors as specified in the configuration MappingParameter. A new weight_scaling_omega can be given. Note that this will overwrite the existing digital out scaling factors.
weight_scaling_omega (Optional[float]) – The weight scaling omega factor (see MappingParameter). If given explicitly here, it will overwrite the value in the mapping field.
- Returns
None.
- Return type
None
- set_weights_realistic(weights, biases=None, apply_weight_scaling=False, weight_scaling_omega=None, n_loops=10)[source]
Set the tile weights (and biases) in a realistic manner by using the forward and update pass.
Sets the internal tile weights to the specified values, and also the internal tile biases if the tile was set to use bias (via self.bias).
- Parameters
weights (torch.Tensor) – [out_size, in_size] weight matrix.
biases (Optional[torch.Tensor]) – [out_size] bias vector. This parameter is required if self.bias is True, and ignored otherwise.
apply_weight_scaling (bool) – Whether to rescale the given weight matrix and populate the digital output scaling factors as specified in the configuration MappingParameter. A new weight_scaling_omega can be given. Note that this will overwrite the existing digital out scaling factors.
weight_scaling_omega (Optional[float]) – The weight scaling omega factor (see MappingParameter). If given explicitly here, it will overwrite the value in the mapping field.
n_loops (int) – number of times the columns of the weights are set in a closed-loop manner. A value of 1 means that all columns in principle receive enough pulses to change from w_min to w_max.
- Returns
None.
- Raises
ValueError – if the tile has bias but bias has not been specified.
- Return type
None
- update(x_input, d_input)[source]
Perform the update pass.
Calls the pre_update method to pre-process the inputs.
- Parameters
x_input (torch.Tensor) – [..., in_size] tensor. If in_trans is set, [in_size, ...].
d_input (torch.Tensor) – [..., out_size] tensor. If out_trans is set, [out_size, ...].
- Returns
None
- Return type
None
- update_indexed(x_input, d_input)[source]
Perform the update pass for convolutions.
Calls the pre_update method to pre-process the inputs.
- Parameters
x_input (torch.Tensor) – [N, in_size] tensor. If in_trans is set, transposed.
d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
- Returns
None
- Return type
None