aihwkit.simulator.tiles.base module

High level analog tiles (base).

class aihwkit.simulator.tiles.base.BaseTile(out_size, in_size, rpu_config, bias=True, in_trans=False, out_trans=False)[source]

Bases: Generic[aihwkit.simulator.tiles.base.RPUConfigGeneric]

Base class for tiles.

Parameters
  • out_size – output size

  • in_size – input size

  • rpu_config – resistive processing unit configuration.

  • bias – whether to add a bias column to the tile.

  • in_trans – whether to assume a transposed input (batch first)

  • out_trans – whether to assume a transposed output (batch first)
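
A minimal construction sketch (not part of the upstream docstring), assuming the concrete AnalogTile subclass from aihwkit.simulator.tiles and SingleRPUConfig from aihwkit.simulator.configs:

>>> import torch
>>> from aihwkit.simulator.tiles import AnalogTile
>>> from aihwkit.simulator.configs import SingleRPUConfig
>>> # 3 outputs, 4 inputs, default single-device RPU configuration
>>> tile = AnalogTile(out_size=3, in_size=4, rpu_config=SingleRPUConfig(), bias=True)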

apply_out_scaling(values, tensor_view=(-1,))[source]

Apply the out scaling to the given tensor.

Parameters
  • values (torch.Tensor) – tensor to apply the out scaling alphas to.

  • tensor_view (Tuple[int, ...]) – view to which the out scaling alphas are cast before multiplication.

Returns

output tensor with applied out scaling factors

Return type

torch.Tensor

backward(d_input)[source]

Perform the backward pass.

Parameters

d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.

Returns

[N, in_size] tensor. If in_trans is set, transposed.

Return type

torch.Tensor

backward_indexed(d_input)[source]

Perform the backward pass for convolutions.

Depending on the input tensor size, it performs the backward pass for a 2D image or a 3D one.

Parameters

d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.

Returns

[N, in_size] tensor. If in_trans is set, transposed.

Return type

torch.Tensor

Raises

TileError – if the indexed tile has not been initialized, or if self.image_sizes does not have a valid dimension.

cpu()[source]

Return a copy of this tile in CPU memory.

Return type

aihwkit.simulator.tiles.base.BaseTile

cuda(device=None)[source]

Return a copy of this tile in CUDA memory.

Parameters

device (Optional[Union[torch.device, str, int]]) –

Return type

aihwkit.simulator.tiles.base.BaseTile
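
Sketch (continuing the construction example above): cuda() returns a copy, so rebind the name if you want to keep working with the CUDA tile.

>>> if torch.cuda.is_available():
...     tile = tile.cuda()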

decay_weights(alpha=1.0)[source]

Decays the weights once according to the decay parameters of the tile.

Parameters

alpha (float) – additional decay scale (such as LR). The base decay rate is set during tile init.

Returns

None.

Return type

None

diffuse_weights()[source]

Diffuses the weights once according to the diffusion parameters of the tile.

The base diffusion rate is set during tile init.

Returns

None

Return type

None

drift_weights(delta_t=1.0)[source]

Drifts the weights once according to the drift parameters of the tile.

See also DriftParameter.

Parameters

delta_t (float) – Time since last drift call.

Returns

None.

Return type

None
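
Sketch of applying decay, diffusion and drift explicitly between mini-batches (in a full training loop the analog optimizer typically triggers these when the corresponding parameters are set in the rpu_config):

>>> tile.decay_weights(alpha=1.0)    # one decay step, extra scale 1.0
>>> tile.diffuse_weights()           # one diffusion step
>>> tile.drift_weights(delta_t=1.0)  # drift for one (arbitrary) time unit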

ensure_shared_weights(shared_weights=None)[source]

Ensure that the shared_weights is set properly.

Caution

This is only called from analog function.

No-op if shared weights is not used.

Parameters

shared_weights (Optional[torch.Tensor]) –

Return type

None

forward(x_input, is_test=False)[source]

Perform the forward pass.

Parameters
  • x_input (torch.Tensor) – [N, in_size] tensor. If in_trans is set, transposed.

  • is_test (bool) – whether to assume testing mode.

Returns

[N, out_size] tensor. If out_trans is set, transposed.

Return type

torch.Tensor
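
Usage sketch (continuing the construction example above): an analog matrix-vector multiplication on a small batch.

>>> x = torch.rand(5, 4)    # [N, in_size]
>>> y = tile.forward(x)     # [N, out_size], includes analog non-idealities
>>> y.shape
torch.Size([5, 3])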

forward_indexed(x_input, is_test=False)[source]

Perform the forward pass for convolutions.

Depending on the input tensor size, it performs the forward pass for a 2D image or a 3D one.

Parameters
  • x_input (torch.Tensor) – [N, in_size] tensor. If in_trans is set, transposed.

  • is_test (bool) – whether to assume testing mode.

Returns

[N, out_size] tensor. If out_trans is set, transposed.

Return type

torch.Tensor

Raises

TileError – if the indexed tile has not been initialized, or if self.image_sizes does not have a valid dimension.

get_analog_ctx()[source]

Return the analog context of the tile to be used in AnalogFunction.

Return type

aihwkit.optim.context.AnalogContext

get_brief_info()[source]

Return short info about the underlying C++ tile.

Return type

str

get_hidden_parameters()[source]

Get the hidden parameters of the tile.

Returns

Ordered dictionary of hidden parameter tensors.

Return type

collections.OrderedDict

get_hidden_update_index()[source]

Get the current updated device index of the hidden devices.

Usually this is 0 as only one device is present per cross-point for many tile RPU configs. However, some RPU configs maintain internally multiple devices per cross-point (e.g. VectorUnitCell).

Returns

The device index that will be updated in the next mini-batch.

Return type

int

Note

Depending on the update and learning policy implemented in the tile, updated devices might switch internally as well.

get_learning_rate()[source]

Return the tile learning rate.

Returns

the tile learning rate.

Return type

float

get_out_scaling_alpha()[source]

Get the out_scaling_alpha used to scale the weights.

Returns

out_scaling_alpha

Return type

torch.Tensor

get_weights(realistic=False)[source]

Get the tile weights (and biases).

Gets the tile weights and extracts the mathematical weight matrix and biases (if present, as determined by the self.bias parameter).

Note

By default this is not hardware realistic. Set realistic to True for a realistic transfer.

Parameters

realistic (bool) – Whether to use the forward pass to read out the tile weights iteratively, using get_weights_realistic().

Returns

a tuple where the first item is the [out_size, in_size] weight matrix; and the second item is either the [out_size] bias vector or None if the tile is set not to use bias.

Return type

Tuple[torch.Tensor, Optional[torch.Tensor]]
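
Read-out sketch (continuing the example above): ideal versus realistic read-out.

>>> weights, biases = tile.get_weights()                 # ideal, instantaneous read
>>> weights.shape
torch.Size([3, 4])
>>> noisy_weights, _ = tile.get_weights(realistic=True)  # iterative read via forward passes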

get_weights_scaled(realistic=False)[source]

Get the tile weights (and biases) and apply the current alpha scale to them.

Gets the tile weights and extracts the mathematical weight matrix and biases (if present, as determined by the self.bias parameter).

Note

By default this is not hardware realistic. Set realistic to True for a realistic transfer.

Parameters

realistic (bool) – Whether to use the forward pass to read out the tile weights iteratively, using get_weights_realistic().

Returns

a tuple where the first item is the [out_size, in_size] weight matrix; and the second item is either the [out_size] bias vector or None if the tile is set not to use bias. Both have the alpha scale applied.

Return type

Tuple[torch.Tensor, Optional[torch.Tensor]]

post_update_step()[source]

Operators that need to be called once per mini-batch.

Return type

None

reset_columns(start_column_idx=0, num_columns=1, reset_prob=1.0)[source]

Reset (a number of) columns according to the reset parameters of the tile.

Resets the weights with device-to-device and cycle-to-cycle variability (depending on device type), typically:

\[W_{ij} = \xi*\sigma_\text{reset} + b^\text{reset}_{ij}\]

The reset parameters are set during tile init.

Parameters
  • start_column_idx (int) – a start index of columns (0..x_size-1)

  • num_columns (int) – how many consecutive columns to reset (with circular wrapping)

  • reset_prob (float) – individual probability of reset.

Returns

None

Return type

None
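
Sketch: reset the first two columns, each element with 50% probability.

>>> tile.reset_columns(start_column_idx=0, num_columns=2, reset_prob=0.5)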

reset_delta_weights()[source]

Reset the weight grad tensor to default update behavior (i.e. adding the update directly to the weight).

No-op if shared weights is not used.

Return type

None

set_delta_weights(delta_weights=None)[source]

Set the weight grad tensor so that subsequent updates are applied to it rather than directly to the weights.

No-op if shared weights is not used.

Parameters

delta_weights (Optional[torch.Tensor]) –

Return type

None

set_hidden_parameters(ordered_parameters)[source]

Set the hidden parameters of the tile.

Caution

Usually the hidden parameters are drawn according to the parameter definitions (those given in the RPU config). If the hidden parameters are arbitrarily set by the user, then this correspondence might be broken. This might cause problems during learning; in particular, the weight granularity (usually dw_min, depending on the device) is needed for the dynamic adjustment of the bit length (update_bl_management, see UpdateParameters).

Currently, the new dw_min parameter is estimated from the average of the hidden parameters if its discrepancy with the dw_min from the definition is too large.

Parameters

ordered_parameters (collections.OrderedDict) – Ordered dictionary of hidden parameter tensors.

Raises

TileError – if the ordered dictionary keys do not conform with the hidden parameter structure of the current rpu config tile.

Return type

None
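
Round-trip sketch: inspect the hidden parameters and write them back (the available parameter names depend on the device model; they are only inspected here, not assumed):

>>> params = tile.get_hidden_parameters()
>>> for name, tensor in params.items():
...     print(name, tuple(tensor.shape))
>>> tile.set_hidden_parameters(params)   # write back the (possibly modified) tensors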

set_hidden_update_index(index)[source]

Set the current updated hidden device index.

Usually this is ignored and fixed to 0, as only one device is present per cross-point. Other devices might not allow explicit setting, as it would interfere with the implemented learning rule. However, some tiles have internally multiple devices per cross-point (e.g. unit cell) that can be chosen depending on the update policy.

Parameters

index (int) – device index to be updated in the next mini-batch

Return type

None

Note

Depending on the update and learning policy implemented in the tile, updated devices might switch internally as well.

set_indexed(indices, image_sizes)[source]

Set the index matrix for convolutions and switch to the indexed forward/backward/update versions.

Parameters
  • indices (torch.Tensor) – torch.tensor with int indices

  • image_sizes (List) – [C_in, H_in, W_in, H_out, W_out] sizes

Raises
  • ValueError – if image_sizes does not have valid dimensions.

  • TileError – if the tile uses transposition.

Return type

None

set_learning_rate(learning_rate)[source]

Set the tile learning rate.

Set the tile learning rate to -learning_rate. Note that the learning rate is always taken to be negative (because of the meaning in gradient descent) and positive learning rates are not supported.

Parameters

learning_rate (float) – the desired learning rate.

Returns

None.

Return type

None

set_out_scaling_alpha(alpha)[source]

Helper function to set the out scaling alpha used to scale the weights in digital.

Parameters

alpha (Union[torch.Tensor, float]) – out scaling alpha as a tensor or float value (depending on the property set in the MappingParameter configuration).

Return type

None

Caution

Will not check the correct size of the given alpha.
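
Sketch: setting and reading back the digital output scale.

>>> tile.set_out_scaling_alpha(0.5)
>>> alpha = tile.get_out_scaling_alpha()   # tensor holding the out scaling alpha(s)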

set_weights(weights, biases=None, realistic=False, n_loops=10)[source]

Set the tile weights (and biases).

Sets the internal tile weights to the specified values, and also the internal tile biases if the tile was set to use bias (via self.bias).

Note

By default this is not hardware realistic. You can set the realistic parameter to True for a realistic transfer.

Parameters
  • weights (torch.Tensor) – [out_size, in_size] weight matrix.

  • biases (Optional[torch.Tensor]) – [out_size] bias vector. This parameter is required if self.bias is True, and ignored otherwise.

  • realistic (bool) – whether to use the forward and update pass to program the weights iteratively, using set_weights_realistic().

  • n_loops (int) – number of times the columns of the weights are set in a closed-loop manner. A value of 1 means that all columns in principle receive enough pulses to change from w_min to w_max.

Returns

None.

Raises

ValueError – if the tile has bias but bias has not been specified.

Return type

None
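
Programming sketch (continuing the example above): writing a target weight matrix and bias, ideally and with closed-loop programming.

>>> target_w = torch.randn(3, 4)
>>> target_b = torch.zeros(3)
>>> tile.set_weights(target_w, target_b)                              # ideal write
>>> tile.set_weights(target_w, target_b, realistic=True, n_loops=10)  # iterative programming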

set_weights_scaled(weights, biases=None, realistic=False, n_loops=10, weight_scaling_omega=None)[source]

Set the tile weights (and biases) in a scaled fashion.

Similar to set_weights(), however, additionally scales the weights by a global scale \(\alpha\), that is then applied in digital at the output of forward and backward pass, and the learning rate for this tile is adjusted accordingly.

The weights are scaled by \(\omega/\max_{ij} |w_{ij}|\) and the global digital factor \(\alpha\) is set to \(\max_{ij} |w_{ij}|/\omega\).

It can be shown that such a constant factor greatly improves the SNR and training accuracy, as the full weight range of the analog devices is used. See also Rasch, Gokmen & Haensch (2019) for more details.

Caution

Using get_weights will now retrieve the analog weights without the global factor applied. To recover the effective weights, call get_weights and scale the result by the \(\alpha\) of this layer, which can be retrieved by get_alpha_scale().

Parameters
  • weights (torch.Tensor) – [out_size, in_size] weight matrix.

  • biases (Optional[torch.Tensor]) – [out_size] bias vector. This parameter is required if self.bias is True, and ignored otherwise.

  • realistic (bool) – whether to use the forward and update pass to program the weights iteratively, using set_weights_realistic().

  • n_loops (int) – number of times the columns of the weights are set in a closed-loop manner. A value of 1 means that all columns in principle receive enough pulses to change from w_min to w_max.

  • weight_scaling_omega (Optional[float]) – where the weight maximum should be mapped in terms of the weight range. Note that for omega larger than the maximal weight of the device, weights will get clipped for most devices. If this parameter is not given, it defaults to the weight_scaling_omega value set in the MappingParameter of the rpu_config.

Returns

None.

Raises

ValueError – if the tile has bias but bias has not been specified.

Return type

None
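
Sketch of the scaling relation described above: with weight_scaling_omega \(\omega\), the stored analog weights become \(\omega W/\max_{ij}|w_{ij}|\) and the digital factor \(\alpha = \max_{ij}|w_{ij}|/\omega\) is applied at the output.

>>> tile.set_weights_scaled(target_w, target_b, weight_scaling_omega=1.0)
>>> alpha = target_w.abs().max() / 1.0   # the digital out scale the tile now applies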

update(x_input, d_input)[source]

Perform the update pass.

Parameters
  • x_input (torch.Tensor) – [N, in_size] tensor. If in_trans is set, transposed.

  • d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.

Returns

None

Return type

None
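
Sketch of one manual analog SGD step using backward and update together (d_output is a hypothetical error/gradient tensor; in practice the autograd AnalogFunction drives these calls):

>>> d_output = torch.rand(5, 3)            # [N, out_size]
>>> d_input = tile.backward(d_output)      # [N, in_size], propagated error
>>> tile.set_learning_rate(0.1)
>>> tile.update(x, d_output)               # in-memory (pulsed) rank-update of the weights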

update_indexed(x_input, d_input)[source]

Perform the update pass for convolutions.

Parameters
  • x_input (torch.Tensor) – [N, in_size] tensor. If in_trans is set, transposed.

  • d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.

Returns

None

Return type

None

aihwkit.simulator.tiles.base.as_tensor(data, dtype=None, device=None) → Tensor

Converts data into a tensor, sharing data and preserving autograd history if possible.

If data is already a tensor with the requested dtype and device then data itself is returned, but if data is a tensor with a different dtype or device then it’s copied as if using data.to(dtype=dtype, device=device).

If data is a NumPy array (an ndarray) with the same dtype and device then a tensor is constructed using torch.from_numpy().

See also

torch.tensor() never shares its data and creates a new “leaf tensor” (see the autograd notes).

Parameters
  • data (array_like) – Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.

  • dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, infers data type from data.

  • device (torch.device, optional) – the device of the constructed tensor. If None and data is a tensor then the device of data is used. If None and data is not a tensor then the result tensor is constructed on the CPU.

Example:

>>> a = numpy.array([1, 2, 3])
>>> t = torch.as_tensor(a)
>>> t
tensor([ 1,  2,  3])
>>> t[0] = -1
>>> a
array([-1,  2,  3])

>>> a = numpy.array([1, 2, 3])
>>> t = torch.as_tensor(a, device=torch.device('cuda'))
>>> t
tensor([ 1,  2,  3], device='cuda:0')
>>> t[0] = -1
>>> a
array([1,  2,  3])

aihwkit.simulator.tiles.base.cat(tensors, dim=0, *, out=None) → Tensor

Concatenates the given sequence of seq tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty.

torch.cat() can be seen as an inverse operation for torch.split() and torch.chunk().

torch.cat() can be best understood via examples.

Parameters
  • tensors (sequence of Tensors) – any python sequence of tensors of the same type. Non-empty tensors provided must have the same shape, except in the cat dimension.

  • dim (int, optional) – the dimension over which the tensors are concatenated

Keyword Arguments

out (Tensor, optional) – the output tensor.

Example:

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
         -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
         -0.5790,  0.1497]])

aihwkit.simulator.tiles.base.ones_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor

Returns a tensor filled with the scalar value 1, with the same size as input. torch.ones_like(input) is equivalent to torch.ones(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).

Warning

As of 0.4, this function does not support an out keyword. As an alternative, the old torch.ones_like(input, out=output) is equivalent to torch.ones(input.size(), out=output).

Parameters

input (Tensor) – the size of input will determine size of the output tensor.

Keyword Arguments
  • dtype (torch.dtype, optional) – the desired data type of returned Tensor. Default: if None, defaults to the dtype of input.

  • layout (torch.layout, optional) – the desired layout of returned tensor. Default: if None, defaults to the layout of input.

  • device (torch.device, optional) – the desired device of returned tensor. Default: if None, defaults to the device of input.

  • requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

  • memory_format (torch.memory_format, optional) – the desired memory format of returned Tensor. Default: torch.preserve_format.

Example:

>>> input = torch.empty(2, 3)
>>> torch.ones_like(input)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

aihwkit.simulator.tiles.base.squeeze(input, dim=None, *, out=None) → Tensor

Returns a tensor with all the dimensions of input of size 1 removed.

For example, if input is of shape: \((A \times 1 \times B \times C \times 1 \times D)\) then the out tensor will be of shape: \((A \times B \times C \times D)\).

When dim is given, a squeeze operation is done only in the given dimension. If input is of shape: \((A \times 1 \times B)\), squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) will squeeze the tensor to the shape \((A \times B)\).

Note

The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other.

Warning

If the tensor has a batch dimension of size 1, then squeeze(input) will also remove the batch dimension, which can lead to unexpected errors.

Parameters
  • input (Tensor) – the input tensor.

  • dim (int, optional) – if given, the input will be squeezed only in this dimension

Keyword Arguments

out (Tensor, optional) – the output tensor.

Example:

>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])

aihwkit.simulator.tiles.base.unsqueeze(input, dim) → Tensor

Returns a new tensor with a dimension of size one inserted at the specified position.

The returned tensor shares the same underlying data with this tensor.

A dim value within the range [-input.dim() - 1, input.dim() + 1) can be used. Negative dim will correspond to unsqueeze() applied at dim = dim + input.dim() + 1.

Parameters
  • input (Tensor) – the input tensor.

  • dim (int) – the index at which to insert the singleton dimension

Example:

>>> x = torch.tensor([1, 2, 3, 4])
>>> torch.unsqueeze(x, 0)
tensor([[ 1,  2,  3,  4]])
>>> torch.unsqueeze(x, 1)
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])

aihwkit.simulator.tiles.base.zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument size.

Parameters

size (int...) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.

Keyword Arguments
  • out (Tensor, optional) – the output tensor.

  • dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

  • layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.

  • device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

  • requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.zeros(2, 3)
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])

>>> torch.zeros(5)
tensor([ 0.,  0.,  0.,  0.,  0.])