aihwkit.simulator.tiles.base module
High level analog tiles (base).
-
class aihwkit.simulator.tiles.base.BaseTile(*args, **kwds)
Bases: typing.Generic
Base class for tiles.
- Parameters
out_size – output size.
in_size – input size.
rpu_config – resistive processing unit configuration.
bias – whether to add a bias column to the tile.
in_trans – whether to assume a transposed input (batch first).
out_trans – whether to assume a transposed output (batch first).
-
backward(d_input)
Perform the backward pass.
- Parameters
d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
- Returns
[N, in_size] tensor. If in_trans is set, transposed.
- Return type
torch.Tensor
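For reference, the ideal (noise-free) backward pass of a tile without transposition multiplies the `[N, out_size]` gradient by the `[out_size, in_size]` weight matrix, which yields the `[N, in_size]` result described above. The plain-Python sketch below illustrates only that shape contract. `ideal_backward` is a hypothetical helper, not an aihwkit function, and the analog tile adds noise, bounds and other nonidealities that are not modelled here.

```python
from typing import List

Matrix = List[List[float]]


def ideal_backward(weights: Matrix, d_input: Matrix) -> Matrix:
    """Hypothetical noise-free backward pass: d_x[n][i] = sum_o d_y[n][o] * W[o][i].

    weights is [out_size, in_size], d_input is [N, out_size];
    the result is [N, in_size] (transposition flags are not modelled).
    """
    out_size = len(weights)
    in_size = len(weights[0])
    result = []
    for row in d_input:
        if len(row) != out_size:
            raise ValueError("each d_input row must have out_size entries")
        # Input gradient i collects the output gradients weighted by column i of W.
        result.append(
            [sum(row[o] * weights[o][i] for o in range(out_size)) for i in range(in_size)]
        )
    return result


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # out_size = 2, in_size = 3
print(ideal_backward(W, [[1.0, 1.0]]))  # [[5.0, 7.0, 9.0]]
```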
-
backward_indexed(d_input)
Perform the backward pass for convolutions.
Depending on the input tensor size, it performs the backward pass for a 2D image or a 3D one.
- Parameters
d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
- Returns
[N, in_size] tensor. If in_trans is set, transposed.
- Return type
torch.Tensor
- Raises
TileError – if the indexed tile has not been initialized, or if self.images_sizes does not have a valid dimension.
-
cuda(device=None)
Return a copy of this tile in CUDA memory.
- Parameters
device (Optional[Union[torch.device, str, int]]) –
- Return type
-
decay_weights(alpha=1.0)
Decays the weights once according to the decay parameters of the tile.
- Parameters
alpha – additional decay scale (such as LR). The base decay rate is set during tile init.
- Returns
None.
- Return type
None
-
diffuse_weights()
Diffuses the weights once according to the diffusion parameters of the tile.
The base diffusion rate is set during tile init.
- Returns
None
- Return type
None
-
drift_weights(delta_t=1.0)
Drifts the weights once according to the drift parameters of the tile.
See also DriftParameter.
- Parameters
delta_t – Time since last drift call.
- Returns
None.
- Return type
None
Ensure that shared_weights is set properly.
Caution
This is only called from the analog function.
No-op if shared weights are not used.
- Return type
None
-
forward(x_input, is_test=False)
Perform the forward pass.
- Parameters
x_input – [N, in_size] tensor. If in_trans is set, transposed.
is_test – whether to assume testing mode.
- Returns
[N, out_size] tensor. If out_trans is set, transposed.
- Return type
torch.Tensor
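As a reference point, an ideal (noise-free) forward pass computes each output as the dot product of an input row with a row of the `[out_size, in_size]` weight matrix, plus the bias when present. The sketch below assumes no transposition flags and no analog nonidealities; `ideal_forward` is a hypothetical helper, not part of aihwkit.

```python
from typing import List, Optional

Matrix = List[List[float]]


def ideal_forward(weights: Matrix, x_input: Matrix,
                  biases: Optional[List[float]] = None) -> Matrix:
    """Hypothetical noise-free forward pass: y[n][o] = sum_i W[o][i] * x[n][i] (+ b[o]).

    weights is [out_size, in_size], x_input is [N, in_size];
    the result is [N, out_size] (transposition flags are not modelled).
    """
    in_size = len(weights[0])
    result = []
    for row in x_input:
        if len(row) != in_size:
            raise ValueError("each x_input row must have in_size entries")
        # Output o is the dot product of the input row with row o of W.
        y = [sum(w * x for w, x in zip(w_row, row)) for w_row in weights]
        if biases is not None:
            # Mathematically the same as a bias column driven by a constant input of 1.
            y = [y_o + b_o for y_o, b_o in zip(y, biases)]
        result.append(y)
    return result


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # out_size = 2, in_size = 3
print(ideal_forward(W, [[1.0, 1.0, 1.0]]))               # [[6.0, 15.0]]
print(ideal_forward(W, [[1.0, 1.0, 1.0]], [0.5, -0.5]))  # [[6.5, 14.5]]
```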
-
forward_indexed(x_input, is_test=False)
Perform the forward pass for convolutions.
Depending on the input tensor size, it performs the forward pass for a 2D image or a 3D one.
- Parameters
x_input – [N, in_size] tensor. If in_trans is set, transposed.
is_test – whether to assume testing mode.
- Returns
[N, out_size] tensor. If out_trans is set, transposed.
- Return type
torch.Tensor
- Raises
TileError – if the indexed tile has not been initialized, or if self.images_sizes does not have a valid dimension.
-
get_analog_ctx()
Return the analog context of the tile to be used in AnalogFunction.
- Return type
Get the hidden parameters of the tile.
- Returns
Ordered dictionary of hidden parameter tensors.
- Return type
collections.OrderedDict
Get the current updated device index of the hidden devices.
Usually this is 0, as only one device is present per cross-point for many tile RPU configs. However, some RPU configs internally maintain multiple devices per cross-point (e.g. VectorUnitCell).
- Returns
The next mini-batch updated device index.
- Return type
int
Note
Depending on the update and learning policy implemented in the tile, updated devices might switch internally as well.
-
get_learning_rate()
Return the tile learning rate.
- Returns
the tile learning rate.
- Return type
float
-
get_out_scaling_alpha()
Get the out_scaling_alpha used to scale the weights.
- Returns
out_scaling_alpha
- Return type
tensor
-
get_weights(realistic=False)
Get the tile weights (and biases).
Gets the tile weights and extracts the mathematical weight matrix and biases (if present, as determined by the self.bias parameter).
Note
By default this is not hardware realistic. Set realistic to True for a realistic transfer.
- Parameters
realistic (bool) – Whether to use the forward pass to read out the tile weights iteratively, using get_weights_realistic().
- Returns
a tuple where the first item is the [out_size, in_size] weight matrix; and the second item is either the [out_size] bias vector or None if the tile is set not to use bias.
- Return type
Tuple[torch.Tensor, Optional[torch.Tensor]]
-
get_weights_scaled(realistic=False, weight_scaling_omega_columnwise=False)
Get the tile weights (and biases) and apply the current alpha scale to them.
Gets the tile weights and extracts the mathematical weight matrix and biases (if present, as determined by the self.bias parameter).
Note
By default this is not hardware realistic. Set realistic to True for a realistic transfer.
- Parameters
realistic (bool) – Whether to use the forward pass to read out the tile weights iteratively, using get_weights_realistic().
weight_scaling_omega_columnwise (bool) – whether the weight matrix will be remapped column-wise over the maximum device value allowed.
- Returns
a tuple where the first item is the [out_size, in_size] weight matrix; and the second item is either the [out_size] bias vector or None if the tile is set not to use bias. Both have the alpha scale applied.
- Return type
tuple
-
reset_columns(start_column_idx=0, num_columns=1, reset_prob=1.0)
Reset (a number of) columns according to the reset parameters of the tile.
Resets the weights with device-to-device and cycle-to-cycle variability (depending on device type), typically:
\[W_{ij} = \xi\,\sigma_\text{reset} + b^\text{reset}_{ij}\]
The reset parameters are set during tile init.
- Parameters
start_column_idx – a start index of columns (0..x_size-1)
num_columns – how many consecutive columns to reset (with circular wrapping)
reset_prob – individual probability of reset.
- Returns
None
- Return type
None
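The reset rule and the circular column selection can be illustrated with a small pure-Python sketch. It is a hypothetical stand-in, not aihwkit's implementation: `reset_columns_sketch`, `sigma_reset` and `reset_bias` are illustrative names, xi is assumed to be a standard Gaussian draw, "circular wrapping" is read as taking column indices modulo the number of columns, and `reset_prob` is assumed to apply independently to each element.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def reset_columns_sketch(weights: Matrix, start_column_idx: int = 0,
                         num_columns: int = 1, reset_prob: float = 1.0,
                         sigma_reset: float = 0.0,
                         reset_bias: Optional[Matrix] = None,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Hypothetical in-place reset of consecutive columns of an [out_size, in_size] matrix.

    Each visited element is reset with probability reset_prob to
    xi * sigma_reset + b_ij, with xi ~ N(0, 1) (an assumption of this sketch).
    Returns the visited column indices.
    """
    rng = rng if rng is not None else random.Random()
    n_rows, n_cols = len(weights), len(weights[0])
    # Consecutive columns starting at start_column_idx, wrapping around circularly.
    columns = [(start_column_idx + k) % n_cols for k in range(num_columns)]
    for j in columns:
        for i in range(n_rows):
            if rng.random() < reset_prob:  # per-element reset probability
                b_ij = 0.0 if reset_bias is None else reset_bias[i][j]
                weights[i][j] = rng.gauss(0.0, 1.0) * sigma_reset + b_ij
    return columns


W = [[1.0, 1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0, 1.0]]
print(reset_columns_sketch(W, start_column_idx=3, num_columns=2))  # [3, 0]
print(W)  # [[0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
```

With `sigma_reset=0.0` and no `reset_bias`, the reset is deterministic, which makes the wrap-around from the last column back to column 0 easy to see.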
-
reset_delta_weights()
Reset the weight grad tensor to the default update behavior (i.e. adding the update directly to the weight).
No-op if shared weights are not used.
- Return type
None
-
set_delta_weights(delta_weights=None)
Set the weight grad tensor and direct the update to it, instead of adding the update directly to the weight.
No-op if shared weights are not used.
- Return type
None
Set the hidden parameters of the tile.
Caution
Usually the hidden parameters are drawn according to the parameter definitions (those given in the RPU config). If the hidden parameters are arbitrarily set by the user, then this correspondence might be broken. This might cause problems in the learning; in particular, the weight granularity (usually dw_min, depending on the device) is needed for the dynamic adjustment of the bit length (update_bl_management, see UpdateParameters).
Currently, a new dw_min is estimated from the average of the hidden parameters if its discrepancy with the dw_min from the definition is too large.
- Parameters
ordered_parameters (collections.OrderedDict) – Ordered dictionary of hidden parameter tensors.
- Raises
TileError – in case the ordered dict keys do not conform with the current RPU config tile structure of the hidden parameters.
- Return type
None
Set the current updated hidden device index.
Usually this is ignored and fixed to 0, as only one device is present per cross-point. Other devices might not allow explicit setting, as it would interfere with the implemented learning rule. However, some tiles internally have multiple devices per cross-point (e.g. unit cell) that can be chosen depending on the update policy.
- Parameters
index (int) – device index to be updated in the next mini-batch
- Return type
None
Note
Depending on the update and learning policy implemented in the tile, updated devices might switch internally as well.
-
set_indexed(indices, image_sizes)
Set the index matrix for convolutions and switch to the indexed forward/backward/update versions.
- Parameters
indices (torch.Tensor) – torch.tensor with int indices
image_sizes (List) – [C_in, H_in, W_in, H_out, W_out] sizes
- Raises
ValueError – if image_sizes does not have valid dimensions.
TileError – if the tile uses transposition.
- Return type
None
-
set_learning_rate(learning_rate)
Set the tile learning rate.
Set the tile learning rate to -learning_rate. Note that the learning rate is always taken to be negative (because of the meaning in gradient descent) and positive learning rates are not supported.
- Parameters
learning_rate (float) – the desired learning rate.
- Returns
None.
- Return type
None
-
set_weights(weights, biases=None, realistic=False, n_loops=10)
Set the tile weights (and biases).
Sets the internal tile weights to the specified values, and also the internal tile biases if the tile was set to use bias (via self.bias).
Note
By default this is not hardware realistic. You can set the realistic parameter to True for a realistic transfer.
- Parameters
weights (torch.Tensor) – [out_size, in_size] weight matrix.
biases (Optional[torch.Tensor]) – [out_size] bias vector. This parameter is required if self.bias is True, and ignored otherwise.
realistic (bool) – whether to use the forward and update pass to program the weights iteratively, using set_weights_realistic().
n_loops (int) – number of times the columns of the weights are set in a closed-loop manner. A value of 1 means that all columns in principle receive enough pulses to change from w_min to w_max.
- Returns
None.
- Raises
ValueError – if the tile has bias but no biases are specified.
- Return type
None
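To make the bias contract concrete, here is a minimal sketch assuming a purely digital tile. `ToyTile` is a hypothetical class written for illustration, not part of aihwkit; it models neither `realistic` nor `n_loops`, only the shape checks and the rule that biases are required when the tile uses bias and ignored otherwise.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


class ToyTile:
    """Hypothetical digital stand-in mimicking the documented
    set_weights / get_weights contract (no analog effects, no realistic mode)."""

    def __init__(self, out_size: int, in_size: int, bias: bool = False) -> None:
        self.out_size = out_size
        self.in_size = in_size
        self.bias = bias
        self._weights: Matrix = [[0.0] * in_size for _ in range(out_size)]
        self._biases: Optional[List[float]] = [0.0] * out_size if bias else None

    def set_weights(self, weights: Matrix, biases: Optional[List[float]] = None) -> None:
        if len(weights) != self.out_size or any(len(r) != self.in_size for r in weights):
            raise ValueError("weights must have shape [out_size, in_size]")
        if self.bias:
            # biases is required when the tile uses bias ...
            if biases is None or len(biases) != self.out_size:
                raise ValueError("tile has bias but no [out_size] biases were given")
            self._biases = list(biases)
        # ... and ignored otherwise.
        self._weights = [list(r) for r in weights]

    def get_weights(self) -> Tuple[Matrix, Optional[List[float]]]:
        # Return copies so callers cannot mutate the internal state.
        biases = None if self._biases is None else list(self._biases)
        return [list(r) for r in self._weights], biases


tile = ToyTile(out_size=2, in_size=3, bias=True)
tile.set_weights([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], biases=[0.1, 0.2])
print(tile.get_weights())  # ([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [0.1, 0.2])
```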
-
set_weights_scaled(weights, biases=None, realistic=False, n_loops=10, omega=1.0, weight_scaling_omega_columnwise=False, learn_out_scaling_alpha=False)
Set the tile weights (and biases) in a scaled fashion.
Similar to set_weights(), but additionally scales the weights by a global scale \(\alpha\) that is then applied digitally at the output of the forward and backward passes, and the learning rate for this tile is adjusted accordingly. The weights are scaled by \(\omega/\max_{ij} |w_{ij}|\) and the global digital factor \(\alpha\) is set to \(\max_{ij} |w_{ij}|/\omega\).
It can be shown that such a constant factor greatly improves the SNR and training accuracy, as the full weight range of the analog devices is used. See also Rasch, Gokmen & Haensch (2019) for more details.
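The scaling arithmetic can be checked with a short pure-Python sketch, assuming a single global alpha; the column-wise variant and the learning-rate adjustment are not modelled. `scale_weights` is a hypothetical helper, not an aihwkit function.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def scale_weights(weights: Matrix, omega: float = 1.0) -> Tuple[Matrix, float]:
    """Hypothetical helper: map max |w_ij| onto omega and return (scaled, alpha).

    scaled = w * omega / max|w| and alpha = max|w| / omega, so alpha * scaled == w.
    """
    w_max = max(abs(w) for row in weights for w in row)
    if w_max == 0.0:
        # Design choice of this sketch: an all-zero matrix is left unscaled.
        return [list(row) for row in weights], 1.0
    alpha = w_max / omega
    return [[w * omega / w_max for w in row] for row in weights], alpha


W = [[0.2, -0.5],
     [0.1, 0.25]]
scaled, alpha = scale_weights(W, omega=1.0)
print(scaled, alpha)  # [[0.4, -1.0], [0.2, 0.5]] 0.5

# Applying alpha digitally to the scaled weights recovers the original ones.
assert all(math.isclose(alpha * s, w)
           for s_row, w_row in zip(scaled, W) for s, w in zip(s_row, w_row))
```

The largest-magnitude weight lands exactly on `omega`, so the device range is fully used, while `alpha` carries the lost magnitude into the digital domain.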
Caution
Using get_weights will now retrieve the raw analog weights without applying the global factor. To get the true weights, use get_weights and scale them by the \(\alpha\) of this layer, which can be retrieved by get_alpha_scale().
- Parameters
weights (torch.Tensor) – [out_size, in_size] weight matrix.
biases (Optional[torch.Tensor]) – [out_size] bias vector. This parameter is required if self.bias is True, and ignored otherwise.
realistic (bool) – whether to use the forward and update pass to program the weights iteratively, using set_weights_realistic().
n_loops (int) – number of times the columns of the weights are set in a closed-loop manner. A value of 1 means that all columns in principle receive enough pulses to change from w_min to w_max.
omega (float) – where the weight max should be mapped in terms of the weight range. Note that for omega larger than the maximal weight of the device, weights will get clipped for most devices.
weight_scaling_omega_columnwise (bool) – whether the weight matrix will be remapped column-wise over the maximum device value allowed.
learn_out_scaling_alpha (bool) – whether the alpha scaling is learnable.
- Returns
None.
- Raises
ValueError – if the tile has bias but no biases are specified.
- Return type
None
-
aihwkit.simulator.tiles.base.as_tensor(data, dtype=None, device=None) → Tensor
Convert the data into a torch.Tensor. If the data is already a Tensor with the same dtype and device, no copy will be performed; otherwise a new Tensor will be returned with the computational graph retained if the data Tensor has requires_grad=True. Similarly, if the data is an ndarray of the corresponding dtype and the device is the CPU, no copy will be performed.
- Parameters
data (array_like) – Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.
dtype (torch.dtype, optional) – the desired data type of the returned tensor. Default: if None, infers data type from data.
device (torch.device, optional) – the desired device of the returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
Example:
>>> a = numpy.array([1, 2, 3])
>>> t = torch.as_tensor(a)
>>> t
tensor([ 1, 2, 3])
>>> t[0] = -1
>>> a
array([-1, 2, 3])
>>> a = numpy.array([1, 2, 3])
>>> t = torch.as_tensor(a, device=torch.device('cuda'))
>>> t
tensor([ 1, 2, 3])
>>> t[0] = -1
>>> a
array([1, 2, 3])
-
aihwkit.simulator.tiles.base.cat(tensors, dim=0, *, out=None) → Tensor
Concatenates the given sequence of seq tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty.
torch.cat() can be seen as an inverse operation for torch.split() and torch.chunk().
torch.cat() can be best understood via examples.
- Parameters
tensors (sequence of Tensors) – any python sequence of tensors of the same type. Non-empty tensors provided must have the same shape, except in the cat dimension.
dim (int, optional) – the dimension over which the tensors are concatenated
- Keyword Arguments
out (Tensor, optional) – the output tensor.
Example:
>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497]])
>>> torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497]])
>>> torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497]])
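As a rough mental model of the shape rule (not PyTorch's implementation), the hypothetical helper `cat_shape` below computes the output shape of a concatenation from the input shapes alone. It ignores dtypes and the empty-tensor exception.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def cat_shape(shapes: Sequence[Shape], dim: int = 0) -> Shape:
    """Hypothetical helper: the result shape of concatenating tensors along dim.

    Every shape must match the first one except along dim; sizes along dim add up.
    """
    if not shapes:
        raise ValueError("expected a non-empty sequence of shapes")
    ref = tuple(shapes[0])
    ndim = len(ref)
    if dim < 0:
        dim += ndim  # negative dims count from the end, as in PyTorch
    if not 0 <= dim < ndim:
        raise IndexError(f"dim out of range for {ndim}-d shapes")
    for shape in shapes[1:]:
        shape = tuple(shape)
        mismatch = any(a != b for k, (a, b) in enumerate(zip(shape, ref)) if k != dim)
        if len(shape) != ndim or mismatch:
            raise ValueError(f"shape {shape} does not match {ref} outside dim {dim}")
    return ref[:dim] + (sum(s[dim] for s in shapes),) + ref[dim + 1:]


print(cat_shape([(2, 3)] * 3, 0))  # (6, 3), like torch.cat((x, x, x), 0) above
print(cat_shape([(2, 3)] * 3, 1))  # (2, 9), like torch.cat((x, x, x), 1) above
```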
-
aihwkit.simulator.tiles.base.squeeze(input, dim=None, *, out=None) → Tensor
Returns a tensor with all the dimensions of input of size 1 removed.
For example, if input is of shape \((A \times 1 \times B \times C \times 1 \times D)\), then the out tensor will be of shape \((A \times B \times C \times D)\).
When dim is given, a squeeze operation is done only in the given dimension. If input is of shape \((A \times 1 \times B)\), squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) will squeeze the tensor to the shape \((A \times B)\).
Note
The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other.
Warning
If the tensor has a batch dimension of size 1, then squeeze(input) will also remove the batch dimension, which can lead to unexpected errors.
- Parameters
input (Tensor) – the input tensor.
dim (int, optional) – if given, the input will be squeezed only in this dimension.
- Keyword Arguments
out (Tensor, optional) – the output tensor.
Example:
>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])
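The shape behaviour of squeeze, including the batch-dimension pitfall from the warning, can be reproduced with a small shape-only sketch. `squeeze_shape` is a hypothetical helper that handles a single integer `dim` and does not touch tensor data.

```python
from typing import Optional, Sequence, Tuple


def squeeze_shape(shape: Sequence[int], dim: Optional[int] = None) -> Tuple[int, ...]:
    """Hypothetical helper: the shape torch.squeeze would return (single int dim only)."""
    shape = tuple(shape)
    if dim is None:
        # Drop every size-1 dimension, including a batch dimension of size 1.
        return tuple(s for s in shape if s != 1)
    ndim = len(shape)
    if dim < 0:
        dim += ndim
    if not 0 <= dim < ndim:
        raise IndexError(f"dim out of range for a {ndim}-d shape")
    if shape[dim] != 1:
        return shape  # only a size-1 dimension is removed; otherwise unchanged
    return shape[:dim] + shape[dim + 1:]


print(squeeze_shape((2, 1, 2, 1, 2)))     # (2, 2, 2)
print(squeeze_shape((2, 1, 2, 1, 2), 0))  # (2, 1, 2, 1, 2)
print(squeeze_shape((2, 1, 2, 1, 2), 1))  # (2, 2, 1, 2)
print(squeeze_shape((1, 3)))              # (3,): the batch dimension is gone
```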
-
aihwkit.simulator.tiles.base.torch_max()
max(input) -> Tensor
Returns the maximum value of all elements in the input tensor.
Warning
This function produces deterministic (sub)gradients unlike max(dim=0).
- Parameters
input (Tensor) – the input tensor.
Example:
>>> a = torch.randn(1, 3)
>>> a
tensor([[ 0.6763, 0.7445, -2.2369]])
>>> torch.max(a)
tensor(0.7445)
-
aihwkit.simulator.tiles.base.max(input, dim, keepdim=False, *, out=None)
Returns a namedtuple (values, indices) where values is the maximum value of each row of the input tensor in the given dimension dim, and indices is the index location of each maximum value found (argmax).
If keepdim is True, the output tensors are of the same size as input except in the dimension dim, where they are of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensors having 1 fewer dimension than input.
Note
If there are multiple maximal values in a reduced row, then the indices of the first maximal value are returned.
- Parameters
input (Tensor) – the input tensor.
dim (int) – the dimension to reduce.
keepdim (bool) – whether the output tensor has dim retained or not. Default: False.
- Keyword Arguments
out (tuple, optional) – the result tuple of two output tensors (max, max_indices)
Example:
>>> a = torch.randn(4, 4)
>>> a
tensor([[-1.2360, -0.2942, -0.1222, 0.8475],
        [ 1.1949, -1.1127, -2.2379, -0.6702],
        [ 1.5717, -0.9207, 0.1297, -1.8768],
        [-0.6172, 1.0036, -0.6060, -0.2432]])
>>> torch.max(a, 1)
torch.return_types.max(values=tensor([0.8475, 1.1949, 1.5717, 1.0036]), indices=tensor([3, 0, 0, 1]))
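The (values, indices) semantics and the first-index tie rule from the note can be mirrored in plain Python for the 2-D, `dim=1`, `keepdim=False` case. `max_along_dim1` and `MaxResult` are hypothetical names for this sketch, which also does not model NaN handling.

```python
from typing import List, NamedTuple, Sequence


class MaxResult(NamedTuple):
    values: List[float]
    indices: List[int]


def max_along_dim1(matrix: Sequence[Sequence[float]]) -> MaxResult:
    """Hypothetical helper: row-wise max of a 2-D list, like torch.max(a, 1)."""
    values: List[float] = []
    indices: List[int] = []
    for row in matrix:
        best = 0
        for j in range(1, len(row)):
            if row[j] > row[best]:  # strict '>' keeps the first maximum on ties
                best = j
        values.append(row[best])
        indices.append(best)
    return MaxResult(values, indices)


a = [[-1.2360, -0.2942, -0.1222, 0.8475],
     [1.1949, -1.1127, -2.2379, -0.6702],
     [1.5717, -0.9207, 0.1297, -1.8768],
     [-0.6172, 1.0036, -0.6060, -0.2432]]
print(max_along_dim1(a))
# MaxResult(values=[0.8475, 1.1949, 1.5717, 1.0036], indices=[3, 0, 0, 1])
```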
-
aihwkit.simulator.tiles.base.max(input, other, *, out=None) → Tensor
See torch.maximum().
-
aihwkit.simulator.tiles.base.unsqueeze(input, dim) → Tensor
Returns a new tensor with a dimension of size one inserted at the specified position.
The returned tensor shares the same underlying data with this tensor.
A dim value within the range [-input.dim() - 1, input.dim() + 1) can be used. Negative dim will correspond to unsqueeze() applied at dim = dim + input.dim() + 1.
- Parameters
input (Tensor) – the input tensor.
dim (int) – the index at which to insert the singleton dimension
Example:
>>> x = torch.tensor([1, 2, 3, 4])
>>> torch.unsqueeze(x, 0)
tensor([[ 1, 2, 3, 4]])
>>> torch.unsqueeze(x, 1)
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])
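The valid range and the negative-index rule can be checked with a shape-only sketch. `unsqueeze_shape` is a hypothetical helper, not a PyTorch function; it manipulates shape tuples only.

```python
from typing import Sequence, Tuple


def unsqueeze_shape(shape: Sequence[int], dim: int) -> Tuple[int, ...]:
    """Hypothetical helper: the shape torch.unsqueeze would return."""
    shape = tuple(shape)
    ndim = len(shape)
    # Valid range is [-ndim - 1, ndim + 1), as stated above.
    if not -ndim - 1 <= dim < ndim + 1:
        raise IndexError(f"dim {dim} out of range for a {ndim}-d shape")
    if dim < 0:
        dim = dim + ndim + 1  # negative dim maps to dim + input.dim() + 1
    return shape[:dim] + (1,) + shape[dim:]


print(unsqueeze_shape((4,), 0))   # (1, 4)
print(unsqueeze_shape((4,), 1))   # (4, 1)
print(unsqueeze_shape((4,), -1))  # (4, 1)
```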
-
aihwkit.simulator.tiles.base.zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor
Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument size.
- Parameters
size (int...) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- Keyword Arguments
out (Tensor, optional) – the output tensor.
dtype (torch.dtype, optional) – the desired data type of the returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
layout (torch.layout, optional) – the desired layout of the returned Tensor. Default: torch.strided.
device (torch.device, optional) – the desired device of the returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
Example:
>>> torch.zeros(2, 3)
tensor([[ 0., 0., 0.],
        [ 0., 0., 0.]])
>>> torch.zeros(5)
tensor([ 0., 0., 0., 0., 0.])