aihwkit.simulator.tiles module¶
High level analog tiles.
-
class aihwkit.simulator.tiles.AnalogTile(out_size, in_size, resistive_device=None, bias=False, in_trans=False, out_trans=False)¶
Bases: aihwkit.simulator.tiles.BaseTile
Analog tile.
This analog tile implements an abstract analog tile with many cycle-to-cycle non-idealities and systematic parameter-spreads that can be user-defined.
In general, stochastic bit pulse trains are generated during update, and the device materials (or unit cells) at each cross-point are only updated if a coincidence of row and column pulses occurs.
Here, a resistive device material is assumed that responds with a finite step change of its conductance value, independent of its current conductance value.
In its basic parameter settings it implements the analog RPU tile model described in Gokmen & Vlasov (2016), but with a number of enhancements that are adjustable by parameter settings.
All tile parameters are given in AnalogTileParameters.
Forward pass:
In general, the following analog forward pass is computed:
\[\mathbf{y} = f_\text{ADC}((W + \sigma_\text{w}\Xi) \otimes (f_\text{DAC}( x/\alpha ) + \sigma_\text{inp}\,\boldsymbol{\xi}_1 ) + \sigma_\text{out}\,\boldsymbol{\xi}_2)\,s_\alpha\, s_\text{out}\,\alpha\]
where \(W\) is the weight matrix, \(\mathbf{x}\) the input vector, and \(\Xi,\boldsymbol{\xi}_1,\boldsymbol{\xi}_2\) are Gaussian noise variables (with corresponding matrix and vector sizes). The \(\alpha\) is a scale from the noise management (see rpu_types.NoiseManagementTypeMap). The symbol \(\otimes\) refers to the 'analog' matrix-vector multiplication, which might have additional non-linearities.
\(f_\text{Z}\) (with Z either ADC or DAC) indicates the discretization to a number of equidistant steps between a bound value \(-b_\text{Z},\ldots,b_\text{Z}\), potentially with stochastic rounding (SR):
\[f_\text{Z}(x) = \text{round}(x\, \frac{r_\text{Z}}{2\,b_\text{Z}} + \zeta)\frac{2b_\text{Z}}{r_\text{Z}}\]
If SR is enabled, \(\zeta\) is a uniform random variable \(\in [-0.5,0.5)\). Otherwise \(\zeta=0\). Inputs are clipped below \(-b_\text{Z}\) and above \(b_\text{Z}\).
\(r_\text{Z}\) is the resolution of the ADC or DAC, e.g. for 8 bits it would be \(1/256\).
Note
Typically the resolution is reduced by 2 levels, e.g. in case of 8 bits it is set to \(1/254\) to account for a discretization that is mirror-symmetric around zero, including the zero and discarding one value.
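For illustration, the discretization \(f_\text{Z}\) could be sketched as follows. This is a minimal NumPy sketch, not part of the aihwkit API; the helper name and the interpretation of the resolution as a relative step size (e.g. \(1/254\) for 8 bits, as in the note above) are assumptions:

import numpy as np

def quantize(x, bound=1.0, resolution=1.0 / 254, stochastic_rounding=False):
    """Sketch of f_Z: discretize x to equidistant steps within [-bound, bound]."""
    x = np.clip(x, -bound, bound)            # inputs are clipped at +/- bound
    step = 2.0 * bound * resolution          # distance between adjacent levels
    zeta = np.random.uniform(-0.5, 0.5, np.shape(x)) if stochastic_rounding else 0.0
    return np.round(x / step + zeta) * step  # (stochastically) round and rescale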
The scalar scale \(s_\text{out}\) can be set by out_scale. The scalar scale \(s_\alpha\) is an additional scale that might be used to map weights better to conductance ranges.
For parameters regarding the forward pass behavior, see AnalogTileInputOutputParameters.
Backward pass:
Identical to the forward direction except that the transposed weight matrix is used. Same parameters as during the forward pass except that bound management is not supported.
For parameters regarding the backward pass behavior, see AnalogTileInputOutputParameters.
General weight update:
The weight update that theoretically needs to be computed is
\[w_{ij} = w_{ij} + \lambda d_i\,x_j\]
thus the outer product of the error vector and the input vector.
Although the update depends on the ResistiveDevice used, in general stochastic pulse trains of a given length are drawn, where the probability of occurrence of a pulse is proportional to \(\sqrt{\lambda}d_i\) and \(\sqrt{\lambda}x_j\), respectively. Then, for each cross-point, in case a coincidence of column and row pulses occurs, the weight is updated one step. For details, see Gokmen & Vlasov (2016).
The amount by which the weight changes per single step might differ between the different resistive devices.
In pseudo code:
# generate prob number
p_i  = quantize(A * d_i, res, sto_round)
q_j  = quantize(B * x_j, res, sto_round)
sign = sign(d_i) * sign(x_j)

# generate pulse trains of length BL
pulse_train_d = gen_pulse_train(p_i, BL)  # e.g. 101001001
pulse_train_x = gen_pulse_train(q_j, BL)  # e.g. 001010010

for t in range(BL):
    if (pulse_train_x[t] == 1) and (pulse_train_d[t] == 1):
        update_once(w_ij, direction=sign)
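The pseudo code above can be turned into a runnable NumPy sketch for a constant-step device. This is illustrative only, not the library's internal implementation; the choice of A = B, the fixed step size dw_min, and the omission of probability quantization are simplifying assumptions:

import numpy as np

def stochastic_update(w, x, d, lr=0.01, bl=31, dw_min=0.001):
    """Sketch of a coincidence-based stochastic pulse update w_ij += lr * d_i * x_j."""
    # with A = B = sqrt(lr / (dw_min * BL)), the expected number of coincidences
    # times dw_min approximates lr * |d_i * x_j| (ignoring probability clipping)
    a = b = np.sqrt(lr / (dw_min * bl))
    p = np.clip(np.abs(a * d), 0.0, 1.0)          # pulse probabilities per row (d_i)
    q = np.clip(np.abs(b * x), 0.0, 1.0)          # pulse probabilities per column (x_j)
    sign = np.outer(np.sign(d), np.sign(x))       # update direction per cross-point

    for _ in range(bl):                           # pulse trains of length BL
        pulse_d = np.random.random(d.shape) < p   # stochastic row pulses
        pulse_x = np.random.random(x.shape) < q   # stochastic column pulses
        coincidence = np.outer(pulse_d, pulse_x)  # both pulses present at a cross-point
        w += dw_min * sign * coincidence          # one constant step per coincidence
    return w

In the actual tile, the step performed by update_once additionally carries cycle-to-cycle and device-to-device variation, as described below.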
The probabilities are generated using scaling factors A and B that are determined by the learning rate and the pulse train length BL (see below). quantize is an optional discretization of the resulting probability, to account for the limited resolution of the stochastic pulse train generation process on the chip.
The update_once functionality is in general dependent on the analog tile class. For ConstantStep the step width is independent of the actual weight, but has cycle-to-cycle variation, device-to-device variation or systematic bias for up versus down direction (see below).
For parameters regarding the update behaviour, see AnalogTileUpdateParameters.
- Parameters
out_size – output vector size of the tile, i.e. the dimension of \(\mathbf{y}\) in case of \(\mathbf{y} = W\mathbf{x}\) (or equivalently the dimension of \(\boldsymbol{\delta}\) in the backward pass).
in_size – input vector size, i.e. the dimension of the vector \(\mathbf{x}\) in case of \(\mathbf{y} = W\mathbf{x}\).
resistive_device – resistive device.
bias – whether to add a bias column to the tile, i.e. \(W\) has an extra column to code the biases. Internally, the input \(\mathbf{x}\) will automatically be expanded by an extra dimension which is always set to 1.
-
cuda(device=None)¶
Return a copy of this tile in CUDA memory.
- Parameters
device (Optional[Union[torch.device, str, int]]) – CUDA device
- Return type
aihwkit.simulator.tiles.CudaAnalogTile
-
class aihwkit.simulator.tiles.BaseTile(out_size, in_size, resistive_device, bias=True, in_trans=False, out_trans=False)¶
Bases: object
Base class for tiles.
- Parameters
out_size – output size
in_size – input size
resistive_device – resistive device.
bias – whether to add a bias column to the tile.
in_trans – whether to assume a transposed input (batch first)
out_trans – whether to assume a transposed output (batch first)
-
backward(d_input)¶
Perform the backward pass.
- Parameters
d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
- Returns
[N, in_size] tensor. If in_trans is set, transposed.
- Return type
torch.Tensor
-
cuda(device=None)¶
Return a copy of this tile in CUDA memory.
- Parameters
device (Optional[Union[torch.device, str, int]]) –
- Return type
-
decay_weights(alpha=1.0)¶
Decays the weights once.
- Parameters
alpha (float) – additional decay scale (such as LR). The base decay rate is set during tile init.
- Return type
None
-
diffuse_weights()¶
Diffuses the weights once.
The base diffusion rate is set during tile init.
- Return type
None
-
forward(x_input, is_test=False)¶
Perform the forward pass.
- Parameters
x_input (torch.Tensor) – [N, in_size] tensor. If in_trans is set, transposed.
is_test (bool) – whether to assume testing mode.
- Returns
[N, out_size] tensor. If out_trans is set, transposed.
- Return type
torch.Tensor
-
get_hidden_parameters()¶
Get the hidden parameters of the tile.
- Returns
Ordered dictionary of hidden parameter tensors.
- Return type
collections.OrderedDict
-
get_learning_rate()¶
Return the tile learning rate.
- Returns
the tile learning rate.
- Return type
float
-
get_weights(realistic=False)¶
Get the tile weights (and biases).
Gets the tile weights and extracts the mathematical weight matrix and biases (if present, as determined by the self.bias parameter).
Note
By default this is not hardware realistic. Set realistic to True for a realistic transfer.
- Parameters
realistic (bool) – whether to use the forward pass to read out the tile weights iteratively, using get_weights_realistic().
- Returns
a tuple where the first item is the [out_size, in_size] weight matrix; and the second item is either the [out_size] bias vector or None if the tile is set not to use bias.
- Return type
Tuple[torch.Tensor, Optional[torch.Tensor]]
-
is_cuda = False¶
-
set_hidden_parameters(ordered_parameters)¶
Set the hidden parameters of the tile.
- Parameters
ordered_parameters (collections.OrderedDict) – Ordered dictionary of hidden parameter tensors.
- Return type
None
-
set_learning_rate(learning_rate)¶
Set the tile learning rate.
Set the tile learning rate to -learning_rate. Note that the learning rate is always taken to be negative (because of the meaning in gradient descent) and positive learning rates are not supported.
learning_rate (float) – the desired learning rate.
- Return type
None
-
set_weights(weights, biases=None, realistic=False, n_loops=10)¶
Set the tile weights (and biases).
Sets the internal tile weights to the specified values, and also the internal tile biases if the tile was set to use bias (via self.bias).
Note
By default this is not hardware realistic. You can set the realistic parameter to True for a realistic transfer.
- Parameters
weights (torch.Tensor) – [out_size, in_size] weight matrix.
biases (Optional[torch.Tensor]) – [out_size] bias vector. This parameter is required if self.bias is True, and ignored otherwise.
realistic (bool) – whether to use the forward and update pass to program the weights iteratively, using set_weights_realistic().
n_loops (int) – number of times the columns of the weights are set in a closed-loop manner. A value of 1 means that all columns in principle receive enough pulses to change from w_min to w_max.
- Return type
None
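As a usage sketch based on the signatures above (the import path and the FloatingPointTile constructor defaults documented in this module are assumed; shapes are illustrative):

import torch
from aihwkit.simulator.tiles import FloatingPointTile

tile = FloatingPointTile(3, 4)          # out_size=3, in_size=4, no bias column

target = torch.randn(3, 4)              # [out_size, in_size] weight matrix
tile.set_weights(target)                # ideal (non-realistic) transfer
weights, biases = tile.get_weights()    # biases is None since bias=False

# a hardware-realistic, closed-loop transfer would instead use the analog
# forward/update passes:
# tile.set_weights(target, realistic=True, n_loops=10)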
-
update(x_input, d_input)¶
Perform the update pass.
- Parameters
x_input (torch.Tensor) – [N, in_size] tensor. If in_trans is set, transposed.
d_input (torch.Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
- Return type
None
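Putting the passes together, a single manual training step on a tile might look roughly as follows (a sketch using the method signatures above; the tile class, sizes and data are illustrative):

import torch
from aihwkit.simulator.tiles import FloatingPointTile

tile = FloatingPointTile(2, 5)     # out_size=2, in_size=5
tile.set_learning_rate(0.01)

x = torch.rand(4, 5)               # [N, in_size] input batch
y = tile.forward(x)                # [N, out_size] forward pass
target = torch.zeros(4, 2)         # dummy targets, [N, out_size]
d = y - target                     # error signal

d_prev = tile.backward(d)          # [N, in_size] gradient w.r.t. the input
tile.update(x, d)                  # outer-product update of the tile weights, in place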
-
class aihwkit.simulator.tiles.CudaAnalogTile(out_size, in_size, resistive_device=None, bias=False, in_trans=False, out_trans=False)¶
Bases: aihwkit.simulator.tiles.AnalogTile
Analog tile (CUDA).
- Parameters
out_size – output vector size of the tile.
in_size – input vector size of the tile.
resistive_device – resistive device.
bias – whether to add a bias column to the tile.
in_trans – whether to assume a transposed input (batch first)
out_trans – whether to assume a transposed output (batch first)
-
cuda(device=None)¶
Return a copy of this tile in CUDA memory.
- Parameters
device (Optional[Union[torch.device, str, int]]) – CUDA device
- Return type
aihwkit.simulator.tiles.CudaAnalogTile
-
is_cuda = True¶
-
class aihwkit.simulator.tiles.CudaFloatingPointTile(out_size, in_size, resistive_device=None, bias=False, in_trans=False, out_trans=False)¶
Bases: aihwkit.simulator.tiles.FloatingPointTile
Floating point tile (CUDA).
- Parameters
out_size – output vector size of the tile.
in_size – input vector size of the tile.
resistive_device – resistive device.
bias – whether to add a bias column to the tile.
in_trans – whether to assume a transposed input (batch first)
out_trans – whether to assume a transposed output (batch first)
-
cuda(device=None)¶
Return a copy of this tile in CUDA memory.
- Parameters
device (Optional[Union[torch.device, str, int]]) –
- Return type
aihwkit.simulator.tiles.CudaFloatingPointTile
-
is_cuda = True¶
-
class aihwkit.simulator.tiles.FloatingPointTile(out_size, in_size, resistive_device=None, bias=False, in_trans=False, out_trans=False)¶
Bases: aihwkit.simulator.tiles.BaseTile
Floating point tile.
Implements a floating point or ideal analog tile.
A linear layer with this tile is perfectly linear; it simply uses the RPUCuda library for execution.
Forward pass:
\[\mathbf{y} = W\mathbf{x}\]
\(W\) is the weight matrix, \(\mathbf{x}\) is the input vector, and \(\mathbf{y}\) is the output of the vector-matrix multiplication. Note that if bias is used, \(\mathbf{x}\) is concatenated with 1 so that the last column of \(W\) contains the biases.
Backward pass:
Typical backward pass with transposed weights:
\[\mathbf{d'} = W^T\mathbf{d}\]
where \(\mathbf{d}\) is the error vector and \(\mathbf{d'}\) is the output of the backward matrix-vector multiplication.
Weight update:
Usual learning rule for back-propagation:
\[w_{ij} \leftarrow w_{ij} + \lambda d_i\,x_j\]
Decay:
\[w_{ij} \leftarrow w_{ij}(1-\alpha r_\text{decay})\]
Weight decay is only applied by explicitly calling the analog tile decay (see decay_weights()).
Note
The life_time parameter is set during initialization. alpha is a scaling factor that can be given at run-time.
Diffusion:
\[w_{ij} \leftarrow w_{ij} + \xi\;r_\text{diffusion}\]
Similar to the decay, diffusion is only applied when explicitly called. However, the parameters of the diffusion process are set during initialization and are fixed thereafter. \(\xi\) is a standard Gaussian variable.
- Parameters
out_size – output vector size of the tile, i.e. the dimension of \(\mathbf{y}\) in case of \(\mathbf{y} = W\mathbf{x}\) (or equivalently the dimension of \(\boldsymbol{\delta}\) in the backward pass).
in_size – input vector size, i.e. the dimension of the vector \(\mathbf{x}\) in case of \(\mathbf{y} = W\mathbf{x}\).
resistive_device – resistive device.
bias – whether to add a bias column to the tile, i.e. \(W\) has an extra column to code the biases. Internally, the input \(\mathbf{x}\) will automatically be expanded by an extra dimension which is always set to 1.
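Decay and diffusion are only applied when explicitly requested, for example (a usage sketch; the base decay and diffusion rates come from the resistive device configuration chosen at initialization):

from aihwkit.simulator.tiles import FloatingPointTile

tile = FloatingPointTile(3, 4)

# nothing is applied automatically; both operations are explicit calls
tile.decay_weights(alpha=0.5)   # w_ij <- w_ij * (1 - alpha * r_decay)
tile.diffuse_weights()          # w_ij <- w_ij + xi * r_diffusion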
-
cuda(device=None)¶
Return a copy of this tile in CUDA memory.
- Parameters
device (Optional[Union[torch.device, str, int]]) –
- Return type