aihwkit.simulator.tiles.analog module
High level analog tiles (analog).
- class aihwkit.simulator.tiles.analog.AnalogTile(out_size, in_size, rpu_config, bias=False, in_trans=False, out_trans=False)[source]
Bases: TileModule, TileWithPeriphery, RPUCudaSimulatorTileWrapper
Analog tile.
This analog tile implements an abstract analog tile in which many cycle-to-cycle non-idealities and systematic parameter spreads can be user-defined.
In general, stochastic bit pulse trains are generated during the update, and the device materials (or unit cells) at each cross-point are only updated upon a coincidence of row and column pulses.
Here, a resistive device material is assumed that responds with a finite step change of its conductance value, independent of its current conductance value.
With its basic parameter settings it implements the analog RPU tile model described in Gokmen & Vlasov (2016), with a number of enhancements that are adjustable via the parameter settings.
All tile parameters are given in AnalogTileParameters.
Forward pass:
In general, the following analog forward pass is computed:
\[\mathbf{y} = f_\text{ADC}((W + \sigma_\text{w}\Xi) \otimes (f_\text{DAC}( x/\alpha ) + \sigma_\text{inp}\,\boldsymbol{\xi}_1 ) + \sigma_\text{out}\,\boldsymbol{\xi}_2)\,s_\alpha\, s_\text{out}\,\alpha\]
where \(W\) is the weight matrix, \(\mathbf{x}\) the input vector, and \(\Xi,\boldsymbol{\xi}_1,\boldsymbol{\xi}_2\) Gaussian noise variables (with corresponding matrix and vector sizes). The \(\alpha\) is a scale from the noise management (see rpu_types.NoiseManagementTypeMap). The symbol \(\otimes\) refers to the ‘analog’ matrix-vector multiplication, which might have additional non-linearities.
\(f_\text{Z}\) (with Z either ADC or DAC) indicates the discretization to a number of equidistant steps between the bound values \(-b_\text{Z},\ldots,b_\text{Z}\), potentially with stochastic rounding (SR):
\[f_\text{Z}(x) = \text{round}\!\left(\frac{x}{2\,b_\text{Z}\,r_\text{Z}} + \zeta\right) 2\,b_\text{Z}\,r_\text{Z}\]
If SR is enabled, \(\zeta\) is a uniform random variable \(\in [-0.5,0.5)\); otherwise \(\zeta=0\). Inputs are clipped below \(-b_\text{Z}\) and above \(b_\text{Z}\).
\(r_\text{Z}\) is the resolution of the ADC or DAC, e.g. for 8 bits it would be \(1/256\).
Note
Typically the resolution is reduced by two levels, e.g. in case of 8 bits it is set to \(1/254\), to account for a discretization that is mirror-symmetric around zero, including zero itself and discarding one value.
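As a rough illustration of this conversion, a minimal sketch in torch (not the simulator's internal implementation; the helper name quantize and all values are illustrative):

import torch

def quantize(x, bound, resolution, stochastic_rounding=True):
    # f_Z: clip to [-bound, bound], then round to equidistant steps
    # of width 2 * bound * resolution (b_Z = bound, r_Z = resolution).
    x = x.clamp(-bound, bound)
    step = 2.0 * bound * resolution
    zeta = torch.rand_like(x) - 0.5 if stochastic_rounding else 0.0
    return torch.round(x / step + zeta) * step

# 8-bit converter with the resolution reduced by two levels (1/254),
# so that the grid is mirror-symmetric around zero and includes zero.
x = torch.randn(5)
print(quantize(x, bound=1.0, resolution=1 / 254))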
The scalar scale \(s_\text{out}\) can be set by out_scale. The scalar scale \(s_\alpha\) is an additional scale that might be used to map the weights better to the conductance ranges.
For parameters regarding the forward pass behavior, see AnalogTileInputOutputParameters.
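Putting these pieces together, the noisy forward pass can be sketched as follows (a simplified illustration assuming the quantize helper above; the noise levels and scales are illustrative defaults, and the actual simulator applies further non-idealities configured via AnalogTileInputOutputParameters):

import torch

def analog_forward(w, x, alpha=1.0, s_alpha=1.0, s_out=1.0,
                   sigma_w=0.01, sigma_inp=0.01, sigma_out=0.01,
                   bound=1.0, res=1 / 254):
    # y = f_ADC((W + s_w Xi) (f_DAC(x/alpha) + s_inp xi1) + s_out xi2) * s_alpha * s_out * alpha
    x_dac = quantize(x / alpha, bound, res) + sigma_inp * torch.randn_like(x)
    w_noisy = w + sigma_w * torch.randn_like(w)
    y_analog = w_noisy @ x_dac + sigma_out * torch.randn(w.shape[0])
    return quantize(y_analog, bound, res) * s_alpha * s_out * alpha

w = 0.3 * torch.randn(4, 6)   # [out_size, in_size]
x = torch.rand(6)             # [in_size]
print(analog_forward(w, x))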
Backward pass:
Identical to the forward direction, except that the transposed weight matrix is used. The same parameters apply as during the forward pass, except that bound management is not supported.
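Ignoring the noise and quantization terms, this amounts to the transposed read (a sketch, with w shaped as in the forward sketch above):

# d: [N, out_size] error tensor, w: [out_size, in_size] analog weights
grad_x = d @ w  # [N, in_size], i.e. a matrix-vector product with the transposed W per row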
For parameters regarding the backward pass behavior, see AnalogTileInputOutputParameters.
General weight update:
The weight update that theoretically needs to be computed is
\[w_{ij} = w_{ij} + \lambda d_i\,x_j\]
i.e. the outer product of the error vector and the input vector.
Although the update depends on the ResistiveDevice used, in general stochastic pulse trains of a given length are drawn, where the probability of occurrence of a pulse is proportional to \(\sqrt{\lambda}d_i\) and \(\sqrt{\lambda}x_j\), respectively. Then, for each cross-point, in case a coincidence of column and row pulses occurs, the weight is updated by one step. For details, see Gokmen & Vlasov (2016).
The amount by which the weight changes per single step might differ between the different resistive devices.
In pseudo code:
# generate prob number
p_i = quantize(A * d_i, res, sto_round)
q_j = quantize(B * x_j, res, sto_round)
sign = sign(d_i) * sign(x_j)

# generate pulse trains of length BL
pulse_train_d = gen_pulse_train(p_i, BL)  # e.g. 101001001
pulse_train_x = gen_pulse_train(q_j, BL)  # e.g. 001010010

for t in range(BL):
    if pulse_train_x[t] == 1 and pulse_train_d[t] == 1:
        update_once(w_{ij}, direction=sign)
The probabilities are generated using scaling factors A and B that are determined by the learning rate and the pulse train length BL (see below). quantize is an optional discretization of the resulting probability, to account for the limited resolution of the stochastic pulse train generation process on the chip.
The update_once functionality is in general dependent on the analog tile class. For ConstantStep the step width is independent of the actual weight, but has cycle-to-cycle variation, device-to-device variation, or a systematic bias for the up versus down direction (see below).
For parameters regarding the update behavior, see AnalogTileUpdateParameters.
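A self-contained sketch of this stochastic pulse-coincidence update for a ConstantStep-like device (illustrative only: the scaling A = B = sqrt(lr / (dw_min * BL)) follows Gokmen & Vlasov (2016), and stochastic_update and dw_min are hypothetical names, not the library's API):

import torch

def stochastic_update(w, d, x, lr=0.01, bl=31, dw_min=0.001):
    # A = B = sqrt(lr / (dw_min * BL)), so the expected accumulated
    # update over BL coincidences approximates lr * d_i * x_j.
    a = (lr / (dw_min * bl)) ** 0.5
    p = (a * d.abs()).clamp(0.0, 1.0)   # row pulse probabilities
    q = (a * x.abs()).clamp(0.0, 1.0)   # column pulse probabilities
    sign = torch.outer(torch.sign(d), torch.sign(x))
    for _ in range(bl):
        pulses_d = torch.bernoulli(p)   # one time step of the row pulse train
        pulses_x = torch.bernoulli(q)   # one time step of the column pulse train
        # update_once: a fixed step dw_min wherever row and column pulses coincide
        w += dw_min * sign * torch.outer(pulses_d, pulses_x)
    return w

w = torch.zeros(4, 6)
d = torch.randn(4)   # error vector, length out_size
x = torch.randn(6)   # input vector, length in_size
stochastic_update(w, d, x)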
- Parameters:
out_size (int) – output vector size of the tile, i.e. the dimension of \(\mathbf{y}\) in case of \(\mathbf{y} = W\mathbf{x}\) (or equivalently the dimension of \(\boldsymbol{\delta}\) in the backward pass).
in_size (int) – input vector size, i.e. the dimension of the vector \(\mathbf{x}\) in case of \(\mathbf{y} = W\mathbf{x}\).
rpu_config (InferenceRPUConfig | SingleRPUConfig | UnitCellRPUConfig | TorchInferenceRPUConfig | DigitalRankUpdateRPUConfig) – resistive processing unit configuration.
bias (bool) – whether to add a bias column to the tile, i.e. \(W\) has an extra column to code the biases. Internally, the input \(\mathbf{x}\) will automatically be expanded by an extra dimension which is always set to 1.
in_trans (bool) – Whether to assume a transposed input (batch first).
out_trans (bool) – Whether to assume a transposed output (batch first).
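A minimal usage sketch (assuming the standard aihwkit configuration classes; the sizes and device choice are illustrative):

import torch
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import ConstantStepDevice
from aihwkit.simulator.tiles import AnalogTile

# Tile computing y = W x with a ConstantStep resistive device model.
tile = AnalogTile(out_size=4, in_size=6,
                  rpu_config=SingleRPUConfig(device=ConstantStepDevice()))

x = torch.rand(3, 6)   # [N, in_size]
y = tile.forward(x)    # [N, out_size]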
- forward(x_input, tensor_view=None)[source]
Torch forward function that calls the analog forward pass.
- Parameters:
x_input (Tensor) –
tensor_view (Tuple | None) –
- Return type:
Tensor
- supports_ddp: bool = False
- class aihwkit.simulator.tiles.analog.AnalogTileWithoutPeriphery(out_size, in_size, rpu_config, bias=False, in_trans=False, out_trans=False)[source]
Bases: TileModule, BaseTile, RPUCudaSimulatorTileWrapper
Analog tile without the periphery.
Same basic functionality as AnalogTile, however without the digital periphery, such as weight scaling and bias.
- Parameters:
out_size (int) –
in_size (int) –
rpu_config (InferenceRPUConfig | SingleRPUConfig | UnitCellRPUConfig | TorchInferenceRPUConfig | DigitalRankUpdateRPUConfig) –
bias (bool) –
in_trans (bool) –
out_trans (bool) –
- backward(d_input, ctx=None)[source]
Perform the backward pass.
- Parameters:
d_input (Tensor) – [N, out_size] tensor. If out_trans is set, transposed.
ctx (Any | None) – torch auto-grad context [Optional]
- Returns:
[N, in_size] tensor. If in_trans is set, transposed.
- Return type:
torch.Tensor
- forward(x_input)[source]
Torch forward function that calls the analog forward pass.
- Parameters:
x_input (Tensor) –
- Return type:
Tensor
- get_learning_rate()[source]
Return the tile learning rate.
- Returns:
the tile learning rate.
- Return type:
float
- joint_forward(x_input, is_test=False, ctx=None)[source]
Perform the joint forward method.
- Parameters:
x_input (Tensor) – [N, in_size] tensor. If in_trans is set, transposed.
is_test (bool) – whether to assume testing mode.
ctx (Any | None) – torch auto-grad context [Optional]
- Returns:
[N, out_size] tensor. If out_trans is set, transposed.
- Return type:
torch.Tensor
- set_learning_rate(learning_rate)[source]
Set the tile learning rate.
Set the tile learning rate to -learning_rate. Note that the learning rate is always taken to be negative (because of its meaning in gradient descent) and positive learning rates are not supported.
- Parameters:
learning_rate (float | None) – the desired learning rate.
- Return type:
None
- supports_indexed: bool = False