# Performance Evaluation of Various Configuration of Adder in Variable Latency Circuits with Error Detection/Correction Mechanism 

Kenta Ando<br>Atsushi Takahashi<br>Graduate School of Engineering, Osaka University<br>Suita, 565-0871, Japan<br>\{ando,atsushi\}@si.eei.eng.osaka-u.ac.jp


#### Abstract

To develop a circuit synthesis method for the error detection/correction framework, this paper designs and evaluates various ripple-carry adders (RCAs) in which the minimum delay is increased by delay insertion and/or the probability of a large delay is reduced by changing the configuration of the circuit components. Experiments confirm that one of the resulting circuits achieves better performance in the error detection/correction framework.


## I. Introduction

Most digital circuits today are designed as clock-synchronous circuits driven by a global clock. In a typical clock-synchronous implementation, the global clock is designed to arrive at every flip-flop simultaneously, and every primitive computation is executed in one clock cycle. That is, every signal transfer between flip-flops completes within one clock cycle, and the latency of a primitive computation is fixed. In such an implementation, the performance of a circuit depends on the clock period, and the maximum delay of the primitive computations gives a lower bound on the clock period. Therefore, circuit synthesis has pursued the reduction of the maximum delay of primitive computations, without changing the amount of primitive computation executed in one clock cycle, to maximize circuit performance.

Various methods to reduce the maximum delay have been proposed, but the delay reduction they achieve is approaching its limit. Moreover, the critical paths of a circuit, which require the largest delay, may seldom be activated in a primitive computation. As manufacturing technology advances and circuit elements shrink, the delay variation of a circuit element has grown, and so has the difference in the delay of a primitive computation between the best and worst cases. Nevertheless, the circuit behavior must be guaranteed in all cases, and the cost of guaranteeing it in all cases has grown accordingly.

Variable latency circuits (VLC), in which primitive computations are not executed with a fixed latency, have the potential to ease these disadvantages and to improve circuit performance remarkably. In a VLC, the number of clock cycles of each primitive computation changes depending on the circuit status and other factors [1]. There are various implementations of VLC. This paper assumes a VLC implemented by using an error detection/correction mechanism [2], called VLEDC.

A conventional fixed latency circuit can be converted into a VLEDC by wrapping it in circuitry that implements the error detection/correction mechanism. In [3], several conventional adders are evaluated in the VLEDC framework, and it is shown that their effective clock periods are improved by converting them into VLEDC. However, it is not clear whether converting conventional adders achieves the maximum performance of addition in the VLEDC framework. The performance of a circuit in the VLEDC framework depends on its minimum delay, maximum delay, and delay distribution. In general, the performance is better when the minimum delay is larger and/or the probability of a large delay is lower. However, conventional circuits are usually designed so that the maximum delay is reduced as much as possible to maximize performance in the conventional framework, and they are not necessarily suited to the VLEDC framework.

To develop a circuit synthesis method for the error detection/correction framework, this paper designs and evaluates various ripple-carry adders (RCAs) in which the minimum delay is increased by delay insertion and/or the probability of a large delay is reduced by changing the configuration of the circuit components. Experiments confirm that one of the resulting circuits achieves better performance in the error detection/correction framework.

## II. Variable Latency Circuits with Error Detection/Correction Mechanism

In this section, the behavior of a variable latency circuit with an error detection/correction mechanism [2] is explained. The latency of a circuit is the time required to generate the outputs after the inputs are given, and is usually a multiple of the clock period. VLEDC changes the latency according to the time required to generate the output signals.


Fig. 1. VLEDC

VLEDC has two execution modes. One is regular mode in which primitive computations are being executed. The other is correction mode in which wrong circuit behavior caused by delay errors is being corrected. VLEDC continues executing primitive computations in regular mode if no delay error occurs. If delay errors occur, VLEDC detects them and corrects the circuit status by replacing wrong values with correct values in correction mode, while suspending the execution of primitive computations. When the circuit status is corrected, VLEDC returns to regular mode and resumes the execution of primitive computations.

The latency of a primitive computation is the time required to execute it in regular mode if no delay error occurs. If a delay error occurs, the time required in correction mode is added.

Fig. 1 shows an overview of an implementation of VLEDC in which a functional unit is converted into VLEDC. In this implementation, a conventional deterministic flip-flop is replaced by a speculative flip-flop. A speculative flip-flop contains two conventional deterministic flip-flops, called spFF and cfFF. A delay error is allowed at spFF $q$, while no delay error is allowed at cfFF $r$. The value stored at $q$ may be erroneous but is available earlier, whereas the value stored at $r$ is error-free but is available later. The value stored at $q$ is used as the "output" signal of this circuit, so the following primitive computations can start earlier. The value generated by comparing the values of $q$ and $r$ after the correct value is available at $r$ is used as the "error" signal of this circuit, which controls the behavior of the following circuits.

When a primitive computation that takes less than the clock period is executed in this circuit, both $q$ and $r$ store the correct value. However, when a primitive computation that takes more than the clock period is executed, $q$ does not always store the correct value. If the two values stored in $q$ and $r$ differ, $q$ does not store the correct value and a delay error has occurred. When a delay error occurs, the following circuit is notified through the "error" signal, and $q$ is made to store the correct value.

Fig. 2 shows a timing chart that illustrates the behavior of the circuit. As represents input values, and $P_{\mathrm{S}}$ represents output values.

Fig. 2. A timing chart of VLEDC

## III. Timing constraints of circuits

## A. Timing constraint

VLEDC contains speculative flip-flops and conventional deterministic flip-flops. Let $F_{s}$ and $F_{n}$ be the set of speculative flip-flops and deterministic flip-flops in the circuit, respectively. Let $F=F_{s} \cup F_{n}$.

Let $T$ be the clock period and $s(a)$ be the clock timing of flip-flop $a$. The clock timing of a speculative flip-flop is defined as the clock timing of its spFF. The difference between the clock timings of spFF $q$ and cfFF $r$ in a speculative flip-flop $p$ is called the confirmation margin and is denoted by $d(p)$. The confirmation margin is assumed to be nonnegative. That is, $s(p)=s(q)$ and $d(p)=s(r)-s(q) \geq 0$. In the following discussion, a deterministic flip-flop $a \in F_{n}$ is regarded as a speculative flip-flop consisting of spFF and cfFF whose confirmation margin is 0.

Let $P$ be the set of all pairs of flip-flops such that signals are transferred from one to the other without passing through other flip-flops. Let $d_{\max}(a, b)$ and $d_{\min}(a, b)$ be the maximum delay and the minimum delay, respectively, from spFF of speculative flip-flop $a$ to cfFF of speculative flip-flop $b$ without passing through other flip-flops. These delays correspond to the signal transfers in regular mode. Similarly, let $c_{\max}(a, b)$ and $c_{\min}(a, b)$ be the maximum delay and the minimum delay, respectively, from cfFF of speculative flip-flop $a$ to spFF or cfFF of speculative flip-flop $b$ without passing through other flip-flops. These delays correspond to the signal transfers in correction mode.

In order to transfer signals correctly in the regular mode of VLEDC, the following two constraints must be satisfied by every $(a, b) \in P$ [8]:

Setup constraint

$$
s(a)+d_{\max }(a, b) \leq T+s(b)+d(b)
$$

Hold constraint

$$
s(b)+d(b) \leq s(a)+d_{\min }(a, b)
$$

These two constraints guarantee that the correct signal is stored at cfFF of $b$. Since a delay error is allowed at spFF, no constraint is imposed on spFF when it is the destination of a signal transfer.

Also, in order to detect delay errors and to correct signals properly in the correction mode of VLEDC, the following two constraints must be satisfied by every $(a, b) \in P$:

Setup constraint

$$
s(a)+d(a)+c_{\max }(a, b) \leq T+s(b)
$$

Hold constraint

$$
s(b) \leq s(a)+d(a)+c_{\min }(a, b)
$$
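The four constraints can be checked mechanically for a single pair $(a, b)$. The following is a minimal sketch, not the authors' tooling: the `Transfer` container, the `check_pair` function, and the numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """Delays of one pair (a, b) in P, in ns (hypothetical container)."""
    d_max: float  # regular mode: spFF of a -> cfFF of b, maximum
    d_min: float  # regular mode: spFF of a -> cfFF of b, minimum
    c_max: float  # correction mode: cfFF of a -> b, maximum
    c_min: float  # correction mode: cfFF of a -> b, minimum


def check_pair(T, s_a, d_a, s_b, d_b, tr):
    """Evaluate the setup/hold constraints of both modes for one pair."""
    return {
        "regular_setup": s_a + tr.d_max <= T + s_b + d_b,
        "regular_hold": s_b + d_b <= s_a + tr.d_min,
        "correction_setup": s_a + d_a + tr.c_max <= T + s_b,
        "correction_hold": s_b <= s_a + d_a + tr.c_min,
    }


# Illustrative values only: common clock timing 0 and confirmation margin 8 ns.
tr = Transfer(d_max=24, d_min=8, c_max=6, c_min=2)
ok = check_pair(T=16, s_a=0, d_a=8, s_b=0, d_b=8, tr=tr)
too_fast = check_pair(T=15, s_a=0, d_a=8, s_b=0, d_b=8, tr=tr)
```

With these illustrative numbers every constraint holds at $T=16$, while lowering the period to $T=15$ violates only the regular-mode setup constraint.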

Assume that the clock timings of flip-flops in $F$ are the same and that the confirmation margins of speculative flip-flops in $F_{s}$ are the same. Let $s(a)=0$ for all $a \in F$ and $d(p)=d$ for all $p \in F_{s}$, where $d$ is a nonnegative constant. Then

$$
T \geq \max \left\{
\begin{aligned}
& \max \left\{d_{\max }(a, b)-d \mid a \in F, b \in F_{s},(a, b) \in P\right\}, \\
& \max \left\{d_{\max }(a, b) \mid a \in F, b \in F_{n},(a, b) \in P\right\}, \\
& \max \left\{c_{\max }(a, b)+d \mid a \in F_{s}, b \in F,(a, b) \in P\right\}
\end{aligned}
\right\}
$$

and

$$
d \leq \min \left\{d_{\min }(a, b) \mid a \in F, b \in F_{s},(a, b) \in P\right\}
$$

Roughly speaking, the clock period can be reduced without violating the setup constraint by setting the confirmation margin larger. The confirmation margin can be set larger without violating the hold constraint when the minimum delay is large.
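The two bounds under the common-timing assumption can be evaluated directly from the per-pair delays. The sketch below assumes a hypothetical dictionary representation of $P$ (keys are pairs of flip-flop names, values hold `d_max`, `d_min`, and `c_max`), assumes at least one pair ends at a speculative flip-flop, and uses made-up delay values.

```python
def margin_upper_bound(pairs, F_s):
    """Largest confirmation margin d allowed by the regular-mode hold constraint."""
    return min(p["d_min"] for (a, b), p in pairs.items() if b in F_s)


def period_lower_bound(pairs, F_s, d):
    """Smallest clock period T allowed by the setup constraints for margin d."""
    bounds = []
    for (a, b), p in pairs.items():
        # Regular mode: the margin d relaxes the bound only if b is speculative.
        bounds.append(p["d_max"] - d if b in F_s else p["d_max"])
        # Correction mode: a transfer starting at a speculative flip-flop
        # leaves the cfFF d later than the clock edge.
        if a in F_s:
            bounds.append(p["c_max"] + d)
    return max(bounds)


# Hypothetical three-flip-flop chain; only "out" is speculative.
pairs = {
    ("in", "out"): {"d_max": 24, "d_min": 8, "c_max": 6},
    ("out", "next"): {"d_max": 10, "d_min": 3, "c_max": 4},
}
F_s = {"out"}
d = margin_upper_bound(pairs, F_s)
T_with_margin = period_lower_bound(pairs, F_s, d)
T_without_margin = period_lower_bound(pairs, F_s, 0)
```

In this example the margin can be as large as 8 ns, which lowers the bound on the clock period from 24 ns to 16 ns.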

## B. Effective clock period

In a typical fixed latency circuit, one primitive computation is executed in one clock cycle. So the latency of a primitive computation is equal to the clock period, and the speed performance of a circuit can be evaluated by the clock period. In VLEDC, however, the number of clock cycles required for one primitive computation changes, so the speed performance of a circuit is evaluated by the effective clock period $T_{\text{eff}}$, which is defined as the average latency of primitive computations. When a circuit executes a primitive computation in $\alpha$ cycles and, if a delay error occurs, error correction in $\beta$ cycles, the effective clock period is given by

$$
T_{\text{eff}}(T)=T(\alpha(1-E(T))+(\alpha+\beta) E(T))=T(\alpha+\beta E(T)),
$$

where $E(T)$ is the probability of a delay error with clock period $T$.

Fig. 3. Half-adder and full-adder
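As a worked example, take the entry of Table IV for $\langle\mathrm{abaaaa}\rangle(4)$ at $T=20$ [ns], where $E(T)=5.5\%$. The values of $\alpha$ and $\beta$ are not stated explicitly in the text; assuming $\alpha=\beta=1$ reproduces the tabulated value:

$$
T_{\text{eff}}(20)=20 \times(1+1 \times 0.055)=21.10\ [\mathrm{ns}]
$$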

Generally speaking, in order to reduce the effective clock period of VLEDC, (1) the minimization of the maximum delay, (2) the maximization of the minimum delay, and (3) the reduction of the probability of larger delay are preferred.

## IV. Configuration of adders

In this paper, various 6-bit ripple-carry adders (RCAs), each consisting of one half-adder (HA) and five full-adders (FAs), are evaluated. Fig. 3 shows the two HAs and four FAs used to form the RCAs. $H_{a}$ and $H_{b}$ are HAs. $F_{a}$, $F_{b}$, $F_{c}$, and $F_{d}$ are FAs. They consist of NAND, NOR, NOT, and BUF gates. A BUF consists of an even number of NOT gates in series. BUF gates, shown as black triangles, are inserted to increase delay where necessary; the size of a BUF, which is the number of NOT gates it contains, is denoted by $m$. $H_{x}$ and $F_{x}$ with a BUF of size $m$ are denoted by $H_{x}(m)$ and $F_{x}(m)$, respectively. Note that $H_{x}(0)=H_{x}$ and $F_{x}(0)=F_{x}$. When a BUF is inserted into $F_{a}$ or $F_{c}$, the NOT gate already contained is used as the head NOT gate of the BUF, so the number of NOT gates added by the insertion is reduced by one.

TABLE I
Performances of RCAs using one type of full-adder

| Name | Half-adder | Full-adder | No. of gates | Max. delay [ns] | Min. delay [ns] | Eff. period [ns] | Clock period [ns] | $E(T)$ [%] | $T_{\text{eff}}$ [%] | $P$ [%] | $P T_{\text{eff}}$ [%] |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\langle\mathrm{aaaaaa}\rangle$ (conventional) | $H_{a}(0)$ | $F_{a}(0)$ | 49 | 26 | 4 | 22.58 | 22 | 2.6 | 100.0 | 100.0 | 100.0 |
| $\langle\mathrm{accccc}\rangle$ | $H_{a}(0)$ | $F_{c}(0)$ | 49 | 26 | 4 | 23.10 | 22 | 5.0 | 102.3 | 107.0 | 109.5 |
| $\langle\mathrm{abbbbb}\rangle$ | $H_{a}(0)$ | $F_{b}(0)$ | 54 | 24 | 4 | 20.89 | 20 | 4.4 | 92.5 | 111.6 | 103.2 |
| $\langle\mathrm{addddd}\rangle$ | $H_{a}(0)$ | $F_{d}(0)$ | 54 | 24 | 4 | 21.64 | 20 | 8.2 | 95.8 | 113.2 | 108.4 |
| $\langle\mathrm{baaaaa}\rangle$ | $H_{b}(0)$ | $F_{a}(0)$ | 49 | 26 | 4 | 22.52 | 22 | 2.3 | 99.7 | 98.8 | 98.6 |
| $\langle\mathrm{bccccc}\rangle$ | $H_{b}(0)$ | $F_{c}(0)$ | 49 | 26 | 4 | 23.17 | 22 | 5.3 | 102.6 | 105.9 | 108.6 |
| $\langle\mathrm{bbbbbb}\rangle$ | $H_{b}(0)$ | $F_{b}(0)$ | 54 | 25 | 4 | 21.99 | 21 | 4.7 | 97.4 | 112.7 | 109.7 |
| $\langle\mathrm{bddddd}\rangle$ | $H_{b}(0)$ | $F_{d}(0)$ | 54 | 25 | 4 | 22.20 | 21 | 5.7 | 98.3 | 114.3 | 112.4 |

The basic structural comparison of the HAs is as follows. Output C is generated by a NOT gate in $H_{a}$, whereas it is generated by a NAND gate in $H_{b}$. The numbers of gates of $H_{a}$ and $H_{b}$ are the same. The maximum and minimum numbers of gates from an input to an output are also the same in both HAs.

The basic structural comparison of the FAs is as follows. The number of gates of $F_{a}$ is one smaller than that of $F_{b}$. The maximum number of gates from input A or B to output C of $F_{b}(0)$ is one smaller than that of $F_{a}(0)$. $F_{c}$ and $F_{d}$ are obtained from $F_{a}$ and $F_{b}$, respectively, by replacing NAND gates with NOR gates and vice versa.

## V. Delay and Power Model

In our delay analysis, the following simple model is used. The gate delays of NAND, NOR, and NOT gates are set to 2 [ns], 2 [ns], and 1 [ns], respectively. No other delay is assumed. The maximum and minimum delays from each input to each output of the HAs and FAs are described in Fig. 3. For example, the maximum delays to outputs S and C of $H_{a}$ are 1 [ns] shorter than those of $H_{b}$. The maximum delays from input A or B to output C of $F_{b}(0)$ are 2 [ns] shorter than those of $F_{a}(0)$.

In our power analysis, we assume that each gate consumes the same constant power while it operates, and that the power consumption of a circuit is the sum of the power consumed by its gates. The energy consumption of a circuit to execute a primitive computation is the total power consumed by the circuit from the time the inputs are given until the outputs are generated. A change at an input of a gate influences the output of the gate after the gate delay. A gate consumes power during its gate delay if its output changes in response to the change of the input.

The delay and energy consumption of a circuit to execute a primitive computation vary according to the input vector pair. Their distributions depend on the characteristics of the set of input vector pairs. In this paper, the distributions of delay and energy consumption of a circuit are obtained by assuming that every input vector pair occurs with equal probability. The energy consumed by the error detection/correction circuit is ignored.
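The delay and power model can be illustrated with a small event-driven simulation. This is a sketch under stated assumptions, not the authors' simulator: the NAND-based half-adder netlist below is hypothetical (it is not taken from Fig. 3), each output transition of a gate is charged one unit of power for the duration of the gate delay, and the delay of a computation is taken to be the time of the last transition on a primary output.

```python
from itertools import product

GATE_DELAY = {"NAND": 2, "NOR": 2, "NOT": 1}  # [ns], from the delay model
GATE_FUNC = {
    "NAND": lambda v: 1 - (v[0] & v[1]),
    "NOR": lambda v: 1 - (v[0] | v[1]),
    "NOT": lambda v: 1 - v[0],
}

# Hypothetical NAND-based half adder, listed in topological order as
# (output net, gate type, input nets).
HALF_ADDER = [
    ("n1", "NAND", ("A", "B")),
    ("n2", "NAND", ("A", "n1")),
    ("n3", "NAND", ("B", "n1")),
    ("S", "NAND", ("n2", "n3")),
    ("C", "NOT", ("n1",)),
]


def settle(netlist, inputs):
    """Steady-state value of every net for the given primary inputs."""
    val = dict(inputs)
    for out, kind, ins in netlist:
        val[out] = GATE_FUNC[kind]([val[n] for n in ins])
    return val


def simulate(netlist, prev, new, outputs=("S", "C")):
    """Return (delay, energy) when the inputs switch from prev to new at t = 0."""
    kind_of = {out: kind for out, kind, _ in netlist}
    val = settle(netlist, prev)
    pending = {0: dict(new)}  # time -> {net: value scheduled at that time}
    delay = energy = 0
    while pending:
        t = min(pending)
        changed = set()
        for net, v in pending.pop(t).items():
            if val[net] != v:
                val[net] = v
                changed.add(net)
                if net in kind_of:  # a gate output switched
                    energy += GATE_DELAY[kind_of[net]]
                if net in outputs:
                    delay = t
        # Re-evaluate every gate that saw an input change at time t.
        for out, kind, ins in netlist:
            if changed.intersection(ins):
                new_val = GATE_FUNC[kind]([val[n] for n in ins])
                pending.setdefault(t + GATE_DELAY[kind], {})[out] = new_val
    return delay, energy


# All input vector pairs, each assumed equally likely.
VECTORS = [dict(A=a, B=b) for a, b in product((0, 1), repeat=2)]
results = [simulate(HALF_ADDER, p, n) for p in VECTORS for n in VECTORS]
max_delay = max(d for d, _ in results)
avg_energy = sum(e for _, e in results) / len(results)
```

Enumerating every pair of input vectors, as in the last lines, yields the delay and energy distributions for this toy circuit; the same loop applies to a larger netlist at exponentially higher cost.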

## VI. Simulation

In our evaluation, each RCA is described in Verilog and simulated using VCS. Each RCA is evaluated by the minimum effective clock period $T_{\text{eff}}$, the average energy consumption $P$, and their product $P T_{\text{eff}}$.

Each configuration of 6-bit RCA shown in this paper is represented by the sequence of the types of its HA and FAs and the amount of delay insertion. For example, $\langle\mathrm{abaaaa}\rangle(2)$ consists of $H_{a}$ (bit-1), $F_{b}$ (bit-2), and $F_{a}$ (bit-3 to bit-6), and the size of each BUF is determined so that the minimum delay is increased by 2 [ns] without increasing the maximum delay.

The distributions of delay and energy consumption of a circuit are obtained by simulating all input vector pairs. The probability of a delay error $E(T)$ is defined as the probability that the delay of the circuit exceeds the clock period $T$. The minimum effective clock period $T_{\text{eff}}$ is determined by evaluating every feasible clock period $T$.
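The search for the minimum effective clock period can be sketched as follows. Several points here are assumptions rather than statements from the text: $\alpha=\beta=1$, a 1 [ns] search step, a smallest feasible period of (maximum delay minus minimum delay) obtained by setting the confirmation margin to the minimum delay and ignoring the correction-mode constraint, and a made-up delay sample.

```python
def error_probability(delays, T):
    """E(T): fraction of computations whose delay exceeds the clock period T."""
    return sum(1 for x in delays if x > T) / len(delays)


def min_effective_period(delays, alpha=1, beta=1, step=1):
    """Scan the feasible clock periods and return the best (T_eff, T, E(T))."""
    d_min, d_max = min(delays), max(delays)
    best = None
    T = d_max - d_min  # assumed smallest feasible period (margin d = d_min)
    while T <= d_max:
        E = error_probability(delays, T)
        t_eff = T * (alpha + beta * E)
        if best is None or t_eff < best[0]:
            best = (t_eff, T, E)
        T += step
    return best


# Made-up sample of 100 delays in ns; not simulation data from this paper.
delays = [8] * 50 + [12] * 30 + [16] * 15 + [24] * 5
best = min_effective_period(delays)
```

For this sample the best period is 16 [ns]: only the five 24 [ns] computations cause delay errors, so the effective period is about 16.8 [ns] instead of the 24 [ns] a fixed latency design would need.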

Tables I-III show the simulation results. They list the number of gates, the maximum and minimum delays, the minimum effective clock period, and the clock period and the probability of a delay error at which the minimum effective clock period is achieved. They also list the evaluation indices normalized so that $\langle\mathrm{aaaaaa}\rangle$ is 100%. The minimum values of $T_{\text{eff}}$, $P$, and $P T_{\text{eff}}$ over all configurations are shown in bold.

## A. Difference of Full-Adders

Table I shows the effects of the difference between FAs. In this evaluation, each RCA uses one type of full-adder. Even though the static evaluations of $F_{a}$ and $F_{b}$ are the same as those of $F_{c}$ and $F_{d}$, respectively, the minimum effective clock periods differ. The maximum delay and the minimum effective clock period of $\langle\mathrm{abbbbb}\rangle$ are the smallest among them, whereas the average energy consumption of $\langle\mathrm{baaaaa}\rangle$ is the smallest among them. The average energy consumption of $\langle\mathrm{baaaaa}\rangle$ is small not only because the number of gates is small but also because the number of switchings is small, since the inputs of a gate in the FA at bit-2 often arrive simultaneously.

TABLE II
Performances of RCAs using two types of full-adder

| Name | bit-1 | bit-2 | bit-3 | bit-4 | bit-5 | bit-6 | No. of gates | Max. delay [ns] | Min. delay [ns] | Eff. period [ns] | Clock period [ns] | $E(T)$ [%] | $T_{\text{eff}}$ [%] | $P$ [%] | $P T_{\text{eff}}$ [%] |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $\langle\mathrm{aaaaaa}\rangle$ | $H_{a}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | 49 | 26 | 4 | 22.58 | 22 | 2.6 | 100.0 | 100.0 | 100.0 |
| $\langle\mathrm{abaaaa}\rangle$ | $H_{a}(0)$ | $F_{b}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | 50 | 24 | 4 | 21.09 | 20 | 5.5 | 93.4 | 101.4 | 94.7 |
| $\langle\mathrm{ababaa}\rangle$ | $H_{a}(0)$ | $F_{b}(0)$ | $F_{a}(0)$ | $F_{b}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | 51 | 24 | 4 | 21.09 | 20 | 5.5 | 93.4 | 105.0 | 98.1 |
| $\langle\mathrm{abbaaa}\rangle$ | $H_{a}(0)$ | $F_{b}(0)$ | $F_{b}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | 51 | 24 | 4 | 20.89 | 20 | 4.4 | 92.5 | 104.2 | 96.4 |
| $\langle\mathrm{abbaba}\rangle$ | $H_{a}(0)$ | $F_{b}(0)$ | $F_{b}(0)$ | $F_{a}(0)$ | $F_{b}(0)$ | $F_{a}(0)$ | 52 | 24 | 4 | 20.89 | 20 | 4.4 | 92.5 | 107.6 | 99.5 |
| $\langle\mathrm{abbbaa}\rangle$ | $H_{a}(0)$ | $F_{b}(0)$ | $F_{b}(0)$ | $F_{b}(0)$ | $F_{a}(0)$ | $F_{a}(0)$ | 52 | 24 | 4 | 20.89 | 20 | 4.4 | 92.5 | 106.7 | 98.7 |
| $\langle\mathrm{abbbbb}\rangle$ | $H_{a}(0)$ | $F_{b}(0)$ | $F_{b}(0)$ | $F_{b}(0)$ | $F_{b}(0)$ | $F_{b}(0)$ | 54 | 24 | 4 | 20.89 | 20 | 4.4 | 92.5 | 111.6 | 103.2 |

TABLE III
Performances of RCAs with delay insertion

## B. Combination of Full-Adders

Table II shows the effects of using two types of full-adders. In this evaluation, each RCA except $\langle\mathrm{aaaaaa}\rangle$ consists of $H_{a}$, $F_{a}$, and $F_{b}$. In these RCAs, the maximum delay is 24 [ns] when $F_{b}$ is used at bit-2. The minimum effective clock period is the smallest among them when $F_{b}$ is used at both bit-2 and bit-3. The average energy consumption tends to be large when the number of gates is large. Even though the numbers of gates of $\langle\mathrm{abbaaa}\rangle$ and $\langle\mathrm{abbbaa}\rangle$ are the same as those of $\langle\mathrm{ababaa}\rangle$ and $\langle\mathrm{abbaba}\rangle$, respectively, the average energy consumptions of the former are smaller. The average energy consumption is small when $F_{b}$ is used in series, since the probability of switching of the gate that outputs the carry signal of $F_{b}$ is small.

## C. Insertion of Delay

Table III shows the effects of delay insertion. In this evaluation, delays are inserted minimally by changing the sizes of BUFs so that the minimum delay is increased without changing the maximum delay. The minimum effective clock period of $\langle\mathrm{abbbaa}\rangle(4)$ is the smallest among them and is 19.3% shorter than that of $\langle\mathrm{aaaaaa}\rangle$. This is mainly because the maximum delay is small and a shorter clock period is feasible since the minimum delay is larger. In addition, the probability of a delay error of $\langle\mathrm{abbbaa}\rangle(4)$ is smaller than that of the other configurations whose maximum and minimum delays are the same as those of $\langle\mathrm{abbbaa}\rangle(4)$.

The minimum delay can be increased up to 8 [ns] without increasing the maximum delay by changing the sizes of BUFs. It can be increased further without increasing the maximum delay by inserting delays at arbitrary places. However, even if the clock period is set shorter than 16 [ns] by inserting more delays, the effective clock period is not reduced, since the probability of a delay error increases considerably.

The average energy consumptions of these configurations are larger than that of $\langle\mathrm{aaaaaa}\rangle$. The $P T_{\text{eff}}$ product of $\langle\mathrm{abaaaa}\rangle(2)$ is the smallest over all configurations shown in this paper. Even though the average energy consumption of $\langle\mathrm{baaaaa}\rangle$ is the smallest over all configurations shown in this paper, the average energy consumption of $\langle\mathrm{bbbaaa}\rangle(4)$ is larger than that of $\langle\mathrm{abbaaa}\rangle(4)$. That is, replacing $H_{a}$ with $H_{b}$ does not necessarily reduce the energy consumption.

## D. Distribution of delay

Fig. 4 shows the distributions of delay of 6-bit RCAs. Their maximum and minimum delays are 24 [ns] and 8 [ns], respectively. The distribution of delay can change drastically even when the maximum and minimum delays are the same. Table IV shows their probabilities of delay errors when the clock period is 16 [ns], 18 [ns], and 20 [ns]. The distribution of delay should be taken into account to minimize the effective clock period.

Fig. 4. Distribution of delay: (a) $\langle\mathrm{abaaaa}\rangle(4)$, (b) $\langle\mathrm{abbaaa}\rangle(4)$, (c) $\langle\mathrm{abbbaa}\rangle(4)$, (d) $\langle\mathrm{addaaa}\rangle(4)$

TABLE IV
Probability of error and effective clock period

| Name | $E(T)$ [%], $T=16$ | $T_{\text{eff}}$ [ns], $T=16$ | $E(T)$ [%], $T=18$ | $T_{\text{eff}}$ [ns], $T=18$ | $E(T)$ [%], $T=20$ | $T_{\text{eff}}$ [ns], $T=20$ |
| :---: | ---: | ---: | ---: | ---: | ---: | ---: |
| $\langle\mathrm{abaaaa}\rangle(4)$ | 16.6 | 18.66 | 10.1 | 19.82 | 5.5 | 21.10 |
| $\langle\mathrm{abbaaa}\rangle(4)$ | 16.6 | 18.66 | 11.0 | 19.98 | 4.4 | 20.88 |
| $\langle\mathrm{abbbaa}\rangle(4)$ | 13.9 | 18.22 | 11.0 | 19.98 | 4.4 | 20.88 |
| $\langle\mathrm{bbbbaa}\rangle(4)$ | 19.1 | 19.06 | 14.9 | 20.68 | 5.4 | 21.08 |
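The effective clock periods in Table IV can be recomputed from the listed error probabilities. The sketch below assumes $\alpha=\beta=1$, which is not stated in the text but is consistent with the tabulated values; the dictionary layout and the names `TABLE_IV`, `t_eff`, and `best` are illustrative. The reference value 22.58 [ns] is the effective period of $\langle\mathrm{aaaaaa}\rangle$ from Table I.

```python
# E(T) in percent for T = 16, 18, 20 [ns], copied from Table IV.
TABLE_IV = {
    "abaaaa(4)": {16: 16.6, 18: 10.1, 20: 5.5},
    "abbaaa(4)": {16: 16.6, 18: 11.0, 20: 4.4},
    "abbbaa(4)": {16: 13.9, 18: 11.0, 20: 4.4},
    "bbbbaa(4)": {16: 19.1, 18: 14.9, 20: 5.4},
}


def t_eff(T, e_percent, alpha=1, beta=1):
    """Effective clock period T * (alpha + beta * E(T)), with E(T) in percent."""
    return T * (alpha + beta * e_percent / 100)


# Best (T_eff, T) among the three listed clock periods for each configuration.
best = {
    name: min((t_eff(T, e), T) for T, e in row.items())
    for name, row in TABLE_IV.items()
}

# Reduction of the best configuration relative to the conventional adder.
reduction = 1 - best["abbbaa(4)"][0] / 22.58
```

Every configuration attains its smallest listed effective period at $T=16$ [ns], and the reduction computed for $\langle\mathrm{abbbaa}\rangle(4)$ relative to $\langle\mathrm{aaaaaa}\rangle$ agrees with the 19.3% reported in Section VI-C.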

## VII. Summary and future works

In this paper, we evaluated the performance of various 6-bit RCAs, supposing that they operate in VLEDC. The improvement of the effective clock period obtained by introducing VLEDC can be made larger by making the minimum delay larger without increasing the maximum delay. The distribution of delay can be changed by modifying the circuit so that the probability of a delay error is reduced. The performance of a circuit in VLEDC improves considerably if the probability of a large delay is reduced by circuit modification.

As future work, a method for efficiently evaluating the delay and energy consumption of a circuit and a method for improving the delay distribution of practical circuits need to be investigated.

## Acknowledgements

This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Synopsys, Inc., and by Grant-in-Aid for Scientific Research (B) 21300012.

## References

[1] G. Wolrich, E. McLellan, L. Harada, J. Montanaro, and R. Yodlowski, "A High Performance Floating-Point Coprocessor," IEEE Journal of Solid-State Circuits, vol. SC-19, pp. 690-696, Oct. 1984.
[2] D. Ernst, N. S. Kim, S. Das, S. Pant, T. Pham, R. Rao, C. Ziesler, D. Blaauw, T. Austin, T. Mudge, and K. Flautner, "Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation," Proc. 36th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 7-18, 2003.
[3] Y. Ukon, A. Takahashi, and K. Taniguchi, "An evaluation of delay error rate of an adder in terms of clock period," IEICE Technical Report (ICD2009-91), Vol. 109, No. 336, pp. 77-81, 2009. (in Japanese)
[4] M. Kurimoto, H. Suzuki, R. Akiyama, T. Yamanaka, H. Ohkuma, H. Takata, and H. Shinohara, "Phase-Adjustable Error Detection Flip-flops with 2-Stage Hold-Driven Optimization, and Slack Based Grouping Scheme and Slack Distribution Control for Dynamic Voltage Scaling," ACM Trans. DAES, Vol. 15, No. 2, Article 17, 2010.
[5] D. Bull, S. Das, K. Shivashankar, G. Dasika, K. Flautner, and D. Blaauw, "A Power-Efficient 32b ARM ISA Processor Using Timing-Error Detection and Correction for Transient-Error Tolerance and Adaptation to PVT Variation," Proc. IEEE International Solid-State Circuits Conference (ISSCC), pp. 284-285, 2010.
[6] M. Inoue, Y. Ukon, and A. Takahashi, "An evaluation of error detection/correction circuits by gate level simulation," IEICE Technical Report (VLD2010-141), Vol. 110, No. 432, pp. 147-152, 2011. (in Japanese)
[7] Y. Ukon, M. Inoue, A. Takahashi, and K. Taniguchi, "Behavioral Verification of a Variable Latency Circuit on FPGA," IEICE Technical Report (VLD2010-142), Vol. 110, No. 432, pp. 153-158, 2011. (in Japanese)
[8] A. Takahashi and Y. Kajitani, "Performance and Reliability Driven Clock Scheduling of Sequential Logic Circuits," Proc. of Asia and South Pacific Design Automation Conference '97, pp. 37-42, 1997.
[9] H. Fujioka and K. Nakamae, Calculator system -The base of hardware-, Shokodo, 2007. (in Japanese)

