# A Tuning Method of Programmable Delay Element with Two Values for Yield Improvement 

Hayato Mashiko Yukihide Kohira<br>School of Computer Science, the University of Aizu<br>Aizu-Wakamatsu City, Fukushima, 965-8580 Japan<br>\{m5161143,kohira\}@u-aizu.ac.jp


#### Abstract

Due to progressing the process technology in LSI and increasing variations of wire and gate delays after fabrication, timing violations cause significant reduction in the yield of LSI chips. To recover the timing violations, programmable delay elements called PDEs are inserted into the clock tree before fabrication and their delays are tuned after fabrication. In this paper, we use PDEs with two delay values and propose a delay tuning method of the PDEs to improve the yield and to reduce the number of tests. Moreover, we evaluate the circuits obtained by the proposed method on commercial CAD tools. Experimental results show that the proposed method is effective.


## I. Introduction

Due to progressing the process technology in LSI, the performance of LSI circuits has been advanced. Progressing the process technology increases the process variation. The process variation in fabrication causes the delay variation and the delay variation causes the timing violation. The yield is reduced significantly by the timing violation, and the cost of LSI chips has increased. In general, the addition of margins and the statistical static delay analysis have been applied to satisfy the timing constraints. However, since these methods need to estimate the delay variations before fabrication, they are not essential solutions in the situation in which the uncertainty is increasing in fabrication process.
Recently, deskew methods have been proposed to recover the timing violations. This method inserts the elements called Programmable Delay Elements (PDEs), whose delays can be tuned after fabrication, into the clock tree before fabrication. When some timing violations are found in test process, delays of PDEs are tuned to recover timing violations.

In [1], under assumptions that the delay of each PDE can be set to an arbitrary integer and delays between all register pairs are given, the delay tuning method formulated to integer linear programming (ILP) has been proposed. Since the formulation of ILP is totally unimodular, it can be solved in polynomial time. Therefore, the op-
timality of the delay timing method is guaranteed and it can be expected to improve the yield highly in short computational time. However, the structure of the PDE with arbitrary integer values is not considered in [1]. Furthermore, since measuring the exact delay for each register pair is required to apply this delay tuning method, the cost for this delay tuning method is high.

In [2], given PDEs with discrete values, a delay tuning method formulated to ILP and a heuristic method have been proposed. The optimality of the delay tuning method formulated to ILP is guaranteed and can be expected to improve the yield highly. However, the delay tuning method is unsuitable to large scale circuits because ILP typically takes long computation time. On the other hand, although the heuristic delay tuning method can be solved in short computation time, it cannot be expected to improve the yield enough since it does not guarantee the optimality. Moreover, when the PDEs with discrete values are constructed by only basic logic gates, the circuit size with the PDEs is larger and the power consumption in the circuit must be increased.

When PDEs with two delay values are constructed by only basic logic gates, the circuit size with the PDEs can be smaller. The delay tuning method of the PDEs can be obtained by applying 2-clustering method [3] at the given clock cycle time in polynomial time. However, the number of tests is considerable in this delay tuning method because whether the timing constraints are violated for any register pairs with signal propagations at all PDE delay tuning patterns must be checked by the path-delaytest [4].

In this paper, we use the PDEs with two delay values and propose a delay tuning method of the PDE to improve the yield and to reduce the number of tests. Moreover, we evaluate the circuits obtained by the proposed method on commercial CAD tools. Experimental results show that the proposed method is effective under Monte Carlo simulation which uses the delay information obtained by commercial CAD tools.


Fig. 1. Timing constraints in clock synchronous circuit.

## II. Preliminaries

## A. Timing Constraint

To work clock synchronous circuits correctly, timing constraints between register pairs with signal propagation must be satisfied (Fig. 1) [5]. Suppose that signal is propagated from register $a$ to register $b$. The timing constraints consist of hold constraint and setup constraint as follows:

## Hold Constraint:

$$
S(b)-S(a) \leq D_{\min }(a, b)
$$

## Setup Constraint:

$$
S(a)-S(b) \leq T-D_{\max }(a, b)
$$

where $T$ is the clock period, $D_{\max }(a, b)\left(D_{\min }(a, b)\right)$ is the maximum (minimum) delay from register $a$ to register $b$, and $S(a)(S(b))$ is the clock timing of register $a(b)$. Especially, violating the hold (setup) constraint is described as hold (setup) violation and violating the hold constraint or the setup constraint is described as timing violation.

Wire and gate delays are varied after fabrication (Fig. 2). Thus, delays between register pairs and clock timings are varied and the timing constraints in the clock synchronous circuit with delay variations are transformed as follows:

$$
\begin{align*}
(S(b) & \left.+\Delta_{S(b)}\right)-\left(S(a)+\Delta_{S(a)}\right) \\
& \leq D_{\min }(a, b)+\Delta_{D_{\min }}  \tag{1}\\
(S(a) & \left.+\Delta_{S(a)}\right)-\left(S(b)+\Delta_{S(b)}\right) \\
& \leq T-\left(D_{\max }(a, b)+\Delta_{D_{\max }}\right) \tag{2}
\end{align*}
$$

where $\Delta_{\max }\left(\Delta_{\min }\right)$ is the delay variation of maximum (minimum) delay from register $a$ to register $b$, and $\Delta_{S(a)}$ $\left(\Delta_{S(b)}\right)$ is the delay variation of clock timing $S(a)(S(b))$. By converting eq. (1) and eq. (2) to eq. (3) and eq. (4), all delay variations are regarded as the delays between


Fig. 2. Timing constraints with delay variations in clock synchronous circuit.
register pairs.

$$
\begin{align*}
S(b)-S(a) & \leq D_{\min }^{\prime}(a, b)  \tag{3}\\
S(a)-S(b) & \leq T-D_{\max }^{\prime}(a, b) \tag{4}
\end{align*}
$$

where

$$
\begin{aligned}
D_{\min }^{\prime}(a, b) & =D_{\min }-\left(\Delta_{S(a)}+\Delta_{S(b)}-\Delta_{D_{\min }}\right) \\
D_{\max }^{\prime}(a, b) & =D_{\max }-\left(\Delta_{S(a)}-\Delta_{S(b)}+\Delta_{D_{\max }}\right)
\end{aligned}
$$

Hereinafter, we discuss that all delays are varied between register pairs.

## B. Delay Tuning of PDEs

In the delay tuning methods for PDEs $[1,2,6,7]$, the delays of PDEs inserted into the clock tree before fabrication are tuned to recover the timing violations caused by the delay variations after fabrication. The timing constraints in the clock synchronous circuit with delay variations considering delays of PDEs are as follows:

$$
\begin{align*}
S^{\prime}(b)-S^{\prime}(a) & \leq D_{\min }^{\prime}(a, b)  \tag{5}\\
S^{\prime}(a)-S^{\prime}(b) & \leq T-D_{\max }^{\prime}(a, b) \tag{6}
\end{align*}
$$

where $S^{\prime}(a)=S(a)+d_{a}, S^{\prime}(b)=S(b)+d_{b}$ and $d_{a}\left(d_{b}\right)$ is the delay of PDE inserted for register $a(b)$.

If the hold constraint is violated, the left side in eq. (5) must be decreased to recover it. Therefore, the delay of PDE inserted for register $a$ is increased or the delay of PDE inserted for register $b$ is decreased. Similarly, if the setup constraint is violated, the left side in eq. (6) must be decreased to recover it. Therefore, the delay of PDE inserted for register $a$ is decreased or the delay of PDE inserted for register $b$ is increased.

## C. 2-clustering Method [3]

In clock skew scheduling [3], the set of registers with same clock timing is called cluster and partitioning registers to clusters is called clustering.

TABLE I
Definition of maxterm.

|  |  | Timing constraint of (a,b) |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
| $x_{a}$ | $x_{b}$ | $S(a)$ | $S(b)$ | Satisfaction | Violation |
| 0 | 0 | 0 | 0 | $c_{1}^{(a, b)}=1$ | $c_{1}^{(a, b)}=x_{a} \vee x_{b}$ |
| 0 | 1 | 0 | $d$ | $c_{2}^{(a, b)}=1$ | $c_{2}^{(a, b)}=x_{a} \vee \overline{x_{b}}$ |
| 1 | 0 | $d$ | 0 | $c_{3}^{(a, b)}=1$ | $c_{3}^{(a, b)}=\overline{x_{a}} \vee x_{b}$ |
| 1 | 1 | $d$ | $d$ | $c_{4}^{(a, b)}=1$ | $c_{4}^{(a, b)}=\overline{x_{a}} \vee \overline{x_{b}}$ |

The 2-clustering problem in which the number of cluster is equal to or less than two is defined as follows:

## 2-clustering Problem

Inputs: Delays between register pairs, clock period $T$ and clock skew $d(d \geq 0)$

Output: The clock timing for each register
Constraint: All timing constraints are satisfied at the given clock period $T$.

In 2-clustering method, the 2 -clustering problem is converted to 2 -SAT problem and it is solved by a 2-SAT solver. 2-SAT problem assigns logic variable $x_{i}(1 \leq \forall j \leq$ $n$ ) in the set of logic variables $X=\left\{x_{1}, x_{2}, \ldots, x_{n}\right\}$ to 0 or 1 so that a logic function $C(X)=c_{1}(X) \wedge c_{2}(X) \wedge \cdots \wedge$ $c_{m}(X)$ equals to 1 . Since 2-SAT problem can be solved in polynomial time [8], the 2-clustering problem can also be solved in polynomial time.
The logic variable $x_{a}$ in 2-SAT problem corresponds to the clock timing $S(a)$ of register $a$ in the 2-clustering problem and $S(a)=0(S(a)=d)$ if and only if $x_{a}=0\left(x_{a}=\right.$ 1). For each register pair with signal propagation in the 2-clustering problem, four maxterms $c_{k}^{(a, b)}$ are defined by whether or not the timing constraint is satisfied at the clock timings as Table I. The logic function is defined by $C(X)=\wedge\left(c_{1}^{(a, b)} \wedge c_{2}^{(a, b)} \wedge c_{3}^{(a, b)} \wedge c_{4}^{(a, b)}\right)$.

## III. Proposed Method

## A. PDE Structure

In this paper, PDEs with two delay values 0 and $d$ are constructed by only basic logic gates. The PDE structure is shown in Fig. 3. A certain delay $d$ is realized by buffer chains connecting buffers in series and the control to select whether or not the delay $d$ is added is realized by a multiplexer.

## B. Design Flow for Circuit with PDEs

The design flow for a circuit with PDEs is shown in Fig. 4. At first, logic synthesis is applied to the circuit. Next, in the circuit in gate-level representation, a PDE in gate-level representation is inserted to each path from


Fig. 3. PDE structure.


Fig. 4. Design flow for a circuit with PDEs.
the clock source to each register. Furthermore, placement, global routing and detailed routing are applied to the circuit with PDEs. In this paper, the yields of the circuits with PDEs are estimated by Monte Carlo simulation which uses the delay information obtained by commercial CAD tools.

## C. Proposed PDE Delay Tuning Method

To apply the 2 -clustering method directly, it is required to check whether or not there are timing violations by the path-delay-test at all PDE delay tuning patterns for any register pairs with signal propagations. The number of path-delay-tests for this method is $4 P+2 S$, where $P(S)$ is the number of signal propagations between different registers (a register).
The flow chart of the proposed PDE delay tuning method is shown in Fig. 5. In the proposed method, the 2 -clustering method is firstly applied by using the timing constraint information assumed that all timing constraints are satisfied at all PDE delay tuning patterns such as Table II. Each PDE is tuned to 0 or $d$ by the 2 -clustering method. After applying the 2 -clustering

TABLE II
Initialized timing constraint information. OK? means that the timing constraint is assumed to be satisfied.

| $S^{\prime}(a)$ | $S^{\prime}(b)$ | Timing constraint of $(a, b)$ |
| :---: | :---: | :---: |
| 0 | 0 | OK? |
| 0 | d | OK? |
| d | 0 | OK? |
| d | d | OK? |

method, the timing constraints are checked by the path-delay-test for the register pairs assumed that the timing constraint is satisfied. In this paper, we define the number of the path-delay-tests as the number of PDE delay tuning patterns for register pairs assumed the satisfaction of timing constraint and applied the path-delay-test. If the all timing constraints are satisfied, the proposed method is finished as a success. Otherwise, the timing constraint information is updated. Next, the following failure conditions of PDE delay tuning are confirmed.

Failure conditions of the PDE delay tuning

- A timing constraint for signal propagation between a register is violated.
- Timing constraints at all PDE delay tuning patterns for a register pair are violated.
- Timing constraint information for no register pairs is updated.

Even if only one of the failure conditions is satisfied about the timing constraint information, it is guaranteed that the circuit cannot be recovered by PDE delay tuning. Therefore, the proposed method is finished as a failure. Otherwise, the 2 -clustering method is applied by using the updated timing constraint information. The proposed method repeats the timing constraint information update and 2-clustering method until the success or failure is determined.

An example of the 2 -clustering method to the circuit with PDEs and two register pairs $(a, b)$ and $(b, c)$ is shown in Fig. 6. At first, the timing constraint information for the register pairs $(a, b)$ and $(b, c)$ is assumed that all timing constraints are satisfied, and the 2-clustering method is applied by using the timing constraint information. After applying the 2 -clustering method and tuning delays of PDEs, it is found that the timing constraint of the register pair $(a, b)$ is satisfied but that of $(b, c)$ is not. These results are updated to the timing constraint information and the 2 -clustering method is applied again. After that, when the timing violation is recovered for the register pair $(b, c)$, tuning PDE delays are succeeded.

## D. Comparative PDE Delay Tuning Method

If the delays of the PDEs can be set to 0 or $d$, the PDE delay tuning is uniquely determined to recover tim-


Fig. 5. Flow chart of the proposed PDE delay tuning method.


Fig. 6. An example of the proposed PDE delay tuning method.
ing violations in the circuit with the PDEs according to the violated constraint. Suppose that signal is propagated from register $a$ to register $b$. For the hold violation, the delay of PDE inserted to register $a$ is tuned to $d$ and the delay of PDE inserted to register $b$ is tuned to 0 so that the signal propagation $S(a)+D_{\text {min }}^{\prime}(a, b)$ may be later than the rise of clock in the current period $S(b)$. Similarly, for the setup violation, the delay of PDE inserted to register $a$ is tuned to 0 and the delay of PDE inserted to register $b$ is tuned to $d$ so that the signal propagation $S(a)+D_{\max }^{\prime}(a, b)$ may be faster than the rise of clock in the next period $S(b)+T$. To compare with the proposed PDE delay tuning method in the experiments, the comparative PDE delay tuning method tunes the delays for each register pair with timing violation at once (Fig. 7).

## IV. Experiments

To confirm the effects of the proposed method, we apply the proposed method and the comparative method to 26 circuits in ISCAS89 benchmarks in which the time for yield estimation is not long. We compare the yields of the circuits applying between the proposed method and the comparative method. In this paper, we focus on the

Step1: Detect all timing violations and these violated constraints.

Step2: For any timing violations of signal propagation from register $a$ to register $b$, apply the following procedure.

Step2-1: If the hold violation is found, the delay of PDE inserted to register $a$ is tuned to $d$.

Step2-2: If the setup violation is found, the delay of PDE inserted to register $b$ is tuned to $d$.

Step3: Apply path-delay-tests to the register pairs including the registers tuned to $d$. If no timing violations exist, PDE delay tuning is succeed. Otherwise, PDE delay tuning is failed.

Fig. 7. Comparative PDE delay tuning method.
timing violation. The yield is defined by the ratio of the number of circuits without timing violation to the total number of circuits. The yields are estimated by applying the Monte Calro simulation to the circuits at 1000 times. In Monte Carlo simulation, the delay of each wire or each gate is varied in Gaussian distribution modeled by its minimum and maximum delay in Rohm $0.18 \mu m$ library.

To implement the proposed method, the PDEs increasing the number of buffer chains one by one are prepared and are inserted to the circuits according to the design flow for each circuit. Furthermore, the clock period is increased by 0.01 ns and the yields are estimated by each method at the clock period. Cadence Encounter is used for placement and routing in the design flow.

Fig. 8 and Fig. 9 are experimental results for the two circuits s298 and s382, respectively. The relations between clock period and yield by no PDE delay tuning, the comparative method, and the proposed method are reprensented by no\&5chains (no\&6chains), comp\&5chains (comp\&6chains) and prop\&5chains (prop\&6chains), respectively. In Fig. 8, there is no distinct difference between comp\&5chains and prop\&5chains because many timing violations in s298 are recovered by the comparative method. In Fig. 9, there is the distinct difference between comp\&6chains and prop\&6chains because many timing violations in s382 are recovered by the proposed method but not the comparative method.

The experimental results for the 26 circuits are shown in Table III. The minimum clock periods where the yields are over $50 \%$ in the proposed method are improved by $13 \%$ compared to those in no PDE delay tuning on average. Additionally, the yields in the proposed method are improved compared to those in no PDE delay tuning and the comparative method. Furthermore, the numbers of path-delay-tests for the proposed method are improved


Fig. 8. Relations between clock period and yield in s298.


Fig. 9. Relations between clock period and yield in s382.
by $45.6 \%$ compared to those for the 2 -clustering method on average.

## V. Summary and Conclusions

In this paper, we use PDEs with two delay values and propose a delay tuning method of the PDEs to improve the yield and to reduce the number of tests. Experimental results show that the yields are highly improved with the low path-delay-test cost by using the PDEs with two delay values and applying the proposed PDE delay tuning method.

In the future works, how to decide the number of buffers in the PDE with high yield improvement and decrease the number of the PDEs to keep the yield will be considered.

TABLE III
Experimental Results.


## Acknowledgements

This work is supported by VLSI Design and Education Center(VDEC), the University of Tokyo in collaboration with Cadence Design Systems, Inc, Rohm Corporation and Toppan Printing Corporation.

## References

[1] Y. Hashizume, Y. Takashima, and Y. Nakamura, "Postsilicon clock-timing tuning based on statistical estimation," IEICE Trans. Fundamentals, vol.E91-A, no.9, pp.2322-2327, 2008.
[2] M. Kaneko and L. Jian, "Post-silicon skew tuning algorithm utilizing setup and hold timing tests," Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), pp.125-128, 2012.
[3] Y. Kohira and A. Takahashi, "An optimum 2-clustering method in general-synchronous framework," Proceedings
of the 25th Workshop on Circuits and Systems, pp.178183, 2012 (Japanese).
[4] A. Krstic and K.-T. Cheng, Delay Fault Testing for VLSI Circuits, Kluwer Academic Publishers, 1998.
[5] J.P. Fishburn, "Clock skew optimization," IEEE Tranc. on Computers, vol.39, no.7, pp.945-951, 1990.
[6] S. Tam, U. Desai, and R. Lymaye, "Clock generation and distribution for the third generation itanium processor," Proc. of Symp. on VLSI Circuits, pp.9-12, 2003.
[7] K. Nagaraj and S. Kundu, "An automatic post-silicon clock tuning system for improving system performance based on tester measurements," IEEE International Test Conference, pp.1-8, 2008.
[8] B. Aspvall, M.F. Plass, and R.E. Tarjan, "A linear-time algorithm for testing the truth of certain quantified boolean formulas," Information Processing Letters, vol.8, no.3, pp.121-123, 1979.

