# Technology Mapping Method for Low Power Consumption and High Performance in General-Synchronous Framework

Junki Kawaguchi Yukihide Kohira School of Computer Science, the University of Aizu Aizu-Wakamatsu City, Fukushima, 965-8580 Japan {m5171113,kohira}@u-aizu.ac.jp

Abstract— In general-synchronous framework, in which the clock is distributed periodically to each register but not necessarily simultaneously, circuit performance is expected to be improved compared to complete-synchronous framework, in which the clock is distributed periodically and simultaneously to each register. To improve the circuit performance more, logic circuit synthesis for general-synchronous framework is required. In this paper, under the assumption that any clock schedule is realized by an ideal clock distribution circuit, when two or more cell libraries are available, a technology mapping method which assigns a cell to each gate in the given logic circuit by using integer linear programming is proposed. In experiments, we show the effectiveness of the proposed technology mapping method.

# I. INTRODUCTION

In the conventional integrated circuit design, the complete-synchronous framework (c-frame), in which a clock is distributed simultaneously to each individual register, has been pursued for easy understanding and easy designing. However, the performance improvement on the premise approaches a limit, and the cost for the premise is very high. On the other hand, in general-synchronous framework (g-frame) [1–3] in which a clock is not assumed to be distributed to all registers simultaneously, since delay in g-frame may be used more efficiently than that in c-frame, it is expected to improve the performance such as the clock period, clock area, power consumption, and reliability with low costs. Previous works for g-frame are summarized in [4].

In many studies of g-frame, clock scheduling algorithms [1-3, 5-8] and clock distribution circuit synthesis algorithms [9], [10] for given logic circuits have been investigated. However, the given logic circuits are synthesized in c-frame. Since the optimum logic circuit in g-frame is different from that in c-frame, the performance might not be improved enough in g-frame. In order to improve the clock period in c-frame, the circuits are synthesized so that the maximum delay between registers is as small as possible. However, in g-frame, the clock period might not be reduced even if the maximum delay is reduced since the clock period is obtained by constraint graph which is defined by maximum and minimum delays between registers



Fig. 1. Circuit graph G.

[3]. Therefore, the logic circuit with low costs can be obtained by synthesizing logic circuit for g-frame. The delay insertion [11–14], gate sizing [15] and register relocation [16,17] have been proposed. Although the clock periods are improved by the existing methods, they do not take power consumption and how to realize the circuit modification into consideration and/or do not guarantee the optimality.

When two or more cell libraries are available, there is a trade-off between the delay and power consumption for each gate. The assignment for a cell in cell libraries to each gate is called *technology mapping*. In this paper, under the assumption that any clock schedule is realized by an ideal clock distribution circuit, when two or more cell libraries are available, a technology mapping method which assigns a cell to each gate in the given logic circuit by using integer linear programming (ILP) is proposed. In experiments, we show the effectiveness of the proposed technology mapping method.

# II. Preliminaries

In this paper, we assume that a circuit consists of registers, gates, and wires connecting registers and gates. We refer to registers, gates, and wires as elements. A circuit is represented by the graph G = (V, E), where V is the vertex set corresponding to elements in the circuit and E is the directed edge set corresponding to signal propagations in the circuit. In this paper, we assume that each element has a unique non-negative delay. Let d(v)be the weight of  $v \in V$  which corresponds to the delay of corresponding element.

Let  $V_r$  be the register set of G and  $V_g$  be the gate set of G. Necessarily,  $V_r$  and  $V_g$  are subsets of V. An example of the circuit graph is shown in Fig. 1. In Fig. 1,  $\{a, b, c, d\}$  is the register set, and the value in each vertex represents



Fig. 2. Timing chart.



Fig. 3. Constraing graph for circcuit G shown in Fig. 1.

delay.

# A. General-synchronous framework

In g-frame, the clock input timing of a register may be different from other registers. The *clock timing* S(r)of register r is defined by the difference between clock arrival time between r and an arbitrary chosen reference register. A circuit works correctly with clock period Tif the following two types of constraints are satisfied for every register pair with signal propagations (Fig. 2) [1]. **Setup (No-Zero-Clocking) Constraints** 

$$S(a) - S(b) \le T - D_{\max}(a, b)$$

## Hold (No-Double-Clocking) Constraints

$$S(b) - S(a) \le D_{\min}(a, b),$$

where  $D_{\max}(a, b)$  is the maximum delay and  $D_{\min}(a, b)$  is the minimum delay from register a to b.

Let  $T_S(G)$  be the minimum clock period of circuit G in g-frame under the assumption that clock can be input to each register at an arbitrary timing. The minimum clock period  $T_S(G)$  of G in g-frame is determined by the constraint graph  $H(V_r, E_r)$ , where vertex set  $V_r$  corresponds to registers in G and directed edge set  $E_r$  corresponds to two types of constraints [3]. An edge in  $E_r$  from register a to b with weight  $D_{\min}(a, b)$ , called D-edge, corresponds to the hold constraint, and an edge from register b to awith weight  $T - D_{\max}(a, b)$ , called Z-edge, corresponds to the setup constraint. Hereafter, the constraint graph  $H(V_r, E_r)$  is simply represented by H(G). Let H(G, t)be the constraint graph in which the clock period T of Z-edges in H(G) is set to t.

**Theorem 1** [3]  $T_S(G)$  is the minimum t such that there is no negative-cycle in the constraint graph H(G, t).



Fig. 4. Circuit G' which is assigned a cell of two or more cell libraries to each gate.



Fig. 5. Constraint graph H(G'8) for circuit G' shown in Fig. 4.

For example, the delay from register d to a in G shown in Fig. 1 is 12 which is the maximum delay between registers. Therefore, the minimum clock period in c-frame is 12. The constraint graph H(G,T) is shown in Fig. 3 (a). Since H(G,9) shown in Fig. 3 (b) includes no negativecycle and cycle (a, d, b, a) shown in Fig. 3 (a) is negative when T < 9, the minimum clock period  $T_S(G)$  is 9.

# B. Technology Mapping

When two or more cell libraries are available, there is a trade-off between the delay and power consumption for each gate. The circuit G' obtained from G shown in Fig. 1 by assigning a cell in two or more cell libraries to each gate in consideration of the minimization of  $T_s$  is shown in Fig. 4. The gate represented by slash in Fig. 4 is different from that in G. The constraint graph H(G', 8) is shown in Fig. 5. Since H(G', 8) includes no negative-cycle and cycle (a, d, b, a) is negative when T < 8, the minimum clock period  $T_S(G')$  is 8. G' is expected to become low power consumption compared with G since a cell library with large delay has low power consumption. Therefore, G' is expected to become low power consumption and high performance in g-frame compared with G.

#### III. PROPOSED METHOD

Setup and Hold constraints are given by a linear expression [1]. Therefore, when the delay of each circuit element is given, the problem which obtains the minimum clock period  $T_s$  is defined as follows:

## objective

#### minimize T

 $\operatorname{constraints}$ 

Setup constrains

$$S(r_i) - S(r_j) \le T - D_{\max}(r_i, r_j)((r_i, r_j) \in E_r(G))$$
 (1)

Hold constrains

$$S(r_j) - S(r_i) \le D_{\min}(r_i, r_j) \quad ((r_i, r_j) \in E_r(G)) \quad (2)$$

ILP1 which is an ILP which minimizes the clock period in g-frame by technology mapping and ILP2 which is an ILP which maximizes the number of the cell with low power consumption with the minimum clock period by technology mapping in g-frame are formulized by modifying the above problem definition.

## A. ILP1

When k cell libraries are available, let  $d_i(1), d_i(2), \ldots$  $d_i(k)$  be the delays of cells to gate *i*. ILP1 which is an ILP which minimizes the clock period in g-frame by technology mapping is defined as follows:

# objective

minimize T

# constraints

For  $e = (t, u) \in E$ , if  $t, u \notin V_r$ ,

$$r_t - r_u \le -D_u \tag{3}$$
$$a_t - a_u \ge -D_u \tag{4}$$

$$a_t - a_u \ge -D_u$$

if  $t \in V_r, u \notin V_r$ ,

$$S(t) - r_u \le -D_u \tag{5}$$
$$S(t) - a_u \ge -D_u \tag{6}$$

$$\delta(t) - a_u \ge -D_u \tag{6}$$

if 
$$u \in V_r, t \notin V_r$$
,

$$r_t - S(u) - T \le 0 \tag{7}$$

$$a_t - S(u) \ge 0 \tag{8}$$

if  $t, u \in V_r$ ,

$$S(t) - S(u) - T \le 0 \tag{9}$$
  

$$S(t) - S(u) \ge 0 \tag{10}$$

For  $i \in V_q$ 

$$\sum_{j=1}^{k} g_i(j) = 1$$
 (11)

$$D_{i} = \sum_{j=1}^{\kappa} d_{i}(j) * g_{i}(j)$$
(12)

variable constraints

$$g_i(j) = \{0, 1\} \quad (i \in V_q, 1 \le j \le k) \quad (13)$$

Let r, a be variables to reduce the number of the constraints as mentioned later. Let  $q_i(j)$  be a variable which is set to either 0 or 1 and it represents the assignment for the cell in cell library j with delay  $d_i(j)$  to gate i. The cell with delay  $d_i(j)$  is assigned to gate i if and only if



Fig. 6. Formulation to ILP.

 $g_i(j) = 1$ . Equation (11) means that exactly one cell is assigned to each gate. Equation (12) sets the delay of assigned cell to the gate delay  $D_i$ . Equations from (3) to (10) correspond to Setup constraints (equation (1)) and Hold constraints (equation (2)). For example, Setup and Hold constraints between register (d, b) of the circuit G shown in Fig.1 are discussed (Fig. 6). The constraints from (3) to (10) on the paths between register (d, b) in circuits G are defined as follows:

$$S(d) - r_0 \geq -D_0 \tag{14}$$

$$S(d) - a_0 \leq -D_0 \tag{15}$$

$$r_0 - r_4 \geq -D_4 \tag{16}$$

$$a_0 - a_4 < -D_4$$
 (17)

$$r_o - r_6 \ge -D_6 \tag{18}$$

$$a_o - a_6 \leq -D_6 \tag{19}$$

$$r_4 - r_5 \geq -D_5 \tag{20}$$

$$a_4 - a_5 < -D_5$$
 (21)

$$r_6 - r_5 \geq -D_5 \tag{22}$$

$$a_6 - a_5 \leq -D_5 \tag{23}$$

$$r_5 - S(b) - T \ge 0 \tag{24}$$

$$a_5 - S(b) \leq 0 \tag{25}$$

Equation (26) is obtained by the addition of equations (14), (16), (20), (24). Equations from (27) to (29) are similarly obtained by the addition of equations from (14)to (25).

$$S(d) - S(b) - T \leq -D_0 - D_4 - D_5$$
 (26)

$$S(d) - S(b) - T \leq -D_0 - D_6 - D_5$$
 (27)

$$S(b) - S(d) \leq D_0 + D_4 + D_5$$
 (28)

$$S(b) - S(d) \leq D_0 + D_6 + D_5$$
 (29)

Equations from (26) to (29) mean that the delays of all paths between register pair (d, b) satisfy Setup and Hold constraints.

The number of equations for all paths between register pairs such as equations from (26) to (28) may increase exponentially since the order of the number of paths between registers becomes exponential according to the number of gates in some cases. On the other hand, the number of equations from (3) to (10) is  $O(|E_q|)$  and the number of equations (11) and (12) is  $O(|V_q|)$ .

#### B. ILP2

In ILP1, although the clock period is minimized in gframe, the power consumption is not taken into consideration. Therefore, after the minimum clock period by

|                    | TABLE .            | l          |                                                                                           |   |
|--------------------|--------------------|------------|-------------------------------------------------------------------------------------------|---|
| C                  | ELL LIBRA          | RIES.      |                                                                                           |   |
| Type of Transister | High int           | High speed |                                                                                           |   |
| Low Leakage        | SI                 | MNC        |                                                                                           |   |
| Standard           | S                  | MN         |                                                                                           |   |
| High Speed         | S                  | MZ         |                                                                                           |   |
|                    |                    |            |                                                                                           |   |
| 1.4 -              |                    | 1          | MZ TC<br>MZ TS<br>MN TC<br>MN TS<br>MNC TC<br>MNC TS<br>MNC TS<br>ILP1 TS<br>ILP2 TS<br>♥ |   |
| 1.2 -<br>vojd      |                    |            |                                                                                           | - |
| 0.8 - ♥            |                    | 0          |                                                                                           | - |
| 0.6 -              | ▲                  |            | Δ                                                                                         |   |
| 0.5 1              | 1.5<br>Clock perio | 2<br>d     | 2.5                                                                                       | 3 |

Fig. 7. Average of clock period and power consumption.

technology mapping  $T_{opt}$  is obtained by ILP1, the power consumption is minimized without increasing the minimum clock period. ILP2 which is the minimization of the power consumption and which achieves the minimum clock period  $T_{opt}$  by technology mapping is formulized based on ILP1. The objective is defined as follows:

$$Minimize \qquad \sum_{j=1}^{k} \{\alpha_j * \sum g_i(j)\}$$

 $\alpha_j$  is the priority of cell library *j*. The cell library with low power consumption is assigned preferentially by setting a smaller value to  $\alpha_j$  for the cell library with low power consumption. In order to minimize the power consumption without increasing the minimum clock period  $T_{opt}$  which is obtained by ILP1 in g-frame, the following equation is added to ILP1 as a constraint.

 $T \leq T_{opt}$ 

Other constraints are the same as the constraints of ILP1.

# IV. EXPERIMENTAL RESULTS

We used the cell libraries of 65nm CMOS standard cell in experiments. The cell libraries have two types: high integration and high speed. Moreover, they are divided into a high speed, a standard and a low leakage according to the type of transistor. It is shown in Table I. The delay and power consumption of the circuit were evaluated by Synopsys Design Compiler. The SDF file was generated by using only each of cell libraries MZ, MN, MNC. Moreover, the delay written in SDF file was set as the delay of each cell, and ILP1 and ILP2 were formulized. ILP1 and ILP2 obtained solutions using CPLEX[18] in PC with a 2.93GHz Intel Core i7 CPU and 6GB RAM. The verilog files corresponding to the cell assignment obtained by ILP1 and ILP2 were made by our program. The perfor-

TABLE II The number of cells in each cell libraries.

|          |       |       | ILP1 |      | ILP2    |          |           |  |  |
|----------|-------|-------|------|------|---------|----------|-----------|--|--|
|          | #gate | #MZ   | #MN  | #MNC | #MZ     | #MN      | #MNC      |  |  |
| s27      | 10    | 10    | 0    | 0    | 5       | 0        | 5         |  |  |
| s208.1   | 104   | 104   | 0    | 0    | 32      | 8        | 64        |  |  |
| s298     | 119   | 94    | 13   | 12   | 21      | 19       | 79        |  |  |
| s344     | 160   | 160   | 0    | 0    | 26      | 5        | 129       |  |  |
| s349     | 161   | 161   | 0    | 0    | 26      | 5        | 130       |  |  |
| s382     | 157   | 136   | 4    | 18   | 28      | 42       | 88        |  |  |
| s386     | 159   | 159   | 0    | 0    | 25      | 28       | 106       |  |  |
| s400     | 159   | 136   | 9    | 17   | 34      | 36       | 92        |  |  |
| s420.1   | 218   | 197   | 10   | 11   | 42      | 34       | 142       |  |  |
| s444     | 181   | 153   | 7    | 21   | 46      | 42       | 93        |  |  |
| s499     | 152   | 152   | 0    | 0    | 46      | 42       | 93        |  |  |
| s510     | 211   | 211   | 0    | 0    | 31      | 45       | 135       |  |  |
| s526     | 193   | 193   | 0    | 0    | 0 23 37 |          | 133       |  |  |
| s526n    | 194   | 194   | 0    | 0 24 |         | 42       | 128       |  |  |
| s635     | 286   | 286   | 0    | 0    | 0 95    |          | 164       |  |  |
| s641     | 379   | 379   | 0    | 0    | 0 53    |          | 326       |  |  |
| s713     | 393   | 393   | 0    | 0    | 0 53    |          | 338       |  |  |
| s820     | 289   | 289   | 0    | 0    | 14      | 30       | 245       |  |  |
| s832     | 287   | 286   | 0    | 1    | 15      | 29       | 243       |  |  |
| s838.1   | 446   | 398   | 13   | 35   | 60      | 81       | 305       |  |  |
| s938     | 446   | 411   | 16   | 19   | 67      | 71       | 308       |  |  |
| s953     | 395   | 395   | 0    | 0    | 62      | 36       | 296       |  |  |
| s967     | 394   | 394   | Ő    | Õ    | 65      | 36       | 293       |  |  |
| s991     | 519   | 443   | 36   | 40   | 57      | 1        | 461       |  |  |
| s1196    | 529   | 512   | 3    | 14   | 17      | 7        | 505       |  |  |
| s1238    | 508   | 491   | 3    | 14   | 17      | 8        | 483       |  |  |
| s1269    | 569   | 447   | 39   | 83   | 45      | 66       | 458       |  |  |
| s1423    | 657   | 531   | 27   | 99   | 61      | 7        | 589       |  |  |
| s1488    | 653   | 653   | 0    | 0    | 34      | 85       | 534       |  |  |
| s1494    | 647   | 647   | Ő    | Õ    | 29      | 81       | 537       |  |  |
| s1512    | 780   | 780   | Ő    | Õ    | 77      | 62       | 641       |  |  |
| s3271    | 1572  | 1293  | 93   | 186  | 54      | 85       | 1433      |  |  |
| s3330    | 1789  | 1654  | 42   | 93   | 105     | 132      | 1552      |  |  |
| s3384    | 1685  | 1542  | 38   | 105  | 57      | 3        | 1625      |  |  |
| s4863    | 2342  | 2014  | 143  | 185  | x       | x        | x         |  |  |
| s5378    | 2779  | 2779  | 0    | 0    | 101     | 114      | 2564      |  |  |
| s6669    | 3080  | 2692  | 152  | 236  | 322     | 182      | 2576      |  |  |
| s9234    | 5597  | 5597  | 102  | 200  | x       | 102<br>X | 2010<br>x |  |  |
| s9234_1  | 5597  | 5597  | õ    | Ő    | x       | x        | x         |  |  |
| s13207   | 7951  | 7254  | 145  | 552  | v       | v        | v         |  |  |
| s13207 1 | 7951  | 7293  | 130  | 528  | v       | v        | v         |  |  |
| s15850   | 9772  | 8726  | 257  | 789  | x x     | v v      | v v       |  |  |
| e15850 1 | 0772  | 0772  | 201  | 105  | v       | v        | v         |  |  |
| a35039   | 16065 | 16065 | 0    | 0    | v v     | v<br>v   | v         |  |  |
| s38417   | 22170 | 20170 | 0    | 0    |         | N<br>V   | A<br>V    |  |  |
| -38584   | 10252 | 10251 | 0    | 0    | 100     | 36       | 10117     |  |  |
| -2858/ 1 | 10252 | 10252 | 0    | 2    | 100     | 35       | 10119     |  |  |
| proleg   | 1601  | 13200 | 40   | 150  | 85      | 100      | 1/107     |  |  |
| protog   | 1001  | 1000  | 49   | 109  |         | 109      | 1407      |  |  |

x : out of memory

mance of the circuits obtained by ILP1 and ILP2 was also evaluated by Design Compiler.

The experiment results of the circuits constituted from only each cell library and the circuits obtained by ILP1 and ILP2 are compared. We applied these methods to ISCAS89 benchmark suite. The number of cells in each cell library of the circuits obtained by ILP1 and ILP2 is shown in table II. The experiment results normalized using the minimum clock period and the power consumption of the circuits constituted from only MZ are shown in table III. In ILP2, in 9 circuits, the solution is not obtained by ILP2 due to out of memory in CPLEX. The solution which is not obtained is shown by x in table II and table III. Ave. shown in table III is the average value of remaining 39 circuits. The relation of averages of the minimum clock period and power consumption is shown in Fig. 7. In order to check the effect of the power consumption by technology mapping, the power consumption is estimated without changing frequency and synthesizing clock tree. Therefore, since gates in g-frame are the same as those in c-frame, the power consumption in g-frame and that

in c-frame are the same. The minimum clock period in g-frame is smaller than that in c-frame. Although details are omitted for the space of the paper, in ILP1, the average of relative error of the minimum clock period  $T_{s}$  obtained by CPLEX and the minimum clock period  $T_{opt}$  which is evaluated by Design Compiler is 0.55%. In ILP2, the average of relative error is 2.38%. In ILP1 and ILP2, since the delay of each gate is obtained by the delay of the circuit constituted from only each cell library, the error of the estimation of gate delay is occurred by subsequent gates.

In ILP1, in 21 circuits, the minimum clock period  $T_s$ and the power consumption of circuits obtained by ILP1 are smaller than those of circuits constituted from only MZ. In the remaining 27 circuits, the minimum clock period  $T_s$  and the power consumption of circuits obtained by ILP1 are the same as those of circuits constituted by only MZ since most gates in the circuits obtained by ILP1 are assigned cells in MZ.

In 13 circuits, the minimum clock period  $T_s$  and the power consumption of circuits obtained by ILP2 are smaller than those of circuits constituted from only MZ. In 5 circuits, although the minimum clock period  $T_s$  of circuits obtained by ILP2 are the same as that of circuits constituted from only MZ, the power consumption of circuits obtained by ILP2 is smaller than that of the circuit constituted from only MZ. In remaining 21 circuits, although the power consumption of circuits obtained by ILP2 is smaller than that of the circuit constituted from only MZ, the minimum clock period  $T_s$  of circuits obtained by ILP2 is larger than that of circuits constituted from only MZ since the error is occurred by the estimation of delays. The power consumption of circuits obtained by ILP2 is smaller than that of circuits obtained by ILP1 since most gates in the circuits obtained by ILP2 are assigned cells of cell library with low power consumption. However, the minimum clock period  $T_s$  of circuits obtained by ILP2 is larger than that of circuits obtained by ILP1.

In experiment results, the circuits with low power consumption and high performance are obtained by ILP2.

#### V. SUMMARY AND CONCLUSIONS

In this paper, we proposed the technology mapping method for low power consumption in g-frame. The proposed technology mapping method improves clock period and power consumption. Moreover, in the experiments, the effectiveness of the proposed technology mapping method is shown.

In our future works, we will propose a heuristic technology mapping method which assigns a cell to each gate since a solution is not obtained by ILP2 in large circuits.

#### Acknowledgements

This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys Inc., eShuttle Inc. and Fujitsu Semiconductor Ltd. This work is partly supported by JSPS KAK-ENHI Grant-in-Aid for Young Scientists (B) 26730029.

#### References

- J. Fishburn, "Clock skew optimization," *IEEE Tranc. on Computers*, vol. 39, no. 7, pp. 945–951, 1990.
- [2] R. Deoker and S. Sapatneker, "A graph-theoretic approach to clock skew optimization," in *ISCAS*, pp. 407– 410, 1994.
- [3] A. Takahashi and Y. Kajitani, "Performance and reliability driven clock scheduling of sequential logic circuits," in *ASP-DAC*, pp. 37–43, 1997.
- [4] A. Takahashi, "[invited talk] general synchronous circuits using global clock -design methodologies, tools, and prospects," *IPSJ SIG Technical Report, 2006-SLDM-126*, vol. 2006, no. 111, pp. 159–164, 2006.
- [5] A. Takahashi, "Practical fast clock-schedule design algorithms," *IEICE Trans. Fundamentals*, vol. E89-A, no. 4, pp. 1005–1011, 2006.
- [6] Y. Zhi, H. Zho, and X. Zeng, "A practical method for multi-domain clock skew optimization," in ASP-DAC, pp. 521–526, 2011.
- [7] L. Li, Y. Lu, and H. Zhou, "Optimal multi-domain clock skew scheduling," in DAC, pp. 152–157, 2011.
- [8] Y. Kohira and A. Takahashi, "2-sat based linear time optimum two-domain clock skew scheduling," in ASP-DAC, pp. 173–178, 2014.
- [9] K. Inoue, W. Takahashi, A. Takahashi, and Y. Kajitani, "Schedule-clock-tree routing for semi-synchnorous circuits," *IEICE Trans. Fundamentals*, vol. E82-A, no. 11, pp. 2431–2439, 1999.
- [10] S. Ishijima, T. Utsumi, T. Oto, and A. Takahashi, "A semi-synchronous circuit design method by clock tree modification," *IEICE Trans. Fundamentals*, vol. E85-A, no. 12, pp. 2596–2602, 2002.
- [11] K. Morishita, A. Takahashi, and Y. Kajitani, "Clockperiod minimization by delay optimization on the semisynchronous circuit," *IPSJ SIG Technical Report*, pp. 73– 80, 1997.
- [12] T. Yoda and A. Takahashi, "Clock period minimization of semi-synchronous circuits by gate-level delay insertion," *IEICE Trans. Fundamentals*, vol. E82-A, no. 11, pp. 2383–2389, 1999.
- [13] Y. Kohira and A. Takahashi, "Clock period minimization method of semi-synchronous circuits by delay insertion," *IEICE Trans. Fundamentals*, vol. E88-A, no. 4, pp. 892– 898, 2005.
- [14] Y. Kohira, S. Tani, and A. Takahashi, "Minimization of delay insertion in clock period improvement in generalsynchronous framework," *IEICE Trans. Fundamentals*, vol. E92-A, no. 4, pp. 1106–1114, 2009.
- [15] T. Yasui, K. Kurokawa, M. Toyonaga, and A. Takahashi, "A circuit optimization method by the register path modificsation in consideration of the range of feasible clock timing," *DA Symposium 2002, IPSJ Symposium Series*, vol. 2002, no. 10, pp. 259–264, 2002.
- [16] Y. Kohira and A. Takahashi, "Gate-level register relocation in generalized synchronous framework for clock period minimization," *IEICE Trans. Fundamentals*, vol. E90-A, no. 4, pp. 800–807, 2007.
- [17] Y. Kohira and A. Takahashi, "A fast gate-level register relocation method for circuit size reduction in generalsynchronous framework," *IEIICE Trans. Fundamentals*, vol. E91-A, no. 10, pp. 3030–3037, 2008.
- [18] "CPLEX 12.1." http://www-01.ibm.com/software/ commerce/optimization/cplex-optimizer/.

| Experimental Results. |       |       |       |       |       |       |       |       |       |       |       |         |       |       |         |
|-----------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|---------|-------|-------|---------|
|                       | MZ MN |       |       | MNC   |       |       | ILP1  |       |       | ILP2  |       |         |       |       |         |
|                       | $T_C$ | $T_S$ | $P_d$ | $T_C$ | $T_S$ | $P_d$ | $T_C$ | $T_S$ | $P_d$ | $T_S$ | $P_d$ | Time[s] | $T_S$ | $P_d$ | Time[s] |
| s27                   | 1.09  | 1.00  | 1.00  | 1.34  | 1.22  | 0.83  | 1.73  | 1.57  | 0.68  | 1.00  | 1.00  | 0.04    | 1.01  | 0.94  | 0.03    |
| s208.1                | 1.87  | 1.00  | 1.00  | 2.32  | 1.22  | 0.81  | 2.97  | 1.57  | 0.66  | 1.00  | 1.00  | 0.06    | 1.01  | 0.85  | 0.23    |
| s298                  | 1.53  | 1.00  | 1.00  | 1.90  | 1.22  | 0.82  | 2.47  | 1.58  | 0.66  | 0.95  | 0.96  | 0.13    | 0.96  | 0.89  | 0.16    |
| s344                  | 1.28  | 1.00  | 1.00  | 1.58  | 1.23  | 0.81  | 2.02  | 1.58  | 0.67  | 1.00  | 1.00  | 0.07    | 1.01  | 0.91  | 0.16    |
| s349                  | 1.28  | 1.00  | 1.00  | 1.58  | 1.23  | 0.81  | 2.02  | 1.58  | 0.67  | 1.00  | 1.00  | 0.09    | 1.01  | 0.91  | 0.14    |
| s382                  | 1.67  | 1.00  | 1.00  | 2.03  | 1.20  | 0.83  | 2.67  | 1.57  | 0.67  | 0.98  | 1.00  | 0.23    | 0.97  | 0.82  | 0.56    |
| s386                  | 1.01  | 1.00  | 1.00  | 1.25  | 1.23  | 0.81  | 1.60  | 1.58  | 0.66  | 1.00  | 1.00  | 0.05    | 1.02  | 0.84  | 0.44    |
| s400                  | 1.65  | 1.00  | 1.00  | 2.02  | 1.20  | 0.83  | 2.66  | 1.57  | 0.67  | 0.96  | 0.99  | 0.14    | 0.97  | 0.83  | 0.46    |
| s420.1                | 2.00  | 1.00  | 1.00  | 2.49  | 1.21  | 0.81  | 3.21  | 1.57  | 0.66  | 0.99  | 0.95  | 0.14    | 0.99  | 0.94  | 0.54    |
| s444                  | 1.63  | 1.00  | 1.00  | 2.00  | 1.22  | 0.83  | 2.63  | 1.59  | 0.66  | 0.97  | 0.99  | 0.14    | 0.97  | 0.85  | 3.95    |
| s499                  | 1.07  | 1.00  | 1.00  | 1.34  | 1.26  | 0.80  | 1.75  | 1.64  | 0.65  | 1.00  | 1.00  | 0.14    | 1.00  | 0.96  | 3.14    |
| s510                  | 1.03  | 1.00  | 1.00  | 1.27  | 1.23  | 0.81  | 1.64  | 1.61  | 0.66  | 1.00  | 1.00  | 0.14    | 1.02  | 0.93  | 4.63    |
| s526                  | 1.32  | 1.00  | 1.00  | 1.65  | 1.22  | 0.82  | 2.13  | 1.59  | 0.66  | 1.00  | 1.00  | 0.17    | 1.01  | 0.78  | 3.18    |
| s526n                 | 1.33  | 1.00  | 1.00  | 1.66  | 1.22  | 0.82  | 2.15  | 1.59  | 0.66  | 1.00  | 1.00  | 0.14    | 1.02  | 0.81  | 3.17    |
| s635                  | 14.02 | 1.00  | 1.00  | 17.18 | 1.22  | 0.82  | 21.93 | 1.59  | 0.68  | 1.00  | 1.00  | 0.11    | 1.00  | 0.66  | 1.56    |
| s641                  | 1.22  | 1.00  | 1.00  | 1.51  | 1.24  | 0.82  | 1.92  | 1.58  | 0.68  | 1.00  | 1.00  | 0.29    | 1.02  | 0.82  | 0.09    |
| s713                  | 1.25  | 1.00  | 1.00  | 1.55  | 1.24  | 0.81  | 1.98  | 1.59  | 0.67  | 1.00  | 1.00  | 0.08    | 1.02  | 0.81  | 0.14    |
| s820                  | 1.00  | 1.00  | 1.00  | 1.24  | 1.24  | 0.78  | 1.63  | 1.63  | 0.62  | 1.00  | 1.00  | 0.12    | 1.05  | 0.81  | 0.62    |
| s832                  | 1.00  | 1.00  | 1.00  | 1.24  | 1.24  | 0.77  | 1.63  | 1.63  | 0.62  | 1.00  | 1.00  | 0.10    | 1.05  | 0.80  | 0.65    |
| s838.1                | 2.65  | 1.00  | 1.00  | 3.33  | 1.21  | 0.80  | 4.28  | 1.58  | 0.66  | 0.99  | 0.96  | 0.26    | 1.00  | 0.93  | 0.56    |
| s938                  | 2.70  | 1.00  | 1.00  | 3.39  | 1.21  | 0.80  | 4.36  | 1.58  | 0.66  | 0.99  | 0.96  | 0.34    | 1.00  | 0.94  | 3.86    |
| s953                  | 1.15  | 1.00  | 1.00  | 1.42  | 1.24  | 0.81  | 1.83  | 1.60  | 0.66  | 1.00  | 1.00  | 0.16    | 1.03  | 0.84  | 3.05    |
| s967                  | 1.13  | 1.00  | 1.00  | 1.41  | 1.24  | 0.81  | 1.81  | 1.61  | 0.66  | 1.00  | 1.00  | 0.15    | 1.02  | 0.81  | 1.01    |
| s991                  | 1.25  | 1.00  | 1.00  | 1.55  | 1.23  | 0.71  | 1.98  | 1.57  | 0.54  | 0.96  | 0.96  | 0.23    | 0.98  | 0.79  | 0.24    |
| s1196                 | 2.14  | 1.00  | 1.00  | 2.64  | 1.16  | 0.82  | 3.41  | 1.48  | 0.67  | 0.66  | 1.00  | 0.06    | 0.68  | 0.78  | 0.16    |
| s1238                 | 2.18  | 1.00  | 1.00  | 2.71  | 1.16  | 0.82  | 3.48  | 1.48  | 0.67  | 0.65  | 1.00  | 0.04    | 0.68  | 0.77  | 0.17    |
| s1269                 | 1.23  | 1.00  | 1.00  | 1.52  | 1.22  | 0.81  | 1.95  | 1.57  | 0.67  | 0.94  | 0.99  | 0.23    | 0.95  | 0.85  | 18.79   |
| s1423                 | 1.19  | 1.00  | 1.00  | 1.49  | 1.25  | 0.80  | 1.90  | 1.60  | 0.66  | 0.99  | 0.99  | 0.44    | 1.01  | 0.96  | 0.49    |
| s1488                 | 1.03  | 1.00  | 1.00  | 1.28  | 1.25  | 0.77  | 1.65  | 1.61  | 0.61  | 1.00  | 1.00  | 0.15    | 1.04  | 0.82  | 7.18    |
| s1494                 | 1.03  | 1.00  | 1.00  | 1.28  | 1.25  | 0.76  | 1.65  | 1.61  | 0.61  | 1.00  | 1.00  | 0.12    | 1.05  | 0.82  | 17.62   |
| s1512                 | 1.04  | 1.00  | 1.00  | 1.26  | 1.21  | 0.83  | 1.65  | 1.58  | 0.67  | 1.00  | 1.00  | 0.21    | 1.04  | 0.88  | 22.71   |
| s3271                 | 1.51  | 1.00  | 1.00  | 1.86  | 1.22  | 0.80  | 2.39  | 1.57  | 0.66  | 0.96  | 0.97  | 1.02    | 0.99  | 0.78  | 21.88   |
| s3330                 | 1.86  | 1.00  | 1.00  | 2.33  | 1.25  | 0.80  | 2.98  | 1.61  | 0.66  | 0.95  | 0.96  | 0.86    | 0.96  | 0.84  | 74.45   |
| s3384                 | 1.22  | 1.00  | 1.00  | 1.49  | 1.22  | 0.82  | 1.93  | 1.57  | 0.67  | 0.96  | 0.98  | 0.88    | 0.96  | 0.96  | 0.81    |
| s4863                 | 1.16  | 1.00  | 1.00  | 1.42  | 1.21  | 0.81  | 1.84  | 1.57  | 0.66  | 0.96  | 1.00  | 2.68    | x     | х     | х       |
| s5378                 | 1.24  | 1.00  | 1.00  | 1.51  | 1.21  | 0.82  | 1.98  | 1.57  | 0.66  | 1.00  | 1.00  | 0.44    | 1.03  | 0.80  | 643.02  |
| s6669                 | 1.21  | 1.00  | 1.00  | 1.48  | 1.22  | 0.81  | 1.91  | 1.57  | 0.67  | 0.99  | 0.99  | 2.86    | 1.00  | 0.93  | 858.68  |
| s9234                 | 1.49  | 1.00  | 1.00  | 1.83  | 1.24  | 0.82  | 2.36  | 1.59  | 0.68  | 1.00  | 1.00  | 2.13    | x     | x     | х       |
| s9234.1               | 1.83  | 1.00  | 1.00  | 1.83  | 1.24  | 0.82  | 2.36  | 1.59  | 0.68  | 1.00  | 1.00  | 2.22    | x     | х     | х       |
| s13207                | 1.96  | 1.00  | 1.00  | 1.74  | 1.24  | 0.81  | 2.24  | 1.61  | 0.67  | 0.98  | 0.98  | 2.93    | x     | x     | х       |
| s13207.1              | 1.35  | 1.00  | 1.00  | 1.74  | 1.23  | 0.81  | 2.24  | 1.60  | 0.67  | 0.98  | 0.98  | 2.72    | x     | х     | x       |
| s15850                | 1.31  | 1.00  | 1.00  | 1.62  | 1.25  | 0.82  | 2.09  | 1.61  | 0.68  | 0.96  | 0.97  | 8.87    | x     | x     | х       |
| s15850.1              | 1.31  | 1.00  | 1.00  | 1.70  | 1.24  | 0.81  | 2.19  | 1.59  | 0.67  | 1.00  | 1.00  | 3.85    | x     | х     | х       |
| s35932                | 2.66  | 1.00  | 1.00  | 1.25  | 1.25  | 0.79  | 1.58  | 1.58  | 0.65  | 1.00  | 1.00  | 9.72    | x     | x     | х       |
| s38417                | 2.59  | 1.00  | 1.00  | 1.87  | 1.25  | 0.80  | 2.39  | 1.60  | 0.66  | 1.00  | 1.00  | 23.49   | x     | x     | х       |
| s38584                | 1.28  | 1.00  | 1.00  | 1.59  | 1.25  | 0.80  | 1.00  | 1.61  | 0.66  | 1.00  | 1.00  | 6.35    | 1.04  | 0.84  | 569.17  |
| s38584.1              | 1.26  | 1.00  | 1.00  | 1.58  | 1.25  | 0.79  | 2.03  | 1.61  | 0.65  | 1.00  | 1.00  | 39.39   | 1.04  | 0.82  | 581.43  |
| prolog                | 1.87  | 1.00  | 1.00  | 2.35  | 1.25  | 0.80  | 2.99  | 1.60  | 0.66  | 0.96  | 0.94  | 1.03    | 0.98  | 0.77  | 1944.56 |
| Ave.                  | 1.75  | 1.00  | 1.00  | 2.07  | 1.23  | 0.81  | 2.65  | 1.59  | 0.66  | 0.97  | 0.99  |         | 0.99  | 0.85  |         |

TABLE III D

 $T_C$ 

 $T_S$  $P_d$ minimum clock period in g-frame power consumption in gate level

х out of memory