Wire Load Model for Rapid Power Consumption Evaluation in Early Design Stage of Via-Switch FPGA

Asuka Natsuhara\textsuperscript{1} Takashi Imagawa\textsuperscript{2} Hiroyuki Ochi\textsuperscript{1,2}

\textsuperscript{1}Graduate School of Information Science and Engineering, Ritsumeikan University
\textsuperscript{2}College of Information Science and Engineering, Ritsumeikan University
1-1-1 Noji-higashi, Kusatsu, Shiga, 525-8577 Japan
is0271pp@ed.ritsumei.ac.jp, takac-i@fc.ritsumei.ac.jp, ochi@cs.ritsumei.ac.jp

Abstract—This paper proposes a wire load model for via-switch FPGAs that enables simulation-based power estimation before routing. The via-switch FPGA is expected to achieve a dramatic improvement in area, delay, and power compared with conventional SRAM-based FPGAs. Until now, estimating the power consumption of an application circuit mapped onto a via-switch FPGA has required a time-consuming routing process before circuit simulation. With the proposed post-placement simulation flow, the runtime for power estimation is reduced by 63.8\% on average compared with the conventional post-routing simulation flow, at the cost of an 11.8\% degradation in estimation error on average.

I. INTRODUCTION

Recent progress in manufacturing processes has improved the scale, performance, and energy efficiency of the circuits that application-specific integrated circuits (ASICs) can realize. At the same time, however, the higher cost and longer time required to develop and fabricate ASICs have become a serious problem. As an alternative to ASICs, reconfigurable devices that allow users to change the circuit configuration after fabrication are attracting attention. Among reconfigurable devices, FPGAs that hold configuration information in SRAM cells (SRAM-type FPGAs) have made remarkable progress thanks to advances in CMOS process technology. However, in exchange for their flexibility, SRAM-type FPGAs are far worse than ASICs in area, delay, and power consumption [1]. One contributing factor is the large area of the SRAM cells used to hold configuration information.

To reduce the area, power, and performance overhead of conventional FPGAs, a new reconfigurable device has been proposed that realizes the ON/OFF switching function between wire segments with an element called an atom switch instead of an SRAM cell and a pass transistor [2]. An atom switch is a kind of non-volatile resistive-change switch. It occupies a small footprint area and is integrated in the back-end-of-line (BEoL) layers.

In [3], not only the routing resources of the FPGA but also the truth table of the LUT (Look-Up Table) that serves as the logic resource is realized using atom switches. Reference [4] proposes a 0-1-$A$-$\overline{A}$ LUT to further improve the area and delay of the LUT in [3]. Because the atom switch has lower resistance and smaller parasitic capacitance than the pass transistor, one of the LUT input signals of the 0-1-$A$-$\overline{A}$ LUT is connected to the switch array, unlike conventional FPGAs in which all inputs are connected to the selection signals of the multiplexer (MUX). Reduction of power consumption as well as area and delay has been confirmed for the LUT alone [5]. However, it has not been evaluated how much the 0-1-$A$-$\overline{A}$ LUT contributes to low power consumption when benchmark circuits of practical size are implemented on an FPGA with a large number of LUTs. To evaluate the power consumption of the entire FPGA, it is necessary to analyze the power consumption not only of LUTs but also of wire segments. Evaluating power consumption with the parasitic capacitance of wire segments taken into account, however, has required a time-consuming post-routing circuit simulation.

To eliminate the routing process from power estimation, this paper proposes a wire load model for the via-switch FPGA derived from actual placement and routing results of circuits, and presents the experimental results of power consumption analysis using the model. The standard deviation of the relative estimation error on the number of switches calculated by the proposed post-placement model is 15.0\% in the worst case, while that of the post-synthesis model is 46.6\%. With the proposed post-placement simulation flow, the runtime for power estimation is reduced by 63.8\% on average compared with the conventional post-routing simulation flow, at the cost of an 11.8\% degradation in estimation error on average.

The next section reviews the via-switch device, the via-switch FPGA architecture, and power estimation methods for conventional FPGAs. Section III proposes the wire load model that utilizes post-placement information, and Section IV evaluates the accuracy and runtime of the proposed method. Section V concludes the paper.

II. Background

A. Atom switch

The atom switch is a nonvolatile switch device that utilizes the formation and dissolution of a copper bridge made of metal ions [2]. It is composed of a solid electrolyte sandwiched between copper (Cu) and ruthenium (Ru) electrodes. When a positive voltage is applied to the Cu electrode, a Cu bridge is formed in the solid electrolyte, and the switch turns ON. When a negative voltage is applied, the Cu atoms in the bridge return to the Cu electrode, and the switch turns OFF. Figure 1 shows the ON and OFF states of the atom switch.

To improve the OFF-state reliability of the device, the Complementary Atom Switch (CAS), which consists of two atom switches connected in series in opposite directions, has been proposed [6]. A via-switch, in which two varistors are stacked on a CAS, has also been proposed to allow multi-fanout routing to be written to a two-dimensional CAS array without using access transistors [7]. Figures 2 and 3 show the structure of the via-switch and its equivalent circuit model, respectively.

Since the via-switch provides both state storage and a switch function, it replaces the programmable switch element used in existing FPGAs, that is, an SRAM cell for state storage and a pass transistor for the switch function. Compared with a switch consisting of an SRAM cell and a pass transistor, the via-switch is superior in the following four respects.

Footprint area: The footprint area of a via-switch is $18F^2$ [7], where $F$ is the feature size, while that of a conventional switch consisting of an SRAM cell and a pass transistor is approximately 10 times larger ($\approx 200F^2$). In addition, the via-switch requires metal layers only and does not consume transistor layers.

ON-resistance: An atom switch can achieve a low ON-resistance (e.g., 200 ohms), so even a CAS, with two atom switches connected in series, has a smaller ON-resistance than a pass transistor.

Parasitic capacitance: The parasitic capacitance of an atom switch is 0.14 fF, which is approximately 1/10 of that of a pass transistor.

Non-volatility: Unlike SRAM, the via-switch is non-volatile because the state of the bridge does not change even when the power is turned off.

B. Crossbar circuit

Achieving both a small area and high programmability has been a challenge for reconfigurable devices. In conventional island-type FPGAs, switch blocks and connection blocks are used to realize programmable routing resources. In the via-switch FPGA, a crossbar structure is used for this purpose. Figure 4 shows an example of a crossbar circuit. A via-switch is placed at each crosspoint of the crossbar, and a vertical track and a horizontal track are connected by turning ON the switch at their crosspoint. Since the crossbar circuit in Fig. 4 has $6 \times 6$ tracks, 36 via-switches exist at the intersections. In the crossbar of the via-switch FPGA, at most one switch in each row is allowed to be in the ON-state.
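
The crossbar rule can be illustrated with a minimal Python sketch. The class `Crossbar` and its methods are hypothetical names introduced only for illustration. The sketch assumes that rows are indexed from 0 and that several rows may share one column (a fan-out connection), which the text does not rule out; it models only the ON/OFF state, not the electrical behavior.

```python
class Crossbar:
    """Toy model of an N x N via-switch crossbar (hypothetical helper)."""

    def __init__(self, n_tracks):
        self.n = n_tracks
        # on_col[r] is the column whose switch is ON in row r, or None.
        self.on_col = [None] * n_tracks

    def num_switches(self):
        # One via-switch sits at every crosspoint.
        return self.n * self.n

    def turn_on(self, row, col):
        # Enforce the rule that at most one switch per row is ON.
        if self.on_col[row] is not None and self.on_col[row] != col:
            raise ValueError(f"row {row} already has an ON switch")
        self.on_col[row] = col

    def connected_rows(self, col):
        # Rows whose ON switch ties them to the given column.
        return [r for r, c in enumerate(self.on_col) if c == col]


xb = Crossbar(6)
xb.turn_on(0, 2)
xb.turn_on(3, 2)  # assumed: a second row may share the same column
print(xb.num_switches())     # 36
print(xb.connected_rows(2))  # [0, 3]
```

Storing one column index per row, rather than a full 6 x 6 matrix of flags, makes the one-ON-switch-per-row rule impossible to violate by construction.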

C. 0-1-$A$-$\overline{A}$ LUT

In the via-switch FPGA, the 0-1-$A$-$\overline{A}$ LUT proposed in [4] is used as the programmable logic resource. As an example, Fig. 5 shows a 4-input 0-1-$A$-$\overline{A}$ LUT. This LUT has four 1-bit inputs $A$, $B$, $C$, and $D$ and a 1-bit output $X$, and consists of an 8-row by 4-column switch matrix and an 8-input MUX. In the figure, the 8-input MUX is realized by seven 2-input MUXs. A via-switch exists at each intersection of the array on the left. Exactly one of the switches in each row is set to the ON-state. In this way, any one of 0, 1, $A$, and $\overline{A}$ can be selected as the value corresponding to each combination of the values of the inputs $B$, $C$, and $D$, and an arbitrary 4-input logic function can be realized by rewriting the states of the via-switches. In this research, a combination of a 0-1-$A$-$\overline{A}$ LUT and a DFF is used as a logic block (LB).
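
To make the row selection concrete, the following Python sketch derives, for a given 4-input function, which of 0, 1, $A$, and $\overline{A}$ each row would select. It is only an illustration under stated assumptions: the function names `configure_lut` and `evaluate_lut` are hypothetical, rows are keyed directly by the values of $(B, C, D)$, and the physical ordering of rows and MUX select signals in Fig. 5 is not modeled.

```python
from itertools import product


def configure_lut(f):
    """Return, for each (B, C, D), which of '0', '1', 'A', '~A' to select.

    f is any Python callable f(a, b, c, d) returning 0 or 1.
    """
    choice = {(0, 0): "0", (1, 1): "1", (0, 1): "A", (1, 0): "~A"}
    rows = {}
    for b, c, d in product((0, 1), repeat=3):
        # Compare the function value at A = 0 and at A = 1 for this row.
        rows[(b, c, d)] = choice[(f(0, b, c, d), f(1, b, c, d))]
    return rows


def evaluate_lut(rows, a, b, c, d):
    """The MUX picks the row for (B, C, D); the row picks its source."""
    source = rows[(b, c, d)]
    return {"0": 0, "1": 1, "A": a, "~A": 1 - a}[source]


# Example function: X = (A AND B) XOR (C OR D)
f = lambda a, b, c, d: (a & b) ^ (c | d)
rows = configure_lut(f)
# The configured LUT reproduces f on all 16 input combinations.
assert all(evaluate_lut(rows, *v) == f(*v) for v in product((0, 1), repeat=4))
print(rows[(1, 0, 0)])  # A
print(rows[(1, 1, 0)])  # ~A
```

Each row needs only one of four sources because, once $B$, $C$, and $D$ are fixed, any function of $A$ alone is one of 0, 1, $A$, or $\overline{A}$.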

D. Power consumption analysis for FPGAs

There is an increasing demand for power saving in FPGAs because an FPGA consumes more power than an ASIC in exchange for its flexibility. Power optimization for FPGAs based on power consumption analysis has been investigated, and power analysis of application circuits implemented on FPGAs has been attempted at various stages of the design flow. At a low design level, where the wiring length after placement and routing can be determined accurately, the accuracy increases but the execution time becomes longer. On the other hand, at high design levels such as RTL, the accuracy is lower but the execution time is much shorter. Thus, there is a trade-off between the accuracy and the execution time of power analysis.

The existing methods for dynamic and static power analysis for FPGAs are briefly summarized below [9, 10]. Dynamic power estimation consists of two components: load capacitance estimation and switching activity estimation. For the former, when the result of placement and routing is not available, estimated wire lengths have been widely used. For the latter, several methods have been reported, including methods that calculate the average switching behavior using a power estimation formula specific to FPGAs, methods based on statistical models, and methods that use simulation results with random input vectors.
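
The role of the two components can be seen in the textbook first-order expression for dynamic power, $P = \alpha C V_{dd}^2 f$, sketched below in Python. This is a generic approximation and not necessarily the formula used in the cited methods; some texts include an extra factor of 1/2 depending on how the switching activity $\alpha$ is defined. The function name and all numerical values are illustrative assumptions.

```python
def dynamic_power_w(activity, c_load_f, vdd_v, freq_hz):
    """First-order dynamic power, alpha * C * Vdd^2 * f, in watts."""
    return activity * c_load_f * vdd_v ** 2 * freq_hz


# All four values below are illustrative assumptions, not measured data.
p = dynamic_power_w(activity=0.1, c_load_f=50e-15, vdd_v=0.5, freq_hz=100e6)
print(f"{p * 1e6:.3f} uW")  # 0.125 uW
```

Because the power scales linearly with both the load capacitance and the switching activity, an error in either estimate carries over directly into the power estimate.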

Static power estimation methods have been proposed, such as an analytical method using transistor model parameters and a macro model construction method based on simulation.

The above methods all target SRAM-based FPGAs, and a power model must be established for accurate and efficient analysis of via-switch FPGAs.

E. Power analysis method when using placement and routing results

An overview of the architecture assumed in this paper is shown in Fig. 6. LB1 to LB4 in the figure are logic blocks (LBs), each composed of the LUT in Fig. 5 and other elements, and XB1 and XB2 represent the crossbar circuits in Fig. 4. One tile consists of four LBs and two XBs, and Fig. 6 shows a tile array of $1 \times 2$ tiles. The black lines on the left side of Fig. 6 show a path connecting LB4 to LB3, and the black lines on the right side show a path connecting LB4 to LB3 and LB2. The red circles show the ON-state via-switches in these paths.

Figure 7 shows the equivalent circuit of the fan-out 1 path on the left side of Fig. 6. Note that each track in the path has one ON-state switch and $(N - 1)$ OFF-state switches, where $N$ is the number of tracks of the crossbar. In power consumption analysis, a via-switch in the OFF-state contributes as a parasitic capacitance. As shown in Fig. 3, the parasitic capacitance per switch in the OFF-state is 0.56 fF, so the parasitic capacitance per track is $0.56(N - 1)$ fF. Similarly, the equivalent circuit of the fan-out 2 path on the right side of Fig. 6 is shown in Fig. 8. In this way, if a netlist is created from the elements obtained from the placement and routing results, the power consumption can be calculated by HSPICE simulation.

In this study, we use 0-1-$A$-$\overline{A}$ LUTs with $k = 4$, 5, and 6, together with inverters whose sizes are optimized in terms of the energy-delay product (ED product). We set $N = 100$. We set the tile array size for each circuit so that routing completes successfully.
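
As a quick check of the per-track capacitance under these settings, the Python sketch below evaluates $0.56(N - 1)$ fF for $N = 100$. The function names are hypothetical, and the path-level helper is a simplification that counts only OFF-state switch capacitance, ignoring wire capacitance and ON-resistance.

```python
C_OFF_FF = 0.56  # parasitic capacitance of one OFF-state via-switch, in fF


def track_capacitance_ff(n_tracks):
    """OFF-switch capacitance on one track: one switch is ON, N - 1 are OFF."""
    return C_OFF_FF * (n_tracks - 1)


def path_capacitance_ff(n_tracks, tracks_on_path):
    """OFF-switch capacitance summed over the tracks of a path (simplified)."""
    return tracks_on_path * track_capacitance_ff(n_tracks)


print(round(track_capacitance_ff(100), 2))  # 55.44
```

With $N = 100$, each track on a path therefore carries about 55 fF of OFF-state switch capacitance, so the number of tracks (and hence switches) on a net largely determines its load.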

F. Power analysis using naive method and its problem

As outlined in the previous section, we can obtain power analysis results by generating the netlist and conducting circuit simulation using HSPICE. An example of power consumption analysis is illustrated in Fig. 9. Power consumption analysis results for four circuits of relatively similar scale from the MCNC benchmark, bigkey, des, ex5p, and s298, are shown in Fig. 9(a), (b), (c), and (d), respectively. This figure shows the breakdown of the power consumption into LBs and wiring resources when each circuit is placed and routed on a via-switch FPGA with LUT size $k$ of 4, 5, and 6. Here, the target process is the SOTB 65 nm process, the supply voltage is 0.55 V, and the temperature is 27 °C. Power consumption was analyzed by applying a random pattern of sufficient sequence length.

As demonstrated in Fig. 9, we can obtain valuable re-
III. Proposed Wire Load Model

To speed up the power analysis of a circuit implemented on the via-switch FPGA, we propose a wire load model that estimates the wiring capacitance without performing the routing process, which takes a long runtime.

To derive a wire load model, we first examine the relationship between the fanout of nets (F.O.) and the number of switches on the nets (#switch_net) using the same benchmark circuits as in the previous section, namely bigkey, des, ex5p, and s298. In this experiment, we mapped each of the four circuits to LUTs of \( k = 6 \), and performed placement and routing on a via-switch FPGA to determine #switch_net. Figure 11(a) shows the distribution map in which the relationship between F.O. and #switch_net of all four circuits is plotted, and its enlarged view is shown in Fig. 11(b). The horizontal and vertical axes show F.O. and #switch_path, respectively, where #switch_path is defined as #switch_path = #switch_net/F.O. From these distribution maps, it can be seen that there is a negative correlation between #switch_path and F.O. We introduce a single-variable constant-logarithmic regression model to represent this relationship, which is plotted as an orange line in these figures.

As can be seen from Fig. 11(b), however, there is still a large variation in #switch_path for a given F.O. To improve the estimation accuracy, we introduce a new variable BOX, which is defined as the half perimeter of the bounding box that surrounds each net and is extracted from the post-placement layout. Note that the runtime for placement is negligible compared with those for routing and circuit simulation, as demonstrated in Fig. 10.
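
The BOX variable can be computed directly from placement coordinates, as in the following Python sketch. The function name `half_perimeter` is hypothetical, and the sketch assumes that each pin of a net is given as an integer $(x, y)$ grid position; the text does not specify the coordinate unit, so the unit here is an assumption.

```python
def half_perimeter(pins):
    """Half perimeter of the bounding box around a net's pins.

    pins is a list of (x, y) placement coordinates.
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


net = [(2, 1), (5, 4), (3, 7)]  # illustrative pin positions
print(half_perimeter(net))  # 9
```

For this example the bounding box spans 3 units horizontally and 6 units vertically, giving BOX = 9.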

Since #switch_net is expected to increase monotonically as both BOX and F.O. increase, we introduce a two-variable linear regression model #switch_net = \( a \times \text{F.O.} + b \times \text{BOX} + c \) to estimate #switch_net. Figure 12 illustrates the relation between BOX and #switch_path for F.O. = 1, from which we can observe a positive correlation between BOX and #switch_path. We determined the parameters of the two-variable linear regression model as follows.

\[
\begin{aligned}
    a &= 3.287, \quad b = 1.600, \quad c = -3.918 \quad (\text{F.O.} \leq 10) \\
    a &= 3.713, \quad b = 3.714, \quad c = -37.527 \quad (\text{F.O.} > 10)
\end{aligned}
\]
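
A minimal Python sketch of the two-variable model with the parameters above is given below. The function name `estimate_switch_net` and the example inputs are hypothetical; only the coefficients and the F.O. threshold come from the text.

```python
def estimate_switch_net(fanout, box):
    """Estimate #switch_net = a * F.O. + b * BOX + c for one net."""
    if fanout <= 10:
        a, b, c = 3.287, 1.600, -3.918
    else:
        a, b, c = 3.713, 3.714, -37.527
    return a * fanout + b * box + c


# Illustrative net with F.O. = 2 and BOX = 9.
print(round(estimate_switch_net(2, 9), 3))  # 17.056
```

Summing this estimate over all nets of a placed circuit gives an estimated total switch count without running the router.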

IV. Experimental Results

Table I shows the total switch count estimated by the proposed one-variable constant-logarithmic regression model (post-synthesis) and two-variable linear regression model (post-placement), together with the actual total switch count (post-routing) for comparison. We used four circuits (bigkey, des, ex5p, s298) and three LUT sizes, $k = 4$, 5, and 6, for the experiment. As shown in Table I, the estimation error of the total switch count ranges from $-9.1\%$ to $3.2\%$ for the post-synthesis model, and from $-7.3\%$ to $7.0\%$ for the post-placement model. Table II shows the standard deviation of the relative estimation error of the proposed models. As can be seen from Table II, the post-placement model is superior in accuracy to the post-synthesis model.
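
The two accuracy metrics can be sketched in Python as follows. The text does not spell out the exact definitions, so this sketch assumes that the relative error is (estimated - actual) / actual and that the standard deviation is the population standard deviation over per-net errors. The switch counts are made-up numbers for illustration only, not data from Table I or Table II.

```python
from statistics import pstdev


def relative_errors(estimated, actual):
    """Per-net relative estimation error, (estimated - actual) / actual."""
    return [(e - a) / a for e, a in zip(estimated, actual)]


# Made-up per-net switch counts, for illustration only.
actual = [10, 20, 40]
estimated = [11, 18, 40]

errs = relative_errors(estimated, actual)
total_err = (sum(estimated) - sum(actual)) / sum(actual)

print(round(total_err * 100, 1))     # -1.4
print(round(pstdev(errs) * 100, 1))  # 8.2
```

The example shows why both metrics are worth reporting: per-net errors of opposite sign cancel in the total count, so a small total error can coexist with a much larger per-net spread.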

Figure 13 shows the power consumption estimated by the proposed post-placement model in comparison with that obtained using the post-routing (actual) #switch. The estimation error is $47.0\%$ in the worst case. Note that the post-placement model successfully identifies the architecture parameter $k$ that achieves the minimum power for each circuit.

V. Conclusion

In this paper, we proposed a wire load model for the via-switch FPGA derived from actual placement-and-routing results of circuits, and presented the experimental results on power estimation accuracy and runtime using the model. The standard deviation of the relative estimation error on the number of switches calculated by the proposed post-placement model is $15.0\%$ in the worst case, while that of the post-synthesis model is $46.6\%$. With the proposed post-placement simulation flow, the runtime for power estimation is reduced by $63.8\%$ on average compared with the conventional post-routing simulation flow, at the cost of an $11.8\%$ degradation in estimation error on average. As future work, it is desirable to extend the applicability of the model to via-switch FPGAs that include arithmetic blocks (or DSP blocks). Quantitative analysis of the power consumption of an entire circuit implemented on a via-switch FPGA using our model is expected to enable further developments such as effective power optimization methods.

Acknowledgement

This work was supported by JST CREST under Grant JPMJCR1432. This work was also supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc.

References


