Title | A BCH Decode Accelerator for Application Specific Processors |
Author | *Kazuhito Ito (Saitama University, Japan) |
Page | pp. 115 - 121 |
Keyword | BCH, accelerator, processor |
Abstract | The BCH code is one of the popular error correction codes (ECC), and decoding BCH requires many bit-oriented operations as well as word-oriented operations. A dedicated hardware BCH decoder is less flexible, while decoding BCH on the base processor consumes many instructions for bit operations and requires a large memory area for look-up tables. In this paper, we propose an auxiliary circuit, included in application specific pipelined processors, that accelerates the BCH decoding process. |
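To illustrate the mix of bit-oriented and word-oriented work mentioned above, the following is a minimal Python sketch of BCH syndrome computation with log/antilog look-up tables. It assumes a binary (15, 7) BCH code over GF(2^4) with primitive polynomial x^4 + x + 1 and t = 2; these parameters and the function names are illustrative choices, not taken from the paper, and the sketch does not model the proposed auxiliary circuit.

```python
# Hypothetical parameters: GF(2^4), primitive polynomial x^4 + x + 1, t = 2.
M = 4
N = (1 << M) - 1        # code length 15 = number of nonzero field elements
PRIM_POLY = 0b10011     # x^4 + x + 1

# Antilog (EXP) and log (LOG) look-up tables for GF(2^4).
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = value
    LOG[value] = i
    value <<= 1                 # multiply by alpha (bit shift)
    if value & (1 << M):        # reduce modulo the primitive polynomial (bit XOR)
        value ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]         # duplicated so EXP[LOG[a] + LOG[b]] needs no modulo


def gf_mul(a, b):
    """Word-oriented multiplication in GF(2^4) via table look-ups."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(received_bits, t=2):
    """Return [S_1, ..., S_2t] with S_j = r(alpha^j).

    received_bits[i] is the coefficient of x^i in the received word r(x).
    Horner's rule is used; field addition is bitwise XOR.
    """
    result = []
    for j in range(1, 2 * t + 1):
        alpha_j = EXP[j % N]
        s = 0
        for bit in reversed(received_bits):
            s = gf_mul(s, alpha_j) ^ bit
        result.append(s)
    return result


# A single error at position 5 gives S_j = alpha^(5j).
received = [0] * N
received[5] = 1
print(syndromes(received))  # [6, 7, 1, 6]
```

For a binary BCH code S_2 always equals S_1 squared, which is one reason decoders only need to compute odd-indexed syndromes directly.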
Title | Design and FPGA Implementation of a High-Speed String Matching Engine |
Author | *Yosuke Kawanaka, Shin'ichi Wakabayashi, Shinobu Nagayama (Hiroshima City University, Japan) |
Page | pp. 122 - 129 |
Keyword | string matching, FPGA, special-purpose hardware, regular expressions |
Abstract | A high-speed string matching circuit for searching a pattern in a given text is proposed. In the circuit, a pattern is specified by a class of restricted regular expressions. The architecture of the circuit is a one-dimensional array of simple processing units. The proposed circuit was designed with Verilog-HDL, and was implemented using a Xilinx Virtex4 chip. |
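The abstract does not define the exact class of restricted regular expressions. The Python sketch below assumes the simplest case, a fixed-length sequence of character classes (for example `a[bc]d`), and simulates a one-dimensional array in which unit i holds one class and a match flag, with all units updating in lockstep on each text character. The function name and this restriction are assumptions for illustration, not the authors' design.

```python
def array_match(pattern_classes, text):
    """Simulate a linear array of processing units, one per pattern position.

    pattern_classes[i] is the set of characters accepted by unit i.
    Returns the text indices at which a complete match ends.
    """
    k = len(pattern_classes)
    if k == 0:
        return []
    # flags[i] is True when pattern positions 0..i match the text ending here.
    flags = [False] * k
    match_ends = []
    for pos, ch in enumerate(text):
        # Every unit reads its left neighbour's *previous* flag, as in hardware
        # where all units are clocked simultaneously.
        flags = [
            (True if i == 0 else flags[i - 1]) and ch in pattern_classes[i]
            for i in range(k)
        ]
        if flags[-1]:
            match_ends.append(pos)
    return match_ends


# Pattern a[bc]d expressed as three character classes.
pattern = [{"a"}, {"b", "c"}, {"d"}]
print(array_match(pattern, "abdxacdabd"))  # [2, 6, 9]
```

Each text character costs one parallel step regardless of pattern length, which is what makes the array form attractive for hardware.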
Title | Speed Improvement of AES Encryption using Hardware Accelerators Synthesized by C Compatible Architecture Prototyper (CCAP) |
Author | *Hiroyuki Kanbara (ASTEM RI, Japan), Takayuki Nakatani, Naoto Umehara (Ritsumeikan University, Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama (Nagoya University, Japan) |
Page | pp. 130 - 134 |
Keyword | high level synthesis, Embedded system, Codesign, AES Encryption |
Abstract | The authors are developing a high-level synthesizer called C Compatible Architecture Prototyper (CCAP). CCAP compiles an ANSI C program that forms part of the embedded software and generates a hardware accelerator in HDL. CCAP provides an arbiter circuit that allows the synthesized accelerator and a CPU to access main memory in parallel. In this paper, we report the speed improvement of AES encryption using CCAP. |
Title | A Hybrid Logic Simulator Using LUT Cascade Emulators |
Author | *Hiroki Nakahara, Tsutomu Sasao, Munehiro Matsuura (Kyushu Institute of Technology, Japan) |
Page | pp. 135 - 141 |
Keyword | LUT cascade, Logic simulation, Design Verification |
Abstract | This paper presents a hybrid logic simulator using both event-driven and cycle-based methods. For special primitives such as memories and tri-state buffers, it uses an event-driven method. For other parts, it uses a cycle-based method using LUT cascade emulators. To simulate a large-scale circuit, it partitions the circuit into smaller ones and realizes each part by an LUT cascade emulator. Next, it combines these emulators through interconnections. Since a multiplier often requires large memories in an LUT cascade, a processor instruction is used instead of the LUT cascade. This reduces the code size and the simulation time. Our experiment shows that the proposed method is effective for circuits including arithmetic operations. |
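An LUT cascade is a chain of look-up tables in which each cell reads some primary inputs plus the "rail" outputs of the previous cell. The minimal Python sketch below evaluates such a cascade in software, one table look-up per cell; the 6-input parity function, the three 2-input cells, and the single rail bit are illustrative assumptions, not the paper's benchmarks or table format.

```python
from itertools import product


def build_parity_cascade(num_cells, inputs_per_cell):
    """Build LUT cells for the parity function; each cell maps
    (incoming rail value, its own input bits) -> outgoing rail value."""
    cells = []
    for _ in range(num_cells):
        table = {}
        for rail in (0, 1):
            for bits in product((0, 1), repeat=inputs_per_cell):
                table[(rail, bits)] = rail ^ (sum(bits) % 2)
        cells.append(table)
    return cells


def run_cascade(cells, inputs, inputs_per_cell):
    """Evaluate the cascade: one table look-up per cell, rails passed along."""
    rail = 0  # the first cell has no real rail input; 0 is a neutral start value
    for index, table in enumerate(cells):
        start = index * inputs_per_cell
        bits = tuple(inputs[start:start + inputs_per_cell])
        rail = table[(rail, bits)]
    return rail


cascade = build_parity_cascade(num_cells=3, inputs_per_cell=2)
print(run_cascade(cascade, [1, 0, 1, 1, 0, 0], inputs_per_cell=2))  # 1
```

Each cell's table grows exponentially with its inputs and rails, which is why functions such as multipliers, with wide rails, become memory-hungry in this form.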
Title | Statistical Estimation Method for Verification Coverage Using FPGA-based Emulators |
Author | *Kohei Hosokawa, Yuichi Nakamura (NEC, Japan), Baku Haraguchi (NEC Micro Systems, Japan) |
Page | pp. 142 - 146 |
Keyword | FPGA-based Emulators, Verification Coverage, Toggle Coverage, Statistics, Test-Pattern |
Abstract | We propose a new method to quickly estimate toggle coverage as an indicator of verification coverage for a large number of test patterns. The proposed method uses statistical interval estimation theory to reduce the number of signals required to estimate the toggle coverage, which normally requires transition information for all the signals in a circuit. Since this reduction decreases the size of the toggle measurement circuits on an FPGA, the toggle coverage can be estimated by an FPGA-based emulator operating at speeds on the order of MHz, roughly 10^4 to 10^5 times faster than HDL simulators. We confirmed by experiment that the average estimation error is within ±1% in actual LSI emulations. |
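The interval estimation step can be illustrated with a textbook confidence interval for a proportion. The Python sketch below assumes simple random sampling of signals without replacement and a normal approximation with finite population correction; the paper's exact statistical model may differ, and the function names and the 100,000-signal circuit are hypothetical.

```python
import math


def coverage_interval(toggled, sampled, total, z=1.96):
    """Confidence interval for toggle coverage estimated from a signal sample.

    toggled: sampled signals observed to toggle
    sampled: number of signals monitored on the FPGA
    total:   number of signals in the whole circuit
    z:       normal quantile (1.96 for roughly 95% confidence)
    """
    p_hat = toggled / sampled
    # Finite population correction: zero width when every signal is sampled.
    fpc = (total - sampled) / (total - 1) if total > 1 else 0.0
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / sampled * fpc)
    return max(0.0, p_hat - z * std_err), min(1.0, p_hat + z * std_err)


def required_sample_size(total, margin, z=1.96, p=0.5):
    """Signals to monitor so the half-width is at most `margin` (p = 0.5 is worst case)."""
    n0 = z * z * p * (1.0 - p) / (margin * margin)
    return math.ceil(n0 / (1.0 + (n0 - 1.0) / total))


# Hypothetical circuit with 100,000 signals and a +-1% target margin.
n = required_sample_size(100_000, 0.01)
low, high = coverage_interval(toggled=int(0.9 * n), sampled=n, total=100_000)
print(n, round(low, 4), round(high, 4))
```

The sample size depends mainly on the target margin rather than on circuit size, which is what lets a small measurement circuit stand in for all signals.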
Title | Blockage-Aware Routing Tree Construction with Concurrent Buffer and Flip-Flop Insertion |
Author | Shu-Yun Chen (Realtek Semiconductor Corp., Taiwan), *Ting-Chi Wang (National Tsing Hua University, Taiwan) |
Page | pp. 147 - 154 |
Keyword | Routing, Buffer/Flip-Flop Insertion, Physical Design |
Abstract | For high-frequency designs, concurrent buffer and flip-flop insertion becomes inevitable for interconnect delay optimization. To the best of our knowledge, all existing works perform concurrent buffer and flip-flop insertion on a given routing tree. The given routing tree, however, may greatly limit the effectiveness of concurrent buffer and flip-flop insertion. In this paper, we present a method that simultaneously constructs a routing tree and performs concurrent buffer and flip-flop insertion subject to latency constraints. We also propose four speed-up techniques to further reduce the computation time. The experimental results show that our method has a 90% success rate in generating a feasible solution, while a sequential method, which separates the tree construction and the concurrent buffer and flip-flop insertion into two steps, has only a 57% success rate. For the test cases in which both our method and the sequential method generate feasible solutions, our method has up to a 96% chance of producing better solutions. |
Title | Low-Power Clock Tree Synthesis by Low-Swing Techniques |
Author | Yun-Ta Lin (SpringSoft, Inc., Taiwan), *Hung-Ming Chen (Dept of EE and SoC Research Center, National Chiao Tung University, Taiwan) |
Page | pp. 155 - 160 |
Keyword | Clock Tree Synthesis, Low Power, Low Swing |
Abstract | Chips running at higher frequencies consume much more power. Without careful planning of the clock network, chips suffer from high power dissipation. In this paper, we present a methodology that can be applied in buffered clock tree synthesis to meet low-power demands and the zero-skew constraint. It is based on low-swing interconnections for clock signal transmission and low-swing double-edge triggered flip-flops as synchronizing elements. DME-based buffering is applied to reduce the number of inserted buffers as well as the wirelength, in order to lower power consumption. The experimental results are encouraging: we obtain an average power saving of 49% at an equivalent clock rate, compared with a previous work based on low-swing interconnection. |
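DME-based construction repeatedly merges two subtrees so that their sink delays become equal. The Python sketch below shows the classic zero-skew tapping-point computation under the Elmore delay model for a single merge; the unit wire resistance `r` and capacitance `c` and all names are illustrative assumptions, and the buffering and low-swing modeling from the paper are not included.

```python
def zero_skew_merge(t1, cap1, t2, cap2, length, r, c):
    """Find the tapping point on a wire joining two subtrees.

    t1, t2:     Elmore delays from each subtree root to its sinks
    cap1, cap2: total downstream capacitance of each subtree
    length:     wire length between the two subtree roots
    r, c:       wire resistance and capacitance per unit length
    Returns (x, delay, cap): the tap lies at distance x * length from
    subtree 1; delay is the equalized sink delay and cap the merged load.
    """
    wire_r = r * length
    wire_c = c * length
    x = (t2 - t1 + wire_r * (cap2 + wire_c / 2)) / (wire_r * (wire_c + cap1 + cap2))
    if not 0.0 <= x <= 1.0:
        # DME would elongate (snake) the wire here; this sketch does not.
        raise ValueError("tapping point outside the wire; wire elongation needed")
    seg1 = x * length
    delay = r * seg1 * (c * seg1 / 2 + cap1) + t1
    return x, delay, cap1 + cap2 + wire_c


x, delay, cap = zero_skew_merge(t1=10, cap1=2, t2=20, cap2=3, length=100, r=0.1, c=0.2)
print(x, delay, cap)
```

The formula comes from equating the Elmore delays from the tap point to both subtrees; x outside [0, 1] means the delay imbalance cannot be absorbed by the given wire alone.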
Title | Post-Silicon Clock-timing Tuning Based on Statistical Estimation |
Author | *Yuko Hashizume, Yasuhiro Takashima (The University of Kitakyushu, Japan), Yuichi Nakamura (NEC Corporation, Japan) |
Page | pp. 161 - 165 |
Keyword | deskew, linear programming, PDE |
Abstract | In deep-submicron technologies, process variations can severely affect the performance and yield of VLSI chips. As a countermeasure to these variations, post-silicon tuning has been proposed. Deskew, in which the clock timing of flip-flops (FFs) is tuned by inserting delay elements into the clock tree, is one such method. We propose a novel deskew method that determines delay values by measuring the clock timing of a small number of FFs and estimating the clock timings of the remaining FFs based on a statistical model. |
Title | Speed Enhancement Technique for the Post-fabrication Clock-timing Adjustment of Digital LSIs |
Author | *Tatsuya Susa (Graduate School of Science, Toho University, Japan), Masahiro Murakawa, Eiichi Takahashi (National Institute of Advanced Industrial Science and Technology, Japan), Tatsumi Furuya (Graduate School of Science, Toho University, Japan), Tetsuya Higuchi (National Institute of Advanced Industrial Science and Technology, Japan), Shinji Furuichi, Yoshitaka Ueda, Atsushi Wada (Sanyo Electric Co., Ltd, Japan) |
Page | pp. 166 - 173 |
Keyword | post-fabrication adjustment, adjustment simulation, process variation, yield, genetic algorithm |
Abstract | We propose a speed enhancement technique for post-fabrication clock-timing adjustment to enable practical applications. The method reduces adjustment time by reducing the number of adjustment points using static timing analysis (STA) results and by adopting an improved distribution for the initial GA population. Moreover, we have developed an adjustment simulator that predicts adjustment results with the proposed method at the LSI design stage. Adjustment experiments using the developed simulator demonstrate that our method can adjust practical LSIs with 1,031 flip-flops within a few seconds. |
Title | Repairs for Voltage Drop and Noise Violation in Late Design Stages |
Author | Shih-Tsung Huang (AnaGlobe Technology, Taiwan), *Hung-Ming Chen (Dept of EE and SoC Research Center, National Chiao Tung University, Taiwan) |
Page | pp. 174 - 178 |
Keyword | DSM, ECO, Voltage Drop, Crosstalk Noise |
Abstract | Since many second-order problems have emerged in the deep submicron (DSM) era, some critical functional changes in ECO cause inevitable timing and voltage drop violations. In this paper, we propose a methodology to reduce voltage drop and noise violations with minimal design changes, which can be used in ECO or late design stages. It can be easily plugged into the current design flow and is efficient, so that we can avoid excessive timing and voltage drop check iterations and repair power delivery damage with the limited resources available in late design stages. We formulate this problem as a longest path problem and fix the violations by using lower-metal-layer power lines for power compensation. We have integrated this framework with a commercial tool, and experimental results show that our methodology can successfully relieve noise and IR-drop violations in ECO or late design stages. |
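The abstract formulates the repair as a longest path problem without describing the graph, so the following Python sketch only illustrates the generic technique: longest path in a directed acyclic graph, computed in topological order. The node numbering, edge weights, and function name are hypothetical; how the paper maps power lines and violations to nodes and weights is not reproduced.

```python
from collections import deque


def longest_path(num_nodes, edges, source):
    """Longest path from `source` in a DAG.

    edges: list of (u, v, weight). Returns (dist, pred), where dist[v] is the
    best path weight (None if unreachable) and pred[v] is the previous node.
    """
    adjacency = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for u, v, w in edges:
        adjacency[u].append((v, w))
        indegree[v] += 1

    # Kahn's algorithm gives a topological order (and detects cycles).
    queue = deque(n for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adjacency[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != num_nodes:
        raise ValueError("graph has a cycle; longest path is not well defined")

    dist = [None] * num_nodes
    pred = [None] * num_nodes
    dist[source] = 0
    for u in order:
        if dist[u] is None:
            continue  # not reachable from source
        for v, w in adjacency[u]:
            if dist[v] is None or dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
    return dist, pred


dist, pred = longest_path(5, [(0, 1, 3), (0, 2, 2), (1, 3, 4), (2, 3, 6), (3, 4, 1)], 0)
print(dist)  # [0, 3, 2, 8, 9]
```

Longest path is NP-hard on general graphs, so an acyclic formulation is what makes this linear-time approach applicable.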
Title | Estimation of Yield Enhancement by Critical Path Reconfiguration Utilizing Random Variations on Deep-submicron FPGAs |
Author | *Yuuri Sugihara, Yohei Kume, Kazutoshi Kobayashi, Hidetoshi Onodera (Kyoto University, Japan) |
Page | pp. 179 - 183 |
Keyword | FPGA, variation-aware, yield enhancement |
Abstract | In this paper, we estimate the yield enhancement by critical path reconfiguration of deep submicron FPGAs, which suffer from drastic yield loss due to process variations. Critical path reconfiguration targets random process variations, which are hard to predict. First, an initial configuration for an implemented circuit is applied to all fabricated FPGAs and at-speed tests are performed. Then, failed signal paths are rerouted to different locations. Rerouting and at-speed testing are repeated several times to enhance yield. The locations of the critical paths are thus optimized incrementally, chip by chip, according to chip-specific random variations. A theoretical analysis verifies the effectiveness of critical path reconfiguration compared with multiple configurations, as a function of the number of critical paths in the presence of random variations. |
Title | A Mixed Integer Linear Programming Based Approach for Post-Routing Redundant Via Insertion |
Author | Kuang-Yao Lee, *Ting-Chi Wang (National Tsing Hua University, Taiwan), Kai-Yuan Chao (Intel Corporation, United States) |
Page | pp. 184 - 191 |
Keyword | Redundant via, Physical design, Design for manufacturability |
Abstract | Redundant via insertion is highly recommended to improve chip yield and reliability. The well-studied double-cut via insertion (DVI) problem allows a single via in a chip to have at most one redundant via inserted next to it, but this is insufficient, particularly for high-activity and power nets, because those nets typically need more redundant vias to further enhance reliability. This motivates us to study a new problem, called the multiple-cut via insertion (MVI) problem, in which one or more redundant vias can be inserted next to a single via such that the number of single vias with redundant vias inserted next to them and the number of inserted redundant vias are both maximized. We formulate the MVI problem as a mixed integer linear programming (MILP) problem. To make the problem tractable, we further break the MILP problem into a set of much smaller MILP problems, each of which is solved independently and efficiently without sacrificing optimality. Besides, we observe that the DVI problem is just a special case of the MVI problem, and therefore our MILP approach can be easily adapted to optimally solve the DVI problem as well. To the best of our knowledge, none of the existing DVI works can guarantee optimality. Extensive experimental results are provided to support the efficiency of our MILP approaches on both the MVI and DVI problems. |
Title | Fast Monotonic Via Assignment Excluding Mold Gates for 2-Layer Ball Grid Array Packages |
Author | *Yoichi Tomioka, Atsushi Takahashi (Tokyo Institute of Technology, Japan) |
Page | pp. 192 - 197 |
Keyword | ball grid array, package, monotonic, 2-layer, routing |
Abstract | Ball Grid Array packages, in which I/O pins are arranged in a grid array pattern, realize a large number of connections between chips and a printed circuit board, but routing them manually takes much time. We propose a fast routing method for 2-layer Ball Grid Array packages to support designers. By improving the via assignment iteratively, our method obtains a via assignment that distributes wires evenly on the top layer and achieves a high net completion ratio. |
Title | An I/O Planning Method for Three-Dimensional Integrated Circuits |
Author | *Chao-Hung Lu (National Central University, Taiwan), Hung-Ming Chen (National Chiao Tung University, Taiwan), Chien-Nan Jimmy Liu, Wen-Yu Shih (National Central University, Taiwan) |
Page | pp. 198 - 202 |
Keyword | I/O, Partition, 3D |
Abstract | 3DIC is an attractive alternative in chip design because this architecture offers high performance and high density. In this paper, we propose a partitioning methodology that addresses I/O assignment and the number of 3D-Vias in 3DIC design. The I/O partitioning method is based on the F-M algorithm and simultaneously considers the total number of 3D-Vias and the I/O count of each tier. Experimental results show that our approach reduces the number of 3D-Vias while balancing the I/O count across tiers. Additionally, our partitioning result can be integrated with the floorplanning algorithm. |
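F-M partitioning moves one cell at a time, choosing the move with the largest gain, i.e., the largest reduction in the number of cut nets. The Python sketch below computes the standard initial F-M gains for a two-way split; reading the two sides as two tiers, where a cut net loosely stands for a signal needing a 3D-Via, is an assumption of this sketch. The paper additionally accounts for per-tier I/O balance, which is not modeled here, and the names are hypothetical.

```python
def fm_gains(nets, side):
    """Initial Fiduccia-Mattheyses gains for a two-way partition.

    nets: list of nets, each a list of cell names
    side: dict mapping each cell to partition 0 or 1
    gain[v] = (nets that become uncut if v moves) - (nets that become cut).
    """
    gains = {cell: 0 for cell in side}
    for net in nets:
        count = [0, 0]
        for cell in net:
            count[side[cell]] += 1
        for cell in net:
            from_side = side[cell]
            to_side = 1 - from_side
            # For a one-cell net both rules fire and cancel out.
            if count[from_side] == 1:
                gains[cell] += 1    # cell is alone on its side: moving it uncuts the net
            if count[to_side] == 0:
                gains[cell] -= 1    # net lies wholly on cell's side: moving it cuts the net
    return gains


nets = [["a", "b"], ["a", "c"], ["b", "c", "d"]]
side = {"a": 0, "b": 0, "c": 1, "d": 1}
print(fm_gains(nets, side))  # {'a': 0, 'b': 0, 'c': 1, 'd': 0}
```

In a full F-M pass these gains are kept in bucket lists and updated incrementally after each move, which keeps each pass linear in the number of pins.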
Title | Non-Slicing Floorplanning-Based Crosstalk Reduction on Gridless Track Assignment |
Author | *Wen-Nai Cheng, Yu-Ning Chang, Yih-Lang Li (National Chiao-Tung University, Taiwan) |
Page | pp. 203 - 207 |
Keyword | VLSI design, physical design, Gridless Routing, Track Assignment, Crosstalk minimization |
Abstract | Track assignment, an intermediate stage between global routing and detailed routing, provides a good platform for improving performance and for imposing additional constraints during routing, such as crosstalk. Gridless track assignment (GTA) has not been addressed in the public literature. This work develops a gridless crosstalk-driven GTA. An initial assignment is produced rapidly with a left-edge-like algorithm. Crosstalk reduction on the assignment is then transformed into a restricted non-slicing floorplanning problem, and a deterministic O-tree-based algorithm is employed to re-assign each net segment. Finally, each panel is partitioned into several sub-panels, and the sub-panels are re-ordered using a branch-and-bound algorithm to further decrease crosstalk. Experimental results demonstrate that the proposed gridless crosstalk-driven GTA achieves over 80% reduction in the overlapping length of adjacent wires. |
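The initial assignment uses a left-edge-like algorithm. The classic left-edge idea is sketched below in Python for segments on one panel: sort segments by left end, then place each on the first track whose last segment ends before it starts. This sketch treats tracks as discrete and ignores wire widths, spacing, and crosstalk, so it simplifies the gridless setting in the paper; the names are hypothetical.

```python
def left_edge(segments):
    """Assign interval segments to tracks so that segments on a track never overlap.

    segments: list of (name, left, right) with left <= right.
    Returns dict name -> track index (0-based).
    """
    track_end = []      # right end of the last segment placed on each track
    assignment = {}
    for name, left, right in sorted(segments, key=lambda seg: seg[1]):
        for track, end in enumerate(track_end):
            if end < left:               # fits after the last segment on this track
                track_end[track] = right
                assignment[name] = track
                break
        else:                            # no existing track fits: open a new one
            track_end.append(right)
            assignment[name] = len(track_end) - 1
    return assignment


segs = [("A", 0, 4), ("B", 2, 6), ("C", 5, 9), ("D", 7, 10)]
print(left_edge(segs))  # {'A': 0, 'B': 1, 'C': 0, 'D': 1}
```

Processing segments in left-end order and reusing the first free track uses the minimum number of tracks for pure interval overlap, which is why the left-edge idea makes a fast starting point before crosstalk-driven refinement.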
Title | Fujimaki-Takahashi Squeeze : Linear Time Construction of Constraint Graphs of a Floorplan for a Given Permutation |
Author | *Ryo Fujimaki, Toshihiko Takahashi (Niigata University, Japan) |
Page | pp. 208 - 213 |
Keyword | Floorplan, Representation, Permutation, Constraint graph |
Abstract | A floorplan is a subdivision of a rectangle into rectangular faces by horizontal and vertical line segments. We call a floorplan room-to-room when adjacency between rooms is considered. Fujimaki and Takahashi showed that any room-to-room floorplan can be represented as a permutation. In this paper, we give an O(n)-time algorithm that constructs the vertical and horizontal constraint graphs of a floorplan from a given permutation under this representation. |
Title | Placement with Symmetry Constraints for Analog IC Layout Design based on Tree Representation |
Author | *Natsumi Hirakawa, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan) |
Page | pp. 214 - 221 |
Keyword | symmetry constraints, O-tree |
Abstract | Symmetry constraints require that given cells be placed symmetrically in the design of analog ICs. We use an O-tree to represent placements and propose a decoding algorithm that obtains a closest packing satisfying the constraints. Since the decoding algorithm uses linear programming, which is time-consuming, we also propose a graph-based method to judge whether a packing corresponding to a given O-tree exists, and apply it before linear programming. The effectiveness of the proposed method was shown by computational experiments. |
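For context, the basic O-tree decoding without symmetry constraints works as follows: blocks are visited in depth-first order, each block's x-coordinate is its parent's x plus the parent's width, and its y-coordinate is the lowest position that clears the previously placed blocks overlapping it horizontally. The Python below is a minimal sketch of that standard horizontal O-tree decoding with a virtual zero-width root; it does not implement the paper's LP-based symmetric decoding or its feasibility check, and the names are hypothetical.

```python
def decode_otree(children, widths, heights, root="ROOT"):
    """Decode a horizontal O-tree into block coordinates.

    children: dict node -> ordered list of child blocks (root is virtual)
    widths, heights: dicts giving each block's size
    Returns dict block -> (x, y, width, height).
    """
    x_of = {root: 0}
    width_of = dict(widths)
    width_of[root] = 0
    placed = {}

    def visit(node):
        for child in children.get(node, []):
            x = x_of[node] + width_of[node]      # child abuts its parent on the right
            w, h = widths[child], heights[child]
            y = 0
            # Lowest y that clears every earlier block overlapping [x, x + w).
            for px, py, pw, ph in placed.values():
                if px < x + w and x < px + pw:
                    y = max(y, py + ph)
            placed[child] = (x, y, w, h)
            x_of[child] = x
            visit(child)                          # depth-first order

    visit(root)
    return placed


tree = {"ROOT": ["A", "B"], "A": ["C"]}
coords = decode_otree(tree, widths={"A": 2, "B": 3, "C": 1}, heights={"A": 2, "B": 1, "C": 3})
print(coords)  # {'A': (0, 0, 2, 2), 'C': (2, 0, 1, 3), 'B': (0, 3, 3, 1)}
```

Because x-coordinates follow directly from the tree, symmetry constraints mainly restrict which trees and coordinates are admissible, which is where an LP-based decoding becomes necessary.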