The 15th Workshop on Synthesis And System Integration of Mixed Information technologies

Poster I: Low Power and Timing
Time: 10:00 - 11:45 Monday, March 9, 2009
Location: Waikele & Kaneohe
Chairs: Jimmy Chien-Nan Liu (National Central University, Taiwan), Hiroaki Yoshida (University of Tokyo, Japan)

R1-1 (Time: 10:00 - 10:03)
Title: A New RTL Power Macro-modeling and Efficient Power Estimation Scheme
Author: *Masaaki Ohtsuki, Masato Kawai, Masahiro Fukui (Ritsumeikan University, Japan)
Page: pp. 11 - 16
Keyword: Power macro model, Power consumption estimation, Power model library, LUT
Abstract: In this paper, we propose a new, efficient power modeling environment that uses a look-up table (LUT). Compared with conventional algorithms, it greatly reduces the size of the LUT, which makes power analysis and library building highly efficient. Experimental results show that our approach reduces the computation time for building the library to one tenth while maintaining the accuracy of the power analysis. The RMS error and the largest error were less than 15.23% and +/-60%, respectively.
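The abstract does not say what the table is indexed by or how its size is reduced. As background only, the sketch below shows how a conventional two-dimensional LUT power macro-model is typically queried by bilinear interpolation. Indexing by input signal probability and transition density, and the names `lut_power` and `_bracket`, are assumptions; the paper's size-reduction technique is not reproduced.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Index i of the grid interval [axis[i], axis[i+1]] used for x.

    Points outside the grid use the nearest edge interval, so they are
    linearly extrapolated."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def lut_power(p_axis, d_axis, table, p, d):
    """Bilinearly interpolate average power from a characterized 2-D LUT.

    p_axis: input signal probabilities used during characterization (assumed index)
    d_axis: input transition densities used during characterization (assumed index)
    table[i][j]: characterized power at (p_axis[i], d_axis[j])
    """
    i = _bracket(p_axis, p)
    j = _bracket(d_axis, d)
    tp = (p - p_axis[i]) / (p_axis[i + 1] - p_axis[i])
    td = (d - d_axis[j]) / (d_axis[j + 1] - d_axis[j])
    return ((1 - tp) * (1 - td) * table[i][j]
            + (1 - tp) * td * table[i][j + 1]
            + tp * (1 - td) * table[i + 1][j]
            + tp * td * table[i + 1][j + 1])
```

Each table entry has to be characterized by simulation, so the number of entries (the product of the axis lengths) drives library-building time.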

R1-2 (Time: 10:03 - 10:06)
Title: An Efficient Hardware Circuit Simulator for Power Grid Optimization System
Author: *Taiki Hashizume (EECS Course, Graduate School of Science and Engineering, Ritsumeikan University, Japan), Shinichi Nishizawa, Hisako Sugano (Department of VLSI System Design, Ritsumeikan University, Japan), Masaya Yoshikawa (Meijo University, Japan), Masahiro Fukui (Department of VLSI System Design, Ritsumeikan University, Japan)
Page: pp. 17 - 22
Keyword: hardware simulator, power grid, optimization
Abstract: This paper discusses an efficient hardware circuit simulator for power grid optimization and focuses on two points: (1) the simulator achieves high-speed simulation through dedicated hardware and parallel processing; (2) for simulation accuracy, it uses hardware-oriented fixed-point arithmetic instead of floating-point arithmetic and achieves high accuracy by controlling the simulation intervals. Experiments show that the proposed simulator, using a 79 MHz FPGA with eight-way parallel processing, simulates 32 times faster than software running on a 2.8 GHz CPU while maintaining the same accuracy, measured against SPICE simulation.
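To illustrate what replacing floating point with hardware-oriented fixed point involves, here is a minimal sketch that advances a single hypothetical power-grid node (C dV/dt = I - G V) with forward Euler, once in Q16 integer arithmetic and once in floating point. The Q16 format, the one-node model, the parameter values, and all function names are assumptions; the paper's interval control and parallel hardware are not modeled.

```python
FRAC_BITS = 16            # assumed Q16 format: 16 fractional bits
ONE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Quantize a real value to Q16 (round to nearest)."""
    return int(round(x * ONE))

def to_float(q: int) -> float:
    return q / ONE

def fx_mul(a: int, b: int) -> int:
    """Multiply two Q16 values; the 32-fractional-bit product is truncated back to Q16."""
    return (a * b) >> FRAC_BITS

def rc_step_fixed(v: int, i_src: int, g: int, dt_over_c: int) -> int:
    """One forward-Euler step of C dV/dt = I_src - G*V, all operands in Q16."""
    return v + fx_mul(dt_over_c, i_src - fx_mul(g, v))

def simulate(n_steps: int, i_src=1.0, g=2.0, dt_over_c=0.05):
    """Run the same node model in fixed and floating point; return both final voltages."""
    v_fx, v_fl = 0, 0.0
    i_fx, g_fx, k_fx = to_fixed(i_src), to_fixed(g), to_fixed(dt_over_c)
    for _ in range(n_steps):
        v_fx = rc_step_fixed(v_fx, i_fx, g_fx, k_fx)
        v_fl += dt_over_c * (i_src - g * v_fl)
    return to_float(v_fx), v_fl
```

Both versions settle near I/G = 0.5; the small residual gap comes from truncation in `fx_mul`, and adding fractional bits shrinks it at the cost of wider multipliers.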

R1-3 (Time: 10:06 - 10:09)
Title: IR-Drop-Aware Buffer/Flip-Flop Station Planning in Floorplan Design
Author: Hsin-Hwa Pan (AnaGlobe Technology, Inc., Taiwan), *Hung-Ming Chen (National Chiao Tung University, Taiwan), Chia-Yi Chang (Realtek Semiconductor Corp., Taiwan)
Page: pp. 23 - 28
Keyword: Power Integrity, Buffer/FF Station, Floorplan Design
Abstract: As technology scales down, interconnect has become the dominant factor in determining overall circuit performance and complexity. Buffer insertion is one of the most effective techniques for improving interconnect performance. To find better locations for buffers, the buffer insertion stage during floorplanning usually clusters buffers in a region, which may cause additional IR-drop violations. Moreover, in complex digital systems with relatively large die areas operating at very high frequencies, many global signals traveling across the chip need several clock cycles to reach their destinations, which requires pipelined interconnects. Together with the buffer stations/blocks, the increasing number of flip-flops causes further voltage-drop violations. In this paper, we propose a methodology that pipelines interconnects during the floorplan stage while considering IR drop in the planning of buffers and flip-flops at the same time. Experimental results show that our method achieves low system latency while preserving power integrity in a 90 nm technology node.
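The pipelining requirement comes down to simple arithmetic: when a global wire's delay spans several clock periods, flip-flops must split it into segments that each fit in one period. This minimal sketch assumes flip-flops can be placed exactly where needed and models only a setup-time margin (no clock-to-Q delay or skew); the function name `pipeline_stages` is hypothetical.

```python
import math

def pipeline_stages(wire_delay_ps: float, clock_period_ps: float, setup_ps: float = 0.0):
    """Return (latency_cycles, flip_flops_to_insert) for one global net.

    Assumes ideal flip-flop placement: each pipeline segment may use the clock
    period minus the setup time; clock-to-Q delay and skew are ignored.
    """
    usable = clock_period_ps - setup_ps
    if usable <= 0:
        raise ValueError("clock period must exceed the setup time")
    cycles = max(1, math.ceil(wire_delay_ps / usable))
    return cycles, cycles - 1
```

For example, a 2500 ps wire at a 1000 ps clock needs 3 cycles and 2 inserted flip-flops. Each of those flip-flops adds switching current wherever it is placed, which is why the abstract plans them together with buffer stations under an IR-drop constraint.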

R1-4 (Time: 10:09 - 10:12)
Title: IR Drop-Driven Algorithm for Standard Cell Placement Considering Timing Windows
Author: *Naoki Kitamura, Nobuyuki Umakoshi, Kaoru Okazaki (Osaka Electro-Communication University, Japan), Masayuki Terai (Osaka Gakuin University, Japan)
Page: pp. 29 - 34
Keyword: placement algorithm, power supply network, IR drop reduction, timing window
Abstract: This paper proposes a novel IR drop-driven algorithm for standard cell placement. We introduce our own function H, which estimates the static IR drop of a standard cell placement. To improve the accuracy of H, the timing windows of the cells and the short-circuit current they draw during output state transitions are taken into account. The proposed algorithm improves an initial placement using simulated annealing with H as the cost function. Experimental results show that the proposed algorithm is effective.
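The abstract gives the optimization framework (simulated annealing with H as the cost) but not H itself. The sketch below is a generic annealer with a swap move; `region_current_cost` is a hypothetical stand-in for H (sum of squared cell current per power-grid region) that ignores timing windows and short-circuit current. All names and schedule parameters are assumptions.

```python
import math
import random

def anneal(initial, cost, neighbor, t0=10.0, alpha=0.95, moves_per_t=100, t_min=1e-3, seed=0):
    """Generic simulated annealing that minimizes `cost` over placements."""
    rng = random.Random(seed)
    cur = list(initial)
    cur_cost = cost(cur)
    best, best_cost = list(cur), cur_cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            cand = neighbor(cur, rng)
            delta = cost(cand) - cur_cost
            # Metropolis rule: accept improvements always, uphill moves with prob. exp(-delta/t)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur, cur_cost = cand, cur_cost + delta
                if cur_cost < best_cost:
                    best, best_cost = list(cur), cur_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost

def swap_two_cells(placement, rng):
    """Neighbor move: exchange the cells in two random slots."""
    i, j = rng.sample(range(len(placement)), 2)
    cand = list(placement)
    cand[i], cand[j] = cand[j], cand[i]
    return cand

def region_current_cost(placement, slots_per_region=2):
    """Hypothetical stand-in for H: sum of squared cell current per power-grid region.

    placement[slot] is the (assumed static) current of the cell in that slot;
    concentrating high-current cells in one region raises the cost."""
    return sum(
        sum(placement[k:k + slots_per_region]) ** 2
        for k in range(0, len(placement), slots_per_region)
    )
```

Starting from `[4, 4, 1, 1]` (both high-current cells in one region, cost 68), the annealer spreads them across regions (cost 50).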

R1-5 (Time: 10:12 - 10:15)
Title: Energy Dissipation Reduction of Arithmetic Operations with Valid Digits
Author: *Kazuhito Ito, Yorito Nagasaka (Saitama University, Japan)
Page: pp. 35 - 40
Keyword: low power, functional unit, adder, multiplier
Abstract: An effective way to reduce energy dissipation in an LSI chip is to reduce how often signal values change. In this paper, a valid digit bit accompanies the data and indicates whether the corresponding digit must be processed in an arithmetic operation or whether its processing can be omitted to reduce signal value changes. Experimental results show that the proposed functional units with the valid digit bit effectively reduce energy dissipation.
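A small model shows why a valid digit bit can cut signal transitions. The abstract does not define when a digit is valid, so this sketch assumes a hypothetical rule (a digit is invalid when it and all higher digits are zero), 16-bit operands split into four 4-bit digits, and that invalid digit lines simply hold their previous value. It counts bit flips on the data lines only; transitions on the valid-bit lines themselves are not counted.

```python
DIGIT_BITS = 4   # assumed digit width
N_DIGITS = 4     # assumed 16-bit operands split into four 4-bit digits

def digits(x: int):
    """Split x into N_DIGITS digits, least significant first."""
    mask = (1 << DIGIT_BITS) - 1
    return [(x >> (DIGIT_BITS * k)) & mask for k in range(N_DIGITS)]

def valid_flags(x: int):
    """Hypothetical rule: a digit is valid unless it and all higher digits are zero."""
    d = digits(x)
    flags = [False] * N_DIGITS
    seen_nonzero = False
    for k in reversed(range(N_DIGITS)):
        seen_nonzero = seen_nonzero or d[k] != 0
        flags[k] = seen_nonzero
    return flags

def count_toggles(seq, use_valid_bits: bool) -> int:
    """Count bit transitions on the digit lines of a bus carrying `seq`.

    With valid bits, an invalid digit's lines keep their previous value (the
    receiver treats the digit as zero from its flag), so they do not toggle.
    Transitions on the valid-bit lines themselves are not counted here.
    """
    held = [0] * N_DIGITS
    total = 0
    for x in seq:
        d = digits(x)
        flags = valid_flags(x) if use_valid_bits else [True] * N_DIGITS
        for k in range(N_DIGITS):
            if flags[k]:
                total += bin(held[k] ^ d[k]).count("1")
                held[k] = d[k]
    return total
```

For the alternating sequence 0xFFFF, 0x0001, 0xFFFF, 0x0001, the data lines toggle 61 times without valid bits and 25 times with them under these assumptions.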

R1-6 (Time: 10:15 - 10:18)
Title: Power Efficiency Index for Low Power LSI Design
Author: *Yutaka Tamiya (Fujitsu Laboratories Limited, Japan), Masahiro Fujita (University of Tokyo, Japan)
Page: pp. 41 - 46
Keyword: Low Power, Clock Gating
Abstract: Low power is one of the most important issues for LSIs today. However, it is very hard to detect wasted power in large-scale LSIs. In this paper, we propose the "Power Efficiency Index (PEI)", which shows how efficiently a module consumes power to accomplish its task and suggests which hardware modules may be wasting power. PEI is defined as the ratio of the amount of output data to power consumption. The amount of output data indicates how much effect the module has outside itself, and it is easily calculated from a simulation trace log. In our case studies, we applied PEI to hardware optimization and showed that it is useful for power optimization: we detected incomplete clock-gating logic in hardware and achieved power reductions of 13.9% and 24.4%.
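Following the definition in the abstract (output data divided by power consumption), a minimal sketch of computing and ranking PEI per module might look like this. The trace format of `(module, bits)` pairs, the units, and all names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class ModuleStats:
    name: str
    output_bits: int   # amount of output data, counted from a simulation trace
    power_mw: float    # power consumption, e.g. from a power estimation tool

def output_bits_from_trace(trace, module):
    """Sum the data a module emits; trace entries are hypothetical (module, bits) pairs."""
    return sum(bits for mod, bits in trace if mod == module)

def pei(stats: ModuleStats) -> float:
    """Power Efficiency Index: amount of output data per unit of power."""
    return stats.output_bits / stats.power_mw

def rank_by_pei(modules):
    """Lowest PEI first: the modules most likely to be wasting power."""
    return sorted(modules, key=pei)
```

A module that draws power while producing little output, such as one with incomplete clock gating, sorts to the front of the list.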

R1-7 (Time: 10:18 - 10:21)
Title: A Microprocessor-based Architecture for a Smart in vivo Biosensor
Author: *Yohei Fukumizu, Tomonori Izumi, Hironori Yamauchi (Ritsumeikan University, Japan)
Page: pp. 47 - 51
Keyword: in vivo, biosensor, low invasive, health care
Abstract: A microprocessor-based chip architecture for in vivo health care devices is presented. Since the microprocessor is intended for use in a biosensor, a capsule-type endoscope, and a micro-surgery robot, the chip needs to contain a sensor interface, an actuator interface, and a video controller in addition to a microprocessor unit and a wireless communication circuit. A test implementation with 23,928 gates in 180 nm standard CMOS technology is demonstrated to validate its operation.

R1-8 (Time: 10:21 - 10:24)
Title: Low Power Unequal Error Protection Media System Based on Error Concealment in H.264/AVC
Author: *Yichun Tang, Jun Wang, Naoki Tajima, Satoshi Goto (Graduate School of Information, Production and Systems, Waseda University, Japan)
Page: pp. 52 - 57
Keyword: UEP, H.264/AVC, LDPC
Abstract: Currently used Error Concealment (EC) has several disadvantages, and the power consumed by error resilience tools significantly affects the battery life of mobile terminals (e.g., cell phones). In this paper, we introduce a novel low-power, error-robust media system based on Unequal Error Protection (UEP) that integrates multi-rate Low Density Parity-Check (LDPC) codes as forward error correction (FEC) tools with an H.264 codec. With our two proposed classification algorithms and a UEP method based on motion stability estimation, the system greatly reduces power consumption, and its video quality outperforms that of the original method.

R1-9 (Time: 10:24 - 10:27)
Title: An Experimental Comparison of Power Analysis Attacks against RSA Processors on ASIC and FPGA
Author: *Atsushi Miyamoto, Naofumi Homma, Takafumi Aoki (Tohoku University, Japan), Akashi Satoh (National Institute of Advanced Industrial Science and Technology, Japan)
Page: pp. 58 - 63
Keyword: Circuit analysis, Cryptographic hardware, Security evaluation, Side-channel attacks, Power analysis
Abstract: This paper presents Simple Power Analysis (SPA) attacks using chosen-message techniques against RSA processors and investigates in detail the different characteristics of power waveforms caused by two types of implementation (ASIC and FPGA). We also present Comparative Power Analysis, an advanced power analysis attack in which a pair of input data is used to enhance the waveform pattern of modular exponentiation. The results clearly show that the power dissipation of modular squaring in the difference waveform was greatly reduced compared with that of modular multiplication, allowing all of the secret key bits to be revealed.
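The reason that telling squaring apart from multiplication exposes the whole key can be seen in the textbook left-to-right square-and-multiply algorithm: every exponent bit costs one squaring, and only 1 bits add a multiplication. This sketch assumes that algorithm (the abstract does not name the processors' exponentiation method) and does not model power waveforms, chosen messages, or the paper's comparative analysis; the function names are hypothetical.

```python
def modexp_with_trace(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply that also records its operation sequence.

    'S' = modular squaring, 'M' = modular multiplication by the base. An SPA
    attacker who can tell the two apart in a power waveform sees this sequence.
    """
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        ops.append("S")
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")
    return result, ops

def recover_exponent(ops) -> int:
    """Read the key bits back: 'S' followed by 'M' is a 1, a lone 'S' is a 0."""
    bits = []
    i = 0
    while i < len(ops):
        if ops[i] != "S":
            raise ValueError("every key bit starts with a squaring")
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)
```

For exponent 45 (binary 101101) the trace begins S, M, S, S, M, and decoding the full trace returns 45.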

R1-10 (Time: 10:27 - 10:30)
Title: On Using Spare Cells for Functional Changes with Wirelength Consideration
Author: *Yun-Ru Wu, Shu-Yun Chen (Realtek Semiconductor Corp., Taiwan), Kuang-Yao Lee, Ting-Chi Wang (National Tsing Hua University, Taiwan)
Page: pp. 64 - 69
Keyword: spare cell, ECO functional change
Abstract: In current industrial design methodologies, designers often use spare cells when they need to make functional changes or fix timing problems. However, realizing functional changes with spare cells is complex and difficult. The process can consist of two steps: technology mapping and spare cell selection. Traditional technology mapping only maps functions onto the cells of a library without considering any resource constraint, so it is not suitable for this methodology. After technology mapping, how to select spare cells is also an important issue, because bad selections seriously impact the result. In this paper, we study the problem of making functional changes with spare cells and present an approach that solves it efficiently with the goal of minimizing the increase in wirelength. Our approach consists of a technology mapping method and a legalization method that work together to generate an initial selection of spare cells, followed by a refinement process that improves the selection by further reducing the wirelength increase. We also propose two methods for the refinement process. Experimental results demonstrate the effectiveness and efficiency of our approach.

R1-11 (Time: 10:30 - 10:33)
Title: A Gaussian Mixture Model to Propagate Delay and Slew Distributions Together in Statistical Timing Analysis
Author: *Shingo Takahashi, Shuji Tsukiyama (Chuo University, Japan)
Page: pp. 70 - 75
Keyword: statistical timing analysis, Gaussian mixture model, delay distribution, slew distribution, variability
Abstract: To improve the performance of current statistical timing analysis, a mechanism that propagates slews together with delay distributions along signal paths is necessary, since the delay of any circuit element depends on its input slew, and the input slew is tied to the propagated input, which is itself determined by delay values. In this paper, we introduce Gaussian mixture models to represent delay distributions and propose a novel algorithm that propagates a pair consisting of a delay distribution and a slew distribution through a given circuit graph. With a Gaussian mixture model, we can appropriately represent a non-Gaussian delay distribution generated by the statistical Max or Min operation, and handle topological correlations easily by storing the necessary covariance values. Moreover, by propagating slews together with delay distributions, we can dynamically modify the delay distributions of circuit elements according to the propagated slew distributions. An experimental result shows that, compared with statistical timing analysis using simple Gaussian distributions, the proposed algorithm reduces the error in the mean+3sigma value, with a maximum improvement of 4.5 points.
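A Gaussian mixture stores a delay distribution as weighted normal components, so quantities such as the mean, variance, CDF, and the mean+3sigma figure quoted in the abstract follow directly from the component parameters. This is a minimal sketch of those standard formulas only; it does not implement the paper's propagation algorithm, its Max/Min operations, covariance handling, or slew coupling, and the function names are hypothetical.

```python
import math

# A mixture is a list of (weight, mean, std) components; weights are normalized below.

def mixture_mean_var(mix):
    """Mean and variance of a Gaussian mixture from its component moments."""
    total_w = sum(w for w, _, _ in mix)
    mean = sum(w * mu for w, mu, _ in mix) / total_w
    second_moment = sum(w * (sd * sd + mu * mu) for w, mu, sd in mix) / total_w
    return mean, second_moment - mean * mean

def mixture_cdf(mix, x):
    """P(delay <= x): weighted sum of the component normal CDFs."""
    total_w = sum(w for w, _, _ in mix)
    return sum(
        w * 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))
        for w, mu, sd in mix
    ) / total_w

def mean_plus_3sigma(mix):
    """The mean+3sigma statistic computed from the mixture's overall moments."""
    mean, var = mixture_mean_var(mix)
    return mean + 3.0 * math.sqrt(var)
```

For two equal-weight components N(0, 1) and N(4, 1), the mixture has mean 2 and variance 5, yet its density has two peaks; a single Gaussian with the same two moments would lose that shape.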

R1-12 (Time: 10:33 - 10:36)
Title: Embedded Delay Detectors to Choose the Fastest Route in FPGAs for Variation-aware Reconfiguration
Author: *Yohei Kume, Yuuri Sugihara, Camlai Ngo, Kazutoshi Kobayashi, Hidetoshi Onodera (Kyoto University, Japan)
Page: pp. 76 - 81
Keyword: variation, FPGA, reconfiguration
Abstract: We propose a variation-aware post-fabrication optimization scheme for FPGAs using delay detectors. Variation-aware optimization usually incurs a huge measurement cost. The proposed scheme achieves a constant optimization cost for any circuit configuration. Delay detectors are embedded in clustered CLBs to choose the fastest paths among multiple candidates; they enable simultaneous measurement of critical path candidates by partitioning all critical paths into segments. We fabricated a test chip in a 90 nm process and confirmed a detection capability of less than 10 ps.

R1-13 (Time: 10:36 - 10:39)
Title: Performance-Driven Architectural Synthesis for Multicycle Communication
Author: *Chia-I Chen, Juinn-Dar Huang (Department of Electronics Engineering, National Chiao Tung University, Taiwan)
Page: pp. 82 - 87
Keyword: multicycle communication, Architectural Synthesis, distributed register architecture
Abstract: In the deep-submicron era, wire delay is no longer negligible and is gradually dominating system performance. To address this problem, several state-of-the-art architectural synthesis flows have been proposed for the distributed register architecture that allow on-chip multicycle communication. In this paper, we present CriAS, a new performance-driven, criticality-aware synthesis flow targeting regular distributed register architectures. CriAS features a hierarchical binding strategy and a coarse-grained placer to minimize the number of critical global data transfers. The key ideas are to treat timing criticality as the major concern in the earlier binding stages, before detailed physical placement information is available, and to preserve the locality of closely related critical components in the later placement phase. Experimental results show an average overall performance improvement of 19% compared with previous work.

R1-14 (Time: 10:39 - 10:42)
Title: A Fast Regular Expression Matching Engine for an FPGA-based Network Intrusion Detection System
Author: *Yosuke Kawanaka, Shin'ichi Wakabayashi, Shinobu Nagayama (Hiroshima City University, Japan)
Page: pp. 88 - 93
Keyword: pattern matching, network intrusion detection, regular expression, FPGA
Abstract: This paper presents a high-performance pattern matching engine for network intrusion detection. In the proposed engine, a pattern is specified by a subclass of regular expressions. Since the proposed circuit is based on a pattern-independent architecture, it allows dynamic pattern updating, which is important for network intrusion detection. By processing multiple packets (character strings) simultaneously, our pattern matching engine achieves high throughput. This paper also presents a new FPGA-based network intrusion detection system (NIDS) architecture using our pattern matching engine.
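One software analogue of a pattern-independent matcher is the bit-parallel Shift-And algorithm: the scanning loop is fixed, and changing the pattern only rewrites a mask table, much as updating patterns in a table-driven circuit avoids resynthesis. This sketch is a swapped-in technique, not the paper's engine; it assumes a regular-expression subclass limited to concatenations of character classes (no repetition) and scans one packet at a time. The function names are hypothetical.

```python
def compile_classes(classes):
    """Build Shift-And masks for a pattern given as a list of byte sets.

    classes[k] is the set of byte values allowed at pattern position k
    (a concatenation of character classes, with no repetition operators).
    """
    masks = [0] * 256
    for k, cls in enumerate(classes):
        for ch in cls:
            masks[ch] |= 1 << k
    return masks, 1 << (len(classes) - 1)

def shift_and_search(data: bytes, masks, accept_bit):
    """Return end offsets of matches. The loop never depends on the pattern:
    updating the pattern only means replacing `masks` and `accept_bit`."""
    state = 0
    hits = []
    for i, b in enumerate(data):
        # bit k set <=> the first k+1 pattern positions match ending at i
        state = ((state << 1) | 1) & masks[b]
        if state & accept_bit:
            hits.append(i)
    return hits
```

For the pattern `a[bc]d`, scanning `b"xabdacdabd"` reports matches ending at offsets 3, 6, and 9.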

R1-15 (Time: 10:42 - 10:45)
Title: Fast Division Circuit in GF(2^m) Based on the Extended Euclid's Algorithm with Parallelization of Modular Reductions
Author: *Katsuki Kobayashi, Naofumi Takagi (Nagoya University, Japan)
Page: pp. 94 - 99
Keyword: Galois field, division, Euclid's algorithm
Abstract: We propose a fast division circuit in GF(2^m). It is based on the extended Euclid's algorithm and requires only one cycle to perform operations that take two cycles in previously reported division circuits based on the extended Euclid's algorithm. Since the proposed circuit performs modular reductions in parallel by changing the order in which the operations are executed, it has almost the same critical path delay as the previously proposed circuits. The proposed circuit computes a division in m clock cycles, whereas the previously proposed circuits take 2m-1 or more clock cycles. Based on logic synthesis, the computation time of the proposed circuit is estimated to be over 35% shorter than that of a previously proposed circuit.
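For reference, the operation the circuit computes, a/b in GF(2^m), can be written with the standard binary (shift-and-XOR) extended Euclidean algorithm, initialized with a instead of 1 so that it yields a quotient rather than an inverse. This is a sequential software sketch: it does not model the paper's parallel modular reductions or its m-cycle schedule. Polynomials are encoded as Python integers (bit i is the coefficient of x^i), inputs are assumed already reduced below x^m, and the example field polynomial x^4 + x + 1 is an assumption.

```python
def gf_mul(a: int, b: int, f: int) -> int:
    """Multiply a*b in GF(2^m) defined by the irreducible polynomial f (bit-vector form)."""
    m = f.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= f          # reduce the x^m term using f
    return r

def _div_by_x(g: int, f: int) -> int:
    """Compute g / x mod f; f has constant term 1, so g ^ f is even when g is odd."""
    return g >> 1 if (g & 1) == 0 else (g ^ f) >> 1

def gf_div(a: int, b: int, f: int) -> int:
    """Return a / b in GF(2^m) with the binary extended Euclidean algorithm.

    Invariants: b*g1 == a*u and b*g2 == a*v (mod f). When u or v reaches 1,
    the matching g holds a/b.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^m)")
    u, v, g1, g2 = b, f, a, 0
    while u != 1 and v != 1:
        while (u & 1) == 0:
            u >>= 1
            g1 = _div_by_x(g1, f)
        while (v & 1) == 0:
            v >>= 1
            g2 = _div_by_x(g2, f)
        if u.bit_length() > v.bit_length():
            u ^= v
            g1 ^= g2
        else:
            v ^= u
            g2 ^= g1
    return g1 if u == 1 else g2
```

In GF(2^4) with f = x^4 + x + 1, `gf_div(1, 0b10, f)` returns `0b1001`, the inverse of x, since x(x^3 + 1) = x^4 + x = 1.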