| Monday, March 9, 2009 |
| Title | Analysis and Optimization of Power-Gated Designs |
| Author | Ming-Chao Lee, De-Shiuan Chiou, *Shih-Chieh Chang (National Tsing Hua University, Taiwan) |
| Page | pp. 3 - 8 |
| Abstract | Leakage power has become a major contributor to total power consumption. A very popular technique for reducing leakage power is power gating. In this paper, we discuss two important optimization issues in power-gated designs. First, we discuss the sizing problem of the sleep transistors, which is a trade-off between transistor size and IR drop noise. Second, we discuss wakeup scheduling optimization and propose efficient wakeup scheduling for a power-gated design. Our experimental results are very encouraging and outperform those of previous works. |
| Title | A New RTL Power Macro-modeling and Efficient Power Estimation Scheme |
| Author | *Masaaki Ohtsuki, Masato Kawai, Masahiro Fukui (Ritsumeikan University, Japan) |
| Page | pp. 11 - 16 |
| Keyword | Power macro model, Power consumption estimation, Power model library, LUT |
| Abstract | In this paper, we propose a new, efficient power modeling environment which uses a look-up table (LUT). It greatly reduces the size of the LUT compared to conventional algorithms, and it makes power analysis and library building highly efficient. The experimental results show that our approach reduces the computation time to build the library to one tenth while keeping the accuracy of the power analysis. The RMS error and the largest error were less than 15.23% and +/-60%, respectively. |
| Title | An Efficient Hardware Circuit Simulator for Power Grid Optimization System |
| Author | *Taiki Hashizume (EECS Course Graduate School of Science and Engineering, Ritsumeikan University, Japan), Shinichi Nishizawa, Hisako Sugano (Department of VLSI System Design, Ritsumeikan University, Japan), Masaya Yoshikawa (Meijo University, Japan), Masahiro Fukui (Department of VLSI System Design, Ritsumeikan University, Japan) |
| Page | pp. 17 - 22 |
| Keyword | hardware simulator, power grid, optimization |
| Abstract | This paper discusses an efficient hardware circuit simulator for power grid optimization, and focuses particularly on the following points: (1) The simulator achieves high-speed simulation by developing dedicated hardware and adopting parallel processing. (2) Regarding simulation accuracy, the proposed simulator introduces hardware-oriented fixed-point arithmetic instead of floating-point arithmetic. It achieves high accuracy by controlling the simulation intervals. Experiments show that the proposed simulator, using a 79 MHz FPGA with eight-way parallel processing, achieves simulation 32 times faster than software processing on a 2.8 GHz CPU while maintaining the same accuracy in comparison with SPICE simulation. |
| Title | IR-Drop-Aware Buffer/Flip-Flop Station Planning in Floorplan Design |
| Author | Hsin-Hwa Pan (AnaGlobe Technology, Inc., Taiwan), *Hung-Ming Chen (National Chiao Tung University, Taiwan), Chia-Yi Chang (Realtek Semiconductor Corp., Taiwan) |
| Page | pp. 23 - 28 |
| Keyword | Power Integrity, Buffer/FF Station, Floorplan Design |
| Abstract | As technology scales down, interconnect has become the dominant factor in determining overall circuit performance and complexity. Buffer insertion is a very effective and useful technique for improving interconnect performance. In order to find better places for buffers to be inserted, the buffer insertion stage during floorplanning usually clusters buffers in a region, which may cause additional IR-drop violations. On the other hand, in complex digital systems with relatively large die areas operating at very high frequencies, many global signals traveling across the chip need several clock cycles to reach their destinations, thus requiring the adoption of pipelined interconnects. Together with the buffer stations/blocks, the increasing number of flip-flops will cause further voltage drop violations. In this paper, we propose a methodology to pipeline interconnect during the floorplan stage and to consider IR drop during the planning of buffers and flip-flops at the same time. The experimental results show that our method achieves low system latency while preserving power integrity in the 90nm technology node. |
| Title | IR Drop-Driven Algorithm for Standard Cell Placement Considering Timing Windows |
| Author | *Naoki Kitamura, Nobuyuki Umakoshi, Kaoru Okazaki (Osaka Electro-Communication University, Japan), Masayuki Terai (Osaka Gakuin University, Japan) |
| Page | pp. 29 - 34 |
| Keyword | placement algorithm, power supply network, IR drop reduction, timing window |
| Abstract | This paper proposes a novel IR drop-driven algorithm for standard cell placement. We introduce a function H that estimates the static IR drop of a standard cell placement. In order to improve the accuracy of H, the timing window and the short-circuit current caused by cells during their output state transitions are taken into consideration. The proposed algorithm improves an initial placement by simulated annealing, in which H is used as the cost function. The experimental results show that the proposed algorithm is effective. |
| Title | Energy Dissipation Reduction of Arithmetic Operations with Valid Digits |
| Author | *Kazuhito Ito, Yorito Nagasaka (Saitama University, Japan) |
| Page | pp. 35 - 40 |
| Keyword | low power, functional unit, adder, multiplier |
| Abstract | In order to reduce the energy dissipation in LSI chips, it is effective to reduce the frequency of value changes of the signals. In this paper, a valid digit bit is introduced to accompany the data and indicate whether the corresponding digit needs to be processed in arithmetic operations or whether processing can be omitted to reduce signal value changes. Experimental results show that the proposed functional units with the valid digit bit effectively reduce the energy dissipation. |
| Title | Power Efficiency Index for Low Power LSI Design |
| Author | *Yutaka Tamiya (Fujitsu Laboratories Limited, Japan), Masahiro Fujita (University of Tokyo, Japan) |
| Page | pp. 41 - 46 |
| Keyword | Low Power, Clock Gating |
| Abstract | Low power is one of the most important issues for LSIs these days. However, it is very hard to detect wasted power in large-scale LSIs. In this paper we propose the "Power Efficiency Index (PEI)", which shows how efficiently a module consumes power to accomplish its task and suggests which hardware modules may be wasting power. PEI is defined as the ratio of the amount of output data to power consumption. The amount of output data indicates how much effect the module has on its surroundings, and is easily calculated from a simulation trace log. In our case studies, we applied PEI to hardware optimization and showed that PEI can be useful for power optimization: we detected incomplete clock gating logic in hardware and achieved 13.9% and 24.4% power reduction. |
| Title | A Microprocessor-based Architecture for a Smart in vivo Biosensor |
| Author | *Yohei Fukumizu, Tomonori Izumi, Hironori Yamauchi (Ritsumeikan University, Japan) |
| Page | pp. 47 - 51 |
| Keyword | in vivo, biosensor, low invasive, health care |
| Abstract | A microprocessor-based chip architecture for an in vivo health care device is presented. Since the microprocessor is intended for use in a biosensor, a capsule-type endoscope, and a micro-surgery robot, the chip needs to contain a sensor interface, an actuator interface, and a video controller as well as a microprocessor unit and a wireless communication circuit. A test implementation with 23,928 gates in 180 nm standard CMOS technology for validating operation is demonstrated. |
| Title | Low Power Unequal Error Protection Media System Based on Error Concealment in H.264/AVC |
| Author | *Yichun Tang, Jun Wang, Naoki Tajima, Satoshi Goto (Graduate School of Information, Production and Systems, Waseda University, Japan) |
| Page | pp. 52 - 57 |
| Keyword | UEP, H.264/AVC, LDPC |
| Abstract | Currently used Error Concealment (EC) has several disadvantages, and the power consumed by error resilience tools significantly affects the battery life of mobile terminals (e.g., cell phones). In this paper, we introduce a novel low-power Unequal Error Protection (UEP) error-robust media system that integrates multi-rate Low Density Parity-Check (LDPC) codes as forward error correction (FEC) tools together with an H.264 codec. By utilizing our two proposed classification algorithms and a UEP method based on motion stability estimation, the results show that our system greatly reduces power and that its video quality outperforms the original method. |
| Title | An Experimental Comparison of Power Analysis Attacks against RSA Processors on ASIC and FPGA |
| Author | *Atsushi Miyamoto, Naofumi Homma, Takafumi Aoki (Tohoku University, Japan), Akashi Satoh (National Institute of Advanced Industrial Science and Technology, Japan) |
| Page | pp. 58 - 63 |
| Keyword | Circuit analysis, Cryptographic hardware, Security evaluation, Side-channel attacks, Power analysis |
| Abstract | This paper presents Simple Power Analysis (SPA) attacks with chosen-message techniques against RSA processors, and investigates in detail the different characteristics of power waveforms caused by two types of implementations (ASIC and FPGA). We also present Comparative Power Analysis, an advanced power analysis attack in which a pair of input data is used to enhance the waveform pattern for modular exponentiation. The result clearly shows that the power dissipation of modular squaring in the difference waveform is greatly reduced when compared to modular multiplication, allowing all of the secret key bits to be successfully revealed. |
| Title | On Using Spare Cells for Functional Changes with Wirelength Consideration |
| Author | *Yun-Ru Wu, Shu-Yun Chen (Realtek Semiconductor Corp., Taiwan), Kuang-Yao Lee, Ting-Chi Wang (National Tsing Hua University, Taiwan) |
| Page | pp. 64 - 69 |
| Keyword | spare cell, ECO functional change |
| Abstract | In current industrial design methodologies, designers often take advantage of using spare cells when they have to make some functional changes or fix timing problems. However, the methodology of realizing functional changes by using spare cells is very complex and difficult. It could consist of two steps – technology mapping and spare cell selection. Traditional technology mapping only maps functions into the cells in a library without considering any resource constraint, so it is not suitable for this methodology. After technology mapping, how to make selections on spare cells is also an important issue, because bad selections will seriously impact the result. In this paper, we study the problem of functional changes using spare cells, and present an approach to efficiently solve the problem with the goal of minimizing the increase in wirelength. Our approach consists of a technology mapping method and a legalization method that both work together to generate the initial selection on spare cells, followed by a refinement process that is used to improve the selection with further reduction in wirelength increase. We also propose two methods for the refinement process. The experimental results are given to demonstrate the effectiveness and efficiency of our approach. |
| Title | A Gaussian Mixture Model to Propagate Delay and Slew Distributions Together in Statistical Timing Analysis |
| Author | *Shingo Takahashi, Shuji Tsukiyama (Chuo University, Japan) |
| Page | pp. 70 - 75 |
| Keyword | statistical timing analysis, Gaussian mixture model, delay distribution, slew distribution, variability |
| Abstract | In order to improve the performance of the current statistical timing analysis, a mechanism to propagate slews together with delay distributions along signal paths is necessary, since the delay of any circuit element depends on the input slew, and the input slew is adjunct to the propagated input which is determined from delay values. In this paper, we introduce Gaussian mixture models to represent delay distributions, and propose a novel algorithm to propagate a pair of delay distribution and slew distribution in a given circuit graph. By using Gaussian mixture model, we can represent a non-Gaussian delay distribution generated by the statistical Max or Min operation appropriately, and handle topological correlations easily by storing necessary covariance values. Moreover, by propagating slews together with delay distributions, we can modify delay distributions of circuit elements dynamically by the propagated slew distributions. An experimental result shows that the proposed algorithm could reduce the error of mean+3sigma value from the statistical timing analysis using simple Gaussian distributions, and the maximum improvement was 4.5 points. |
| Title | Embedded Delay Detectors to Choose the Fastest Route in FPGAs for Variation-aware Reconfiguration |
| Author | *Yohei Kume, Yuuri Sugihara, Camlai Ngo, Kazutoshi Kobayashi, Hidetoshi Onodera (Kyoto University, Japan) |
| Page | pp. 76 - 81 |
| Keyword | variation, fpga, Reconfigure |
| Abstract | We propose a variation-aware post-fabrication optimization scheme on FPGAs using delay detectors. Variation-aware optimization usually incurs a huge measurement cost. The proposed scheme achieves a constant optimization cost for any circuit configuration. Delay detectors are embedded in clustered CLBs to choose the fastest paths among multiple candidates, which enables simultaneous measurement of critical path candidates by partitioning all critical paths into segments. We fabricated a test chip in a 90nm process and confirmed that the detection capability is less than 10ps. |
| Title | Performance-Driven Architectural Synthesis for Multicycle Communication |
| Author | *Chia-I Chen, Juinn-Dar Huang (Department of Electronics Engineering, National Chiao Tung University, Taiwan) |
| Page | pp. 82 - 87 |
| Keyword | multicycle communication, Architectural Synthesis, distributed register architecture |
| Abstract | In the deep submicron era, wire delay is no longer negligible and is gradually dominating system performance. To solve this problem, several state-of-the-art architecture synthesis flows have been proposed for the distributed register architecture by allowing on-chip multicycle communication. In this paper, we present a new performance-driven criticality-aware synthesis flow, CriAS, targeting regular distributed register architectures. CriAS features a hierarchical binding strategy and a coarse-grained placer to minimize the number of critical global data transfers. The key ideas are to take time criticality as the major concern at earlier binding stages, before the detailed physical placement information is available, and to preserve the locality of closely related critical components in the later placement phase. The experimental results show that a 19% overall performance improvement can be achieved on average as compared to the previous work. |
| Title | A Fast Regular Expression Matching Engine for an FPGA-based Network Intrusion Detection System |
| Author | *Yosuke Kawanaka, Shin'ichi Wakabayashi, Shinobu Nagayama (Hiroshima City University, Japan) |
| Page | pp. 88 - 93 |
| Keyword | pattern matching, network intrusion detection, regular expression, FPGA |
| Abstract | This paper presents a high-performance pattern matching engine for network intrusion detection. In the proposed pattern matching engine, a pattern is specified by a subclass of regular expressions. Since the proposed circuit is based on a pattern-independent architecture, it allows dynamic pattern updating, which is important for network intrusion detection. By processing multiple packets (character strings) simultaneously, our pattern matching engine achieves high throughput. This paper also presents a new FPGA-based network intrusion detection system (NIDS) architecture using our pattern matching engine. |
| Title | Fast Division Circuit in GF(2^m) Based on the Extended Euclid's Algorithm with Parallelization of Modular Reductions |
| Author | *Katsuki Kobayashi, Naofumi Takagi (Nagoya University, Japan) |
| Page | pp. 94 - 99 |
| Keyword | Galois field, division, Euclid's algorithm |
| Abstract | We propose a fast division circuit in GF(2^m). It is based on the extended Euclid's algorithm and requires only one cycle to perform the operations that require two cycles in previously reported division circuits based on the extended Euclid's algorithm. Since the proposed circuit performs modular reductions in parallel by changing the order of execution of the operations, it has almost the same critical path delay as the previously proposed ones. The proposed circuit computes division in m clock cycles, whereas the previously proposed circuits take 2m-1 or more clock cycles. By logic synthesis, the computation time of the proposed circuit is estimated to be over 35% shorter than that of a previously proposed circuit. |
| Title | Future Direction of Integrated Nano and Micro-systems |
| Author | Cyril Condemine, Marc Belleville, *Ahmed Jerraya (CEA-LETI, France) |
| Page | pp. 103 - 104 |
| Title | Static Scheduling of Dynamic Execution for High-Level Synthesis |
| Author | *Yuki Toda, Nagisa Ishiura, Kousuke Sone (Kwansei Gakuin University, Japan) |
| Page | pp. 107 - 112 |
| Keyword | high-level synthesis, behavioral synthesis, variable scheduling, indefinite cycle operation |
| Abstract | This article presents variable scheduling and binding for high-level synthesis. Conventional scheduling algorithms decide the operations' execution timing assuming that each operation takes a fixed number of cycles. However, for some operations, the number of cycles may vary depending on the values of operands or the states of the hardware. Variable scheduling enables efficient computation in the presence of such indefinite cycle operations. Experimental results show that the number of execution cycles is reduced by about 18%. |
| Title | TRANSYSCTOR: A General Methodology and Framework for Rule-Based Transformation and Refactoring of SystemC Designs |
| Author | *Alexander Viehl, Jordan Dukadinov, Oliver Bringmann (FZI Forschungszentrum Informatik, Germany), Wolfgang Rosenstiel (University of Tübingen, Germany) |
| Page | pp. 113 - 118 |
| Keyword | SystemC, Design Automation, Transformation, Verification, Performance analysis |
| Abstract | In this paper, a framework is presented for performing automated transformations of SystemC designs based on transformation rules with the objective of speeding up system design and verification implementation with SystemC. The framework is based on an open source SystemC parser and provides a generic transformation core as well as the ability of C++/SystemC code generation as backend. Two use cases from design verification and formal performance analysis are presented for showing different application areas, flexibility, and extensibility of the developed methodology. Experiments provide promising numbers on saved implementation efforts and hence on the high value of the developed solution. |
| Title | An Error Diagnosis Technique Based on Location Sets to Rectify Subcircuits |
| Author | *Kosuke Shioki, Narumi Okada, Toshiro Ishihara, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
| Page | pp. 119 - 124 |
| Keyword | Error diagnosis, ECO, Design error, Incremental synthesis |
| Abstract | This paper presents an error diagnosis technique based on location sets to rectify subcircuits. This technique can rectify circuits including LUT function errors with fewer modifications than the conventional technique. The proposed technique obtains sets of the locations for rectifying subcircuits, and rectifies circuits based on the sets. Experimental results show that our technique reduces the increase in the number of locations to be rectified by 95% compared with the conventional technique. |
| Title | An Efficient Exploring Method of Room-to-Room Floorplan |
| Author | *Yosuke Takahashi, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan) |
| Page | pp. 125 - 130 |
| Keyword | Floorplan, Adjacency, FT-Squeeze, Simulated Annealing |
| Abstract | A floorplan is often a dissection of a rectangular chip by horizontal and vertical line segments, and decides the rough position of modules in the chip. FT-Squeeze, which is a permutation of room numbers and can represent any floorplan, has been proposed. Based on FT-Squeeze, we can search floorplans using Simulated Annealing. But if a constraint that some modules must be assigned to adjacent rooms is imposed, the search takes enormous time, since many solutions are found to violate the constraint only after decoding. In this paper, we propose a method to check in constant time whether two specified modules are assigned to adjacent rooms. |
| Title | A Conjecture on the Number of Extra Registers in Safe Clocking-Based Register Assignment |
| Author | *Keisuke Inoue, Mineo Kaneko, Tsuyoshi Iwagaki (Japan Advanced Institute of Science and Technology, Japan) |
| Page | pp. 131 - 136 |
| Keyword | delay variation, register assignment, BDD clocking |
| Abstract | Recently, Backward-Data-Direction (BDD) clocking based register assignment in high-level synthesis has been proposed. However, as a major drawback, BDD clocking based register assignment tends to increase the number of registers. This causes area overhead and extra power consumption, so the increase should be minimized. Interestingly, the experiments we have done so far show that the increase is at most one. This paper treats the simple question of whether the register overhead is always at most one. An estimation of the upper bound is helpful not only for estimating the number of extra registers when we apply BDD clocking, but also for developing a heuristic algorithm for BDD clocking based register assignment. The conjecture proposed in this paper is neither proved nor disproved in the general case. In this paper, we prove that it is true for two simple cases. |
| Title | Circuit Acyclic Clustering with Input/Output Constraints and Applications |
| Author | *Rung-Bin Lin, Tsung-Han Lin, Shin-An Wu (Yuan Ze University, Taiwan) |
| Page | pp. 137 - 142 |
| Keyword | Circuit Clustering, Partitioning, Acyclic, Logic simulation, Leakage power |
| Abstract | This article studies a new circuit acyclic clustering problem which divides a combinational circuit into groups of sub-circuits, each of which has a limited number of inputs and outputs. Several heuristics are proposed for solving this problem. By applying our partitioning results, we achieve on average a three times speedup in logic simulation for finding an input vector that incurs minimum or maximum leakage power dissipation. Applications of our approach to other areas such as physical design are possible. |
| Title | On the Number of Rooms in a Rectangular Solid Dissection |
| Author | *Hidenori Ohta (Tokyo University of Agriculture and Technology, Japan), Toshinori Yamada (Saitama University, Japan), Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan) |
| Page | pp. 143 - 148 |
| Keyword | Rectangular Solid Dissection, Rectangular Dissection, 3D-VLSI, Number of Rooms, Plane Graph |
| Abstract | In recent years, 3D-LSIs, which consist of several silicon layers, have been developed and have attracted attention. For floorplanning of 3D-LSIs, the rectangular solid dissection, which is a dissection of a rectangular solid into smaller rectangular solids by planes, has also attracted attention and been studied extensively. However, few properties of rectangular solid dissections have been clarified. This paper presents the relation between the number of rooms and the number of walls in a rectangular solid dissection. |
| Title | Assertion Checker Synthesis for FPGA Emulation |
| Author | *Chengjie Zang, Qixin Wei, Shinji Kimura (Graduate School of Information, Production and Systems, Waseda University, Japan) |
| Page | pp. 149 - 154 |
| Keyword | SystemVerilog Assertion, finite input memory automaton, synthesis, FPGA |
| Abstract | In this paper, we propose a method to synthesize SystemVerilog Assertion checkers for FPGA emulation. The main idea is to synthesize assertions based on finite input memory automata (FIMA) and to use embedded RAM modules to construct a shift register chain that stores the history of variables. The method does not consume logic elements for storing the values, and a shift register using the embedded RAM is much more efficient than one that uses the registers in logic elements. We also compare the proposed FIMA method with the MBAC method and the FoCs tool. |
| Title | Automatic Pipeline Generation for FPGA-based Prototyping |
| Author | *Weijie Xing, Kai Zheng (Graduate School of Information, Production and Systems, Waseda University, Japan), Tomoo Kimura, Shunichi Kuromaru, Kouji Kai (Panasonic Corporation, Japan), Shinji Kimura (Graduate School of Information, Production and Systems, Waseda University, Japan) |
| Page | pp. 155 - 160 |
| Keyword | FPGA, acceleration, pipeline |
| Abstract | In this paper, we propose a new approach for the acceleration of circuits by automatically generating pipeline structures for FPGA-based prototyping. In the method, an original circuit from the FPGA mapping result is converted into a pipelined one by dividing the circuit into several parts using pipeline registers. When introducing pipeline registers, we utilize the unused registers in the logic elements of the FPGA. The method divides a combinational part of the original circuit into shorter-delay parts by using a cut-set based algorithm to enhance the data throughput, and we show several sufficient conditions under which circuits can be correctly converted to pipelined ones. The effect of this method is shown by experimental results using a tool implementing the algorithm. |
| Title | VLSI Design of a Handwritten-Character Learning and Recognition system based on Associative Memory |
| Author | *Shogo Sakakibara, Wataru Imafuku, Akio Kawabata, Tania Ansari, Hans Jürgen Mattausch, Tetsushi Koide (Hiroshima University, Japan) |
| Page | pp. 161 - 166 |
| Keyword | Associative Memory, Character Recognition, Learning, Optimization, LSI |
| Abstract | In the presented research, an associative memory architecture for searching the most similar data among previously stored reference data is applied. The chosen associative memory achieves high speed, low power consumption and small implementation area. To recognize new data, a learning capability based on the concept of short/long-term memory is realized. For improvement of the recognition rate, we propose a reference-data-optimization algorithm. We evaluated the proposed VLSI-design method for the application of handwritten character learning and recognition. A test chip in 0.18 um CMOS technology was designed to demonstrate the proposed algorithm and design method. |
| Title | Improved Region-Growing Image-Segmentation Algorithm Based on the HSV Color Space |
| Author | *Tatsuya Sugahara, Keita Okazaki, Naomi Nagaoka, Ryosuke Kimura, Tetsushi Koide, Hans Jürgen Mattausch (Hiroshima University, Japan) |
| Page | pp. 167 - 171 |
| Keyword | image segmentation, connection-weight, color space |
| Abstract | This paper presents an image-segmentation algorithm which uses a connection-weight-based region-growing algorithm. The calculation of the connection weights, which express the similarity between neighboring pixels, is based on the HSV color space. Two new methods for improvement of segmentation results are applied, namely (i) dynamically changing the classification border between chromatic and achromatic pixels and (ii) 2nd stage segmentation if the Volume color space component has a wide distribution for a segment obtained in the 1st stage segmentation. The effectiveness of these methods for improving the segmentation quality is confirmed with segmentation examples of natural images. |
| Title | The Design of Frequency Domain Inter Carrier Interference (ICI) Canceling Circuit caused by Radio Frequency Shift for OFDM Receiver |
| Author | *Kenta Nohara, Tomohisa Wada (University of the Ryukyus, Japan) |
| Page | pp. 172 - 176 |
| Keyword | OFDM, ICI, FIR filter, Scattered pilot, Radio Frequency Shift |
| Abstract | Orthogonal Frequency Division Multiplexing (OFDM) is becoming popular for high bandwidth digital communication. Since its sub-carrier spacing is small, OFDM performance is easily degraded by Doppler frequency error. This paper proposes a frequency domain radio frequency shift canceller. Simulation results show that roughly two times higher Doppler shift performance was obtained for a 64QAM, 8K-FFT OFDM system. |
| Title | A New Architecture Extension for Mitigation of Permanent Functional Unit Faults Using Hot-Swapping Concepts |
| Author | *Zoltan Endre Rakosi, Masayuki Hiromoto, Hiroyuki Ochi (Kyoto University, Japan), Yukihiro Nakamura (Ritsumeikan University, Japan) |
| Page | pp. 177 - 182 |
| Keyword | Hot-Swapping, Dynamically Re-configurable, Fault-tolerant, Dependable Computing, Coarse-grained ALU array |
| Abstract | In this paper, we propose a new architecture extension suitable for arrays of functional units, that will provide testing and replacement of faulty units, without interrupting normal system operation. The extension relies on data-path switching controlled by a hot-swapping algorithm, by use of which functional units are tested and replaced by spares if necessary, ensuring permanent operation while the spares last. A case study is presented on a sample architecture. The Hot-Swapping functionality could be added with an overhead of 74-87% based on the granularity of the native array. |
| Title | A Bottom-Up Exploration Approach for 3D Graphics Hardware Accelerator in Consumer Electronics |
| Author | *Chi-Tsai Yeh, Liang-Bi Chen, Ching-Yuan Lin, Hung-Yu Chen, Ing-Jer Huang (Department of Computer Science and Engineering, National Sun Yat-Sen University, Taiwan) |
| Page | pp. 183 - 188 |
| Keyword | 3D Graphics, SystemC, SoC, System-Level, ESL |
| Abstract | 3D graphics (3DG) applications are widely used in consumer electronics, and this trend will inevitably continue. Usually, a high abstraction level is used to model a complex system like a 3DG SoC. However, the issue of concern is how to efficiently achieve the required performance within the cost constraint. We propose a bottom-up exploration approach using SystemC that progressively improves system performance. According to the results, we improve performance by 198% for the geometry function and 69% for the rendering function. |
| Title | Small Area Multipliers Utilizing the Sum of Operands |
| Author | *Hirotaka Kawashima, Naofumi Takagi (Nagoya University, Japan) |
| Page | pp. 189 - 194 |
| Keyword | VLSI, arithmetic circuit, multiplication |
| Abstract | A method to halve the number of partial product bits in multiplication is proposed. An integrated partial product (IPP) is introduced. The proposed method separates the IPP into four cases according to a pair of a multiplicand bit and a multiplier bit. The value of the IPP is obtained by selecting a value from the four cases. The total number of IPP bits becomes half the total number of ordinary partial product bits by utilizing the sum of the operands. The proposed method is applicable to both unsigned and signed multiplication. Multipliers using the proposed method are smaller than array multipliers and Wallace multipliers by approximately 30%, and smaller than multipliers with radix-4 Booth's method by approximately 10%. |
| Title | Design Challenges and Technologies for Cell Broadband Engine |
| Author | *Yoshio Masubuchi (Toshiba Corporation, Japan) |
| Page | p. 197 |
| Title | Evaluation of the Performance of the MIMD Mode of a Dynamically Switchable SIMD/MIMD Processor by Using an Image Recognition Application |
| Author | *Shohei Nomoto, Shorin Kyo, Shinichiro Okazaki (NEC, Japan) |
| Page | pp. 201 - 206 |
| Keyword | SIMD, MIMD, Reconfigurable |
| Abstract | We have developed an “XC core” processor that achieves low cost, high performance, and low power consumption through the use of a highly parallel SIMD architecture (the SIMD mode), and achieves high flexibility by morphing into a MIMD architecture (the MIMD mode). In this paper, the effectiveness of the MIMD mode is evaluated by using a white line detection algorithm for open roads. The evaluation shows that real-time processing of the algorithm (less than 33 ms) can be achieved by using the MIMD mode to execute the verification process of white line segments, which is a part of the algorithm not suited to execution in the SIMD mode. Moreover, we show that verification can be executed five times faster by using region of interest (ROI) transfer instructions to efficiently transfer the ROI of an image. Furthermore, the execution time in the MIMD mode is measured as a function of the number of PUs used, from 2 to 32. The measured results show that the performance improvement rate slows down when using more than 16 PUs in the MIMD mode, mainly due to insufficient parallelism in the verification process. As a whole, by using the MIMD mode with 32 PUs, a 12.6 times speedup is achieved compared with using only the SIMD mode. |
| Title | Pipelining SHA-2 Implementations using Carry Save Adders |
| Author | *Anh Tuan Hoang, Katsuhiro Yamazaki (Department of VLSI System Design, Ritsumeikan University, Japan), Shigeru Oyanagi (Department of Computer Science, Ritsumeikan University, Japan) |
| Page | pp. 207 - 212 |
| Keyword | SHA-2, fine-grained pipelining, cryptography, carry save adder |
| Abstract | The secure hash algorithm (SHA), which is used to verify the integrity of a message, involves iterative computation on data. The large computation delay generated in that iteration limits the overall throughput of the system and makes it difficult to pipeline the computation. We describe a way to pipeline the computation using fine-grained pipelining with balanced critical paths. One critical path is broken into two by using data forwarding. The other critical path is broken into three stages by using computation postponement. The resulting critical paths all have two full-adder layers with some data movements, and thus are balanced. The adders are implemented using carry save adders (CSA). The effectiveness of the two adder architectures is analyzed and compared in terms of hardware size, frequency, throughput, and performance area rate. |
| Title | Hardware Accelerator for Feature Point Detection Part of SIFT Algorithm & Corresponding Hardware-Friendly Modification |
| Author | *Jingbang Qiu, Tianci Huang, Takeshi Ikenaga (Graduate School of IPS, Waseda University, Japan) |
| Page | pp. 213 - 218 |
| Keyword | SIFT, hardware accelerator, integer solution, one time interpolation, real-time |
| Abstract | We propose a hardware accelerator structure for the Feature Point Detection part of SIFT that can be implemented on an FPGA. A Fully Integer Solution is applied. We also re-design the process as a 12-block structure and reduce the number of interpolations so as to lower hardware cost. In our experiment, we achieve a maximum clock frequency of 68.0 MHz, which can process about 100 images of 640x480 pixels per second. The proposal is suitable for real-time FPGA systems. |
| Title | Variability Characterization and Tolerance on Throughput and Power for Chip-Multiprocessors |
| Author | *Wan-Yu Lee, Iris Hui-Ru Jiang (Department of Electronics Engineering, National Chiao Tung University, Taiwan) |
| Page | pp. 219 - 223 |
| Keyword | process variability, chip-multiprocessor, voltage island, frequency island, Monte Carlo analysis |
| Abstract | This paper proposes a new architecture of variability-tolerant chip-multiprocessor. To mitigate the impact of process variability on throughput and power, voltage and frequency islands are introduced into chip-multiprocessors. Thus, voltage island frequency island chip-multiprocessors enable per-core scaling on the supply voltage and operating frequency. It can naturally collaborate with dynamic voltage frequency scaling. The process variations are characterized through an analytical model, and are quantified through Monte Carlo analysis. Compared with the design without process variations, when 70 threads are run on a chip of 70 small cores, our results show throughput degradation is 0.1%, while power reduction is 34.3%. |
| Title | A Ternary Multi-Ported Content Addressable Memory Architecture utilizing Asynchronous Multiple Search-Operation Technology |
| Author | *Takeshi Kumaki, Masaharu Tagami, Yuta Imai, Tetsushi Koide, Hans Jürgen Mattausch (Hiroshima University, Japan) |
| Page | pp. 224 - 229 |
| Keyword | CAM, multiport, ternary, routing, redundancy |
| Abstract | This paper presents a ternary multi-ported content addressable memory (CAM) architecture utilizing asynchronous multiple search-operation technology, aiming at efficient high throughput of associative-search operations. The asynchronous multiple search-operation technology adopts a Flexible Multi-ported Content Addressable Memory (FMCAM) architecture, which is reported. The proposed ternary multi-ported CAM architecture achieves a fast associative table-lookup solution for high-speed routing applications, such as IP packet forwarding and effectively realizes a Ternary Flexible Multi-ported Content Addressable Memory which we refer to as TFMCAM in this paper. The main novel points of the architecture are simultaneous multiple associative-search operations and a high implementation-yield ratio. Furthermore, the TFMCAM architecture realizes the necessary background table maintenance function without preventing the associative-search operation. For verifying the effectiveness of the TFMCAM architecture, FPGA and ASIC implementation results will be evaluated for final paper. |
| Title | A Hardware Design for the First Pass of A Large Vocabulary Continuous Speech Recognition System |
| Author | *Akihiko Eguchi, Joe Hashimoto (Kinki University, Japan), Makoto Saituji (NEC Electronics, Japan), Akihisa Yamada (Sharp Corporation, Japan), Takashi Kambe (Kinki University, Japan) |
| Page | pp. 230 - 235 |
| Keyword | speech recognition, first pass, C-based architecture design, function based vector pipeline |
| Abstract | Speech recognition is becoming popular as a technology for the implementation of human interfaces. However, conventional approaches to large vocabulary continuous speech recognition require a high performance CPU. In this paper, we describe a speech recognition system designed using a C-based architecture design methodology, which avoids this limitation. Application specific hardware for the first pass data processing step is designed to achieve real time recognition with low-speed CPU on a portable terminal, and its performance is evaluated. |
| Title | Coarse-Grained Dynamically Reconfigurable Architecture with Flexible Reliability |
| Author | *Younghun Ko, Dawood Alnajjar, Yukio Mitsuyama, Masanori Hashimoto, Takao Onoye (Osaka University, Japan) |
| Page | pp. 236 - 241 |
| Keyword | reliability, soft error, coarse-grained, reconfigurable architecture, TMR |
| Abstract | The acceptable soft error rate on a VLSI chip varies depending on the application and operating environment, so VLSI designers have recently become concerned with reliability specifications. In this paper, we propose a novel coarse-grained dynamically reconfigurable architecture, which offers flexible reliability. A notion of cluster is introduced as a basic element of the proposed architecture, each of which can select among four operation modes with different levels of spatial redundancy and area-efficiency. In the TMR operation mode, which attains the highest reliability level, outputs of three execution modules are voted on inside a cluster, making it possible to perform error recovery without any rollback operations. Evaluation of permanent error rates demonstrates that four different reliability levels can be achieved by the proposed architecture. The area of the additional circuits needed to attain tolerance to soft errors and provide flexible reliability accounts for 30.5% of the proposed coarse-grained reconfigurable device. |
| Title | Low Cost Design of an Advanced Encryption Standard (AES) Processor Using a New Common-Subexpression-Elimination Algorithm |
| Author | *Ming-Chih Chen (Department of Electronic Engineering, National Kaohsiung First University of Science and Technology, Taiwan), Shen-Fu Hsiao (Department of Computer Science and Engineering, National Sun Yat-Sen University, Taiwan) |
| Page | pp. 242 - 247 |
| Keyword | AES, VLSI, chip, CSE, encryption |
| Abstract | In this paper, we propose an area-efficient design of an Advanced Encryption Standard (AES) processor by applying a new common-subexpression-elimination (CSE) method to the sub-functions of various transformations in AES. The proposed method reduces the area cost of realizing the sub-functions by extracting the common factors in the bit-level expressions of these sub-functions using a new CSE algorithm. Cell-based implementation results show that the AES processor with our proposed CSE method has significant area improvement compared with previous designs. |
| Title | DSP Array Breadboard System for Application on Foreground Segmentation |
| Author | *Bin Wu, Takao Nishitani (Tokyo Metropolitan University, Japan) |
| Page | pp. 248 - 253 |
| Keyword | DSP array, Gaussian Mixture Model, FPGA |
| Abstract | This paper describes a DSP array breadboard system for evaluating statistical signal processing architectures for various algorithms. An example algorithm, employed here, is foreground segmentation from a dynamic background. Although several different algorithms have been proposed, the simplest but most popular pixel based algorithm is introduced for the evaluation. A trial of a single chip FPGA implementation is also shown to pave the way to realize future signal processing architecture. |
| Title | An Interface for Representing Dynamically Reconfigurable Architectures by using Graph with Configuration Information |
| Author | *Vasutan Tunbunheng, Hideharu Amano (Keio University, Japan) |
| Page | pp. 254 - 259 |
| Keyword | dynamically reconfigurable system, retargetable compiler, architecture representation |
| Abstract | To develop a new dynamically reconfigurable architecture, a designer requires a retargetable compiler that generates configuration data for evaluating the architecture in the architectural exploration space. The Black-Diamond compiler uses a Graph with Configuration Information (GCI) to represent reconfigurable resources inside the target architectures. It translates a data-flow graph from a C-like front-end description, applies placement and routing by using GCI, and generates configuration data for each element of the architecture. This paper presents an idea for an interface used to modify the design on GCI. |
| Title | A Case Study of Clockless Bundled-data On-chip Interconnect Design using Double Edge Triggered Flip-flops |
| Author | *Katsunori Tanaka, Yuichi Nakamura (NEC Corporation, Japan) |
| Page | pp. 260 - 265 |
| Keyword | clockless (asynchronous) logic, on-chip interconnect, network-on-chip, bundled-data, four phase protocol |
| Abstract | This paper shows a case study of clockless (asynchronous) bundled-data on-chip interconnect design using double edge-triggered flip-flops (DET-FFs). Increasing power dissipation caused by shrinking process technology has led LSI designers to multi-core design, but due to load imbalance, multi-core LSIs still waste considerable power on cores with small loads. Dynamic and flexible adjustment of clock frequencies to the cores and GALS (Globally-Asynchronous, Locally-Synchronous) design using clockless on-chip interconnect are therefore key techniques for reducing the wasted power. Since the four-phase handshaking used in clockless logic requires two round-trips of signal communication, it has significant difficulty providing high-speed inter-core communication. This paper thus proposes the use of DET-FFs for higher throughput, and shows an experimental design with its area and performance. |
| Title | A VLSI Architecture of Tone Classification Function-Based Isolated-Word Speech Recognition |
| Author | *Jirabhorn Chaiwongsai, Werapon Chiracharit, Kosin Chamnongthai (King Mongkut's University of Technology Thonburi, Thailand), Yoshikazu Miyanaga (Hokkaido University, Japan), Kouji Higuchi (University of Electro-Communications, Japan) |
| Page | pp. 266 - 270 |
| Keyword | tone classification function, VLSI implementation, parallel computation, pipeline process, look-up table |
| Abstract | Speech recognition in tonal languages such as Thai and Chinese classifies word meaning by using tone. Therefore, the tone classification function is an essential part of improving the accuracy rate. This paper presents a novel VLSI architecture for tone classification function-based isolated-word speech recognition. The architecture consists of two parts: feature extraction and the tone classification function. In the feature extraction part, voice detection, pitch period estimation and slope classification are introduced. The proposed pitch period is calculated by using parallel computation and a 3-stage pipeline process. In the classification function, a look-up table technique is employed to detect tone by using only F0 characteristic information. This reduces the computational complexity of the proposed architecture. Moreover, no training set is used. To evaluate the proposed architecture, the experiment is performed with 100 word vocabularies selected from 20-40 years old dependent-speakers. The architecture is implemented on Altera Cyclone II series FPGAs running at 50 MHz. The results reveal an 88.25% accuracy rate and a processing time of 8.27 ms/word. |
| Title | Speculative Configuration Prefetching for Multi-Context Architectures |
| Author | *Sven Eisenhardt, Julio Oliveira, Tommy Kuhn, Wolfgang Rosenstiel (Universität Tübingen, Germany) |
| Page | pp. 271 - 276 |
| Keyword | coarse-grained, reconfiguration, multi-context, array, prefetching |
| Abstract | Multi-context reconfigurable arrays provide the ability for prefetching the subsequent configuration into the architecture's context memory during execution. This is difficult, however, if the subsequent configuration cannot be determined ahead of execution. In this paper we present a method to minimize the reconfiguration time overhead by speculatively prefetching configurations in non-deterministic sequences. As an example we reconfigured an array to process FFT kernels of different sizes. By applying speculative reconfiguration prefetching it was possible to reduce the reconfiguration overhead by 38%. |
| Title | Efficient Mode Selection Algorithm for Inter-Layer Residual Prediction of H.264/SVC |
| Author | *Yoshitaka Morigami, Shinpei Matsuoka, Tian Song, Takashi Shimamoto (Tokushima University, Japan) |
| Page | pp. 277 - 282 |
| Keyword | H.264, SVC |
| Abstract | This paper presents an efficient mode selection algorithm to reduce the computational complexity when using inter-layer residual prediction of H.264/SVC. The proposed two-step algorithm focuses on reducing the complexity of the inter-layer residual prediction. The experimental results show that the proposed algorithm can considerably reduce redundant computation with almost no loss of coding efficiency. |
| Title | A Case Study on AES Encryption System Design with SystemBuilder |
| Author | *Yuki Ando, Seiya Shibata, Shinya Honda, Hiroyuki Tomiyama, Hiroaki Takada (Nagoya University, Japan) |
| Page | pp. 283 - 288 |
| Keyword | AES, HLS, coarse-grain pipelining, SW/HW partitioning |
| Abstract | This paper presents a case study on designing an Advanced Encryption Standard (AES) Encryption System using our system-level design toolkit named SystemBuilder. We start with a sequential specification of the AES Encryption System behavior and generate an FPGA implementation. In order to improve the performance, we iteratively refine the behavioral description based on the analysis result obtained by a profiler. Finally, AES Encryption System with pipelined hardware implementation achieved 5.0 times better performance than that with software implementation. |
| Tuesday, March 10, 2009 |
| Title | CMP Service: Past, Present, Future |
| Author | *Bernard Courtois (CMP, France) |
| Page | pp. 291 - 298 |
| Abstract | Infrastructures to provide access to custom integrated hardware manufacturing facilities are important because they allow Students and Researchers to access professional facilities at a reasonable cost, and they allow Companies to access small volume production, otherwise difficult to obtain directly from manufacturers. This paper reviews the most recent developments at CMP, as well as other services similar to CMP. These services helped the development of microelectronics for the EE&CS communities. Other communities, such as the BioMed community, might benefit in the same way. Examples of BioMed applications using CMOS and MEMS are given. The conclusion includes statements for the BioMed community as well as statements on where manufacturing infrastructures like CMP should go, considering technical developments towards More Moore, More than Moore, as well as statements related to globalization. |
| Title | A Two-Layer Global Router for Ball Grid Array Packages |
| Author | Yung-Chia Lin, *Kuang-Yao Lee, Ting-Chi Wang (National Tsing Hua University, Taiwan) |
| Page | pp. 301 - 306 |
| Keyword | package routing, ball grid array |
| Abstract | As manufacturing technology keeps shrinking, the number of I/O pins in a modern VLSI chip has easily grown to hundreds, or even thousands. With the pressing need to connect this huge number of I/O pins to a PCB (Printed Circuit Board), a BGA (Ball Grid Array) package is most commonly used today. In this paper, we present a two-layer BGA global routing algorithm which routes nets one at a time while considering the minimization of the total wirelength and overflow. The experimental results show that, on average, our algorithm decreases total overflow by 96.8% and maximum overflow by 83.33% compared to a recent work; in addition, our algorithm produces smaller total wirelength and runs 4.39 times faster. |
| Title | A Routing Method based on Nearest Via Assignment for 2-Layer Ball Grid Array Packages |
| Author | *Yoshiaki Kurata, Yoichi Tomioka, Yukihide Kohira, Atsushi Takahashi (Tokyo Institute of Technology, Japan) |
| Page | pp. 307 - 312 |
| Keyword | package routing, BGA packages, mixed integer programming |
| Abstract | In this paper, we propose a routing method for 2-layer ball grid array packages that generates a routing pattern satisfying the wire congestion constraints. In the proposed method, the via of a net is restricted to be placed near the ball of the net, and the problem of finding a routing pattern that satisfies the constraints is formulated as a mixed integer program. In experiments with several data sets, we obtain a routing pattern that satisfies the wire congestion constraints within a practical time by using a mixed integer programming solver. |
| Title | Throughput-Driven Hierarchical Partitioning-Based Placement for Regular Distributed Register Architecture |
| Author | *Ya-Shih Huang, Juinn-Dar Huang (National Chiao Tung University, Taiwan) |
| Page | pp. 313 - 317 |
| Keyword | throughput-driven, partitioning-based, placement, RDR architecture, multicycle communication |
| Abstract | As we proceed into the deep submicron technology era, interconnect delay is no longer negligible and is becoming the dominant factor in system performance. Allowing multicycle communication in a distributed register architecture is one promising solution to this problem. However, multicycle communication can worsen system throughput due to the additional interconnect latency it incurs. Meanwhile, different placements incur different interconnect latencies and thus result in different system throughputs. Therefore, in this paper, we propose a hierarchical partitioning-based placement algorithm targeting the regular distributed register architecture that tries to maximize the overall system throughput. The experimental results show that our placer achieves on average 4.77 times throughput improvement compared with the SA-based timing-driven placer VPR. We also believe our idea can be further applied to task mapping in network-on-chips as well as global placement of a gate-level placer. |
| Title | Overlap-aware Analytical Placement Based on Stable-LSE |
| Author | *Naoto Funatsu, Yasuhiro Takashima (University of Kitakyushu, Japan) |
| Page | pp. 318 - 323 |
| Keyword | LSE, Stable-LSE, Overlap area, Analytical Placement |
| Abstract | We propose a differentiable approximation with Stable-LSE in which the overlap area between two cells can be expressed. Stable-LSE remedies the numerical instability of LSE without loss of its efficiency. As a result, the two objective functions of total wire length minimization and overlap area minimization are optimized simultaneously. We also implement the proposed method with Stable-LSE. Our prototype solves a problem with 300 cells in a few seconds with little overlap. Thus, we confirm its efficiency empirically. |
| Title | Yield Improvement in Gridless Detailed Routing with Redundant Via Insertion |
| Author | *Chih-Ta Lin, Yih-Lang Li (National Chiao Tung University, Taiwan) |
| Page | pp. 324 - 329 |
| Keyword | redundant via, detailed routing |
| Abstract | In this paper, a redundant via-aware routing system is proposed. A via-aware global router is used to reduce the number of vias. A redundant via-aware detailed router combined with redundant via protection, dead-via avoided tile propagation, redundant via-aware path construction and incremental dead-via constraint relaxation is applied to reduce the number of dead-vias. Finally a greedy post-layout redundant via insertion method is used to insert the redundant via of all alive vias. Experimental results show that the proposed redundant via-aware routing system runs faster than previous works and is the first work to achieve a 100% redundant via insertion rate for all vias in MCNC benchmark circuits. |
| Title | Interconnect Utilization of the VPEX Via-Programmable Structured ASIC |
| Author | *Kazuma Kitamura, Syouta Yamada, Masahide Kawarasaki, Yuuichi Kokusyou (Ritsumeikan University, Japan), Usman Ahmed, Guy Lemieux (University of British Columbia, Canada), Masaya Yoshikawa (Meijo University, Japan), Takeshi Fujino (Ritsumeikan University, Japan) |
| Page | pp. 330 - 334 |
| Keyword | Via programmable device, CAD system |
| Abstract | In the past, we proposed via-programmable logic architecture “VPEX” to reduce the manufacturing cost of SoCs by eliminating the cost associated with photomasks. In this paper, we describe the CAD framework for VPEX. We focus mainly on the routing algorithm. The proposed CAD flow is used to generate the final GDS-II layout for some sample circuits. The preliminary results show that VPEX has enough routing resources – a circuit occupying 89% of the logic fabric can be successfully routed utilizing only 58% of the available routing resources. |
| Title | Slack-Driven Obstacle-Avoiding Rectilinear Steiner Tree Routing |
| Author | *Yen-Hung Lin, Shu-Hsin Chang, Yih-Lang Li (National Chiao Tung University, Taiwan) |
| Page | pp. 335 - 340 |
| Keyword | Obstacle-avoiding rectilinear Steiner tree, slack-driven routing, Elmore delay model, timing constraint, worst negative slack |
| Abstract | Obstacle-avoiding rectilinear Steiner tree (OARST) construction is a fundamental problem associated with the trend toward IP-block-based System-on-Chip designs. The objective of previous studies on obstacle-avoiding rectilinear Steiner minimal tree (OARSMT) has been to minimize the total wirelength of the constructed Steiner tree. Studies of performance-driven Steiner trees have demonstrated that the minimization of wirelength may worsen the performance of the Steiner tree. This work is the first to construct OARST based on the Elmore delay model and consider timing constraints. A critical-trunk-based tree growth mechanism is proposed. The critical trunks are constructed by extended single-source single-target maze routing called multi-source single-target maze routing. The unconnected pins are connected to critical trunks under the delay constraints of every sink. The proposed critical trunk is applied to solve slack-driven OARST problem. Experimental results demonstrate that the proposed algorithm successfully solves 66.67% worst negative slack (WNS) violations in slack-driven OARST problem while running faster than previous OARSMT algorithms. |
| Title | FLEC: A Framework for System-level Debugging Support, Formal Verification and Static Analysis |
| Author | *Yoshihisa Kojima, Tasuku Nishihara, Takeshi Matsumoto, Masahiro Fujita (University of Tokyo, Japan) |
| Page | pp. 341 - 346 |
| Keyword | system-level design, formal verification, system dependence graph, framework |
| Abstract | System-level design methodology has become more widely accepted as a way to improve productivity. However, there are still few verification tools available for system-level designs. In this paper, we propose a system-level verification framework called FLEC, which uses Extended System Dependence Graphs (ExSDGs) as its intermediate representation and provides engines for analysis, simulation and verification, wrapped in a shell scripting interface. Its modular structure makes it easy to realize various applications for debugging support, formal verification and static analysis. |
| Title | Language-Controlled Integrated Debugging Technique |
| Author | *Noriaki Suzuki, Junji Sakai (NEC Corporation, Japan) |
| Page | pp. 347 - 351 |
| Keyword | debugging, system LSI |
| Abstract | A control technique is described that simplifies the setting of the mode used for debugging software of multi-core system LSIs. This technique is controlled by a special debugging program written in DCL ("debugging mechanisms control language"), which cooperates with special debugging hardware mechanisms including a system-event trapper, a bus tracer, and a debug processor. DCL is designed as a subset of the C language with several added functions to control these hardware mechanisms effectively and easily. This technique was adopted in an application system LSI design for a cellular phone. We applied it to actual program debugging that includes external events. As a result, efficient debugging was achieved by integrating break debugging and tracing. |
| Title | Soft-error Resiliency Evaluation on Delayed Multiple-modular Flip-Flops |
| Author | *Jun Furuta, Yusuke Moritani, Kazutoshi Kobayashi, Hidetoshi Onodera (Kyoto University, Japan) |
| Page | pp. 352 - 357 |
| Keyword | TMR, Built-in Soft Error, SEU, SET |
| Abstract | With process scaling, semiconductor devices are becoming more sensitive to soft errors since the amount of critical charge is decreasing. In this paper, we estimate soft error rates (SERs) of latches and combinational circuits in a 90nm CMOS process from circuit-level simulations. We also estimate SERs of delayed TMR and DMR, in which SET pulses are removed by delay elements. The results reveal that the delayed DMR is far more vulnerable to soft errors than the delayed TMR. |
| Title | Evaluation of Statistical Method of Estimating Coverage Metrics for Functional Verification |
| Author | Kohei Hosokawa, *Yuichi Nakamura (NEC, Japan) |
| Page | pp. 358 - 363 |
| Keyword | Statistics, Coverage Metrics, Functional Verification, FPGA-based emulators |
| Abstract | We propose a statistical method to estimate the coverage metrics for functional verification, which are statement, branch, condition, expression, and toggle coverage of LSIs. The new method of estimation evaluates the coverage metrics for functional verification from only a few hundred or thousand randomly sampled signals. The statistical estimation method allows the coverage metrics for functional verification to be measured by a system, such as an FPGA-based emulator, which cannot observe the status of all signals in LSIs. We applied the new approach to a circuit that bridges a PCI bus and a local communication bus. We confirmed that the estimation error in all the coverage metrics for functional verification visually followed the normal probability distribution expected in theory, and passed most of the normal distribution tests at a 1% significance level. |
| Title | A Fast Approximation Method of Maximum Operation in Statistical Static Timing Analysis for Achieving Specified Yield |
| Author | *Shun Gokita, Yukihide Kohira, Atsushi Takahashi (Tokyo Institute of Technology, Japan) |
| Page | pp. 364 - 369 |
| Keyword | SSTA, normal distribution, maximum operation, specified yield |
| Abstract | In this paper, we propose a fast maximum-operation of two normal distributions that reduces the error of the estimated worst delay of a circuit obtained by repeating the proposed operation. The proposed maximum-operation outputs a normal distribution by which the worst delay defined is equal to the worst delay defined by the actual distribution and whose shape near the worst delay is close to the actual distribution. In experiments by using benchmark circuits, it is shown that the estimated worst delay obtained by using the proposed method is more accurate than that by existing methods. |
| Title | Dynamic Model of a Parallel Plate Actuator with Pull-in Consideration for CMOS-MEMS Simultaneous Behavior Anticipation |
| Author | *Yuheon Yi, Hiroyuki Fujita, Hiroshi Toshiyoshi (IIS, University of Tokyo, Japan) |
| Page | pp. 370 - 373 |
| Keyword | dynamic model, pull-in, CMOS-MEMS behavior, resonant oscillation |
| Abstract | This study presents a newly developed analytical model that can handle the post-pull-in behavior of a MEMS electrostatic actuator. In addition to the conventional Matlab model of the quadratic oscillation system, a displacement limiter has been inserted to represent the mechanical stopper. Thanks to this modification, the model has become applicable to both the pre- and post-pull-in regions of electrostatic actuators. The new model has been integrated into the simulation model of self-oscillating CMOS-MEMS, and its versatility in behavior-level simulation has been verified. |
| Title | Support System for ASIC Design based on System Block Diagram |
| Author | *Koichi Mori (Tokyo Metropolitan University, Japan), Yuichi Nakamura (NEC, Japan), Takao Nishitani (Tokyo Metropolitan University, Japan) |
| Page | pp. 374 - 379 |
| Keyword | co-simulation, FPGA, Simulink, SOC |
| Abstract | A support system for designing a complex SOC is proposed. The employed approach is based on a system block diagram which runs in software. A specific block in the diagram is replaced by an FPGA chip connected to a PC. Thus, a system-level support system with a co-simulation environment between software and hardware is realized. The simulation time is reduced to 1/200 of that of conventional approaches, and the system bottleneck can be easily estimated with our proposed system. |
| Title | Equivalence Checking of Loops Before and After Pipelining by Applying Symbolic Simulation and Induction |
| Author | *Shanghua Gao, Takeshi Matsumoto, Hiroaki Yoshida, Masahiro Fujita (University of Tokyo, Japan) |
| Page | pp. 380 - 385 |
| Keyword | Equivalence checking, Pipelining, Loop, Symbolic simulation, Induction |
| Abstract | When applications contain large loops, high level synthesis often takes advantage of the software pipelining technique in order to improve performance. High level synthesis with pipelining requires complicated algorithms, so it is desirable to check its correctness. In this paper, we propose a novel approach for equivalence checking of loops before and after pipelining. The proposed approach applies a combination of symbolic simulation and induction. The experimental results show that our method can verify the equivalence of loops before and after pipelining. |
| Title | Future Design and Tool Directions for Mixed-signal IC Design in Nanometer CMOS |
| Author | *Georges Gielen (Katholieke Universiteit Leuven, Belgium) |
| Page | pp. 389 - 396 |
| Abstract | Technology advances allow the integration of electronic systems in 65 nm and beyond. These nanometer technologies, however, pose serious challenges for the design of analog and mixed-signal circuits. The benefits of scaling are shrinking. Power, variability and reliability are becoming increasing problems. Therefore, new design paradigms need to be developed to address these problems and fully exploit the technology capabilities. This also requires novel design tools to be adopted by analog and mixed-signal designers. This keynote presentation will outline such future design and tool directions, and will illustrate them with some practical design results from state-of-the-art research. |
| Title | A Design Optimization of Low-Phase-Noise LC-VCO Using Multiple-Divide Technique |
| Author | *Shoichi Hara, Rui Murakami, Kenichi Okada, Akira Matsuzawa (Tokyo Institute of Technology, Japan) |
| Page | pp. 399 - 404 |
| Keyword | VCO, phase-noise, multiple-divide, ILFD |
| Abstract | The multiple-divide technique, which uses a multi-ratio frequency divider, can potentially improve the FoM of a VCO. This paper proposes a design optimization of an LC-VCO using the multiple-divide technique. In simulation results using 90-nm CMOS model parameters, the optimum frequency range, achieving an FoM better than -187.0 dBc/Hz, can be extended from 6.5 - 12.5 GHz to 1.5 - 12.5 GHz. The proposed multiple-divide technique can provide lower phase noise, lower power consumption, and a smaller layout area for the LC-VCO. |
| Title | Numerical Flicker Noise Model for Dual Channel FETs |
| Author | *Chia-Yu Chen, Yang Liu, Robert W. Dutton (Stanford University, United States), Junko Sato-Iwanaga, Akira Inoue, Haruyuki Sorada (Matsushita Electric Industrial Co.,Ltd., Japan) |
| Page | pp. 405 - 409 |
| Keyword | Flicker noise, heterostructure, MOSFETs, SiGe, TCAD |
| Abstract | A layer-dependent flicker noise model is proposed to predict low-frequency noise behavior in dual channel MOSFETs. A numerical noise model is implemented to account for unified number-mobility fluctuations in the buried and parasitic surface channels. A layer-dependent Hooge mobility fluctuation is also included in the numerical model. Based on this advanced TCAD capability, the contributions of different flicker noise mechanisms can be quantified and the dominant flicker noise sources under different bias conditions can be discussed. |
| Title | Efficient State Space Enumeration for the Verification of Analog Designs |
| Author | *Pao-Jen Huang, Wei-Hsiang Cheng, Chien-Nan Liu (National Central University, Taiwan) |
| Page | pp. 410 - 415 |
| Keyword | Formal Verification, State Emulation, Analog Verification |
| Abstract | In previous approaches to analog formal verification, the continuous state space of an analog circuit must be bounded and transformed into an FSM-like discrete model before formal methods are applied. This implies that reachable state space enumeration is the first key step in analog formal verification. In this paper, we propose an efficient state space enumeration approach that treats each MOS transistor as a storage element with 3 states and constructs all possible state combinations of an analog circuit. Because the range concept is considered, states with similar behavior can be merged to reduce the number of possible states. We also propose an algorithm that analyzes the netlist connections and further eliminates unreachable state combinations. Besides reducing verification complexity, the proposed method can also help designers verify the operation range of the circuit without simulation. As shown in the experimental results, the state transitions predicted by our approach match the real circuit behavior very well, which provides another fast solution for the verification of analog designs. |
| Title | A Study on Mobility Degradation Effect for High PSRR Linear Voltage-to-Current Converter Design |
| Author | Chun Wei Lin, *Sheng Feng Lin, You Cheng Huang (National Yunlin University of Science and Technology, Taiwan) |
| Page | pp. 416 - 421 |
| Keyword | voltage-to-current converter, mobility degradation, transconductance, PSRR |
| Abstract | In this work, we propose a linear voltage-to-current converter (VIC) with mobility degradation compensation. By using the sum of two current sources that operate in the linear and saturation regions, respectively, the nonlinearity of the complementary parabolic voltage-to-current characteristics caused by mobility degradation is minimized. The theoretical analysis and design flow are fully developed, and a chip was fabricated in a TSMC 0.35 um 3.3 V CMOS process; its measured transconductance and current variation are 0.923~1.064 and 1.5%, respectively. The experimental results show that the proposed design significantly improves the PSRR and mitigates the nonlinearity of the VIC that originates from mobility degradation. |
| Title | A Predictive Test Strategy for LNAs for RF CMOS Receivers |
| Author | *Kay Suenaga, Rodrigo Picos, Sebastia Bota, Miquel Roca, Eugeni Isern, Eugeni Garcia-Moreno (University of Balearic Islands, Spain) |
| Page | pp. 422 - 427 |
| Keyword | BiST, LNA, Predictive test, RF test |
| Abstract | In this paper, a Predictive Test strategy is proposed to estimate basic LNA performance parameters (such as S-parameters and IP1dB). This strategy uses easily measurable test observables. The method is suitable for amplifiers embedded in RF receivers. This study presents a fully on-chip test setup, including the circuitry used to generate the input test stimuli needed to obtain the test observables: it consists of an IF generator and an auxiliary mixer. The test area overhead has been kept low by reusing some receiver blocks as test circuitry. RMS estimation errors for the predicted values of S12, S21, S22 and IP1dB are less than 1%. |
| Title | A Compact On-Chip Testing Scheme for Analog-Mixed Signal Systems Using Two-Step AC and DC Fault Signature Characterizations |
| Author | *Wimol San-Um, Masayoshi Tachibana (Kochi University of Technology, Japan) |
| Page | pp. 428 - 433 |
| Keyword | Analog Testing, BIST, Amplifier |
| Abstract | This paper presents a compact on-chip testing scheme for detecting catastrophic faults in the pre-screening process of defective analog circuits in mixed-signal systems. The technique is based on AC and DC fault signature characterizations, which detect faults by monitoring and analyzing the fault signatures through the amplitude and offset of voltage signals. This technique simplifies the design of fault-sensing circuits, provides digital logic test outputs, and is applicable to most types of analog circuits. Demonstrations on a two-stage differential amplifier using 0.18-µm CMOS technology show that fault coverage and area overhead are 97.5% and 15%, respectively. |
| Title | An Assertion-Based Verification Methodology for SystemC-AMS Designs |
| Author | *Stefan Lämmermann (Universität Tübingen, Germany), Alexander Jesser (Universität Frankfurt, Germany), Roland Weiss, Juergen Ruf, Thomas Kropf (Universität Tübingen, Germany), Lars Hedrich (Universität Frankfurt, Germany), Wolfgang Rosenstiel (Universität Tübingen, Germany) |
| Page | pp. 434 - 439 |
| Keyword | Assertion based verification, Mixed-signal verification, Simulation, Heterogeneous System Verification |
| Abstract | Heterogeneous system languages lack functional and formal verification methodologies. Moreover, there exists a verification gap between the different domains. An assertion-based design method is essential to bridge this verification gap. This requires mixed-signal assertions which include properties from all domains together. Therefore, we extend the formal semantics for mixed-signal assertions with new constraints for analog and transaction level modeling. Our approach improves the assertion-based verification technique with our implemented simulation-based checker. The proposed method is a new assertion-based verification methodology for heterogeneous systems. Its effectiveness is demonstrated on two examples. |
| Title | Reliability Aware Power Grid Optimization with Consideration of Thermal Effects |
| Author | *Haruo Miki, Yoshiyuki Kawakami (Ritsumeikan University, Japan), Masaya Yoshikawa (Meijo University, Japan), Masahiro Fukui (Ritsumeikan University, Japan) |
| Page | pp. 440 - 445 |
| Keyword | power grid optimization, thermal analysis, electromigration |
| Abstract | Reliability has become one of the most important issues in designing LSIs. Major metrics related to reliability are the risk of timing violation due to IR drop and the risk of wire cutoff due to electromigration. These risks are sensitive to thermal conditions. This paper proposes a new power grid optimization algorithm that considers these thermal effects to obtain more reliable results. The risks of timing violation and wire cutoff are considered simultaneously, and the worse of the two risks is minimized by this method. Experimental results show that our new approach handles reliability issues more realistically. |
| Title | Analysis of Process Variations in 90-nm CMOS Technology using Ring Oscillators |
| Author | Akihiro Kaya, Koh Johguchi, Shinya Izumi, Hans Jürgen Mattausch, *Tetsushi Koide (Hiroshima University, Japan) |
| Page | pp. 446 - 449 |
| Keyword | Ring oscillator, Process variation, within-wafer, within-die, HiSIM |
| Abstract | Process variations are rapidly increasing as the transistor size is scaled down. Thus, it is necessary to accurately estimate within-die and inter-die variations in order to construct a design method for correct operation of circuits and integrated systems under these unavoidable variations. Here we report an analysis of ring oscillators, designed in a 90-nm CMOS technology, and the determination of their frequency variations for different stage numbers and supply voltages. The measurement results are also used to separate the within-wafer and within-die variations. In particular, the dependence of the variation on the supply voltage shows that within-die variations increase three times faster than within-wafer variations when the power-supply voltage is reduced from 1.0 V to 0.6 V. Consequently, the within-die variations are expected to limit the low-power operation of VLSI circuits in future small-scale process technologies. |
| Title | Yield Improvement in Memory Compiler Generated SRAM with Inter-Die Variations |
| Author | Chia-Chi Hsiao, *Hung-Ming Chen (National Chiao Tung University, Taiwan), Ching-Che Chung (National Chung Cheng University, Taiwan) |
| Page | pp. 450 - 455 |
| Keyword | Yield Improvement, SRAM, Process Variations |
| Abstract | As technology scales down to the nanometer regime, the yield degradation caused by inter-die variations is getting worse. Adaptive body bias is an effective method to eliminate the yield degradation; however, to use this technique we need to know whether a die has a high or a low threshold voltage (also called its process corner). Unfortunately, it is hard to detect the process corner when PMOS and NMOS variations are uncorrelated. In this paper, we propose improved delay monitor and leakage monitor circuits for both PMOS and NMOS, whose inter-die variations are uncorrelated. The experimental results show that our circuits can clearly distinguish each process corner of PMOS and NMOS, and thus clearly improve the yield by adopting the correct body bias. |
| Title | Asynchronous Differential Capacitance-to-Digital Converter for Capacitive Sensors |
| Author | *Tuan Minh Vo, Yasuhide Kuramochi, Masaya Miyahara, Takashi Kurashina, Akira Matsuzawa (Tokyo Institute of Technology, Japan) |
| Page | pp. 456 - 461 |
| Keyword | DIFFERENTIAL, ASYNCHRONOUS, CAPACITANCE-TO-DIGITAL CONVERTER, LOW-POWER |
| Abstract | This paper proposes a 10-bit low-power asynchronous differential capacitance-to-digital converter (CDC) for capacitive sensors. The proposed differential architecture makes the circuit insensitive to variations of sensor capacitance. Additionally, an asynchronous mechanism and a dynamic regenerative comparator are utilized to lower the overall power of the circuit. Simulation results show that the effective number of bits (ENOB) of the proposed circuit is improved by 3.3 bits at the Nyquist frequency compared with previous work. The power consumption at 262 kHz is 8.45 uA, which is a 95% reduction from the previous work at the same frequency. |
| Title | New Device-Level Technology Retargeting Algorithm with Fixed-Topology Constraints |
| Author | *Ying-Zhih Chuang, De-Shiun Fu, Yih-Lang Li (Computer Science Department National Chiao Tung University, Taiwan) |
| Page | pp. 462 - 467 |
| Keyword | Migration, Constraint Graph, Topology |
| Abstract | Traditional constraint graph approaches only consider space utilization and therefore generate a compact cell layout with major changes to the shape and topology of the interconnection. Moreover, traditional constraint-graph-based migration algorithms cannot handle 45 degree wires. In this study, we propose an enhanced edge-based constraint graph compaction algorithm to prevent distortion of the original shape and topology for digital devices. Based on the edge-based algorithm, a pseudo 45 degree edge model is integrated into our device migration framework. This model enables our device migration framework to process 45 degree wires. Furthermore, an effective wire extraction algorithm is proposed to identify the interconnection between devices. The experimental results show that the proposed device migration algorithm can quickly yield a compact layout that conforms to new design rules without layout distortion. |
| Title | A Design of Active Decoupling Circuit for the Substrate Noise Reduction on a Mixed Signal LSI |
| Author | *Daisuke Satoh, Nobuhiko Nakano (Keio University, Japan) |
| Page | pp. 468 - 472 |
| Keyword | substrate noise, active decoupling, mixed signal, substrate compact model |
| Abstract | This paper presents an active decoupling circuit chip design with a variable substrate contact array to change the impedance to the substrate. To simulate the substrate noise propagation, we made a substrate compact model using a 3D field solver. The simulation results show that the active decoupling can damp substrate noise to about 16-45% in certain cases. The active decoupling circuit designed in this work performs better below a few MHz than the previously designed one. |
| Title | A Multiphase Digital Controlled Oscillator with DVC Technique |
| Author | *Pao-Lung Chen, Chun-Fu Liu, Tsung-Hsiang Lin (National Kaohsiung First University of Science and Technology, Taiwan) |
| Page | pp. 473 - 476 |
| Keyword | DCO, ADPLL, VCO |
| Abstract | This paper presents a multiphase digital controlled oscillator with digital to voltage converter (DVC) technique. This multiphase digital controlled oscillator works from 102 MHz to 735 MHz with six coarse controlled bits and seven fine controlled bits. The power consumption of the ten phases output is 17 mW at 735 MHz based on post-layout simulation in TSMC 0.18 um 1P6M CMOS process. The core area is 133 um x 117 um. |
| Title | A Wireless Chip for Intra-Oral Temperature Measurement |
| Author | *Tomohiro Ishikawa, Yoshihiro Masui, Koh Johguchi, Takeshi Yoshida, Yuji Murakami (Hiroshima University, Japan) |
| Page | pp. 477 - 481 |
| Keyword | Wireless, Temperature, Measurement, Intra-Oral, Denture |
| Abstract | A CMOS chip for intra-oral temperature measurement is designed. The chip measures temperature at a relatively slow sampling rate (<10 Hz), converts it into serial data and sends it wirelessly. The measurement can provide a good benchmark of swallowing function. The chip is molded into a polymer-based denture with a button battery and a crystal oscillator. To maintain an acceptable feel when worn, there is a constraint on size, and as a result, there is strong demand for low power consumption. Each circuit block, such as the sensor, pre-amplifier, analog-to-digital converter (ADC) and static random access memory (SRAM), is custom designed and evaluated by SPICE simulation. |
| Title | How to Design the Future Mixed Signal LSIs? |
| Author | Organizer & Moderator: Akira Matsuzawa (Tokyo Institute of Technology, Japan), Panelists: Georges Gielen (Katholieke Universiteit Leuven, Belgium), Mar Hershenson (Magma Design Automation, Inc., United States), Shoji Kawahito (Shizuoka University, Japan), Shiro Dosho (Panasonic Corp., Japan) |
| Page | p. 485 |
| Abstract | The importance of mixed-signal LSIs is increasing continuously. Emerging application areas such as medical, sensor networking, and energy saving will need this technology, as will current application areas such as communications and digital consumer products. However, there are many issues, such as requirements for high performance and low power operation, low voltage operation, large variations, limited development cost and time, severe noise, and the complicated relationship between chip, package and board. This panel will discuss the following topics for future mixed-signal LSI design: 1) Issues: what is the most important or serious? 2) Direction of circuit design 3) Design methodology: ideal design methodology and reality 4) EDA technology: what has been done and what remains? 5) Education |