
SASIMI 2012
The 17th Workshop on Synthesis And System Integration of Mixed Information Technologies

Poster III
Time: 10:00 - 11:45 Friday, March 9, 2012
Location: Int'l Conf. Room & Mtg. Room 31
Chairs: Qiang Zhu (Cadence Design Systems, Japan), Kyungsoo Lee (Kyoto University, Japan)

R3-1
Title: Replacement of Flip-Flops by Latches and Pulsed Latches for Power and Timing Optimization
Author: Yao-Ting Wu, *Rung-Bin Lin (Yuan Ze University, Taiwan)
Page: pp. 300-304
Keyword: low-power, timing optimization, latch, pulsed latch, clock tree
Abstract: This paper presents a simple (pulsed) latch replacement method that does not require clock tree re-synthesis yet still excels at maintaining clock skew. It can improve timing performance by 14% and save clock tree power by 10% for already routed circuits with very tight timing. For circuits with looser timing, performance improves by 6% to 8% and power savings reach up to 24%. We also find that a longer duty cycle has a strong negative impact on the percentage of flip-flops replaced with latches.

R3-2
Title: A Routability-oriented Packing Method for FPGA with Fracturable Logic Elements
Author: Wei Chen (Waseda University, Japan), Yuichi Nakamura (NEC Corporation, Japan), *Nan Liu, Takeshi Yoshimura (Waseda University, Japan)
Page: pp. 305-310
Keyword: FPGA, Packing, Fracturable BLE, ALM
Abstract: Fracturable basic logic elements (BLEs) are widely used in modern FPGAs to increase logic utilization and thereby reduce FPGA area. In this paper, we propose a novel packing method for FPGAs with Adaptive Logic Modules (ALMs), a kind of fracturable BLE developed by Altera. Our method packs LUTs and registers into ALMs as compactly as possible to reduce area while also improving the routability of the result. The method is based on a max-weight matching algorithm whose weights are determined by both area and routability. Experimental results show that, by using fracturable BLEs instead of traditional BLEs, our method reduces area by 37% and improves the routability of the design by 15%.

R3-3
Title: A Two-Step BIST Scheme for Operational Amplifier
Author: *Jun Yuan, Masayoshi Tachibana (Kochi University of Technology, Japan)
Page: pp. 311-316
Keyword: Built-in Self-Test, Operational Amplifier, Compensation Capacitor, Current-based
Abstract: This paper presents a two-step Built-in Self-Test (BIST) scheme and its implementation for operational amplifiers (Opamps). In addition to catastrophic faults, the proposed technique can detect capacitance variation in the compensation capacitor by combining a current-based test with an offset-based test to detect physical defects in the Opamp. Circuit-level simulation results of the proposed BIST system demonstrate the feasibility of the scheme, with a fault coverage of 98%.

R3-4s
Title: Circuit Partitioning Methods for FPGA-based ASIC Emulator using High-speed Serial Wires
Author: *Katsunori Takahashi, Motoki Amagasaki, Morihiro Kuga, Masahiro Iida, Toshinori Sueyoshi (Kumamoto University, Japan)
Page: pp. 317-318
Keyword: emulator, serial communication, virtual wire, FPGA
Abstract: We are studying an FPGA-based ASIC emulator that uses high-speed serial communication. In this emulator, the placement of FFs on the FPGAs is restricted, so replicated logic gates and replicated input terminals must be reduced when partitioning the circuit across FPGAs. Compared with hMETIS, the proposed technique for suppressing the duplication of external inputs achieves an average reduction of 56.4%, and the technique for suppressing the duplication of nodes achieves an average reduction of 71.8%.

R3-5
Title: Timing-aware Description Methods and Gate-level Simulation of Single Flux Quantum Logic Circuits
Author: *Nobutaka Kito, Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan)
Page: pp. 319-324
Keyword: SFQ circuit, timing, logic simulation
Abstract: Single-flux-quantum (SFQ) circuits are high-speed, low-power circuits built from superconductive devices. Because SFQ circuits are fast and use pulse logic, signal skew is not negligible and the basic gates are clocked. Designers therefore need to be aware of timing issues when designing SFQ circuits. We propose two timing-aware description methods for SFQ circuits. One is a circuit schematic annotated with the order of pulse arrival; the other is a timing-aware circuit description language. As an example application, we show a logic simulation algorithm.

R3-6
Title: Design and Analysis of Via-Configurable Routing Fabrics for Structured ASICs
Author: Hsin-Pei Tsai, *Rung-Bin Lin, Liang-Chi Lai (Yuan Ze University, Taiwan)
Page: pp. 325-329
Keyword: Structured ASIC, Regular routing fabric, Via configurable, Routing resource, Router
Abstract: This paper presents a simple method for the design and analysis of a via-configurable routing fabric formed by an array of routing fabric blocks (RFBs). Rather than resorting to full-chip routing, the method simply probes an RFB to collect statistics for a metric used to qualify the RFB. We find that the trade-off between wire length and via count is a good metric. This metric has been validated by full-chip routing and used successfully to create better routing fabrics.

R3-7
Title: Device-level Simulations of Parasitic Bipolar Mechanism on Preventing MCUs of Redundant Flip-Flops
Author: *Kuiyuan Zhang, Ryosuke Yamamoto, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan)
Page: pp. 330-333
Keyword: Soft error, Parasitic bipolar mechanism, Multiple Cell Upset (MCU), Device simulation, Flip Flop
Abstract: The parasitic bipolar mechanism can effectively prevent MCUs in redundant flip-flops, which improves their tolerance to soft errors. Device-level simulations reveal that, owing to the parasitic bipolar effect, no MCU occurs in redundant latches storing opposite values, whereas an MCU does occur when a high-energy particle strikes redundant latches storing the same value.

R3-8
Title: A Method of Analog IC Placement with Common Centroid Constraints
Author: *Keitaro Ue, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan)
Page: pp. 334-339
Keyword: common centroid, sequence-pair, analog IC, placement
Abstract: To improve immunity against process gradients, a common centroid constraint is widely used: some original capacitors are each divided into two halves, and every such pair of halves must be placed symmetrically with respect to a common centroid. Xiao et al. proposed a method to obtain a placement satisfying the common centroid constraints, but this method has a defect. In this paper, we propose a method to obtain a placement that satisfies common centroid constraints.

R3-9
Title: GPU-based Line Probing Techniques for Mikami Routing Algorithm
Author: *Chiu-Yi Chan (Department of Computer Science and Engineering, Yuan Ze University, Taiwan), Jiun-Li Lin (Institute of Computer Science and Information Engineering, National Cheng Kung University, Taiwan), Lung-Sheng Chien (Department of Mathematics, National Tsing Hua University, Taiwan), Tsung-Yi Ho (Institute of Computer Science and Information Engineering, National Cheng Kung University, Taiwan), Yi-Yu Liu (Department of Computer Science and Engineering, Yuan Ze University, Taiwan)
Page: pp. 340-344
Keyword: Routing, GPU, CUDA, Mikami router
Abstract: The graphics processing unit (GPU), which contains hundreds of processing cores, is becoming a popular device for high-performance computation in the multi-core era. Because GPUs favor strictly regular computation, adapting specific algorithms to them is a key challenge for achieving speed-up. In this paper, we propose a parallel CUDA-Mikami routing algorithm on NVIDIA GPUs. A 32-bit routing grid encoding is proposed to simplify wire intersection identification and wire direction recognition. Furthermore, thread-level and warp-level line probing techniques are proposed for vertical and horizontal routing, respectively. Experimental results indicate that the run-time efficiency is promising compared with traditional CPU versions of the algorithm.

R3-10
Title: Topology Design for Power Delivery in 3-D Integrated Circuits
Author: *Shu-Han Wei, Yi-Hsuan Lee (Department of Electrical Engineering, National Chiao Tung University, Taiwan), Chih-Ting Sun, Yu-Min Lee (Department of Communication Engineering, National Chiao Tung University, Taiwan), Liang-Chia Cheng (Industrial Technology Research Institute, Taiwan)
Page: pp. 345-350
Keyword: 3D Power Delivery Network, Topology Optimization, 3D ICs, Power Grid, Through Silicon Via
Abstract: Three-dimensional integrated circuit (3D IC) technology has been viewed as an effective way to improve chip performance by overcoming the bottleneck of long global interconnects. However, designing an effective 3D power delivery network (3D-PDN) has become a serious challenge for 3D ICs. This work develops an efficient method to optimize the topology of the 3D-PDN. 3D-PDN topology design covers both the 2D power grid design and through-silicon via (TSV) placement. The proposed approach consists of three main steps: (1) Initial 3D-PDN Topology, which gives an early estimate of the PG sources and TSV sources based on a compact circuit model of the 3D-PDN; (2) Fast 3D-PDN IR Drop Analysis, which verifies the correctness of the 3D-PDN topology; and (3) 3D-PDN Topology Modification, which refines the performance of the initial 3D-PDN topology. Experimental results demonstrate the effectiveness of the proposed 3D-PDN topology design method.

R3-11
Title: A Spur-Reduction Frequency Synthesizer For Wireless Application
Author: *Te-Wen Liao, Jun-Ren Su, Chung-Chih Hung (Department of Electrical Engineering, National Chiao Tung University, Taiwan)
Page: pp. 351-354
Keyword: PLL, VCO, Synthesizer
Abstract: In this paper, we present a low-spur phase-locked loop (PLL) system for wireless applications. The low-spur frequency synthesizer randomizes the periodic ripple on the control voltage of the voltage-controlled oscillator (VCO) in order to reduce the reference spur at the PLL output. A new random clock generator is presented that randomly selects the phase frequency detector (PFD) control for the charge pump in the locked state. The proposed frequency synthesizer was fabricated in a TSMC 0.18-µm CMOS process. The PLL achieves a phase noise of -93 dBc/Hz at a 600 kHz offset frequency and reference spurs below -72 dBc.

R3-12
Title: Definite Feature of Low-Energy Operation of Scaled Cross-Current Tetrode (XCT) SOI CMOS Circuits
Author: *Yasuhisa Omura, Daishi Ino (Kansai University, Japan)
Page: pp. 355-360
Keyword: SOI, CMOS, Low energy, XCT
Abstract: This paper describes an advanced aspect of cross-current tetrode (XCT) CMOS devices and, by analyzing device operation, demonstrates the outstanding low-energy characteristics of XCT-SOI CMOS. This feature is expected to be very useful in many medical implant applications.

R3-13
Title: A Matching Method for Look-ahead Assertion on Pattern Independent Regular Expression Matching Engine
Author: *Yoichi Wakaba, Shinobu Nagayama, Masato Inagi, Shin'ichi Wakabayashi (Hiroshima City University, Japan)
Page: pp. 361-366
Keyword: FPGA, NIDS, Regular expression matching
Abstract: In this paper, we propose a matching method for look-ahead assertions on our pattern-independent regular expression matching engine. Our pattern-independent engine is suitable for network intrusion detection systems (NIDSs), which require quick updating of patterns. Look-ahead assertions are often used to describe patterns in NIDSs. However, to the best of our knowledge, no existing pattern-independent matching engine can handle look-ahead assertions. In the proposed method, we introduce a preprocessing circuit into the matching engine; it performs matching for look-ahead assertions by searching from the end of the text toward its beginning. We also discuss the throughput of the proposed engine.

R3-14
Title: Highly-parallel AES Processing for Five Confidentiality Modes with Massive-Parallel SIMD Matrix Processor
Author: *Hiroki Yoshikawa, Takeshi Kumaki, Takeshi Fujino (Ritsumeikan University, Japan)
Page: pp. 367-371
Keyword: AES, SIMD, matrix-processing architecture, cipher mode, parallel processing
Abstract: This paper presents a highly parallel implementation of AES processing in five confidentiality modes on a Massive-Parallel SIMD Matrix processor (MX-1). MX-1 has 1,024 2-bit processing elements connected by a flexible switching network, and supports 2-bit, 1,024-way bit-serial and word-parallel operations in a single command. A method of parallel ECB processing on MX-1 has been reported previously. To expand MX-1's capability for confidential AES processing, this research implements AES in the other cipher modes.

R3-15
Title: A Trace-Back Method with Source States and its Application to Viterbi Decoders of Low Power and Short Latency
Author: *Kazuhito Ito (Saitama University, Japan)
Page: pp. 372-377
Keyword: Viterbi algorithm, Convolutional code, Source state, Low power
Abstract: The Viterbi algorithm is widely used to decode convolutional codes. To find the survivor path, the trace-back method is often employed because it consumes less power than the register exchange method, especially for convolutional codes with many states. The disadvantage of conventional trace-back using decision bits is its long decode latency. In this paper, a trace-back method using source states instead of decision bits is proposed, which reduces the number of memory accesses. A dedicated memory that supports the proposed trace-back method is also presented. The reduced number of memory accesses results in lower power consumption and shorter decode latency than the conventional method.

R3-16
Title: Evaluation of Migration Methods for Island Based Parallel Genetic Algorithm on CUDA
Author: *Yuri Ardila, Shigeru Yamashita (Ritsumeikan University, Japan)
Page: pp. 378-383
Keyword: Evolutionary Algorithm, Optimization, CUDA, GPU
Abstract: In the EDA research community, various optimization problems have been studied. One of the successful metaheuristics in EDA is the Genetic Algorithm (GA). To speed up GA-based optimization methods, parallel GA implementations on GPUs have been proposed. This paper proposes new migration methods for parallel island-based GAs, namely Roulette Wheel Migration (RWM), Developed City Migration (DCM), and Developed City Migration-α (DCM-α), and compares them with an existing method, Unidirectional Ring Migration. We implement our parallel GA on the latest CUDA architecture, Fermi. The implemented parallel island-based GA with the proposed migration methods is tested on Traveling Salesman Problem benchmarks. Our experimental results show that two of the proposed migration methods, RWM and DCM-α, outperform the existing method in both execution speed and solution quality.

R3-17
Title: FPGA Design of User Monitoring System for Display Power Control
Author: *Tomoaki Ando, Vasily Moshnyaga (Fukuoka University, Japan)
Page: pp. 384-389
Keyword: low-power, FPGA, design, eye-tracking
Abstract: This paper describes the FPGA design of a user-monitoring system for power management of a PC display. From camera readings, the system detects whether the user is looking at the screen and produces signals to control the display backlight. The system provides over 88% eye-detection accuracy at an image processing rate of 8 frames/s. We describe the hardware and present the results of its experimental evaluation.

R3-18
Title: A Debug Solution with Synchronizer for CDC
Author: *Akitoshi Matsuda (Kyushu University, Japan), Shinichi Baba (Kyushu Embedded Forum, Japan)
Page: pp. 390-393
Keyword: clock domain crossing, low power, synchronizer
Abstract: System LSI designs must satisfy both high-performance and low-power requirements. A CDC (clock domain crossing) verification solution needs to be deployed to efficiently detect and debug the causes of CDC issues, as well as to analyze the design for low-power issues. Even when synchronizers are added to resolve CDC issues, their effect on power consumption must be checked. Through several case studies, this paper shows that power consumption decreased by several percent when synchronizers were used.

R3-19s
Title: A Low Power-Delay Product Processor Using Multi-valued Decision Diagram Machine
Author: *Hiroki Nakahara (Kagoshima University, Japan), Tsutomu Sasao, Munehiro Matsuura (Kyushu Institute of Technology, Japan)
Page: pp. 394-395
Keyword: BDD, MDD, Processor, MPU, Low Power
Abstract: A heterogeneous multi-valued decision diagram of the encoded characteristic function for non-zero outputs (HMDD for ECFN) represents a multi-output logic function efficiently. In terms of speed, the HMDD for ECFN machine is 3.02 times faster than the Core i5 processor and 12.50 times faster than the Nios II processor. In terms of power-delay product, it is 32.72 times lower than the Core i5 processor and 57.92 times lower than the Nios II processor.

R3-20
Title: A TMR-based Soft Error Mitigation Technique With Less Area Overhead in High-Level Synthesis
Author: Daiki Tsuruta, *Masayuki Wakizaka, Yuko Hara-Azumi, Shigeru Yamashita (Ritsumeikan University, Japan)
Page: pp. 396-401
Keyword: High-Level Synthesis, Fault Tolerant, TMR
Abstract: It is very important to consider soft errors in LSI designs. Although TMR (Triple Modular Redundancy) is an effective way of preventing soft errors, it increases the mounting area of the data path. In this paper, we propose a technique that reduces the data-path mounting area when generating soft-error-tolerant LSIs in high-level synthesis. Experiments demonstrate that our method achieves high reliability with little area overhead compared with a traditional TMR-based method.

R3-21
Title: Pipeline Circuit Synthesis from C Descriptions for Fast Memory Access in System LSI
Author: *Yu-ichi Kitamura (Kinki University, Japan), Kazuya Kishida (Panasonic Industrial Devices S&T, Japan), Takashi Kambe (Kinki University, Japan)
Page: pp. 402-407
Keyword: memory access, C based design, behavior synthesis, pipelining, register
Abstract: High-level design methodologies are becoming more and more important in the design of large system LSI devices. As a result, behavioral synthesis from C and other high-level languages is key to achieving the productivity demanded by such large designs. For memory-intensive applications in particular, the automatic identification, optimization, and synthesis of memory access operations is essential. This paper describes a method for automatically generating behavioral descriptions of memory access pipeline circuits. Combined with registerization, the approach can accelerate memory accesses (MA) irrespective of the degree of data reuse. The method is applied to well-known algorithms used in applications such as speech recognition, JPEG encoding, and particle tracking, and its effectiveness is evaluated.

R3-22
Title: A PE-based Pipelining and Assignment Algorithm for Coarse Grained Dynamic Reconfigurable Circuits
Author: *Nobuyuki Araki, Takashi Kambe (Kinki University, Japan)
Page: pp. 408-413
Keyword: Reconfigurable Computing, pipelining, PE assignment, C level language, configuration synthesis
Abstract: Reconfigurable Computing (RC) has been proposed as a new paradigm to address the conflicting design requirements of high performance and area efficiency. Coarse-grained architecture RC (CGA-RC) operates at word-level granularity and exhibits better power and performance characteristics than fine-grained architectures. However, in a CGA-RC system, the processing elements (PEs) implement multiple types of arithmetic operations, and the routing between them has a fixed architecture. It is therefore difficult for these systems to achieve both good performance and high PE utilization automatically for all applications. To cope with this issue, we propose a PE-based automatic loop pipelining algorithm to accelerate loop processing, and a simultaneous PE assignment and routing algorithm to improve the PE utilization ratio in CGA-RC. In this paper, we investigate and evaluate these algorithms.

R3-23
Title: High-Level Synthesis Using Partially-Programmable Resources for Yield Improvement
Author: *Yuko Hara-Azumi (University of California, Irvine, U.S.A.), Hiroyuki Tomiyama, Shigeru Yamashita (Ritsumeikan University, Japan), Nikil D. Dutt (University of California, Irvine, U.S.A.)
Page: pp. 414-419
Keyword: High-level synthesis, Partially-programmable circuits, Yield improvement, Resource binding
Abstract: This paper proposes a novel binding technique in high-level synthesis (HLS) that improves yield by using resources realized by Partially-Programmable Circuits (PPCs). A PPC, which has been developed recently, is unique in that it can improve yield by reconfiguring its internal functionality depending on the faults detected after fabrication. We aim to further improve yield by utilizing PPC-realized resources. Our work performs resource binding in HLS while considering the post-fabrication reconfiguration of the PPC-realized resources, i.e., maximizing the expected yield. The problem is formulated as an integer linear programming (ILP) problem. Several case studies demonstrate the effectiveness of our work.

R3-24
Title: A Method of Power Supply Voltage Assignment and Scheduling of Operations to Reduce Energy Consumption of Error Detectable Computations
Author: *Yuki Suda, Kazuhito Ito (Saitama University, Japan)
Page: pp. 420-424
Keyword: supply voltage assignment, scheduling, low power, error detection, dependability
Abstract: As VLSI technology evolves, VLSI circuits are becoming more vulnerable to noise such as crosstalk, power supply fluctuation, and single event upsets (SEUs). To detect an error caused by an SEU in functional units, operations are executed twice and the results are compared to check whether they are identical. Such duplicated execution and comparison may consume a large amount of energy. In this paper, a method of power supply voltage assignment and operation scheduling is proposed to reduce the energy consumption of error-detectable circuits.

R3-25
Title: Software Design Methodology based on Energy Consumption Model Considering Relationship between Software and Hardware
Author: *Koji Kurihara, Hiromasa Yamauchi, Toshiya Otomo, Takahisa Suzuki (Fujitsu Laboratories Ltd., Japan), Yuta Teranishi (Fujitsu Kyushu Network Technologies Limited, Japan), Koichiro Yamashita (Fujitsu Laboratories Ltd., Japan)
Page: pp. 425-430
Keyword: multi-core, energy consumption, model
Abstract: In an SoC for industrial systems, energy consumption and performance sometimes have to be optimized using existing software and hardware. However, this is difficult to achieve without an evaluation methodology that considers the relationship between software and hardware. We therefore propose an evaluation methodology based on an energy consumption model that considers this relationship. We verified the accuracy of our methodology by comparing it with an experimental result.

R3-26
Title: Electro-Thermal Modeling and Reliability Simulation of Power MOSFETs with SystemC-AMS - Case Study: An Unclamped Inductive Switching Test Circuit
Author: *Keiji Nakabayashi, Takahiro Ozasa (Keirex Technology Inc., Japan), Tamiyo Nakabayashi (Nara Women's University, Japan)
Page: pp. 431-436
Keyword: SystemC-AMS, Power MOSFET, Electro-Thermal Simulation, Device Modeling, Unclamped Inductive Switching test circuit
Abstract: We present a new technique for the electro-thermal modeling and reliability simulation of power MOSFETs with SystemC-AMS. We model the non-linear electrical characteristics and self-heating effect of power MOSFETs, and improve the numerical integration method to overcome the numerical instability of SystemC-AMS. Our technique is verified by experimental results using an Unclamped Inductive Switching (UIS) test circuit.