SASIMI 2012
The 17th Workshop on Synthesis And System Integration of Mixed Information Technologies

Poster IV
Time: 14:15 - 16:00 Friday, March 9, 2012
Location: Int'l Conf. Room & Mtg. Room 31
Chairs: Chikaaki Kodama (Toshiba Corp., Japan), Keishi Sakanushi (Osaka University, Japan)

R4-1
Title: Design Automation for Digital Microfluidic Biochips: From Fluidic-Level Toward Chip-Level
Author: Tsung-Wei Huang, *Tsung-Yi Ho (National Cheng Kung University, Taiwan)
Page: pp. 439 - 444
Keyword: Chip-Level, Digital Microfluidic Biochips, Optimization, Physical Design, Synthesis
Abstract: Advances in droplet-based digital microfluidic biochips (DMFBs) have led to the emergence of biochips for automating laboratory procedures in biochemistry and molecular biology. These devices enable precise control of microliter or nanoliter volumes of biochemical samples and reagents. They combine electronics with biology and integrate various bioassay operations, such as sample preparation, analysis, separation, and detection. To meet the challenges of increasing design complexity, computer-aided design (CAD) tools have been introduced to build DMFBs efficiently. This paper provides an overview of DMFBs and describes emerging CAD tools for the automated synthesis and optimization of DMFB designs, from fluidic-level synthesis to chip-level design. Design automation is expected to relieve the burden of manually optimizing bioassays, time-consuming chip design, and costly testing and maintenance procedures. With the assistance of CAD tools, users can concentrate on the development and abstraction of nanoscale bioassays while leaving chip optimization and implementation details to the tools.

R4-2
Title: Timing-Aware Clock Gating Algorithm for Pulse-Latch Circuits
Author: *Zong-Han Yang, Tsung-Yi Ho (National Cheng Kung University, Taiwan)
Page: pp. 445 - 450
Keyword: Clock Gating, Pulse Latch, Timing
Abstract: Low-power design is a crucial issue in modern circuit design, and several techniques to reduce power consumption have recently been proposed. One of them is pulse-latch technology, which replaces flip-flops with pulse latches because of their smaller capacitance. To further reduce the power consumption of pulse-latch-based circuits, clock gating of pulse latches, called pulser gating, has recently been proposed. However, pulser gating may violate setup time constraints, so that data cannot be stored in the registers correctly, causing a fatal error in the design. In this paper, we propose an algorithm that handles pulser gating and setup time constraints simultaneously. We use a line-search algorithm to handle the setup time constraints and apply the minimum-cost maximum-flow technique to determine the clock tree topology of pulse-latch-based circuits. Experimental results show that our algorithm reduces power consumption by 58.35% on average compared with the binary merge algorithm.

R4-3
Title: Resistivity-based Modeling of Substrate Non-uniformity for Resistance Extraction of Low-Resistivity Substrate
Author: *Yasuhiro Ogasahara, Toshiki Kanamoto (Renesas Electronics Corp., Japan), Hisato Inaba, Toshiharu Chiba (Renesas Design Corp., Japan)
Page: pp. 451 - 456
Keyword: substrate noise, substrate extraction, low-resistivity substrate, doping profile
Abstract: This paper discusses modeling of non-uniform substrate resistivity for substrate resistance extraction. Although the resistivity of each substrate layer is frequently assumed to be uniform, the doping profile of each layer is not. We present the extraction error of substrate resistance under the uniform-resistivity assumption, and we propose a resistivity model that enables accurate resistance extraction for substrates with non-uniform profiles. We also demonstrate characterization of the proposed model using substrate resistances that are easily obtained from fabricated chips.

R4-4
Title: Temperature-Constrained Fixed-Outline Floorplanning for 3D ICs
Author: Ciao-Yu Hong, Wai-Kei Mak, *Ting-Chi Wang (Department of Computer Science, National Tsing Hua University, Taiwan)
Page: pp. 457 - 459
Keyword: Temperature, Fixed-Outline, Floorplanning, 3D-IC
Abstract: Three-dimensional (3D) ICs are produced by stacking multiple dies and delivering inter-die signals through Through-Silicon Vias (TSVs). TSVs that carry signals between dies are called signal TSVs, while those that enhance heat dissipation are called thermal TSVs. In this paper, we present a temperature-constrained fixed-outline 3D-IC floorplanner that simultaneously places signal and thermal TSVs to reduce both wirelength and temperature. Encouraging experimental results demonstrate the effectiveness and efficiency of our floorplanner.

R4-5
Title: A GPGPU Implementation of Parallel Backward Euler Algorithm for Power Grid Circuit Simulation
Author: Lei Lin, *Hayato Shiono, Makoto Yokota, Masahiro Fukui (Ritsumeikan University, Japan)
Page: pp. 460 - 465
Keyword: power grid, simulator, GPGPU, Backward Euler
Abstract: As VLSI scale increases, so does the time required for power grid simulation. This paper describes a fast and accurate parallel transient simulator for power grids, implemented on a GPU (Graphics Processing Unit) using CUDA. The simulator uses the Backward Euler method for accurate simulation. Experimental results show that the proposed simulator is 86.2 times faster than CPU software.
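The backward Euler step at the core of such a simulator can be sketched on a toy grid. This is a minimal sketch under stated assumptions: a hypothetical 1-D chain of grid nodes fed from an ideal supply through a pad resistance, with a capacitance `cap` and a constant load current at every node; the helper names `solve_linear`, `build_chain_grid`, and `backward_euler` are illustrative, not from the paper. Nodal analysis gives C dv/dt = -G v + b, and backward Euler with step h solves (G + C/h) v[n+1] = (C/h) v[n] + b at each step. It is a serial CPU sketch; the CUDA parallelization described in the abstract is not shown.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def build_chain_grid(n, r_seg, r_pad, vdd, i_load):
    """Hypothetical toy grid: conductance matrix G and source vector b for n chained nodes.

    Node 0 connects to the supply vdd through r_pad, neighbors connect through r_seg,
    and every node sinks a constant current i_load.
    """
    g = [[0.0] * n for _ in range(n)]
    b = [-i_load] * n
    g_seg, g_pad = 1.0 / r_seg, 1.0 / r_pad
    for k in range(n - 1):
        g[k][k] += g_seg
        g[k + 1][k + 1] += g_seg
        g[k][k + 1] -= g_seg
        g[k + 1][k] -= g_seg
    g[0][0] += g_pad
    b[0] += g_pad * vdd  # supply injected through the pad conductance
    return g, b


def backward_euler(g, b, cap, v0, h, steps):
    """Integrate C dv/dt = -G v + b with (G + C/h) v[n+1] = (C/h) v[n] + b."""
    n = len(v0)
    a = [[g[i][j] + (cap / h if i == j else 0.0) for j in range(n)] for i in range(n)]
    v = list(v0)
    for _ in range(steps):
        rhs = [cap / h * v[i] + b[i] for i in range(n)]
        v = solve_linear(a, rhs)
    return v
```

For example, `build_chain_grid(n=4, r_seg=0.1, r_pad=0.05, vdd=1.0, i_load=0.1)` followed by `backward_euler(g, b, cap=1e-9, v0=[1.0] * 4, h=1e-10, steps=200)` settles to the DC IR-drop profile given by `solve_linear(g, b)`. Because G + C/h does not change between steps, a practical simulator would factor it once or use an iterative solver instead of repeating elimination every step.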

R4-6s
Title: A Third Order Delta-Sigma Modulator with Shared Opamp Technique for Wireless Applications
Author: *Ghazal Fahmy, Daisuke Kanemoto, Haruichi Kanaya, Ramesh Pokharel, Keiji Yoshida (Kyushu University, Japan)
Page: pp. 466 - 467
Keyword: delta-sigma modulator, ADC, shared-opamp
Abstract: This paper describes the design of a third-order delta-sigma modulator (DSM) that exploits a shared-opamp technique to reduce the number of opamps required, which in turn reduces both the total power consumption and the area of the modulator. The architecture relaxes the comparator speed requirement, which makes it appropriate for wireless applications. The first and second stages share one opamp in the integration and sampling phases. The proposed circuit has been designed in TSMC 0.18 µm CMOS technology. A 2 MHz bandwidth and a 50 dB peak signal-to-quantization-noise ratio (SQNR), suitable for WCDMA, have been achieved. The modulator consumes 2.4 mW from a 1.2 V supply and occupies 0.3 mm².
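For context, the standard textbook estimate of the ideal peak SQNR of an L-th order delta-sigma modulator with an N-bit quantizer and oversampling ratio OSR, assuming a noise transfer function (1 - z^{-1})^L and a full-scale sine input, is shown below. It is an idealized bound and is not taken from the paper; the abstract states neither the OSR nor the quantizer resolution, so no numbers are substituted.

```latex
\mathrm{SQNR}_{\mathrm{peak}}\,[\mathrm{dB}] \approx 6.02N + 1.76
  - 10\log_{10}\!\left(\frac{\pi^{2L}}{2L+1}\right)
  + (2L+1)\,10\log_{10}(\mathrm{OSR}),
\qquad L = 3 \text{ for a third-order loop.}
```

For L = 3, each doubling of the OSR adds about 7 × 3.01 ≈ 21 dB to this ideal figure.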

R4-7s
Title: A Self-Organization Maps Approach to FPGA Placement
Author: Motoki Amagasaki, *Yasuaki Tomonari, Masahiro Iida, Morihiro Kuga, Toshinori Sueyoshi (Kumamoto University, Japan)
Page: pp. 468 - 469
Keyword: SOM, FPGA, Placement
Abstract: Cell placement is an important phase of current Field Programmable Gate Array (FPGA) circuit design. However, the placement problem is NP-hard. Although nondeterministic algorithms such as Simulated Annealing (SA) are successful in solving it, they are known to be slow. In this paper, we introduce a new neural network approach to the FPGA placement problem. The network used is a Kohonen self-organizing map (SOM). The connection relationships of cluster-level netlists are converted into a set of appropriate input vectors. These higher-dimensional vectors are fed to the self-organizing map in random order so that they map themselves onto the two-dimensional plane of the regular chip. The key feature is that the SOM algorithm performs cell placement so as to minimize the total connection length in the circuit. We evaluate our placement tool using several benchmark circuits.
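A minimal sketch of SOM-based placement follows, under stated assumptions: each cluster's input vector is its row of the cluster-connection matrix with a 1 on the diagonal, the map is a rows x cols grid of neurons with random initial weights, and clusters that land on the same neuron are resolved by a simple greedy pass. The learning-rate and neighborhood schedules, the legalization step, and the names `som_place` and `wirelength` are hypothetical choices, not the authors' tool.

```python
import math
import random


def _sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def som_place(adjacency, rows, cols, epochs=2000, seed=0):
    """Place n clusters on a rows x cols grid with a Kohonen self-organizing map."""
    n = len(adjacency)
    if n > rows * cols:
        raise ValueError("more clusters than grid sites")
    rng = random.Random(seed)
    # Assumed encoding: connection-matrix row, with a 1 marking the cluster itself.
    inputs = [[1.0 if i == j else float(adjacency[i][j]) for j in range(n)] for i in range(n)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = {cell: [rng.random() for _ in range(n)] for cell in grid}
    sigma0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        lr = 0.5 * (1.0 - frac) + 0.01       # learning rate decays over time
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks
        x = inputs[rng.randrange(n)]          # inputs presented in random order
        bmu = min(grid, key=lambda cell: _sqdist(weights[cell], x))  # best matching unit
        for cell in grid:
            d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighborhood on the grid
            w = weights[cell]
            for k in range(n):
                w[k] += lr * h * (x[k] - w[k])
    # Greedy legalization: each cluster, in index order, takes its closest free site.
    placement, free = {}, set(grid)
    for i in range(n):
        site = min(free, key=lambda c: (_sqdist(weights[c], inputs[i]), c))
        placement[i] = site
        free.remove(site)
    return placement


def wirelength(adjacency, placement):
    """Total Manhattan connection length, weighted by the connection count."""
    n = len(adjacency)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency[i][j]:
                (r1, c1), (r2, c2) = placement[i], placement[j]
                total += adjacency[i][j] * (abs(r1 - r2) + abs(c1 - c2))
    return total
```

For example, `som_place([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], rows=2, cols=2)` assigns each of four chained clusters a distinct grid site, and `wirelength` scores the result.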

R4-8
Title: The Development of CAD System for Via Programmable Structured ASIC VPEX3
Author: *Ryohei Hori (Ritsumeikan University, Japan), Masaya Yoshikawa (Meijo University, Japan), Takeshi Fujino (Ritsumeikan University, Japan)
Page: pp. 470 - 475
Keyword: Structured ASIC, Via Programmable, Exclusive-or
Abstract: Structured ASICs (SAs), which can be customized with only a few masks, drastically decrease photomask cost. We have been developing a novel via-programmable structured ASIC (VPSA) architecture, "VPEX" (Via Programmable logic device using EXclusive-or array). A dedicated CAD system for VPEX is necessary because no general tools support placement and routing for VPSAs. In this paper, we describe the dedicated CAD system and study the area penalty of VPEX compared with an ASIC.

R4-9
Title: Design of Low-Voltage High-Precision Complex Quadrature Modulators
Author: *Takahiro Tsushima, Tsuneo Tsukahara (University of Aizu, Japan)
Page: pp. 476 - 481
Keyword: quadrature modulator, LO calibration, transmitter
Abstract: We propose novel quadrature modulator structures suitable for software-defined radio and cognitive radio transmitters. The proposed modulators can correct LO phase and amplitude errors and achieve high modulation accuracy with low power consumption. The simulated sideband rejection ratios are better than 60 dB with a phase error of 3 degrees and an amplitude error of 0.1 dB, and the power consumption is about 13 mW.
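To put the 60 dB figure in context, the standard image-rejection formula for a quadrature modulator with I/Q gain ratio g and phase error θ is SBR = (1 + 2g cos θ + g²) / (1 - 2g cos θ + g²). The sketch below evaluates it; under this simple mismatch model an uncorrected modulator with a 3° phase error and 0.1 dB amplitude error would reach only about 31 dB, which indicates how much the proposed correction has to recover. The function name is illustrative, and the model ignores LO leakage and other impairments.

```python
import math


def sideband_rejection_db(phase_err_deg, amp_err_db):
    """Unwanted-sideband rejection of a quadrature modulator with I/Q mismatch.

    g is the I/Q gain ratio and theta the quadrature phase error:
    SBR = (1 + 2 g cos(theta) + g^2) / (1 - 2 g cos(theta) + g^2)
    """
    g = 10.0 ** (amp_err_db / 20.0)
    theta = math.radians(phase_err_deg)
    num = 1.0 + 2.0 * g * math.cos(theta) + g * g
    den = 1.0 - 2.0 * g * math.cos(theta) + g * g
    if den <= 0.0:
        return math.inf  # perfectly matched I/Q: no unwanted sideband
    return 10.0 * math.log10(num / den)
```

For small errors the formula reduces to roughly 4 / (ε² + θ²), with ε = g - 1, so halving both errors improves rejection by about 6 dB.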

R4-10s
Title: A Design of 2GHz Band O-QPSK Wireless Transmitter using 0.18µm CMOS Technology
Author: *Yuki Mitani, Nobuhiko Nakano (Keio University, Japan)
Page: pp. 482 - 483
Keyword: BMI, wireless
Abstract: Brain-machine interfaces (BMIs) have attracted attention in recent years, and the demand for wireless communication is increasing. In this paper, we propose an O-QPSK transmitter in 0.18 µm CMOS technology that meets the requirements for wireless communication. The transmitter operates from a 1 V supply with a current consumption of 15.03 mA. The output power is -3 dBm, and the maximum data rate is 12.8 Mbps.
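The defining property of O-QPSK can be illustrated with a short baseband sketch: bits are split alternately onto the I and Q rails, and Q is delayed by half a symbol, so the two rails never switch at the same instant and the carrier phase never jumps by 180 degrees. The rectangular pulses, the bit-to-level mapping, and the names `oqpsk_baseband` and `simultaneous_transitions` are assumptions for illustration; the abstract does not describe the transmitter's internal architecture.

```python
def oqpsk_baseband(bits, samples_per_bit=2):
    """Return (i_wave, q_wave) for O-QPSK with rectangular pulses.

    Even-indexed bits drive I, odd-indexed bits drive Q (1 -> +1, 0 -> -1).
    Each symbol spans two bit periods; Q is delayed by one bit period (half a symbol).
    """
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    sym = 2 * samples_per_bit  # samples per symbol
    half = samples_per_bit     # half-symbol offset applied to Q
    i_levels = [1 if b else -1 for b in bits[0::2]]
    q_levels = [1 if b else -1 for b in bits[1::2]]
    # I idles (0) for the final half symbol; Q idles for the first half symbol.
    i_wave = [lvl for lvl in i_levels for _ in range(sym)] + [0] * half
    q_wave = [0] * half + [lvl for lvl in q_levels for _ in range(sym)]
    return i_wave, q_wave


def simultaneous_transitions(i_wave, q_wave):
    """Count sample instants where I and Q both change level."""
    return sum(
        1
        for k in range(1, len(i_wave))
        if i_wave[k] != i_wave[k - 1] and q_wave[k] != q_wave[k - 1]
    )
```

Plain QPSK would switch both rails at every symbol boundary; with the half-symbol offset, `simultaneous_transitions` is zero for any bit pattern.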

R4-11
Title: A 0.5V PWM-Driven Analog Differential Amplifier Using Subthreshold Leakage Current
Author: *Tomochika Harada, Ryuuya Otaki (Yamagata University, Japan)
Page: pp. 484 - 487
Keyword: PWM, subthreshold, amplifier, mixed circuit
Abstract: In this paper, we design and fabricate a PWM-driven analog differential amplifier that uses only sub-µA subthreshold current, aiming at ultra-low-power analog/digital LSI systems operating from a low-voltage power supply. In this circuit, the two analog inputs are converted into PWM signals, and the differential operation is performed by digital processing. The circuit has almost the same performance as the ultra-low-power analog operational amplifier that we designed. It is designed and fabricated in a triple-well 65 nm CMOS process. Measurement results confirm the circuit operation and a power consumption of 1.06 µW at 55 kHz.

R4-12s
Title: 16PE 3D-Mesh NOC Based 3D Multicore Design and Implementation
Author: Mohamad Hairol Jabbar (ENSTA ParisTech, France), Dominique Houzet (GIPSA-LAB, France), *Omar Hammami (ENSTA ParisTech, France)
Page: pp. 488 - 489
Keyword: 3D, multicore, mesh, noc, tezzaron
Abstract: In this paper, we describe the design flow, architecture, and implementation of our 3D multiprocessor with a NoC. The design, based on 16 processors communicating over a 4x2x2 mesh NoC spread across two tiers, is discussed in detail and will be fabricated using Tezzaron technology with the GlobalFoundries 130 nm standard cell library. The purpose of this work is to accurately measure NoC performance in a real 3D chip running mobile multimedia applications, in order to evaluate the impact of a 3D architecture compared with a 2D one.

R4-13s
Title: A Performance Improvement for Floating-Point Arithmetic Unit with Precision Degradation Detection
Author: *Soseki Aniya, Toshiaki Kitamura (Graduate School of Information Sciences, Hiroshima City University, Japan)
Page: pp. 490 - 491
Keyword: performance improvement, precision degradation detection, vector processor
Abstract: In scientific computation, errors in floating-point calculations caused by rounding, overflow, underflow, loss of significant digits, or loss of trailing digits are very important. In prior work, we designed a vector co-processor whose floating-point arithmetic units detect loss of significant digits and precision degradation. In this paper, we propose a partitioned vector co-processor design that improves the data transfer throughput between the vector co-processor and SSRAM. Compared with the prior work, the vector load instruction executes in half the number of cycles in RTL simulation.
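One software analogue of detecting loss of significant digits is to compare the binary exponent of a subtraction result with that of the larger operand: each exponent step lost corresponds to one leading bit cancelled. The sketch below uses Python's `math.frexp` (x = m · 2^e with 0.5 ≤ |m| < 1). The hardware detector in the co-processor is not described in the abstract, and the 20-bit warning threshold and function names are hypothetical.

```python
import math


def lost_significant_bits(a, b):
    """Estimate how many leading bits cancel when computing a - b in double precision."""
    diff = a - b
    if diff == 0.0:
        return 53 if a != 0.0 else 0  # complete cancellation of a 53-bit significand
    _, e_max = math.frexp(max(abs(a), abs(b)))
    _, e_res = math.frexp(diff)
    return max(0, e_max - e_res)  # exponent drop = number of cancelled leading bits


def check_subtraction(a, b, threshold=20):
    """Return (a - b, flag); flag marks a suspicious loss of significance."""
    lost = lost_significant_bits(a, b)
    return a - b, lost >= threshold
```

For example, `1.0000001 - 1.0` cancels 24 leading bits and is flagged, while `3.0 - 1.0` loses none.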

R4-14
Title: Hardware Architecture for Real-Time Operation of Learning-Based Super-Resolution Using Binary Search Tree
Author: *Takahiro Kitayama, Kohei Michibata, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page: pp. 492 - 496
Keyword: Learning-Based Super-Resolution, hardware architecture, stream data-processing system, real-time operation, pipeline
Abstract: In this paper, we propose a hardware architecture for real-time operation of learning-based super-resolution using a binary search tree. In the proposed architecture, a stream data-processing system is applied throughout the circuit, and burst transmission is used between modules to improve the transfer rate. Moreover, the Search Dictionary module, which had been a bottleneck, is pipelined to improve throughput. Experimental results show that our architecture processes a 1,024 × 1,024-pixel image about 83 times faster than software.

R4-15
Title: Architecture Optimization of Group Signature Circuits for Cloud Computing Environment
Author: *Sumio Morioka, Jun Furukawa, Yuichi Nakamura, Kazue Sako (NEC Corporation, Japan)
Page: pp. 497 - 502
Keyword: cloud security, digital signature, server accelerator, IP core design, HLS
Abstract: Group signature is one of the main themes in recent digital signature research. The signature algorithm combines more than 30 elliptic curve (ECC), modular (RSA), long-bit integer (INT), and hash arithmetic operations. In a cloud computing environment, where many client devices (mobile devices, embedded systems, sensor devices, etc.) are connected over a network to servers in data centers, low-power and fast hardware accelerators are strongly desired. In this paper, we propose a hardware macro-architecture for data center servers and compare it with the architecture for client devices. Although these architectures are completely different, the same hardware design methodology applies to both, with the architectures explored automatically by a custom-made HLS (High-Level Synthesis) tool.

R4-16
Title: Efficient Packet Transmission Priority Control Method for Network-on-Chip
Author: *Yusuke Sekihara, Takashi Aoki, Akira Onozawa (NTT Microsystem Integration Laboratories, Japan)
Page: pp. 503 - 507
Keyword: NoC, performance, priority, transmit, flit
Abstract: To meet the ever-increasing need for high-performance computing, single-processor performance has been improved almost to its limit, and parallelization has thus become inevitable. NoC architectures based on packet switching are becoming popular for large-scale parallelism. In this paper, we propose a new packet transmission control method for NoC architectures that improves buffer efficiency. Simulation results show that the proposed method improves average latency by about 10-20% under congestion.

R4-17s
Title: Direct Memory Access Transfer Method with Chaining for Inter-Chip Network
Author: *Eiichi Sasaki, Daisuke Sasaki, Ikan Wang, Yusuke Koizumi, Hideharu Amano (Keio University, Japan)
Page: pp. 508 - 509
Keyword: NoC, multi-core
Abstract: The wireless 3D-NoC architecture is highly flexible, but how processing nodes communicate with each other is an important issue. We propose a DMA transfer mechanism using packet requests for an inter-chip network router. In our evaluation, direct data transfer using the chaining mechanism improved communication latency by a factor of 7.7.

R4-18
Title: Efficient Barrier Synchronization for 2D Meshed NoC-based Many-core Processors
Author: *Lovic Gauthier, Farhad Mehdipour, Koji Inoue, Shinya Ueno, Hiroshi Sasaki (Kyushu University, Japan)
Page: pp. 510 - 515
Keyword: Barrier, Synchronization, NoC, Many-core, Multi-thread
Abstract: Network-on-Chip (NoC) based many-cores are becoming popular because of their high scalability compared with traditional bus-based architectures. However, they still lack software tailored to their specific characteristics. In this paper, we propose several techniques for tailoring and combining barrier synchronizations to take advantage of 2D-meshed NoCs. Experimental results show that our combined barriers often halve the delay of state-of-the-art barriers.

R4-19
Title: Effective Distributed Parallel Scheduling Methodology for Mobile Cloud Computing
Author: *Hiromasa Yamauchi, Koji Kurihara, Toshiya Otomo (Fujitsu Laboratories Ltd., Japan), Yuta Teranishi (Fujitsu Kyushu Network Technologies Ltd., Japan), Takahisa Suzuki, Koichiro Yamashita (Fujitsu Laboratories Ltd., Japan)
Page: pp. 516 - 521
Keyword: Mobile phone, Parallel processing, Cloud computing, Scheduling, Sensor network
Abstract: Consider a category of devices such as mobile phones and sensor devices. If each device is regarded as a node, these devices together can be regarded as a distributed parallel processing system, which we define as “Mobile Cloud computing (MC)”. Collaborative processing among mobile phones and computation on sensor devices are practical uses of MC. MC differs from traditional parallel processing among servers, mainframes, or HPC systems in that battery power and mobile network quality fluctuate dynamically. We propose a distributed parallel scheduling methodology for MC and have developed a simulator to analyze these characteristics and the bottlenecks of MC.

R4-20
Title: Extending Intent in Android for Distributed Collaboration Framework
Author: *Takahiro Ito, Takuya Azumi, Nobuhiko Nishio (Ritsumeikan University, Japan)
Page: pp. 522 - 527
Keyword: Android, Embedded System
Abstract: Android is widely used on mobile devices. An approach to controlling embedded devices from Android has been proposed, as have frameworks for collaboration among embedded devices. However, these proposals lack flexibility. In this paper, we propose a flexible framework that uses "Intent" to control embedded devices from Android. Our framework enables Android to control embedded devices built not only for our framework but also for existing frameworks.

R4-21
Title: Energy Efficient Instruction-set Extension Considering Inline Expansion
Author: *Sho Ninomiya, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Page: pp. 528 - 533
Keyword: Instruction-set Extension, Inline Expansion, Energy-efficient, ASIP, Embedded Systems
Abstract: To reduce the energy consumption of applications in embedded systems, an ASIP needs an instruction-set extension suited to the application. Inline expansion, a common software optimization, is not considered in conventional instruction-set extension methods. In this paper, we propose an energy-efficient instruction-set extension method that considers inline expansion. Experiments show that the proposed method reduces energy consumption further.

R4-22
Title: Reduction of Glitches for Low-Power Multipliers Using 4-2 Compressors Based on Hybrid-CMOS Logic Style
Author: *Yang-uk Son, Yuzuru Shizuku, Takeshi Kogure, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page: pp. 534 - 538
Keyword: low-power, multiplier, glitch, 4-2 compressor, 4-2 tree architecture
Abstract: In this paper, we propose a technique that reduces glitches, and thereby power consumption, in multipliers. Conventional approaches that use flip-flops for synchronization increase area and power. Our 4-2 compressor, based on the hybrid-CMOS logic style, reduces glitches without additional circuits by using transmission gates and pass transistors, which act like resistors when cascaded. In addition, CMOS inverters limit speed degradation. Simulation results show that the proposed technique reduces glitch activity by 1/12.
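The logic function every 4-2 compressor implements can be pinned down with a short behavioral sketch. It shows the conventional gate-level function only (two cascaded full adders written in multiplexer form), not the hybrid-CMOS transistor implementation that the paper uses to suppress glitches; the function name is illustrative.

```python
def compressor_4_2(x1, x2, x3, x4, cin):
    """Behavior of a conventional 4-2 compressor on 0/1 inputs.

    Equivalent to two cascaded full adders:
      cout  = majority(x1, x2, x3)          (independent of cin)
      carry = majority(x1^x2^x3, x4, cin)
    so that x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    p = x1 ^ x2
    s1 = p ^ x3                   # sum output of the first full adder
    cout = x3 if p else x1        # majority(x1, x2, x3) in mux form
    t = s1 ^ x4
    sum_ = t ^ cin
    carry = cin if t else x4      # majority(s1, x4, cin) in mux form
    return sum_, carry, cout
```

Because `cout` does not depend on `cin`, a row of compressors in a multiplier tree has no horizontal carry ripple, which is what makes 4-2 trees attractive; glitches still arise from unequal path delays, which is the problem the abstract addresses.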

R4-23
Title: Affine Transformations of Logic Functions and Their Application to Affine Decompositions of Index Generation Functions
Author: *Tsutomu Sasao, Masao Maeta (Kyushu Institute of Technology, Japan), Radomir Stankovic (University of Nis, Serbia), Stanislav Stankovic (Tampere University of Technology, Finland)
Page: pp. 539 - 543
Keyword: linear transform, Incompletely specified function, functional decomposition, Boolean matching
Abstract: Affine transformations are used to find optimal affine decompositions of incompletely specified index generation functions. This paper shows that the number of equivalence classes to consider is equal to the number of affine equivalence classes of logic functions. Exact minimum solutions with up to five variables are obtained.
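An affine transformation of a logic function replaces its input vector x by A x ⊕ b over GF(2), where A is a nonsingular 0/1 matrix and b a constant vector. The sketch applies one such transformation to a truth table, storing row i of A as an integer bit mask; all names are illustrative. It works unchanged for multi-valued outputs such as an index generation function, but it does not enumerate affine equivalence classes or search for optimal decompositions as the paper does.

```python
def parity(v):
    return bin(v).count("1") & 1


def gf2_rank(rows, n):
    """Rank over GF(2) of a matrix whose rows are n-bit integer masks."""
    pivots = {}  # leading bit position -> reduced row
    for r in rows:
        for bit in reversed(range(n)):
            if not (r >> bit) & 1:
                continue
            if bit in pivots:
                r ^= pivots[bit]  # eliminate this leading bit and keep reducing
            else:
                pivots[bit] = r
                break
    return len(pivots)


def apply_affine(x, rows, b):
    """y = A x XOR b over GF(2); bit i of y is the parity of rows[i] AND x."""
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & x) << i
    return y ^ b


def affine_transform(truth_table, rows, b=0):
    """Truth table of g(x) = f(A x XOR b); A must be nonsingular."""
    n = len(rows)
    if len(truth_table) != 1 << n:
        raise ValueError("truth table size must be 2**n")
    if gf2_rank(rows, n) != n:
        raise ValueError("A must be nonsingular over GF(2)")
    return [truth_table[apply_affine(x, rows, b)] for x in range(1 << n)]
```

For example, with f = x0 AND x1 (truth table `[0, 0, 0, 1]`), the rows `[0b11, 0b10]` give `[0, 0, 1, 0]`. Since x ↦ A x ⊕ b is a bijection, the transformation only permutes truth-table entries, so the number of inputs mapped to each output value is preserved.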

R4-24
Title: An Error Diagnosis Technique Based on SAT Solver
Author: *Tomoki Matsuyama, Hiroto Senzaki, Kosuke Watanabe, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page: pp. 544 - 548
Keyword: ECO, Error Diagnosis, SAT solver
Abstract: This paper presents an error diagnosis technique based on a SAT solver, which, compared with Binary Decision Diagrams (BDDs), has the advantages of lower memory consumption and a larger number of variables that can be processed. The SAT solver is used to generate input patterns for error diagnosis and to verify the solutions found by the proposed technique. By using the SAT solver, the proposed technique can rectify circuits too large to be represented by BDDs. Experimental results show that our technique rectifies a circuit of 21,061 gates.

R4-25
Title: Performance Evaluation of Various Configuration of Adder in Variable Latency Circuits with Error Detection/Correction Mechanism
Author: *Kenta Ando, Atsushi Takahashi (Osaka University, Japan)
Page: pp. 549 - 554
Keyword: error detection/correction circuits, maximum delay time, minimum delay time, distribution of delay, effective clock period
Abstract: The performance of a circuit can be improved by introducing an error detection/correction mechanism that effectively exploits the variation of delays between flip-flops. The performance of an error detection/correction circuit depends on the minimum delay, maximum delay, and delay distribution of the circuit. In general, performance is better when the minimum delay is larger and/or the probability of a large delay is lower. However, in the conventional framework, circuits are usually designed to reduce the maximum delay as much as possible, so they are not necessarily suited to the error detection/correction framework. In this paper, toward a circuit synthesis method for the error detection/correction framework, we design and evaluate various ripple-carry adders (RCAs) in which the minimum delay is increased by delay insertion and/or the probability of a large delay is reduced by changing the configuration of circuit components. Experiments confirm that the resulting circuits achieve better performance in the error detection/correction framework.
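The role of the delay distribution can be illustrated with a unit-delay model of a ripple-carry adder, in which the delay for a given pair of operands is approximated by its longest carry-propagation run. The sketch computes that distribution exhaustively for small n, assuming uniformly random operands, together with the fraction of operand pairs whose run exceeds a budget of k stages, that is, how often a scheme clocked for k stages would have to detect and correct an error. The function names and the unit-delay model are illustrative assumptions, not the authors' gate-level evaluation.

```python
from collections import Counter


def carry_chain_length(a, b, n):
    """Longest carry-propagation run when adding n-bit a and b with carry-in 0.

    A run starts at a generate bit (a_i AND b_i) and grows by one for each
    following propagate bit (a_i XOR b_i); a kill bit (both 0) ends it.
    """
    longest = run = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:
            run = 1          # carry generated at this position
        elif (ai ^ bi) and run:
            run += 1         # incoming carry propagates one more stage
        else:
            run = 0          # carry killed, or no carry in flight
        longest = max(longest, run)
    return longest


def chain_distribution(n):
    """Histogram of longest carry runs over all 4**n operand pairs."""
    return Counter(
        carry_chain_length(a, b, n) for a in range(1 << n) for b in range(1 << n)
    )


def timing_error_rate(n, k):
    """Fraction of operand pairs whose carry run exceeds k stages."""
    dist = chain_distribution(n)
    return sum(cnt for length, cnt in dist.items() if length > k) / float(4 ** n)
```

For a 4-bit adder, only 8 of the 256 operand pairs exercise the full 4-stage chain. Under the simplifying assumption that each detected error costs one extra cycle, the effective clock period is roughly T(1 + p), where p is the error rate for the chosen budget, so shifting probability mass away from long delays pays off directly.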

R4-26
Title: A Delay Control Technique for Extremely Low-Voltage Subthreshold CMOS Digital Circuits
Author: *Seiichiro Shiga, Tetsuya Hirose, Yuji Osaki, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page: pp. 555 - 559
Keyword: CMOS, subthreshold, on-chip, compensation circuit, PVT variation
Abstract: In this paper, we propose a fully on-chip delay control technique for extremely low-voltage (ELV) subthreshold CMOS digital circuits. Because the performance of ELV subthreshold CMOS digital circuits degrades with process, supply voltage, and temperature (PVT) variations, we developed a delay control circuit consisting of voltage and current reference circuits, a delay monitoring circuit, a current comparator, and a frequency-current converter. The operation of the circuit was confirmed by SPICE simulations with a set of 0.18-µm standard CMOS parameters. The results demonstrate that process and temperature variations can be compensated by 59% and 95%, respectively.