Title | Design Automation for Digital Microfluidic Biochips: From Fluidic-Level Toward Chip-Level |
Author | Tsung-Wei Huang, *Tsung-Yi Ho (National Cheng Kung University, Taiwan) |
Page | pp. 439 - 444 |
Keyword | Chip-Level, Digital Microfluidic Biochips, Optimization, Physical Design, Synthesis |
Abstract | Advances in droplet-based digital microfluidic biochips (DMFBs) have led to the emergence of biochips for automating laboratory procedures in biochemistry and molecular biology. These devices enable the precise control of microliter or nanoliter volumes of biochemical samples and reagents. They combine electronics with biology, and integrate various bioassay operations, such as sample preparation, analysis, separation, and detection.
To meet the challenges of increasing design complexity, computer-aided-design (CAD) tools have been introduced to build DMFBs efficiently. This paper provides an overview of DMFBs and describes emerging CAD tools for the automated synthesis and optimization of DMFB designs, from fluidic-level synthesis to chip-level design. Design automation is expected to relieve the design burden of manual optimization of bioassays, time-consuming chip designs, and costly testing and maintenance procedures. With the assistance of CAD tools, users can concentrate on the development and abstraction of nanoscale bioassays while leaving chip optimization and implementation details to CAD tools. |
Title | Timing-Aware Clock Gating Algorithm for Pulse-Latch Circuits |
Author | *Zong-Han Yang, Tsung-Yi Ho (National Cheng Kung University, Taiwan) |
Page | pp. 445 - 450 |
Keyword | Clock Gating, Pulse Latch, Timing |
Abstract | Low power design is a crucial issue in modern circuit design.
Recently, several techniques have been proposed to reduce power
consumption. One of them is pulse-latch technology, which
replaces flip-flops with pulse latches because of their smaller
capacitance. To further reduce power consumption of
pulse-latch-based circuits, the clock gating of pulse-latch, which
is called pulser gating, has been proposed recently. However,
pulser gating may cause the violation of setup time constraint and
thus the data cannot be stored to the registers correctly, causing
a fatal error in the design. In this paper, we propose an
algorithm to deal with the problem of pulser gating and setup time
constraint simultaneously. We use a line-search algorithm to
capture the problem of setup time constraint and apply the
minimum-cost maximum-flow technique to determine the clock tree
topology of pulse-latch-based circuits. Experimental results show
that our algorithm can reduce power consumption effectively by
58.35% on average compared to the binary merge algorithm. |
Title | Resistivity-based Modeling of Substrate Non-uniformity for Resistance Extraction of Low-Resistivity Substrate |
Author | *Yasuhiro Ogasahara, Toshiki Kanamoto (Renesas Electronics Corp., Japan), Hisato Inaba, Toshiharu Chiba (Renesas Design Corp., Japan) |
Page | pp. 451 - 456 |
Keyword | substrate noise, substrate extraction, low-resistivity substrate, doping profile |
Abstract | This paper discusses modeling of non-uniform substrate resistivity for substrate resistance extraction. Though the resistivity of each substrate layer is frequently assumed to be uniform, the doping profile of each substrate layer is not uniform. We present the extraction error of substrate resistance under the uniform resistivity assumption. A resistivity model that enables accurate resistance extraction of substrates with non-uniform profiles is proposed. We also demonstrate characterization of the proposed model using substrate resistances that are easily obtained from fabricated chips. |
Title | Temperature-Constrained Fixed-Outline Floorplanning for 3D ICs |
Author | Ciao-Yu Hong, Wai-Kei Mak, *Ting-Chi Wang (Department of Computer Science National Tsing Hua University, Taiwan) |
Page | pp. 457 - 459 |
Keyword | Temperature, Fixed-Outline, Floorplanning, 3D-IC |
Abstract | Three-dimensional (3D) ICs are produced by stacking multiple dies and delivering inter-die signals with Through-Silicon Vias (TSVs). Typically, TSVs which deliver signals among dies are called signal TSVs, while those enhancing heat dissipation are called thermal TSVs. In this paper we present a temperature-constrained fixed-outline 3D-IC floorplanner which also simultaneously places signal and thermal TSVs to benefit wirelength and temperature reduction. Encouraging experimental results are shown to demonstrate the effectiveness and efficiency of our floorplanner. |
Title | A GPGPU Implementation of Parallel Backward Euler Algorithm for Power Grid Circuit Simulation |
Author | Lei Lin, *Hayato Shiono, Makoto Yokota, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 460 - 465 |
Keyword | power grid, simulator, GPGPU, Backward Euler |
Abstract | With the increase in VLSI scale, the time required for power grid simulation has been increasing. This paper describes a fast and accurate parallel transient simulator for power grids, which is implemented on a GPU (Graphics Processing Unit) using CUDA. The simulator achieves accurate simulation by using the Backward Euler method. Experimental results show that the proposed simulator is 86.2 times faster than CPU software. |
Title | A Third Order Delta-Sigma Modulator with Shared Opamp Technique for Wireless Applications |
Author | *Ghazal Fahmy, Daisuke Kanemoto, Haruichi Kanaya, Ramesh Pokharel, Keiji Yoshida (Kyushu University, Japan) |
Page | pp. 466 - 467 |
Keyword | delta-sigma modulator, ADC, shared-opamp |
Abstract | This paper describes the design of a third-order delta-sigma modulator (DSM) that exploits a shared-opamp technique to reduce the number of opamps required; consequently, both the total power consumption and the required area of the modulator decrease. The architecture relaxes the comparator speed requirement, which is appropriate for wireless applications. The first and second stages share one opamp in the integration and sampling phases. The proposed circuit has been designed in TSMC 0.18 um CMOS technology. A 2 MHz bandwidth and a 50 dB peak signal-to-quantization-noise ratio (SQNR), which are suitable for WCDMA, have been achieved. It consumes 2.4 mW from a 1.2 V power supply, and its area is 0.3 mm². |
Title | The Development of CAD System for Via Programmable Structured ASIC VPEX3 |
Author | *Ryohei Hori (Ritsumeikan University, Japan), Masaya Yoshikawa (Meijo University, Japan), Takeshi Fujino (Ritsumeikan University, Japan) |
Page | pp. 470 - 475 |
Keyword | Structured ASIC, Via Programmable, Exclusive-or |
Abstract | Various kinds of structured ASICs (SAs), which can be customized with only a few masks, drastically reduce photomask cost.
We have been developing a novel VPSA architecture, "VPEX (Via Programmable logic device using EXclusive-or array)".
It is necessary to develop a CAD system for VPEX, because there are no general tools supporting placement and routing for VPSA.
In this paper, we describe the dedicated CAD system and study the area penalty of VPEX compared with ASIC. |
Title | A 0.5V PWM-Driven Analog Differential Amplifier Using Subthreshold Leakage Current |
Author | *Tomochika Harada, Ryuuya Otaki (Yamagata University, Japan) |
Page | pp. 484 - 487 |
Keyword | PWM, subthreshold, amplifier, mixed circuit |
Abstract | In this paper, we design and fabricate a PWM-driven analog differential amplifier that uses only sub-uA-order subthreshold current, with the aim of realizing an ultra-low-power analog/digital LSI system operating from a low-output power supply. In this circuit, two analog inputs are converted to PWM signals, and the differential operation is then performed on them by digital processing. This circuit has almost the same performance as the ultra-low-power analog operational amplifier we designed. It is designed and fabricated in a 65 nm CMOS process with a triple-well structure. From the measurement results, we confirm the circuit operation and the power consumption, which is 1.06 uW at 55 kHz. |
Title | 16PE 3D-Mesh NOC Based 3D Multicore Design and Implementation |
Author | Mohamad Hairol Jabbar (ENSTA ParisTech, France), Dominique Houzet (GIPSA-LAB, France), *Omar Hammami (ENSTA ParisTech, France) |
Page | pp. 488 - 489 |
Keyword | 3D, multicore, mesh, noc, tezzaron |
Abstract | In this paper, we describe the design flow, architecture, and implementation of our 3D multiprocessor with NoC. The design, based on 16 processors communicating through a 4x2x2 mesh NoC spread over two tiers, is discussed in detail and will be fabricated using Tezzaron technology with a 130 nm GlobalFoundries standard library. The purpose of this work is to accurately measure NoC performance in a real 3D chip when running mobile multimedia applications, in order to evaluate the impact of the 3D architecture compared to 2D. |
Title | A Performance Improvement for Floating-Point Arithmetic Unit with Precision Degradation Detection |
Author | *Soseki Aniya, Toshiaki Kitamura (Graduate School of Information Sciences, Hiroshima City University, Japan) |
Page | pp. 490 - 491 |
Keyword | performance improvement, precision degradation detection, vector processor |
Abstract | In scientific computation, errors observed in floating-point calculations, caused by rounding, overflow, underflow, loss of significant digits, or loss of trailing digits, are very important. In prior work, we designed a vector co-processor that has floating-point arithmetic units with detection of loss of significant digits and precision degradation. We propose a partitioned vector co-processor design. The design can improve the data transfer throughput between the vector co-processor and SSRAM. Compared to the prior work, the vector load instruction executes twice as fast in terms of execution cycles in the RTL simulation. |
Title | Hardware Architecture for Real-Time Operation of Learning-Based Super-Resolution Using Binary Search Tree |
Author | *Takahiro Kitayama, Kohei Michibata, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 492 - 496 |
Keyword | Learning-Based Super-Resolution, hardware architecture, stream data-processing system, real-time operation, pipeline |
Abstract | In this paper, we propose a hardware architecture for real-time operation of Learning-Based Super-Resolution using a binary search tree. In the proposed architecture, a stream data-processing system is applied to the whole circuit, and burst transmission is applied between modules to improve the transfer rate. Moreover, the Search Dictionary module, which has been a bottleneck, is pipelined to improve the throughput. Experimental results have shown that the processing speed with our architecture is about 83 times faster than that of software processing for a picture of 1,024 × 1,024 pixels. |
Title | Architecture Optimization of Group Signature Circuits for Cloud Computing Environment |
Author | *Sumio Morioka, Jun Furukawa, Yuichi Nakamura, Kazue Sako (NEC Corporation, Japan) |
Page | pp. 497 - 502 |
Keyword | cloud security, digital signature, server accelerator, IP core design, HLS |
Abstract | Group signature is one of the main themes in recent digital signature studies. The signature algorithm is a combination of more than 30 elliptic curve (ECC), modular (RSA), long-bit integer (INT), and hash arithmetic operations. In a cloud computing environment, where many client devices (mobile devices, embedded systems, sensor devices, etc.) are connected to servers in a data center via a network, low-power and fast H/W accelerators are strongly desired. In this paper, we propose an H/W macro-architecture for servers in a data center and compare it with the architecture for client devices. While these architectures are completely different, we can use the same H/W design methodology, in which the architectures are explored automatically by a custom-made HLS (High Level Synthesis) tool. |
Title | Efficient Packet Transmission Priority Control Method for Network-on-Chip |
Author | *Yusuke Sekihara, Takashi Aoki, Akira Onozawa (NTT Microsystem Integration Laboratories, Japan) |
Page | pp. 503 - 507 |
Keyword | NoC, performance, priority, transmit, flit |
Abstract | To meet the ever-increasing need for high-performance computing, the performance of a single processor has been improved almost to its limit, and parallelization has thus become inevitable. NoC architecture based on packet switching is becoming popular for large-scale parallelism. In this paper, we propose a new packet transmission control method for the NoC architecture that can improve the efficiency of the buffers. The simulation results show that the proposed method can improve average latency by about 10-20% under congestion. |
Title | Efficient Barrier Synchronization for 2D Meshed NoC-based Many-core Processors |
Author | *Lovic Gauthier, Farhad Mehdipour, Koji Inoue, Shinya Ueno, Hiroshi Sasaki (Kyushu University, Japan) |
Page | pp. 510 - 515 |
Keyword | Barrier, Synchronization, NoC, Many-core, Multi-thread |
Abstract | Network-on-Chip (NoC) based many-cores are becoming
popular due to their high scalability compared to traditional
bus-based architectures. However, they still lack software tailored to
their specificities. In this paper we propose several techniques for
tailoring and combining barrier synchronizations in order to take advantage of 2D-meshed NoCs. Experimental results show that our combined barriers often achieve delays half as long as those of state-of-the-art barriers. |
Title | Effective Distributed Parallel Scheduling Methodology for Mobile Cloud Computing |
Author | *Hiromasa Yamauchi, Koji Kurihara, Toshiya Otomo (Fujitsu Laboratories Ltd., Japan), Yuta Teranishi (Fujitsu Kyushu Network Technologies Ltd., Japan), Takahisa Suzuki, Koichiro Yamashita (Fujitsu Laboratories Ltd., Japan) |
Page | pp. 516 - 521 |
Keyword | Mobile phone, Parallel processing, Cloud computing, Scheduling, Sensor network |
Abstract | Consider a category of devices such as mobile phones and sensor devices. If each device is regarded as a node, these devices can be regarded as a distributed parallel processing system, which we define as “Mobile Cloud computing (MC)”. Collaborative processing among mobile phones and computation by sensor devices are practical uses of MC. MC differs from traditional parallel processing among servers, mainframes, or HPC systems in its dynamic fluctuation of battery power and mobile network quality. We propose a distributed parallel scheduling methodology for MC and have developed a simulator to analyze these characteristics and the bottlenecks of MC. |
Title | Energy Efficient Instruction-set Extension Considering Inline Expansion |
Author | *Sho Ninomiya, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan) |
Page | pp. 528 - 533 |
Keyword | Instruction-set Extension, Inline Expansion, Energy-efficient, ASIP, Embedded Systems |
Abstract | To reduce the energy consumption of applications in embedded systems, an instruction-set extension suited to the application is necessary on an ASIP.
Inline expansion, one of the software optimizations, is not considered in conventional instruction-set extension methods.
In this paper, we propose an energy-efficient instruction-set extension method that considers inline expansion.
The experiment shows that the proposed method achieves a larger reduction in energy consumption. |
Title | Reduction of Glitches for Low-Power Multipliers Using 4-2 Compressors Based on Hybrid-CMOS Logic Style |
Author | *Yang-uk Son, Yuzuru Shizuku, Takeshi Kogure, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 534 - 538 |
Keyword | low-power, multiplier, glitch, 4-2 compressor, 4-2 tree architecture |
Abstract | In this paper, we propose a technique that reduces glitches in order to lower power consumption in multipliers. Conventional approaches using flip-flops for synchronization increase area and power. Our 4-2 compressor based on hybrid-CMOS logic style reduces glitches without additional circuits by using transmission gates and pass transistors, which act like resistors when cascaded. In addition, CMOS inverters reduce speed deterioration. Simulation results have shown that the proposed technique reduces glitch activity by 1/12. |
Title | Affine Transformations of Logic Functions and Their Application to Affine Decompositions of Index Generation Functions |
Author | *Tsutomu Sasao, Masao Maeta (Kyushu Institute of Technology, Japan), Radomir Stankovic (University of Nis, Serbia), Stanislav Stankovic (Tampere University of Technology, Finland) |
Page | pp. 539 - 543 |
Keyword | linear transform, Incompletely specified function, functional decomposition, Boolean matching |
Abstract | Affine transformations are used to find optimal affine decompositions of incompletely specified index generation functions. This paper shows that the number of equivalence classes to consider is equal to the number of affine equivalence classes of logic functions. Exact minimum solutions with up to five variables are obtained. |
Title | An Error Diagnosis Technique Based on SAT Solver |
Author | *Tomoki Matsuyama, Hiroto Senzaki, Kosuke Watanabe, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 544 - 548 |
Keyword | ECO, Error Diagnosis, SAT solver |
Abstract | This paper presents an error diagnosis technique based on a SAT solver, which has the advantages of lower memory consumption and a larger number of variables that can be processed in comparison with Binary Decision Diagrams (BDDs). The SAT solver is used for generating input patterns for error diagnosis and for verifying a solution found by the proposed technique. By using the SAT solver, the proposed technique can rectify circuits so large that they cannot be represented by BDDs. Experimental results have shown that our technique rectifies a circuit with 21,061 gates. |
Title | Performance Evaluation of Various Configuration of Adder in Variable Latency Circuits with Error Detection/Correction Mechanism |
Author | *Kenta Ando, Atsushi Takahashi (Osaka University, Japan) |
Page | pp. 549 - 554 |
Keyword | error detection/correction circuits, maximum delay time, minimum delay time, distribution of delay, effective clock period |
Abstract | The performance of a circuit is improved by introducing an error detection/correction mechanism that effectively exploits the variation of delays between flip-flops.
The performance of an error detection/correction circuit depends on the minimum delay, maximum delay, and delay distribution of the circuit.
In general, the performance is better when the minimum delay is larger and/or the probability of a large delay is lower.
However, circuits are usually designed so that the maximum delay is reduced as much as possible to maximize the performance in the conventional framework and are not necessarily fitted to error detection/correction framework.
In this paper, in order to develop a circuit synthesis method for error detection/correction framework, various ripple-carry-adders (RCA) in which the minimum delay is increased by delay insertion and/or the probability of large delay is reduced by changing the configuration of the circuit components are designed and evaluated.
In experiments, we confirm that the obtained circuit achieves better performance in the error detection/correction framework. |
Title | A Delay Control Technique for Extremely Low-Voltage Subthreshold CMOS Digital Circuits |
Author | *Seiichiro Shiga, Tetsuya Hirose, Yuji Osaki, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 555 - 559 |
Keyword | CMOS, subthreshold, on-chip, compensation circuit, PVT variation |
Abstract | In this paper, we propose a fully on-chip delay control technique for extremely low-voltage (ELV) subthreshold CMOS digital circuits. Because the performance of ELV subthreshold CMOS digital circuits degrades with process, supply voltage, and temperature (PVT) variations, we developed a delay control circuit consisting of voltage and current reference circuits, a delay monitoring circuit, a current comparator, and a frequency-current converter. The operation of the circuit was confirmed by SPICE simulations with a set of 0.18-um standard CMOS parameters. The results demonstrated that process and temperature variations can be compensated by 59% and 95%, respectively. |