Monday, October 21, 2019 |
Title | (Keynote Speech) Microfluidics Meets Microbiology: The Journey of Digital Microfluidic Biochips from Laboratory Research to Commercialization and Beyond |
Author | *Krishnendu Chakrabarty (Duke University, USA) |
Page | p. 1 |
Keyword | Microfluidics |
Abstract | Digital microfluidics was transitioned to the marketplace for sample preparation by Illumina a few years ago. Since then, this technology has also been deployed by Genmark for infectious disease testing and Baebies for the detection of lysosomal enzymes in newborns. This lecture will describe the journey from early laboratory research, PhD theses and publication of research articles, to technology transfer and licensing to companies. Despite these success stories, there still remains a significant gap between microfluidics research and its adoption in microbiology. The presenter will describe how this gap can potentially be closed through new directions in digital microfluidics, including recent advances in micro-electrode-dot arrays, acoustofluidics, and countermeasures against malicious attacks on biomolecular protocols. |
Title | Energy-efficient ECG Signals Outlier Detection Hardware using a Sparse Robust Deep Autoencoder |
Author | *Naoto Soga, Shimpei Sato, Hiroki Nakahara (Tokyo Institute of Technology, Japan) |
Page | pp. 2 - 7 |
Keyword | Outlier Detection, autoencoder, a sparse network, FPGA, ECG |
Abstract | In recent years, portable electrocardiographs have begun to spread, enabling us to record electrocardiogram (ECG) signals in everyday life. A portable ECG analysis device is needed so that abnormal ECG waves can be detected anywhere. Machine learning techniques, including deep learning, are widely used to analyze ECG signals since they perform better than conventional methods. However, deep learning models often have too many parameters to implement on mobile hardware. In this research, we propose a method to implement an ECG outlier detector using deep learning techniques in a small built-in device. To detect outliers, an autoencoder, which is based on neural networks, was used. A sparseness technique was applied to the autoencoder, and the trained autoencoder was implemented on a low-end FPGA. Compared with an ARM Cortex-M3 embedded processor, the proposed hardware is 159 times more energy-efficient. |
Title | A Design Space Exploration Method of SoC Architecture for CNN-based AI Platform |
Author | *Salita Sombatsiri (Osaka University, NEC Corporation, Japan), Jaehoon Yu, Masanori Hashimoto (Osaka University, Japan), Yoshinori Takeuchi (Kindai University, Japan) |
Page | pp. 8 - 13 |
Keyword | Design space exploration, System-on-a-chip, CNN, multi-layer bus |
Abstract | This paper proposes a design space exploration (DSE) method for CNN-based AI platforms to find SoC architectures that optimally parallelize massive data computation and data transfer. First, the proposed DSE explores both functional blocks, which undertake a process execution, and their parameters, i.e., the number of instances and PEs, to parallelize the CNN's intensive intra-process computation with ease of system modeling and exploration. Second, a multi-layer bus architecture and configuration are optimized to parallelize data transfer by performing master-slave clustering with three-step channel mapping. Experimental results show that the proposed DSE with a pruning technique found 17 Pareto-optimal architectures from a design space of 2 million architectures within 11.5 hours, a 21% time reduction compared to exhaustive exploration. |
Title | Reconfigurable Activation Functions for Neural Networks Application |
Author | Yu-Jung Huang (I-Shou University, Taiwan), Meng-Jhe Li, *Wun-Siou Jhong, Shao-I Chu (National Kaohsiung University of Science and Technology, Taiwan) |
Page | pp. 14 - 17 |
Keyword | FPGA, activation function, neural networks |
Abstract | Field programmable gate arrays (FPGAs) have recently become popular for accelerating deep learning networks due to their parallel processing and reconfigurable capabilities as well as their energy efficiency. This paper presents a multi-layer neural network architecture with novel reconfigurable activation functions that uses the coordinate rotation digital computer (CORDIC) technique and the floating-point format (IEEE 754 standard in single precision). The functionality was successfully verified in hardware using a DE2-115 board that includes an Altera Cyclone® IV FPGA. |
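The CORDIC idea mentioned in this abstract can be illustrated with a minimal software sketch. It assumes hyperbolic CORDIC in rotation mode and uses ordinary Python floats; the function names `cordic_tanh` and `cordic_sigmoid` are hypothetical, and the abstract does not say how the paper maps CORDIC onto its activation functions or its single-precision datapath, so this is not the authors' design.

```python
import math

def cordic_tanh(theta, iterations=16):
    """Approximate tanh(theta) with hyperbolic CORDIC (rotation mode).

    Assumption: |theta| stays below roughly 1.11, the convergence range
    of the basic algorithm. The CORDIC gain cancels in the ratio y / x.
    """
    # Hyperbolic CORDIC repeats iterations 4 and 13 to guarantee convergence.
    schedule = []
    for i in range(1, iterations + 1):
        schedule.append(i)
        if i in (4, 13):
            schedule.append(i)

    x, y, z = 1.0, 0.0, theta
    for i in schedule:
        d = 1.0 if z >= 0 else -1.0      # rotate toward z == 0
        step = 2.0 ** -i                 # a right shift by i bits in hardware
        x, y = x + d * y * step, y + d * x * step
        z -= d * math.atanh(step)        # a small lookup table in hardware
    return y / x                         # sinh / cosh of the rotated angle

def cordic_sigmoid(t, iterations=16):
    """Sigmoid via the identity sigmoid(t) = (1 + tanh(t / 2)) / 2."""
    return 0.5 * (1.0 + cordic_tanh(0.5 * t, iterations))
```

Because only shifts, additions, and a small table of constants are needed per iteration, the same datapath can be reused for several activation functions, which is one plausible reading of "reconfigurable" here.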
Title | Minimization of Energy Consumption of Double Modular Redundancy Design of Conditional Processing by Common Condition Dependency |
Author | *Kazuhito Ito (Saitama University, Japan) |
Page | pp. 18 - 23 |
Keyword | Double modular redundancy, soft error, conditional processing, energy minimization |
Abstract | Double modular redundancy (DMR) executes an operation twice and detects soft errors by comparing the two results. An error is corrected by executing the necessary operations again. This work considers the DMR design for conditional processing. A method is proposed that makes the secondary executions of the duplicated operations dependent on the primary execution of the condition operation, thereby widening the schedule solution space and allowing better results to be derived. The minimization of energy consumption with the proposed method is formulated as ILP models, and the optimum solution is obtained by using an ILP solver. |
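The detect-and-re-execute behaviour of DMR described in this abstract can be sketched in software as follows. This is an illustration only: the name `dmr_execute` and the `max_retries` parameter are hypothetical, and the sketch does not model the paper's scheduling, conditional-processing dependency, or ILP formulation.

```python
def dmr_execute(operation, *args, max_retries=3):
    """Run `operation` twice and accept the result only if both runs agree.

    A mismatch is treated as a detected soft error and both executions
    are repeated (an assumption; hardware may re-execute selectively).
    """
    for _ in range(max_retries + 1):
        primary = operation(*args)      # primary execution
        secondary = operation(*args)    # duplicated (secondary) execution
        if primary == secondary:        # comparison detects soft errors
            return primary
    raise RuntimeError("results still disagree after retries")
```

Every detected error costs at least one extra pair of executions, which is why the scheduling freedom for the secondary executions matters for energy.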
Title | Application of Overlap-Add FFT Algorithm for Computation Reduction of Convolution Neural Networks |
Author | Hsia-Tsung Wang, *Wei-Kai Cheng (Chung Yuan Christian University, Taiwan) |
Page | pp. 24 - 26 |
Keyword | CNN, FFT |
Abstract | As the computation demand of CNNs is dominated by convolution layers, some studies exploit the duality between the spatial domain and the frequency domain through the fast Fourier transform (FFT) to replace convolutions with pointwise multiplications. However, the FFT approach requires zero padding to enlarge the filter kernel to the same size as the input feature map. In this paper, we apply the overlap-add FFT algorithm to resolve the large zero-padding problem in the full FFT model. Our approach fits all filter kernel sizes and especially benefits small kernels such as 3x3. Experiments on ResNet-34 show that, on average, our overlap-add FFT scheme requires about 41% of the convolution complexity, which can be further reduced to 10% with circuit optimization. |
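The overlap-add idea can be sketched in one dimension as follows. The sketch makes several assumptions: a plain recursive radix-2 FFT, a 1-D signal in place of a 2-D feature map, and true convolution rather than the cross-correlation CNN layers usually compute. The names `fft` and `overlap_add` and the `block` parameter are hypothetical and not taken from the paper.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.

    The inverse transform is left unnormalized (caller divides by n).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def overlap_add(signal, kernel, block=8):
    """Linear convolution of `signal` with `kernel` via overlap-add."""
    m = len(kernel)
    size = 1
    while size < block + m - 1:          # FFT size covers one block's output
        size *= 2
    # The kernel is padded only to the block FFT size, not the signal size.
    kernel_f = fft(list(kernel) + [0] * (size - m))
    result = [0.0] * (len(signal) + m - 1)
    for start in range(0, len(signal), block):
        chunk = list(signal[start:start + block])
        chunk_f = fft(chunk + [0] * (size - len(chunk)))
        prod = [x * y for x, y in zip(chunk_f, kernel_f)]  # pointwise multiply
        piece = fft(prod, invert=True)
        for i in range(len(chunk) + m - 1):
            result[start + i] += piece[i].real / size      # add overlapping tails
    return result
```

With a 3-tap kernel and `block=8`, each transform has length 16 regardless of the signal length, which is the padding saving the abstract refers to.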
Title | Improving Global Motion Compensation for Frame Interpolation with High-Resolution and High-Frame-Rate Video |
Author | *Keita Ukihashi, Takashi Imagawa (Ritsumeikan University, Japan), Hiroshi Tsutsui, Yoshikazu Miyanaga (Hokkaido University, Japan), Hiroyuki Ochi (Ritsumeikan University, Japan) |
Page | pp. 27 - 32 |
Keyword | frame interpolation, motion compensation |
Abstract | In this paper, we propose a novel global motion compensation method to be used in frame interpolation from input video that consists of high-resolution less-frequent frames (keyframes) and low-resolution high-frame-rate (LR-HF) frames. To generate a better-interpolated background from two keyframes using homography transformation, we improve the accuracy of global motion estimation by eliminating and interpolating feature points (FPs) and by detecting erroneous homography matrices. We also introduce an adaptive weight model for superimposing transformed keyframes. The experimental results show that the proposed method achieves interpolated frames with better quality than the conventional one. |
Title | Configurable Processor Hardware Developing Environment for RISC-V with Vector Extension |
Author | *Ryo Taketani (Department of Information Systems Engineering, Osaka University, Japan), Yoshinori Takeuchi (Department of Electric and Electronic Engineering, Kindai University, Japan) |
Page | pp. 33 - 38 |
Keyword | Configurable processor, RISC-V, Vector architecture |
Abstract | This research proposes a configurable processor hardware development environment for RISC-V with the vector extension. RISC-V is getting more attention as an open instruction set architecture. RISC-V specifies a vector extension for parallel computing that takes power savings and cycle performance into consideration. We implemented a RISC-V based hardware processor with the vector extension and evaluated it. |
Title | Improved Multiplier Architecture on ASIC for RLWE-based Key Exchange |
Author | *Tatsuki Ono, Song Bian, Takashi Sato (Graduate School of Informatics, Kyoto University, Japan) |
Page | pp. 39 - 40 |
Keyword | ring learning with errors, application specific integrated circuit, cryptography, key exchange, multiplier |
Abstract | The ring learning with errors (RLWE) problem is one of the most promising candidates for constructing quantum-resistant cryptosystems. In this work, we implement an improved hardware multiplier unit for RLWE key exchange schemes. By reducing internal processing units and shortening processing steps, circuit area, power, and latency are reduced to 0.63x, 0.48x, and 0.86x, respectively, compared to the conventional architecture. |
Title | Parameter Embedding for Efficient FPGA Implementation of Binarized Neural Networks |
Author | *Reina Sugimoto, Nagisa Ishiura (Kwansei Gakuin University, Japan) |
Page | pp. 41 - 45 |
Keyword | binarized neural network, FPGA implementation, parameter embedding |
Abstract | A binarized neural network (BNN), a restricted type of neural network where weights and activations are binary, enables compact hardware implementation. While the existing architectures for BNNs assume that weights and biases are stored in on-chip RAMs, this paper presents an attempt to embed those parameters into processing elements by utilizing LUTs in FPGAs as ROMs. This eliminates the bandwidth limitation between memories and neuron PEs, allows higher parallelism, and reduces the hardware cost of the neuron PEs. This paper also proposes a map-shift scheme to efficiently supply the neuron PEs with feature map data for convolution. As a case study, LeNet5 has been implemented based on this method targeting a Xilinx Artix-7 FPGA, which can process a frame in 1,386 cycles at 21.1MHz. |
Title | A 4CH CNN Hardware Architecture for Image Super-Resolution |
Author | *Koyo Suzuki, Kazuki Mori, Nobutaka Kuroki (Kobe University, Japan), Tetsuya Hirose (Osaka University, Japan), Masahiro Numa (Kobe University, Japan) |
Page | pp. 46 - 50 |
Keyword | Super-Resolution, CNN |
Abstract | This paper presents two hardware architectures for super-resolution technology with a 4CH CNN (convolutional neural network with four output channels). We introduce time-division processing to save resources. Moreover, we propose a technique to save resources by sharing part of the circuit in one architecture. Experimental results have shown that this architecture reduces resources by about 4 to 21 pt. compared to the other architecture. Both architectures are about 5.5 times as fast as software processing. |
Title | Approximate Function Configuration by Neural Network on Memory-array Unit |
Author | *Xuechen Zang, Shigetoshi Nakatake (The University of Kitakyushu, Japan), Hiroyuki Kozutsumi, Mitsunori Katsu (TRL Corp., Japan), Shoichi Sekiguchi (TAIYO YUDEN Co., LTD, Japan) |
Page | pp. 51 - 55 |
Keyword | Approximate Computing, Reconfigurable Systems, MRLD, Approximate Logic |
Abstract | This paper presents approximate computing consistent with a memory-based reconfigurable logic device (MRLD). We propose a novel implementation flow that realizes a function of a multiple look-up table (MLUT) by employing neural network (NN) based machine learning. As in function fitting, our method implements a logic function induced by a set of inputs and outputs. To verify the performance of the approximate computing implementation, we compare a general polynomial regression method with deep neural networks. The results suggest that a relatively deeper NN is superior in loss value and accuracy rate. The NN models achieve a lower symbol error rate (SER) and a considerable loss reduction compared to the polynomial regression. In addition, we demonstrate how to use such models for an 8-bit inverter logic example. |
Title | A Deep Neuro-Fuzzy for False Decision Prevention on an FPGA |
Author | *Masayuki Shimoda, Hiroki Nakahara (Tokyo Institute of Technology, Japan) |
Page | pp. 56 - 61 |
Keyword | Deep Neural Network, Fuzzy Inference, FPGA |
Abstract | We propose a deep neuro-fuzzy system that consists of a deep neural network (DNN) and fuzzy inference. The fuzzy inference judges from the DNN outputs whether inputs are distinguishable or not, to avoid critical errors (e.g., recognizing malignant data as benign). When our system detects indistinguishable data, it outputs "indistinguishable". Experimental results show that the recall increased by 20.52% in the best case, and that its area and computation time are almost the same as those of typical DNNs. Thus, our proposal is more suitable for embedded systems in situations where errors are critical. |
Title | A Real Chip Evaluation of a CNN Accelerator SNACC |
Author | *Ryohei Tomura, Takuya Kojima, Hideharu Amano (Dept. of Information and Computer Science, Keio University, Japan), Ryuichi Sakamoto, Masaki Kondo (Graduate School of Information Science and Technology, The University of Tokyo, Japan) |
Page | pp. 62 - 67 |
Keyword | Accelerator, CNN |
Abstract | SNACC (Scalable Neuro Accelerator Core with Cubic integration) is an accelerator for deep neural networks, which can improve performance by increasing the number of stacked chips connected with an inductive-coupling wireless through-chip interface (TCI). The chip implementation and real chip evaluation of SNACC are introduced. It consists of four processing element cores that execute dedicated SIMD instructions, distributed memory modules for storing weight data, and the TCI. The real chip evaluation using Renesas Electronics' 65nm SOTB (Silicon On Thin Box) CMOS technology shows that a simple CNN, LeNet, works at 50MHz for all layers with a 0.90V supply voltage. The power consumption is less than 12mW. The performance can be enhanced by about 15% with forward body biasing in exchange for an increase in leakage of about 2mW. Also, SNACC achieved more than 20 times the performance of a MIPS R3000 compatible embedded processor. |
Title | IMU-based Rehabilitation System for Upper and Lower Limbs |
Author | Chun-Jui Chen, Yi-Ting Lin, Chia-Chun Lin (Department of Computer Science, National Tsing Hua University, Taiwan), Yung-Chih Chen (Department of Computer Science and Engineering, Yuan Ze University, Taiwan), *Chun-Yao Wang (Department of Computer Science, National Tsing Hua University, Taiwan) |
Page | pp. 68 - 73 |
Keyword | Rehabilitation, knee angle, elbow angle |
Abstract | In this work, we present an IMU-based rehabilitation system for upper and lower limbs. This system uses two wearable IMU sensors to detect rehabilitation motions of patients suffering from frozen shoulder, knees, and hip surgeries. The sensors are connected to a smartphone via Bluetooth, and an Android app is designed to show the correctness and the statistics of the rehabilitation exercises. The experimental results show that the average errors of knee angle and elbow angle are both less than 5°. The average recognition rates of all rehabilitation exercises are above 85%. |
Title | A Smart Single-Sensor Device for Instantaneously Monitoring Lower Limb Exercises |
Author | Yan-Ping Chang, Teng-Chia Wang, Chun-Jui Chen, Chia-Chun Lin (National Tsing Hua University, Taiwan), *Yung-Chih Chen (Yuan Ze University, Taiwan), Chun-Yao Wang (National Tsing Hua University, Taiwan) |
Page | pp. 74 - 79 |
Keyword | stride count, walking distance, 9-axial sensor |
Abstract | Studies have shown that stair exercises can enhance the strength of lower limbs for patients with limb disorders. However, there are only a few systems that can monitor lower limb exercises in medical institutions. To analyze lower limb exercises instantaneously, we propose a smart single-sensor wearable device, S3-Sock, equipped on shoes. The sock can monitor and measure the stride count, step height, and the distance of the step trajectory during lower limb exercises. The experimental results demonstrate that the proposed system is reliable under different lower limb exercises. The averages of absolute mean errors of stride count in stair-climbing and walking are about 2.00% and 0.88%, respectively. The averages of absolute mean errors of step height are about 5.12% and 8.23% in step-by-step and step-over-step stair climbing, respectively. |
Title | 1-D GDR Aware Cell Generation via P/N bi-partition |
Author | Yao-Lin Chang, Hung-Ming Chen, *Wei-Tung Chao, Chien-Hung Lin (National Chiao Tung University, Taiwan) |
Page | pp. 80 - 81 |
Keyword | layout, standard cell |
Abstract | As the complexity of layout design grows, the layout generation problem has become more challenging. This work features the bi-partition tree and the selective stage. With this bi-partition tree, we speed up the layout generation flow and guarantee no additional wire length. With objective functions in the placement selection stage and the routing stage, a lithography-friendly layout with low congestion, minimum area and high performance is accomplished. |
Title | (Invited Talk) LSI Design and Current Topics for Automotives |
Author | *Toshihiro Hattori (Renesas Electronics, Japan) |
Page | p. 82 |
Keyword | Automotives |
Abstract | Automotive is one of the major applications for semiconductor devices, and semiconductor devices are the key factors supporting the current innovation of MOBILITY (automotive) systems. Firstly, I will explain the different needs, features, and technologies for automotive-oriented LSIs. As you know, automotive technology is undergoing a drastic innovation led by the keywords "CASE (Connected, Autonomous, Shared & Services, Electric)" and "MaaS (Mobility as a Service)". I will overview the trends and needs for automotive LSIs. Functional safety and security are the key technologies required of current automotive LSIs. I will explain the trends and background of autonomous driving and show an example of the latest implementation of autonomous driving support LSIs. I will show the background of the functional safety trends and an example of a 28nm automotive flash microcontroller for next-generation automotive architecture complying with ISO26262 ASIL-D. I will show the background of the security trends in automotive and an example of a 24MB embedded flash system based on 28nm SG-MONOS featuring robust over-the-air software update. |
Title | Insertion Based Procedural Construction of Parallel Prefix Adders |
Author | *Bo-Yu Tseng, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan) |
Page | pp. 83 - 88 |
Keyword | adder, optimization, binary tree |
Abstract | As a novel approach to the design of parallel prefix adders, the framework of the procedural construction of parallel prefix adders has been proposed. This approach aims to configure the prefix tree structure by a sequence of basic structural operations. Among several basic operations, "insertion" has the potential to produce a variety of prefix structures while keeping the hardware cost low. This paper explores the essential structural variations achieved by the insertion operation, and proposes a coding scheme that can represent all these essential variations while excluding redundancy as much as possible. In our approach, we focus on the sequence of insertion operations applied at various positions, and propose to use a binary tree to specify the order of applying insertion operations. Our discussions in this paper would be an important base for the optimization of parallel prefix adders, which is part of our future work. |
Title | 3D Test Wrapper Chain Synthesis for Test Time and TSV Count Co-optimization under Constraints on I/O Cells |
Author | Fan-Hsuan Tang, Hsu-Yu Kao, *Shih-Hsu Huang (Chung Yuan Christian University, Taiwan) |
Page | pp. 89 - 94 |
Keyword | SoC Testing, Test Wrapper Chain Synthesis, Design for Testability, TSV Count Minimization, 3D ICs |
Abstract | In addition to test time minimization, the number of testing TSVs is also an important concern for the 3D test wrapper chain synthesis problem. Previous co-optimization algorithms work only when there are no constraints on I/O cells. In this paper, we propose a single-stage KL (Kernighan-Lin) based algorithm to overcome this drawback. Different from previous works, the proposed synthesis algorithm can take specified I/O cell constraints into account during co-optimization. Benchmark data consistently show that the proposed algorithm can greatly reduce both test time and TSV count. |
Title | A New Approach to Express Stochastic Numbers |
Author | *Yukino Watanabe, Shigeru Yamashita (Graduate School of Science and Engineering, Ritsumeikan University, Japan) |
Page | pp. 95 - 98 |
Keyword | Stochastic Computing, Stochastic Numbers |
Abstract | Stochastic Computing (SC) is a technique to calculate complex functions with very small hardware overhead when some small errors can be tolerated. SC uses Stochastic Numbers (SNs), which are generally long bit strings (e.g., 1024 bits); we need many cycles to calculate a function with SNs. In this paper, we propose a novel idea to reduce the length of SNs while the precision level of SNs is not changed. Our idea is to express one SN by using two bit strings, where the two bit strings have different weights. The multiplication of two SNs in our expression is not trivial, so we propose how to multiply two SNs in our new expression. Then we show some experimental results to confirm that our proposed multiplication can provide almost the same error rate as conventional SNs with significantly shorter bit strings. |
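For background, conventional (unipolar) stochastic multiplication, the baseline this abstract compares against, can be sketched as below. The helper names are hypothetical, and the sketch deliberately does not attempt the paper's two-bit-string weighted representation, whose details the abstract does not give.

```python
import random

def to_stochastic(p, length, rng):
    """Encode a value p in [0, 1] as a bit string whose fraction of 1s is about p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def from_stochastic(bits):
    """Decode a stochastic number as the fraction of 1s in the bit string."""
    return sum(bits) / len(bits)

def multiply(a_bits, b_bits):
    """Multiply two independent unipolar SNs with a bitwise AND (one gate per bit)."""
    return [a & b for a, b in zip(a_bits, b_bits)]

# Usage: multiply 0.5 by 0.4 with 1024-bit stochastic numbers.
rng = random.Random(0)
a = to_stochastic(0.5, 1024, rng)
b = to_stochastic(0.4, 1024, rng)
estimate = from_stochastic(multiply(a, b))   # close to 0.2, with random error
```

The random error of the estimate shrinks only as the bit string grows, which is why shortening SNs without losing precision is the goal stated in the abstract.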
Title | Rapid Single-Flux-Quantum Matrix Multiplication Circuit Utilizing Bit-Level Processing |
Author | *Nobutaka Kito, Takuya Kumagai (Chukyo University, Japan), Kazuyoshi Takagi (Mie University, Japan) |
Page | pp. 99 - 103 |
Keyword | matrix multiplication, RSFQ circuits |
Abstract | A rapid single-flux-quantum (RSFQ) matrix multiplication circuit utilizing bit-level processing is presented. The proposed circuit exploits the characteristics of the pulse logic used in RSFQ circuits. The circuit carries out multiplications and additions by counting pulses on signal lines. It uses fewer gates than previously proposed parallel processing designs and could be realized in a small layout area. A layout for 4-bit 4 x 4 matrix multiplication was designed and its correct operation was verified in simulation. |
Title | Irregular Bumps Design Planning for Modern Ball Grid Array Packages |
Author | Hsin-Yu Chang, Jyun-Ru Jiang, Simon Chen, Hung-Ming Chen, *Ya-Ying Chien (National Chiao Tung University, Taiwan) |
Page | pp. 104 - 109 |
Keyword | flip-chip packages, routability |
Abstract | In modern flip-chip packages, bumps are often placed irregularly due to different design needs. It costs a great amount of time and manual effort to generate substrate routing from bumps through vias to package balls. Moreover, no single model in prior works can be applied simultaneously to bumps, vias and balls. In this work, we propose a hybrid flow network model to formulate the 2-layer substrate routing problem on irregular package structures. We present a new bump model that can handle irregular bump plans. With our methodology, signal assignment on vias and balls and substrate routing on two layers can be obtained at the same time. We also present an iterative optimization technique to improve wire congestion. Our results show that the proposed method completes via and ball assignment efficiently, and obtains 100% routability and an average wirelength improvement of 16.45% compared with manual design in real industrial cases. |
Title | Droplet Splitting Routing for Micro-Electrode-Dot-Array Digital Microfluidic Biochips |
Author | *Ikuru Yoshida, Kota Asai (Graduate School of Science and Engineering, Ritsumeikan University, Japan), Tsung-Yi Ho (National Tsing Hua University, Taiwan), Shigeru Yamashita (Graduate School of Science and Engineering, Ritsumeikan University, Japan) |
Page | pp. 110 - 115 |
Keyword | biochips, droplet routing, micro-electrode dot array |
Abstract | Digital microfluidic biochips (DMFBs) are one of the most promising technologies for sample preparation. Among them, DMFBs based on a micro-electrode dot array (MEDA) overcome the drawbacks of conventional DMFBs. On MEDA-based biochips, we can perform droplet shaping and splitting operations that cannot be performed on a conventional DMFB. In this paper, we propose an efficient droplet routing method that splits droplets in MEDA when there are multiple spaces between block regions. We confirm by experiment that our method can indeed reduce the time steps necessary for droplets to reach target regions. |
Title | Exploring Time-space Trade-off for Application Mapping onto 3-D Torus NoCs |
Author | *Yao Hu, Michihiro Koibuchi (National Institute of Informatics, Japan) |
Page | pp. 116 - 117 |
Keyword | Network-on-Chip (NoC), topology embedding, interconnection network, job mapping |
Abstract | One application usually has many parallel tasks running on multiple processing cores which communicate with each other on a many-core chip. Traditionally, the tasks are mapped onto a regular topology of network-on-chip (NoC) with nearby processing cores to reduce the network distances. In this case, fragmentation of unused processing cores may occur when receiving a new incoming application on a chip. In this study, we assume that each application has to be executed on a pre-fixed network topology on a many-core chip with a 3-D torus NoC. To improve the system utilization, i.e., to reduce the number of unused processing cores, we allow the use of non-adjacent processing cores for an application mapping, which form a pre-fixed network topology. We evaluate the time-space trade-off during node allocation with different mapping dilations for the purpose of improving job scheduling abilities. Evaluation results show that, for a large compound workload of NAS Parallel Benchmarks (NPB) applications, the proposed mapping can reduce turnaround time by up to 6% compared with the regular topology mapping on a large 3-D torus NoC. |
Title | On Power Supply Pads Planning for Wire-bonded IC |
Author | Hui Zhong Leong, *Ming-Yu Huang, Hung-Ming Chen (NCTU Taiwan, Taiwan), Chang-Tzu Lin (ITRI Taiwan, Taiwan) |
Page | pp. 118 - 121 |
Keyword | power supply, pdn, wire-bonded ic |
Abstract | In wire-bonding technology, Input/Output (I/O) pads are located along the periphery of the integrated circuit (IC), and power pad placement is limited by the available I/O pad candidates. Power pads supply voltage to the IC through the power delivery network (PDN); hence, insufficient power pads may cause IC failure. To overcome this problem, we propose a power pad placement algorithm for wire-bonding technology. Experimental results show that the proposed algorithm determines both power pad counts and power pad locations effectively for a given power delivery network. In addition, the worst voltage drop for the IC is guaranteed to be less than 3% of the supply voltage. |
Title | Sample Preparation with Efficient Dilution of Biochemical Fluids using Programmable Microfluidic Devices |
Author | *Ying Shuaijie (Graduate School of Science and Engineering, Ritsumeikan University, Japan), Sudip Roy (Indian Institute of Technology (IIT) Roorkee, India), Juinn-Dar Huang (National Chiao Tung University, Taiwan), Shigeru Yamashita (Graduate School of Science and Engineering, Ritsumeikan University, Japan) |
Page | pp. 122 - 125 |
Keyword | PMD, Sample preparation, two steps, small area |
Abstract | Sample preparation, which is a front-end process to produce the desired target concentrations of the input reagent fluid, plays a pivotal role in every bioassay or biochemical laboratory protocol. In this paper, we propose two sample preparation algorithms for efficient dilution of biochemical fluids using programmable microfluidic devices (PMDs). The first method, called dilution algorithm in two steps (DATS), needs only two diluting operations. The other method, called dilution algorithm in a small dilution area (DASDA), needs less area than DATS. |
Title | An Efficient Character Generation Algorithm for High-Throughput E-Beam Lithography |
Author | *Shih-Ting Lin, Hong-Yan Su (National Chiao Tung University, Taiwan), Oscar Chen (AnaGlobe Technology, Inc, Taiwan), Yih-Lang Li (National Chiao Tung University, Taiwan) |
Page | pp. 126 - 131 |
Keyword | Character projection E-beam lithography, exact pattern matching, frequently used character, multi-intersection-level layout |
Abstract | E-beam lithography has been one of the promising next-generation lithography technologies for 7nm and below technology nodes. Among various electron-beam lithography features, character projection (CP) attracts users because complex patterns can be printed in one e-beam shot. However, we still face severe challenges in generating characters on interconnection layers due to their pattern diversity. In this paper, we propose a multi-intersection-level (MIL) layout that can efficiently capture the relationships between nearby objects, including the spacing between them. The inflated layer reduces the problem instance size for identifying the frequently used patterns, while the intersection layers help in clipping windows to obtain an ideal character set. Experimental results show that the proposed methodology can efficiently yield the frequently used character set with up to 93.3% and 81.23% covering rate in the via layer and metal layer, respectively. Besides, for a panel layout, a set of frequently used characters that reaches a 100% covering rate is successfully identified. |
Title | Color Balancing-aware Non-Stitch Routing for Multiple Patterning Lithography |
Author | *Jia-Hong Chang, Shao-Yun Fang (National Taiwan University of Science and Technology, Taiwan) |
Page | pp. 132 - 135 |
Keyword | Multiple Patterning Lithography, Color Balancing, Routing |
Abstract | Multiple Patterning Lithography (MPL) is one of the major resolution enhancement technologies for sub-20 nm nodes, which requires decomposing a layout into multiple masks considering the minimum mask spacing rule. In this paper, we propose an MPL-aware routing algorithm considering mask usage balancing to optimize pattern printability. Different from previous works, stitch insertion is not considered in our router since stitches are usually forbidden in industry to guarantee sufficient yield. To maximize the flexibility in mask usage optimization that is deficient for non-stitch routing, a multiple-objective minimum spanning tree algorithm (MO-MST) is proposed to make the distribution of generated wire segments more scattered. An integer linear programming (ILP)-based color refinement approach is also proposed to optimize mask usage balancing. Experimental results show that the proposed algorithm flow can generate MPL-compliant routing solutions with excellent mask usage balancing for the benchmarks released by the 2018 CAD Contest at ICCAD. |
Title | An Efficient and Effective Macro Placement Algorithm for Large-Scale Mixed-Size Designs |
Author | Jai-Ming Lin, You-Lun Deng, Ya-Chu Yang, *Jia-Jian Chen (Department of Electrical Engineering, National Cheng Kung University, Taiwan) |
Page | pp. 136 - 137 |
Keyword | macro placement, simulated evolution, physical design, design hierarchy, mixed-size |
Abstract | We propose a novel approach that integrates the simulated evolution algorithm with the corner stitching data structure. Unlike the simulated annealing algorithm adopted by existing works, our approach keeps a solution from getting stuck at a local optimum while taking less runtime. Our approach can also handle chips that contain several preplaced macros, including macros that do not abut the chip boundaries. Experimental results show that our approach obtains better results in wirelength, routability, and runtime. |
Title | Thermal Modeling and Simulation of a Smart Wrist-worn Wearable Device |
Author | *Kodai Matsuhashi (Hirosaki University, Japan), Koutaro Hachiya (Teikyo Heisei University, Japan), Toshiki Kanamoto, Masashi Imai, Atsushi Kurokawa (Hirosaki University, Japan) |
Page | pp. 138 - 143 |
Keyword | wearable device, thermal design, smart watch |
Abstract | We propose a thermal-circuit model that can calculate temperatures at locations that are important for the thermal design of smart wrist-worn wearable devices. The thermal model can be applied to various wrist-worn wearable devices with different device-body shapes, belt sizes, and materials. The temperatures obtained using the proposed model agree well with those obtained by a commercial thermal solver. Moreover, through simulations with the model, we present important knowledge for the thermal design of wrist-worn wearable devices. |
PDF file |
Title | Mixing of Biochemical Fluids using Programmable Microfluidic Devices |
Author | *Yuto Umeda (Graduate School of Science and Engineering, Ritsumeikan University, Japan), Sudip Roy (Indian Institute of Technology (IIT) Roorkee, India), Shigeru Yamashita (Graduate School of Science and Engineering, Ritsumeikan University, Japan) |
Page | pp. 144 - 149 |
Keyword | programmable microfluidic device, the number of mixing operations, assigning reagents |
Abstract | A programmable microfluidic device (PMD) can mix the reagents in various ratios. In this paper, we propose a mixing method to reduce the number of mixing operations on PMDs. Our method finds the best assignment of each reagent to each mixing operation so that we can reduce the number of mixing operations by simplifying the ratio of reagents and reusing intermediate waste reagent. Experimental results show that our proposed method can make mixing trees with the smallest number of mixing operations. |
Title | Generalized Via Pattern Awareness Substrate Routing Framework for Fine Pitch Ball Grid Array |
Author | Jun-Sheng Wu, Chi-An Pan, *Yi-Yu Liu (National Taiwan University of Science and Technology, Taiwan) |
Page | pp. 150 - 151 |
Keyword | Routing, ILP |
Abstract | The packaging substrate has become one of the most important carriers for enabling system-level and heterogeneous design within a small footprint. Instead of applying advanced semiconductor interposer process technologies, fine pitch ball grid array (FBGA) package substrates are manufactured by mechanical processes. To tackle stringent design rules arising from mismatched via dimensions and miscellaneous routing obstacles, substrate interconnect designs are usually customized by experienced substrate layout engineers. However, fully manual net-by-net design for hundred-scale FBGA is time-consuming and error-prone. In this paper, we model FBGA substrate routing as an integer linear programming (ILP) problem that takes various via patterns and design-dependent constraints into account. A two-stage early exit methodology and ILP constraint reduction techniques are developed to reduce the runtime of the ILP solver. Experimental results indicate the potential of the proposed framework. We argue that complex FBGA designs could be semi-automated by using via pattern candidates to reduce the substrate layout design cycle time. |
PDF file |
Title | Acceleration of Radix-Heap based Dijkstra algorithm by Lazy Update |
Author | Tomohiro Takahashi (University of Kitakyushu, Japan), *Yasuhiro Takashima (University of Kitakyushu, Japan) |
Page | pp. 152 - 157 |
Keyword | Dijkstra's algorithm, Lazy update, Radix-heap |
Abstract | This paper proposes a fast radix-heap based Dijkstra algorithm with lazy update for the single source shortest path problem (SSSP). The conventional Dijkstra algorithm chooses one vertex with the minimum tentative distance among the unvisited vertices. A relaxation called lazy update has been proposed for this problem; it allows multiple vertices, rather than only one, to be selected at a time while still guaranteeing optimality. In this paper, we apply this lazy update method to the radix-heap based Dijkstra algorithm, which solves SSSP with integer edge distances. The experimental results confirm the efficiency of the proposed method, which executes 50% faster than the conventional Dijkstra algorithm. |
PDF file |
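As a point of reference for the abstract above, the following is a minimal sketch of the conventional radix-heap based Dijkstra algorithm for non-negative integer edge distances. It is only the baseline: the lazy update relaxation proposed in the paper is not implemented here, and the names `RadixHeap`, `dijkstra`, and the example graph are hypothetical illustrations rather than the authors' code.

```python
class RadixHeap:
    """Monotone priority queue for non-negative integer keys."""

    def __init__(self, max_bits=64):
        self.buckets = [[] for _ in range(max_bits + 1)]
        self.last = 0   # last minimum extracted; every stored key is >= last
        self.size = 0

    def _index(self, key):
        # 0 when key == last, otherwise 1 + position of the highest differing bit
        return (key ^ self.last).bit_length()

    def push(self, key, value):
        assert key >= self.last, "radix heap keys must be monotone"
        self.buckets[self._index(key)].append((key, value))
        self.size += 1

    def pop(self):
        if not self.buckets[0]:
            # Take the first non-empty bucket and redistribute it around its minimum.
            i = next(i for i, b in enumerate(self.buckets) if b)
            bucket, self.buckets[i] = self.buckets[i], []
            self.last = min(key for key, _ in bucket)
            for key, value in bucket:
                self.buckets[self._index(key)].append((key, value))
        self.size -= 1
        return self.buckets[0].pop()


def dijkstra(adj, source):
    """Shortest distances from source; adj maps a vertex to (neighbor, weight) pairs."""
    dist = {source: 0}
    heap = RadixHeap()
    heap.push(0, source)
    while heap.size:
        d, u = heap.pop()
        if d > dist[u]:
            continue  # stale entry left behind by an earlier improvement
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heap.push(nd, v)
    return dist


# Hypothetical example graph with integer edge distances.
graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2), ("d", 7)], "b": [("d", 3)]}
print(dijkstra(graph, "a"))
```

The heap relies on the fact that Dijkstra never pushes a key smaller than the last extracted minimum, which is what lets keys be bucketed by their highest bit that differs from that minimum.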
Title | A Global Placement Method for RECON Spare Cells in ECO-Friendly Design Style |
Author | *Junpei Akashi, Suguru Hojo, Nobutaka Kuroki (Kobe University, Japan), Tetsuya Hirose (Osaka University, Japan), Masahiro Numa (Kobe University, Japan) |
Page | pp. 158 - 163 |
Keyword | ECO, reconfigurable cell, error diagnosis, technology remapping |
Abstract | This paper presents an approach to obtain suitable global placement of RECON spare cells in the ECO (Engineering Change Order)-friendly design style based on the statistics of each subregion concerning critical and near-critical paths, occupancy of RECON embedded cells, and utilization of RECON cells. Experimental results show that the proposed method is effective for fixing post-mask ECOs while suppressing the increase in the maximum delay time compared with the conventional approach. |
Title | An Efficient Thermal Model of Thin Film NiCr Resistors Considering Pulse Response |
Author | *Ryosuke Watanabe (Hirosaki University, Japan), Keita Izawa (Nikkohm Co., Ltd., Japan), Shota Kajiya, Daiki Tsunemoto, Koki Kasai, Atsushi Kurokawa, Toshiki Kanamoto (Hirosaki University, Japan) |
Page | pp. 164 - 167 |
Keyword | Thin film resistors, Thermal circuit analysis |
Abstract | This paper proposes an efficient thermal model of industrial thin film NiCr resistors. We consider the thermal destruction effect of the thin film NiCr resistors under high pulsed power incident conditions. The thin film NiCr resistors considered in this study have two types of thermal time constant. TCAD calculation indicates that a short thermal time constant of around 55 μs exists in the resistors, and experimental results indicate that a long thermal time constant of around 40 seconds exists. Therefore, to analyze the thermal transient behavior of the resistors more precisely, we propose a thermal circuit model that includes both the short and the long thermal time constants. In the model, the thermal resistance and heat capacitance of the thin NiCr sheet are precisely considered, and these parameters are essential for the existence of the short thermal time constant. The short thermal time constant in this model is strongly related to the peak temperature of the considered resistors, and we think that the short-time thermal response of the thin film NiCr resistors is related to their pulse durability. |
Title | A Smart Knee Pad for Stride Count and Walking Distance Measurement via Knee Angle Calculation |
Author | Teng-Chia Wang, Yan-Ping Chang, Chun-Jui Chen, *Chia-Chun Lin (National Tsing Hua University, Taiwan), Yung-Chih Chen (Yuan Ze University, Taiwan), Chun-Yao Wang (National Tsing Hua University, Taiwan) |
Page | pp. 168 - 173 |
Keyword | knee angle, stride count, walking distance, 9-axial sensor |
Abstract | To calculate the knee angle, stride count, and walking distance, we propose a system, iKneePad, that fuses two 9-axis sensors with Bluetooth, mounted on the thigh and shank segments. The rates of change of the hip and knee angles are used to determine the beginning and the end of a stride. The thigh length, shank length, hip angle, and knee angle are used to calculate the walking distance. The experimental results show that the accuracy of the stride count is 100% and that the absolute mean errors of the knee angle are 2.99 and 1.42 for the maximum and minimum flexion angles, respectively. For walking distance, the mean error rates are -2.40% and -2.26% for short (10 m) and long (33 m) distances, respectively. The proposed system also provides instant feedback to users on an Android smartphone during rehabilitation or exercise with iKneePad. |
Title | (Panel Discussion) Quo Vadis, EDA? |
Author | Moderator: Hung-Ming Chen (National Chiao Tung University, Taiwan), Panelists: Krishnendu Chakrabarty (Duke University, USA), Ulf Schlichtmann (Technische Universität München, Germany), Toshihiro Hattori (Renesas Electronics, Japan), Pai H. Chou (National Tsing Hua University, Taiwan), Akira Fujimaki (Nagoya University, Japan), Donald Lie (Texas Tech University, USA), Organizer: Tsung-Yi Ho (National Tsing Hua University, Taiwan) |
Page | p. 174 |
Keyword | EDA |
Abstract | Electronics and biomedical designs and applications are now facing critical moments, including the end or extension of Moore's law, killer applications, and sustainability issues. Leveraging all possible solutions in design and tool development, including the use of AI, is thus essential. In this panel, six international researchers lead the discussion in the fields of biomedical and optical design, automotive, 5G/IoT, and quantum computing, exploring how EDA can help shape future designs. |
Tuesday, October 22, 2019 |
Title | (Keynote Speech) EDA for Optical Networks-on-Chip (ONoCs): Achievements and Future Opportunities |
Author | *Ulf Schlichtmann (Technische Universität München, Germany) |
Page | p. 175 |
Keyword | Optical NoCs |
Abstract | Optical Networks on Chip (ONoCs) are a promising technology to resolve some issues which are increasingly plaguing traditional electrical NoCs. Excessive power consumption is chief among these issues. As researchers started looking into architectural options for ONoCs, it soon became apparent that Electronic Design Automation (EDA) would be very beneficial for improving such architectures and especially their physical implementation, e.g. due to the complexity involved. This is true already at the netlist level, but even more so once physical design is considered. Thus, for about 10 years, researchers have been working on EDA approaches for the design of ONoCs. I will review some achievements of EDA for ONoCs, with a focus on physical design (placement, routing). I will discuss current challenges in further improving EDA results. This will be followed by a look at opportunities for EDA research to further improve ONoC architectures. Opportunities exist especially in simultaneously considering multiple design aspects. The emphasis in this talk will be on Wavelength-Routed ONoCs (WRONoCs). |
Title | Efficiency Investigation of Capacitors Mounted on Re-distribution Layers for FOWLP |
Author | *Koki Kasai, Atsushi Kurokawa, Masashi Imai, Toshiki Kanamoto (Hirosaki University, Japan) |
Page | pp. 176 - 179 |
Keyword | PDN, Impedance, Capacitance, FOWLP |
Abstract | This paper provides insights into the effective usage of an emerging decoupling capacitor. Power supply noise is one of the most serious concerns in modern low-voltage integrated circuits. Decoupling capacitors embedded in the re-distribution layers (RDL) are potentially effective in reducing the noise caused by internal switching. However, their effectiveness is easily lost due to the equivalent series inductance and resistance. Here, we construct a post-layout simulation test bench to assess the effectiveness by evaluating the impedance profile as well as the transient noise waveform. The experimental results show that the horizontal proximity of the RDL-embedded capacitors to the noise source is an important factor in keeping the advantage. |
PDF file |
Title | Unbalanced Splitting Tolerant Sample Preparation Algorithm for Digital Microfluidic Biochips |
Author | Ling-Yen Song, Yi-Ling Chen, Yung-Chun Lei, *Juinn-Dar Huang (Institute of Electronics, National Chiao Tung University, Taiwan) |
Page | pp. 180 - 183 |
Keyword | digital microfluidic biochip, sample preparation, unbalanced splitting, probability-based forecast, forecast-based correction |
Abstract | Sample preparation is regarded as one of the necessary processing steps in most biochemical assays. In the past decade, several techniques have been presented to deal with sample preparation issues under the (1:1) mixing model on digital microfluidic biochips (DMFBs). Most previous works assumed that mixing-then-splitting would produce two identical output droplets. However, due to uncontrollable variabilities, previous works may fail to provide exact solutions in the presence of unbalanced splitting. In this paper, we propose a new forecast-based correction algorithm for the unbalanced splitting problem. Our new algorithm not only guarantees a correct solution, but also requires neither extra reactants nor on-chip specialized hardware. Experimental results show that the effect of unbalanced splitting can be eliminated at a cost of only 20% more operation steps. Therefore, the proposed algorithm is both reliable and efficient. |
Title | KR-CHIP: An Educational Computer equipped with 8-bit Accumulator-based, 16-bit Accumulator-based and 32-bit Pipeline Processors |
Author | Hiroyuki Kanbara (ASTEM RI, Japan), Kagumi Azuma, Yuuki Oosako (Kwansei Gakuin University, Japan), Atsuya Shibata (Nara Institute of Science and Technology, Japan), *Wakako Nakano (Kwansei Gakuin University, Japan) |
Page | pp. 184 - 189 |
Keyword | Education, CPU, FPGA, Accumulator-based, Pipeline |
Abstract | This article presents a processor for computer education named KR-CHIP. KR-CHIP integrates 3 CPUs: an 8-bit accumulator-based, a 16-bit accumulator-based, and a 32-bit pipeline architecture. Every register, counter, flag and memory can be observed directly by hardware at any clock cycle or at any phase of instruction execution. KR-CHIP helps beginners in computer hardware understand how instructions are processed inside a CPU. |
PDF file |
Title | A Trial of Electric Chemical Degradation Process Simulation for Lead-acid Batteries |
Author | *Daiki Imai, Masahiro Fukui (Ritsumeikan University, Japan), Keiichi Hasegawa (Plan Be, Japan) |
Page | pp. 190 - 191 |
Keyword | Battery Management, Simulation, Optimization, Lead-acid Battery |
Abstract | A trial computer simulation of the degradation of lead-acid batteries is examined using the concept of reaction distance. The recovery rate depends on the time of charge after discharge, the reaction distance, and the particle diameter of PbSO4 salts. |
Title | Register Minimization in Double Modular Redundancy Design with Soft Error Correction by Replay |
Author | *Yuya Kitazawa (Saitama University, Japan), Shinichi Nishizawa (Fukuoka University, Japan), Kazuhito Ito (Saitama University, Japan) |
Page | pp. 192 - 197 |
Keyword | Double modular redundancy, soft error, register minimization |
Abstract | Double modular redundancy (DMR) executes an operation twice and detects soft errors by comparing the duplicated operation results. A soft error is corrected by executing the necessary operations again, which is called replay. Replay requires error-free input data, and registers are needed to store such error-free data. In this paper, a method to minimize the required number of registers is proposed in which replay intervals are selected appropriately so as not to increase the register requirement. The experimental results show up to a 27% reduction in required registers. |
PDF file |
Title | Comparison of Diagnostic Performance Metrics for Test Point Selection in Analog Circuits |
Author | *Koutaro Hachiya (Teikyo Heisei University, Japan), Atsushi Kurokawa (Hirosaki University, Japan) |
Page | pp. 198 - 203 |
Keyword | Analog Test, Diagnostic Performance Metric, 3D-IC, Through Silicon Via |
Abstract | Diagnostic performance metrics proposed in the literature for finding measurement points in analog circuits are compared in terms of four properties: related to test metrics, sensitivity, symmetric and parametric. Based on the comparison result, a guideline for metrics selection is proposed. As a case study, the metrics are applied to finding measurement points to detect open defects of through silicon vias in power distribution networks of 3D-ICs. |
PDF file |
Title | A 12-bit 500-kS/s SAR ADC with Reconfigurable Mismatch Tolerance |
Author | *Yu-Hsiang Nien, Tsung-Heng Tsai (National Chung Cheng University, Taiwan) |
Page | pp. 204 - 207 |
Keyword | SAR ADC |
Abstract | This paper presents an energy-efficient 12-bit 500-kS/s SAR ADC with reconfigurable mismatch tolerance for high-resolution wearable biomedical sensor networks. Switching-back is used to create a tolerance range of 1/4 Vref per bit. Reconfigurable mismatch tolerance (RTM) is assigned to each bit independently to compensate for process variations. In this work, the unit capacitance is 1 fF. This SAR ADC consumes 39.5 μW at 500 kS/s under a 1 V supply in a 65 nm CMOS process. It achieves a signal-to-noise and distortion ratio of 64.79 dB. The effective number of bits (ENOB) is 10.4 bits, resulting in a figure of merit of 55.6 fJ/conversion-step. The implemented prototype occupies an active area of 0.178 mm². |
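The figures quoted in the abstract above can be cross-checked with a short calculation. The sketch below uses the standard relation ENOB = (SNDR - 1.76) / 6.02 and assumes that the reported figure of merit is the Walden form, P / (fs * 2^ENOB); the abstract does not state which definition was used, so that choice and the function names are assumptions.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, fs_hz, enob_bits):
    """Walden figure of merit in joules per conversion step (assumed definition)."""
    return power_w / (fs_hz * 2 ** enob_bits)


# Values taken from the abstract: 64.79 dB SNDR, 39.5 uW, 500 kS/s.
enob = enob_from_sndr(64.79)
fom = walden_fom(39.5e-6, 500e3, enob)

print(f"ENOB = {enob:.2f} bits")
print(f"FoM  = {fom * 1e15:.1f} fJ/conversion-step")
```

Under these assumptions the computed ENOB and figure of merit land close to the 10.4 bits and 55.6 fJ/conversion-step reported in the abstract; small differences can come from rounding of the quoted inputs.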
Title | High-level synthesis code optimization with loop fusion based on LLVM/Polly |
Author | *Yuta Hiyama, Takayuki Todokoro, Kenshu Seto (Tokyo City University, Japan), Masato Tatsuoka (Socionext Inc., Japan Advanced Institute of Science and Technology, Japan), Yoshihito Nishida (Socionext Inc., Japan), Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan) |
Page | pp. 208 - 213 |
Keyword | Loop fusion, Polyhedral model, High-level synthesis, LLVM, Polly |
Abstract | Loop fusion is an effective loop optimization for high-level synthesis. Loop fusion can be performed automatically with an LLVM-based polyhedral compiler called Polly. However, Polly's loop fusion algorithm may output a loop structure unsuitable for high-level synthesis. We implemented an algorithm that uses Polly to output a loop structure suitable for high-level synthesis. The proposed method reduced the average number of execution cycles for high-level synthesis by 33.4% compared to that before loop fusion. |
Title | Ultra Low Current Measurement with On-chip High Resistance of MOSFET Array |
Author | *Xinghuai Zhang, Daishi Isogai, Takaaki Shirakawa, Shigetoshi Nakatake (The University of Kitakyushu, Japan) |
Page | pp. 214 - 217 |
Keyword | On-chip High Resistance, Ultra Low Current, Sensor |
Abstract | We propose an on-chip high resistance using a MOSFET array. We adopt the potentiostat method as an electrochemical sensing technique to measure ultra low current, with biosensing and implant sensing in mind. The sensor circuit includes a high resistance array which is configured by connecting unit resistors in series and parallel. We verify the DC characteristics, the area, and the temperature characteristics of the resistor array by SPICE simulation, and then demonstrate promising results compared with the conventional poly resistance. |
PDF file |
Title | A Note on Optimization Algorithms for FF/Latch-Based High-Level Synthesis |
Author | *Keisuke Inoue (International College of Technology, Kanazawa, Japan) |
Page | pp. 218 - 222 |
Keyword | high-level synthesis, latch |
Abstract | This paper presents a new design framework for register-transfer-level data-paths. The conventional D-flip-flop-based register (D-REG) is very practical, since the designers can concentrate only on the timing constraints between registers. However, with the development of deep sub-micron technology and the increase in the data length, the D-REG hardware cost is becoming relatively larger than the other hardware resources. Thus, latch-based design methods have been proposed as alternatives to D-REG-based design methods, since the latch-based register has smaller hardware cost than D-REG. A disadvantage of the conventional latch-based architecture is the increase in the hardware resources. As a result, the total register cost cannot be fully reduced. We propose a new design framework, a kind of level-triggered latch design, in which a D-REG is replaced by a pair of latch-based registers: a master latch-based register (M-REG) and a slave latch-based register (S-REG). |
PDF file |
Title | FPGA Implementation for WDF-Based Analog Emulator with Complicated Topology |
Author | Hsin-Ju Hsu (National Chiao Tung University, Taiwan), Ji-Xuan Tsai, Meng-Lin Li (National Central University, Taiwan), *Chien-Nan Liu (National Chiao Tung University, Taiwan), Jing-Yang Jou (National Central University, Taiwan) |
Page | pp. 223 - 226 |
Keyword | WDF, analog emulation, FPGA, system verification |
Abstract | System verification is still a big challenge for system-on-chip (SoC) designs with AMS circuits. The wave digital filter (WDF)-based approach is a possible solution for emulating analog circuits with digital circuits on existing FPGAs. In order to solve the loop problem in WDF structures, a special J-type adaptor was proposed. However, the automatic transformation flow and the corresponding FPGA implementation flow with this new J-type adaptor are not discussed in previous papers. Therefore, this paper focuses on the hardware implementation issues for WDF-based analog emulators with the J-type adaptor. The FPGA results on several circuits with nonlinear elements demonstrate the effectiveness and feasibility of the proposed solution for supporting various circuit types on an FPGA-based platform. |
Title | Binary Synthesis from RISC-V Executables |
Author | *Shoki Hamana, Nagisa Ishiura (Kwansei Gakuin University, Japan) |
Page | pp. 227 - 228 |
Keyword | high-level synthesis, binary synthesis, RISC-V |
Abstract | This paper presents an implementation of a binary synthesizer which converts a given executable binary code of RISC-V into hardware functionally equivalent to a RISC-V core executing the code. A CPU core and an instruction memory are replaced by the synthesized hardware, which reduces execution time and hardware size for small-scale programs. A given binary code is disassembled and parsed to build a control dataflow graph (CDFG), and then traditional high-level synthesis techniques are applied to generate RT-level Verilog HDL. For small example programs consisting of 34 to 160 instructions, the synthesized hardware on a Xilinx Artix-7 FPGA took about 74.5% fewer cycles than the RISC-V Rocket core, with a smaller number of LUTs. |
PDF file |
Title | Detection of Vulnerability Guard Elimination by Compiler Optimization Based on Binary Code Comparison |
Author | *Yuka Azuma, Nagisa Ishiura (Kwansei Gakuin University, Japan) |
Page | pp. 229 - 230 |
Keyword | software security, compiler optimization, undefined behavior, binary comparison, buffer overflow |
Abstract | It is known that guards against vulnerabilities in C programs might be eliminated by compiler optimization if they are not written properly. This paper proposes a method to detect such flaws in software by binary code comparison. Given a source code, a pair of binary codes is generated, one with standard optimization and the other with problematic optimization suppressed. Since simple comparison of the binary codes ends up with an unacceptable number of false positives, call instructions in each function are collated to detect discrepancies. In a preliminary experiment on 7 programs, our method successfully detected 2 instances of guard losses with only one false positive. |
PDF file |
Title | A Stable Equivalent Circuit Identification Algorithm for Li ion Batteries |
Author | *Lei Lin, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 231 - 236 |
Keyword | SOC, Estimation, Parameter, RLS, EKF |
Abstract | This paper discusses a synchronous estimation method for the equivalent circuit parameters and state of Li-ion batteries. In the conventional method, accuracy and stability are hard to improve. To solve this problem, we propose synchronous estimation of the equivalent circuit parameters and state with feedback. In this paper, we demonstrate the effectiveness of this solution through experiments. |
Title | An Intravesical Urine Volume Sensor Robust to Body Posture and Movement |
Author | *Ryousuke Sakai, Shigetoshi Nakatake (The University of Kitakyushu, Japan) |
Page | pp. 237 - 238 |
Keyword | Biomedical sensor, AC impedance method, Intravesical urine volume, IoT device |
Abstract | In this work, in order to prevent urinary incontinence, we aim to estimate the urination condition from the amount of body water in the vicinity of the bladder. Our sensor is robust to body posture and movement because it applies the AC impedance method to the bladder. We implement an impedance-based prototype system and conduct experiments to estimate intravesical urine volume. As a result, we confirmed that the impedance value decreased over time after drinking water. In addition, we compare the measurement results with those of a commercial ultrasonic monitoring system and verify the robustness of our proposed system to body posture and movement. |
Title | Test Pattern Generation for Timing Faults in Rapid Single-Flux-Quantum Circuits |
Author | *Kazuyoshi Takagi (Mie University, Japan), Mikihiro Ono (Kyoto University, Japan), Nobutaka Kito (Chukyo University, Japan), Naofumi Takagi (Kyoto University, Japan) |
Page | pp. 239 - 243 |
Keyword | Superconducting RSFQ circuits, test pattern generation, timing faults, fault detection, fault diagnosis |
Abstract | A new fault model and test pattern generation methods considering characteristics of superconducting Rapid Single-Flux-Quantum (RSFQ) logic circuits are presented. We define a timing fault model for RSFQ circuits by focusing on the order of pulse arrivals at each clocked logic gate. Subject to the fault model, we propose test pattern generation methods for fault detection and fault diagnosis of RSFQ circuits. |
Title | Incremental Approaches for Locating Design Errors: Averaging EPI-Groups and Generating Additional Input Patterns |
Author | *Shogo Ohmura, Hiroshi Nakano, Nobutaka Kuroki (Kobe University, Japan), Tetsuya Hirose (Osaka University, Japan), Masahiro Numa (Kobe University, Japan) |
Page | pp. 244 - 249 |
Keyword | error diagnosis, ECO, PLEM, EPI |
Abstract | This paper presents two kinds of incremental approaches for locating design errors: averaging EPI-groups and generating additional input patterns to reduce the EPI values used for extraction of error location sets, in order to shorten the processing time. The experimental results show that the proposed techniques are effective in reducing the number of initial error location sets by 96.8% or more and in shortening the processing time by 86.6% or more. |
Title | (Invited Talk) IoT for Enabling Precision Medicine |
Author | *Pai H. Chou (National Tsing Hua University, Taiwan) |
Page | p. 250 |
Keyword | IoT |
Abstract | IoT technologies have the potential of revolutionizing medicine by enabling precision diagnostics and treatment. Medical misdiagnoses are frequently caused by over-reliance on patients' biased recall and by measurement limited to the clinical settings. Doctors also have little control over follow-up treatment prescribed for outside the clinic. These limitations can be overcome by a combination of wearable medical and non-medical IoT devices that produce objective, unbiased data from or around the patient. This talk presents a number of case studies on the design of such IoT devices to enable precision medicine, including cardiovascular and pulmonary applications. |
Title | A Case Study on Design of Approximate Multipliers for MNIST CNN |
Author | *Kenta Shirane, Takahiro Yamamoto, Hiroyuki Tomiyama (Ritsumeikan University, Japan) |
Page | pp. 251 - 255 |
Keyword | Approximate Computing, Approximate Multiplier, CNN, MNIST |
Abstract | In this paper, we present a case study on approximate multipliers for MNIST CNN. We apply approximate multipliers with different bit-widths to the convolution layer in MNIST CNN, evaluate the accuracy of MNIST recognition, and analyze the trade-off among the approximate multiplier’s area, critical path delay, and accuracy. We further reduce the area and delay of the multipliers while keeping high accuracy in MNIST CNN. |
Title | A Layout Design Method of QCA without Fixing Data Flow |
Author | *Kazuki Morita, Wakaki Hattori, Shigeru Yamashita (Graduate School of Science and Engineering, Ritsumeikan University, Japan) |
Page | pp. 256 - 261 |
Keyword | Quantum-dot Cellular Automata, clocking scheme, Field-Coupled Nanotechnology |
Abstract | Quantum-dot Cellular Automata (QCA) is a promising nanotechnology with ultra-low power consumption and high clock rates. Thus, QCA overcomes the physical limitations of conventional technologies like CMOS and is an alternative technology for maintaining Moore's law. Pre-planned zone clocking schemes have been proposed to facilitate the design of QCA circuits. In a QCA circuit designed with a pre-planned zone clocking scheme, data flows are predetermined, which leads to an increase in circuit area. To solve this problem, this paper proposes a new approach to find an efficient data flow for a circuit. Experimental results show the usefulness of the proposed method. |
Title | An Error Diagnosis Technique Using ZDD to Extract Error Location Sets |
Author | *Hiroshi Nakano, Shogo Ohmura, Nobutaka Kuroki (Kobe University, Japan), Tetsuya Hirose (Osaka University, Japan), Masahiro Numa (Kobe University, Japan) |
Page | pp. 262 - 267 |
Keyword | error diagnosis, ZDD, ECO |
Abstract | This paper presents an error diagnosis technique using ZDD (zero-suppressed binary decision diagram) to extract error location sets. A ZDD represents error location sets implicitly, which reduces the processing time needed to extract them. Experimental results show that the proposed technique reduces the processing time by 92.4% on average, and that the proposed variable ordering technique is effective in reducing ZDD node counts by 86.5% for large circuits. |
Title | Performance Improvements for Block-Flushing |
Author | *Bao Yifang (Graduate School of Science and Engineering, Ritsumeikan University, Japan), Bing Li (Technical University of Munich, Germany), Tsung-Yi Ho (National Tsing Hua University, Taiwan), Shigeru Yamashita (Graduate School of Science and Engineering, Ritsumeikan University, Japan) |
Page | pp. 268 - 269 |
Keyword | Block-Flushing, Path Changing, PMD |
Abstract | During the execution of multiple bioassays, some areas on Programmable Microfluidic Devices (PMDs) become contaminated and must be cleaned by washing them with a buffer flow before they are reused. An efficient washing method called “Block-Flushing” has been proposed. We show that Block-Flushing can make the cleaning work complicated in specific cases, and we then propose an improvement of Block-Flushing that alleviates the situation by adjusting flushing paths carefully. |
Title | A Proposal of Application Specific Approach with RISC-V Processor on FPGA |
Author | *Tetsuo Miyauchi, Kiyofumi Tanaka (Japan Advanced Institute of Science and Technology, Japan) |
Page | pp. 270 - 273 |
Keyword | RISC-V, FPGA, Processor, Adapting |
Abstract | Currently, the number of IoT (Internet of Things) devices is increasing. In IoT devices, a small footprint is desirable. RISC-V is an open processor architecture which is becoming popular for IoT devices. We implemented a RISC-V soft processor core with the RV32IM instruction set (base integer instructions and multiplication/division with 32-bit registers) on an FPGA with a 5-stage pipeline. In this paper, we propose a method for reducing hardware resources by adapting the processor core to an application program. We show that our approach can reduce the necessary FPGA resources to 14.8% (Rijndael) and 14.4% (Matrix) of the full processor core implementation. |
PDF file |
Title | A Study on the Optimization of Asynchronous Circuits During RTL Conversion from Synchronous Circuits |
Author | *Shogo Semba, Hiroshi Saito (The University of Aizu, Japan) |
Page | pp. 274 - 279 |
Keyword | asynchronous circuits, RTL design, optimization |
Abstract | In this paper, we propose three optimization methods for asynchronous circuits during the Register Transfer Level (RTL) conversion from synchronous RTL models. The modularization of datapath resources and the restriction of the use of D flip-flops reduce the circuit area, while fixing the control signals of the multiplexers reduces the dynamic power consumption. In the experiment, we evaluated the effect of the three optimization methods. Combined, the three methods reduced the energy consumption by 24.6% for a differential equation solver and by 12.6% for a tiny encryption algorithm, compared to the same circuits without the proposed optimization methods. |
PDF file |
Title | Effect of Reducing the Bit Length of LFSRs for SC |
Author | *Yudai Sakamoto, Shigeru Yamashita (Graduate School of Science and Engineering, Ritsumeikan University, Japan) |
Page | pp. 280 - 285 |
Keyword | Stochastic Computing, Stochastic Number, linear-feedback shift register (LFSR) |
Abstract | Stochastic Computing (SC) is an approximation method that calculates functions by using Stochastic Numbers (SNs), which are generally generated by a linear-feedback shift register (LFSR) and a comparator. In this paper, we propose a method to reduce the bit length of LFSRs, and then we verify the errors of the proposed method. We provide experimental results that confirm the usefulness of the proposed scheme. |
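The LFSR-plus-comparator generation of stochastic numbers mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the 8-bit width, the tap positions (8, 6, 5, 4), the seeds, and all function names are assumptions chosen for the example, and it does not implement the paper's bit-length reduction.

```python
def lfsr8_states(seed=1):
    """Yield one full period (255 states) of an 8-bit Fibonacci LFSR.

    Assumed taps: 8, 6, 5, 4 (a standard maximal-length choice).
    """
    state = seed & 0xFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    for _ in range(255):
        yield state
        # XOR of the tapped bits becomes the new most significant bit.
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        state = (state >> 1) | (bit << 7)


def stochastic_number(value, seed=1):
    """Comparator stage: emit 1 whenever the LFSR state is <= value.

    For value in 0..255, the stream encodes value / 255 as the
    fraction of ones over one full LFSR period.
    """
    return [1 if s <= value else 0 for s in lfsr8_states(seed)]


def sc_multiply(a, b, seed_a=1, seed_b=77):
    """Approximate (a/255) * (b/255) by AND-ing two bit streams.

    Both streams come from the same LFSR with different seeds, so they
    are shifted copies of each other; the result is only approximate.
    """
    sa = stochastic_number(a, seed_a)
    sb = stochastic_number(b, seed_b)
    return sum(x & y for x, y in zip(sa, sb)) / 255


if __name__ == "__main__":
    print(sum(stochastic_number(128)) / 255)  # encoded value of 128
    print(sc_multiply(128, 64))               # rough estimate of the product
```

A shorter LFSR shortens the stream and coarsens the set of representable values, which is why the error introduced by a reduced bit length needs to be checked experimentally.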
Title | Design of Asynchronous Circuits on Commercial FPGAs Using Placement Constraints |
Author | *Tatsuki Otake, Hiroshi Saito (The University of Aizu, Japan) |
Page | pp. 286 - 291 |
Keyword | asynchronous circuits, FPGA, placement constraints |
Abstract | In this paper, we propose a design method to implement asynchronous circuits with bundled-data implementation on commercial Field Programmable Gate Arrays (FPGAs) using placement constraints. Using the proposed method, we can obtain asynchronous circuits whose performance is close to that of their synchronous counterparts and whose energy consumption is smaller (21.3% reduction on average), with fewer delay adjustments. |
PDF file |
Title | Parallelizing SAT-based Coverage-Driven Design Verification |
Author | *Kiyoharu Hamaguchi (Shimane University, Japan) |
Page | pp. 292 - 295 |
Keyword | design verification, SAT solver, coverage-driven verification, automated testbench |
Abstract | We show results on parallelization of automated coverage-driven verification. In our prior work, we have shown an approach which combines random simulation with input pattern generation using a SAT solver. Experimental results show that the parallelization is promising for achieving higher coverage. |
Title | Quantitative Performance Comparison of Asynchronous and Synchronous Comparators |
Author | *Kyota Akimoto, Toshiki Kanamoto, Atsushi Kurokawa, Masashi Imai (Hirosaki University, Japan) |
Page | pp. 296 - 297 |
Keyword | asynchronous circuit, comparator, bundled-data, average performance, hardware merge sorter |
Abstract | Asynchronous circuits, which can achieve average-case performance thanks to request-and-acknowledge handshaking protocols, have great potential to improve speed compared to synchronous circuits. In this paper, we propose a performance-efficient circuit structure of a comparator for hardware merge sorters. We evaluate the proposed circuit and its synchronous counterpart using a 130 nm process technology. As a result, the proposed asynchronous comparator can achieve higher performance than the synchronous circuit, depending on the input data. |
Title | Wire Load Model for Rapid Power Consumption Evaluation in Early Design Stage of Via-Switch FPGA |
Author | *Asuka Natsuhara, Takashi Imagawa, Hiroyuki Ochi (Ritsumeikan University, Japan) |
Page | pp. 298 - 303 |
Keyword | atom switch, reconfigurable architecture, power estimation |
Abstract | This paper proposes a wire load model for via-switch FPGAs to allow simulation-based power estimation before routing. The via-switch FPGA is expected to achieve a dramatic improvement in area, delay, and power compared with conventional SRAM-based FPGAs. To estimate the power consumption of an application circuit mapped on a via-switch FPGA, a time-consuming routing process has been needed before circuit simulation. Using the proposed post-placement simulation flow, the runtime for power estimation is reduced by 63.8% on average compared with the conventional post-routing simulation flow, with 11.8% degradation of estimation error on average. |
PDF file |
Title | Clock Tree Modification for Circuits with Programmable Delay Elements |
Author | *Kota Muroi, Yukihide Kohira (The University of Aizu, Japan) |
Page | pp. 304 - 309 |
Keyword | post-silicon delay tuning, programmable delay element, clock tree modification |
Abstract | In this paper, a clock tree modification method for circuits with programmable delay elements (PDEs) is proposed. Since existing methods design the clock tree without taking post-silicon delay tuning (PSDT) into consideration, the resulting clock tree may not be suitable for PSDT. Our proposed method modifies the clock tree to improve yield in PSDT. Moreover, we propose a design flow for circuits with PDEs that shortens the design time and can be applied to large circuits. |
Title | A Study on Updating Spins in Ising Model to Solve Combinatorial Optimization Problems |
Author | *Yuki Naito, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan) |
Page | pp. 310 - 315 |
Keyword | Ising model, Ising computer, combinatorial optimization problem, traveling salesman problem |
Abstract | The Ising model, which consists of spins and the interactions among them, is a novel way to solve combinatorial optimization problems such as the LSI layout problem. A problem is mapped to the model and then solved by updating the spins stochastically. Spins can be updated simultaneously on hardware. However, problems are not solved quickly, since two spins that interact with each other should not be updated simultaneously. In this paper, we give a guideline for updating the spins simultaneously to achieve high-speed search, and we confirm it through experiments. |
PDF file |
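The constraint described in this abstract (two interacting spins should not be updated at the same time) can be illustrated with a small sketch. The approach below, which groups spins by greedy graph coloring and applies Metropolis updates one group at a time, is an assumption made for illustration and is not the guideline proposed in the paper; the function names, the energy convention E = -Σ J_ij s_i s_j, and the example couplings are likewise hypothetical.

```python
import math
import random


def independent_groups(n, couplings):
    """Greedy coloring: spins placed in the same group share no interaction."""
    neighbors = {i: set() for i in range(n)}
    for i, j in couplings:
        neighbors[i].add(j)
        neighbors[j].add(i)
    color = {}
    for i in range(n):
        used = {color[j] for j in neighbors[i] if j in color}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    groups = {}
    for i, c in color.items():
        groups.setdefault(c, []).append(i)
    return [groups[c] for c in sorted(groups)]


def energy(spins, couplings):
    """Ising energy E = -sum(J_ij * s_i * s_j) over all coupled pairs."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())


def sweep(spins, couplings, temperature, rng):
    """One sweep: each group is decided from the same state, then flipped together."""
    for group in independent_groups(len(spins), couplings):
        to_flip = []
        for i in group:
            field = sum(J * spins[b if a == i else a]
                        for (a, b), J in couplings.items() if i in (a, b))
            delta = 2 * spins[i] * field  # energy change if spin i flips
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                to_flip.append(i)
        for i in to_flip:  # safe to apply together: no two spins here interact
            spins[i] = -spins[i]


if __name__ == "__main__":
    ring = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}
    spins = [1, -1, -1, 1]
    sweep(spins, ring, temperature=0.5, rng=random.Random(0))
    print(spins, energy(spins, ring))
```

Because spins within one group do not interact, their individual energy changes add up exactly, so flipping them together gives the same result as flipping them one after another.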
Title | A Fast Hotspot Detector Based on Local Features Using Concentric Circle Area Sampling |
Author | *Hidekazu Takahashi, Shimpei Sato, Atsushi Takahashi (Department of Information and Communications Engineering, Tokyo Institute of Technology, Japan) |
Page | pp. 316 - 321 |
Keyword | Design for Manufacturability, Lithography Hotspot Detection, Machine Learning |
Abstract | As technology nodes advance, defective circuit patterns have come to occur on chips. Areas that may cause defects such as opens or shorts are called hotspots. In this paper, we propose a hotspot detector based on the probability distribution of feature vectors. Experimental results on the ICCAD2012 benchmark suite show that our proposed method achieves 98% accuracy while the false positive rate is less than 1%, and that its computation is 8 times faster than conventional machine-learning-based methods. |
PDF file |
Title | ROAD: A Novel Approach for Improving Reliability of Multi-core Systems— How Asymmetric Aging Can Lead a Way |
Author | Yu-Guang Chen (National Central University, Taiwan), Jian-Ting Ke (National Cheng Kung University, Taiwan), *Shu-Ting Cheng (Yuan Ze University, Taiwan), Ing-Chao Lin (National Cheng Kung University, Taiwan) |
Page | pp. 322 - 323 |
Keyword | Asymmetric Aging, NBTI, multi-core system |
Abstract | Negative-Bias Temperature Instability (NBTI) has become one of the most serious reliability threats in modern IC designs. To tolerate NBTI on multi-core systems, previous researchers have proposed various task assignment and/or dynamic voltage frequency scaling algorithms. Most of the proposed methods maintain all cores in the multi-core system under similar aging conditions (symmetric aging). Although these methods can mitigate NBTI, symmetric aging may reduce the lifetime of a multi-core system. If a critical task (i.e., a task with a tight timing constraint) arrives when the system has already operated for years, it is possible that none of the equivalently aged cores can complete the critical task within its timing constraints. This unavoidable timing failure then shortens the lifetime of the system. Based on this observation, this paper proposes a novel reliability improvement framework that realizes the concept of asymmetric aging through task graph Retiming, task Ordering, task Assignment under asymmetric aging, and Dynamic voltage selection (ROAD) for multi-core systems. Experimental results show that our approach can significantly increase the system lifetime with no or insignificant energy overhead. |
Title | A Tuning-Free Reservoir of MOSFET Crossbar Array for Inexpensive Hardware Realization of Echo State Network |
Author | *Yuki Kume, Masayuki Hiromoto, Takashi Sato (Graduate School of Informatics, Kyoto University, Japan) |
Page | pp. 324 - 329 |
Keyword | recurrent neural network, echo state network, reservoir computing, hardware implementation, weight tuning |
Abstract | The echo state network (ESN) is a class of recurrent neural network that drastically reduces training time by using a reservoir, a random and fixed network, as the input and middle layers. In this paper, we propose a hardware implementation of an ESN that uses an inexpensive MOSFET-based reservoir. As opposed to existing reservoirs that require post-tuning of weights for stability improvement, our ESN requires no post parameter tuning. For that purpose, we extend the circular law of random matrices to sparse reservoirs so that a fixed feedback gain can be determined. In evaluations using the Mackey-Glass time-series dataset, the proposed ESN achieved stable and successful inference without post parameter tuning. |
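As background for the circular-law argument in this abstract, the textbook form of the law for a sparse random matrix can be written as below. This is a sketch under stated assumptions and is not the paper's extension: the symbols N (reservoir size), p (probability that a weight is non-zero), σ² (variance of the non-zero weights, assumed zero-mean and i.i.d.), and g (feedback gain) are introduced here for illustration only.

```latex
% Assumptions (not from the abstract): W is N x N, each entry is non-zero with
% probability p, and non-zero entries are i.i.d. with zero mean and variance \sigma^2.
\operatorname{Var}(W_{ij}) = p\,\sigma^{2}
\quad\Longrightarrow\quad
\rho(W) \approx \sqrt{N\,\operatorname{Var}(W_{ij})} = \sigma\sqrt{pN}
\qquad (N \to \infty,\ pN \gg 1)

% A fixed gain g keeps the scaled spectrum inside the unit disk when
\rho(gW) \approx g\,\sigma\sqrt{pN} < 1
\quad\Longleftrightarrow\quad
g < \frac{1}{\sigma\sqrt{pN}}
```

Keeping the spectral radius of the reservoir below one is the usual rule of thumb for a stable ESN, which is why an estimate of ρ(W) from the weight statistics alone would allow the gain to be fixed in advance rather than tuned afterwards.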
Title | Estimation of NBTI-Induced Timing Degradation Considering Duty Ratio |
Author | *Kunihiro Oshima, Song Bian, Takashi Sato (Graduate School of Informatics, Kyoto University, Japan) |
Page | pp. 330 - 335 |
Keyword | Negative bias temperature instability, timing degradation sensor, critical path, duty ratio |
Abstract | We propose a novel estimation method for NBTI-induced timing degradation that takes the duty ratios of the input signals into account. In the proposed method, the signal propagation delay is evaluated with the proposed replica sensor circuit. By evaluating the threshold voltage degradation model, the delays of critical path candidates are estimated. The simulation results show that the proposed method can reduce the estimation error of critical path delay by 63% compared to delay estimation without duty consideration. |
Title | Polygon Fracture Method Considering Maximum Shot Size for Variable Shaped-Beam Mask Writing |
Author | Mitsuru Hasegawa, *Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan) |
Page | pp. 336 - 340 |
Keyword | polygon fracturing, EB writing, mask data, dynamic programming |
Abstract | Since variable shaped-beam mask writing machines for LSI mask production expose a rectangular shaped beam, rectilinear polygons in the layout need to be partitioned into a minimum number of rectangles while considering the size limit. In this paper, we propose a new fracturing method for convex rectilinear polygons using dynamic programming, which first cuts each polygon by slice lines through concave vertices. The proposed method can solve the problem in polynomial time. Computer experiments confirm the space and time complexity of the method. |
Title | (Invited Talk) Design and Demonstration of Superconducting Single Flux Quantum Circuits Operating around 50 GHz |
Author | *Akira Fujimaki (Nagoya University, Japan) |
Page | pp. 341 - 342 |
Keyword | Superconductor |
Abstract | We have been developing very high-speed digital processors, including microprocessors, based on the superconducting single flux quantum (SFQ) circuit. So far, we have successfully executed programs stored in an embedded memory at 50 GHz in a bit-serial microprocessor and demonstrated an 8-bit-parallel arithmetic logic unit at 50 GHz. These SFQ circuits show extremely high energy efficiency and high performance compared to semiconductor circuits, even if the cooling penalty for superconducting circuits is considered. The SFQ circuit is classified as pulse logic, in which binary signals ‘1’ and ‘0’ are defined as the presence and absence, respectively, of a signal impulse between two consecutive clock signals. Pulse logic is free from the charging and discharging of capacitors or inductors, which leads to high-speed operation and low power consumption. The impulses in SFQ circuits, referred to as SFQ pulses, have a typical pulse width of 4 ps and a sub-mV pulse height. The pulses corresponding to signals and clocks can propagate along superconducting passive transmission lines (PTLs) with strip line/microstrip line structures at the speed of light with very small distortion, although transmitters and receivers made up of a few Josephson junctions are needed. Special care is required when designing SFQ circuits because all the logic gates need clock signals. The setup and hold times for logic gates are 4 ps at most, and the accumulated time jitter of signals traveling in very long transmission lines reaches 1 ps. This means that the effective time window of signals relative to the two consecutive clock signals becomes 10 ps for 50-GHz operation. Considering the parallel data lines, the timing of the signals, including clocks, arriving at the logic gates must be controlled on the order of picoseconds. 
We have been building a top-down design method for SFQ large-scale integrated circuits based on a cell library, in which bias-voltage-dependent timing parameters such as delay, setup time, and hold time are registered for all the logic gates and interconnects. |
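The timing figures quoted in this abstract can be related by a short calculation. The clock period follows directly from the 50 GHz operating frequency; the budget in the second line is only one accounting that reproduces the stated 10 ps window, assuming the worst-case setup time, hold time, and a 1 ps jitter margin on each side are all subtracted from the period. The abstract does not spell out its own accounting, so that breakdown is an assumption.

```latex
% Clock period at 50 GHz
T_{\mathrm{clk}} = \frac{1}{f_{\mathrm{clk}}} = \frac{1}{50 \times 10^{9}\,\mathrm{Hz}} = 20\,\mathrm{ps}

% Assumed budget (hypothetical breakdown, not stated in the abstract):
% worst-case setup and hold times of 4 ps each, plus 1 ps of jitter on each side
T_{\mathrm{window}} \approx T_{\mathrm{clk}} - t_{\mathrm{setup}} - t_{\mathrm{hold}} - 2\,t_{\mathrm{jitter}}
= 20 - 4 - 4 - 2 \times 1 = 10\,\mathrm{ps}
```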