Monday, October 21, 2013 |
Title | Scaling the Many-Memory Wall for Many-Core Architectures |
Author | *Nikil Dutt (University of California, Irvine, U.S.A.) |
Page | p. 1 |
Abstract | The move towards many-core architectures creates an inherent demand for high memory bandwidth, which in turn results in the need for vast amounts of on-chip memory space. On the other hand, many-core architectures have many (distributed) on-chip memories with limited capacities, resulting in a “many-memory wall”. While efforts such as 3D stacking and smarter memory controllers try to alleviate the off-chip memory access problem, there is still a pressing need to carefully provision the limited on-chip memory budget to meet application needs. For on-chip memories, embedded systems often use both software controlled memories (e.g., scratchpad memories) and hardware-controlled memories (e.g., caches), with each having their pros and cons. Efficient on-chip memory management is extremely critical as it has a great impact on the system’s power consumption and throughput. Traditional memory hierarchies primarily consist of SRAM-based on-chip caches. However, with the emergence of non-volatile memories (NVMs) and mixed-criticality systems, we expect to see heterogeneous on-chip memory hierarchies, not only in type (cache vs. scratchpad) but also in technology (e.g., SRAM vs. NVM). This talk will survey the state of the art in memory subsystems for many-core platforms, and present strategies for efficiently managing software-controlled memories in the many-core domain, while addressing emerging challenges faced by designers. I will also propose a holistic software/hardware solution to the problem of scaling the memory wall for many-core architectures. |
Title | A Novel Fast and Accurate Hot Spot Detection Method with Prüfer Code Layout Encoding |
Author | *Hong-Yan Su, Chieh-Chu Chen, Yih-Lang Li (National Chiao Tung University, Taiwan), An-Chun Tu, Chuh-Jen Wu, Chen-Ming Huang (Taiwan Semiconductor Manufacturing Company, Taiwan) |
Page | pp. 2 - 7 |
Keyword | Design for manufacturability, process hotspot, pattern matching, centerline, Prüfer Encoding |
Abstract | As design-for-manufacturability techniques have become widely used to improve the yield of nano-scale semiconductor technology in recent years, hot-spot detection methods have been investigated with a view to calibrating layout patterns that tend to reduce yield. In this work, we propose two graph models, i.e., skeleton graph and space graph, to formulate polygon topology and the spatial relationship among polygons. In addition, a Prüfer Encoding based method is presented to encode each skeleton graph. The single polygon matching problem is then equivalent to the verification of graph isomorphism, which is realized by checking the identity of two corresponding enhanced Prüfer codes. A branch-and-bound based pattern anchoring algorithm is presented to resolve the vertex ordering problem for isomorphism checking. Finally, the general exact pattern matching problem can be accomplished by adopting the space graph to identify the similarity of the spatial relationship among polygons. Experimental results show that we achieve a 5.6x runtime speedup over the design-rule-based methodology on average. |
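The idea of comparing trees by their Prüfer codes can be illustrated with a minimal Python sketch. It computes the classic Prüfer sequence of a labeled tree, not the "enhanced" code of the paper, whose details the abstract does not give. The function name `prufer_code` and the vertex numbering 0..n-1 are assumptions made for illustration only.

```python
import heapq

def prufer_code(n, edges):
    """Classic Prüfer sequence of a labeled tree with vertices 0..n-1 (n >= 2).

    Hypothetical helper for illustration; the paper's enhanced code
    carries extra layout information that is not modeled here.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Min-heap of current leaves, so the smallest-labeled leaf is removed first.
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    heapq.heapify(leaves)
    code = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        neighbor = next(iter(adj[leaf]))  # a leaf has exactly one neighbor
        code.append(neighbor)
        adj[neighbor].discard(leaf)
        adj[leaf].clear()
        if len(adj[neighbor]) == 1:       # the neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return code
```

Two labeled trees on the same vertex set are identical exactly when their Prüfer sequences are equal, which is why a fixed vertex ordering (the anchoring step mentioned in the abstract) matters before codes are compared.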
Title | Implementation of Protocol Independent Control-Intensive Design in High-Level Synthesis |
Author | Tung-Hua Yeh, Jen-Chieh Yeh (Industrial Technology Research Institute, Taiwan), *Qiang Zhu (Cadence Design Systems, Japan) |
Page | pp. 8 - 12 |
Keyword | design experiences, high-level synthesis |
Abstract | High-level synthesis (HLS) has previously been applied to a variety of datapath-dominated and algorithmic designs, achieving a quality of result (QoR) comparable to hand-edited RTL designs. However, applying HLS to control-intensive designs has always been challenging. In this paper, we present an efficient strategy to abstract control-intensive designs so that HLS technologies can be applied to them efficiently. A design using this strategy not only achieves good QoR, but also improves design reusability and productivity. We demonstrate two control-intensive designs, a Direct Memory Access (DMA) controller and a NAND flash controller, which resulted in a 3X design productivity improvement compared to the traditional RTL design methodology, while maintaining design quality comparable to hand-edited RTL designs. |
Title | Evaluation of On-Chip Decoupling Capacitor's Effect on AES Cryptographic Circuit |
Author | *Tsunato Nakai, Mitsuru Shiozaki, Takaya Kubota, Takeshi Fujino (Ritsumeikan University, Japan) |
Page | pp. 13 - 18 |
Keyword | Side-channel attack, Electromagnetic analysis, ASIC semi-custom design, Cryptographic circuit, On-chip capacitor |
Abstract | Power Analysis (PA) attacks and Electromagnetic Analysis (EMA) attacks reveal a secret key on cryptographic circuits by measuring power variation and electromagnetic radiation during cryptographic operations, respectively. Inserting decoupling capacitors reduces PA leakage; however, their effect on resistance against EMA attacks is not well known. We fabricated Advanced Encryption Standard (AES) cryptographic chips with and without on-chip decoupling capacitors, and evaluated their resistance against PA and EMA attacks. This paper shows that on-chip decoupling capacitors make the circuit vulnerable to EMA attacks using the Hamming-weight model. |
Title | A Real-Time Peak Load Shaving with Error Compensation of Residential Load/PV Power Generation Forecasting |
Author | *Hide Nishihara, Ittetsu Taniguchi, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 19 - 24 |
Keyword | Peak-Shaving, Smart Grid |
Abstract | This paper proposes a real-time peak load shaving method with error compensation of residential load/PV power generation forecasting. Various load/generation forecasting techniques have been proposed, but it is impossible to avoid forecasting error completely. This paper assumes a house with a photovoltaic (PV) panel and energy storage, and proposes a power distribution method at the household level to minimize the peak value of electric power demand. Experimental results show that the proposed method drastically reduces the sum of purchased energy and wasted energy at the same peak-shaving ratio, and that the load/generation forecasting error is effectively compensated. |
Title | A Design of CMOS On-Chip Photovoltaic Device and Regulated DC-DC Converter for Micro System |
Author | *Haruki Ono, Kazuki Nomura, Nobuhiko Nakano (Keio University, Japan) |
Page | pp. 25 - 27 |
Keyword | stand-alone micro system, photovoltaic device, bootstrap charge pump |
Abstract | In this paper, we propose an electric power system for a stand-alone micro system. The micro system consists of a photovoltaic device, voltage boost, ring oscillator, and regulator on a single silicon chip. We designed and measured several types of photovoltaic devices. The maximum output voltage of the photovoltaic device is 550 mV. A bootstrap charge pump circuit and regulator are designed for this power supply. The power supply outputs more than 1 V, which is sufficient for a standard CMOS circuit. |
Title | An Error Diagnosis Technique Using QBF Solver to Fix LUT Functions |
Author | *Naoki Katayama, Hiroyuki Sakamoto, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 28 - 33 |
Keyword | error diagnosis, ECO, QBF |
Abstract | This paper presents an error diagnosis technique using a QBF (Quantified Boolean Formula) solver to fix LUT functions. Whereas the conventional SAT-based error diagnosis technique checks equivalence between the given specification and the rectified circuit for every assignment to truth variables with each LUT function, the proposed QBF-based technique obtains all assignments to truth variables that satisfy equivalence at once. Experimental results have shown that the proposed technique rectifies circuits that could not be corrected by the conventional SAT-based technique. |
Title | Energy-Efficient Dynamic Voltage and Frequency Scaling by P/N-Performance Self-Adjustment Using Adaptive Body Bias |
Author | *A.K.M. Mahfuzul Islam, Norihiro Kamae, Tohru Ishihara, Hidetoshi Onodera (Kyoto University, Japan) |
Page | pp. 34 - 39 |
Keyword | DVFS, Energy Efficiency, Process Variation, Adaptive Body Bias |
Abstract | Dynamic voltage and frequency scaling (DVFS) is a promising technique to improve energy efficiency for heterogeneous systems where the workload varies with time. This paper addresses the effects of process variation on the energy efficiency of wide voltage range DVFS and proposes the use of a P/N-performance self-adjustment scheme to enable typical-case design. Simulation results show that energy efficiency can be improved by more than 100% with the proposed technique compared to the conventional worst-case design methodology for a 65-nm commercial process. |
Title | A Nested Loop Pipelining in C Descriptions for System LSI Design |
Author | *Masahiro Nambu, Takashi Kambe (Kinki University, Japan), Shuji Tsukiyama (Chuo University, Japan) |
Page | pp. 40 - 43 |
Keyword | nested loop, pipelining, C based design, high level synthesis, BACH system |
Abstract | Behavioral synthesis from C language is now a key technology of system LSI design. Since large streaming data are usually processed by nested loops in behavioral description of system LSI, it is important to synthesize a circuit which can process such data efficiently. Nested loop pipelining is a useful implementation technique of the description to synthesize a circuit such that both computational throughput and hardware utilization are maximized. In this paper, we propose an algorithm for nested loop pipelining, which can produce pipeline stages with different processing times. We show two practical experimental results in order to demonstrate the performance of the proposed algorithm. |
Title | General Position-Based Weighted Round-Robin Arbitration for Arbitrary Traffic Patterns |
Author | *Hanmin Park, Kiyoung Choi (Seoul National University, Republic of Korea) |
Page | pp. 44 - 49 |
Keyword | Network-on-Chip, Weighted Round-Robin, Fair Arbitration, Equality of Service |
Abstract | This paper presents position-based weighted round-robin arbitration for equality of service in many-core networks-on-chip employing a deterministic routing algorithm. We concentrate on the network saturation induced by arbitrary traffic patterns. The proposed arbitration exploits the deterministic properties of the network to achieve global fairness of service provided to each node. The weights for input arbitration can be adjusted to make the network better adapted to arbitrary traffic patterns. With this adjustment, better equality of service can be achieved with no degradation of the network saturation throughput. |
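As a rough illustration of weighted round-robin arbitration in general, the following Python sketch grants each input up to `weight` consecutive requests before moving on to the next input. The class name, the credit-based bookkeeping, and the example weights are all assumptions; the abstract does not state how the position-based weights are derived.

```python
class WeightedRoundRobinArbiter:
    """Generic weighted round-robin arbiter (illustrative sketch only)."""

    def __init__(self, weights):
        if not weights or any(w <= 0 for w in weights):
            raise ValueError("weights must be positive")
        self.weights = list(weights)
        self.current = 0                 # input currently holding priority
        self.credit = self.weights[0]    # grants left for that input

    def grant(self, requests):
        """Return the index of the granted input, or None if nobody requests.

        `requests` is a list of booleans, one per input.
        """
        n = len(self.weights)
        # n + 1 visits are enough to return to the starting input
        # with a refreshed credit if it is the only requester.
        for _ in range(n + 1):
            if requests[self.current] and self.credit > 0:
                self.credit -= 1
                return self.current
            # Move priority to the next input and refill its credit.
            self.current = (self.current + 1) % n
            self.credit = self.weights[self.current]
        return None
```

With weights `[2, 1]` and both inputs always requesting, input 0 receives two grants for every grant given to input 1, which is the kind of bandwidth ratio a weight adjustment would tune.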
Title | Memory Management for Dual-Addressing Memory Architecture |
Author | *Ting-Wei Hong, Yen-Hao Chen, Yi-Yu Liu (Yuan Ze University, Taiwan) |
Page | pp. 50 - 55 |
Keyword | Dual-addressing memory, 2D virtual memory management, Data granularity and indexing |
Abstract | Dual-addressing memory architecture is designed for two-dimensional memory access with both row-major and column-major localities. In this paper, we highlight two memory management issues in dual-addressing memory. First, to avoid external fragmentation, we propose a virtual dual-addressing memory design to enable memory management via the operating system. Second, to deal with the size mismatch between user-defined data and dual-addressing memory, we discuss data arrangement policies for different data granularities. With the proposed memory management techniques, we are able to maximize the memory utilization of dual-addressing memory. |
Title | Alpha-Gamma Data Compression Method for Artificial Vision Systems Using Visual Cortex Stimulation |
Author | *Tomoki Sugiura, Arif Ullah Khan, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan) |
Page | pp. 56 - 61 |
Keyword | data compression, artificial vision, hybrid organ |
Abstract | In this paper, a data compression method for visual cortex stimulation based artificial vision is proposed and evaluated. The proposed method uses run-length encoding to express visual cortex stimulus data in numerical form, in which the numerical data representing '1' data and '0' data are encoded into binary by alpha encoding and gamma encoding, respectively. In the experimental results, the proposed method reduced the data size by approximately 83%, while its execution cycles are practically equal to those of gamma encoding. |
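A minimal Python sketch of the general idea follows: run lengths of '1' bits are written in a unary (Elias alpha) code and run lengths of '0' bits in an Elias gamma code. The exact bit layout used in the paper is not given in the abstract, so the unary convention, the leading flag bit that records the first symbol, and all function names here are assumptions.

```python
from itertools import groupby

def alpha_encode(n):
    """Unary (alpha) code for n >= 1: n - 1 zeros followed by a one."""
    return "0" * (n - 1) + "1"

def gamma_encode(n):
    """Elias gamma code for n >= 1: (bit length - 1) zeros, then n in binary."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def compress(bits):
    """Run-length encode a '0'/'1' string; the first output bit is the first symbol."""
    if not bits:
        return ""
    out = [bits[0]]  # assumed flag bit so the decoder knows which run comes first
    for symbol, run in groupby(bits):
        length = len(list(run))
        out.append(alpha_encode(length) if symbol == "1" else gamma_encode(length))
    return "".join(out)

def decompress(code):
    """Inverse of compress()."""
    if not code:
        return ""
    symbol, pos, out = code[0], 1, []
    while pos < len(code):
        zeros = 0
        while code[pos] == "0":          # count the leading zeros of the codeword
            zeros += 1
            pos += 1
        if symbol == "1":                # alpha: the length is zeros + 1
            length = zeros + 1
            pos += 1
        else:                            # gamma: read zeros + 1 binary digits
            length = int(code[pos:pos + zeros + 1], 2)
            pos += zeros + 1
        out.append(symbol * length)
        symbol = "1" if symbol == "0" else "0"  # runs alternate
    return "".join(out)
```

Unary codes are shortest for very small numbers, while gamma codes grow only logarithmically, so this pairing would suit data with short runs of '1' and long runs of '0'. That is a plausible reading of the abstract rather than something it states.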
Title | An Efficient Test Pattern Generator -Mersenne Twister- |
Author | Hiroshi Iwata, *Sayaka Satonaka, Ken'ichi Yamaguchi (Nara National College of Technology, Japan) |
Page | pp. 62 - 67 |
Keyword | Mersenne Twister, manufacturing test, pseudo random pattern, built-in self test, fault coverage |
Abstract | Built-in self test (BIST) is an answer for highly reliable manufacturing test at a reasonable cost. In this paper, we propose using the Mersenne Twister as the test pattern generator instead of the LFSR to implement BIST in VLSIs. Experimental results show that the test patterns generated by the Mersenne Twister are efficient with respect to fault coverage and that it can be implemented at a cost comparable to the LFSR. |
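The two kinds of pseudo-random pattern source can be compared in software with a small Python sketch. It is a behavioral model only, not the hardware of the paper. The 16-bit pattern width, the seeds, and the function names are assumptions; the LFSR uses the common maximal-length taps 16, 14, 13, 11, and the Mersenne Twister comes from Python's `random` module, which is based on MT19937.

```python
import random

def lfsr16_patterns(seed, count):
    """Generate `count` 16-bit patterns from a Fibonacci LFSR.

    Feedback polynomial x^16 + x^14 + x^13 + x^11 + 1 (an assumed choice).
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("an LFSR must not be seeded with zero")
    patterns = []
    for _ in range(count):
        # XOR of the tapped bits becomes the new most significant bit.
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        patterns.append(state)
    return patterns

def mersenne_twister_patterns(seed, count, width=16):
    """Generate `count` patterns of `width` bits from Python's MT19937 generator."""
    rng = random.Random(seed)
    return [rng.getrandbits(width) for _ in range(count)]
```

Feeding both pattern lists to a fault simulator would be the next step in reproducing the kind of fault coverage comparison the abstract describes; that simulator is outside the scope of this sketch.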
Title | Power Optimization of a Micro-Controller with Silicon on Thin Buried Oxide |
Author | *Kuniaki Kitamori, Hongliang Su, Hideharu Amano (Keio University, Japan) |
Page | pp. 68 - 73 |
Keyword | SOTB, V850E-Star, low power consumption |
Abstract | Nowadays, from battery-powered mobile devices to supercomputers, reducing power consumption has become a serious design issue. Although using a low supply voltage is the most efficient way to reduce power, it also increases the leakage power and delay variance. The Low-power Electronics Association & Project (LEAP) developed Silicon On Thin Buried Oxide (SOTB) technology to solve those problems. To verify the SOTB technology, we have applied it to an automotive microcontroller, V850E-Star. In this report, we investigate the operational speed and leakage power with 40 kinds of reverse bias and forward bias voltages for each purpose: standby, energy maximum and performance maximum. In the standby mode, the leakage power of the energy maximum mode is reduced by 92%, while the chip works with a 33 MHz clock in the energy maximum mode. |
Title | Computer-Aided Design of Electric Vehicle Hybrid Energy Storage System |
Author | Sangyoung Park, Younghyun Kim, *Naehyuck Chang (Seoul National University, Republic of Korea) |
Page | pp. 74 - 75 |
Abstract | Electric vehicles (EVs) are considered a strong alternative to internal combustion engine (ICE) vehicles. EVs are considered to have low environmental impact and operating costs. However, studies show that the electric vehicle is not a cure-all solution and needs careful optimization. Many problems persist, including the drive range, high initial cost, and battery degradation of EVs. These shortcomings are mainly due to the cycle efficiency and cycle life constraints of the energy storage system (ESS), which is usually a homogeneous lithium-ion battery bank in commercial vehicles. Such optimization has been performed in many ways, mostly as layered optimization. In contrast, a more systematic approach based on computer-aided design offers holistic optimization opportunities. We propose to overcome the hurdles by computer-aided design optimization of a hybrid energy storage system (HESS) for EVs. |
Title | Place-and-Route Algorithms for a Reliability-Oriented Coarse-Grained Reconfigurable Architecture Using Time Redundancy |
Author | *Takashi Imagawa, Masayuki Hiromoto (Kyoto University, Japan), Hiroshi Tsutsui (Hokkaido University, Japan), Hiroyuki Ochi (Ritsumeikan University, Japan), Takashi Sato (Kyoto University, Japan) |
Page | pp. 76 - 81 |
Keyword | coarse-grained reconfigurable architecture, reliability, time redundancy, dynamic reconfiguration, place-and-route algorithm |
Abstract | Coarse-grained reconfigurable architectures (CGRAs) are expected to enhance the reliability of LSI systems. The time-redundancy technique can enhance fault tolerance even under severe circuit area constraints. This paper proposes two place-and-route algorithms for a CGRA that utilizes time redundancy. The application circuits implemented on the CGRA with these algorithms differ in performance degradation and hard-error tolerance. One algorithm improves hard-error tolerance with small performance degradation, while the other improves the tolerance substantially at the cost of large degradation. |
Title | Power Analysis Resistant IP Core Using IO-Masked Dual-Rail ROM for Easy Implementation into Low-Power Area-Efficient Cryptographic LSIs |
Author | *Megumi Shibatani, Mitsuru Shiozaki, Yuki Hashimoto, Takaya Kubota, Takeshi Fujino (Ritsumeikan University, Japan) |
Page | pp. 82 - 87 |
Keyword | cryptographic module, side-channel attack, power analysis, countermeasure circuit, IO-masked dual-rail ROM |
Abstract | Recently, it has been pointed out that power analysis (PA) attacks are a threat to cryptographic circuits which handle confidential information. The goal of this study is to provide an easily implementable cryptographic IP core with small area, low power consumption, and PA resistance. We have proposed an IO-masked dual-rail ROM scheme and prototyped an advanced encryption standard (AES) circuit using the proposed scheme. This paper presents the evaluation results for chip area, power consumption, and PA resistance. |
Title | Scaling up Size and Number of Expressions in Random Testing of Arithmetic Optimization of C Compilers |
Author | Eriko Nagai (Fujitsu Systems West Limited, Japan), *Atsushi Hashimoto, Nagisa Ishiura (Kwansei Gakuin University, Japan) |
Page | pp. 88 - 93 |
Keyword | compiler, random testing, programming language C |
Abstract | This paper presents an enhanced method of testing the validity of arithmetic optimization of C compilers using randomly generated programs. Its bug detection capability is improved over an existing method by 1) generating longer arithmetic expressions and 2) accommodating multiple expressions in test programs. Undefined behavior in long expressions is successfully avoided by modifying problematic subexpressions during computation of the expected values for the expressions. An efficient method for minimizing error-inducing test programs, which utilizes binary search, is also presented. Experimental results show that a random test system based on our method has higher bug detection capability than existing methods; it has detected more bugs than the previous method in earlier versions of GCC and has revealed new bugs in the latest versions of GCC and LLVM. |
Title | A Routing Method Using Minimum Cost Flow Algorithm for Routes with Target Wire Lengths |
Author | *Kunihiro Fujiyoshi, Kazuo Yamane (Tokyo University of Agriculture and Technology, Japan) |
Page | pp. 94 - 99 |
Keyword | routing, PCB, error, minimum cost flow algorithm |
Abstract | Due to the increase in operating frequency, the influence of routing delays is increasing. It is therefore important to obtain routes with a small difference between the target wire length and the actual wire length. For this purpose, the CAFE router, which obtains river routing with small length error using maximum flow, was proposed. However, in many cases, the obtained routes still have a small length error. In this paper, we propose a method using minimum cost flow, which obtains routes with smaller differences. |
Title | Compact Pipeline Hardware Architecture for Pattern Matching on Real-Time Traffic Signs Detection |
Author | *Anh-Tuan Hoang, Mutsumi Omori, Masaharu Yamamoto, Tetsushi Koide (Hiroshima University, Japan) |
Page | pp. 100 - 105 |
Keyword | Traffic signs detection, Pipeline Architecture, Compact Hardware |
Abstract | This paper describes a novel compact hardware-oriented algorithm and its conceptual implementation for a real-time traffic sign detection system. The speed limit sign area on a grayscale video frame is detected based on novel, simple and compact rectangle pattern matching and circle detection modules. The speed limit recognition system is divided into two pipeline stages. The frame is scanned with multiple scan windows in parallel for each position, and each scan window is also processed in a pipeline to increase throughput. It achieves a 100% detection rate. |
Title | A Parallel Simulated Annealing Algorithm with Look-Ahead Neighbor Solution Generation |
Author | *Yusuke Ota, Kazuhito Ito (Saitama University, Japan) |
Page | pp. 106 - 111 |
Keyword | simulated annealing, parallel SA, lookahead |
Abstract | Simulated annealing (SA) is a general method to solve combinatorial optimization problems. SA randomly generates a neighbor solution from the current solution and evaluates it with a cost function. If the neighbor solution is better than the current solution, it is accepted as the new current solution; otherwise it is accepted stochastically. This process is iterated many times, and therefore SA needs a long execution time. We propose a fast SA method where several neighbor solutions are generated at a time in a look-ahead manner and evaluated in parallel. To increase the efficiency of the parallelized SA, a method to adaptively generate neighbor solutions is proposed to reduce void solutions that are not used in an SA chain. |
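The sequential SA loop that the abstract takes as its starting point can be sketched in Python as follows. This is the plain, non-parallel algorithm, not the look-ahead method proposed in the paper. The Metropolis acceptance rule `exp(-delta / T)`, the geometric cooling schedule, and all parameter names and default values are assumptions chosen for illustration.

```python
import math
import random

def simulated_annealing(cost, neighbor, start, steps=2000,
                        t_start=10.0, cooling=0.995, seed=0):
    """Minimize `cost` by plain sequential simulated annealing (sketch).

    `neighbor(solution, rng)` must return a randomly perturbed solution.
    Returns the best solution seen and its cost.
    """
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temperature = t_start
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Accept improvements always, and worse solutions with
        # probability exp(-delta / T) (assumed Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling  # assumed geometric cooling schedule
    return best, best_cost
```

For example, `simulated_annealing(lambda x: (x - 3) ** 2, lambda x, rng: x + rng.choice((-1, 1)), start=20)` searches the integers for the minimum of a parabola. Each iteration depends on whether the previous candidate was accepted, which is the serial dependency that a look-ahead scheme has to work around.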
Title | A 10-Bit Low-Glitch Binary-Weighted Current-Steering DAC |
Author | *Fang-Ting Chou, Chung-Chih Hung (National Chiao Tung University, Taiwan) |
Page | pp. 112 - 113 |
Keyword | DAC, binary-weighted, low glitch |
Abstract | A low-glitch and low-power design for a 10-bit binary-weighted current-steering digital-to-analogue converter (DAC) is presented. Instead of large input buffers, the proposed design uses variable-delay buffers with a compact layout to compensate for delay difference and to reduce high glitch energy significantly, from 7 pVsec to less than 1.5 pVsec. The proposed DAC is capable of high-speed, low-glitch operation without compromising power consumption and chip area. |
Title | Rover II: A Router for Via Configurable Structured ASIC with Standard Cells and IPs |
Author | Chiung-Chih Ho, Hsin-Pei Tsai, *Rung-Bin Lin (Yuan Ze University, Taiwan) |
Page | pp. 114 - 117 |
Keyword | Structured ASIC, Router, Regular fabric, IP |
Abstract | This article presents a router, called Rover II, for via-configurable structured ASICs. Rover II extends the work of Rover to handle IPs and to incorporate a port of the NTHU-Route 2.0 and NCTU-GR global routers. Experimental results show that Rover II can successfully route a via-configurable structured ASIC with standard cells and IPs under different routing fabrics. The results also show that the global router in Rover is as good as state-of-the-art global routers such as NTHU2.0 and NCTU-GR. |
Title | A Compact and Energy-Efficient Muller C-Element for Low-Voltage Asynchronous CMOS Digital Circuits |
Author | *Yuzuru Shizuku, Tetsuya Hirose, Yuya Danno, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 118 - 122 |
Keyword | low supply voltage, asynchronous circuit, Muller-C-element, energy-efficient |
Abstract | Asynchronous circuits have attracted much attention as a promising low-power and robust digital design technique. The Muller C-element is one of the fundamental building blocks of asynchronous circuits and is used in the timing control of each circuit block and in pipeline processing. However, conventional Muller C-elements are difficult to operate at lower supply voltages. In this paper, we propose a new Muller C-element capable of low supply voltage operation. The circuit is based on the conventional C-element and uses a MOS resistor to ensure robust operation. Simulation results have demonstrated that the proposed circuit can operate at a low supply voltage of 0.38 V and that the power-delay product (PDP) was 4.32 aJ at VDD = 1.08 V, which was lower by 9.3% compared with a conventional Muller C-element. |
Title | Analytical Thermal Modeling and Calibration Method for Lithium-Ion Batteries |
Author | *Keiji Kato, Yusuke Yamamoto, Naoki Kawarabayashi, Lei Lin, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 123 - 128 |
Keyword | Lithium-ion battery, Thermal analysis, Calibration |
Abstract | The lithium-ion battery is an important component in constructing the circuits of mobile systems. However, the behavior of the battery varies depending on its thermal condition. Thus, to optimize a mobile system in terms of long life and low power, it is important to estimate the inner temperature of the battery. This paper proposes an analytical method to estimate the inner temperature considering Joule heat and entropy heat. An evaluation using a real battery sample is also shown. |
Title | A Sensor Modeling Technique Using SystemC-AMS For Fast Simulation of System-in-Package Based Bio-Medical Systems |
Author | *Arif Ullah Khan, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan) |
Page | pp. 129 - 133 |
Keyword | Sensor, Bio-Medical, Simulation Model, SystemC-AMS, SiP |
Abstract | Use of biomedical systems, which include healthcare systems and bio-medical implants, is increasing rapidly. These systems consist of analog and digital blocks. To develop these systems in a short time while meeting strict size, energy and cost constraints, there is a need for a new design methodology. This research focuses on developing a common hardware platform based on an application specific instruction-set processor (ASIP) and system in package (SiP), which could be used by different health monitoring and bio-medical systems. For fast design space exploration, a fast simulation model of the complete system is also needed. Different blocks of the SiP have been modeled, at different abstraction levels, using SystemC and SystemC-AMS. In this paper, a sensor modeling technique for modeling available analog sensors is presented using SystemC-AMS; it will be used in the SiP simulation model. |
Title | A Cool Charger for Lithium-Ion Battery |
Author | *Yusuke Yamamoto, Keiji Kato, Lei Lin, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 134 - 139 |
Keyword | Lithium-ion battery, Charger, Thermal Management |
Abstract | Mobile systems, electric vehicles and smart houses use lithium-ion batteries, which have many advantages such as high capacity. On the other hand, battery degradation is caused by various factors. We focus on thermal degradation and have developed a temperature management system for lithium-ion batteries. This system includes an inner temperature estimation system to grasp battery characteristics. We construct a charging system to control the temperature of the battery while charging. We examine the characteristics of battery cooling during charging and discharging. |
Title | A Hardware Generator for Aesthetic Nonlinear Filter Banks |
Author | *Tomoki Komuro, Hirotaka Nishikawa, Yukihiro Iguchi, Kaoru Arakawa (Meiji University, Japan) |
Page | pp. 140 - 141 |
Keyword | Signal Processing, nonlinear filter bank, aesthetic filter bank, hardware generator, FPGA |
Abstract | This paper considers hardware realizations of nonlinear filter banks for facial beautification. First, users describe the filters' characteristics and how to connect them using filter bank description languages (FDLs); then the proposed system generates Verilog HDL descriptions to realize them. Preliminary experimental results show that the generated HDL descriptions have the same performance as ones coded by hand. |
Title | Replacing Optical Lenses by Silicon Structures |
Author | *Rudy Lauwereins (IMEC, Belgium) |
Page | p. 142 |
Abstract | Since the start of photography, image optics, recording and display have been analog. In the forties, analog image display was partially replaced by digital displays through the line-based CRT screen. In the early eighties, image recording was also digitized thanks to solid-state silicon imagers. We are now on the verge of digitizing the last analog part of the image handling flow: the optics. By making opto-mechanical structures in silicon matching the size of the wavelength of light, the large, heavy, fragile and costly optical lenses can be replaced by cheap mass-produced silicon, often integrated together with the imager and/or processing logic in a heterogeneous stack. This presentation first describes the opportunities silicon processing offers to replace lenses, as well as the requirements that have to be met to make such a replacement successful. Examples are given of a few application domains that benefit from silicon lenses. Next, one application area is presented in more detail, namely fast and cheap hyperspectral imaging. The adoption of hyperspectral cameras by industry has so far been limited due to their limited speed, limited compactness and high cost, all caused by the need for many high-quality optical lenses. Silicon structures make it possible to counter all these drawbacks. A novel hyperspectral sensor is detailed with the following key innovations: a Fabry-Pérot wedge filter monolithically integrated on top of a standard CMOS sensor, processed with minimal cavity sizes and integrated with the needed software to improve image quality. The result is a compact 2 megapixel sensor with a spectral range between 550 and 1000 nm and a spectral resolution lower than 10 nm. The speed is 340 fps at illumination levels as used in machine vision.
Developing such a breakthrough solution required intense cross-disciplinary collaboration between process technologists, circuit designers, optical experts, camera and system designers, image enhancement specialists, image classification specialists and application experts. The design approach followed will be explained, as well as the need to develop special software tools to emulate the complete system, including lighting, lens aberrations, process technology variability, mechanical variability, optical distortions and software correction algorithms. |
Title | A Heuristic Method to Find Linear Decompositions for Incompletely Specified Index Generation Functions |
Author | Tsutomu Sasao, *Yuta Urano, Yukihiro Iguchi (Meiji University, Japan) |
Page | pp. 143 - 148 |
Keyword | logic minimization, linear transform, Functional decomposition, minimal covering |
Abstract | This paper shows a method to find a linear transformation that reduces the number of variables to represent a given incompletely specified index generation function. It first generates the difference matrix, and then finds the minimal set of variables using a covering table. Linear transformations are used to modify the covering table to produce a smaller solution. |
Title | A New Design Methodology for Rounding and Hardware Minimization in Look-Up-Table-Based Arithmetic Function Evaluation |
Author | *Shen-Fu Hsiao (National Sun Yat-sen University, Taiwan), Hou-Jen Ko (Purdue University, U.S.A.), Yu-Ling Tseng (SpringSoft, Taiwan), Chia-Sheng Wen (National Sun Yat-sen University, Taiwan) |
Page | pp. 149 - 152 |
Keyword | function evaluation, error analysis, truncated multiplier, digital arithmetic |
Abstract | This paper presents a new approach to determining the bit-widths of hardware components in arithmetic function evaluators based on Look-Up Tables (LUTs). The rounding of floating-point constant values to be stored in LUTs and the hardware minimization in the subsequent arithmetic computation are considered jointly in order to optimize the entire design. Piecewise polynomial approximation with truncated multiplication is used to demonstrate the proposed method. Previous similar designs usually determine the bit widths of the quantized polynomial coefficients and the corresponding multipliers by pre-assigning allowable errors for the individual hardware components, including ROM and arithmetic units. The proposed design considers all the error sources, including the approximation errors, quantization errors, truncation errors, and final rounding errors, simultaneously. Thus, the total error budget can be utilized more efficiently and the bit widths of the hardware components (ROM, multipliers, adders) can be optimized, leading to significant improvements in both area and delay. Experimental results show that the proposed method can reduce the total area by up to 48% and the delay by up to 25% compared to conventional design approaches. |
Title | A Global Router Considering Scenic Controls |
Author | Hsueh-Ju Chou (Faraday Technology Corporation, Taiwan), Hsi-An Chien, *Ting-Chi Wang (National Tsing Hua University, Taiwan) |
Page | pp. 153 - 158 |
Keyword | Global Routing, Scenic Controls |
Abstract | In this paper, we study a global routing problem that considers not only overflow and wirelength but also scenic controls. A scenic control is often given to a timing-critical net for coping with timing closure in a modern physical synthesis flow. We enhance an academic global router to handle scenic controls. The enhancements are (1) a new net ordering method for rip-up and reroute, (2) two length bound allocation methods, and (3) a length-bounded adaptive multi-source and multi-sink maze routing method. The experimental results show that our global router is able to produce a high-quality solution in terms of overflow and wirelength without any violation of scenic controls for each test case. |
Title | A Tuning Method of Programmable Delay Element with Two Values for Yield Improvement |
Author | *Hayato Mashiko, Yukihide Kohira (The University of Aizu, Japan) |
Page | pp. 159 - 164 |
Keyword | Delay variation, Timing violation, Yield, Programmable delay element |
Abstract | To recover the timing violations, which cause significant reduction in the yield of LSI chips, programmable delay elements called PDEs are inserted into the clock tree before fabrication and their delays are tuned after fabrication. In this paper, we use PDEs with two delay values and propose a delay tuning method of the PDE to improve the yield and to reduce the number of tests. Moreover, we evaluate the circuits obtained by the proposed method by using commercial CAD tools. |
PDF file |
Title | Impact of Drive Strength and Well-Contact Density on Heavy-Ion-Induced Single Event Transient |
Author | *Jun Furuta (Kyoto University, Japan), Masaki Masuda, Katsuyuki Takeuchi, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan), Hidetoshi Onodera (Kyoto University, Japan) |
Page | pp. 165 - 169 |
Keyword | soft error, SET, pulse width |
Abstract | We measure distributions of heavy-ion-induced Single Event Transient (SET) pulse widths from four kinds of inverter chains to measure their characteristics and estimate SET-induced soft error rates on a Flip-Flop (FF) and a delayed TMR FF. Measurement results show that the maximum SET-induced soft error rate on a FF is equivalent to 20% of the Single Event Upset (SEU) rate. On the delayed TMR with a 400ps delay element, the SET-induced soft error rate can be reduced by using 4x inverters with 2μm well-contact distance. |
Title | A Technique for Accelerating Adaptive Super Resolution Technique Based on Local Features of Images Using GPU |
Author | *Kento Kugai, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 170 - 175 |
Keyword | CUDA, GPGPU, Super Resolution |
Abstract | In this paper, we propose a technique to accelerate an adaptive super resolution technique based on local features of images using a Graphics Processing Unit (GPU). We have applied the acceleration technique to both the super resolution process and the learning process. Experimental results have shown that the proposed technique achieves speedups of 2.36 times on average, and 3.35 times at the maximum, compared with the conventional technique using a Central Processing Unit (CPU). |
Title | Parallel Layer-Aware Partitioning for 3D Designs |
Author | *Yi-Hang Chen, Yi-Ting Chen, Juinn-Dar Huang (National Chiao Tung University, Taiwan) |
Page | pp. 176 - 179 |
Keyword | through-silicon via, 3D integration technology, layering, partitioning, multicore architecture |
Abstract | As compared with two-dimensional (2D) ICs, 3D integration is a breakthrough technology of growing importance that has the potential to offer significant performance and functional benefits. This emerging technology allows stacking multiple layers of dies and resolves the vertical connection issue with through-silicon vias (TSVs). However, though a TSV is considered a good solution for vertical connection, it also occupies significant silicon area and incurs reliability problems. Because of these challenges, minimizing the number of TSVs becomes important in the design process. Therefore, in this paper, we propose a two-phase parallel layer-aware partitioning algorithm for TSV minimization in 3D structures. In the first phase, we employ OpenMP to parallelize the 2-way min-cut partitioning steps and get the initial solution. In the second phase, we further improve the result by using a parallel simulated annealing algorithm on a GPU. The experimental results show that the proposed method can reduce the number of TSVs by about 39% as compared to several existing methods. |
PDF file |
Title | Lithium-Ion Battery Degradation Model and Its Application to Power Management of Smart House |
Author | *Ryosuke Miyahara, Ami Watanabe, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 180 - 185 |
Keyword | Smart house, battery modeling, power management |
Abstract | Aiming at a low-carbon society, it is important that smart houses spread for efficient utilization of natural energy sources, e.g., photovoltaic battery. However, the cost of battery degradation is a non-negligible portion of the total cost of power management in smart houses. This paper discusses how to formulate the degradation of the batteries, and clarifies the key factors for controlling the cost of battery degradation. Finally, we propose an efficient power management scheme for real smart houses. |
Title | Parameter Estimation and Model Reduction for Digital IIR Filters Using a Modified PSO Algorithm |
Author | *Wei-Der Chang, Ching-Lung Chi (Shu-Te University, Taiwan) |
Page | pp. 186 - 189 |
Keyword | modified PSO, filter design |
Abstract | This paper develops a new design method for parameter estimation and model reduction of digital IIR filters. The method employs modified particle swarm optimization (MPSO), whose velocity formula is slightly changed to enhance the algorithm's performance. MPSO-based design steps are presented for the IIR filters. Finally, two kinds of experiments, filter parameter estimation and model reduction, are provided. Simulation results show the applicability of the proposed method. |
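As an illustration of PSO-based filter parameter estimation, the following is a minimal sketch using the textbook PSO velocity update, not the authors' modified formula, which the abstract does not specify. The first-order filter model, the parameter bounds, and all function names (`simulate_iir`, `cost`, `pso`) are assumptions made for this example only.

```python
import random

def simulate_iir(a, b, x):
    """First-order IIR filter (assumed model): y[n] = a*y[n-1] + b*x[n]."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + b * xn
        y.append(prev)
    return y

def cost(params, x, target):
    """Sum of squared errors between the candidate filter output and the target."""
    a, b = params
    return sum((yi - ti) ** 2 for yi, ti in zip(simulate_iir(a, b, x), target))

def pso(x, target, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO over (a, b); returns the best position and the best-cost history."""
    rng = random.Random(seed)
    lo, hi = -0.99, 0.99  # assumed search bounds (keeps the filter stable)
    pos = [[rng.uniform(lo, hi) for _ in range(2)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p, x, target) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    history = [gbest_cost]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                # Textbook velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i], x, target)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
        history.append(gbest_cost)
    return gbest, history

# Usage: try to recover the coefficients of a known filter from its response.
_rng = random.Random(1)
x = [_rng.uniform(-1, 1) for _ in range(50)]
target = simulate_iir(0.5, 0.8, x)
best, history = pso(x, target)
```

A modified PSO would change only the velocity line inside the inner loop; the rest of the loop structure would stay the same.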
Title | Investigating Performance Advantages of Random Topologies on Network-on-Chip |
Author | Sarat Yoowattana (Asian Institute of Technology, Thailand), *Ikki Fujiwara, Michihiro Koibuchi (National Institute of Informatics, Japan) |
Page | pp. 190 - 194 |
Keyword | Network-on-Chip, topology, interconnection networks |
Abstract | As technology continues to scale down, the number of cores increases significantly, e.g., to 64 cores. Communication latencies increasingly have a negative impact on the performance of parallel applications on Chip MultiProcessors (CMPs). A random topology, which provides the lowest diameter and average shortest path length, has recently been considered for low-latency Network-on-Chip (NoC). In this work we investigate its advantage in throughput-and-latency properties for various traffic patterns, and we compare the random topology with traditional non-random topologies, such as a two-dimensional mesh, in various network sizes. Through our cycle-accurate network simulation, we found that the random topology significantly outperforms 2-D mesh and 2-D torus in terms of network latency. |
PDF file |
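The two graph metrics named in the abstract above, diameter and average shortest path length, can be computed with breadth-first search. The sketch below compares an 8x8 mesh with a ring augmented by random shortcut links under the same link budget. The ring-plus-shortcuts construction and the function names are assumptions for illustration; the abstract does not say how the authors generate their random topologies.

```python
import random
from collections import deque

def mesh(rows, cols):
    """2-D mesh: each node links to its right and lower neighbours."""
    adj = {(r, c): set() for r in range(rows) for c in range(cols)}
    for r, c in adj:
        for dr, dc in ((0, 1), (1, 0)):
            n = (r + dr, c + dc)
            if n in adj:
                adj[(r, c)].add(n)
                adj[n].add((r, c))
    return adj

def ring_with_random_links(n, extra, seed=0):
    """Assumed random topology: a ring plus `extra` random shortcut links."""
    rng = random.Random(seed)
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    added = 0
    while added < extra:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj

def path_stats(adj):
    """Return (diameter, average shortest path length) via BFS from every node."""
    diameter, total, pairs = 0, 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        diameter = max(diameter, max(dist.values()))
        total += sum(dist.values())
        pairs += len(dist) - 1
    return diameter, total / pairs

def edge_count(adj):
    return sum(len(neigh) for neigh in adj.values()) // 2

# An 8x8 mesh has 112 links; the random topology gets the same budget (64 ring + 48 shortcuts).
mesh_adj = mesh(8, 8)
rand_adj = ring_with_random_links(64, 48)
mesh_diam, mesh_avg = path_stats(mesh_adj)
rand_diam, rand_avg = path_stats(rand_adj)
```

These hop-count metrics only approximate zero-load latency; they do not capture contention, which is why the paper relies on cycle-accurate simulation.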
Title | Speed Traffic-Sign Recognition Algorithm for Real-Time Driving Assistant System |
Author | *Masaharu Yamamoto, Anh-Tuan Hoang, Mutsumi Omori, Tetsushi Koide (Hiroshima University, Japan) |
Page | pp. 195 - 200 |
Keyword | Number recognition, Hardware implementation, Driving safety support system, Speed traffic-sign, Real-time |
Abstract | The purpose of this research is to develop a hardware-oriented number recognition algorithm for a speed traffic-sign recognition system used in driving assistance. We recognize the speed limit on a speed traffic sign using a hardware-oriented extraction algorithm. The numbers are recognized by comparing their feature values with the recognized features. The proposed hardware-oriented number recognition algorithm achieves a recognition rate of almost 100% in 31 highway scenes and 23 local-road scenes. |
PDF file |
Title | A Development and Evaluation of Variable Speed Charger System for Lithium-Ion Battery |
Author | *Akihiro Segawa, Yusuke Yamamoto, Lei Lin, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 201 - 202 |
Keyword | Lithium-ion Battery, Charger, variable speed charger |
Abstract | This paper describes the development and evaluation of a variable speed charger system for lithium-ion batteries. High-speed charging causes degradation and safety problems because over-current and over-voltage are applied to the battery during charging and the temperature of the battery rises. Thus, we propose a variable speed charge system that controls the charge current depending on the temperature rise. The variable speed charger supplies a variable current during the CC (Constant Current) charge operation; that is, it controls the current to restrain the temperature rise of the battery. This reduces damage to the battery and charges it optimally. |
Tuesday, October 22, 2013 |
Title | Power of Enumeration --- State-of-the-art Algorithms for Tackling Combinatorial Explosion |
Author | *Shin-ichi Minato (Hokkaido University/JST, Japan) |
Page | pp. 203 - 207 |
Abstract | Discrete structure manipulation is a fundamental technique for many problems solved by computers. BDDs/ZDDs have attracted a great deal of attention for twenty years, because they efficiently manipulate basic discrete structures such as logic functions and sets of combinations. Recently, one of the most interesting research topics related to BDDs/ZDDs is the “frontier-based method,” a very efficient algorithm for enumerating and indexing the subsets of a graph that satisfy a given constraint. This work is important because many kinds of practical problems can be efficiently solved by some variations of this algorithm. In this article, we present an overview of the frontier-based method and recent topics on the state-of-the-art algorithms to show the power of enumeration. |
Title | High Speed Approximation Feature Extraction in CAD System for Colorectal Endoscopic Images with NBI Magnification |
Author | *Tsubasa Mishima, Satoshi Shigemi, Anh-Tuan Hoang, Tetsushi Koide, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Yoko Kominami, Rie Miyaki, Taiji Matsuo, Shigeto Yoshida, Shinji Tanaka (Hiroshima University, Japan) |
Page | pp. 208 - 213 |
Keyword | Dense Scale-Invariant Feature Transform (D-SIFT), Colorectal Endoscopic Images, Computer-Aided Diagnosis (CAD), Bag-of-Features (BoF), FPGA |
Abstract | In this study, we propose an improvement to feature extraction in a computer-aided diagnosis system for colorectal endoscopic images with narrow-band imaging (NBI) magnification. Dense Scale-Invariant Feature Transform (D-SIFT) is used in the feature extraction. For real-time processing of full high definition images, it is necessary to consider a trade-off between the precision of the feature extraction and the speedup from the FPGA implementation. In this paper, we reduce the number of dimensions of the feature representation for the purpose of hardware implementation. |
PDF file |
Title | A Fixed-Length Routing Method Based on the Color-Coding Algorithm |
Author | *Tieyuan Pan, Yasuhiro Takashima (University of Kitakyushu, Japan) |
Page | pp. 214 - 219 |
Keyword | Fixed-Length Routing, Color-Coding, PCB |
Abstract | This paper proposes a fixed-length routing method based on the Color-Coding algorithm. In recent LSI system design, exact signal propagation delay is required because of the growth of the operation frequency. As one of the techniques to control the delay, wire-length matching is widely used. We analyze the complexity of the proposed approach and confirm its efficiency empirically. |
PDF file |
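For readers unfamiliar with color-coding, the sketch below shows the classic randomized technique for deciding whether a simple path with exactly k vertices exists between two nodes: vertices are colored at random with k colors, and a dynamic program over color subsets looks for a path whose vertices all have distinct colors. This is the generic textbook algorithm on an abstract graph, not the routing method of the paper; the grid example and function names are assumptions.

```python
import random

def colorful_path_exists(adj, s, t, k, color):
    """DP over colour bitmasks: is there an s-t path with k distinctly coloured vertices?"""
    # frontier[v] = set of colour bitmasks of colourful paths from s ending at v
    frontier = {s: {1 << color[s]}}
    for _ in range(k - 1):
        nxt = {}
        for u, masks in frontier.items():
            for v in adj[u]:
                bit = 1 << color[v]
                for m in masks:
                    if not m & bit:  # colour of v not used yet, so v is not on the path
                        nxt.setdefault(v, set()).add(m | bit)
        frontier = nxt
    return t in frontier

def has_path_with_k_vertices(adj, s, t, k, trials=500, seed=0):
    """Randomized test with one-sided error: True is always correct, False is probabilistic."""
    rng = random.Random(seed)
    nodes = list(adj)
    for _ in range(trials):
        color = {v: rng.randrange(k) for v in nodes}
        if colorful_path_exists(adj, s, t, k, color):
            return True
    return False

def grid(rows, cols):
    """Small routing-grid stand-in (assumed example graph)."""
    adj = {(r, c): [] for r in range(rows) for c in range(cols)}
    for r, c in adj:
        for dr, dc in ((0, 1), (1, 0)):
            n = (r + dr, c + dc)
            if n in adj:
                adj[(r, c)].append(n)
                adj[n].append((r, c))
    return adj

# On a 3x3 grid, corner-to-corner simple paths have 5, 7, or 9 vertices (never an even count).
g = grid(3, 3)
found_5 = has_path_with_k_vertices(g, (0, 0), (2, 2), 5)
found_6 = has_path_with_k_vertices(g, (0, 0), (2, 2), 6)
```

Because a path whose vertices all have distinct colors is necessarily simple, a positive answer is always sound; repeating the random coloring only reduces the chance of missing an existing path.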
Title | Retiming of Single Flux Quantum Logic Circuits for Flip-Flop Reduction |
Author | *Nobutaka Kito (Chukyo University, Japan), Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan) |
Page | pp. 220 - 225 |
Keyword | SFQ circuits, retiming, flip-flop |
Abstract | We propose a retiming method of superconductive Single Flux Quantum (SFQ) logic circuits for flip-flop reduction. Because SFQ logic circuits use pulse logic, each input of logic gates has latching function. The number of flip-flops in SFQ circuits can be reduced by utilizing the latching function. We formulate retiming for flip-flop reduction as an instance of integer linear program considering the latching function. Experimental results show that most of flip-flops in SFQ circuit realizations of ISCAS'89 benchmark circuits can be eliminated by the proposed method. |
PDF file |
Title | Forwarding Unit Generation with Runtime Dependency Analysis in High-Level Synthesis |
Author | *Shingo Kusakabe, Kenshu Seto (Tokyo City University, Japan) |
Page | pp. 226 - 230 |
Keyword | high-level synthesis, loop pipelining, forwarding |
Abstract | We propose a technique to reduce the initiation intervals of loops which contain RAW dependences whose occurrences change during runtime. In the proposed technique, the written data to arrays in such RAW dependences are also written to temporary variables and the temporary variables are read when the RAW dependences occur, thereby the initiation intervals are minimized. Experimental results show that the proposed technique successfully achieves significant speedups with moderate increase in gate counts. |
Title | An NFA-Based Programmable Regular Expression Matching Engine Highly Suitable for FPGA Implementation |
Author | *Hiroki Takaguchi, Yoichi Wakaba, Shin'ichi Wakabayashi, Shinobu Nagayama, Masato Inagi (Hiroshima City University, Japan) |
Page | pp. 231 - 236 |
Keyword | Regular Expression Matching, FPGA, NFA |
Abstract | In this paper, we propose a new programmable regular expression matching engine based on a string-transition NFA. The proposed engine can perform matching at high speed, and any regular expression can be set as a pattern in a very short time. The proposed hardware engine has a two-dimensional circuit structure, and thus it is highly suitable for FPGA implementation. The effectiveness of the proposed hardware was evaluated by comparison with an existing hardware matching engine. |
PDF file |
Title | Graphillion: ZDD-Based Software Library for Very Large Sets of Graphs |
Author | Takeru Inoue, *Hiroaki Iwashita (Japan Science and Technology Agency, Japan), Jun Kawahara (Nara Institute of Science and Technology, Japan), Shin-ichi Minato (Hokkaido University, Japan) |
Page | pp. 237 - 242 |
Keyword | graph, binary decision diagram, frontier-based search, software library, Python |
Abstract | Graphillion is a library for manipulating very large sets of graphs, based on zero-suppressed binary decision diagrams (ZDDs) with advanced graph enumeration algorithms. Graphillion is implemented as a Python extension in C++, to encourage easy development of its applications without introducing significant performance overhead. Experimental results show that Graphillion allows us to manage an astronomical number of graphs with very low development effort. |
PDF file |
Title | Clock Jitter Compensation for Continuous-Time Sigma-Delta Modulator Through Divided-by-N Feedback DAC |
Author | *Zong-Yi Chen, Chung-Chih Hung (Department of Electrical Engineering, National Chiao Tung University, Taiwan) |
Page | pp. 243 - 247 |
Keyword | clock jitter, sigma-delta modulator, ADC, divided-by-n feedback DAC |
Abstract | This paper proposes a new compensation method to overcome the high sensitivity of the continuous-time (CT) sigma-delta modulator to clock jitter by using divided-by-n (D-N) feedback DAC waveform. There are two types of clock jitter: independent clock jitter and accumulated clock jitter. This method provides a useful approach to solve one of the critical non-idealities, independent clock jitter, in the CT sigma-delta modulator without increasing the speed requirement of the modulator as well as the complexity of system and circuit design. Results prove the effectiveness of this new compensation method for independent clock jitter. |
Title | Simultaneous Escape Routing Considering Length Matching of Differential Pairs |
Author | Yen-Jung Lee, *Hung-Ming Chen, Ching-Yu Chin (National Chiao Tung University, Taiwan) |
Page | pp. 248 - 252 |
Keyword | Escape routing, differential pairs, length matching |
Abstract | In PCB design, the escape routing problem is considered an essential part and has been widely studied in the literature. There are industrial tools and some studies that work on simultaneous escape routing and escape routing of differential pairs on dense circuit boards. However, routing differential pairs simultaneously while considering length matching is still an important and ongoing research problem. In this work, inspired by prior state-of-the-art approaches, we have implemented an integrated approach that achieves simultaneous escape routing considering length matching of differential pairs. Our method avoids time-consuming ILP solutions in finding length-matching differential signal paths. Experimental results show that, compared with the B-escape router we reimplemented, our approach can efficiently and effectively obtain length matching of differential pairs in simultaneous escape routing to reduce differential-pair skews. |
Title | Technology Remapping Based on Multiple Solutions for Post-Mask Functional ECO |
Author | *Yudai Kabata, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 253 - 258 |
Keyword | Incremental synthesis, Engineering change order (ECO) |
Abstract | This paper presents a technology remapping technique using reconfigurable (RECON) cells in order to reduce an increase in delay time induced by Engineering Change Orders (ECO’s). Based on the estimated maximum delay time for the paths related to ECO’s using each of multiple solutions obtained by error diagnosis, we can select a solution which minimizes increase in the delay along with the critical path. Experimental results have shown that the proposed technique is effective to reduce the critical path delay with the rectified circuit for post-mask ECO's. |
Title | A Fast Trace-Driven Heterogeneous L1 Cache Configuration Simulator for Dual-Core Processors |
Author | *Masashi Tawada, Masao Yanagisawa, Nozomu Togawa (Waseda University, Japan) |
Page | pp. 259 - 260 |
Keyword | cache, simulation, multi-core |
Abstract | Multi-core processors are used in embedded systems very often. Since the application programs running on embedded systems are limited, there must exist an optimal cache memory configuration in terms of speed, power and area. Simulating application programs on various cache configurations is one of the best options to determine the optimal one. In this paper, we propose a very fast heterogeneous dual-core L1 cache configuration simulation method. Experimental results show that our method runs up to 14x faster than a naive simulation algorithm. |
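To make the baseline concrete, here is a minimal sketch of a naive trace-driven simulation: one pass over the address trace per cache configuration, counting hits in a set-associative LRU cache. It models a single cache only, not the heterogeneous dual-core setting or the authors' fast method; the LRU policy, the trace, and the function name `simulate_cache` are assumptions for illustration.

```python
from collections import OrderedDict

def simulate_cache(trace, num_sets, ways, block_size):
    """Count hits for one configuration of a set-associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        block = addr // block_size
        idx, tag = block % num_sets, block // num_sets
        lines = sets[idx]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return hits

# Naive exploration: re-run the whole trace once per candidate configuration.
trace = [0, 4, 64, 0, 128, 64]
configs = [(1, 2, 16), (2, 1, 16), (1, 4, 16)]  # (num_sets, ways, block_size)
results = {cfg: simulate_cache(trace, *cfg) for cfg in configs}
```

The cost of this approach grows linearly with the number of configurations, which is the overhead that faster multi-configuration simulators aim to avoid.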
Title | A Dynamic Offload Scheduler for Spatial Multitasking on Intel Xeon Phi Coprocessor |
Author | *Takamichi Miyamoto, Kazuhisa Ishizaka, Takeo Hosomi (NEC, Japan) |
Page | pp. 261 - 266 |
Keyword | Intel Xeon Phi, Multi-tasking, Offload, Scheduling |
Abstract | The Intel Xeon Phi Coprocessor fully supports multitasking, but multitasking by itself does not ensure high performance. A conventional task-level resource allocation scheduler could be used, but processor utilization of the Xeon Phi is then low because of idle time on the Xeon Phi. In this paper, we propose a dynamic offload scheduler which assigns processor resources of the Xeon Phi to tasks at the offload level. We demonstrate the effectiveness of the proposed method through evaluations. |
PDF file |
Title | A Restricted Dynamically Reconfigurable Architecture for Low Power Processors |
Author | *Takeshi Hirao, Dahoo Kim, Itaru Hida, Tetsuya Asai, Masato Motomura (Hokkaido University, Japan) |
Page | pp. 267 - 268 |
Keyword | Reconfigurable system, Processor architecture, Embedded system |
Abstract | In this paper, we propose a Control-flow Driven Data-flow Switching variable datapath architecture for embedded applications that demand extremely low power consumption and a wide range of usage. The proposed architecture aims to achieve both flexibility and low power consumption by limiting the scope of dynamic reconfiguration. As a preliminary evaluation, we have mapped a small program to understand the fundamental characteristics of the proposed architecture. |
PDF file |
Title | A Processor Architecture for Motion Sensing Systems Using Accelerometer |
Author | *Takashi Matsuo, Arif Ullah Khan (Graduate School of Information Science and Technology, Osaka University, Japan), Takashi Hamabe (MICRONIX Inc., Japan), Yoshinori Takeuchi, Masaharu Imai (Graduate School of Information Science and Technology, Osaka University, Japan) |
Page | pp. 269 - 274 |
Keyword | ASIP, Coordinate transform, Power consumption, Accelerometer, Motion sensing |
Abstract | Microelectromechanical-systems-based accelerometers are widely used in motion sensing, in which coordinate system transform is the major process. In this study, a method for coordinate system transform is introduced, and a low-power processor architecture specialized for coordinate system transform is proposed and evaluated. Experimental results show that the proposed processor reduces energy consumption by 48.3% compared with a conventional RISC processor implementation. |
PDF file |
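A coordinate system transform of an accelerometer sample is commonly expressed as a rotation from the sensor frame to a reference frame. The sketch below uses a Z-Y-X (yaw, pitch, roll) rotation matrix; this convention and the function names are assumptions for illustration, since the abstract does not describe the transform the authors implement.

```python
import math

def rotation_matrix(roll, pitch, yaw):
    """Rotation R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians (assumed convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

def transform(sample, roll, pitch, yaw):
    """Rotate a 3-axis accelerometer sample (ax, ay, az) into the reference frame."""
    r = rotation_matrix(roll, pitch, yaw)
    return tuple(sum(r[i][j] * sample[j] for j in range(3)) for i in range(3))

# A unit acceleration along the sensor x axis, with the sensor yawed by 90 degrees.
rotated = transform((1.0, 0.0, 0.0), 0.0, 0.0, math.pi / 2)
```

Each sample costs nine multiplications and six additions once the matrix is known, which gives a sense of the kind of kernel an application-specific processor for this task might accelerate.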
Title | Radiation-Hard Layout Structures on Bulk and SOI Process by Device-Level Simulations |
Author | *Kuiyuan Zhang, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan) |
Page | pp. 275 - 279 |
Keyword | Soft error, Layout, Bulk, SOI, TCAD |
Abstract | This paper analyzes the soft error tolerance related to layout structures on 65-nm bulk and SOI processes. The layout structure in which well contacts are placed between redundant latches suppresses MCU effectively. The tolerance of the SOI transistor structure is also estimated by TCAD simulations. The charge collection mechanism is suppressed by the BOX (Buried Oxide) in the SOI transistor. Charge sharing and bipolar effects between SOI redundant latches are suppressed. There is no MCU occurrence in SOI redundant latches. |
Title | Event Modeling Method for Verification of Power Analysis Attacks |
Author | *Kyota Sugioka, Toshiya Asai, Masaya Yoshikawa (Meijo University, Japan) |
Page | pp. 280 - 281 |
Keyword | Tamper resistance, power analysis attacks, cryptographic circuit |
Abstract | To evaluate the resistance of a cryptographic circuit to power analysis attacks during the design stage, this paper proposes a method to improve the efficiency of the acquisition process of information about power consumption, which requires a large number of encryption simulations. The proposed method newly introduces an event modeling method, maintains accuracy equivalent to that of a simulation program with an integrated circuit emphasis (SPICE), and acquires information about power consumption within a realistic processing timeframe that is similar to that of a logic simulation. |
Title | A Variable-Length String Matching Circuit Based On SeqBDDs |
Author | *Atsushi Matsuo, Yasunori Takagi (Ritsumeikan University, Japan), Hiroki Nakahara (Kagoshima University, Japan), Shigeru Yamashita (Ritsumeikan University, Japan) |
Page | pp. 282 - 287 |
Keyword | String Matching, Hardware, Programmable Sequence Logic, Sequence Binary Decision Diagram |
Abstract | This paper proposes a new hardware architecture for fast string matching. The proposed method utilizes the concept of Sequence Binary Decision Diagrams (SeqBDDs), an efficient data structure for storing a set of string sequences. Our proposed architecture is a natural extension of a Programmable Sequence Logic Circuit to SeqBDDs. The naive implementation of the Programmable Sequence Logic takes one character in one clock cycle, and thus we seek a way to evaluate multiple characters in one clock cycle with some content-addressable memories (CAMs). We also report preliminary evaluations for the proposed architectures. |
Title | An Image Compression Method for Frame Memory Size Reduction Using Local Feature of Images |
Author | *Yuki Fukuhara (Osaka University, Japan), Akihisa Yamada (Sharp Corporation/Osaka University, Japan), Takao Onoye (Osaka University, Japan) |
Page | pp. 288 - 289 |
Keyword | frame memory, image compression, system architecture |
Abstract | This paper proposes a method of image compression aiming at reduction of frame memory size of digital appliances. In spite of adopting a variable length coding, this method can guarantee the minimum compression ratio by adaptively controlling pixel data reduction rate. Utilizing local feature of images, the reduction rate is more finely controlled so as to maintain visual quality of images. Experimental results show that it attains a compression ratio of 1/3, while keeping visual quality of images by means of adequate bit allocation. |
PDF file |
Title | Physical Design of Microfluidic Biochips |
Author | *Tsung-Yi Ho (National Cheng Kung University, Taiwan) |
Page | p. 290 |
Abstract | Microfluidic-based biochips will soon revolutionize clinical diagnostics and many biochemical laboratory procedures due to their advantages of automation, cost reduction, portability, and efficiency. The basic idea of microfluidic biochips is to integrate all necessary functions for biochemical analysis onto one chip using microfluidics technology. These micro-total-analysis systems (μTAS) are more versatile and complex than microarrays. Integrated functions include microfluidic assay operations and detection, as well as sample pre-treatment and preparation. The first generation of microfluidic biochips contained permanently etched structures such as pumps, valves and channels, and relied on continuous liquid flow to carry out specific tasks. This type of biochip is hereafter referred to as continuous-flow microfluidics or channel-based biochips. In contrast, digital microfluidics, the second-generation biochip architecture, relies on discrete liquid particles to carry out general-purpose analysis. The continuing growth of applications has dramatically complicated chip/system integration and increased design complexity, making traditional manual design unsuitable, especially under time-to-market pressure. It is necessary to develop high-quality physical design tools to relieve the design burden of manual optimization of bioassays and time-consuming chip layout designs. In this talk, technology platforms for accomplishing “biochemistry on a chip” will be described, and the audience will be introduced to both droplet-based “digital” microfluidics based on electrowetting actuation and flow-based “continuous” microfluidics based on microvalve technology. Next, a holistic perspective on physical design tools for microfluidic biochips and several associated combinatorial and geometric optimization methods for placement and routing problems will be discussed. 
In this way, the audience will see how a “biochip compiler” can translate protocol descriptions provided by an end user (e.g., a chemist or a nurse at a doctor’s clinic) to a set of optimized and executable fluidic instructions that will run on the underlying microfluidic platform. Having these physical design tools, users and designers will be able to generate an optimized chip layout for good fluidic performance, high reliability, and low manufacturing cost. Therefore, biochip users and designers can concentrate on the development at application level, leaving layout details to physical design tools. |
Title | Application of EDA Technologies to Non-EDA Areas |
Author | Organizer/Moderator: Masahiro Fujita (University of Tokyo, Japan), Panelists: Rudy Lauwereins (IMEC, Belgium), Shin-ichi Minato (Hokkaido University, Japan), Tsung-Yi Ho (National Cheng Kung University, Taiwan), Giovanni De Micheli (EPFL, Switzerland), Kazutoshi Wakabayashi (NEC, Japan) |
Page | p. 291 |
Abstract | EDA (Electronic Design Automation) technologies have been developed for over 40 years. Lots of research and development efforts have been put into EDA technology development. As a result, a number of sophisticated algorithms have come out and been proven to work in practical situations. Now with the state-of-the-art EDA tools, high level design descriptions, such as the ones in C programming language, can automatically be converted into silicon. Also, as programmable hardware, such as FPGA (Field Programmable Gate Arrays), is commonly used, EDA technologies can be directly applied to make them more efficient. Also, as hardware design flows include various stages, such as high-level synthesis, logic synthesis, layout synthesis, test synthesis, pre- and post-silicon verification, and others, EDA techniques are dealing with varieties of problems in computer science and electrical engineering. Some of them can be applied to totally different areas from the traditional pure hardware design problems. In this panel discussion, various aspects of non-EDA applications of EDA technologies are discussed, and new possible directions are explored. Applications to be discussed include not only custom and highly efficient FPGA based hardware for specific computations, but also synthesis and verification of biomedical optical systems as well as efficient data structures and their associated algorithms for general discrete problems in computer science. |
PDF file |
Title | A Study of ESD Clamp Placement Impact on Peripheral- and Area-I/O Designs |
Author | *Yi-Cheng Liang, Hung-Ming Chen, Ming-Fang Lai (National Chiao Tung University, Taiwan) |
Page | pp. 292 - 297 |
Keyword | ESD, I/O Placement |
Abstract | Area-I/O style flip-chip designs are now used in mainstream high-end electronics products due to their higher performance and better noise control in high-density microsystem designs. Among design requirements in such microsystems and packaging, electrostatic discharge (ESD) is still one of the most important reliability concerns. The conventional I/O ring has been used for a long time; however, it increases the distance of connection in flip-chip designs. In this study, we analyze rule-of-thumb principles and develop a new I/O distribution structure. In our analysis, the new structure in area-I/O gives a large improvement in ESD clamp protection over peripheral I/O, and novel strategies of cell assignment on this structure yield fewer ESD violations than a general assignment method. Our method can be easily applied in the usual design flow, especially with state-of-the-art area-I/O style cases. |
Title | Customizable Hardware Architecture of Support Vector Machine in CAD System for Colorectal Endoscopic Images with NBI Magnification |
Author | *Satoshi Shigemi, Tsubasa Mishima, Anh-Tuan Hoang, Tetsushi Koide, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Yoko Kominami, Rie Miyaki, Taiji Matsuo, Shigeto Yoshida, Shinji Tanaka (Hiroshima University, Japan) |
Page | pp. 298 - 303 |
Keyword | Colorectal Endoscopic Images with NBI Magnification, Support Vector Machine (SVM), Computer-Aided Diagnosis (CAD), FPGA |
Abstract | With the increase in colorectal cancer patients in recent years, the need for quantitative evaluation of colorectal cancer has increased, and a computer-aided diagnosis (CAD) system which supports a doctor's diagnosis is essential. In this paper, a hardware design of the type identification module in a CAD system for colorectal endoscopic images with narrow band imaging (NBI) magnification [1] is proposed for real-time processing of full high definition (Full HD) images (1920 x 1080 pixels). As a result, it is possible to realize real-time processing in our system. In addition, in order to improve the identification accuracy for type B (TA: tubular adenoma) and type C3 (SM-m cancer), an algorithm to realize 3-class identification with high efficiency and high accuracy is proposed. |
PDF file |
Title | Analysis of Corner Conditions in PVT Variations and Reliability Degradations |
Author | Atsushi Kurokawa, *Masayuki Watanabe, Makoto Hoshi, Tetsuya Kobayashi, Masa-aki Fukase (Hirosaki University, Japan) |
Page | pp. 304 - 309 |
Keyword | variability, reliability, timing analysis, corner model, on-chip variation |
Abstract | Opposite conditions exist between the best/worst cases for PVT variations and those for reliability degradations. There are also gaps between general PVT variation and reliability degradation and those of product specifications that must be guaranteed by timing verification during the design process. We clarify these issues through analysis and then present an approach for design guarantee with realistic best-case/worst-case (BC/WC) corner conditions. Finally, we show the results of analyzing the max conditions of WC corners. |
Title | High Level Synthesis with Stream Query to C Parser: Eliminating Hardware Development Difficulties for Software Developers |
Author | *Eric Shun Fukuda (Hokkaido University, Japan), Takashi Takenaka, Hiroaki Inoue (NEC Corporation, Japan), Hideyuki Kawashima (University of Tsukuba, Japan), Tetsuya Asai, Masato Motomura (Hokkaido University, Japan) |
Page | pp. 310 - 315 |
Keyword | Dynamically Reconfigurable Hardware, Stream Processing, SQL, HLS, C |
Abstract | Recently, reconfigurable hardware has been attracting wide attention as a stream processing platform for its high performance and power efficiency. To allow many software engineers to benefit from reconfigurable hardware, high level synthesis tools have been actively developed. Although these tools have enormously reduced the amount of work and the difficulties involved, users still need hardware development knowledge. In this paper, we introduce a method that parses SQL queries into C code intended for high level synthesis. Our experiments using dynamically reconfigurable hardware that features a high level synthesis tool showed that the hardware's potential was fully extracted and that the developer writing the SQL queries does not need hardware development knowledge. |
Title | Faster Multiple Pattern Matching System on GPU Based on Bit-Parallelism |
Author | *Hirohito Sasakawa, Hiroki Arimura (Hokkaido University, Japan) |
Page | pp. 316 - 321 |
Keyword | GPGPU, extended pattern matching, large-scale pattern matching, bit-parallel method |
Abstract | In this paper, we propose a fast string matching system using a GPU for large-scale string matching. The key to our proposed system is the use of a bit-parallel pattern matching approach for compact and fast parallel simulation of NFA transitions on the GPU. In the experiments, we show the usefulness of our proposed pattern matching system. |
PDF file |
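The bit-parallel NFA simulation mentioned in the abstract above can be illustrated with the classic Shift-And algorithm. The following is a minimal CPU-side sketch for a single exact pattern, not the authors' GPU system or their extended multi-pattern matcher; the function name `shift_and_search` is hypothetical.

```python
def shift_and_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    Bit i of `state` is set when pattern[0..i] matches the text ending at the
    current position, so one integer holds all active NFA states at once.
    """
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set exactly when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit of the final NFA state
    state = 0
    hits: list[int] = []
    for pos, ch in enumerate(text):
        # Shift advances every active state, "| 1" starts a new match attempt,
        # and the mask keeps only states whose pattern character equals ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

Each text character costs one shift, one OR and one AND regardless of how many NFA states are active, which is what makes the approach compact enough to replicate across many parallel threads.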
Title | High-Level Synthesis for Nested Loop Kernels with Non-Uniform Dependencies |
Author | *Akihiro Suda, Hideki Takase, Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan) |
Page | pp. 322 - 327 |
Keyword | High-Level Synthesis, Polyhedral Optimization, Buffering, OpenMP |
Abstract | In high-level synthesis, parallelization for nested loop kernels has been hard due to their complex data dependencies, especially non-uniform dependencies. In this paper, we propose a new method to synthesize a parallelized circuit from such kernels using polyhedral optimization, which has been vigorously studied in the software field. The key point of our contribution is a buffering method for parallel RAM accesses. The experimental result shows that the parallelized circuit with 8 PEs is 5.73 times faster than the sequential one. |
PDF file |
Title | A Fast Simplification Algorithm for Packet Classification |
Author | *Infall Syafalni (Kyushu Institute of Technology, Japan), Tsutomu Sasao (Meiji University, Japan) |
Page | pp. 328 - 333 |
Keyword | Partitioning, Elimination of rules, TCAM, Packet classification |
Abstract | Packet classification is used in various network applications such as firewalls, access control lists, and network address translators. This technology uses ternary content addressable memories (TCAMs) to perform high speed packet forwarding. However, TCAMs dissipate high power and their cost is high. Thus, reduction of TCAMs is crucial. This paper shows a method to simplify rules in TCAMs for packet classification. We partition the rules into groups so that each group has the same source address, destination address and protocol. After that, we simplify rules in each group by removing redundant rules. We developed a computer program to simplify rules among groups. Experimental results show that this method reduces the size of rules up to 57% of the original specification for ACL5 filter, 73% for ACL3 filter, and 87% for overall filters. This algorithm is useful to reduce TCAMs for packet classification. |
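The partition-then-simplify idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the `Rule` layout, first-match semantics, and the single redundancy test used here (a rule whose port range is fully covered by an earlier rule in the same group can never fire) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule layout; earlier rules in the list have higher priority."""
    src: str
    dst: str
    proto: str
    port_lo: int
    port_hi: int
    action: str


def simplify(rules: list[Rule]) -> list[Rule]:
    """Group rules by (src, dst, proto) and drop shadowed rules in each group."""
    groups: dict[tuple[str, str, str], list[tuple[int, Rule]]] = defaultdict(list)
    for idx, rule in enumerate(rules):
        groups[(rule.src, rule.dst, rule.proto)].append((idx, rule))

    kept: list[tuple[int, Rule]] = []
    for members in groups.values():
        survivors: list[tuple[int, Rule]] = []
        for idx, rule in members:
            # Shadowed: an earlier surviving rule in the same group already
            # matches every port this rule matches, so this rule never fires.
            shadowed = any(
                s.port_lo <= rule.port_lo and rule.port_hi <= s.port_hi
                for _, s in survivors
            )
            if not shadowed:
                survivors.append((idx, rule))
        kept.extend(survivors)

    kept.sort(key=lambda pair: pair[0])  # restore the original priority order
    return [rule for _, rule in kept]
```

Because every rule in a group shares the same source, destination and protocol, the redundancy check reduces to a comparison of port ranges, which is the practical benefit of partitioning first.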
Title | A Low Energy Full TMR Design Method with Optimized Selection of Time/Space TMR Mode and Supply Voltage |
Author | *Kazuhito Ito, Yuki Hayashi (Saitama University, Japan) |
Page | pp. 334 - 339 |
Keyword | TMR, Low energy, MIP, Schedule exploration |
Abstract | Triple modular redundancy (TMR) executes an operation three times and obtains the correct result by taking the majority of the three outputs. While TMR is effective in eliminating soft errors in LSIs, its overhead in area and energy consumption is a problem. In addition to the space TMR mode, where the three copies of an operation are actually executed, the time TMR mode is available, where only two copies of an operation are executed and the results are compared; if the results differ, the third copy is executed to get the correct result. With the time TMR mode, the penalty of energy consumption can be reduced. The drawback of time TMR is that it requires a longer time duration. Appropriately selecting the power supply voltage is also an effective technique to reduce the energy consumption. In this paper, a method to derive a TMR design is proposed which selects the TMR mode and supply voltage for each operation to minimize the energy consumption within the time and area constraints. |
PDF file |
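The difference between the space and time TMR modes described in the abstract above can be sketched in software. This is only a behavioral illustration, not the proposed design method; the names `space_tmr`, `time_tmr` and `scripted` are hypothetical, and the returned execution count stands in for energy cost.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable


def space_tmr(op: Callable[[], Hashable]) -> tuple[Hashable, int]:
    """Execute three copies and take the majority. Returns (result, executions)."""
    outputs = [op(), op(), op()]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority among the three outputs")
    return value, 3


def time_tmr(op: Callable[[], Hashable]) -> tuple[Hashable, int]:
    """Execute two copies; run the third only when the first two disagree."""
    first, second = op(), op()
    if first == second:
        return first, 2  # common case: one execution saved
    third = op()
    if third == first or third == second:
        return third, 3
    raise RuntimeError("all three outputs differ")


def scripted(outputs: Iterable[Hashable]) -> Callable[[], Hashable]:
    """Test helper: an operation that returns the given outputs in order."""
    it = iter(outputs)
    return lambda: next(it)
```

With a fault-free operation, `time_tmr` finishes after two executions, while a single corrupted output forces the third execution and therefore the longer duration noted in the abstract.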
Title | Via-Configurable Structured ASIC Using Dual Supply Voltages |
Author | Ta-Kai Lin (Yuan Ze University, Taiwan), Kuen-Wey Lin (National Chiao Tung University, Taiwan), Chang-Hao Chiu, *Rung-Bin Lin (Yuan Ze University, Taiwan) |
Page | pp. 340 - 341 |
Keyword | Dual supply voltages, Structured ASIC, Level converter, Low power |
Abstract | This paper presents a via-configurable logic block and a design methodology for realizing a fine-grained, dual-supply-voltage structured ASIC. Our results show that, given various timing budgets, our approach achieves a reduction of up to 44% in energy per switching of our dual-supply-voltage structured ASIC at the expense of a 1.6% overhead for level converters. |
Title | Automatic On-Chip Interface Synthesis Between Incompatible Protocols with Advanced Features |
Author | *Jiayi Zhang, Masahiro Fujita (University of Tokyo, Japan) |
Page | pp. 342 - 347 |
Keyword | Protocol, Conversion |
Abstract | A system-on-chip contains individual processing and peripheral components connected together. Hardware module reuse is a standard solution to the problem of increasing complexity of chip architectures and growing pressure to reduce time to market. In the absence of a single module interface standard, integration of pre-designed modules often requires the use of protocol converters to solve the mismatches. Mismatches occur when the exchange of control signals and/or data between components is not consistent with the intended behavior of their interaction. Complete automation of the converter synthesis process can save time and effort in both the design and verification phases and reduce the risk of human error. The ability of the converter to deal with data mismatches and clock mismatches is essential for industrial usage. In this paper, we propose a method to automatically synthesize the protocol converter between incompatible protocols. Our method is applicable to complex protocols used by industries and handles advanced features such as data width mismatch and multiple clock domains. |
Title | Low-Power Op-Amp with Capacitor-Base On-Chip Power Supply |
Author | *Kazuhiro Hanada, Shigetoshi Nakatake (The University of Kitakyushu, Japan) |
Page | pp. 348 - 353 |
Keyword | on-chip power supply, energy harvesting system, sensor IC, low-power analog circuit |
Abstract | This paper presents a low-power analog system with a mechanism that provides a power supply via a rechargeable capacitor. The system is promising for sensor systems with an energy harvesting mechanism. We implement a capacitor-based power supply using a MIM structure, and provide a case study in which a nano-watt op-amp operates in the proposed system. The simulation results show that the op-amp works for an hour by 1 µF charge to the capacitor. |
PDF file |
Title | A Basic-Block Level Optimistic Energy Estimation for Power-Gated VLIW Data-Path Model |
Author | *Shunsuke Nakamura, Ittetsu Taniguchi, Hiroyuki Tomiyama, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 354 - 359 |
Keyword | Energy estimation, VLIW data-path, Power-gating |
Abstract | This paper proposes a basic-block level optimistic energy estimation for a power-gated very long instruction word (VLIW) data-path model. Power gating (PG) brings a large benefit for leakage power reduction, but it makes instruction scheduling difficult because applying PG usually takes dozens or hundreds of consecutive NOP cycles. To estimate the energy consumption of such a power-gated VLIW data-path, an optimization of instruction scheduling is necessary. The proposed method enables fast and accurate energy estimation without time-consuming instruction scheduling. Experimental results demonstrated the effectiveness of the proposed method. |
Title | A Memory-Saving Technique for 4K Super-Resolution Circuit with Binary Tree Dictionary |
Author | *Ayumi Kiriyama, Ryo Matsuzuka, Kohei Michibata, Takahiro Kitayama, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 360 - 365 |
Keyword | learning-based super-resolution, memory-saving, hardware architecture |
Abstract | In this paper, we propose a memory-saving technique for a 4K super-resolution circuit with a binary tree dictionary. In the conventional architecture, 8 super-resolution circuits work in parallel to output a 4K video signal. Each circuit needs a large dictionary. We propose a memory-saving technique that shares the dictionary. In the proposed architecture, a binary search tree circuit consists of a ROM-read stage and a calculation stage, which enables 8 super-resolution circuits to access a single ROM in parallel. Moreover, we propose a memory compaction technique for the binary tree dictionary. All nodes of the tree are stored on the ROM without gaps. Since each node has the addresses of its child nodes on the ROM, we can trace the tree easily. Experimental results have shown that our architecture can reduce memory area by 87%. |
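The memory compaction idea in the abstract above, where nodes are stored contiguously and each node carries the addresses of its children, can be sketched as a flat array. The node layout below (a feature index and threshold per internal node, a payload per leaf) and the names `pack` and `lookup` are assumptions made for illustration, not the paper's actual ROM format.

```python
# Nested input form (hypothetical):
#   leaf:     ("leaf", payload)
#   internal: ("node", feature_index, threshold, left_subtree, right_subtree)


def pack(tree: tuple) -> list:
    """Flatten a nested tree into a gap-free list; children become addresses."""
    rom: list = []

    def emit(node: tuple) -> int:
        addr = len(rom)
        if node[0] == "leaf":
            rom.append(("leaf", node[1]))
            return addr
        _, feature, threshold, left, right = node
        rom.append(None)  # reserve this slot, fill in once child addresses are known
        left_addr = emit(left)
        right_addr = emit(right)
        rom[addr] = ("node", feature, threshold, left_addr, right_addr)
        return addr

    emit(tree)
    return rom


def lookup(rom: list, features: list) -> object:
    """Trace the tree from address 0 by following stored child addresses."""
    addr = 0
    while True:
        entry = rom[addr]
        if entry[0] == "leaf":
            return entry[1]
        _, feature, threshold, left_addr, right_addr = entry
        addr = left_addr if features[feature] < threshold else right_addr
```

Because child addresses are stored explicitly, an unbalanced tree occupies exactly one slot per node, with no padding for missing children as an implicit heap-style layout would require.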
Title | HLS Utilizing Area Optimizing Method for High-Definition MRA-TV Denoise Circuit |
Author | *Eita Kobayashi (NEC Corporation, Japan), Kenta Senzaki, Atsufumi Shibayama (NEC Corporation, Japan), Yuichi Nakamura (NEC Corporation, Japan) |
Page | pp. 366 - 371 |
Keyword | Circuits, Optimization, Design Methodology, High-Level Synthesis, Denoise |
Abstract | This work proposes an area optimization method for high-definition image denoising at full HD image resolution. Conventional denoising techniques have a common defect: the outlines of objects are blurred as the strength of the noise reduction increases. To address this, we develop an MRA-TV algorithm that combines the wavelet transform with TV norm optimization to keep outlines clear. This method enables high-quality image denoising while maintaining clear outlines. However, there is a fundamental problem that an MRA-TV circuit with iterative TVs requires a large implementation due to the size of the TV module. In this work, we achieve a significant improvement in that area by combining a reduction of the calculation with resource sharing utilizing high-level synthesis. Evaluation results show a 52% area reduction while maintaining throughput or latency. |
PDF file |
Title | A Circuit Design Method for Dynamic Reconfigurable Circuits |
Author | *Hajime Sawano, Takashi Kambe (Kinki University, Japan) |
Page | pp. 372 - 376 |
Keyword | Reconfigurable Computing, Design Method, DAPDNA-2, JPEG encoder |
Abstract | Reconfigurable Computing (RC) is a new paradigm that addresses the conflicting design requirements of high performance and high area density. In Coarse Grained Architecture (CGA) RC systems, it is important to achieve acceleration using pipelining and also to achieve a high PE utilization ratio. This paper proposes an interactive circuit design methodology for Dynamically Reconfigurable Processors to accelerate their performance and achieve compact, low power circuits. The method is applied to a JPEG encoder design and its performance is evaluated. |
PDF file |
Title | Concurrent Verification Experience of Cache Protocol in Real Development of Large SMP Server Product by Using Model Checking |
Author | *Toru Shonai (Hitachi, Ltd., Japan), Shoichi Hanaki (OKANO Electric Co., Ltd, Japan), Yoshiaki Kinoshita (Hitachi, Ltd., Japan) |
Page | pp. 377 - 382 |
Keyword | model checking, formal verification, cache protocol, product development, high-end server |
Abstract | We verified the cache protocol by using model checking in the real development of a highly multiple-CPU server product. A formal verification engineer abstracted the models for model checking several times through the design process from the protocol specifications written in natural language by the architect team. We discovered nine actual, complicated protocol bugs, acknowledged by the architects, in advance of logic simulation. Some bugs we found were too complicated to be replicated in logic simulation. This effort surely shortened the total design duration. We proved the effectiveness of formal verification of cache protocols in the early design phase of real server product development. |
PDF file |
Title | Implementation of Strictly Convex QP Solver with Multiple Precision Arithmetic |
Author | *Masahiro Kimura, Hiroshige Dan (Kansai University, Japan) |
Page | pp. 383 - 386 |
Keyword | Strictly convex QP, Multiple precision arithmetic, Solver |
Abstract | Optimization solvers are usually implemented with so-called double precision arithmetic because it has been defined rigorously in the IEEE 754-1985 standard and can perform high-speed floating point arithmetic. Double precision arithmetic for optimization basically works well, but it sometimes fails to solve some ill-posed problems. On the other hand, multiple precision arithmetic has attracted much attention recently. In this research, we implemented a solver for strictly convex QPs by using multiple precision arithmetic. |
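Why higher precision helps on ill-conditioned problems can be illustrated with the simplest strictly convex QP, the unconstrained case: minimizing (1/2) x^T Q x + c^T x with Q symmetric positive definite reduces to solving Q x = -c. The sketch below is not the authors' solver; it substitutes exact rational arithmetic from Python's standard `fractions` module for multiple precision floating point, uses a Hilbert matrix as an assumed ill-conditioned test case, and the names `solve_linear`, `hilbert` and `solve_qp` are hypothetical.

```python
from fractions import Fraction


def solve_linear(a: list, b: list) -> list:
    """Gauss-Jordan elimination with partial pivoting for any exact or float type."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def hilbert(n: int, num=Fraction) -> list:
    """n x n Hilbert matrix: symmetric positive definite and ill-conditioned."""
    return [[num(1) / num(i + j + 1) for j in range(n)] for i in range(n)]


def solve_qp(q: list, c: list) -> list:
    """Minimizer of (1/2) x^T Q x + c^T x, i.e. the solution of Q x = -c."""
    return solve_linear(q, [-ci for ci in c])
```

Building `c` so that the known minimizer is a vector of ones and calling `solve_qp(hilbert(10), c)` recovers that vector exactly with `Fraction` entries. Passing `num=float` to `hilbert` runs the same code in double precision, where the computed solution typically loses several digits for this matrix.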