SASIMI 2013 Technical Program

SASIMI 2013
The 18th Workshop on Synthesis And System Integration of Mixed Information Technologies

Poster V
Time: 15:30 - 17:00 Tuesday, October 22, 2013
Location: Tanchō-Hakuchō 1 & Kujyaku
Chairs: Yuko Hara-Azumi (Nara Institute of Science and Technology, Japan), Masashi Imai (Hirosaki University, Japan)

R5-1 (Time: 15:30 - 15:32)

Title	A Study of ESD Clamp Placement Impact on Peripheral- and Area-I/O Designs
Author	*Yi-Cheng Liang, Hung-Ming Chen, Ming-Fang Lai (National Chiao Tung University, Taiwan)
Page	pp. 292 - 297
Keyword	ESD, I/O Placement
Abstract	Area-I/O style flip-chip designs are now used in mainstream high-end electronics products due to their higher performance and better noise control in high-density microsystem designs. Among the design requirements in such microsystems and packaging, electrostatic discharge (ESD) is still one of the most important reliability concerns. The conventional I/O ring has been used for a long time; however, it increases the connection distance in flip-chip designs. In this study, we analyze rule-of-thumb principles and develop a new I/O distribution structure. In our analysis, the new structure in area-I/O offers a large improvement in ESD clamp protection over peripheral I/O, and novel cell assignment strategies on this structure yield fewer ESD violations than a general assignment method. Our method can be easily applied in the usual design flow, especially with state-of-the-art area-I/O style cases.

R5-2 (Time: 15:32 - 15:34)

Title	Customizable Hardware Architecture of Support Vector Machine in CAD System for Colorectal Endoscopic Images with NBI Magnification
Author	*Satoshi Shigemi, Tsubasa Mishima, Anh-Tuan Hoang, Tetsushi Koide, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Yoko Kominami, Rie Miyaki, Taiji Matsuo, Shigeto Yoshida, Shinji Tanaka (Hiroshima University, Japan)
Page	pp. 298 - 303
Keyword	Colorectal Endoscopic Images with NBI Magnification, Support Vector Machine (SVM), Computer-Aided Diagnosis (CAD), FPGA
Abstract	With the increase in colorectal cancer patients in recent years, the need for quantitative evaluation of colorectal cancer has grown, and a computer-aided diagnosis (CAD) system that supports the doctor's diagnosis is essential. In this paper, a hardware design of the type identification module in a CAD system for colorectal endoscopic images with narrow band imaging (NBI) magnification [1] is proposed for real-time processing of full high definition (Full HD) images (1920 x 1080 pixels). As a result, real-time processing in our system becomes possible. In addition, in order to improve the identification accuracy for type B (TA: tubular adenoma) and type C3 (SM-m cancer), algorithms that realize 3-class identification with high efficiency and high accuracy are proposed.

R5-3 (Time: 15:34 - 15:36)

Title	Analysis of Corner Conditions in PVT Variations and Reliability Degradations
Author	Atsushi Kurokawa, *Masayuki Watanabe, Makoto Hoshi, Tetsuya Kobayashi, Masa-aki Fukase (Hirosaki University, Japan)
Page	pp. 304 - 309
Keyword	variability, reliability, timing analysis, corner model, on-chip variation
Abstract	Opposite conditions exist between the best and worst cases for PVT variations and reliability degradations. There are also gaps between general PVT variation and reliability degradation and those in the product specifications that must be guaranteed by timing verification during the design process. We clarify these issues through analysis and then present an approach for design guarantee with realistic best-case/worst-case (BC/WC) corner conditions. Finally, we show the results of analyzing the max conditions of WC corners.

R5-4 (Time: 15:36 - 15:38)

Title	High Level Synthesis with Stream Query to C Parser: Eliminating Hardware Development Difficulties for Software Developers
Author	*Eric Shun Fukuda (Hokkaido University, Japan), Takashi Takenaka, Hiroaki Inoue (NEC Corporation, Japan), Hideyuki Kawashima (University of Tsukuba, Japan), Tetsuya Asai, Masato Motomura (Hokkaido University, Japan)
Page	pp. 310 - 315
Keyword	Dynamically Reconfigurable Hardware, Stream Processing, SQL, HLS, C
Abstract	Recently, reconfigurable hardware has been attracting wide attention as a stream processing platform for its high performance and power efficiency. To allow many software engineers to benefit from reconfigurable hardware, high level synthesis tools have been actively developed. Although these tools have enormously reduced the amount of work and difficulty, users still need hardware development knowledge. In this paper, we introduce a method that parses SQL queries into C code intended for high level synthesis. Our experiments using dynamically reconfigurable hardware that features a high level synthesis tool showed that the hardware's potential was fully extracted and that the developer writing the SQL queries does not need hardware development knowledge.

R5-5 (Time: 15:38 - 15:40)

Title	Faster Multiple Pattern Matching System on GPU Based on Bit-Parallelism
Author	*Hirohito Sasakawa, Hiroki Arimura (Hokkaido University, Japan)
Page	pp. 316 - 321
Keyword	GPGPU, extended pattern matching, large-scale pattern matching, bit-parallel method
Abstract	In this paper, we propose a fast string matching system using a GPU for large-scale string matching. The key to our proposed system is the use of a bit-parallel pattern matching approach for compact and fast parallel simulation of NFA transitions on the GPU. In the experiments, we show the usefulness of our proposed pattern matching system.
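
As a rough illustration of what bit-parallel NFA simulation means, the sketch below uses the classic Shift-And method on the CPU for a single pattern. It is not the authors' GPU system, and the function name `shift_and_search` is a hypothetical one chosen for this example. Each bit of `state` stands for one NFA state, so a single shift and a single AND advance every state at once.

```python
def shift_and_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Classic Shift-And: bit i of `state` is set when pattern[:i + 1]
    matches the text ending at the current position.
    """
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set when pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit of the final NFA state
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Advance all active states by one character in parallel,
        # and always allow a fresh match to start (the "| 1").
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

For example, `shift_and_search("abracadabra", "abra")` returns `[0, 7]`. Handling many patterns or extended patterns, as the keywords suggest the paper does, would need a richer state encoding than this single-pattern sketch.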

R5-6 (Time: 15:40 - 15:42)

Title	High-Level Synthesis for Nested Loop Kernels with Non-Uniform Dependencies
Author	*Akihiro Suda, Hideki Takase, Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan)
Page	pp. 322 - 327
Keyword	High-Level Synthesis, Polyhedral Optimization, Buffering, OpenMP
Abstract	In high-level synthesis, parallelization for nested loop kernels has been hard due to their complex data dependencies, especially non-uniform dependencies. In this paper, we propose a new method to synthesize a parallelized circuit from such kernels using polyhedral optimization, which has been vigorously studied in the software field. The key point of our contribution is a buffering method for parallel RAM accesses. The experimental result shows that the parallelized circuit with 8 PEs is 5.73 times faster than the sequential one.

R5-7 (Time: 15:42 - 15:44)

Title	A Fast Simplification Algorithm for Packet Classification
Author	*Infall Syafalni (Kyushu Institute of Technology, Japan), Tsutomu Sasao (Meiji University, Japan)
Page	pp. 328 - 333
Keyword	Partitioning, Elimination of rules, TCAM, Packet classification
Abstract	Packet classification is used in various network applications such as firewalls, access control lists, and network address translators. This technology uses ternary content addressable memories (TCAMs) to perform high speed packet forwarding. However, TCAMs dissipate high power and their cost is high. Thus, reduction of TCAMs is crucial. This paper shows a method to simplify rules in TCAMs for packet classification. We partition the rules into groups so that each group has the same source address, destination address and protocol. After that, we simplify rules in each group by removing redundant rules. We developed a computer program to simplify rules among groups. Experimental results show that this method reduces the size of rules up to 57% of the original specification for ACL5 filter, 73% for ACL3 filter, and 87% for overall filters. This algorithm is useful to reduce TCAMs for packet classification.
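
The partition-then-simplify idea can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' algorithm: the rule layout `(src, dst, proto, (port_lo, port_hi), action)`, the first-match semantics, and the function name `remove_shadowed` are all hypothetical. Within a group that shares source, destination and protocol, a rule is treated as redundant when an earlier rule of the same group already covers its whole port range, since such a rule can never be the first match.

```python
from collections import defaultdict


def remove_shadowed(rules):
    """Drop rules that an earlier rule of the same group fully covers.

    rules: list of (src, dst, proto, (port_lo, port_hi), action),
    evaluated in order with first-match semantics (an assumption).
    """
    # Group key -> port ranges of the rules kept so far in that group.
    kept_ranges = defaultdict(list)
    kept = []
    for rule in rules:
        src, dst, proto, (lo, hi), _action = rule
        key = (src, dst, proto)
        covered = any(klo <= lo and hi <= khi for klo, khi in kept_ranges[key])
        if covered:
            continue  # unreachable under first-match, so safe to drop
        kept_ranges[key].append((lo, hi))
        kept.append(rule)
    return kept
```

Fewer surviving rules means fewer TCAM words. A real simplifier would also merge adjacent ranges and handle ternary prefixes, which this sketch leaves out.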

R5-8 (Time: 15:44 - 15:46)

Title	A Low Energy Full TMR Design Method with Optimized Selection of Time/Space TMR Mode and Supply Voltage
Author	*Kazuhito Ito, Yuki Hayashi (Saitama University, Japan)
Page	pp. 334 - 339
Keyword	TMR, Low energy, MIP, Schedule exploration
Abstract	Triple modular redundancy (TMR) executes an operation three times and obtains the correct result by taking the majority of the three outputs. While TMR is effective in eliminating soft errors in LSIs, its area overhead and energy consumption are a problem. In addition to the space TMR mode, where the three copies of an operation are actually executed, a time TMR mode is available, where only two copies of an operation are executed and the results are compared; if the results differ, the third copy is executed to get the correct result. With the time TMR mode, the energy consumption penalty can be reduced. The drawback of time TMR is that it requires a longer time duration. Appropriately selecting the power supply voltage is also an effective technique to reduce the energy consumption. In this paper, a method to derive a TMR design is proposed which selects the TMR mode and supply voltage for each operation to minimize the energy consumption within the time and area constraints.
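
The difference between the two TMR modes can be shown with a small software model. This is a sketch of the behavior described in the abstract, not the authors' design method; the names `space_tmr`, `time_tmr` and `make_faulty_adder` are hypothetical, and the execution count is used here as a crude stand-in for energy.

```python
from collections import Counter


def space_tmr(op, *args):
    """Run three copies and take the majority. Returns (value, executions)."""
    results = [op(*args) for _ in range(3)]
    value, _votes = Counter(results).most_common(1)[0]
    return value, 3


def time_tmr(op, *args):
    """Run two copies; run a third only if they disagree."""
    first = op(*args)
    second = op(*args)
    if first == second:
        return first, 2  # common case: one execution saved
    third = op(*args)
    value, _votes = Counter([first, second, third]).most_common(1)[0]
    return value, 3


def make_faulty_adder(bad_call):
    """Adder whose bad_call-th invocation (1-based) is off by one,
    modelling a single soft error. bad_call=0 means no error."""
    calls = {"n": 0}

    def add(a, b):
        calls["n"] += 1
        return a + b + (1 if calls["n"] == bad_call else 0)

    return add
```

With no error, `time_tmr(make_faulty_adder(0), 2, 3)` yields `(5, 2)`, while an error in the second execution yields `(5, 3)`. Neither mode recovers if all three results differ; the model then simply returns the first result.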

R5-9s (Time: 15:46 - 15:48)

Title	Via-Configurable Structured ASIC Using Dual Supply Voltages
Author	Ta-Kai Lin (Yuan Ze University, Taiwan), Kuen-Wey Lin (National Chiao Tung University, Taiwan), Chang-Hao Chiu, *Rung-Bin Lin (Yuan Ze University, Taiwan)
Page	pp. 340 - 341
Keyword	Dual supply voltages, Structured ASIC, Level converter, Low power
Abstract	This paper presents a via-configurable logic block and a design methodology for realizing fine-grained, dual-supply-voltage structured ASIC. Our results show that, given various timing budgets, our approach achieves a reduction up to 44% on energy per switching of our dual-supply-voltage structured ASIC at the expense of 1.6% overhead on level converters.

R5-10 (Time: 15:48 - 15:50)

Title	Automatic On-Chip Interface Synthesis Between Incompatible Protocols with Advanced Features
Author	*Jiayi Zhang, Masahiro Fujita (University of Tokyo, Japan)
Page	pp. 342 - 347
Keyword	Protocol, Conversion
Abstract	A system-on-chip contains individual processing and peripheral components connected together. Hardware module reuse is a standard solution to the problem of increasing complexity of chip architectures and growing pressure to reduce time to market. In the absence of a single module interface standard, integration of pre-designed modules often requires the use of protocol converters to resolve the mismatches. Mismatches occur when the exchange of control signals and/or data between components is not consistent with the intended behavior of their interaction. Complete automation of the converter synthesis process can save time and effort in both the design and verification phases and reduce the risk of human error. The ability of the converter to deal with data mismatches and clock mismatches is essential for industrial usage. In this paper, we propose a method to automatically synthesize the protocol converter between incompatible protocols. Our method is applicable to complex protocols used in industry and handles advanced features such as data width mismatch and multiple clock domains.

R5-11 (Time: 15:50 - 15:52)

Title	Low-Power Op-Amp with Capacitor-Base On-Chip Power Supply
Author	*Kazuhiro Hanada, Shigetoshi Nakatake (The University of Kitakyushu, Japan)
Page	pp. 348 - 353
Keyword	on-chip power supply, energy harvesting system, sensor IC, low-power analog circuit
Abstract	This paper presents a low-power analog system with a mechanism which provides a power supply via a rechargeable capacitor. The system is promising for sensor systems with an energy harvesting mechanism. We implement a capacitor-based power supply using a MIM structure, and provide a case study in which a nano-watt op-amp operates in the proposed system. The simulation results show that the op-amp works for an hour by 1 µF charge to the capacitor.

R5-12 (Time: 15:52 - 15:54)

Title	A Basic-Block Level Optimistic Energy Estimation for Power-Gated VLIW Data-Path Model
Author	*Shunsuke Nakamura, Ittetsu Taniguchi, Hiroyuki Tomiyama, Masahiro Fukui (Ritsumeikan University, Japan)
Page	pp. 354 - 359
Keyword	Energy estimation, VLIW data-path, Power-gating
Abstract	This paper proposes a basic-block level optimistic energy estimation for a power-gated very long instruction word (VLIW) data-path model. Power gating (PG) brings a large benefit for leakage power reduction, but it makes instruction scheduling difficult because applying PG usually takes dozens or hundreds of consecutive NOP cycles. To estimate the energy consumption of such a power-gated VLIW data-path, optimization of the instruction scheduling is necessary. The proposed method enables fast and accurate energy estimation without time-consuming instruction scheduling. Experimental results demonstrate the effectiveness of the proposed method.

R5-13 (Time: 15:54 - 15:56)

Title	A Memory-Saving Technique for 4K Super-Resolution Circuit with Binary Tree Dictionary
Author	*Ayumi Kiriyama, Ryo Matsuzuka, Kohei Michibata, Takahiro Kitayama, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page	pp. 360 - 365
Keyword	learning-based super-resolution, memory-saving, hardware architecture
Abstract	In this paper, we propose a memory-saving technique for a 4K super-resolution circuit with a binary tree dictionary. In the conventional architecture, 8 super-resolution circuits work in parallel to output a 4K video signal. Each circuit needs a large dictionary. We propose a memory-saving technique that shares the dictionary. In the proposed architecture, a binary search tree circuit consists of a ROM-read stage and a calculation stage, which enables 8 super-resolution circuits to access a single ROM in parallel. Moreover, we propose a memory compaction technique for the binary tree dictionary. All nodes of the tree are stored on the ROM without gaps. Since each node holds the ROM addresses of its child nodes, we can trace the tree easily. Experimental results show that our architecture can reduce the memory area by 87%.
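
The compaction idea, storing every node contiguously and letting each node carry the addresses of its children, can be sketched in software as below. This is an illustrative model only, not the authors' circuit: the nested-tuple input format, the scalar `threshold` comparison, and the names `pack_tree` and `lookup` are assumptions made for this example, and a Python list stands in for the ROM.

```python
def pack_tree(root):
    """Flatten a binary tree into a gap-free list of records.

    Input format (an assumption): an internal node is a tuple
    (threshold, left_subtree, right_subtree); anything else is a leaf payload.
    Each internal record stores the list indices ("addresses") of its children.
    """
    rom = []

    def place(node):
        addr = len(rom)
        rom.append(None)  # reserve this node's slot before its children
        if isinstance(node, tuple):
            threshold, left, right = node
            left_addr = place(left)
            right_addr = place(right)
            rom[addr] = ("node", threshold, left_addr, right_addr)
        else:
            rom[addr] = ("leaf", node)
        return addr

    place(root)
    return rom


def lookup(rom, key):
    """Trace the tree from address 0 by following stored child addresses."""
    addr = 0
    while rom[addr][0] == "node":
        _tag, threshold, left_addr, right_addr = rom[addr]
        addr = left_addr if key < threshold else right_addr
    return rom[addr][1]
```

Because child addresses are explicit, an unbalanced tree occupies exactly one slot per node, whereas an implicit layout (children of slot i at 2i+1 and 2i+2) would leave unused slots for missing subtrees.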

R5-14 (Time: 15:56 - 15:58)

Title	HLS Utilizing Area Optimizing Method for High-Definition MRA-TV Denoise Circuit
Author	*Eita Kobayashi (NEC Corporation, Japan), Kenta Senzaki, Atsufumi Shibayama (NEC Corporation, Japan), Yuichi Nakamura (NEC Corporation, Japan)
Page	pp. 366 - 371
Keyword	Circuits, Optimization, Design Methodology, High-Level Synthesis, Denoise
Abstract	This work proposes an area optimization method for high-definition image denoising at full HD image resolution. Conventional denoising techniques have a common defect: the outlines of objects blur as the strength of the noise reduction increases. To keep outlines clear, we developed an MRA-TV algorithm that combines the wavelet transform with TV norm optimization. This method enables high-quality image denoising while maintaining clear outlines. However, there is a fundamental problem: an MRA-TV circuit with iterative TVs requires a large implementation due to the size of the TV module. In this work, we achieve a significant improvement in that area by combining a reduction of the calculation with resource sharing using high-level synthesis. Evaluation results show a 52% area reduction while maintaining throughput or latency.

R5-15 (Time: 15:58 - 16:00)

Title	A Circuit Design Method for Dynamic Reconfigurable Circuits
Author	*Hajime Sawano, Takashi Kambe (Kinki University, Japan)
Page	pp. 372 - 376
Keyword	Reconfigurable Computing, Design Method, DAPDNA-2, JPEG encoder
Abstract	Reconfigurable Computing (RC) is a new paradigm that addresses the conflicting design requirements of high performance and high area density. In Coarse Grained Architecture (CGA) RC systems, it is important to achieve acceleration using pipelining and also to achieve a high PE utilization ratio. This paper proposes an interactive circuit design methodology for Dynamically Reconfigurable Processors to accelerate their performance and achieve compact, low power circuits. The method is applied to a JPEG encoder design and its performance is evaluated.

R5-16 (Time: 16:00 - 16:02)

Title	Concurrent Verification Experience of Cache Protocol in Real Development of Large SMP Server Product by Using Model Checking
Author	*Toru Shonai (Hitachi, Ltd., Japan), Shoichi Hanaki (OKANO Electric Co., Ltd, Japan), Yoshiaki Kinoshita (Hitachi, Ltd., Japan)
Page	pp. 377 - 382
Keyword	model checking, formal verification, cache protocol, product development, high-end server
Abstract	We verified a cache protocol by using model checking in the real development of a highly multiple-CPU server product. A formal verification engineer abstracted the models for model checking several times through the design process from the protocol specifications written in natural language by the architect team. We discovered nine actual complicated protocol bugs, acknowledged by the architects, in advance of logic simulation. Some of the bugs we found were too complicated to be replicated in logic simulation. This effort surely shortened the total design duration. We proved the effectiveness of formal verification of cache protocols in the early design phase of real server product development.

R5-17 (Time: 16:02 - 16:04)

Title	Implementation of Strictly Convex QP Solver with Multiple Precision Arithmetic
Author	*Masahiro Kimura, Hiroshige Dan (Kansai University, Japan)
Page	pp. 383 - 386
Keyword	Strictly convex QP, Multiple precision arithmetic, Solver
Abstract	Optimization solvers are usually implemented with so-called double precision arithmetic because it has been defined rigorously in the IEEE 754-1985 standard and can perform high-speed floating point arithmetic. Double precision arithmetic basically works well for optimization, but it sometimes fails to solve some ill-posed problems. On the other hand, multiple precision arithmetic has attracted much attention recently. In this research, we implemented a solver for strictly convex QPs by using multiple precision arithmetic.
