SASIMI 2013
The 18th Workshop on Synthesis And System Integration of Mixed Information Technologies

Poster V
Time: 15:30 - 17:00 Tuesday, October 22, 2013
Location: Tanchō-Hakuchō 1 & Kujyaku
Chairs: Yuko Hara-Azumi (Nara Institute of Science and Technology, Japan), Masashi Imai (Hirosaki University, Japan)

R5-1 (Time: 15:30 - 15:32)
Title: A Study of ESD Clamp Placement Impact on Peripheral- and Area-I/O Designs
Author: *Yi-Cheng Liang, Hung-Ming Chen, Ming-Fang Lai (National Chiao Tung University, Taiwan)
Page: pp. 292 - 297
Keyword: ESD, I/O Placement
Abstract: Area-I/O style flip-chip designs are now used in mainstream high-end electronics products because of their higher performance and better noise control in high-density microsystem designs. Among the design requirements in such microsystems and packaging, electrostatic discharge (ESD) is still one of the most important reliability concerns. The conventional I/O ring has been used for a long time; however, it increases the connection distance in flip-chip designs. In this study, we analyze rule-of-thumb principles and develop a new I/O distribution structure. In our analysis, the new area-I/O structure greatly improves ESD clamp protection over peripheral I/O, and novel cell assignment strategies on this structure yield fewer ESD violations than a general assignment method. Our method can be easily applied in the usual design flow, especially for state-of-the-art area-I/O style cases.

R5-2 (Time: 15:32 - 15:34)
Title: Customizable Hardware Architecture of Support Vector Machine in CAD System for Colorectal Endoscopic Images with NBI Magnification
Author: *Satoshi Shigemi, Tsubasa Mishima, Anh-Tuan Hoang, Tetsushi Koide, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Yoko Kominami, Rie Miyaki, Taiji Matsuo, Shigeto Yoshida, Shinji Tanaka (Hiroshima University, Japan)
Page: pp. 298 - 303
Keyword: Colorectal Endoscopic Images with NBI Magnification, Support Vector Machine (SVM), Computer-Aided Diagnosis (CAD), FPGA
Abstract: With the increase in colorectal cancer patients in recent years, the need for quantitative evaluation of colorectal cancer has grown, and a computer-aided diagnosis (CAD) system that supports doctors' diagnoses is essential. In this paper, a hardware design of the type identification module in a CAD system for colorectal endoscopic images with narrow band imaging (NBI) magnification [1] is proposed for real-time processing of full high definition (Full HD) images (1920 x 1080 pixels). As a result, real-time processing in our system becomes possible. In addition, to improve the identification accuracy for type B (TA: tubular adenoma) and type C3 (SM-m cancer), algorithms that realize 3-class identification with high efficiency and high accuracy are proposed.

R5-3 (Time: 15:34 - 15:36)
Title: Analysis of Corner Conditions in PVT Variations and Reliability Degradations
Author: Atsushi Kurokawa, *Masayuki Watanabe, Makoto Hoshi, Tetsuya Kobayashi, Masa-aki Fukase (Hirosaki University, Japan)
Page: pp. 304 - 309
Keyword: variability, reliability, timing analysis, corner model, on-chip variation
Abstract: Opposite conditions exist between the best/worst cases for PVT variations and reliability degradations. There are also gaps between general PVT variation and reliability degradation and the product specifications that must be guaranteed by timing verification during the design process. We clarify these issues through analysis and then present an approach for design guarantee with realistic best-case/worst-case (BC/WC) corner conditions. Finally, we show the results of analyzing the max conditions of WC corners.

R5-4 (Time: 15:36 - 15:38)
Title: High Level Synthesis with Stream Query to C Parser: Eliminating Hardware Development Difficulties for Software Developers
Author: *Eric Shun Fukuda (Hokkaido University, Japan), Takashi Takenaka, Hiroaki Inoue (NEC Corporation, Japan), Hideyuki Kawashima (University of Tsukuba, Japan), Tetsuya Asai, Masato Motomura (Hokkaido University, Japan)
Page: pp. 310 - 315
Keyword: Dynamically Reconfigurable Hardware, Stream Processing, SQL, HLS, C
Abstract: Recently, reconfigurable hardware has been attracting wide attention as a stream processing platform for its high performance and power efficiency. To allow many software engineers to benefit from reconfigurable hardware, high level synthesis tools have been actively developed. Although these tools have enormously reduced the amount of work and the difficulties involved, users still need hardware development knowledge. In this paper, we introduce a method that parses SQL queries into C code intended for high level synthesis. Our experiments using dynamically reconfigurable hardware that features a high level synthesis tool showed that the hardware's potential was fully extracted and that the developer writing the SQL queries does not need hardware development knowledge.

R5-5 (Time: 15:38 - 15:40)
Title: Faster Multiple Pattern Matching System on GPU Based on Bit-Parallelism
Author: *Hirohito Sasakawa, Hiroki Arimura (Hokkaido University, Japan)
Page: pp. 316 - 321
Keyword: GPGPU, extended pattern matching, large-scale pattern matching, bit-parallel method
Abstract: In this paper, we propose a fast string matching system using a GPU for large-scale string matching. The key to our proposed system is the use of a bit-parallel pattern matching approach for compact and fast parallel simulation of NFA transitions on the GPU. In the experiments, we show the usefulness of our proposed pattern matching system.
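
To illustrate what bit-parallel simulation of NFA transitions means, the following is a minimal single-pattern sketch of the classic Shift-And technique running on a CPU. It is not the authors' GPU system; the function name `shift_and_search` and the single-pattern restriction are assumptions made for illustration.

```python
def shift_and_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    Bit i of `state` is set when pattern[0..i] matches the text ending at
    the current position, so one integer holds the whole NFA state set.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)

    # masks[ch] has bit i set wherever pattern[i] == ch.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit of the final NFA state
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Advance every active state by one character and restart at state 0,
        # then keep only the states whose expected character equals ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits
```

Each text character costs one shift, one OR and one AND regardless of how many NFA states are active, which is the property that makes the approach attractive for parallel hardware.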

R5-6 (Time: 15:40 - 15:42)
Title: High-Level Synthesis for Nested Loop Kernels with Non-Uniform Dependencies
Author: *Akihiro Suda, Hideki Takase, Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan)
Page: pp. 322 - 327
Keyword: High-Level Synthesis, Polyhedral Optimization, Buffering, OpenMP
Abstract: In high-level synthesis, parallelizing nested loop kernels has been hard because of their complex data dependencies, especially non-uniform dependencies. In this paper, we propose a new method to synthesize a parallelized circuit from such kernels using polyhedral optimization, which has been vigorously studied in the software field. The key point of our contribution is a buffering method for parallel RAM accesses. The experimental results show that the parallelized circuit with 8 PEs is 5.73 times faster than the sequential one.

R5-7 (Time: 15:42 - 15:44)
Title: A Fast Simplification Algorithm for Packet Classification
Author: *Infall Syafalni (Kyushu Institute of Technology, Japan), Tsutomu Sasao (Meiji University, Japan)
Page: pp. 328 - 333
Keyword: Partitioning, Elimination of rules, TCAM, Packet classification
Abstract: Packet classification is used in various network applications such as firewalls, access control lists, and network address translators. This technology uses ternary content addressable memories (TCAMs) to perform high-speed packet forwarding. However, TCAMs dissipate high power and their cost is high. Thus, reduction of TCAMs is crucial. This paper shows a method to simplify rules in TCAMs for packet classification. We partition the rules into groups so that each group has the same source address, destination address, and protocol. After that, we simplify the rules in each group by removing redundant rules. We developed a computer program to simplify rules among groups. Experimental results show that this method reduces the size of rules up to 57% of the original specification for the ACL5 filter, 73% for the ACL3 filter, and 87% for the filters overall. This algorithm is useful for reducing TCAMs for packet classification.
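
The partition-then-simplify idea in this abstract can be sketched as follows. This is a hedged illustration only: the `Rule` fields, the use of a destination port range as the remaining field, first-match priority order, and the particular redundancy test (a rule whose port range is contained in that of an earlier rule of the same group) are all assumptions, not the authors' algorithm.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    # Hypothetical rule layout; a real filter has more fields.
    src: str
    dst: str
    proto: str
    port_lo: int
    port_hi: int
    action: str


def simplify(rules: list[Rule]) -> list[Rule]:
    """Drop rules that can never fire under first-match semantics.

    Rules are grouped by identical (src, dst, proto). Within a group, a rule
    whose port range lies inside the range of an earlier kept rule is
    shadowed: every packet it matches is already matched earlier.
    """
    groups: dict[tuple[str, str, str], list[Rule]] = defaultdict(list)
    kept: list[Rule] = []
    for rule in rules:  # input order is priority order
        key = (rule.src, rule.dst, rule.proto)
        shadowed = any(
            k.port_lo <= rule.port_lo and rule.port_hi <= k.port_hi
            for k in groups[key]
        )
        if not shadowed:
            groups[key].append(rule)
            kept.append(rule)
    return kept
```

Grouping first keeps the redundancy check local: a rule is only compared with rules that share its address and protocol fields, rather than with the whole rule list.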

R5-8 (Time: 15:44 - 15:46)
Title: A Low Energy Full TMR Design Method with Optimized Selection of Time/Space TMR Mode and Supply Voltage
Author: *Kazuhito Ito, Yuki Hayashi (Saitama University, Japan)
Page: pp. 334 - 339
Keyword: TMR, Low energy, MIP, Schedule exploration
Abstract: Triple modular redundancy (TMR) executes an operation three times and obtains the correct result by taking the majority of the three outputs. While TMR is effective in eliminating soft errors in LSIs, its area and energy overhead is a problem. In addition to the space TMR mode, where three copies of an operation are actually executed, a time TMR mode is available, where only two copies of an operation are executed and their results are compared; if the results differ, the third copy is executed to obtain the correct result. With the time TMR mode, the energy penalty can be reduced. The drawback of time TMR is that it requires a longer time. Appropriately selecting the power supply voltage is also an effective technique for reducing energy consumption. In this paper, a method to derive a TMR design is proposed which selects the TMR mode and supply voltage for each operation to minimize energy consumption within the time and area constraints.
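
The difference between the two TMR modes described above can be shown with a small software sketch. The function names `space_tmr` and `time_tmr` are hypothetical, and the returned execution count is only a stand-in for energy; the paper's actual cost model and scheduling method are not reproduced here.

```python
from collections import Counter
from typing import Callable


def majority(a: int, b: int, c: int) -> int:
    """Return the value that appears at least twice among three results."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among the three results")
    return value


def space_tmr(op: Callable[[int], int], x: int) -> tuple[int, int]:
    """Space TMR: always run three copies and vote. Returns (result, runs)."""
    return majority(op(x), op(x), op(x)), 3


def time_tmr(op: Callable[[int], int], x: int) -> tuple[int, int]:
    """Time TMR: run two copies, and a third only when they disagree."""
    first, second = op(x), op(x)
    if first == second:
        return first, 2  # common case: one execution saved
    third = op(x)
    return majority(first, second, third), 3
```

In the fault-free case `time_tmr` performs two executions instead of three, which reflects the energy saving the abstract mentions; the third execution, when needed, runs after the comparison, which is where the longer time comes from.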

R5-9s (Time: 15:46 - 15:48)
Title: Via-Configurable Structured ASIC Using Dual Supply Voltages
Author: Ta-Kai Lin (Yuan Ze University, Taiwan), Kuen-Wey Lin (National Chiao Tung University, Taiwan), Chang-Hao Chiu, *Rung-Bin Lin (Yuan Ze University, Taiwan)
Page: pp. 340 - 341
Keyword: Dual supply voltages, Structured ASIC, Level converter, Low power
Abstract: This paper presents a via-configurable logic block and a design methodology for realizing a fine-grained, dual-supply-voltage structured ASIC. Our results show that, given various timing budgets, our approach achieves a reduction of up to 44% in energy per switching for our dual-supply-voltage structured ASIC at the expense of a 1.6% overhead for level converters.

R5-10 (Time: 15:48 - 15:50)
Title: Automatic On-Chip Interface Synthesis Between Incompatible Protocols with Advanced Features
Author: *Jiayi Zhang, Masahiro Fujita (University of Tokyo, Japan)
Page: pp. 342 - 347
Keyword: Protocol, Conversion
Abstract: A system-on-chip contains individual processing and peripheral components connected together. Hardware module reuse is a standard solution to the problem of increasing complexity of chip architectures and growing pressure to reduce time to market. In the absence of a single module interface standard, integration of pre-designed modules often requires protocol converters to resolve the mismatches. Mismatches occur when the exchange of control signals and/or data between components is not consistent with the intended behavior of their interaction. Complete automation of the converter synthesis process can save time and effort in both the design and verification phases and reduce the risk of human error. The ability of the converter to deal with data mismatches and clock mismatches is essential for industrial usage. In this paper, we propose a method to automatically synthesize a protocol converter between incompatible protocols. Our method is applicable to complex protocols used in industry and handles advanced features such as data width mismatch and multiple clock domains.

R5-11 (Time: 15:50 - 15:52)
Title: Low-Power Op-Amp with Capacitor-Base On-Chip Power Supply
Author: *Kazuhiro Hanada, Shigetoshi Nakatake (The University of Kitakyushu, Japan)
Page: pp. 348 - 353
Keyword: on-chip power supply, energy harvesting system, sensor IC, low-power analog circuit
Abstract: This paper presents a low-power analog system with a mechanism that provides a power supply via a rechargeable capacitor. The system is promising for sensor systems with an energy harvesting mechanism. We implement a capacitor-base power supply using a MIM structure, and provide a case study in which a nano-watt op-amp operates in the proposed system. The simulation results show that the op-amp works for an hour by a 1 µF charge to the capacitor.

R5-12 (Time: 15:52 - 15:54)
Title: A Basic-Block Level Optimistic Energy Estimation for Power-Gated VLIW Data-Path Model
Author: *Shunsuke Nakamura, Ittetsu Taniguchi, Hiroyuki Tomiyama, Masahiro Fukui (Ritsumeikan University, Japan)
Page: pp. 354 - 359
Keyword: Energy estimation, VLIW data-path, Power-gating
Abstract: This paper proposes a basic-block level optimistic energy estimation for a power-gated very long instruction word (VLIW) data-path model. Power gating (PG) brings a big benefit for leakage power reduction, but it makes instruction scheduling difficult because applying PG usually takes dozens or hundreds of consecutive NOP cycles. To estimate the energy consumption of such a power-gated VLIW data-path, optimization of the instruction scheduling is necessary. The proposed method enables fast and accurate energy estimation without time-consuming instruction scheduling. Experimental results demonstrated the effectiveness of the proposed method.

R5-13 (Time: 15:54 - 15:56)
Title: A Memory-Saving Technique for 4K Super-Resolution Circuit with Binary Tree Dictionary
Author: *Ayumi Kiriyama, Ryo Matsuzuka, Kohei Michibata, Takahiro Kitayama, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page: pp. 360 - 365
Keyword: learning-based super-resolution, memory-saving, hardware architecture
Abstract: In this paper, we propose a memory-saving technique for a 4K super-resolution circuit with a binary tree dictionary. In the conventional architecture, 8 super-resolution circuits work in parallel to output a 4K video signal. Each circuit needs a large dictionary. We propose a memory-saving technique that shares the dictionary. In the proposed architecture, a binary search tree circuit consists of a ROM-read stage and a calculation stage, which enables 8 super-resolution circuits to access a single ROM in parallel. Moreover, we propose a memory compaction technique for the binary tree dictionary. All nodes of the tree are stored in the ROM without gaps. Since each node holds the ROM addresses of its child nodes, we can trace the tree easily. Experimental results have shown that our architecture can reduce the memory area by 87%.
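
The compaction idea, storing all tree nodes contiguously with each node carrying the addresses of its children, can be sketched in software as below. The node layout (`threshold`, `left`, `right`, `payload`), the scalar comparison, and the helper names `pack` and `lookup` are assumptions for illustration; the actual ROM word format and the feature comparison used in the circuit are not given in the abstract.

```python
from typing import NamedTuple, Union


class Node(NamedTuple):
    threshold: float  # split value (unused in leaves)
    left: int         # address of left child, -1 in leaves
    right: int        # address of right child, -1 in leaves
    payload: int      # dictionary entry id in leaves, -1 in internal nodes


# A tree is either a leaf id (int >= 0) or (threshold, left_tree, right_tree).
Tree = Union[int, tuple]


def _pack(tree: Tree, rom: list) -> int:
    """Append the subtree to rom and return the address of its root."""
    addr = len(rom)
    if isinstance(tree, int):
        rom.append(Node(0.0, -1, -1, tree))
        return addr
    threshold, left, right = tree
    rom.append(None)  # reserve this slot; children follow immediately
    left_addr = _pack(left, rom)
    right_addr = _pack(right, rom)
    rom[addr] = Node(threshold, left_addr, right_addr, -1)
    return addr


def pack(tree: Tree) -> list[Node]:
    """Lay the whole tree out in one gap-free list standing in for the ROM."""
    rom: list = []
    _pack(tree, rom)
    return rom


def lookup(rom: list[Node], x: float) -> int:
    """Trace from the root (address 0) to a leaf by following child addresses."""
    addr = 0
    while rom[addr].payload < 0:
        node = rom[addr]
        addr = node.left if x < node.threshold else node.right
    return rom[addr].payload
```

Because child addresses are stored explicitly, an unbalanced tree occupies exactly one slot per node, whereas an implicit layout (children at 2i+1 and 2i+2) would leave unused slots for missing nodes.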

R5-14 (Time: 15:56 - 15:58)
Title: HLS Utilizing Area Optimizing Method for High-Definition MRA-TV Denoise Circuit
Author: *Eita Kobayashi (NEC Corporation, Japan), Kenta Senzaki, Atsufumi Shibayama (NEC Corporation, Japan), Yuichi Nakamura (NEC Corporation, Japan)
Page: pp. 366 - 371
Keyword: Circuits, Optimization, Design Methodology, High-Level Synthesis, Denoise
Abstract: This work proposes an area optimization method for high-definition image denoising at full HD image resolution. Conventional denoising techniques have a common defect: the outlines of objects are blurred as the strength of the noise reduction increases. To address this, we developed an MRA-TV algorithm that combines the wavelet transform with TV norm optimization to keep outlines clear. This method enables high-quality image denoising while maintaining clear outlines. However, there is a fundamental problem that an MRA-TV circuit with iterative TVs requires a large implementation area due to the size of the TV module. In this work, we achieve a significant improvement in that area by combining reduction of the calculation with resource sharing utilizing high-level synthesis. Evaluation results show a 52% area reduction while maintaining throughput or latency.

R5-15 (Time: 15:58 - 16:00)
Title: A Circuit Design Method for Dynamic Reconfigurable Circuits
Author: *Hajime Sawano, Takashi Kambe (Kinki University, Japan)
Page: pp. 372 - 376
Keyword: Reconfigurable Computing, Design Method, DAPDNA-2, JPEG encoder
Abstract: Reconfigurable Computing (RC) is a new paradigm that addresses the conflicting design requirements of high performance and high area density. In Coarse Grained Architecture (CGA) RC systems, it is important to achieve acceleration using pipelining and also to achieve a high PE utilization ratio. This paper proposes an interactive circuit design methodology for Dynamically Reconfigurable Processors to accelerate their performance and achieve compact, low-power circuits. The method is applied to a JPEG encoder design and its performance is evaluated.

R5-16 (Time: 16:00 - 16:02)
Title: Concurrent Verification Experience of Cache Protocol in Real Development of Large SMP Server Product by Using Model Checking
Author: *Toru Shonai (Hitachi, Ltd., Japan), Shoichi Hanaki (OKANO Electric Co., Ltd, Japan), Yoshiaki Kinoshita (Hitachi, Ltd., Japan)
Page: pp. 377 - 382
Keyword: model checking, formal verification, cache protocol, product development, high-end server
Abstract: We verified a cache protocol by using model checking in the real development of a highly multiple-CPU server product. A formal verification engineer abstracted the models for model checking several times through the design process from the protocol specifications written in natural language by the architect team. We discovered nine actual, complicated protocol bugs, which the architects acknowledged, in advance of logic simulation. Some bugs we found were too complicated to be replicated in logic simulation. This effort surely shortened the total design duration. We proved the effectiveness of formal verification of cache protocols in the early design phase of real server product development.

R5-17 (Time: 16:02 - 16:04)
Title: Implementation of Strictly Convex QP Solver with Multiple Precision Arithmetic
Author: *Masahiro Kimura, Hiroshige Dan (Kansai University, Japan)
Page: pp. 383 - 386
Keyword: Strictly convex QP, Multiple precision arithmetic, Solver
Abstract: Optimization solvers are usually implemented with so-called double precision arithmetic because it has been defined rigorously in the IEEE 754-1985 standard and can perform high-speed floating point arithmetic. Double precision arithmetic basically works well for optimization, but it sometimes fails to solve ill-posed problems. On the other hand, multiple precision arithmetic has attracted much attention recently. In this research, we implemented a solver for strictly convex QPs by using multiple precision arithmetic.
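
A small example shows why double precision can fail on an ill-conditioned strictly convex QP while higher precision succeeds. This is a sketch under stated assumptions, not the authors' solver: it handles only an unconstrained 2x2 problem, minimizing 0.5 x^T Q x + c^T x by solving Q x = -c with Cramer's rule, and the function name `solve_qp_2x2` and the use of Python's standard `decimal` module are illustrative choices.

```python
from decimal import Decimal, getcontext


def solve_qp_2x2(q11, q12, q22, c1, c2):
    """Minimize 0.5*x^T Q x + c^T x for a symmetric 2x2 matrix Q.

    The minimizer satisfies Q x = -c; it is computed with Cramer's rule in
    whatever number type the arguments use (float or Decimal).
    """
    det = q11 * q22 - q12 * q12
    if det == 0:
        raise ZeroDivisionError("Q is singular at this working precision")
    x1 = (-c1 * q22 + q12 * c2) / det
    x2 = (-q11 * c2 + q12 * c1) / det
    return x1, x2


# Q = [[1, 1], [1, 1 + eps]] is positive definite for eps > 0, but with
# eps = 1e-20 the entry 1 + eps rounds to exactly 1.0 in double precision.
getcontext().prec = 50  # 50 significant decimal digits
eps = Decimal("1e-20")
one = Decimal(1)
x_multi = solve_qp_2x2(one, one, one + eps, Decimal(-2), -(Decimal(2) + eps))
```

With 50 digits the determinant is exactly `eps` and the solution is recovered; calling the same function with Python floats makes the determinant evaluate to zero, so the problem appears singular.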