Title | Customizable Hardware Architecture of Support Vector Machine in CAD System for Colorectal Endoscopic Images with NBI Magnification |
Author | *Satoshi Shigemi, Tsubasa Mishima, Anh-Tuan Hoang, Tetsushi Koide, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Yoko Kominami, Rie Miyaki, Taiji Matsuo, Shigeto Yoshida, Shinji Tanaka (Hiroshima University, Japan) |
Page | pp. 298 - 303 |
Keyword | Colorectal Endoscopic Images with NBI Magnification, Support Vector Machine (SVM), Computer-Aided Diagnosis (CAD), FPGA |
Abstract | With the increase in colorectal cancer patients in recent years, the need for quantitative evaluation of colorectal cancer has grown, and a computer-aided diagnosis (CAD) system that supports doctors' diagnoses is essential. In this paper, a hardware design of the type identification module in a CAD system for colorectal endoscopic images with narrow band imaging (NBI) magnification [1] is proposed for real-time processing of full high definition (Full HD) images (1920 x 1080 pixels). As a result, real-time processing in our system becomes possible. In addition, in order to improve the identification accuracy for type B (TA: tubular adenoma) and type C3 (SM-m cancer), an algorithm that realizes 3-class identification with high efficiency and high accuracy is proposed. |
Title | Analysis of Corner Conditions in PVT Variations and Reliability Degradations |
Author | Atsushi Kurokawa, *Masayuki Watanabe, Makoto Hoshi, Tetsuya Kobayashi, Masa-aki Fukase (Hirosaki University, Japan) |
Page | pp. 304 - 309 |
Keyword | variability, reliability, timing analysis, corner model, on-chip variation |
Abstract | Opposite conditions exist between the best/worst cases for PVT variations and those for reliability degradations. There are also gaps between general PVT variation and reliability degradation and the product specifications that must be guaranteed by timing verification during the design process. We clarify these issues through analysis and then present an approach for design guarantee with realistic best-case/worst-case (BC/WC) corner conditions. Finally, we show the results of analyzing the max conditions of WC corners. |
Title | High Level Synthesis with Stream Query to C Parser: Eliminating Hardware Development Difficulties for Software Developers |
Author | *Eric Shun Fukuda (Hokkaido University, Japan), Takashi Takenaka, Hiroaki Inoue (NEC Corporation, Japan), Hideyuki Kawashima (University of Tsukuba, Japan), Tetsuya Asai, Masato Motomura (Hokkaido University, Japan) |
Page | pp. 310 - 315 |
Keyword | Dynamically Reconfigurable Hardware, Stream Processing, SQL, HLS, C |
Abstract | Recently, reconfigurable hardware has been attracting wide attention as a stream processing platform for its high performance and power efficiency. To allow many software engineers to benefit from reconfigurable hardware, high level synthesis tools have been actively developed. Although these tools have enormously reduced the amount of work and difficulty, users still need hardware development knowledge. In this paper, we introduce a method that parses SQL queries into C code intended for high level synthesis. Our experiments using dynamically reconfigurable hardware that features a high level synthesis tool showed that the hardware's potential was fully extracted and that the developer writing the SQL queries does not need hardware development knowledge. |
Title | High-Level Synthesis for Nested Loop Kernels with Non-Uniform Dependencies |
Author | *Akihiro Suda, Hideki Takase, Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan) |
Page | pp. 322 - 327 |
Keyword | High-Level Synthesis, Polyhedral Optimization, Buffering, OpenMP |
Abstract | In high-level synthesis, parallelization for nested loop kernels has been hard due to their complex data dependencies, especially non-uniform dependencies.
In this paper, we propose a new method to synthesize a parallelized circuit from such kernels using polyhedral optimization, which has been vigorously studied in the software field.
The key point of our contribution is a buffering method for parallel RAM accesses.
The experimental result shows that the parallelized circuit with 8 PEs is 5.73 times faster than the sequential one. |
Title | A Fast Simplification Algorithm for Packet Classification |
Author | *Infall Syafalni (Kyushu Institute of Technology, Japan), Tsutomu Sasao (Meiji University, Japan) |
Page | pp. 328 - 333 |
Keyword | Partitioning, Elimination of rules, TCAM, Packet classification |
Abstract | Packet classification is used in various network applications such as firewalls, access control lists, and network address translators. This technology uses ternary content addressable memories (TCAMs) to perform high speed packet forwarding. However, TCAMs dissipate high power and their cost is high. Thus, reduction of TCAMs is crucial. This paper shows a method to simplify rules in TCAMs for packet classification. We partition the rules into groups so that each group has the same source address, destination address and protocol. After that, we simplify rules in each group by removing redundant rules. We developed a computer program to simplify rules among groups. Experimental results show that this method reduces the size of rules up to 57% of the original specification for the ACL5 filter, 73% for the ACL3 filter, and 87% for overall filters. This algorithm is useful to reduce TCAMs for packet classification. |
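The grouping and redundancy removal described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Rule` fields, the single port range per rule, first-match priority, and the definition of "redundant" (a rule whose port range is fully covered by one earlier rule of the same group) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    # Hypothetical rule layout; the abstract does not specify the fields.
    src: str
    dst: str
    proto: str
    port_lo: int
    port_hi: int
    action: str


def simplify(rules):
    """Drop rules shadowed by an earlier rule of the same group.

    Rules are assumed to be in priority order (first match wins).
    A group is all rules sharing (src, dst, proto).
    """
    groups = defaultdict(list)  # group key -> rules kept so far
    kept = []
    for rule in rules:
        key = (rule.src, rule.dst, rule.proto)
        # Assumed redundancy test: one earlier rule covers the whole port range.
        shadowed = any(
            k.port_lo <= rule.port_lo and rule.port_hi <= k.port_hi
            for k in groups[key]
        )
        if not shadowed:
            groups[key].append(rule)
            kept.append(rule)
    return kept


rules = [
    Rule("10.0.0.0/8", "192.168.1.0/24", "tcp", 0, 1023, "deny"),
    Rule("10.0.0.0/8", "192.168.1.0/24", "tcp", 80, 80, "permit"),  # shadowed
    Rule("10.0.0.0/8", "192.168.1.0/24", "udp", 53, 53, "permit"),
]
simplified = simplify(rules)
```

A shadowed rule can be removed safely under first-match semantics because every packet it matches is already caught by the earlier, higher-priority rule.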
Title | A Low Energy Full TMR Design Method with Optimized Selection of Time/Space TMR Mode and Supply Voltage |
Author | *Kazuhito Ito, Yuki Hayashi (Saitama University, Japan) |
Page | pp. 334 - 339 |
Keyword | TMR, Low energy, MIP, Schedule exploration |
Abstract | Triple modular redundancy (TMR) executes an operation three times and obtains the correct result by taking the majority of the three outputs. While TMR is effective in eliminating soft errors in LSIs, its overhead in area and energy consumption is a problem. In addition to the space TMR mode, where three copies of an operation are actually executed, the time TMR mode is available, where only two copies of an operation are executed and the results are compared; if the results differ, the third copy is executed to get the correct result. With the time TMR mode, the penalty in energy consumption can be reduced. The drawback of time TMR is that it requires a longer time duration. Appropriately selecting the power supply voltage is also an effective technique to reduce energy consumption. In this paper, a method to derive a TMR design is proposed which selects the TMR mode and supply voltage for each operation to minimize the energy consumption within the time and area constraints. |
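The difference between the two TMR modes described in this abstract can be shown with a small software sketch. This is only an illustration of the voting behaviour, not the proposed design method: the function names, the execution count used as a stand-in for energy, and the `make_faulty_adder` fault injector are all hypothetical.

```python
from collections import Counter


def majority(results):
    """Return the value produced by at least two of the three executions."""
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among the results")
    return value


def space_tmr(op, *args):
    """Space TMR: always execute three copies, then vote."""
    results = [op(*args) for _ in range(3)]
    return majority(results), 3  # (result, number of executions)


def time_tmr(op, *args):
    """Time TMR: execute two copies; run the third only on a mismatch."""
    first, second = op(*args), op(*args)
    if first == second:
        return first, 2
    third = op(*args)
    return majority([first, second, third]), 3


def make_faulty_adder(faulty_call):
    """Hypothetical adder whose n-th call (1-based) returns a wrong sum."""
    calls = {"n": 0}

    def add(a, b):
        calls["n"] += 1
        return a + b + (1 if calls["n"] == faulty_call else 0)

    return add
```

In the fault-free case `time_tmr` performs two executions instead of three, which corresponds to the energy saving the abstract mentions; the extra comparison and possible third execution correspond to the longer time duration.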
Title | Via-Configurable Structured Asic Using Dual Supply Voltages |
Author | Ta-Kai Lin (Yuan Ze University, Taiwan), Kuen-Wey Lin (National Chiao Tung University, Taiwan), Chang-Hao Chiu, *Rung-Bin Lin (Yuan Ze University, Taiwan) |
Page | pp. 340 - 341 |
Keyword | Dual supply voltages, Structured ASIC, Level converter, Low power |
Abstract | This paper presents a via-configurable logic block and a design methodology for realizing a fine-grained, dual-supply-voltage structured ASIC. Our results show that, given various timing budgets, our approach achieves a reduction of up to 44% in energy per switching of our dual-supply-voltage structured ASIC at the expense of a 1.6% overhead for level converters. |
Title | A Basic-Block Level Optimistic Energy Estimation for Power-Gated VLIW Data-Path Model |
Author | *Shunsuke Nakamura, Ittetsu Taniguchi, Hiroyuki Tomiyama, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 354 - 359 |
Keyword | Energy estimation, VLIW data-path, Power-gating |
Abstract | This paper proposes a basic-block level optimistic energy estimation for a power-gated very long instruction word (VLIW) data-path model.
Power-gating (PG) brings a big benefit for leakage power reduction, but it makes instruction scheduling difficult because applying PG usually requires dozens or hundreds of consecutive NOP cycles.
To estimate the energy consumption of such a power-gated VLIW data-path, an optimization of instruction scheduling is necessary.
The proposed method enables fast and accurate energy estimation without time-consuming instruction scheduling.
Experimental results demonstrated the effectiveness of the proposed method. |
Title | A Memory-Saving Technique for 4K Super-Resolution Circuit with Binary Tree Dictionary |
Author | *Ayumi Kiriyama, Ryo Matsuzuka, Kohei Michibata, Takahiro Kitayama, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 360 - 365 |
Keyword | learning-based super-resolution, memory-saving, hardware architecture |
Abstract | In this paper, we propose a memory-saving technique for a 4K super-resolution circuit with a binary tree dictionary. In the conventional architecture, 8 super-resolution circuits work in parallel to output a 4K video signal. Each circuit needs a large dictionary. We propose a memory-saving technique that shares the dictionary. In the proposed architecture, a binary search tree circuit consists of a ROM-read stage and a calculation stage, which enables 8 super-resolution circuits to access a single ROM in parallel. Moreover, we propose a memory compaction technique for the binary tree dictionary. All nodes of the tree are stored in the ROM without gaps. Since each node holds the ROM addresses of its child nodes, we can trace the tree easily. Experimental results have shown that our architecture can reduce memory area by 87%. |
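The compaction idea in this abstract, storing every tree node contiguously and letting each node carry the addresses of its children, can be sketched in software as follows. The node contents (a feature index, a threshold, and a string payload at the leaves) and the preorder layout are assumptions for illustration; the abstract only states that nodes are stored without gaps and hold child addresses.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    index: int              # feature compared at this node (-1 for a leaf)
    threshold: float        # hypothetical split value
    left: int               # address of the left child (-1 for a leaf)
    right: int              # address of the right child (-1 for a leaf)
    payload: Optional[str]  # dictionary entry stored at a leaf


def pack(tree):
    """Flatten a nested tree into a gap-free list standing in for the ROM.

    A leaf is a string; an internal node is (index, threshold, left, right).
    """
    rom = []

    def emit(t):
        addr = len(rom)
        rom.append(None)  # reserve this node's slot before visiting children
        if isinstance(t, str):
            rom[addr] = Node(-1, 0.0, -1, -1, t)
        else:
            index, threshold, left, right = t
            left_addr = emit(left)
            right_addr = emit(right)
            rom[addr] = Node(index, threshold, left_addr, right_addr, None)
        return addr

    emit(tree)
    return rom


def lookup(rom, features):
    """Trace the tree from address 0 by following stored child addresses."""
    addr = 0
    while rom[addr].payload is None:
        node = rom[addr]
        if features[node.index] < node.threshold:
            addr = node.left
        else:
            addr = node.right
    return rom[addr].payload


rom = pack((0, 0.5, "A", (1, 0.2, "B", "C")))
```

Because children are reached through explicit addresses rather than computed positions, an unbalanced tree occupies exactly one slot per node.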
Title | HLS Utilizing Area Optimizing Method for High-Definition MRA-TV Denoise Circuit |
Author | *Eita Kobayashi (NEC Corporation, Japan), Kenta Senzaki, Atsufumi Shibayama (NEC Corporation, Japan), Yuichi Nakamura (NEC Corporation, Japan) |
Page | pp. 366 - 371 |
Keyword | Circuits, Optimization, Design Methodology, High-Level Synthesis, Denoise |
Abstract | This work proposes an area optimization method for high-definition image denoising at full HD image resolution. Conventional denoising techniques share a common defect: the outline of an object blurs as the strength of the noise reduction increases. To address this, we have developed an MRA-TV algorithm that combines the wavelet transform with TV norm optimization to keep the outline clear. This method enables high-quality image denoising while maintaining clear outlines. However, there is a fundamental problem: an MRA-TV circuit with iterative TVs requires a large implementation due to the size of the TV module. In this work, we achieve a significant reduction of that area by combining calculation reduction and resource sharing using high-level synthesis. Evaluation results show a 52% area reduction while maintaining throughput or latency. |
Title | Concurrent Verification Experience of Cache Protocol in Real Development of Large SMP Server Product by Using Model Checking |
Author | *Toru Shonai (Hitachi, Ltd., Japan), Shoichi Hanaki (OKANO Electric Co., Ltd, Japan), Yoshiaki Kinoshita (Hitachi, Ltd., Japan) |
Page | pp. 377 - 382 |
Keyword | model checking, formal verification, cache protocol, product development, high-end server |
Abstract | We have verified a cache protocol by using model checking in the real development of a highly multiple-CPU server product. A formal verification engineer abstracted the models for model checking several times through the design process from the protocol specifications written in natural language by the architect team. We discovered nine actual complicated protocol bugs, acknowledged by the architects, in advance of logic simulation. Some bugs we found were too complicated to be replicated in logic simulation. This effort surely shortened the total design duration. We proved the effectiveness of formal verification of cache protocols in the early design phase of real server product development. |