Title | TOF-based 3-Dimensional Head-Tracking System for Repetitive Transcranial Magnetic Stimulation |
Author | *Ryo Ebisuwaki, Yoshihiro Yasumuro, Hiroshige Dan, Masahiko Fuyuki (Faculty of Environmental and Urban Engineering, Kansai University, Japan) |
Page | pp. 2 - 5 |
Keyword | TOF camera, ICP algorithm, head tracking |
Abstract | Many survivors of brain infarction and hemorrhage suffer peripheral nerve damage that sometimes causes neuropathic pain even without external injury. While no medication is effective for neuropathic pain, repetitive transcranial magnetic stimulation (rTMS) has attracted attention as a way to mitigate this pain. Existing rTMS treatments require precise localization of the stimulation target in the brain, so the patient must be restrained on a bed. This paper proposes a new localization scheme that leaves the patient unconstrained, using a time-of-flight (TOF) camera to measure the three-dimensional shape of an object in real time. |
PDF file |
Title | A High-speed H.264/AVC CABAC Decoder for 4K Video Utilizing Residual Data Accelerator |
Author | *Kenji Watanabe (Synthesis Corporation, Japan), Gen Fujita (Dept. Engineering Informatics, Osaka Electro-Communication University, Japan), Toru Homemoto, Ryoji Hashimoto (Graduate School of Information Science and Technology, Osaka University, Japan) |
Page | pp. 6 - 10 |
Keyword | H.264, CABAC |
Abstract | The implementation of a parallel decoder for CABAC (Context-based Adaptive Binary Arithmetic Coding), which is adopted in the H.264/AVC video coding standard, is extremely difficult due to its inherent data dependencies. Therefore, the CABAC decoder constitutes a bottleneck when decoding 1080 HD (1,920x1,080) or higher-resolution video sequences in real time. In this paper, we propose a VLSI (Very Large Scale Integration) architecture for the CABAC decoder that adopts a multi-bin decoding architecture in conjunction with techniques that improve the maximum clock frequency. The implementation results show that the proposed architecture achieves an average throughput of 1.48 bins per clock and a maximum clock frequency of 394 MHz, demonstrating that our architecture is capable of decoding 4K (4,096x2,048 @ 30 fps) video in real time. |
PDF file |
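As a rough consistency check, multiplying the two reported figures gives the peak bin-decoding rate implied by the abstract. Whether this rate suffices for 4K @ 30 fps depends on the bin rate of the particular bitstream, which the abstract does not state.

```latex
1.48\ \tfrac{\text{bins}}{\text{cycle}} \times 394 \times 10^{6}\ \tfrac{\text{cycles}}{\text{s}} \approx 583 \times 10^{6}\ \tfrac{\text{bins}}{\text{s}}
```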
Title | Low Power Decision Tree-Based Flow Search Engine |
Author | *Eita Kobayashi, Norio Yamagaki, Takashi Takenaka, Satoshi Kamiya (NEC Corporation, Japan), Masato Motomura (Hokkaido University, Japan) |
Page | pp. 11 - 16 |
Keyword | Search Engine, TCAM, Low Power, Design Method |
Abstract | This paper presents a novel architecture for a low-power flow search engine. It combines decision-tree-based pipelines that use general-purpose memories with linear search pipelines that prevent rule duplication.
We also developed a design method that accounts for robustness against fluctuations in network properties. The evaluation results show that a hardware implementation of our architecture achieves a power reduction of up to 92% while maintaining performance comparable to that of TCAM. |
PDF file |
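The combination of a decision tree with linear search can be illustrated on a toy one-field classifier. This is a minimal sketch under our own assumptions, not the authors' architecture: rules are (priority, lo, hi, action) ranges on a single 16-bit field, and a rule that straddles a split point is kept once in that node's linear-search list instead of being duplicated into both children. The names `build`, `lookup`, and `stored_rules` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    lo: int
    hi: int
    linear: list = field(default_factory=list)  # rules searched linearly at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(rules, lo, hi, leaf_size=2):
    """Split the key range in half; straddling rules stay at the node (no duplication)."""
    node = Node(lo, hi)
    if len(rules) <= leaf_size or lo == hi:
        node.linear = list(rules)
        return node
    mid = (lo + hi) // 2
    left, right = [], []
    for rule in rules:
        _, rlo, rhi, _ = rule
        if rhi <= mid:
            left.append(rule)
        elif rlo > mid:
            right.append(rule)
        else:
            node.linear.append(rule)  # stored once here instead of in both children
    node.left = build(left, lo, mid, leaf_size)
    node.right = build(right, mid + 1, hi, leaf_size)
    return node

def lookup(node, key):
    """Walk one root-to-leaf path, checking each node's linear list; lowest priority wins."""
    best = None
    while node is not None:
        for rule in node.linear:
            if rule[1] <= key <= rule[2] and (best is None or rule[0] < best[0]):
                best = rule
        if node.left is None:  # leaf
            break
        node = node.left if key <= (node.lo + node.hi) // 2 else node.right
    return None if best is None else best[3]

def stored_rules(node):
    """Total number of rule entries held in the tree (each rule appears exactly once)."""
    if node is None:
        return 0
    return len(node.linear) + stored_rules(node.left) + stored_rules(node.right)

rules = [(0, 80, 80, "web"), (1, 443, 443, "tls"), (2, 0, 1023, "sys"), (3, 0, 65535, "default")]
root = build(rules, 0, 65535)
```

In this toy version every lookup visits one node per tree level, which is what makes a pipelined memory-based implementation plausible; real flow tables match on several header fields at once.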
Title | Manycore NOC Based 2400-PE Network on Chip Emulation and Verification Environment |
Author | *Omar Hammami (ENSTA ParisTech, France), Xinyu Li (EVE, France) |
Page | pp. 17 - 21 |
Keyword | emulation, FPGA, manycore, NOC, verification |
Abstract | In this paper, we present NOCEVE, an industrial Network on Chip (NoC) emulation and verification environment running on an industrial large-scale multi-FPGA emulation platform for billion-cycle applications. It helps designers improve system performance by analyzing the traffic distribution and balance across the network on chip. The hardware monitoring network is generated by another commercial NoC design tool. It consists of traffic collectors, which are reconfigurable to collect different traffic information such as packet latency and throughput. Traffic statistics are collected during real application execution on the FPGA platform and are sent through the monitoring network on the FPGA and then through a PCI bridge board back to the host computer for real-time visualization or post-execution data analysis. NOCEVE is the first industrial NoC emulation and verification environment for billion-cycle applications. |
Title | A Technique for Accelerating SVM-Based Image Recognition Using GPU |
Author | *Jin Sasaki, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 28 - 32 |
Keyword | GPGPU, Acceleration, Image Recognition, SVM |
Abstract | In this paper, we propose a technique for accelerating image recognition based on SVM (Support Vector Machine) using a GPU (Graphics Processing Unit). We have applied the proposed technique to human detection based on HOG (Histogram of Oriented Gradients) features. Experimental results show that the proposed technique achieves speedups of 107.7 times in the learning process and 9.5 times in the recognition process compared with the conventional technique using a CPU (Central Processing Unit), without lowering recognition accuracy. |
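The recognition step pairs HOG features with a linear SVM score. The sketch below is a simplified, sequential illustration under stated assumptions: one cell, central-difference gradients, 9 unsigned orientation bins without interpolation, and a linear decision function w·x + b. It is not the authors' GPU code; the point is that each cell and each window is computed independently, which is the kind of work a GPU can run in parallel. All function names are hypothetical.

```python
import math

def cell_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg) for one cell."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / bins)) % bins] += mag
    return hist

def svm_score(features, weights, bias):
    """Linear SVM decision value; a positive score means 'human' in a detector."""
    return sum(f * w for f, w in zip(features, weights)) + bias

def detect(windows, weights, bias):
    """Indices of feature windows classified as positive (independent per window)."""
    return [i for i, feats in enumerate(windows) if svm_score(feats, weights, bias) > 0]
```

A full HOG descriptor would also group cells into overlapping, normalized blocks before scoring.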
Title | Variation of Substrate Sensitivity in Differential Pair Transistors |
Author | *Satoshi Takaya, Takashi Hasegawa, Yoji Bando (Kobe University, Japan), Toru Ohkawa, Toshiharu Takaramoto, Toshio Yamada, Masaaki Souda, Shigetaka Kumashiro, Tohru Mogami (MIRAI-Selete, Japan), Makoto Nagata (Kobe University, Japan) |
Page | pp. 33 - 35 |
Keyword | Substrate coupling, Substrate noise, On-chip monitoring |
Abstract | The sensitivity of differential pair transistors to substrate voltage variation is investigated in two technology nodes, 90 nm and 65 nm. On-chip measurements were carried out of the transistor response to small AC signals applied at the input nodes as well as to the silicon substrate.
The substrate sensitivity and its variation due to physical factors in the layout are analyzed using the measurement data. The universality and dependency of the substrate sensitivity across technology nodes are also addressed. |
Title | Automatic Generation of GNU Binutils and GDB for Custom Processors Based on Plug-in Method |
Author | Takahiro Kumura (NEC Corporation, Japan), Soichiro Taga (Mitsubishi Electric Micro-Computer Application Software Co., Ltd., Japan), *Nagisa Ishiura (Kwansei Gakuin University, Japan), Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan) |
Page | pp. 36 - 41 |
Keyword | software development tools, custom processor, binutils, gdb, plug-in method |
Abstract | This paper presents a scheme for automatically generating GNU software development tools for newly developed processor cores based on a plug-in method. An experimental system based on our method successfully generated a GNU toolchain consisting of an assembler, a disassembler, a linker, a simulator, and a debugger from a succinct architecture description. Although the generated GDB supports only assembly-level debugging, this is the first system that retargets the GDB debugger automatically. |
PDF file |
Title | Accelerating Regression Test of Compilers by Test Program Merging |
Author | *Takayuki Fukumoto (Kwansei Gakuin University, Japan), Kazushi Morimoto (Nomura Research Institute, Ltd., Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan) |
Page | pp. 42 - 47 |
Keyword | C language, compiler, test suite, gcc, testgen |
Abstract | This paper proposes a method for accelerating regression testing of compilers by merging the test programs in compiler test suites. Compiler testing with test suites requires a large amount of computation time because the suites consist of a huge number of test programs. Reducing test time is especially critical in the early stages of compiler development, when bug fixes and regression tests alternate many times. The proposed method shortens test suite runs by merging the test programs in a suite into fewer but longer programs, which drastically reduces the overhead for file open/close. During the merge, conflicts among the names of global variables, functions, and user-defined types are avoided by adding prefixes. Header file inclusion as well as multiplier compilation are carefully handled so that the semantics of the original test programs are maintained. A technique is also proposed for identifying which test programs caused execution errors while the merged test programs are executed. In an experiment where about 9,000 test programs in the testgen test suite were merged into 117 programs, computation time was reduced to 1/11.1 on Ubuntu Linux and to 1/63.9 on Cygwin on a 2.5 GHz Core i5 CPU. |
PDF file |
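The prefixing step can be sketched as a textual rewrite. This is a minimal illustration under our own assumptions, not the authors' tool: the set of global names in each test program is assumed to be known in advance (a real tool would obtain it by parsing C), the rewrite is naive and would also touch matches inside string literals and comments, and header handling is ignored. `prefix_globals` and `merge_tests` are hypothetical names.

```python
import re

def prefix_globals(source: str, names: set, prefix: str) -> str:
    """Prepend `prefix` to every whole-word occurrence of the given global names."""
    if not names:
        return source
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, sorted(names))) + r")\b")
    return pattern.sub(lambda m: prefix + m.group(1), source)

def merge_tests(programs: list) -> str:
    """Merge (global_names, source) test programs into one C translation unit.

    Each program's main becomes t<i>_main and is called from a new driver main,
    which prints the index of every sub-test that returns nonzero.
    """
    parts, calls = ["#include <stdio.h>"], []
    for i, (names, src) in enumerate(programs):
        prefix = f"t{i}_"
        parts.append(prefix_globals(src, names | {"main"}, prefix))
        calls.append(f'    if ({prefix}main() != 0) {{ printf("failed: {i}\\n"); failures++; }}')
    driver = ["int main(void) {", "    int failures = 0;", *calls, "    return failures != 0;", "}"]
    return "\n".join(parts + driver) + "\n"

merged = merge_tests([
    ({"x"}, "int x = 1;\nint main(void) { return x - 1; }"),
    ({"x"}, "int x = 2;\nint main(void) { return x - 2; }"),
])
```

Printing the failing index only helps for tests that return a status; a test that calls abort() would stop the whole merged program, which is the case the identification technique in the abstract has to deal with.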
Title | Random Testing of C Compilers Targeting Arithmetic Optimization |
Author | *Eriko Nagai (Kwansei Gakuin University, Japan), Hironobu Awazu (Fujitsu, Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan), Naoya Takeda (ITEC Hankyu Hanshin, Japan) |
Page | pp. 48 - 53 |
Keyword | compiler, randomtest |
Abstract | This paper presents a method for testing the validity of arithmetic optimization in C compilers using random programs. Compilers are tested with programs that contain randomly generated arithmetic expressions. Undefined behavior of the C language is carefully avoided during random program generation. This is based on precise computation of the expected values of the expressions, taking implementation-defined behavior into account. A method for automatically minimizing error programs is also presented, which expedites the analysis of detected errors. A random test program based on our method has detected malfunctions in several compilers, including LLVM GCC 4.2.1 shipped with the latest Mac OS X, GCC 4.4.4 for Ubuntu Linux, GCC 4.3.4 for Cygwin, and GCC 4.4.1 for h8300-elf and m32r-elf. |
PDF file |
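Generating expressions free of undefined behavior amounts to computing each subexpression's exact value while generating it and rejecting any candidate that would overflow or divide by zero. The sketch below assumes a 32-bit `int` and C99 truncating division; it is an illustration of the idea, not the authors' generator, and all names are hypothetical.

```python
import random

INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1  # assumes 32-bit int

class UB(Exception):
    """Raised when a subexpression would trigger undefined behavior."""

def c_div(a, b):
    # C99 integer division truncates toward zero (Python's // floors)
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def c_mod(a, b):
    return a - b * c_div(a, b)  # C99: (a/b)*b + a%b == a

def check(v):
    if not INT_MIN <= v <= INT_MAX:
        raise UB("signed overflow")
    return v

def apply(op, a, b):
    if op in "/%":
        if b == 0 or (a == INT_MIN and b == -1):
            raise UB("division")
        return check(c_div(a, b) if op == "/" else c_mod(a, b))
    return check({"+": a + b, "-": a - b, "*": a * b}[op])

def gen(rng, depth):
    """Return (C source text, expected value) of a random int expression."""
    if depth == 0 or rng.random() < 0.3:
        v = rng.randint(-1000, 1000)
        return f"({v})", v
    op = rng.choice("+-*/%")
    ls, lv = gen(rng, depth - 1)
    rs, rv = gen(rng, depth - 1)
    return f"({ls} {op} {rs})", apply(op, lv, rv)

def gen_safe(rng, depth, tries=100):
    """Retry until an expression without undefined behavior is produced."""
    for _ in range(tries):
        try:
            return gen(rng, depth)
        except UB:
            continue
    raise RuntimeError("no UB-free expression found")
```

Each (text, value) pair can then be emitted as a check such as `if (EXPR != VALUE) abort();` in a test program.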
Title | Compiler-Assisted Soft Error Correction by Duplicating Instructions for VLIW Architecture |
Author | Yunrong Li, Jongwon Lee (Seoul National University, Republic of Korea), *Yohan Ko, Kyoungwoo Lee (Yonsei University, Republic of Korea), Yunheung Paek (Seoul National University, Republic of Korea) |
Page | pp. 54 - 59 |
Keyword | soft error, vliw, embedded system, error correction |
Abstract | With soft error rates increasing exponentially with technology scaling, soft errors have become a serious design concern in the deep sub-micron era. In VLIW and embedded systems, error detection alone is not enough, while error correction is expensive because of the recovery mechanism. In this work, we present an enhanced VLIW architecture capable of not only error detection but also error correction by efficiently duplicating instructions, re-executing the error-detected instruction, and adopting a voting mechanism with the help of compilation techniques.
Further, we propose a scheduling algorithm that improves the instruction scheduling of the executable under a performance constraint. Our experimental results on an ADL-described VLIW datapath demonstrate that our solution improves reliability by 29% over the DSPStone benchmark suite without performance overhead in our compiler-scheduler-simulator framework. |
PDF file |
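The detect-then-correct flow described above can be sketched in software: run an operation twice, and only if the two results disagree, execute it a third time and take a 2-out-of-3 vote. This is a generic illustration, not the authors' hardware or compiler pass; `execute` is a hypothetical hook standing in for the original, duplicated, and re-executed copies of an instruction.

```python
from typing import Callable

def run_with_voting(execute: Callable[[int], int]) -> int:
    """Duplicate an operation; on mismatch, re-execute once more and vote.

    execute(copy) returns the result of copy 0 (original), 1 (duplicate)
    or 2 (re-execution after an error is detected).
    """
    first, second = execute(0), execute(1)
    if first == second:            # duplication detected no error
        return first
    third = execute(2)             # re-execute the error-detected instruction
    if third in (first, second):   # 2-out-of-3 majority
        return third
    raise RuntimeError("no majority: uncorrectable error")

def faulty(copy: int) -> int:
    """Example: a soft error flips bit 3 in the duplicated copy only."""
    return 42 ^ (1 << 3) if copy == 1 else 42
```

The third execution is paid only when the duplicates disagree, which is why correction can be much cheaper than always-on triple redundancy.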
Title | Compiler Generation Method from ADL for ASIP Integrated Development Environment |
Author | *Yusuke Hyodo, Kensuke Murata (Osaka University, Japan), Takuji Hieda (Ritsumeikan University, Japan), Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan) |
Page | pp. 60 - 65 |
Keyword | ASIP, compiler generation, instruction selection, code generation description, ADL |
Abstract | In this paper, we propose a method for generating a compiler from the architecture description language (ADL) of an ASIP integrated development environment.
With the proposed method, modifying the compiler in response to changes in the processor specification becomes easier, and both the amount of description and the design time can be reduced.
In our experiments, we compared the amount of description and the design time for an ASIP using the proposed method and a conventional method in which the compiler is written manually.
The experimental results show that the proposed method reduces both the amount of description and the design time by approximately 80% compared to the conventional method. |
PDF file |
Title | Mono-instruction Computer on a Dynamically Reconfigurable Gate Array |
Author | *Yuki Nihira, Minoru Watanabe (Shizuoka University, Japan) |
Page | pp. 66 - 70 |
Keyword | FPGA, ORGA, Dynamic reconfiguration |
Abstract | As the number of usable gates in field programmable gate arrays (FPGAs) keeps increasing, FPGAs are being used in a widening range of applications.
FPGAs are currently used in many embedded systems, and demand for implementing processors on FPGAs is growing.
In response to that demand, FPGA vendors have provided soft-core processors for FPGAs, but those processors invariably perform worse than hard-core processors.
This paper therefore proposes a high-performance mono-instruction computer that fully exploits the programmability of a dynamically reconfigurable gate array.
In addition, this paper clarifies the implementation-area and operating-frequency advantages of mono-instruction computers over soft-core RISC processors. |
PDF file |
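To make the idea of a mono-instruction computer concrete, the sketch below interprets SUBLEQ ("subtract and branch if less than or equal to zero"), a well-known one-instruction-set design. The abstract does not say which single instruction the proposed computer uses, so SUBLEQ is a swapped-in example chosen only for illustration; `run_subleq` is a hypothetical name.

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Execute SUBLEQ a, b, c: mem[b] -= mem[a]; if mem[b] <= 0, jump to c.

    Halts when pc leaves the program (here: negative jump target).
    """
    mem = list(mem)
    for _ in range(max_steps):
        if pc < 0 or pc + 2 >= len(mem):
            return mem
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    raise RuntimeError("step limit reached")

# Y += X using a zero cell Z: X at address 9, Y at 10, Z at 11.
program = [
    9, 11, 3,    # Z -= X        -> Z = -X
    11, 10, 6,   # Y -= Z        -> Y = Y + X
    11, 11, -1,  # Z -= Z = 0    -> branch to -1 (halt)
    7, 5, 0,     # data: X = 7, Y = 5, Z = 0
]
```

Because every instruction has the same format and datapath, such a machine needs very little decode logic, which is the property that makes it attractive for a reconfigurable fabric.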
Title | ASPE: an Abstraction Framework using ALU Arrays for Scalable Multiple FPGAs System |
Author | Kenta Inakagata, *Takayuki Akamine, Hirokazu Morishita (Keio University, Japan), Yasunori Osana (Ryukyu University, Japan), Naoyuki Fujita (Japan Aerospace Exploration Agency, Japan), Hideharu Amano (Keio University, Japan) |
Page | pp. 71 - 76 |
Keyword | Multi-FPGA, Acceleration with FPGAs, Floating-Point, Programmability |
Abstract | Multi-FPGA systems have attracted attention as cost-efficient accelerators for high-performance scientific computation. For users, the major problem with such systems is programmability: with HDL-based design, it is especially difficult on multi-FPGA systems to find the best structure given the available resources and communication capability. To address this problem, we propose ASPE, a design framework that uses arrays of processing elements on FPGAs. Instead of HDL coding, ASPE executes an application through definitions of the operations and communication in the ALU arrays on multiple FPGAs. As an example, MUSCL, the core program of a computational fluid dynamics application, is implemented on ASPE, and evaluation results show about 4.1 times the performance of software running on an Intel Core 2 Duo. |
PDF file |
Title | Robust Register Files by Exploiting Asymmetric Soft Error Rate |
Author | *Yohan Ko, Kyoungwoo Lee (Yonsei University, Republic of Korea) |
Page | pp. 77 - 81 |
Keyword | register file, dependability, soft error, ASER, profiling |
Abstract | With technology scaling, soft errors induced by external radiation or cosmic rays are becoming a serious concern in micro-architectures. In particular, soft errors in register files are critical for reliability because these errors easily propagate to other components of the processor, causing catastrophic system failures. Redundancy techniques such as Triple Modular Redundancy (TMR) and Error Correcting Code (ECC) exist to protect data in register files. However, these techniques incur high overheads in terms of area, performance, and power consumption. In this paper, we increase the reliability of data in register files simply by applying inverters, exploiting the fact that soft error rates are asymmetric, i.e., they differ between bit values 0 and 1. The main idea behind our approach is to increase the number of more stable bit values in the register file by inverting the bit values of a register when profiling data show that it holds more of the unstable bit value. Our experimental results show that our proposal can reduce soft error rates by up to 20% over a suite of benchmarks with minimal overhead from the inverters. |
PDF file |
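The profiling decision can be sketched as follows. This is a minimal illustration under our own assumptions: inversion is decided per register, the less stable bit value is passed in as a parameter (the abstract does not say whether 0 or 1 is more vulnerable), and the profile is simply a list of values each register held. `choose_inversions` and `encode` are hypothetical names.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")

def choose_inversions(profile: dict, width: int = 32, unstable_bit: int = 1) -> set:
    """Registers whose profiled values contain mostly the unstable bit value."""
    mask = (1 << width) - 1
    chosen = set()
    for reg, values in profile.items():
        total = width * len(values)
        ones = sum(popcount(v & mask) for v in values)
        unstable = ones if unstable_bit == 1 else total - ones
        if 2 * unstable > total:   # more than half the stored bits are unstable
            chosen.add(reg)
    return chosen

def encode(value: int, invert: bool, width: int = 32) -> int:
    """Value as stored in the register file; applying it twice restores the value."""
    mask = (1 << width) - 1
    return (value ^ mask) if invert else (value & mask)
```

Because XOR with an all-ones mask is its own inverse, the same inverter on the read path recovers the original value.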
Title | Performance Comparison of RG-DTM PUF and Arbiter-based PUFs |
Author | *Kousuke Ogawa, Mitsuru Shiozaki, Kota Furuhashi, Kohei Hozumi, Takeshi Fujino (Ritsumeikan University, Japan) |
Page | pp. 82 - 87 |
Keyword | PUF, RG-DTM PUF, Modeling Attack, SVM |
Abstract | The proposed RG-DTM PUF achieves high uniqueness and high security against modeling attacks. Hence, compared with an arbiter PUF and an XOR arbiter PUF, the RG-DTM PUF is better suited to tamper-resistant applications such as IC identification, authentication, and key generation. This paper presents performance comparisons covering uniqueness, stability, resistance to modeling attacks, circuit area, and power consumption. |
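Uniqueness is commonly quantified as the average fractional Hamming distance between the responses of different chips to the same challenge, with 0.5 (50%) as the ideal. The sketch below computes that common metric; the paper may use a different or extended definition, and the function names are hypothetical.

```python
from itertools import combinations

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def uniqueness(responses: list, n_bits: int) -> float:
    """Average pairwise fractional inter-chip Hamming distance (ideal: 0.5)."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("need responses from at least two chips")
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * n_bits)
```

Stability is measured analogously, as the intra-chip distance between repeated responses of the same chip, where the ideal is 0.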
Title | Hardware Architecture for Accelerating Monte Carlo based SSTA using Generalized STA Processing Element |
Author | *Hiroshi Yuasa, Hiroshi Tsutsui, Hiroyuki Ochi, Takashi Sato (Kyoto University, Japan) |
Page | pp. 88 - 93 |
Keyword | STA Processing Element, Monte Carlo based SSTA, Hardware Acceleration, Static Timing Analysis, STA-PE |
Abstract | We propose a novel hardware architecture for accelerating Monte Carlo-based statistical static timing analysis (MC-SSTA). In our approach, a generalized hardware module called an STA processing element (STA-PE) is used to evaluate the delay of a logic gate. The proposed architecture is successfully implemented on an FPGA device, on which 26 STA-PEs run in parallel at a 116 MHz clock. It achieves a 1,457-times acceleration compared to a software implementation. |
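A software baseline for MC-SSTA can be written in a few lines: sample every gate delay, propagate arrival times in topological order, and collect the circuit delay per sample. This sketch assumes independent Gaussian gate delays and a simple max-plus arrival model; the per-gate evaluation in the inner loop is the kind of operation the abstract assigns to an STA-PE. Names are hypothetical.

```python
import random
import statistics

def mc_ssta(gates, order, outputs, n_samples=1000, seed=0):
    """Monte Carlo SSTA with independent Gaussian gate delays.

    gates: name -> (fanin names, mean delay, sigma); order: topological order.
    Returns (mean, standard deviation) of the maximum output arrival time.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        arrival = {}
        for g in order:
            fanins, mu, sigma = gates[g]
            # per-gate evaluation: latest input arrival plus a sampled gate delay
            arrival[g] = max((arrival[f] for f in fanins), default=0.0) + rng.gauss(mu, sigma)
        samples.append(max(arrival[o] for o in outputs))
    return statistics.mean(samples), statistics.pstdev(samples)

gates = {"a": ([], 1.0, 0.1), "b": ([], 2.0, 0.2), "c": (["a", "b"], 1.5, 0.1)}
```

Gates at the same topological level are independent within a sample, and samples are independent of each other, which is where hardware parallelism can be applied.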
Title | Head-Tail Expressions for Interval Functions |
Author | *Infall Syafalni, Tsutomu Sasao (Kyushu Institute of Technology, Japan) |
Page | pp. 94 - 99 |
Keyword | Interval Function, Head-Tail Expression, TCAM |
Abstract | This paper presents a method for representing interval functions using head-tail expressions. Head-tail expressions represent greater-than functions GT(n:A), less-than functions LT(n:B), and interval functions IN0(n:A,B) more efficiently than sum-of-products expressions, where n denotes the number of bits needed to represent the largest value in the interval (A,B). This paper proves that a head-tail expression represents an interval function with at most n words in a TCAM realization. Experimental results for up to n=16 are shown. |
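To see why a comparison function fits in at most n TCAM words, consider the standard prefix construction for X > A (we assume GT(n:A) is 1 exactly when X > A). For each bit position where A has a 0, one ternary word copies A's higher bits, puts a 1 at that position, and leaves all lower bits as don't-cares. This is a textbook construction, not the head-tail expression proposed in the paper; the function names are hypothetical.

```python
def gt_words(a: int, n: int) -> list:
    """Ternary words ('0', '1', '-') matching exactly the n-bit values X > a."""
    bits = format(a, f"0{n}b")
    words = []
    for i, b in enumerate(bits):  # i = 0 is the most significant bit
        if b == "0":
            # X agrees with a above bit i, has 1 where a has 0, anything below
            words.append(bits[:i] + "1" + "-" * (n - i - 1))
    return words

def tcam_match(words: list, x: int, n: int) -> bool:
    """True if x matches any ternary word ('-' matches either bit)."""
    xb = format(x, f"0{n}b")
    return any(all(w == "-" or w == c for w, c in zip(word, xb)) for word in words)
```

For example, with n = 4 and A = 5 (0101) the words are `1---` and `011-`, covering 8-15 and 6-7.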
Title | A Performance Monitoring Tool Suite for Software and SoC On-Chip Bus |
Author | *Yi-Hao Chang, Ing-Jer Huang (National Sun Yat-Sen University, Taiwan) |
Page | pp. 100 - 105 |
Keyword | performance Analysis, SoC |
Abstract | Because today's SoCs involve both software and hardware design, performance bottlenecks may occur in software, in hardware, or in both. However, existing performance monitoring tools usually evaluate only software or only hardware performance, which is not sufficient for today's SoC designs. Furthermore, due to the increasing complexity of user requirements, embedded operating systems such as Linux are introduced to manage the limited hardware resources for complicated applications. However, this also makes performance monitoring harder, since the memory address space is divided into user space and kernel space with different capabilities to access system resources, which makes it impossible for user-space applications to retrieve system performance information without kernel or hardware support. In this paper, we propose a performance monitoring tool suite capable of analyzing the performance of user-space applications, kernel-space device drivers, and the AMBA AHB bus for SoCs running Linux. |
Title | Backward Multiple Time-frame Expansion for Accelerating Sequential SAT |
Author | *Kousuke Torii, Kazuhiro Nakamura (Nagoya University, Japan), Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan) |
Page | pp. 106 - 110 |
Keyword | Sequential SAT, sequential circuit, Formal Verification |
Abstract | Sequential SAT is a formal verification problem that checks whether there exists an input sequence to a given circuit such that a desired objective is satisfied.
Efficient Sequential SAT algorithms are required to handle sequential circuits with large state spaces. In this paper, we present backward multiple time-frame expansion (BMTE) and an algorithm for a Sequential SAT solver that supports it. The proposed algorithm is suitable for merging states and pruning the state space during search.
We show promising experimental results. |
PDF file |
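The backward direction of the search can be illustrated on an explicit toy state machine: start from the target states and expand one time frame backward at a time until the initial state is reached, recording the input that leads toward the target. This is plain backward breadth-first search, not the BMTE algorithm itself; real Sequential SAT works symbolically on circuits rather than on enumerated states. The visited set plays the role of merging states reached in different frames. Names are hypothetical.

```python
from collections import deque

def backward_search(states, inputs, next_state, initial, is_target):
    """Shortest input sequence driving `initial` into a target state, or None."""
    preds = {s: [] for s in states}
    for s in states:
        for i in inputs:
            preds[next_state(s, i)].append((s, i))
    # seq[s] = inputs that lead from s to a target; each BFS layer is one time frame
    seq = {s: [] for s in states if is_target(s)}
    queue = deque(seq)
    while queue:
        t = queue.popleft()
        if t == initial:
            return seq[t]
        for s, i in preds[t]:
            if s not in seq:  # already-reached states are merged, not re-expanded
                seq[s] = [i] + seq[t]
                queue.append(s)
    return None
```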
Title | On Optimization of Power Network Synthesis for Multiple Power Domain Designs |
Author | Chieh-Jui Lee, Shih-Ying Liu, Chuan-Chia Huang, *Hung-Ming Chen (Institute of Electronics Engineering, National Chiao Tung University, Taiwan) |
Page | pp. 111 - 114 |
Keyword | Power network synthesis, Multiple power domain |
Abstract | In this paper, we propose a methodology that synthesizes and optimizes the power network for designs with multiple power domains. An architecture is presented to represent the power network in the presence of sleep transistors. The power network is numerically modeled as an RC network using Modified Nodal Analysis and solved using the Conjugate Gradient Method. To mitigate the IR drop effect, an optimization technique based on Simulated Annealing is proposed that minimizes the total power stripe area while satisfying a given IR drop constraint. To handle multiple power domains, the given power domains are represented in a tree-like structure, and our algorithm is applied recursively to synthesize and optimize the power network for each power domain in a hierarchical fashion. The proposed methodology is integrated into a commercial design tool and evaluated on a real design case. To ensure the practicality of our approach, the evaluation is performed with the latest commercial digital design tool. Design data and parameters are extracted using OpenAccess. The result of our algorithm is fed back to the latest commercial tool for final IR and EM analysis. Our algorithm is tested on both an industrial test case and the academic MCNC benchmarks. Compared to a conventional P/G network, our power network synthesis achieves a 31%-35% reduction in total P/G area while satisfying a maximum 10% IR-drop constraint. |
PDF file |
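The modeling step can be sketched on a tiny resistive network. Assuming a hypothetical one-dimensional rail (pad, then n nodes connected by equal conductances g, with sink currents at the nodes) and DC analysis only, nodal analysis gives a symmetric positive-definite system G d = I for the IR drops d, which the Conjugate Gradient Method solves using only matrix-vector products. This is an illustration of the numerical technique, not the authors' RC model.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(matvec, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol * tol:
            break
        ap = matvec(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def rail_matvec(g: float, n: int):
    """Conductance matrix of pad -g- node0 -g- node1 ... node(n-1), as a function."""
    def mv(v):
        out = []
        for k in range(n):
            diag = 2 * g if k < n - 1 else g   # last node has only a left neighbor
            left = v[k - 1] if k > 0 else 0.0  # node 0's other neighbor is the pad (drop 0)
            right = v[k + 1] if k < n - 1 else 0.0
            out.append(diag * v[k] - g * (left + right))
        return out
    return mv

# IR drops (V) for g = 1 S and a 1 A sink at the far end of a 3-node rail
drops = conjugate_gradient(rail_matvec(1.0, 3), [0.0, 0.0, 1.0])
```

Here the full 1 A flows through all three segments, so the drops grow linearly along the rail.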
Title | Thermal-Aware Placement for Hotspot Mitigation in 3D FPGAs |
Author | *Juinn-Dar Huang, Ya-Shih Huang, Mi-Yu Hsu, Han-Yuan Chang (National Chiao Tung University, Taiwan) |
Page | pp. 115 - 120 |
Keyword | Three-dimensional integration, 3D FPGAs, thermal-aware placement, logic block placement |
Abstract | Three-dimensional (3D) integration is an attractive and promising approach for more complex designs, but thermal issues are a critical challenge for 3D integrated circuits. Moreover, accurate thermal analysis is too time-consuming to be incorporated into practical placement algorithms, which generally perform numerous iterative refinement steps. Therefore, in this paper, we propose two fast thermal-aware placement methods for 3D FPGAs, Standard Deviation (SD) and MineSweeper (MS), that do not require detailed thermal analysis. Both aim to distribute power sources more evenly within a 3D FPGA to mitigate hotspots. The experimental results show that SD and MS achieve 12.1%/7.6% reductions in maximum temperature and 82%/56% improvements in temperature deviation compared to a typical thermal-unaware placement method, at the cost of only a minor increase in wirelength and delay. Moreover, MS requires only 4% more runtime to produce thermal-aware placement solutions. |
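The name of the SD method suggests a cost that penalizes uneven power distribution without running a thermal solver. The sketch below assumes such a cost is the standard deviation of total power per tile over all layers, added to wirelength with a hypothetical weight; it is an illustration of the idea, not the authors' exact formulation.

```python
import statistics

def power_sd(grid):
    """Standard deviation of power over all tiles; grid[layer][row][col] in watts."""
    values = [p for layer in grid for row in layer for p in row]
    return statistics.pstdev(values)

def thermal_cost(wirelength, grid, weight=1.0):
    """Hypothetical placement cost: wirelength plus weighted power non-uniformity."""
    return wirelength + weight * power_sd(grid)

even = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
hot = [[[4.0, 0.0], [0.0, 0.0]]]
```

A placer can evaluate this cost incrementally after each move, which is far cheaper than re-solving a thermal model at every iteration.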
Title | Efficient Delay Cells for Wave Pipelined Multifunctional Unit |
Author | Atsushi Kurokawa, *Tatsuya Takaki, Masa-aki Fukase (Hirosaki University, Japan) |
Page | pp. 121 - 126 |
Keyword | wave pipeline, processor, multifunctional unit, delay cell, buffer insertion |
Abstract | Wave pipelining requires the addition of cells and wiring to slow down faster paths so that their delays are close to that of the longest path. To tune the delay, a large number of buffers are usually inserted, which increases the chip area. This paper focuses on the area problem caused by buffer insertion and presents new delay cells that are area-efficient and low in cost. The delay, power consumption, and area of various types of the new delay cells are estimated. Cells whose intermediate transistors have narrow, long channels are found to be the best in terms of area and power consumption. Cells of the best type are applied to a multifunctional unit (MFU). Experimental results show that a circuit with the new delay cells has a smaller area than one with only standard cells. |
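The balancing requirement can be sketched as simple arithmetic: each path receives as many delay cells as fit into the gap between its delay and the longest path delay, without exceeding it. This sketch ignores wire delay and process variation, and the path delays and cell delays are made-up numbers. It shows why a cell with a larger delay per unit area reduces the number of cells, and hence area.

```python
import math

def pad_paths(path_delays: dict, cell_delay: float) -> dict:
    """Delay cells to add per path so it approaches, but never exceeds, the longest path."""
    longest = max(path_delays.values())
    return {p: math.floor((longest - d) / cell_delay) for p, d in path_delays.items()}

paths = {"p1": 10.0, "p2": 4.0, "p3": 9.5}  # hypothetical path delays (ns)
```

With a 1.5 ns buffer, p2 needs 4 cells; with a hypothetical 3.0 ns delay cell it needs only 2.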
Title | An Integrated Smart Current Sensing Current-Mode Buck Converter |
Author | *Chia-Min Chen, Kai-Hsiu Hsu, Chung-Chih Hung (National Chiao Tung University, Taiwan) |
Page | pp. 127 - 130 |
Keyword | current-mode controller, current-sensing circuit, DC-DC converter, pulse-width modulation(PWM), switch-mode power converter |
Abstract | This paper presents an integrated circuit implementation of a current-mode buck converter that maintains high efficiency over a wide load-current range. The converter operates adaptively in Pulse-Width Modulation (PWM) mode. An on-chip current sensing technique is employed to reduce the number of external components, so no extra I/O pins are needed for the current-mode controller. A soft-start operation is designed to eliminate excessive current during the startup of the regulator. The DC-DC converter was fabricated in a 0.35 µm 2P4M CMOS process. The supply voltage ranges from 2 to 5 V, which is suitable for a single-cell lithium-ion battery. |
PDF file |
Title | An Effective Overlap Removable Objective for Analytical Placement |
Author | *Syota Kuwabara, Yukihide Kohira (The University of Aizu, Japan), Yasuhiro Takashima (The University of Kitakyushu, Japan) |
Page | pp. 135 - 140 |
Keyword | Analytical placement, minimization of overlap area, overlap removable area |
Abstract | In recent LSI design, it is difficult to obtain a placement that satisfies the design constraints and specifications. Analytical placement is a promising approach for obtaining such a placement. Although existing methods obtain placements with short wire length, the obtained placements contain overlap. In this paper, we propose overlap removable area as an overlap evaluation measure for analytical placement. Experiments show that the proposed method is effective for removing overlap in analytical placement. |
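For reference, the conventional overlap measure that an analytical placer tries to drive to zero is the total pairwise overlap area between cells. The sketch below computes that conventional quantity for axis-aligned rectangles given as (x, y, width, height). It is not the proposed overlap removable area, whose definition the abstract does not give; names are hypothetical.

```python
from itertools import combinations

def overlap(a: tuple, b: tuple) -> float:
    """Overlap area of rectangles (x, y, width, height) with lower-left corners."""
    dx = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    dy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(dx, 0.0) * max(dy, 0.0)

def total_overlap(cells: list) -> float:
    """Sum of pairwise overlap areas (O(n^2); real placers use bins or density grids)."""
    return sum(overlap(a, b) for a, b in combinations(cells, 2))
```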