
SASIMI 2012
The 17th Workshop on Synthesis And System Integration of Mixed Information Technologies

Poster I
Time: 10:15 - 12:00 Thursday, March 8, 2012
Location: Int'l Conf. Room & Mtg. Room 31
Chairs: Hiroaki Yoshida (University of Tokyo, Japan), Tomoo Inoue (Hiroshima City University, Japan)

R1-1
TitleTOF-based 3-Dimensional Head-Tracking System for Repetitive Transcranial Magnetic Stimulation
Author*Ryo Ebisuwaki, Yoshihiro Yasumuro, Hiroshige Dan, Masahiko Fuyuki (Faculty of Environmental and Urban Engineering, Kansai University, Japan)
Pagepp. 2 - 5
KeywordTOF camera, ICP algorithm, head tracking
AbstractMany survivors of brain infarction and hemorrhage suffer peripheral nerve damage that sometimes causes neuropathic pain even without external injury. While no medication is useful for neuropathic pain, repetitive transcranial magnetic stimulation (rTMS) has gathered attention for mitigating this pain. Existing rTMS treatments require precise localization of the stimulation target in the brain, and it is necessary to bind the patient to a bed. This paper proposes a new localizing system scheme for keeping the patient unconstrained, using a TOF camera to measure the three-dimensional shape of an object in real time.

R1-2
TitleA High-speed H.264/AVC CABAC Decoder for 4K Video Utilizing Residual Data Accelerator
Author*Kenji Watanabe (Synthesis Corporation, Japan), Gen Fujita (Dept. Engineering Informatics, Osaka Electro-Communication University, Japan), Toru Homemoto, Ryoji Hashimoto (Graduate School of Information Science and Technology, Osaka University, Japan)
Pagepp. 6 - 10
KeywordH.264, CABAC
AbstractThe implementation of a parallel decoder for CABAC (Context-based Adaptive Binary Arithmetic Coding), which is adopted in the H.264/AVC video coding standard, is extremely difficult due to inherent data dependency. Therefore, the CABAC decoder constitutes a bottleneck when decoding 1080 HD (1,920x1,080) or higher video sequences in real time. In this paper, we propose a VLSI (Very Large Scale Integration) architecture for the CABAC decoder that adopts a multi-bin decoding architecture in conjunction with techniques that improve the maximum clock frequency. The implementation results show that the proposed architecture achieves an average throughput of 1.48 bins per clock and a maximum clock frequency of 394 MHz, demonstrating that our architecture is capable of decoding 4K (4,096x2,048 @ 30 fps) video in real time.

R1-3
TitleLow Power Decision Tree-Based Flow Search Engine
Author*Eita Kobayashi, Norio Yamagaki, Takashi Takenaka, Satoshi Kamiya (NEC Corporation, Japan), Masato Motomura (Hokkaido University, Japan)
Pagepp. 11 - 16
KeywordSearch Engine, TCAM, Low Power, Design Method
AbstractThis paper presents a novel architecture for a low-power flow search engine. It combines decision-tree-based pipelines using general-purpose memories with linear search pipelines that prevent rule duplication. We also developed a design method that takes into account robustness against fluctuations in network properties. The evaluation results show that the hardware implementation of our architecture achieves a power reduction of up to 92% while maintaining performance comparable to TCAM.

R1-4
TitleManycore NOC Based 2400-PE Network on Chip Emulation and Verification Environment
Author*Omar Hammami (ENSTA ParisTech, France), Xinyu Li (EVE, France)
Pagepp. 17 - 21
Keywordemulation, FPGA, manycore, NOC, verification
AbstractWe present in this paper NOCEVE, an industrial Network on Chip (NoC) emulation and verification environment on an industrial large-scale multi-FPGA emulation platform for billion-cycle applications. It helps designers improve system performance by analyzing traffic distribution and balance through the network on chip. The hardware monitoring network is generated by another commercial NoC design tool. It consists of traffic collectors, which are reconfigurable to collect different traffic information such as packet latency and throughput. The statistical traffic information is collected during real application execution on the FPGA platform and is sent through the monitoring network on the FPGA and then the PCI bright board back to the host computer for real-time visualization or post-execution data analysis. NOCEVE is the first industrial NoC emulation and verification environment for billion-cycle applications.

R1-5
TitleBit-Selective SAD and Its Evaluation
AuthorRyosuke Hamaji, Yongson Choi, Yuko Hara-Azumi, *Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 22 - 27
KeywordBlock Matching, SAD
AbstractThis paper describes simulation results that show whether, and which, bits of image pixel values may be omitted when calculating the Sum-of-Absolute-Differences (SAD) for block matching. We find that the low bits are important in calculating SAD. We also introduce an SAD calculation architecture that can select an 8-bit or 4-bit mode to calculate SAD values with a new scheduling technique.
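The idea of computing SAD on a subset of pixel bits can be sketched in a few lines. This is only an illustrative sketch, not the authors' architecture: the function name `sad` and the `mask` parameter are hypothetical, and the abstract does not state which bit positions the 4-bit mode uses, so the mask is left to the caller.

```python
def sad(block_a, block_b, mask=0xFF):
    """Sum of absolute differences between two equal-length blocks of 8-bit pixels.

    mask selects which bit positions of each pixel take part in the
    comparison (0xFF = all 8 bits). Which 4 bits a reduced mode should
    keep is an assumption left to the caller.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of pixels")
    return sum(abs((a & mask) - (b & mask)) for a, b in zip(block_a, block_b))


# Hypothetical 3-pixel blocks, flattened to lists for brevity.
reference = [0x12, 0xF0, 0x80]
candidate = [0x10, 0x0F, 0x7F]

full = sad(reference, candidate)           # all 8 bits
upper = sad(reference, candidate, 0xF0)    # upper 4 bits only
lower = sad(reference, candidate, 0x0F)    # lower 4 bits only
```

Comparing `full` against the masked variants over many blocks is one way to see how much each group of bits contributes to the matching result.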

R1-6
TitleA Technique for Accelerating SVM-Based Image Recognition Using GPU
Author*Jin Sasaki, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 28 - 32
KeywordGPGPU, Acceleration, Image Recognition, SVM
AbstractIn this paper, we propose a technique for accelerating image recognition based on SVM (Support Vector Machine) using GPU (Graphics Processing Unit). We have applied the proposed technique to human detection based on HOG (Histogram of Oriented Gradients) features. Experimental results have shown that the proposed technique achieves speedups of 107.7 times in the learning process, and speedups of 9.5 times in the recognition process compared with the conventional technique using CPU (Central Processing Unit) without lowering recognition accuracy.

R1-7
TitleVariation of Substrate Sensitivity in Differential Pair Transistors
Author*Satoshi Takaya, Takashi Hasegawa, Yoji Bando (Kobe University, Japan), Toru Ohkawa, Toshiharu Takaramoto, Toshio Yamada, Masaaki Souda, Shigetaka Kumashiro, Tohru Mogami (MIRAI-Selete, Japan), Makoto Nagata (Kobe University, Japan)
Pagepp. 33 - 35
KeywordSubstrate coupling, Substrate noise, On-chip monitoring
AbstractThe sensitivity of differential pair transistors to substrate voltage variation is investigated in two technology nodes, 90 nm and 65 nm. On-chip measurements were carried out for the response of transistors to small AC signals at input nodes as well as on a silicon substrate. The substrate sensitivity and its variation due to physical factors in a layout are analyzed using measurement data. The universality and dependency of the substrate sensitivity across technology nodes are also addressed.

R1-8
TitleAutomatic Generation of GNU Binutils and GDB for Custom Processors Based on Plug-in Method
AuthorTakahiro Kumura (NEC Corporation, Japan), Soichiro Taga (Mitsubishi Electric Micro-Computer Application Software Co., Ltd., Japan), *Nagisa Ishiura (Kwansei Gakuin University, Japan), Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 36 - 41
Keywordsoftware development tools, custom processor, binutils, gdb, plug-in method
AbstractThis paper presents a scheme for auto-generating GNU software development tools for newly developed processor cores based on a plug-in method. An experimental system based on our method successfully generated a GNU toolchain consisting of an assembler, a disassembler, a linker, a simulator, and a debugger from a succinct architecture description. Although the generated GDB supports only assembly-level debugging, this is the first system that retargets the GDB debugger automatically.

R1-9
TitleAccelerating Regression Test of Compilers by Test Program Merging
Author*Takayuki Fukumoto (Kwansei Gakuin University, Japan), Kazushi Morimoto (Nomura Research Institute, Ltd., Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan)
Pagepp. 42 - 47
KeywordC language, compiler, test suite, gcc, testgen
AbstractThis paper proposes a method of accelerating regression testing of compilers by merging test programs in compiler test suites. A large amount of computation time is needed for compiler testing through test suites, for they consist of a huge number of test programs. Especially in the early stages of compiler development, reducing testing time is a critical issue, for bug fixes and regression tests are alternately repeated many times. The proposed method attempts to shorten the time for a test suite run by merging test programs in the test suite into longer but fewer programs, which drastically reduces the overhead for file open/close. During the merger, conflicts among the names of global variables, functions, and user-defined types are avoided by prefixing. Header file inclusion as well as multiplier compilation are carefully handled so that the semantics of the original test programs are maintained. A technique is also proposed to identify test programs that resulted in execution errors while executing the merged test programs. In an experiment where about 9,000 test programs in the testgen test suite were merged into 117 programs, computation time was reduced to 1/11.1 on Ubuntu Linux and to 1/63.9 on Cygwin on a 2.5 GHz Core i5 CPU.

R1-10
TitleRandom Testing of C Compilers Targeting Arithmetic Optimization
Author*Eriko Nagai (Kwansei Gakuin University, Japan), Hironobu Awazu (Fujitsu, Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan), Naoya Takeda (ITEC Hankyu Hanshin, Japan)
Pagepp. 48 - 53
Keywordcompiler, randomtest
AbstractThis paper presents a method of testing the validity of arithmetic optimization in C compilers using random programs. Compilers are tested by programs which contain randomly generated arithmetic expressions. Undefined behavior of the C language is carefully avoided during random program generation. This is based on precise computation of the expected values of the expressions, which takes implementation-defined behavior into account. A method for automatic minimization of error programs is also presented, which expedites the analysis of detected errors. A random test program based on our method has detected malfunctions in several compilers, which include LLVM GCC 4.2.1 shipped with the latest Mac OS X, GCC 4.4.4 for Ubuntu Linux, GCC 4.3.4 for Cygwin, and GCC 4.4.1 for h8300-elf and m32r-elf.

R1-11
TitleCompiler-Assisted Soft Error Correction by Duplicating Instructions for VLIW Architecture
AuthorYunrong Li, Jongwon Lee (Seoul National University, Republic of Korea), *Yohan Ko, Kyoungwoo Lee (Yonsei University, Republic of Korea), Yunheung Paek (Seoul National University, Republic of Korea)
Pagepp. 54 - 59
Keywordsoft error, vliw, embedded system, error correction
AbstractExponentially increasing with technology scaling, soft errors have become a serious design concern in the deep sub-micron era. Error detection in VLIW or embedded systems is not enough, while error correction is expensive due to the recovery mechanism. In this work, we present an enhanced VLIW architecture capable of not only error detection but also error correction by duplicating instructions efficiently, by re-executing the error-detected instruction, and by adopting the voting mechanism with the help of compilation techniques. Further, we propose a scheduling algorithm to improve the instruction scheduling over the executable under the performance constraint. Our experimental results on an ADL-described VLIW datapath demonstrate that our solution efficiently improves the reliability by 29% over the suite of DSPStone benchmarks without performance overhead in our compiler-scheduler-simulator framework.

R1-12
TitleCompiler Generation Method from ADL for ASIP Integrated Development Environment
Author*Yusuke Hyodo, Kensuke Murata (Osaka University, Japan), Takuji Hieda (Ritsumeikan University, Japan), Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 60 - 65
KeywordASIP, compiler generation, instruction selection, code generation description, ADL
AbstractIn this paper, we propose a method for generating a compiler from the architecture description language (ADL) of the ASIP Integrated Development Environment. With the proposed method, modifying the compiler in response to changes in the processor specification becomes easier, and the amount of description and the design time can be reduced. In our experiments, we compared the amount of description and the design time of an ASIP using the proposed method and the conventional method, in which the compiler is written manually. The experimental results show that the proposed method can reduce both the amount of description and the design time by approximately 80% compared to the conventional method.

R1-13
TitleMono-instruction Computer on a Dynamically Reconfigurable Gate Array
Author*Yuki Nihira, Minoru Watanabe (Shizuoka University, Japan)
Pagepp. 66 - 70
KeywordFPGA, ORGA, Dynamic reconfiguration
AbstractAs gates in field programmable gate arrays (FPGAs) become usable in ever-increasing numbers, FPGAs are becoming more widely used in various applications. Currently, FPGAs are implemented in many embedded systems. Demand for implementing a processor on an FPGA is growing. In response to that demand, FPGA vendors have provided soft-core processors for FPGAs, but those processors invariably have lower performance than hard-core processors. This paper therefore presents a proposal for a high-performance mono-instruction computer that fully exploits the programmability of a dynamically reconfigurable gate array. In addition, this paper clarifies the implementation area and operating frequency advantages of mono-instruction computers relative to soft-core RISC processors.

R1-14
TitleASPE: an Abstraction Framework using ALU Arrays for Scalable Multiple FPGAs System
AuthorKenta Inakagata, *Takayuki Akamine, Hirokazu Morishita (Keio University, Japan), Yasunori Osana (Ryukyu University, Japan), Naoyuki Fujita (Japan Aerospace Exploration Agency, Japan), Hideharu Amano (Keio University, Japan)
Pagepp. 71 - 76
KeywordMulti-FPGA, Acceleration with FPGAs, Floating-Point, Programmability
AbstractMulti-FPGA systems have attracted attention as cost-efficient accelerators for high-performance scientific computation. The major problem of such systems for users is programmability. With HDL-based design, it is especially difficult for multi-FPGA systems to find the best structure considering the resource and communication capability. Here, ASPE, a design framework using arrays of processing elements on FPGAs, is proposed to address the problem. Instead of HDL coding, ASPE executes the application by defining operations and communication in the ALU arrays on multiple FPGAs. MUSCL, a core program in computational fluid dynamics, is implemented on ASPE as an example, and evaluation results show that about 4.1 times the performance of software on an Intel Core 2 Duo is achieved.

R1-15
TitleRobust Register Files by Exploiting Asymmetric Soft Error Rate
Author*Yohan Ko, Kyoungwoo Lee (Yonsei University, Republic of Korea)
Pagepp. 77 - 81
Keywordregister file, dependability, soft error, ASER, profiling
AbstractWith technology scaling, soft errors induced by external radiation or cosmic rays are becoming a serious concern in micro-architectures. In particular, soft errors in register files are critical to reliability since these errors are easily propagated to other components of processors, causing catastrophic system failures. To protect data in register files, there exist redundancy techniques such as Triple Modular Redundancy (TMR) and Error Correcting Code (ECC). However, these techniques incur high overheads in terms of area, performance, and power consumption. In this paper, we increase the reliability of data in register files by simply applying inverters, since soft error rates are asymmetric, i.e., different for bit values 0 and 1. The main idea behind our approach is to increase the number of more stable bit values in register files by inverting the stored bits when profiling data shows that they contain more unstable bit values. Our experimental results show that our proposal can reduce soft error rates by up to 20% over a suite of benchmarks with minimal overheads due to inverters.
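The inversion idea in the R1-15 abstract can be illustrated with a small software sketch. Everything here is an assumption made for illustration: the function names, the 32-bit register width, the choice of which bit value is "unstable", and the simple majority rule over profiled values are not taken from the paper.

```python
def should_invert(profiled_values, width=32, unstable_bit=1):
    """Decide from profiling data whether storing inverted values would
    reduce the number of bits held in the less stable state.

    unstable_bit names the bit value assumed to have the higher soft
    error rate; which value that is depends on the cell design.
    """
    mask = (1 << width) - 1
    ones = sum(bin(v & mask).count("1") for v in profiled_values)
    total = width * len(profiled_values)
    unstable = ones if unstable_bit == 1 else total - ones
    return 2 * unstable > total


def store(value, invert, width=32):
    """Value as it would be held in the register cell."""
    mask = (1 << width) - 1
    return (~value & mask) if invert else (value & mask)


def load(stored, invert, width=32):
    """Recover the original value; inversion is its own inverse."""
    return store(stored, invert, width)


# Hypothetical profile of one register: mostly 1 bits.
profile = [0xFFFFFFFF, 0xFFFF0000]
invert = should_invert(profile)          # True under the assumptions above
cell = store(0xFFFF0000, invert)         # held as 0x0000FFFF
original = load(cell, invert)            # 0xFFFF0000 again
```

In hardware the same effect would come from inverters on the write and read paths of the selected registers, which is why the overhead stays small compared with TMR or ECC.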

R1-16
TitlePerformance Comparison of RG-DTM PUF and Arbiter-based PUFs
Author*Kousuke Ogawa, Mitsuru Shiozaki, Kota Furuhashi, Kohei Hozumi, Takeshi Fujino (Ritsumeikan University, Japan)
Pagepp. 82 - 87
KeywordPUF, RG-DTM PUF, Modeling Attack, SVM
AbstractThe proposed RG-DTM PUF achieves high uniqueness and security against modeling attacks. Hence, compared with an arbiter PUF and an XOR arbiter PUF, the RG-DTM PUF is suitable for tamper-resistant device applications such as IC identification, authentication, and key generation. This paper presents performance comparisons that include uniqueness, stability, resistance to modeling attacks, circuit area, and power consumption.

R1-17
TitleHardware Architecture for Accelerating Monte Carlo based SSTA using Generalized STA Processing Element
Author*Hiroshi Yuasa, Hiroshi Tsutsui, Hiroyuki Ochi, Takashi Sato (Kyoto University, Japan)
Pagepp. 88 - 93
KeywordSTA Processing Element, Monte Carlo based SSTA, Hardware Acceleration, Static Timing Analysis, STA-PE
AbstractWe propose a novel hardware architecture for accelerating Monte Carlo based statistical static timing analysis (MC-SSTA). In our approach, a generalized hardware module called the STA processing element (STA-PE) is used for delay evaluation of a logic gate. The proposed architecture is successfully implemented on an FPGA device, in which 26 STA-PEs run in parallel at a 116 MHz clock. It achieves a 1,457 times acceleration compared to a software implementation.
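For readers unfamiliar with Monte Carlo based SSTA, the computation being accelerated can be sketched in software as follows. This is a generic textbook-style sketch, not the STA-PE hardware: the netlist format, the independent Gaussian gate-delay model, and all names are assumptions for illustration.

```python
import random


def sta(gates, delays):
    """One static timing pass: returns the latest arrival time.

    gates is a list of (name, fanin_names) in topological order;
    a gate's arrival time is its latest fan-in arrival plus its own delay.
    """
    arrival = {}
    for name, fanins in gates:
        latest_input = max((arrival[f] for f in fanins), default=0.0)
        arrival[name] = latest_input + delays[name]
    return max(arrival.values())


def mc_ssta(gates, mean, sigma, trials, seed=0):
    """Repeat STA with randomly sampled gate delays (assumed Gaussian,
    independent, clipped at zero) and collect the circuit delays."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        delays = {
            name: max(0.0, rng.gauss(mean[name], sigma[name]))
            for name, _ in gates
        }
        results.append(sta(gates, delays))
    return results


# Hypothetical four-gate circuit: a and b feed c, c feeds d.
netlist = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c"])]
mean = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 0.5}
sigma = {"a": 0.1, "b": 0.1, "c": 0.2, "d": 0.05}
samples = mc_ssta(netlist, mean, sigma, trials=1000)
```

Each trial is independent of the others, which is what makes the workload a natural fit for many processing elements running in parallel.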

R1-18
TitleHead-Tail Expressions for Interval Functions
Author*Infall Syafalni, Tsutomu Sasao (Kyushu Institute of Technology, Japan)
Pagepp. 94 - 99
KeywordInterval Function, Head-Tail Expression, TCAM
AbstractThis paper shows a method to represent interval functions by using head-tail expressions. The head-tail expressions represent greater-than GT(n:A) functions, less-than LT(n:B) functions, and interval functions IN0(n:A,B) more efficiently than sum-of-products expressions, where n denotes the number of bits to represent the largest value in the interval (A,B). This paper proves that a head-tail expression represents an interval function with at most n words in a TCAM realization. Experimental results for up to n=16 are shown.

R1-19
TitleA Performance Monitoring Tool Suite for Software and SoC On-Chip Bus
Author*Yi-Hao Chang, Ing-Jer Huang (National Sun Yat-Sen University, Taiwan)
Pagepp. 100 - 105
Keywordperformance Analysis, SoC
AbstractBecause today's SoCs involve both software and hardware design, performance bottlenecks may occur in software, in hardware, or in both. However, present performance monitoring tools usually evaluate only software or only hardware performance, which is not sufficient for current SoC designs. Furthermore, due to the increasing complexity of user requirements, an embedded OS such as Linux is introduced to manage the limited hardware resources for complicated applications. However, this also makes performance monitoring harder, since the memory address space is divided into user space and kernel space with different capabilities to access system resources, which makes it impossible for a user space application to retrieve system performance information without kernel or hardware support. In this paper, we propose a performance monitoring tool suite that is capable of analyzing the performance of user space applications, kernel space device drivers, and the AMBA AHB bus for an SoC running under Linux.

R1-20
TitleBackward Multiple Time-frame Expansion for Accelerating Sequential SAT
Author*Kousuke Torii, Kazuhiro Nakamura (Nagoya University, Japan), Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan)
Pagepp. 106 - 110
KeywordSequential SAT, sequential circuit, Formal Verification
AbstractSequential SAT is a formal verification problem that checks whether there exists an input sequence to a given circuit such that a desired objective is satisfied. An efficient algorithm for a Sequential SAT solver is required to deal with sequential circuits that have a large state space. In this paper, we demonstrate backward multiple time-frame expansion (BMTE) and present an algorithm for a Sequential SAT solver that supports it. The proposed algorithm is suitable for merging states and pruning the state space for search. We show promising experimental results.

R1-21
TitleOn Optimization of Power Network Synthesis for Multiple Power Domain Designs
AuthorChieh-Jui Lee, Shih-Ying Liu, Chuan-Chia Huang, *Hung-Ming Chen (Institute of Electronics Engineering, National Chiao Tung University, Taiwan)
Pagepp. 111 - 114
KeywordPower network synthesis, Multiple power domain
AbstractIn this paper, we propose a methodology that synthesizes and optimizes the power network for designs with multiple power domains. An architecture is presented to represent the power network in the presence of sleep transistors. The power network is numerically modeled as an RC network using Modified Nodal Analysis and solved using the Conjugate Gradient Method. To mitigate the IR drop effect, an optimization technique based on Simulated Annealing is proposed that minimizes the total power stripe area while satisfying a given IR drop constraint. To handle multiple power domains, the given power domains are represented in a tree-like structure, and our algorithm is recursively applied to synthesize and optimize the power network for each power domain in a hierarchical fashion. The proposed methodology is integrated into a commercial design tool and applied to a real design case for evaluation. To ensure the practicality of our approach, evaluation is performed on the latest commercial digital design tool. Design data and parameters are extracted using Open Access. The result of our algorithm is fed back to the latest commercial tool for final IR and EM analysis. Our algorithm is tested on both an industrial test case and academic MCNC benchmarks. Compared to a conventional P/G network, our power network synthesis achieves a 31% - 35% reduction in total P/G area while satisfying a maximum 10% IR-drop constraint.

R1-22
TitleThermal-Aware Placement for Hotspot Mitigation in 3D FPGAs
Author*Juinn-Dar Huang, Ya-Shih Huang, Mi-Yu Hsu, Han-Yuan Chang (National Chiao Tung University, Taiwan)
Pagepp. 115 - 120
KeywordThree-dimensional integration, 3D FPGAs, thermal-aware placement, logic block placement
AbstractThree-dimensional (3D) integration is an attractive and promising approach for more complicated designs, whereas the thermal issue is a critical challenge for 3D integrated circuits. Moreover, accurate thermal analysis is too time-consuming to be incorporated into practical placement algorithms, which generally perform numerous iterative refinement steps. Therefore, in this paper, we propose two fast thermal-aware placement methods for 3D FPGAs, Standard Deviation (SD) and MineSweeper (MS), without the need for detailed thermal analysis. Both aim to distribute power sources more evenly within a 3D FPGA to mitigate hotspots. The experimental results show that SD and MS achieve 12.1%/7.6% reduction in maximum temperature and 82%/56% improvement in temperature deviation compared to a typical thermal-unaware placement method, at the cost of only a minor increase in wirelength and delay. Moreover, MS consumes merely 4% more runtime for producing thermal-aware placement solutions.

R1-23
TitleEfficient Delay Cells for Wave Pipelined Multifunctional Unit
AuthorAtsushi Kurokawa, *Tatsuya Takaki, Masa-aki Fukase (Hirosaki University, Japan)
Pagepp. 121 - 126
Keywordwave pipeline, processor, multifunctional unit, delay cell, buffer insertion
AbstractWave pipelining requires the addition of cells and wiring in order to slow down faster paths so that their delays are close to that of the longest path. For tuning the delay, a large number of buffers are usually inserted. This results in an increased chip area. This paper focuses on the area problem due to buffer insertion and presents new delay cells that have high area efficiency and are low in cost. Estimations are made of the delay, power consumption, and area of various types of the new delay cells. It is found that cells with intermediate transistors having narrow and long channels are the best in terms of area and power consumption. Cells of the best type are applied to a multifunctional unit (MFU). Experimental results show that a circuit with the new delay cells has a smaller area than one with only standard cells.

R1-24
TitleAn Integrated Smart Current Sensing Current-Mode Buck Converter
Author*Chia-Min Chen, Kai-Hsiu Hsu, Chung-Chih Hung (National Chiao Tung University, Taiwan)
Pagepp. 127 - 130
Keywordcurrent-mode controller, current-sensing circuit, DC-DC converter, pulse-width modulation(PWM), switch-mode power converter
AbstractThis paper presents an integrated circuit implementation of a high efficiency current-mode buck converter over a wide loading current. The converter adaptively operates as Pulse-Width Modulation (PWM). An on-chip current sensing technique is employed to reduce external components and no extra I/O pins are needed for the current-mode controller. A soft-start operation is designed to eliminate the excess large current during the startup of the regulator. The DC-DC converter was fabricated in 0.35um CMOS process with 2P4M. The range of the supply voltage is from 2 to 5V, which is suitable for single-cell lithium-ion battery.

R1-25
TitleLinear Time Estimation of Full-Chip Statistical Leakage Current
Author*Katsumi Homma (Fujitsu Laboratories Ltd., Japan)
Pagepp. 131 - 134
KeywordStatistical Leakage Analysis, Process Variation
AbstractIn this paper, we propose a method for estimating the leakage current of a circuit under process parameter variations. The proposed method needs only O(N) computation time, where N is the number of gates in the circuit, and is faster than Monte Carlo and Wilkinson’s method. Experimental results show that the proposed method is effective in estimating statistical full-chip leakage current. Errors for the 99th-percentile value of full-chip leakage current are within 1%.

R1-26
TitleAn Effective Overlap Removable Objective for Analytical Placement
Author*Syota Kuwabara, Yukihide Kohira (The University of Aizu, Japan), Yasuhiro Takashima (The University of Kitakyushu, Japan)
Pagepp. 135 - 140
KeywordAnalytical placement, minimization of overlap area, overlap removable area
AbstractIn recent LSI design, it is difficult to obtain a placement that satisfies design constraints and specifications. Analytical placement is a promising way to obtain such a placement. Although existing methods obtain a placement with short wire length, the obtained placement has overlap. In this paper, we propose overlap removable area as an overlap evaluation method for analytical placement. Experiments show that the proposed method is effective for removing overlap in analytical placement.