SASIMI 2012
The 17th Workshop on Synthesis And System Integration of Mixed Information Technologies
Technical Program

Remark: The presenter of each paper is marked with "*".

Session Schedule


Thursday, March 8, 2012

9:00 - 9:15    Opening (Int'l Conf. Room)
9:15 - 10:15   K1: Keynote Speech I (Int'l Conf. Room)
10:15 - 12:00  R1: Poster I (Int'l Conf. Room & Mtg. Room 31)
12:00 - 13:30  Lunch Break
13:30 - 14:30  I1: Invited Talk I (Int'l Conf. Room)
14:30 - 16:30  R2: Poster II (Int'l Conf. Room & Mtg. Room 31)
16:30 - 18:00  D: Panel Discussion (Int'l Conf. Room)
18:30 - 20:30  Banquet (Reception Hall)

Friday, March 9, 2012

9:00 - 10:00   K2: Keynote Speech II (Int'l Conf. Room)
10:00 - 11:45  R3: Poster III (Int'l Conf. Room & Mtg. Room 31)
11:45 - 13:15  Lunch Break
13:15 - 14:15  I2: Invited Talk II (Int'l Conf. Room)
14:15 - 16:00  R4: Poster IV (Int'l Conf. Room & Mtg. Room 31)
16:00 - 17:00  I3: Invited Talk III (Int'l Conf. Room)
17:00 - 17:15  Closing (Int'l Conf. Room)



List of Papers

Thursday, March 8, 2012

Keynote Speech I
Time: 9:15 - 10:15 Thursday, March 8, 2012
Location: Int'l Conf. Room
Chair: Masahiro Numa (Kobe University, Japan)

K1 (Time: 9:15 - 10:15)
TitleRobust System Design: Overcoming Complexity and Reliability Challenges
Author*Subhasish Mitra (Stanford University, U.S.A.)
Pagep. 1
AbstractToday's mainstream electronic systems typically assume that transistors and interconnects operate correctly over their useful lifetime. With enormous complexity and significantly increased vulnerability to failures compared to the past, future system designs cannot rely on such assumptions. At the same time, there is explosive growth in our dependency on such systems. Robust system design is essential to ensure that future systems perform correctly despite rising complexity and increasing disturbances. This talk will address the following major robust system design goals: 1. New approaches to thorough test and validation that scale with tremendous growth in complexity; and, 2. Cost-effective tolerance and prediction of failures in hardware during system operation. Significant recent progress in robust system design impacts almost every aspect of future systems, from ultra-large-scale networked systems, all the way to their nanoscale components.


Poster I
Time: 10:15 - 12:00 Thursday, March 8, 2012
Location: Int'l Conf. Room & Mtg. Room 31
Chairs: Hiroaki Yoshida (University of Tokyo, Japan), Tomoo Inoue (Hiroshima City University, Japan)

R1-1
TitleTOF-based 3-Dimensional Head-Tracking System for Repetitive Transcranial Magnetic Stimulation
Author*Ryo Ebisuwaki, Yoshihiro Yasumuro, Hiroshige Dan, Masahiko Fuyuki (Faculty of Environmental and Urban Engineering, Kansai University, Japan)
Pagepp. 2 - 5
KeywordTOF camera, ICP algorithm, head tracking
AbstractMany survivors of brain infarction and hemorrhage suffer peripheral nerve damage that sometimes causes neuropathic pain even without external injury. While no medication is useful for neuropathic pain, repetitive transcranial magnetic stimulation (rTMS) has gathered attention for mitigating this pain. Existing rTMS treatments require precise localization of the stimulation target in the brain, and it is necessary to bind the patient to a bed. This paper proposes a new localizing system scheme that keeps the patient unconstrained, using a TOF camera to measure the three-dimensional shape of an object in real time.

R1-2
TitleA High-speed H.264/AVC CABAC Decoder for 4K Video Utilizing Residual Data Accelerator
Author*Kenji Watanabe (Synthesis Corporation, Japan), Gen Fujita (Dept. Engineering Informatics, Osaka Electro-Communication University, Japan), Toru Homemoto, Ryoji Hashimoto (Graduate School of Information Science and Technology, Osaka University, Japan)
Pagepp. 6 - 10
KeywordH.264, CABAC
AbstractThe implementation of a parallel decoder for CABAC (Context-based Adaptive Binary Arithmetic Coding), which is adopted in the H.264/AVC video coding standard, is extremely difficult due to inherent data dependency. Therefore, the CABAC decoder constitutes a bottleneck when decoding 1080 HD (1,920x1,080) or higher video sequences in real time. In this paper, we propose a VLSI (Very Large Scale Integration) architecture for the CABAC decoder that adopts a multi-bin decoding architecture in conjunction with techniques that improve the maximum clock frequency. The implementation results show that the proposed architecture achieves an average throughput of 1.48 bins per clock and a maximum clock frequency of 394 MHz, demonstrating that our architecture is capable of decoding 4K (4,096x2,048 @ 30 fps) video in real time.

R1-3
TitleLow Power Decision Tree-Based Flow Search Engine
Author*Eita Kobayashi, Norio Yamagaki, Takashi Takenaka, Satoshi Kamiya (NEC Corporation, Japan), Masato Motomura (Hokkaido University, Japan)
Pagepp. 11 - 16
KeywordSearch Engine, TCAM, Low Power, Design Method
AbstractThis paper presents a novel architecture for a low-power flow search engine. It combines decision-tree-based pipelines that use general-purpose memories with linear search pipelines that prevent rule duplication. We also developed a design method that takes into account robustness against fluctuations in network properties. The evaluation results show that a hardware implementation of our architecture achieves a power reduction of up to 92% while maintaining performance comparable to TCAM.

R1-4
TitleManycore NOC Based 2400-PE Network on Chip Emulation and Verification Environment
Author*Omar Hammami (ENSTA ParisTech, France), Xinyu Li (EVE, France)
Pagepp. 17 - 21
Keywordemulation, FPGA, manycore, NOC, verification
AbstractWe present NOCEVE, an industrial Network on Chip (NoC) emulation and verification environment on an industrial large-scale multi-FPGA emulation platform for billion-cycle applications. It helps designers improve system performance by analyzing traffic distribution and balance across the network on chip. The hardware monitoring network is generated by another commercial NoC design tool. It consists of traffic collectors, which are reconfigurable to collect different traffic information such as packet latency and throughput. The statistical traffic information is collected during real application execution on the FPGA platform and is sent through the monitoring network on the FPGA, and then through a PCI bright board, back to the host computer for real-time visualization or post-execution data analysis. NOCEVE is the first industrial NoC emulation and verification environment for billion-cycle applications.

R1-5
TitleBit-Selective SAD and Its Evaluation
AuthorRyosuke Hamaji, Yongson Choi, Yuko Hara-Azumi, *Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 22 - 27
KeywordBlock Matching, SAD
AbstractThis paper describes simulation results showing whether, and which, bits of image pixel values may be dropped when calculating the Sum of Absolute Differences (SAD) for block matching. We find that the low-order bits are important in calculating SAD. We also introduce a SAD calculation architecture that can select an 8-bit or 4-bit mode to calculate SAD values with a new scheduling technique.
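
As a rough illustration of what evaluating SAD on a subset of bits might look like, the sketch below computes SAD over 8-bit pixels after applying a bit mask. The function name, the mask parameter, and the sample blocks are assumptions made for illustration; they are not taken from the paper, and the paper's architecture and scheduling technique are not modeled here.

```python
def sad(block_a, block_b, mask=0xFF):
    """Sum of absolute differences over two equally sized pixel blocks.

    `mask` (hypothetical parameter) selects which bits of each 8-bit pixel
    take part: 0xFF keeps all bits, 0xF0 only the high nibble, 0x0F only
    the low nibble.
    """
    return sum(abs((a & mask) - (b & mask)) for a, b in zip(block_a, block_b))


# Illustrative 4-pixel blocks (made-up values).
target = [200, 180, 90, 30]
candidate = [198, 183, 88, 35]

full = sad(target, candidate)              # all 8 bits
high_only = sad(target, candidate, 0xF0)   # upper 4 bits only
low_only = sad(target, candidate, 0x0F)    # lower 4 bits only
```

Comparing the ranking of candidate blocks under different masks is one simple way to observe how much each group of bits contributes to the matching result.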

R1-6
TitleA Technique for Accelerating SVM-Based Image Recognition Using GPU
Author*Jin Sasaki, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 28 - 32
KeywordGPGPU, Acceleration, Image Recognition, SVM
AbstractIn this paper, we propose a technique for accelerating image recognition based on SVM (Support Vector Machine) using GPU (Graphics Processing Unit). We have applied the proposed technique to human detection based on HOG (Histogram of Oriented Gradients) features. Experimental results have shown that the proposed technique achieves speedups of 107.7 times in the learning process, and speedups of 9.5 times in the recognition process compared with the conventional technique using CPU (Central Processing Unit) without lowering recognition accuracy.

R1-7
TitleVariation of Substrate Sensitivity in Differential Pair Transistors
Author*Satoshi Takaya, Takashi Hasegawa, Yoji Bando (Kobe University, Japan), Toru Ohkawa, Toshiharu Takaramoto, Toshio Yamada, Masaaki Souda, Shigetaka Kumashiro, Tohru Mogami (MIRAI-Selete, Japan), Makoto Nagata (Kobe University, Japan)
Pagepp. 33 - 35
KeywordSubstrate coupling, Substrate noise, On-chip monitoring
AbstractThe sensitivity of differential pair transistors to substrate voltage variation is investigated at the 90 nm and 65 nm technology nodes. On-chip measurements were carried out of the response of transistors to small AC signals at input nodes as well as on the silicon substrate. The substrate sensitivity and its variation due to physical factors in a layout are analyzed using the measurement data. The universality and dependency of the substrate sensitivity across technology nodes are also addressed.

R1-8
TitleAutomatic Generation of GNU Binutils and GDB for Custom Processors Based on Plug-in Method
AuthorTakahiro Kumura (NEC Corporation, Japan), Soichiro Taga (Mitsubishi Electric Micro-Computer Application Software Co., Ltd., Japan), *Nagisa Ishiura (Kwansei Gakuin University, Japan), Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 36 - 41
Keywordsoftware development tools, custom processor, binutils, gdb, plug-in method
AbstractThis paper presents a scheme of auto-generating GNU software development tools for newly developed processor cores based on a plug-in method. An experimental system based on our method successfully generated a GNU toolchain consisting of an assembler, a disassembler, a linker, a simulator, and a debugger from succinct architecture description. Although the generated GDB supports only assembly level debugging, this is the first system that retargets the GDB debugger automatically.

R1-9
TitleAccelerating Regression Test of Compilers by Test Program Merging
Author*Takayuki Fukumoto (Kwansei Gakuin University, Japan), Kazushi Morimoto (Nomura Research Institute, Ltd., Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan)
Pagepp. 42 - 47
KeywordC language, compiler, test suite, gcc, testgen
AbstractThis paper proposes a method of accelerating regression testing of compilers by merging test programs in compiler test suites. A large amount of computation time is needed for compiler testing through test suites, for they consist of a huge number of test programs. Especially in early stages of compiler development, reducing testing time is a critical issue, for bug fixes and regression tests are alternately repeated many times. The proposed method attempts to shorten the time for a test suite run by merging test programs in the test suite into longer but fewer programs, which drastically reduces the overhead for file open/close. During the merger, conflicts among the names of global variables, functions, and user-defined types are avoided by prefixing. Header file inclusion as well as multiplier compilation are carefully handled so that the semantics of the original test programs are maintained. A technique is also proposed to identify test programs that resulted in execution errors while executing the merged test programs. In an experiment where about 9,000 test programs in the testgen test suite were merged into 117 programs, computation time was reduced to 1/11.1 on Ubuntu Linux and to 1/63.9 on Cygwin on a 2.5 GHz Core i5 CPU.

R1-10
TitleRandom Testing of C Compilers Targeting Arithmetic Optimization
Author*Eriko Nagai (Kwansei Gakuin University, Japan), Hironobu Awazu (Fujitsu, Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan), Naoya Takeda (ITEC Hankyu Hanshin, Japan)
Pagepp. 48 - 53
Keywordcompiler, randomtest
AbstractThis paper presents a method of testing validity of arithmetic optimization of C compilers using random programs. Compilers are tested by programs which contain randomly generated arithmetic expressions. Undefined behavior of the C language is carefully avoided during random program generation. This is based on precise computation of expected values of the expressions which takes implementation-defined behavior into account. A method for automatic minimization of error programs is also presented which expedites the analysis of detected errors. A random test program based on our method has detected malfunctions in several compilers, which include LLVM GCC 4.2.1 shipped with the latest Mac OS X, GCC 4.4.4 for Ubuntu Linux, GCC 4.3.4 for Cygwin, and GCC 4.4.1 for h8300-elf and m32r-elf.
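
The idea of generating random arithmetic expressions while tracking their expected values can be sketched as follows. This is a minimal illustration and not the authors' tool: it assumes a 32-bit two's-complement `int`, restricts itself to the four operators `+ - * /`, and all function names are hypothetical. Candidates whose exact value would overflow, or that would divide by zero, are discarded, so the emitted C expression has no undefined behavior under those assumptions.

```python
import random

INT_MIN, INT_MAX = -2**31, 2**31 - 1   # assumption: 32-bit int


def c_div(a, b):
    """Integer division as in C99: truncation toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def gen_expr(rng, depth):
    """Return (c_source, expected_value) for a random int expression.

    Every intermediate result is checked against the int range, so the
    expression never overflows and never divides by zero.
    """
    if depth == 0:
        v = rng.randint(-1000, 1000)
        return f"({v})", v
    while True:
        ls, lv = gen_expr(rng, depth - 1)
        rs, rv = gen_expr(rng, depth - 1)
        op = rng.choice("+-*/")
        if op == "/":
            if rv == 0:
                continue                  # division by zero: retry
            value = c_div(lv, rv)
        elif op == "+":
            value = lv + rv
        elif op == "-":
            value = lv - rv
        else:
            value = lv * rv
        if INT_MIN <= value <= INT_MAX:   # reject signed overflow
            return f"({ls} {op} {rs})", value


def make_test(seed, depth=3):
    """Emit a C program that exits with 0 iff the compiler's result matches."""
    src, expected = gen_expr(random.Random(seed), depth)
    return f"int main(void) {{ return ({src} == ({expected})) ? 0 : 1; }}\n"
```

Compiling the output of `make_test` at several optimization levels and checking the exit status is one simple way such a generated program could be used.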

R1-11
TitleCompiler-Assisted Soft Error Correction by Duplicating Instructions for VLIW Architecture
AuthorYunrong Li, Jongwon Lee (Seoul National University, Republic of Korea), *Yohan Ko, Kyoungwoo Lee (Yonsei University, Republic of Korea), Yunheung Paek (Seoul National University, Republic of Korea)
Pagepp. 54 - 59
Keywordsoft error, vliw, embedded system, error correction
AbstractExponentially increasing with technology scaling, soft errors have become a serious design concern in the deep sub-micron era. Error detection in VLIW or embedded systems is not enough while error correction is expensive due to the recovery mechanism. In this work, we present an enhanced VLIW architecture capable of not only error detection but also error correction by duplicating instructions efficiently, by re-executing the error-detected instruction, and by adopting the voting mechanism with the help of compilation techniques. Further, we propose a scheduling algorithm to improve the instruction scheduling over the executable under the performance constraint. Our experimental results on ADL-described VLIW datapath demonstrate that our solution efficiently improves the reliability by 29% over the suite of DSPStone benchmarks without performance overhead in our compiler-scheduler-simulator framework.

R1-12
TitleCompiler Generation Method from ADL for ASIP Integrated Development Environment
Author*Yusuke Hyodo, Kensuke Murata (Osaka University, Japan), Takuji Hieda (Ritsumeikan University, Japan), Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 60 - 65
KeywordASIP, compiler generation, instruction selection, code generation description, ADL
AbstractIn this paper, we propose a compiler generation method from the architecture description language (ADL) of ASIP Integrated Development Environment. With the proposed method, modifying the compiler after changes in the processor specification becomes easier, and the amount of description and the design time can be reduced. In our experiments, we compared the amount of description and the design time of an ASIP using the proposed method with those of the conventional method, in which a compiler is written manually. The experimental results show that the proposed method can reduce both the amount of description and the design time by approximately 80% compared to the conventional method.

R1-13
TitleMono-instruction Computer on a Dynamically Reconfigurable Gate Array
Author*Yuki Nihira, Minoru Watanabe (Shizuoka University, Japan)
Pagepp. 66 - 70
KeywordFPGA, ORGA, Dynamic reconfiguration
AbstractAs gates in field programmable gate arrays (FPGAs) become usable in ever-increasing numbers, FPGAs are becoming more widely used in various applications. Currently, FPGAs are implemented in many embedded systems. Demand for implementing a processor onto an FPGA is gaining. In response to that demand, FPGA vendors have provided soft-core processors for FPGAs, but those processors invariably have lower performance than that of hard-core processors. This paper therefore presents a proposal for a high-performance mono-instruction computer that fully exploits the programmability of a dynamically reconfigurable gate array. In addition, this paper clarifies implementation area and operation frequency advantages of mono-instruction computers relative to soft-core RISC processors.

R1-14
TitleASPE: an Abstraction Framework using ALU Arrays for Scalable Multiple FPGAs System
AuthorKenta Inakagata, *Takayuki Akamine, Hirokazu Morishita (Keio University, Japan), Yasunori Osana (Ryukyu University, Japan), Naoyuki Fujita (Japan Aerospace Exploration Agency, Japan), Hideharu Amano (Keio University, Japan)
Pagepp. 71 - 76
KeywordMulti-FPGA, Acceleration with FPGAs, Floating-Point, Programmability
AbstractMulti-FPGA systems have attracted attention as cost-efficient accelerators for high-performance scientific computation. The major problem of such systems for users is programmability. With HDL-based design, it is especially difficult for multi-FPGA systems to find the best structure considering the resources and communication capability. Here, ASPE, a design framework using arrays of processing elements on FPGAs, is proposed to address the problem. Instead of HDL coding, ASPE executes an application by defining operations and communication in the ALU arrays on multiple FPGAs. MUSCL, a core program in computational fluid dynamics, is implemented on ASPE as an example, and evaluation results show that about 4.1 times the performance of software on an Intel Core 2 Duo is achieved.

R1-15
TitleRobust Register Files by Exploiting Asymmetric Soft Error Rate
Author*Yohan Ko, Kyoungwoo Lee (Yonsei University, Republic of Korea)
Pagepp. 77 - 81
Keywordregister file, dependability, soft error, ASER, profiling
AbstractAs technology scales, soft errors induced by external radiation or cosmic rays are becoming a serious concern in micro-architectures. In particular, soft errors in register files are critical to reliability since these errors are easily propagated to other components of processors, causing catastrophic system failures. To protect data in register files, there exist redundancy techniques such as Triple Modular Redundancy (TMR) and Error Correcting Code (ECC). However, these techniques incur high overheads in terms of area, performance, and power consumption. In this paper, we increase the reliability of data in register files by simply applying inverters, since soft error rates are asymmetric, i.e., different between bit values 0 and 1. The main idea behind our approach is to increase the number of more stable bit values in register files by inverting the stored values when profiling data show that unstable bit values dominate. Our experimental results show that our proposal can reduce soft error rates by up to 20% over a suite of benchmarks with minimal overheads due to inverters.
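
The inversion idea can be illustrated with a small sketch. Several things here are assumptions made for illustration only: the bit value 1 is taken to be the less stable one, the decision is made per word at write time rather than from profiling data as in the paper, and one extra flag bit per word records whether the word is stored inverted. The function names are hypothetical.

```python
def encode(word, width=32, fragile_bit=1):
    """Return (stored, inverted): invert when fragile bit values dominate.

    `fragile_bit` is the bit value assumed to have the higher soft error rate.
    """
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")
    fragile = ones if fragile_bit == 1 else width - ones
    inverted = fragile > width // 2
    stored = (~word & mask) if inverted else word
    return stored, inverted


def decode(stored, inverted, width=32):
    """Recover the original word from the stored value and its flag."""
    mask = (1 << width) - 1
    return (~stored & mask) if inverted else (stored & mask)


stored, flag = encode(0xFFFFFFF0)      # 28 of 32 bits are 1, so it is inverted
original = decode(stored, flag)
```

In this sketch the stored word never holds more than half of its bits at the fragile value, at the cost of the flag bit and the inverters on the read and write paths.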

R1-16
TitlePerformance Comparison of RG-DTM PUF and Arbiter-based PUFs
Author*Kousuke Ogawa, Mitsuru Shiozaki, Kota Furuhashi, Kohei Hozumi, Takeshi Fujino (Ritsumeikan University, Japan)
Pagepp. 82 - 87
KeywordPUF, RG-DTM PUF, Modeling Attack, SVM
AbstractThe proposed RG-DTM PUF achieves high uniqueness and security against modeling attacks. Hence, compared with an arbiter PUF and an XOR arbiter PUF, the RG-DTM PUF is suitable for tamper-resistant device applications such as IC identification, authentication, and key generation. This paper presents performance comparisons covering uniqueness, stability, resistance to modeling attacks, circuit area, and power consumption.

R1-17
TitleHardware Architecture for Accelerating Monte Carlo based SSTA using Generalized STA Processing Element
Author*Hiroshi Yuasa, Hiroshi Tsutsui, Hiroyuki Ochi, Takashi Sato (Kyoto University, Japan)
Pagepp. 88 - 93
KeywordSTA Processing Element, Monte Carlo based SSTA, Hardware Acceleration, Static Timing Analysis, STA-PE
AbstractWe propose a novel hardware architecture for accelerating Monte Carlo based statistical static timing analysis (MC-SSTA). In our approach, a generalized hardware module called the STA processing element (STA-PE) is used for delay evaluation of a logic gate. The proposed architecture is successfully implemented on an FPGA device, in which 26 STA-PEs run in parallel at a 116 MHz clock. It achieves 1,457 times acceleration compared to a software implementation.
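
For readers unfamiliar with MC-SSTA, the software baseline that such hardware accelerates can be sketched in a few lines. This is a generic illustration, not the STA-PE architecture: the circuit, the Gaussian delay model, and all names are assumptions. Each Monte Carlo sample draws a delay for every gate and then runs an ordinary static timing pass (arrival time = latest fan-in arrival + gate delay).

```python
import random


def sta(gates, delays):
    """Longest arrival time of one circuit instance.

    `gates` maps gate name -> list of fan-in gate names and must be listed
    in topological order (dicts keep insertion order in Python 3.7+).
    """
    arrival = {}
    for name, fanins in gates.items():
        arrival[name] = max((arrival[f] for f in fanins), default=0.0) + delays[name]
    return max(arrival.values())


def mc_ssta(gates, nominal, sigma, samples, seed=0):
    """Return one circuit delay per Monte Carlo sample (Gaussian gate delays)."""
    rng = random.Random(seed)
    results = []
    for _ in range(samples):
        delays = {g: max(0.0, rng.gauss(nominal[g], sigma)) for g in gates}
        results.append(sta(gates, delays))
    return results


# Hypothetical 4-gate circuit: g1 and g2 feed g3, which feeds g4.
gates = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g3"]}
nominal = {"g1": 1.0, "g2": 2.0, "g3": 1.5, "g4": 0.5}
delay_samples = mc_ssta(gates, nominal, sigma=0.1, samples=1000)
```

The per-gate step inside `sta` is the kind of operation that is evaluated once per gate per sample, which is why the workload parallelizes well.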

R1-18
TitleHead-Tail Expressions for Interval Functions
Author*Infall Syafalni, Tsutomu Sasao (Kyushu Institute of Technology, Japan)
Pagepp. 94 - 99
KeywordInterval Function, Head-Tail Expression, TCAM
AbstractThis paper shows a method to represent interval functions by using head-tail expressions. The head-tail expressions represent greater-than GT(n:A) functions, less-than LT(n:B) functions, and interval functions IN0(n:A,B) more efficiently than sum-of-products expressions, where n denotes the number of bits to represent the largest value in the interval (A,B). This paper proves that a head-tail expression represents an interval function with at most n words in a TCAM realization. Experimental results for up to n=16 are shown.

R1-19
TitleA Performance Monitoring Tool Suite for Software and SoC On-Chip Bus
Author*Yi-Hao Chang, Ing-Jer Huang (National Sun Yat-Sen University, Taiwan)
Pagepp. 100 - 105
Keywordperformance analysis, SoC
AbstractBecause today's SoCs involve both software and hardware designs, performance bottlenecks may occur in software, in hardware, or in both. However, present performance monitoring tools usually evaluate only software or only hardware performance, which is not enough for today's SoC designs. Furthermore, due to the increasing complexity of user requirements, an embedded OS such as Linux is introduced to manage the limited hardware resources for complicated applications. This also makes performance monitoring harder, since the memory address space is divided into user space and kernel space with different capabilities to access system resources, which makes it impossible for a user space application to retrieve system performance information without kernel or hardware support. In this paper, we propose a performance monitoring tool suite that is capable of analyzing the performance of user space applications, kernel space device drivers, and the AMBA AHB bus for an SoC running Linux.

R1-20
TitleBackward Multiple Time-frame Expansion for Accelerating Sequential SAT
Author*Kousuke Torii, Kazuhiro Nakamura (Nagoya University, Japan), Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan)
Pagepp. 106 - 110
KeywordSequential SAT, sequential circuit, Formal Verification
AbstractSequential SAT is a formal verification problem that checks whether there exists an input sequence to a given circuit such that a desired objective is satisfied. An efficient algorithm for a Sequential SAT solver is required to deal with sequential circuits that have a large state space. In this paper, we demonstrate backward multiple time-frame expansion (BMTE) and present an algorithm for a Sequential SAT solver that supports it. The proposed algorithm is suitable for merging states and pruning the state space to be searched. We show promising experimental results.

R1-21
TitleOn Optimization of Power Network Synthesis for Multiple Power Domain Designs
AuthorChieh-Jui Lee, Shih-Ying Liu, Chuan-Chia Huang, *Hung-Ming Chen (Institute of Electronics Engineering, National Chiao Tung University, Taiwan)
Pagepp. 111 - 114
KeywordPower network synthesis, Multiple power domain
AbstractIn this paper, we propose a methodology that synthesizes and optimizes the power network for designs with multiple power domains. An architecture is presented to represent the power network in the presence of sleep transistors. The power network is numerically modeled as an RC network using Modified Nodal Analysis and solved using the Conjugate Gradient Method. To mitigate the IR drop effect, an optimization technique based on Simulated Annealing is proposed that minimizes the total power stripe area while satisfying a given IR drop constraint. To handle multiple power domains, the given power domains are represented in a tree-like structure, and our algorithm is recursively applied to synthesize and optimize the power network for each power domain in a hierarchical fashion. The proposed methodology is integrated into a commercial design tool and evaluated on a real design case. To ensure the practicality of our approach, the evaluation is performed on the latest commercial digital design tool. Design data and parameters are extracted using Open Access. The result of our algorithm is fed back to the latest commercial tool for final IR and EM analysis. Our algorithm is tested on both an industrial test case and the academic MCNC benchmarks. Compared to a conventional P/G network, our power network synthesis achieves a 31% - 35% reduction in total P/G area while satisfying a maximum 10% IR-drop constraint.
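
To make the "nodal equations solved by Conjugate Gradient" step concrete, the sketch below solves a tiny, purely resistive power stripe. It is an illustration under stated assumptions, not the authors' flow: it uses plain nodal analysis (no capacitors, no sleep transistors), and the three-node chain, resistance, and load currents are made up.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A (list of lists)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the initial x = 0
    p = list(r)
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


# Hypothetical stripe: VDD pad -R- n1 -R- n2 -R- n3, each node sinking I amps.
VDD, R, I = 1.0, 0.1, 0.1
g = 1.0 / R
G = [[2 * g, -g, 0.0],
     [-g, 2 * g, -g],
     [0.0, -g, g]]                   # conductance matrix
rhs = [g * VDD - I, -I, -I]          # pad contribution minus load currents
voltages = conjugate_gradient(G, rhs)
ir_drop = [VDD - v for v in voltages]
```

For this chain the drops accumulate along the stripe (0.03 V, 0.05 V, 0.06 V), which is the quantity an IR drop constraint would be checked against.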

R1-22
TitleThermal-Aware Placement for Hotspot Mitigation in 3D FPGAs
Author*Juinn-Dar Huang, Ya-Shih Huang, Mi-Yu Hsu, Han-Yuan Chang (National Chiao Tung University, Taiwan)
Pagepp. 115 - 120
KeywordThree-dimensional integration, 3D FPGAs, thermal-aware placement, logic block placement
AbstractThree-dimensional (3D) integration is an attractive and promising way to realize more complicated designs, but the thermal issue is a critical challenge for 3D integrated circuits. Moreover, accurate thermal analysis is too time-consuming to be incorporated into practical placement algorithms, which generally perform numerous iterative refinement steps. Therefore, in this paper, we propose two fast thermal-aware placement methods for 3D FPGAs, Standard Deviation (SD) and MineSweeper (MS), which do not need detailed thermal analysis. Both aim to distribute power sources more evenly within a 3D FPGA to mitigate hotspots. The experimental results show that SD and MS achieve 12.1%/7.6% reduction in maximum temperature and 82%/56% improvement in temperature deviation compared to a typical thermal-unaware placement method, at the cost of only a minor increase in wirelength and delay. Moreover, MS consumes merely 4% more runtime for producing thermal-aware placement solutions.

R1-23
TitleEfficient Delay Cells for Wave Pipelined Multifunctional Unit
AuthorAtsushi Kurokawa, *Tatsuya Takaki, Masa-aki Fukase (Hirosaki University, Japan)
Pagepp. 121 - 126
Keywordwave pipeline, processor, multifunctional unit, delay cell, buffer insertion
AbstractWave pipelining requires the addition of cells and wiring in order to slow down faster paths so that their delays are close to that of the longest path. For tuning the delay, a large number of buffers are usually inserted. This results in an increased chip area. This paper focuses on the area problem due to buffer insertion and presents new delay cells that have high area efficiency and are low in cost. Estimations are made of the delay, power consumption, and area of various types of the new delay cells. It is found that cells with intermediate transistors having narrow and long channels are the best in terms of area and power consumption. Cells of the best type are applied to a multifunctional unit (MFU). Experimental results show that a circuit with the new delay cells has a smaller area than one with only standard cells.

R1-24
TitleAn Integrated Smart Current Sensing Current-Mode Buck Converter
Author*Chia-Min Chen, Kai-Hsiu Hsu, Chung-Chih Hung (National Chiao Tung University, Taiwan)
Pagepp. 127 - 130
Keywordcurrent-mode controller, current-sensing circuit, DC-DC converter, pulse-width modulation(PWM), switch-mode power converter
AbstractThis paper presents an integrated circuit implementation of a high efficiency current-mode buck converter over a wide loading current. The converter adaptively operates as Pulse-Width Modulation (PWM). An on-chip current sensing technique is employed to reduce external components and no extra I/O pins are needed for the current-mode controller. A soft-start operation is designed to eliminate the excess large current during the startup of the regulator. The DC-DC converter was fabricated in 0.35um CMOS process with 2P4M. The range of the supply voltage is from 2 to 5V, which is suitable for single-cell lithium-ion battery.

R1-25
TitleLinear Time Estimation of Full-Chip Statistical Leakage Current
Author*Katsumi Homma (Fujitsu Laboratories Ltd., Japan)
Pagepp. 131 - 134
KeywordStatistical Leakage Analysis, Process Variation
AbstractIn this paper, we propose a method for estimating the leakage current of a circuit under process parameter variations. The proposed method needs only O(N) computation time where N is the number of gates in circuit, and is faster than Monte Carlo and Wilkinson’s method. Experimental results show that the proposed method is effective in estimating statistical full-chip leakage current. Errors for 99 percentile value of full-chip leakage current are within 1%.

R1-26
TitleAn Effective Overlap Removable Objective for Analytical Placement
Author*Syota Kuwabara, Yukihide Kohira (The University of Aizu, Japan), Yasuhiro Takashima (The University of Kitakyushu, Japan)
Pagepp. 135 - 140
KeywordAnalytical placement, minimization of overlap area, overlap removable area
AbstractIn the recent LSI design, it is difficult to obtain the placement which satisfies design constraints and specifications. Analytical placement is promising to obtain the placement which satisfies design constraints and specifications. Although existing methods obtain the placement with short wire length, the obtained placement has overlap. In this paper, we propose overlap removable area as an overlap evaluation method for analytical placement. Experiments show that the proposed method is effective in order to remove overlap in analytical placement.


Invited Talk I
Time: 13:30 - 14:30 Thursday, March 8, 2012
Location: Int'l Conf. Room
Chair: Makoto Takamiya (University of Tokyo, Japan)

I1 (Time: 13:30 - 14:30)
TitleEnergy Harvesting for Self Powered Sensor Systems - Case Study: Vibration Energy Harvesting for ‘Intelligent Tire’ Application -
Author*Rob van Schaijk, Rene Elfrink, Valer Pop, Ruud Vullers (Imec / Holst Centre, Netherlands)
Pagepp. 141 - 146
AbstractWireless autonomous sensor systems are steadily becoming standard components in our environment, and they are becoming smaller, cheaper and more sophisticated. When batteries are used, system autonomy over the intended lifetime is not reached, due to size limitations and the need to recharge. The aim is to generate and store power at the micro-scale to improve autonomy and reduce size. Energy harvesters fabricated by micro-system technology can realize this goal. The choice of harvesting principle depends on the application; vibration, thermal, photovoltaic and radiofrequency power conversions are investigated at imec/Holst Centre. An overview of the latest results and remaining challenges will be given, with the focus on vibration energy harvesters. Vibration energy harvesters are of specific interest for machine environments where sinusoidal vibrations or repetitive shocks are present. In this presentation the application focus will be on tire pressure monitoring systems (TPMS) and 'intelligent tire' applications. Measuring pressure and, in the future, more vital parameters such as forces will improve safety and reduce fuel consumption. Vibration energy harvesters can provide sufficient power to accommodate autonomy in these applications. The design and characterization of piezo-electric energy harvesters will be presented together with dedicated power management solutions. The 'intelligent tire' concept will also be introduced, and system optimization of fully autonomous wireless sensor systems mounted in the tire will be discussed.


Poster II
Time: 14:30 - 16:30 Thursday, March 8, 2012
Location: Int'l Conf. Room & Mtg. Room 31
Chairs: Akihisa Yamada (Sharp, Japan), Mitsutoshi Mineshima (Jedat Inc., Japan)

R2-1
TitleA Formal Full Bus TLM Modeling for Fast and Accurate Contention Analysis
Author*Mao-Lin Li, Chen-Kang Lo, Li-Chun Chen (National Tsing Hua University, Taiwan), Hong-Jie Huang, Jen-Chieh Yeh (Industrial Technology Research Institute, Taiwan), Ren-Song Tsay (National Tsing Hua University, Taiwan)
Pagepp. 147 - 152
Keywordbus modeling, arbiter, TLM
AbstractThis paper presents an effective Cycle-count Accurate Transaction level (CCA-TLM) full bus modeling and simulation technique. Using the two-phase arbiter and master-slave models, an FSM-based Composite Master-Slave-pair and Arbiter Transaction (CMSAT) model is proposed for efficient and accurate dynamic simulations. This approach is particularly effective for bus architecture validation and contention analysis of complex Multi-Processor System-on-Chip (MPSoC) designs. The experimental results show that the proposed approach performs 23 times faster than the Cycle-Accurate (CA) bus model while maintaining 100% accurate timing information at every transaction boundary.
PDF file

R2-2
TitleA Formal Approach to Designing Arithmetic Circuits over Galois Fields Using Symbolic Computer Algebra
Author*Kazuya Saito, Naofumi Homma, Takafumi Aoki (Tohoku University, Japan)
Pagepp. 153 - 158
Keywordarithmetic circuits, formal verification, galois field, computer algebra
AbstractThis paper proposes a formal approach to designing arithmetic circuits over Galois Fields (GFs). Our method represents a GF arithmetic circuit by a hierarchical graph structure specified by variables and arithmetic formulae over GFs. The proposed circuit description is applicable to any GF(p^m) (p ≥ 2) arithmetic and is formally verified by symbolic computation techniques such as polynomial reduction using a Groebner basis. In this paper, we propose the graph representation and show some examples of its description and verification. The advantage of the proposed approach is demonstrated through experimental designs of parallel multipliers over the Galois field GF(2^m) for different word lengths and irreducible polynomials. An inversion circuit consisting of several multipliers is also designed and verified as a further application. The results show that the proposed approach has a definite possibility of verifying practical GF arithmetic circuits where conventional simulation and verification techniques failed.
PDF file

R2-3
TitleOptimal Design of Allpass Digital Filters using Artificial Bee Colony
Author*Wei-Der Chang (Department of Computer and Communication, Shu-Te University, Taiwan), Shing-Tai Pan (Department of Computer Science and Information Engineering, National University of Kaohsiung, Taiwan), Kuo-Hua Cheng, Ming-Chieh Hsu (Department of Computer and Communication, Shu-Te University, Taiwan)
Pagepp. 159 - 162
Keywordall-pass digital filter, phase response, artificial bee colony
AbstractThis paper applies a novel artificial bee colony (ABC) algorithm to the design problem of allpass digital filters. The goal is for the phase response of the allpass filter to meet the desired specification. To achieve this aim, the ABC algorithm is used to update the filter coefficients so that a given cost function is minimized as much as possible. Finally, numerical simulation results demonstrate the feasibility and effectiveness of the proposed scheme.
PDF file

R2-4
TitleA Processor Architecture for Multi-Dimensional Parity Check Code Processing
Author*Ryota Endo (Osaka University, Japan), Hiroki Ohsawa (Fuji Xerox Co., Ltd, Japan), Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 163 - 167
KeywordLow Energy, ASIP, Error Correcting Code, MDPC
AbstractMulti-Dimensional Parity Check (MDPC) code is an error correcting code which has been widely used for wireless communications in low-error-rate environments. In this study, a low-power processor for MDPC code processing is introduced and evaluated. Experimental results show that the processor achieves about 90% lower energy consumption than an implementation on a conventional RISC processor.
PDF file

R2-5
TitleApplication on the Hardware/Software Co-simulator; Implementation of Multi-stage, Multi-rate 2-D filter
Author*Yukiko Takanishi (Faculty of System Design Tokyo Metropolitan University, Japan), Yuichi Nakamura (System IP Core Research NEC Corporation, Japan), Takao Nishitani (Faculty of System Design Tokyo Metropolitan University, Japan)
Pagepp. 168 - 173
Keywordhardware/software co-simulation, FPGA, Simulink, visual debugging, FIR implementation
AbstractA functional expansion of a hardware-software co-simulator, using “Simulink” on a PC and an FPGA emulator board, is realized for the purpose of real-time HDTV signal processing. In the proposed co-simulator, the emulation is carried out by processing a set of raster-scan data, instead of the frame-based processing that suits “Simulink” blocks. In addition, a visual verification approach based on the processing delay between Simulink blocks is introduced for adjusting the connection of these blocks within the emulator.
PDF file

R2-6
TitleCheckpoint Selection for DEPS Framework Based on Quantitative Evaluation of DEPS Profile
Author*Hirotaka Kawashima, Gang Zeng, Hideki Takase, Masato Edahiro, Hiroaki Takada (Nagoya University, Japan)
Pagepp. 174 - 179
KeywordDEPS, energy optimization, DVFS, checkpoint
AbstractA dynamic energy performance scaling (DEPS) framework has been proposed as a generalization of dynamic voltage frequency scaling (DVFS). In this paper, we propose a checkpoint selection scheme for the DEPS framework. A checkpoint is a sequence of operations for switching the hardware configuration. Our scheme judges the energy efficiency of a checkpoint set using intra-task analysis information. It evaluates the DEPS profiles associated with different checkpoint sets and determines which checkpoint set is the most energy efficient. To realize this scheme, we also propose a quantitative evaluation method for DEPS profiles, which enables us to judge which DEPS profile is the most energy efficient. Experimental results confirm that our quantitative evaluation is reasonable and that our scheme can select the optimal checkpoint set in realistic time.
PDF file

R2-7
TitleModel-Based Generation of a Fast and Accurate Virtual Execution Platform for Software-Intensive Real-Time Embedded Systems
Author*Jochen Zimmermann, Martin Küster, Oliver Bringmann (FZI Karlsruhe, Germany), Wolfgang Rosenstiel (Universität Tübingen, Germany)
Pagepp. 180 - 185
KeywordEarly System Verification, SystemC, Timing Simulation, Power Simulation, Model-based Generation
AbstractThe shift towards embedded functionality increasingly realized in software and the permanently growing complexity in design and verification require new methodologies in the development process of software-intensive real-time embedded systems. Major issues related to the software and hardware architecture have to be identified as early as possible to reduce subsequent costs and to allow a short time-to-market. Therefore, system analysis and verification must be possible at every stage of the design process. In this paper, we present an approach to generate a virtual execution platform in SystemC which allows embedded software to be executed with strict consideration of the underlying hardware platform configuration. Starting from abstract UML/SysML models of software and hardware architecture and/or abstractions of legacy code, model transformation techniques are used during the generation process. In combination with source code timing annotations obtained from binary code analysis, this approach allows a fast and accurate simulation of the embedded system model. To substantiate our claim, we present experimental results from different application domains.
PDF file

R2-8
TitleModel Based Parallelization from the Simulink Models and Their Sequential C Code
Author*Takahiro Kumura (Osaka University/NEC Corporation, Japan), Yuichi Nakamura (NEC Corporation, Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan), Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 186 - 191
Keywordmodel, dataflow, pipeline, parallelization, multicore
AbstractThis paper proposes a method to generate parallel C code suited to pipeline processing from models developed in Simulink. The paper focuses on pipeline processing based on the theory of communicating sequential processes. During parallelization, the proposed method eliminates loop structures in models and builds directed acyclic graphs suited to pipeline processing. In an experiment, the proposed method reduces the execution time to 26.3% on a 4-core processor.
PDF file

R2-9
TitleSaving Power Consumption in Final Stage Adder of Multiplier By Using Difference in Arrival Times with Input Signals
Author*Yuzuru Shizuku, Takeshi Kogure, Tatsuya Fujioka, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 192 - 196
Keywordmultiplier, low power consumption, carry absorbing circuit
AbstractIn general, the Final Stage Adder (FSA) of a multiplier is implemented as a high-speed adder to shorten the delay time. However, employing such a high-speed adder without paying attention to the differences in arrival times of the input signals increases the circuit size and power consumption due to unnecessary signal transitions. In this paper, we propose a technique for saving power consumption in the FSA based on a circuit architecture that exploits the differences in arrival times of the input signals. Simulation results show that the proposed circuit reduces power consumption by 9% and power-delay product (PDP) by 12% compared with a conventional APPNA-based circuit.

R2-10s
TitleA Technique for SAT-based Test Generation through History of Reusing Solutions
Author*Kenji Ueda, Fumiyuki Hafuri, Toshiya Mukai, Tsuyoshi Iwagaki, Hideyuki Ichihara, Tomoo Inoue (Hiroshima City University, Japan)
Pagepp. 197 - 198
KeywordBoolean satisfiability, Test generation, Solution reuse, History of reusing, Instance similarity
AbstractThis paper presents a technique for test pattern generation (TPG) based on Boolean satisfiability (SAT) in a situation where a solution to a SAT instance is reused as the initial truth assignment for solving the successive instance. The efficiency of solving the successive instance depends on the order in which the variables of the previous solution are reused one by one. The proposed technique utilizes the history of reusing solutions, which records whether each variable of previous solutions was successfully reused. Experimental results show that the proposed technique using such a history is effective in reducing test generation time.
PDF file

R2-11
TitleReconfigurable Cells for Post-Mask ECO
Author*Hiroto Senzaki, Tomoki Matsuyama, Kosuke Watanabe, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 199 - 204
KeywordEngineering change order (ECO), Reconfigurable (RECON) cell, Spare cell, Incremental synthesis
AbstractIn an LSI design process, Engineering Change Orders (ECOs) are often given even after the masks have been prepared. Spare-cell rewiring is a popular technique for post-mask ECO. In contrast to conventional spare cells having only one type of logic function, a reconfigurable (RECON) cell can be configured as one of three types of functional cells such as inverter, NAND, and NOR. This paper presents two new types of RECON cells: 2T-RECON cell and 6T-RECON cell with more types of logic functions. Technology remapping using the proposed RECON cells reduces the number of cells needed to complete post-mask ECO compared with using conventional spare cells. Experimental results with benchmark circuits have shown that the RECON cell rewiring scheme completes functional ECO with about 28% fewer cells than spare-cell rewiring.

R2-12
TitleGPU Acceleration of Cycle-based Soft-Error Simulation for Reconfigurable Array Architectures
Author*Takashi Imagawa, Takahiro Oue, Hiroshi Tsutsui, Hiroyuki Ochi, Takashi Sato (Kyoto University, Japan)
Pagepp. 205 - 210
KeywordGPGPU, cycle-based simulation, soft error, coarse-grained reconfigurable array
AbstractIn this paper, we propose two methods for accelerating cycle-based soft-error simulation of coarse-grained reconfigurable arrays (CGRAs) using GPUs. Two implementation strategies, chosen according to the size of the target CGRA, are proposed considering the structural regularities of CGRAs and the memory architecture of GPUs. One of the proposed methods achieves up to 68.0 times acceleration for small-scale CGRAs, while the other achieves 15.3 times acceleration on average without limitation on the size of the CGRA.

R2-13
TitleHeterogeneous Assertion-Based Verification for Medical Devices Development
AuthorStefan Lämmermann (Universität Tübingen, Germany), Lukas Pielawa (OFFIS, Germany), *Andreas Burger (FZI Forschungszentrum für Informatik an der Universität Karlsruhe, Germany), Jan Schlemminger (OFFIS, Germany), Jürgen Ruf, Thomas Kropf (Universität Tübingen, Germany), Andreas Hein (OFFIS, Germany), Wolfgang Rosenstiel (Universität Tübingen, Germany)
Pagepp. 211 - 216
Keywordassertion based verification, heterogeneous system simulation, medical device development, formalization of medical requirements
AbstractThis paper describes the use of an assertion-based verification methodology in the early stage of medical device development. The MSAL verification process makes it possible to translate medical characteristics automatically into observer automata for monitoring the system's behaviour. The experimental results show the integration of medical characteristics as assertions into the simulation of a medical device in Matlab/Simulink and demonstrate the broad applicability and high value of the developed solution.

R2-14
TitleDegradation of Oscillation Frequency of Ring Oscillators Placed on a 90 nm FPGA
Author*Shouhei Ishii, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan)
Pagepp. 217 - 221
KeywordNBTI, FPGA, Variation, Degradation
AbstractWe focus on the degradation of FPGAs, which has become dominant due to scaling, and quantitatively estimate the degradation of FPGAs caused by NBTI. We map ring oscillators on Cyclone II FPGAs and measure the variation of oscillation frequency. As a result, the variation of oscillation frequency is 2.46%. As for degradation of FPGAs, we measure the degradation of oscillation frequency over 10,000 seconds at room temperature (28°C), 80°C and 100°C. As a result, the degradation of oscillation frequency increases as temperature increases, and a degradation of about 0.1% at 10,000 seconds is observed at high temperature.

R2-15
TitleNUMANA: A Hybrid Numerical and Analytical Thermal Simulator for 3-D ICs
AuthorYu-Min Lee, Tsung-Heng Wu (National Chiao Tung University, Taiwan), Pei-Yu Huang (Industrial Technology Research Institute, Taiwan), *Chi-Wen Pan (National Chiao Tung University, Taiwan)
Pagepp. 222 - 226
Keyword3-D IC, thermal simulation
AbstractThis paper provides a hybrid framework, NUMANA, that combines numerical and analytical simulation techniques to estimate the temperature profile of 3-D ICs. Compared with a well-known commercial tool, ANSYS, its error is within [-0.75%, 0.88%]. Furthermore, compared with a fast modified-nodal-analysis thermal solver for a thermal circuit with 40K nodes, NUMANA can accurately estimate the temperature profile of a 3-D IC with a 3212.3X efficiency improvement.

R2-16
Title2-Stage Simulated Annealing with Crossover Operator for 3D-Packing Volume Minimization
Author*Yiqiang Sheng (Tokyo Institute of Technology, Japan), Atsushi Takahashi (Osaka University, Japan), Shuichi Ueno (Tokyo Institute of Technology, Japan)
Pagepp. 227 - 232
Keyword3D packing, 2-stage simulated annealing, sequence-k-tuple representation, VLSI physical design, CAD technique
AbstractThe 3D packing for VLSI physical design faces big challenges in obtaining better solution quality with less computational time. In this paper, we propose 2-stage simulated annealing with crossover operator (2-SA-X) to solve a general rectangular 3D-packing problem by using the sequence-k-tuple representation, where k is set to 3 and 5. The basic ideas of this research are to reuse the information of past solutions by integrating the crossover operator from genetic algorithms and to improve the global search ability by using two different stages. The first stage mainly focuses on global search using moving methods with big changes, including the crossover, while the second stage focuses on local search using moving methods with small changes. Based on the experiment using the ami98_3D benchmark, the computational performance of 3D packing is considerably improved. The paper shows how much the 3D-packing ratio of volume and the computational time can be improved by using the proposed 2-SA-X algorithm, compared with normal 2-stage simulated annealing (2-SA) without the crossover operator.
PDF file

R2-17
TitleThermal Analysis for 3-D ICs Considering Interconnect Power Estimation
Author*Chi-Wen Pan, Ying-Hsiang Liu, Yu-Min Lee (National Chiao Tung University, Taiwan), Pei-Yu Huang (Industrial Technology Research Institute, Taiwan), Chi-Ping Yang (National Chiao Tung University, Taiwan)
Pagepp. 233 - 238
Keyword3D, IC, Thermal, Interconnect, Power
AbstractThis work presents a 3-D IC thermal simulator based on table look-up techniques, which considers the thermal effect of interconnect power. The key advantage is that the proposed simulator can quickly analyze and incrementally update the temperature profile for thermal-aware physical design procedures. When a portion of the power is assigned to the interconnect, the maximum temperature difference is about 2.4% compared with a thermal simulation in which the total power is treated as gate power.

R2-18s
TitleNet-based Move in SA-based Placement for a Switch-Block-Free Reconfigurable Device
Author*Masato Inagi, Masatoshi Nakamura, Tetsuo Hironaka (Hiroshima City University, Japan), Takashi Ishiguro (Taiyo Yuden Co., Ltd., Japan)
Pagepp. 239 - 240
Keywordplacement, MPLD, FPGA, move
AbstractIn this paper, we propose an enhanced SA-based placement algorithm for a switch-block-free reconfigurable architecture, introducing a move function that shifts all the logic cells which belong to a randomly selected net in the same direction. Although neighbor solutions generated by the move function are similar to the current solution, iteration of straightforward move functions that move a single logic cell at a time rarely generates such a solution. The move function improves reachability between good solutions, and thus improves the quality of the final solution.
PDF file

R2-19
TitleA Nonlinear Optimization Methodology for Resistor Matching in Analog Integrated Circuits
Author*Sheng-Jhih Jiang, Tsung-Yi Ho (National Cheng Kung University, Taiwan)
Pagepp. 241 - 246
KeywordAnalog CAD, Layout, Resistor Matching
AbstractIn the analog design flow, one of the most important issues is to achieve accurate resistor ratios during the layout phase, which is called resistor matching. In the literature, researchers have proposed several methodologies achieving high matching quality in a rectangular structure. However, under the fixed-outline constraint, layout designers place normal blocks such as macros and intellectual properties (IPs) first and then place the resistors. The remaining space for resistors is then usually rectilinear rather than rectangular, which is not appropriate for achieving high matching quality. To overcome this problem, we propose a nonlinear optimization methodology for globally improving the matching quality. Our algorithm enhances the matching quality by deforming the rectilinear shape into a centrosymmetrical shape while simultaneously minimizing the perturbation of the pre-placed normal blocks. Experimental results show that the proposed algorithm is very promising.

R2-20
TitlePrecise Expression of nm CMOS Variability with Variance/Covariance Statistics on Ids(Vgs)
Author*Koutaro Hachiya (Jedat, Inc., Japan), Hiroo Masuda (ChiHiro Consultant, Japan), Atsushi Okamoto (Fujitsu Semiconductor Ltd., Japan), Masatoshi Abe, Takeshi Mizoguchi (Toshiba I.S. Corp., Japan), Goichi Yokomizo (STARC, Japan)
Pagepp. 247 - 252
Keywordmodel parameter extraction, statistical MOSFET model
AbstractWe have measured the drain current (Ids) variation of various sized MOS transistors under different gate bias conditions (Vgs). Both the variation sigma (standard deviation) of Ids and the correlations among Ids, Vth and Gm are considered to play an important role in the accurate expression of device/circuit performance variations. This paper provides an accurate model parameter extraction method considering the size and bias dependence of the sigma and the correlations.
PDF file

R2-21
TitleA Transistor-level Symmetrical Layout Generation for Analog Device
Author*Bo Yang, Qing Dong, Jing Li, Shigetoshi Nakatake (The University of Kitakyushu, Japan)
Pagepp. 253 - 257
Keywordanalog layout, symmetrical placement, symmetrical routing, diffusion sharing
AbstractThis paper introduces a transistor-level symmetrical layout generation algorithm aiming at maximum diffusion merging along the current paths of analog circuits. We present an SA-based algorithm that symmetrically assigns the transistor pairs into two rows while minimizing the total wirelength and diffusion gaps. Two examples are used to demonstrate the effectiveness of our algorithm.
PDF file

R2-22
TitleLDPC Coded MIMO Communication System With Relay Selection
Author*Nanfan Qiu, Xiao Peng, Yichao Lu, Satoshi Goto (Waseda University, Japan)
Pagepp. 258 - 261
KeywordLDPC, MIMO, Relay
AbstractThis paper presents a low-density parity check (LDPC) coded multiple-input multiple-output (MIMO) cooperative communication system with a relay selection strategy. In the proposed cooperative network with multiple potential relays, we present selection cooperation to choose the best relay. We also present the outage probability analysis of this system. Furthermore, in the proposed architecture, relays first perform sphere detection and then send extrinsic messages to the terminal node by using space-time block codes. With this architecture, the terminal node only needs to perform LDPC decoding, so the power consumption of the terminal node can be reduced.
PDF file

R2-23
TitleSubkey Driven Power Analysis Attack in Frequency Domain against Cryptographic LSIs
Author*Ryusuke Satoh, Daisuke Matsushima, Masaya Yoshikawa (Meijo University, Japan)
Pagepp. 262 - 267
KeywordSide-channel attacks, Power analysis, CPA, Frequency domain, AES
AbstractFor cryptographic LSIs implemented on IC cards, it is important to secure resistance against power analysis attacks. This study proposes a new power analysis attack method that can be used to improve the efficiency of the resistance evaluation of cryptographic LSIs. Compared with resistance evaluation that uses typical attack methods, the proposed method greatly reduces the amount of computation required for resistance evaluation while maintaining the attack accuracy. In this study, the validity of the proposed method is verified through evaluation experiments performed on a cryptographic circuit implemented on an FPGA.

R2-24
TitleRealtime Mixed Reality Representation with a Virtual Light Source based on a Mobile 3D Acquisition
Author*Yoji Watatani (Graduate School of Engineering, Kansai University, Japan), Yoshihiro Yasumuro, Hiroshige Dan, Masahiko Fuyuki (Faculty of Environmental and Urban Engineering, Kansai University, Japan)
Pagepp. 268 - 271
Keywordmixed reality, calibration, TOF camera, superposition, real time
AbstractMixed reality (MR) has gathered attention recently as an effective technique for overlaying computer-generated virtual objects on physical scenes. Using MR, this research proposes a realtime imaging system to produce visual illumination effects on physical objects with a virtual light source. The proposed system models the shapes and the color information of a real scene through a realtime process. The illumination effects of the virtual light source on the scene model are superposed on an actual video image to create an MR representation. Experimental results show virtual light-up effects on the physical shapes and colors of real objects under non-existent lighting configurations.
PDF file

R2-25
TitleA Full Dynamically Reconfigurable Vision-chip System Including a Lens-array
Author*Yuki Kamikubo, Minoru Watanabe, Shoji Kawahito (Shizuoka University, Japan)
Pagepp. 272 - 277
KeywordVision Chips, FPGA, ORGA, Image sensor
AbstractRecently, for use in autonomous vehicles and robots, demand has been increasing for high-speed image recognition that is superior to that of the human eye. However, to recognize many images quickly with such systems, many template images must be read out dynamically from memory and then sent to a processor quickly. Realizing such high-speed real-time image recognition is difficult because of the transfer-speed bottleneck between the memory and the processor. Therefore, to relieve this bottleneck, this paper experimentally presents a full dynamically reconfigurable vision-chip system including a lens-array.
PDF file

R2-26
TitleImproved Region-Growing Image-Segmentation Algorithm Using Dynamic Connection Weight Calculation Based on Mean Value of Exited Pixels
Author*Naotaka Kawakami, Ryosuke Kimura, Tatsuya Sugahara, Tetsushi Koide, Hans Jürgen Mattausch (Hiroshima University, Japan)
Pagepp. 278 - 283
KeywordImage segmentation, Dynamic connection weight
AbstractThis paper presents an image segmentation algorithm which uses a connection weight-based region-growing algorithm. By using a dynamic method which adjusts the connection weights to neighboring pixels based on mean values of exited pixels during the growth process, we enable segmentation of an image object with indistinct boundaries. In this paper, we introduce the proposed region-growing algorithm and present the evaluation results obtained with MATLAB.

R2-27
TitleAn Accurate Pedestrian Detection Utilizing Feature of Partitioned Image by Color
AuthorMasashi Ide, *Masataka Takahashi, Yoshiya Sugita, Masahiro Fukui (Ritsumeikan University, Japan)
Pagepp. 284 - 289
KeywordIntelligent transportation systems, pedestrian detection, outline tracing
AbstractOur approach aims at spreading pedestrian recognition techniques not only to luxury cars but also to general vehicles. Therefore, we propose an algorithm not for a stereo camera but for a low-cost monocular camera. It traces the outline of the target by a multiplex approach, in grayscale and in several partitioned domains of the color hue. The approach introduces several new techniques: a method for efficient enlargement of the extracted edge image, a characterization method that analyzes the angle histogram of outline tracing, and a composition method for multiplex ROIs. Among the candidate outlines of the body or the clothing, the most likely ones are extracted. Highly precise recognition results were obtained.

R2-28
TitleA Fast and Accurate Algorithm for Traffic Sign Recognition
AuthorYoshiya Sugita, *Yuuki Tomisawa, Masashi Ide, Masahiro Fukui (Ritsumeikan University, Japan)
Pagepp. 290 - 295
Keywordtraffic sign recognition, intelligent transportation system
AbstractImage recognition technology plays an important role for drivers. In particular, traffic sign recognition is a useful technology that has been studied by many researchers. In this paper, traffic signs are recognized using two features: the internal area of a traffic sign and the number of such areas. Although the TRR (Total Recognition Rate) is not very high because of misses in the detection step, the recognition step achieves a recognition rate of 94%, which demonstrates high accuracy in the recognition step. Moreover, the method computes six times as fast as template matching, which is a representative method in traffic sign recognition.


Panel Discussion
Time: 16:30 - 18:00 Thursday, March 8, 2012
Location: Int'l Conf. Room
Moderator: Shinji Kimura (Waseda University, Japan)

D (Time: 16:30 - 18:00)
TitleChallenges for Future System Design and Verification
AuthorOrganizer/Moderator: Shinji Kimura (Waseda University, Japan), Panelists: Subhasish Mitra (Stanford University, U.S.A.), Rob van Schaijk (Imec / Holst Centre, Netherlands), Jason Cong (UCLA, U.S.A.), Sungjoo Yoo (POSTECH, Republic of Korea), Takahide Yoshikawa (Fujitsu Laboratories Ltd., Japan)
Pagep. 296
AbstractProcess shrinking does not stop, and tons of transistors can be integrated in one chip. We also have 3D-ICs for integrating, for example, a memory chip on a CPU chip, and novel non-volatile memory technologies seem to be available in the near future. We can use such huge and various resources to implement highly parallel information systems. However, design, synthesis, and verification become complicated because of the system complexity and the unreliable behavior of devices, such as process variation and single event upsets. The power issue is also very important in system design. After the March 11, 2011 earthquake and the related nuclear plant problem in Japan, energy supply has received attention from the point of view of sustainable life, and the power consumption of information systems is discussed seriously, since information systems have become the basis of social activities and such systems cannot be stopped. Based on those observations, we would like to discuss the problems and solutions in future system design and verification in the panel. Five panelists gathered from various areas will clarify images of promising future systems, problems and prospective solutions on electronic design automation of parallel systems, reliability issues, power harvesting issues, memory issues, and massively parallel system issues, etc.
PDF file



Friday, March 9, 2012

Keynote Speech II
Time: 9:00 - 10:00 Friday, March 9, 2012
Location: Int'l Conf. Room
Chair: Masahiro Numa (Kobe University, Japan)

K2 (Time: 9:00 - 10:00)
TitleParallelization, Customization and Automation
Author*Jason Cong (UCLA, U.S.A.)
Pagepp. 297 - 299
AbstractIn order to meet ever-increasing computing needs and overcome power density limitations, the computing industry has halted simple processor frequency scaling and entered the era of parallelization, with tens to hundreds of computing cores integrated in a single processor, and hundreds to thousands of computing servers connected in a warehouse-scale data center. However, such highly parallel, general-purpose computing systems still face serious challenges in terms of performance, power, heat dissipation, space, and cost. We believe that we need to look beyond parallelization and focus on domain-specific customization, which provides the capability of adapting the architecture to the application in order to achieve significant power-performance efficiency improvement. This paradigm shift requires a great deal of innovation in architecture, compilation, and runtime system design, and offers many exciting and challenging research opportunities. I shall discuss the research progress in this direction and its implications for the EDA industry.
PDF file


Poster III
Time: 10:00 - 11:45 Friday, March 9, 2012
Location: Int'l Conf. Room & Mtg. Room 31
Chairs: Qiang Zhu (Cadence Design Systems, Japan), Kyungsoo Lee (Kyoto University, Japan)

R3-1
TitleReplacement of Flip-Flops by Latches and Pulsed Latches for Power and Timing Optimization
AuthorYao-Ting Wu, *Rung-Bin Lin (Yuan Ze University, Taiwan)
Pagepp. 300 - 304
Keywordlow-power, timing optimization, latch, pulsed latch, clock tree
AbstractThis paper presents a simple (pulsed) latch replacement method that does not require clock tree re-synthesis while still excelling at maintaining clock skew. It can improve timing performance by 14% and save clock tree power by 10% for already routed circuits with very tight timings. For circuits with looser timings, the performance improvement is 6% to 8% and the power saving is up to 24%. We also find that a longer duty cycle has a large negative impact on the percentage of flip-flops replaced with latches.

R3-2
TitleA Routability-oriented Packing Method for FPGA with Fracturable Logic Elements
AuthorWei Chen (Waseda University, Japan), Yuichi Nakamura (NEC Corporation, Japan), *Nan Liu, Takeshi Yoshimura (Waseda University, Japan)
Pagepp. 305 - 310
KeywordFPGA, Packing, Fracturable BLE, ALM
AbstractThe fracturable basic logic element (BLE) is widely applied in modern FPGAs to increase the logic utilization rate, helping to reduce the area of FPGAs. In this paper, we propose a novel packing method for FPGAs with Adaptive Logic Modules (ALMs), a kind of fracturable BLE manufactured by Altera. Our method can pack the LUTs and registers into ALMs as compactly as possible to reduce area and meanwhile improve routability of the result. Our method is based on a max-weight matching algorithm, and the weight is decided with regard to area and routability. Experimental results show that by using fracturable BLE instead of traditional BLE, our method can reduce area by 37% and improve routability of the design by 15%.

R3-3
TitleA Two-Step BIST Scheme for Operational Amplifier
Author*Jun Yuan, Masayoshi Tachibana (Kochi University of Technology, Japan)
Pagepp. 311 - 316
KeywordBuilt-in Self-Test, Operational Amplifier, Compensation Capacitor, Current-based
AbstractThis paper presents a two-step Built-in Self-Test (BIST) scheme and its implementation for Operational Amplifier (Opamp). In addition to the catastrophic faults, the proposed technique can particularly detect the capacitance variation in the compensation capacitor by combining the current-based test with the offset-based test to detect the physical defects in the Opamp. The circuit-level simulation results of the proposed BIST system are presented to demonstrate the feasibility of the proposed BIST scheme with high fault coverage of 98%.
PDF file

R3-4s
TitleCircuit Partitioning Methods for FPGA-based ASIC Emulator using High-speed Serial Wires
Author*Katsunori Takahashi, Motoki Amagasaki, Morihiro Kuga, Masahiro Iida, Toshinori Sueyoshi (Kumamoto University, Japan)
Pagepp. 317 - 318
Keywordemulator, serial communication, virtual wire, FPGA
AbstractWe are studying an FPGA-based ASIC emulator using high-speed serial communication. In this emulator, there are restrictions on the placement of the FFs on the FPGA, and we have to reduce replicated logic gates and replicated input terminals when partitioning the circuit to FPGAs. Compared with hMETIS, the proposed technique for suppressing the duplication of external inputs achieved an average reduction of 56.4%, and the technique for suppressing the duplication of nodes achieved an average reduction of 71.8%.
PDF file

R3-5
TitleTiming-aware Description Methods and Gate-level Simulation of Single Flux Quantum Logic Circuits
Author*Nobutaka Kito, Kazuyoshi Takagi, Naofumi Takagi (Kyoto University, Japan)
Pagepp. 319 - 324
KeywordSFQ circuit, timing, logic simulation
AbstractSingle-flux-quantum (SFQ) circuits are high-speed and low-power circuits using superconductive devices. In SFQ circuits, signal skew is not negligible and basic gates are clocked because SFQ circuits are fast and use pulse logic. Thus, we need to be aware of timing issues when designing SFQ circuits. We propose two timing-aware description methods for SFQ circuits. One method is a circuit schematic with a note about the order of pulse arrival. The other method is a timing-aware circuit description language. As an example application, we show a logic simulation algorithm.
PDF file

R3-6
TitleDesign and Analysis of Via-Configurable Routing Fabrics for Structured ASICs
AuthorHsin-Pei Tsai, *Rung-Bin Lin, Liang-Chi Lai (Yuan Ze University, Taiwan)
Pagepp. 325 - 329
KeywordStructured ASIC, Regular routing fabric, Via configurable, Routing resource, Router
AbstractThis paper presents a simple method for design and analysis of a via-configurable routing fabric formed by an array of routing fabric blocks (RFBs). The method simply probes into an RFB rather than resorting to full-chip routing to collect some statistics for a metric used to qualify the RFB. We find that the trade-off between wire length and via count is a good metric. This metric has been validated by full-chip routing and used successfully to create better routing fabrics.

R3-7
TitleDevice-level Simulations of Parasitic Bipolar Mechanism on Preventing MCUs of Redundant Flip-Flops
Author*Kuiyuan Zhang, Ryosuke Yamamoto, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan)
Pagepp. 330 - 333
KeywordSoft error, Parasitic bipolar mechanism, Multiple Cell Upset (MCU), Device simulation, Flip Flop
AbstractThe parasitic bipolar mechanism can effectively prevent MCUs in redundant flip-flops, which improves the tolerance of soft errors. Device-level simulations reveal that, owing to the parasitic bipolar effect, no MCU occurs in redundant latches storing opposite values, while an MCU occurs upon a high-energy particle hit in redundant latches storing the same value.

R3-8
TitleA Method of Analog IC Placement with Common Centroid Constraints
Author*Keitaro Ue, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan)
Pagepp. 334 - 339
Keywordcommon centroid, sequence-pair, analog IC, placement
AbstractTo improve the immunity against process gradients, a common centroid constraint, in which every pair of capacitors which has been derived by dividing some original capacitors into two halves should be placed symmetrically with respect to a common centroid, is widely used. Xiao et al. proposed a method to obtain a placement satisfying the common centroid constraints, but this method has a defect. In this paper, we propose a method to obtain a placement which satisfies common centroid constraints.
PDF file

R3-9
TitleGPU-based Line Probing Techniques for Mikami Routing Algorithm
Author*Chiu-Yi Chan (Department of Computer Science and Engineering, Yuan Ze University, Taiwan), Jiun-Li Lin (Institute of Computer Science and Information Engineering, National Cheng Kung University, Taiwan), Lung-Sheng Chien (Department of Mathematics, National Tsing Hua University, Taiwan), Tsung-Yi Ho (Institute of Computer Science and Information Engineering, National Cheng Kung University, Taiwan), Yi-Yu Liu (Department of Computer Science and Engineering, Yuan Ze University, Taiwan)
Pagepp. 340 - 344
KeywordRouting, GPU, CUDA, Mikami router
AbstractThe graphics processing unit (GPU), which contains hundreds of processing cores, is becoming a popular device for high-performance computation in the multi-core era. Because GPUs demand strictly regular computation, designing algorithms specifically for them is the key challenge in achieving speed-up. In this paper, we propose a parallel CUDA-Mikami routing algorithm on NVIDIA's GPU. A 32-bit routing grid encoding is proposed to simplify wire intersection identification and wire direction recognition. Furthermore, thread-level and warp-level line probing techniques are proposed for vertical and horizontal routings, respectively. The experimental results indicate that the run-time efficiency is promising as compared to traditional CPU-version algorithms.
PDF file

R3-10
TitleTopology Design for Power Delivery in 3-D Integrated Circuits
Author*Shu-Han Wei, Yi-Hsuan Lee (Department of Electrical Engineering, National Chiao Tung University, Taiwan), Chih-Ting Sun, Yu-Min Lee (Department of Communication Engineering, National Chiao Tung University, Taiwan), Liang-Chia Cheng (Industrial Technology Research Institute, Taiwan)
Pagepp. 345 - 350
Keyword3D Power Delivery Network, Topology Optimization, 3D ICs, Power Grid, Through Silicon Via
AbstractThe three-dimensional integrated circuit (3D IC) technology has been viewed as an effective method to improve chip performance by overcoming the bottleneck of long global interconnection. However, the design of a powerful 3D power delivery network (3D-PDN) becomes a serious challenge for 3D ICs. This work develops an efficient method to optimize the topology of a 3D-PDN. A 3D-PDN topology design considers the 2D power grid design and through-silicon via placement. The proposed approach includes three main parts: (1) Initial 3D-PDN Topology for early estimation of the PG source and TSV source based on a compact circuit model of the 3D-PDN; (2) Fast 3D-PDN IR Drop Analysis for identifying the correctness of the 3D-PDN topology; (3) 3D-PDN Topology Modification for refining the performance of the initial 3D-PDN topology. The experimental results demonstrate the effectiveness of the proposed 3D-PDN topology design method.

R3-11
TitleA Spur-Reduction Frequency Synthesizer For Wireless Application
Author*Te-Wen Liao, Jun-Ren Su, Chung-Chih Hung (Department of Electrical Engineering, National Chiao Tung University, Taiwan)
Pagepp. 351 - 354
KeywordPLL, VCO, Synthesizer
AbstractIn this paper, we present a low-spur phase locked loop (PLL) system for wireless applications. The low-spur frequency synthesizer randomizes the periodic ripples on the control voltage of the voltage-controlled oscillator (VCO) in order to reduce the reference spur at the output of the PLL. A new random clock generator is presented to perform a random selection of phase frequency detector (PFD) control for the charge pump at locked state. The proposed frequency synthesizer was fabricated in a TSMC 0.18-µm CMOS process. The PLL has achieved a phase noise of -93dBc/Hz at 600 kHz offset frequency and reference spurs below -72dBc.
PDF file

R3-12
TitleDefinite Feature of Low-Energy Operation of Scaled Cross-Current Tetrode (XCT) SOI CMOS Circuits
Author*Yasuhisa Omura, Daishi Ino (Kansai University, Japan)
Pagepp. 355 - 360
KeywordSOI, CMOS, Low energy, XCT
AbstractThis paper describes an advanced aspect of cross-current tetrode (XCT) CMOS devices and demonstrates the outstanding low-energy characteristics of XCT-SOI CMOS by analyzing device operations. It is expected that this feature will be very useful to many medical implant applications.
PDF file

R3-13
TitleA Matching Method for Look-ahead Assertion on Pattern Independent Regular Expression Matching Engine
Author*Yoichi Wakaba, Shinobu Nagayama, Masato Inagi, Shin'ichi Wakabayashi (Hiroshima City University, Japan)
Pagepp. 361 - 366
KeywordFPGA, NIDS, Regular expression matching
AbstractIn this paper, we propose a matching method for look-ahead assertion on our pattern independent regular expression matching engine. Our pattern independent engine is suitable for network intrusion detection systems (NIDSs), which require quick updating of patterns. Look-ahead assertion is often used to describe patterns in NIDSs. However, as far as we know, no pattern independent matching engine that can handle look-ahead assertion has been proposed. In the proposed matching method, we introduce a preprocessing circuit into a matching engine. It performs matching for look-ahead assertion by searching from the end of a text to the beginning of the text. We also discuss the throughput of the proposed engine.
PDF file
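
For readers unfamiliar with the term used in R3-13, the following is a minimal software sketch of what a look-ahead assertion does, using Python's standard `re` module. It is not the hardware engine of the paper; the pattern and the function name `find_lookahead_matches` are assumptions chosen only for illustration.

```python
import re

# Illustrative pattern (an assumption, not taken from the paper):
# match a word only if it is immediately followed by ".exe".
# The look-ahead "(?=\.exe)" checks the following text without consuming it,
# so ".exe" is not part of the reported match.
PATTERN = re.compile(r"\w+(?=\.exe)")


def find_lookahead_matches(text):
    """Return every word in `text` that is directly followed by '.exe'."""
    return [m.group(0) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    print(find_lookahead_matches("run cmd.exe now"))
    print(find_lookahead_matches("notes.txt"))
```

The match succeeds or fails depending on text that lies after the matched span, which is why the abstract describes a preprocessing pass that scans the text from its end toward its beginning.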

R3-14
TitleHighly-parallel AES Processing for Five Confidentiality Modes with Massive-Parallel SIMD Matrix Processor
Author*Hiroki Yoshikawa, Takeshi Kumaki, Takeshi Fujino (Ritsumeikan University, Japan)
Pagepp. 367 - 371
KeywordAES, SIMD, matrix-processing architecture, cipher mode, parallel processing
AbstractThis paper presents a highly parallel AES implementation of five confidentiality modes on a Massive-Parallel SIMD Matrix processor (MX-1). MX-1 has 1,024 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 1,024-way bit-serial and word-parallel operations in a single command. A method of parallel ECB processing with MX-1 has been reported previously. This research implements the other AES cipher modes to expand the capability of MX-1 and to ensure the confidentiality of AES processing.

R3-15
TitleA Trace-Back Method with Source States and its Application to Viterbi Decoders of Low Power and Short Latency
Author*Kazuhito Ito (Saitama University, Japan)
Pagepp. 372 - 377
KeywordViterbi algorithm, Convolutional code, Source state, Low power
AbstractThe Viterbi algorithm is widely used for decoding of the convolutional codes. To find the survivor path, the trace-back method is often employed because it consumes less power than the register exchange method especially for convolutional codes with many states. The disadvantage of the conventional trace-back using decision bits is the long decode latency. In this paper, a method of trace-back with source states instead of decision bits is proposed which reduces the number of memory accesses. The dedicated memory is also presented which supports the proposed trace-back method. The reduced memory accesses result in smaller power consumption and shorter decode latency than the conventional method.
PDF file

R3-16
TitleEvaluation of Migration Methods for Island Based Parallel Genetic Algorithm on CUDA
Author*Yuri Ardila, Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 378 - 383
KeywordEvolutionary Algorithm, Optimization, CUDA, GPU
AbstractIn the EDA research community, various optimization problems have been studied so far. One of the successful metaheuristics for EDA is the Genetic Algorithm (GA). To speed up many optimization methods based on GA, parallel GA implementations using GPUs have been proposed. This paper proposes new migration methods for parallel island based GAs, namely Roulette Wheel Migration (RWM), Developed City Migration (DCM), and Developed City Migration-α (DCM-α), and compares these methods to an existing method, Unidirectional Ring Migration. We implement our parallel GA on CUDA's newest architecture, the Fermi architecture. The implemented parallel island based GA with the proposed migration methods is tested using a Travelling Salesman Problem benchmark. Our experimental results show that two of our proposed migration methods, RWM and DCM-α, are better than the existing method from the viewpoint of execution speed and solution quality.

R3-17
TitleFPGA Design of User Monitoring System for Display Power Control
Author*Tomoaki Ando, Vasily Moshnyaga (Fukuoka University, Japan)
Pagepp. 384 - 389
Keywordlow-power, FPGA, design, eye-tracking
AbstractThis paper describes the FPGA design of a user-monitoring system for power management of a PC display. From the camera readings, the system detects whether the user looks at the screen or not and produces signals to control the display backlight. The system provides over 88% eye detection accuracy at an image processing rate of 8 frames/s. We describe the hardware and present the results of its experimental evaluation.

R3-18
TitleA Debug Solution with Synchronizer for CDC
Author*Akitoshi Matsuda (Kyushu University, Japan), Shinichi Baba (Kyushu Embedded Forum, Japan)
Pagepp. 390 - 393
Keywordclock domain crossing, low power, synchronizer
AbstractIt is important to address both the high-performance and low-power requirements in system LSI designs. A CDC (clock domain crossing) verification solution needs to be deployed to efficiently detect and debug the causes of CDC issues as well as to analyze the design for low-power issues. Even if some synchronizers are added to solve CDC issues, we have to check the resulting power consumption. Through several case studies, this paper shows that power consumption decreased by several percent when synchronizers were used.

R3-19s
TitleA Low Power-Delay Product Processor Using Multi-valued Decision Diagram Machine
Author*Hiroki Nakahara (Kagoshima University, Japan), Tsutomu Sasao, Munehiro Matsuura (Kyushu Institute of Technology, Japan)
Pagepp. 394 - 395
KeywordBDD, MDD, Processor, MPU, Low Power
AbstractA heterogeneous multi-valued decision diagram of encoded characteristic function for non-zero outputs (HMDD for ECFN) represents a multi-output logic function efficiently. As for the speed, the HMDD for ECFN machine is 3.02 times faster than the Core i5 processor, and is 12.50 times faster than the Nios II processor. As for the power-delay product, it is 32.72 times lower than the Core i5 processor, and is 57.92 times lower than the Nios II processor.
PDF file

R3-20
TitleA TMR-based Soft Error Mitigation Technique With Less Area Overhead in High-Level Synthesis
AuthorDaiki Tsuruta, *Masayuki Wakizaka, Yuko Hara-Azumi, Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 396 - 401
KeywordHigh-Level Synthesis, Fault Tolerant, TMR
AbstractIt is very important to consider soft errors in LSI designs. Although TMR (Triple Modular Redundancy) is an effective way of preventing soft errors, it increases the mounting area in the data path. In this paper, we propose a technique that can decrease the mounting area in the data path for generating soft-error tolerant LSIs in high-level synthesis. Through experiments, we demonstrate that our method achieves high reliability at little area overhead compared with a traditional TMR-based method.
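
As background for R3-20, the sketch below models traditional TMR in software: an operation is executed three times and a bitwise 2-of-3 majority vote masks an error in any single copy. It illustrates only the baseline scheme, not the area-reducing technique proposed in the paper; the names `majority_vote`, `tmr_execute` and the `fault_mask` parameter are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three redundant integer results."""
    return (a & b) | (b & c) | (a & c)


def tmr_execute(op, x, y, fault_mask=0):
    """Run `op` three times and vote on the results.

    `fault_mask` is a hypothetical knob that flips bits in the first copy
    to imitate a soft error in one of the three redundant modules.
    """
    r0 = op(x, y) ^ fault_mask  # copy that may be corrupted
    r1 = op(x, y)
    r2 = op(x, y)
    return majority_vote(r0, r1, r2)


if __name__ == "__main__":
    add = lambda p, q: p + q
    # One corrupted copy is outvoted by the two correct ones.
    print(tmr_execute(add, 3, 4, fault_mask=0b100))
```

Tripling every operation is what drives the area cost mentioned in the abstract.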

R3-21
TitlePipeline Circuit Synthesis from C Descriptions for Fast Memory Access in System LSI
Author*Yu-ichi Kitamura (Kinki University, Japan), Kazuya Kishida (Panasonic Industrial Devices S&T, Japan), Takashi Kambe (Kinki University, Japan)
Pagepp. 402 - 407
Keywordmemory access, C based design, behavior synthesis, pipelining, register
AbstractHigh level design methodologies are becoming more and more important in the design of large system LSI devices. As a result, behavioral synthesis from C and other high level languages is key to achieving the productivity demanded by such large designs. For memory intensive applications in particular, the automatic identification, optimization and synthesis of memory access operations is essential. This paper describes a method for automatically generating behavioral descriptions for memory access pipeline circuits. Combined with registerization, the approach can accelerate Memory Accesses (MA) irrespective of the degree of data reuse. The method is applied to well-known algorithms used in applications such as speech recognition, JPEG encoding and particle tracking technology, and its effectiveness is evaluated.

R3-22
TitleA PE-based Pipelining and Assignment Algorithm for Coarse Grained Dynamic Reconfigurable Circuits
Author*Nobuyuki Araki, Takashi Kambe (Kinki University, Japan)
Pagepp. 408 - 413
KeywordReconfigurable Computing, pipelining, PE assignment, C level language, configuration synthesis
AbstractReconfigurable Computing (RC) has been proposed as a new paradigm to address the conflicting design requirements of high performance and area efficiency. Coarse-grained architecture RC (CGA-RC) operates at the word level of granularity and exhibits better power and performance features than fine-grained architectures. However, in a CGA-RC system, the processing elements (PE) implement several types of multiple arithmetic operations and the routing between them has a fixed architecture. It is difficult for these systems to achieve both good performance and high PE utilization automatically for all applications. To cope with this issue, we propose a PE-based automatic loop pipelining algorithm to accelerate loop processing and a simultaneous PE assignment and routing algorithm to improve the PE utilization ratio in CGA-RC. In this paper, we investigate and evaluate these algorithms.

R3-23
TitleHigh-Level Synthesis Using Partially-Programmable Resources for Yield Improvement
Author*Yuko Hara-Azumi (University of California, Irvine, U.S.A.), Hiroyuki Tomiyama, Shigeru Yamashita (Ritsumeikan University, Japan), Nikil D. Dutt (University of California, Irvine, U.S.A.)
Pagepp. 414 - 419
KeywordHigh-level synthesis, Partially-programmable circuits, Yield improvement, Resource binding
AbstractThis paper proposes a novel binding technique in high-level synthesis (HLS) for yield improvement by using resources realized by Partially-Programmable Circuits (PPCs). A PPC, which has been recently developed, is very unique in that it can improve yield by reconfiguring its internal functionality depending on the faults detected after fabrication. We aim at further improving the yield by utilizing the PPC-realized resources. Our work performs resource binding in HLS considering the reconfigurations of the PPC-realized resources after fabrication, i.e., maximizing the yield expectation. Our work is formulated as an ILP problem. Several case studies demonstrate the effectiveness of our work.

R3-24
TitleA Method of Power Supply Voltage Assignment and Scheduling of Operations to Reduce Energy Consumption of Error Detectable Computations
Author*Yuki Suda, Kazuhito Ito (Saitama University, Japan)
Pagepp. 420 - 424
Keywordsupply voltage assignment, scheduling, low power, error detection, dependability
AbstractAs the VLSI technology evolves, VLSI circuits are becoming more vulnerable to noises such as the crosstalk, the power supply fluctuation, and single event upsets (SEU). To detect an error caused by the SEU in functional units, operations are executed twice and the results are compared to check if those are identical or not. Such doubly executing operations and the comparison may require large energy consumption. In this paper a method of the power supply voltage assignment and the scheduling of operations is proposed to reduce the energy consumption of the error detectable circuits.
PDF file

R3-25
TitleSoftware Design Methodology based on Energy Consumption Model Considering Relationship between Software and Hardware
Author*Koji Kurihara, Hiromasa Yamauchi, Toshiya Otomo, Takahisa Suzuki (Fujitsu Laboratories Ltd., Japan), Yuta Teranishi (Fujitsu Kyushu Network Technologies Limited, Japan), Koichiro Yamashita (Fujitsu Laboratories Ltd., Japan)
Pagepp. 425 - 430
Keywordmulti-core, energy consumption, model
AbstractIn an SoC for industrial systems, there are cases where we have to optimize energy consumption and performance with existing software and hardware. However, it is difficult to achieve this without an evaluation methodology that considers the relationship between software and hardware. Therefore, we propose an evaluation methodology based on an energy consumption model that considers this relationship. We verified the accuracy of our methodology by comparing it to an experimental result.
PDF file

R3-26
TitleElectro-Thermal Modeling and Reliability Simulation of Power MOSFETs with SystemC-AMS - Case Study: An Unclamped Inductive Switching Test Circuit
Author*Keiji Nakabayashi, Takahiro Ozasa (Keirex Technology Inc., Japan), Tamiyo Nakabayashi (Nara Women's University, Japan)
Pagepp. 431 - 436
KeywordSystemC-AMS, Power MOSFET, Electro-Thermal Simulation, Device Modeling, Unclamped Inductive Switching test circuit
AbstractWe present a new technique for the electro-thermal modeling and reliability simulation of power MOSFETs with SystemC-AMS. We model the non-linear electrical characteristics and self-heating effect of the power MOSFETs, and improve a numerical integration method in order to solve numerical instability of SystemC-AMS. Our technique is verified by experimental results using an Unclamped Inductive Switching (UIS) test circuit.
PDF file


Invited Talk II
Time: 13:15 - 14:15 Friday, March 9, 2012
Location: Int'l Conf. Room
Chair: Tohru Ishihara (Kyoto University, Japan)

I2 (Time: 13:15 - 14:15)
TitleInnovating the SoC Design for Emerging Memory Technologies
Author*Sungjoo Yoo (POSTECH, Republic of Korea)
Pagepp. 437 - 438
AbstractA new emerging memory technology, phase-change RAM (PRAM), is gaining more and more attention as a complement to or replacement for existing DRAM in the main memory subsystem. In order for PRAM to be applied to the main memory, its limitations of write endurance, long read/write latency, and high write power consumption need to be overcome on the PRAM chip and the SoC utilizing the PRAM. In this paper, we explain recent works on innovating SoC designs to better utilize PRAM-based main memory.
PDF file


Poster IV
Time: 14:15 - 16:00 Friday, March 9, 2012
Location: Int'l Conf. Room & Mtg. Room 31
Chairs: Chikaaki Kodama (Toshiba Corp., Japan), Keishi Sakanushi (Osaka University, Japan)

R4-1
TitleDesign Automation for Digital Microfluidic Biochips: From Fluidic-Level Toward Chip-Level
AuthorTsung-Wei Huang, *Tsung-Yi Ho (National Cheng Kung University, Taiwan)
Pagepp. 439 - 444
KeywordChip-Level, Digital Microfluidic Biochips, Optimization, Physical Design, Synthesis
AbstractAdvances in droplet-based digital microfluidic biochips (DMFBs) have led to the emergence of biochips for automating laboratory procedures in biochemistry and molecular biology. These devices enable the precise control of microliter or nanoliter volumes of biochemical samples and reagents. They combine electronics with biology, and integrate various bioassay operations, such as sample preparation, analysis, separation, and detection. To meet the challenges of increasing design complexity, computer-aided-design (CAD) tools have been introduced to build DMFBs efficiently. This paper provides an overview of DMFBs and describes emerging CAD tools for the automated synthesis and optimization of DMFB designs, from fluidic-level synthesis to chip-level design. Design automation is expected to relieve the design burden of manual optimization of bioassays, time-consuming chip designs, and costly testing and maintenance procedures. With the assistance of CAD tools, users can concentrate on the development and abstraction of nanoscale bioassays while leaving chip optimization and implementation details to CAD tools.

R4-2
TitleTiming-Aware Clock Gating Algorithm for Pulse-Latch Circuits
Author*Zong-Han Yang, Tsung-Yi Ho (National Cheng Kung University, Taiwan)
Pagepp. 445 - 450
KeywordClock Gating, Pulse Latch, Timing
AbstractLow power design is a crucial issue in modern circuit design. Recently, several techniques have been proposed to reduce power consumption. One of them is pulse-latch technology, which replaces the flip-flops with pulse-latches because of their smaller capacitance. To further reduce power consumption of pulse-latch-based circuits, the clock gating of pulse-latches, which is called pulser gating, has been proposed recently. However, pulser gating may cause a violation of the setup time constraint, and thus the data cannot be stored to the registers correctly, causing a fatal error in the design. In this paper, we propose an algorithm to deal with the problem of pulser gating and the setup time constraint simultaneously. We use a line-search algorithm to capture the problem of the setup time constraint and apply the minimum-cost maximum-flow technique to determine the clock tree topology of pulse-latch-based circuits. Experimental results show that our algorithm can reduce power consumption effectively by 58.35% on average compared to the binary merge algorithm.

R4-3
TitleResistivity-based Modeling of Substrate Non-uniformity for Resistance Extraction of Low-Resistivity Substrate
Author*Yasuhiro Ogasahara, Toshiki Kanamoto (Renesas Electronics Corp., Japan), Hisato Inaba, Toshiharu Chiba (Renesas Design Corp., Japan)
Pagepp. 451 - 456
Keywordsubstrate noise, substrate extraction, low-resistivity substrate, doping profile
AbstractThis paper discusses modeling of non-uniform substrate resistivity for substrate resistance extraction. Though substrate resistivity of each substrate layer is frequently assumed to be uniform, doping profile of each substrate layer is not uniform. We present the extraction error of substrate resistance under uniform resistivity assumption. The resistivity model which enables accurate resistance extraction of substrate with non-uniform profile is suggested. We also demonstrate characterization of the suggested model using substrate resistances which are easily obtained from fabricated chips.
PDF file

R4-4
TitleTemperature-Constrained Fixed-Outline Floorplanning for 3D ICs
AuthorCiao-Yu Hong, Wai-Kei Mak, *Ting-Chi Wang (Department of Computer Science, National Tsing Hua University, Taiwan)
Pagepp. 457 - 459
KeywordTemperature, Fixed-Outline, Floorplanning, 3D-IC
AbstractThree-dimensional (3D) ICs are produced by stacking multiple dies and delivering inter-die signals with Through-Silicon Vias (TSVs). Typically, TSVs which deliver signals among dies are called signal TSVs, while those enhancing heat dissipation are called thermal TSVs. In this paper we present a temperature-constrained fixed-outline 3D-IC floorplanner which also simultaneously places signal and thermal TSVs to benefit wirelength and temperature reduction. Encouraging experimental results are shown to demonstrate the effectiveness and efficiency of our floorplanner.

R4-5
TitleA GPGPU Implementation of Parallel Backward Euler Algorithm for Power Grid Circuit Simulation
AuthorLei Lin, *Hayato Shiono, Makoto Yokota, Masahiro Fukui (Ritsumeikan University, Japan)
Pagepp. 460 - 465
Keywordpower grid, simulator, GPGPU, Backward Euler
AbstractWith the increase in VLSI scale, the time required for power grid simulation has been increasing. This paper describes a fast and accurate parallel transient simulator for power grids, which is implemented on a GPU (Graphics Processing Unit) using CUDA. This simulator achieves accurate simulation by the Backward Euler method. Experimental results show that the proposed simulator is 86.2 times faster than CPU software.
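
As background for R4-5, the following is a minimal sketch of the Backward Euler update for a single power grid node, assuming a node with decoupling capacitance C connected to the supply Vdd through a resistance R and loaded by a constant current. For the node equation C dv/dt = (Vdd - v)/R - I_load, the implicit update is v[n+1] = (C/h * v[n] + Vdd/R - I_load) / (C/h + 1/R). The circuit, the parameter values and the function name `backward_euler_node` are assumptions for illustration; the paper's simulator handles full grids in parallel on a GPU, which this serial sketch does not attempt.

```python
def backward_euler_node(vdd, r, c, i_load, h, steps, v0=None):
    """Transient voltage of one RC-modelled grid node via Backward Euler.

    vdd    : supply voltage [V]
    r      : resistance from the supply to the node [ohm]
    c      : node (decoupling) capacitance [F]
    i_load : constant load current drawn from the node [A]
    h      : time step [s]
    steps  : number of time steps to simulate
    v0     : initial node voltage (defaults to vdd)
    """
    v = vdd if v0 is None else v0
    g = 1.0 / r
    trace = [v]
    for _ in range(steps):
        # Implicit update: the unknown v[n+1] appears on both sides of the
        # discretized equation and is solved for in closed form here.
        v = (c / h * v + g * vdd - i_load) / (c / h + g)
        trace.append(v)
    return trace


if __name__ == "__main__":
    trace = backward_euler_node(vdd=1.0, r=0.1, c=1e-9, i_load=1.0,
                                h=1e-10, steps=200)
    print(trace[1], trace[-1])
```

The node voltage settles toward Vdd - I_load * R, the static IR drop. For a full grid the scalar division becomes a linear system solve at every time step, which is the part that benefits from parallel hardware.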

R4-6s
TitleA Third Order Delta-Sigma Modulator with Shared Opamp Technique for Wireless Applications
Author*Ghazal Fahmy, Daisuke Kanemoto, Haruichi Kanaya, Ramesh Pokharel, Keiji Yoshida (Kyushu University, Japan)
Pagepp. 466 - 467
Keyworddelta-sigma modulator, ADC, shared-opamp
AbstractThis paper describes the design of a third-order delta-sigma modulator (DSM) that exploits a shared-opamp technique to reduce the number of opamps required; consequently, both the total power consumption and the required area of the modulator decrease. The architecture relaxes the comparator speed requirement, which makes it appropriate for wireless applications. The first and second stages share one opamp in the integration and sampling phases. The proposed circuit has been designed in TSMC 0.18 µm CMOS technology. A bandwidth of 2 MHz and a peak signal-to-quantization-noise ratio (SQNR) of 50 dB, which are suitable for WCDMA, have been achieved. The modulator consumes 2.4 mW from a 1.2 V power supply, and its area is 0.3 mm².
PDF file

R4-7s
TitleA Self-Organization Maps Approach to FPGA Placement
AuthorMotoki Amagasaki, *Yasuaki Tomonari, Masahiro Iida, Morihiro Kuga, Toshinori Sueyoshi (Kumamoto University, Japan)
Pagepp. 468 - 469
KeywordSOM, FPGA, Placement
AbstractCell placement is an important phase of current Field Programmable Gate Array (FPGA) circuit design. However, this placement problem is NP-hard. Although nondeterministic algorithms such as Simulated Annealing (SA) are successful in solving this problem, they are known to be slow. In this paper, we introduce a new neural network approach to the placement problem of FPGAs. The network used is a Kohonen self-organization map (SOM). The connection relationship of cluster-level netlists is converted to a set of appropriate input vectors. These vectors, which have higher dimensionality, are fed to the self-organization map at random to map themselves onto a 2-dimensional plane of the regular chip. The key feature is that the SOM algorithm performs the cell placement to minimize the total connection length in the circuit. In this paper, we evaluate our placement tool using some benchmark circuits.
PDF file

R4-8
TitleThe Development of CAD System for Via Programmable Structured ASIC VPEX3
Author*Ryohei Hori (Ritsumeikan University, Japan), Masaya Yoshikawa (Meijo University, Japan), Takeshi Fujino (Ritsumeikan University, Japan)
Pagepp. 470 - 475
KeywordStructured ASIC, Via Programmable, Exclusive-or
AbstractVarious kinds of structured ASICs (SAs), which can be customized with only a few masks, drastically decrease the photomask cost. We have been developing the novel via-programmable structured ASIC (VPSA) architecture "VPEX (Via Programmable logic device using EXclusive-or array)". It is necessary to develop a CAD system for VPEX, because there are no general tools supporting placement and routing for VPSA. In this paper, we describe the dedicated CAD system and study the area penalty of VPEX compared with ASIC.

R4-9
TitleDesign of Low-Voltage High-Precision Complex Quadrature Modulators
Author*Takahiro Tsushima, Tsuneo Tsukahara (University of Aizu, Japan)
Pagepp. 476 - 481
Keywordquadrature modulator, LO calibration, transmitter
AbstractWe propose novel quadrature modulator structures suitable for software-defined radio and cognitive radio transmitters. The proposed modulators can correct LO phase and amplitude errors, and achieve high modulation accuracy and low power consumption. The simulated sideband rejection ratios are better than 60dB when the phase error is 3 degrees and the amplitude error is 0.1 dB, and the power consumption is about 13mW.
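
For context on the figures quoted in this abstract, the sketch below evaluates the standard textbook expression for the sideband (image) rejection of a quadrature modulator whose I/Q paths have a gain ratio A and a phase error φ: SRR = (1 + 2A cos φ + A²) / (1 − 2A cos φ + A²). This is an idealized model chosen for illustration, not the authors' circuit or calibration method, and the function name `sideband_rejection_db` is hypothetical.

```python
import math


def sideband_rejection_db(phase_error_deg: float, amplitude_error_db: float) -> float:
    """Sideband rejection (dB) of an idealized, uncorrected quadrature modulator.

    Assumption: textbook I/Q mismatch model; not taken from the paper itself.
    """
    gain_ratio = 10 ** (amplitude_error_db / 20)  # I/Q amplitude ratio A
    cos_phi = math.cos(math.radians(phase_error_deg))
    wanted = 1 + 2 * gain_ratio * cos_phi + gain_ratio ** 2
    image = 1 - 2 * gain_ratio * cos_phi + gain_ratio ** 2
    if image == 0:
        return math.inf  # perfectly matched paths: no image sideband at all
    return 10 * math.log10(wanted / image)


if __name__ == "__main__":
    # Errors quoted in the abstract: 3 degrees and 0.1 dB.
    print(f"{sideband_rejection_db(3.0, 0.1):.1f} dB")
```

Under this model, the quoted errors left uncorrected give a rejection of roughly 31 dB, which indicates how much the correction has to contribute to reach the simulated figure of better than 60dB.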

R4-10s
TitleA Design of 2GHz Band O-QPSK Wireless Transmitter using 0.18µm CMOS Technology
Author*Yuki Mitani, Nobuhiko Nakano (Keio University, Japan)
Pagepp. 482 - 483
KeywordBMI, wireless
AbstractBrain-Machine-Interface (BMI) has attracted attention in recent years, and the demands for wireless communication are increasing. In this paper, we propose a transmitter using O-QPSK in 0.18µm CMOS technology to meet the requirements for wireless communication. This transmitter operates at a 1V supply voltage with a current consumption of 15.03mA. The output power is -3dBm, and the maximum data rate is 12.8Mbps.
PDF file

R4-11
TitleA 0.5V PWM-Driven Analog Differential Amplifier Using Subthreshold Leakage Current
Author*Tomochika Harada, Ryuuya Otaki (Yamagata University, Japan)
Pagepp. 484 - 487
KeywordPWM, subthreshold, amplifier, mixed circuit
AbstractIn this paper, we design and fabricate a PWM-driven analog differential amplifier that uses only subthreshold current on the order of sub-uA, aiming to realize ultra-low-power analog/digital LSI systems powered by low-output power supplies. In this circuit, two analog inputs are converted to PWM signals, and their difference is computed by digital processing. The circuit has almost the same performance as the ultra-low-power analog operational amplifier we designed. It is designed and fabricated in a triple-well 65nm CMOS process. Measurement results confirm the circuit operation and a power consumption of 1.06uW at 55kHz.

R4-12s
Title16PE 3D-Mesh NOC Based 3D Multicore Design and Implementation
AuthorMohamad Hairol Jabbar (ENSTA ParisTech, France), Dominique Houzet (GIPSA-LAB, France), *Omar Hammami (ENSTA ParisTech, France)
Pagepp. 488 - 489
Keyword3D, multicore, mesh, noc, tezzaron
AbstractIn this paper, we describe the design flow, architecture and implementation of our 3D multiprocessor with NoC. The design, based on 16 processors communicating over a 4x2x2 mesh NoC spread across two tiers, is discussed in detail; it will be fabricated using Tezzaron technology with the 130 nm GlobalFoundries standard library. The purpose of this work is to accurately measure NoC performance in a real 3D chip running mobile multimedia applications, in order to evaluate the impact of the 3D architecture compared to 2D.

R4-13s
TitleA Performance Improvement for Floating-Point Arithmetic Unit with Precision Degradation Detection
Author*Soseki Aniya, Toshiaki Kitamura (Graduate School of Information Sciences, Hiroshima City University, Japan)
Pagepp. 490 - 491
Keywordperformance improvement, precision degradation detection, vector processor
AbstractIn scientific computation, errors in floating-point calculations caused by rounding, overflow, underflow, loss of significant digits, or loss of trailing digits are very important. In prior work, we designed a vector co-processor that has floating-point arithmetic units with detection of loss of significant digits and precision degradation. In this paper, we propose a partitioned vector co-processor design, which improves the data transfer throughput between the vector co-processor and SSRAM. Compared to the prior work, the vector load instruction becomes twice as fast, measured in execution cycles, in RTL simulation.
PDF file
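
As an illustration of what "loss of significant digits" means in this abstract, the sketch below estimates in software how many leading significand bits cancel when two nearby floating-point values are subtracted, by comparing the binary exponent of the larger operand with that of the result. It is a hypothetical illustration only: the function name `cancelled_bits` and the exponent-comparison rule are assumptions, not the detection logic of the authors' arithmetic units.

```python
import math

MANTISSA_BITS = 53  # significand width of an IEEE 754 double


def cancelled_bits(a: float, b: float) -> int:
    """Estimate the number of leading significand bits lost in a - b.

    Assumption: loss is measured as the drop in binary exponent from the
    larger operand to the result; this is an illustrative rule only.
    """
    result = a - b
    if a == 0.0 or b == 0.0:
        return 0  # subtracting zero cancels nothing
    if result == 0.0:
        return MANTISSA_BITS  # complete cancellation
    operand_exp = max(math.frexp(a)[1], math.frexp(b)[1])
    result_exp = math.frexp(result)[1]
    return max(0, operand_exp - result_exp)


if __name__ == "__main__":
    print(cancelled_bits(1.0000001, 1.0))  # nearby operands: many bits cancel
    print(cancelled_bits(3.0, 1.0))        # well-separated operands: none
```

A hardware unit could raise a flag when such a count exceeds a chosen threshold; the threshold and the flagging policy are left open here because the abstract does not specify them.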

R4-14
TitleHardware Architecture for Real-Time Operation of Learning-Based Super-Resolution Using Binary Search Tree
Author*Takahiro Kitayama, Kohei Michibata, Yuzuru Shizuku, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 492 - 496
KeywordLearning-Based Super-Resolution, hardware architecture, stream data-processing system, real-time operation, pipeline
AbstractIn this paper, we propose a hardware architecture for real-time operation of Learning-Based Super-Resolution using a binary search tree. In the proposed architecture, the stream data-processing system is applied to the whole circuit, and burst transmission is applied between modules to improve the transfer rate. Moreover, the Search Dictionary module, which has been a bottleneck, is pipelined to improve the throughput. Experimental results have shown that the processing speed with our architecture is about 83 times faster than that of software processing for a picture of 1,024 × 1,024 pixels.

R4-15
TitleArchitecture Optimization of Group Signature Circuits for Cloud Computing Environment
Author*Sumio Morioka, Jun Furukawa, Yuichi Nakamura, Kazue Sako (NEC Corporation, Japan)
Pagepp. 497 - 502
Keywordcloud security, digital signature, server accelerator, IP core design, HLS
AbstractGroup signature is one of the main themes in recent digital signature studies. The signature algorithm is a combination of more than 30 elliptic curve (ECC), modular (RSA), long-bit integer (INT) and hash arithmetic operations. In a cloud computing environment where many client devices (mobile devices, embedded systems, sensor devices, etc.) are connected to servers in data centers via a network, low-power and fast H/W accelerators are strongly desired. In this paper, we propose an H/W macro-architecture for servers in data centers and compare it with the architecture for client devices. While these architectures are completely different, we can use the same H/W design methodology, in which the architectures are explored automatically by a custom-made HLS (High Level Synthesis) tool.
PDF file

R4-16
TitleEfficient Packet Transmission Priority Control Method for Network-on-Chip
Author*Yusuke Sekihara, Takashi Aoki, Akira Onozawa (NTT Microsystem Integration Laboratories, Japan)
Pagepp. 503 - 507
KeywordNoC, performance, priority, transmit, flit
AbstractTo meet the ever-increasing need for high-performance computing, the performance of a single processor has been improved almost to its limit and parallelization has thus become inevitable. NoC architecture based on packet switching is becoming popular for large-scale parallelism. In this paper, we propose a new packet transmission control method for the NoC architecture that can improve the efficiency of the buffers. The simulation results show that the proposed method can improve average latency by about 10-20% under congestion.
PDF file

R4-17s
TitleDirect Memory Access Transfer Method with Chaining for Inter-Chip Network
Author*Eiichi Sasaki, Daisuke Sasaki, Ikan Wang, Yusuke Koizumi, Hideharu Amano (Keio University, Japan)
Pagepp. 508 - 509
KeywordNoC, multi-core
AbstractWireless 3D-NoC architecture is highly flexible, but how processing nodes communicate with each other is an important issue. We propose a DMA transfer mechanism using packet-request for an inter-chip network router. In the evaluation, direct data transfer using the chaining mechanism achieved a 7.7-fold improvement in communication latency.

R4-18
TitleEfficient Barrier Synchronization for 2D Meshed NoC-based Many-core Processors
Author*Lovic Gauthier, Farhad Mehdipour, Koji Inoue, Shinya Ueno, Hiroshi Sasaki (Kyushu University, Japan)
Pagepp. 510 - 515
KeywordBarrier, Synchronization, NoC, Many-core, Multi-thread
AbstractNetwork-on-Chip (NoC) based many-cores are becoming popular due to their high scalability compared to traditional bus-based architectures. However, they still lack software tailored to their specificities. In this paper, we propose several techniques for tailoring and combining barrier synchronizations in order to take advantage of 2D-meshed NoCs. Experimental results show that our combined barriers often achieve delays half as long as those of state-of-the-art barriers.
PDF file

R4-19
TitleEffective Distributed Parallel Scheduling Methodology for Mobile Cloud Computing
Author*Hiromasa Yamauchi, Koji Kurihara, Toshiya Otomo (Fujitsu Laboratories Ltd., Japan), Yuta Teranishi (Fujitsu Kyushu Network Technologies Ltd., Japan), Takahisa Suzuki, Koichiro Yamashita (Fujitsu Laboratories Ltd., Japan)
Pagepp. 516 - 521
KeywordMobile phone, Parallel processing, Cloud computing, Scheduling, Sensor network
AbstractConsider a category of devices such as mobile phones and sensor devices. If each device is regarded as a node, these devices can be regarded as a distributed parallel processing system, which we define as “Mobile Cloud computing (MC)”. Collaborative processing among mobile phones and computation by sensor devices are practical uses of MC. MC differs from traditional parallel processing among servers, mainframes, or HPC systems in its dynamic fluctuation of battery power and mobile network quality. We propose a distributed parallel scheduling methodology for MC and have developed a simulator to analyze these characteristics and the bottlenecks of MC.
PDF file

R4-20
TitleExtending Intent in Android for Distributed Collaboration Framework
Author*Takahiro Ito, Takuya Azumi, Nobuhiko Nishio (Ritsumeikan University, Japan)
Pagepp. 522 - 527
KeywordAndroid, Embedded System
AbstractAndroid is widely used on mobile devices. An approach to controlling embedded devices from Android has been proposed, as have frameworks for collaboration among embedded devices. These proposals, however, have issues in terms of flexibility. In this paper, we propose a flexible framework using "Intent" to control embedded devices from Android. Our framework enables Android to control embedded devices manufactured for use with not only our framework but also existing frameworks.

R4-21
TitleEnergy Efficient Instruction-set Extension Considering Inline Expansion
Author*Sho Ninomiya, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 528 - 533
KeywordInstruction-set Extension, Inline Expansion, Energy-efficient, ASIP, Embedded Systems
AbstractTo reduce the energy consumption of applications in embedded systems, an ASIP needs an instruction-set extension suited to the application. Inline expansion, a software optimization, is not considered in conventional instruction-set extension methods. In this paper, we propose an energy-efficient instruction-set extension method that considers inline expansion. Experiments show that the proposed method achieves a larger reduction in energy consumption.
PDF file

R4-22
TitleReduction of Glitches for Low-Power Multipliers Using 4-2 Compressors Based on Hybrid-CMOS Logic Style
Author*Yang-uk Son, Yuzuru Shizuku, Takeshi Kogure, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 534 - 538
Keywordlow-power, multiplier, glitch, 4-2 compressor, 4-2 tree architecture
AbstractIn this paper, we propose a technique to reduce glitches for reducing power consumption in multipliers. Conventional approaches using flip-flops for synchronization increase area and power. Our 4-2 compressor based on hybrid-CMOS logic style reduces glitches without additional circuits by using transmission-gates and pass-transistors which act like resistors when cascaded. In addition, CMOS inverters reduce speed deterioration. Simulation results have shown that the proposed technique reduces glitch activity by 1/12.

R4-23
TitleAffine Transformations of Logic Functions and Their Application to Affine Decompositions of Index Generation Functions
Author*Tsutomu Sasao, Masao Maeta (Kyushu Institute of Technology, Japan), Radomir Stankovic (University of Nis, Serbia), Stanislav Stankovic (Tampere University of Technology, Finland)
Pagepp. 539 - 543
Keywordlinear transform, Incompletely specified function, functional decomposition, Boolean matching
AbstractAffine transformations are used to find optimal affine decompositions of incompletely specified index generation functions. This paper shows that the number of equivalence classes to consider is equal to the number of affine equivalence classes of logic functions. Exact minimum solutions with up to five variables are obtained.

R4-24
TitleAn Error Diagnosis Technique Based on SAT Solver
Author*Tomoki Matsuyama, Hiroto Senzaki, Kosuke Watanabe, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 544 - 548
KeywordECO, Error Diagnosis, SAT solver
AbstractThis paper presents an error diagnosis technique based on a SAT solver, which has the advantages of lower memory consumption and a larger number of processable variables in comparison with Binary Decision Diagrams (BDDs). The SAT solver is used for generating input patterns for error diagnosis and for verifying the solutions obtained by the proposed technique. By using the SAT solver, the proposed technique can rectify circuits too large to be represented by BDDs. Experimental results have shown that our technique rectifies a circuit of 21,061 gates.

R4-25
TitlePerformance Evaluation of Various Configuration of Adder in Variable Latency Circuits with Error Detection/Correction Mechanism
Author*Kenta Ando, Atsushi Takahashi (Osaka University, Japan)
Pagepp. 549 - 554
Keyworderror detection/correction circuits, maximum delay time, minimum delay time, distribution of delay, effective clock period
AbstractThe performance of a circuit is improved by introducing an error detection/correction mechanism that effectively exploits the variation of delays between flip-flops. The performance of an error detection/correction circuit depends on the minimum delay, maximum delay, and delay distribution of the circuit. In general, the performance is better when the minimum delay is larger and/or the probability of a large delay is lower. However, circuits are usually designed so that the maximum delay is reduced as much as possible to maximize performance in the conventional framework, and they are not necessarily suited to the error detection/correction framework. In this paper, in order to develop a circuit synthesis method for the error detection/correction framework, we design and evaluate various ripple-carry adders (RCAs) in which the minimum delay is increased by delay insertion and/or the probability of a large delay is reduced by changing the configuration of the circuit components. In experiments, we confirm that a circuit obtained in this way achieves better performance in the error detection/correction framework.
PDF file

R4-26
TitleA Delay Control Technique for Extremely Low-Voltage Subthreshold CMOS Digital Circuits
Author*Seiichiro Shiga, Tetsuya Hirose, Yuji Osaki, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 555 - 559
KeywordCMOS, subthreshold, on-chip, compensation circuit, PVT variation
AbstractIn this paper, we propose a fully on-chip delay control technique for extremely low-voltage (ELV) subthreshold CMOS digital circuits. Because the performance of ELV subthreshold CMOS digital circuits degrades with process, supply voltage, and temperature (PVT) variations, we developed a delay control circuit consisting of voltage and current reference circuits, a delay monitoring circuit, a current comparator, and a frequency-current converter. The operation of the circuit was confirmed by SPICE simulations with a set of 0.18-um standard CMOS parameters. The results demonstrated that process and temperature variations can be compensated by 59% and 95%, respectively.


Invited Talk III
Time: 16:00 - 17:00 Friday, March 9, 2012
Location: Int'l Conf. Room
Chair: Nagisa Ishiura (Kwansei Gakuin University, Japan)

I3 (Time: 16:00 - 17:00)
TitleK Computer: Challenges making the Superior Quality Interconnect
Author*Takahide Yoshikawa (Fujitsu Laboratories Ltd., Japan)
Pagepp. 560 - 564
AbstractThe K computer, which RIKEN and Fujitsu are now developing, has twice been awarded the title of the world's fastest computer by the TOP500 project. Throughout development, many bugs were detected, but they were fixed before manufacturing. This was achieved by our advanced verification methodologies. Furthermore, the 88,128-node system can run for 30 hours without a single fault. This was supported by our leading production test methodologies. This paper introduces these advanced verification and production methodologies.
PDF file