
SASIMI 2015
The 19th Workshop on Synthesis And System Integration of Mixed Information Technologies
Technical Program

Remark: The presenter of each paper is marked with "*".

Session Schedule


Monday, March 16, 2015

8:00 -          Registration
8:40 - 9:00     Opening
9:00 - 10:00    K1  Keynote Speech I
10:00 - 10:15   Coffee Break
10:15 - 12:00   R1  Poster I
12:00 - 13:30   Lunch
13:30 - 14:20   I1  Invited Talk I
14:20 - 15:50   D  Panel Discussion
15:50 - 16:10   Coffee Break
16:10 - 17:55   R2  Poster II
18:30 - 20:30   Banquet

Tuesday, March 17, 2015

8:30 - 9:30     K2  Keynote Speech II
9:30 - 9:45     Coffee Break
9:45 - 11:30    R3  Poster III
11:30 - 13:00   Lunch Break
13:00 - 13:50   I2  Invited Talk II
13:50 - 14:05   Coffee Break
14:05 - 14:55   I3  Invited Talk III
14:55 - 16:40   R4  Poster IV
16:40 - 16:50   Closing


List of papers

Monday, March 16, 2015

Keynote Speech I
Time: 9:00 - 10:00 Monday, March 16, 2015
Chair: Ting-Chi Wang (National Tsing Hua University, Taiwan)

K1-1 (Time: 9:00 - 10:00)
TitleInfluence of Emerging Devices in Revitalizing Electronic Systems Design
AuthorVijaykrishnan Narayanan (Pennsylvania State University, U.S.A.)
Pagep. 1
AbstractThere are a multitude of ongoing efforts in exploring new physics, new devices, and new computational paradigms to sustain progress in the semiconductor industry. This talk will provide an overview of the influence of some promising emerging devices at the system level. It will also provide insights on challenges and solutions when designing entire systems in yet-to-mature technologies. Specifically, this talk will focus on steep sub-threshold slope transistors, non-volatile logic/memory elements, and coupled oscillators and their potential impact on energy-scavenged assistive health systems and intelligent vision systems.


Poster I
Time: 10:15 - 12:00 Monday, March 16, 2015
Chairs: Ren-Song Tsay (National Tsing Hua University, Taiwan), Po-Hung Lin (National Chung Cheng University, Taiwan)

R1-1 (Time: 10:15 - 10:17)
TitleMemory Synthesis for Multi-Processor System-on-Chips with Reconfigurable 3D-stacked SRAMs
AuthorMeng-Ling Tsai, *Yi-Jung Chen, Yi-Ting Chen, Ru-Hua Chang (National Chi Nan University, Taiwan)
Pagepp. 2 - 7
KeywordMemory Synthesis, Reconfigurable 3D-stacked SRAMs
AbstractIntegrating Multi-Processor System-on-Chips (MPSoCs) with 3D-stacked reconfigurable SRAM tiles has been proposed for embedded systems with high memory demands. At runtime, the SRAM tiles are configured into several memory areas, which can be reconfigured according to the dynamic behavior of the system. Targeting this architecture, in this paper, we propose a data placement and memory area allocation algorithm. The goal of the proposed algorithm is to optimize the performance of the memory system by minimizing the on-chip memory access latency, the number of off-chip memory accesses, and the number of reconfigurations. Since the behavior of an embedded system can be described by a set of scenarios, where each scenario specifies a set of applications that would execute concurrently, the proposed algorithm synthesizes data placements and the memory area allocation for each scenario. Data placement considers the data access patterns not only within each scenario but also across all scenarios. We evaluate the proposed algorithm on a set of synthetic and real-world applications. The experimental results show that, compared to the existing data placement method designed for MPSoCs with distributed memory modules, the proposed algorithm reduces data access latency by up to 11.72%.

R1-2 (Time: 10:17 - 10:19)
TitleThermal-Pattern-Aware Voltage Assignment for Task Scheduler on 3D Multi-Core Processors
AuthorChien-Hui Liao, *Cheng Suo, Charles Hung-Pin Wen (National Chiao Tung University, Taiwan)
Pagepp. 8 - 9
Keywordtask scheduling, 3D MCPs, hotspots, DVFS, voltage assignment
AbstractIn three-dimensional multi-core processors (3D-MCPs), hotspots occur more often and cause severe problems for system reliability and lifetime. Moreover, a higher frequency of hotspot occurrence triggers more dynamic voltage and frequency scaling (DVFS), leading to degraded throughput. Therefore, to reduce the frequency of hotspot occurrence effectively, a new thermal-constrained task-scheduling algorithm based on thermal-pattern-aware voltage assignment is proposed. Using the temperature profiles of different voltage assignments on 3D-MCPs, thermal-pattern-aware voltage assignment is applied to effectively reduce the rate of temperature increase among 3D-MCPs. Furthermore, the proposed scheduler includes on-line allocation for 3D vertically-grouped cores and a new vertically-grouped voltage scaling that considers thermal correlation among vertically-adjacent cores in 3D-MCPs. Experimental results show that, compared to the previous thermal-constrained task-scheduling strategy, our task-scheduling algorithm can reduce the frequency of hotspot occurrence by 38.84% and can further improve throughput by 6.62%.

R1-3 (Time: 10:19 - 10:21)
TitleHigh-Level Synthesis from Programs with External Interrupt Handling
Author*Naoya Ito, Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroyuki Kanbara (Advanced Scientific Technology & Management Research Institute of KYOTO, Japan)
Pagepp. 10 - 15
KeywordHigh-level synthesis, Binary synthesis, External interrupt, ACAP
AbstractThis paper presents a method of synthesizing a given binary program, which contains external interrupt handling, into hardware whose behavior is equivalent to the CPU running the program. In our method, the system control coprocessor which the CPU uses for interrupt handling is incorporated into the hardware as a functional unit. Instructions for accessing coprocessor registers, returning from interrupt handling, and making system calls are scheduled as operations, and bound to the coprocessor. Jump register instructions for calling and returning from interrupt service routines are synthesized using operations that convert instruction addresses into the corresponding states of the hardware. Assuming the MIPS R3000 as the CPU, the proposed method has been implemented on top of the binary synthesizer ACAP. A program of about 40 lines with an external interrupt service routine was synthesized into hardware, and it was confirmed that interrupt handling works correctly. The execution cycles and the delay were reduced by 14% and 26%, respectively, at the cost of a 1.1-fold increase in hardware size.

R1-4 (Time: 10:21 - 10:23)
TitleAn SOC Estimation System for Lithium Ion Batteries Considering Thermal Characteristics
Author*Ryu Ishizaki, Lei Lin, Naoki Kawarabayashi, Masahiro Fukui (Ritsumeikan University, Japan)
Pagepp. 16 - 21
KeywordExtended Kalman Filter, SOC estimation, Arrhenius formula, Lithium ion Batteries
AbstractThis paper discusses an SOC estimation system for lithium ion batteries based on the Extended Kalman Filter. The accuracy of the estimation is strongly dependent on the accuracy of the battery model. We have newly formulated an equivalent circuit model that considers temperature and SOC dependencies. As a result, the error rate of the estimation has been improved significantly. The evaluation shows that the new SOC estimation system can be used over a wide range of temperatures.

R1-5 (Time: 10:23 - 10:25)
TitleDynamic Data Migration to Eliminate Bank-Level Interference for Stencil Applications in Multicore Systems
AuthorWei-Hen Lo, *Yen-Hao Chen, TingTing Hwang (National Tsing Hua University, Taiwan)
Pagepp. 22 - 27
Keyworddata migration, memory controller, page allocation, stencils, multi-threaded
AbstractA stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Modern automatic transformation compiler frameworks can generate efficient tiled parallel stencil codes. Dynamically scheduling parallel stencils significantly improves system performance. However, the memory contention problem is exacerbated because fewer cores are idle and more memory requests are sent to the DRAM. The traditional OS page coloring method, which partitions the memory pages in advance, cannot alleviate the memory contention in dynamically scheduled parallel stencils. To address this issue, we provide a new software/hardware cooperative dynamic data migration method that exploits the update-and-reuse property of stencils. We observe that the OS page allocation needs to be aware of the flexibility for dynamic data migration in memory to eliminate the memory interference. Experimental evaluation on an 8-core x86 system shows that our method can improve the system performance by 7% compared with dynamically scheduled stencils in a system with 8 cores and 4 memory banks.

R1-6 (Time: 10:25 - 10:27)
TitleA Battery Smart Sensor and Its SOC Estimation Function for Assembled Lithium-Ion Batteries
Author*Naoki Kawarabayashi, Lei Lin, Ryu Ishizaki, Masahiro Fukui (Ritsumeikan University, Japan), Isao Shirakawa (University of Hyogo, Japan)
Pagepp. 28 - 33
Keywordassembled Lithium-ion batteries, Battery Smart Sensor, SOC
AbstractThis paper discusses the smart sensor, which is an important technology in a smart grid. We have developed a system that monitors the battery condition with an attached sensor and accumulates the measured data on the Web. The battery sensor is implemented with a microcomputer. We have first developed a highly accurate and practical SOC sensor using the Extended Kalman filter as a function of the battery sensor. Based on the SOC estimation function for a single cell, the SOC estimation function for assembled Lithium-ion batteries is also developed.

R1-7 (Time: 10:27 - 10:29)
TitleA Fast and Highly Accurate Statistical Based Model for Performance Estimation of MPSoC On-Chip Bus
Author*Farhan Shafiq, Tsuyoshi Isshiki, Dongju Li, Hiroaki Kunieda (Tokyo Institute of Technology, Japan)
Pagepp. 34 - 39
Keywordbus, statistical model, performance prediction, arbitration stall, bus stall
AbstractWhile Multiprocessor System-On-Chips (MPSoCs) are becoming widely adopted in embedded systems, communication architecture analysis for MPSoCs becomes ever more complex. There is a growing need for fast and accurate performance estimation techniques for the on-chip bus. In this paper, we present a novel statistics-based technique that makes use of accumulated "workload statistics" to accurately predict the "stall cycle counts" caused by bus contention. This eliminates the need to simulate arbitration on every bus access, resulting in substantial speed-up. It is assumed that each processor in the system has a distinct fixed priority, and arbitration is based on priority. We verify the accuracy of our proposed model against results obtained by cycle-accurate simulation. Two kinds of traffic are used in the experiments to verify the bus model: synthetically generated traffic and traffic from a real-world application. For the synthetic traffic, we report an error range of 0.1% - 5% and a speedup of 7-10x. For the real traffic, we use a limited “single blocking” bus model and report results accordingly.

R1-8 (Time: 10:29 - 10:31)
TitleC-Based RTL Design Framework for Processor and Hardware-IP Synthesis
Author*Tsuyoshi Isshiki, Koshiro Date, Daisuke Kugimiya, Dongju Li, Hiroaki Kunieda (Tokyo Institute of Technology, Japan)
Pagepp. 40 - 45
KeywordC-based design, RTL synthesis, processor synthesis, verification, instruction-set simulator
AbstractIn this paper, we propose a new C-based design framework where the RTL structure is directly described in a dataflow C coding style, while the same C code serves as a fast simulation model. Design examples of an image signal processing pipeline show the effectiveness of the proposed C-based tool framework: the dataflow C codes have 1/3 to 1/5 of the number of lines compared to HDLs, and can generate high-performance circuits with parallelism as high as 4000 operations/cycle. Also, for RISC processor designs, our dataflow C coding style effectively captures the behavior of the instruction set simulator with less than 1000 lines of C code, which can be directly transformed into an RTL structure.

R1-9 (Time: 10:31 - 10:33)
TitleProfiler for Control System in System Level Design
Author*Miaw Torng-Der, Yuki Ando, Shinya Honda, Hiroaki Takada, Masato Edahiro (Nagoya University, Japan)
Pagepp. 46 - 51
Keywordprofiler, system level design, FPGA, control system
AbstractThis paper introduces a profiler architecture for control systems in system-level design. When designing a control system, we need to consider two things. The first is the asynchronous signal coming from sensors and actuators, called the interrupt request signal. The second is the process that should have a higher priority and be activated by the interrupt request signal, called the interrupt handler. However, existing profilers cannot obtain information about either the interrupt request signal or the interrupt handler.

R1-10 (Time: 10:33 - 10:35)
TitleSocket-Based Performance Monitoring Tool Suite for System-on-Chips
Author*Ting-Hsuan Wu, Tsun-Hsin Chang, Ing-Jer Huang (National Sun Yat-sen University, Taiwan)
Pagepp. 52 - 55
Keywordperformance, monitoring, system, software, hardware
AbstractThe SoC industry has shifted its development goal from increasing processor clock frequencies to distributing work among multiple IPs. To achieve better efficiency of SoC integration, socket interfaces are adopted to eliminate the migration overhead from one system to another. Therefore, this paper proposes a Socket-Based Performance Monitoring Tool Suite (SB PMTS) which is capable of providing a holistic view of system behavior and performance by monitoring two types of performance information: (1) the cycle-accurate execution time of a complete task, and (2) the transaction events on the socket interfaces. Accordingly, SB PMTS synchronizes the performance information from different resources and enables average designers to quickly assess the quality of the SoC without any instrumentation.

R1-11 (Time: 10:35 - 10:37)
TitleMinimization of Register Area Cost for Soft-Error Correction in Low Energy DMR Design
Author*Kazuhito Ito, Takumi Negishi (Saitama University, Japan)
Pagepp. 56 - 61
KeywordDMR, Low energy, Synthesis, Register minimization
AbstractDouble modular redundancy (DMR) executes an operation twice and detects soft errors by comparing the operation results. A soft error is corrected by executing the necessary operations again to obtain correct results. Such re-execution requires the input data of the operations, and many registers are needed to store the necessary data. In this paper, a method to minimize the area cost of registers is proposed while the minimization of operation energy consumption is considered with respect to the given constraints of time, resource, and delay penalty for error correction. The experimental results show that the register cost is reduced by about 20% on average.

R1-12 (Time: 10:37 - 10:39)
TitleSimultaneous Test Scheduling and TAM Bus Wire Assignment for Core-Based SoC Designs
AuthorTe-Jui Wang, *Ching-Chun Chiu, Shih-Hsu Huang (Chung Yuan Christian University, Taiwan)
Pagepp. 62 - 67
KeywordCore-Based Systems, Test Scheduling, Testing Time, Test Access Mechanism
AbstractThe reduction of total testing time is crucial for the saving of IC testing cost. In the testing of a core-based System-on-Chip (SoC) design, external tests are applied to cores via a specialized test access mechanism (TAM). Previous test scheduling algorithms assume that two external tests cannot utilize the TAM at the same time. However, in fact, if the external tests of different cores do not use the same TAM bus wire, they can be executed concurrently, which reduces the total testing time. Based on this observation, in this paper, we propose an effective and efficient algorithm to perform the simultaneous application of test scheduling and TAM bus wire assignment for the testing of core-based SoC designs. Compared with previous works, experimental results consistently show that the proposed approach can greatly reduce the total testing time.

R1-13 (Time: 10:39 - 10:41)
TitleAutomatic Analog Synthesis Platform with Low-Noise Consideration
AuthorYing-Chi Lien, Ching-Mao Lee, Chih-Wei Li, *Yi-Syue Han, Chien-Nan Jimmy Liu (National Central University, Taiwan)
Pagepp. 68 - 71
Keywordanalog synthesis, bio-signal, automatic sizing, layout automation
AbstractBecause the bio-signals are often very weak, they can be influenced by noise easily and become hard to distinguish. In this paper, an automatic analog synthesis platform is presented for bio-acquisition systems to generate the required circuits from specification to layout with low-noise consideration. Process variations and layout effects are also simultaneously considered to generate the required circuits with high design yield. Furthermore, a user-friendly GUI is also provided to help users complete the design flow successfully and efficiently. As shown in the experimental results, this analog synthesis platform is able to generate the required circuits in seconds with low noise. The chip implementation result also verifies the capability of this tool to generate the required designs with fabricable quality.

R1-14 (Time: 10:41 - 10:43)
TitleIntra-Vehicle Network Routing Algorithm for Weight and Wireless Transmit Power Minimization
Author*Ta-Yang Huang, Chia-Jui Chang (National Cheng Kung University, Taiwan), Chung-Wei Lin (University of California at Berkeley, U.S.A.), Sudip Roy (National Cheng Kung University, Taiwan), Tsung-Yi Ho (National Chiao Tung University, Taiwan)
Pagepp. 72 - 77
KeywordIn-Vehicle Network, Routing
AbstractAs the complexity of vehicle distributed systems increases rapidly, several hundreds of devices (sensors, actuators, etc.) are being placed in a modern automotive system. With the increase in wiring cables connecting these devices, the weight of a car increases significantly, which degrades fuel efficiency. In order to reduce the weight of a car, wireless communication has been introduced to replace wiring cables between some devices. However, the extra energy consumption for packet transmissions by wireless devices requires frequent maintenance, e.g., recharging of batteries. In this paper, we propose an intra-vehicle network routing algorithm to simultaneously minimize the wiring weight and the transmission power for wireless communication. Experimental results show that the proposed method can effectively minimize the wiring weight and the transmit power for wireless communication.

R1-15 (Time: 10:43 - 10:45)
TitleAn Automated Flow Integration to Help Analog Layout Design Migration
AuthorJou-Chun Lin, *Po-Cheng Pan, Ching-Yu Chin, Hung-Ming Chen (National Chiao Tung University, Taiwan)
Pagepp. 78 - 82
Keywordanalog layout, design migration
AbstractThe development of computer-aided-design (CAD) tools for digital circuits has matured over the years. However, CAD tools for analog circuits still face a great deal of challenges. Since the size of transistors scales down as process technology advances, the design migration problem arises in order to increase the degree of layout reuse. With previous work such as placement migration and routing preservation tools available, further performance improvement becomes the next step. We focus on wire width, which affects the resistance and capacitance of wires, so as to improve performance. We implement a flow that adjusts the wire width to further improve performance, generates the modified layout automatically, and passes the verification check, thereby speeding up the analysis process or design flow. We apply a greedy heuristic and a simulated annealing algorithm in our framework. Our flow can help the analog layout synthesis flow in a more efficient way.

R1-16 (Time: 10:45 - 10:47)
TitleRip-Up and Reroute Based Routing Algorithm for Self-Aligned Double Patterning
Author*Takeshi Ihara, Atsushi Takahashi (Tokyo Institute of Technology, Japan), Chikaaki Kodama (Toshiba, Japan)
Pagepp. 83 - 88
KeywordSADP
AbstractSelf-Aligned Double Patterning (SADP) is an important manufacturing technique for sub-20 nm technology nodes. In this paper, a rip-up and reroute based routing algorithm for SADP is proposed to obtain a more reliable routing pattern efficiently. In SADP, a cut pattern introduced in the pattern mask reduces the extra mask cost, but the cut pattern itself potentially degrades the reliability of the image on a wafer. The proposed algorithm generates a routing pattern that needs fewer cut patterns.

R1-17 (Time: 10:47 - 10:49)
TitleAnalysis of the Distance Dependent Multiple Cell Upset Rates on 65-nm Redundant Latches by a PHITS-TCAD Simulation System
Author*Kuiyuan Zhang, Jun Furuta, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan)
Pagepp. 89 - 93
KeywordSoft error, PHITS, TCAD, MCU
AbstractRecently, the soft error rates of integrated circuits have been increased by process scaling. Soft errors decrease the tolerance of VLSIs. Charge sharing and the bipolar effect become dominant when a particle hits latches and flip-flops. Soft errors make circuits more sensitive to Multiple Cell Upsets (MCUs). We analyze the MCU tolerance of redundant latches in a 65 nm process by device simulation and the Particle and Heavy Ion Transport code System (PHITS). The MCU rate of redundant latches decreases exponentially as the distance between redundant latches increases. These results coincide with the neutron experiments.

R1-18 (Time: 10:49 - 10:51)
TitleFeasible Shortest Path Frame Bounded Maze-Routing Algorithm for ML-OARST with Ripping up and Re-Building Steiner Points
Author*Kuen-Wey Lin, Yeh-Sheng Lin, Yih-Lang Li (Institute of Computer Science and Engineering, National Chiao Tung University, Taiwan), Rung-Bin Lin (Computer Science and Engineering, Yuan Ze University, Taiwan)
Pagepp. 94 - 99
KeywordSteiner tree, Routing, Obstacle-avoidance, Multilayer, Physical Design
AbstractOwing to its large solution space, maze routing has never been used to solve the multi-layer obstacle-avoiding rectilinear Steiner tree problem (ML-OARST). This paper proposes the first maze routing-based algorithm that efficiently identifies a high-quality ML-OARST. Our algorithm employs a three-dimensional Hanan grid graph for maze routing and applies a novel scheme to identify good Steiner points. This significantly reduces the search overhead of maze routing. To reduce the routing cost of ML-OARST, we also develop a novel rip-up and re-building strategy for altering Steiner points and tree topology. Experimental results reveal that the proposed algorithm outperforms the state-of-the-art ML-OARST methods in wire-length and via costs. The required CPU time is comparable to that needed by spanning graph-based approaches.

R1-19 (Time: 10:51 - 10:53)
TitleA TPL-Friendly Legalizer for Standard Cell Based Design
Author*Hsiu-Yu Lai, Ting-Chi Wang (National Tsing Hua University, Taiwan)
Pagepp. 100 - 105
KeywordTriple Patterning Lithography, Placement, Legalization, Standard Cell, Layout Decomposition
AbstractWith the shrinking of the feature size and the delay of next-generation lithography, double patterning lithography (DPL) is no longer enough for the 14/10nm technology node. Triple patterning lithography (TPL) is a natural extension of DPL, and it can not only triple the pitch but also reduce conflicts and stitches. Although TPL is more difficult and complicated than DPL, TPL is a promising alternative for the 14/10nm technology node. In this paper, we consider TPL during the standard-cell legalization stage in order to make the resultant placement more friendly to TPL layout decomposition. We provide a novel idea of reducing TPL conflicts through cell reordering and white space insertion. The experimental results show that, compared to a conventional legalizer, our legalizer is able to effectively reduce the numbers of conflicts and stitches.

R1-20 (Time: 10:53 - 10:55)
TitleGranularity of Via Configurable Logic Block for Structured ASIC
AuthorHui-Hsiang Tung (Oriental Institute of Technology, Taiwan), *Rung-Bin Lin (Yuan Ze University, Taiwan)
Pagepp. 106 - 110
KeywordStructured ASIC, Via Configurable, Granularity, VLSI
AbstractThis article presents a systematic way to determine the granularity of via configurable logic block (VCLB) for structured ASIC. The systematic and experimental studies both show that a VCLB with four transistors laid over a single diffusion strip results in the best area utilization.

R1-21 (Time: 10:55 - 10:57)
TitleOn the Impact of Initial Placement to SA-Based Placement for Mixed-Grained Reconfigurable Architecture
Author*Takashi Kishimoto, Hiroyuki Ochi (Ritsumeikan University, Japan)
Pagepp. 111 - 116
KeywordSimulated Annealing, Partitioning-based, Reconfigurable Architecture, Placement
AbstractIn this paper, we investigate a novel placement algorithm for mixed-grain reconfigurable architectures (MGRAs). The proposed algorithm applies a partitioning-based method to LUTs to obtain an initial placement, followed by a further optimization process for both LUTs and ALUs based on a low-temperature simulated annealing (SA) method. Compared with a conventional FPGA placement algorithm that uses SA with random initial placement, our method exhibits 9.3% smaller delay after running SA for half an hour. Our method is also superior in terms of the final solution after a run of several hours.


Invited Talk I
Time: 13:30 - 14:20 Monday, March 16, 2015
Chair: Tsung-Yi Ho (National Chiao Tung University, Taiwan)

I1-1 (Time: 13:30 - 14:20)
TitleThrough-Silicon-Via Inductor based DC-DC Converters: The Marriage of the Princess and the Dragon
Author*Yiyu Shi (Missouri University of Science and Technology, U.S.A.)
Pagep. 117
AbstractThere has been a tremendous research effort in recent years to move DC-DC converters on chip for enhanced performance. However, a major limiting factor in implementing on-chip inductive DC-DC converters is the large area overhead induced by spiral inductors. To address this, we propose to use through-silicon-vias (TSVs), a critical enabling technique in three-dimensional (3D) integrated systems, to implement on-chip inductors for DC-DC converters. While the existing literature shows that TSV inductors are inferior to conventional spiral inductors due to substrate loss for RF applications, we demonstrate that this is not the case for DC-DC converters, which operate at relatively low frequencies. Experimental results show that by replacing conventional spiral inductors with TSV inductors, with almost the same efficiency and output voltage, up to 4.3x and 3.2x inductor area reduction can be achieved for the basic buck converter and the interleaved converter with magnetic coupling, respectively. To the best of our knowledge, this is the very first in-depth study on utilizing TSV inductors for on-chip DC-DC converters in 3D ICs.


Panel Discussion
Time: 14:20 - 15:50 Monday, March 16, 2015
Moderator: Ing-Chao Lin (National Cheng Kung University, Taiwan)

D-1 (Time: 14:20 - 14:22)
TitleCircuit Reliability: Major Roadblock in Future Technology?
AuthorOrganizer: Tsung-Yi Ho (National Chiao Tung University, Taiwan), Moderator: Ing-Chao Lin (National Cheng Kung University, Taiwan), Panelists: Vijaykrishnan Narayanan (Pennsylvania State University, U.S.A.), Anthony Oates (Taiwan Semiconductor Manufacturing Company, Taiwan), Ulf Schlichtmann (Technische Universität München, Germany), Yiyu Shi (Missouri University of Science and Technology, U.S.A.), Tomohiro Yoneda (National Institute of Informatics, Japan)
Pagep. 118
AbstractAs technology scales, circuit reliability has become a major issue. This panel focuses on circuit reliability in current and future technology. Topics for discussion include the following: 1. Major reliability issues in advanced CMOS. Which is the most critical? 2. Major reliability issues in beyond CMOS technology. Any difference? 3. Major reliability issues at 3D IC, automotive, and medical electronics. Reliable hardware platform for automotive applications. 4. The role of EDA in improving circuit reliability.


Poster II
Time: 16:10 - 17:55 Monday, March 16, 2015
Chairs: Eita Kobayashi (NEC Corporation, Japan), Chun-Yao Wang (National Tsing Hua University, Taiwan)

R2-1 (Time: 16:10 - 16:12)
TitleFast Transient and High Current Efficiency Voltage Regulator with Hybrid Dynamic Biasing Technique
AuthorChia-Min Chen, *Yen-Wei Liu, Chung-Chih Hung (National Chiao Tung University, Taiwan)
Pagepp. 119 - 122
KeywordCapacitive coupling, voltage spike, low-dropout regulator, hybrid dynamic biasing, transient response
AbstractThis paper presents an output-capacitorless low-dropout (LDO) voltage regulator that achieves fast transient responses by hybrid dynamic biasing. The hybrid dynamic biasing in the proposed transient improvement circuit is activated through capacitive coupling. The proposed transient improvement circuit senses the LDO output change so as to increase the bias current instantly. The proposed circuit was applied to an output-capacitorless LDO implemented in standard 0.35-um CMOS technology. The device consumes only 25 uA of quiescent current with a dropout voltage of 180 mV. The proposed circuit reduces the output voltage spike of the LDO to 80 mV when the output current is changed from 0 mA to 100 mA. The output voltage spike is reduced to 20 mV when the supply voltage varies between 1.3 V and 2.3 V with a load current of 100 mA.

R2-2 (Time: 16:12 - 16:14)
TitleA BIST Scheme Detecting Catastrophic Faults of MOSFETs in Bandgap Reference with Self-Biased Operational Amplifier
Author*Takuya Bando, Masayoshi Tachibana (Kochi University of Technology, Japan)
Pagepp. 123 - 127
Keywordbuilt-in self-test, bandgap reference
AbstractThis paper presents a Built-In Self-Test (BIST) scheme for detecting catastrophic faults in the Bandgap reference with self-biased operational amplifier. The proposed BIST technique detects catastrophic faults by comparing expected voltages and observed voltages on two test-points in bandgap reference which is improved for test operation. Additionally, since test stimulus generator is incorporated into a start-up circuit, the high-density integration becomes possible. The demonstrations show that fault coverage and area overhead are 100% and 3%, respectively.

R2-3 (Time: 16:14 - 16:16)
TitleScan Test of Latch-Based Asynchronous Pipeline Circuits under 2-Phase Handshaking Protocol
Author*Kyohei Terayama, Atsushi Kurokawa, Masashi Imai (Hirosaki University, Japan)
Pagepp. 128 - 133
Keywordtest, asynchronous circuit, scan D-latch, 2-phase handshaking protocol
AbstractAsynchronous MOUSETRAP pipeline circuit is a simple and fast circuit thanks to the 2-phase handshaking protocol which has no return-to-zero overhead. In this paper, we propose two scan D-latches in order to support its scan test since D-latches are used instead of flip-flops in the MOUSETRAP. We design some MOUSETRAP pipeline circuits with the ISCAS89 benchmark combinational circuits using 130nm process technologies and show some evaluation results of the overhead and the fault coverage under the single stuck-at fault model.

R2-4 (Time: 16:16 - 16:18)
TitleData Reduction and Parallelization for Human Detection System
Author*Mao Hatto, Takaaki Miyajima, Hideharu Amano (Keio University, Japan)
Pagepp. 134 - 139
KeywordHuman Detection, FPGA, parallelization, data reduction, HW/SW Co-design
AbstractHOG (Histogram of Oriented Gradients) is one of the effective ways of extracting feature values. Also, the Real Adaboost algorithm has a high recognition ratio and is suitable for hardware implementation. Many studies on human detection systems have adopted these two algorithms and achieved progress. However, the data volume of HOG features is still a problem in the whole system. The data volume from only one frame could be over 1 GB, which causes difficulties in terms of both sending data to a server and execution speed. In particular, since a large data volume also burdens internal data communication between modules in hardware execution, it could become a bottleneck for the operating speed of the whole system. Here, a high-speed and low-memory implementation of a human detection system using Hardware-Software Co-design is proposed. For execution speed, the computation of HOG feature values is accelerated by an FPGA, and Real Adaboost detection is executed only by accessing ROM data in the FPGA. As a result, the HOG+Real Adaboost part was about 23.1 times faster than the software execution. The whole system was implemented on a single board, and it achieved a 3.22 times speedup from camera input to VGA display output. We also tried to reduce the feature data volume, and achieved 93.75% data compression compared to double-precision calculation, with only a 2.68% loss of recognition accuracy.
PDF file

R2-5 (Time: 16:18 - 16:20)
TitleEvaluation of Approximate SAD Circuits with Error Compensation
Author*Toshihiro Goto, Yasunori Takagi, Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 140 - 145
KeywordSAD, approximate computing
AbstractThis paper proposes and evaluates an “approximate” but fast Sum of Absolute Differences (SAD) circuit to provide a design experience for approximate computing, which is an emerging research area. Our approach to designing an “approximate” but fast circuit is similar to that of previous work in approximate computing. Unlike previous work, we also propose various error compensation methods so that the circuit can be used in real applications. Moreover, this paper reports the results of our hardware design and of our software evaluation of the error compensation methods using video compression applications. Our results show that our SAD circuit (with some errors) can reduce the total processing time by 10.71% compared with the conventional SAD circuit (without errors) while still providing acceptable quality for video compression applications.
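
For readers unfamiliar with the metric, the following minimal Python sketch shows what a SAD computation returns for two pixel blocks. The abstract does not describe how the circuit approximates the sum or how errors are compensated, so `approx_sad` is a purely hypothetical illustration (it ignores the low bits of every pixel) and is not the authors' design; all names and sample values are assumptions.

```python
def sad(block_a, block_b):
    """Exact sum of absolute differences between two equally sized pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must contain the same number of pixels")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def approx_sad(block_a, block_b, dropped_bits=2):
    """Hypothetical approximation (NOT the paper's circuit): drop the low bits
    of every pixel before subtracting, then rescale the total."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must contain the same number of pixels")
    total = sum(
        abs((a >> dropped_bits) - (b >> dropped_bits))
        for a, b in zip(block_a, block_b)
    )
    return total << dropped_bits


# Made-up 8-bit pixel values for illustration only.
current = [12, 200, 37, 90]
reference = [10, 180, 40, 95]

exact = sad(current, reference)          # 2 + 20 + 3 + 5 = 30
approx = approx_sad(current, reference)  # (1 + 5 + 1 + 1) * 4 = 32
print(exact, approx)
```

The gap between the two results is the kind of error that a compensation step would have to keep within the tolerance of the application.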

R2-6 (Time: 16:20 - 16:22)
TitleA Circuit Implementable 5-Output nMOSFET Shearing Stress Sensor
Author*Tomochika Harada, Kousuke Takeuchi (Yamagata University, Japan)
Pagepp. 146 - 148
Keywordshearing stress sensor, MOSFET sensor, multi-output sensor
AbstractIn this paper, we design, fabricate, and evaluate the stress detection operation of a 5-output MOSFET-type stress detection element. We verify its operation in the strong inversion region. In the saturation region, the stress detection sensitivity can be changed by VGS; if VGS is constant, the stress detection sensitivity is constant. Furthermore, in the linear region, the stress sensitivity varies with VDS (not VGS).

R2-7 (Time: 16:22 - 16:24)
TitleIddq Testing Against Process Variations and Measurement Noises
AuthorChia-Ling Chang, *Jack Sheng-Yan Lin, Charles Hung-Pin Wen (National Chiao Tung University, Taiwan)
Pagepp. 149 - 150
KeywordIddq, Data mining, process variation
AbstractAnalyzing test data can have a significant impact on improving production test and parametric yield. This work investigates the analysis of Iddq test data to extract knowledge for estimating process parameters and screening potentially defective chips. With a simulation framework, we demonstrate the dependency of this screening on various assumptions, such as the amount of process variation, the sensitivity to measurement noise, and the number of Iddq patterns. Experimental results on IWLS’05 designs show the strength of the Iddq analysis in screening faulty samples under various variations and assumptions in a 45nm technology.

R2-8 (Time: 16:24 - 16:26)
TitlePre-Bond Interposer Test Methodology for System in Package
AuthorKatherine Shu-Min Li (Department of Computer Science, National Sun Yat-sen University, Taiwan), Sying-Jyan Wang (Department of Computer Science, National Chung Hsing University, Taiwan), Cheng-You Ho (Department of Computer Science, National Sun Yat-sen University, Taiwan), Yingchieh Ho (Department of Electrical Engineering, National Dong Hwa University, Taiwan), Ruei-Ting Gu (National Sun Yat-sen University/Advanced Semiconductor Engineering (ASE) Group, Taiwan), Bo-Chuan Cheng (Advanced Semiconductor Engineering (ASE) Group, Taiwan)
Pagepp. 151 - 156
Keywordinterposer, test, 2.5D, System in Package, Through-Silicon-Via
AbstractPre-bond testing of a silicon interposer is difficult due to the large number of nets to be tested and the small number of test access ports. Recently, it was proposed to include a test interposer that is brought into contact with the interposer under test during the testing process. Combining these two interposers provides access to nets that are not normally accessible. The previous synthesis method for the test interposer was based on constrained breadth-first search, which can be time-consuming. In addition, separate test interposers have to be provided for open and short fault testing. In this paper, we present a theoretical study on the topology of testable circuit structures for interconnect faults in silicon interposers. Based on the theoretical framework, a more efficient synthesis method is developed. Furthermore, a single test interposer can be used for both open and short fault detection, which leads to shorter test time and lower test cost.

R2-9 (Time: 16:26 - 16:28)
TitleOxygen Sensor Module with Majority Sensing for Monitoring Wide Area at Disaster
Author*Ryuta Nishino, Tatsuya Yamada, Qing Dong, Shigetoshi Nakatake (The University of Kitakyushu, Japan)
Pagepp. 157 - 158
KeywordSensor, Majority Sensing, Oxygen Concentration
AbstractThis work presents a new sensor module with majority sensing, which improves accuracy by using multiple sensor devices. The sensor modules are distributed over a disaster region to monitor environmental information such as surface temperature and oxygen concentration. The sensor modules are connected by a wireless network and transmit the information to a monitoring server. In this work, we focus on sensing oxygen concentration in the case of a forest fire. To improve the accuracy of the sensed value, we introduce a new sensing mechanism called majority sensing with multiple sensor devices. In experiments, we demonstrate an 8.4-14% improvement in oxygen concentration sensing.
PDF file

R2-10 (Time: 16:28 - 16:30)
TitleFPGA Implementation and Evaluation of Image Scaling Circuits Using Selector-Logic-Based Bi-Linear Interpolation
Author*Keita Igarashi, Masao Yanagisawa, Nozomu Togawa (Waseda University, Japan)
Pagepp. 159 - 160
Keywordselector logic, FPGA, bi-linear interpolation
AbstractBi-linear interpolation is an interpolation technique that computes a pixel value linearly from its four surrounding pixels and is often used for image scaling. In this paper, we take a method that interpolates pixels using selector logic and implement and evaluate it on an FPGA board. Applying selector logic to the bi-linear interpolation operation decreases its total number of product terms and thus improves circuit size and circuit delay. We achieve an approximately 15.7% speed-up using selector-logic-based bi-linear interpolation.
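
As a reminder of the arithmetic being implemented, here is a minimal Python sketch of plain bi-linear interpolation from four neighbouring pixels. It shows only the textbook formula; the selector-logic formulation evaluated in the paper is not described in the abstract and is not reproduced here. The function name, argument order, and sample values are assumptions for illustration.

```python
def bilinear(p00, p10, p01, p11, fx, fy):
    """Interpolate inside a unit pixel cell.

    p00 = top-left, p10 = top-right, p01 = bottom-left, p11 = bottom-right.
    fx and fy are fractional offsets in [0, 1] from the top-left pixel.
    """
    top = p00 * (1 - fx) + p10 * fx       # blend along x on the top row
    bottom = p01 * (1 - fx) + p11 * fx    # blend along x on the bottom row
    return top * (1 - fy) + bottom * fy   # blend the two rows along y


# Made-up pixel values; the centre of the cell is the mean of the four corners.
center = bilinear(10, 20, 30, 40, 0.5, 0.5)
print(center)
```

Each output pixel costs several multiplications in this form, which is the kind of work a hardware implementation tries to simplify.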

R2-11 (Time: 16:30 - 16:32)
TitleAn Accelerator for Frequent Itemset Mining from Data Stream with Parallel Item Tree
Author*Kasho Yamamoto, Tsunaki Sadahisa, Dahoo Kim, Eric S. Fukuda, Tetsuya Asai, Masato Motomura (Hokkaido University, Japan)
Pagepp. 161 - 162
Keyworddata mining, frequent itemsets, stream processing, hardware accelerator
AbstractFrequent itemset mining attempts to find frequent subsets in a transaction database. In this era of big data, demand for frequent itemset mining is increasing. Therefore, the combination of fast implementation and low memory consumption, especially for stream data, is needed. In response to this, we optimize an online algorithm, called the Skip LC-SS algorithm, for hardware. In this paper, we present an efficient architecture based on this algorithm.
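
To make the problem statement concrete, the sketch below counts frequent itemsets by brute force in Python. It is not the Skip LC-SS algorithm and says nothing about the proposed architecture; it enumerates every subset of every transaction, which is exactly the memory and time cost that stream-oriented algorithms are designed to avoid. The function name, the transactions, and the support threshold are all assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return every itemset that occurs in at least min_support transactions.

    Brute force: enumerates all non-empty subsets of each transaction.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Made-up transaction database for illustration.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
result = frequent_itemsets(transactions, min_support=2)
for itemset in sorted(result, key=lambda s: (len(s), s)):
    print(itemset, result[itemset])
```

With this data every single item and every pair is frequent, while the triple `("a", "b", "c")` appears only once and is filtered out.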

R2-12 (Time: 16:32 - 16:34)
TitleA Leakage Current Reduction Algorithm Using Input Vector Control and Cell Topology Modification
AuthorTsung-Yi Wu (National Changhua University of Education, Taiwan), Hsin-Hui Li (Global Unichip Corp., Taiwan), *Zhi-Yao Ding, Guan-Cheng Guo (National Changhua University of Education, Taiwan)
Pagepp. 163 - 164
Keywordcell topology modification, input vector control, leakage current reduction, sleep mode
AbstractSince the leakage current of a digital circuit depends on the states of its logic gates, assigning a minimum leakage vector to its primary inputs in the sleep mode is a feasible technique for leakage current reduction. In this paper, we propose a heuristic algorithm that combines a cell topology modification and pin reordering technique with minimum leakage vector assignment for leakage current reduction. Experimental results show that the algorithm can reduce the leakage current by 11.8% on average.

R2-13 (Time: 16:34 - 16:36)
TitleMajority-Inverter Graph for FPGA Synthesis
Author*Luca Amaru (EPFL - LSI, Switzerland), Ana Petkovska (EPFL - LAP, Switzerland), Pierre-Emmanuel Gaillardon (EPFL - LSI, Switzerland), David Novo Bruna, Paolo Ienne (EPFL - LAP, Switzerland), Giovanni De Micheli (EPFL - LSI, Switzerland)
Pagepp. 165 - 170
KeywordMajority-Inverter Graph, Logic Synthesis, FPGA
AbstractIn this paper, we present an FPGA synthesis flow based on Majority-Inverter Graph (MIG). An MIG is a directed acyclic graph consisting of three-input majority nodes and regular/complemented edges. MIG manipulation is supported by a consistent algebraic framework leading to strong synthesis properties. We propose MIG optimization techniques targeting high-speed FPGA implementations. For this purpose, we reduce the depth of logic circuits via MIG algebraic transformations enabling denser LUT mapping on FPGAs. Experimental results show that, compared with a commercial design flow, our MIG-based design flow reduces the delay of arithmetic circuits synthesized on a state-of-the-art 28nm commercial FPGA device by 21% on average.
PDF file
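
The three-input majority node mentioned in the abstract can be illustrated with a short Python sketch. It checks two standard Boolean facts: fixing one input to 0 or 1 turns the majority function into AND or OR, and complementing all inputs complements the output. The paper's algebraic transformations and optimization flow are not shown; the helper name `maj` is an assumption.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority: 1 when at least two of the inputs are 1."""
    return (a & b) | (a & c) | (b & c)


# Fixing one input to a constant yields AND and OR.
and_ok = all(maj(a, b, 0) == (a & b) for a, b in product((0, 1), repeat=2))
or_ok = all(maj(a, b, 1) == (a | b) for a, b in product((0, 1), repeat=2))

# Self-duality: complementing every input complements the output.
dual_ok = all(
    maj(1 - a, 1 - b, 1 - c) == 1 - maj(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
print(and_ok, or_ok, dual_ok)
```

Because AND and OR are special cases of the majority node, any AND/OR/inverter network can be rewritten using majority nodes and complemented edges.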

R2-14 (Time: 16:36 - 16:38)
TitleHigh Observability Scan Chains with Improving Output Compaction Efficiency
AuthorSying-Jyan Wang, Che-Wei Kao (Department of Computer Science and Engineering, National Chung Hsing University, Taiwan), Katherine Shu-Min Li (Department of Computer Science and Engineering, National Sun Yat-sen University, Taiwan)
Pagepp. 171 - 176
Keywordscan test, scan chain, output compaction, X-tolerance, diagnosability
AbstractOutput selection was recently proposed for test response compaction. This scheme achieves zero aliasing, full X-tolerance, and high diagnosability, at the cost of an inflated test set and non-trivial hardware overhead. The time/space penalty in test output compaction is mainly attributed to the loss of observability. Previous methods generally assumed that erroneous responses are uniformly distributed among all scan chains, and the output compactors were designed accordingly. In this paper, we present three techniques to improve the performance of output-selection-based test response compaction. (1) The uneven distribution of erroneous test responses is exploited to optimize compactor design. (2) A dynamic test compaction algorithm is provided to deal with the test set inflation problem. (3) A low-cost test response compactor is presented. Experimental results indicate that the proposed techniques can achieve better compaction results with lower hardware overhead.

R2-15 (Time: 16:38 - 16:40)
TitleUsing Structural Relations for Checking Combinationality of Cyclic Circuits
AuthorWan-Chen Weng (National Tsing Hua University, Taiwan), Yung-Chih Chen (Yuan Ze University, Taiwan), Jui-Hung Chen, *Ching-Yi Huang, Chun-Yao Wang (National Tsing Hua University, Taiwan)
Pagepp. 177 - 182
Keywordcombinationality, cyclic circuit
AbstractFunctionality and combinationality are two main issues that have to be dealt with in cyclic combinational circuits, which are combinational circuits containing loops. A cyclic circuit is combinational if all nodes within the circuit have definite values under all input assignments. For a cyclified circuit, we have to check whether it is combinational or not. Thus, this paper proposes an efficient two-stage algorithm to verify the combinationality of cyclic circuits. Experiments on a set of cyclified IWLS 2005 benchmarks demonstrate the efficiency of the proposed algorithm. Compared to the state-of-the-art algorithm, our approach achieves a speedup of about 4000 times on average.

R2-16 (Time: 16:40 - 16:42)
TitleYAPSIM: Yet Another Parallel Logic Simulation Using GP-GPU
Author*Takuya Hashiguchi, Yuichiro Mori, Masahiko Toyonaga, Michiaki Muraoka (Kochi University, Japan)
Pagepp. 183 - 186
KeywordGP-GPU, Logic Simulator, Parallel algorithm
AbstractIn this paper, a new high-speed logic simulator, YAPSIM, based on a parallel logic simulation methodology using GP-GPU is presented. It combines three methods for accelerating simulation: a fan-out cone grouping method, a LUT method, and a GPU internal memory access method. The experimental comparison shows that YAPSIM ran 29 times faster than a high-speed commercial simulator for a combinational circuit of 75,000 gates, and 5.7 times faster for a sequential circuit of 84,000 gates.
PDF file

R2-17 (Time: 16:42 - 16:44)
TitleTechnology Mapping Method for Low Power Consumption and High Performance in General-Synchronous Framework
Author*Junki Kawaguchi, Yukihide Kohira (The University of Aizu, Japan)
Pagepp. 187 - 192
KeywordGeneral-Synchronous Framework, Technology Mapping, Integer Linear Programming
AbstractIn the general-synchronous framework, in which the clock is distributed periodically to each register but not necessarily simultaneously, circuit performance is expected to improve compared with the complete-synchronous framework, in which the clock is distributed periodically and simultaneously to each register. To improve circuit performance further, logic circuit synthesis for the general-synchronous framework is required. In this paper, under the assumption that any clock schedule can be realized by an ideal clock distribution circuit, we propose a technology mapping method that, when two or more cell libraries are available, uses integer linear programming to assign a cell to each gate in the given logic circuit. In experiments, we show the effectiveness of the proposed technology mapping method.
PDF file

R2-18 (Time: 16:44 - 16:46)
TitleA Quaternary Master-Slave Flip-Flop with Multiple Functions for Multi-Valued Logics
Author*Renyuan Zhang, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan)
Pagepp. 193 - 198
Keywordquaternary, flip-flop, Neuron-MOS
AbstractA prototype flip-flop circuit for storing quaternary signals is proposed in this work. Inspired by the Neuron-MOS mechanism, the capacitance-coupling technology is implemented to realize multi-threshold inverters. On the basis of this technology, a self-lock feedback scheme is proposed to process and store quaternary signals with standard CMOS technology and an ordinary dual-rail supply voltage. Thanks to the inherent properties of quaternary processing and the proposed scheme, various behaviors can be easily achieved without additional combinational circuits. A quaternary counter with sixteen states is given as an example. Circuit simulation results show that the proposed quaternary multi-functional flip-flop achieves all the basic and extended functions correctly.
PDF file

R2-19 (Time: 16:46 - 16:48)
TitleQuantitative Evaluations and Efficient Exploration for Optimal Partially-Programmable Circuits Generation
Author*Takumi Tsuzuki (Nara Institute of Science and Technology, Japan), Yuko Hara-Azumi (Tokyo Institute of Technology, Japan), Shigeru Yamashita (Ritsumeikan University, Japan), Yasuhiko Nakashima (Nara Institute of Science and Technology, Japan)
Pagepp. 199 - 204
Keywordfault tolerance, PPC (Partially-Programmable Circuits), LUT (Look-Up Table)
AbstractIn this paper, based on Partially-Programmable Circuits (PPCs), which have been recently proposed for improving the fault tolerance of circuits, we study more effective PPC generation by exploring a wider design space. First, we quantitatively evaluated various aspects that may affect the fault tolerance of PPCs. Exploiting the findings obtained, we then successfully generated PPCs that improve the area-efficiency of fault tolerance by 34% compared with an existing PPC generation method. Moreover, we developed an efficient exploration of PPCs, reducing exploration time by 70% relative to exhaustive search without affecting optimality.

R2-20 (Time: 16:48 - 16:50)
TitleA Variability-Aware Energy-Efficient On-Chip Memory for Near-Threshold Operation Using Cell-Based Structure
Author*Jun Shiomi, Tohru Ishihara, Hidetoshi Onodera (Kyoto University, Japan)
Pagepp. 205 - 210
KeywordNear-threshold computing, Memory, Energy efficiency
AbstractOn-chip memory is one of the most energy-consuming components in processors. Aggressive voltage scaling to the sub-/near-threshold region is thus applied even to the memory used for ultra-low-power applications. In this paper, an energy-efficient cell-based memory structure that operates stably at a near-threshold voltage is proposed. Circuit simulation using a commercial 28-nm technology shows that the energy consumption for the readout operation in the proposed memory is up to 61% less than the energy dissipated in an existing cell-based memory and a conventional SRAM circuit. Simulation using a foundry-provided Monte Carlo package also shows that the 3σ worst-case read-access time of our cell-based memory is comparable to that of the SRAM circuit.

R2-21 (Time: 16:50 - 16:52)
TitleAn Efficient Calculation Method for Reliability Analysis of Logic Circuits
Author*Masatoshi Tsushima, Yuichi Ikeda, Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 211 - 216
KeywordSoft-error, Combinational Circuits, Reliability
AbstractIt has been anticipated that so-called "soft errors" will cause serious problems even in combinational logic circuits built from near-future generations of transistors. Thus, it is very important to develop efficient methods for the reliability analysis of logic circuits. There is an efficient reliability analysis method based on the Probabilistic Transfer Matrix (PTM), but its peak memory requirement can be very large in the worst case. This paper proposes another approach that uses much less memory even in the worst case. To reduce computation time, our method avoids repeating essentially the same computation as much as possible. Our preliminary experimental results show that, in the best case, the memory usage is less than one ten-thousandth of that of the state-of-the-art PTM-based simulator.



Tuesday, March 17, 2015

Keynote Speech II
Time: 8:30 - 9:30 Tuesday, March 17, 2015
Chair: Ting-Chi Wang (National Tsing Hua University, Taiwan)

K2-1 (Time: 8:30 - 9:30)
TitleReliability and Robustness - Design and EDA to the Rescue!
Author*Ulf Schlichtmann (Technische Universität München, Germany)
Pagep. 217
AbstractTraditionally, Integrated Circuits (ICs) have been designed with the primary goal of minimizing area and thus cost. Performance also was a key issue from quite early on. Later, power became an important design consideration. Of course, optimizing yield always has been an important goal as well. Reliability of ICs, however, typically was (and still is) handled at the technology level. Technology departments and manufacturing ensured the reliability of individual components, resulting in reliable circuits. But as we move to ever smaller geometries, individual devices (transistors and wires) become less reliable. At the same time, the complexity of ICs continues to grow exponentially. These two forces create a strong imperative to focus on design and especially Electronic Design Automation (EDA) in order to ensure reliability and robustness. Recently, cross-layer approaches have started to appear in order to achieve reliability in a cost-efficient manner. This talk will give an overview of reliability and robustness challenges and discuss recent research activities and results that address them using EDA.
PDF file


Poster III
Time: 9:45 - 11:30 Tuesday, March 17, 2015
Chairs: Kazuhito Ito (Saitama University, Japan), Shigeru Yamashita (Ritsumeikan University, Japan)

R3-1 (Time: 9:45 - 9:47)
TitleNew nMOS Dynamic Shift Registers for Driver Circuit of Small LCDs and Their Evaluations
Author*Shinji Higa, Shuji Tsukiyama (Chuo University, Japan), Isao Shirakawa (University of Hyogo, Japan)
Pagepp. 218 - 223
Keywordshift register, nMOS dynamic logic, Liquid Crystal Display, System on Glass, source driver
AbstractDriver circuits for small LCDs (Liquid Crystal Displays) are formed on the same glass substrate as the LCD by means of thin film transistors, which is called system on glass technology. If such a driver circuit is implemented with nMOS transistors only, then production cost can be reduced, because the pMOS process is eliminated. In this paper, we focus on shift registers, which are indispensable in LCD driver circuits, and consider a method to design an nMOS dynamic shift register. Then, we propose two new 2-phase clock shift registers, and evaluate their performance by comparing them with conventional shift registers using a 2-phase or 4-phase clock. The results show that the new shift registers have acceptable areas and outperform the others in speed, power, and variations of power supply voltage and transistor mobility.

R3-2 (Time: 9:47 - 9:49)
TitleA Floorplan-Driven High-Level Synthesis Algorithm Utilizing Interconnection Delay Characteristics in FPGA Designs
Author*Koichi Fujiwara, Masao Yanagisawa, Nozomu Togawa (Waseda University, Japan)
Pagepp. 224 - 225
Keywordhigh-level synthesis (HLS), FPGA, floorplan, interconnection delay
AbstractRecently, high-level synthesis (HLS) techniques for FPGA designs have been required in areas such as image processing and software-defined radio. With recent process scaling in FPGAs, interconnection delays have become dominant in total circuit delay, and each FPGA family has different interconnection delay characteristics. Multiplexer cost is another concern in FPGA designs. In HLS, we need to consider interconnection delays based on the interconnection delay characteristics of FPGA designs while reducing multiplexer cost. In this paper, we propose a floorplan-driven HLS algorithm utilizing interconnection delay characteristics in FPGA designs. Experimental results show that our algorithm can realize FPGA designs that reduce latency by up to 6% compared with our previous approach.

R3-3 (Time: 9:49 - 9:51)
TitleIntroducing Loop Statements in Random Testing of C Compilers Based on Expected Value Calculation
Author*Kazuhiro Nakamura, Nagisa Ishiura (Kwansei Gakuin University, Japan)
Pagepp. 226 - 227
Keywordcompiler, random testing, for loop
AbstractThis paper presents a method of reinforcing random testing of C compilers by introducing loop statements. While random testing based on precomputation of expected values is powerful in detecting bugs in C compilers, loop statements have not been handled, due to difficulties in avoiding undefined behavior. In this paper, an extended method to eliminate undefined behavior in loop bodies is proposed, where arrays of precomputed constants are used to modify problematic operands during loop iterations. A random test system based on the proposed method has uncovered a new bug in the latest version of LLVM that cannot be detected by the existing methods.
PDF file

R3-4 (Time: 9:51 - 9:53)
TitleProduct Term Minimization in ROBDDs with Application to Reconfigurable SET Array Synthesis
Author*Yi-Hang Chen, Yang Chen, Juinn-Dar Huang (National Chiao Tung University, Taiwan)
Pagepp. 228 - 231
Keywordsingle-electron transistor, automatic synthesis, reconfigurable, area minimization, binary decision diagram
AbstractThe power dissipation has become a crucial issue for most electronic circuit and system designs nowadays when fabrication processes exploit even deeper submicron technology. In particular, leakage power is becoming a dominant source of power consumption. In recent years, the reconfigurable single-electron transistor (SET) array has been proposed as an emerging circuit design style for continuing Moore’s Law due to its ultra-low power consumption. Several automated synthesis techniques for area minimization have been developed for the reconfigurable SET array in the past few years. Nevertheless, most of those existing methods focus on variable and product term reordering during SET mapping. In fact, minimizing the number of product terms can greatly reduce the area as well, which has not been well addressed before. In this paper, we propose a dynamic shifting based variable ordering algorithm that can minimize the number of disjoint sum-of-product terms extracted from the given ROBDD. Experimental results show that the proposed method can achieve an area reduction of up to 49% as compared to current state-of-the-art techniques.
PDF file

R3-5 (Time: 9:53 - 9:55)
TitleAn Effective Timing-Coherent Transactor Generation Approach for Mixed-Level System Simulations
Author*Hsin-I Wu, Li-chun Chen, Ren-Song Tsay (National Tsing Hua University, Taiwan)
Pagepp. 232 - 237
KeywordMixed-level simulations, system simulations, transactor, timing coherent, ESL
AbstractIn this paper we extend the concept of the traditional transactor, which focuses on correct content transfer, to a new timing-coherent transactor that also accurately aligns the timing of each transaction boundary, so that designers can perform precise concurrent system behavior analysis in mixed-abstraction-level system simulations, which are essential to increasingly complex system designs. To streamline the process, we also developed an automatic approach for timing-coherent transactor generation. We applied our approach in mixed-level simulations, and the results show that it achieves 100% timing accuracy, whereas the conventional approach produces error rates of 25% to 44%.

R3-6 (Time: 9:55 - 9:57)
TitleAn Accurate Processor Power Estimation Approach Based on Microcomponent Structure Analysis
Author*Chi-Kang Chen, Zih-Ci Huang, Ren-Song Tsay (National Tsing Hua University, Taiwan)
Pagepp. 238 - 243
KeywordESL, power estimation, microcomponent, power analysis, processor
AbstractWe propose a new embedded processor power analysis approach that maps instruction executions to microarchitecture components for highly efficient and accurate power evaluations, which are crucial for embedded system designs. We observe that in practice, the execution of each high-level instruction in a processor always triggers the same microcomponent activity sequence, while the difference in power consumption values of different instructions is mainly due to timing variations caused by hazards and cache misses. Hence, by incorporating accurately pre-characterized microcomponent power consumption values into an efficient instruction-microcomponent processor timing simulation tool, we construct a highly accurate embedded processor power analysis tool. Additionally, based on the proposed approach, we accurately and effortlessly capture the power waveform at any time point for power profiling, peak power, and dynamic thermal distribution analysis. The experimental results show that the proposed approach is nearly as accurate as gate-level simulators, with an error rate of less than 1.2%, while achieving simulation speeds of up to 20 MIPS, five orders of magnitude faster than a commercial gate-level simulator.

R3-7 (Time: 9:57 - 9:59)
TitleA Verilog Compiler Proposal for VerCPU Simulator
Author*Tze Sin Tan (Altera Corporation, Malaysia), Bakhtiar Affendi Rosdi (Universiti Sains Malaysia, Malaysia)
Pagepp. 244 - 249
KeywordVerilog, Simulator, Hardware Assisted
AbstractVerilog is a widely used Hardware Description Language (HDL) for VLSI design and modeling. As a language developed with hardware execution concurrency in mind, Verilog can be mapped onto a dedicated processor for higher simulation throughput. The processor requires a compiler to transform a Verilog netlist into compiled-code instructions. In this paper, we propose a data structure that adequately represents a Verilog model. Then, the Verilog compiler is developed to map a Verilog netlist into this data structure. We also demonstrate that it is possible to construct a hardware simulator (VerCPU) utilizing this data structure.
PDF file

R3-8 (Time: 9:59 - 10:01)
TitleMorFPGA Duo: A Dual-Core FPGA-Based Embedded System Development Platform
AuthorChih-Chyau Yang, *Chun-Yu Chen, Chun-Wen Cheng, Yi-Jun Liu, Chien-Ming Wu, Chun-Ming Huang (National Chip Implementation Center, Taiwan)
Pagepp. 250 - 254
KeywordMorFPGA, MorFPGA Duo, SoC FPGA, All-programmable SoC
AbstractTo help academic researchers in Taiwan rapidly integrate their IP into a system for complete demonstration of hardware/software co-design, CIC presents a platform named MorFPGA Duo in this paper. MorFPGA Duo features a dual-core processor, versatile built-in peripherals, and high expandability to meet users’ needs in state-of-the-art research topics. MorFPGA Duo consists of two boards: the core modular board mainly includes a dual-core Zynq and versatile peripherals, while the multimedia modular board supports high-quality two-channel video sources with an SDI interface. With the PMOD and FMC connectors, various kinds of daughter boards can be integrated into the MorFPGA Duo system. The Media Wiki online forum is adopted as the platform for delivering the latest lab materials to users. The design flow and two system integration examples are provided to show that MorFPGA Duo is workable. One example shows how software and hardware designs can be integrated and demonstrated on this platform. The other example shows a video demo system with dual SDI cameras to enable the future development of 3D video applications.

R3-9 (Time: 10:01 - 10:03)
TitleA 3G-Based Bridge Structural Health Monitoring System Using Cost-Effective 1-Axis Accelerometers
AuthorChih-Hsing Lin, *Wen-Ching Chen, Chih-Ting Kuo, Gang-Neng Sung, Chih-Chyau Yang, Chien-Ming Wu, Chun-Ming Huang (National Chip Implementation Center, Taiwan)
Pagepp. 255 - 259
KeywordBridge health monitoring, 1-axis accelerometer, Cellular communication
AbstractThis paper proposes a 3G-based structural health monitoring device (HMD) for short-term monitoring of long-span bridges. The proposed HMD includes three 1-axis accelerometers, a microcontroller unit (MCU), an analog-to-digital converter (ADC), and a cellular gateway. The proposed monitoring system achieves low cost by using three 1-axis accelerometers, with the data synchronization problem solved, and allows easy installation and removal. Furthermore, instead of using data loggers, data is transmitted to the host through a 3G gateway. Compared with a 3-axis accelerometer, the proposed device based on 1-axis accelerometers achieves a 72.7% cost saving. In addition, the HMD achieves a 34.1% cost saving compared with an HMD containing a data logger. To adapt the HMD system to different monitoring environments, its PCB boards can easily be exchanged to support a variety of communication interfaces and sensors. Therefore, with the proposed device, real-time diagnosis for bridge damage monitoring can be conducted effectively.

R3-10 (Time: 10:03 - 10:05)
TitleAnalytical Reliability Model of Die-Stacked DRAM Protected by Error Control Code and TSV Fault Tolerant Coding Technique
Author*Tadayuki Matsumura, Tsuyoshi Tanaka (Hitachi Ltd., Japan)
Pagepp. 260 - 265
Keywordreliability, stacked memory, TSV, ECC
AbstractDie-stacked DRAM is a promising innovation to meet the need for high memory bandwidth in HPC systems. HPC systems must also be reliable, yet there is no analytical reliability model and it is difficult to evaluate reliability in a time-efficient manner. This paper proposes analytical reliability models for several types of die-stacked memory configurations. It is shown that through-silicon via (TSV) errors can be catastrophic, and an effective coding technique to solve this problem is proposed. The model is validated in simulation experiments. The reliability of future large-scale systems is evaluated on the basis of the proposed model.
PDF file

R3-11 (Time: 10:05 - 10:07)
TitleProtection Method for AES IP Core from Scan-Based Attack
Author*Yifan Wu, Shinji Kimura (Waseda University, Japan)
Pagepp. 266 - 271
KeywordAdvanced Encryption Standard, scan chain, secure scan design, bit difference attack, JTAG
AbstractIn this research, the scan-based two-bit-difference attack method is studied and completed through further analysis and additional tables and test patterns. A protection method against such scan-based attacks is then proposed. The proposed method incurs little area overhead relative to the original AES IP core and achieves a higher security level and fault coverage than previous methods.
PDF file

R3-12 (Time: 10:07 - 10:09)
TitleSoft-Error Tolerant Datapath Synthesis Based on Speculative Resource Sharing in Triple Algorithm Redundancy
Author*Junghoon Oh, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan)
Pagepp. 272 - 277
Keywordsoft-error, high-level synthesis, triple algorithm redundancy, speculative resource sharing
AbstractThe reliability degradation caused by soft errors has become a serious issue in LSIs. We propose a method to synthesize soft-error tolerant datapaths via high-level synthesis. The novel feature of our method is speculative resource sharing between retry parts and secondary parts to mitigate hardware and time overhead. A scheduling algorithm that uses a special priority function to maximize speculative resource sharing is another important feature. We found that our method is more effective when the computation algorithm possesses higher parallelism and fewer resources are available.
PDF file

R3-13 (Time: 10:09 - 10:11)
TitleUsing Range-Equivalent Circuits for Facilitating Bounded Sequential Equivalence Checking
AuthorYung-Chih Chen (Yuan Ze University, Taiwan), Wei-An Ji, Chih-Chung Wang, *Ching-Yi Huang, Chun-Yao Wang (National Tsing Hua University, Taiwan)
Pagepp. 278 - 282
Keyworddesign verification, bounded sequential equivalence checking
AbstractThis paper presents a method based on a range-equivalent circuit technique for SAT-based bounded sequential equivalence checking. Given two sequential circuits to be verified, instead of straightforwardly unrolling the miter of the two circuits, we iteratively minimize the miter with a range-equivalent circuit technique before adding a new timeframe. This works because the previous timeframes can be considered as a pattern generator that feeds input patterns to the next timeframe. Experimental results show that the proposed method saves up to 91% of the time needed to reach the same bounded depth compared with previous work on the IWLS2005 benchmarks.
PDF file

R3-14 (Time: 10:11 - 10:13)
TitleDesign of PPG-Based Heart Rate Sensor Enabling Motion Artifact Cancellation
Author*Takunori Shimazaki, Shinsuke Hara (Osaka City University, Japan)
Pagepp. 283 - 286
KeywordPPG, motion artifact, heart rate sensing, cancellation, experiments with subjects
AbstractWe have proposed a photoplethysmography (PPG)-based heart rate sensor which is equipped with a normal PPG sensor and a motion artifact sensor so that motion artifacts induced during exercise can be cancelled. It has two critical design parameters: the height of the motion artifact sensor and the distance between the motion artifact sensor and the normal PPG sensor. This paper tries to determine the two parameters by experiments with a subject.
PDF file

R3-15 (Time: 10:13 - 10:15)
TitleA Redundant Task Allocation Method for Reliable Network-on-Chips
Author*Hiroshi Saito (The University of Aizu, Japan), Tomohiro Yoneda (National Institute of Informatics, Japan), Yuichi Nakamura (NEC, Japan)
Pagepp. 287 - 292
Keywordnetwork-on-chips, task allocation, task scheduling, fault tolerance
AbstractThe probability of failures in a network-on-chip (NoC) increases as its size increases. To realize reliable NoCs, we propose a redundant task allocation method which allocates several copies of tasks to different cores based on multiple task scheduling. In the experiments, we apply the proposed method to a real application. The allocation time of the proposed method and the estimated execution time of the application are then evaluated while varying parameters such as the multiplicities of scheduling and allocation.
PDF file

R3-16 (Time: 10:15 - 10:17)
TitleSingle-Flux-Quantum Digital Circuit Design Using Clockless Logic Cells with a Jitter Constraint
Author*Ryohei Matsumoto, Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 293 - 298
KeywordSFQ
AbstractWe propose a design method for Single-Flux-Quantum (SFQ) circuits using clockless logic cells. Using clockless logic cells reduces the size of the clock tree, but makes it difficult to satisfy the so-called jitter constraint. Therefore, we consider using not only clockless logic cells but also clocked logic cells to satisfy the jitter constraint. Experimental results show that the circuit area obtained by the proposed method is 32.12% smaller on average than that obtained by the general method.

R3-17 (Time: 10:17 - 10:19)
TitleTime Analysis of Applying Back Gate Bias for Reconfigurable Architectures with SOTB MOSFET
Author*Hayate Okuhara, Hideharu Amano (Keio University, Japan)
Pagepp. 299 - 304
KeywordDynamic back gate bias scaling, Low power design
AbstractThe response time of dynamic back gate bias scaling of large-scale digital modules implemented with silicon on thin BOX (SOTB) technology developed by LEAP was analyzed using real chips. A reconfigurable accelerator, Cool Mega Array (CMA), and two different prototypes of the V850E-Star microcontroller were used for measurement. Evaluation results revealed that the response time is related to the chip area which shares the bias voltage rather than to the leakage current itself. The leakage current becomes mostly stable 180.0 µs and 270.2 µs after the bias voltage is changed on CMA and V850E-Star, respectively. The possibility of dynamic back gate bias scaling within milliseconds for dynamically reconfigurable architectures was shown.
PDF file

R3-18 (Time: 10:19 - 10:21)
TitleA Cooling Effect Formulation and Implementation of a Cooling System for Li-Ion Battery Modules
Author*Yuki Kitagawa, Yusuke Yamamoto, Masahiro Fukui (Ritsumeikan University, Japan)
Pagepp. 305 - 310
KeywordLithium-ion battery, Degradation, Air cooling
AbstractThis paper discusses the theory and experiments of heating and air cooling of battery modules. The heating mechanism is shown first, and cooling of a single battery is examined. The optimum air flow speed is discussed. Then, a similar discussion is made for a battery module of six serial cells. Finally, reducing the temperature variation is discussed.
PDF file

R3-19 (Time: 10:21 - 10:23)
TitleGlobal Transformation-Based Optimization of Threshold Logic Circuits
Author*Maiko Kabu, Takayuki Kasugai, Shigeru Yamashita (Ritsumeikan University, Japan), Chun-Yao Wang (National Tsing Hua University, Taiwan)
Pagepp. 311 - 316
Keywordoptimization, threshold logic circuit, global functional flexibility, CSPF
AbstractThreshold logic circuit technology, which is considered to be one of the promising new technologies, has been successfully demonstrated recently thanks to the rapid progress of nanotechnology. Since the logic elements used in threshold logic circuits are very different from the ones used in conventional logic circuits, we may need a totally different design methodology for threshold logic circuits; there have been intensive studies recently. In such previous works, local transformations have mainly been considered for the optimization of circuits. Instead, this paper, for the first time, considers global transformations. More specifically, we propose a method to calculate global functional flexibility based on compatible sets of permissible functions (CSPFs) and show how to use it to optimize threshold logic circuits.

R3-20 (Time: 10:23 - 10:25)
TitleCounter-Based Victim Cache Hit Rate Optimization
Author*Li-Yen Chang, Chen-Hua Suo, Yi-Yu Liu (Yuan Ze University, Taiwan)
Pagepp. 317 - 318
Keywordvictim cache
AbstractThe victim cache was proposed to alleviate the cache miss penalty due to conflict misses. However, the low hit rate of the victim cache, due to its small size, indicates that accessing it is inefficient. In this paper, we integrate a small counter into each L1 cache entry to filter out unnecessary victim cache accesses. Experimental results demonstrate that our technique substantially improves the victim cache hit rate.
PDF file

R3-21 (Time: 10:25 - 10:27)
TitleAn ECO-Friendly Design Style Based on Reconfigurable Cells
Author*Yudai Kabata, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 319 - 324
KeywordECO, reconfigurable cell, error diagnosis, technology remapping
AbstractThis paper presents an ECO-friendly design style based on reconfigurable (RECON) cells to reduce the increase in circuit delay caused by post-mask Engineering Change Orders (ECOs). Employing RECON cells to implement not only the changes caused by ECOs but also a part of the original circuit is the key to providing higher flexibility in the ECO process. Experimental results have shown that the proposed design style is effective in reducing the increase in circuit delay with post-mask ECO.


Invited Talk II
Time: 13:00 - 13:50 Tuesday, March 17, 2015
Chair: Mineo Kaneko (JAIST, Japan)

I2-1 (Time: 13:00 - 13:50)
TitleA New Approach to Synthesis of Transition Signaling Asynchronous Circuits
Author*Tomohiro Yoneda (National Institute of Informatics, Japan)
Pagep. 325
AbstractAsynchronous circuits work based on handshaking without using any global clocks, and thus have the potential to solve various problems that synchronous designs currently suffer from with respect to global clocking. The handshake styles of asynchronous circuits are classified into level signaling and transition signaling. Transition signaling asynchronous circuits use "transitions" instead of "levels" to indicate events, which is beneficial for performance because the signal levels need not be reset to zero. However, their design is more complicated than that of level signaling asynchronous circuits. This talk introduces our trial approach to the synthesis of transition signaling asynchronous circuits. This new approach is based on multi-clock flipflops that we recently developed, and I believe that the design of transition signaling asynchronous circuits can be done much more intuitively using this design style. This talk will show, as an example, how an NoC router is designed in this design style.
PDF file


Invited Talk III
Time: 14:05 - 14:55 Tuesday, March 17, 2015
Chair: Juinn-Dar Huang (National Chiao Tung University, Taiwan)

I3-1 (Time: 14:05 - 14:55)
TitleIC Design Challenges and Opportunities in Advanced Process Nodes
Author*Hsien-Hsin Sean Lee (Taiwan Semiconductor Manufacturing Company, Taiwan)
Pagep. 326
AbstractMoore’s Law has entered a new frontier as the incessant pace of device scaling continues to approach 10nm and beyond. As the physical dimensions of devices and interconnects shrink, the design rules and the design flow, for both ASIC and custom designs, face unprecedented complexity. Hence, common IC design practice can no longer treat design and process fabrication separately. Conventional design optimization techniques also need to take novel process technologies, such as multi-gate devices (e.g., FinFET), spacer technology, and self-aligned multiple patterning lithography, into account in order to achieve the best possible performance, power, and area. In this talk, I will touch upon the challenges and implications of these new process technologies for IC designers from the foundry’s perspective, show how and what to innovate in EDA tools to bridge the gap between physical design and foundry fabrication, and finally discuss how to improve overall design productivity.
PDF file


Poster IV
Time: 14:55 - 16:40 Tuesday, March 17, 2015
Chairs: Atsushi Takahashi (Tokyo Institute of Technology, Japan), Ing-Jer Huang (National Sun Yat-sen University, Taiwan)

R4-1 (Time: 14:55 - 14:57)
TitleLayout-Based Soft Error Rate Estimation Framework Considering Multiple Transient Faults - from Device to Circuit Level
AuthorHsuan-Ming Huang, *Yi-Wu Liu, Charles H.-P. Wen (National Chiao Tung University, Taiwan)
Pagepp. 327 - 332
KeywordSoft error, Multiple transient fault, Reliability
AbstractConsidering the structure of the layout and the resulting nuclear reactions, multiple transient faults tend to be induced more frequently than single transient faults, due to the effects of technology scaling. This study proposes a layout-based soft error estimation framework, which takes into account multiple transient faults from the device level to the circuit level. Experimental results demonstrate that the soft error rate can be underestimated by an average of 15.72% if only single (rather than multiple) transient faults are taken into account. Our results indicate that netlist-based analysis for the estimation of soft error rates is no longer sufficient, due to the overwhelming influence of the structural layout. For example, on benchmark c432, a tighter layout results in a soft error rate 34% higher than that of a looser layout.

R4-2 (Time: 14:57 - 14:59)
TitleUsing Body Biasing for Energy Efficient Frequency Scaling in a Dynamically Reconfigurable Processor
Author*Johannes Maximilian Kühn (University of Tübingen, Germany), Hideharu Amano (Keio University, Japan), Wolfgang Rosenstiel (University of Tübingen, Germany)
Pagepp. 333 - 338
KeywordBody Biasing, SOI, DVFS
AbstractSTMicro’s 28nm UTBB-FDSOI process is examined regarding the interplay of voltage and frequency scaling and coarse-grained body biasing in a Dynamically Reconfigurable Processor. We show that through coarse-grained body biasing, 37.79% and 40.76% greater energy efficiency at supply voltages of 0.6V and 0.8V are attainable compared to scaling supply voltages. Coarse-grained body biasing further optimizes energy efficiency by 16.1%, 10.6% and 12.8% at 0.6V, 0.8V and 1.0V over whole chip body biasing. No architectural changes are required.
PDF file

R4-3 (Time: 14:59 - 15:01)
TitleLow-Power Gated Clock Tree Synthesis for 3D ICs
Author*Yu-Chuan Chen, Chih-Cheng Hsu, Mark Po-Hung Lin (National Chung Cheng University, Taiwan)
Pagepp. 339 - 343
Keywordclock tree, clock gating, 3D IC
AbstractApplying clock gating in three-dimensional integrated circuits (3D ICs) is essential for reducing power consumption and improving circuit reliability. However, previous works only present algorithms for 3D clock tree synthesis. None of them addresses gated clock trees in 3D ICs for dynamic power reduction. In this paper, we propose the first problem formulation in the literature for 3D gated clock network optimization. We apply a multilevel framework to effectively construct the topological gated clock tree while considering flip-flop switching activities and the timing constraint of enable signal paths at clock gating cells. Based on the constructed topological gated clock tree, a zero-skew 3D clock routing tree is then generated. Experimental results show that, compared with conventional 3D clock tree synthesis, the proposed 3D gated clock tree synthesis achieves much lower power consumption with a similar number of TSVs and similar clock tree wirelength.

R4-4 (Time: 15:01 - 15:03)
TitleGraph-Covering-Based Architectural Synthesis for Programmable Digital Microfluidic Biochips
Author*Daiki Kitagawa, Dieu Quang Nguyen, Trung Anh Dinh, Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 344 - 349
Keywordgraph, binding, scheduling, programmable, biochip
AbstractDigital microfluidic technology has been extensively applied in various biomedical fields. Unlike application-specific biochips, a programmable design has several advantages such as dynamic reconfigurability and general applicability. Basically, a programmable biochip divides the chip into several virtual modules. However, in the previous design, a virtual module can execute only one operation at a time. In this paper, we propose a new multi-functional module for programmable digital microfluidic biochips, which can execute two operations simultaneously. Moreover, we also propose a binding and scheduling algorithm for programmable biochips, which is motivated by a graph-covering problem. Experiments demonstrate that our algorithm can reduce the completion time of the applications compared with previous approaches.

R4-5 (Time: 15:03 - 15:05)
TitleContamination-Aware Routing Flow for Both Functional and Washing Droplets in Digital Microfluidic Biochips
Author*Qin Wang, Yiren Shen, Hailong Yao (Tsinghua University, China), Tsung-Yi Ho (National Chiao Tung University, Taiwan), Yici Cai (Tsinghua University, China)
Pagepp. 350 - 355
KeywordContamination-Aware Routing, Washing Droplets Routing, Digital Microfluidic Biochips
AbstractA major issue in digital microfluidic biochips is cross-contamination caused by different biomolecule droplets crossing the same sites, where washing operations are necessary to avoid wrong assay results. Existing works either assume unrealistic infinite washing capacity, or ignore execution-time constraint and/or routing conflicts between functional and washing droplets. This paper presents the first practical droplet routing flow considering both realistic washing capacity constraint and routing conflicts between washing and functional droplets. Experimental results are promising.
PDF file

R4-6 (Time: 15:05 - 15:07)
TitleObstacle-Avoiding Wind Turbine Placement for Power-Loss and Wake-Effect Optimization
Author*Yu-Wei Wu (National Cheng Kung University, Taiwan), Yi-Yu Shi (Missouri University of Science and Technology, U.S.A.), Sudip Roy (National Cheng Kung University, Taiwan), Tsung-Yi Ho (National Chiao Tung University, Taiwan)
Pagepp. 356 - 361
KeywordPlacement, Wind Turbine
AbstractAs finite energy resources are being consumed at a faster rate than they can be replaced, renewable energy resources have drawn extensive attention. Wind power development is one such example, and it is growing significantly throughout the world. The main difficulty in wind power development is that wind turbines interfere with each other, and the resulting turbulence directly affects the power produced; this is known as the wake effect. In addition, the wirelength among wind turbines is not merely an economic factor, but also determines the power loss that occurs in the wiring. Moreover, in reality, unavoidable obstacles such as private land and lakes exist in the wind farm. Nevertheless, to the best of our knowledge, none of the existing works considers the wake effect, wirelength, and obstacle avoidance at the same time in the wind turbine placement problem. In this paper, we propose an analytical method to solve obstacle-avoiding placement of wind turbines for power-loss and wake-effect optimization. Experimental results show that the wind power produced by our tool is similar to that produced by the industrial tool AWS OpenWind. Besides, our algorithm can reduce the wirelength and avoid obstacles successfully while finding the locations of wind turbines at the same time.

R4-7 (Time: 15:07 - 15:09)
TitleAccelerating Random-Walk-Based Power Grid Analysis through Error Smoothing
Author*Tsuyoshi Okazaki, Masayuki Hiromoto, Takashi Sato (Kyoto University, Japan)
Pagepp. 362 - 367
Keywordpower grid analysis, random walk, Gauss-Seidel method
AbstractThis paper proposes a hybrid solver that combines a random walk and a stationary iterative method. Our solver is based on quasi-zero-variance importance sampling (QZV-IS), in which the walk probability is updated by using coarsely estimated voltages for rapid convergence. Because the convergence speed depends on the smoothness of the estimated voltages, we additionally propose applying a smoothing operator to quickly improve the quality of the estimated voltages. The proposed solver achieves a 2.3-3.6x speedup compared to the conventional method that only utilizes QZV-IS.

R4-8 (Time: 15:09 - 15:11)
TitleImprovement of Simulated Annealing Search ---Based on Tree Representations---
Author*Takaaki Banno, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan)
Pagepp. 368 - 373
KeywordSimulated Annealing, tree representations, O-tree, DTS, packing
AbstractThe placement problem for LSI layout is often referred to as the "rectangle packing problem." For this problem, several representations of rectangle packing have been proposed, and packings are searched by Simulated Annealing based on a representation. To search efficiently based on representations, it is necessary to define appropriate MOVE operations. In this paper, we restrict MOVE operations so that a certain MOVE can restore any adjacent solution to the former solution, and confirm the effectiveness by experiments.
PDF file

R4-9 (Time: 15:11 - 15:13)
TitleA Hierarchical Type Segmentation Algorithm Based on Support Vector Machine for Colorectal Endoscopic Images with NBI Magnification
Author*Takumi Okamoto, Tetsushi Koide, Anh-Tuan Hoang, Koki Sugi, Tatsuya Shimizu, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Yoko Kominami, Shigeto Yoshida, Shinji Tanaka (Hiroshima University, Japan)
Pagepp. 374 - 379
KeywordSupport Vector Machine (SVM), Colorectal Endoscopic Images, Computer-Aided Diagnosis (CAD), Hierarchical Type Segmentation, FPGA
AbstractWith the increase in colorectal cancer patients in recent years, the need for quantitative evaluation of colorectal cancer has increased, and a computer-aided diagnosis (CAD) system which supports doctors' diagnosis is essential. In this paper, a hardware design of a type identification module in a CAD system for colorectal endoscopic images with narrow band imaging (NBI) magnification is proposed for real-time processing of full high definition images (1920 x 1080 pixels). A pyramid-style identifier with SVMs for multi-size scan windows, which can be implemented with small circuit area and achieve high accuracy, is verified on actual complex colorectal endoscopic images.
PDF file

R4-10 (Time: 15:13 - 15:15)
TitleHigh Performance Feature Transformation Architecture Based on Bag-of-Features in CAD System for Colorectal Endoscopic Images
Author*Koki Sugi, Tetsushi Koide, Anh-Tuan Hoang, Takumi Okamoto, Tatsuya Shimizu, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Yoko Kominami, Shigeto Yoshida, Shinji Tanaka (Hiroshima University, Japan)
Pagepp. 380 - 385
KeywordColorectal Endoscopic Images, Computer-Aided Diagnosis (CAD), Feature Transformation, Visual Word (VW), FPGA Hardware Implementation
AbstractOur research describes a computer-aided diagnosis (CAD) system for colorectal endoscopic images with narrow band imaging (NBI) magnification, which identifies a pathology type from local features in the NBI endoscopic image. We propose a high-speed feature transformation for the CAD system using Manhattan distance calculation and an on-the-fly normalization method. A high-performance, low-cost algorithm for multiple Scan Window (SW) processing on FPGA is also introduced. The proposed high-speed feature transformation can be completed within about 380 msec on a real-time Full HD NBI endoscopic image.
PDF file

R4-11 (Time: 15:15 - 15:17)
TitleHardware Implementation of Motion Estimation Technology Using High Level Synthesis and Investigations into Techniques for Improvements
Author*Shota Nagai (Graduate School of Science and Engineering, Kindai University, Japan), Takashi Kambe (Depart. of Electric and Electronic Engineering, Kindai University, Japan), Gen Fujita (Osaka Electro-Communication University, Japan)
Pagepp. 386 - 390
Keywordmotion estimation, H.264/AVC, EPZS, high level synthesis, Bach C
AbstractWe studied the motion estimation technology that is a key part of the H.264/AVC (Advanced Video Coding) standard, implemented it as hardware using high-level synthesis technology, and investigated improvements. An EPZS algorithm was implemented instead of a Full Search algorithm, and the results were evaluated to understand the effectiveness of the high-level synthesis technology and of the speedup techniques that were adopted.
PDF file

R4-12 (Time: 15:17 - 15:19)
TitleFPGA Oriented Intra Angular Prediction Image Generation Hardware for HEVC Video Coding
Author*Eita Kobayashi, Seiya Shibata (NEC Corporation, Japan), Noriaki Suzuki (NEC Corporation, Japan), Atsufumi Shibayama (NEC Corporation, Japan), Takeo Hosomi (NEC Corporation, Japan)
Pagepp. 391 - 396
KeywordHEVC, FPGA, Architecture, High Level Synthesis
AbstractThis work proposes a novel FPGA-oriented architecture for intra prediction image generation in the High Efficiency Video Coding (HEVC) standard. HEVC intra prediction is greatly extended from H.264 in terms of modes and block sizes to realize high flexibility. From the hardware point of view, however, this flexibility increases the required number of MUXs, which tend to be a bottleneck for area and frequency on FPGAs. In this paper, we propose a Reshaping Buffered Architecture which drastically reduces the number of MUXs. Experimental results show that our proposed architecture can reduce the number of MUXs by up to 70% compared with a raster-scan-based architecture. This resulted in marked improvements of 43% in maximum frequency and 51% in LUT usage.
PDF file

R4-13 (Time: 15:19 - 15:21)
TitleHigh Accuracy and Simple Real-Time Circle Detection on Low-Cost FPGA for Traffic-Sign Recognition on Advanced Driver Assistance System
Author*Anh-Tuan Hoang (Research Institute for Nanodevice and Bio Systems, Hiroshima University, Japan), Masaharu Yamamoto (Graduate School of Advanced Sciences of Matter, Hiroshima University, Japan), Tetsushi Koide (Research Institute for Nanodevice and Bio Systems, Hiroshima University, Japan)
Pagepp. 397 - 402
Keywordcircle detection, traffic sign detection, pipeline scanning, ADAS, multi grain pipelining
AbstractThis paper describes a hardware-oriented algorithm and its conceptual implementation for a real-time traffic-sign detection system on an automotive-oriented FPGA. The speed limit sign area on a grayscale video frame is detected through a simple two-stage computation process. Rectangle Pattern Matching roughly detects the global luminosity feature shared between a rectangle and a circle to find a Region of Interest (ROI). Then, Circle Detection roughly votes on the local pixel directions of a circle inside the detected ROI in the binary image for circle confirmation. The proposed system achieves 83 full HD fps and over 99% accuracy even in difficult situations such as rainy nights. It occupies around 50% of the hardware available on the Xilinx Zynq automotive FPGA, which has 85 K logic cells, 53.2 K LUTs, 106.4 K registers and 506 KB BRAM, and is therefore applicable to Advanced Driver Assistance Systems in common vehicles.
PDF file

R4-14 (Time: 15:21 - 15:23)
TitleDMATP: A Design Method and Architecture of TU Parallel Processing for 4K HEVC Hardware Encoder
Author*Seiya Shibata, Eita Kobayashi, Noriaki Suzuki, Atsufumi Shibayama, Takeo Hosomi (NEC, Japan)
Pagepp. 403 - 408
KeywordHEVC, hardware design
AbstractThis paper proposes a design method and architecture for parallel processing hardware for Transform Units in High Efficiency Video Coding (HEVC). HEVC is the next-generation video coding standard, which is expected to be used for high-resolution broadcasting such as 4K UltraHD. Since HEVC introduces higher complexities and dependencies than the previous standard H.264/AVC, hardware designers have to find and utilize parallelism in HEVC to realize strict real-time encoding performance, especially for broadcasting purposes. We propose a design method to find appropriate parallelism considering both the HEVC algorithm and hardware resources, focusing on Transform Unit processing, and propose an architecture to exploit the parallelism efficiently. With the architecture, we see a prospect of realizing a 4K HEVC encoder.
PDF file

R4-15 (Time: 15:23 - 15:25)
TitleAn Improved Rate-Distortion Optimized Quantization Algorithm and Its Hardware Implementation
Author*Genki Moriguchi (Graduate School of Science and Engineering, Kindai University, Japan), Takashi Kambe (Depart. of Electric and Electronic Engineering, Kindai University, Japan), Gen Fujita (Osaka Electro-Communication University, Japan)
Pagepp. 409 - 414
KeywordH.264/AVC, RDOQ, function based pipelining, high-level synthesis, Bach C
AbstractRate-distortion optimized quantization (RDOQ) is an important technology in H.264/AVC for improving video coding performance. It is able to determine the optimal value among multiple quantization candidates based on rate-distortion (RD). We propose improvements to the algorithm to reduce its complexity by changing the bit-rate estimation method and by excluding low scored candidates for the quantization. We also implement the algorithm in hardware using the Bach C high-level synthesis tool. Finally, the performances of the proposed algorithm and hardware design results are evaluated.
PDF file

R4-16 (Time: 15:25 - 15:27)
TitleImplementation and Evaluation of AES/ADPCM on STP and FPGA with High-Level Synthesis
Author*Yuki Ando, Yukihito Ishida, Shinya Honda, Hiroaki Takada, Masato Edahiro (Nagoya University, Japan)
Pagepp. 415 - 420
KeywordFPGA, DRP, High-level synthesis
AbstractReconfigurable techniques are attracting attention as an alternative to dedicated SoC hardware. We have evaluated an FPGA and the STP engine to confirm whether their performance allows them to substitute for dedicated SoC hardware. We selected AES and ADPCM applications to compare the performance of the FPGA and the STP engine. The applications were synthesized with the same high-level synthesis tools. Then, we implemented them on the FPGA and the STP engine using the integrated development environments. For the evaluation, we compared them in terms of resource usage, the number of states, the number of cycles, frequency, and execution time.
PDF file

R4-17 (Time: 15:27 - 15:29)
TitleSpeed Traffic-Sign Number Recognition on Low Cost FPGA for Robust Sign Distortion and Illumination Conditions
Author*Masaharu Yamamoto, Anh-Tuan Hoang, Tetsushi Koide (Hiroshima University, Japan)
Pagepp. 421 - 426
KeywordAdvanced Driver Assistance System (ADAS), Real-Time Processing, Traffic-Sign Detection, Number Recognition, FPGA Implementation
AbstractIn this paper, we propose a hardware-oriented robust speed traffic-sign recognition algorithm which can run in real time for an Advanced Driver Assistance System (ADAS). In difficult conditions, such as sign distortion at various angles or at night and in rain, the proposed algorithm is still able to recognize the traffic sign with high precision. The proposed hardware-oriented number recognition algorithm achieves a recognition rate of more than 99% in daytime and 94.2% including difficult conditions such as rainy nights.
PDF file

R4-18 (Time: 15:29 - 15:31)
TitleEfficient Manipulation of Truth Tables on CUDA for Gate-Level Simulation
Author*Yuri Ardila, Tatsuyuki Kida, Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 427 - 432
Keywordlogic circuit, verification, simulator, cuda, gpu
AbstractEfficient logic circuit simulations are indispensable for manufacturing LSI products. Since the computation of such simulations is usually very time consuming, there have been many efforts to optimize it, and many studies over the past decade have succeeded by using GPGPU (General-Purpose computing on Graphics Processing Units) technology. This paper also studies how to utilize GPGPU to optimize logic circuit simulation. Our method is mainly based on efficient parallel manipulations of truth tables. Our idea differs from most previous works, which rely on the fact that the outputs of many gates can be evaluated in parallel. We achieved a speedup of up to 65.5 times compared to simulation using only a CPU.

R4-19 (Time: 15:31 - 15:33)
TitleScan-Based Side-Channel Attack Implementation Evaluation on the LED Cipher Using SASEBO-GII
Author*Huiqian Jiang, Mika Fujishiro, Masao Yanagisawa, Nozomu Togawa (Waseda University, Japan)
Pagepp. 433 - 434
KeywordSide-Channel Attack, Scan-Based Attack, LED Cipher, SASEBO-GII, Implementation Evaluation
AbstractLED is a lightweight block cipher which is suitable for both hardware and software. Design-for-test is essential for LSI designers to check whether devices work correctly. One design-for-test technique using scan chains is called scan-path test, in which testers can observe and control registers inside an LSI chip directly. Recently, scan-based side-channel attacks, which retrieve secret information from a cryptosystem using scan chains, have been reported. In this paper, we demonstrate that the secret key in the LED cipher can be retrieved successfully on the SASEBO-GII, a side-channel attack standard evaluation board. Experiments show that the scan-based attack is sufficiently practical.

R4-20 (Time: 15:33 - 15:35)
TitleA Study on Visualization of Auscultation-Based Blood Pressure Measurement
Author*Yusuke Katsuki, Mingyu Li, Qing Dong, Shigetoshi Nakatake (The University of Kitakyushu, Japan)
Pagepp. 435 - 436
KeywordSensor, Medical, digital filtering
AbstractBlood pressure measurement by auscultation of Korotkov sounds is an essential skill for health care workers, but mastering it is not easy because it requires complicated simultaneous tasks: auscultation, manipulation of pressure, and checking of the scale. This work provides a system that visualizes the Korotkov sounds and the pressure in the cuff by sensing them at the same time. In addition, we evaluate the system as an educational aid for mastering blood pressure measurement.
PDF file