The 16th Workshop on Synthesis And System Integration of Mixed Information Technologies
Technical Program

Remark: The presenter of each paper is marked with "*".

Session Schedule


Monday, October 18, 2010

Opening (Ballroom)
08:40 - 09:00

K1 (Ballroom)
Keynote I
9:00 - 10:00

Coffee Break
10:00 - 10:15

R1 (Ballroom)
Paper Session I: System Level Design and Design Experience (I)
10:15 - 12:00

Lunch Break
12:00 - 13:30

I1 (Ballroom)
Invited Talk I
13:30 - 14:15

R2 (Ballroom)
Paper Session II: Logic and Physical Design (I)
14:15 - 16:00

D (Ballroom)
Panel Discussion
16:00 - 17:30

Banquet (Ballroom)
19:00 - 21:00

Tuesday, October 19, 2010

K2 (Ballroom)
Keynote II
9:00 - 10:00

Coffee Break
10:00 - 10:15

R3 (Ballroom)
Paper Session III: Logic and Physical Design (II)
10:15 - 12:00

Lunch Break
12:00 - 13:30

I2 (Ballroom)
Invited Talk II
13:30 - 14:15

I3 (Ballroom)
Invited Talk III
14:15 - 15:00

Coffee Break
15:00 - 15:15

R4 (Ballroom)
Paper Session IV: System Level Design and Design Experience (II)
15:15 - 16:50

Closing
16:50 - 17:00



List of Papers

Monday, October 18, 2010

Keynote I
Time: 9:00 - 10:00 Monday, October 18, 2010
Location: Ballroom
Chair: Youn-Long Lin (National Tsing Hua University, Taiwan)

K1-1 (Time: 9:00 - 10:00)
Title: Energy Efficient Enterprise Computing Systems
Author: *Massoud Pedram (Univ. of Southern California, U.S.A.)
Page: p. 3
Abstract: Digital information management is the key enabler for the unparalleled rise in productivity and efficiency gains experienced by the world economies. Enterprise computing systems are important elements of the world’s digital infrastructure by providing ever-present and ever-increasing information processing, storage, and networking capabilities. As such, they are also significant drivers of economic growth and societal changes. However, continued expansion of enterprise computing systems is now hindered by their unsustainable and rising energy needs. Moreover, governments, people, and corporations are becoming increasingly concerned about the environmental impact of enterprise computing systems and their supporting cyber-physical structures. It is with this backdrop that I will present a number of best practices and methods for improving the energy efficiency of enterprise computing systems, ranging from core-level to platform-level power management, from design of energy proportional hardware to system-wide provisioning of heterogeneous resources, and from task scheduling to virtualization.


Paper Session I: System Level Design and Design Experience (I)
Time: 10:15 - 12:00 Monday, October 18, 2010
Location: Ballroom
Chairs: Rung-Bin Lin (Yuan Ze University, Taiwan), Youhua Shi (Waseda University, Japan)

R1-1 (Time: 10:15 - 10:17)
Title: Placing Static and Stack Data into a Scratch-Pad Memory for Reducing the Energy Consumption of Multi-task Applications
Author: *Lovic Gauthier, Tohru Ishihara (Kyushu University, Japan), Hideki Takase (Nagoya University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroaki Takada (Nagoya University, Japan)
Page: pp. 7 - 12
Keyword: Energy consumption, Scratch-pad memory, Software, Multi-task, Stack
Abstract: Scratch-pad memories (SPM) are on-chip memory devices which are much smaller but much faster and which consume much less energy than off-chip memories. This paper presents two fully software techniques for, respectively, sharing the SPM among several tasks and managing the stacks of each task between the SPM and the external main memory (MM). The paper then explains how to merge these techniques efficiently to achieve a further reduction in energy consumption.

R1-2 (Time: 10:17 - 10:19)
Title: Aggressive Register Unsharing with Selective FU Sharing in High-Level Synthesis
Author: *Yuko Hara-Azumi, Toshinobu Matsuba (Nagoya University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Shinya Honda, Hiroaki Takada (Nagoya University, Japan)
Page: pp. 13 - 18
Keyword: High-Level Synthesis, Behavioral Synthesis, Aggressive Register Unsharing, Selective FU Sharing, Register Retiming
Abstract: A novel high-level synthesis technique to improve the clock frequency with little area overhead is presented. Our technique aims at suppressing area overhead while keeping clock frequency as high as an existing work which achieves the highest clock frequency. Our proposed method performs selective functional unit (FU) sharing, which shares only large FUs in order to efficiently save circuit area and multiplexer (MUX) insertion, based on an existing technique called aggressive register unsharing, which significantly removes MUXs inserted before registers. Moreover, we propose hardware-component-level register retiming, which shortens critical path delays more effectively than the traditional logic-level register retiming. Three sets of experiments demonstrated that our proposed method achieved up to 37.8% and on average 15.7% area reduction with negligible clock frequency degradation from the existing work.

R1-3 (Time: 10:19 - 10:21)
Title: Automatic Generation for Efficient Software TLM at Multiple Abstraction Layers
Author: Meng-Huan Wu, *Yi-Shan Lu, Wen-Chuan Lee, Chen-Yu Chuang, Ren-Song Tsay (Department of Computer Science, National Tsing Hua University, Taiwan)
Page: pp. 19 - 24
Keyword: hw/sw co-simulation, software abstraction
Abstract: In this paper, we propose a software Transaction-Level Modeling (TLM) approach to co-simulate HW/SW efficiently. To keep the concurrency in the simulated system, timing synchronization between hardware and software simulations should be considered carefully in HW/SW co-simulation. However, improper timing synchronization leads to either poor simulation performance or inaccurate simulation results. Our approach achieves accurate yet efficient HW/SW co-simulation because we perform timing synchronization only at points where HW and SW actually interact. In addition, given the target software, three abstraction levels of software TLM models can be generated automatically based on the type of interactions concerned. The experimental results show that the speed of our software TLM models reaches 3 million instructions per second (MIPS) at the low abstraction level, and goes up to 248 MIPS at higher abstraction levels. Hence, designers can leverage our approach to obtain an efficient HW/SW co-simulation by simply selecting the abstraction layers that fit their needs.

R1-4 (Time: 10:21 - 10:23)
Title: Evaluation of Two Operating Systems for Lego Mindstorms NXT
Author: *Wing-Kwong Wong (Department of Electronic Engineering, National Yunlin University of Science and Technology, Taiwan), Fu-Hsien Lin (Graduate School of Electronic Engineering, National Yunlin University of Science and Technology, Taiwan)
Page: pp. 25 - 30
Keyword: Embedded systems, NxtOSEK, MicroC/OS, Lego Mindstorms NXT, Operating systems
Abstract: Lego Mindstorms NXT is used as a hardware platform for comparing two embedded operating systems (OS). NxtOSEK is available as an open-source project that includes both device drivers and an OS kernel. We have successfully ported MicroCOS to replace the NxtOSEK kernel while keeping the device drivers. Following previous work on the evaluation of embedded operating systems, we use a number of measurements with a software approach to evaluate the performance of NxtOSEK and MicroCOS, including preemptive scheduling, interrupt preemption, get/release semaphore, semaphore passing and memory allocation. MicroCOS performed significantly better in two aspects, and its kernel mechanisms are examined in detail in order to explain the speedup compared to NxtOSEK.

R1-5 (Time: 10:23 - 10:25)
Title: Concord: A Configurable SoC Prototyping Platform
Author: Chih-Chyau Yang, *Chen-Yen Lin, Hui-Ming Lin, Yui-Chih Shih, Hsi-Tse Wu, Shi-Lun Chen, Tien-Ching Wang, Chien-Ming Wu, Chun-Ming Huang, Chin-Long Wey (National Chip Implementation Center, Taiwan)
Page: pp. 31 - 36
Keyword: SoC prototyping, CONCORD, verification platform
Abstract: FPGA-based SoC verification boards have been commercially available for SoC verification prototyping. However, most of these boards were developed with fixed hardwired architectures. Due to the lack of architectural flexibility, users are not allowed to develop with on-chip-buses and on-chip-networks, and to alter the architecture for specific applications. In addition, the system architecture under the FPGA-based SoC system may differ from the real chip. This paper presents a fully configurable SoC prototyping platform, namely, CONCORD, which provides high flexibility in connection interfaces, high flexibility and high architectural compatibility for design changes, and high modularity for specific applications. In order to demonstrate the effectiveness of the developed CONCORD verification platform, this paper also presents three configurations for the embedded systems with the most popular cores, such as ARM, OpenRISC, and LEON.

R1-6 (Time: 10:25 - 10:27)
Title: Generation Method of Decomposed Small Area Instruction Decoder for Configurable Processor
Author: *Hiroki Ohsawa, Hirofumi Iwato, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Page: pp. 37 - 41
Keyword: small area instruction decoder, configurable processor, ASIP
Abstract: This paper studies a method for generating a decomposed small-area instruction decoder for configurable processors. Since Application Specific Instruction set Processors (ASIPs) are widely used in embedded systems, they are required to have even smaller area, higher performance, and lower power consumption. This paper proposes a method for generating a small-area instruction decoder using a decomposed instruction decoder model. We focus on the number of pipeline registers in the controller. The proposed method minimizes the number of pipeline registers by generating control signals in two or more stages. Experimental results show that the proposed method achieves an 85% reduction in pipeline registers for control signals in the controller compared to the conventional method.

R1-7 (Time: 10:27 - 10:29)
Title: A High-speed VLSI Architecture of Output Probability and Likelihood Score Computations for HMM-based Recognition Systems
Author: *Ryo Shimazaki, Kazuhiro Nakamura, Mashatoshi Yamamoto, Kazuyoshi Takagi (Nagoya University, Japan), Naofumi Takagi (Kyoto University, Japan)
Page: pp. 42 - 47
Keyword: speech recognition, VLSI architecture, HMM, likelihood score computation, output probability computation
Abstract: We present a VLSI architecture for output probability computations (OPCs) of continuous Hidden Markov Models (HMMs) and likelihood score computations (LSCs) which supports store-based block parallel processing (StoreBPP). We also demonstrate fast store-based block parallel processing (FastStoreBPP), which exploits the full performance of StoreBPP, and present a high-speed VLSI architecture that supports it. A comparison demonstrates the efficiency of the architecture.

R1-8 (Time: 10:29 - 10:31)
Title: Improved Local Horizontal and Vertical Common Subexpression Elimination Method for Constant Multiple Multiplication
Author: *Yasuhiro Takahashi, Toshikazu Sekine (Gifu University, Japan), Michio Yokoyama (Yamagata University, Japan)
Page: pp. 48 - 53
Keyword: multiplierless filter, common subexpression elimination, constant multiplication
Abstract: Common subexpression elimination (CSE) techniques address the issue of minimizing the number of adders needed to implement multiple constant multiplication (MCM) blocks. In this paper, we propose a new CSE method that combines horizontal and vertical techniques. The proposed method first searches for the frequency of higher-order horizontal common subexpressions, i.e., 3-5 bits, and then searches vertically. Our simulation results show that our method offers a good tradeoff between the implementation cost and the synthesis run-time in comparison with conventional methods.

R1-9 (Time: 10:31 - 10:33)
Title: Improved Normalized Image Reconstruction for Iris Recognition
Author: *Hyo Jin Nam, Harsh Durga Tiwari, Yong Beom Cho (Konkuk University, Republic of Korea)
Page: pp. 54 - 57
Keyword: Iris recognition, Segmentation process, Normalization, Intel PXA255
Abstract: Iris recognition is one of the most common identification systems used nowadays. Compared with other biometric features such as fingerprint and face, iris patterns are more reliable and stable. In order to compensate for variation, common iris recognition requires the translation of the segmented iris image to a normalized image. This paper focuses on the implementation of improved normalized image formation by employing a modified segmentation method which can reduce the execution time by a factor of ten.

R1-10 (Time: 10:33 - 10:35)
Title: Inter-Island Delay Aware Communication Synthesis for Island-Based Distributed Register Architecture
Author: Juinn-Dar Huang, *Chia-I Chen, Wan-Ling Hsu, Yen-Ting Lin, Jing-Yang Jou (Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University, Taiwan)
Page: pp. 58 - 63
Keyword: Behavioral synthesis, distributed register-file, resource binding, scheduling
Abstract: In the deep-submicron era, wire delay is becoming the bottleneck in the pursuit of high system clock speed. Several distributed register (DR) architectures have been proposed to cope with this problem by keeping most wires local. In this paper, a distributed register-file microarchitecture with inter-island delay (DRFM-IID) is proposed. Though DRFM-IID is also one of the DR-based architectures, it is more practical than the prior art, DRFM, in terms of its delay model. With such interconnect delay consideration, the synthesis task is inherently more complicated than the one with zero inter-island delay. The unexpected interconnect delay is very likely to have a serious impact on overall system performance due to lengthened clock cycle time. Hence, we also provide a performance-driven architectural synthesis framework targeting DRFM-IID to optimize the system performance. Multiple factors, such as the number of inter-island transfers, criticality of transfer, and resource utilization, are considered to obtain a better solution. The experimental results indicate that the latency and the number of inter-cluster transfers can be reduced on average by 26.91% and 37.54% respectively, where the latter is also widely used as a metric of communication power consumption.

R1-11 (Time: 10:35 - 10:37)
Title: MorFPGA: A Modularized FPGA-Based Embedded System Development Platform
Author: Yu-Tsang Chang, Chun-Ming Huang, Chien-Ming Wu, Chun-Yu Chen, *Yu-Sheng Lin, Chih-Ting Kuo, Ting-Chun Liu, Chin-Long Wey (National Chip Implementation Center, Taiwan)
Page: pp. 64 - 69
Keyword: Embedded System, SoC, FPGA, Modularized Structure, LEON3
Abstract: With the ever-increasing complexity of System-on-a-chip (SoC), the pressures of short time to market, and low cost requirements, platform-based design paradigms have been commonly used for SoC designs. Modular and flexible design has become an important feature for enhancing the expandability and reconfigurability of the system. This paper presents a modularized FPGA-based embedded system platform for a digital photo frame application with the open source processor core, LEON3. An extra touch panel module, which is not natively supported by the LEON3 GRLIB library, is introduced and successfully integrated in this application.

R1-12 (Time: 10:37 - 10:39)
Title: A Novel Design-Methodology for PCB Traces Ensuring High Signal-Integrity on Random Signals
Author: *Masami Ishiguro, Shohei Akita, Hiroki Shimada, Noriyuki Aibe (University of Tsukuba, Japan), Ikuo Yoshihara (University of Miyazaki, Japan), Moritoshi Yasunaga (University of Tsukuba, Japan)
Page: pp. 70 - 75
Keyword: Signal Integrity, Transmission Line, Random Signal
Abstract: We have already proposed a novel transmission line called “Segmental Transmission Line (STL)”, which can ensure high signal integrity of high-speed signals in PCB traces. Up to now, however, the design methodology of STL has been limited to clock signals. In this paper, we propose a novel design methodology of the STL for random signals, and fabricate a scaled-up prototype based on the proposed methodology. We also demonstrate its effectiveness using the prototype in comparison with the conventional transmission line.

R1-13 (Time: 10:39 - 10:41)
Title: A Novel IR-Drop Tolerant Scheduling for Reliability-Aware Datapaths
Author: *Keisuke Inoue, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan)
Page: pp. 76 - 81
Keyword: datapath synthesis
Abstract: In this paper, we discuss robustness against IR-drop risk at the high level. We propose a new IR-drop model which has two phases. In the first phase, an increase in functional unit delay occurs, and in the second phase, a fatal error due to electromigration occurs. We handle the first phase by inserting timing margin, and prevent the second phase from being reached in any control step. Based on the IR-drop model, we formulate our problem as a scheduling problem. Our scheduling-based approach achieves robustness against IR-drop without using specialized devices, e.g., multiple supply voltages.

R1-14 (Time: 10:41 - 10:43)
Title: A Physics-Based Compact Model for the 1/f Noise in p-type Si/SiGe/Si Heterostructure MOSFETs
Author: *Chia-Yu Chen (Stanford University, U.S.A.), Chi-Chao Wang, Yun Ye (Arizona State University, U.S.A.), Yang Liu (Stanford University, U.S.A.), Junko Sato-Iwanaga, Akira Inoue, Haruyuki Sorada (Panasonic Electronics, Japan), Yu Cao (Arizona State University, U.S.A.), Robert Dutton (Stanford University, U.S.A.)
Page: pp. 82 - 83
Keyword: 1/f noise, screening effect, SiGe p-HMOS, compact model, heterostructure
Abstract: A physics-based p-type Si/SiGe/Si heterostructure MOSFET (SiGe p-HMOS) 1/f noise model that can predict charge distribution in dual channels and calculate noise contributions from two channels in circuit simulators is developed. 1/f noise behavior in SiGe p-HMOS can be modeled by incorporating the capacitance of a Si cap layer into a conventional MOS model and considering dual-channel screening effects. Based on the proposed model, excellent agreement among the compact model, TCAD simulations and measurements is observed at different bias conditions.

R1-15 (Time: 10:43 - 10:45)
Title: On Behavioral Modeling for Sigma-Delta Digital-to-Analog Converters with Accurate Timing Response
Author: *Hsin-Yu Luo, Hsiu-Wen Li, Xiao-Qian Chang, Chien-Nan Jimmy Liu (National Central University, Taiwan)
Page: pp. 84 - 89
Keyword: sigma-delta DAC, Behavioral model, bottom-up extraction
Abstract: In this paper, an efficient bottom-up extraction approach is proposed to build accurate behavioral models for sigma-delta digital-to-analog converters (DAC). In the special extraction mode, specific patterns can be used to obtain the key circuit parameters of the design in a short time without separating the design into several sub-blocks. Actual loading effects and parasitics can be considered automatically, which makes our modeling approach more suitable for existing IPs and flattened post-layout designs. In the experiments, comparisons among our behavioral model, a top-down behavioral model and HSPICE simulation demonstrate the accuracy and efficiency of the proposed modeling strategy.

R1-16 (Time: 10:45 - 10:47)
Title: Self-Tuning Metric and Control Policy to Optimally Trade-off Lifetime Performance-Power-Reliability
Author: *Evelyn Mintarno, Joelle Skaf (Stanford University, U.S.A.), Rui Zheng, Jyothi Velamala, Yu Cao (Arizona State University, U.S.A.), Stephen Boyd, Robert W. Dutton, Subhasish Mitra (Stanford University, U.S.A.)
Page: pp. 90 - 95
Keyword: Circuit aging, Energy efficiency, Reliability
Abstract: An optimization framework and control policies are presented to find the optimal self-tuning over lifetime, which guarantees functional operation in the presence of circuit aging and optimally trades off performance, power, and reliability over lifetime. A weighted function of total performance achieved, total energy consumed, and total reliability is considered as a metric to be maximized, subject to constraints imposed by the user and underlying hardware. Self-tuning policies for both offline and online aging estimation methods are described. Dynamic cooling is introduced as one of the self-tuning parameters, in addition to supply voltage and clock frequency. Simulation results using aging models validated by 45nm CMOS stress measurements demonstrate the effectiveness and practicality of the approach.

R1-17 (Time: 10:47 - 10:49)
Title: A Throughput-aware BusMesh NoC Configuration Algorithm Utilizing the Communication Rate between IP Cores
Author: *SeungJu Lee, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa (Waseda University, Japan)
Page: pp. 96 - 101
Keyword: Network-on-Chip (NoC), BusMesh NoC (BMNoC), A novel NoC algorithm, BMNoC configuration algorithm
Abstract: BusMesh NoC (BMNoC) is composed of bus-based connections and global mesh routers to enhance the performance of on-chip communication. In this paper, we propose a BMNoC configuration algorithm together with simulation results. In the BMNoC configuration algorithm, IP cores which have a heavy communication rate between them are connected by a bus, and then we configure CNs. CNs can communicate with each other via ESes and MRs. Furthermore, the simulation results show better latency than earlier studies and demonstrate the feasibility of BMNoC.

R1-18 (Time: 10:49 - 10:51)
Title: TSV-constrained Scan Chain Reordering for 3D ICs
Author: Wei-Ting Chen, Chia-Ching Chang, *Charles H.-P. Wen (National Chiao Tung University, Taiwan)
Page: pp. 102 - 107
Keyword: 3D ICs, TSV, Scan Testing
Abstract: This paper formulates the scan-chain reordering problem considering a limited number of through-silicon vias (TSVs), and further develops an efficient 2-stage algorithm. For three-dimensional optimization, a greedy algorithm named Multiple Fragment Heuristic, combined with a dynamic closest-pair data structure, FastPair, is proposed to derive a good initial solution at stage 1. Later, stage 2 performs two local refinements, 3D Planarization and 3D Relaxation, to reduce the wire cost and the number of TSVs in use, respectively. Experiments show that the proposed algorithm achieves performance comparable to a genetic-algorithm-based method but runs at least three orders of magnitude faster, which makes it more practical for TSV-constrained scan-chain reordering for 3D ICs.


Invited Talk I
Time: 13:30 - 14:15 Monday, October 18, 2010
Location: Ballroom
Chair: Hiroyuki Ochi (Kyoto University, Japan)

I1-1 (Time: 13:30 - 14:15)
Title: Smart Automobiles for Future Ecosystems
Author: *Hideaki Ishihara (Denso, Japan)
Page: p. 111
Abstract: Automotive electronics have been advancing rapidly in their successful pursuit of environmental friendliness, safety, comfort, and convenience. Such advancements have been achieved through the use of approximately one hundred processors per luxury car. MEMS devices and power semiconductors have also played an important role in realizing energy-efficient automobiles and automated-societies. In this talk, I will describe the prospects of "Smart Automobiles for Future Ecosystems" with specific focus on creating ecological and dependable embedded systems, while considering new semiconductor technologies, their applications, and high-level design methodologies. At the end of my talk, I will also refer to the expectations for future innovations.


Paper Session II: Logic and Physical Design (I)
Time: 14:15 - 16:00 Monday, October 18, 2010
Location: Ballroom
Chairs: Takashi Horiyama (Saitama University, Japan), Hui-Ru Iris Jiang (National Chiao Tung University, Taiwan)

R2-1 (Time: 14:15 - 14:17)
Title: Stable-LSE based Analytical Placement with Overlap Removable Length
Author: *Masatomo Kuwano, Yasuhiro Takashima (University of Kitakyushu, Japan)
Page: pp. 115 - 120
Keyword: Stable-LSE, Overlap Removable Length, Analytical Placement
Abstract: We propose a novel overlap estimation for analytical placement. This estimation, called overlap removable length, calculates the length necessary to remove the overlap between two blocks. To obtain a placement with less overlap, the overlap removable length is minimized in the proposed method. We implement a prototype and obtain a placement with less overlap for 911 blocks within a minute. We confirm its efficiency empirically.

R2-2 (Time: 14:17 - 14:19)
Title: Metal Balance Based Clock Construction to Minimize Process Variation Effect
Author: *Zhi-Wei Chen (Inst. of Information Industry, Taiwan), Hung-Ming Chen, Ren-Jie Lee, Chun-Kai Wang (National Chiao Tung University, Taiwan)
Page: pp. 121 - 125
Keyword: process variation, CTS
Abstract: The design of robust high performance clock distribution has faced significant challenges due to increasing parameter and process variations in nanometer manufacturing technology. In this work, we address a practical problem in clock construction with process variation awareness, which is to balance the wirelength in preferred direction metal routing. Experimental results show that our approach (unbuffered and buffered clock tree syntheses) performs better than conventional DME algorithms (and thus other variants that follow DME for process variation consideration) in reducing the skew of the clock.

R2-3 (Time: 14:19 - 14:21)
Title: Circuit Performance Degradation on FPGAs Considering NBTI and Process Variations
Author: *Michitarou Yabuuchi, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan)
Page: pp. 126 - 129
Keyword: NBTI, FPGA, variation
Abstract: LSI scaling causes reliability problems. It is important to analyze the degradation caused by Negative Bias Temperature Instability (NBTI) in circuit designs. Yield is affected by variations. In the near future, NBTI and variations will decrease reliability on FPGAs fabricated in a nanometer process. In this work, we show the effect of NBTI and variations on 65nm FPGAs. According to our results, the circuit design margin can be reduced.

R2-4 (Time: 14:21 - 14:23)
Title: Rover: Routing on Via-Configurable Fabrics for Standard-Cell-Like Structured ASICs
Author: *Liang-Chi Lai, Hsih-Han Chang, Rung-Bin Lin (Yuan Ze University, Taiwan)
Page: pp. 130 - 135
Keyword: structured ASIC, Router, Via-configurable
Abstract: In this paper, we present a router called Rover for structured ASICs with via-configurable routing fabrics. We integrate Rover into an industrial design flow. Compared to a commercial yet non-structured ASIC router without a predefined routing fabric, Rover employing a predefined routing fabric on average uses 47% (5%) more wire length (when not counting overhang wire length). It incurs 32% more delay on the longest path, which is much smaller than the 47% increase in wire length. It creates 28.1% overhang wires, which is less than the 35% obtained by previous work.

R2-5 (Time: 14:23 - 14:25)
Title: A Physical-Location-Aware Fault Redistribution for Maximum IR-Drop Reduction
Author: *Fu-Wei Chen, Shih-Liang Chen, Yung-Sheng Lin, TingTing Hwang (National Tsing Hua University, Taiwan)
Page: pp. 136 - 141
Keyword: at-speed testing, IR-drop, X-identification, X-filling
Abstract: To guarantee that an application-specific integrated circuit (ASIC) meets its timing requirement, at-speed scan testing has become an indispensable procedure for verifying the performance of the ASIC. However, at-speed scan testing suffers from test-induced yield loss. Because the switching activity in test mode is much higher than that in normal mode, the large switching-induced current causes severe IR drop and increases gate delay. X-filling is the most commonly used technique to reduce the IR-drop effect during at-speed testing. However, the effectiveness of X-filling depends on the number and the characteristics of the X-bit distribution. In this paper, we propose a physical-location-aware X-identification which redistributes faults so that the maximum switching activity is guaranteed to be reduced after X-filling. The experimental results on ITC’99 show that our method achieves an average 9.55% reduction in maximum switching activity compared to a previous work which redistributes X-bits evenly across all test vectors.

R2-6 (Time: 14:25 - 14:27)
Title: Redundant Via Insertion under Timing Constraints
Author: *Chi-Wen Pan, Yu-Min Lee (National Chiao Tung University, Taiwan)
Page: pp. 142 - 147
Keyword: redundant via, timing issues, incremental timing analysis
Abstract: Redundant via insertion is a useful technique to alleviate yield loss and improve the reliability of the designed circuit. When extra vias are inserted into the circuit, the electronic properties of the circuit are altered, and the circuit timing changes and needs to be re-analyzed efficiently. Therefore, a fast timing analyzer is required to assist the redundant via insertion procedure. This work develops an efficient redundant via insertion method under timing constraints. First, an effective incremental circuit timing analysis method is developed, and the redundant via insertion task is transformed into a mixed bipartite-conflict graph matching problem. Then, the insertion problem is solved by a timing-driven minimum weighted matching algorithm. The experimental results show that the developed algorithm achieves a 3.2% higher insertion rate on average than the method that does not consider timing effects, and the developed incremental timing analysis mechanism speeds up the redundant via insertion procedure under timing constraints by over 10 times on average.

R2-7 (Time: 14:27 - 14:29)
Title: Optimal Wiring Topology for Electromigration Avoidance
Author: Iris Hui-Ru Jiang (National Chiao Tung University, Taiwan), Hua-Yu Chang (National Taiwan University, Taiwan), *Chih-Long Chang (National Chiao Tung University, Taiwan)
Page: pp. 148 - 153
Keyword: Reliability, Electromigration, Network flow
Abstract: Due to excessive current densities, electromigration may trigger a permanent open- or short-circuit failure in signal wires or power networks in analog or mixed-signal circuits. As the feature size keeps shrinking, this effect becomes a key reliability concern. Hence, in this paper, we focus on wiring topology generation for avoiding electromigration at the routing stage. Prior works tended towards heuristics; in contrast, we first claim that this problem belongs to class P rather than being NP-hard. Our breakthrough is that, via the proof of the greedy-choice property, we successfully model this problem on a multi-source multi-sink flow network and then solve it with a strongly polynomial time algorithm. Experimental results demonstrate the effectiveness and efficiency of our algorithm.

R2-8 (Time: 14:29 - 14:31)
Title: Iterative 3D Partitioning for Through-Silicon Via Minimization
Author: *Ya-Shih Huang, Yang-Hsiang Liu, Juinn-Dar Huang (National Chiao Tung University, Taiwan)
Page: pp. 154 - 159
Keyword: Partitioning, 3D IC, Through-silicon via minimization
Abstract: Three-dimensional (3D) integration is a breakthrough technology of growing importance that has the potential to offer significant benefits such as wirelength/power reduction and higher system integration. This emerging technology allows stacking multiple layers of dies and resolves the vertical connection issue with through-silicon vias (TSVs). However, though a TSV is considered a good solution for vertical connection, it also occupies significant silicon real estate and incurs reliability problems. Therefore, in this paper, we propose an iterative layer-aware 3D partitioning algorithm, named iLap, for TSV minimization. iLap iteratively applies multi-way min-cut partitioning to gradually divide a given design layer by layer in a bottom-up fashion. Meanwhile, iLap also properly fulfills a special I/O pad constraint incurred by 3D structures to further improve its outcome. The experimental results show that iLap can reduce the number of TSVs by about 35% compared to several existing methods.

R2-9 (Time: 14:31 - 14:33)
TitleA Novel Zone-Based ILP Track Routing
Author*Ke-Ren Dai, Yi-Chun Lin, Yih-Lang Li (National Chiao Tung University, Taiwan)
Pagepp. 160 - 165
KeywordRouting, Linear Programming, Track Routing
AbstractTrack routing is an intermediate step between global routing and detailed routing. It is strongly associated with deep submicron (DSM) issues. This work proposes a track routing model via integer linear programming (ILP). The proposed layer assignment optimizes routability. In the track assignment stage, partitioning each panel into zones enables ILP track assignment to encode each assignment solution as a number. The reconfigurable cost table can be adopted to minimize the costs associated with DSM issues and local wire length. A parallel algorithm is proposed to route the nets of each panel simultaneously. Experimental results indicate that the proposed track routing algorithm improves maximal density compared with using a detailed router only. Moreover, the proposed parallel algorithm run on an 8-core processor yields an 80% lower routing time than on a single-core system.

R2-10 (Time: 14:33 - 14:35)
Title3D-AADI: An Adaptive and Integrable Thermal Simulator According to ADI Concept for 3D IC Physical Design Flow
Author*Sophie Ting-Jung Li, Yu-Min Lee (Department of Electrical Engineering, National Chiao Tung University, Taiwan)
Pagepp. 166 - 171
Keyword3D IC, physical design, thermal simulation, thermal-aware, thermal-driven
Abstract3D ICs, which deal with cost-effective achievement by increasing the densities of interconnection between dies, are regarded as an attractive alternative solution for overcoming the bottlenecks on 2D planar ICs. In fact, 3D ICs offer a large number of advantages. However, one of the critical challenges is heat dissipation due to higher accumulated power density and lower thermal conductivity of inter-layer dielectrics for vertically stacked layers of active tiers. Therefore, the management of thermal issues should be considered during the physical design stages, instead of only at pre-packaging verification, for future highly integrated systems. For these reasons, we develop an adaptive thermal simulator that applies our adaptive-3D-thermal-ADI algorithm, based on the ADI method, to provide the temperature distribution during the 3D IC physical design flow from floorplan to verification. The simulator constructs simulation grids of adaptive size to avoid the restriction of the most critical position. Furthermore, we apply the concept of the ADI iteration method to non-uniform nodes. Eventually, the adaptive-3D-thermal-ADI tool can be regarded as both a reliable thermal simulator and a thermal-driven kernel in the 3D IC design flow. The simulator we developed is both adaptive and incremental.

R2-11 (Time: 14:35 - 14:37)
TitleAn ILP-based Diagnosis Framework For Multiple Open-Segment Defects
AuthorChen-Yuan Kao, Chien-Hui Liao, *Charles Hung-Ping Wen (National Chiao Tung University, Taiwan)
Pagepp. 172 - 177
Keywordopen defect, Byzantine effect, diagnosis, segment fault
AbstractThe faulty responses of an open defect are determined by the Byzantine effect and the physical routing. The Byzantine effect makes such faulty behaviors non-deterministic and dependent upon both the pattern and physical information. Therefore, traditional ATPG has difficulty with fault activation and propagation. This paper proposes a three-stage diagnosis approach for finding combinations of open-segment defects automatically. A path tracing technique helps extract all candidate fault sites from error outputs of failing patterns. An ILP solver enumerates all fault combinations by considering fault candidates and simulation responses. Finally, fault simulation identifies true open-segment faults by pruning false cases. Experimental results show the resolution of the proposed approach is high and only generates an average of <4 combinations for ISCAS85 circuits under the multiple injection of open-segment defects.

R2-12 (Time: 14:37 - 14:39)
TitleDual Supply Voltage Assignment in 3D ICs Considering Thermal Effects
Author*Shu-Han Whi, Yu-Min Lee (National Chiao Tung University, Taiwan)
Pagepp. 178 - 183
Keyword3D IC, MSV, voltage assignment, voltage island, thermal
AbstractThree-dimensional integrated circuits (3D ICs) have been viewed as an effective method to improve chip performance by overcoming the bottleneck of long interconnects in 2D ICs. However, higher temperature becomes a serious challenge for 3D ICs and diminishes the advantage of low power. Therefore, it is important to propose an effective method considering thermal effect and power optimization simultaneously. In this paper, we present a methodology to minimize the total power consumption in 3D ICs by employing a grid-based dual supply voltage technology. The proposed approach comprises three main parts: 1) a voltage assignment process that uses three main factors (sensitivity, proximity effect, and level shifter budget) as the voltage assignment criterion for power reduction; 2) a 3D electro-thermal simulation to get the temperature of the chip; 3) a thermal-aware static timing analysis to obtain the thermal-related delay of the gates in the circuit. The experimental results demonstrate the effectiveness of our voltage assignment method and the thermal effect on circuit performance.

R2-13 (Time: 14:39 - 14:41)
TitleStudy of Multiple-Output Neuron MOS Current Mirror for Current-Steering Digital-to-Analog Converter
Author*Shuhei Yasumoto, Yuki Nobe, Akio Shimizu, Sumio Fukai (Saga University, Japan), Yohei Ishikawa (Ariake National College of Technology, Japan)
Pagepp. 184 - 189
Keyworddigital-to-analog converter, neuron MOSFET, current mirror
AbstractIn this paper, we propose a multiple-output neuron MOS current mirror for a current-steering digital-to-analog (D/A) converter. To improve the output voltage range of the current-steering D/A converter, we propose a current source that operates at low voltage. This is achieved by using a neuron MOS current mirror. HSPICE simulation results show that the proposed circuit operates in the saturation region for Vout > 0.2 V. Moreover, the proposed circuit reduces circuit area compared with the conventional circuit.

R2-14 (Time: 14:41 - 14:43)
TitleExtended Sequence Pair: A Finite Solution Space for Two-Directional Repeated Placement
Author*Mineo Kaneko, Takayuki Shibata (Japan Advanced Institute of Science and Technology, Japan)
Pagepp. 190 - 195
Keywordrectangular packing, sequence pair, module placement, horizontal/vertical constraint graph
AbstractRepeated placement treated in this paper is the problem of placing multiple copies of a set of modules so that copies of each module appear repeatedly with a common horizontal interval Lx and with a common vertical interval Ly. To identify the solution space of this repeated placement problem, and to construct efficient algorithms for treating those placements, a coding system for those repeated placements of modules is proposed in this paper. Our proposed coding system uses a pair of sequences of module names, but one module is allowed to appear up to twice. By introducing such multiple module names, our coding system has the potential to describe not only intra-cycle spatial relations between modules but also inter-cycle spatial relations. This paper mainly treats semantics and syntax (feasibility conditions, and decoding algorithm) of our coding system.

R2-15 (Time: 14:43 - 14:45)
TitleLSI Implementation Method of DES Cryptographic Circuit Utilizing Domino-RSL Gate Resistant to DPA Attack
Author*Kenji Kojima, Kazuki Okuyama, Katsuhiro Iwai, Mitsuru Shiozaki (Ritsumeikan University, Japan), Masaya Yoshikawa (Meijyo University, Japan), Takeshi Fujino (Ritsumeikan University, Japan)
Pagepp. 196 - 201
KeywordSide Channel Attack, Differential Power Analysis, Domino-RSL, FPGA, Structured-ASIC
AbstractIt is necessary to design cryptographic circuits that are tamper-resistant against side-channel attacks such as Differential Power Analysis (DPA) to protect the secret key stored in them. In this paper, we propose the novel Domino-RSL gate primitive, which equalizes the output transition probability for any input data using a random number. We implemented a DES cryptographic circuit using the pseudo Domino-RSL and random masking process on an FPGA board, and the DPA resistance is confirmed experimentally. In addition, we compared the circuit area composed of three kinds of Domino-RSL gates for ASIC implementation. As a result, the S-BOX circuit block using SOP/POS gates shows the smallest area. We estimated the area of the DES cryptographic circuit including 8 S-BOX macros, in which SOP/POS gates are arranged in matrices of 13×8. This area is 1.57 times as large as the normal circuit without DPA countermeasure.

R2-16 (Time: 14:45 - 14:47)
TitleThe Sizing of Sleep Transistors In Controlling Value Based Power Gating
Author*Lei Chen, Shinji Kimura (Graduate School of Information, Production and Systems, Waseda University, Japan)
Pagepp. 202 - 207
Keywordcontrolling value, power gating, sleep transistor sizing
AbstractA recently proposed low power technique, the controlling-value-based (CV-based) power gating method, has been shown to be an area-efficient method and is also capable of reducing both dynamic power and leakage power while maintaining the performance of the original circuit. In this paper, we experimentally investigate the issue of sleep transistor sizing in CV-based power gating. Different from traditional power gating, sleep transistors can be sized almost the same as the ones used in the original circuit instead of several times larger as usual. The experimental results further prove that CV-based power gating suffers relatively little from the area and delay penalties caused by the sleep transistors compared with other methods.

R2-17 (Time: 14:47 - 14:49)
TitleA Verification Method of Pipeline Processing Behavior of Superconducting Single-Flux-Quantum Pulse Logic Circuits
Author*Kazuyoshi Takagi, Motoki Sato, Masamitsu Tanaka (Nagoya University, Japan), Naofumi Takagi (Kyoto University, Japan)
Pagepp. 208 - 213
KeywordSingle-Flux-Quantum circuit, logic design verification, pipeline processing, pulse logic
AbstractIn pulse-driven synchronous Single-Flux-Quantum (SFQ) logic circuits, a clock signal is fed to each logic gate. The behavior of SFQ circuits can be considered as fine-grain pipeline processing. Therefore, SFQ circuits must be designed to implement the required logic functionality, while satisfying the pipeline timing requirements. In this paper, we propose a verification method of the pipeline processing behavior of SFQ circuits. In the method, we extract timed logic formulae from the circuit, and check their equivalence to the specification.

R2-18 (Time: 14:49 - 14:51)
TitleAn Incremental Synthesis Technique for ECO Based on Iterative Procedure for Error Diagnosis and Spare Cell Assignment
Author*Kosuke Watanabe, Hiroto Senzaki, Kosuke Shioki, Tetsuya Hirose, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 214 - 219
KeywordECO, Spare Cell, Error Diagnosis, Incremental Synthesis
AbstractThis paper presents an incremental synthesis technique for Engineering Change Orders (ECO’s) based on an iterative procedure for error diagnosis and spare cell assignment. A conventional error diagnosis technique based on an iterative diagnosis procedure requires many spare cells to rectify a circuit, which causes failure in technology remapping. In order to avoid failure in technology remapping, our technique selects a solution for subcircuits based on available spare cells. Experimental results have shown that our technique improves the technology remapping success ratio by 44.4% on average.

R2-19 (Time: 14:51 - 14:53)
TitleError-Rate Prediction for Probabilistic Circuits with More General Structures
AuthorMark Lau, *Keck-Voon Ling, Arun Bhanu, Vincent Mooney (Nanyang Technological University, Singapore)
Pagepp. 220 - 225
Keywordprobabilistic computing, carry-select adder, error-rate, HSPICE
AbstractA methodology has been proposed recently to predict error-rates of probabilistic circuits having a cascade structure. It was able to predict reasonably accurately for probabilistic ripple-carry and carry-skip adders. The objective of the present paper is twofold. First, the methodology is applied, for the first time in the literature, to a probabilistic carry-select adder, which has a more complex structure than the adders mentioned above. This is to provide additional evidence that the method is versatile and applicable to some non-trivial circuits. Second, the present paper shows that the methodology is also applicable to some seemingly non-cascade circuits. The key technique is to appropriately group circuit components into various blocks before applying the methodology. Such a preprocessing may potentially widen the scope of applicability of the methodology.


Panel Discussion
Time: 16:00 - 17:30 Monday, October 18, 2010
Location: Ballroom
Moderator: Tsun-Chieh Chiang (ITRI)

D-1 (Time: 16:00 - 17:30)
TitleIs Automotive Electronics Creating New Opportunities for Semiconductor?
AuthorOrganizers: Cheng-Wen Wu (ITRI, Taiwan), Jing-Jou Tang (Southern Taiwan University, Taiwan), Moderator: Tsun-Chieh Chiang (ITRI, Taiwan), Panelists: Ching-Yao Chan (University of California, Berkeley, U.S.A.), Hsueh-Lung Liao (ARTC, Taiwan), Kenneth Ma, James Wang (ITRI, Taiwan)
Pagepp. 229 - 230
AbstractThe automotive industry worldwide is going through a dramatic structural as well as cultural change never seen before in its history. The energy crisis in 2006 followed by the financial tsunami has done a lot of damage to many auto companies around the world. The semiconductor industry, like most other industries, also suffered from the worldwide economic downturn in 2007 and 2008. Although we are seeing signs of recovery in the global economy, the two industries for sure will not stay in the same positions as they were before the crisis. Collaboration between these two industries will benefit not only themselves, but also the general public who consider transportation a basic need in their daily life. For semiconductor companies, enhancing their presence in the automotive electronic systems market means higher potential growth. For automotive companies, enhancing electronic systems and intelligence of the vehicles they manufacture means more efficient use of energy and highway, and vehicles that are safer, more fun to drive, equipped with more functionalities, etc., thus bringing higher value to the vehicles, drivers, and passengers. Due to safety concerns and cultural reasons, the automotive industry in the past has been very conservative in integrating new electronic systems into vehicles. The situation, however, is changing very fast as we can see. We believe it is timely to discuss related issues during SASIMI 2010 to benefit our audience. This session includes a panel of experts in this area, who will give position statements and elaborate their perspectives on whether automotive electronic systems create new opportunities for the semiconductor industry. The panel will then be open for discussion between the audience and the panelists.



Tuesday, October 19, 2010

Keynote II
Time: 9:00 - 10:00 Tuesday, October 19, 2010
Location: Ballroom
Chair: Youn-Long Lin (National Tsing Hua University, Taiwan)

K2-1 (Time: 9:00 - 10:00)
TitleWorkflow Approach to Building User-Centric Automation and Assistive Devices and Systems
Author*Jane W. S. Liu (Academia Sinica, Taiwan)
Pagepp. 233 - 234
AbstractThis talk will discuss the use of the workflow paradigm for modeling, design, implementation and evaluation of UCAADS. The acronym stands for user-centric automation and assistive devices and systems (services). Some UCAADS aim to help improve quality of life and self-reliance of their users, including elderly or functionally limited individuals. Examples are smart medication dispensers, autonomous appliances, service robots and robotic helpers. Other UCAADS are automation tools for care-providing institutions. Examples include smart medication cabinets and mobile tools that enforce bar-code controlled medication dispensing and administration for the purpose of enhancing the quality of the medication use process. The talk will first present case studies to illustrate that UCAADS with workflow-based architecture can be easily configured and customized to support different processes, rely on different infrastructures and suit different users. In such a device (or system), components are workflows. We can use workflow definitions as behavior specification of the devices and as models of user actions. Being executable, the specification and models enable the usability of the device and correctness of device-user interactions to be assessed via simulation as soon as the requirement specification and design of the device are available. When software procedures, hardware devices, etc. required by workflows become available, we can implement the device by having its behavior specification run on a workflow engine and letting the middleware integrate the workflow components at runtime dynamically. The talk will conclude by presenting Embedded Workflow Framework (EMWF) and USE (UCAADS Simulation Environment). EMWF is written in C and provides lightweight engines on Linux, Microsoft Windows CE and XP Embedded. It is being developed to enable the implementation of workflow-based UCAADS specifically, and similar embedded devices and systems in general.
Similar to other simulation environments, USE also provides extensible libraries of reusable models and device components as well as data capture and display tools. In addition, USE supports the incorporation of workflow model elements with elements of human processor models commonly used in studies on human-computer interactions. In this way, USE enables the dependencies of device-user interactions on user behavior and skills to be accounted for more precisely in simulation experiments of the device and its user(s).


Paper Session III: Logic and Physical Design (II)
Time: 10:15 - 12:00 Tuesday, October 19, 2010
Location: Ballroom
Chairs: Yasuhiro Takashima (University of Kitakyushu, Japan), Jiun-Lang Huang (National Taiwan University, Taiwan)

R3-1 (Time: 10:15 - 10:17)
TitleIncreasing Yield Using Partially-Programmable Circuits
Author*Shigeru Yamashita (Ritsumeikan University, Japan), Hiroaki Yoshida, Masahiro Fujita (University of Tokyo, Japan)
Pagepp. 237 - 242
KeywordPartially-Programmable Circuits, Yield, SPFD
AbstractThis paper proposes to use a new circuit model called Partially-Programmable Circuits (PPCs) to increase the yield with very small overhead. PPCs are obtained from conventional logic circuits by replacing their sub-circuits with LUTs. If a connection in a PPC becomes redundant by changing the functionality of some LUTs, the connection is considered to be robust to defects because even if there are some defects at the connection, the circuit works properly by changing the functionality of some LUTs appropriately. To increase the number of such robust connections, we add some redundant connections to LUTs beforehand. We find such redundant connections by using functional flexibility represented by SPFDs and/or CSPFs. Thus, by our proposed approach we can increase the yield by only adding some redundant connections beforehand. From the result of our preliminary experiments, we consider our approach promising.

R3-2 (Time: 10:17 - 10:19)
TitleOn Handling Cell Placement with Exclusive Adjacent Symmetry Constraints for Analog IC Layout Design
Author*Shimpei Asano, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan)
Pagepp. 243 - 248
KeywordSymmetry constraint, Sequence-pair, Analog circuits, Placement
AbstractIn recent high performance analog IC design, it is often required to place some cells symmetrically to a horizontal or vertical axis. Some methods of obtaining the closest placement that satisfies the given symmetry constraints and the topology constraints imposed by a sequence-pair have been proposed. However, some cells placed symmetrically are also required to be placed close to each other. Therefore, in this paper, we define the "exclusive adjacent symmetry constraint" and propose a method of obtaining the closest cell placement that satisfies the given constraints.

R3-3 (Time: 10:19 - 10:21)
TitleA Low-Cost and Noise-Tolerant ADC BIST with On-the-Fly DNL/INL Calculation
AuthorKuo-Yu Chou, Ming-Huan Lu, Ping-Ying Kang, Xuan-Lun Huang, *Jiun-Lang Huang (National Taiwan University, Taiwan)
Pagepp. 249 - 253
Keyworddesign-for-testability, ADC testing, histogram testing, wireless testing
AbstractA low-cost ADC BIST is developed and implemented in TSMC 0.18 μm CMOS technology. This design utilizes a noise tolerant code hit counting technique to facilitate the linear histogram testing. In addition, it includes an on-the-fly DNL/INL calculation circuit that makes the pass/fail decision or outputs the code widths for debugging. This design also possesses an interface compatible with the HOY wireless testing system. Experiments on a 10-bit pipelined ADC in the HOY environment are performed to validate the proposed design.
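
As background for the linear histogram testing mentioned in this abstract, the following is a minimal sketch of the textbook DNL/INL computation from code hit counts. It assumes an ideal ramp input, so every inner code should be hit equally often. The function name and the handling of the end codes are assumptions for illustration; this is not the paper's on-chip calculation circuit or its noise-tolerant counting technique.

```python
def dnl_inl_from_histogram(hits):
    """Textbook linear-histogram DNL/INL in LSB (illustrative sketch only).

    hits[k] is the number of times output code k was observed for a ramp input.
    The first and last codes are excluded because they absorb over-range samples.
    """
    inner = hits[1:-1]
    if not inner or sum(inner) == 0:
        raise ValueError("need at least one inner code with hits")
    ideal = sum(inner) / len(inner)          # average hits per code corresponds to 1 LSB
    dnl = [h / ideal - 1.0 for h in inner]   # code width error of each inner code
    inl = []
    acc = 0.0
    for d in dnl:                            # the running sum of DNL gives INL
        acc += d
        inl.append(acc)
    return dnl, inl
```

A pass/fail decision in this simplified model would compare the largest absolute DNL and INL values against a chosen limit.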

R3-4 (Time: 10:21 - 10:23)
TitleA Four-valued Adder Circuit Design with FG-MOS Transistors
Author*Yuya Wada, Koji Nishi, Akio Shimizu, Sumio Fukai (Saga University, Japan), Yohei Ishikawa (Ariake National College of Technology, Japan)
Pagepp. 254 - 259
Keywordmultiple-valued logic, multiple-valued adder, FG-MOS, reduction of circuit area
AbstractIn this paper, we designed a four-valued adder circuit with FG-MOSFETs. The proposed circuit is an arithmetic logic unit that performs addition of four-valued signals on quaternary numbers. The proposed circuit can reduce the amount of wiring in a chip. The proposed circuit is designed with FG-MOSFETs. The FG-MOSFET can achieve a simple circuit configuration and a reduction in the amount of wiring. We confirmed that the proposed circuit can achieve half the circuit area of a binary adder.
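
To illustrate the arithmetic that a four-valued (quaternary) adder performs, here is a minimal behavioral sketch. It models only the radix-4 digit addition with carry, not the FG-MOSFET circuit itself; the function names and the little-endian digit order are assumptions for illustration.

```python
def quaternary_full_add(a, b, carry_in=0):
    """Add two radix-4 digits (0..3) and a carry bit; return (sum_digit, carry_out)."""
    if not (0 <= a <= 3 and 0 <= b <= 3 and carry_in in (0, 1)):
        raise ValueError("digits must be 0..3 and carry_in must be 0 or 1")
    total = a + b + carry_in
    return total % 4, total // 4


def quaternary_add(x_digits, y_digits):
    """Ripple-carry addition of two equal-length digit lists (least significant first)."""
    if len(x_digits) != len(y_digits):
        raise ValueError("operands must have the same number of digits")
    result = []
    carry = 0
    for a, b in zip(x_digits, y_digits):
        digit, carry = quaternary_full_add(a, b, carry)
        result.append(digit)
    result.append(carry)                     # final carry becomes the top digit
    return result
```

Each quaternary digit carries the information of two binary bits, so a quaternary word needs half as many signal lines as the equivalent binary word.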

R3-5 (Time: 10:23 - 10:25)
TitleHigh-Level Synthesis of 3D IC Designs for TSV Number Minimization
AuthorChih-Hung Lee, *Shih-Hsu Huang, Chun-Hua Cheng (Chung Yuan Christian University, Taiwan)
Pagepp. 260 - 265
KeywordHigh-Level Synthesis, 3D IC, Integer Linear Programming
AbstractRecent progress in manufacturing technology makes it possible to vertically stack multiple integrated chips. Therefore, developing CAD tools according to the characteristics of 3D architecture is urgent and important. In this paper, we propose an integer linear programming formulation to perform signal through-silicon via (TSV) number minimization in high-level synthesis of 3D ICs. Different from previous works, our formulation directly and accurately minimizes the TSV number. Since the TSV number is determined by the layer assignment result of communicating resources rather than communicating operations, experimental results show that our formulation is more effective and accurate on TSV number minimization than previous works.

R3-6 (Time: 10:25 - 10:27)
TitleAn IEEE 1500 Wrapper Sharing Technique on Reducing Test Cost
Author*Mao-Yin Wang, Ji-Jan Chen (Industrial Technology Research Institute, Taiwan)
Pagepp. 266 - 271
KeywordIEEE 1500, Test Wrapper, Test Scheduling, Sharing, SOC Test Architecture
AbstractMost existing approaches to the design of SOC test architectures are developed for test time minimization. In addition to test time, the wrapper cost is also included in the test cost. We propose a test wrapper sharing technique based on integer linear programming to reduce the wrapper cost. Experimental results show that our technique can achieve at least 24% reduction in wrapper logic and find a test schedule such that the number of WBR cells is minimized.

R3-7 (Time: 10:27 - 10:29)
TitleAn Incremental Synthesis Technique Based on Error Diagnosis and Technology Remapping for Clusters
AuthorHiroto Senzaki, Kosuke Watanabe, Kosuke Shioki, Tetsuya Hirose, Nobutaka Kuroki, *Masahiro Numa (Kobe University, Japan)
Pagepp. 272 - 277
KeywordECO, Spare cell, Incremental Synthesis, Error diagnosis
AbstractIn an LSI design process, Engineering Change Orders (ECO's) are often given even after the masks have been prepared. This paper presents an incremental synthesis technique based on error diagnosis and technology remapping for clusters in order to reduce the number of spare cells needed to modify the circuit for satisfying functional post-mask ECO's. The proposed technique chooses and modifies not only error locations obtained by error diagnosis, but also clusters such as fanout free regions or reconvergent fanout regions including the error locations if fewer spare cells are needed.

R3-8 (Time: 10:29 - 10:31)
TitleA Single Layer Trunk Routing Using 45-Degree Lines within Critical Areas for PCB Routing
Author*Kyosuke Shinoda (Tokyo Institute of Technology, Japan), Yukihide Kohira (The University of Aizu, Japan), Atsushi Takahashi (Osaka University, Japan)
Pagepp. 278 - 283
Keywordprinted circuit board, plane routing, river routing, routing congestion, 45 degree line
AbstractIn Printed Circuit Board (PCB) design, most routing instances contain congested areas in which routing cannot be realized only by horizontal and vertical segments. In this paper, we propose a method that efficiently detects a congested area of single layer PCB routing problems which are derived after escape routing and routing area assignment are finished, and that relaxes routing congestion by locally introducing 45 degree segments so that a feasible routing is efficiently obtained.

R3-9 (Time: 10:31 - 10:33)
TitleClockless Handshaking Inter-chip Communication Applied in Daisy-chained Biomedical Signal Processing SoC
Author*Hong-Hui Chen, Tung-Chien Chen, Cheng-Yi Chiang, Liang-Gee Chen (National Taiwan University, Taiwan)
Pagepp. 284 - 289
KeywordECG, daisy chain, daisy-chained, SoC
AbstractIn this paper, an effective extension interface for connecting multiple biomedical SoCs is introduced. The interface adopts a handshaking mechanism and removes the need to assure synchronization to clock signals, which makes the PCB design easier. The payload format is well elaborated and the bus arbitration scheme is updated to minimize communication latency in the SoC daisy chain, in order to provide an effective bandwidth to gather the computational results from chips in the chain. Currently, the designed SoC is targeted at ECG related applications. The proposed interface is useful in occasions where on-the-spot ECG signal processing is applied with algorithms that produce computational results at a lower speed than the raw data input rate.

R3-10 (Time: 10:33 - 10:35)
TitleA New Statistical Maximum Operation for Gaussian Mixture Models Considering Cumulative Distribution Function Curve
Author*Shuji Tsukiyama (Chuo University, Japan), Masahiro Fukui (Ritsumeikan University, Japan)
Pagepp. 290 - 295
KeywordStatistical maximum, Gaussian mixture model, Statistical static timing analysis
AbstractA new method for the statistical maximum operation of Gaussian mixture models is presented, which is useful in statistical static timing analysis and delay fault testing. The method takes the cumulative distribution function curve into account, and can reduce the error of probability by almost 80% from the previous method.
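
As background on what a statistical maximum operation computes, the sketch below shows the classical moment-matching approximation (often attributed to Clark) for the maximum of two independent single Gaussians. It is a baseline illustration only, not the paper's Gaussian-mixture method; the function names are assumptions, and independence of the two inputs is assumed.

```python
import math


def _pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu1, sigma1, mu2, sigma2):
    """Return (mean, variance) of max(X, Y) for independent Gaussians X and Y."""
    theta = math.sqrt(sigma1 ** 2 + sigma2 ** 2)
    if theta == 0.0:                          # both inputs are constants
        return max(mu1, mu2), 0.0
    alpha = (mu1 - mu2) / theta
    # First and second moments of the maximum.
    m1 = mu1 * _cdf(alpha) + mu2 * _cdf(-alpha) + theta * _pdf(alpha)
    m2 = ((mu1 ** 2 + sigma1 ** 2) * _cdf(alpha)
          + (mu2 ** 2 + sigma2 ** 2) * _cdf(-alpha)
          + (mu1 + mu2) * theta * _pdf(alpha))
    return m1, max(m2 - m1 * m1, 0.0)
```

Representing the result by only a mean and a variance discards the true shape of the distribution of the maximum, which is generally not Gaussian.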

R3-11 (Time: 10:35 - 10:37)
TitleMaximal Resilience for Reliability Enhancement in Interconnect Structure
AuthorChih-Yun Pai, *Shu-Min Li (National Sun Yat-sen University, Taiwan)
Pagepp. 296 - 301
Keywordinterconnect resilience, interconnect diagnosis, interconnect detection, oscillation ring, fault-tolerant routing
AbstractThis paper proposes a resilient scheme to achieve maximal interconnect fault tolerance, reliability and yield for both single and multiple interconnect faults under stuck-at and open fault models. By exploiting multiple routes inherent in an interconnect structure, this scheme can tolerate faulty connections by efficiently finding alternative paths. This scheme is compatible with previous interconnect detection and diagnosis methods, and together they can be applied to implement a robust interconnect structure that may still provide correct communication even under multiple faults. Furthermore, this scheme can identify connections which will cause communication failure if they are faulty. With this knowledge, designers can significantly improve interconnect reliability by augmenting such vulnerable connections. Experimental results show that alternative paths can be found for almost all paths in this scheme; thus, it provides a way to achieve fault tolerance and reliability/yield improvement.

R3-12 (Time: 10:37 - 10:39)
TitleMinimizing Wirelength and Overflow of 3D-IC Global Routing by Signal-TSV Planning
Author*Guan-Hung Chen, Ke-Ren Dai, Yih-Lang Li (National Chiao Tung University, Taiwan)
Pagepp. 302 - 307
Keyword3D-IC, TSV, Wirelength, Routing, Planning
AbstractThis study integrates signal through-silicon-via (STSV) planning with global routing to eliminate the side-effects of inserting STSVs. The proposed approach mainly comprises two steps: initial STSV positioning determines appropriate STSV locations and the STSV count for each net based on an estimate of congested regions; then wirelength-minimization and overflow-reduction issues are addressed by replacing STSVs during global routing. Experimental results on 3D-placement benchmarks show that the proposed method improves the total routed wirelength by 3% to 17% and reduces the number of congested regions compared with state-of-the-art procedures. Moreover, according to the reports of a commercial P&R tool, the proposed STSV-planning approach reduces the number of DRC violations and the wirelength by 2% to 10% compared with other methods.

R3-13 (Time: 10:39 - 10:41)
TitleBus-Driven Floorplanning With Bus Pin Assignment
Author*Po-Hsun Wu, Tsung-Yi Ho (National Cheng Kung University, Taiwan)
Pagepp. 308 - 313
KeywordFloorplanning, Bus Planning
AbstractAs the number of buses increases in multi-core SoC designs, the bus planning problem has become an important factor in determining the performance and power consumption of SoC designs. To ease the effort of bus planning, it is desirable to consider this issue in the early floorplanning stage. Recently, bus-driven floorplanning (BDF) has attracted much attention in the literature. However, current algorithms adopt an over-simplified formulation which ignores the position and orientation of the bus pins; this simplified formulation may deteriorate the chip performance. In this paper, we propose a BDF algorithm that fully considers the impacts of bus pins. By fully utilizing the position and orientation of bus pins, bus bendings are not restricted to occur at the modules on the bus; thus, the bus shape has more flexibility. With more flexibility in the bus shape, the size of the solution space is increased and a better BDF solution can be obtained. Compared with the state-of-the-art bus-driven floorplanner, the experimental results show that our floorplanner performs better in runtime by 3.5x, success rate by 1.2x, wirelength by 1.8x, and deadspace by 1.2x. To enhance the solution quality of our floorplanner, we also develop an algorithm to minimize the wirelength differences between different bits. Experimental results show that our BDF algorithm is very promising.

R3-14 (Time: 10:41 - 10:43)
TitleSystematic Yield Optimization for Restricted PPC Pattern Generation with Genetic Algorithm
Author*Katsuhiko Harazaki (Sharp Corporation, Japan), Moritoshi Yasunaga (University of Tsukuba, Japan)
Pagepp. 314 - 319
KeywordDFM, Yield, Lithography, GA, PPC
AbstractRecently, improvements in manufacturing to increase systematic yield have become critical for the development of advanced LSI. To achieve this, studies into Design for manufacturability (DFM) issues have become essential. Normally, discussions about die yield revolved around random yield issues such as defects caused by dust in the fab. However, in the past several years, the discussion has moved to systematic yield, including lithography and etching effects which are related to the equipment and manufacturing conditions. To improve die yield and ensure cost effective manufacturing, it is especially important to raise the systematic yield of a process. In this paper, we explain what systematic yield is and describe the relationship between systematic yield and lithography and etching processes. We also investigate various yield models, summarize the relationship between them and then optimize the cell patterns for the Gate Poly layer of an LSI process. Our results show how systematic yield optimization of the cell layout pattern, taking into account Process and proximity compensation (PPC), can be achieved with a Genetic Algorithm methodology.

R3-15 (Time: 10:43 - 10:45)
TitleClock Planning for Multi-Voltage and Multi-Mode Designs
Author*Chang-Cheng Tsai, Tzu-Hen Lin, Shin-Han Tsai, Hung-Ming Chen (National Chiao Tung University, Taiwan)
Pagepp. 320 - 324
Keywordclock planning, multi-voltage, low power
AbstractLow power demand drives the development of lower power design architectures, among which multiple supply voltage is one of the state-of-the-art techniques to achieve low power. In addition, dynamic voltage frequency scaling and adaptive voltage scaling are popular power saving techniques during chip operation to provide different modes for various performance requirements. It is therefore very challenging to generate a clock tree for different operation modes. This paper proposes several implementations on this important issue, one of which can provide smallest clock latency and minimum clock skew on average of required operation modes in multi-voltage designs.

R3-16 (Time: 10:45 - 10:47)
TitleEfficient Random-Defect Aware Layer Assignment and Gridless Track Routing
Author*Yu-Wei Lee, Yen-Hung Lin, Yih-Lang Li (National Chiao Tung University, Taiwan)
Pagepp. 325 - 330
KeywordDesign for yield, random defect, gridless design, track routing, layer assignment
AbstractDesign for yield (DFY) problems have received increasing attention. Of particular concern in DFY problems is how to formulate and reduce the critical area for random defects. Arranging interconnections is recognized as an effective means of improving the sensitivity towards random defects. Previous works have demonstrated that random defects significantly influence interconnections and the effectiveness of layer assignment and track routing to enhance routing quality and performance. This work proposes a random defect aware layer assignment and gridless track routing (RAAT) to eliminate the effect of random defects. Gridless track routing comprises wire ordering, wire sizing and spacing in this work. An exposure ratio metric is proposed to assign each iroute appropriately to a specific layer. RAAT utilizes min-cut partitioning, a conventionally adopted method for placement and floorplanning, to place interconnections. A slicing tree-based structure improves the efficiency of wire ordering in lowering overlapped length between adjacent partitions. Finally, a second-order cone programming refined by considering an extra random-defect effect determines the position and width of each iroute. Experimental results demonstrate the necessity of the integration of layer assignment and track routing. Results further demonstrate the effectiveness of the gridless track routing methods proposed by RAAT. In addition to finishing each case more rapidly with a higher completion rate than previous works do, RAAT reduces the number of failures in the Monte Carlo simulation by up to 20% as compared to previous works.

R3-17 (Time: 10:47 - 10:49)
TitleAnalog Layout Generation based on Wiring Symmetry
Author*Yu-Ming Yang, Iris Hui-Ru Jiang (National Chiao Tung University, Taiwan)
Pagepp. 331 - 336
KeywordAnalog design automation, wiring symmetry
AbstractUnlike the mature and highly automatic flow for digital layout generation, the existing method to generate an analog layout is far from automatic because it highly depends on the designer’s expertise. Prior endeavors are mainly dedicated to analog placement because they consider only the device symmetry constraint. This paper raises the wiring symmetry issue to analog layout: wiring symmetry is as crucial as device symmetry. Hence, we propose an analog placement and global routing algorithm to consider both types of symmetry constraints. During placement, we utilize the device folding technique to enhance the flexibility and feasibility on symmetry. Our results show that our algorithm can produce a promising initial layout to speed up the analog design process.

R3-18 (Time: 10:49 - 10:51)
TitleAn Approach for Computation Efficiency Improvement of Power Grid Simulation by GPGPU
Author*Makoto Yokota, Yuuya Isoda, Tetsuya Hasegawa, Masahiro Fukui (Ritsumeikan University, Japan)
Pagepp. 337 - 342
KeywordGPGPU, Simulation, Power Grid
AbstractThis paper proposes a speed-up technique for a massively parallel power grid simulator using GPGPU (General-Purpose computing on Graphics Processing Units). The proposed power grid simulator is implemented with the GPU architecture in mind. Experimental results show that the proposed method is 2.9 times faster than the conventional method. As a result, the proposed power grid simulator achieves a 75-times speed-up over CPU computation with the same accuracy.


Invited Talk II
Time: 13:30 - 14:15 Tuesday, October 19, 2010
Location: Ballroom
Chair: Ting-Chi Wang (National Tsing Hua University, Taiwan)

I2-1 (Time: 13:30 - 14:15)
TitleRecent Research Development in Mixed-Size Circuit Placement
AuthorMeng-Kai Hsu, *Yao-Wen Chang (National Taiwan University, Taiwan)
Pagepp. 345 - 351
AbstractA modern chip often contains large numbers of pre-designed macros (e.g., embedded memories, IP blocks) and standard cells, with very different sizes. The fast-growing design complexity with large-scale mixed-size macros and standard cells has caused significant challenges to modern circuit placement. In this paper, we first discuss the strengths and weaknesses of existing techniques for mixed-size placement. We then present a unified analytical algorithm to place large macros and standard cells simultaneously, with the first attempt in the literature to resolve the two intrinsic problems in analytical macro placement: rotation and legalization of large macros. Comparative studies are provided to show the superiority of our unified analytical algorithm. Finally, we provide some future research directions for modern mixed-size placement.


Invited Talk III
Time: 14:15 - 15:00 Tuesday, October 19, 2010
Location: Ballroom
Chair: Ting-Chi Wang (National Tsing Hua University, Taiwan)

I3-1 (Time: 14:15 - 15:00)
Title3D Die-Stacking: Challenges and Opportunities for Computer Architecture
Author*Gabriel H. Loh (Advanced Micro Devices, U.S.A.)
Pagep. 355
AbstractThree-dimensional die-stacking technologies are rapidly maturing, with intense research and development happening in the areas of manufacturing, EDA/CAD, test and yield improvement. The computer architecture research area is also starting to show great interest in 3D technology, and there are many opportunities and challenges. A first obvious direction for 3D integration is the incorporation of memory technologies alongside the processor. Even for this seemingly simple approach, many research and practical questions remain open. The large number of through-silicon vias can provide a high-bandwidth die-stacked memory interface, and there are many options for how this bandwidth may be utilized, such as providing many independent channels, very wide channels, more sophisticated command interfaces, etc. Modern high-performance processor microarchitectures have been designed to cope with a relatively low-bandwidth, high-latency memory interface, employing techniques such as speculative execution, out-of-order execution, hardware prefetching, etc. In a system employing die-stacked memories, such aggressive (and power hungry!) techniques may not be necessary, or at least could be significantly scaled back. There are many research opportunities for better optimizing processor pipelines to match the bandwidth of stacked memories, possibly even to the point of designing entirely new microarchitectures. Beyond the stacking of memory on processors, 3D integration also introduces the opportunity for new compute organizations. In particular, conventional commodity processors can be combined with a variety of specialized accelerators and application-specific processing or other circuitry. Conventional co-processor organizations employ coarse task partitioning due to the relatively limited bandwidth between a host processor and the co-processor; fine-grained task partitioning requires frequent communications which would eliminate performance benefits. 3D stacking can allow very tight cooperation between the processor and ASICs, reconfigurable logic/FPGAs, or any variety of custom-built accelerators (e.g., for signal processing, analog computing, machine learning, string matching/pattern recognition). All of these approaches lead to new and exciting compute platforms with the potential to greatly increase performance as well as performance-per-Watt.


Paper Session IV: System Level Design and Design Experience (II)
Time: 15:15 - 16:50 Tuesday, October 19, 2010
Location: Ballroom
Chairs: Masanori Muroyama (Tohoku University, Japan), Lih-Yih Chiou (National Cheng Kung University, Taiwan)

R4-1 (Time: 15:15 - 15:17)
TitleA Regular Expression Matching Circuit Based on a Modular Non-Deterministic Finite Automaton with Multi-Character Transition
Author*Hiroki Nakahara, Tsutomu Sasao, Munehiro Matsuura (Kyushu Institute of Technology, Japan)
Pagepp. 359 - 364
KeywordRegular Expression, IDS, FPGA, reconfigurable device
AbstractThis paper shows an implementation of a regular expression circuit based on an NFA (Non-deterministic finite automaton). Also, it shows that the NFA based one is superior to the DFA (Deterministic finite automaton) based one, in terms of area and time complexity. A regular expression matching circuit is generated as follows: First, the given regular expressions are converted into an NFA. Then, to reduce the number of states, the NFA is converted into a modular non-deterministic finite automaton (MNFA(p)) with p-character transition. Finally, a finite-input memory machine (FIMM) to detect p-characters as well as the matching elements (MEs) realizing the states for the MNFA(p) are generated. We designed MNFA(p) for different p on a Xilinx FPGA. Then, we derived an optimal value p that efficiently uses both LUTs and embedded memories of the FPGA. As for the performance per FPGA area, our method is 6.2-18.6 times better than DFA-based methods, and is 1.8 times better than the NFA-based method. Since our method efficiently utilizes FPGA resource, a low-cost FPGA can be used to implement a high-performance regular expression matching circuit.

R4-2 (Time: 15:17 - 15:19)
TitleAcceleration of a SAT Based Solver for Minimum Cost Satisfiability Problems Using Optimized Boolean Constraint Propagation
Author*Xin Zhang (The Graduate School of Information, Production and Systems, Waseda University, Japan), Peilin Liu (Department of Electronic Engineering Shanghai Jiao Tong University, China), Shinji Kimura (The Graduate School of Information, Production and Systems, Waseda University, Japan)
Pagepp. 365 - 370
KeywordSAT-Solver, MinCostSAT, BCP
AbstractIn this paper, we show an efficient way to accelerate a SAT-based solver for minimum-cost problems with fractional coefficients by using Early Conflict Detection based BCP. Various benchmarks are evaluated, and our solver with optimized BCP achieves significant acceleration, especially for satisfiable formulas.

R4-3 (Time: 15:19 - 15:21)
TitleCircuit Synthesis for Fast Memory Access in System LSI
Author*Kazuya Kishida, Takashi Kambe (Kinki University, Japan)
Pagepp. 371 - 376
KeywordBehavioral synthesis, Memory Access, System LSI, pipelining, on-chip array
AbstractHigh-level design methodologies are becoming more and more important in the design of large system LSI devices. Behavioral synthesis from C and other high-level languages is a key to achieving the productivity required by such large designs. For memory-intensive applications in particular, the automatic identification, optimization and synthesis of memory access operations is essential. In this paper, some synthesis techniques for memory access logic are described. The three kinds of memory access method examined are (a) variable registerization, (b) memory access pipelining, and (c) on-chip arraying. This approach is applied to a speech recognition algorithm and its effectiveness is evaluated.

R4-4 (Time: 15:21 - 15:23)
TitleClock Gating Optimization with Delay-Matching Cells
Author*Shih-Jung Hsu, Rung-Bin Lin (Yuan Ze University, Taiwan)
Pagepp. 377 - 382
KeywordClock gating, Type matching, Delay matching, Clock skew, Low power
AbstractClock gating is an effective method of reducing power dissipation of a high-performance circuit. However, deployment of gated cells increases the difficulty of synthesizing a low-skew gated tree. In this paper, we propose a delay-matching approach to addressing this problem. Delay-matching is achieved using gated cells whose timing characteristics are similar to that of their clock buffer (inverter) counterparts. Experimental results show that delay-matching attains better slew and much better latency with comparable clock skew when compared with type-matching approach. Delay-matching achieves all of these using much less area than type-matching does. Besides, the skew of a delay-matching gated tree, just like the one generated by type-matching, is insensitive to process and operating condition variations. Moreover, the slews of a delay-matching gated tree are less sensitive to process and operating condition variations than that of a type-matching gated tree.

R4-5 (Time: 15:23 - 15:25)
TitleDesign and Evaluation of Digital Receiver for Low Power Wireless Communication
Author*Kazuki Ohya, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 383 - 388
KeywordDigital Radio Design, Near Distance Wireless Communication, Low Power
AbstractThis paper studies the design and implementation trade-offs of digital receivers for low power wireless communications. Wireless communication has become one of the indispensable technologies in our daily life. Mobile terminal devices, which have been widely developed, require low power and low energy. On the other hand, the effects of noise on transmissions must be considered in wireless communication. The most suitable implementation in terms of power consumption varies according to the characteristics of the noise on transmissions. In this paper, we design digital receiver circuits for RuBee, which is one of the low power communication standards. We measured the area, power consumption, and noise tolerance of the designed digital receivers while changing the noise characteristics. Experimental results show that a trade-off between power consumption and noise tolerance exists and that the optimal implementation changes according to the noise characteristics.

R4-6 (Time: 15:25 - 15:27)
TitleDesign and Verification of an Ultra-Low-Power Active RFID Tag with Multiple Power Domains
Author*Kenichi Agawa, Massimo Alioto, Wenting Zhou, Tsung-Te Liu, Louis Alarcon, Kimiya Hajkazemshirazi, Mervin John, Jesse Richmond, Wen Li, Jan Rabaey (University of California at Berkeley, U.S.A.)
Pagepp. 389 - 394
KeywordRFID, ultra low power, system-level verification, analog-digital-mixed-signal design, wake-up scheme transceiver
AbstractAn ultra-low-power active RFID tag chip with multiple power and clock domains has been designed and verified. The chip employs a wake-up scheme to reduce its power consumption. It includes a wake-up analog regulator and oscillator as well as a digital power manager, synchronizer, and protocol processor. Thus, we need to verify a complicated wake-up procedure and analog-digital-mixed-signal system for its correct operations. Matlab Simulink models and simulations are explored for system-level verification, and verification time can be suppressed efficiently.

R4-7 (Time: 15:27 - 15:29)
TitleDevice Simulation and Experimental Measurement of High-Voltage Unified-CBiCMOS Buffer Driver for Ultra-High-Speed CCD Image Sensors
AuthorToshiaki Koike-Akino (Harvard University, U.S.A.), Takashi Hamahata, *Toshiro Akino, Takeharu Goji Etoh (Kinki University, Japan)
Pagepp. 395 - 400
KeywordLateral-BJT, CMOS, High-Speed, High-Voltage, High Capacitive Load
AbstractWe propose a high-performance buffer circuit termed a unified-complementary bipolar CMOS (U-CBiCMOS) technique for ultra-high-speed charge-coupled device (CCD) image sensors. In order to improve driving capability for realizing over-1M frames per second, the U-CBiCMOS buffer driver makes an effective use of a lateral BJT, which is inherent to an MOSFET structure. We designed a prototype of the U-CBiCMOS buffer driver which uses a trench isolated SOI-CMOS process with an asymmetric lightly-doped diffusion (LDD) for high-voltage applications. Through device simulations and experimental measurements, we reveal that the high-voltage CMOS process can offer a peak current gain of higher than 200. In addition, it is demonstrated that the driving performance of the U-CBiCMOS buffer circuit is significantly improved by activating the lateral BJT at a supply voltage of 15V and a load capacitance of 2nF.

R4-8 (Time: 15:29 - 15:31)
TitleEfficient Multiple Regular Expression Matching on FPGAs based on Extended SHIFT-AND Method
Author*Yusaku Kaneta, Shingo Yoshizawa, Shin-ichi Minato, Hiroki Arimura, Yoshikazu Miyanaga (Hokkaido University, Japan)
Pagepp. 401 - 406
Keywordregular expression, pattern matching, FPGA
AbstractIn this paper, we study efficient pattern matching over high-speed data streams on reconfigurable hardware, namely an FPGA (Field Programmable Gate Array). First, we introduce a subclass of regular expressions, called linear regular expressions. Secondly, for this subclass, we present a new architecture on FPGA, static BP-NFA (static bit-parallel NFA), based on one of the well-known pattern matching techniques, bit-parallel pattern matching. Thirdly, we analyze the complexity of our architecture. Finally, we show the experimental results on our hardware.
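As an illustration of the bit-parallel idea behind the SHIFT-AND method named in the title, the following is a minimal software sketch for a single plain-string pattern. It is not the static BP-NFA architecture of the paper and does not handle linear regular expressions; the function name `shift_and_search` and its interface are assumptions made for this example.

```python
def shift_and_search(pattern: str, text: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    Classic Shift-And: bit i of `state` is 1 when pattern[0..i] matches
    the text ending at the current position.
    """
    if not pattern:
        return []

    # One bit mask per character: bit i is set if pattern[i] == character.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (len(pattern) - 1)  # bit that marks a full match
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Advance every partial match by one character and start a new one,
        # then keep only the matches consistent with the current character.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - len(pattern) + 1)
    return hits
```

Each text character costs one shift, one OR, and one AND on the state word, which is the property that makes this family of methods attractive for hardware such as FPGAs.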

R4-9 (Time: 15:31 - 15:33)
TitleEnergy-Aware Partitioning Using a Multi-Objective Genetic Algorithm
AuthorLih-Yih Chiou, Yi-Siou Chen, *Ya-Lun Jian (National Cheng Kung University, Taiwan)
Pagepp. 407 - 411
KeywordPartitioning, Low Power
AbstractIncorporating power management during partitioning significantly contributes to an energy-efficient architecture. A partitioning approach that partitions a system with the maximum amount of idle time appears to be the most practical for power management, but the efficiency of power saving inevitably decreases as the partitioning block sizes become unbalanced. This work presents a novel energy-aware hardware clustering algorithm in an architecture partitioning domain using a multi-objective genetic algorithm. The proposed algorithm reduces the communication power and component power based on an energy estimation approach. Experimental results based on the Pareto-optimal solutions demonstrate the effectiveness of the proposed algorithm in generating a near-optimal solution faster than the exhaustive approach.

R4-10 (Time: 15:33 - 15:35)
TitleAn Extension of Systolic Regular Expression Matching Hardware for Handling Iteration of Strings Using Quantifiers
Author*Yoichi Wakaba, Masato Inagi, Shin'ichi Wakabayashi, Shinobu Nagayama (Hiroshima City University, Japan)
Pagepp. 412 - 417
Keywordregular expression, string matching, NIDS, quantifier, FPGA
AbstractWe propose systolic regular expression matching hardware that can handle iteration of subclasses of regular expressions (e.g., union of strings), which cannot be handled directly by conventional systolic hardware. Our proposed hardware handles these patterns by using shift registers. FPGA implementation results show that the proposed method significantly reduces the number of LUTs required to handle those patterns compared with the conventional one, in which these patterns are expanded into much longer ones.

R4-11 (Time: 15:35 - 15:37)
TitleA Novel Timing Synchronization Method for Fast and Accurate Multi-Core Instruction-Set Simulators
AuthorMeng-Huan Wu, *Fan-Wei Yu, Cheng-Yang Fu, Peng-Chih Wang, Ren-Song Tsay (National Tsing Hua University, Taiwan)
Pagepp. 418 - 423
KeywordInstruction-set simulator, binary translation, multi-core, synchronization
AbstractThis paper proposes a timing synchronization method for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). In order to achieve accurate simulation results of MCISS, a lock-step approach, which synchronizes every cycle, is commonly used. However, this approach introduces immense overhead and lowers the simulation speed. Instead of synchronizing every cycle, our approach synchronizes based on the data dependency among the simulated programs. Therefore, the synchronization overhead can be greatly reduced while keeping the simulation results accurate. With the proposed approach, the simulation speed of MCISS is up to 40 ~ 1,000 million instructions per second (MIPS) in general. Our major contribution is in clarifying the data dependency issue of multi-core systems. The experimental results show that the MCISS using our simulation method can perform fast and accurate simulation.

R4-12 (Time: 15:37 - 15:39)
TitleA Power Efficient Unified Gated Flip-Flop
Author*Takumi Okuhira, Tohru Ishihara (Kyushu University, Japan)
Pagepp. 424 - 429
KeywordLow power design, flip-flop, clock gating
AbstractSince the clock power consumption in today's processors is considerably large, reducing the clock power consumption contributes to the reduction of the total power consumption in the processors. Recently, a gated flip-flop is proposed for reducing the clock power consumption of flip-flop circuits. The gated flip-flop employs a clock-gating circuit which cuts off an internal clock signal if the data stored in the flip-flop does not need to be updated. Although this reduces the clocking power consumption, the power dissipated in the clock-gating circuit is still large. For reducing the power dissipated in the clock-gating circuit, this paper proposes a technique for unifying the multiple clock-gating circuits, which reduces the overhead of the clock-gating circuit. Power measurement results obtained using a test chip demonstrate that our unified gated flip-flop reduces the power consumption of register circuits by 45% compared to the conventional gated flip-flops if the state transition probability of the flip-flop is 0.1 which is an average state transition probability of flip-flops in a commercial microprocessor used in our experiments.

R4-13 (Time: 15:39 - 15:41)
TitleQuantitative Graph-Based Minimal Queue Sizing for Throughput Optimization in Latency-Insensitive Designs
AuthorJuinn-Dar Huang, *Yi-Hang Chen, Ya-Chien Ho (National Chiao Tung University, Taiwan)
Pagepp. 430 - 435
KeywordLatency-Insensitive System, Latency-Insensitive-design, Throughput Optimization, Queue Size Minimization, Integer Linear Programming
AbstractAs manufacturing processes are constantly moving toward very deep submicron (VDSM), global interconnect delay is becoming one of the most critical performance obstacles in system-on-chip (SoC) designs today. Latency-insensitive-design (LID) methodology, which enables multicycle communication to tolerate latency variation at late stages of the design process without substantially modifying pre-designed IP cores, has been proposed accordingly to conquer this issue. However, the system throughput is still degraded due to imbalanced interconnect latency and communication back-pressure residing in an LID. In this paper, a performance optimization technique with minimal queue sizing is presented. We first model a given LID as a newly proposed quantitative graph (QG), which can be further compacted using the proposed compaction techniques, so that much bigger problems can be handled. On top of QG, integer linear programming is applied to achieve the exact solution with minimal queue size based on the proposed constraint formulation in a reasonable runtime. The experimental results show that our approach can deal with moderately large latency-insensitive systems in an acceptable runtime and save about 28% of queues as compared to the prior art.

R4-14 (Time: 15:41 - 15:43)
TitleA Reconfigurable Layout Method and Evaluation for Network On Chip
Author*Yuichi Nakamura (NEC Corp., Japan), Marcello Lajolo (NEC Labs. America, U.S.A.)
Pagepp. 436 - 441
KeywordNoC, Layout, Reconf
AbstractThis paper presents a reconfigurable layout method for Networks-on-Chip (NoCs) based on partial re-layout. Currently, the layout design time is quite significant, because of the complexity involved with the verification of timing and signal integrity constraints. This limits the possibility to perform incremental changes at the physical design stage. However, a NoC which connects IP cores by network interfaces can be easily reconfigured during place and route. In general, a strict hierarchical design method can provide ease of reconfigurability, but it results in worse area and timing with respect to a flat layout method, which, on the other hand, does not provide reconfigurability. In this paper, we propose a rough hierarchical layout which combines the benefits of both hierarchical and flat layout design styles. Experimental results show area and performance numbers similar to the ones achieved by a flat layout as well as reconfigurability characteristics similar to the ones provided by a strict hierarchical layout.

R4-15 (Time: 15:43 - 15:45)
TitleRER: a Tuning Tool for Implementing a Computational Pipeline Across Multiple FPGAs
AuthorHirokazu Morishita, *Kenta Inakagata (Keio University, Japan), Yasunori Osana (Seikei University, Japan), Naoyuki Fujita (JAXA, Japan), Hideharu Amano (Keio University, Japan)
Pagepp. 442 - 447
KeywordFPGA, Optimization, CFD
AbstractRER (Resource Estimation and Reconfiguration) is a quick optimization tool for arithmetic units of Xilinx IP cores and facilitates partitioning deep computational pipelines across multiple FPGAs. It generates an appropriate computational pipeline structure by composing the optimal configuration of IP cores from the designer's parameters and a database of the hardware amount for all possible configurations of the arithmetic IP cores. As an example, a complicated pipeline used in the MUSCL algorithm in computational fluid dynamics is divided across two FPGAs, considering the results of optimization by RER. For every partition candidate, a speed-up of 5.46x or more was achieved compared to software on an Intel Core 2 Duo processor.

R4-16 (Time: 15:45 - 15:47)
TitleSoft-error Tolerability Analysis for Triplicated Circuit on an FPGA
Author*Yoshihiro Ichinomiya, Motoki Amagasaki, Morihiro Kuga, Toshinori Sueyoshi (Kumamoto University, Japan)
Pagepp. 448 - 453
KeywordFPGA, soft-error, TMR
AbstractSRAM-based Field Programmable Gate Arrays (FPGAs) are vulnerable to single event upsets (SEUs), which are induced by radiation effects. In order to improve robustness against SEUs, many dependable design techniques have been studied. Reliability analysis has also become important. In this work, we propose reliability estimation techniques for triplicated circuits by using conditional probability.
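For readers unfamiliar with triplicated (TMR) circuits, the textbook baseline for their reliability can be sketched as follows. This is not the estimation technique of the paper: it assumes the three module copies fail independently with the same probability and that the majority voter itself never fails, and the function name `tmr_reliability` is chosen only for this example.

```python
from itertools import product


def tmr_reliability(module_ok_prob: float) -> float:
    """Probability that a majority voter over three modules outputs correctly.

    Assumptions (not taken from the paper): the modules fail independently,
    each is correct with probability module_ok_prob, and the voter is perfect.
    """
    if not 0.0 <= module_ok_prob <= 1.0:
        raise ValueError("module_ok_prob must be in [0, 1]")

    total = 0.0
    # Enumerate all 8 correct/faulty combinations of the three modules.
    for states in product([True, False], repeat=3):
        prob = 1.0
        for ok in states:
            prob *= module_ok_prob if ok else 1.0 - module_ok_prob
        # The voter masks the fault whenever at least two modules are correct.
        if sum(states) >= 2:
            total += prob
    return total
```

Under these assumptions the enumeration reduces to the closed form 3r^2 - 2r^3 for module reliability r, so three modules that are each 90% reliable give a voted output that is about 97.2% reliable.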

R4-17 (Time: 15:47 - 15:49)
TitleA Tile Based Reconfigurable Architecture with Dual ALU-array/Processor Operating Mode Capability
AuthorShin'ichi Kouyama, Masayuki Hiromoto (Kyoto University, Japan), Yukihiro Nakamura (Ritsumeikan University, Japan), *Hiroyuki Ochi (Kyoto University, Japan)
Pagepp. 454 - 459
KeywordCoarse-grained Reconfigurable Architecture
AbstractALU-based coarse-grained reconfigurable devices are suitable for accelerating simple stream processing by pipelining and parallelization. They are, however, not efficiently used for sequential processing with complicated controls. In this paper, we propose a tile based dynamically reconfigurable architecture in which each tile has dual ALU-array/RISC processor operating mode capability. The ratio of ALU-based hardware accelerator and RISC processor counterpart can be dynamically changed depending on the dominant computation in the application in order to maximize the efficiency.

R4-18 (Time: 15:49 - 15:51)
TitleVLSI Architecture of V-AMDF based Pitch Detection for Tonal Speech Recognizer
Author*Jirabhorn Chaiwongsai, Werapon Chiracharit, Kosin Chamnongthai (King Mongkut’s University of Technology Thonburi, Thailand), Yoshikazu Miyanaga (Hokkaido University, Japan), Kohji Higuchi (University of Electro-Communications, Japan)
Pagepp. 460 - 465
Keywordtone classification function, V-AMDF, 4-stage pipeline process
AbstractA tonal speech recognizer needs a tone classification function to guarantee word correctness. However, the tone classification function is generally computationally expensive because there are many repeated parameters. This paper proposes an AMDF-based 4-stage pipeline VLSI architecture of the tone classification function for a tonal speech recognizer (TONE-SPEC). The architecture is divided into pitch detection, fundamental frequency extraction and tone decision processes. In speech recognition, only the vowel signal is sent through the pitch detection process so that the number of samples is reduced. AMDF detects the periodicity of vowel speech by using the 4-stage pipeline process. After that, the fundamental frequency is extracted, and the tone result is then decided by finding the maximum vote in a look-up table in the tone decision process. To evaluate the performance of the proposed architecture, an experiment is set up and tested with 37 Thai words. The results show an improvement in processing time compared to the conventional method.
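The periodicity detection step can be illustrated with a minimal software sketch of the standard average magnitude difference function (AMDF). It is not the V-AMDF variant or the pipelined VLSI architecture of the paper; the function names, the default search range of 80 to 400 Hz, and the choice of the global AMDF minimum as the pitch period are assumptions made for this example.

```python
def amdf(frame: list[float], lag: int) -> float:
    """Average magnitude difference between the frame and itself shifted by lag."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def estimate_pitch_hz(frame: list[float], sample_rate: float,
                      f_min: float = 80.0, f_max: float = 400.0) -> float:
    """Estimate the fundamental frequency as sample_rate / (lag minimizing AMDF).

    The lag search range is derived from the assumed pitch range [f_min, f_max].
    """
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    if lag_min > lag_max:
        raise ValueError("frame too short for the requested pitch range")
    # A periodic signal repeats after one period, so AMDF dips near that lag.
    best_lag = min(range(lag_min, lag_max + 1), key=lambda lag: amdf(frame, lag))
    return sample_rate / best_lag
```

AMDF needs only subtractions, absolute values, and accumulation, with no multiplications inside the lag loop, which is one reason it is often chosen for hardware pitch detectors. If the search range spans two or more pitch periods, the minimum can land on a multiple of the true period, so the range should be chosen with the expected speaker pitch in mind.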