SASIMI2010 Technical Program

The 16th Workshop on Synthesis And System Integration of Mixed Information Technologies

Paper Session I: System Level Design and Design Experience (I)
Time: 10:15 - 12:00 Monday, October 18, 2010
Location: Ballroom
Chairs: Rung-Bin Lin (Yuan Ze University, Taiwan), Youhua Shi (Waseda University, Japan)

R1-1 (Time: 10:15 - 10:17)

Title	Placing Static and Stack Data into a Scratch-Pad Memory for Reducing the Energy Consumption of Multi-task Applications
Author	*Lovic Gauthier, Tohru Ishihara (Kyushu University, Japan), Hideki Takase (Nagoya University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroaki Takada (Nagoya University, Japan)
Page	pp. 7 - 12
Keyword	Energy consumption, Scratch-pad memory, Software, Multi-task, Stack
Abstract	Scratch-pad memories (SPM) are on-chip memory devices which are much smaller but much faster and which consume much less energy than off-chip memories. This paper presents two fully software techniques for respectively sharing the SPM among several tasks and managing the stacks of each task between the SPM and the external main memory (MM). The paper then explains how to merge these techniques efficiently to further reduce energy consumption.

R1-2 (Time: 10:17 - 10:19)

Title	Aggressive Register Unsharing with Selective FU Sharing in High-Level Synthesis
Author	*Yuko Hara-Azumi, Toshinobu Matsuba (Nagoya University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Shinya Honda, Hiroaki Takada (Nagoya University, Japan)
Page	pp. 13 - 18
Keyword	High-Level Synthesis, Behavioral Synthesis, Aggressive Register Unsharing, Selective FU Sharing, Register Retiming
Abstract	A novel high-level synthesis technique to improve the clock frequency with little area overhead is presented. Our technique aims at suppressing area overhead while keeping clock frequency as high as an existing work which achieves the highest clock frequency. Our proposed method performs selective functional unit (FU) sharing, which shares only large FUs in order to efficiently save circuit area and multiplexer (MUX) insertion, based on an existing technique called aggressive register unsharing, which significantly removes MUXs inserted before registers. Moreover, we propose hardware-component-level register retiming, which shortens critical path delays more effectively than the traditional logic-level register retiming. Three sets of experiments demonstrated that our proposed method achieved up to 37.8% and on average 15.7% area reduction with negligible clock frequency degradation from the existing work.

R1-3 (Time: 10:19 - 10:21)

Title	Automatic Generation for Efficient Software TLM at Multiple Abstraction Layers
Author	Meng-Huan Wu, *Yi-Shan Lu, Wen-Chuan Lee, Chen-Yu Chuang, Ren-Song Tsay (Department of Computer Science, National Tsing Hua University, Taiwan)
Page	pp. 19 - 24
Keyword	hw/sw co-simulation, software abstraction
Abstract	In this paper, we propose a software Transaction-Level Modeling (TLM) approach to co-simulate HW/SW efficiently. To keep the concurrency in the simulated system, timing synchronization between hardware and software simulations should be considered carefully in HW/SW co-simulation. However, improper timing synchronization leads to either poor simulation performance or inaccurate simulation results. Our approach achieves accurate yet efficient HW/SW co-simulation because we perform timing synchronization only at points where HW and SW actually interact. In addition, given the target software, three abstraction levels of software TLM models can be generated automatically based on the type of interactions concerned. The experimental results show that the speed of our software TLM models reaches 3 million instructions per second (MIPS) at the low abstraction level, and goes up to 248 MIPS at higher abstraction levels. Hence, designers can leverage our approach to obtain an efficient HW/SW co-simulation by simply selecting the abstraction layers which fit their needs.

R1-4 (Time: 10:21 - 10:23)

Title	Evaluation of Two Operating Systems for Lego Mindstorms NXT
Author	*Wing-Kwong Wong (Department of Electronic Engineering, National Yunlin University of Science and Technology, Taiwan), Fu-Hsien Lin (Graduate School of Electronic Engineering, National Yunlin University of Science and Technology, Taiwan)
Page	pp. 25 - 30
Keyword	Embedded systems, NxtOSEK, MicroC/OS, Lego Mindstorms NXT, Operating systems
Abstract	Lego Mindstorms NXT is used as a hardware platform for comparing two embedded operating systems (OS). NxtOSEK is available as an open-source project that includes both device drivers and an OS kernel. We have successfully ported MicroC/OS to replace the NxtOSEK kernel while keeping the device drivers. Following previous works on the evaluation of embedded operating systems, we use a number of software-based measurements to evaluate the performance of NxtOSEK and MicroC/OS, including preemptive scheduling, interrupt preemption, get/release semaphore, semaphore passing and memory allocation. MicroC/OS performed significantly better in two aspects, and its kernel mechanisms are examined in detail in order to explain the speedup compared to NxtOSEK.

R1-5 (Time: 10:23 - 10:25)

Title	Concord: A Configurable SoC Prototyping Platform
Author	Chih-Chyau Yang, *Chen-Yen Lin, Hui-Ming Lin, Yui-Chih Shih, Hsi-Tse Wu, Shi-Lun Chen, Tien-Ching Wang, Chien-Ming Wu, Chun-Ming Huang, Chin-Long Wey (National Chip Implementation Center, Taiwan)
Page	pp. 31 - 36
Keyword	SoC prototyping, CONCORD, verification platform
Abstract	FPGA-based SoC verification boards have been commercially available for SoC verification prototyping. However, most of these boards were developed with fixed hardwired architectures. Due to the lack of architectural flexibility, users cannot develop with on-chip buses and on-chip networks, or alter the architecture for specific applications. In addition, the system architecture under the FPGA-based SoC system may differ from the real chip. This paper presents a fully configurable SoC prototyping platform, namely, CONCORD, which provides high flexibility in connection interfaces, high flexibility and high architectural compatibility for design changes, and high modularity for specific applications. In order to demonstrate the effectiveness of the developed CONCORD verification platform, this paper also presents three configurations for the embedded systems with the most popular cores, such as ARM, OpenRISC, and LEON.

R1-6 (Time: 10:25 - 10:27)

Title	Generation Method of Decomposed Small Area Instruction Decoder for Configurable Processor
Author	*Hiroki Ohsawa, Hirofumi Iwato, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Page	pp. 37 - 41
Keyword	small area instruction decoder, configurable processor, ASIP
Abstract	This paper studies a method for generating a decomposed small-area instruction decoder for configurable processors. Since Application Specific Instruction set Processors (ASIPs) are widely used in embedded systems, they are required to have ever smaller area, higher performance, and lower power consumption. This paper proposes a method for generating a small-area instruction decoder by using a decomposed instruction decoder model. In this paper, we pay attention to the number of pipeline registers in the controller. The proposed method minimizes the number of pipeline registers by generating control signals on two or more stages. Experimental results show that the proposed method achieves an 85% reduction of pipeline registers for control signals in the controller compared to the conventional method.

R1-7 (Time: 10:27 - 10:29)

Title	A High-speed VLSI Architecture of Output Probability and Likelihood Score Computations for HMM-based Recognition Systems
Author	*Ryo Shimazaki, Kazuhiro Nakamura, Mashatoshi Yamamoto, Kazuyoshi Takagi (Nagoya University, Japan), Naofumi Takagi (Kyoto University, Japan)
Page	pp. 42 - 47
Keyword	speech recognition, VLSI architecture, HMM, likelihood score computation, output probability computation
Abstract	We present a VLSI architecture for output probability computations (OPCs) of continuous Hidden Markov Models (HMMs) and likelihood score computations (LSCs) which supports store-based block parallel processing (StoreBPP). We also demonstrate fast store-based block parallel processing (FastStoreBPP), which exploits the full performance of the StoreBPP, and present a high-speed VLSI architecture that supports it. A comparison demonstrates the efficiency of the architecture.

R1-8 (Time: 10:29 - 10:31)

Title	Improved Local Horizontal and Vertical Common Subexpression Elimination Method for Constant Multiple Multiplication
Author	*Yasuhiro Takahashi, Toshikazu Sekine (Gifu University, Japan), Michio Yokoyama (Yamagata University, Japan)
Page	pp. 48 - 53
Keyword	multiplierless filter, common subexpression elimination, constant multiplication
Abstract	Common subexpression elimination (CSE) techniques address the issue of minimizing the number of adders needed to implement multiple constant multiplication (MCM) blocks. In this paper, we propose a new CSE method that combines horizontal and vertical techniques. The proposed method first searches for frequent higher-order horizontal common subexpressions, i.e., 3-5 bits, and then searches for vertical ones. Our simulation results show that our method offers a good tradeoff between the implementation cost and the synthesis run-time in comparison with conventional methods.

R1-9 (Time: 10:31 - 10:33)

Title	Improved Normalized Image Reconstruction for Iris Recognition
Author	*Hyo Jin Nam, Harsh Durga Tiwari, Yong Beom Cho (Konkuk University, Republic of Korea)
Page	pp. 54 - 57
Keyword	Iris recognition, Segmentation process, Normalization, Intel PXA255
Abstract	Iris recognition is one of the most common identification systems in use today. Compared with other biometric features such as fingerprint and face, iris patterns are more reliable and stable. In order to compensate for variation, common iris recognition requires the translation of the segmented iris image to a normalized image. This paper focuses on the implementation of improved normalized image formation by employing a modified segmentation method which can reduce the execution time by a factor of ten.

R1-10 (Time: 10:33 - 10:35)

Title	Inter-Island Delay Aware Communication Synthesis for Island-Based Distributed Register Architecture
Author	Juinn-Dar Huang, *Chia-I Chen, Wan-Ling Hsu, Yen-Ting Lin, Jing-Yang Jou (Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University, Taiwan)
Page	pp. 58 - 63
Keyword	Behavioral synthesis, distributed register-file, resource binding, scheduling
Abstract	In the deep-submicron era, wire delay is becoming the bottleneck in pursuing high system clock speed. Several distributed register (DR) architectures have been proposed to cope with this problem by keeping most wires local. In this paper, a distributed register-file microarchitecture with inter-island delay (DRFM-IID) is proposed. Though DRFM-IID is also one of the DR-based architectures, it is more practical than the prior art, DRFM, in terms of delay model. With such interconnect delay consideration, the synthesis task is inherently more complicated than the one with zero inter-island delay. Unexpected interconnect delay is very likely to have a serious impact on the whole system performance due to lengthened clock cycle time. Hence we also provide a performance-driven architectural synthesis framework targeting DRFM-IID to optimize the system performance. Multiple factors, such as the number of inter-island transfers, criticality of transfer, and resource utilization, are considered to obtain a better solution. The experimental results indicate that the latency and the number of inter-cluster transfers can on average be reduced by 26.91% and 37.54% respectively, where the latter is also widely used as a metric of communication power consumption.

R1-11 (Time: 10:35 - 10:37)

Title	MorFPGA: A Modularized FPGA-Based Embedded System Development Platform
Author	Yu-Tsang Chang, Chun-Ming Huang, Chien-Ming Wu, Chun-Yu Chen, *Yu-Sheng Lin, Chih-Ting Kuo, Ting-Chun Liu, Chin-Long Wey (National Chip Implementation Center, Taiwan)
Page	pp. 64 - 69
Keyword	Embedded System, SoC, FPGA, Modularized Structure, LEON3
Abstract	With the ever increasing complexity of System-on-a-chip (SoC), the pressure of short time to market, and low cost requirements, platform-based design paradigms have been commonly used for SoC designs. Modular and flexible design becomes an important feature for enhancing the expandability and re-configurability of the system. This paper presents a modularized FPGA-based embedded system platform for a digital photo frame application with the open source processor core, LEON3. An extra touch panel module, which is not natively supported by the LEON3 GRLIB library, is introduced and successfully integrated in this application.

R1-12 (Time: 10:37 - 10:39)

Title	A Novel Design-Methodology for PCB Traces Ensuring High Signal-Integrity on Random Signals
Author	*Masami Ishiguro, Shohei Akita, Hiroki Shimada, Noriyuki Aibe (University of Tsukuba, Japan), Ikuo Yoshihara (University of Miyazaki, Japan), Moritoshi Yasunaga (University of Tsukuba, Japan)
Page	pp. 70 - 75
Keyword	Signal Integrity, Transmission Line, Random Signal
Abstract	We have already proposed a novel transmission line called “Segmental Transmission Line (STL)”, which can ensure high signal integrity of high-speed signals in PCB traces. Up to now, however, the design methodology of the STL has been limited to clock signals. In this paper, we propose a novel design methodology of the STL for random signals, and fabricate a scaled-up prototype based on the proposed methodology. We also demonstrate its effectiveness using the prototype in comparison with the conventional transmission line.

R1-13 (Time: 10:39 - 10:41)

Title	A Novel IR-Drop Tolerant Scheduling for Reliability-Aware Datapaths
Author	*Keisuke Inoue, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan)
Page	pp. 76 - 81
Keyword	datapath synthesis
Abstract	In this paper, we discuss robustness against IR-drop risk at the high level. We propose a new IR-drop model which has two phases. In the first phase, an increase in functional unit delay occurs, and in the second phase, a fatal error due to electromigration occurs. We handle the first phase by inserting timing margin, and forbid reaching the second phase in any control step. Based on the IR-drop model, we formulate our problem as a scheduling problem. Our scheduling-based approach achieves robustness against IR-drop without using specialized devices, e.g., multiple supply voltages.

R1-14 (Time: 10:41 - 10:43)

Title	A Physics-Based Compact Model for the 1/f Noise in p-type Si/SiGe/Si Heterostructure MOSFETs
Author	*Chia-Yu Chen (Stanford University, U.S.A.), Chi-Chao Wang, Yun Ye (Arizona State University, U.S.A.), Yang Liu (Stanford University, U.S.A.), Junko Sato-Iwanaga, Akira Inoue, Haruyuki Sorada (Panasonic Electronics, Japan), Yu Cao (Arizona State University, U.S.A.), Robert Dutton (Stanford University, U.S.A.)
Page	pp. 82 - 83
Keyword	1/f noise, screening effect, SiGe p-HMOS, compact model, heterostructure
Abstract	A physics-based p-type Si/SiGe/Si heterostructure MOSFET (SiGe p-HMOS) 1/f noise model that can predict charge distribution in dual channels and calculate noise contributions from two channels in circuit simulators is developed. 1/f noise behavior in SiGe p-HMOS can be modeled by incorporating the capacitance of a Si cap layer into a conventional MOS and considering dual-channel screening effects. Based on the proposed model, excellent agreement among the compact model, TCAD simulations and measurements is observed at different bias conditions.

R1-15 (Time: 10:43 - 10:45)

Title	On Behavioral Modeling for Sigma-Delta Digital-to-Analog Converters with Accurate Timing Response
Author	*Hsin-Yu Luo, Hsiu-Wen Li, Xiao-Qian Chang, Chien-Nan Jimmy Liu (National Central University, Taiwan)
Page	pp. 84 - 89
Keyword	sigma-delta DAC, Behavioral model, bottom-up extraction
Abstract	In this paper, an efficient bottom-up extraction approach is proposed to build accurate behavioral models for sigma-delta digital-to-analog converters (DAC). In the special extraction mode, specific patterns can be used to obtain the key circuit parameters of the design in a short time without separating the design into several sub-blocks. Actual loading effects and parasitics can be considered automatically, which makes our modeling approach more suitable for existing IPs and flattened post-layout designs. In the experiments, comparisons among our behavioral model, a top-down behavioral model and HSPICE simulation demonstrate the accuracy and efficiency of the proposed modeling strategy.

R1-16 (Time: 10:45 - 10:47)

Title	Self-Tuning Metric and Control Policy to Optimally Trade-off Lifetime Performance-Power-Reliability
Author	*Evelyn Mintarno, Joelle Skaf (Stanford University, U.S.A.), Rui Zheng, Jyothi Velamala, Yu Cao (Arizona State University, U.S.A.), Stephen Boyd, Robert W. Dutton, Subhasish Mitra (Stanford University, U.S.A.)
Page	pp. 90 - 95
Keyword	Circuit aging, Energy efficiency, Reliability
Abstract	An optimization framework and control policies are presented to find the optimal self-tuning over lifetime, which guarantees functional operation in the presence of circuit aging and optimally trades off performance, power, and reliability over lifetime. A weighted function of total performance achieved, total energy consumed, and total reliability is considered as a metric to be maximized, subject to constraints imposed by the user and underlying hardware. Self-tuning policies for both offline and online aging estimation methods are described. Dynamic cooling is introduced as one of the self-tuning parameters, in addition to supply voltage and clock frequency. Simulation results using aging models validated by 45nm CMOS stress measurements demonstrate the effectiveness and practicality of the approach.

R1-17 (Time: 10:47 - 10:49)

Title	A Throughput-aware BusMesh NoC Configuration Algorithm Utilizing the Communication Rate between IP Cores
Author	*SeungJu Lee, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa (Waseda University, Japan)
Page	pp. 96 - 101
Keyword	Network-on-Chip (NoC), BusMesh NoC (BMNoC), A novel NoC algorithm, BMNoC configuration algorithm
Abstract	BusMesh NoC (BMNoC) is composed of bus-based connections and global mesh routers to enhance the performance of on-chip communication. In this paper, we propose a BMNoC configuration algorithm together with simulation results. In the BMNoC configuration algorithm, IP cores which have a heavy communication rate between them are connected by a bus, and then we configure CNs. CNs can communicate with each other via ESes and MRs. Furthermore, the simulation results show better latency than earlier studies and demonstrate the feasibility of BMNoC.

R1-18 (Time: 10:49 - 10:51)

Title	TSV-constrained Scan Chain Reordering for 3D ICs
Author	Wei-Ting Chen, Chia-Ching Chang, *Charles H.-P. Wen (National Chiao Tung University, Taiwan)
Page	pp. 102 - 107
Keyword	3D ICs, TSV, Scan Testing
Abstract	This paper formulates the scan-chain reordering problem considering a limited number of through-silicon vias (TSVs), and further develops an efficient 2-stage algorithm. For three-dimensional optimization, a greedy algorithm named Multiple Fragment Heuristic, combined with a dynamic closest-pair data structure, FastPair, is proposed to derive a good initial solution at stage 1. Later, stage 2 performs two local refinements, 3D Planarization and 3D Relaxation, to reduce the wire cost and the number of TSVs in use, respectively. Experiments show that the proposed algorithm achieves performance comparable to a genetic-algorithm-based method but runs at least three orders of magnitude faster, which makes it more practical for TSV-constrained scan-chain reordering for 3D ICs.