The 16th Workshop on Synthesis And System Integration of Mixed Information Technologies

Paper Session I: System Level Design and Design Experience (I)
Time: 10:15 - 12:00 Monday, October 18, 2010
Location: Ballroom
Chairs: Rung-Bin Lin (Yuan Ze University, Taiwan), Youhua Shi (Waseda University, Japan)

R1-1 (Time: 10:15 - 10:17)
Title: Placing Static and Stack Data into a Scratch-Pad Memory for Reducing the Energy Consumption of Multi-task Applications
Author: *Lovic Gauthier, Tohru Ishihara (Kyushu University, Japan), Hideki Takase (Nagoya University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroaki Takada (Nagoya University, Japan)
Page: pp. 7 - 12
Keyword: Energy consumption, Scratch-pad memory, Software, Multi-task, Stack
Abstract: Scratch-pad memories (SPM) are on-chip memory devices which are much smaller but much faster and which consume much less energy than off-chip memories. This paper presents two fully software-based techniques: one for sharing the SPM among several tasks, and one for managing the stack of each task between the SPM and the external main memory (MM). The paper then explains how to merge these techniques efficiently to further reduce energy consumption.

R1-2 (Time: 10:17 - 10:19)
Title: Aggressive Register Unsharing with Selective FU Sharing in High-Level Synthesis
Author: *Yuko Hara-Azumi, Toshinobu Matsuba (Nagoya University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Shinya Honda, Hiroaki Takada (Nagoya University, Japan)
Page: pp. 13 - 18
Keyword: High-Level Synthesis, Behavioral Synthesis, Aggressive Register Unsharing, Selective FU Sharing, Register Retiming
Abstract: A novel high-level synthesis technique to improve the clock frequency with little area overhead is presented. Our technique aims at suppressing area overhead while keeping the clock frequency as high as that of an existing work which achieves the highest clock frequency. Our proposed method performs selective functional unit (FU) sharing, which shares only large FUs in order to save circuit area efficiently and limit multiplexer (MUX) insertion. It builds on an existing technique called aggressive register unsharing, which removes many of the MUXs inserted before registers. Moreover, we propose hardware-component-level register retiming, which shortens critical path delays more effectively than traditional logic-level register retiming. Three sets of experiments demonstrated that our proposed method achieved up to 37.8% and on average 15.7% area reduction with negligible clock frequency degradation compared with the existing work.

R1-3 (Time: 10:19 - 10:21)
Title: Automatic Generation for Efficient Software TLM at Multiple Abstraction Layers
Author: Meng-Huan Wu, *Yi-Shan Lu, Wen-Chuan Lee, Chen-Yu Chuang, Ren-Song Tsay (Department of Computer Science, National Tsing Hua University, Taiwan)
Page: pp. 19 - 24
Keyword: hw/sw co-simulation, software abstraction
Abstract: In this paper, we propose a software Transaction-Level Modeling (TLM) approach for efficient HW/SW co-simulation. To preserve the concurrency of the simulated system, timing synchronization between the hardware and software simulations must be handled carefully. Improper timing synchronization leads to either poor simulation performance or inaccurate simulation results. Our approach achieves accurate yet efficient HW/SW co-simulation because it performs timing synchronization only at points where HW and SW actually interact. In addition, given the target software, three abstraction levels of software TLM models can be generated automatically based on the type of interactions concerned. The experimental results show that our software TLM models reach a speed of 3 million instructions per second (MIPS) at the low abstraction level and up to 248 MIPS at higher abstraction levels. Hence, designers can use our approach to obtain an efficient HW/SW co-simulation simply by selecting the abstraction layer that fits their needs.
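
The idea of synchronizing only where HW and SW interact can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `count_syncs`, the trace format, and the cycle counts are all hypothetical, chosen only to show how deferring synchronization reduces the number of handshakes while the total simulated time stays the same.

```python
def count_syncs(trace, sync_every_instruction):
    """Replay a software trace and count timing synchronizations.

    trace: list of (cycles, touches_hw) tuples, one per instruction
           (a hypothetical format assumed for this sketch).
    Returns (number_of_syncs, total_simulated_cycles).
    """
    synced_time = 0   # time already reported to the (imaginary) simulation kernel
    pending = 0       # locally accumulated time not yet reported
    syncs = 0
    for cycles, touches_hw in trace:
        pending += cycles
        if sync_every_instruction or touches_hw:
            synced_time += pending   # hand the accumulated time over in one step
            pending = 0
            syncs += 1
    return syncs, synced_time + pending


# Five instructions, two of which access hardware.
trace = [(1, False), (2, False), (1, True), (3, False), (1, True)]
print(count_syncs(trace, sync_every_instruction=True))
print(count_syncs(trace, sync_every_instruction=False))
```

Both runs account for the same total number of cycles; only the number of synchronization points differs.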

R1-4 (Time: 10:21 - 10:23)
Title: Evaluation of Two Operating Systems for Lego Mindstorms NXT
Author: *Wing-Kwong Wong (Department of Electronic Engineering, National Yunlin University of Science and Technology, Taiwan), Fu-Hsien Lin (Graduate School of Electronic Engineering, National Yunlin University of Science and Technology, Taiwan)
Page: pp. 25 - 30
Keyword: Embedded systems, NxtOSEK, MicroC/OS, Lego Mindstorms NXT, Operating systems
Abstract: Lego Mindstorms NXT is used as a hardware platform for comparing two embedded operating systems (OS). NxtOSEK is available as an open-source project that includes both device drivers and an OS kernel. We have successfully ported MicroC/OS to replace the NxtOSEK kernel while keeping the device drivers. Following previous work on the evaluation of embedded operating systems, we use a number of software-based measurements to evaluate the performance of NxtOSEK and MicroC/OS, including preemptive scheduling, interrupt preemption, get/release semaphore, semaphore passing, and memory allocation. MicroC/OS performed significantly better in two aspects, and its kernel mechanisms are examined in detail in order to explain the speedup compared to NxtOSEK.

R1-5 (Time: 10:23 - 10:25)
Title: Concord: A Configurable SoC Prototyping Platform
Author: Chih-Chyau Yang, *Chen-Yen Lin, Hui-Ming Lin, Yui-Chih Shih, Hsi-Tse Wu, Shi-Lun Chen, Tien-Ching Wang, Chien-Ming Wu, Chun-Ming Huang, Chin-Long Wey (National Chip Implementation Center, Taiwan)
Page: pp. 31 - 36
Keyword: SoC prototyping, CONCORD, verification platform
Abstract: FPGA-based SoC verification boards have been commercially available for SoC verification prototyping. However, most of these boards were developed with fixed, hardwired architectures. Due to this lack of architectural flexibility, users can neither develop with on-chip buses and on-chip networks nor alter the architecture for specific applications. In addition, the system architecture of the FPGA-based SoC system may differ from that of the real chip. This paper presents a fully configurable SoC prototyping platform, namely CONCORD, which provides high flexibility in connection interfaces, high flexibility and high architectural compatibility for design changes, and high modularity for specific applications. In order to demonstrate the effectiveness of the developed CONCORD verification platform, this paper also presents three configurations for embedded systems with the most popular cores, such as ARM, OpenRISC, and LEON.

R1-6 (Time: 10:25 - 10:27)
Title: Generation Method of Decomposed Small Area Instruction Decoder for Configurable Processor
Author: *Hiroki Ohsawa, Hirofumi Iwato, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Page: pp. 37 - 41
Keyword: small area instruction decoder, configurable processor, ASIP
Abstract: This paper studies a method for generating a decomposed, small-area instruction decoder for configurable processors. Since Application Specific Instruction set Processors (ASIPs) are widely used in embedded systems, they are required to have smaller area, higher performance, and lower power consumption. This paper proposes a method for generating a small-area instruction decoder using a decomposed instruction decoder model. We focus on the number of pipeline registers in the controller. The proposed method minimizes the number of pipeline registers by generating control signals in two or more stages. Experimental results show that the proposed method achieves an 85% reduction in pipeline registers for control signals in the controller compared to the conventional method.

R1-7 (Time: 10:27 - 10:29)
Title: A High-speed VLSI Architecture of Output Probability and Likelihood Score Computations for HMM-based Recognition Systems
Author: *Ryo Shimazaki, Kazuhiro Nakamura, Mashatoshi Yamamoto, Kazuyoshi Takagi (Nagoya University, Japan), Naofumi Takagi (Kyoto University, Japan)
Page: pp. 42 - 47
Keyword: speech recognition, VLSI architecture, HMM, likelihood score computation, output probability computation
Abstract: We present a VLSI architecture for output probability computations (OPCs) of continuous Hidden Markov Models (HMMs) and likelihood score computations (LSCs) that supports store-based block parallel processing (StoreBPP). We also demonstrate fast store-based block parallel processing (FastStoreBPP), which exploits the full performance of StoreBPP, and present a high-speed VLSI architecture that supports it. A comparison demonstrates the efficiency of the architecture.

R1-8 (Time: 10:29 - 10:31)
Title: Improved Local Horizontal and Vertical Common Subexpression Elimination Method for Constant Multiple Multiplication
Author: *Yasuhiro Takahashi, Toshikazu Sekine (Gifu University, Japan), Michio Yokoyama (Yamagata University, Japan)
Page: pp. 48 - 53
Keyword: multiplierless filter, common subexpression elimination, constant multiplication
Abstract: Common subexpression elimination (CSE) techniques address the issue of minimizing the number of adders needed to implement multiple constant multiplication (MCM) blocks. In this paper, we propose a new CSE method that combines horizontal and vertical techniques. The proposed method first examines the frequency of higher-order horizontal common subexpressions, i.e., 3-5 bits, and then searches for vertical common subexpressions. Our simulation results show that our method offers a good tradeoff between implementation cost and synthesis run-time in comparison with conventional methods.

R1-9 (Time: 10:31 - 10:33)
Title: Improved Normalized Image Reconstruction for Iris Recognition
Author: *Hyo Jin Nam, Harsh Durga Tiwari, Yong Beom Cho (Konkuk University, Republic of Korea)
Page: pp. 54 - 57
Keyword: Iris recognition, Segmentation process, Normalization, Intel PXA255
Abstract: Iris recognition is one of the most common identification systems in use today. Compared with other biometric features such as fingerprints and faces, iris patterns are more reliable and stable. To compensate for variation, iris recognition commonly requires translating the segmented iris image into a normalized image. This paper focuses on the implementation of improved normalized image formation by employing a modified segmentation method, which can reduce the execution time by a factor of ten.

R1-10 (Time: 10:33 - 10:35)
Title: Inter-Island Delay Aware Communication Synthesis for Island-Based Distributed Register Architecture
Author: Juinn-Dar Huang, *Chia-I Chen, Wan-Ling Hsu, Yen-Ting Lin, Jing-Yang Jou (Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University, Taiwan)
Page: pp. 58 - 63
Keyword: Behavioral synthesis, distributed register-file, resource binding, scheduling
Abstract: In the deep-submicron era, wire delay is becoming the bottleneck in pursuing high system clock speeds. Several distributed register (DR) architectures have been proposed to cope with this problem by keeping most wires local. In this paper, a distributed register-file microarchitecture with inter-island delay (DRFM-IID) is proposed. Though DRFM-IID is also a DR-based architecture, it is more practical than the prior art, DRFM, in terms of its delay model. With interconnect delay taken into consideration, the synthesis task is inherently more complicated than with zero inter-island delay. Unexpected interconnect delay is very likely to have a serious impact on overall system performance due to a lengthened clock cycle time. Hence, we also provide a performance-driven architectural synthesis framework targeting DRFM-IID to optimize system performance. Multiple factors, such as the number of inter-island transfers, the criticality of each transfer, and resource utilization, are considered to obtain a better solution. The experimental results indicate that the latency and the number of inter-cluster transfers can be reduced by 26.91% and 37.54% on average, respectively; the latter is also widely used as a metric of communication power consumption.

R1-11 (Time: 10:35 - 10:37)
Title: MorFPGA: A Modularized FPGA-Based Embedded System Development Platform
Author: Yu-Tsang Chang, Chun-Ming Huang, Chien-Ming Wu, Chun-Yu Chen, *Yu-Sheng Lin, Chih-Ting Kuo, Ting-Chun Liu, Chin-Long Wey (National Chip Implementation Center, Taiwan)
Page: pp. 64 - 69
Keyword: Embedded System, SoC, FPGA, Modularized Structure, LEON3
Abstract: With the ever-increasing complexity of System-on-a-Chip (SoC) designs, the pressure of short time to market, and low cost requirements, platform-based design paradigms have become common in SoC design. Modular and flexible design has become an important feature for enhancing the expandability and reconfigurability of the system. This paper presents a modularized FPGA-based embedded system platform for a digital photo frame application using the open-source processor core LEON3. An extra touch panel module, which is not natively supported by the LEON3 GRLIB library, is introduced and successfully integrated in this application.

R1-12 (Time: 10:37 - 10:39)
Title: A Novel Design-Methodology for PCB Traces Ensuring High Signal-Integrity on Random Signals
Author: *Masami Ishiguro, Shohei Akita, Hiroki Shimada, Noriyuki Aibe (University of Tsukuba, Japan), Ikuo Yoshihara (University of Miyazaki, Japan), Moritoshi Yasunaga (University of Tsukuba, Japan)
Page: pp. 70 - 75
Keyword: Signal Integrity, Transmission Line, Random Signal
Abstract: We have already proposed a novel transmission line called “Segmental Transmission Line (STL)”, which can ensure high signal integrity for high-speed signals in PCB traces. Up to now, however, the design methodology of the STL has been limited to clock signals. In this paper, we propose a novel design methodology of the STL for random signals and fabricate a scaled-up prototype based on the proposed methodology. We also demonstrate its effectiveness by comparing the prototype with a conventional transmission line.

R1-13 (Time: 10:39 - 10:41)
Title: A Novel IR-Drop Tolerant Scheduling for Reliability-Aware Datapaths
Author: *Keisuke Inoue, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan)
Page: pp. 76 - 81
Keyword: datapath synthesis
Abstract: In this paper, we discuss robustness against IR-drop risk at the high level. We propose a new IR-drop model which has two phases. In the first phase, the functional unit delay increases, and in the second phase, a fatal error due to electromigration occurs. We handle the first phase by inserting timing margin, and we prevent the second phase from being reached in any control step. Based on the IR-drop model, we formulate our problem as a scheduling problem. Our scheduling-based approach achieves robustness against IR-drop without using specialized devices, e.g., multiple supply voltages.

R1-14 (Time: 10:41 - 10:43)
Title: A Physics-Based Compact Model for the 1/f Noise in p-type Si/SiGe/Si Heterostructure MOSFETs
Author: *Chia-Yu Chen (Stanford University, U.S.A.), Chi-Chao Wang, Yun Ye (Arizona State University, U.S.A.), Yang Liu (Stanford University, U.S.A.), Junko Sato-Iwanaga, Akira Inoue, Haruyuki Sorada (Panasonic Electronics, Japan), Yu Cao (Arizona State University, U.S.A.), Robert Dutton (Stanford University, U.S.A.)
Page: pp. 82 - 83
Keyword: 1/f noise, screening effect, SiGe p-HMOS, compact model, heterostructure
Abstract: A physics-based 1/f noise model for p-type Si/SiGe/Si heterostructure MOSFETs (SiGe p-HMOS) is developed that can predict the charge distribution in dual channels and calculate the noise contributions from the two channels in circuit simulators. 1/f noise behavior in SiGe p-HMOS can be modeled by incorporating the capacitance of a Si cap layer into a conventional MOS model and considering dual-channel screening effects. Based on the proposed model, excellent agreement among the compact model, TCAD simulations, and measurements is observed at different bias conditions.

R1-15 (Time: 10:43 - 10:45)
Title: On Behavioral Modeling for Sigma-Delta Digital-to-Analog Converters with Accurate Timing Response
Author: *Hsin-Yu Luo, Hsiu-Wen Li, Xiao-Qian Chang, Chien-Nan Jimmy Liu (National Central University, Taiwan)
Page: pp. 84 - 89
Keyword: sigma-delta DAC, Behavioral model, bottom-up extraction
Abstract: In this paper, an efficient bottom-up extraction approach is proposed to build accurate behavioral models for sigma-delta digital-to-analog converters (DACs). In the special extraction mode, specific patterns can be used to obtain the key circuit parameters of the design in a short time without separating the design into several sub-blocks. Actual loading effects and parasitics can be considered automatically, which makes our modeling approach more suitable for existing IPs and flattened post-layout designs. In the experiments, comparisons among our behavioral model, a top-down behavioral model, and HSPICE simulation demonstrate the accuracy and efficiency of the proposed modeling strategy.

R1-16 (Time: 10:45 - 10:47)
Title: Self-Tuning Metric and Control Policy to Optimally Trade-off Lifetime Performance-Power-Reliability
Author: *Evelyn Mintarno, Joelle Skaf (Stanford University, U.S.A.), Rui Zheng, Jyothi Velamala, Yu Cao (Arizona State University, U.S.A.), Stephen Boyd, Robert W. Dutton, Subhasish Mitra (Stanford University, U.S.A.)
Page: pp. 90 - 95
Keyword: Circuit aging, Energy efficiency, Reliability
Abstract: An optimization framework and control policies are presented to find the optimal self-tuning over lifetime, which guarantees functional operation in the presence of circuit aging and optimally trades off performance, power, and reliability over lifetime. A weighted function of total performance achieved, total energy consumed, and total reliability is considered as a metric to be maximized, subject to constraints imposed by the user and the underlying hardware. Self-tuning policies for both offline and online aging estimation methods are described. Dynamic cooling is introduced as one of the self-tuning parameters, in addition to supply voltage and clock frequency. Simulation results using aging models validated by 45nm CMOS stress measurements demonstrate the effectiveness and practicality of the approach.

R1-17 (Time: 10:47 - 10:49)
Title: A Throughput-aware BusMesh NoC Configuration Algorithm Utilizing the Communication Rate between IP Cores
Author: *SeungJu Lee, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa (Waseda University, Japan)
Page: pp. 96 - 101
Keyword: Network-on-Chip (NoC), BusMesh NoC (BMNoC), A novel NoC algorithm, BMNoC configuration algorithm
Abstract: BusMesh NoC (BMNoC) is composed of bus-based connections and global mesh routers to enhance the performance of on-chip communication. In this paper, we propose a BMNoC configuration algorithm together with simulation results. In the BMNoC configuration algorithm, IP cores which have a heavy communication rate between them are connected by a bus, and then we configure CNs. CNs can communicate with each other via ESes and MRs. Furthermore, the simulation results show better latency than earlier studies and demonstrate the feasibility of BMNoC.
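
The step of connecting heavily communicating IP cores by a bus can be sketched as a greedy grouping. This is only an illustration under stated assumptions, not the algorithm from the paper: the function `group_cores_by_traffic`, the `max_bus_size` limit, and the example rates are hypothetical, and the later configuration of CNs, ESes, and MRs is not modeled.

```python
def group_cores_by_traffic(rates, max_bus_size):
    """Greedily group IP cores so that heavy communicators share a bus.

    rates: dict mapping (core_a, core_b) -> communication rate
           (hypothetical input format assumed for this sketch).
    max_bus_size: assumed upper bound on the number of cores per bus.
    Returns a sorted list of sorted core-name lists, one per bus group.
    """
    # Start with every core alone in its own group.
    group_of = {}
    for a, b in rates:
        group_of.setdefault(a, {a})
        group_of.setdefault(b, {b})

    # Consider the heaviest-communicating pairs first.
    for (a, b), _rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        ga, gb = group_of[a], group_of[b]
        if ga is not gb and len(ga) + len(gb) <= max_bus_size:
            merged = ga | gb
            for core in merged:
                group_of[core] = merged

    # Several cores point at the same set object; keep each group once.
    unique = {id(g): g for g in group_of.values()}
    return sorted(sorted(g) for g in unique.values())


rates = {
    ("cpu", "mem"): 90,
    ("dsp", "mem"): 60,
    ("cpu", "dsp"): 50,
    ("usb", "uart"): 5,
    ("cpu", "usb"): 1,
}
print(group_cores_by_traffic(rates, max_bus_size=3))
```

With these example rates, the three cores that exchange the most data end up on one bus, and the two low-traffic peripherals form a second group.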

R1-18 (Time: 10:49 - 10:51)
Title: TSV-constrained Scan Chain Reordering for 3D ICs
Author: Wei-Ting Chen, Chia-Ching Chang, *Charles H.-P. Wen (National Chiao Tung University, Taiwan)
Page: pp. 102 - 107
Keyword: 3D ICs, TSV, Scan Testing
Abstract: This paper formulates the scan-chain reordering problem considering a limited number of through-silicon vias (TSVs), and further develops an efficient 2-stage algorithm. For three-dimensional optimization, a greedy algorithm named Multiple Fragment Heuristic, combined with the dynamic closest-pair data structure FastPair, is proposed to derive a good initial solution at stage 1. Stage 2 then applies two local refinements, 3D Planarization and 3D Relaxation, to reduce the wire cost and the number of TSVs in use, respectively. Experiments show that the proposed algorithm achieves performance comparable to a genetic-algorithm-based method but runs at least three orders of magnitude faster, which makes it more practical for TSV-constrained scan-chain reordering for 3D ICs.
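
The Multiple Fragment Heuristic named in the abstract is a general greedy method for building a tour or chain: edges are taken shortest first, and an edge is kept only if it neither gives a node a third neighbour nor closes a cycle. The sketch below is a plain version of that heuristic under simplifying assumptions. It sorts all pairs by brute force instead of using FastPair, uses Euclidean distance on hypothetical `(x, y, z)` cell coordinates, and ignores the TSV limit and the stage-2 refinements described in the paper.

```python
from itertools import combinations
from math import dist


def multiple_fragment_path(points):
    """Order points into a single chain with the greedy-edge heuristic.

    points: list of coordinate tuples, e.g. (x, y, z).
    Returns a list of point indices in chain order.
    """
    n = len(points)
    if n < 2:
        return list(range(n))

    parent = list(range(n))          # union-find: one tree per fragment
    degree = [0] * n                 # a chain node has at most two neighbours
    adj = [[] for _ in range(n)]

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    # Brute-force candidate list: every pair, shortest first.
    edges = sorted(combinations(range(n), 2),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
    used = 0
    for a, b in edges:
        ra, rb = find(a), find(b)
        # Accept only edges that join the ends of two different fragments.
        if degree[a] < 2 and degree[b] < 2 and ra != rb:
            parent[ra] = rb
            degree[a] += 1
            degree[b] += 1
            adj[a].append(b)
            adj[b].append(a)
            used += 1
            if used == n - 1:        # all fragments merged into one chain
                break

    # Walk the chain starting from one of its two end points.
    start = next(v for v in range(n) if degree[v] == 1)
    order, prev, cur = [start], None, start
    while len(order) < n:
        nxt = next(v for v in adj[cur] if v != prev)
        order.append(nxt)
        prev, cur = cur, nxt
    return order


cells = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 1)]
print(multiple_fragment_path(cells))
```

Sorting all pairs costs O(n^2 log n), which is acceptable for a small illustration; a dynamic closest-pair structure such as the FastPair mentioned in the abstract is one way to avoid materializing the full edge list.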