Title | Placing Static and Stack Data into a Scratch-Pad Memory for Reducing the Energy Consumption of Multi-task Applications |
Author | *Lovic Gauthier, Tohru Ishihara (Kyushu University, Japan), Hideki Takase (Nagoya University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroaki Takada (Nagoya University, Japan) |
Page | pp. 7 - 12 |
Keyword | Energy consumption, Scratch-pad memory, Software, Multi-task, Stack |
Abstract | Scratch-pad memories (SPMs) are on-chip memory devices that are much smaller, much faster, and consume much less energy than off-chip memories. This paper presents two purely software techniques: one for sharing the SPM among several tasks, and one for managing the stack of each task between the SPM and the external main memory (MM). The paper then explains how to merge these techniques efficiently to achieve further energy consumption reduction. |
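The abstract does not describe how data are chosen for the SPM. A common textbook formulation for static data placement is a 0/1 knapsack: each data object has a size and an estimated energy saving if placed in the SPM, and the goal is to maximize total saving within the SPM capacity. The sketch below assumes that formulation; the function name and the (name, size, saving) inputs are hypothetical, and it does not model the paper's multi-task sharing or stack management.

```python
def choose_spm_objects(objects, spm_size):
    """0/1 knapsack over data objects (hypothetical helper).

    objects: list of (name, size_bytes, energy_saving) tuples.
    Returns (best_total_saving, names_placed_in_spm).
    """
    # best[c] = (saving, chosen names) achievable with capacity c
    best = [(0.0, [])] * (spm_size + 1)
    for name, size, saving in objects:
        # Iterate capacity downwards so each object is placed at most once.
        for cap in range(spm_size, size - 1, -1):
            candidate = best[cap - size][0] + saving
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - size][1] + [name])
    return best[spm_size]


# Usage: a 5-byte SPM and three candidate objects.
print(choose_spm_objects([("a", 4, 10), ("b", 3, 7), ("c", 2, 5)], 5))
```

Here placing "b" and "c" together saves more than placing "a" alone, which a greedy pick of the single most profitable object would miss.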
Title | Aggressive Register Unsharing with Selective FU Sharing in High-Level Synthesis |
Author | *Yuko Hara-Azumi, Toshinobu Matsuba (Nagoya University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Shinya Honda, Hiroaki Takada (Nagoya University, Japan) |
Page | pp. 13 - 18 |
Keyword | High-Level Synthesis, Behavioral Synthesis, Aggressive Register Unsharing, Selective FU Sharing, Register Retiming |
Abstract | A novel high-level synthesis technique that improves the clock frequency with little area overhead is presented. Our technique aims to suppress area overhead while keeping the clock frequency as high as that of an existing work that achieves the highest clock frequency. Building on an existing technique called aggressive register unsharing, which removes a significant number of the multiplexers (MUXs) inserted before registers, our method performs selective functional unit (FU) sharing: only large FUs are shared, which efficiently reduces circuit area and MUX insertion. Moreover, we propose hardware-component-level register retiming, which shortens critical path delays more effectively than traditional logic-level register retiming. Three sets of experiments demonstrated that the proposed method achieved area reductions of up to 37.8% (15.7% on average) compared with the existing work, with negligible clock frequency degradation. |
Title | Automatic Generation for Efficient Software TLM at Multiple Abstraction Layers |
Author | Meng-Huan Wu, *Yi-Shan Lu, Wen-Chuan Lee, Chen-Yu Chuang, Ren-Song Tsay (Department of Computer Science, National Tsing Hua University, Taiwan) |
Page | pp. 19 - 24 |
Keyword | hw/sw co-simulation, software abstraction |
Abstract | In this paper, we propose a software Transaction-Level Modeling (TLM) approach for efficient HW/SW co-simulation. To preserve the concurrency of the simulated system, timing synchronization between the hardware and software simulations must be handled carefully, because improper timing synchronization leads to either poor simulation performance or inaccurate simulation results. Our approach achieves accurate yet efficient HW/SW co-simulation because it performs timing synchronization only at the points where HW and SW actually interact. In addition, given the target software, software TLM models at three abstraction levels can be generated automatically, depending on the type of interactions of concern. Experimental results show that our software TLM models run at 3 million instructions per second (MIPS) at the low abstraction level and at up to 248 MIPS at higher abstraction levels. Designers can therefore use our approach to achieve efficient HW/SW co-simulation simply by selecting the abstraction layer that fits their needs. |
Title | Evaluation of Two Operating Systems for Lego Mindstorms NXT |
Author | *Wing-Kwong Wong (Department of Electronic Engineering, National Yunlin University of Science and Technology, Taiwan), Fu-Hsien Lin (Graduate School of Electronic Engineering, National Yunlin University of Science and Technology, Taiwan) |
Page | pp. 25 - 30 |
Keyword | Embedded systems, NxtOSEK, MicroC/OS, Lego Mindstorms NXT, Operating systems |
Abstract | Lego Mindstorms NXT is used as a hardware platform for comparing two embedded operating systems (OSs). NxtOSEK is available as an open-source project that includes both device drivers and an OS kernel. We have ported MicroCOS to replace the NxtOSEK kernel while keeping its device drivers. Following previous work on the evaluation of embedded operating systems, we use a set of software-based measurements to evaluate the performance of NxtOSEK and MicroCOS, covering preemptive scheduling, interrupt preemption, semaphore get/release, semaphore passing, and memory allocation. MicroCOS performed significantly better in two of these aspects, and its kernel mechanisms are examined in detail to explain its speedup over NxtOSEK. |
Title | Concord: A Configurable SoC Prototyping Platform |
Author | Chih-Chyau Yang, *Chen-Yen Lin, Hui-Ming Lin, Yui-Chih Shih, Hsi-Tse Wu, Shi-Lun Chen, Tien-Ching Wang, Chien-Ming Wu, Chun-Ming Huang, Chin-Long Wey (National Chip Implementation Center, Taiwan) |
Page | pp. 31 - 36 |
Keyword | SoC prototyping, CONCORD, verification platform |
Abstract | FPGA-based SoC verification boards are commercially available for SoC verification and prototyping. However, most of these boards have fixed, hardwired architectures. Because of this lack of architectural flexibility, users cannot develop designs with on-chip buses and on-chip networks, nor alter the architecture for specific applications. In addition, the system architecture of the FPGA-based SoC may differ from that of the real chip. This paper presents a fully configurable SoC prototyping platform, CONCORD, which provides high flexibility in connection interfaces, high flexibility and architectural compatibility for design changes, and high modularity for specific applications. To demonstrate the effectiveness of the CONCORD verification platform, this paper also presents three embedded-system configurations built with popular cores such as ARM, OpenRISC, and LEON. |
Title | Generation Method of Decomposed Small Area Instruction Decoder for Configurable Processor |
Author | *Hiroki Ohsawa, Hirofumi Iwato, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan) |
Page | pp. 37 - 41 |
Keyword | small area instruction decoder, configurable processor, ASIP |
Abstract | This paper studies a method for generating a decomposed, small-area instruction decoder for configurable processors. Because Application Specific Instruction set Processors (ASIPs) are widely used in embedded systems, they must be designed for ever smaller area, higher performance, and lower power consumption. This paper proposes a method for generating a small-area instruction decoder using a decomposed instruction decoder model, focusing on the number of pipeline registers in the controller. The proposed method minimizes the number of pipeline registers by generating control signals in two or more pipeline stages. Experimental results show that the proposed method reduces the number of pipeline registers for control signals in the controller by 85% compared with the conventional method. |
Title | A High-speed VLSI Architecture of Output Probability and Likelihood Score Computations for HMM-based Recognition Systems |
Author | *Ryo Shimazaki, Kazuhiro Nakamura, Masatoshi Yamamoto, Kazuyoshi Takagi (Nagoya University, Japan), Naofumi Takagi (Kyoto University, Japan) |
Page | pp. 42 - 47 |
Keyword | speech recognition, VLSI architecture, HMM, likelihood score computation, output probability computation |
Abstract | We present a VLSI architecture for output probability computations (OPCs) of continuous Hidden Markov Models (HMMs) and likelihood score computations (LSCs) that supports store-based block parallel processing (StoreBPP). We also present fast store-based block parallel processing (FastStoreBPP), which exploits the full performance of StoreBPP, together with a high-speed VLSI architecture that supports it. A comparison demonstrates the efficiency of the architecture. |
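The abstract does not spell out the OPC itself. In continuous HMMs, the output probability of a state is usually the log of a Gaussian mixture density evaluated on each feature frame. The sketch below assumes diagonal covariances and uses log-sum-exp for numerical stability; the helper names are hypothetical, and the block loop only illustrates scoring a block of frames per state, not the StoreBPP hardware.

```python
import math


def log_output_prob(obs, means, variances, weights):
    """log b_j(o) for one HMM state with a diagonal-covariance Gaussian mixture.

    means and variances hold one per-dimension list per mixture component;
    weights holds the mixture weights (assumed to sum to 1).
    """
    terms = []
    for mu, var, w in zip(means, variances, weights):
        log_g = math.log(w)
        for o, m, v in zip(obs, mu, var):
            log_g -= 0.5 * (math.log(2 * math.pi * v) + (o - m) ** 2 / v)
        terms.append(log_g)
    top = max(terms)  # log-sum-exp avoids underflow when summing mixtures
    return top + math.log(sum(math.exp(t - top) for t in terms))


def block_scores(frames, states):
    """Score every state against every frame of a block (hypothetical helper).

    states: list of (means, variances, weights) tuples, one per state.
    """
    return [[log_output_prob(o, *s) for o in frames] for s in states]
```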
Title | Improved Local Horizontal and Vertical Common Subexpression Elimination Method for Constant Multiple Multiplication |
Author | *Yasuhiro Takahashi, Toshikazu Sekine (Gifu University, Japan), Michio Yokoyama (Yamagata University, Japan) |
Page | pp. 48 - 53 |
Keyword | multiplierless filter, common subexpression elimination, constant multiplication |
Abstract | Common subexpression elimination (CSE) techniques address the problem of minimizing the number of adders needed to implement multiple constant multiplication (MCM) blocks. In this paper, we propose a new CSE method that combines horizontal and vertical techniques. The proposed method first searches for frequently occurring higher-order horizontal common subexpressions, i.e., those of 3 to 5 bits, and then searches for vertical ones. Our simulation results show that our method offers a good tradeoff between implementation cost and synthesis run-time compared with conventional methods. |
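Horizontal CSE typically starts from a canonical signed-digit (CSD) form of each constant and counts how often a digit pattern recurs across the constants; a frequent pattern is built once and reused, saving adders. The sketch below assumes CSD encoding and, for brevity, only counts 2-nonzero-digit patterns (the method described above searches 3 to 5-bit patterns). It counts overlapping occurrences, which a real CSE pass would have to resolve. Function names are hypothetical.

```python
from collections import Counter


def to_csd(n):
    """CSD digits of a non-negative int, LSB first, each in {-1, 0, 1}."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def horizontal_pairs(constants):
    """Count 2-digit horizontal patterns keyed by (digit distance, sign product)."""
    counts = Counter()
    for c in constants:
        csd = to_csd(c)
        nonzero = [i for i, d in enumerate(csd) if d != 0]
        for a in range(len(nonzero)):
            for b in range(a + 1, len(nonzero)):
                i, j = nonzero[a], nonzero[b]
                counts[(j - i, csd[i] * csd[j])] += 1
    return counts


# Usage: in 5, 10 and 21 the pattern x + (x << 2) recurs most often.
print(horizontal_pairs([5, 10, 21]).most_common(1))
```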
Title | Improved Normalized Image Reconstruction for Iris Recognition |
Author | *Hyo Jin Nam, Harsh Durga Tiwari, Yong Beom Cho (Konkuk University, Republic of Korea) |
Page | pp. 54 - 57 |
Keyword | Iris recognition, Segmentation process, Normalization, Intel PXA255 |
Abstract | Iris recognition is one of the most common identification systems in use today. Compared with other biometric features such as fingerprints and faces, iris patterns are more reliable and stable. To compensate for variation in the captured images, conventional iris recognition requires transforming the segmented iris image into a normalized image. This paper focuses on implementing improved normalized-image formation using a modified segmentation method that reduces execution time by a factor of ten. |
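The abstract does not say which normalization is used. A widely used choice is Daugman's rubber sheet model, which maps the annulus between the pupil and iris boundaries onto a fixed-size rectangle in (radius, angle) coordinates. The sketch below assumes that model, circular boundaries given as (cx, cy, radius) tuples, and nearest-neighbour sampling; the function name is hypothetical and it does not reproduce the modified segmentation method.

```python
import math


def rubber_sheet(image, pupil, iris, n_radial=8, n_angular=32):
    """Map the pupil-to-iris annulus to an n_radial x n_angular grid.

    image is a list of pixel rows; pupil and iris are (cx, cy, radius).
    """
    h, w = len(image), len(image[0])
    px, py, pr = pupil
    ix, iy, ir = iris
    out = []
    for k in range(n_radial):
        r = k / (n_radial - 1) if n_radial > 1 else 0.0  # 0 = pupil, 1 = iris edge
        row = []
        for m in range(n_angular):
            theta = 2 * math.pi * m / n_angular
            c, s = math.cos(theta), math.sin(theta)
            # Boundary points on the pupil and iris circles at this angle.
            xp, yp = px + pr * c, py + pr * s
            xi, yi = ix + ir * c, iy + ir * s
            # Linear interpolation between the two boundaries.
            x = (1 - r) * xp + r * xi
            y = (1 - r) * yp + r * yi
            col = min(max(int(round(x)), 0), w - 1)
            lin = min(max(int(round(y)), 0), h - 1)
            row.append(image[lin][col])
        out.append(row)
    return out
```

Interpolating between two separately specified circles lets the mapping tolerate a pupil that is not concentric with the iris.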
Title | Inter-Island Delay Aware Communication Synthesis for Island-Based Distributed Register Architecture |
Author | Juinn-Dar Huang, *Chia-I Chen, Wan-Ling Hsu, Yen-Ting Lin, Jing-Yang Jou (Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University, Taiwan) |
Page | pp. 58 - 63 |
Keyword | Behavioral synthesis, distributed register-file, resource binding, scheduling |
Abstract | In the deep-submicron era, wire delay is becoming the bottleneck in pursuing high system clock speeds. Several distributed register (DR) architectures have been proposed to cope with this problem by keeping most wires local. In this paper, a distributed register-file microarchitecture with inter-island delay (DRFM-IID) is proposed. Although DRFM-IID is also a DR-based architecture, its delay model makes it more practical than the prior art, DRFM. Once interconnect delay is taken into account, the synthesis task is inherently more complicated than with zero inter-island delay, and unexpected interconnect delay is likely to seriously degrade overall system performance by lengthening the clock cycle time. Hence we also provide a performance-driven architectural synthesis framework targeting DRFM-IID to optimize system performance. Multiple factors, such as the number of inter-island transfers, transfer criticality, and resource utilization, are considered to obtain a better solution. Experimental results indicate that the latency and the number of inter-cluster transfers are reduced by 26.91% and 37.54% on average, respectively; the latter is also widely used as a metric of communication power consumption. |
Title | MorFPGA: A Modularized FPGA-Based Embedded System Development Platform |
Author | Yu-Tsang Chang, Chun-Ming Huang, Chien-Ming Wu, Chun-Yu Chen, *Yu-Sheng Lin, Chih-Ting Kuo, Ting-Chun Liu, Chin-Long Wey (National Chip Implementation Center, Taiwan) |
Page | pp. 64 - 69 |
Keyword | Embedded System, SoC, FPGA, Modularized Structure, LEON3 |
Abstract | With the ever-increasing complexity of Systems-on-a-Chip (SoCs), short time-to-market pressure, and low-cost requirements, platform-based design paradigms have become common in SoC design. Modularity and flexibility have become important for enhancing the expandability and reconfigurability of the system. This paper presents a modularized FPGA-based embedded system platform for a digital photo frame application using the open-source processor core LEON3. An extra touch panel module, which is not natively supported by the LEON3 GRLIB library, is introduced and successfully integrated into this application. |
Title | A Novel Design-Methodology for PCB Traces Ensuring High Signal-Integrity on Random Signals |
Author | *Masami Ishiguro, Shohei Akita, Hiroki Shimada, Noriyuki Aibe (University of Tsukuba, Japan), Ikuo Yoshihara (University of Miyazaki, Japan), Moritoshi Yasunaga (University of Tsukuba, Japan) |
Page | pp. 70 - 75 |
Keyword | Signal Integrity, Transmission Line, Random Signal |
Abstract | We previously proposed a novel transmission line called the “Segmental Transmission Line (STL)”, which ensures high signal integrity for high-speed signals on PCB traces. Until now, however, the STL design methodology has been limited to clock signals. In this paper, we propose a novel STL design methodology for random signals and fabricate a scaled-up prototype based on it. Using the prototype, we also demonstrate its effectiveness compared with a conventional transmission line. |
Title | A Physics-Based Compact Model for the 1/f Noise in p-type Si/SiGe/Si Heterostructure MOSFETs |
Author | *Chia-Yu Chen (Stanford University, U.S.A.), Chi-Chao Wang, Yun Ye (Arizona State University, U.S.A.), Yang Liu (Stanford University, U.S.A.), Junko Sato-Iwanaga, Akira Inoue, Haruyuki Sorada (Panasonic Electronics, Japan), Yu Cao (Arizona State University, U.S.A.), Robert Dutton (Stanford University, U.S.A.) |
Page | pp. 82 - 83 |
Keyword | 1/f noise, screening effect, SiGe p-HMOS, compact model, heterostructure |
Abstract | A physics-based 1/f noise model for p-type Si/SiGe/Si heterostructure MOSFETs (SiGe p-HMOS) is developed that can predict the charge distribution in the dual channels and calculate the noise contributions from both channels in circuit simulators. The 1/f noise behavior of SiGe p-HMOS can be modeled by incorporating the capacitance of a Si cap layer into a conventional MOS model and considering dual-channel screening effects. With the proposed model, excellent agreement among the compact model, TCAD simulations, and measurements is observed under different bias conditions. |
Title | On Behavioral Modeling for Sigma-Delta Digital-to-Analog Converters with Accurate Timing Response |
Author | *Hsin-Yu Luo, Hsiu-Wen Li, Xiao-Qian Chang, Chien-Nan Jimmy Liu (National Central University, Taiwan) |
Page | pp. 84 - 89 |
Keyword | sigma-delta DAC, Behavioral model, bottom-up extraction |
Abstract | In this paper, an efficient bottom-up extraction approach is proposed to build accurate behavioral models for sigma-delta digital-to-analog converters (DACs). In a special extraction mode, specific patterns can be used to obtain the key circuit parameters of a design in a short time without partitioning the design into sub-blocks. Actual loading effects and parasitics are taken into account automatically, which makes our modeling approach well suited to existing IPs and flattened post-layout designs. In the experiments, comparisons among our behavioral model, a top-down behavioral model, and HSPICE simulation demonstrate the accuracy and efficiency of the proposed modeling strategy. |
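For readers unfamiliar with what a behavioral model of a sigma-delta DAC describes, the sketch below shows the classic first-order loop: an integrator accumulates the difference between the input and the fed-back 1-bit output, and a low-pass stage recovers the signal from the bitstream. This is a generic illustration under stated assumptions, not the extracted model from the paper; the function names and the filter coefficient `alpha` are hypothetical and do not correspond to extracted circuit parameters.

```python
def sigma_delta_modulate(samples):
    """First-order sigma-delta modulator: inputs in (-1, 1) to a +/-1 bitstream."""
    integrator = 0.0
    feedback = 0.0  # previous 1-bit output fed back to the input summer
    bits = []
    for x in samples:
        integrator += x - feedback  # accumulate the quantization error
        feedback = 1.0 if integrator >= 0 else -1.0  # 1-bit quantizer
        bits.append(feedback)
    return bits


def reconstruct(bits, alpha=0.01):
    """One-pole low-pass filter standing in for the analog reconstruction stage."""
    y = 0.0
    out = []
    for b in bits:
        y += alpha * (b - y)
        out.append(y)
    return out


# Usage: a constant input of 0.5 yields a bitstream whose average is about 0.5.
bits = sigma_delta_modulate([0.5] * 2000)
print(sum(bits) / len(bits), reconstruct(bits)[-1])
```

Because the integrator stays bounded, the running average of the bitstream converges to the input, which is the property a behavioral model must preserve.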
Title | Self-Tuning Metric and Control Policy to Optimally Trade-off Lifetime Performance-Power-Reliability |
Author | *Evelyn Mintarno, Joelle Skaf (Stanford University, U.S.A.), Rui Zheng, Jyothi Velamala, Yu Cao (Arizona State University, U.S.A.), Stephen Boyd, Robert W. Dutton, Subhasish Mitra (Stanford University, U.S.A.) |
Page | pp. 90 - 95 |
Keyword | Circuit aging, Energy efficiency, Reliability |
Abstract | An optimization framework and control policies are presented to find the optimal self-tuning over a system's lifetime that guarantees functional operation in the presence of circuit aging while optimally trading off performance, power, and reliability. A weighted function of total performance achieved, total energy consumed, and total reliability is used as the metric to be maximized, subject to constraints imposed by the user and the underlying hardware. Self-tuning policies are described for both offline and online aging estimation methods. Dynamic cooling is introduced as a self-tuning parameter in addition to supply voltage and clock frequency. Simulation results using aging models validated by 45nm CMOS stress measurements demonstrate the effectiveness and practicality of the approach. |
Title | A Throughput-aware BusMesh NoC Configuration Algorithm Utilizing the Communication Rate between IP Cores |
Author | *SeungJu Lee, Masao Yanagisawa, Tatsuo Ohtsuki, Nozomu Togawa (Waseda University, Japan) |
Page | pp. 96 - 101 |
Keyword | Network-on-Chip (NoC), BusMesh NoC (BMNoC), A novel NoC algorithm, BMNoC configuration algorithm |
Abstract | BusMesh NoC (BMNoC) consists of bus-based connections and global mesh routers to enhance the performance of on-chip communication. In this paper, we propose a BMNoC configuration algorithm and present simulation results. In the configuration algorithm, IP cores with a heavy communication rate between them are connected by a bus, and the CNs are then configured. CNs communicate with each other via ESes and MRs. The simulation results show lower latency than earlier studies and demonstrate the feasibility of BMNoC. |
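The first step described above, putting heavily communicating IP cores on a shared bus, can be illustrated with a simple greedy clustering: visit core pairs from the highest to the lowest communication rate and merge their buses while a size limit allows. This is a minimal sketch under assumptions, not the authors' algorithm; the function name and the `max_bus_size` limit are hypothetical, and the CN, ES, and MR configuration is not modelled.

```python
def cluster_cores_by_rate(n_cores, rates, max_bus_size=4):
    """Greedily place the most heavily communicating IP cores on shared buses.

    rates maps a core pair (a, b) to its communication rate.
    """
    bus_of = {core: frozenset([core]) for core in range(n_cores)}
    # Visit core pairs from the heaviest to the lightest communication rate.
    for (a, b), _rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        bus_a, bus_b = bus_of[a], bus_of[b]
        if bus_a == bus_b or len(bus_a) + len(bus_b) > max_bus_size:
            continue  # already on the same bus, or merging would overload it
        merged = bus_a | bus_b
        for core in merged:
            bus_of[core] = merged
    # Each distinct bus becomes one cluster of cores.
    return sorted(sorted(bus) for bus in set(bus_of.values()))


# Usage: with at most two cores per bus, the two heaviest pairs share buses.
print(cluster_cores_by_rate(4, {(0, 1): 100, (2, 3): 90, (1, 2): 50, (0, 3): 5}, 2))
```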
Title | TSV-constrained Scan Chain Reordering for 3D ICs |
Author | Wei-Ting Chen, Chia-Ching Chang, *Charles H.-P. Wen (National Chiao Tung University, Taiwan) |
Page | pp. 102 - 107 |
Keyword | 3D ICs, TSV, Scan Testing |
Abstract | This paper formulates the scan-chain reordering problem under a limited number of through-silicon vias (TSVs) and develops an efficient 2-stage algorithm. In stage 1, a greedy algorithm, the Multiple Fragment Heuristic, combined with the dynamic closest-pair data structure FastPair, is used to derive a good initial solution for the three-dimensional problem. Stage 2 then performs two local refinements, 3D Planarization and 3D Relaxation, to reduce the wire cost and the number of TSVs in use, respectively. Experiments show that the proposed algorithm achieves performance comparable to a genetic-algorithm-based method while running at least three orders of magnitude faster, which makes it more practical for TSV-constrained scan-chain reordering in 3D ICs. |
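The Multiple Fragment heuristic named above is a greedy method: repeatedly take the cheapest remaining link whose endpoints both have fewer than two links and that does not close a cycle, until all scan cells form one path. The sketch below assumes a hypothetical cost model (Manhattan distance plus a fixed penalty per die crossed, standing in for TSV usage). It sorts all O(n^2) candidate links instead of using FastPair, and it neither enforces a TSV limit nor performs the stage-2 refinements.

```python
from itertools import combinations


def multiple_fragment_path(cells, tsv_penalty=10.0):
    """Order scan cells into one chain with the Multiple Fragment heuristic.

    cells: list of (x, y, layer); tsv_penalty is a hypothetical per-layer cost.
    """
    n = len(cells)
    parent = list(range(n))  # union-find over path fragments

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def cost(a, b):
        (x1, y1, l1), (x2, y2, l2) = cells[a], cells[b]
        return abs(x1 - x2) + abs(y1 - y2) + tsv_penalty * abs(l1 - l2)

    degree = [0] * n
    adj = [[] for _ in range(n)]
    added = 0
    for a, b in sorted(combinations(range(n), 2), key=lambda e: cost(*e)):
        if added == n - 1:
            break
        # Accept the link only if it keeps every fragment a simple path.
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            adj[a].append(b)
            adj[b].append(a)
            added += 1
    # Walk the finished path from one of its endpoints.
    start = next(i for i in range(n) if degree[i] <= 1)
    order, prev, cur = [start], None, start
    while len(order) < n:
        nxt = next(v for v in adj[cur] if v != prev)
        order.append(nxt)
        prev, cur = cur, nxt
    return order
```

With a large `tsv_penalty`, the heuristic links cells within each die first and crosses between dies as rarely as the greedy order allows.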