Title | Memory Synthesis for Multi-Processor System-on-Chips with Reconfigurable 3D-stacked SRAMs |
Author | Meng-Ling Tsai, *Yi-Jung Chen, Yi-Ting Chen, Ru-Hua Chang (National Chi Nan University, Taiwan) |
Page | pp. 2 - 7 |
Keyword | Memory Synthesis, Reconfigurable 3D-stacked SRAMs |
Abstract | Integrating Multi-Processor System-on-Chips (MPSoCs)
with 3D-stacked reconfigurable SRAM tiles has been
proposed for embedded systems with high memory demands. At
runtime, the SRAM tiles are configured into several memory
areas, which can be reconfigured according to the dynamic
behavior of the system. Targeting this architecture, in this
paper, we propose a data placement and memory area allocation
algorithm. The goal of the proposed algorithm is to optimize the
performance of the memory system by minimizing the on-chip
memory access latency, the number of off-chip memory accesses,
and the number of reconfigurations. Since the behavior of an
embedded system can be described by a set of scenarios, where
each scenario specifies a set of applications that would execute
concurrently, the proposed algorithm synthesizes data placements
and the memory area allocation for each scenario. The data access
patterns both within each scenario and across all scenarios are
considered for data placement. We evaluate the proposed algorithm
on a set of synthetic and real-world applications. The experimental
results show that, compared to the existing data placement method
designed for MPSoCs with distributed memory modules, the proposed
algorithm reduces data access latency by up to 11.72%. |
Title | Thermal-Pattern-Aware Voltage Assignment for Task Scheduler on 3D Multi-Core Processors |
Author | Chien-Hui Liao, *Cheng Suo, Charles Hung-Pin Wen (National Chiao Tung University, Taiwan) |
Page | pp. 8 - 9 |
Keyword | task scheduling, 3D MCPs, hotspots, DVFS, voltage assignment |
Abstract | In three-dimensional multi-core processors (3D-MCPs), hotspots occur more often and cause severe problems for system reliability and lifetime. Moreover, more frequent hotspots trigger dynamic voltage and frequency scaling (DVFS) more often, which degrades throughput. Therefore, to effectively reduce the frequency of hotspot occurrence, a new thermal-constrained task-scheduling algorithm based on thermal-pattern-aware voltage assignment is proposed. Based on the temperature profiles of different voltage assignments on 3D-MCPs, thermal-pattern-aware voltage assignment is applied to effectively reduce the rate of temperature increase in 3D-MCPs. Furthermore, the proposed scheduler includes on-line allocation for 3D vertically-grouping cores and a new vertically-grouping voltage scaling that considers the thermal correlation among vertically adjacent cores in 3D-MCPs. Experimental results show that, compared to the previous thermal-constrained task-scheduling strategy, our task-scheduling algorithm reduces the frequency of hotspot occurrence by 38.84% and further improves throughput by 6.62%. |
Title | High-Level Synthesis from Programs with External Interrupt Handling |
Author | *Naoya Ito, Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroyuki Kanbara (Advanced Scientific Technology & Management Research Institute of KYOTO, Japan) |
Page | pp. 10 - 15 |
Keyword | High-level synthesis, Binary synthesis, External interrupt, ACAP |
Abstract | This paper presents a method of synthesizing a given binary program that contains external interrupt handling into hardware whose behavior is equivalent to that of the CPU running the program. In our method, the system control coprocessor that the CPU uses for interrupt handling is incorporated into the hardware as a functional unit. Instructions for accessing coprocessor registers, returning from interrupt handling, and making system calls are scheduled as operations and bound to the coprocessor. Jump register instructions for calling and returning from interrupt service routines are synthesized using operations that convert instruction addresses into the corresponding states of the hardware. Assuming the MIPS R3000 as the CPU, the proposed method has been implemented on top of the binary synthesizer ACAP. A program of about 40 lines with an external interrupt service routine was synthesized into hardware, and interrupt handling was confirmed to work correctly. The execution cycles and the delay were reduced by 14% and 26%, respectively, at the cost of a 1.1-fold increase in hardware size. |
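To make the address-to-state conversion concrete, here is a minimal Python sketch. It assumes that every possible jump-register target is the entry address of a basic block and that each block entry gets its own FSM state; the function names and addresses are hypothetical and do not reflect how ACAP actually implements this.

```python
from typing import Dict, Iterable


def build_addr_to_state(block_entries: Iterable[int]) -> Dict[int, int]:
    """Number each basic-block entry address with an FSM state id (hypothetical scheme)."""
    return {addr: state for state, addr in enumerate(sorted(set(block_entries)))}


def jump_register(addr_to_state: Dict[int, int], reg_value: int) -> int:
    """Model of a synthesized `jr`: the register holds an instruction address at
    run time, and the hardware must select the state that begins that block."""
    if reg_value not in addr_to_state:
        raise ValueError(f"0x{reg_value:08x} is not a known block entry")
    return addr_to_state[reg_value]


# Hypothetical layout: main code, a return point, and an interrupt service routine.
table = build_addr_to_state([0x00400000, 0x00400080, 0x00400010])
isr_state = jump_register(table, 0x00400080)
```

In hardware, such a lookup would presumably become comparator and multiplexer logic rather than a dictionary; the sketch only shows the mapping itself.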
Title | An SOC Estimation System for Lithium Ion Batteries Considering Thermal Characteristics |
Author | *Ryu Ishizaki, Lei Lin, Naoki Kawarabayashi, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 16 - 21 |
Keyword | Extended Kalman Filter, SOC estimation, Arrhenius formula, Lithium ion Batteries |
Abstract | This paper discusses a state-of-charge (SOC) estimation system for lithium-ion batteries based on the Extended Kalman Filter. The accuracy of the estimation strongly depends on the accuracy of the battery model. We have newly formulated an equivalent circuit model that considers temperature and SOC dependencies. As a result, the estimation error rate has been improved significantly. The evaluation shows that the new SOC estimation system can be used over a wide temperature range. |
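As a rough illustration of the technique the abstract names, the Python sketch below runs a scalar Extended Kalman Filter over SOC, assuming a linear open-circuit-voltage curve and an internal resistance scaled by an Arrhenius term. The OCV coefficients, activation-energy constant, noise variances, and function names are hypothetical placeholders, not the authors' equivalent-circuit model.

```python
import math

# Hypothetical parameters for illustration only (not the paper's fitted model).
CAPACITY_AS = 2.0 * 3600.0    # cell capacity: 2 Ah expressed in ampere-seconds
OCV_A, OCV_B = 3.2, 1.0       # assumed linear open-circuit voltage: OCV = A + B * soc
R_REF, T_REF = 0.05, 298.15   # internal resistance (ohm) at the reference temperature (K)
EA_OVER_R = 2000.0            # assumed activation energy divided by the gas constant (K)


def resistance(temp_k: float) -> float:
    """Arrhenius-style temperature dependence: resistance grows as the cell cools."""
    return R_REF * math.exp(EA_OVER_R * (1.0 / temp_k - 1.0 / T_REF))


def terminal_voltage(soc: float, current: float, temp_k: float) -> float:
    """Terminal voltage under a discharge current (positive = discharging)."""
    return OCV_A + OCV_B * soc - resistance(temp_k) * current


def ekf_step(soc, p, current, v_meas, temp_k, dt=1.0, q=1e-7, r=1e-4):
    """One predict/update cycle of a scalar EKF whose state is the SOC."""
    # Predict with coulomb counting; the state-transition Jacobian is 1.
    soc_pred = soc - current * dt / CAPACITY_AS
    p_pred = p + q
    # Update: the measurement Jacobian dV/dSOC is the OCV slope.
    h = OCV_B
    innovation = v_meas - terminal_voltage(soc_pred, current, temp_k)
    gain = p_pred * h / (h * p_pred * h + r)
    return soc_pred + gain * innovation, (1.0 - gain * h) * p_pred


# Usage: the true cell starts at 90% SOC, the filter deliberately guesses 50%.
true_soc, est_soc, p = 0.9, 0.5, 0.1
for _ in range(200):
    current, temp_k = 1.0, 273.15            # 1 A discharge at 0 degrees C
    true_soc -= current * 1.0 / CAPACITY_AS  # simulated "real" cell
    v = terminal_voltage(true_soc, current, temp_k)
    est_soc, p = ekf_step(est_soc, p, current, v, temp_k)
```

In this toy model, a filter that ignored the temperature term (always using R_REF) would settle at an SOC biased by roughly (R(T) - R_REF) * I / OCV_B, about 0.04 at 0 degrees C with these placeholder values, which hints at why the battery model's temperature dependence matters for accuracy.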
Title | Dynamic Data Migration to Eliminate Bank-Level Interference for Stencil Applications in Multicore Systems |
Author | Wei-Hen Lo, *Yen-Hao Chen, TingTing Hwang (National Tsing Hua University, Taiwan) |
Page | pp. 22 - 27 |
Keyword | data migration, memory controller, page allocation, stencils, multi-threaded |
Abstract | A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Modern automatic transformation compiler frameworks can generate efficient tiled parallel stencil code, and dynamically scheduling parallel stencils significantly improves system performance. However, the memory contention problem is exacerbated because fewer cores are idle and more memory requests are sent to DRAM. The traditional OS page coloring method, which partitions memory pages in advance, cannot alleviate memory contention in dynamically scheduled parallel stencils. To address this issue, we provide a new software/hardware cooperative dynamic data migration method that exploits the update-and-reuse property of stencils. We observe that OS page allocation needs to account for the flexibility required by dynamic data migration in memory in order to eliminate memory interference. Experimental evaluation on an 8-core x86 system with 4 memory banks shows that our method improves system performance by 7% compared with dynamically scheduled stencils. |
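For readers unfamiliar with stencils, the Python sketch below shows the basic computation pattern the abstract describes, using a 2D five-point Jacobi-style update. The grid shape, averaging weights, and function names are illustrative assumptions; production stencil codes are tiled and parallelized by compiler frameworks, and this sketch does not model DRAM banks, page allocation, or data migration.

```python
from typing import List

Grid = List[List[float]]


def jacobi_step(grid: Grid) -> Grid:
    """One sweep of a 2D 5-point stencil (Jacobi style).

    Each interior point becomes the average of itself and its four nearest
    neighbours; boundary points are kept fixed. Reading from `grid` while
    writing into a fresh copy means a sweep never sees its own partial updates.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return out


def run_stencil(grid: Grid, steps: int) -> Grid:
    """Repeat the sweep: every time step rereads the grid the previous step wrote."""
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid


grid0 = [[5.0, 5.0, 5.0],
         [5.0, 0.0, 5.0],
         [5.0, 5.0, 5.0]]
after_one = run_stencil(grid0, 1)   # centre becomes (0 + 4 * 5) / 5 = 4.0
```

Each time step streams over the entire grid that the previous step produced, which is why repeated sweeps generate steady memory traffic when many cores run them concurrently.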
Title | A Battery Smart Sensor and Its SOC Estimation Function for Assembled Lithium-Ion Batteries |
Author | *Naoki Kawarabayashi, Lei Lin, Ryu Ishizaki, Masahiro Fukui (Ritsumeikan University, Japan), Isao Shirakawa (University of Hyogo, Japan) |
Page | pp. 28 - 33 |
Keyword | assembled Lithium-ion batteries, Battery Smart Sensor, SOC |
Abstract | This paper discusses a smart sensor, which is an important technology in the smart grid. We have developed a system that monitors the battery condition with an attached sensor and accumulates the measured data on the Web. The battery sensor is implemented with a microcomputer. As a function of the battery sensor, we first developed a highly accurate and practical SOC sensor using the Extended Kalman filter. Based on the SOC estimation function for a single cell, an SOC estimation function for assembled lithium-ion batteries has also been developed. |
Title | A Fast and Highly Accurate Statistical Based Model for Performance Estimation of MPSoC On-Chip Bus |
Author | *Farhan Shafiq, Tsuyoshi Isshiki, Dongju Li, Hiroaki Kunieda (Tokyo Institute of Technology, Japan) |
Page | pp. 34 - 39 |
Keyword | bus, statistical model, performance prediction, arbitration stall, bus stall |
Abstract | While Multiprocessor System-on-Chips (MPSoCs) are becoming widely adopted in embedded systems, communication architecture analysis for MPSoCs is becoming ever more complex. There is a growing need for faster and more accurate performance estimation techniques for on-chip buses. In this paper, we present a novel statistical technique that uses accumulated "workload statistics" to accurately predict the "stall cycle counts" caused by bus contention. This eliminates the need to simulate arbitration on every bus access, resulting in a substantial speed-up. Each processor in the system is assumed to have a distinct fixed priority, and arbitration is priority-based. We verify the accuracy of the proposed model against the results of cycle-accurate simulation. Two kinds of traffic are used in the experiments: synthetically generated traffic and traffic from a real-world application. For the synthetic traffic, we report errors in the range of 0.1%-5% together with a speedup of 7-10x. For the real traffic, we use a limited "single blocking" bus model and report results accordingly. |
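For context, the per-access cost that a statistical model avoids is a cycle-by-cycle arbitration loop like the Python sketch below. It assumes a single shared bus, fixed priorities (processor 0 highest), a fixed burst length per transaction, and pre-recorded request issue times that do not shift when earlier requests stall; these simplifications and all names are hypothetical, not the authors' cycle-accurate simulator.

```python
from collections import deque
from typing import List


def simulate_fixed_priority_bus(requests: List[List[int]], burst: int) -> List[int]:
    """Cycle-by-cycle simulation of a shared bus with fixed-priority arbitration.

    requests[p] lists, in ascending order, the cycles at which processor p
    issues a bus transaction; processor 0 has the highest priority. A granted
    transaction occupies the bus for `burst` cycles. Returns the stall cycles
    per processor, i.e. cycles spent waiting between issue and grant.
    """
    pending = [deque(r) for r in requests]
    stalls = [0] * len(requests)
    busy_until = 0
    cycle = 0
    while any(pending):
        if cycle >= busy_until:
            # Grant the bus to the highest-priority processor with an issued request.
            for p, queue in enumerate(pending):
                if queue and queue[0] <= cycle:
                    stalls[p] += cycle - queue.popleft()
                    busy_until = cycle + burst
                    break
        cycle += 1
    return stalls


contended = simulate_fixed_priority_bus([[0, 1], [0]], burst=2)
uncontended = simulate_fixed_priority_bus([[0], [5]], burst=2)
```

The loop visits every cycle until all requests are served, so its cost grows with trace length; in the contended case processor 1 waits behind both of processor 0's transactions, the kind of stall count a statistics-based estimate tries to predict without simulating each access.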
Title | C-Based RTL Design Framework for Processor and Hardware-IP Synthesis |
Author | *Tsuyoshi Isshiki, Koshiro Date, Daisuke Kugimiya, Dongju Li, Hiroaki Kunieda (Tokyo Institute of Technology, Japan) |
Page | pp. 40 - 45 |
Keyword | C-based design, RTL synthesis, processor synthesis, verification, instruction-set simulator |
Abstract | In this paper, we propose a new C-based design framework in which the RTL structure is described directly in a dataflow C coding style, while the same C code serves as a fast simulation model. Design examples of an image signal processing pipeline show the effectiveness of the proposed C-based tool framework: the dataflow C code has 1/3 to 1/5 as many lines as the equivalent HDL and can generate high-performance circuits with parallelism as high as 4000 operations/cycle. For RISC processor designs, our dataflow C coding style also effectively captures the behavior of the instruction-set simulator in fewer than 1000 lines of C code, which can be directly transformed into an RTL structure. |
Title | Profiler for Control System in System Level Design |
Author | *Miaw Torng-Der, Yuki Ando, Shinya Honda, Hiroaki Takada, Masato Edahiro (Nagoya University, Japan) |
Page | pp. 46 - 51 |
Keyword | profiler, system level design, FPGA, control system |
Abstract | This paper introduces a profiler architecture for control systems in system-level design.
When designing a control system, two things must be considered.
The first is the asynchronous signal coming from sensors and actuators, called the interrupt request signal.
The second is the process that should have a higher priority and be activated by the interrupt request signal, called the interrupt handler.
However, existing profilers can obtain information on neither the interrupt request signal nor the interrupt handler. |
Title | Socket-Based Performance Monitoring Tool Suite for System-on-Chips |
Author | *Ting-Hsuan Wu, Tsun-Hsin Chang, Ing-Jer Huang (National Sun Yat-sen University, Taiwan) |
Page | pp. 52 - 55 |
Keyword | performance, monitoring, system, software, hardware |
Abstract | The SoC industry has shifted its development goal from increasing processor clock frequencies to distributing work among multiple IPs. To integrate SoCs more efficiently, socket interfaces are adopted to eliminate the migration overhead from one system to another. This paper therefore proposes a Socket-Based Performance Monitoring Tool Suite (SB PMTS) that provides a holistic view of system behavior and performance by monitoring two types of performance information: (1) the cycle-accurate execution time of a complete task, and (2) the transaction events on the socket interfaces. SB PMTS synchronizes the performance information from different sources and enables ordinary designers to quickly assess the quality of the SoC without any instrumentation. |
Title | Minimization of Register Area Cost for Soft-Error Correction in Low Energy DMR Design |
Author | *Kazuhito Ito, Takumi Negishi (Saitama University, Japan) |
Page | pp. 56 - 61 |
Keyword | DMR, Low energy, Synthesis, Register minimization |
Abstract | Double modular redundancy (DMR) executes each operation twice
and detects soft errors by comparing the two results.
A soft error is corrected by re-executing the necessary operations
to obtain correct results.
Such re-execution requires the operations' input data,
so many registers are needed to store these data.
In this paper, a method is proposed to minimize the area cost of registers
while also considering the minimization of operation energy consumption,
subject to given constraints on time, resources, and the delay penalty for error correction.
The experimental results show that register cost is reduced by about 20%
on average. |
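The DMR premise can be illustrated with a small software analogy in Python: an operation is run twice on saved inputs, and a mismatch triggers re-execution from those same inputs. This is a hypothetical sketch, not the paper's synthesis method; it ignores scheduling, energy, and register allocation, and only shows why input operands must stay stored until the comparison succeeds. The fault-injection function is invented for the example.

```python
from typing import Callable, Sequence


def dmr_execute(op: Callable[..., int], args: Sequence[int], max_retries: int = 3) -> int:
    """Execute `op` twice and compare; on a mismatch, re-execute from saved inputs."""
    saved = tuple(args)  # stands in for the registers that must hold the inputs
    for _ in range(max_retries + 1):
        first, second = op(*saved), op(*saved)
        if first == second:      # results agree: no soft error detected
            return first
        # Results differ: a soft error was detected, so retry from `saved`.
    raise RuntimeError("mismatch persisted after all re-executions")


# Hypothetical fault injection: flip the lowest bit on the second call only.
calls = {"n": 0}


def faulty_add(x: int, y: int) -> int:
    calls["n"] += 1
    result = x + y
    return result ^ 1 if calls["n"] == 2 else result


corrected = dmr_execute(faulty_add, (3, 4))
```

Here the first pair of executions disagrees (7 versus 6), and the retry from the saved operands yields the correct sum after four calls in total.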
Title | Simultaneous Test Scheduling and TAM Bus Wire Assignment for Core-Based SoC Designs |
Author | Te-Jui Wang, *Ching-Chun Chiu, Shih-Hsu Huang (Chung Yuan Christian University, Taiwan) |
Page | pp. 62 - 67 |
Keyword | Core-Based Systems, Test Scheduling, Testing Time, Test Access Mechanism |
Abstract | Reducing the total testing time is crucial for lowering IC testing cost. In the testing of a core-based System-on-Chip (SoC) design, external tests are applied to cores via a specialized test access mechanism (TAM). Previous test scheduling algorithms assume that two external tests cannot use the TAM at the same time. However, if the external tests of different cores do not use the same TAM bus wires, they can in fact be executed concurrently, which reduces the total testing time. Based on this observation, in this paper, we propose an effective and efficient algorithm that performs test scheduling and TAM bus wire assignment simultaneously for the testing of core-based SoC designs. Compared with previous works, the experimental results consistently show that the proposed approach greatly reduces the total testing time. |
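The observation that tests on disjoint TAM wires can overlap can be sketched with a simple greedy scheduler in Python. This is a longest-test-first list-scheduling heuristic chosen for illustration, not the authors' algorithm; the core names, wire counts, and test times are invented.

```python
from typing import Dict, List, Tuple


def schedule_tests(tests: Dict[str, Tuple[int, int]], num_wires: int):
    """Greedy test scheduling with explicit TAM wire assignment.

    tests maps a core name to (wires_needed, test_time). Each test is bound to
    concrete TAM wires, and two tests overlap in time only on disjoint wires.
    Returns (total_testing_time, plan) with plan[name] = (start_time, wires).
    """
    free_at = [0] * num_wires  # time at which each TAM wire becomes free
    plan: Dict[str, Tuple[int, List[int]]] = {}
    # Longest test first: a common list-scheduling heuristic.
    for name, (width, length) in sorted(tests.items(), key=lambda kv: -kv[1][1]):
        if width > num_wires:
            raise ValueError(f"{name} needs {width} wires; the TAM has {num_wires}")
        # Take the `width` wires that become free earliest.
        wires = sorted(sorted(range(num_wires), key=lambda w: free_at[w])[:width])
        start = max(free_at[w] for w in wires)
        for w in wires:
            free_at[w] = start + length
        plan[name] = (start, wires)
    return max(free_at), plan


total, plan = schedule_tests({"A": (2, 10), "B": (2, 8), "C": (4, 3)}, num_wires=4)
```

In this example, tests A and B run at time 0 on disjoint wire pairs and C follows at time 10, for a total testing time of 13, versus 10 + 8 + 3 = 21 if only one external test could use the TAM at a time.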
Title | Automatic Analog Synthesis Platform with Low-Noise Consideration |
Author | Ying-Chi Lien, Ching-Mao Lee, Chih-Wei Li, *Yi-Syue Han, Chien-Nan Jimmy Liu (National Central University, Taiwan) |
Page | pp. 68 - 71 |
Keyword | analog synthesis, bio-signal, automatic sizing, layout automation |
Abstract | Because bio-signals are often very weak, they are easily corrupted by noise and become hard to distinguish. In this paper, an automatic analog synthesis platform is presented for bio-acquisition systems that generates the required circuits from specification to layout with low-noise consideration. Process variations and layout effects are also considered simultaneously to generate the required circuits with high design yield. Furthermore, a user-friendly GUI is provided to help users complete the design flow successfully and efficiently. The experimental results show that this analog synthesis platform generates the required low-noise circuits in seconds. The chip implementation result also verifies that the tool can generate designs of fabricable quality. |
Title | Intra-Vehicle Network Routing Algorithm for Weight and Wireless Transmit Power Minimization |
Author | *Ta-Yang Huang, Chia-Jui Chang (National Cheng Kung University, Taiwan), Chung-Wei Lin (University of California at Berkeley, U.S.A.), Sudip Roy (National Cheng Kung University, Taiwan), Tsung-Yi Ho (National Chiao Tung University, Taiwan) |
Page | pp. 72 - 77 |
Keyword | In-Vehicle Network, Routing |
Abstract | As the complexity of distributed vehicle systems increases rapidly, several hundred devices (sensors, actuators, etc.) are being placed in a modern automotive system.
As more wiring cables connect these devices, the weight of a car increases significantly, which degrades fuel efficiency.
To reduce the weight of a car, wireless communication has been introduced to replace the wiring cables between some devices.
However, the extra energy consumed by wireless devices for packet transmission requires frequent maintenance, e.g., recharging of batteries.
In this paper, we propose an intra-vehicle network routing algorithm that simultaneously minimizes the wiring weight and the transmit power for wireless communication.
Experimental results show that the proposed method effectively minimizes both the wiring weight and the transmit power for wireless communication. |
Title | An Automated Flow Integration to Help Analog Layout Design Migration |
Author | Jou-Chun Lin, *Po-Cheng Pan, Ching-Yu Chin, Hung-Ming Chen (National Chiao Tung University, Taiwan) |
Page | pp. 78 - 82 |
Keyword | analog layout, design migration |
Abstract | Computer-aided design (CAD) tools for digital circuits have become well established over the years. However, CAD tools for analog circuits still face many challenges. Since transistor sizes shrink as process technology advances, design migration is needed to increase layout reuse. Building on previous work such as placement migration and routing-preservation tools, the next step is a further performance boost. We target the wire width, which affects the resistance and capacitance of wires, to improve performance. We implement a flow that adjusts wire widths to further improve performance, generates the modified layout automatically, and passes the verification check, thereby speeding up the analysis process and design flow. We apply a greedy heuristic and a simulated annealing algorithm in our framework. Our flow helps make the analog layout synthesis flow more efficient. |
Title | Analysis of the Distance Dependent Multiple Cell Upset Rates on 65-nm Redundant Latches by a PHITS-TCAD Simulation System |
Author | *Kuiyuan Zhang, Jun Furuta, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan) |
Page | pp. 89 - 93 |
Keyword | Soft error, PHITS, TCAD, MCU |
Abstract | Recently, the soft error rates of integrated circuits have
increased with process scaling, which reduces the soft-error
tolerance of VLSIs. Charge sharing and the bipolar effect become dominant
when a particle hits latches and flip-flops. As a result,
circuits become more sensitive to Multiple Cell Upsets (MCUs). We
analyze the MCU tolerance of redundant latches in a 65-nm
process by device (TCAD) simulation and the Particle and Heavy Ion
Transport code System (PHITS). The MCU rate of redundant
latches decreases exponentially as the distance between
the redundant latches increases. These results agree
with the neutron experiments. |
Title | Feasible Shortest Path Frame Bounded Maze-Routing Algorithm for ML-OARST with Ripping up and Re-Building Steiner Points |
Author | *Kuen-Wey Lin, Yeh-Sheng Lin, Yih-Lang Li (Institute of Computer Science and Engineering, National Chiao Tung University, Taiwan), Rung-Bin Lin (Computer Science and Engineering, Yuan Ze University, Taiwan) |
Page | pp. 94 - 99 |
Keyword | Steiner tree, Routing, Obstacle-avoidance, Multilayer, Physical Design |
Abstract | Owing to its large solution space, maze routing has never been used to solve the multi-layer obstacle-avoiding rectilinear Steiner tree problem (ML-OARST). This paper proposes the first maze routing-based algorithm that efficiently identifies a high-quality ML-OARST. Our algorithm employs a three-dimensional Hanan grid graph for maze routing and applies a novel scheme to identify good Steiner points. This significantly reduces the search overhead of maze routing. To reduce the routing cost of ML-OARST, we also develop a novel rip-up and re-building strategy for altering Steiner points and tree topology. Experimental results reveal that the proposed algorithm outperforms the state-of-the-art ML-OARST methods in wire-length and via costs. The required CPU time is comparable to that needed by spanning graph-based approaches. |
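The three-dimensional Hanan grid on which the maze routing runs can be sketched in Python as follows. The sketch assumes rectangular obstacles that each block a single layer, and builds only the vertex set: x/y coordinates come from pins and obstacle boundaries and are replicated on every layer so that vias can connect them. Edge-level obstacle checks, routing costs, and the paper's Steiner-point scheme are omitted, and all names are hypothetical.

```python
from itertools import product
from typing import Sequence, Set, Tuple

Point = Tuple[int, int, int]           # (x, y, layer)
Rect = Tuple[int, int, int, int, int]  # (x_lo, y_lo, x_hi, y_hi, layer)


def hanan_grid_3d(pins: Sequence[Point], obstacles: Sequence[Rect],
                  num_layers: int) -> Set[Point]:
    """Vertices of a multilayer Hanan grid.

    Candidate x/y coordinates come from the pins and obstacle boundaries; every
    (x, y) pair is replicated on every layer. Vertices strictly inside an
    obstacle on that obstacle's layer are removed.
    """
    xs = {p[0] for p in pins} | {o[0] for o in obstacles} | {o[2] for o in obstacles}
    ys = {p[1] for p in pins} | {o[1] for o in obstacles} | {o[3] for o in obstacles}

    def blocked(x: int, y: int, z: int) -> bool:
        return any(z == layer and x0 < x < x1 and y0 < y < y1
                   for x0, y0, x1, y1, layer in obstacles)

    return {(x, y, z) for x, y, z in product(xs, ys, range(num_layers))
            if not blocked(x, y, z)}


pins = [(0, 0, 0), (2, 2, 1), (4, 4, 0)]
grid = hanan_grid_3d(pins, obstacles=[(1, 1, 3, 3, 0)], num_layers=2)
```

Here the 5 x 5 coordinate pairs on two layers give 50 candidates, and only (2, 2, 0), which lies inside the obstacle on layer 0, is removed; the same (x, y) on layer 1 stays available for routing over the obstacle.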
Title | A TPL-Friendly Legalizer for Standard Cell Based Design |
Author | *Hsiu-Yu Lai, Ting-Chi Wang (National Tsing Hua University, Taiwan) |
Page | pp. 100 - 105 |
Keyword | Triple Patterning Lithography, Placement, Legalization, Standard Cell, Layout Decomposition |
Abstract | Because of shrinking feature sizes and the delay of next-generation lithography, double patterning lithography (DPL) is no longer sufficient for the 14/10nm technology node. Triple patterning lithography (TPL) is a natural extension of DPL; it can not only triple the pitch but also reduce conflicts and stitches. Although TPL is more difficult and complicated than DPL, it is a promising alternative for the 14/10nm technology node. In this paper, we consider TPL during the standard-cell legalization stage so that the resulting placement is more friendly to TPL layout decomposition. We provide a novel idea of reducing TPL conflicts through cell reordering and white space insertion. The experimental results show that, compared with a conventional legalizer, our legalizer effectively reduces the numbers of conflicts and stitches. |
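TPL layout decomposition is commonly modeled as 3-coloring a conflict graph: each feature is a vertex, an edge joins two features too close to share a mask, and each color is one of the three masks. The Python sketch below checks whether a conflict-free mask assignment exists by plain backtracking. It illustrates the decomposition problem only, not the authors' legalizer; it ignores stitches, and its worst-case running time is exponential because graph 3-coloring is NP-complete.

```python
from typing import List, Optional, Sequence, Set, Tuple


def tpl_decompose(num_features: int,
                  conflicts: Sequence[Tuple[int, int]]) -> Optional[List[int]]:
    """Assign each feature to mask 0, 1, or 2 so that no two conflicting
    features share a mask; return None if no such assignment exists."""
    adj: List[Set[int]] = [set() for _ in range(num_features)]
    for u, v in conflicts:
        adj[u].add(v)
        adj[v].add(u)
    mask = [-1] * num_features  # -1 means "not yet assigned"

    def assign(i: int) -> bool:
        if i == num_features:
            return True
        for m in range(3):
            if all(mask[j] != m for j in adj[i]):
                mask[i] = m
                if assign(i + 1):
                    return True
        mask[i] = -1  # undo before backtracking
        return False

    return mask[:] if assign(0) else None


triangle = tpl_decompose(3, [(0, 1), (1, 2), (0, 2)])
k4 = tpl_decompose(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
```

Four mutually conflicting features (k4) cannot be decomposed onto three masks; one way to read cell reordering and white space insertion is as changes that remove such conflict edges before decomposition.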
Title | On the Impact of Initial Placement to SA-Based Placement for Mixed-Grained Reconfigurable Architecture |
Author | *Takashi Kishimoto, Hiroyuki Ochi (Ritsumeikan University, Japan) |
Page | pp. 111 - 116 |
Keyword | Simulated Annealing, Partitioning-based, Reconfigurable Architecture, Placement |
Abstract | In this paper, we investigate a novel placement algorithm for mixed-grained reconfigurable architectures (MGRAs). The proposed algorithm applies a partitioning-based method to LUTs to obtain an initial placement, followed by further optimization of both LUTs and ALUs based on low-temperature simulated annealing (SA). Compared with a conventional FPGA placement algorithm that uses SA with a random initial placement, our method achieves 9.3% smaller delay after running SA for half an hour. Our method also yields a better final solution after a run of several hours. |
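The refinement stage, simulated annealing that starts from an existing placement at a low temperature, might look like the minimal Python sketch below. It assumes two-pin nets, Manhattan wirelength as the cost (a stand-in for the paper's delay objective), and cell-swap moves; the partitioning-based initial placement and the LUT/ALU distinction are not modeled, and all names and parameter values are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def wirelength(place: Dict[str, Pos], nets: List[Tuple[str, str]]) -> int:
    """Total Manhattan length of two-pin nets."""
    return sum(abs(place[a][0] - place[b][0]) + abs(place[a][1] - place[b][1])
               for a, b in nets)


def anneal(place: Dict[str, Pos], nets: List[Tuple[str, str]],
           temp: float = 0.5, cooling: float = 0.995,
           iters: int = 5000, seed: int = 0):
    """Refine an existing placement with pairwise swaps and Metropolis acceptance.

    A low starting temperature means few uphill moves are accepted, so most of
    the initial placement's structure is preserved.
    """
    rng = random.Random(seed)
    cur = dict(place)
    cur_cost = wirelength(cur, nets)
    best, best_cost = dict(cur), cur_cost
    cells = list(cur)
    for _ in range(iters):
        a, b = rng.sample(cells, 2)
        cur[a], cur[b] = cur[b], cur[a]              # propose a swap
        delta = wirelength(cur, nets) - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost += delta                         # accept the move
            if cur_cost < best_cost:
                best, best_cost = dict(cur), cur_cost
        else:
            cur[a], cur[b] = cur[b], cur[a]          # reject: undo the swap
        temp *= cooling
    return best, best_cost


INITIAL = {"a": (0, 0), "b": (3, 0), "c": (1, 0), "d": (2, 0)}
NETS = [("a", "b"), ("b", "c"), ("c", "d")]
best, best_cost = anneal(INITIAL, NETS)
```

The deliberately poor initial placement has a wirelength of 6; starting SA from a good initial placement rather than a random one is the design choice the abstract evaluates, and this sketch only shows the refinement loop itself.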