SASIMI 2015 Technical Program

SASIMI 2015
The 19th Workshop on Synthesis And System Integration of Mixed Information Technologies

Poster I
Time: 10:15 - 12:00 Monday, March 16, 2015
Chairs: Ren-Song Tsay (National Tsing Hua University, Taiwan), Po-Hung Lin (National Chung Cheng University, Taiwan)

R1-1 (Time: 10:15 - 10:17)

Title	Memory Synthesis for Multi-Processor System-on-Chips with Reconfigurable 3D-stacked SRAMs
Author	Meng-Ling Tsai, *Yi-Jung Chen, Yi-Ting Chen, Ru-Hua Chang (National Chi Nan University, Taiwan)
Page	pp. 2 - 7
Keyword	Memory Synthesis, Reconfigurable 3D-stacked SRAMs
Abstract	Integrating Multi-Processor System-on-Chips (MPSoCs) with 3D-stacked reconfigurable SRAM tiles has been proposed for embedded systems with high memory demands. At runtime, the SRAM tiles are configured into several memory areas, which can be reconfigured according to the dynamic behavior of the system. Targeting this architecture, we propose a data placement and memory area allocation algorithm. The goal of the proposed algorithm is to optimize the performance of the memory system by minimizing the on-chip memory access latency, the number of off-chip memory accesses, and the number of reconfigurations. Since the behavior of an embedded system can be described by a set of scenarios, where each scenario specifies a set of applications that would execute concurrently, the proposed algorithm synthesizes data placements and the memory area allocation for each scenario. Data placement considers the data access patterns not only within each scenario but also across all scenarios. We evaluate the proposed algorithm on a set of synthetic and real-world applications. The experimental results show that, compared to the existing data placement method designed for MPSoCs with distributed memory modules, the proposed algorithm reduces data access latency by up to 11.72%.

R1-2 (Time: 10:17 - 10:19)

Title	Thermal-Pattern-Aware Voltage Assignment for Task Scheduler on 3D Multi-Core Processors
Author	Chien-Hui Liao, *Cheng Suo, Charles Hung-Pin Wen (National Chiao Tung University, Taiwan)
Page	pp. 8 - 9
Keyword	task scheduling, 3D MCPs, hotspots, DVFS, voltage assignment
Abstract	In three-dimensional multi-core processors (3D-MCPs), hotspots occur more often and cause severe problems for system reliability and lifetime. Moreover, a higher frequency of hotspot occurrence triggers more dynamic voltage and frequency scaling (DVFS), leading to degraded throughput. Therefore, to reduce the frequency of hotspot occurrence effectively, a new thermal-constrained task-scheduling algorithm based on thermal-pattern-aware voltage assignment is proposed. Using the temperature profiles of different voltage assignments on 3D-MCPs, thermal-pattern-aware voltage assignment is applied to effectively reduce the rate of temperature increase in 3D-MCPs. Furthermore, the proposed scheduler includes on-line allocation for vertically grouped 3D cores and a new vertically grouped voltage scaling scheme which considers thermal correlation among vertically adjacent cores in 3D-MCPs. Experimental results show that, compared to the previous thermal-constrained task-scheduling strategy, our task-scheduling algorithm can reduce the frequency of hotspot occurrence by 38.84% and can further improve throughput by 6.62%.

R1-3 (Time: 10:19 - 10:21)

Title	High-Level Synthesis from Programs with External Interrupt Handling
Author	*Naoya Ito, Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroyuki Kanbara (Advanced Scientific Technology & Management Research Institute of KYOTO, Japan)
Page	pp. 10 - 15
Keyword	High-level synthesis, Binary synthesis, External interrupt, ACAP
Abstract	This paper presents a method of synthesizing a given binary program, which contains external interrupt handling, into hardware whose behavior is equivalent to the CPU running the program. In our method, the system control coprocessor which the CPU uses for interrupt handling is incorporated into the hardware as a functional unit. Instructions for accessing coprocessor registers, returning from interrupt handling, and making system calls are scheduled as operations and bound to the coprocessor. Jump register instructions for calling and returning from interrupt service routines are synthesized using operations that convert instruction addresses into the corresponding states of the hardware. Assuming the MIPS R3000 as the CPU, the proposed method has been implemented on top of the binary synthesizer ACAP. A program of about 40 lines with an external interrupt service routine was synthesized into hardware, and it was confirmed that interrupt handling works correctly. The execution cycles and the delay were reduced by 14% and 26%, respectively, at the cost of a 1.1-fold increase in hardware size.

R1-4 (Time: 10:21 - 10:23)

Title	An SOC Estimation System for Lithium Ion Batteries Considering Thermal Characteristics
Author	*Ryu Ishizaki, Lei Lin, Naoki Kawarabayashi, Masahiro Fukui (Ritsumeikan University, Japan)
Page	pp. 16 - 21
Keyword	Extended Kalman Filter, SOC estimation, Arrhenius formula, Lithium ion Batteries
Abstract	This paper discusses an SOC estimation system for lithium ion batteries based on the Extended Kalman Filter. The accuracy of the estimation depends strongly on the accuracy of the battery model. We have newly formulated an equivalent circuit model that considers temperature and SOC dependencies. As a result, the error rate of the estimation has been improved significantly. The evaluation shows that the new SOC estimation system can be used over a wide range of temperatures.

R1-5 (Time: 10:23 - 10:25)

Title	Dynamic Data Migration to Eliminate Bank-Level Interference for Stencil Applications in Multicore Systems
Author	Wei-Hen Lo, *Yen-Hao Chen, TingTing Hwang (National Tsing Hua University, Taiwan)
Page	pp. 22 - 27
Keyword	data migration, memory controller, page allocation, stencils, multi-threaded
Abstract	A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Modern automatic transformation compiler frameworks can generate efficient tiled parallel stencil code. Dynamically scheduling parallel stencils significantly improves system performance. However, it exacerbates memory contention because fewer cores idle and more memory requests are sent to DRAM. The traditional OS page coloring method, which partitions memory pages in advance, cannot alleviate memory contention for dynamically scheduled parallel stencils. To address this issue, we provide a new software/hardware cooperative dynamic data migration method that exploits the update-and-reuse property of stencils. We observe that OS page allocation needs to be aware of the flexibility of dynamic data migration in memory in order to eliminate memory interference. Experimental evaluation on an 8-core x86 system shows that our method improves system performance by 7% compared with dynamically scheduled stencils on a system with 8 cores and 4 memory banks.
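
For readers unfamiliar with the term, the first sentence of the abstract can be illustrated with a minimal sketch of a 2D stencil sweep. The 5-point averaging rule, the fixed boundary, and the names `jacobi_step` and `run_stencil` are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not model the proposed data migration method.

```python
def jacobi_step(grid):
    """Apply one 5-point averaging stencil sweep to a 2D grid (list of lists).

    Each interior point becomes the mean of itself and its four nearest
    neighbors. Boundary points keep their values (an assumed boundary rule).
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so every update reads old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (
                grid[i][j]
                + grid[i - 1][j]
                + grid[i + 1][j]
                + grid[i][j - 1]
                + grid[i][j + 1]
            ) / 5.0
    return new


def run_stencil(grid, steps):
    """Repeat the sweep; each step reuses the grid produced by the previous one."""
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid
```

Every sweep reads the values written by the previous sweep, which is one way to picture the update-and-reuse behavior the abstract refers to.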

R1-6 (Time: 10:25 - 10:27)

Title	A Battery Smart Sensor and Its SOC Estimation Function for Assembled Lithium-Ion Batteries
Author	*Naoki Kawarabayashi, Lei Lin, Ryu Ishizaki, Masahiro Fukui (Ritsumeikan University, Japan), Isao Shirakawa (University of Hyogo, Japan)
Page	pp. 28 - 33
Keyword	assembled Lithium-ion batteries, Battery Smart Sensor, SOC
Abstract	This paper discusses the smart sensor, an important technology in a smart grid. We have developed a system that monitors the battery condition with an attached sensor and accumulates the measured data on the Web. The battery sensor is implemented with a microcomputer. We first developed a highly accurate and practical SOC sensor using the Extended Kalman filter as a function of the battery sensor. Based on the SOC estimation function for a single cell, an SOC estimation function for assembled Lithium-ion batteries is also developed.

R1-7 (Time: 10:27 - 10:29)

Title	A Fast and Highly Accurate Statistical Based Model for Performance Estimation of MPSoC On-Chip Bus
Author	*Farhan Shafiq, Tsuyoshi Isshiki, Dongju Li, Hiroaki Kunieda (Tokyo Institute of Technology, Japan)
Page	pp. 34 - 39
Keyword	bus, statistical model, performance prediction, arbitration stall, bus stall
Abstract	While Multiprocessor System-On-Chips (MPSoCs) are becoming widely adopted in embedded systems, communication architecture analysis for MPSoCs becomes ever more complex. There is a growing need for faster and more accurate performance estimation techniques for on-chip buses. In this paper, we present a novel statistics-based technique that makes use of accumulated "workload statistics" to accurately predict the "stall cycle counts" caused by bus contention. This eliminates the need to simulate arbitration on every bus access, resulting in substantial speed-up. It is assumed that each processor in the system has a distinct fixed priority, and arbitration is based on priority. We verify the accuracy of our proposed model against results obtained by cycle-accurate simulation. Two kinds of traffic are used to verify the bus model: synthetically generated traffic and traffic from a real-world application. For the synthetic traffic, we report an error range of 0.1% - 5% and a speedup of 7-10x. For the real traffic, we use a limited “single blocking” bus model and report results accordingly.

R1-8 (Time: 10:29 - 10:31)

Title	C-Based RTL Design Framework for Processor and Hardware-IP Synthesis
Author	*Tsuyoshi Isshiki, Koshiro Date, Daisuke Kugimiya, Dongju Li, Hiroaki Kunieda (Tokyo Institute of Technology, Japan)
Page	pp. 40 - 45
Keyword	C-based design, RTL synthesis, processor synthesis, verification, instruction-set simulator
Abstract	In this paper, we propose a new C-based design framework where the RTL structure is directly described in a dataflow C coding style, while the same C code serves as a fast simulation model. Design examples on an image signal processing pipeline show the effectiveness of the proposed C-based tool framework: the dataflow C code has 1/3 to 1/5 as many lines as the equivalent HDL and can generate high-performance circuits with extremely high parallelism of 4000 operations/cycle. Also, for RISC processor designs, our dataflow C coding style effectively captures the behavior of the instruction set simulator in less than 1000 lines of C code, which can be directly transformed into an RTL structure.

R1-9 (Time: 10:31 - 10:33)

Title	Profiler for Control System in System Level Design
Author	*Miaw Torng-Der, Yuki Ando, Shinya Honda, Hiroaki Takada, Masato Edahiro (Nagoya University, Japan)
Page	pp. 46 - 51
Keyword	profiler, system level design, FPGA, control system
Abstract	This paper introduces a profiler architecture for control systems in system-level design. When designing a control system, we need to consider two things. The first is the asynchronous signal coming from sensors and actuators, called the interrupt request signal. The second is the process that should have a higher priority and be activated by the interrupt request signal, called the interrupt handler. However, existing profilers cannot obtain information on either the interrupt request signal or the interrupt handler.

R1-10 (Time: 10:33 - 10:35)

Title	Socket-Based Performance Monitoring Tool Suite for System-on-Chips
Author	*Ting-Hsuan Wu, Tsun-Hsin Chang, Ing-Jer Huang (National Sun Yat-sen University, Taiwan)
Page	pp. 52 - 55
Keyword	performance, monitoring, system, software, hardware
Abstract	The SoC industry has shifted its development goal from increasing processor clock frequencies to distributing work among multiple IPs. To achieve better efficiency in SoC integration, socket interfaces are adopted to eliminate the overhead of migrating from one system to another. Therefore, this paper proposes a Socket-Based Performance Monitoring Tool Suite (SB PMTS) which is capable of providing a holistic view of system behavior and performance by monitoring two types of performance information: (1) the cycle-accurate execution time of a complete task, and (2) the transaction events on the socket interfaces. SB PMTS synchronizes the performance information from different resources and enables average designers to quickly assess the quality of the SoC without any instrumentation.

R1-11 (Time: 10:35 - 10:37)

Title	Minimization of Register Area Cost for Soft-Error Correction in Low Energy DMR Design
Author	*Kazuhito Ito, Takumi Negishi (Saitama University, Japan)
Page	pp. 56 - 61
Keyword	DMR, Low energy, Synthesis, Register minimization
Abstract	Double modular redundancy (DMR) executes an operation twice and detects soft errors by comparing the two results. A soft error is corrected by executing the necessary operations again to obtain correct results. Re-executing operations requires their input data, so many registers are needed to store the necessary data. In this paper, a method to minimize the area cost of registers is proposed, while the minimization of operation energy consumption is considered with respect to the given constraints of time, resource, and delay penalty for error correction. The experimental results show that the register cost is reduced by about 20% on average.
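
The detect-and-re-execute idea described in this abstract can be sketched in software as follows. This is only an illustrative model under stated assumptions: the names `dmr_execute` and `make_flaky_add`, the retry limit, and the bit-flip fault injection are hypothetical and are not taken from the paper, which addresses register allocation in hardware synthesis rather than a software loop.

```python
def dmr_execute(op, inputs, max_retries=3):
    """Run `op` twice on the same inputs and accept the result only if both agree.

    `inputs` must stay available until the result is accepted, because a
    mismatch forces re-execution. Returns (result, number_of_re_executions).
    """
    for attempt in range(max_retries + 1):
        first = op(*inputs)
        second = op(*inputs)
        if first == second:  # no soft error detected
            return first, attempt
    raise RuntimeError("results still disagree after re-execution")


def make_flaky_add(fault_on_call):
    """Build an adder whose n-th call returns a bit-flipped result (0 = never)."""
    calls = {"n": 0}

    def add(a, b):
        calls["n"] += 1
        result = a + b
        if calls["n"] == fault_on_call:
            result ^= 1  # flip the lowest bit to mimic a soft error
        return result

    return add
```

The need to keep `inputs` alive until the comparison succeeds corresponds to the register storage whose area cost the abstract discusses.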

R1-12 (Time: 10:37 - 10:39)

Title	Simultaneous Test Scheduling and TAM Bus Wire Assignment for Core-Based SoC Designs
Author	Te-Jui Wang, *Ching-Chun Chiu, Shih-Hsu Huang (Chung Yuan Christian University, Taiwan)
Page	pp. 62 - 67
Keyword	Core-Based Systems, Test Scheduling, Testing Time, Test Access Mechanism
Abstract	The reduction of total testing time is crucial for the saving of IC testing cost. In the testing of a core-based System-on-Chip (SoC) design, external tests are applied to cores via a specialized test access mechanism (TAM). Previous test scheduling algorithms assume that two external tests cannot utilize the TAM at the same time. However, in fact, if the external tests of different cores do not use the same TAM bus wire, they can be executed concurrently, which reduces the total testing time. Based on this observation, in this paper, we propose an effective and efficient algorithm to perform the simultaneous application of test scheduling and TAM bus wire assignment for the testing of core-based SoC designs. Compared with previous works, experimental results consistently show that the proposed approach can greatly reduce the total testing time.
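
The observation that tests on disjoint TAM bus wires can overlap in time can be sketched with a simple greedy scheduler. This is a hedged illustration, not the paper's algorithm: it assumes the wire set of each test is already fixed (the paper assigns wires and schedules simultaneously), and the function name `schedule_tests` and the longest-test-first ordering are assumptions made for this example.

```python
def schedule_tests(tests):
    """Greedily place tests so that tests sharing a TAM wire never overlap.

    `tests` is a list of (name, duration, wires) tuples, where `wires` is an
    iterable of wire indices assumed to be fixed in advance.
    Returns a dict mapping each test name to its start time.
    """
    scheduled = []  # (start, end, wire_set) of tests already placed
    starts = {}
    # Longest tests first (an assumed ordering heuristic).
    for name, duration, wires in sorted(tests, key=lambda t: -t[1]):
        wires = set(wires)
        # Candidate start times: time 0 and the end time of every placed test.
        candidates = sorted({0} | {end for _, end, _ in scheduled})
        for start in candidates:
            end = start + duration
            conflict = any(
                start < e and s < end and not wires.isdisjoint(w)
                for s, e, w in scheduled
            )
            if not conflict:
                break  # the latest candidate never conflicts, so this is reached
        scheduled.append((start, end, wires))
        starts[name] = start
    return starts
```

For instance, with tests A (10 time units, wires 0-1), B (6 units, wires 2-3), and C (4 units, wires 1-2), A and B start together at time 0 and C waits until time 10, giving a total of 14 units instead of 20 for fully serial testing.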

R1-13 (Time: 10:39 - 10:41)

Title	Automatic Analog Synthesis Platform with Low-Noise Consideration
Author	Ying-Chi Lien, Ching-Mao Lee, Chih-Wei Li, *Yi-Syue Han, Chien-Nan Jimmy Liu (National Central University, Taiwan)
Page	pp. 68 - 71
Keyword	analog synthesis, bio-signal, automatic sizing, layout automation
Abstract	Because the bio-signals are often very weak, they can be influenced by noise easily and become hard to distinguish. In this paper, an automatic analog synthesis platform is presented for bio-acquisition systems to generate the required circuits from specification to layout with low-noise consideration. Process variations and layout effects are also simultaneously considered to generate the required circuits with high design yield. Furthermore, a user-friendly GUI is also provided to help users complete the design flow successfully and efficiently. As shown in the experimental results, this analog synthesis platform is able to generate the required circuits in seconds with low noise. The chip implementation result also verifies the capability of this tool to generate the required designs with fabricable quality.

R1-14 (Time: 10:41 - 10:43)

Title	Intra-Vehicle Network Routing Algorithm for Weight and Wireless Transmit Power Minimization
Author	*Ta-Yang Huang, Chia-Jui Chang (National Cheng Kung University, Taiwan), Chung-Wei Lin (University of California at Berkeley, U.S.A.), Sudip Roy (National Cheng Kung University, Taiwan), Tsung-Yi Ho (National Chiao Tung University, Taiwan)
Page	pp. 72 - 77
Keyword	In-Vehicle Network, Routing
Abstract	As the complexity of vehicle distributed systems increases rapidly, several hundred devices (sensors, actuators, etc.) are being placed in a modern automotive system. With the increase in wiring cables connecting these devices, the weight of a car increases significantly, which degrades fuel efficiency. In order to reduce the weight of a car, wireless communication has been introduced to replace wiring cables between some devices. However, the extra energy consumed by wireless devices for packet transmissions requires frequent maintenance, e.g., recharging of batteries. In this paper, we propose an intra-vehicle network routing algorithm to simultaneously minimize the wiring weight and the transmission power for wireless communication. Experimental results show that the proposed method can effectively minimize the wiring weight and the transmit power for wireless communication.

R1-15 (Time: 10:43 - 10:45)

Title	An Automated Flow Integration to Help Analog Layout Design Migration
Author	Jou-Chun Lin, *Po-Cheng Pan, Ching-Yu Chin, Hung-Ming Chen (National Chiao Tung University, Taiwan)
Page	pp. 78 - 82
Keyword	analog layout, design migration
Abstract	The development of computer-aided design (CAD) tools for digital circuits has matured over the years. However, CAD tools for analog circuits still face a great many challenges. As transistor sizes scale down with advancing process technology, design migration becomes necessary to increase layout reuse. With previous work such as placement migration and routing preservation tools in place, a further performance boost becomes the next step. We target wire width, which affects the resistance and capacitance of wires, in order to improve performance. We implement a flow that adjusts wire widths to further improve performance, generates the modified layout automatically, and passes the verification check, thereby speeding up the analysis process and design flow. We apply a greedy heuristic and a simulated annealing algorithm in our framework. Our flow can make the analog layout synthesis flow more efficient.

R1-16 (Time: 10:45 - 10:47)

Title	Rip-Up and Reroute Based Routing Algorithm for Self-Aligned Double Patterning
Author	*Takeshi Ihara, Atsushi Takahashi (Tokyo Institute of Technology, Japan), Chikaaki Kodama (Toshiba, Japan)
Page	pp. 83 - 88
Keyword	SADP
Abstract	Self-Aligned Double Patterning (SADP) is an important manufacturing technique for sub-20 nm technology nodes. In this paper, a rip-up and reroute based routing algorithm for SADP is proposed to obtain a more reliable routing pattern efficiently. In SADP, a cut pattern introduced in the pattern mask reduces the extra mask cost, but the cut pattern itself potentially degrades the reliability of the image on a wafer. The proposed algorithm generates a routing pattern that needs fewer cut patterns.

R1-17 (Time: 10:47 - 10:49)

Title	Analysis of the Distance Dependent Multiple Cell Upset Rates on 65-nm Redundant Latches by a PHITS-TCAD Simulation System
Author	*Kuiyuan Zhang, Jun Furuta, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan)
Page	pp. 89 - 93
Keyword	Soft error, PHITS, TCAD, MCU
Abstract	Recently, the soft error rates of integrated circuits have increased with process scaling. Soft errors reduce the tolerance of VLSIs. Charge sharing and the bipolar effect become dominant when a particle hits latches and flip-flops. Soft errors make circuits more sensitive to Multiple Cell Upset (MCU). We analyze the MCU tolerance of redundant latches in a 65 nm process by device simulation and the Particle and Heavy Ion Transport code System (PHITS). The MCU rate of redundant latches decreases exponentially as the distance between redundant latches increases. These results coincide with the neutron experiments.

R1-18 (Time: 10:49 - 10:51)

Title	Feasible Shortest Path Frame Bounded Maze-Routing Algorithm for ML-OARST with Ripping up and Re-Building Steiner Points
Author	*Kuen-Wey Lin, Yeh-Sheng Lin, Yih-Lang Li (Institute of Computer Science and Engineering, National Chiao Tung University, Taiwan), Rung-Bin Lin (Computer Science and Engineering, Yuan Ze University, Taiwan)
Page	pp. 94 - 99
Keyword	Steiner tree, Routing, Obstacle-avoidance, Multilayer, Physical Design
Abstract	Owing to its large solution space, maze routing has never been used to solve the multi-layer obstacle-avoiding rectilinear Steiner tree problem (ML-OARST). This paper proposes the first maze routing-based algorithm that efficiently identifies a high-quality ML-OARST. Our algorithm employs a three-dimensional Hanan grid graph for maze routing and applies a novel scheme to identify good Steiner points. This significantly reduces the search overhead of maze routing. To reduce the routing cost of ML-OARST, we also develop a novel rip-up and re-building strategy for altering Steiner points and tree topology. Experimental results reveal that the proposed algorithm outperforms the state-of-the-art ML-OARST methods in wire-length and via costs. The required CPU time is comparable to that needed by spanning graph-based approaches.

R1-19 (Time: 10:51 - 10:53)

Title	A TPL-Friendly Legalizer for Standard Cell Based Design
Author	*Hsiu-Yu Lai, Ting-Chi Wang (National Tsing Hua University, Taiwan)
Page	pp. 100 - 105
Keyword	Triple Patterning Lithography, Placement, Legalization, Standard Cell, Layout Decomposition
Abstract	With the shrinking of feature sizes and the delay of next-generation lithography, double patterning lithography (DPL) is no longer sufficient for the 14/10nm technology node. Triple patterning lithography (TPL) is a natural extension of DPL, and it can not only triple the pitch but also reduce conflicts and stitches. Although TPL is more difficult and complicated than DPL, it is a promising alternative for the 14/10nm technology node. In this paper, we consider TPL during the standard-cell legalization stage so that the resultant placement is friendlier to TPL layout decomposition. We provide a novel idea of reducing TPL conflicts through cell reordering and white space insertion. The experimental results show that, compared to a conventional legalizer, our legalizer is able to effectively reduce the numbers of conflicts and stitches.

R1-20 (Time: 10:53 - 10:55)

Title	Granularity of Via Configurable Logic Block for Structured ASIC
Author	Hui-Hsiang Tung (Oriental Institute of Technology, Taiwan), *Rung-Bin Lin (Yuan Ze University, Taiwan)
Page	pp. 106 - 110
Keyword	Structured ASIC, Via Configurable, Granularity, VLSI
Abstract	This article presents a systematic way to determine the granularity of via configurable logic block (VCLB) for structured ASIC. The systematic and experimental studies both show that a VCLB with four transistors laid over a single diffusion strip results in the best area utilization.

R1-21 (Time: 10:55 - 10:57)

Title	On the Impact of Initial Placement to SA-Based Placement for Mixed-Grained Reconfigurable Architecture
Author	*Takashi Kishimoto, Hiroyuki Ochi (Ritsumeikan University, Japan)
Page	pp. 111 - 116
Keyword	Simulated Annealing, Partitioning-based, Reconfigurable Architecture, Placement
Abstract	In this paper, we investigate a novel placement algorithm for mixed-grained reconfigurable architectures (MGRAs). The proposed algorithm applies a partitioning-based method to LUTs to obtain an initial placement, followed by a further optimization process for both LUTs and ALUs based on a low-temperature simulated annealing (SA) method. Compared with a conventional FPGA placement algorithm that uses SA with random initial placement, our method exhibits 9.3% smaller delay after running SA for half an hour. Our method is also superior in terms of the final solution after a run of several hours.
