
SASIMI 2015
The 19th Workshop on Synthesis And System Integration of Mixed Information Technologies

Poster I
Time: 10:15 - 12:00 Monday, March 16, 2015
Chairs: Ren-Song Tsay (National Tsing Hua University, Taiwan), Po-Hung Lin (National Chung Cheng University, Taiwan)

R1-1 (Time: 10:15 - 10:17)
Title: Memory Synthesis for Multi-Processor System-on-Chips with Reconfigurable 3D-stacked SRAMs
Author: Meng-Ling Tsai, *Yi-Jung Chen, Yi-Ting Chen, Ru-Hua Chang (National Chi Nan University, Taiwan)
Page: pp. 2 - 7
Keyword: Memory Synthesis, Reconfigurable 3D-stacked SRAMs
Abstract: Integrating Multi-Processor System-on-Chips (MPSoCs) with 3D-stacked reconfigurable SRAM tiles has been proposed for embedded systems with high memory demands. At runtime, the SRAM tiles are configured into several memory areas, which can be reconfigured according to the dynamic behavior of the system. Targeting this architecture, we propose a data placement and memory area allocation algorithm. The goal of the proposed algorithm is to optimize the performance of the memory system by minimizing the on-chip memory access latency, the number of off-chip memory accesses, and the number of reconfigurations. Since the behavior of an embedded system can be described by a set of scenarios, where each scenario specifies a set of applications that would execute concurrently, the proposed algorithm synthesizes the data placement and the memory area allocation for each scenario. Data access patterns both within each scenario and across all scenarios are considered for data placement. We evaluate the proposed algorithm on a set of synthetic and real-world applications. The experimental results show that, compared to the existing data placement method designed for MPSoCs with distributed memory modules, the proposed algorithm reduces data access latency by up to 11.72%.

R1-2 (Time: 10:17 - 10:19)
Title: Thermal-Pattern-Aware Voltage Assignment for Task Scheduler on 3D Multi-Core Processors
Author: Chien-Hui Liao, *Cheng Suo, Charles Hung-Pin Wen (National Chiao Tung University, Taiwan)
Page: pp. 8 - 9
Keyword: task scheduling, 3D MCPs, hotspots, DVFS, voltage assignment
Abstract: In three-dimensional multi-core processors (3D-MCPs), hotspots occur more often and cause severe problems for system reliability and lifetime. Moreover, a higher frequency of hotspot occurrence triggers more dynamic voltage and frequency scaling (DVFS), leading to degraded throughput. Therefore, to reduce the frequency of hotspot occurrence effectively, a new thermal-constrained task-scheduling algorithm based on thermal-pattern-aware voltage assignment is proposed. Using the temperature profiles of different voltage assignments on 3D-MCPs, thermal-pattern-aware voltage assignment is applied to effectively reduce the rate of temperature increase in 3D-MCPs. Furthermore, the proposed scheduler includes on-line allocation for vertically grouped cores and a new vertically grouped voltage scaling scheme that considers the thermal correlation among vertically adjacent cores in 3D-MCPs. Experimental results show that, compared to the previous thermal-constrained task-scheduling strategy, our task-scheduling algorithm can reduce the frequency of hotspot occurrence by 38.84% and can further improve throughput by 6.62%.

R1-3 (Time: 10:19 - 10:21)
Title: High-Level Synthesis from Programs with External Interrupt Handling
Author: *Naoya Ito, Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroyuki Kanbara (Advanced Scientific Technology & Management Research Institute of KYOTO, Japan)
Page: pp. 10 - 15
Keyword: High-level synthesis, Binary synthesis, External interrupt, ACAP
Abstract: This paper presents a method of synthesizing a given binary program, which contains external interrupt handling, into hardware whose behavior is equivalent to the CPU running the program. In our method, the system control coprocessor which the CPU uses for interrupt handling is incorporated into the hardware as a functional unit. Instructions for accessing coprocessor registers, returning from interrupt handling, and making system calls are scheduled as operations and bound to the coprocessor. Jump register instructions for calling and returning from interrupt service routines are synthesized using operations that convert instruction addresses into the corresponding states of the hardware. Assuming MIPS R3000 as the CPU, the proposed method has been implemented on top of the binary synthesizer ACAP. A program of about 40 lines with an external interrupt service routine was synthesized into hardware, and it was confirmed that interrupt handling works correctly. The execution cycles and the delay were reduced by 14% and 26%, respectively, at the cost of a 1.1-fold increase in hardware size.

R1-4 (Time: 10:21 - 10:23)
Title: An SOC Estimation System for Lithium Ion Batteries Considering Thermal Characteristics
Author: *Ryu Ishizaki, Lei Lin, Naoki Kawarabayashi, Masahiro Fukui (Ritsumeikan University, Japan)
Page: pp. 16 - 21
Keyword: Extended Kalman Filter, SOC estimation, Arrhenius formula, Lithium ion Batteries
Abstract: This paper discusses an SOC estimation system for lithium ion batteries based on the Extended Kalman Filter. The accuracy of the estimation depends strongly on the accuracy of the battery model. We have newly formulated an equivalent circuit model that considers temperature and SOC dependencies. As a result, the error rate of the estimation has been reduced significantly. The evaluation shows that the new SOC estimation system can be used over a wide range of temperatures.

R1-5 (Time: 10:23 - 10:25)
Title: Dynamic Data Migration to Eliminate Bank-Level Interference for Stencil Applications in Multicore Systems
Author: Wei-Hen Lo, *Yen-Hao Chen, TingTing Hwang (National Tsing Hua University, Taiwan)
Page: pp. 22 - 27
Keyword: data migration, memory controller, page allocation, stencils, multi-threaded
Abstract: A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Modern automatic transformation compiler frameworks can generate efficient tiled parallel stencil codes. Dynamically scheduling parallel stencils significantly improves system performance. However, the memory contention problem is exacerbated because fewer cores idle and more memory requests are sent to DRAM. The traditional OS page coloring method, which partitions the memory pages in advance, cannot alleviate the memory contention in dynamically scheduled parallel stencils. To address this issue, we provide a new software/hardware cooperative dynamic data migration method that exploits the update-and-reuse property of stencils. We observe that OS page allocation needs to be aware of the flexibility of dynamic data migration in memory in order to eliminate memory interference. Experimental evaluation on an 8-core x86 system shows that our method can improve system performance by 7% compared with dynamically scheduled stencils on an 8-core system with 4 memory banks.
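
For readers unfamiliar with the term, the stencil definition in the R1-5 abstract can be sketched as follows. This is a minimal illustration of a 2-D five-point averaging stencil, not the paper's code or its data migration method; the function names `stencil_sweep` and `run_stencil`, the averaging rule, and the fixed boundary are all assumptions made for the example.

```python
def stencil_sweep(grid):
    """Apply one sweep of a 5-point averaging stencil to a 2-D grid.

    Each interior point becomes the mean of itself and its four
    nearest neighbors. Boundary points are held fixed (an assumption
    made for this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # write to a copy so every read sees old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (
                grid[i][j]
                + grid[i - 1][j] + grid[i + 1][j]
                + grid[i][j - 1] + grid[i][j + 1]
            ) / 5.0
    return new


def run_stencil(grid, steps):
    """Repeat the sweep `steps` times, reusing the freshly updated grid each time."""
    for _ in range(steps):
        grid = stencil_sweep(grid)
    return grid
```

Each sweep reads the values written by the previous one, which is one way to picture the update-and-reuse behavior the abstract refers to.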

R1-6 (Time: 10:25 - 10:27)
Title: A Battery Smart Sensor and Its SOC Estimation Function for Assembled Lithium-Ion Batteries
Author: *Naoki Kawarabayashi, Lei Lin, Ryu Ishizaki, Masahiro Fukui (Ritsumeikan University, Japan), Isao Shirakawa (University of Hyogo, Japan)
Page: pp. 28 - 33
Keyword: assembled Lithium-ion batteries, Battery Smart Sensor, SOC
Abstract: This paper discusses a smart sensor, an important technology in a smart grid. We have developed a system that monitors the battery condition using an attached sensor and accumulates the measured data on the Web. The battery sensor is implemented with a microcomputer. We first developed a highly accurate and practical SOC sensor using the Extended Kalman filter as a function of the battery sensor. Based on the SOC estimation function for a single cell, an SOC estimation function for assembled Lithium-ion batteries is also developed.

R1-7 (Time: 10:27 - 10:29)
Title: A Fast and Highly Accurate Statistical Based Model for Performance Estimation of MPSoC On-Chip Bus
Author: *Farhan Shafiq, Tsuyoshi Isshiki, Dongju Li, Hiroaki Kunieda (Tokyo Institute of Technology, Japan)
Page: pp. 34 - 39
Keyword: bus, statistical model, performance prediction, arbitration stall, bus stall
Abstract: While Multiprocessor System-On-Chips (MPSoCs) are becoming widely adopted in embedded systems, communication architecture analysis for MPSoCs becomes ever more complex. There is a growing need for faster and more accurate performance estimation techniques for the on-chip bus. In this paper, we present a novel statistical-based technique that makes use of accumulated "workload statistics" to accurately predict the "stall cycle counts" caused by bus contention. This eliminates the need to simulate arbitration on every bus access, resulting in substantial speed-up. It is assumed that each processor in the system has a distinct fixed priority, and arbitration is based on priority. We verify the accuracy of our proposed model against results obtained by cycle-accurate simulation. Two kinds of traffic are used to verify the bus model: synthetically generated traffic and traffic from a real-world application. For the synthetic traffic, we report an error range of 0.1% - 5% and a speedup of 7-10x. For the real traffic, we use a limited "single blocking" bus model and report results accordingly.

R1-8 (Time: 10:29 - 10:31)
Title: C-Based RTL Design Framework for Processor and Hardware-IP Synthesis
Author: *Tsuyoshi Isshiki, Koshiro Date, Daisuke Kugimiya, Dongju Li, Hiroaki Kunieda (Tokyo Institute of Technology, Japan)
Page: pp. 40 - 45
Keyword: C-based design, RTL synthesis, processor synthesis, verification, instruction-set simulator
Abstract: In this paper, we propose a new C-based design framework where the RTL structure is directly described in a dataflow C coding style, while the same C code serves as a fast simulation model. Design examples on an image signal processing pipeline show the effectiveness of the proposed C-based tool framework: the dataflow C codes have 1/3 to 1/5 of the number of lines compared to HDLs and can generate high-performance circuits with enormously high parallelism of 4000 operations/cycle. For RISC processor designs, our dataflow C coding style also effectively captures the behavior of the instruction set simulator in less than 1000 lines of C code, which can be directly transformed into an RTL structure.

R1-9 (Time: 10:31 - 10:33)
Title: Profiler for Control System in System Level Design
Author: *Miaw Torng-Der, Yuki Ando, Shinya Honda, Hiroaki Takada, Masato Edahiro (Nagoya University, Japan)
Page: pp. 46 - 51
Keyword: profiler, system level design, FPGA, control system
Abstract: This paper introduces a profiler architecture for control systems in system-level design. When designing a control system, we need to consider two things. The first is the asynchronous signal coming from sensors and actuators, called the interrupt request signal. The second is the process that should have a higher priority and be activated by the interrupt request signal, called the interrupt handler. However, existing profilers cannot obtain information on either the interrupt request signal or the interrupt handler.

R1-10 (Time: 10:33 - 10:35)
Title: Socket-Based Performance Monitoring Tool Suite for System-on-Chips
Author: *Ting-Hsuan Wu, Tsun-Hsin Chang, Ing-Jer Huang (National Sun Yat-sen University, Taiwan)
Page: pp. 52 - 55
Keyword: performance, monitoring, system, software, hardware
Abstract: The SoC industry has shifted its development goal from increasing processor clock frequencies to distributing work among multiple IPs. To achieve better efficiency in SoC integration, socket interfaces are adopted to eliminate the migration overhead from one system to another. This paper therefore proposes a Socket-Based Performance Monitoring Tool Suite (SB PMTS), which is capable of providing a holistic view of system behavior and performance by monitoring two types of performance information: (1) the cycle-accurate execution time of a complete task, and (2) the transaction events on the socket interfaces. SB PMTS synchronizes the performance information from different resources and enables average designers to quickly assess the quality of the SoC without any instrumentation.

R1-11 (Time: 10:35 - 10:37)
Title: Minimization of Register Area Cost for Soft-Error Correction in Low Energy DMR Design
Author: *Kazuhito Ito, Takumi Negishi (Saitama University, Japan)
Page: pp. 56 - 61
Keyword: DMR, Low energy, Synthesis, Register minimization
Abstract: Double modular redundancy (DMR) executes an operation twice and detects a soft error by comparing the operation results. The soft error is corrected by executing the necessary operations again to obtain correct results. Re-executing operations requires their input data, so many registers are needed to store the necessary data. In this paper, a method to minimize the area cost of registers is proposed, while the minimization of operation energy consumption is considered with respect to the given constraints of time, resource, and delay penalty for error correction. The experimental results show that register cost is reduced by about 20% on average.
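
The detect-and-re-execute idea described in the R1-11 abstract can be illustrated in software. The sketch below is only an analogy under stated assumptions: the paper targets hardware synthesis and register minimization, which this code does not model. The names `dmr_execute`, `FlakyAdder`, and the `max_retries` parameter are hypothetical and introduced only for the example.

```python
def dmr_execute(op, *inputs, max_retries=3):
    """Run `op` twice on the same inputs and accept the result only if both runs agree.

    On a mismatch (a detected soft error) the operation is re-executed.
    The inputs must stay available until the result is accepted, which
    mirrors why the abstract says extra registers are needed.
    """
    for _ in range(max_retries + 1):
        first = op(*inputs)
        second = op(*inputs)
        if first == second:
            return first
    raise RuntimeError("results still disagree after re-execution")


class FlakyAdder:
    """Hypothetical adder that flips the lowest result bit on chosen call numbers."""

    def __init__(self, faulty_calls):
        self.calls = 0
        self.faulty_calls = set(faulty_calls)

    def __call__(self, a, b):
        self.calls += 1
        result = a + b
        # Inject a single-bit error to imitate a soft error.
        return result ^ 1 if self.calls in self.faulty_calls else result
```

With `adder = FlakyAdder({1})`, calling `dmr_execute(adder, 2, 3)` detects the corrupted first execution, runs the pair again, and returns the agreed result.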

R1-12 (Time: 10:37 - 10:39)
Title: Simultaneous Test Scheduling and TAM Bus Wire Assignment for Core-Based SoC Designs
Author: Te-Jui Wang, *Ching-Chun Chiu, Shih-Hsu Huang (Chung Yuan Christian University, Taiwan)
Page: pp. 62 - 67
Keyword: Core-Based Systems, Test Scheduling, Testing Time, Test Access Mechanism
Abstract: The reduction of total testing time is crucial for the saving of IC testing cost. In the testing of a core-based System-on-Chip (SoC) design, external tests are applied to cores via a specialized test access mechanism (TAM). Previous test scheduling algorithms assume that two external tests cannot utilize the TAM at the same time. However, in fact, if the external tests of different cores do not use the same TAM bus wire, they can be executed concurrently, which reduces the total testing time. Based on this observation, in this paper, we propose an effective and efficient algorithm to perform the simultaneous application of test scheduling and TAM bus wire assignment for the testing of core-based SoC designs. Compared with previous works, experimental results consistently show that the proposed approach can greatly reduce the total testing time.
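
The observation in the R1-12 abstract, that tests using disjoint TAM bus wires can overlap in time, can be sketched with a simple greedy scheduler. This is not the paper's algorithm: here the wire assignment is taken as a fixed input rather than optimized together with the schedule, and the function name `schedule_tests` and the tuple layout are assumptions made for the example.

```python
def schedule_tests(tests):
    """Greedily schedule tests given a fixed TAM wire assignment.

    `tests` is a list of (name, duration, wires) tuples, where `wires`
    is the set of TAM bus wire indices the test occupies. Tests are
    placed in list order; each starts as soon as all of its wires are
    free, so tests with disjoint wire sets may run concurrently.
    Returns a dict mapping each test name to its (start, end) times.
    """
    wire_free_at = {}  # wire index -> time at which the wire becomes free
    schedule = {}
    for name, duration, wires in tests:
        start = max((wire_free_at.get(w, 0) for w in wires), default=0)
        end = start + duration
        for w in wires:
            wire_free_at[w] = end
        schedule[name] = (start, end)
    return schedule


def total_testing_time(schedule):
    """Return the finishing time of the last test."""
    return max(end for _, end in schedule.values())
```

For example, with tests `A` (4 time units, wires {0, 1}), `B` (3 units, wires {2, 3}), and `C` (2 units, wires {1, 2}), `A` and `B` overlap and `C` waits for both, giving a total of 6 time units instead of the 9 needed when tests run one at a time.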

R1-13 (Time: 10:39 - 10:41)
Title: Automatic Analog Synthesis Platform with Low-Noise Consideration
Author: Ying-Chi Lien, Ching-Mao Lee, Chih-Wei Li, *Yi-Syue Han, Chien-Nan Jimmy Liu (National Central University, Taiwan)
Page: pp. 68 - 71
Keyword: analog synthesis, bio-signal, automatic sizing, layout automation
Abstract: Because bio-signals are often very weak, they can be influenced by noise easily and become hard to distinguish. In this paper, an automatic analog synthesis platform is presented for bio-acquisition systems to generate the required circuits from specification to layout with low-noise consideration. Process variations and layout effects are also simultaneously considered to generate the required circuits with high design yield. Furthermore, a user-friendly GUI is also provided to help users complete the design flow successfully and efficiently. As shown in the experimental results, this analog synthesis platform is able to generate the required circuits in seconds with low noise. The chip implementation result also verifies the capability of this tool to generate the required designs with fabricable quality.

R1-14 (Time: 10:41 - 10:43)
Title: Intra-Vehicle Network Routing Algorithm for Weight and Wireless Transmit Power Minimization
Author: *Ta-Yang Huang, Chia-Jui Chang (National Cheng Kung University, Taiwan), Chung-Wei Lin (University of California at Berkeley, U.S.A.), Sudip Roy (National Cheng Kung University, Taiwan), Tsung-Yi Ho (National Chiao Tung University, Taiwan)
Page: pp. 72 - 77
Keyword: In-Vehicle Network, Routing
Abstract: As the complexity of vehicle distributed systems increases rapidly, several hundreds of devices (sensors, actuators, etc.) are being placed in a modern automotive system. With the increase in wiring cables connecting these devices, the weight of a car increases significantly, which degrades fuel efficiency. In order to reduce the weight of a car, wireless communication has been introduced to replace wiring cables between some devices. However, the extra energy consumption for packet transmissions by wireless devices requires frequent maintenance, e.g., recharging of batteries. In this paper, we propose an intra-vehicle network routing algorithm to simultaneously minimize the wiring weight and the transmission power for wireless communication. Experimental results show that the proposed method can effectively minimize the wiring weight and the transmit power for wireless communication.

R1-15 (Time: 10:43 - 10:45)
Title: An Automated Flow Integration to Help Analog Layout Design Migration
Author: Jou-Chun Lin, *Po-Cheng Pan, Ching-Yu Chin, Hung-Ming Chen (National Chiao Tung University, Taiwan)
Page: pp. 78 - 82
Keyword: analog layout, design migration
Abstract: The development of computer-aided design (CAD) tools for digital circuits has matured over the years. However, CAD tools for analog circuits still face a great deal of challenges. As transistor sizes scale down with advancing process technology, the design migration problem arises, with the aim of increasing layout reuse. Given previous work such as placement migration and routing preservation tools, further performance improvement becomes the next step. We target wire width, which affects the resistance and capacitance of wires, in order to improve performance. We implement a flow that adjusts wire widths to further improve performance, automatically generates the modified layout, and passes the verification check, speeding up the analysis process and design flow. We apply a greedy heuristic and a simulated annealing algorithm in our framework. Our flow can make the analog layout synthesis flow more efficient.

R1-16 (Time: 10:45 - 10:47)
Title: Rip-Up and Reroute Based Routing Algorithm for Self-Aligned Double Patterning
Author: *Takeshi Ihara, Atsushi Takahashi (Tokyo Institute of Technology, Japan), Chikaaki Kodama (Toshiba, Japan)
Page: pp. 83 - 88
Keyword: SADP
Abstract: Self-Aligned Double Patterning (SADP) is an important manufacturing technique for sub-20 nm technology nodes. In this paper, a rip-up and reroute based routing algorithm for SADP is proposed to obtain a more reliable routing pattern efficiently. In SADP, a cut pattern introduced in the pattern mask reduces the extra mask cost, but the cut pattern itself potentially degrades the reliability of the image on the wafer. The proposed algorithm generates a routing pattern that needs fewer cut patterns.

R1-17 (Time: 10:47 - 10:49)
Title: Analysis of the Distance Dependent Multiple Cell Upset Rates on 65-nm Redundant Latches by a PHITS-TCAD Simulation System
Author: *Kuiyuan Zhang, Jun Furuta, Kazutoshi Kobayashi (Kyoto Institute of Technology, Japan)
Page: pp. 89 - 93
Keyword: Soft error, PHITS, TCAD, MCU
Abstract: Recently, the soft error rate of integrated circuits has increased with process scaling. Soft errors decrease the tolerance of VLSIs. Charge sharing and the bipolar effect become dominant when a particle hits latches and flip-flops. Soft errors make circuits more sensitive to Multiple Cell Upset (MCU). We analyze the MCU tolerance of redundant latches in a 65 nm process by device simulation and the Particle and Heavy Ion Transport code System (PHITS). The MCU rate of redundant latches decreases exponentially as the distance between the redundant latches increases. These results coincide with the neutron experiments.

R1-18 (Time: 10:49 - 10:51)
Title: Feasible Shortest Path Frame Bounded Maze-Routing Algorithm for ML-OARST with Ripping up and Re-Building Steiner Points
Author: *Kuen-Wey Lin, Yeh-Sheng Lin, Yih-Lang Li (Institute of Computer Science and Engineering, National Chiao Tung University, Taiwan), Rung-Bin Lin (Computer Science and Engineering, Yuan Ze University, Taiwan)
Page: pp. 94 - 99
Keyword: Steiner tree, Routing, Obstacle-avoidance, Multilayer, Physical Design
Abstract: Owing to its large solution space, maze routing has never been used to solve the multi-layer obstacle-avoiding rectilinear Steiner tree problem (ML-OARST). This paper proposes the first maze routing-based algorithm that efficiently identifies a high-quality ML-OARST. Our algorithm employs a three-dimensional Hanan grid graph for maze routing and applies a novel scheme to identify good Steiner points. This significantly reduces the search overhead of maze routing. To reduce the routing cost of ML-OARST, we also develop a novel rip-up and re-building strategy for altering Steiner points and tree topology. Experimental results reveal that the proposed algorithm outperforms the state-of-the-art ML-OARST methods in wire-length and via costs. The required CPU time is comparable to that needed by spanning graph-based approaches.
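
As background for the R1-18 abstract, basic maze routing can be sketched as a breadth-first wavefront expansion between two pins on a grid with obstacles. This is a plain single-layer, two-pin illustration only; it does not build a Steiner tree, use a Hanan grid, or implement the paper's frame-bounded search or rip-up strategy. The function name `maze_route` and the 0/1 grid encoding are assumptions made for the example.

```python
from collections import deque


def maze_route(grid, source, target):
    """Return the length of a shortest obstacle-avoiding rectilinear path.

    `grid` is a list of rows where 0 marks a free cell and 1 marks an
    obstacle. `source` and `target` are (row, col) tuples on free cells.
    The search expands a breadth-first wavefront from the source and
    returns None if the target cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {source: 0}
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        if (r, c) == target:
            return dist[(r, c)]
        # Expand to the four rectilinear neighbors.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and grid[nr][nc] == 0
                and (nr, nc) not in dist
            ):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None
```

Because the wavefront can visit every free cell, the cost grows with grid size, which gives some intuition for the search overhead the abstract mentions.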

R1-19 (Time: 10:51 - 10:53)
Title: A TPL-Friendly Legalizer for Standard Cell Based Design
Author: *Hsiu-Yu Lai, Ting-Chi Wang (National Tsing Hua University, Taiwan)
Page: pp. 100 - 105
Keyword: Triple Patterning Lithography, Placement, Legalization, Standard Cell, Layout Decomposition
Abstract: With the shrinking of feature sizes and the delay of next-generation lithography, double patterning lithography (DPL) is no longer enough for the 14/10nm technology nodes. Triple patterning lithography (TPL) is a natural extension of DPL, and it can not only triple the pitch but also reduce conflicts and stitches. Although TPL is more difficult and complicated than DPL, TPL is a promising alternative for the 14/10nm technology nodes. In this paper, we consider TPL during the standard-cell legalization stage so that the resultant placement is friendlier to TPL layout decomposition. We provide a novel idea of reducing TPL conflicts through cell reordering and white space insertion. The experimental results show that, compared to a conventional legalizer, our legalizer is able to effectively reduce the numbers of conflicts and stitches.

R1-20 (Time: 10:53 - 10:55)
Title: Granularity of Via Configurable Logic Block for Structured ASIC
Author: Hui-Hsiang Tung (Oriental Institute of Technology, Taiwan), *Rung-Bin Lin (Yuan Ze University, Taiwan)
Page: pp. 106 - 110
Keyword: Structured ASIC, Via Configurable, Granularity, VLSI
Abstract: This article presents a systematic way to determine the granularity of via configurable logic block (VCLB) for structured ASIC. The systematic and experimental studies both show that a VCLB with four transistors laid over a single diffusion strip results in the best area utilization.

R1-21 (Time: 10:55 - 10:57)
Title: On the Impact of Initial Placement to SA-Based Placement for Mixed-Grained Reconfigurable Architecture
Author: *Takashi Kishimoto, Hiroyuki Ochi (Ritsumeikan University, Japan)
Page: pp. 111 - 116
Keyword: Simulated Annealing, Partitioning-based, Reconfigurable Architecture, Placement
Abstract: In this paper, we investigate a novel placement algorithm for mixed-grain reconfigurable architectures (MGRAs). The proposed algorithm applies a partitioning-based method to LUTs to obtain an initial placement, followed by a further optimization process for both LUTs and ALUs based on a low-temperature simulated annealing (SA) method. Compared with a conventional FPGA placement algorithm that uses SA with random initial placement, our method exhibits 9.3% smaller delay after running SA for half an hour. Our method is also superior in terms of the final solution after a run of several hours.