Monday, March 11, 2024 |
Title | (Keynote Speech) My Last Dance -- Development and Applications of a Memory-Traffic-Efficient Convolutional Neural Network |
Author | *Youn-Long Lin (National Tsing Hua University, Taiwan) |
Page | p. 1 |
Keyword | Convolutional Neural Network, HarDNet |
Abstract | In this presentation, I will introduce the design of HarDNet, an efficient and accurate convolutional neural network architecture. The fundamental idea behind its design is to minimize DRAM access, because DRAM is slower and consumes more energy than fast, inexpensive arithmetic operations. The HarDNet architecture has been optimized for speed and energy efficiency, making it an ideal choice for various applications, including object detection, semantic segmentation, and medical image segmentation. HarDNet is open-source, so anyone can use and modify it. HarDNet has been applied successfully in various fields and countries, in areas such as autonomous driving, industrial automation, vehicle safety, environmental monitoring, colonoscopy polyp segmentation, and MRI imaging. |
Title | A Novel Task Deployment Framework for Heterogeneous Multicore Systems Considering Circuit Aging |
Author | Yu-Guang Chen (National Central University, Taiwan), Ing-Chao Lin, Yu-Lin Chen, *Yi-Ping Chen (National Cheng Kung University, Taiwan) |
Page | pp. 2 - 7 |
Keyword | heterogeneous multicore systems, aging effects, asymmetric aging |
Abstract | Heterogeneous multicore systems are widely used today to trade off computing performance against power consumption. The ARM big.LITTLE architecture is one example: it combines high-performance big cores with low-power LITTLE cores to provide execution flexibility. Meanwhile, as technology scales down, circuit aging has become a non-negligible threat. Negative bias temperature instability (NBTI) is one of the most severe aging effects and can cause timing violations or even system failure. Previous work has proposed various techniques to mitigate the impact of NBTI. However, most of these studies focus only on homogeneous multicore architectures and cannot be applied directly to heterogeneous multicore systems. Moreover, none of these methods considers real-time applications, in which a task may be subject to a very tight timing constraint even after the circuit has aged. Therefore, in this paper, we investigate the characteristics of heterogeneous multicore systems and propose an aging-aware framework to improve system lifetime. In particular, we adopt the asymmetric aging concept, which keeps a few cores robust so that they can handle critical tasks in later life stages, together with a task migration technique that executes a single task on different types of cores to provide a better trade-off between energy consumption and system lifetime. Experimental results show that the proposed framework achieves a 5.29x to 10.78x lifetime improvement and an 11.8% to 23.8% reduction in average power consumption. |
Title | FPGA Implementation of a DPU-Based Facial Expression Recognition System |
Author | *Takuto Ando, Yusuke Inoue (National Institute of Technology, Oita College, Japan) |
Page | pp. 8 - 13 |
Keyword | FPGA, Facial Expression Recognition, Face Detection, CNN, DPU |
Abstract | In this paper, we implement a stand-alone DPU-based facial expression recognition system on an SoC FPGA. The system consists of a face detection step and a facial expression recognition step. In conventional FPGA-based facial expression recognition systems, the face detection step runs the Haar Cascade detector on the CPU because of FPGA resource limitations. However, the Haar Cascade detector is less accurate than DNN-based face detection for profile faces and for images with changing lighting conditions. On the other hand, DNN-based face detection such as YOLO incurs long latency when executed on a CPU with low computing performance. Therefore, we offload both DNN-based face detection and facial expression recognition to the DPU, a CNN accelerator on the FPGA, to speed up processing. In this work, we combine face detection with YOLOv2 tiny and CNN-based facial expression recognition on the same DPU. Sharing a single DPU between the two steps makes efficient use of FPGA resources while minimizing circuit size. |
Title | An Optoelectronic Pipelined Convolutional-RNN Architecture for Energy-Efficient AI Accelerator |
Author | *Chunlu Wang, Yutaka Masuda, Tohru Ishihara (Nagoya University, Japan) |
Page | pp. 14 - 19 |
Keyword | optical computing, Convolutional Neural Network, Recurrent Neural Network |
Abstract | This paper proposes an optoelectronic Convolutional Recurrent Neural Network (C_RNN) architecture that employs RNN layers in place of area-consuming fully connected layers and processes image data in a pipelined batch manner. It combines the strong feature extraction capability of CNNs with the compact and power-efficient nature of RNNs. The proposed optoelectronic C_RNN architecture achieves over 97.8% accuracy on the MNIST dataset while retaining the power-efficient, high-speed characteristics of photonics. It can reach 300 TOPS/W, which is 12 times more efficient than CMOS-based dedicated CNN accelerators. |
Title | Double Modular Redundancy Design of LSI Controller for Soft Error Tolerance |
Author | *Katsutoshi Otsuka, Kazuhito Ito (Saitama University, Japan) |
Page | pp. 20 - 25 |
Keyword | LSI, soft error, redundancy, DMR |
Abstract | A soft error in an LSI is a temporary malfunction in which stored data or signals are flipped. Redundancy is used to correct soft errors. Double modular redundancy (DMR) performs computation and data storage in duplicate, detects soft errors by comparing the two copies, and corrects errors by re-executing the computation. It is preferable to triple modular redundancy in terms of LSI area and power consumption. While many studies have addressed redundancy in LSI datapaths, there have been few reports on double modular redundancy in LSI control units. In this paper, a double modular redundancy design for LSI controllers is proposed. |
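The detect-by-comparison, correct-by-re-execution idea can be illustrated in software. This is a minimal sketch, assuming a deterministic computation and a hypothetical helper `dmr_execute`; it models duplication at the function level and does not represent the paper's controller (state machine) hardware design.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def dmr_execute(compute: Callable[[], T], max_retries: int = 3) -> T:
    """Hypothetical DMR helper: run `compute` twice, compare, re-execute on mismatch.

    A mismatch signals that a transient (soft) error corrupted one copy;
    because the error is temporary, re-running both copies is expected
    to produce agreeing results.
    """
    for _ in range(max_retries + 1):
        first = compute()   # primary copy
        second = compute()  # redundant copy
        if first == second:
            return first    # copies agree: accept the result
    raise RuntimeError("persistent mismatch: not correctable by re-execution")


# Usage: inject a single transient bit flip into the very first execution.
calls = {"n": 0}


def faulty_add() -> int:
    calls["n"] += 1
    result = 20 + 22
    return result ^ 0b100 if calls["n"] == 1 else result  # flip bit 2 once


print(dmr_execute(faulty_add))  # 42
```

The first comparison fails (46 vs. 42), so both copies are re-executed and the agreeing value is returned.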
Title | Architecture and Implementation of Micro-ROS with OpenAMP on a Heterogeneous Multi-core Processor |
Author | *Vincent Conus, Shinya Honda, Shinkichi Inagaki (Nanzan University, Japan) |
Page | pp. 26 - 31 |
Keyword | micro-ROS, OpenAMP, HMP, Linux, RTOS |
Abstract | Integrating a variety of systems on a single chip has become possible in recent years, making heterogeneous multi-core processors (HMPs) available as development targets. This article presents the implementation and deployment of the Robot Operating System (ROS) and micro-ROS on an HMP, as these are very popular middleware choices in robotics, automotive, and beyond. We focus on the architecture of the system, on the use of the OpenAMP shared-memory system as a means of communication, and on early results showing improved data transfer speed compared with communication over a serial bus. |
Title | Efficient FPGA Implementation of Binarized Neural Networks Based on Generalized Parallel Counter Tree |
Author | Takahiro Tanigawa, *Mugi Noda, Nagisa Ishiura (Kwansei Gakuin University, Japan) |
Page | pp. 32 - 37 |
Keyword | binarized neural networks, generalized parallel counters, FPGA implementation, compressor trees |
Abstract | Binarized neural networks (BNNs) allow compact hardware implementation by binarizing weight values and neuron activations. The critical path delay of a combinational circuit implementing a BNN neuron can be reduced by adopting a Wallace tree of full adders. However, in FPGA implementation, a 3-input full adder does not make full use of LUTs with more than 5 inputs. This paper proposes the use of a compressor tree based on GPCs (generalized parallel counters) in the FPGA implementation of a BNN neuron to reduce both the delay and the size of the resulting circuit. We further improve the efficiency of the circuit by reducing the comparison between the popcount and the threshold to a reference to the carry signal from the compressor tree. For 1024 inputs, the critical path delay and the slice count of our BNN neuron, implemented on a Xilinx Artix-7 FPGA, were smaller by 6.3% and 12.0%, respectively, than those of the circuit produced by regular logic synthesis. |
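The neuron's function, and one standard way of turning a popcount-versus-threshold comparison into a single carry bit, can be sketched as follows. This assumes bits encode -1/+1 as 0/1 and that the comparison is popcount >= threshold; the function names are hypothetical, the GPC compressor tree itself is not modeled, and the paper's exact carry formulation may differ.

```python
def bnn_neuron(x: int, w: int, n: int, threshold: int) -> int:
    """Hypothetical reference model: bits encode -1 (0) / +1 (1).

    The neuron fires when the number of agreeing input/weight bits
    (XNOR followed by popcount) reaches the threshold.
    """
    mask = (1 << n) - 1
    agree = ~(x ^ w) & mask           # XNOR restricted to n bits
    popcount = bin(agree).count("1")  # the value a compressor tree computes
    return int(popcount >= threshold)


def compare_via_carry(popcount: int, threshold: int, n: int) -> int:
    """Evaluate popcount >= threshold as a single carry-out bit.

    With k = n.bit_length(), popcount <= n < 2**k, so adding the constant
    2**k - threshold sets bit k of the sum exactly when popcount >= threshold
    (valid for 0 <= threshold <= 2**k). In hardware the constant can be fed
    into the adder tree, leaving only the carry to inspect.
    """
    k = n.bit_length()
    return ((popcount + (1 << k) - threshold) >> k) & 1


print(bnn_neuron(0b1011, 0b1001, n=4, threshold=3))  # 1 (3 of 4 bits agree)
```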
Title | Circuit Division for Gaussian Elimination-based NNA-Compliant Circuit Synthesis Utilizing Reinforcement Learning |
Author | *Huan Yu (Ritsumeikan University, Japan), Atsushi Matsuo (IBM Research - Tokyo, Japan), Shigeru Yamashita (Ritsumeikan University, Japan) |
Page | pp. 38 - 43 |
Keyword | Nearest Neighboring Architecture constraint, CNOT circuit, Gaussian elimination, model-based reinforcement learning |
Abstract | To implement quantum circuits on actual quantum devices, the circuits must adhere to the nearest neighboring architecture (NNA) constraint. Among the various available methods, Gaussian elimination stands out for its exceptional efficiency. Instead of inserting SWAP gates, Gaussian elimination synthesizes an NNA-compliant circuit by transforming a matrix that represents the functionality of the CNOT circuit into an identity matrix. In this paper, we introduce a novel method based on model-based reinforcement learning to improve Gaussian elimination. Our approach first divides the CNOT circuit into multiple subcircuits, a process performed by model-based reinforcement learning. Gaussian elimination is then applied to each subcircuit, yielding NNA-compliant subcircuits. By integrating all of these NNA-compliant subcircuits, we synthesize the target NNA-compliant circuit. Experimental results show that, compared with conventional Gaussian elimination, our approach reduces the number of CNOT gates in NNA-compliant circuits by 13.21% on average. Given that Gaussian elimination is already highly efficient, this is a noteworthy improvement. |
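The matrix view of a CNOT circuit that Gaussian elimination works on can be sketched in a few lines. This is a minimal sketch of plain Gaussian elimination over GF(2); it deliberately ignores the NNA constraint (any row may be added to any other) and does not include the reinforcement-learning-based circuit division, so it illustrates only the underlying representation. All function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]
Cnot = Tuple[int, int]  # (control, target)


def identity(n: int) -> Matrix:
    return [[int(i == j) for j in range(n)] for i in range(n)]


def parity_matrix(n: int, cnots: List[Cnot]) -> Matrix:
    """Functionality of a CNOT circuit as an n x n matrix over GF(2).

    CNOT(c, t) maps x_t -> x_t XOR x_c, i.e. it adds row c to row t.
    """
    a = identity(n)
    for c, t in cnots:
        a[t] = [u ^ v for u, v in zip(a[t], a[c])]
    return a


def synthesize(a: Matrix) -> List[Cnot]:
    """Reduce `a` to the identity with row additions and return the CNOTs.

    Each row addition is its own inverse, so replaying the recorded
    operations in reverse order rebuilds `a` from the identity.
    No nearest-neighbor restriction is applied here.
    """
    m = [row[:] for row in a]
    n = len(m)
    ops: List[Cnot] = []

    def add_row(c: int, t: int) -> None:
        m[t] = [u ^ v for u, v in zip(m[t], m[c])]
        ops.append((c, t))

    for col in range(n):
        if m[col][col] == 0:
            # For an invertible matrix a pivot always exists below the diagonal.
            pivot = next(r for r in range(col + 1, n) if m[r][col])
            add_row(pivot, col)
        for r in range(n):
            if r != col and m[r][col]:
                add_row(col, r)
    return ops[::-1]


circuit = [(0, 1), (1, 2), (0, 2)]
target = parity_matrix(3, circuit)          # [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
resynth = synthesize(target)
assert parity_matrix(3, resynth) == target  # same functionality
```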
Title | Automated FPGA Implementation of Convolutional Neural Networks with Pipelining and Layer Partitioning |
Author | Eito Yamada, *Kazuyoshi Takagi (Mie University, Japan) |
Page | pp. 44 - 45 |
Keyword | FPGA, convolutional neural network, high-level synthesis |
Abstract | Convolutional neural networks (CNNs) are used in various machine learning applications. In this work, we present a scheme for accelerating CNN processing using field programmable gate arrays (FPGAs). High-throughput operation is achieved by pipelining and layer partitioning. We also propose an automated design flow to map CNN operations onto FPGAs. In our experiments, the CNN operations for the Fashion-MNIST and CIFAR-10 datasets run about 140 to 250 times faster than CPU execution. |
Title | Masking Regularity of Noise for Tamper-resistant Design on FPGAs |
Author | *Yui Koyanagi, Tomoaki Ukezono (Fukuoka University, Japan) |
Page | pp. 46 - 49 |
Keyword | FPGA, Tamper-resistance, Power Analysis Attacks, Side-Channel Attacks, Noise |
Abstract | In recent years, FPGAs have been integrated into products in numerous instances. However, FPGA implementations are inherently more vulnerable to side-channel attacks than ASIC implementations. Because FPGAs integrated into products need to be cheap, applying previously studied tamper-resistant circuit designs that incur large area overhead is not practical. This paper improves upon a previous study that leveraged FPGA hard macros to enhance tamper resistance with low overhead. The proposed circuit configuration method keeps the overhead low while further enhancing tamper resistance. |
Title | A Fast Three-layer Bottleneck Channel Track Assignment with Layout Constraints using ILP |
Author | *Kazuya Taniguchi, Satoshi Tayu, Atsushi Takahashi (Tokyo Institute of Technology, Japan), Mathieu Molongo, Makoto Minami, Katsuya Nishioka (Jedat, Japan) |
Page | pp. 50 - 55 |
Keyword | Channel Routing, Bottleneck Routing, Analog VLSI, Integer Linear Programming |
Abstract | An algorithm for the bottleneck channel routing problem based on integer linear programming (ILP) is proposed. The proposed algorithm determines the track and layer assignment of nets for the three-layer bottleneck channel routing problem with layout constraints, in which the pins of each net are placed on the upper boundaries of the adjacent regions on both sides of the bottleneck channel. The algorithm restricts the routing of each net to one of three patterns, taking feasibility into account, and outputs a solution within a few seconds for 300 nets. |
Title | Fast Integer Linear Programming for Set-Pair Routing Problem |
Author | *Yasuhiro Takashima (University of Kitakyushu, Japan) |
Page | pp. 56 - 61 |
Keyword | set-pair routing, integer linear programming, reachable vertex set |
Abstract | This paper introduces an efficient integer linear programming approach to the set-pair routing problem. Previous works employ either fast heuristics that may not yield optimal solutions or exact methods whose processing time may be impractical; in contrast, the proposed method is a fast integer programming formulation. This approach balances practical processing time against the delivery of optimized or high-quality solutions. Experimental results validate the efficiency of the proposed method. |
Title | Multi-pin Net Substrate Routing Framework for Fine Pitch Ball Grid Array |
Author | Ming-Yen Chuang, *Yi-Yu Liu (National Taiwan University of Science and Technology, Taiwan) |
Page | pp. 62 - 67 |
Keyword | Substrate Routing, Multi-pin Net |
Abstract | The packaging substrate is an essential carrier for integrated circuits (ICs) and printed circuit boards (PCBs). The quality of substrate routing is a critical factor in the efficiency and accuracy of signal connections in the substrate. However, most available automatic substrate routers focus only on two-pin nets, so substrate engineers still need to route multi-pin nets manually. Manual routing is inefficient, time-consuming, and error-prone, especially for nets with many pins, and can even delay time to market. In this paper, we propose a three-stage framework for multi-pin net routing on fine pitch ball grid array packages, consisting of pin grouping, minimum spanning tree topology generation, and group topology connection. It handles not only connections from fingers to bump balls but also connections between bump balls and between bonding fingers. Experimental results on 6 industrial designs demonstrate that our framework completes multi-pin net routing with better routing results. |
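The minimum spanning tree topology generation stage can be pictured with Prim's algorithm over the pins of one net. This is a minimal sketch, assuming pins are given as (x, y) coordinates and that edge cost is the rectilinear (Manhattan) distance; the framework's actual cost model (for example, layer changes or obstacles) is not described in the abstract, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def prim_mst(pins: List[Point]) -> List[Tuple[int, int]]:
    """Return MST edges (i, j) over pin indices using Prim's algorithm, O(n^2)."""
    n = len(pins)
    if n < 2:
        return []
    in_tree = [False] * n
    best_cost = [float("inf")] * n  # cheapest known connection to the tree
    best_from = [-1] * n            # tree vertex providing that connection
    best_cost[0] = 0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_cost[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Relax connection costs of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = manhattan(pins[u], pins[v])
                if d < best_cost[v]:
                    best_cost[v], best_from[v] = d, u
    return edges


pins = [(0, 0), (4, 0), (4, 3), (10, 0)]
tree = prim_mst(pins)
print(tree, sum(manhattan(pins[i], pins[j]) for i, j in tree))  # [(0, 1), (1, 2), (1, 3)] 13
```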
Title | Transmitting Coil for Uniform Magnetic Flux Density |
Author | *Tatsumu Mitsuhashi, Toshiki Kanamoto (Hirosaki University, Japan), Koutaro Hachiya (Teikyo Heisei University, Japan), Atsushi Kurokawa (Hirosaki University, Japan) |
Page | pp. 68 - 73 |
Keyword | transmitting coil, uniform magnetic field distribution, wireless power transmission |
Abstract | In wireless power transmission, achieving a uniform magnetic field distribution near the transmitting coil is important for charging multiple devices simultaneously and for tolerating device misalignment, because the power transmission efficiency depends on the position of the receiving coil. In this paper, we propose a new transmitting coil structure that achieves a uniform magnetic flux density distribution. The transmitting coil has an outer part with evenly spaced wires and a central part whose wire spacing becomes progressively narrower toward the outside. The coil structures are represented by polar equations. Moreover, we present a method to determine an optimal transmitting coil structure from a specified outer diameter and minimum required magnetic flux density. The structure is determined using a deep neural network (DNN) model that learns the relationship among the coil structure parameters, the magnetic flux density, and the coefficient of variation (CoV) of the magnetic flux density. Verification results show that the CoVs of the magnetic flux density for the conventional and proposed transmitting coils are 0.43 and 0.11, respectively, indicating that the proposed coil generates a more uniform magnetic field distribution than the conventional one. |
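The uniformity metric used above, the coefficient of variation, is the standard deviation of the sampled flux densities divided by their mean, so a perfectly uniform field gives 0. The sketch below assumes the population standard deviation over a set of sample points; the abstract does not state which estimator or sampling grid the authors use, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """CoV = standard deviation / mean (population std assumed)."""
    mean = statistics.fmean(samples)
    return statistics.pstdev(samples) / mean


print(coefficient_of_variation([1.0, 3.0]))  # 0.5
```

Lower values mean a flatter field, which is why the reported drop from 0.43 to 0.11 indicates improved uniformity.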
Title | A Comparator with Controllable Offset Voltage Variation for Stochastic Flash ADC |
Author | *Taira Sakaguchi, Satoshi Komatsu (Tokyo Denki University, Japan) |
Page | pp. 74 - 77 |
Keyword | Offset voltage variation, Stochastic flash ADC, Controlling variation, StrongARM comparator |
Abstract | We propose a comparator with controllable offset voltage variation for stochastic flash ADCs. The proposed comparator is based on a conventional StrongARM comparator, with additional transistors that adjust the currents of the differential pair to control the offset voltage variation. Circuit simulation results show that the standard deviation of the offset voltage can be digitally controlled from 17.4 mV to 74.7 mV when the reference voltage Vref is 0.9 V. |
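Why controlling the offset spread matters can be seen from a behavioral model of a stochastic flash ADC, in which all comparators share one reference and the output code is simply the number of comparators that fire. This is a minimal sketch, assuming Gaussian, independent offsets and 255 comparators (both hypothetical choices); it does not model the StrongARM circuit itself.

```python
import random
from typing import List


def make_offsets(n: int, sigma_v: float, seed: int = 0) -> List[float]:
    """Hypothetical comparator offsets drawn from N(0, sigma_v**2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_v) for _ in range(n)]


def stochastic_flash_code(vin: float, vref: float, offsets: List[float]) -> int:
    """Output code = number of comparators deciding vin > vref + offset."""
    return sum(1 for off in offsets if vin - vref > off)


narrow = make_offsets(255, 0.0174)  # sigma = 17.4 mV
wide = make_offsets(255, 0.0747)    # sigma = 74.7 mV
for dv in (-0.05, 0.0, 0.02, 0.05):
    print(dv, stochastic_flash_code(0.9 + dv, 0.9, narrow),
          stochastic_flash_code(0.9 + dv, 0.9, wide))
```

The transfer curve follows the cumulative distribution of the offsets, so a larger sigma spreads the codes over a wider input range at lower resolution; adjusting sigma therefore trades input range against resolution.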
Title | Development of a Remote Monitoring System for Lithium-ion Batteries by Using IoT and Real-time Processing |
Author | *Kosuke Shibuya, Masahiro Fukui (Ritsumeikan University, Japan) |
Page | pp. 78 - 83 |
Keyword | Lithium-ion battery, Kalman Filter, MQTT, IoT |
Abstract | With the rapid proliferation of electric vehicles and challenges such as rare-metal shortages and price increases, demand is growing for reused batteries, whose performance varies significantly. Furthermore, remote monitoring, which encompasses overall lifecycle management, efficient utilization, and early detection of malfunctions for both reused and new batteries, is becoming increasingly important, so there is high demand for a remote monitoring and history management system for lithium-ion batteries. Against this background, the authors are developing a system for dynamic state-of-charge monitoring using IoT and for accumulating historical data in the cloud. This paper reports on the development of dynamic state-of-charge monitoring using a Kalman filter and of data communication and management using MQTT. |
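The Kalman-filter part of state-of-charge (SoC) monitoring can be sketched with a scalar filter: coulomb counting predicts the SoC, and a terminal-voltage measurement corrects it. This is a minimal sketch under loudly stated assumptions: a hypothetical linear open-circuit-voltage model, a fixed internal resistance, and made-up noise and capacity values; the authors' battery model is not given in the abstract, and the MQTT transport is omitted.

```python
from dataclasses import dataclass


@dataclass
class SocKalman:
    soc: float            # state estimate (0..1)
    p: float              # estimate variance
    capacity_ah: float    # hypothetical cell capacity
    q: float = 1e-6       # process noise variance (assumed)
    r: float = 1e-3       # measurement noise variance (assumed)
    ocv0: float = 3.0     # hypothetical linear OCV model: v = ocv0 + ocv_slope*soc - r_int*i
    ocv_slope: float = 1.2
    r_int: float = 0.05

    def step(self, current_a: float, dt_s: float, v_meas: float) -> float:
        # Predict: coulomb counting (positive current = discharge).
        self.soc -= current_a * dt_s / (3600.0 * self.capacity_ah)
        self.p += self.q
        # Update: compare measured terminal voltage with the model's prediction.
        h = self.ocv_slope
        v_pred = self.ocv0 + h * self.soc - self.r_int * current_a
        k = self.p * h / (h * self.p * h + self.r)  # Kalman gain
        self.soc += k * (v_meas - v_pred)
        self.p *= 1.0 - k * h
        return self.soc


# Usage: start from a wrong guess (0.5); 3.96 V corresponds to SoC 0.8 under the assumed model.
kf = SocKalman(soc=0.5, p=0.1, capacity_ah=2.0)
for _ in range(50):
    kf.step(current_a=0.0, dt_s=1.0, v_meas=3.96)
print(round(kf.soc, 3))
```

In a deployed system each estimate would then be published to the cloud, for example over MQTT as the abstract describes.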
Title | Development of Snowfall Prediction System using X-band Weather Radar and Artificial Intelligence |
Author | *Atsushi Onodera, Masashi Imai (Hirosaki University, Japan) |
Page | pp. 84 - 85 |
Keyword | Snowfall prediction system, X-band weather radar, Artificial intelligence, RNN model, CNN model |
Abstract | A snowfall prediction system using X-band weather radar and artificial intelligence is developed to mitigate snow-related damage. In this paper, we explain several AI designs in which the model parameters and input datasets are varied, and we show their evaluation results. The results confirm that combining the average values of the radar data with the number of radar data points is effective for predicting snowfall with the developed RNN-based AI. |
Title | (Invited Talk) Technology Challenges of Verification and Post-Silicon Validation for Supercomputer Fugaku |
Author | Takahide Yoshikawa (Fujitsu Ltd., Japan) |
Page | p. 86 |
Keyword | Supercomputer, Verification |
Abstract | Fujitsu has developed the world’s fastest supercomputer systems, including the K computer and the supercomputer Fugaku. Such supercomputer systems are very large and complex, with more than 100,000 CPUs connected by more than 100,000 optical cables with a total length of about 900 km. If a functional or performance issue is found after the large-scale system has been assembled, it is difficult to identify and fix the cause. Therefore, to ensure that the system operates stably with correct functionality and the expected power and performance, various technologies are applied from requirements definition through manufacturing. In this presentation, I will give an overview of Fugaku and introduce the technologies used in its development for verification (extraction of verification items, simulation, and formal, power, and performance verification), testing (ATPG), post-silicon validation (automatic test generation), and manufacturing testing (test time reduction). |
Title | Enhancing visual similarities in DNA-based similar image retrieval |
Author | *Takefumi Koike, Takashi Sato (Kyoto University, Japan) |
Page | pp. 87 - 92 |
Keyword | DNA storage, Deep Metric Learning, Content-Based Image Retrieval |
Abstract | With the exponential growth of digital data, DNA is emerging as an attractive medium for storage and computing. Encoding digital data into appropriate DNA sequences, and then assessing the design methodology, is important. In this paper, we propose using image classification as a quantifiable task for evaluating DNA encoders for similar image search. In addition, we propose a triplet network-based DNA encoder to improve encoding performance. The results demonstrate that the proposed encoder outperforms existing encoders in retrieval accuracy. |
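A triplet network is usually trained with the triplet loss, which pulls an anchor toward a similar (positive) example and pushes it at least a margin away from a dissimilar (negative) one. The sketch shows the standard formulation on plain embedding vectors; the embeddings and margin are hypothetical, and how the paper maps embeddings to DNA sequences is not shown.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard triplet loss: max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = math.dist(anchor, positive)  # distance to a similar image
    d_neg = math.dist(anchor, negative)  # distance to a dissimilar image
    return max(d_pos - d_neg + margin, 0.0)


print(triplet_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))  # 0.0: negative already far enough
print(triplet_loss((0.0, 0.0), (0.0, 1.0), (1.0, 0.0)))  # 1.0: negative too close
```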
Title | An IoT platform "My-IoT" and its enhancement |
Author | *Hidetomo Shibamura (Kyushu University, Japan), Yoshimitsu Okayama (The University of Electro-Communications, Japan), Koji Inoue (Kyushu University, Japan) |
Page | pp. 93 - 94 |
Keyword | IoT, AI, FPGA, GPU |
Abstract | This paper presents a new IoT platform called My-IoT, together with an edge-cloud collaborative computing environment that can easily be downloaded from and executed on the platform. As platform enhancements, an AI computing framework and promising devices for edge computers, such as GPUs and FPGAs, are discussed. |
Title | A CNN Network Suitable for FPGA Implementation in Surveillance Camera Systems |
Author | *Shota Ishikawa, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 95 - 100 |
Keyword | CNN, Object Detection, FPGA |
Abstract | This paper presents a CNN (Convolutional Neural Network) suitable for FPGA (Field-Programmable Gate Array) implementation, with the main objective of developing an object detection algorithm for a surveillance camera system. Experimental results show that the proposed network, which reduces the number of parameters by about 99% compared with the conventional method, does not cause significant accuracy degradation when used in the specific situation of a surveillance camera system. |
Title | Multiple regression analysis considering multicollinearity for estimating CPU cycles using performance counters |
Author | *Ryota Hattori, Yoshinori Takeuchi (Kindai University, Japan) |
Page | pp. 101 - 106 |
Keyword | Performance Analysis, Embedded Processor, Performance Counter, Multicollinearity, Multiple Regression Analysis |
Abstract | The development of industrial controller devices currently requires estimating the execution time of control software running on embedded processors. However, embedded processors have complex functions, and their functional specifications are black boxes, which makes execution time difficult to estimate. Many recent studies estimate CPU cycles from performance counters as a way to estimate execution time. Because only a limited number of performance counters can be measured at one time, obtaining many counter values requires repeated measurements, which take a lot of time. This study proposes multiple regression analysis considering multicollinearity (MRACM) to reduce the number of measurements needed to estimate CPU cycles. We compare the CPU-cycle estimation accuracy of linear programming (LP), multiple regression analysis (MRA), and MRACM, and discuss the best analytical approach for each program. Experimental results show that, when multicollinearity occurs and counters with both high and low correlation coefficients exist, MRACM reduces the number of required performance counters to 2 while estimating CPU cycles with almost the same error as conventional methods. |
Title | On Construction of Trajectory of Boxer's Punch using a single IMU |
Author | Yu-Cheng Lee, *Kai-Po Hsu, Yun-Ju Lee, Yi-Ting Li (National Tsing Hua University, Taiwan), Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), Wen-Hsin Chiu, Chun-Yao Wang (National Tsing Hua University, Taiwan) |
Page | pp. 107 - 112 |
Keyword | IMU applications |
Abstract | In this work, we propose a system that constructs the trajectories of boxers' punches. Using a single IMU sensor, the system can plot the trajectories of three kinds of punches: straight punch, hook, and uppercut. A quaternion-based approach is used to identify rotations in three-dimensional space from the collected data. Furthermore, we apply ellipsoid fitting as our calibration method to effectively remove the built-in offsets of the IMU sensor. Experimental results show that the proposed system produces reliable trajectories compared with the professional motion capture product VICON Motion Systems. The root mean square errors (RMSEs) of the trajectories for the straight punch, hook, and uppercut are 0.041 m, 0.078 m, and 0.117 m, respectively. |
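The quaternion part of such a pipeline integrates gyroscope rates into an orientation and uses it to rotate body-frame vectors (for example, accelerometer readings) into the world frame before they are integrated into a trajectory. This is a minimal sketch with hypothetical function names; ellipsoid-fitting calibration, gravity removal, and double integration of acceleration are omitted, and the paper's exact update rule is not given in the abstract.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z)
Vec3 = Tuple[float, float, float]


def qmul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate body-frame vector v into the world frame: q * (0, v) * conj(q)."""
    _, x, y, z = qmul(qmul(q, (0.0, *v)), (q[0], -q[1], -q[2], -q[3]))
    return (x, y, z)


def integrate_gyro(q: Quat, omega: Vec3, dt: float) -> Quat:
    """Advance orientation by body-frame angular rate omega (rad/s) over dt."""
    rate = math.sqrt(sum(c * c for c in omega))
    if rate == 0.0:
        return q
    half = rate * dt / 2.0
    s = math.sin(half) / rate
    dq = (math.cos(half), omega[0] * s, omega[1] * s, omega[2] * s)
    w, x, y, z = qmul(q, dq)
    norm = math.sqrt(w * w + x * x + y * y + z * z)  # re-normalize against drift
    return (w / norm, x / norm, y / norm, z / norm)


# Usage: 90 deg/s about z for 1 s turns the body x-axis into the world y-axis.
q = (1.0, 0.0, 0.0, 0.0)
for _ in range(100):
    q = integrate_gyro(q, (0.0, 0.0, math.pi / 2), 0.01)
print(rotate(q, (1.0, 0.0, 0.0)))  # approximately (0.0, 1.0, 0.0)
```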
Title | Iterative Linear Transformation to Reduce Compound Variables |
Author | *Tsutomu Sasao (Meiji University, Japan) |
Page | pp. 113 - 118 |
Keyword | functional decomposition, minimization of variables, linear transformation, partially defined function |
Abstract | A classification function is a multi-valued function whose values are defined for only a fraction of the input combinations. Many variables in such a function are redundant and can be eliminated. A variable that can be represented as an EXOR of variables is called a compound variable. Using compound variables, we can further reduce the number of variables. This paper shows iterative methods to reduce the number of variables. They require memory of size O(nk), where n is the number of input variables and k is the number of registered vectors. Experimental results for various benchmark functions show the effectiveness of the algorithms. These methods are useful for embedded systems, where memory size is limited. They can also be used as a pre-processor for other variable minimizers to reduce computation time. |
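The benefit of compound variables shows up on the smallest possible example: for the registered vectors of a 2-input XOR, neither original variable alone distinguishes vectors with different function values, but the single compound variable x0 EXOR x1 does. The exhaustive search below is only for illustrating tiny cases; it is not the paper's iterative O(nk)-memory method, and all names are hypothetical.

```python
from functools import reduce
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[int, ...]
Compound = Tuple[int, ...]  # indices whose EXOR forms one compound variable


def eval_compound(v: Vector, c: Compound) -> int:
    return reduce(lambda acc, i: acc ^ v[i], c, 0)


def separates(vectors: Sequence[Vector], values: Sequence[int],
              compounds: Sequence[Compound]) -> bool:
    """True if vectors with different function values get different keys."""
    seen: Dict[Tuple[int, ...], int] = {}
    for v, f in zip(vectors, values):
        key = tuple(eval_compound(v, c) for c in compounds)
        if seen.setdefault(key, f) != f:
            return False
    return True


def minimum_compounds(vectors: List[Vector], values: List[int]) -> Optional[List[Compound]]:
    """Exhaustive search over sets of compound variables (tiny n only)."""
    n = len(vectors[0])
    candidates = [c for k in range(1, n + 1) for c in combinations(range(n), k)]
    for size in range(len(candidates) + 1):
        for chosen in combinations(candidates, size):
            if separates(vectors, values, chosen):
                return list(chosen)
    return None  # only if identical vectors carry different values


vectors = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_values = [0, 1, 1, 0]
print(minimum_compounds(vectors, xor_values))  # [(0, 1)]: one compound variable x0 ^ x1
```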
Title | Optimizing Gaussian Elimination-based NNA-compliant Circuit Synthesis by Simulated Annealing-based CNOT Gates Insertion |
Author | *Zanhe Qi (Ritsumeikan University, Japan), Atsushi Matsuo (IBM Research - Tokyo, Japan), Shigeru Yamashita (Ritsumeikan University, Japan) |
Page | pp. 119 - 124 |
Keyword | Quantum Circuit Design, Nearest Neighbor Architecture (NNA)-compliant, Gaussian Elimination, Inserting CNOT gates, Simulated Annealing (SA) |
Abstract | Quantum circuits are often tailored to the Nearest Neighbor Architecture (NNA), which supports two-qubit operations only between neighboring qubits. Typically, converting a quantum circuit to comply with NNA involves inserting SWAP gates. However, Gaussian Elimination often yields a smaller NNA-compliant quantum circuit. This paper shows that, in many cases, the Gaussian Elimination-based method can be improved by inserting CNOT gates before and/or after the target circuit. Additionally, we use the Simulated Annealing (SA) method to search for an optimal circuit. By inserting CNOT gates into the initial circuits, we reduce the number of CNOT gates by approximately 19% compared with the original Gaussian Elimination-based method. |
Title | An Error Diagnosis Technique Based on Location Variable Simulation Employing Dedicated Multiplicity-Limiter Function and Ordering for Input Patterns |
Author | *Hiroki Tsuyama, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 125 - 130 |
Keyword | Error Diagnosis, ECO |
Abstract | This paper presents an error diagnosis technique based on location variable (LV) simulation that employs a dedicated multiplicity-limiter function (MLF) and an ordering of the input patterns, called PLEMs, to shorten the processing time of error diagnosis. The proposed MLF dedicated to PLEMs includes only the location variables needed to compute the rectification condition, which improves the efficiency of the cofactor operation. In addition, by applying PLEMs in ascending order of the number of incorrect POs, PLEMs with fewer LUTs to consider are given priority. Experimental results show that the proposed technique reduces the processing time by up to 99.2% and by 82.7% on average. |
Title | Accurate Performance Estimation with BBFDA: Beyond Granularity Constraints |
Author | Hsuan-Yi Lin, *Ren-Song Tsay (National Tsing Hua University, Taiwan) |
Page | pp. 131 - 133 |
Keyword | program execution phase |
Abstract | In this paper, we present BBFDA, a pioneering approach for precise performance estimation in computer systems. Conventional time-quantum-based methods often suffer from granularity limitations that impede their ability to capture program behavior accurately. BBFDA uses Basic Block Analysis and Recursive Frequency Domain Analysis to estimate performance waveforms. This method enables dynamic performance tracking without granularity constraints and remains robust to input variations. We evaluate BBFDA on the SPEC CPU2017 benchmarks, showing its high accuracy and resilience, particularly in multi-phase scenarios. |
Title | A Machine Learning-Based Approach to Cell Layout Optimization Considering LDEs |
Author | Ya-Rou Hsu, *Yen-Ju Su, Chia-Wei Liang, Han-Ya Tsai, Hung-Pin Wen (National Yang Ming Chiao Tung University, Taiwan), Hsuan-Ming Huang (MediaTek, Taiwan) |
Page | pp. 134 - 137 |
Keyword | Machine learning, Layout dependent effects (LDEs), Performance ranking, Design automation, Standard cell library |
Abstract | Cell layout generation plays a crucial role in design automation. A generated layout must not only follow design rules but also exhibit optimized performance in terms of factors such as delay, power, area, and cost. However, existing approaches in the literature often rely on metrics that do not consider layout dependent effects (LDEs). Furthermore, evaluating actual performance with commercial tools can be excessively time-consuming, especially when cell layouts are optimized iteratively. Therefore, this work proposes a new machine learning-based ranking model that enables rapid performance ranking among layout candidates of standard cells. The model incorporates LDEs in feature extraction, generates an ordered list of cell layouts, and evaluates the performance of only the top-K candidates. Experiments show that this approach successfully identifies the optimal layout for 10 benchmark cells in a sub-5nm FinFET industrial standard cell library, achieving a 348x speedup over the conventional flow. |
Title | Active Learning-based Practical Power Estimation Considering Multi-Cycle Paths |
Author | Shao-Min Liu, *Shao-Yun Fang (National Taiwan University of Science and Technology, Taiwan), Hsiang-Wen Chang, Ming-Chao Lee, Peter Wei (Synopsys Inc., Taiwan) |
Page | pp. 138 - 143 |
Keyword | power estimation, active learning |
Abstract | To meet the design requirements of low-power products, providing accurate power estimation in the early stages of the design flow has become increasingly important. A previous method uses the toggle rates of registers to perform design-dependent RTL power estimation based on machine learning (ML) techniques. However, the runtime speedup claimed by that work ignores the huge runtime needed to obtain labels for the training data, making the ML-based approach insufficiently efficient for general designs. In this paper, we adopt active learning to query the labels of the most representative training data and propose a new feature representation approach to improve model accuracy when training data are scarce. A recurrent neural network (RNN)-based autoencoder is also adopted, which enables the proposed model to handle designs with multi-cycle paths. Experimental results show that, compared with the existing work, the proposed training flow greatly improves power estimation accuracy with much less training data. |
Title | RESURF Structure Optimization of SiC Trench MOSFET using Machine Learning |
Author | *Tomoya Akasaka (Hirosaki University, Japan), Ichirota Takazawa (JEDAT Inc., Japan), Seria Kasai, Atsushi Kurokawa, Toshiki Kanamoto (Hirosaki University, Japan) |
Page | pp. 144 - 149 |
Keyword | Power, MOSFET, Machine Learning, SiC, Simulated Annealing |
Abstract | This paper proposes a method to optimize the vertical RESURF structure formed in a SiC trench MOSFET. The SiC trench MOSFET is one of the most energy-efficient devices for automotive power modules. In terms of energy efficiency, the RESURF structure is introduced to enable operation at higher voltages. For the RESURF structure, the most important parameter is the thickness of the P-type vertical RESURF region surrounding the N-drift region joined to the drain. In this paper, we use artificial intelligence to optimize this thickness for the desired electrical characteristics. We first formulate the relationship between the output characteristics and the thicknesses using machine learning. With the obtained formula, we then search for the thickness that meets the desired current-voltage characteristics by applying simulated annealing. Experimental results show that the proposed method reproduces the target I-V characteristics with an adjusted coefficient of determination of 0.999. |
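The fit-then-anneal procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions: `surrogate_iv` is a hypothetical stand-in for the learned thickness-to-I-V model (the paper's actual formula and units are not given), the cost is a sum of squared current errors over a voltage sweep, and the step size and cooling schedule are arbitrary choices.

```python
import math
import random


def surrogate_iv(thickness: float, voltages: list[float]) -> list[float]:
    """Hypothetical stand-in for the learned model: predicted current at each
    voltage, largest when the thickness is near 0.7 (arbitrary units)."""
    return [v * math.exp(-((thickness - 0.7) ** 2) / 0.05) for v in voltages]


def cost(thickness: float, target: list[float], voltages: list[float]) -> float:
    """Sum of squared errors between predicted and target I-V points."""
    predicted = surrogate_iv(thickness, voltages)
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def anneal(target, voltages, lo=0.0, hi=2.0, t0=1.0, alpha=0.995, steps=3000, seed=0):
    """Simulated annealing over a single thickness parameter in [lo, hi]."""
    rng = random.Random(seed)
    x = rng.uniform(lo, hi)
    fx = cost(x, target, voltages)
    best, f_best = x, fx
    temp = t0
    for _ in range(steps):
        cand = min(hi, max(lo, x + rng.gauss(0.0, 0.1)))  # random step, clipped to range
        fc = cost(cand, target, voltages)
        # Accept improvements always; accept worse moves with probability exp(-delta/T).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < f_best:
                best, f_best = x, fx
        temp *= alpha  # geometric cooling
    return best


if __name__ == "__main__":
    volts = [0.5 * i for i in range(1, 11)]
    target = surrogate_iv(0.7, volts)  # pretend the desired I-V came from thickness 0.7
    print(round(anneal(target, volts), 3))
```

With a single parameter a plain sweep would also work; annealing becomes more attractive when several geometric or doping parameters are searched jointly on the learned surface.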
Title | A Search Algorithm for Optimal Resistance Measurement Points in Testing Power TSV with Manufacturing Variation Cancellation |
Author | *Yudai Kawakami, Koutaro Hachiya (Teikyo Heisei University, Japan) |
Page | pp. 150 - 154 |
Keyword | 3D-IC, power TSV, design for test, measurement point selection |
Abstract | Test methods have been proposed to detect open defects in power TSVs (Through Silicon Vias) in 3D-ICs by measuring the resistances between power supply pads placed directly beneath the TSVs under test. When the manufacturing variation of the resistance is large, the diagnostic performance of testing a TSV must be improved by measuring two resistances: the detection resistance and the cancellation resistance, the latter of which is used to cancel the manufacturing variation component. Since the number of ways to select these two resistance measurement points from the power supply pads directly under all TSVs is enormous, previous research proposed empirical rules for selecting the measurement points instead of searching for the optimal ones. This paper presents a hill-climbing method that searches for a locally optimal solution, using the measurement points determined by the empirical rules as the initial solution. |
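A steepest-ascent hill climb over measurement-point pairs can be sketched as below. Everything problem-specific is a hypothetical placeholder: pads sit on an n × n grid, a solution is a (detection point, cancellation point) pair, a neighbor moves one of the two points to an adjacent pad, and `score` stands in for the paper's diagnostic-performance metric, which the abstract does not define. The initial solution plays the role of the empirical-rule choice.

```python
from typing import Callable, Iterator

Point = tuple[int, int]
Solution = tuple[Point, Point]  # (detection point, cancellation point)


def neighbors(sol: Solution, n: int) -> Iterator[Solution]:
    """Move either measurement point one pad up/down/left/right on an n x n grid."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        for which in (0, 1):
            x, y = sol[which]
            moved = (x + dx, y + dy)
            if not (0 <= moved[0] < n and 0 <= moved[1] < n):
                continue
            new = (moved, sol[1]) if which == 0 else (sol[0], moved)
            if new[0] != new[1]:  # the two points must use different pads
                yield new


def hill_climb(initial: Solution, score: Callable[[Solution], float], n: int) -> Solution:
    """Move to the best neighbor while it improves the score; stop at a local optimum."""
    current, current_score = initial, score(initial)
    while True:
        best = max(neighbors(current, n), key=score, default=None)
        if best is None or score(best) <= current_score:
            return current
        current, current_score = best, score(best)


if __name__ == "__main__":
    def toy_score(sol: Solution) -> float:
        # Hypothetical metric: best when detection is at (2, 3) and cancellation at (6, 1).
        (dx, dy), (cx, cy) = sol
        return -(abs(dx - 2) + abs(dy - 3)) - (abs(cx - 6) + abs(cy - 1))

    print(hill_climb(((0, 0), (0, 1)), toy_score, n=8))
```

Hill climbing only guarantees a local optimum, which is why starting from a good empirical-rule solution matters: the search refines it rather than exploring the full combinatorial space.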
Title | Optimal Inner Diameter of Single-Layer Planar Spiral Coils |
Author | *Kotaro Terada (Hirosaki University, Japan), Koutaro Hachiya (Teikyo Heisei University, Japan), Toshiki Kanamoto, Atsushi Kurokawa (Hirosaki University, Japan) |
Page | pp. 155 - 159 |
Keyword | spiral coil, inner diameter, wireless charging |
Abstract | Many electronic devices such as smartphones can now use wireless charging. Most of the power transmitting and receiving coils built into these devices are single-layer planar spiral coils, which keep the devices light and thin. However, manufacturers have difficulty determining whether to wind the coil all the way to the center and what the inner diameter of the coil should be. In this paper, we clarify the electrical and physical properties that result from different inner diameters and present the optimal inner/outer diameter ratio for power transfer efficiency. The analysis results show that, for an outer diameter of 43 mm, the optimal inner/outer diameter ratio for maximum power transfer efficiency in the resonant frequency range of 100 to 200 kHz is 0.442 to 0.544. |
Title | FPGA-Based Deep-Pipelined Architecture for Vision Transformer's Multi-Head Attention |
Author | *Hasitha Muthumala Waidyasooriya, Masanori Hariyama (Tohoku University, Japan), Daisuke Tanaka (Niihama College, Japan) |
Page | pp. 160 - 163 |
Keyword | FPGA, Vision transformer, Attention, OpenCL |
Abstract | Multi-head attention is a crucial component of the Vision Transformer architecture and plays a significant role in its overall processing. While multi-head attention offers a substantial degree of parallelism, it also demands considerable memory access. This paper proposes an FPGA-based deep-pipelined architecture that increases processing speed while reducing external memory access. According to the experimental results, the proposed accelerator is faster than a multicore CPU implementation. We also discuss the potential to increase the processing speed further. |
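For reference, standard scaled dot-product multi-head attention can be written in a few lines of NumPy. This is the textbook formulation, not the proposed FPGA pipeline; it shows why the operation parallelizes (each head is computed independently) and why it is memory-hungry (each head materializes an n × n score matrix for n tokens). The single d_model × d_model projection per Q/K/V/output is an assumed, common layout.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (n_tokens, d_model); each weight matrix: (d_model, d_model)."""
    n, d = x.shape
    assert d % num_heads == 0, "d_model must be divisible by num_heads"
    dh = d // num_heads

    def split(m):  # (n, d) -> (heads, n, dh): heads are independent, hence parallel
        return m.reshape(n, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)      # (heads, n, n): the memory-heavy part
    heads = softmax(scores) @ v                          # (heads, n, dh)
    return heads.transpose(1, 0, 2).reshape(n, d) @ w_o  # concatenate heads, then project


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(5, 8))
    weights = [rng.normal(size=(8, 8)) for _ in range(4)]
    print(multi_head_attention(tokens, *weights, num_heads=2).shape)  # (5, 8)
```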
Title | RLGC-Model-Based Film-Type Electromagnetic-Wave Absorber Design |
Author | *Sangyeop Lee (Tokyo Institute of Technology, Japan) |
Page | pp. 164 - 167 |
Keyword | Electromagnetic-wave absorber, EM simulation, RLGC, circuit simulation |
Abstract | A film-type electromagnetic-wave (EM-wave) absorber reduces cavity resonance caused by reflections when the module is implemented. It is also used to meet EMI/EMC (electromagnetic interference/electromagnetic compatibility) requirements. However, its design generally requires high-cost workstations or servers as well as high-cost EM simulation tools. In this work, we show a new design method based on circuit simulation with an RLGC model of dielectrics, which shortens the simulation time and could be implemented with free software such as Python in the future. |
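As background, the standard per-unit-length RLGC (telegrapher's) relations can be evaluated in plain Python, in the spirit of the free-software route mentioned above. This sketch only computes the textbook characteristic impedance Z0 = sqrt((R + jωL)/(G + jωC)) and propagation constant γ = sqrt((R + jωL)(G + jωC)); the paper's dielectric RLGC model and absorber structure are not reproduced, and the example values are illustrative.

```python
import cmath
import math


def rlgc_line(r: float, l: float, g: float, c: float, freq_hz: float) -> tuple[complex, complex]:
    """Return (Z0 in ohms, gamma in 1/m) for per-unit-length R [ohm/m],
    L [H/m], G [S/m], C [F/m] at frequency freq_hz."""
    omega = 2.0 * math.pi * freq_hz
    z = complex(r, omega * l)  # series impedance per metre
    y = complex(g, omega * c)  # shunt admittance per metre
    return cmath.sqrt(z / y), cmath.sqrt(z * y)


if __name__ == "__main__":
    # Lossless example: L = 250 nH/m and C = 100 pF/m give Z0 = 50 ohm.
    z0, gamma = rlgc_line(0.0, 250e-9, 0.0, 100e-12, 1e9)
    print(z0, gamma)  # gamma.real: attenuation [Np/m], gamma.imag: phase constant [rad/m]
```

For a lossy dielectric, G is often written as ωC·tanδ, so a dielectric's loss tangent enters the model directly through the shunt branch and shows up as a positive attenuation constant.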
Title | (Panel Discussion) Counting the Blessings of Long Lasting SASIMI: Retrospectives of Senior SASIMIers |
Author | Moderator: Ing-Jer Huang (National Sun Yat-sen University, Taiwan), Panelists: Nagisa Ishiura (Kwansei Gakuin University, Japan), Shin-ichi Minato (Kyoto University, Japan), Ing-Jer Huang (National Sun Yat-sen University, Taiwan), Yu-Guang Chen (National Central University, Taiwan), Organizer: Ing-Jer Huang (National Sun Yat-sen University, Taiwan) |
Page | p. 168 |
Keyword | SASIMI |
Abstract | SASIMI is now 36 years old! It has a long-lasting mission of encouraging young people with solid technical programs and strong supporting policies. Some researchers have kept coming back to the workshop, either by themselves or with colleagues and students, as participants or as committee members. Many of them have become experts in related fields. The workshop is exactly like its logo, a sailboat with a full harvest! Let's count the blessings of the workshop, see what it has done for young people, and wish a bright future for the young people and for the workshop! |
Tuesday, March 12, 2024 |
Title | (Keynote Speech) Big AI for Small Devices |
Author | Yiran Chen (Duke University, USA) |
Page | p. 169 |
Keyword | Model Compression, Edge Computing |
Abstract | As artificial intelligence (AI) transforms industries, state-of-the-art models have exploded in size and capability. However, deploying them on resource-constrained edge devices remains extremely challenging. Smartphones, wearables, and IoT sensors face tight limits on compute, memory, power, and communication. This gap between demanding AI models and edge hardware capabilities hinders onboard intelligence. In this talk, we will re-examine the techniques to bridge this gap and embed big AI on small devices. First, we will boost single-device efficiency via model compression. We will discuss how the properties of different hardware platforms impact the quantization and pruning strategies of deep neural network (DNN) models, benefiting actual system throughput and memory usage when considering the execution process of the models. Second, we will discuss the designs aimed at reducing the communication, computation, and storage overheads for distributed edge AI systems. We will also delve into the underlying design philosophies and their evolution toward efficient, scalable, robust, and secure edge computing systems. |
Title | Optimization of Pipeline Schedule for Hardware Efficient Two-Level Adiabatic Logic Circuits |
Author | *Yuya Ushioda, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan) |
Page | pp. 170 - 175 |
Keyword | adiabatic circuit, Boolean network, current-scaling, Integer Linear Programming, hardware minimization |
Abstract | The Two-Level Adiabatic Logic (2LAL) family has the potential to achieve ultra-low power consumption, but its major drawback is a large hardware cost due to the large number of buffers required for pipelining and "decomputation", which is inherent in adiabatic circuits. This paper proposes an Integer Linear Programming-based optimization of the pipeline stage assignment of original gate operations, "early-decompute" operations, "recompute" operations applied to early-decomputed signals, and decompute operations, with the goal of minimizing the number of buffers. Design simulations on small- to mid-size combinational benchmark circuits show that the hardware cost is reduced by up to 56% and by 22% on average compared with designs obtained from an optimized early-decompute schedule under a fixed gate operation schedule. |
Title | An Integer-Linear-Programming-Based Logic Locking Approach for Threshold Logic Gates |
Author | Yueh Cho, Ting-Yu Yeh, *Yu-Shan Lin, Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan) |
Page | pp. 176 - 180 |
Keyword | Logic locking, Threshold logic, SAT attack |
Abstract | Logic locking is an IC/IP protection technique that can prevent IC/IP piracy, overproduction, and hardware Trojans. Recently, threshold logic has regained attention from researchers owing to promising hardware realizations and its applications in machine learning. Although several electronic design automation techniques for threshold logic have been proposed, research specifically addressing logic locking for threshold logic is still lacking. Thus, in this paper, we propose an integer linear programming (ILP)-based method for locking a threshold logic gate (TLG) to resist the SAT attack, one of the most powerful attack techniques used to break logic locking. We present the characteristics that a locked TLG should have to resist the SAT attack, and formulate the computation of the locked function as an ILP problem. To handle larger TLGs efficiently, we further enhance the method with a heuristic and relax the ILP problem. Experimental results show that the proposed method can successfully lock TLGs with up to 8 inputs within acceptable execution time. |
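For readers unfamiliar with threshold logic, a TLG with weights w_i and threshold T outputs 1 exactly when Σ w_i x_i ≥ T. The sketch below only illustrates this gate semantics and truth-table enumeration, which any locking formulation has to reason about; it is not the paper's ILP formulation, and the example weights are illustrative.

```python
from itertools import product


def tlg(weights: tuple[int, ...], threshold: int, x: tuple[int, ...]) -> int:
    """Threshold logic gate: 1 iff the weighted sum of Boolean inputs reaches the threshold."""
    return int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)


def truth_table(weights: tuple[int, ...], threshold: int) -> dict[tuple[int, ...], int]:
    """Enumerate all 2**n input patterns of an n-input TLG."""
    return {x: tlg(weights, threshold, x) for x in product((0, 1), repeat=len(weights))}


if __name__ == "__main__":
    # Weights (2, 1, 1) with T = 3 realize x1 AND (x2 OR x3).
    for pattern, out in truth_table((2, 1, 1), 3).items():
        print(pattern, out)
```

Exhaustive enumeration grows as 2^n, which is trivial for gates of the size evaluated above (2^8 = 256 patterns) but motivates more scalable formulations for larger gates.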
Title | Native Code Level Test of Optimizing Performance of Android Compilers |
Author | Naoki Yoshida, *Toya Hamada, Nagisa Ishiura (Kwansei Gakuin University, Japan) |
Page | pp. 181 - 186 |
Keyword | Android, Compiler, ART, Performance test |
Abstract | In this paper, we introduce a technique for assessing the optimization performance of the Android DEX compiler at the native code level. The method is designed to detect missed optimizations in the native code generated by the Android runtime environment through random generation of Java programs. Optimization deficiencies are detected using both differential and equivalence methods. In the differential tests, we attempt to identify missed optimizations by comparing the native code produced by newer and older versions of DEX compilers. In the equivalence tests, we aim to identify missed optimizations by comparing the native code generated by a DEX compiler from both optimized and unoptimized source programs. The random Java programs are generated by a modified version of Orange4, which was originally developed for generating C programs. The test systems employing the proposed methods effectively identified insufficient optimization in x86_64 native code generated by the d8 DEX compiler. |
Title | Lightweight Monocular Depth Estimation Network Using Separable Convolution |
Author | *Kazuki Numata, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan) |
Page | pp. 187 - 192 |
Keyword | depth estimation, FPGA, separable convolution, CNN |
Abstract | In this paper, we propose a monocular depth estimation network that uses separable convolution to reduce computational complexity while minimizing the loss of accuracy, with the aim of implementing it on an FPGA (Field-Programmable Gate Array). Experimental results demonstrate that the network employing our proposed method maintains accuracy in two evaluation metrics, namely RMS and thresholded accuracy, compared with the conventional approach, while reducing the number of parameters by approximately 90%. Moreover, we confirmed that there is no significant degradation in subjective evaluation. |
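The parameter savings follow from simple counting, shown below for the common depthwise-separable factorization (a k × k depthwise filter per input channel followed by a 1 × 1 pointwise convolution, biases ignored). Whether the paper uses exactly this factorization is an assumption; the ratio works out to 1/C_out + 1/k², which for 3 × 3 kernels and wide layers is close to the roughly 90% reduction reported.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise mix (no bias)."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    std = standard_conv_params(3, 256, 256)
    sep = separable_conv_params(3, 256, 256)
    print(std, sep, f"{1 - sep / std:.1%}")  # 589824 67840 88.5%
```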
Title | Assessing the Impact of Signal Strength Variability on AI-based Heart Sound Analysis |
Author | *Kyoichi Oyama, Chao Geng, Shigetoshi Nakatake (The University of Kitakyushu, Japan) |
Page | pp. 193 - 194 |
Keyword | Machine Learning, CNN, BNN, Heart Sound Analysis |
Abstract | Recent advances in machine learning and deep learning are fostering the adoption of AI in healthcare, notably in heart sound analysis. However, inconsistencies in the signal strength of clinical heart sound data pose a challenge, potentially compromising data reliability. This work investigates the influence of these signal fluctuations on the accuracy of AI-based heart sound identification, aiming to highlight critical insights for improving the robustness of AI applications in cardiac diagnostics. |
Title | An Efficient Approach to Iterative Network Pruning |
Author | Chuan-Shun Huang, Wuqian Tang (National Tsing Hua University, Taiwan), *Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), Yi-Ting Li, Shih-Chieh Chang, Chun-Yao Wang (National Tsing Hua University, Taiwan) |
Page | pp. 195 - 200 |
Keyword | Network pruning, Model reduction |
Abstract | Network pruning is a technique for minimizing the number of parameters of large neural networks. Pruning can be performed once or multiple times. One-shot pruning easily reaches the required sparsity, but the resulting accuracy drop may be unacceptable for some goals. Iterative pruning, on the other hand, prunes and retrains the network repeatedly to maintain accuracy, but suffers from the long runtime of this repetitive procedure. In this work, we propose an efficient approach to network pruning that removes redundant training. Experimental results show that our approach reduces training time by 25% to almost 60% while achieving network accuracy comparable to the state of the art. |
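The conventional iterative procedure that this work speeds up can be sketched with global magnitude pruning on a flat weight list. This is a minimal sketch under assumptions: the geometric sparsity schedule, the magnitude criterion, and the `retrain` callback are illustrative choices, and the paper's removal of redundant training steps is not modeled.

```python
from typing import Callable


def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def iterative_prune(weights: list[float], target_sparsity: float, rounds: int,
                    retrain: Callable[[list[float]], list[float]]) -> list[float]:
    """Reach the target sparsity over several prune/retrain rounds, keeping the
    same fraction of surviving weights in each round (geometric schedule)."""
    keep_per_round = (1.0 - target_sparsity) ** (1.0 / rounds)
    w = list(weights)
    for r in range(1, rounds + 1):
        w = magnitude_prune(w, 1.0 - keep_per_round ** r)  # cumulative sparsity
        # Hypothetical fine-tuning step; a real one must keep pruned weights at zero.
        w = retrain(w)
    return w


if __name__ == "__main__":
    w = iterative_prune([float(i) for i in range(1, 101)], 0.9, rounds=3, retrain=lambda v: v)
    print(sum(1 for x in w if x == 0.0))  # 90
```

Each round pays for a full `retrain` call, which is why the total runtime of iterative pruning is dominated by training and why removing redundant training rounds is the natural place to save time.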
Title | Squaremax: A Hardware-Friendly Replacement for Softmax and Its Efficient VLSI Design and Implementation |
Author | *Meng-Hsun Hsieh, Xuan-Hong Li, Yu-Hsiang Huang, Pei-Hsuan Kuo, Juinn-Dar Huang (National Yang Ming Chiao Tung University, Taiwan) |
Page | pp. 201 - 205 |
Keyword | hardware-friendly activation function design, Softmax, efficient VLSI implementation |
Abstract | The Softmax function plays an essential role in most machine learning algorithms. A conventional realization of Softmax requires computationally intensive exponential operations and divisions, which poses formidable challenges for low-cost hardware implementations. This paper presents a promising hardware-friendly alternative, Squaremax, which eliminates complex exponential operations. Its definition is extremely simple and can thus be efficiently implemented in both software and hardware. Experimental results show that Squaremax consistently attains comparable or superior accuracy on several popular models. This paper also proposes an efficient hardware design of Squaremax. It requires no functional units for exponential or logarithmic operations and is even lookup-table (LUT) free. Moreover, it achieves remarkable area and power efficiency. Therefore, Squaremax is a very promising hardware-friendly alternative to the complex Softmax in both software and hardware, and the proposed LUT-free hardware design achieves notable improvements in speed, area, and power. |
Title | An Approximate Fault-Tolerance Mechanism for SRAM-Based Near-Memory MAC Units |
Author | Yung-Chieh Lin, *Shih-Hsu Huang (Chung Yuan Christian University, Taiwan) |
Page | pp. 206 - 211 |
Keyword | Reliability, Digital Design, Error Tolerance, Multiplier Design, Computing-in-memories |
Abstract | To resolve the von Neumann bottleneck, computing-in-memory is a growing trend. In recent years, many studies have explored SRAM-based near-memory MAC design. However, previous work mainly focuses on the MAC function without discussing fault-tolerance and approximation techniques. To prevent chips from becoming unusable due to a few faulty memory cells, in this paper we propose incorporating an approximate fault-tolerance mechanism into near-memory MAC circuits. If a memory cell responsible for computing the MSB fails, we reassign the memory cell originally responsible for computing the LSB to compute the MSB instead, reducing computation errors. Experimental results demonstrate that our approximate fault-tolerance mechanism requires only a small additional area, yet significantly reduces errors caused by memory cell faults. |
Title | Expanding Tail Layer Training Scope on FPGA with Data Augmentation |
Author | *Yuki Takashima, Akira Jinguji, Ryota Kayanoma (Tokyo Institute of Technology, Japan), Hiroki Nakahara (Tohoku University, Japan) |
Page | pp. 212 - 217 |
Keyword | FPGA, CNN, Image Classification, Tail Layer Training |
Abstract | The demand for deep learning has increased, and many accelerators have been proposed. Although they perform inference at high speed, many of them have problems with training. We present "tail layer training" for convolutional neural networks (CNNs). In this method, only the tail layer of the model is trained. Since the number of output neurons must equal the number of classes in image classification, retraining the tail layer is an effective way to change the number of classes. The accuracy loss is negligible when only the tail layer is trained with two added categories in CIFAR10. Even on large datasets such as ImageNet, underfitting of the additional classes can be avoided by using data augmentation. Since only the tail layer is trained, fast computation using a CNN accelerator is possible, so lightweight learning on FPGAs is achieved. Our scheme can be applied to all existing SoC-FPGA-based CNN accelerators. |
Title | Broadband 5G Millimeter-Wave Low Noise Amplifier (LNA) Design in 22 nm FD-SOI CMOS and 40 nm GaN HEMT |
Author | *Clint Sweeney, Liang-Wei Ouyang, Yu-Chun Donald Lie (Texas Tech University, USA) |
Page | pp. 218 - 220 |
Keyword | millimeter-wave, 5G, Low Noise Amplifier (LNA), FD-SOI CMOS, GaN HEMT |
Abstract | We present the design of two broadband millimeter-wave (mm-Wave) low noise amplifiers (LNAs) that cover the key 5G FR2 band in advanced semiconductor technologies. One LNA is designed in a 22 nm fully-depleted silicon-on-insulator (FD-SOI) CMOS process and the other in a 40 nm GaN high-electron-mobility transistor (HEMT) process. Several post-layout parasitic extraction (PEX) options are compared with EM (electromagnetic) simulations for the CMOS LNA design, while only EM PEX simulations are used for the GaN LNA design. The simulation data suggest that both broadband LNAs are very competitive with state-of-the-art designs in the literature. For example, the CMOS and GaN LNAs achieve 3-dB bandwidths (BW) of 16.9–41.8 GHz and 19.8–43.1 GHz and noise figures (NF) of 2.9–4.1 dB and 1.9–2.4 dB, respectively. Using the figure-of-merit FOM = (OIP3*G*BW)/((F-1)*Size*P_DC), which accounts for linearity, power, NF, BW, and size, both LNAs achieve among the best reported FOMs in the literature. |
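Written out, the FOM quoted above is the expression below. As an assumption consistent with common LNA practice, F is taken to be the linear noise factor derived from the noise figure NF in dB; the units used for OIP3, G, size, and P_DC are not stated in the abstract.

```latex
\mathrm{FOM} = \frac{\mathrm{OIP3}\cdot G\cdot \mathrm{BW}}{(F-1)\cdot \mathrm{Size}\cdot P_{\mathrm{DC}}},
\qquad F = 10^{\mathrm{NF}/10}
```

For example, NF = 2 dB gives F ≈ 1.585 and F − 1 ≈ 0.585, whereas NF = 4 dB gives F ≈ 2.512 and F − 1 ≈ 1.512, so, all else being equal, the 2-dB design's FOM is about 2.6 times higher.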
Title | Experimental Study of Pass/Fail Threshold Determination Based on Gaussian Process Regression |
Author | *Daisuke Goeda (Kyoto Institute of Technology, Japan), Tomoki Nakamura, Masuo Kajiyama, Makoto Eiki (Sony Semiconductor Manufacturing Corporation, Japan), Takashi Sato (Department of Graduate School of Informatics, Kyoto University, Japan), Michihiro Shintani (Kyoto Institute of Technology, Japan) |
Page | pp. 221 - 226 |
Keyword | Wafer-level spatial characteristic modeling, Gaussian process regression, Outlier detection, LSI testing |
Abstract | As large-scale integrated circuits (LSIs) grow in size and complexity, improving LSI test quality without increasing test costs becomes challenging. LSIs manufactured with advanced technologies exhibit significant variation in their characteristics. This variation makes it difficult to determine a pass/fail threshold that distinguishes good chips from bad ones, so yield loss and test escape ratios are increasing. In particular, automotive semiconductors must comply with test standards set by the Automotive Electronics Council (AEC), which results in greater yield loss and test escape than carefully designed thresholds would. To address this issue, a method that uses Gaussian process regression to determine the pass/fail threshold has been proposed for power MOSFETs. This paper applies this approach to industrial LSI test data and confirms that it is as effective for LSIs as for power MOSFETs. Compared with conventional methods compliant with the AEC standard, the method reduces yield loss and missed failures by 0.019% and 35.5%, respectively. |
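A die-specific pass band from Gaussian process regression can be sketched as follows. The assumptions here are not from the paper: a zero-mean GP with a squared-exponential kernel over die coordinates (one coordinate in the example for brevity), fixed hyperparameters instead of fitted ones, reference dies used for training, and a band of mean ± k·std with k = 4.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float, variance: float) -> np.ndarray:
    """Squared-exponential kernel between the rows of a and b."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def gp_predict(x_train, y_train, x_test, length_scale=3.0, variance=1.0, noise=1e-4):
    """Posterior mean and predictive std (including measurement noise) of a zero-mean GP."""
    k_tt = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, length_scale, variance)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_st.T)
    mean = k_st @ alpha
    var = variance - np.sum(v**2, axis=0) + noise
    return mean, np.sqrt(np.maximum(var, 0.0))


def fails(measured, mean, std, k=4.0):
    """A die fails when its measurement lies outside its own band mean +/- k*std."""
    return np.abs(measured - mean) > k * std


if __name__ == "__main__":
    x_ref = np.arange(10.0)[:, None]          # reference die positions (hypothetical)
    y_ref = np.sin(x_ref[:, 0] / 3.0)         # smooth spatial trend (hypothetical)
    x_new = np.array([[2.5], [4.5], [6.5]])
    measured = np.sin(x_new[:, 0] / 3.0) + np.array([0.0, 0.5, 0.0])  # middle die is an outlier
    mu, sd = gp_predict(x_ref, y_ref, x_new)
    print(fails(measured, mu, sd))
```

Unlike a single global limit, the band follows the spatial trend of the wafer, so a die that is normal for its neighborhood passes even if its absolute value is unusual, while a die that departs from its neighborhood is flagged.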
Title | Energy Reduction of Health Monitoring Processor by Optimizing Supply and Back-Gate Voltages with Simulated Annealing |
Author | *Seria Kasai, Yamato Ishida, Fumiya Sano, Tomoya Akasaka (Hirosaki University, Japan), Masami Fukushima, Koichi Kitagishi, Seijin Nakayama (UNO Laboratories, Ltd., Japan), Hideki Ishihara (AQUAXIS TECHNOLOGY, Japan), Masashi Imai, Atsushi Kurokawa, Toshiki Kanamoto (Hirosaki University, Japan) |
Page | pp. 227 - 232 |
Keyword | processor, low-power, voltage optimization, simulated annealing, health monitoring |
Abstract | This paper proposes a method to reduce the energy consumption of health monitoring processors by optimizing the supply and back-gate bias voltages using simulated annealing. Health monitoring processors are typically embedded in wearable devices, such as pulse oximeters, which need to operate continuously on a limited power supply. Our previous work proposed an energy-efficient health monitoring processor that achieves single-stage operation by reading data from data memory asynchronously. In this paper, we further reduce the energy consumption of that processor. We first formulate the relationship between the total energy consumption and the combination of supply and back-gate voltages using machine learning. Using the obtained response surface, we then apply simulated annealing to find the voltages that minimize energy consumption within the operating range. Experimental results show that the proposed optimization method reduces the energy consumption of the processor by 83%. |
Title | CMOS Bandgap Voltage Reference with Calibration Circuit for Process Variation |
Author | *Ryuji Hayashi, Masayoshi Tachibana (Kochi University of Technology, Japan) |
Page | pp. 233 - 237 |
Keyword | Analog circuit, LSI, Bandgap Reference |
Abstract | In this paper, a Band-Gap Reference (BGR) circuit with a calibration circuit is designed to suppress manufacturing variations. The calibration circuit makes it possible to adjust the output voltage at low cost without the need for trimming. With the calibration circuit, the output voltage variation of the BGR at a supply voltage of 1.8 V is suppressed to less than 7%. The circuit is designed in a 0.18 µm process and occupies an area of 229.5 µm × 283.36 µm. |
Title | IR drop Prediction Based on Machine Learning and Pattern Reduction |
Author | Yong-Fong Chang (National Tsing Hua University, Taiwan), Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), *Yu-Chen Cheng (National Tsing Hua University, Taiwan), Shu-Hong Lin, Che-Hsu Lin (National Taiwan University of Science and Technology, Taiwan), Chun-Yuan Chen, Yu-Hsuan Chen, Yu-Che Lee (National Tsing Hua University, Taiwan), Jia-Wei Lin, Hsun-Wei Pao (MediaTek Inc., Taiwan), Shih-Chieh Chang, Yi-Ting Li, Chun-Yao Wang (National Tsing Hua University, Taiwan) |
Page | pp. 238 - 243 |
Keyword | IR drop, machine learning |
Abstract | With advances in semiconductor technology, transistors are getting smaller, which has increased the impact of IR drop and made IR drop analysis an important part of the chip design process. However, IR drop analysis consumes significant time and resources, and each engineering change order (ECO) step requires reanalysis. In this paper, we propose a machine learning-based method to predict IR drop and introduce an algorithm for reducing input test patterns, significantly reducing the time and resources required for IR drop analysis in ECO flows. |
Title | Evaluation of FPGA Performance in a Cryogenic Environment |
Author | *Akimasa Saito, Masashi Imai (Hirosaki University, Japan) |
Page | pp. 244 - 249 |
Keyword | FPGA, superconducting quantum computer, cryogenic, control electronics |
Abstract | Quantum computers are a new type of computer that uses quantum mechanics to perform massively parallel computations. The superconducting quantum computer is the most promising approach for practical use because it is easy to control and offers high integration. Using a quantum computer requires a classical computer as its controller. We are studying how FPGAs perform in cryogenic environments to determine whether they can be used to control a superconducting quantum computer. First, we test FPGAs made by Xilinx and Altera to see how they operate at temperatures as low as -150°C. Then, we evaluate the performance of FPGAs in cryogenic environments by measuring the oscillation frequency of ring oscillators, power consumption, phase-locked loops (PLLs), and macro-CPUs. |
Title | Rad-Hard Flip-Flop Design for Automotive Electronics with Temperature-Tolerance |
Author | Ralf E.-H. Yee, *Lowry P.-T. Wang, Yen-Ju Su, Charles H.-P. Wen, Herming Chiueh (National Yang Ming Chiao Tung University, Taiwan) |
Page | pp. 250 - 253 |
Keyword | soft error, single event transient, single event upset, flip-flop |
Abstract | Many existing soft-error-tolerant flip-flop designs (e.g., MDAD-FF, SETU-TOFF, SEDR-FF) apply delayed latching to mitigate strikes of radiation particles. However, according to AEC-Q100 (Grade 1), automotive electronics may operate at temperatures from -40°C to 125°C, which raises two reliability issues: (1) protection failure and (2) timing degradation. At -40°C, these rad-hard FF designs provide a worst-case delay of only 113 ps, which is ineffective in protecting against 77-LET particles (which require 200 ps in a 45nm process). At 125°C, however, the delay of these FF designs may grow to 386 ps, resulting in more timing violations. Therefore, RAV-FF is proposed to address these two issues by incorporating a MOSFET capacitance (MCAP) to generate sufficient clock delay and a current-control transistor (CC) to stabilize the delay across temperature corners. Experimental results indicate that RAV-FF provides effective soft-error protection from -40°C to 125°C by ensuring a delay of at least 200 ps with only 3% variation. |
Title | Development of Tsugaru Dialect Dictionary Management System |
Author | *Ryota Sato, Masashi Imai (Hirosaki University, Japan) |
Page | pp. 254 - 259 |
Keyword | Tsugaru dialect, Dictionary management system, Translation AI, Example sentence generation system |
Abstract | Tsugaru-ben is a unique dialect spoken in the Tsugaru region of Aomori Prefecture, Japan. Old Tsugaru words have recently become less commonly used, because young people in the Tsugaru region understand them but do not use them themselves, and the old Tsugaru culture is disappearing as a result. We are developing a Tsugaru dialect dictionary management system to preserve these words and make use of them in the future. The system contains a Tsugaru-ben dictionary database and related databases, and maintains the stored data. An example sentence generation system for translation artificial intelligence, based on these databases, is also developed. |
Title | (Invited Talk) Design Automation for Quantum Computing: How to (Not) Re-invent the Wheel for an Emerging Technology |
Author | Robert Wille (Technical University of Munich, Germany) |
Page | p. 260 |
Keyword | Quantum Computing, Design Automation |
Abstract | Quantum computers are one of the most promising new technologies currently under investigation. With physical realizations already available to a broader audience and several potential applications on the horizon, this raises the question of how to efficiently design corresponding quantum computing solutions. Can we re-use established methods from the design automation of classical systems? Or do we have to start from scratch for quantum computing? This talk aims to provide answers to these questions. We try to make the point that we do not have to re-invent the wheel, but that a 1:1 re-use of classical design methods also won’t do the trick. The corresponding discussions are exemplified using design automation solutions and software tools from the Munich Quantum Toolkit (MQT). For more details, please see https://www.cda.cit.tum.de/research/quantum/. |
Title | Voltage Dependence Model of Electromagnetic Side-Channel Attacks on Cryptographic Circuits |
Author | *Kazuki Minamiguchi, Yoshihiro Midoh, Noriyuki Miura, Jun Shiomi (Osaka University, Japan) |
Page | pp. 261 - 266 |
Keyword | side-channel attack, electromagnetic leakage, hardware security, voltage scaling |
Abstract | This paper proposes a voltage-scaled model of tamper resistance to side-channel attacks using ElectroMagnetic (EM) waves, particularly on cryptographic circuits such as the Advanced Encryption Standard (AES). Side-channel attacks are regarded as a potential threat that can reveal secret information processed by cryptographic circuits. This paper first derives a model of the voltage dependence of the EM leakage strength of voltage-scaled circuits. We then employ a statistical test to evaluate the tamper resistance against EM side-channel attacks. In particular, this paper models the trade-off between the supply voltage of AES circuits and the number of traces for which attackers cannot reveal the data. The proposed EM strength model is validated with a transistor-level circuit simulator using a 180-nm process technology. Furthermore, this paper validates the proposed models using silicon measurements of an AES circuit fabricated in a 180-nm process technology. |
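The abstract does not name its statistical test. A common choice in side-channel evaluation is Welch's t-test between two groups of traces (for example, fixed versus random plaintext), with |t| > 4.5 treated as detectable leakage; the sketch below assumes that convention and computes the statistic at a single sample point only, not the paper's voltage-dependence model.

```python
import math
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch's t statistic for two groups of trace samples at one time point."""
    se2 = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / math.sqrt(se2)


def leaks(group_a: list[float], group_b: list[float], threshold: float = 4.5) -> bool:
    """Flag leakage when |t| exceeds the (conventional, assumed) 4.5 threshold."""
    return abs(welch_t(group_a, group_b)) > threshold


if __name__ == "__main__":
    a = [0.0, 0.1, -0.1, 0.05, -0.05] * 4
    print(welch_t(a, [x + 1.0 for x in a]), leaks(a, a))
```

For a fixed mean difference and noise level, t grows roughly with the square root of the number of traces, so a weaker leakage signal needs more traces before |t| crosses the threshold; this is the kind of trace-count trade-off the abstract describes.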
Title | Efficient Yield Analysis for SRAM-Based System with PDF Consolidation Methodology |
Author | *Shih-Han Chang, Ling-Yen Song, Yen-Chen Chun, Yu-Cheng Tsai, Chien-Nan Liu (Institute of Electronics, National Yang Ming Chiao Tung University, Taiwan) |
Page | pp. 267 - 270 |
Keyword | SRAM yield analysis, Process variation, Monte Carlo (MC) analysis, peripheral circuits |
Abstract | SRAM-based systems are among the most popular designs in various applications. However, the simulation cost of yield estimation is often very high because of the high yield requirements of SRAM circuits. Importance sampling techniques can reduce the number of samples in high-sigma analysis. However, the complexity is still high if the entire memory system, including peripheral circuits, is simulated together. To handle this issue, we propose an efficient yield analysis method for the overall SRAM system. Instead of analyzing the whole system directly, the proposed methodology evaluates each circuit block first. The interactions between circuit blocks are then considered to evaluate the system performance accurately using the prior distribution of each block. In this way, an accurate overall yield estimate can be obtained easily. Experimental results demonstrate that the proposed methodology efficiently estimates the yield of SRAM-based designs with high accuracy, especially for rare events. |
PDF file |
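The abstract builds on importance sampling for high-sigma analysis. As a minimal sketch of that underlying technique only (not the paper's block-wise methodology), the code below assumes a single failure metric that is standard-normal and estimates a 4-sigma failure probability by sampling from a mean-shifted proposal and reweighting each failing sample by the likelihood ratio.

```python
import math
import random


def exact_tail(threshold):
    """Exact P(X > threshold) for X ~ N(0, 1), used only as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def tail_prob_is(threshold, shift, n, seed=0):
    """Mean-shift importance sampling estimate of P(X > threshold), X ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)  # draw from the proposal N(shift, 1)
        if x > threshold:  # the rare "failure" event
            # likelihood ratio p(x) / q(x) = exp(-shift * x + shift^2 / 2)
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / n
```

With the proposal centered on the failure boundary (`shift = threshold`), about half of the samples land in the failure region, so 20,000 samples give a relative error of a few percent for a probability near 3e-5, where plain Monte Carlo would see fewer than one failure on average.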
Title | Ramanujan Edge-Popup: Finding Strong Lottery Tickets with Ramanujan Graph Properties for Efficient DNN Inference Execution |
Author | *Hikari Otsuka, Yasuyuki Okoshi, Ángel López García-Arias, Kazushi Kawamura, Thiem Van Chu, Masato Motomura (Tokyo Institute of Technology, Japan) |
Page | pp. 271 - 274 |
Keyword | DNN Inference Accelerator, Deep Learning, Strong Lottery Tickets, Graph Theory, Ramanujan Graph |
Abstract | Because Strong Lottery Tickets (SLTs) can build highly accurate neural networks from random weights and binary masks, specialized SLT hardware enables efficient inference. Its performance, however, depends on the SLT used, so an accurate SLT is needed. We propose Ramanujan Edge-Popup, which explores SLTs through the lens of spectral graph theory and obtains sparse and accurate SLTs. Experiments with VGG-11 on CIFAR-10 show that Ramanujan Edge-Popup achieves 5.78% better accuracy than Edge-Popup at 97.02% sparsity. |
PDF file |
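For context, the baseline Edge-Popup keeps the random weights frozen and learns a score per weight; the forward pass uses only the weights whose scores have the top-k absolute values. The sketch below shows only that masking step for one neuron, with hypothetical helper names; the Ramanujan-graph criterion that the paper adds on top of it is not modeled here.

```python
def edge_popup_mask(scores, keep_ratio):
    """Binary mask keeping the top keep_ratio fraction of weights by |score|."""
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    kept = set(ranked[:k])
    return [1 if i in kept else 0 for i in range(len(scores))]


def masked_neuron(weights, mask, inputs):
    """Pre-activation of one neuron: the random weights are never trained, only masked."""
    return sum(w * m * x for w, m, x in zip(weights, mask, inputs))
```

For example, scores `[0.1, -0.9, 0.5, 0.3]` with `keep_ratio=0.5` give the mask `[0, 1, 1, 0]`. Sparsity is `1 - keep_ratio`, so the 97.02% sparsity in the abstract corresponds to keeping about 3% of the edges; the kept edges between two layers form the bipartite graph whose spectral properties the paper examines.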
Title | A Design Strategy for Processing-in-Memory Accelerators Using Cell-based DRAM |
Author | *Tai-Feng Chen, Yutaka Masuda, Tohru Ishihara (Nagoya University, Japan) |
Page | pp. 275 - 280 |
Keyword | Processing-in-Memory, Cell-based DRAM, AI Accelerator |
Abstract | Processing-in-Memory (PiM) is one of the most promising design solutions for applications, such as edge AI accelerators, that require both high performance and high energy efficiency. Since AI algorithms and their accelerator architectures are evolving rapidly, the biggest challenge in edge AI design is to synthesize high-performance, highly energy-efficient AI accelerators rapidly. To address this challenge, this paper proposes a strategy for synthesizing PiM accelerators using cell-based synthesizable DRAM. A gain cell structure is used as the bit cell of the synthesizable DRAM. The paper presents design results for the cell-based synthesizable DRAM and for an accelerator placed and routed in a mixed manner. |
Title | Model Reduction Using a Hybrid Approach of Genetic Algorithm and Rule-based Method |
Author | Wuqian Tang, Chuan-Shun Huang (National Tsing Hua University, Taiwan), Yung-Chih Chen (National Taiwan University of Science and Technology, Taiwan), *Yi-Ting Li, Shih-Chieh Chang, Chun-Yao Wang (National Tsing Hua University, Taiwan) |
Page | pp. 281 - 286 |
Keyword | Model reduction, Genetic algorithm |
Abstract | Model reduction is a technique that reduces the computational resources required to run a model (neural network) by pruning parameters or structures in the model. Most model reduction algorithms achieve model reduction while preserving accuracy through multiple iterations of a pruning-retraining process. However, this retraining is quite time-consuming, making model size reduction particularly inefficient when a model has tens of millions of parameters or more. In this paper, we propose a hybrid approach that combines a genetic algorithm (GA) with a rule-based method. The rule-based method greatly reduces the GA's search space and thus significantly reduces the time needed to find a well-performing model. With a very limited number of retraining epochs (<10), the accuracy and pruning ratio (sparsity) of the reduced model can catch up with state-of-the-art results. We conduct experiments on a gesture recognition model with over 30 million parameters. For this model, our approach prunes 74.6% of the parameters with a 3.8% accuracy drop without retraining. With only three epochs of retraining, it prunes 93.1% of the parameters without any accuracy drop. |
PDF file |
Title | An Efficient Routing Method for Micro-Electrode-Dot-Array Digital Microfluidic Biochips Considering Droplet Division and Velocity |
Author | *Chuan Lin, Debraj Kundu, Shigeru Yamashita, Hiroyuki Tomiyama (Ritsumeikan University, Japan) |
Page | pp. 287 - 292 |
Keyword | biochip, DMFB, MEDA, routing, algorithm |
Abstract | A biochip is a device that enables biochemical experiments, which are typically performed manually, to be conducted on a small chip by manipulating reagents in small liquid volumes. One such biochip is the micro-electrode-dot-array (MEDA) biochip. On MEDA, droplets can change their shape, and their speed varies accordingly. In this paper, we leverage this characteristic to propose a method that circumvents blockages on the chip and solves the droplet routing problem. |
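As a point of reference for the routing problem, the simplest formulation treats a droplet as a single cell moving one electrode per step and finds a shortest path around blocked electrodes with breadth-first search. This is a generic baseline sketch with a hypothetical grid encoding (`#` marks a blocked electrode); it deliberately ignores the droplet shape changes, division, and velocity that the paper exploits.

```python
from collections import deque


def route_droplet(grid, start, goal):
    """Shortest 4-neighbour path from start to goal avoiding '#' cells, or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}  # visited cells mapped to their BFS predecessor
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk predecessors back to the start
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and (nr, nc) not in prev:
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # the goal is cut off by blockages
```

A blockage that completely separates the start from the goal makes this baseline return `None`; methods that let the droplet reshape or divide can sometimes get past such narrow obstacles.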
Title | A Study on an Interface Circuit for Burst Transfers from Synchronous to Asynchronous Circuits Considering Cycle Times |
Author | *Shogo Semba, Hiroshi Saito (The University of Aizu, Japan) |
Page | pp. 293 - 298 |
Keyword | interface circuits, asynchronous circuits, burst transfers |
Abstract | In this paper, we propose an interface circuit for burst transfers from synchronous to asynchronous circuits. The proposed interface circuit realizes a burst transfer in a single handshake cycle. To do so, we determine the number of registers from the burst length and the difference between the cycle times of the synchronous and asynchronous circuits. In experiments, we compared the energy consumption of the proposed interface circuit with that of a FIFO-based interface circuit. The proposed interface circuit reduced energy consumption by at least 9.7%. |
PDF file |
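The abstract sizes the register bank from the burst length and the cycle-time difference but does not give the formula. A familiar back-of-envelope estimate, borrowed from FIFO depth calculation and not taken from the paper, counts how many words the slower asynchronous side can drain while the burst is being written; the rest must be held in registers. The sketch assumes integer cycle times in a common unit and that draining starts as soon as the burst begins.

```python
def burst_buffer_depth(burst_length, t_sync, t_async):
    """Registers needed so no word of a burst is lost (back-of-envelope estimate).

    burst_length: words written back to back by the synchronous side
    t_sync: synchronous clock period; t_async: asynchronous cycle time (same unit)
    """
    burst_time = burst_length * t_sync
    drained = burst_time // t_async  # words the asynchronous side consumes during the burst
    return max(1, burst_length - drained)  # at least one register for the handshake
```

For an 8-word burst at a 10-unit clock and a 20-unit asynchronous cycle, the receiver drains 4 words during the burst, so 4 registers are needed; if the asynchronous side is at least as fast as the clock, one register suffices.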
Title | Reduction of Static Power Consumption of LSI by Decreasing Leakage Current Paths with Equivalent Logic Expression Conversion |
Author | *Kazuma Dobata, Kazuhito Ito (Saitama University, Japan) |
Page | pp. 299 - 304 |
Keyword | LSI, static power, leakage current, CMOS |
Abstract | Reducing the static power consumption of large-scale integrated circuits (LSIs) has become an important issue. The main cause of static power consumption in CMOS circuits is leakage current flowing through off-state MOS transistors. In this paper, we propose a method to reduce the static power consumption of CMOS LSIs by converting a given logic expression into an equivalent one with fewer leakage current paths, thereby stacking MOS transistors to decrease the leakage current. |
PDF file |
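Any such conversion must preserve the Boolean function exactly. For gates with only a few inputs, equivalence can be checked exhaustively over the truth table, as in the sketch below (the helper names are illustrative). The example pair, `ab + ac` and `a(b + c)`, is a textbook factoring: in a CMOS complex gate it changes the pull-down network from two parallel two-transistor stacks (4 nMOS) to one transistor in series with a parallel pair (3 nMOS). The paper's actual conversion rules are not shown here.

```python
from itertools import product


def equivalent(f, g, n_inputs):
    """Exhaustively compare two Boolean functions over all 2**n_inputs assignments."""
    return all(f(*bits) == g(*bits) for bits in product((0, 1), repeat=n_inputs))


def sop(a, b, c):
    """Sum-of-products form: ab + ac."""
    return (a & b) | (a & c)


def factored(a, b, c):
    """Factored form of the same function: a(b + c)."""
    return a & (b | c)
```

Exhaustive checking grows as 2^n, so it suits cell-level expressions; larger blocks would need a SAT- or BDD-based equivalence checker.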
Title | Template Design and Layout Decomposition for Lamellar DSA with Donut-Shaped Templates |
Author | *Yun-Na Tsai, Shao-Yun Fang (National Taiwan University of Science and Technology, Taiwan) |
Page | pp. 305 - 310 |
Keyword | Lamellar DSA, Template design, layout decomposition |
Abstract | Directed self-assembly (DSA) using block copolymers (BCP) has become a very promising technique for fabricating via layers in integrated circuits as feature sizes shrink dramatically and circuit complexity increases. Since cylindrical DSA suffers from a fixed via pitch within a single template and large displacement errors due to process variation, lamellar DSA combined with the self-aligned via (SAV) process has become an alternative that may offer better manufacturability. Many studies have investigated design methodologies for via/contact layer fabrication with cylindrical block copolymers, but only one existing work addresses the guiding template design problem for lamellar DSA. However, that work considers only one-dimensional lamellar guiding templates, whereas adopting two-dimensional templates can resolve more template conflicts. This paper presents the first work on multi-row guiding template design for lamellar DSA with SAV technology and multiple patterning lithography (MPL). To tackle the problem, we enumerate all guiding template shapes for a given via layout, considering the design constraints of the target process and the design flexibility offered by dummy vias. We then propose two methods: the first is based on integer linear programming (ILP), and the second is a heuristic. Experimental results show that solving the ILP formulation yields optimal solutions, and that the heuristic method obtains near-optimal solutions with much less runtime. |
Title | On Effective Usage of APR Tools for Display Driver IC Layout Generation |
Author | Kai-Liang Liang, Li-Yu Lin, *Hung-Ming Chen (NYCU, Taiwan) |
Page | pp. 311 - 316 |
Keyword | APR, DDIC, DRV |
Abstract | Current APR (automatic place-and-route) tools are very mature for digital and mixed-signal ICs; however, some special-purpose ICs still struggle to obtain adequate support from vendors. This study focuses on the display driver IC (DDIC), whose extreme aspect ratio causes severe congestion. We propose a series of treatments to resolve the routing resource shortage in a modern APR methodology. To identify congestion, we use the local crowding level of global routing congestion to predict the locations of detailed routing violations (DRVs). We apply customized techniques, including applying blockages, cell inflation, and module adjustment, to achieve these goals. Experiments show that the proposed methods help identify congested regions and effectively decrease the number of DRVs. Moreover, these methods can be integrated into the existing APR flow of a leading-edge design house, providing a comprehensive study of routability improvement that reduces the iterations of placement and detailed routing. |
PDF file |
Title | Polygon Fracturing Method Considering Maximum Size Limit |
Author | *Taiki Matsuzaki, Kunihiro Fujiyoshi, Tomohiko Hotta (Tokyo University of Agriculture and Technology, Japan) |
Page | pp. 317 - 322 |
Keyword | Rectilinear Polygon, Fracture, Optimal Single-Partition, Variable Shaped-Beam Mask Writing |
Abstract | Variable shaped-beam electron beam lithography systems are widely used for mask writing. The exposure data, which is the input for variable shaped-beam mask writing, must be a set of rectangles that each satisfy a maximum size limit. It is also crucial to fracture the layout into as few rectangles as possible to reduce the number of beam exposures. Although several methods have been proposed, no existing method obtains an optimal solution in practical computation time under a maximum size limit. In this paper, we propose four types of Optimal Single-Partitions and prove that these partitions never lose the optimal solutions. We performed computational experiments to evaluate the performance of methods based on these Optimal Single-Partitions. Our methods yield better solutions than the previous method and are faster than ILP in many cases. |
PDF file |
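To make the maximum size limit concrete, the sketch below takes one rectangle that is already part of a fracturing and cuts it on a grid into shots whose sides do not exceed the limit. It is a simple illustration under that assumption, with a hypothetical helper name; it is neither the paper's Optimal Single-Partition method nor claimed to minimize the shot count.

```python
import math


def split_rectangle(x, y, w, h, max_size):
    """Cut a w x h rectangle at (x, y) into shots with both sides <= max_size."""
    nx, ny = math.ceil(w / max_size), math.ceil(h / max_size)
    shots = []
    for i in range(nx):
        for j in range(ny):
            x0, y0 = x + i * max_size, y + j * max_size
            # the last column/row takes whatever width/height remains
            shots.append((x0, y0, min(max_size, x + w - x0), min(max_size, y + h - y0)))
    return shots
```

For example, `split_rectangle(0, 0, 10, 4, 4)` yields three shots, the last being `(8, 0, 2, 4)`. Because the limit applies to every rectangle, where the polygon is cut in the first place changes how many extra shots the limit forces, which is why the size limit is treated inside the fracturing problem rather than as a post-processing step.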
Title | Pin Access-Aware Power Distribution Network Optimization in 7nm Technology |
Author | Wei-Shou Wu, *Rung-Bin Lin (Yuan Ze University, Taiwan) |
Page | pp. 323 - 327 |
Keyword | Power distribution network, pin access, power stripe, IR drop, VLSI |
Abstract | In this paper, we propose reallocating power stripe resources to reduce the maximum IR drop on a chip while minimizing the impact on pin accessibility. Experimental results show that, without using extra power stripe resources, our method reduces the maximum IR drop by 7.3%~8.5% and incurs 28%~43% fewer DRC violations, with no perceivable increase in total wire length or via count, at the expense of at most a 3% increase in the worst negative slack. |
Title | Anomaly Classification with Anomaly-Focused Patch Selection by Gaussian Distribution |
Author | *Yuga Ono, Lin Meng (Ritsumeikan University, Japan) |
Page | pp. 328 - 332 |
Keyword | Deep Learning, Image recognition, Anomaly detection |
Abstract | Numerous methods for detecting and localizing anomalies have been proposed, and many have achieved great success. In practical applications, various factors can lead to different types of anomalies, each of which requires specific treatment. Therefore, it is crucial not only to detect anomalies but also to identify their specific types. This paper builds upon an existing anomaly detection method, using it as a foundational model, and extends its application to anomaly classification. We present a novel approach for efficiently and accurately identifying the type of anomaly by leveraging the patch embeddings and anomaly scores obtained during the initial anomaly detection stage. We introduce Anomaly-Focused Patch Selection (AFPS), a mechanism that selects more meaningful patches for training a classification model. AFPS achieves higher classification accuracy than the base method, which simply uses random patch selection, while using 75% fewer patches. |
PDF file |
Title | Architecture of an FPGA-Based Brain Neural Network Simulator Using Direct Mapping |
Author | Hasitha Muthumala Waidyasooriya, *Mizuki Harasawa, Masanori Hariyama (Tohoku University, Japan) |
Page | pp. 333 - 334 |
Keyword | FPGA, Brain Neural Network, Accelerator, OpenCL |
Abstract | Simulating brain neural networks is crucial for gaining insights into how the brain functions and for advancing the development of artificial intelligence. However, simulating large neural networks is a very time-consuming process. We propose an FPGA architecture that accelerates the simulation through parallel processing. Our architecture employs a unified scheduling and allocation scheme to increase the number of neurons effectively while maintaining a high degree of parallelism. Our results show an area reduction of over 80% without compromising processing speed. |
PDF file |
Title | CODEC system using EG2C chips and power control with a sleep mode for a visual prosthesis |
Author | *Naoya Tanaka, Shogo Hirayama, Yoshinori Takeuchi (Kindai University, Japan) |
Page | pp. 335 - 340 |
Keyword | Visual prostheses, ASIC, Data compression, Low power wireless communication |
Abstract | Visual prostheses are expected to become a means of restoring sight to the blind by stimulating the visual pathway, but they need further development before they are practical for users. Specifically, visual prostheses must wirelessly transmit data to implanted devices and process the data at high speed with low power consumption. We therefore design a dedicated CODEC chip, EG2C, to meet both requirements, and implement a CODEC system using EG2C chips and wireless transmission chips with a sleep mode. This paper describes the CODEC system and a method of reducing its power consumption with a sleep mode, presents the performance of the CODEC system, and proposes the CODEC system as a module of a visual prosthesis. Experimental results show that the EG2C chips reduce the power consumption of the CODEC system by 7.56 mW. |
PDF file |
Title | Hybrid Refinement Strategy for Package Substrate Routing |
Author | Tsubasa Koyama, *Ding-Hsun Lin, Yu-Jen Chen (National Tsing Hua University, Taiwan), Keng-Tuan Chang, Chih-Yi Huang, Chen-Chao Wang (Advanced Semiconductor Engineering (ASE), Inc., Taiwan), Tsung-Yi Ho (National Tsing Hua University/The Chinese University of Hong Kong, Taiwan) |
Page | pp. 341 - 346 |
Keyword | Advanced Packaging, Package Substrate Routing, Routing Refinement, Deep Learning |
Abstract | Advanced packaging technologies have gained significant importance in recent years due to rapid technological advancements. In these designs, substrate routing plays a critical role in ensuring the proper functioning and performance of the package. While existing works and automatic routing tools can assist designers with routing problems, they often struggle with the complex constraints and specifications of industrial designs. As a consequence, issues such as open/short nets, dense routing areas, and routing detours can arise. Designers must then modify these results manually, a time-consuming process that can take weeks. In this work, we propose a hybrid refinement strategy that combines rule-based and Deep Learning (DL)-based approaches to address this challenge. The aim is to improve the area distribution and reduce detours in the auto-routing results of industrial Flip-Chip Ball Grid Array (FCBGA) substrate designs, while significantly reducing the time required for modifications. Experimental results demonstrate that the proposed methods effectively improve both detours and area distribution, by 55% and 32% on average, respectively, compared with the auto-routed design. Furthermore, the time required for modifications is drastically reduced from weeks to minutes. |