
SASIMI 2022
The 24th Workshop on Synthesis And System Integration of Mixed Information Technologies
Technical Program

Remark: The presenter of each paper is marked with "*".   Time zone is JST (=UTC+9:00)

Session Schedule

Monday, October 24, 2022

Premier Hall (on-site) | Zoom (online) | Gather (online)
Op  Opening
9:00 - 9:40

K1  Keynote Speech I
9:40 - 10:40

A  Poster Session (Group A)
10:40 - 12:10

BO  Poster Session (Group B)
10:40 - 12:10
Lunch Break
12:10 - 13:40
I1  Invited Talk I
13:40 - 14:40

Break & Exhibitors' Presentation
14:40 - 15:00


V  Poster Session (Group V)
15:00 - 16:40
K2  Keynote Speech II
16:40 - 17:40

Tuesday, October 25, 2022

Premier Hall (on-site) | Zoom (online) | Gather (online)
I2  Invited Talk II
9:00 - 10:00

B  Poster Session (Group B)
10:00 - 11:30

CO  Poster Session (Group C)
10:00 - 11:30
Lunch Break
11:30 - 13:00
I3  Invited Talk III
13:00 - 14:00

C  Poster Session (Group C)
14:00 - 15:30

AO  Poster Session (Group A)
14:00 - 15:30
Cl  Closing
15:30 - 15:40



List of papers


Monday, October 24, 2022


Opening
Time: 9:00 - 9:40, Monday, October 24, 2022
Location: Premier Hall (on-site) / Zoom (online)



Keynote Speech I
Time: 9:40 - 10:40, Monday, October 24, 2022
Location: Premier Hall (on-site) / Zoom (online)
Chair: Yoshinori Takeuchi (Kindai University, Japan)

K1-1 (Time: 9:40 - 10:40)
Title: (Keynote Speech) Hardware/Software Codesign for Machine Learning Acceleration with Silicon Photonics
Author: Sudeep Pasricha (Colorado State University, USA)
Page: p. 1
Abstract: The massive data deluge from mobile, IoT, and edge devices, together with powerful innovations in data science and hardware processing, has established machine learning (ML) as the cornerstone of modern medical, automotive, industrial automation, and consumer electronics domains. Domain-specific ML accelerators such as Google’s TPU and Apple’s Bionic now dominate CPUs and GPUs for energy-efficient ML processing. However, the evolution of these electronic accelerators is facing fundamental limits due to the slowdown of Moore’s law and the reliance on metal wires, which already severely bottleneck computational performance today. Silicon photonics represents a promising post-Moore technological alternative to overcome these limitations. Not only can photonic interconnects fabricated in CMOS-compatible processes provide near-speed-of-light transfers at the chip scale, but photonic devices can now also perform computations entirely in the optical domain. In this talk, I will present my vision of how silicon photonics can drive an entirely new class of sustainable ML hardware accelerators that can provide orders-of-magnitude energy improvements over today’s accelerators. I will discuss new directions in hardware/software codesign for ML acceleration with silicon photonics, with multi-objective goals related to power and energy minimization, variation tolerance, fault resilience, and secure computing.



Poster Session (Group A)
Time: 10:40 - 12:10, Monday, October 24, 2022
Location: Premier Hall (on-site)
Chair: Kenshu Seto (Tokyo City University, Japan)

Outstanding Paper Award
A-1 (Time: 10:40 - 10:42)
Title: Full Hardware Implementation of RTOS-Based Systems Using General High-Level Synthesizer
Author: Takuya Ando, Iori Muguruma, *Yugo Ishii, Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroyuki Kanbara (ASTEM RI/Kyoto, Japan)
Page: pp. 2 - 7
Keyword: RTOS, full-hardware implementation, high-level synthesis, real-time systems
Abstract: This article proposes a method for implementing an RTOS-based system as hardware using a general high-level synthesizer. Oosako proposed a full hardware scheme where all the tasks and all the RTOS functions are implemented as hardware. However, it depended on special features of an in-house binary synthesizer, ACAP: a synthesized hardware module has a stall port by which the module's execution can be suspended, and accesses to global variables are automatically translated to accesses to the single memory space without rewriting the source program. Moreover, the size of the resulting circuits was too large for practical use. This paper proposes a new architecture that can dispense with the stall ports and also reduces the size of the resulting circuits. This paper also presents a wrapper class for global variable accesses and a style of programs to minimize the rewriting of task programs. Based on the proposed method, a hardware module for a reduced version of "sample1" bundled with TOPPERS/ASP3 has been successfully implemented as hardware using Xilinx Vitis HLS. Moreover, the size of the resulting circuit was 89 smaller than that by the previous method.

A-2 (Time: 10:42 - 10:44)
Title: SNRoverSDNN: A Metric for Robust CNN-based ROI Selection in Remote Heart Rate Extraction
Author: *Yuta Hitotsuyanagi, Takashi Sato (Graduate School of Informatics, Kyoto University, Japan)
Page: pp. 8 - 13
Keyword: Heart rate, Cameras, Convolutional neural networks, Image color analysis, Biomedical measurement
Abstract: Remote photoplethysmography (rPPG) is a method to estimate heart rate (HR) using video cameras. It enables non-contact HR estimation with inexpensive cameras, allowing subjects to conveniently measure HR without being restrained or feeling discomfort. In rPPG, it is important to select a region of interest (ROI) that is suitable for HR estimation. We propose a new metric, SNRoverSDNN, for CNN-based ROI selection. SNRoverSDNN takes into account harmonics and periodicity of the heartbeat. Using SNRoverSDNN, we could select reasonable ROIs without face detection.

A-3 (Time: 10:44 - 10:46)
Title: Hardware RTOS Services for Full Hardware Implementation of RTOS-Based Systems
Author: *Hiro Minamiguchi, Masaki Nakahara, Yugo Ishii, Yukino Shinohara, Iori Muguruma, Nagisa Ishiura (Kwansei Gakuin University, Japan)
Page: pp. 14 - 19
Keyword: RTOS, full hardware implementation, high-level synthesis, real-time systems
Abstract: This paper presents a hardware implementation of RTOS services for full hardware implementation of RTOS-based systems, where all the task programs and all the RTOS functions are implemented as hardware. Hardware methods for processing services of mutexes, event flags, data queues, shared variable accesses, and task control are proposed. Wait and release operations necessary in synchronization and communication services are efficiently performed using a request arbitration module. Timeouts are also handled by hardware using distributed timers. A hardware module that contains two mutexes, two event flags, one data queue of 320B data, and a shared variable of 1024B, as well as task scheduling and control functions, has been designed in Verilog HDL. It was synthesized to an FPGA circuit of 4,300 LUTs and 2,200 flip-flops (Xilinx Artix-7). All the services can be executed well within 150 ns, which is fast enough even for extreme applications.

A-4 (Time: 10:46 - 10:48)
Title: Importance Evaluation Methodology of FFs for Design Optimization of Approximate Computing Circuits
Author: *Jiaxuan Lu, Yutaka Masuda, Tohru Ishihara (Nagoya University, Japan)
Page: pp. 20 - 25
Keyword: approximate computing, importance evaluation, fault injection
Abstract: Approximate computing (AC) has attracted much attention, contributing to energy saving and performance improvement by accurately performing the important computation and approximating others. In order to make AC circuits practical, we need to carefully determine which computation is important, and thus approximate unimportant computations to maintain the required computational quality. In this paper, we focus on the importance of computations at the Flip-Flop (FF) level and propose a novel importance evaluation methodology. The key idea of the proposed methodology is a two-step fault injection algorithm to extract the near-optimal set of unimportant FFs. In the first step, the proposed methodology derives the importance of each FF. Then, in the second step, the proposed methodology extracts the set of unimportant FFs in a binary search manner. Thanks to the two-step strategy, the proposed algorithm reduces the complexity of architecture exploration from an exponential order to a linear order without understanding the functionality and behavior of the target application program. In a case study of an image processing accelerator, the proposed methodology finds that 21.8% of FFs can be approximated, resulting in 16.5% area reduction and 19.1% power saving while satisfying the image quality constraint.
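
The two-step idea in the A-4 abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' algorithm: the function names, the `error_with_faults` callback (a stand-in for a fault-injection simulation), and the assumption that the error grows monotonically as more FFs are approximated are all hypothetical.

```python
from typing import Callable, List, Sequence


def rank_ffs(ffs: Sequence[str],
             error_with_faults: Callable[[Sequence[str]], float]) -> List[str]:
    """Step 1 (assumed): inject a fault into one FF at a time and rank the FFs
    from least to most important by the output error each one causes."""
    return sorted(ffs, key=lambda ff: error_with_faults([ff]))


def unimportant_set(ranked: Sequence[str],
                    error_with_faults: Callable[[Sequence[str]], float],
                    max_error: float) -> List[str]:
    """Step 2 (assumed): binary-search the longest prefix of the ranking that
    still meets the quality constraint when all of its FFs are faulted."""
    lo, hi = 0, len(ranked)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if error_with_faults(ranked[:mid]) <= max_error:
            lo = mid          # constraint met: try approximating more FFs
        else:
            hi = mid - 1      # constraint violated: approximate fewer FFs
    return list(ranked[:lo])


# Toy error model (hypothetical): each FF contributes a fixed, additive error.
weights = {"ff_a": 5.0, "ff_b": 1.0, "ff_c": 3.0, "ff_d": 0.5}


def toy_error(faulted: Sequence[str]) -> float:
    return sum(weights[ff] for ff in faulted)


ranked = rank_ffs(list(weights), toy_error)
chosen = unimportant_set(ranked, toy_error, max_error=4.5)
print(ranked, chosen)
```

Under these assumptions the search needs one simulation per FF for the ranking plus a logarithmic number of simulations for the prefix search, which matches the linear-order exploration cost the abstract mentions.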

A-5 (Time: 10:48 - 10:50)
Title: Bottleneck Channel Routing to Reduce the Area of Analog VLSI
Author: *Kazuya Taniguchi, Satoshi Tayu, Atsushi Takahashi (Tokyo Institute of Technology, Japan), Yukichi Todoroki, Makoto Minami (Jedat, Japan)
Page: pp. 26 - 31
Keyword: two-layer Bottleneck Routing, Analog VLSI
Abstract: Design automation that realizes analog integrated circuits to meet performance specifications in a small area is desired. To reduce the layout area, “Bottleneck Channel Routing” is proposed in which two wires go through a routing track in the bottleneck region. A two-layer routing problem that consists of the bottleneck channel and the adjacent regions where the HV rule is not applicable is defined. The proposed algorithm uses a U-shaped routing model, and generates two-layer routing in which the number of intersections is minimized and the wire of a net includes at most one via. The obtained routing contains no conflicts if the algorithm outputs a feasible solution.

A-6 (Time: 10:50 - 10:52)
Title: Binding and Scheduling of 2×3 Mixers for Transport-Free Sample Preparation Using Programmable Microfluidic Devices
Author: *Masataka Hirai, Shigeru Yamashita (Ritsumeikan University, Japan), Sudip Roy (Indian Institute of Technology (IIT) Roorkee, India), Hiroyuki Tomiyama (Ritsumeikan University, Japan)
Page: pp. 32 - 37
Keyword: Biochip, PMD
Abstract: A Programmable Microfluidic Device (PMD) is one of the promising biochip platforms. On a PMD, fluids are mixed by a module called a mixer. We can generate various kinds of mixers, such as a 2x2 mixer and a 2x3 mixer consisting of 2x2 and 2x3 arrays of cells, respectively. Unlike other biochip platforms, we cannot move a fluid from one cell to another cell in a PMD. Thus, "No Transport Mixing (NTM)" has been proposed, which is a method to bind and schedule mixers without droplet transportation. However, NTM can treat only 2x2 mixers. Thus, in this paper, we propose an efficient method to bind and schedule mixing operations using 2x3 mixers as well as 2x2 mixers. Our method is based on a transformation of a given mixing tree based on our proposed "Placement Priority" values. Simulation results confirm that our transformation is indeed useful to decrease the number of "flushing" operations required in sample preparation using a PMD.

A-7 (Time: 10:52 - 10:54)
Title: Segmented DAC Linearity Improvement Algorithm Using Unit Cell Sorted Alternately with Digital Method
Author: *Yi Liu, Anna Kuwana, Shogo Katayama, Xiongyan Li (Gunma University, Japan), Atsushi Motozawa (Renesas Electronics Corporation, Japan), Haruo Kobayashi (Gunma University, Japan)
Page: pp. 38 - 43
Keyword: DAC, SSPA, DNL, INL
Abstract: This paper describes a self-calibration method for a current-steering Digital-to-Analog Converter (DAC) with a voltage-controlled oscillator (VCO). It is a digital method and does not require high-precision analog circuits; the VCO needs only monotonic characteristics and does not need to be linear. Mismatches among the unit current sources in the current-steering segmented DAC cause the overall DAC nonlinearity, and the VCO measures the order of each current source value. The measured information is stored in memory, and based on it, each current source is sorted to reduce the DAC nonlinearity. In particular, we have investigated with simulations whether the comparison algorithms can improve the DAC Differential Non-Linearity (DNL) and Integral Non-Linearity (INL) under several mismatch conditions. We present its principle and simulation results.
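
As a rough illustration of why reordering unit current sources can help linearity, the sketch below interleaves the smallest and largest cells and compares the worst-case INL against a plainly sorted switching order. This is one plausible reading of "sorted alternately", not the authors' algorithm; the cell values, function names, and the endpoint-style INL definition are assumptions, and `sorted()` stands in for the rank order that the VCO measurement would provide.

```python
from typing import List, Sequence


def alternate_order(cells: Sequence[float]) -> List[float]:
    """Interleave smallest and largest remaining cells: min, max, 2nd min, ..."""
    ranked = sorted(cells)  # only the rank order is needed, not exact values
    order: List[float] = []
    lo, hi = 0, len(ranked) - 1
    while lo <= hi:
        order.append(ranked[lo])
        lo += 1
        if lo <= hi:
            order.append(ranked[hi])
            hi -= 1
    return order


def max_abs_inl(cells: Sequence[float]) -> float:
    """Worst-case INL in LSB for a thermometer-coded segment that switches the
    cells on in the given order (ideal step = mean cell current)."""
    lsb = sum(cells) / len(cells)
    total = 0.0
    worst = 0.0
    for k, current in enumerate(cells, start=1):
        total += current
        worst = max(worst, abs(total - k * lsb) / lsb)
    return worst


# Hypothetical unit-cell currents with a few percent of mismatch.
cells = [1.00, 1.03, 0.97, 1.05, 0.95, 1.02, 0.98, 1.01]
print(max_abs_inl(sorted(cells)), max_abs_inl(alternate_order(cells)))
```

With these example values the interleaved order roughly halves the peak INL relative to the monotonically sorted order, because each small cell is immediately followed by a large one and the accumulated error partly cancels.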

A-8 (Time: 10:54 - 10:56)
Title: Aging-Compromised Computing-In-Memory Dot-Product Calculation Technique Through DVFS
Author: *Yu-Guang Chen, Chi-Hsu Wang (National Central University, Taiwan), Ing-Chao Lin (National Cheng Kung University, Taiwan)
Page: pp. 44 - 47
Keyword: Computing-In-Memory, BTI, HCI, SRAM, DVFS
Abstract: The von Neumann architecture, which separates the computing logic and the storage area, has been considered the fundamental architecture of nearly all digital computers nowadays. Data-intensive applications such as image recognition or cryptography may transfer large amounts of data between memory and the computing cores, which causes the well-known von Neumann bottleneck due to the limitation of communication bandwidth. Computing In-Memory (CIM), which directly performs in-situ operations at memory, has been considered one of the promising solutions to overcome the von Neumann bottleneck. Previous researchers have proposed an 8T-SRAM-based CIM architecture to perform multi-bit dot product computations by analog charging/discharging operations. However, such operations are very sensitive to variations as well as aging effects such as Bias Temperature Instability (BTI) and/or Hot Carrier Injection (HCI). To provide a reliable CIM multi-bit dot product engine, in this paper we propose an aging-aware in-memory computing framework which consists of an aging detection method and an aging tolerance technique. Specifically, we apply Dynamic Voltage Frequency Scaling (DVFS) on the CIM structure to compensate for the current drop due to variations and aging effects. Experimental results show that we can double the lifetime of the CIM structure with 1.185x extra power consumption on average.

A-9 (Time: 10:56 - 10:58)
Title: An Implementation of Self-Testable Layout-Level Scan C-element
Author: *Kokoro Yamasaki, Hiroshi Iwata, Ken'ichi Yamaguchi (National Institute of Technology, Nara College, Japan)
Page: pp. 48 - 53
Keyword: Design for testability, Full scan design, Asynchronous circuit, C-element, Layout level design
Abstract: Design methodology with asynchronous circuits is used for recent VLSI designs since it can solve several problems with synchronous circuit designs. However, manufacturing test for asynchronous circuits is more difficult than that for synchronous circuits, in which global synchronization is controlled by clock signal lines. A full scan design for asynchronous circuits is one answer to this serious dependability problem. A transistor-level circuit for the scan C-element has also been proposed as one way to implement full scan. However, there is no layout-level design of the scan C-element to fabricate the chip, and no physical information is available. In this paper, we propose a layout design for scan C-elements using a Rohm 0.18 µm process transistor model with a view to fabricating chips for experiments.

A-10 (Time: 10:58 - 11:00)
Title: Voice Learning of Reservoir Computing Architecture using Ternary Content Addressable Memory with Individuality
Author: *Sayaka Akiyama, Go Ajiki, Xiangbo Kong, Takeshi Kumaki (Ritsumeikan University, Japan)
Page: pp. 54 - 59
Keyword: Reservoir Computing, CAM, Voice learning, AI
Abstract: With the rapid progress in artificial intelligence (AI) technology, the number of machines that have been designed to interact with human beings has been steadily increasing. However, the responses of such machines to human interactions are often excessively uniform. The purpose of our study is to incorporate the variations that occur during chip manufacturing into machine learning and give AI-based robots their own individuality. In this paper, a reservoir computing architecture using Ternary Content Addressable Memory with Individuality is developed, and learning is performed using voice data, which is a complicated waveform. It is found that the error of the average of all data between 10 chips is 140% at the maximum. The voice data learning results thus show individuality in their outputs.

Outstanding Paper Award
A-11 (Time: 11:00 - 11:02)
Title: Formulation of Maximum Independent Set Problem for Simulated Quantum Annealing Machine
Author: *Haruki Nakayama, Yukihide Kohira (The University of Aizu, Japan)
Page: pp. 60 - 65
Keyword: Maximum Independent Set Problem, Simulated Quantum Annealing
Abstract: Various problems in LSI design such as redundant via insertion are formulated as Maximum Independent Set Problems (MISP). Recently, various algorithms have been proposed to solve combinatorial optimization problems such as MISP. Since a combinatorial optimization problem can be solved by multiple methods, a suitable combination of problem and method must be found. In this paper, we try to find a suitable combination between three optimization problems, which are MISP, the minimum vertex cover problem, and the maximum clique problem, and three methods, which are a mathematical optimization for binary variables, a solver for the satisfiability problem, and a Simulated Quantum Annealing (SQA) machine. It is known that these three problems are equivalent to each other. Moreover, a new formulation for SQA is proposed to improve modeling time. Experimental results show that the proposed formulation obtains the best solution in a short modeling time.
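
For readers unfamiliar with how MISP is handed to an annealing machine, the sketch below builds the textbook penalty-based QUBO (reward each selected vertex, penalize each edge whose endpoints are both selected). It is not the new formulation proposed in A-11, which the abstract does not spell out; the function names and the penalty value are assumptions, and a brute-force search over a tiny graph stands in for the SQA machine.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Qubo = Dict[Tuple[int, int], float]


def mis_qubo(num_nodes: int, edges: Sequence[Tuple[int, int]],
             penalty: float = 2.0) -> Qubo:
    """Textbook QUBO for MISP: minimize -sum(x_i) + penalty * sum(x_i * x_j)
    over edges (i, j). Any penalty > 1 makes every minimizer independent."""
    q: Qubo = {(i, i): -1.0 for i in range(num_nodes)}
    for i, j in edges:
        key = (min(i, j), max(i, j))
        q[key] = q.get(key, 0.0) + penalty
    return q


def energy(q: Qubo, x: Sequence[int]) -> float:
    """QUBO energy of a 0/1 assignment (x_i * x_i equals x_i for binaries)."""
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def brute_force_minimum(q: Qubo, num_nodes: int) -> Tuple[int, ...]:
    """Exhaustive search; only viable for tiny graphs, used here for checking."""
    return min(product((0, 1), repeat=num_nodes), key=lambda x: energy(q, x))


# 5-cycle: its maximum independent set has exactly two vertices.
edges: List[Tuple[int, int]] = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
q = mis_qubo(5, edges)
best = brute_force_minimum(q, 5)
chosen = [i for i, bit in enumerate(best) if bit]
print(chosen, energy(q, best))
```

The number of QUBO terms grows with the number of vertices plus edges, which is one reason the choice of formulation can affect modeling time.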

A-12 (Time: 11:02 - 11:04)
Title: Efficient Hardware Architecture for Taylor-Series Expansion Calculation Using Distributed Arithmetic with Term Division
Author: *Xaybandith Hemthavy, Jianglin Wei, Shogo Katayama, Anna Kuwana, Haruo Kobayashi (Gunma University, Japan), Kazuyoshi Kubo (Oyama National College of Technology, Japan)
Page: pp. 66 - 70
Keyword: Digital Signal Processing, Distributed Arithmetic, Taylor-Series Expansion, Digital Arithmetic, Multiply-Add
Abstract: This paper describes digital arithmetic that reduces the calculation and hardware (logic circuits and memory) for Taylor-series expansion calculation by applying distributed (bit-serial) arithmetic with the proposed term division method. Distributed arithmetic (DA) is a multiplier-less approach for calculating multiply-add operations, but its direct application to the Taylor-series expansion calculation still demands almost the same number of multiplications as the direct calculation and additionally requires a large Look-Up Table (LUT); hence it offers no benefit. We therefore propose the term division method, which can reduce the number of multiplications and the LUT size significantly. Further, we found that the optimal number of term divisions is approximately the square root of the number of Taylor-series expansion terms.



Poster Session (Group B)
Time: 10:40 - 12:10, Monday, October 24, 2022
Location: Gather (online)
Chair: Ing-Jer Huang (National Sun Yat-sen University, Taiwan)

These papers are assigned to session B

BO-D:1 (Time: 10:40 - 10:42)
Title: Optimal Synthesis of NNA-Compliant Quantum Circuits in 2-D Architectures by Utilizing Don't Care Conditions
Author: *Kyohei Seino, Shigeru Yamashita (Ritsumeikan University, Japan)
Keyword: Quantum Circuit, Nearest Neighbor Architecture (NNA), SMT solver
Abstract: For current and possibly future technology as well, it is very natural to assume that we can perform quantum operations between only two adjacent physical qubits (quantum bits) to realize a quantum computer. This restriction is called the Nearest Neighbor Architecture (NNA) restriction. Thus there have been many studies on how to convert a quantum circuit with as little overhead as possible such that it satisfies the NNA restriction. A conversion method that makes quantum circuits satisfy the NNA restriction by utilizing an SMT solver has been proposed. In this paper, we propose to consider "don't care" conditions at intermediate points of a quantum circuit so that we can improve the existing SMT solver-based conversion method. Experimental results show that our approach can reduce the number of CNOT gates by 18.57% on average compared to the existing method.

BO-D:2 (Time: 10:42 - 10:44)
Title: On Technology Remapping Approach Using Multi-Gate Functionality of Reconfigurable Cells for Post-Mask ECO
Author: *Tomohiro Nishiguchi, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Keyword: ECO
Abstract: In this paper, we propose a technology remapping approach using multi-gate RECON cells that configure multiple gates in a single RECON cell while maintaining the layout of the 4T/6T-RECON base cells. Our experimental results confirm that the multi-gate RECON cells are used in 62.1% of the test cases, of which 48.7% have improved slack and 77.2% have improved wire length. The results also show that the number of used spare cells is reduced by 5.12%.

BO-D:3 (Time: 10:44 - 10:46)
Title: Binary Synthesis Using High-Level Synthesizer as its Back-End
Author: Ryo Nakamichi, *Sho Kishimoto, Nagisa Ishiura, Takumi Kondo (Kwansei Gakuin University, Japan)
Keyword: binary synthesis, high-level synthesis, RISC-V
Abstract: This paper presents a facile way to implement binary synthesizers using existing high-level synthesizers as their back-ends. Binary synthesis is a variant of high-level synthesis which translates binary programs into register transfer level hardware models. In the proposed method, C programs in place of CDFGs (control dataflow graphs) are generated from binary programs, which are synthesized into hardware by high-level synthesis. Based on the proposed method, a binary synthesizer for RISC-V (RV32IM) has been implemented using Xilinx Vivado HLS as a back-end high-level synthesizer. The execution cycles and critical path delay of the synthesized circuits, generated from RV32IM binaries compiled from C programs, are almost the same as those of the circuits generated by the high-level synthesizer from the C programs, though the circuit size is 1.00 to 3.32 times larger.

BO-D:4 (Time: 10:46 - 10:48)
Title: An Error Diagnosis Technique Based on Location Variable Simulation Employing Implicit Representation of Error Location Sets
Author: *Hiroki Tsuyama, Akio Masamori, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Keyword: error diagnosis, ECO
Abstract: This paper presents an error diagnosis technique based on location variable (LV) simulation, which drastically shortens the processing time for screening error location sets using BDD-based implicit representation. An LV is a Boolean variable which indicates whether the function of the location is modified or not. For each signal value on each signal line, the LV simulation computes a signal value function in terms of LVs which represents necessary conditions for taking that value. Based on the rectification condition to modify every incorrect primary output value to the correct one, the results of screening error location sets are obtained in a BDD-based implicit representation. Experimental results have shown that the proposed technique reduces the processing time by 99.986% at the maximum, and by 86.7% on average.

BO-D:5 (Time: 10:48 - 10:50)
Title: Extending Channel Routing Method for Two-Layer Routing Problem Allowing for Terminals Placed within the Routing Area
Author: *Kaito Ishigami, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan)
Keyword: V-H Routing, Channel Routing, Dogleg Channel Router, Constraint Graph
Abstract: The channel routing method is known to efficiently route rectangular regions called channels, which have fixed terminals on the top and bottom sides only. If this method of assigning nets to each track one by one could be applied to a general problem which has terminals inside the routing region, it is expected to route more nets, and faster, than the rip-up and reroute method of routing each net one by one. In this paper, we propose a two-layer routing method that can be applied to the general problem and routes nets to each track one by one like the conventional channel routing method. In an experimental comparison, the proposed method routed more nets and was faster than Qrouter, which uses the rip-up and reroute method.

BO-D:6 (Time: 10:50 - 10:52)
Title: A Study on the Design of Interface Circuits Between Synchronous-Asynchronous Modules Using Click Elements
Author: *Shogo Semba, Hiroshi Saito (The University of Aizu, Japan)
Keyword: interface circuits, asynchronous circuits
Abstract: In this paper, we propose interface circuits between synchronous-asynchronous modules using Click Elements. Click Elements are used to control the asynchronous parts in the proposed interface circuits. In the experiment, compared with the interface circuit based on the two-flop synchronizer, the proposed interface circuits could reduce the latency and handshake overhead by up to 4.9 cycles and 17.0 cycles, respectively.

BO-D:7 (Time: 10:52 - 10:54)
Title: A Scalable Linear Equation Solver FPGA using High-Level Synthesis
Author: *Haopeng Meng, Kazutoshi Wakabayashi, Tadahiro Kuroda (The University of Tokyo, Japan)
Keyword: High-Level Synthesis, Linear Equation Solver, Scalable FPGA, C-Based Design
Abstract: This paper mainly describes a scalable linear equation solver in FPGA based on Gauss-Jordan Elimination using high-level synthesis (HLS). A C++ generator is created in this work to obtain the HLS code for synthesis, which is able to balance the area and performance of the solver with a few parameters. Compared with the traditional RTL design, it has higher design efficiency. In the case of best performance, the solver has a time complexity of O(N). Due to the high design efficiency, this scalable linear equation solver could also be used as IP in another design. The design is synthesized with NEC CyberWorkBench HLS, followed by RTL synthesis in Xilinx Vivado, targeting ZYNQ UltraScale+ on the ZCU104 Evaluation Kit at 200 MHz.
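
For reference, Gauss-Jordan elimination itself can be sketched in a few lines. The version below is a plain software illustration using exact fractions, not the HLS code generated in BO-D:7; the function name and the pivoting choice are assumptions.

```python
from fractions import Fraction
from typing import List, Sequence


def gauss_jordan_solve(a: Sequence[Sequence[int]],
                       b: Sequence[int]) -> List[Fraction]:
    """Solve a x = b by reducing the augmented matrix [a | b] to [I | x]."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the first row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Normalize the pivot row so the pivot becomes 1.
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


solution = gauss_jordan_solve([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]],
                              [8, -11, -3])
print(solution)
```

The outer loop runs N times, and within each iteration the row updates are independent of one another, so hardware that updates all rows and columns in parallel would need on the order of N steps, which is consistent with the best-case complexity stated in the abstract.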

BO-D:8 (Time: 10:54 - 10:56)
Title: Tail Layer CNN Training for a SoC-based FPGA
Author: *Yuki Takashima, Akira Jinguji, Ryosuke Kuramochi, Ryota Kayanoma, Hiroki Nakahara (Tokyo Institute of Technology, Japan)
Keyword: CNN, image classification, Tail Layer Training, FPGA
Abstract: The demand for deep learning has increased, and many accelerators have been proposed. Although they perform inference at high speed, they cannot perform training. We present tail layer training for a convolutional neural network (CNN). It is implemented with a conventional CNN accelerator and a CPU. Processing speed is retained because most of the CNN is processed in an accelerator, while only the tail layer is updated by the CPU, enabling the weights to be updated or added. Since the number of neurons and classes in the output must be the same for image classification, it is effective for retraining to count the number of classes. We show the relationship between the existing classes and those to be added for the CIFAR10 and ImageNet datasets. The tail training is unsuitable for many classes. Accuracy loss is negligible when training only the tail layer with two added categories. The processing speed reduction was almost negligible. Our scheme can be applied to existing SoC-FPGA-based CNN accelerators.

BO-D:9 (Time: 10:56 - 10:58)
Title: A Thermally Optimizing Method of Thin Film Resistor Trimming with Machine Learning
Author: *Tomoya Akasaka (Hirosaki University, Japan), Shigeru Hidaka (Nikkohm Co., Ltd, Japan), Ryosuke Watanabe, Taisei Arima, Atsushi Kurokawa, Toshiki Kanamoto (Hirosaki University, Japan)
Keyword: Thermal, Thin Film Resistor, Trimming, Machine Learning
Abstract: This paper proposes a novel trimming method for thin-film resistors dedicated to power modules. Thin-film resistors are utilized for applications including snubber circuits, which avoid unwanted ringing appearing in the output voltage of power transistors. Our previous works have revealed that the applicable voltage of a NiCr thin-film resistor is thermally limited, and that the indispensable trimming process affects the degree of temperature rise in use. In this paper, we first formulate the relationship between the target resistance value and the trim dimensions using machine learning. With the obtained equation, we propose a new trimming method, which enables the trimmed pattern to reduce the variability of the maximum temperature rise. The experimental results show that the proposed trimming method can suppress the estimated range of the maximum temperature from 619 K to 179 K.

BO-D:10 (Time: 10:58 - 11:00)
Title: Development of Text Translation System from Tsugaru Dialect into Common Japanese
Author: *Taiki Niida, Masashi Imai (Hirosaki University, Japan)
Keyword: Language translation, Artificial intelligence, Tsugaru dialect, Morphological analysis
Abstract: Tsugaru dialect can be an obstacle to communication between Aomori residents and residents who have transferred there for work and tourists from outside the prefecture. We are developing a bidirectional voice and text translation system between Tsugaru-ben and common Japanese utilizing artificial intelligence. In this paper, our research project is first introduced, and the developed text translation system from Tsugaru dialect into common Japanese is explained. Some evaluation results of the morphological analysis and translation tools are also shown.

BO-D:11 (Time: 11:00 - 11:02)
Title: On Providing Faster IR-Drop Forecast via SVM-Based Solutions
Author: Ya-Ying Chien (NYCU, Taiwan), Chang-Tzu Lin (ITRI, Taiwan), *Hung-Ming Chen (NYCU, Taiwan)
Keyword: IR drop, PDN, SVM
Abstract: The objective of this work is to develop a fast and accurate IR drop predictor to reduce the runtime of nodal analysis and obtain the IR drop violation result directly from the power delivery network (PDN). A support vector machine (SVM) is trained on influential features to predict the IR drop result. When constructing the machine learning model, we avoid overfitting and tune the parameters to find better performance. Our work is evaluated on a real industry design in a TSMC 180 nm process. Experimental results show that our model can efficiently predict IR violations with high accuracy, ranging from 98.11% to 99.42%.



Invited Talk I
Time: 13:40 - 14:40, Monday, October 24, 2022
Location: Premier Hall (on-site) / Zoom (online)
Chair: Toshiki Kanamoto (Hirosaki University, Japan)

I1-1 (Time: 13:40 - 14:40)
Title: (Invited Talk) Utilization of Dominant Time Constant Information to Improve the Efficiency of Power and Hard-Breakdown Device Simulation
Author: Shigetaka Kumashiro (Kyoto Institute of Technology, Japan)
Page: p. 71
Abstract: Dominant time constants of the response of the linearized semiconductor device equations (Poisson, electron and hole current continuity) are extracted by using the Arnoldi method. A new accurate metric for the time step control in the transient device simulation has been derived based on the dominant time constant information. By using this metric, the CPU time of the transient simulation of a power DMOSFET decreases to 27% of that by the conventional method. It has been found that Newton iteration diverges if a negative time constant appears during the hard-breakdown simulation of a PN junction. Stable convergence is obtained either by restricting the terminal voltage increment to be so small that no negative time constant appears, or by switching to transient simulation upon detecting the appearance of a negative time constant. These two methods are more efficient than the conventional blind trial-and-error type convergence control method.


[To Session Table]

Poster Session (Group V)
Time: 15:00 - 16:40, Monday, October 24, 2022
Location: Gather (online)
Chair: Yukihide Kohira (The University of Aizu, Japan)

V-1 (Time: 15:00 - 15:02)
Title: Electronic Component Placement Optimization for Heat Measures of Smartglasses
Author: *Kyosuke Kusumi (Hirosaki University, Japan), Koutaro Hachiya (Teikyo Heisei University, Japan), Ryotaro Kudo, Toshiki Kanamoto, Atsushi Kurokawa (Hirosaki University, Japan)
Page: pp. 72 - 76
Keyword: Placement, NSGA-II, Heat, Smartglasses
Abstract: Thermal-aware floorplanning for VLSIs and thermal placement optimization of electronic components on printed circuit boards (PCBs) using genetic algorithms (GAs) are well studied. However, these studies do not consider real device shapes and the environment around them. In this paper, we propose a method for optimizing the placement of electronic components mounted on smartglasses using the elitist non-dominated sorting genetic algorithm (NSGA-II) and a thermal resistance circuit. Electronic components that have various dimensions and power consumptions are relocated to simultaneously minimize the maximum temperatures of the parts around the ears and of the areas often held by hands. The experimental results show that the proposed method effectively reduced the maximum temperature.

V-2 (Time: 15:02 - 15:04)
Title: ML-assisted Sizing Approach for Low-Voltage Circuits Considering Process Variation
Author: *Ling-Yen Song, Chih-Yun Chou, Tung-Chieh Kuo, Chien-Nan Jimmy Liu, Juinn-Dar Huang (Institute of Electronics, National Yang Ming Chiao Tung University, Taiwan)
Page: pp. 77 - 80
Keyword: Process variation, low-voltage analog circuit sizing, evolutionary algorithm, machine learning
Abstract: Sizing low-power analog circuits is not easy because the increasing uncertainties from low-voltage techniques magnify process variation effects on the design yield. However, if process variation is also considered, the huge number of simulations becomes almost infeasible for large circuits in traditional approaches. In this paper, we propose an ML-assisted prediction model to speed up the variation-aware circuit sizing technique by skipping many unnecessary simulations. Moreover, a novel force-directed model is proposed to guide the optimization toward better yield without time-consuming Monte Carlo simulations. Compared with prior works, the proposed approach significantly reduces the number of simulations in the yield-aware EA optimization, which helps to generate practical low-voltage designs with high reliability and low cost.

V-3 (Time: 15:04 - 15:06)
Title: Tag-Less Compression for FPGA Configuration Data
Author: *Souhei Takagi, Naoya Niwa, Yusuke Yanai, Hideharu Amano (Keio University, Japan), Masaki Amagasaki, Yuya Nakazato, Masahiro Iida (Faculty of Advanced Science and Technology, Kumamoto University, Japan)
Page: pp. 81 - 82
Keyword: fine-grained reconfigurable logic, Run Length compression, FPGA
Abstract: SLM (Scalable Logic Module) is a fine-grained reconfigurable logic developed by Kumamoto University that requires only a small amount of configuration information. As a result, the area of the logic cell is also small. We are developing a new FPGA that incorporates SLMs, a CPU, switches, and memory. This chip exploits the small amount of SLM configuration information by storing multiple sets of configuration information in internal memory, and it has a function to swap them at high speed. In this paper, we propose a method that compresses the configuration information so that more configuration data can be stored. The compression method must allow fast decompression inside the chip and must be implementable with simple hardware. In addition, the configuration information of the target SLM reconfigurable logic consists mostly of consecutive 0s, but locally there are parts where 1s are mixed in. We propose TLC (Tag-Less Compression), a run-length compression method that meets these conditions. TLC specializes in zero sequences, requires no tags (prefixes) unlike many run-length methods, and is very easy to implement. A decompression circuit was designed in Verilog-HDL, and logic synthesis was performed with Synopsys Design Compiler assuming a USJC 55 nm process. The circuit area was found to be as small as 793 μm². The circuit delay is 3095 psec, so the circuit can be incorporated into a configuration circuit operating at 200 MHz.

V-4 (Time: 15:06 - 15:08)
Title: Co-optimization of Prefix Structure and Bit-Line Arrangement for Long Bit-Length Parallel Prefix Adders
Author: *Kazuya Uryu, Mineo Kaneko (Japan Advanced Institute of Science and Technology, Japan)
Page: pp. 83 - 84
Keyword: Parallel prefix adder, procedural construction, Simulated annealing
Abstract: In the design of long bit-length adders, controlling wire length contributes to the speed performance of the adder. This paper proposes a co-optimization of the adder structure and bit-line positioning for generating new solutions with improved speed performance. In experiments on a 256-bit adder, the maximum path delay improved by around 20% with a comparable number of prefix components, compared with the case of fixed bit-line positioning.

V-5 (Time: 15:08 - 15:10)
Title: A Global Buffer and Splitter Insertion Algorithm in AQFP Circuits
Author: *Rongliang Fu (The Chinese University of Hong Kong, Hong Kong), Mengmeng Wang (Yokohama National University, Japan), Yirong Kan (Nara Institute of Science and Technology, Japan), Olivia Chen (Tokyo City University, Japan), Nobuyuki Yoshikawa (Yokohama National University, Japan), Tsung-Yi Ho (The Chinese University of Hong Kong, Hong Kong)
Page: pp. 85 - 90
Keyword: Superconducting logic, AQFP, Logic synthesis, Buffer and splitter insertion
Abstract: The extremely low bit energy of the adiabatic quantum-flux-parametron (AQFP) logic circuit makes it an attractive candidate for highly energy-efficient computing systems. To ensure the circuit functionality of AQFP logic, buffers and splitters must be inserted to synchronize dataflow at all clock phases of the circuit and to drive multiple fan-outs. However, existing works on buffer and splitter insertion only perform optimization on either a single net or a few local nets. They can incur redundant Josephson junctions (JJs) that account for much area overhead and energy dissipation in AQFP circuits. This paper proposes a global optimization algorithm for buffer and splitter insertion to resolve these issues. First, the logic level of each logic gate is determined by an integer linear programming model that considers the interaction among different nets. Then, an optimal splitter tree is constructed for each net of the input circuit through a dynamic programming-based multi-way search tree generation algorithm. Experimental results on ISCAS'85 circuits show the effectiveness of the proposed method, with average reductions of 7.37% and 10.51% in the total number of buffers and splitters inserted across all benchmarks compared with the methods from ICCAD'21 and DAC'22, respectively. Meanwhile, the depth of each generated circuit does not exceed that of these two methods.

V-6 (Time: 15:10 - 15:12)
Title: Heating of Foreign Object in Inductive Wireless Charging
Author: *Issei Sato, Ryotaro Kudo, Toshiki Kanamoto (Hirosaki University, Japan), Koutaro Hachiya (Teikyo Heisei University, Japan), Shinsuke Kashiwazaki, Atsushi Kurokawa (Hirosaki University, Japan)
Page: pp. 91 - 95
Keyword: wireless power transfer, foreign object, heat, thermal analysis
Abstract: A number of small electronic devices that are used every day, such as smartphones, have been equipped with wireless charging functions. If there is a metal foreign object (FO) near the transmitter and the function to detect and control the FO does not work well, it is unclear how hot the FO will be due to induction heating. This paper presents the thermal analysis results of an FO in inductive wireless charging of mobile devices. The results show that the maximum FO temperatures are 151.6℃ when the receiver (Rx) was directly above the transmitter (Tx), 363.4℃ when the Rx was laterally displaced, and 341.4℃ when there was no Rx.

V-7 (Time: 15:12 - 15:14)
Title: An Efficient LSI Implementation of the Summation of Products in Convolution Operation for Binarized Neural Networks
Author: *Mitsuru Takahashi, Kazuhito Ito (Saitama University, Japan)
Page: pp. 96 - 101
Keyword: CNN, BNN, full adder, LSI
Abstract: Various applications of machine learning with convolutional neural networks (CNN) are emerging. A binarized NN (BNN) is a CNN where the number of bits of input, activation, and weight for convolution is one. Hence the inference operation in BNNs is simple and it is suitable for LSI implementation. In this paper, an efficient LSI implementation of the summation of the products in BNNs is proposed. The required number of transistors is reduced by 32% for the convolution kernel of the size 3x3x64.


[To Session Table]

Keynote Speech II
Time: 16:40 - 17:40, Monday, October 24, 2022
Location: Premier Hall (on-site) / Zoom (online)
Chair: Hiroyuki Ochi (Ritsumeikan University, Japan)

K2-1 (Time: 16:40 - 17:40)
Title: (Keynote Speech) One is not Enough: Using Hybrid Proof Engines for Polynomial Formal Verification
Author: *Rolf Drechsler (University of Bremen/DFKI, Germany), Alireza Mahzoon (University of Bremen, Germany)
Page: pp. 102 - 107
Abstract: Recently, polynomial formal verification has been introduced as a new concept. The core idea is to consider verification not only as a post-processing step, but from the very beginning. Based on formal proof techniques complexity bounds are given that allow for an efficient verification task. Thus, this overcomes the unpredictability problem and ensures the scalability of verification techniques. Despite this progress, most of the works are still limited to the polynomial verification of individual components, e.g. adders and multipliers, and are based on a monolithic proof engine. Polynomial formal verification of complex systems, consisting of many different sub-components, is an almost unexplored area. The challenge originates from the fact that a verification method (e.g., BDD-based equivalence checking) might verify a sub-component (e.g., an adder) in polynomial time but have an exponential verification complexity for another component (e.g., a multiplier). The concept of polynomial verification is reviewed. Then, we introduce a hybrid verification engine to attack the problem of verifying complex modern systems in polynomial space and time. The engine takes advantage of several verification techniques, such as combinational equivalence checking based on bit-level approaches, like SAT and BDDs, as well as word-level verification based on e.g. SCA and *BMDs. The correctness of each block or system task can be ensured in polynomial time using a specific verification technique from the environment. Thus, we overcome the shortcomings of using only one verification method and pave the way toward polynomial verification of advanced CPUs, DSP blocks, and AI-synthesized architectures.



Tuesday, October 25, 2022

[To Session Table]

Invited Talk II
Time: 9:00 - 10:00, Tuesday, October 25, 2022
Location: Premier Hall (on-site) / Zoom (online)
Chair: Hiroyuki Ochi (Ritsumeikan University, Japan)

I2-1 (Time: 9:00 - 10:00)
Title: (Invited Talk) Design and Development of Electronic Devices for Driving, Measuring and Controlling Humanoid Robots
Author: Kunio Kojima (The University of Tokyo, Japan)
Page: p. 108
Abstract: Along with the development of electronic devices such as integrated circuits, a variety of robots have been developed, including industrial robots. Our laboratory has developed musculoskeletal humanoids, high-power humanoids, flexible fleshy humanoids, and flying robots. We have studied intelligent robotic behavioral systems, dynamics control of whole-body motions, and hardware configuration of robotic physical structures for the purposes of daily life support, disaster rescue support, industrial manpower saving, and analysis of the body-intelligence mechanism. Since humanoid robots are independent machines, circuits, and systems, various issues such as weight, size, durability, power consumption, and processing performance are interrelated. Therefore, we have developed circuit devices such as power and sensor devices and inter-device communication boards while enhancing and adding functions required for actual robot applications. This presentation will introduce some of these topics in more detail and discuss the current status and future prospects.


[To Session Table]

Poster Session (Group B)
Time: 10:00 - 11:30, Tuesday, October 25, 2022
Location: Premier Hall (on-site)
Chair: Masato Inagi (Hiroshima City University, Japan)

Outstanding Paper Award
B-1 (Time: 10:00 - 10:02)
Title: Optimal Synthesis of NNA-Compliant Quantum Circuits in 2-D Architectures by Utilizing Don't Care Conditions
Author: *Kyohei Seino, Shigeru Yamashita (Ritsumeikan University, Japan)
Page: pp. 109 - 114
Keyword: Quantum Circuit, Nearest Neighbor Architecture (NNA), SMT solver
Abstract: For the current and possibly the future technology as well, it is very natural to assume that we can perform quantum operations between only two adjacent physical qubits (quantum bits) to realize a quantum computer. This restriction is called the Nearest Neighbor Architecture (NNA) restriction. Thus there have been many studies on how to convert a quantum circuit with as little overhead as possible such that it satisfies the NNA restriction. A conversion method that uses an SMT solver to make quantum circuits satisfy the NNA restriction has been proposed. In this paper, we propose to consider "don't care" conditions at intermediate points of a quantum circuit so that we can improve the existing SMT solver-based conversion method. Experimental results show that our approach can reduce the number of CNOT gates by 18.57% on average compared to the existing method.

B-2 (Time: 10:02 - 10:04)
Title: On Technology Remapping Approach Using Multi-Gate Functionality of Reconfigurable Cells for Post-Mask ECO
Author: *Tomohiro Nishiguchi, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page: pp. 115 - 120
Keyword: ECO
Abstract: In this paper, we propose a technology remapping approach using multi-gate RECON cells that configure multiple gates in a single RECON cell while maintaining the layout of the 4T/6T-RECON base cells. Our experimental results confirm that the multi-gate RECON cells are used in 62.1% of the test cases, of which 48.7% have improved slack and 77.2% have improved wire length. The results also show that the number of used spare cells is reduced by 5.12%.

B-3 (Time: 10:04 - 10:06)
Title: Binary Synthesis Using High-Level Synthesizer as its Back-End
Author: Ryo Nakamichi, *Sho Kishimoto, Nagisa Ishiura, Takumi Kondo (Kwansei Gakuin University, Japan)
Page: pp. 121 - 126
Keyword: binary synthesis, high-level synthesis, RISC-V
Abstract: This paper presents a facile way to implement binary synthesizers using existing high-level synthesizers as their back-ends. Binary synthesis is a variant of high-level synthesis which translates binary programs into register transfer level hardware models. In the proposed method, C programs in place of CDFGs (control dataflow graphs) are generated from binary programs, which are synthesized into hardware by high-level synthesis. Based on the proposed method, a binary synthesizer for RISC-V (RV32IM) has been implemented using Xilinx Vivado HLS as a back-end high-level synthesizer. The execution cycles and critical path delay of the synthesized circuits, generated from RV32IM binaries compiled from C programs, are almost the same as those of the circuits generated by the high-level synthesizer from the C programs, though the circuit size is 1.00 to 3.32 times larger.

B-4 (Time: 10:06 - 10:08)
Title: An Error Diagnosis Technique Based on Location Variable Simulation Employing Implicit Representation of Error Location Sets
Author: *Hiroki Tsuyama, Akio Masamori, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page: pp. 127 - 132
Keyword: error diagnosis, ECO
Abstract: This paper presents an error diagnosis technique based on location variable (LV) simulation, which drastically shortens the processing time for screening error location sets using BDD-based implicit representation. An LV is a Boolean variable which indicates whether the function of the location is modified or not. For each signal value on each signal line, the LV simulation computes a signal value function in terms of LVs which represents the necessary conditions for taking that value. Based on the rectification condition to modify every incorrect primary output value to the correct one, the results of screening error location sets are obtained in a BDD-based implicit representation. Experimental results have shown that the proposed technique reduces the processing time by up to 99.986%, and by 86.7% on average.

B-5 (Time: 10:08 - 10:10)
Title: Extending Channel Routing Method for Two-Layer Routing Problem Allowing for Terminals Placed within the Routing Area
Author: *Kaito Ishigami, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan)
Page: pp. 133 - 138
Keyword: V-H Routing, Channel Routing, Dogleg Channel Router, Constraint Graph
Abstract: The channel routing method is known to efficiently route rectangular regions called channels, which have fixed terminals on the top and bottom sides only. If this method of assigning nets to each track one by one could be applied to a general problem that has terminals inside the routing region, it is expected to route more nets, and faster, than the rip-up and reroute method, which routes each net one by one. In this paper, we propose a two-layer routing method that can be applied to the general problem and routes nets to each track one by one like the conventional channel routing method. In an experimental comparison, the proposed method routed more nets and was faster than Qrouter, which uses the rip-up and reroute method.

B-6 (Time: 10:10 - 10:12)
Title: A Study on the Design of Interface Circuits Between Synchronous-Asynchronous Modules Using Click Elements
Author: *Shogo Semba, Hiroshi Saito (The University of Aizu, Japan)
Page: pp. 139 - 144
Keyword: interface circuits, asynchronous circuits
Abstract: In this paper, we propose interface circuits between synchronous-asynchronous modules using Click Elements. Click Elements are used to control the asynchronous parts in the proposed interface circuits. In the experiment, compared with the interface circuit based on the two-flop synchronizer, the proposed interface circuits reduced the latency and handshake overhead by up to 4.9 cycles and 17.0 cycles, respectively.

B-7 (Time: 10:12 - 10:14)
Title: A Scalable Linear Equation Solver FPGA using High-Level Synthesis
Author: *Haopeng Meng, Kazutoshi Wakabayashi, Tadahiro Kuroda (The University of Tokyo, Japan)
Page: pp. 145 - 150
Keyword: High-Level Synthesis, Linear Equation Solver, Scalable FPGA, C-Based Design
Abstract: This paper describes a scalable linear equation solver on an FPGA based on Gauss-Jordan elimination using high-level synthesis (HLS). A C++ generator is created in this work to produce the HLS code for synthesis, and it can balance the area and performance of the solver with a few parameters. Compared with traditional RTL design, it has higher design efficiency. In the best-performance case, the solver has a time complexity of O(N). Owing to its high design efficiency, this scalable linear equation solver can also be used as an IP in other designs. The design is synthesized with NEC CyberWorkBench HLS, followed by RTL synthesis in Xilinx Vivado, targeting ZYNQ UltraScale+ and the ZCU104 Evaluation Kit at 200 MHz.

B-8 (Time: 10:14 - 10:16)
Title: Tail Layer CNN Training for a SoC-based FPGA
Author: *Yuki Takashima, Akira Jinguji, Ryosuke Kuramochi, Ryota Kayanoma, Hiroki Nakahara (Tokyo Institute of Technology, Japan)
Page: pp. 151 - 156
Keyword: CNN, image classification, Tail Layer Training, FPGA
Abstract: The demand for deep learning has increased, and many accelerators have been proposed. Although they perform inference at high speed, they cannot perform training. We present tail layer training for a convolutional neural network (CNN). It is implemented with a conventional CNN accelerator and a CPU. Processing speed is retained because most of the CNN is processed in the accelerator, while only the tail layer is updated by the CPU, enabling the weights to be updated or added. Since the number of neurons and classes in the output must be the same for image classification, it is effective for retraining to count the number of classes. We show the relationship between the existing classes and the classes to be added for the CIFAR10 and ImageNet datasets. Tail layer training is unsuitable for many classes. Accuracy loss is negligible when training only the tail layer with two added categories. The reduction in processing speed was almost negligible. Our scheme can be applied to existing SoC-FPGA-based CNN accelerators.

B-9 (Time: 10:16 - 10:18)
Title: A Thermally Optimizing Method of Thin Film Resistor Trimming with Machine Learning
Author: *Tomoya Akasaka (Hirosaki University, Japan), Shigeru Hidaka (Nikkohm Co., Ltd, Japan), Ryosuke Watanabe, Taisei Arima, Atsushi Kurokawa, Toshiki Kanamoto (Hirosaki University, Japan)
Page: pp. 157 - 162
Keyword: Thermal, Thin Film Resistor, Trimming, Machine Learning
Abstract: This paper proposes a novel trimming method for thin-film resistors dedicated to power modules. Thin-film resistors are utilized in applications including snubber circuits, which avoid unwanted ringing appearing in the output voltage of power transistors. Our previous works have revealed that the applicable voltage of a NiCr thin-film resistor is thermally limited, and that the indispensable trimming process affects the degree of temperature rise in use. In this paper, we first formulate the relationship between the target resistance value and the trim dimensions using machine learning. With the obtained equation, we propose a new trimming method, which enables the trimmed pattern to reduce the variability of the maximum temperature rise. The experimental results show that the proposed trimming method can suppress the estimated range of the maximum temperature from 619 K to 179 K.

B-10 (Time: 10:18 - 10:20)
Title: Development of Text Translation System from Tsugaru Dialect into Common Japanese
Author: *Taiki Niida, Masashi Imai (Hirosaki University, Japan)
Page: pp. 163 - 167
Keyword: Language translation, Artificial intelligence, Tsugaru dialect, Morphological analysis
Abstract: Tsugaru dialect can be an obstacle to communication between Aomori residents and people from outside the prefecture, such as residents who have transferred there for work and tourists. We are developing a bidirectional voice and text translation system between Tsugaru-ben and common Japanese utilizing artificial intelligence. In this paper, we first introduce our research project and then explain the developed text translation system from Tsugaru dialect into common Japanese. Some evaluation results of the morphological analysis and translation tools are also shown.

B-11 (Time: 10:20 - 10:22)
Title: On Providing Faster IR-Drop Forecast via SVM-Based Solutions
Author: Ya-Ying Chien (NYCU, Taiwan), Chang-Tzu Lin (ITRI, Taiwan), *Hung-Ming Chen (NYCU, Taiwan)
Page: pp. 168 - 171
Keyword: IR drop, PDN, SVM
Abstract: The objective of this work is to develop a fast and accurate IR drop predictor that reduces the runtime of nodal analysis and obtains the IR drop violation result directly from the power delivery network (PDN). A support vector machine (SVM) is trained on influential features to predict the IR drop result. When constructing the machine learning model, we avoid overfitting and tune the parameter to improve performance. Our approach is evaluated on a real industrial design in a TSMC 180 nm process. Experimental results show that our model can efficiently predict IR violations with high accuracy, ranging from 98.11% to 99.42%.


[To Session Table]

Poster Session (Group C)
Time: 10:00 - 11:30, Tuesday, October 25, 2022
Location: Gather (online)
Chair: Takeshi Kumaki (Ritsumeikan University, Japan)

These papers are assigned to session C

CO-D:1 (Time: 10:00 - 10:02)
Title: DNN-based Accelerator for Intelligent Robotic Arm Control with High-Level Synthesis
Author: *Yu-Chien Chung, Hao-Hsiang Lian, Yong-Lun Xiao, Chih-Tsun Huang, Jing-Jia Liou (National Tsing Hua University, Taiwan)
Keyword: DNN, Accelerator, Intelligent Robotics, High-Level Synthesis
Abstract: Intelligent robotics leverages deep learning to boost collaboration between humans and devices. Robotic controllers require a low-latency computation process for a real-time response when facing dynamic situations. Meanwhile, more controllers are being designed with DNN-based reinforcement learning, which may need more computation power. In this paper, we use high-level synthesis to implement a DNN-based controller on an FPGA. The FPGA design is built on an ESP SoC (System-on-Chip) platform that is integrated with, and controlled through, a host computer. We demonstrated the complete end-to-end controller system on a virtual robotic arm with a 1041-fold speedup compared with a CPU-based software implementation.

CO-D:2 (Time: 10:02 - 10:04)
Title: Trotter Based Parallel Processing of Quantum Annealing for FPGA
Author: *Sohei Shimomai, Shinji Kimura (Waseda University, Japan)
Keyword: Simulated Quantum Annealing, Quantum Monte Carlo, Ising Model, Trotter Parallel
Abstract: Quantum annealing is a combinatorial optimization algorithm based on the energy minimization of correlating spins, and a simulation method for it based on quantum Monte Carlo has been used. This paper proposes a trotter-oriented emulation method of the quantum Monte Carlo method for FPGAs. Random spin toggles in trotters are manipulated in parallel while sharing the spin information of adjacent trotters. By using the Mersenne Twister method to compute random numbers and by incorporating information about neighboring trotters, the proposed parallel processing can obtain the same accuracy as serial processing. The proposed method achieves more than a 20-fold speedup compared with serial execution in hardware in the 32-trotter case.

CO-D:3 (Time: 10:04 - 10:06)
Title: An Efficient Realization of Power-Root SC Calculations by Inserting Bits
Author: *Yuto Arimura, Shigeru Yamashita (Ritsumeikan University, Japan)
Keyword: Stochastic Computing, Power-Root
Abstract: Stochastic Computing (SC) is an approximate computation method that uses the probability of the existence of 1's in a bit-stream called a Stochastic Number (SN). SNs allow some operations to be performed much more efficiently than conventional binary operations. However, for some complex functions, no efficient method of calculation using SNs is known; for such functions, it is necessary to realize a circuit using the so-called polynomial approximation method. In this paper, we propose a method to realize the SC power root more efficiently than the polynomial approximation. We also present our experimental results for the arithmetic accuracy of the SC cubic root; these results confirm that our proposed method can realize a more accurate SC cubic root calculation than the polynomial approximation method when the precision level of SNs is set to 256 bits or more.

CO-D:4 (Time: 10:06 - 10:08)
Title: An NDA-free Oriented Open PDK Technology and EDA for Small Volume LSI Developments
Author: *Seijiro Moriyama (Anagix Corporation, Japan), Tadaaki Tsuchiya, Shingo Ura (Logic Research Co., Ltd., Japan)
Keyword: EDA, PDK, PCell, Open Source, Analog
Abstract: IP sharing and reuse are indispensable for small-scale (multivariate) LSI development. To make this possible, it is desirable that the PDK is open and the EDA tools are also open source. We are developing a technology that splits PDK development into process-dependent and process-independent parts. The latter can receive benefits from being open source. Our minimal EDA applies the open PDK technology to a wide range of process technologies from Minimal Fab to Skywater 130nm process. This technology can be applied to processes that require NDAs as well. Regardless of NDA, we will be able to develop PDKs at low cost and in a short period of time. We hope to provide high-value-added LSI developers with an environment where they can select semiconductor manufacturing fabs best suited for their needs.

CO-D:5 (Time: 10:08 - 10:10)
Title: Development of Diagnosis-based Hardware Trojan Tolerate System
Author: *Takuro Kasai, Masashi Imai (Hirosaki University, Japan)
Keyword: Hardware Trojan, Artificial Intelligence, Diagnosis, Power consumption
Abstract: Hardware Trojan threats caused by adversaries have become a serious issue. It has been recognized that it is significantly difficult to detect all hardware Trojans in the field. In this paper, a diagnosis-based hardware Trojan tolerant system using a deep learning scheme is introduced. Several methods of collecting dynamic information to judge whether a target behavior is normal or abnormal are explained, and some evaluation results are shown.

CO-D:6 (Time: 10:10 - 10:12)
Title: Feasibility Study of DSP Block Mapping Algorithms for FPGAs Utilizing SAT-solver and Top-down ZDD Construction
Author: *Takuya Serizawa, Koyo Shibata (Ritsumeikan University, Japan), Takashi Imagawa (Meiji University, Japan), Hiroyuki Ochi (Ritsumeikan University, Japan)
Keyword: Datapath synthesis, Design space exploration, Data flow graph, Optimal covering, Valid structure enumeration
Abstract: This paper proposes two algorithms to find the exact optimal technology mapping solution(s) for DSP blocks of FPGAs, one using SAT solver and the other using a top-down ZDD construction method. The exhaustive depth-first search algorithm for DSP block mapping by Shibata et al. introduced several complicated rules for pruning and graph partitioning for speeding up. In contrast, the proposed ones are relatively simple. The runtime of the SAT-solver-based method is comparable to that of Shibata et al., and the ZDD-based method can enumerate all optimal solutions.

CO-D:7 (Time: 10:12 - 10:14)
Title: Evaluating Accuracy of Quantum Circuit Learning via Quantum Circuit Mapping
Author: *Nanao Segawa, Takashi Sato (Graduate School of Informatics, Kyoto University, Japan)
Keyword: Quantum, Quantum machine learning, Quantum circuit mapping, Qubit allocation, NISQ
Abstract: Quantum computation on noisy intermediate-scale quantum (NISQ) devices with limited resources has become a reality. The most significant concern in computation using NISQ devices is error. Algorithms that run on these devices must be error tolerant because errors made during computation cannot be corrected. In the transformation called quantum circuit mapping, which ensures that all 2-bit operations honor physical proximity constraints, errors may need to be taken into account. In this paper, we evaluate the impact of error on the mapping of quantum circuit learning (QCL), one of the error-tolerant NISQ algorithms. We ran QCL using different mappings and quantitatively investigated the changes in accuracy and convergence. The results show that different mappings can change the accuracy by up to 17%, which indicates that better mappings that consider error are necessary.

CO-D:8 (Time: 10:14 - 10:16)
TitlePCB Component Copper Landing Pad Design Optimization
Author*Hsiao-Chieh Ma, Yi-Yu Liu (National Taiwan University of Science and Technology, Taiwan)
KeywordComputational Geometry, Copper Landing Pad, Design Rule Check, PCB Layout Legalization
AbstractAs the density of electronic components increases in modern PCB designs, the adjustment of copper landing pads has become a complex and essential issue during the PCB layout design stage. Common copper landing pad adjustment strategies are optimized by experienced PCB layout engineers. However, manual designs are error-prone and may suffer reliability degradation. In this paper, we propose an optimization framework to legalize copper landing pads via pad offset, pad cutting, and pad shrinking operations, with minimal pad distortion. The experimental results demonstrate that the framework significantly reduces the manual work of PCB layout engineers, saving time and effort.

CO-D:9 (Time: 10:16 - 10:18)
TitleFlat-Shape Capacitive Sensor of Droplet Contact-Angle for Electrowetting-on-Dielectric Microfluidic Systems
AuthorTomohiro Kodaniguchi, *Akira Tsuchiya, Toshiyuki Inoue, Keiji Kishine (The University of Shiga Prefecture, Japan)
KeywordMicrofluidic, contact angle, capacitive sensor
AbstractThis paper proposes a fully-electrical and flat-shape sensor for the contact angle of droplets on microfluidic systems. A contact-angle sensor is an important feature for electrowetting-on-dielectric microfluidic systems. We employ planar-type capacitors for contact-angle estimation. By improving the estimation procedure, the proposed method can estimate contact angles from 40 deg. to 120 deg. We verified the proposed method by electromagnetic simulation and measurement of a proof-of-concept model.

CO-D:10 (Time: 10:18 - 10:20)
TitleRemote Access Tag Array for Efficient GPU Intra-Cluster Data Sharing
AuthorBo-Wun Cheng, *En-Ming Huang, Chen-Hao Chao, Wei-Fang Sun (National Tsing Hua University, Taiwan), Tsung-Tai Yeh (National Yang Ming Chiao Tung University, Taiwan), Chun-Yi Lee (National Tsing Hua University, Taiwan)
KeywordGPU, Cache
AbstractIn this work, we aim to address the memory congestion problem of modern GPUs by incorporating a remote access tag array (RATA) into the baseline architecture. With the assistance of RATA, GPUs are able to service replicated cache requests within stream multiprocessor (SM) clusters without resorting to the level-two (L2) cache. Our experimental results show that the adoption of RATA has the potential to alleviate the memory congestion problem and enhance the overall system throughput.

CO-D:11 (Time: 10:20 - 10:22)
TitleOn-Interposer Decoupling Capacitors Placement for Interposer-based 3DIC
AuthorBo-Yang Chen, Chang-Yun Liu, Bo-Tsang Huang, *Hung-Ming Chen (NYCU, Taiwan)
Keyword3DIC, PDN, iCap
AbstractWith the demand for high performance and density, silicon interposer-based three-dimensional integrated circuit (3DIC) has become a promising solution for these requirements. However, simultaneous switching noise (SSN) will cause voltage fluctuation and hence performance degradation and logic failure. Our work proposes an efficient Simulated Annealing (SA) based algorithm to perform decap placement automatically. In our solution, the target impedance can be achieved within a certain frequency range. Results show that the number of decaps as well as the impedance of the PDN are minimized.



Invited Talk III
Time: 13:00 - 14:00, Tuesday, October 25, 2022
Location: Premier Hall (on-site) / Zoom (online)
Chair: Yoshinori Takeuchi (Kindai University, Japan)

I3-1 (Time: 13:00 - 14:00)
Title(Invited Talk) Challenges and Opportunities for New Radio New Type Communications for 5G and Beyond
AuthorJen-Ming Wu (National Tsing Hua University/Hon Hai Research Institute, Taiwan)
Pagep. 172
AbstractNext-generation wireless communications will address the demands not only of cellular networks but also of the internet of everything. A variety of new types of applications will be inspired by communication capabilities such as high data rate, low latency, massive connections, and ubiquitous coverage. In this talk, we will discuss the challenges and the opportunities for new type communications in B5G/6G, especially in the areas of intelligent vehicle-to-everything (V2X) and non-terrestrial network (NTN) communications. In particular, we will provide comprehensive coverage of the challenges of autonomous driving and how intelligent V2X can help to cover the vulnerability. We will also discuss the emerging NTN over low earth orbit (LEO) satellite communications and how the NTN could help to facilitate intelligent V2X.
PDF file



Poster Session (Group C)
Time: 14:00 - 15:30, Tuesday, October 25, 2022
Location: Premier Hall (on-site)
Chair: Mahfuzul Islam (Kyoto University, Japan)

Best Paper Award
C-1 (Time: 14:00 - 14:02)
TitleDNN-based Accelerator for Intelligent Robotic Arm Control with High-Level Synthesis
Author*Yu-Chien Chung, Hao-Hsiang Lian, Yong-Lun Xiao, Chih-Tsun Huang, Jing-Jia Liou (National Tsing Hua University, Taiwan)
Pagepp. 173 - 177
KeywordDNN, Accelerator, Intelligent Robotics, High-Level Synthesis
AbstractIntelligent robotics leverages deep learning to boost collaboration between humans and devices. Robotic controllers require a low-latency computation process for a real-time response when facing dynamic situations. At the same time, more controllers are designed with DNN-based reinforcement learning, which may need more computation power. In this paper, we use high-level synthesis to implement a DNN-based controller on an FPGA. The FPGA is built with an ESP SoC (System-on-Chip) platform, and is integrated with and controlled through a host computer. We demonstrated the complete end-to-end controller system on a virtual robotic arm with a 1041 times speedup compared with a CPU-based software implementation.
PDF file

C-2 (Time: 14:02 - 14:04)
TitleTrotter Based Parallel Processing of Quantum Annealing for FPGA
Author*Sohei Shimomai, Shinji Kimura (Waseda University, Japan)
Pagepp. 178 - 183
KeywordSimulated Quantum Annealing, Quantum Monte Carlo, Ising Model, Trotter Parallel
AbstractQuantum annealing is a combinatorial optimization algorithm based on the energy minimization of correlating spins, and its simulation method based on quantum Monte Carlo has been used. The paper proposes a trotter-oriented emulation method of the quantum Monte Carlo method for FPGA. Random spin toggles in trotters are manipulated in parallel while sharing spin information of adjacent trotters. By using the Mersenne Twister method to compute random numbers and by incorporating information about neighboring trotters, the proposed parallel processing can obtain the same accuracy as in the case of serial processing. The proposed method gains more than 20 times speed-up compared with a serial execution of hardware in the 32-trotter case.
PDF file

C-3 (Time: 14:04 - 14:06)
TitleAn Efficient Realization of Power-Root SC Calculations by Inserting Bits
Author*Yuto Arimura, Shigeru Yamashita (Ritsumeikan University, Japan)
Pagepp. 184 - 189
KeywordStochastic Computing, Power-Root
AbstractStochastic Computing (SC) is an approximate computation method that uses the probability of the existence of 1's in a bit-stream called a Stochastic Number (SN). SN allows some operations to be performed much more efficiently than the conventional binary operations. However, for some complex functions, no efficient method is known to calculate the function by using SNs; it is necessary to realize a circuit by using the so-called polynomial approximation method for such functions. In this paper, we propose a method to realize SC power root more efficiently than the polynomial approximation. We also present our experimental results for the arithmetic accuracy of SC cubic root; from the experimental results, we can confirm that our proposed method can realize more accurate SC cubic root calculation than the polynomial approximation method when the precision level of SNs is set to 256 bits or more.
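
As background for the abstract above, the following is a minimal sketch of how a Stochastic Number encodes a value and why some operations are cheap in SC. It shows only the standard unipolar multiplication (a bitwise AND of two independent bit-streams), not the power-root method proposed in the paper. The function names, the seed, and the 4096-bit stream length are assumptions chosen for illustration.

```python
import random


def to_stochastic(p, length, rng):
    """Encode probability p (0 <= p <= 1) as a bit-stream whose fraction of 1's is about p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def value(stream):
    """Decode a bit-stream: the represented value is the fraction of 1's."""
    return sum(stream) / len(stream)


def sc_multiply(a, b):
    """Unipolar SC multiplication: AND of two independent streams approximates the product."""
    return [x & y for x, y in zip(a, b)]


# Hypothetical operands and stream length, chosen only for this illustration.
rng = random.Random(2022)
LENGTH = 4096
a = to_stochastic(0.5, LENGTH, rng)
b = to_stochastic(0.25, LENGTH, rng)
product = sc_multiply(a, b)

print(f"a ~ {value(a):.3f}, b ~ {value(b):.3f}, a*b ~ {value(product):.3f}")
```

The decoded product is only approximately 0.125; the error shrinks as the stream gets longer, which is why the abstract states accuracy in terms of the precision level (bit length) of the SNs.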

C-4 (Time: 14:06 - 14:08)
TitleAn NDA-free Oriented Open PDK Technology and EDA for Small Volume LSI Developments
Author*Seijiro Moriyama (Anagix Corporation, Japan), Tadaaki Tsuchiya, Shingo Ura (Logic Research Co., Ltd., Japan)
Pagepp. 190 - 195
KeywordEDA, PDK, PCell, Open Source, Analog
AbstractIP sharing and reuse are indispensable for small-scale (multivariate) LSI development. To make this possible, it is desirable that the PDK is open and the EDA tools are also open source. We are developing a technology that splits PDK development into process-dependent and process-independent parts. The latter can receive benefits from being open source. Our minimal EDA applies the open PDK technology to a wide range of process technologies from Minimal Fab to Skywater 130nm process. This technology can be applied to processes that require NDAs as well. Regardless of NDA, we will be able to develop PDKs at low cost and in a short period of time. We hope to provide high-value-added LSI developers with an environment where they can select semiconductor manufacturing fabs best suited for their needs.
PDF file

C-5 (Time: 14:08 - 14:10)
TitleDevelopment of Diagnosis-based Hardware Trojan Tolerate System
Author*Takuro Kasai, Masashi Imai (Hirosaki University, Japan)
Pagepp. 196 - 197
KeywordHardware Trojan, Artificial Intelligence, Diagnosis, Power consumption
AbstractHardware Trojan threats caused by adversaries have become one of the serious issues. It has been recognized that it is significantly difficult to detect all the hardware Trojans in the field. In this paper, a diagnosis-based hardware Trojan tolerate system with a deep learning scheme is introduced. Several methods of collecting dynamic information in order to judge whether a target behavior is normal or abnormal are explained, and some evaluation results are shown.
PDF file

C-6 (Time: 14:10 - 14:12)
TitleFeasibility Study of DSP Block Mapping Algorithms for FPGAs Utilizing SAT-solver and Top-down ZDD Construction
Author*Takuya Serizawa, Koyo Shibata (Ritsumeikan University, Japan), Takashi Imagawa (Meiji University, Japan), Hiroyuki Ochi (Ritsumeikan University, Japan)
Pagepp. 198 - 203
KeywordDatapath synthesis, Design space exploration, Data flow graph, Optimal covering, Valid structure enumeration
AbstractThis paper proposes two algorithms to find the exact optimal technology mapping solution(s) for DSP blocks of FPGAs, one using a SAT solver and the other using a top-down ZDD construction method. The exhaustive depth-first search algorithm for DSP block mapping by Shibata et al. introduced several complicated rules for pruning and graph partitioning to speed up the search. In contrast, the proposed algorithms are relatively simple. The runtime of the SAT-solver-based method is comparable to that of Shibata et al., and the ZDD-based method can enumerate all optimal solutions.
PDF file

C-7 (Time: 14:12 - 14:14)
TitleEvaluating Accuracy of Quantum Circuit Learning via Quantum Circuit Mapping
Author*Nanao Segawa, Takashi Sato (Graduate School of Informatics, Kyoto University, Japan)
Pagepp. 204 - 209
KeywordQuantum, Quantum machine learning, Quantum circuit mapping, Qubit allocation, NISQ
AbstractQuantum computation on noisy intermediate-scale quantum (NISQ) devices with limited resources has become a reality. The most significant concern in computation using NISQ devices is error. Algorithms that run on these devices must be error tolerant because errors made during computation cannot be corrected. In the transformation called quantum circuit mapping, which ensures that all 2-bit operations honor physical proximity constraints, errors may need to be taken into account. In this paper, we evaluate the impact of the error on the mapping of quantum circuit learning (QCL), one of the error-tolerant NISQ algorithms. We run QCL using different mappings and quantitatively investigate the changes in accuracy and convergence. The results show that different mappings can change the accuracy by up to 17%, which indicates that better mappings that take error into account are necessary.

C-8 (Time: 14:14 - 14:16)
TitlePCB Component Copper Landing Pad Design Optimization
Author*Hsiao-Chieh Ma, Yi-Yu Liu (National Taiwan University of Science and Technology, Taiwan)
Pagepp. 210 - 215
KeywordComputational Geometry, Copper Landing Pad, Design Rule Check, PCB Layout Legalization
AbstractAs the density of electronic components increases in modern PCB designs, the adjustment of copper landing pads has become a complex and essential issue during the PCB layout design stage. Common copper landing pad adjustment strategies are optimized by experienced PCB layout engineers. However, manual designs are error-prone and may suffer reliability degradation. In this paper, we propose an optimization framework to legalize copper landing pads via pad offset, pad cutting, and pad shrinking operations, with minimal pad distortion. The experimental results demonstrate that the framework significantly reduces the manual work of PCB layout engineers, saving time and effort.
PDF file

C-9 (Time: 14:16 - 14:18)
TitleFlat-Shape Capacitive Sensor of Droplet Contact-Angle for Electrowetting-on-Dielectric Microfluidic Systems
AuthorTomohiro Kodaniguchi, *Akira Tsuchiya, Toshiyuki Inoue, Keiji Kishine (The University of Shiga Prefecture, Japan)
Pagepp. 216 - 220
KeywordMicrofluidic, contact angle, capacitive sensor
AbstractThis paper proposes a fully-electrical and flat-shape sensor for the contact angle of droplets on microfluidic systems. A contact-angle sensor is an important feature for electrowetting-on-dielectric microfluidic systems. We employ planar-type capacitors for contact-angle estimation. By improving the estimation procedure, the proposed method can estimate contact angles from 40 deg. to 120 deg. We verified the proposed method by electromagnetic simulation and measurement of a proof-of-concept model.
PDF file

C-10 (Time: 14:18 - 14:20)
TitleRemote Access Tag Array for Efficient GPU Intra-Cluster Data Sharing
AuthorBo-Wun Cheng, *En-Ming Huang, Chen-Hao Chao, Wei-Fang Sun (National Tsing Hua University, Taiwan), Tsung-Tai Yeh (National Yang Ming Chiao Tung University, Taiwan), Chun-Yi Lee (National Tsing Hua University, Taiwan)
Pagepp. 221 - 222
KeywordGPU, Cache
AbstractIn this work, we aim to address the memory congestion problem of modern GPUs by incorporating a remote access tag array (RATA) into the baseline architecture. With the assistance of RATA, GPUs are able to service replicated cache requests within stream multiprocessor (SM) clusters without resorting to the level-two (L2) cache. Our experimental results show that the adoption of RATA has the potential to alleviate the memory congestion problem and enhance the overall system throughput.
PDF file

C-11 (Time: 14:20 - 14:22)
TitleOn-Interposer Decoupling Capacitors Placement for Interposer-based 3DIC
AuthorBo-Yang Chen, Chang-Yun Liu, Bo-Tsang Huang, *Hung-Ming Chen (NYCU, Taiwan)
Pagepp. 223 - 228
Keyword3DIC, PDN, iCap
AbstractWith the demand for high performance and density, silicon interposer-based three-dimensional integrated circuit (3DIC) has become a promising solution for these requirements. However, simultaneous switching noise (SSN) will cause voltage fluctuation and hence performance degradation and logic failure. Our work proposes an efficient Simulated Annealing (SA) based algorithm to perform decap placement automatically. In our solution, the target impedance can be achieved within a certain frequency range. Results show that the number of decaps as well as the impedance of the PDN are minimized.
PDF file



Poster Session (Group A)
Time: 14:00 - 15:30, Tuesday, October 25, 2022
Location: Gather (online)
Chair: Shimpei Sato (Shinshu University, Japan)

These papers are assigned to session A

AO-D:1 (Time: 14:00 - 14:02)
TitleFull Hardware Implementation of RTOS-Based Systems Using General High-Level Synthesizer
AuthorTakuya Ando, Iori Muguruma, *Yugo Ishii, Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama (Ritsumeikan University, Japan), Hiroyuki Kanbara (ASTEM RI/Kyoto, Japan)
KeywordRTOS, full-hardware implementation, high-level synthesis, real-time systems
AbstractThis article proposes a method for implementing an RTOS-based system as hardware using a general high-level synthesizer. Oosako proposed a full hardware scheme where all the tasks and all the RTOS functions are implemented as hardware. However, it depended on special features of an in-house binary synthesizer ACAP; a synthesized hardware module has a stall port by which the module's execution can be suspended, and accesses to global variables are automatically translated to accesses to the single memory space without rewriting the source program. Moreover, the size of the resulting circuits was too large for practical use. This paper proposes a new architecture that can dispense with the stall ports and also reduces the size of the resulting circuits. This paper also presents a wrapper class for global variable accesses and a style of programs to minimize the rewriting of task programs. Based on the proposed method, a hardware module for a reduced version of "sample1" bundled with TOPPERS/ASP3 has been successfully implemented as hardware using Xilinx Vitis HLS. Moreover, the size of the resulting circuit was 89 smaller than that by the previous method.

AO-D:2 (Time: 14:02 - 14:04)
TitleSNRoverSDNN: A Metric for Robust CNN-based ROI Selection in Remote Heart Rate Extraction
Author*Yuta Hitotsuyanagi, Takashi Sato (Graduate School of Informatics, Kyoto University, Japan)
KeywordHeart rate, Cameras, Convolutional neural networks, Image color analysis, Biomedical measurement
AbstractRemote photoplethysmography (rPPG) is a method to estimate heart rate (HR) using video cameras. It enables non-contact HR estimation with inexpensive cameras, allowing subjects to conveniently measure HR without being restrained or feeling discomfort. In rPPG, it is important to select a region of interest (ROI) that is suitable for HR estimation. We propose a new metric SNRoverSDNN for CNN-based ROI selection. SNRoverSDNN takes into account harmonics and periodicity of the heartbeat. Using SNRoverSDNN, we could select reasonable ROIs without face detection.

AO-D:3 (Time: 14:04 - 14:06)
TitleHardware RTOS Services for Full Hardware Implementation of RTOS-Based Systems
Author*Hiro Minamiguchi, Masaki Nakahara, Yugo Ishii, Yukino Shinohara, Iori Muguruma, Nagisa Ishiura (Kwansei Gakuin University, Japan)
KeywordRTOS, full hardware implementation, high-level synthesis, real-time systems
AbstractThis paper presents hardware implementation of RTOS services for full hardware implementation of RTOS-based systems, where all the task programs and all the RTOS functions are implemented as hardware. Hardware methods for processing services of mutexes, event flags, data queues, shared variable accesses, and task control are proposed. Wait and release operations necessary in synchronization and communication services are efficiently performed using a request arbitration module. Timeouts are also handled by hardware using distributed timers. A hardware module that contains two mutexes, two event flags, one data queue of 320B data, and shared variable of 1024B, as well as task scheduling and control functions, has been designed in Verilog HDL. It was synthesized to an FPGA circuit of 4,300 LUTs and 2,200 flip-flops (Xilinx Artix-7). All the services can be executed well in 150 ns, which is fast enough even for extreme applications.

AO-D:4 (Time: 14:06 - 14:08)
TitleImportance Evaluation Methodology of FFs for Design Optimization of Approximate Computing Circuits
Author*Jiaxuan Lu, Yutaka Masuda, Tohru Ishihara (Nagoya University, Japan)
Keywordapproximate computing, importance evaluation, fault injection
AbstractApproximate computing (AC) has attracted much attention, contributing to energy saving and performance improvement by accurately performing the important computation and approximating others. In order to make AC circuits practical, we need to carefully determine which computation is important, and thus approximate unimportant computations to maintain the required computational quality. In this paper, we focus on the importance of computations at the Flip-Flop (FF) level and propose a novel importance evaluation methodology. The key idea of the proposed methodology is a two-step fault injection algorithm to extract the near-optimal set of unimportant FFs. In the first step, the proposed methodology derives the importance of each FF. Then, in the second step, the proposed methodology extracts the set of unimportant FFs in a binary search manner. Thanks to the two-step strategy, the proposed algorithm reduces the complexity of architecture exploration from an exponential order to a linear order without understanding the functionality and behavior of the target application program. In a case study of an image processing accelerator, the proposed methodology finds that 21.8% of FFs can be approximated, resulting in 16.5% area reduction and 19.1% power saving while satisfying the image quality constraint.

AO-D:5 (Time: 14:08 - 14:10)
TitleBottleneck Channel Routing to Reduce the Area of Analog VLSI
Author*Kazuya Taniguchi, Satoshi Tayu, Atsushi Takahashi (Tokyo Institute of Technology, Japan), Yukichi Todoroki, Makoto Minami (Jedat, Japan)
Keywordtwo-layer Bottleneck Routing, Analog VLSI
AbstractDesign automation that realizes analog integrated circuits to meet performance specifications in a small area is desired. To reduce the layout area, “Bottleneck Channel Routing” is proposed in which two wires go through a routing track in the bottleneck region. A two-layer routing problem that consists of the bottleneck channel and the adjacent regions where the HV rule is not applicable is defined. The proposed algorithm uses a U-shaped routing model, and generates two-layer routing in which the number of intersections is minimized and the wire of a net includes at most one via. The obtained routing contains no conflicts if the algorithm outputs a feasible solution.

AO-D:6 (Time: 14:10 - 14:12)
TitleBinding and Scheduling of 2×3 Mixers for Transport-Free Sample Preparation Using Programmable Microfluidic Devices
Author*Masataka Hirai, Shigeru Yamashita (Ritsumeikan University, Japan), Sudip Roy (Indian Institute of Technology (IIT) Roorkee, India), Hiroyuki Tomiyama (Ritsumeikan University, Japan)
KeywordBiochip, PMD
AbstractA Programmable Microfluidic Device (PMD) is one of the promising biochip platforms. On a PMD, fluids are mixed by a module called a mixer. We can generate various kinds of mixers, such as a 2x2 mixer and a 2x3 mixer consisting of 2x2 and 2x3 arrays of cells, respectively. Unlike other biochip platforms, we cannot move a fluid from one cell to another cell in a PMD. Thus, "No Transport Mixing (NTM)" has been proposed, which is a method to bind and schedule mixers without droplet transportation. However, NTM can treat only 2x2 mixers. Thus, in this paper, we propose an efficient method to bind and schedule mixing operations using 2x3 mixers as well as 2x2 mixers. Our method is based on a transformation of a given mixing tree based on our proposed "Placement Priority" values. Simulation results confirm that our transformation is indeed useful to decrease the number of "flushing" operations required in sample preparation using a PMD.

AO-D:7 (Time: 14:12 - 14:14)
TitleSegmented DAC Linearity Improvement Algorithm Using Unit Cell Sorted Alternately with Digital Method
Author*Yi Liu, Anna Kuwana, Shogo Katayama, Xiongyan Li (Gunma University, Japan), Atsushi Motozawa (Renesas Electronics Corporation, Japan), Haruo Kobayashi (Gunma University, Japan)
KeywordDAC, SSPA, DNL, INL
AbstractThis paper describes a self-calibration method for a current-steering Digital-to-Analog Converter (DAC) with a voltage-controlled oscillator (VCO). It is a digital method and does not require high precision analog circuits; the VCO needs only monotonic characteristics and does not need linearity. Mismatches among the unit current sources in the current-steering segmented DAC cause the overall DAC nonlinearity, and the VCO measures the order of each current source value. The measured information is stored in memory, and based on it, each current source is sorted to reduce the DAC nonlinearity. In particular, we have investigated with simulations whether the comparison algorithms can improve the DAC Differential Non-Linearity (DNL) and Integral Non-Linearity (INL) under several mismatch conditions. We present its principle and simulation results.

AO-D:8 (Time: 14:14 - 14:16)
TitleAging-Compromised Computing-In-Memory Dot-Product Calculation Technique Through DVFS
Author*Yu-Guang Chen, Chi-Hsu Wang (National Central University, Taiwan), Ing-Chao Lin (National Cheng Kung University, Taiwan)
KeywordComputing-In-Memory, BTI, HCI, SRAM, DVFS
AbstractThe von Neumann architecture, which separates the computing logic and the storage area, has been considered the fundamental architecture of nearly all digital computers nowadays. Data-intensive applications such as image recognition or cryptography may transfer large amounts of data between memory and the computing cores, which causes the well-known von Neumann bottleneck due to the limitation of communication bandwidth. Computing In-Memory (CIM), which directly performs in-situ operations at memory, has been considered one of the promising solutions to overcome the von Neumann bottleneck. Previous researchers have proposed an 8T-SRAM-based CIM architecture to perform multi-bit dot product computations by analog charging/discharging operations. However, such operations are very sensitive to variations as well as aging effects such as Bias Temperature Instability (BTI) and/or Hot Carrier Injection (HCI). To provide a reliable CIM multi-bit dot product engine, in this paper we propose an aging-aware in-memory computing framework which consists of an aging detection method and an aging tolerance technique. Specifically, we apply Dynamic Voltage Frequency Scaling (DVFS) on the CIM structure to compensate for the current drop due to variations and aging effects. Experimental results show that we can double the lifetime of the CIM structure with 1.185x extra power consumption on average.

AO-D:9 (Time: 14:16 - 14:18)
TitleAn Implementation of Self-Testable Layout-Level Scan C-element
Author*Kokoro Yamasaki, Hiroshi Iwata, Ken'ichi Yamaguchi (National Institute of Technology, Nara College, Japan)
KeywordDesign for testability, Full scan design, Asynchronous circuit, C-element, Layout level design
AbstractDesign methodology with asynchronous circuits is used for recent VLSI designs since it can solve several problems with synchronous circuit designs. However, manufacturing test for asynchronous circuits is more difficult than that for synchronous circuits, in which global synchronization is controlled by clock signal lines. To solve this serious dependability problem, a full scan design for asynchronous circuits is an answer. A transistor-level circuit for the scan C-element has also been proposed as one way to implement full scan. However, there is no layout-level design of the scan C-element to fabricate the chip, and no physical information is available. In this paper, we propose a layout design for scan C-elements using a Rohm 0.18um process transistor model with a view to fabricating chips for experiments.

AO-D:10 (Time: 14:18 - 14:20)
TitleVoice Learning of Reservoir Computing Architecture using Ternary Content Addressable Memory with Individuality
Author*Sayaka Akiyama, Go Ajiki, Xiangbo Kong, Takeshi Kumaki (Ritsumeikan University, Japan)
KeywordReservoir Computing, CAM, Voice learning, AI
AbstractWith the rapid progress in artificial intelligence (AI) technology, the number of machines that have been designed to interact with human beings has been steadily increasing. However, the responses of such machines to human interactions are often excessively uniform. The purpose of our study is to incorporate the variations that occur during chip manufacturing into machine learning and give AI-based robots their own individuality. In this paper, a reservoir computing architecture using Ternary Content Addressable Memory with Individuality is developed, and learning is performed using voice data, which is a complicated waveform. It is found that the error of the average of all data between 10 chips is 140% at the maximum. The voice data learning results show outputs with individuality.

AO-D:11 (Time: 14:20 - 14:22)
TitleFormulation of Maximum Independent Set Problem for Simulated Quantum Annealing Machine
Author*Haruki Nakayama, Yukihide Kohira (The University of Aizu, Japan)
KeywordMaximum Independent Set Problem, Simulated Quantum Annealing
AbstractVarious problems in LSI design such as redundant via insertion are formulated as Maximum Independent Set Problems (MISP). Recently, various algorithms have been proposed to optimize combinatorial optimization problems such as MISP. It is required that we find a suitable combination between each combinatorial optimization problem and a method since a combinatorial optimization problem can be solved by multiple methods. In this paper, we try to find a suitable combination between three optimization problems, which are MISP, minimum vertex cover problem, and maximum clique problem, and three methods, which are a mathematical optimization for binary variables, a solver for satisfiability problem, and a Simulated Quantum Annealing (SQA) machine. It is known that these three problems are equivalent to each other. Moreover, a new formulation for SQA is proposed to improve modeling time. Experimental results show that the proposed formulation obtains the best solution in a short modeling time.
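
The equivalence of the three problems mentioned in the abstract above can be checked on a small graph. The following is a minimal brute-force sketch, not the SQA formulation proposed in the paper: the complement of a maximum independent set is a minimum vertex cover, and a maximum independent set of a graph is a maximum clique of its complement graph. All function names and the 5-cycle example graph are assumptions made for illustration.

```python
from itertools import combinations


def is_independent(subset, edges):
    """True if no edge has both endpoints inside subset."""
    s = set(subset)
    return not any(u in s and v in s for u, v in edges)


def is_vertex_cover(subset, edges):
    """True if every edge has at least one endpoint inside subset."""
    s = set(subset)
    return all(u in s or v in s for u, v in edges)


def max_independent_set(n, edges):
    """Exhaustive search from the largest subset size downwards."""
    for k in range(n, -1, -1):
        for subset in combinations(range(n), k):
            if is_independent(subset, edges):
                return set(subset)


def min_vertex_cover(n, edges):
    """Exhaustive search from the smallest subset size upwards."""
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if is_vertex_cover(subset, edges):
                return set(subset)


def complement_edges(n, edges):
    """Edges of the complement graph on the same n vertices."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]


def max_clique(n, edges):
    """Exhaustive search for the largest set of pairwise adjacent vertices."""
    present = {frozenset(e) for e in edges}
    for k in range(n, -1, -1):
        for subset in combinations(range(n), k):
            if all(frozenset(p) in present for p in combinations(subset, 2)):
                return set(subset)


# Hypothetical example graph: a cycle on 5 vertices.
N = 5
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

mis = max_independent_set(N, EDGES)
mvc = min_vertex_cover(N, EDGES)
clique = max_clique(N, complement_edges(N, EDGES))

print("max independent set size:", len(mis))
print("min vertex cover size:", len(mvc))
print("max clique size in complement graph:", len(clique))
```

Exhaustive search is exponential in the number of vertices, so this is only useful for sanity-checking a formulation on tiny instances before handing it to a solver or an annealing machine.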

AO-D:12 (Time: 14:22 - 14:24)
TitleEfficient Hardware Architecture for Taylor-Series Expansion Calculation Using Distributed Arithmetic with Term Division
Author*Xaybandith Hemthavy, Jianglin Wei, Shogo Katayama, Anna Kuwana, Haruo Kobayashi (Gunma University, Japan), Kazuyoshi Kubo (Oyama National College of Technology, Japan)
KeywordDigital Signal Processing, Distributed Arithmetic, Taylor-Series Expansion, Digital Arithmetic, Multiply-Add
AbstractThis paper describes a digital arithmetic that reduces the calculation and hardware (logic circuits and memory) for Taylor series expansion calculation by applying the distributed (bit-serial) arithmetic with the proposed term division method. The distributed arithmetic (DA) is a multiplier-less approach for calculating multiply-add operations, but its direct application to the Taylor-series expansion calculation still demands almost the same number of multiplications as the direct calculation and additionally a large Look-Up Table (LUT), hence it is useless. We therefore propose the term division method, which can reduce the number of multiplications and the LUT size significantly. Further, we found that the optimal number of term divisions is approximately the square root of the number of the Taylor series expansion terms.



Closing
Time: 15:30 - 15:40, Tuesday, October 25, 2022
Location: Premier Hall (on-site) / Zoom (online)