The 14th Workshop on Synthesis And System Integration of Mixed Information technologies
Technical Program

Remark: The presenter of each paper is marked with "*".

Session Schedule

Monday, October 15, 2007


Opening 9:10 - 9:20
K (Conference Hall (2F)) Keynote Speech 9:20 - 10:20
R1 (Conference Hall (2F) & Poster Room (2F)) Design Experience I 10:20 - 12:05
Lunch 12:05 - 13:25
I1 (Conference Hall (2F)) Invited Talk I 13:25 - 14:10
R2 (Conference Hall (2F) & Poster Room (2F)) FPGA, Place & Route 14:10 - 15:50
I2 (Conference Hall (2F)) Invited Talk II 15:50 - 16:35
R3 (Conference Hall (2F) & Poster Room (2F)) Design Methodology for Nanometer Era 16:35 - 18:15
Banquet 18:30 - 20:30

Tuesday, October 16, 2007


I3 (Conference Hall (2F)) Invited Talk III 9:00 - 9:45
R4 (Conference Hall (2F) & Poster Room (2F)) System Level Design & Logic Synthesis 9:45 - 11:30
I4 (Conference Hall (2F)) Invited Talk IV 11:30 - 12:15
Lunch 12:15 - 13:30
I5 (Conference Hall (2F)) Invited Talk V 13:30 - 14:15
R5 (Conference Hall (2F) & Poster Room (2F)) Design Verification & Design Experience II 14:15 - 16:00
D (Conference Hall (2F)) Panel Discussion 16:00 - 17:30
Closing 17:30 - 17:40

List of Papers

Monday, October 15, 2007

Keynote Speech
Time: 9:20 - 10:20 Monday, October 15, 2007
Location: Conference Hall (2F)
Chair: Shinji Kimura (Waseda University, Japan)

K-1 (Time: 9:20 - 10:20)

Title	Future Design Paradigms: Technologies, Circuits and Architectures
Author	*Giovanni De Micheli (CSI, EPFL, Switzerland)
Page	p. 3
Abstract	The scaling of CMOS technology is coming soon to an end, and yet it is unclear whether CMOS devices in the 10-20 nanometer range will find a useful place in semiconductor products. At the same time, new silicon-based technologies (e.g., silicon nanowires) and non-silicon-based technologies (e.g., carbon nanotubes) show the promise of replacing traditional transistors. Within this rich set of possibilities, we will see more and more hybridization of technologies toward achieving specific objectives, such as seamless interfacing to embedded sensors, ultra-low power consumption, biological probing, etc. In order for the technology to be widely applicable, specific architectures will be required as well as design tools and methodologies.

Design Experience I
Time: 10:20 - 12:05 Monday, October 15, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Chun-Yao Wang (National Tsing Hua University, Taiwan), Tohru Ishihara (Kyushu University, Japan)

R1-1 (Time: 10:20 - 10:22)

Title	Power-Conscious Synthesis of Parallel Prefix Adders under Bitwise Timing Constraints
Author	*Taeko Matsunaga, Shinji Kimura (Waseda University, Japan), Yusuke Matsunaga (Kyushu University, Japan)
Page	pp. 7 - 14
Keyword	parallel prefix adder, switching activity, power, timing constraints, arithmetic synthesis
Abstract	Global structures of parallel prefix adders can be synthesized flexibly depending on each context, such as bitwise input/output timing constraints. In this paper, an approach for power-conscious synthesis of parallel prefix adders is proposed. Global structures of parallel prefix adders are represented as prefix graphs. The switching cost of a prefix graph is defined based on the switching activities of its nodes, and is minimized by extending our area minimization algorithms. This approach accepts bitwise input/output timing constraints and the bitwise probability that each input signal value is one, and minimizes the total sum of switching activities depending on each distinct context. Calculating switching activities with an OBDD-based approach makes this approach efficient. Experimental results show the effectiveness of our approach compared to existing regular parallel prefix adders.

R1-2 (Time: 10:22 - 10:24)

Title	Design of a Combined Circuit for Multiplication and Inversion in GF(2^m)
Author	*Katsuki Kobayashi, Naofumi Takagi (Nagoya University, Japan)
Page	pp. 15 - 20
Keyword	GF(2^m), multiplication, inversion
Abstract	A combined circuit for multiplication and inversion in GF(2^m) is proposed. We combine the inversion algorithm proposed by Yan et al. that is based on the extended Euclid's algorithm and the MSB-first multiplication algorithm by focusing on the similarities between them so that multiplication and inversion can share almost all hardware components of the circuit. The area of the circuit is estimated to be approximately 40% smaller than the total area of an ordinary multiplication circuit and an ordinary inversion circuit.

R1-3 (Time: 10:24 - 10:26)

Title	Associative Memory Design Realizing Reference-Pattern Recognition and Learning based on Short/Long-Term Storage Concept
Author	*Shogo Sakakibara, Md. Anwarul Abedin, Yuki Tanaka, Ali Ahmadi, Hans Jüergen Mattausch, Tetsushi Koide (Hiroshima University, Japan)
Page	pp. 21 - 25
Keyword	Associative Memory, Short/Long-term memory
Abstract	In the presented research, an associative memory architecture for searching the most similar data among previously stored reference data is applied, which achieves high speed, low power consumption and small implementation area due to a mixed digital-analog fully-parallel nearest-match search circuitry. The realization of the learning capability is based on the concept of short/long-term memory and tries to mimic the function of the human brain. The complete LSI test chip was designed in 0.35 μm CMOS technology for verification of this architecture.

R1-4 (Time: 10:26 - 10:28)

Title	Acceleration of Advanced Encryption Standard (AES) Processing on a CAM Enhanced Super Parallel SIMD Processor
Author	*Masaharu Tagami, Masakatsu Ishizaki, Takeshi Kumaki, Yutaka Kono, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan), Takayuki Gyohten, Hideyuki Noda, Katsumi Dosaka, Kazutami Arimoto, Kazunori Saito (Renesas Technology Corporation, Japan)
Page	pp. 26 - 31
Keyword	super parallel SIMD processor, AES, CAM, multimedia processing, pattern matching
Abstract	This paper presents an Advanced Encryption Standard (AES) implementation on a Content Addressable Memory (CAM) enhanced super-parallel SIMD processor. The proposed SIMD processor architecture achieves 40 GOPS for 16b additions at 200 MHz clock frequency and 250 mW power dissipation. AES processing includes table conversion. We apply an integrated CAM to which the SIMD processor can off-load the table conversion for quick processing. As a result, we can realize high-speed AES execution on the proposed architecture.

R1-5 (Time: 10:28 - 10:30)

Title	Hardware Realization of Two-Stage Pattern Matching System using Fully-Parallel Associative Memories
Author	*Md. Anwarul Abedin, Yuki Tanaka, Shogo Sakakibara, Ali Ahmadi, Tetsushi Koide, Hans Jüergen Mattausch (RCNS, Hiroshima University, Japan)
Page	pp. 32 - 37
Keyword	associative memory, pattern matching, fully parallel search, mixed digital/analog circuit
Abstract	A hardware realization of cascaded fully-parallel associative memory with two-stage winner search is proposed. In this architecture we use two different types of associative memories. One is based on the k-nearest-matches search, and the other is a special type of associative memory in which winner search is done only among the activated reference patterns. The activation in the second associative memory is done by the first associative memory after searching the k-nearest-matches. We have already designed, fabricated and tested the associative memories separately. The complete two-stage pattern matching system is tested here with Matlab software, and the hardware realization is currently being designed.

R1-6 (Time: 10:30 - 10:32)

Title	A Fast Differential-Amplifier-Based Winner-Search circuit for Fully Parallel Associative Memories
Author	*Yuki Tanaka, Md. Anwarul Abedin, Shogo Sakakibara, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan)
Page	pp. 38 - 41
Keyword	associative memory, nearest search, digital-analog circuit, differential amplifier
Abstract	A mixed digital-analog fully parallel associative memory with a differential amplifier for winner search is proposed. The use of the proposed differential amplifier for winner search improves the speed, reliability and area efficiency of the associative memory based system. The test chip occupies 5.48 mm^2 of area in 0.35 μm CMOS technology for 64 reference patterns with 16 binaries of 5-bit. The system operates in less than 78 ns with an average power consumption of around 132 mW.

R1-7 (Time: 10:32 - 10:34)

Title	Reducing the Dynamic Energy Consumption in the Multi-Layer Memory of Embedded Multimedia Processing Systems
Author	*Ilie I. Luican (University of Illinois at Chicago, United States), Hongwei Zhu (ARM, Inc., United States), Florin Balasa (Southern Utah University, United States), Dhiraj K. Pradhan (University of Bristol, Great Britain)
Page	pp. 42 - 48
Keyword	memory management, embedded systems, dynamic energy
Abstract	The memories in data-intensive signal processing systems -- including video and image processing, artificial vision, real-time 3-D rendering, advanced audio and speech coding, medical imaging applications -- have an important impact on the overall energy budget. This paper focuses on the reduction of the dynamic energy consumption in the memory subsystem, starting from the high-level algorithmic specification of the application. The approach to address this problem uses elements of the theory of polyhedra and relies on a variety of algebraic techniques specific to the data-flow analysis used in modern compilers.

R1-8 (Time: 10:34 - 10:36)

Title	An Output Probability Computation Circuit Design for Real Time Speech Recognition
Author	*Joe Hashimoto, Akihiko Eguchi, Makoto Saituji (Kinki University, Japan), Akihisa Yamada (Sharp Corporation, Japan), Takashi Kambe (Kinki University, Japan)
Page	pp. 49 - 55
Keyword	Speech recognition, C-based architecture design, memory access method, application specific arithmetic circuit, Bach system
Abstract	Speech recognition is becoming a popular technology for the implementation of human interfaces. However, conventional approaches to large vocabulary continuous speech recognition require a high performance CPU. In this paper, we describe a speech-recognition system designed using a C-based architecture design methodology. Pipelining and parallel processing circuits accelerated by data buffering, memory separation, and loop unrolling were implemented to calculate the Hidden Markov Model (HMM) output probability at high speed, and their performance was evaluated. It is shown that real time speech recognition in small portable systems is possible.

R1-9 (Time: 10:36 - 10:38)

Title	A Hybrid Memory Architecture for Low Power Embedded System Design
Author	*Tadayuki Matsumura, Yuriko Ishitobi (Kyushu University, Japan), Tohru Ishihara, Maziar Goudarzi (System LSI Research Center Kyushu University, Japan), Hiroto Yasuura (Kyushu University, Japan)
Page	pp. 56 - 62
Keyword	low power, on-chip memory, leakage, design, scratchpad
Abstract	On-chip memories are among the most power hungry components of today's systems on a chip (SoCs). The on-chip memories generally use higher Vdd and Vth than those of logic parts to suppress the static power consumption without increasing the access delay of the memories. This design policy, however, increases the dynamic power consumption, since the dynamic power consumption is proportional to the square of Vdd. This paper proposes a hybrid memory architecture which consists of the following two regions: 1) a frequently accessed region which uses low Vdd and Vth and 2) a rarely accessed region which uses high Vdd and Vth. The key of our architecture is that the access delays for the two regions are equal to each other, which makes it easy to integrate this memory into processors without any modification of the internal processor architecture. This paper also proposes a technique for finding the sizes and the code allocation for the regions so as to minimize the total power consumption of the memory. Experimental results demonstrate that the total power consumption of the scratchpad memory can be reduced in all cases.

R1-10 (Time: 10:38 - 10:40)

Title	An Accurate and Efficient Lane Recognition Algorithm for Automotive Active Safety System
Author	*Yusuke Watanabe, Masahiro Fukui (Ritsumeikan University, Japan)
Page	pp. 63 - 68
Keyword	image filter, automobile, lane recognition
Abstract	Lane recognition is an essential technique for automobile active safety applications. We aim at developing a high-speed and highly accurate lane recognition system. The proposed algorithm provides an efficient filter to extract candidate edges of lanes and avoid noise edges, reducing mis-recognition as much as possible. It is implemented in simple hardware logic.

R1-11 (Time: 10:40 - 10:42)

Title	Performance Evaluation of Region-Growing Image Segmentation Using Two-Dimensional Image-Block Scanning
Author	*Keita Okazaki, Kazutoshi Awane, Kosuke Yamaoka, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan)
Page	pp. 69 - 73
Keyword	block-scanning
Abstract	We report a 2-dimensional block-scanning image-segmentation architecture based on a region-growing approach which has real-time execution capability. Using the two techniques of a limited scan to the boundary of each grown region and an exhaustive block-internal growing process, we have improved processing speed, power consumption and hardware efficiency in comparison to the previous state of the art. In particular, the processing speed could be maximized and the processing-circuit size could be minimized by adjusting the pixel number within the scanning block, the memory configuration and the memory-access method.

R1-12 (Time: 10:42 - 10:44)

Title	An Effective Parallel Coding Architecture Utilizing Characteristics of Multimedia Application
Author	*Takeshi Kumaki, Masakatsu Ishizaki, Masaharu Tagami, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan)
Page	pp. 74 - 80
Keyword	Content addressable memory, CAM, Parallel coding, Multiport, Huffman coding
Abstract	This paper presents a parallel coding architecture using a flexible multi-ported content addressable memory (CAM). A previously reported Flexible Multi-port Content Addressable Memory (FMCAM) technology is improved by additional schemes for a single search mode and counting value setting, which enable fast parallel coding operation. Moreover, an inactive category suspend mode becomes possible and reduces the power consumption. Evaluation results for Huffman encoding within the JPEG application show that in the proposed architecture the number of clock cycles needed for encoding is 93% less than for a conventional DSP. The power consumption during data transmission between memory block and processing block for the improved FMCAM is estimated to be about 90% smaller than for the original FMCAM. Furthermore, the performance per unit area, measured in MOPS/mm^2, can be improved by a factor of 3.8 in comparison to a conventional DSP.

R1-13 (Time: 10:44 - 10:46)

Title	VLSI Architecture for Real-time Retinex Video Image Enhancement
Author	*Kazuyuki Takahashi, Yoshihiro Nozato (Osaka University, Japan), Hiroyuki Okuhata (Synthesis Corporation, Japan), Takao Onoye (Osaka University, Japan)
Page	pp. 81 - 86
Keyword	video image enhancement, Retinex, variational model
Abstract	A real-time VLSI architecture for Full HD 1080i video image enhancement is proposed, which is based on a variational approach to the Retinex algorithm. In order to efficiently reduce the enormous computational cost required for image enhancement, the processing layers and the number of iterations are determined in accordance with software evaluation results. Pipeline and parallel processing of pixels also contribute to achieving real-time processing of high resolution pictures. In addition, the use of the illumination signal calculated for the previous frame rather than that for the current frame reduces the required frame memory size. As a result, the proposed architecture with fourfold parallelization, which can be implemented with 100K gates, processes 1,920x1,080, 30fps images in real time at 24 MHz operation.

R1-14 (Time: 10:46 - 10:48)

Title	ΣΔ-Modulator with High Nearby Interferers Suppression by Transmission Zeroes
Author	*Takashi Moue, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page	pp. 87 - 90
Keyword	Delta Sigma modulator, A/D conversion, CMOS
Abstract	A Delta Sigma modulator that can strongly suppress nearby interferers by forming zeroes in the signal transfer function has been proposed and demonstrated. Feedforward signal paths from the input signal terminal to each integrator can form zeroes in the signal transfer function to strongly suppress the nearby interferers, which often heavily degrade the quality of A/D conversion and cause serious instability. A prototype discrete-time 6th-order Delta Sigma modulator with a signal bandwidth of 777 kHz has been fabricated in 0.18 um CMOS technology and demonstrated 20 dB suppression of the 2.65 MHz to 8.22 MHz adjacent channel signals and an SNR of 59 dB for in-band signals.

R1-15 (Time: 10:48 - 10:50)

Title	The Effects of Switch Resistances on Pipelined ADC Performances and the Optimization for the Settling Time
Author	Masaya Miyahara, *Hiroki Endou, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page	pp. 91 - 96
Keyword	analog to digital converter, switched capacitor amplifier, switch resistance, pipeline operation
Abstract	In this paper, we discuss the effects of switch resistances on the step response of switched-capacitor (SC) circuits, especially multiplying digital-to-analog converters (MDACs) in pipelined analog-to-digital converters. Theory and simulation results reveal that the settling time of MDACs can be decreased by optimizing the switch resistances. This switch resistance optimization does not only effectively increase the speed of single-bit MDACs, but also of multi-bit MDACs. Moreover, multi-bit MDACs are faster than the single-bit MDACs when slewing occurs during the step response. With such an optimization, the response of the switch will be improved by up to 50 %.

R1-16 (Time: 10:50 - 10:52)

Title	A 12-bit 3.7-Msample/s Pipelined A/D Converter Based on the Novel Capacitor Mismatch Calibration Technique
Author	*Shuaiqi Wang (Graduate School of Information, Production, and System, Waseda University, Japan), Fule Li (Institute of Microelectronics, Tsinghua University, China), Yasuaki Inoue (Graduate School of Information, Production, and System, Waseda University, Japan)
Page	pp. 97 - 103
Keyword	A/D conversion, pipelined, capacitor mismatch calibration, low power dissipation
Abstract	This paper proposes a 12-bit 3.7-MS/s pipelined A/D converter based on a novel capacitor mismatch calibration technique. The conventional stage is improved to an algorithmic circuit involving charge summing, capacitor exchange and charge redistribution, simply by introducing some extra switches into the analog circuit. The proposed ADC obtains linearity beyond the accuracy of the capacitor match and verifies the validity of reducing the nonlinear error from the capacitor mismatch to the second order without additional power dissipation and chip size through the novel capacitor mismatch calibration technique. It is processed in 0.5um CMOS technology. Simulation results show that 71.7dB SNDR and 77.9dB SFDR are obtained for a 2V Vpp 500kHz sine input sampled at 3.7MS/s. The whole power dissipation of this ADC is 33.46mW at the power supply of 5V.

Invited Talk I
Time: 13:25 - 14:10 Monday, October 15, 2007
Location: Conference Hall (2F)
Chair: Takao Onoye (Osaka University, Japan)

I1-1 (Time: 13:25 - 14:10)

Title	Reconfigurable Architecture: Challenges and Impacts for Multimedia
Author	Chung-Jr Lian, You Ming Tsao, *Liang-Gee Chen (National Taiwan University, Taiwan)
Page	pp. 107 - 111
Abstract	Consumer electronic applications have become the driving force for the growth of semiconductor technology. The variety of multimedia applications, with a wide range of real-time demands, calls for different computing architectures. Reconfigurable architectures are the most promising techniques. In this presentation, the design concept of reconfigurable architecture for multimedia applications is introduced. By introducing well-designed reconfigurability into application specific circuits, the design can provide not only good performance in terms of area, speed and power, but also flexibility for different modes, parameters, and fast algorithms. A power-aware concept can therefore also be realized based on reconfigurable architecture. Several design cases will be included in this talk: reconfigurable architecture for MPEG-4 and H.264/AVC, scalable architecture for JPEG 2000, and a video processing unit (VPU) with reconfigurable memory. The software and system optimization issues will also be addressed.

FPGA, Place & Route
Time: 14:10 - 15:50 Monday, October 15, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Hung-Ming Chen (National Chiao Tung University, Taiwan), Yasuhiro Takashima (The University of Kitakyushu, Japan)

R2-1 (Time: 14:10 - 14:12)

Title	A BCH Decode Accelerator for Application Specific Processors
Author	*Kazuhito Ito (Saitama University, Japan)
Page	pp. 115 - 121
Keyword	BCH, accelerator, processor
Abstract	The BCH code is one of the popular error correction codes (ECC), and decoding BCH requires many bit oriented operations as well as word oriented operations. A dedicated hardware BCH decoder is less flexible, while decoding BCH on a base processor consumes many instructions in bit operations and requires a large memory area for look-up tables. In this paper, we propose an auxiliary circuit included in application specific pipelined processors which accelerates the BCH decoding process.

R2-2 (Time: 14:12 - 14:14)

Title	Design and FPGA Implementation of a High-Speed String Matching Engine
Author	*Yosuke Kawanaka, Shin'ichi Wakabayashi, Shinobu Nagayama (Hiroshima City University, Japan)
Page	pp. 122 - 129
Keyword	string matching, FPGA, special-purpose hardware, regular expressions
Abstract	A high-speed string matching circuit for searching a pattern in a given text is proposed. In the circuit, a pattern is specified by a class of restricted regular expressions. The architecture of the circuit is a one-dimensional array of simple processing units. The proposed circuit was designed with Verilog-HDL, and was implemented using a Xilinx Virtex4 chip.

R2-3 (Time: 14:14 - 14:16)

Title	Speed Improvement of AES Encryption using Hardware Accelerators Synthesized by C Compatible Architecture Prototyper (CCAP)
Author	*Hiroyuki Kanbara (ASTEM RI, Japan), Takayuki Nakatani, Naoto Umehara (Ritsumeikan University, Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama (Nagoya University, Japan)
Page	pp. 130 - 134
Keyword	high level synthesis, Embedded system, Codesign, AES Encryption
Abstract	The authors are developing a high-level synthesizer called C Compatible Architecture Prototyper (CCAP). CCAP compiles an ANSI C program which is a part of embedded software and generates a hardware accelerator in HDL. CCAP offers an arbiter circuit which makes it possible for the synthesized accelerator and a CPU to access main memory in parallel. In this paper we report the speed improvement of AES Encryption using CCAP.

R2-4 (Time: 14:16 - 14:18)

Title	A Hybrid Logic Simulator Using LUT Cascade Emulators
Author	*Hiroki Nakahara, Tsutomu Sasao, Munehiro Matsuura (Kyushu Institute of Technology, Japan)
Page	pp. 135 - 141
Keyword	LUT cascade, Logic simulation, Design Verification
Abstract	This paper presents a hybrid logic simulator using both event-driven and cycle-based methods. For special primitives such as memories and tri-state buffers, it uses an event-driven method. For other parts, it uses a cycle-based method using LUT cascade emulators. To simulate a large scale circuit, it partitions the circuit into smaller ones, and realizes each part by an LUT cascade emulator. Next, it combines these emulators by interconnections. Since a multiplier often requires large memories in an LUT cascade, an instruction of the processor is used instead of the LUT cascade. This will reduce the code size and the simulation time. Our experiment shows that the proposed method is effective for circuits including arithmetic operations.

R2-5 (Time: 14:18 - 14:20)

Title	Statistical Estimation Method for Verification Coverage Using FPGA-based Emulators
Author	*Kohei Hosokawa, Yuichi Nakamura (NEC, Japan), Baku Haraguchi (NEC Micro Systems, Japan)
Page	pp. 142 - 146
Keyword	FPGA-based Emulators, Verification Coverage, Toggle Coverage, Statistics, Test-Pattern
Abstract	We propose a new method to quickly estimate toggle coverage as an indicator of verification coverage for a large number of test patterns. The proposed method uses statistical interval estimation theory to reduce the number of signals required to estimate the toggle coverage, which normally requires transition information for all the signals in a circuit. Since this reduction decreases the size of toggle measurement circuits on an FPGA, the toggle coverage can be estimated by an FPGA-based emulator that can operate at speeds on the order of MHz, which is roughly 10^4 - 10^5 times faster than HDL simulators. We confirmed by experiment that the average estimation error is within ±1% in actual LSI emulations.

R2-6 (Time: 14:20 - 14:22)

Title	Blockage-Aware Routing Tree Construction with Concurrent Buffer and Flip-Flop Insertion
Author	Shu-Yun Chen (Realtek Semiconductor Corp., Taiwan), *Ting-Chi Wang (National Tsing Hua University, Taiwan)
Page	pp. 147 - 154
Keyword	Routing, Buffer/Flip-Flop Insertion, Physical Design
Abstract	For high-frequency designs, concurrent buffer and flip-flop insertion becomes inevitable for interconnect delay optimization. To the best of our knowledge, all existing works perform concurrent buffer and flip-flop insertion on a given routing tree. The given routing tree, however, may greatly limit the effectiveness of concurrent buffer and flip-flop insertion. In this paper, we present a method which simultaneously constructs a routing tree and performs concurrent buffer and flip-flop insertion subject to latency constraints. We also propose four speed-up techniques to further reduce the computation time. The experimental results show that our method has 90% success rate in generating a feasible solution while a sequential method, which separates the tree construction and the concurrent buffer and flip-flop insertion into 2 steps, has only 57% success rate. For the test cases in which both our method and the sequential method can generate feasible solutions, our method has up to 96% chance to produce better solutions.

R2-7 (Time: 14:22 - 14:24)

Title	Low-Power Clock Tree Synthesis by Low-Swing Techniques
Author	Yun-Ta Lin (SpringSoft, Inc., Taiwan), *Hung-Ming Chen (Dept of EE and SoC Research Center, National Chiao Tung University, Taiwan)
Page	pp. 155 - 160
Keyword	Clock Tree Synthesis, Low Power, Low Swing
Abstract	Chips running at higher frequency consume much more power. Without careful planning of the clock network, the chips will suffer from high power dissipation. In this paper, we present a methodology which can be applied in buffered clock tree synthesis to achieve low power demands and a zero-skew constraint. It is based on low-swing interconnections for the clock signal transmission and low-swing double-edge triggered flip-flops for synchronizing elements. DME based buffering is applied to reduce the number of buffers inserted as well as wirelength in order to lower power consumption. The experimental results are encouraging. We obtain an average 49% power saving at an equivalent clock rate, compared with a previous work based on low-swing interconnection.

R2-8 (Time: 14:24 - 14:26)

Title	Post-Silicon Clock-timing Tuning Based on Statistical Estimation
Author	*Yuko Hashizume, Yasuhiro Takashima (The University of Kitakyushu, Japan), Yuichi Nakamura (NEC Corporation, Japan)
Page	pp. 161 - 165
Keyword	deskew, linear programming, PDE
Abstract	In deep-submicron technologies, process variations can severely affect the performance and yield of VLSI chips. As a countermeasure to the variations, post-silicon tuning has been proposed. Deskew, where the clock timing of flip-flops (FFs) is tuned by inserting delay elements into the clock tree, falls into this category. We propose a novel deskew method that decides delay values by measuring the clock timing of a small number of FFs and estimating the clock timings of the remaining FFs based on a statistical model.

R2-9 (Time: 14:26 - 14:28)

Title	Speed Enhancement Technique for the Post-fabrication Clock-timing Adjustment of Digital LSIs
Author	*Tatsuya Susa (Graduate School of Science, Toho University, Japan), Masahiro Murakawa, Eiichi Takahashi (National Institute of Advanced Industrial Science and Technology, Japan), Tatsumi Furuya (Graduate School of Science, Toho University, Japan), Tetsuya Higuchi (National Institute of Advanced Industrial Science and Technology, Japan), Shinji Furuichi, Yoshitaka Ueda, Atsushi Wada (Sanyo Electric Co., Ltd, Japan)
Page	pp. 166 - 173
Keyword	post-fabrication adjustment, adjustment simulation, process variation, yield, genetic algorithm
Abstract	We propose a speed enhancement technique for post-fabrication clock-timing adjustment to realize practical applications. The method reduces adjustment time by reducing the number of adjustment points by utilizing static timing analysis (STA) results and adopting an improved distribution for the initial GA population. Moreover, we have developed an adjustment simulator to predict adjustment results with the proposed method at the LSI design stage. Adjustment experiments using the developed simulator demonstrate that our method can adjust practical LSIs with 1,031 flip-flops within a few seconds.

R2-10 (Time: 14:28 - 14:30)

Title	Repairs for Voltage Drop and Noise Violation in Late Design Stages
Author	Shih-Tsung Huang (AnaGlobe Technology, Taiwan), *Hung-Ming Chen (Dept of EE and SoC Research Center, National Chiao Tung University, Taiwan)
Page	pp. 174 - 178
Keyword	DSM, ECO, Voltage Drop, Crosstalk Noise
Abstract	Since many second order problems have emerged in the deep submicron (DSM) era, some critical functional changes in ECO cause inevitable timing and voltage drop violations. In this paper, we propose a methodology to reduce voltage drop and noise violations with minimal design changes, which can be used in ECO or late design stages. It is simple to plug into the current design flow, and it is efficient, so that we can avoid excess timing and voltage drop check iterations and repair the power delivery damage from limited resources in late design stages. We formulate this problem as a longest path problem and fix the violation by using lower metal layer power lines for power compensation. We have integrated this framework with a commercial tool, and experimental results show that our methodology can successfully relieve the violations of noise and IR-drop in ECO or late design stages.

R2-11 (Time: 14:30 - 14:32)

Title	Estimation of Yield Enhancement by Critical Path Reconfiguration Utilizing Random Variations on Deep-submicron FPGAs
Author	*Yuuri Sugihara, Yohei Kume, Kazutoshi Kobayashi, Hidetoshi Onodera (Kyoto University, Japan)
Page	pp. 179 - 183
Keyword	FPGA, variation-aware, yield enhancement
Abstract	In this paper, we estimate yield enhancement by critical path reconfiguration of deep submicron FPGAs which suffer from drastic yield loss due to process variations. Critical path reconfiguration is dedicated to random process variations which are hard to predict. First, an initial configuration for an implemented circuit is applied to all fabricated FPGAs and at-speed tests are done. Then failed signal paths are rerouted to different locations. Rerouting and at-speed tests are repeated several times to enhance yield. Locations of the critical paths are optimized chip by chip incrementally according to chip-oriented random variations. Theoretical analysis is done to verify the effectiveness of critical path reconfiguration compared with multiple configurations according to the number of critical paths in the presence of random variations.

R2-12 (Time: 14:32 - 14:34)

Title	A Mixed Integer Linear Programming Based Approach for Post-Routing Redundant Via Insertion
Author	Kuang-Yao Lee, *Ting-Chi Wang (National Tsing Hua University, Taiwan), Kai-Yuan Chao (Intel Corporation, United States)
Page	pp. 184 - 191
Keyword	Redundant via, Physical design, Design for manufacturability
Abstract	Redundant via insertion is highly recommended to improve chip yield and reliability. The well-studied double-cut via insertion (DVI) problem allows a single via in a chip to have at most one redundant via inserted next to it, but the solution to this problem is not good enough, particularly for high-activity and power nets, because those nets typically need more redundant vias to further enhance reliability. This motivates us to study in this paper a new problem, called the multiple-cut via insertion (MVI) problem, in which one or more redundant vias can be inserted next to a single via such that the number of single vias with redundant vias inserted next to them and the number of inserted redundant vias are both maximized. We formulate the MVI problem as a mixed integer linear programming (MILP) problem. To make the problem tractable, we further break the MILP problem into a set of much smaller MILP problems, each of which is solved independently and efficiently without sacrificing optimality. In addition, we identify that the DVI problem is just a special case of the MVI problem, and therefore our MILP approach can be easily adapted to optimally solve the DVI problem as well. To the best of our knowledge, none of the existing DVI works can guarantee optimality. Extensive experimental results are provided to support the efficiency of our MILP approaches on both the MVI and DVI problems.

R2-13 (Time: 14:34 - 14:36)

Title	Fast Monotonic Via Assignment Excluding Mold Gates for 2-Layer Ball Grid Array Packages
Author	*Yoichi Tomioka, Atsushi Takahashi (Tokyo Institute of Technology, Japan)
Page	pp. 192 - 197
Keyword	ball grid array, package, monotonic, 2-layer, routing
Abstract	Ball Grid Array packages, in which I/O pins are arranged in a grid array pattern, realize a large number of connections between chips and a printed circuit board, but manual routing takes much time. We propose a fast routing method for 2-layer Ball Grid Array packages to support designers. Our method obtains a via assignment which distributes wires evenly on the top layer and has a high completion ratio of nets by improving the via assignment iteratively.

R2-14 (Time: 14:36 - 14:38)

Title	An I/O Planning Method for Three-Dimensional Integrated Circuits
Author	*Chao-Hung Lu (National Central University, Taiwan), Hung-Ming Chen (National Chiao Tung University, Taiwan), Chien-Nan Jimmy Liu, Wen-Yu Shih (National Central University, Taiwan)
Page	pp. 198 - 202
Keyword	I/O, Partition, 3D
Abstract	3DIC is an alternative choice when we design a chip because this architecture has high performance and high density properties. In this paper, we propose a partition methodology to solve the problem of I/O assignment and the number of 3D-Vias in 3DIC design. The I/O partitioning method is based on the F-M algorithm and considers the total number of 3D-Vias and the I/O number for each tier at the same time. Experimental results show that our approach can reduce the number of 3D-Vias while balancing the I/O number for each tier. Additionally, our partition result and the floorplan algorithm can be integrated together.

R2-15 (Time: 14:38 - 14:40)

Title	Non-Slicing Floorplanning-Based Crosstalk Reduction on Gridless Track Assignment
Author	*Wen-Nai Cheng, Yu-Ning Chang, Yih-Lang Li (National Chiao-Tung University, Taiwan)
Page	pp. 203 - 207
Keyword	VLSI design, physical design, Gridless Routing, Track Assignment, Crosstalk minimization
Abstract	Track assignment, which is an intermediate stage between global routing and detailed routing, provides a good platform for promoting performance, and for imposing additional constraints during routing, such as crosstalk. Gridless track assignment (GTA) has not been addressed in public literature. This work develops a gridless crosstalk-driven GTA. The initial assignment is produced rapidly with a left-edge-like algorithm. Crosstalk reduction on the assignment is then transformed to a restricted non-slicing floorplanning problem, and a deterministic O-tree based algorithm is employed to re-assign each net segment. Finally, each panel is partitioned into several sub-panels, and the sub-panels are re-ordered using a branch-and-bound algorithm to decrease the crosstalk further. Experimental results demonstrate that the proposed gridless crosstalk-driven GTA achieves over 80% reduction in the overlapping length of adjacent wires.

R2-16 (Time: 14:40 - 14:42)

Title	Fujimaki-Takahashi Squeeze : Linear Time Construction of Constraint Graphs of a Floorplan for a Given Permutation
Author	*Ryo Fujimaki, Toshihiko Takahashi (Niigata University, Japan)
Page	pp. 208 - 213
Keyword	Floorplan, Representation, Permutation, Constraint graph
Abstract	A floorplan is a subdivision of a rectangle into rectangular faces with horizontal and vertical line segments. We call a floorplan room-to-room when adjacencies between rooms are considered. Fujimaki and Takahashi showed that any room-to-room floorplan can be represented as a permutation. In this paper, we give an O(n)-time algorithm that constructs the vertical and the horizontal constraint graphs of a floorplan for a given permutation under the representation.

R2-17 (Time: 14:42 - 14:44)

Title	Placement with Symmetry Constraints for Analog IC Layout Design based on Tree Representation
Author	*Natsumi Hirakawa, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan)
Page	pp. 214 - 221
Keyword	symmetry constraints, O-tree
Abstract	Symmetry constraints require that given cells be placed symmetrically in the design of analog ICs. We use an O-tree to represent placements and propose a decoding algorithm which can obtain a closest packing satisfying the constraints. The decoding algorithm uses linear programming, which is time consuming. Therefore we propose a graph-based method to judge whether or not there exists a packing corresponding to a given O-tree, and use the method before linear programming. The effectiveness of the proposed method was shown by computational experiments.

Invited Talk II
Time: 15:50 - 16:35 Monday, October 15, 2007
Location: Conference Hall (2F)
Chair: Yusuke Matsunaga (Kyushu University, Japan)

I2-1 (Time: 15:50 - 16:35)

Title	Why Study Quantum Circuits and What They Are Good For
Author	*Igor Markov (University of Michigan, United States)
Page	pp. 225 - 230
Abstract	As transistor dimensions approach atomic scale, quantum-mechanical effects such as tunneling and spin become important ingredients in accurate performance models of integrated circuits. Theoretical work in terms of such models suggests that power-density constraints may eventually require a departure from common practices of representing logic 0s and 1s by charges, voltages or currents. Instead, nuclear and electron spins are proposed as primary carriers of stationary information, e.g., in the well-publicized demonstration by IBM in 2000, and photon polarizations can transport quantum information over great distances, acting as quantum bits. However, the algebra of quantum bits is radically different from the Boolean algebra that describes modern digital electronics, while such states are susceptible to frequent and unusual types of errors. On the positive side, quantum communication promises an unparalleled level of security and some quantum algorithms solve otherwise intractable problems in polynomial time. Despite many potential applications and several active start-ups in the field, the main obstacle to further progress in quantum information processing is complexity. This is where design automation can lend a helping hand.

Design Methodology for Nanometer Era
Time: 16:35 - 18:15 Monday, October 15, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Ting-Chi Wang (National Tsing Hua University, Taiwan), Youhua Shi (Waseda University, Japan)

R3-1 (Time: 16:35 - 16:45)

Title	A Study on Body-Biasing Layout Style Focusing on Area Efficiency and Speed Controllability
Author	*Koichi Hamamoto, Hiroshi Fuketa, Masanori Hashimoto, Yukio Mitsuyama, Takao Onoye (Osaka University, Japan)
Page	pp. 233 - 237
Keyword	body bias, forward bias, layout style, speed controllability
Abstract	Body-biasing is expected to become a common design technique, and thus area-efficient implementation in layout is in demand. Body-biasing outside standard cells is one possible layout, but in this case body-bias controllability, especially when forward bias is applied, is a concern. To investigate the controllability, we fabricated a ring oscillator in a 90nm technology and measured it. Our measurement result and evaluation of area efficiency reveal that body-biased circuits can be implemented with an area overhead of less than 1%.

R3-2 (Time: 16:45 - 16:47)

Title	Simulations of Flicker Noise in SiGe HMOS: Body Bias Dependence
Author	*C.-Y. Chen, Y. Liu, R. W. Dutton (Stanford University, United States), J. Sato-Iwanaga, A. Inoue, H. Sorada (Matsushita Electric Industrial Co., Ltd, Japan)
Page	pp. 238 - 241
Keyword	TCAD, flicker noise, SiGe, p-type hetero-structure MOS (pHMOS), body bias
Abstract	Advanced TCAD simulation capabilities have been developed to investigate flicker noise behavior in p-type SiGe/Si hetero-structure MOS (HMOS) transistors. The numerical model is based on the impedance field method and accounts for the carrier number fluctuation due to trap/de-trap effects and the correlated mobility fluctuation mechanism. Such a device-level simulation approach enables separate treatment of the buried and parasitic surface channels which have different contributions from the mobility fluctuations. Simulations have been conducted to explain experimentally observed strong body-bias dependence of drain current noise in p-HMOS devices. In particular, this dependence is found to be closely correlated with the carrier distribution between the two channels. An improved compact model to account for this body bias dependence of flicker noise in SiGe pHMOS devices is also presented in this paper.

R3-3 (Time: 16:47 - 16:49)

Title	Active Body-Biasing Control on PD-SOI for Dual Supply Voltage Scheme
Author	*Yosuke Torii, Kenji Hamada, Kayoko Seto, Masaaki Iijima, Masahiro Numa (Kobe University, Japan), Akira Tada, Takashi Ipposhi (Renesas Technology Corporation, Japan)
Page	pp. 242 - 245
Keyword	low power, active body-bias, dual supply voltage, PD-SOI
Abstract	The dual supply voltage scheme reduces the power consumption without performance degradation by using two power supply rails. However, an increase in the delay has made assigning the lower supply voltage more difficult in the conventional dual-VDD scheme under low supply voltage. We propose a technique for dual-VDD scheme employing the Active Body-biasing Control on PD-SOI, which increases the number of VDDL-cells by lowering threshold voltage. Simulation results have shown our approach reduces the power consumption at low voltage operation.

R3-4 (Time: 16:49 - 16:51)

Title	A Look-Ahead Active Body-Biasing Scheme for SOI-SRAM with Dynamic V_DDM Control
Author	*Kayoko Seto, Yosuke Torii, Masaaki Iijima, Masahiro Numa (Kobe University, Japan), Akira Tada, Takashi Ipposhi (Renesas Technology Corporation, Japan)
Page	pp. 246 - 249
Keyword	PD-SOI, body-bias, SRAM, low power design
Abstract	Instability of SRAM memory cells derived from aggressive technology scaling has become one of the most significant issues. Although lowering the supply voltage for a memory cell (VDDM) improves the write margin, it increases the access time. In this paper, we propose a memory cell employing a Look-ahead Active Body-biasing (LAB) scheme for SOI-SRAM with dynamic VDDM control. Simulation results have shown that the proposed SRAM cell shortens the access time by 54% in the write mode.

R3-5 (Time: 16:51 - 16:53)

Title	A Study on Variation-Component Decomposition using Polynomial Smoothing Function
Author	*Takashi Sato, Hiroyuki Ueyama, Noriaki Nakayama, Kazuya Masu (Tokyo Institute of Technology, Japan)
Page	pp. 250 - 255
Keyword	device variation, systematic, random, goodness of fit, AIC
Abstract	A procedure that decomposes parametric device variation into systematic and random components is studied. Regarding the decomposition process as obtaining a smooth regression function, a polynomial model is used to describe the systematic variation and the residue is considered as random variation. In the proposed flow, the required order of the regression function is determined adaptively, using a statistical index called AICc. The impact of polynomial order selection on variation decomposition is also discussed through numerical experiments using measured data.

R3-6 (Time: 16:53 - 16:55)

Title	Effect of Dummy Fills on High Frequency Characteristics of Spiral Inductor
Author	*Akira Tsuchiya, Hidetoshi Onodera (Kyoto University, Japan)
Page	pp. 256 - 260
Keyword	spiral inductor, dummy fill
Abstract	This paper discusses the effect of CMP dummy fills on spiral inductors. Conventionally the effect of dummy fills is discussed from the viewpoint of capacitance. However, at high frequencies above 10GHz, the dummy fills affect the resistance and the inductance of the wire. We evaluate the effect of dummy fills with a 3D field solver. Experimental results show that the Q-factor decreases by 20% due to the loss in dummy fills.

R3-7 (Time: 16:55 - 16:57)

Title	Static-Noise-Margin Analysis of Major SRAM-Cell Types Including Production Variations for a 90nm CMOS Process
Author	*Shinya Izumi, Koh Johguchi, Hans Jürgen Mattausch, Tetsushi Koide (Hiroshima University, Japan)
Page	pp. 261 - 265
Keyword	SRAM, SNM, variation, robust
Abstract	Here we report a comparative study of the effect of the Vth variation on the major SRAM-cell types in a 90 nm CMOS process, namely the conventional 1-port cell with 6 transistors, the 8-transistor cell with separate read and write ports, the static noise margin (SNM) free 7-transistor cell, and the loadless 4-transistor cell. While 4Tr-SRAM and 6Tr-SRAM cannot maintain sufficient reliability in the worst case, 8Tr-SRAM and 7Tr-SRAM can. At low operation voltage, 8Tr-SRAM has higher reliability than 7Tr-SRAM.

R3-8 (Time: 16:57 - 16:59)

Title	Active Mode Leakage Power Reduction Based on the Controlling Value of Logic Gates
Author	*Lei Chen, Shinji Kimura (The Graduate School of Information, Production and Systems, Waseda University, Japan)
Page	pp. 266 - 271
Keyword	MTCMOS, Leakage Power, Controllability
Abstract	Leakage power dissipation is becoming an important issue with the technology scaling of LSI processes. In this paper, we propose a novel control method of Multi-Threshold CMOS (MTCMOS) technology based on the controllability of logic gates. The controlling value of a logic gate can stop the power of the blocks connected to other inputs of the gate. Based on this idea, we can control the power dynamically. This paper discusses methods to construct and control power blocks from a gate level circuit. A power optimization idea is also introduced. The effect of the proposed method is shown on several standard benchmark circuits.

R3-9 (Time: 16:59 - 17:01)

Title	Structural Robustness of Datapaths against Delay-Variation
Author	*Keisuke Inoue, Mineo Kaneko, Tsuyoshi Iwagaki (Japan Advanced Institute of Science and Technology, Japan)
Page	pp. 272 - 279
Keyword	High-Level Synthesis, Delay Variation, Register Assignment
Abstract	As the feature size of VLSI becomes smaller, delay variations become a serious problem in VLSI design. In this paper, we propose a novel class of robustness for a datapath against delay variations, which is named structural robustness against delay-variation (SRV), and propose sufficient conditions for a datapath to have SRV. A resultant circuit designed based on these conditions has a larger timing margin to delay variations than previous designs without sacrificing effective computation time. In addition, under any degree of delay variations, we can always find an available clock frequency for a datapath having SRV property to operate correctly, which could be a preferable characteristic in IP-based design.

R3-10 (Time: 17:01 - 17:03)

Title	Critical Issues Regarding A Variation Resilient Flip-Flop
Author	Toshinori Sato (Kyushu University, Japan), *Yuji Kunitake (Kyushu Institute of Technology, Japan)
Page	pp. 280 - 286
Keyword	variations, low-power, DVS, Razor, microprocessors
Abstract	Razor flip-flop (FF) is a clever technique to eliminate the supply voltage margin by exploiting circuit-level timing speculation. It combines a dynamic voltage scaling technique with an error detection and recovery mechanism. This paper presents an improvement of Razor FF that removes the delayed clock, which complicates timing design. It is named canary FF. This paper discusses critical issues regarding the canary FF. If the issues are solved, the canary FF could achieve a 10% power reduction by exploiting input value variations.

R3-11 (Time: 17:03 - 17:05)

Title	A Case Study of Multi-processor Design with Asynchronous Interconnect using Synchronous Design Tools
Author	*Katsunori Tanaka, Yuichi Nakamura, Atsushi Atarashi (System IP Core Research Labs., NEC Corporation, Japan)
Page	pp. 287 - 293
Keyword	GALS, design methodology
Abstract	This paper shows a case study of multi-processor design with asynchronous interconnect based on the QDI (Quasi Delay Insensitive) model using synchronous design tools for GALS (Globally Asynchronous, Locally Synchronous) architecture. In the design flow, we set specific design constraints to apply design tools for clocked circuits to the asynchronous interconnect as well. By applying the flow through placement and routing to an experimental design of a GALS system consisting of four clocked processors and a data memory with a clockless interconnect based on the QDI model, we proved that it can produce a GALS system working correctly. We also show experimental results of a preliminary version of the experimental design.

R3-12 (Time: 17:05 - 17:07)

Title	An Asynchronous Single-precision Floating-point Divider and its Implementation on FPGA
Author	*Masayuki Hiromoto, Shin'ichi Kouyama, Hiroyuki Ochi (Kyoto University, Japan), Yukihiro Nakamura (Ritsumeikan University, Japan)
Page	pp. 294 - 301
Keyword	IP reusability, IEEE754, low power design, digit-recurrence divider
Abstract	Synchronous design methodology is widely used for today's digital circuits. However, it is difficult to reuse a synchronous module highly optimized for a specific clock frequency in other systems with different global clocks, because logic depth between FFs should be tailored for the clock frequency. In this paper, we focus on asynchronous design, in which each module works at its best performance, and apply it to an IEEE754-standard single-precision floating-point divider. In our divider, a mantissa divider is driven by a high-speed local clock and connected to pre-/post-processing modules with an asynchronous interface. Our divider is ready to be built into a system with arbitrary clock frequency and achieves its peak performance and area- and power-efficiency. This paper also reports an implementation result of the proposed divider on a Xilinx FPGA.

R3-13 (Time: 17:07 - 17:09)

Title	Full-Chip Thermal Analysis via Generalized Integral Transforms
Author	*Pei-Yu Haung, Chih-Kang Lin, Yu-Min Lee (National Chiao Tung University, Taiwan)
Page	pp. 302 - 309
Keyword	Thermal analysis, generalized integral transforms
Abstract	This paper presents an accurate and fast analytical full-chip thermal simulator for the early-stage temperature-aware chip design. By using the technique of generalized integral transforms (GIT), our proposed method can accurately estimate the temperature distribution of full-chip with very small truncation points of bases in the spatial domain. We also develop a fast Fourier transform (FFT) like evaluating algorithm to efficiently evaluate the temperature distribution. Experimental results confirm that our GIT based analyzer can achieve an order of magnitude speedup compared with a highly efficient Green’s function based method.

R3-14 (Time: 17:09 - 17:11)

Title	A Power Grid Optimization Algorithm by Direct Observation of Timing Error Risk Reduction
Author	*Makoto Terao, Kenji Kusano, Yoshiyuki Kawakami (Graduate School of Science and Engineering, Ritsumeikan University, Japan), Masahiro Fukui (Dept. of VLSI System Design, Ritsumeikan University, Japan), Shuji Tsukiyama (Dept. of EECE, Chuo University, Japan)
Page	pp. 310 - 315
Keyword	delay analysis, dispersion, power and ground routing optimization, IR-drop, electro-migration
Abstract	With the advent of the super deep submicron age, circuit behavior varies widely with process variation. Power grid optimization which considers the timing error risk caused by the variation becomes very important for the stable and fast operation of the system. This paper proposes an approach which uses the “timing error risk caused by the IR drop” as its direct objective function. Experimental results show its effectiveness.

R3-15 (Time: 17:11 - 17:13)

Title	A High-level Power Grid Optimization Algorithm by Direct Observation of Manufacturing Cost Reduction
Author	*Takayuki Hayashi, Hironobu Ishijima, Yoshiyuki Kawakami, Masahiro Fukui (Ritsumeikan University, Japan)
Page	pp. 316 - 321
Keyword	floor-plan, optimization, Cost, decoupling capacitor
Abstract	The recent rapid growth of narrow and fine patterning technology brings many difficulties to power grid design. Inserting decoupling capacitors increases the size of the blocks in the chip. It is hard to analyze the trade-off after detailed placement and routing optimization. We propose an approach that performs the optimization in the floorplanning phase and deals with trade-off analysis between the chip cost due to area increase and the stabilization of circuit behavior.

R3-16 (Time: 17:13 - 17:15)

Title	An Evaluation of Circuit Simulation Algorithms for Hardware Implementation
Author	*Taiki Hashizume, Hironobu Ishijima, Masahiro Fukui (Ritsumeikan University, Japan)
Page	pp. 322 - 327
Keyword	circuit simulation, Euler method, Runge-Kutta method, hardware, fixed point
Abstract	In super deep submicron technology, a very large system is constructed on one LSI chip. Therefore, the circuit size becomes larger, and circuit simulation takes a long time. Reducing the simulation time is indispensable for the design of larger circuits. We have proposed a high-speed circuit simulation for power supply networks using a hardware algorithm. The numerical analysis method most suitable for the hardware algorithm is identified in this paper.

Tuesday, October 16, 2007

Invited Talk III
Time: 9:00 - 9:45 Tuesday, October 16, 2007
Location: Conference Hall (2F)
Chair: Masahiro Fujita (University of Tokyo, Japan)

I3-1 (Time: 9:00 - 9:45)

Title	Dynamic Analysis of Concurrent Systems
Author	*Gul Agha (University of Illinois at Urbana-Champaign, United States)
Page	pp. 331 - 334
Abstract	Despite considerable progress in model checking techniques, large concurrent systems have more states than can be effectively model checked. Other verification techniques, such as theorem proving, require significant human expertise. The talk will present research in three techniques we have developed at Illinois to reason about concurrent systems: concolic testing, predictive monitoring, and learning based verification. Concolic testing of concurrent systems improves the efficiency of testing by using symbolic testing and partial order reduction to guide testing. Random values are used to simplify infeasible constraints, thus maintaining soundness. Predictive monitoring improves the efficiency of testing by using observed traces to predict other traces that may occur. Computation learning based verification uses learning to reach fixed points rather than explore the entire state space. I will illustrate these techniques by means of examples from software, and discuss their benefits and limitations.

System Level Design & Logic Synthesis
Time: 9:45 - 11:30 Tuesday, October 16, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Bernard Courtois (CMP, France), Yukio Mitsuyama (Osaka University, Japan)

R4-1 (Time: 9:45 - 9:47)

Title	An Object-Oriented Circuit Design Method and Its Evaluation
Author	*Seigo Masuoka, Hiroyuki Terai, Manabu Koyama (Kinki University, Japan), Kazuhiko Nakahara (Spansion Japan Corporation, Japan), Akihisa Yamada (Sharp Corporation, Japan), Takashi Kambe (Kinki University, Japan)
Page	pp. 337 - 342
Keyword	Object-Oriented Design, Java, Hardware-software co-design, JPEG decoder, Bach system
Abstract	Hardware-software System LSI solutions have increased in popularity in a variety of design domains because these systems provide both high performance and flexibility. The language used to describe the System LSI is critical in a co-design methodology because it is used in both the hardware-software design process and functional validation. Java is a general-purpose, concurrent, object-oriented, platform-independent programming language and is often used in the field of embedded system design for applications such as mobile phones. In this paper we describe the Jackal language, which is an extension of Java for hardware design and propose an object-oriented circuit design methodology based on Jackal. This methodology is applied to the design of a JPEG encoder and its performance is evaluated.

R4-2 (Time: 9:47 - 9:49)

Title	Object Oriented Design and Synthesis of Communication in Hardware-/Software Systems with OSSS
Author	*Kim Grüttner, Cornelia Grabbe, Frank Oppenheimer (OFFIS - Institute for Information Technology, Germany), Wolfgang Nebel (Carl v. Ossietzky University Oldenburg, Germany)
Page	pp. 343 - 350
Keyword	hw/sw co-design, high-level synthesis, communication synthesis, object oriented design, systemc
Abstract	In this paper we propose an object oriented hardware/software co-design methodology for embedded system design. The use of object-oriented techniques combined with template meta-programming during system level design helps the designer write faster, better and more reusable executable models of the specified system. One of the major challenges in system level design lies in the automatic or guided refinement process from the specification down to the implementation on a certain target platform. The contribution of this paper is a seamless communication refinement from a method based communication between active and passive objects to a signal based synthesisable communication through buses or point-to-point channels. The proposed methodology retains the separation of communication and behaviour and therefore enables an easy communication architecture exploration. To achieve this we have implemented a remote method invocation mechanism that can be used in conjunction with synthesisable channels. The applicability of our approach is shown with an IPv4 router design.

R4-3 (Time: 9:49 - 9:51)

Title	A Data Arrangement Method for Block Floating Point Systems
Author	*Takashi Hamabe, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Page	pp. 351 - 356
Keyword	Block floating point, Data arrangement, Memory size
Abstract	Block floating point representation is a representation of real numbers that provides accurate arithmetic with small hardware cost. This research proposes a data arrangement method for block floating point systems considering data memory size. Our method aims to minimize data memory size by grouping real number data which have close absolute values with an algorithm based on the Kernighan and Lin algorithm.

R4-4 (Time: 9:51 - 9:53)

Title	Calling Software Functions from Hardware Functions in High-Level Synthesizer CCAP
Author	*Masanari Nishimura, Nagisa Ishiura, Yoshiyuki Ishimori (Kwansei Gakuin University, Japan), Hiroyuki Kanbara (ASTEM RI, Japan), Hiroyuki Tomiyama (Nagoya University, Japan)
Page	pp. 357 - 360
Keyword	high-level synthesis, CCAP, hardware/software co-design, C-based design
Abstract	We are developing a high-level synthesizer named CCAP (C Compatible Architecture Prototyper), which synthesizes functions in C programs into hardware modules which are callable from the other software functions. In this paper, we propose a novel framework in which the synthesized hardware functions can also call software functions. We give both multi-thread and single-thread implementation schemes. We verified the correctness of the proposed method (single-thread version) through register transfer level simulation.

R4-5 (Time: 9:53 - 9:55)

Title	Performance-Aware Communication Architecture Synthesis
Author	*Alexander Viehl, Oliver Bringmann (FZI Forschungszentrum Informatik, Germany), Wolfgang Rosenstiel (Universität Tübingen, Germany)
Page	pp. 361 - 368
Keyword	Communication Architecture, Synthesis, Performance, Real-Time
Abstract	In this paper, a novel approach for communication architecture synthesis to guarantee conflict-free communication access in real-time critical systems is proposed. Our approach is based on the analysis of the temporal relation of communicating processes and the determination of communication instances that synchronize them. Based on these communication instances, the global system timing behavior is determined to identify potentially parallel communication instances. Based on the result of this analysis, an algorithm for determining a guaranteed conflict free communication schedule is proposed. This schedule can be used to synthesize communication controllers that realize resource allocation and guaranteed conflict-free binding of communication instances. Additionally, the inclusion of high-level communication protocols in the synthesis approach is discussed. Moreover, improvements on timing analysis are proposed with the objective of reducing the necessary amount of communication resources.

R4-6 (Time: 9:55 - 9:57)

Title	A Network Processor Synthesis System for Task-Chaining Network Applications
Author	*Youhua Shi, Keishi Nakayama, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda University, Japan)
Page	pp. 369 - 374
Keyword	network processor, synthesis, task-chaining
Abstract	With the rapid development of network technology, the need has emerged to design network equipment that offers speed, flexibility, and ease of use while accelerating time-to-market. To meet this challenge, in this paper we first present a network processor model and then, based on the model, propose a network processor synthesis system for task-chaining network applications. Unlike previous works, the proposed method has the feature of sharing the communication resources. Experimental results have shown the importance of reducing shared resource contention and have also shown how, using the proposed NP synthesis system, we can find the optimized network processor configurations in terms of performance and area to meet the designer's requirements.

R4-7 (Time: 9:57 - 9:59)

Title	Resynthesis Method for Circuit Acceleration on LUT-based FPGA
Author	*Weijie Xing (Graduate School of Information, Production and Systems, Waseda University, Japan), Takashi Horiyama (Saitama University, Japan), Shunichi Kuromaru, Tomoo Kimura (Matsushita Electric Industrial Co., Ltd, Japan), Shinji Kimura (Graduate School of Information, Production and Systems, Waseda University, Japan)
Page	pp. 375 - 380
Keyword	Verification, acceleration, FPGA, false path
Abstract	Design verification has become the most time-consuming part of the design period, and reducing it is important. In this paper, we focus on the acceleration of emulation circuits, and propose a systematic method to reduce the delay time of combinational circuits called the 0&1 skip method. The proposed method is simpler than the existing method. We apply the 0&1 skip method to the acceleration of circuits on LUT-based FPGAs.

R4-8 (Time: 9:59 - 10:01)

Title	SAT Based Boolean Matching for Incompletely Specified Functions
Author	*Kuo-Hua Wang, Chung-Ming Chan (Fu Jen Catholic University, Taiwan)
Page	pp. 381 - 388
Keyword	Boolean Matching, Boolean Satisfiability, Functional Symmetry, Signature
Abstract	Boolean matching checks the equivalence of two functions under input permutation and input/output phase assignments. In this paper, we transform the Boolean matching problem into the Boolean satisfiability problem. Based on this transformation, a SAT-based matching algorithm is proposed. Our algorithm can handle not only completely specified functions but also incompletely specified functions. Moreover, two signatures exploiting functional symmetries are provided to reduce the size of the SAT instance and thus expedite the matching process. Experimental results on a set of benchmark circuits show that our matching algorithm is effective and efficient in solving the Boolean matching problem. Compared with our prior work on Boolean matching [30], our SAT-based matching algorithm outperforms the old algorithm by several orders of magnitude for many large circuits.

R4-9 (Time: 10:01 - 10:03)

Title	An Error Diagnosis Technique Based on Specifications with Don't Cares
Author	*Narumi Okada, Takayuki Iida, Toshiro Ishihara, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page	pp. 389 - 396
Keyword	error diagnosis, don't cares, ECO, incremental synthesis, design error
Abstract	We present an error diagnosis technique for subcircuits based on specifications with don't cares. This technique combines two procedures for reducing the number of error candidates: screening for false error locations based on the specification defined with nine signal values for incorporating don’t cares, and a Boolean function manipulation using a characteristic function indicating don’t care input vectors for each primary output. Experimental results show that the proposed approach is effective in increasing the number of solutions by incorporating don’t cares.

R4-10 (Time: 10:03 - 10:05)

Title	An LUT-Based Error Diagnosis Technique Extended for Multiple Missing Line Errors Based on Iterative Diagnosis Procedure
Author	*Toshiro Ishihara, Ryosuke Arai, Narumi Okada, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Page	pp. 397 - 404
Keyword	incremental synthesis, error diagnosis, missing line error, iterative procedure
Abstract	In this paper, we propose an improved technique to rectify multiple logic design errors, including multiple missing line errors, in LUT-based combinational circuits. A conventional error diagnosis technique, EXL_SL, can rectify only a single missing line error at a time. Our technique can rectify multiple missing line errors by employing an iterative diagnosis procedure for subcircuits. Experimental results for ISCAS’85 benchmark circuits demonstrate that 79.0% of circuits including one to three missing line errors can be rectified successfully.

R4-11 (Time: 10:05 - 10:07)

Title	Mixed-Abstraction Level Co-Simulation Environment for Dynamically Reconfigurable Processor Arrays
Author	*Satoshi Tsutsumi, Yohei Hasegawa, Hideharu Amano (Keio University, Japan)
Page	pp. 405 - 411
Keyword	Co-simulation, System level design, Dynamically reconfigurable processor, SystemC, Compiler
Abstract	In this paper, we present an automated design methodology and a design framework including System Generator, DRPA Generator, and DRPA Compiler for dynamically reconfigurable processor arrays (DRPAs). We have developed a System Generator which can generate a DRPA model written in SystemC and an interface wrapper using the Verilog Procedural Interface (VPI) from application code and an architecture description. We have integrated it with the tentative compiler based on COINS, and constructed a mixed-abstraction level co-simulation environment.

R4-12 (Time: 10:07 - 10:09)

Title	Black-Diamond: a Retargetable Compiler using Graph with Configuration Bits for Dynamically Reconfigurable Architectures
Author	*Vasutan Tunbunheng, Hideharu Amano (Keio University, Japan)
Page	pp. 412 - 419
Keyword	dynamically reconfigurable processor, retargetable compiler, placement and routing, multicontext
Abstract	To develop a design environment for various types of Dynamically Reconfigurable Processor Arrays (DRPAs), the GCI (Graph with Configuration Information) is proposed to represent the configurable resources in the target dynamically reconfigurable architecture. The function unit, constant unit, register, and routing resource can be represented in the graph as well as the configuration information. Hardware restrictions are added to the graph by using a "DisCounT" port, which limits the possible configuration bits at a port controlled by the other ports. A prototype compiler called Black-Diamond with GCI is now available for three different DRPAs. It translates a data-flow graph from a C-like front-end description, applies placement and routing by using the GCI, and generates configuration data for each element of the DRPA in the form of multicasting. Implementation results of simple applications show that Black-Diamond can generate reasonable designs for three different architectures.

R4-13 (Time: 10:09 - 10:11)

Title	A Reconfigurable Architecture with Special Functions for Shift Keying
Author	*Ayataka Kobayashi, Ittetsu Taniguchi, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Page	pp. 420 - 426
Keyword	Reconfigurable Architecture, shift keying
Abstract	This paper proposes a reconfigurable architecture for shift keying named RASK. RASK has a specialized ALU with specific functions and specialized processing elements for shift keying. Experimental results show that the proposed architecture achieves several shift keyings with small area compared to a reconfigurable architecture without specialized ALU.

R4-14 (Time: 10:11 - 10:13)

Title	Topology Generation and Floorplanning for Low Power Application-Specific Network-on-Chips
Author	*Wan-Yu Lee, Iris Hui-Ru Jiang (Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University, Taiwan)
Page	pp. 427 - 432
Keyword	Network-on-Chips, Low Power
Abstract	As the process advances into nanotechnology, the number of cores and the amount of communication on a chip are rapidly increasing. Using a micro-network, Network-on-Chip can overcome the communication inefficiency in the traditional shared bus communication architecture. The system performance of application-specific Network-on-Chips is mostly measured by power, timing, and area. Power and timing highly depend on how the network topology connects routers and cores and how many routers are used; area is simply determined by floorplanning. Unlike previous endeavors, we propose a new methodology to perform network topology generation before floorplanning. Moreover, our method can preserve the optimality of the topology through floorplanning. Our method not only minimizes power and satisfies timing and area constraints, but also guarantees deadlock freedom. Compared with previous work, the results show that, using the same number of routers, this approach can achieve competitive power consumption while providing the above guarantees.

R4-15 (Time: 10:13 - 10:15)

Title	Floorplan-Aware Design Methodology for Application-Specific Bus Matrix Systems
Author	*Geeng-Wei Lee, Juinn-Dar Huang, Jing-Yang Jou (National Chiao Tung University, Taiwan)
Page	pp. 433 - 438
Keyword	bus matrix, floorplan, multi-cycle communication, communication architecture
Abstract	The design of communication architectures becomes more and more important as modern systems require wider and wider communication bandwidth and the technology keeps the trend of miniaturization. Simultaneously considering the issues of hardware cost, system performance, and multi-cycle communication makes designing communication architectures even harder. In this paper, we propose a floorplan-aware design methodology for designing the bus matrix consisting of the minimum number of buses for a given system under the performance constraints and the assumption of multi-cycle communication.

R4-16 (Time: 10:15 - 10:17)

Title	Low Power Object Oriented Synthesis for Electronic System-Level Design
Author	*Mehdi Kamal, Shaahin Hessabi (Sharif University of Technology, Iran)
Page	pp. 439 - 444
Keyword	Object Oriented, Synthesis, Low Power, System Level
Abstract	Energy and power consumption are becoming some of the most important design factors due to portable device usage. Low-power techniques are widely used at low levels of design; using such techniques in system-level design is likewise inevitable. In this paper, we use two techniques for low-power synthesis of an object-oriented (OO) system. We implement our proposed techniques in an OO synthesis tool named ODYSSEY. We have added module-level clock gating and reduced the number of object data accesses during synthesis, and studied the power reduction of these two techniques. The clock-gating part controls the clock while the system is running and dynamically manages the power. Each class of the design needs its data, so methods must access a shared memory. Therefore, decreasing the number of accesses reduces the power dissipation in the interconnection network and improves the performance of the system. We implemented this technique at the algorithm level. For evaluating the proposed techniques, we have considered JpegDecoder, JpegEncoder and Genetic Algorithm benchmarks. Experiments show that the clock gating technique reduces power dissipation by about 45%. Decreasing the number of object data accesses reduces power and improves the performance of the system.

Invited Talk IV
Time: 11:30 - 12:15 Tuesday, October 16, 2007
Location: Conference Hall (2F)
Chair: Hidetoshi Onodera (Kyoto University, Japan)

I4-1 (Time: 11:30 - 12:15)

Title	Statistical Techniques to Combat Variability and Achieve Robust Design
Author	*Chandu Visweswariah (IBM T. J. Watson Research Center, United States)
Page	p. 447
Abstract	Variability due to manufacturing, environmental and aging uncertainties constitutes one of the major challenges in continuing CMOS scaling. Worst-case design is simply not feasible any more. This presentation will describe how statistical timing techniques can be used to reduce pessimism, achieve full-chip and full-process coverage, and enable robust design practices. A practical ASIC timing methodology based on statistical timing will be described. Model-to-hardware correlation, at-speed test and robust optimization techniques will be presented. Key research initiatives that were required to achieve such a design flow will be described.

Invited Talk V
Time: 13:30 - 14:15 Tuesday, October 16, 2007
Location: Conference Hall (2F)
Chair: Shin'ichi Wakabayashi (Hiroshima City University, Japan)

I5-1 (Time: 13:30 - 14:15)

Title	Current Status of LSI Micro-Fabrication and Future Prospect for 3D System and Design Integration
Author	*Kazuya Okamoto (Osaka University, Japan)
Page	pp. 451 - 457
Abstract	Miniaturization technology for LSI based on Dennard's rule has progressed steadily over the years and has benefited human life. Optical lithography has made remarkable progress so far, achieving high resolution at 90 nm or less using various kinds of technologies. However, there is a low probability that this scenario of producing ever-finer feature geometry will continue, because resolution capabilities will soon reach a critical limit due to the CMOS performance threshold and chip economy. Therefore, to assure continued performance improvements for future LSI devices, next-generation interconnect and advanced packaging technologies should gain importance. In particular, a new 3-dimensional (3D) monolithic integration would be an integral part of this technology. At the same time, the definition of the semiconductor device should be updated to "System&Design Integration (SDI)." SDI will provide the needed feedback to launch a new field of clear applications based on a total system solution with innovative equipment for design, fabrication, inspection and evaluation. 3D-LSI and SDI will have a tremendous impact on the future electronics industries.

Design Verification & Design Experience II
Time: 14:15 - 16:00 Tuesday, October 16, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Chien-Nan Liu (National Central University, Taiwan), Qiang Zhu (Cadence Design Systems, Japan)

R5-1 (Time: 14:15 - 14:17)

Title	Formal Representation and Verification of Arithmetic Circuits Using Symbolic Computer Algebra
Author	*Yuki Watanabe, Naofumi Homma, Takafumi Aoki (Tohoku University, Japan), Tatsuo Higuchi (Tohoku Institute of Technology, Japan)
Page	pp. 461 - 468
Keyword	datapath, arithmetic circuit, formal verification, computer algebra
Abstract	This paper presents an application of symbolic computer algebra to arithmetic circuit design. Our method represents an arithmetic circuit as a hierarchical graph, which consists of high-level mathematical objects based on weighted number systems and arithmetic formulae. We can verify the function of such circuit representation by polynomial reduction techniques using Groebner Bases as well as the conventional *BMD (multiplicative Binary Moment Diagram) techniques. In this paper, we investigate the basic characteristics of the proposed representation and verification through some case studies such as parallel multiplier and BCD (Binary-Coded Decimal) adder. The result shows that the proposed approach succeeded in verifying some arithmetic circuits where the conventional approaches failed.

R5-2 (Time: 14:17 - 14:19)

Title	Range Equivalent Circuit Minimization
Author	*Yung-Chih Chen, Chun-Yao Wang (National Tsing Hua University, Taiwan)
Page	pp. 469 - 476
Keyword	range redundant primary input, range-preserving simplification
Abstract	Simplifying a combinational circuit while preserving its range has a variety of applications, such as combinational equivalence checking and random simulation. Previous approaches use BDD techniques to compute the range of one circuit, and then reconstruct the circuit with the computed range. Although the size of the new circuit is significantly reduced due to the range rearrangement, these methods suffer from the BDD blowup problem for large circuits. Thus, in this paper, we propose a new method to simplify combinational circuits without explicit range computation. We first introduce a new concept of range stuck-at fault test, and show that an untestable range stuck-at fault on a primary input indicates this primary input is range redundant (not responsible for the circuit’s range). We then present a procedure to determine if a given range stuck-at fault on a primary input is untestable. Our method iteratively identifies and removes range redundant primary inputs to simplify a combinational circuit without performing range computation. Accordingly, large circuits that BDD-based methods cannot deal with can be handled. We conduct experiments on a set of ISCAS’85 and MCNC benchmarks. The experimental results show that our approach can minimize circuits such that fewer primary inputs remain. The ratio of our approach and a previous non-BDD-based method over the reduced number of primary inputs is 1.57 on average.

R5-3 (Time: 14:19 - 14:21)

Title	Predictive Test Strategy for CMOS RF Mixers
Author	*Kay Suenaga, Rodrigo Picos, Sebastia Bota, Miquel Roca, Eugeni Isern, Eugeni Garcia-Moreno (University of Balearic Islands, Spain)
Page	pp. 477 - 483
Keyword	CMOS, RF Mixer, Predictive Test, RF Test
Abstract	In this paper, we present two built-in self-test strategies for the down-converter stage in a GSM receiver. These strategies are based on estimating its performance parameters from measurements in test mode. By using some receiver blocks as part of the test set-up and reusing it, the circuitry overhead is kept small. The first strategy uses the LO signal as the only test stimulus. The second strategy uses additional test circuitry, a generator and an auxiliary mixer. Prediction accuracies are similar in both strategies, but the second one simplifies the measurement process of the test observables.

R5-4 (Time: 14:21 - 14:23)

Title	Unifying AMBA based Verification Environment at SystemC / RTL / FPGA Levels: Using 3D Graphics SoC As an Example
Author	*Wei-Sheng Huang, Ruei-Ting Gu, Ing-Jer Huang (National Sun Yat-Sen University, Taiwan)
Page	pp. 484 - 487
Keyword	test-pattern, auto regression test, unify, verification environment
Abstract	This paper presents an AMBA-based mutual-verification environment that unifies verification environments at different levels. It makes it easier to reuse test patterns across verification environments and to automate regression tests. In addition, the mutual-verification environment can reduce verification effort because the level of verification is raised from cycle level to program level. In modern complex IC design, making verification more efficient can reduce costs considerably and improve verification quality.

R5-5 (Time: 14:23 - 14:25)

Title	Hardware/Software Covalidation with FPGA and RTOS Model
Author	*Seiya Shibata, Shinya Honda, Yuko Hara, Hiroyuki Tomiyama, Hiroaki Takada (Nagoya University, Japan)
Page	pp. 488 - 494
Keyword	Covalidation, FPGA, RTOS, Embedded Systems
Abstract	This paper presents a hardware/software covalidation environment for embedded systems. Our covalidation environment consists of a software simulator which simulates a set of application tasks together with an RTOS running on a processor, multiple hardware simulators, FPGA emulators and a covalidation backplane. To shorten validation time, our covalidation environment uses a fast RTOS simulation model for software and FPGAs for hardware. Using the covalidation environment, we successfully performed covalidation of an MPEG4 decoder system.

R5-6 (Time: 14:25 - 14:27)

Title	Pipeline-Aware Instruction-Level Power Analysis for VLIW DSP Core
Author	Wen-Tsan Hsieh, Hsin-Ying Liao, *Chien-Nan Jimmy Liu (National Central University, Taiwan), Shu-Yu Cheng, Ji-Jan Chen (SOC Technology Center of Industrial Technological Research Institute, Taiwan)
Page	pp. 495 - 499
Keyword	software power model, instruction level power analysis, power model, pipeline-aware, VLIW
Abstract	In this work, we develop a new instruction-level power analysis approach for pipelined VLIW DSP cores. The proposed approach can take care of both the base power cost and inter-instruction effect cost in each pipeline stage as well as possible, so the power estimation can be much closer to the real pipeline behavior. The experimental results have shown that the average error of our approach is less than 3%.

R5-7 (Time: 14:27 - 14:29)

Title	Automatic Generation of Custom Interface Transactors for Verification Environments
Author	*Rafael K. Morizawa, Hiroaki Iwashita, Koichiro Takayama (Fujitsu Laboratories, LTD., Japan)
Page	pp. 500 - 506
Keyword	Transactor generation, Testbench generation, Protocol checker
Abstract	The verification cost of complex SoCs has been increasing at a fast pace. Thus it is necessary to cut as much as possible any costs that are not directly associated with the verification task itself. From our experience, we have noticed that building the verification environment (also called testbench) is not an easy task, takes time, and has a negative impact on the overall verification cost. The main reason for the complexity of a verification environment lies in the interfacing between the DUT (Design Under Test) and the testbench. Although standard interface protocols are available, custom complex interface protocols are used instead in order to optimize the hardware's communication throughput and latency. One way to alleviate this problem is to abstract this interfacing by using transactors. In this paper we propose a methodology to automatically generate transactors. We also present a case study where the proposed methodology has been used to build the verification environment of a bus bridge used in a commercial product.

R5-8 (Time: 14:29 - 14:31)

Title	Analog Simulation Meets Digital Verification - A Formal Assertion Approach for Mixed-Signal Verification
Author	*Alexander Jesser, Lars Hedrich (University of Frankfurt a.M., Germany), Stefan Laemmermann, Roland Weiss, Juergen Ruf, Thomas Kropf, Wolfgang Rosenstiel (University of Tuebingen, Germany), Alexander Pacholik, Wolfgang Fengler (Technical University of Ilmenau, Germany)
Page	pp. 507 - 514
Keyword	Analog and mixed-signal design, Verification and simulation, Assertion-based verification, Property specification language
Abstract	Functional and formal verification are important methodologies for complex mixed-signal designs. But there exists a verification gap between the analog and digital blocks of a mixed-signal system. Our approach improves the verification process by creating mixed-signal assertions which are described by a combination of digital assertions and analog properties. The proposed method is a new assertion-based verification flow for designing mixed-signal circuits. The effectiveness of the approach is demonstrated on a sigma/delta-converter.

R5-9 (Time: 14:31 - 14:33)

Title	Encoding Assertions with Dynamic Local Variables for Bounded Property Checking
Author	*Sho Takeuchi, Kiyoharu Hamaguchi, Toshinobu Kashiwabara (Graduate School of Information Science and Technology, Osaka University, Japan)
Page	pp. 515 - 521
Keyword	Assertion-Based Verification, Bounded Model Checking, SystemVerilog, Dynamic Local Variable
Abstract	To perform functional formal verification, bounded property checking for assertions has been proposed. However, it is difficult to handle assertions including dynamic local variables such as in SystemVerilog. In this paper, we assume a restriction for assertions with dynamic local variables that substitution to each dynamic local variable is allowed only once in the assertion at the left-hand side of an implication operator. Under this restriction, we investigate an algorithm for verifying assertions with one storing variable for each dynamic local variable using bounded property checking. We implemented the algorithm and performed some experiments.

R5-10 (Time: 14:33 - 14:35)

Title	Evaluation of All-Digital PLL by Using Clock-Period Comparator
Author	*Yukinobu Makihara, Masayuki Ikebe, Eiichi Sano (Hokkaido University, Japan)
Page	pp. 522 - 528
Keyword	digitally controlled PLL, clock-period comparator, loop characteristic
Abstract	For a digitally controlled phase-locked loop (PLL), we evaluate the use of a clock-period comparator (CPC). In this PLL, only the frequency lock operation should be performed; however, the phase lock operation is also simultaneously achieved by performing the clock-period comparison. In addition, we succeeded in digitizing a voltage controlled oscillator (VCO) with a linear characteristic. We confirmed a phase lock operation with a slight loop characteristic through SPICE simulation.

R5-11 (Time: 14:35 - 14:37)

Title	A Lateral Unified-CBiCMOS Buffer Circuit for Driving 5-nF Maximum Load Capacitance per CCD Clock
Author	*Masatoshi Kobayashi, Takashi Hamahata, Toshiro Akino (Kinki University, Japan), Kenji Nishi (Kinki University Technology College, Japan), Cuong Vo Le, Kohsei Takehara, T. Goji Etoh (Kinki University, Japan)
Page	pp. 529 - 535
Keyword	Slanted linear CCD storage, ISIS, CMOS/SOI, Lateral unified-CBiCMOS
Abstract	Since 2001, we have been developing an in-situ storage image sensor (ISIS) that captures 100 to 150 consecutive images at a frame rate of 1 Mfps and an ultra-high-speed video camera for use with this ISIS. Currently, basic research is continuing in an attempt to increase the frame rate up to 100 Mfps. The CCD chip of this camera has a 10 V maximum voltage supply source and a 5 nF maximum load capacitance per CCD clock. The goal of this study is to design a prototype power supply chip for generating the CCD clock and for driving the load capacitance of the CCD chip. A further goal is to verify the circuit behavior, based on a 1-µm CMOS/SOI process having breakdown voltages of almost 20 V. A lateral unified-CBiCMOS buffer circuit consists of n- and p-channel MOSFETs that include parasitic lateral npn- and pnp-BJTs having partially depleted p- and n-base layers, respectively, on an epitaxial substrate and SOI. A forward current is applied to the base terminal of the channel MOSFET, adding a normal pull-up or pull-down MOSFET as a current source. A new device structure is designed to reduce the resistance values between the drains and the bases, while also keeping both MOSFETs inactive and activating either the lateral npn or pnp BJT. A clock generator consisting of a ring oscillator with a 21-stage CMOS inverter amplified and driven by a buffer circuit is designed. Circuit simulation using 1-µm LEVEL-3 model parameters for the MOSFETs and a current gain of βF = 100 for the BJTs reduced the delay time of the unified-CBiCMOS buffer circuit by approximately 1/4, compared to that for an equivalent two-stage CMOS inverter circuit designed on the basis of logical effort for driving a load capacitance of 5 nF at Vdd = 10 V. The power supply chip with the unified-CBiCMOS buffer circuit can drive the CCD chip at a frame rate of 10 Mfps for a maximum 5-nF load capacitance.

R5-12 (Time: 14:37 - 14:39)

Title	A CMOS Transconductor with Rail-to-Rail Input Stage under 1.8-V Supply Voltage
Author	*Tien-Yu Lo, Cheng-Sheng Kao, Wen-Hung Hsieh, Chung-Chih Hung (National Chiao Tung University, Taiwan)
Page	pp. 536 - 539
Keyword	Transconductor, Rail-to-rail
Abstract	This paper presents a CMOS low-voltage rail-to-rail transconductor operating under a 1.8-V supply voltage. Instead of using an n-type and a p-type differential input pair, we use an n-type and a level-shift n-type differential input pair to design a rail-to-rail input stage. Instead of the reported complex structure, a novel level-shift n-type differential input pair is designed to maintain constant transconductance. This work is designed in TSMC 0.18-µm CMOS technology. Results show that the fluctuation of total transconductance of the proposed transconductor is less than ±3%.

R5-13 (Time: 14:39 - 14:41)

Title	Charge Recycling between Divided Blocks in MTCMOS Circuits
Author	*Akira Tada, Hiromi Notani, Genichi Tanaka, Takashi Ipposhi (Renesas Technology Corporation, Japan), Masaaki Iijima, Masahiro Numa (Kobe University, Japan)
Page	pp. 540 - 544
Keyword	power gating, MTCMOS, low power, charge recycling, leak current
Abstract	An important issue with MTCMOS circuits is the energy consumption for charging virtual P/G lines during the sleep/active mode transitions. Charge recycling is an effective technique. We propose a technique to reuse more charge by dividing a circuit into several blocks, where the charge is transferred between properly selected pairs. Assuming an ideal situation, we can improve the energy saving ratio from 50% up to 63.6%. The proposed method has improved the ratio by 10.0%, and total power by 7.1%.

R5-14 (Time: 14:41 - 14:43)

Title	CoDaMa: An XML-based Framework to Manipulate Control Data Flow Graphs
Author	*Shunitsu Kohara, Youhua Shi, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda University, Japan)
Page	pp. 545 - 549
Keyword	CDFG, XML, framework, HW/SW co-synthesis, high-level synthesis
Abstract	This paper proposes an XML-based framework to manipulate CDFGs (Control Data Flow Graphs) for HW/SW (Hardware / Software) co-synthesis systems or high-level synthesis systems. With the increased scale of recent SoC applications, synthesis systems are required to implement more advanced functions, which results in increased development effort. Developers using our framework can implement algorithms and construct the systems easily by using XML descriptions as an intermediate representation of application programs and by providing the input/output interface.

Panel Discussion
Time: 16:00 - 17:30 Tuesday, October 16, 2007
Location: Conference Hall (2F)

D-1 (Time: 16:00 - 17:30)

Title	The End of Traditional CMOS
Author	*Moderator: Raul Camposano (Xoomsys, United States), Panelists: Gul Agha (University of Illinois at Urbana-Champaign, United States), Yasuhiko Hagihara (Device Platforms Research Laboratories, NEC Corporation, Japan), Igor Markov (University of Michigan, United States), Chandu Visweswariah (IBM T. J. Watson Research Center, United States)
Page	p. 553
Abstract	The rumors of CMOS’ death have been greatly exaggerated. After 2 decades as the workhorse of the electronics industry, device counts have scaled by ~10⁴ and speeds by ~10². Transistor “performance” has consequently scaled by ~10⁶ and has arguably been the main driver of system performance. CMOS isn’t showing signs of ending its reign any time soon. Or is it? The panel will discuss this question, in particular the following positions:
• Business as usual. “Simple” (Dennard) scaling has not been simple for decades; it’s just getting a bit harder but is essentially nothing new. We have dealt with new processes, materials, devices, circuits for a long time. We can make it below 10nm.
• The problem is really economic. Scores of companies are exiting the fab business already. Scaling will most definitely end some day, and the end is coming slowly. Depending on the volume, different applications are getting stuck at different nodes. But CMOS will continue to be the principal game in town for a long time. So, it is more important to look at what we can do (design) with silicon than to further scale it.
• CMOS will be “hybridized” by add-on technologies, for example to increase communication speed both on- and off-chip, or to produce small, very fast non-volatile memories. Main candidates: optical and nanoswitches.
• We ought to look at something new like Quantum Computing. Some niche applications like cryptography will benefit greatly and will drive the development of such new technologies. Power, reliability and material limits (among others) will prevent further progress in CMOS.
