
The 14th Workshop on Synthesis And System Integration of Mixed Information technologies
Technical Program

Remark: The presenter of each paper is marked with "*".

Session Schedule


Monday, October 15, 2007

9:10 - 9:20     Opening
9:20 - 10:20    K    Keynote Speech  [Conference Hall (2F)]
10:20 - 12:05   R1   Design Experience I  [Conference Hall (2F) & Poster Room (2F)]
12:05 - 13:25   Lunch
13:25 - 14:10   I1   Invited Talk I  [Conference Hall (2F)]
14:10 - 15:50   R2   FPGA, Place & Route  [Conference Hall (2F) & Poster Room (2F)]
15:50 - 16:35   I2   Invited Talk II  [Conference Hall (2F)]
16:35 - 18:15   R3   Design Methodology for Nanometer Era  [Conference Hall (2F) & Poster Room (2F)]
18:30 - 20:30   Banquet

Tuesday, October 16, 2007

9:00 - 9:45     I3   Invited Talk III  [Conference Hall (2F)]
9:45 - 11:30    R4   System Level Design & Logic Synthesis  [Conference Hall (2F) & Poster Room (2F)]
11:30 - 12:15   I4   Invited Talk IV  [Conference Hall (2F)]
12:15 - 13:30   Lunch
13:30 - 14:15   I5   Invited Talk V  [Conference Hall (2F)]
14:15 - 16:00   R5   Design Verification & Design Experience II  [Conference Hall (2F) & Poster Room (2F)]
16:00 - 17:30   D    Panel Discussion  [Conference Hall (2F)]
17:30 - 17:40   Closing



List of Papers


Monday, October 15, 2007

Keynote Speech
Time: 9:20 - 10:20 Monday, October 15, 2007
Location: Conference Hall (2F)
Chair: Shinji Kimura (Waseda University, Japan)

K-1 (Time: 9:20 - 10:20)
Title: Future Design Paradigms: Technologies, Circuits and Architectures
Author: *Giovanni De Micheli (CSI, EPFL, Switzerland)
Page: p. 3
Abstract: The scaling of CMOS technology is soon coming to an end, and yet it is unclear whether CMOS devices in the 10-20 nanometer range will find a useful place in semiconductor products. At the same time, new silicon-based technologies (e.g., silicon nanowires) and non-silicon-based ones (e.g., carbon nanotubes) show the promise of replacing traditional transistors. Within this rich set of possibilities, we will increasingly see a hybridization of technologies aimed at specific objectives, such as seamless interfacing to embedded sensors, ultra-low power consumption, biological probing, etc. For the technology to be widely applicable, specific architectures will be required, as well as design tools and methodologies.


Design Experience I
Time: 10:20 - 12:05 Monday, October 15, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Chun-Yao Wang (National Tsing Hua University, Taiwan), Tohru Ishihara (Kyushu University, Japan)

R1-1 (Time: 10:20 - 10:22)
Title: Power-Conscious Synthesis of Parallel Prefix Adders under Bitwise Timing Constraints
Author: *Taeko Matsunaga, Shinji Kimura (Waseda University, Japan), Yusuke Matsunaga (Kyushu University, Japan)
Page: pp. 7 - 14
Keyword: parallel prefix adder, switching activity, power, timing constraints, arithmetic synthesis
Abstract: The global structure of a parallel prefix adder can be synthesized flexibly depending on its context, such as bitwise input/output timing constraints. In this paper, an approach for power-conscious synthesis of parallel prefix adders is proposed. Global structures of parallel prefix adders are represented as prefix graphs. The switching cost of a prefix graph is defined from the switching activities of its nodes and is minimized by extending our area minimization algorithms. The approach accepts bitwise input/output timing constraints and, for each input bit, the probability that the signal value is one, and it minimizes the total switching activity for each distinct context. Computing switching activities with an OBDD-based method makes the approach efficient. Experimental results show the effectiveness of our approach compared to existing regular parallel prefix adders.
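For readers new to prefix graphs: every node of a prefix graph applies the associative carry operator to (generate, propagate) pairs, and the graph shape decides which partial groups are formed. The sketch below is a minimal illustration assuming a fixed Sklansky (divide-and-conquer) prefix graph, one of the regular structures that synthesized adders are compared against; it is not the power-conscious synthesis method of the paper, and all function names are hypothetical.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate) bit pair of a bit group


def combine(hi: GP, lo: GP) -> GP:
    """Carry operator: merge a more significant group (hi) with a less significant one (lo)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def sklansky_prefix(gp: List[GP]) -> List[GP]:
    """Return the group (G, P) of bits 0..i for every i, using a Sklansky prefix graph."""
    out = list(gp)
    span = 1
    while span < len(out):
        for i in range(len(out)):
            if (i // span) % 2 == 1:
                # Node in the upper half of a 2*span block: combine it with the
                # last node of the lower half (which is not modified at this level).
                j = (i // span) * span - 1
                out[i] = combine(out[i], out[j])
        span *= 2
    return out


def prefix_add(a: int, b: int, width: int, cin: int = 0) -> int:
    """Add two width-bit numbers; the result includes the carry-out bit."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(width)]
    groups = sklansky_prefix(gp)
    # carries[i] is the carry into bit i; carries[width] is the carry-out.
    carries = [cin] + [g | (p & cin) for g, p in groups]
    total = 0
    for i in range(width):
        total |= (gp[i][1] ^ carries[i]) << i
    return total | (carries[width] << width)
```

For example, `prefix_add(11, 6, 4)` returns 17. A synthesis tool in the spirit of the paper would instead search over prefix-graph shapes, trading node count, depth per bit and switching activity.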

R1-2 (Time: 10:22 - 10:24)
Title: Design of a Combined Circuit for Multiplication and Inversion in GF(2^m)
Author: *Katsuki Kobayashi, Naofumi Takagi (Nagoya University, Japan)
Page: pp. 15 - 20
Keyword: GF(2^m), multiplication, inversion
Abstract: A combined circuit for multiplication and inversion in GF(2^m) is proposed. We combine the inversion algorithm proposed by Yan et al., which is based on the extended Euclidean algorithm, with the MSB-first multiplication algorithm, exploiting the similarities between them so that multiplication and inversion can share almost all hardware components of the circuit. The area of the circuit is estimated to be approximately 40% smaller than the total area of an ordinary multiplication circuit plus an ordinary inversion circuit.
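The MSB-first (left-to-right) multiplication mentioned above scans the multiplier from its most significant bit; at each step the partial product is multiplied by x, reduced modulo the field polynomial, and conditionally XORed with the multiplicand. A minimal bit-level sketch follows, assuming a polynomial basis; the AES field polynomial x^8 + x^4 + x^3 + x + 1 is used only as a familiar test case, and the combined inversion datapath of the paper is not modeled.

```python
def gf2m_mul_msb_first(a: int, b: int, poly: int, m: int) -> int:
    """Multiply a and b in GF(2^m) with the MSB-first algorithm.

    poly is the irreducible polynomial including its x^m term,
    e.g. 0x11B for x^8 + x^4 + x^3 + x + 1 (m = 8).
    """
    result = 0
    for i in reversed(range(m)):
        result <<= 1                 # multiply the partial product by x
        if (result >> m) & 1:        # degree reached m: reduce modulo poly
            result ^= poly
        if (b >> i) & 1:             # add a when multiplier bit i is set
            result ^= a
    return result
```

For example, `gf2m_mul_msb_first(0x57, 0x83, 0x11B, 8)` returns 0xC1, matching the worked example in the AES specification (FIPS-197). Each iteration needs only a shift, a conditional reduction and a conditional addition (XOR), which suggests why such a datapath can be shared with other GF(2^m) operations.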

R1-3 (Time: 10:24 - 10:26)
Title: Associative Memory Design Realizing Reference-Pattern Recognition and Learning based on Short/Long-Term Storage Concept
Author: *Shogo Sakakibara, Md. Anwarul Abedin, Yuki Tanaka, Ali Ahmadi, Hans Jürgen Mattausch, Tetsushi Koide (Hiroshima University, Japan)
Page: pp. 21 - 25
Keyword: Associative Memory, Short/Long-term memory
Abstract: This work applies an associative memory architecture that searches for the most similar data among previously stored reference data. Its mixed digital-analog, fully parallel nearest-match search circuitry achieves high speed, low power consumption and a small implementation area. The learning capability is based on the concept of short/long-term memory and attempts to mimic the function of the human brain. A complete LSI test chip was designed in 0.35 µm CMOS technology to verify this architecture.

R1-4 (Time: 10:26 - 10:28)
Title: Acceleration of Advanced Encryption Standard (AES) Processing on a CAM Enhanced Super Parallel SIMD Processor
Author: *Masaharu Tagami, Masakatsu Ishizaki, Takeshi Kumaki, Yutaka Kono, Tetsushi Koide, Hans Jürgen Mattausch (Hiroshima University, Japan), Takayuki Gyohten, Hideyuki Noda, Katsumi Dosaka, Kazutami Arimoto, Kazunori Saito (Renesas Technology Corporation, Japan)
Page: pp. 26 - 31
Keyword: super parallel SIMD processor, AES, CAM, multimedia processing, pattern matching
Abstract: This paper presents an Advanced Encryption Standard (AES) implementation on a Content Addressable Memory (CAM) enhanced super-parallel SIMD processor. The proposed SIMD processor architecture achieves 40 GOPS for 16-bit additions at a 200 MHz clock frequency with 250 mW power dissipation. AES processing includes a table conversion step. We use an integrated CAM to which the SIMD processor can offload the table conversion for fast processing. As a result, high-speed AES execution is realized on the proposed architecture.
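The table conversion in AES is presumably the SubBytes step, a fixed 256-entry byte substitution (the S-box); this is our reading, since the abstract does not name the step. To illustrate the kind of table that would be offloaded to the CAM, the sketch below derives the standard AES S-box from its definition (multiplicative inverse in GF(2^8) followed by an affine transform) and applies it as a lookup. It says nothing about how the CAM stores or matches the entries.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254; AES maps 0 to 0."""
    if a == 0:
        return 0
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def rotl8(x: int, k: int) -> int:
    return ((x << k) | (x >> (8 - k))) & 0xFF


def sbox_entry(x: int) -> int:
    b = gf_inv(x)
    # Affine transform from the AES specification, written with byte rotations.
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]  # the 256-entry conversion table


def sub_bytes(state: bytes) -> bytes:
    """SubBytes: replace every state byte by its table entry."""
    return bytes(SBOX[v] for v in state)
```

For example, `SBOX[0x53]` is 0xED, as in the AES specification. In a CAM-based design, the input byte would act as the search key and the matching entry would return the substituted byte, so all bytes of a state could be looked up without a per-byte indexed memory access.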

R1-5 (Time: 10:28 - 10:30)
Title: Hardware Realization of Two-Stage Pattern Matching System using Fully-Parallel Associative Memories
Author: *Md. Anwarul Abedin, Yuki Tanaka, Shogo Sakakibara, Ali Ahmadi, Tetsushi Koide, Hans Jürgen Mattausch (RCNS, Hiroshima University, Japan)
Page: pp. 32 - 37
Keyword: associative memory, pattern matching, fully parallel search, mixed digital/analog circuit
Abstract: A hardware realization of a cascaded fully parallel associative memory with two-stage winner search is proposed. The architecture uses two different types of associative memory. One is based on k-nearest-matches search, and the other is a special type of associative memory in which the winner search is performed only among the activated reference patterns. The reference patterns in the second associative memory are activated by the first associative memory after it has found the k nearest matches. We have already designed, fabricated and tested the two associative memories separately. Here, the complete two-stage pattern matching system is tested with MATLAB software, and its hardware realization is currently being designed.
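The two-stage search can be modeled in software as a k-nearest-match search followed by a winner search restricted to the activated rows. The sketch below assumes Manhattan distance, a single query shared by both stages, and a hypothetical `activation` table mapping each first-stage reference to the second-stage rows it enables; the abstract does not specify any of these details.

```python
from typing import Dict, List, Sequence


def manhattan(x: Sequence[int], y: Sequence[int]) -> int:
    return sum(abs(a - b) for a, b in zip(x, y))


def k_nearest(query: Sequence[int], refs: List[Sequence[int]], k: int) -> List[int]:
    """Stage 1: indices of the k references closest to the query (ties broken by index)."""
    order = sorted(range(len(refs)), key=lambda i: (manhattan(query, refs[i]), i))
    return order[:k]


def two_stage_search(query: Sequence[int], stage1_refs: List[Sequence[int]], k: int,
                     activation: Dict[int, List[int]],
                     stage2_refs: List[Sequence[int]]) -> int:
    """Stage 2: winner search restricted to rows activated by the stage-1 matches."""
    active = {row for i in k_nearest(query, stage1_refs, k) for row in activation[i]}
    if not active:
        raise ValueError("no reference pattern was activated")
    return min(active, key=lambda r: (manhattan(query, stage2_refs[r]), r))
```

Rows that are not activated are ignored even when they are closer to the query, which is the point of the cascade: the first stage narrows the candidate set and the second stage decides among it.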

R1-6 (Time: 10:30 - 10:32)
Title: A Fast Differential-Amplifier-Based Winner-Search Circuit for Fully Parallel Associative Memories
Author: *Yuki Tanaka, Md. Anwarul Abedin, Shogo Sakakibara, Tetsushi Koide, Hans Jürgen Mattausch (Hiroshima University, Japan)
Page: pp. 38 - 41
Keyword: associative memory, nearest search, digital-analog circuit, differential amplifier
Abstract: A mixed digital-analog fully parallel associative memory with a differential amplifier for winner search is proposed. Using the proposed differential amplifier for winner search improves the speed, reliability and area efficiency of associative-memory-based systems. The test chip occupies 5.48 mm^2 in 0.35 µm CMOS technology for 64 reference patterns, each consisting of 16 binary elements of 5 bits. The operation time of the system is less than 78 ns, with an average power consumption of around 132 mW.

R1-7 (Time: 10:32 - 10:34)
Title: Reducing the Dynamic Energy Consumption in the Multi-Layer Memory of Embedded Multimedia Processing Systems
Author: *Ilie I. Luican (University of Illinois at Chicago, United States), Hongwei Zhu (ARM, Inc., United States), Florin Balasa (Southern Utah University, United States), Dhiraj K. Pradhan (University of Bristol, Great Britain)
Page: pp. 42 - 48
Keyword: memory management, embedded systems, dynamic energy
Abstract: The memories in data-intensive signal processing systems (including video and image processing, artificial vision, real-time 3-D rendering, advanced audio and speech coding, and medical imaging applications) have an important impact on the overall energy budget. This paper focuses on reducing the dynamic energy consumption in the memory subsystem, starting from the high-level algorithmic specification of the application. Our approach uses elements of the theory of polyhedra and relies on a variety of algebraic techniques specific to the data-flow analysis used in modern compilers.

R1-8 (Time: 10:34 - 10:36)
Title: An Output Probability Computation Circuit Design for Real Time Speech Recognition
Author: *Joe Hashimoto, Akihiko Eguchi, Makoto Saituji (Kinki University, Japan), Akihisa Yamada (Sharp Corporation, Japan), Takashi Kambe (Kinki University, Japan)
Page: pp. 49 - 55
Keyword: Speech recognition, C-based architecture design, memory access method, application specific arithmetic circuit, Bach system
Abstract: Speech recognition is becoming a popular technology for implementing human interfaces. However, conventional approaches to large-vocabulary continuous speech recognition require a high-performance CPU. In this paper, we describe a speech recognition system designed with a C-based architecture design methodology. Pipelined and parallel processing circuits, accelerated by data buffering, memory separation and loop unrolling, were implemented to calculate the Hidden Markov Model (HMM) output probability at high speed, and their performance was evaluated. The results show that real-time speech recognition is possible in small portable systems.
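In HMM-based recognizers, the output probability of a state is commonly modeled as a mixture of diagonal-covariance Gaussians and evaluated in the log domain. The sketch below assumes that model; the abstract does not state the paper's exact output distribution, mixture count or number format. The nested loops over mixture components and feature dimensions are the kind of computation that pipelining, parallel processing and loop unrolling can speed up.

```python
import math
from typing import Sequence


def log_gaussian_diag(o: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian at feature vector o."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(o, mean, var)
    )


def log_output_prob(o, weights, means, variances) -> float:
    """log b(o) = log sum_k w_k N(o; mean_k, var_k), combined with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(o, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    top = max(terms)  # subtract the largest term for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

In hardware, the per-component constant (log weight plus the sum of log(2*pi*v)) is typically precomputed, leaving a multiply-accumulate over (x - m)^2 / v for each dimension.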

R1-9 (Time: 10:36 - 10:38)
Title: A Hybrid Memory Architecture for Low Power Embedded System Design
Author: *Tadayuki Matsumura, Yuriko Ishitobi (Kyushu University, Japan), Tohru Ishihara, Maziar Goudarzi (System LSI Research Center, Kyushu University, Japan), Hiroto Yasuura (Kyushu University, Japan)
Page: pp. 56 - 62
Keyword: low power, on-chip memory, leakage, design, scratchpad
Abstract: On-chip memories are among the most power-hungry components of today's systems-on-chip (SoCs). On-chip memories generally use higher Vdd and Vth than logic parts to suppress static power consumption without increasing the memory access delay. This design policy, however, increases dynamic power consumption, since dynamic power is proportional to the square of Vdd. This paper proposes a hybrid memory architecture that consists of two regions: 1) a frequently accessed region that uses low Vdd and Vth, and 2) a rarely accessed region that uses high Vdd and Vth. The key feature of our architecture is that both regions have the same access delay, which makes it easy to integrate this memory into processors without modifying the internal processor architecture. This paper also proposes a technique for finding the sizes of the regions and the code allocation to them so as to minimize the total power consumption of the memory. Experimental results demonstrate that the total power consumption of the scratchpad memory can be reduced in all cases.
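One way to picture the size-and-allocation problem is as a 0/1 knapsack: frequently accessed code blocks are packed into the low-Vdd/low-Vth region up to its size, and the region size is then swept to trade the dynamic-energy saving against the extra leakage of the low-Vth cells. This formulation, the linear energy model and every parameter below are illustrative assumptions, not the technique of the paper.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (size in words, access count) of a code block


def allocate(blocks: List[Block], capacity: int) -> Tuple[int, List[int]]:
    """0/1 knapsack: choose blocks for the low-Vdd region maximizing accesses."""
    best = [0] * (capacity + 1)
    chosen: List[List[int]] = [[] for _ in range(capacity + 1)]
    for idx, (size, accesses) in enumerate(blocks):
        # Iterate capacities downwards so each block is used at most once.
        for c in range(capacity, size - 1, -1):
            if best[c - size] + accesses > best[c]:
                best[c] = best[c - size] + accesses
                chosen[c] = chosen[c - size] + [idx]
    return best[capacity], chosen[capacity]


def choose_region_size(blocks: List[Block], e_low: float, e_high: float,
                       leak_low: float, leak_high: float) -> Tuple[float, int, List[int]]:
    """Sweep the low-Vdd region size and return (energy, size, chosen blocks).

    e_low/e_high: hypothetical energy per access in each region.
    leak_low/leak_high: hypothetical leakage energy per word in each region.
    The high-Vdd region is sized to hold exactly the blocks left over.
    """
    total_words = sum(size for size, _ in blocks)
    total_accesses = sum(acc for _, acc in blocks)
    best = None
    for low_words in range(total_words + 1):
        acc_low, chosen = allocate(blocks, low_words)
        high_words = total_words - sum(blocks[i][0] for i in chosen)
        energy = (e_low * acc_low + e_high * (total_accesses - acc_low)
                  + leak_low * low_words + leak_high * high_words)
        if best is None or energy < best[0]:
            best = (energy, low_words, chosen)
    return best
```

With blocks `[(4, 100), (3, 60), (2, 50)]` and costs `(1, 4, 70, 1)`, the sweep picks a 6-word low-Vdd region holding blocks 0 and 2: a larger region would no longer pay for its leakage under these made-up numbers.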

R1-10 (Time: 10:38 - 10:40)
Title: An Accurate and Efficient Lane Recognition Algorithm for Automotive Active Safety System
Author: *Yusuke Watanabe, Masahiro Fukui (Ritsumeikan University, Japan)
Page: pp. 63 - 68
Keyword: image filter, automobile, lane recognition
Abstract: Lane recognition is an essential technique for automotive active safety applications. We aim to develop a fast and highly accurate lane recognition system. The proposed algorithm provides an efficient filter that extracts candidate lane edges and rejects noise edges to reduce misrecognition as much as possible. It is implemented with simple hardware logic.

R1-11 (Time: 10:40 - 10:42)
Title: Performance Evaluation of Region-Growing Image Segmentation Using Two-Dimensional Image-Block Scanning
Author: *Keita Okazaki, Kazutoshi Awane, Kosuke Yamaoka, Tetsushi Koide, Hans Jürgen Mattausch (Hiroshima University, Japan)
Page: pp. 69 - 73
Keyword: block-scanning
Abstract: We report a two-dimensional block-scanning image segmentation architecture, based on a region-growing approach, that is capable of real-time execution. Using two techniques, a scan limited to the boundary of each grown region and an exhaustive block-internal growing process, we have improved processing speed, power consumption and hardware efficiency in comparison with the previous state of the art. In particular, processing speed is maximized and processing-circuit size is minimized by adjusting the number of pixels within the scanning block, the memory configuration and the memory-access method.
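For reference, the basic region-growing criterion can be written as a breadth-first flood fill: a pixel joins the region of a seed if it is 4-connected to the region and its intensity lies within a threshold of the seed's intensity. This software sketch assumes that criterion and a raster-order choice of seeds; it does not model the paper's two-dimensional block scanning, boundary-limited scan or memory organization.

```python
from collections import deque
from typing import List


def region_grow(img: List[List[int]], threshold: int) -> List[List[int]]:
    """Label 4-connected regions whose pixels lie within threshold of the seed value."""
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue  # pixel already belongs to a grown region
            seed = img[sy][sx]
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(img[ny][nx] - seed) <= threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels
```

The queue-driven growth touches pixels in an irregular order, which is what makes a straightforward hardware implementation memory-bound; restricting the scan to region boundaries within fixed-size blocks is one way to regularize that access pattern.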

R1-12 (Time: 10:42 - 10:44)
Title: An Effective Parallel Coding Architecture Utilizing Characteristics of Multimedia Application
Author: *Takeshi Kumaki, Masakatsu Ishizaki, Masaharu Tagami, Tetsushi Koide, Hans Jürgen Mattausch (Hiroshima University, Japan)
Page: pp. 74 - 80
Keyword: Content addressable memory, CAM, Parallel coding, Multiport, Huffman coding
Abstract: This paper presents a parallel coding architecture using a flexible multi-ported content addressable memory (CAM). The previously reported Flexible Multi-port Content Addressable Memory (FMCAM) technology is improved with additional schemes for a single search mode and count value setting, which enable fast parallel coding operation. Moreover, an inactive category suspend mode becomes possible, which reduces power consumption. Evaluation results for Huffman encoding within the JPEG application show that the proposed architecture needs 93% fewer clock cycles for encoding than a conventional DSP. The power consumption during data transmission between the memory block and the processing block of the improved FMCAM is estimated to be about 90% lower than that of the original FMCAM. Furthermore, the performance per unit area, measured in MOPS/mm^2, is improved by a factor of 3.8 in comparison with a conventional DSP.
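Huffman encoding, the workload evaluated above, replaces each symbol by a prefix-free codeword taken from a code table; the per-symbol table lookup is the part a CAM can perform associatively. A minimal software sketch of building such a table from symbol counts and encoding with it follows. How the tables are mapped onto the FMCAM is not described in the abstract, and the function names are hypothetical.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_code(freqs: Dict[str, int]) -> Dict[str, str]:
    """Build a prefix-free code table from symbol frequencies."""
    tie = count()  # tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)
        f2, _, high = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        merged = {sym: "0" + code for sym, code in low.items()}
        merged.update({sym: "1" + code for sym, code in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(data: str, table: Dict[str, str]) -> str:
    """Encoding is a pure table lookup per symbol."""
    return "".join(table[sym] for sym in data)
```

For the counts `{"a": 5, "b": 2, "c": 1, "d": 1}` the code lengths are 1, 2, 3 and 3 bits. Because each symbol's lookup is independent, several symbols can be looked up at once, which is the opportunity a multi-ported CAM exploits.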

R1-13 (Time: 10:44 - 10:46)
Title: VLSI Architecture for Real-time Retinex Video Image Enhancement
Author: *Kazuyuki Takahashi, Yoshihiro Nozato (Osaka University, Japan), Hiroyuki Okuhata (Synthesis Corporation, Japan), Takao Onoye (Osaka University, Japan)
Page: pp. 81 - 86
Keyword: video image enhancement, Retinex, variational model
Abstract: A real-time VLSI architecture for Full HD 1080i video image enhancement is proposed, based on a variational approach to the Retinex algorithm. To efficiently reduce the enormous computational cost of image enhancement, the processing layers and the number of iterations are determined from software evaluation results. Pipelined and parallel processing of pixels also contributes to real-time processing of high-resolution pictures. In addition, using the illumination signal calculated for the previous frame instead of the current frame reduces the required frame memory size. As a result, the proposed architecture with four-way parallelization, which can be implemented with 100K gates, processes 1,920x1,080, 30 fps images in real time at 24 MHz operation.

R1-14 (Time: 10:46 - 10:48)
Title: ΣΔ-Modulator with High Nearby Interferers Suppression by Transmission Zeroes
Author: *Takashi Moue, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page: pp. 87 - 90
Keyword: Delta Sigma modulator, A/D conversion, CMOS
Abstract: A Delta Sigma modulator that strongly suppresses nearby interferers by forming zeroes in the signal transfer function has been proposed and demonstrated. Feedforward signal paths from the input terminal to each integrator form zeroes in the signal transfer function that strongly suppress nearby interferers, which often severely degrade A/D conversion quality and cause serious instability. A prototype discrete-time 6th-order Delta Sigma modulator with a signal bandwidth of 777 kHz was fabricated in 0.18 µm CMOS technology and demonstrated 20 dB suppression of adjacent-channel signals from 2.65 MHz to 8.22 MHz and an SNR of 59 dB for in-band signals.

R1-15 (Time: 10:48 - 10:50)
Title: The Effects of Switch Resistances on Pipelined ADC Performances and the Optimization for the Settling Time
Author: Masaya Miyahara, *Hiroki Endou, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page: pp. 91 - 96
Keyword: analog to digital converter, switched capacitor amplifier, switch resistance, pipeline operation
Abstract: In this paper, we discuss the effects of switch resistances on the step response of switched-capacitor (SC) circuits, especially multiplying digital-to-analog converters (MDACs) in pipelined analog-to-digital converters. Theory and simulation results reveal that the settling time of MDACs can be decreased by optimizing the switch resistances. This optimization effectively increases the speed not only of single-bit MDACs but also of multi-bit MDACs. Moreover, multi-bit MDACs are faster than single-bit MDACs when slewing occurs during the step response. With this optimization, the response of the switch is improved by up to 50%.

R1-16 (Time: 10:50 - 10:52)
Title: A 12-bit 3.7-Msample/s Pipelined A/D Converter Based on the Novel Capacitor Mismatch Calibration Technique
Author: *Shuaiqi Wang (Graduate School of Information, Production, and System, Waseda University, Japan), Fule Li (Institute of Microelectronics, Tsinghua University, China), Yasuaki Inoue (Graduate School of Information, Production, and System, Waseda University, Japan)
Page: pp. 97 - 103
Keyword: A/D conversion, pipelined, capacitor mismatch calibration, low power dissipation
Abstract: This paper proposes a 12-bit 3.7-MS/s pipelined A/D converter based on a novel capacitor mismatch calibration technique. The conventional stage is turned into an algorithmic circuit involving charge summing, capacitor exchange and charge redistribution, simply by introducing a few extra switches into the analog circuit. With this calibration technique, the proposed ADC achieves linearity beyond the accuracy of capacitor matching and reduces the nonlinear error caused by capacitor mismatch to second order without additional power dissipation or chip area. The ADC is designed in a 0.5 µm CMOS process. Simulation results show an SNDR of 71.7 dB and an SFDR of 77.9 dB for a 2 Vpp, 500 kHz sine input sampled at 3.7 MS/s. The total power dissipation of the ADC is 33.46 mW from a 5 V supply.


Invited Talk I
Time: 13:25 - 14:10 Monday, October 15, 2007
Location: Conference Hall (2F)
Chair: Takao Onoye (Osaka University, Japan)

I1-1 (Time: 13:25 - 14:10)
Title: Reconfigurable Architecture: Challenges and Impacts for Multimedia
Author: Chung-Jr Lian, You Ming Tsao, *Liang-Gee Chen (National Taiwan University, Taiwan)
Page: pp. 107 - 111
Abstract: Consumer electronics applications have become the driving force for the growth of semiconductor technology. The variety of multimedia applications, with a wide range of real-time demands, requires different computing architectures. Reconfigurable architectures are the most promising technique. In this presentation, the design concept of reconfigurable architecture for multimedia applications is introduced. By introducing well-designed reconfigurability into application-specific circuits, a design can provide not only good performance in terms of area, speed and power, but also flexibility across different modes, parameters and fast algorithms. Power awareness can therefore also be realized on the basis of reconfigurable architecture. Several design cases will be included in this talk: a reconfigurable architecture for MPEG-4 and H.264/AVC, a scalable architecture for JPEG 2000, and a video processing unit (VPU) with reconfigurable memory. Software and system optimization issues will also be addressed.


FPGA, Place & Route
Time: 14:10 - 15:50 Monday, October 15, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Hung-Ming Chen (National Chiao Tung University, Taiwan), Yasuhiro Takashima (The University of Kitakyushu, Japan)

R2-1 (Time: 14:10 - 14:12)
Title: A BCH Decode Accelerator for Application Specific Processors
Author: *Kazuhito Ito (Saitama University, Japan)
Page: pp. 115 - 121
Keyword: BCH, accelerator, processor
Abstract: The BCH code is a popular error correction code (ECC), and decoding BCH codes requires many bit-oriented operations as well as word-oriented operations. A dedicated hardware BCH decoder is less flexible, while decoding BCH codes on the base processor consumes many instructions for bit operations and requires a large memory area for lookup tables. In this paper, we propose an auxiliary circuit, included in application-specific pipelined processors, that accelerates the BCH decoding process.
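To illustrate the bit-level, table-driven work involved, the sketch below computes the syndromes S_j = r(alpha^j), j = 1..2t, of a received word using log/antilog tables for GF(2^4); syndrome computation is the first step of BCH decoding. The field (primitive polynomial x^4 + x + 1), the code length 15 and t = 2 are illustrative choices, not taken from the paper, and the proposed accelerator itself is not modeled.

```python
from typing import List

M = 4
N = (1 << M) - 1     # code length 15
PRIM = 0b10011       # primitive polynomial x^4 + x + 1

# Antilog (EXP) and log (LOG) tables for GF(2^4).
EXP = [0] * N
LOG = [0] * (N + 1)
value = 1
for power in range(N):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & (1 << M):
        value ^= PRIM


def syndromes(received: List[int], t: int) -> List[int]:
    """S_j = r(alpha^j) for j = 1..2t; received[i] is the coefficient of x^i."""
    result = []
    for j in range(1, 2 * t + 1):
        s = 0
        for i, bit in enumerate(received):
            if bit:
                s ^= EXP[(i * j) % N]   # add alpha^(i*j) in GF(2^4)
        result.append(s)
    return result
```

For a single error at position i, S_1 = alpha^i, so `LOG[S_1]` locates it directly; correcting t = 2 errors additionally needs an error-locator polynomial (for example via Berlekamp-Massey) and a Chien search, which are not shown.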

R2-2 (Time: 14:12 - 14:14)
Title: Design and FPGA Implementation of a High-Speed String Matching Engine
Author: *Yosuke Kawanaka, Shin'ichi Wakabayashi, Shinobu Nagayama (Hiroshima City University, Japan)
Page: pp. 122 - 129
Keyword: string matching, FPGA, special-purpose hardware, regular expressions
Abstract: A high-speed string matching circuit for searching a pattern in a given text is proposed. In the circuit, a pattern is specified by a class of restricted regular expressions. The architecture of the circuit is a one-dimensional array of simple processing units. The proposed circuit was designed with Verilog-HDL, and was implemented using a Xilinx Virtex4 chip.
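A software analogue of a one-dimensional array of simple matching units is the bit-parallel Shift-And algorithm: bit k of the state plays the role of unit k, which becomes active when unit k-1 was active on the previous character and the current character matches pattern element k. The sketch assumes a small pattern syntax (literal characters, '.' as a wildcard, and bracketed classes such as [bc]); the class of restricted regular expressions supported by the paper's circuit is not given in the abstract, so this subset is an assumption.

```python
from typing import List, Optional, Set


def parse_pattern(pattern: str) -> List[Optional[Set[str]]]:
    """Literal characters, '.' (any character) and [abc] classes only."""
    elems: List[Optional[Set[str]]] = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)
            elems.append(set(pattern[i + 1:end]))
            i = end + 1
        elif pattern[i] == ".":
            elems.append(None)  # wildcard
            i += 1
        else:
            elems.append({pattern[i]})
            i += 1
    return elems


def shift_and_search(text: str, pattern: str) -> List[int]:
    """Return the start positions of all matches of pattern in text."""
    elems = parse_pattern(pattern)
    if not elems:
        raise ValueError("empty pattern")
    accept = 1 << (len(elems) - 1)
    state = 0
    starts = []
    for pos, ch in enumerate(text):
        # Bit k of mask is set when pattern element k accepts ch.
        mask = 0
        for k, elem in enumerate(elems):
            if elem is None or ch in elem:
                mask |= 1 << k
        # Unit k becomes active if unit k-1 was active and ch matches element k.
        state = ((state << 1) | 1) & mask
        if state & accept:
            starts.append(pos - len(elems) + 1)
    return starts
```

For example, `shift_and_search("xxacdyabd", "a[bc]d")` returns `[2, 6]`. In hardware, each bit becomes a unit with one flip-flop and a character comparator, so one text character is processed per clock regardless of pattern length.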

R2-3 (Time: 14:14 - 14:16)
Title: Speed Improvement of AES Encryption using Hardware Accelerators Synthesized by C Compatible Architecture Prototyper (CCAP)
Author: *Hiroyuki Kanbara (ASTEM RI, Japan), Takayuki Nakatani, Naoto Umehara (Ritsumeikan University, Japan), Nagisa Ishiura (Kwansei Gakuin University, Japan), Hiroyuki Tomiyama (Nagoya University, Japan)
Page: pp. 130 - 134
Keyword: high level synthesis, Embedded system, Codesign, AES Encryption
Abstract: The authors are developing a high-level synthesizer called C Compatible Architecture Prototyper (CCAP). CCAP compiles an ANSI C program that is part of embedded software and generates a hardware accelerator in HDL. CCAP provides an arbiter circuit that allows the synthesized accelerator and a CPU to access main memory in parallel. In this paper, we report the speed improvement of AES encryption achieved using CCAP.

R2-4 (Time: 14:16 - 14:18)
Title: A Hybrid Logic Simulator Using LUT Cascade Emulators
Author: *Hiroki Nakahara, Tsutomu Sasao, Munehiro Matsuura (Kyushu Institute of Technology, Japan)
Page: pp. 135 - 141
Keyword: LUT cascade, Logic simulation, Design Verification
Abstract: This paper presents a hybrid logic simulator that uses both event-driven and cycle-based methods. For special primitives such as memories and tri-state buffers, it uses an event-driven method. For the other parts, it uses a cycle-based method with LUT cascade emulators. To simulate a large-scale circuit, the simulator partitions the circuit into smaller parts and realizes each part by an LUT cascade emulator. It then combines these emulators through interconnections. Since a multiplier often requires large memories in an LUT cascade, a processor instruction is used instead of the LUT cascade. This reduces the code size and the simulation time. Our experiment shows that the proposed method is effective for circuits that include arithmetic operations.

R2-5 (Time: 14:18 - 14:20)
Title: Statistical Estimation Method for Verification Coverage Using FPGA-based Emulators
Author: *Kohei Hosokawa, Yuichi Nakamura (NEC, Japan), Baku Haraguchi (NEC Micro Systems, Japan)
Page: pp. 142 - 146
Keyword: FPGA-based Emulators, Verification Coverage, Toggle Coverage, Statistics, Test-Pattern
Abstract: We propose a new method to quickly estimate toggle coverage, as an indicator of verification coverage, for a large number of test patterns. The proposed method uses statistical interval estimation theory to reduce the number of signals needed to estimate toggle coverage, which normally requires transition information for all signals in a circuit. Since this reduction decreases the size of the toggle measurement circuits on an FPGA, toggle coverage can be estimated with an FPGA-based emulator that operates at speeds on the order of MHz, which is roughly 10^4 - 10^5 times faster than HDL simulators. Experiments confirmed that the average estimation error is within ±1% in actual LSI emulations.
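A minimal sketch of interval estimation for toggle coverage, assuming the monitored signals are a simple random sample of all signals: the sampled toggle fraction estimates the coverage, and a normal-approximation interval with a finite-population correction bounds the error. The 95% level (z = 1.96), the worst-case p = 0.5 and the function names are our assumptions; the abstract does not give the authors' exact estimator.

```python
import math
from typing import Sequence, Tuple


def estimate_toggle_coverage(sampled_toggled: Sequence[bool], population: int,
                             z: float = 1.96) -> Tuple[float, float, float]:
    """Estimate toggle coverage from a random sample of signals.

    Returns (point estimate, lower bound, upper bound) using the normal
    approximation with a finite-population correction.
    """
    n = len(sampled_toggled)
    p_hat = sum(sampled_toggled) / n
    fpc = (population - n) / (population - 1) if population > 1 else 0.0
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n * fpc)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


def required_sample_size(population: int, error: float, z: float = 1.96,
                         p: float = 0.5) -> int:
    """Number of signals to monitor for a +-error interval (p = 0.5 is the worst case)."""
    n0 = z * z * p * (1 - p) / (error * error)
    return math.ceil(n0 / (1 + (n0 - 1) / population))
```

Under these assumptions, a design with one million signals needs only about 9,500 monitored signals for a ±1% interval, which is why the toggle measurement circuitry on the FPGA can shrink so much.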

R2-6 (Time: 14:20 - 14:22)
Title: Blockage-Aware Routing Tree Construction with Concurrent Buffer and Flip-Flop Insertion
Author: Shu-Yun Chen (Realtek Semiconductor Corp., Taiwan), *Ting-Chi Wang (National Tsing Hua University, Taiwan)
Page: pp. 147 - 154
Keyword: Routing, Buffer/Flip-Flop Insertion, Physical Design
Abstract: For high-frequency designs, concurrent buffer and flip-flop insertion becomes inevitable for interconnect delay optimization. To the best of our knowledge, all existing works perform concurrent buffer and flip-flop insertion on a given routing tree. The given routing tree, however, may greatly limit the effectiveness of concurrent buffer and flip-flop insertion. In this paper, we present a method that simultaneously constructs a routing tree and performs concurrent buffer and flip-flop insertion subject to latency constraints. We also propose four speed-up techniques to further reduce the computation time. The experimental results show that our method has a 90% success rate in generating a feasible solution, while a sequential method, which separates tree construction and concurrent buffer and flip-flop insertion into two steps, has only a 57% success rate. For the test cases in which both our method and the sequential method generate feasible solutions, our method has up to a 96% chance of producing a better solution.

R2-7 (Time: 14:22 - 14:24)
Title: Low-Power Clock Tree Synthesis by Low-Swing Techniques
Author: Yun-Ta Lin (SpringSoft, Inc., Taiwan), *Hung-Ming Chen (Dept of EE and SoC Research Center, National Chiao Tung University, Taiwan)
Page: pp. 155 - 160
Keyword: Clock Tree Synthesis, Low Power, Low Swing
Abstract: Chips running at higher frequencies consume much more power. Without careful planning of the clock network, chips suffer from high power dissipation. In this paper, we present a methodology that can be applied in buffered clock tree synthesis to meet low-power demands and the zero-skew constraint. It is based on low-swing interconnections for clock signal transmission and low-swing double-edge-triggered flip-flops as synchronizing elements. DME-based buffering is applied to reduce the number of inserted buffers as well as the wirelength, in order to lower power consumption. The experimental results are encouraging: we obtain an average power saving of 49% at an equivalent clock rate, compared with a previous work based on low-swing interconnection.
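The zero-skew merge step at the heart of DME can be computed in closed form under the Elmore delay model (Tsay's formula). Given two subtrees with root delays t1, t2 and load capacitances C1, C2, joined by a wire of length L with unit resistance r and unit capacitance c, the tapping point at fraction x of the wire, measured from subtree 1, equalizes both delays when x = (t2 - t1 + rL(C2 + cL/2)) / (rL(C1 + C2 + cL)). The sketch below assumes this unbuffered model; the low-swing circuits and buffer insertion of the paper are not modeled, and wire elongation for x outside [0, 1] is left out.

```python
from typing import Tuple


def zero_skew_merge(t1: float, c1: float, t2: float, c2: float,
                    length: float, r: float, c: float) -> Tuple[float, float, float]:
    """Return (x, merged delay, merged capacitance) for a zero-skew merge.

    x is the tapping point measured from subtree 1 as a fraction of the wire.
    """
    rl = r * length
    x = (t2 - t1 + rl * (c2 + c * length / 2)) / (rl * (c1 + c2 + c * length))
    if not 0.0 <= x <= 1.0:
        raise ValueError("zero skew needs wire elongation (not handled here)")
    # Elmore delay from the tapping point down to subtree 1 (equal to that to subtree 2).
    delay = t1 + r * x * length * (c * x * length / 2 + c1)
    return x, delay, c1 + c2 + c * length
```

For two identical subtrees the tapping point is the wire midpoint (x = 0.5). DME applies this merge bottom-up over the topology, and the merged delay and capacitance become the inputs of the next merge.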

R2-8 (Time: 14:24 - 14:26)
Title: Post-Silicon Clock-timing Tuning Based on Statistical Estimation
Author: *Yuko Hashizume, Yasuhiro Takashima (The University of Kitakyushu, Japan), Yuichi Nakamura (NEC Corporation, Japan)
Page: pp. 161 - 165
Keyword: deskew, linear programming, PDE
Abstract: In deep-submicron technologies, process variations can severely affect the performance and yield of VLSI chips. As a countermeasure to these variations, post-silicon tuning has been proposed. Deskew, in which the clock timing of flip-flops (FFs) is tuned by inserting delay elements into the clock tree, is one such method. We propose a novel deskew method that determines the delay values by measuring the clock timing of a small number of FFs and estimating the clock timing of the remaining FFs with a statistical model.

R2-9 (Time: 14:26 - 14:28)
Title: Speed Enhancement Technique for the Post-fabrication Clock-timing Adjustment of Digital LSIs
Author: *Tatsuya Susa (Graduate School of Science, Toho University, Japan), Masahiro Murakawa, Eiichi Takahashi (National Institute of Advanced Industrial Science and Technology, Japan), Tatsumi Furuya (Graduate School of Science, Toho University, Japan), Tetsuya Higuchi (National Institute of Advanced Industrial Science and Technology, Japan), Shinji Furuichi, Yoshitaka Ueda, Atsushi Wada (Sanyo Electric Co., Ltd, Japan)
Page: pp. 166 - 173
Keyword: post-fabrication adjustment, adjustment simulation, process variation, yield, genetic algorithm
Abstract: We propose a speed enhancement technique that makes post-fabrication clock-timing adjustment usable in practical applications. The method reduces adjustment time by using static timing analysis (STA) results to reduce the number of adjustment points and by adopting an improved distribution for the initial population of the genetic algorithm (GA). Moreover, we have developed an adjustment simulator that predicts the results of adjustment with the proposed method at the LSI design stage. Adjustment experiments using the developed simulator demonstrate that our method can adjust practical LSIs with 1,031 flip-flops within a few seconds.

R2-10 (Time: 14:28 - 14:30)
Title: Repairs for Voltage Drop and Noise Violation in Late Design Stages
Author: Shih-Tsung Huang (AnaGlobe Technology, Taiwan), *Hung-Ming Chen (Dept of EE and SoC Research Center, National Chiao Tung University, Taiwan)
Page: pp. 174 - 178
Keyword: DSM, ECO, Voltage Drop, Crosstalk Noise
Abstract: Since many second-order problems have emerged in the deep-submicron (DSM) era, some critical functional changes made during ECO cause unavoidable timing and voltage drop violations. In this paper, we propose a methodology to reduce voltage drop and noise violations with minimal design changes, which can be used in ECO or late design stages. It is simple to plug into the current design flow and efficient, so that excessive timing and voltage drop check iterations can be avoided and power delivery damage can be repaired with the limited resources available in late design stages. We formulate this problem as a longest path problem and fix the violations by using lower-metal-layer power lines for power compensation. We have integrated this framework with a commercial tool, and experimental results show that our methodology successfully relieves noise and IR-drop violations in ECO or late design stages.
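Once a repair problem is cast as a longest-path problem on an acyclic graph, it can be solved in linear time by relaxing edges in topological order. The sketch below is a generic version with hypothetical node names and nonnegative edge weights; the abstract does not say what the nodes and weights represent in the authors' formulation.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (from node, to node, nonnegative weight)


def longest_path(edges: List[Edge]) -> Tuple[float, List[str]]:
    """Longest path in a directed acyclic graph via a topological order."""
    succ: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    indeg: Dict[str, int] = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm for the topological order.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order: List[str] = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Relax edges in topological order, keeping the longest distance.
    dist = {n: 0.0 for n in nodes}
    pred: Dict[str, Optional[str]] = {n: None for n in nodes}
    for u in order:
        for v, w in succ[u]:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u

    end = max(order, key=lambda n: dist[n])
    path: List[str] = []
    node: Optional[str] = end
    while node is not None:
        path.append(node)
        node = pred[node]
    return dist[end], path[::-1]
```

Unlike the general longest-path problem, which is NP-hard, the acyclic case needs only one pass over the edges, which fits the efficiency goal stated for late-stage repairs.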

R2-11 (Time: 14:30 - 14:32)
Title: Estimation of Yield Enhancement by Critical Path Reconfiguration Utilizing Random Variations on Deep-submicron FPGAs
Author: *Yuuri Sugihara, Yohei Kume, Kazutoshi Kobayashi, Hidetoshi Onodera (Kyoto University, Japan)
Page: pp. 179 - 183
Keyword: FPGA, variation-aware, yield enhancement
Abstract: In this paper, we estimate the yield enhancement achieved by critical path reconfiguration of deep-submicron FPGAs, which suffer from drastic yield loss due to process variations. Critical path reconfiguration targets random process variations, which are hard to predict. First, an initial configuration of an implemented circuit is applied to all fabricated FPGAs, and at-speed tests are performed. Then failed signal paths are rerouted to different locations. Rerouting and at-speed testing are repeated several times to enhance yield. The locations of the critical paths are thus optimized incrementally, chip by chip, according to chip-specific random variations. A theoretical analysis is carried out to verify the effectiveness of critical path reconfiguration, compared with multiple configurations, as a function of the number of critical paths in the presence of random variations.

R2-12 (Time: 14:32 - 14:34)
TitleA Mixed Integer Linear Programming Based Approach for Post-Routing Redundant Via Insertion
AuthorKuang-Yao Lee, *Ting-Chi Wang (National Tsing Hua University, Taiwan), Kai-Yuan Chao (Intel Corporation, United States)
Pagepp. 184 - 191
KeywordRedundant via, Physical design, Design for manufacturability
AbstractRedundant via insertion is highly recommended to improve chip yield and reliability. The well-studied double-cut via insertion (DVI) problem allows a single via in a chip to have at most one redundant via inserted next to it, but its solution is not good enough, particularly for high-activity and power nets, because those nets typically need more redundant vias to further enhance reliability. This motivates us to study a new problem, called the multiple-cut via insertion (MVI) problem, in which one or more redundant vias can be inserted next to a single via such that both the number of single vias with redundant vias inserted next to them and the number of inserted redundant vias are maximized. We formulate the MVI problem as a mixed integer linear programming (MILP) problem. To make the problem tractable, we further break the MILP problem into a set of much smaller MILP problems, each of which is solved independently and efficiently without sacrificing optimality. In addition, we observe that the DVI problem is a special case of the MVI problem, so our MILP approach can easily be adapted to optimally solve the DVI problem as well. To the best of our knowledge, none of the existing DVI works can guarantee optimality. Extensive experimental results demonstrate the efficiency of our MILP approaches on both the MVI and DVI problems.

R2-13 (Time: 14:34 - 14:36)
TitleFast Monotonic Via Assignment Excluding Mold Gates for 2-Layer Ball Grid Array Packages
Author*Yoichi Tomioka, Atsushi Takahashi (Tokyo Institute of Technology, Japan)
Pagepp. 192 - 197
Keywordball grid array, package, monotonic, 2-layer, routing
AbstractBall Grid Array packages, in which I/O pins are arranged in a grid array pattern, provide a large number of connections between chips and a printed circuit board, but routing them manually takes a long time. We propose a fast routing method for 2-layer Ball Grid Array packages to support designers. Our method iteratively improves the via assignment to obtain one that distributes wires evenly on the top layer and achieves a high net completion ratio.

R2-14 (Time: 14:36 - 14:38)
TitleAn I/O Planning Method for Three-Dimensional Integrated Circuits
Author*Chao-Hung Lu (National Central University, Taiwan), Hung-Ming Chen (National Chiao Tung University, Taiwan), Chien-Nan Jimmy Liu, Wen-Yu Shih (National Central University, Taiwan)
Pagepp. 198 - 202
KeywordI/O, Partition, 3D
Abstract3DICs are an attractive alternative in chip design because of their high performance and high density. In this paper, we propose a partitioning methodology that addresses I/O assignment and the number of 3D-vias in 3DIC design. The I/O partitioning method is based on the F-M algorithm and considers the total number of 3D-vias and the I/O count of each tier simultaneously. Experimental results show that our approach reduces the number of 3D-vias while balancing the I/O count across tiers. In addition, our partitioning result can be integrated with a floorplanning algorithm.

R2-15 (Time: 14:38 - 14:40)
TitleNon-Slicing Floorplanning-Based Crosstalk Reduction on Gridless Track Assignment
Author*Wen-Nai Cheng, Yu-Ning Chang, Yih-Lang Li (National Chiao-Tung University, Taiwan)
Pagepp. 203 - 207
KeywordVLSI design, physical design, Gridless Routing, Track Assignment, Crosstalk minimization
AbstractTrack assignment, which is an intermediate stage between global routing and detailed routing, provides a good platform for improving performance and for imposing additional routing constraints, such as crosstalk. Gridless track assignment (GTA) has not been addressed in the public literature. This work develops a gridless crosstalk-driven GTA. An initial assignment is produced rapidly with a left-edge-like algorithm. Crosstalk reduction on the assignment is then transformed into a restricted non-slicing floorplanning problem, and a deterministic O-tree-based algorithm is employed to reassign each net segment. Finally, each panel is partitioned into several sub-panels, and the sub-panels are reordered using a branch-and-bound algorithm to reduce crosstalk further. Experimental results demonstrate that the proposed gridless crosstalk-driven GTA reduces the overlapping length of adjacent wires by over 80%.
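For readers unfamiliar with the initial-assignment step, the sketch below implements the classic (gridded) left-edge algorithm on one-dimensional segment intervals, assuming each segment is a hypothetical (name, left, right) tuple. It does not model the wire widths, spacing rules, or crosstalk costs that a gridless, crosstalk-driven assignment would need, and it is not the authors' algorithm.

```python
def left_edge(segments):
    """Classic left-edge track assignment.

    segments: list of (name, left, right) intervals on one routing panel.
    Returns a list of tracks, each a list of segment names in left-to-right
    order. Segments on the same track never overlap (touching ends are
    treated as overlapping here).
    """
    remaining = sorted(segments, key=lambda s: s[1])  # sort by left edge
    tracks = []
    while remaining:
        track, last_right, leftover = [], float("-inf"), []
        for name, left, right in remaining:
            if left > last_right:  # fits after the last segment on this track
                track.append(name)
                last_right = right
            else:
                leftover.append((name, left, right))
        tracks.append(track)
        remaining = leftover  # stays sorted because the scan order is kept
    return tracks


print(left_edge([("a", 0, 3), ("b", 1, 5), ("c", 4, 7), ("d", 6, 9)]))
# prints [['a', 'c'], ['b', 'd']]
```

Filling one track at a time in left-edge order uses as many tracks as the maximum interval density when there are no vertical constraints, which is why it is a common fast starting point before crosstalk-driven refinement.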

R2-16 (Time: 14:40 - 14:42)
TitleFujimaki-Takahashi Squeeze : Linear Time Construction of Constraint Graphs of a Floorplan for a Given Permutation
Author*Ryo Fujimaki, Toshihiko Takahashi (Niigata University, Japan)
Pagepp. 208 - 213
KeywordFloorplan, Representation, Permutation, Constraint graph
AbstractA floorplan is a subdivision of a rectangle into rectangular faces with horizontal and vertical line segments. We call a floorplan room-to-room when adjacency between rooms is considered. Fujimaki and Takahashi showed that any room-to-room floorplan can be represented as a permutation. In this paper, we give an O(n)-time algorithm that constructs the vertical and the horizontal constraint graphs of a floorplan for a given permutation under this representation.

R2-17 (Time: 14:42 - 14:44)
TitlePlacement with Symmetry Constraints for Analog IC Layout Design based on Tree Representation
Author*Natsumi Hirakawa, Kunihiro Fujiyoshi (Tokyo University of Agriculture and Technology, Japan)
Pagepp. 214 - 221
Keywordsymmetry constraints, O-tree
AbstractSymmetry constraints require that given cells be placed symmetrically in analog IC design. We use an O-tree to represent placements and propose a decoding algorithm that obtains a closest packing satisfying the constraints. The decoding algorithm uses linear programming, which is time-consuming. Therefore, we propose a graph-based method to determine whether a packing corresponding to a given O-tree exists, and apply it before linear programming. Computational experiments show the effectiveness of the proposed method.


Invited Talk II
Time: 15:50 - 16:35 Monday, October 15, 2007
Location: Conference Hall (2F)
Chair: Yusuke Matsunaga (Kyushu University, Japan)

I2-1 (Time: 15:50 - 16:35)
TitleWhy Study Quantum Circuits and What They Are Good For
Author*Igor Markov (University of Michigan, United States)
Pagepp. 225 - 230
AbstractAs transistor dimensions approach atomic scale, quantum-mechanical effects such as tunneling and spin become important ingredients in accurate performance models of integrated circuits. Theoretical work in terms of such models suggests that power-density constraints may eventually require a departure from the common practice of representing logic 0s and 1s by charges, voltages or currents. Instead, nuclear and electron spins are proposed as primary carriers of stationary information, e.g., in the well-publicized demonstration by IBM in 2000, and photon polarizations can transport quantum information over great distances, acting as quantum bits. However, the algebra of quantum bits is radically different from the Boolean algebra that describes modern digital electronics, and such states are susceptible to frequent and unusual types of errors. On the positive side, quantum communication promises an unparalleled level of security, and some quantum algorithms solve otherwise intractable problems in polynomial time. Despite many potential applications and several active start-ups in the field, the main obstacle to further progress in quantum information processing is complexity. This is where design automation can lend a helping hand.


Design Methodology for Nanometer Era
Time: 16:35 - 18:15 Monday, October 15, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Ting-Chi Wang (National Tsing Hua University, Taiwan), Youhua Shi (Waseda University, Japan)

R3-1 (Time: 16:35 - 16:45)
TitleA Study on Body-Biasing Layout Style Focusing on Area Efficiency and Speed Controllability
Author*Koichi Hamamoto, Hiroshi Fuketa, Masanori Hashimoto, Yukio Mitsuyama, Takao Onoye (Osaka University, Japan)
Pagepp. 233 - 237
Keywordbody bias, forward bias, layout style, speed controllability
AbstractBody biasing is expected to become a common design technique, so area-efficient layout implementations are in demand. Body biasing outside standard cells is one possible layout style, but in this case body-bias controllability, especially under forward bias, is a concern. To investigate the controllability, we fabricated a ring oscillator in a 90nm technology and measured it. Our measurement results and area-efficiency evaluation reveal that body-biased circuits can be implemented with an area overhead of less than 1%.

R3-2 (Time: 16:45 - 16:47)
TitleSimulations of Flicker Noise in SiGe HMOS: Body Bias Dependence
Author*C.-Y. Chen, Y. Liu, R. W. Dutton (Stanford University, United States), J. Sato-Iwanaga, A. Inoue, H. Sorada (Matsushita Electric Industrial Co., Ltd, Japan)
Pagepp. 238 - 241
KeywordTCAD, flicker noise, SiGe, p-type hetero-structure MOS (pHMOS), body bias
AbstractAdvanced TCAD simulation capabilities have been developed to investigate flicker noise behavior in p-type SiGe/Si hetero-structure MOS (HMOS) transistors. The numerical model is based on the impedance field method and accounts for the carrier number fluctuation due to trap/de-trap effects and the correlated mobility fluctuation mechanism. Such a device-level simulation approach enables separate treatment of the buried and parasitic surface channels which have different contributions from the mobility fluctuations. Simulations have been conducted to explain experimentally observed strong body-bias dependence of drain current noise in p-HMOS devices. In particular, this dependence is found to be closely correlated with the carrier distribution between the two channels. An improved compact model to account for this body bias dependence of flicker noise in SiGe pHMOS devices is also presented in this paper.

R3-3 (Time: 16:47 - 16:49)
TitleActive Body-Biasing Control on PD-SOI for Dual Supply Voltage Scheme
Author*Yosuke Torii, Kenji Hamada, Kayoko Seto, Masaaki Iijima, Masahiro Numa (Kobe University, Japan), Akira Tada, Takashi Ipposhi (Renesas Technology Corporation, Japan)
Pagepp. 242 - 245
Keywordlow power, active body-bias, dual supply voltage, PD-SOI
AbstractThe dual supply voltage scheme reduces the power consumption without performance degradation by using two power supply rails. However, an increase in the delay has made assigning the lower supply voltage more difficult in the conventional dual-VDD scheme under low supply voltage. We propose a technique for dual-VDD scheme employing the Active Body-biasing Control on PD-SOI, which increases the number of VDDL-cells by lowering threshold voltage. Simulation results have shown our approach reduces the power consumption at low voltage operation.

R3-4 (Time: 16:49 - 16:51)
TitleA Look-Ahead Active Body-Biasing Scheme for SOI-SRAM with Dynamic VDDM Control
Author*Kayoko Seto, Yosuke Torii, Masaaki Iijima, Masahiro Numa (Kobe University, Japan), Akira Tada, Takashi Ipposhi (Renesas Technology Corporation, Japan)
Pagepp. 246 - 249
KeywordPD-SOI, body-bias, SRAM, low power design
AbstractInstability of SRAM memory cells caused by aggressive technology scaling has become one of the most significant issues. Although lowering the memory-cell supply voltage (VDDM) improves the write margin, it increases the access time. In this paper, we propose a memory cell employing a Look-ahead Active Body-biasing (LAB) scheme for SOI-SRAM with dynamic VDDM control. Simulation results show that the proposed SRAM cell shortens the access time by 54% in write mode.

R3-5 (Time: 16:51 - 16:53)
TitleA Study on Variation-Component Decomposition using Polynomial Smoothing Function
Author*Takashi Sato, Hiroyuki Ueyama, Noriaki Nakayama, Kazuya Masu (Tokyo Institute of Technology, Japan)
Pagepp. 250 - 255
Keyworddevice variation, systematic, random, goodness of fit, AIC
AbstractA procedure that decomposes parametric device variation into systematic and random components is studied. Treating the decomposition as the estimation of a smooth regression function, a polynomial model is used to describe the systematic variation, and the residual is regarded as random variation. In the proposed flow, the required order of the regression function is determined adaptively using a statistical index called AICc. The impact of polynomial order selection on the variation decomposition is also discussed through numerical experiments using measured data.
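A minimal one-dimensional sketch of AICc-based polynomial order selection follows, assuming Gaussian residuals and counting only the polynomial coefficients as parameters. Measured wafer data would usually need a two-dimensional (x, y) polynomial, and the exact AICc form and parameter count used by the authors may differ; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def aicc(rss, n, k):
    """Corrected Akaike information criterion for a least-squares fit with
    k coefficients and n samples, assuming Gaussian residuals."""
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)


def decompose(x, y, max_degree=6):
    """Split y into a polynomial (systematic) part and a residual (random)
    part, choosing the polynomial degree with the smallest AICc."""
    n = len(y)
    best = None
    for degree in range(max_degree + 1):
        coeffs = np.polyfit(x, y, degree)
        fit = np.polyval(coeffs, x)
        rss = float(np.sum((y - fit) ** 2))
        score = aicc(rss, n, degree + 1)
        if best is None or score < best[0]:
            best = (score, degree, fit)
    _, degree, systematic = best
    return degree, systematic, y - systematic


# Synthetic example: a quadratic trend plus a small alternating "random" term.
x = np.linspace(-1.0, 1.0, 41)
y = 1.0 + 2.0 * x + 3.0 * x**2 + 0.1 * (-1) ** np.arange(41)
degree, systematic, random_part = decompose(x, y)
print(degree)  # prints 2
```

The penalty terms stop the search from adding higher-order terms that only chase the random component, which is the point of selecting the order adaptively rather than fixing it in advance.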

R3-6 (Time: 16:53 - 16:55)
TitleEffect of Dummy Fills on High Frequency Characteristics of Spiral Inductor
Author*Akira Tsuchiya, Hidetoshi Onodera (Kyoto University, Japan)
Pagepp. 256 - 260
Keywordspiral inductor, dummy fill
AbstractThis paper discusses the effect of CMP dummy fills on spiral inductors. Conventionally, the effect of dummy fills is discussed from the viewpoint of capacitance. However, at frequencies above 10GHz, dummy fills affect the resistance and the inductance of the wire. We evaluate the effect of dummy fills with a 3D field solver. Experimental results show that the Q-factor decreases by 20% due to loss in the dummy fills.

R3-7 (Time: 16:55 - 16:57)
TitleStatic-Noise-Margin Analysis of Major SRAM-Cell Types Including Production Variations for a 90nm CMOS Process
Author*Shinya Izumi, Koh Johguchi, Hans Jürgen Mattausch, Tetsushi Koide (Hiroshima University, Japan)
Pagepp. 261 - 265
KeywordSRAM, SNM, variation, robust
AbstractHere we report a comparative study of the effect of Vth variation on the major SRAM-cell types in a 90 nm CMOS process, namely the conventional 1-port 6-transistor cell, the 8-transistor cell with separate read and write ports, the static-noise-margin (SNM) free 7-transistor cell, and the loadless 4-transistor cell. While the 4Tr-SRAM and 6Tr-SRAM cannot maintain sufficient reliability in the worst case, the 8Tr-SRAM and 7Tr-SRAM can. At low operating voltage, the 8Tr-SRAM has higher reliability than the 7Tr-SRAM.

R3-8 (Time: 16:57 - 16:59)
TitleActive Mode Leakage Power Reduction Based on the Controlling Value of Logic Gates
Author*Lei Chen, Shinji Kimura (The Graduate School of Information, Production and Systems, Waseda University, Japan)
Pagepp. 266 - 271
KeywordMTCMOS, Leakage Power, Controllability
AbstractLeakage power dissipation becomes an increasingly important issue as LSI process technology scales. In this paper, we propose a novel control method for Multi-Threshold CMOS (MTCMOS) technology based on the controllability of logic gates. When one input of a logic gate takes its controlling value, the power supply of the blocks connected to the gate's other inputs can be stopped. Based on this idea, power can be controlled dynamically. This paper discusses methods to construct and control power blocks from a gate-level circuit. A power optimization idea is also introduced. The effect of the proposed method is shown on several standard benchmark circuits.
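The controlling-value idea can be illustrated with a small helper. This is a hypothetical sketch, not the authors' power-block construction: it only reports which inputs of a single gate become irrelevant for a given input assignment, and therefore whose driving blocks could in principle be powered down.

```python
# Controlling value of common gates: one input at this value fixes the output.
CONTROLLING_VALUE = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def gateable_inputs(gate, inputs):
    """Return indices of inputs whose driving blocks could be powered down.

    If some input carries the gate's controlling value, the output no longer
    depends on the other inputs. The first such input is kept and all others
    are reported. XOR/XNOR have no controlling value, so nothing is reported.
    """
    cv = CONTROLLING_VALUE.get(gate)
    if cv is None:
        return []
    for i, value in enumerate(inputs):
        if value == cv:
            return [j for j in range(len(inputs)) if j != i]
    return []


print(gateable_inputs("AND", [1, 0, 1]))  # prints [0, 2]
print(gateable_inputs("OR", [0, 0]))      # prints []
```

A real flow would also have to account for the wake-up latency and the switching overhead of the power gates, which this per-gate check ignores.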

R3-9 (Time: 16:59 - 17:01)
TitleStructural Robustness of Datapaths against Delay-Variation
Author*Keisuke Inoue, Mineo Kaneko, Tsuyoshi Iwagaki (Japan Advanced Institute of Science and Technology, Japan)
Pagepp. 272 - 279
KeywordHigh-Level Synthesis, Delay Variation, Register Assignment
AbstractAs the feature size of VLSI becomes smaller, delay variations become a serious problem in VLSI design. In this paper, we propose a novel class of robustness for a datapath against delay variations, which is named structural robustness against delay-variation (SRV), and propose sufficient conditions for a datapath to have SRV. A resultant circuit designed based on these conditions has a larger timing margin to delay variations than previous designs without sacrificing effective computation time. In addition, under any degree of delay variations, we can always find an available clock frequency for a datapath having SRV property to operate correctly, which could be a preferable characteristic in IP-based design.

R3-10 (Time: 17:01 - 17:03)
TitleCritical Issues Regarding A Variation Resilient Flip-Flop
AuthorToshinori Sato (Kyushu University, Japan), *Yuji Kunitake (Kyushu Institute of Technology, Japan)
Pagepp. 280 - 286
Keywordvariations, low-power, DVS, Razor, microprocessors
AbstractRazor flip-flop (FF) is a clever technique to eliminate the supply voltage margin by exploiting circuit-level timing speculation. It combines a dynamic voltage scaling technique with an error detection and recovery mechanism. This paper presents an improvement of the Razor FF that removes the delayed clock, which complicates timing design. The improved FF is named the canary FF. This paper discusses critical issues regarding the canary FF. If these issues are solved, the canary FF could achieve a 10% power reduction by exploiting input value variations.

R3-11 (Time: 17:03 - 17:05)
TitleA Case Study of Multi-processor Design with Asynchronous Interconnect using Synchronous Design Tools
Author*Katsunori Tanaka, Yuichi Nakamura, Atsushi Atarashi (System IP Core Research Labs., NEC Corporation, Japan)
Pagepp. 287 - 293
KeywordGALS, design methodology
AbstractThis paper presents a case study of multi-processor design with an asynchronous interconnect based on the QDI (Quasi Delay Insensitive) model, using synchronous design tools, for a GALS (Globally Asynchronous, Locally Synchronous) architecture. In the design flow, we set specific design constraints so that design tools for clocked circuits can also be applied to the asynchronous interconnect. By applying the flow through placement and routing to an experimental GALS system consisting of four clocked processors and a data memory connected by a clockless QDI-based interconnect, we demonstrated that the flow can produce a correctly working GALS system. We also show experimental results for a preliminary version of the experimental design.

R3-12 (Time: 17:05 - 17:07)
TitleAn Asynchronous Single-precision Floating-point Divider and its Implementation on FPGA
Author*Masayuki Hiromoto, Shin'ichi Kouyama, Hiroyuki Ochi (Kyoto University, Japan), Yukihiro Nakamura (Ritsumeikan University, Japan)
Pagepp. 294 - 301
KeywordIP reusability, IEEE754, low power design, digit-recurrence divider
AbstractSynchronous design methodology is widely used in today's digital circuits. However, it is difficult to reuse a synchronous module highly optimized for a specific clock frequency in other systems with different global clocks, because the logic depth between FFs must be tailored to the clock frequency. In this paper, we focus on asynchronous design, in which each module works at its best performance, and apply it to an IEEE754-standard single-precision floating-point divider. In our divider, a mantissa divider is driven by a high-speed local clock and connected to pre-/post-processing modules through an asynchronous interface. Our divider can be built into a system with an arbitrary clock frequency while achieving its peak performance and area and power efficiency. This paper also reports an implementation result of the proposed divider on a Xilinx FPGA.
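The abstract names a digit-recurrence divider but gives no internal details. The sketch below shows the simplest member of that family, radix-2 restoring division on integer significands, written in the comparison-based (non-performing) form. Sign, exponent handling, rounding, and IEEE754 special cases are deliberately omitted, and the function name is hypothetical.

```python
def restoring_divide(dividend, divisor, frac_bits):
    """Radix-2 restoring division producing one quotient bit per step.

    Requires 0 <= dividend < 2 * divisor (true for normalized significands,
    whose ratio lies in (0.5, 2)). Returns (q, r) with
    dividend * 2**frac_bits == q * divisor + r and 0 <= r < divisor,
    i.e. q is the quotient truncated to frac_bits fractional bits.
    """
    if not 0 <= dividend < 2 * divisor:
        raise ValueError("operands are not normalized")
    # Integer bit of the quotient.
    q, r = (1, dividend - divisor) if dividend >= divisor else (0, dividend)
    for _ in range(frac_bits):
        r <<= 1                 # shift the partial remainder left
        q <<= 1
        if r >= divisor:        # trial subtraction succeeds: quotient bit 1
            r -= divisor
            q |= 1
        # Otherwise the bit is 0 and r is left unchanged, which is what the
        # add-back ("restore") step achieves in the textbook formulation.
    return q, r


# 1.5 / 1.25 with 24-bit significands (hidden bit plus 23 fraction bits).
a = int(1.5 * 2**23)
b = int(1.25 * 2**23)
q, r = restoring_divide(a, b, 23)
print(q / 2**23)  # close to 1.2, truncated to 23 fraction bits
```

Each iteration is one shift, one compare, and one conditional subtract, so its latency is fixed and short; this regularity is what makes a digit-recurrence mantissa divider a natural fit for a fast local clock.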

R3-13 (Time: 17:07 - 17:09)
TitleFull-Chip Thermal Analysis via Generalized Integral Transforms
Author*Pei-Yu Haung, Chih-Kang Lin, Yu-Min Lee (National Chiao Tung University, Taiwan)
Pagepp. 302 - 309
KeywordThermal analysis, generalized integral transforms
AbstractThis paper presents an accurate and fast analytical full-chip thermal simulator for early-stage temperature-aware chip design. Using the technique of generalized integral transforms (GIT), our method can accurately estimate the full-chip temperature distribution with only a small number of basis truncation points in the spatial domain. We also develop a fast Fourier transform (FFT)-like evaluation algorithm to efficiently evaluate the temperature distribution. Experimental results confirm that our GIT-based analyzer achieves an order-of-magnitude speedup compared with a highly efficient Green’s function based method.

R3-14 (Time: 17:09 - 17:11)
TitleA Power Grid Optimization Algorithm by Direct Observation of Timing Error Risk Reduction
Author*Makoto Terao, Kenji Kusano, Yoshiyuki Kawakami (Graduate School of Science and Engineering, Ritsumeikan University, Japan), Masahiro Fukui (Dept. of VLSI System Design, Ritsumeikan University, Japan), Shuji Tsukiyama (Dept. of EECE, Chuo University, Japan)
Pagepp. 310 - 315
Keyworddelay analysis, dispersion, power and ground routing optimization, IR-drop, electro-migration
AbstractIn the super-deep-submicron era, circuit behavior varies widely with process variation. Power grid optimization that considers the timing error risk caused by this variation is therefore very important for stable and fast system operation. This paper proposes an approach that uses the “timing error risk caused by the IR drop” directly as its objective function. Experimental results show the effectiveness of the approach.

R3-15 (Time: 17:11 - 17:13)
TitleA High-level Power Grid Optimization Algorithm by Direct Observation of Manufacturing Cost Reduction
Author*Takayuki Hayashi, Hironobu Ishijima, Yoshiyuki Kawakami, Masahiro Fukui (Ritsumeikan University, Japan)
Pagepp. 316 - 321
Keywordfloor-plan, optimization, Cost, decoupling capacitor
AbstractThe rapid progress of fine-patterning technology has created many difficulties in power grid design. Inserting decoupling capacitors increases the size of the blocks in the chip, and this trade-off is hard to analyze after detailed placement and routing optimization. The authors propose an approach that performs the optimization at the floorplanning phase and analyzes the trade-off between chip cost due to area increase and the stabilization of circuit behavior.

R3-16 (Time: 17:13 - 17:15)
TitleAn Evaluation of Circuit Simulation Algorithms for Hardware Implementation
Author*Taiki Hashizume, Hironobu Ishijima, Masahiro Fukui (Ritsumeikan University, Japan)
Pagepp. 322 - 327
Keywordcircuit simulation, Euler method, Runge-Kutta method, hardware, fixed point
AbstractIn super-deep-submicron technology, very large systems are built on a single LSI chip. As a result, circuits grow larger and circuit simulation takes a long time. Reducing simulation time is indispensable for designing larger circuits. We have proposed a high-speed circuit simulation method for power supply networks based on a hardware algorithm. This paper identifies the numerical analysis method best suited to that hardware algorithm.
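To illustrate the accuracy trade-off between the two integration methods named in the keywords, the sketch below integrates an RC discharge, dv/dt = -v/τ, with forward Euler and classical fourth-order Runge-Kutta, optionally rounding the state to a fixed-point grid after each step. The normalized τ, step size, fraction width, and all function names are illustrative assumptions, not values or code from the paper.

```python
import math


def euler_step(f, t, v, h):
    """Forward Euler: first-order accurate."""
    return v + h * f(t, v)


def rk4_step(f, t, v, h):
    """Classical fourth-order Runge-Kutta."""
    k1 = f(t, v)
    k2 = f(t + h / 2, v + h / 2 * k1)
    k3 = f(t + h / 2, v + h / 2 * k2)
    k4 = f(t + h, v + h * k3)
    return v + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def quantize(x, frac_bits):
    """Round x to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def simulate(step, f, v0, h, n_steps, frac_bits=None):
    """Integrate from t = 0 for n_steps steps of size h. If frac_bits is
    given, the state is quantized after every step to mimic fixed-point
    storage in hardware."""
    t, v = 0.0, v0
    for _ in range(n_steps):
        v = step(f, t, v, h)
        if frac_bits is not None:
            v = quantize(v, frac_bits)
        t += h
    return v


TAU = 1.0  # normalized RC time constant


def rc_discharge(t, v):
    """Right-hand side of dv/dt = -v / tau."""
    return -v / TAU


exact = math.exp(-1.0)  # v(1) for v(0) = 1
for name, step in (("Euler", euler_step), ("RK4", rk4_step)):
    for bits in (None, 12):
        v = simulate(step, rc_discharge, 1.0, 0.1, 10, bits)
        print(name, bits, abs(v - exact))
```

RK4 costs four right-hand-side evaluations per step instead of one, so which method suits a hardware implementation depends on how much step size (and fixed-point width) its extra accuracy lets the design save.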



Tuesday, October 16, 2007

Invited Talk III
Time: 9:00 - 9:45 Tuesday, October 16, 2007
Location: Conference Hall (2F)
Chair: Masahiro Fujita (University of Tokyo, Japan)

I3-1 (Time: 9:00 - 9:45)
TitleDynamic Analysis of Concurrent Systems
Author*Gul Agha (University of Illinois at Urbana-Champaign, United States)
Pagepp. 331 - 334
AbstractDespite considerable progress in model checking techniques, large concurrent systems have more states than can be effectively model checked. Other verification techniques, such as theorem proving, require significant human expertise. The talk will present research on three techniques we have developed at Illinois to reason about concurrent systems: concolic testing, predictive monitoring, and learning-based verification. Concolic testing of concurrent systems improves the efficiency of testing by using symbolic testing and partial order reduction to guide testing. Random values are used to simplify infeasible constraints, thus maintaining soundness. Predictive monitoring improves the efficiency of testing by using observed traces to predict other traces that may occur. Computational-learning-based verification uses learning to reach fixed points rather than exploring the entire state space. I will illustrate these techniques by means of examples from software, and discuss their benefits and limitations.


System Level Design & Logic Synthesis
Time: 9:45 - 11:30 Tuesday, October 16, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Bernard Courtois (CMP, France), Yukio Mitsuyama (Osaka University, Japan)

R4-1 (Time: 9:45 - 9:47)
TitleAn Object-Oriented Circuit Design Method and Its Evaluation
Author*Seigo Masuoka, Hiroyuki Terai, Manabu Koyama (Kinki University, Japan), Kazuhiko Nakahara (Spansion Japan Corporation, Japan), Akihisa Yamada (Sharp Corporation, Japan), Takashi Kambe (Kinki University, Japan)
Pagepp. 337 - 342
KeywordObject-Oriented Design, Java, Hardware-software co-design, JPEG decoder, Bach system
AbstractHardware-software System LSI solutions have increased in popularity in a variety of design domains because these systems provide both high performance and flexibility. The language used to describe the System LSI is critical in a co-design methodology because it is used in both the hardware-software design process and functional validation. Java is a general-purpose, concurrent, object-oriented, platform-independent programming language and is often used in the field of embedded system design for applications such as mobile phones. In this paper we describe the Jackal language, which is an extension of Java for hardware design, and propose an object-oriented circuit design methodology based on Jackal. This methodology is applied to the design of a JPEG encoder and its performance is evaluated.

R4-2 (Time: 9:47 - 9:49)
TitleObject Oriented Design and Synthesis of Communication in Hardware-/Software Systems with OSSS
Author*Kim Grüttner, Cornelia Grabbe, Frank Oppenheimer (OFFIS - Institute for Information Technology, Germany), Wolfgang Nebel (Carl v. Ossietzky University Oldenburg, Germany)
Pagepp. 343 - 350
Keywordhw/sw co-design, high-level synthesis, communication synthesis, object oriented design, systemc
AbstractIn this paper we propose an object-oriented hardware/software co-design methodology for embedded system design. The use of object-oriented techniques combined with template meta-programming during system-level design helps the designer write faster, better and more reusable executable models of the specified system. One of the major challenges in system-level design lies in the automatic or guided refinement process from the specification down to the implementation on a certain target platform. The contribution of this paper is a seamless communication refinement from method-based communication between active and passive objects to signal-based synthesisable communication through buses or point-to-point channels. The proposed methodology retains the separation of communication and behaviour and therefore enables easy communication architecture exploration. To achieve this we have implemented a remote method invocation mechanism that can be used in conjunction with synthesisable channels. The applicability of our approach is shown with an IPv4 router design.

R4-3 (Time: 9:49 - 9:51)
TitleA Data Arrangement Method for Block Floating Point Systems
Author*Takashi Hamabe, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 351 - 356
KeywordBlock floating point, Data arrangement, Memory size
AbstractBlock floating point is a representation of real numbers that provides accurate arithmetic at small hardware cost. This research proposes a data arrangement method for block floating point systems that considers data memory size. Our method aims to minimize data memory size by grouping real-number data with similar absolute values, using an algorithm based on the Kernighan-Lin algorithm.
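The sketch below shows why grouping data with similar magnitudes helps: each block stores one shared exponent and integer mantissas, so a small value that shares a block with a large one loses its precision. For simplicity it groups values by sorting on absolute value, a swapped-in heuristic rather than the Kernighan-Lin-based method of the paper, and it measures quantization error rather than memory size. All function names and data are hypothetical.

```python
import math


def bfp_encode(block, mbits):
    """Encode a block with one shared exponent and integer mantissas.

    The shared exponent is the largest binary exponent in the block, so
    every mantissa has a magnitude of at most about 2**mbits.
    """
    exps = [math.frexp(x)[1] for x in block if x != 0]
    exp = max(exps, default=0)
    mants = [round(math.ldexp(x, mbits - exp)) for x in block]
    return exp, mants


def bfp_decode(exp, mants, mbits):
    """Reconstruct real values from a shared exponent and mantissas."""
    return [math.ldexp(m, exp - mbits) for m in mants]


def max_error(values, blocks, mbits):
    """Largest absolute quantization error when values are stored in the
    given blocks (each block is a list of indices into values)."""
    err = 0.0
    for idx in blocks:
        block = [values[i] for i in idx]
        exp, mants = bfp_encode(block, mbits)
        for x, y in zip(block, bfp_decode(exp, mants, mbits)):
            err = max(err, abs(x - y))
    return err


def group_by_magnitude(values, block_size):
    """Simple grouping heuristic: sort indices by |value| and cut into blocks."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    return [order[i:i + block_size] for i in range(0, len(order), block_size)]


values = [1000.0, 0.01, 900.0, 0.012]
naive = [[0, 1], [2, 3]]                  # large and small values mixed
grouped = group_by_magnitude(values, 2)   # [[1, 3], [2, 0]]
print(max_error(values, naive, 8), max_error(values, grouped, 8))
```

With mixed blocks the small values round to zero, while magnitude-sorted blocks keep them to within a few parts in ten thousand; equivalently, for a fixed error budget, well-grouped data needs fewer mantissa bits, which is the link to memory size.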

R4-4 (Time: 9:51 - 9:53)
TitleCalling Software Functions from Hardware Functions in High-Level Synthesizer CCAP
Author*Masanari Nishimura, Nagisa Ishiura, Yoshiyuki Ishimori (Kwansei Gakuin University, Japan), Hiroyuki Kanbara (ASTEM RI, Japan), Hiroyuki Tomiyama (Nagoya University, Japan)
Pagepp. 357 - 360
Keywordhigh-level synthesis, CCAP, hardware/software co-design, C-based design
AbstractWe are developing a high-level synthesizer named CCAP (C Compatible Architecture Prototyper), which synthesizes functions in C programs into hardware modules that are callable from other software functions. In this paper, we propose a novel framework in which the synthesized hardware functions can also call software functions. We give both multi-thread and single-thread implementation schemes. We verified the correctness of the proposed method (single-thread version) through register transfer level simulation.

R4-5 (Time: 9:53 - 9:55)
TitlePerformance-Aware Communication Architecture Synthesis
Author*Alexander Viehl, Oliver Bringmann (FZI Forschungszentrum Informatik, Germany), Wolfgang Rosenstiel (Universität Tübingen, Germany)
Pagepp. 361 - 368
KeywordCommunication Architecture, Synthesis, Performance, Real-Time
AbstractIn this paper, a novel approach for communication architecture synthesis to guarantee conflict-free communication access in real-time critical systems is proposed. Our approach is based on the analysis of the temporal relation of communicating processes and the determination of communication instances that synchronize them. Based on these communication instances, the global system timing behavior is determined to identify potentially parallel communication instances. Based on the result of this analysis, an algorithm for determining a guaranteed conflict-free communication schedule is proposed. This schedule can be used to synthesize communication controllers that realize resource allocation and guaranteed conflict-free binding of communication instances. Additionally, the inclusion of high-level communication protocols in the synthesis approach is discussed. Moreover, improvements on timing analysis are proposed with the objective of reducing the necessary amount of communication resources.

R4-6 (Time: 9:55 - 9:57)
TitleA Network Processor Synthesis System for Task-Chaining Network Applications
Author*Youhua Shi, Keishi Nakayama, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda University, Japan)
Pagepp. 369 - 374
Keywordnetwork processor, synthesis, task-chaining
AbstractWith the rapid development of network technology, there is a growing need to design network equipment that offers speed, flexibility, and ease of use while accelerating time-to-market. To meet this challenge, we first present a network processor model and then, based on the model, propose a network processor synthesis system for task-chaining network applications. Unlike previous works, the proposed method shares communication resources. Experimental results show the importance of reducing shared-resource contention, and also show how the proposed NP synthesis system can find optimized network processor configurations, in terms of performance and area, that meet the designer's requirements.

R4-7 (Time: 9:57 - 9:59)
TitleResynthesis Method for Circuit Acceleration on LUT-based FPGA
Author*Weijie Xing (Graduate School of Information, Production and Systems, Waseda University, Japan), Takashi Horiyama (Saitama University, Japan), Shunichi Kuromaru, Tomoo Kimura (Matsushita Electric Industrial Co., Ltd, Japan), Shinji Kimura (Graduate School of Information, Production and Systems, Waseda University, Japan)
Pagepp. 375 - 380
KeywordVerification, acceleration, FPGA, false path
AbstractDesign verification has become the most time-consuming part of the design period, so reducing it is important. In this paper, we focus on accelerating emulation circuits and propose a systematic method, called the 0&1 skip method, to reduce the delay time of combinational circuits. The proposed method is simpler than the existing method. We apply the 0&1 skip method to accelerate circuits on LUT-based FPGAs.

R4-8 (Time: 9:59 - 10:01)
TitleSAT Based Boolean Matching for Incompletely Specified Functions
Author*Kuo-Hua Wang, Chung-Ming Chan (Fu Jen Catholic University, Taiwan)
Pagepp. 381 - 388
KeywordBoolean Matching, Boolean Satisfiability, Functional Symmetry, Signature
AbstractBoolean matching checks the equivalence of two functions under input permutation and input/output phase assignment. In this paper, we transform the Boolean matching problem into a Boolean satisfiability problem and, based on this transformation, propose a SAT-based matching algorithm. Our algorithm can handle not only completely specified functions but also incompletely specified functions. Moreover, two signatures exploiting functional symmetries are provided to reduce the size of the SAT instance and thus expedite the matching process. Experimental results on a set of benchmark circuits show that our matching algorithm is very effective and efficient at solving the Boolean matching problem. Compared with our prior work on Boolean matching [30], our SAT-based matching algorithm outperforms the old algorithm by several orders of magnitude for many large circuits.
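To make the matching question concrete, the sketch below decides it by brute force instead of by SAT: it enumerates every input permutation and every input/output phase assignment and accepts the first transformation under which the two truth tables agree on all points where both are specified. Don't cares are modeled as `None`. The truth-table representation and the names `full_table`, `compatible`, and `boolean_match` are hypothetical, and exhaustive search is only practical for a handful of inputs, which is why the paper's SAT formulation and symmetry signatures matter.

```python
from itertools import permutations, product


def full_table(n, fn):
    """Truth table of an n-input function; fn may return None for a don't care."""
    return {x: fn(*x) for x in product((0, 1), repeat=n)}


def compatible(a, b):
    """Two output values agree unless both are specified and differ."""
    return a is None or b is None or a == b


def boolean_match(f, g, n):
    """Find (perm, in_phase, out_phase) with f(y) XOR out_phase compatible with g(x),
    where y[i] = x[perm[i]] XOR in_phase[i]. Returns the first match or None."""
    for perm in permutations(range(n)):
        for in_phase in product((0, 1), repeat=n):
            for out_phase in (0, 1):
                ok = True
                for x, gx in g.items():
                    y = tuple(x[perm[i]] ^ in_phase[i] for i in range(n))
                    fy = f[y]
                    fy = None if fy is None else fy ^ out_phase
                    if not compatible(fy, gx):
                        ok = False
                        break
                if ok:
                    return perm, in_phase, out_phase
    return None
```

For example, AND matches OR with both inputs and the output complemented (De Morgan), while AND never matches XOR because permutation and phase assignment cannot change the number of 1s up to output complementation.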

R4-9 (Time: 10:01 - 10:03)
TitleAn Error Diagnosis Technique Based on Specifications with Don't Cares
Author*Narumi Okada, Takayuki Iida, Toshiro Ishihara, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 389 - 396
Keyworderror diagnosis, don't cares, ECO, incremental synthesis, design error
AbstractWe present an error diagnosis technique for subcircuits based on specifications with don't cares. This technique combines two procedures for reducing the number of error candidates: screening out false error locations based on a specification defined with nine signal values that incorporate don't cares, and a Boolean function manipulation using a characteristic function that indicates the don't care input vectors for each primary output. Experimental results show that the proposed approach effectively increases the number of solutions by incorporating don't cares.

R4-10 (Time: 10:03 - 10:05)
TitleAn LUT-Based Error Diagnosis Technique Extended for Multiple Missing Line Errors Based on Iterative Diagnosis Procedure
Author*Toshiro Ishihara, Ryosuke Arai, Narumi Okada, Nobutaka Kuroki, Masahiro Numa (Kobe University, Japan)
Pagepp. 397 - 404
Keywordincremental synthesis, error diagnosis, missing line error, iterative procedure
AbstractIn this paper, we propose an improved technique to rectify multiple logic design errors, including multiple missing line errors, in LUT-based combinational circuits. The conventional error diagnosis technique EXL_SL can rectify only a single missing line error at a time. Our technique can rectify multiple missing line errors by employing an iterative diagnosis procedure for subcircuits. Experimental results for the ISCAS’85 benchmark circuits demonstrate that 79.0% of circuits containing one to three missing line errors can be rectified successfully.

R4-11 (Time: 10:05 - 10:07)
TitleMixed-Abstraction Level Co-Simulation Environment for Dynamically Reconfigurable Processor Arrays
Author*Satoshi Tsutsumi, Yohei Hasegawa, Hideharu Amano (Keio University, Japan)
Pagepp. 405 - 411
KeywordCo-simulation, System level design, Dynamically reconfigurable processor, SystemC, Compiler
AbstractIn this paper, we present an automated design methodology and a design framework, including a System Generator, a DRPA Generator, and a DRPA Compiler, for dynamically reconfigurable processor arrays (DRPAs). We have developed a System Generator that can generate a DRPA model written in SystemC and an interface wrapper using the Verilog Procedural Interface (VPI) from application code and an architecture description. We have integrated it with a tentative compiler based on COINS and constructed a mixed-abstraction-level co-simulation environment.

R4-12 (Time: 10:07 - 10:09)
TitleBlack-Diamond: a Retargetable Compiler using Graph with Configuration Bits for Dynamically Reconfigurable Architectures
Author*Vasutan Tunbunheng, Hideharu Amano (Keio University, Japan)
Pagepp. 412 - 419
Keyworddynamically reconfigurable processor, retargetable compiler, placement and routing, multicontext
AbstractTo develop a design environment for various types of Dynamically Reconfigurable Processor Arrays (DRPAs), the GCI (Graph with Configuration Information) is proposed to represent the configurable resources of the target dynamically reconfigurable architecture. Function units, constant units, registers, and routing resources can be represented in the graph together with their configuration information. Hardware restrictions are added to the graph using a "DisCounT" port, which limits the possible configuration bits at a port controlled by other ports. A prototype compiler called Black-Diamond, based on GCI, is now available for three different DRPAs. It translates a data-flow graph from a C-like front-end description, performs placement and routing using the GCI, and generates configuration data for each element of the DRPA in the form of multicasting. Implementation results of simple applications show that Black-Diamond can generate reasonable designs for three different architectures.

R4-13 (Time: 10:09 - 10:11)
TitleA Reconfigurable Architecture with Special Functions for Shift Keying
Author*Ayataka Kobayashi, Ittetsu Taniguchi, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan)
Pagepp. 420 - 426
KeywordReconfigurable Architecture, shift keying
AbstractThis paper proposes a reconfigurable architecture for shift keying named RASK. RASK has a specialized ALU with specific functions and specialized processing elements for shift keying. Experimental results show that the proposed architecture implements several shift keying schemes with smaller area than a reconfigurable architecture without the specialized ALU.

R4-14 (Time: 10:11 - 10:13)
TitleTopology Generation and Floorplanning for Low Power Application-Specific Network-on-Chips
Author*Wan-Yu Lee, Iris Hui-Ru Jiang (Department of Electronics Engineering and Institute of Electronics, National Chiao Tung University, Taiwan)
Pagepp. 427 - 432
KeywordNetwork-on-Chips, Low Power
AbstractAs the process advances into nanotechnology, the number of cores and the amount of communication on a chip are rapidly increasing. Using a micro-network, a Network-on-Chip can overcome the communication inefficiency of the traditional shared-bus communication architecture. The system performance of application-specific Network-on-Chips is mostly measured by power, timing, and area. Power and timing depend highly on how the network topology connects routers and cores and on how many routers are used; area is determined by floorplanning. Unlike previous endeavors, we propose a new methodology that performs network topology generation before floorplanning. Moreover, our method preserves the optimality of the topology in the floorplan. Our method not only minimizes power and satisfies timing and area constraints, but also guarantees deadlock freedom. Compared with previous work using the same number of routers, the results show that this approach achieves competitive power consumption while providing the above guarantees.

R4-15 (Time: 10:13 - 10:15)
TitleFloorplan-Aware Design Methodology for Application-Specific Bus Matrix Systems
Author*Geeng-Wei Lee, Juinn-Dar Huang, Jing-Yang Jou (National Chiao Tung University, Taiwan)
Pagepp. 433 - 438
Keywordbus matrix, floorplan, multi-cycle communication, communication architecture
AbstractThe design of communication architectures is becoming more and more important as modern systems require ever wider communication bandwidth and technology continues to shrink. Simultaneously considering hardware cost, system performance, and multi-cycle communication makes designing communication architectures even harder. In this paper, we propose a floorplan-aware design methodology for designing a bus matrix with the minimum number of buses for a given system, under performance constraints and the assumption of multi-cycle communication.

R4-16 (Time: 10:15 - 10:17)
TitleLow Power Object Oriented Synthesis for Electronic System-Level Design
Author*Mehdi Kamal, Shaahin Hessabi (Sharif University of Technology, Iran)
Pagepp. 439 - 444
KeywordObject Oriented, Synthesis, Low Power, System Level
AbstractEnergy and power consumption are becoming among the most important design factors due to the widespread use of portable devices. Low power techniques are widely used at low levels of design; applying such techniques in system-level design is likewise inevitable. In this paper, we use two techniques for low power synthesis of an object-oriented (OO) system and implement them in an OO synthesis tool named ODYSSEY. We have added module-level clock gating and reduced the number of object data accesses during synthesis, and studied the power reduction achieved by these two techniques. The clock gating part controls the clock while the system operates and dynamically manages power. Each class of the design needs its data, so its methods must access a shared memory. Therefore, decreasing the number of accesses reduces the power dissipated in the interconnection network and improves system performance. We implemented this technique at the algorithm level. To evaluate the proposed techniques, we used the JpegDecoder, JpegEncoder and Genetic Algorithm benchmarks. Experiments show that the clock gating technique reduces power dissipation by about 45%. Decreasing the number of object data accesses reduces power and improves system performance.


Invited Talk IV
Time: 11:30 - 12:15 Tuesday, October 16, 2007
Location: Conference Hall (2F)
Chair: Hidetoshi Onodera (Kyoto University, Japan)

I4-1 (Time: 11:30 - 12:15)
TitleStatistical Techniques to Combat Variability and Achieve Robust Design
Author*Chandu Visweswariah (IBM T. J. Watson Research Center, United States)
Pagep. 447
AbstractVariability due to manufacturing, environmental and aging uncertainties constitutes one of the major challenges in continuing CMOS scaling. Worst-case design is simply not feasible any more. This presentation will describe how statistical timing techniques can be used to reduce pessimism, achieve full-chip and full-process coverage, and enable robust design practices. A practical ASIC timing methodology based on statistical timing will be described. Model-to-hardware correlation, at-speed test and robust optimization techniques will be presented. Key research initiatives that were required to achieve such a design flow will be described.


Invited Talk V
Time: 13:30 - 14:15 Tuesday, October 16, 2007
Location: Conference Hall (2F)
Chair: Shin'ichi Wakabayashi (Hiroshima City University, Japan)

I5-1 (Time: 13:30 - 14:15)
TitleCurrent Status of LSI Micro-Fabrication and Future Prospect for 3D System and Design Integration
Author*Kazuya Okamoto (Osaka University, Japan)
Pagepp. 451 - 457
AbstractMiniaturization technology for LSI based on Dennard's scaling rule has progressed steadily over the years and has benefited human life. Optical lithography has made remarkable progress so far, achieving high resolution at 90nm or less using various technologies. However, this scenario of producing ever-finer feature geometries is unlikely to continue, because resolution capabilities will soon reach a critical limit due to CMOS performance thresholds and chip economics. Therefore, to assure continued performance improvements in future LSI devices, next-generation interconnect and advanced packaging technologies will gain importance. In particular, a new three-dimensional (3D) monolithic integration would be an integral part of this technology. At the same time, the definition of the semiconductor device should be updated to "System & Design Integration (SDI)." SDI will provide the needed feedback to launch a new field of clear applications based on a total system solution with innovative equipment for design, fabrication, inspection and evaluation. 3D-LSI and SDI will have a tremendous impact on the future electronics industry.


Design Verification & Design Experience II
Time: 14:15 - 16:00 Tuesday, October 16, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Chien-Nan Liu (National Central University, Taiwan), Qiang Zhu (Cadence Design Systems, Japan)

R5-1 (Time: 14:15 - 14:17)
TitleFormal Representation and Verification of Arithmetic Circuits Using Symbolic Computer Algebra
Author*Yuki Watanabe, Naofumi Homma, Takafumi Aoki (Tohoku University, Japan), Tatsuo Higuchi (Tohoku Institute of Technology, Japan)
Pagepp. 461 - 468
Keyworddatapath, arithmetic circuit, formal verification, computer algebra
AbstractThis paper presents an application of symbolic computer algebra to arithmetic circuit design. Our method represents an arithmetic circuit as a hierarchical graph consisting of high-level mathematical objects based on weighted number systems and arithmetic formulae. We can verify the function of such a circuit representation by polynomial reduction techniques using Groebner bases as well as by conventional *BMD (multiplicative Binary Moment Diagram) techniques. In this paper, we investigate the basic characteristics of the proposed representation and verification through case studies such as a parallel multiplier and a BCD (Binary-Coded Decimal) adder. The results show that the proposed approach succeeded in verifying some arithmetic circuits where the conventional approaches failed.
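The word-level specification behind a "weighted number system" view of a multiplier is that the output word, read as the sum of 2^i times output bit i, equals the product of the input words. The sketch below checks that equation exhaustively for a hypothetical 2x2-bit gate-level array multiplier. It only illustrates the specification being verified; the paper's method instead proves it symbolically by polynomial reduction with Groebner bases (or *BMDs), which avoids enumerating input vectors. The netlist and function names are assumptions made for illustration.

```python
from itertools import product


def mult2x2(a0, a1, b0, b1):
    """Hypothetical gate-level 2x2-bit array multiplier; returns [p0, p1, p2, p3]."""
    pp0, pp1, pp2, pp3 = a0 & b0, a1 & b0, a0 & b1, a1 & b1  # partial products
    p1 = pp1 ^ pp2      # half-adder sum
    c1 = pp1 & pp2      # half-adder carry
    p2 = pp3 ^ c1       # half-adder sum
    p3 = pp3 & c1       # half-adder carry
    return [pp0, p1, p2, p3]


def word(bits):
    """Weighted-number-system reading of a little-endian bit list: sum of 2^i * bit_i."""
    return sum(bit << i for i, bit in enumerate(bits))


def check_multiplier(netlist):
    """Exhaustively check word(outputs) == word(A) * word(B) for 2-bit operands."""
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        if word(netlist(a0, a1, b0, b1)) != word([a0, a1]) * word([b0, b1]):
            return False
    return True
```

Enumeration grows as 2^(2n) for n-bit operands, which is exactly the blowup that a symbolic algebraic check is meant to sidestep.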

R5-2 (Time: 14:17 - 14:19)
TitleRange Equivalent Circuit Minimization
Author*Yung-Chih Chen, Chun-Yao Wang (National Tsing Hua University, Taiwan)
Pagepp. 469 - 476
Keywordrange redundant primary input, range-preserving simplification
AbstractSimplifying a combinational circuit while preserving its range has a variety of applications, such as combinational equivalence checking and random simulation. Previous approaches use BDD techniques to compute the range of a circuit and then reconstruct the circuit from the computed range. Although the size of the new circuit is significantly reduced by the range rearrangement, these methods suffer from the BDD blowup problem for large circuits. Thus, in this paper, we propose a new method to simplify combinational circuits without explicit range computation. We first introduce a new concept, the range stuck-at fault test, and show that an untestable range stuck-at fault on a primary input indicates that this primary input is range redundant (not responsible for the circuit’s range). We then present a procedure to determine whether a given range stuck-at fault on a primary input is untestable. Our method iteratively identifies and removes range redundant primary inputs to simplify a combinational circuit without performing range computation. Accordingly, large circuits that BDD-based methods cannot deal with can be handled. We conduct experiments on a set of ISCAS’85 and MCNC benchmarks. The experimental results show that our approach minimizes circuits such that fewer primary inputs remain. On average, the ratio of the number of primary inputs removed by our approach to the number removed by a previous non-BDD-based method is 1.57.
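The definition of a range redundant primary input can be illustrated on tiny circuits by computing the range (the set of reachable output vectors) explicitly: an input stuck at a constant is "range untestable" if the range does not change, so that input can be removed. The sketch below does exactly this by brute-force enumeration, which is the explicit range computation the paper is designed to avoid; it is meant only to show what property is being tested. The circuit-as-callable convention and the function names are hypothetical.

```python
from itertools import product


def circuit_range(circuit, n, fixed=None):
    """Set of output tuples over all 2^n input vectors.
    `fixed` optionally maps input index -> constant (a stuck-at value)."""
    fixed = fixed or {}
    outs = set()
    for x in product((0, 1), repeat=n):
        y = tuple(fixed.get(i, v) for i, v in enumerate(x))
        outs.add(circuit(y))
    return outs


def range_redundant_inputs(circuit, n):
    """Greedily fix primary inputs to constants while the range is preserved.
    Returns {input index: constant} for the inputs found to be range redundant."""
    target = circuit_range(circuit, n)
    fixed = {}
    for i in range(n):
        for v in (0, 1):
            trial = {**fixed, i: v}
            if circuit_range(circuit, n, trial) == target:
                fixed = trial
                break
    return fixed
```

For a circuit with outputs (a XOR b, c), input a can be tied to 0 without losing any output vector, since b alone still drives the first output through every value; after that, neither b nor c can be fixed.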

R5-3 (Time: 14:19 - 14:21)
TitlePredictive Test Strategy for CMOS RF Mixers
Author*Kay Suenaga, Rodrigo Picos, Sebastia Bota, Miquel Roca, Eugeni Isern, Eugeni Garcia-Moreno (University of Balearic Islands, Spain)
Pagepp. 477 - 483
KeywordCMOS, RF Mixer, Predictive Test, RF Test
AbstractIn this paper, we present two built-in self-test strategies for the down-converter stage of a GSM receiver. These strategies are based on estimating its performance parameters from measurements taken in test mode. By reusing some receiver blocks as part of the test set-up, the circuitry overhead is kept small. The first strategy uses the LO signal as the only test stimulus. The second strategy uses additional test circuitry: a generator and an auxiliary mixer. Prediction accuracies are similar for both strategies, but the second one simplifies the measurement of the test observables.

R5-4 (Time: 14:21 - 14:23)
TitleUnifying AMBA based Verification Environment at SystemC / RTL / FPGA Levels: Using 3D Graphics SoC As an Example
Author*Wei-Sheng Huang, Ruei-Ting Gu, Ing-Jer Huang (National Sun Yat-Sen University, Taiwan)
Pagepp. 484 - 487
Keywordtest-pattern, auto regression test, unify, verification environment
AbstractThis paper presents an AMBA-based mutual-verification environment that unifies verification environments at different levels. It makes it easier to reuse test patterns across different verification environments and to automate regression testing. In addition, the mutual-verification environment can reduce verification effort because the level of verification is raised from the cycle level to the program level. In modern complex IC design, making verification more efficient can considerably reduce costs and dramatically improve verification quality.

R5-5 (Time: 14:23 - 14:25)
TitleHardware/Software Covalidation with FPGA and RTOS Model
Author*Seiya Shibata, Shinya Honda, Yuko Hara, Hiroyuki Tomiyama, Hiroaki Takada (Nagoya University, Japan)
Pagepp. 488 - 494
KeywordCovalidation, FPGA, RTOS, Embedded Systems
AbstractThis paper presents a hardware/software covalidation environment for embedded systems. Our covalidation environment consists of a software simulator which simulates a set of application tasks together with an RTOS running on a processor, multiple hardware simulators, FPGA emulators and a covalidation backplane. For shortening validation time, our covalidation environment uses fast RTOS simulation model for software and FPGA for hardware. Using the covalidation environment, we successfully performed covalidation of an MPEG4 decoder system.

R5-6 (Time: 14:25 - 14:27)
TitlePipeline-Aware Instruction-Level Power Analysis for VLIW DSP Core
AuthorWen-Tsan Hsieh, Hsin-Ying Liao, *Chien-Nan Jimmy Liu (National Central University, Taiwan), Shu-Yu Cheng, Ji-Jan Chen (SOC Technology Center of Industrial Technological Research Institute, Taiwan)
Pagepp. 495 - 499
Keywordsoftware power model, instruction level power analysis, power model, pipeline-aware, VLIW
AbstractIn this work, we develop a new instruction-level power analysis approach for pipelined VLIW DSP cores. The proposed approach accounts, as accurately as possible, for both the base power cost and the inter-instruction effect cost in each pipeline stage, so the power estimate is much closer to the real pipeline behavior. The experimental results show that the average error of our approach is less than 3%.
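The abstract does not give the model's equations. As a hedged illustration, assuming the classic instruction-level formulation (a base cost per instruction plus an inter-instruction cost for each consecutive pair) applied separately in every stage of an ideal, stall-free in-order pipeline, the sketch below distributes those costs over clock cycles to produce a per-cycle energy profile. The cost tables, units, and the function name `cycle_energy_profile` are hypothetical placeholders, not the paper's characterized model.

```python
def cycle_energy_profile(program, base, overhead, n_stages):
    """Per-cycle energy of a straight-line program on an ideal in-order pipeline.

    base[op][s]             : base cost of `op` while it occupies stage s
    overhead[(prev, op)][s] : extra cost in stage s when `op` follows `prev`
    Pairs missing from `overhead` are assumed to cost nothing.
    """
    zero = [0.0] * n_stages
    n_cycles = len(program) + n_stages - 1   # pipeline fill + drain, no stalls assumed
    profile = []
    for t in range(n_cycles):
        energy = 0.0
        for s in range(n_stages):
            k = t - s                         # index of the instruction in stage s at cycle t
            if 0 <= k < len(program):
                op = program[k]
                energy += base[op][s]
                if k > 0:                     # inter-instruction effect vs. predecessor
                    energy += overhead.get((program[k - 1], op), zero)[s]
        profile.append(energy)
    return profile
```

With a two-stage pipeline, `base = {"add": [1.0, 2.0], "mul": [3.0, 4.0]}` and an `("add", "mul")` overhead of `[0.5, 0.5]`, the program `["add", "mul"]` yields the profile `[1.0, 5.5, 4.5]`: the middle cycle combines `mul` entering stage 0 with `add` leaving through stage 1.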

R5-7 (Time: 14:27 - 14:29)
TitleAutomatic Generation of Custom Interface Transactors for Verification Environments
Author*Rafael K. Morizawa, Hiroaki Iwashita, Koichiro Takayama (Fujitsu Laboratories, LTD., Japan)
Pagepp. 500 - 506
KeywordTransactor generation, Testbench generation, Protocol checker
AbstractThe verification cost of complex SoCs has been increasing at a fast pace. Thus it is necessary to cut, as much as possible, any costs that are not directly associated with the verification task itself. In our experience, building the verification environment (also called the testbench) is not an easy task, takes time, and has a negative impact on the overall verification cost. The main source of complexity in a verification environment lies in the interfacing between the DUT (Design Under Test) and the testbench. Although standard interface protocols are available, complex custom interface protocols are used instead in order to optimize the hardware's communication throughput and latency. One way to alleviate this problem is to abstract this interfacing by using transactors. In this paper we propose a methodology to automatically generate transactors. We also present a case study where the proposed methodology has been used to build the verification environment of a bus bridge used in a commercial product.

R5-8 (Time: 14:29 - 14:31)
TitleAnalog Simulation Meets Digital Verification: A Formal Assertion Approach for Mixed-Signal Verification
Author*Alexander Jesser, Lars Hedrich (University of Frankfurt a.M., Germany), Stefan Laemmermann, Roland Weiss, Juergen Ruf, Thomas Kropf, Wolfgang Rosenstiel (University of Tuebingen, Germany), Alexander Pacholik, Wolfgang Fengler (Technical University of Ilmenau, Germany)
Pagepp. 507 - 514
KeywordAnalog and mixed-signal design, Verification and simulation, Assertion-based verification, Property specification language
AbstractFunctional and formal verification are important methodologies for complex mixed-signal designs. However, there is a verification gap between the analog and digital blocks of a mixed-signal system. Our approach improves the verification process by creating mixed-signal assertions, which are described by a combination of digital assertions and analog properties. The proposed method is a new assertion-based verification flow for designing mixed-signal circuits. The effectiveness of the approach is demonstrated on a sigma-delta converter.

R5-9 (Time: 14:31 - 14:33)
TitleEncoding Assertions with Dynamic Local Variables for Bounded Property Checking
Author*Sho Takeuchi, Kiyoharu Hamaguchi, Toshinobu Kashiwabara (Graduate School of Information Science and Technology, Osaka University, Japan)
Pagepp. 515 - 521
KeywordAssertion-Based Verification, Bounded Model Checking, SystemVerilog, Dynamic Local Variable
AbstractTo perform functional formal verification, bounded property checking for assertions has been proposed. However, it is difficult to handle assertions that include dynamic local variables, such as those in SystemVerilog. In this paper, we assume a restriction on assertions with dynamic local variables: assignment to each dynamic local variable is allowed only once in the assertion, on the left-hand side of an implication operator. Under this restriction, we investigate an algorithm that verifies such assertions by bounded property checking, using one storing variable for each dynamic local variable. We implemented the algorithm and performed some experiments.

R5-10 (Time: 14:33 - 14:35)
TitleEvaluation of All-Digital PLL by Using Clock-Period Comparator
Author*Yukinobu Makihara, Masayuki Ikebe, Eiichi Sano (Hokkaido University, Japan)
Pagepp. 522 - 528
Keyworddigitally controlled PLL, clock-period comparator, loop characteristic
AbstractFor a digitally controlled phase-locked loop (PLL), we evaluate the use of a clock-period comparator (CPC). In this PLL, only the frequency lock operation should be performed; however, the phase lock operation is also simultaneously achieved by performing the clock-period comparison. In addition, we succeeded in digitizing a voltage controlled oscillator (VCO) with a linear characteristic. We confirmed a phase lock operation with a slight loop characteristic through SPICE simulation.

R5-11 (Time: 14:35 - 14:37)
TitleA Lateral Unified-CBiCMOS Buffer Circuit for Driving 5-nF Maximum Load Capacitance per CCD Clock
Author*Masatoshi Kobayashi, Takashi Hamahata, Toshiro Akino (Kinki University, Japan), Kenji Nishi (Kinki University Technology College, Japan), Cuong Vo Le, Kohsei Takehara, T. Goji Etoh (Kinki University, Japan)
Pagepp. 529 - 535
KeywordSlanted linear CCD storage, ISIS, CMOS/SOI, Lateral unified-CBiCMOS
AbstractSince 2001, we have been developing an in-situ storage image sensor (ISIS) that captures 100 to 150 consecutive images at a frame rate of 1 Mfps and an ultra-high-speed video camera for use with this ISIS. Currently, basic research is continuing in an attempt to increase the frame rate up to 100 Mfps. The CCD chip of this camera has a 10 V maximum voltage supply source and a 5 nF maximum load capacitance per CCD clock. The goal of this study is to design a prototype power supply chip for generating the CCD clock and for driving the load capacitance of the CCD chip. A further goal is to verify the circuit behavior, based on a 1-µm CMOS/SOI process having breakdown voltages of almost 20 V. A lateral unified-CBiCMOS buffer circuit consists of n- and p-channel MOSFETs that include parasitic lateral npn- and pnp-BJTs having partially depleted p- and n-base layers, respectively, on an epitaxial substrate and SOI. A forward current is applied to the base terminal of the channel MOSFET, adding a normal pull-up or pull-down MOSFET as a current source. A new device structure is designed to reduce the resistance values between the drains and the bases, while also keeping both MOSFETs inactive and activating either the lateral npn or pnp BJT. A clock generator consisting of a ring oscillator with a 21-stage CMOS inverter amplified and driven by a buffer circuit is designed. Circuit simulation using 1-µm LEVEL-3 model parameters for the MOSFETs and a current gain of βF = 100 for the BJTs reduced the delay time of the unified-CBiCMOS buffer circuit by approximately 1/4, compared to that for an equivalent two-stage CMOS inverter circuit designed on the basis of logical effort for driving a load capacitance of 5 nF at Vdd = 10 V. The power supply chip with the unified-CBiCMOS buffer circuit can drive the CCD chip at a frame rate of 10 Mfps for a maximum 5-nF load capacitance.

R5-12 (Time: 14:37 - 14:39)
TitleA CMOS Transconductor with Rail-to-Rail Input Stage under 1.8-V Supply Voltage
Author*Tien-Yu Lo, Cheng-Sheng Kao, Wen-Hung Hsieh, Chung-Chih Hung (National Chiao Tung University, Taiwan)
Pagepp. 536 - 539
KeywordTransconductor, Rail-to-rail
AbstractThis paper presents a CMOS low-voltage rail-to-rail transconductor operating from a 1.8-V supply voltage. Instead of using an n-type and a p-type differential input pair, we use an n-type and a level-shift n-type differential input pair to design a rail-to-rail input stage. Instead of the previously reported complex structures, a novel level-shift n-type differential input pair is designed to maintain constant transconductance. This work is designed in TSMC 0.18-µm CMOS technology. Results show that the fluctuation of the total transconductance of the proposed transconductor is less than ±3%.

R5-13 (Time: 14:39 - 14:41)
TitleCharge Recycling between Divided Blocks in MTCMOS Circuits
Author*Akira Tada, Hiromi Notani, Genichi Tanaka, Takashi Ipposhi (Renesas Technology Corporation, Japan), Masaaki Iijima, Masahiro Numa (Kobe University, Japan)
Pagepp. 540 - 544
Keywordpower gating, MTCMOS, low power, charge recycling, leakage current
AbstractAn important issue with MTCMOS circuits is the energy consumed in charging the virtual P/G lines during sleep/active mode transitions. Charge recycling is an effective technique for reducing this energy. We propose a technique that reuses more charge by dividing a circuit into several blocks and transferring charge between properly selected pairs of blocks. Assuming ideal conditions, the energy saving ratio can be improved from 50% to up to 63.6%. The proposed method improved the ratio by 10.0%, and total power by 7.1%.
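A simple way to see where a 50% baseline for charge recycling can come from is an idealized two-capacitor model: one charged rail (at Vdd) and one discharged rail are shorted through an ideal switch before the discharged one is connected to the supply. Both settle at a shared voltage, so that much of the destination's charge, and hence of the energy Q·Vdd drawn from the supply, no longer has to come from the supply. The sketch below computes that fraction. It is an assumption-laden illustration (ideal switches, no leakage, lumped capacitances; `recycled_fraction` is a hypothetical name), not the authors' multi-block circuit, and it does not attempt to reproduce their 63.6% figure.

```python
def recycled_fraction(c_src, c_dst, vdd=1.0):
    """Idealized charge sharing between a rail charged to vdd (capacitance c_src)
    and a fully discharged rail (capacitance c_dst).

    After the two rails are connected, both settle at
        v_shared = c_src * vdd / (c_src + c_dst)   (charge conservation).
    Returns the fraction of the destination's full charge c_dst * vdd that is
    supplied by recycling instead of by the power supply.
    """
    v_shared = c_src * vdd / (c_src + c_dst)
    return (c_dst * v_shared) / (c_dst * vdd)
```

With equal capacitances the result is 0.5, matching the 50% figure the abstract starts from; a larger source capacitance raises the fraction (3:1 gives 0.75), which is one intuition for why pairing blocks carefully can recycle more charge.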

R5-14 (Time: 14:41 - 14:43)
TitleCoDaMa: An XML-based Framework to Manipulate Control Data Flow Graphs
Author*Shunitsu Kohara, Shi Youhua, Nozomu Togawa, Masao Yanagisawa, Tatsuo Ohtsuki (Waseda University, Japan)
Pagepp. 545 - 549
KeywordCDFG, XML, framework, HW/SW co-synthesis, high-level synthesis
AbstractThis paper proposes an XML-based framework to manipulate CDFGs (Control Data Flow Graphs) for HW/SW (hardware/software) co-synthesis systems or high-level synthesis systems. With the increasing scale of recent SoC applications, synthesis systems are required to implement more advanced functions, which increases development effort. Developers using our framework can implement algorithms and construct systems easily, because the framework uses XML descriptions as the intermediate representation of application programs and provides an input/output interface.
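The abstract does not give CoDaMa's XML schema, so the sketch below assumes a hypothetical minimal one (`<node id op>` and `<edge src dst>` elements) for a single acyclic data-flow block. It parses the description with Python's standard `xml.etree.ElementTree` and derives a topological order, the kind of traversal a scheduler or code generator built on an XML intermediate representation would need. Control-flow back edges (loops) are out of scope for this sketch and are rejected as cycles.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical schema: computes (a * b) + c.
CDFG_XML = """
<cdfg name="example">
  <node id="a" op="input"/>
  <node id="b" op="input"/>
  <node id="c" op="input"/>
  <node id="m" op="mul"/>
  <node id="s" op="add"/>
  <edge src="a" dst="m"/>
  <edge src="b" dst="m"/>
  <edge src="m" dst="s"/>
  <edge src="c" dst="s"/>
</cdfg>
"""


def load_cdfg(text):
    """Return ({node id: op}, [(src, dst), ...]) from the XML description."""
    root = ET.fromstring(text.strip())
    ops = {n.get("id"): n.get("op") for n in root.iter("node")}
    edges = [(e.get("src"), e.get("dst")) for e in root.iter("edge")]
    return ops, edges


def topo_order(ops, edges):
    """Kahn's algorithm: every node appears after all of its predecessors."""
    indeg = {n: 0 for n in ops}
    succ = {n: [] for n in ops}
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
    ready = deque(n for n in ops if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in succ[n]:
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    if len(order) != len(ops):
        raise ValueError("graph has a cycle (not a single acyclic data-flow block)")
    return order
```

Keeping the graph in a plain XML file means any tool that can parse XML can read or rewrite it, which is the separation between algorithm development and representation handling that the abstract emphasizes.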


Panel Discussion
Time: 16:00 - 17:30 Tuesday, October 16, 2007
Location: Conference Hall (2F)

D-1 (Time: 16:00 - 17:30)
TitleThe End of Traditional CMOS
Author*Moderator: Raul Camposano (Xoomsys, United States), Panelists: Gul Agha (University of Illinois at Urbana-Champaign, United States), Yasuhiko Hagihara (Device Platforms Research Laboratories, NEC Corporation, Japan), Igor Markov (University of Michigan, United States), Chandu Visweswariah (IBM T. J. Watson Research Center, United States)
Pagep. 553
AbstractThe rumors of CMOS’ death have been greatly exaggerated. After 2 decades as the workhorse of the electronics industry, device counts have scaled by ~10^4 and speeds by ~10^2. Transistor “performance” has consequently scaled by ~10^6 and has arguably been the main driver of system performance. CMOS isn’t showing signs of ending its reign any time soon. Or is it? The panel will discuss this question, in particular the following positions:
• Business as usual. “Simple” (Dennard) scaling has not been simple for decades; it’s just getting a bit harder, but there is essentially nothing new. We have dealt with new processes, materials, devices and circuits for a long time. We can make it below 10nm.
• The problem is really economic. Scores of companies are already exiting the fab business. Scaling will most definitely end some day, and the end is coming slowly. Depending on volume, different applications are getting stuck at different nodes. But CMOS will continue to be the principal game in town for a long time. So it is more important to look at what we can do (design) with silicon than to scale it further.
• CMOS will be “hybridized” by add-on technologies, for example to increase communication speed both on- and off-chip, or to produce small, very fast non-volatile memories. Main candidates: optical and nanoswitches.
• We ought to look at something new like quantum computing. Some niche applications like cryptography will benefit greatly and will drive the development of such new technologies. Power, reliability and material limits (among others) will prevent further progress in CMOS.