The 14th Workshop on Synthesis And System Integration of Mixed Information technologies Technical Program

Design Experience I
Time: 10:20 - 12:05 Monday, October 15, 2007
Location: Conference Hall (2F) & Poster Room (2F)
Chairs: Chun-Yao Wang (National Tsing Hua University, Taiwan), Tohru Ishihara (Kyushu University, Japan)

R1-1 (Time: 10:20 - 10:22)

Title	Power-Conscious Synthesis of Parallel Prefix Adders under Bitwise Timing Constraints
Author	*Taeko Matsunaga, Shinji Kimura (Waseda University, Japan), Yusuke Matsunaga (Kyushu University, Japan)
Page	pp. 7 - 14
Keyword	parallel prefix adder, switching activity, power, timing constraints, arithmetic synthesis
Abstract	Global structures of parallel prefix adders can be synthesized flexibly depending on each context, such as bitwise input/output timing constraints. In this paper, an approach for power-conscious synthesis of parallel prefix adders is proposed. Global structures of parallel prefix adders are represented as prefix graphs. The switching cost of a prefix graph is defined based on the switching activities of its nodes, and is minimized by extending our area minimization algorithms. This approach accepts bitwise input/output timing constraints and the bitwise probability that each input signal value is one, and minimizes the total sum of switching activities for each distinct context. Calculating switching activities with an OBDD-based approach makes this approach efficient. Experimental results show the effectiveness of our approach compared to existing regular parallel prefix adders.
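
For readers unfamiliar with the regular parallel prefix adders mentioned as the baseline above, the following is a minimal sketch of one such regular structure. The choice of the Kogge-Stone scheme and the function name `kogge_stone_add` are assumptions made for illustration only; this is not the synthesized, power-conscious structure proposed in the paper.

```python
def kogge_stone_add(a: int, b: int, width: int) -> int:
    """Add two unsigned integers modulo 2**width with a Kogge-Stone prefix scheme.

    Illustrative only: the bit vectors are packed into Python integers, so each
    loop iteration models one level of the prefix graph.
    """
    mask = (1 << width) - 1
    g = a & b & mask        # per-bit generate signals
    p = (a ^ b) & mask      # per-bit propagate signals
    half_sum = p            # sum bits before carries are applied

    d = 1
    while d < width:
        # Combine each node with the node d positions below it:
        # (g, p) o (g', p') = (g | (p & g'), p & p')
        g = (g | (p & (g << d))) & mask   # uses the old p on purpose
        p = (p & (p << d)) & mask
        d <<= 1

    # After the prefix levels, bit i of g is the carry out of bits [0..i],
    # so the carry into bit i is bit i-1 of g.
    carries = (g << 1) & mask
    return (half_sum ^ carries) & mask
```

Each pass of the loop corresponds to one level of the prefix graph, so the number of levels grows logarithmically with the bit width.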

R1-2 (Time: 10:22 - 10:24)

Title	Design of a Combined Circuit for Multiplication and Inversion in GF(2^m)
Author	*Katsuki Kobayashi, Naofumi Takagi (Nagoya University, Japan)
Page	pp. 15 - 20
Keyword	GF(2^m), multiplication, inversion
Abstract	A combined circuit for multiplication and inversion in GF(2^m) is proposed. We combine the inversion algorithm proposed by Yan et al. that is based on the extended Euclid's algorithm and the MSB-first multiplication algorithm by focusing on the similarities between them so that multiplication and inversion can share almost all hardware components of the circuit. The area of the circuit is estimated to be approximately 40% smaller than the total area of an ordinary multiplication circuit and an ordinary inversion circuit.
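
As background for the MSB-first multiplication mentioned above, here is a minimal software sketch of that bit-serial scheme in a polynomial-basis GF(2^m). The function name, the integer encoding of field elements, and the use of the AES polynomial in the usage note are assumptions for illustration; the sketch does not model the combined multiplier/inverter circuit itself.

```python
def gf2m_mul_msb_first(a: int, b: int, poly: int, m: int) -> int:
    """Multiply a and b in GF(2^m), scanning the bits of b from MSB to LSB.

    Elements are encoded as integers below 2**m (bit i is the coefficient of
    x^i). `poly` is the irreducible polynomial including its x^m term.
    """
    acc = 0
    for i in range(m - 1, -1, -1):
        acc <<= 1                  # multiply the accumulator by x
        if (acc >> m) & 1:         # degree reached m: reduce modulo poly
            acc ^= poly
        if (b >> i) & 1:           # add a if the current bit of b is set
            acc ^= a
    return acc
```

For example, with the AES field polynomial x^8 + x^4 + x^3 + x + 1 (`poly = 0x11B`, `m = 8`), `gf2m_mul_msb_first(0x57, 0x83, 0x11B, 8)` evaluates to `0xC1`.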

R1-3 (Time: 10:24 - 10:26)

Title	Associative Memory Design Realizing Reference-Pattern Recognition and Learning based on Short/Long-Term Storage Concept
Author	*Shogo Sakakibara, Md. Anwarul Abedin, Yuki Tanaka, Ali Ahmadi, Hans Jüergen Mattausch, Tetsushi Koide (Hiroshima University, Japan)
Page	pp. 21 - 25
Keyword	Associative Memory, Short/Long-term memory
Abstract	In the presented research, an associative memory architecture for searching the most similar data among previously stored reference data is applied, which achieves high speed, low power consumption and small implementation area due to a mixed digital-analog fully-parallel nearest-match search circuitry. The realization of the learning capability is based on the concept of short/long-term memory and tries to mimic the function of the human brain. A complete LSI test chip was designed in 0.35um CMOS technology for verification of this architecture.

R1-4 (Time: 10:26 - 10:28)

Title	Acceleration of Advanced Encryption Standard (AES) Processing on a CAM Enhanced Super Parallel SIMD Processor
Author	*Masaharu Tagami, Masakatsu Ishizaki, Takeshi Kumaki, Yutaka Kono, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan), Takayuki Gyohten, Hideyuki Noda, Katsumi Dosaka, Kazutami Arimoto, Kazunori Saito (Renesas Technology Corporation, Japan)
Page	pp. 26 - 31
Keyword	super parallel SIMD processor, AES, CAM, multimedia processing, pattern matching
Abstract	This paper presents an Advanced Encryption Standard (AES) implementation on a Content Addressable Memory (CAM) enhanced super-parallel SIMD processor. The proposed SIMD processor architecture achieves 40 GOPS for 16b additions at 200MHz clock frequency and 250 mW power dissipation. In the AES processing, a table conversion processing is included. We apply an integrated CAM to which the SIMD processor can off-load the table conversion for quick processing. As a result, we can realize high-speed AES execution on the proposed architecture.

R1-5 (Time: 10:28 - 10:30)

Title	Hardware Realization of Two-Stage Pattern Matching System using Fully-Parallel Associative Memories
Author	*Md. Anwarul Abedin, Yuki Tanaka, Shogo Sakakibara, Ali Ahmadi, Tetsushi Koide, Hans Jüergen Mattausch (RCNS, Hiroshima University, Japan)
Page	pp. 32 - 37
Keyword	associative memory, pattern matching, fully parallel search, mixed digital/analog circuit
Abstract	A hardware realization of a cascaded fully-parallel associative memory with two-stage winner search is proposed. This architecture uses two different types of associative memories. One is based on the k-nearest-matches search, and the other is a special type of associative memory in which the winner search is done only among the activated reference patterns. The activation in the second associative memory is done by the first associative memory after it has searched for the k nearest matches. We have already designed, fabricated and tested the associative memories separately. The complete two-stage pattern matching system is tested here with Matlab software, and the hardware realization is currently being designed.

R1-6 (Time: 10:30 - 10:32)

Title	A Fast Differential-Amplifier-Based Winner-Search circuit for Fully Parallel Associative Memories
Author	*Yuki Tanaka, Md. Anwarul Abedin, Shogo Sakakibara, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan)
Page	pp. 38 - 41
Keyword	associative memory, nearest search, digital-analog circuit, differential amplifier
Abstract	A mixed digital-analog fully parallel associative memory with a differential amplifier for winner search is proposed. The use of the proposed differential amplifier for winner search improves the speed, reliability and area efficiency of the associative-memory-based system. The test chip occupies 5.48 mm^2 in 0.35 um CMOS technology for 64 reference patterns with 16 binaries of 5 bits. The operation time of the system is less than 78 ns, with an average power consumption of around 132 mW.

R1-7 (Time: 10:32 - 10:34)

Title	Reducing the Dynamic Energy Consumption in the Multi-Layer Memory of Embedded Multimedia Processing Systems
Author	*Ilie I. Luican (University of Illinois at Chicago, United States), Hongwei Zhu (ARM, Inc., United States), Florin Balasa (Southern Utah University, United States), Dhiraj K. Pradhan (University of Bristol, Great Britain)
Page	pp. 42 - 48
Keyword	memory management, embedded systems, dynamic energy
Abstract	The memories in data-intensive signal processing systems -- including video and image processing, artificial vision, real-time 3-D rendering, advanced audio and speech coding, medical imaging applications -- have an important impact on the overall energy budget. This paper focuses on the reduction of the dynamic energy consumption in the memory subsystem, starting from the high-level algorithmic specification of the application. The approach to address this problem uses elements of the theory of polyhedra and relies on a variety of algebraic techniques specific to the data-flow analysis used in modern compilers.

R1-8 (Time: 10:34 - 10:36)

Title	An Output Probability Computation Circuit Design for Real Time Speech Recognition
Author	*Joe Hashimoto, Akihiko Eguchi, Makoto Saituji (Kinki University, Japan), Akihisa Yamada (Sharp Corporation, Japan), Takashi Kambe (Kinki University, Japan)
Page	pp. 49 - 55
Keyword	Speech recognition, C-based architecture design, memory access method, application specific arithmetic circuit, Bach system
Abstract	Speech recognition is becoming a popular technology for the implementation of human interfaces. However, conventional approaches to large vocabulary continuous speech recognition require a high performance CPU. In this paper, we describe a speech-recognition system designed using a C-based architecture design methodology. Pipelining and parallel processing circuits accelerated by data buffering, memory separation, and loop unrolling were implemented to calculate the Hidden Markov Model (HMM) output probability at high speed and their performances evaluated. It is shown that real time speech recognition in small portable systems is possible.

R1-9 (Time: 10:36 - 10:38)

Title	A Hybrid Memory Architecture for Low Power Embedded System Design
Author	*Tadayuki Matsumura, Yuriko Ishitobi (Kyushu University, Japan), Tohru Ishihara, Maziar Goudarzi (System LSI Research Center Kyushu University, Japan), Hiroto Yasuura (Kyushu University, Japan)
Page	pp. 56 - 62
Keyword	low power, on-chip memory, leakage, design, scratchpad
Abstract	On-chip memories are among the most power-hungry components of today's systems on a chip (SoCs). On-chip memories generally use higher Vdd and Vth than the logic parts to suppress static power consumption without increasing the memory access delay. This design policy, however, increases dynamic power consumption, since dynamic power is proportional to the square of Vdd. This paper proposes a hybrid memory architecture consisting of two regions: 1) a frequently accessed region which uses low Vdd and Vth and 2) a rarely accessed region which uses high Vdd and Vth. The key to our architecture is that the access delays of the two regions are equal, which makes it easy to integrate this memory into processors without any modification of the internal processor architecture. This paper also proposes a technique for finding the sizes of, and the code allocation to, the regions so as to minimize the total power consumption of the memory. Experimental results demonstrate that the total power consumption of the scratchpad memory can be reduced in all cases.
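
The quadratic dependence on Vdd stated above can be written out with the standard first-order CMOS dynamic power model. The symbols for activity factor, switched capacitance and access frequency, and the subscripts "low" and "high", are assumptions introduced here for illustration and do not come from the abstract.

```latex
% First-order dynamic power model (alpha: activity factor, C: switched
% capacitance, f: access frequency; all three are illustrative symbols).
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f

% With alpha, C and f held fixed, moving an access from the high-Vdd region
% to the low-Vdd region scales its dynamic power by
\frac{P_{\mathrm{dyn,low}}}{P_{\mathrm{dyn,high}}}
  = \left( \frac{V_{dd,\mathrm{low}}}{V_{dd,\mathrm{high}}} \right)^{2}
```

Under this model, halving Vdd would cut the dynamic power of an access to one quarter, which is why placing frequently accessed code in the low-Vdd region can pay off.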

R1-10 (Time: 10:38 - 10:40)

Title	An Accurate and Efficient Lane Recognition Algorithm for Automotive Active Safety System
Author	*Yusuke Watanabe, Masahiro Fukui (Ritsumeikan University, Japan)
Page	pp. 63 - 68
Keyword	image filter, automobile, lane recognition
Abstract	Lane recognition is an essential technique for automobile active safety applications. We aim to develop a high-speed and highly accurate lane recognition system. The proposed algorithm provides an efficient filter that extracts candidate lane edges and avoids noise edges, to reduce misrecognition as much as possible. It is implemented with simple hardware logic.

R1-11 (Time: 10:40 - 10:42)

Title	Performance Evaluation of Region-Growing Image Segmentation Using Two-Dimensional Image-Block Scanning
Author	*Keita Okazaki, Kazutoshi Awane, Kosuke Yamaoka, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan)
Page	pp. 69 - 73
Keyword	block-scanning
Abstract	We report a 2-dimensional block-scanning image-segmentation architecture based on a region-growing approach which has real-time execution capability. Using the two techniques of a limited scan to the boundary of each grown region and an exhaustive block-internal growing process, we have improved processing speed, power consumption and hardware efficiency in comparison to the previous state of the art. In particular, the processing speed could be maximized and the processing-circuit size could be minimized by adjusting the pixel number within the scanning block, the memory configuration and the memory-access method.

R1-12 (Time: 10:42 - 10:44)

Title	An Effective Parallel Coding Architecture Utilizing Characteristics of Multimedia Application
Author	*Takeshi Kumaki, Masakatsu Ishizaki, Masaharu Tagami, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan)
Page	pp. 74 - 80
Keyword	Content addressable memory, CAM, Parallel coding, Multiport, Huffman coding
Abstract	This paper presents a parallel coding architecture using a flexible multi-ported content addressable memory (CAM). A previously reported Flexible Multi-port Content Addressable Memory (FMCAM) technology is improved by additional schemes for a single search mode and counting value setting, which enable fast parallel coding operation. Moreover, an inactive category suspend mode becomes possible and reduces the power consumption. Evaluation results for Huffman encoding within the JPEG application show that, in the proposed architecture, the number of clock cycles needed for encoding is 93% less than for a conventional DSP. The power consumption during data transmission between memory block and processing block for the improved FMCAM is estimated to be about 90% smaller than for the original FMCAM. Furthermore, the performance per unit area, measured in MOPS/mm^2, can be improved by a factor of 3.8 in comparison to a conventional DSP.

R1-13 (Time: 10:44 - 10:46)

Title	VLSI Architecture for Real-time Retinex Video Image Enhancement
Author	*Kazuyuki Takahashi, Yoshihiro Nozato (Osaka University, Japan), Hiroyuki Okuhata (Synthesis Corporation, Japan), Takao Onoye (Osaka University, Japan)
Page	pp. 81 - 86
Keyword	video image enhancement, Retinex, variational model
Abstract	A real-time VLSI architecture for Full HD 1080i video image enhancement is proposed, based on a variational approach to the Retinex algorithm. In order to efficiently reduce the enormous computational cost required for image enhancement, the processing layers and the number of iterations are determined in accordance with software evaluation results. Pipelined and parallel processing of pixels also contributes to achieving real-time processing of high-resolution pictures. In addition, using the illumination signal calculated for the previous frame rather than that for the current frame reduces the required frame memory size. As a result, the proposed architecture with four-way parallelization, which can be implemented with 100K gates, processes 1,920x1,080, 30fps images in real time at 24MHz operation.

R1-14 (Time: 10:46 - 10:48)

Title	ΣΔ-Modulator with High Nearby Interferers Suppression by Transmission Zeroes
Author	*Takashi Moue, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page	pp. 87 - 90
Keyword	Delta Sigma modulator, A/D conversion, CMOS
Abstract	A Delta Sigma modulator that strongly suppresses nearby interferers by forming zeroes in the signal transfer function is proposed and demonstrated. Feedforward signal paths from the input signal terminal to each integrator form zeroes in the signal transfer function, suppressing the nearby interferers that often heavily degrade the quality of A/D conversion and cause serious instability. A prototype discrete-time 6th-order Delta Sigma modulator with a signal bandwidth of 777 kHz has been fabricated in 0.18 um CMOS technology and demonstrated 20 dB suppression of the 2.65 MHz to 8.22 MHz adjacent channel signals and an SNR of 59 dB for in-band signals.

R1-15 (Time: 10:48 - 10:50)

Title	The Effects of Switch Resistances on Pipelined ADC Performances and the Optimization for the Settling Time
Author	Masaya Miyahara, *Hiroki Endou, Akira Matsuzawa (Tokyo Institute of Technology, Japan)
Page	pp. 91 - 96
Keyword	analog to digital converter, switched capacitor amplifier, switch resistance, pipeline operation
Abstract	In this paper, we discuss the effects of switch resistances on the step response of switched-capacitor (SC) circuits, especially multiplying digital-to-analog converters (MDACs) in pipelined analog-to-digital converters. Theory and simulation results reveal that the settling time of MDACs can be decreased by optimizing the switch resistances. This switch resistance optimization does not only effectively increase the speed of single-bit MDACs, but also of multi-bit MDACs. Moreover, multi-bit MDACs are faster than the single-bit MDACs when slewing occurs during the step response. With such an optimization, the response of the switch will be improved by up to 50 %.

R1-16 (Time: 10:50 - 10:52)

Title	A 12-bit 3.7-Msample/s Pipelined A/D Converter Based on the Novel Capacitor Mismatch Calibration Technique
Author	*Shuaiqi Wang (Graduate School of Information, Production, and System, Waseda University, Japan), Fule Li (Institute of Microelectronics, Tsinghua University, China), Yasuaki Inoue (Graduate School of Information, Production, and System, Waseda University, Japan)
Page	pp. 97 - 103
Keyword	A/D conversion, pipelined, capacitor mismatch calibration, low power dissipation
Abstract	This paper proposes a 12-bit 3.7-MS/s pipelined A/D converter based on a novel capacitor mismatch calibration technique. The conventional stage is improved to an algorithmic circuit involving charge summing, capacitors’ exchange and charge redistribution, simply by introducing some extra switches into the analog circuit. The proposed ADC obtains linearity beyond the accuracy of the capacitor match and verifies that the novel capacitor mismatch calibration technique reduces the nonlinear error from capacitor mismatch to the second order without additional power dissipation or chip size. It is processed in 0.5um CMOS technology. Simulation results show that 71.7dB SNDR and 77.9dB SFDR are obtained for a 2V Vpp 500kHz sine input sampled at 3.7MS/s. The total power dissipation of this ADC is 33.46mW at a power supply of 5V.