Title | Power-Conscious Synthesis of Parallel Prefix Adders under Bitwise Timing Constraints |
Author | *Taeko Matsunaga, Shinji Kimura (Waseda University, Japan), Yusuke Matsunaga (Kyushu University, Japan) |
Page | pp. 7 - 14 |
Keyword | parallel prefix adder, switching activity, power, timing constraints, arithmetic synthesis |
Abstract | The global structure of a parallel prefix adder can be synthesized flexibly to suit each context, such as bitwise input/output timing constraints. In this paper, an approach for power-conscious synthesis of parallel prefix adders is proposed. Global structures of parallel prefix adders are represented as prefix graphs. The switching cost of a prefix graph is defined based on the switching activities of its nodes and is minimized by extending our area minimization algorithms. The approach accepts bitwise input/output timing constraints and the bitwise probability that each input signal is one, and minimizes the total switching activity for each distinct context. Calculating switching activities with an OBDD-based method makes the approach efficient. Experimental results show the effectiveness of the approach compared with existing regular parallel prefix adders. |
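As a generic illustration of how a prefix graph computes carries (not the authors' synthesis algorithm), the sketch below combines bitwise (generate, propagate) pairs with the associative prefix operator over a regular Kogge-Stone graph, one of the regular structures such work is compared against. The function names are hypothetical; the node count is only a rough size measure and does not model switching activity.

```python
def prefix_op(left, right):
    """Combine (g, p) of a more-significant group `left` with the adjacent
    less-significant group `right`; this operator is associative."""
    g_hi, p_hi = left
    g_lo, p_lo = right
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def kogge_stone_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Hypothetical helper: add two unsigned `width`-bit integers through a
    Kogge-Stone prefix graph. Returns (sum with carry-out, prefix-node count)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    # Level 0 of the prefix graph: bitwise generate and propagate.
    gp = [(x & y, x ^ y) for x, y in zip(a_bits, b_bits)]
    p0 = [p for _, p in gp]
    nodes = 0
    dist = 1
    while dist < width:
        nxt = list(gp)
        for i in range(dist, width):
            # Node i now spans bits [i : i - 2*dist + 1] (clipped at bit 0).
            nxt[i] = prefix_op(gp[i], gp[i - dist])
            nodes += 1
        gp = nxt
        dist *= 2
    # gp[i][0] is the group generate of bits [i:0], i.e. the carry out of bit i
    # (carry-in assumed 0).
    carries = [0] + [g for g, _ in gp]
    s = 0
    for i in range(width):
        s |= (p0[i] ^ carries[i]) << i
    s |= carries[width] << width
    return s, nodes
```

For example, `kogge_stone_add(200, 100, 8)` yields `(300, 17)`: an 8-bit Kogge-Stone graph uses n·log2(n) − n + 1 = 17 prefix nodes regardless of timing constraints, which is the kind of fixed regular structure that context-dependent synthesis can improve on.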
Title | Associative Memory Design Realizing Reference-Pattern Recognition and Learning based on Short/Long-Term Storage Concept |
Author | *Shogo Sakakibara, Md. Anwarul Abedin, Yuki Tanaka, Ali Ahmadi, Hans Jüergen Mattausch, Tetsushi Koide (Hiroshima University, Japan) |
Page | pp. 21 - 25 |
Keyword | Associative Memory, Short/Long-term memory |
Abstract | In the presented research, an associative memory architecture that searches for the most similar data among previously stored reference data is applied; it achieves high speed, low power consumption, and a small implementation area thanks to mixed digital-analog, fully parallel nearest-match search circuitry.
The learning capability is realized based on the concept of short/long-term memory and is intended to mimic the function of the human brain.
A complete LSI test chip has been designed in 0.35um CMOS technology to verify this architecture. |
Title | Acceleration of Advanced Encryption Standard (AES) Processing on a CAM Enhanced Super Parallel SIMD Processor |
Author | *Masaharu Tagami, Masakatsu Ishizaki, Takeshi Kumaki, Yutaka Kono, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan), Takayuki Gyohten, Hideyuki Noda, Katsumi Dosaka, Kazutami Arimoto, Kazunori Saito (Renesas Technology Corporation, Japan) |
Page | pp. 26 - 31 |
Keyword | super parallel SIMD processor, AES, CAM, multimedia processing, pattern matching |
Abstract | This paper presents an Advanced Encryption Standard (AES) implementation on a Content Addressable Memory (CAM) enhanced super-parallel SIMD processor. The proposed SIMD processor architecture achieves 40 GOPS for 16b additions at a 200MHz clock frequency and 250 mW power dissipation. AES processing includes a table-conversion step. The SIMD processor off-loads this table conversion to an integrated CAM for fast processing. As a result, high-speed AES execution is realized on the proposed architecture. |
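In AES the principal table lookup is the SubBytes S-box; the abstract does not name the table, so treating "table conversion" as the S-box is an assumption. The sketch below derives the standard S-box from the multiplicative inverse in GF(2^8) followed by the affine transform, then applies it byte-wise. A plain Python list stands in for the CAM that performs the lookup in the paper's hardware; all function names are hypothetical.

```python
def gf_mul(x: int, y: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        x <<= 1
        if x & 0x100:
            x ^= 0x11B
        y >>= 1
    return result


def gf_inv(x: int) -> int:
    """Multiplicative inverse as x^254; AES maps 0 to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result if x else 0


def rotl8(b: int, n: int) -> int:
    return ((b << n) | (b >> (8 - n))) & 0xFF


def build_sbox() -> list[int]:
    """Hypothetical helper: inverse followed by the AES affine transform."""
    sbox = []
    for x in range(256):
        b = gf_inv(x)
        sbox.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()  # stand-in for the table held in the CAM


def sub_bytes(state: bytes) -> bytes:
    """Table conversion step: replace every state byte by its S-box entry."""
    return bytes(SBOX[b] for b in state)
```

Because every byte of the 16-byte state is converted independently, the lookups are naturally parallel, which is what makes off-loading them to a searchable memory attractive.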
Title | Hardware Realization of Two-Stage Pattern Matching System using Fully-Parallel Associative Memories |
Author | *Md. Anwarul Abedin, Yuki Tanaka, Shogo Sakakibara, Ali Ahmadi, Tetsushi Koide, Hans Jüergen Mattausch (RCNS, Hiroshima University, Japan) |
Page | pp. 32 - 37 |
Keyword | associative memory, pattern matching, fully parallel search, mixed digital/analog circuit |
Abstract | A hardware realization of a cascaded fully-parallel associative memory with two-stage winner search is proposed. This architecture uses two different types of associative memory. One is based on $k$-nearest-matches search; the other is a special type of associative memory in which winner search is performed only among the activated reference patterns. The reference patterns of the second associative memory are activated by the first associative memory after it has found the $k$ nearest matches. The two associative memories have already been designed, fabricated, and tested separately. The complete two-stage pattern matching system has been tested here with Matlab software, and its hardware realization is currently being designed. |
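A behavioral sketch of the two-stage search, under assumptions the abstract does not state: reference patterns are bit vectors compared by Hamming distance, and reference i of the first memory activates reference i of the second memory. The function `two_stage_match` is hypothetical, and the loops are sequential where the hardware searches fully in parallel.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit vectors stored as integers."""
    return bin(a ^ b).count("1")


def two_stage_match(query1, refs1, query2, refs2, k):
    """Hypothetical model: stage 1 finds the k nearest matches of query1 in refs1;
    stage 2 runs a winner search for query2 only among the activated refs2 entries."""
    ranked = sorted(range(len(refs1)), key=lambda i: hamming(query1, refs1[i]))
    activated = ranked[:k]  # assumed one-to-one activation by index
    return min(activated, key=lambda i: hamming(query2, refs2[i]))
```

With `refs1 = [0b0000, 0b0001, 0b1111, 0b0011]`, `refs2 = [0b1111, 0b1110, 0b0000, 0b0000]`, both queries `0b0000`, and `k = 2`, the result is reference 1: references 2 and 3 match `query2` exactly but are never activated by the first stage.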
Title | A Fast Differential-Amplifier-Based Winner-Search Circuit for Fully Parallel Associative Memories |
Author | *Yuki Tanaka, Md. Anwarul Abedin, Shogo Sakakibara, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan) |
Page | pp. 38 - 41 |
Keyword | associative memory, nearest search, digital-analog circuit, differential amplifier |
Abstract | A mixed digital-analog fully parallel associative memory with a differential amplifier for winner search is proposed.
Using the proposed differential amplifier for winner search improves the speed, reliability, and area efficiency of associative-memory-based systems.
The test chip occupies 5.48 mm$^2$ in 0.35 $\mu$m CMOS technology for 64 reference patterns, each consisting of 16 5-bit binary elements.
The operation time of the system is less than 78 ns, with an average power consumption of around 132 mW. |
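A software model of the nearest-match (winner) search for the pattern format stated in the abstract (64 references of 16 five-bit elements). The distance measure is an assumption: the sketch uses the Manhattan distance between element values, and it finds the minimum with a loop where the chip uses differential amplifiers. Names are hypothetical.

```python
ELEMENTS = 16  # elements per reference pattern (from the abstract)
BITS = 5       # bits per element (from the abstract)
NUM_REFS = 64  # stored reference patterns (from the abstract)

MEMORY_BITS = NUM_REFS * ELEMENTS * BITS      # stored reference data in bits
MAX_DISTANCE = ELEMENTS * (2 ** BITS - 1)     # largest possible Manhattan distance


def manhattan(x, y):
    """Assumed distance measure: sum of absolute element differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def winner_search(query, refs):
    """Hypothetical model: return (index, distance) of the reference nearest to `query`."""
    distances = [manhattan(query, r) for r in refs]
    best = min(range(len(refs)), key=distances.__getitem__)
    return best, distances[best]
```

Under these assumptions the chip stores 5,120 bits of reference data, and each distance fits in 9 bits because it can be at most 16 × 31 = 496.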
Title | Reducing the Dynamic Energy Consumption in the Multi-Layer Memory of Embedded Multimedia Processing Systems |
Author | *Ilie I. Luican (University of Illinois at Chicago, United States), Hongwei Zhu (ARM, Inc., United States), Florin Balasa (Southern Utah University, United States), Dhiraj K. Pradhan (University of Bristol, Great Britain) |
Page | pp. 42 - 48 |
Keyword | memory management, embedded systems, dynamic energy |
Abstract | The memories in data-intensive signal processing systems (including video and image processing, artificial vision, real-time 3-D rendering, advanced audio and speech coding, and medical imaging applications) have a significant impact on the overall energy budget.
This paper focuses on reducing the dynamic energy consumption of the memory subsystem, starting from the high-level algorithmic specification of the application.
The approach uses elements of the theory of polyhedra and relies on a variety of algebraic techniques specific to the data-flow analysis used in modern compilers. |
Title | An Output Probability Computation Circuit Design for Real Time Speech Recognition |
Author | *Joe Hashimoto, Akihiko Eguchi, Makoto Saituji (Kinki University, Japan), Akihisa Yamada (Sharp Corporation, Japan), Takashi Kambe (Kinki University, Japan) |
Page | pp. 49 - 55 |
Keyword | Speech recognition, C-based architecture design, memory access method, application specific arithmetic circuit, Bach system |
Abstract | Speech recognition is becoming a popular technology for implementing human interfaces. However, conventional approaches to large-vocabulary continuous speech recognition require a high-performance CPU. In this paper, we describe a speech-recognition system designed using a C-based architecture design methodology. Pipelined and parallel processing circuits, accelerated by data buffering, memory separation, and loop unrolling, were implemented to calculate the Hidden Markov Model (HMM) output probability at high speed, and their performance was evaluated. The results show that real-time speech recognition is possible in small portable systems. |
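The HMM output probability of a continuous-density model is commonly computed from Gaussian densities; the abstract does not give the model, so the sketch below assumes a single diagonal-covariance Gaussian per state in the log domain. It separates per-state constants, computed once, from the per-frame inner loop of subtract, square, multiply, and accumulate, which is the kind of loop that pipelining and loop unrolling target. Function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Reference form: log N(x; mean, diag(var)) summed over feature dimensions."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xd - m) ** 2 / v)
               for xd, m, v in zip(x, mean, var))


def precompute_state(mean, var):
    """Hypothetical per-state constants: mean, 1/var, and the log normalizer."""
    inv_var = [1.0 / v for v in var]
    const = -0.5 * sum(math.log(2 * math.pi * v) for v in var)
    return mean, inv_var, const


def log_output_prob(x, state):
    """Per-frame work only: no logarithms or divisions inside the loop."""
    mean, inv_var, const = state
    acc = 0.0
    for xd, m, iv in zip(x, mean, inv_var):  # candidate loop for unrolling
        diff = xd - m
        acc += diff * diff * iv
    return const - 0.5 * acc
```

Moving the logarithms and divisions into `precompute_state` leaves only multiply-accumulate operations per frame, which map directly onto arithmetic hardware.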
Title | A Hybrid Memory Architecture for Low Power Embedded System Design |
Author | *Tadayuki Matsumura, Yuriko Ishitobi (Kyushu University, Japan), Tohru Ishihara, Maziar Goudarzi (System LSI Research Center Kyushu University, Japan), Hiroto Yasuura (Kyushu University, Japan) |
Page | pp. 56 - 62 |
Keyword | low power, on-chip memory, leakage, design, scratchpad |
Abstract | On-chip memories are among the most power-hungry components of today's systems-on-chip (SoCs). On-chip memories generally use higher Vdd and Vth than logic parts to
suppress static power consumption without increasing the memory access
delay. This design policy, however, increases the
dynamic power consumption, since dynamic power is
proportional to the square of Vdd. This paper proposes a hybrid
memory architecture consisting of the following two regions:
1) a frequently accessed region that uses low Vdd and Vth,
and 2) a rarely accessed region that uses high Vdd and Vth.
The key feature of the architecture is that the two regions have equal access delays, which makes it easy to integrate the memory into processors without modifying the internal processor architecture.
This paper also proposes a technique for determining the region sizes and the
code allocation to the regions so as to minimize the total power
consumption of the memory. Experimental results
demonstrate that the total power consumption of the scratchpad memory can be reduced in all cases. |
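A simplified model of the sizing and allocation problem, for illustration only: code blocks with known access counts go either to the low-Vdd region (cheaper per access, leakier per byte) or to the high-Vdd region, and every split point of an access-density ordering is evaluated. The energy units, leakage terms, and greedy ordering are assumptions rather than the authors' formulation; the quadratic dependence of dynamic energy on Vdd is the part taken from the abstract. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CodeBlock:
    name: str
    size: int      # bytes
    accesses: int  # access count over the run


def dynamic_energy_per_access(vdd: float, c_eff: float = 1.0) -> float:
    """Normalized dynamic energy per access: C * Vdd^2."""
    return c_eff * vdd ** 2


def best_partition(blocks, vdd_low, vdd_high, leak_low, leak_high, runtime):
    """Hypothetical sizing heuristic: sort blocks by accesses per byte, try each
    prefix as the low-Vdd region, and return (energy, names in the low region).
    leak_* are assumed leakage energies per byte per unit time."""
    order = sorted(blocks, key=lambda b: b.accesses / b.size, reverse=True)
    e_low = dynamic_energy_per_access(vdd_low)
    e_high = dynamic_energy_per_access(vdd_high)
    best = None
    for k in range(len(order) + 1):
        low, high = order[:k], order[k:]
        energy = (sum(b.accesses for b in low) * e_low
                  + sum(b.accesses for b in high) * e_high
                  + (leak_low * sum(b.size for b in low)
                     + leak_high * sum(b.size for b in high)) * runtime)
        if best is None or energy < best[0]:
            best = (energy, [b.name for b in low])
    return best
```

With a hot 100-byte block accessed 10,000 times and a cold 1,000-byte block accessed 10 times (Vdd 0.6 vs 1.2, leakage 1.0 vs 0.01 per byte), only the hot block lands in the low-Vdd region: placing the cold block there as well would cost more in leakage than it saves in dynamic energy. Equal access delays, which the architecture guarantees, are what make such a free choice of boundary possible without touching the processor.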
Title | Performance Evaluation of Region-Growing Image Segmentation Using Two-Dimensional Image-Block Scanning |
Author | *Keita Okazaki, Kazutoshi Awane, Kosuke Yamaoka, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan) |
Page | pp. 69 - 73 |
Keyword | block-scanning |
Abstract | We report a two-dimensional block-scanning image-segmentation architecture based on a region-growing approach, with real-time execution capability. Using two techniques, a scan limited to the boundary of each grown region and an exhaustive block-internal growing process, we have improved processing speed, power consumption, and hardware efficiency compared with the previous state of the art. In particular, the processing speed could be maximized and the processing-circuit size minimized by adjusting the number of pixels within the scanning block, the memory configuration, and the memory-access method. |
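For reference, a conventional software region-growing segmentation is sketched below: starting from each unlabeled pixel in raster order, a breadth-first frontier absorbs 4-connected neighbors whose intensity differs from the current pixel by at most a threshold. This is a plain pixel-wise algorithm, not the authors' block-scanning architecture; the similarity rule and the function name are assumptions.

```python
from collections import deque


def region_grow(image, threshold):
    """Hypothetical helper: label 4-connected regions in which neighboring pixels
    differ by <= threshold. `image` is a list of rows of integer intensities;
    labels are numbered 0, 1, 2, ... in raster order of their seed pixels."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            # Unlabeled pixel: seed a new region and grow it.
            labels[sy][sx] = next_label
            frontier = deque([(sy, sx)])  # only the region boundary is kept here
            while frontier:
                y, x = frontier.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= threshold):
                        labels[ny][nx] = next_label
                        frontier.append((ny, nx))
            next_label += 1
    return labels
```

The frontier queue holds only pixels on the boundary of the growing region, which loosely mirrors the idea of limiting the scan to region boundaries; the hardware additionally processes pixels block by block.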
Title | An Effective Parallel Coding Architecture Utilizing Characteristics of Multimedia Application |
Author | *Takeshi Kumaki, Masakatsu Ishizaki, Masaharu Tagami, Tetsushi Koide, Hans Jüergen Mattausch (Hiroshima University, Japan) |
Page | pp. 74 - 80 |
Keyword | Content addressable memory, CAM, Parallel coding, Multiport, Huffman coding |
Abstract | This paper presents a parallel coding architecture using a flexible multi-ported content addressable memory (CAM). The previously reported Flexible Multi-port Content Addressable Memory (FMCAM) technology is improved with additional schemes for a single-search mode and counting-value setting, enabling fast parallel coding operation. Moreover, an inactive-category suspend mode becomes possible, which reduces power consumption. Evaluation results for Huffman encoding within the JPEG application show that the proposed architecture needs 93% fewer clock cycles for encoding than a conventional DSP. The power consumption during data transmission between the memory block and the processing block of the improved FMCAM is estimated to be about 90% lower than for the original FMCAM. Furthermore, the performance per unit area, measured in MOPS/mm^2, is improved by a factor of 3.8 compared with a conventional DSP. |
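As background on the coding operation being accelerated, the sketch below builds a Huffman code from symbol frequencies with Python's `heapq` and encodes a symbol sequence by table lookup. It models neither the FMCAM nor JPEG's standard table specification, which describes codes differently; the function names are hypothetical.

```python
import heapq
from itertools import count


def huffman_code(freqs: dict) -> dict:
    """Hypothetical helper: return {symbol: bit string} for a Huffman code."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single symbol still needs one bit.
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        f1, _, codes1 = heapq.heappop(heap)
        f2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, code):
    """Encoding is a per-symbol table lookup followed by concatenation."""
    return "".join(code[s] for s in symbols)
```

The encode step is a search of the code table for each incoming symbol, which is why a content addressable memory, able to search many entries at once, suits this workload.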
Title | VLSI Architecture for Real-time Retinex Video Image Enhancement |
Author | *Kazuyuki Takahashi, Yoshihiro Nozato (Osaka University, Japan), Hiroyuki Okuhata (Synthesis Corporation, Japan), Takao Onoye (Osaka University, Japan) |
Page | pp. 81 - 86 |
Keyword | video image enhancement, Retinex, variational model |
Abstract | A real-time VLSI architecture for Full HD 1080i video image enhancement is proposed, based on a variational approach to the Retinex algorithm. To efficiently reduce the enormous computational cost of image enhancement, the processing layers and the number of iterations are determined from software evaluation results. Pipelined and parallel processing of pixels also contributes to real-time processing of high-resolution pictures. In addition, using the illumination signal calculated for the previous frame rather than for the current frame reduces the required frame memory size. As a result, the proposed architecture with four-way parallelism, which can be implemented with 100K gates, processes 1,920x1,080 images at 30 fps in real time at 24 MHz operation. |
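A quick throughput check of the figures in the abstract, assuming that four-way parallelism means four pixels are processed per clock cycle (the abstract does not define it precisely):

```python
WIDTH, HEIGHT, FPS = 1920, 1080, 30  # frame format from the abstract
CLOCK_HZ = 24_000_000                # 24 MHz operation (from the abstract)
PARALLEL = 4                         # assumed: four pixels per clock cycle

pixel_rate = WIDTH * HEIGHT * FPS    # pixels per second that must be processed
capacity = CLOCK_HZ * PARALLEL       # pixels per second the datapath can accept
cycles_per_pixel_per_lane = CLOCK_HZ / (pixel_rate / PARALLEL)
```

Under that assumption the required rate is about 62.2 Mpixel/s against a capacity of 96 Mpixel/s, so each lane has roughly 1.54 clock cycles per pixel, leaving headroom for the pipeline.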
Title | The Effects of Switch Resistances on Pipelined ADC Performances and the Optimization for the Settling Time |
Author | Masaya Miyahara, *Hiroki Endou, Akira Matsuzawa (Tokyo Institute of Technology, Japan) |
Page | pp. 91 - 96 |
Keyword | analog to digital converter, switched capacitor amplifier, switch resistance, pipeline operation |
Abstract | In this paper, we discuss the effects of switch resistances on the step response of switched-capacitor (SC) circuits, especially multiplying digital-to-analog converters (MDACs) in pipelined analog-to-digital converters. Theory and simulation results reveal that the settling time of MDACs can be decreased by optimizing the switch resistances. This optimization effectively increases the speed not only of single-bit MDACs but also of multi-bit MDACs. Moreover, multi-bit MDACs are faster than single-bit MDACs when slewing occurs during the step response. With such an optimization, the response of the switch is improved by up to 50%. |
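A first-order estimate of how switch resistance enters the settling time, under the simplifying assumption of a single-pole response with time constant τ = R_sw · C. Settling to within half an LSB of an N-bit step requires exp(−t/τ) ≤ 2^−(N+1), that is, t = τ · (N+1) · ln 2. This model ignores the amplifier dynamics and slewing analyzed in the paper, so it should not be used to infer the optimal switch resistances; the component values are hypothetical.

```python
import math


def settling_time(r_switch_ohm: float, c_farad: float, n_bits: int) -> float:
    """Single-pole settling time (s) to within 0.5 LSB of an n_bits-accurate step.
    Assumes tau = R_switch * C and no slewing or amplifier pole."""
    tau = r_switch_ohm * c_farad
    return tau * (n_bits + 1) * math.log(2)
```

For example, a hypothetical 100 Ω switch driving 1 pF gives τ = 100 ps and about 0.9 ns to settle to 12-bit accuracy.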
Title | A 12-bit 3.7-Msample/s Pipelined A/D Converter Based on the Novel Capacitor Mismatch Calibration Technique |
Author | *Shuaiqi Wang (Graduate School of Information, Production, and System, Waseda University, Japan), Fule Li (Institute of Microelectronics, Tsinghua University, China), Yasuaki Inoue (Graduate School of Information, Production, and System, Waseda University, Japan) |
Page | pp. 97 - 103 |
Keyword | A/D conversion, pipelined, capacitor mismatch calibration, low power dissipation |
Abstract | This paper proposes a 12-bit 3.7-MS/s pipelined A/D converter based on a novel capacitor mismatch calibration technique. The conventional stage is turned into an algorithmic circuit involving charge summing, capacitor exchange, and charge redistribution simply by introducing a few extra switches into the analog circuit. Through this calibration technique, the proposed ADC achieves linearity beyond the accuracy of capacitor matching, confirming that the nonlinearity error caused by capacitor mismatch can be reduced to second order without additional power dissipation or chip area. The ADC is designed in 0.5um CMOS technology. Simulation results show an SNDR of 71.7 dB and an SFDR of 77.9 dB for a 2 Vpp, 500 kHz sine input sampled at 3.7 MS/s. The total power dissipation of the ADC is 33.46 mW from a 5 V power supply. |
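A generic numerical illustration of why exchanging capacitors can push mismatch error to second order; it uses a textbook capacitor-swapping argument and is not the authors' exact circuit. In an ideal flip-around MDAC the stage gain is 1 + Cs/Cf, nominally 2. If Cs = (1 + δ)Cf, the gain error is δ. Swapping the two capacitors gives 1 + 1/(1 + δ), and averaging the two results leaves an error of δ²/(2(1 + δ)), which is second order in δ. Function names are hypothetical.

```python
def mdac_gain(cs: float, cf: float) -> float:
    """Closed-loop gain of an ideal flip-around MDAC stage: 1 + Cs/Cf."""
    return 1 + cs / cf


def gain_errors(delta: float) -> tuple[float, float]:
    """Hypothetical helper: (error without exchange, error after averaging the
    normal and capacitor-exchanged configurations) for relative mismatch delta."""
    cf = 1.0
    cs = (1 + delta) * cf
    plain = mdac_gain(cs, cf) - 2
    exchanged_avg = 0.5 * (mdac_gain(cs, cf) + mdac_gain(cf, cs)) - 2
    return plain, exchanged_avg
```

With a 1% mismatch (δ = 0.01) the plain gain error is 0.01, while the averaged error drops to about 4.95 × 10⁻⁵.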