Title | A Regular Expression Matching Circuit Based on a Modular Non-Deterministic Finite Automaton with Multi-Character Transition |
Author | *Hiroki Nakahara, Tsutomu Sasao, Munehiro Matsuura (Kyushu Institute of Technology, Japan) |
Page | pp. 359 - 364 |
Keyword | Regular Expression, IDS, FPGA, reconfigurable device |
Abstract | This paper shows an implementation of a regular expression circuit
based on an NFA (Non-deterministic finite automaton).
Also, it shows that the NFA based one is superior to the DFA (Deterministic finite automaton) based one,
in terms of area and time complexity.
A regular expression matching circuit is generated as follows:
First, the given regular expressions are converted into an NFA.
Then, to reduce the number of states,
the NFA is converted into a modular non-deterministic finite automaton (MNFA(p)) with p-character transition.
Finally, a finite-input memory machine (FIMM) to detect p-characters as well as
the matching elements (MEs) realizing the states for the MNFA(p) are generated.
We designed MNFA(p) for different p on a Xilinx FPGA.
Then, we derived an optimal value p that efficiently uses
both LUTs and embedded memories of the FPGA.
As for the performance per FPGA area,
our method is 6.2-18.6 times better than DFA-based methods,
and is 1.8 times better than the NFA-based method.
Since our method efficiently utilizes FPGA resource,
a low-cost FPGA can be used to implement a high-performance regular expression matching circuit. |
Title | Acceleration of a SAT Based Solver for Minimum Cost Satisfiability Problems Using Optimized Boolean Constraint Propagation |
Author | *Xin Zhang (The Graduate School of Information, Production and Systems, Waseda University, Japan), Peilin Liu (Department of Electronic Engineering Shanghai Jiao Tong University, China), Shinji Kimura (The Graduate School of Information, Production and Systems, Waseda University, Japan) |
Page | pp. 365 - 370 |
Keyword | SAT-Solver, MinCostSAT, BCP |
Abstract | In this paper, we have shown an efficient way to accelerate a SAT based Solver for Minimum-cost problems with fractional coefficients by using Early Conflict Detection based BCP. Various benchmarks are evaluated and our solver with optimized BCP proves to gain significant acceleration rate, especially for satisfiable formulas. |
Title | Circuit Synthesis for Fast Memory Access in System LSI |
Author | *Kazuya Kishida, Takashi Kambe (Kinki University, Japan) |
Page | pp. 371 - 376 |
Keyword | Behavioral synthesis, Memory Access, System LSI, pipelining, on-chip array |
Abstract | High Level Design methodologies are becoming more and more important in the design of Large system LSI devices. Behavioral synthesis from C and other high level languages is a key to achieving the productivity required by such large designs. For memory intensive applications in particular the automatic identification, optimization and synthesis of memory access operations is essential. In this paper, some synthesis techniques for memory access logic are described. The three kinds of memory access method examined are (a) variable registerization (b) memory access pipelining and (c) on-chip arraying. This approach is applied to a speech recognition algorithm and its effectiveness is evaluated. |
Title | Clock Gating Optimization with Delay-Matching Cells |
Author | *Shih-Jung Hsu, Rung-Bin Lin (Yuan Ze University, Taiwan) |
Page | pp. 377 - 382 |
Keyword | Clock gating, Type matching, Delay matching, Clock skew, Low power |
Abstract | Clock gating is an effective method of reducing power dissipation of a high-performance circuit. However, deployment of gated cells increases the difficulty of synthesizing a low-skew gated tree. In this paper, we propose a delay-matching approach to addressing this problem. Delay-matching is achieved using gated cells whose timing characteristics are similar to that of their clock buffer (inverter) counterparts. Experimental results show that delay-matching attains better slew and much better latency with comparable clock skew when compared with type-matching approach. Delay-matching achieves all of these using much less area than type-matching does. Besides, the skew of a delay-matching gated tree, just like the one generated by type-matching, is insensitive to process and operating condition variations. Moreover, the slews of a delay-matching gated tree are less sensitive to process and operating condition variations than that of a type-matching gated tree. |
Title | Design and Evaluation of Digital Receiver for Low Power Wireless Communication |
Author | *Kazuki Ohya, Keishi Sakanushi, Yoshinori Takeuchi, Masaharu Imai (Osaka University, Japan) |
Page | pp. 383 - 388 |
Keyword | Digital Radio Design, Near Distance Wireless Communication, Low Power |
Abstract | This paper studies design and implementation
trade-off of digital receivers for low power
wireless communications. In these days, wireless communication
becomes one of indispensable technologies
in our daily life. For mobile terminal devices, which
have been widely developed, low power and low energy
are required. On the other hand, consideration
for noise affects on transmissions are indispensable for
wireless communication. According to the characteristics
of noise on transmissions, suitable implementation
considering power consumption should be varied.
In this paper, we design digital circuits of receiver
for RuBee, which is one of low power communication
standards. We measured area, power consumption,
and noise tolerance of designed digital receivers by
changing noise characteristics. Experimental results
show that trade-off between power consumption and
noise tolerance exists and optimal implementation can
be changed according to the noise characteristics. |
Title | Design and Verification of an Ultra-Low-Power Active RFID Tag with Multiple Power Domains |
Author | *Kenichi Agawa, Massimo Alioto, Wenting Zhou, Tsung-Te Liu, Louis Alarcon, Kimiya Hajkazemshirazi, Mervin John, Jesse Richmond, Wen Li, Jan Rabaey (University of California at Berkeley, U.S.A.) |
Page | pp. 389 - 394 |
Keyword | RFID, ultra low power, system-level verification, analog-digital-mixed-signal design, wake-up scheme transceiver |
Abstract | An ultra-low-power active RFID tag chip with multiple power and clock domains has been designed and verified. The chip employs a wake-up scheme to reduce its power consumption. It includes a wake-up analog regulator and oscillator as well as a digital power manager, synchronizer, and protocol processor. Thus, we need to verify a complicated wake-up procedure and analog-digital-mixed-signal system for its correct operations. Matlab Simulink models and simulations are explored for system-level verification, and verification time can be suppressed efficiently. |
Title | Device Simulation and Experimental Measurement of High-Voltage Unified-CBiCMOS Buffer Driver for Ultra-High-Speed CCD Image Sensors |
Author | Toshiaki Koike-Akino (Harvard University, U.S.A.), Takashi Hamahata, *Toshiro Akino, Takeharu Goji Etoh (Kinki University, Japan) |
Page | pp. 395 - 400 |
Keyword | Lateral-BJT, CMOS, High-Speed, High-Voltage, High Capacitive Load |
Abstract | We propose a high-performance buffer circuit termed a unified-complementary bipolar CMOS (U-CBiCMOS) technique for ultra-high-speed charge-coupled device (CCD) image sensors. In order to improve driving capability for realizing over-1M frames per second, the U-CBiCMOS buffer driver makes an effective use of a lateral BJT, which is inherent to an MOSFET structure. We designed a prototype of the U-CBiCMOS buffer driver which uses a trench isolated SOI-CMOS process with an asymmetric lightly-doped diffusion (LDD) for high-voltage applications. Through device simulations and experimental measurements, we reveal that the high-voltage CMOS process can offer a peak current gain of higher than 200. In addition, it is demonstrated that the driving performance of the U-CBiCMOS buffer circuit is significantly improved by activating the lateral BJT at a supply voltage of 15V and a load capacitance of 2nF. |
Title | Efficient Multiple Regular Expression Matching on FPGAs based on Extended SHIFT-AND Method |
Author | *Yusaku Kaneta, Shingo Yoshizawa, Shin-ichi Minato, Hiroki Arimura, Yoshikazu Miyanaga (Hokkaido University, Japan) |
Page | pp. 401 - 406 |
Keyword | regular expression, pattern matching, FPGA |
Abstract | In this paper, we study efficient pattern matching over high-speed data streams on a reconfigurable hardware, FPGA (Field Programmable Gate Array) First, we introduce a subclass of regular expressions, called linear regular expressions. Secondly, for the subclass, we present a new architecture, static BP-NFA (static bit-parallel NFA) on FPGA based on one of the well-known pattern matching techniques, bit-parallel pattern matching. Thirdly, we give analysis on the complexity of our architecture. Finally, we show the experimental results on our hardware. |
Title | An Extension of Systolic Regular Expression Matching Hardware for Handling Iteration of Strings Using Quantifiers |
Author | *Yoichi Wakaba, Masato Inagi, Shin'ichi Wakabayashi, Shinobu Nagayama (Hiroshima City University, Japan) |
Page | pp. 412 - 417 |
Keyword | regular expression, string matching, NIDS, quantifier, FPGA |
Abstract | We propose systolic regular expression
matching hardware that can handle iteration of subclasses
of regular expression (e.g. union of strings,)
which cannot directly be handled by conventional systolic
hardware. Our proposed hardware handles these
patterns by using shift registers. FPGA implementation
results showed that the proposed method significantly
reduces the number of LUTs required to handle
those patterns compared with the conventional one, in
which these patterns are expanded into much longer
ones. |
Title | A Novel Timing Synchronization Method for Fast and Accurate Multi-Core Instruction-Set Simulators |
Author | Meng-Huan Wu, *Fan-Wei Yu, Cheng-Yang Fu, Peng-Chih Wang, Ren-Song Tsay (National Tsing Hua University, Taiwan) |
Page | pp. 418 - 423 |
Keyword | Instruction-set simulator, binary translation, multi-core, synchronization |
Abstract | This paper proposes a timing synchronization method for fast and accurate Multi-Core Instruction-Set Simulation (MCISS). In order to achieve accurate simulation results of MCISS, a lock-step approach, which synchronizes every cycle, is commonly used. However, this approach introduces immense overhead and lowers the simulation speed. Instead of synchronizing every cycle, our ap-proach synchronizes based on the data dependency among the simulated programs. Therefore, the synchronization overheads can be highly reduced with accurate simulation results. With the proposed approach, the simulation speed of MCISS is up to 40 ~ 1,000 million instructions per second (MIPS) in general. Our major contribution is on clarifying the data dependency issue of multi-core system. The experimental results show that the MCISS using our simulation method can perform fast and accurate simulation. |
Title | A Power Efficient Unified Gated Flip-Flop |
Author | *Takumi Okuhira, Tohru Ishihara (Kyushu University, Japan) |
Page | pp. 424 - 429 |
Keyword | Low power design, flip-flop, clock gating |
Abstract | Since the clock power consumption in today's processors is considerably large, reducing the clock power consumption contributes to the reduction of the total power consumption in the processors. Recently, a gated flip-flop is proposed for reducing the clock power consumption of flip-flop circuits. The gated flip-flop employs a clock-gating circuit which cuts off an internal clock signal if the data stored in the flip-flop does not need to be updated. Although this reduces the clocking power consumption, the power dissipated in the clock-gating circuit is still large. For reducing the power dissipated in the clock-gating circuit, this paper proposes a technique for unifying the multiple clock-gating circuits, which reduces the overhead of the clock-gating circuit. Power measurement results obtained using a test chip demonstrate that our unified gated flip-flop reduces the power consumption of register circuits by 45% compared to the conventional gated flip-flops if the state transition probability of the flip-flop is 0.1 which is an average state transition probability of flip-flops in a commercial microprocessor used in our experiments. |
Title | Quantitative Graph-Based Minimal Queue Sizing for Throughput Optimization in Latency-Insensitive Designs |
Author | Juinn-Dar Huang, *Yi-Hang Chen, Ya-Chien Ho (National Chiao Tung University, Taiwan) |
Page | pp. 430 - 435 |
Keyword | Latency-Insensitive System, Latency-Insensitive-design, Throughput Optimization, Queue Size Minimization, Integer Linear Programming |
Abstract | As manufacturing processes are constantly moving toward very deep submicron (VDSM), global interconnect delay is becoming one of the most critical performance obstacles in system-on-chip (SoC) designs today. Latency-insensitive-design (LID) methodology, which enables multicycle communication to tolerate latency variation at late stages of the design process without substantially modifying pre-designed IP cores, has been proposed accordingly to conquer this issue. However, the system throughput is still degraded due to imbalanced interconnect latency and communication back-pressure residing in an LID. In this paper, a performance optimization technique with minimal queue sizing is presented. We first model a given LID as a newly proposed quantitative graph (QG), which can be further compacted using the proposed compaction techniques, so that much bigger problems can be handled. On top of QG, integer linear programming is applied to achieve the exact solution with minimal queue size based on the proposed constraint formulation in a reasonable runtime. The experimental results show that our approach can deal with moderately large latency-incentive systems in an acceptable runtime and save about 28% of queues as compared to the prior art. |
Title | A Reconfigurable Layout Method and Evaluation for Network On Chip |
Author | *Yuichi Nakamura (NEC Corp., Japan), Marcello Lajolo (NEC Labs. America, U.S.A.) |
Page | pp. 436 - 441 |
Keyword | NoC, Layout, Reconf |
Abstract | This paper presents a reconfigurable layout method for Networks-on-Chip (NoCs) based on partial re-layout. Currently, the layout design time is quite significant, because of the complexity involved with the verification of timing and signal integrity constraints. This limits the possibility to perform incremental changes at the physical design stage. However, a NoC which connects IP cores by network interfaces can be easily reconfigured during place and route. In general, a strict hierarchical design method can provide ease of reconfigurability, but it results in worse area and timing with respect to a flat layout method, which, on the other hand, does not provide reconfigurability. In this paper, we propose a rough hierarchical layout which combines the benefits of both hierarchical and flat layout design styles. Experimental results show area and performance numbers similar to the ones achieved by a flat layout as well as reconfigurability characteristics similar to the ones provided by a strict hierarchical layout. |
Title | RER: a Tuning Tool for Implementing a Computational Pipeline Across Multiple FPGAs |
Author | Hirokazu Morishita, *Kenta Inakagata (Keio University, Japan), Yasunori Osana (Seikei University, Japan), Naoyuki Fujita (JAXA, Japan), Hideharu Amano (Keio University, Japan) |
Page | pp. 442 - 447 |
Keyword | FPGA, Optimization, CFD |
Abstract | RER (Resource Estimation and Reconfiguration) is a quick optimization tool for arithmetic units of Xilinx IP cores and facilitates partitioning deep computational pipelines across multiple FPGAs. It generates an appropriate computational pipeline structure by composing the optimal configuration of IP cores, from the designer's parameters and database of hardware amount for all possible configurations of the arithmetic IP cores. As an example, complicated pipeline used in MUSCL algorithm in computational Fluid dynamics is divided into two FPGAs, considering the results of optimization by RER. For any partition candidates, 5.46x times or more speed up compared to the software on Intel Core 2 Duo processor had achieved. |
Title | A Tile Based Reconfigurable Architecture with Dual ALU-array/Processor Operating Mode Capability |
Author | Shin'ichi Kouyama, Masayuki Hiromoto (Kyoto University, Japan), Yukihiro Nakamura (Ritsumeikan University, Japan), *Hiroyuki Ochi (Kyoto University, Japan) |
Page | pp. 454 - 459 |
Keyword | Coarse-grained Reconfigurable Architecture |
Abstract | ALU-based coarse-grained reconfigurable devices are suitable for accelerating
simple stream processing by pipelining and parallelization. They are, however,
not efficiently used for sequential processing with complicated controls. In
this paper, we propose a tile based dynamically reconfigurable architecture in
which each tile has dual ALU-array/RISC processor operating mode capability.
The ratio of ALU-based hardware accelerator and RISC processor counterpart can
be dynamically changed depending on the dominant computation in the application
in order to maximize the efficiency. |
Title | VLSI Architecture of V-AMDF based Pitch Detection for Tonal Speech Recognizer |
Author | *Jirabhorn Chaiwongsai, Werapon Chiracharit, Kosin Chamnongthai (King Mongkut’s University of Technology Thonburi, Thailand), Yoshikazu Miyanaga (Hokkaido University, Japan), Kohji Higuchi (University of Electro-Communications, Japan) |
Page | pp. 460 - 465 |
Keyword | tone classification function, V-AMDF, 4-stage pipeline process |
Abstract | Tonal speech recognizer needs tone classification function to guarantee the word correctness. However, tone classification function generally has high computation part because there are many repeated parameters. This paper proposes AMDF-based 4-stage pipeline process VLSI architecture of tone classification function for tonal speech recognizer (TONE-SPEC). The architecture is divided into pitch detection, fundamental frequency extraction and tone decision processes. In the speech recognition, only vowel signal is sent though the pitch detection process so that the sample number is reduced. AMDF detects the periodicity of vowel speech by using 4-stage pipeline process. After that, fundamental frequency is extracted and then tone result is decided by finding look-up table maximum vote in tone decision process. To evaluate the performance of the proposed architecture, the experiment is set and tested with 37 Thai words. The results show the improvement of processing time, comparing to the conventional method. |