Real-Time Demonstration of 20-Gb/s QPSK Burst-Mode Digital Coherent Reception for PON Upstream under Clock Frequency Mismatch of 1.0 MHz

Author(s):  
Noriko Iiyama ◽  
Masamichi Fujiwara ◽  
Takuya Kanai ◽  
Hiro Suzuki ◽  
Jun-ichi Kani ◽  
...  
2013 ◽  
Vol 22 (10) ◽  
pp. 1340025
Author(s):  
TENG WANG ◽  
LEI ZHAO ◽  
ZI-YI HU ◽  
ZHENG XIE ◽  
XIN-AN WANG

In this paper, a novel decomposition approach and VLSI implementation of the chroma interpolator with great hardware reuse and no multipliers for H.264 encoders are proposed. First, the characteristic of the chroma interpolation is analyzed to obtain an optimized decomposition scheme, with which the chroma interpolation can be realized with arithmetic elements (AEs) which are comprised of only adders. Four types of AEs are developed and a pipelining hardware design is proposed to conduct the chroma interpolation with great hardware reuse. The proposed design was prototyped within a Xilinx Virtex6 XC6VLX240T FPGA with a clock frequency as high as 245 MHz. The proposed design was also synthesized with SMIC 130 nm CMOS technology with a clock frequency of 200 MHz, which could support a real-time HDTV application with less hardware cost and lower power consumption.
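As a rough illustration of what "no multipliers" can mean here, the sketch below evaluates the standard H.264 bilinear chroma formula, ((8-dx)(8-dy)A + dx(8-dy)B + (8-dx)dy·C + dx·dy·D + 32) >> 6, using only shifts and additions. It is not the decomposition or the AE structure proposed in the paper; the function names and the separable horizontal-then-vertical ordering are assumptions made for the example.

```python
def shift_add_mul(x, k):
    """Multiply x by a small constant k (0..8) using only shifts and adds."""
    acc = 0
    for bit in range(4):            # k <= 8 fits in 4 bits
        if (k >> bit) & 1:
            acc += x << bit
    return acc


def chroma_interp(a, b, c, d, dx, dy):
    """Chroma sample at eighth-pel offset (dx, dy), with 0 <= dx, dy <= 7.

    a, b are the top-left/top-right neighbours; c, d the bottom pair.
    Hypothetical helper: blends horizontally, then vertically.
    """
    top = shift_add_mul(a, 8 - dx) + shift_add_mul(b, dx)
    bottom = shift_add_mul(c, 8 - dx) + shift_add_mul(d, dx)
    total = shift_add_mul(top, 8 - dy) + shift_add_mul(bottom, dy)
    return (total + 32) >> 6        # round and normalise by 64


def chroma_interp_reference(a, b, c, d, dx, dy):
    """Direct form of the same formula, with multipliers, for comparison."""
    return ((8 - dx) * (8 - dy) * a + dx * (8 - dy) * b
            + (8 - dx) * dy * c + dx * dy * d + 32) >> 6
```

Both functions return the same value for every fractional offset, since the separable form expands to the same four weighted terms.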


2004 ◽  
Vol 13 (06) ◽  
pp. 1217-1231
Author(s):  
MOHAMMED SAYED ◽  
WAEL BADAWY

This paper presents a new Computational-RAM (C-RAM) architecture for real-time mesh-based video motion tracking. In Part 1, the motion estimation part of the proposed architecture is presented. Here in Part 2, a new C-RAM mesh-based motion compensation architecture is presented. The inputs to the architecture are the motion vectors of the mesh nodes and the reference frame, and the output is the compensated (i.e., predicted) frame. The architecture uses the affine transformation to warp the deformed patches in the reference frame into the undeformed patches in the current frame. The architecture computes the affine parameters using a multiplication-free algorithm. The reference and current frames are stored in embedded S-RAMs generated with the Virage™ Memory Compiler. The proposed motion compensation architecture has been prototyped, simulated, and synthesized using TSMC 0.18 μm CMOS technology. At a clock frequency of 100 MHz, the proposed architecture processes one CIF video frame (i.e., 352×288 pixels) in 0.59 ms, which means it can process up to 1694 frames per second. The core area of the proposed motion compensation architecture is 28.04 mm² and it consumes 31.15 mW.
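The throughput figures in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (100 MHz, 0.59 ms per CIF frame, 352×288 pixels); the derived cycles-per-frame and pixels-per-cycle values are back-of-the-envelope estimates, not results reported by the authors.

```python
CLOCK_HZ = 100e6          # clock frequency quoted in the abstract
FRAME_TIME_S = 0.59e-3    # time to process one CIF frame
WIDTH, HEIGHT = 352, 288  # CIF resolution

# Frames per second is the reciprocal of the per-frame time.
frames_per_second = 1.0 / FRAME_TIME_S

# Clock cycles spent on one frame, and the implied average pixel rate.
cycles_per_frame = round(CLOCK_HZ * FRAME_TIME_S)
pixels_per_frame = WIDTH * HEIGHT
pixels_per_cycle = pixels_per_frame / cycles_per_frame

print(int(frames_per_second), cycles_per_frame, round(pixels_per_cycle, 2))
```

The reciprocal of 0.59 ms rounds down to the 1694 frames per second stated in the abstract, and the implied average is a little under two pixels per clock cycle.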


2013 ◽  
Vol 791-793 ◽  
pp. 1501-1505
Author(s):  
Tao Jia

Owing to real-time video decoding requirements, hardware accelerators for video deblocking filtering have become a research hotspot in recent years. Whereas traditional deblocking filter hardware accelerators support only a single video coding standard, this paper implements a deblocking filter structure whose filtering algorithm can be configured to support multiple video coding standards, and it uses SIMD technology so that the filtering data are processed fully in parallel. The resulting multi-standard deblocking filter accelerator supports four video coding standards: H.264, AVS, VP8, and RealVideo. It runs at a clock frequency of 200 MHz and can be used for real-time filtering in multi-standard HD video processing.


2019 ◽  
Vol 3 (4) ◽  
pp. 48-70
Author(s):  
Lukáš Kohútka ◽  
Lukáš Nagy ◽  
Viera Stopjaková

This paper presents a novel hardware architecture of a dynamic memory manager providing memory allocation and deallocation operations that are suitable for hard real-time and safety-critical systems due to the very high determinism of these operations. The proposed memory manager implements the Worst-Fit algorithm for selecting a suitable free block of memory that can be used by the external environment, e.g., a CPU. Deterministic timing of the memory allocation and deallocation operations is essential for hard real-time systems. The proposed memory manager performs these operations in nearly constant time thanks to a hardware-accelerated max queue, a data structure that continuously provides the largest free block of memory in two clock cycles regardless of the actual number or constellation of existing free blocks. To minimize the overhead of implementing memory management in hardware, the max queue was optimized by developing a new sorting architecture called Rocket-Queue. The Rocket-Queue architecture and the whole memory manager are described in this paper in detail. The memory manager and the Rocket-Queue architecture were verified using a simplified version of UVM, applying billions of randomly generated instructions as test inputs. The Rocket-Queue architecture was synthesized for an Intel Cyclone V FPGA at a 100 MHz clock frequency, and the results show that it consumes 17.06% to 38.67% fewer LUTs than the existing architecture, called Systolic Array. The memory manager, implemented as a coprocessor that provides four custom instructions, was synthesized in 28 nm TSMC HPM technology at a 1 GHz clock frequency and a 0.9 V power supply. The ASIC synthesis results show that the Rocket-Queue based memory manager can occupy up to 24.59% less chip area than the Systolic Array based manager. In terms of total power consumption, the Rocket-Queue based memory manager consumes 15.16% to 42.95% less power.
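To make the worst-fit idea concrete, here is a minimal software sketch in which a binary heap stands in for the max queue. It is only an analogy: Python's `heapq` takes logarithmic time per operation, whereas the abstract describes a hardware queue that delivers the largest block in two clock cycles. The class name, method names, and the absence of block coalescing are all assumptions of this example, not features of the Rocket-Queue design.

```python
import heapq


class WorstFitAllocator:
    """Toy worst-fit allocator: always carve from the largest free block."""

    def __init__(self, total_size):
        # Max-heap emulated with negated sizes: entries are (-size, address).
        self._free = [(-total_size, 0)]

    def largest_free(self):
        """Size of the largest free block (what a max queue would expose)."""
        return -self._free[0][0] if self._free else 0

    def alloc(self, size):
        """Return the start address of a block of `size`, or None if none fits."""
        if self.largest_free() < size:
            return None
        neg_size, addr = heapq.heappop(self._free)
        remainder = -neg_size - size
        if remainder > 0:
            # Put the unused tail of the block back into the queue.
            heapq.heappush(self._free, (-remainder, addr + size))
        return addr

    def free(self, addr, size):
        """Return a block to the free set (no merging of neighbours here)."""
        heapq.heappush(self._free, (-size, addr))
```

A short usage example: after `m = WorstFitAllocator(1024)`, the call `m.alloc(100)` returns address 0 and leaves a single free block of 924 bytes, which is then the candidate for the next request.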


2021 ◽  
Author(s):  
Yaqi Liu ◽  
Jinqiang Zhang ◽  
Jiaoyang Su ◽  
Wei Liu ◽  
Chaowei Fu

Author(s):  
F. Vacondio ◽  
C. Simonneau ◽  
A. Voicila ◽  
E. Dutisseuil ◽  
J.-M. Tanguy ◽  
...  

2005 ◽  
Vol 14 (03) ◽  
pp. 533-551
Author(s):  
JU-HWAN YI ◽  
CHONG-MIN KYUNG

This paper proposes a symbolic reachability analysis method for multiple-clock system design, which is the first approach to deal with both synchronization problems caused by metastability and rate mismatch problems caused by clock frequency mismatches in a single framework. Three methods are described to reproduce problems that occur with multiple-clock system design during reachability analysis: (1) alternate evaluation for a system with two clocks as the baseline model, (2) nondeterministic delayed evaluation (NDDE) to reproduce a synchronization problem, and (3) double evaluation to reproduce a clock frequency mismatch. Experimental results on the ISCAS 89 benchmarks show that average CPU time improves over Clarke's method by factors of 1.29, 55.41, 2.19, and 45.23 when alternate evaluation, double evaluation, alternate evaluation with NDDE, and double evaluation with NDDE are applied, respectively.
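The following toy example may help convey why a rate mismatch exposes states that a strictly alternating two-clock model never reaches. It is an explicit-state search over a tiny FIFO, not the symbolic method of the paper, and the reading of "alternate" and "double" evaluation as tick schedules is an assumption made purely for illustration; all names are hypothetical.

```python
from collections import deque

OVERFLOW = "overflow"  # marker state: a write arrived while the FIFO was full


def reachable(depth, schedules):
    """Explore FIFO occupancies reachable from empty under the given schedules.

    Each schedule is a tuple of ticks: "W" (writer clock) or "R" (reader clock).
    """
    seen = {0}
    work = deque([0])
    while work:
        occupancy = work.popleft()
        for schedule in schedules:
            state = occupancy
            for tick in schedule:
                if tick == "W":
                    state = state + 1 if state < depth else OVERFLOW
                else:
                    state = max(state - 1, 0)
                if state == OVERFLOW:
                    break
            if state not in seen:
                seen.add(state)
                if state != OVERFLOW:
                    work.append(state)
    return seen


# Baseline: the two clocks strictly alternate.
alternate = [("W", "R")]
# Mismatch: the writer may occasionally tick twice per reader tick.
double = [("W", "R"), ("W", "W", "R")]
```

With a FIFO depth of 2, `reachable(2, alternate)` never contains the overflow marker, while `reachable(2, double)` does, because the extra writer tick lets occupancy drift upward.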

