Enhancing the performance of the aggregated bit vector algorithm in network packet classification using GPU

Packet classification is a computationally intensive, highly parallelizable task in many advanced network systems, such as high-speed routers and firewalls, that enable different functionalities by discriminating among incoming traffic. Recently, graphics processing units (GPUs) have been exploited as efficient accelerators for parallel implementations of software classifiers. The aggregated bit vector is a highly parallelizable packet classification algorithm. In this work, we first present a parallel kernel for running this algorithm on GPUs. Next, we adapt an asymptotic analysis method that predicts the empirical results of the proposed kernel. Experimental results not only confirm the efficiency of the proposed parallel kernel but also show that the analysis method accurately predicts important trends in the experimental results.
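
To show why the aggregated bit vector (ABV) scheme parallelizes well, its lookup step can be sketched on a CPU. This is a minimal Python sketch, not the authors' GPU kernel. It assumes each header field has already been mapped to a bit vector of matching rules (bit i set means rule i matches, and a lower index means higher priority). The aggregation factor `AGG` and the function names are hypothetical. Real ABV precomputes the aggregate vectors alongside the field bit vectors, but the sketch computes them on the fly for brevity.

```python
from functools import reduce
from typing import List, Optional

AGG = 4  # hypothetical aggregation factor: rule bits summarized by one aggregate bit


def aggregate(bv: int, n_rules: int) -> int:
    """Aggregate bit j is set if any rule bit in chunk j (bits j*AGG .. j*AGG+AGG-1) is set."""
    mask = (1 << AGG) - 1
    agg = 0
    for j in range((n_rules + AGG - 1) // AGG):
        if (bv >> (j * AGG)) & mask:
            agg |= 1 << j
    return agg


def abv_classify(field_bvs: List[int], n_rules: int) -> Optional[int]:
    """Return the lowest-index (highest-priority) rule matching in every field, or None."""
    mask = (1 << AGG) - 1
    # Step 1: AND the short aggregate vectors to find candidate chunks.
    candidates = reduce(lambda a, b: a & b, (aggregate(bv, n_rules) for bv in field_bvs))
    j = 0
    while candidates:
        if candidates & 1:
            # Step 2: fetch and AND only chunk j of each full bit vector.
            chunk = reduce(lambda a, b: a & b, ((bv >> (j * AGG)) & mask for bv in field_bvs))
            if chunk:
                # The lowest set bit of the first non-empty chunk is the best match.
                return j * AGG + (chunk & -chunk).bit_length() - 1
            # Otherwise this is a false positive: the aggregates overlapped, the rules did not.
        candidates >>= 1
        j += 1
    return None
```

For example, suppose there are eight rules and the two field bit vectors select rules {1, 5, 6} and {2, 5, 7}. Both aggregate vectors then flag chunks 0 and 1. Chunk 0 turns out to be a false positive, and `abv_classify` returns rule 5. The per-chunk ANDs are independent of one another, which is the kind of work a GPU kernel can spread across threads.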

Design of a Low Latency and High Throughput Packet Classification Module on FPGA Platform

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4195.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1468-1474

Keyword(s):

High Speed ◽

Network Performance ◽

Packet Classification ◽

Total Power ◽

Network Systems ◽

Chip Area ◽

Packet Header ◽

Rule Set ◽

Bit Vector ◽

IP Packet

Packet classification plays a significant role in most network systems. These systems categorize incoming packets into various flows and take suitable actions based on the requirements. As networks grow larger, performing the different operations becomes more complex, which affects network performance and other constraints. There is therefore a demand for high-speed packet classifiers that reduce network complexity and improve network performance. In this article, a Bit Vector Packet Classifier (BV-PC) module is designed to improve network system performance and overcome the limitations of existing packet classification approaches on FPGAs. The BV-PC module contains a Packet Generation Unit (PGU) that receives valid incoming packets; a Memory Unit (MU) that stores the valid packets; a Header Extractor Unit (HEU) that extracts the IP header address information from the valid packets; BV-based Source and Destination Address units (BV-SA, BV-DA) that receive the IP header information, process it against the BV-based rule set, and aggregate the BV-SA and BV-DA outputs; and a Priority Encoder that encodes the highest-priority BV rule to generate the classified output. The BV-PC uses less than 2% of the chip area (slices), operates at 509.38 MHz, and consumes less than 0.103 W of total power on an Artix-7 FPGA. It has a latency of 5 clock cycles and achieves a throughput of 815.03 Mpps. Compared with existing approaches, the BV-PC shows improvements in hardware constraints.
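
The bit vector dataflow behind the BV-SA, BV-DA, and priority encoder stages can be modeled in software. This is a minimal Python sketch that models the dataflow only and not the FPGA design. It assumes a hypothetical rule set in which each rule specifies a source and a destination IPv4 prefix, with rules listed from highest to lowest priority. The rule table and function names are illustrative.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical rule set, highest priority first: (source prefix, destination prefix).
RULES: List[Tuple[str, str]] = [
    ("10.0.0.0/8", "192.168.1.0/24"),
    ("10.1.0.0/16", "0.0.0.0/0"),
    ("0.0.0.0/0", "192.168.0.0/16"),
    ("0.0.0.0/0", "0.0.0.0/0"),  # default rule
]


def field_bit_vector(addr: str, prefixes: List[str]) -> int:
    """Bit i is set when the address falls inside rule i's prefix for this field."""
    ip = ipaddress.ip_address(addr)
    bv = 0
    for i, prefix in enumerate(prefixes):
        if ip in ipaddress.ip_network(prefix):
            bv |= 1 << i
    return bv


def priority_encode(bv: int) -> Optional[int]:
    """Index of the lowest set bit, i.e. the highest-priority matching rule."""
    return (bv & -bv).bit_length() - 1 if bv else None


def classify(src: str, dst: str) -> Optional[int]:
    """Model of BV-SA and BV-DA lookups, their aggregation, and the priority encoder."""
    bv_sa = field_bit_vector(src, [r[0] for r in RULES])
    bv_da = field_bit_vector(dst, [r[1] for r in RULES])
    # A rule matches the packet only if it matches in both fields.
    return priority_encode(bv_sa & bv_da)
```

For instance, `classify("10.1.2.3", "8.8.8.8")` sets bits {0, 1, 2, 3} for the source and {1, 3} for the destination, so the priority encoder selects rule 1. In hardware, the per-rule prefix comparisons happen in parallel rather than in a loop.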

Integrated photonic FFT for photonic tensor operations towards efficient and high-speed neural networks

Nanophotonics ◽

10.1515/nanoph-2020-0055 ◽

2020 ◽

Vol 9 (13) ◽

pp. 4097-4108 ◽

Cited By ~ 1

Author(s):

Moustafa Ahmed ◽

Yas Al-Hadeethi ◽

Ahmed Bakry ◽

Hamed Dalir ◽

Volker J. Sorger

Keyword(s):

Neural Networks ◽

Graphics Processing Units ◽

High Speed ◽

Fourier Transforms ◽

Optoelectronic Device ◽

Small Sample ◽

Sample Number ◽

Chip Area ◽

Domain Specific ◽

Graphics Processing

Abstract: The technologically relevant task of feature extraction from data in deep-learning systems is routinely accomplished as repeated fast Fourier transforms (FFTs), performed electronically in prevalent domain-specific architectures such as graphics processing units (GPUs). However, electronic systems are limited in power dissipation and delay because of wire-charging challenges related to interconnect capacitance. Here we present a silicon photonics-based architecture for convolutional neural networks. It harnesses the phase property of light to perform FFTs efficiently by executing the convolution as a multiplication in the Fourier domain. The algorithmic execution time is determined by the time of flight of the signal through this photonic reconfigurable passive FFT ‘filter’ circuit and is on the order of tens of picoseconds. A sensitivity analysis shows that this optical processor must be thermally phase-stabilized to within a few degrees. Furthermore, we find that for a small sample number, the obtainable number of convolutions per unit time, power, and chip area outperforms GPUs by about two orders of magnitude. Lastly, we show that, conceptually, the optical FFT and convolution-processing performance is directly linked to optoelectronic device-level performance. Improvements in plasmonics, metamaterials, or nanophotonics are fueling next-generation densely interconnected intelligent photonic circuits, with relevance for edge-computing 5G networks that process tensor operations optically.
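
The principle the photonic circuit exploits, computing a convolution as a pointwise multiplication in the Fourier domain, can be checked numerically. The following is a minimal pure-Python sketch that uses a textbook recursive radix-2 FFT, an electronic stand-in chosen for illustration rather than the photonic design. The helper names are hypothetical.

```python
import cmath
from typing import List


def fft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two. Unnormalized."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(a: List[float], b: List[float]) -> List[float]:
    """Linear convolution via the convolution theorem: conv(a, b) = IFFT(FFT(a) * FFT(b))."""
    m = len(a) + len(b) - 1
    n = 1
    while n < m:
        n *= 2  # zero-pad to a power of two so circular convolution equals linear convolution
    fa = fft([complex(v) for v in a] + [0j] * (n - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (n - len(b)))
    prod = [u * v for u, v in zip(fa, fb)]  # the Fourier-domain multiplication
    res = fft(prod, inverse=True)
    return [(v / n).real for v in res[:m]]  # normalize the inverse transform


def direct_convolve(a: List[float], b: List[float]) -> List[float]:
    """Reference O(len(a) * len(b)) convolution for comparison."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out
```

Comparing `fft_convolve([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.25])` with `direct_convolve` on the same inputs gives results that agree to within floating-point rounding. The photonic approach aims to perform the transform and the multiplication step in the optical domain instead.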

Multi-Layer Packet Classification with Graphics Processing Units

Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies - CoNEXT '14 ◽

10.1145/2674005.2674990 ◽

2014 ◽

Cited By ~ 15

Author(s):

Matteo Varvello ◽

Rafael Laufer ◽

Feixiong Zhang ◽

T.V. Lakshman

Keyword(s):

Graphics Processing Units ◽

Packet Classification ◽

Graphics Processing

Extending the usage of graphics processing units on the cloud for cost savings on seismic data regularization

Brazilian Journal of Geophysics ◽

10.22564/rbgf.v38i2.2048 ◽

2021 ◽

Vol 38 (2) ◽

Author(s):

Nicholas Torres Okita ◽

Tiago A. Coimbra ◽

José Ribeiro ◽

Martin Tygel

Keyword(s):

Cloud Computing ◽

Graphics Processing Units ◽

Cost Savings ◽

Data Sets ◽

Computing Paradigm ◽

Common Reflection Surface ◽

User Demand ◽

Computationally Intensive ◽

Zero Offset ◽

Graphics Processing

ABSTRACT. Graphics processing units are already established as an alternative to traditional multi-core CPU processing, offering performance dozens of times faster on parallel tasks. Another new computing paradigm is the use of cloud computing as a replacement for traditional in-house clusters. The cloud provides seemingly unlimited computing power, no maintenance costs, and cutting-edge technology, dynamically on user demand. These two tools were previously used to accelerate the estimation of Common Reflection Surface (CRS) traveltime parameters, in both the zero-offset and finite-offset domains. Those studies delivered very satisfactory results, with large time savings from GPU devices alongside cost savings on the cloud. This work extends those results by using GPUs on the cloud to accelerate the Offset Continuation Trajectory (OCT) traveltime parameter estimation. The results show that the time and cost savings from using GPU devices are even larger than those seen in the CRS results: up to fifty times faster and sixty times cheaper. This analysis reaffirms that it is possible to save both time and money by using GPU devices on the cloud. It also concludes that the larger the data sets and the more computationally intensive the traveltime operators, the larger the improvements. Keywords: cloud computing, GPU, seismic processing.
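
Under per-hour billing, time savings translate into cost savings because the cost of a run is the instance's hourly price multiplied by the run time. The following is a minimal sketch of that arithmetic. The pricing model is simplified, and the prices and run times used below are hypothetical, not taken from the paper.

```python
from typing import Tuple


def run_cost(price_per_hour: float, hours: float) -> float:
    """Cost of one run under simple per-hour billing (an assumed pricing model)."""
    return price_per_hour * hours


def gpu_savings(cpu_price: float, cpu_hours: float,
                gpu_price: float, gpu_hours: float) -> Tuple[float, float]:
    """Return (speedup, cost reduction factor) of a GPU run relative to a CPU run.

    The cost reduction factor equals speedup * (cpu_price / gpu_price), so a GPU
    instance that is pricier per hour still saves money whenever its speedup
    exceeds the price ratio gpu_price / cpu_price.
    """
    speedup = cpu_hours / gpu_hours
    cost_factor = run_cost(cpu_price, cpu_hours) / run_cost(gpu_price, gpu_hours)
    return speedup, cost_factor
```

With hypothetical numbers, compare a 40-hour CPU run at 1.00 per hour with a 1-hour GPU run at 4.00 per hour. `gpu_savings(1.0, 40.0, 4.0, 1.0)` returns `(40.0, 10.0)`: a 40x speedup but only a 10x cost reduction, because the GPU instance costs four times as much per hour.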

End-to-End High Speed Forward Error Correction Using Graphics Processing Units

Lecture Notes in Electrical Engineering - Mobile, Ubiquitous, and Intelligent Computing ◽

10.1007/978-3-642-40675-1_8 ◽

2014 ◽

pp. 47-53

Author(s):

Md Shohidul Islam ◽

Jong-Myon Kim

Keyword(s):

Error Correction ◽

Graphics Processing Units ◽

High Speed ◽

Forward Error Correction ◽

End To End ◽

Forward Error ◽

Graphics Processing

High-Speed Nonlinear Finite Element Analysis for Surgical Simulation Using Graphics Processing Units

IEEE Transactions on Medical Imaging ◽

10.1109/tmi.2007.913112 ◽

2008 ◽

Vol 27 (5) ◽

pp. 650-663 ◽

Cited By ~ 109

Author(s):

Z.A. Taylor ◽

M. Cheng ◽

S. Ourselin

Keyword(s):

Finite Element Analysis ◽

Finite Element ◽

Graphics Processing Units ◽

High Speed ◽

Surgical Simulation ◽

Nonlinear Finite Element Analysis ◽

Nonlinear Finite Element ◽

Element Analysis ◽

Graphics Processing

Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration

Journal of Biomedical Optics ◽

10.1117/1.3041496 ◽

2008 ◽

Vol 13 (6) ◽

pp. 060504 ◽

Cited By ~ 232

Author(s):

Erik Alerstam ◽

Tomas Svensson ◽

Stefan Andersson-Engels

Keyword(s):

Monte Carlo Simulation ◽

Monte Carlo ◽

Parallel Computing ◽

Graphics Processing Units ◽

High Speed ◽

Photon Migration ◽

Graphics Processing

Parallel Heat Transfer Model of a Panel with Phase Change Material for Thermal Storage Applications Computed on Graphics Processing Units

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1077.118 ◽

2014 ◽

Vol 1077 ◽

pp. 118-123 ◽

Cited By ~ 1

Author(s):

Lubomír Klimeš ◽

Pavel Charvát ◽

Milan Ostrý ◽

Josef Stetina

Keyword(s):

Heat Transfer ◽

Phase Change ◽

Phase Change Material ◽

Graphics Processing Units ◽

Parallel Implementation ◽

Heat Transfer Model ◽

Transfer Model ◽

Wide Range ◽

Graphics Processing ◽

Change Material

Phase change materials have a wide range of applications, including thermal energy storage in building structures, solar air collectors, heat storage units, and heat exchangers. Such applications often use a commercially produced phase change material enclosed in a thin aluminum panel (container). A parallel 1D heat transfer model of a container with phase change material was developed using the control volume and effective heat capacity methods. The parallel implementation in the CUDA computing architecture allows the model to run on graphics processing units, which makes it much faster than traditional models computed on a single CPU. The paper presents the model implementation and the results of computational benchmarking carried out on high-end and low-end NVIDIA GPUs.
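
The control volume and effective heat capacity methods named above can be sketched as an explicit 1D scheme. This is a minimal Python sketch under stated assumptions. The material properties, the linear melting range, the fixed-temperature left face, and the insulated right face are all hypothetical choices for illustration and do not describe the paper's panel. Each control volume's update depends only on its own and its neighbours' previous temperatures. That independence is what allows a CUDA implementation to assign, for example, one thread per control volume.

```python
from typing import List

# Hypothetical, roughly paraffin-like properties (illustrative only, not the paper's data).
RHO = 800.0              # density, kg/m^3
K = 0.2                  # thermal conductivity, W/(m K)
C_SENSIBLE = 2000.0      # specific heat outside the melting range, J/(kg K)
LATENT = 200e3           # latent heat of fusion, J/kg
T_M1, T_M2 = 26.0, 28.0  # assumed melting range, deg C


def c_eff(t: float) -> float:
    """Effective heat capacity: latent heat spread uniformly over the melting range."""
    if T_M1 <= t <= T_M2:
        return C_SENSIBLE + LATENT / (T_M2 - T_M1)
    return C_SENSIBLE


def step(temps: List[float], dx: float, dt: float, t_left: float) -> List[float]:
    """One explicit control-volume time step.

    The left face is held at t_left (half a cell from the first node) and the right
    face is insulated. Each cell reads only previous-step temperatures, so the loop
    body could run as one GPU thread per control volume.
    """
    n = len(temps)
    new = temps[:]
    for i in range(n):
        if i == 0:
            q_in = K * (t_left - temps[0]) / (dx / 2)  # flux through the heated face
        else:
            q_in = K * (temps[i - 1] - temps[i]) / dx
        q_out = 0.0 if i == n - 1 else K * (temps[i] - temps[i + 1]) / dx
        # Energy balance of the control volume: rho * c_eff * dx * dT/dt = q_in - q_out
        new[i] = temps[i] + dt * (q_in - q_out) / (RHO * c_eff(temps[i]) * dx)
    return new
```

For example, ten 1 mm control volumes starting at 20 °C can be advanced with `temps = step(temps, dx=1e-3, dt=0.1, t_left=40.0)` inside a loop. The time step must respect the explicit stability limit, which for the boundary cell is dt ≤ ρc·dx²/(3k) ≈ 2.7 s with these values. A much smaller step is used here so that no cell jumps across the 2 °C melting range in a single step, which would skip the latent heat.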

Parallel implementation of the discrete wavelet transform on graphics processing units

2014 1st International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) ◽

10.1109/atsip.2014.6834587 ◽

2014 ◽

Author(s):

Randa Khemiri ◽

Fatma Sayadi ◽

Taoufik Saidani ◽

Marwa Chouchene ◽

Haythem Bahri ◽

...

Keyword(s):

Wavelet Transform ◽

Discrete Wavelet Transform ◽

Graphics Processing Units ◽

Parallel Implementation ◽

Discrete Wavelet ◽

Graphics Processing

A GPU based multidimensional amplitude analysis to search for tetraquark candidates

10.21203/rs.3.rs-51185/v3 ◽

2020 ◽

Author(s):

Nairit Sur ◽

Leonardo Cristella ◽

Adriano Di Florio ◽

Vincenzo Mastrapasqua

Keyword(s):

Graphics Processing Units ◽

High Energy Physics ◽

High Energy ◽

Amplitude Analysis ◽

Hadron Spectroscopy ◽

Multiple Cores ◽

Analysis Strategies ◽

Computationally Intensive ◽

Computational Resources ◽

Graphics Processing

Abstract: The demand for computational resources is steadily increasing in experimental high energy physics, as current collider experiments continue to accumulate huge amounts of data and physicists pursue more complex and ambitious analysis strategies. This is especially true in hadron spectroscopy and flavour physics. Analyses in these fields often depend on complex multidimensional unbinned maximum-likelihood fits with several dozen free parameters, with the aim of studying the internal structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures and are becoming popular tools for high energy physicists to meet their computational demands. GooFit is an upcoming open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs. It acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be estimated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the first of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in the [[EQUATION]] decays and can also be seamlessly adapted for other similar analyses. The GooFit fitter, running on GPUs, shows a remarkable improvement in computing speed compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.
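
An unbinned maximum-likelihood fit minimizes a negative log-likelihood summed over individual events. The per-event PDF evaluations are independent terms, and this is the part a GooFit-style tool distributes across GPU cores. The following is a minimal one-parameter Python sketch. It assumes a hypothetical exponential decay-time PDF and uses a golden-section search in place of MINUIT. It illustrates the structure of such a fit, not the paper's four-dimensional amplitude model.

```python
import math
import random
from typing import List


def nll(tau: float, times: List[float]) -> float:
    """Unbinned negative log-likelihood for the PDF f(t; tau) = exp(-t / tau) / tau."""
    # One independent term per event: in GooFit-style tools, this per-event
    # evaluation is the work spread across GPU cores before the sum is reduced.
    return sum(t / tau + math.log(tau) for t in times)


def fit_tau(times: List[float], lo: float = 1e-3, hi: float = 100.0,
            tol: float = 1e-9) -> float:
    """Minimize the NLL over tau with a golden-section search (a stand-in for MINUIT)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if nll(c, times) < nll(d, times):
            b, d = d, c          # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d          # minimum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2.0


if __name__ == "__main__":
    random.seed(1)
    # Toy data: 10,000 decay times drawn with a true lifetime of 2.0 (arbitrary units).
    times = [random.expovariate(1.0 / 2.0) for _ in range(10_000)]
    # For this PDF the maximum-likelihood estimate is the sample mean, a handy cross-check.
    print(fit_tau(times), sum(times) / len(times))
```

The one-parameter case has a closed-form answer, which makes the sketch easy to verify. Real amplitude fits with dozens of parameters have no such shortcut, so repeated NLL evaluation dominates the run time.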
