scholarly journals Enhancing the performance of the aggregated bit vector algorithm in network packet classification using GPU

2019 ◽  
Vol 5 ◽  
pp. e185 ◽  
Author(s):  
Mahdi Abbasi ◽  
Razieh Tahouri ◽  
Milad Rafiee

Packet classification is a computationally intensive, highly parallelizable task in many advanced network systems like high-speed routers and firewalls that enable different functionalities through discriminating incoming traffic. Recently, graphics processing units (GPUs) have been exploited as efficient accelerators for parallel implementation of software classifiers. The aggregated bit vector is a highly parallelizable packet classification algorithm. In this work, first we present a parallel kernel for running this algorithm on GPUs. Next, we adapt an asymptotic analysis method which predicts any empirical result of the proposed kernel. Experimental results not only confirm the efficiency of the proposed parallel kernel but also reveal the accuracy of the analysis method in predicting important trends in experimental results.

The Packet classification method plays a significant role in most of the Network systems. These systems categories the incoming packets in various flows and takes suitable action based on the requirements. If the size of the network is vast and complexity will arise to perform the different operations, which affects the network performance and other constraints also. So there is the demand for high-speed packet classifiers to reduce the network complexity and improve the network performance. In this article, The Bit vector Packet classifier (BV-PC) Module is designed to improve the network system performance and overcome the existing limitation of Packet classification approaches on FPGA. The BV-PC Module contains Packet generation Unit (PGU) to receive the valid incoming packets, Memory Unit (MU) to store valid packets, Header Extractor Unit (HEU) extracts the IP Header address information from the Valid packets, The BV-Based Source and Destination Address (BV-SA, BV-DA) unit receives the IP packet header Information and Process with BV based rule set and aggregates the BV-SA and BV-DA outputs, Priority Encoder encodes the Highest priority BV Rule for the generation of Classified output. The BV-PC utilizes <2% Chip area (slices), works at 509.38MHz, and consumed Less 0.103 W of total Power on Artix-7 FPGA. The BV-PC operates with a latency of 5 clock cycles and works at 815.03Mpps throughput. The BV-PC is compared with existing approaches and provides Better improvements in Hardware constraints.


Nanophotonics ◽  
2020 ◽  
Vol 9 (13) ◽  
pp. 4097-4108 ◽  
Author(s):  
Moustafa Ahmed ◽  
Yas Al-Hadeethi ◽  
Ahmed Bakry ◽  
Hamed Dalir ◽  
Volker J. Sorger

AbstractThe technologically-relevant task of feature extraction from data performed in deep-learning systems is routinely accomplished as repeated fast Fourier transforms (FFT) electronically in prevalent domain-specific architectures such as in graphics processing units (GPU). However, electronics systems are limited with respect to power dissipation and delay, due to wire-charging challenges related to interconnect capacitance. Here we present a silicon photonics-based architecture for convolutional neural networks that harnesses the phase property of light to perform FFTs efficiently by executing the convolution as a multiplication in the Fourier-domain. The algorithmic executing time is determined by the time-of-flight of the signal through this photonic reconfigurable passive FFT ‘filter’ circuit and is on the order of 10’s of picosecond short. A sensitivity analysis shows that this optical processor must be thermally phase stabilized corresponding to a few degrees. Furthermore, we find that for a small sample number, the obtainable number of convolutions per {time, power, and chip area) outperforms GPUs by about two orders of magnitude. Lastly, we show that, conceptually, the optical FFT and convolution-processing performance is indeed directly linked to optoelectronic device-level, and improvements in plasmonics, metamaterials or nanophotonics are fueling next generation densely interconnected intelligent photonic circuits with relevance for edge-computing 5G networks by processing tensor operations optically.


2021 ◽  
Vol 38 (2) ◽  
Author(s):  
Nicholas Torres Okita ◽  
Tiago A. Coimbra ◽  
José Ribeiro ◽  
Martin Tygel

ABSTRACT. The usage of graphics processing units is already known as an alternative to traditional multi-core CPU processing, offering faster performance in the order of dozens of times in parallel tasks. Another new computing paradigm is cloud computing usage as a replacement to traditional in-house clusters, enabling seemingly unlimited computation power, no maintenance costs, and cutting-edge technology, dynamically on user demand. Previously those two tools were used to accelerate the estimation of Common Reflection Surface (CRS) traveltime parameters, both in zero-offset and finite-offset domain, delivering very satisfactory results with large time savings from GPU devices alongside cost savings on the cloud. This work extends those results by using GPUs on the cloud to accelerate the Offset Continuation Trajectory (OCT) traveltime parameter estimation. The results have shown that the time and cost savings from GPU devices’ usage are even larger than those seen in the CRS results, being up to fifty times faster and sixty times cheaper. This analysis reaffirms that it is possible to save both time and money when using GPU devices on the cloud and concludes that the larger the data sets are and the more computationally intensive the traveltime operators are, we can see larger improvements.Keywords: cloud computing, GPU, seismic processing. Estendendo o uso de placas gráficas na nuvem para economias em regularização de dados sísmicosRESUMO. O uso de aceleradores gráficos para processamento já é uma alternativa conhecida ao uso de CPUs multi-cores, oferecendo um desempenho na ordem de dezenas de vezes mais rápido em tarefas paralelas. Outro novo paradigma de computação é o uso da nuvem computacional como substituta para os tradicionais clusters internos, possibilitando o uso de um poder computacional aparentemente infinito sem custo de manutenção e com tecnologia de ponta, dinamicamente sob demanda de usuário. Anteriormente essas duas ferramentas foram utilizadas para acelerar a estimação de parâmetros do tempo de trânsito de Common Reflection Surface (CRS), tanto em zero-offset quanto em offsets finitos, obtendo resultados satisfatórios com amplas economias tanto de tempo quanto de dinheiro na nuvem. Este trabalho estende os resultados obtidos anteriormente, desta vez utilizando GPUs na nuvem para acelerar a estimação de parâmetros do tempo de trânsito em Offset Continuation Trajectory (OCT). Os resultados obtidos mostraram que as economias de tempo e dinheiro foram ainda maiores do que aquelas obtidas no CRS, sendo até cinquenta vezes mais rápido e sessenta vezes mais barato. Esta análise reafirma que é possível economizar tanto tempo quanto dinheiro usando GPUs na nuvem, e conclui que quanto maior for o dado e quanto mais computacionalmente intenso for o operador, maiores serão os ganhos de desempenho observados e economias.Palavras-chave: computação em nuvem, GPU, processamento sísmico. 


2014 ◽  
Vol 1077 ◽  
pp. 118-123 ◽  
Author(s):  
Lubomír Klimeš ◽  
Pavel Charvát ◽  
Milan Ostrý ◽  
Josef Stetina

Phase change materials have a wide range of application including thermal energy storage in building structures, solar air collectors, heat storage units and exchangers. Such applications often utilize a commercially produced phase change material enclosed in a thin panel (container) made of aluminum. A parallel 1D heat transfer model of a container with phase change material was developed by means of the control volume and effective heat capacity methods. The parallel implementation in the CUDA computing architecture allows the model for running on graphics processing units which makes the model very fast in comparison to traditional models computed on a single CPU. The paper presents the model implementation and results of computational model benchmarking carried out with the use of high-level and low-level GPUs NVIDIA.


2020 ◽  
Author(s):  
Nairit Sur ◽  
Leonardo Cristella ◽  
Adriano Di Florio ◽  
Vincenzo Mastrapasqua

Abstract The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data and physicists indulge in more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics where the analyses often depend on complex multidimensional unbinned maximum-likelihood fits, with several dozens of free parameters, with an aim to study the internal structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures that are becoming popular toolkits for high energy physicists to meet their computational demands. GooFit is an upcoming open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs that acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be estimated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the firsts of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in the [[EQUATION]] decays and can also be seamlessly adapted for other similar analyses. The GooFit fitter, running on GPUs, shows a remarkable improvement in the computing speed compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.


Sign in / Sign up

Export Citation Format

Share Document