field programmable
Recently Published Documents


TOTAL DOCUMENTS

3369
(FIVE YEARS 937)

H-INDEX

52
(FIVE YEARS 11)

2022 ◽  
Vol 15 (2) ◽  
pp. 1-34
Author(s):  
Tobias Alonso ◽  
Lucian Petrica ◽  
Mario Ruiz ◽  
Jakoba Petri-Koenig ◽  
Yaman Umuroglu ◽  
...  

Customized compute acceleration in the datacenter is key to the wider roll-out of applications based on deep neural network (DNN) inference. In this article, we investigate how to maximize the performance and scalability of field-programmable gate array (FPGA)-based pipeline dataflow DNN inference accelerators (DFAs) automatically on computing infrastructures consisting of multi-die, network-connected FPGAs. We present Elastic-DF, a novel resource partitioning tool and associated FPGA runtime infrastructure that integrates with the DNN compiler FINN. Elastic-DF allocates FPGA resources to DNN layers and layers to individual FPGA dies to maximize the total performance of the multi-FPGA system. In the resulting Elastic-DF mapping, the accelerator may be instantiated multiple times, and each instance may be segmented across multiple FPGAs transparently, whereby the segments communicate peer-to-peer through 100 Gbps Ethernet FPGA infrastructure, without host involvement. When applied to ResNet-50, Elastic-DF provides a 44% latency decrease on Alveo U280. For MobileNetV1 on Alveo U200 and U280, Elastic-DF enables a 78% throughput increase, eliminating the performance difference between these cards and the larger Alveo U250. Elastic-DF also increases operating frequency in all our experiments, on average by over 20%. Elastic-DF therefore increases performance portability between different sizes of FPGA and increases the critical throughput per cost metric of datacenter inference.


2022 ◽  
Vol 15 (2) ◽  
pp. 1-21
Author(s):  
Andrew M. Keller ◽  
Michael J. Wirthlin

Field programmable gate arrays (FPGAs) are used in large numbers in data centers around the world. They are used for cloud computing and computer networking. The most common type of FPGA used in data centers are re-programmable SRAM-based FPGAs. These devices offer potential performance and power consumption savings. A single device also carries a small susceptibility to radiation-induced soft errors, which can lead to unexpected behavior. This article examines the impact of terrestrial radiation on FPGAs in data centers. Results from artificial fault injection and accelerated radiation testing on several data-center-like FPGA applications are compared. A new fault injection scheme provides results that are more similar to radiation testing. Silent data corruption (SDC) is the most commonly observed failure mode followed by FPGA unavailable and host unresponsive. A hypothetical deployment of 100,000 FPGAs in Denver, Colorado, will experience upsets in configuration memory every half-hour on average and SDC failures every 0.5–11 days on average.


2022 ◽  
Vol 15 (2) ◽  
pp. 1-35
Author(s):  
Tom Hogervorst ◽  
Răzvan Nane ◽  
Giacomo Marchiori ◽  
Tong Dong Qiu ◽  
Markus Blatt ◽  
...  

Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because of the utmost importance to simulate increasingly larger computational models, hardware acceleration is receiving increased attention due to its potential to maximize the performance of scientific computing. Field-Programmable Gate Arrays could accelerate scientific computing because of the possibility to fully customize the memory hierarchy important in irregular applications such as iterative linear solvers. In this article, we study the potential of using Field-Programmable Gate Arrays in High-Performance Computing because of the rapid advances in reconfigurable hardware, such as the increase in on-chip memory size, increasing number of logic cells, and the integration of High-Bandwidth Memories on board. To perform this study, we propose a novel Sparse Matrix-Vector multiplication unit and an ILU0 preconditioner tightly integrated with a BiCGStab solver kernel. We integrate the developed preconditioned iterative solver in Flow from the Open Porous Media project, a state-of-the-art open source reservoir simulator. Finally, we perform a thorough evaluation of the FPGA solver kernel in both stand-alone mode and integrated in the reservoir simulator, using the NORNE field, a real-world case reservoir model using a grid with more than 10 5 cells and using three unknowns per cell.


2022 ◽  
Vol 27 (3) ◽  
pp. 1-26
Author(s):  
Mahabub Hasan Mahalat ◽  
Suraj Mandal ◽  
Anindan Mondal ◽  
Bibhash Sen ◽  
Rajat Subhra Chakraborty

Secure authentication of any Internet-of-Things (IoT) device becomes the utmost necessity due to the lack of specifically designed IoT standards and intrinsic vulnerabilities with limited resources and heterogeneous technologies. Despite the suitability of arbiter physically unclonable function (APUF) among other PUF variants for the IoT applications, implementing it on field-programmable gate arrays (FPGAs) is challenging. This work presents the complete characterization of the path changing switch (PCS) 1 based APUF on two different families of FPGA, like Spartan-3E (90 nm CMOS) and Artix-7 (28 nm CMOS). A comprehensive study of the existing tuning concept for programmable delay logic (PDL) based APUF implemented on FPGA is presented, leading to establishment of its practical infeasibility. We investigate the entropy, randomness properties of the PCS based APUF suitable for practical applications, and the effect of temperature variation signifying the adequate tolerance against environmental variation. The XOR composition of PCS based APUF is introduced to boost performance and security. The robustness of the PCS based APUF against machine learning based modeling attack is evaluated, showing similar characteristics as the conventional APUF. Experimental results validate the efficacy of PCS based APUF with a little hardware footprint removing the paucity of lightweight security primitive for IoT.


2022 ◽  
Vol 15 (3) ◽  
pp. 1-25
Author(s):  
Stefan Brennsteiner ◽  
Tughrul Arslan ◽  
John Thompson ◽  
Andrew McCormick

Machine learning in the physical layer of communication systems holds the potential to improve performance and simplify design methodology. Many algorithms have been proposed; however, the model complexity is often unfeasible for real-time deployment. The real-time processing capability of these systems has not been proven yet. In this work, we propose a novel, less complex, fully connected neural network to perform channel estimation and signal detection in an orthogonal frequency division multiplexing system. The memory requirement, which is often the bottleneck for fully connected neural networks, is reduced by ≈ 27 times by applying known compression techniques in a three-step training process. Extensive experiments were performed for pruning and quantizing the weights of the neural network detector. Additionally, Huffman encoding was used on the weights to further reduce memory requirements. Based on this approach, we propose the first field-programmable gate array based, real-time capable neural network accelerator, specifically designed to accelerate the orthogonal frequency division multiplexing detector workload. The accelerator is synthesized for a Xilinx RFSoC field-programmable gate array, uses small-batch processing to increase throughput, efficiently supports branching neural networks, and implements superscalar Huffman decoders.


2022 ◽  
Vol 15 (3) ◽  
pp. 1-29
Author(s):  
Eli Cahill ◽  
Brad Hutchings ◽  
Jeffrey Goeders

Field-Programmable Gate Arrays (FPGAs) are widely used for custom hardware implementations, including in many security-sensitive industries, such as defense, communications, transportation, medical, and more. Compiling source hardware descriptions to FPGA bitstreams requires the use of complex computer-aided design (CAD) tools. These tools are typically proprietary and closed-source, and it is not possible to easily determine that the produced bitstream is equivalent to the source design. In this work, we present various FPGA design flows that leverage pre-synthesizing or pre-implementing parts of the design, combined with open-source synthesis tools, bitstream-to-netlist tools, and commercial equivalence-checking tools, to verify that a produced hardware design is equivalent to the designer’s source design. We evaluate these different design flows on several benchmark circuits and demonstrate that they are effective at detecting malicious modifications made to the design during compilation. We compare our proposed design flows with baseline commercial design flows and measure the overheads to area and runtime.


Author(s):  
Fatimazahraa Assad ◽  
Mohamed Fettach ◽  
Fadwa El Otmani ◽  
Abderrahim Tragha

<span>The secure hash function has become the default choice for information security, especially in applications that require data storing or manipulation. Consequently, optimized implementations of these functions in terms of Throughput or Area are in high demand. In this work we propose a new conception of the secure hash algorithm 3 (SHA-3), which aim to increase the performance of this function by using pipelining, four types of pipelining are proposed two, three, four, and six pipelining stages. This approach allows us to design data paths of SHA-3 with higher Throughput and higher clock frequencies. The design reaches a maximum Throughput of 102.98 Gbps on Virtex 5 and 115.124 Gbps on Virtex 6 in the case of the 6 stages, for 512 bits output length. Although the utilization of the resource increase with the increase of the number of the cores used in each one of the cases. The proposed designs are coded in very high-speed integrated circuits program (VHSIC) hardware description language (VHDL) and implemented in Xilinx Virtex-5 and Virtex-6 A field-programmable gate array (FPGA) devices and compared to existing FPGA implementations.</span>


Author(s):  
Wisal Adnan Al-Musawi ◽  
Wasan A. Wali ◽  
Mohammed Abd Ali Al-Ibadi

<p>This study aims to design a new architecture of the artificial neural networks (ANNs) using the Xilinx system generator (XSG) and its hardware co-simulation equivalent model using field programmable gate array (FPGA) to predict the behavior of Chua’s chaotic system and use it in hiding information. The work proposed consists of two main sections. In the first section, MATLAB R2016a was used to build a 3×4×3 feed forward neural network (FFNN). The training results demonstrate that FFNN training in the Bayesian regulation algorithm is sufficiently accurate to directly implement. The second section demonstrates the hardware implementation of the network with the XSG on the Xilinx artix7 xc7a100t-1csg324 chip. Finally, the message was first encrypted using a dynamic Chua system and then decrypted using ANN’s chaotic dynamics. ANN models were developed to implement hardware in the FPGA system using the IEEE 754 Single precision floating-point format. The ANN design method illustrated can be extended to other chaotic systems in general.</p>


2022 ◽  
Vol 15 (2) ◽  
pp. 1-31
Author(s):  
Joel Mandebi Mbongue ◽  
Danielle Tchuinkou Kwadjo ◽  
Alex Shuping ◽  
Christophe Bobda

Cloud deployments now increasingly exploit Field-Programmable Gate Array (FPGA) accelerators as part of virtual instances. While cloud FPGAs are still essentially single-tenant, the growing demand for efficient hardware acceleration paves the way to FPGA multi-tenancy. It then becomes necessary to explore architectures, design flows, and resource management features that aim at exposing multi-tenant FPGAs to the cloud users. In this article, we discuss a hardware/software architecture that supports provisioning space-shared FPGAs in Kernel-based Virtual Machine (KVM) clouds. The proposed hardware/software architecture introduces an FPGA organization that improves hardware consolidation and support hardware elasticity with minimal data movement overhead. It also relies on VirtIO to decrease communication latency between hardware and software domains. Prototyping the proposed architecture with a Virtex UltraScale+ FPGA demonstrated near specification maximum frequency for on-chip data movement and high throughput in virtual instance access to hardware accelerators. We demonstrate similar performance compared to single-tenant deployment while increasing FPGA utilization, which is one of the goals of virtualization. Overall, our FPGA design achieved about 2× higher maximum frequency than the state of the art and a bandwidth reaching up to 28 Gbps on 32-bit data width.


2022 ◽  
Vol 15 (2) ◽  
pp. 1-29
Author(s):  
Paolo D'Alberto ◽  
Victor Wu ◽  
Aaron Ng ◽  
Rahul Nimaiyar ◽  
Elliott Delaye ◽  
...  

We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Array (FPGAs) and Convolution Neural Networks (CNN). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and a parametric function of the number of multiply-accumulate units, on-chip memory hierarchy, and numerical precision. The design can produce a scale-down processor for embedded devices, replicated to produce more cores for larger devices, or resized to optimize efficiency. On Xilinx Virtex Ultrascale+ VU13P FPGA, we achieve 800 MHz that is close to the Digital Signal Processing maximum frequency and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224× 224 to 2048× 1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and Tensorflow), optimizes them, generates codes, and provides performance estimates. The compiler combines quantization information from the native environment and optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools partitioning a CNN into subgraphs for the division of work to CPU cores and FPGAs. Notice that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks: In summary, we can achieve up to 4 times higher throughput, 3 times better power efficiency than the GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, we provide solutions faster than any previous FPGA-based solutions and comparable to any other top-of-the-shelves solutions.


Sign in / Sign up

Export Citation Format

Share Document