Complexity and Performance of Secure Floating-Point Polynomial Evaluation Protocols

2021 ◽  
pp. 352-369
Author(s):  
Octavian Catrina
2020 ◽  
Vol 157 ◽  
pp. 353-367
Author(s):  
Yong Ma ◽  
Aiming Zhang ◽  
Lele Yang ◽  
Hao Li ◽  
Zhenfeng Zhai ◽  
...  

2017 ◽  
Author(s):  
Sruthikesh Surineni

Floating-point computations produce approximate results, possibly leading to inaccuracy and reproducibility problems. Existing work addresses two issues: first, the design of high-precision floating-point representations, and second, methods to support a trade-off between accuracy and performance in central processing unit (CPU) applications. However, a comprehensive study of the trade-offs between accuracy and performance on modern graphics processing units (GPUs) is missing. This thesis studies the use of different floating-point precisions on GPUs: single and double precision as defined by the IEEE 754 standard, arbitrary precision via the GNU Multiple Precision Arithmetic Library (GMP), and composite floating-point precision, across a variety of synthetic and real-world benchmark applications. First, we analyze the support for single- and double-precision floating-point arithmetic on the considered GPU architectures and characterize the latencies of all floating-point instructions on the GPU. Second, we study the performance/accuracy trade-offs of the different arithmetic precisions for addition, multiplication, division, and the natural exponential function. Third, we analyze the combined use of different arithmetic operations in three benchmark applications characterized by different instruction mixes and arithmetic intensities. Based on this analysis, we designed a novel auto-tuner that selects the arithmetic precision of a GPU program so as to achieve a better performance/accuracy trade-off, depending on the arithmetic operations and math functions used in the program and on the degree of multithreading of the code.
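The "composite floating-point precision" mentioned above refers to representing a value as an unevaluated sum of two machine doubles (the double-double technique). As a minimal sketch of that idea, assuming the standard error-free transformations are used (the abstract does not spell out the exact construction):

```cpp
#include <cstdio>

// Knuth's TwoSum: an error-free transformation with s + e == a + b exactly.
// (Requires strict IEEE semantics; compile without -ffast-math.)
static void two_sum(double a, double b, double &s, double &e) {
    s = a + b;
    double bv = s - a;                  // part of b absorbed into s
    e = (a - (s - bv)) + (b - bv);      // rounding error left behind
}

// Composite ("double-double") value: hi holds the leading bits,
// lo holds the rounding error, giving roughly twice the precision.
struct dd { double hi, lo; };

// Add two composite values (a simplified, sloppy-normalization variant).
static dd dd_add(dd a, dd b) {
    double s, e;
    two_sum(a.hi, b.hi, s, e);
    e += a.lo + b.lo;                   // fold in the low-order words
    dd r;
    two_sum(s, e, r.hi, r.lo);          // renormalize: |lo| <= 0.5 ulp(hi)
    return r;
}

int main() {
    dd x{1.0, 0.0}, y{1e-17, 0.0};      // 1e-17 is below ulp(1.0) for double
    dd z = dd_add(x, y);
    std::printf("hi = %.17g, lo = %.17g\n", z.hi, z.lo);
    // A plain double would give 1.0 + 1e-17 == 1.0; here lo retains 1e-17.
}
```

The low-order word preserves rounding error that a plain double addition would discard, roughly doubling the effective significand at the cost of several extra floating-point operations per addition, which is exactly the kind of accuracy/performance trade-off the thesis quantifies.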


2013 ◽  
Vol 416-417 ◽  
pp. 1147-1151
Author(s):  
Yong Chun Xu ◽  
Zhe Liu ◽  
Jin Yu Guan

Signal preprocessing methods in directional audio systems are mostly based on the Berktay far-field solution. In this paper, the basic principle and performance of the square-root method are analyzed, and a directional audio system based on a floating-point DSP is designed using a fourth-order approximate square-root method. Theoretical simulation and practical testing show the effect to be satisfactory.
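For context, Berktay's far-field solution predicts that the demodulated audible signal is proportional to the second time derivative of the squared envelope of the ultrasonic carrier, so the audio input s(t) is pre-distorted with a square-root envelope, sqrt(1 + m·s(t)). The sketch below uses a fourth-order Taylor expansion of sqrt(1 + x) as one plausible reading of the paper's "fourth-order approximate square-root method"; the paper's actual coefficients may differ:

```cpp
#include <cmath>
#include <cstdio>

// Square-root preprocessing for a parametric (directional) loudspeaker:
// shape the modulation envelope as sqrt(1 + m*s(t)) before modulating
// the ultrasonic carrier. Fourth-order Taylor approximation of
// sqrt(1 + x) about x = 0, evaluated in Horner form:
//   sqrt(1 + x) ~ 1 + x/2 - x^2/8 + x^3/16 - 5x^4/128
static float sqrt1p_poly4(float x) {
    return 1.0f + x * (0.5f + x * (-0.125f + x * (0.0625f - 0.0390625f * x)));
}

int main() {
    const float m = 0.8f;  // modulation index (illustrative value)
    for (float s = -1.0f; s <= 1.0f; s += 0.5f) {
        float x = m * s;
        std::printf("x=%5.2f  poly=%.6f  sqrt=%.6f\n",
                    x, sqrt1p_poly4(x), std::sqrt(1.0f + x));
    }
}
```

In Horner form the polynomial costs four multiply-accumulates per sample, a fixed latency that suits a floating-point DSP better than an iterative square-root routine.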


2009 ◽  
Vol 17 (1-2) ◽  
pp. 199-208 ◽  
Author(s):  
Olaf Lubeck ◽  
Michael Lang ◽  
Ram Srinivasan ◽  
Greg Johnson

The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating-point performance required by high-fidelity scientific simulations. However, data movement within the chip can be a major obstacle to realizing the benefits of the peak floating-point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message-passing model that minimizes data movement. We compare the advantages and disadvantages of this programming model with a previous implementation that used a master–worker threading strategy. We apply a previously validated micro-architecture performance model for the application executing on the Cell/B.E. (based on our previous work on Monte Carlo performance models) that predicts overall CPI (cycles per instruction) and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess the performance of future design parameters for the Cell/B.E. micro-architecture. The methodologies and results have broader implications that extend to multi-core architectures.
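The micro-architecture model described above predicts overall CPI as an ideal issue rate plus per-instruction stall contributions. A minimal sketch of such an additive stall model follows; the stall categories and all numbers are illustrative placeholders, not the paper's measurements:

```cpp
#include <cstdio>

// Additive CPI (cycles-per-instruction) model: overall CPI is the ideal
// issue CPI plus the per-instruction stall cycles contributed by each
// hazard category. Categories and values below are illustrative only.
struct StallComponent { const char *name; double cycles_per_instr; };

int main() {
    const double cpi_ideal = 0.5;   // dual-issue pipeline at best
    const StallComponent stalls[] = {
        {"branch mispredict",      0.10},
        {"dependency stall",       0.25},
        {"DMA / local-store wait", 0.40},
    };
    double cpi = cpi_ideal;
    for (const auto &s : stalls) {
        std::printf("%-22s +%.2f\n", s.name, s.cycles_per_instr);
        cpi += s.cycles_per_instr;
    }
    std::printf("predicted CPI = %.2f\n", cpi);
}
```

Breaking predicted CPI into named stall terms is what lets such a model attribute slowdowns to specific hardware resources and explore future design parameters by re-weighting individual terms.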


Author(s):  
Minghu Jiang ◽  
Georges Gielen ◽  
Lin Wang

In this chapter we investigate the combined effects of quantization and clipping on higher-order function neural networks (HOFNN) and multilayer feedforward neural networks (MLFNN). Statistical models are used to analyze the effects of quantization in a digital implementation. We analyze the performance degradation as a function of the number of fixed-point and floating-point quantization bits under the assumption of different probability distributions for the quantized variables, compare the training performance with and without weight clipping, and derive in detail the effect of the quantization error on forward and backward propagation. Regardless of the distribution of the initial weights, the weight distribution approximates a normal distribution when training with floating-point or high-precision fixed-point quantization; only when the number of quantization bits is very low does the weight distribution cluster toward ±1 under fixed-point quantization. Based on statistical models, and for both on-chip and off-chip training, we establish and analyze the relationships for a true nonlinear neuron between input and output bit resolution, training and quantization methods, the number of network layers, network order, and performance degradation. Our experimental simulation results verify the presented theoretical analysis.
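As an illustration of the fixed-point quantization with weight clipping being analyzed, the sketch below quantizes weights to b bits with clipping to [-1, 1]; the rounding scheme and range are assumptions for illustration, not the authors' exact definitions:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// b-bit fixed-point quantization with weight clipping to [-1, 1]:
// one sign bit and (bits - 1) fraction bits, round to nearest step.
static double quantize_clip(double w, int bits) {
    const double scale = double(1 << (bits - 1));
    w = std::min(1.0, std::max(-1.0, w));          // clip before quantizing
    double q = std::round(w * scale) / scale;      // round to nearest step
    return std::min(1.0, std::max(-1.0, q));       // keep any overflow clipped
}

int main() {
    const double weights[]  = {0.7071, -1.3, 0.0049, 0.5};
    const int bit_widths[]  = {4, 8, 16};
    for (int bits : bit_widths) {
        std::printf("%2d bits:", bits);
        for (double w : weights)
            std::printf(" % .6f", quantize_clip(w, bits));
        std::printf("\n");
    }
}
```

At 4 bits the small weight 0.0049 rounds to zero and -1.3 saturates at -1, the kind of quantization and clipping error whose propagation through forward and backward passes the chapter models statistically.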

