Complexity and Performance of Secure Floating-Point Polynomial Evaluation Protocols

2021 ◽  
pp. 352-369
Author(s):  
Octavian Catrina
2020 ◽  
Vol 157 ◽  
pp. 353-367
Author(s):  
Yong Ma ◽  
Aiming Zhang ◽  
Lele Yang ◽  
Hao Li ◽  
Zhenfeng Zhai ◽  
...  

2017 ◽  
Author(s):  
Sruthikesh Surineni

Floating-point computations produce approximate results, possibly leading to inaccuracy and reproducibility problems. Existing work addresses two issues: first, the design of high-precision floating-point representations, and second, methods to support a trade-off between accuracy and performance in central processing unit (CPU) applications. However, a comprehensive study of the trade-offs between accuracy and performance on modern graphics processing units (GPUs) is missing. This thesis studies the use of different floating-point precisions on GPUs: single and double precision as defined by the IEEE 754 standard, arbitrary precision via the GNU Multiple Precision Arithmetic Library (GMP), and composite floating-point precision, across a variety of synthetic and real-world benchmark applications. First, we analyze the support for single- and double-precision floating-point arithmetic on the considered GPU architectures and characterize the latencies of all floating-point instructions on the GPU. Second, we study the performance/accuracy trade-offs of the different arithmetic precisions for addition, multiplication, division, and the natural exponential function. Third, we analyze the combined use of different arithmetic operations in three benchmark applications characterized by different instruction mixes and arithmetic intensities. Based on this analysis, we designed a novel auto-tuner that selects the arithmetic precision of a GPU program so as to achieve a better performance/accuracy trade-off, depending on the arithmetic operations and math functions used in the program and on the degree of multithreading of the code.
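The "composite floating-point precision" mentioned above refers to representing a value as an unevaluated sum of two machine doubles (the double-double technique). As a minimal sketch of that idea, assuming the standard error-free transformations are used (the abstract does not spell out the exact construction):

```cpp
#include <cstdio>

// Knuth's TwoSum: an error-free transformation with s + e == a + b exactly.
// (Requires strict IEEE semantics; compile without -ffast-math.)
static void two_sum(double a, double b, double &s, double &e) {
    s = a + b;
    double bv = s - a;                  // part of b absorbed into s
    e = (a - (s - bv)) + (b - bv);      // rounding error left behind
}

// Composite ("double-double") value: hi holds the leading bits,
// lo holds the rounding error, giving roughly twice the precision.
struct dd { double hi, lo; };

// Add two composite values (a simplified, sloppy-normalization variant).
static dd dd_add(dd a, dd b) {
    double s, e;
    two_sum(a.hi, b.hi, s, e);
    e += a.lo + b.lo;                   // fold in the low-order words
    dd r;
    two_sum(s, e, r.hi, r.lo);          // renormalize: |lo| <= 0.5 ulp(hi)
    return r;
}

int main() {
    dd x{1.0, 0.0}, y{1e-17, 0.0};      // 1e-17 is below ulp(1.0) for double
    dd z = dd_add(x, y);
    std::printf("hi = %.17g, lo = %.17g\n", z.hi, z.lo);
    // A plain double would give 1.0 + 1e-17 == 1.0; here lo retains 1e-17.
}
```

The low-order word preserves rounding error that a plain double addition would discard, roughly doubling the effective significand at the cost of several extra floating-point operations per addition, which is exactly the kind of accuracy/performance trade-off the thesis quantifies.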


2013 ◽  
Vol 416-417 ◽  
pp. 1147-1151
Author(s):  
Yong Chun Xu ◽  
Zhe Liu ◽  
Jin Yu Guan

Signal preprocessing methods in directional audio systems are mostly based on the Berktay far-field solution. In this paper, the basic principle and performance of the square-root method are analyzed, and a directional audio system based on a floating-point DSP is designed using a fourth-order approximate square-root method. Theoretical simulation and practical testing show the effect to be satisfactory.
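For context, Berktay's far-field solution predicts that the demodulated audible signal is proportional to the second time derivative of the squared envelope of the ultrasonic carrier, so the audio input s(t) is pre-distorted with a square-root envelope, sqrt(1 + m·s(t)). The sketch below uses a fourth-order Taylor expansion of sqrt(1 + x) as one plausible reading of the paper's "fourth-order approximate square-root method"; the paper's actual coefficients may differ:

```cpp
#include <cmath>
#include <cstdio>

// Square-root preprocessing for a parametric (directional) loudspeaker:
// shape the modulation envelope as sqrt(1 + m*s(t)) before modulating
// the ultrasonic carrier. Fourth-order Taylor approximation of
// sqrt(1 + x) about x = 0, evaluated in Horner form:
//   sqrt(1 + x) ~ 1 + x/2 - x^2/8 + x^3/16 - 5x^4/128
static float sqrt1p_poly4(float x) {
    return 1.0f + x * (0.5f + x * (-0.125f + x * (0.0625f - 0.0390625f * x)));
}

int main() {
    const float m = 0.8f;  // modulation index (illustrative value)
    for (float s = -1.0f; s <= 1.0f; s += 0.5f) {
        float x = m * s;
        std::printf("x=%5.2f  poly=%.6f  sqrt=%.6f\n",
                    x, sqrt1p_poly4(x), std::sqrt(1.0f + x));
    }
}
```

In Horner form the polynomial costs four multiply-accumulates per sample, a fixed latency that suits a floating-point DSP better than an iterative square-root routine.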


2009 ◽  
Vol 17 (1-2) ◽  
pp. 199-208 ◽  
Author(s):  
Olaf Lubeck ◽  
Michael Lang ◽  
Ram Srinivasan ◽  
Greg Johnson

The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential for the demanding floating-point performance required by high-fidelity scientific simulations. However, data movement within the chip can be a major obstacle to realizing the benefits of the peak floating-point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message-passing model that minimizes data movement. We compare the advantages and disadvantages of this programming model with a previous implementation that used a master–worker threading strategy. We apply a previously validated micro-architecture performance model for the application executing on the Cell/B.E. (based on our previous work on Monte Carlo performance models) that predicts overall CPI (cycles per instruction) and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess the performance of future design parameters for the Cell/B.E. micro-architecture. The methodologies and results have broader implications that extend to multi-core architectures.
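The micro-architecture model described above predicts overall CPI as an ideal issue rate plus per-instruction stall contributions. A minimal sketch of such an additive stall model follows; the stall categories and all numbers are illustrative placeholders, not the paper's measurements:

```cpp
#include <cstdio>

// Additive CPI (cycles-per-instruction) model: overall CPI is the ideal
// issue CPI plus the per-instruction stall cycles contributed by each
// hazard category. Categories and values below are illustrative only.
struct StallComponent { const char *name; double cycles_per_instr; };

int main() {
    const double cpi_ideal = 0.5;   // dual-issue pipeline at best
    const StallComponent stalls[] = {
        {"branch mispredict",      0.10},
        {"dependency stall",       0.25},
        {"DMA / local-store wait", 0.40},
    };
    double cpi = cpi_ideal;
    for (const auto &s : stalls) {
        std::printf("%-22s +%.2f\n", s.name, s.cycles_per_instr);
        cpi += s.cycles_per_instr;
    }
    std::printf("predicted CPI = %.2f\n", cpi);
}
```

Breaking predicted CPI into named stall terms is what lets such a model attribute slowdowns to specific hardware resources and explore future design parameters by re-weighting individual terms.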


Author(s):  
Minghu Jiang ◽  
Georges Gielen ◽  
Lin Wang

In this chapter we investigate the combined effects of quantization and clipping on higher-order function neural networks (HOFNN) and multilayer feedforward neural networks (MLFNN). Statistical models are used to analyze the effects of quantization in a digital implementation. We analyze the performance degradation as a function of the number of fixed-point and floating-point quantization bits under the assumption of different probability distributions for the quantized variables, compare the training performance with and without weight clipping, and derive in detail the effect of the quantization error on forward and backward propagation. Regardless of the distribution of the initial weights, the weight distribution approximates a normal distribution when training with floating-point or high-precision fixed-point quantization; only when the number of quantization bits is very low does the weight distribution cluster toward ±1 under fixed-point quantization. Based on statistical models, and for both on-chip and off-chip training, we establish and analyze the relationships for a true nonlinear neuron between input and output bit resolution, training and quantization methods, the number of network layers, network order, and performance degradation. Our experimental simulation results verify the presented theoretical analysis.
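As an illustration of the fixed-point quantization with weight clipping being analyzed, the sketch below quantizes weights to b bits with clipping to [-1, 1]; the rounding scheme and range are assumptions for illustration, not the authors' exact definitions:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// b-bit fixed-point quantization with weight clipping to [-1, 1]:
// one sign bit and (bits - 1) fraction bits, round to nearest step.
static double quantize_clip(double w, int bits) {
    const double scale = double(1 << (bits - 1));
    w = std::min(1.0, std::max(-1.0, w));          // clip before quantizing
    double q = std::round(w * scale) / scale;      // round to nearest step
    return std::min(1.0, std::max(-1.0, q));       // keep any overflow clipped
}

int main() {
    const double weights[]  = {0.7071, -1.3, 0.0049, 0.5};
    const int bit_widths[]  = {4, 8, 16};
    for (int bits : bit_widths) {
        std::printf("%2d bits:", bits);
        for (double w : weights)
            std::printf(" % .6f", quantize_clip(w, bits));
        std::printf("\n");
    }
}
```

At 4 bits the small weight 0.0049 rounds to zero and -1.3 saturates at -1, the kind of quantization and clipping error whose propagation through forward and backward passes the chapter models statistically.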

