Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms

Internet of Things (IoT) is becoming a new socioeconomic revolution in which data and immediacy are the main ingredients. IoT generates large datasets on a daily basis but it is currently considered as “dark data”, i.e., data generated but never analyzed. The efficient analysis of this data is mandatory to create intelligent applications for the next generation of IoT applications that benefits society. Artificial Intelligence (AI) techniques are very well suited to identifying hidden patterns and correlations in this data deluge. In particular, clustering algorithms are of the utmost importance for performing exploratory data analysis to identify a set (a.k.a., cluster) of similar objects. Clustering algorithms are computationally heavy workloads and require to be executed on high-performance computing clusters, especially to deal with large datasets. This execution on HPC infrastructures is an energy hungry procedure with additional issues, such as high-latency communications or privacy. Edge computing is a paradigm to enable light-weight computations at the edge of the network that has been proposed recently to solve these issues. In this paper, we provide an in-depth analysis of emergent edge computing architectures that include low-power Graphics Processing Units (GPUs) to speed-up these workloads. Our analysis includes performance and power consumption figures of the latest Nvidia’s AGX Xavier to compare the energy-performance ratio of these low-cost platforms with a high-performance cloud-based counterpart version. Three different clustering algorithms (i.e., k-means, Fuzzy Minimals (FM), and Fuzzy C-Means (FCM)) are designed to be optimally executed on edge and cloud platforms, showing a speed-up factor of up to 11× for the GPU code compared to sequential counterpart versions in the edge platforms and energy savings of up to 150% between the edge computing and HPC platforms.

Download Full-text

Constructing a Bioinformatics Platform with Web and Mobile Services Based on NVIDIA Jetson TK1

International Journal of Grid and High Performance Computing ◽

10.4018/ijghpc.2015100105 ◽

2015 ◽

Vol 7 (4) ◽

pp. 57-73 ◽

Cited By ~ 2

Author(s):

Chun-Yuan Lin ◽

Jin Ye ◽

Che-Lun Hung ◽

Chung-Hung Wang ◽

Min Su ◽

...

Keyword(s):

Power Consumption ◽

High Performance Computing ◽

Graphics Processing Units ◽

High Performance ◽

Low Cost ◽

Research Direction ◽

Mobile Services ◽

Performance Ratio ◽

The Cost ◽

Performance Computing

Current high-end graphics processing units (abbreviate to GPUs), such as NVIDIA Tesla, Fermi, Kepler series cards which contain up to thousand cores per-chip, are widely used in the high performance computing fields. These GPU cards (called desktop GPUs) should be installed in personal computers/servers with desktop CPUs; moreover, the cost and power consumption of constructing a high performance computing platform with these desktop CPUs and GPUs are high. NVIDIA releases Tegra K1, called Jetson TK1, which contains 4 ARM Cortex-A15 CPUs and 192 CUDA cores (Kepler GPU) and is an embedded board with low cost, low power consumption and high applicability advantages for embedded applications. NVIDIA Jetson TK1 becomes a new research direction. Hence, in this paper, a bioinformatics platform was constructed based on NVIDIA Jetson TK1. ClustalWtk and MCCtk tools for sequence alignment and compound comparison were designed on this platform, respectively. Moreover, the web and mobile services for these two tools with user friendly interfaces also were provided. The experimental results showed that the cost-performance ratio by NVIDIA Jetson TK1 is higher than that by Intel XEON E5-2650 CPU and NVIDIA Tesla K20m GPU card.

Download Full-text

Design and Operation of a Polygeneration System in Spanish Climate Buildings under an Exergetic Perspective

Energies ◽

10.3390/en14227636 ◽

2021 ◽

Vol 14 (22) ◽

pp. 7636

Author(s):

Ana Picallo-Perez ◽

Jose Maria Sala-Lizarraga

Keyword(s):

Energy Savings ◽

Energy Performance ◽

Point Of View ◽

Office Building ◽

Air Handling Unit ◽

Depth Analysis ◽

Energy And Exergy ◽

Polygeneration System ◽

Almost All

This work defines and analyzes the performance of a polygeneration system in five different locations in Spain to maintain the thermal comfort and air quality of an office building. The facility is based on a chiller and a CHP engine with PV panels that provide almost all the electricity demand of the chiller. According to the energy performance analysis results, the installation working in Bilbao is a full polygeneration system since no electricity needs to be imported from the grid in summer. To quantify the energy savings related to a separated production facility, polygeneration indicators (percentage of savings PES/PExS and equivalent electric efficiency EEE/EExE) have been calculated in energy and exergy terms. The main motivation for using exergy is based on the ambiguity that can arise from the point of view of the First Law. As expected, the exergetic indicators have lower values than the energetic ones. In addition, an in-depth analysis was conducted for the air-handling unit components. The study shows the behavior of components over the year and the efficiency values from both an energy and exergy point of view. From these facts, the need arises to develop methodologies based on exergy.

Download Full-text

Architecture as a strategy for reduced energy consumption? An in-depth analysis of residential practices’ influence on the energy performance of passive houses

Smart and Sustainable Built Environment ◽

10.1108/sasbe-07-2013-0042 ◽

2014 ◽

Vol 3 (3) ◽

pp. 192-206 ◽

Cited By ~ 3

Author(s):

Solvår Wågø ◽

Thomas Berker

Keyword(s):

Energy Consumption ◽

Energy Efficient ◽

Energy Savings ◽

Energy Performance ◽

Content Type ◽

Depth Analysis ◽

New Energy ◽

Passive Houses ◽

The One ◽

Actual Energy Consumption

Purpose – The purpose of this paper is to discuss how architectural solutions may influence residential practice and energy consumption. Design/methodology/approach – The paper is part of a larger study based on qualitative investigations of six energy-efficient housing projects in Norway. Here, the authors examine one of these projects, Løvåshagen in Bergen, the first Norwegian passive house flat building. Based on a combination of 14 interviews with household members and energy consumption data for all flats, the authors show how residential practices influence energy consumption. In the discussion and conclusion, the authors focus on the role of the architecture in these practices. Findings – On the one hand, Løvåshagen reflects a mainstreaming approach to sustainable building, attracting a wide array of different occupants. On the other hand, the specific add-ons that are intended to make the buildings energy efficient require new definitions of comfort and new skills to achieve the promised energy savings. This combination can explain why Løvåshagen, after four years of occupation, has a large variation in actual energy consumption. Practical implications – In designing new energy-efficient housing, greater attention should be paid to the level of end-user control and adaptability, the level of system complexity, and the need for adequate information. An alternative to the mainstreaming approach would be to actively use architecture to influence residential practices towards reduced energy consumption. Originality/value – The use of qualitative methods to analyse quantitative energy data is original and provides promising opportunities to understand the significance of residential practices regarding actual energy consumption.

Download Full-text

Accelerating a Geometrical Approximated PCA Algorithm Using AVX2 and CUDA

Remote Sensing ◽

10.3390/rs12121918 ◽

2020 ◽

Vol 12 (12) ◽

pp. 1918 ◽

Cited By ~ 1

Author(s):

Alina L. Machidon ◽

Octavian M. Machidon ◽

Cătălin B. Ciobanu ◽

Petre L. Ogrutan

Keyword(s):

Energy Consumption ◽

Dimensionality Reduction ◽

Graphics Processing Units ◽

High Performance ◽

Hyperspectral Image ◽

Projection Pursuit ◽

Remote Sensing Data ◽

Principal Component ◽

Large Datasets ◽

Central Processing

Remote sensing data has known an explosive growth in the past decade. This has led to the need for efficient dimensionality reduction techniques, mathematical procedures that transform the high-dimensional data into a meaningful, reduced representation. Projection Pursuit (PP) based algorithms were shown to be efficient solutions for performing dimensionality reduction on large datasets by searching low-dimensional projections of the data where meaningful structures are exposed. However, PP faces computational difficulties in dealing with very large datasets—which are common in hyperspectral imaging, thus raising the challenge for implementing such algorithms using the latest High Performance Computing approaches. In this paper, a PP-based geometrical approximated Principal Component Analysis algorithm (gaPCA) for hyperspectral image analysis is implemented and assessed on multi-core Central Processing Units (CPUs), Graphics Processing Units (GPUs) and multi-core CPUs using Single Instruction, Multiple Data (SIMD) AVX2 (Advanced Vector eXtensions) intrinsics, which provide significant improvements in performance and energy usage over the single-core implementation. Thus, this paper presents a cross-platform and cross-language perspective, having several implementations of the gaPCA algorithm in Matlab, Python, C++ and GPU implementations based on NVIDIA Compute Unified Device Architecture (CUDA). The evaluation of the proposed solutions is performed with respect to the execution time and energy consumption. The experimental evaluation has shown not only the advantage of using CUDA programming in implementing the gaPCA algorithm on a GPU in terms of performance and energy consumption, but also significant benefits in implementing it on the multi-core CPU using AVX2 intrinsics.

Download Full-text

ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization

Bioinformatics ◽

10.1093/bioinformatics/btz211 ◽

2019 ◽

Vol 35 (20) ◽

pp. 3961-3969 ◽

Cited By ~ 9

Author(s):

John Yin ◽

Chao Zhang ◽

Siavash Mirarab

Keyword(s):

Relative Efficiency ◽

Graphics Processing Units ◽

Large Datasets ◽

Supplementary Information ◽

Gene Trees ◽

Multiple Cores ◽

Dynamic Programing ◽

Very Large Datasets ◽

Speed Up ◽

Graphics Processing

Abstract Motivation Evolutionary histories can change from one part of the genome to another. The potential for discordance between the gene trees has motivated the development of summary methods that reconstruct a species tree from an input collection of gene trees. ASTRAL is a widely used summary method and has been able to scale to relatively large datasets. However, the size of genomic datasets is quickly growing. Despite its relative efficiency, the current single-threaded implementation of ASTRAL is falling behind the data growth trends is not able to analyze the largest available datasets in a reasonable time. Results ASTRAL uses dynamic programing and is not trivially parallel. In this paper, we introduce ASTRAL-MP, the first version of ASTRAL that can exploit parallelism and also uses randomization techniques to speed up some of its steps. Importantly, ASTRAL-MP can take advantage of not just multiple CPU cores but also one or several graphics processing units (GPUs). The ASTRAL-MP code scales very well with increasing CPU cores, and its GPU version, implemented in OpenCL, can have up to 158× speedups compared to ASTRAL-III. Using GPUs and multiple cores, ASTRAL-MP is able to analyze datasets with 10 000 species or datasets with more than 100 000 genes in <2 days. Availability and implementation ASTRAL-MP is available at https://github.com/smirarab/ASTRAL/tree/MP. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Heterogeneous Reconstruction of Tracks and Primary Vertices With the CMS Pixel Tracker

Frontiers in Big Data ◽

10.3389/fdata.2020.601728 ◽

2020 ◽

Vol 3 ◽

Author(s):

A. Bocci ◽

V. Innocente ◽

M. Kortelainen ◽

F. Pantaleo ◽

M. Rovere

Keyword(s):

Graphics Processing Units ◽

High Performance ◽

Heterogeneous Computing ◽

Hadron Collider ◽

Proton Proton ◽

Processing Power ◽

Reconstruction Software ◽

Speed Up ◽

Computing Platforms ◽

Graphics Processing

The High-Luminosity upgrade of the Large Hadron Collider (LHC) will see the accelerator reach an instantaneous luminosity of 7 × 1034 cm−2 s−1 with an average pileup of 200 proton-proton collisions. These conditions will pose an unprecedented challenge to the online and offline reconstruction software developed by the experiments. The computational complexity will exceed by far the expected increase in processing power for conventional CPUs, demanding an alternative approach. Industry and High-Performance Computing (HPC) centers are successfully using heterogeneous computing platforms to achieve higher throughput and better energy efficiency by matching each job to the most appropriate architecture. In this paper we will describe the results of a heterogeneous implementation of pixel tracks and vertices reconstruction chain on Graphics Processing Units (GPUs). The framework has been designed and developed to be integrated in the CMS reconstruction software, CMSSW. The speed up achieved by leveraging GPUs allows for more complex algorithms to be executed, obtaining better physics output and a higher throughput.

Download Full-text

A Study on the Operation Performance of a Minienvironment System

Journal of the IEST ◽

10.17764/jiet.49.1.7k8137171w0h5mnp ◽

2006 ◽

Vol 49 (1) ◽

pp. 63-71 ◽

Cited By ~ 3

Author(s):

Tengfang Xu

Keyword(s):

High Performance ◽

Energy Savings ◽

Pressure Control ◽

Energy Performance ◽

Open Loop ◽

Operation Performance ◽

Contamination Control ◽

Starting Point ◽

Quantitative Results ◽

Magnitude Improvement

A minienvironment is a localized environment created by an enclosure to isolate a product or process from the surrounding environment. Minienvironments have been gaining popularity as a means to provide effective containment for critical contamination control. The use of minienvironments can provide several orders of magnitude improvement in particle cleanliness levels, while energy intensity may be shifted from the conventional cleanroom systems to the minienvironments that enclose specific processes. Prior to this study, there was little information available or published to quantify the energy performance of minienvironment systems. This paper will present quantitative results from a recent study of the operation performance of an open-loop minienvironment air system in a ballroom setting, including quantification of operation range, energy performance index, pressure control, electric power density, and airflows. The paper also provides a comparison of the newly measured results from this study with previously measured cleanroom performance. The results can serve as a starting point for identifying areas for energy savings from applying high-performance minienvironments in cleanrooms.

Download Full-text

Accelerating Training Process in Logistic Regression Model using OpenCL Framework

International Journal of Grid and High Performance Computing ◽

10.4018/ijghpc.2017070103 ◽

2017 ◽

Vol 9 (3) ◽

pp. 34-45

Author(s):

Hamada M. Zahera ◽

Ashraf Bahgat El-Sisi

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Graphics Processing Units ◽

High Performance ◽

Message Passing Interface ◽

Logistic Regression Model ◽

Gpu Computing ◽

Large Datasets ◽

Training Process ◽

Training Time

In this paper, the authors propose a new parallel implemented approach on Graphics Processing Units (GPU) for training logistic regression model. Logistic regression has been applied in many machine learning applications to build building predictive models. However, logistic training regularly requires a long time to adapt an accurate prediction model. Researchers have worked out to reduce training time using different technologies such as multi-threading, Multi-core CPUs and Message Passing Interface (MPI). In their study, the authors consider the high computation capabilities of GPU and easy development onto Open Computing Language (OpenCL) framework to execute logistic training process. GPU and OpenCL are the best choice with low cost and high performance for scaling up logistic regression model in handling large datasets. The proposed approach was implement in OpenCL C/C++ and tested by different size datasets on two GPU platforms. The experimental results showed a significant improvement in execution time with large datasets, which is reduced inversely by the available GPU computing units.

Download Full-text