Organizational Design of Big Data and Analytics Teams

2018 ◽  
Vol 5 (3) ◽  
pp. 132-149 ◽  
Author(s):  
Lennart Hammerström

Abstract Although many would argue that the most important factor for the success of a big data project is the process of analyzing the data, it is more important to staff, structure and organize the participants so as to ensure efficient collaboration within the team and effective use of the tool sets, the relevant applications and a customized flow of information. The main challenges of big data projects originate from the number of people who are involved and need to collaborate, the need for advanced and specialized education, the approach to the analytical problem (which in many cases is undefined), the data set itself (structured or unstructured) and the required hardware and software (such as analysis software or self-learning algorithms). Today there is neither an organizational framework nor are there overarching guidelines available for the creation of a high-performance analytics team and its organizational integration. This paper builds upon (a) the organizational design of a team for a big data project, (b) the relevant roles and competencies (such as programming or communication skills) of the team members and (c) the form in which they are connected and managed.

2021 ◽  
Author(s):  
Li Guochao ◽  
Zhigang Liu ◽  
Jie Lu ◽  
Honggen Zhou ◽  
Li Sun

Abstract Groove is a key structure of high-performance integral cutting tools. It has to be manufactured on a 5-axis grinding machine due to its complex spatial geometry and hard materials. The crucial manufacturing parameters (CMP) are the grinding wheel positions and geometries. However, solving the CMP for a designed groove is a challenging problem. Traditional trial-and-error or analytical methods have drawbacks such as being time-consuming, of limited applicability and of low accuracy. In this study, the problem is translated into a multiple-output regression model of groove manufacture (MORGM) based on big data technology and AI algorithms. The inputs are 34 groove geometry features and the outputs are 5 CMP. Firstly, two groove machining big data sets with different ranges are established, each of which includes 46,656 records; they are used as the data resource for the MORGM. Secondly, 7 AI algorithms, including linear regression, k-nearest-neighbor regression, decision trees, random forest regression, support vector regression and ANN algorithms, are discussed to build the model. Then, 28 experiments are carried out to test the big data sets and algorithms. Finally, the best MORGM is built with the ANN algorithm and the big data set with the larger range. The results show that the CMP can be calculated accurately and conveniently by the built MORGM.
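The abstract does not include an implementation, but the multiple-output regression setup it describes (34 geometry features in, 5 CMP out, with an ANN as the best model) can be sketched with scikit-learn's MLPRegressor, which handles multi-output targets natively. The synthetic data, layer sizes and other hyperparameters below are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a MORGM-style multiple-output regression model:
# 34 geometry features in, 5 crucial manufacturing parameters (CMP) out.
# Data and hyperparameters are illustrative assumptions only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.random((46656, 34))   # stand-in for the 34 groove geometry features
y = rng.random((46656, 5))    # stand-in for the 5 CMP targets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)
model = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
model.fit(scaler.transform(X_train), y_train)   # MLPRegressor supports multi-output targets

pred = model.predict(scaler.transform(X_test))
print("MAE per CMP:", mean_absolute_error(y_test, pred, multioutput="raw_values"))
```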


2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Georgios Vranopoulos ◽  
Nathan Clarke ◽  
Shirley Atkinson

Abstract The creation of new knowledge from manipulating and analysing existing knowledge is one of the primary objectives of any cognitive system. Most of the effort in Big Data research has been focussed upon Volume and Velocity, while Variety, "the ugly duckling" of Big Data, is often neglected and difficult to solve. A principal challenge with Variety is being able to understand and comprehend the data. This paper proposes and evaluates an automated approach for metadata identification and enrichment in describing Big Data. The paper focuses on the use of self-learning systems that will enable automatic compliance of data against regulatory requirements, along with the capability of generating valuable and readily usable metadata for data classification. Two experiments, on data confidentiality and data identification, were conducted to evaluate the feasibility of the approach. The focus of the experiments was to confirm that repetitive manual tasks can be automated, reducing the effort a Data Scientist spends on data identification and thereby allowing more focus on the extraction and analysis of the data itself. The datasets used originated from private/business and public/governmental sources and exhibited diverse characteristics in terms of the number and size of files. The experimental work confirmed that: (a) the use of algorithmic techniques contributed to a substantial decrease in false positives regarding the identification of confidential information; (b) a fraction of a data set, combined with statistical analysis and supervised learning, is sufficient to identify the structure of the information within it. With this approach, the issues of understanding the nature of data can be mitigated, enabling a greater focus on meaningful interpretation of the heterogeneous data.
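The paper's own pipeline is not reproduced here; as a rough, hedged illustration of the sampling-plus-supervised-learning idea behind finding (b), the sketch below samples a small fraction of a tabular file and classifies each column from cheap string statistics. The features, labels, file names and sampling rate are assumptions for illustration only.

```python
# Illustrative sketch only: sample a fraction of a file and classify its columns
# with a simple supervised model. Features and labels are assumptions, not the
# approach evaluated in the paper.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def column_features(series: pd.Series) -> list:
    """Cheap per-column statistics computed on a sampled column."""
    s = series.dropna().astype(str)
    n = max(len(s), 1)
    return [
        s.str.len().mean() if len(s) else 0.0,             # mean string length
        s.str.contains(r"\d").mean() if len(s) else 0.0,   # fraction containing digits
        s.str.contains("@").mean() if len(s) else 0.0,     # fraction containing '@'
        s.nunique() / n,                                   # uniqueness ratio
    ]

# Small labelled example: columns whose sensitivity is already known.
train = pd.DataFrame({
    "email":   ["a@x.com", "b@y.org", "c@z.net"],
    "id":      ["1001", "1002", "1003"],
    "comment": ["ok", "needs review", "fine"],
})
X_train = [column_features(train[c]) for c in train.columns]
y_train = ["confidential", "confidential", "public"]   # assumed labels
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Classify the columns of a new file using only a 5% row sample.
sample = pd.read_csv("unknown_dataset.csv").sample(frac=0.05, random_state=0)  # hypothetical file
for col in sample.columns:
    print(col, "->", clf.predict([column_features(sample[col])])[0])
```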


Author(s):  
Monika ◽  
Pardeep Kumar ◽  
Sanjay Tyagi

In a Cloud computing environment, QoS (Quality of Service) and cost are the key elements to be taken care of. Today, in the era of big data, data must be handled properly while satisfying each request. In such cases, when handling requests that involve large volumes of data or scientific applications, the flow of information must be sustained. In this paper, a brief introduction to workflow scheduling is given and a detailed survey of various scheduling algorithms is performed using various parameters.


Author(s):  
C. Sauer ◽  
F. Bagusat ◽  
M.-L. Ruiz-Ripoll ◽  
C. Roller ◽  
M. Sauer ◽  
...  

Abstract This work aims at the characterization of a modern concrete material. For this purpose, we perform two experimental series of inverse planar plate impact (PPI) tests with the ultra-high-performance concrete B4Q, using two different witness plate materials. Hugoniot data in the range of particle velocities from 180 to 840 m/s and stresses from 1.1 to 7.5 GPa are derived from both series. Within the experimental accuracy, they can be seen as one consistent data set. Moreover, we conduct corresponding numerical simulations and find reasonably good agreement between simulated and experimentally obtained curves. From the simulated curves, we derive numerical Hugoniot results that serve as a homogenized, mean shock response of B4Q and add further consistency to the data set. Additionally, the comparison of simulated and experimentally determined results allows us to identify experimental outliers. Furthermore, we perform a parameter study which shows that a significant influence of the applied pressure-dependent strength model on the derived equation of state (EOS) parameters is unlikely. In order to compare the current results to our own partially reevaluated previous work and selected recent results from the literature, we use simulations to numerically extrapolate the Hugoniot results. Considering their inhomogeneous nature, a consistent picture emerges for the shock response of the discussed concrete and high-strength mortar materials. Hugoniot results from this and earlier work are presented for further comparisons. In addition, a full parameter set for B4Q, including validated EOS parameters, is provided for the application in simulations of impact and blast scenarios.
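The abstract does not spell out the data reduction, but planar plate impact results are conventionally reduced with the standard Rankine-Hugoniot jump conditions together with a linear shock-velocity fit. The textbook forms below are given for orientation only and are not taken from the paper (here ρ0 is the initial density, Us the shock velocity, up the particle velocity and σ the axial stress).

```latex
% Standard Rankine-Hugoniot jump conditions (textbook form, not from the paper):
% conservation of mass and momentum across a steady shock, plus the common
% linear U_s--u_p fit used to summarize Hugoniot data.
\begin{align}
  \rho_0 \, U_s &= \rho \,(U_s - u_p)        && \text{(mass)}\\
  \sigma - \sigma_0 &= \rho_0 \, U_s \, u_p  && \text{(momentum)}\\
  U_s &= c_0 + s \, u_p                      && \text{(linear Hugoniot fit)}
\end{align}
```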


2021 ◽  
pp. 016555152110184
Author(s):  
Gunjan Chandwani ◽  
Anil Ahlawat ◽  
Gaurav Dubey

Document retrieval plays an important role in knowledge management as it enables us to discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing is done to remove unnecessary and redundant words from the documents. Then, the documents are indexed by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. After the documents have been indexed, query matching is performed for the user queries using the Bhattacharyya distance. Finally, query optimisation is done with the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB data set and the Twenty Newsgroups data set. The analysis shows that the proposed algorithm offers high performance, with a precision of 1, a recall of 0.70 and an F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.
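To make the two building blocks named in the abstract concrete, the sketch below builds a plain inverted index and ranks candidate documents by Bhattacharyya distance between normalized term-frequency distributions. This is not the authors' piFCM-based algorithm, only a minimal illustration of the index-then-match idea.

```python
# Minimal sketch: a plain inverted index plus Bhattacharyya-distance ranking over
# term-frequency distributions. NOT the piFCM-based method from the article.
import math
from collections import Counter, defaultdict

docs = {
    "d1": "fuzzy clustering for document retrieval",
    "d2": "inverted index speeds up document retrieval",
    "d3": "pearson correlation for query optimisation",
}

# Inverted index: term -> set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def term_distribution(text: str) -> dict:
    counts = Counter(text.split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def bhattacharyya_distance(p: dict, q: dict) -> float:
    bc = sum(math.sqrt(p[t] * q[t]) for t in p.keys() & q.keys())
    return float("inf") if bc == 0 else -math.log(bc)

query = "document retrieval"
candidates = set.union(*(index.get(t, set()) for t in query.split()))
q_dist = term_distribution(query)
ranked = sorted(candidates,
                key=lambda d: bhattacharyya_distance(q_dist, term_distribution(docs[d])))
print(ranked)  # documents ordered by increasing Bhattacharyya distance
```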


Molecules ◽  
2021 ◽  
Vol 26 (2) ◽  
pp. 437
Author(s):  
Marta Tikhomirov ◽  
Błażej Poźniak ◽  
Tomasz Śniegocki

The precise and reliable determination of buprenorphine concentration is fundamental in certain medical and research applications, particularly in pharmacokinetic studies of this opioid. The main challenge, however, is the development of an analytical method that is sensitive enough, as the concentrations detected in vivo often fall in very low ranges. Thus, in this study we aimed at developing a sensitive, repeatable, cost-efficient and simple HPLC analytical protocol for buprenorphine in rabbit plasma. To this end, an HPLC-MS2 system was used to develop and validate the method for samples purified by liquid-liquid extraction. Fragment ions 468.6→396.2 and 468.6→414.2 were monitored, and the method showed high repeatability and reproducibility, a limit of quantification of 0.25 µg/L and a recovery of 98.7–109.0%. The method was linear in the range of 0.25–2000 µg/L. The suitability of the analytical procedure was tested in rabbits in a pilot pharmacokinetic study, which revealed that the method was suitable for comprehensively describing the pharmacokinetic profile after intravenous buprenorphine administration at a dose of 300 µg/kg. Thus, the method's suitability for pharmacokinetic application was confirmed by both the good validation results and the successful in vivo tests in rabbits.


2018 ◽  
Vol 10 (8) ◽  
pp. 80
Author(s):  
Lei Zhang ◽  
Xiaoli Zhi

Convolutional neural networks (CNNs for short) have made great progress in face detection. They mostly take computation-intensive networks as the backbone in order to obtain high precision, and they cannot achieve good detection speed without the support of high-performance GPUs (Graphics Processing Units). This limits CNN-based face detection algorithms in real applications, especially in speed-dependent ones. To alleviate this problem, we propose a lightweight face detector in this paper, which takes a fast residual network as its backbone. Our method can run fast even on cheap, ordinary GPUs. To guarantee its detection precision, multi-scale features and multi-context are fully exploited in efficient ways. Specifically, feature fusion is first used to obtain semantically strong multi-scale features. Then multi-context, including both local and global context, is added to these multi-scale features without extra computational burden. The local context is added through a depthwise-separable-convolution-based approach, and the global context through simple global average pooling. Experimental results show that our method can run at about 110 fps on VGA (Video Graphics Array)-resolution images, while still maintaining competitive precision on the WIDER FACE and FDDB (Face Detection Data Set and Benchmark) datasets as compared with its state-of-the-art counterparts.
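As a hedged illustration of the two context mechanisms named in the abstract (not the authors' exact architecture), the PyTorch sketch below enriches a feature map with local context from a depthwise separable convolution and global context from global average pooling; channel counts and layer arrangement are assumptions for illustration.

```python
# Illustrative sketch (not the paper's exact architecture): local context via a
# depthwise separable convolution, global context via global average pooling.
import torch
import torch.nn as nn

class ContextEnrichment(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise separable convolution = depthwise conv + 1x1 pointwise conv.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, kernel_size=1),
        )
        # Global context: squeeze spatial dims to 1x1, then project per channel.
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.global_proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_ctx = self.local(x)                            # same spatial size as x
        global_ctx = self.global_proj(self.global_pool(x))   # (N, C, 1, 1), broadcasts over H, W
        return x + local_ctx + global_ctx

feat = torch.randn(1, 64, 40, 40)         # a multi-scale feature map from the backbone
print(ContextEnrichment(64)(feat).shape)  # torch.Size([1, 64, 40, 40])
```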


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Torabzadehkashi ◽  
Siavash Rezaei ◽  
Ali HeydariGorji ◽  
Hosein Bobarshad ◽  
Vladimir Alves ◽  
...  

Abstract In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data through the system. This cost is directly related to the distance of the processing engines from the data, and it is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the "move process to data" paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment to process data in place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in place, without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks and investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved up to 5.4× and 8.9×, respectively.


Author(s):  
Yihao Tian

Big data is an unstructured data set of considerable volume, coming from various sources such as the internet and business organizations, in various formats. Predicting consumer behavior is a core responsibility for most dealers. Market research can reveal consumer intentions, but it can be a tall order for even a well-designed research project to penetrate the veil that protects real customer motivations from closer scrutiny. Customer behavior work usually focuses on customer data mining, and each model is structured at one stage to answer one query. Customer behavior prediction is a complex and unpredictable challenge. In this paper, advanced mathematical and big data analytics (BDA) methods are applied to predict customer behavior. Predictive behavior analytics can provide modern marketers with multiple insights to optimize efforts in their strategies. The model goes beyond analyzing historical evidence, using mathematical modelling to make the most knowledgeable assumptions about what will happen in the future. Although the underlying method is complex, its results are quite straightforward for most customers. As a result, most consumer behavior models use so many variables that the predictions they produce are usually quite accurate when fed with big data. This paper attempts to develop an association rule mining model to predict customers' behavior, improve accuracy, and derive major consumer data patterns. The findings recommend the BDA method, which improves big data analytics usability in the organization (98.2%), risk management ratio (96.2%), operational cost (97.1%), customer feedback ratio (98.5%), and demand prediction ratio (95.2%).
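The abstract names association rule mining as the core technique; a minimal sketch of that idea using the mlxtend library is shown below. The tiny transaction table and the support/confidence thresholds are illustrative assumptions, not the data or settings used in the paper.

```python
# Minimal sketch of association rule mining over customer transactions using
# mlxtend; data and thresholds are illustrative assumptions only.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded purchase history: one row per customer transaction.
transactions = pd.DataFrame(
    [
        {"laptop": True,  "mouse": True,  "headset": False},
        {"laptop": True,  "mouse": True,  "headset": True},
        {"laptop": False, "mouse": True,  "headset": True},
        {"laptop": True,  "mouse": False, "headset": False},
    ]
)

# Frequent itemsets, then rules of the form {antecedents} -> {consequents}.
frequent = apriori(transactions, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])

# A rule such as {laptop} -> {mouse} with high confidence would be read as a
# prediction that customers who buy a laptop are likely to also buy a mouse.
```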

