Research based on large-scale data query with mapreduce technology in cloud computing

Author(s):  
Feiping Wang ◽  
Xiaofeng Gu
2013 ◽  
Vol 441 ◽  
pp. 691-694
Author(s):  
Yi Qun Zeng ◽  
Jing Bin Wang

With the rapid development of information technology, data is growing explosively, and dealing with large-scale data has become increasingly important. Based on the characteristics of RDF data, we propose compressing the RDF data. We construct an index structure called the PAR-Tree Index, and then execute queries using the MapReduce parallel computing framework together with the PAR-Tree Index. Experimental results show that the algorithm can improve the efficiency of large-scale data queries.


2014 ◽  
Vol 989-994 ◽  
pp. 4594-4597
Author(s):  
Chun Zhi Xing

With the development of the Internet, the various kinds of Internet-based large-scale data face increasing competition. To satisfy the demand for data queries, it is necessary to use data mining and distributed processing. This paper therefore proposes a large-scale data mining and distributed processing method based on a decision tree algorithm.


2012 ◽  
Vol 182-183 ◽  
pp. 2127-2130
Author(s):  
Tie Liang Gao ◽  
Jiao Li ◽  
Jun Peng Zhang ◽  
Bing Jie Shi

MapReduce is a programming model used for parallel computing over large-scale data sets in cloud computing [1]; it consists mainly of map and reduce. MapReduce is tremendously convenient for programmers who are not familiar with parallel programming, who can use it to run their programs on a distributed system. This paper mainly studies the model, process, and theory of MapReduce.
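The map and reduce steps named in this abstract can be illustrated with a minimal single-process sketch. It is not taken from the paper: the word-count task and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are assumptions chosen for illustration, and a real framework would run the map and reduce calls in parallel on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["map reduce map", "reduce Map"]))
```

The programmer writes only `map_phase` and `reduce_phase`; grouping and distribution are left to the framework, which is why the model suits programmers without parallel programming experience.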


Author(s):  
C. Infant Louis Richards ◽  
T. Yuva ◽  
J. Sylvester Britto

Cloud architectures address key difficulties surrounding large-scale data processing. First, in traditional data processing it is difficult to get as many machines as an application needs. Second, it is difficult to get the machines when one needs them. Third, it is difficult to distribute and coordinate a large-scale job on different machines, run processes on them, and provision another machine to recover if one machine fails. Fourth, it is difficult to automatically scale up and down based on dynamic workloads. Fifth, it is difficult to get rid of all those machines when the job is done. Cloud architectures solve such difficulties. Optical character recognition of cursive scripts presents a number of challenging problems in both the segmentation and recognition processes, and this attracts much research in the field of machine learning. This paper presents an approach based on a combination of OCR and cloud computing that meets Apple's requirements for availability in the App Store, with the aim of designing an OCR system for portable documents used outdoors. The implementation results on a comprehensive database show a high degree of accuracy that meets the requirements of practical use.


2018 ◽  
Vol 12 (8) ◽  
pp. 69 ◽  
Author(s):  
Faten Hamad

Hadoop is an open-source cloud computing system used for large-scale data processing. It has become the basic computing platform for many Internet companies. With the Hadoop platform, users can develop cloud computing applications and then submit tasks to the platform. Hadoop has strong fault tolerance and can easily increase the number of cluster nodes, scaling the cluster linearly so that it can process larger datasets. However, Hadoop has some shortcomings, especially those exposed in practical use of the MapReduce scheduler, which call for more research on Hadoop scheduling algorithms. This survey provides an overview of the default Hadoop scheduler algorithms and the problems they have. It also compares five Hadoop scheduling algorithms in terms of the default scheduler algorithm being enhanced, the proposed scheduler algorithm, the type of cluster used (heterogeneous or homogeneous), the methodology, and the cluster classification based on performance evaluation. Finally, a new algorithm based on capacity scheduling and the use of perspective resource utilization is proposed to enhance Hadoop scheduling.

