A Novel Approach for Crawling the Opinions from World Wide Web

2016 ◽  
Vol 6 (2) ◽  
pp. 1-23 ◽  
Author(s):  
Surbhi Bhatia ◽  
Manisha Sharma ◽  
Komal Kumar Bhatia

Due to the sudden and explosive growth of web technologies, a huge quantity of user-generated content is available online. People's experiences and opinions play an important role in the decision-making process. While facts on a topic are easy to search for, retrieving opinions is still a difficult task. Many studies on opinion mining must therefore be undertaken in order to extract constructive opinionated information from these reviews. The present work focuses on the design and implementation of an Opinion Crawler, which downloads opinions from various sites while ignoring the rest of the web. It also detects web pages that are frequently updated, calculating a timestamp for revisiting each page in order to extract fresh, relevant opinions. The performance of the Opinion Crawler is validated on real data sets, on which it proves to be highly accurate in terms of the precision and recall quality attributes.
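
The revisit idea in this abstract can be illustrated with a small sketch. This is not the paper's actual timestamp model, just a common adaptive scheme: shrink a page's revisit interval when its content hash changes between visits, and grow it when the page is unchanged. All names and parameters here are hypothetical.

```python
import hashlib
import time


class RevisitScheduler:
    """Toy revisit scheduler for a focused crawler: halve the revisit
    interval when a page changed since the last visit, double it
    (within bounds) when it did not."""

    def __init__(self, base_interval=3600.0, min_interval=600.0,
                 max_interval=86400.0):
        self.base = base_interval
        self.min = min_interval
        self.max = max_interval
        self.state = {}  # url -> (content_hash, interval, next_visit_time)

    def observe(self, url, content, now=None):
        """Record a visit and return the new revisit interval in seconds."""
        now = time.time() if now is None else now
        digest = hashlib.sha256(content.encode()).hexdigest()
        if url not in self.state:
            interval = self.base
        else:
            old_digest, interval, _ = self.state[url]
            # Page changed: revisit sooner. Unchanged: back off.
            interval = (max(self.min, interval / 2) if digest != old_digest
                        else min(self.max, interval * 2))
        self.state[url] = (digest, interval, now + interval)
        return interval
```

A page whose review section changes on every visit converges to the minimum interval, while a static page drifts toward the maximum, which is the behavior the crawler needs to focus its bandwidth on opinion-rich, frequently updated pages.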

2017 ◽  
Author(s):  
Cheng Lee

Recently, we have heard more about web generations and their role in the web technologies we use today. Most people know Web 2.0 and how great the transformation was from the previous version (Web 1.0). Web 2.0 is the style that became standard in the early 2000s and includes all the features that have allowed web pages to move beyond static documents. Web 2.0 marked a cultural shift in how web pages were developed, designed, and used, from a static era to a dynamic one. It saw the meteoric rise of social media, including Facebook and Twitter; of user-generated content such as blogs and wikis, Wikipedia being perhaps the most famous; and of video-sharing sites such as YouTube. These features made it very attractive for people to become familiar with it and learn to work with it. In this paper, we go through some aspects of the web generations from 1.0 to 3.0 and focus on security issues for each generation.


2020 ◽  
pp. 638-657
Author(s):  
Firas Ben Kharrat ◽  
Aymen Elkhleifi ◽  
Rim Faiz

This paper puts forward a new recommendation algorithm based on semantic analysis, together with new measurements. Social networks such as Facebook are among the most prominent Web 2.0 applications, and their services are evolving into practical ways of sharing opinions; social network web sites have thus become valuable data sources for opinion mining. This paper proposes to introduce an external resource, the sentiment of comments posted by users, in order to improve recommendation and to lessen the cold-start problem. The originality of the suggested approach is that posts are not merely characterized by an opinion score, but instead receive an opinion grade derived from the post. The authors' approach has been implemented with Java and the LensKit framework. It was evaluated on two real data sets, namely MovieLens and TripAdvisor, on which the authors obtained positive results. They compared their algorithm to the SVD and Slope One algorithms, achieving an improvement of 10% in precision and recall, along with an improvement of 12% in RMSE and nDCG.


2016 ◽  
Vol 7 (3) ◽  
pp. 99-118 ◽  
Author(s):  
Firas Ben Kharrat ◽  
Aymen Elkhleifi ◽  
Rim Faiz

This paper puts forward a new recommendation algorithm based on semantic analysis, together with new measurements. Social networks such as Facebook are among the most prominent Web 2.0 applications, and their services are evolving into practical ways of sharing opinions; social network web sites have thus become valuable data sources for opinion mining. This paper proposes to introduce an external resource, the sentiment of comments posted by users, in order to improve recommendation and to lessen the cold-start problem. The originality of the suggested approach is that posts are not merely characterized by an opinion score, but instead receive an opinion grade derived from the post. The authors' approach has been implemented with Java and the LensKit framework. It was evaluated on two real data sets, namely MovieLens and TripAdvisor, on which the authors obtained positive results. They compared their algorithm to the SVD and Slope One algorithms, achieving an improvement of 10% in precision and recall, along with an improvement of 12% in RMSE and nDCG.


Author(s):  
Khayra Bencherif ◽  
Mimoun Malki ◽  
Djamel Amar Bensaber

This article describes how the Linked Open Data Cloud project allows data providers to publish structured data on the web according to the Linked Data principles. In this context, several link discovery frameworks have been developed for connecting entities contained in knowledge bases. In order to achieve high effectiveness in the link discovery task, a suitable link configuration is required to specify the similarity conditions. Unfortunately, such configurations are specified manually, which makes the link discovery task tedious and more difficult for users. In this article, the authors address this drawback by proposing a novel approach for the automatic determination of link specifications. The proposed approach is based on a neural network model that combines a set of existing metrics into a compound one. The authors evaluate the effectiveness of the proposed approach in three experiments using real data sets from the LOD Cloud. In addition, the proposed approach is compared against existing link specification approaches and shown to outperform them in most experiments.
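
The "combine existing metrics into a compound one" step can be pictured as a single neural unit over per-metric similarity scores. The metric choice, weights, and threshold below are made-up stand-ins; in the article they are learned by the network rather than fixed by hand.

```python
import math


def compound_similarity(metric_scores, weights, bias):
    """Combine per-metric similarity scores (e.g. edge-label edit
    similarity, Jaccard, trigram overlap) into one compound score
    with a single sigmoid unit."""
    z = bias + sum(w * s for w, s in zip(weights, metric_scores))
    return 1.0 / (1.0 + math.exp(-z))


def is_link(scores, weights=(2.0, 1.5, 1.0), bias=-2.5, threshold=0.5):
    """Declare a link between two entities when the compound
    similarity clears the threshold."""
    return compound_similarity(scores, weights, bias) >= threshold
```

Training such a unit on labeled link/non-link pairs is what removes the manual link-specification step the article criticizes.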


Author(s):  
Sherif Sakr

The use of XML continues to grow in popularity, large repositories of XML documents are emerging, and users are likely to pose increasingly complex queries over these data sets. In 2001, the World Wide Web Consortium (W3C) selected XQuery as the standard XML query language. In this article, we describe the design and implementation of an efficient and scalable, purely relational XQuery processor that translates expressions of the XQuery language into equivalent SQL evaluation scripts. The experiments in this article demonstrate the efficiency and scalability of our purely relational approach in comparison to the native XML/XQuery functionality supported by conventional RDBMSs, and show that the purely relational approach to implementing an XQuery processor deserves to be pursued further.
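
The translation idea can be illustrated in miniature: shred XML into an edge table `edge(id, parent, tag)` and compile a simple child-axis path into a self-join SQL query. This toy compiler handles only `/a/b/c`-style paths and is far simpler than full XQuery compilation; the table layout and aliases are assumptions for illustration.

```python
def path_to_sql(path):
    """Translate a child-axis-only XPath like '/a/b/c' into SQL over an
    edge table edge(id, parent, tag): one alias per step, each step
    joined to its parent step."""
    tags = [t for t in path.split("/") if t]
    joins, conds = [], []
    for i, tag in enumerate(tags):
        alias = f"e{i}"
        joins.append(f"edge {alias}")
        conds.append(f"{alias}.tag = '{tag}'")
        if i == 0:
            conds.append(f"{alias}.parent IS NULL")  # document root
        else:
            conds.append(f"{alias}.parent = e{i - 1}.id")
    return (f"SELECT e{len(tags) - 1}.id FROM " + ", ".join(joins)
            + " WHERE " + " AND ".join(conds))
```

Because the output is plain SQL over ordinary tables, the relational engine's join optimizer and indexes do all the heavy lifting, which is the crux of the purely relational argument.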


2017 ◽  
Vol 29 (11) ◽  
pp. 3040-3077 ◽  
Author(s):  
Duy Nhat Phan ◽  
Hoai An Le Thi ◽  
Tao Pham Dinh

This letter proposes a novel approach using ℓ0-norm regularization for the sparse covariance matrix estimation (SCME) problem. The objective function of the SCME problem is composed of a nonconvex part and the ℓ0 term, which is discontinuous and difficult to tackle. Appropriate DC (difference of convex functions) approximations of the ℓ0-norm are used, resulting in approximate SCME problems that are still nonconvex. DC programming and DCA (DC algorithms), powerful tools in the nonconvex programming framework, are investigated. Two DC formulations are proposed and the corresponding DCA schemes are developed. Two applications of the SCME problem are considered: classification via sparse quadratic discriminant analysis and portfolio optimization. A careful empirical study is performed on simulated and real data sets to assess the performance of the proposed algorithms. Numerical results show their efficiency and their superiority over seven state-of-the-art methods.
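
As a concrete illustration of the DC machinery (a standard construction in the DCA literature, not necessarily the letter's exact formulation), the capped-ℓ1 function gives a DC approximation of the ℓ0-norm:

```latex
% Capped-l1 approximation of the l0-norm, with parameter \theta > 0:
\|x\|_0 \;\approx\; \sum_i \min\bigl(1,\, \theta\,|x_i|\bigr)

% Each term admits a DC decomposition g_i - h_i with both parts convex:
\min\bigl(1,\, \theta\,|x_i|\bigr)
  \;=\; \underbrace{\theta\,|x_i|}_{g_i(x)}
  \;-\; \underbrace{\max\bigl(0,\, \theta\,|x_i| - 1\bigr)}_{h_i(x)}

% Generic DCA iteration for \min_x \; g(x) - h(x):
y^k \in \partial h(x^k),
\qquad
x^{k+1} \in \arg\min_x \; g(x) - \langle y^k, x \rangle
```

Each DCA step therefore solves a convex problem (here an ℓ1-regularized one), which is what makes the discontinuous ℓ0 objective tractable.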


2011 ◽  
Vol 8 (3) ◽  
pp. 711-737 ◽  
Author(s):  
Peiquan Jin ◽  
Hong Chen ◽  
Xujian Zhao ◽  
Xiaowen Li ◽  
Lihua Yue

Temporal information plays an important role in Web search, as Web pages intrinsically carry a crawl time and most Web pages contain time keywords in their content. How to integrate temporal information into Web search engines has become a research focus in recent years, and key issues such as temporal-textual indexing and temporal information extraction have to be studied first. In this paper, we first present a framework for a temporal-textual Web search engine. We then concentrate on designing a new hybrid index structure for the temporal and textual information of Web pages. In particular, we propose to integrate a B+-tree, an inverted file, and a typical temporal index called the MAP21-tree to handle temporal-textual queries. We study five mechanisms for implementing a hybrid index structure for temporal-textual queries, which organize the inverted file, B+-tree, and MAP21-tree in different ways. After a theoretical analysis of the performance of these five index structures, we conduct experiments on both simulated and real data sets to compare their performance. The experimental results show that, among all the index schemes, the first-inverted-file-then-MAP21-tree structure has the best query performance and is thus a suitable choice as the temporal-textual index for future time-aware search engines.
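
The winning "inverted file first, temporal index second" scheme can be sketched minimally: look up the term's posting list, then filter the hits by time-interval overlap. For brevity the temporal stage below is a linear interval scan standing in for the MAP21-tree; the class and field names are assumptions.

```python
class TemporalTextIndex:
    """Sketch of a first-text-then-time hybrid index: an inverted file
    maps terms to doc ids, and each doc carries a validity interval
    checked in a second stage."""

    def __init__(self):
        self.postings = {}   # term -> set of doc ids (inverted file)
        self.intervals = {}  # doc id -> (start, end)

    def add(self, doc_id, terms, start, end):
        for t in terms:
            self.postings.setdefault(t, set()).add(doc_id)
        self.intervals[doc_id] = (start, end)

    def query(self, term, qstart, qend):
        """Docs containing `term` whose interval overlaps [qstart, qend]."""
        hits = self.postings.get(term, set())
        return sorted(d for d in hits
                      if self.intervals[d][0] <= qend
                      and self.intervals[d][1] >= qstart)
```

Running the selective text stage first shrinks the candidate set before the temporal check, which is why this ordering tends to win when keywords are more selective than time ranges.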


Author(s):  
Maybin Muyeba ◽  
M. Sulaiman Khan ◽  
Frans Coenen

A novel approach is presented for effectively mining weighted fuzzy association rules (ARs). The authors address the invalidation of the downward closure property (DCP) in weighted association rule mining, where each item is assigned a weight according to its significance with respect to some user-defined criteria. Most work on weighted association rule mining does not address the downward closure property, while some approaches make assumptions in order to validate it. This chapter generalizes the weighted association rule mining problem to binary and fuzzy attributes in weighted settings. The methodology follows an Apriori approach but employs a T-tree data structure to improve the efficiency of counting itemsets. The authors' approach avoids pre- and post-processing, as opposed to most weighted association rule mining algorithms, thus eliminating extra steps during rule generation. The chapter presents experimental results on both synthetic and real data sets, along with a discussion evaluating the proposed approach.
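
Why weighting invalidates the downward closure property is easy to show with a toy definition. The formula below (support scaled by the itemset's mean item weight) is one common illustrative variant, not necessarily the chapter's exact definition.

```python
def weighted_support(itemset, transactions, weights):
    """Weighted support: plain support scaled by the mean weight of
    the itemset's items. (One illustrative definition; variants exist.)"""
    count = sum(1 for t in transactions if set(itemset) <= set(t))
    support = count / len(transactions)
    mean_w = sum(weights[i] for i in itemset) / len(itemset)
    return support * mean_w


# DCP violation: with a low-weight item "a" and a high-weight item "b",
# the superset {a, b} can score HIGHER than its subset {a}, so Apriori
# pruning on {a} would wrongly discard {a, b}.
transactions = [("a", "b"), ("a", "b"), ("a",), ("b",)]
weights = {"a": 0.2, "b": 1.0}
```

Here `{a}` has weighted support 0.75 × 0.2 = 0.15 while `{a, b}` has 0.5 × 0.6 = 0.3, so the anti-monotonicity that Apriori pruning relies on no longer holds, which is exactly the problem the chapter's generalized framework addresses.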


Author(s):  
Rimpal Unadkat

The World Wide Web (WWW) allows people to share information (data) from large database repositories globally. With the tremendous growth in the volume of data and the number of web pages, traditional search engines are no longer adequate. The search engine is the most important tool for discovering information on the World Wide Web, and the semantic search engine was born out of the traditional search engine to overcome this problem: to retrieve meaningful information intelligently, semantic web technologies play a major role. In this paper the authors present a survey of the role of search engines in the intelligent web, together with background, challenges, and open issues.

