Set of tuples expansion by example with reliability

2017 ◽  
Vol 13 (4) ◽  
pp. 425-444 ◽  
Author(s):  
Ngurah Agus Sanjaya Er ◽  
Mouhamadou Lamine Ba ◽  
Talel Abdessalem ◽  
Stéphane Bressan

Purpose This paper aims to focus on the design of algorithms and techniques for effective set expansion. A tool that finds and extracts candidate sets of tuples from the World Wide Web was designed and implemented. For instance, when a user provides <Indonesia, Jakarta, Indonesian Rupiah>, <China, Beijing, Yuan Renminbi>, <Canada, Ottawa, Canadian Dollar> as seeds, our system returns tuples composed of countries with their corresponding capital cities and currency names, constructed from content extracted from the retrieved Web pages. Design/methodology/approach The seeds are used to query a search engine and to retrieve relevant Web pages. The seeds are also used to infer wrappers from the retrieved pages. The wrappers, in turn, are used to extract candidates. The Web pages, wrappers, seeds and candidates, as well as their relationships, are vertices and edges of a heterogeneous graph. Several options for ranking candidates, from PageRank to truth finding algorithms, were evaluated and compared. Remarkably, all vertices are ranked, thus providing an integrated approach that not only answers direct set expansion questions but also finds the most relevant pages for expanding a given set of seeds. Findings The experimental results show that leveraging a truth finding algorithm can indeed improve the level of confidence in the extracted candidates and the sources. Originality/value Current approaches to set expansion mostly support the expansion of sets of atomic values. This idea is extended to sets of tuples so as to extract relation instances from the Web given a handful of tuple seeds. A truth finding algorithm is also incorporated into the approach, and it is shown to improve the confidence level in the ranking of both candidates and sources in set-of-tuples expansion.
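
The ranking step lends itself to a short illustration. Below is a minimal sketch, not the authors' implementation, of personalized PageRank over a heterogeneous graph of seeds, pages, wrappers and candidates using networkx; the node names, edge choices and example tuples are illustrative assumptions.

```python
import networkx as nx

G = nx.DiGraph()
seeds = [("Indonesia", "Jakarta", "Indonesian Rupiah"),
         ("China", "Beijing", "Yuan Renminbi")]
pages = ["page:a.example/currencies", "page:b.example/capitals"]
candidates = [("Canada", "Ottawa", "Canadian Dollar"),
              ("Japan", "Tokyo", "Yen")]

# Seeds endorse the pages they were found on; each page supports the wrapper
# inferred from it; wrappers point to the candidate tuples they extract.
for s in seeds:
    for p in pages:
        G.add_edge(("seed",) + s, p)
for p in pages:
    G.add_edge(p, "wrapper:" + p)
    for c in candidates:
        G.add_edge("wrapper:" + p, ("cand",) + c)

# Personalized PageRank biases the random walk toward the user-provided seeds.
scores = nx.pagerank(G, alpha=0.85,
                     personalization={("seed",) + s: 1.0 for s in seeds})
ranked = sorted((n for n in G if isinstance(n, tuple) and n[0] == "cand"),
                key=scores.get, reverse=True)
print(ranked)  # candidate tuples, most strongly supported first
```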

Author(s):  
Carmen Domínguez-Falcón ◽  
Domingo Verano-Tacoronte ◽  
Marta Suárez-Fuentes

Purpose The strong regulation of the Spanish pharmaceutical sector encourages pharmacies to modify their business model, giving the customer a more relevant role by integrating 2.0 tools. However, the study of the implementation of these tools is still quite limited, especially in terms of customer-oriented web page design. This paper aims to analyze the online presence of Spanish community pharmacies by studying the profile of their web pages to classify them by their degree of customer orientation. Design/methodology/approach In total, 710 community pharmacies were analyzed, of which 160 had Web pages. Using items drawn from the literature, content analysis was performed to evaluate the presence of these items on the web pages. Then, after analyzing the scores on the items, a cluster analysis was conducted to classify the pharmacies according to the degree of development of their online customer orientation strategy. Findings The number of pharmacies with a web page is quite low. The development of these websites is limited, and they play a more informational than relational role. The statistical analysis makes it possible to classify the pharmacies into four groups according to their level of development. Practical implications Pharmacists should make incremental use of their websites, incorporating Web 2.0 and social media (SM) platforms, to facilitate real two-way communication with customers and other stakeholders and to maintain relationships with them. Originality/value This study analyzes, from a marketing perspective, the degree of Web 2.0 adoption and the characteristics of the websites, in terms of aiding communication and interaction with customers, in the Spanish pharmaceutical sector.
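
As a rough illustration of the classification step, the sketch below scores hypothetical pharmacy websites on binary content-analysis items and groups them with k-means (k = 4) using scikit-learn; the item names and the score matrix are invented for the example, not the study's instrument.

```python
import numpy as np
from sklearn.cluster import KMeans

items = ["contact_form", "online_ordering", "health_advice", "social_links"]
# One row per pharmacy website: 1 = item present on the page, 0 = absent.
scores = np.array([[1, 1, 1, 1],
                   [1, 0, 1, 0],
                   [1, 0, 0, 0],
                   [0, 0, 0, 0],
                   [1, 1, 0, 1]])
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
print(labels)  # cluster index per pharmacy, by degree of customer orientation
```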


2015 ◽  
Vol 49 (2) ◽  
pp. 205-223
Author(s):  
B T Sampath Kumar ◽  
D Vinay Kumar ◽  
K.R. Prithviraj

Purpose – The purpose of this paper is to determine the rate of loss of online citations used as references in scholarly journals. It also intended to recover the vanished online citations using the Wayback Machine and to calculate the half-life period of online citations. Design/methodology/approach – The study selected three journals published by Emerald. All 389 articles published in these three scholarly journals were selected. A total of 15,211 citations were extracted, of which 13,281 were print citations and only 1,930 were online citations. The online citations so extracted were then tested to determine whether they were active or missing on the Web. The W3C Link Checker was used to check the existence of online citations. The online citations that returned an HTTP error message when tested for accessibility were then entered into the search box of the Wayback Machine to recover the vanished online citations. Findings – The study found that only 12.69 percent (1,930 out of 15,211) of citations were online citations, and the percentage of online citations varied from a low of 9.41 in the year 2011 to a high of 17.52 in the year 2009. Another notable finding was that 30.98 percent of online citations were not accessible (vanished) and the remaining 69.02 percent were still accessible (active). The HTTP 404 error message – "page not found" – was the overwhelming message encountered and represented 62.98 percent of all HTTP error messages. It was found that the Wayback Machine had archived only 48.33 percent of the vanished web pages, leaving 51.67 percent still unavailable. The half-life of online citations increased from 5.40 years to 11.73 years after recovering the vanished online citations. Originality/value – This is a systematic and in-depth study on the recovery of vanished online citations cited in journal articles spanning a period of five years. The findings of the study will be helpful to researchers, authors, publishers and editorial staff in recovering vanishing online citations using the Wayback Machine.
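
The two mechanical steps of this method, testing whether a cited URL is still live and querying the Wayback Machine for a snapshot when it is not, can be sketched as follows. The Internet Archive's public availability API is real, but the example URL is made up, and this is an outline of the workflow rather than the study's tooling (the study itself used the W3C Link Checker).

```python
import requests

def is_active(url: str) -> bool:
    """Return True if the URL responds without an HTTP error (e.g. 404)."""
    try:
        r = requests.head(url, allow_redirects=True, timeout=10)
        return r.status_code < 400
    except requests.RequestException:
        return False

def wayback_snapshot(url: str):
    """Return the closest archived snapshot URL, or None if not archived."""
    r = requests.get("https://archive.org/wayback/available",
                     params={"url": url}, timeout=10)
    snap = r.json().get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap else None

url = "http://example.org/vanished-citation.html"  # invented example
if not is_active(url):
    print(wayback_snapshot(url))  # recovered copy, if the Archive has one
```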


2015 ◽  
Vol 67 (6) ◽  
pp. 663-686 ◽  
Author(s):  
Saed ALQARALEH ◽  
Omar RAMADAN ◽  
Muhammed SALAMAH

Purpose – The purpose of this paper is to design a watcher-based crawler (WBC) that can crawl static and dynamic web sites and download only the updated and newly added web pages. Design/methodology/approach – In the proposed WBC, a watcher file, which can be uploaded to the web sites' servers, prepares a report that contains the addresses of the updated and newly added web pages. In addition, the WBC is split into five units, where each unit is responsible for performing a specific crawling process. Findings – Several experiments have been conducted, and it has been observed that the proposed WBC increases the number of uniquely visited static and dynamic web sites as compared with the existing crawling techniques. In addition, the proposed watcher file not only allows the crawler to visit the updated and newly added pages, but also solves the crawlers' overlapping and communication problems. Originality/value – The proposed WBC performs all crawling processes in the sense that it detects all updated and newly added pages automatically, without any explicit human intervention or downloading the entire web site.
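
The paper's watcher file format is not reproduced here, so the following is a hedged sketch of the idea only: a crawler polls a hypothetical JSON report of updated and newly added addresses and downloads just those pages. The report URL and field names are assumptions, not the paper's specification.

```python
import json
import urllib.request

WATCHER_URL = "http://example.org/watcher-report.json"  # hypothetical location

def fetch(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()

# The assumed report lists page addresses changed since the last crawl,
# so the crawler never re-downloads the entire site.
report = json.loads(fetch(WATCHER_URL))
for address in report.get("updated", []) + report.get("new", []):
    page = fetch(address)  # download only updated/new pages
    print(address, len(page), "bytes")
```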


2018 ◽  
Vol 36 (4) ◽  
pp. 620-632 ◽  
Author(s):  
Rita Kosztyánné Mátrai

Purpose The purpose of this paper is to identify important principles that should be applied to electronic library websites to make them usable for all people. Design/methodology/approach The goal was to make the simplified user interface of the Hungarian Electronic Library (VMEK) more accessible and usable by leveraging the latest technologies, standards and recommendations. Vision-impaired and motor-disabled people were also involved in brainstorming and collecting ideas during the design phase and in testing the implemented website. Findings This paper showed that the perspicuity of the Web page is greatly improved by semantically correct HTML code, clearly defined links and alt attributes, hotkeys and typographic principles. Practical implications The paper presents design principles for electronic library Web pages which can be applied by Web developers and content managers. The paper identifies design principles which improve the perspicuity of user interfaces to a great extent (especially for blind users); draws attention to typographic principles which promote reading and understanding documents; and recommends guidelines for developing electronic library home pages and managing their content. Originality/value This paper bridges the gap between the information and library science field and the Web accessibility and usability field. Based on brainstorming results involving people with various kinds of disabilities, the paper gives 11 recommendations that should be taken into account when designing and developing electronic library websites to ensure equal access to their services and documents.
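
Two of the principles above, alt attributes on images and clearly labelled links, can be checked mechanically. The sketch below uses BeautifulSoup on an invented HTML fragment; it illustrates the principles and is not a tool from the paper.

```python
from bs4 import BeautifulSoup  # requires the beautifulsoup4 package

html = """<main><img src="logo.png">
<a href="/catalog">click here</a>
<a href="/help" accesskey="h">Help</a></main>"""
soup = BeautifulSoup(html, "html.parser")

# Images without alt text are invisible to screen-reader users.
for img in soup.find_all("img"):
    if not img.get("alt"):
        print("Missing alt attribute:", img.get("src"))

# Link text should describe the target, not just say "click here".
VAGUE = {"click here", "here", "more"}
for a in soup.find_all("a"):
    if a.get_text(strip=True).lower() in VAGUE:
        print("Vague link text:", a.get("href"))
```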


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Maria Giovanna Confetto ◽  
Claudia Covucci

Purpose The objective of this paper is to propose a taxonomy of sustainability communication (SC) topics that provides digital content managers with a guide for setting a sustainability content agenda and for fostering stakeholder engagement mechanisms on the environmental, social and economic issues that increasingly characterize the social media conversations of all stakeholder groups. Design/methodology/approach A taxonomy is a conceptual and qualitative way to classify and represent the corporate sustainability (CS) domain of knowledge. The taxonomy categories of SC topics are both theoretically and empirically derived, combining an in-depth literature review with a thematic content analysis of 300 web pages of the corporate websites of the top ten sustainable brands selected in "The 2019 GlobeScan-SustainAbility Leaders Survey." Findings The analysis of the results led to the construction of a hierarchical dictionary of tags that categorizes all sustainability topics based on a new, four-dimensional conceptual structure: planet, people, profit and governance. Each dimension is organized into four groups of sustainability themes, which, in turn, group multiple topics, considered the smallest communication unit for developing SC content. Practical implications The taxonomy provides a concise and immediate conceptual framework for all those topics of broader interest which, suitably modulated, can act as touch points with several groups of stakeholders. Drawing upon the best practices of thematic organization of SC via the web, the taxonomy represents a guide for programming an editorial plan based on environmental, social, economic and governance issues from a sustainability content marketing perspective. The taxonomy of sustainability topics also finds application as a framework for a content intelligence system, providing a dictionary of tags that can be used for the indexing and retrieval of SC web content. Originality/value The study represents the first attempt at a taxonomic organization of sustainability aspects from a communicational perspective, supporting a new way of thinking about and managing SC in the digital realm. Moreover, the results highlight, for the first time, that the Triple Bottom Line (TBL) theory, applied to corporate communications, lacks the governance aspect, which is essential to pursue sustainability consistently and effectively.
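
The hierarchical dictionary of tags can be pictured as a nested structure: dimensions contain theme groups, which in turn contain topics. The labels in the sketch below are illustrative assumptions, not the paper's derived taxonomy (which has four theme groups per dimension).

```python
# Dimension -> theme group -> topics (topics are the smallest content unit).
taxonomy = {
    "planet": {
        "climate": ["emissions", "carbon neutrality"],
        "resources": ["water stewardship", "circular economy"],
    },
    "people": {
        "community": ["philanthropy", "volunteering"],
        "workplace": ["diversity", "employee wellbeing"],
    },
    "profit": {
        "value creation": ["sustainable products", "responsible supply chain"],
    },
    "governance": {
        "accountability": ["ethics codes", "sustainability reporting"],
    },
}

# Flatten the hierarchy into tag paths usable for indexing and retrieval.
for dimension, themes in taxonomy.items():
    for theme, topics in themes.items():
        for topic in topics:
            print(f"{dimension} > {theme} > {topic}")
```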


2021 ◽  
Vol 33 (7) ◽  
pp. 295-317
Author(s):  
Maria Giovanna Confetto ◽  
Claudia Covucci

Purpose For companies that intend to respond to the needs of the modern conscious consumer, a great competitive advantage rests on the ability to incorporate sustainability messages in marketing communications. The aim of this paper is to address this important priority in the web context by building a semantic algorithm that allows content managers to evaluate the quality of sustainability web content for search engines, considering the current development of the semantic web. Design/methodology/approach Following the Design Science (DS) methodological approach, the study develops the algorithm as an artefact capable of solving a practical problem and improving the content management process. Findings The algorithm considers multiple evaluation factors, grouped under three parameters: completeness, clarity and consistency. An applicability test of the algorithm was conducted on a sample of web pages from the Google blog on sustainability to highlight the correspondence between the established evaluation factors and those actually used by Google. Practical implications Studying content marketing for sustainability communication constitutes a new field of research that offers exciting opportunities. Writing sustainability content effectively is a fundamental step in triggering stakeholder engagement mechanisms online. It could be a positive social engineering technique in the hands of marketers to enable web users to pursue sustainable development in their choices. Originality/value This is the first study that creates a theoretical connection between digital content marketing and sustainability communication, focusing especially on the aspects of search engine optimization (SEO). The "Sustainability-contents SEO" algorithm is the first operational software tool, with a regulatory nature, that is able to analyse web content, detecting the terms of the sustainability language and measuring compliance with SEO requirements.
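
As a hedged sketch of the evaluation idea, the function below scores a text against the three parameters named in the findings. The concrete factors chosen here (term coverage for completeness, sentence length for clarity, term density for consistency) and the term list are simplifying assumptions, not the paper's actual evaluation factors.

```python
import re

SUSTAINABILITY_TERMS = {"sustainability", "emissions", "renewable", "recycling"}

def evaluate(text: str) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    found = SUSTAINABILITY_TERMS & set(words)
    # Completeness: how much of the sustainability vocabulary is covered.
    completeness = len(found) / len(SUSTAINABILITY_TERMS)
    # Clarity: shorter sentences read better; penalize long average length.
    avg_len = sum(len(s.split()) for s in sentences) / len(sentences)
    clarity = 1.0 if avg_len <= 20 else 20 / avg_len
    # Consistency: density of sustainability terms across the whole text.
    consistency = sum(words.count(t) for t in found) / len(words)
    return {"completeness": completeness, "clarity": clarity,
            "consistency": consistency}

print(evaluate("We cut emissions through renewable energy and recycling. "
               "Sustainability guides every choice we make."))
```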


2013 ◽  
Vol 7 (2) ◽  
pp. 574-579 ◽  
Author(s):  
Dr Sunitha Abburu ◽  
G. Suresh Babu

The volume of information available on the Web grows significantly day by day. Information on the Web comes in several forms: structured, semi-structured and unstructured. The majority of information on the Web is presented in web pages, and that information is semi-structured. However, the information required for a given context is scattered across different web documents. It is difficult to analyze the large volumes of semi-structured information presented in web pages and to make decisions based on that analysis. The current research work proposes a framework for a system that extracts information from various sources and prepares reports based on the knowledge built from the analysis. This simplifies data extraction, data consolidation, data analysis and decision making based on the information presented in web pages. The proposed framework integrates web crawling, information extraction and data mining technologies for better information analysis that helps in effective decision making. It enables people and organizations to extract information from various sources on the Web and to perform an effective analysis of the extracted data. The proposed framework is applicable to any application domain; manufacturing, sales, tourism and e-learning are a few examples. The framework has been implemented and tested for effectiveness, and the results are promising.
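
A minimal sketch of the three stages the framework integrates, crawling, extraction and a simple mining step, is given below. The URL, the toy extraction pattern and the frequency-count "analysis" are illustrative assumptions, not the proposed system.

```python
import re
import urllib.request
from collections import Counter

def crawl(urls):
    """Stage 1: fetch raw HTML for each source page."""
    for url in urls:
        with urllib.request.urlopen(url, timeout=10) as resp:
            yield resp.read().decode("utf-8", errors="replace")

def extract(html):
    """Stage 2: toy extractor pulling <td>label</td><td>value</td> pairs."""
    return re.findall(r"<td>([^<]+)</td>\s*<td>([^<]+)</td>", html)

pages = crawl(["http://example.org/products.html"])  # hypothetical source
facts = [pair for html in pages for pair in extract(html)]
# Stage 3: the mining step is reduced to a frequency count for this sketch.
print(Counter(label for label, _ in facts).most_common(5))
```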


Think India ◽  
2019 ◽  
Vol 22 (2) ◽  
pp. 174-187
Author(s):  
Harmandeep Singh ◽  
Arwinder Singh

Nowadays, the internet provides people with different services related to different fields. Profit as well as non-profit organizations use the internet for various business purposes, a major one being the communication of financial as well as non-financial information on their websites. This study is conducted on the top 30 BSE-listed public sector companies to measure the extent of governance disclosure (non-financial information) on their web pages. The disclosure index approach was used to examine the extent of governance disclosure on the internet. The governance index was constructed and broadly categorized into three dimensions, i.e., organization and structure; strategy and planning; and accountability, compliance, philosophy and risk management. The empirical evidence of the study reveals that all the Indian public sector companies have a website, and on average, 67% of companies disclosed some kind of governance information directly on their websites. Further, we found extreme variations in web disclosure between the three categories, i.e., Maharatna, Navratna and Miniratna. However, the result of the Kruskal-Wallis test indicates that there is no significant difference between the three categories. The study provides valuable insights into the Indian economy. It shows that Indian public sector companies use the internet for governance disclosure to some extent but lack consistency in that disclosure, because there is no regulation for web disclosure. Thus, the study recommends a regulatory framework for web disclosure so that stakeholders can be assured of the transparency and reliability of the information.
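
The statistical comparison reported above can be reproduced in outline with scipy; the disclosure-index scores below are invented for illustration, not the study's data.

```python
from scipy.stats import kruskal

# Hypothetical disclosure-index scores (fraction of index items disclosed)
# for companies in each of the three categories.
maharatna = [0.82, 0.75, 0.71, 0.68]
navratna = [0.70, 0.66, 0.64, 0.73]
miniratna = [0.61, 0.69, 0.58, 0.66]

stat, p = kruskal(maharatna, navratna, miniratna)
print(f"H = {stat:.3f}, p = {p:.3f}")  # p > 0.05: no significant difference
```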


2013 ◽  
Vol 347-350 ◽  
pp. 2758-2762
Author(s):  
Zhi Juan Wang

Negative Internet information is harmful to social stability and national unity. Opinion tendency analysis can identify such negative Internet information. Here, a method based on regular expressions is introduced that does not need complex semantic technologies. The method includes building a negative information bank, designing regular expressions and implementing the program. The results obtained with this method verify that it works well for judging the opinion of web pages.
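
A minimal sketch of the method follows, assuming a tiny negative information bank and a simple count threshold, both invented for the example.

```python
import re

# The "negative information bank": terms drawn from it are compiled into
# one alternation regular expression.
NEGATIVE_BANK = ["riot", "violence", "hatred", "unrest"]
pattern = re.compile("|".join(map(re.escape, NEGATIVE_BANK)), re.IGNORECASE)

def is_negative(page_text: str, threshold: int = 2) -> bool:
    """Flag the page as negative if enough bank terms occur in its text."""
    return len(pattern.findall(page_text)) >= threshold

print(is_negative("Reports of unrest and violence spread quickly."))  # True
```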

