Inequalities in digital memory: ethical and geographical aspects of web archiving

2017 ◽  
Vol 26 ◽  
Author(s):  
Moisés Rockembach

This paper approaches web archiving as the preservation of digital memory and as a dynamic informational environment with complex problems of harvest, use, access and preservation. It uses a qualitative, exploratory-descriptive approach, identifying web archiving initiatives and promoting a reflection on how web information collections are defined, on geographical gaps in web archiving and on problems regarding the uses and rights of this information. Although initiatives such as the Internet Archive harvest large amounts of information from across the web, an imbalance of digital memory persists: many countries do not possess web archiving initiatives of their own, and coverage of information is therefore unequally produced.

2018 ◽  
Vol 52 (2) ◽  
pp. 266-277 ◽  
Author(s):  
Hyo-Jung Oh ◽  
Dong-Hyun Won ◽  
Chonghyuck Kim ◽  
Sung-Hee Park ◽  
Yong Kim

Purpose
The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web.
Design/methodology/approach
This study proposes and develops an algorithm that collects web information as if the crawler were gathering static webpages, by managing script commands as links. The proposed web crawler is tested experimentally by using the algorithm to collect deep webpages.
Findings
Among the findings of this study is that when the crawling process encounters search results provided as script pages, an ordinary crawl collects only the first page, whereas the proposed algorithm can also collect the deep webpages in this case.
Research limitations/implications
To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors.
Practical implications
Research estimates that the deep web holds 450 to 550 times more information than surface webpages, and its documents are difficult to collect. The proposed algorithm helps to enable deep web collection by running scripts.
Originality/value
This study presents a new method that uses script links instead of the keywords adopted in previous work, allowing a script to be handled like an ordinary URL. The conducted experiment shows that the scripts of individual websites must be analyzed before they can be employed as links.
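
The central idea of the algorithm, handling script commands as if they were static links, can be illustrated with a short sketch. The Python code below is not the authors' implementation (which relies on the Microsoft Visual Studio web browser object as a script launcher); the helper names, regular expressions and the stubbed script runner are assumptions made purely for illustration.

```python
# Minimal sketch of the "script command as link" idea: both ordinary hrefs and
# JavaScript calls found in onclick handlers or javascript: URLs are pushed onto
# one crawl frontier, so the crawler can treat paging scripts like static links.
# The render_script callable is a stub standing in for a browser object that
# actually executes the script and returns the resulting page.
import re
from collections import deque

HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)
ONCLICK_RE = re.compile(r'onclick="([^"]+)"', re.IGNORECASE)

def extract_links(html: str) -> list[str]:
    """Return static URLs and script commands as one unified list of 'links'."""
    links = [u for u in HREF_RE.findall(html) if not u.startswith("javascript:")]
    scripts = ["script:" + s for s in ONCLICK_RE.findall(html)]
    scripts += ["script:" + u[len("javascript:"):]
                for u in HREF_RE.findall(html) if u.startswith("javascript:")]
    return links + scripts

def crawl(seed_html: str, fetch, render_script, max_pages: int = 50) -> list[str]:
    """Breadth-first crawl that follows both ordinary links and script 'links'."""
    frontier, seen, pages = deque(extract_links(seed_html)), set(), [seed_html]
    while frontier and len(pages) < max_pages:
        link = frontier.popleft()
        if link in seen:
            continue
        seen.add(link)
        # Script commands are executed in a browser context; plain URLs are fetched.
        html = render_script(link[7:]) if link.startswith("script:") else fetch(link)
        if html:
            pages.append(html)
            frontier.extend(extract_links(html))
    return pages

if __name__ == "__main__":
    sample = '<a href="page1.html">1</a> <a href="javascript:goPage(2)">2</a>'
    print(extract_links(sample))  # ['page1.html', 'script:goPage(2)']
```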


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Moisés Rockembach ◽  
Anabela Serrano

Purpose
The purpose of this investigation is to analyze information on the web and its preservation as digital heritage, taking as its object of study information about events related to climate change and the environment in Portugal and Brazil, thereby contributing an applied case of web preservation in the Ibero-American context.
Design/methodology/approach
This is a theoretical and applied investigation whose methodology uses mixed methods, collecting and analyzing quantitative and qualitative data from three sources: the Internet Archive and public collections of Archive-it, the Portuguese web archive and, as a complement, collections formed by the research group on web archiving and digital preservation in Brazil.
Findings
Web archiving initiatives started in 1996; over the years, however, collections have become more specialized, moving from nationally relevant themes to thematic niches. The theme of climate change shaped scientific and mainstream discussions in the 2000s, and in the 2010s it became a focus of the digital preservation of web content, as demonstrated in this study. Failing to preserve this data can lead to its rapid loss owing to the ephemerality of the web.
Originality/value
The originality of this paper lies in showing the relevance of preserving web content on climate change, demonstrating which information about climate change on the web is currently preserved and which information would still need to be preserved.
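
Checks of this kind, establishing how many captures of a given climate-related page an archive holds, can be automated against the Internet Archive's public CDX API. The sketch below is not the authors' tooling; the target URL and the simple capture count are only illustrative.

```python
# Minimal sketch: count Wayback Machine captures of a URL via the public CDX API
# (https://web.archive.org/cdx/search/cdx). Not the authors' method; the target
# URL passed in the demo is only an illustrative placeholder.
import json
import urllib.parse
import urllib.request

def count_captures(url: str) -> int:
    """Return how many snapshots the Internet Archive holds for `url`."""
    query = urllib.parse.urlencode({"url": url, "output": "json", "fl": "timestamp"})
    with urllib.request.urlopen(f"https://web.archive.org/cdx/search/cdx?{query}") as resp:
        data = resp.read().decode("utf-8").strip()
    rows = json.loads(data) if data else []
    # The first row is a header; every remaining row is one capture.
    return max(len(rows) - 1, 0)

if __name__ == "__main__":
    print(count_captures("example.com"))
```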


Leonardo ◽  
1999 ◽  
Vol 32 (5) ◽  
pp. 353-358 ◽  
Author(s):  
Noah Wardrip-Fruin

We look to media as memory, and a place to memorialize, when we have lost. Hypermedia pioneers such as Ted Nelson and Vannevar Bush envisioned the ultimate media within the ultimate archive—with each element in continual flux, and with constant new addition. Dynamism without loss. Instead we have the Web, where “Not Found” is a daily message. Projects such as the Internet Archive and Afterlife dream of fixing this uncomfortable impermanence. Marketeers promise that agents (indentured information servants that may be the humans of About.com or the software of “Ask Jeeves”) will make the Web comfortable through filtering—hiding the impermanence and overwhelming profluence that the Web's dynamism produces. The Impermanence Agent—a programmatic, esthetic, and critical project created by the author, Brion Moss, a.c. chapman, and Duane Whitehurst—operates differently. It begins as a storytelling agent, telling stories of impermanence, stories of preservation, memorial stories. It monitors each user's Web browsing, and starts customizing its storytelling by weaving in images and texts that the user has pulled from the Web. In time, the original stories are lost. New stories, collaboratively created, have taken their place.


2021 ◽  
pp. 99-110
Author(s):  
Mohammad Ali Tofigh ◽  
Zhendong Mu

With the development of society, people pay increasing attention to food safety, and relevant laws and policies are gradually being introduced and improved. The research and development of agricultural product quality and safety systems has become a research hotspot, and obtaining the Web information of such systems effectively and quickly is the focus of that research, so intelligent extraction of Web information for the agricultural product quality and safety system is essential. The purpose of this paper is to solve the problem of efficiently extracting the Web information of the agricultural product quality and safety system. By studying the Web information extraction methods of various systems, the paper analyzes in detail how efficient, intelligent extraction of this information can be realized. It examines the template-based information extraction algorithms currently in use and systematically discusses a scheme that automatically extracts the Web information of the agricultural product quality and safety system according to templates. The research results show that the proposed scheme is a dynamically extensible information extraction system that can configure templates dynamically for different requirements without changing the code. Compared with the general approach, the Web information extraction speed for the agricultural product quality and safety system is increased by 25%, the accuracy by 12% and the recall rate by 30%.
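
The abstract gives no implementation details, but the template idea, keeping extraction rules as configuration so they can be changed without touching code, can be sketched roughly as follows. The field names and patterns are invented for illustration and are not the paper's actual templates.

```python
# Illustrative sketch of template-driven extraction: the template is plain data
# (a dict mapping field names to regular expressions with one capture group),
# so adapting the extractor to a new page layout means editing configuration,
# not code. Field names and patterns are invented examples.
import re

PRODUCT_TEMPLATE = {
    "product_name": r'<h1 class="name">(.*?)</h1>',
    "origin":       r'<td class="origin">(.*?)</td>',
    "inspection":   r'<td class="inspection">(.*?)</td>',
}

def extract(html, template):
    """Apply each field's pattern to the page and keep the first match (or None)."""
    record = {}
    for field, pattern in template.items():
        match = re.search(pattern, html, re.DOTALL)
        record[field] = match.group(1).strip() if match else None
    return record

if __name__ == "__main__":
    page = '<h1 class="name">Organic rice</h1><td class="origin">Heilongjiang</td>'
    print(extract(page, PRODUCT_TEMPLATE))
    # {'product_name': 'Organic rice', 'origin': 'Heilongjiang', 'inspection': None}
```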


2004 ◽  
pp. 268-304 ◽  
Author(s):  
Grigorios Tsoumakas ◽  
Nick Bassiliades ◽  
Ioannis Vlahavas

This chapter presents the design and development of WebDisC, a knowledge-based web information system for the fusion of classifiers induced at geographically distributed databases. The main features of our system are: (i) a declarative rule language for classifier selection that allows the combination of syntactically heterogeneous distributed classifiers; (ii) a variety of standard methods for fusing the output of distributed classifiers; (iii) a new approach for clustering classifiers in order to deal with the semantic heterogeneity of distributed classifiers, detect their interesting similarities and differences, and enhance their fusion; and (iv) an architecture based on the Web services paradigm that utilizes the open and scalable standards of XML and SOAP.
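
Typical examples of such standard fusion methods are majority voting over predicted labels and averaging of class-probability vectors. The sketch below assumes every distributed site reports over the same ordered list of classes, which sets aside the semantic heterogeneity that WebDisC's clustering approach is designed to handle.

```python
# Minimal sketch of two standard fusion rules for distributed classifiers:
# majority voting over predicted labels and averaging of class-probability
# vectors. Assumes all sites report over the same ordered class list.
from collections import Counter

def majority_vote(labels: list[str]) -> str:
    """Pick the label most sites agree on (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]

def average_probabilities(prob_vectors: list[list[float]]) -> list[float]:
    """Average the per-class probabilities reported by each site."""
    n = len(prob_vectors)
    return [sum(col) / n for col in zip(*prob_vectors)]

if __name__ == "__main__":
    print(majority_vote(["spam", "ham", "spam"]))           # spam
    print(average_probabilities([[0.9, 0.1], [0.6, 0.4]]))  # [0.75, 0.25]
```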


2004 ◽  
pp. 227-267
Author(s):  
Wee Keong Ng ◽  
Zehua Liu ◽  
Zhao Li ◽  
Ee Peng Lim

With the explosion of information on the Web, traditional ways of browsing and keyword searching of information over web pages no longer satisfy the demanding needs of web surfers. Web information extraction has emerged as an important research area that aims to automatically extract information from target web pages and convert it into a structured format for further processing. The main issues involved in the extraction process include: (1) the definition of a suitable extraction language; (2) the definition of a data model representing the web information source; (3) the generation of the data model, given a target source; and (4) the extraction and presentation of information according to a given data model. In this chapter, we discuss the challenges of these issues and the approaches that current research activities have taken to resolve these issues. We propose several classification schemes to classify existing approaches of information extraction from different perspectives. Among the existing works, we focus on the Wiccap system — a software system that enables ordinary end-users to obtain information of interest in a simple and efficient manner by constructing personalized web views of information sources.
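
As a rough illustration of issue (2), a data model for a web information source can be expressed as a nested structure of named fields together with the selectors used to populate them. The sketch below is not Wiccap's actual modelling language, only a toy approximation; the site, field names and XPath expressions are invented.

```python
# Toy illustration of a data model that describes what a web source contains,
# kept separate from how pages are fetched or rendered. Not Wiccap's actual
# model; the example "news site" view is entirely hypothetical.
from dataclasses import dataclass, field

@dataclass
class Field:
    name: str      # logical name exposed to the end user
    selector: str  # how to locate the value in the page (e.g., an XPath)

@dataclass
class Record:
    name: str
    fields: list[Field] = field(default_factory=list)
    children: list["Record"] = field(default_factory=list)  # nested records

# A personalized "web view" of a hypothetical news site: articles expose a
# title and a publication date, located by invented XPath expressions.
news_view = Record(
    name="NewsSite",
    children=[
        Record(
            name="Article",
            fields=[
                Field("title", "//h2[@class='headline']/text()"),
                Field("published", "//span[@class='date']/text()"),
            ],
        )
    ],
)
```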

