Inequalities in digital memory: ethical and geographical aspects of web archiving

2017 ◽  
Vol 26 ◽  
Author(s):  
Moisés Rockembach

This paper approaches web archiving as the preservation of digital memory and as a dynamic informational environment with complex problems of harvest, use, access and preservation. It uses a qualitative, exploratory-descriptive approach, identifying web archiving initiatives and promoting a reflection on how web information collections are defined, on geographical gaps in web archiving and on problems regarding the uses and rights of this information. Although initiatives such as the Internet Archive harvest large amounts of information from across the web, an imbalance of digital memory persists: many countries do not possess web archiving initiatives of their own, and coverage of information is therefore unequally produced.

2018 ◽  
Vol 52 (2) ◽  
pp. 266-277 ◽  
Author(s):  
Hyo-Jung Oh ◽  
Dong-Hyun Won ◽  
Chonghyuck Kim ◽  
Sung-Hee Park ◽  
Yong Kim

Purpose
The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web.
Design/methodology/approach
This study proposes and develops an algorithm that collects web information as if the crawler were gathering static webpages, by managing script commands as links. The proposed web crawler is tested experimentally by using the algorithm to collect deep webpages.
Findings
Among the findings of this study is that when the crawling process encounters search results provided as script pages, an ordinary crawl collects only the first page, whereas the proposed algorithm can also collect the deep webpages in this case.
Research limitations/implications
To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors.
Practical implications
Research estimates that the deep web holds 450 to 550 times more information than surface webpages, and its documents are difficult to collect. The proposed algorithm helps to enable deep web collection by running scripts.
Originality/value
This study presents a new method that uses script links instead of the keywords adopted in previous work, allowing a script to be handled like an ordinary URL. The conducted experiment shows that the scripts of individual websites must be analyzed before they can be employed as links.
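
The central idea of the algorithm, handling script commands as if they were static links, can be illustrated with a short sketch. The Python code below is not the authors' implementation (which relies on the Microsoft Visual Studio web browser object as a script launcher); the helper names, regular expressions and the stubbed script runner are assumptions made purely for illustration.

```python
# Minimal sketch of the "script command as link" idea: both ordinary hrefs and
# JavaScript calls found in onclick handlers or javascript: URLs are pushed onto
# one crawl frontier, so the crawler can treat paging scripts like static links.
# The render_script callable is a stub standing in for a browser object that
# actually executes the script and returns the resulting page.
import re
from collections import deque

HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)
ONCLICK_RE = re.compile(r'onclick="([^"]+)"', re.IGNORECASE)

def extract_links(html: str) -> list[str]:
    """Return static URLs and script commands as one unified list of 'links'."""
    links = [u for u in HREF_RE.findall(html) if not u.startswith("javascript:")]
    scripts = ["script:" + s for s in ONCLICK_RE.findall(html)]
    scripts += ["script:" + u[len("javascript:"):]
                for u in HREF_RE.findall(html) if u.startswith("javascript:")]
    return links + scripts

def crawl(seed_html: str, fetch, render_script, max_pages: int = 50) -> list[str]:
    """Breadth-first crawl that follows both ordinary links and script 'links'."""
    frontier, seen, pages = deque(extract_links(seed_html)), set(), [seed_html]
    while frontier and len(pages) < max_pages:
        link = frontier.popleft()
        if link in seen:
            continue
        seen.add(link)
        # Script commands are executed in a browser context; plain URLs are fetched.
        html = render_script(link[7:]) if link.startswith("script:") else fetch(link)
        if html:
            pages.append(html)
            frontier.extend(extract_links(html))
    return pages

if __name__ == "__main__":
    sample = '<a href="page1.html">1</a> <a href="javascript:goPage(2)">2</a>'
    print(extract_links(sample))  # ['page1.html', 'script:goPage(2)']
```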


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Moisés Rockembach ◽  
Anabela Serrano

Purpose
The purpose of this investigation is to analyze information on the web and its preservation as digital heritage, taking as its object of study information about events related to climate change and the environment in Portugal and Brazil, thereby contributing an applied case of web preservation in the Ibero-American context.
Design/methodology/approach
This is a theoretical and applied investigation whose methodology uses mixed methods, collecting and analyzing quantitative and qualitative data from three sources: the Internet Archive and public collections of Archive-it, the Portuguese web archive and, as a complement, collections formed by the research group on web archiving and digital preservation in Brazil.
Findings
Web archiving initiatives started in 1996; over the years, however, collections have become more specialized, moving from nationally relevant themes to thematic niches. The theme of climate change shaped scientific and mainstream discussions in the 2000s, and in the 2010s it became a focus of the digital preservation of web content, as demonstrated in this study. Failing to preserve this data can lead to its rapid loss owing to the ephemerality of the web.
Originality/value
The originality of this paper lies in showing the relevance of preserving web content on climate change, demonstrating which information about climate change on the web is currently preserved and which information would still need to be preserved.
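
Checks of this kind, establishing how many captures of a given climate-related page an archive holds, can be automated against the Internet Archive's public CDX API. The sketch below is not the authors' tooling; the target URL and the simple capture count are only illustrative.

```python
# Minimal sketch: count Wayback Machine captures of a URL via the public CDX API
# (https://web.archive.org/cdx/search/cdx). Not the authors' method; the target
# URL passed in the demo is only an illustrative placeholder.
import json
import urllib.parse
import urllib.request

def count_captures(url: str) -> int:
    """Return how many snapshots the Internet Archive holds for `url`."""
    query = urllib.parse.urlencode({"url": url, "output": "json", "fl": "timestamp"})
    with urllib.request.urlopen(f"https://web.archive.org/cdx/search/cdx?{query}") as resp:
        data = resp.read().decode("utf-8").strip()
    rows = json.loads(data) if data else []
    # The first row is a header; every remaining row is one capture.
    return max(len(rows) - 1, 0)

if __name__ == "__main__":
    print(count_captures("example.com"))
```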


Leonardo ◽  
1999 ◽  
Vol 32 (5) ◽  
pp. 353-358 ◽  
Author(s):  
Noah Wardrip-Fruin

We look to media as memory, and a place to memorialize, when we have lost. Hypermedia pioneers such as Ted Nelson and Vannevar Bush envisioned the ultimate media within the ultimate archive—with each element in continual flux, and with constant new addition. Dynamism without loss. Instead we have the Web, where “Not Found” is a daily message. Projects such as the Internet Archive and Afterlife dream of fixing this uncomfortable impermanence. Marketeers promise that agents (indentured information servants that may be the humans of About.com or the software of “Ask Jeeves”) will make the Web comfortable through filtering—hiding the impermanence and overwhelming profluence that the Web's dynamism produces. The Impermanence Agent—a programmatic, esthetic, and critical project created by the author, Brion Moss, a.c. chapman, and Duane Whitehurst—operates differently. It begins as a storytelling agent, telling stories of impermanence, stories of preservation, memorial stories. It monitors each user's Web browsing, and starts customizing its storytelling by weaving in images and texts that the user has pulled from the Web. In time, the original stories are lost. New stories, collaboratively created, have taken their place.


2021 ◽  
pp. 99-110
Author(s):  
Mohammad Ali Tofigh ◽  
Zhendong Mu

With the development of society, people pay increasing attention to food safety, and relevant laws and policies are gradually being introduced and improved. The research and development of agricultural product quality and safety systems has become a research hotspot, and obtaining the Web information of such systems effectively and quickly is the focus of that research, so intelligent extraction of Web information for the agricultural product quality and safety system is essential. The purpose of this paper is to solve the problem of efficiently extracting the Web information of the agricultural product quality and safety system. By studying the Web information extraction methods of various systems, the paper analyzes in detail how efficient, intelligent extraction of this information can be realized. It examines the template-based information extraction algorithms currently in use and systematically discusses a scheme that automatically extracts the Web information of the agricultural product quality and safety system according to templates. The research results show that the proposed scheme is a dynamically extensible information extraction system that can configure templates dynamically for different requirements without changing the code. Compared with the general approach, the Web information extraction speed for the agricultural product quality and safety system is increased by 25%, the accuracy by 12% and the recall rate by 30%.
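
The abstract gives no implementation details, but the template idea, keeping extraction rules as configuration so they can be changed without touching code, can be sketched roughly as follows. The field names and patterns are invented for illustration and are not the paper's actual templates.

```python
# Illustrative sketch of template-driven extraction: the template is plain data
# (a dict mapping field names to regular expressions with one capture group),
# so adapting the extractor to a new page layout means editing configuration,
# not code. Field names and patterns are invented examples.
import re

PRODUCT_TEMPLATE = {
    "product_name": r'<h1 class="name">(.*?)</h1>',
    "origin":       r'<td class="origin">(.*?)</td>',
    "inspection":   r'<td class="inspection">(.*?)</td>',
}

def extract(html, template):
    """Apply each field's pattern to the page and keep the first match (or None)."""
    record = {}
    for field, pattern in template.items():
        match = re.search(pattern, html, re.DOTALL)
        record[field] = match.group(1).strip() if match else None
    return record

if __name__ == "__main__":
    page = '<h1 class="name">Organic rice</h1><td class="origin">Heilongjiang</td>'
    print(extract(page, PRODUCT_TEMPLATE))
    # {'product_name': 'Organic rice', 'origin': 'Heilongjiang', 'inspection': None}
```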


2004 ◽  
pp. 268-304 ◽  
Author(s):  
Grigorios Tsoumakas ◽  
Nick Bassiliades ◽  
Ioannis Vlahavas

This chapter presents the design and development of WebDisC, a knowledge-based web information system for the fusion of classifiers induced at geographically distributed databases. The main features of our system are: (i) a declarative rule language for classifier selection that allows the combination of syntactically heterogeneous distributed classifiers; (ii) a variety of standard methods for fusing the output of distributed classifiers; (iii) a new approach for clustering classifiers in order to deal with the semantic heterogeneity of distributed classifiers, detect their interesting similarities and differences, and enhance their fusion; and (iv) an architecture based on the Web services paradigm that utilizes the open and scalable standards of XML and SOAP.
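
Typical examples of such standard fusion methods are majority voting over predicted labels and averaging of class-probability vectors. The sketch below assumes every distributed site reports over the same ordered list of classes, which sets aside the semantic heterogeneity that WebDisC's clustering approach is designed to handle.

```python
# Minimal sketch of two standard fusion rules for distributed classifiers:
# majority voting over predicted labels and averaging of class-probability
# vectors. Assumes all sites report over the same ordered class list.
from collections import Counter

def majority_vote(labels: list[str]) -> str:
    """Pick the label most sites agree on (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]

def average_probabilities(prob_vectors: list[list[float]]) -> list[float]:
    """Average the per-class probabilities reported by each site."""
    n = len(prob_vectors)
    return [sum(col) / n for col in zip(*prob_vectors)]

if __name__ == "__main__":
    print(majority_vote(["spam", "ham", "spam"]))           # spam
    print(average_probabilities([[0.9, 0.1], [0.6, 0.4]]))  # [0.75, 0.25]
```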


2004 ◽  
pp. 227-267
Author(s):  
Wee Keong Ng ◽  
Zehua Liu ◽  
Zhao Li ◽  
Ee Peng Lim

With the explosion of information on the Web, traditional ways of browsing and keyword searching of information over web pages no longer satisfy the demanding needs of web surfers. Web information extraction has emerged as an important research area that aims to automatically extract information from target web pages and convert it into a structured format for further processing. The main issues involved in the extraction process include: (1) the definition of a suitable extraction language; (2) the definition of a data model representing the web information source; (3) the generation of the data model, given a target source; and (4) the extraction and presentation of information according to a given data model. In this chapter, we discuss the challenges of these issues and the approaches that current research activities have taken to resolve these issues. We propose several classification schemes to classify existing approaches of information extraction from different perspectives. Among the existing works, we focus on the Wiccap system — a software system that enables ordinary end-users to obtain information of interest in a simple and efficient manner by constructing personalized web views of information sources.
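
As a rough illustration of issue (2), a data model for a web information source can be expressed as a nested structure of named fields together with the selectors used to populate them. The sketch below is not Wiccap's actual modelling language, only a toy approximation; the site, field names and XPath expressions are invented.

```python
# Toy illustration of a data model that describes what a web source contains,
# kept separate from how pages are fetched or rendered. Not Wiccap's actual
# model; the example "news site" view is entirely hypothetical.
from dataclasses import dataclass, field

@dataclass
class Field:
    name: str      # logical name exposed to the end user
    selector: str  # how to locate the value in the page (e.g., an XPath)

@dataclass
class Record:
    name: str
    fields: list[Field] = field(default_factory=list)
    children: list["Record"] = field(default_factory=list)  # nested records

# A personalized "web view" of a hypothetical news site: articles expose a
# title and a publication date, located by invented XPath expressions.
news_view = Record(
    name="NewsSite",
    children=[
        Record(
            name="Article",
            fields=[
                Field("title", "//h2[@class='headline']/text()"),
                Field("published", "//span[@class='date']/text()"),
            ],
        )
    ],
)
```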

