Climate change and web archives: an Ibero-American study based on the Portuguese and Brazilian contexts

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Moisés Rockembach ◽  
Anabela Serrano

Purpose The purpose of this investigation is to analyze information on the web and its preservation as digital heritage, taking as its object of study information about events related to climate change and the environment in Portugal and Brazil, thus contributing an applied case of web preservation in the Ibero-American context. Design/methodology/approach It is a theoretical and applied investigation, and the methodology uses mixed methods, collecting and analyzing quantitative and qualitative data from three data sources: the Internet Archive and public collections of Archive-it, the Portuguese web archive, and a complement of collections formed by the research group on web archiving and digital preservation in Brazil. Findings Web archiving initiatives started in 1996; over the years, however, the collections have been specializing, from nationally relevant themes to thematic niches. The theme “climate change” had an impact on scientific and mainstream discussions in the 2000s, and in the 2010s the theme became a focus of digital preservation of web content, as demonstrated in this study. Failing to preserve data can lead to a rapid loss of this information owing to the ephemerality of the web. Originality/value The originality of this paper is to show the relevance of preserving web content on climate change, and to demonstrate what information on climate change on the web is currently preserved and what information would need to be preserved.

2016 ◽  
Vol 35 (3) ◽  
pp. 64-72 ◽  
Author(s):  
Liladhar R. Pendse

Purpose The purpose of this paper is to highlight web archiving as a tool for possible collection development in a research-level academic library. The paper highlights a web-archiving project that dealt with the contemporary Ukraine conflict. Currently, as the conflict in Ukraine drags on, the need to collect and preserve information from various web-based resources with different ideological orientations acquires special importance. The demise of the Soviet Union in 1991 and the emergence of independent republics were heralded by some as a peaceful transition to “free-market” style economies. This transition was nevertheless nuanced and not seamless. Besides incomplete market liberalization and rent-seeking behaviors of different sorts, it was also accompanied by almost ubiquitous use of and access to the internet and internet communication technologies. Now, 24 years later, the ongoing conflict in Ukraine also appears to be unfolding on the World Wide Web. With the Russian annexation of Crimea and its unification with the Russian Federation, the governmental and non-governmental websites of the Ukrainian Crimea suddenly came to represent a sort of “endangered archive”. Design/methodology/approach The main purpose of this project was to make the information contained in Ukrainian and Russian websites available to a wider body of scholars and students over a longer period of time in a web archive. The author does not take any ideological stance on the legal status of Crimea or on the ongoing conflict in Ukraine. There are currently several projects devoted to the preservation of these websites. This article also surveys the landscape of these projects and highlights the ongoing web-archiving project entitled “the Ukraine Crisis: 2014-2015” at the UC Berkeley Library.
Findings The UC Berkeley’s Ukraine Conflict Archive was made available to the public in March 2015 after enough materials were archived. The initial purpose of the archive was to selectively harvest and archive those websites that are bound to either disappear or change significantly during the evolution of Crimea’s accession to Russia. However, in the aftermath of the Crimean conflict, the ensuing military conflict in Ukraine forced a reevaluation of the web-archiving strategy. The project was never envisioned to be a competing project to the Ukraine Conflict project. Instead, it was supposed to capture complementary data that could have been missed by other similar projects. This web archive has been made public to provide a glimpse of what was happening and what is happening in Ukraine. Research limitations/implications The impetus for archiving the selected Ukrainian websites came as a result of the changing geopolitical realities of Crimea. The daily changes to the websites and the loss of the information contained within them are among the many problems faced by the users of these websites. In some cases, the likelihood of these websites disappearing is relatively high. This in turn was followed by the author’s desire to preserve information about daily life in Ukraine’s east in light of the unfolding violent armed conflict. Originality/value A close survey of currently published Library and Information Sciences articles on the Ukraine conflict found no articles dedicated to archiving the Crimean and Ukrainian situations.


Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1255
Author(s):  
Hyun Cheon Hwang ◽  
Jin Gon Shon ◽  
Ji Su Park

Web archive systems are a long-standing means of preserving web content for the future, and their importance is increasing with the explosive growth of web content. The reference model for an open archival information system (OAIS) has provided guidance for long-term archiving systems, and most organizations that archive web content follow this guidance. In addition, the web archive (WARC) ISO standard covers web content archiving. However, there is no way to secure content integrity, and it is hard to identify the original. Because of these limitations, a web archive system is vulnerable to disputes over content integrity. In this paper, we propose the blockchain-linked (BCLinked) web archiving system, which uses blockchain technology and an extended WARC field to store web content integrity metadata in a blockchain. Furthermore, we designed the BCLinked web archiving system and confirmed through an experiment that the proposed system secures content integrity.
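
As a rough illustration of the general idea of anchoring content-integrity metadata in a hash-linked ledger, the following minimal Python sketch stores a SHA-256 digest of each archived payload in a block that also carries the hash of the previous block. It is not the BCLinked design: the `Block` fields, `record_id`, and all function names are hypothetical, and an in-memory list stands in for a real blockchain and for the extended WARC field, whose name the abstract does not give.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


@dataclass(frozen=True)
class Block:
    index: int
    record_id: str       # hypothetical identifier of the archived record
    payload_digest: str  # SHA-256 of the archived content
    prev_hash: str       # hash of the previous block in the chain

    def block_hash(self) -> str:
        # Hash a canonical JSON serialization of the block's fields.
        body = json.dumps(
            {
                "index": self.index,
                "record_id": self.record_id,
                "payload_digest": self.payload_digest,
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain: list[Block], record_id: str, payload: bytes) -> Block:
    """Append a block holding the digest of an archived payload."""
    prev_hash = chain[-1].block_hash() if chain else GENESIS_HASH
    digest = hashlib.sha256(payload).hexdigest()
    block = Block(len(chain), record_id, digest, prev_hash)
    chain.append(block)
    return block


def chain_is_consistent(chain: list[Block]) -> bool:
    """Check that every block points at the hash of its predecessor."""
    expected_prev = GENESIS_HASH
    for i, block in enumerate(chain):
        if block.index != i or block.prev_hash != expected_prev:
            return False
        expected_prev = block.block_hash()
    return True


def payload_matches(block: Block, payload: bytes) -> bool:
    """Check archived content against the digest stored in its block."""
    return hashlib.sha256(payload).hexdigest() == block.payload_digest
```

In this sketch, altering an archived payload makes `payload_matches` fail, and rewriting an earlier block breaks the `prev_hash` link checked by `chain_is_consistent`.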


2018 ◽  
Vol 52 (2) ◽  
pp. 266-277 ◽  
Author(s):  
Hyo-Jung Oh ◽  
Dong-Hyun Won ◽  
Chonghyuck Kim ◽  
Sung-Hee Park ◽  
Yong Kim

Purpose The purpose of this paper is to describe the development of an algorithm for realizing web crawlers that automatically collect dynamically generated webpages from the deep web. Design/methodology/approach This study proposes and develops an algorithm to collect web information as if the web crawler gathers static webpages by managing script commands as links. The proposed web crawler actually experiments with the algorithm by collecting deep webpages. Findings Among the findings of this study is that if the actual crawling process provides search results as script pages, the outcome only collects the first page. However, the proposed algorithm can collect deep webpages in this case. Research limitations/implications To use a script as a link, a human must first analyze the web document. This study uses the web browser object provided by Microsoft Visual Studio as a script launcher, so it cannot collect deep webpages if the web browser object cannot launch the script, or if the web document contains script errors. Practical implications The research results show deep webs are estimated to have 450 to 550 times more information than surface webpages, and it is difficult to collect web documents. However, this algorithm helps to enable deep web collection through script runs. Originality/value This study presents a new method to be utilized with script links instead of adopting previous keywords. The proposed algorithm is available as an ordinary URL. From the conducted experiment, analysis of scripts on individual websites is needed to employ them as links.
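
To make the notion of "managing script commands as links" more concrete, here is a minimal Python sketch that scans an HTML page and queues script invocations (`javascript:` hrefs and `onclick` handlers) alongside ordinary links. It is an assumption-laden illustration, not the authors' algorithm: the class and function names are hypothetical, it uses only the standard library's `html.parser`, and it does not execute any script, whereas the study relies on a web browser object as the script launcher.

```python
from html.parser import HTMLParser


class ScriptLinkCollector(HTMLParser):
    """Collect ordinary links and script commands that act like links."""

    def __init__(self):
        super().__init__()
        self.static_links = []  # plain URLs found in <a href="...">
        self.script_links = []  # script commands to hand to a script launcher

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        onclick = attr_map.get("onclick")
        if href.lower().startswith("javascript:"):
            # Treat the script command itself as a crawlable "link".
            self.script_links.append(href[len("javascript:"):].strip())
        elif tag == "a" and href:
            self.static_links.append(href)
        if onclick:
            self.script_links.append(onclick.strip())


def collect_links(html):
    """Return (static_links, script_links) found in an HTML string."""
    parser = ScriptLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.static_links, parser.script_links
```

A crawler built along these lines would still need a component that runs each queued command and captures the resulting page, which is the step the abstract identifies as failing when the script cannot be launched or contains errors.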


First Monday ◽  
2020 ◽  
Author(s):  
Mikhail Fiadotau

After December 2020, Adobe Flash, a technology that was once the standard for rich and interactive Web content, will no longer be supported in browsers. This means users will not be able to access the thousands of diverse creations powered by Flash, from animations to digital games. This is particularly problematic for games, which cannot be easily converted into a more modern format. The threat of losing the legacy of Flash has provoked both reflection and action by online communities dedicated to animation and browser-based games, none more so than Newgrounds, the Web portal credited with popularizing Flash games at the turn of the century. As a result, groups of enthusiasts have been able to make significant progress in preserving Flash games. Still, they continue to face numerous challenges, from rigid copyright laws to a relative lack of recognition of the importance of preserving Flash games as such. Consolidating their efforts by joining forces with other like-minded groups may be the key not only to saving digital heritage created with Flash, but also to the longer-term survival of these creator communities themselves.


Author(s):  
Eveline Maria Florentina Vlassenroot ◽  
Sally Chambers ◽  
Friedel Geeraert ◽  
Peter Mechant

The web and online information have become of utmost importance. However, the short lifespan of online data (with 40% of content being removed after 1 year) poses serious challenges for preserving and safeguarding digital heritage and information. Hence, web or media historians, sociologists or digital scholars must learn to "dig" in online sources such as the Internet Archive or national web archives in order to find relevant research material. In this paper, we explore the requirements of researchers working with web archives and outline how they perceive the limitations and possibilities of using the archived web as a data resource, using survey data (n=154). We asked researchers with and without experience in working with web archives about, amongst other things, the search functionalities and the selection and access criteria they require. Given that archived web content is relatively new research material, new skills need to be acquired to work with this content, which is not self-evident or something every researcher is willing to do. Yakel & Torres (2003) point to three distinct forms of knowledge required to work effectively with these sources: (i) domain (subject) knowledge, (ii) artifactual literacy, and their own concept of (iii) archival intelligence. In addition to arriving at significant findings that demonstrate the relationship between researchers’ domain (subject) knowledge, archival intelligence and use frequency of web archives, this study discusses the limitations of using the archived web as a data resource and concludes with actions to overcome these hurdles and fulfill the desiderata of scholars.


Author(s):  
Jessica Ogden

Web archives - broadly conceived as any attempt to capture and preserve the Web for future use - are evermore central to discussions of digital access in the public sphere, as they provide tools for accessing parts of the Web that have been subject to neglect, removal or state and platform-based forms of content moderation and censorship. In this paper I discuss the cultural significance of web archiving through the example of Tumblr’s 2018 efforts to remove so-called ‘Not Safe for Work’ (NSFW) posts from the platform. The paper examines the archiving of Tumblr NSFW by Archive Team, a self-described ‘loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage’. Findings are presented through the concept of culture which provides a dual lens through which to understand web archiving practices as contingent upon the cultural worlds which they create and operate within. Here, web archiving as culture reveals the ways that practices shape (and are shaped by) online community membership, the nature of how and why the Web is archived and the reflexive significance participants place on their own web archival activities. The paper contributes to broader discussions of online community formation and raises further questions about the ethics and role of power in the production of web archives, as well as their positioning as historical representations of online cultures.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ranjit Singh ◽  
Abid Ismail ◽  
Sibi PS ◽  
Dipendra Singh

Purpose The purpose of this paper is to analyse the US states and territories’ official tourism information websites based on the Web Content Accessibility Guidelines (WCAG) and Section 508 guidelines to identify the websites’ compliance with disability policies and their behaviour pattern. Design/methodology/approach The official tourism websites of 57 states and territories were analysed through the TAW tool for WCAG 2.0 and AChecker for Section 508. Cluster analysis was used to group websites by the accessibility issues reported by the online tools, in order to understand the common pattern of behaviour. Findings The results revealed that the websites have serious and significant accessibility issues under the prescribed guidelines that would interfere with the use of the websites by disabled people. The main issues that make the websites least accessible relate to the following guidelines of WCAG 2.0: compatible, navigable, text alternatives, distinguishable and adaptable. Research limitations/implications The empirical results help the US states and territories’ tourism authorities to better understand web accessibility in their websites and its impact on disabled people. Originality/value As the web plays an important role in individual lives, this study highlights the accessibility issues which need immediately focussed and technically planned actions from the respective states and territories to ensure that the designed web content communicates effectively and universally.


2017 ◽  
Vol 26 ◽  
Author(s):  
Moisés Rockembach

This paper approaches web archiving as preservation of digital memory and as a dynamic informational environment with complex problems of harvest, use, access and preservation. It uses a qualitative and exploratory-descriptive approach, identifying web archiving initiatives and promoting a reflection on the ways of defining web information collection, geographical gaps in web archiving and problems regarding uses and rights of this information. Whereas initiatives such as Internet Archive harvest a lot of information from across the web, an imbalance of digital memory exists where many countries do not possess their own web archiving initiatives, and therefore, coverage of information is unequally produced.


Archeion ◽  
2020 ◽  
pp. 445-465
Author(s):  
Bartłomiej Konopa

WEB archiving in Europe – National WEB Archives
Web archiving, that is, activities aimed at collecting and preserving Web resources, has been carried out for almost 25 years. During this time, many projects have been created to fulfill that task, as well as several organizations, such as the International Internet Preservation Consortium, that support its implementation. The article presents the development of activities in this area, and then presents the conclusions of an analysis of the functioning of selected European national Web archives, based on publicly available materials concerning them. This analysis was intended to examine how the Web is currently archived in this part of the world. Three main issues were considered: gathering, describing and providing access to the resources of the former WWW. The first of them covers the scope of archiving, namely determining what materials are subject to it, as well as the gathering strategies used for this purpose, which shape the archival collections. The second concerns the metadata and other elements used to convey information about what was collected during that process. The last element of the analysis includes the scope of access to archival WWW resources, existing restrictions and their causes, as well as the tools used for this. During the research, the author also became interested in the software used in individual projects. The obtained results show that a model of the Web archive has been developed and that the activities of the analyzed initiatives in Europe are very similar.

