HARVESTING, INTEGRATING AND DISTRIBUTING LARGE OPEN GEOSPATIAL DATASETS USING FREE AND OPEN-SOURCE SOFTWARE

Federal, State and Local government agencies in the USA are investing heavily in the dissemination of the Open Data sets they produce. The main drivers behind this effort are to increase agencies’ transparency and accountability and to improve citizens’ awareness. However, not all Open Data sets are easy to access or to integrate with other Open Data sets, even those available from the same agency. The City and County of Denver Open Data Portal distributes several types of geospatial datasets; one of them is the city parcels layer, which contains 224,256 records. Although this data layer contains many pieces of information, it is incomplete for some custom purposes. Open-Source Software was used first to collect data from diverse City of Denver Open Data sets and then to upload them to a repository in the Cloud, where they were processed using a Cloud-hosted PostgreSQL installation and Python scripts. Our method was able to extract non-spatial information from a ‘not-ready-to-download’ source that could then be combined with the initial data set to enhance its potential use.
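The final step described above, combining harvested non-spatial attributes with the parcel layer, amounts to a keyed join. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL installation so that it runs without a server, and the table names, column names (parcel_id, address, extra_attribute) and sample rows are hypothetical, not taken from the Denver datasets.

```python
import sqlite3


def enrich_parcels(parcels, harvested):
    """Left-join harvested attributes onto parcel records by parcel_id.

    parcels:   iterable of (parcel_id, address) tuples   (hypothetical schema)
    harvested: iterable of (parcel_id, extra_attribute)  (hypothetical schema)
    """
    conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL database
    try:
        conn.executescript(
            """
            CREATE TABLE parcels (parcel_id TEXT PRIMARY KEY, address TEXT);
            CREATE TABLE harvested (parcel_id TEXT PRIMARY KEY, extra_attribute TEXT);
            """
        )
        conn.executemany("INSERT INTO parcels VALUES (?, ?)", parcels)
        conn.executemany("INSERT INTO harvested VALUES (?, ?)", harvested)
        # LEFT JOIN keeps every parcel, even when nothing was harvested for it.
        return conn.execute(
            """
            SELECT p.parcel_id, p.address, h.extra_attribute
            FROM parcels AS p
            LEFT JOIN harvested AS h ON h.parcel_id = p.parcel_id
            ORDER BY p.parcel_id
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder rows, for illustration only.
    demo_parcels = [("P1", "100 Example St"), ("P2", "200 Example Ave")]
    demo_harvested = [("P1", "attribute harvested for P1")]
    for row in enrich_parcels(demo_parcels, demo_harvested):
        print(row)
```

A left join is used here so that parcels with no harvested record remain in the result with an empty attribute, rather than being dropped.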

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xli-b7-939-2016 ◽

2016 ◽

Vol XLI-B7 ◽

pp. 939-940 ◽

Cited By ~ 1

Author(s):

Ricardo Oliveira ◽

Rafael Moreno

Keyword(s):

Open Source ◽

Open Source Software ◽

Spatial Information ◽

Open Data ◽

Federal State ◽

Data Sets ◽

Data Set ◽

Geospatial Datasets ◽

State And Local ◽

The City

Using Open Source, Open Data, and Civic Technology to Address the COVID-19 Pandemic and Infodemic

Yearbook of Medical Informatics ◽

10.1055/s-0041-1726488 ◽

2021 ◽

Author(s):

Shinji Kobayashi ◽

Luis Falcón ◽

Hamish Fraser ◽

Jørn Braa ◽

Pamod Amarakoon ◽

...

Keyword(s):

Open Source ◽

Medical Informatics ◽

Open Source Software ◽

Collective Intelligence ◽

Collaborative Work ◽

Open Data ◽

Theme Issue ◽

Health Organizations ◽

The World ◽

Civic Technology

Objectives: The emerging COVID-19 pandemic has caused one of the world’s worst health disasters compounded by social confusion with misinformation, the so-called “Infodemic”. In this paper, we discuss how open technology approaches - including data sharing, visualization, and tooling - can address the COVID-19 pandemic and infodemic. Methods: In response to the call for participation in the 2020 International Medical Informatics Association (IMIA) Yearbook theme issue on Medical Informatics and the Pandemic, the IMIA Open Source Working Group surveyed recent works related to the use of Free/Libre/Open Source Software (FLOSS) for this pandemic. Results: FLOSS health care projects including GNU Health, OpenMRS, DHIS2, and others, have responded from the early phase of this pandemic. Data related to COVID-19 have been published from health organizations all over the world. Civic Technology, and the collaborative work of FLOSS and open data groups were considered to support collective intelligence on approaches to managing the pandemic. Conclusion: FLOSS and open data have been effectively used to contribute to managing the COVID-19 pandemic, and open approaches to collaboration can improve trust in data.

Geospatial Open Data Usage and Metadata Quality

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010030 ◽

2021 ◽

Vol 10 (1) ◽

pp. 30

Author(s):

Alfonso Quarati ◽

Monica De Martino ◽

Sergio Rosim

Keyword(s):

Spatial Information ◽

Open Data ◽

Poor Quality ◽

Data Usage ◽

Open Government Data ◽

Metadata Quality ◽

Geospatial Datasets ◽

Government Data ◽

Determining Factor

Open Government Data (OGD) portals host thousands of geo-referenced datasets containing spatial information, which makes them of great interest for any analysis or process relating to the territory. For this potential to be realized, users must be able to access these datasets and reuse them. One element often considered a hindrance to the full dissemination of OGD data is the quality of their metadata. Starting from an experimental investigation conducted on over 160,000 geospatial datasets belonging to six national and international OGD portals, the first objective of this work is to provide an overview of the usage of these portals, measured in terms of dataset views and downloads. Furthermore, to assess the possible influence of metadata quality on the use of geospatial datasets, the metadata of each dataset were assessed and the correlation between these two variables was measured. The results showed a significant underutilization of geospatial datasets and a generally poor quality of their metadata. In addition, only a weak correlation was found between use and metadata quality, too weak to assert with certainty that the latter is a determining factor of the former.
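The usage-versus-quality comparison described above can be illustrated with a rank correlation. The abstract does not say which correlation coefficient was used, so Spearman's rho is an assumption here, and the quality scores and view counts below are made-up toy values, not results from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values for illustration only (hypothetical, not from the paper).
    metadata_quality = [0.2, 0.5, 0.9, 0.4, 0.7]
    dataset_views = [10, 3, 40, 40, 5]
    print(f"Spearman rho = {spearman(metadata_quality, dataset_views):.3f}")
```

A rank-based coefficient is a common choice for view and download counts because such counts tend to be heavily skewed; a value near zero would be consistent with the weak relationship the abstract reports.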

An Analysis of the Dynamics of the Legitimation Processes of Innovations in Open Source Software:

10.32920/ryerson.14660454.v1 ◽

2021 ◽

Author(s):

Soran Nouri

Keyword(s):

Open Source ◽

Open Source Software ◽

Action Theory ◽

Communicative Action ◽

Data Set ◽

Rational Persuasion ◽

Validity Claims ◽

Action Type ◽

Influence Tactic ◽

Bug Fixes

Within the Open Source Software (OSS) literature, there is a lack of studies addressing the legitimation processes of innovations that are born in OSS. This study sets out to analyze the legitimation processes of innovations within the deliberations of the Drupal project. The data set comprises 52 rational deliberation cases discussing innovations that were proposed by members of the community. Habermas’s Ideal Speech Situations (ISS) is used as the framework through which Drupal’s rational deliberations are viewed; in the 52 cases examined in this thesis, there were no violations of the ISS guidelines. The Communicative Action Theory, Influence Tactics theory and the theory of Validity Claims are the aspects of the framework used to code and analyze the conversations. These aspects allow for an effective conceptualization of the dynamics of the Drupal deliberations. This thesis finds that the legitimation processes of innovations in open source software are influenced by the type and complexity of the innovations and by their implications for the rest of the community. In addition, bug fixes, complex innovations and innovations that have implications for the rest of the software result in a long legitimation process (in terms of number of comments). The study also provides empirical support that, in open deliberations that aim at achieving mutual understanding towards a common goal, the communicative action type and the rational persuasion influence tactic are the most common methods for innovators to interact with the community.

Characterising RDF data sets

Journal of Information Science ◽

10.1177/0165551516677945 ◽

2017 ◽

Vol 44 (2) ◽

pp. 203-229 ◽

Cited By ~ 6

Author(s):

Javier D Fernández ◽

Miguel A Martínez-Prieto ◽

Pablo de la Fuente Redondo ◽

Claudio Gutiérrez

Keyword(s):

Data Structures ◽

Large Scale ◽

Open Data ◽

Structural Features ◽

Data Sets ◽

Data Set ◽

Wide Range ◽

Rdf Data ◽

Description Framework ◽

Resource Description

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.

Sharing Open Data in Agriculture

Advances in Library and Information Science - Open Access Implications for Sustainable Social, Political, and Economic Development ◽

10.4018/978-1-7998-5018-2.ch013 ◽

2021 ◽

pp. 244-266

Author(s):

Liah Shonhe

Keyword(s):

Agricultural Sector ◽

Open Data ◽

Research Data ◽

Data Sets ◽

Research Activity ◽

African Countries ◽

Data Set ◽

Data Repositories ◽

Bibliographic Data ◽

Prolific Authors

The main focus of the study was to explore the practices of open data sharing in the agricultural sector, including establishing the research outputs concerning open data in agriculture. The study adopted a desktop research methodology based on a literature review and bibliographic data from the WoS database. Bibliometric indicators discussed include yearly productivity, most prolific authors, and enhanced countries. Study findings revealed that research activity in the field of agriculture and open access is very low. There were 36 OA articles, and only 6 publications had an open data badge. Most researchers do not yet embrace the need to openly publish their data sets despite the availability of numerous open data repositories. Unfortunately, most African countries are still lagging behind in the management of agricultural open data. The study therefore recommends that researchers publish their research data sets as OA. African countries need to put more effort into establishing open data repositories and implementing the necessary policies to facilitate OA.

Open-Source Data Collection and Data Sets for Activity Recognition in Smart Homes

Sensors ◽

10.3390/s20030879 ◽

2020 ◽

Vol 20 (3) ◽

pp. 879 ◽

Cited By ~ 2

Author(s):

Uwe Köckemann ◽

Marjan Alirezaie ◽

Jennifer Renoux ◽

Nicolas Tsiftes ◽

Mobyen Uddin Ahmed ◽

...

Keyword(s):

Data Collection ◽

Activity Recognition ◽

Care Home ◽

Open Data ◽

Ground Truth ◽

Smart Homes ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Home Setting

As research in smart homes and activity recognition increases, it is ever more important to have benchmark systems and data against which researchers can compare methods. While synthetic data can be useful for developing certain methods, real data sets that are open and shared are equally important. This paper presents the E-care@home system, its installation in a real home setting, and a series of data sets that were collected using the E-care@home system. Our first contribution, the E-care@home system, is a collection of software modules for data collection, labeling, and various reasoning tasks such as activity recognition, person counting, and configuration planning. It supports a heterogeneous set of sensors that can be extended easily and connects collected sensor data to higher-level Artificial Intelligence (AI) reasoning modules. Our second contribution is a series of open data sets which can be used to recognize activities of daily living. In addition to these data sets, we describe the technical infrastructure that we have developed to collect the data and the physical environment. Each data set is annotated with ground-truth information, making it relevant for researchers interested in benchmarking different algorithms for activity recognition.

Mapping changes in the affordability of London with open-source software and open data: 1997–2012

Regional Studies Regional Science ◽

10.1080/21681376.2014.985702 ◽

2014 ◽

Vol 1 (1) ◽

pp. 336-338 ◽

Cited By ~ 2

Author(s):

Jonathan Reades

Keyword(s):

Open Source ◽

Open Source Software ◽

Open Data

Has open data arrived at the British Medical Journal (BMJ)? An observational study

BMJ Open ◽

10.1136/bmjopen-2016-011784 ◽

2016 ◽

Vol 6 (10) ◽

pp. e011784 ◽

Cited By ~ 27

Author(s):

Anisa Rowhani-Farid ◽

Adrian G Barnett

Keyword(s):

Medical Journal ◽

British Medical Journal ◽

Observational Study ◽

Data Sharing ◽

Meta Analysis ◽

Open Data ◽

Research Articles ◽

Data Sets ◽

Data Set ◽

Article 50

Objective: To quantify data sharing trends and data sharing policy compliance at the British Medical Journal (BMJ) by analysing the rate of data sharing practices, and investigate attitudes and examine barriers towards data sharing. Design: Observational study. Setting: The BMJ research archive. Participants: 160 randomly sampled BMJ research articles from 2009 to 2015, excluding meta-analysis and systematic reviews. Main outcome measures: Percentages of research articles that indicated the availability of their raw data sets in their data sharing statements, and those that easily made their data sets available on request. Results: 3 articles contained the data in the article. 50 out of 157 (32%) remaining articles indicated the availability of their data sets. 12 used publicly available data and the remaining 38 were sent email requests to access their data sets. Only 1 publicly available data set could be accessed and only 6 out of 38 shared their data via email. So only 7/157 research articles shared their data sets, 4.5% (95% CI 1.8% to 9%). For 21 clinical trials bound by the BMJ data sharing policy, the per cent shared was 24% (8% to 47%). Conclusions: Despite the BMJ's strong data sharing policy, sharing rates are low. Possible explanations for low data sharing rates could be: the wording of the BMJ data sharing policy, which leaves room for individual interpretation and possible loopholes; that our email requests ended up in researchers' spam folders; and that researchers are not rewarded for sharing their data. It might be time for a more effective data sharing policy and better incentives for health and medical researchers to share their data.
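The headline proportion above (7 of 157 articles) and its confidence interval can be recomputed with a short script. The abstract does not state which interval method was used; the sketch below assumes an exact (Clopper-Pearson) binomial interval, solved by bisection with the standard library only, so its output can be compared against the reported 1.8% to 9%.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion k/n."""

    def solve(cdf_at, target):
        # cdf_at(p) decreases as p grows; bisect for the p where it equals target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: the p at which P(X >= k) = alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper limit: the p at which P(X <= k) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    shared, total = 7, 157  # counts taken from the abstract
    low, high = clopper_pearson(shared, total)
    print(f"{shared}/{total} = {shared / total:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The exact interval is a reasonable assumption for a small count such as 7, where a normal-approximation interval would be less reliable near zero.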

Data Fusion Using a Multi-Sensor Sparse-Based Clustering Algorithm

Remote Sensing ◽

10.3390/rs12234007 ◽

2020 ◽

Vol 12 (23) ◽

pp. 4007

Author(s):

Kasra Rafiezadeh Shahi ◽

Pedram Ghamisi ◽

Behnood Rasti ◽

Robert Jackisch ◽

Paul Scheunders ◽

...

Keyword(s):

Clustering Algorithm ◽

Spatial Information ◽

Clustering Algorithms ◽

Hyperspectral Data ◽

Sensor Data ◽

Data Sets ◽

Data Types ◽

Data Set ◽

Multiple Data Sets ◽

Imaging Sensors

The increasing amount of information acquired by imaging sensors in Earth Sciences results in the availability of a multitude of complementary data (e.g., spectral, spatial, elevation) for monitoring of the Earth’s surface. Many studies were devoted to investigating the usage of multi-sensor data sets in the performance of supervised learning-based approaches at various tasks (i.e., classification and regression) while unsupervised learning-based approaches have received less attention. In this paper, we propose a new approach to fuse multiple data sets from imaging sensors using a multi-sensor sparse-based clustering algorithm (Multi-SSC). A technique for the extraction of spatial features (i.e., morphological profiles (MPs) and invariant attribute profiles (IAPs)) is applied to high spatial-resolution data to derive the spatial and contextual information. This information is then fused with spectrally rich data such as multi- or hyperspectral data. In order to fuse multi-sensor data sets a hierarchical sparse subspace clustering approach is employed. More specifically, a lasso-based binary algorithm is used to fuse the spectral and spatial information prior to automatic clustering. The proposed framework ensures that the generated clustering map is smooth and preserves the spatial structures of the scene. In order to evaluate the generalization capability of the proposed approach, we investigate its performance not only on diverse scenes but also on different sensors and data types. The first two data sets are geological data sets, which consist of hyperspectral and RGB data. The third data set is the well-known benchmark Trento data set, including hyperspectral and LiDAR data. Experimental results indicate that this novel multi-sensor clustering algorithm can provide an accurate clustering map compared to the state-of-the-art sparse subspace-based clustering algorithms.
