Leveraging Road Characteristics and Contributor Behaviour for Assessing Road Type Quality in OSM

2021, Vol 10 (7), pp. 436
Author(s): Amerah Alghanim, Musfira Jilani, Michela Bertolotto, Gavin McArdle

Volunteered Geographic Information (VGI) is often collected by non-expert users. This raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures, which compare VGI to authoritative data sources such as those of National Mapping Agencies, are common, but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures, which compare the data to heuristics or models built from the VGI data itself, are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality, as they can infer and predict the properties of spatial data. In this article, we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Using our proposed novel approach, we obtained an average classification accuracy of 84.12%, outperforming existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is also important. To address this issue, we have developed a new trustworthiness measure based on direct and indirect characteristics of OSM data, such as its edit history, along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy within the machine learning model shows that trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, our results demonstrate that the classification accuracy of our model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and to suggest improvements to the dataset.
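
As an illustration of the kind of intrinsic classification described above, the following minimal sketch trains a classifier to predict the road type of an OSM way from intrinsic features. The feature names and the input file are hypothetical stand-ins, and the choice of a random forest is ours for illustration, not necessarily the authors' model:

```python
# Minimal sketch: inferring the OSM road type from intrinsic features.
# The columns below (length, node count, junction degree, edit count) are
# hypothetical stand-ins for the intrinsic features described in the abstract.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

ways = pd.read_csv("osm_ways.csv")           # assumed: one row per OSM way
X = ways[["length_m", "node_count", "junction_degree", "edit_count"]]
y = ways["road_type"]                        # e.g. motorway, residential, ...

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)    # 5-fold cross-validated accuracy
print(f"mean accuracy: {scores.mean():.4f}")
```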

2020, Vol 10 (2), pp. 1-26
Author(s): Naghmeh Moradpoor Sheykhkanloo, Adam Hall

An insider threat can take on many forms and fall under different categories, including the malicious insider, the careless/unaware/uneducated/naïve employee, and the third-party contractor. Machine learning techniques have been studied in the published literature as a promising solution for such threats. However, they can be biased and/or inaccurate when the associated dataset is heavily imbalanced. Therefore, this article addresses insider threat detection on an extremely imbalanced dataset, employing a popular balancing technique known as spread subsample. The results show that although balancing the dataset using this technique did not improve the performance metrics, it did reduce the time taken to build the model and the time taken to test it. Additionally, the authors found that running the chosen classifiers with parameters other than the defaults has an impact in both the balanced and imbalanced scenarios, but the impact is significantly stronger on the imbalanced dataset.
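
Spread subsampling caps the ratio between the most common and the rarest class by randomly discarding majority-class samples. A minimal Python sketch of the same idea, using imbalanced-learn's RandomUnderSampler on synthetic data (the article's own tooling and dataset are not reproduced here):

```python
# Illustrative sketch (not the authors' exact setup): undersample the majority
# class so the class spread is capped, as spread subsampling does.
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Synthetic stand-in for an extremely imbalanced insider-threat dataset.
X, y = make_classification(n_samples=10000, weights=[0.995, 0.005],
                           random_state=0)
print("before:", Counter(y))

# sampling_strategy=0.5 keeps two majority samples per minority sample,
# i.e. a maximum 2:1 spread between the classes.
rus = RandomUnderSampler(sampling_strategy=0.5, random_state=0)
X_bal, y_bal = rus.fit_resample(X, y)
print("after:", Counter(y_bal))
```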


Missing data raise major issues for quantitative analysis in large databases: inference over the computational process can produce biased results, more data are damaged, the error rate increases, and the imputation process becomes harder to accomplish. The prediction of disguised missing data occurring in large data sets is another major problem in real-time operation. Machine learning (ML) techniques connect classification with measurement to improve the accuracy of predicted values and thereby overcome the various challenges posed by lost data. Recent work is based on predicting misclassification using a supervised ML approach, i.e., predicting an output for an unseen input with a limited set of parameters in a data set; when the number of parameters increases, the accuracy of the outcome drops. This article presents COBACO, an effective supervised machine learning technique, and describes several strategies for classifying predictive techniques for missing data analysis within efficient supervised machine learning. The proposed predictive technique, COBACO, generated more precise and accurate results than the other predictive approaches. The experimental results, obtained using both real and synthetic data sets, show that the proposed approach offers valuable and promising insight into the problem of predicting missing information.
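
COBACO itself is the authors' own method and is not reproduced here; the sketch below only illustrates the general idea of supervised missing-value prediction on a hypothetical table: the column with missing entries becomes the prediction target, and the complete columns become features.

```python
# Generic sketch of supervised missing-value prediction (not COBACO itself).
# Assumed: a numeric table in which only the "income" column has gaps.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("records.csv")
known = df[df["income"].notna()]
missing = df[df["income"].isna()]

features = [c for c in df.columns if c != "income"]
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(known[features], known["income"])

# Fill the gaps with model predictions instead of a constant or column mean.
df.loc[df["income"].isna(), "income"] = model.predict(missing[features])
```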


2019, pp. 469-487
Author(s): Musfira Jilani, Michela Bertolotto, Padraig Corcoran, Amerah Alghanim

Nowadays, an ever-increasing number of applications require complete and up-to-date spatial data, in particular maps. However, mapping is an expensive process, and the vastness and dynamics of our world usually render centralized and authoritative maps outdated and incomplete. In this context, crowd-sourced maps have the potential to provide a complete, up-to-date, and free representation of our world. However, the proliferation of such maps largely remains limited due to concerns about their data quality. While most current data quality assessment mechanisms for such maps require referencing to authoritative maps, we argue that such referencing of a crowd-sourced spatial database is ineffective. Instead, we focus on the use of machine learning techniques that, we believe, have the potential not only to allow the assessment but also to recommend improvements to the quality of crowd-sourced maps without referencing external databases. This chapter gives an overview of these approaches.


2020
Author(s): Cecilia Contreras, Mahdi Khodadadzadeh, Laura Tusa, Richard Gloaguen

Drilling is a key task in exploration campaigns to characterize mineral deposits at depth. Drill-cores are first logged in the field by a geologist with regard to, e.g., mineral assemblages, alteration patterns, and structural features. The core-logging information is then used to locate and target the important ore accumulations and to select representative samples that are further analyzed by laboratory measurements (e.g., Scanning Electron Microscopy (SEM), X-ray diffraction (XRD), X-ray fluorescence (XRF)). However, core-logging is a laborious task and subject to the expertise of the geologist.

Hyperspectral imaging is a non-invasive and non-destructive technique that is increasingly being used to support the geologist in the analysis of drill-core samples. Nonetheless, the benefit and impact of using hyperspectral data depend on the applied methods. With this in mind, machine learning techniques, which have been applied in different research fields, provide useful tools for an advanced and more automatic analysis of the data. Lately, machine learning frameworks are also being implemented for mapping minerals in drill-core hyperspectral data.

In this context, this work follows an approach to map minerals in drill-core hyperspectral data using supervised machine learning techniques, in which SEM data, integrated with the mineral liberation analysis (MLA) software, are used to train a classifier. More specifically, the high-resolution mineralogical data obtained by SEM-MLA analysis are resampled and co-registered to the hyperspectral data to generate a training set. Due to the large difference in spatial resolution between the SEM-MLA and hyperspectral images, a pre-labeling strategy is required to link these two images at the spatial resolution of the hyperspectral data. In this study, we use the SEM-MLA image to compute the abundances of minerals for each hyperspectral pixel in the corresponding SEM-MLA region. We then use the abundances as features in a clustering procedure to generate the training labels. In the final step, the generated training set is fed into a supervised classification technique for mineral mapping over a large area of a drill-core. The experiments are carried out on a visible to near-infrared (VNIR) and shortwave infrared (SWIR) hyperspectral data set, and preliminary tests show that the mineral mapping task improves significantly.
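
A rough sketch of the pre-labeling pipeline described above, under simplifying assumptions we introduce for illustration (mineral IDs 0..n_minerals-1, perfect co-registration, and an integer resolution factor k between the two images); the clustering and classifier choices are placeholders, not the authors' exact configuration:

```python
# Sketch of pre-labeling: aggregate the fine SEM-MLA mineral map into per-pixel
# abundance vectors on the hyperspectral grid, cluster the abundances to obtain
# training labels, then train a classifier on the spectra.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def mineral_abundances(mla, k, n_minerals):
    """Fraction of each mineral ID inside every k x k block of the MLA map.
    Assumed: mineral IDs are integers in 0..n_minerals-1."""
    h, w = mla.shape[0] // k, mla.shape[1] // k
    abund = np.zeros((h * w, n_minerals))
    for i in range(h):
        for j in range(w):
            block = mla[i*k:(i+1)*k, j*k:(j+1)*k]
            counts = np.bincount(block.ravel(), minlength=n_minerals)
            abund[i*w + j] = counts / counts.sum()
    return abund

mla = np.load("sem_mla_ids.npy")           # assumed: integer mineral-ID image
cube = np.load("hyperspectral_cube.npy")   # assumed: (h, w, bands) on coarse grid
abund = mineral_abundances(mla, k=50, n_minerals=10)

labels = KMeans(n_clusters=8, random_state=0).fit_predict(abund)  # pre-labels
spectra = cube.reshape(-1, cube.shape[-1])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(spectra, labels)                   # mineral mapping model
```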


Data science in healthcare is an innovative and promising area for industry to implement data-science applications. Data analytics is a recent science used to explore medical data sets and discover disease. It is a first attempt to identify disease with the help of large amounts of medical data, and this methodology enables users to assess their condition without the help of healthcare centres. Healthcare and data science are often linked through finances, as the industry attempts to reduce its expenses with the help of large amounts of data. Data science and medicine are developing rapidly, and it is important that they advance together. Healthcare information is very valuable to society. Heart disease in everyday life has increased; based on this, different factors in the human body are monitored to analyse and prevent heart disease. Classifying these factors using machine learning algorithms and predicting the disease is the major task. A major part of this work involves supervised machine learning algorithms such as SVM, Naïve Bayes, Decision Trees, and Random Forest.
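
A minimal sketch comparing the four supervised algorithms named above on a heart-disease table; the file name and the binary "target" column are assumptions for illustration:

```python
# Compare SVM, Naive Bayes, Decision Tree, and Random Forest with 5-fold CV.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("heart.csv")                    # assumed input table
X, y = df.drop(columns="target"), df["target"]   # target: 1 = heart disease

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),  # SVM needs scaled inputs
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```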


Author(s): Marko Pregeljc, Erik Štrumbelj, Miran Mihelcic, Igor Kononenko

In this chapter, the authors employed traditional and novel machine learning techniques to improve insight into the connections between the quality of the organization of enterprises, as a type of formal social unit, and the results of the enterprises' performance. The analyzed data set contains 72 Slovenian enterprises' economic results across four years, together with indicators of their organizational quality. The authors hypothesize that a causal relationship exists between the latter and the former. In the first part of a two-part process, they use several classification algorithms to study these relationships and to evaluate how accurately they predict the target economic results. However, the most successful models were often very complex and difficult to interpret, especially for non-technical users. Therefore, in the second part, the authors take advantage of a novel general explanation method that can be used to explain the influence of individual features on the model's prediction. Results show that traditional machine-learning approaches are successful at modeling the dependency relationship. Furthermore, the explanation of the influence of the input features on the predicted economic results provides insights that have a meaningful economic interpretation.
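
The chapter's explanation method is the authors' own and is not reproduced here; as a loose illustration of the underlying idea of per-instance feature contributions, the sketch below estimates each feature's contribution as the change in predicted probability when that feature is replaced by values drawn from the training data:

```python
# Crude per-instance feature-contribution sketch (not the authors' algorithm):
# contribution of feature j = base probability minus the mean probability after
# replacing feature j with random values from the training set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

def contributions(model, X_train, x, n_samples=200,
                  rng=np.random.default_rng(0)):
    base = model.predict_proba([x])[0, 1]
    contrib = np.zeros(len(x))
    for j in range(len(x)):
        perturbed = np.tile(x, (n_samples, 1))
        perturbed[:, j] = rng.choice(X_train[:, j], size=n_samples)
        contrib[j] = base - model.predict_proba(perturbed)[:, 1].mean()
    return contrib  # positive: feature pushes the prediction toward class 1

print(contributions(model, X, X[0]))
```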


2020, Vol 9 (9), pp. 504
Author(s): Quy Truong, Guillaume Touya, Cyril Runz

Though Volunteered Geographic Information (VGI) has the advantage of providing free, open spatial data, it is prone to vandalism, which may heavily decrease the quality of these data. Therefore, detecting vandalism in VGI may constitute a first way of assessing the data in order to improve their quality. This article explores the ability of supervised machine learning approaches to detect vandalism in OpenStreetMap (OSM) in an automated way. For this purpose, our work includes the construction of a corpus of vandalism data, given that no OSM vandalism corpus has been available so far. We then investigate the ability of random forest methods to detect vandalism on the created corpus. Experimental results show that random forest classifiers perform well in detecting vandalism in the same geographical regions that were used for training the model but have more issues with vandalism detection in “unfamiliar regions”.
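
A minimal sketch of the cross-region evaluation idea: train a random forest on edits from one region and test it on another, "unfamiliar" one. The file and feature names are hypothetical, not the article's actual feature set:

```python
# Train on region A, evaluate on region B to probe cross-region generalization.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

edits = pd.read_csv("osm_edit_features.csv")   # assumed: one row per OSM edit
features = ["n_changes", "account_age_days", "tag_deletions",
            "geom_displacement"]               # hypothetical edit features

train = edits[edits["region"] == "region_A"]
test = edits[edits["region"] == "region_B"]    # the "unfamiliar" region

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(train[features], train["is_vandalism"])
print(classification_report(test["is_vandalism"],
                            clf.predict(test[features])))
```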

