Towards Real-Time Geodemographics: Clustering Algorithm Performance for Large Multidimensional Spatial Databases

2010 ◽  
Vol 14 (3) ◽  
pp. 283-297 ◽  
Author(s):  
Muhammad Adnan ◽  
Paul A Longley ◽  
Alex D Singleton ◽  
Chris Brunsdon

2019 ◽  
Vol 11 (3) ◽  
pp. 327 ◽  
Author(s):  
Xia Wang ◽  
Feng Ling ◽  
Huaiying Yao ◽  
Yaolin Liu ◽  
Shuna Xu

Mapping land surface water bodies from satellite images is superior to conventional in situ measurements. With its mission of long-term, high-frequency water quality monitoring, the Ocean and Land Colour Instrument (OLCI) onboard Sentinel-3A and Sentinel-3B provides the best available approach for near real-time land surface water body mapping. Sentinel-3 OLCI offers 21 bands ranging from the visible to the near-infrared, but its spatial resolution is limited to 300 m, which produces many mixed pixels around water body boundaries. Sub-pixel mapping (SPM) provides a good solution to the mixed pixel problem in water body mapping. In this paper, an unsupervised sub-pixel water body mapping (USWBM) method is proposed specifically for Sentinel-3 OLCI imagery, aiming to produce a water body map at a finer spatial resolution (e.g., 30 m) from the multispectral image. Instead of using water/non-water fraction maps, or multispectral images combined with endmembers of the water/non-water classes, as input, USWBM directly uses spectral water index images of the Normalized Difference Water Index (NDWI) extracted from the Sentinel-3 OLCI image and produces a water body map at the target finer spatial resolution. Without requiring any collection of endmembers, USWBM achieves a fully unsupervised process by building a multi-scale spatial dependence model on top of an unsupervised sub-pixel Fuzzy C-means (FCM) clustering algorithm. In validations on both Tibetan Plateau lakes and Poyang Lake, USWBM produced more accurate water body maps than the other pixel- and sub-pixel-based water body mapping methods. The proposed USWBM therefore has great potential to support near real-time sub-pixel water body mapping with Sentinel-3 OLCI imagery.
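The core of the pipeline described above — an NDWI water index computed from green and near-infrared reflectance, clustered into water/non-water by fuzzy C-means — can be sketched as follows. This is a minimal illustration on synthetic reflectances; the band values, parameters, and plain-numpy FCM are assumptions, not the paper's implementation, which additionally exploits multi-scale spatial dependence at the sub-pixel level.

```python
import numpy as np

def ndwi(green, nir, eps=1e-9):
    """Normalized Difference Water Index: (G - NIR) / (G + NIR)."""
    return (green - nir) / (green + nir + eps)

def fcm(x, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy C-means on a 1-D array; returns memberships and centers."""
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(c), size=x.size)       # random fuzzy memberships
    for _ in range(iters):
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)          # fuzzily weighted centers
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        # Standard FCM membership update: u_ik = d_ik^{-p} / sum_j d_ij^{-p}
        u = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)),
                                               axis=1, keepdims=True))
    return u, centers

# Synthetic scene: water pixels have high NDWI, land pixels low.
green = np.array([0.08, 0.09, 0.30, 0.31, 0.29, 0.10])
nir   = np.array([0.30, 0.28, 0.05, 0.04, 0.06, 0.27])
idx = ndwi(green, nir)
u, centers = fcm(idx)
water_cluster = int(np.argmax(centers))              # water = higher-NDWI cluster
labels = np.argmax(u, axis=1) == water_cluster
```

Because the clustering runs directly on the water index, no endmember collection is needed, which is the sense in which the method is unsupervised.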


2012 ◽  
Vol 1 (3) ◽  
pp. 49-61 ◽  
Author(s):  
Michael Auer

Parallel processing methods in Geographic Information Systems (GIS) are traditionally used to accelerate the calculation of large data volumes with sophisticated spatial algorithms. Such acceleration can also be applied to real-time GIS applications to improve the responsiveness of user interactions with the data. This paper presents a method that enables this approach for Web GIS applications. It uses the JavaScript 3D graphics API (WebGL) to perform client-side, parallel, real-time computation of 2D or 2.5D spatial raster algorithms on the graphics card. The potential of this approach is evaluated using an example implementation of a hillshade algorithm. Performance comparisons of parallel and sequential computations reveal acceleration factors between 25 and 100, depending mainly on whether the environment is mobile or desktop.
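A hillshade is a per-cell raster kernel, which is what makes it a natural fit for per-fragment WebGL parallelism. A sequential reference version, using Horn's slope/aspect method with the conventional azimuth 315° and altitude 45° (parameters assumed, not taken from the paper), might look like:

```python
import numpy as np

def hillshade(dem, cellsize=1.0, azimuth=315.0, altitude=45.0):
    """Hillshade via Horn's slope/aspect method -- the per-pixel kernel that
    WebGL would evaluate in parallel, one fragment per raster cell."""
    az = np.radians(360.0 - azimuth + 90.0)          # compass -> math convention
    alt = np.radians(altitude)
    # Central-difference gradients (interior cells only, for brevity).
    dzdx = (dem[1:-1, 2:] - dem[1:-1, :-2]) / (2 * cellsize)
    dzdy = (dem[2:, 1:-1] - dem[:-2, 1:-1]) / (2 * cellsize)
    slope = np.arctan(np.hypot(dzdx, dzdy))
    aspect = np.arctan2(dzdy, -dzdx)
    shade = (np.sin(alt) * np.cos(slope)
             + np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(shade, 0.0, 1.0)

dem = np.outer(np.arange(5.0), np.ones(5))           # a simple tilted plane
hs = hillshade(dem)
```

In the WebGL version, the loop over cells disappears: the same arithmetic runs once per fragment on the GPU, which is where the reported 25–100x speedups come from.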


2013 ◽  
Vol 765-767 ◽  
pp. 670-673
Author(s):  
Li Bo Hou

The fuzzy C-means (FCM) clustering algorithm is one of the most widely applied algorithms in unsupervised pattern recognition. However, the FCM iterative process requires extensive computation, especially when feature vectors are high-dimensional; clustering directly in such spaces is not only inefficient but may also suffer from the "curse of dimensionality." This paper analyzes the behavior of FCM in high-dimensional feature spaces, where finding cluster centers is an NP-hard problem. To improve the effectiveness and real-time performance of FCM in high-dimensional feature analysis, an improved algorithm, FCM-LI, is proposed that incorporates the landmark Isomap (L-ISOMAP) algorithm. The samples are first analyzed preliminarily; then, using the clustering results and the correlations in the sample data, L-ISOMAP reduces the dimensionality, and the final results are obtained by further analysis of the reduced data. Experimental results demonstrate the effectiveness and real-time performance of the FCM-LI algorithm in high-dimensional feature analysis.
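The FCM-LI idea — embed high-dimensional samples into a low-dimensional space via a landmark method, then cluster there — can be sketched with landmark MDS, the computational core of L-ISOMAP, with Euclidean distances standing in for geodesic ones for brevity (a simplification: real L-ISOMAP first builds a neighbourhood graph and uses shortest-path distances).

```python
import numpy as np

def landmark_mds(X, n_landmarks=10, k=2, seed=0):
    """Landmark MDS: classical MDS on a landmark subset, then embed the
    remaining points by distance-based triangulation (the L-ISOMAP recipe,
    with Euclidean instead of geodesic distances for brevity)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), n_landmarks, replace=False)
    L = X[idx]
    D2 = np.square(np.linalg.norm(L[:, None] - L[None, :], axis=-1))
    J = np.eye(n_landmarks) - np.ones((n_landmarks, n_landmarks)) / n_landmarks
    B = -0.5 * J @ D2 @ J                              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]                 # top-k eigenpairs
    lam, V = vals[order], vecs[:, order]
    # Embed every point from its squared distances to the landmarks:
    # y = -1/2 * pinv(L)^T (delta_x - delta_mean)
    d2 = np.square(np.linalg.norm(X[:, None] - L[None, :], axis=-1))
    pinv = V / np.sqrt(lam)
    return -0.5 * (d2 - D2.mean(axis=0)) @ pinv

X = np.random.default_rng(1).normal(size=(100, 50))    # high-dimensional samples
Y = landmark_mds(X, n_landmarks=20, k=3)               # ready for FCM clustering
```

Only the landmark-landmark distance matrix is eigendecomposed, so the cost scales with the number of landmarks rather than the number of samples, which is what makes the combination attractive for real-time use.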


2021 ◽  
Vol 11 (22) ◽  
pp. 10596
Author(s):  
Chung-Hong Lee ◽  
Hsin-Chang Yang ◽  
Yenming J. Chen ◽  
Yung-Lin Chuang

Recently, the use of Twitter messages and algorithmic computation to detect real-time world events has become a new paradigm in data science applications. During a high-impact event, people want the latest information on its development so that they can better understand the situation and its likely trends when making decisions. In emergencies, however, governments and enterprises are often unable to notify people in time to provide early warning and avert risks. A sensible solution is to integrate real-time event monitoring and intelligence-gathering functions into their decision support systems. Such a system can provide real-time event summaries that are updated whenever important new events are detected. In this work, we therefore combine a Twitter-based real-time event detection algorithm with pre-trained language models to summarize emergent events. We used an online text-stream clustering algorithm and a self-adaptive method to gather Twitter data and detect emerging events. We then used the XSum dataset with a pre-trained language model, namely the T5 model, to train the summarization model, and ROUGE metrics to compare the summarization performance of the various models. The trained model was then used to summarize the incoming Twitter data for experimentation. In particular, we provide a real-world case study, the COVID-19 pandemic, to verify the applicability of the proposed method. Finally, we conducted a survey with human judges to assess the quality of the generated example summaries. The case study and experimental results demonstrate that our summarization method gives users a feasible way to quickly understand the updates on a specific event based on its real-time summary.
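The online text-stream clustering stage can be illustrated with a single-pass, threshold-based clusterer over term-frequency vectors. This is a sketch: the fixed cosine threshold below stands in for the paper's self-adaptive method, and the tokenization is deliberately naive.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class StreamClusterer:
    """Single-pass text-stream clustering: each incoming tweet joins the most
    similar cluster centroid, or starts a new cluster if no centroid exceeds
    the similarity threshold."""
    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.centroids = []                 # one Counter of term counts per cluster

    def add(self, text):
        tf = Counter(text.lower().split())
        sims = [cosine(tf, c) for c in self.centroids]
        if sims and max(sims) >= self.threshold:
            best = sims.index(max(sims))
            self.centroids[best] += tf      # merge tweet into cluster centroid
            return best
        self.centroids.append(tf)           # novel content: new (candidate) event
        return len(self.centroids) - 1

sc = StreamClusterer()
tweets = ["earthquake hits city center",
          "major earthquake shakes the city",
          "new vaccine rollout announced today"]
labels = [sc.add(t) for t in tweets]
```

A burst of tweets forming a fast-growing new cluster is the signal that an emerging event has been detected; its members would then be handed to the summarization model.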


2020 ◽  
Vol 10 (4) ◽  
pp. 1227 ◽  
Author(s):  
Xiaozheng Wang ◽  
Minglun Zhang ◽  
Hongyu Zhou ◽  
Xinglong Lin ◽  
Xiaomin Ren

In maritime communications, the ubiquitous Morse lamp on ships plays a significant role as one of the most common backups to radio and satellite links. Despite its simplicity and efficiency, the need for trained operators who are proficient in Morse code and can maintain a stable sending speed poses a key challenge to this traditional manual signaling method. To overcome these problems, an automatic system is needed as a partial substitute for human effort; however, few works have studied automatic recognition schemes for maritime manually sent optical Morse signals. To this end, this paper makes the first attempt to design and implement a robust, real-time, automatic recognition prototype for onboard Morse lamps. A modified k-means clustering algorithm from machine learning is proposed to optimize the decision threshold and identify the elements in Morse light signals. A systematic framework and a detailed recognition algorithm procedure are presented. The feasibility of the proposed system is verified via experimental tests using a light-emitting diode (LED) array, a self-designed receiver module, and a microcontroller unit (MCU). Experimental results indicate that real-time recognition accuracy of over 99% is achieved at a signal-to-noise ratio (SNR) greater than 5 dB, and that the system remains robust under low-SNR conditions.
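Clustering in one dimension is enough to separate dots from dashes: group the on-pulse durations into two clusters and place the decision threshold at the midpoint of the two centers. The sketch below is an illustrative 1-D two-means, not the paper's exact modification, and the durations are simulated.

```python
def two_means_threshold(durations, iters=50):
    """1-D k-means (k=2) on pulse durations: the decision threshold between
    Morse dots and dashes is the midpoint of the two cluster centers."""
    lo, hi = min(durations), max(durations)
    for _ in range(iters):
        mid = (lo + hi) / 2
        short = [d for d in durations if d <= mid]
        long_ = [d for d in durations if d > mid]
        if not short or not long_:
            break                            # degenerate split; keep last centers
        lo, hi = sum(short) / len(short), sum(long_) / len(long_)
    return (lo + hi) / 2

# Simulated on-pulse durations (ms): dots near 100 ms, dashes near 300 ms,
# with the unsteady timing of a manually keyed lamp.
pulses = [95, 102, 110, 290, 305, 98, 310, 105, 295]
thr = two_means_threshold(pulses)
symbols = ['.' if p <= thr else '-' for p in pulses]
```

Because the threshold adapts to the observed durations, the decoder tolerates operators whose sending speed drifts, which is exactly the weakness of a fixed-threshold scheme.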


2020 ◽  
Vol 13 (3) ◽  
pp. 261-282
Author(s):  
Mohammad Khalid Pandit ◽  
Roohie Naaz Mir ◽  
Mohammad Ahsan Chishti

Purpose – The intelligence in the Internet of Things (IoT) can be embedded by analyzing the huge volumes of data it generates in an ultralow-latency environment. The computational latency incurred by a cloud-only solution can be brought down significantly by the fog computing layer, which offers a computing infrastructure that minimizes latency in service delivery and execution. For this purpose, a task scheduling policy based on reinforcement learning (RL) is developed that can achieve optimal resource utilization and minimum task execution time while significantly reducing communication costs during distributed execution.

Design/methodology/approach – To realize this, the authors propose a two-level neural network (NN)-based task scheduling system, in which the first-level NN (a feed-forward neural network/convolutional neural network [FFNN/CNN]) determines whether a data stream can be analyzed (executed) in the resource-constrained environment (edge/fog) or should be forwarded directly to the cloud. The second-level NN (an RL module) schedules all the tasks sent to the fog layer by the level-1 NN among the available fog devices. This real-time task assignment policy is used to minimize the total computational latency (makespan) as well as communication costs.

Findings – Experimental results indicate that the RL technique outperforms the computationally infeasible greedy approach for task scheduling, and that combining RL with a task clustering algorithm reduces communication costs significantly.

Originality/value – The proposed algorithm fundamentally solves the problem of task scheduling in real-time fog-based IoT with best resource utilization, minimum makespan and minimum communication cost between the tasks.
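The two-level structure can be sketched as follows, with a simple size threshold standing in for the trained level-1 FFNN/CNN and an epsilon-greedy Q-learning bandit standing in for the level-2 RL module. All names, thresholds, and device speeds are illustrative assumptions, not values from the paper.

```python
import random

random.seed(0)

N_FOG = 3
CLOUD = "cloud"

def level1_gate(task_size, limit=5.0):
    """Stand-in for the first-level NN: route big tasks straight to the cloud,
    small ones to the fog layer (a threshold instead of a trained FFNN/CNN)."""
    return CLOUD if task_size > limit else "fog"

class RLScheduler:
    """Stand-in for the second-level RL module: epsilon-greedy Q-learning over
    fog devices, rewarded with negative execution latency."""
    def __init__(self, n_devices, eps=0.1, alpha=0.5):
        self.q = [0.0] * n_devices
        self.eps, self.alpha = eps, alpha

    def pick(self):
        if random.random() < self.eps:
            return random.randrange(len(self.q))                # explore
        return max(range(len(self.q)), key=lambda d: self.q[d]) # exploit

    def update(self, device, latency):
        # Move Q toward the observed reward (negative latency).
        self.q[device] += self.alpha * (-latency - self.q[device])

# Fog devices with different per-unit execution times; device 1 is fastest.
speeds = [2.0, 0.5, 1.5]
sched = RLScheduler(N_FOG)
for _ in range(200):
    size = random.uniform(0.5, 2.0)
    if level1_gate(size) == "fog":
        d = sched.pick()
        sched.update(d, latency=size * speeds[d])
best = max(range(N_FOG), key=lambda d: sched.q[d])
```

Over time the Q-values settle near the negated mean latency of each device, so the exploit step concentrates tasks on the fastest fog device while exploration keeps the estimates fresh.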


2020 ◽  
pp. 097215091988979 ◽  
Author(s):  
Dhirendra Prajapati ◽  
Arjun R Harish ◽  
Yash Daultani ◽  
Harpreet Singh ◽  
Saurabh Pratap

This study considers fresh food city logistics involving the last-mile distribution of commodities from the local distribution centres (LDCs) established by e-commerce firms to customer locations. In this scenario, last-mile logistics is crucial for its speed of response and its effectiveness in distributing packages to the target destinations. We propose a clustering-based routing heuristic (CRH) to manage vehicle routing for the last-mile logistics operations of fresh food in e-commerce. CRH is a clustering algorithm that repeatedly clusters demand nodes until the nodes within each cluster become serviceable by a single vehicle. The computational complexity of the algorithm is reduced because clustering downsizes the network, and hence it produces an optimum feasible solution in less computational time. The algorithm's performance was analysed across various operating scenarios, and satisfactory results were obtained.
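The repetitive-clustering core of CRH can be sketched as recursive bisection of demand nodes until every cluster's total demand fits one vehicle's capacity. The coordinates, demands, and capacity below are assumed for illustration, and the paper's exact clustering rule and the subsequent routing step are not reproduced.

```python
import numpy as np

def split_until_serviceable(points, demands, capacity, seed=0):
    """Repeatedly bisect demand-node clusters (one 2-means step at a time)
    until the total demand in each cluster fits a single vehicle, mirroring
    the repetitive-clustering idea of CRH."""
    rng = np.random.default_rng(seed)
    clusters, done = [np.arange(len(points))], []
    while clusters:
        idx = clusters.pop()
        if demands[idx].sum() <= capacity or len(idx) == 1:
            done.append(idx)                      # serviceable by one vehicle
            continue
        # 2-means on the cluster's coordinates.
        centers = points[rng.choice(idx, 2, replace=False)]
        for _ in range(20):
            d = np.linalg.norm(points[idx][:, None] - centers[None], axis=-1)
            lab = d.argmin(axis=1)
            for k in (0, 1):
                if (lab == k).any():
                    centers[k] = points[idx][lab == k].mean(axis=0)
        if lab.min() == lab.max():                # degenerate split: halve by order
            lab = np.arange(len(idx)) % 2
        clusters += [idx[lab == 0], idx[lab == 1]]
    return done

pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
dem = np.ones(len(pts)) * 2.0
routes = split_until_serviceable(pts, dem, capacity=7.0)
```

Each returned index set is a single-vehicle service area, so a routing step (e.g., a nearest-neighbour tour per cluster) only ever has to solve small subproblems.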


2020 ◽  
Vol 10 (19) ◽  
pp. 6702
Author(s):  
Eugenia Ana Capota ◽  
Cristina Sorina Stangaciu ◽  
Mihai Victor Micea ◽  
Daniel-Ioan Curiac

In mixed criticality systems (MCSs), the time-triggered scheduling approach focuses on a special case of safety-critical embedded applications which run in a time-triggered environment. Sometimes, for these types of MCSs, perfectly periodical (i.e., jitterless) scheduling of certain critical tasks is needed. In this paper, we propose FENP_MC (Fixed Execution Non-Preemptive Mixed Criticality), a real-time, table-driven, non-preemptive scheduling method specifically adapted to mixed criticality systems which guarantees jitterless execution in a mixed criticality time-triggered environment. We also provide a multiprocessor version, namely P_FENP_MC (Partitioned Fixed Execution Non-Preemptive Mixed Criticality), using a partitioning heuristic. Feasibility tests are proposed for both uniprocessor and homogeneous multiprocessor systems. The algorithm's performance is analysed in terms of success ratio and scheduling jitter by comparing it against a time-triggered and an event-driven method in a non-preemptive context.
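A table-driven, jitterless schedule fixes every job's start at offset + k·period and verifies that no two non-preemptive jobs overlap within the hyperperiod. The sketch below illustrates that idea only; it is not the FENP_MC algorithm itself, which additionally handles criticality levels, and the task set is invented.

```python
from math import gcd
from functools import reduce

def build_table(tasks):
    """Build a time-triggered dispatch table with perfectly periodic
    (jitterless) releases: each task runs at offset + k*period for its WCET,
    without preemption. Returns the table over one hyperperiod, or None if
    any two jobs overlap (infeasible)."""
    hyper = reduce(lambda a, b: a * b // gcd(a, b),
                   (t["period"] for t in tasks))
    jobs = []
    for t in tasks:
        for k in range(hyper // t["period"]):
            start = t["offset"] + k * t["period"]
            jobs.append((start, start + t["wcet"], t["name"]))
    jobs.sort()
    for (s1, e1, _), (s2, e2, _) in zip(jobs, jobs[1:]):
        if s2 < e1:                          # non-preemptive overlap
            return None
    return jobs

tasks = [
    {"name": "ctrl",  "period": 4, "wcet": 1, "offset": 0},
    {"name": "log",   "period": 8, "wcet": 2, "offset": 1},
    {"name": "telem", "period": 8, "wcet": 1, "offset": 5},
]
table = build_table(tasks)
```

Because every release time is fixed at table-construction time, run-time dispatching is a lookup, and the release jitter of each job is zero by construction.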


Author(s):  
Andrew Brown ◽  
Jonathan Rogers

Successful navigation of small, unmanned aerial vehicles (UAVs) in cluttered environments is a challenging task, especially in the presence of turbulent winds and state estimation uncertainty. This paper proposes a probabilistic path planner for UAVs operating in cluttered environments. Unlike previous sampling-based approaches which select robust paths from a set of trajectory candidates, the proposed algorithm seeks to modify an initial desired path so that it satisfies obstacle avoidance constraints. Given a desired path, Monte Carlo uncertainty propagation is performed and obstacle collision risks are quantified at discrete intervals along the trajectory. A numerical optimization algorithm is used to modify the flight path around obstacles and reduce probability of collision while maintaining as much of the originally desired path as possible. The proposed path planner is specifically designed to leverage embedded massively parallel computers for near real-time uncertainty propagation. Thus the planner can be run in real-time in a feedback manner, modifying the path appropriately as new measurements are obtained. Example results for a standard quadrotor show the ability of the path planning scheme to successfully generate trajectories in cluttered environments. Trade studies characterize algorithm performance as a function of obstacle density and collision risk acceptability.
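The Monte Carlo risk-quantification step can be sketched by perturbing waypoints with Gaussian position noise and counting samples that fall inside circular obstacles. The geometry, noise model, and parameters are illustrative; the paper propagates full vehicle dynamics and uses the resulting risks inside a numerical path optimizer.

```python
import numpy as np

def collision_probability(waypoints, sigma, obstacles, n_samples=2000, seed=0):
    """Monte Carlo collision risk along a path: perturb each waypoint with
    isotropic Gaussian position uncertainty and count the fraction of sampled
    trajectories that enter any circular obstacle."""
    rng = np.random.default_rng(seed)
    W = np.asarray(waypoints, float)                       # (n_wp, 2)
    noise = rng.normal(0.0, sigma, size=(n_samples,) + W.shape)
    samples = W[None] + noise                              # (n_samples, n_wp, 2)
    hit = np.zeros(n_samples, dtype=bool)
    for (cx, cy, r) in obstacles:
        d = np.linalg.norm(samples - np.array([cx, cy]), axis=-1)
        hit |= (d < r).any(axis=1)                         # any waypoint inside
    return hit.mean()

path = [(0, 0), (1, 0), (2, 0), (3, 0)]
p_clear = collision_probability(path, sigma=0.05, obstacles=[(1.5, 2.0, 0.3)])
p_risky = collision_probability(path, sigma=0.05, obstacles=[(1.5, 0.0, 0.6)])
```

Each waypoint's samples are independent draws, which is what makes the evaluation embarrassingly parallel and hence a good fit for the embedded massively parallel hardware the planner targets.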


Author(s):  
John Waller

Geographic outliers at GBIF (Global Biodiversity Information Facility) are a known problem. Outliers can be errors, coordinates with high uncertainty, or simply occurrences from an undersampled region. In data cleaning pipelines, outliers are often removed (even if they are legitimate points) because the researcher does not have time to verify each record one by one; outlier points are usually occurrences that need attention. Currently, there is no outlier detection implemented at GBIF, and it is up to the user to flag outliers themselves. DBSCAN (a density-based algorithm for discovering clusters in large spatial databases with noise) is a simple and popular clustering algorithm. It uses two parameters, (1) a distance and (2) a minimum number of points per cluster, to decide whether something is an outlier. Since occurrence data can be very patchy, non-clustering distance-based methods will often fail (Fig. 1). DBSCAN does not need to know the expected number of clusters in advance, does well using only distance, and does not require additional environmental variables such as Bioclim. Advantages of DBSCAN: it is simple and easy to understand; there are only two parameters to set; it scales well; no additional data sources are needed; and users would understand how their data was changed. Drawbacks: it only uses distance; parameter settings must be chosen; it is sensitive to sparse global sampling; it does not include any other relevant environmental information; and it can only flag outliers outside of a point blob. Outlier detection and error detection are different: if your goal is to produce a system with no false positives, it will fail.
While more complex environmentally informed outlier detection methods (like reverse jackknifing (Chapman 2005)) might perform better for certain examples or even in general, DBSCAN performs adequately on almost everything despite being very simple. Currently I am using DBSCAN to find errors and assess dataset quality. It is a Spark job written in Scala (github). It does not run on species with many (>30K) unique latitude-longitude points, since the current implementation relies on an in-memory distance matrix. However, around 99% of species (plants, animals, fungi) on GBIF have fewer than 30K unique lat-long points (only 2,283 of 222,993 species keys exceed this). There are other implementations (example) that might scale to many more points. There are no immediate plans to include DBSCAN outliers as a data quality flag on GBIF, but it could be done fairly easily, since this type of method does not rely on any external environmental data sources and already runs on the GBIF cluster.
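A minimal DBSCAN over occurrence coordinates, flagging noise points as outliers, looks like this. It is a pure-Python sketch using Euclidean distance (for real lat/long data a haversine metric would be the natural swap), not the Spark/Scala job described above, and the coordinates are invented.

```python
from math import dist

def dbscan_outliers(points, eps, min_pts):
    """Plain DBSCAN: grow clusters from core points (>= min_pts neighbours
    within eps, self included); anything density-unreachable is noise, i.e.
    a flagged outlier."""
    n = len(points)
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n                     # None = unvisited, -1 = noise
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1                  # provisionally noise
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # border point, reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j]) # core point: keep expanding
        cluster += 1
    return [i for i in range(n) if labels[i] == -1]

# A tight blob of occurrences plus one far-away record.
occ = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (25.0, 30.0)]
outliers = dbscan_outliers(occ, eps=0.5, min_pts=3)
```

The all-pairs neighbourhood computation is the in-memory distance matrix mentioned above, which is exactly why the current implementation struggles beyond ~30K unique points per species.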

