Efficient Shared Execution Processing of k-Nearest Neighbor Joins in Road Networks

2018, Vol. 2018, pp. 1-17
Author(s): Hyung-Ju Cho

We investigate the k-nearest neighbor (kNN) join in road networks, which determines the k nearest neighbors (NNs) in a dataset S for every object in another dataset R. The kNN join is a primitive operation widely used in many data mining applications. However, it is expensive because it combines the kNN query with the join operation, and most existing methods additionally assume the Euclidean distance metric. We instead consider kNN joins in road networks, where the distance between two points is the length of the shortest path connecting them. We propose a shared-execution approach called the group-nested loop (GNL) method that efficiently evaluates kNN joins in road networks by exploiting grouping and shared execution. The GNL method can be easily implemented using existing kNN query algorithms. Extensive experiments on several real-life roadmaps confirm the superior performance and effectiveness of the proposed method across a wide range of problem settings.
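As a rough illustration of the join operation itself, a naive nested-loop kNN join in Euclidean space can be sketched as below. This is only a baseline sketch: the paper's GNL method replaces the Euclidean metric with network shortest-path distance and shares kNN computations across grouped query objects, and the datasets here are hypothetical toy points.

```python
import heapq
from math import dist  # Euclidean here; the paper uses shortest-path distance

def knn_join(R, S, k):
    """Naive nested-loop kNN join: for every object r in R, return the
    indices of its k nearest neighbors in S."""
    return {i: heapq.nsmallest(k, range(len(S)), key=lambda j: dist(r, S[j]))
            for i, r in enumerate(R)}

R = [(0.0, 0.0), (0.1, 0.1)]               # query dataset
S = [(0.0, 0.2), (1.0, 1.0), (0.05, 0.0)]  # data dataset
print(knn_join(R, S, 2))  # {0: [2, 0], 1: [2, 0]}
```

The cost of this baseline is one full kNN query per object in R; GNL's grouping amortizes that work across nearby query objects.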

2020, Vol. 5 (1), pp. 33
Author(s): Rozzi Kesuma Dinata, Fajriana Fajriana, Zulfa Zulfa, Novia Hasdyna

In this study, the K-Nearest Neighbor algorithm was implemented to classify junior high schools (SMP) and equivalent institutions according to prospective students' preferences. The aim is to help users find a junior high school (or equivalent) based on eight school criteria: accreditation, room facilities, sports facilities, laboratories, extracurricular activities, cost, grade levels, and study schedule. The data used in this study were obtained from the Department of Education, Youth, and Sports of Bireuen Regency. Using K-NN with the Euclidean distance and k=3, the results show a precision of 63.67%, a recall of 68.95%, and an accuracy of 79.33%.
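A minimal sketch of the classification step described above: a majority vote among the k Euclidean-nearest training samples. The two-dimensional features and labels here are hypothetical stand-ins for the study's eight school criteria.

```python
from collections import Counter
from math import dist

def knn_classify(train, labels, query, k=3):
    """Majority vote among the k Euclidean-nearest training points."""
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy data: two classes of hypothetical 2-feature score vectors
train = [(1, 1), (1, 2), (5, 5), (6, 5)]
labels = ["A", "A", "B", "B"]
print(knn_classify(train, labels, (1.5, 1.5)))  # A
```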


Author(s): Mahinda Mailagaha Kumbure, Pasi Luukka

Abstract: The fuzzy k-nearest neighbor (FKNN) algorithm, one of the best-known and most effective supervised learning techniques, has often been used in data classification problems but rarely in regression settings. This paper introduces a new, more general fuzzy k-nearest neighbor regression model. The generalization is based on using the Minkowski distance instead of the usual Euclidean distance. The Euclidean distance is often not the optimal choice for practical problems, and better results can be obtained by generalizing it. Using the Minkowski distance allows the proposed method to obtain more reasonable nearest neighbors of the target sample. Another key advantage is that the nearest neighbors are weighted by fuzzy weights based on their similarity to the target sample, leading to more accurate prediction through a weighted average. The performance of the proposed method is tested on eight real-world datasets from different fields and benchmarked against the k-nearest neighbor method and three other state-of-the-art regression methods. The Manhattan distance- and Euclidean distance-based FKNNreg methods are also implemented, and the results are compared. The empirical results show that the proposed Minkowski distance-based fuzzy regression (Md-FKNNreg) method outperforms the benchmarks and can be a good algorithm for regression problems. In particular, the Md-FKNNreg model gave the lowest overall average root mean square error (0.0769) of all the regression methods used, a statistically significant improvement. As a special case of the Minkowski distance, the Manhattan distance yielded the optimal conditions for Md-FKNNreg and achieved the best performance on most of the datasets.
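A minimal sketch of the idea, assuming the common FKNN weighting w = 1/d^(2/(m-1)) with fuzzifier m (set to 2 here); the paper's exact membership formula may differ, and the data are toy values.

```python
from math import fsum

def minkowski(u, v, p):
    """Minkowski distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return fsum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def md_fknn_reg(X, y, query, k=3, p=1, m=2.0, eps=1e-12):
    """Fuzzy-weighted kNN regression: average the targets of the k
    Minkowski-nearest samples, weighted by 1 / d**(2/(m-1))."""
    idx = sorted(range(len(X)), key=lambda i: minkowski(X[i], query, p))[:k]
    w = [1.0 / (minkowski(X[i], query, p) ** (2.0 / (m - 1.0)) + eps) for i in idx]
    return fsum(wi * y[i] for wi, i in zip(w, idx)) / fsum(w)

X = [(0.0,), (1.0,), (2.0,), (10.0,)]
y = [0.0, 1.0, 2.0, 10.0]
print(round(md_fknn_reg(X, y, (1.1,), k=3, p=1), 3))  # 1.004
```

The prediction is pulled strongly toward the closest neighbor's target because the fuzzy weight grows as the distance shrinks.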


2018, Vol. 8 (8), pp. 1248
Author(s): Haiqing Yao, Xiuwen Fu, Yongsheng Yang, Octavian Postolache

Outlier detection has attracted wide attention for its broad applications, such as fault diagnosis and intrusion detection, among which outlier analysis in data streams, with their high uncertainty and unbounded length, is especially challenging. Recent major work on outlier detection has focused on the principles of the local outlier factor, and there are few studies on incremental updating strategies, which are vital for outlier detection in data streams. In this paper, a novel incremental local outlier detection approach is introduced to dynamically evaluate local outliers in a data stream. An extended local neighborhood consisting of the k nearest neighbors, reverse nearest neighbors, and shared nearest neighbors is estimated for each data point. A theoretical analysis of the algorithmic complexity of inserting new data and deleting old data in the composite neighborhood shows that the amount of data affected by an incremental update is finite. Finally, experiments performed on both synthetic and real datasets verify the approach's scalability and outlier detection accuracy. All results show that the proposed approach has performance comparable to state-of-the-art k nearest neighbor-based methods.
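For context, the batch local outlier factor (LOF) that such incremental schemes maintain can be sketched in a few lines. This is a plain O(n^2) batch implementation on toy points; the paper's contribution is updating these quantities incrementally over the composite neighborhood as data arrive and expire.

```python
from math import dist

def lof(points, k=2):
    """Batch local outlier factor: kNN sets, k-distances, reachability
    distances, local reachability densities, then the LOF score."""
    n = len(points)
    D = [[dist(p, q) for q in points] for p in points]
    knn = [sorted(range(n), key=lambda j: D[i][j])[1:k + 1] for i in range(n)]
    kdist = [D[i][knn[i][-1]] for i in range(n)]
    def reach(i, j):  # reachability distance of i from j
        return max(kdist[j], D[i][j])
    lrd = [k / sum(reach(i, j) for j in knn[i]) for i in range(n)]
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = lof(pts, k=2)
print(scores)  # the isolated point (8, 8) gets by far the largest score
```

Cluster members get scores near 1, while the isolated point's score is an order of magnitude larger; an incremental variant must recompute these quantities only for the finite set of affected neighbors.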


2021, Vol. 5 (1), pp. 25-31
Author(s): Mohammad Farid Naufal, Yudistira Rahadian Wibisono

The increasing number of cars released to the market makes it harder for buyers to choose a car that fits their desired criteria, such as transmission, kilometers driven, fuel type, and year of manufacture. A method suited to matching cars against the buyer's criteria is K-Nearest Neighbors (KNN), which finds the smallest distance between each car in the data and the criteria desired by the buyer. Euclidean, Manhattan, and Minkowski distances are used to measure this distance. To support car selection, car data are collected automatically by web crawling, allowing the system to retrieve car listings from several e-commerce websites. With the constructed car search system, buyers are assisted in choosing a car, and the Euclidean distance achieved the best accuracy, at 94.40%.
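The three metrics are special cases of a single formula: the Minkowski distance with p = 1 is the Manhattan distance and with p = 2 the Euclidean distance. A quick sketch with hypothetical, unscaled car features; in practice the attributes should be normalized so that large-valued features such as kilometers do not dominate small-valued ones such as the year.

```python
def minkowski(u, v, p):
    """Minkowski distance; p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

car, wanted = (2015, 60000.0), (2017, 45000.0)  # hypothetical (year, km) features
print(minkowski(car, wanted, 1))  # Manhattan: 15002.0
print(minkowski(car, wanted, 2))  # Euclidean: ~15000.0001 (km dominates)
```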


Mathematics, 2021, Vol. 9 (7), pp. 779
Author(s): Ruriko Yoshida

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper, we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. We then discuss their application to the K nearest neighbors (KNN) algorithm, a supervised learning method that classifies a high-dimensional vector into given categories by looking at a ball centered at the vector which contains K vectors in the space.
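Concretely, the tropical metric on the tropical projective torus is d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), and a tropical ball of radius r around u collects the points within this distance. A small sketch with toy vectors, pairing the metric with a basic KNN vote:

```python
from collections import Counter

def tropical_dist(u, v):
    """Tropical metric on the projective torus:
    d(u, v) = max_i (u_i - v_i) - min_i (u_i - v_i).
    Invariant under adding a constant to all coordinates of u or v."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)

def tropical_knn(train, labels, query, k=1):
    """Majority vote among the k tropically nearest training points."""
    nearest = sorted(range(len(train)), key=lambda i: tropical_dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

print(tropical_dist((0, 1, 2), (0, 0, 0)))  # 2
print(tropical_dist((1, 2, 3), (0, 1, 2)))  # 0: projectively equal points
```

The zero distance in the last line illustrates why the metric lives on the projective torus: vectors differing by a constant shift are identified.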


Energies, 2021, Vol. 14 (5), pp. 1377
Author(s): Musaab I. Magzoub, Raj Kiran, Saeed Salehi, Ibnelwaleed A. Hussein, Mustafa S. Nasser

The traditional way to mitigate loss circulation in drilling operations is to use preventative and curative materials. However, it is difficult to quantify the amount of materials from every possible combination to produce customized rheological properties. In this study, machine learning (ML) is used to develop a framework that identifies material compositions for loss circulation applications based on desired rheological characteristics. The relation between the rheological properties and the mud components for polyacrylamide/polyethyleneimine (PAM/PEI)-based mud is assessed experimentally. Four ML algorithms were implemented to model the rheological data for various mud components at different concentrations and testing conditions: (a) k-Nearest Neighbor, (b) Random Forest, (c) Gradient Boosting, and (d) AdaBoost. The Gradient Boosting model showed the highest accuracy (91% and 74% for plastic and apparent viscosity, respectively), and can be further used for hydraulic calculations. Overall, the experimental study presented in this paper, together with the proposed ML-based framework, adds valuable information to the design of PAM/PEI-based mud. The ML models allowed a wide range of rheology assessments for various drilling fluid formulations with a mean accuracy of up to 91%. The case study has shown that, with the appropriate combination of materials, reasonable rheological properties can be achieved to prevent loss circulation by managing the equivalent circulating density (ECD).


2018, Vol. 35 (16), pp. 2757-2765
Author(s): Balachandran Manavalan, Shaherin Basith, Tae Hwan Shin, Leyi Wei, Gwang Lee

Motivation: Cardiovascular disease is the primary cause of death globally, accounting for approximately 17.7 million deaths per year. One of the risks linked with cardiovascular disease and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there has been no comprehensive analysis, assessment of diverse features, and implementation of various machine-learning (ML) algorithms applied to antihypertensive peptide (AHTP) model construction.
Results: In this study, we utilized six different ML algorithms, namely Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF), and support vector machine (SVM), using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. Because ERT-based trained models performed consistently better than the other algorithms regardless of feature descriptor, we treated them as baseline predictors, whose predicted probabilities of AHTPs were further used as input features, separately, for four different ML algorithms (ERT, GB, RF, and SVM), and we developed the corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of the four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance, with an overall improvement of approximately 6-7% on both the benchmarking and independent datasets.
Availability and implementation: The user-friendly online prediction tool, mAHTPred, is freely accessible at http://thegleelab.org/mAHTPred.
Supplementary information: Supplementary data are available at Bioinformatics online.


2021
Author(s): Ayesha Sania, Nicolo Pini, Morgan Nelson, Michael Myers, Lauren Shuffrey, ...

Background: Missing data are a source of bias in epidemiologic studies. This is problematic in alcohol research, where data missingness is linked to drinking behavior.
Methods: The Safe Passage study was a prospective investigation of prenatal drinking and fetal/infant outcomes (n=11,083). Daily alcohol consumption for the last reported drinking day and the 30 days prior was recorded using the Timeline Followback method. Of 3.2 million person-days, data were missing for 0.36 million. We imputed missing data using a machine learning algorithm, K-Nearest Neighbor (K-NN), which imputes missing values for a participant using data from the participants closest to them. Imputed values were weighted by the distances from the nearest neighbors and matched for day of week. Validation was done on randomly deleted data spanning 5-15 consecutive days.
Results: Data from 5 nearest neighbors and segments of 55 days provided imputed values with the least imputation error. After deleting segments from first-trimester data with no missing days, there was no difference between actual and predicted values for 64% of the deleted segments. For 31% of the segments, imputed data were within +/-1 drink/day of the actual values.
Conclusions: K-NN can be used to impute missing data in longitudinal studies of alcohol use during pregnancy with high accuracy.
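The imputation scheme described can be sketched as follows: a missing day's value is a distance-weighted average of that day's values from the nearest participants. This is a simplified sketch on toy data, not the study's code; the day-of-week matching the authors used is omitted for brevity, and None marks a missing value.

```python
from math import sqrt

def knn_impute(data, person, day, k=5, eps=1e-9):
    """Impute data[person][day] from the k participants closest to the target
    (RMS distance over their shared observed days), weighting each donor's
    value for that day inversely by its distance."""
    target = data[person]
    candidates = []
    for j, row in enumerate(data):
        if j == person or row[day] is None:
            continue
        shared = [(a, b) for a, b in zip(target, row)
                  if a is not None and b is not None]
        if not shared:
            continue
        d = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
        candidates.append((d, row[day]))
    candidates.sort()
    top = candidates[:k]
    w = [1.0 / (d + eps) for d, _ in top]
    return sum(wi * v for wi, (_, v) in zip(w, top)) / sum(w)

data = [[1.0, 2.0, None],   # target participant, day 2 missing
        [1.0, 2.0, 3.0],    # identical on observed days: closest donor
        [9.0, 9.0, 9.0]]
print(knn_impute(data, person=0, day=2, k=1))  # 3.0
```

With k > 1 the more distant donor contributes too, but its inverse-distance weight keeps the imputed value close to the nearest neighbor's.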

