Distance Metrics
Recently Published Documents

Total documents: 399 (last five years: 148)
H-index: 30 (last five years: 5)

2022, Vol. 2161 (1), pp. 012004
Author(s): Swathi Nayak, Manisha Bhat, N V Subba Reddy, B Ashwath Rao

Abstract: Classification of stars is essential for investigating their characteristics and behavior. Performing such classification manually is error-prone and time-consuming, whereas machine learning provides a computerized way to handle huge volumes of data with minimal human input. k-Nearest Neighbor (kNN) is one of the simplest supervised learning approaches in machine learning. This paper studies and analyzes the performance of the kNN algorithm on the star dataset, evaluating its accuracy across various distance metrics and a range of k values. The Minkowski, Euclidean, Manhattan, Chebyshev, Cosine, Jaccard, and Hamming distances were applied to kNN classifiers for different k values. Cosine distance is observed to work better than the other distance metrics for star categorization.
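As an illustration of the kind of comparison described above, the sketch below (not the authors' code) cross-validates kNN accuracy over several distance metrics and k values with scikit-learn; the feature matrix X and labels y stand in for the star dataset and are assumptions.

```python
# Hypothetical sketch: compare kNN accuracy across distance metrics and k values.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def compare_knn(X, y,
                metrics=("euclidean", "manhattan", "chebyshev", "cosine"),
                k_values=range(1, 16, 2)):
    """Return mean 5-fold cross-validated accuracy for each (metric, k) pair."""
    results = {}
    for metric in metrics:
        for k in k_values:
            clf = make_pipeline(
                StandardScaler(),
                # brute-force search supports all listed metrics, including cosine
                KNeighborsClassifier(n_neighbors=k, metric=metric, algorithm="brute"),
            )
            results[(metric, k)] = cross_val_score(clf, X, y, cv=5).mean()
    return results
```

Ranking the (metric, k) pairs by mean accuracy yields the kind of comparison reported in the abstract.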


2021, Vol. 38 (6), pp. 1843-1851
Author(s): Ouarda Soltani, Souad Benabdelkader

The SFA human skin color image database, designed specifically to support research in face recognition, is a particularly valuable resource for the challenging task of skin detection and has shown high performance compared with other existing databases. SFA provides multiple skin and non-skin samples which, combined with each other in various ways, allow new and potentially more effective samples to be created. This aspect is investigated in the present paper by creating four new representative skin samples according to four rules: minimum, maximum, mean, and median. The obtained samples are then used for skin segmentation based on the well-known Euclidean and Manhattan distance metrics. Thereafter, the performance of the new representative skin samples is compared with that of the skin samples originally provided by SFA. Simulation results on both the SFA and UTD (University of Texas at Dallas) color face databases indicate that detection rates higher than 92% can be achieved with either measure.
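A minimal sketch of the segmentation idea described above (not the authors' pipeline): build a representative skin color from a stack of skin samples with one of the four rules, then label pixels by their Euclidean or Manhattan distance to it. The array shapes and the threshold value are assumptions.

```python
import numpy as np

def representative_skin(samples, rule="mean"):
    """samples: (N, 3) array of RGB skin pixels; rule is one of min/max/mean/median."""
    ops = {"min": np.min, "max": np.max, "mean": np.mean, "median": np.median}
    return ops[rule](samples, axis=0)

def segment_skin(image, reference, metric="euclidean", threshold=60.0):
    """image: (H, W, 3) RGB array; returns a boolean skin mask."""
    diff = image.astype(float) - reference.astype(float)
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=-1))
    else:  # Manhattan (city-block) distance
        dist = np.abs(diff).sum(axis=-1)
    return dist < threshold  # True where the pixel is close enough to the skin reference
```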


2021
Author(s): Karthik Murugadoss, Michiel JM Niesen, Bharathwaj Raghunathan, Patrick J Lenehan, Pritha Ghosh, ...

Highly transmissible or immuno-evasive SARS-CoV-2 variants have intermittently emerged and outcompeted previously circulating strains, resulting in repeated COVID-19 surges, reinfections, and breakthrough infections in vaccinated individuals. With over 5 million SARS-CoV-2 genomes sequenced globally over the last 2 years, there is unprecedented data with which to decipher how competitive viral evolution results in the emergence of fitter SARS-CoV-2 variants. Much attention has been directed to studying how specific mutations in the Spike protein impact its binding to the ACE2 receptor or viral neutralization by antibodies, but there is limited knowledge of genomic signatures shared primarily by dominant variants. Here we introduce a methodology to quantify the genome-wide distinctiveness of polynucleotide fragments of various lengths (3- to 240-mers) that constitute SARS-CoV-2 lineage genomes. Compared to standard phylogenetic distance metrics and overall mutational load, the quantification of distinctive 9-mer polynucleotides provides a higher resolution of separation between variants of concern (Reference = 89, IQR: 65-108; Alpha = 166, IQR: 150-182; Beta = 130, IQR: 113-147; Gamma = 165, IQR: 152-180; Delta = 234, IQR: 216-253; and Omicron = 294, IQR: 287-315). The similar scoring of the Alpha and Gamma variants by our methodology is consistent with these strains emerging at approximately the same time and circulating as dominant strains in distinct geographical regions. Furthermore, evaluation of genomic distinctiveness for 1,363 lineages annotated in GISAID highlights that polynucleotide diversity has increased over time (R2 = 0.37) and that VOCs show high distinctiveness compared to non-VOC contemporary lineages. To facilitate similar real-time assessments of the competitive fitness potential of future variants, we are launching a freely accessible resource for infusing pandemic preparedness with genomic inference ("GENI": https://academia.nferx.com/GENI). This study demonstrates the value of characterizing new SARS-CoV-2 variants by their genome-wide polynucleotide distinctiveness and emphasizes the need to go beyond a narrow set of mutations at known functionally salient sites.
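The paper's exact distinctiveness score is not spelled out in the abstract; as a rough illustration of genome-wide k-mer distinctiveness, the sketch below simply counts the 9-mers of a variant genome that never occur in a reference genome. Sequence inputs are plain nucleotide strings, and all names are placeholders.

```python
def kmers(seq, k=9):
    """Yield every overlapping k-mer of a nucleotide string."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def distinct_kmer_count(variant_genome, reference_genome, k=9):
    """Count k-mers present in the variant genome but absent from the reference."""
    reference_set = set(kmers(reference_genome, k))
    return sum(1 for mer in kmers(variant_genome, k) if mer not in reference_set)
```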


Sensors, 2021, Vol. 21 (24), pp. 8398
Author(s): Bijan G. Mobasseri, Amro Lulu

Radiometric identification is the problem of attributing a signal to a specific source. In this work, a radiometric identification algorithm is developed using the whitening transformation. The approach stands out from more established methods in that it works directly on the raw IQ data and is hence featureless; as such, the commonly used dimensionality reduction algorithms do not apply. The premise of the idea is that a data set is "more white" when projected onto its own whitening matrix than onto any other. In practice, transformed data are never strictly white, since the training and test data differ. The Förstner-Moonen measure, which quantifies the similarity of covariance matrices, is used to establish the degree of whiteness. The whitening transform that produces a data set with the minimum Förstner-Moonen distance to a white noise process identifies the source signal. The source is determined by the output of the mode function applied to the Majority Vote Classifier decisions. Using the Förstner-Moonen measure offers a different perspective from maximum likelihood and Euclidean distance metrics. The whitening transform is also contrasted with more recent deep learning approaches, which still depend on high-dimensional feature vectors and lengthy training phases. It is shown that the proposed method is simpler to implement, requires no feature vectors, needs minimal training, and, because of its non-iterative structure, is faster than existing approaches.
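A minimal sketch of the two building blocks named above, under the assumption that each candidate source is represented by a whitening matrix learned from its training IQ data: whiten a test segment and score how far its covariance is from the identity using the Förstner-Moonen distance; the source whose whitening matrix gives the smallest score would be selected. Function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh, inv, sqrtm

def whitening_matrix(X):
    """X: (n_samples, n_features) real-valued IQ data; returns W with cov(X @ W) ~ I."""
    cov = np.cov(X, rowvar=False)
    return inv(sqrtm(cov)).real

def forstner_moonen(A, B):
    """Förstner-Moonen distance between SPD matrices: sqrt(sum(log(lambda_i)^2)),
    where lambda_i are the generalized eigenvalues of the pair (A, B)."""
    lam = eigh(A, B, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def whiteness_score(X_test, W):
    """Distance of the whitened test covariance from a white (identity) covariance."""
    cov_w = np.cov(X_test @ W, rowvar=False)
    return forstner_moonen(cov_w, np.eye(cov_w.shape[0]))
```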


2021, Vol. 11 (23), pp. 11294
Author(s): Zuo-Cheng Wen, Zhi-Heng Zhang, Xiang-Bing Zhou, Jian-Gang Gu, Shao-Peng Shen, ...
...  

Recently, predicting multivariate time series (MTS) has attracted much attention as a way to obtain richer semantics with similar or better performance. In this paper, we propose a tri-partition alphabet-based state (tri-state) prediction method for symbolic MTSs. First, for each variable, the set of all symbols, i.e., the alphabet, is divided into strong, medium, and weak subsets using two user-specified thresholds. With the tri-partitioned alphabet, the tri-state takes the form of a matrix: one dimension spans all the variables, and the other is a feature vector that includes the most likely occurring strong, medium, and weak symbols. Second, a tri-partition strategy based on the deviation degree is proposed. We introduce piecewise aggregate approximation and symbolic aggregate approximation techniques to aggregate and discretize the original MTS, so that a stronger symbol corresponds to a larger deviation. Moreover, most popular numerical or symbolic similarity or distance metrics can be combined with the method. Third, we propose an along–across similarity model to obtain the k-nearest matrix neighbors; this model considers the associations among time stamps and variables simultaneously. Fourth, we design two post-filling strategies to obtain a completed tri-state. Experimental results on datasets from four domains show that (1) the tri-state has greater recall but lower precision; (2) the two post-filling strategies can slightly improve the recall; and (3) the along–across similarity model composed of the Triangle and Jaccard metrics is recommended first for new datasets.
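A small sketch of the tri-partition step described above: given each symbol's deviation degree and two user-specified thresholds, split the alphabet into strong, medium, and weak subsets. The deviation values are assumed inputs; the paper's exact deviation-degree formula is not reproduced here.

```python
def tri_partition(deviation, alpha, beta):
    """deviation: dict mapping symbol -> deviation degree; thresholds alpha > beta."""
    strong = {s for s, d in deviation.items() if d >= alpha}
    medium = {s for s, d in deviation.items() if beta <= d < alpha}
    weak = {s for s, d in deviation.items() if d < beta}
    return strong, medium, weak

# Example: tri_partition({"a": 0.9, "b": 0.5, "c": 0.1}, alpha=0.8, beta=0.3)
# -> ({"a"}, {"b"}, {"c"})
```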


2021
Author(s): Yiu-ming Cheung, Zhikai Hu

Unsupervised cross-modal retrieval has received increasing attention recently because of the extreme difficulty of labeling explosively growing multimedia data. Its core challenge is how to measure similarities between multi-modal data without label information. In previous works, various distance metrics have been used to measure these similarities and to predict whether samples belong to the same class. However, these predictions are not always correct, and even a few wrong predictions can undermine the final retrieval performance. To address this problem, we categorize predictions as solid or soft based on their confidence, and further categorize samples as solid or soft based on those predictions. We propose that these two kinds of predictions and samples should be treated differently. Moreover, we find that the absolute values of the similarities represent not only the similarity itself but also the confidence of the predictions. Thus, we first design an elegant dot product fusion strategy to obtain effective inter-modal similarities. Subsequently, utilizing these similarities, we propose a generalized and flexible weighted loss function in which larger weights are assigned to solid samples to increase the retrieval performance, and smaller weights are assigned to soft samples to decrease the disturbance of wrong predictions. Although less information is used, empirical studies show that the proposed approach achieves state-of-the-art retrieval performance.
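An illustrative sketch (not the authors' exact formulation) of the two ideas mentioned above: a dot-product fusion of normalized features to obtain inter-modal similarities, and a loss in which each pair's contribution is weighted by the absolute similarity, so that confident ("solid") pairs dominate while low-confidence ("soft") pairs are down-weighted. All names and shapes are assumptions.

```python
import numpy as np

def fused_similarity(img_feat, txt_feat):
    """Dot-product similarity between L2-normalized image and text features."""
    img = img_feat / np.linalg.norm(img_feat, axis=1, keepdims=True)
    txt = txt_feat / np.linalg.norm(txt_feat, axis=1, keepdims=True)
    return img @ txt.T  # (n_images, n_texts) similarity matrix in [-1, 1]

def confidence_weighted_loss(pred_sim, target_sim):
    """Squared error per pair, weighted by |target similarity| as a confidence proxy."""
    weights = np.abs(target_sim)
    return np.mean(weights * (pred_sim - target_sim) ** 2)
```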


