jaccard distance
Recently Published Documents


TOTAL DOCUMENTS: 32 (FIVE YEARS 11)

H-INDEX: 7 (FIVE YEARS 2)

2021 ◽  
Vol 5 (2) ◽  
pp. 167-176
Author(s):  
Wahyu Hidayat ◽  
Ema Utami ◽  
Ahmad Fikri Iskandar ◽  
Anggit Dwi Hartanto ◽  
...  

During the Covid-19 pandemic, a variety of hoax news about Covid-19 circulated. Truth-clarification platforms such as Jala Hoax and Saber Hoax categorize such hoaxes as misinformation or disinformation. This study applies supervised classification to learn from the fact labels. The dataset consists of 559 items taken from Jala Hoax and Saber Hoax, grouped into Class 1 (Misleading Content, Satire/Parody, False Connection), Class 2 (False Context, Imposter Content), and Class 3 (Fabricated and Manipulated Content). K-Nearest Neighbor (K-NN) is used to classify the categories of misinformation and disinformation. The Jaccard distance is compared as a dissimilarity measure with the Euclidean, Manhattan, and Minkowski distances, and the value of k in K-NN is varied to compare performance across tests. With k = 4, Jaccard distance scores higher than the other models on accuracy (0.696), precision (0.710), recall (0.572), and F1-score. The best results tend to fall on the largest class, Class 1 (Misleading Content, Satire or Parody, False Connection), with 58 of 61 test items classified correctly.
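As a rough illustration of K-NN with Jaccard distance, the sketch below classifies a document represented as a set of tokens by majority vote among its k nearest neighbours. It is not the authors' pipeline: the token-set representation, the toy data, the class labels, and the names `jaccard_distance` and `knn_predict` are all assumptions made for the example.

```python
from collections import Counter


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_predict(query: set, train: list, k: int) -> str:
    """Majority vote among the k training items closest to `query`."""
    neighbours = sorted(train, key=lambda item: jaccard_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (token set, class label). Not taken from the study.
train = [
    ({"vaccine", "chip", "tracking"}, "class3"),
    ({"vaccine", "chip", "5g"}, "class3"),
    ({"mask", "photo", "old"}, "class2"),
    ({"mask", "photo", "2015"}, "class2"),
]
query = {"vaccine", "chip", "rumor"}
prediction = knn_predict(query, train, k=3)
```

Swapping `jaccard_distance` for another metric in `knn_predict` is one way to set up the kind of comparison the abstract describes, although Euclidean, Manhattan, and Minkowski distances would need a numeric vector representation rather than sets.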


2021 ◽  
Author(s):  
Hamid Reza Mohebbi ◽  
Nurit Haspel

Gene fusion events, which result from two genes fusing to create a hybrid gene, were first described in cancer cells in the early 1980s. These events are relatively common in many cancers, including prostate, lymphoid, soft tissue, and breast. Recent advances in next-generation sequencing (NGS) provide a high volume of genomic data, including cancer genomes. The detection of possible gene fusions requires fast and accurate methods. However, current methods suffer from inefficiency, insufficient accuracy, and a high false-positive rate. We present an RNA-Seq fusion detection method that uses dimensionality reduction and parallel computing to speed up the computation. We convert the RNA categorical space into compact binary arrays called binary fingerprints, which reduces memory usage and increases efficiency. Fusion candidates are searched for and detected using the Jaccard distance, and candidate detection is followed by refinement. We benchmarked our fusion prediction accuracy using both simulated and genuine RNA-Seq datasets. Genuine paired-end Illumina RNA-Seq data were obtained from 60 publicly available cancer cell line data sets. The results are compared against state-of-the-art methods such as STAR-Fusion, InFusion, and TopHat-Fusion. Our results show that FDJD exhibits superior accuracy compared to popular alternative fusion detection methods. We achieved 90% accuracy on simulated fusion transcript inputs, the highest among the compared methods, while maintaining comparable run time.
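To make the idea of comparing binary fingerprints with the Jaccard distance concrete, here is a minimal sketch. It assumes a fingerprint is built by hashing k-mers of a sequence into a fixed-width bit array stored as a Python integer; that construction, the parameters `k` and `n_bits`, and the function names are illustrative assumptions, not the encoding used by the method in the abstract.

```python
import zlib


def fingerprint(seq: str, k: int = 4, n_bits: int = 64) -> int:
    """Set one bit per k-mer, chosen by a deterministic hash (CRC32) of the k-mer."""
    fp = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        fp |= 1 << (zlib.crc32(kmer.encode()) % n_bits)
    return fp


def jaccard_distance_bits(a: int, b: int) -> float:
    """Jaccard distance between two bit arrays: 1 - popcount(a & b) / popcount(a | b)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - bin(a & b).count("1") / union


# Hypothetical reads, for illustration only.
read_a = fingerprint("ACGTACGTGGCA")
read_b = fingerprint("ACGTACGTGGCA")
read_c = fingerprint("TTTTGGGGCCCC")
same = jaccard_distance_bits(read_a, read_b)
different = jaccard_distance_bits(read_a, read_c)
```

Because the comparison reduces to bitwise AND, OR, and a population count, it is cheap per pair, which is consistent with the memory and speed motivation stated in the abstract.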


2021 ◽  
Vol 1 (2) ◽  
pp. 87-95
Author(s):  
Nur Aini Rakhmawati ◽  
Miftahul Jannah

Open Food Facts provides a database of food product information such as product names, compositions, and additives, to which anyone can contribute data and from which anyone can reuse existing data. The Open Food Facts data are dirty and need to be processed before being stored in our system. To reduce redundancy in the food ingredient data, we measure the similarity of food ingredients in two ways: conceptual similarity and textual similarity. Conceptual similarity compares two entries by word meaning (synonyms), while textual similarity is based on fuzzy string matching, namely Levenshtein distance, Jaro-Winkler distance, and Jaccard distance. In our evaluation, combining textual similarity with WordNet (conceptual) similarity was the best-performing similarity method for food ingredients.
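One common way to use the Jaccard distance for fuzzy string matching is to compare the sets of character bigrams of two strings. The sketch below does that and flags near-duplicate ingredient names. The bigram choice, the 0.3 threshold, the sample names, and the helper `near_duplicates` are assumptions for illustration and are not taken from the paper.

```python
def bigrams(text: str) -> set:
    """Set of overlapping character bigrams of the lower-cased, stripped text."""
    s = text.lower().strip()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard_distance(a: str, b: str) -> float:
    """Jaccard distance between the bigram sets of two strings."""
    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    if not union:
        return 0.0
    return 1.0 - len(ga & gb) / len(union)


def near_duplicates(names: list, threshold: float = 0.3) -> list:
    """Return all pairs of names whose distance is at or below the threshold."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard_distance(a, b) <= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical ingredient names.
names = ["sugar", "Sugars", "salt"]
pairs = near_duplicates(names)
```

Here "sugar" and "Sugars" share four of five distinct bigrams, so their distance is about 0.2 and they are reported as a pair, while "salt" shares none with either.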


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Bei Zhang ◽  
Luquan Wang ◽  
Yuanyuan Li

In user cluster analysis, users with the same or similar behavior characteristics are assigned to the same group by iteratively updated clustering, and the core and larger user groups are detected. In this paper, we present the formulation and mining of correlation rules based on the clustering algorithm, through the definition and procedure of the algorithm. Building on the idea of the K-mode clustering algorithm, we propose a clustering method that combines correlation rules with multivalued discrete features (MDF). We construct a method that calculates the similarity between users using the Jaccard distance, and we combine correlation rules with Jaccard distances to improve the similarity measure between users. Next, we propose a clustering method suitable for MDF. Finally, we improve the basic K-mode algorithm with a similarity measure that combines the correlation rules with the Jaccard distance and with a new cluster center update method, which yields the ARMDKM algorithm proposed in this paper. This method solves the problem that MDF cannot be processed effectively in the traditional model, and we demonstrate its theoretical correctness. Experiments verify the correctness of the new method using clustering purity, entropy, contour, and other indicators.
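The sketch below shows one plausible way to measure distance between users whose features are multivalued (each feature holds a set of values) and to assign a user to the nearest cluster center. It is not the ARMDKM algorithm: the correlation-rule component and the center update are omitted, and averaging the per-feature Jaccard distances is an assumption, as are the feature names and data.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def user_distance(u: dict, v: dict) -> float:
    """Mean Jaccard distance over the features present in both users."""
    features = u.keys() & v.keys()
    if not features:
        return 1.0
    return sum(jaccard_distance(u[f], v[f]) for f in features) / len(features)


def nearest_center(user: dict, centers: list) -> int:
    """Index of the cluster center with the smallest distance to `user`."""
    return min(range(len(centers)), key=lambda i: user_distance(user, centers[i]))


# Hypothetical centers and user with two multivalued features.
centers = [
    {"genres": frozenset({"rock", "jazz"}), "devices": frozenset({"phone"})},
    {"genres": frozenset({"pop"}), "devices": frozenset({"tablet", "tv"})},
]
user = {"genres": frozenset({"rock"}), "devices": frozenset({"phone", "tv"})}
assigned = nearest_center(user, centers)
```

In a K-mode style loop, this assignment step would alternate with a center update step until the assignments stop changing.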


2020 ◽  
Vol 26 (5) ◽  
pp. 495-508
Author(s):  
Zhaobin Wang ◽  
Jing Cui ◽  
Ying Zhu

2020 ◽  
Vol 4 (Supplement_2) ◽  
pp. 1595-1595
Author(s):  
Sabrina Trudo ◽  
Rosa Moreno ◽  
Jeong Hoon Pan ◽  
Daniel Gallaher ◽  
Jae Kyeom Kim ◽  
...  

Objectives: Intake of cruciferous (CRU; rich in glucosinolates) and apiaceous (API; rich in furanocoumarins) vegetables decreases colon cancer risk markers, likely through different mechanisms. Previous reports suggest that background diets influence the efficacy of bioactives. Here, we determined the effects on gut microbiome composition of CRU and API supplementation to two different background diets, diet-induced obesity (DIO) and the total western diet (TWD). Methods: C57BL/6J male mice were fed standard diet (AIN-93G), DIO, DIO with 21% (w/w) CRU (DIO + CRU), DIO with 21% (w/w) API (DIO + API), TWD, TWD with CRU (TWD + CRU), or TWD with API (TWD + API). After 12 weeks, cecal contents were collected for 16S rRNA sequencing and the data were analyzed with mothur. Results: There were no differences in body weight gain, except that mice fed DIO + CRU gained more than mice fed AIN-93G or TWD. Lachnospiraceae was increased by CRU supplementation to both DIO and TWD and by API supplementation to TWD. CRU increased alpha diversity [Shannon Index, number of observed Operational Taxonomic Units (OTUs)] compared to DIO and TWD. Regarding beta diversity, DIO + CRU formed a distinct cluster compared to DIO (Bray-Curtis, ANOSIM, R = 0.35, P < 0.001; Jaccard distance, R = 0.47, P < 0.001). TWD + CRU formed a distinct cluster compared to TWD (Bray-Curtis, R = 0.59, P < 0.001; Jaccard distance, R = 0.62, P < 0.001). API did not change alpha diversity, but did affect beta diversity, with distinct clusters between API groups and their basal diet groups (Jaccard distance, R = 0.36 and 0.31 for DIO and TWD, respectively, P < 0.05). Among the top 25 discriminating features between DIO and TWD and their API and CRU supplementation, there were 9 shared OTUs, including Lachnospiraceae, Clostridium XlVa, Clostridiales, Eisenbergiella, and Clostridium IV. Akkermansia was decreased in DIO + CRU compared with DIO. In the TWD panel, Bifidobacterium and Erysipelotrichaceae decreased in TWD + CRU, while Turicibacter was identified as a TWD + CRU signature. Erysipelotrichaceae and Bifidobacterium differentiated AIN-93G, DIO, and TWD. Conclusions: CRU supplementation of DIO and TWD altered gut microbiome composition, with some differences based on background diet. API also altered composition, albeit to a lesser extent. Funding Sources: University of Arkansas, Fulbright Nicaragua Fellow.
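For readers unfamiliar with Jaccard distance as a beta diversity measure, the following sketch computes pairwise distances between samples from an OTU count table by first reducing counts to presence or absence. The sample names and counts are invented for illustration, the function names are hypothetical, and the ANOSIM test reported in the abstract is not reproduced here.

```python
from itertools import combinations


def presence(counts: dict) -> set:
    """OTUs observed at least once in a sample."""
    return {otu for otu, n in counts.items() if n > 0}


def jaccard_matrix(samples: dict) -> dict:
    """Pairwise Jaccard distances, keyed by (sample_a, sample_b) in sorted name order."""
    sets = {name: presence(c) for name, c in samples.items()}
    dist = {}
    for a, b in combinations(sorted(sets), 2):
        union = sets[a] | sets[b]
        dist[(a, b)] = 0.0 if not union else 1.0 - len(sets[a] & sets[b]) / len(union)
    return dist


# Hypothetical OTU counts per sample.
samples = {
    "DIO_1": {"otu1": 12, "otu2": 0, "otu3": 5},
    "DIO_CRU_1": {"otu1": 3, "otu2": 7, "otu3": 0},
}
dist = jaccard_matrix(samples)
```

Unlike Bray-Curtis, this presence or absence form ignores abundances, so the two measures can rank sample pairs differently.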

