automated identification

Author(s):  
Ghazeefa Fatima ◽  
Rao Muhammad Adeel Nawab ◽  
Muhammad Salman Khan ◽  
Ali Saeed

Semantic word similarity is a quantitative measure of how contextually similar two words are. Evaluating semantic word similarity models requires a benchmark corpus. However, despite the millions of Urdu speakers and the large volume of Urdu digital text on the Internet, there is no benchmark corpus for the Cross-lingual Semantic Word Similarity task for the Urdu language. This article reports our efforts in developing such a corpus. The newly developed corpus is based on the SemEval-2017 Task 2 English dataset, and it contains 1,945 cross-lingual English–Urdu word pairs. For each of these word pairs, semantic similarity scores were assigned by 11 native Urdu speakers. In addition to corpus generation, this article also reports the evaluation results of a baseline approach, namely “Translation Plus Monolingual Analysis”, for automated identification of semantic similarity between English–Urdu word pairs. The results showed that the path length similarity measure performs better for the Google and Bing translated words. The newly created corpus and evaluation results are freely available online for further research and development.
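To make the path length similarity measure mentioned above more concrete, here is a minimal sketch. It assumes the common WordNet-style definition, similarity = 1 / (1 + shortest path length) over an is-a hierarchy; the article's exact formulation may differ. The taxonomy, the function names, and the word choices are hypothetical and exist only for illustration. In a "Translation Plus Monolingual Analysis" setting, the Urdu word would first be machine-translated into English, and a measure like this would then be applied to the resulting English pair.

```python
from collections import deque

# Hypothetical toy is-a taxonomy (child -> parents). This is NOT WordNet data.
HYPERNYMS = {
    "dog": ["canine"],
    "wolf": ["canine"],
    "canine": ["mammal"],
    "cat": ["feline"],
    "feline": ["mammal"],
    "mammal": ["animal"],
    "animal": [],
}


def build_undirected(hypernyms):
    """Turn the child -> parents map into an undirected adjacency map."""
    graph = {node: set() for node in hypernyms}
    for child, parents in hypernyms.items():
        for parent in parents:
            graph[child].add(parent)
            graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the edge count or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, word_a, word_b):
    """Assumed definition: 1 / (1 + shortest path length between the words)."""
    dist = shortest_path_length(graph, word_a, word_b)
    return None if dist is None else 1.0 / (1.0 + dist)


GRAPH = build_undirected(HYPERNYMS)

if __name__ == "__main__":
    for pair in [("dog", "dog"), ("dog", "wolf"), ("dog", "cat")]:
        print(pair, path_similarity(GRAPH, *pair))
```

Under this definition, identical words score 1.0 and the score decays as the two words sit further apart in the hierarchy.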


2022 ◽  
Vol 31 (1) ◽  
pp. 1-27
Author(s):  
Yaqin Zhou ◽  
Jing Kai Siow ◽  
Chenyu Wang ◽  
Shangqing Liu ◽  
Yang Liu

Security patches in open source software, which fix identified vulnerabilities, are crucial in protecting against cyber attacks. Security advisories and announcements are often publicly released to inform users about potential security vulnerabilities. Although the National Vulnerability Database (NVD) publishes identified vulnerabilities, a vast majority of vulnerabilities and their corresponding security patches remain beyond public exposure, e.g., in the open source libraries that developers rely on heavily. Because many of these patches sit undisclosed in open source projects, curating and gathering them is difficult. An extensive and complete security patch dataset could help end users such as security companies, e.g., in building a security knowledge base, or researchers, e.g., in vulnerability research. To efficiently curate security patches, including undisclosed patches, at large scale and low cost, we propose a deep neural-network-based approach built upon commits of open source repositories. First, we design and build security patch datasets that include 38,291 security-related commits and 1,045 Common Vulnerabilities and Exposures (CVE) patches from four large-scale C programming language libraries. We manually verify each of the 38,291 security-related commits to determine whether it is security related. We devise and implement a deep learning-based security patch identification system that consists of two composite neural networks: a commit-message neural network that uses pretrained word representations learned from our commits dataset, and a code-revision neural network that takes the code before and after revision and learns the distinction at the statement level. Our system leverages the power of the two networks for security patch identification. Evaluation results show that our system significantly outperforms SVM and K-fold stacking algorithms. On the combined dataset, it achieves an F1-score as high as 87.93% and a precision of 86.24%. We deployed our pipeline and learned model in an industrial production environment to evaluate the generalization ability of our approach. The industrial dataset consists of 298,917 commits from 410 new libraries that span a wide range of functionalities. Our experimental results and observations on the industrial dataset demonstrated that our approach can identify security patches effectively among open source projects.


Complexity ◽  
2022 ◽  
Vol 2022 ◽  
pp. 1-12
Author(s):  
Muhammad Zubair Asghar ◽  
Adidah Lajis ◽  
Muhammad Mansoor Alam ◽  
Mohd Khairil Rahmat ◽  
Haidawati Mohamad Nasir ◽  
...  

Emotion-based sentiment analysis has recently received a lot of interest, with an emphasis on automated identification of user behavior, such as emotional expressions, from online social media texts. However, most prior attempts rely on traditional techniques that do not yield promising results. In this study, we categorize emotional sentiments by recognizing them in the text. For that purpose, we present a deep learning model, bidirectional long short-term memory (BiLSTM), for emotion recognition that takes into account five main emotions (Joy, Sadness, Fear, Shame, Guilt). We run our experiments on an emotion dataset to carry out the emotion classification task. The results show that, compared with state-of-the-art methods, the proposed model can successfully classify user emotions into several categories. Finally, we assess the efficacy of our strategy using statistical analysis. The findings of this research help firms apply best practices in the selection, management, and optimization of policies, services, and product information.


2022 ◽  
Author(s):  
Mark Achtman ◽  
Zhemin Zhou ◽  
Jane Charlesworth ◽  
Laura A. Baxter

Defining bacterial species is traditionally a taxonomic issue, while defining bacterial populations is done with population genetics. These assignments are species specific and depend on the practitioner. Legacy multilocus sequence typing (MLST) is commonly used to identify sequence types (STs) and clusters (ST Complexes). However, these approaches are not adequate for the millions of genomic sequences from bacterial pathogens that have been generated since 2012. EnteroBase (http://enterobase.warwick.ac.uk) automatically clusters core genome MLST alleles into hierarchical clusters (HierCC) after assembling annotated draft genomes from short read sequences. HierCC clusters span core sequence diversity from the species level down to individual transmission chains. Here we evaluate the ability of HierCC to correctly assign 100,000s of genomes to the species/subspecies and population levels for Salmonella, Clostridioides, Yersinia, Vibrio and Streptococcus. HierCC assignments were more consistent with maximum-likelihood super-trees of core SNPs or presence/absence of accessory genes than classical taxonomic assignments or 95% ANI. However, neither HierCC nor ANI was uniformly consistent with the classical taxonomy of Streptococcus. HierCC was also consistent with legacy eBGs/ST Complexes in Salmonella or Escherichia and revealed differences in vertical inheritance of O serogroups. Thus, EnteroBase HierCC supports the automated identification of and assignment to species/subspecies and populations for multiple genera.
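The idea of clustering allele profiles at several levels of resolution can be illustrated with a small sketch. It assumes single-linkage clustering on the number of differing alleles between two profiles, applied at a series of distance thresholds; the abstract does not spell out the algorithm, so this is an illustration of the general technique and not EnteroBase's implementation. The genome names, the five-locus profiles, and the thresholds are hypothetical, and missing alleles are ignored for simplicity.

```python
def allelic_distance(profile_a, profile_b):
    """Number of loci at which two allele profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def single_linkage(profiles, threshold):
    """Group genomes linked by chains of pairs within `threshold` alleles."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allelic_distance(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical allele profiles over five loci (real schemes use thousands).
PROFILES = {
    "g1": [1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 2],
    "g3": [1, 1, 2, 2, 2],
    "g4": [5, 5, 5, 5, 5],
}

if __name__ == "__main__":
    for level in (0, 1, 2, 5):
        print(level, single_linkage(PROFILES, level))
```

Low thresholds separate closely related genomes, while higher thresholds merge them into broader groups, which is the sense in which one set of profiles can span resolutions from transmission chains up to species.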


2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Xinyue Li ◽  
Chenjie Xia ◽  
Xin Li ◽  
Shuangqing Wei ◽  
Sujun Zhou ◽  
...  

Diabetes can cause microvessel impairment. However, these conjunctival pathological changes are not easily recognized, limiting their potential as independent diagnostic indicators. Therefore, we designed a deep learning model to explore the relationship between conjunctival features and diabetes, and to advance automated identification of diabetes through conjunctival images. Images were collected from patients with type 2 diabetes and healthy volunteers. A hierarchical multi-tasking network model (HMT-Net) was developed using conjunctival images, and the model was systematically evaluated and compared with other algorithms. The sensitivity, specificity, and accuracy of the HMT-Net model to identify diabetes were 78.70%, 69.08%, and 75.15%, respectively. The performance of the HMT-Net model was significantly better than that of ophthalmologists. The model allowed sensitive and rapid discrimination by assessment of conjunctival images and can be potentially useful for identifying diabetes.
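For readers less familiar with the three reported metrics, the sketch below shows how sensitivity, specificity, and accuracy follow from a binary confusion matrix. The counts are hypothetical and are not taken from the study; the function name is likewise an illustrative choice.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)  # share of true cases that are detected
    specificity = tn / (tn + fp)  # share of healthy cases that are cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical counts: 100 diabetic and 100 healthy participants.
    sens, spec, acc = diagnostic_metrics(tp=80, fn=20, tn=70, fp=30)
    print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```

Accuracy is the average of sensitivity and specificity weighted by the share of positive and negative cases, so it always lies between the two, as it does for the figures reported above.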


2022 ◽  
Vol 9 ◽  
Author(s):  
Maoyi Zhang ◽  
Changqing Ding ◽  
Shuli Guo

Tracheobronchial diverticula (TD) are common cystic lesions that are easily overlooked; hence accurate and rapid identification is critical for later diagnosis. There is a strong need to automate this diagnostic process because traditional manual observation is time-consuming and laborious. However, most studies have been limited to case reports or to listing the relationship between the disease and other physiological indicators, and few have adopted advanced technologies such as deep learning for automated identification and diagnosis. To fill this gap, this study treats TD recognition as a semantic segmentation problem and proposes a novel attention-based network for TD semantic segmentation. Since the area of a TD lesion is small and resembles surrounding organs, we combine atrous spatial pyramid pooling (ASPP) with attention mechanisms, which can efficiently segment TD with robust results. The proposed attention model can selectively gather features from different branches according to the amount of information they contain. In addition, to the best of our knowledge, no public research data is available yet. For efficient network training, we constructed a data set containing 218 TD and the related ground truth (GT). We evaluated different models on the proposed data set, among which the highest mean intersection over union (mIoU) reached 0.92. The experiments show that our model can outperform state-of-the-art methods, indicating that deep learning has great potential for TD recognition.
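The segmentation metric quoted above, mean intersection over union, can be sketched as follows. This assumes the usual definition: for each class, the number of pixels where prediction and ground truth agree on that class, divided by the number of pixels where either assigns it, averaged over classes. The 3x3 masks and the function name are hypothetical; the study's exact evaluation protocol is not given in the abstract.

```python
def mean_iou(pred, truth, num_classes):
    """Average per-class intersection over union for two label masks."""
    ious = []
    for cls in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                in_pred = p == cls
                in_truth = t == cls
                intersection += int(in_pred and in_truth)
                union += int(in_pred or in_truth)
        if union:  # skip classes absent from both masks
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class is present in either mask")
    return sum(ious) / len(ious)


# Hypothetical masks: 0 = background, 1 = lesion.
TRUTH = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
PRED = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]

if __name__ == "__main__":
    print(round(mean_iou(PRED, TRUTH, num_classes=2), 4))
```

Because small lesions contribute few pixels, a single missed pixel lowers the lesion-class IoU noticeably, which is one reason this metric is a stricter test than pixel accuracy for small targets.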


2022 ◽  
pp. 102367
Author(s):  
Jiantao Pu ◽  
Joseph K Leader ◽  
Jacob Sechrist ◽  
Cameron A Beeche ◽  
Jatin P Singh ◽  
...  

2022 ◽  
Vol 71 (2) ◽  
pp. 3337-3353
Author(s):  
Pulkit Jain ◽  
Paras Chawla ◽  
Mehedi Masud ◽  
Shubham Mahajan ◽  
Amit Kant Pandit
