Redundancy Removal
Recently Published Documents

TOTAL DOCUMENTS: 80 (five years: 16)
H-INDEX: 10 (five years: 2)

2022, Vol 12 (1), pp. 0-0

The traditional frequency-based approach to multi-document extractive summarization ranks sentences by scores computed by summing the TF*IDF weights of the words they contain. In this approach, TF (term frequency) is calculated from how frequently a term (word) occurs in the input, and TF calculated this way does not take semantic relations among terms into account. In this paper, we propose methods that exploit semantic term relations to improve the sentence ranking and redundancy removal steps of a summarization system. Our proposed summarization system has been tested on the DUC 2003 and DUC 2004 benchmark multi-document summarization datasets. The experimental results reveal that the performance of our multi-document text summarizer improves significantly when a distributional term similarity measure is used to find semantic term relations. Our multi-document text summarizer also outperforms several well-known summarization baselines to which it is compared.
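As an illustration of the frequency-based baseline this abstract describes, the core scoring step can be sketched as follows. This is a minimal sketch, not the authors' implementation: it treats each sentence as a "document" for the IDF statistic and sums TF*IDF weights per sentence, without the semantic term relations the paper adds.

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence by summing TF*IDF weights of its words.

    TF is the frequency of a word across the whole input; IDF treats
    each sentence as a "document". A simplified stand-in for the
    frequency-based baseline, with no semantic term relations.
    """
    tokenized = [s.lower().split() for s in sentences]
    # Global term frequency over the whole input.
    tf = Counter(w for toks in tokenized for w in toks)
    # Sentence-level document frequency for IDF.
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        for w in set(toks):
            df[w] += 1
    idf = {w: math.log(n / df[w]) for w in df}
    return [sum(tf[w] * idf[w] for w in toks) for toks in tokenized]
```

Sentences would then be ranked by these scores, with the top-ranked ones selected into the summary subject to a redundancy check.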


2021
Author(s): Nazreena Rahman, Bhogeswar Borah

Abstract: This paper presents a query-based extractive text summarization method that uses a sense-oriented semantic relatedness measure. We propose a Word Sense Disambiguation (WSD) technique to find the exact sense of a word in a sentence. It helps in extracting query-relevant sentences while calculating the sense-oriented semantic relatedness score between the query and each input sentence. The proposed method uses five unique features to form clusters of query-relevant sentences. A redundancy removal technique is also put forward to eliminate redundant sentences. We evaluated the proposed WSD technique against existing methods on the Senseval and SemEval datasets; the experimental evaluation shows that it outperforms current systems in terms of F-score. We also compare the proposed query-based extractive text summarization method with methods that participated in the Document Understanding Conference (DUC) as well as with current methods. The evaluation and comparison show that the proposed method outperforms many existing methods: as an unsupervised learning algorithm, it obtains the highest ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores on all three DUC 2005, 2006, and 2007 datasets, and it is also quite comparable with supervised learning algorithms. We further observe that the method can recognize query-relevant sentences that meet the query's information need.
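The redundancy removal step mentioned above can be illustrated with a greedy filter over ranked sentences. This sketch uses plain word-overlap (Jaccard) similarity with a hypothetical threshold as a stand-in for the paper's sense-oriented semantic relatedness measure:

```python
def remove_redundant(ranked_sentences, threshold=0.5):
    """Greedy redundancy removal: keep a ranked sentence only if its
    Jaccard word overlap with every already-kept sentence stays below
    `threshold`. The threshold and the overlap measure are illustrative
    choices, not the paper's sense-oriented relatedness score."""
    kept = []
    for sent in ranked_sentences:
        words = set(sent.lower().split())
        redundant = any(
            len(words & set(k.lower().split()))
            / len(words | set(k.lower().split())) >= threshold
            for k in kept
        )
        if not redundant:
            kept.append(sent)
    return kept
```

Because sentences are visited in rank order, a redundant sentence is always dropped in favour of the higher-ranked one it duplicates.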


2021, Vol 2021, pp. 1-10
Author(s): Jifeng Guo, Zhiqi Pang, Wenbo Sun, Shi Li, Yu Chen

Active learning aims to select the most valuable unlabelled samples for annotation. In this paper, we propose a redundancy removal adversarial active learning (RRAAL) method based on a norm online uncertainty indicator, which selects samples according to their distribution, uncertainty, and redundancy. RRAAL includes a representation generator, a state discriminator, and a redundancy removal module (RRM). The purpose of the representation generator is to learn the feature representation of a sample, and the state discriminator predicts the state of the concatenated feature vector. We added a sample discriminator to the representation generator to improve its representation learning ability and designed a norm online uncertainty indicator (Norm-OUI) to provide a more accurate uncertainty score for the state discriminator. In addition, we designed an RRM based on a greedy algorithm to reduce the number of redundant samples in the labelled pool. The experimental results on four datasets show that the state discriminator, Norm-OUI, and RRM each improve the performance of RRAAL, and that RRAAL outperforms previous state-of-the-art active learning methods.
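The greedy redundancy removal idea can be sketched as a diversity-driven batch selection in feature space: repeatedly pick the candidate farthest from everything already selected, so the batch sent for annotation contains fewer near-duplicates. This is a minimal k-center-style sketch, not the paper's RRM, which also combines distributional and uncertainty scores:

```python
import math

def greedy_diverse_batch(candidates, batch_size):
    """Greedily build a batch of feature vectors that avoids redundant
    (near-duplicate) samples: each step adds the candidate whose
    minimum Euclidean distance to the selected set is largest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    selected = [candidates[0]]  # seed with the top-ranked candidate
    pool = list(candidates[1:])
    while pool and len(selected) < batch_size:
        best = max(pool, key=lambda c: min(dist(c, s) for s in selected))
        selected.append(best)
        pool.remove(best)
    return selected
```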


2021, Vol 6 (3), pp. 11
Author(s): Adonney Allan de Oliveira Veras

The data volume produced by the omic sciences today is driven by the adoption of next-generation sequencing (NGS) platforms. Among the analyses performed on these data are mapping, genome assembly, genome annotation, pangenomic analysis, quality control, and redundancy removal, among others. Regarding redundancy removal, several tools perform this task with accuracy proven in their scientific publications, but they lack an assessment of algorithmic complexity. Thus, this work performs an empirical algorithmic complexity analysis of computational tools that remove redundancy from raw reads produced by DNA sequencing. The analysis was performed on sixteen raw-read datasets. The datasets were processed with the following tools: MarDRe, NGSReadsTreatment, ParDRe, FastUniq, and BioSeqZip, and analyzed on the R statistical platform with the GuessCompx package. The results demonstrate that the BioSeqZip and ParDRe tools exhibit the lowest complexity in this analysis.
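The empirical approach described here, timing a tool on growing inputs and inferring a complexity class from the growth curve, can be sketched in a few lines. This is a crude Python analogue of what the GuessCompx R package automates (GuessCompx also fits and compares several candidate complexity models), fitting only the slope of log(time) against log(n):

```python
import math
import time

def empirical_complexity(fn, sizes, make_input):
    """Estimate complexity empirically: time `fn` on inputs of growing
    size and fit the least-squares slope of log(time) vs log(n).
    A slope near 1 suggests roughly O(n); near 2 suggests O(n^2)."""
    points = []
    for n in sizes:
        data = make_input(n)          # input construction is not timed
        start = time.perf_counter()
        fn(data)
        points.append((math.log(n), math.log(time.perf_counter() - start)))
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
```

Timing-based estimates are noisy, so in practice each size would be measured several times and the sizes spread over a wide range.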


2020, Vol 2020, pp. 1-11
Author(s): Bo Mi, Ping Long, Yang Liu, Fengtian Kuang

Data deduplication serves as an effective way to optimize storage occupation and bandwidth consumption over clouds. As for the security of the deduplication mechanism, users' privacy and accessibility are of utmost concern since data are outsourced. However, the functionality of redundancy removal and the indistinguishability of deduplication labels are naturally incompatible, which brings about many threats to data security. Besides, access control over shared copies may lead to infringement on users' attributes and cumbersome query overheads. To balance the usability and confidentiality of deduplication labels and to securely realize an elaborate access structure, a novel data deduplication scheme is proposed in this paper. Briefly speaking, we draw support from learning with errors (LWE) to ensure that the deduplication labels are differentiable only during the duplication check. Instead of authority matching, the proof of ownership (PoW) is implemented under the paradigm of inner products. Since the deduplication label is lightweight and the inner product is easy to compute, our scheme is efficient in terms of both computation and storage. Security analysis also indicates that the deduplication labels are distinguishable only for the duplication check, and that the probability of forging a valid ownership is negligible.
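To make the underlying tension concrete: server-side deduplication needs a content-derived label so identical uploads collide, yet such a label leaks information about the plaintext. The toy sketch below uses a plain SHA-256 hash as the label; it is a generic illustration of the dedup-check workflow only, not the paper's scheme, which replaces the hash with LWE-based labels that are indistinguishable outside the duplication check.

```python
import hashlib

def dedup_label(data: bytes) -> str:
    """Deterministic content-derived deduplication label (toy version).
    Identical content gives identical labels, so duplicates are
    detectable; unlike an LWE-based label, a plain hash is also
    linkable outside the duplication check."""
    return hashlib.sha256(data).hexdigest()

class DedupStore:
    """Toy server-side store keeping a single copy per label."""
    def __init__(self):
        self._blobs = {}

    def upload(self, data: bytes) -> bool:
        """Store the blob; return True if it was new, False if a copy
        with the same label already existed (deduplicated)."""
        label = dedup_label(data)
        if label in self._blobs:
            return False
        self._blobs[label] = data
        return True
```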


2020, Vol 131, pp. 383-389
Author(s): Arshdeep Singh, Padmanabhan Rajan, Arnav Bhavsar

Author(s): George Giannakopoulos, George Kiomourtzis, Nikiforos Pittaras, Vangelis Karkaletsis

This chapter describes the evolution of a real, multi-document, multilingual news summarization methodology and application, named NewSum, the research problems behind it, and the steps taken to solve these problems. The system uses the n-gram graph representation to perform sentence selection and redundancy removal for summary generation. In addition, it tackles problems related to topic and subtopic detection (via clustering), demonstrates multilingual applicability, and, through recent advances, scalability to big data. Furthermore, recent developments of the algorithm allow it to utilize semantic information to better identify and outline events, offering an overall improvement over the base approach.
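The n-gram graph representation can be sketched simply: nodes are character n-grams and edges connect n-grams that co-occur within a small window, so two sentences can be compared by the overlap of their edge sets. This is a simplified sketch; NewSum's actual graphs carry co-occurrence weights on edges and use graded value-similarity measures rather than plain Jaccard overlap.

```python
def ngram_graph(text, n=3, window=2):
    """Build a character n-gram graph as a set of directed edges:
    each n-gram is linked to the n-grams appearing within `window`
    positions after it. Weights are omitted in this sketch."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = set()
    for i in range(len(grams)):
        for j in range(i + 1, min(i + 1 + window, len(grams))):
            edges.add((grams[i], grams[j]))
    return edges

def graph_similarity(a, b):
    """Jaccard overlap of edge sets, a stand-in for the weighted
    value similarity used on real n-gram graphs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Sentence selection can then favour sentences whose graphs are similar to the topic's representative graph, while redundancy removal discards sentences whose graphs are too similar to those already chosen.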

