Automatic Text Summarization by Providing Coverage, Non-Redundancy, and Novelty Using Sentence Graph

Krishnaveni P.;  Balasundaram S. R.

doi:10.4018/jitr.2022010108

Automatic Text Summarization by Providing Coverage, Non-Redundancy, and Novelty Using Sentence Graph

Journal of Information Technology Research ◽

10.4018/jitr.2022010108 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-18

Author(s):

Krishnaveni P. ◽

Balasundaram S. R.

Keyword(s):

Graph Algorithms ◽

Maximal Clique ◽

Text Summarization ◽

Original Text ◽

Online Information ◽

Automatic Text Summarization ◽

Global Properties ◽

Input Text ◽

Local Properties ◽

Automatic Text

The daily growth of online information necessitates intensive research in automatic text summarization (ATS). ATS software produces a summary by extracting important information from the original text. With the help of summaries, users can easily read and understand the documents of interest. Most ATS approaches use only local properties of text, and the large number of such properties makes sentence selection difficult and complicated. This article therefore uses graph-based summarization to exploit structural and global properties of text. It introduces the maximal clique based sentence selection (MCBSS) algorithm, which selects important and non-redundant sentences that cover all concepts of the input text. The MCBSS algorithm finds novel information using maximal cliques (MCs). Experimental results using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) on the Timeline dataset show that the proposed work outperforms the existing graph algorithms Bushy Path (BP), Aggregate Similarity (AS), and TextRank (TR).

Implementasi Algoritma Graf dan Algoritma Genetika pada Peringkasan Single Document

Repositor ◽

10.22219/repositor.v2i11.891 ◽

2020 ◽

Vol 2 (11) ◽

pp. 1521

Author(s):

Lina Dwi Yulianti ◽

Setio Basuki ◽

Yufis Azhar

Keyword(s):

Genetic Algorithms ◽

Graph Algorithms ◽

System Development ◽

High Accuracy ◽

Text Summarization ◽

Test Results ◽

The Core ◽

Automatic Text Summarization ◽

Summarization System ◽

Automatic Text

With today's technological advances, finding information is easier and faster. However, a considerable amount of that information is untrue, commonly referred to as hoaxes. Information must therefore be obtained from several sources to ensure its accuracy. An automatic text summarization system summarizes text-based documents. Such a system can help find the core of a news document, so that reading it does not take much time. The researchers use graph algorithms and genetic algorithms in developing the system. In testing, the summaries produced by the system had a cosine similarity of 71.21% with manually produced summaries. The authors conclude that the system can be used by users because the tests yielded high accuracy values.

A Framework for Word Embedding Based Automatic Text Summarization and Evaluation

Information ◽

10.3390/info11020078 ◽

2020 ◽

Vol 11 (2) ◽

pp. 78 ◽

Cited By ~ 2

Author(s):

Tulu Tilahun Hailu ◽

Junqing Yu ◽

Tessfu Geteye Fantaye

Keyword(s):

Text Summarization ◽

Evaluation Framework ◽

Word Embedding ◽

Evaluation Metrics ◽

Original Text ◽

Automatic Evaluation ◽

Source Text ◽

Automatic Text Summarization ◽

Automatic Text

Text summarization is the process of producing a concise version (summary) of text from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, how much of the source text's meaning is preserved is becoming harder to evaluate. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent ideal reference summaries. Studies show that human summarizers can produce variable reference summaries of the same source, which can significantly affect the automatic evaluation scores of summarization systems. Humans are biased by their situation when producing a summary, and even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word embedding based automatic text summarization and evaluation framework, which can determine the salient top-n sentences of a source text as a reference summary and evaluate the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and outperforms several baseline methods with regard to both text summarization systems and automatic evaluation metrics when tested on a publicly available dataset.
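
To make the n-gram limitation concrete, here is a minimal sketch of ROUGE-N recall using clipped n-gram counts. It is a simplified illustration with whitespace tokenization (an assumption), not the official ROUGE toolkit and not the framework proposed in the paper; the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """Fraction of reference n-grams that also occur in the candidate (clipped counts)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum((ref & cand).values())  # Counter intersection clips to the minimum count
    return overlap / total
```

With the reference "the cat sat on the mat", the candidate "the cat lay on the mat" shares five of six unigrams, whereas a paraphrase such as "a feline rested upon the rug" shares only one, even though a human reader might judge the paraphrase acceptable. This is the kind of mismatch that makes pure n-gram overlap a poor fit for abstractive summaries.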

Developing a new approach to summarize Arabic text automatically using syntactic and semantic analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v9i2.30324 ◽

2020 ◽

Vol 9 (2) ◽

pp. 342

Author(s):

Amal Alkhudari

Keyword(s):

Language Processing ◽

Automatic System ◽

Semantic Analysis ◽

Text Summarization ◽

Original Text ◽

Arabic Text ◽

Wide Spread ◽

New Approach ◽

Automatic Text Summarization ◽

Automatic Text

Because information is so widespread and its sources so diverse, there is a need to produce accurate text summaries with the least time and effort. A summary must preserve the key information content and overall meaning of the original text. Text summarization is one of the most important applications of Natural Language Processing (NLP). The goal of automatic text summarization is to create summaries that are similar to human-created ones. In many cases, however, the readability of the generated summaries is unsatisfactory, because they do not consider the meaning of the words and do not cover all the semantically relevant aspects of the data. In this paper we use syntactic and semantic analysis to propose an automatic system for summarizing Arabic texts. The system is capable of understanding the meaning of the information and retrieves only the relevant part. The effectiveness of the proposed work is evaluated on the EASC corpus using the ROUGE measure. The generated summaries are compared against human-written summaries and those of previous research.

A New Biomimetic Method Based on the Power Saves of Social Bees for Automatic Summaries of Texts by Extraction

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.2015010102 ◽

2015 ◽

Vol 7 (1) ◽

pp. 18-38 ◽

Cited By ~ 5

Author(s):

Mohamed Amine Boudia ◽

Reda Mohamed Hamou ◽

Abdelmalek Amine ◽

Amine Rahmani

Keyword(s):

Text Summarization ◽

Second Step ◽

Original Text ◽

Simple Majority ◽

New Approach ◽

Social Bees ◽

Automatic Text Summarization ◽

Final Layer ◽

Biomimetic Method ◽

Automatic Text

In this paper, the authors propose a new approach to automatic text summarization by extraction, based on a Saving Energy Function. The first step uses two extraction techniques: scoring of phrases, and similarity, which aims to eliminate redundant phrases without losing the theme of the text. The second step aims to optimize the results of the previous layer with a metaheuristic based on the Bee Algorithm. The objective function of the optimization is to maximize the sum of similarity between phrases of the candidate summary in order to keep the theme of the text, and to minimize the sum of scores in order to increase the summarization rate. This optimization also yields candidate summaries in which the order of the phrases differs from the original text. The third and final layer aims to choose the best summary from the candidate summaries generated by the bee optimization; for this the authors opted for voting with a simple majority.

Abstractive Summarization: A Survey of the State of the Art

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019815 ◽

2019 ◽

Vol 33 ◽

pp. 9815-9822 ◽

Cited By ~ 5

Author(s):

Hui Lin ◽

Vincent Ng

Keyword(s):

Machine Translation ◽

State Of The Art ◽

The State ◽

Text Summarization ◽

Abstract Representation ◽

Automatic Text Summarization ◽

Input Text ◽

Gradual Shift ◽

Abstractive Summarization ◽

Automatic Text

The focus of automatic text summarization research has exhibited a gradual shift from extractive methods to abstractive methods in recent years, owing in part to advances in neural methods. Originally developed for machine translation, neural methods provide a viable framework for obtaining an abstract representation of the meaning of an input text and generating informative, fluent, and human-like summaries. This paper surveys existing approaches to abstractive summarization, focusing on the recently developed neural approaches.

Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF)

ComTech Computer Mathematics and Engineering Applications ◽

10.21512/comtech.v7i4.3746 ◽

2016 ◽

Vol 7 (4) ◽

pp. 285 ◽

Cited By ~ 14

Author(s):

Hans Christian ◽

Mikhael Pramodana Agus ◽

Derwin Suhartono

Keyword(s):

Language Processing ◽

Text Summarization ◽

The Other ◽

Online Information ◽

Inverse Document Frequency ◽

Automatic Text Summarization ◽

Document Frequency ◽

Online Source ◽

Automatic Text ◽

F Measure

The increasing availability of online information has triggered intensive research in the area of automatic text summarization within Natural Language Processing (NLP). Text summarization reduces the text by removing the less useful information, which helps the reader find the required information quickly. Many kinds of algorithms can be used to summarize text. One of them is TF-IDF (Term Frequency-Inverse Document Frequency). This research aimed to produce an automatic text summarizer implemented with the TF-IDF algorithm and to compare it with various other online automatic text summarizers. To evaluate the summary produced by each summarizer, the F-Measure was used as the standard comparison value. The summarizer achieves 67% accuracy on three data samples, which is higher than the other online summarizers.
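
As a rough illustration of how TF-IDF can drive extractive summarization, the sketch below scores each sentence by the sum of its terms' TF-IDF weights and keeps the top-scoring sentences. Treating each sentence as its own "document" for the IDF statistics, the whitespace tokenization, and the function names are assumptions for this example; the paper's actual preprocessing and scoring may differ.

```python
import math
from collections import Counter


def tfidf_sentence_scores(sentences):
    """Score each sentence as the sum of TF-IDF weights of its terms."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        scores.append(sum((count / len(doc)) * math.log(n / df[term])
                          for term, count in tf.items()))
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A term that appears in every sentence gets an IDF of log(1) = 0 and so contributes nothing, which is why sentences built from rarer terms rise to the top.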

Introducing Word's Importance Level-Based Text Summarization Using Tree Structure

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2020010102 ◽

2020 ◽

Vol 10 (1) ◽

pp. 13-33

Author(s):

Nitesh Kumar Jha ◽

Arnab Mitra

Keyword(s):

Linear Relationship ◽

Directed Graph ◽

Execution Time ◽

Text Processing ◽

Text Summarization ◽

New Approach ◽

Automatic Text Summarization ◽

Input Text ◽

Importance Factor ◽

Automatic Text

Text summarization plays a significant role in quick knowledge acquisition from any text-based knowledge resource. To enhance the text summarization process, this article presents a new approach to automatic text summarization based on levels (word importance factors). An equivalent tree is produced from the directed graph during input text processing with WordNet. Detailed investigations show that the execution time of the proposed approach follows a strictly linear relationship with the volume of input. Further investigation of its performance shows that the proposed approach is superior to several existing text summarization approaches.

Automatic Text Summarization Using Deep Reinforcement Learning and Beyond

Information Technology And Control ◽

10.5755/j01.itc.50.3.28047 ◽

2021 ◽

Vol 50 (3) ◽

pp. 458-469

Author(s):

Gang Sun ◽

Zhongxin Wang ◽

Jia Zhao

Keyword(s):

Information Overload ◽

Optimization Method ◽

Text Summarization ◽

Original Text ◽

Baseline Model ◽

Extractive Summarization ◽

Automatic Text Summarization ◽

Text Information ◽

Abstractive Summarization ◽

Automatic Text

In the era of big data, information overload problems are becoming increasingly prominent. It is challenging for machines to understand, compress, and filter massive amounts of text information using artificial intelligence technology. Automatic text summarization mainly aims to solve the problem of information overload, and it can be divided into two types: extractive and abstractive. The former finds key sentences or phrases in the original text and combines them into a summary; the latter requires a computer to understand the content of the original text and then summarize its key information in language readable by humans. This paper presents a two-stage optimization method for automatic text summarization that combines abstractive summarization and extractive summarization. First, a sequence-to-sequence model with the attention mechanism is trained as a baseline model to generate an initial summary. Second, it is updated and optimized directly on the ROUGE metric by using deep reinforcement learning (DRL). Experimental results show that, compared with the baseline model, ROUGE-1, ROUGE-2, and ROUGE-L scores increase on the LCSTS dataset and the CNN/DailyMail dataset.

Automatic Text Summarization Berdasarkan Pendekatan Statistika pada Dokumen Berbahasa Indonesia

KELUWIH: Jurnal Sains dan Teknologi ◽

10.24123/saintek.v2i1.4045 ◽

2021 ◽

Vol 2 (1) ◽

Author(s):

Christopher Setyawan ◽

Njoto Benarkah ◽

Vincentius Riandaru Prasetyo

Keyword(s):

Text Summarization ◽

Cosine Similarity ◽

Original Text ◽

Frequency Method ◽

Location Method ◽

System Validation ◽

Automatic Text Summarization ◽

Statistical Approaches ◽

Similarity Method ◽

Automatic Text

Abstract: Propelled by modern technological innovation, data and text grow more abundant every year. With this much text, automatic text summarization is needed more than ever. Automatic text summarization is defined as the creation, by a computer program, of a shortened version of a text that still contains the most important points of the original, in the form of basic ideas and words or sentences that can represent the whole text. Statistical approaches are one family of automatic text summarization methods. Five statistical approaches are used here: the aggregation similarity method, the frequency method, the location method, the title method (if the text has a title), and the tf-based query method (if the text does not have a title). Cosine similarity is used to compute the title method, the aggregation similarity method, and the tf-based query method. Two types of validation are performed: system validation and user validation. System validation compares the similarity between the human summary and the summary generated by the program, which results in an accuracy of 76.7647% for summaries 30% the length of the original journal. User validation results in 82% accuracy (81% in the Indonesian-language abstract). Based on both validations, the authors conclude that statistical approaches are suitable for automatic text summarization. Keywords: automatic text summarization, statistical approaches, Indonesian document, cosine similarity
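
The cosine similarity that underlies the title, aggregation similarity, and tf-based query methods can be sketched as follows. This is a minimal illustration over raw term-frequency vectors with whitespace tokenization; the function name and the absence of stemming or stop-word removal are assumptions, and the paper's own preprocessing for Indonesian text is not reproduced here.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Under the title method, for instance, each sentence could be scored by `cosine_similarity(title, sentence)`, so that sentences sharing more vocabulary with the title rank higher.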

SGATS: Semantic Graph-based Automatic Text Summarization from Hindi Text Documents

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3464381 ◽

2021 ◽

Vol 20 (6) ◽

pp. 1-32

Author(s):

Manju Lata Joshi ◽

Nisheeth Joshi ◽

Namita Mittal

Keyword(s):

Language Processing ◽

Semantic Analysis ◽

Text Summarization ◽

Original Text ◽

Theoretic Approach ◽

Extractive Summarization ◽

Semantic Graph ◽

Text Document ◽

Automatic Text Summarization ◽

Automatic Text

Creating a coherent summary of a text is a challenging task in the field of Natural Language Processing (NLP). Various automatic text summarization techniques have been developed for abstractive as well as extractive summarization. This study focuses on extractive summarization, which selects representative paragraphs or sentences from the original text and combines them into a form shorter than the source document(s) to generate a summary. The methods that have been used for extractive summarization are based on graph-theoretic approaches, machine learning, Latent Semantic Analysis (LSA), neural networks, clustering, and fuzzy logic. In this paper, a semantic graph-based approach, SGATS (Semantic Graph-based approach for Automatic Text Summarization), is proposed to generate an extractive summary. The proposed approach constructs a semantic graph of the original Hindi text document by establishing semantic relationships between sentences of the document, using the Hindi WordNet ontology as a background knowledge source. Once the semantic graph is constructed, fourteen different graph-theoretical measures are applied to rank the document sentences according to their semantic scores. The proposed approach is applied to two data sets from different domains, Tourism and Health. Its performance is compared with the state-of-the-art TextRank algorithm and with human-annotated summaries, and is evaluated using the widely accepted ROUGE measures. The results show that the proposed system produces better results than TextRank on the health domain corpus and comparable results on the tourism corpus. Further, correlation coefficient methods are applied to find correlations between eight different graphical measures, and it is observed that most of the graphical measures are highly correlated.
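
For readers unfamiliar with the TextRank baseline mentioned above, the following is a minimal sketch of weighted PageRank-style ranking over a sentence similarity matrix. It illustrates only the baseline idea, not SGATS, its Hindi WordNet semantic graph, or its fourteen graph measures. The function name, the `damping` and `iterations` defaults, and the fixed iteration count in place of a convergence test are assumptions.

```python
def textrank(weights, damping=0.85, iterations=50):
    """Rank nodes of a weighted graph with a PageRank-style power iteration.

    weights: square list of lists of non-negative similarities, where
    weights[j][i] is the weight of the edge from sentence j to sentence i.
    """
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]  # total outgoing weight per node
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            rank = sum(weights[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores
```

Given a matrix in which one sentence is similar to every other sentence, that central sentence receives the highest score, which is the intuition behind using such rankings to pick summary sentences.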
