Document Summarization Using Sentence-Level Semantic Based on Word Embeddings

In the era of information overload, text summarization has become a focus of attention in diverse fields such as question answering systems, intelligence analysis, news recommendation systems, and search results in web search engines. A good document representation is key to any successful summarizer, and learning this representation has become a very active research area in the natural language processing (NLP) field. Traditional approaches mostly fail to deliver a good representation, whereas word embeddings have shown excellent performance in learning it. In this paper, a modified BM25 with word embeddings is used to build sentence vectors from word vectors. The entire document is represented as a set of sentence vectors. Then, the similarity between every pair of sentence vectors is computed. After that, TextRank, a graph-based model, is used to rank the sentences. The summary is generated by picking the top-ranked sentences according to the compression rate. Two well-known datasets, DUC2002 and DUC2004, are used to evaluate the models. The experimental results show that the proposed models perform better overall than the state-of-the-art methods.
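The pipeline described in this abstract (weighted sentence vectors, pairwise similarity, TextRank, top-ranked selection) can be illustrated with a minimal sketch. The abstract does not specify how BM25 is modified, so the code below assumes the standard BM25 term weight with each sentence treated as a document; the function names, the toy `embeddings` dictionary, and the parameter defaults are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bm25_weights(token_lists, k1=1.5, b=0.75):
    """Standard BM25 term weights per sentence (assumption: the paper's
    modified BM25 is not described in the abstract)."""
    n = len(token_lists)
    avg_len = sum(len(s) for s in token_lists) / n
    df = Counter(t for s in token_lists for t in set(s))
    weights = []
    for s in token_lists:
        w = {}
        for t, f in Counter(s).items():
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            w[t] = idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(s) / avg_len))
        weights.append(w)
    return weights


def sentence_vector(weights, embeddings, dim):
    """Weighted sum of word vectors; unknown words contribute nothing."""
    vec = [0.0] * dim
    for t, w in weights.items():
        for i, x in enumerate(embeddings.get(t, [0.0] * dim)):
            vec[i] += w * x
    return vec


def cosine(u, v):
    dot = sum(a * c for a, c in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(c * c for c in v))
    return dot / (nu * nv) if nu and nv else 0.0


def textrank(sim, damping=0.85, iters=50):
    """Power iteration over a weighted, undirected sentence graph."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in sim]
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, embeddings, dim, compression_rate=0.5):
    """Return the top-ranked sentences, kept in their original order."""
    if not sentences:
        return []
    token_lists = [s.lower().split() for s in sentences]
    vecs = [sentence_vector(w, embeddings, dim) for w in bm25_weights(token_lists)]
    n = len(sentences)
    # Negative cosine values are clipped to 0 so edge weights stay non-negative.
    sim = [[max(cosine(vecs[i], vecs[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    scores = textrank(sim)
    k = max(1, int(n * compression_rate))
    top = sorted(sorted(range(n), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in top]
```

In practice the toy dictionary would be replaced by pretrained word vectors, and the compression rate would be set by the evaluation protocol; both choices are outside what the abstract states.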

BUILD KNOWLEDGE GRAPH FROM HETEROGENEOUS DOCUMENTS

Journal of Science and Technology - IUH ◽

10.46242/jst-iuh.v47i05.761 ◽

2021 ◽

Vol 47 (05) ◽

Author(s):

NGUYỄN CHÍ HIẾU

Keyword(s):

Information Retrieval ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Question Answering ◽

Semantic Analysis ◽

Knowledge Graph ◽

Question Answering Systems ◽

Knowledge Graphs

In recent years, knowledge graphs have been applied in many fields, such as search engines, semantic analysis, and question answering. However, there are many obstacles to building knowledge graphs, including methodologies, data, and tools. This paper introduces a novel methodology for building a knowledge graph from heterogeneous documents. We use Natural Language Processing and deep learning methods to build this graph. The knowledge graph can be used in question answering systems and information retrieval, especially in the computing domain.

Multi-neural network-based sentiment analysis of food reviews based on character and word embeddings

International Journal of Electrical Engineering Education ◽

10.1177/0020720920928492 ◽

2020 ◽

pp. 002072092092849

Author(s):

Yong Li ◽

Qingyu Jin ◽

Min Zuo ◽

Haisheng Li ◽

Xiaojun Yang ◽

...

Keyword(s):

Neural Network ◽

Sentiment Analysis ◽

Language Processing ◽

Chinese Character ◽

Semantic Features ◽

Word Embeddings ◽

Emotional Information ◽

Related Sequence ◽

Active Research ◽

Multi Neural Network

Sentiment analysis has become one of the most active research topics in natural language processing in recent years. However, present deep learning models cannot fully and effectively use emotional information. A single Chinese character has different meanings in different words, so character embeddings are combined with word embeddings to extract more precise meaning. In this paper, both single Chinese characters and words are used as input units for training. Based on BLSTM, an attention mechanism built on the semantics of food-domain vocabulary is introduced to extract distance-related sequence semantic features. A CNN then performs sentiment classification on these sequence semantic features. On this basis, a multi-neural-network model for sentiment information extraction and analysis is proposed. Experiments show that the model performs well in sentiment analysis, achieving high accuracy and F value.

An Empirical Study of Content Understanding in Conversational Question Answering

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6257 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7578-7585

Author(s):

Ting-Rui Chiang ◽

Hao-Tong Ye ◽

Yun-Nung Chen

Keyword(s):

Natural Language Processing ◽

Empirical Study ◽

Language Processing ◽

Question Answering ◽

Source Code ◽

Content Understanding ◽

Question Answering Systems ◽

Benchmark Datasets ◽

Context Free ◽

Answering Questions

Alongside the large body of work on context-free question answering systems, conversational question answering models are an emerging trend in the natural language processing field. Thanks to recently collected datasets, including QuAC and CoQA, there has been more work on conversational question answering, and recent work has achieved competitive performance on both datasets. However, to the best of our knowledge, two important questions for conversational comprehension research have not been well studied: 1) How well can the benchmark datasets reflect models' content understanding? 2) Do the models make good use of the conversation content when answering questions? To investigate these questions, we design different training settings, testing settings, and an attack to verify the models' capability of content understanding on QuAC and CoQA. The experimental results indicate some potential hazards in the benchmark datasets, QuAC and CoQA, for conversational comprehension research. Our analysis also sheds light on both what models may learn and how datasets may bias the models. With deep investigation of the task, it is believed that this work can benefit the future progress of conversational comprehension. The source code is available at https://github.com/MiuLab/CQA-Study.

Super Agent Chatbot “3S” as an Information Medium Using the Natural Language Processing (NLP) Method

JURNAL TEKNOLOGI DAN OPEN SOURCE ◽

10.36378/jtos.v2i1.144 ◽

2019 ◽

Vol 2 (1) ◽

pp. 53-64

Author(s):

Herwin H Herwin

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Web Site ◽

Question Answering ◽

Question Answering Systems ◽

Portal Website

STMIK Amik Riau has a portal on its website, http://www.sar.ac.id, which serves as a medium for disseminating information to the academic community and stakeholders. Over the last 3 months the portal averaged 150 visits per day, with an increase during student admissions in each academic year. This indicates growing public interest in information about STMIK Amik Riau. Unfortunately, use of the portal is still one-way, from STMIK Amik Riau to stakeholders and the public, and not the reverse. Stakeholders communicate with the institution about the portal's content through social media, which is not integrated with the website. Suggestions, corrections, responses, and other communication likewise go through social media. To date, visitors to the portal, whether the general public or stakeholders, cannot be detected at the time of their visit and so cannot be greeted according to the “3S” philosophy, even though members of the public who have visited are a potential market to be educated. Visitors to the portal are politely greeted by the system, followed by direct communication: a machine stands ready to offer a greeting and answer every question visitors ask. This research aims to build a chatbot that can communicate with website visitors. The resulting chatbot is named STMIK Amik Riau Intelligence Virtual Information, abbreviated SILVI. The chatbot is based on Question Answering Systems (QAS) and works with an algorithm that measures the similarity between two texts. The research produced a ready-to-use application, named SILVI, that can communicate with website visitors. The chatbot optimizes communication so that visitors, without noticing, still treat their conversation partner as the staff member responsible for the relevant duties and functions.
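A chatbot that answers by measuring the similarity between two texts can be sketched as follows. The abstract does not name the similarity algorithm, so this example assumes Jaccard similarity over word sets; the `FAQ` entries, the placeholder answers, the threshold, and the function names are all hypothetical and are not taken from SILVI.

```python
import re

# Hypothetical question/answer pairs; the answers are placeholders only.
FAQ = {
    "when does student admission open": "ANSWER_ADMISSION",
    "where is the campus located": "ANSWER_LOCATION",
}


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the word sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def answer(question, faq=FAQ, threshold=0.3, fallback="ANSWER_UNKNOWN"):
    """Return the answer of the most similar stored question, or a fallback
    when no stored question is similar enough (faq must be non-empty)."""
    best = max(faq, key=lambda q: jaccard(question, q))
    return faq[best] if jaccard(question, best) >= threshold else fallback
```

The threshold keeps the bot from answering unrelated questions with the nearest stored entry; a real deployment would tune it against logged visitor questions.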

BUILDING QUESTION ANSWERING SYSTEM BASED ON COMPUTING DOMAIN ONTOLOGY

Journal of Science and Technology - IUH ◽

10.46242/jst-iuh.v38i02.294 ◽

2020 ◽

Vol 38 (02) ◽

Author(s):

TẠ DUY CÔNG CHIẾN

Keyword(s):

Language Processing ◽

Digital Libraries ◽

Question Answering ◽

Domain Ontology ◽

Text Documents ◽

Question Answering System ◽

Domain Specific ◽

Sql Database ◽

Question Answering Systems ◽

Education Business

In recent years, question answering systems have been applied in many different fields, such as education, business, and surveys. The purpose of these systems is to automatically answer users' questions or queries about some problem. This paper introduces a question answering system built on a domain-specific ontology. This ontology, which contains the data and vocabularies related to the computing domain, is built from text documents of the ACM Digital Library. Consequently, the system only answers questions pertaining to information technology domains such as databases, networks, and machine learning. We use Natural Language Processing and domain ontology methods to build this system. To increase performance, we use a graph database to store the computing ontology and apply a NoSQL database to query the ontology data.

Building Graph for Events and Time in Natural Language Text

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8419.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 581-586

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Question Answering ◽

Relation Extraction ◽

Event Extraction ◽

Event Time ◽

Time Graph ◽

Question Answering Systems

Events and time are two key notions in natural language processing; because of the many event-oriented tasks, they have become essential in information extraction. In natural language processing and information extraction or retrieval, events and time underpin several applications, such as text summarization, document summarization, and question answering systems. In this paper, we present the event-time graph as a new way of structuring event- and time-based information from text. In this event-time graph, nodes are events, whereas edges represent the temporal and co-reference relations between events. Much previous research in natural language processing focused on individual extraction tasks, and in a domain-specific way; in this work, we present both the extraction and the representation of the relationships between events and time through event-time graph construction. Our overall system is a three-step process that performs event extraction, time extraction, and relation extraction and representation. Each step is at a performance level comparable with the state of the art. We present event extraction on the MUC corpus, annotated with event mentions, on which we train and evaluate our model. Next, we present time extraction, with the time model tested on several news articles from a Wikipedia corpus. We then represent event-time relations by constructing event-time graphs. Finally, we evaluate the overall quality of the event graphs with the evaluation metrics and summarize the observations of the entire work.
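The graph described in this abstract, with events as nodes and temporal or co-reference relations as edges, can be sketched as a small data structure. This is an illustration under assumptions: the class names, the relation labels, and the toy events are hypothetical, since the abstract names only the two relation families and not a concrete label set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed label set: three temporal relations plus co-reference.
RELATIONS = {"before", "after", "simultaneous", "coreference"}


@dataclass(frozen=True)
class Event:
    event_id: str
    trigger: str                # word or phrase that signals the event
    time: Optional[str] = None  # extracted time expression, if any


@dataclass
class EventTimeGraph:
    events: Dict[str, Event] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_event(self, event: Event) -> None:
        self.events[event.event_id] = event

    def add_relation(self, source: str, relation: str, target: str) -> None:
        """Add a directed, labelled edge between two known events."""
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if source not in self.events or target not in self.events:
            raise KeyError("both events must be added before linking them")
        self.edges.append((source, relation, target))

    def related(self, event_id: str, relation: str) -> List[str]:
        """Ids of events reachable from event_id via the given relation."""
        return [t for s, r, t in self.edges if s == event_id and r == relation]
```

Keeping edges as labelled triples makes it straightforward to export the graph to a graph library or database later, without committing to one here.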

Using a Dialogue Manager to Improve Semantic Web Search

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2016010104 ◽

2016 ◽

Vol 12 (1) ◽

pp. 62-78 ◽

Cited By ~ 2

Author(s):

Dora Melo ◽

Irene Pimenta Rodrigues ◽

Vitor Beires Nogueira

Keyword(s):

Semantic Web ◽

Natural Language ◽

Knowledge Base ◽

System Performance ◽

Web Search ◽

Question Answering ◽

Strategies for Improving the Efficacy of Fusion Question Answering Systems

Principles and Applications of Business Intelligence Research ◽

10.4018/978-1-4666-2650-8.ch013 ◽

2012 ◽

pp. 181-198

Author(s):

José Antonio Robles-Flores ◽

Gregory Schymik ◽

Julie Smith-David ◽

Robert St. Louis

Keyword(s):

Search Engines ◽

Web Site ◽

Web Search ◽

Question Answering ◽

Irrelevant Information ◽

Web Pages ◽

Crowd Sourcing ◽

Complete Answer ◽

Question Answering Systems

Web search engines typically retrieve a large number of web pages and overload business analysts with irrelevant information. One approach that has been proposed for overcoming some of these problems is automated Question Answering (QA). This paper describes a case study that was designed to determine the efficacy of QA systems for generating answers to original, fusion, list questions (questions that have not previously been asked and answered, questions for which the answer cannot be found on a single web site, and questions for which the answer is a list of items). Results indicate that QA algorithms are not very good at producing complete answer lists and that searchers are not very good at constructing answer lists from snippets. These findings indicate a need for QA research to focus on crowd sourcing answer lists and improving output format.

A semi-automatic approach to construct Vietnamese ontology from online text

The International Review of Research in Open and Distributed Learning ◽

10.19173/irrodl.v13i5.1250 ◽

2012 ◽

Vol 13 (5) ◽

pp. 148 ◽

Cited By ~ 3

Author(s):

Bao-An Nguyen ◽

Don-Lin Yang

Keyword(s):

Language Processing ◽

Question Answering ◽

Expert Knowledge ◽

Relation Extraction ◽

Knowledge Bases ◽

Instructional Materials ◽

Formal Representation ◽

Text Documents ◽

Ontology Construction ◽

Question Answering Systems

An ontology is an effective formal representation of knowledge used commonly in artificial intelligence, semantic web, software engineering, and information retrieval. In open and distance learning, ontologies are used as knowledge bases for e-learning supplements, educational recommenders, and question answering systems that support students with much needed resources. In such systems, ontology construction is one of the most important phases. Since there are abundant documents on the Internet, useful learning materials can be acquired openly with the use of an ontology. However, due to the lack of system support for ontology construction, it is difficult to construct self-instructional materials for Vietnamese people. In general, the cost of manual acquisition of ontologies from domain documents and expert knowledge is too high. Therefore, we present a support system for Vietnamese ontology construction using pattern-based mechanisms to discover Vietnamese concepts and conceptual relations from Vietnamese text documents. In this system, we use the combination of statistics-based, data mining, and Vietnamese natural language processing methods to develop concept and conceptual relation extraction algorithms to discover knowledge from Vietnamese text documents. From the experiments, we show that our approach provides a feasible solution to build Vietnamese ontologies used for supporting systems in education.

Instructor-aided asynchronous question answering system for online education and distance learning

The International Review of Research in Open and Distributed Learning ◽

10.19173/irrodl.v13i5.1269 ◽

2012 ◽

Vol 13 (5) ◽

pp. 102 ◽

Cited By ~ 2

Author(s):

Dunwei Wen ◽

John Cuzzola ◽

Lorna Brown ◽

Dr. Kinshuk

Keyword(s):

Natural Language Processing ◽

Distance Learning ◽

Online Education ◽

Language Processing ◽

Question Answering ◽

Prototype System ◽

Learning Situation ◽

Question Answering System ◽

Question Answering Systems

Question answering systems have frequently been explored for educational use. However, their value was somewhat limited due to the quality of the answers returned to the student. Recent question answering (QA) research has started to incorporate deep natural language processing (NLP) in order to improve these answers. However, current NLP technology involves intensive computing and thus it is hard to meet the real-time demand of traditional search. This paper introduces a question answering (QA) system particularly suited for delayed-answered questions that are typical in certain asynchronous online and distance learning settings. We exploit the communication delay between student and instructor and propose a solution that integrates into an organization’s existing learning management system. We present how our system fits into an online and distance learning situation and how it can better assist supporting students. The prototype system and its running results show the perspective and potential of this research.
