term frequency
Recently Published Documents


TOTAL DOCUMENTS

485
(FIVE YEARS 136)

H-INDEX

28
(FIVE YEARS 4)

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

The traditional frequency-based approach to creating multi-document extractive summaries ranks sentences by scores computed by summing the TF*IDF weights of the words they contain. In this approach, TF (term frequency) is calculated from how frequently a term (word) occurs in the input, and TF calculated in this way does not take into account the semantic relations among terms. In this paper, we propose methods that exploit semantic term relations to improve the sentence ranking and redundancy removal steps of a summarization system. Our proposed summarization system has been tested on the DUC 2003 and DUC 2004 benchmark multi-document summarization datasets. The experimental results reveal that the performance of our multi-document text summarizer is significantly improved when the distributional term similarity measure is used for finding semantic term relations. Our multi-document text summarizer also outperforms some well-known summarization baselines to which it is compared.
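
The baseline described in this abstract (scoring each sentence by the sum of the TF*IDF weights of its words) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the tokenizer, the `+ 1.0` IDF smoothing, and the function names `tokenize` and `rank_sentences` are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def rank_sentences(documents):
    """Rank sentences by the sum of TF*IDF weights of their words.

    documents: list of documents, each given as a list of sentence strings.
    Returns (score, sentence) pairs sorted from highest to lowest score.
    """
    doc_tokens = [[tok for s in doc for tok in tokenize(s)] for doc in documents]
    # TF: how often each term occurs in the whole input.
    tf = Counter(tok for toks in doc_tokens for tok in toks)
    # DF: number of documents that contain each term.
    df = Counter(tok for toks in doc_tokens for tok in set(toks))
    n_docs = len(documents)
    # Assumed IDF variant; "+ 1.0" keeps terms found in every document above zero.
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in df}

    scored = []
    for doc in documents:
        for sentence in doc:
            score = sum(tf[t] * idf[t] for t in tokenize(sentence))
            scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Because the score is a plain sum, longer sentences tend to rank higher, and no semantic relation between terms is used. That second limitation is the one the abstract sets out to address.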


2021 ◽  
Vol 9 ◽  
Author(s):  
Dejian Yang ◽  
Shun Sang ◽  
Xinsong Zhang

The kinetic energy stored in a doubly-fed induction generator (DFIG)-based wind farm can be utilized to sustain the dynamic system frequency. However, difficulties arise in determining the control gain needed to effectively improve the frequency nadir and smoothly return to maximum power point tracking (MPPT) operation. This paper presents a two-phase short-term frequency response (STFR) scheme that boosts the frequency nadir and minimizes the second drop in the system frequency based on a piecewise control gain. To achieve the first goal, a constant control coefficient, which is determined according to the initial operating conditions of the DFIG-based wind farm, is employed until the frequency nadir occurs. To achieve the second goal, a time-varying control coefficient facilitates a smooth return to MPPT operation. The effectiveness of the proposed two-phase STFR scheme is verified under various wind power penetration levels, wind speeds, and disturbances. The results reveal that the scheme improves the frequency nadir while returning smoothly to MPPT operation and minimizing the second drop in the system frequency.


Author(s):  
E. Sri Vishva ◽  
D. Aju

Phishing is a common cybercrime in which intruders or hackers deceive naive and trusting individuals into revealing their unique and sensitive information through fictitious websites. The primary intention of this kind of cybercrime is to gain access to the recipients' personal or classified information. The obtained data comprises information that can be used to identify an individual. The stolen personal or sensitive information is commonly sold on online dark markets and subsequently bought by identity thieves. Depending on the sensitivity and importance of the stolen information, the price of a single piece of it varies from a few dollars to thousands of dollars. Machine learning (ML) and deep learning (DL) are powerful methods for analysing and countering these phishing attacks. A machine learning based phishing detection system is proposed to protect websites and users from such attacks. To improve the results, the TF-IDF (Term Frequency-Inverse Document Frequency) value of webpages is employed within the system. ML methods such as LR (Logistic Regression), RF (Random Forest), SVM (Support Vector Machine), NB (Naive Bayes) and SGD (Stochastic Gradient Descent) are applied for training and testing on the obtained dataset. As a result, a robust phishing website detection system is developed with 90.68% accuracy.


Author(s):  
Syaifulloh Amien Pandega Perdana ◽  
Teguh Bharata Aji ◽  
Ridi Ferdiana

Customer reviews are opinions on the quality of goods or services as experienced by consumers. They contain information that is useful to both consumers and providers of goods or services. The availability of large numbers of customer reviews on websites calls for a framework that extracts sentiment automatically. A single customer review often covers many aspects, so Aspect Based Sentiment Analysis (ABSA) must be used to determine the polarity of each aspect. One important task in ABSA is Aspect Category Detection. Machine learning methods for Aspect Category Detection have been applied widely in the English-language domain, but only rarely in the Indonesian-language domain. This paper compares the performance of three machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on Indonesian-language customer reviews using Term Frequency-Inverse Document Frequency (TF-IDF) as the term weighting. The results show that RF outperforms NB and SVM in three different domains, namely restaurants, hotels, and e-commerce, with f1-scores of 84.3%, 85.7%, and 89.3%, respectively.


Author(s):  
Silmi Fauziati ◽  
Adhistya Erna Permanasari ◽  
Indriana Hidayah ◽  
Eko Wahyu Nugroho ◽  
Bobby Rian Dewangga

This paper aims to improve the performance of a scoring system for short-answer essay tests. The improvement is achieved by adding simple linear regression to the combined output of the cosine similarity method (with term weighting based on Term Frequency-Inverse Document Frequency (TF-IDF)) and a word-matching mechanism. The linear regression uses the short-answer score (the result of cosine similarity and word matching) as the regressor variable. To assess the effectiveness of the proposed scoring system, its performance is measured relative to manual scores given by lecturers. Before linear regression was applied, the scoring system tended to output higher scores than the lecturers' manual scores, that is, the scores were biased. Linear regression improves the scoring system by significantly reducing this bias, so that the scores it gives tend to be neither higher nor lower than the lecturers' manual scores. The finding that scoring bias can be reduced significantly with a method as simple as linear regression is expected to help accelerate the adoption of automated scoring systems for essay tests in online learning technologies such as e-learning.
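
The two building blocks named in this abstract, cosine similarity over term-weight vectors and simple linear regression used to correct scoring bias, might be sketched as below. The abstract does not give implementation details, so the dictionary-based vectors, the single-regressor setup, and the function names are assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fit_simple_linear_regression(system_scores, manual_scores):
    """Least-squares fit of manual = intercept + slope * system."""
    n = len(system_scores)
    mean_x = sum(system_scores) / n
    mean_y = sum(manual_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in system_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(system_scores, manual_scores))
    slope = sxy / sxx  # assumes the system scores are not all identical
    intercept = mean_y - slope * mean_x
    return intercept, slope


def calibrate(system_score, intercept, slope):
    """Map a raw system score onto the lecturers' scale."""
    return intercept + slope * system_score
```

In this sketch the fitted line is learned from pairs of system scores and lecturer scores, and `calibrate` is then applied to new system scores so that they no longer sit systematically above the manual ones.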


2021 ◽  
Author(s):  
Moon-Ju Jeon ◽  
Sung-Man Bae

BACKGROUND: Panic attacks have different clinical characteristics among individuals and countries, and the time, place, and symptoms that characterize them are not clearly predictable. OBJECTIVE: This study aimed to analyze crucial keywords related to panic disorder and identify various clinical characteristics of panic attacks. METHODS: We collected 8,728 Twitter posts related to panic disorder from January 1 to December 4, 2020. First, we analyzed crucial and co-occurring keywords related to panic disorder. For this, term frequency, Term Frequency-Inverse Document Frequency, degree centrality, and N-gram analyses were conducted using RStudio and TEXTOM and visualized as word clouds. We also classified the term frequency results for panic disorder into physical symptoms, triggers, time, place, affect, pathology, person, and coping. RESULTS: First, depression, drugs, respiration, and stress were keywords related to panic disorder. Next, hyperventilation, palpitations, and shaking were common physical symptoms. Stress, sound, trauma, and coffee ranked high as triggering situations. In terms of time, morning, night, and dawn were mentioned most often. Homes, schools, subways, and companies ranked high as places of occurrence. Regarding affect, fear, tears, and embarrassment were common. Anxiety and depression ranked high in terms of pathology. Finally, drugs and hospitals ranked high in terms of coping. CONCLUSIONS: These results help to understand the main characteristics of panic disorder and various aspects of unexpected panic attacks, and they are expected to provide a basis for identifying the characteristic clinical aspects of panic disorder among Koreans.
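
The study ran its term frequency and N-gram analyses in RStudio and TEXTOM. Purely as an illustration of what those two counts measure, the sketch below computes them in Python; the whitespace tokenization and the function name `term_and_bigram_counts` are assumptions and do not reproduce the authors' pipeline.

```python
from collections import Counter


def term_and_bigram_counts(posts):
    """Count single terms and adjacent word pairs (bigrams) across posts."""
    term_counts = Counter()
    bigram_counts = Counter()
    for post in posts:
        tokens = post.lower().split()  # naive whitespace tokenization
        term_counts.update(tokens)
        # Pair each token with its successor within the same post.
        bigram_counts.update(zip(tokens, tokens[1:]))
    return term_counts, bigram_counts
```

Calling `term_counts.most_common(10)` on the result would list the ten most frequent terms, which is the kind of ranking a word cloud visualizes.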


2021 ◽  
Vol 7 (2) ◽  
pp. 153
Author(s):  
Yunita Maulidia Sari ◽  
Nenden Siti Fatonah

Rapid technological development makes it easier for us to find the information we need. Problems arise when that information is very abundant. The more information a module contains, the longer its text, and the more time it takes to grasp its core information. One way to obtain the core information of an entire module quickly is to read its summary. A quick way to obtain a summary of a document is automatic text summarization. Automatic text summarization produces a text from one or more documents that conveys the important information of the original source documents and is no longer than half the length of the originals. This study aims to produce automatic text summaries of Indonesian-language learning modules and to determine the accuracy of automatic text summarization using the Cross Latent Semantic Analysis (CLSA) method. The data used in this study comprise 10 learning module files from lecturers at Universitas Mercu Buana, 5 in .docx format and 5 in .pdf format. This study applies the Term Frequency-Inverse Document Frequency (TF-IDF) method for term weighting and the Cross Latent Semantic Analysis (CLSA) method for text summarization. Accuracy on the learning modules was tested by comparing manual summaries written by humans with the system's summaries. The highest average f-measure, precision, and recall were obtained at a compression rate of 20%, with values of 0.3853, 0.432, and 0.3715, respectively.


2021 ◽  
Author(s):  
Chuanxiao Li ◽  
Wenqiang Li ◽  
Zhong Tang ◽  
Song Li ◽  
Hai Xiang

As a vital step of the text classification (TC) task, the assignment of term weights has a great influence on TC performance. Currently, many term weighting schemes can be utilized, such as term frequency-inverse document frequency (TF-IDF) and term frequency-relevance frequency (TF-RF), and they all consist of a local part (TF) and a global part (e.g., IDF, RF). However, most of these schemes apply logarithmic processing to their respective global parts, and it is natural to ask whether logarithmic processing suits all of these schemes. In practice, because logarithmic processing changes the ratio of local weight to global weight, a given term weighting scheme usually shows diverse text classification results on different text sets, which indicates poor robustness. To explore how logarithmic processing of the global weight influences the classification results of term weighting schemes, TF-RF is selected as the representative because it achieves better performance among the schemes that adopt logarithmic processing. Then, two propositions, along with corresponding methods, about the relation between the TF part and the RF part are proposed based on TF-RF. In addition, two groups of experiments are conducted on the two methods. The first group of experiments shows that one method (denoted TF-ERF) improves performance more than the other (denoted ETF-RF). The second group of experiments shows that TF-ERF not only outperforms TF-RF but also obtains better performance than other existing term weighting schemes.
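
To make the "local part times logarithmic global part" structure concrete, the sketch below computes TF-IDF and TF-RF weights. The abstract gives no formulas, so the definitions used here are assumptions taken from how these schemes are commonly stated in the literature: IDF as log(N / df), and RF as log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category documents containing the term. TF-ERF and ETF-RF are not defined in the abstract and are not reproduced here.

```python
import math


def tf_idf(tf, n_docs, df):
    """Local part (tf) times a logarithmic global part (idf).

    n_docs: total number of documents; df: documents containing the term.
    """
    return tf * math.log(n_docs / df)


def tf_rf(tf, pos_docs_with_term, neg_docs_with_term):
    """Local part (tf) times relevance frequency, another logarithmic global part.

    Assumed definition: rf = log2(2 + a / max(1, c)).
    """
    a = pos_docs_with_term
    c = neg_docs_with_term
    return tf * math.log2(2 + a / max(1, c))
```

For example, a term with tf = 3 that appears in 6 positive-category documents and no negative-category documents gets rf = log2(8) = 3 under this definition, so its TF-RF weight is 9. The logarithm compresses the global part relative to the raw count ratio, which is the effect on the local-to-global balance that the abstract investigates.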

