Reducing the Deterioration of Sentiment Analysis Results Due to the Time Impact

Information ◽  
2018 ◽  
Vol 9 (8) ◽  
pp. 184 ◽  
Author(s):  
Yuliya Rubtsova

The research identifies and substantiates the problem of quality deterioration in the sentiment classification of text collections that are identical in composition and characteristics but staggered over time. It is shown that the quality of sentiment classification can drop by up to 15% in terms of the F-measure over a year and a half. This paper presents three approaches to improving sentiment classification of continuously updated text collections in Russian: using a weighting scheme with linear computational complexity, adding lexicons of emotional vocabulary to the feature space, and using distributed word representations. The methods are compared in experiments on sufficiently representative text collections, and it is shown which method is most applicable in which cases. It is shown that the suggested approaches can reduce the deterioration of sentiment classification results for collections staggered over time.
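As a rough illustration of what adding an emotional lexicon to the feature space can look like, the sketch below appends two lexicon-hit counts to a plain bag-of-words vector. The lexicons, the function name `featurize`, and the English example words are all hypothetical; the paper's actual Russian lexicons, features and weighting scheme are not given in the abstract.

```python
from collections import Counter

# Hypothetical toy lexicons (assumption): the paper's real lexicons are not shown here.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def featurize(text, vocabulary):
    """Return bag-of-words counts over `vocabulary` plus two lexicon-hit counts."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    # Ordinary bag-of-words part of the feature vector.
    bow = [counts[word] for word in vocabulary]
    # Extra features: how many tokens fall into each sentiment lexicon.
    pos_hits = sum(1 for t in tokens if t in POSITIVE)
    neg_hits = sum(1 for t in tokens if t in NEGATIVE)
    return bow + [pos_hits, neg_hits]


print(featurize("I love this great phone", ["phone", "battery"]))
```

The two appended counts do not depend on the collection's vocabulary, so they can be computed the same way for texts from any period.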

Author(s):  
Pascal Cuxac ◽  
Jean-Charles Lamirel ◽  
Maha Ghribi

This paper presents an alternative approach to evaluating the quality of unsupervised text classification, based on unsupervised recall, precision and F-measure criteria that exploit the descriptors associated with the classes. The behaviour of classical criteria is compared experimentally with that of our approach on bibliographic data.


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4522
Author(s):  
Kai Chen ◽  
Rabea Jamil Mahfoud ◽  
Yonghui Sun ◽  
Dongliang Nan ◽  
Kaike Wang ◽  
...  

During the operation and maintenance of secondary devices in smart substations, a wealth of defect texts containing equipment state information is generated. To overcome the low efficiency and low accuracy of manual power text classification and mining, and taking into account the characteristics of power equipment defect texts, a defect text mining method for secondary devices in smart substations is proposed. It integrates the global vectors for word representation (GloVe) method and the attention-based bidirectional long short-term memory (BiLSTM-Attention) method in one model. First, the characteristics of the defect texts are analyzed and the texts are preprocessed to improve their quality. Then, the defect texts are segmented into words, and the words are mapped to a high-dimensional feature space using the GloVe model to form distributed word vectors. Finally, a text classification model based on BiLSTM-Attention is proposed to classify the defect texts of secondary devices. Precision, recall and F1-score are selected as evaluation indicators, and the model is compared with traditional machine learning and deep learning models. A case study shows that the BiLSTM-Attention model performs better and can achieve intelligent, accurate and efficient classification of secondary device defect texts. It can help operation and maintenance personnel make scientifically grounded maintenance decisions for secondary devices and improve the level of intelligent equipment management.
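The three evaluation indicators named above can be computed per class from true and predicted labels. The following is a minimal sketch using the standard definitions; the function name and the example labels are hypothetical and are not taken from the paper's dataset.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical defect severity labels, for illustration only.
y_true = ["critical", "critical", "general", "general"]
y_pred = ["critical", "general", "general", "general"]
print(precision_recall_f1(y_true, y_pred, positive="critical"))
```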


Author(s):  
Adam Piotr Idczak

It is estimated that approximately 80% of all data gathered by companies are text documents. This article is devoted to one of the most common problems in text mining, i.e., text classification in sentiment analysis, which focuses on determining a document's sentiment. The lack of a defined structure in text makes this problem more challenging and has led to the development of various techniques for determining a document's sentiment. This paper presents a comparative analysis of two sentiment classification methods: the naive Bayes classifier and logistic regression. The analysed texts are written in Polish and come from banks. Classification was conducted using the bag-of-n-grams approach, in which a text document is represented as a set of terms, each consisting of n words. The results show that logistic regression performed better.
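The bag-of-n-grams representation described above can be sketched in a few lines. This is an illustrative sketch only: whitespace tokenization and lowercasing are assumptions, the function name is hypothetical, and the example uses term counts rather than a pure set so that repeated terms stay visible.

```python
from collections import Counter


def bag_of_ngrams(text, n):
    """Represent a document as counts of terms made of n consecutive words."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(ngrams)


print(bag_of_ngrams("the bank is the bank", 2))
```

With n = 1 this reduces to the ordinary bag-of-words model; larger n keeps some word order at the cost of a much larger term vocabulary.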


Author(s):  
José C. de Andrade ◽  
André L. D. Goneli ◽  
Cesar P. Hartmann Filho ◽  
Thalita M. S. de Azambuja ◽  
Valdenise C. Barboza

The objective of this study was to evaluate the quality of second-crop corn harvested with different moisture contents as a function of time before drying. The corn grains were harvested with moisture contents of 28.5, 22.4, 21 and 19%, and subjected to temporary storage for ten days, simulating the time between harvesting and drying. Quality was subsequently evaluated every two days, based on the commercial classification of the grains, sanity test and dry bulk density. The results showed that the increase in moisture content at harvest affects the physical and sanitary quality of second-crop corn, and that this effect is aggravated over time; the moisture content of 19% is the one that least affects grain quality during the ten days of temporary storage.


Author(s):  
E. V. Pugin ◽  
A. L. Zhiznyakov

Processing of image sequences is currently a very active research area, as confirmed by the large body of research in the field. Image sequence processing and pattern recognition have become feasible thanks to increased computing power and better photo and video cameras. Feature extraction is one of the main steps in image processing and pattern recognition. This paper presents a novel classification of features of image sequences. The proposed classification has three groups: 1) features of a single image, 2) features of an image sequence, 3) semantic features of an observed scene. The first group includes features extracted from a single image. The second group consists of features of any kind of image sequence. The third group contains semantic features. The reverse feature clarification method is an iterative method in which, at each iteration, higher-level features are used to extract lower-level features more precisely. The proposed classification decomposes the source feature space into several groups, and the reverse feature clarification method makes it possible to increase the quality of image processing over the iterations.


2020 ◽  
Vol 36 (4) ◽  
pp. 1095-1128
Author(s):  
Onno Hoffmeister

This study analyses the extent to which the classification of countries as developing corresponds with their actual development level. It tracks the evolution of the development status classification schemes (DSCSs) of international organisations over time, identifies three broad concepts of a developing country, based on the social sciences literature, and analyses the degree of correspondence between classifications and concepts, based on eight indicators. The results suggest that development status is a fairly accurate measure of development. All DSCSs strongly correspond with all indicators analysed. Over time, the outcomes of DSCSs have become increasingly heterogeneous. As a result, different classification schemes match different concepts. Schemes of a first generation, which emerged before the 1990s and which nominate countries for classes, correspond mainly with concepts focusing on difficult starting points or an early stage in systemic transition, whereas schemes of a second generation, set up in the 1990s, which classify countries based on specified criteria, typically reflect a welfare-based concept. The paper argues that the growing heterogeneity of DSCSs and deficits in their documentation negatively impact the quality of international official statistics. It makes proposals for the further development of DSCSs, also in the context of the 2030 Agenda for Sustainable Development.


2020 ◽  
Vol 64 (4) ◽  
pp. 150-167
Author(s):  
Agata Surówka ◽  

In the economy of the 21st century, human capital is one of the most important resources and factors determining the strength and competitiveness of regions. Its role in regional development has been recognized in the policy of the European Union. The article presents the results of research into the diversity of human capital in Poland. The category was described using fifteen indicators, selected on the basis of the availability of data across voivodships and their comparability over time. The aim was to verify the diversity of the human capital of voivodships in Poland within the regional structure of the country. The research included an attempt to measure and take into account changes in their diversity in dynamic terms (2007–2018). The research tool was factor analysis. The results allowed the assessment and observation of differences in the classification of voivodship groups. The schooling coefficients of individual types of schools have an impact on the grouping and diversity of similar voivodships in terms of human capital. Demographic processes are particularly unfavorable in the Świętokrzyskie voivodship. The dynamic approach allows us to claim that groups of objects are characterized by a different specificity. The most favorable quality of human capital was found in the Mazowieckie voivodship. It was observed that the voivodships in Poland also differentiate the indicators characterizing the working and post-working age population. The goal is characterized by variability in time. Given the dynamic dimensions of the category, achieving them in a different way seems very important.


2021 ◽  
Vol 7 ◽  
pp. e639
Author(s):  
Chunlei Li ◽  
Huanyu Li ◽  
Zhoufeng Liu ◽  
Bicao Li ◽  
Yun Huang

Seed purity directly affects the quality of seed breeding and of subsequent processing products. Seed sorting based on machine vision provides an effective solution to this problem. Deep learning technology, particularly convolutional neural networks (CNNs), has exhibited impressive performance in image recognition and classification and has been proven applicable to seed sorting. However, the huge computational complexity and massive storage requirements of CNNs make it a great challenge to deploy them in real-time applications, especially on devices with limited resources. In this study, a rapid and highly efficient lightweight CNN based on visual attention, namely SeedSortNet, is proposed for seed sorting. First, a dual-branch lightweight feature extraction module, Shield-block, is designed by performing identity mapping, spatial transformation at higher dimensions and different receptive field modeling; it can thus alleviate information loss and effectively characterize multi-scale features while using fewer parameters and lower computational complexity. In the down-sampling layer, the traditional MaxPool is replaced with MaxBlurPool to improve the shift invariance of the network. In addition, an extremely lightweight sub-feature space attention module (SFSAM) is presented to selectively emphasize fine-grained features and suppress the interference of complex backgrounds. Experimental results show that SeedSortNet achieves accuracy rates of 97.33% and 99.56% on the maize seed dataset and sunflower seed dataset, respectively, and outperforms the mainstream lightweight networks (MobileNetv2, ShuffleNetv2, etc.) at similar computational costs, with only 0.400M parameters (vs. 4.06M, 5.40M).
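To give a feel for the MaxBlurPool idea mentioned above, the sketch below shows a one-dimensional version of the commonly described formulation: a dense max over neighbouring samples (stride 1), followed by a small blur filter and subsampling with stride 2. The kernel `(1, 2, 1)`, the replicate padding and the function name are assumptions made for illustration; the abstract does not specify the exact configuration used in SeedSortNet, which operates on 2D feature maps.

```python
def max_blur_pool_1d(x, kernel=(1, 2, 1)):
    """1D sketch of max pooling (stride 1) followed by blur and stride-2 subsampling."""
    # Step 1: dense max over a window of 2 with stride 1 (no subsampling yet).
    dense = [max(x[i], x[i + 1]) for i in range(len(x) - 1)]
    # Step 2: low-pass filter with a normalized kernel, using replicate padding (assumption).
    pad = len(kernel) // 2
    padded = [dense[0]] * pad + dense + [dense[-1]] * pad
    total = sum(kernel)
    blurred = [
        sum(k * padded[i + j] for j, k in enumerate(kernel)) / total
        for i in range(len(dense))
    ]
    # Step 3: subsample with stride 2.
    return blurred[::2]


print(max_blur_pool_1d([0, 0, 1, 0, 0, 0]))
```

Blurring before subsampling spreads a peak over neighbouring outputs, which is why small input shifts change the result less abruptly than with plain strided max pooling.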


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Chan Sun ◽  
Xiaojuan Li

A human resource management system (HRMS) is a critical tool for companies. Recruitment texts contain rich information that can strongly support a company's recruitment work and also make it easier for job seekers to find job opportunities. To this end, for the problem of multilabel text classification of recruitment information, this paper provides two algorithms for multilayer classification based on support vector machines (SVM). First, the same learning subclass method is used for text sorting subclass acquisition, and then the class of the text is determined. Second, a hypersphere-based SVM is used to find the smallest hypersphere in the feature space that contains the most texts of a given class and to separate the texts of that class from other texts. For a text to be classified, the distance from it to the center of each hypersphere is used to determine its class. Experimental results on recruitment data demonstrate that the algorithm in this paper achieves high recall, precision, and F1. The relationship between HRM activities and corporate performance is also discussed.
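The distance-to-center decision step can be illustrated with a simplified sketch. Here each class center is approximated by the centroid of its training vectors, which is a stand-in and not the smallest enclosing hypersphere that the SVM-based method would find; the nearest-center rule, the function names and the toy two-dimensional vectors are likewise assumptions.

```python
import math


def fit_centers(samples):
    """Map each label to the centroid of its points (stand-in for a hypersphere center)."""
    centers = {}
    for label, points in samples.items():
        dim = len(points[0])
        centers[label] = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return centers


def classify(point, centers):
    """Assign the label whose center is closest in Euclidean distance."""
    return min(centers, key=lambda label: math.dist(point, centers[label]))


# Hypothetical feature vectors for two recruitment text classes.
training = {
    "sales": [[0.0, 0.0], [2.0, 0.0]],
    "engineering": [[10.0, 10.0], [12.0, 10.0]],
}
centers = fit_centers(training)
print(classify([1.0, 1.0], centers))
```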

