misclassification error
Recently Published Documents


TOTAL DOCUMENTS: 164 (FIVE YEARS 45)

H-INDEX: 19 (FIVE YEARS 2)

2022 ◽  
pp. 1-12
Author(s):  
Mohammed Hamdi

With the evolution of the software industry, a huge number of software applications are being designed, developed, and uploaded to multiple online repositories. To identify the category and resource utilization of these applications, researchers currently have to work manually. To reduce this effort, a two-phase solution is proposed. In the first phase, a semantic analysis-based process identifies keywords and variables. Based on these semantics, a dataset with two classes is designed: one represents the application type and the other corresponds to the application keywords. In the second phase, the preprocessed dataset is fed to several machine learning techniques (Decision Table, Random Forest, OneR, Randomizable Filtered Classifier, Logistic Model Tree), and their performance is computed in terms of TP Rate, FP Rate, Precision, Recall, F1-Score, MCC, ROC Area, PRC Area, and Accuracy (%). For evaluation, I used an R language library for latent semantic analysis to create the semantics, and the Weka tool to measure the performance of the algorithms. Results show that Random Forest achieves the highest accuracy, 99.3%, due to its parametric function evaluation and lower misclassification error.
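The metrics listed in this abstract can all be derived from a confusion matrix. The following is a minimal sketch for the binary case, not the paper's Weka pipeline: the function name `binary_metrics` and the example counts are hypothetical, and misclassification error is taken here as 1 minus accuracy, which is the usual convention but is not spelled out in the abstract.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from binary confusion-matrix counts.

    All names here are illustrative; the counts are assumed to be non-negative
    integers with at least one positive and one negative example.
    """
    total = tp + fp + fn + tn
    tp_rate = tp / (tp + fn)          # TP rate, identical to recall
    fp_rate = fp / (fp + tn)          # FP rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tp_rate / (precision + tp_rate)
    accuracy = (tp + tn) / total
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": tp_rate,
        "f1": f1,
        "mcc": mcc,
        "accuracy": accuracy,
        # Assumed convention: misclassification error = 1 - accuracy.
        "misclassification_error": (fp + fn) / total,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    for name, value in binary_metrics(tp=40, fp=10, fn=10, tn=40).items():
        print(f"{name}: {value:.3f}")
```

ROC Area and PRC Area are omitted because they require ranked scores rather than a single confusion matrix.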


2021 ◽  
Vol 38 (6) ◽  
pp. 1713-1718
Author(s):  
Manikanta Prahlad Manda ◽  
Daijoon Hyun

Traditional thresholding methods are often used for image segmentation of real images. However, due to the distinct characteristics of infrared thermal images, it is difficult to ensure optimal segmentation with traditional thresholding algorithms, which can lead to over-segmentation, missing object information, and/or spurious responses in the output. To overcome these issues, we propose a new thresholding technique based on a sine entropy criterion. Moreover, we build a double thresholding technique that uses two thresholds to obtain the final thresholding result. We introduce the sine entropy concept as a supplement to the Shannon entropy for creating a threshold-dependent criterion derived from the grayscale histogram. We found that the sine entropy is more robust than the Shannon entropy in interpreting the strength of the long-range correlation in the gray levels. We tested our method on several infrared thermal images collected from standard image databases. Compared with state-of-the-art methods, the qualitative results from the experiments show that the proposed method achieves the best performance with an average sensitivity of 0.98 and an average misclassification error of 0.01, and second-best performance with an average sensitivity of 0.99 and an average specificity of 0.93.


Author(s):  
Tongguang Ni ◽  
Yan Ding ◽  
Jing Xue ◽  
Kaijian Xia ◽  
Xiaoqing Gu ◽  
...  

Morphological classification of human sperm heads is a key technology for diagnosing male infertility. Owing to its sparse representation and learning capability, dictionary learning has shown remarkable performance in human sperm head classification. To improve the discriminability of the classification model, this study proposes a novel local constraint and label embedding multi-layer dictionary learning model called LCLM-MDL. Within the multi-layer dictionary learning framework, two dictionaries are built in each layer, one based on a Laplacian regularized constraint and one on a label embedding term, and the two dictionaries are made to approximate each other as closely as possible so as to better exploit the nonlinear structure and discriminative features of human sperm head morphology. In addition, to improve the robustness of the model, the asymmetric Huber loss is adopted in the last layer of LCLM-MDL, which approximates the misclassification error by using the absolute error function. Finally, experimental results on the HuSHeM dataset demonstrate the validity of LCLM-MDL.


2021 ◽  
pp. 1-23
Author(s):  
Hiroyuki Kasahara ◽  
Katsumi Shimotsu

We study identification in nonparametric regression models with a misclassified and endogenous binary regressor when an instrument is correlated with misclassification error. We show that the regression function is nonparametrically identified if one binary instrumental variable and one binary covariate satisfy the following conditions. The instrumental variable corrects endogeneity; it must be correlated with the unobserved true underlying binary variable and uncorrelated with the error term in the outcome equation, but it is allowed to be correlated with the misclassification error. The covariate corrects misclassification; it can be one of the regressors in the outcome equation, must be correlated with the unobserved true underlying binary variable, and must be uncorrelated with the misclassification error. We also propose a mixture-based framework for modeling unobserved heterogeneous treatment effects with a misclassified and endogenous binary regressor and show that treatment effects can be identified if the true treatment effect is related to an observed regressor and another observable variable.


2021 ◽  
Vol 2071 (1) ◽  
pp. 012031
Author(s):  
H Yazid ◽  
M H Mat Som ◽  
S N Basah ◽  
S Abdul Rahim ◽  
M F Mahmud ◽  
...  

Thresholding is one of the most powerful methods in the segmentation phase. Numerous methods have been proposed to segment the foreground from the background, but only a limited number of studies analyse the effect of noise, even though the presence of noise affects the performance of a thresholding method. The main idea of this paper is to analyse the effect of noise on the Inverse Surface Adaptive Thresholding (ISAT) method. ISAT is known as an excellent method for segmenting images in the presence of noise. The results of this analysis can serve as a guideline for researchers implementing ISAT, especially in medical image diagnosis. Initially, several images with different noise variations were prepared and processed with ISAT. ISAT incorporates several image processing methods, namely edge detection, Otsu thresholding, and inverse surface construction. The resulting images were evaluated using Misclassification Error (ME) to assess the performance of the segmentation. Based on the obtained results, ISAT performance remains consistent as the noise percentage increases from 5% to 25%.
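The abstract does not define ME. A minimal sketch follows, assuming the definition commonly used in thresholding evaluation: ME = 1 - (|B_O ∩ B_T| + |F_O ∩ F_T|) / (|B_O| + |F_O|), where B_O and F_O are the background and foreground pixels of the ground truth and B_T and F_T those of the thresholded result. The function name and the tiny example masks are hypothetical.

```python
def misclassification_error(ground_truth, segmented):
    """Fraction of pixels whose foreground/background label disagrees.

    Both arguments are 2-D sequences of equal shape in which truthy values
    mark foreground and falsy values mark background. This is an assumed
    definition of ME, not one quoted from the paper.
    """
    agree = 0
    total = 0
    for gt_row, seg_row in zip(ground_truth, segmented):
        if len(gt_row) != len(seg_row):
            raise ValueError("masks must have the same shape")
        for gt_px, seg_px in zip(gt_row, seg_row):
            total += 1
            # Counts both background-background and foreground-foreground matches.
            if bool(gt_px) == bool(seg_px):
                agree += 1
    if total == 0:
        raise ValueError("masks must not be empty")
    return 1.0 - agree / total


if __name__ == "__main__":
    # Hypothetical 2x4 masks: two of the eight pixels disagree.
    truth = [[0, 0, 1, 1],
             [0, 1, 1, 1]]
    result = [[0, 1, 1, 1],
              [0, 0, 1, 1]]
    print(misclassification_error(truth, result))
```

Under this definition, ME is 0 for a perfect segmentation and 1 when every pixel is mislabeled.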


2021 ◽  
Vol 23 (4) ◽  
pp. 1-13
Author(s):  
Jatinderkumar R. Saini ◽  
Prafulla Bharat Bafna

Document management is a need of the era, and managing documents in regional languages is a significant and largely untouched area. A Marathi corpus consisting of news is processed to form the Group Entity document matrix Marathi (GEDMM), the Vector space model for Marathi (VSMM), and the Hysynset Vector space model for Marathi (HSVSMM). GEDMM uses entity groups extracted using a conditional random field (CRF). The frequent terms are used to construct VSMM using TF-IDF. HSVSMM uses synsets built from hypernyms-hyponyms and synonyms. GEDMM and HSVSMM apply dimension reduction by selecting significant feature groups. Hierarchical agglomerative clustering (HAC) is used, and a dendrogram is produced to visualize the clusters. The performance analysis is carried out using several measures, namely entropy, purity, misclassification error, and accuracy. The clusters produced using GEDMM show the minimum entropy and the highest purity. A random forest classifier is applied, and the results are evaluated using misclassification error and accuracy.
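Purity and entropy, two of the cluster-quality measures named in this abstract, can be sketched as follows. This assumes the standard definitions (purity as the fraction of documents that belong to the majority class of their cluster, entropy as the size-weighted Shannon entropy of the class labels inside each cluster, in bits); the abstract does not state which variants the authors used, and the function name and toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) for a clustering against reference labels.

    Assumed definitions: purity sums the majority-class count of each cluster
    and divides by the number of items; entropy is the size-weighted average
    of each cluster's label entropy, computed with log base 2.
    """
    if len(cluster_ids) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    clusters = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        clusters[cluster].append(label)

    n = len(true_labels)
    majority_total = 0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        majority_total += max(counts.values())
        cluster_entropy = -sum(
            (k / size) * math.log2(k / size) for k in counts.values()
        )
        entropy += (size / n) * cluster_entropy
    return majority_total / n, entropy


if __name__ == "__main__":
    # Hypothetical assignment of eight documents to two clusters.
    ids = [0, 0, 0, 0, 1, 1, 1, 1]
    labels = ["sports", "sports", "sports", "politics",
              "politics", "politics", "politics", "politics"]
    purity, entropy = purity_and_entropy(ids, labels)
    print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes, which matches the direction of the comparison reported for GEDMM.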


2021 ◽  
Vol 16 (3) ◽  
pp. 169-176
Author(s):  
Patrick Rakotomarolahy

This paper proposes predicting the direction of the bitcoin return with logistic regression, discriminant analysis, and machine learning classification techniques. It extends the prediction of the bitcoin return direction by using exogenous macroeconomic and financial variables that have been investigated as drivers of bitcoin return. We also use Google Trends as a proxy for investor interest in bitcoin. We consider these variables as predictors of the bitcoin return direction. We conduct an in-sample and out-of-sample empirical analysis and achieve a misclassification error of around 4% in the in-sample evaluation and around 41% in the out-of-sample analysis. Tree-based ensemble learning outperforms the other methods in both the in-sample and out-of-sample analyses.


Author(s):  
Jisha Anu Jose ◽  
C. Sathish Kumar ◽  
S. Sureshkumar

Aims / Objectives: Identification of fish species is essential in export industries. Among the fish species exported, tuna forms a significant portion, and hence the separation of tuna from other fishes is necessary. The work aims to develop automated systems for separating commercially important tuna from other fishes. Methodology: The work proposes two models for the classification of commercial fishes. The first model uses conventional feature descriptors, which extract features from both the spatial and frequency domains. These features are combined and then reduced by an ensemble dimension reduction method. The combined and reduced feature sets are evaluated using different classifiers. The second model uses four pre-trained convolutional neural networks, VGG16, VGG19, Xception, and MobileNet, which are fine-tuned for the classification task. Results: For the first model, the extreme learning machine (ELM) classifier with a Mercer wavelet kernel gives high accuracy on the combined feature set, while the polynomial kernel ELM performs better on the reduced set. For the second model, a comparison of the four CNN models indicates that VGG19 outperforms the other networks in the classification task. Conclusion: Of the two proposed models, the pre-trained CNN-based model performs better than the conventional method in the separation task. Accuracy, precision, recall, F-score, and misclassification error are used to evaluate the system. A comparison of the proposed models with state-of-the-art systems is also reported.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Hyeonsu Ryu ◽  
Yoon-Hyeong Choi ◽  
Eunchae Kim ◽  
Jinhyeon Park ◽  
Seula Lee ◽  
...  

Background: Lung disease caused by exposure to chemical substances such as polyhexamethylene guanidine (PHMG) used in humidifier disinfectants (HDs) has been identified in Korea. Several researchers reported that exposure classification using a questionnaire might not correlate with the clinical severity classes determined through clinical diagnosis. It was asserted that the lack of correlation was due to misclassification in the exposure assessment caused by recall bias. We sought to identify the causes of this uncertainty and to understand the discrepancies between exposure assessment and clinical outcomes, the latter assumed to be the true value. The aim was therefore to examine the usefulness of questionnaire-based surveys and to reduce misclassification error/bias in exposure assessment. Methods: HD exposure assessment was conducted as a face-to-face interview using a questionnaire. A total of 5245 applicants participated in the exposure assessment survey. The questionnaire collected information on sociodemographic and exposure characteristics such as the period, frequency, and daily usage amount of HDs. Based on clinical diagnosis, a 4 × 4 cross-tabulation of exposure and clinical classification was constructed. When the value of the exposure rating minus the clinical class was ≥ 2 or ≤ −2, we assigned the case to the overestimation or underestimation group, respectively. Results: The sex ratio was similar in the overestimation and underestimation groups. In terms of age, in the overestimation group, 90 subjects (24.7%) were under the age of 10, followed by 52 subjects (14.2%) in their 50s. In the underestimation group, 195 subjects (56.7%) were under the age of 10, followed by 80 subjects (23.3%) in their 30s. Subjects in the overestimation group may have already recovered, or may have responded excessively due to psychological anxiety or in order to receive compensation. However, the relatively high mortality rates and surrogate responses observed among those under 10 years of age may have resulted in inaccurate exposure estimates in the underestimation group. Conclusions: HD exposure assessment using a questionnaire might not correlate with adverse health effects due to recall bias and various other causes such as recovery from injury and psychological anxiety. This study revealed exposure misclassification and the characteristics of those affected by HDs, and proposed a questionnaire-based exposure assessment methodology to overcome the limitations of past exposure assessment.
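The grouping rule described in the Methods can be sketched as below. This is an illustration under stated assumptions, not the authors' analysis code: the function name, the "neither" label for cases that fall in neither group, the assumption that both scales are coded 1 to 4, and the example records are all hypothetical.

```python
from collections import Counter


def estimation_group(exposure_rating, clinical_class):
    """Assign a case to a group from the difference of the two classifications.

    Rule from the abstract: exposure rating minus clinical class >= 2 means
    overestimation, <= -2 means underestimation. The "neither" label for all
    other cases is an assumption made for this sketch.
    """
    difference = exposure_rating - clinical_class
    if difference >= 2:
        return "overestimation"
    if difference <= -2:
        return "underestimation"
    return "neither"


if __name__ == "__main__":
    # Hypothetical (exposure rating, clinical class) pairs on assumed 1-4 scales.
    cases = [(4, 1), (1, 4), (3, 2), (2, 2), (4, 2)]

    # 4 x 4 cross-tabulation, stored sparsely as counts per (exposure, clinical) cell.
    cross_tab = Counter(cases)
    groups = Counter(estimation_group(e, c) for e, c in cases)

    print(dict(cross_tab))
    print(dict(groups))
```

With a 4 × 4 table, only cells at least two steps off the diagonal fall into either group, so most of the table lands in the remaining category.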

