training subset
Recently Published Documents


Total documents: 29 (five years: 13)

H-index: 6 (five years: 2)

CATENA ◽  
2021 ◽  
Vol 207 ◽  
pp. 105699
Author(s):  
E.M. Baglaeva ◽  
A.P. Sergeev ◽  
A.V. Shichkin ◽  
A.G. Buevich

2021 ◽  
pp. 1-19
Author(s):  
Stergios Liapis ◽  
Konstantinos Christantonis ◽  
Victor Chazan-Pantzalis ◽  
Anastassios Manos ◽  
Despina Elizabeth Filippidou ◽  
...  

This paper presents a novel methodology that uses classification for day-ahead traffic prediction. It addresses the research question of whether traffic state can be forecast from meteorological conditions, seasonality, and time intervals, as well as COVID-19-related restrictions. We propose reliable models that use smaller data partitions. Apart from feature selection, we incorporate new features related to movement restrictions due to COVID-19, forming a novel data model. Our methodology explores the desired training subset. Results showed that various models can be developed, with varying levels of success. The best outcome was achieved when factoring in all relevant features and training on a proposed subset. Accuracy improved significantly compared to previously published work.


Author(s):  
Ferdinando Di Martino ◽  
Salvatore Sessa

We present a new classification algorithm for machine learning on numerical data, based on direct and inverse fuzzy transforms. In our previous work, fuzzy transforms were used to analyze numerical attribute dependency in data: the multi-dimensional inverse fuzzy transform was used to approximate the regression function. The classification method presented here is based on the same operator. Specifically, we apply the K-fold cross-validation algorithm to control over-fitting and to estimate the accuracy of the classification model: for each training (resp., testing) subset, an iterative process evaluates the best fuzzy partitions of the inputs. Finally, a weighted mean of the multi-dimensional inverse fuzzy transforms calculated for each training (resp., testing) subset is used for data classification. We compare this algorithm with five other classification methods on well-known datasets.
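
The K-fold partitioning into training and testing subsets mentioned in this abstract can be illustrated with a minimal sketch. It covers only the generic splitting step, not the authors' fuzzy-transform classifier; the function name `k_fold_splits`, the shuffling seed, and the striding scheme are assumptions made for illustration.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for K-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # The training subset is the union of all folds except fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical usage: 10 samples, 5 folds.
splits = list(k_fold_splits(n_samples=10, k=5))
```

Each sample lands in exactly one testing subset, so every sample is used for evaluation once and for training in the remaining folds.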


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yajing Hao ◽  
Xinrong Yan ◽  
Jianbin Wu ◽  
Huijun Wang ◽  
Linfeng Yuan

Recently, researchers have shown that coverless steganography is relatively safe. On this basis, to improve the payload of coverless steganography, this paper introduces a novel semiconstruction coverless steganography algorithm. Firstly, web crawler technology is applied to crawl a wide range of small icons and hot news images from the Internet. These icons can be used as the training subset, and the hot news can be designed according to construction rules. Secondly, the AlexNet network is introduced for training in the algorithm, and adversarial samples are added to the training set. Thirdly, using a preset template, certain small icons and a hot news image are spliced into a secret carrier image according to the construction principle. The hot news image is in the top half of the carrier, and the small icons are in the bottom half. The image in the upper part of the carrier and the icons in the lower part can be connected by image and text semantics, and semantic matching can be realized between image semantics and explanatory text. The experimental results and analysis show that the proposed algorithm can resist steganalysis tools effectively and has good robustness against various image attacks. Meanwhile, the secret information payload has been greatly improved: the maximum payload can reach 180 bits for a single 512 × 512 image. This promising algorithm can be applied to build covert communications.


2020 ◽  
pp. short20-1-short20-9
Author(s):  
Daria Lashchenova ◽  
Alexander Gromov ◽  
Anton Konushin ◽  
Anna Mesheryakova

The COVID-19 pandemic has quickly spread all over the world, overwhelming public healthcare systems in many countries. In this situation, demand for automatic assistance systems that facilitate and accelerate a doctor's job has rapidly increased. Antibody tests were introduced for diagnosing COVID-19, but physicians still need tools for quantifying disease severity, since treatment choice strongly depends on it. To estimate the severity of the disease, physicians use computed tomography scans. These scans provide information about lung lesions and their types, which physicians use to determine proper treatment. In this paper we attempt to build a system that uses patients' computed tomography scans for lung and lesion segmentation and for segmentation of specific types of lesions (i.e. pulmonary consolidation and "crazy-paving"). Models for lung, lesion, consolidation, and "crazy-paving" segmentation achieved Dice coefficients of 0.96, 0.65, 0.48, and 0.45, respectively. It was also shown that removing images with inaccurate ground truth from the training subset can improve the quality of models trained on it.
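
For readers unfamiliar with the metric reported above, the Dice coefficient of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flattened masks; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice = 2*|A and B| / (|A| + |B|) for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical flattened masks (1 = lesion pixel, 0 = background).
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3)
```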


Author(s):  
Benjing Wang ◽  
Qin Zhang ◽  
Qi Wang ◽  
Jun Ma ◽  
Xiaoju Cao ◽  
...  

Changes in metabolite profiles in preterm birth have been demonstrated using newborn screening data. However, little is known about the holistic metabolic model in preterm neonates. The aim was to investigate the holistic metabolic model in preterm neonates. All metabolite values were obtained from cohort data of routine newborn screening. A total of 261 758 newborns were recruited and randomly divided into a training subset and a testing subset. Using the training subset, 949 variates were considered to establish a logistic regression model for distinguishing preterm birth (<37 weeks) from term birth (≥37 weeks). Seventy-two variates (age at collection, TSH, 17α-OHP, proline, tyrosine, C16:1-OH, C18:2, and 65 ratios) entered the final metabolic model for distinguishing preterm birth from term birth. Among the variates entering the final model of PTB, (Leucine+Isoleucine+Proline-OH)/Valine (OR=38.36), (C3DC+C4-OH)/C12 (OR=15.58), Valine/C5 (OR=6.32), (Leucine+Isoleucine+Proline-OH)/Ornithine (OR=2.509), and Proline/C18:1 (OR=2.465) have the five highest OR values, and (Leucine+Isoleucine+Proline-OH)/C5 (OR=0.05), (Leucine+Isoleucine+Proline-OH)/Phenylalanine (OR=0.214), Proline/Valine (OR=0.230), C16/C18 (OR=0.259), and Alanine/free carnitine (OR=0.279) have the five lowest OR values. The final metabolic model identified preterm infants with >80% accuracy in both the training and testing subsets. When distinguishing neonates ≤32 weeks from those >32 weeks, it performed robustly, with nearly 95% accuracy in both subsets. In summary, we have established an excellent metabolic model in preterm neonates. These findings could provide new insights into more efficient nutrient supplementation and the etiology of preterm birth.
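
As a reading aid for the odds ratios (OR) listed above: in a logistic regression model, the odds ratio for a one-unit increase in a variate is the exponential of its coefficient, OR = exp(β). The sketch below only converts between the two; the helper names are hypothetical, and how the authors scaled each variate is not stated in the abstract.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a logistic regression variate."""
    return math.exp(coefficient)


def coefficient_from_or(or_value):
    """Inverse mapping: the log-odds coefficient implied by an odds ratio."""
    if or_value <= 0:
        raise ValueError("an odds ratio must be positive")
    return math.log(or_value)


# OR > 1 implies a positive coefficient (higher odds of the outcome),
# OR < 1 a negative one. The two values are the extremes quoted in the abstract.
beta_high = coefficient_from_or(38.36)
beta_low = coefficient_from_or(0.05)
```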


Author(s):  
Justin Im ◽  
Md Taufiqul Islam ◽  
Faisal Ahmmed ◽  
Deok Ryun Kim ◽  
Ashraful Islam Khan ◽  
...  

Background: Sustained investments in water, sanitation, and hygiene (WASH) have lagged in resource-poor settings; incremental WASH improvements may, nonetheless, prevent diseases such as typhoid in disease-endemic populations. Methods: Using prospective data from a large cohort in urban Kolkata, India, we evaluated whether baseline WASH variables predicted typhoid risk in a training subpopulation (n = 28 470). We applied a machine learning algorithm to the training subset to create a composite, dichotomous (good, not good) WASH variable based on 4 variables, and evaluated sensitivity and specificity of this variable in a validation subset (n = 28 470). We evaluated in Cox regression models whether residents of "good" WASH households experienced a lower typhoid risk after controlling for potential confounders. We constructed virtual clusters (radius 50 m) surrounding each household to evaluate whether a prevalence of good WASH practices modified the typhoid risk in central household members. Results: Good WASH practices were associated with protection in analyses of all households (hazard ratio [HR] = 0.57; 95% confidence interval [CI], .37–.90; P = .015). This protection was evident in persons ≥5 years old at baseline (HR = 0.47; 95% CI, .34–.93; P = .005) and was suggestive, though not statistically significant, in younger age groups (HR = 0.61; 95% CI, .27–1.38; P = .235). The level of surrounding household good WASH coverage was also associated with protection (HR = 0.988; 95% CI, .979–.996; P = .004, for each percent coverage increase). However, collinearity between household WASH and WASH coverage prevented an assessment of their independent predictive contributions. Conclusions: In this typhoid-endemic setting, natural variation in household WASH was associated with typhoid risk. If replicated elsewhere, these findings suggest that WASH improvements may enhance typhoid control, short of major infrastructural investments.
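
The sensitivity and specificity evaluated on the validation subset follow the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal sketch of that calculation is given below; the function name and the toy labels are made up for illustration and are not data from the study.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired binary labels (1 = positive)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical validation labels: 3 cases, 5 non-cases.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(predicted, actual)
```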


2019 ◽  
Vol 9 (20) ◽  
pp. 4216 ◽  
Author(s):  
Zhen Chen ◽  
Xiaoyan Han ◽  
Chengwei Fan ◽  
Zirun He ◽  
Xueneng Su ◽  
...  

In recent years, machine learning methods have shown great potential for real-time transient stability status prediction (TSSP) applications. However, most existing studies overlook the imbalanced data problem in TSSP. To address this issue, this paper proposes a novel data segmentation-based ensemble classification (DSEC) method for TSSP. Firstly, the effects of the imbalanced data problem on the decision boundary and classification performance of TSSP are investigated in detail. Then, a three-step DSEC method is presented. In the first step, a data segmentation strategy divides the stable samples into multiple non-overlapping stable subsets, ensuring that no stable subset contains more samples than the unstable set; each stable subset is then combined with the unstable set to form a training subset. In the second step, an AdaBoost classifier is built on each training subset. In the final step, the decision values from the AdaBoost classifiers are aggregated to determine the transient stability status. Experiments are conducted on the Northeast Power Coordinating Council 140-bus system, and the simulation results indicate that the proposed approach can significantly improve the classification performance of TSSP with imbalanced data.
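
The first step described in this abstract, the data segmentation, can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' implementation: the function name, the random shuffle, and the choice of the smallest number of subsets that satisfies the size constraint are all assumptions, and the AdaBoost training and aggregation steps are not shown.

```python
import math
import random


def build_training_subsets(stable, unstable, seed=0):
    """Split stable samples into non-overlapping subsets, each no larger than
    the unstable set, and pair every subset with the full unstable set."""
    if not unstable:
        raise ValueError("at least one unstable sample is required")
    shuffled = list(stable)
    random.Random(seed).shuffle(shuffled)
    # Smallest subset count that keeps every stable subset <= len(unstable).
    n_subsets = max(1, math.ceil(len(shuffled) / len(unstable)))
    # Striding yields disjoint subsets of near-equal size.
    stable_subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [subset + list(unstable) for subset in stable_subsets]


# Hypothetical imbalanced data: 10 stable samples, 3 unstable samples.
stable = [f"s{i}" for i in range(10)]
unstable = ["u0", "u1", "u2"]
training_subsets = build_training_subsets(stable, unstable)
```

Each resulting training subset is roughly balanced, so a separate classifier could be trained on each one and the outputs combined afterwards.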

