Stacking-Based Ensemble Learning of Self-Media Data for Marketing Intention Detection

Social network services for self-media, such as Weibo, Blog, and WeChat Public, constitute a powerful medium that allows users to publish posts every day. Because information transparency is insufficient, malicious marketing in self-media posts can harm society. It is therefore necessary to identify news with marketing intentions. We treat marketing intention detection as a text classification problem. Although methods for intention detection exist, two challenges remain: extracting text features that reflect semantic information, and reducing the time and space complexity of the recognition model. To this end, this paper proposes a machine learning method to identify marketing intentions from large-scale We-Media data. First, the proposed Latent Semantic Indexing (LSI)-Word2vec model captures semantic features. Second, the decision tree model is simplified by pruning to save computing resources and reduce time complexity. Third, the paper examines the effects of classifier combinations and uses the optimal configuration to help people identify marketing intention efficiently. A detailed experimental evaluation on several metrics shows that the approach is effective and efficient: the F1 value increases by about 5% and the running time improves by 20%, indicating that the proposed method can improve the accuracy of marketing news recognition.
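The F1 value cited in this abstract is the harmonic mean of precision and recall. The following is a minimal sketch of that metric only; the function name `f1_score` and the label lists are assumptions made for illustration and do not come from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = post with marketing intention, 0 = ordinary post.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

With these invented labels there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.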

Spam Mail Filtering Using Data Mining Approach

Handling Priority Inversion in Time-Constrained Distributed Databases - Advances in Data Mining and Database Management ◽ 10.4018/978-1-7998-2491-6.ch015 ◽ 2020 ◽ pp. 253-282 ◽ Cited By ~ 3

Author(s): Ajay Kumar Gupta

Keyword(s): Decision Tree ◽ Classification Accuracy ◽ Time Complexity ◽ Identification Accuracy ◽ Tree Model ◽ Swarm Optimization ◽ Spam Filter ◽ Data Mining Approach ◽ Lower Complexity ◽ Using Data

This chapter presents an overview of spam email as a serious problem on the internet and builds a spam filter that addresses the weaknesses of earlier filters, providing better identification accuracy with less complexity. Because the J48 decision tree is a widely used classification technique with a simple structure, high classification accuracy, and low time complexity, it is used here as the spam mail classifier. With a low-complexity model, however, it becomes difficult to achieve high accuracy on a large number of records. To overcome this problem, particle swarm optimization is used to optimize the spam base dataset, which in turn optimizes the decision tree model and reduces the time complexity. Once the records have been standardized, the decision tree is used again to check the classification accuracy. The chapter also surveys spam-related issues, the filters in use, related work, and the potential scope of spam filtering.
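The abstract does not say how particle swarm optimization is applied to the dataset, so the following is only a generic sketch of the continuous PSO update rule on a toy objective. The function `pso_minimize`, its parameter values (`w`, `c1`, `c2`, the search bounds), and the `sphere` objective are all assumptions for illustration, not the chapter's configuration.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=50, seed=0,
                 w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Generic particle swarm optimization; returns (best_pos, best_val, history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    history = [gbest_val]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_pos, best_val, history = pso_minimize(sphere, dim=3)
```

In a spam-filtering setting the objective would more plausibly score a candidate feature subset or preprocessing choice by classifier error, but that mapping is not described in the abstract.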

A Comparative Study of Transactional and Semantic Approaches for Predicting Cascades on Twitter

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2018/169 ◽ 2018

Author(s): Yunwei Zhao ◽ Can Wang ◽ Chi-Hung Chi ◽ Kwok-Yan Lam ◽ Sen Wang

Keyword(s): Comparative Study ◽ Information Diffusion ◽ Large Scale ◽ Semantic Features ◽ Information Cascades ◽ Social Media Data ◽ Twitter Data ◽ Pros And Cons ◽ Media Data ◽ Transactional Models

The availability of massive social media data has enabled the prediction of people’s future behavioral trends at an unprecedented scale. The study of information cascades on Twitter has been an integral part of behavior analysis. A number of methods based on transactional features (such as keyword frequency) and semantic features (such as sentiment) have been proposed to predict future cascading trends. However, an in-depth understanding of the pros and cons of semantic and transactional models is lacking. This paper conducts a comparative study of both approaches in predicting information diffusion through three mechanisms: retweet cascade, URL cascade, and hashtag cascade. Experiments on Twitter data show that the semantic model outperforms the transactional model when the exterior pattern is less directly observable (i.e., hashtag cascade). When the pattern is more directly observable (i.e., retweet and URL cascades), the semantic method delivers only comparable accuracy (URL cascade) or even worse accuracy (retweet cascade). Further, we demonstrate that the transactional and semantic models are not independent, and that performance is greatly enhanced when the two are combined.

The Research of the Quantitative Method of Desertification Assessment at Large Scale Based on MODIS Data and Decision Tree Model - A Case Study in Farming-Pastoral Region of North China

2012 2nd International Conference on Remote Sensing, Environment and Transportation Engineering ◽ 10.1109/rsete.2012.6260783 ◽ 2012 ◽ Cited By ~ 2

Author(s): Duanyang Xu ◽ Chunlei Li ◽ Xiao Song

Keyword(s): Decision Tree ◽ Quantitative Method ◽ Large Scale ◽ North China ◽ Decision Tree Model ◽ Tree Model ◽ MODIS Data ◽ Desertification Assessment

A novel enhanced decision tree model for detecting chronic kidney disease

Network Modeling Analysis in Health Informatics and Bioinformatics ◽ 10.1007/s13721-021-00302-w ◽ 2021 ◽ Vol 10 (1)

Author(s): Avijit Kumar Chaudhuri ◽ Deepankar Sinha ◽ Dilip K. Banerjee ◽ Anirban Das

Keyword(s): Chronic Kidney Disease ◽ Kidney Disease ◽ Decision Tree ◽ Decision Tree Model ◽ Tree Model

Decision Tree-Based Adaptive Reconfigurable Cache Scheme

Algorithms ◽ 10.3390/a14060176 ◽ 2021 ◽ Vol 14 (6) ◽ pp. 176

Author(s): Wei Zhu ◽ Xiaoyang Zeng

Keyword(s): Decision Tree ◽ Adaptive Algorithms ◽ Memory Access ◽ Access Time ◽ Decision Tree Algorithm ◽ Verilog HDL ◽ Tree Model ◽ Cache Associativity ◽ Cache Scheme ◽ Reconfigurable Cache

Applications have different cache preferences, sometimes even across their own running phases, so caches with fixed parameters may compromise system performance. To solve this problem, we propose a real-time adaptive reconfigurable cache based on the decision tree algorithm, which can optimize the average memory access time of the cache without modifying the cache coherence protocol. By monitoring the application's running state, the cache associativity is periodically tuned to the optimal value determined by the decision tree model. This paper implements the proposed decision tree-based adaptive reconfigurable cache in the GEM5 simulator and designs the key modules using Verilog HDL. The simulation results show that the proposed cache reduces the average memory access time compared with other adaptive algorithms.
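The quantity being optimized here, average memory access time (AMAT), is conventionally computed as hit time plus miss rate times miss penalty. The sketch below picks the associativity with the lowest AMAT from a table of measurements. It is an exhaustive comparison, not the paper's decision tree, and the `stats` values are hypothetical numbers chosen for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def best_associativity(stats):
    """Return the associativity (number of ways) with the lowest AMAT.

    stats maps ways -> (hit_time, miss_rate, miss_penalty), all in cycles
    except miss_rate, which is a fraction between 0 and 1.
    """
    return min(stats, key=lambda ways: amat(*stats[ways]))


# Hypothetical measurements for one program phase (not from the paper).
stats = {
    1: (1.0, 0.10, 100.0),
    2: (1.2, 0.06, 100.0),
    4: (1.5, 0.05, 100.0),
    8: (2.0, 0.048, 100.0),
}
choice = best_associativity(stats)
```

A learned model such as the paper's decision tree would predict this choice from runtime counters instead of measuring every configuration, which is what makes periodic reconfiguration practical.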

Risk of Pre-Malignancy or Malignancy in Postmenopausal Endometrial Polyps: A CHAID Decision Tree Analysis

Diagnostics ◽ 10.3390/diagnostics11061094 ◽ 2021 ◽ Vol 11 (6) ◽ pp. 1094

Author(s): Michael Wong ◽ Nikolaos Thanatsis ◽ Federica Nardelli ◽ Tejal Amin ◽ Davor Jurkovic

Keyword(s): Decision Tree ◽ Expectant Management ◽ Decision Tree Model ◽ Decision Tree Analysis ◽ Focal Lesions ◽ Tree Model ◽ Normal Endometrium ◽ Endometrial Polyps ◽ Tree Analysis ◽ Interaction Detection

Background and aims: Postmenopausal endometrial polyps are commonly managed by surgical resection; however, expectant management may be considered for some women because of medical co-morbidities, failed hysteroscopies, or patient preference. This study aimed to identify patient characteristics and ultrasound morphological features of polyps that could aid in predicting underlying pre-malignancy or malignancy in postmenopausal polyps. Methods: Women with consecutive postmenopausal polyps diagnosed on ultrasound and removed surgically were recruited prospectively between October 2015 and October 2018. Polyps were defined on ultrasound as focal lesions with a regular outline, surrounded by normal endometrium. On Doppler examination, there was either a single feeder vessel or no detectable vascularity. Polyps were classified histologically as benign (including hyperplasia without atypia), pre-malignant (atypical hyperplasia), or malignant. A Chi-squared automatic interaction detection (CHAID) decision tree analysis was performed with a range of demographic, clinical, and ultrasound variables as independent variables and the presence of pre-malignancy or malignancy in polyps as the dependent variable. A 10-fold cross-validation method was used to estimate the model’s misclassification risk. Results: There were 240 women included, 181 of whom presented with postmenopausal bleeding. Their median age was 60 (range of 45–94); 18/240 (7.5%) women were diagnosed with pre-malignant or malignant polyps. In our decision tree model, the polyp mean diameter (≤13 mm or >13 mm) on ultrasound was the most important predictor of pre-malignancy or malignancy. When the tree was allowed to grow further, the patient’s body mass index (BMI) and the cystic/solid appearance of the polyp classified women further into low-risk (≤5%), intermediate-risk (>5%–≤20%), or high-risk (>20%) groups.
Conclusions: Our decision tree model may serve as a guide to counsel women on the benefits and risks of surgery for postmenopausal endometrial polyps. It may also assist clinicians in prioritizing women for surgery according to their risk of malignancy.
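CHAID chooses each split by testing candidate predictors against the outcome with a chi-squared test. The sketch below computes the Pearson chi-squared statistic for a contingency table and picks the candidate with the largest value. This is a simplification: full CHAID also merges categories and compares adjusted p-values, and the largest statistic matches the smallest p-value only when the tables have the same degrees of freedom. The candidate names and counts are hypothetical and are not the study's data.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of predictor and outcome.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def best_split(candidates):
    """Return the candidate predictor whose table has the largest statistic."""
    return max(candidates, key=lambda name: chi_squared(candidates[name]))


# Hypothetical 2x2 tables: rows = predictor category, columns = outcome class.
candidates = {
    "predictor_a": [[30, 10], [10, 30]],
    "predictor_b": [[22, 18], [18, 22]],
}
chosen = best_split(candidates)
```

Here `predictor_a` separates the two outcome classes far more sharply than `predictor_b`, so it would be selected as the first split.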

A Hybrid Decision Tree-Neural Network (DT-NN) Model for Large-Scale Classification Problems

2020 IEEE International Conference on Big Data (Big Data) ◽ 10.1109/bigdata50022.2020.9378061 ◽ 2020

Author(s): Jarrod Carson ◽ Kane Hollingsworth ◽ Rituparna Datta ◽ George Clark ◽ Aviv Segev

Keyword(s): Neural Network ◽ Decision Tree ◽ Large Scale ◽ Classification Problems ◽ Scale Classification

Learning Topic Map from Large Scale Social Media Data

Companion Proceedings of the Web Conference 2020 ◽ 10.1145/3366424.3382088 ◽ 2020

Author(s): Hui-Kuo Yang

Keyword(s): Social Media ◽ Large Scale ◽ Social Media Data ◽ Topic Map ◽ Media Data

Semantic-Aware Visual Abstraction of Large-Scale Social Media Data With Geo-Tags

IEEE Access ◽ 10.1109/access.2019.2935471 ◽ 2019 ◽ Vol 7 ◽ pp. 114851-114861 ◽ Cited By ~ 1

Author(s): Zhiguang Zhou ◽ Xinlong Zhang ◽ Xiaoyun Zhou ◽ Yuhua Liu

Keyword(s): Social Media ◽ Large Scale ◽ Social Media Data ◽ Media Data

Reanalysis and External Validation of a Decision Tree Model for Detecting Unrecognized Diabetes in Rural Chinese Individuals

International Journal of Endocrinology ◽ 10.1155/2017/3894870 ◽ 2017 ◽ Vol 2017 ◽ pp. 1-6 ◽ Cited By ~ 2

Author(s): Zhong Xin ◽ Lin Hua ◽ Xu-Hong Wang ◽ Dong Zhao ◽ Cai-Guo Yu ◽ ...

Keyword(s): Decision Tree ◽ Predictive Value ◽ Early Stage ◽ Current Model ◽ External Validation ◽ Area Under The Curve ◽ Decision Tree Model ◽ Tree Model ◽ Chinese Adult ◽ Significant Difference

We reanalyzed previous data to develop a simpler decision tree model as a screening tool for unrecognized diabetes, using basic information in Beijing community health records. The model was then validated in another rural town. The new model has fewer branches and uses only three non-laboratory-based risk factors (age, BMI, and presence of hypertension). The sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC) for detecting diabetes were calculated. The AUC values in the internal and external validation groups were 0.708 and 0.629, respectively. Subjects at high risk of diabetes had significantly higher HOMA-IR, but no significant difference in HOMA-B was observed. This simple tool will help general practitioners and residents assess the risk of diabetes quickly and easily. This study also validates the strong association between insulin resistance and the early stage of diabetes, suggesting that more attention should be paid to the current model in rural Chinese adult populations.
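The validation metrics named in this abstract can be computed directly from predictions and true labels. The sketch below shows one common way to do so: AUC as the fraction of positive/negative pairs ranked correctly (ties count as half), and sensitivity and specificity from binary predictions. The function names and the small example inputs are assumptions for illustration and are unrelated to the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(preds, labels):
    """Return (sensitivity, specificity) for binary predictions and labels."""
    pairs = list(zip(preds, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical risk scores and outcomes (1 = diabetes, 0 = no diabetes).
example_auc = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
example_sens, example_spec = sensitivity_specificity([1, 0, 1, 0], [1, 1, 0, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.708 and 0.629.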
