IoT Based Health—Related Topic Recognition from Emerging Online Health Community (Med Help) Using Machine Learning Technique
The unprompted patient’s and inimitable physician’s experience shared on online health communities (OHCs) contain a wealth of unexploited knowledge. Med Help and eHealth are some of the online health communities offering new insights and solutions to all health issues. Diabetes mellitus (DM), thyroid disorders and tuberculosis (TB) are chronic diseases increasing rapidly every year. As part of the project described in this article comments related to the diseases from Med Help were collected. The comments contain the patient and doctor discussions in an unstructured format. The sematic vision of the internet of things (IoT) plays a vital role in organizing the collected data. We pre-processed the data using standard natural language processing techniques and extracted the essential features of the words using the chi-squared test. After preprocessing the documents, we clustered them using the K-means++ algorithm, which is a popular centroid-based unsupervised iterative machine learning algorithm. A generative probabilistic model (LDA) was used to identify the essential topic in each cluster. This type of framework will empower the patients and doctors to identify the similarity and dissimilarity about the various diseases and important keywords among the diseases in the form of symptoms, medical tests and habits.