Disease and Social Media in Post-Natural Disaster Recovery Philippines
Abstract

Introduction
The Philippines experiences frequent natural disasters, which create conditions that precipitate disease outbreaks. This developing country maintains a strong disease surveillance program during the disaster and post-disaster phases; however, latent disease contracted during these emergencies emerges once Filipinos return to their homes. Dubbed the social media capital of the world, the Philippines provides an opportunity to evaluate the potential of social media in disease surveillance during the post-recovery period. By developing and defining a non-traditional method for enhancing the detection of infectious diseases during post-natural disaster recovery in the Philippines, this research aims to increase the resilience of affected developing countries through advanced passive disease surveillance with minimal cost and high impact.

Methods
We collected 50 million geo-tagged tweets, weekly case counts for six diseases, and records of all natural disasters in the Philippines between 2012 and 2013. We compared the predictive capability of various disease lexicon-based time series models (e.g., Twitter’s BreakoutDetection, Autoregressive Integrated Moving Average with Explanatory Variable [ARIMAX], multilinear regression, and logistic regression) and document embeddings (Gensim’s Doc2Vec).

Results
The analyses show that using tweets alone to predict disease outbreaks in the Philippines gives results that vary with the technique applied, the disease type, and the location. The most consistent predictive results came from the ARIMAX model, which showed that tweets were significant for prediction and that disasters played a role in specific instances.

Discussion
Overall, the use of disease/sick lexicon-filtered tweets as a predictor of disease in the Philippines appears promising. Because Twitter use within the country has increased steadily and substantially, it would be informative to repeat the analysis on more recent years to confirm the best-performing prediction method.
In addition, we suggest that a combined, disease-specific model would produce the best results: one in which the case counts of a disease are updated periodically alongside continuous monitoring of lexicon-filtered tweets, with or without the time elapsed since a disaster.
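To make the shape of such a combined model concrete, the following is a minimal sketch, not the ARIMAX model evaluated in this study. It assumes a much simpler first-order autoregressive regression with exogenous inputs, fitted by ordinary least squares, in which this week's case count depends on last week's case count, this week's lexicon-filtered tweet count, and the number of weeks since the most recent disaster. The function names (`fit_arx`, `predict_next`, `solve_linear_system`), the single one-week lag, and the numbers in the example are all illustrative assumptions.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so each row operation also updates the right-hand side.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_arx(
    cases: Sequence[float],
    tweets: Sequence[float],
    weeks_since_disaster: Sequence[float],
) -> List[float]:
    """Least-squares fit of the assumed model
    cases[t] = b0 + b1*cases[t-1] + b2*tweets[t] + b3*weeks_since_disaster[t].
    Returns [b0, b1, b2, b3]."""
    rows = [
        [1.0, float(cases[t - 1]), float(tweets[t]), float(weeks_since_disaster[t])]
        for t in range(1, len(cases))
    ]
    y = [float(cases[t]) for t in range(1, len(cases))]
    k = 4
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_next(
    coef: Sequence[float], last_cases: float, tweets_now: float, weeks_now: float
) -> float:
    """One-week-ahead forecast from the latest reported case count."""
    b0, b1, b2, b3 = coef
    return b0 + b1 * last_cases + b2 * tweets_now + b3 * weeks_now


if __name__ == "__main__":
    # Synthetic, made-up weekly series used only to exercise the functions.
    tweets = [30, 45, 80, 60, 120, 90, 40, 55, 150, 70, 35, 95]
    weeks = [0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2]
    cases = [10, 11, 15, 15, 21, 21, 16, 15, 23, 21, 15, 19]
    coef = fit_arx(cases, tweets, weeks)
    print("coefficients:", coef)
    print("next-week forecast:", predict_next(coef, cases[-1], 60, 3))
```

In this sketch, the periodic update of case counts corresponds to refitting with `fit_arx` whenever new surveillance data arrive, while `predict_next` can be called each week as new tweet counts come in. Dropping the `weeks_since_disaster` term would give the variant without the disaster timing.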