Can increases in Twitter posts predict increases in cumulative incidence of COVID-19 in the United States? Evidence that social media can inform epidemic surveillance. (Preprint)
BACKGROUND Although public health systems are responding rapidly to the COVID-19 pandemic, publicly available, crowd-sourced big data may help identify hot spots, prioritize equipment allocation and staffing, and inform health policy on “shelter in place” and social distancing recommendations.

OBJECTIVE To assess whether a rising state-level prevalence of COVID-19-related posts on Twitter (tweets) predicts state-level cumulative COVID-19 incidence after controlling for socioeconomic characteristics.

METHODS We identified and extracted COVID-19-related tweets posted from January 21 to March 7, 2020, across all 50 states (N = 7,427,057). Tweets were combined with state-level characteristics and confirmed COVID-19 case counts to determine the association between public commentary and cumulative incidence.

RESULTS The cumulative incidence of COVID-19 varied significantly across states. Ratio of tweet increase (p = 0.03), number of physicians per 1,000 population (p = 0.01), educational attainment (p = 0.006), income per capita (p = 0.002), and percentage of adult population (p = 0.003) were positively associated with cumulative incidence. Ratio of tweet increase was significantly associated with the logarithm of cumulative incidence (p = 0.06), with a coefficient of 0.26.

CONCLUSIONS An increase in the prevalence of state-level tweets was predictive of an increase in COVID-19 diagnoses, providing evidence that Twitter can be a valuable surveillance tool for public health.
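
As a reading aid for the coefficient of 0.26 reported in the Results, the sketch below converts a coefficient from a log-linear model into an approximate percent change in the outcome. It assumes the model has the form log(y) = a + b·x with a natural logarithm, and that the other covariates are held constant; the abstract does not state the log base or the full model specification, so these are illustrative assumptions. The function name `percent_change_per_unit` is hypothetical and not taken from the study.

```python
import math


def percent_change_per_unit(coef: float, delta: float = 1.0) -> float:
    """Percent change in the outcome y for an increase of `delta` in the
    predictor x, assuming a model of the form ln(y) = a + coef * x.

    Assumption: natural logarithm; other covariates held constant.
    """
    return (math.exp(coef * delta) - 1.0) * 100.0


# Coefficient for ratio of tweet increase, as reported in the abstract.
reported_coef = 0.26

# Under the stated assumptions, a one-unit rise in the tweet-increase ratio
# corresponds to roughly this percent rise in cumulative incidence.
print(round(percent_change_per_unit(reported_coef), 1))  # 29.7
```

Under these assumptions, a one-unit increase in the ratio of tweet increase would correspond to cumulative incidence about 30% higher. If the model used a base-10 logarithm instead, the conversion would be 10^0.26 rather than e^0.26 and the implied change would be larger.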