BACKGROUND
Harnessing health-related data posted on social media in real time could offer insights into how the pandemic affects the mental health and general well-being of individuals and populations over time.
OBJECTIVE
The aim of this study was to obtain information on symptoms and medical conditions self-reported by users of social media platforms other than Twitter during the coronavirus disease 2019 (COVID-19) pandemic and to determine how discussion of these symptoms and medical conditions on social media changed over time.
METHODS
We used natural language processing (NLP) algorithms to identify symptom and medical condition topics discussed in social media posts published between June 14 and December 13, 2020. The sampled posts were geotagged by NetBase, a third-party data provider. To validate the classification of the posts, we calculated its positive predictive value and sensitivity. We also assessed how often each health-related topic was discussed over the study period and compared the changes in the frequency of each symptom or medical condition with the fluctuation in daily new COVID-19 cases in the U.S. Additionally, we compared the trends of the 5 most commonly mentioned symptoms and medical conditions from June 14 to August 31, 2020 (when the U.S. passed 6 million COVID-19 cases), with their trends from September 1 to December 13, 2020.
RESULTS
Among a total of 9,807,813 posts (nearly 70% of which originated in the U.S.), we identified discussion of 120 symptom topics and 1,542 medical condition topics. Our classification of the health-related posts had a positive predictive value of over 80% and an average sensitivity of 92%. The 5 most commonly mentioned symptoms during the study period were anxiety (201,303 posts, or 12.2% of all posts mentioning symptoms), generalized pain (189,673, 11.5%), weight loss (95,793, 5.8%), fatigue (91,252, 5.5%), and coughing (86,235, 5.2%). The 5 most commonly discussed medical conditions were COVID-19 (5,420,276 posts, or 66.4% of all posts mentioning medical conditions), unspecified infectious disease (469,356, 5.8%), influenza (270,166, 3.3%), unspecified disorders of the central nervous system (253,407, 3.1%), and depression (151,752, 1.9%). Changes in the frequency of 2 medical conditions, COVID-19 and unspecified infectious disease, followed a pattern similar to the fluctuation in daily new confirmed COVID-19 cases in the U.S.
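Each percentage in the results is a topic's post count divided by the number of posts mentioning any symptom (or any medical condition). The abstract does not report those two denominators, but they can be roughly back-calculated from the rounded figures. The sketch below uses hypothetical helper names and infers the symptom denominator from the anxiety figure; because 12.2% is itself rounded, the inferred total (about 1.65 million posts) is an approximation, not a number reported by the study.

```python
# Minimal sketch (hypothetical helper names): the share arithmetic behind the
# reported percentages, and an APPROXIMATE back-calculation of a denominator.


def share_percent(count: int, category_total: int) -> float:
    """Percentage of a category's posts that mention one topic, to one decimal."""
    return round(100 * count / category_total, 1)


def implied_total(count: int, reported_percent: float) -> int:
    """Approximate category total implied by a count and its rounded percentage."""
    return round(count / (reported_percent / 100))


# Approximation only: inferred from anxiety (201,303 posts, 12.2%).
symptom_total = implied_total(201_303, 12.2)

for name, n in [("generalized pain", 189_673), ("weight loss", 95_793),
                ("fatigue", 91_252), ("coughing", 86_235)]:
    print(name, share_percent(n, symptom_total))
```

With that inferred total, the other four symptom counts reproduce their reported shares (11.5%, 5.8%, 5.5%, 5.2%) to one decimal place, a quick consistency check on the reported numbers.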
CONCLUSIONS
COVID-19 and anxiety were, respectively, the most commonly discussed medical condition and the most commonly discussed symptom on social media from June 14 to December 13, 2020. Real-time monitoring of social media posts on symptoms and medical conditions may help assess the population's mental health and enhance public health surveillance for infectious diseases.