Detecting early warning indicators to the rise of COVID-19 infection cases in the context of U.S.: An exploratory data analysis

This work investigates whether social media data, Twitter in particular, can be used to detect early warning indicators of the COVID-19 pandemic in the United States (US). To demonstrate the viability of this approach, English tweets carrying hashtags on COVID-19-related topics were collected from 12 March to the end of April 2020. Using an N-gram language model and Term Frequency-Inverse Document Frequency (TF-IDF), significant N-grams (N=2) such as (“new, york”), (“social, distancing”), (“stay, safe”), (“toilet, paper”), (“wash, hand”), (“tested, positive”), (“look, like”), (“front, line”) and (“grocery, store”) are extracted. The analysis shows that the appearances of these N-grams on Twitter directly reflect the characteristics of the infection cases and are distributed similarly across different clusters. The study also reveals that tweets containing (“new, york”) increase with (“stay, home”), (“social, distancing”), (“stay, safe”), (“look, like”) and (“tested, positive”), and decrease with (“toilet, paper”). N-grams with such relationships are recognized as indicators and are validated by mapping them against the number of infection cases. Results show that social media data can reflect the actual course of the infection curve and can detect early warning indicators once the pandemic is moderately recognized.
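The bigram extraction described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the stop-word list, the smoothed IDF formula, the helper names `bigrams` and `tfidf_bigrams`, and the example tweets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the thesis does not specify one).
STOP_WORDS = {"a", "an", "and", "at", "for", "i", "in", "is", "my",
              "of", "on", "the", "to", "we", "with"}


def bigrams(text):
    """Lowercase, tokenize, drop stop words, and return adjacent word pairs."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    return list(zip(tokens, tokens[1:]))


def tfidf_bigrams(documents):
    """Score each bigram by its TF-IDF weight summed over all documents."""
    doc_bigrams = [bigrams(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: number of documents that contain each bigram.
    doc_freq = Counter(bg for doc in doc_bigrams for bg in set(doc))
    scores = Counter()
    for doc in doc_bigrams:
        if not doc:
            continue
        for bg, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF, so bigrams present in every document keep a positive weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[bg])) + 1
            scores[bg] += tf * idf
    return scores


# Made-up example tweets, not data from the study.
tweets = [
    "Stay home and stay safe, social distancing works",
    "No toilet paper at the grocery store",
    "Social distancing in New York",
    "Tested positive in New York, stay home",
]
top_bigrams = tfidf_bigrams(tweets).most_common(5)
```

Calling `tfidf_bigrams` on tweets grouped by day would yield a score per bigram per day, which could then be compared with daily case counts in the way the abstract describes.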

Adnan Morshed, Jaman, 2022

Type of Thesis: Master Thesis
Supervisor: Laurenzi, Emanuele; Hinkelmann, Knut
Studyprogram: Business Information Systems (Master)
Confidentiality: Public
Thesis Language: English
Location: Olten