We collected more than 550 million geolocated tweets and developed an algorithm to flag words and phrases suggestive of HIV-related risk behaviors, such as “sex” or “get high.” The algorithm identified 8,538 tweets indicating sexual risk behaviors and 1,342 tweets suggesting stimulant drug use. Using each tweet's geolocation, we plotted these tweets on a map of the United States to identify where these behaviors were occurring.
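The keyword step can be illustrated with a minimal sketch. It assumes each tweet has already been assigned a county (a hypothetical `county_fips` field) from its coordinates upstream, and it uses only the two example phrases named above. The study's actual keyword list and matching rules are not given here, so the category names, field names, and helper functions below are illustrative assumptions, not the authors' algorithm.

```python
import re
from collections import Counter

# Illustrative phrases only: "sex" and "get high" are the two examples named
# in the text; the study's full keyword list is not reproduced here.
RISK_PHRASES = {
    "sexual_risk": ["sex"],
    "stimulant_use": ["get high"],
}

def compile_patterns(phrases_by_category):
    """Build one case-insensitive, word-bounded regex per category."""
    patterns = {}
    for category, phrases in phrases_by_category.items():
        alternation = "|".join(re.escape(p) for p in phrases)
        # \b keeps "sex" from matching inside words such as "sextant".
        patterns[category] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns

def count_by_county(tweets, patterns):
    """Count matching tweets per (county, category).

    Each tweet is a dict with hypothetical keys "text" and "county_fips";
    mapping coordinates to a county is assumed to have happened upstream.
    """
    counts = Counter()
    for tweet in tweets:
        for category, pattern in patterns.items():
            if pattern.search(tweet["text"]):
                counts[(tweet["county_fips"], category)] += 1
    return counts

# Made-up example tweets, not data from the study.
tweets = [
    {"text": "Trying to get high tonight", "county_fips": "06037"},
    {"text": "Sex ed class was awkward", "county_fips": "06037"},
    {"text": "Sextant for sale", "county_fips": "36061"},
]
patterns = compile_patterns(RISK_PHRASES)
counts = count_by_county(tweets, patterns)
print(counts)
```

The example also shows a limitation of plain keyword matching: the sex-education tweet is flagged alongside a genuine risk mention, so a production pipeline would likely need additional filtering beyond this sketch.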
We then aggregated these tweets by county and merged them with county-level HIV case data from AIDSVu.org to run statistical prediction models. We found a significant, positive correlation between county-level HIV prevalence and real-time communication about HIV risk behaviors and drug use.
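The merge-and-correlate step can be sketched as an inner join on a county key followed by a Pearson correlation. This is a simplified stand-in: the text refers to "statistical prediction models" whose specification is not given, so the sketch shows only a bivariate correlation, and the function names and toy values are hypothetical. A real analysis would presumably also account for differences in county population and Twitter usage, which this sketch does not attempt.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def merge_and_correlate(risk_tweets_by_county, hiv_prevalence_by_county):
    """Inner-join two county-keyed dicts and correlate their values.

    Counties missing from either source are dropped, mirroring an inner join.
    Returns (number of matched counties, Pearson r).
    """
    shared = sorted(set(risk_tweets_by_county) & set(hiv_prevalence_by_county))
    xs = [risk_tweets_by_county[c] for c in shared]
    ys = [hiv_prevalence_by_county[c] for c in shared]
    return len(shared), pearson_r(xs, ys)

# Made-up toy values for illustration; not data from the study.
risk_tweets = {"A": 2, "B": 5, "C": 9, "D": 1}
hiv_prev = {"A": 150.0, "B": 300.0, "C": 520.0, "E": 90.0}
n, r = merge_and_correlate(risk_tweets, hiv_prev)
print(n, round(r, 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient; the explicit version is kept here to show the arithmetic.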
This study provides the first evidence that real-time social media data can be used in behavioral health prediction models. It also offers a model for how public health departments and hospitals can use this approach to monitor and prepare for disease outbreaks.
Publications
- Young SD, Rivers C, Lewis B. Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes. Prev Med 2014; 63: 112–115.
- Young SD. Behavioral insights on big data: using social media for predicting biomedical outcomes. Trends Microbiol 2014; 22(11): 601–602.
Funding
National Institute of Mental Health (K01 MH09884).