A Field Trip
Date: March 11, 2022
Instructors: Yun Dai, Fan Luo
We first introduce the field of computational social science, including its topics, approaches and desired skill sets. Then we review a computational social research project as a case study.
The Field of Computational Social Science
Edelmann, A., Wolff, T., Montagne, D., & Bail, C. A. (2020). Computational social science and sociology. Annual Review of Sociology, 46, 61-81. http://dx.doi.org/10.1146/annurev-soc-121919-054621
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A. L., Brewer, D., ... & Van Alstyne, M. (2009). Computational social science. Science, 323(5915), 721-723. http://dx.doi.org/10.1126/science.1167742
Lazer, D. M., Pentland, A., Watts, D. J., Aral, S., Athey, S., Contractor, N., ... & Wagner, C. (2020). Computational social science: Obstacles and opportunities. Science, 369(6507), 1060-1062. http://dx.doi.org/10.1126/science.aaz8170
Approaches
Athey, S., & Imbens, G. W. (2019). Machine learning methods that economists should know about. Annual Review of Economics, 11, 685-725. http://dx.doi.org/10.1146/annurev-economics-080217-053433
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2021). Machine learning for social science: An agnostic approach. Annual Review of Political Science, 24, 395-419. http://dx.doi.org/10.1146/annurev-polisci-053119-015921
Molina, M., & Garip, F. (2019). Machine learning for sociology. Annual Review of Sociology, 45, 27-45. http://dx.doi.org/10.1146/annurev-soc-073117-041106
A Case Study
Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop and small teams disrupt science and technology. Nature, 566(7744), 378-382. http://dx.doi.org/10.1038/s41586-019-0941-9
Blue Team Dynamics
Date: October 12, 2022
Instructor: Yun Dai
In this session, we begin with the seminal case of Google Flu Trends (GFT), a surveillance tool that Google launched in 2008 to estimate influenza activity in near-real time. GFT raised hopes of faster and easier estimates than "old-school" methods of data collection and statistical analysis, and seemed to herald the ascendance of data-driven approaches. In 2015, however, after several major stumbles in subsequent influenza seasons, GFT stopped publishing estimates.
But scientific efforts to harness search queries to nowcast or forecast epidemics have continued to the present. We review publications along these lines of research, focusing on one issue, "blue team dynamics": a process in which the algorithm producing the data is modified by the service provider in accordance with its business model, inducing specific user behaviors and introducing patterns into the data.
More generally, beyond "blue team dynamics", we also discuss the benefits and biases, and the promise and potential perils, of social research that uses big data for disease nowcasting and forecasting.
digital disease surveillance, forecasting, web data mining, confounding algorithms, drifting
Digital Disease Surveillance
Aiello, A. E., Renson, A., & Zivich, P. (2020). Social media- and internet-based disease surveillance for public health. Annual Review of Public Health, 41, 101. https://doi.org/10.1146/annurev-publhealth-040119-094402
Budd, J., Miller, B. S., Manning, E. M., Lampos, V., Zhuang, M., Edelstein, M., ... & McKendry, R. A. (2020). Digital technologies in the public-health response to COVID-19. Nature Medicine, 26(8), 1183-1192. https://doi.org/10.1038/s41591-020-1011-4
Simonsen, L., Gog, J. R., Olson, D., & Viboud, C. (2016). Infectious disease surveillance in the big data era: towards faster and locally relevant systems. The Journal of Infectious Diseases, 214(suppl_4), S380-S385. https://doi.org/10.1093/infdis/jiw376
Althouse, B. M., Scarpino, S. V., Meyers, L. A., Ayers, J. W., Bargsten, M., Baumbach, J., ... & Wesolowski, A. (2015). Enhancing disease surveillance with novel data streams: challenges and opportunities. EPJ Data Science, 4(1), 1-8. http://dx.doi.org/10.1140/epjds/s13688-015-0054-0
Gasser, U., Ienca, M., Scheibner, J., Sleigh, J., & Vayena, E. (2020). Digital tools against COVID-19: taxonomy, ethical challenges, and navigation aid. The Lancet Digital Health, 2(8), e425-e434. https://doi.org/10.1016/S2589-7500(20)30137-0
Groseclose, S. L., & Buckeridge, D. L. (2017). Public health surveillance systems: recent advances in their use and evaluation. Annual Review of Public Health, 38, 57-79. https://doi.org/10.1146/annurev-publhealth-031816-044348
Google Flu Trends: Models
Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature, 457(7232), 1012-1014. http://dx.doi.org/10.1038/nature07634
Cook, S., Conrad, C., Fowlkes, A. L., & Mohebbi, M. H. (2011). Assessing Google Flu Trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic. PLoS One, 6(8), e23610. https://doi.org/10.1371/journal.pone.0023610
Google Flu Trends: Critiques
Butler, D. (2013). When Google got flu wrong: US outbreak foxes a leading web-based method for tracking seasonal flu. Nature, 494(7436), 155-157. https://dx.doi.org/10.1038/494155a
Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203-1205. http://dx.doi.org/10.1126/science.1248506
Ortiz, J. R., Zhou, H., Shay, D. K., Neuzil, K. M., Fowlkes, A. L., & Goss, C. H. (2011). Monitoring influenza activity in the United States: a comparison of traditional surveillance systems with Google Flu Trends. PLoS One, 6(4), e18687. https://doi.org/10.1371/journal.pone.0018687
Epidemic Forecasts Using Internet Searches
Santillana, M., Zhang, D. W., Althouse, B. M., & Ayers, J. W. (2014). What can digital disease detection learn from (an external revision to) Google Flu Trends?. American Journal of Preventive Medicine, 47(3), 341-347. https://doi.org/10.1016/j.amepre.2014.05.020
Yang, S., Santillana, M., & Kou, S. C. (2015). Accurate estimation of influenza epidemics using Google search data via ARGO. Proceedings of the National Academy of Sciences, 112(47), 14473-14478. http://dx.doi.org/10.1073/pnas.1515373112
Ma, S., & Yang, S. (2022). COVID-19 forecasts using internet search information in the United States. Scientific Reports, 12(1), 1-16. https://doi.org/10.1038/s41598-022-15478-y
Wisdom of Crowds
Date: October 13, 2022
Instructor: Yun Dai
In this session, we turn to the second facet of computational social research: research designs that use crowdsourcing technology to extract collective intelligence. We focus on one stage of a research project, data generation, which includes data collection and data production processes such as text annotation.
We discuss three types of such research designs: 1) an open survey that evolves over time based on the ideas of its participants; 2) a system that distributes microtasks to the crowd, whose outputs are as reliable and valid as those from expert human readers; and 3) a software application that interfaces with crowdsourcing technology and automates recruitment, the collection of both behavioral and survey data, and the provision of incentives to generate responses, all in one place.
These kinds of research design have the potential to improve the scope, efficiency, cost, scalability, sampling, response rates, and convenience of social scientific projects compared to research in the analog age. Challenges such as controlling data quality, assessing response biases, and adjusting for sampling biases can be addressed in the design phase or during data analysis.
crowdsourcing, collective intelligence, software design, Internet technology, data generation
Open Survey
Salganik, M. J., & Levy, K. E. C. (2015). Wiki surveys: Open and quantifiable social data collection. PLoS One, 10(5), e0123483. https://doi.org/10.1371/journal.pone.0123483
Distribution of Microtasks
Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced text analysis: Reproducible and agile production of political data. American Political Science Review, 110(2), 278-295. https://doi.org/10.1017/S0003055416000058
Pennycook, G., & Rand, D. G. (2019). Fighting misinformation on social media using crowdsourced judgments of news source quality. Proceedings of the National Academy of Sciences, 116(7), 2521-2526. https://doi.org/10.1073/pnas.1806781116
Benoit, K., Munger, K., & Spirling, A. (2019). Measuring and explaining political sophistication through textual complexity. American Journal of Political Science, 63(2), 491-508. https://doi.org/10.1111/ajps.12423
Porter, N. D., Verdery, A. M., & Gaddis, S. M. (2020). Enhancing big data in the social sciences with crowdsourcing: Data augmentation practices, techniques, and opportunities. PLoS One, 15(6), e0233154. https://doi.org/10.1371/journal.pone.0233154
Social Media Survey App
Bail, C. A. (2015). Taming big data: Using app technology to study organizational behavior on social media. Sociological Methods & Research, 46(2), 189-217.
Science of Where
Date: October 14, 2022
Instructor: Fan Luo
It is always intriguing to acquire data, pin it to a location, and decipher the meaning behind the information and how it relates to other data. Snow’s cholera map is one of the earliest disease maps; it shows the relationship between cholera cases and water pumps. With the limited technology of the time, however, it was impossible to explore more dynamic and complex patterns, let alone predict how an outbreak might develop.
Nowadays, thanks to the rapid development of technologies such as satellite imagery and the Internet of Things (IoT), data acquisition is more streamlined. In addition, GIS and computation together reveal the underlying patterns in the data and support more accurate predictions. Using three studies on the impact of COVID-19 on urban regions as examples, this session presents these approaches from a spatial perspective.
However, technology doesn’t automatically solve every problem, and researchers have to be aware of distortions and potential issues at every stage, from obtaining data to making decisions. We will discuss some common cases, including their causes and the steps to eliminate their effects.
geospatial analysis, pattern recognition, prediction
Albalate, D., Bel, G., & Gragera, A. (2022). Mobility, environment and inequalities in the post-COVID city. Cambridge Journal of Regions, Economy and Society. https://doi.org/10.1093/cjres/rsac021
Li, Q., & Xu, W. (2022). The impact of COVID-19 on bike-sharing travel pattern and flow structure: Evidence from Wuhan. Cambridge Journal of Regions, Economy and Society. https://doi.org/10.1093/cjres/rsac005
Wang, S., Zhang, M., Huang, X., Hu, T., Li, Z., Sun, Q. C., & Liu, Y. (2022). Urban-regional disparities in mental health signals in Australia during the COVID-19 pandemic: A study via Twitter data and machine learning models. Cambridge Journal of Regions, Economy and Society. https://doi.org/10.1093/cjres/rsac025