Hopelessly Computational

Hopelessly Computational is a series of pop-up sessions that introduces students to the world of computational social research.

Each session approaches the field from one facet of computational social research, pairing one theme with one of the field's defining characteristics.

The series follows one path into the field: research that involves massive computation or is administered by computational means. Each session ends on a happy coda in which methodological or logistical challenges are addressed, expanding the range of problems social scientists can solve.

Presented by Data Services @ NYU Shanghai Library.

A Field Trip

Date: March 11, 2022
Instructors: Yun Dai, Fan Luo

We first introduce the field of computational social science, including its topics, approaches and desired skill sets. Then we review a computational social research project as a case study.

The Field of Computational Social Science

Edelmann, A., Wolff, T., Montagne, D., & Bail, C. A. (2020). Computational social science and sociology. Annual Review of Sociology, 46, 61-81. http://dx.doi.org/10.1146/annurev-soc-121919-054621

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A. L., Brewer, D., ... & Van Alstyne, M. (2009). Computational social science. Science, 323(5915), 721-723. http://dx.doi.org/10.1126/science.1167742

Lazer, D. M., Pentland, A., Watts, D. J., Aral, S., Athey, S., Contractor, N., ... & Wagner, C. (2020). Computational social science: Obstacles and opportunities. Science, 369(6507), 1060-1062. http://dx.doi.org/10.1126/science.aaz8170

Approaches

Athey, S., & Imbens, G. W. (2019). Machine learning methods that economists should know about. Annual Review of Economics, 11, 685-725. http://dx.doi.org/10.1146/annurev-economics-080217-053433

Grimmer, J., Roberts, M. E., & Stewart, B. M. (2021). Machine learning for social science: An agnostic approach. Annual Review of Political Science, 24, 395-419. http://dx.doi.org/10.1146/annurev-polisci-053119-015921

Molina, M., & Garip, F. (2019). Machine learning for sociology. Annual Review of Sociology, 45, 27-45. http://dx.doi.org/10.1146/annurev-soc-073117-041106

A Case Study

Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop and small teams disrupt science and technology. Nature, 566(7744), 378-382. http://dx.doi.org/10.1038/s41586-019-0941-9

Blue Team Dynamics

Date: October 12, 2022
Instructor: Yun Dai

In this session, we begin with the seminal case of Google Flu Trends (GFT), a surveillance tool that Google launched in 2008 to estimate influenza activity in near-real time from search queries. GFT models raised hopes of faster and easier estimates than "old-school" methods of data collection and statistical analysis, and were hailed as evidence of the ascendance of data-driven approaches. After several major stumbles in subsequent influenza seasons, however, GFT stopped publishing estimates in 2015.
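Ginsberg et al. (2009), listed below, related the percentage of influenza-like illness (ILI) physician visits, P, to the fraction of ILI-related search queries, Q, through a linear model on the logit scale: logit(P) = β0 + β1 logit(Q) + ε. The following is a minimal sketch of that functional form fitted by ordinary least squares. It skips the paper's step of selecting which queries to include, and the function names and weekly numbers are hypothetical, not real GFT or CDC figures.

```python
import numpy as np

def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))

def inv_logit(x):
    """Inverse of logit: maps log-odds back to a proportion."""
    return 1 / (1 + np.exp(-np.asarray(x, dtype=float)))

def fit_gft_style(query_frac, ili_frac):
    """Fit logit(ILI) = b0 + b1 * logit(query) by ordinary least squares."""
    X = np.column_stack([np.ones(len(query_frac)), logit(query_frac)])
    coef, *_ = np.linalg.lstsq(X, logit(ili_frac), rcond=None)
    return coef  # array([b0, b1])

def predict_gft_style(coef, query_frac):
    """Nowcast the ILI visit fraction from a new query fraction."""
    return inv_logit(coef[0] + coef[1] * logit(query_frac))

# Hypothetical weekly data: ILI-related query fraction and ILI visit fraction.
query = np.array([0.0010, 0.0015, 0.0022, 0.0030, 0.0041])
ili = np.array([0.012, 0.017, 0.024, 0.031, 0.043])

coef = fit_gft_style(query, ili)
print("b0, b1:", coef)
print("nowcast for query fraction 0.0025:", predict_gft_style(coef, 0.0025))
```

Because the model is a single static regression, its accuracy depends on the query-to-illness relationship staying stable over time, which is exactly what failed in later seasons.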

Scientific efforts to harness search queries to nowcast or forecast epidemics have nonetheless continued to the present, and we review publications along these lines of research. We focus on one issue, "blue team dynamics": the algorithm producing the data is modified by the service provider in accordance with its business model, which induces specific user behaviors and introduces patterns into the data.
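To make blue team dynamics concrete, here is a toy simulation, not a model of Google's actual search algorithm. It assumes a hypothetical platform change at week 60 (for example, a new query suggestion feature) that inflates flu-related search volume by 40% without any change in illness. A model fitted once on pre-change data then overestimates ILI. Refitting on a rolling window of recent official data, loosely in the spirit of dynamically recalibrated approaches such as ARGO (Yang et al., 2015), adapts to the shift. The seasonal curve, the 40% boost, the 8-week window, and the 2-week reporting delay are all illustrative assumptions.

```python
import numpy as np

weeks = np.arange(104)
ili = 2.0 + 1.5 * np.sin(2 * np.pi * weeks / 52)  # hypothetical seasonal ILI %
# Hypothetical platform change at week 60 inflates query volume by 40%.
boost = np.where(weeks >= 60, 1.4, 1.0)
query = 10.0 * ili * boost  # hypothetical search volume index

def fit_linear(x, y):
    """OLS fit of y = a + b * x; returns (a, b)."""
    b, a = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept]
    return a, b

def static_nowcast(train_end):
    """Fit once on weeks before train_end and never update."""
    a, b = fit_linear(query[:train_end], ili[:train_end])
    return a + b * query

def rolling_nowcast(window=8, delay=2):
    """Each week, refit on the latest `window` weeks of official ILI data,
    assuming official figures arrive `delay` weeks late."""
    preds = np.full(len(weeks), np.nan)
    for t in range(window + delay, len(weeks)):
        lo, hi = t - delay - window, t - delay
        a, b = fit_linear(query[lo:hi], ili[lo:hi])
        preds[t] = a + b * query[t]
    return preds

static = static_nowcast(train_end=60)
rolling = rolling_nowcast()
post = slice(80, 104)  # weeks well after the platform change
static_mae = np.mean(np.abs(static[post] - ili[post]))
rolling_mae = np.mean(np.abs(rolling[post] - ili[post]))
print(f"static MAE: {static_mae:.3f}, rolling MAE: {rolling_mae:.3f}")
```

Rolling recalibration only helps once enough post-change official data exist; in the weeks right after the change, both approaches are exposed.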

More broadly, beyond blue team dynamics, we also discuss the benefits and biases, and the promise and potential perils, of social research that uses big data for disease nowcasting and forecasting.

digital disease surveillance, forecasting, web data mining, confounding algorithms, drifting

Digital Disease Surveillance

Aiello, A. E., Renson, A., & Zivich, P. (2020). Social media- and internet-based disease surveillance for public health. Annual Review of Public Health, 41, 101-118. https://doi.org/10.1146/annurev-publhealth-040119-094402

Budd, J., Miller, B. S., Manning, E. M., Lampos, V., Zhuang, M., Edelstein, M., ... & McKendry, R. A. (2020). Digital technologies in the public-health response to COVID-19. Nature Medicine, 26(8), 1183-1192. https://doi.org/10.1038/s41591-020-1011-4

Simonsen, L., Gog, J. R., Olson, D., & Viboud, C. (2016). Infectious disease surveillance in the big data era: towards faster and locally relevant systems. The Journal of Infectious Diseases, 214(suppl_4), S380-S385. https://doi.org/10.1093/infdis/jiw376

Althouse, B. M., Scarpino, S. V., Meyers, L. A., Ayers, J. W., Bargsten, M., Baumbach, J., ... & Wesolowski, A. (2015). Enhancing disease surveillance with novel data streams: challenges and opportunities. EPJ Data Science, 4(1), 1-8. http://dx.doi.org/10.1140/epjds/s13688-015-0054-0

Gasser, U., Ienca, M., Scheibner, J., Sleigh, J., & Vayena, E. (2020). Digital tools against COVID-19: taxonomy, ethical challenges, and navigation aid. The Lancet Digital Health, 2(8), e425-e434. https://doi.org/10.1016/S2589-7500(20)30137-0

Groseclose, S. L., & Buckeridge, D. L. (2017). Public health surveillance systems: recent advances in their use and evaluation. Annual Review of Public Health, 38, 57-79. https://doi.org/10.1146/annurev-publhealth-031816-044348

Google Flu Trends: Models

Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature, 457(7232), 1012-1014. http://dx.doi.org/10.1038/nature07634

Cook, S., Conrad, C., Fowlkes, A. L., & Mohebbi, M. H. (2011). Assessing Google flu trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic. PLoS One, 6(8), e23610. https://doi.org/10.1371/journal.pone.0023610

Google Flu Trends: Critiques

Butler, D. (2013). When Google got flu wrong: US outbreak foxes a leading web-based method for tracking seasonal flu. Nature, 494(7436), 155-157. https://dx.doi.org/10.1038/494155a

Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176), 1203-1205. http://dx.doi.org/10.1126/science.1248506

Ortiz, J. R., Zhou, H., Shay, D. K., Neuzil, K. M., Fowlkes, A. L., & Goss, C. H. (2011). Monitoring influenza activity in the United States: a comparison of traditional surveillance systems with Google Flu Trends. PLoS One, 6(4), e18687. https://doi.org/10.1371/journal.pone.0018687

Epidemic Forecasts Using Internet Searches

Santillana, M., Zhang, D. W., Althouse, B. M., & Ayers, J. W. (2014). What can digital disease detection learn from (an external revision to) Google Flu Trends? American Journal of Preventive Medicine, 47(3), 341-347. https://doi.org/10.1016/j.amepre.2014.05.020

Yang, S., Santillana, M., & Kou, S. C. (2015). Accurate estimation of influenza epidemics using Google search data via ARGO. Proceedings of the National Academy of Sciences, 112(47), 14473-14478. http://dx.doi.org/10.1073/pnas.1515373112

Ma, S., & Yang, S. (2022). COVID-19 forecasts using internet search information in the United States. Scientific Reports, 12(1), 1-16. https://doi.org/10.1038/s41598-022-15478-y

Wisdom of Crowds

Date: October 13, 2022
Instructor: Yun Dai

In this session, we turn to the second facet of computational social research: research designs that use crowdsourcing technology to extract collective intelligence. We focus on one stage of a research project, data generation, which includes data collection and data production processes such as text annotation.

We discuss three such research designs: 1) an open survey that evolves over time based on the ideas of its participants; 2) a system that distributes microtasks to the crowd, whose outputs can be as reliable and valid as those from expert human readers; and 3) a software application that interfaces with crowdsourcing technology and automates recruitment, the collection of both behavioral and survey data, and the provision of incentives to generate responses, all in one place.
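In a wiki survey (Salganik & Levy, 2015), respondents repeatedly choose between two ideas, and participants can add new ideas to the pool as the survey runs. The sketch below computes a deliberately simple summary, the share of its pairwise contests each idea won. This is a simplification assumed for illustration: the paper's opinion scores come from a more careful statistical model, and raw win shares are noisy for newly added ideas with few comparisons. The ideas and votes are hypothetical.

```python
from collections import defaultdict

def win_rates(votes):
    """votes: iterable of (winner, loser) pairs from pairwise comparisons.
    Returns {idea: share of its comparisons that it won}."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {idea: wins[idea] / appearances[idea] for idea in appearances}

# Hypothetical votes; "free transit" was added by a participant mid-survey.
votes = [
    ("more bike lanes", "longer library hours"),
    ("more bike lanes", "free transit"),
    ("free transit", "longer library hours"),
    ("longer library hours", "more bike lanes"),
]
for idea, rate in sorted(win_rates(votes).items(), key=lambda kv: -kv[1]):
    print(f"{idea}: {rate:.2f}")
```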

Compared with research in the analog age, these kinds of research designs have the potential to improve the scope, efficiency, cost, scalability, sampling, response rates, and convenience of social scientific projects. Challenges such as controlling data quality, assessing response biases, and adjusting for sampling biases can be addressed at the design stage or during data analysis.
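One common way to build quality control into a crowd-coding design, assumed here rather than taken from any single paper above, is to mix "gold" items with known answers into the task, drop coders who score poorly on them, and average the remaining coders' labels for each text unit. The sketch uses a hypothetical -2 to 2 scale, hypothetical coder IDs, and an illustrative accuracy threshold.

```python
def screen_coders(gold_answers, coder_answers, min_accuracy=0.8):
    """Keep coders whose accuracy on gold (known-answer) items meets the threshold."""
    kept = set()
    for coder, answers in coder_answers.items():
        scored = [answers[item] == truth
                  for item, truth in gold_answers.items() if item in answers]
        if scored and sum(scored) / len(scored) >= min_accuracy:
            kept.add(coder)
    return kept

def aggregate(codings, kept):
    """Average the labels each item received from screened coders only."""
    by_item = {}
    for coder, item, label in codings:
        if coder in kept:
            by_item.setdefault(item, []).append(label)
    return {item: sum(labels) / len(labels) for item, labels in by_item.items()}

# Hypothetical data: gold items g1, g2; sentences s1, s2 coded on a -2..2 scale.
gold = {"g1": 2, "g2": -1}
coder_answers = {
    "c1": {"g1": 2, "g2": -1},  # 100% on gold
    "c2": {"g1": 0, "g2": 1},   # 0% on gold
    "c3": {"g1": 2, "g2": 0},   # 50% on gold
}
codings = [("c1", "s1", 1), ("c2", "s1", -2), ("c3", "s1", 2),
           ("c1", "s2", -1), ("c3", "s2", 0)]

kept = screen_coders(gold, coder_answers, min_accuracy=0.5)
print(sorted(kept), aggregate(codings, kept))
```

The threshold trades coverage against noise: a stricter cutoff keeps fewer but more reliable coders.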

crowdsourcing, collective intelligence, software design, Internet technology, data generation

Open Survey

Salganik, M. J., & Levy, K. E. C. (2015). Wiki surveys: Open and quantifiable social data collection. PLoS One, 10(5), e0123483. https://doi.org/10.1371/journal.pone.0123483

Distribution of Microtasks

Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced text analysis: Reproducible and agile production of political data. American Political Science Review, 110(2), 278-295. https://doi.org/10.1017/S0003055416000058

Pennycook, G., & Rand, D. G. (2019). Fighting misinformation on social media using crowdsourced judgments of news source quality. Proceedings of the National Academy of Sciences, 116(7), 2521-2526. https://doi.org/10.1073/pnas.1806781116

Benoit, K., Munger, K., & Spirling, A. (2019). Measuring and explaining political sophistication through textual complexity. American Journal of Political Science, 63(2), 491-508. https://doi.org/10.1111/ajps.12423

Porter, N. D., Verdery, A. M., & Gaddis, S. M. (2020). Enhancing big data in the social sciences with crowdsourcing: Data augmentation practices, techniques, and opportunities. PLoS One, 15(6), e0233154. https://doi.org/10.1371/journal.pone.0233154

Social Media Survey App

Bail, C. A. (2015). Taming big data: Using app technology to study organizational behavior on social media. Sociological Methods & Research, 46(2), 189-217.

Science of Where

Date: October 14, 2022
Instructor: Fan Luo

It is always intriguing to acquire data, pin it to a location, and decipher what the information means and how it relates to other information. John Snow's map of the 1854 Broad Street cholera outbreak in London is one of the earliest disease maps; it shows the relationship between cholera cases and public water pumps. With the limited technology of the time, however, it was impossible to explore more dynamic and complex patterns, let alone predict how the outbreak might develop.
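The spatial logic behind Snow's map can be sketched as assigning each case to its nearest pump and counting cases per pump, which partitions the map into Voronoi-style service areas. This minimal sketch uses straight-line distance on hypothetical coordinates; it is a simplification, since a walking distance along streets may matter more in a real street network.

```python
import math
from collections import Counter

def nearest(point, sites):
    """Name of the site closest to `point` by straight-line (Euclidean) distance."""
    return min(sites, key=lambda name: math.dist(point, sites[name]))

# Hypothetical pump locations and case locations on an arbitrary grid.
pumps = {"Pump A": (0.0, 0.0), "Pump B": (5.0, 0.0), "Pump C": (0.0, 6.0)}
cases = [(0.5, 0.2), (1.0, -0.4), (-0.3, 0.8), (4.6, 0.5), (0.2, 5.0), (0.9, 0.9)]

counts = Counter(nearest(c, pumps) for c in cases)
print(counts.most_common())
```

A pump whose area holds a disproportionate share of cases becomes a candidate source, which is the kind of pattern Snow's map made visible.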

Today, thanks to rapidly developing technologies such as satellite imagery and the Internet of Things (IoT), data acquisition is far more streamlined. Geographic information systems (GIS) and computation together reveal underlying patterns in the data and enable more accurate predictions. Using three studies of the influence of COVID-19 on urban regions as examples, this session presents these approaches from a spatial perspective.
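A standard first step in recognizing spatial patterns is measuring spatial autocorrelation: do nearby regions have similar values? Global Moran's I is one common statistic for this, I = (n / W) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ is region i's deviation from the mean, wᵢⱼ is the spatial weight between regions i and j, and W is the sum of all weights. This minimal sketch uses a hypothetical row of four regions with binary adjacency weights. Real analyses typically rely on a library such as PySAL and assess significance with permutation tests, which are omitted here.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.
    weights: n x n spatial weight matrix with zeros on the diagonal."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()  # deviations from the mean
    n = x.size
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Hypothetical: four regions in a row, each adjacent to its neighbors.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]

print("clustered:", morans_i([1, 1, 5, 5], adjacency))    # similar neighbors
print("alternating:", morans_i([1, 5, 1, 5], adjacency))  # dissimilar neighbors
```

Positive values indicate that similar values cluster in space; negative values indicate that neighbors tend to differ.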

Technology does not automatically solve every problem, however; researchers must be aware of distortions and potential issues at every stage, from obtaining data to making decisions. We will discuss some common cases, their causes, and steps to mitigate their effects.

geospatial analysis, pattern recognition, prediction

Albalate, D., Bel, G., & Gragera, A. (2022). Mobility, environment and inequalities in the post-COVID city. Cambridge Journal of Regions, Economy and Society. https://doi.org/10.1093/cjres/rsac021

Li, Q., & Xu, W. (2022). The impact of COVID-19 on bike-sharing travel pattern and flow structure: Evidence from Wuhan. Cambridge Journal of Regions, Economy and Society. https://doi.org/10.1093/cjres/rsac005

Wang, S., Zhang, M., Huang, X., Hu, T., Li, Z., Sun, Q. C., & Liu, Y. (2022). Urban-regional disparities in mental health signals in Australia during the COVID-19 pandemic: A study via Twitter data and machine learning models. Cambridge Journal of Regions, Economy and Society. https://doi.org/10.1093/cjres/rsac025

Robot Historians

Upcoming