Data Science for Public Good

by Joel Thurston, Ph.D. and Cesar Montalvo, Ph.D.
Social and Decision Analytics Division, Biocomplexity Institute at the University of Virginia


In this Methodspace interview I asked Joel and Cesar to tell us about how data science can be used for social good, and how their program is cultivating a next generation of data scientists. This summer we will welcome a series of posts from this year’s cohorts to discuss their research projects. - Janet Salmons, Research Community Manager for Sage Methodspace


Students and staff from the 2022 Data Science for the Public Good (DSPG) program

JS. Tell us about Data Science for the Public Good. What do you do, how, and why is it important?

JT & CM. The Biocomplexity Institute's Social and Decision Analytics Division’s Data Science for the Public Good (DSPG) program is an experiential learning program that brings together students from across the country and engages them on research projects that address local, state, and federal government challenges around critical social issues. DSPG Young Scholars are undergraduate and graduate students conducting applied research at the intersection of statistics, computation, engineering, and the social sciences. Participants work in collaborative teams, vertically and horizontally integrated with our postdoctoral scholars and research faculty from the Social and Decision Analytics Division, as well as our external project stakeholders.

In the past, participants have worked on a wide range of topics: developing tools and building the capacity of rural coastal communities in Virginia to deal with the impact of climate change; assessing the impact of broadband development on rural property values in Iowa; and constructing a data commons, or open knowledge repository, to curate data insights for decision makers across the National Capital Region.

The DSPG program is all about preparing the next generation of data scientists. We live in an increasingly complex and interconnected world, facing challenges that often extend beyond what we think of as traditional community borders. As researchers, we firmly believe that data is one of the essential tools for solving these challenges, and one of the most critical skills a person can possess is data literacy. DSPG teaches people not just how to find and analyze data but how to use it to solve real problems by applying our Data Science Framework.

This framework details our research process, from concrete problem identification through data discovery, acquisition, ingestion, and wrangling to the statistical modeling and analysis that produce the actionable insights that ultimately lead to the public good. Throughout the process, we emphasize communication and dissemination as we work with stakeholders and sponsors to clarify and refine our research questions, tweak our modeling approaches, and ultimately frame our results in a language our target audiences will understand. Equally important, and present at every stage of our framework, is a heavy emphasis on ethics.

Whether DSPG participants go on to become data scientists, policy makers, or entrepreneurs, or follow a career path that hasn’t even been thought of yet, our goal is to give them the tools to succeed in whatever they choose.

JS. How do we define working towards the “public good” and why is it important for data scientists to operate in this space?

JT & CM. At the University of Virginia Biocomplexity Institute's Social and Decision Analytics Division, we think about data science for the public good as turning data into action to benefit communities. For us, “action” often comes in the form of informing policy decisions by local, state, or federal decision makers. People benefit, for example, when we use data to help emergency services identify areas with a high risk of losing access to staple food items in the event of a natural disaster, or when we use data to help county officials identify underserved families facing food insecurity in areas generally considered to be higher-income neighborhoods. But actions can be anything that generates, maintains, or improves the welfare of individuals and communities. Data science for the public good could involve developing tools to understand complex problems such as climate change, combatting misinformation, building trust in science, or working with public and private partners to ensure that data are used in an ethical and responsible manner.

Learn about recent research projects here: https://biocomplexity.virginia.edu/our-research/research-projects.

JS. What are common obstacles data scientists face when trying to serve the public good?

JT & CM. Providing evidence-based insights to policymakers has its share of obstacles, such as the lack of timely data access, non-standardized data, incomplete information, and limited geographic reporting. There are also privacy concerns beyond the higher-profile issues of security breaches and cyberattacks. We are very careful to ensure that malicious actors cannot leverage the insights and data we provide to identify specific individuals or groups of people. Data scientists also face technical challenges such as the need for large-scale and costly computing infrastructure. Our work often involves complex data curation and maintenance systems that require extensive amounts of time, money, and human talent. We seek to overcome these obstacles through collaborative and multidisciplinary efforts.

One way in which researchers at the Social and Decision Analytics Division address some of these challenges is by employing our Community Learning through Data-Driven Discovery (CLD3) process. By following this community-focused approach, we ensure that the individuals most impacted by our work are involved in each stage of the research process.

More information about our CLD3 approach can be found online at https://datascienceforthepublicgood.org/economic-mobility/research-framework

JS. What specific knowledge and skills do data scientists need to keep this focus?

Team building at the puzzle table

JT & CM. Working in a transdisciplinary research environment like the Biocomplexity Institute, you realize that a range of skills is necessary to be a successful data scientist and serve the public good. There are technical skills necessary to identify, acquire, and manipulate data (e.g., learning programming languages). There are also statistical theory, research, and critical thinking skills (e.g., measurement theory and hypothesis testing). In some cases, you may also need a level of subject matter expertise in a particular topic (e.g., population dynamics, social determinants of health). Furthermore, especially when working in domains relevant to the public good, you will also need to engage in stakeholder management and science communication, since you may be working with groups of people (e.g., policymakers, community leaders, members of the public) who are new to using large amounts of data or highly technical analysis.

If it is not already obvious: since it is nearly impossible for anyone to master all these skills, arguably the most important ability for a successful data scientist for the public good is the ability to work on a team!

Meet the Data Science for the Public Good Co-Coordinators

Cesar Montalvo is a Research Assistant Professor with the Social and Decision Analytics Division of the Biocomplexity Institute and Initiative. He works at the interface of economics, statistics, mathematical models, and public policy. Cesar is an economist who graduated from the University San Francisco de Quito and earned a Master’s degree in Economics from Iowa State University. He received his Ph.D. in Applied Mathematics for Life and Social Sciences from Arizona State University. His dissertation focused on dynamical systems related to social mobility and education.

He has worked on projects regarding the skilled technical workforce and social mobility at the community level. He is currently leading efforts to develop a new method for calculating food insecurity by developing a comprehensive cost-of-living calculator for communities in the National Capital Region.

Cesar is driven by a strong desire to carry out research and practice that contribute to reducing poverty and inequality in our communities.

Joel Thurston is a Senior Scientist with the UVA Biocomplexity Institute Social and Decision Analytics Division. Joel received his Ph.D. in Social Psychology from the University of California Santa Barbara (UCSB). Prior to joining the Biocomplexity Institute, he worked for the UCSB Center for Evolutionary Psychology and the U.S. Army Research Institute.

Joel has a long-standing interest in the interface of group perception and group dynamics, conceptualizing and measuring emergent group properties, and the science of team science. He seeks to apply measurement theory and social science research methodologies to develop analytic techniques for administrative data, addressing topics such as how intragroup processes contribute to performance for the U.S. Army. He is currently leading efforts to combine natural language processing techniques with qualitative analysis to identify characteristics of U.S. Army Soldiers that predict individual and unit performance.

Joel and Cesar are part of a team developing an equity-focused data commons to be used by local, state, and regional government stakeholders to address social issues across the National Capital Region.

To learn more about our Data Science Framework please see https://biocomplexity.virginia.edu/data-science-framework and https://hdsr.mitpress.mit.edu/pub/hnptx6lq/release/10.

