Ethics and Big Data & Social Media Research
In the first quarter of 2021 we explored design steps, starting with a January focus on research questions. We continued to learn about the design stage in February by focusing on Choosing Methodology and Methods. The focus for March is on Designing an Ethical Study.
Big Data, including datasets and data collected from social media, present researchers with an "n" beyond what any single researcher could assemble. There is no single approach to using these data sources, with both qualitative and quantitative researchers across disciplines carrying out such research. The curated list of open access articles listed here shows the scope of methodological thinking.
One of the most reliable sources of guidance on ethical issues in Big Data research is the ethics guidelines developed by the Association of Internet Researchers (AoIR.) They have created a series of four guides published from 2002 to 2019, and rather than replacing earlier ones, each new set builds on the prior document. Find all of them here.
Brower, R. L., Jones, T. B., Osborne-Lampkin, L., Hu, S., & Park-Gaghan, T. J. (2019). Big Qual: Defining and Debating Qualitative Inquiry for Large Data Sets. International Journal of Qualitative Methods. https://doi.org/10.1177/1609406919880692
Abstract. Big qualitative data (Big Qual), or research involving large qualitative data sets, has introduced many newly evolving conventions that have begun to change the fundamental nature of some qualitative research. In this methodological essay, we first distinguish big data from big qual. We define big qual as data sets containing either primary or secondary qualitative data from at least 100 participants analyzed by teams of researchers, often funded by a government agency or private foundation, conducted either as a stand-alone project or in conjunction with a large quantitative study. We then present a broad debate about the extent to which big qual may be transforming some forms of qualitative inquiry. We present three questions, which examine the extent to which large qualitative data sets offer both constraints and opportunities for innovation related to funded research, sampling strategies, team-based analysis, and computer-assisted qualitative data analysis software (CAQDAS). The debate is framed by four related trends to which we attribute the rise of big qual: the rise of big quantitative data, the growing legitimacy of qualitative and mixed methods work in the research community, technological advances in CAQDAS, and the willingness of government and private foundations to fund large qualitative projects.
Cooky, C., Linabary, J. R., & Corple, D. J. (2018). Navigating Big Data dilemmas: Feminist holistic reflexivity in social media research. Big Data & Society. https://doi.org/10.1177/2053951718807731
Abstract. Social media offers an attractive site for Big Data research. Access to big social media data, however, is controlled by companies that privilege corporate, governmental, and private research firms. Additionally, Institutional Review Boards’ regulative practices and slow adaptation to emerging ethical dilemmas in online contexts creates challenges for Big Data researchers. We examine these challenges in the context of a feminist qualitative Big Data analysis of the hashtag event #WhyIStayed. We argue power, context, and subjugated knowledges must each be central considerations in conducting Big Data social media research. In doing so, this paper offers a feminist practice of holistic reflexivity in order to help social media researchers navigate and negotiate this terrain.
Mancosu, M., & Vegetti, F. (2020). What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data. Social Media + Society. https://doi.org/10.1177/2056305120940703
Abstract. In reaction to the Cambridge Analytica scandal, Facebook has restricted the access to its Application Programming Interface (API). This new policy has damaged the possibility for independent researchers to study relevant topics in political and social behavior. Yet, much of the public information that the researchers may be interested in is still available on Facebook, and can be still systematically collected through web scraping techniques. The goal of this article is twofold. First, we discuss some ethical and legal issues that researchers should consider as they plan their collection and possible publication of Facebook data. In particular, we discuss what kind of information can be ethically gathered about the users (public information), how published data should look like to comply with privacy regulations (like the GDPR), and what consequences violating Facebook’s terms of service may entail for the researcher. Second, we present a scraping routine for public Facebook posts, and discuss some technical adjustments that can be performed for the data to be ethically and legally acceptable. The code employs screen scraping to collect the list of reactions to a Facebook public post, and performs a one-way cryptographic hash function on the users’ identifiers to pseudonymize their personal information, while still keeping them traceable within the data. This article contributes to the debate around freedom of internet research and the ethical concerns that might arise by scraping data from the social web.
Markham, A. N., Tiidenberg, K., & Herman, A. (2018). Ethics as Methods: Doing Ethics in the Era of Big Data Research—Introduction. Social Media + Society. https://doi.org/10.1177/2056305118784502
Abstract. This is an introduction to the special issue of “Ethics as Methods: Doing Ethics in the Era of Big Data Research.” Building on a variety of theoretical paradigms (i.e., critical theory, [new] materialism, feminist ethics, theory of cultural techniques) and frameworks (i.e., contextual integrity, deflationary perspective, ethics of care), the Special Issue contributes specific cases and fine-grained conceptual distinctions to ongoing discussions about the ethics in data-driven research. In the second decade of the 21st century, a grand narrative is emerging that posits knowledge derived from data analytics as true, because of the objective qualities of data, their means of collection and analysis, and the sheer size of the data set. The by-product of this grand narrative is that the qualitative aspects of behavior and experience that form the data are diminished, and the human is removed from the process of analysis. This situates data science as a process of analysis performed by the tool, which obscures human decisions in the process. The scholars involved in this Special Issue problematize the assumptions and trends in big data research and point out the crisis in accountability that emerges from using such data to make societal interventions. Our collaborators offer a range of answers to the question of how to configure ethics through a methodological framework in the context of the prevalence of big data, neural networks, and automated, algorithmic governance of much of human socia(bi)lity.
Metcalf, J., & Crawford, K. (2016). Where are human subjects in Big Data research? The emerging ethics divide. Big Data & Society. https://doi.org/10.1177/2053951716650211
Abstract. There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. Such discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule—the primary regulation governing human-subjects research in the USA—is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, yet problematically largely exclude data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for Big Data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about application of human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce.
Clark-Parsons, R., & Lingel, J. (2020). Margins as Methods, Margins as Ethics: A Feminist Framework for Studying Online Alterity. Social Media + Society. https://doi.org/10.1177/2056305120913994
Abstract. Prevailing theories of marginalized media position the work of resistance as beneath or less than the institutions against which resistance works, raising a number of methodological and ethical challenges for research on online alterity. We offer a margins-as-methods approach for studies of social media on the margins, directing critical attention to the theoretical, ethical, and political implications of positioning subsets of social media users as peripheral to an imagined center. Drawing on theories of feminist reflexivity and our own fieldwork experiences, we articulate the margins-as-methods approach through two sets of practices: deconstructing the power politics behind theories of alterity and identifying how these power politics shape every stage of the research process. We conclude by offering guiding questions for researchers to reflect on as they evaluate the methodological and ethical challenges specific to their projects. The margins-as-methods approach and the reflexive questions it raises build accountability for how our research process may reinscribe the very power relationships that we, alongside our interlocutors, work to contest.
Taylor, L. (2017). What is data justice? The case for connecting digital rights and freedoms globally. Big Data & Society. https://doi.org/10.1177/2053951717736335
Abstract. The increasing availability of digital data reflecting economic and human development, and in particular the availability of data emitted as a by-product of people’s use of technological devices and services, has both political and practical implications for the way people are seen and treated by the state and by the private sector. Yet the data revolution is so far primarily a technical one: the power of data to sort, categorise and intervene has not yet been explicitly connected to a social justice agenda by the agencies and authorities involved. Meanwhile, although data-driven discrimination is advancing at a similar pace to data processing technologies, awareness and mechanisms for combating it are not. This paper posits that just as an idea of justice is needed in order to establish the rule of law, an idea of data justice – fairness in the way people are made visible, represented and treated as a result of their production of digital data – is necessary to determine ethical paths through a datafying world. Bringing together the emerging scholarly perspectives on this topic, I propose three pillars as the basis of a notion of international data justice: (in)visibility, (dis)engagement with technology and antidiscrimination. These pillars integrate positive with negative rights and freedoms, and by doing so challenge both the basis of current data protection regulations and the growing assumption that being visible through the data we emit is part of the contemporary social contract.
Zimmer, M. (2018). Addressing Conceptual Gaps in Big Data Research Ethics: An Application of Contextual Integrity. Social Media + Society. https://doi.org/10.1177/2056305118768300
Abstract. The rise of big data has provided new avenues for researchers to explore, observe, and measure human opinions, activities, and interactions. While scholars, professional societies, and ethical review boards have long-established research ethics frameworks to ensure the rights and welfare of the research subjects are protected, the rapid rise of big data-based research generates new challenges to long-held ethical assumptions and guidelines. This article discloses emerging conceptual gaps in relation to how researchers and ethical review boards think about privacy, anonymity, consent, and harm in the context of big data research. It closes by invoking Nissenbaum’s theory of “privacy as contextual integrity” as a useful heuristic to guide ethical decision-making in big data research projects.