Ethical Dilemmas for Data Collection on Social Media

by Janet Salmons, Research Community Manager for SAGE Methodspace
Dr. Salmons is the author of Doing Qualitative Research Online, which focuses on ethical research and writing, and What Kind of Researcher Are You? which focuses on researcher integrity. Use the code COMMUNIT24 for 25% off through December 31, 2024.


How can researchers use ethical practices to collect data online?

The web is full of casual social media posts, digital footprints, interactions in online discussions, and formally archived documents. (See the post Finding Data in Documents and Datasets.) This wealth of material is irresistible to social researchers who are trying to understand contemporary experiences, perspectives, and events. Yet the ethical collection and use of such material is anything but straightforward.

This set of open-access articles offers recent scholarly thinking about ethics and online data collection.


Chua, S. M. (2022). Navigating conflict between research ethics and online platform terms and conditions: a reflective account. Research Ethics, 18(1), 39–50. https://doi.org/10.1177/17470161211045526
Abstract. Internet users’ comments in online spaces have attracted researchers’ attention in recent years. Although this data is typically publicly available, its use requires careful consideration so as to not cause harm to the users, while complying with the terms and conditions (Ts & Cs) of the online spaces. However, the Ts & Cs and researchers’ ethical considerations may sometimes be in conflict. I faced such a conflict when I conducted discourse analysis of online discussions that were sourced from a public online learning platform owned by a private company. In this article, I reflect on how I navigated the Ts & Cs and copyright law, taking users’ likely expectations into consideration when deciding whether to seek informed consent and anonymize content. I employed an ‘attribution with anonymization’ method to acknowledge users for their comments while safeguarding their confidentiality. Given the variety of online spaces and research methods, ethical decision-making must be a contextualized process that requires researchers to consider the nature of the online platform and the potential experience of the users, rather than simply following guidelines or Ts & Cs.

Felderer, B., & Blom, A. G. (2022). Acceptance of the Automated Online Collection of Geographical Information. Sociological Methods & Research, 51(2), 866–886. https://doi.org/10.1177/0049124119882480
Abstract. The ease at which online paradata can be captured in web surveys seems to increase social researchers’ desire to collect such data. Yet little attention is paid to whether respondents actually approve of their collection. This article, therefore, studies online survey respondents’ acceptance of automatically collecting their geographical locations. In wave 4 of the German Internet Panel, we asked respondents for their consent to automatically track their location using a JavaScript. Respondents were also asked to report their location in a set of traditional survey questions. About 62 percent of respondents consented to the automated collection of their location whereas 97 percent provided their location manually. With respect to consent biases, we find evidence that the composition of the achieved sample of geo-located respondents is biased and that the personal characteristics associated with respondents’ willingness to be geo-located differ between the automated tracking and manual provision of geo-information.

Friedman, M. S., Chiu, C. J., Croft, C., Guadamuz, T. E., Stall, R., & Marshal, M. P. (2016). Ethics of Online Assent: Comparing Strategies to Ensure Informed Assent Among Youth. Journal of Empirical Research on Human Research Ethics, 11(1), 15–20. https://doi.org/10.1177/1556264615624809 (Find an open-access version here.)
Abstract. Individuals, including youth, often participate in online research without understanding the characteristics of studies they have agreed to be part of. We assessed the impact of including questions as part of the assent process by randomizing 568 youth to one of three groups: (a) asking youth to only read study information and then indicate their willingness to participate, (b) requiring youth to answer two questions about the study’s risks and voluntary nature as part of the assent process, and (c) requiring youth to answer seven questions. Participants in the two- and seven-question groups, compared with the no-question group, were less likely to complete the assent process but, among those who did complete it, were more likely to read and understand study information.

Gliniecka, M. (2023). The Ethics of Publicly Available Data Research: A Situated Ethics Framework for Reddit. Social Media + Society, 9(3). https://doi.org/10.1177/20563051231192021

Abstract. Using user-generated content from open-access platforms such as Reddit for research raises ethical questions and challenges. Research projects involving publicly available data can qualify for an exemption from human research ethics review. However, when the exemption is granted, some scholars move to the data collection phase without attending further to ethical considerations. This does not always result from negligence but can be driven by the lack of coherent guidelines or limitations of procedural ethics. Despite receiving an exemption from ethics review, researchers can still engage with ethical concerns throughout the project. This article argues that a “situated ethics approach” to researching publicly available online data, which pays attention to flexibility, reflexivity, and complexity of research ethics, should be applied to projects working with data from user-led platforms—Reddit or others. Using a reflexive process and drawing iteratively on learnings, this article describes and analyses a situated ethics framework applied to a case study of doctoral research about youth health discussions on Reddit. Through a focus on three key areas: digital context, users’ views, and project specificity, the framework inspired a set of ethical questions that can assist with applying situated ethics to other studies. This paper advocates that a “situated ethics approach” to researching publicly available online data can usefully advance debates and practice in research on user-led platforms with public data, such as Reddit.

Klassen, S., & Fiesler, C. (2022). “This Isn’t Your Data, Friend”: Black Twitter as a Case Study on Research Ethics for Public Data. Social Media + Society, 8(4). https://doi.org/10.1177/20563051221144317

Abstract. While research has been conducted with and in marginalized or vulnerable groups, explicit guidelines and best practices centering on specific communities are nascent. An excellent case study to engage within this aspect of research is Black Twitter. This research project considers the history of research with Black communities, combined with empirical work that explores how people who engage with Black Twitter think about research and researchers in order to suggest potential good practices and what researchers should know when studying Black Twitter or other digital traces from marginalized or vulnerable online communities. From our interviews, we gleaned that Black Twitter users feel differently about their content contributing to a research study depending on, for example, the type of content and the positionality of the researcher. Much of the advice participants shared for researchers involved an encouragement to cultivate cultural competency, get to know the community before researching it, and conduct research transparently. Aiming to improve the experience of research for both Black Twitter and researchers, this project is a stepping stone toward future work that further establishes and expands user perceptions of research ethics for online communities composed of vulnerable populations.

Mackenzie, E., Berger, N., Holmes, K., & Walker, M. (2021). Online educational research with middle adolescent populations: Ethical considerations and recommendations. Research Ethics, 17(2), 217–227. https://doi.org/10.1177/1747016120963160

Abstract. Adolescent populations have become increasingly accessible through online data collection methods. Online surveys are advantageous in recruiting adolescent participants and can be designed for adolescents to provide informed consent without the requirement of parental consent. This study sampled 338 Australian adolescents to participate in a low risk online survey on adolescents’ experiences and perceptions of their learning in science classes, without parental consent. Adolescents were recruited through Facebook and Instagram advertising. In order to judge potential participants’ capacity to consent, two multiple-choice questions about the consent process were required to be answered correctly prior to accessing the survey. This simple strategy effectively determined whether middle adolescents had the capacity to provide informed consent to participate in low risk online educational research.
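
For readers who build their own online surveys, the screening step described in this abstract (and the question-based assent process in Friedman et al. above) can be sketched in a few lines. This is only an illustration: the question wording, the class and function names, and the pass rule (every question answered correctly) are assumptions made for this example, not the instrument the authors used.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ConsentQuestion:
    """One multiple-choice comprehension question about the consent information."""
    prompt: str
    options: Tuple[str, ...]
    correct_index: int


# Hypothetical questions, written for this sketch only.
CONSENT_QUESTIONS = (
    ConsentQuestion(
        prompt="Do you have to take part in this survey?",
        options=("Yes, it is required", "No, taking part is my choice"),
        correct_index=1,
    ),
    ConsentQuestion(
        prompt="Can you stop the survey at any time?",
        options=("Yes", "No"),
        correct_index=0,
    ),
)


def passes_consent_check(
    questions: Sequence[ConsentQuestion], answers: Sequence[int]
) -> bool:
    """Return True only if every question has an answer and all are correct."""
    if len(answers) != len(questions):
        return False
    return all(a == q.correct_index for q, a in zip(questions, answers))
```

In a design like this, the survey is shown only when `passes_consent_check` returns True. What to do otherwise (for example, showing the study information again) is a decision for the researcher and the ethics review process.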

Mahoney, J., Le Louvier, K., Lawson, S., Bertel, D., & Ambrosetti, E. (2022). Ethical considerations in social media analytics in the context of migration: lessons learned from a Horizon 2020 project. Research Ethics, 18(3), 226–240. https://doi.org/10.1177/17470161221087542
Abstract. The ubiquitous use of social platforms across the globe makes them attractive options for investigating social phenomena including migration. However, the use of social media data raises several crucial ethical issues around the areas of informed consent, anonymity and profiling of individuals, which are particularly sensitive when looking at a population such as migrants, which is often considered as ‘vulnerable’. In this paper, we discuss how the opportunities and challenges related to social media research in the context of migration impact on the development of large-scale scientific projects. Building on the EU-funded research project PERCEPTIONS, we explore the concrete challenges experienced in such projects regarding profiling, informed consent, bias, data sharing and ethical approval procedures, as well as the strategies used to mitigate them. We draw from lessons learned in this project to discuss implications and recommendations to researchers, funders and university ethics review panels. This paper contributes to the growing discussion on the ethical challenges associated with big social data research projects on migration by highlighting concrete aspects stakeholders should be looking for and questioning when involved in such large-scale scientific projects where collaboration, data sharing and transformation and practicalities are of importance.

Mancosu, M., & Vegetti, F. (2020). What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data. Social Media + Society. https://doi.org/10.1177/2056305120940703
Abstract. In reaction to the Cambridge Analytica scandal, Facebook has restricted the access to its Application Programming Interface (API). This new policy has damaged the possibility for independent researchers to study relevant topics in political and social behavior. Yet, much of the public information that the researchers may be interested in is still available on Facebook, and can be still systematically collected through web scraping techniques. The goal of this article is twofold. First, we discuss some ethical and legal issues that researchers should consider as they plan their collection and possible publication of Facebook data. In particular, we discuss what kind of information can be ethically gathered about the users (public information), how published data should look like to comply with privacy regulations (like the GDPR), and what consequences violating Facebook’s terms of service may entail for the researcher. Second, we present a scraping routine for public Facebook posts, and discuss some technical adjustments that can be performed for the data to be ethically and legally acceptable. The code employs screen scraping to collect the list of reactions to a Facebook public post, and performs a one-way cryptographic hash function on the users’ identifiers to pseudonymize their personal information, while still keeping them traceable within the data. This article contributes to the debate around freedom of internet research and the ethical concerns that might arise by scraping data from the social web.
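
To make the pseudonymization idea in this abstract concrete, here is a minimal sketch of replacing user identifiers with a one-way hash so that the same user stays traceable within a dataset. It is not the authors' code. The function names, the record layout, and the choice of a keyed hash (HMAC-SHA256 from Python's standard library) rather than a plain hash are assumptions made for this illustration.

```python
import hashlib
import hmac
from typing import Dict, List


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Map a user identifier to a stable pseudonym using a keyed one-way hash.

    The same identifier and key always give the same pseudonym, so records
    from one user can still be linked within the dataset.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymize_reactions(
    reactions: List[Dict[str, str]], secret_key: bytes
) -> List[Dict[str, str]]:
    """Return copies of the records with the 'user' field replaced.

    The 'user' and 'reaction' field names are hypothetical.
    """
    return [
        {"user": pseudonymize(r["user"], secret_key), "reaction": r["reaction"]}
        for r in reactions
    ]
```

A keyed hash is used here because identifiers drawn from a small or guessable set can otherwise be matched by hashing candidate values. The key has to be stored apart from the published data, or discarded once linking is no longer needed. Hashing identifiers alone does not make a dataset anonymous, since the content of a post can still point to its author.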

Özkula, S. M. (2020). The Issue of “Context”: Data, Culture, and Commercial Context in Social Media Ethics. Journal of Empirical Research on Human Research Ethics, 15(1-2), 77–86. https://doi.org/10.1177/1556264619874646

Abstract. One of the central concerns in research ethics in recent years has been the vast amount of data available from social media platforms and the related concerns around what establishes an ethical use of data. Toward addressing these challenges, researchers have therefore called for the consideration of “context” in Internet research. However, context remains a fuzzy concept and little guidance exists on its different dimensions. In response to this issue, this article uses worked examples from three data sets to discuss three different dimensions of “context”: data context, cultural context, and commercial context. The article problematizes these dimensions and offers suggestions toward creating ethical sensibility to these by drawing on two data sets from 2017: (a) climate change imagery scraped from five social platforms and (b) digital-ethnographic work at the climate summit COP23.

Perrault, E. K., & Keating, D. M. (2018). Seeking Ways to Inform the Uninformed: Improving the Informed Consent Process in Online Social Science Research. Journal of Empirical Research on Human Research Ethics, 13(1), 50–60. https://doi.org/10.1177/1556264617738846 (Find an open-access version here.)
Abstract. Participants often do not read consent forms in social science research. This is not surprising, especially for online studies, given they do not typically offer greater risk than what is encountered in daily life. However, if no one is reading, are participants really informed? This study used previous research to craft experimentally manipulated consent forms utilizing different visual presentations (e.g., greater use of line spacing, bullets, bolding, diagrams). Participants (n = 547) were randomly exposed to one of seven form variations. Results found no significant differences between forms in reading or comprehension. Open-ended questions asked participants why they do not read consent forms and what would influence them to read the forms. Participants most frequently stated forms need to be shorter, and important information needs to be highlighted. We suggest improvements to informed consent forms, including removing much of the information that is constant across forms, and only including unique aspects of studies.

Ravn, S., Barnwell, A., & Barbosa Neves, B. (2020). What Is “Publicly Available Data”? Exploring Blurred Public–Private Boundaries and Ethical Practices Through a Case Study on Instagram. Journal of Empirical Research on Human Research Ethics, 15(1-2), 40–45. https://doi.org/10.1177/1556264619850736

Abstract. This article adds to the literature on ethics in digital research by problematizing simple understandings of what constitutes “publicly available data,” thereby complicating common “consent waiver” approaches. Based on our recent study of representations of family life on Instagram, a platform with a distinct visual premise, we discuss the ethical challenges we encountered and our practices for moving forward. We ground this in Lauren Berlant’s concept of “intimate publics” to conceptualize the different understandings of “publics” that appear to be at play. We make the case for a more reflexive approach to social media research ethics that builds on the socio-techno-ethical affordances of the platform to address difficult questions about how to determine social media users’ diverse, and sometimes contradictory, understandings of what is “public.”

Stommel, W., & Rijk, L. de. (2021). Ethical approval: none sought. How discourse analysts report ethical issues around publicly available online data. Research Ethics, 17(3), 275–297. https://doi.org/10.1177/1747016120988767

Abstract. Although ethical guidelines for doing Internet research are available, most prominently those of the Association of Internet Researchers (www.aoir.org), ethical decision-making for research on publicly available, naturally-occurring data remains a major challenge. As researchers might also turn to others to inform their decisions, this article reviews recent research papers on publicly available, online data. Research involving forums such as Facebook pages, Twitter, YouTube, news comments, blogs, etc. is examined to see how authors report ethical considerations and how they quote these data. We included 132 articles published in discourse analysis-oriented journals between January 2017 and February 2020. Roughly one third of the articles (85 out of 132) did not discuss ethical issues, mostly claiming the data were publicly available. Quotations nevertheless tended to be anonymized, although retrievability of posts was generally not taken into account. In those articles in which ethical concerns were reported, related decisions appeared to vary substantially. In most cases it was argued that informed consent was not required. Similarly, approval from research ethics committees was mostly regarded unnecessary. Other ethical issues like consideration of users’ expectations and intentions, freedom of choice, possible harm, sensitive topics, and vulnerable groups were rarely discussed in the articles. We argue for increased attention to ethical issues and legal aspects in discourse analytic articles involving online data beyond mentioning general concerns. Instead, we argue for more involvement of users/participants in ethical decision-making, for consideration of retrievability of posts and for a role for journal editors.

Note: In a previous post about Research Ethics & Extant Data, I interviewed Dr. Wyke Stommel about this article "Ethical approval: none sought. How discourse analysts report ethical issues around publicly available online data." I also spoke with her about a book she co-authored about Analysing Digital Interaction.

