Studying Sensitive Problems Online

In March 2022 we explored research approaches for studying sensitive topics, potentially with vulnerable participants. Such studies face additional challenges when the research is conducted online. This collection of open-access articles includes studies using Big Data, social media posts, focus groups, and other online methods.


Birnholtz, J., Kraus, A., Zheng, W., Moskowitz, D. A., Macapagal, K., & Gergle, D. (2020). Sensitive Sharing on Social Media: Exploring Willingness to Disclose PrEP Usage Among Adolescent Males Who Have Sex With Males. Social Media + Society. https://doi.org/10.1177/2056305120955176

Abstract. Self-presentation, the process by which people disclose information about themselves to others, is fundamental to online interaction and research on communication technology. Technology often mediates the self-presentation process by obscuring who is in the audience via constrained cues and opaque feed algorithms that govern the visibility of social media content. This can make it risky to disclose sensitive or potentially stigmatizing information about oneself, because it could fall into the wrong hands or be seen by an unsupportive audience. Still, there are times when it is socially beneficial to disclose sensitive information, such as LGBTQ+ (lesbian, gay, bisexual, transgender, queer, and others) people expressing their identities or disclosing HIV status. Decisions about sensitive disclosure, moreover, can be even more complicated in today’s social media landscape with many platforms and audiences in play, particularly for younger users who often use many platforms. We lack a good understanding, however, of how people make these decisions. This article addresses questions about sensitive disclosure on social media through a survey study of adolescent men who have sex with men and their willingness to disclose on social media the use of pre-exposure prophylaxis (PrEP), an HIV prevention medication. Results suggest that perceived platform audience composition and platform features such as ephemerality play into disclosure decisions, as well as the perceived normativity of PrEP use among peers.

Edenroth-Cato, F., & Sjöblom, B. (2022). Biosociality in Online Interactions: Youths’ Positioning of the Highly Sensitive Person Category. YOUNG, 30(1), 80–96. https://doi.org/10.1177/11033088211015815

Abstract. This article examines how young people in a Swedish online forum and in blogs engage in discussions of one popularized psychological personality trait, the highly sensitive person (HSP), and how they draw on different positionings in discursive struggles around this category. The material is analysed with concepts from discursive psychology and post-structuralist theory in order to investigate youths’ interactions. Three positionings are identified. The first is a nuanced positioning, from which youths disclose the weaknesses and strengths of being highly sensitive. Some youths become deeply invested in this kind of positioning, hence forming an HSP subjectivity. This can be opposed through contrasting positionings, which object to norms of biosociality connected to the HSP. Lastly, there are rather distanced and investigative approaches to the HSP category. We conclude that while young people are negotiating the HSP category, they are establishing an epistemological community.

Gilbert, S., Vitak, J., & Shilton, K. (2021). Measuring Americans’ Comfort With Research Uses of Their Social Media Data. Social Media + Society. https://doi.org/10.1177/20563051211033824

Abstract. Research using online datasets from social media platforms continues to grow in prominence, but recent research suggests that platform users are sometimes uncomfortable with the ways their posts and content are used in research studies. While previous research has suggested that a variety of contextual variables may influence this discomfort, such factors have yet to be isolated and compared. In this article, we present results from a factorial vignette survey of American Facebook users. Findings reveal that researcher domain, content type, purpose of data use, and awareness of data collection all impact respondents’ comfort—measured via judgments of acceptability and concern—with diverse data uses. We provide guidance to researchers and ethics review boards about the ways that user reactions to research uses of their data can serve as a cue for identifying sensitive data types and uses.

Greyson, D., Chabot, C., Mniszak, C., & Shoveller, J. A. (2021). Social media and online safety practices of young parents. Journal of Information Science. https://doi.org/10.1177/01655515211053808

Abstract. Studies of parents’ online safety concerns typically centre on information privacy and on worries over unknown third parties preying on children, whereas investigations into youth perspectives on online safety have found young people to focus on threats to safety or reputation by known individuals. The case of youth who are themselves parents raises questions regarding how these differing perspectives are negotiated by individuals who are in dual roles as youth and parents. Using interview and ethnographic observation data from the longitudinal Young Parent Study in British Columbia, Canada, this analysis investigates social media and online safety practices of 113 young parents. Online safety concerns of young parents in this study focused on personal safety, their children’s online privacy and image management. These concerns reflect their dual roles, integrating youth image and information management concerns with parental concerns over the safety and information privacy of their own children.

Haynes, D., & Robinson, L. (2021). Delphi study of risk to individuals who disclose personal information online. Journal of Information Science. https://doi.org/10.1177/0165551521992756

Abstract. A two-round Delphi study was conducted to explore priorities for addressing online risk to individuals. A corpus of literature was created based on 69 peer-reviewed articles about privacy risk and the privacy calculus published between 2014 and 2019. A cluster analysis of the resulting text-base using Pearson’s correlation coefficient resulted in seven broad topics. After two rounds of the Delphi survey with experts in information security and information literacy, the following topics were identified as priorities for further investigation: personalisation versus privacy, responsibility for privacy on social networks, measuring privacy risk, and perceptions of powerlessness and the resulting apathy. The Delphi approach provided clear conclusions about research topics and has potential as a tool for prioritising future research areas.
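The clustering step described in this abstract — grouping texts by Pearson correlation among their term counts — can be sketched as follows. This is a minimal illustration of the general technique, not the authors' actual pipeline; the documents, threshold, and single-linkage grouping rule are all invented for the example.

```python
# Sketch: group a tiny corpus by Pearson correlation of term-count vectors.
# Illustrative only -- documents and threshold are invented, not from the study.
from collections import Counter
from math import sqrt

docs = [
    "privacy risk on social networks",
    "privacy calculus and personalisation",
    "measuring privacy risk online",
    "powerlessness apathy and privacy",
]

# Build a shared vocabulary and a term-count vector per document.
vocab = sorted({w for d in docs for w in d.split()})
vectors = [[Counter(d.split())[w] for w in vocab] for d in docs]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Simple single-linkage grouping: a document joins a cluster if it
# correlates above the threshold with any existing member.
threshold = 0.3
clusters = []
for i, v in enumerate(vectors):
    for c in clusters:
        if any(pearson(v, vectors[j]) > threshold for j in c):
            c.append(i)
            break
    else:
        clusters.append([i])

print(clusters)
```

In practice one would use a proper hierarchical clustering routine over the full correlation matrix, but the sketch shows the core idea of treating correlated term profiles as evidence of a shared topic.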

Skelton, K., Evans, R., LaChenaye, J., Amsbary, J., Wingate, M., & Talbott, L. (2018). Utilization of online focus groups to include mothers: A use-case design, reflection, and recommendations. DIGITAL HEALTH. https://doi.org/10.1177/2055207618777675

Abstract. Advances in technology over the past decade have allowed unique methodologies to emerge, enabling the engagement of hard-to-reach populations on sensitive topics in ways previously thought impossible with traditional face-to-face modalities. This study aimed to use online focus group discussions (FGDs) to explore breastfeeding mothers’ use of social media. Results indicate participants had a positive experience with online FGDs, and almost all preferred this method to traditional face-to-face focus groups. We discuss reflections on the online FGD experience, including best practices and recommendations for innovative ways to include time-constrained or hard-to-reach participants and to yield rich qualitative data.

Veale, M., & Binns, R. (2017). Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data. Big Data & Society. https://doi.org/10.1177/2053951717743530

Abstract. Decisions based on algorithmic, machine learning models can be unfair, reproducing biases in historical data used to train them. While computational techniques are emerging to address aspects of these concerns through communities such as discrimination-aware data mining (DADM) and fairness, accountability and transparency machine learning (FATML), their practical implementation faces real-world challenges. For legal, institutional or commercial reasons, organisations might not hold the data on sensitive attributes such as gender, ethnicity, sexuality or disability needed to diagnose and mitigate emergent indirect discrimination-by-proxy, such as redlining. Such organisations might also lack the knowledge and capacity to identify and manage fairness issues that are emergent properties of complex sociotechnical systems. This paper presents and discusses three potential approaches to deal with such knowledge and information deficits in the context of fairer machine learning. Trusted third parties could selectively store data necessary for performing discrimination discovery and incorporating fairness constraints into model-building in a privacy-preserving manner. Collaborative online platforms would allow diverse organisations to record, share and access contextual and experiential knowledge to promote fairness in machine learning systems. Finally, unsupervised learning and pedagogically interpretable algorithms might allow fairness hypotheses to be built for further selective testing and exploration. Real-world fairness challenges in machine learning are not abstract, constrained optimisation problems, but are institutionally and contextually grounded. Computational fairness tools are useful, but must be researched and developed in and with the messy contexts that will shape their deployment, rather than just for imagined situations. Not doing so risks real, near-term algorithmic harm.

Zhou, N., Wang, L., Marino, S., Zhao, Y., & Dinov, I. D. (2022). DataSifter II: Partially synthetic data sharing of sensitive information containing time-varying correlated observations. Journal of Algorithms & Computational Technology. https://doi.org/10.1177/17483026211065379

Abstract. There is a significant public demand for rapid data-driven scientific investigations using aggregated sensitive information. However, many technical challenges and regulatory policies hinder efficient data sharing. In this study, we describe a partially synthetic data generation technique for creating anonymized data archives whose joint distributions closely resemble those of the original (sensitive) data. Specifically, we introduce the DataSifter technique for time-varying correlated data (DataSifter II), which relies on an iterative model-based imputation using generalized linear mixed models and random effects-expectation maximization trees. DataSifter II can be used to generate synthetic repeated measures data for testing and validating new analytical techniques. Compared to the multiple imputation method, DataSifter II application on simulated and real clinical data demonstrates that the new method provides extensive reduction of re-identification risk (data privacy) while preserving the analytical value (data utility) of the obfuscated data. The performance of DataSifter II on a simulation involving 20% artificially induced missingness shows at least an 80% reduction of the disclosure risk, compared to the multiple imputation method, without a substantial impact on the data’s analytical value. In a separate validation on clinical data (Medical Information Mart for Intensive Care III), model-based statistical inference drawn from the original data agrees with analogous analytical inference obtained using the DataSifter II obfuscated (sifted) data. For large time-varying datasets containing sensitive information, the proposed technique provides an automated tool for alleviating the barriers of data sharing and facilitating effective, advanced, and collaborative analytics.
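The core idea behind partially synthetic data — replacing a fraction of values with model-based imputations so that individual records are obfuscated while joint structure is roughly preserved — can be sketched in a few lines. To be clear, this is not the DataSifter II algorithm (which uses generalized linear mixed models and RE-EM trees on repeated measures); it is a simplified one-table analogue using ordinary least-squares imputation, with an invented toy dataset and a hypothetical `sift` function.

```python
# Sketch of partially synthetic data generation via model-based imputation.
# NOT the DataSifter II algorithm -- a simplified illustration of the idea.
import numpy as np

rng = np.random.default_rng(0)

# Toy "sensitive" dataset: 200 records, 3 correlated numeric features.
n = 200
x1 = rng.normal(50, 10, n)
x2 = 0.8 * x1 + rng.normal(0, 5, n)
x3 = 0.5 * x1 - 0.3 * x2 + rng.normal(0, 3, n)
data = np.column_stack([x1, x2, x3])

def sift(data, frac=0.3, rng=rng):
    """Replace a random fraction of each column with values predicted
    from the other columns (plus residual-scale noise), obfuscating
    individual records while roughly preserving joint structure."""
    out = data.copy()
    n, p = data.shape
    for j in range(p):
        # Regress column j on the remaining columns (with intercept).
        others = np.delete(data, j, axis=1)
        X = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(X, data[:, j], rcond=None)
        resid_sd = np.std(data[:, j] - X @ beta)
        # Swap a random subset of entries for noisy model predictions.
        mask = rng.random(n) < frac
        out[mask, j] = X[mask] @ beta + rng.normal(0, resid_sd, mask.sum())
    return out

sifted = sift(data)
# Aggregate statistics stay close while individual rows change.
print(np.abs(data.mean(axis=0) - sifted.mean(axis=0)))
```

The privacy/utility trade-off lives in `frac`: sifting more entries lowers re-identification risk but degrades analytical value, which is the balance the abstract above quantifies against multiple imputation.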


More Methodspace Posts about Studying Sensitive Issues

Previous: Online Interviews about Sensitive Topics

Next: Compassion Fatigue: The Potential Impact of Sensitive Research on the Researcher