Collaboration: Human Skills in a Big Data World
The focus for October 2022 is on collaboration, including co-research, co-authorship, or co-editing.
Researchers rarely succeed alone. The ability to collaborate is essential. Nowhere is the ability to reach across disciplines and professional fields more important than in Big Data research.
SAGE Research Methods offers a succinct definition of Big Data: "Data sets so large or complex that traditional data processing tools are inadequate." To our point: our data processing skills might be inadequate too! These complex and, yes, big projects can stretch even the most resourceful researcher to the limits of their knowledge and expertise.
Collaborative Partnerships in Big Data Research
Big Data researchers have much to contribute to research teams, and at the same time, they need insights from others. Some social science researchers have the training and background needed to work with enormous datasets. Those who do not can benefit from collaborating with others who have the technical expertise.
Here is an example from Carl Miller, Director of Research at the Centre for the Analysis of Social Media (CASM). He described how working with technologists allowed non-technical researchers to make use of Big Data. This excerpt is from a UK National Centre for Research Methods newsletter article:
We established the Centre for the Analysis of Social Media that brought together social and policy researchers at Demos, and technologists from the University of Sussex with the explicit aim of confronting this challenge. The first layer of the challenge has been the technology itself. The tools of big data analysis needed to be put into the hands of non-technical researchers: the subject matter experts who have long understood social science, and now needed to be able to do it in a new way. We built a technology platform, Method52, which allowed non-technical users to use a graphical user interface, and drag-and-drop components to flexibly conduct big data analytics, rather than be faced with a screen full of code.
This example demonstrates how today's research calls on us to think of research as a team activity rather than a solo one. Find a recent report from CASM about technology and research here.
Here are four types of partnerships, with related open-access research examples. As you can see, such collaborations are often multidisciplinary.
Big Data Researchers + Data Scientists
Big Data researchers collaborate with programmers and other data scientists.
van der Vlist, F. N., & Helmond, A. (2021). How partners mediate platform power: Mapping business and data partnerships in the social media ecosystem. Big Data & Society, 8(1). https://doi.org/10.1177/20539517211025061
Abstract. Social media platforms’ digital advertising revenues depend considerably on partnerships. Business partnerships are endemic and essential to the business of platforms, yet their role remains relatively underexplored in the literature on platformisation and platform power. This article considers the significance of partnerships in the social media ecosystem to better understand how industry platforms, and the infrastructure they build, mediate and shape platform power and governance. We argue that partners contribute to ‘platformisation’ through their collective development of business-to-business platform infrastructures. Specifically, we examine how partners have integrated social media platforms with what we call the audience economy – an exceptionally complex global and interconnected marketplace of intermediaries involved in the creation, commodification, analysis, and circulation of data audiences for purposes including but not limited to digital advertising and marketing. We determined which relationships are involved, which are exclusive or shared, and identified key ecosystem partners. Further, we found that partners build and integrate extensive infrastructures for data-sourcing and media distribution, surfacing infrastructural and strategic sources and locations, or ‘nodes’, of power in this ecosystem. The empirical findings thus highlight the significance of partnerships and partner integrations and draw attention to the powerful industry players and intermediaries that remain largely invisible.
Zhao, W., Yin, Z., Fan, T., & Luo, J. (2019). Research on influence spread of scientific research team based on scientific factor quantification of big data. International Journal of Distributed Sensor Networks, 15(4). https://doi.org/10.1177/1550147719842158
Abstract. With the development of science and technology, the interactions among scientific research teams become more and more frequent, and their relationship and behavior become more and more complex. Many researches mainly adopt complex network to analyze, but these researches only consider some aspects of scientific research factors, so lack of comprehensive consideration. From the aspect of ability, resource, activity, and familiarity, scientific research factors are quantified based on multi-source data of scientific and technological big data, and some factors of text information are similarly quantified. Based on paper citation and project cooperation, a complex network which takes scientific research team as node is constructed and is weighted by quantification of scientific research factor. The experiment of influence spread is carried out by the comparison of unweighted network and weighted network, the comparison of single node and multiple nodes, and the comparison of influence spread and other index. The results show that the scientific research factor is closely related to the influence spread; the proposed scientific research factor quantification improves the analysis of scientific research team relationship. The relationship between influence spread and the number of related communities is greater than the number of adjacent nodes. In addition, the influence spread can effectively reflect the importance of scientific research team.
Big Data Researchers + Qualitative Researchers
Researchers sometimes find that they want to dig into the real-life stories that explain the trends or patterns discovered in Big Data studies. Here are four articles that discuss ways that researchers work together:
Bjerre-Nielsen, A., & Glavind, K. L. (2022). Ethnographic data in the age of big data: How to compare and combine. Big Data & Society, 9(1). https://doi.org/10.1177/20539517211069893
Abstract. Big data enables researchers to closely follow the behavior of large groups of individuals by using high-frequency digital traces. However, these digital traces often lack context, and it is not always clear what is measured. In contrast, data from ethnographic fieldwork follows a limited number of individuals but can provide the context often lacking from big data. Yet, there is an under-explored potential in combining ethnographic data with big data and other digital data sources. This paper presents ways that quantitative research designs can combine big data and ethnographic data and account for the synergies that such combinations can provide. We highlight the differences and similarities between ethnographic data and big data, focusing on the three dimensions: individuals, depth of information, and time. We outline how ethnographic data can validate big data by providing a “ground truth” and complement it by giving a “thick description.” Further, we lay out ways that analysis carried out using big data could benefit from collaboration with ethnographers, and we discuss the potential within the fields of machine learning and causal inference.
Bornakke, T., & Due, B. L. (2018). Big–Thick Blending: A method for mixing analytical insights from big and thick data sources. Big Data & Society, 5(1). https://doi.org/10.1177/2053951718765026
Abstract. Recent works have suggested an analytical complementarity in mixing big and thick data sources. These works have, however, remained as programmatic suggestions, leaving us with limited methodological inputs on how to archive such complementary integration. This article responds to this limitation by proposing a method for ‘blending’ big and thick analytical insights. The paper first develops a methodological framework based on the cognitivist linguistics terminology of ‘blending’. Two cases are then explored in which blended spaces are crafted from engaging big and thick analytical insights with each other. Through these examples, we learn how blending processes should be conducted as a rapid, iterative and collaborative effort with respect for individual expertise. Further, we demonstrate how the unique, but often overlooked, granularity of big data plays a key role in affording the blending with thick data. We conclude by suggesting four commonly appearing blending strategies that can be applied when relying upon big and thick data sources.
Choroszewicz, M. (2022). Emotional labour in the collaborative data practices of repurposing healthcare data and building data technologies. Big Data & Society, 9(1). https://doi.org/10.1177/20539517221098413
Abstract. This article focuses on emotions, conceptualised as emotional labour, evoked during data practices used to repurpose and enable healthcare data journeys for Finnish public healthcare. Combined approaches from critical data studies and the sociology of emotions were used to contribute to a better understanding of the mundane but often invisible work of the emotions of experts involved in data practices, such as facilitating data journeys and building data technologies. The article is based on a two-and-a-half-year ethnographic study conducted in a Finnish regional public healthcare and social service organisation. The study results were derived from the analysis of 39 interviews and fieldnotes produced by observing 170 h of various meetings, events and work activities performed by experts. The results were organised into three forms of observed experts’ emotional labour related to three phases of healthcare data journeys: (a) caring for data production and preparing data for travel, (b) managing excitement and frustration in data processing for continually building the data management system, and (c) reassuring users in making sense of obtained data analytics. The results contribute to a greater understanding of the emotions and emotional labour generated by healthcare data journeys and in relation to the volatile nature of healthcare data and the collaborative character of data practices. This work advocates for a better recognition of the emotional aspects of data practices and their implications on data-based knowledge and datafication processes in healthcare.
Ford, H. (2014). Big Data and Small: Collaborations between ethnographers and data scientists. Big Data & Society, 1(2). https://doi.org/10.1177/2053951714544337
Abstract. In the past three years, Heather Ford—an ethnographer and now a PhD student—has worked on ad hoc collaborative projects around Wikipedia sources with two data scientists from Minnesota, Dave Musicant and Shilad Sen. In this essay, she talks about how the three met, how they worked together, and what they gained from the experience. Three themes became apparent through their collaboration: that data scientists and ethnographers have much in common, that their skills are complementary, and that discovering the data together rather than compartmentalizing research activities was key to their success.
Big Data Researchers + Librarians
Libraries and librarians are taking on new roles as research partners with data scientists. As a result, cultural and historical documents and materials are being made available to researchers. These articles discuss the opportunities:
Ames, S., & Lewis, S. (2020). Disrupting the library: Digital scholarship and Big Data at the National Library of Scotland. Big Data & Society, 7(2). https://doi.org/10.1177/2053951720970576
Abstract. With a mass digitisation programme underway and the addition of non-print legal deposit and web archive collections, the National Library of Scotland is now both producing and collecting data at an unprecedented rate, with over 5PB of storage in the Library’s data centres. As well as the opportunities to support large scale analysis of the collections, this also presents new challenges around data management, storage, rights, formats, skills and access. Furthermore, by assuming the role of both creators and collectors, libraries face broader questions about the concepts of ‘collections' and ‘heritage', and the ethical implications of collecting practices. While the ‘collections as data’ movement has encouraged cultural heritage organisations to present collections in machine-readable formats, new services, processes and tools also need to be established to enable these emerging forms of research, and new modes of working need to be established to take into account an increasing need for transparency around the creation and presentation of digital collections. This commentary explores the National Library of Scotland's new digital scholarship service, the implications of this new activity and the obstacles that libraries encounter when navigating a world of Big Data.
Lynch, R., Young, J. C., Jowaisas, C., Rothschild, C., Garrido, M., Sam, J., & Boakye-Achampong, S. (2021). Data challenges for public libraries: African perspectives and the social context of knowledge. Information Development, 37(2), 292–306. https://doi.org/10.1177/0266666920907118
Abstract. This article sheds light on the collection and use of data by libraries in sixteen countries across Africa. It highlights the challenges that librarians and library organizations face in gathering, analyzing, and presenting data of various types for self-advocacy. In this study, qualitative data from a meeting of library representatives was analyzed to identify main challenges including: data integrity in terms of completeness, accuracy, credibility, and relevancy; infrastructure; capacity; local investment in libraries; time; and participation of data collectors and respondents. Implications for those collecting data on African libraries as well as those supporting the use of data in these contexts are discussed. The purpose of this paper is not to feed into representations of African libraries as chronically under-resourced and lacking in capacity, but rather, to constructively engage with first-hand accounts of how librarians are experiencing and navigating barriers in order to offer potential avenues forward for the field.
Terras, M., Coleman, S., Drost, S., Elsden, C., Helgason, I., Lechelt, S., Osborne, N., Panneels, I., Pegado, B., Schafer, B., Smyth, M., Thornton, P., & Speed, C. (2021). The value of mass-digitised cultural heritage content in creative contexts. Big Data & Society, 8(1). https://doi.org/10.1177/20539517211006165
Abstract. How can digitised assets of Galleries, Libraries, Archives and Museums be reused to unlock new value? What are the implications of viewing large-scale cultural heritage data as an economic resource, to build new products and services upon? Drawing upon valuation studies, we reflect on both the theory and practicalities of using mass-digitised heritage content as an economic driver, stressing the need to consider the complexity of commercial-based outcomes within the context of cultural and creative industries. However, we also problematise the act of considering such heritage content as a resource to be exploited for economic growth, in order to inform how we consider, develop, deliver and value mass-digitisation. Our research will be of interest to those wishing to understand a rapidly changing research and innovation landscape, those considering how to engage memory institutions in data-driven activities and those critically evaluating years of mass-digitisation across the heritage sector.
Big Data Researchers + Policy Makers
Policy-makers and decision-makers rely on data. Sometimes they use datasets or published reports, but important projects often involve deeper working relationships between researchers and government or non-governmental agencies, businesses, healthcare providers, and nonprofit or community-based organizations.
Espinoza, M. I., & Aronczyk, M. (2021). Big data for climate action or climate action for big data? Big Data & Society, 8(1). https://doi.org/10.1177/2053951720982032
Abstract. Under the banner of “data for good,” companies in the technology, finance, and retail sectors supply their proprietary datasets to development agencies, NGOs, and intergovernmental organizations to help solve an array of social problems. We focus on the activities and implications of the Data for Climate Action campaign, a set of public–private collaborations that wield user data to design innovative responses to the global climate crisis. Drawing on in-depth interviews, first-hand observations at “data for good” events, intergovernmental and international organizational reports, and media publicity, we evaluate the logic driving Data for Climate Action initiatives, examining the implications of applying commercial datasets and expertise to environmental problems. Despite the increasing adoption of Data for Climate Action paradigms in government and public sector efforts to address climate change, we argue Data for Climate Action is better seen as a strategy to legitimate extractive, profit-oriented data practices by companies than a means to achieve global goals for environmental sustainability.
Löfgren, K., & Webster, C. W. R. (2020). The value of Big Data in government: The case of ‘smart cities.’ Big Data & Society, 7(1). https://doi.org/10.1177/2053951720912775
Abstract. The emergence of Big Data has added a new aspect to conceptualizing the use of digital technologies in the delivery of public services and for realizing digital governance. This article explores, via the ‘value-chain’ approach, the evolution of digital governance research, and aligns it with current developments associated with data analytics, often referred to as ‘Big Data’. In many ways, the current discourse around Big Data reiterates and repeats established commentaries within the eGovernment research community. This body of knowledge provides an opportunity to reflect on the ‘promise’ of Big Data, both in relation to service delivery and policy formulation. This includes, issues associated with the quality and reliability of data, from mixing public and private sector data, issues associated with the ownership of raw and manipulated data, and ethical issues concerning surveillance and privacy. These insights and the issues raised help assess the value of Big Data in government and smart city.
McCosker, A., Yao, X., Albury, K., Maddox, A., Farmer, J., & Stoyanovich, J. (2022). Developing data capability with non-profit organisations using participatory methods. Big Data & Society, 9(1). https://doi.org/10.1177/20539517221099882
Abstract. In this paper, we explore the methodologies underpinning two participatory research collaborations with Australian non-profit organisations that aimed to build data capability and social benefit in data use. We suggest that studying and intervening in data practices in situ, that is, in organisational data settings expands opportunities for improving the social value of data. These situated and collaborative approaches not only address the ‘expertise lag’ for non-profits but also help to realign the potential social value of organisational data use. We explore the relationship between data literacy, data expertise and data capability to test the idea that collaborative work with non-profit organisations can be a practical step towards addressing data equity and generating data-driven social outcomes. Rather than adopting approaches to data literacy that focus on individuals – or ideal ‘data citizens’ – we target the organisation-wide data settings, goals and practices of the non-profit sector. We conclude that participatory methods can embed social value-generating data capability where it can be sustained at an organisational level, aligning with community needs to promote collaborative data action.
Van Rossem, W., & Pelizza, A. (2022). The ontology explorer: A method to make visible data infrastructures for population management. Big Data & Society, 9(1). https://doi.org/10.1177/20539517221104087
Abstract. This article introduces the methodology of the ‘Ontology Explorer’, a semantic method and JavaScript-based open-source tool to analyse data models underpinning information systems. The Ontology Explorer has been devised and developed by the authors, who recognized a need to compare data models collected in different formats and used by diverse systems. The Ontology Explorer is distinctive firstly because it supports analyses of information systems that are not immediately comparable and, secondly, because it systematically and quantitatively supports discursive analysis of ‘thin’ data models – also by detecting differences and absences through comparison. When applied to data models underpinning systems for population management, the Ontology Explorer enables the apprehension of how people are ‘inscribed’ in information systems: which assumptions are made about them, and which possibilities are excluded by design. The Ontology Explorer thus constitutes a methodology to capture authorities’ own imaginaries of populations and the ‘scripts’ through which they enact actual people. Furthermore, the method allows the comparison of scripts from diverse authorities. This is exemplified by illustrating its functioning with information systems for population management deployed at the European border. Our approach integrates a number of insights from early infrastructure studies and extends their methods and analytical depth to account for contemporary data infrastructures. By doing so, we hope to trigger a systematic discussion on how to extend those early methodical innovations at the semantic level to contemporary developments in digital methods.
Want to learn more about Big Data and Research?
Big Data & Society is an open-access SAGE journal that “publishes interdisciplinary work principally in the social sciences, humanities and computing and their intersections with the arts and natural sciences about the implications of Big Data for societies.” Get the basics with Data infrastructure literacy, an article from Big Data & Society (Gray, Gerlitz, & Bounegru, 2018). Also, learn the language with the Glossary of Big Data Terms.
Want to learn more about research with datasets? This curated collection of open-access articles can help you understand the defining characteristics of large datasets and develop the data literacy skills needed to work with them and with machine learning tools for managing Big Data sources.