Analyze Big Data

How can you analyze and interpret Big Data?

Just as the term "Big Data" can describe a wide range of datasets or collections, there is no one-size-fits-all model for Big Data analysis. Here are some varied examples from across disciplines.

Bornakke, T., & Due, B. L. (2018). Big–Thick Blending: A method for mixing analytical insights from big and thick data sources. Big Data & Society. https://doi.org/10.1177/2053951718765026

Abstract. Recent works have suggested an analytical complementarity in mixing big and thick data sources. These works have, however, remained as programmatic suggestions, leaving us with limited methodological inputs on how to achieve such complementary integration. This article responds to this limitation by proposing a method for ‘blending’ big and thick analytical insights. The paper first develops a methodological framework based on the cognitivist linguistics terminology of ‘blending’. Two cases are then explored in which blended spaces are crafted from engaging big and thick analytical insights with each other. Through these examples, we learn how blending processes should be conducted as a rapid, iterative and collaborative effort with respect for individual expertise. Further, we demonstrate how the unique, but often overlooked, granularity of big data plays a key role in affording the blending with thick data. We conclude by suggesting four commonly appearing blending strategies that can be applied when relying upon big and thick data sources.

Feng, G. C. (2020). Research Performance Evaluation in China: A Big Data Analysis. SAGE Open. https://doi.org/10.1177/2158244019901257

Abstract. China’s scientific achievement has received considerable international attention due to a large amount of research and development (R&D) spending. This article aims to study the performance of China’s R&D expenditures (in the form of research funding) by examining the research performance of individual researchers based on bibliometric measures. This study concludes that research practice is not merely determined by capital possessed. Besides, international collaboration primarily accounts for research performance of scholars, whereas research funding and publishing in Chinese-based journals do not impact research performance significantly.

Milan, S., & Treré, E. (2019). Big Data from the South(s): Beyond Data Universalism. Television & New Media, 20(4), 319–335. https://doi.org/10.1177/1527476419837739

Abstract. This article introduces the tenets of a theory of datafication of and in the Souths. It calls for a de-Westernization of critical data studies, in view of promoting a reparation to the cognitive injustice that fails to recognize non-mainstream ways of knowing the world through data. It situates the “Big Data from the South” research agenda as an epistemological, ontological, and ethical program and outlines five conceptual operations to shape this agenda. First, it suggests moving past the “universalism” associated with our interpretations of datafication. Second, it advocates understanding the South as a composite and plural entity, beyond the geographical connotation (i.e., “global South”). Third, it postulates a critical engagement with the decolonial approach. Fourth, it argues for the need to bring agency to the core of our analyses. Finally, it suggests embracing the imaginaries of datafication emerging from the Souths, foregrounding empowering ways of thinking data from the margins.

Resnyansky, L. (2019). Conceptual frameworks for social and cultural Big Data analytics: Answering the epistemological challenge. Big Data & Society. https://doi.org/10.1177/2053951718823815

Abstract. This paper aims to contribute to the development of tools to support an analysis of Big Data as manifestations of social processes and human behaviour. Such a task demands both an understanding of the epistemological challenge posed by the Big Data phenomenon and a critical assessment of the offers and promises coming from the area of Big Data analytics. This paper draws upon the critical social and data scientists’ view on Big Data as an epistemological challenge that stems not only from the sheer volume of digital data but, predominantly, from the proliferation of the narrow-technological and the positivist views on data. Adoption of the social-scientific epistemological stance presupposes that digital data was conceptualised as manifestations of the social. In order to answer the epistemological challenge, social scientists need to extend the repertoire of social scientific theories and conceptual frameworks that may inform the analysis of the social in the age of Big Data. However, an ‘epistemological revolution’ discourse on Big Data may hinder the integration of the social scientific knowledge into the Big Data analytics.

Rogers, R. (2021). Visual media analysis for Instagram and other online platforms. Big Data & Society. https://doi.org/10.1177/20539517211022370

Abstract. Instagram is currently the social media platform most associated with online images (and their analysis), but images from other platforms also can be collected and grouped, arrayed by similarity, stacked, matched, stained, labelled, depicted as network, placed side by side and otherwise analytically displayed. In the following, the initial focus is on Instagram, together with certain schools of thought such as Instagramism and Instagrammatics for its aesthetic and visual cultural study. Building on those two approaches, it subsequently focuses on other web and social media platforms, such as Google Image Search, Twitter, Facebook and 4chan. It provides demonstrations of how querying techniques create online image collections, and how these sets are analytically grouped through arrangements collectively referred to as metapictures.

Schweinberger, M., Haugh, M., & Hames, S. (2021). Analysing discourse around COVID-19 in the Australian Twittersphere: A real-time corpus-based analysis. Big Data & Society. https://doi.org/10.1177/20539517211021437

Abstract. Public discourse about the COVID-19 that appears on Twitter and other social media platforms provides useful insights into public concerns and responses to the pandemic. However, acknowledging that public discourse around COVID-19 is multi-faceted and evolves over time poses both analytical and ontological challenges. Studies that use text-mining approaches to analyse responses to major events commonly treat public discourse on social media as an undifferentiated whole, without systematically examining the extent to which that discourse consists of distinct sub-discourses or which phases characterize its development. They also confound structured behavioural data (i.e., tagging) with unstructured user-generated data (i.e., content of tweets) in their sampling methods. The present study aims to demonstrate how one might go about addressing both of these sets of challenges by combining corpus linguistic methods with a data-driven text-mining approach to gain a better understanding of how the public discourse around COVID-19 developed over time and what topics combine to form this discourse in the Australian Twittersphere over a period of nearly four months. By combining text mining and corpus linguistics, this study exemplifies how both approaches can complement each other productively.

Venturini, T., Jacomy, M., & Jensen, P. (2021). What do we see when we look at networks: Visual network analysis, relational ambiguity, and force-directed layouts. Big Data & Society. https://doi.org/10.1177/20539517211018488

Abstract. It is increasingly common in natural and social sciences to rely on network visualizations to explore relational datasets and illustrate findings. Such practices have been around long enough to prove that scholars find it useful to project networks in a two-dimensional space and to use their visual qualities as proxies for their topological features. Yet these practices remain based on intuition, and the foundations and limits of this type of exploration are still implicit. To fill this lack of formalization, this paper offers explicit documentation for the kind of visual network analysis encouraged by force-directed layouts. Using the example of a network of Jazz performers, band and record labels extracted from Wikipedia, the paper provides guidelines on how to make networks readable and how to interpret their visual features. It discusses how the inherent ambiguity of network visualizations can be exploited for exploratory data analysis. Acknowledging that vagueness is a feature of many relational datasets in the humanities and social sciences, the paper contends that visual ambiguity, if properly interpreted, can be an asset for the analysis. Finally, we propose two attempts to distinguish the ambiguity inherited from the represented phenomenon from the distortions coming from fitting a multidimensional object in a two-dimensional space. We discuss why these attempts are only partially successful, and we propose further steps towards a metric of spatialization quality.
