The five pitfalls of coding and labeling - and how to avoid them
Whether you call it ‘content analysis’, ‘textual data labeling’, ‘hand-coding’, or ‘tagging’, more and more researchers and data science teams are starting annotation projects. Learn how to avoid the most common pitfalls.
Emotion and reason in political language
In the day-to-day of political communication, politicians constantly decide how to amplify or constrain emotional expression, in service of signalling policy priorities or persuading colleagues and voters. We propose a new method for quantifying emotionality in politics using the transcribed text of politicians’ speeches. This new approach, described in more detail below, uses computational linguistics tools and can be validated against human judgments of emotionality.
The validity problem with automated content analysis
There’s a validity problem with automated content analysis. In this post, Dr. Chung-hong Chan introduces a new tool that provides a set of simple and standardized tests for frequently used text analytic tools and gives examples of validity tests you can apply to your research right away.
My journey into text mining
My journey into text mining started when the institute of Digital Humanities (DH) at the University of Leipzig invited students from other disciplines to take part in their introductory course. I was enrolled in a sociology degree at the time, and this component of data science was not part of the classic curriculum; however, I could explore other departments through course electives and the DH course sounded like the perfect fit.
How to embrace text analysis as a computational social scientist
In this guest blog, Alix Dumoulin and Regina Catipon explain how to embrace text analysis as a social scientist, discuss the challenges that cleaning text corpora brings during preprocessing, and introduce our upcoming tool, Texti, which will save researchers time.
From preprocessing to text analysis: 80 tools for mining unstructured data
Text mining techniques have become critical for social scientists working with large-scale social data, be it Twitter collections to track polarization, party documents to understand opinions and ideology, or news corpora to study the spread of misinformation. In the infographic shown in this blog, we identify more than 80 different apps, software packages, and libraries for R, Python and MATLAB that social science researchers use at different stages of their text analysis projects. We focused almost entirely on statistical, quantitative and computational analysis of text, although some of these tools could also be used to explore texts for qualitative purposes.
What does it mean to anonymize text?
Text data are a resource that we are only beginning to understand. Many human interactions are moving to the digital world, and we are becoming increasingly sophisticated in documenting them. Face-to-face encounters are being replaced by written communication (e.g., WhatsApp, Twitter), and every crime incident or hospital visit is recorded. All of these interactions leave a trace in the form of text data.
Roundup: #text2data - new ways of reading
‘From text to data - new ways of reading’ was a two-day event organised by the National Library of Sweden, the National Archives and Swe-Clarin. The conference brought together librarians, digital collection curators, and scholars in digital humanities and computational social science to talk about the tools and challenges involved in large-scale text collection and analysis.