The five pitfalls of coding and labeling - and how to avoid them
Whether you call it ‘content analysis’, ‘textual data labeling’, ‘hand-coding’, or ‘tagging’, more and more researchers and data science teams are starting annotation projects. Learn how to avoid the potential pitfalls.
Swahili Lexicon for Sentiment Analysis
The "Swahili Lexicon for Sentiment Analysis" project, funded by a Sage Concept Grant, aims to develop and test a Swahili Lexicon annotated by native Swahili speakers for text mining, particularly sentiment analysis.
My journey into text mining
My journey into text mining started when the institute of Digital Humanities (DH) at the University of Leipzig invited students from other disciplines to take part in their introductory course. I was enrolled in a sociology degree at the time, and this component of data science was not part of the classic curriculum; however, I could explore other departments through course electives and the DH course sounded like the perfect fit.
How to embrace text analysis as a computational social scientist
In this guest blog, Alix Dumoulin and Regina Catipon cover how to embrace text analysis as a social scientist, discuss the challenges that cleaning text corpora poses during preprocessing, and introduce our upcoming tool, Texti, which will save researchers time.
SAGE Concept Grants: Feedback for applicants
The 2020 SAGE Ocean Concept Grant program drew over 140 applications from all over the world. In this blog post, we’re giving you an insight into our judging criteria and sharing the most common reasons why applications did not progress further, to serve as feedback for this year’s applicants and guidance for future applicants.
Introducing the SAGE Ocean Fellowship: Apply now
Our product development team at SAGE Ocean is excited to present a new opportunity: we are seeking a SAGE Ocean Fellow, who will work with us to refine and test a new product for academics who apply automated text analysis techniques in their research.
From preprocessing to text analysis: 80 tools for mining unstructured data
Text mining techniques have become critical for social scientists working with large-scale social data, be it Twitter collections to track polarization, party documents to understand opinions and ideology, or news corpora to study the spread of misinformation. In the infographic shown in this blog, we identify more than 80 different apps, software packages, and libraries for R, Python and MATLAB that social science researchers use at different stages of their text analysis projects. We focused almost entirely on statistical, quantitative and computational analysis of text, although some of these tools could be used to explore texts for qualitative purposes.
The ethics of AI and working with data at scale: what are the experts saying
If we were to do a text mining exercise on all the incredible discussions at last week’s conference, 100+ Brilliant Women in AI & Ethics, education would beat all other topics by a mile. We talked about educating kids, heard teenagers share their thoughts on AI in poems and essays, and exchanged views on the nuances of teaching ethics in computing and working with large volumes of social data, both for computer scientists and for experts from other disciplines.
Making sensitive text data accessible for computational social science
Text is everywhere, and everything is text. More textual data than ever before are available to computational social scientists, be it in the form of digitized books, communication traces on social media platforms, or digital scientific articles. Researchers in academia and industry increasingly use text data to understand human behavior and to measure patterns in language. Techniques from natural language processing have created fertile ground for performing these tasks and for making inferences from text data on a large scale.
Roundup: #text2data - new ways of reading
‘From text to data - new ways of reading’ was a two-day event organised by the National Library of Sweden, the National Archives and Swe-Clarin. The conference brought together librarians, digital collection curators, and scholars in digital humanities and computational social science to talk about the tools and challenges involved in large-scale text collection and analysis.