Discovering ACEP: Use R for Content Analysis in Spanish
by Agustín Nieto & Nicolás Rabino (Instituto de Humanidades y Ciencias Sociales, Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad Nacional de Mar del Plata)
Natural language processing and text mining have changed the way social researchers and data scientists analyze and understand the information contained in large amounts of text. In this post we present ACEP, a package of functions for the R programming language designed specifically for analyzing social conflict. The package fills a gap in this area for Spanish speakers.
What is ACEP?
ACEP is an R package designed specifically for content analysis in Spanish. It was developed within the framework of the Observatorio de Conflictividad Social de Mar del Plata and offers a variety of functions and tools that facilitate extracting and analyzing information from text. Below are some of ACEP's most important features:
Text Preprocessing: simplifies common preprocessing steps such as tokenization (splitting text into words or phrases), lemmatization (reducing words to their base form), and stop-word removal (dropping common words that carry little meaning on their own, such as "y", "el", and "la", meaning "and" and "the"). These steps are essential for cleaning and preparing text before more advanced analyses.
Specialized Dictionaries: includes dictionaries for the analysis of social conflict and lets you define your own dictionaries of terms related to protests, demands, social actors, and more.
Data Visualization: offers tools for visualizing textual data, such as word clouds and graphs, which visually summarize the most frequent terms in a set of texts and make patterns easier to identify.
Frequency Analysis: lets users count how often words or phrases occur in a set of texts. Tracked over time, these counts help identify trends and patterns.
Newspaper Databases: includes databases from national newspapers such as La Nación, as well as from newspapers in several Argentine cities.
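The preprocessing and frequency steps described above can be illustrated without ACEP itself. The following is a minimal base-R sketch, not ACEP's implementation: the helper names `clean_tokens()` and `count_terms()`, the short Spanish stop-word list, and the sample headlines are hypothetical. It lowercases the text, strips punctuation and digits, splits on whitespace, drops stop words, and counts term frequencies; lemmatization is omitted.

```r
# Minimal base-R sketch of tokenization, stop-word removal and frequency
# counting. NOT ACEP's API: helper names and the stop-word list are hypothetical.

# A tiny illustrative Spanish stop-word list (real lists are much longer)
stop_es <- c("el", "la", "los", "las", "de", "del", "y", "en", "a",
             "por", "un", "una")

clean_tokens <- function(text, stopwords = stop_es) {
  text <- tolower(text)                               # normalize case
  text <- gsub("[[:punct:][:digit:]]+", " ", text)    # drop punctuation and digits
  tokens <- unlist(strsplit(text, "[[:space:]]+"))    # split on whitespace
  tokens <- tokens[nzchar(tokens)]                    # drop empty strings
  tokens[!tokens %in% stopwords]                      # remove stop words
}

count_terms <- function(tokens) {
  freq <- table(tokens)                  # occurrences of each unique token
  sort(freq, decreasing = TRUE)          # most frequent terms first
}

# Invented headlines, for illustration only
headlines <- c(
  "Paro de los obreros del pescado en el puerto",
  "Los obreros marchan por salarios; nuevo paro",
  "Protesta en el puerto de Mar del Plata"
)

tokens <- clean_tokens(headlines)
freq <- count_terms(tokens)
print(head(freq, 3))
```

A real workflow would use a complete stop-word list and a lemmatizer, which is the kind of work ACEP's preprocessing functions are meant to simplify.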
Experimental functions planned for the near future:
ChatGPT: a function for interacting with OpenAI models using a private, paid API key.
Triplet Extraction (Subject-Verb-Object): extracts the subjects of a protest, the action performed, and the object(s) of that action. It also returns named entities (NER).
The stable version of the package can be installed from CRAN, the R Project's official package repository, and the development version from a repository on GitHub.
An example of how anyone can analyze social conflict with ACEP
The article "Conflictividad laboral en la pesca" (Labor conflict in fishing) presents an exploratory analysis of labor conflicts in the Argentine fishing industry, carried out with the ACEP package and dictionaries. It focuses on conflicts involving the Fish Industry Workers' Union (SOIP) in Mar del Plata between 2009 and 2020. Combining different dictionaries made it possible to trace when these conflicts occurred over time. Read the full article to learn more.
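To show how combining dictionaries can reveal the timing of conflicts, here is a minimal base-R sketch. It is not the method used in the article and not ACEP's API: the two dictionaries (`dict_conflict`, `dict_actors`), the helper `dict_hits()`, and the dated headlines are invented for illustration. A headline is flagged when it matches terms from both dictionaries, and flagged headlines are then counted per year.

```r
# Hypothetical dictionaries: conflict actions and the actors of interest
dict_conflict <- c("paro", "huelga", "protesta", "marcha")
dict_actors   <- c("soip", "obreros del pescado")

# Returns TRUE for each text containing at least one dictionary term
# (whole-word match, case-insensitive via tolower)
dict_hits <- function(texts, dict) {
  texts <- tolower(texts)
  pattern <- paste0("\\b(", paste(dict, collapse = "|"), ")\\b")
  grepl(pattern, texts, perl = TRUE)
}

# Invented, dated headlines for illustration only (not real data)
news <- data.frame(
  year  = c(2009, 2009, 2012, 2015, 2015, 2015),
  title = c(
    "El SOIP convoca a un paro en el puerto",
    "Nueva temporada de langostino",
    "Huelga de obreros del pescado por salarios",
    "Marcha del SOIP frente a la municipalidad",
    "Protesta de obreros del pescado en la banquina",
    "Crecen las exportaciones de merluza"
  )
)

# A headline counts as a fishing labor conflict only if it matches both dictionaries
news$conflict <- dict_hits(news$title, dict_conflict) &
                 dict_hits(news$title, dict_actors)

# Number of conflict headlines per year (logical values summed as counts)
conflicts_by_year <- aggregate(conflict ~ year, data = news, FUN = sum)
print(conflicts_by_year)
```

Requiring a match in both dictionaries filters out headlines that mention the sector without a conflict action, or a conflict action without the actor of interest.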
In conclusion, ACEP is an innovative tool for analyzing social conflict in Spanish-language texts, offering a range of useful functions for research and data analysis in this field, with further promising functions in development.
We invite you to visit our website: https://agusnieto77.github.io/ACEP/.
Nicolás Rabino is a professor and graduate in History from the National University of Mar del Plata (UNMdP). He is currently pursuing a Ph.D. in History with a doctoral scholarship from CONICET. His research focuses on the analysis of social conflict in recent Argentine history, combining social history with digital history.
Agustín Nieto is a professor and holds a Ph.D. in History. He is an independent researcher with CONICET, based at the Institute of Humanities and Social Sciences (INHUS), where he serves as vice-director. He also lectures in the Sociology program at the Faculty of Humanities of the National University of Mar del Plata (UNMdP). In addition, he coordinates the Observatory of Social Conflict (UNMdP) and is a member of the Network of Social Conflict Observatories. In recent years, he has specialized in computational techniques for processing historical sources (digital history) and for analyzing social conflict (computational sociology).
In the field of artificial intelligence (AI), Transformers have revolutionized language analysis. Never before has a single new technology improved the benchmarks of nearly all language processing tasks, e.g., general language understanding, question answering, and Web search. The transformer method itself, which probabilistically models words in their context (i.e., "language modeling"), was introduced in 2017, and the first large-scale pre-trained general-purpose transformer, BERT, was open-sourced by Google in 2018. Since then, BERT has been followed by a wave of new transformer models, including GPT, RoBERTa, DistilBERT, XLNet, Transformer-XL, CamemBERT, and XLM-RoBERTa. The text package makes all of these language models, and many more, easily accessible to R users, and includes functions optimized for human-level analyses tailored to social scientists.