Unlocking crime data for research: An update from 2019's SAGE Concept Grant winners
Text Wash uses machine learning and natural language processing to unlock previously untapped crime data that have so far been inaccessible to researchers because the personally identifiable information they contain must first be anonymized.
What does it mean to anonymize text?
Text data are a resource that we are only beginning to understand. Many human interactions are moving to the digital world, and we are becoming increasingly sophisticated at documenting them. Face-to-face encounters are being replaced by written communication (e.g., WhatsApp, Twitter), and every crime incident or hospital visit is recorded. All of these interactions leave a trace in the form of text data.
Making sensitive text data accessible for computational social science
Text is everywhere, and everything is text. More textual data than ever before are available to computational social scientists, be it in the form of digitized books, communication traces on social media platforms, or digital scientific articles. Researchers in academia and industry increasingly use text data to understand human behavior and to measure patterns in language. Techniques from natural language processing have created fertile ground for these tasks and for drawing inferences from text data on a large scale.