Python: Preparing Text as Data

You’ve collected or received your text data and need to clean it for analysis. In this workshop we’ll go over the types of cleaning you might need given your research question, and how to carry each one out.
Things you’ll learn in this workshop:

Tokenization
(Foreign) language detection
Stemming and lemmatization
Stoplisting (removing some words)
Classifying words by semantic type (e.g. emotional, rational)
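
To give a feel for several of the steps listed above, here is a minimal sketch using only the Python standard library. It is not the workshop's own material: the stop list, the mini-lexicon, the sample sentence, and every function name are hypothetical and chosen for illustration. The stemmer is a toy suffix stripper rather than an established algorithm such as Porter's. Language detection is left out because it generally relies on a trained model or a dedicated library.

```python
import re

# Hypothetical, deliberately tiny stop list; real projects use a longer one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "was", "were"}

# Hypothetical mini-lexicon mapping words to a semantic type.
LEXICON = {
    "happy": "emotional",
    "angry": "emotional",
    "therefore": "rational",
    "because": "rational",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def remove_stopwords(tokens):
    """Stoplisting: drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def classify(tokens):
    """Look up each token's semantic type in the lexicon, if it has one."""
    return {t: LEXICON[t] for t in tokens if t in LEXICON}


text = "The voters were angry because the candidates avoided the questions."
tokens = tokenize(text)
content = remove_stopwords(tokens)
stems = [crude_stem(t) for t in content]
types = classify(content)

print(tokens)
print(content)
print(stems)
print(types)
```

Note that the toy stemmer turns "candidates" into "candidat", which is not a word. Stems are only meant to group related forms together; lemmatization, by contrast, aims to return a dictionary form and usually needs a vocabulary or a part-of-speech tagger.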