8 Week 8: Introduction to Supervised Machine Learning

Slides

I will not be providing code to run SVM or Bi-LSTM. However, if you are interested in good tutorials, please check out the following links:

In this week’s lecture, we learned a framework for building supervised machine learning models. This framework includes creating a training set, which is what this week’s assignment asks you to do:

1. Think of a dataset (corpus) and a classification task. Ideally, both the corpus and the classification task can be used in your final paper. However, it’s fine if they are used only for this assignment (you will still need to get a corpus). You can choose any task except sentiment classification.
2. Decide how many categories you will be predicting.
3. Decide how many observations you will code per category.
4. Create a draft codebook to guide the coders who will (hypothetically) label your training set.
5. Label a sample of your data (N = 100; decide how you will sample the data and explain your decision). Have a classmate label the same sample (you can find the coder pairing here). Estimate inter-coder reliability and evaluate the results.
6. Reflect on the exercise: How difficult or easy was the task? What problems did you run into? What would you change in your codebook to improve it? What other lessons did you learn from this exercise?
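The sampling and reliability steps above can be sketched in Python. This is a minimal sketch, not a required method: it assumes two coders, nominal (unordered) categories, and a simple random sample drawn with a fixed seed so that both coders can regenerate the same 100 items. Stratified sampling or oversampling rare categories are equally valid choices, as long as you justify them. The function names, the seed, the corpus IDs, and the labels ("policy", "campaign", "other") are hypothetical placeholders, not real data. If you use scikit-learn, `sklearn.metrics.cohen_kappa_score` computes the same statistic; Krippendorff's alpha is a common alternative when there are more than two coders or missing labels.

```python
import random
from collections import Counter


def draw_sample(doc_ids, n=100, seed=42):
    """Simple random sample of n documents, without replacement.

    A fixed seed means you and your classmate can regenerate the same sample.
    """
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), n)


def percent_agreement(coder_a, coder_b):
    """Share of items on which both coders assigned the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders and nominal (unordered) categories.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance given each coder's label frequencies.
    """
    p_o = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    # Chance agreement: for each label, multiply the two coders' marginal proportions.
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    if p_e == 1:
        # Both coders used one identical label for every item; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical corpus of 5,000 document IDs (placeholder, not real data).
corpus_ids = [f"doc_{i:04d}" for i in range(5000)]
sample_ids = draw_sample(corpus_ids, n=100)

# Hypothetical labels from two coders for 10 documents (placeholder, not real data).
coder_a = ["policy", "policy", "other", "campaign", "policy",
           "other", "campaign", "policy", "other", "policy"]
coder_b = ["policy", "other", "other", "campaign", "policy",
           "other", "policy", "policy", "other", "policy"]

print("Sample size:", len(sample_ids))
print("Percent agreement:", percent_agreement(coder_a, coder_b))
print("Cohen's kappa:", round(cohens_kappa(coder_a, coder_b), 3))
```

With these placeholder labels, the coders agree on 8 of 10 items (percent agreement 0.8), but kappa is only about 0.67 because some of that agreement would be expected by chance given how often each coder used each category. This is why reporting a chance-corrected statistic alongside raw agreement is usually more informative.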

Note: We will use the codebook and training set for an optional assignment next week. They can also serve as the basis for your final paper.