
Supervised machine learning with LingPipe: Capturing writing styles with character-level n-gram language models

Tutorial by Hilke Reckman, PhD, School of Communication and Culture

Info about event

Time

Thursday 18 February 2016, 10:00 - 13:00

Location

IMC Meeting Room, Jens Chr. Skous Vej 4, Building 1483-312

Organizer

Kristoffer L. Nielbo

If you are interested in participating, please contact Kristoffer L. Nielbo (kln@cas.au.dk).

How do you show that there is a detectable difference between two or more categories of texts, for example books written in different periods or by authors of different genders? Supervised machine learning for automated classification can be a useful tool here. LingPipe is a ready-to-use package for applying a range of supervised machine learning algorithms. In this class you will learn how to use LingPipe's n-gram models. After a general introduction, we will walk through an example in which we use character-level language modeling to distinguish between the writing styles of male and female authors of short online reviews. The system is trained on labeled data and then evaluated on how well it performs on unseen texts. This is a beginner-level tutorial aimed at anyone interested in digital text methods. No previous experience with machine learning, text mining, or programming is required.
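
For readers who want a feel for the approach before the session, the train-then-test workflow can be sketched in plain Python. This is not LingPipe's API (LingPipe is a Java library) and not the tutorial's own code. It is a minimal illustration of the general idea, assuming character trigrams, add-one smoothing, and equal prior weight for each category. The class names, the labels "casual" and "formal", and the tiny example texts are all invented for illustration.

```python
import math
from collections import Counter


class CharNGramModel:
    """Character n-gram language model for one category (add-one smoothing)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # counts of full n-grams
        self.context_counts = Counter()  # counts of the (n-1)-character prefixes
        self.alphabet = set()

    def _pad(self, text):
        # Mark text start and end so boundary characters are modeled too.
        return "^" * (self.n - 1) + text + "$"

    def _ngrams(self, text):
        padded = self._pad(text)
        return [padded[i:i + self.n] for i in range(len(padded) - self.n + 1)]

    def train(self, text):
        self.alphabet.update(self._pad(text))
        for gram in self._ngrams(text):
            self.ngram_counts[gram] += 1
            self.context_counts[gram[:-1]] += 1

    def log_prob(self, text, vocab_size):
        # Sum of smoothed log P(next character | previous n-1 characters).
        total = 0.0
        for gram in self._ngrams(text):
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + vocab_size
            total += math.log(numerator / denominator)
        return total


class NGramClassifier:
    """One language model per category; predicts the best-scoring category."""

    def __init__(self, n=3):
        self.n = n
        self.models = {}
        self.alphabet = set()

    def train(self, labeled_texts):
        for label, text in labeled_texts:
            if label not in self.models:
                self.models[label] = CharNGramModel(self.n)
            self.models[label].train(text)
            self.alphabet.update(self.models[label].alphabet)

    def classify(self, text):
        # Shared vocabulary size (+1 for unseen characters) keeps scores comparable.
        vocab_size = len(self.alphabet) + 1
        return max(
            self.models,
            key=lambda label: self.models[label].log_prob(text, vocab_size),
        )

    def accuracy(self, labeled_texts):
        correct = sum(1 for label, text in labeled_texts
                      if self.classify(text) == label)
        return correct / len(labeled_texts)


# Invented toy data: labeled training texts and separate held-out texts.
training = [
    ("casual", "loved it!!! soooo good!!!"),
    ("casual", "omg amazing!!! best ever!!!"),
    ("formal", "The product functions adequately."),
    ("formal", "Delivery was prompt and satisfactory."),
]
held_out = [
    ("casual", "sooo amazing!!!"),
    ("formal", "The delivery was adequate."),
]

classifier = NGramClassifier(n=3)
classifier.train(training)
for label, text in held_out:
    print(label, "->", classifier.classify(text))
print("held-out accuracy:", classifier.accuracy(held_out))
```

The key design point is that evaluation uses texts the models never saw during training, so the accuracy figure reflects how well the learned character patterns generalize rather than how well they were memorized. With realistic data the training set would be far larger and the held-out set would be sampled from the same source.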
