Aarhus University

Sampling human discourse and activity

Talk by Leon Derczynski, The University of Sheffield

Info about event

Time

Tuesday 2 February 2016, 11:00-13:00

Location

IMC Meeting Room, Jens Chr. Skous Vej 4, Building 1483-312

Organizer

Riccardo Fusaroli

Abstract:

Across its many forms, user-generated content (UGC) acts as a sample of all human discourse. This kind of text has proven harder to process automatically with traditional tools than newswire, because it is a much broader and more direct sample of language. This talk examines the kinds of variation found in user-generated content, especially social media. It presents methods for coping with this noise without altering the text, and goes on to explain the many kinds of latent information expressed by the stable, consistent linguistic variation seen across society and the internet.