Sampling human discourse and activity

Talk by Leon Derczynski, The University of Sheffield

2015.11.13 | Anne-Mette Pedersen

Date Tue 02 Feb
Time 11:00-13:00
Location IMC Meeting Room, Jens Chr. Skous Vej 4, Building 1483-312


Across its many forms, user-generated content (UGC) acts as a sample of all human discourse. Unlike newswire, it offers a much broader and more direct sample of language, and this has made it harder to process automatically with traditional tools. This talk examines the kinds of variation we see in user-generated content, especially social media. It presents methods for coping with this noise without changing the text itself. It then goes on to explain the many kinds of latent information expressed by the stable, consistent linguistic variation seen across society and the internet.