Broadening giants' shoulders: Long-Form Recordings to Study Infants' Vocal Development and Speech Environments
IMC Tuesday Seminar: Talk by Alejandrina Cristia, senior researcher at the Centre National de la Recherche Scientifique (CNRS)
Info about event
Time
Location
Jens Chr. Skous Vej 4, 8000 Aarhus C, building 1483, room 312 and online
Organizer
Abstract
Traditional laboratory experiments have significantly advanced our understanding of speech and language use, but a reliance on controlled environments limits our ability to ensure that observations generalize to real-world communication across the wide variety of settings in which humans learn and use language. This talk introduces a complementary approach that extends beyond the confines of the lab, aiming to test how well extant theories generalize to everyday interactions, with greater statistical power through larger samples and denser individual sampling. Our approach is anchored in using machine learning to analyze speech behavior as it unfolds in real-world interactions captured through long-form recordings. These data can be interrogated in very different ways. For instance, in a 13-author collaborative study, we analyzed over 40,000 hours of audio from 1,001 children across 12 countries. Correlation analyses suggested that maturation and speech exposure were more important predictors of infants' speech development than gender and socioeconomic status. Another way in which such data can be used is to assess the extent to which causal models predict general milestones in the infant population. For example, in one study, we use state-of-the-art self-supervised learning models to argue that tailored biases are needed to face the rich variability of naturalistic audio, meaning that uninformed statistical learning cannot suffice. I argue that these big data approaches can help us refine our theories and render them computationally testable.
About the speaker
Alejandrina Cristia is a senior researcher at the Centre National de la Recherche Scientifique (CNRS) and leader of the Language Acquisition Across Cultures team at the Laboratoire de Sciences Cognitives et Psycholinguistique (LSCP), cohosted by the Ecole Normale Supérieure, EHESS, and PSL. Her long-term aim is to shed light on child language development, both descriptively and mechanistically. To this end, her team draws methods and insights from linguistics, psychology, anthropology, economics, and speech technology. With an interest in cumulative, collaborative, and transparent science, she co-founded the first meta-meta-analysis platform (metalab.stanford.edu) and several international networks, including DAylong Recordings of Children's Language Environment (darcle.org) and the Consortium on Language Variation in Input Environments around the World (LangVIEW), which aims to increase participant and researcher diversity in language development studies.
Free of charge - All are welcome