Minimum Description Length (MDL) analysis for linguists [Part 2]

John Goldsmith (Professor Emeritus, University of Chicago)
Event time: 
Friday, November 3, 2023 - 12:15pm to 1:30pm
Dow Hall room 201 (LingSem)
Event description: 

***This is part 2 of a 3-part speaker series***

Minimum Description Length (MDL) analysis is the invention of statisticians, who think of themselves as designing tools that should in principle be usable and useful for intelligent people, regardless of what they are studying. Traditional statistical tools have been helpful for some linguists (notably in sociolinguistics and phonetics), but not for most. MDL, though, tries to come to grips with notions that are much more familiar to linguists: linguists have always cared about finding solutions that are as simple as possible, and about measuring how well our analyses do or don’t do justice to the data. MDL offers some conceptual tools for thinking about those questions. I’ll illustrate the idea by looking at two simple problems: discovering words in continuous discourse, and discovering morphemes within words.

The discussion at the first talk, on October 27, will be at a relatively high level, and focus on the ideas – including a way to link the empirical claims of a grammar to empirical evidence quite different from that employed in generative grammar.

The second and third talks will go more into the details, and are intended for people who might be interested in actual research. In both I will demonstrate software that embodies learning methods inspired by MDL. In the second talk, I will focus on word discovery, looking at a problem much like the one Dick Aslin discussed in his talk in the department last spring, but approaching it from a quite different direction. In the third talk, I will look in more detail at the problem of discovering the morphology of a language automatically, a problem I’ve been working on for – well, a very long time (since before “Unsupervised Learning of the Morphology of a Natural Language,” in Computational Linguistics 2001, if anyone is interested in taking a look ahead of time).