Generalizing from Inconsistent Data: The Combined Roles of Type and Token Frequency

Gaja Jarosz (UMass Amherst)
Event time: 
Monday, November 13, 2023 - 4:00pm to 5:30pm
Rosenfeld Hall, Room 109
Event description: 

Language acquisition proceeds on the basis of incomplete, ambiguous linguistic input. Due to recent developments in computational modeling of morphophonological learning, there now exist numerous approaches for learning of various kinds of hidden morphophonological structure from incomplete, unlabeled, and noisy data. These computational models make it possible to connect the full representational richness of linguistic theory with noisy, ambiguous data representative of language learners’ linguistic experience to make detailed and experimentally testable predictions about language learning and generalization. This connection has the potential to be particularly informative in domains where there are unsettled theoretical debates about the essential representations involved, with widely varying representational and/or architectural assumptions covering the same range of basic facts.

In this talk, I present the results of an ongoing collaborative research project (joint work with Maggie Baird, Cerys Hughes, Seoyoung Kim, Andrew Lamont, Max Nelson, Brandon Prickett) examining one such domain in the morphophonological realm – lexical exceptionality – from both experimental and computational perspectives. Although there are extensive literatures on exceptionality from both of these perspectives, there is little consensus about how language learners encode patterns with exceptions and how they generalize such patterns to novel examples. This is an important question: when there is inconsistency in the input, learners must make choices about how broadly and how strongly to encode regularities, and about which items to treat as regulars and which as exceptions. These choices provide clues about the biases inherent to language learning and the grammatical systems we ultimately acquire.

Psycho-computational research on lexical exceptionality and morphological productivity overwhelmingly supports the hypothesis that the type frequency of a pattern (the number of words in the lexicon following that pattern) is a better predictor of that pattern’s productivity than token frequency (how often the pattern is encountered overall in the learning data) (Aronoff, 1976; Baayen and Lieber, 1991; Pierrehumbert, 2001; Albright and Hayes, 2003; Hay and Beckman, 2003). However, the shape of the function mapping type frequency to productivity remains controversial, with some work suggesting that the function is essentially linear (“frequency matching”) and other work showing that learners “regularize” the input distribution, boosting the productivity of the most frequent patterns. There is also a growing experimental literature showing that a variety of factors can influence the degree to which learners regularize.
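
The distinction between the two frequency measures, and between frequency matching and regularization, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the function names, and in particular the exponent-based sharpening used to stand in for regularization. It is not the model or the stimuli used in the work described here.

```python
from collections import Counter

# Hypothetical toy corpus of (word, plural pattern) tokens; all items are invented.
corpus = [
    ("blick", "-a"), ("blick", "-a"), ("blick", "-a"), ("blick", "-a"),
    ("dax", "-o"), ("wug", "-o"), ("fep", "-o"),
]

def token_frequency(tokens):
    """Count every occurrence of each pattern in the data."""
    return Counter(pattern for _, pattern in tokens)

def type_frequency(tokens):
    """Count the distinct words that follow each pattern."""
    return Counter(pattern for _, pattern in set(tokens))

def frequency_matching(counts):
    """Linear mapping: predicted productivity is the pattern's share of the counts."""
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def regularize(counts, exponent=2.0):
    """Illustrative assumption: sharpen the shares so the majority pattern is boosted."""
    powered = {p: c ** exponent for p, c in counts.items()}
    total = sum(powered.values())
    return {p: v / total for p, v in powered.items()}

types = type_frequency(corpus)    # "-o" has 3 word types, "-a" has 1
tokens = token_frequency(corpus)  # "-a" has 4 tokens, "-o" has 3

print(frequency_matching(types))  # "-o" share under frequency matching
print(regularize(types))          # "-o" share after sharpening
```

In this toy corpus the pattern "-a" is attested on a single, frequently repeated word, so it leads on token frequency while "-o" leads on type frequency. Under the sketch's assumptions, frequency matching over types gives "-o" a share of 0.75, and the sharpened mapping raises it to 0.9.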

A second controversy concerns the role of token frequency in productivity. While the primary role of type frequency is well-established, research on morphological processing and productivity has also shown that there are systematic relationships between productivity and token frequency. The findings are conflicting, however. Some lines of research suggest that token frequency and productivity are inversely related (Baayen and Lieber, 1991; Bybee, 1995; O’Donnell et al., 2011), other results indicate that they are positively related (Nosofsky, 1988; Casenhiser and Goldberg, 2005; Bybee and Eddington, 2006; Barðdal, 2008; Endress and Hauser, 2011), and yet others argue that token frequency plays no role in productivity (Hayes and Wilson, 2008; Daland et al., 2011; Becker et al., 2011, 2012; Yang, 2016).

We examine the independent roles of type frequency and token frequency, as well as their interaction, in three artificial language learning experiments involving lexicalized plural allomorphy. Our learning framework makes it possible to examine the effects of these variables on generalization to novel forms as well as to examine how sensitivity to these factors affects the time-course of learning. To maximize the chances that the effects of frequency manipulations can be detected and isolated, the experiments are designed to minimize, as much as possible, the influence of other factors that have been shown to interact with productivity. The first two experiments are designed to differentiate the predictions for generalization of three distinct hypotheses about the relationship between productivity and type frequency, while the third experiment investigates the independent role of token frequency. To preview the results, we find that both type and token frequency independently and positively contribute to learning rates and generalization across the three experiments, with type frequency playing a dominant role. We also apply two computational learning theories, implementing two prominent theoretical linguistic frameworks, to the learning of the lexically-conditioned allomorphy patterns in our experiments. Neither of these models is designed to encode a particular relationship between token frequency, type frequency, and productivity, but we show that both models correctly predict the general trends in generalization rates, learning curves, and the influence of both type and token frequency observed across the experimental conditions.
