Deep Phonology: Modeling language from raw acoustic data in a fully unsupervised manner
In this talk, I propose that language can be modeled from raw speech data in a fully unsupervised manner with Generative Adversarial Networks (GANs), and that such modeling has implications both for our understanding of language acquisition and for our understanding of how deep neural networks learn internal representations. I propose an extension of the GAN architecture in which the learning of meaningful linguistic units emerges from a requirement that the networks output informative data, and which captures both the perception and the production principles of human speech. I further propose a technique for identifying latent variables in deep convolutional networks that represent linguistically meaningful units in a causal and interpretable way. With this model, we can “wug-test” deep neural networks, analyze how their learning biases match human biases observed in behavioral experiments, compare speech processing in the brain to intermediate representations in deep neural networks, examine how rule-like, symbolic computation emerges in internal representations, and ask what GANs’ innovative outputs can teach us about productivity in human language.
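The causal-interpretability idea can be illustrated with a toy sketch (my illustration, not the talk’s actual implementation): if a latent variable has come to represent a linguistic unit, then manipulating that variable, including setting it to values beyond its training range, should change the corresponding property of the output in a systematic, monotonic way. The linear “generator” below is only a stand-in for a trained deep convolutional GAN generator.

```python
import numpy as np

# Toy "generator": a fixed linear map from latent space to output features.
# In the actual proposal this would be a deep convolutional GAN generator
# trained on raw speech; the linear map here is purely illustrative.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 2))          # latent dim 2 -> output feature dim 8
W[:, 0] = np.abs(W[:, 0])            # make z[0] raise every output feature

def generate(z):
    """Map a latent vector z to an output feature vector."""
    return W @ z

# "Wug-test": hold z[1] fixed and push z[0] past a nominal training
# range of [0, 1] to probe its causal effect on the output.
baseline = generate(np.array([0.5, 0.3]))
in_range = generate(np.array([1.0, 0.3]))
extreme  = generate(np.array([3.0, 0.3]))   # beyond the training range

# The feature controlled by z[0] grows monotonically with its value,
# which is the signature of a causal, interpretable latent variable.
assert np.all(in_range >= baseline)
assert np.all(extreme >= in_range)
```

Because the mapping here is linear by construction, the monotonic effect is guaranteed; for a trained network, finding a latent variable with this property is an empirical discovery about its internal representations.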