January 6, 2018

A paper entitled Effects of Surprisal and Entropy on Vowel Duration in Japanese, co-authored by Assistant Professor Jason Shaw and Associate Professor Shigeto Kawahara of Keio University, has been published in the journal Language and Speech. The paper investigates the hypothesis that people generally articulate sounds more clearly and precisely if those sounds are not very predictable from the context. For example, in rapid English speech, the last few syllables of a word may be pronounced more quickly or in a slurred manner when the identity of the word can be deduced from the first few syllables. If the listener has already figured out what word is being spoken from the first few syllables, then the last few syllables are not very important for communicating the word from the speaker to the listener. On the other hand, if the word may be easily confused with another word, then all syllables would be articulated very clearly to avoid ambiguity.

Jason and Shigeto’s study focused on vowel duration in Japanese. Vowel duration is, as its name suggests, the amount of time it takes to pronounce a particular vowel. In Japanese, the duration of a vowel is used to distinguish between different sounds: two instances of the same vowel are considered to be different sounds if one is longer in duration than the other. For example, the name of the Japanese city Ōsaka literally means “big hill,” and the ō is of long duration. However, if the long ō is replaced by a short o, then the resulting word osaka literally means “little hill.” Beyond this contrastive use of length, Jason and Shigeto hypothesized that predictability has an additional effect on vowel duration. This means that in syllables that contain a consonant followed by a vowel that is considered to be of short duration, the exact duration of the vowel may vary further depending on how predictable that vowel is given the identity of the preceding consonant. If the vowel is predictable, then it may be articulated more quickly, thus shortening its duration. If the vowel is unpredictable, then it may be articulated more slowly and carefully, lengthening its duration.

To test this hypothesis, Jason and Shigeto looked at a large collection of Japanese recordings called the Corpus of Spontaneous Japanese. They considered syllables of CV form, that is, a consonant followed by a vowel. In information theory, predictability can be measured in two ways. For a given consonant, the surprisal of each possible vowel is the negative base-2 logarithm of the conditional probability of that vowel given the identity of the preceding consonant. If a syllable has high surprisal, then its vowel does not usually follow its consonant, and therefore the vowel is not very predictable. The entropy of a consonant is the expected value of the surprisal over all possible vowels that might follow that consonant. If a consonant has high entropy, then there is a lot of uncertainty about which vowel will appear after it. In their study, Jason and Shigeto calculated the surprisal for each consonant–vowel pair, as well as the entropy for each consonant.
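As a rough illustration of the two measures, the sketch below computes surprisal and entropy from a handful of made-up syllable counts. The counts, the variable names, and the helper functions are all hypothetical and are not taken from the paper or from the Corpus of Spontaneous Japanese; the authors' actual procedure may differ in its details.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy counts of consonant-vowel (CV) syllables.
# These numbers are invented for illustration, not drawn from any corpus.
cv_counts = Counter({
    ("k", "a"): 4, ("k", "i"): 2, ("k", "u"): 1, ("k", "o"): 1,
    ("t", "a"): 2, ("t", "o"): 2,
})


def conditional_probs(counts):
    """Estimate P(vowel | consonant) from raw CV counts."""
    totals = defaultdict(int)
    for (consonant, _vowel), n in counts.items():
        totals[consonant] += n
    probs = defaultdict(dict)
    for (consonant, vowel), n in counts.items():
        probs[consonant][vowel] = n / totals[consonant]
    return probs


def surprisal(probs, consonant, vowel):
    """Negative base-2 log of P(vowel | consonant), in bits."""
    return -math.log2(probs[consonant][vowel])


def entropy(probs, consonant):
    """Expected surprisal over all vowels observed after the consonant."""
    return sum(p * -math.log2(p) for p in probs[consonant].values())


probs = conditional_probs(cv_counts)
print(surprisal(probs, "k", "a"))  # 1.0 (common vowel after k: low surprisal)
print(surprisal(probs, "k", "u"))  # 3.0 (rare vowel after k: high surprisal)
print(entropy(probs, "k"))         # 1.75
print(entropy(probs, "t"))         # 1.0
```

In this toy data, a follows k half the time and so carries only 1 bit of surprisal, while u follows k one time in eight and carries 3 bits. The consonant k has higher entropy than t because four vowels compete after k and only two after t.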

Examining the surprisal, entropy, and vowel duration of each syllable in the corpus, Jason and Shigeto found that predictability does appear to affect vowel duration overall. However, the authors also observed that predictability interacts with other articulatory considerations, and these interactions sometimes neutralize or even reverse the expected lengthening of vowels under high surprisal and entropy. For example, the expected effect was observed in syllables with short a, long ā, short i, short e, and long ē. However, for long and short versions of o, higher levels of surprisal actually correlated with shorter durations rather than longer ones. In this case, Jason and Shigeto noticed that the consonants for which o has low surprisal are typically articulated at the front of the mouth, while the vowel o itself is articulated at the back of the mouth. Although low surprisal is expected to shorten the vowel, this shortening is offset because, in the particular syllables with low surprisal, the tongue must travel a greater distance than usual to reach the back of the mouth from the consonants in the front. Other deviations from the expected effect, such as those found with short u, long ū, and long ii, are also likely caused by other factors that affect vowel duration.

Readers with access to Language and Speech may read the paper online. For readers without access, a summary of the paper was featured in the Yale Daily News, the student newspaper of Yale College.

