Today: the third installment of my how-to guide for word tasting, A Word Taster’s Companion.
The world speaks in harmony
It’s our ability to parse the flow of sound into separate sounds that makes language work. We have a conceptual understanding of the different sounds we make – ideal sounds, targets that we aim for and come variously close to when we actually speak. When the sounds are strung together, we still think of them as independent units. It’s like handwriting: the letters may flow together so you can’t say exactly where one ends and the next one starts, but you can see the different letters.
Now, when we hear someone talking, how do we know what different movements their mouth is making, what targets they’re shooting for? It’s all to do with the harmonics.
When you make a vocalization, your vocal cords are vibrating at a certain frequency – which, if you’re singing, is the note you’re singing – but the sound also resonates in your vocal tract at frequencies that are whole-number multiples of the base frequency (two, three, four or more waves for every one of the base frequency). These are the harmonics. If you sing an A at 440 hertz (vibrations per second), there are also harmonics at, for instance, 880, 1320 and 1760 hertz, among others.
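If you like to see the arithmetic, here is a minimal sketch of how the harmonics of a sung note line up. The function name `harmonics` is just something made up for this illustration.

```python
def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics of a note.

    Each harmonic is a whole-number multiple of the base frequency;
    the first one (multiple 1) is the base frequency itself.
    """
    return [fundamental_hz * n for n in range(1, count + 1)]


# An A sung at 440 hertz, with its next three harmonics.
print(harmonics(440, 4))  # [440, 880, 1320, 1760]
```

The 880 and 1760 are octaves of the sung A; the 1320 in between is the third harmonic, three waves for every one of the base frequency.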
Now, which harmonics sound louder and which sound quieter will be determined by the shape of the resonating space in your mouth. There’s a resonating space at the back of your mouth, from your larynx to the top of your tongue, and the higher your tongue is, the longer that space and the lower the frequency of the harmonics that stand out. There’s also a space between the front of your mouth and the closest point your tongue comes to your palate, and the smaller that space is, the higher the resonance. The stand-out harmonics those spaces engender are called formants: the one at the back is the first formant, and the one at the front is the second formant. (There are third and fourth formants that play smaller roles.)
Thus, [u] – “oo” as in “boot” – is heard as it is because it has lower harmonics coming out in both formants: the back of the tongue is high, making a big space between it and the larynx, and it’s also far back, making a big space between it and the front of the mouth. On the other hand, [æ] – “a” as in “cat” – is heard as it is because both formants are higher; the tongue is low and towards the front. And [i] – “ee” as in “beet” – has a low first formant and a high second formant. The second formant is always at least a little higher than the first, even in the low back vowel [ɑ], as in “bother.”
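To put rough numbers on that, here is a small sketch. The formant values are approximate textbook-style averages for adult male speakers of American English, and they are an assumption for illustration only: real values vary a good deal by speaker and dialect. The low back vowel of “bother” is written here as ɪPA ɑ, and the names `FORMANTS` and `second_always_higher` are invented for the example.

```python
# Approximate (first formant, second formant) values in hertz.
# Assumed, rounded averages for illustration; real speakers vary.
FORMANTS = {
    "u": (300, 870),    # "oo" as in "boot": both formants low
    "æ": (660, 1720),   # "a" as in "cat": both formants higher
    "i": (270, 2290),   # "ee" as in "beet": low first, high second
    "ɑ": (730, 1090),   # low back vowel as in "bother"
}


def second_always_higher(table):
    """Check that the second formant is above the first for every vowel."""
    return all(f2 > f1 for f1, f2 in table.values())


for vowel, (f1, f2) in FORMANTS.items():
    print(f"[{vowel}]  first = {f1} Hz  second = {f2} Hz  gap = {f2 - f1} Hz")

print(second_always_higher(FORMANTS))
```

With these assumed figures, the gap between the two formants is smallest for the “bother” vowel and largest for “ee,” which matches the description above.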
We also recognize consonants this way. If they’re consonants that stop the flow of air, we recognize them by what the tongue is doing immediately before and after. If they let just a little air through, we also get the sound of the air as it hisses or buzzes. I’ll go into close-up details of the vowels and consonants in coming chapters.
So we hear these sounds, and we have a sense of where in the mouth they’re coming from, and we also have an idea of what sound could come next in any given word – by the time you’re a couple of sounds into a word, the possibilities are narrowed down quite a bit. We can also hear the effect of the tongue moving and changing the shape of the resonating space in the mouth. And we have learned a repertory of different sounds that we recognize as distinct speech sounds (I won’t say “letters”; those are what we write to represent the sounds). The actual sounds won’t always be exactly identical, but as long as they’re close enough to a target, an identifiable known speech sound, they will be heard as that sound, especially if the sounds around them lead us to expect it.
These target sounds – sounds that we recognize as separate speech sounds – are called phonemes. If you meet a speaker of another language who can’t manage to differentiate “bit” from “beat,” that’s because their native language doesn’t have a distinction between those two vowel sounds, so they’re not used to making the distinction when speaking. They may even believe they can’t. They might have a heck of a hard time telling them apart when listening, too, because both vowels land close enough to the same target in the set of sounds they’re used to. It’s the same with English speakers hearing and making sounds from some other languages: we may not be able to tell apart sounds that, to the language’s native speakers, are obviously different. After all, learning language is also a process of unlearning: in order to have separate sounds, you not only have to treat similar sounds as completely different; you also have to forget that some sounds are different, because you need to treat them as the same in order for your language to make sense.
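The idea of aiming at targets can be sketched as a toy model: treat each phoneme as a point defined by its first and second formants, and hear any incoming vowel as whichever target it lands closest to. Everything here is an assumption for illustration. The formant values are rough averages, the two vowel inventories are invented, and straight-line distance in hertz is a crude stand-in for how hearing actually works.

```python
import math

# Assumed (first formant, second formant) targets in hertz.
# An English-like inventory that keeps "beat" [i] and "bit" [ɪ] apart...
WITH_BIT_VOWEL = {
    "i": (270, 2290),
    "ɪ": (390, 1990),
    "u": (300, 870),
    "ɑ": (730, 1090),
}
# ...and a hypothetical inventory with no separate "bit" vowel.
WITHOUT_BIT_VOWEL = {k: v for k, v in WITH_BIT_VOWEL.items() if k != "ɪ"}


def nearest_target(token, targets):
    """Return the name of the target closest to the (F1, F2) token."""
    return min(targets, key=lambda name: math.dist(token, targets[name]))


# Two slightly off-target vowels, as a real speaker might produce them.
beat_token = (300, 2200)
bit_token = (400, 1950)

print(nearest_target(beat_token, WITH_BIT_VOWEL))     # i
print(nearest_target(bit_token, WITH_BIT_VOWEL))      # ɪ
print(nearest_target(beat_token, WITHOUT_BIT_VOWEL))  # i
print(nearest_target(bit_token, WITHOUT_BIT_VOWEL))   # i
```

With the first set of targets, the two vowels come out as different sounds. Take away the “bit” target and both land on the same one, which is the position of a listener whose language doesn’t make the distinction.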