Auditory Activities

Submitted by jan on

These tutorial activities are designed to review / clarify / reinforce some of the concepts covered in the Hearing lecture of How Your Brain Works. The key concepts we want to revise are shown below in bold italic font. Below them are suggested online activities from the Auditory Neuroscience web site to illustrate the concepts and familiarize yourself with them. Try the activities, then check whether you understand the concepts deeply. For these activities, particularly the spatial hearing ones, you should use a pair of stereo headphones connected to your computer or mobile device. If you still have questions, raise them with a TA or the lecturer.

1) Vocal Sounds

The sound spectra of voiced speech sounds (like vowels) are characterized by broad formant peaks and sharp harmonic peaks. The formants define the vowel type, while the harmonics define the pitch. 

Go to the "Two Formant Artificial Vowels" page and play with the applet to make artificial vowels of different types. The web app simply makes a click train to simulate glottal pulse trains and sends it through two bandpass filters to simulate vocal tract resonances. The pitch (the fundamental frequency and its harmonics) is thus given by the click rate, and the formants are created by the bandpass filters.
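
If you would like to see the same idea outside the applet, here is a minimal Python sketch of a click train passed through two bandpass filters. It is not the applet's actual code. The function names, the simple two-pole resonators used as bandpass filters, the parallel (summed) arrangement of the two filters, the 80 Hz bandwidth and the example formant values are all assumptions made for illustration.

```python
import math

def click_train(f0_hz, duration_s, fs_hz):
    """One unit-amplitude click every 1/f0 seconds, zeros elsewhere."""
    n = int(duration_s * fs_hz)
    period = round(fs_hz / f0_hz)  # samples between clicks
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, centre_hz, bandwidth_hz, fs_hz):
    """Two-pole resonator: a simple bandpass filter peaking near centre_hz."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # pole radius sets the bandwidth
    theta = 2.0 * math.pi * centre_hz / fs_hz      # pole angle sets the centre frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x - a1 * y1 - a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def two_formant_vowel(f0_hz, f1_hz, f2_hz, duration_s=0.5, fs_hz=16000, bw_hz=80.0):
    """Click rate sets the pitch; the two resonators set the formants."""
    clicks = click_train(f0_hz, duration_s, fs_hz)
    # Assumption: the two filters act in parallel and their outputs are summed.
    first = resonator(clicks, f1_hz, bw_hz, fs_hz)
    second = resonator(clicks, f2_hz, bw_hz, fs_hz)
    mix = [p + q for p, q in zip(first, second)]
    peak = max(abs(v) for v in mix)
    return [v / peak for v in mix]  # normalise to the range -1..1

# Illustrative values only: a 100 Hz pitch with formants near 700 Hz and 1100 Hz.
vowel = two_formant_vowel(100.0, 700.0, 1100.0)
```

Changing f0_hz alone should change the pitch but not the vowel type, while changing f1_hz and f2_hz alone should do the opposite. You can check this against what you hear in the applet.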

Compare what you have just tried with the artificial vowels to what you get with natural vowels. Open the spectrogram app in a new browser window and produce some natural vowels of different types and pitches. Are the harmonics and the formants where you would expect them to be?

Note that natural speech sounds have more than two formants, yet the little app uses only two. Speech scientists believe that the lowest two formants in speech sounds are key to defining vowel identity.

The way mammals vocalize is pretty generic.

Have a look at the video of Mishka, the talking dog. Are you surprised? All mammals have vocal folds and resonant cavities in their vocal tracts, and are therefore in principle capable of producing vocal sounds that are fundamentally somewhat similar to those made by humans. Do you think a mouse might be able to do what Mishka does? If not, why not?

2) Place and Timing Codes for Periodicity and Pitch

The pitch of a sound is determined by its harmonic structure, or, equivalently, by its periodicity. 

In the lecture we discussed that cochlear filters are not sharply tuned enough to resolve many harmonics of "spectrally complex" sounds, like click trains, which have a "harmonic stack" spectrum. You may want to take another look at the animation which we saw in the lecture to remind yourself of this idea, and make sure you understand what is meant by the idea that the "cochlea's tonotopy resolves the harmonics 1-3", but not higher ones.

You may also recall from the lecture that unresolved higher harmonics are a problem for "place theories of pitch", because "missing fundamental stimuli" can have clear pitches but only have unresolved harmonics. If you want to, revisit the missing fundamental sound examples. This problem has led to the idea that, in addition to, or perhaps instead of, place coded information on the tonotopic array, the auditory system can use "timing codes" for pitch. The idea here is that the brain analyses inter-spike intervals of auditory nerve fiber responses which "phase lock" to the temporal periodicity of the stimulus. To get a proper appreciation of phase locking, study the phase locking video on the Auditory Neuroscience web site. Make sure you understand what it is that you are looking at. Would you be able to explain this video in some detail to one of your classmates if you had to? If not, get a classmate to try to explain it to you.
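
To make the periodicity argument concrete, the following Python sketch builds a missing fundamental stimulus and looks for its period in the time domain. Autocorrelation is used here as a convenient stand-in for an analysis of inter-spike intervals. That substitution, the function names, and the choice of harmonics 4 to 6 of 200 Hz are assumptions for illustration, not a model of what the auditory nerve or brain actually computes.

```python
import math

def harmonic_complex(f0_hz, harmonics, duration_s, fs_hz):
    """Sum of cosines at the listed harmonic numbers of f0_hz."""
    n = int(duration_s * fs_hz)
    return [
        sum(math.cos(2.0 * math.pi * h * f0_hz * i / fs_hz) for h in harmonics)
        for i in range(n)
    ]

def best_period_s(signal, fs_hz, min_hz, max_hz):
    """Lag (in seconds) with the largest autocorrelation, searching only
    lags that correspond to pitches between min_hz and max_hz."""
    min_lag = int(fs_hz / max_hz)
    max_lag = int(fs_hz / min_hz)
    window = len(signal) - max_lag  # use the same number of terms for every lag
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(signal[i] * signal[i + lag] for i in range(window))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag / fs_hz

fs_hz = 16000
# Components at 800, 1000 and 1200 Hz only: there is no energy at 200 Hz.
stimulus = harmonic_complex(200.0, [4, 5, 6], 0.05, fs_hz)
period_s = best_period_s(stimulus, fs_hz, min_hz=120.0, max_hz=400.0)
pitch_estimate_hz = 1.0 / period_s
```

With these settings the strongest lag is 5 ms, which corresponds to 200 Hz, even though the stimulus has no component at that frequency. The waveform still repeats every 5 ms, and that repetition is what a timing code could pick up.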

3) Spatial Hearing

ITDs and ILDs

In the lecture we discussed the role of interaural time differences (ITDs) and interaural level differences (ILDs) in helping people know which direction a sound comes from. 

Remind yourself of the nature of these cues, and how they interact, by looking at the corresponding ITD and ILD demos. How sensitive are you to ITDs and ILDs respectively? How big does an ITD have to be, in milliseconds, for a sound to sound "a long way over to one side"?
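
If you want to experiment beyond the demos, this Python sketch shows one simple way to impose an ITD and an ILD on a mono signal to make a stereo pair for headphones. The function name, the sign convention (positive values favour the right ear) and the parameter values in the example are assumptions chosen for illustration, not recommended settings.

```python
import math

def lateralize(mono, fs_hz, itd_s=0.0, ild_db=0.0):
    """Return (left, right) sample lists.

    Positive itd_s: the sound reaches the right ear first.
    Positive ild_db: the sound is louder in the right ear.
    The ITD is rounded to a whole number of samples.
    """
    delay = round(abs(itd_s) * fs_hz)
    leading = list(mono) + [0.0] * delay  # ear that hears the sound first
    lagging = [0.0] * delay + list(mono)  # ear that hears it later
    left, right = (lagging, leading) if itd_s >= 0 else (leading, lagging)
    attenuation = 10.0 ** (-abs(ild_db) / 20.0)  # decibels to amplitude ratio
    if ild_db >= 0:
        left = [attenuation * v for v in left]
    else:
        right = [attenuation * v for v in right]
    return left, right

fs_hz = 48000
# A short 500 Hz tone burst; the ITD and ILD values are arbitrary placeholders.
tone = [math.sin(2.0 * math.pi * 500.0 * i / fs_hz) for i in range(4800)]
left, right = lateralize(tone, fs_hz, itd_s=0.0003, ild_db=3.0)
```

Setting the two cues to opposite signs lets you pit them against each other, which is one way to explore how they interact.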

With that background knowledge in mind, what do you think the devices pictured on the corresponding page are all about? Would they work? How would they work? What would they do to ITDs and ILDs? What are their limitations?

If you are curious, you can also check out the "binaural beats" demo. If you search "binaural beats" on YouTube you may discover that there is a lot of hippie mysticism around binaural beats, with pieces of electronic music incorporating binaural beats in a manner that is meant to "entrain your brainwaves" in all sorts of weird and wonderful ways. As budding scientists, you should probably meet this type of hippie mysticism with a healthy dose of scientific skepticism. For an old school auditory neuroscientist, binaural beats are nothing more than a cute little demo that your brain can interpret continuous changes in interaural phase as changing ITDs which suggest a moving sound source.
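
The "continuous changes in interaural phase" can be seen directly in a small Python sketch. It sends a pure tone to each ear at slightly different frequencies and tracks how far the right ear's tone has run ahead of the left. The function names and the 500 Hz and 502 Hz example frequencies are assumptions for illustration.

```python
import math

def binaural_beat(f_left_hz, f_right_hz, duration_s, fs_hz):
    """Return (left, right): one pure tone per ear, at slightly different frequencies."""
    n = int(duration_s * fs_hz)
    left = [math.sin(2.0 * math.pi * f_left_hz * i / fs_hz) for i in range(n)]
    right = [math.sin(2.0 * math.pi * f_right_hz * i / fs_hz) for i in range(n)]
    return left, right

def interaural_phase_cycles(f_left_hz, f_right_hz, t_s):
    """Phase lead of the right ear over the left at time t_s, in cycles, wrapped to [0, 1)."""
    return ((f_right_hz - f_left_hz) * t_s) % 1.0

left, right = binaural_beat(500.0, 502.0, duration_s=1.0, fs_hz=16000)
# The interaural phase sweeps through one full cycle every 1 / (502 - 500) = 0.5 s.
half_way = interaural_phase_cycles(500.0, 502.0, 0.25)
```

Neither ear receives an amplitude-modulated sound on its own. The only thing that changes over time is the phase relationship between the two ears, which drifts steadily at the difference frequency.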

Spectral Cues

In the lecture we also mentioned that "spectral cues" to sound location, which are created by direction-dependent filtering of sound by the outer ear (described by the so-called "head related transfer function", or HRTF), can make it possible to tell whether a sound comes from above or from below. It is possible to mimic the outer ears' directional filters with digital filtering of sounds delivered over headphones. This makes it possible to create a "virtual acoustic space" in which sounds can move up or down as well as left or right. One difficulty with this technique is that the shape of your outer ears is unique, and therefore your HRTF is not identical to that of others. Consequently, virtual acoustic space generated with someone else's HRTF won't be perfect, but it may be close enough to give some illusion of up and down movement. You can try this for yourself on the corresponding web page.


4) Hearing Speech

Place and manner of articulation. The McGurk effect.

In one of the last slides of the lecture, we introduced the idea, based on data by Mesgarani and others, that neurons in higher auditory cortical areas on the superior temporal gyrus (STG) may be sensitive to "phonetic", rather than acoustic, features of speech sounds, such as the place and the manner of articulation. To remind you, one of many possible "manners of articulation" of a short consonant, such as "b" or "d" or "k", might be to briefly close off the air flow through your vocal tract, only to then release air in a sudden puff, producing a so-called "plosive" consonant sound. The "place of articulation" for "b" would however be different from that of "d", because in the first case, the air flow is obstructed and released by both lips (a "bilabial place of articulation") while for the "d" the air is cut off by the tongue pushing against the anterior part of the roof of the mouth known as the "alveolar ridge". Now, you can hear whether the place of articulation is bilabial or alveolar, because "b" and "d" sound slightly different. But you can of course also see it. In a "b", the lips close for a moment. In a "d" they don't. But what if auditory and visual evidence for the place of articulation conflict? Find out by checking out the rather remarkable "McGurk effect" video. Do you think auditory neurons in the STG are influenced by pictures of moving lips?