How does the human auditory system work? A full explanation is beyond the scope of this book, but a condensed account of basic auditory transduction follows. For a more detailed treatment, see Chapter 4: References.
Once acoustic energy is collected by the outer ear, it is transformed into mechanical (vibrational) energy by the eardrum. The three bones in the middle ear, colloquially known as hammer, anvil and stirrup, efficiently transmit this vibration into the fluid-filled inner ear (the cochlea).6 Inside the cochlea, vibration in the inner-ear fluid sets up a traveling wave on the coiled basilar membrane; different regions respond most strongly to different frequencies (tonotopy).9 10 Hair cells convert this mechanical motion into electrical signals, which travel along the auditory nerve through several processing stages before reaching auditory cortex, including primary auditory cortex in Heschl’s gyrus.11 7 8
Video: Auditory Transduction (2002), courtesy of radiant3d
Two complementary mechanisms are commonly proposed for how the auditory system codes the pitch of a sound once it has reached the inner ear: the place theory and the temporal theory of pitch.5
The place theory of pitch is based on the fact that different frequency components of the input sound stimulate different places along the basilar membrane (as shown in the video above) and in turn stimulate auditory nerve fibres with different characteristic frequencies. The vibrational energy is carried by the inner ear fluid and travels the length of the membrane as a traveling wave. The wave reaches its greatest amplitude at a frequency-dependent location along the membrane. High frequencies have their peak nearer the base (near the middle ear), while low frequencies travel farther toward the apex before reaching their peak amplitude.9 10
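The frequency-to-place relationship described above can be sketched numerically. The snippet below is a minimal illustration, not part of the original text: it assumes Greenwood's (1990) empirical fit for the human cochlea, with constants A = 165.4, a = 2.1 and k = 0.88, and position x expressed as a proportion of basilar-membrane length measured from the apex (0) to the base (1). The function names are hypothetical.

```python
import math

# Assumed constants: Greenwood's (1990) fit for the human cochlea.
# They are not taken from this chapter and serve only as an illustration.
A = 165.4     # scaling constant, in Hz
ALPHA = 2.1   # slope, for x given as a proportion of membrane length
K = 0.88      # low-frequency correction term

def place_to_frequency(x):
    """Characteristic frequency (Hz) at proportional distance x from the apex (0 = apex, 1 = base)."""
    return A * (10 ** (ALPHA * x) - K)

def frequency_to_place(f_hz):
    """Inverse map: proportional distance from the apex at which f_hz peaks."""
    return math.log10(f_hz / A + K) / ALPHA

for f in (100, 440, 1000, 4000, 10000):
    x = frequency_to_place(f)
    print(f"{f:>6} Hz peaks about {x:.2f} of the way from apex to base")
```

Under these assumptions, higher frequencies map to positions closer to the base and lower frequencies to positions closer to the apex, in line with the description above.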
The temporal theory of pitch is based on the timing of neural firings, which occur in response to vibrations of the basilar membrane. In this view, pitch is represented by the time intervals between neural events that follow the periodicity of the stimulus (its fine structure at low frequencies, and sometimes its envelope periodicity when components are not cleanly separated). The reciprocal of the repetition period corresponds to the fundamental frequency of the waveform.5
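The idea that pitch follows from a repetition interval can be illustrated with a simple autocorrelation. This is only a signal-processing analogy under stated assumptions, not a model of actual neural firing: the function names, the 8 kHz sample rate, the 50–500 Hz search range and the test tone are all hypothetical choices made for the sketch. The tone contains harmonics 2, 3 and 4 of 200 Hz (400, 600 and 800 Hz), so it has no energy at 200 Hz itself, yet its waveform still repeats every 5 ms.

```python
import math

def harmonic_complex(f0_hz, harmonics, sample_rate_hz, duration_s):
    """Sum of equal-amplitude sine harmonics of f0_hz (hypothetical test signal)."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [
        sum(math.sin(2 * math.pi * h * f0_hz * n / sample_rate_hz) for h in harmonics)
        for n in range(n_samples)
    ]

def estimate_pitch_autocorr(signal, sample_rate_hz, min_hz=50.0, max_hz=500.0):
    """Return 1 / (best repetition interval), searching pitches in [min_hz, max_hz]."""
    min_lag = int(sample_rate_hz / max_hz)  # shortest period considered, in samples
    max_lag = int(sample_rate_hz / min_hz)  # longest period considered, in samples
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: how well the signal matches itself shifted by `lag`.
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The reciprocal of the period (in seconds) is the estimated fundamental frequency.
    return sample_rate_hz / best_lag

tone = harmonic_complex(200.0, harmonics=(2, 3, 4), sample_rate_hz=8000, duration_s=0.05)
print(estimate_pitch_autocorr(tone, 8000))
```

With these settings the estimate should come out at or very near 200 Hz, the reciprocal of the 5 ms repetition period, even though the tone has no component at that frequency.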
The auditory cortex, including primary auditory cortex in Heschl’s gyrus, is supplied with information about both the place of stimulation on the basilar membrane (place theory) and neural firing patterns (temporal theory). The relative importance of the two types of information depends on the frequencies present and the type of sound.5 In general, both place and timing information contribute at low and mid frequencies. Auditory-nerve phase locking to temporal fine structure is strongest at low frequencies and becomes much weaker above roughly 4–5 kHz, where place cues dominate.13 By the time signals reach auditory cortex, phase locking to fine structure is limited to a few hundred Hz at most (e.g., ~60–250 Hz), so any timing-based pitch information must be transformed at earlier stages.12
With this basic framework in mind—place cues from the frequency-dependent traveling wave on the basilar membrane, and timing cues from patterns of neural firing—we can now return to the 19th century and see why pitch perception became such a contentious problem. The earliest investigators were essentially asking: Is pitch determined by the lowest spectral component and its place of excitation, or can pitch be inferred from temporal periodicity even when that component is weak or absent? This question sits at the center of the Seebeck–Ohm dispute, and it set the stage for Helmholtz’s influential attempt to reconcile the two.
The saga begins with a debate about the pitch of a complex tone between two scientists, August Seebeck (1805–1849) and Georg Simon Ohm (1789–1854).
