Behavioral/Systems/Cognitive
Interdependent Encoding of Pitch, Timbre, and Spatial
Location in Auditory Cortex
Jennifer K. Bizley,¹ Kerry M. M. Walker,¹ Bernard W. Silverman,² Andrew J. King,¹ and Jan W. H. Schnupp¹,³
¹Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom, ²St Peter's College, Oxford OX1 3DL, United Kingdom, and ³Robotics, Brain, and Cognitive Sciences Department, Italian Institute of Technology, 16163 Genova, Italy
Because we can perceive the pitch, timbre, and spatial location of a sound source independently, it seems natural to suppose that cortical
processing of sounds might separate out spatial from nonspatial attributes. Indeed, recent studies support the existence of anatomically
segregated “what” and “where” cortical processing streams. However, few attempts have been made to measure the responses of indi-
vidual neurons in different cortical fields to sounds that vary simultaneously across spatial and nonspatial dimensions. We recorded
responses to artificial vowels presented in virtual acoustic space to investigate the representations of pitch, timbre, and sound source
azimuth in both core and belt areas of ferret auditory cortex. A variance decomposition technique was used to quantify the way in which
altering each parameter changed neural responses. Most units were sensitive to two or more of these stimulus attributes. Although
indicating that neural encoding of pitch, location, and timbre cues is distributed across auditory cortex, significant differences in average
neuronal sensitivity were observed across cortical areas and depths, which could form the basis for the segregation of spatial and
nonspatial cues at higher cortical levels. Some units exhibited significant nonlinear interactions between particular combinations of
pitch, timbre, and azimuth. These interactions were most pronounced for pitch and timbre and were less commonly observed between
spatial and nonspatial attributes. Such nonlinearities were most prevalent in primary auditory cortex, although they tended to be small
compared with stimulus main effects.
Key words: auditory cortex; tuning; sound; spike trains; vocalization; localization; parallel; hearing
Introduction
One of the most important functions of the auditory system is to
identify and discriminate vocal calls, such as speech sounds. This
task requires the listener to process several complex perceptual
properties of a single auditory object, and so is likely to engage a
number of functionally distinct cortical areas in parallel. To iden-
tify a spoken vowel, for example, the auditory system must deter-
mine the positions of formant peaks in the spectral envelope of
the vowel sound (Peterson and Barney, 1952). Vowel discrimina-
tion is therefore a timbre discrimination task. Meanwhile, in ad-
dition to its timbre, the pitch of a spoken vowel can convey in-
formation about the speaker’s identity (Gelfer and Mikos, 2005)
and emotional state (Fuller and Lloyd, 1992; Reissland et al.,
2003), and so the periodicity of the vowel must also be analyzed.
Finally, localization of the speaker requires processing binaural
disparity cues and monaural spectral cues that are independent of
the timbre and pitch of the vowel. Because many other species
generate vocalizations in an entirely analogous manner, process-
ing the pitch, timbre, and location of vowel-like sounds is an
important task for the mammalian auditory system in general.
Building on earlier studies in the visual system (Mishkin and
Ungerleider, 1982; Goodale and Milner, 1992), it is widely
thought that a separation of function exists within higher-order
auditory processing streams, such that more posterior, or dorsal,
cortical areas mediate sound localization, whereas more anterior,
or ventral, areas are responsible for object identification (Raus-
checker et al., 1997; Romanski et al., 1999; Kaas and Hackett,
2000; Alain et al., 2001; Maeder et al., 2001; Tian et al., 2001;
Warren and Griffiths, 2003; Barrett and Hall, 2006; Lomber and
Malhotra, 2008). Consequently, we might expect the pitch and
the timbre of a complex auditory stimulus to be represented in a
separate region from its spatial location. However, it is not un-
common for listeners to find themselves in cluttered acoustic
environments, where the pitch, timbre, and spatial location of
several sound sources may have to be tracked simultaneously.
Separating the neural processing of different perceptual at-
tributes could make this task harder, by creating a sort of “bind-
ing problem.”
Previous work has focused almost exclusively on differences
across cortical areas in the representation of just one parameter,
such as sound-source location (Recanzone, 2000; Stecker et al.,
2005; Harrington et al., 2008) or pitch (Bendor and Wang, 2005).
The extent to which these attributes are encoded independently
has not previously been investigated. Here we used “artificial
vowel” sounds to investigate how pitch (as determined by the
Received Oct. 2, 2008; revised Jan. 9, 2009; accepted Jan. 13, 2009.
This work was supported by the Biotechnology and Biological Sciences Research Council (Grants BB/D009758/1
to J.W.H.S., A.J.K., and J.K.B.), the Engineering and Physical Sciences Research Council (Grant EP/C010841/1 to
J.W.H.S.), a Rothermere Fellowship and Hector Pilling Scholarship to K.M.M.W., and a Wellcome Trust Principal
Research Fellowship to A.J.K. We are grateful to Israel Nelken for valuable discussion and comments on this
manuscript.
Correspondence should be addressed to Dr. Jennifer K. Bizley, Department of Physiology, Anatomy, and Genetics,
Sherrington Building, University of Oxford, Parks Road, Oxford OX1 3PT, UK. E-mail: jennifer.bizley@dpag.ox.ac.uk.
DOI:10.1523/JNEUROSCI.4755-08.2009
Copyright © 2009 Society for Neuroscience 0270-6474/09/292064-12$15.00/0
2064 The Journal of Neuroscience, February 18, 2009 29(7):2064 –2075
pulse rate), timbre (as determined by formant filter frequencies),
and location are encoded within and across five identified areas of
the auditory cortex of the ferret. Our aim was to determine the
degree to which these perceptual attributes are represented in a
mutually independent manner in both primary and secondary
cortical fields, and to look for evidence for feature specialization
across these fields.
Materials and Methods
Animal preparation. All animal procedures were approved by the local
ethical review committee and performed under license from the UK
Home Office in accordance with the Animal (Scientific Procedures) Act
1986. Five adult, female, pigmented ferrets (Mustela putorius) were used
in this study. All animals received regular otoscopic examinations before
the experiment, to ensure that both ears were clean and disease free.
Anesthesia was induced by a single dose of a mixture of medetomidine
(Domitor; 0.022 mg/kg/h; Pfizer) and ketamine (Ketaset; 5 mg/kg/h; Fort
Dodge Animal Health). The left radial vein was cannulated and a contin-
uous infusion (5 ml/h) of a mixture of medetomidine and ketamine in
physiological saline containing 5% glucose was provided throughout the
experiment. The ferrets also received a single, subcutaneous, dose of 0.06
mg/kg/h atropine sulfate (C-Vet Veterinary Products) and, every 12 h,
subcutaneous doses of 0.5 mg/kg dexamethasone (Dexadreson; Intervet
UK) to reduce bronchial secretions and cerebral edema, respectively. The
ferret was intubated, placed on a ventilator (7025 respirator; Ugo Basile)
and supplemented with oxygen. Body temperature, end-tidal CO₂, and
the electrocardiogram (ECG) were monitored throughout the experi-
ment. Experiments typically lasted between 36 and 60 h.
The animal was placed in a stereotaxic frame and the temporal muscles
on both sides were retracted to expose the dorsal and lateral parts of the
skull. A metal bar was cemented and screwed into the right side of the
skull, holding the head without further need of a stereotaxic frame. On
the left side, the temporal muscle was largely removed, and the suprasyl-
vian and pseudosylvian sulci were exposed by a craniotomy, exposing
auditory cortex (Fig. 1) (Kelly et al., 1986). The dura was removed and
the cortex covered with silicon oil. The animal was then transferred to a
small table in an anechoic chamber (IAC).
Stimuli. Sounds were generated using TDT system 3 hardware
(Tucker-Davis Technologies) and MATLAB (MathWorks), and pre-
sented through customized Panasonic RPHV297 headphone drivers.
Closed-field calibrations were performed using a one-eighth inch con-
denser microphone (Brüel and Kjær), placed at the end of a model ferret
ear canal, to create an inverse filter that ensured the driver produced a flat
(<±5 dB) output.
Pure tone stimuli were used to obtain frequency response areas
(FRAs), both to characterize individual units and to determine tonotopic
gradients, so as to confirm the cortical field in which any given recording
was made. The tones used ranged, in 1/3-octave steps, from 200 Hz to 24
kHz, and were 100 ms in duration (5 ms cosine ramped). Intensities
ranged from 10 to 80 dB SPL in 10 dB increments. Each frequency-level
combination was presented pseudorandomly at least 3 times, at a rate of
one per second. Artificial vowel stimuli were created in MATLAB, using
an algorithm adapted from Malcolm Slaney’s Auditory Toolbox (http://
cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/). Click trains
with a duration of 150 ms and a repetition rate corresponding to the
desired fundamental frequency were passed through a cascade of four
bandpass filters to impart spectral peaks at the desired formant frequen-
cies. The vowel sounds were normalized to have equal root-mean-square
amplitudes, and calibrations were performed using a one-eighth inch
condenser microphone (Brüel and Kjær) to ensure that changes in pitch
or timbre did not influence the overall sound pressure level. Virtual
acoustic space (VAS) techniques were then used to add sound-source
direction cues to the artificial vowel sounds. A series of measurements
including head size, sex, body weight, and pinna size, were taken from
each ferret to select the best match from our extensive library of ferret
head-related transfer function recordings. We have shown previously
that ferret spectral localization cue values scale with the size of the head
and external ears (Schnupp et al., 2003). Sound-source direction cues
were generated by convolving the artificial vowel sounds with minimum
phase filters that imparted the appropriate interaural level differences
and spectral cues corresponding to a particular direction in the horizon-
tal plane, and which at the same time equalized out any differences in the
headphone transfer functions that had been revealed during headphone
calibration. Small delays were then introduced in the sound waveforms
to generate appropriate interaural time differences.
We presented sounds from four virtual sound-source directions
(−45°, −15°, 15°, and 45° azimuth, at 0° elevation) and used four sound
pitches with F0 equal to 200, 336, 565, and 951 Hz. Four timbres were
chosen: /a/ with formant frequencies F1–F4 at 936, 1551, 2815, and 4290
Hz; /ε/ with formant frequencies at 730, 2058, 2979, and 4294 Hz; /u/
with formant frequencies at 460, 1105, 2735, and 4115 Hz; and /i/ with
formant frequencies at 437, 2761, 3372, and 4352 Hz. The permutation of
these 4 pitches by 4 timbres by 4 source directions gave us a stimulus set
of 64 sounds.
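The vowel-synthesis procedure described above (a click train at the desired F0, a cascade of four band-pass filters at the formant frequencies, RMS equalization) can be sketched as follows. This is a minimal Python/NumPy illustration rather than the authors' MATLAB code; the sample rate, filter bandwidth, and the two-pole resonator design are our assumptions.

```python
import numpy as np
from scipy.signal import lfilter

FS = 48000  # sample rate in Hz (an assumption; not stated in the text)

def resonator(x, fc, bw, fs):
    """Two-pole resonant band-pass filter centred on formant frequency fc."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * fc / fs
    a = [1.0, -2 * r * np.cos(theta), r * r]
    return lfilter([1.0 - r], a, x)

def artificial_vowel(f0, formants, dur=0.150, fs=FS, bw=120.0):
    """150 ms click train at f0, passed through a cascade of four
    band-pass filters to impart formant peaks, then RMS-normalized."""
    x = np.zeros(int(dur * fs))
    x[::int(round(fs / f0))] = 1.0          # one click per pitch period
    for fc in formants:
        x = resonator(x, fc, bw, fs)        # cascade of formant filters
    return x / np.sqrt(np.mean(x ** 2))     # equal RMS across stimuli

# the four pitches and formant sets given in the text
PITCHES = [200, 336, 565, 951]
FORMANTS = {"a": [936, 1551, 2815, 4290], "eh": [730, 2058, 2979, 4294],
            "u": [460, 1105, 2735, 4115], "i": [437, 2761, 3372, 4352]}
```

Combined with the four virtual azimuths, these 4 pitches × 4 timbres yield the 64-stimulus set; the spatialization step (HRTF-derived minimum-phase filtering plus interaural delays) would be applied to each vowel afterwards.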
Data acquisition. Recordings were made with silicon probe electrodes
(Neuronexus Technologies). In two animals, we used electrodes with an
8 × 4 configuration (8 active sites on 4 parallel probes, with a vertical
spacing of 150 μm). In a small number of recordings in one of these
animals, and in another animal, we used electrodes with a 16 × 2 config-
uration (16 active sites spaced at 100 μm intervals on each of two probes).
In the final two animals, electrodes with 4 × 4 and 16 × 1 configurations
were used (100–150 μm spacing of active sites on each probe). The
electrodes were positioned so that they entered the cortex approximately
orthogonal to the surface of the ectosylvian gyrus. A photographic record
was made of each electrode penetration to allow later reconstruction of
the location of each recording site relative to anatomical landmarks (sur-
face blood vessels, sulcal patterns), to allow us to construct functional
maps of the auditory cortex.
The neuronal recordings were bandpass filtered (500 Hz to 5 kHz),
amplified (up to 20,000 times), and digitized at 25 kHz. Data acquisition
and stimulus generation were performed using BrainWare (Tucker-
Davis Technologies).
Data analysis. Spike sorting was performed off-line. Single units were
isolated from the digitized signal either by manually clustering data ac-
cording to spike features such as amplitude, width, and area, or by using
an automated k-means clustering algorithm, in which the voltage poten-
tial at 7 points across the duration of the spike window served as vari-
ables. We also inspected auto-correlation histograms, and only cases in
which the interspike-interval histograms revealed a clear refractory pe-
riod were classed as single units.
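The automated clustering step, k-means with the voltage at 7 points across the spike window serving as variables, can be sketched schematically. This is a Python sketch on synthetic snippets; the farthest-point initialization and all data here are our own illustrative choices, not details from the paper.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means on rows of X (here: 7 voltage samples per spike)."""
    centers = X[[0]].astype(float)           # farthest-point initialization
    for _ in range(1, k):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, X[np.argmax(d)]])
    for _ in range(n_iter):                  # standard Lloyd iterations
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.vstack([X[labels == j].mean(0) if np.any(labels == j)
                             else centers[j] for j in range(k)])
    return labels

# synthetic example: two distinct waveform shapes, 7 samples each
rng = np.random.default_rng(0)
template = np.linspace(0.0, 1.0, 7)
snips = np.vstack([template + rng.normal(0, 0.1, (100, 7)),
                   -template + rng.normal(0, 0.1, (100, 7))])
labels = kmeans(snips, k=2)
```

Note that in the experiment, clusters found in this way were only accepted as single units if the interspike-interval histogram additionally showed a clear refractory period.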
Composite map generation. Analysis of responses to vowel stimuli was
performed blind to animal number or the position of the electrode pen-
etration relative to anatomical landmarks. Before examining how the
responses to vowel stimuli varied as a result of their cortical location, each
penetration was first assigned to a cortical field on the basis of the re-
sponses of units to simple stimuli. This was done by measuring pure-tone
FRAs at all recording sites, and comparing these to previously docu-
mented physiological criteria for each of the fields that have been char-
acterized in ferret auditory cortex (Bizley et al., 2005). Penetrations were
assigned to a given cortical field according to the characteristic frequency
(CF) and tuning properties derived from the FRA and the latency and
duration of the response, together with photographs recording the loca-
tion of the electrode penetrations on the cortical surface and the overall
frequency organization obtained for each animal.
Having established the recording locations within each individual an-
imal and noted that there were consistent trends in the responses to
vowel sounds between cortical fields across animals, we established a
“composite” auditory cortex map. Field boundaries were determined on
the basis of the responses to pure tones and noise bursts, but blind to the
responses to vowel sounds. This approach has previously been used to
investigate the representation of multisensory responses in ferret audi-
tory cortex (Bizley and King, 2008). To create the composite map, the
penetration locations and cortical field boundaries for each individual
animal were projected onto a single animal frequency map derived using
optical imaging of intrinsic signals. This map was a representative exam-
ple taken from Nelken et al. (2004). This procedure was performed sep-
arately for each animal. Morphing each animal’s cortical map onto a
Figure 1. Stimuli and example responses. A, Frequency spectra for 16 of the artificial vowel stimuli used in this experiment. The four timbres used (corresponding to the vowels /a/, /ε/, /u/, and /i/) are shown in different columns, with the four pitches (F0 of 200, 336, 565, and 951 Hz) shown in different rows. Each of these stimuli was also presented at one of four different virtual sound directions (−45°, −15°, 15°, and 45° azimuth). B, C, Spike raster plots from two different cortical neurons in response to all 64 stimulus combinations. In each case, the same data are replotted three times, organized either according to stimulus azimuth (Az), pitch (F0), or timbre (vowel, vID). Both of these units produced responses that were clearly dependent on at least two of the three stimulus dimensions.
single example in this way allowed the data from each animal to be
superimposed in a bias-free manner.
Results
Responses to stimulus pitch, timbre, and location
We used a set of 64 artificial vowel sounds, comprising all possible
combinations of four spatial locations, four pitches, and four
timbres. The parameter values were chosen to be quite widely
spaced along each of the three dimensions. VAS stimuli were
presented at −45°, −15°, 15°, and 45° azimuth at 0° elevation,
with negative azimuths denoting locations to the animal’s right,
contralateral to the recording sites. Fundamental frequencies of
200, 336, 565, and 951 Hz were used, and the four timbres corre-
sponded to the vowels: /a/, /ε/, /u/, and /i/. These parameter
ranges were chosen to make the stimuli easily discriminable along
each perceptual dimension, both for human listeners, and, as far
as we know from available psychoacoustical data, also for our
animal model species, the ferret. The azimuth spacing of 30°
corresponds approximately to two to three ferret behavioral just
noticeable difference (JND) limens (Parsons et al., 1999). The
perceptual distance of the pitch steps used here (0.75 octaves) is
similarly approximately twice as wide as the ferrets’ JNDs
(Walker et al., 2009). The ferrets’ ability to discriminate the spec-
tral envelopes associated with the four different vowels has not
yet been formally investigated, but preliminary experiments have
demonstrated that ferrets rapidly learn to discriminate the iden-
tity of these vowel sounds and do so across at least a two octave
range of pitches (J. K. Bizley, K. M. Walker, A. J. King, and J. W. H.
Schnupp, unpublished observations).
Each artificial vowel stimulus was 150 ms long. Figure 1A
shows the frequency spectra for the 16 possible combinations of
pitch and timbre. This illustrates that both timbre and pitch
changes affect the spectral envelope of the sounds, and presenting
these sounds from different virtual directions can introduce fur-
ther changes in the spectral envelope. Yet although changes in
location, pitch, and timbre all affect the sound spectrum, the
perceptual consequences of these changes are quite distinct. If the
perceptual distinction between pitch, timbre, and location is re-
flected at the level of neuronal discharges in auditory cortex, then
this stimulus set ought to reveal this separation according to per-
ceptual categories.
Extracellular recordings were performed using multisite sili-
con electrodes in anesthetized ferrets. We sampled 900 acous-
tically sensitive recording sites. At 615 of these, we were able to
obtain stable recordings of neural responses to 30–40 presenta-
tions of each of the 64 artificial vowel stimuli, which were pre-
sented in a randomly interleaved order. Three hundred twenty-
four recordings were from single units and 292 were small
clusters of units. Because we were unable to find any systematic
difference in the response properties of single units and small unit
clusters, the term “unit” will be used to refer to both groups.
We observed a rich variety of response types across units, and
responses were frequently clearly modulated by more than one,
and often all three, stimulus dimensions. Figure 1, B and C, shows
the responses from two different units. In each case, the three
panels show the same data plotted three times, with the 64 stimuli
ordered into groups of 16 with a common azimuth, pitch, or
timbre (first, second, and third panels, respectively). These exam-
ples illustrate a common finding: tuning to stimulus pitch, tim-
bre, or azimuth alone did not adequately describe the responses
of these units and their responses could not be captured satisfac-
torily as a single spike count value. Rather, neurons often showed
a degree of sensitivity to each parameter (e.g., Fig. 1B), and/or to
particular combinations of parameters (Fig. 1C) in a time-
dependent manner. Further examples are illustrated in supple-
mental Figure 1, available at www.jneurosci.org as supplemental
material.
To examine these stimulus effects, we constructed post stim-
ulus time histogram (PSTH) matrices, in which data were sorted
according to two of the three stimulus parameters and pooled
across the third. Figure 2, A and C, show such PSTH matrices for
the unit illustrated in Figure 1B. The 16 panels to the top left of
Figure 2A show the data arranged according to all 16 timbre ×
pitch combinations. The first four columns show the PSTH for
the responses to timbres corresponding to /i/, /u/, /ε/, and /a/,
whereas the top four rows show the data for pitches at 200, 336,
565, and 951 Hz, respectively. The rightmost column in Figure
2A shows the mean response for each pitch, averaged across all
timbres and azimuths, whereas the bottom row shows the mean
response for each timbre, and the bottom right PSTH illustrates
the grand average response across all stimuli.
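The PSTH-matrix layout just described can be reproduced schematically: average over repeats and over the third stimulus dimension, then append a marginal row and column of main-effect averages and the grand-average PSTH. A Python sketch, assuming a hypothetical array layout of pitch × timbre × azimuth × repeat × time bin:

```python
import numpy as np

def psth_matrix(counts, dims=(0, 1)):
    """counts: spike counts, shape (4, 4, 4, n_reps, n_bins) for the
    4 pitches x 4 timbres x 4 azimuths.  Returns a 5 x 5 grid of PSTHs:
    a 4 x 4 body for the two dimensions in `dims` (must be sorted),
    a marginal column and row of main-effect averages, and the
    grand-average PSTH in the bottom-right corner."""
    other = [d for d in range(3) if d not in dims][0]
    m = counts.mean(axis=3).mean(axis=other)        # -> (4, 4, n_bins)
    grid = np.zeros((5, 5, counts.shape[-1]))
    grid[:4, :4] = m
    grid[:4, 4] = m.mean(axis=1)                    # main effect of dims[0]
    grid[4, :4] = m.mean(axis=0)                    # main effect of dims[1]
    grid[4, 4] = m.mean(axis=(0, 1))                # grand-average PSTH
    return grid
```

The blue/red underlay for a marginal PSTH in Figure 2 then corresponds to its difference from the grand average, e.g. grid[i, 4] − grid[4, 4].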
Displaying the data in this manner makes it easier to appreci-
ate the effect of varying either stimulus pitch or timbre. To de-
scribe these effects, we shall adopt the terminology used in
ANOVA-type linear statistical models, with mean spike rate dur-
ing some small time interval as our “response variable,” whereas
pitch, timbre, location, and poststimulus time serve as “explana-
tory variables.” We treat these as categorical variates, as we can-
not assume the relationship between stimulus parameter value
and spike rate to be linear or even monotonic. Within that con-
ceptual framework, comparing the top four panels of the right-
most column with the bottom right panel therefore reveals the
“main effect” of varying pitch on the discharge pattern on this
unit. To make it easier to visualize the main effects of each stim-
ulus parameter, we plot the individual PSTHs in the rightmost
column and bottom row on top of a color scale that shows how
each particular PSTH differs from the grand average at each time
bin. Red means the neural firing rate is, at that time point, larger
than average, blue indicates that it is below average, and the sat-
uration of the color encodes the size of the difference.
The grand average PSTH in the bottom right panel in Figure
2A shows that the unit responded to artificial vowel sounds with
an initial increase in firing rate, which peaked at a rate of ∼50 Hz
at ∼50 ms post stimulus onset, followed by a smaller second peak
at ∼180 ms. The main effect of presenting a relatively low pitch
(200 Hz, top panel of the last column) was to decrease the size of
the first peak in the PSTH and to increase that of the second.
Conversely, the main effect of high pitches (951 Hz, fourth panel
in the last column) was to increase the size of the first response
peak and to decrease the second. Similarly, the main effect of
varying timbre can be appreciated by comparing the panels in the
bottom row of the PSTH matrix in Figure 2A. A timbre corre-
sponding to the vowel /u/ strongly enhanced the initial response
peak (bottom row, second panel), whereas the timbre for /i/ sup-
pressed it (bottom row, first panel), but timbre changes did not
affect the later part of the response. This unit was therefore sen-
sitive to both pitch and timbre, and the effects of changing pitch
or timbre were manifest at different latencies after stimulus onset.
In the conceptual framework of an ANOVA-style analysis, the
simplest assumption for responses to a particular pitch/timbre
combination would be that the main effects of pitch and timbre
might be additive. To look for nonlinear interactions between the
stimulus dimensions, we compare the PSTHs in the main body of
the matrix against the values that would be predicted from the
linear sum of the main effects, which are shown by the color scales
in the rightmost column and bottom row of Figure 2A–D. The
color scales in the main body (first four rows and columns) of the
PSTH matrix show these “two-way interactions.” Therefore any
deviation from white shows that the response observed was non-
linear, with red colors indicating a supra-additive response, and
blue colors indicating a subadditive one. For example, in the unit
shown in Figure 2A, the combination of /i/ and a 200 Hz F0
elicited a supralinear response, whereas the combination of /i/
and 951 Hz F0 resulted in a response that was smaller than the
linear prediction. However, examination of the absolute values
for the interactions and main effects shows that the size of the
two-stimulus interactions were small relative to the “main ef-
fects” of any one stimulus parameter; the interaction coefficients
did not exceed 33 spikes/s compared with 86 spikes/s for the
largest main effect.
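The additive prediction and two-way interaction terms described above can be computed directly; a sketch, assuming the PSTHs have already been averaged over repeats and over the third stimulus dimension:

```python
import numpy as np

def interaction_terms(m):
    """m: PSTH array of shape (4, 4, n_bins) for one pair of stimulus
    dimensions (e.g. pitch x timbre).  Returns each cell's deviation from
    the additive prediction grand + row main effect + column main effect:
    positive values are supra-additive, negative values subadditive."""
    grand = m.mean(axis=(0, 1), keepdims=True)
    row = m.mean(axis=1, keepdims=True) - grand     # main effect, dim 0
    col = m.mean(axis=0, keepdims=True) - grand     # main effect, dim 1
    return m - (grand + row + col)                  # linear prediction
```

For a purely additive unit these terms are all zero; for the unit in Figure 2A they remain below 33 spikes/s while the largest main effect reaches 86 spikes/s.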
Figure 2C shows the azimuth-by-pitch main effects and inter-
actions for the same unit, whereas Figure 2, B and D, shows the
pitch-by-timbre and pitch-by-azimuth main effects and interac-
tions, respectively, for a second sample unit (the same as that
illustrated in Fig. 1C). Although this second sample unit exhib-
ited rather different temporal discharge patterns, like the first, it
was clearly influenced by more than one stimulus dimension. As
we shall see further below, the data shown in Figure 2 were fairly
typical of many of the units recorded throughout all cortical areas
characterized. Thus, most units were sensitive to more than one
stimulus dimension, their firing patterns could change at various
times post-stimulus onset, and nonadditive interactions between
stimuli were not uncommon.
To quantify the strength and significance of these main effects
and interactions, we performed a 4-way ANOVA on the spike
counts, averaged across the 30–40 repeat presentations for each
Figure 2. Main effects and interactions of pitch, timbre, and azimuth. A–D, PSTH matrices illustrating the main effects and 2-way interactions of timbre, pitch, and azimuth on the responses of two cortical neurons. Data are sorted according to two of the three stimulus dimensions, as indicated on the left and top margins of each panel, and averaged across the third. Each PSTH shows the mean firing rate post stimulus onset in Hz. The effects of stimulus pitch and timbre (vowel identity) are plotted in A and B and the effects of stimulus azimuth and pitch are shown in C and D. A, C, Data from the neuron whose responses are shown in Figure 1B. B, D, Data from the neuron shown in Figure 1C. The first four rows and columns of each PSTH matrix represent the responses to combinations of the two stimulus parameters indicated. The cells at the end of each row and column show the average response to the stimulus parameter indicated by the corresponding row and column headers. For example, the cell at the end of the first row in A and B shows the mean response to a pitch of 200 Hz, regardless of timbre or location. The bottom right-hand PSTH in each matrix shows the overall grand average PSTH, constructed across all 64 stimulus conditions. The color scale underlay in each PSTH highlights the difference between it and the grand average PSTH, with blue indicating a decrease in firing relative to the average and red showing an increase. The color scales for the first four rows and columns in each of the four panels saturate at ±33 spikes/s in A, ±32 in B, ±11 in C, and ±8 in D. The color scales in the panels in the last row and bottom column saturate at ±60 spikes/s in A and C, and ±32 in B and D.
of the 64 stimuli, in each 20 ms bin for the first 300 ms after
stimulus onset. In this manner, each response was represented as
a vector of 15 sequential spike counts. Our choice of 20 ms bin
widths was based on previous studies of ferret auditory cortex,
which indicate that this is likely to be a suitable temporal resolu-
tion for decoding neural responses (Schnupp et al., 2006; Walker
et al., 2008). In this ANOVA, the 3 stimulus parameters (azimuth,
pitch, and timbre) plus the time bin served as factors. To quantify
the relative strength with which one of the three stimulus dimen-
sions influenced the firing of a particular unit, we calculated the
proportion of variance explained by each of azimuth, pitch, and
timbre, Var_stim, as:

Var_stim = (SS_stim×bin − SS_error · df_stim×bin) / (SS_total − SS_bin),    (1)
where “stim” refers to the stimulus parameter of interest (pitch, timbre, or azimuth), SS_stim×bin is the sum of squares for the interaction of the stimulus parameter and time bin, SS_error is the sum of squares of the error term, df_stim×bin refers to the degrees of freedom for the stimulus × time bin interaction, SS_total is the total sum of squares, and SS_bin is the sum of squares for the time bin factor. A significant SS_bin reflects the fact that the response rate was not flat over the duration of the 300 ms response window.
This is in itself unsurprising, but by examining the stimulus-by-
time-bin interactions, we were able to test the statistical signifi-
cance of the influence a given stimulus parameter had, not just on
the overall spike rate, but also on the temporal discharge pattern
of the response. Stimulus-by-time-bin interactions were com-
mon, and revealed how a particular stimulus parameter influ-
enced the shape of the PSTH. Subtracting SS_error · df_stim×bin from the SS_stim×bin term allows us to calculate the proportion of
response variance attributable to each of the stimuli, taking into
account the additional variance explained simply by adding extra
parameters to the model. For the responses shown in Figure 2, A
and C, the percentage of variance explained by the stimulus main
effects was 5% for azimuth, 17% for pitch and 56% for timbre,
whereas for the unit shown in Figure 2, B and D, the main effects
of azimuth, pitch, and timbre accounted for 3%, 19%, and 7%,
respectively, of the variance in the neural discharge patterns.
Thus, although both units were significantly influenced by all 3
stimulus dimensions, one might justifiably describe the first as
being “predominantly” sensitive to timbre and the second to
pitch. Only 23% of units were significantly (p < 0.001) modulated by azimuth, pitch, or timbre alone. In contrast, 36% of
neural responses were dependent on two of the three stimulus
dimensions and 29% of units were influenced by all three. The
responses of the remaining 12% of units were not significantly
modulated by any of the stimuli.
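The response representation and the variance decomposition of Equation 1 can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the sums of squares would in practice come from a four-way ANOVA (azimuth × pitch × timbre × time bin), and the numeric values in the usage lines are made up.

```python
import numpy as np

def bin_spikes(spike_times_s, bin_width_s=0.02, window_s=0.3):
    """Represent one response as a vector of sequential spike counts:
    20 ms bins over the first 300 ms after stimulus onset -> 15 counts."""
    n_bins = int(round(window_s / bin_width_s))
    edges = np.arange(n_bins + 1) * bin_width_s
    counts, _ = np.histogram(spike_times_s, bins=edges)
    return counts

def variance_explained(ss_stim_bin, ss_error, df_stim_bin, ss_total, ss_bin):
    """Equation 1: proportion of response variance attributable to one
    stimulus dimension, discounting the variance explained merely by
    adding extra parameters to the model."""
    return (ss_stim_bin - ss_error * df_stim_bin) / (ss_total - ss_bin)

# One response: spike times (s) relative to stimulus onset.
counts = bin_spikes([0.005, 0.015, 0.05, 0.29])

# Hypothetical ANOVA sums of squares for one unit's pitch-by-bin interaction.
var_pitch = variance_explained(ss_stim_bin=40.0, ss_error=0.1,
                               df_stim_bin=45, ss_total=100.0, ss_bin=20.0)
```

With these toy values, `var_pitch` is the fraction of (time-corrected) response variance attributed to pitch, directly comparable to the percentages quoted in the text.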
As mentioned above, we combined data from single units and
from small multiunit clusters for most analyses. To verify that any
joint sensitivity was not a result of recording from more than one
neuron, we determined the proportion of units sensitive to com-
binations of two parameters for single units alone. Sensitivity to
both pitch and timbre was observed in 56% of all recording sites
and in 52% (169 of 324) of the single units. Thirty percent of all
recordings and 33% of all single units were sensitive to both pitch
and azimuth, whereas 34% of all recordings and 36% of single
units were sensitive to timbre and azimuth. Thus, we were equally
likely to observe combination sensitivity in both multiunits and
well separated single units.
Spike count measures do not fully capture
response complexity
To examine the importance of the temporal discharge pattern in
the neural response, we also performed an ANOVA that was
restricted to the overall spike counts calculated over the first 75
ms after stimulus onset. This ANOVA was performed again using
pitch, azimuth, and timbre as independent variables, but this
time excluded poststimulus time. It resulted in far fewer units
exhibiting significant response modulation as a function of these
stimulus attributes. For example, using a single spike count measure, only 11% of all units exhibited significant (p < 0.05) sensitivity to sound azimuth, although 36% of units had shown a significant (p < 0.001) time-bin × azimuth interaction. The
same comparison for pitch and timbre yielded values of 18%
compared with 66% for pitch, and 29% compared with 73% for
timbre. Over this single time window of 75 ms, the discharge rates
of 60% of units no longer exhibited any significant stimulus main
effects, and only 13% of units were modulated by more than one
parameter. Analyzed in this manner, the unit shown in Figure
2A, for example, was found to be sensitive only to timbre,
whereas that depicted in Figure 2B was no longer sensitive to any
of the three stimulus dimensions.
When we performed the analysis using a longer response win-
dow (300 ms, data not shown) there were even fewer neurons
whose responses were significantly modulated by these stimuli,
and neither unit shown in Figure 1 was found to be sensitive to
pitch, azimuth, or timbre. Figure 2A shows that, for one of these
units, the early and the late part of the response varied in opposite
ways with stimulus pitch. The resulting “cancellation” of stimulus effects is likely to be the main reason why using an inappropriately wide analysis window failed to give a significant result
despite clear stimulus dependence. These results clearly demon-
strate that highly significant but transient stimulus effects can
easily be missed in an analysis that is temporally too coarse-
grained. We therefore adopted the response variance explained
statistics described in Equation 1 as our preferred measure of
neural stimulus sensitivity for the further analyses described
below.
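The cancellation effect can be illustrated with synthetic PSTHs: two conditions whose temporal discharge patterns differ clearly but whose overall spike counts are identical, so a single spike-count measure sees no difference at all (toy numbers, not the recorded data).

```python
import numpy as np

# Spikes per 20 ms bin over 300 ms (15 bins) for two hypothetical conditions:
# the early response rises and the late response falls by the same amount.
psth_a = np.array([1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1])
psth_b = np.array([3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

total_diff = abs(psth_a.sum() - psth_b.sum())   # spike-count measure: blind
pattern_diff = np.abs(psth_a - psth_b).sum()    # temporal measure: sees it
```

Here `total_diff` is zero even though the two temporal patterns differ in 10 of 15 bins, which is exactly why the time-binned ANOVA detects effects a coarse window misses.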
Cortical distribution of sensitivity to pitch, timbre,
and azimuth
To examine the distribution of stimulus sensitivity across audi-
tory cortex, recordings were made in 5 of the 7 previously iden-
tified acoustically responsive areas in the ferret ectosylvian gyrus:
the primary and anterior auditory fields (A1 and AAF), the tono-
topically organized posterior pseudosylvian and posterior supra-
sylvian fields (PPF and PSF), which are located on the posterior
bank of the ectosylvian gyrus, and the nontonotopic anterior
dorsal field (ADF) on the anterior bank (Fig. 3A, Table 1) (Ko-
walski et al., 1995; Nelken et al., 2004; Bizley et al., 2005). These
five areas make up the auditory “core” (A1 and AAF), and “belt”
(PPF, PSF, ADF) areas in this species. We have previously re-
ported that 60% of neurons in the anterior ventral field (AVF)
are visually sensitive (Bizley et al., 2007). AVF and the ventropos-
terior (VP) area are likely to be “parabelt” areas and were not
included in the present study. In four of five animals, recordings
were made in all five cortical fields and, in the remaining animal,
responses were recorded in three fields.
The locations of each field were determined for individual
animals as described in the Materials and Methods. Because we
used multisite electrode arrays and commonly recorded several
units at different depths, the range of CFs obtained in these cases
was visualized by plotting one Voronoi tile for each unit recorded, arranged in a circular manner
around the penetration site, with the most
deeply recorded unit shown rightmost and
proceeding clockwise from deep to super-
ficial. The composite frequency map ob-
tained in this manner is shown in Figure
3B. The low frequency regions that mark
the boundaries of fields AAF and ADF and
of A1, PPF, and PSF are readily apparent.
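A composite map of this kind can be sketched as a brute-force Voronoi raster: each pixel of the cortical-surface map takes the value (e.g., mean proportion of variance explained) of its nearest penetration site. The coordinates and values below are synthetic stand-ins, not the recorded data; `scipy.spatial.Voronoi` would give the polygon version.

```python
import numpy as np

# Hypothetical penetration coordinates on the cortical surface (mm).
sites = np.array([[1.0, 1.5], [3.2, 0.8], [5.5, 1.2], [8.0, 1.9],
                  [1.4, 4.2], [4.1, 4.8], [6.6, 3.9], [9.0, 5.1],
                  [2.2, 7.5], [4.9, 8.3], [7.3, 7.1], [9.4, 9.0]])
# Hypothetical per-site values, e.g., mean variance explained by azimuth.
values = np.linspace(0.02, 0.6, len(sites))

# Rasterized Voronoi tessellation: assign each pixel to its nearest site.
xs, ys = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 10, 200))
pixels = np.column_stack([xs.ravel(), ys.ravel()])
d2 = ((pixels[:, None, :] - sites[None, :, :]) ** 2).sum(axis=-1)
nearest = d2.argmin(axis=1)                    # tile index per pixel
tile_map = values[nearest].reshape(xs.shape)   # ready for imshow with a heat colormap
```

Each contiguous region of `tile_map` is one tile of the tessellation, colored by its site's value, as in Figure 3C–E.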
To investigate the anatomical distribu-
tion of sensitivity to pitch, timbre, and lo-
cation, we generated composite feature
sensitivity maps using methods described
further below. Figure 3C–E maps the pro-
portion of variance explained (Eq. 1) by
the main effects of azimuth, pitch, and
timbre, respectively, for every penetration
made, onto the surface of the auditory cor-
tex. Each “tile” in the Voronoi tesselation
shows the average value obtained for all
units recorded at that site. The color scale
indicates the proportion of variance ex-
plained with darker, red colors indicating
low, and brighter, yellow colors indicating
high values. These plots suggest that there
are areas in which clusters of units have a
higher sensitivity to each stimulus param-
eter. Highly azimuth sensitive units were
particularly common in A1, as well as in an
area encircling the tip of the pseudosylvian
sulcus. Highly pitch sensitive units were
most commonly found in the middle of
auditory cortex, around the point at which
the low frequency edges of the tonotopic
core and belt areas converge, as shown in
Figure 3B. Timbre sensitivity was highest
in the primary fields and along the low fre-
quency ridge that separates the two poste-
rior fields, PPF and PSF. To illustrate the
range of percentage variance explained in
any one electrode penetration, Figure
3F–H plots the values for every unit re-
corded, with each tile representing a single
recorded unit, and multiple units from a
single penetration arranged in a circular
manner, as in Figure 3B, around the site of
the penetration. Overall, Figure 3 suggests
that there are “clusters” of recording sites
that are relatively more sensitive to stimu-
lus azimuth, pitch, or timbre, but these are
not obviously restricted to a particular
subset of the five tonotopic cortical fields
investigated here, and, in each field, we observed considerable
unit-to-unit variability in the sensitivity to pitch, timbre, and
location.
The distribution of parameter sensitivity for each cortical area
was visualized in box-plot format (Fig. 3I–K). Despite the very
wide unit to unit variation, some degree of specialization is nev-
ertheless apparent, as, for all three stimulus dimensions, there
were significant differences in the proportion of variance ex-
plained by cortical field (Kruskal–Wallis test, χ² = 27, 68, and 77, respectively, p < 0.0001). Tukey–Kramer post hoc comparisons (p < 0.05) revealed that azimuth sensitivity was, on average,
significantly higher in A1 and PPF than in AAF, PSF, and ADF,
pitch sensitivity tended to be more pronounced in A1 and the
posterior fields PSF and PPF than in AAF and ADF, whereas AAF
showed the highest average level of timbre sensitivity. These
Figure 3. Distribution of relative sensitivity to location, pitch, and timbre across the auditory cortex. A, Location of ferret auditory cortex on the middle, anterior, and posterior ectosylvian gyri (MEG, AEG, and PEG, respectively). The inset shows the location of seven auditory cortical fields. The color scale shows the tonotopic organization as visualized using optical imaging of intrinsic signals (from Nelken et al., 2004). B, Voronoi tessellation map showing the characteristic frequencies (CFs) of all unit recordings made (n = 811). These data were collected from a total of five animals and have been compiled onto one auditory cortex map. Each tile of the tessellation shows the CF obtained from each recording site, using the same color scale as in A. C–E, Voronoi tessellation maps plotting the proportion of variance explained by each of the stimulus dimensions: azimuth (C), pitch (D), and timbre (E). Each tile represents the average value obtained at that penetration. All units included in the variance decomposition are shown (n = 619). F–H, As C–E, but here each individual unit is plotted, with tiles representing units from a single penetration arranged counterclockwise by depth around the penetration site. I–K, Box-plots showing the proportion of variance explained by azimuth (I), pitch (J), and timbre (K) for each of the five cortical areas examined. The boxes show the upper and lower quartile values, and the horizontal lines at their “waist” indicate the median. In all cases, there was a significant effect of cortical field on the distribution of variance values (Kruskal–Wallis test, p < 0.001), and significant pairwise differences are indicated by the horizontal lines above the plots (Tukey–Kramer post hoc test, p < 0.05).
Table 1. Total number of recordings (probe placements and units) in each cortical field for the 5 ferrets used in this study

                      A1    AAF   PPF   PSF   ADF
Probe placements (n)  10    7     11    7     5
Units (n)             189   101   152   96    77
trends were seen consistently in each of the animals, and not just
in the pooled data, and a jack-knife test was used to establish that
no one animal contributed disproportionately to the small but
significant differences reported here. The same statistical tests
(i.e., Kruskal–Wallis and Tukey–Kramer post hoc tests) were im-
plemented 5 times, excluding each of the five animals in turn. In
all cases, the same significant trends across cortical areas were
preserved.
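The leave-one-animal-out check can be sketched as follows, using a hand-rolled Kruskal–Wallis H statistic (no tie correction) compared against the χ² critical value for df = 4 at p = 0.001 (≈18.47). The per-animal values, field labels, and effect sizes are synthetic stand-ins, not the recorded data.

```python
import numpy as np

def kruskal_h(groups):
    """Kruskal-Wallis H statistic (assumes no tied values)."""
    data = np.concatenate(groups)
    n = data.size
    ranks = np.empty(n)
    ranks[np.argsort(data)] = np.arange(1, n + 1)  # rank all values jointly
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        h += len(g) * (r.mean() - (n + 1) / 2) ** 2
        start += len(g)
    return 12.0 / (n * (n + 1)) * h

rng = np.random.default_rng(1)
fields = ["A1", "AAF", "PPF", "PSF", "ADF"]
shift = {"A1": 0.10, "AAF": 0.0, "PPF": 0.08, "PSF": 0.06, "ADF": 0.0}
# Synthetic variance-explained values: 30 units per field per animal.
data = {(f, a): rng.uniform(0, 0.3, 30) + shift[f]
        for f in fields for a in range(5)}

CHI2_CRIT_DF4_P001 = 18.47  # chi-square upper 0.001 quantile, df = 4
# Jack-knife: rerun the across-field test excluding each animal in turn.
h_values = []
for excluded in range(5):
    groups = [np.concatenate([data[(f, a)] for a in range(5) if a != excluded])
              for f in fields]
    h_values.append(kruskal_h(groups))
```

If every one of the five `h_values` exceeds the critical value, the across-field trend does not depend on any single animal, which is the logic of the test described above.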
Linear and nonlinear interactions in the cortical sensitivity to
pitch, timbre, and azimuth
Our analysis method quantified the linear contribution of each of
the three stimulus dimensions to the units’ responses, as well as
the nonlinear effects of presenting particular stimulus combina-
tions. This nonlinear effect is quantified in our ANOVA by the
three-way interaction of the time bin factor with two different
stimulus parameters, and can be thought of as a measure of com-
bination sensitivity. The interaction coefficients measure nonad-
ditive (“multiplicative”) interactions between categorical vari-
ates, and are analogous to a logical AND operation (by how much does the response differ from the purely linear, additive expectation for specific parameter combinations, e.g., when timbre = /i/ AND azimuth = 45°?). Over two-thirds of units tested were sensitive to more than one stimulus dimension and 41% of all units showed significant (p < 0.001) nonlinear interactions for at least one combination of pitch × timbre, azimuth × pitch, or azimuth × timbre. We also investigated whether it would be possi-
ble to describe the nature of these nonlinear interactions either as
“predominantly expansive” or facilitatory, or as “predominantly
compressive” or saturating. Expansive nonlinearity would make
the response of a neuron more selective for a particular stimulus
parameter combination, whereas saturat-
ing nonlinearities would have the opposite
effect. Expansive nonlinearities are supra-
additive, and would result in positive in-
teraction coefficients which grow system-
atically as main effect coefficients grow
larger. Compressive, subadditive nonlin-
earities, would, in contrast, result in large,
positive main effect coefficients being as-
sociated with negative interaction coeffi-
cients. We therefore compared the sums of
main effect coefficients (the “predicted ad-
ditive response”) with their corresponding
interaction coefficients, but we found no
significant systematic trends or relation-
ships. The interaction effects thus appear
to be too variable to allow them to be char-
acterized generally as overall predomi-
nantly expansive or compressive.
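One way to operationalize this check is to correlate each unit's predicted additive response (the sum of its main-effect coefficients) with the corresponding interaction coefficient: a systematically positive correlation would indicate predominantly expansive nonlinearity, a negative one compressive. The coefficients below are synthetic, constructed to show what each regime would look like; they are not the recorded data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical main-effect coefficients for 200 parameter combinations.
beta_pitch = rng.normal(0.0, 1.0, 200)
beta_timbre = rng.normal(0.0, 1.0, 200)
additive = beta_pitch + beta_timbre          # "predicted additive response"

# Toy interaction coefficients under the two candidate regimes:
expansive = 0.3 * additive + rng.normal(0.0, 0.2, 200)     # supra-additive
compressive = -0.3 * additive + rng.normal(0.0, 0.2, 200)  # sub-additive

r_expansive = np.corrcoef(additive, expansive)[0, 1]
r_compressive = np.corrcoef(additive, compressive)[0, 1]
```

In the real data the analogous correlation showed no systematic trend, which is why the interactions could not be characterized as predominantly expansive or compressive.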
The distribution across the cortical sur-
face of sensitivity to combinations of stim-
ulus parameters is shown in Figure 4A–C.
Nonlinear interactions were most com-
monly observed in A1 and AAF, where
54% and 56% of units exhibited significant
interactions between two or more stimuli,
respectively. The most common interac-
tions were between pitch and timbre, although azimuth × timbre sensitivity and azimuth × pitch sensitivity were also observed (Fig. 4D). The proportion of units
exhibiting significant interactions fell to
36%, 30%, and 15% in fields PPF, PSF, and ADF, respectively.
Although the number of units showing interactions varied be-
tween cortical fields, the overall magnitude of the interaction
term did not vary between cortical areas for either the azimuth × pitch or azimuth × timbre conditions. This is shown in Figure 4E by plotting the proportion of variance explained by the sum of the spatial and nonspatial interaction terms (azimuth × pitch and azimuth × timbre) for each of the five cortical areas (Kruskal–Wallis test, χ² = 5.5, p = 0.24). In contrast, the proportion of response variance explained by the pitch × timbre interaction term did show a significant variation across cortical fields (Fig. 4F, Kruskal–Wallis test, χ² = 48.2, p < 0.001), with combination sensitivity for nonspatial parameters accounting for more of the variance
tivity for nonspatial parameters accounting for more of the variance
in the primary areas A1 and AAF than in the posterior fields PSF and
PPF, which in turn, had higher values than ADF. This distribution is
very similar to that observed for the timbre main effects.
Both the number and magnitude of the pitch × timbre interactions were greater than for either the timbre × azimuth or pitch × azimuth interaction terms, supporting a separation of
spatial and nonspatial attributes throughout auditory cortex.
However, this separation is far from complete: of the 254 units in
which there was a significant interaction, 225 were sensitive to
pitch–timbre combinations and 60 of these (i.e., 26%) were also
sensitive to pitch–azimuth or timbre–azimuth combinations. In
summary, whereas interactions in the “what” domain were more
common, a substantial minority of units were sensitive to com-
binations of both spatial and nonspatial stimulus features.
Stimulus sensitivity differs with cortical depth
Our multisite recording electrodes allowed us to make simulta-
neous recordings at 8 –16 different depths throughout the cortex.
Figure 4. Nonlinear sensitivity to stimulus combinations. A–C, Maps showing the distributions of neural sensitivity attributable to (proportion of response variance explained by) timbre × azimuth (A), pitch × azimuth (B), or timbre × pitch (C) nonlinear two-way interactions. D, Histogram showing the number of units in each field in which there were significant two-stimulus interactions for each of these stimulus parameter combinations. The total number of units recorded in each cortical field are listed above. E, F, Box-plots summarizing the statistical distributions of the summed azimuth × timbre and azimuth × pitch interactions (E), and the pitch × timbre interactions (F). There was no significant difference in the distribution between fields for the interactions between spatial and nonspatial parameters shown in E (p = 0.24). In contrast, the magnitude of the pitch × timbre interactions did vary with cortical field (p < 0.001). Horizontal lines above the box-plots show which distributions had pairwise significantly different means (Tukey–Kramer post hoc test, p < 0.05).
We grouped recordings coarsely into “superficial” or “deep” simply based on whether the recording site was less than or more than 800 μm below the cortical surface. These divisions coarsely divide units
into supragranular and infragranular cor-
tical layers. The distributions of proportion of variance explained values are plotted in Figure 5. Azimuth sensitivity was found to be greatest in the deeper cortical layers (two-sample t test, p < 0.01), whereas pitch and timbre sensitivity were greater in the superficial layers (p = 0.004 and p < 0.001, respectively). Pitch–timbre interactions were also more common in the superficial layers (p = 0.002), whereas pitch–azimuth (p = 0.33) and timbre–azimuth (p = 0.35) interactions were found to be equally distributed in depth. Data were
pooled across all cortical areas for these
analyses, but similar trends were observed in each of the five
cortical fields individually.
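The depth comparison can be sketched with a pooled-variance two-sample t statistic computed directly in numpy; the depths and sensitivity values below are synthetic, with only the 800 μm boundary taken from the analysis above.

```python
import numpy as np

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(sp2 * (1.0 / nx + 1.0 / ny))

rng = np.random.default_rng(3)
depth_um = rng.uniform(0, 1600, 400)      # hypothetical recording depths (μm)
azimuth_var = rng.uniform(0, 0.2, 400)    # proportion of variance explained
azimuth_var[depth_um > 800] += 0.05       # toy effect: deeper units more sensitive

deep = azimuth_var[depth_um > 800]
superficial = azimuth_var[depth_um <= 800]
t_stat = two_sample_t(deep, superficial)  # positive => deep > superficial
```

The same split-and-test pattern, applied per stimulus dimension, reproduces the superficial-versus-deep comparisons reported for azimuth, pitch, and timbre.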
Azimuth, pitch, and timbre sensitivities do not depend on
unit characteristic frequency
In Figure 3E we showed that clusters of units with high timbre
sensitivity were commonly found in the low CF border region
between A1, PPF, and PSF. This raises the question of whether
sensitivity to timbre, or indeed to other stimulus parameters,
varies systematically with unit CF. However, scatter plots of unit
CF against the proportion of variance explained (supplemental
Fig. 2A–C, available at www.jneurosci.org as supplemental ma-
terial) showed no systematic relationship between unit CF and
azimuth, pitch, or timbre sensitivity. Furthermore, pitch, timbre,
or azimuth sensitivity was just as common among units that were
unresponsive or untuned to pure tones as in units with clearly
defined CFs (supplemental Fig. 2D–F, available at www.
jneurosci.org as supplemental material), and there were no CFs at
which it was particularly easy or difficult to obtain vowel re-
sponses (supplemental Fig. 1G, available at www.jneurosci.org as
supplemental material). We also found no significant correla-
tions between CF and best pitch (defined as the pitch that elicited
the most spikes per presentation) for pitch-sensitive units with a CF < 1 kHz.
Discussion
It has been proposed that a division of labor exists across auditory
cortical areas whereby the ability of humans and animals to rec-
ognize or localize auditory objects can be attributed to anatomi-
cally separate processing streams. This concept is inspired by
earlier studies that postulated distinct hierarchies for processing
different visual features, such as color or motion, in extrastriate
visual cortex (Ungerleider and Haxby, 1994). Parallel processing
in the auditory system is supported by behavioral-deactivation
(Lomber and Malhotra, 2008), functional imaging (Alain et al.,
2001; Maeder et al., 2001; Warren and Griffiths, 2003; Barrett and
Hall, 2006), and electrophysiological studies (Recanzone, 2000;
Tian et al., 2001; Cohen et al., 2004), as well as by anatomical
evidence showing differences in the connectivity of these regions
(Hackett et al., 1999; Romanski et al., 2000; Bizley et al., 2007).
Most of the physiological studies compared the sensitivity of dif-
ferent auditory cortical areas to only one stimulus parameter,
such as spatial location or pitch. Here we adopted a different
approach, based around a stimulus set in which several stimulus
dimensions were systematically varied, to explore the relative
sensitivity of neurons in different cortical fields to azimuth, pitch,
and timbre.
Choice of stimulus parameter values
Because we examined neural responses to three stimulus at-
tributes, the range of values selected in each case was necessarily
limited, but nonetheless covered behaviorally pertinent and
broadly comparable ranges. Ferrets can accurately localize low-
frequency narrowband noise bursts (Kacelnik et al., 2006) and
lateralize the same synthetic vowels that were used in the present
experiment. Although the range of azimuths used covered only
the frontal quadrant of auditory space, behavioral and electro-
physiological studies have shown particularly high sensitivities
within this region to changes in sound-source direction (Mrsic-
Flogel et al., 2005; Nodal et al., 2008). The separation of our
stimuli in both azimuth (Parsons et al., 1999) and pitch (Walker
et al., 2009) was large compared with psychoacoustic difference
limens. There have been few studies of timbre discrimination in
animals, but chinchillas can discriminate the vowels /i/ and /a/
across a range of speakers and pitches (Burdick and Miller, 1975),
and our own data demonstrate that ferrets can accurately dis-
criminate the vowel timbres used in this study (Bizley, Walker,
King, and Schnupp, unpublished observations).
Distribution of azimuth, pitch, and timbre sensitivity
The bulk of behavioral and neurophysiological studies of audi-
tory cortex have been performed in cats. A1 is well conserved
across species, and AAF appears to be equivalent in cat and ferret
(Kowalski et al., 1995; Imaizumi et al., 2004; Bizley et al., 2005).
The posterior fields, PPF and PSF, in the ferret share certain
similarities with the posterior auditory field (PAF) in the cat, both
being tonotopically organized and containing neurons with dif-
ferent temporal response properties from those in the primary
fields (Phillips and Orman, 1984; Stecker et al., 2003; Bizley et al.,
2005). Like the cat’s secondary auditory field (Schreiner and
Cynader, 1984), ferret ADF lacks tonotopic organization and
contains neurons with broad frequency response areas (Bizley et
al., 2005). Although strict homologies have yet to be established,
auditory cortex appears to be organized in a similar manner in
these species.
We found that, as in cats (Harrington et al., 2008), sensitivity
Figure 5. Parameter sensitivity in superficial and deep cortical layers. A–C, Distribution of parameter sensitivity (response variance attributable) to azimuth (A), pitch (B), and timbre (C) for responses recorded at superficial (<800 μm) or deep (>800 μm) cortical locations. Higher sensitivities to pitch and timbre were relatively more common in the superficial layers.
to changes in sound azimuth varies across auditory cortex, but is
nonetheless a property of all areas examined. This contrasts with
recent behavioral data in cats (Lomber et al., 2007), which suggest
that certain cortical fields, such as A1 and PAF, are required for
normal sound localization, whereas others, such as AAF, are not.
However, complex behaviors, like remembering to approach a
sound source for a reward, are bound to require cognitive control
from high-order cortical areas, most likely in frontal cortex. The
profound differences observed in response to cooling different
cortical fields may therefore have less to do with the physiological
properties of neurons in those areas than with their projections to
higher-order brain regions (Hackett et al., 1998; Romanski et al.,
1999).
Our findings are also in broad agreement with many previous
studies of cortical pitch sensitivity. Our “pitch sensitivity” test
was less stringent than that used by Bendor and Wang (2005),
who reported a pitch-selective area in marmoset auditory cortex.
However, in agreement with their observations, we found pitch-
sensitive neurons to be more common in the superficial cortical
layers and to occur frequently (although not exclusively) near the
low-frequency border of A1. Nevertheless, overall, we did not
find a correlation between unit CF and pitch sensitivity or a ten-
dency for low CF neurons to be more sensitive to pitch. The lack
of a single pitch center in ferret auditory cortex is supported by
the results of imaging studies in ferrets (Nelken et al., 2008) and
humans (Hall and Plack, 2008).
The neural basis for timbre processing has been much less
widely studied. Previous studies have demonstrated that the re-
sponse properties of neurons in A1 are well suited to detect spec-
tral envelope cues (Calhoun and Schreiner, 1998; Versnel and
Shamma, 1998). Moreover, the spectral integration properties of
A1 neurons have been reported to be topographically organized
in A1 in a manner that might support vowel discrimination based
on the frequency relationship of the first and second formants
(Ohl and Scheich, 1997). Consistent with these studies, we found
that sensitivity to vowel timbre was greatest in the primary audi-
tory cortical areas of the ferret. Nevertheless, as with azimuth and
pitch sensitivity, the responses of neurons recorded in all five
cortical fields were modulated by timbre. Lesions of the dorsal/
rostral auditory association cortex, but not A1, in rats impaired
performance on a multiformant vowel discrimination task (Ku-
doh et al., 2006). Using fMRI, timbre sensitivity has been dem-
onstrated in both posterior Heschl’s gyrus and the superior tem-
poral sulcus in humans (Menon et al., 2002; Kumar et al., 2007).
Previous studies have reported systematic differences in neu-
ral tuning properties along isofrequency laminae in A1. Changes
in the representation of properties such as tone threshold and
tuning bandwidth (Schreiner and Mendelson, 1990; Cheung et
al., 2001; Read et al., 2001) and binaural response characteristics
(Middlebrooks et al., 1980; Rutkowski et al., 2000) have been
observed. However, the sampling density of recording sites in
individual animals was insufficiently fine to investigate whether
this was the case for any of the parameters investigated in the
present study.
Interdependent coding of pitch, timbre, and azimuth
Our analysis revealed that the neural encoding of pitch, location,
and timbre cues is interwoven and distributed across auditory
cortex. These methods were sensitive to changes in neural firing
that occurred over time and were able to capture the effects ap-
parent in the raw data in a way that a simple spike-count measure
failed to. Previous studies have also shown that the timing of
spikes in auditory cortex carries information useful for discrim-
inating natural sounds (Schnupp et al., 2006; Gourévitch and
Eggermont, 2007), as well as about a sound’s pitch (Steinschnei-
der et al., 1998) and location (Furukawa and Middlebrooks, 2002;
Nelken et al., 2005).
Our stimuli spanned three independent perceptual dimen-
sions, but very few neurons in any of the cortical fields examined
were sensitive to changes in azimuth, pitch, or timbre only. Nev-
ertheless, small but significant differences in average neuronal
sensitivity were observed across cortical areas and depths. These
subtle regional differences could provide the basis for the subse-
quent anatomical segregation of spatial and nonspatial informa-
tion in higher-order cortical areas. Although it is possible that
cortical areas other than those sampled here exhibit greater func-
tional specialization, or that a clearer distinction might be appar-
ent with other stimulus features, such as those that are temporally
modulated, it is important to remember that spatial and nonspa-
tial aspects of sounds often have to be considered together. For
instance, to operate effectively in the presence of multiple sound
sources, it is necessary to be able to track specific pitch, timbre,
and sound-source location combinations over time. Spatial sen-
sitivity has been reported within auditory cortical and prefrontal
areas thought to be concerned with sound identification (Cohen
et al., 2004; Gifford and Cohen, 2005; Lewald et al., 2008). More-
over, a recent study (Recanzone, 2008) documenting sensitivity
to monkey calls found that neurons throughout auditory cortex
were equally selective in their responses. Interactions between
spatial and nonspatial processing streams are known to occur in
the visual cortex (Tolias et al., 2005). Such effects are likely to be
particularly important in audition, where multiple sounds can be
perceived simultaneously at several locations.
We found that cortical neurons often responded nonlinearly
to feature combinations. This was particularly the case for pitch
and timbre in the primary fields, A1 and AAF. This apparent
combination sensitivity could simply reflect intermixing of rela-
tively low level sensitivity to multiple sound features. However, it
has been argued that A1 might represent auditory objects by
grouping together physical stimulus attributes from a common
source, with higher-order cortical areas extracting perceptual fea-
tures, such as the object’s location, from this object-based repre-
sentation (Nelken and Bar-Yosef, 2008). Grouping stimulus at-
tributes is essential for tracking a sound source through a
potentially cluttered acoustic environment (Bregman, 1990), and
the nonlinear sensitivity that we observed in A1 may be ideally
suited to achieving this. We observed a decrease in nonlinear
feature interactions away from A1, suggesting an increasingly
independent representation of these perceptual dimensions in
higher auditory cortex. Although still sensitive to different sound
features, this sensitivity was well described by linear interactions,
which help to preserve information for subsequent processing.
Ultimately, the question of interest is how this distributed
network of neurons contributes to perception. In humans, selec-
tive attention has been shown to modulate putative localization
and identification pathways independently (Ahveninen et al.,
2006). The true extent to which a division of labor exists within
auditory cortex may therefore become apparent only when ani-
mals use the activity in these areas to listen to different attributes
of sound.
References
Ahveninen J, Jääskeläinen IP, Raij T, Bonmassar G, Devore S, Hämäläinen M, Levänen S, Lin FH, Sams M, Shinn-Cunningham BG, Witzel T, Belliveau JW (2006) Task-modulated “what” and “where” pathways in human auditory cortex. Proc Natl Acad Sci U S A 103:14608–14613.
Bizley et al. Pitch, Timbre, and Location Coding in Cortex J. Neurosci., February 18, 2009 29(7):2064–2075 • 2073
Alain C, Arnott SR, Hevenor S, Graham S, Grady CL (2001) “What” and
“where” in the human auditory system. Proc Natl Acad Sci U S A
98:12301–12306.
Barrett DJ, Hall DA (2006) Response preferences for “what” and “where” in
human non-primary auditory cortex. Neuroimage 32:968–977.
Bendor D, Wang X (2005) The neuronal representation of pitch in primate
auditory cortex. Nature 436:1161–1165.
Bizley JK, King AJ (2008) Visual-auditory spatial processing in auditory cor-
tical neurons. Brain Res 1242:24–36.
Bizley JK, Nodal FR, Nelken I, King AJ (2005) Functional organization of
ferret auditory cortex. Cereb Cortex 15:1637–1653.
Bizley JK, Nodal FR, Bajo VM, Nelken I, King AJ (2007) Physiological and
anatomical evidence for multisensory interactions in auditory cortex.
Cereb Cortex 17:2172–2189.
Bregman AS (1990) Auditory scene analysis: the perceptual organization of
sound. Cambridge, MA: MIT.
Burdick CK, Miller JD (1975) Speech perception by the chinchilla: discrim-
ination of sustained /a/ and /i/. J Acoust Soc Am 58:415–427.
Calhoun BM, Schreiner CE (1998) Spectral envelope coding in cat primary
auditory cortex: linear and non-linear effects of stimulus characteristics.
Eur J Neurosci 10:926–940.
Cheung SW, Bedenbaugh PH, Nagarajan SS, Schreiner CE (2001) Func-
tional organization of squirrel monkey primary auditory cortex: re-
sponses to pure tones. J Neurophysiol 85:1732–1749.
Cohen YE, Russ BE, Gifford GW 3rd, Kiringoda R, MacLean KA (2004)
Selectivity for the spatial and nonspatial attributes of auditory stimuli in
the ventrolateral prefrontal cortex. J Neurosci 24:11307–11316.
Fuller DR, Lloyd LL (1992) Effects of configuration on the paired-associate
learning of blissymbols by preschool children with normal cognitive abil-
ities. J Speech Hear Res 35:1376–1383.
Furukawa S, Middlebrooks JC (2002) Cortical representation of auditory
space: information-bearing features of spike patterns. J Neurophysiol
87:1749–1762.
Gelfer MP, Mikos VA (2005) The relative contributions of speaking funda-
mental frequency and formant frequencies to gender identification based
on isolated vowels. J Voice 19:544–554.
Gifford GW 3rd, Cohen YE (2005) Spatial and non-spatial auditory pro-
cessing in the lateral intraparietal area. Exp Brain Res 162:509–512.
Goodale MA, Milner AD (1992) Separate visual pathways for perception
and action. Trends Neurosci 15:20–25.
Gourévitch B, Eggermont JJ (2007) Spatial representation of neural re-
sponses to natural and altered conspecific vocalizations in cat auditory
cortex. J Neurophysiol 97:144–158.
Hackett TA, Stepniewska I, Kaas JH (1998) Subdivisions of auditory cortex
and ipsilateral cortical connections of the parabelt auditory cortex in
macaque monkeys. J Comp Neurol 394:475–495.
Hackett TA, Stepniewska I, Kaas JH (1999) Prefrontal connections of the
parabelt auditory cortex in macaque monkeys. Brain Res 817:45–58.
Hall DA, Plack CJ (2008) Pitch processing sites in the human auditory
brain. Cereb Cortex. Advance online publication. Retrieved July 4, 2008.
doi:10.1093/cercor/bhn108.
Harrington IA, Stecker GC, Macpherson EA, Middlebrooks JC (2008) Spa-
tial sensitivity of neurons in the anterior, posterior, and primary fields of
cat auditory cortex. Hear Res 240:22–41.
Imaizumi K, Priebe NJ, Crum PA, Bedenbaugh PH, Cheung SW, Schreiner
CE (2004) Modular functional organization of cat anterior auditory
field. J Neurophysiol 92:444–457.
Kaas JH, Hackett TA (2000) Subdivisions of auditory cortex and processing
streams in primates. Proc Natl Acad Sci U S A 97:11793–11799.
Kacelnik O, Nodal FR, Parsons CH, King AJ (2006) Training-induced plas-
ticity of auditory localization in adult mammals. PLoS Biol 4:e71.
Kelly JB, Judge PW, Phillips DP (1986) Representation of the cochlea in
primary auditory cortex of the ferret (Mustela putorius). Hear Res
24:111–115.
Kowalski N, Versnel H, Shamma SA (1995) Comparison of responses in the
anterior and primary auditory fields of the ferret cortex. J Neurophysiol
73:1513–1523.
Kudoh M, Nakayama Y, Hishida R, Shibuki K (2006) Requirement of the
auditory association cortex for discrimination of vowel-like sounds in
rats. Neuroreport 17:1761–1766.
Kumar S, Stephan KE, Warren JD, Friston KJ, Griffiths TD (2007) Hierar-
chical processing of auditory objects in humans. PLoS Comput Biol
3:e100.
Lewald J, Riederer KA, Lentz T, Meister IG (2008) Processing of sound lo-
cation in human cortex. Eur J Neurosci 27:1261–1270.
Lomber SG, Malhotra S (2008) Double dissociation of ‘what’ and ‘where’
processing in auditory cortex. Nat Neurosci 11:609–616.
Lomber SG, Malhotra S, Hall AJ (2007) Functional specialization in non-
primary auditory cortex of the cat: areal and laminar contributions to
sound localization. Hear Res 229:31–45.
Maeder PP, Meuli RA, Adriani M, Bellmann A, Fornari E, Thiran JP, Pittet A,
Clarke S (2001) Distinct pathways involved in sound recognition and
localization: a human fMRI study. Neuroimage 14:802–816.
Menon V, Levitin DJ, Smith BK, Lembke A, Krasnow BD, Glazer D, Glover
GH, McAdams S (2002) Neural correlates of timbre change in harmonic
sounds. Neuroimage 17:1742–1754.
Middlebrooks JC, Dykes RW, Merzenich MM (1980) Binaural response-
specific bands in primary auditory cortex (AI) of the cat: topographical
organization orthogonal to isofrequency contours. Brain Res 181:31–48.
Mishkin M, Ungerleider LG (1982) Contribution of striate inputs to the
visuospatial functions of parieto-preoccipital cortex in monkeys. Behav
Brain Res 6:57–77.
Mrsic-Flogel TD, King AJ, Schnupp JWH (2005) Encoding of virtual acous-
tic space stimuli by neurons in ferret primary auditory cortex. J Neuro-
physiol 93:3489–3503.
Nelken I, Bar-Yosef O (2008) Neurons and objects: the case of auditory
cortex. Front Neurosci 2:107–113.
Nelken I, Bizley JK, Nodal FR, Ahmed B, Schnupp JW, King AJ (2004)
Large-scale organization of ferret auditory cortex revealed using contin-
uous acquisition of intrinsic optical signals. J Neurophysiol
92:2574–2588.
Nelken I, Chechik G, Mrsic-Flogel TD, King AJ, Schnupp JWH (2005) En-
coding stimulus information by spike numbers and mean response time
in primary auditory cortex. J Comput Neurosci 19:199–221.
Nelken I, Bizley JK, Nodal FR, Ahmed B, King AJ, Schnupp JWH (2008)
Responses of auditory cortex to complex stimuli: functional organization
revealed using intrinsic optical signals. J Neurophysiol 99:1928–1941.
Nodal FR, Bajo VM, Parsons CH, Schnupp JW, King AJ (2008) Sound lo-
calization behavior in ferrets: comparison of acoustic orientation and
approach-to-target responses. Neuroscience 154:397–408.
Ohl FW, Scheich H (1997) Orderly cortical representation of vowels based
on formant interaction. Proc Natl Acad Sci U S A 94:9440–9444.
Parsons CH, Lanyon RG, Schnupp JW, King AJ (1999) Effects of altering
spectral cues in infancy on horizontal and vertical sound localization by
adult ferrets. J Neurophysiol 82:2294–2309.
Peterson GE, Barney HL (1952) Control methods used in a study of the
vowels. J Acoust Soc Am 24:175–184.
Phillips DP, Orman SS (1984) Responses of single neurons in posterior field
of cat auditory cortex to tonal stimulation. J Neurophysiol 51:147–163.
Rauschecker JP, Tian B, Pons T, Mishkin M (1997) Serial and parallel pro-
cessing in rhesus monkey auditory cortex. J Comp Neurol 382:89–103.
Read HL, Winer JA, Schreiner CE (2001) Modular organization of intrinsic
connections associated with spectral tuning in cat auditory cortex. Proc
Natl Acad Sci U S A 98:8042–8047.
Recanzone GH (2000) Spatial processing in the auditory cortex of the ma-
caque monkey. Proc Natl Acad Sci U S A 97:11829–11835.
Recanzone GH (2008) Representation of con-specific vocalizations in the
core and belt areas of the auditory cortex in the alert macaque monkey.
J Neurosci 28:13184–13193.
Reissland N, Shepherd J, Herrera E (2003) The pitch of maternal voice: a
comparison of mothers suffering from depressed mood and non-
depressed mothers reading books to their infants. J Child Psychol Psychi-
atry 44:255–261.
Romanski LM, Tian B, Fritz J, Mishkin M, Goldman-Rakic PS, Rauschecker
JP (1999) Dual streams of auditory afferents target multiple domains in
the primate prefrontal cortex. Nat Neurosci 2:1131–1136.
Romanski LM, Tian B, Fritz JB, Mishkin M, Goldman-Rakic PS, Rauschecker
JP (2000) Reply to “‘What’, ‘where’ and ‘how’ in auditory cortex”. Nat
Neurosci 3:966.
Rutkowski RG, Wallace MN, Shackleton TM, Palmer AR (2000) Organisa-
tion of binaural interactions in the primary and dorsocaudal fields of the
guinea pig auditory cortex. Hear Res 145:177–189.
Schnupp JW, Booth J, King AJ (2003) Modeling individual differences in
ferret external ear transfer functions. J Acoust Soc Am 113:2021–2030.
Schnupp JW, Hall TM, Kokelaar RF, Ahmed B (2006) Plasticity of temporal
pattern codes for vocalization stimuli in primary auditory cortex. J Neu-
rosci 26:4785–4795.
Schreiner CE, Cynader MS (1984) Basic functional organization of second
auditory cortical field (AII) of the cat. J Neurophysiol 51:1284–1305.
Schreiner CE, Mendelson JR (1990) Functional topography of cat primary
auditory cortex: distribution of integrated excitation. J Neurophysiol
64:1442–1459.
Stecker GC, Mickey BJ, Macpherson EA, Middlebrooks JC (2003) Spatial
sensitivity in field PAF of cat auditory cortex. J Neurophysiol
89:2889–2903.
Stecker GC, Harrington IA, Macpherson EA, Middlebrooks JC (2005) Spa-
tial sensitivity in the dorsal zone (area DZ) of cat auditory cortex. J Neu-
rophysiol 94:1267–1280.
Steinschneider M, Reser DH, Fishman YI, Schroeder CE, Arezzo JC (1998)
Click train encoding in primary auditory cortex of the awake monkey:
evidence for two mechanisms subserving pitch perception. J Acoust Soc
Am 104:2935–2955.
Tian B, Reser D, Durham A, Kustov A, Rauschecker JP (2001) Functional
specialization in rhesus monkey auditory cortex. Science 292:290–293.
Tolias AS, Keliris GA, Smirnakis SM, Logothetis NK (2005) Neurons in ma-
caque area V4 acquire directional tuning after adaptation to motion stim-
uli. Nat Neurosci 8:591–593.
Ungerleider LG, Haxby JV (1994) ‘What’ and ‘where’ in the human brain.
Curr Opin Neurobiol 4:157–165.
Versnel H, Shamma SA (1998) Spectral-ripple representation of steady-
state vowels in primary auditory cortex. J Acoust Soc Am 103:2502–2514.
Walker KM, Ahmed B, Schnupp JW (2008) Linking cortical spike pattern
codes to auditory perception. J Cogn Neurosci 20:135–152.
Walker KM, Schnupp JWH, Hart-Schnupp S, King AJ, Bizley JK (2009) Dis-
crimination by ferrets of the direction of pitch changes in simple and
complex sounds. J Acoust Soc Am, in press.
Warren JD, Griffiths TD (2003) Distinct mechanisms for processing spatial
sequences and pitch sequences in the human auditory brain. J Neurosci
23:5799–5804.
Warren JD, Uppenkamp S, Patterson RD, Griffiths TD (2003) Separating
pitch chroma and pitch height in the human brain. Proc Natl Acad Sci
U S A 100:10038–10042.
Bizley et al. Pitch, Timbre, and Location Coding in Cortex J. Neurosci., February 18, 2009 29(7):2064 –2075 • 2075
... Identifying auditory 'objects' requires that animal and human listeners are capable of discriminating 62 sounds along a given perceptual dimension while generalizing across variability in other dimensions 63 (Griffiths and Warren, 2004;Bizley and Cohen, 2013). At the level of the single neuron this requires 64 that neuronal responses are both selective for one sound feature but tolerant (or invariant) across 65 others (Ison and Quiroga, 2008;Bizley et al., 2009). While the consequences of behavioural training 66 on neural invariance is unknown, appropriate environmental exposure during development can 67 shape auditory cortical responses to complex sound features: In the auditory cortical neurons of 68 animals reared in complex acoustic environments fewer neurons respond to any single sound but 69 responses were more selective for particular spectro-temporal features and can tolerate greater 70 acoustic variability (Bao et al., 2013). ...
... In order to test our first hypothesis, that we would see an increase in both sensitivity and invariance 245 in control animals, we determined the proportion of units whose responses were significantly 246 modulated by variation in stimulus location, pitch (determined by fundamental frequency, F0) and 247 timbre, using the variance decomposition approach used in Bizley et al. (2009). Since all five animals 248 trained in a timbre discrimination task, we predicted a greater number of neurons might convey 249 timbre information, and that we might observe fewer neurons that were additionally sensitive to 250 untrained stimulus features (pitch in the 2AFC animals and azimuth in 2AFC and T/P GNG animals). ...
... Finally we examined how training impacted sensitivity to variation in sound source location. In our 372 control dataset we observed that the spatial tuning elicited by these vowel stimuli presented in virtual 373 acoustic space was modest, and as anticipated, predominantly contralateral (Bizley et al., 2009). When 374 the same neurons were tested with spatially modulated broadband noise (also in VAS) we observed 375 considerably greater spatial modulation leading us to suggest that the low spatial sensitivity was a 376 product of the stimuli rather than the neurons we were recording or the VAS technique. ...
Preprint
Full-text available
Auditory learning is supported by long-term changes in the neural processing of sound. We mapped neural sensitivity to timbre, pitch and location in animals trained to discriminate the identity of artificial vowels based on their spectral timbre in a two-alternative forced choice (T2AFC, n=3, female ferrets) or to detect changes in fundamental frequency or timbre of repeating artificial vowels in a go/no-go task (n=2 female ferrets). Neural responses were recorded under anaesthesia in two primary cortical fields and two tonotopically organised non-primary fields. Responses were compared these data to that of naive control animals. We observed that in both groups of trained animals the overall sensitivity to sound timbre was reduced across three cortical fields but enhanced in non-primary field PSF. Neural responses in trained animals were able to discriminate vowels that differed in either their first or second formant frequency unlike control animals whose sensitivity was mostly driven by changes in the second formant. Neural responses in the T2AFC animals, who were required to generalise across pitch when discriminating timbre, became less modulated by fundamental frequency, while those in the go/no-go animals were unchanged relative to controls. Finally, both trained groups showed increased spatial sensitivity and altered tuning. Trained animals showed an enhanced representation of the midline, where the speaker was located in the experimental chamber. Overall, these results demonstrate training elicited widespread changes in the way in which auditory cortical neurons represent complex sounds with changes in how both task relevant and task-irrelevant features were represented.
... How frequency and harmonic information are differentially represented in the auditory cortex has been a point of contention, with tonotopy in some cases appearing to represent perceptual pitch (Pantev et al., 1989;Pantev et al., 1996;Monahan et al., 2008) and in other cases showing that the frequency of pure tones is organized tonotopically, with a separate periodotopic representation of timbre (Langner et al., 1998;Warren et al., 2003;Bizley et al., 2009;Allen et al., 2017). These studies have typically used pure tones or narrowband noise -with perceptually unambiguous signals as pitch-evoking stimuli. ...
... However, this does not preclude the presence of other frequencyor timbre-based representations throughout auditory cortex -high decoding of pitch within (but not across) tone types specifically during later time windows suggests spectrally specific tonotopic representations downstream (Allen et al., 2022). Additionally, high decoding of tone type (pure vs. complex MF), suggests the existence of a timbre-sensitive region within auditory cortex (Bizley et al., 2009;Allen et al., 2017). Similar analyses could be applied to an exploration of absolute vs. relative pitch encoding and the role of pitch for different auditory domains, such as melodic and speech processing. ...
Preprint
The ability to perceive pitch allows human listeners to experience music, recognize the identity and emotion conveyed by conversational partners, and make sense of their auditory environment. A pitch percept is formed by weighting different acoustic cues (e.g., signal fundamental frequency and inter-harmonic spacing) and contextual cues (expectation). How and when such cues are neurally encoded and integrated remains debated. In this study, twenty-eight participants listened to tone sequences with different acoustic cues (pure tones, complex missing fundamental tones, and ambiguous mixtures), placed in predictable and less predictable sequences, while magnetoencephalography was recorded. Decoding analyses revealed that pitch was encoded in neural responses to all three tone types, in the low-to-mid auditory cortex, bilaterally, with right-hemisphere dominance. The pattern of activity generalized across cue-types, offset in time: pitch was neurally encoded earlier for harmonic tones (∼85ms) than pure tones (∼95ms). For ambiguous tones, pitch emerged significantly earlier in predictable contexts, and could be decoded even before tone onset. The results suggest that a unified neural representation of pitch emerges by integrating independent pitch cues, and that context alters the dynamics of pitch generation when acoustic cues are ambiguous.
... Overall, these results demonstrate that STG is organized according to sites with a dominant feature and that tuning within a site has a degree of heterogeneity that makes them not entirely modular 37,38 . This variation in speech feature tuning potentially facilitates local h v g f blae k t s "It had gone like clockwork." ...
Article
Full-text available
Understanding the neural basis of speech perception requires that we study the human brain both at the scale of the fundamental computational unit of neurons and in their organization across the depth of cortex. Here we used high-density Neuropixels arrays1–3 to record from 685 neurons across cortical layers at nine sites in a high-level auditory region that is critical for speech, the superior temporal gyrus4,5, while participants listened to spoken sentences. Single neurons encoded a wide range of speech sound cues, including features of consonants and vowels, relative vocal pitch, onsets, amplitude envelope and sequence statistics. Neurons at each cross-laminar recording exhibited dominant tuning to a primary speech feature while also containing a substantial proportion of neurons that encoded other features contributing to heterogeneous selectivity. Spatially, neurons at similar cortical depths tended to encode similar speech features. Activity across all cortical layers was predictive of high-frequency field potentials (electrocorticography), providing a neuronal origin for macroelectrode recordings from the cortical surface. Together, these results establish single-neuron tuning across the cortical laminae as an important dimension of speech encoding in human superior temporal gyrus.
... Animal studies have investigated pitch responses in the auditory cortex of several species, sometimes with seemingly conf licting results. In ferrets, cortical responses indicative of pitch processing have been shown to be distributed across auditory fields (Bizley et al. 2009;Walker et al. 2011). Using high-field fMRI in cats, Butler et al. (2015) found that responses to RIN stimuli (compared to narrowband noise) were not present in the subdivisions relating to the core auditory cortex (A1 and anterior auditory field) but were instead unique to regions further upstream (posterior auditory field and A2). ...
Article
Full-text available
The perception of pitch is a fundamental percept, which is mediated by the auditory system, requiring the abstraction of stimulus properties related to the spectro-temporal structure of sound. Despite its importance, there is still debate as to the precise areas responsible for its encoding, which may be due to species differences or differences in the recording measures and choices of stimuli used in previous studies. Moreover, it was unknown whether the human brain contains pitch neurons and how distributed such neurons might be. Here, we present the first study to measure multiunit neural activity in response to pitch stimuli in the auditory cortex of intracranially implanted humans. The stimulus sets were regular-interval noise with a pitch strength that is related to the temporal regularity and a pitch value determined by the repetition rate and harmonic complexes. Specifically, we demonstrate reliable responses to these different pitch-inducing paradigms that are distributed throughout Heschl's gyrus, rather than being localized to a particular region, and this finding was evident regardless of the stimulus presented. These data provide a bridge across animal and human studies and aid our understanding of the processing of a critical percept associated with acoustic stimuli.
Preprint
Listeners readily extract multi–dimensional auditory objects such as a ´localized talker´ from complex acoustic scenes with multiple talkers. Yet, the neural mechanisms underlying simultaneous encoding and linking of different sound features — for example, a talker´s voice and location — are poorly understood. We analyzed invasive intracranial recordings in neurosurgical patients attending to a localized talker in real–life cocktail party scenarios. We found that sensitivity to an individual talker´s voice and location features was distributed throughout auditory cortex and that neural sites exhibited a gradient from sensitivity to a single feature to joint sensitivity to both features. On a population level, cortical response patterns of both dual–feature sensitive sites but also single–feature sensitive sites revealed simultaneous encoding of an attended talker´s voice and location features. However, for single–feature sensitive sites, the representation of the primary feature was more precise. Further, sites which selective tracked an attended speech stream concurrently encoded an attended talker´s voice and location features, indicating that such sites combine selective tracking of an attended auditory object with encoding of the object´s features. Finally, we found that attending a localized talker selectively enhanced temporal coherence between single–feature voice sensitive sites and single–feature location sensitive sites, providing an additional mechanism for linking voice and location features in multi–talker scenes. These results demonstrate that a talker´s voice and location features are linked during multi-dimensional object formation in naturalistic multi–talker scenes by joint population coding as well as by temporal coherence between neural sites.
Chapter
Chapter 4 ended with the remark that the formation of an auditory unit can be strongly affected by sound components that precede and follow that auditory unit. This chapter describes the process in which successive auditory units are linked to each other to form auditory streams. An auditory stream is a sequence of auditory units that are perceived as coming from one and the same sound source. Examples are the successive syllables of a speech utterance, or the sequence of tones that together form a melody. The result of this complex process of auditory-stream formation is an auditory scene consisting of more or less well-defined auditory streams only one of which can be attended to effortlessly. Moreover, when the number of sound sources is more than three to four, listeners generally underestimate the number of sound sources in an auditory scene. The most important characteristic of an auditory stream is that the successive auditory units are temporally coherent, i.e., that they are well ordered in time and their beats form a well-defined rhythm. Establishing temporal coherence between successive auditory units is a complex process depending on many factors. These factors can be relatively simple, such as the pitch and the timbre of successive auditory units, or more complex factors such as the familiarity of the sounds. The result appears to be a very flexible and adaptive system that also operates well in very noisy circumstances such as bustling restaurants or cocktail parties. When segments of an auditory stream are masked by other sounds, the auditory system is highly capable of restoring this information in such a way that the listener is not aware of this restoration. Besides having a well-defined rhythm, auditory streams have well-defined loudness contours and, if at least the constituent auditory units have pitch, well-defined pitch contours, but these contours are not perceived independently of each other. 
In addition, the consonant and dissonant relations between the parallel streams of musical scenes are described. This chapter ends with the description of three different approaches to computational modelling of the auditory-stream-formation process.
Article
Full-text available
A key question in auditory neuroscience is to what extent are brain regions functionally specialized for processing specific sound features such as location and identity. In auditory cortex, correlations between neural activity and sounds support both the specialization of distinct cortical subfields, and encoding of multiple sound features within individual cortical areas. However, few studies have tested the contribution of auditory cortex to hearing in multiple contexts. Here we determined the role of ferret primary auditory cortex in both spatial and non-spatial hearing by reversibly inactivating the middle ectosylvian gyrus during behavior using cooling (n=2 females) or optogenetics (n=1 female). Optogenetic experiments utilized the mDLx promoter to express Channelrhodopsin2 in GABAergic interneurons and we confirmed both viral expression (n=2 females) and light-driven suppression of spiking activity in auditory cortex, recorded using Neuropixels under anesthesia (n=465 units from 2 additional untrained female ferrets). Cortical inactivation via cooling or optogenetics impaired vowel discrimination in co-located noise. Ferrets implanted with cooling loops were tested in additional conditions that revealed no deficits for identifying vowels in clean conditions, or when the temporally coincident vowel and noise were spatially separated by 180 degrees. These animals did however show impaired sound localization when inactivating the same auditory cortical region implicated in vowel discrimination in noise. Our results demonstrate that, as a brain region showing mixed selectivity for spatial and non-spatial features of sound, primary auditory cortex contributes to multiple forms of hearing. SIGNIFICANCE STATEMENT: Neurons in primary auditory cortex are often sensitive to the location and identity of sounds. Here we inactivated auditory cortex during spatial and non- spatial listening tasks using cooling, or optogenetics. 
Auditory cortical inactivation impaired multiple behaviors, demonstrating a role in both the analysis of sound location and identity and confirming a functional contribution of mixed selectivity observed in neural activity. Parallel optogenetic experiments in two additional untrained ferrets linked behavior to physiology by demonstrating that expression of Channelrhodopsin 2 permitted rapid light-driven suppression of auditory cortical activity recorded under anesthesia.
Article
Full-text available
Recent long-term measurements of neuronal activity have revealed that, despite stability in large-scale topographic maps, the tuning properties of individual cortical neurons can undergo substantial reformatting over days. To shed light on this apparent contradiction, we captured the sound response dynamics of auditory cortical neurons using repeated 2-photon calcium imaging in awake mice. We measured sound-evoked responses to a set of pure tone and complex sound stimuli in more than 20,000 auditory cortex neurons over several days. We found that a substantial fraction of neurons dropped in and out of the population response. We modeled these dynamics as a simple discrete-time Markov chain, capturing the continuous changes in responsiveness observed during stable behavioral and environmental conditions. Although only a minority of neurons were driven by the sound stimuli at a given time point, the model predicts that most cells would at least transiently become responsive within 100 days. We observe that, despite single-neuron volatility, the population-level representation of sound frequency was stably maintained, demonstrating the dynamic equilibrium underlying the tonotopic map. Our results show that sensory maps are maintained by shifting subpopulations of neurons "sharing" the job of creating a sensory representation.
Article
Full-text available
Although many studies have examined the performance of animals in detecting a frequency change in a sequence of tones, few have measured animals' discrimination of the fundamental frequency (F0) of complex, naturalistic stimuli. Additionally, it is not yet clear if animals perceive the pitch of complex sounds along a continuous, low-to-high scale. Here, four ferrets (Mustela putorius) were trained on a two-alternative forced choice task to discriminate sounds that were higher or lower in F0 than a reference sound using pure tones and artificial vowels as stimuli. Average Weber fractions for ferrets on this task varied from approximately 20% to 80% across references (200-1200 Hz), and these fractions were similar for pure tones and vowels. These thresholds are approximately ten times higher than those typically reported for other mammals on frequency change detection tasks that use go/no-go designs. Naive human listeners outperformed ferrets on the present task, but they showed similar effects of stimulus type and reference F0. These results suggest that while non-human animals can be trained to label complex sounds as high or low in pitch, this task may be much more difficult for animals than simply detecting a frequency change.
Article
Full-text available
Auditory cortical processing in primates has been proposed to be divided into two parallel processing streams, a caudal spatial stream and a rostral nonspatial stream. Previous single neuron studies have indicated that neurons in the rostral lateral belt respond selectively to vocalization stimuli, whereas imaging studies have indicated that selective vocalization processing first occurs in higher order cortical areas. To test the dual stream hypothesis and to find evidence to account for the difference between the electrophysiological and imaging results, we recorded the responses of single neurons in core and belt auditory cortical fields to both forward and reversed vocalizations. We found that there was little difference in the overall firing rate of neurons across different cortical areas or between forward and reversed vocalizations. However, more information was carried in the overall firing rate for forward vocalizations compared with reversed vocalizations in all areas except the rostral field of the core (area R). These results are consistent with the imaging results and are inconsistent with early rostral cortical areas being involved in selectively processing vocalization stimuli based on a firing rate code. They further suggest that a more complex processing scheme is in play in these early auditory cortical areas.
Article
Full-text available
Sounds are encoded into electrical activity in the inner ear, where they are represented (roughly) as patterns of energy in narrow frequency bands. However, sounds are perceived in terms of their high-order properties. It is generally believed that this transformation is performed along the auditory hierarchy, with low-level physical cues computed at early stages of the auditory system and high-level abstract qualities at high-order cortical areas. The functional position of primary auditory cortex (A1) in this scheme is unclear – is it ‘early’, encoding physical cues, or is it ‘late’, already encoding abstract qualities? Here we argue that neurons in cat A1 show sensitivity to high-level features of sounds. In particular, these neurons may already show sensitivity to ‘auditory objects’. The evidence for this claim comes from studies in which individual sounds are presented singly and in mixtures. Many neurons in cat A1 respond to mixtures in the same way they respond to one of the individual components of the mixture, and in many cases neurons may respond to a low-level component of the mixture rather than to the acoustically dominant one, even though the same neurons respond to the acoustically-dominant component when presented alone.
Article
Responds to P. Belin and R. J. Zatorre's (see record 2000-00762-001) comments on the L. M. Romanski et al (see record 1999-15261-008) article, which reports the finding of (at least) two streams of auditory projections to the prefrontal cortex and supports a model involving a dorsal pathway for extracting a verbal message and a ventral pathway for identifying the speaker. In response, the authors point to several lines of evidence linking posterior auditory belt cortex and dorsolateral prefrontal cortex with auditory spatial processing. They concur that "motion processing" in a general sense may indeed be one of the functions of the dorsal pathway but suggest that it can be used in the service of both auditory space processing and speech perception. ((c) 2008 APA, all rights reserved)

Article
Relationships between a listener's identification of a spoken vowel and its properties as revealed from acoustic measurement of its sound wave have been a subject of study by many investigators. Both the utterance and the identification of a vowel depend upon the language and dialectal backgrounds and the vocal and auditory characteristics of the individuals concerned. The purpose of this paper is to discuss some of the control methods that have been used in the evaluation of these effects in a vowel study program at Bell Telephone Laboratories. The plan of the study, calibration of recording and measuring equipment, and methods for checking the performance of both speakers and listeners are described. The methods are illustrated from results of tests involving some 76 speakers and 70 listeners.
Article
We tested the responses of neurons in the lateral parietal area (area LIP) for their sensitivity to the spatial and non-spatial attributes of an auditory stimulus. We found that the firing rates of LIP neurons were modulated by both of these attributes. These data indicate that, while area LIP is involved in spatial processing, non-spatial processing is not restricted to independent channels.
Article
Electrophysiological studies in mammalian primary auditory cortex have demonstrated neuronal tuning and cortical spatial organization based upon spectral and temporal qualities of the stimulus, including its frequency, intensity, amplitude modulation, and frequency modulation. Although communication and other behaviourally relevant sounds are usually complex, most response characterizations have used tonal stimuli. To better understand the mechanisms necessary to process complex sounds, we investigated neuronal responses to a specific class of broadband stimuli, auditory gratings or ripple stimuli, and compared the responses with single-tone responses. Ripple stimuli consisted of 150–200 frequency components with the intensity of each component adjusted such that the envelope of the frequency spectrum is sinusoidal. It has been demonstrated that neurons are tuned to specific characteristics of the ripple stimulus, including the intensity, the spacing of the peaks, and the location of the peaks and valleys (C. E. Schreiner and B. M. Calhoun, Auditory Neurosci. 1994; 1: 39–61). Although previous results showed that neuronal response strength varied with the intensity and the fundamental frequency of the stimulus, it is shown here that the relative response to different ripple spacings remains essentially constant with changes in the intensity and the fundamental frequency. These findings support a close relationship between pure-tone receptive fields and ripple transfer functions. However, variations of other stimulus characteristics, such as spectral modulation depth, result in non-linear alterations in the ripple transformation. The processing of broadband stimuli between the basilar membrane and the primary auditory cortex appears generally to be non-linear, although specific stimulus qualities, including the phase of the spectral envelope, are processed in a nearly linear manner.
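The ripple construction described above can be sketched directly: sum many log-spaced tone components whose amplitudes trace a sinusoid along the logarithmic frequency axis. This is an illustrative sketch only; the component count, frequency range, ripple density, and modulation depth here are arbitrary choices, not the parameters used in the study.

```python
import numpy as np

def ripple_stimulus(dur=0.5, fs=44100, n_comp=150, f_lo=500.0, f_hi=16000.0,
                    ripple_density=1.0, depth=0.9):
    """Sum of n_comp tones, log-spaced from f_lo to f_hi, whose amplitudes
    follow a sinusoidal spectral envelope of `ripple_density` cycles per
    octave with modulation depth `depth` (0..1)."""
    t = np.arange(int(dur * fs)) / fs
    freqs = np.logspace(np.log10(f_lo), np.log10(f_hi), n_comp)
    octaves = np.log2(freqs / f_lo)                  # position on the log axis
    amps = 1.0 + depth * np.sin(2 * np.pi * ripple_density * octaves)
    phases = 2 * np.pi * np.random.rand(n_comp)      # randomized phases
    sig = sum(a * np.sin(2 * np.pi * f * t + p)
              for a, f, p in zip(amps, freqs, phases))
    return sig / np.max(np.abs(sig))                 # normalize to unit peak

x = ripple_stimulus()  # 0.5 s of a 1 cycle/octave ripple at 44.1 kHz
```

Sweeping `ripple_density` while holding intensity fixed is the kind of manipulation the abstract's "ripple spacing" results refer to; varying `depth` probes the spectral-modulation-depth non-linearity it describes.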