INSTITUTE OF PHYSICS PUBLISHING NETWORK: COMPUTATION IN NEURAL SYSTEMS
Network: Comput. Neural Syst. 14 (2003) 177–187 PII: S0954-898X(03)52703-8
How much information is associated with a particular
stimulus?
Daniel A Butts
Department of Neurobiology, Harvard Medical School, 220 Longwood Avenue, Boston,
MA 02115, USA
E-mail: daniel_butts@hms.harvard.edu
Received 21 August 2002, in final form 24 October 2002
Published 28 January 2003
Online at stacks.iop.org/Network/14/177
Abstract
Although the Shannon mutual information can be used to reveal general features
of the neural code, it cannot directly address which symbols of the code are
significant. Further insight can be gained by using information measures that
are specific to particular stimuli or responses. The specific information is a
previously proposed measure of the amount of information associated with a
particular response; however, as I show, it does not properly characterize the
amount of information associated with particular stimuli. Instead, I propose
a new measure: the stimulus-specific information (SSI), defined to be the
average specific information of responses given the presence of a particular
stimulus. Like other information theoretic measures, the SSI does not rely on
assumptions about the neural code, and is robust to non-linearities of the system.
To demonstrate its applicability, the SSI is applied to data from simulated visual
neurons, and identifies stimuli consistent with the neuron’s linear kernel. While
the SSI reveals the essential linearity of the visual neurons, it also successfully
identifies the well-encoded stimuli in a modified example where linear analysis
techniques fail. Thus, I demonstrate that the SSI is an appropriate measure of
the information associated with particular stimuli, and provides a new unbiased
method of analysing the significant stimuli of a neural code.
1. Introduction
Information theory provides measures for comparing different coding schemes of neurons
and neural ensembles, avoiding both biases caused by preconceptions of the neural code and
complications inherent in analysing neuronal systems with non-linearities and under conditions
of complex stimuli. As a result, information theory has been used to analyse neural data in a
variety of sensory systems (for a review, see Borst and Theunissen 1999). Such studies typically
make comparisons between the Shannon mutual information given different classifications of
stimulus and response ensembles, but do not address which stimuli and responses within these
ensembles are significant in information transmission.
As a result, DeWeese and Meister (1999) proposed an information theoretic measure of
the significance of particular symbols in the neural code: the specific information. The specific
information of a particular response is defined as the reduction in uncertainty in the stimulus
gained by the observation of that response. Since the mutual information represents the average
reduction in the uncertainty of the stimulus gained by one measurement, specific information
is intuitively a good representation of the degree to which a given response contributes to the
overall mutual information. Furthermore, DeWeese and Meister (1999) show that specific
information has unique properties that are appropriate for a measure of the information of a
response.
Specific information can be applied to both particular stimuli and particular responses due
to the symmetry between stimulus and response in information measures. However, because of
the asymmetry of stimulus and response with respect to causality (i.e., stimuli cause responses
and not vice versa), here I show that specific information does not provide a good measure
of stimulus significance. I propose a new measure, the ‘stimulus-specific information’ (SSI),
which is defined to be the average reduction in uncertainty of one observation given a particular
stimulus.
Since the definition of the SSI relies on a measure of the information associated
with responses (i.e., specific information), I will first define specific information (proposed
by DeWeese and Meister 1999). Using a simple example, I will then show that it does not
provide a good measure of the information associated with particular stimuli, and motivate the
definition of the SSI.
To demonstrate its effectiveness, the SSI is applied to data from realistic simulations of
neurons in the lateral geniculate nucleus (LGN) presented with full-field white-noise stimuli
(Keat et al 2001), and to a modified version of this model that defies typical linear analyses.
In both cases, the SSI identifies the most significant stimuli and offers additional insight into
the underlying neural code.
2. Decomposing mutual information into response-specific information
Consider a system presented with an ensemble of stimuli S and whose behaviour can be
classified into a set of responses R. The Shannon mutual information between the stimulus S
and response R ensembles of this system is given by
I[R, S] = \sum_{s \in S} \sum_{r \in R} p(s, r) \log_2 \frac{p(s, r)}{p(s) p(r)}     (1)
where p(s, r) is the joint probability distribution, i.e., the probability of simultaneously
observing the stimulus s ∈ S and the response r ∈ R. The joint probability distribution
p(s, r) can be computed by counting the frequency of each stimulus–response pair over a
sufficient amount of time, over which the ‘natural’ ensemble of stimuli is sampled with the
probability given by the prior distribution p(s).
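As a concrete illustration, equation (1) can be evaluated directly from an estimated joint probability table. Below is a minimal Python sketch, assuming p_joint is a NumPy array with rows indexed by stimulus, columns by response, and entries normalized to sum to one:

```python
import numpy as np

def mutual_information(p_joint):
    """I[R, S] in bits from a joint distribution p(s, r) (equation (1))."""
    p_s = p_joint.sum(axis=1, keepdims=True)    # marginal p(s)
    p_r = p_joint.sum(axis=0, keepdims=True)    # marginal p(r)
    nz = p_joint > 0                            # 0 log 0 -> 0 by convention
    return float(np.sum(p_joint[nz] * np.log2(p_joint[nz] / (p_s * p_r)[nz])))
```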
While the mutual information can be used to objectively evaluate different coding schemes
through different classifications of the stimulus and response ensembles (S and R), it represents
an average over the entire set of stimuli s ∈ S and responses r ∈ R. It is often of interest to know
which particular stimuli are effectively encoded by the system, and which particular responses
communicate information about the stimuli. Such questions can be addressed by decomposing
I[R, S] into measures that represent the contributions of specific stimuli or responses to the
mutual information, i.e.
I[R, S] = \sum_{s \in S} p(s) i(s) = \sum_{r \in R} p(r) i(r).     (2)
In this sense, the mutual information is explicitly a weighted average over individual
contributions from particular stimuli or particular responses.
There are arbitrarily many ways to perform such decompositions, meaning that there
is no single measure that represents the 'specific' information associated with a particular
stimulus or response. As a result, an appropriate measure must be chosen that properly signifies
the role of particular stimuli and responses in information transmission.
DeWeese and Meister (1999) argue that the information of a response must be additive,
since intuitively, information should accumulate over consecutive independent measurements
such that the total information from multiple measurements is equal to the sum of information
gained from each measurement separately. They show that there is only one decomposition of
mutual information that is additive, the specific information, given by

i_{sp}(r) = H[S] - H[S|r]     (3)

where the entropy of the prior distribution is given by H[S] = -\sum_s p(s) \log_2 p(s) and
the conditional entropy associated with a particular response r is given by
H[S|r] = -\sum_s p(s|r) \log_2 p(s|r).
The specific information has a straightforward intuitive meaning with regards to the process
of information transmission. Entropy is a measure of the uncertainty in probability distributions
(see Cover and Thomas 1991): broad distributions have large entropy, and narrow distributions
(where the variable in question is more localized to particular values) have little entropy. The
specific information is a difference in entropy of stimulus distributions (equation (3)), and thus
represents the amount that the initial uncertainty of the stimulus is reduced by observing the
response r. Furthermore, specific information is calibrated to mutual information: the specific
information is a direct measure of the amount learned by a given measurement, while the mutual
information is the amount learned by a measurement on average over all possible responses.
Thus, the specific information serves as an information theoretic measure through which
the significance of different responses can be compared. Responses with a high specific
information reduce the uncertainty in the stimulus the most and are thus most significant to
the system, and responses that do not reduce the amount of uncertainty in the stimulus are less
significant.
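A minimal sketch of equation (3), in the same conventions as the earlier snippet (every response is assumed to occur with nonzero probability):

```python
def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def specific_information(p_joint):
    """i_sp(r) = H[S] - H[S|r] (equation (3)) for every response r."""
    p_r = p_joint.sum(axis=0)                     # marginal p(r)
    H_S = entropy(p_joint.sum(axis=1))            # prior entropy H[S]
    return np.array([H_S - entropy(p_joint[:, j] / p_r[j])
                     for j in range(p_joint.shape[1])])
```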
3. The significance of a stimulus is different to that of a response
Shannon’s mutual information is a statistical measure of the interdependence of two random
variables, which, in the study of sensory systems, is usually stimulus and response. As a
result of this generality, expressions for information theoretic quantities do not distinguish
between stimulus and response (for example, see equation (1)), meaning that tools applicable
to responses can be applied to stimuli. For example, the specific information of a stimulus can
be calculated by interchanging response and stimulus in equation (3):
i_{sp}(s) = H[R] - H[R|s].     (4)
Is the specific information an appropriate measure of the information associated with a
particular stimulus? Below I show that, due to the asymmetry of stimulus and response with
respect to causality, a stimulus that is significant to a system does not have the same properties
as a significant response.
Consider a simple joint probability distribution p(r, s) where there are only two stimuli
and two responses, with the probabilities of each stimulus–response pair given by

             r_1     r_2
    s_1      1/4     1/2
    s_2      1/4     0

Without a measurement, the probability of stimulus s_1 is 3/4 and the probability of s_2 is 1/4,
i.e., the prior distribution is p(s) = {3/4, 1/4} with an entropy H[S] = 2 - (3/4) log_2 3 = 0.81 bits.
Before addressing the question of stimulus significance, consider the significant responses
of this system. Each response is equally likely, but conveys different amounts of information
about the stimuli. Observation of the first response r_1 designates an ambiguous situation, with
equal probability of either stimulus: p(s|r_1) = {1/2, 1/2}, so H[S|r_1] = 1 bit. The second response,
however, clearly designates s_1, since p(s|r_2) = {1, 0}: H[S|r_2] = 0 bits. The information
content of each response is reflected in its specific information, with i_sp(r_1) = -0.19 bits
(uninformative), and i_sp(r_2) = H[S] = 0.81 bits. The total mutual information (equation (1))
can be calculated from the specific informations as their weighted average over the ensemble
of responses (equation (2)): I[R, S] = 0.31 bits.
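Using the sketches above, the numbers of this example can be reproduced directly:

```python
p_joint = np.array([[1/4, 1/2],      # p(s1, r1), p(s1, r2)
                    [1/4, 0.0]])     # p(s2, r1), p(s2, r2)
i_sp_r = specific_information(p_joint)
print(i_sp_r)                        # approximately [-0.19, 0.81] bits
print(p_joint.sum(axis=0) @ i_sp_r)  # 0.31 bits = I[R, S], per equation (2)
```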
This example was devised to place an intuitive notion of stimulus significance at odds
with the specific information applied to stimuli. With s_2 present, the response is completely
predictable (r_1), whereas the presence of s_1 could result in either response. As a result, the
specific information applied to stimuli is higher for s_2: i_sp(s_1) = 0.08 bits versus i_sp(s_2) = 1 bit.
However, recall that responses encode information about stimuli, and not the reverse.
Though s_2 has the maximal specific information, neither response specifies it: observation of
r_1 gives an equal probability of s_1 and s_2, and observation of r_2 unambiguously designates s_1.
As a result, from an information transmission standpoint, s_1 is encoded more effectively than
s_2, in contrast to their specific informations.
Why does specific information fail to select the more effectively encoded stimulus?
Specific information is largest for those stimuli that have few responses associated with them,
without regard to whether these responses are informative. For example, consider a neuron
(such as the example discussed later in this paper) that is unresponsive (does not fire) to many
stimuli. These stimuli would have a large specific information because ‘not firing’ is the only
response associated with them. But observing a ‘not firing’ response would be very ambiguous
since there are many stimuli that do not cause the neuron to fire. Thus, the specific information
of a stimulus that the neuron does not respond to would be relatively high, but one would not
say that it was well encoded.
4. Stimulus-specific information
I therefore propose that the most informative stimuli are those that cause the most informative
responses, and are thus well encoded by the system. The responses associated with a stimulus
s are given by the conditional distribution p(r|s), and the information conveyed by a particular
response r is given by its specific information i_sp(r). Thus, I propose an information theoretic
measure of stimulus significance called the SSI, given by

i_{ssi}(s) \equiv \sum_r p(r|s) i_{sp}(r) = \sum_r p(r|s) \{H[S] - H[S|r]\}.     (5)

Since the specific information i_sp(r) is the reduction in uncertainty of the stimulus gained by
the particular observation r ∈ R, the SSI i_ssi(s) is the average reduction of uncertainty gained
from one measurement given the stimulus s ∈ S.
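Continuing the same sketch, equation (5) is a conditional average of the specific informations:

```python
def stimulus_specific_information(p_joint):
    """i_ssi(s) = sum_r p(r|s) i_sp(r) (equation (5)) for every stimulus s."""
    i_sp_r = specific_information(p_joint)
    p_r_given_s = p_joint / p_joint.sum(axis=1, keepdims=True)  # rows: p(r|s)
    return p_r_given_s @ i_sp_r
```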
Figure 1. Mutual information between full-field flicker stimuli and neuronal spikes. (A) A three-
frame stimulus word [-1, -1, +1] and a one-frame response word [1, 1] are associated at a given
latency λ between the end of the stimulus word and beginning of the response word. (B) Mutual
information is calculated between stimulus and response given a choice for L_S = 10, L_R = 1,
and Δt = 1/2, and is shown as a function of latency λ.
When calculated using the example of the last section, we see that i_ssi(s_1) = 0.48 bits, and
i_ssi(s_2) = -0.19 bits. Like the specific information, the weighted average of i_ssi(s) over the
stimulus ensemble s ∈ S gives the mutual information, and the value of the SSI is calibrated to
the mutual information and specific information. For example, the value of i_ssi(s_1) is consistent
with the fact that an observation completely determines the stimulus (giving close to 1 bit of
information) half of the time, and otherwise is not informative. The SSI of s_2 demonstrates
that the only possible measurement that can result when s_2 is presented is uninformative.
5. Simulated visual neurons
To demonstrate the use of SSI, I perform an information theoretic calculation on simulated
visual neurons. Data were generated using a model proposed by Keat et al (2001), which
accurately reproduces the timing of single action potentials as well as their statistical
distribution over multiple trials. Using model parameters that simulate the behaviour of cat
LGN neurons, spike trains were generated in response to random full-field flicker stimuli
presented at 128 Hz (see methods).
Information quantities between the full-field flicker stimulus S and resulting spike trains
R can be calculated as a function of several parameters: the length of the stimulus word L_S,
the length and time resolution of the response word (L_R and Δt), and the latency λ between
them (see figure 1(A)). This method of calculating information is similar to those of Liu et al
(2001). For this paper, I will use L_S = 10 frames, L_R = 1 frame, and Δt = 0.5 frames, where
L_S is chosen to be as large as possible given the limited number of data (see methods), and
L_R and Δt are sufficiently representative of results found with longer response words (data
not shown). General questions about the dependence of mutual information on stimulus and
response word length and resolution are addressed elsewhere (Liu et al 2001).
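This pairing can be sketched as follows; the bookkeeping here is a simplified illustration that assumes stimulus frames and response bins share a common time base, with the lag given in bins (negative when the response word begins before the stimulus word ends), rather than the exact procedure used for the calculations in this paper:

```python
def word_pairs(stim_frames, resp_bins, L_S, L_R, lag):
    """Pair each stimulus word of L_S frames with the response word of
    L_R bins that begins `lag` bins after the stimulus word ends."""
    pairs = []
    for t in range(len(stim_frames) - L_S + 1):
        start = t + L_S + lag
        if 0 <= start and start + L_R <= len(resp_bins):
            pairs.append((tuple(stim_frames[t:t + L_S]),
                          tuple(resp_bins[start:start + L_R])))
    return pairs
```

A normalized histogram over such pairs then estimates the joint distribution p(s, r).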
The mutual information I[R, S] is shown in figure 1(B) as a function of latency λ. It
peaks at a latency of -16.5 ms, when the average response carries 0.37 bits about the stimulus.
Since I[R, S] can be decomposed into SSI (see equation (2)), the total mutual information can
be thought of as an average that includes both stimuli that are well encoded and those that are
not. To determine the contribution of each stimulus to the total mutual information, the SSI is
calculated for each of the 2^10 = 1024 stimuli.
Figure 2. Information measures of particular stimuli. (A) SSI i_ssi(s) and specific information
i_sp(s) are calculated for the 2^10 stimuli at a latency λ of -16.5 ms and their distributions are
displayed as histograms in the left and right frames respectively. Bins with greater than 10 elements
are cut off to focus on the few outlying stimuli. The five stimuli with the largest SSI are labelled
A–E and shown in the inset of the left frame. Two of them, A and C, have the lowest specific
information (right frame). (B) To identify the features of the stimulus ensemble that are well
encoded, an average stimulus is calculated by weighting each stimulus by its SSI (circles). This
SSI-weighted average stimulus is approximately proportional to the spike-triggered average
stimulus of the neuron up to a scale factor (solid curve). Note that the spike-triggered average
stimulus is scaled so it is directly comparable to the units of average SSI.
The left panel of figure 2(A) shows a histogram of the SSI distribution of the stimulus
ensemble. The minimum SSI was 0.085 bits (leaving the lowest bin between 0 and 0.05 bits
empty). However, over half the stimuli (599 out of 1024) have SSIs in the next lowest bin
(between 0.05 and 0.1 bits). At the other extreme, the top five stimuli are well separated from
the rest of the distribution, and are shown in the inset. Note that the 'best' stimulus (A) is a
simple off-to-on transition which occurs at -47.8 ms. Other stimuli with lower SSI have a
slightly different off-to-on latency (C and D), or an 'on' frame instead of an 'off' frame at
large latencies (B and E).
For comparison, the right panel of figure 2(A) shows the distribution of specific
informations for the same stimuli (i_sp(s), given by equation (4)). Specific information gives
nearly the opposite classification of stimuli to the SSI: 486 out of 1024 stimuli are clustered in
the largest occupied bin (0.74 bits). At the same time, stimuli with some of the highest SSIs
have some of the lowest specific informations; stimuli A and C have the two lowest specific
informations (figure 2(A), right), and stimuli B, D, and E are lost in the broader distribution.
As explained above, this is due to the fact that many stimuli rarely elicit a spike, meaning that
they are associated with only one response (‘no spike’), and have high specific information.
At the same time, stimuli that often result in a spike cause an uncertain response since they
may or may not elicit a spike.
Of course, the five stimuli with the largest SSIs are only presented 0.5% of the time,
and almost 90% of the stimuli have an SSI less than 1 bit but account for almost half of
the information conveyed. To determine the important features in the stimulus ensemble
that lead to informative responses, I calculate an average stimulus based on each stimulus's
fractional contribution to the mutual information, p(s) i_ssi(s)/I[R, S]. This 'SSI-weighted
average stimulus' is shown as circles in figure 2(B), and is compared to a more common
measure of significant stimuli: the spike-triggered average stimulus (STA), shown as a solid
curve. Note that these results apply for a given latency λ = -16.5 ms; since the SSI-weighted
average has a resolution of one frame (7.8 ms), there are ten discrete points in this average
(L_S = 10). The SSI-weighted average calculated for other latencies is in similar agreement
with the STA (data not shown).
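In terms of the earlier sketches, the SSI-weighted average is a weighted sum over stimulus words; stim_words is assumed to hold the 1024 words as rows of a ±1 matrix:

```python
def ssi_weighted_average(stim_words, p_s, i_ssi, I_total):
    """Average stimulus, each word weighted by its fractional contribution
    p(s) i_ssi(s) / I[R, S] to the mutual information."""
    return (p_s * i_ssi / I_total) @ stim_words
```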
The close agreement between the spike-triggered and SSI-weighted average stimulus
results from a combination of two factors:
(1) the ability of particular stimuli to evoke spikes in the Keat model is proportional to a linear
convolution with the stimulus (see the methods for more details); and
(2) the majority of information in the responses of this neuron is carried by spikes (found by
calculating the specific information of responses i_sp(r) of equation (3); data not shown).
As a result of these two conditions, the SSI of a given stimulus is roughly proportional to the
number of spikes associated with it, i.e., i_ssi(s) ≈ p(spike|s) i_sp(spike) ∝ p(spike|s).
Thus, each stimulus contributes to the spike-triggered and SSI-weighted averages in proportion
to the number of spikes that each stimulus evokes, resulting in the same shape shown in
figure 2(B).
6. Specific surprise identifies a different aspect of stimulus significance
Thus, the SSI gives an appropriate characterization of the ‘best encoded’ stimuli, in stark
contrast to the specific information. However, as mentioned earlier, there are many possible
stimulus-specific decompositions of mutual information (equation (2)), some of which might
provide alternative but reasonable classifications. For example, in their discussion of a measure
of the information associated with individual symbols (i.e., stimuli and responses), DeWeese
and Meister (1999) consider the ‘specific surprise’, given by (when applied to stimuli)
i_{sur}(s) = \sum_r p(r|s) \log_2 \frac{p(r|s)}{p(r)}.     (6)

Note that this is the Kullback–Leibler distance between the conditional probability p(r|s) and
the marginal distribution p(r). Below, I demonstrate that the specific surprise provides an
alternative measure of stimulus significance, but it highlights other properties of the stimulus
in a way that lacks the intuitive meaning of the SSI as a measure of the information associated
with a stimulus.
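A sketch of equation (6) in the same conventions as the earlier snippets:

```python
def specific_surprise(p_joint):
    """i_sur(s) = sum_r p(r|s) log2[p(r|s)/p(r)] (equation (6)), in bits."""
    p_r = p_joint.sum(axis=0)
    p_r_given_s = p_joint / p_joint.sum(axis=1, keepdims=True)
    out = np.zeros(p_joint.shape[0])
    for i, row in enumerate(p_r_given_s):
        nz = row > 0                       # skip responses that never occur
        out[i] = np.sum(row[nz] * np.log2(row[nz] / p_r[nz]))
    return out
```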
Specific surprise, in the form of equation (6), compares the marginal distribution p(r) to
the conditional distribution p(r|s). As discussed above in relation to specific information, such
Figure 3. The specific surprise. (A) The distribution of specific surprises over the stimulus
ensemble is calculated for the same visual neuron as in figure 2. Though the distribution is very
similar to the SSI distribution (figure 2(A), left), with the same top five stimuli, there are subtle
differences, such as their order (A–B–D–E–C). (B) The specific-surprise-weighted average
stimulus (filled circles) is compared to the spike-triggered average stimulus (solid curve, scaled
for the best fit) and the best fit of the SSI-weighted average from figure 2(B) (dotted curve).
a comparison is not intuitively meaningful with respect to causality. However, using Bayes'
law, specific surprise can be re-expressed as

i_{sur}(s) = \sum_r p(r|s) \left[ \log_2 \frac{1}{p(s)} - \log_2 \frac{1}{p(s|r)} \right].     (7)

The logarithm of the reciprocal probability of a stimulus is often referred to as its 'surprise',
since rarer stimuli are more 'surprising'. In this form, specific surprise has a causal meaning:
the reduction in surprise of a particular stimulus gained from each response, averaged over
all responses associated with that stimulus. Thus, whereas the SSI weighs each response r
by its effect over the stimulus ensemble (through the change of H[S] to H[S|r]), the specific
surprise measures the effect of the response on just the stimulus in question (through the
change of p(s) to p(s|r)).
How does using a different response weight change the evaluation of stimulus significance?
In figure 3(A), the specific surprise is calculated for the simulated visual neuron. It provides
classifications of the stimuli similar to the SSI, and identifies the same top five stimuli, though
with a different order (A–B–D–E–C). The subtle differences between the two measures in
this example are reflected in a comparison between the SSI-weighted average of figure 2(B)
and the specific-surprise-weighted average, calculated in the same way, shown in figure 3(B).
The specific-surprise-weighted average is shown as solid circles, and the best scaling of the
spike-triggered average is shown as a solid curve. Meanwhile, since the SSI-weighted
average fit the STA almost exactly (figure 2(B)), the STA scaled to the SSI-weighted average
is shown as a dotted curve. Note that the surprise-weighted average stimulus has a different
shape to the STA and a smaller magnitude than the SSI-weighted average, meaning that
larger specific surprise does not correspond as closely to particular features of the stimulus.
The specific surprise and the SSI behave more distinctly in the simple example of the 2 × 2
joint probability distribution described earlier in this paper. In this case, the specific surprise is
higher for the second stimulus s_2 (which only had one ambiguous response r_1 associated with
it): i_sur(s_1) = 0.08 bits and i_sur(s_2) = 1 bit. This contrasts with the intuitive notion that s_1 is
better encoded than s_2, since s_2 is never clearly designated by a response. Does this mean that
specific surprise is a bad measure of a well-encoded stimulus?
Figure 4. A non-linear neuron. The distribution of the SSI for the 2^10 stimuli is calculated for a
non-linear neuron whose spike-triggered average is zero. This neuron is designed such that
opposite stimuli evoke the same response, resulting in pairs of stimuli that have the same SSI
(inset).
In fact, it is simply a different measure with a different interpretation: while s_1 is
unambiguously denoted by an observation of r_2 (changing its probability: 3/4 → 1), the
measurement of r_1 decreases its likelihood (3/4 → 1/2), nearly cancelling the change in surprise:
i_sur(s_1) = 0.08 = (1/3)(-0.59) + (2/3)(0.42) bits. In the meantime, the specific surprise of s_2 is
relatively large (i_sur(s_2) = 1 bit) because the only response associated with s_2 is r_1, and it
leads to an increase in the probability of observing s_2: 1/4 → 1/2. Thus, each possible response
contributes to the specific surprise of a particular stimulus s in relation to how much its
observation increases the likelihood of s.
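With the 2 × 2 table and the specific surprise sketch from earlier, these values can be checked directly:

```python
print(specific_surprise(p_joint))   # approximately [0.08, 1.0] bits
```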
Why do the specific surprise and the SSI give similar results with the visual neuron? For
a response to be informative (i.e., have a large specific information i_sp(r)), the entropy of
its conditional distribution p(s|r) must be less than that of the prior p(s), meaning that the
probability of some stimuli must be increased. As a result, in many cases, an informative
response will result in the largest changes in probability of the stimuli that are best encoded,
and the specific surprise and the SSI will be in qualitative agreement. However, as the simple
2 × 2 example made clear, this is not always the case.
So, the specific surprise and the SSI measure fundamentally different aspects of
stimulus significance. Since the SSI explicitly measures an average difference in entropies,
it has intuitive meaning with regard to information transmission. As a result, the SSI in both
the simple 2 × 2 example and the visual neuron fulfils this intuition of what a well-encoded
stimulus is, in contrast to the specific surprise.
7. Non-linear neurons
In the example of the visual neuron, the SSI correctly identified the significant stimuli to a
neuron, in a way that was validated by linear analysis of the neuron, i.e., the spike-triggered
average. In this way, the agreement between the spike-triggered and SSI-weighted average
stimulus demonstrated that the neuron is essentially linear. Thus, while this example validates
the performance of the SSI, it makes clear that linear analysis would have been sufficient in
identifying the significant stimuli in this case.
Information theory is most useful in studying neurons with non-linear properties, or those
that do not encode the bulk of their information in single spikes. To demonstrate the potential
utility of the SSI, the visual neuron model discussed above is modified to exhibit non-linear
behaviour. This modified model retains the same spiking properties, but is designed such
that it has, on average, the same response to a stimulus as it does to the opposite stimulus
(see the methods for more details). For example, an on-to-off transition will now evoke the
same number of spikes as an off-to-on transition with the same latency. As a result, the spike-
triggered average stimulus of this neuron is zero, since a spike is just as likely to be evoked by a
particular stimulus as its opposite. In this case, important stimuli can only be identified through
higher-order statistics, such as the spike-triggered covariance used by Arcas et al (2000).
Of course, differences in higher-order statistics are detected by information theoretic
measures, and the SSI distribution for the modified non-linear neuron is shown in figure 4
(calculated in the same way as the distribution of figure 2(A)). The six stimuli with the largest
SSI are shown in the inset, demonstrating the pairing of each stimulus with its opposite.
8. Conclusions
I have shown that the SSI is an appropriate and reliable measure of the information associated
with a particular stimulus. As with other information measures, the SSI is calculated without
particular assumptions about the coding scheme, and is robust to non-linearities in the system
being studied. As a result, the SSI is particularly useful in identifying the stimuli that are
significant to a neuron where linear analyses break down. The SSI is also useful in cases
where neural responses other than individual spikes carry information, though such examples
were not considered in this paper.
Unfortunately, unlike the specific information of a response proposed by DeWeese and
Meister (1999), the SSI does not have a mathematical quality that shows it to be the only
possible measure of information associated with a stimulus. As previously discussed, there
are many possible stimulus-specific decompositions of mutual information; as a result, I have
chosen a decomposition that has an intuitive meaning with respect to information transmission,
and furthermore have shown it gives expected results in both simple constructed examples and
realistic examples using neuronal data.
The SSI has the same drawbacks as other information measures: a significant number
of data are required in order to properly estimate the underlying probability distributions
needed by such calculations. However, since the same data as are used to calculate Shannon’s
mutual information can also be used for the specific information of responses and the SSI of
stimuli, these specific measures can extend the applicability and use of information studies in
neuroscience.
Methods
Simulated visual neurons
Data for the information calculations performed on the example visual neuron in this paper
were generated using a model of visual neurons proposed by Keat et al (2001). This
phenomenological model is able to reproduce both the precise spike timing and variability
of observed cat LGN neurons, and thus generated 'realistic' neuronal data for the purpose of
evaluating information quantities.
The simulated neuron is presented with either black or white frames (with equal probability)
at 128 Hz, and generates a neural spike train. The basis of the neural response is a linear
convolution between the stimulus s(t) and a linear kernel K(τ):

g(t) = \int_{-\infty}^{0} d\tau \, K(\tau) s(t - \tau).
This function of time g(t) is modified by adding correlated Gaussian noise and a term that
represents spike-dependent effects, which accounts for a neuronal refractory period. Spikes
occur when the total of g(t), noise, and spike-dependent effects exceeds a threshold.
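This generative scheme can be caricatured in a few lines. The following is a toy sketch rather than the Keat et al (2001) model itself: the kernel and threshold are placeholders, the noise is white rather than correlated, and spike-dependent effects are omitted:

```python
rng = np.random.default_rng(0)

def simulate_spikes(stimulus, kernel, threshold=1.0, noise_sd=0.2):
    """Toy linear-threshold neuron: convolve, add noise, threshold."""
    g = np.convolve(stimulus, kernel)[:len(stimulus)]    # g(t) = K * s
    noisy = g + noise_sd * rng.standard_normal(len(g))   # noise stand-in
    return (noisy > threshold).astype(int)               # 1 = spike in bin
```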
For the information calculations, a large data set consisting of 32 million stimulus frames
(representing roughly 70 h in real time) was used. This copious data set allowed for an
information analysis of long stimulus and response words simultaneously, though information
theoretic analysis is possible with far fewer data (see Liu et al 2001).
Non-linear neurons
To generate a neuron with a spike-triggered average of zero, the same model as above was
used, with one modification: the linear convolution g(t) was squared before plugging it into
the rest of the model. As a result, opposite stimuli, which result in g_0 and -g_0 in the linear
model, now have the same effect on the response: (g_0)^2. Other parameters of the model were
scaled so that this model had the same overall spike rate.
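In the same toy sketch, the modification amounts to squaring g(t) before thresholding, so that a stimulus and its negation drive identical responses:

```python
def simulate_spikes_nonlinear(stimulus, kernel, threshold=1.0, noise_sd=0.2):
    """As above, but g(t) is squared, so s and -s drive identical responses."""
    g = np.convolve(stimulus, kernel)[:len(stimulus)]
    noisy = g**2 + noise_sd * rng.standard_normal(len(g))
    return (noisy > threshold).astype(int)
```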
Acknowledgments
I am grateful to Mark Goldman for extensive input and comments on this manuscript. This
work was supported by an NSF Postdoctoral Fellowship in Biological Informatics.
References
Arcas B A Y, Fairhall A L and Bialek W 2000 What can a single neuron compute? Adv. Neural Inform. Process. Syst.
13 75–81
Borst A and Theunissen F E 1999 Information theory and neural coding Nature Neurosci. 2 947–57
Cover T M and Thomas J A 1991 Elements of Information Theory (New York: Wiley)
Dan Y, Alonso J M, Usrey W M and Reid R C 1998 Coding of visual information by precisely correlated spikes in
the lateral geniculate nucleus Nature Neurosci. 1 501–7
DeWeese M R and Meister M 1999 How to measure the information gained from one symbol Network: Comput.
Neural Syst. 10 325–40
Keat J, Reinagel P, Reid R C and Meister M 2001 Predicting every spike: a model for the responses of visual neurons
Neuron 30 803–17
Liu R C, Tzonev S, Rebrick S and Miller K D 2001 Variability and information in a neural code of the cat lateral
geniculate nucleus J. Neurophysiol. 86 2789–806
Reinagel P and Reid R C 2000 Temporal coding of visual information in the thalamus J. Neurosci. 20 5392–400
Theunissen F, Roddey J C, Stufflebeam S, Clague H and Miller J P 1995 Information theoretic analysis of dynamical
encoding by four identified primary sensory interneurons in the cricket cercal system J. Neurophysiol. 75 1345–64