Sagvolden, Johansen, Aase, and Russell (Sagvolden et al.) examine attention-deficit/hyperactivity disorder (ADHD) at levels of analysis ranging from neurotransmitters to behavior. At the behavioral level they attribute aspects of ADHD to anomalies of delay-of-reinforcement gradients. With a normal gradient, responses followed after a long delay by a reinforcer may share in the effects of that reinforcer; with a diminished or steepened gradient they may fail to do so. Steepened gradients differentially select rapidly emitted responses (hyperactivity), and they limit the effectiveness with which extended stimuli become conditioned reinforcers, so that observing behavior is less well maintained (attention deficit). Impulsiveness also follows from steepened gradients, which increase the effectiveness of smaller, more immediate consequences relative to larger, more delayed ones. Individuals who vary in the degree to which their delay gradients are steepened will show different balances between hyperactivity and attention deficit. Given the range of ADHD phenomena addressed, it may be unnecessary to appeal to additional behavioral processes such as extinction deficit. Extinction deficit is more likely a derivative of attention deficit, in that failure to attend to stimuli differentially correlated with extinction should slow its progress. The account suggests how relatively small differences in delay gradients early in development might engender behavioral interactions leading to very large differences later on. The steepened gradients presumably originate in properties of neurotransmitter function, but behavioral interventions that use consistently short delays of reinforcement to build higher-order behavioral units as a scaffolding to support complex cognitive and social skills may nonetheless be feasible.
Sagvolden, Johansen, Aase, and Russell (SJA&R) provide an interpretation of attention-
deficit/hyperactivity disorder (ADHD) at levels of analysis that range from neurotransmitters to
behavior. In the long run, the success of their account will depend on the adequacy with which
fine details of dopamine systems are linked via grosser cellular and neuroanatomical levels to
their eventual molar behavioral products. To the extent that evolutionary contingencies have
selected nervous systems on the basis of the behavior that they engender, we must understand the
properties of that behavior if we are to understand how the brain serves it (Catania 2000). My
main objective here is to elucidate aspects of SJA&R’s account that bear on the possible roles of
delay-of-reinforcement gradients and other behavioral phenomena in producing ADHD.
The ubiquity of delayed reinforcement. Much important behavior, called operant
behavior, occurs because of its consequences, that is, its effects on the environment. Some
important consequences are those that afford opportunities for new behavior, as when something
one does allows eating or drinking or playing, or as when one’s shift of attention leads to new
things seen or felt or heard. Responses that produce particular consequences are said to be
members of operant classes. Some consequential effects are immediate and others are delayed,
and their immediacy determines the potency with which they change or maintain behavior. In
other words, the extent to which consequences such as reinforcers operate to alter the future
likelihood of responses in the class that produced them depends, along with many other
variables, on the delays between the responses and their consequences.
Delay of reinforcement is ubiquitous even when reinforcers are delivered very promptly after
responses, because other responses typically precede the one that actually produces the reinforcer
(Dews 1962). “The reinforced response is followed by the reinforcing stimuli; the preceding
unreinforced responses are also followed by the reinforcing stimuli, though not quite so
promptly. Indeed, the whole pattern of . . . responding is followed by the reinforcing stimuli and
so, in a sense, is reinforced” (Dews 1966, p. 578). It was once regarded as paradoxical that
schedules of intermittent reinforcement produced more behavior than the reinforcement of every
response. But if only every 10th response produces a reinforcer, 10 responses, not just the last
one, share in the effects of that reinforcer. The earlier responses make a smaller contribution than
the later ones by virtue of the longer delays that separate them from the reinforcer, but the sum of
all 10 contributions is necessarily greater than that from the 10th response alone.
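The arithmetic behind this point can be sketched in a few lines. The sketch assumes, purely for illustration, an exponential gradient; the decay rate and the one-second spacing between responses are hypothetical values, not estimates from any data set.

```python
import math

def gradient(delay, decay_rate=0.2):
    """Hypothetical exponential delay gradient: the effect of a reinforcer
    on a response that preceded it by `delay` seconds."""
    return math.exp(-decay_rate * delay)

def summed_effect(n_responses, spacing=1.0, decay_rate=0.2):
    """Sum the contributions of n responses emitted `spacing` seconds apart,
    the last of which produces the reinforcer at zero delay."""
    return sum(gradient(i * spacing, decay_rate) for i in range(n_responses))

last_only = summed_effect(1)   # only the reinforced response is counted
all_ten = summed_effect(10)    # the nine earlier responses also share
print(last_only, all_ten)
```

Because every term in the sum is positive, the total for 10 responses exceeds the contribution of the 10th response alone for any decay rate, although the size of the excess depends on the assumed rate and spacing.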
One way of thinking about how reinforcers work is to assume that responses weighted according
to a decay function by the delays that separate them from a reinforcer contribute to a reserve of
potential behavior, and that subsequent responding depends on the magnitude of that reserve,
which is then depleted when responding occurs without reinforcement (e.g., Catania 2001;
Catania 2003). Skinner (1938) proposed a reserve that received contributions only from the
response that just preceded the reinforcer, but retracted the proposal when it became clear that it
could not accommodate data from schedules of reinforcement (Skinner 1940). The retraction
might have been unnecessary if the contributions of responses preceding the one that produced
the reinforcer had been recognized (Catania 1971).
Furthermore, delays may affect behavior in other ways. The onset of a stimulus that sets the
occasion for responding may be followed by a reinforced response after a shorter or a longer
delay. If reinforcers are delivered in its presence, the stimulus will become a conditional
reinforcer, but its potency will depend on the delay (Dinsmoor 1983; 1995). One simple but
exceedingly important response that is maintained by such a stimulus is that of attending to it. A
stimulus in the presence of which an opportunity for reinforcement is likely to arise very soon is
more likely to be observed or looked at or attended to than one in the presence of which that
opportunity is still some time away.
Experimental assessments of delay gradients. Figure 1 provides examples of two delay
gradients obtained with pigeons. The first shows rates of responding as a function of the time
between one response and the later reinforcement of a different response; the second shows rates
of responding maintained by a response-produced stimulus as a function of the time between the
onset of that stimulus and the subsequent delivery of a reinforcer in its presence. In both cases
the data have been fit by exponential decay functions. Candidates for the delay gradient have
included exponential, hyperbolic, and logarithmic functions, but the appropriateness of one or
the other depends on both procedural and statistical considerations. For example, integrals of
hyperbolic functions approach logarithmic functions, so the former are better fits to data from
procedures that assess one point on the gradient at a time, whereas the latter are better fits to data
from procedures that assess rates of responding over long time periods and therefore across a
range of delays. Furthermore, variance in the decay parameters of exponential functions may
generate hyperbolic functions when data are averaged (Killeen 1994; Killeen 2001).
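The averaging point can be checked numerically. The following is a minimal sketch, assuming that decay rates vary across subjects or sessions according to an exponential distribution; that distribution is an assumption chosen here for convenience, because the average of the individual exponential gradients then has the closed form 1 / (1 + mean_rate * t), which is a hyperbola. The function names and parameter values are hypothetical.

```python
import math

def mean_of_exponentials(t, mean_rate, steps=20_000, upper=60.0):
    """Average exp(-rate * t) over decay rates drawn from an exponential
    distribution with the given mean (an assumed distribution)."""
    # Midpoint-rule integration over rates in [0, upper * mean_rate].
    width = upper * mean_rate / steps
    total = 0.0
    for i in range(steps):
        rate = (i + 0.5) * width
        density = math.exp(-rate / mean_rate) / mean_rate
        total += density * math.exp(-rate * t) * width
    return total

def hyperbola(t, k):
    """Hyperbolic gradient with parameter k."""
    return 1.0 / (1.0 + k * t)

for t in (0.0, 1.0, 4.0, 10.0):
    print(t, mean_of_exponentials(t, 0.5), hyperbola(t, 0.5))
```

Under this assumption each individual gradient is exponential, yet the averaged gradient matches the hyperbola at every delay checked. Other distributions of decay rates would give other averaged forms.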
Figure 1. Pigeon 73: Rate of left-key pecks as a function of the delay between the last left-key
peck (*) and a reinforcer produced by a right-key peck (). Pigeon 47: Rate of key-A pecks as
a function of the delay between the key-A peck that turned on the key-B stimulus (*) and the
later production of a reinforcer by a key-B peck in the presence of that stimulus (). Procedures
are shown schematically below each graph.
The first experiment illustrated in Figure 1 involved random-interval reinforcement of a
sequence of pecks on two keys by a pigeon. For example, if reinforcement was contingent on
exactly four left pecks followed by exactly four right pecks, left pecks would always be
separated from the reinforcer by the time taken to emit the right pecks, and that time could be
manipulated by varying the required number of right pecks. The data for Pigeon 73 in Figure 1
were obtained by varying the required number of pecks on the right key (R), while the number
required on the left key (L) was held constant (cf. Catania 1971). Similar data can be generated
with procedures that alter the time it takes for the pigeon to emit its right-key pecks; such
procedures demonstrate that time rather than the intervening number of responses is the
appropriate dimension along which to measure the effects of delayed reinforcers (cf. Catania
The second experiment involved an observing-response procedure (Kelleher et al. 1962). During
successive presentations of yellow on the right key (B), contingencies irregularly alternated
between a fixed-interval schedule of reinforcement and an equal duration of extinction. These
presentations were preceded by brief presentations of the left or observing-response key (A), lit
white. If a white-key (observing) peck occurred during a brief window of time before the onset
of the right-key stimulus, the right key lit green if the current contingency was fixed-interval
reinforcement, and the right key lit red if it was extinction. Procedures that allow observing
pecks to produce only green if fixed interval or only red if extinction show that observing pecks
are maintained because green under these circumstances functions as a conditional reinforcer.
Essentially, pigeons peck the observing key to get a look at green on the right key. But, as shown
in Figure 1, the rate of left-key pecking decreases as a function of the duration of the fixed
interval. The potency of green as a conditional reinforcer that maintains the observing response
depends on the delay from the onset of green to the later delivery of a reinforcer. A substantial
body of evidence demonstrates that organisms work to observe discriminative stimuli correlated
with the delivery of reinforcers; they do not work to observe discriminative stimuli that are
equally informative but are instead correlated with extinction or aversive events (Dinsmoor
1983; 1995).
Both delay gradients in Figure 1 extend over many seconds. They are the facts about behavior
that must be taken into account by hypotheses about mechanism. The gradients may be expected
to vary as a function of a variety of parameters, and their properties are presumably influenced
by such factors as whether response sequences are homogeneous or heterogeneous and whether
the responses that make up those sequences are relatively simple units or are instead integrated
higher-order, and perhaps temporally extended, ones (Catania 1995; 1998). In any case, the
durations of the delays considered here differ by orders of magnitude from those of synaptic
events or even of cascading neuronal processes involving large numbers of cells.
Implications of anomalous delay gradients. Now we are ready to examine the
implications for ADHD. As argued by SJA&R, the two major components of ADHD,
hyperactivity and attention deficit, can each be interpreted as consequences of a delay-of-
reinforcement gradient that is more limited in its temporal range than the ordinary delay gradient.
Figure 2 illustrates the rationale by comparing one hypothetical exponential decay gradient with
another that declines more steeply. Each gradient is assumed to end when it reaches the previous
reinforcer, based on data showing that the retroactive effects of reinforcers do not extend back
past the previous reinforcer to still earlier responses (Catania et al. 1988), though this blocking
might be attenuated in situations where reinforcers vary in kind or magnitude.
Figure 2. A hypothetical normal delay gradient (1) and one that decays more steeply over time
(2). Each gradient represents the magnitude of the effect of a reinforcer (arrow) on events that
occur at different earlier times. Illustrative response sequences are shown in A and B;
illustrative discriminative stimuli (and therefore potential conditional reinforcers) are shown in C
and D (cf. Figures 8 and 10 in SJA&R).
If gradient 1 operates for the reinforced behavior of a given organism at a given time, then the
five responses in A as well as the five in B will share in the effects of the reinforcer, but the
summed effects in B will clearly be greater than those in A. Similarly, it will support the stimuli
in both C and D as conditional reinforcers, but the effectiveness as a conditional reinforcer of
the stimulus in C will clearly be weaker than that in D. With gradient 2, however, the early
responses in A and the stimulus with early onset in C will be outside the range of effectiveness of
the reinforcer, because at those longer delays the gradient is at near-zero levels. This gradient
will differentially strengthen relatively rapid sequences of responses, and only stimuli with
relatively short delays from onset to reinforcer will be sufficiently effective as conditional
reinforcers to sustain observing behavior. The outcome will be rapid responding accompanied by
deficits in observing behavior or, in other words, hyperactivity plus attention deficit. The
differential strengthening of relatively rapid responding takes time, so a delay function like that
of gradient 2 may engender hyperactivity; but the hyperactivity may take a while to develop and
may develop separately in different environments.
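The comparison of the two gradients can be made concrete with a small sketch. It assumes exponential gradients and treats the potency of a stimulus as a conditional reinforcer as the height of the gradient at stimulus onset; both are simplifying assumptions, and every numerical value (decay rates, response spacing, stimulus onsets) is hypothetical.

```python
import math

def gradient(delay, decay_rate):
    """Exponential delay gradient (assumed form)."""
    return math.exp(-decay_rate * delay)

def sequence_strength(n_responses, spacing, decay_rate):
    """Summed effect of a reinforcer on n responses spaced `spacing`
    seconds apart, the last response at zero delay."""
    return sum(gradient(i * spacing, decay_rate) for i in range(n_responses))

def earliest_effect(n_responses, spacing, decay_rate):
    """Effect reaching the first (most delayed) response of the sequence."""
    return gradient((n_responses - 1) * spacing, decay_rate)

def stimulus_potency(onset_delay, decay_rate):
    """Assumed potency of a stimulus whose onset precedes the reinforcer
    by `onset_delay` seconds."""
    return gradient(onset_delay, decay_rate)

NORMAL, STEEP = 0.1, 0.5   # hypothetical decay rates per second (gradients 1, 2)
SLOW, RAPID = 4.0, 1.0     # hypothetical seconds between responses (A, B)
EARLY, LATE = 16.0, 4.0    # hypothetical onset-to-reinforcer delays (C, D)

for label, rate in (("gradient 1", NORMAL), ("gradient 2", STEEP)):
    print(label,
          "A sum:", round(sequence_strength(5, SLOW, rate), 3),
          "B sum:", round(sequence_strength(5, RAPID, rate), 3),
          "A first:", round(earliest_effect(5, SLOW, rate), 4),
          "B first:", round(earliest_effect(5, RAPID, rate), 4),
          "C:", round(stimulus_potency(EARLY, rate), 4),
          "D:", round(stimulus_potency(LATE, rate), 4))
```

With these values the normal gradient reaches every response in both sequences and supports both stimuli, whereas the steeper gradient leaves the early responses of the slow sequence and the early-onset stimulus at near-zero levels.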
The case for steepened delay gradients as a mechanism underlying ADHD is strengthened by
comparisons of the behavior of Wistar Kyoto (WKY) and spontaneously hyperactive (SHR) rats
(though the latter abbreviation was originally based on the hypertension of those rats, which was
discovered first, rather than on their hyperactivity). SJA&R present the argument for SHR rats as
a nonhuman model for ADHD in some detail (and see also Sagvolden 2000; Sagvolden et al.
1993; 1988). In other research with WKY and SHR rats, reinforcers were arranged for a fixed
consecutive number of responses on one lever followed by a single response on a second lever,
and longer response sequences were maintained by WKY rats than by SHR rats (Evenden &
Meyerson 1998). This is what we would expect if delay gradients for SHR rats were abridged or
steepened relative to those of WKY rats, and it suggests that a direct comparison of delay
gradients for SHR and WKY rats in experiments similar to those illustrated in Figure 1 would be
of substantial interest. And if a quick way could be developed to obtain such gradients from non-
ADHD and ADHD children (say, using computer games on laptop computers), such data would
not only help to validate SJA&R’s SHR model but might also be of considerable diagnostic
To this point I have considered only gradients based on reinforcing events. It would be useful to
know about the properties of delay gradients involving aversive stimuli. Aversive stimuli may
reduce behavior when they are contingent on responses in punishment procedures, or they may
maintain behavior when they are postponed or canceled by responses in avoidance procedures
(Catania 1998, pp. 88–110). Steepened gradients would probably make a difference in either
case. Steepened punishment gradients would reduce the effectiveness of both natural punishment
contingencies (e.g., getting burned on touching a hot stove) and artificial ones (getting scolded
after teasing a sibling); this could be manifested in proneness to accidents as well as in
disobedience. Steepened avoidance gradients would make it more difficult to maintain avoidance
behavior, because such behavior makes only indirect contact with aversive events (after a
successful avoidance response, nothing happens); this could be manifested in risk-taking or other
varieties of carelessness.
Impulsivity. One aspect of behavior often included in diagnoses of ADHD is impulsivity or
impulsiveness, where behavior with fairly immediate consequences dominates over behavior
with larger but more delayed consequences. Impulsivity is sometimes described in terms of
executive dysfunction, disinhibition, or failure to withhold behavior, and it is typically regarded
as the inverse of self-control (Rachlin & Green 1972). An account of impulsivity and self-control
in terms of hypothetical delay gradients is illustrated in Figure 3 (cf. Rachlin 1995, Fig. 1, p.
Imagine a rat given access to two levers on trials that occur every minute or so. A press on the
first lever 10 seconds into the trial or later produces a small reinforcer, and a press on the second
lever 30 seconds into the trial or later produces a large reinforcer. Each trial ends as soon as
either reinforcer is delivered. If 10 seconds pass and the rat presses the first lever, it receives the
small reinforcer but has permanently lost the large one on that trial. The only way to obtain the
later large reinforcer is to refrain from pressing the first lever until the large reinforcer is
available for a press on the other lever. On the left, Figure 3 shows the respective exponential
decay gradients engendered by the smaller but earlier reinforcer arranged for the first response at
time A and by the larger but later reinforcer arranged for the other response at time B.
This example assumes some separate experience with the contingencies arranged for each lever.
A rat in this situation for the first time might start with presses on the A lever, always producing
the smaller, more immediate reinforcer, and so might never reach the time at which its press on
the B lever could produce the larger but later one. The relative heights of the respective gradients
can be taken as representing the relative likelihoods of the two responses during the time leading
up to the earlier reinforcer. The two gradients are shown starting at different maxima reflecting
the different A and B reinforcer magnitudes; if they started at equal maxima and decayed at
equal rates, they could not cross at E.
In this example, the B response is more probable than the A response up until time E, but
thereafter the A response becomes more probable. One way to overcome the higher probability
of A (or, in other words, to show self-control rather than impulsiveness) is if a B response prior
to time E becomes a commitment of some kind. For example, the B response might make the A
response unavailable (perhaps via retraction of the A lever) for the remainder of the time until
the B reinforcer becomes available. Under such circumstances, we might observe many instances
of self-control, in the sense that B responses committing to the later larger reinforcer would
occur before any A responses that would produce the smaller earlier reinforcer and therefore end
the sequence.
Figure 3. Hypothetical normal (A and B) or anomalous (C and D) delay gradients based on a
relatively small reinforcer at an early time (A or C) and a larger one at a later time (B or D). If
the relative height of the gradient at a given moment is a predictor of changing preference
between the smaller and larger reinforcers, the gradients on the left generate impulsiveness, or
selection of the more immediate smaller rather than the more delayed larger reinforcer, only between
E and A; a commitment made prior to E results in selection of B and would be regarded as an
instance of self-control. With the steeper gradients on the right, however, impulsiveness
prevails throughout the entire range of delays.
Now consider the steeper gradients on the right in Figure 3. In this instance, the gradient
engendered by the smaller earlier reinforcer is everywhere higher than the other gradient in the
time leading up to C, even though the D gradient starts at a relatively higher maximum. With
these steepened gradients, there will be no circumstances in which the probability of the D
response exceeds that of the C response, so self-control will be completely displaced by
impulsivity. Impulsivity follows so directly from these kinds of gradients that it is not necessary
to appeal to deficient extinction or executive dysfunction.
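The crossover argument can be illustrated with a minimal sketch. One caution applies: two exponential gradients that share a single decay rate keep a constant ratio and so cannot cross, whatever their maxima. The sketch therefore assumes that the gradient for the smaller reinforcer decays faster than the gradient for the larger one. That assumption, the steepening factor, and all numerical values are hypothetical and are not taken from Figure 3.

```python
import math

def gradient_height(t, magnitude, reinforcer_time, decay_rate):
    """Height at trial time t of an exponential gradient that extends
    backward in time from a reinforcer."""
    return magnitude * math.exp(-decay_rate * (reinforcer_time - t))

def crossover_time(small, large):
    """Trial time at which the two gradients are equal. Each argument is
    (magnitude, reinforcer_time, decay_rate); the two rates must differ."""
    m_s, t_s, k_s = small
    m_l, t_l, k_l = large
    return (math.log(m_l / m_s) + k_s * t_s - k_l * t_l) / (k_s - k_l)

def steepen(params, factor):
    """Multiply the decay rate by `factor`, leaving the rest unchanged."""
    magnitude, reinforcer_time, decay_rate = params
    return (magnitude, reinforcer_time, decay_rate * factor)

small = (1.0, 10.0, 0.25)   # smaller reinforcer at 10 s, faster decay (assumed)
large = (3.0, 30.0, 0.10)   # larger reinforcer at 30 s, slower decay (assumed)

e_normal = crossover_time(small, large)
e_steep = crossover_time(steepen(small, 3), steepen(large, 3))
print(e_normal, e_steep)
```

With these values the normal gradients cross a few seconds into the trial, so a commitment made before that time would favor the larger reinforcer. After both decay rates are tripled the computed crossover falls before the start of the trial, which means that the gradient for the smaller reinforcer is the higher one throughout the time leading up to it.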
For impulsivity as for hyperactivity and attention deficit, no problems are posed by issues of
localization, such as SJA&R’s discussion of dopaminergic systems in mesolimbic, mesocortical,
and nigrostriatal branches (e.g., Fig. 1 SJA&R). Delay gradients with common decay properties
could as easily operate for behavior classes intermixed within a single area as for those discretely
localized in separate areas.
Individual differences in the balance between hyperactivity and attention deficit.
As outlined in SJA&R’s review of ADHD, some individuals display both hyperactivity and
attention deficit, but in others one or the other component dominates. These individual
differences vary with gender, age, and other variables (e.g., Sagvolden & Berger 1996). They can
be accommodated by assuming delay gradients that decline at different rates. Varieties of
presentation of ADHD symptoms are perhaps best viewed not as separate classes but rather as
lying along a continuum involving rate of decay of the delay gradient as a parameter. Two ways
in which delay gradients might vary are illustrated in Figure 4.
Figure 4. On the left, the hypothetical delay gradients descend exponentially from common
maximum values. In this instance, the normal gradient (a) is the highest, and all other gradients
are based on decrements relative to it. On the right, a similar family of gradients has been
transformed so that the area under each curve is a constant. In this instance, the normal
gradient (b) is the one that intersects the y-axis at the lowest point, so that the other gradients
show decrements relative to it at longer delays and increments at shorter delays.
Consider first the family of gradients on the left, in which the highest gradient (a) represents a
normal or non-ADHD gradient. Let us start with the steepest gradient, furthest from the normal
gradient. For the individual whose gradient drops asymptotically to near zero within a second or
so, responses must be very close to the reinforcer to be captured by it. The time period is so short
that only single responses can typically be strengthened. If sequences of responses cannot be
strengthened, there will be no hyperactivity. But this gradient will generate profound attention
deficit, because only brief stimuli quickly followed by reinforcers will acquire any conditional
reinforcing effectiveness. (We might also expect such other problems as severe impulsiveness
and poor acquisition of coordinated sequential behavior.)
Next consider a gradient that drops asymptotically to near zero only after a delay of a couple of
seconds or so. Attention deficit is still likely to be a problem, but in this case sequences of rapid
responses will sometimes be fully captured within the effective temporal extent of the gradient.
They will come to dominate over slower sequences of responses, so in this instance we can
expect to see both attention deficit and hyperactivity.
Finally consider a gradient that drops asymptotically to near zero only after several seconds and
therefore is closer to the normal gradient (a). The longer time period means that attention deficit
will be less of a problem, because stimuli will acquire conditional reinforcing properties, though
perhaps with slightly diminished potency. But faster response sequences will still be
differentially strengthened relative to more leisurely ones. In this case hyperactivity will
dominate and any attention deficit that becomes evident is likely to be mild.
We could play out the details further (e.g., by extending the argument to impulsivity), but the
point is that a single parameter determining the rate of decay of the delay gradient might be
sufficient to determine both the absolute and the relative severity of the attention and
hyperactivity components of ADHD. If a compromised dopamine neurotransmitter mechanism is
implicated in ADHD, as proposed by SJA&R, graded behavioral outcomes should be expected
from variations in the degree of compromise. The account is of special interest because it
promises to subsume a range of individual differences under a single mechanism.
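One way to see how a single decay parameter could order these cases is to compute the temporal window within which a gradient remains above some small fraction of its maximum. The sketch below assumes exponential gradients; the 5% threshold, the decay rates, and the response spacings are hypothetical choices made only for illustration.

```python
import math

def effective_window(decay_rate, threshold=0.05):
    """Delay in seconds at which an exponential gradient falls to
    `threshold` of its maximum (threshold is an arbitrary assumption)."""
    return math.log(1.0 / threshold) / decay_rate

def responses_captured(decay_rate, spacing, threshold=0.05):
    """Number of evenly spaced responses, the last at zero delay, that
    fall inside the effective window."""
    return int(effective_window(decay_rate, threshold) // spacing) + 1

RAPID, SLOW = 1.0, 2.0   # hypothetical seconds between responses

for rate in (3.0, 1.5, 0.5):   # steepest to least steep, hypothetical
    print(rate,
          round(effective_window(rate), 2),
          responses_captured(rate, RAPID),
          responses_captured(rate, SLOW))
```

With these values the steepest gradient captures only the single reinforced response, the intermediate gradient captures short rapid sequences but not slower ones, and the least steep gradient captures sequences at both spacings while still capturing more of the rapid one.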
But this is only one way in which the parameters of delay gradients might vary. Another
possibility is illustrated in the right graph of Figure 4. In that case, the normal or non-ADHD
gradient (b) is the one that crosses the y-axis at the lowest point. The others decline more steeply,
like those in the left graph. Here the area under each curve is equal to a constant. Such functions
might be appropriate, for example, if variations in the rate of decay depend on how quickly a
fixed quantity of some neurotransmitter is depleted. Such depletion can occur either slowly or
rapidly, as in the family of curves on the left, but the steeper the rate of decay, the higher the
maximum would have to be to hold the area constant. Differential selection of response
sequences and maintenance of attention would still vary with the rate-of-decay parameter, but
these curves have some additional implications.
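The difference between the two families can be written out directly. This is a sketch under the assumption that both families are exponential; the function names and decay rates are hypothetical.

```python
import math

def common_maximum(delay, decay_rate, maximum=1.0):
    """Left-hand family: every gradient starts at the same maximum.
    The area under the curve is maximum / decay_rate, so it shrinks
    as the gradient steepens."""
    return maximum * math.exp(-decay_rate * delay)

def equal_area(delay, decay_rate, area=1.0):
    """Right-hand family: the area under every gradient equals `area`.
    The value at zero delay is area * decay_rate, so it grows as the
    gradient steepens."""
    return area * decay_rate * math.exp(-decay_rate * delay)

for rate in (0.1, 0.3, 1.0):   # hypothetical decay rates, shallow to steep
    print(rate,
          common_maximum(0.0, rate), round(common_maximum(20.0, rate), 6),
          equal_area(0.0, rate), round(equal_area(20.0, rate), 6))
```

In the equal-area family the steepest curve is the highest at zero delay and the lowest at long delays, whereas in the common-maximum family steeper curves are never higher than shallower ones at any delay.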
One argument in favor of the equal-area functions on the right over the exclusively decremental
functions on the left is suggested by the impulsivity examples in Figure 3. An account of
impulsivity in terms of exponential gradients will not work unless the gradients generated by
different reinforcer magnitudes start at different maxima. Furthermore, if the effects are
everywhere decrements, as on the left in Figure 4, then the only source of higher rates of
responding would be the differential selection of rapid sequences; with extreme decrements, little
if any responding could be supported by reinforcers. This might be an appropriate model for
other behavior pathologies, but it seems not to capture the defining features of ADHD.
The equal-area functions in Figure 4, however, are consistent with a model in which a reserve of
potential behavior is replenished by responses weighted according to the delays that separate
them from a reinforcer and in which subsequent responding depends on the magnitude of that
reserve. In this case, hyperactivity follows not only from the differential strengthening of more
rapid sequences but also from the direct strengthening of responses that are very quickly
followed by reinforcers. With equal-area functions, greater strengthening occurs with steeper
functions, but with steeper and steeper functions, the temporal window within which responding
will be strengthened progressively narrows.
SJA&R argue that children with ADHD are less sensitive to changes in reinforcement
contingencies and require stronger and more salient reinforcers. This might seem consistent with
the decremental (left) gradients of Figure 4, but problems that appear to be motivational might
instead be problems of contingencies. Apparent insensitivity to reinforcement contingencies can
come about not only because of weak reinforcers but also because of strong reinforcers presented
after a delay. Furthermore, the latter problem will be more likely with steeper delay gradients.
Extinction deficit. I have so far emphasized delay gradients. But along with their
presentation in terms of delay gradients, SJA&R have also offered extinction deficit as an alternative
mechanism contributing to the complex of symptoms that define ADHD. We have already seen
that delay gradients on their own adequately account for many features of ADHD, but there are
other reasons besides parsimony to question the role of extinction deficits.
Extinction demonstrates that the effects of reinforcement are temporary, and SJA&R correctly
point out that the variables that produce increments in responding when reinforcement begins
may be different from those that produce decrements after it ends. It is therefore appropriate to
consider different mechanisms for reinforcement and for extinction. But extinction deficit, the
absence of the response decrements that typically occur during extinction, has no relevant
temporal parameters and therefore is not applicable to situations that can be interpreted in terms
of differential delays (that is another reason why the direct determination of delay gradients with
WKY and SHR rats might be especially valuable).
One problem with assessing extinction effects is the metric used to assess the progress of
extinction. For example, if extinction for SHR rats begins with higher baseline rates of
responding than for WKY rats, should comparisons be based on relative declines in responding
or on the absolute levels reached at certain times? Procedures that changed baseline rates of
responding for one or the other group in an attempt to match baseline rates would have to deal
somehow with the differential effects of the contingencies that such matched baselines would
Another and perhaps even less tractable problem with assessing extinction deficit, however, is
that extinction is rarely studied in isolation. In Johansen & Sagvolden (2004), for example,
extinction was studied in successive sessions that each began with a fixed period of
reinforcement. Thus, the procedure involved the acquisition of a discrimination between the
early and the late portions of each session. If attention deficit affects orientation toward visual
cues, it presumably also affects attention without evident motor components, such as attention to
temporal cues. (I here treat attention as a variety of behavior, but one defined by the
environmental contingencies it can enter into rather than by a particular topography.) Thus, even
if SHR rats responded more in extinction than WKY rats, the difference could be attributed as
readily to differences in attention to temporal stimuli as to an extinction deficit.
Failure to attend to temporal cues rather than extinction deficit might also account for continued
responding early in the individual segments of fixed-interval (FI) schedules of reinforcement. A
similar confounding exists in procedures that compare reinforcement versus extinction
contingencies arranged in the presence of different visual or auditory stimuli, where what might
seem like extinction deficit might depend instead on a failure to attend to relevant stimuli. Thus
it seems reasonable to consider the possibility that extinction deficit is not a separate source of
some of the properties of ADHD, but rather is a derivative of the kinds of anomalies of delay
gradients that we have already considered.
I have had little to say here about other factors that might contribute to ADHD, such as executive
functions, verbal governance, and other higher-order processes. But given differences in delay
gradients similar to those already considered, it is plausible that complex skills such as the
hierarchical structuring of verbally governed behavior and the monitoring of one’s own behavior
would develop differently in a child with than in a child without ADHD.
ADHD and development. As we know from the analysis of nonlinear systems, very small
differences in initial conditions can result in exceedingly large long-term differences (Gleick
1987). For example, even if the only problem in autism were an aversion to both eye
contact and touch, many of the everyday contingencies that build social interaction would be
missed, as when a child fails to notice a parent smiling at something the child has done. These interactions
provide the scaffolding on which more complex social behavior depends, including verbal
behavior, so the effects will be seen in all of the other behavior that depends on them. This is
presumably why early intervention matters so much.
One significant feature of SJA&R’s account is the parallel case they have presented for ADHD.
It should be no surprise that different early histories with ADHD, especially in combination with
the variations in delay gradients that we have entertained, could lead to vastly different spectra of
behavioral competencies and difficulties. Might small path dependencies lead sometimes to
oppositional defiant disorder and sometimes to conduct disorder and sometimes to neither? Even
the dominance of motor versus cognitive components might depend on differences in historical
paths, and perhaps we should also entertain the possibility that such behavioral trajectories can
drive certain features of brain organization rather than be driven by them. As suggested by
SJA&R, analyses in terms of the ebb and flow of complex interactions of behavior with
contingencies involving parents, peers, teachers, and others are a daunting but unavoidable challenge.
Perhaps there are also circumstances in which features of ADHD are advantageous. With
experimental contingencies that favor varied over stereotyped response sequences, for example,
comparisons of the behavior of WKY and SHR rats have shown that SHR rats learn to vary
rather than repeat sequences more readily than WKY rats (Mook et al. 1993). Variable behavior
provides the raw material on which the selection of behavior by contingencies operates within
individual lifetimes, so this behavioral capacity may have been selected by evolutionary
contingencies (cf. Neuringer 2002). We may argue from our anthropocentric view that an
organism with more extended delay gradients will be more capable of taking into account events
that are more remote in time, but such capabilities surely must be balanced against the
importance of its sensitivity to the immediate consequences produced by its behavior.
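The trade-off between shallow and steep gradients can be made concrete with a small sketch. It assumes an exponential delay gradient, in which the effect of a reinforcer falls off as exp(-k × delay); the decay rates, reinforcer amounts, and delays below are hypothetical values chosen for illustration and are not estimates from any data.

```python
import math


def delayed_effect(amount, delay, k):
    """Effect of a reinforcer of size `amount` arriving `delay` seconds after
    a response, assuming an exponential gradient with decay rate k (per second)."""
    return amount * math.exp(-k * delay)


def preferred(small, large, k):
    """Return 'small' or 'large' for two (amount, delay) options at decay rate k."""
    s = delayed_effect(small[0], small[1], k)
    g = delayed_effect(large[0], large[1], k)
    return "small" if s > g else "large"


def crossover_rate(small, large):
    """Decay rate at which the two options have equal effect.

    Solves a_s * exp(-k * d_s) = a_l * exp(-k * d_l) for k.
    """
    (a_s, d_s), (a_l, d_l) = small, large
    return math.log(a_l / a_s) / (d_l - d_s)


if __name__ == "__main__":
    small_soon = (1.0, 1.0)    # hypothetical: 1 unit after 1 s
    large_late = (3.0, 10.0)   # hypothetical: 3 units after 10 s
    for label, k in (("shallow gradient", 0.05), ("steep gradient", 0.5)):
        print(label, "->", preferred(small_soon, large_late, k))
    print("crossover k:", round(crossover_rate(small_soon, large_late), 4))
```

Under these assumed values the shallow gradient favors the larger, later consequence and the steep gradient favors the smaller, sooner one. Neither outcome is better in itself: which gradient serves an organism depends on how reliably the remote consequences in its environment actually arrive.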
Interventions and implications. If delay gradients are implicated in ADHD, their properties
presumably originate in the properties of neurotransmitter function, but this does not imply that
pharmacological interventions are the only recourse. Behavioral interventions that use
consistently short delays of reinforcement to build higher-order behavioral units as a scaffolding
to support complex cognitive and social skills may nonetheless be feasible. For example, the
shaping of behavior with prompt consequences both correlated with and intermixed with longer-
term ones might provide the prerequisites for building conditional reinforcers that maintain
longer periods of attention and that bridge increasingly extended delays. The decremental (and
detrimental) effects of delays might be attenuated with the creation of higher-order temporal
units, especially if they also involve mediation by verbal behavior. Computer games may be
particularly useful tools, because their rapid responsivity, which sometimes so easily captures the
behavior of children with ADHD, allows both for the precise control of contingencies relating
skilled behavior to its consequences and for the structured embedding of minimal behavioral
units into higher-order coordinated units. Behavior is the interaction of an organism with its
environment, so such interventions might teach us things not only about how brain structure
drives behavior but also about how behavior drives brain structure.
It may be worth noting that this account has mostly dealt with behavior in its own terms.
Although the interpretation of ADHD in terms of delay gradients is theoretical, delay gradients
themselves are not theory but rather are measurable properties of behavior. At least in part
because of the limitations of my expertise, this commentary has only occasionally made contact
with other levels of analysis. One of the great strengths of SJA&R’s contribution is its
articulation among the several levels, and I look forward to the buttressing and the widening of
the bridges that they have begun to build among those levels. The following quotation is
particularly apt: “Valid facts about behavior are not invalidated by discoveries concerning the
nervous system, nor are facts about the nervous system invalidated by facts about behavior. Both
sets of facts are part of the same enterprise, and I have always looked forward to the time when
neurology would fill in the temporal and spatial gaps which are inevitable in a behavioral
analysis” (Skinner 1984, p. 543).
Eliot Shimoff collaborated in the research that generated the data used illustratively in Figure 1.
Rouben Rostamian provided helpful insights into the properties of exponential decay functions.
