A REVIEW OF POSITIVE CONDITIONED REINFORCEMENT1

ROGER T. KELLEHER2 AND LEWIS R. GOLLUB3

HARVARD MEDICAL SCHOOL AND UNIVERSITY OF MARYLAND
This review critically analyzes experimental data relevant to the concept of conditioned reinforcement. The review has five sections. Section I is a discussion of the relationship between primary and conditioned reinforcement in terms of chains of stimuli and responses. Section II is a detailed analysis of the conditions in which the component stimuli in chained schedules of reinforcement will become conditioned reinforcers; this section also analyzes studies of token reinforcement, observing responses, switching responses, implicit chained schedules, and higher-order conditioning. Section III analyzes experiments in which potential conditioned reinforcers are used either to prolong responding or to generate responding during experimental extinction. This section discusses hypotheses that have been offered as alternatives to the concept of conditioned reinforcement and hypotheses concerning the necessary and sufficient conditions for establishing a conditioned reinforcer. Section IV discusses other variables that act when a conditioned reinforcer is being established or that act when an established conditioned reinforcer is used to develop or maintain behavior. Section V is a general discussion of conditioned reinforcement. The evidence indicates that the conditioned reinforcing effectiveness of a stimulus is directly related to the frequency of primary reinforcement occurring in its presence, but is independent of the response rate or response pattern occurring in its presence. Results from chained schedules comprised of several components indicate that a stimulus can be established as a conditioned reinforcer by pairing it with an already established conditioned reinforcer rather than a primary reinforcer; however, this type of higher-order conditioning has not been clearly demonstrated with respondent conditioning procedures. Although discriminative stimuli are usually conditioned reinforcers, the available evidence indicates that establishing a stimulus as a discriminative stimulus is not necessary or sufficient for establishing it as a conditioned reinforcer. Discriminative stimuli in chained schedules with several components are not always conditioned reinforcers; stimuli that are simply paired with reinforcers can become conditioned reinforcers. The hypotheses that have been offered as alternatives to the concept of conditioned reinforcement are too limited to integrate the data that exist. The concepts of conditioned reinforcement and chained schedule, however, can be used to integrate the data obtained with diverse techniques. Recent experiments have revealed several techniques for the development of effective conditioned reinforcers. These techniques provide a powerful tool for advancing understanding of conditioned reinforcement and for extending control over behavior.
I. INTRODUCTION

Reinforcement is a central concept in most theories of behavior (e.g., Skinner, 1938; Hull, 1943; Spence, 1956). Although each of these theories treats reinforcement somewhat differently, the following definitions represent descriptively the common meaning of the term. A reinforcing stimulus increases the probability of that class of responses that immediately precedes it; the presentation of a reinforcing stimulus is a reinforcement. As Skinner has noted:

"Some reinforcements consist of presenting stimuli: of adding something--for example, food, water, or sexual contact--to the situation. These we call positive reinforcers. Others consist of removing something--for example, a loud noise, a very bright light, extreme cold or heat, or electric shock--from the situation. These we call negative reinforcers" (Skinner, 1953, p. 73).

1 Preparation of this review by the first author was supported in part by Research Grants M-2094 and MY-2645 from the Institute of Mental Health of the National Institutes of Health, U.S. Public Health Service. Preparation of this review by the second author was supported in part by NSF Grant G-8621, by a grant from the General Research Board of the University of Maryland, and by Research Grant MY 1604, National Institutes of Health, U.S. Public Health Service. The authors are indebted to Dr. W. H. Morse for his criticism of an earlier draft of this paper, and to Mrs. Marion Stauffer, Mrs. Lillian Stickle, and Miss Sandra Schneiderman for their help in preparation of the manuscript.

2 The first author prepared part of this review while at Smith Kline and French Laboratories.

3 Part of the preparation of this review by the second author was done at the Psychological Laboratories, Harvard University.
The concept of chaining is closely related to the concept of reinforcement. Skinner's law of chaining for operant and respondent behavior states that "the response of one reflex may constitute or produce the eliciting or discriminative stimulus of another" (Skinner, 1938, p. 32). If a discriminative or eliciting stimulus produced in this way maintains responding, the stimulus is a reinforcer.
These two classes of stimuli are distinguished in the following way: an eliciting stimulus is a stimulus consistently followed by a correlated response (Skinner, 1938). We shall define a discriminative stimulus as a stimulus in the presence of which an operant response is reinforced. Each stimulus class has a corresponding chain class. In a respondent chain, a response follows an eliciting stimulus and produces the eliciting stimulus for another response. In an operant chain, responses occur in the presence of a discriminative stimulus and produce the discriminative stimuli for other responses. The stimuli in a chain may of course be exteroceptive or interoceptive.

In the present section we shall discuss chaining and reinforcement in general before defining and discussing conditioned reinforcement.
The conditioning of an operant usually establishes a chain. For example, in using food reinforcers to train a food-deprived rat, the first procedural step is magazine training. The rat is placed in an illuminated chamber, and at irregular time intervals a click briefly precedes the delivery of a food pellet into a tray. After several of these pairings of click and food, the rat will approach the food tray when the click occurs and will consume the food pellet.
The magazine-training procedure establishes a complex chain of responses consisting of operant and respondent components. The click is both a discriminative stimulus that controls operants, such as approaching the tray, and an eliciting stimulus for respondents, such as salivation (Shapiro, 1960). The food pellet also is both a discriminative stimulus that controls operants, such as seizing and chewing the pellet, and an eliciting stimulus for respondents, such as salivation and other alimentary reflexes (cf. Skinner, 1938, p. 52).

The magazine-training procedure usually establishes a stable chain of responses, and then the chain is extended. For example, if the click and the chain of responses that follows the click are made contingent upon some operant response, this response will be conditioned. How far can such a chain be extended?
A demonstration of a dramatically extended chain was arranged by Pierrel and Sherman (in press). A food-deprived rat, in a special experimental box, engaged in the following sequence of behavior: it climbed to the top of a spiral staircase, "bowed to the audience," pushed down a drawbridge, crossed the bridge, climbed a ladder, used a chain to pull in a model railroad car, pedaled the car through a tunnel, climbed a flight of stairs, ran through a tube, and stepped into an elevator which descended to the base of the platform (raising the school banner). At this point a buzzer sounded, and the rat pressed a lever and received a food pellet. Of course, the delivery of the food pellet was followed by seizing, salivation, chewing, swallowing, and a complex chain of gastrointestinal reflexes.
What reinforcer (or reinforcers) enabled the establishment and maintenance of such a remarkable sequence of behavior? The rat was food-deprived, and the delivery of the food pellet is obviously important. However, the effectiveness of food-delivery depends on magazine training. Some investigators have analyzed the sequence of events that follows food delivery. They have attempted to determine the event in this sequence which is necessary for reinforcement.
For example, Hull, Livingston, Rouse, and Barker (1951) interrupted the sequence of events following food-delivery by surgically preparing a dog with an esophageal fistula. Food was released from the esophagus before it reached the stomach (sham feeding). In one experiment, the dog was sham-fed large amounts every day in an experimental room; by the eighth day of the experiment the dog could not even be coaxed to eat in the room. In another experiment, the dog more often selected an alley where normal feeding occurred than an alley where sham-feeding occurred. On the other hand, the dog selected an alley where it was sham-fed rather than an alley where it was not fed. Interpretations of these and similar results have been hotly disputed. Most of the controversy centers around a single question. Was food a reinforcer the first time it was presented, or did it become a reinforcer through conditioning?
Faced with a similar problem, Pavlov noted that salivation elicited by even the first presentations of food to his laboratory dogs was a conditioned response. He referred to experiments by Dr. Zitovich (Pavlov, 1927, pp. 22-23), in which dogs that had been fed only milk did not salivate the first time they were given meat or bread. "Only after the puppies have been allowed to eat bread and meat on several occasions does the sight or smell of these foodstuffs evoke the secretion." (Pavlov, 1927, p. 23).
Hull et al. (1951) concluded that their dog preferred sham-feeding over no feeding because of an unspecified history of conditioning; that is, food-in-the-mouth and swallowing had always preceded food-in-the-stomach. In what may be just one more question in an infinite regress, it may be asked whether the effects of food-in-the-stomach have been conditioned because they always preceded food-in-the-intestine. That Hull recognized this difficulty is suggested by the following note at the end of a chapter on secondary (conditioned) reinforcement:
"In
the
present
chapter
it
has
been
seen
that
despite
well-marked
differences
there
are
a
number
of
striking
similarities
be-
tween
primary
and
secondary
reinforce-
ment.
Perhaps
the
most
notable
of
the
sim-
ilarities
is
the
fact
of
reinforcement
itself.
So
far
as
our
present
knowledge
goes,
the
habit
structures
mediated
by
the
two
types
of
reinforcement
agents
are
qualitatively
identical.
This
consideration
alone
consti-
tutes
a
very
considerable
presumption
in
favor
of
the
view
that
both
forms
are
at
bottom,
i.e.,
physiologically
the
same.
It
is
difficult
to
believe
that
the
processes
of
organic
evolution
would
generate
two
en-
tirely
distinct
physiological
mechanisms
which
would
yield qualitatively
exactly
the
same
product,
even
though
real
duplica-
tions
of
other
physiological
functions
are
known
to
have
evolved."
(Hull,
1943,
pp.
99-1
00).
In summary, it has been found that food, which is usually referred to as a primary or unconditioned reinforcer, has acquired its reinforcing effects, in large part, through conditioning. This was true in both respondent and operant situations. So far we do not know the critical event in food reinforcement.
For convenience, an arbitrary distinction will be made between primary reinforcers and conditioned reinforcers. Some stimuli are positive reinforcers because of a specified experimental history of conditioning. We shall call these stimuli conditioned reinforcers, and refer to the presentation of this type of reinforcer as conditioned reinforcement.4 The effectiveness of other reinforcers, such as food and water, apparently does not depend on any special history of conditioning. These stimuli are reinforcers for most members of a given species. We shall call these stimuli primary reinforcers, and refer to the presentation of this type of reinforcer as primary reinforcement. A stimulus repeatedly presented just before a primary reinforcer can acquire reinforcing properties. Operationally, this procedure resembles the respondent conditioning procedure; that is, the presentation of the primary reinforcing stimulus is contingent upon the presentation of the conditioned reinforcer rather than upon a response.
Unlike food and water, several types of reinforcers do not involve magazine training. They seem to be "pure" primary reinforcers, which do not involve the conditioning of a chain of behavior before a consummatory response. Because the reinforcing properties do not need to be learned, these events offer interesting possibilities for studies with positive reinforcers. Some of these reinforcers are: intravenous injections of glucose (Coppock and Chambers, 1954); subcortical brain stimulation (Olds and Milner, 1954); and presentation of heat to a cold animal (Weiss, 1957). It is interesting to note, however, that Carlton and Marks (1958) found that stable lever-pressing could not be established by heat presentations unless some exteroceptive stimulus (e.g., a tone) had previously been associated with heat presentations.

4 The popular use of the term secondary reinforcement is unfortunate because it does not encourage an analysis of the processes involved in developing a stimulus as a reinforcer. The use of the term conditioned reinforcement emphasizes the conditioning process, and makes it unnecessary to use awkward and confusing terms such as tertiary and quaternary reinforcement.
We can now suggest an analysis of the complex chain which was developed in the rat demonstration. The food pellet is a primary reinforcer for the rat. Primary reinforcement occurs when the rat presses the lever in the presence of the buzzer. Food reinforces lever-pressing. Since the buzzer occurs just before the primary reinforcer, it is established as a conditioned reinforcer. In turn, the buzzer also reinforces lever-pressing and establishes the stimulus that precedes the buzzer as a conditioned reinforcer, and so forth.
Naturally, this analysis raises many questions. What are the necessary and sufficient conditions for establishing a stimulus as a conditioned reinforcer? Can a conditioned reinforcer be used to establish another conditioned reinforcer? If it can, how far can this process be extended? How durable are the conditioned reinforcers when the chain does not terminate in primary reinforcement? What determines the effectiveness of a conditioned reinforcer?
Experiments designed to answer these questions, as well as others, will be discussed in the following sections. Most of these questions are not new. The empirical phenomena that have generated the concept of conditioned reinforcement, as well as the ways in which the concept is employed in various theories of behavior, have been presented in detail elsewhere (Skinner, 1938; Hilgard and Marquis, 1940; Spence, 1947; Miller, 1951; Skinner, 1953). Recent experiments have suggested answers to some of these questions and have raised many new questions. In the present review we shall emphasize recent experiments on conditioned reinforcement. Early experiments will be discussed when the experimental technique or the interpretation is relevant to our analysis. The purpose of the present paper is to review experimental data on conditioned reinforcement, to interpret these data in terms of principles of conditioning, to discuss previous interpretations of these data, and to suggest promising directions for future research.
II. CONDITIONED REINFORCEMENT AND CHAINED SCHEDULES
The relation of chains to conditioned reinforcement was pointed out above. Many reinforcement operations, such as magazine training, establish chains. Chains can also be established explicitly in order to investigate conditioned reinforcement. In a response chain, each response in a sequence of responses makes the next one possible or more likely to be reinforced (Ferster and Skinner, 1957). In such a chain, the sequence of individual responses, rather than the stimuli which control the behavior, is emphasized. The occurrence of the sequence and of each member of the sequence is generally presumed to be under proprioceptive stimulus control, rather than exteroceptive control. The components of such a chain are the individual responses which make up the sequence.
A more general classification of sequences of behavior is provided by the chained schedule. In a chained schedule of reinforcement, responding in the presence of one exteroceptive stimulus produces a second exteroceptive stimulus; responding in the presence of the second stimulus produces a third stimulus, and so forth (Ferster and Skinner, 1957). A primary reinforcer terminates the chained schedule. Each exteroceptive stimulus, including the primary reinforcer, defines a component of the chained schedule. The topography of the response in each component may or may not be the same. In one type of chained schedule, often called a heterogeneous chain, the different components of the chain can be identified by different response topographies. The extended chain in the Pierrel and Sherman demonstration described above is an example of a heterogeneous chained schedule. In another type of chained schedule, often called a homogeneous chain, a single response topography occurs throughout the chain, and this response produces successive exteroceptive stimuli. Most chaining procedures are mixtures of homogeneous and heterogeneous chains. For example, different exteroceptive stimuli may indicate components of a chain in which different response topographies are appropriate. Also, any homogeneous chain that terminates with food presentation and eating is technically a heterogeneous chain. The distinction between heterogeneous and homogeneous chained schedules is not essential to our analysis of conditioned reinforcement.
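The contingencies that define a chained schedule can be stated procedurally. The following is a minimal simulation sketch, not a procedure taken from any experiment reviewed here: the schedule values, the assumption of a subject responding at a steady rate, and all names are ours.

```python
import random

def run_chain_vi_fr(session_s=600.0, vi_mean_s=60.0, fr=50, resp_per_s=2.0):
    """Sketch of a two-component chained schedule (e.g., chain VI 1 FR 50).

    In S2, a response after the VI interval elapses produces S1 (no food);
    in S1, the fr-th response produces primary reinforcement, and the
    chain starts over in S2.
    """
    t, stimulus, reinforcers, fr_count = 0.0, "S2", 0, 0
    # Irregular VI intervals, here approximated by exponential draws.
    vi_ready_at = random.expovariate(1.0 / vi_mean_s)
    dt = 1.0 / resp_per_s               # the simulated subject responds every dt sec
    while t < session_s:
        t += dt                         # one response occurs
        if stimulus == "S2":
            if t >= vi_ready_at:        # VI requirement met: the response
                stimulus = "S1"         # produces S1, the next stimulus
        else:
            fr_count += 1
            if fr_count == fr:          # FR requirement met: the response
                reinforcers += 1        # produces the primary reinforcer
                fr_count = 0
                stimulus = "S2"         # chain restarts in S2
                vi_ready_at = t + random.expovariate(1.0 / vi_mean_s)
    return reinforcers

print(run_chain_vi_fr())
```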
Each component stimulus of a chained schedule may be a conditioned reinforcer for responding in the component which precedes it. Of course, behavioral criteria are necessary for determining whether or not a stimulus is a reinforcer. If a primary reinforcer is presented after responses of a particular class, these responses become more frequent. Many experiments have shown that even very fine details of the method of presenting a reinforcer will determine the behavior generated. In particular, different schedules of reinforcement of a free operant (Ferster and Skinner, 1957) will control rates of responding that differ over a wide range; scheduling procedures for delivering food to a rat which has run down an alley maze (Logan, 1960) will have similar effects. Even when the precise relationship between response and reinforcer is not specified, some relationship will exist, and will have its effects. In determining whether a stimulus is a conditioned reinforcer, the response pattern that precedes the presentation of that stimulus must be considered, as well as the overall frequency of the response. These response patterns can be compared with the patterns in control procedures, which will be described in this section, and with response patterns under comparable contingencies of primary reinforcement.
Chained schedules have several advantages for the investigation of conditioned reinforcement. First, behavior can be maintained on chained schedules because the schedule is terminated by a primary reinforcer. With maintained behavior the effects of many different variables can be studied in a stable, chronic situation. Investigators have often assessed the strength of a conditioned reinforcer at the same time that it was being weakened by experimental extinction (see Section III). Chained schedules of reinforcement permit us to assess the strength of conditioned reinforcing stimuli without the confounding effects of experimental extinction. Second, the component stimuli in a chained schedule are easily identified because they are exteroceptive. This identifiability permits a detailed analysis of the role of stimuli in the chain. Third, the stimulus control of behavior in a chained schedule permits precise analysis of the quality of performance in each member of a chain. The pattern and rate of responding, and the frequency and probability of reinforcement, are known for each of the component stimuli.
A. CONTROL PROCEDURES
How much of the performance on a chained schedule can be attributed to conditioned reinforcing effects of specific stimuli, and how much must be attributed to other variables? Several confounding variables should be mentioned. First, the organism may not see or attend to the component exteroceptive stimuli of the chained schedule. In this case, performance would not be influenced by the stimuli, and it would be incorrect to attribute conditioned reinforcing effects to any of the component stimuli. Second, the response rate in any of the component stimuli of the chain may be due solely to the proximity of that component to primary reinforcement rather than to the conditioned reinforcing effect of the next stimulus. Finally, the possibilities of stimulus generalization, response generalization, or unusual effects that may be produced by certain kinds of stimuli must all be considered. For example, in some organisms an intense stimulus may always produce a high response rate because of a history of reinforcement or because of genetic factors. Some of these difficulties can be minimized or eliminated by the use of appropriate control procedures that precede or alternate with the chained-schedule procedure.
1. Multiple Schedules as Control Procedures
In multiple schedules, as in chained schedules, a different exteroceptive stimulus is associated with each component. The patterns of responding that develop on a chained schedule can be compared to the patterns of responding that develop on analogous multiple schedules of reinforcement. Two types of multiple schedules have been used as control procedures for chained schedules.

In the first type of multiple (mult) schedule, responding is first reinforced with a primary reinforcer in the presence of each of the stimuli that will be used in the chained schedule. For example, in the initial phase of an experiment, responding in the presence of one stimulus, S2, produces food on a 1-min variable-interval (VI 1) schedule5; responding in the presence of another stimulus, S1, produces food on a 50-response fixed-ratio (FR 50) schedule (mult VI 1 FR 50). The two schedules and the corresponding stimuli alternate, changing only after reinforcement. This mult VI 1 FR 50 schedule can be converted to the analogous chained (chain) schedule by omitting the primary reinforcer at the end of S2. On chain VI 1 FR 50, responses in S2 produce S1 on a VI 1 schedule; responses in S1 produce food on FR 50. If S1 is a conditioned reinforcer, responding will be maintained in S2.6

5 The numbers in the notations for time-controlled schedules represent minutes, unless otherwise indicated.

6 The component stimuli of chained schedules will be numbered from the last to the first stimulus, i.e., in the reverse of the order in which they actually appear. The last stimulus in any chain is S1, the next to the last stimulus is S2, and so forth. This notation is convenient for discussing chains of different lengths. The lower the number assigned to a stimulus, the closer it is to the primary reinforcer.
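The difference between the multiple control procedure and the chained schedule reduces to whether the stimulus change is response-produced. A schematic sketch, with our own names and an obvious simplification (the VI timing is assumed to be checked elsewhere):

```python
def next_stimulus_mult(vi_elapsed: bool, responded: bool) -> str:
    """Multiple schedule: S1 replaces S2 once the interval elapses,
    whether or not the subject has responded in S2."""
    return "S1" if vi_elapsed else "S2"

def next_stimulus_chain(vi_elapsed: bool, responded: bool) -> str:
    """Chained schedule: after the VI interval elapses, a response in S2
    is still required to produce S1; without it, reinforcement is delayed."""
    return "S1" if (vi_elapsed and responded) else "S2"
```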
In the second type of multiple schedule, responding is reinforced in S1, but is not reinforced (extinction) in S2; changes from S2 to S1 occur independently of responses. In an experiment with pigeons by Ferster and Skinner (1957, p. 667), for example, an orange stimulus (S2), indicating extinction (ext), appeared for a variable time interval averaging 1 min. At the end of this time interval a blue stimulus (S1) appeared, and responding in the presence of the blue stimulus was reinforced by food on an FR 50 schedule. After 10 sessions on this mult ext FR 50 schedule, the response rates of the pigeons were low in S2 and high in S1. In the eleventh session the schedule was changed to chain VI 1 FR 50. Responses in S2 produced S1 on a VI 1 schedule; responses in S1 produced food on an FR 50 schedule. On this chained schedule, response rates in S2 increased to intermediate levels while response rates in S1 remained high. The rates and patterns of responding in both components of the chained schedule were similar to those usually developed when responding is reinforced by food.
The results of the Ferster and Skinner experiment indicate that S1 was a conditioned reinforcer for responding in the presence of S2. The stimulus condition and temporal relation of S2 to the primary reinforcement in the multiple schedule established only low rates of responding. Although the same temporal proximity to primary reinforcement prevailed in the chained schedule, much higher rates were maintained. However, the contingencies on the multiple schedule and the contingencies on the chained schedule differ in one important way. If the bird did not respond in S2 on the multiple schedule, S1 still appeared. If the bird did not respond in S2 on the chained schedule, S1 would not appear, and reinforcement would be delayed. Possibly this response requirement alone, rather than the conditioned reinforcing effect of S1, was responsible for performance in S2. This possibility could be eliminated by employing a tandem schedule control.
2. Tandem Schedules as Control Procedures
In tandem (tand) schedules of reinforcement, as in chained schedules, changes from one component to the next are contingent upon responding; however, the same exteroceptive stimulus appears in both components. For example, Gollub (1958) trained a pigeon first on tand FI 3 FI 2 and then on chain FI 3 FI 2. The stimulus that would become S1 in the chained schedule was in effect throughout the tandem schedule. On tand FI 3 FI 2, the first response after 3 min elapsed in the FI 3 segment started the FI 2 schedule, and the first response after 2 min elapsed in the FI 2 segment was reinforced by food. The only exteroceptive stimulus change in a tandem schedule was the delivery of food at the end of the sequence.
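In code form, the tandem control differs from the chain only in the stimulus displayed, never in the response requirement. A minimal sketch with hypothetical names:

```python
def displayed_stimulus(component: int, procedure: str) -> str:
    """Stimulus shown during each component of FI 3 FI 2.

    component 0 is the initial (FI 3) segment; component 1 is the
    terminal (FI 2) segment. The response requirements are identical
    under the two procedures; only the exteroceptive stimulus differs.
    """
    if procedure == "tandem":
        return "S1"                     # one stimulus throughout both components
    return ("S2", "S1")[component]      # chain: a distinct stimulus per component
```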
The transition from the tandem to the chained schedule is shown in Fig. 1 (at the arrow in record A). During final performance on tand FI 3 FI 2 the response rates were 10 per min during FI 3 and 49.1 per min during FI 2. On the tandem schedule responding was positively accelerated between reinforcements; i.e., the performance resembled behavior characteristic of a simple FI 5 schedule. During final performance on chain FI 3 FI 2, the response rate on FI 3 (in the presence of S2) was 19.9 responses per min, and the response rate on FI 2 (in the presence of S1) was 56.2 responses per min. On the chained schedule positively accelerated responding sometimes occurred in each of the components (see record D).
The results of Gollub's experiment indicate that the appearance of S1 was a conditioned reinforcer for responding in the presence of S2. In this example, the use of the tandem schedule as a control procedure facilitates an interpretation of the results. When the schedule was changed from tandem to chained, FI 3 response rates increased and positively accelerated responding developed in both FI 3 and FI 2. These changes in performance demonstrate that the major effects of the chained schedule were due to the exteroceptive stimuli in each component.

Fig. 1. Tandem and chain FI 3 FI 2. Record A is the transition from tandem to chained schedule (at the arrow). Record A' is the continuation of that session. Records B, C, and D show further development in the chained schedule: Record B is from the second, C from the fourth, and D from the eighth session on the chained schedule. Pips indicate change from one schedule component to the next. Small dots above the record indicate food deliveries (from Gollub, 1958).
3. Other Control Procedures
Other techniques for manipulating the component stimuli in chained schedules provide information about conditioned reinforcement. First, the component stimuli in two-component chained schedules can be reversed. In longer chained schedules the order of the stimuli can be altered in more complex ways; e.g., the same stimulus may occur in the second and fourth components of a four-component chain. Second, the order of the stimuli can be varied temporarily, by presenting one of the component stimuli at an unusual time. Under this stimulus "probe" procedure, the degree to which each stimulus controls its appropriate rate independently of its order of presentation is assessed. Finally, the discriminability of the component stimuli can be varied.

In this section we shall discuss conditioned reinforcement in two-component chained schedules, in extended chained schedules (those with more than two components), in other procedures involving chained schedules, and in implicit chained schedules.
B. CONDITIONED REINFORCEMENT IN TWO-COMPONENT CHAINED SCHEDULES
In two-component chained schedules, the behavior in the presence of S2, the discriminative stimulus for the initial component, can be used as a dependent variable indicating the strength of the conditioned reinforcing effect of S1, the discriminative stimulus for the second component. Table 1 is a summary of experiments on two-component chained schedules. For convenience, these schedules are classified according to the schedule of primary reinforcement associated with S1; i.e., according to the columns in Table 1. In our analysis of conditioned reinforcement in two-component chained schedules we will consider the following questions:
(1) Is the conditioned reinforcing effectiveness of S1 a function of the frequency of reinforcement (reinforcements per min) or the probability of reinforcement (the reciprocal of responses per reinforcement) in S1?

(2) Is the conditioned reinforcing effectiveness of S1 a function of the pattern of responding which occurs in the presence of S1? Experiments in which schedules of reinforcement generate different patterns of responding are relevant to this question.
(3) Is the conditioned reinforcing effectiveness of S1 a function of the type of schedule of reinforcement occurring in S1? Although this question is related to the second one, there is an important distinction. A DRL schedule and a VI schedule in S1, for example, might generate similar rates and patterns of response, and frequencies of reinforcement, but the conditioned reinforcing effectiveness of S1 might depend on the type of schedule used. Experiments in which different types of schedule of reinforcement generated similar rates and patterns of responding are relevant to this question.

(4) Is establishing S1 as a discriminative stimulus a necessary and sufficient condition for making S1 a conditioned reinforcer?

Table 1. Summary of experiments on two-component chained schedules. (The entries of this table are not legible in this copy.)
1. VI Schedule in S1
Variable-interval schedules determine that reinforcements are available at irregular intervals in time. They are often used as components in chained schedules because they determine a maximum frequency of reinforcement without establishing extensive temporal control of responding, as do fixed-interval schedules; i.e., response rates are usually constant over time. Also, average rates of responding are directly related to overall reinforcement frequency; however, VI schedules can provide a constant frequency of reinforcement even under very different rates of responding.
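One way to realize "irregular intervals with a fixed mean" is to draw successive set-up times from a distribution, as in the sketch below. The experiments reviewed here used arithmetic series of intervals, so the exponential draw and the names are only illustrative assumptions.

```python
import random

def vi_setup_times(mean_s: float, n: int, seed: int = 0) -> list[float]:
    """Times at which a VI schedule 'sets up' reinforcement. The first
    response after each set-up is the one reinforced, so the obtained
    reinforcement frequency stays near 1/mean_s across a wide range of
    response rates."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_s)
        times.append(round(t, 1))
    return times

print(vi_setup_times(60.0, 5))   # e.g., five set-ups on a VI 1-min schedule
```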
Ferster and Skinner (1957, p. 673) exposed pigeons to mult ext VI 1 and then to chain VI 1 VI 1. On the multiple schedule, response rates in S2 were near zero. After three sessions on the chained schedule, the pigeons responded almost as frequently in S2 as they did in S1. The results suggest that S1 was a conditioned reinforcer which developed and maintained responding in S2.
An important control procedure was lacking in this experiment, and in many others discussed in this review. As we have previously noted, the response requirements for reinforcement on a chained schedule are identical to those on the analogous tandem schedule. Moreover, the requirements of a tandem schedule may approximate those of a single, simple schedule. In this experiment, for example, the response contingencies on tand VI 1 VI 1 were very similar to the response contingencies on VI 2. Because of this similarity of response contingencies, performance on chain VI 1 VI 1 may have been due either to the response contingencies alone or to the conditioned reinforcing effect of S1.
Findley (1954; 1962) studied the performance of rats on chain VI 4 VI x schedules. When the rat pulled a metal chain in darkness (S2), a light (S1) was presented on a VI 4 schedule. In the presence of S1, lever pressing was reinforced by food on a VI schedule that was systematically varied from 8 min to 0.5 min. As reinforcement frequency on the VI schedule in S1 was increased, response rates in S1 increased slightly; the rats pressed the lever at about the same rate when they were responding, but had fewer long pauses. As shown in Fig. 2, the chain-pulling rates in S2 decreased in a negatively accelerated fashion as the frequency of reinforcement on VI in S1 was decreased.
Autor (1960) studied the performances of pigeons on chain VI 1 VI x schedules. In Autor's study the experimental chamber contained two response keys, and a two-component chained schedule was programmed on each key. The first components of these chained schedules were identical but independent VI 1 schedules which were programmed concurrently. Different stimuli (key lights) were associated with each of these VI schedules. Whenever the stimulus associated with the second component of the chain was produced on one key, the stimulus associated with the second component of the chain on the other key could not occur, and responses on the other key were ineffective. After a period of time, which was long enough to insure at least one food reinforcement in the second-component VI schedule, the concurrent, first-component VI 1 schedules again went into effect. The second-component schedule on key A was held constant at VI 15 sec, while the second-component VI schedule on key B was systematically varied from VI 1 min to VI 3.75 sec. When the second-component schedules were both VI 15 sec, response rates in the first components on keys A and B were about equal. As the frequency of reinforcement on the VI schedule in the second component on key B was increased, the first-component VI 1 response rate on key B increased in a linear fashion.
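Autor's arrangement is now usually called a concurrent-chains procedure, and it can be compressed into a short sketch. The interval draws, names, and return values below are our own assumptions; the sketch reports only which terminal link is entered on a cycle and the delay to food there.

```python
import random

def concurrent_chains_cycle(term_mean_a: float, term_mean_b: float,
                            rng: random.Random) -> tuple[str, float]:
    """One cycle: equal, independent VI 1-min initial links run
    concurrently on keys A and B; the key whose initial link is
    completed first enters its terminal link, the other key is
    disabled, and food arrives on that key's terminal VI schedule."""
    setup_a = rng.expovariate(1.0 / 60.0)   # initial-link set-ups (sec)
    setup_b = rng.expovariate(1.0 / 60.0)
    key = "A" if setup_a <= setup_b else "B"
    term_mean = term_mean_a if key == "A" else term_mean_b
    return key, rng.expovariate(1.0 / term_mean)

print(concurrent_chains_cycle(15.0, 30.0, random.Random(1)))
```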
The results of both Findley's (1954) and Autor's (1960) studies suggest that the reinforcing effectiveness of the second-component stimulus is directly related to the frequency of food reinforcement with which it is associated.

Fig. 2. Response rates in the dark (S2) as a function of the mean VI schedule in the light (S1) (from Findley, 1962).
Ferster and Skinner (1957, p. 664 ff.) investigated chain FI x VI 3 schedules, varying x from 2 to 7.5 min. The average response rates in S1 on VI 3 remained high at all FI values tested. As the duration of the FI in S2 was increased, the initial FI pauses became longer and terminal response rates became lower. It was apparent, however, that average total response rates between food reinforcements were higher on chain FI x VI 3 schedules than on comparable chain VI 3 FI x schedules. On chain VI 3 FI x, the minimum reinforcement frequency in S1 is determined by the FI schedule. On chain FI x VI 3, the average reinforcement frequency in S1 is determined by the VI schedule, but the minimum interval is usually quite short. These results suggest that when S1 is paired with a fixed reinforcement frequency, it may be a less effective conditioned reinforcer than when it is paired with a variable reinforcement frequency with the same mean value.
2. VI drl Schedule in S1
A VI drl schedule can provide reinforcements at a frequency determined by the VI schedule, and at the same time control a specified rate of responding, determined by the drl schedule. On a VI 3 drl 4-sec schedule, for example, a reinforcement becomes available every 3 min on the average (VI 3), but there is the additional requirement that only a response that follows a pause of at least 4 sec (drl 4 sec) can be reinforced. Because of the added requirement, a VI 3 drl 4-sec schedule generates lower response rates than simple VI 3.
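The joint contingency can be written as a single predicate. A minimal sketch with our own names:

```python
def vi_drl_reinforced(now_s: float, last_response_s: float,
                      vi_setup_s: float, drl_s: float = 4.0) -> bool:
    """A response at time now_s is reinforced only if the VI interval has
    elapsed (now_s >= vi_setup_s) AND the response terminates a pause of
    at least drl_s seconds (the added drl requirement)."""
    return now_s >= vi_setup_s and (now_s - last_response_s) >= drl_s

# A response 2 sec after the previous one goes unreinforced even though
# the VI has set up; a response after a 5-sec pause is reinforced.
print(vi_drl_reinforced(100.0, 98.0, 90.0))   # False
print(vi_drl_reinforced(100.0, 95.0, 90.0))   # True
```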
Ferster and Skinner (1957) used a VI 3 drl 4-sec schedule to generate low response rates in S1, while making it possible for the pigeons to obtain food about every 3 min on the average.
In one study (Ferster and Skinner, 1957, p. 676 ff.) the schedules for two pigeons were changed from chain VI 3 VI 3 to chain VI 3 VI 3 drl 4 sec. On chain VI 3 VI 3, stable response rates of about 180 per min were maintained in both components. When VI 3 drl 4 sec was scheduled in S1, average response rates in S1 for each bird decreased to about 30 per min. Although response rates in S1 decreased when the drl 4 sec was added, the frequency of reinforcement in S1 remained almost as high as it had been on VI 3 alone. One bird responded irregularly on the VI 3 (in S2) as the response rates in S1 decreased; periods of rapid responding alternated with prolonged pausing. The response rates of a second bird were unchanged on VI 3 (in S2), despite the declining response rates in S1 produced by the VI 3 drl 4-sec schedule.
The deterioration of the performance of the first bird in S2 suggests that the effectiveness of S1 as a conditioned reinforcer was diminished by the addition of the drl 4-sec requirement. Moreover, this diminution occurred even though the frequency of reinforcement in S1 remained almost constant. The schedule of reinforcement in S1 may be able to diminish the effectiveness of that stimulus as a conditioned reinforcer without changing the frequency of reinforcement in S1. However, this finding was not replicated with the second bird.
Ferster and Skinner (1957) also studied chain FR 35 VI 3 drl 4 sec. On this schedule the 35th response in the presence of S2 produced S1, and in the presence of S1, responses were reinforced on VI 3 drl 4 sec. Stable response rates of about 20 per min prevailed in S1, and the frequency of reinforcement in S1 averaged about 0.3 per min. Irregular responding developed on FR in S2, but average response rates in S2 were 60 per min or higher. These response rates are much lower than those which are usually generated by a simple FR 35 schedule. "Nevertheless, the performance illustrates the possibly unexpected condition that responding may occur at a fairly high rate when the reinforcement consists of the production of a stimulus controlling a low rate." (Ferster and Skinner, 1957, p. 671.)
3. DRL Schedule in S1
Ferster and Skinner described one differential rate schedule as crf drl. Under this schedule a response is reinforced only when it follows the preceding response by a specified time. We will designate this schedule DRL, following Skinner and Morse (1958). Wilson and Keller (1953) have also studied behavior under this reinforcement contingency, which they called "spaced responding."
Ferster and Skinner studied chain FR x DRL 6 sec, varying x from 20 to 120 responses. On chain FR 20 DRL 6 sec, for example, the 20th response in the presence of S2 produced S1; in the presence of S1, the first response following a pause of at least 6 sec was reinforced by food. Average response rates on DRL 6 sec in S1 were low, and frequency of reinforcement in S1 was between 2 and 10 per min. On the other hand, except for some initial pausing, typically high response rates developed on FR in S2. The results obtained with chain FR 95 DRL 6 sec are shown in Fig. 3 (Ferster and Skinner, 1957, p. 688). Following brief initial pauses in S2, the FR 95 requirement was met rapidly; the average response rate on FR was 180 responses per min. Average response rates in S1 were 10 per min.
Fig. 3. Cumulative records showing performances of two pigeons on chain FR 95 DRL 6. Changes from the first to the second component are not designated; pips indicate food deliveries (from Ferster & Skinner, 1957, courtesy of Appleton-Century-Crofts).
Another experiment by Ferster and Skinner (1957, p. 690) provides an interesting comparison of tand FR 120 DRL 6 sec and chain FR 120 DRL 6 sec. Pigeons were trained on tand FR 120 DRL 6 sec; i.e., after 120 responses had been emitted, the first response following a pause longer than 6 sec was reinforced by food; no exteroceptive stimuli indicated the change from the first to the second component. On this tandem schedule the birds responded at intermediate rates; the high response rates characteristic of simple FR schedules did not occur. When the schedule was changed from tandem to chained, response patterns and response rates changed dramatically. The low stable rates that had prevailed in the FR component of the tandem schedule were replaced by pauses alternating with very high rates. Low stable rates of responding prevailed in the DRL component of the tandem and chained schedules.
In subsequent experiments with chain FR 120 DRL 6 sec, Ferster and Skinner (1957) varied the discriminability of S2 and S1. When the physical differences between S2 and S1 were reduced, the response rates in S2 decreased, approaching the low response rates that prevailed in S1. When differences between S2 and S1 were increased, the response rates in S2 increased and a performance characteristically generated by FR schedules emerged, while response rates in S1 remained low. These experiments demonstrate that a discriminative stimulus that controls a low response rate can be a conditioned reinforcer that will develop and maintain high response rates.
One recent experiment investigated whether the conditioned reinforcing effect of a stimulus is a function of the type of schedule of reinforcement, independent of the frequency or probability of reinforcement.
Brady and Thach (1960) compared the relative conditioned reinforcing effects of the discriminative stimuli for FR and DRL schedules of food reinforcement. These schedules were the final components of two chains. Four exteroceptive stimuli (colored lights) occurred, two correlated with each chain. A single response key was available to the monkey subjects. The two chained schedules were chain FI 4.5 FR x and chain FI 4.5 DRL x. In the presence of the first stimulus, responses produced a second stimulus on an FI 4.5 schedule. In the presence of the second stimulus, responses were reinforced by food on an FR schedule until five food reinforcements occurred. In the presence of the third stimulus, responses produced a fourth stimulus on an FI 4.5 schedule. In the presence of the fourth stimulus, responses were reinforced by food on a DRL schedule until five reinforcements occurred. Technically, this is a multiple (chain FI 4.5 FR x) (chain FI 4.5 DRL x).
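The four-stimulus arrangement is easier to see laid out as data. The encoding below is our own; the stimulus labels are hypothetical, since this account does not assign particular colors to particular components.

```python
# mult (chain FI 4.5 FR x) (chain FI 4.5 DRL x), one response key throughout.
BRADY_THACH_1960 = {
    "chain_1": [("light_1", "FI 4.5 -> light_2"),       # responses produce the
                ("light_2", "FR x -> food, 5 times")],   # next stimulus / food
    "chain_2": [("light_3", "FI 4.5 -> light_4"),
                ("light_4", "DRL x -> food, 5 times")],
}
```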
The relative response rates on the FI 4.5 schedules were used to assess the conditioned reinforcing effects of the stimuli correlated with the FR and DRL schedules of food reinforcement. These schedules were varied from FR 100 to FR 400 and from DRL 2.5 sec to DRL 20 sec. Another part of the experiment investigated the effects of 4.5-min time-out periods following the fifth food reinforcement on each schedule.
The relative conditioned reinforcing effects of the stimuli correlated with food-reinforcement schedules were not a simple function of frequency of reinforcement. With primary reinforcement schedules of FR 100 and DRL 20 sec, the frequencies of food reinforcement for the two schedules were either equal or were higher for the DRL schedule. Nevertheless, higher response rates were maintained on the FI which preceded the FR than on the FI which preceded the DRL. The same result was obtained even when the FR requirement was increased and the DRL requirement was decreased. However, at FR 400 and DRL 5 sec or DRL 2.5 sec, the stimulus correlated with the DRL schedule became a stronger conditioned reinforcer than the one correlated with the FR schedule. The addition of time-out periods following primary reinforcement periods did not affect these results.
The frequencies of reinforcement on VI drl and DRL schedules are similar to those occurring on VI schedules. Many of the results on DRL schedules in S1 are consistent with the results on VI schedules in S1; they indicate that reinforcement frequency in S1 is the most important single variable determining the conditioned reinforcing effect of S1. However, the results of Brady and Thach (1960), as well as the results that Ferster and Skinner obtained with one pigeon, suggest that the type of schedule used in S1 can be more important than the frequency of reinforcement.
4. FI Schedule in S1
Fixed-interval schedules determine the availability of reinforcements at constant intervals in time. Usually, a reinforcement is available after a given time from some event, such as the last reinforcement or presentation of a stimulus. Responses other than the reinforced response have no programmed consequence. Although the maximum reinforcement frequency is determined by the schedule, complex changes in amount and temporal pattern of responding occur throughout the interval. Responding is often positively accelerated between reinforcements.
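As a predicate, FI differs from VI only in that the set-up delay is constant; a minimal sketch with our own names:

```python
def fi_reinforced(elapsed_s: float, fi_s: float = 60.0) -> bool:
    """A response is reinforced if at least fi_s seconds have elapsed
    since the timing event (e.g., the last reinforcement or a stimulus
    onset); earlier responses have no programmed consequence."""
    return elapsed_s >= fi_s
```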
Ferster and Skinner (1957, p. 659) studied the behavior of pigeons on mult ext FI 1 and then on chain VI 1 FI 1. On the multiple schedule response rates were low during ext (in S2) and high during FI 1 (in S1). On chain VI 1 FI 1 response rates in S2 (now correlated with VI 1) increased steadily. After prolonged exposure to chain VI 1 FI 1, average response rates in S1 (FI 1) varied from about 40 to 60 per min. Although occasional pauses occurred in S2, the average response rate was about 120 responses per min. When the frequency of food reinforcement was decreased by changing the schedule from FI 1 to FI 1.5, performance in S1 was little changed; however, the VI 1 response rate decreased to about 60 per min.
Ferster and Skinner (1957, pp. 660 ff.) also studied chain VI 3 FI x. After 60 sessions on chain VI 3 FI 2, irregular patterns of responding were maintained on VI 3; substantial pausing followed by increasing rates of responding occurred on FI 2. Average response rates were higher on VI 3 (in S2) than on FI 2 (in S1). In subsequent experimental sessions, the VI 3 was held constant while the interval in FI was 2, 5.5, or 7.5 min. Average response rates in S1 did not decrease at FI 5.5 or FI 7.5; however, pauses alternated with variable response rates in S2. The results with chain VI 3 FI x suggest that the conditioned reinforcing effect of S1 is markedly attenuated when the schedule in S1 is FI 5.5 or FI 7.5.
Ferster and Skinner (1957, pp. 695 ff.) used a multiple (chain) (chain) procedure to compare the relative conditioned reinforcing effects of stimuli associated with FI and FR schedules of reinforcement. The experimental chamber contained two response keys, with only one illuminated at any time. When the left key was white, responses on it produced (on VI 3) a red light on the right key; responses on the red right key produced food on FR x. In a similar fashion, when the left key was green, responses on the left key produced (on VI 3) a blue light on the right key; responses on the blue right key produced food on FI x. Technically, this is mult (chain VI 3 FI x) (chain VI 3 FR x). If the red stimulus, indicating FR x, and the blue stimulus, indicating FI x, are equally effective conditioned reinforcers, response rates should be about equal in the white and green stimuli.
Fig. 4 shows the performance of one bird when red was correlated with FR 50 and blue with FI 10. The bird responded at a relatively high rate in the pre-FR stimulus (record B) but at a very low rate in the pre-FI stimulus (record C). In a subsequent session, the red and blue stimuli were reversed; i.e., red was correlated with FI 10 and blue with FR 50. After the reversal, the blue stimulus became an effective conditioned reinforcer while the red stimulus lost its conditioned reinforcing effectiveness. The rates in each S1 depended upon the schedule associated with the stimulus.

Fig. 4. Performance on mult (chain VI 3 FI 10) (chain VI 3 FR 50). Record A shows FI 10 and FR 50 performances in the second components; pips indicate food deliveries. The FI 10 and FR 50 schedules can be distinguished by inspection of the cumulative records. Record B shows performance in the first-component VI 3 schedule that preceded the FR 50 stimulus. Record C shows performance in the first-component VI 3 schedule that preceded the FI 10 stimulus (from Ferster & Skinner, 1957, courtesy of Appleton-Century-Crofts).
The frequency of food reinforcement on FI was increased tenfold, from FI 10 to FI 1. As the reinforcement frequency increased, performance on the FI became almost indistinguishable from performance on FR 50, and response rates on VI 3 in green (pre-FI stimulus) became comparable to response rates on VI 3 in white (pre-FR stimulus). These results show that a conditioned reinforcer established by FI 1 may be as effective as one established by FR 50.
Gollub (1958) changed reinforcement schedules from tand FI FI to chain FI FI; the schedules were FI 0.5 FI 0.5, FI 1 FI 1, and FI 3 FI 2. The transition from tand FI 3 FI 2 to chain FI 3 FI 2 was described above; the other transitions were similar. For example, rates (responses per min) on tand FI 1 FI 1 for one bird were 12.5 in the first component and 99.1 in the second component; the patterns of responding were similar to those that occur on simple FI 2 schedules. Later, the schedule was changed from tand FI 1 FI 1 to chain FI 1 FI 1, and average response rates (responses per min) in the first component decreased from 12.5 to 5.9; average rates in the second component increased from 99.1 to 104.5. With further exposure to chain FI 1 FI 1, however, rates in the first component increased to 16.7, and rates in the second component decreased to 90.5.
Gollub suggests that the transition to the chained schedule consists of three effects. The first effect is due to the introduction of stimuli which are novel to the pigeon; the magnitude and direction of this effect depends on the type of stimulus and the bird's previous history. The second effect, which is characterized by long pauses in S2 (first component) and high rates in S1 (second component), is due to a discrimination contingency; responding in S1 is reinforced by food while responding in S2 is not. The third effect is the development of S1 as a conditioned reinforcer. Gollub concludes:
Gollub
concludes:
"By
the
end
of
four
or
five
sessions
the
rate
in
the
initial
component
of
the
two-
component
chains
had
increased
consider-
ably
over
the
value
previously
maintained,
and
a
scallop
frequently
appeared
in
both
S,
and
S,.
The
fact
that
the
rate
in
S2
had
increased
over
the
rate
both
early
in
the
chain
and
under
the
tandem
schedule
is
explained
by
the
fact
that
SI
became
a
con-
ditioned
reinforcer,
strengthening
respond-
ing
in
S2."
(Gollub
1958,
p.
44.)
Ferster and Skinner (1957, p. 681) compared the performance of pigeons on chain FR 20 FI 1 with the performance on chain FR 20 FI 2. The performance of one bird under these schedules is shown in Fig. 5. On chain FR 20 FI 1, sustained response rates of about 150 responses per min occurred on FI 1 (record C); the pausing and scalloping, which are characteristic of performance on simple FI schedules, did not occur. There were consistent pauses after reinforcement on FR 20 (record A), which would be unusual on FR 20 reinforced with food. When reinforcement frequency in S1 was decreased by a schedule change from FI 1 to FI 2, average response rates on FI decreased only slightly; however, performance on the FR 20 component (record B) became severely strained. These results suggest that the effectiveness of S1 as a conditioned reinforcer is directly related to the frequency of reinforcement in S1.

Fig. 5. Cumulative response records of chain FR 20 FI x. Records A and C show performances on the first and second components, respectively, of chain FR 20 FI 1. Record B shows the first component of chain FR 20 FI 2 (from Ferster and Skinner, 1957, courtesy of Appleton-Century-Crofts).
5. FR Schedule in S1
The probability of reinforcement under a fixed-ratio schedule is determined by the number of responses which are required for each reinforcement. Because the emission of a fixed number of responses requires a period of time, the rate of responding will indirectly determine the frequency of reinforcement in time, although this time is not fixed.
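To make the relation concrete, here is a minimal arithmetic sketch (Python; the rates and schedule values are invented for illustration and are not data from these experiments) of how response rate converts the fixed response requirement of a ratio schedule into a reinforcement frequency in time, whereas an interval schedule caps that frequency regardless of rate:

```python
# Under FR n, each response is reinforced with probability 1/n, so the
# reinforcement frequency in time depends on how fast the subject responds.
# Under an FI schedule, at most one reinforcement can occur per interval,
# however fast the subject responds. Illustrative values only.

def fr_reinforcements_per_min(response_rate_per_min: float, n: int) -> float:
    """Reinforcements per minute under FR n at a steady response rate."""
    return response_rate_per_min / n

def fi_max_reinforcements_per_min(interval_min: float) -> float:
    """Upper bound on reinforcements per minute under an FI of this length."""
    return 1.0 / interval_min

print(fr_reinforcements_per_min(120, 50))  # 2.4 per min at 120 responses/min
print(fr_reinforcements_per_min(60, 50))   # 1.2 -- halving the rate halves it
print(fi_max_reinforcements_per_min(1.0))  # 1.0 per min, whatever the rate
```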
Ferster and Skinner (1957, p. 667 ff) conducted parametric studies of chain VI 1 FR x schedules. In one experiment birds were reinforced on chain VI 1 FR x, with x varied from 50 to 300 responses. Fig. 6 shows the performance of one bird as the FR requirement was increased from 75 to 300. On chain VI 1 FR 50, performance in S2 (record B) was more erratic than performance on simple VI 1 schedules of food reinforcement; however, response rates were fairly high. As the requirement was increased, however, the performance on FR became strained; i.e., long periods of low response rates or pausing occurred before the FR requirement was completed at the characteristic high response rate (see record E). The frequency and duration of these periods of straining increased as the FR requirement was increased from 50 to 300; however, the response rates which abruptly terminated these periods of straining remained very high. As the FR requirement in S1 was increased, the response rate on VI in S2 decreased accordingly (records D and F). Average response rates on VI decreased at relatively small FR requirements under which average FR response rates did not decrease.
Investigators have studied the performances of pigeons and rats on chain FI 4 FR x (Ferster and Skinner, 1957; Hanson and Witoslawski, 1959). Ferster and Skinner investigated chain FI 4 FR x with two birds that had been trained on mult FI FR. Although the FR schedule was varied between FR 20 and FR 60, only the performances on chain FI 4 FR 60 were published. The rates and patterns of responding on FR 60 were typical of performances on simple FR 60. On FI 4 the characteristic accelerated responding (scallop) occurred, although pauses were longer than those usually generated by FI 4 reinforced by food. Hanson and Witoslawski (1959) used rats to investigate chain FI 4 FR x, with FR schedules requiring 5, 60, and 120 responses. Response rates in the FI component were inversely related to the response requirement under FR. At FR 120, the rats frequently paused for more than 4 min in the FI 4 component.
Fig. 6. Cumulative response records of chain VI 1 FR x. Performances on the first and second components of the chain were recorded separately; pips indicate the completion of each component. Records A, C, and E show performances in S1 as the FR was increased from 75 to 300; Records B, D, and F show the corresponding performances on VI 1 in S2 (from Ferster and Skinner, 1957, courtesy of Appleton-Century-Crofts).
The results of these experiments suggest that the conditioned reinforcing effectiveness of S1 was directly related to the probability of reinforcement in S1.
6. VR Schedule in S1
Variable-ratio schedules, like fixed-ratio schedules, determine the probability of reinforcement by providing a reinforcement after a specified number of responses. Unlike the fixed-ratio schedule, the number of responses required on each occasion under variable-ratio is variable, and only the average response requirement is specified. The variable-ratio schedule generates very high rates of responding, with occasional pauses, but these pauses are usually not correlated with reinforcements.
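The scheduling rule itself is simple to state; the following Python sketch (an illustration of the rule, not of the apparatus used in these studies; the uniform distribution is one arbitrary way of generating the variable requirements) contrasts the fixed and variable cases:

```python
import random

def fr_requirements(n: int, count: int) -> list[int]:
    # Fixed-ratio: every reinforcement requires exactly n responses.
    return [n] * count

def vr_requirements(mean: int, count: int) -> list[int]:
    # Variable-ratio: the requirement varies from occasion to occasion;
    # only its average is specified. Drawn here uniformly on
    # [1, 2*mean - 1], which has the required mean.
    return [random.randint(1, 2 * mean - 1) for _ in range(count)]

print(fr_requirements(40, 5))  # [40, 40, 40, 40, 40]
print(vr_requirements(40, 5))  # e.g. [7, 63, 21, 58, 44]; mean 40 in the long run
```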
Using the two-key procedure described above, Autor (1960) studied the performances of pigeons on chain VI 1 VR x schedules. The primary reinforcement schedule on key A was held constant at VR 40, while the primary reinforcement schedule on key B was systematically varied from VR 100 to VR 16. When the terminal schedules were both VR 40, response rates in the first components (VI 1 schedules) on keys A and B were about equal. Autor found a linear relation between the relative response rates in the first components and the relative VR requirements in the second components. The conditioned reinforcing effect of the stimulus correlated with the second component was directly related to the probability of reinforcement.
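Schematically, the relative measures involved can be computed as follows (Python; the VR values are illustrative choices within the range Autor studied, and the identity relation at the end is a schematic statement of the linear finding, not his fitted function):

```python
def relative_value(a: float, b: float) -> float:
    # Relative value of b with respect to the pair (a, b).
    return b / (a + b)

# Probability of reinforcement per response under VR n is 1/n.
vr_key_a, vr_key_b = 40, 16
rel_prob_b = relative_value(1 / vr_key_a, 1 / vr_key_b)

# Autor's finding, schematically: the relative response rate in the first
# component on key B tracks the relative probability of reinforcement in
# the second component on key B.
predicted_relative_rate_b = rel_prob_b
print(round(predicted_relative_rate_b, 3))  # 0.714 when key B is the richer VR 16
```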
Although we have discussed the conditioned reinforcing effectiveness of S1 as a function of frequency of reinforcement for VI, VI drl, DRL, and FI schedules and as a function of probability of reinforcement for FR and VR schedules, it is obvious that frequencies and probabilities of reinforcement were closely related in all of these experiments. In most instances, however, it was difficult to compare directly the effects of frequency and probability. Autor (1960) computed both frequencies and probabilities of reinforcement from each of his studies of chain VI 1 VI x and chain VI 1 VR x schedules. For each VI or VR schedule in S1, Autor determined both the mean number of responses per reinforcement and the mean number of reinforcements per min. The conditioned reinforcing effectiveness of S1 was related in a linear fashion to both the frequency and the probability of reinforcement in S1. Autor (1960, p. 41) concludes that "although both variable-interval and variable-ratio schedules were used to program the availability of food presentations, the choice of whether to select frequency or probability of reinforcement as the independent variable does not appear to be critical."
7. Uncorrelated Schedules in S1
The relationship between frequency of reinforcement in S1 and conditioned reinforcing effectiveness of S1 is emphasized when reinforcements are not contingent upon any specific behavior. Using a delay of reinforcement paradigm with pigeons, Ferster (1953) investigated several variations of chain VI FI schedules. In the first experiment, responses in the presence of S2 produced S1 (dark experimental chamber) on a VI 1 schedule; after a fixed, 1-min interval of time in S1, food was presented independently of responses (uncorrelated FI 1). The use of a dark experimental chamber as S1 helped to insure that no responding would occur in the presence of S1. Under these conditions, the birds stopped responding in S2. In a second experiment, the uncorrelated (uncorr) FI value in S1 (dark chamber) was first 1 sec and was then gradually increased to 1 min. Under these conditions, the birds continued responding in S2. The results of Ferster's experiments indicate that the appearance of S1, even when no behavior was required in S1, reinforced responding in S2; but the maintenance of responding in S2 depended upon the training procedure.
Ferster suggested that "superstitious" or mediating behavior might not have had a chance to develop in his first experiment, but might have been conditioned when the uncorrelated FI in S1 was gradually increased in his second experiment (cf. Skinner, 1948). This interpretation was evaluated in a third experiment in which responses in S2 produced S1 on a VI 1 schedule (response key was dark but general illumination remained); food was presented on an uncorr FI 1 schedule in S1. However, every response in S1 postponed reinforcement for 1 min. Under these conditions, the birds pecked the key in S2, but not in S1. Observations revealed that the birds had developed "superstitious" response patterns (e.g., "turning in a circle with head stretched high") in S1. Behavior that is inadvertently reinforced in S1 may influence the conditioned reinforcing effectiveness of S1.
Ferster (1953) also trained pigeons on chain VI 1 FI 1. Constant response rates developed on VI 1 in S2, and positively accelerated responding developed on FI 1 in S1. In the second phase of the experiment, food-delivery occurred in S1 on uncorr FI. Although response rates in S1 declined to low values, the birds maintained constant response rates in S2.

In a similar experiment, Ferster and Skinner (1957, p. 684) changed the schedule from chain VI 1 FI 1 to chain VI 1 uncorr FI 1. Again, constant response rates were maintained on the VI 1 component in S2, while response rates fell to almost zero in the uncorr FI 1 component in S1. These investigators conclude that "the conditioned reinforcement of the property of the stimulus for the second schedule would appear to derive from its correlation with food, regardless of whether or not the bird is responding in the presence of the second stimulus" (Ferster and Skinner, 1957, p. 685). Ferster and Skinner qualify this conclusion by noting that response rates in S1 did not reach zero.
Using the two-key procedure described previously, Autor (1960) has studied the performances of pigeons on chain VI 1 uncorr VI x schedules. The schedule in the second component on key A was always uncorr VI 15 sec, while the schedule in the second component on key B was varied from uncorr VI 1 to uncorr VI 3.75 sec. In Autor's experiments responses did not occur on the uncorr VI schedules. When both terminal schedules were uncorr VI 15 sec, response rates in the first-component VI 1 schedules on keys A and B were about equal. As the frequency of primary reinforcement in the second component on key B was increased, the first-component VI 1 response rate on key B increased in a linear fashion. In summary, the results obtained with uncorr VI schedules were the same as those obtained with standard VI schedules. Autor's results, which are consistent with the results of Ferster and Skinner (1957), show that responding in the first component of a chained schedule can be maintained whether or not responses occur in the presence of S1. This suggests that the conditioned reinforcing effectiveness of S1 is directly related to the frequency of reinforcement in S1 even when no responses occur in S1. These data also suggest that establishing S1 as a discriminative stimulus is not a necessary condition for establishing S1 as a conditioned reinforcer.
Unrecorded response topographies can be "superstitiously" developed on uncorr FI or uncorr VI schedules. Must superstitious behavior develop on an uncorrelated schedule in S1 if S1 is to become a conditioned reinforcer? In Ferster's (1953) experiment on chain VI 1 uncorr FI 1, recorded response rates in S1 did not reach zero. In Autor's experiments on chain VI 1 uncorr VI x, response rates in S1 did reach zero. One might argue that the short intervals on uncorr VI x would favor the development of superstitious behavior. (The subject's behavior, other than key-pecking, was not observed in these experiments.) On the other hand, there is no unequivocal evidence that the occurrence of superstitious behavior in S1 is necessary for maintaining S1 as a conditioned reinforcer. Further, the experimental evidence shows that the conditioned reinforcing effectiveness of S1 is not related to the response rate occurring in S1. Finally, superstitious behavior is extremely variable (cf. Morse, 1955), and it is unlikely that it would mediate orderly conditioned reinforcement functions, such as those reported by Autor (1960). Since it is difficult to prove that unrecorded behavior is not occurring in the presence of a stimulus, those who suggest that unrecorded behavior mediates conditioned reinforcing effects must demonstrate that it does so. In our opinion, the results show that S1 can be a conditioned reinforcer even though no specific response pattern occurs in the presence of S1.
In summarizing experiments on two-component chained schedules we will relate the results to the four questions raised at the beginning of this section. (1) Generally, the conditioned reinforcing effectiveness of S1 is directly related to both frequency and probability of reinforcement in S1. In most of the experiments in which frequency and probability of reinforcement vary jointly, it is difficult to determine whether one is more influential than the other. One experiment (Autor, 1960) that compared frequency and probability of reinforcement in S1 indicates that they are equally effective in determining the conditioned reinforcing effectiveness of S1. Nevertheless, S1 can be a conditioned reinforcer even when reinforcements in S1 are delivered independently of responses. (2) At a given frequency or probability of reinforcement, the conditioned reinforcing effectiveness of S1 is apparently unrelated to the pattern of responding that occurs in S1 (Ferster and Skinner, p. 684 ff; Autor, 1960). (3) On the other hand, the results of two experiments (Ferster and Skinner, 1957, pp. 678 ff; Brady and Thach, 1960) suggest that the conditioned reinforcing effectiveness of S1 can depend more upon the type of schedule in S1 than on the frequency of reinforcement in S1. (4) The results of several experiments (Ferster and Skinner, 1957, p. 685; Gollub, 1958; Autor, 1960) indicate that establishing S1 as a discriminative stimulus is neither a necessary nor sufficient condition for establishing S1 as a conditioned reinforcer.
C. CONDITIONED REINFORCEMENT IN EXTENDED CHAINED SCHEDULES
The number of components in a chained schedule can be extended indefinitely if enough discriminative stimuli are available and if performance can be maintained. In a three-component chained schedule with FR 30 in each component, for example, the 30th response in S3 produces S2, the 30th response in S2 produces S1, and the 30th response in S1 produces food. The component schedules of an extended chained schedule may be the same, as in this example, or they may differ from each other.
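The contingency in the three-component example can be written as a small state machine (a Python sketch; the stimulus labels and the reset-to-S3 convention are ours, used only to make the sequence explicit):

```python
class ChainedFR:
    """Three-component chained schedule with FR 30 in each component:
    the 30th response in S3 produces S2, the 30th in S2 produces S1,
    and the 30th in S1 produces food (the chain then starts over)."""

    def __init__(self, requirement: int = 30):
        self.requirement = requirement
        self.stimuli = ["S3", "S2", "S1"]  # order of presentation
        self.index = 0                      # current component (start in S3)
        self.count = 0                      # responses in the current component

    def respond(self):
        """Register one response; return 'food' or the next stimulus when a
        component is completed, otherwise None."""
        self.count += 1
        if self.count < self.requirement:
            return None
        self.count = 0
        if self.index == len(self.stimuli) - 1:  # completed S1
            self.index = 0
            return "food"
        self.index += 1
        return self.stimuli[self.index]

chain = ChainedFR()
events = [e for _ in range(90) if (e := chain.respond()) is not None]
print(events)  # ['S2', 'S1', 'food']
```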
The extended chained schedule in the rat demonstration described in the first section of this review is a 10-component chained schedule. "Token reinforcement" studies with primates are also extended chained schedules.

In analyzing conditioned reinforcement in extended chained schedules the same questions asked for two-component chained schedules can again be raised; however, extended chained schedules emphasize the importance of certain variables. For example, the dual role of the component stimuli must be considered. A stimulus in an extended chained schedule may be both a discriminative stimulus, in which a conditioned (or primary) reinforcement is contingent upon an operant response, and a conditioned reinforcing stimulus for operant responses that precede it. If a conditioned reinforcer is weaker than the reinforcer used to establish it, the number of components in which behavior can be maintained will be limited; i.e., the conditioned reinforcing effect of S4 in a chained schedule might be too weak to develop or maintain behavior in S5. Investigations of extended chained schedules should enable determination of the chain length at which the conditioned reinforcing effect is attenuated. When attenuation does occur, it should be possible to identify the necessary and sufficient conditions for establishing and maintaining a stimulus as a conditioned reinforcer.
1. FR in Each Component
The highest probability of reinforcement on an FR schedule is on CRF (FR 1), when every response is reinforced. Several experiments have investigated extended chains with FR 1 in each component. Arnold (1947a, 1947b) studied four-component chains of FR 1. In the first experiment, each time a rat pressed a button in the wall of the experimental chamber, a motor moved the wall, and an almost identical response button appeared; pressing the second button moved a third one into place, and so forth. Pressing the fourth button produced food. In the second experiment, four different manipulanda were presented successively, and the first response on the fourth manipulandum produced food. From a series of daily trials, Arnold recorded mean and median response latencies. In the final trials on each procedure, latencies decreased from the first to the third response, but tended to increase on the fourth response. (In the second experiment, the median latencies decreased from the first to the fourth response.)
N. V. Napalkov (1959) conditioned pigeons on chains comprised of as many as seven components. Table 2 is an outline of one of Napalkov's experiments, showing the responses, and the discriminative stimulus and the reinforcing stimulus used to establish and maintain each response. This extended chain was developed in reverse order. First, every peck on a "special lever" was reinforced with food. Then the pecking response was reinforced only in the presence of the white light, thereby establishing the light as a discriminative stimulus for pecking. Next, the white light was presented only after the pigeon had jumped onto a platform; the pigeon would then peck the lever, and receive food.
Table 2
Summary of Napalkov's Chaining Procedure

Response | Discriminative Stimulus | Reinforcing Stimulus for Response | No. of Trials to Occur in SD
1. Peck lever | white light (S1) | food | 14-18
2. Jump on platform | black air vane rotating (S2) | white light | 12-20
3. Jump onto floor of apparatus | whistle (S3) | black air vane | 20-30
4. Jump down onto a platform | blue light (S4) | whistle | 18-36
5. Jump onto rod | horn (siren) (S5) | blue light | 31-45
6. Jump into right section of apparatus | bell (S6) | horn | 40-55
7. Jump into left section of chamber and up onto a shelf | large white air vane (S7) | bell | 42-61
The jumping response then produced the white light only in the presence of a spinning air vane (establishing the vane as a discriminative stimulus for jumping). Other members of the chain were added in a similar manner. Some of the responses were first evoked through "external prompting"; e.g., leaps onto a small perch were induced when a "special object was moved cautiously near to the bird." This procedure is a seven-component chained schedule with FR 1 in each component.
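The reverse-order construction amounts to backward chaining: each newly added discriminative stimulus is reinforced by the stimulus trained just before it. A Python sketch (the response and stimulus names abbreviate the first three rows of Table 2):

```python
# Backward chaining: training starts with the response nearest food, and
# each new link's response is reinforced by the previously trained stimulus.
links = [
    ("peck lever", "white light"),        # trained first, reinforced by food
    ("jump on platform", "black air vane"),
    ("jump onto floor", "whistle"),
]

def build_chain(links):
    """Return (discriminative stimulus, response, reinforcing stimulus)
    triples in the order trained."""
    chain, reinforcer = [], "food"
    for response, stimulus in links:
        chain.append((stimulus, response, reinforcer))
        reinforcer = stimulus  # the new stimulus reinforces the next link
    return chain

for sd, response, sr in build_chain(links):
    print(f"in {sd}: {response} -> {sr}")
# in white light: peck lever -> food
# in black air vane: jump on platform -> white light
# in whistle: jump onto floor -> black air vane
```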
Column 4 in Table 2 shows that the number of trials required to add each component to the chain was smaller for those components closer to food-delivery. Napalkov states that performance on this chained schedule was stable for 10 months of experimentation, and remained intact even after a three-month period of no experimentation. The exteroceptive stimuli exerted discriminative control because "when one of the intermediate conditioned stimuli in a chain was administered the pigeon always responded by the corresponding conditioned reflex" (Napalkov, 1959).
Napalkov also imposed delays of up to 2 min between the occurrence of a response and the presentation of its reinforcing stimulus. In an experiment with a five-component chain, for example, a response in S4 was followed by S3 with a short delay, which was gradually lengthened to 2 min. Response latencies increased, but responding was maintained despite the delay. When the 2-min delay was introduced abruptly, behavior was more severely disrupted than when the delay was increased gradually. This result is similar to Ferster's (1953) finding that when there was a 1-min delay between a key peck and food delivery, responding was maintained only if the delay had been gradually increased to 1 min.
Shirkova and Verevkina (1960) also investigated extended chained schedules with FR 1 in each component. Their investigations demonstrated that the performance in one component of the chain affected a topographically different response in another component of the chain. Shirkova and Verevkina (1960) established complex chains of responses in several types of primate. In addition, they introduced several interesting novelties into the basic heterogeneous chaining situation. An example of the basic chained schedule is the following: "In response to a blue light, the monkey started to move the glider (lever), with the green light he turned the air vane, and with the red light he pushed the arm lever which gave him food." The method used by Napalkov for introducing each segment of the chain in the reverse order was also used by Shirkova and Verevkina. This sequence "was easily established, each link being formed after 2-3 combinations, and the reinforcement of all chains of reflexes occurred after 2-59 combinations of stimuli and food."
Because the several responses in the chain had different topographies, the degree of exteroceptive stimulus control could be compared with the degree of control exerted by other sources, such as response-produced stimuli. This was done by withholding the appropriate stimulus following a response. The authors observed six stages in the development of control. In the initial stage exteroceptive stimuli exercised little control, and any movement could occur to any stimulus. In the final stage, the appropriate response occurred to each stimulus regardless of order.
In another part of the experiment, Shirkova and Verevkina (1960) established more complex performances by a procedure similar to the free-operant experiments on SΔ avoidance7 (Ferster, 1958a). In the final stages of this experiment the reinforcement at the end of one two-component chain was the removal of SΔ, after which the organism was exposed to either the first or second components of the original two-member chain. Responding in this chain led to food reinforcement. Since no responses occur in SΔ, that stimulus is referred to as an inhibitory stimulus, and a response that removes it is called a disinhibitory response.
Disinhibitory chains with one and two components were studied. The authors refer to similar experiments with a number of different species, and observe that two-component disinhibitory chains were easily established and maintained in various primates, but that disinhibitory chains were not established in turtles, and "in pigeons, rats, cats and dogs the first disinhibitory chain is produced with difficulty" (Shirkova and Verevkina, 1960).
FR Schedules with Added Counters. Extended chained schedules with FR in each component have been investigated by Ferster and Skinner (1957, pp. 89 ff.). In these experiments pigeons performed on FR schedules with "added counters." On FR 70 with added counter, for example, an illuminated slit projected on the response key increased in length every time the pigeon responded; each response produced the same increment in the size of the slit. Technically, this procedure is a 70-component chained schedule with FR 1 in each component.

7SΔ is any stimulus that is correlated with extinction.
Ferster and Skinner used two control procedures before introducing the extended chain. In these control procedures, the birds were trained on FR 70 with the slit always large (S1) or with the slit presented at all sizes, uncorrelated with reinforcement. These procedures are equivalent to tandem controls for the later performance on the extended chain. When the birds which had previously been reinforced on FR 70 with S1 continuously presented were exposed to the chain, they developed very high response rates. One of the two birds trained with uncorrelated stimuli developed very high response rates which did not increase under the extended chain; the other developed lower response rates which increased on the extended chain.
In most of the studies of "added counters," more responses were required to increase the length of the slit near the end of each ratio than at the start; i.e., the sizes of the component FR schedules were actually larger near the end of the extended chain. When two pigeons were trained on FR 190 with the slit small throughout, and then exposed to FR 190 with added counter (slit growing), they eventually responded at higher average rates on FR 190 with added counter than they had on simple FR 190. The stimulus control exerted by the counter was investigated by reversing the counter; i.e., the slit was large at the start of the ratio and became smaller with responses. Under this stimulus reversal procedure, responding was negatively accelerated in each of several total chains. The same pigeons were studied under larger response requirements with added counters. One bird maintained high average response rates at FR 380; the second bird showed severe straining (paused as long as 1.5 hr) at FR 200 even with the added counter. When the counter was reversed for this second bird, the pausing was temporarily eliminated and responding was negatively accelerated in each FR. Ferster and Skinner (1957, p. 93) concluded that: "The long pauses after reinforcement at larger fixed ratios are clearly not due to a factor such as physical exhaustion or fatigue, but to the extremely unfavorable stimuli then present."
On FR schedules with high response requirements and added counters, responding was negatively accelerated. For example, on FR 380 with added counter, the pigeon paused no more than a few seconds after reinforcement; the first 100 responses occurred at a rate of 10 per sec, followed by an abrupt shift to about 5 responses per sec (Ferster and Skinner, 1957, p. 92).
Ferster and Skinner (1957, p. 97) suggest: "More negative curvature in larger fixed-ratios may occur because the necessarily smaller increase in the size of the slit per response is not so reinforcing." Some studies of counters that grew most rapidly in certain parts of the ratio indicated that the highest rates occurred when the slit was changing the largest amount per response, while the lowest rates occurred where the slit changes were smallest.
Ferster and Skinner (1957, p. 97) also studied performance on chain FR 250 (with added counter) FR 160 (blue stimulus). That is, the slit increased for 250 responses and then a blue light was presented; the 160th response in the presence of the blue light was reinforced by food. While the counter was in effect, response rates were 12 per sec, but decreased to 4 per sec in the presence of the blue light. Although the size of the slit changed continuously, the results indicate that pigeons discriminate only a finite number of different slit sizes. One should, therefore, consider FR schedules with added counters as extended chains with FR 1 or, more often, low FRs in each component.
The added counters increase average response rates and serve to maintain performances when the total response requirement is large. These findings suggest that responding can be maintained on very extended chained schedules when the component schedules require only a few responses. Because response rates tend to be highest when the size of the slit is changing most rapidly, it is inferred that the size of the component schedules in extended chains is critical. Unfortunately, it is difficult to specify the size of the component schedules with a continuous added counter. This problem can be circumvented by the use of what Ferster and Skinner (1957) call a block counter. The block counter is an extended chained FR schedule in which the stimuli change discretely.
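The two arrangements can be sketched side by side (Python; the slit is represented as a single number, and the block size of 10 is one of the values Ferster and Skinner used in their control investigations):

```python
def continuous_counter(total: int, responses: int) -> float:
    # Continuous counter: the slit grows by the same increment with every
    # response, so each response produces a (small) stimulus change --
    # effectively an extended chain with FR 1 in each component.
    return responses / total  # slit length as a fraction of its maximum

def block_counter(block: int, responses: int) -> int:
    # Block counter: the stimulus changes discretely only after every
    # `block` responses -- an extended chain of FR `block` components.
    return responses // block  # index of the stimulus currently present

print(continuous_counter(120, 1), continuous_counter(120, 2))  # changes each peck
print(block_counter(10, 9), block_counter(10, 10))  # changes only at the 10th peck
```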
In control investigations, Ferster and Skinner (1957) programmed FR 90 and FR 120 schedules in which stimuli changed after every 10 responses; four stimuli (key colors) were presented in a variable sequence. When the schedule was changed from a simple FR to an FR with a variable stimulus sequence, pauses were eliminated, and average response rates increased. After further exposure to the variable stimulus sequence, response rates were comparable to rates that had been maintained on FR schedules with one stimulus throughout. Pigeons were exposed first to FR 120 and then to chain FR 35 FR 35 FR 35 FR 20. Pausing developed after food-delivery and terminal response rates increased. Ferster and Skinner (1957, p. 116) suggest that pausing occurs because the first component of the chained schedule is an unfavorable stimulus with respect to food delivery.
Ferster and Skinner (1957, pp. 681 ff.) studied several chained FR schedules in which the total response requirement was 120 responses (chain FR 15 FR 105, chain FR 30 FR 30 FR 30 FR 30, and chain FR 60 FR 30 FR 30). On each schedule, substantial pausing occurred in the first component while brief pauses occurred at the start of the other components.
Fig. 7. Record A shows performance on chain FR 15 FR 105; Records B and C are from successive sessions on chain FR 30 FR 30 FR 30 FR 30 (from Ferster & Skinner, 1957, courtesy of Appleton-Century-Crofts).

Figure 7 shows performances on chain FR 15 FR 105 (record A) and chain FR 30 FR 30 FR 30 FR 30 (records B and C).
The data which the authors presented were not extensive enough to warrant comparison among these chained schedules. However, pauses in the first component of each of the chained schedules were much longer than those that occurred on simple FR 120 schedules, on FR 120 schedules in which the stimuli appear in a variable sequence, or on FR 120 with a "continuous counter."
Why does the "continuous counter" lead to decreased pausing after reinforcement and increased average response rates, whereas the "block counter" has the opposite effects? One possibility, which was suggested by Ferster and Skinner (1957, p. 116), is that there is more stimulus generalization between the extreme values of the slit of the continuous counter than between the discretely different stimuli of the block counter. On the other hand, under large response requirements, response rates of 10 per sec occurred immediately after reinforcement, when the slit was at its minimum length, and fell to 2.5 per sec just before reinforcement, when the slit was at its maximum length. This finding suggests another way to account for the differences between performances on continuous and block counters.

The most important difference between the two types of counter may be the response requirement in each component of the chained schedules. Response rates with the continuous counter are highest when a relatively large change in the size of the slit occurs with each response. On FR 120 with continuous counter, overall response rates are very high, and the highest response rates occur after reinforcement (Ferster and Skinner, 1957, p. 103); on the four-component chained schedule with FR 30 in each component a prolonged pause follows reinforcement. The first 10 responses on FR 120 with continuous counter produce a change in the size of the slit which is probably discriminable; however, the bird must respond 30 times on the four-component chained schedule before the stimulus changes. If the changes in stimulus in each case are weak conditioned reinforcers, it seems probable that responding will be maintained more effectively on a small FR schedule (continuous counter) than on a large FR schedule (block counter).
If
a
condi-
tione(I
reinforcer
is
always
less
effective
than
the
reinforcer
used
to
establish
it,
a
stimulus
that
occurs
near
the
start
of
an
extended
chain
would
be
a
relatively
ineffective
conditioned
reinforcer.
Under
some
frequency
or
probabil-
ity
of
reinforcement,
however,
behavior
may
be
maintained
by
a
relatively
ineffective
re-
inforcer.
Further
experimental
investigations
can
determine
whether
these
factors
account
for
the
different
performances
generatedl
by
continuous
and
block
counters.
At
the
present
time,
we
shall
consider
other
studies
of ex-
ten(le(l
chained
schedules
which
are
relevant
to
these
interpretations.
FR Schedules with Token Reinforcers. The same basic procedure has been used in all "token reward" experiments. Initially, food is delivered whenever the sound of a vending machine occurs. The subjects are then trained by food reinforcement to insert tokens (poker chips) into the vending machine. Finally, the delivery of the token is made a consequence of some response. In this final phase the animal can be physically prevented from inserting tokens (Wolfe, 1936; Cowles, 1937) or trained not to insert them unless a specific stimulus is present (Kelleher, 1957c). In either case, the animal can be required to keep the tokens for a specified time before exchanging them for food. In this section, exchange will refer to exchange of tokens for food; exchange interval will refer to the length of time during which the animal cannot exchange tokens, and exchange ratio will refer to the number of tokens which must be obtained before exchange is possible.
Schedules of token reinforcement with exchange ratios resemble extended chained schedules. For example, if an animal receives one token for 20 responses (FR 20), and the exchange ratio is 10 tokens, the schedule is similar to a 10-component chained schedule with FR 20 in each component.
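Schematically, the FR 20 example with a 10-token exchange ratio runs as in the following Python sketch (an illustration of the contingency, not a description of any particular apparatus):

```python
def token_session(exchange_ratio: int = 10, fr: int = 20):
    """Yield the events of one exchange cycle: a token after every `fr`
    responses, then exchange once `exchange_ratio` tokens have been earned
    -- analogous to a 10-component chain of FR 20."""
    tokens = 0
    responses = 0
    while tokens < exchange_ratio:
        responses += 1
        if responses % fr == 0:
            tokens += 1
            yield f"token {tokens}"
    # Exchange: each token buys one discrete portion of food.
    for _ in range(exchange_ratio):
        yield "food"

events = list(token_session())
print(events[:3], "...", events[-2:])
# ['token 1', 'token 2', 'token 3'] ... ['food', 'food']
print(len([e for e in events if e == "food"]))  # 10
```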
The procedures differ in that the number of tokens delivered is directly correlated with the number of food reinforcements that can be obtained, as the exchange of each token produces the delivery of a discrete portion of food. Also, the tokens can be handled during the exchange delay, and a particular response of manipulating the token is reinforced by food delivery. The effectiveness of token reinforcers can be compared directly with that of food.
Wolfe (1936) compared the speed with which chimpanzees pulled a weighted lever when each response was reinforced by a token or piece of food. Ten reinforced responses comprised each session. The tokens could be exchanged for food immediately, and the rate of responding was corrected to allow for the time spent in exchanging tokens. Wolfe summarized: "For three out of four subjects the tokens were about as effective as was food in eliciting a work task when measured either in terms of doing a fixed task or in terms of amount of work done within a stated time interval. The fourth subject did more and faster work for food than for tokens" (1936, p. 71).
Kelleher (1957a) compared food and tokens as reinforcers of the behavior of chimpanzees under FR 60. The chimpanzees had long experimental histories on schedules of token reinforcement. For the first few sessions the tokens maintained higher response rates than food, probably because of the animals' experimental histories on schedules of token reinforcement. In later sessions food maintained higher response rates than tokens. Indeed, as the animals developed higher response rates for food the patterns of responding for tokens were completely disrupted. In other experiments (Kelleher, 1956; 1957b; 1958b) the exchange ratios were 50 and 60 tokens.
The schedules of token reinforcement were fixed-ratio schedules requiring from 20 to 125 responses. Both animals developed the biphasic patterns of responding that are characteristic of simple FR schedules (Ferster and Skinner, 1957); i.e., the animal was either pausing, after a reinforcement, or responding at a very high rate (the "running rate"). The pauses were relatively brief under FR 20 and FR 30 but became prolonged at FR 60, FR 100, and FR 125. For example, the chimpanzees usually paused more than 2 hr at the beginning of sessions on FR 125. After they had received several tokens, they paused less and responded at the running rate.
The response patterns of chimpanzees on large FR schedules (FR 60 to FR 125) of token reinforcement (Kelleher, 1958b) are comparable to the response patterns of pigeons on extended chained FR schedules (Ferster and Skinner, 1957, p. 682). In the token experiments the number of tokens in the animal's possession usually increases as the exchange period approaches. As the animal has no tokens at the start of the session, but has many tokens at the time of exchange, the number of tokens could be a discriminative stimulus with respect to food delivery comparable to the component stimuli of a chained schedule. On extended chained FR schedules, if the stimulus associated with the last component of the chain is presented at the start of the chain, this stimulus will control high response rates until a new discrimination develops. Kelleher (1958b) investigated whether similar factors control pausing in the token reinforcement studies.
Chimpanzees that had been trained on an FR 125 schedule of token reinforcement received 50 tokens at the start of a session, but still had to obtain another 50 tokens before any of them could be exchanged for food (cf. Wolfe, 1936). The results obtained with one chimpanzee are shown in Fig. 8. The initial periods of pausing, which were usually more than 2 hr (record A), were almost completely abolished (record B). By comparison, S1 generated high response rates even if presented immediately after reinforcement on extended chained FR schedules with pigeons; the presence of many tokens at the start of the session on the FR schedule of token reinforcement generated high response rates with chimpanzees. The tokens are discriminative stimuli as well as conditioned reinforcers.
2. FI in Each Component
Gollub (1958) investigated extended chained FI schedules of reinforcement. In one study pigeons were exposed first to simple CRF, FI 2, or FI 10 schedules and then to a five-component chained schedule with FI 2 in each component. During the first session on the five-component chain, the birds responded at moderate rates in each component. Within 20 experimental hr, however, extensive pausing developed in the first component (S5); average response rates were higher in each successive component, and positively accelerated responding developed in those components in which responding was maintained.
Gollub explains these results on the basis of the temporal contingencies generated by chained schedules, the interactions between schedule and responding, and the effects of the stimuli in the chain.

Fig. 8. Record A shows performance on the FR 125 schedule of token reinforcement at an exchange ratio of 50 tokens; periods of pausing have been omitted as indicated. Record B shows the following session in which the chimpanzee was given 50 poker chips at the start of the session (from Kelleher, 1958b).
The contingencies of reinforcement in FI 10 and in a five-component chain with FI 2 in each component differ in three important ways. First, on simple FI 10 the first response after 10 min is reinforced by food; on the five-component chain with FI 2 in each component, at least five responses must be emitted before reinforcement. Second, no two pauses in FI 10 can both postpone food-delivery; however, pauses in different components of the five-component chain will postpone food-delivery additively. Third, the components of the chain are stimuli which might function as an "added clock" (Ferster and Skinner, 1957). Because S5 is never closely followed by food delivery, it might act like an unoptimal setting of the "added clock" and produce pausing.
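The second difference, the additive effect of pauses, is easy to see in a sketch (Python; the pause durations are invented, and the model assumes the subject responds steadily once it starts, so that the first response after an interval elapses is reinforced):

```python
def fi_component_time(interval: float, initial_pause: float) -> float:
    """Minutes to complete one FI component when the subject pauses at the
    start and then responds steadily: the component ends with the first
    response after the interval elapses."""
    return max(interval, initial_pause)

# Simple FI 10: one long pause can delay food, but a second pause cannot
# add to it within the same interval.
print(fi_component_time(10, 12))  # 12 min to food

# Chain of five FI 2: a 4-min pause at the start of *each* component
# postpones food additively.
print(sum(fi_component_time(2, 4) for _ in range(5)))  # 20 min instead of 10
```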
Gollub also used tandem control procedures. One pigeon was trained on a five-component tandem schedule with FI 1 in each component. Average pauses of 7 min and 6 min developed in S5 and S4 respectively. Average response rates were higher in successive components of the tandem schedule, and each of the last three components was completed in about 1 min. When the schedule was changed to a five-component chained schedule with FI 1 in each component, the performance changed markedly. Average pauses in S5 increased to about 20 min while average pauses in S4 decreased to about 2 min. Records from the fourth session on the five-component chain are shown in Fig. 9. These results indicate that S4 was not an effective conditioned reinforcer for responding in S5, but that S3 was a reinforcer for responding in S4.
As noted above, when Gollub exposed pigeons first to two-component tandem FI schedules and then to two-component chained FI schedules, response rates in the first component of the chains increased when the minimum time between food reinforcements was 5 min. On the chained schedule with five components, however, prolonged pauses developed in the first component. Also, on five-component chained FI schedules after simple reinforcement histories, prolonged pausing developed in the first component. These results consistently suggest that the early stimuli in extended chained FI schedules are not conditioned reinforcers. Gollub's results on a series of chained FI schedules are summarized in Fig. 10. To facilitate the development of conditioned reinforcing effects of early stimuli in extended FI chains, Gollub first established behavior with two-component chained schedules, and then added components singly. "For all the subjects, a chain of FIs was reached on which responding in the initial components was not maintained. The number of components in the schedule which failed to maintain responding depended upon the duration of the components. Chains with three short (30") FIs maintained responding, but four- and five-component chains of FI 30" did not. With longer FIs (1 to 3 min), responding was not maintained in the initial component of a three-component chain even when that component was a very short interval. Although other factors are important in maintaining responding on chained schedules, the values of the parameters of the schedule have a great effect" (Gollub, 1958, p. 55).

Fig. 9. Cumulative records showing chain FI 1 FI 1 FI 1 FI 1 FI 1. Each segment shows a complete chain. Records of the initial component have been removed; the duration of the initial component, in minutes, is shown at the left of each segment. Pips indicate the termination of each component (from Gollub, 1958).

Fig. 10. Mean response rates in chains of different lengths with FI 0.5 in each component. The abscissa shows the component position (initial through fifth); the parameter of each curve is the number of components in the chain (from Gollub, 1958).

Gollub's results showed that the initial stimuli in two- or three-component chained FI schedules were discriminative stimuli controlling moderate response rates.8 However, these stimuli were not conditioned reinforcers for responding in a preceding component. Gollub concludes that his results contradict the hypothesis that a discriminative stimulus is necessarily a conditioned reinforcer.
Gollub's results with three-component chained FI schedules have recently been confirmed in experiments by Kelleher & Fry (1962). On three-component chained schedules with FI 1 or FI 1.5 in each component, prolonged pausing developed in the first component, and scallops developed in the second and third components. Neither pauses nor scallops developed on comparable tandem schedules.
Because the tandem and chained schedules have the same response and reinforcement contingencies, the differences in performance must be due to the sequence of stimuli on the chained schedules.

8We have previously defined a discriminative stimulus as one in the presence of which reinforcement is contingent on an operant response. The reinforcement can be either primary or conditioned.
In another part of the study Kelleher & Fry presented the component stimuli in a variable sequence. Response and reinforcement contingencies remained the same, but the component stimuli appeared in different orders between food reinforcements. Under the variable stimulus sequence, higher average response rates were developed and maintained in the first component. The performances were not equivalent to those that had been maintained on the tandem schedule, however, because scallops developed in each component. These scallops indicate that the stimulus changes were conditioned reinforcers. With the variable stimulus sequence the stimuli appeared equally often in each position of the chain, and no stimulus was consistently associated with the first component of the chain. Nevertheless, the first component always followed food delivery, and the birds were discriminating the stimulus changes. Probably the most important characteristic of the variable stimulus sequence is that each stimulus was intermittently terminated by food delivery. This intermittent pairing with primary reinforcement would maintain the conditioned reinforcing effect of each stimulus. Also, one-third of the presentations of each stimulus are terminated by food reinforcement of a response. This consequence would also tend to maintain responding in each. Thus, the variable stimulus sequence makes the extended chained FI schedules more comparable to FI schedules of token reinforcement, in which each token is eventually paired with food.
Fixed-interval schedules of token reinforcement. Kelleher (1957c) trained chimpanzees to press a lever on an FI 5 schedule of food reinforcement. The FI 5 schedule generated characteristic scalloped patterns of responding. The schedule for two chimpanzees was changed from FI 5 with food-delivery to FI 5 with token delivery. Exchange intervals during the 5-hr session were 1 hr. The response rates with token reinforcers were very low and the animals ceased responding after 13 hr. Next, the chimpanzees were reconditioned by permitting the immediate exchange of each token, and were then required to accumulate groups of tokens before exchange; the exchange ratio was gradually increased from two to eight tokens. When 2, 3, 4, or 6 tokens were required, the response rates were directly related to the temporal proximity of exchange for food; i.e., as the animals accumulated tokens, response rates increased. Characteristic scallops developed in most intervals. When the exchange ratio was eight tokens, the animals stopped responding.
Although these experiments indicate that tokens are relatively weak reinforcers when delivered under FI 5, the tokens did maintain responding in both subjects at exchange intervals as long as 40 min. The results obtained with the chimpanzees on FI schedules of token reinforcement are comparable in many ways to the results obtained with pigeons on extended chained FI schedules. A record from one chimpanzee appears in Fig. 11; the FI component following food delivery is characterized by low response rates or prolonged pausing. Average rates increase in each of the successive FI components, and scallops appear in many components. The similarity between these data, from chimpanzees reinforced with tokens, to the data from pigeons reinforced with colored lights, is striking. This resemblance strengthens the generality of the concept of conditioned reinforcement in the analysis of both phenomena.
3. VI in Each Component
Gollub directly compared chained FI schedules with chained VI schedules. For example, pigeons were exposed to five-component chains with VI 1 in each component after they had been exposed to five-component chains with FI 1 in each. Performances on five-component chains of VI 1 and FI 1 are shown in Fig. 12. More responding was maintained by the chained VI schedules (records A and B) than by the chained FI schedules (records C and D). The scallops that occurred within components on FI were replaced by fairly constant response rates on VI. Gollub mentions two factors that account for the differences between chained VI and FI schedules. First, the FI schedules in each component tend to generate pauses which are additive in delaying reinforcement, and these delays weaken the chain. On the other hand, the component VI schedules tend to maintain responding, and the chain is not likely to weaken.
Fig. 11. Cumulative response records showing performance of a chimpanzee on FI 5 schedules of token reinforcement with exchange ratios of three tokens (Record A), four tokens (Record B), and six tokens (Record C). Pips indicate token deliveries; arrows indicate exchanges of tokens for food. (From Kelleher, 1957c, courtesy of J. comp. physiol. Psychol.)

Second, because short intervals occasionally occur in each component VI schedule, even early stimuli are occasionally followed by food after only a short delay.
The short intervals on chained VI schedules may strengthen the conditioned reinforcing effects of the early stimuli more than long intervals weaken the effects.
4. Different Schedules in Each Component
Multiple Schedule of Token Reinforcement. Kelleher (1957b) exposed chimpanzees first to an FI 5 schedule of token reinforcement and then to a mult FI 5 FR 20 schedule of token reinforcement. The exchange ratio was gradually increased to 60 tokens. Under this multiple schedule, the animal had to complete 30 FI's and 30 FR's; the actual exchange intervals were 200 to 270 min. Both animals maintained substantial overall response rates (5.33 and 6.02 responses per min). Overall response rates increased as each session proceeded; as the animals accumulated tokens they responded faster.

Although overall response rates increased throughout each session, the chimpanzees responded differently in the different components of the multiple schedule. Early in each session, the FR performances were severely strained; FI response rates were low, but FI segments were never prolonged. In the last hour of each session, performances were similar to performances that would be expected on mult FI 5 FR 20 schedules of food reinforcement. Thus, tokens are conditioned reinforcers which can maintain appropriate performances on different schedules of reinforcement even when the schedules alternate between exchanges for food. The difference between performances early and late in each session suggests either that the number of tokens that have been delivered acts as a stimulus which depresses response rates early in each session or that tokens delivered in the last hour of the session have a greater conditioned reinforcing effect.
Kelleher's results with token reinforcement differ strikingly from the earlier work by Wolfe (1936). Wolfe used FR 1 schedules of token reinforcement and two different schedules of exchange. In Wolfe's "limited work" technique (1936, p. 28 ff.), for example, chimpanzees received a token for each response on a weighted lever, but could not exchange the token until after an interval. The exchange interval was increased until the animals failed to respond within 5 min on three trials. The duration of the intervals beyond which subjects failed to respond were 20 min for one subject, 1 hr for another subject, and more than 1 hr for two other subjects. Wolfe's results showed that the tokens were adequate conditioned reinforcers for maintaining performance on FR 1 despite exchange intervals of 20 to 60 min.

In Wolfe's "unlimited work" technique (1936, p. 33 ff.) the chimpanzees had 10-min sessions on an FR 1 schedule of token reinforcement. Tokens could be exchanged for food only at the end of the session. The greatest number of responses made in any session was 28 for one subject, and most responses occurred in the first few minutes of each session. Response rates tended to decrease over the 10 sessions on this procedure. In the last session the number of responses was 12 for one subject and three for the other. In a second experiment with the "unlimited work" technique, the animals were given 5, 15, or 30 tokens just before each session. These tokens, as well as those obtained in the session, could be exchanged only at the end of the session. When the chimpanzees were given tokens before a session, one subject responded infrequently and the other did not respond. Wolfe (1936) attributed the results to "token-satiation" (see also Miller, 1951).
Fig. 12. Record A shows cumulative response records from the 16th session on chain VI 1 VI 1 VI 1 VI 1 VI 1. Following the session shown in Record A, the schedule was changed to chain FI 1 FI 1 FI 1 FI 1 FI 1. Records B, C, and D show the first, fourth, and sixth sessions following the change. Pips indicate change from one schedule component to the next. Small dots above the record indicate food deliveries (from Gollub, 1958).
In summary, Wolfe found that chimpanzees which had tokens would stop working, and Kelleher found that under several schedules of token reinforcement chimpanzees responded at a high rate when they had tokens. Wolfe possibly anticipated a result opposite from the one he obtained, and at one point suggested: "With unlimited time in which to do a fixed amount of work one might expect on the basis of the goal gradient hypothesis that the work rate would be positively accelerated in that the more tokens the subject had secured the nearer he would be to his primary goal, food" (1936, p. 69).
In several of Kelleher's experiments the chimpanzees did have practically unlimited time in which to earn a specified number of tokens. However, in other experiments (Kelleher, 1957b) the animals could exchange their poker chips after exchange intervals of 35 to 120 min. In both types of experiment response rates increased throughout the interval. The effects of exchange interval and exchange ratio should be directly compared. Probably some special aspect of Wolfe's procedure caused the animals to stop responding when they had tokens. Wolfe indicates that one of his animals in the "unlimited work" situation played with tokens when he was not pulling the lever. Wolfe does not mention whether animals attempted to force tokens through the shutter that blocked them from exchanging tokens. Because the token-reinforced response in Wolfe's experiments required much energy, behavior involving the token might decrease response rates. Kelleher (1958, p. 288) noted that his animals were very inactive at the start of each session, but became extremely active when they had numerous tokens, and continually manipulated several tokens with one hand and rattled others in their mouths. All this activity was accompanied by high rates of responding.
Physical characteristics of the tokens may be important aside from their possible role as discriminative stimuli. The number of tokens in the animal's possession varies directly with the amount of food that will be forthcoming. Also, the informal observations of Wolfe (1936) and Kelleher (1956, 1957b) suggest that manipulation of the tokens with hands and mouth may be relevant.
Results with token reinforcement could be compared with results with other types of conditioned reinforcer. For example, it would be possible to use an "added counter" that would be comparable to the tokens. Responses on one key would be reinforced by increases in the length of a slit. When the slit reached a specified length or when a specified time interval had elapsed, responses on a second key would produce both food and a corresponding decrease in the length of the slit.
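A sketch of the proposed two-key arrangement (Python; the criterion of 10 increments and the bookkeeping are placeholder choices standing in for whatever parameters such an experiment would actually use):

```python
class SlitCounterAnalog:
    """Sketch of the proposed analog of token reinforcement: responses on
    key 1 lengthen a slit; when the slit reaches a criterion length,
    responses on key 2 produce food, each food delivery shortening the
    slit by one increment (the 'exchange')."""

    def __init__(self, criterion: int = 10):
        self.criterion = criterion
        self.slit = 0            # slit length, in increments earned
        self.exchanging = False  # True once the criterion has been met

    def key1(self) -> None:
        """Earning response: adds one increment to the slit."""
        self.slit += 1
        if self.slit >= self.criterion:
            self.exchanging = True

    def key2(self):
        """Exchange response: produces food and shortens the slit."""
        if self.exchanging and self.slit > 0:
            self.slit -= 1
            if self.slit == 0:
                self.exchanging = False  # the criterion must be re-earned
            return "food"
        return None

box = SlitCounterAnalog()
for _ in range(10):
    box.key1()
print([box.key2() for _ in range(3)])  # ['food', 'food', 'food']
```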
Such
experiments
could
determine
whether
the
results
obtained
in
the
token
reinforcement
studies
depend
upon
the
species
used
or
upon
other
variables
such
as
the
handling
of
tokens
during
ex-
change
delays.
Token reinforcement studies differ from other chained schedules in several other ways. In Kelleher's studies, for example, a token (or tokens) plus a red light was the stimulus complex which indicated that a particular response (inserting the token in a slot) would be reinforced by food. The presence of the token (or tokens) without the red light was not adequate, and the animals did not insert tokens until the red light appeared. On the other hand, if no tokens had been obtained, the food-reinforced response could not be emitted when the red light appeared. The token reinforcement studies have all used FR 1 schedules of food-delivery; in the presence of the appropriate stimulus every token-insertion produces food. Other schedules of food-delivery should be studied.
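A sketch of the stimulus-complex contingency just described, under our own simplifying assumptions (a hypothetical ratio of 10 responses per token, and FR 1 food-delivery for each insertion): token insertion is reinforced only when both a token and the red light are present.

```python
# Illustrative sketch of the token contingency described above; the
# ratio value and session length are hypothetical.
TOKEN_RATIO = 10   # responses per token (assumed)

def token_session(presses, red_light_on):
    tokens, food, count = 0, 0, 0
    for _ in range(presses):
        count += 1
        if count == TOKEN_RATIO:
            tokens, count = tokens + 1, 0
    # Token insertion (FR 1 food schedule) is effective only in the
    # presence of the full stimulus complex: token(s) AND red light.
    while red_light_on and tokens > 0:
        tokens -= 1
        food += 1          # every insertion produces food
    return tokens, food

print(token_session(100, red_light_on=True))   # -> (0, 10)
print(token_session(100, red_light_on=False))  # -> (10, 0)
```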
D. CONDITIONED REINFORCEMENT IN COMPLEX SCHEDULES

1. Observing Response Experiments
In his analysis of discrimination learning, Wyckoff (1952) distinguished between responses that produce food and responses that produce discriminative stimuli. Wyckoff referred to responses of the latter type as observing responses. Although most analyses have dealt with observing responses theoretically (e.g., Spence, 1940), Wyckoff (1952) developed an ingenious technique for explicitly studying observing responses in pigeons. Periods in which key-pecking was reinforced by food on FI 0.5 alternated unpredictably with extinction (ext) periods. The response key remained white throughout both FI and ext periods unless the pigeon pressed a pedal. While the pedal was depressed, the key was either red or green; red was correlated with FI periods, green with ext periods. The durations of the pedal-pressing responses increased as the two colors developed control of pecking, but decreased when the correlation of stimuli and schedules was reversed or when the stimuli were not correlated with FI and ext periods.
An analysis of this complex procedure suggests that it is a type of chained schedule. If the bird does not press the pedal, the schedule is mix FI 0.5 ext.9 If the bird presses the pedal, the schedule changes to a mult FI 0.5 ext. The bird's behavior determines whether it is exposed to the mixed or the multiple schedule of reinforcement; the pattern of observing responses should then indicate the conditioned reinforcing effectiveness of the stimuli associated with the multiple schedule. Using similar procedures, Kelleher (1958a) studied observing responses in chimpanzees.
9 A mixed (mix) schedule is one in which the same exteroceptive stimulus is associated with different schedules of reinforcement.
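The chained-schedule analysis of Wyckoff's procedure reduces to one rule: the pedal converts a mixed schedule into a multiple schedule by making the component stimulus visible. A minimal sketch, with the component sequence randomized here purely for illustration:

```python
import random

# Sketch of Wyckoff's observing procedure as a mix/mult conversion.
# Component identities are programmed in advance, independently of
# the bird; only the key display depends on the pedal.
def key_color(component, pedal_down):
    if not pedal_down:
        return "white"                                  # mix FI 0.5 ext
    return "red" if component == "FI" else "green"      # mult FI 0.5 ext

schedule = [random.choice(("FI", "ext")) for _ in range(6)]
for comp in schedule:
    print(comp, key_color(comp, pedal_down=False), key_color(comp, pedal_down=True))
```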
Periods in which pressing one lever (the food response) produced food on VR 100 alternated unsystematically in time with periods of ext. A stimulus window remained dark throughout both types of period unless the chimpanzee pressed a second lever. A response on the second lever (the observing response) produced either a red or blue stimulus for 30 sec. If the schedule changed during this time, the stimulus changed accordingly. The red light was correlated with VR; the blue with ext.
In some of Kelleher's experiments, observing responses produced discriminative stimuli on FR schedules. Performances in which observing responses produced the discriminative stimuli on FR 30 and FR 60 are shown in Fig. 13. Kelleher also found that observing responses did not occur when they failed to produce the discriminative stimuli or when the stimuli were not correlated with VR and ext periods.
A similar analysis is applicable to these experiments. A chimpanzee could remain on mix VR ext or respond on the observing lever and produce 30 sec of mult VR ext. Since observing responses are maintained, we conclude that the appearance of the stimuli correlated with the multiple schedule are conditioned reinforcers. It is important to note that observing responses do not alter the programmed frequency of reinforcement. The VR 100 and ext schedules alternate in time in a prearranged pattern independently of observing responses.

Fig. 13. Representative observing response records of a chimpanzee when observing responses produced the discriminative stimulus on FR 30 and FR 60 schedules. The records did not run during 30-second periods in which the red or blue stimulus was on (from Kelleher, 1958a).
Of course, observing responses produce either the blue (ext) stimulus, indicating a zero probability of reinforcement, or the red (VR 100) stimulus, indicating a 0.01 probability of reinforcement. If no observing responses occur, the probability of reinforcement is intermediate between zero and 0.01. The results suggest that the presentation of the two discriminative stimuli, one correlated with zero probability of reinforcement, the other with 0.01 probability, has a conditioned reinforcing effect. Since the observing response is maintained by the production of this pair of stimuli in place of a single stimulus correlated with an average frequency of food reinforcement equal to that of the pair, it might be concluded that the combined conditioned reinforcing effectiveness of the pair is greater than that of the single stimulus.
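The arithmetic of the preceding paragraph can be made explicit. Assuming, for illustration only, that the VR and ext components occupy equal time, the unobserved ("mixed") probability of reinforcement per response is the average of the two component probabilities:

```python
# Reinforcement probability per response in each component.
p_red, p_blue = 1 / 100, 0.0     # VR 100 and ext

# If the two components occupy equal time (an assumption made here
# for illustration), a response emitted without observing has an
# intermediate probability of reinforcement:
p_mixed = 0.5 * p_red + 0.5 * p_blue
print(p_mixed)   # 0.005, between 0 and 0.01

# Observing does not change this programmed average; it only tells
# the subject which component probability is currently in force.
```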
Prokasy (1956) found a similar result with rats trained in a T-maze. The rats were given food on half of the trials at random in either goal box after a forced delay of 30 sec in a chamber in the arm of the maze. In the "consistent" arm of the maze the delay chamber was one color (black or white) on reinforced trials and the other color on unreinforced trials. In the "inconsistent" arm the color of the delay chamber was not correlated with reinforcement. It was found that the rats turned more frequently into the consistent arm. Wyckoff (1959) interprets this type of result as indicating that the function relating conditioned reinforcing effectiveness and reinforcement frequency must be positively accelerated over some of its range.
2. Switching Experiments
Two mutually exclusive responses can be maintained by two independent schedules of food reinforcement programmed concurrently. On such a concurrent schedule the subject can work on one schedule to the complete or partial exclusion of the other, or it can alternate (switch) back and forth between the two schedules. Concurrent schedules involving two responses are not usually considered to be chains because switching behavior, like observing behavior, is usually inferred.
Findley (1958) made switching behavior explicit by requiring a specific switching response. The frequency of the switching response could be recorded independently of responses that were reinforced by food.
Findley (1958) trained pigeons to peck on each of two response keys. On one key, the food key, red and green stimuli appeared alternately. The colors alternated whenever a response occurred on the second key, the switching key. For example, the red and green stimuli were correlated with VI 6 and VI 2 schedules, respectively; one schedule and its correlated stimulus remained in effect until a switching response occurred. Technically, this is mult VI 6 VI 2 in which the bird controls when the components alternate.
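Findley's basic procedure thus amounts to a multiple VI VI schedule whose component alternation is response-produced. A minimal sketch under the VI values cited above; approximating a VI schedule as a constant probability per tick is a standard simplification, not Findley's actual programmer.

```python
import random

# Sketch of Findley's switching procedure: mult VI 6 VI 2 in which a
# switching-key response alternates the components. VI values in min.
VI = {"red": 6.0, "green": 2.0}

def step(color, switch_pressed, dt=0.1):
    """One dt-min tick; returns (current color, reinforced?)."""
    if switch_pressed:
        color = "green" if color == "red" else "red"
    # Constant-probability approximation of a VI schedule (assumed).
    reinforced = random.random() < dt / VI[color]
    return color, reinforced

color, food = "red", 0
for t in range(600):                       # one 60-min session
    color, r = step(color, switch_pressed=(t % 50 == 0))
    food += r
print(food)
```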
The inter-reinforcement intervals for each VI schedule were timed continuously until a reinforcement was available; however, reinforcements were delivered only when the appropriate schedule (and its correlated stimulus) was in effect. For two pigeons VI 6 was programmed in the red stimulus, while the other schedule was systematically varied from VI 2 to VI 20 in the green stimulus. Both schedules for another pigeon were varied simultaneously: in green, from VI 2 to VI 20, and in red, from VI 20 to VI 2. The results were that the percentage of time in green decreased as the frequency of reinforcement in green decreased. When identical VI schedules occurred in red and green, the percentage of time in each color was about 50%.
If the switching response is maintained by the conditioned reinforcing effects of the stimulus that it produces, Findley's results suggest that the conditioned reinforcing effect of a stimulus is directly related to the frequency of food-delivery correlated with it. Unfortunately, the conditioned reinforcing effects in Findley's study are confounded with other events because both VI schedules were timed continuously. If a reinforcement became available on the VI associated with the absent color, the first food response which occurred after a switching response would be reinforced. This contingency would establish a simple chain, composed of a switching response followed by a food response, and this chain would be frequently reinforced (cf. Kelleher, 1958).
Findley tested this possibility by timing the inter-reinforcement intervals only in the VI schedule that was in effect. Under this procedure, switching rates declined markedly, and even when the VI schedules were equal in red and green, the bird tended to remain in one color. The results indicate that the "accidentally" established response chain maintained the switching behavior in these parametric experiments.
Findley (1958) also investigated switching behavior under FI and FR schedules of reinforcement on the food key. Following each reinforcement in the continued presence of one color, the frequency or probability of reinforcement in that color was decreased by increasing the parameter value of the schedule by a fixed amount. These schedules are referred to as progressive interval and progressive ratio schedules because of this progressive change. For example, in a green stimulus the schedule was FR 101 for the first reinforcement, FR 201 for the second, FR 301 for the third, and so on. Each response on the switching key changed the color on the food reinforcement key, and reset the food reinforcement schedule to its minimal value. Accumulated responses on FR, or elapsed time on FI, which occurred before a switch, were erased by a switching response.
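The progressive schedule with reset can be summarized as a small state machine. The sketch below uses the FR values cited in the text (FR 101, increments of 100) and resets on every switch, as described; everything else is a simplification.

```python
# Sketch of Findley's progressive-ratio contingency with reset.
BASE, INCREMENT = 101, 100   # FR 101, 201, 301, ... (from the text)

class ProgressiveRatio:
    def __init__(self):
        self.requirement = BASE
        self.count = 0

    def food_response(self):
        self.count += 1
        if self.count >= self.requirement:
            self.count = 0
            self.requirement += INCREMENT   # schedule grows leaner
            return True                     # reinforcement
        return False

    def switch(self):
        # A switching response changes the color, erases accumulated
        # responses, and resets the schedule to its minimal value.
        self.count = 0
        self.requirement = BASE

pr = ProgressiveRatio()
print(sum(pr.food_response() for _ in range(1000)))  # 3 reinforcements
pr.switch()
print(pr.requirement)                                # back to 101
```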
Considerable training was needed to establish responding under these conditions, and the development of responding varied widely, depending upon the experimental history of the pigeon.
After establishing stable behavior, Findley investigated the effects of several variables: the amount of schedule parameter increment with each reinforcement; the number of switching responses required to change colors; different schedule increments in each of the stimuli on the food reinforcement key; a different number of responses required to switch from red to green from that required to switch from green to red; and different types of schedule, FI and FR, in each of the stimuli.
Briefly, Findley found that: (1) The larger the increment in schedule parameter per reinforcement, the higher the rate of switching (changing the color and resetting the programmer to its minimum value). The switching rate was an increasing monotonic function of the constant with which successive inter-reinforcement intervals in FI were increased, and the pauses that followed reinforcement grew longer under the FI schedules with a 2- or 4-min increment. (2) The main effect of requiring a large number of switching responses to change colors (when the food reinforcement schedule was a progressive FR) was that the pigeon worked for longer periods of time in each color before switching. The behavior on the food key was not disturbed by increasing the response requirement on the switching key. (3) If the increments were greater in the presence of one color than in the other, the pigeon worked for longer periods of time in the color associated with the smaller increments. (4) Pigeons worked for longer periods of time in the stimulus in which the switching requirement was greater under progressive schedules. For example, one pigeon was under the progressive FI schedule with the values of 0.75, 1.5, 3, 6, etc., and 30 responses were required to switch from red to green, while one response was required to switch from green to red. The bird spent about 90% of the time in red, the color in which the switching requirement was greater. (5) No differential switching effects were found between progressive FR in one color and progressive FI in the other color until more than 2000 reinforcements had been received. Under the initial set of values selected (FR 101 with increments of 100 and FI 0.75 with increments of 0.75 min), more time was spent in the color correlated with FI. When nine switching responses were required to switch from FR to FI and only one switching response to change from FI to FR, more time was spent in the FR color.
In what way do conditioned reinforcement and chained schedules help account for these results? Findley suggests that the switching response is maintained because the appearance of a new color is a conditioned reinforcer. In these experiments a response on the switching key cannot be followed closely by the delivery of food. The change in key color is, however, correlated with a more optimal schedule of food reinforcement (a smaller response requirement on FR or more frequent interval reinforcement). This is shown by the result that when the schedule did not change after reinforcement in either color, the switching rate fell to near-zero levels.
Probably it was difficult to establish stable behavior under the progressive schedules because it was difficult to establish discriminative control between the different schedules of food reinforcement before and after a switching response. The relevant discriminative stimulus was not one or the other color, but change in color. In many of these experiments the same less optimal schedules were programmed in the presence of each color. It is probably difficult to establish a discrimination when the stimulus for the discrimination is "change from red to green" and "change from green to red." If the occasion of and reinforcement for switching are made clearer, switching should be more easily maintained.
In Findley's final experiment, a "time-out" followed each food reinforcement, but the "time-out" could be terminated by a switching response. This procedure resulted in maximum switching rates. The "time-out" had become the discriminative stimulus for switching responses, and the appearance of the new stimulus was the conditioned reinforcer that maintained switching responses during the "time-out."
These experiments imply, then, that in switching and schedule preference, conditions under which the organism is "free" to choose between different activities, switching "is, in effect, only a complex sequence where behavior associated with each [stimulus condition] is chained in a not-so-obvious fashion to an intermediate operant." (Findley, 1958, p. 143.)
3. Implicit Chains
Skinner (1948) has demonstrated that when a reinforcing stimulus (food) is regularly presented at brief intervals independently of an animal's behavior, some identifiable response will be conditioned. This conditioning will occur because of an interacting set of events in this situation. The reinforcing stimulus will increase the frequency of any response that precedes it, and this response will then be more likely to occur just before a subsequent reinforcement.
In his discussion of these results Skinner notes: "The results throw some light on incidental behavior observed in experiments in which a discriminative stimulus is frequently presented. Such a stimulus has reinforcing value and can set up superstitious behavior" (Skinner, 1948).
The development of superstitious behavior by conditioned reinforcers occurs frequently in studies of other phenomena. For example, on schedules of food reinforcement one may wish to minimize the after-effects of eating by scheduling a brief time-out following food delivery. If the animal responds occasionally during the time-out period, a response may occur just before the appearance of the discriminative stimulus for the schedule of food reinforcement. If this discriminative stimulus is a conditioned reinforcer, it will increase the frequency of responding during the time-out. This increased response frequency will increase the probability that another response will occur near the end of the time-out period just before the discriminative stimulus appears. In this fashion, substantial rates of responding during time-out periods can be developed and maintained by conditioned reinforcement.
Morse (1955) investigated superstitious chains developed by conditioned reinforcement. Pigeons were trained on either mult VI 3 (green) FR 25 (red) or mult FR 25 (green) FR 25 (red). When performances were stable, both schedules were changed to mult ext (green) FR 25 (red). The ext component was presented for a fixed time; i.e., the appearance of the red stimulus (correlated with FR 25) occurred independently of responses in the green stimulus (correlated with ext).
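Morse's contingency can be sketched as a fixed-time ext component: the red (FR 25) stimulus appears after a fixed time regardless of green-key responding, so any response that happens to precede the transition is adventitiously paired with a presumptive conditioned reinforcer. The component duration and residual response rate below are our own assumptions, chosen only to illustrate the structure.

```python
import random

# Sketch of the mult ext (green) FR 25 (red) procedure. The ext
# component lasts a fixed time (duration assumed), so the red
# stimulus appears independently of green-key responses.
EXT_DURATION = 180          # sec, assumed
p_peck_per_sec = 0.05       # stand-in for residual green responding

def ext_component():
    last_peck = None
    for t in range(EXT_DURATION):
        if random.random() < p_peck_per_sec:
            last_peck = t
    # If a peck fell just before the transition, it was adventitiously
    # followed by the red stimulus, a presumptive conditioned reinforcer.
    return last_peck is not None and EXT_DURATION - last_peck <= 2

pairings = sum(ext_component() for _ in range(100))
print(pairings, "of 100 components ended with an adventitious pairing")
```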
If the red stimulus, which is presumably a conditioned reinforcer, happens to appear just after responses in the green stimulus, it might increase the frequency of responding in the green stimulus. During the first session of mult ext FR 25, response rates in the green stimulus decreased rapidly for the birds that had been trained on mult FR 25 FR 25.
On the other hand, response rates in the green stimulus decreased slowly for the birds trained on mult VI 3 FR 25. Thus, adventitious correlations of responses in the green stimulus with the appearance of the red stimulus had a higher probability with the birds that had been reinforced on VI 3. Morse's results showed that the pigeons trained on VI 3 developed "FI-like" performances in the green stimulus; the pigeons trained on FR 25 responded infrequently in the green stimulus, and the temporal pattern of responding was S-shaped.
If responding in the green stimulus had been maintained by adventitious correlations with the appearance of S1, this responding could be extinguished by eliminating the adventitious correlations. Morse prevented these correlations by providing a 2-min time-out (TO) between the green stimulus and the red stimulus. He found that response rates in the green stimulus were consistently lower when the green stimulus was followed by TO than when it was followed immediately by the red stimulus.
In other experiments, Morse (1955) reinforced pigeons on mult FI 15 FR 55, and then on mult ext FR 55. The ext stimulus (S2), which was presented for 15-min periods, controlled substantial rates of responding. These response rates were maintained by the appearance of the stimulus (S1) correlated with FR 55 even though S1 was presented independently of responses in S2. Morse's experiments demonstrate that conditioned reinforcers, like primary reinforcers, can develop and maintain substantial amounts of behavior when their presentation occurs in close temporal contiguity with any response.
E. CONDITIONED REINFORCEMENT AND HIGHER-ORDER CONDITIONING
The distinction between respondent and operant chains (Skinner, 1938) has received little attention in recent analyses of conditioned reinforcement. This distinction, which enables avoidance of some misconceptions about conditioned reinforcers, is useful for categorizing procedures that establish conditioned reinforcers.
In a respondent chain, a stimulus is established as a conditioned (eliciting) stimulus by pairing it with an unconditioned (eliciting) stimulus. It is then paired with another (new) stimulus in an attempt to extend the chain by developing another conditioned stimulus. This procedure is known as higher-order conditioning (cf. Pavlov, 1927).
Many reinforcement theorists (e.g., Hull, 1943; Miller, 1951; Spence, 1951; and Osgood, 1953) apparently accept higher-order conditioning as a model for all conditioned reinforcement. These theorists cite Frolov's investigation of higher-order conditioning (described by Pavlov, 1927) as providing the empirical basis for the concept of conditioned reinforcement.
Frolov conditioned a dog to salivate at the sound of a metronome. Subsequently he paired a black square ("neutral" stimulus) with the sound of the metronome (conditioned stimulus for salivation). On each trial the black square was presented to the dog for 10 sec, then removed. The metronome was presented 15 sec later and was sounded for 30 sec. After several presentations, in which no food was given to the dog, the black square alone elicited salivation. These results were interpreted as showing that the sound of the metronome (a conditioned stimulus) had itself become a reinforcer for the salivary response.
Three aspects of the Frolov experiment deserve emphasis. First, as Pavlov noted, the temporal relationship between the presentation of the square and the metronome is critical. Unless the black square is withdrawn 15 sec before the sound of the metronome, conditioning cannot be accomplished. Secondly, the strength of conditioned salivation to the black square was low, as indicated by the small magnitude of salivation. Finally, the sound of the metronome alone was consistently followed by the presentation of food, but when the black square preceded the sound of the metronome food was not presented. As Skinner has noted (1938), this procedure would be used to establish a "conditional" discrimination in which the black square would inhibit salivation.
In view of these qualifications, it is surprising that the Frolov experiment has been so frequently accepted as the principal experimental demonstration of conditioned reinforcement. In addition, since a complete description of Frolov's experiment is not available, we do not know whether appropriate controls were established for phenomena such as sensitization. This is especially important in view of the peculiar temporal parameters which Pavlov cites as necessary for the result. Indeed, the temporal arrangement of the stimuli is quite dissimilar from the optimal arrangement for producing conditioned responses.
Razran (1955) has presented a brief review of experiments on higher-order conditioning. The results of these experiments are generally consistent with what might be expected from the points discussed in the previous paragraph.
Razran notes that: "... as directly observed in the laboratory, second-order conditioning is a rather uncommon and unsuitable phenomenon. Of the 347 Russian reports on classical conditioning of dogs and monkeys that the writer has managed to see, only eight offer accounts of any degree of successful second-order conditioning. Much more commonly, in 63 separate investigations, non-reinforced pairings of conditioned and non-conditioned stimuli are reported as resulting in a conditioned inhibition; that is, second-order to-be-conditioned stimuli, instead of evoking a CR, come to interfere with it when they are applied simultaneously or shortly before the first-order conditioned stimuli" (Razran, 1955, p. 327).
In a subsequent review, however, Razran (1960) cites several recent experiments in which higher-order conditioning was clearly demonstrated with interoceptive stimuli. All of these recent experiments used negative primary reinforcers to establish the conditioned reinforcer. The status of higher-order conditioning established with positive primary reinforcers remains in doubt. We conclude that it can be accomplished only under certain ideal conditions (cf. Pavlov, 1927). Further experiments should be conducted to determine the generality of this phenomenon.
Despite the difficulty of establishing higher-order conditioning with respondent conditioning procedures, similar phenomena have been established with operant procedures. That is, in chained schedules with three or more components at least one stimulus becomes a conditioned reinforcer without any direct pairing with food. For example, in a three-component chained schedule, S1 is a conditioned reinforcer and will reinforce responses in S2 because of its pairing with food; S2 is a conditioned reinforcer for responses in S3, but has never been paired directly with food, only with S1. In the terms introduced by Pavlov, S2 is a second-order conditioned stimulus (reinforcer).
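The higher-order relation in a three-component chain can be shown schematically. The sketch below assumes FR 1 components purely for brevity; the point is only the pairing structure: S3 leads to S2, S2 to S1, and S1 to food, so S2 is paired with S1 but never directly with food.

```python
# Sketch of a three-component chained schedule. Components run in the
# order S3, S2, S1; each response (FR 1 here, for brevity) produces the
# next stimulus, and only the terminal component produces food.
CHAIN = ["S3", "S2", "S1"]

def run_chain():
    pairings = []
    for i, stimulus in enumerate(CHAIN):
        outcome = CHAIN[i + 1] if i + 1 < len(CHAIN) else "food"
        pairings.append((stimulus, outcome))
    return pairings

print(run_chain())
# [('S3', 'S2'), ('S2', 'S1'), ('S1', 'food')]
# S1 is paired with food; S2 only with S1; yet S2 reinforces
# responding in S3.
```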
Since there have been many experiments in which one or more conditioned reinforcers have been established in this way, it must be asked why a higher-order conditioning procedure is more successful in operant than in respondent behavior. First, response topographies in the two situations have been different. Key-pecking and lever-pressing have been studied with operant procedures, and salivation with respondent; these responses differ in many dimensions which may be relevant for the effects of reinforcement, e.g., difficulty or amount of work, and type of neural innervation. Second, the conditioning procedures differ. In the respondent case the reinforcer is contingent on a stimulus; in the operant, on a response in the presence of a stimulus. Third, in the operant situation with chained schedules, responding is maintained because each chain terminates with primary reinforcement. The respondent procedure has been conducted under primary extinction (no food). Possibly, a respondent chain which always terminates with the presentation of the US would establish higher-order conditioned stimuli. Since this procedure has not been used, the possibility of its success is conjectural.
How does S2 become a conditioned reinforcer? Minimally, it must be paired with S1, an established conditioned reinforcer. Also, as there is firm evidence for higher-order conditioning only with operant chains, the pairing of S2 and S1 must be contingent on a response. These minimal criteria for establishing higher-order conditioned reinforcers resemble the respondent case, and suggest how the phenomenon may be investigated there.
Attempts to explain how any conditioned reinforcer is established will be discussed in the following section. They do not, however, cast any light on this problem.
In summary, extended chain schedules establish stimuli as conditioned reinforcers through pairing with other conditioned reinforcers, rather than with a primary reinforcer. These conditioned reinforcers resemble higher-order conditioned stimuli in respondent conditioning. But higher-order conditioning is not a firmly established principle in respondent conditioning, especially with positive reinforcers. More research is needed in respondent conditioning before either the similarities or differences in these phenomena can be analyzed.
III. CONDITIONED REINFORCEMENT AND EXPERIMENTAL EXTINCTION
Many investigators have assessed the effects of conditioned reinforcers only after the primary reinforcer has been excluded from the experiment. We shall refer to this class of techniques as the extinction procedure. The popularity of this procedure presumably comes from the assumption that a test for the reinforcing effects of a stimulus should be conducted only when known reinforcers have been excluded from the test situation. Thus, the extinction procedure eliminates one source of confounding effects, but at the cost of introducing others.
Previous reviews have dealt almost exclusively with experiments that used the extinction procedure (Miller, 1951; Myers, 1958). In the present review, we describe the extinction procedure and present a critical analysis of recent experimental results. Some recent experiments have been performed to show that factors other than conditioned reinforcement could account for results with the extinction procedure. These will be discussed in Part A of this section. The extinction procedure has also been used extensively in attempts to determine the necessary and sufficient conditions for establishing a conditioned reinforcer. These will be discussed in Part B. The reader who is interested in other analyses of experiments with the extinction procedure should refer to the earlier reviews.
As noted previously, magazine training and lever training are accomplished by establishing a chain of responses. One response in this chain, approaching the food receptacle, is controlled by the stimulus associated with food-delivery. In the extinction procedure possible conditioned reinforcing effects of this stimulus are assessed by breaking the chain. Investigations with the extinction procedure have used two general techniques, the established response technique and the new response technique. In both techniques the chain is broken by omitting its final member (food delivery).
In the established response technique the conditioned reinforcing effects of a stimulus are assessed by comparing the effects of breaking an established chain at each of two different points. For example, Bugelski (1938) trained rats to press a lever for food; each food-delivery was immediately preceded by a distinct click. The rats were then divided into two groups for extinction with respect to food; responses by the control group had no effects; responses by the experimental group produced the click. The results were that the experimental group responded more than the control group. Bugelski concluded that the enhanced resistance to extinction showed that the click was a "sub-goal" (conditioned reinforcer).
Although the maintenance of responding under these conditions is usually taken as evidence for conditioned reinforcement, alternative interpretations have been proposed to show that the click may have effects other than those of reinforcing the response.
The new response technique assesses the conditioned reinforcing effects of a stimulus by adding a new first component to the chain at the same time the final component (food-delivery) is omitted. For example, in an early experiment by Skinner (1938), a distinct click that occurred when the food-magazine operated became a stimulus for approaching the food receptacle. Next, a lever was introduced, and the click occurred whenever the rat pressed the lever; the click, however, was not followed by food-delivery. Skinner found that lever-pressing was established by the click, and he concluded that the click was a conditioned reinforcer which could be used to condition a new response.
Although these examples had FR 1 in each component of the established chain, this restriction is not necessary; intermittent reinforcement can be introduced at any point in the chain during training or during extinction. Many of the experiments discussed in this section of the review have introduced intermittency into the chained schedule at one or more points.
A. HYPOTHESES THAT CONDITIONED REINFORCEMENT IS NOT NECESSARY FOR EXPLAINING THESE RESULTS
With the extinction procedure, the effectiveness of the conditioned reinforcing stimulus is being extinguished while it is being evaluated. Experiments using the established response procedure have consistently shown that the click enhances resistance to extinction; however, the effect is usually small. Several experimenters have suggested that the concept of conditioned reinforcement is not relevant to these results. Three alternative hypotheses have been offered.
1. Facilitation Hypothesis
Wyckoff, Sidowski and Chambliss (1958) suggest that investigators have failed to distinguish between the effects of the click as a conditioned reinforcer and the effects of the click as a discriminative stimulus. The discriminative effects of a stimulus determine the rates and patterns of responding that prevail in the presence of the stimulus; the conditioned reinforcing effects of a stimulus determine the rates and patterns of responding preceding the stimulus (Dinsmoor, 1950; Schoenfeld, Antonitis, and Bersh, 1950).
Wyckoff et al. (1958) attempted to isolate the possible conditioned reinforcing effects of a stimulus associated with a primary reinforcer from its discriminative effects. One of their experiments used a variation of the established response procedure. Water-deprived rats pressed a lever, thereby producing a dipper of water. No distinct stimulus was associated with the delivery of water during this training. Next, the lever was removed and the rats were trained to approach and lick the dipper at the sound of a buzzer. The dipper was dry, but when the animal licked it, the dipper operated and presented a drop of water; the sound of the buzzer terminated when the dipper was touched. Licking the dipper in the presence of the buzzer was reinforced on VI 16 sec by the operation of the dipper. Wyckoff et al. believed that introducing intermittency at this point by means of the VI schedule would ensure that the licking response would be resistant to extinction.
After all animals had been trained to lick the dipper in the presence of the buzzer, the lever was again introduced and experimental extinction began. For the experimental group, every lever-press produced the sound of the buzzer. For the control group, the buzzer did not sound unless the animal had not pressed the lever for 10 sec. Control and experimental animals received the same number of buzzer presentations. The groups did not differ in the number of lever presses that occurred during a 50-min extinction session.
In this experiment, as well as another one in which the new response procedure was used, Wyckoff et al. concluded that there were no conditioned reinforcing effects when the discriminative stimulus effects of the buzzer were controlled.
This experiment differs from the usual experiment in three ways. First, as the authors note, "Licking served as an operant which produced reinforcement as well as being the consummatory response after reinforcement was delivered." Second, as Crowder, Morris and McDaniel (1959) have noted, licking was reinforced on a 16-sec variable-interval schedule; thus, intervals between buzzer onset and water presentation were not optimal for establishing a conditioned reinforcer (cf. Bersh, 1951). Third, there are indications that buzzer termination was a more effective conditioned reinforcer than buzzer onset. When the dry dipper was licked, the buzzer terminated, but an unspecified period of time elapsed before the dipper appeared again with water. Indeed, Wyckoff et al. noted that during training "some subjects had apparently learned to wait for buzzer termination before touching the dipper." The conditions which prevailed at reinforcement have not been clearly specified. In fact, the sound of the buzzer was absent at the time of water-delivery. Thus, the stimulus which was probably the most potent conditioned reinforcer, buzzer termination, was not the one that was made contingent upon lever pressing.
Crowder et al. (1959) compared the reinforcing and discriminative effects of a buzzer-light stimulus paired with food by using a yoked-box control with the established response procedure. First, rats were trained to press a lever for food. During experimental extinction, pairs of rats were tested in separate but connected experimental chambers. When the experimental rat pressed its lever the buzzer-light stimulus was presented to both the experimental rat and the paired control rat. When the control rat pressed its lever, the buzzer-light stimulus was not presented. During a 50-min extinction session, the experimental rats responded almost twice as often as control rats.
These results, which are incompatible with the interpretations of Wyckoff et al. (1958), as well as those of Bugelski (1956) (see below), demonstrate the conditioned reinforcing effect of the stimulus associated with food-delivery even when the eliciting and the discriminative effects of this stimulus have been controlled.
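The logic of the yoked-box control is that stimulus exposure is equated between the two subjects while the response contingency differs. A minimal sketch, with equal response probabilities standing in for the two rats; only the contingency structure, not any published data, is represented.

```python
import random

# Sketch of the yoked-box control used by Crowder et al. (1959).
# Both rats receive the stimulus whenever the EXPERIMENTAL rat
# presses; only for the experimental rat is it response-produced.
def yoked_session(seconds=3000, p_press=0.02):
    exp_presses = ctl_presses = stimuli = 0
    for _ in range(seconds):
        if random.random() < p_press:       # experimental rat presses
            exp_presses += 1
            stimuli += 1                    # delivered to BOTH chambers
        if random.random() < p_press:       # control rat presses
            ctl_presses += 1                # ...but produces nothing
    return exp_presses, ctl_presses, stimuli

print(yoked_session())
```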
An experiment by Skinner (1938, pp. 106-108), conducted to show the effects of delayed conditioned reinforcement, employed an interesting variation of the established response procedure. Rats that had been trained on an FI schedule of reinforcement were studied in experimental extinction. During the initial phase of extinction, responses did not produce the click. When response rates were low, the click was presented only when the rat had not responded for a brief period of time (4 or 6 sec). This introduction of the click produced a temporary increase in response rates. When response rates became low again, the delay was removed, and each response produced the click. In the majority of the rats, the removal of the delay produced an immediate increase in response rates, and a third extinction curve.
These results are opposed to both the elicitation hypothesis (see below) and the facilitation hypothesis. In other rats, however, the third extinction curve was not obtained. Possibly the conditioned reinforcing effect of the click extinguished too rapidly under the conditions of this experiment.
In one recent experiment with the extinction procedure, Kelleher (1961) employed a technique similar to the one used by Skinner. Each of the two pigeons in Kelleher's experiment served as its own control. The birds were trained on an FI 5 schedule. Each food delivery was immediately preceded by a distinct click. When performance had been established on FI 5, experimental extinction was started. During extinction, patterns and rates of responding were manipulated by presenting the click on different reinforcement schedules.
Under one schedule, the click occurred only if the pigeons had not responded for a brief period of time (5 or 10 sec). On this schedule response rates were low. When the click was produced on an FR schedule, response rates were high. When the click was produced on an FI 5 schedule, there was an initial pause followed by an increasing rate in each interval. These three schedules for presenting the click were alternated during extinction, and interactions between the schedules were observed.
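Kelleher's manipulation amounts to scheduling the same stimulus three different ways. The sketch below states each contingency as a predicate; the spaced-responding and FI values are from the text, while the FR size is an assumption of ours.

```python
# Sketch of the three ways the click was scheduled in extinction
# (Kelleher, 1961). FR size is assumed; pause and interval values
# are from the text.
def click_due(schedule, sec_since_response, responses, sec_in_interval):
    if schedule == "spaced":   # click only after 10 sec of no responding
        return sec_since_response >= 10
    if schedule == "FR":       # click after a fixed number of responses
        return responses >= 30          # FR 30 assumed for illustration
    if schedule == "FI":       # click for a response after 5 min elapse
        return sec_in_interval >= 300 and responses >= 1
    raise ValueError(schedule)

print(click_due("spaced", 12, 0, 0))   # True: 12 sec without a response
# The same click, scheduled differently, sustains a low rate (spaced),
# a high rate (FR), or an FI-like pause-and-run pattern.
```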
When the schedule was changed from FR to FI, for example, the characteristic initial FI pause did not occur at once; rather, responding occurred at the high rates that had been characteristic on the FR schedule. Thus, depending on the schedule, either a very low or a very high rate followed the presentation of the click. Because the same stimulus (the click) was used to generate each response pattern, these results also oppose both the elicitation hypothesis and the facilitation hypothesis.
The facilitation hypothesis has also been applied to results obtained with the new response technique. Wyckoff (1959, p. 69) notes: "The use of this paradigm in a Skinner box situation has typically yielded either no secondary reinforcing effects at all or effects which just barely reach statistical significance (Bersh, 1951; Estes, 1949; Schoenfeld et al., 1950; Wyckoff et al., 1958)."
In brief, Wyckoff suggests that positive results are obtained only when the click causes increased activity in the vicinity of the lever and increases the probability that the animal will press the lever.
Using the new response procedure, Wyckoff et al. (1958) were not able to demonstrate conditioned reinforcement. Myers (1958), who analyzed this experiment in detail, concluded that the negative results occurred because the lever and the reinforcement magazine were at opposite ends of the experimental chamber; i.e., the facilitative effects of the magazine were minimized. Our analysis of the Wyckoff et al. experiment with the established response procedure was presented above. The same analysis applies to their experiment with the new response procedure.
Crowder, Gill, Hodge, and Nash (1959) used the new response procedure, while controlling for response facilitation. During extensive magazine training, a buzzer and light briefly preceded each food-delivery. Pairs of rats were then tested in a yoked-box procedure; when the experimental rat pressed its lever, the buzzer and light were presented to both the experimental and the control rat, but no food was delivered. Lever-presses by the control rat had no effects. The experimental rats responded significantly more than the controls. In this experiment, the conditioned reinforcing effects of the buzzer and light were demonstrated even when response facilitation was controlled.
2. Elicitation Hypothesis
Bugelski (1956), who has reinterpreted his earlier results (Bugelski, 1938), believes that most investigators have failed to distinguish between the effects of the click as a conditioned reinforcer and its effects as an eliciting stimulus. For example, Bugelski suggests: "The click is not, then, a secondary reinforcer in any sense, particularly that of a reward or sub-goal; it is not a discriminated stimulus.... It is rather a part of a pattern that evokes a particular response. Every part of the bar-press behavior is controlled by stimuli of a direct or unconditioned nature or by conditioned stimuli. The click is such a conditioned stimulus. To talk about 'operants' is futile business if we are to arrive at any understanding of behavior" (Bugelski, 1956, p. 93).
Bugelski infers that the chain of events that occurs when a rat presses a lever for food is a respondent chain. According to Bugelski's analysis, the click is a conditioned stimulus that elicits lever-pressing. Thus, animals that receive the click in extinction will respond more frequently than animals that do not receive it. Bugelski does not distinguish between operant and respondent conditioning. Although this review is not the proper place to consider in detail this view of behavior, its relevance to conditioned reinforcement can be discussed. Bugelski does not specify the unconditioned stimulus which initially elicits lever-pressing (cf. Skinner, 1938), nor does he consider the effects of schedules of reinforcement.
When animals have been trained on FI or DRL schedules of reinforcement, for example, the magazine-click is characteristically followed by a zero response rate (Ferster and Skinner, 1957). Obviously, the click does not elicit responding on these schedules. For example, Skinner used an FI schedule during training in one early experiment with the established response technique (Skinner, 1938); it is probable that responding occurred infrequently following the click and food-delivery during training. Nevertheless, responding occurred more frequently in the extinction records when responses produced the click (see also, Kelleher, 1961). The empirical data contradict the elicitation hypothesis.
3. Discrimination Hypothesis
The discrimination hypothesis states that the amount of responding in extinction is simply a function of similarity between conditions of training and extinction. Strong experimental evidence for the discrimination hypothesis is provided by two experiments (Bitterman, Fedderson, and Tyler, 1953; and Elam, Tyler, and Bitterman, 1954). These experiments have been thoroughly described in previous reviews (Walker, 1957; Myers, 1958; Mowrer, 1960).
Some experiments can be as easily explained by the discrimination hypothesis as by the conditioned reinforcement concept. Myers (1958) has noted, for example, that in Bugelski's (1938) experiment, the experimental group produced the click during both training and extinction but the control group did not. Stimulus conditions in extinction are, therefore, more similar to those in conditioning for the experimental group, and the concept of conditioned reinforcement is unnecessary to explain the results.
Other experiments, however, indicate that enhanced resistance to extinction produced by a stimulus which had been associated with food-delivery involves more than the similarity between the conditioning and extinction situations. Skinner (1938) has demonstrated that the click is a conditioned reinforcer, but some of his experimental results do not support the discrimination hypothesis. When responses during extinction produced the click, a single, negatively accelerated extinction curve was obtained. When responses during extinction did not produce the click in the first phase of extinction, but did produce the click in the second phase, two successively negatively accelerated extinction curves were obtained. Subjects extinguished under this two-phase procedure consistently responded more in extinction than subjects extinguished when every response produced the click. Since the stimulus conditions in extinction were more similar to those in conditioning for subjects extinguished with the click throughout extinction, Skinner's (1938) results are incompatible with the discrimination hypothesis.
The previously described results of Kelleher's (1961) experiment are also incompatible with the discrimination hypothesis. The pigeons in this experiment had been trained on an FI 5 schedule of food reinforcement. Thus, presenting the click on FI 5 in extinction was a condition very similar to training. However, the results showed that more responding occurred when the click was presented on an FR schedule.
Experimental results relevant to the discrimination hypothesis fall into three groups. One group of experiments (Bitterman et al., 1953; and Elam et al., 1954) favors the discrimination hypothesis. As Mowrer concluded after a thorough analysis of these experiments (Mowrer, 1960, pp. 462-468), they do support the discrimination hypothesis, but they do not impugn the conditioned reinforcement concept. The second group of experiments (for example, Bugelski, 1938) is consistent with the discrimination hypothesis as well as with the concept of conditioned reinforcement. The third group of experiments (Skinner, 1938; Saltzman, 1949; Kelleher, 1961) is inconsistent with the discrimination hypothesis but supports the conditioned reinforcement concept.
In summary, the discrimination hypothesis, like the facilitation hypothesis, can be used to account for some of the experimental findings obtained with the extinction procedure. However, the conditioned reinforcement concept has sufficient explanatory power to integrate all of the available evidence.
B. AN HYPOTHESIS ABOUT THE NECESSARY AND SUFFICIENT CONDITIONS FOR ESTABLISHING A CONDITIONED REINFORCER
In several sections of this review we have referred to the discriminative stimulus hypothesis. This hypothesis was initially generated by experiments with the established response procedure. Unlike the three hypotheses just discussed, the discriminative stimulus hypothesis is not intended to supplant or limit the concept of conditioned reinforcement. This hypothesis attempts to specify the conditions under which a stimulus becomes a conditioned reinforcer.
Skinner (1938, p. 245) suggested that a stimulus may become a conditioned reinforcer when it is established as a discriminative stimulus. On the basis of further evidence, Keller and Schoenfeld (1950) concluded that the establishment of a stimulus as a discriminative stimulus is a necessary and sufficient condition for its establishment as a conditioned reinforcer.
Experimental evidence supporting the discriminative stimulus hypothesis has been presented by Keller and Schoenfeld (1950) and Myers (1958). Schoenfeld, Antonitis, and Bersh (1950) conditioned two groups of rats to press a lever by presenting food pellets after each response. For the experimental group, a light appeared for 1 sec immediately after the food-pellet was delivered, i.e., during the time that the rats would be eating the pellet. No light was presented to the control group. During extinction, lever presses produced the light for all animals. The groups did not differ in the number of responses emitted in extinction. (Note that these results do not support the discrimination hypothesis.)
The available evidence suggests that if the light had appeared immediately before food-delivery, it would have been both a discriminative stimulus for the response of approaching the food receptacle, and a conditioned reinforcer. Schoenfeld et al. concluded that the light in their experiment was not a conditioned reinforcer because it had not been established as a discriminative stimulus.
Dinsmoor (1950) investigated the relationship between the discriminative and conditioned reinforcing functions of stimuli. In one experiment, rats were trained to respond on an FR 1 schedule of food reinforcement in the presence of a light. The animals were then divided into three groups for experimental extinction. The control group was extinguished with the light off. The first experimental group was extinguished in the presence of the light, but each response turned the light off for 3 sec. The second experimental group was extinguished with the light off, but each response turned the light on for 3 sec. The experimental groups responded more than the control group, but they did not differ from each other. Dinsmoor concluded that the discriminative and conditioned reinforcing effects of the light were interchangeable.
In a similar experiment (Dinsmoor, 1952) rats were trained on an FI 5 schedule of food reinforcement. Half of these animals underwent experimental extinction with periods of light alternating with periods of dark independently of responses. The remaining animals were extinguished in the dark, but each response turned the light on for 3 sec. These two groups had similar extinction curves. Another group of rats was trained on an FR 1 schedule of reinforcement with the light on, and was extinguished with periods of light and dark alternating independently of responses. This group extinguished faster than either of the groups that had been trained on FI 5. This experiment supported the notion that discriminative and conditioned reinforcing effects of the light are interchangeable. The conditioned reinforcing effects of the light after training on FI 5 and FR 1 were not directly compared, but the results suggest that the discriminative stimulus associated with the FI 5 schedule may have been a more durable conditioned reinforcer than the discriminative stimulus associated with FR 1.
Experiments by Coate (1956) and Ratner (1956a) with the established response procedure have tested the conditioned reinforcing effect of the click after weakening its control as a discriminative stimulus. In the first phase of Coate's experiment, lever-pressing produced the click and food-delivery on VI 36 sec during training; approaches to the food receptacle were recorded. In the second phase, the lever was removed, and the click alone was intermittently presented to the experimental group but not to the control group. At the end of this second phase, the animals of the experimental group no longer approached the food receptacle when the click occurred. In the final phase, lever-presses produced the click but no food was delivered. Although the animals of the experimental group immediately approached the food receptacle more frequently, they pressed the lever less often during experimental extinction than did the control group. The results of the experiments by Coate and Ratner, as well as experiments by Skinner (1938), indicate that extinction of one component of a chain weakens the conditioned reinforcing effect of the stimulus associated with that component. The discriminative stimulus hypothesis suggests that the stimulus is weakened as a conditioned reinforcer because the response that it controlled as a discriminative stimulus had been extinguished.

Several experiments with the new response procedure are relevant to the discriminative stimulus hypothesis. In an experiment by Ratner (1956b) a click immediately preceded the delivery of water during magazine training. During testing, all responses by the experimental groups produced the click, and responses by the control group had no effect. Both lever-presses and approaches to the dipper were recorded. A transient conditioned reinforcing effect was demonstrated; however, the two groups did not differ in the number of approaches to the dipper. As Myers (1958) has noted, the results suggest that the conditioned reinforcing effect of a stimulus can be demonstrated in the absence of a discriminative effect of the stimulus. As noted in the discussion of chained schedules of reinforcement, several experiments have indicated that the establishment of a stimulus as a discriminative stimulus is not sufficient to establish it as a conditioned reinforcer.

An experiment by Stein (1958) has important implications for the discriminative stimulus hypothesis. Stein used brain stimulation as a reinforcer. In preliminary training, rats were placed in an experimental chamber containing two levers. Lever-presses had no consequences, and initial preferences for either lever were determined for each animal. In the first phase of the experiment, presses on the initially non-preferred lever produced only a brief tone, while presses on the initially preferred lever had no effects. Because the tone was assigned to the non-preferred lever, the rats responded slightly more on the no-tone lever (cf. Kendler, 1959). In the second phase, the levers were removed and the tone was frequently presented just before brain stimulation was delivered. In the third phase, the levers were reintroduced into the experimental chamber, and the procedure was the same as in the first phase. In the final phase, one lever was removed, and responses on the remaining lever produced brain stimulation; no tones were presented. Response rates in the final phase indicated that brain stimulation was a positive reinforcer for most of the animals (experimental group). The remaining animals were controls. The results from the third phase showed that the tone had become a conditioned reinforcer for the experimental group, but not for the control group. Note that the effective delivery of the reinforcer (brain stimulation) in the second phase did not require any operant response; therefore, the tone did not become a discriminative stimulus. Stein's results suggest that it is not necessary to establish a stimulus as a discriminative stimulus for the stimulus to become a conditioned reinforcer. We have previously noted that response-independent correlations of a stimulus with food made that stimulus a conditioned reinforcer (Ferster, 1953; Ferster and Skinner, 1957; Autor, 1960). In these experiments a consummatory response (eating) occurred in the presence of, or immediately after, the conditioned reinforcer.

Reinforcers that can be effectively delivered independently of any consummatory operant response should lead to more knowledge of the mechanism of conditioned reinforcement. Stein's experiment (1958) was the first to take advantage of this characteristic of brain stimulation in the study of conditioned reinforcement. Carlton and Marks (1958) reported that a tone associated with the delivery of heat to a cold animal had become a conditioned reinforcer; however, this experiment was not designed to demonstrate conditioned reinforcement. Further investigations of conditioned reinforcers established by association with these types of reinforcers should contribute to understanding of conditioned reinforcement, because the consummatory and reinforcing aspects can be separated experimentally.
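Stein's design separates pairing from response contingency: the tone-stimulation pairings in the second phase are response-independent, so the tone never becomes a discriminative stimulus. A compact sketch of the four phases, with the phase summaries as our own paraphrase of the description above:

```python
# Sketch of the four phases of Stein's (1958) procedure.
PHASES = [
    ("1: baseline test",     "press on non-preferred lever -> tone only"),
    ("2: pairing",           "levers removed; tone precedes free brain "
                             "stimulation (response-independent)"),
    ("3: test",              "levers back; does the tone now reinforce "
                             "pressing the formerly non-preferred lever?"),
    ("4: reinforcer check",  "one lever; presses produce brain "
                             "stimulation, no tone"),
]
for name, rule in PHASES:
    print(name, "-", rule)
# Because phase 2 requires no operant response, any phase-3 effect of
# the tone cannot be attributed to discriminative control.
```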
C. SUMMARY
The three hypotheses offered as alternatives to conditioned reinforcement have limited applicability to conditioned reinforcement situations. A stimulus can have several functions besides reinforcing a response. These hypotheses stress the other functions, especially the discriminative control of behavior, but these other functions do not exclude the reinforcing role. These hypotheses are even less applicable to new response procedures and to maintained behavior on chained schedules. At best, they suggest controls which should be included in an experimental design to show the strength of a stimulus solely as a conditioned reinforcer.
On the other hand, there are alternative techniques (e.g., chained schedules) which can often provide the same kind of information, and allow for more extensive behavioral manipulations than extinction procedures. For example, other techniques are available for developing new behavior with conditioned reinforcers. Tokens can be used to develop new responses (Cowles, 1937) and new patterns of responding (Kelleher, 1956, 1957a, 1957b); the conditioned reinforcing stimulus can be strengthened in other ways between periods of testing (cf. Saltzman, 1949); and chained schedules can be extended without omitting the unconditioned reinforcement that terminates the chain.
The
common
characteristic
of
the
ex-
tinction
procedures
is
that
they
are
chains
that
are
broken
before
the
final
component.
It
is
unfortunate
that
the
popularity
of
the
extinc-
tion
procedures
has
led
many
psychologists
to
consider
conditioned
reinforcement
only
in
the
context
of
these
procedures.
Many
hypoth-
eses
and
theories
are
applicable
only
to
results
obtained
with
the
extinction
procedures.
In
considering
conditioned
reinforcement,
the
systematists
of the
future
should
attempt
to
integrate
relevant
data
provided
by
all
tech-
niques.
IV. OTHER VARIABLES AFFECTING CONDITIONED REINFORCEMENT
Preceding sections of this review have considered the action of conditioned reinforcers in chained schedules. The variables manipulated in these experiments have been length of chain, frequency of reinforcement, and schedule of reinforcement. Many other variables are relevant to conditioned reinforcement. In this section we divide these variables into two classes, static and dynamic. Static variables act when a conditioned reinforcer is being established. For example, the number of pairings of the stimulus which is to be established as a conditioned reinforcer with the primary reinforcer is a static variable. Dynamic variables act when a conditioned reinforcer is being used to develop or maintain behavior. For example, the intermittency of presenting an established conditioned reinforcer is a dynamic variable.

Static and dynamic variables are distinguished in terms of whether the variable can be changed during the course of experimenting with a given organism. Generally, only one number of pairings can be used, since a test for the effect of the conditioned reinforcer changes the organism's behavior irreversibly. Similarly, only one level of deprivation is in force during the establishment of a conditioned reinforcer. On the other hand, the effect of an established conditioned reinforcer can be tested under several different levels of deprivation in a single organism. Most of the experiments on these variables have been conducted with extinction procedures, and most of the static variables demand such a test. The dynamic variables can be investigated in both extinction and maintained behavior procedures.
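The distinction can be made concrete with a short simulation sketch. The Python fragment below programs a two-component chained schedule (chain FI 1 FI 1); the interval values, the simulated response stream, and the names used are illustrative assumptions rather than parameters from any experiment reviewed here. A static variable, such as the number of pairings arranged before testing, would be fixed before a run; a dynamic variable, such as the schedule of presenting S1 (FI_S2 below), could be changed from session to session in the same organism.

```python
import random

# A minimal sketch of a two-component chained schedule (chain FI 1 FI 1).
# Responding in S2 produces S1, the putative conditioned reinforcer;
# responding in S1 produces food, and the chain then recycles to S2.
FI_S2 = 60.0   # sec; a dynamic variable: schedule of presenting S1
FI_S1 = 60.0   # sec; schedule of primary reinforcement in S1

def run_chain(session_sec=3600.0, mean_irt=2.0, seed=0):
    """Drive the chain with responses at random inter-response times."""
    rng = random.Random(seed)
    t, component, component_start, feedings = 0.0, "S2", 0.0, 0
    while t < session_sec:
        t += rng.expovariate(1.0 / mean_irt)          # next response
        if component == "S2" and t - component_start >= FI_S2:
            component, component_start = "S1", t      # response produces S1
        elif component == "S1" and t - component_start >= FI_S1:
            feedings += 1                             # response produces food
            component, component_start = "S2", t      # chain recycles
    return feedings

print(run_chain(), "feedings in a 1-hr session")
```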
A. NUMBER OF PAIRINGS OF STIMULUS WITH PRIMARY REINFORCER
Three studies on the strength of conditioned reinforcement as a function of number of pairings showed that the conditioned reinforcing effectiveness of a stimulus is directly related to the number of pairings. Over the range of number of pairings used (0 to 160), the absolute size of the effects of conditioned reinforcement was relatively small.

Using the established response extinction procedure, Miles (1956) trained rats to press a bar for food on FR 1; each food-delivery was accompanied by a flash of light and a click. Paired groups of rats had 0, 10, 20, 40, 80, and 160 pairings of response-produced light-click and food-delivery. For the experimental animals, each response in extinction produced the light and click; the control animals were extinguished without light and click. Following 0 pairings, the experimental animals made an average of about three more responses than the comparable controls. Following 160 pairings, the experimental animals made an average of about 20 more responses than controls.

Using the new response extinction procedure, Bersh (1951) found an average of 23 more responses following 120 pairings than following 0 pairings. Both of these experiments indicate that the asymptote for the conditioned reinforcing effect is reached after about 100 pairings.
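The design shared by these two studies reduces to a pair of programmed contingencies, sketched below. Only the contingencies are represented; the pairing counts are those used by Miles, and nothing here models how many extinction responses an animal would actually emit.

```python
# A sketch of the pairing-then-extinction-test design (Miles, 1956; Bersh, 1951).
PAIRING_GROUPS = [0, 10, 20, 40, 80, 160]   # numbers of pairings used by Miles

def consequence_of_press(phase, experimental_group):
    """Programmed consequence of a single bar-press in each phase."""
    if phase == "training":                        # FR 1: light-click plus food
        return {"light_click": True, "food": True}
    # Extinction test: light-click only for the experimental group, never food.
    return {"light_click": experimental_group, "food": False}

for n_pairings in PAIRING_GROUPS:
    presses = ["training"] * n_pairings + ["extinction"]
    print(n_pairings, consequence_of_press(presses[-1], experimental_group=True))
```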
Hall (1951a) trained rats in a straight alley; different groups had 25, 50, or 75 primary reinforcements. The rats were then tested in a T-maze in which the goal-box in one arm was the same as the one in which the rats had been previously reinforced. No primary reinforcement occurred in the T-maze. The rats ran more frequently to that arm of the T-maze with the goal-box in which they had been previously reinforced. This effect was greater for rats which had been reinforced 75 or 50 times in that goal-box than for rats which had been reinforced only 25 times. Again, the effect was relatively weak, yielding fewer than two out of 15 more turns to the goal-box previously paired with primary reinforcement.
B. INTERVAL BETWEEN STIMULUS AND PRIMARY REINFORCER
An important static variable is the time interval between presentation of the stimulus and the primary reinforcer. This interval, which is similar to the CS-US interval in respondent conditioning, has been investigated in parametric experiments. Using the extinction paradigm, Jenkins (1950), with the new response procedure, and Bersh (1951), with the established response procedure, found that intervals of about 0.5 to 1 sec produced the most effective conditioned reinforcers. Note that these intervals are about the same as the optimum CS-US interval in several conditioned respondents (Osgood, 1953).

Experiments with explicit chained schedules suggest that the interval between stimulus and primary reinforcer is not a critical dynamic variable. These experiments were reported more completely in previous sections. Napalkov (1958) and Ferster (1953) both maintained behavior with conditioned reinforcers presented 2 and 1 min, respectively, before food reinforcement. If such long delays were presented at the beginning of training, behavior was not maintained. It was, however, possible to increase delays gradually until a sizable interval was reached. Autor (1960) maintained behavior with conditioned reinforcers presented at variable intervals averaging 60 sec. The conditioned reinforcing effects obtained with these longer intervals may depend on the gradual lengthening of the interval. Both Napalkov and Ferster found that sudden imposition of 1- or 2-min intervals caused a loss of behavior. The variable intervals used by Autor are similar to gradual lengthening procedures because of the short intervals which occasionally occur.

The intervals in the chained schedule experiments differ from the intervals in the extinction procedure experiments in another way. In the chained schedules the stimulus is present throughout the interval, while in studies with the extinction procedure the stimulus appeared briefly at the start of the interval.

The available evidence raises several questions. Does the similarity of the optimal intervals for establishing conditioned reinforcers to the optimal intervals for establishing conditioned responses indicate that conditioned reinforcers are established by simple stimulus contiguity? Do the procedures rely on some common underlying process, or is the similarity a coincidence? Is the duration of the stimulus presentation relevant? Experimental analysis is required to answer these questions.
C. AMOUNT OF PRIMARY REINFORCER
Does the effectiveness of a stimulus as a conditioned reinforcer depend on the amount (or quality) of the primary reinforcer with which it was paired? Since many measures of behavior in simple operant conditioning situations vary with this parameter (e.g., Guttman, 1953), one would expect that studies done explicitly on conditioned reinforcement would yield some positive relationship between amount and conditioned reinforcement.

Wolfe (1936) conducted preference studies with tokens that could be exchanged for different amounts and types of reinforcement. Using a simple discrimination situation, Wolfe showed that chimpanzees would select the token which could be used to obtain: (1) food rather than one that could not be exchanged; (2) two pieces of food rather than a token which could be used to obtain one piece of food; (3) the reinforcer (food or water) that was consistent with the animal's current state of deprivation (hunger or thirst). Recent studies have shown that the effectiveness of a conditioned reinforcing stimulus is directly related to the amount of food with which it was paired (D'Amato, 1955; Lawson, 1953); the amount of time allowed for eating (Powell and Thomas, 1957); or the concentration of food (Butter and Thomas, 1958). Two studies (Lawson, 1957; Hopkins, 1955) found no effects of amount of food, probably because of the procedures employed (cf. Myers, 1958).

The effectiveness of conditioned reinforcers which have been established or maintained by extreme amounts of primary reinforcement has not been investigated. Amount of primary reinforcement might be an important variable in situations where only weak conditioned reinforcing effects can be demonstrated. For example, in those extended chained schedules that produce extremely long pauses in the early components of the chain, performance might be modified if a large amount of food were delivered at the termination of the chain (cf. Findley, 1962). Similarly, on FR schedules of token reinforcement with large exchange ratios, it might be possible to eliminate prolonged pauses at the start of the session by delivering tokens which could be exchanged for a relatively large amount of primary reinforcement at the end of the session.
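The suggestion can be stated as an explicit contingency. The sketch below programs an FR schedule of token reinforcement with an exchange ratio; the ratio values and the size of the exchange are illustrative assumptions, not parameters from any study cited here.

```python
def token_session(presses, fr=20, exchange_ratio=8, pellets_per_token=1):
    """Return (tokens_earned, pellets_delivered) for a run of lever presses.

    Every fr-th press produces one token; when exchange_ratio tokens have
    accumulated, they are exchanged, each buying pellets_per_token pellets.
    Raising pellets_per_token is the manipulation suggested in the text for
    attacking the prolonged pauses seen with large exchange ratios.
    """
    tokens = pellets = held = 0
    for press in range(1, presses + 1):
        if press % fr == 0:              # FR contingency: token delivery
            tokens += 1
            held += 1
        if held == exchange_ratio:       # exchange period begins
            pellets += held * pellets_per_token
            held = 0
    return tokens, pellets

print(token_session(presses=3200))                        # (160, 160)
print(token_session(presses=3200, pellets_per_token=4))   # (160, 640): larger amount
```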
D. EFFECT OF LEVEL OF DEPRIVATION
As a static variable, deprivation level was found to be irrelevant to the conditioned reinforcing effects of a stimulus (Hall, 1951b). In this experiment, rats were reinforced for running down an alley under a 6-hr or a 22-hr water deprivation. All rats were later tested under a 6-hr deprivation in a T-maze; one end-box of the maze had been previously paired with water reinforcement. A conditioned reinforcing effect was found in groups previously trained at either level of deprivation, but there was no difference between the groups.

Used as a dynamic variable, deprivation level influences the conditioned reinforcing effects of a stimulus (Miles, 1956). In this experiment, rats pressed a bar for food on FR 1; a light-click stimulus accompanied each food-delivery, and after 80 food-deliveries the rats were extinguished. Different groups were extinguished at levels of deprivation ranging from 0 to 40 hr. For the experimental animals, each response in extinction produced the light-click stimulus. The control animals were extinguished without the light-click stimulus. At 0-hr deprivation, the experimental animals made an average of about eight more responses than the controls; at 40-hr deprivation, the experimental animals made an average of about 22 more responses than the controls.

Other investigations of level of deprivation have been reviewed by Myers (1958). He concluded (p. 295): "Both resistance to extinction and learning measures yield the conclusion that secondary reinforcement strength increases with increased drive state. There is still too little evidence to permit us to decide whether or not secondary reinforcement is effective with satiated subjects."

Ferster and Skinner (1957, pp. 679-680) investigated the effects of progressively decreasing levels of deprivation during prolonged sessions on chain VI 3 VI 3 and chain VI 1 VI 1 schedules. The response rates in S1 were not affected, while response rates in S2 declined markedly as levels of deprivation decreased. This suggests that the conditioned reinforcing effectiveness of a discriminative stimulus can be decreased by manipulating deprivation even while the response rates and frequency of reinforcements controlled by the stimulus remain constant. Future experiments on the conditioned reinforcing effects of the component stimuli in chained schedules should analyze the effects of varying levels of deprivation.

Can a conditioned reinforcer established under one type of deprivation be effective under other types of deprivation even when the original deprivation is absent? This question has not been unequivocally answered. Estes (1949a, 1949b) concluded that a conditioned reinforcer generalized across types of deprivation. These experiments involved food and water deprivations, however, and it is known that these deprivations are not independent (Verplanck and Hayes, 1953; Grice and Davis, 1957). That is, a food-deprived rat is also somewhat water-deprived, and vice versa. Therefore, a zero level of the original deprivation was not reached, and behavior was actually tested under some unknown level of the original deprivation. The question concerning transfer of conditioned reinforcement needs to be attacked in experiments with deprivations that are more independent than those based on food and water, or the results of these experiments must be correlated with other experiments which measure the degree of induced deprivation.

Another question, related to deprivation, is whether a stimulus established as a conditioned reinforcer under several different deprivation-reinforcement conditions is a more effective conditioned reinforcer than one established under a single deprivation. Skinner (1953) refers to a stimulus that has been paired with several kinds of reinforcement as a "generalized conditioned reinforcer," and suggests that these reinforcers should be extremely effective. That is, some deprivation for which the conditioned reinforcer is appropriate will usually exist, and the reinforcer will thus be effective. Money, presumably paired with many kinds of reinforcers, owes its potent reinforcing properties to this multiple pairing.

Despite the systematic importance of generalized conditioned reinforcers, very little research has explicitly evaluated their effects. In the few studies which have been performed, food and water deprivations were used. As previously noted, these deprivations are not independent. In one set of studies by Wike and his associates, a generalized conditioned reinforcer was defined as a stimulus paired with a food reinforcer (20-sec access to wet mash) when the rat was deprived of both food and water. The same primary reinforcer given to rats deprived only of food was assumed to establish a simple conditioned reinforcer. Whether the first procedure establishes a generalized conditioned reinforcer, or a simple conditioned reinforcer under more severe deprivation, is a moot point.

In one experiment (Wike and Barrientos, 1958) rats were fed wet mash after they had run down a short alley. On some days the rats had been deprived of both food and water, and were fed in one distinctive end-box; on other days the rats were deprived only of food, and were fed in a second end-box. The test procedure was a series of 15 trials in a T-maze, with the runway end-boxes as the end-boxes of the arms of the maze. Rats were tested under three deprivation conditions: food-and-water-deprived, food-deprived, and sated. Rats run under the double deprivation condition went to the "generalized conditioned reinforcing" end-box more frequently than either of the other groups.

A second experiment provided some evidence that an end-box paired with food and water under the corresponding deprivations is a more effective reinforcer in a T-maze than a box paired only with food reinforcement. Although this difference was statistically significant, the absolute value of the difference was only 0.62 trials out of 15. The difference between generalized and simple conditioned reinforcers was slightly greater under water than under food deprivation (0.94 out of 15 trials). A third experiment employed the same design as the second, except that the simple conditioned reinforcer was based on water rather than food. The rats turned more often toward the generalized reinforcer only when they had been food-deprived on the test day. The way in which these generalized reinforcer effects are tied to a current deprivation, and the small order of magnitude of preference for the generalized reinforcer, do not allow one to make any extensive generalizations of these results.

In some recent experiments with autistic children, Ferster and DeMyer (1962) used coins as generalized reinforcers. The coins could be used by the child to operate any of a number of reinforcing devices. These devices included: a pinball machine, a pigeon trained to perform only when its compartment was lighted, a color wheel giving a kaleidoscopic effect, a phonograph, an eight-column vending machine with a separate light and coin slot in each column so that the child could choose a particular candy, a second vending machine which delivered small trinkets or small packages containing parts of the child's lunch, and an electric organ. Using these coins as reinforcers, the experimenters were able to develop and maintain complex operant behavior with autistic children. A technique such as this should enable definitive studies of generalized conditioned reinforcers. It is also probable that the study of generalized conditioned reinforcers would advance understanding of reinforcers in general.
E. EFFECT OF INTERMITTENT REINFORCEMENT

1. Intermittent Primary Reinforcement
In recent years there has been intense interest in the effects of intermittent reinforcement. Subjects intermittently reinforced in training are usually more resistant to extinction than subjects continuously reinforced in training. One of the difficulties with the extinction procedure is that the conditioned reinforcing effectiveness of the stimulus extinguishes too rapidly. The intermittency of pairing of a stimulus and a primary reinforcer is a static variable that might modify the durability of the conditioned reinforcing effects of the stimulus.

With discrete trial techniques, most investigators have used the new response procedure. For example, D'Amato, Lachman, and Kivy (1958) studied intermittency by putting food in a distinctive goal-box at the end of a runway on 50% of the training trials for one group of rats and on 100% of the training trials for the other group. On subsequent extinction trials, the distinctive goal-box was in one arm of a T-maze. The conditioned reinforcing effectiveness of the goal-box was more resistant to extinction when it had been intermittently paired with food than when it had been continuously paired with food (see also, Saltzman, 1949; McClelland and McGown, 1953). However, when a goal-box that had been intermittently paired with food was in one arm of a T-maze, and a goal-box that had been continuously paired was in the other arm, rats ran more frequently to the continuously paired goal-box (see also Mason, 1957).

With free-operant techniques, two types of intermittency have been used. In the first type, primary reinforcement occurs intermittently in the presence of a continuous stimulus. For example, Dinsmoor (1952) trained rats on FI 1 in the presence of SD; during testing, SΔ was continuously in effect, but each response produced SD for 3 sec. This experiment, and one in which Wyckoff et al. (1958) used a similar procedure, were discussed in Section III. Neither experiment demonstrated unequivocal conditioned reinforcing effects. In the second type, primary reinforcement is intermittently paired with a brief stimulus. For example, using the new response paradigm, Fox and King (1961) presented a 3-sec buzzer 50 times in each of four training sessions. For one group, food was always paired with the buzzer. For another group, the intermittency with which food was paired with the buzzer was gradually increased; in the fourth session food was paired with the buzzer on five of the 50 presentations. A bar was then introduced into the experimental chamber. For half of each group, each bar-press produced the buzzer; for the others, bar-presses did not produce the buzzer. Although the buzzer was a conditioned reinforcer, it was no more effective following intermittent pairing than following continuous pairing.

In the experiments just discussed, intermittent pairing was not a consistently effective or potent static variable. Future experiments should study the effects of more extreme intermittency. Some experiments in which a conditioned reinforcer was established by intermittent pairing and then was presented intermittently will be discussed below.

Intermittent primary reinforcement has also been used as a dynamic variable. Using a discrete trial technique and the new response procedure, Saltzman (1949) trained rats in a runway with a distinctive goal-box, and tested them in a modified T-maze with the goal-box in one arm. The rats never received food in the maze, but a food-reinforced runway trial preceded each maze trial. In the maze, the rats ran to the distinctive goal-box on an average of 11.5 out of 15 trials. In chained schedules, which were discussed in Section II, primary reinforcement occurred intermittently in the presence of S1. The results generally showed that the conditioned reinforcing effectiveness of S1 was directly related to frequency of reinforcement in S1.

It would be interesting to determine the effects of presenting a brief stimulus according to a schedule, while intermittently associating primary reinforcement with the stimulus. For example, responses could be reinforced by a click on an FI 5 schedule, with every third presentation of the click followed by primary reinforcement. The results of this type of experiment would probably differ from results with three-component chained schedules of FI 5.
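The proposed arrangement is easy to state as a program. In the sketch below, FI 5 is taken as a 5-min fixed interval, and the simulated response stream exists only to drive the contingency; both are assumptions for illustration rather than a description of any completed experiment.

```python
import random

def brief_stimulus_schedule(session_sec=7200.0, fi_sec=300.0, mean_irt=3.0, seed=1):
    """Responses produce a click on FI 5; every third click is also
    accompanied by primary reinforcement, as proposed in the text."""
    rng = random.Random(seed)
    t, interval_start = 0.0, 0.0
    clicks = feedings = 0
    while t < session_sec:
        t += rng.expovariate(1.0 / mean_irt)   # next response
        if t - interval_start >= fi_sec:       # FI satisfied: response -> click
            clicks += 1
            if clicks % 3 == 0:                # every third click -> food as well
                feedings += 1
            interval_start = t                 # interval recycles at the click
    return clicks, feedings

print(brief_stimulus_schedule())   # (clicks, feedings) for a 2-hr session
```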
2. Intermittent Conditioned Reinforcement
When a conditioned reinforcer appears intermittently during extinction, there should not only be a clearer picture of the animal's tendency to respond, but the stimulus might remain an effective reinforcer over a longer period of time. Dinsmoor, Kish, and Keller (1953) trained rats on FR 1 with a light-flash preceding each food-delivery. During extinction, each response by rats of one group produced the light (FR 1); responses by the other group produced the light on FI 5. The total number of responses by the two groups did not differ significantly. Fox and King (1961) obtained similar results in comparing FR 1 and FI 1 schedules of conditioned reinforcement in extinction.

In Kelleher's (1961) experiment (see Section III), pigeons were trained on an FI schedule of food reinforcement, with a click preceding each food-delivery. During extinction, responses produced the click according to various schedules. Although the click was always paired with food during training, the results indicated that the click was a durable conditioned reinforcer. Indeed, the click was used as a conditioned reinforcer to develop behavior on three schedules of reinforcement during more than four experimental hours, with no indication that the conditioned reinforcing effect was diminishing. Although the established response procedure was used in this investigation, new patterns of responding were generated by the click during extinction. This technique of developing new patterns offers an alternative to the new response procedure for assessing the effectiveness of a conditioned reinforcer.

In several experiments the stimulus and primary reinforcer were paired intermittently, thereby establishing the stimulus as a conditioned reinforcer; the conditioned reinforcer was then presented intermittently during extinction (double intermittency). Clayton (1952) used VR schedules (1.7 or 1.5) during both training and extinction. There was no evidence that the VR schedules were superior to FR 1 in developing or maintaining a conditioned reinforcer. However, Zimmerman (1957, 1959) has used double intermittency to demonstrate powerful conditioned reinforcing effects.

In Zimmerman's first experiment (1957), a buzzer briefly preceded the delivery of water to thirsty rats. The rats approached the dipper when the buzzer sounded. In later training, the delivery of water followed only one out of 10 buzzer presentations on the average. During extinction, lever-presses produced the buzzer on FR 1 for six responses, and then the schedule of buzzer presentations was changed to FI 1. Water was not delivered during testing. Responding was developed and maintained on FI 1; when the buzzer was no longer presented, an extinction curve followed. On the following day, the buzzer was used to recondition responding on FI 1. The results demonstrated that the buzzer was a durable and effective conditioned reinforcer.
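The double intermittency of this procedure can be summarized as two contingencies, sketched below. The probability and schedule values are the ones quoted in the text; the function names and the simulated presentation count are illustrative scaffolding.

```python
import random
rng = random.Random(2)

def buzzer_training(presentations=1000, p_water=0.1):
    """Training (Zimmerman, 1957): the buzzer sounds, and water follows on
    only 1 in 10 presentations on the average."""
    return sum(rng.random() < p_water for _ in range(presentations))

def buzzer_in_test(press_number, sec_since_last_buzzer, fr1_presses=6, fi_sec=60.0):
    """Testing (extinction): water is never delivered. The first six presses
    each produce the buzzer (FR 1); thereafter a press produces the buzzer
    only when 1 min has elapsed since the last buzzer (FI 1)."""
    if press_number <= fr1_presses:
        return True
    return sec_since_last_buzzer >= fi_sec

print(buzzer_training(), "waterings per 1000 buzzer presentations (about 100)")
print(buzzer_in_test(press_number=3, sec_since_last_buzzer=0.0))    # True: FR 1 phase
print(buzzer_in_test(press_number=40, sec_since_last_buzzer=15.0))  # False: FI 1 timing
```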
In Zimmerman's second experiment (1959), rats were trained in a straight alley. A buzzer sounded just before the animal was released from the starting box of the straight alley. Food was presented in the goal-box on one out of eight trials on the average. During testing, a lever was introduced into the starting box, and lever-presses produced the buzzer and release from the starting box, but food was no longer presented in the goal-box. The frequency with which lever-presses produced the buzzer and release was decreased by a schedule change from FR 1 to FR 20 over the first five test days. Rats in the experimental groups were paired with rats in Control Group 1. The buzzer and release for the control rat occurred at the same time that they occurred for the experimental rat; however, if the control rat pressed the lever during the last 10 sec of the delay period, buzzer and release were delayed for another 10 sec. Control Group 2 was never reinforced during training in the runway, the control rats being on the same FR schedule of buzzer and release during testing as the experimental group.

Zimmerman's procedure established the buzzer as a conditioned reinforcer that would generate and maintain performance on FR 20. However, the FR performance became increasingly strained during successive test days. Some responding was maintained over the 10th to the 14th daily 1½-hr sessions. Neither control group responded consistently. Latencies in leaving the starting box, and runway performances, were also recorded. Zimmerman reports that after the eighth day the animals in the experimental group usually stopped just outside the starting box. This change in runway performance did not affect the lever-pressing performance until much later. Zimmerman (1959, p. 457) concludes: "It appears that although the existence or non-existence of discriminative behavior following the onset of the secondary reinforcer may be crucial for the establishment of bar pressing, there is no close functional dependence between latency and response rate."

The Zimmerman experiments (1957, 1959) demonstrate that effective conditioned reinforcers can be established and used to generate behavior with the new response procedure. The question remains as to why these experiments were so successful. Zimmerman has stated: "Usually in studying the acquisition and extinction of a bar-pressing response, investigators have used a single Skinner box compartment. When, in such a situation, a bar press produces a secondary reinforcer, the discriminative, or cue, function of the stimulus typically causes the animal to move to a different area of the compartment, i.e., to a food hopper or water dipper (cf. Estes, 1950, p. 99). However, the overall change or movement is not great. By contrast, in the procedure used in the present study, the response released by the secondary reinforcer in its discriminative role is highly distinctive and carries the animal into a quite different situation. More specifically, bar-pressing takes place in the starting box of a straight-alley runway; and when such behavior produces the secondary reinforcer, the animal jumps from the starting box and moves to the far end of the alley. This procedural innovation combined with the technique of double-intermittent reinforcement already described (Zimmerman, 1957), produces highly stable behavior, which is brought into existence and maintained solely by the unreinforced running behavior." (Zimmerman, 1959, p. 353.)

Neither of Zimmerman's experiments included control groups to determine whether double intermittency was necessary for his results. For example, even if water delivery had followed every buzzer presentation during training in the first experiment (Zimmerman, 1957), or if every training trial in the runway had been reinforced in the second experiment (Zimmerman, 1959), the same results might have occurred during testing. Because our earlier analyses indicated that the establishment of a stimulus as a discriminative stimulus is neither a necessary nor a sufficient condition for establishing it as a conditioned reinforcer, the use of intermittency during training may not be critical. In Zimmerman's second experiment it is interesting that ". . . the animals continued to press the bar for a long time even after the running had deteriorated to a short leap from the starting box." (Zimmerman, 1959, p. 357.) This finding suggests that buzzer-release was still a conditioned reinforcer maintaining lever-pressing even after "running to the goal-box" had been extinguished. Presumably, it was not necessary to train the rats in such a way that "running to the goal-box" was highly resistant to extinction. In any event, Zimmerman's modifications of the new response procedure can generate durable and effective conditioned reinforcers (Zimmerman, 1957, 1959). Fox and King (1961) also used double intermittency, and found that this procedure was more effective than any other combination of intermittency in training and testing (see above).

In summary, the results indicate that intermittent conditioned reinforcement is an important variable. It is not clear, however, whether its importance depends on the pairing procedure used in training or other procedural differences.
F. INITIAL STATUS OF THE CONDITIONED REINFORCING STIMULUS
It is frequently assumed that any stimulus that is to be established as a conditioned reinforcer should be a "neutral" stimulus in the sense that it is neither a positive nor a negative primary reinforcer (cf. Myers, 1958). This is an unnecessary restriction. Under appropriate experimental conditions, a positively or negatively reinforcing stimulus can be established as a positive conditioned reinforcer.

To show that a positive primary reinforcer can be established as a conditioned reinforcer, it is necessary to enhance its effectiveness as a reinforcer. This requires a sensitive technique for measuring changes in the effectiveness of a reinforcer. For example, a flash of light might be a weak positive primary reinforcer for a rat; i.e., responding would be maintained by FR 1 reinforcement of the light-flash, but not by FR 10 or FI 1. If the light-flash were then paired with a strong primary reinforcer, it might be established as a strong conditioned reinforcer. This could be demonstrated by using the light-flash as a reinforcer that would maintain behavior on FR 10 or FI 1. Studies of this type have not been reported.

In an excellent discussion of the initial status of conditioned stimuli, Pavlov (1927, pp. 29-32) describes experiments by Erofeeva in which electric shock was the CS and food the US. "Thus in one particular experiment a strong nocuous stimulus - an electric current of great strength - was converted into an alimentary conditioned stimulus, so that its application to the skin did not evoke the slightest defence reaction. Instead, the animal exhibited a well-marked alimentary conditioned reflex, turning its head to where it usually received the food and smacking its lips, at the same time producing a profuse secretion of saliva." (Pavlov, 1927, pp. 29-30.)

An interesting demonstration of conditioned reinforcement would be to establish a negative primary reinforcer as a positive conditioned reinforcer. This requires a powerful technique for reversing the initial effect of the stimulus. In one recent experiment, which was not interpreted in terms of conditioned reinforcement, Holz and Azrin (1961) established electric shock as a conditioned reinforcer. They trained pigeons on a VI 2 schedule of food reinforcement, and then punished responding. That is, the birds were still reinforced on VI 2, but every response produced an electric shock. When punishment was started, the VI 2 response rate decreased to 50% of the unpunished level. Next, sessions in which VI 2 and punishment were both in effect alternated with extinction sessions in which responses were neither reinforced nor punished. After several weeks of alternate daily exposures to extinction and punishment plus positive reinforcement, response rates were much higher in the punishment-reinforcement sessions than in the extinction sessions. Food reinforcement was then omitted entirely, and the pigeons were exposed to alternate daily sessions of punishment (plus extinction) and extinction alone. Under these conditions response rates were much higher in the sessions with punishment. Figure 14 shows positively accelerated responding occurring when periods of electric shock were introduced in the middle of extinction sessions; when punishment stopped, responding declined to its usual extinction level. Note, however, that responding did not stop immediately when shocks were omitted. This finding closely resembles Skinner's (1938) three-phase extinction procedure discussed in the preceding section.

Substantial response rates were maintained when each response produced a brief electric shock. This suggests that the shock was a conditioned reinforcer. It is also possible that the shock might be a discriminative stimulus for a key-peck, as, in training, responses made after a preceding response, therefore after a shock, were reinforced. Holz and Azrin investigated this possibility by presenting shocks independently of responses. Responses themselves did not produce shock. Even when shocks were presented frequently, only moderate response rates occurred. These rates were never as high as when shocks were response contingent. These experiments indicate that a strong primary negative reinforcer can be established as a positive conditioned reinforcer.
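The sequence of conditions in this experiment can be summarized compactly. The sketch below lists only the programmed consequences of a key peck in each kind of session; the stage names are labels of convenience, and the contingencies are the ones described above.

```python
# The stages of the Holz and Azrin (1961) procedure, reduced to the
# programmed consequences of a key peck in each kind of session.
STAGES = {
    "training":            {"food": "VI 2", "shock_per_response": True},
    "punishment sessions": {"food": "VI 2", "shock_per_response": True},
    "extinction sessions": {"food": None,   "shock_per_response": False},
    "test: shock only":    {"food": None,   "shock_per_response": True},
    "test: extinction":    {"food": None,   "shock_per_response": False},
}

# Final stage: daily alternation with no food anywhere. Higher response rates
# in the shock sessions are what mark the shock as a conditioned reinforcer.
for day, stage in enumerate(["test: shock only", "test: extinction"] * 2, start=1):
    print(f"day {day}: {stage:18s} -> {STAGES[stage]}")
```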
A methodological insistence upon specific techniques usually involves the assumption that the technique must be used to test specific theories. The ways in which non-neutral stimuli can be modified by pairing with primary reinforcers are important, and they can be experimentally analyzed. The assumption that a stimulus that is to be established as a conditioned reinforcer should be a neutral stimulus is not only an unnecessary restriction, it is one that would lead us to neglect important problems.
V. DISCUSSION AND CONCLUSIONS
Questions about conditioned reinforcement come from many sources. Some questions come from attempts to account for human behavior, others from the treatment of reinforcement in theories of learning, and still others from behavioral experiments themselves. Some of these questions were stated in the introduction to this review. Although many of the experiments reported here were not designed to answer these questions, they have provided many answers, or have indicated possibly successful ways to find the answers.

Fig. 14. Cumulative response records showing the effects of introducing a 10-min period of punishment into the middle of extinction sessions (from Holz and Azrin, 1961).

In this review, we have analyzed the concept of conditioned reinforcement in terms of chained schedules. The generality of this concept is demonstrated by the wealth of experimental material it integrates, as well as by the suggestions it makes for future research. The advantages of chained schedules in comparison with other procedures have been considered above, but the main advantage bears emphasis: the possibility of studying conditioned reinforcement under chronic, maintained conditions.
It is now appropriate to consider answers to the questions raised in the first section. First, what are the necessary and sufficient conditions for establishing a conditioned reinforcer? Many studies have attacked this problem. Unfortunately, the main effort in many of these has been to perform "crucial experiments" which would show that conditioned reinforcers did not reinforce, but, rather, acted as some other sort of stimulus, eliciting or producing the response under study. The experimenters consequently explain their data and those from other conditioned reinforcement studies by denying that certain stimuli (besides food, water and other "primary" reinforcers) can become reinforcers; however, reinforcers are initially defined by their effects on behavior. Notions of drive reduction, pleasure or other devices to explain reinforcement are irrelevant to the issue of whether previously nonreinforcing stimuli can be established as reinforcers.

The specific alternative explanations of conditioned reinforcement, considered in Section III, emphasize that a conditioned reinforcing stimulus may have other properties besides a reinforcing one, and that it is these other roles which produce the effects attributed to reinforcement. In analyzing the experiments that led to these alternative views, we have tried to show that weaknesses in experimental design or in the data could not allow a conclusive answer. Moreover, we presented other experiments which are not easily interpreted in terms of these alternative accounts.

Only one attempt has been made to specify the conditions for establishing a conditioned reinforcer. The hypothesis suggested by Schoenfeld et al. (1950) that discriminative stimuli and conditioned reinforcers are co-existent has been treated in detail above. The evaluation of this hypothesis rests on the definition of the term discriminative stimulus. Since, in general, conditioned reinforcers have been the discriminative stimuli of chained schedules, there is a large amount of positive evidence for showing a sufficient relationship. Experiments in which S1 is followed by food without a response (Ferster, 1953; Ferster and Skinner, 1957; Autor, 1960) indicate that the relationship is not a necessary one, if only programmed response contingencies define a discriminative stimulus. Appeals to accidentally established responses only vaguely fortify the argument and must be substantiated by something more than occasional observation. Even more damaging evidence against the necessity of a conditioned reinforcer being a discriminative stimulus is provided by Stein's (1958) demonstration that a conditioned reinforcer can be established by simple contiguity of a tone with positively reinforcing brain shock. In this case there was not even a consummatory response. This technique could be used for more conclusive studies on this question. The sufficiency of the relationship has also been questioned through the finding that the presentation of some discriminative stimuli will not maintain responding in early members of an extended chain. This observation shows that a quantitative designation of some aspect of discriminative stimuli must also be taken into account.

In summary, while discriminative stimuli most often are conditioned reinforcers, stimuli that simply are paired with reinforcers also reinforce. Whether one of these alternatives will be chosen finally as the sole condition for establishing a conditioned reinforcer must await further careful research.
Second, what variables determine the reinforcing strength of a conditioned reinforcer? A review of the variables which determine the effectiveness of a conditioned reinforcer includes most of the variables which also determine the effectiveness of a primary reinforcer. In some experiments these variables are in force while a conditioned reinforcer is being established, and are evaluated during extinctions of both the response and conditioned reinforcer. In other studies the variable of interest is manipulated either during extinction or in a maintained chained schedule. Both types of study show that amount of primary reinforcement, and frequency or probability of primary reinforcement, are effective in determining conditioned reinforcement strength. In addition to these variables, number of pairings of, and duration of interval between, conditioned and primary reinforcers affect the potency of a conditioned reinforcer.

In addition to simply cataloguing the effective variables, is a more general account of the bases of conditioned reinforcement possible? One explicit attempt at such a statement has been made by Wyckoff (1959). He maintained that the conditioned reinforcing strength of a stimulus is a function of the "cue strength" of that stimulus. (By analyzing data from Prokasy's (1956) observing response experiment, he concluded that the function is positively accelerated over some part of its range.) The usefulness of this analysis depends, however, on the clarity of the terms conditioned reinforcing strength and cue strength. In the formal treatment of his model, Wyckoff defines cue strength of a stimulus as probability of response in the stimulus. The simplicity of the formulation, therefore, is its weakness; in several chained schedules higher rates occur in S2, reinforced by a conditioned reinforcer, than in S1, reinforced by a primary reinforcer (uncorrelated schedules in S1, chain FR DRL, etc.). Although Wyckoff's formulation may hold for a restricted set of data (with narrowly defined limitations on the S1 schedule, or more broadly conceived definitions of cue strength), it is not true in any general sense. Indeed, the data suggest that a useful integration of the many variables which determine conditioned reinforcing strength cannot rely too heavily on discriminative stimulus properties, since a stimulus can be a conditioned reinforcer and yet not be a discriminative stimulus.10

10 A broad attack on the problem of reinforcement functions in general is proposed by Premack (1959). This approach resembles Wyckoff's in that reinforcement strength is a function of response probability (without, however, any notion of cue or discriminative strength). Since Premack's defining situation is the "independent" (operant level) frequency of responses, the theory is not clearly applicable to conditioned reinforcement situations with schedules of reinforcement other than CRF.
Third, what problems remain in the analysis of conditioned reinforcers? The preceding discussion indicated that both old and new questions remain to be answered. The first problem, that of specifying the necessary and sufficient conditions for establishing a conditioned reinforcer, still requires much work. By the nature of the problem, extremely careful research, eliminating sources of possible alternative explanation, must be conducted. This research requires more careful experimental control and analysis of conditions in the experiment. Possibly, different forms of preparation, such as curarized animals, or different reinforcers, such as brain shock, may be methods of choice for such an analysis. Certainly the present data indicate that pairing a stimulus with a primary reinforcer may be sufficient.

A second main problem is the study of variables which affect conditioned reinforcing strength. Are the relationships of these variables any different from their effects on primary reinforcement? More empirical research along parametric lines should precede theoretical speculation if this problem is to be solved.

The third problem concerns future applications of conditioned reinforcers. Although we have been generally concerned with strong or durable conditioned reinforcers, a behavioral baseline maintained by a weak conditioned reinforcer might be useful for studying certain variables, particularly those variables that could increase response outputs. For example, performance in the early components of an extended chained schedule could be used to study effects of drugs, level of deprivation, or amount of reinforcement. It would be interesting to determine which variables could increase the frequency of responding in this type of baseline.

Conditioned reinforcers might also have technical advantages for establishing and maintaining complex sequences of responding; i.e., a conditioned reinforcer, such as a click, could follow each component of a sequence, with a primary reinforcer terminating the sequence. In this way, components of the sequence could be reinforced without disrupting the entire sequence. Already some experiments have suggested conditioned reinforcers can be used to control complex behavior more effectively than primary reinforcers alone. For example, Ferster (1958b) trained a chimpanzee on a complex response sequence ("counting"). When every "correct" sequence was followed by food, about 20% of the sequences were "incorrect." When every 33rd "correct" sequence was followed by food, but the other 32 were followed by a conditioned reinforcer, about 2% of the sequences were "incorrect." Although further analysis will be required to determine the role of conditioned reinforcement in this type of procedure, the results suggest an interesting application of conditioned reinforcers.
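The contingency in Ferster's procedure can be sketched directly. In the fragment below, the 33:1 arrangement is the value given in the text, and the click stands for whatever conditioned reinforcer is used; the session length is an illustrative assumption.

```python
def counting_session(correct_sequences=99, food_every=33):
    """Consequence of each "correct" counting sequence: food on every
    food_every-th sequence, a conditioned reinforcer (click) otherwise."""
    return ["food" if n % food_every == 0 else "click"
            for n in range(1, correct_sequences + 1)]

outcomes = counting_session()
print(outcomes.count("click"), "clicks,", outcomes.count("food"), "feedings")  # 96, 3
```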
Conditioned reinforcers might lead to more knowledge of primary reinforcers that are difficult to study because of their direct physiological effects. For example, it is difficult to study the reinforcing effects of some drugs because the physiological effects of the drugs disrupt all behavior. By using a conditioned reinforcing stimulus that had been paired with the drug effect, it might be possible to study the reinforcing effects of the drug in the absence of the disrupting physiological effects.

Can a conditioned reinforcer be as effective as the unconditioned or conditioned reinforcer used to establish it? Experiments directly comparing conditioned and unconditioned reinforcers have consistently demonstrated that the unconditioned reinforcer is more effective than the conditioned reinforcer (Wolfe, 1936; Cowles, 1937; Kelleher, 1957a). Also, experiments on extended chains indicate that a conditioned reinforcer is not as effective as the conditioned reinforcer used to establish it. This factor determines the maximum chain length on which performance can be maintained. Future experiments may overcome this limiting factor by using generalized reinforcers. If a stimulus were the occasion on which the organism could select any one of several reinforcers, this conditioned reinforcing stimulus might become a more potent reinforcer than any of the single reinforcers used to establish it.

The effects of primary reinforcers are easily demonstrated even with individual organisms. Probably because of this effectiveness, the concept of primary reinforcement is a critical part of most theories of behavior. On the other hand, many experiments on conditioned reinforcers have demonstrated only weak or transient effects, and it has often been necessary to average the data of groups of animals to show an effect. Because of this transience some psychologists have rejected the concept of conditioned reinforcement and have favored alternative interpretations; others have urged the use of experimental designs that will permit the application of more elaborate statistical techniques. We believe that future experiments on conditioned reinforcement should be primarily concerned with powerful and durable phenomena that can be demonstrated in individual organisms.

Several of the investigators discussed in this review have described techniques already available to develop effective conditioned reinforcers. Most of these techniques are variations of simple chained schedules. Future investigators will undoubtedly develop techniques for cascading these simple chains into complex chains involving several types of conditioned and primary reinforcement. The experimental analysis of conditioned reinforcement in these complex extended chains should provide an understanding of conditioned reinforcement and effective techniques applicable to behavioral problems outside the laboratory.
REFERENCES
Arnold,
W.
J.
Simple
reaction
chains
and
their
inte-
gration.
I.
Homogeneous
chaining
with
terminal
reinforcement.
J.
cornzp.
physiol.
Psychol.,
1947a,
40,
349-364.
Arnold,
W.
J.
Simple
reaction
chains
and
their
inte-
gration.
II.
Heterogeneous
chaining
with
terminal
reinforcement.
J.
coinp.
physiol.
Psychol.,
1947b,
40,
427-440.
Autor,
S.
M.
The
strength
of
conditioned
reinforcers
as
a
function
of
frequency
and
probability
of
rein-
forcement.
Unpublished
doctoral
dissertation,
Har-
vard
University,
1960.
Bersh,
P.
J.
The
influence
of
two
variables
upon
the
establishment
of
a
secondary
reinforcer
for
operant
responses.
J.
exp.
Psychol.,
1951,
41,
62-73.
Bitterman,
M.
E.,
Fedderson,
W.
E.,
and
Tyler,
D.
W.
Secondary
reinforcement
and
the
discrimination
hypothesis.
Anier.
J.
Psychol.,
1953,
66,
456-464.
Brady,
J.
V.
and
Thach,
J.
Chaining
behavior
and
re-
inforcement
schedule
preference
in
the
Rhesus
monkey.
Presented
at
Eastern
Psychol.
Assoc.
Meet-
ing,
New
York,
1960.
Bugelski,
R.
Extinction
with
and
without
sub-goal
reinforcement.
J.
comip
Psychol.,
1938,
26,
121-134.
Bugelski,
R.
The
psychology
of
learning.
New
York:
Holt,
1956.
Butter,
C.
M.,
and
Thomas,
D.
R.
Secondary
reinforce-
ment
as
a
function
of
the
amount
of
primary
re-
inforcement.
J.
cornip.
physiol.
Psychol.,
1958,
51,
346-348.
Carlton,
P.
L.
and
Marks,
R.
A.
Cold
exposure
and
heat
reinforced
operant
behavior.
Science,
1958,
128,
1344.
Clayton,
F.
L.
Secondary
reinforcement
as
a
function
of
reinforcement
scheduling.
Psychol.
Rep.,
1952,
2,
377-380.
Coate,
W.
B.
Weakening
of
conditioned
bar-pressing
by
prior
extinction
of
its
subsequent
discriminated
operant.
J.
conmp.
physiol.
Psychol.,
1956,
49,
135-138.
Coppock,
H.
W.
and
Chambers,
R.
M.
Reinforcement
of
positive
preference
by
automatic
intravenous
in-
jections
of
glucose.
J.
cotnlp.
physiol.
Psychol.,
1954,
47,
355-357.
Cowles,
J.
T.
Food-tokens
as
incentives
for
learning
by
chimpanzees.
Coinp.
Psychol.
Monogr.,
1937,
14,
1-96.
Crowder,
W.
F.,
Gill,
K.
Jr.,
Hodge,
C.
C.,
and
Nash,
F.
A.
Jr.
Secondary
reinforcement
or
response
facilitation?:
II.
Response
acquisition.
J.
Psychol.,
1959,
48,
303-306.
Crowder,
W.
F.,
Morris,
J.
B.
and
McDaniel,
M.
H.
Secondary
reinforcement
or
response
facilitation?:
I.
Resistance
to
extinction.
J.
Psychol.,
1959,
48,
299-302.
D'Amato,
M.
R.
Secondary
reinforcement
and
magni-
tude
of
primary
reinforcement.
J.
comp.
physiol.
Psychlol.,
1955,
48,
378-380.
D'Amato,
M.
R.,
Lachman,
R.,
and
Kivy,
P.
Secondary
reinforcement
as
affected
by
reward
schedule
and
the
testing
situation.
J.
comp.
physiol.
Psychol.,
1958,
51,
737-741.
Dinsmoor,
J.
A.
A
quantitative
comparison
of
the
dis-
criminative
and
reinforcing
functions
of
a
stimulus.
J.
exp.
Psychol.,
1950,
40,
458-472.
Dinsmoor,
J.
A.
Resistance
to
extinction
following
periodic
reinforcement
in
the
presence
of
a
dis-
criminative
stimulus.
J.
comp.
physiol.
Psychol.,
1952,
45,
31-35.
Dinsmoor,
J.
A.,
Kish,
G.
B.
and
Keller,
F.
S.
A
com-
parison
of
the
effectiveness
of
regular
and
periodic
secondary
reinforcement.
J.
gen.
Psychol.,
1953,
48,
57-66.
Elam,
C.
B.,
Tyler,
D.
W.,
and
Bitterman,
M.
E.
A
further
study
of
secondary
reinforcement
and
the
discrimination
hypothesis.
J.
comp.
physiol.
Psychol.,
1954,
47,
381-384.
Estes,
W.
K.
A
study
of
the
motivating
conditions
necessary
for
secondary
reinforcement.
J.
exp.
Psychol.,
1949a,
39,
306-310.
Estes,
W.
K.
Generalization
of
secondary
reinforce-
ment
from
the
primary
drive.
J.
comp.
physiol.
Psychlol.,
1949b,
42,
286-295.
Ferster,
C.
B.
Sustained
behavior
under
delayed
re-
inforcement.
J.
exp.
Psychol.,
1953,
45,
218-224.
Ferster,
C.
B.
Control
of
behavior
in
chimpanzees
and
pigeons
by
time
out
from
positive
reinforcement.
Psychol.
Moniogr.,
1958a,
72,
Whole
No.
461;
1-38.
Ferster,
C.
B.
Intermittent
reinforcement
of
a
com-
plex
response
in
a
chimpanzee.
J.
exp.
Anal.
Behav.,
1958b,
1,
163-165.
Ferster,
C.
B.,
and
DeMyer,
M.
K.
A
method
for
the
experimental
analysis
of
the
behavior
of
autistic
children.
Ain.
J.
Orthopsychiatry,
1962,
32,
89-98.
Ferster,
C.
B.,
and
Skinner,
B.
F.
Schedules
of
rein-
forcernent.
New
York:
Appleton-Century-Crofts,
1957.
596
ROGER
T.
KELLEHER
and
LEWIS
R.
GOLLUB
Findley,
J.
D.
Rates
of
response
in
a
two-member
chain
as
a
function
of
mean
variable-interval
sched-
ule
of
reinforcement
on
the
second
member.
Un-
published
doctoral
dissertation,
Columbia
Uni-
versity,
1954.
Findley,
J.
D.
Preference
and
switching
under
con-
current
scheduling.
J.
exp.
Anal.
Behav.,
1958,
1,
123-144.
Findley,
J.
D.
An
experimental outline for building and exploring multi-operant behavior repertoires. J. exp. Anal. Behav., 1962, 5, 113-166.
Fox, R. E. and King, R. A. The effects of reinforcement scheduling on the strength of a secondary reinforcer. J. comp. physiol. Psychol., 1961, 54, 266-269.
Gollub, L. R. The chaining of fixed-interval schedules. Unpublished doctoral dissertation, Harvard University, 1958.
Grice, G. R. and Davis, J. D. Effect of irrelevant thirst motivation on a response learned with food reward. J. exp. Psychol., 1957, 53, 347-352.
Guttman, N. Operant conditioning, extinction, and periodic reinforcement in relation to concentration of sucrose used as reinforcing agent. J. exp. Psychol., 1953, 46, 213-224.
Hall, J. F. Studies in secondary reinforcement: I. Secondary reinforcement as a function of the frequency of primary reinforcement. J. comp. physiol. Psychol., 1951a, 44, 246-251.
Hall, J. F. Studies in secondary reinforcement: II. Secondary reinforcement as a function of the strength of drive during primary reinforcement. J. comp. physiol. Psychol., 1951b, 44, 462-466.
Hanson, H. M. and Witoslawski, J. J. Interaction between the components of a chained schedule. J. exp. Anal. Behav., 1959, 2, 171-177.
Hilgard, E. R. and Marquis, D. G. Conditioning and learning. New York: Appleton-Century, 1940.
Holz, W. C. and Azrin, N. H. Discriminative properties of punishment. J. exp. Anal. Behav., 1961, 4, 225-232.
Hopkins, C. O. Effectiveness of secondary reinforcing stimuli as a function of the quantity and quality of food reinforcement. J. exp. Psychol., 1955, 50, 339-342.
Hull, C. L. Principles of behavior. New York: Appleton-Century, 1943.
Hull, C. L., Livingston, J. R., Rouse, R. O., and Barker, A. N. True, sham, and esophageal feeding as reinforcements. J. comp. physiol. Psychol., 1951, 44, 236-245.
Jenkins, W. O. A temporal gradient of derived reinforcement. Amer. J. Psychol., 1950, 63, 237-243.
Kelleher, R. T. Intermittent conditioned reinforcement in chimpanzees. Science, 1956, 124, 679-680.
Kelleher, R. T. A comparison of conditioned and food reinforcement in chimpanzees. Psychol. Newsltr., 1957a, 8, 88-93.
Kelleher, R. T. A multiple schedule of conditioned reinforcement with chimpanzees. Psychol. Repts., 1957b, 3, 485-491.
Kelleher, R. T. Conditioned reinforcement in chimpanzees. J. comp. physiol. Psychol., 1957c, 49, 571-575.
Kelleher, R. T. Stimulus producing responses in chimpanzees. J. exp. Anal. Behav., 1958a, 1, 87-102.
Kelleher, R. T. Fixed-ratio schedules of conditioned reinforcement with chimpanzees. J. exp. Anal. Behav., 1958b, 1, 281-289.
Kelleher, R. T. Schedules of conditioned reinforcement in experimental extinction. J. exp. Anal. Behav., 1961, 4, 1-5.
Kelleher, R. T., and Fry, W. Stimulus functions in chained fixed-interval schedules. J. exp. Anal. Behav., 1962, 5, 167-173.
Keller, F. S., and Schoenfeld, W. N. Principles of psychology. New York: Appleton-Century-Crofts, 1950.
Kendler, H. H. Learning. In Ann. Rev. Psychol. (Farnsworth, P. R., ed.) Palo Alto, Calif.: Annual Reviews, 1959.
Lawson, R. Amount of primary reward and strength of secondary reward. J. exp. Psychol., 1953, 46, 183-187.
Lawson, R. Brightness discrimination performance and secondary reward strength as a function of primary reward amount. J. comp. physiol. Psychol., 1957, 50, 35-39.
Logan, F. A. Incentive: how the conditions of reinforcement affect the performance of rats. New Haven: Yale, 1960.
McClelland, D. C. and McGown, D. R. The effect of variable food reinforcement on the strength of a secondary reward. J. comp. physiol. Psychol., 1953, 46, 80-86.
Mason, D. J. The relation of secondary reinforcement to partial reinforcement. J. comp. physiol. Psychol., 1957, 50, 264-268.
Miles, R. C. The relative effectiveness of secondary reinforcers throughout deprivation and habit-strength parameters. J. comp. physiol. Psychol., 1956, 49, 126-130.
Miller, N. E. Learnable drives and rewards. In S. S. Stevens (ed.), Handbook of experimental psychology. New York: Wiley, 1951.
Morse, W. H. An analysis of responding in the presence of a stimulus correlated with periods of non-reinforcement. Unpublished doctoral dissertation, Harvard University, 1955.
Mowrer, O. H. Learning theory and behavior. New York: John Wiley and Sons, 1960.
Myers, J. L. Secondary reinforcements: A review of recent experimentation. Psychol. Bull., 1958, 55, 284-301.
Napalkov, A. V. Chains of motor conditioned reactions in pigeons. Zh. vyssh. nerv. Deiatel., 1959, 9, 615-621.
Olds, J. and Milner, P. Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. J. comp. physiol. Psychol., 1954, 47, 419-427.
Osgood, C. E. Method and theory in experimental psychology. New York: Oxford Univ. Press, 1953.
Pavlov, I. P. Conditioned reflexes. (Translated by G. V. Anrep) London: Oxford University Press, 1927.
Pierrel, R., and Sherman, G. Barnabus, a rat with college training. Brown Alumni Monthly (in press).
Powell, D. R. and Thomas, C. C. Strength of secondary reinforcement as a determiner of the effects of duration of goal responses on learning. J. exp. Psychol., 1957, 53, 106-112.
Premack, D. Toward empirical behavior laws: I. Positive reinforcement. Psychol. Rev., 1959, 66, 219-233.
Prokasy, W. F. The acquisition of observing responses in the absence of differential external reinforcement. J. comp. physiol. Psychol., 1956, 49, 131-134.
Ratner, S. C. Effect of extinction of dipper-approaching on subsequent extinction of bar-pressing and dipper-approaching. J. comp. physiol. Psychol., 1956a, 49, 576-581.
Ratner, S. C. Reinforcing and discriminative properties of the click in a Skinner box. Psychol. Repts., 1956b, 2, 332.
Razran, G. A note on second-order conditioning and secondary reinforcement. Psychol. Rev., 1955, 62, 327-332.
Razran, G. The observable unconscious and the inferable conscious in current Soviet psychophysiology: interoceptive conditioning, semantic conditioning, and the orienting reflex. Psychol. Rev., 1961, 68, 81-147.
Saltzman, I. J. Maze learning in the absence of primary reinforcement: a study of secondary reinforcement. J. comp. physiol. Psychol., 1949, 42, 161-173.
Schoenfeld, W. N., Antonitis, J. J., and Bersh, P. J. A preliminary study of training conditions necessary for secondary reinforcement. J. exp. Psychol., 1950, 40, 40-45.
Shapiro, M. M. Respondent salivary conditioning during operant lever pressing in dogs. Science, 1960, 132, 619-620.
Shirkova, G. I. and Verevkina, G. L. Chains of conditioned motor reflexes in monkeys. Doklad. Akad. Nauk SSSR, 1960, 133, 730-733.
Skinner, B. F. The behavior of organisms: an experimental analysis. New York: Appleton-Century, 1938.
Skinner, B. F. "Superstition" in the pigeon. J. exp. Psychol., 1948, 38, 168-172.
Skinner, B. F. Science and human behavior. New York: Macmillan, 1953.
Skinner, B. F., and Morse, W. H. Sustained performance during very long experimental sessions. J. exp. Anal. Behav., 1958, 1, 235-244.
Spence, K. W. Continuous versus non-continuous interpretations of discrimination learning. Psychol. Rev., 1940, 47, 271-288.
Spence, K. W. The role of secondary reinforcement in delayed reward learning. Psychol. Rev., 1947, 54, 1-8.
Spence, K. W. Theoretical interpretations of learning. In S. S. Stevens (ed.), Handbook of experimental psychology. New York: Wiley, 1951, 690-729.
Spence, K. W. Behavior theory and conditioning. New Haven: Yale University Press, 1956.
Stein, L. Secondary reinforcement established with subcortical reinforcement. Science, 1958, 127, 466-467.
Verplanck, W. S. and Hayes, J. R. Eating and drinking as a function of maintenance schedule. J. comp. physiol. Psychol., 1953, 46, 327-333.
Walker, E. L. Learning. In Ann. Rev. Psychol. (Farnsworth, P. R., ed.) Palo Alto, Calif.: Annual Reviews, 1957.
Weiss, B. Thermal behavior of the subnourished and pantothenic acid-deprived rat. J. comp. physiol. Psychol., 1957, 50, 481-485.
Wike, E. L. and Barrientos, G. Secondary reinforcement and multiple drive reduction. J. comp. physiol. Psychol., 1958, 51, 640-643.
Wilson, M. P., and Keller, F. S. On the selective reinforcement of spaced responses. J. comp. physiol. Psychol., 1953, 46, 190-193.
Wolfe, J. B. Effectiveness of token-rewards for chimpanzees. Comp. Psychol. Monogr., 1936, 12, No. 60, 1-72.
Wyckoff, L. B. The role of observing responses in discrimination learning. Part I. Psychol. Rev., 1952, 59, 431-442.
Wyckoff, L. B. Toward a quantitative theory of secondary reinforcement. Psychol. Rev., 1959, 66, 68-78.
Wyckoff, L. B., Sidowski, J. and Chambliss, D. J. An experimental study of the relationship between secondary reinforcing and cue effects of a stimulus. J. comp. physiol. Psychol., 1958, 51, 103-109.
Zimmerman, D. W. Durable secondary reinforcement: method and theory. Psychol. Rev., 1957, 64, 373-383.
Zimmerman, D. W. Sustained performance in rats based on secondary reinforcement. J. comp. physiol. Psychol., 1959, 52, 353-358.
... According to previous research, the strength of a conditioned reinforcer varies depending on the number of reinforcers (Dudley, Axe, Allen, & Sweeney-Kerwin, 2019; Helton & Ivy, 2016; Kelleher & Gollub, 1962). For example, in a study with two typically developing children, vocal stimuli paired with four different conditioned reinforcers were more effective than a single reinforcer in encouraging the completion of math problems (Helton & Ivy, 2016). ...
... In general, a candidate for a conditioned reinforcer should be neutral, that is, associated with neither a positive nor a negative response (Kelleher & Gollub, 1962). Accordingly, previous studies attempting to establish praise as a conditioned reinforcer have used phrases that were relatively unfamiliar to the participants (Dozier et al., 2012). ...
Article
This study explored whether praise alone could serve as a conditioned reinforcer after being paired with a preferred stimulus assumed to produce analogous sensory consequences in a given modality. Two children (5 years old) with autism spectrum disorder (ASD) received praise, initially a neutral stimulus, paired with a toy as the reinforcing element. Toys were selected because they could deliver the same putative sensory outcomes as the children's stereotypical behaviors. One participant was seated at a desk; the other could move freely around the test room, where other competing reinforcement objects were available. Praise was successfully established as a conditioned reinforcer for the seated participant but not for the other. The percentage of 10-s intervals with competing-reinforcer interaction was higher in the baseline and praise periods than during the pairing period. Results suggest that practitioners should consider pairing praise with toys (assumed to have the same sensory results as stereotypy) while the child remains stationary, thus promoting praise as a conditioned reinforcer.
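The pairing-then-test logic in the abstract above can be made concrete with a toy model. The following Python sketch is purely illustrative (neither the review nor the study fits such a model): it applies the classic Rescorla-Wagner updating rule to show how repeated praise-toy pairings could move an initially neutral stimulus toward the value supported by the toy. The learning-rate and asymptote parameters are arbitrary assumptions.

# Illustrative only: a Rescorla-Wagner-style account (our choice of model,
# not the study's) of how repeated praise-toy pairings could give the
# initially neutral praise stimulus conditioned value. All parameters are
# arbitrary placeholders.

ALPHA_BETA = 0.2   # combined learning-rate parameter (alpha * beta)
LAMBDA = 1.0       # asymptotic value supported by the toy reinforcer

value = 0.0        # associative value of praise; starts neutral
for pairing in range(30):
    value += ALPHA_BETA * (LAMBDA - value)   # V <- V + ab * (lambda - V)

print(f"praise value after 30 pairings: {value:.3f} (asymptote {LAMBDA})")

Note that in this model the conditioned value only approaches the asymptote set by the paired reinforcer, so the sketch captures acquisition through pairing, not what happens when praise is later presented alone.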
... For example, decades of the learning literature have detailed the variations in reinforcement schedules needed to promote learning across differing ages [16,17]. A reinforcement that is delayed by a second for an infant negates the learning potential in that moment, whereas for a child of 3 or 4 years of age, the schedule of reinforcement has a broader time span within which to promote the desired learning [18-20]. ...
Article
Full-text available
Intensive therapies have become increasingly popular for children with hemiparesis in the last two decades and are specifically recommended because of high levels of scientific evidence associated with them, including multiple randomized controlled trials and systematic reviews. Common features of most intensive therapies that have documented efficacy include: high dosages of therapy hours; active engagement of the child; individualized goal-directed activities; and the systematic application of operant conditioning techniques to elicit and progress skills with an emphasis on success-oriented play. However, the scientific protocols have not resulted in guiding principles designed to aid clinicians with understanding the complexity of applying these principles to a heterogeneous clinical population, nor have we gathered sufficient clinical data using intensive therapies to justify their widespread clinical use beyond hemiparesis. We define a framework for describing moment-by-moment therapeutic interactions that we have used to train therapists across multiple clinical trials in implementing intensive therapy protocols. We also document outcomes from the use of this framework during intensive therapies provided clinically to children (7 months–20 years) from a wide array of diagnoses that present with motor impairments, including hemiparesis and quadriparesis. Results indicate that children from a wide array of diagnostic categories demonstrated functional improvements.
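The age-dependent cost of reinforcement delay mentioned in the excerpt above is commonly expressed with Mazur's hyperbolic discounting function, V = A / (1 + kD). The Python sketch below is our illustration rather than a model used in either article; the k values are arbitrary assumptions chosen only to contrast steep and shallow discounting.

# Illustrative only: Mazur's hyperbolic discounting function V = A / (1 + k*D),
# used here to express the idea that the same one-second reinforcement delay
# costs a steeply discounting learner (infant-like, large k) far more value
# than a shallowly discounting learner (older-child-like, small k).

def discounted_value(amount, delay_s, k):
    """Value of a reinforcer of the given amount after delay_s seconds."""
    return amount / (1.0 + k * delay_s)

for label, k in [("steep discounting (k = 5.0)", 5.0),
                 ("shallow discounting (k = 0.2)", 0.2)]:
    v = discounted_value(amount=1.0, delay_s=1.0, k=k)
    print(f"{label}: value of a 1-s-delayed reinforcer = {v:.2f}")

With these placeholder values, a 1-s delay leaves only about 17% of the reinforcer's value under steep discounting but about 83% under shallow discounting, mirroring the contrast drawn in the excerpt.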
... The idea was popularized by Pryor (1984) as a form of "clicker training" that could be used to train dogs, as well as other animals, for a variety of applied purposes. In the lab, the concept would become a focal point for understanding operant procedures (for reviews, see Fantino & Romanovich, 2007;Kelleher & Gollub, 1962). Similarly, the use of conditioned reinforcement for applied purposes has been examined, including whether conditioned reinforcement successfully improves some dimension of responding, such as speed of acquisition (Chiandetti et al., 2016;Dorey et al., 2020;Dorey & Cox, 2018;Gilchrist et al., 2021;Pfaller-Sadovsky et al., 2020). ...
Article
Full-text available
The field of applied behavior analysis has been directly involved in both research and applications of behavioral principles to improve the lives of captive zoo animals. Thirty years ago, Forthman and Ogden (1992) wrote one of the first papers documenting some of these efforts. Since that time, considerable work has been done using behavioral principles and procedures to guide zoo welfare efforts. The current paper reexamines and updates Forthman and Ogden's original points, with attention to the 5 categories they detailed: (a) promotion of species‐typical behavior, (b) reintroduction and repatriation of endangered species, (c) animal handling, (d) pest control, and (e) animal performances. In addition, we outline 3 current and future directions for behavior analytic endeavors: (a) experimental analyses of behavior and the zoo, (b) applied behavior analysis and the zoo, and (c) single‐case designs and the zoo. The goal is to provide a framework that can guide future behavioral research in zoos, as well as create applications based on these empirical evaluations.
... Most important, all token economies and the large majority of CM interventions produce their effects through the delivery of generalized conditioned reinforcers (i.e., conditioned reinforcers that have been paired with several primary reinforcers). For example, money is effectively established as a reinforcer by an enormous number of motivating operations, in that it can be exchanged for anything that can be bought (Kelleher & Gollub, 1962; Skinner, 1953). In a token economy, a token may be exchangeable for many different back-up reinforcers and can maintain a target behavior in the face of changes in preferences for back-up reinforcers, whether temporary or permanent (Skinner, 1953). ...
Article
Contingency management (CM) interventions are based on operant principles and are effective in promoting health behaviors. Despite their success, a common criticism of CM is that its effects do not persist after the intervention is withdrawn. Many CM studies evaluate posttreatment effects, but few investigate procedures for promoting maintenance. Token economy interventions and CM interventions are procedurally and conceptually similar, and the token economy literature includes many studies in which procedures for promoting postintervention maintenance are evaluated. A systematic literature review was conducted to synthesize the literature on treatment maintenance in token economies. Search procedures yielded 697 articles, and application of inclusion/exclusion criteria resulted in 37 articles for review. The most successful strategy is to combine procedures. In most cases, thinning or fading was combined with programmed transfer of control via social reinforcement or self-management. Social reinforcement and self-monitoring procedures appear to be especially important and were included in 70% of studies involving combined approaches. Thus, our primary recommendation is to incorporate multiple maintenance strategies, at least one of which should facilitate transfer of control of the target behavior to other reinforcers. In addition, graded removal of the intervention, which has also been evaluated to a limited extent in CM, is a reasonable candidate for further development and evaluation. Direct comparisons of maintenance procedures are lacking and should be considered a research priority in both domains. Researchers and clinicians interested in either type of intervention will likely benefit from ongoing attention to developments in both areas.
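The recommended combination, thinning the token schedule while transferring control to social reinforcement, can be sketched procedurally. The Python fragment below is a minimal illustration under assumed values (the block size, the ratios, and the doubling rule are ours, not procedures from the reviewed studies): the fixed-ratio token requirement doubles at each step while praise continues for every target response.

# Illustrative sketch (not from the article): token-schedule thinning with
# continued social reinforcement. The fixed-ratio (FR) token requirement
# doubles at each thinning step; praise accompanies every target response.

def thinning_schedule(initial_ratio=1, factor=2, steps=5):
    """Yield successively leaner FR token requirements: FR1, FR2, FR4, ..."""
    ratio = initial_ratio
    for _ in range(steps):
        yield ratio
        ratio *= factor

tokens = 0
praises = 0
for ratio in thinning_schedule():
    since_last_token = 0
    for _ in range(20):              # assume 20 target responses per step
        praises += 1                 # social reinforcer on every response
        since_last_token += 1
        if since_last_token == ratio:
            tokens += 1              # token earned on the current FR schedule
            since_last_token = 0

print(f"target responses: {praises}, tokens earned: {tokens}")

Run as written, 100 responses earn 38 tokens, with token density falling from one per response to one per 16 responses while the praise contingency stays constant.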
Article
We assessed, across two experiments, whether novel praise statements could be used to (a) maintain and increase responses with existing reinforcement histories and (b) teach a previously untaught response among children diagnosed with autism spectrum disorder. During response–stimulus pairing, two responses resulted in preferred edibles, but only one also produced a praise statement. In the absence of edibles, the response that continued to produce praise tended to persist longer. Next, reversing the praise contingency tended to increase the other response. However, in no case did contingent delivery of those same praise statements result in the acquisition of untaught responses. These findings suggest that conditioned praise statements could serve different functions (antecedent or consequence) depending on the reinforcement history for particular responses.
Article
The behavioral repertoire grows and develops through a lifetime in a manner intricately dependent on bidirectional connections between its current form and the shaping environment. Behavior analysis has discovered many of the key relationships that occur between repertoire elements that govern this constant metamorphosis, including the behavioral cusp: an event that triggers contact with new behavioral contingencies. The current literature already suggests possible integration of the behavioral cusp and related concepts into a wider understanding of behavioural development and cumulative learning. Here we share an attempted step in that progression: an approach to an in-depth characterization of the features and connections underlying cusp variety. We sketch this approach on the basis of differential involvement of contingency terms; the relevance to the cusp of environmental context, accompanying repertoire, or response properties; the connections of particular cusps to other behavioral principles, processes, or concepts; the involvement of co-evolving social repertoires undergoing mutual influence; and the ability of cusps to direct the repertoire either toward desired contingencies or away from a growth-stifling repertoire. We discuss the implications of the schema for expanded applied considerations, the programming of unique cusps, and the need for incorporating cultural context into the cusp. We hope that this schema could be a starting point, subject to empirical refinement, leading to an expanded understanding of repertoire interconnectivity and ontogenetic evolution.
Article
Attentional prioritization of stimuli in the environment plays an important role in overt choice. Previous research shows that prioritization is influenced by the magnitude of paired rewards, in that stimuli signalling high-value rewards are more likely to capture attention than stimuli signalling low-value rewards; and this attentional bias has been proposed to play a role in addictive and compulsive behaviours. A separate line of research has shown that win-related sensory cues can bias overt choices. However, the role that these cues play in attentional selection is yet to be investigated. Participants in the current study completed a visual search task in which they responded to a target shape in order to earn reward. The colour of a distractor signalled the magnitude of reward and type of feedback on each trial. Participants were slower to respond to the target when the distractor signalled high reward compared to when the distractor signalled low reward, suggesting that the high-reward distractors had increased attentional priority. Critically, the magnitude of this reward-related attentional bias was further increased for a high-reward distractor with post-trial feedback accompanied by win-related sensory cues. Participants also demonstrated an overt choice preference for the distractor that was associated with win-related sensory cues. These findings demonstrate that stimuli paired with win-related sensory cues are prioritized by the attention system over stimuli with equivalent physical salience and learned value. This attentional prioritization may have downstream implications for overt choices, especially in gambling contexts where win-related sensory cues are common.
Article
Three pigeons were exposed to second-order schedules in which responding under a fixed-interval (FI) component schedule was reinforced according to a variable-interval (VI) schedule of food reinforcement. Completion of each component resulted in either (1) brief presentation of a stimulus present during reinforcement (paired brief stimulus), (2) brief presentation of a stimulus not present during reinforcement (nonpaired brief stimulus), or (3) no stimulus presentation (tandem schedule). Under the two nonpaired brief stimulus conditions, either a change in keylight color or onset of houselight illumination was used as the brief stimulus. Similar patterns of keypecking occurred under tandem and nonpaired keylight brief-stimulus presentations, whereas nonpaired houselight brief-stimulus presentations generated positively accelerated within-component keypeck patterning for two pigeons. When the same keylight brief stimulus was paired with food, positively accelerated patterns of keypecking were obtained for all pigeons. Differences in the effects of nonpaired brief-stimulus presentations on second-order schedule performance suggest that component schedule patterning under nonpaired brief-stimulus procedures is a function of the particular type of stimulus used (i.e., houselight versus keylight). These results suggest that (1) brief houselight illumination may function as a sensory reinforcer, and (2) a briefly presented food-paired stimulus can function as an effective conditioned reinforcer.