Effective and efficient forgetting of learned knowledge
in Soar’s working and procedural memories
Action editor: David Peebles
Nate Derbinsky, John E. Laird
University of Michigan, 2260 Hayward Street, Ann Arbor, MI 48109-2121, USA
Available online 5 January 2013
Abstract
Effective management of learned knowledge is a challenge when modeling human-level behavior within complex, temporally extended
tasks. This work evaluates one approach to this problem: forgetting knowledge that is not in active use (as determined by base-level acti-
vation) and can likely be reconstructed if it becomes relevant. We apply this model to the working and procedural memories of Soar.
When evaluated in simulated, robotic exploration and a competitive, multi-player game, these policies improve model reactivity and scal-
ing while maintaining reasoning competence. To support these policies for real-time modeling, we also present and evaluate a novel algo-
rithm to efficiently forget items from large memory stores while preserving base-level fidelity.
© 2013 Elsevier B.V. All rights reserved.
Keywords: Large-scale cognitive modeling; Working memory; Procedural memory; Cognitive architecture; Soar
1. Introduction
Typical cognitive models persist for short periods of
time (seconds to a few minutes) and have modest learning
requirements. For these models, current cognitive architec-
tures, such as Soar (Laird, 2012) and ACT-R (Anderson
et al., 2004), executing on commodity computer systems,
are sufficient. However, prior work (e.g. Kennedy & Traf-
ton, 2007) has shown that cognitive models of complex,
protracted tasks can accumulate large amounts of knowl-
edge, and that the computational performance of existing
architectures degrades as a result.
This issue, where more knowledge can harm problem-
solving performance, has been dubbed the utility problem,
and has been studied in many contexts, such as explana-
tion-based learning (Minton, 1990; Tambe, Newell, &
Rosenbloom, 1990), case-based reasoning (Smyth &
Keane, 1995; Smyth & Cunningham, 1996), and language
learning (Daelemans, van den Bosch, & Zavrel, 1999).
Markovitch and Scott (1988) have characterized strategies
for dealing with the utility problem in terms of information
filters applied at different stages in the problem-solving
process. One common strategy that is relevant to cognitive
modeling is selective retention, or forgetting, of learned
knowledge. The benefit of this approach, as opposed to
selective utilization, is that the agent does not have to
expend computational resources at run time to decide
whether to utilize knowledge or not, a property that may
be crucial for real-time modeling in temporally extended,
complex tasks. However, it can be challenging to devise
forgetting policies that work well across a variety of prob-
lem domains, effectively balancing the task performance of
models with reductions in retrieval time and storage
requirements of learned knowledge.
In the context of this challenge, we present two tasks where
effective behavior requires that the model accumulate large
amounts of information from the environment, and where
over time this learned knowledge overwhelms reasonable
computational limits. In response, we present and evaluate
novel policies to forget learned knowledge in the working
and procedural memories of Soar. These policies investi-
gate a common hypothesis: it is rational for the architec-
ture to forget a unit of knowledge when there is a high
degree of certainty that it is not of use, as calculated by
base-level activation (Anderson et al., 2004), and that it
can be reconstructed in the future if it becomes relevant.
We demonstrate that these task-independent policies
improve model reactivity and scaling, while maintaining
problem-solving competence. To support these policies
for real-time modeling, we also present and evaluate a
novel algorithm to efficiently forget items from large mem-
ory stores while preserving fidelity of base-level activation.
2. Related work
Previous cognitive-modeling research has investigated
forgetting in order to account for human behavior and
experimental data. As a prominent example, memory decay
has long been a core commitment of the ACT-R theory
(Anderson et al., 2004), as it has been shown to account
for a class of memory retrieval errors (Anderson, Reder,
& Lebiere, 1996). Similarly, research in Soar investigated
task-performance effects of forgetting short-term (Chong,
2003) and procedural (Chong, 2004) knowledge. By con-
trast, the motivation for this work is to discover the degree
to which forgetting can support long-term, real-time mod-
eling in complex tasks.
Prior work has demonstrated that there are potential
cognitive benefits to using memory decay, such as in
task-switching (Altmann & Gray, 2002) and heuristic
inference (Schooler & Hertwig, 2005). In this paper, we
focus on improving reactivity and scaling.
We extend prior investigations of long-term symbolic
learning in Soar (Kennedy & Trafton, 2007), where the
source of learning was internal problem solving. In this
paper, the evaluation domains accumulate information
from interaction with an external environment.
Prior work has addressed many of the computational
challenges associated with retrieving a single memory
according to the base-level activation (BLA) model (Pet-
rov, 2006; Derbinsky, Laird, & Smith, 2010; Derbinsky &
Laird, 2011). However, efficiently removing items from
memory, while preserving BLA fidelity, presents a different
challenge. As such, before presenting the empirical evalua-
tion domains, we formally describe this computational
problem; present a novel algorithm to forget according to
BLA in large memories; and evaluate our approach with
synthetic data.
3. The Soar cognitive architecture
Soar is a cognitive architecture that has been used for
developing intelligent agents and modeling human cogni-
tion. Historically, one of Soar’s main strengths has been
its ability to efficiently represent and bring to bear large
amounts of symbolic knowledge to solve diverse problems
using a variety of methods (Laird, 2012).
Fig. 1. The Soar cognitive architecture (Laird, 2012).
Fig. 1 shows the structure of Soar. At the center is a
symbolic working memory that represents the agent’s cur-
rent state. It is here that perception, goals, retrievals from
long-term memory, external action directives, and struc-
tures from intermediate reasoning are jointly represented
as a connected, directed graph. The primitive representa-
tional unit of knowledge in working memory is a symbolic
triple (identifier, attribute, value), termed a working-memory element, or WME. The first symbol of a WME (identifier) must be an existing node in the graph, whereas the second (attribute) and third (value) symbols may be either terminal constants or non-terminal graph nodes. Multiple WMEs that share the same identifier are termed an object, and each individual WME sharing that identifier is termed an augmentation of that object.
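To illustrate this representation, here is a minimal Python sketch of WMEs as triples; the identifiers and attributes are invented for illustration and are not from the original text:

```python
# Working-memory elements as (identifier, attribute, value) triples.
wmes = [
    ("R1", "type", "room"),    # value is a terminal constant
    ("R1", "id", 12),
    ("R1", "neighbor", "R2"),  # value is another (non-terminal) graph node
    ("R2", "type", "room"),
]

# All WMEs sharing identifier "R1" form one object; each is an augmentation.
obj_r1 = [w for w in wmes if w[0] == "R1"]
```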
Procedural memory stores the agent’s knowledge of
when and how to perform actions, both internal, such as
querying long-term declarative memories, and external,
such as controlling robotic actuators. Knowledge in this
memory is represented as if-then rules. The conditions of
rules test patterns in working memory and the actions of
rules add and/or remove working-memory elements. Soar
makes use of the Rete algorithm for efficient rule matching
(Forgy, 1982) and retrieval time scales to large stores of
procedural knowledge (Doorenbos, 1995). However, in
the worst case, the Rete algorithm scales linearly with the
number of elements in working memory, a computational
issue that motivates maintaining a relatively small working
memory.
Soar learns procedural knowledge via chunking (Laird,
Rosenbloom, & Newell, 1986) and reinforcement-learning
(RL; Nason & Laird, 2005) mechanisms. Chunking creates
new rules: it converts deliberate subgoal processing into
reactive rules by compiling over rule-firing traces, a form
of explanation-based learning (EBL). If subgoal processing
does not interact with the environment, the chunked rule is
redundant with existing knowledge and serves to improve
performance by reducing deliberate processing. However,
memory usage in Soar scales linearly with the number of
rules, typically at a rate of 1–5 KB/rule, which provides a
motivation for forgetting under-utilized rules.
Reinforcement learning incrementally tunes existing rule
actions: it updates the expectation of action utility, with
respect to a subset of state (represented in rule conditions)
and an environmental and/or intrinsic reward signal. A
rule that can be updated by the RL mechanism (termed
an RL rule) must satisfy a few simple criteria related to
its actions, and is thus distinguishable from other rules.
This distinction is relevant to forgetting rules. When an
RL rule that was learned via chunking is updated, that rule
is no longer redundant with the knowledge that led to its
creation, as it now incorporates information from environ-
mental interaction that was not captured in the original
subgoal processing.
Soar incorporates two long-term declarative memories,
semantic and episodic (Derbinsky & Laird, 2010). Seman-
tic memory stores working-memory objects, independent
of overall working-memory connectivity (Derbinsky
et al., 2010), and episodic memory incrementally encodes
and temporally indexes snapshots of working memory,
resulting in an autobiographical history of agent experience
(Derbinsky, Li, & Laird, 2012; Nuxoll & Laird, 2012).
Agents retrieve knowledge from one of these memory sys-
tems by constructing a symbolic cue in working memory;
the intended memory system then interprets the cue,
searches its store for the best matching memory, and if it
finds a match, reconstructs the associated knowledge in
working memory. For episodic memory, the time to recon-
struct knowledge depends on the size of working memory
at the time of encoding, another motivation for a concise
agent state (Derbinsky & Laird, 2009).
Agent reasoning in Soar consists of a sequence of deci-
sions, where the aim of each decision is to select and apply
an operator in service of the agent’s goal(s). The primitive
decision cycle consists of the following phases: encode per-
ceptual input; fire rules to elaborate agent state, as well as
propose and evaluate operators; select an operator; fire
rules that apply the operator; and then process output
directives and retrievals from long-term memory. Unlike
ACT-R, multiple rules may fire in parallel during a single
phase. The time to execute the decision cycle, which pri-
marily depends on the speed with which the architecture
can match rules and retrieve knowledge from episodic
and semantic memories, determines agent reactivity. We
have found that 50 ms is an acceptable upper bound on this
response time across numerous domains, including robot-
ics, video games, and human–computer interaction (HCI)
tasks.
There are two types of persistence for working-memory
elements added as the result of rule firing. Rules that fire to
apply a selected operator create operator-supported struc-
tures. These WMEs will persist in working memory until
deliberately removed. In contrast, rules that do not test a
selected operator create instantiation-supported structures,
which persist only as long as the rules that created them
match. This distinction is relevant to forgetting WMEs.
As evident in Fig. 1, Soar has additional memories and
processing modules; however, they are not pertinent to this
paper and are not discussed further.
4. Efficient forgetting via base-level activation
In later sections, we present and evaluate forgetting pol-
icies in the working and procedural memories of Soar.
Both of these policies use base-level activation (BLA;
Anderson et al., 2004) as a heuristic for identifying memo-
ries that may not be useful to the agent. In this section, we
formally describe the computational problem of forgetting
according to the BLA model; present a novel approach that
scales efficiently in large memories; and evaluate our
approach using synthetic data.
4.1. Problem formulation
Let memory $M$ be a set of elements, $\{m_1, m_2, \ldots\}$. Let each element $m_i$ be defined as a set of pairs $(a_{ij}, k_{ij})$, where $k_{ij}$ refers to the number of times element $m_i$ was activated at time $a_{ij}$. We assume $|m_i| \le c$: the number of activation events for any element is bounded. These assumptions are consistent with the ACT-R declarative memory when bounding chunk-history size (Petrov, 2006). This is also consistent with the semantic memory in Soar (Laird, 2012).
We assume that activation of an element $m$ at time $t$ is computed according to the BLA model (Anderson et al., 2004), where $d$ is a fixed decay parameter:

$$B(m, t, d) = \ln\left( \sum_{j=1}^{|m|} k_j \, [t - a_j]^{-d} \right)$$
We define an element as decayed, with respect to a threshold parameter $\Theta$, if $B(m, t, d) < \Theta$. Given a static element $m$, we define $L$ as the fewest number of time steps required for the element to decay, relative to time step $t$:

$$L(m, t, d, \Theta) := \inf\{\, t_d \in \mathbb{N} : B(m, t + t_d, d) < \Theta \,\}$$

For example, element $x = \{(3, 1), (5, 2)\}$ was activated once at time step three and twice at time step five. Assuming decay rate 0.5 and threshold $-2$, $x$ has activation of about 0.649 at time step 7 and is not decayed: $L(x, 7, 0.5, -2) = 489$.
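These definitions translate directly into code; the following minimal Python sketch (our naming, not Soar's) reproduces the worked example above:

```python
import math

def bla(history, t, d):
    """B(m, t, d) = ln(sum_j k_j * (t - a_j)^(-d)) over pairs (a_j, k_j)."""
    return math.log(sum(k * (t - a) ** (-d) for (a, k) in history))

def decay_steps(history, t, d, theta):
    """L(m, t, d, theta): fewest future steps until activation < theta
    (brute force; Section 4.2 replaces this scan with prediction)."""
    td = 1
    while bla(history, t + td, d) >= theta:
        td += 1
    return td

x = [(3, 1), (5, 2)]               # once at step 3, twice at step 5
print(round(bla(x, 7, 0.5), 3))    # 0.649, as in the text
print(decay_steps(x, 7, 0.5, -2))  # 489
```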
During a model time step $t$, the following actions can occur with respect to memory $M$:

S1. A new element is added to $M$.
S2. An existing element is removed from $M$.
S3. An existing element is activated $y$ times.

If S3 occurs with respect to element $m_i$, a new pair $(t, y)$ is added to $m_i$. To maintain a bounded history size, if $|m_i| > c$, the pair with the smallest $a$ (i.e. the oldest) is removed from $m_i$.
Thus, given a memory $M$, we define the forgetting problem, at each time step $t$, as identifying the subset of elements, $D \subseteq M$, that have decayed since the last time step.
4.2. Efficient approach
Given this problem definition, a naïve approach is to determine the decay status of each element at every time step. This test requires computation $O(|M|)$, scaling linearly with average memory size. The computation expended upon each element, $m_i$, will be linear in the number of time steps where $m_i \in M$, estimated as $O(L)$ for a static element.
Our approach draws inspiration from the work of Nuxoll, Laird, and James (2004): rather than checking memory elements for decay status, predict the future time step when the element will decay. First, at each time step, examine elements that either (S1) were not previously in the memory or (S3) were activated. The number of items requiring inspection is bounded by the total number of elements ($|M|$), but is likely to be a small subset, assuming few memory elements are created or tested by the model at each time step. For each of these elements, predict the time of future decay (discussed shortly) and add the element to a map, where the map key is the predicted time step and the value is the set of elements predicted to decay at that time. If the element was already within the map (S3), remove it from its old location before adding it to its new location. All insertions/removals require time at most logarithmic in the number of distinct decay time steps, which is bounded by the total number of elements ($|M|$). At any time step, the set $D$ is those elements in the set indexed by the current time step that are decayed.
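As an illustration of this bookkeeping, here is a minimal Python sketch; all names are ours, and a hash map stands in for the ordered map with logarithmic insertion described above. The prediction function is taken as a parameter:

```python
from collections import defaultdict

class ForgettingIndex:
    """Elements are bucketed by their predicted decay step, so each step
    inspects one bucket instead of all of M."""

    def __init__(self, predict, c=10):
        self.predict = predict           # callable: (history, t) -> steps until decay
        self.c = c                       # bound on activation-history size
        self.buckets = defaultdict(set)  # predicted decay step -> element ids
        self.scheduled = {}              # element id -> its current bucket key

    def on_activation(self, elem_id, history, t, y=1):
        """Handle S1/S3: record y activations at time t and (re)schedule."""
        history.append((t, y))
        if len(history) > self.c:
            history.pop(0)               # drop the oldest pair (smallest a)
        old = self.scheduled.get(elem_id)
        if old is not None:
            self.buckets[old].discard(elem_id)
        step = t + self.predict(history, t)
        self.buckets[step].add(elem_id)
        self.scheduled[elem_id] = step

    def due(self, t):
        """Elements predicted to decay at step t; each must still be checked
        exactly, since the first-phase prediction may underestimate."""
        return self.buckets.pop(t, set())
```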
To predict element decay, we present a novel, two-phase process. After a new activation (S3), first employ an approximation that is guaranteed to underestimate the true value of $L$. If, at a future time step, an element is in $D$ and it has not decayed, then compute the exact prediction using a binary parameter search.

We approximate $L$ for an element $m$ as the sum of $L$ for each independent pair $(a, k) \in m$. Here we derive the closed-form calculation: given a single element pair at time $t$, we solve for $t_p$, the future time of element decay:

$$\ln\left(k \, [t_p + (t - a)]^{-d}\right) = \Theta$$
$$\ln(k) - d \, \ln(t_p + (t - a)) = \Theta$$
$$t_p = e^{\frac{\ln(k) - \Theta}{d}} - (t - a)$$

Since $k$ refers to a single time point, $a$, we rewrite the summed terms as a product. Furthermore, we time shift the decay term by the difference between the current time step, $t$, and that of the element pair, $a$, thereby predicting $L$.

Computing this approximation for a single pair takes constant time (and common values can be cached). The overall approximation computation is linear in the number of pairs, which is bounded by $c$, and is therefore $O(1)$. The computation required for a binary parameter search of an element is $O(\log_2 L)$. However, this computation is only necessary if the element has neither decayed nor been removed from $M$.
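A sketch of both prediction phases, reusing bla() from the earlier snippet (signatures and names are again our own):

```python
import math

def approx_decay_steps(history, t, d, theta):
    """Phase one: underestimate L(m) as the sum over pairs (a, k) of the
    closed form t_p = exp((ln(k) - theta) / d) - (t - a)."""
    total = sum(math.exp((math.log(k) - theta) / d) - (t - a)
                for (a, k) in history)
    return max(1, int(total))  # truncation only deepens the underestimate

def exact_decay_steps(history, t, d, theta, lo, hi):
    """Phase two: binary parameter search for the true L in [lo, hi].
    The caller must supply hi large enough that decay has occurred by
    t + hi; this runs only when the phase-one prediction proved early."""
    while lo < hi:
        mid = (lo + hi) // 2
        if bla(history, t + mid, d) < theta:  # decayed by t + mid
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For the worked example above, the first phase yields 266 steps, an underestimate of the true value of 489, which the second phase recovers exactly.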
4.3. Synthetic evaluation
In later sections, we empirically evaluate this approach
with it embedded within the working and procedural mem-
ories of Soar; here we focus on the quality and efficiency of
our prediction approach and utilize synthetic data. This
synthetic data set comprises 50,000 memory elements, each
with a randomly generated pair set. The size of each ele-
ment was randomly selected from between 1 and 10, the
number of activations per pair (k) was randomly selected
between 1 and 10, and the time of each pair (a) was ran-
domly selected between 1 and 999. We verified that each ele-
ment had a valid history with respect to time step 1000,
meaning that each element would not have decayed before
t= 1000. In addition, each element contained a pair with at
least one access at time point 999, which simulated a fresh
activation (S3). For this evaluation, we used decay rate $d = 0.8$ and threshold $\Theta = -1.6$. Given these constraints, the largest possible value of $L$ for an element was 3332.
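The data-set construction can be sketched as follows; this generator is our reconstruction from the stated constraints, as the original is not given:

```python
import random

def make_element(now=999):
    """One synthetic element: 1-10 pairs, k in [1, 10], a in [1, 999],
    plus a fresh activation at step 999 (S3)."""
    history = [(random.randint(1, 999), random.randint(1, 10))
               for _ in range(random.randint(1, 10) - 1)]
    history.append((now, random.randint(1, 10)))
    return sorted(history)

random.seed(0)
elements = [make_element() for _ in range(50_000)]
# The paper also verifies each element has a valid history (not decayed
# before t = 1000 with d = 0.8, theta = -1.6); that check is omitted here.
```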
We first evaluated the quality of the decay approxima-
tion. In Fig. 2, the y-axis is the cumulative proportion of
the elements and the x-axis plots absolute temporal error of
the approximation, where a value of 0 indicates that the
approximation was correct, and non-zero indicates how
many time steps the approximation under-predicted. We
see that the approximation was correct for over 60% of
the elements, but did underestimate over 500 time steps
for 20% of the elements and over 1000 time steps for 1%
of the elements. Under the constraints of this data set, it
was possible for this approximation to underestimate up
to 2084 time steps.
Fig. 2. Evaluation of decay-approximation quality.
We also compared the prediction time, in microseconds (μs), of the approximation to an exact calculation using binary parameter search. The maximum computation time across the data set was more than 19× faster for the approximation (1.37 vs. 26.28 μs/element) and the average time was more than 15× faster (0.31 vs. 4.73 μs/element). We did not compare these results with a naïve algorithm, wherein computation time at each time step depends upon the number of memory elements ($|M|$). This comparison would have required a model
ments (jMj). This comparison would have required a model
of memory size across a variety of tasks and such a model
would have been difficult to develop, as prior work (Doo-
renbos, 1995; Derbinsky et al., 2012) has shown that while
the number of memory changes tends to be small across a
variety of problem domains, absolute size can vary drasti-
cally between tasks.
In summary, this two-phase forgetting approach main-
tains fidelity to the BLA model (due to the second phase
of prediction) and scales to large memories. Results from
synthetic data show that the first phase of our approach
is a high-quality approximation and is an order of magni-
tude less costly than the exact calculation in the second
phase.
5. Forgetting in Soar’s working memory
The core intuition of our working-memory forgetting
policy is to remove the augmentations of objects that are
not actively in use and that the model can later reconstruct
from long-term semantic memory, if they become relevant.
As defined earlier, we characterize WME usage via the
base-level activation model (BLA; Anderson et al., 2004),
which estimates future usefulness of memory based upon
prior usage. The primary activation event for a working-
memory element is the firing of a rule that tests or creates
that WME. In addition, when a rule first adds an element
to working memory, the activation of the new WME is ini-
tialized to reflect the aggregate activation of the set of
WMEs responsible for its creation. This model of activa-
tion sources, events, and decay is task independent.
At the end of each decision cycle, Soar removes from working memory each element that satisfies all of the following requirements, with respect to τ, a static architectural threshold parameter:

R1. The WME was not encoded directly from perception.
R2. The WME is operator-supported.
R3. The activation level of the WME is less than τ.
R4. The WME augments an object, o, in semantic memory.
R5. The activation levels of all augmentations of o are less than τ.
We adopted requirements R1–R3 from Nuxoll et al. (2004), whereas R4 and R5 are novel. Requirement R1 distinguishes between the decay of representations of perception and any dynamics that may occur with actual sensors, such as refresh rate, fatigue, noise, or damage. Requirement R2 is a conceptual optimization: operator-supported WMEs are persistent, whereas instantiation-supported structures are direct entailments of them. Thus, if we properly remove operator-supported WMEs, any instantiation-supported structures that depend on them will also be removed, so our mechanism need only manage operator-supported structures. The concept of a fixed lower
bound on activation, as defined by R3, was adopted from
activation limits in ACT-R (Anderson et al., 1996), and
dictates that working-memory elements will decay in a
task-independent fashion as their use for reasoning becomes
less recent/frequent.
Requirement R4 dictates that our mechanism only
removes elements from working memory that augment
objects in semantic memory. This requirement serves to
balance the degree of working-memory decay with support
for sound reasoning. Knowledge in Soar’s semantic mem-
ory is persistent, though it may change over time. Depend-
ing on the task and the model’s knowledge-management
strategies, it is possible that forgotten working-memory
knowledge may be recovered via deliberate reconstruction
from semantic memory. Additionally, augmentations of
objects that are not in semantic memory can persist indef-
initely to support model reasoning.
Requirement R5 supplements R4 by providing partial
support for the closed-world assumption. R5 dictates that
either all object augmentations are removed, or none. This
policy leads to an object-oriented representation whereby
procedural knowledge can distinguish between objects that
have been completely cleared of substructure, and those
that simply are not augmented with a particular feature
or relation. R5 makes an explicit tradeoff, weighting model competence more heavily at the expense of the speed of working-memory decay. This requirement resembles the
declarative module of ACT-R, where activation is associ-
ated with each chunk and not individual slot values.
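To make the conjunction of requirements concrete, a minimal sketch follows; the field names and object structure are hypothetical stand-ins for Soar internals, and bla() is the function from the Section 4 sketch:

```python
def should_forget_wme(wme, tau, now, d):
    """Test requirements R1-R5 for a single working-memory element."""
    if wme.from_perception:                    # R1: leave percepts alone
        return False
    if not wme.operator_supported:             # R2: i-support follows by entailment
        return False
    if bla(wme.history, now, d) >= tau:        # R3: still active enough
        return False
    obj = wme.parent_object
    if not obj.in_semantic_memory:             # R4: must be reconstructable
        return False
    return all(bla(w.history, now, d) < tau    # R5: all-or-none per object
               for w in obj.augmentations)
```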
5.1. Empirical evaluation
We extended an existing system where Soar controls a
simulated mobile robot (Laird, Derbinsky, & Voigt,
2011). Our evaluation uses a simulation instead of a real
robot because of the practical difficulties in running numer-
ous, long experiments in large physical spaces. However,
the simulation is quite accurate and the Soar rules (and
architecture) used in the simulation are exactly the same
as the rules used to control the real robot.
The robot’s task is to visit every room on the third floor
of the Bob & Betty Beyster building at the University of
Michigan. For this task, the robot visits over 100 rooms
and takes about 1 h of real time. During exploration, it
incrementally builds an internal topological map, which,
when completed, requires over 10,000 WMEs to represent
and store. In addition to storing information, the model
reasons about and plans using the map in order to find effi-
cient paths for moving to distant rooms it has sensed but
not visited. The model uses episodic memory to recall
objects and other task-relevant features during exploration.
In our experiments, we aggregate working-memory size
and maximum decision time for each 10 s of elapsed time,
all of which is performed on an Intel i7 2.8 GHz CPU, run-
ning Soar v9.3.1. Because each experimental run takes 1 h,
we did not duplicate our experiments sufficiently to estab-
lish statistical significance and the results we present are
from individual experimental runs. However, we found
qualitative consistency across our runs, such that the vari-
ance between runs is small as compared to the trends we
focus on below.
We make use of the same model for all experiments, but
modify small amounts of procedural knowledge and
change architectural parameters, as described here. The
baseline model (A0) maintains all declarative map informa-
tion in Soar’s working memory. A modification to this
baseline (A1) maintains the declarative map in both work-
ing and semantic memories, and additionally includes
hand-coded rules to prune away rooms in working memory
that are not required for immediate reasoning or planning,
as well as to reconstruct these structures from semantic
memory when they are needed. The experimental model
(A2) also maintains the declarative map in both working
and semantic memories, but rather than task-specific rules,
it makes use of our task-independent working-memory for-
getting policy to prune working-memory structures and
task-independent rules to reconstruct knowledge, as
needed, from semantic memory. For this experimental con-
dition, we held constant the activation-history size (c = 10) and the base-level threshold (τ = −2), but explored a set of decay-rate values (d ∈ {0.3, 0.4, 0.5}). For more aggressive decay rates (d ≥ 0.6), the model was unable to maintain sufficient declarative-map data in working memory to complete planning in this task.
Fig. 3 compares working-memory size between condi-
tions A0, A1, and A2 over the duration of the experiment.
We note first the major difference in working-memory size
between A0 and A1 after one hour, when the working memory of A1 contains more than 11,000 fewer elements than that of A0, a reduction of more than 90%. We also find that the greater
the decay-rate parameter for A2, the smaller the working-
memory size, where a value of 0.5 qualitatively tracks
A1. This finding suggests that our policy, with an appropri-
ate decay, keeps working-memory size comparable to that
maintained by hand-coded rules.
Fig. 3. Model working-memory size comparison.
Fig. 4 compares maximum decision-cycle time in ms,
between conditions A0, A1, and A2 as the simulation pro-
gresses. The dominant cost reflected by this data is time to
reconstruct prior episodes that are retrieved from episodic
memory. We see a growing difference in time between A0
and A2 as working memory is more aggressively managed
(i.e. greater decay rate), demonstrating that episodic reconstruction, which scales with the size of working memory at
the time of episodic encoding, benefits from forgetting. We
also find that with a decay rate of 0.5, our mechanism per-
forms comparably to A1. We note that without sufficient
working-memory management (A0; A2 with decay rate
0.3), episodic-memory retrievals are not tenable for a
model that must reason with this amount of acquired infor-
mation, as the maximum required processing time exceeds
the reactivity threshold of 50 ms.
Fig. 4. Model maximum decision time comparison.
5.2. Discussion
It is possible to write rules that prune Soar’s working
memory; however, this task-specific knowledge is difficult
to encode and learn.
In this work, we presented and evaluated a novel, task-
independent approach that utilizes a memory hierarchy to
bound working-memory size while maintaining sound rea-
soning. This approach assumes that the amount of knowl-
edge required for immediate reasoning is small relative to
the overall amount of knowledge accumulated by the
model. Under this assumption, as demonstrated in the
robotic evaluation task, our policy scales even as learned
knowledge grows large over long trials.
We note that since Soar’s semantic memory can change
over time and is independent of working memory, our for-
getting policy does admit a class of reasoning errors
wherein the contents of semantic memory are changed so
as to be inconsistent with decayed WMEs. However, this
corruption requires deliberate reasoning in a relatively
small time window and has not arisen in our models. While
the model completed this task for all conditions reported
here, at larger decay rates (d ≥ 0.6) the model thrashed
because map information was not held in working memory
long enough to complete deep look-ahead planning. Based
upon this finding, we expect that if the agent had to per-
form deeper searches within this task, then the model
would thrash with even less aggressive decay rates (e.g.
d= 0.5), but we do not have data for such circumstances.
This line of reasoning suggests that additional research is
needed on either adaptive decay-rate settings or
approaches to planning, and other forms of temporally
extended reasoning, that are robust in the face of memory
decay.
6. Forgetting in Soar’s procedural memory
The core intuition of our procedural-memory forgetting
policy is to remove rules that are not actively used and that
the model can later reconstruct via deliberate subgoal rea-
soning, if the knowledge embedded in them is relevant to a
given situation. As with working-memory forgetting, we
characterize rule usage via the base-level activation model,
where the activation event is the firing of an instantiation
of a rule. As with our working-memory forgetting policy,
the activation sources, events, and decay are task independent:
we utilize the base-level activation model to summarize the
history of rule firing.
At the end of each decision cycle, Soar removes from procedural memory each rule that satisfies all of the following requirements, with respect to parameter τ:

R1. The rule was learned via chunking.
R2. The rule is not actively firing.
R3. The activation level of the rule is less than τ.
R4. The rule has not been updated by RL.
We adopted R1–R3 from Chong (2004), whereas R4 is
novel. Chong was modeling human skill decay, and did
not delete rules, so as to not lose each rule’s activation his-
tory. Instead, decayed rules were prevented from firing,
similar to below-utility-threshold rules in ACT-R. R1 is a
practical consideration to distinguish learned knowledge from innate rules developed by the modeler, which, if modified, would likely break the model. R2 recognizes that matched rules are in active use and thus should not be forgotten. R3 dictates that rules will decay in a task-independent fashion as their use for reasoning becomes less recent/frequent. We note that for fixed parameters (d and τ) and a
single activation, the BLA model is equivalent to the use-
gap heuristic of Kennedy and Trafton (2007). However,
the time between sequential rule firings ignores firing fre-
quency, which the BLA model incorporates.
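To make this comparison concrete, the reduction (our derivation, not from the original text) is one line: with a single recorded pair $(a, k)$,

$$B(m, t, d) = \ln\left(k \, [t - a]^{-d}\right) = \ln(k) - d\,\ln(t - a),$$

so for fixed $k$, $d$, and threshold $\tau$, $B$ falls below $\tau$ exactly when the gap $t - a$ exceeds the constant $e^{(\ln(k) - \tau)/d}$; thresholding base-level activation then coincides with thresholding the time since last use.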
Requirement R4 attempts to retain rules that include
information that cannot be regenerated in the future. Rules
learned by chunking can be regenerated if they have not
been updated by RL; however, once they have been
updated, they encode expected-utility information, which
is not recorded by any other learning mechanism and can-
not be regenerated if the rule is removed.
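The rule-level test parallels the WME test above; a minimal sketch, where the metadata fields are hypothetical stand-ins for Soar's procedural-memory bookkeeping and bla() is the Section 4 function:

```python
def should_forget_rule(rule, tau, now, d):
    """Test requirements R1-R4 for a single learned rule."""
    return (rule.learned_by_chunking              # R1: never touch innate rules
            and not rule.actively_firing          # R2: matched rules are in use
            and bla(rule.history, now, d) < tau   # R3: decayed below threshold
            and not rule.updated_by_rl)           # R4: RL updates cannot be rebuilt
```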
6.1. Empirical evaluation
We extended an existing system (Laird, Derbinsky, &
Tinkerhess, 2011) where Soar plays Liar’s Dice, a multi-
player game of chance. The rules of the game are numerous
and complex, yielding a task that has rampant uncertainty
and a large state space (millions-to-billions of relevant
states for games of 2–4 players). Prior work has shown that
RL allows Soar models to significantly improve perfor-
mance after playing a few thousand games. However, this
involves learning large numbers of RL rules to represent
the value function spanning this state space.
The model we use for all experiments learns two classes
of rules: RL rules that capture expected action utility; and
non-RL rules that capture symbolic game heuristics. Our
experimental baseline (B0) does not forget knowledge.
The first experimental modification (B1) implements our
forgetting policy, but does not enforce requirement R4
and is thereby comparable to prior work (Kennedy & Traf-
ton, 2007; Chong, 2004). The second modification (B2)
fully implements our policy. We experiment with a range
of representative decay rates (d), including 0.999, where
rules not immediately updated by RL are deleted (c = 10, τ = −2 for all).
We alternated 1000 2-player games of training then test-
ing, each against a non-learning version of the model. After
each testing session, we recorded maximum memory usage
(Mac OS v10.7.3; dominated, in this task, by the rules in
procedural memory), task performance (% games won),
and average decisions/task action. We do not report max-
imum decision time, as this was below 6 ms for all condi-
tions (Intel i7 2.8 GHz CPU, Soar v9.3.1). We collected
data for all conditions in at least three independent trials
of 40,000 games. For conditions that forget knowledge,
we were able to gather more data in parallel, due to
reduced memory consumption (six trials for d= 0.35, seven
for remaining).
Fig. 5 presents average memory growth, in megabytes,
as the model trains (within each experimental condition,
error bars of 1 standard deviation are too small to be con-
sistently visible on this plot and thus variance data is not
included in Fig. 5). For all models, the memory growth
of games 1–10 K follows a power law (R² ≥ 0.96), whereas for 11–40 K, growth is linear (R² ≥ 0.99). These plots indi-
cate that memory usage for the baseline (B0) and the slowly
decaying model (B2, d= 0.3) is much greater, and faster
growing, than models that more aggressively decay. It also
shows that there is a diminishing benefit from faster decay
(e.g. d= 0.5 and d= 0.999 for B2 are indistinguishable).
Fig. 5. Avg. memory usage vs. games played.
Fig. 6 presents average task performance after 1000
games of training, where the error bars represent ±1 stan-
dard deviation. This data shows that given the inherent sto-
chasticity of the task, there is little, if any, difference
between the performance of the baseline (B0) and decay
levels of B2. However, by comparing B0 and B2 to B1, it
is clear that without R4, the model suffers a dramatic loss
of task competence. For clarity, the model begins by play-
ing a non-learning copy of itself and learns from experience
with each training session. While the B0 and B2 models
improve from winning 50% of games to 75–80%, the B1
model improves to below 55%. We conclude that a forget-
ting policy that only incorporates rule-firing history (e.g.
Chong, 2004; Kennedy & Trafton, 2007) will negatively
impact performance in tasks that involve informative inter-
action with an external environment. Our policy incorpo-
rates both rule-firing history and rule reconstruction, and
thus retains this source of feedback.
Fig. 6. Avg. task performance ±1 std. dev.
Finally, Fig. 7 presents average number of decisions for
the model to take an action in the game after training for
10,000 games. In prior work (e.g. Kennedy & Trafton,
2007), this value was a major performance metric, as it
reflected the primary reason for learning new rules. In this
work, each decision takes very little time, and so the num-
ber of decisions to choose an action is not as crucial to task
performance as the correctness of the selected action. How-
ever, these data show that there exists a space of decay val-
ues (e.g. d= 0.35) in which memory usage is relatively low
and grows slowly (Fig. 5), task performance is relatively
high (Fig. 6), and the model makes decisions relatively
quickly (Fig. 7).
Fig. 7. Avg. decisions/task action ±1 std. dev.
6.2. Discussion
This work contributes evidence that we can develop
models that improve using RL in tasks with large state
spaces. Currently, it is typical to explicitly represent the
entire state space, which is not feasible in complex prob-
lems. Instead, Soar learns rules to represent only those por-
tions of the space it experiences, and our policy retains only
those rules that include feedback from environmental
reward. Future work needs to validate this approach in
other domains.
7. Concluding remarks
This paper presents and evaluates policies and algo-
rithms for effective and efficient forgetting of learned
knowledge in complex environments. While forgetting
mechanisms are common in cognitive modeling, this work
pursues this line of research for functional reasons: improv-
ing computational resource usage while maintaining rea-
soning competence. We have presented compelling results
from applying these policies in two complex, temporally
extended tasks, but there is additional work to evaluate
these policies, and their parameters, across a wider variety
of problem domains.
Acknowledgment
We acknowledge the funding support of the Air Force
Office of Scientific Research, contract FA2386-10-1-4127.
References
Altmann, E. M., & Gray, W. D. (2002). Forgetting to remember: The
functional relationship of decay and interference. Psychological
Science, 13, 27–33.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., &
Qin, Y. (2004). An integrated theory of the mind. Psychological
Review, 111, 1036–1060.
Anderson, J. R., Reder, L. M., & Lebiere, C. (1996). Working memory:
Activation limits on retrieval. Cognitive Psychology, 30, 221–256.
Chong, R. (2003). The addition of an activation and decay mechanism to
the Soar architecture. In Proceedings of the fifth international confer-
ence on cognitive modeling (pp. 45–50). Bamberg, Germany.
Chong, R. (2004). Architectural explorations for modeling procedural skill
decay. In Proceedings of the sixth international conference on cognitive
modeling. Pittsburgh, PA, USA.
Daelemans, W., van den Bosch, A., & Zavrel, J. (1999). Forgetting
exceptions is harmful in language learning. Machine Learning, 34,
11–41.
Derbinsky, N., & Laird, J. E. (2009). Efficiently implementing episodic
memory. In Proceedings of the 8th international conference on case-
based reasoning (pp. 403–417). Seattle, WA, USA.
Derbinsky, N., & Laird, J. E. (2010). Extending soar with dissociated
symbolic memories. In Proceedings of the 1st symposium on human
memory for artificial agents (pp. 31–37). Leicester, UK.
Derbinsky, N., & Laird, J. E. (2011). A functional analysis of historical
memory retrieval bias in the word sense disambiguation task. In
Proceedings of the 25th AAAI conference on artificial intelligence (pp.
663–668). San Francisco, CA, USA.
Derbinsky, N., Laird, J. E., & Smith, B. (2010). Towards efficiently
supporting large symbolic declarative memories. In Proceedings of the
10th international conference on cognitive modeling (pp. 49–54).
Philadelphia, PA, USA.
Derbinsky, N., Li, J., & Laird, J. E. (2012). A multi-domain evaluation of
scaling in a general episodic memory. In Proceedings of the 26th AAAI
conference on artificial intelligence (pp. 193–199). Toronto, Canada.
Doorenbos, R. B. (1995). Production matching for large learning systems.
Ph.D. Thesis. Carnegie Mellon University.
Forgy, C. L. (1982). Rete: A fast algorithm for the many pattern/many
object pattern match problem. Artificial Intelligence, 19, 17–37.
Kennedy, W. G., & Trafton, J. G. (2007). Long-term symbolic learning.
Cognitive Systems Research, 8, 237–247.
Laird, J. E., Derbinsky, N., & Tinkerhess, M. (2011). A case study in
integrating probabilistic decision making and learning in a symbolic
cognitive architecture: Soar plays dice. In Papers from the 2011 AAAI
fall symposium series: advances in cognitive systems (pp. 162–169).
Arlington, VA, USA.
Laird, J. E., Derbinsky, N., & Voigt, J. (2011). Performance evaluation of
declarative memory systems in Soar. In Proceedings of the 20th
behavior representation in modeling and simulation conference (pp. 33–
40). Sundance, UT, USA.
Laird, J. E. (2012). The Soar Cognitive Architecture. Cambridge: MIT
Press.
Laird, J. E., Rosenbloom, P. S., & Newell, A. (1986). Chunking in Soar:
The anatomy of a general learning mechanism. Machine Learning, 1,
11–46.
Markovitch, S., & Scott, P. D. (1988). The role of forgetting in learning. In
Proceedings of the fifth international conference on machine learning
(pp. 459–465). Ann Arbor, MI, USA.
Minton, S. (1990). Qualitative results concerning the utility of explana-
tion-based learning. Artificial Intelligence, 42, 363–391.
Nason, S., & Laird, J. E. (2005). Soar-RL: Integrating reinforcement
learning with Soar. Cognitive Systems Research, 6, 51–59.
Nuxoll, A. M., Laird, J. E., & James, M. (2004). Comprehensive working
memory activation in Soar. In Proceedings of the sixth international
conference on cognitive modeling (pp. 226–230). Pittsburgh, PA, USA.
Nuxoll, A. M., & Laird, J. E. (2012). Enhancing intelligent agents with
episodic memory. Cognitive Systems Research, 17–18, 34–48.
Petrov, A. A. (2006). Computationally efficient approximation of the base-
level learning equation in ACT-R. In Proceedings of the seventh
international conference on cognitive modeling (pp. 391–392). Trieste,
Italy.
Schooler, L. J., & Hertwig, R. (2005). How forgetting aids heuristic
inference. Psychological Review, 112, 610–628.
Smyth, B., & Cunningham, P. (1996). The utility problem analysed – A
case-based reasoning perspective. In Proceedings of the third European
workshop on case-based reasoning (pp. 392–399). Lausanne,
Switzerland.
Smyth, B., & Keane, M. T. (1995). Remembering to forget: A competence-
preserving case deletion policy for case-based reasoning systems. In
Proceedings of the fourteenth international joint conference on artificial
intelligence (pp. 377–383). Montreal, Quebec, Canada.
Tambe, M., Newell, A., & Rosenbloom, P. S. (1990). The problem of
expensive chunks and its solution by restricting expressiveness.
Machine Learning, 5, 299–349.