ORIGINAL PAPER
Maximizing the Expected Information Gain of Cognitive Modeling via
Design Optimization
Daniel W. Heck¹ & Edgar Erdfelder¹
© Society for Mathematical Psychology 2019
Abstract
To ensure robust scientific conclusions, cognitive modelers should optimize planned experimental designs a priori in order to
maximize the expected information gain for answering the substantive question of interest. Both from the perspective of
philosophy of science and within classical and Bayesian statistics, it is crucial to tailor empirical studies to the specific
cognitive models under investigation before collecting any new data. In practice, methods such as design optimization, classical
power analysis, and Bayesian design analysis provide indispensable tools for planning and designing informative experiments.
Given that cognitive models provide precise predictions for future observations, we especially highlight the benefits of model-
based Monte Carlo simulations to judge the expected information gain provided by different possible designs for cognitive
modeling.
Keywords: Optimal design · Power analysis · Recovery simulation · Experimental design · Bayes factor design analysis
Recently, Lee et al. (in press) provided recommendations on how
to increase the robustness of cognitive modeling. We especially
appreciated Lee et al.'s recommendation to rely on preregistra-
tion and Registered Modeling Reports in cognitive modeling
projects. Such tools are especially valuable for testing confirma-
tory hypotheses that are ubiquitous in cognitive modeling, for
instance, when testing the goodness of fit of a model, when
validating model parameters by selective influence, when
selecting between competing models, or when testing which
parameters are affected by specific experimental manipulations.
However, before collecting any new data in a confirmatory
(possibly preregistered) study, cognitive modelers should reg-
ularly consider the question of how to design the study opti-
mally such that it maximizes the expected information gain for
answering the substantive question of interest given a model-
based analysis of the data (Myung and Pitt 2009). By doing
so, researchers can exploit limited resources such as money,
number of participants, or study duration most efficiently.
Even though optimizing the experimental design a priori is
crucial for ensuring informative and robust scientific
conclusions through cognitive modeling, Lee et al. (in press)
mentioned this aspect only very briefly, merely stating that
"model recovery studies can help diagnose issues like
(weak) identifiability with respect to the type and amount of
information likely to be available" (p. 9). In a similar vein, Lee
et al. (in press) argued against the usefulness of power analysis
in general (p. 5). However, when viewed more broadly, con-
siderations of the expected information gain of a planned
study (e.g., in terms of statistical power) are important. In
the present commentary, we highlight the importance of opti-
mizing experimental designs before data collection and elab-
orate on the practical benefits of Monte Carlo simulations for
improving the informativeness of cognitive-modeling studies
a priori.
Informative Experimental Designs
Irrespective of whether a cognitive modeling project is
confirmatory or exploratory, researchers should always
consider whether a planned study is informative for an-
swering a scientific question of interest. Here, we use the
broad term of an "informative design" since it encompasses
several more specific concepts both in theories of
philosophy of science and in various statistical frameworks
(e.g., classical or Bayesian statistics).
* Daniel W. Heck
heck@uni-mannheim.de
¹ University of Mannheim, Schloss (Room EO 255), 68131 Mannheim, Germany
According to Popper's (2005) logic of scientific discovery,
the principle of falsificationism states that theories or models
should make nontrivial testable predictions that, when tested
rigorously, nevertheless hold empirically. In turn, to test a
cognitive model rigorously, it is necessary to optimize the
experimental design in a way that increases the chances of
falsification in case the model is actually false. In a cognitive modeling
project, this means that the researcher should select a para-
digm, experimental conditions, stimuli, and number of partic-
ipants that generate data structures for which the cognitive
model makes precise, in principle falsifiable predictions.
Only then does a statistical test of the model's fit constitute an
informative test of the theory. Within psychology, this impor-
tant aspect of scientific inference and statistical modeling has
been emphasized in a now classical paper by Roberts and
Pashler (2000): "only when both theory and data provide sub-
stantial constraints does this [good model fit] provide signifi-
cant evidence for the theory" (p. 359). Importantly, the degree
to which theory and data provide constraints completely de-
pends on the specific experimental design. As an extreme
example, consider the case of testing a theory of memory
decay without actually manipulating the retention interval.
Obviously, such an experimental design is not suited to test
the theory at all. However, the general principle underlying
this (admittedly rather absurd) example directly generalizes to
realistic scenarios, since even basic choices of the experimen-
tal design (e.g., the type and the number of stimuli or condi-
tions) determine how strongly the range of possible outcomes
is restricted by a specific theory.
The importance of informative experimental designs be-
comes even more obvious when considering Platt's (1964)
principle of strong inference. According to Platt, re-
searchers should aim to design an experimentum crucis
for which two competing theories make exactly opposite
predictions. Thereby, the researcher maximizes the expect-
ed information gain of the planned experimental design
before collecting any data. Within cognitive modeling, this
goal can be achieved by deriving critical tests based only on
core assumptions of a set of competing cognitive models
(Kellen and Klauer 2015). By testing informative data pat-
terns that discriminate between the competitors, such criti-
cal tests circumvent the reliance on auxiliary (parametric)
assumptions that are usually required by goodness-of-fit
tests or model-selection methods. For instance, Kellen and
Klauer (2015) derived distinct predictions for confidence-
rating receiver operating characteristic (ROC) curves in
recognition memory for two general classes of models
(i.e., signal detection theory and high-threshold models).
Thereby, they obtained a critical test that was tailored to
the research question at hand (i.e., discriminating between
the two model classes) and thus was more informative than
the more common approach of comparing how well specif-
ic parametric versions of the models fit to ROC curves.
Statistical Criteria and Approaches
for Assessing Information Gain
In practice, the question arises of how to assess the expected
information gain of an experimental design before any data
are collected. The answer to this question depends on the
choice of the statistical framework (e.g., classical or
Bayesian statistics) and the statistical criterion of interest
(e.g., hypothesis testing, model selection, or precision of
estimation).
Power Analysis and Optimal Design
Within classical statistics, the information provided by exper-
imental designs is intimately linked to statistical power anal-
ysis within the Neyman-Pearson framework (Neyman and
Pearson 1933; for applications to standard tests, see Faul
et al. 2007) and to accuracy in parameter estimation (i.e.,
controlling the width of the confidence interval, Maxwell
et al. 2008; or other aspects of optimal experimental design,
Berger and Wong 2009). In a typical a priori power analysis,
the required overall sample size is determined by computing
the minimum sample size for which the Type I and Type II
error probabilities of statistical decisions are below the pre-
specified thresholds α and β, respectively, given a minimal
to-be-detected "effect size" (i.e., deviation from the null hypothesis
model) in a specific design. However, in addition to such calcu-
lations of the required overall sample size, power analysis also
provides an appropriate framework for planning various addi-
tional aspects of an informative experimental design to test a
cognitive model: the proportions of the total sample that
should be assigned to different experimental conditions, the
optimum number of items per participant in within-subject
designs, the gain in power by including covariates into a mod-
el, the selection of the most powerful statistical test, and the
calibration of context conditions that maximize the power of
the hypothesis test of main interest.
What is more, classical a priori power analysis (Cohen
1988) can be refined and improved in a cognitive-modeling
framework by formalizing to-be-detected effects in terms of
psychologically meaningful parameters of the relevant model.
In contrast, power analysis for standard hypothesis tests usu-
ally rests on conventions for standardized effect sizes that are
notoriously difficult to agree upon (and that actually have
different meanings in different designs). These difficulties
largely diminish if to-be-detected effects can be defined in
terms of the model's parameters, as shown in the "Example:
Optimal Design for the Pair-Clustering Model" section (see
also Erdfelder et al. 2005; Moshagen 2010). Given the many
benefits of power analysis for planning details of an experi-
mental design (and not just for deciding about the total sample
size), we disagree with Lee et al.'s (in press) statement that
"unless a modeling project depends critically on a null
hypothesis test and there is only one opportunity for data to be
collected, a-priori power analysis does not serve a necessary
role" (p. 5).
Model Selection and Model-Recovery Simulations
If more than two hypotheses or models are compared, statis-
tical inference often relies on model-selection methods such as
the Akaike information criterion (AIC), the Bayesian informa-
tion criterion (BIC), or the minimum description length prin-
ciple (Myung et al. 2000). In such a scenario, the information
provided by an experimental design may be assessed by clas-
sification accuracy, the probability of selecting the true, data-
generating model out of a set of competing models when
using a specific model-selection criterion. Focusing on classi-
fication accuracy as one of many possible criteria, Myung and
Pitt (2009) developed a formal framework to maximize the
utility of an experimental design for discriminating between
multiple cognitive models. Essentially, the proposed method
searches for the global optimum of all possible levels of a
design factor (e.g., the spacing of multiple retention intervals)
by maximizing a high-dimensional integral of the utility func-
tion over both the parameter space and the data space (weight-
ed by the prior distribution and the likelihood function, respec-
tively). However, even though the authors adapted a
simulation-based approach to address the issue of computa-
tional complexity (Müller et al. 2004), the implementation of
such a highly formalized framework may be intractable or too
time-intensive in practice for many researchers.
As a remedy, cognitive modelers can often rely on the less
sophisticated, but much simpler approach of Monte Carlo
simulations to estimate the expected information gain for a
small number of possible experimental designs. As an exam-
ple of a particularly simple method, Algorithm 1 shows
pseudo-code for a standard model-recovery simulation: A set
of cognitive models M1, …, MJ is used to generate and fit
simulated data, thus enabling the estimation of the classification
accuracy for a specific experimental design d when using
a specific model-selection criterion C (e.g., the AIC or
BIC). By performing multiple such simulations while varying
details of the design d, researchers can easily improve the
expected information gain of a planned study in terms of the
model-recovery rate (for similar frameworks, see Navarro
et al. 2004; Wagenmakers et al. 2004).
Even though heuristic simulations in no way guarantee
that the optimal design is found, they may serve as a simple
and effective means of improving the informativeness of a
planned study without much effort. For instance, Heck et al.
(2017) classified individuals as users of different decision-
making strategies using the Bayes factor and a specific version
of the minimum description length (i.e., the normalized max-
imum likelihood). By varying the number of trials in a model-
recovery simulation similar to Algorithm 1, the minimum
number of responses per participant was determined to ensure
high classification accuracies for both methods.
Going beyond the specific criterion of model recovery,
simulations are also useful to assess the expected information
gain of experimental designs with respect to any other quan-
tity of interest. For instance, Gelman and Carlin (2014) pro-
posed to regularly consider Type S errors (i.e., sign errors,
meaning that parameter estimates point in the wrong direction) and
Type M errors (i.e., magnitude errors, meaning that the size of an effect is overestimated).
The probabilities for both types of errors can easily be esti-
mated by means of simulations, in turn facilitating the search
for an experimental design that minimizes Type S and Type M
error rates. As a beneficial side effect of simulating possible
outcomes of a study before data collection, the results of a
simulation-based search for an informative design may be
reported as a justification for choosing a specific design and
sample size, which is especially useful for (but not limited
to) Registered Modeling Reports.
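As a brief illustration of this idea, the following R sketch estimates Type S and Type M error rates for a simple two-group comparison by simulation; the sample size, true effect size, and number of replications are illustrative assumptions, not recommendations from the original article:

# Sketch: estimating Type S and Type M error rates by simulation
# (illustrative two-group design with a small true effect)
set.seed(1)
n <- 30      # participants per group (assumed)
d <- 0.2     # true standardized effect (assumed)
R <- 5000    # number of simulated studies
sims <- replicate(R, {
  x <- rnorm(n, mean = 0, sd = 1)
  y <- rnorm(n, mean = d, sd = 1)
  c(diff = mean(y) - mean(x), p = t.test(y, x)$p.value)
})
sig <- sims["p", ] < .05
mean(sims["diff", sig] < 0)        # Type S rate: significant estimates with the wrong sign
mean(abs(sims["diff", sig])) / d   # Type M: average overestimation of the true effect

Analogous simulations can be run with the parameter estimates of a cognitive model in place of the simple mean difference.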
Algorithm 1. Pseudo-code for a standard model-recovery simulation.
1. for all replications r = 1, …, R and all models M1, …, MJ do
   1.1. Select a data-generating model Mj.
   1.2. Define plausible model parameters θr (e.g., fixed values or sampled from the prior distribution).
   1.3. Simulate data yr for a specific experimental design d conditional on θr.
   1.4. Fit all models M1, …, MJ under consideration.
   1.5. Compute the statistical criterion Cr for all models (e.g., AIC, BIC, Bayes factor).
2. Summarize the distribution of Cr (e.g., probability of recovering the data-generating model).
Note. By changing details of the simulated experimental design d across simulations (e.g., sample size, relative number of trials per condition, or data-generating parameters), one can estimate the expected information gain of different experimental designs.
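To make the pseudo-code more concrete, the following R sketch instantiates Algorithm 1 for the purely illustrative case of discriminating an exponential from a power model of forgetting via the AIC; the retention intervals, trial numbers, and parameter values are assumptions chosen for demonstration only:

# Minimal sketch of Algorithm 1: recovery of an exponential vs. a power
# model of forgetting based on the AIC (all design values are illustrative)
set.seed(1)
t_design <- c(1, 2, 4, 8, 16)   # retention intervals (a design factor d)
n_trials <- 50                  # trials per retention interval
R        <- 500                 # number of replications

p_exp <- function(t, par) exp(-par * t)    # exponential retention function
p_pow <- function(t, par) (1 + t)^(-par)   # power retention function

neg_loglik <- function(par, k, n, pfun, t) {
  p <- pmin(pmax(pfun(t, par), 1e-6), 1 - 1e-6)
  -sum(dbinom(k, n, p, log = TRUE))
}

recovered <- replicate(R, {
  # Steps 1.1-1.3: simulate data from the exponential model with a plausible parameter
  k <- rbinom(length(t_design), n_trials, p_exp(t_design, 0.2))
  # Steps 1.4-1.5: fit both one-parameter models and compute the AIC
  aic_exp <- 2 * optimize(neg_loglik, c(0.001, 5), k = k, n = n_trials,
                          pfun = p_exp, t = t_design)$objective + 2
  aic_pow <- 2 * optimize(neg_loglik, c(0.001, 5), k = k, n = n_trials,
                          pfun = p_pow, t = t_design)$objective + 2
  aic_exp < aic_pow
})
mean(recovered)   # Step 2: estimated probability of recovering the data-generating model

Rerunning such a script for different retention intervals or trial numbers directly yields the comparison of designs described in the note above.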
Expected Information Gain in the Bayesian
Framework
A priori design optimization does not become any less
important within the Bayesian framework. Essentially, the
planned experimental design should maximize the expected
information gain obtained by updating the prior to the poste-
rior distribution (for a formal definition based on the Shannon
entropy, see Lindley 1956). For example, in case of a
Bayesian t-test, researchers may assess the expected sample
size required for a high a priori probability of observing a
sufficiently large Bayes factor for or against the null hypoth-
esis before collecting any data (Schönbrodt and Wagenmakers
2018).
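As a sketch of such a Bayes factor design analysis for a two-sample t-test (assuming the BayesFactor R package; the sample size, effect size, and evidence thresholds below are illustrative assumptions), one may simulate the distribution of Bayes factors expected under a candidate design:

# Sketch of a simple Bayes factor design analysis for a two-sample t-test
# (assumes the BayesFactor package; all numerical values are illustrative)
library(BayesFactor)
set.seed(1)
simulate_bf <- function(n, d, R = 200) {
  replicate(R, {
    x <- rnorm(n, mean = 0, sd = 1)
    y <- rnorm(n, mean = d, sd = 1)
    extractBF(ttestBF(x = x, y = y))$bf   # default Bayes factor for H1 vs. H0
  })
}
bf <- simulate_bf(n = 50, d = 0.4)
mean(bf > 10)     # a priori probability of compelling evidence for H1
mean(bf < 1/10)   # a priori probability of (misleading) evidence for H0

Repeating the simulation for several candidate sample sizes shows how large n must be to obtain compelling evidence with high probability.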
The general idea of searching for an experimental design
that maximizes the expected evidence of a study (Lindley
1956) easily transfers to the application of Bayes factors in
cognitive modeling. By using the Bayes factor as the statistical
criterion C in the simulation sketched in Algorithm 1, re-
searchers can easily search for the minimum sample size or
number of trials that are necessary to ensure an expected value
of the Bayes factor that represents convincing evidence for or
against the substantive question of interest. Simulated distri-
butions of Bayes factors can even be useful if the data have
already been collected. For instance, in a large-scale reanaly-
sis, Heck et al. (2018b) obtained ambiguous evidence (i.e.,
Bayes factors close to one) when regressing dishonest behav-
ior on several basic personality traits (e.g., Conscientiousness,
Extraversion) even though the merged dataset included a large
number of participants (N = 5002). To judge the possible ben-
efits of future studies linking personality to dishonesty, the
authors plotted the distribution of simulated Bayes factors as
a function of sample size, thereby showing that unfeasibly
large sample sizes would be required to draw any firm
conclusions.¹
If the focus is on Bayesian parameter estimation instead of
hypothesis testing, simulation studies allow researchers to
judge the expected precision of the parameter estimates of a
cognitive model for a specific experimental design. As an
example, the approach by Arnold et al. (2019) provides a case
study of what Lee et al. referred to as a Registered Modeling
Report. Arnold et al. used a well-established multinomial
model of multidimensional source memory (Batchelder and
Riefer 1990;Meiser2014) to investigate whether the binding
of context features is a robust phenomenon (as opposed to an
aggregation artifact due to fitting traditional, complete-
pooling models) and whether different types of encoding dur-
ing the study phase affect the binding parameter. Before
collecting any data, the authors submitted a detailed preregis-
tration plan that specified the substantive hypotheses of inter-
est and details of the experimental design (e.g., sample size
and number of trials), but also specifics of the cognitive model
(i.e., a Bayesian hierarchical multinomial model with
predetermined parameter constraints and prior distributions).
By using a Monte Carlo simulation similar to that in
Algorithm 1, but focusing on parameter estimates instead of
model selection, the authors assessed the
expected precision for estimating the difference in the binding
parameters across the two between-subjects conditions (Heck
et al. 2018a). Thereby, the sample size and the number of trials
were chosen in a way to ensure sufficiently precise parameter
estimates in terms of the Bayesian credibility interval.
Design Choices in Cognitive Modeling
Irrespective of which specific statistical framework and criteria
are used, the information gain provided by an experimental de-
sign for cognitive modeling hinges on many design choices.
Most prominently, the total sample size is a critical variable in
cognitive modeling, for instance, when testing selective influence
of experimental manipulations on model parameters (e.g., by
showing that a standard memory-strength manipulation affects
only a targeted memory parameter but not any remaining
response-bias parameters). A priori considerations of the required
sample size are important even in cognitive-modeling projects
that rely on sequential tests and optional stopping to increase
efficiency of a statistical test compared to fixed-N designs (e.g.,
using the Bayes factor or the Wald test, cf. Schnuerch and
Erdfelder in press; Schönbrodt et al. 2017; Wald 1947). Given
that the resources for empirical studies are always limited, it is
crucial to check whether the planned experiment has a realistic
chance of answering the substantive question of interest. If the
expected information gain provided by the planned design for
cognitive modeling is not assessed a priori, a sequential sampling
plan might merely yield ambiguous conclusions even after
data from hundreds of participants have been collected.
Importantly, the amount of information provided by a study
depends not only on the total sample size but also on other
design factors such as the relative sample size per condition in
between-subject designs or the number of trials in within-
subject designs (for an example, see the "Example: Optimal
Design for the Pair-Clustering Model" section). Moreover, the
informativeness of cognitive modeling often depends on the
presentation format and specific features of the presented stim-
uli. As an example of modeling long-term memory, the infor-
mation gain in discriminating between power and exponential
models of forgetting curves depends crucially on the spacing of
the retention intervals (Myung and Pitt 2009; Navarro et al.
2004). As a second example in judgment and decision-making,
consider the classification of participants as users of different
decision-making strategies such as take-the-best or weighted-
additive (Bröder and Schiffer 2003). In typical choice tasks for
investigating strategy use across different environments, the
presence or absence of item features (e.g., which of two choice
options is recommended by different experts) must be carefully
chosen to ensure that each strategy predicts a unique response
pattern across items. Next, the decision strategies under consid-
eration are usually formalized as statistical models to classify
participants as users of different strategies. However, the dis-
criminatory power of empirical strategy classification depends
crucially on the specific combinations of item features used in a
study (Heck et al. 2017). As a remedy, instead of searching for
informative item configurations heuristically, Jekel et al. (2011)
employed Euclidean diagnostic task selection to search for the
most diagnostic cue patterns for a given set of decision strate-
gies or models.
¹ Notably, even though simulated distributions of Bayes factors may be rele-
vant for planning specific design considerations for future studies, the results
do not affect the interpretation of an observed Bayes factor after data collec-
tion. This is due to the fact that the Bayes factor obeys the likelihood principle,
which means that only the actually observed data (as opposed to hypothetical
data) are relevant for statistical inference (Berger and Wolpert, 1988).
Whereas sample-size planning and diagnostic stimulus se-
lection are relevant for statistical inference in general (Maxwell
et al. 2008), cognitive modeling in particular offers some
unique benefits for design optimization in terms of calibrating
the substantively meaningful model parameters. For instance,
the parameters in multinomial processing tree models are de-
fined as conditional probabilities of entering different cogni-
tive states or processes (Batchelder and Riefer 1990). Hence,
the information that can be gained about a single test-relevant
parameter may heavily depend on the specific values of the
remaining (nuisance) parameters. For example, in standard
source-memory paradigms (Arnold et al. 2019), participants
have to learn items from two sources (e.g., words presented in
blue and red). In the test phase, the studied items are then
presented intermixed with lures while omitting the source in-
formation (e.g., by showing all words in black). In the corre-
sponding source-monitoring model, the probability of source
recognition d (i.e., detecting whether a word was presented in
blue or red) is defined conditionally on target detection D (i.e.,
detecting whether a word was studied at all). If a researcher is
mainly interested in source memory, this implies that the
length and the content of the study list should be chosen in a
way that ensures a high probability of target detection (i.e.,
large D), which in turn results in a high accuracy of the esti-
mate for source recognition d.
In a cognitive model, a calibration of the psychologi-
cally meaningful parameters can be achieved by relying
on parameter estimates for specific experimental designs,
stimulus materials, or populations of participants in pre-
vious studies. Moreover, the specification of design fac-
tors is facilitated by the fact that the parameters of a cog-
nitive model usually represent meaningful psychological
concepts about which researchers can make informed
guesses or even define precise prior distributions
(Lee and Vanpaemel 2018).
Example: Optimal Design
for the Pair-Clustering Model
In this subsection, we provide an example of how to optimize
the design of an empirical study to maximize the expected
information gain (here, statistical power) in practice. For this
purpose, we rely on the pair-clustering model, a classical mul-
tinomial processing tree model for disentangling storage and
retrieval in memory (Batchelder and Riefer 1980,1986). In
the standard pair-clustering paradigm, participants first study a
list of words, some of which are semantically related to a
second word (pairs; e.g., dog and cat), whereas others are
unrelated to all remaining words in the study list (singletons).
In a free-recall test, individuals then have to remember as
many of the studied words as possible.
To model the data, the following categories are defined:
For pairs, we distinguish the recall of both items in direct
succession (C11), the recall of both items separated by at
least one other item (C12), the recall of only one item of a
pair (C13), and the recall of none of the items (C14). For
singletons, we only distinguish between items that are
recalled and those that are not (C21 and C22, respectively).
The pair-clustering model assumes that the
semantic association of two paired items may result in a
memory benefit due to clustering, meaning that the two
words are jointly stored and retrieved. To specify this
assumption more precisely, the model (shown in Fig. 1)
assumes that a pair is stored as a cluster with probability
c, in which case it is retrieved during free recall with
conditional probability r, resulting in a C11 response, or
not retrieved with probability 1 − r, resulting in recall
failure (i.e., C14). If clustering fails with probability 1 − c, in
contrast, participants store and retrieve each item of a pair
independently with probability u. Similarly, each single-
ton is stored and retrieved independently with probability
a.
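Under the standard assumption that independently recalled members of a non-clustered pair are not recalled adjacently, the category probabilities follow directly from this tree structure: P(C11) = c·r, P(C12) = (1 − c)·u², P(C13) = (1 − c)·2u(1 − u), P(C14) = c·(1 − r) + (1 − c)·(1 − u)², and, for singletons, P(C21) = a and P(C22) = 1 − a.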
In most applications of the pair-clustering model, it is as-
sumed that storage and retrieval of single words are identical
for pairs and singletons. To test this hypothesis H0: u = a,
Batchelder and Riefer (1986) developed a likelihood ratio test.
However, the statistical power (and thus, the expected infor-
mation gain) of this test depends on two crucial design factors:
first, on the proportion of pairs vs. singletons in the study list,
and second, on the population value of the probability of clus-
tering c, which in turn depends on the difficulty of detecting
semantic associations. To illustrate this, we computed the
power of the likelihood ratio test for testing the null hypothesis
H0: u = a at the significance level α = 5%. For multinomial
processing tree models, the power can easily be approximated
Fig. 1 The pair-clustering model (Batchelder and Riefer 1980). The parameters refer to c = probability of storing a semantically related pair of items as a cluster, r = probability of retrieving a clustered pair, u = probability of recalling a single item of a non-clustered pair, and a = probability of recalling a singleton.
by (1) specifying the population model under H1 with all
parameter values fixed at plausible values, (2) defining the
number of pairs and singletons (N1 and N2, respectively), (3)
computing the expected frequencies under H1, (4) fitting the
model under H0 to these expected frequencies by minimizing
the likelihood ratio statistic G², and (5) computing the statistical
power 1 − β using the noncentral χ²₁(λ) distribution with
the minimized G² as noncentrality parameter λ (Erdfelder
et al. 2005; Moshagen 2010).
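The following R sketch implements these five steps for the pair-clustering model; it is a minimal illustration rather than the authors' original implementation (their R scripts are available at the OSF link given in the Data availability note), and the parameter values in the example call are those of Panel A of Fig. 2 with an equal split of pairs and singletons:

# Minimal sketch of the five-step power approximation for the test of
# H0: u = a in the pair-clustering model (not the authors' original code)
pair_probs <- function(c, r, u) {            # arguments named after the model parameters
  c(C11 = c * r,                             # clustered and retrieved
    C12 = (1 - c) * u^2,                     # not clustered, both items recalled
    C13 = (1 - c) * 2 * u * (1 - u),         # not clustered, one item recalled
    C14 = c * (1 - r) + (1 - c) * (1 - u)^2) # nothing recalled
}

power_pair_clustering <- function(c, r, u, a, N1, N2, alpha = .05) {
  # Steps (1)-(3): expected frequencies under H1
  exp_freq <- c(N1 * pair_probs(c, r, u), N2 * c(C21 = a, C22 = 1 - a))
  # Step (4): fit the H0 model (u = a) by minimizing the likelihood-ratio statistic G^2
  g2 <- function(par) {                      # par = (c, r, u = a)
    fit_freq <- c(N1 * pair_probs(par[1], par[2], par[3]),
                  N2 * c(par[3], 1 - par[3]))
    2 * sum(exp_freq * log(exp_freq / fit_freq))
  }
  fit <- optim(c(c, r, (u + a) / 2), g2, method = "L-BFGS-B",
               lower = 1e-6, upper = 1 - 1e-6)
  # Step (5): power from the noncentral chi-square distribution with df = 1
  1 - pchisq(qchisq(1 - alpha, df = 1), df = 1, ncp = fit$value)
}

# Example: 50% pairs, c = r = .50, u = .40, a = .60, total N = 480
power_pair_clustering(c = .50, r = .50, u = .40, a = .60, N1 = 240, N2 = 240)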
Figure 2 shows the statistical power of testing H0: u = a as a
function of the proportion of pairs in the study list (left panel)
and the probability of clustering c(right panel). First, with
respect to the proportion of pairs, it is evident that the power
is highest if the study list contains more pairs than singletons.
Depending on the parameter values in the population, the
optimal proportion of pairs is 65% if c = r = 0.50 and 75% if
c = r = 0.80, resulting in a statistical power of 84.8% and
58.2%, respectively. Hence, depending on the expectation
about the most plausible values for c and r (which may be
based on prior studies with similar participants), one can adapt
the design accordingly. Second, for a fixed proportion of 50%
pairs, the second panel in Fig. 2 shows the statistical power as
a function of the probability of clustering c. Irrespective of the
population value of the test-relevant parameter a, the power
for testing the hypothesis H0: u = a is maximized for small c. If
researchers are interested in testing this hypothesis rigorously,
they should thus ensure that the probability of clustering is
relatively small (e.g., by using pairs with weak semantic
associations).
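Building on the sketch given above, a simple grid search over the proportion of pairs (again with illustrative parameter values and a total of 480 observations) makes this kind of design comparison concrete:

# Illustrative grid search over the proportion of pairs (total N = 480)
props <- seq(.30, .90, by = .05)
power_by_prop <- sapply(props, function(p)
  power_pair_clustering(c = .50, r = .50, u = .40, a = .60,
                        N1 = round(480 * p), N2 = round(480 * (1 - p))))
props[which.max(power_by_prop)]   # proportion of pairs with the highest expected power

An analogous loop over values of c with the design held fixed corresponds to the power curves shown in the right panel of Fig. 2.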
Quite often researchers will be interested in several re-
search questions simultaneously. For example, cognitive
aging researchers might be interested in (1) testing the pair-clustering
model for both young and older adults (i.e., H0:
uy = ay and uo = ao) and also in (2) assessing whether re-
trieval parameters differ between age groups or not (i.e.,
H0: ry = ro; cf. Riefer and Batchelder 1991). We have al-
ready seen that the first research question can be addressed
most efficiently when c is small. By the same logic, it is
possible to show that the optimal design for the second
research question requires c to be large. How to resolve
this conflict? There are basically two ways to go. First,
one could design two independent experiments based on
the same underlying populations, with encoding conditions
aiming at a small versus a large c, respectively, to answer
each of the two research questions most efficiently.
Second, one could design a single experiment aiming at a
medium parameter of, say, c = 0.50 and compensate for the
loss in statistical power due to non-optimal c by increasing
the sample sizes N1 and N2 for pairs and singletons, respec-
tively. Again, a priori design planning will enable re-
searchers to decide which of these two possible ways min-
imizes resources (i.e., time and money) required for an-
swering both research questions of interest with a
predetermined level of statistical power.
Fig. 2 Statistical power of testing the hypothesis H0: u = a at the significance level α = 5%. Panel A is based on the alternative hypothesis u = 0.40 and a = 0.60 and a total sample size of N = 480. In Panel B, the sample sizes for pairs and singletons are N1 = 320 and N2 = 160, respectively, with probabilities of u = 0.40 and r = 0.80.
Overall, our example shows how the expected informa-
tion gain of cognitive modeling can be improved by opti-
mizing details of the experimental design. This is achieved
by assessing the statistical criterion of interest (e.g., the
statistical power) as a function of one or more design fac-
tors (i.e., the relative proportion of pairs and the population
values of the meaningful model parameters). More gener-
ally, the example also shows that the utility of power anal-
ysis extends beyond planning the total sample size for a
fixed experimental design.
Conclusion
Before collecting any new data for a confirmatory (possibly
preregistered) test in cognitive modeling, it is important that
researchers consider whether the planned study and experi-
mental design are informative with respect to the cognitive
models of interest. In order to optimize the design, a formal
analysis allows researchers to maximize, for instance, the ex-
pected utility in terms of model discrimination (Myung and
Pitt 2009), the statistical power of a hypothesis test (Cohen
1988), or the discrepancy between prior and posterior distri-
bution (Lindley 1956). As a much simpler, less time-intensive
alternative approach, Monte Carlo simulations allow re-
searchers to perform any planned analysis a priori across
many replications to assess the impact of various design
choices on the expected gain in information (see
Algorithm 1). Going beyond the scope of a priori design con-
siderations, the benefits of design optimization can be
exploited to an optimal degree by selecting the most informative
stimuli while the experiment is running, by means of adaptive
designs (Myung et al. 2013).
However, whereas optimizations of the design should reg-
ularly be considered for confirmatory tests in cognitive model-
ing, this is more difficult or may even be impossible in explor-
atory contexts. First, when reanalyzing existing data (e.g., to
develop a new model), design optimization is obviously not
possible. Nevertheless, researchers can judge post hoc wheth-
er the implemented design was informative for model devel-
opment and use this knowledge as a basis for designing more
informative follow-up studies (Navarro et al. 2004). Second, if
the aim is to collect new data for exploratory modeling, design
optimizations are more difficult but still possible. In such a
case, researchers usually cannot formalize expected outcomes
in terms of model equations or parameter values, a
prerequisite for methods such as power analysis or model-
recovery simulations. However, cognitive modelers usually
collect new data only if they have at least some idea of what
to model (e.g., whether certain design factors affect retention
curves in memory). Instead of assessing information gain in
light of established cognitive models, researchers can still rely
on classical power analysis for standard statistical models
(e.g., ANOVA, correlation, or regression analysis) to estimate
the number of observations required to detect relevant system-
atic patterns in the data empirically.
In experimental psychology more generally, the impor-
tance of optimizing experimental designs was recently empha-
sized within the statistical guidelines of the Psychonomic
Society (2019), advising researchers to "do what you reason-
ably can to design an experiment that allows a sensitive test."
Within cognitive modeling, methods such as design optimiza-
tion, power analysis, Bayesian design analysis, and simula-
tions of model recovery and parameter estimation are crucial
tools to assess the expected information gain provided by an
experimental design before collecting any data. At least under
ideal conditions such as those generated in a simulation or assumed
in an analytical calculation, the chosen experimental design
should ensure a high a priori probability that the models under
consideration provide insights about the psychological theo-
ries and cognitive processes of interest.
Funding information This research was supported by the German
Research Foundation (DFG; Research Training Group "Statistical
Modeling in Psychology," grant GRK 2277).
Data availability All R scripts are available on the Open Science
Framework at https://osf.io/xehk5/.
References
Arnold, N. R., Heck, D. W., Bröder, A., Meiser, T., & Boywitt, C. D. (2019). Testing hypotheses about binding in context memory with a hierarchical multinomial modeling approach: a preregistered study. Experimental Psychology, 66(3), 239–251. https://doi.org/10.1027/1618-3169/a000442
Batchelder, W. H., & Riefer, D. M. (1980). Separation of storage and retrieval factors in free recall of clusterable pairs. Psychological Review, 87(4), 375–397. https://doi.org/10.1037/0033-295X.87.4.375
Batchelder, W. H., & Riefer, D. M. (1986). The statistical analysis of a model for storage and retrieval processes in human memory. British Journal of Mathematical and Statistical Psychology, 39, 129–149. https://doi.org/10.1111/j.2044-8317.1986.tb00852.x
Batchelder, W. H., & Riefer, D. M. (1990). Multinomial processing models of source monitoring. Psychological Review, 97, 548–564. https://doi.org/10.1037/0033-295X.97.4.548
Berger, J. O., & Wolpert, R. L. (1988). The likelihood principle. Hayward, CA: The Institute of Mathematical Statistics.
Berger, M. P. F., & Wong, W.-K. (2009). An introduction to optimal designs for social and biomedical research. Hoboken: John Wiley & Sons.
Bröder, A., & Schiffer, S. (2003). Bayesian strategy assessment in multi-attribute decision making. Journal of Behavioral Decision Making, 16(3), 193–213. https://doi.org/10.1002/bdm.442
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Erdfelder, E., Faul, F., & Buchner, A. (2005). Power analysis for categorical methods. In Encyclopedia of Statistics in Behavioral Science (Vol. 3, pp. 1565–1570). https://doi.org/10.1002/0470013192.bsa491
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191. https://doi.org/10.3758/bf03193146
Gelman, A., & Carlin, J. (2014). Beyond power calculations: assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science, 9(6), 641–651. https://doi.org/10.1177/1745691614551642
Heck, D. W., Hilbig, B. E., & Moshagen, M. (2017). From information processing to decisions: formalizing and comparing probabilistic choice models. Cognitive Psychology, 96, 26–40. https://doi.org/10.1016/j.cogpsych.2017.05.003
Heck, D. W., Arnold, N. R., & Arnold, D. (2018a). TreeBUGS: an R package for hierarchical multinomial-processing-tree modeling. Behavior Research Methods, 50(1), 264–284. https://doi.org/10.3758/s13428-017-0869-7
Heck, D. W., Thielmann, I., Moshagen, M., & Hilbig, B. E. (2018b). Who lies? A large-scale reanalysis linking basic personality traits to unethical decision making. Judgment and Decision Making, 13(4), 356–371.
Jekel, M., Fiedler, S., & Glöckner, A. (2011). Diagnostic task selection for strategy classification in judgment and decision making: theory, validation, and implementation in R. Judgment and Decision Making, 6(8), 782–799.
Kellen, D., & Klauer, K. C. (2015). Signal detection and threshold modeling of confidence-rating ROCs: a critical test with minimal assumptions. Psychological Review, 122(3), 542–557.
Lee, M. D., & Vanpaemel, W. (2018). Determining informative priors for cognitive models. Psychonomic Bulletin & Review, 25(1), 114–127. https://doi.org/10.3758/s13423-017-1238-3
Lee, M. D., Criss, A. H., Devezer, B., Donkin, C., Etz, A., Leite, F. P., et al. (in press). Robust modeling in cognitive science. Computational Brain & Behavior. https://doi.org/10.1007/s42113-019-00029-y
Lindley, D. V. (1956). On a measure of the information provided by an experiment. The Annals of Mathematical Statistics, 27(4), 986–1005. https://doi.org/10.1214/aoms/1177728069
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59(1), 537–563. https://doi.org/10.1146/annurev.psych.59.103006.093735
Meiser, T. (2014). Analyzing stochastic dependence of cognitive processes in multidimensional source recognition. Experimental Psychology, 61(5), 402–415. https://doi.org/10.1027/1618-3169/a000261
Moshagen, M. (2010). multiTree: a computer program for the analysis of multinomial processing tree models. Behavior Research Methods, 42, 42–54. https://doi.org/10.3758/BRM.42.1.42
Müller, P., Sansó, B., & Iorio, M. D. (2004). Optimal Bayesian design by inhomogeneous Markov chain simulation. Journal of the American Statistical Association, 99(467), 788–798. https://doi.org/10.1198/016214504000001123
Myung, J. I., & Pitt, M. A. (2009). Optimal experimental design for model discrimination. Psychological Review, 116(3), 499–518. https://doi.org/10.1037/a0016104
Myung, J. I., Forster, M. R., & Browne, M. W. (2000). Guest editors' introduction: special issue on model selection. Journal of Mathematical Psychology, 44, 1–2. https://doi.org/10.1006/jmps.1999.1273
Myung, J. I., Cavagnaro, D. R., & Pitt, M. A. (2013). A tutorial on adaptive design optimization. Journal of Mathematical Psychology, 57, 53–67. https://doi.org/10.1016/j.jmp.2013.05.005
Navarro, D. J., Pitt, M. A., & Myung, I. J. (2004). Assessing the distinguishability of models and the informativeness of data. Cognitive Psychology, 49(1), 47–84. https://doi.org/10.1016/j.cogpsych.2003.11.001
Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 231, 289–337.
Platt, J. R. (1964). Strong inference. Science, 146(3642), 347–353.
Popper, K. (2005). The logic of scientific discovery. https://doi.org/10.4324/9780203994627
Psychonomic Society. (2019). Statistical guidelines. Retrieved from https://www.psychonomic.org/page/statisticalguidelines
Riefer, D. M., & Batchelder, W. H. (1991). Age differences in storage and retrieval: a multinomial modeling analysis. Bulletin of the Psychonomic Society, 29(5), 415–418. https://doi.org/10.3758/BF03333957
Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107(2), 358–367. https://doi.org/10.1037/0033-295X.107.2.358
Schnuerch, M., & Erdfelder, E. (in press). Controlling decision errors with minimal costs: the sequential probability ratio t test. Psychological Methods.
Schönbrodt, F. D., & Wagenmakers, E.-J. (2018). Bayes factor design analysis: planning for compelling evidence. Psychonomic Bulletin & Review, 25(1), 128–142. https://doi.org/10.3758/s13423-017-1230-y
Schönbrodt, F. D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini, M. (2017). Sequential hypothesis testing with Bayes factors: efficiently testing mean differences. Psychological Methods, 22(2), 322–339. https://doi.org/10.1037/met0000061
Wagenmakers, E.-J., Ratcliff, R., Gomez, P., & Iverson, G. J. (2004). Assessing model mimicry using the parametric bootstrap. Journal of Mathematical Psychology, 48(1), 28–50. https://doi.org/10.1016/j.jmp.2003.11.004
Wald, A. (1947). Sequential analysis. New York: Wiley.
Publisher's Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
... Importantly, statistical power in MPT modeling does not only depend on sample size but also on the experimental design and the location of the test-relevant MPT parameters in the probability tree (Heck & Erdfelder, 2019). Power is generally higher for MPT parameters that occur (a) in trees with large numbers of observations, (b) in multiple branches of the model, and (c) near to the root of the probability tree (e.g., parameters referring to unconditional probabilities). ...
... may be uninformative. Often, one can increase the power of a hypothesis test by optimizing the experimental design, for instance, by ensuring a high probability of clustering (e.g., c ≥ 0.80; Heck & Erdfelder, 2019). ...
Article
Full-text available
Many psychological theories assume that observable responses are determined by multiple latent processes. Multinomial processing tree (MPT) models are a class of cognitive models for discrete responses that allow researchers to disentangle and measure such processes. Before applying MPT models to specific psychological theories, it is necessary to tailor a model to specific experimental designs. In this tutorial, we explain how to develop, fit, and test MPT models using the classical pair-clustering model as a running example. The first part covers the required data structures, model equations, identifiability, model validation, maximum-likelihood estimation, hypothesis tests, and power analyses using the software multiTree. The second part introduces hierarchical MPT modeling which allows researchers to account for individual differences and to estimate the correlations of latent processes among each other and with additional covariates using the TreeBUGS package in R. All examples including data and annotated analysis scripts are provided at the Open Science Framework (https://osf.io/24pbm/). (PsycInfo Database Record (c) 2023 APA, all rights reserved).
... Analyzing multiple hier-archical data sets with variable sizes only requires a single network that has seen different data set sizes during training. The same network could also be used for various simulation studies, such as the challenging task of designing maximally informative experiments in a hierarchical BMC setting (Heck & Erdfelder, 2019;Myung & Pitt, 2009). ...
Article
Full-text available
Bayesian model comparison (BMC) offers a principled approach to assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then showcase our method by comparing four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. Additionally, we demonstrate how transfer learning can be leveraged to enhance training efficiency. We provide reproducible code for all analyses and an open-source implementation of our method.
... Prelec (1998) Tversky and Kahneman (1992) A Prelec ( 29,880 17,192 44,462 26,810 FA 16,410,010,146 542,875,525 37,545,455,388 2,625,826,044 FB 3,198,793,483 2,397,017,339 2,733,682,645 7,890,355,611,910,895 435,036,658 68,759,593,248 2,057,055,290,418 1,667,497,060 921,061,986 7,726,270,110 ...
Article
Estimation, statistical testing, and model selection are the main focusarea in statistics. This study focuses on the relationship betweenestimation precision (Fisher information) and model selection(Kullback—Leibler or KL information). This relationship is importantbecause researchers often conduct estimations or statistical tests aftermodel selection. Additionally, this study examines how ComputerizedAdaptive Testing (CAT), a stimulus selection method that maximizesFisher information, affects model selection performance. A simulationstudy demonstrates the relationship between the difference in Fisherinformation between two models and the degree of asymmetry in KLinformation. Furthermore, we confirm that controlling Fisher informationusing stimulus selection can influence model selection performance. Asimulation study suggests that increasing the Fisher information of afalse model reduces model selection performance.
... However, in memory experiments, this is inherently difficult, since presenting participants with hundreds of source-item pairs for learning and retrieval is not possible. Alternatively, researchers have the option to enhance the informativeness of their MPT models by implementing design optimizations, as suggested by Heck and Erdfelder (2019). This involves carefully selecting items that guarantee an adequate number of responses within the relevant tree branches. ...
Article
Full-text available
Multinomial processing tree (MPT) models are a broad class of statistical models used to test sophisticated psychological theories. The research questions derived from these theories often go beyond simple condition effects on parameters and involve ordinal expectations (e.g., the same-direction effect on the memory parameter is stronger in one experimental condition than another) or disordinal expectations (e.g., the effect reverses in one experimental condition). Here, we argue that by refining common modeling practices, Bayesian hierarchical models are well suited to estimate and test these expectations. Concretely, we show that the default priors proposed in the literature lead to nonsensical predictions for individuals and the population distribution, leading to problems not only in model comparison but also in parameter estimation. Rather than relying on these priors, we argue that MPT modelers should determine priors that are consistent with their theoretical knowledge. In addition, we demonstrate how Bayesian model comparison may be used to test ordinal and disordinal interactions by means of Bayes factors. We apply the techniques discussed to empirical data from Bell et al. Journal of Experimental Psychology: Learning, Memory, and Cognition , 41 , 456–472 (2015).
... According to our results, the most important aspect affecting divergence is the standard error, which captures estimation uncertainty (with a higher standard error being associated with larger divergence between estimation methods). To reduce the standard error and associated divergence, the researcher can make sure to collect a large amount of data or to improve the experimental design (Heck & Erdfelder, 2019). 19 Although this conclusion is not new to researchers who 19 In general, for partial-pooling methods increasing the number of participants increases power, and therefore decreases the standard error, more than increasing the number of observations per participant (Rouder & Haaf, 2018;Westfall et al., 2014). ...
Preprint
Researchers have become increasingly aware that data-analysis decisions affect results. Here, we examine this issue systematically for multinomial processing tree (MPT) models, a popular class of cognitive models for categorical data. Specifically, we examine the robustness of MPT model parameter estimates that arise from two important decisions: the level of data aggregation (complete pooling, no pooling, or partial pooling) and the statistical framework (frequentist or Bayesian). These decisions span a multiverse of estimation methods. We synthesized the data from 13,956 participants (164 published data sets) with a meta-analytic strategy and analyzed the magnitude of divergence between estimation methods for the parameters of nine popular multinomial processing tree (MPT) models in psychology (e.g., process dissociation, source monitoring). We further examined moderators as potential sources of divergence. We found that the absolute divergence between estimation methods was small on average (< .04; with MPT parameters ranging between 0 and 1); in some cases, however, divergence amounted to nearly the maximum possible range (.97). Divergence was partly explained by few moderators (e.g., the specific MPT model parameter, uncertainty in parameter estimation), but not by other plausible candidate moderators (e.g., parameter trade-offs, parameter correlations) or their interactions. Partial-pooling methods showed the smallest divergence within and across levels of pooling and thus seem to be an appropriate default method. Using MPT models as an example, we show how transparency and robustness can be increased in the field of cognitive modeling.
... The planned statistical analysis again relied on GLMMs for binary truth judgments for which standard power analyses are not available. Hence, we performed a simulation-based, a-priori power analysis for sample-size planning (Heck & Erdfelder, 2019). Using a simulation-based approach allows us to specify a precise, substantively-justified pattern of expected group means for the 2 × 2 × 2 × 2 mixed factorial design as opposed to relying on default effect sizes. ...
... Both are important, but simulation studies in particular allow us to understand when a method works well and when it does not, and ultimately to make recommendations on when to use a particular method in practice. We note that simulation can also be used for other purposes, such as experimental design (e.g., sample size planning or power analysis for complex statistical analyses where no closed-form solutions exist as in Heck & Erdfelder, 2019;Lakens & Caldwell, 2021) or numerical integration (e.g., Markov chain Monte Carlo methods for computing posterior distributions in Bayesian statistics), but this use of simulation is typically not called "simulation study" in methodological research and is not the focus of the present paper. ...
Preprint
Full-text available
Simulation studies are widely used for evaluating the performance of statistical methods in psychology. However, the quality of simulation studies can vary widely in terms of their design, execution, and reporting. In order to assess the quality of typical simulation studies in psychology, we reviewed 321 articles published in Psychological Methods, Behavioral Research Methods, and Multivariate Behavioral Research in 2021 and 2022, among which 100/321 = 31.2% report a simulation study. We find that many articles do not provide complete and transparent information about key aspects of the study, such as justifications for the number of simulation repetitions, Monte Carlo uncertainty estimates, or code and data to reproduce the simulation studies. To address this problem, we provide a summary of the ADEMP (Aims, Data-generating mechanism, Estimands and other targets, Methods, Performance measures) design and reporting framework from Morris, White, and Crowther (2019) adapted to simulation studies in psychology. Based on this framework, we provide ADEMP-PreReg, a step-by-step template for researchers to use when designing, potentially preregistering, and reporting their simulation studies. We give formulae for estimating common performance measures, their Monte Carlo standard errors, and for calculating the number of simulation repetitions to achieve a desired Monte Carlo standard error. Finally, we give a detailed tutorial on how to apply the ADEMP framework in practice using an example simulation study on the evaluation of methods for the analysis of pre–post measurement experiments.
... Analyzing multiple hier-archical data sets with variable sizes only requires a single network that has seen different data set sizes during training. The same network could also be used for various simulation studies, such as the challenging task of designing maximally informative experiments in a hierarchical BMC setting (Heck & Erdfelder, 2019;Myung & Pitt, 2009). ...
Preprint
Full-text available
Bayesian model comparison (BMC) offers a principled approach for assessing the relative merits of competing computational models and propagating uncertainty into model selection decisions. However, BMC is often intractable for the popular class of hierarchical models due to their high-dimensional nested parameter structure. To address this intractability, we propose a deep learning method for performing BMC on any set of hierarchical models which can be instantiated as probabilistic programs. Since our method enables amortized inference, it allows efficient re-estimation of posterior model probabilities and fast performance validation prior to any real-data application. In a series of extensive validation studies, we benchmark the performance of our method against the state-of-the-art bridge sampling method and demonstrate excellent amortized inference across all BMC settings. We then use our method to compare four hierarchical evidence accumulation models that have previously been deemed intractable for BMC due to partly implicit likelihoods. In this application, we corroborate evidence for the recently proposed L\'evy flight model of decision-making and show how transfer learning can be leveraged to enhance training efficiency. Reproducible code for all analyses is provided.
Article
Full-text available
Researchers have become increasingly aware that data-analysis decisions affect results. Here, we examine this issue systematically for multinomial processing tree (MPT) models, a popular class of cognitive models for categorical data. Specifically, we examine the robustness of MPT model parameter estimates that arise from two important decisions: the level of data aggregation (complete-pooling, no-pooling, or partial-pooling) and the statistical framework (frequentist or Bayesian). These decisions span a multiverse of estimation methods. We synthesized the data from 13,956 participants (164 published data sets) with a meta-analytic strategy and analyzed the magnitude of divergence between estimation methods for the parameters of nine popular MPT models in psychology (e.g., process-dissociation, source monitoring). We further examined moderators as potential sources of divergence. We found that the absolute divergence between estimation methods was small on average (<.04; with MPT parameters ranging between 0 and 1); in some cases, however, divergence amounted to nearly the maximum possible range (.97). Divergence was partly explained by few moderators (e.g., the specific MPT model parameter, uncertainty in parameter estimation), but not by other plausible candidate moderators (e.g., parameter trade-offs, parameter correlations) or their interactions. Partial-pooling methods showed the smallest divergence within and across levels of pooling and thus seem to be an appropriate default method. Using MPT models as an example, we show how transparency and robustness can be increased in the field of cognitive modeling.
Chapter
This book, the first of three volumes, covers in five sections a variety of issues important in developing, designing, and analyzing data to produce high-quality research efforts and cultivate a productive research career. First, leading scholars from around the world provide a step-by-step guide to doing research in the social and behavioral sciences. After discussing some of the basics, the various authors next focus on the important building blocks of any study. In section three, various types of quantitative and qualitative research designs are discussed, and advice is provided regarding best practices for each. The volume then provides an introduction to a variety of important and cutting-edge statistical analyses. In the last section of the volume, nine chapters provide information related to what it takes to have a long and successful research career. Throughout the book, examples and real-world research efforts from dozens of different disciplines are discussed.
Article
For several years, the public debate in psychological science has been dominated by what is referred to as the reproducibility crisis. This crisis has, inter alia, drawn attention to the need for proper control of statistical decision errors in testing psychological hypotheses. However, conventional methods of error probability control often require fairly large samples. Sequential statistical tests provide an attractive alternative: They can be applied repeatedly during the sampling process and terminate whenever there is sufficient evidence in the data for one of the hypotheses of interest. Thus, sequential tests may substantially reduce the required sample size without compromising predefined error probabilities. Herein, we discuss the most efficient sequential design, the sequential probability ratio test (SPRT), and show how it is easily implemented for a 2-sample t test using standard statistical software. We demonstrate, by means of simulations, that the SPRT not only reliably controls error probabilities but also typically requires substantially smaller samples than standard t tests and other common sequential designs. Moreover, we investigate the robustness of the SPRT against violations of its assumptions. Finally, we illustrate the sequential t test by applying it to an empirical example and provide recommendations on how psychologists can employ it in their own research to benefit from its desirable properties.
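The stopping rule at the heart of any SPRT compares a running log-likelihood ratio against Wald's two thresholds, log((1 − β)/α) and log(β/(1 − α)). The sketch below illustrates this generic rule for the simple case of a Bernoulli rate; it is our own toy example, not the two-sample t test implementation discussed in the article above.

```python
import math
import random

def sprt_bernoulli(p0, p1, alpha, beta, data_stream):
    """Wald's sequential probability ratio test for a Bernoulli rate,
    H0: p = p0 vs. H1: p = p1. Returns the decision and the sample size
    at which sampling stopped."""
    upper = math.log((1 - beta) / alpha)   # crossing -> accept H1
    lower = math.log(beta / (1 - alpha))   # crossing -> accept H0
    llr, n = 0.0, 0
    for x in data_stream:
        n += 1
        # Add the log-likelihood ratio of the new observation.
        llr += math.log(p1 if x else 1 - p1) - math.log(p0 if x else 1 - p0)
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "no decision", n

# Simulated data stream with a true rate of .65 (favoring H1).
random.seed(1)
stream = (random.random() < 0.65 for _ in range(10_000))
print(sprt_bernoulli(p0=0.5, p1=0.65, alpha=0.05, beta=0.05, data_stream=stream))
```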
Article
In experiments on multidimensional source memory, a stochastic dependency of source memory for different facets of an episode has been repeatedly demonstrated. This may suggest an integrated representation leading to mutual cuing in context retrieval. However, experiments involving a manipulated reinstatement of one source feature have often failed to affect retrieval of the other feature, suggesting unbound features or rather item-feature binding. The stochastic dependency found in former studies might be a spurious correlation due to aggregation across participants varying in memory strength. We test this artifact explanation by applying a hierarchical multinomial model. Observing stochastic dependency when accounting for interindividual differences would rule out the artifact explanation. A second goal is to elucidate the nature of feature binding: Contrasting encoding conditions with integrated feature judgments versus separate feature judgments are expected to induce different levels of stochastic dependency despite comparable overall source memory if integrated representations include feature-feature binding. The experiment replicated the finding of stochastic dependency and, thus, ruled out an artifact interpretation. However, we did not find different levels of stochastic dependency between conditions. Therefore, the current findings do not reveal decisive evidence to distinguish between the feature-feature binding and the item-context binding account.
Article
Previous research has established that higher levels of trait Honesty-Humility (HH) are associated with less dishonest behavior in cheating paradigms. However, only imprecise effect size estimates of this HH-cheating link are available. Moreover, evidence is inconclusive on whether other basic personality traits from the HEXACO or Big Five models are associated with unethical decision making and whether such effects have incremental validity beyond HH. We address these issues in a highly powered reanalysis of 16 studies assessing dishonest behavior in an incentivized, one-shot cheating paradigm (N = 5,002). For this purpose, we rely on a newly developed logistic regression approach for the analysis of nested data in cheating paradigms. We also test theoretically derived interactions of HH with other basic personality traits (i.e., Emotionality and Conscientiousness) and situational factors (i.e., the baseline probability of observing a favorable outcome) as well as the incremental validity of HH over demographic characteristics. The results show a medium to large effect of HH (odds ratio = 0.53), which was independent of other personality, situational, or demographic variables. Only one other trait (Big Five Agreeableness) was associated with unethical decision making, although it failed to show any incremental validity beyond HH.
Article
Decision strategies explain how people integrate multiple sources of information to make probabilistic inferences. In the past decade, increasingly sophisticated methods have been developed to determine which strategy explains decision behavior best. We extend these efforts to test psychologically more plausible models (i.e., strategies), including a new, probabilistic version of the take-the-best (TTB) heuristic that implements a rank order of error probabilities based on sequential processing. Within a coherent statistical framework, deterministic and probabilistic versions of TTB and other strategies can be compared directly using model selection by minimum description length or the Bayes factor. In an experiment with inferences from given information, only three of 104 participants were best described by the psychologically plausible, probabilistic version of TTB. As in previous studies, most participants were classified as users of weighted-additive, a strategy that integrates all available information and approximates rational decisions.
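For readers unfamiliar with the heuristic, here is a minimal sketch of the classic deterministic take-the-best strategy: inspect cues in order of their validity and choose the option favored by the first cue that discriminates. The probabilistic version tested in the article additionally assigns each cue its own error probability, which is not shown here; cue names and values below are invented for illustration.

```python
def take_the_best(cues_a, cues_b, validity_order):
    """Deterministic take-the-best (TTB): go through cues in order of
    validity and decide for the option favored by the first cue on which
    the two options differ. Cue values are 1 (positive) or 0 (negative).
    Returns 'A', 'B', or 'guess' if no cue discriminates."""
    for cue in validity_order:
        if cues_a[cue] != cues_b[cue]:
            return "A" if cues_a[cue] > cues_b[cue] else "B"
    return "guess"

# Hypothetical comparison of two options on three binary cues,
# listed in decreasing order of cue validity.
option_a = {"capital": 1, "airport": 0, "university": 1}
option_b = {"capital": 0, "airport": 1, "university": 1}
print(take_the_best(option_a, option_b, ["capital", "airport", "university"]))  # 'A'
```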
Article
Multinomial processing tree (MPT) models are a class of measurement models that account for categorical data by assuming a finite number of underlying cognitive processes. Traditionally, data are aggregated across participants and analyzed under the assumption of independently and identically distributed observations. Hierarchical Bayesian extensions of MPT models explicitly account for participant heterogeneity by assuming that the individual parameters follow a continuous hierarchical distribution. We provide an accessible introduction to hierarchical MPT modeling and present the user-friendly and comprehensive R package TreeBUGS, which implements the two most important hierarchical MPT approaches for participant heterogeneity: the beta-MPT (Smith & Batchelder, 2010) and the latent-trait MPT approach (Klauer, 2010). TreeBUGS reads standard MPT model files and obtains Markov chain Monte Carlo samples that approximate the posterior distribution. The functionality and output are tailored to the specific needs of MPT modelers and provide tests for the homogeneity of items and participants, individual and group parameter estimates, fit statistics, within- and between-subject comparisons, as well as goodness-of-fit and summary plots. We also propose and implement novel statistical extensions to include continuous and discrete predictors (either as fixed or random effects) in the latent-trait MPT model.
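Because the abstract presupposes familiarity with the MPT model class, a minimal sketch may help: an MPT model writes each response-category probability as a sum of products of latent process parameters. The two-high-threshold recognition model below is a standard textbook example, not part of TreeBUGS itself (which is an R package); the Python sketch only illustrates the model structure.

```python
def two_high_threshold(D_old, D_new, g):
    """Category probabilities of the two-high-threshold recognition model:
    D_old/D_new are detection probabilities for old/new items, g is the
    guessing probability of responding 'old'. Each category probability is
    a sum of products of parameters in [0, 1]."""
    return {
        "old_hit":            D_old + (1 - D_old) * g,
        "old_miss":           (1 - D_old) * (1 - g),
        "new_false_alarm":    (1 - D_new) * g,
        "new_correct_reject": D_new + (1 - D_new) * (1 - g),
    }

probs = two_high_threshold(D_old=0.6, D_new=0.5, g=0.4)
print(probs)
# Probabilities within each item type sum to 1.
assert abs(probs["old_hit"] + probs["old_miss"] - 1) < 1e-12
assert abs(probs["new_false_alarm"] + probs["new_correct_reject"] - 1) < 1e-12
```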
Book
Described by the philosopher A.J. Ayer as a work of ‘great originality and power’, this book revolutionized contemporary thinking on science and knowledge. Ideas such as the now legendary doctrine of ‘falsificationism’ electrified the scientific community, influencing even working scientists, as well as post-war philosophy. This astonishing work ranks alongside The Open Society and Its Enemies as one of Popper’s most enduring books and contains insights and arguments that demand to be read to this day.
Article
In an attempt to increase the reliability of empirical findings, psychological scientists have recently proposed a number of changes in the practice of experimental psychology. Most current reform efforts have focused on the analysis of data and the reporting of findings for empirical studies. However, a large contingent of psychologists build models that explain psychological processes and test psychological theories using formal psychological models. Some, but not all, recommendations borne out of the broader reform movement bear upon the practice of behavioral or cognitive modeling. In this article, we consider which aspects of the current reform movement are relevant to psychological modelers, and we propose a number of techniques and practices aimed at making psychological modeling more transparent, trusted, and robust.