
Bayesian adaptive stimulus selection for dissociating models of psychophysical data

Authors: James R. H. Cooke, Luc P. J. Selen, Robert J. van Beers, and W. Pieter Medendorp

Abstract

Comparing models facilitates testing different hypotheses regarding the computational basis of perception and action. Effective model comparison requires stimuli for which models make different predictions. Typically, experiments use a predetermined set of stimuli or sample stimuli randomly. Both methods have limitations; a predetermined set may not contain stimuli that dissociate the models, whereas random sampling may be inefficient. To overcome these limitations, we expanded the psi-algorithm (Kontsevich & Tyler, 1999) from estimating the parameters of a psychometric curve to distinguishing models. To test our algorithm, we applied it to two distinct problems. First, we investigated dissociating sensory noise models. We simulated ideal observers with different noise models performing a two-alternative forced-choice task. Stimuli were selected randomly or using our algorithm. We found using our algorithm improved the accuracy of model comparison. We also validated the algorithm in subjects by inferring which noise model underlies speed perception. Our algorithm converged quickly to the model previously proposed (Stocker & Simoncelli, 2006), whereas if stimuli were selected randomly, model probabilities separated slower and sometimes supported alternative models. Second, we applied our algorithm to a different problem-comparing models of target selection under body acceleration. Previous work found target choice preference is modulated by whole body acceleration (Rincon-Gonzalez et al., 2016). However, the effect is subtle, making model comparison difficult. We show that selecting stimuli adaptively could have led to stronger conclusions in model comparison. We conclude that our technique is more efficient and more reliable than current methods of stimulus selection for dissociating models.
James R. H. Cooke: Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands
Luc P. J. Selen: Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands
Robert J. van Beers: Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands; Department of Human Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
W. Pieter Medendorp: Radboud University, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, the Netherlands
Introduction
Within neuroscience there is a clear interest in
developing computational models to explain neural
systems and behavior. This is seen in many disciplines,
such as working memory (Keshvari, van den Berg, &
Ma, 2012, 2013), speed perception (Stocker & Simon-
celli, 2006), multisensory integration (Acerbi, Dokka,
Angelaki, & Ma, 2017; Kording et al., 2007), effector
selection (Bakker, Weijer, van Beers, Selen, & Meden-
dorp, 2017), contrast gain tuning (DiMattina, 2016),
and temporal interval reproduction (Acerbi, Wolpert,
& Vijayakumar, 2012).
Inferring the best model out of several proposed
models is important. Unfortunately, model comparison
is typically difficult. In addition to the computational
problem of having to integrate over the parameter
space of each model, it is also necessary to present
stimuli that can dissociate the models. If different
psychophysical models make similar predictions for
many of the stimuli presented, then it is difficult to
dissociate these models. Despite the importance of
appropriate stimuli selection, many studies comparing
models either select stimuli randomly (Keshvari et al.,
2012, 2013) or use a set of constant stimuli (Acerbi et
Citation: Cooke, J. R. H., Selen, L. P. J., van Beers, R. J., & Medendorp, W. P. (2018). Bayesian adaptive stimulus selection for dissociating models of psychophysical data. Journal of Vision, 18(8):12, 1–20, https://doi.org/10.1167/18.8.12.
https://doi.org/10.1167/18.8.12  ISSN 1534-7362  Copyright 2018 The Authors. Received November 27, 2017; published August 24, 2018.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
al., 2017; Acerbi et al., 2012; Bakker et al., 2017;
Kording et al., 2007). Both of these approaches may
select stimuli that are uninformative for model com-
parison, resulting in a large number of trials to
accurately distinguish different models.
A more efficient approach is to select stimuli that
optimize some criterion (often referred to as a utility
function). The idea of utility-based stimulus selection
has been studied extensively in statistics and machine
learning, typically called active learning (Gardner et al.,
2015; Kulick, Lieck, & Toussaint, 2014), adaptive
design optimization (Cavagnaro, Myung, Pitt, &
Kujala, 2010), and optimal experiment design (Di-
Mattina & Zhang, 2011). These types of algorithms
have been applied to a wide range of problems
including neuronal tuning curve estimation (Pillow &
Park, 2016), testing for deficits in auditory perception
(Gardner et al., 2015), and machine classification
(Houlsby, Huszár, Ghahramani, & Lengyel, 2011), but
are not commonly employed in psychophysics. For a
more comprehensive review on the application of
adaptive stimulus selection in sensory systems neuro-
science see DiMattina and Zhang (2013).
Within psychophysics, selecting stimuli in an
adaptive manner has been used extensively for
estimating the parameters of a specific psychophysical
model. For example, Kontsevich and Tyler (1999)
used an information theoretic approach to estimate
the slope and threshold parameters of a one-dimen-
sional psychometric function, selecting on each trial
the stimulus that maximizes the information gain
about these parameters. Additional work then im-
proved on this by marginalizing out unwanted
parameters in order to improve the estimates of
desired parameters (Prins, 2013). However, many
psychophysical models are not unidimensional and as
such this approach was extended to multidimensional
models (DiMattina, 2015; Kujala & Lukka, 2006;
Lesmes, Lu, Baek, & Albright, 2010).
What if instead of inferring the parameters of these
multidimensional models, we wish to dissociate differ-
ent models? Wang and Simoncelli (2008) developed an
algorithm specifically designed for generating stimuli
on a trial-to-trial basis to compare two psychophysical
models. However, in many cases there are more than
two candidate models. More recent work used an
information theoretic approach to derive a method for
optimal stimulus selection to compare an arbitrary
number of models (DiMattina, 2016). However, this
approach does not determine the optimal stimulus on a
trial-to-trial basis and therefore may be a suboptimal
approach. Recently, a general approach for determin-
ing the optimal stimulus to compare multiple models
has been proposed in the field of cognitive science
(Cavagnaro, Gonzalez, Myung, & Pitt, 2013; Cavag-
naro et al., 2010; Cavagnaro, Pitt, & Myung, 2011).
This approach, named Adaptive Design Optimization
(ADO), which simulates the utility distribution of
possible stimuli, can be done on a trial-to-trial basis
(Cavagnaro et al., 2013) and could be used to
distinguish more than two models. This makes it a
potentially powerful tool to select stimuli for compar-
ing models of psychophysical data. However, imple-
menting this approach requires a detailed
understanding of Monte Carlo–based simulation ap-
proaches such as particle filtering and simulated
annealing.
This difficulty may prohibit widespread adoption of
ADO. Therefore, we present an alternative and easier
to implement algorithm for selecting stimuli on a trial-
to-trial basis to dissociate multiple models of psy-
chophysical data. The algorithm is a generalization of
the classical psi-method (Kontsevich & Tyler, 1999;
Prins, 2013), shifting from estimating parameters of
models to comparing models. In order to test our
algorithm, we applied it to two very different
psychophysical problems. First, we tested dissociating
distinct models of sensory noise that affect speed
perception. In order to do this we constructed three
generative models, each with its own noise properties,
that were probed by an ideal observer performing a
two-alternative forced-choice (2AFC) task. Stimuli
were either selected randomly or using our adaptive
algorithm. We found that when stimuli are selected
adaptively, the accuracy of model comparison im-
proved. We also tested our algorithm in real subjects
by inferring which of three sensory noise models best
explains their behavior in a speed perception task. To do this, we used a psychophysical experiment in which stimuli were selected randomly, adaptively, or using the more classical approach of measuring psychometric curves around a variety of fixed references. The adaptive procedure converged to the model proposed in earlier work (Stocker & Simoncelli, 2006), whereas the
random sampling method was often inconclusive
about the underlying noise model. Second, we tested
the algorithm on dissociating two models of saccadic
target selection under whole body acceleration (Rin-
con-Gonzalez et al., 2016). Based on the original
experimental data it is hard to dissociate between an
acceleration-dependent or acceleration-independent
target selection model at the individual subject level.
However, using simulations, we show that selecting
the stimuli adaptively could have led to stronger
conclusions during model comparison. We conclude
that our technique is more accurate and faster than the
current methods to dissociate psychophysical models.
In addition, we provide a Python implementation of
our algorithm, as well as the code and data to perform
the simulations and analysis presented.
Methods
Our algorithm is based on an experimenter wishing to determine which of a set of m discrete psychophysical models best describes subjects' behavior, under the assumption that the model underlying subjects' behavior is contained in the set of models. Under a traditional experimental approach an experimenter would present a number of stimuli x to a subject and obtain the corresponding responses r to these stimuli. Using Bayes' rule, we can compute the probability of a particular psychophysical model m given the responses and stimuli as:

p(m \mid r, x) = \frac{p(r \mid x, m)\, p(m)}{\sum_{m} p(r \mid x, m)\, p(m)} \qquad (1)

where p(m) is the prior probability of each model m, p(m | r, x) is the posterior distribution over models, and p(r | x, m) is referred to as the marginal likelihood. The marginal likelihood is obtained by marginalizing over the parameters θ of the particular model:

p(r \mid x, m) = \sum_{\theta} p(r \mid x, \theta, m)\, p(\theta \mid m) \qquad (2)
Equation 1 makes it clear that our ability to dissociate models depends on the stimuli x that were presented to the subject. Different stimuli and responses produce different posterior distributions over models. We can characterize the quality of a possible posterior using a particular utility function. Following previous work in model comparison, we use the entropy of the posterior distribution to characterize its quality (Cavagnaro et al., 2013; Cavagnaro et al., 2010; Cavagnaro et al., 2011; DiMattina, 2016):

H(x, r) = -\sum_{m} p(m \mid x, r) \log p(m \mid x, r) \qquad (3)

A posterior with lower entropy entails more certainty about which model underlies the subjects' behavior. A minimal-entropy distribution across models would be a posterior that places a mass of 1 on a single model and 0 on all others.
How should we select stimuli to minimize the expected entropy of the model posterior? Here we propose using a similar approach to that used previously for minimizing the entropy of a parameter posterior (Kontsevich & Tyler, 1999), by numerically calculating on each trial the stimulus that minimizes the expected entropy of the model posterior. For our algorithm, we represent the possible stimuli on each trial x and the parameters θ on discrete grids, similar to Kontsevich and Tyler (1999). This requires three quantities: a prior distribution over models p(m), a prior distribution over parameters for each model p(θ | m), and a likelihood look-up table for each model p(r | x, θ, m), which represents the probability of a response given a model and parameter set. Using these quantities, we can design an iterative algorithm to select the optimal stimulus on a trial-to-trial basis, which is as follows:
1. Calculate for each model and all possible stimuli the marginal likelihood of a response at trial t given stimulus x:

   p_t(r \mid x, m) = \sum_{\theta} p(r \mid x, \theta, m)\, p_t(\theta \mid m)

2. Compute the posterior distribution over models given response r in the next trial to stimulus x:

   p_t(m \mid r, x) = \frac{p_t(r \mid x, m)\, p_t(m)}{\sum_{m} p_t(r \mid x, m)\, p_t(m)}

   Note that \sum_{m} p_t(r \mid x, m)\, p_t(m) can also be written p_t(r \mid x) and should be stored, as this term is also used in Step 4.

3. Compute the entropy of the posterior distribution over models given presented stimulus x and response r:

   H_t(x, r) = -\sum_{m} p_t(m \mid x, r) \log p_t(m \mid x, r)

4. Because the response is unknown before the trial, marginalize over all possible responses to obtain the expected entropy:

   E[H_t(x)] = \sum_{r} H_t(x, r)\, p_t(r \mid x)

5. Find the stimulus that produces the posterior with the minimum expected entropy:

   x_{t+1} = \arg\min_{x} E[H_t(x)]

6. Use x_{t+1} as the stimulus on the next trial to receive response r_{t+1}.

7. Because Step 1 requires a prior on the parameters p_t(\theta \mid m), this prior must be recursively updated in addition to the model prior. As such, we set the parameter and model priors to their posteriors:

   p_t(\theta \mid m, r_{t+1}, x_{t+1}) = \frac{p_t(\theta \mid m)\, p(r_{t+1} \mid x_{t+1}, \theta, m)}{\sum_{\theta} p_t(\theta \mid m)\, p(r_{t+1} \mid x_{t+1}, \theta, m)}

   p_{t+1}(\theta \mid m) = p_t(\theta \mid m, r_{t+1}, x_{t+1})

   p_{t+1}(m) = p_t(m \mid r_{t+1}, x_{t+1})

8. Return to the first step until the desired number of trials is completed or sufficient model evidence has been obtained.
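To make the recursion above concrete, the following is a minimal NumPy sketch of one way this grid-based loop could be implemented for models with binary responses on fixed stimulus and parameter grids. The array layout, the function name, and the respond callback are our own illustration under these assumptions; they are not the released implementation.

```python
import numpy as np

def adaptive_model_selection(likelihoods, prior_theta, prior_m, n_trials, respond):
    """Grid-based adaptive stimulus selection for model comparison.

    likelihoods  list, one array per model m, of shape (n_stimuli, n_theta_m)
                 giving p(r = 1 | x, theta, m) on the stimulus/parameter grids.
    prior_theta  list of arrays p(theta | m), each summing to 1.
    prior_m      array p(m), summing to 1.
    respond      callable mapping a stimulus index to an observed response (0 or 1).
    """
    p_theta = [np.array(pt, dtype=float) for pt in prior_theta]
    p_m = np.array(prior_m, dtype=float)
    for _ in range(n_trials):
        # Step 1: marginal likelihood p_t(r = 1 | x, m), and its complement for r = 0
        p_r1 = np.stack([L @ pt for L, pt in zip(likelihoods, p_theta)], axis=1)
        p_r0 = 1.0 - p_r1
        # Step 2: model posterior for every candidate stimulus and either response
        post1 = p_r1 * p_m
        p_r1_x = post1.sum(axis=1, keepdims=True)   # p_t(r = 1 | x)
        post1 /= p_r1_x
        post0 = p_r0 * p_m
        p_r0_x = post0.sum(axis=1, keepdims=True)   # p_t(r = 0 | x)
        post0 /= p_r0_x
        # Step 3: entropy of each possible model posterior
        H1 = -(post1 * np.log(post1 + 1e-12)).sum(axis=1)
        H0 = -(post0 * np.log(post0 + 1e-12)).sum(axis=1)
        # Step 4: expected entropy, marginalizing over the unknown response
        expected_H = H1 * p_r1_x[:, 0] + H0 * p_r0_x[:, 0]
        # Step 5: stimulus (grid index) with minimum expected entropy
        x = int(np.argmin(expected_H))
        # Step 6: present the stimulus and observe the response
        r = respond(x)
        # Step 7: set parameter and model priors to their posteriors
        for L, pt in zip(likelihoods, p_theta):
            lik = L[x] if r == 1 else 1.0 - L[x]
            pt *= lik
            pt /= pt.sum()
        p_m = post1[x] if r == 1 else post0[x]
    return p_m, p_theta
```

In our experiments the stimulus x is a combination of values (for example, a probe and a reference speed), so the stimulus grid indexes all allowed combinations.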
Experiment 1: Velocity judgment
Introduction
Most computational models of perception and action adopt a particular assumption about how sensory uncertainty depends on the stimuli presented.
For example, there are models that assume sensory
noise is constant and independent of the stimuli
presented (Kording et al., 2007; Weiss, Simoncelli, &
Adelson, 2002), some assume a linear increase in the
standard deviation of the noise with the stimulus
magnitude (Battaglia, Kersten, & Schrater, 2011; San-
born & Beierholm, 2016), others take a combination of
these two (Odegaard, Wozny, & Shams, 2015;
Petzschner & Glasauer, 2011; Stocker & Simoncelli,
2006). To our knowledge only a few papers made an
explicit comparison between sensory noise models
(Acerbi et al., 2017; Acerbi et al., 2012; Jazayeri &
Shadlen, 2010). A striking finding in these comparison
studies is that the sensory noise model can vary among
subjects (Acerbi et al., 2017; Acerbi et al., 2012). Given
that the predictions of complex models—for example,
models of multisensory integration (Acerbi et al.,
2017)—are dependent on the assumed sensory noise
model, it is important to accurately characterize each subject's sensory noise. It is therefore essential to validate the assumed sensory noise model.
One way to validate these assumptions is by
performing an additional experiment designed to
estimate the observer’s sensory noise model. However,
performing an additional experiment requires more
time and resources. Being able to minimize the number
of trials required to perform this type of comparison (as
well as increasing the inference accuracy) is therefore
beneficial. This presents a potential use of our
algorithm—a method to validate sensory noise models
and infer them for use in more complex models. Here,
we use both simulation and a behavioral experiment to
demonstrate that our algorithm can be used to facilitate
inference of a subject’s sensory noise model. More
specifically, as an illustrative example, we focus on
inferring the sensory noise model underlying speed
perception. We used this paradigm for two reasons.
First, it is experimentally quick to test so we can
compare our algorithm to other methods of stimuli
selection. Second, previous work assumed a sensory
noise model that consisted of both a constant
component (the sensor is not perfect even when speed is
zero) and a component that linearly increases with
speed (Stocker & Simoncelli, 2006) and thus we can
compare our inference to this model.
Methods
Models
In order to test between different sensory noise
models we need to specify a model of the subjects’
responses. We derived a simple 2AFC model of subject
responses using signal detection theory (see Appendix
A). This leads to the response probability given a probe s2 and a reference s1, described by:

p(r \mid s_2, s_1, \theta) = \lambda + (1 - 2\lambda)\, \Phi\left(s_2 - s_1;\; a,\; \sigma_2^2(m) + \sigma_1^2(m)\right) \qquad (4)

in which Φ is the cumulative density function of a Gaussian distribution, evaluated at the point s2 − s1, with mean a and variance σ2²(m) + σ1²(m); σ2²(m) and σ1²(m) are the variances of the sensory noise for the probe and reference stimuli respectively, λ is a lapse rate accounting for trials on which the observer guesses randomly, and a is a bias parameter accounting for biases in the subject's responses. We assume the subject's
sensory noise changes with the stimulus in one of three
ways. The first, and simplest model, assumes sensory
variance is independent of the stimulus. We denote this
the constant noise model. The second model assumes
that the standard deviation of the sensory noise
increases linearly with the signal intensity, and thus has
zero standard deviation if the signal is absent. This
model is referred to as the Weber model. Finally, we
consider a model where the sensory noise is nonzero
when the signal is absent and also has a linearly
increasing part, which we will refer to as the generalized
model.
For the constant model, we assume the sensory variance is constant, σ² = (5b)² (this parameterization allows b to be kept in a similar range for each model); for the Weber model we assume σ² = (bs)², and for the generalized model we assume σ² = c² + (bs)². The above response model means we can parametrize a subject's response behavior (regardless of model) using four parameters, θ = [a, b, c, λ].
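To show how Equation 4 and these three noise models turn into the likelihood look-up table used by the algorithm, here is a small sketch; the function names and defaults are our own, while the parameterization follows the text:

```python
import numpy as np
from scipy.stats import norm

def noise_sd(s, model, b, c=0.0):
    """Sensory noise standard deviation for a stimulus speed s under each model."""
    s = np.asarray(s, dtype=float)
    if model == "constant":
        return 5.0 * b * np.ones_like(s)        # sigma^2 = (5b)^2
    if model == "weber":
        return b * s                            # sigma^2 = (b s)^2
    if model == "generalized":
        return np.sqrt(c**2 + (b * s)**2)       # sigma^2 = c^2 + (b s)^2
    raise ValueError("unknown model: %s" % model)

def p_probe_faster(s2, s1, model, a, b, c=0.0, lam=0.0):
    """Equation 4: probability of reporting the probe s2 as faster than the reference s1."""
    var = noise_sd(s2, model, b, c) ** 2 + noise_sd(s1, model, b, c) ** 2
    return lam + (1.0 - 2.0 * lam) * norm.cdf(s2 - s1, loc=a, scale=np.sqrt(var))
```

Evaluating p_probe_faster over the stimulus and parameter grids of Table 1, for each of the three models, would produce the likelihood look-up tables p(r | x, θ, m) required in Step 1 of the algorithm.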
Simulation experiment
In order to investigate whether using our adaptive
algorithm facilitates comparison of sensory noise
models, we first performed a simulation experiment. To
this end, we need to specify the grids to use for the
stimuli and parameters as well as the priors. The lower
bound, upper bound, and number of steps for all
variables are shown in Table 1. For the prior over
parameters p(θ | m), we assumed a uniform discrete
distribution for each parameter and that the parame-
ters are independent. Finally, for the prior over models
p(m) we used a uniform distribution over the three
models.
As different subjects could have different parameters
and noise models, it is important to test our algorithm
over a wide range of parameters and models. As such,
we first generated 2,000 possible parameter combina-
tions. The parameters were drawn independently from
a continuous uniform distribution with the same upper
and lower bounds as those specified in Table 1. Next, in
order to assess how well we can infer the correct
generative model, we simulated 750 trials from each
model for each parameter combination. This entailed
using the same parameter combination for each model
(as the constant and Weber models are not dependent
on c, it was not used for these models). The stimuli for
these trials were either selected adaptively using our
algorithm, or randomly from the same stimulus grid.
This led to a total of 12,000 simulated datasets.
We used uniform priors to match the uniform
distribution we drew our parameters from. In practice
any prior distribution could be used, but if it is
continuous, the grid representation will create a
discrete approximation. We also performed an addi-
tional simulation using a truncated Gaussian parameter
distribution (Supplementary Material S1) to better
assess the performance of our algorithm.
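As a purely illustrative sketch of how one such simulated dataset could be generated under random sampling (the grid and parameter values below are arbitrary placeholders, not those of Table 1), assuming a synthetic observer with generalized noise:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
s_grid = np.linspace(0.3, 9.0, 20)        # illustrative speed grid (deg/s)
a, b, c, lam = 0.1, 0.2, 0.5, 0.02        # one hypothetical generalized-model observer

def simulate_response(s2, s1):
    """Bernoulli response of the synthetic observer under Equation 4."""
    var = (c**2 + (b * s1)**2) + (c**2 + (b * s2)**2)
    p = lam + (1.0 - 2.0 * lam) * norm.cdf(s2 - s1, loc=a, scale=np.sqrt(var))
    return int(rng.random() < p)

# Random-sampling condition: probe and reference drawn uniformly from the grid.
trials = [(s1, s2, simulate_response(s2, s1))
          for s1, s2 in zip(rng.choice(s_grid, 750), rng.choice(s_grid, 750))]
```

For the adaptive condition, the same simulate_response function would instead be passed (via a stimulus index) to the adaptive loop sketched in the Methods.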
Real experiment
We also tested whether our algorithm could facilitate
model comparison in actual subjects. This was done
using a 2AFC speed judgment task in which stimuli
were selected in one of three ways: adaptively (using
our algorithm), randomly (from the same stimulus grid
as adaptive), or using the traditional approach of
measuring separate psychometric curves for different
reference values (Stocker & Simoncelli, 2004, 2006)
using the psi algorithm (Kontsevich & Tyler, 1999). We
tested six naive subjects (four female, aged 25–34). The
experiment was approved by the local ethics committee
of the Social Sciences Faculty of Radboud University.
In accordance with the Declaration of Helsinki, written
informed consent was obtained from all subjects prior
to the experiment.
The stimuli consisted of two drifting Gabor patches
and a black fixation dot, which were drawn using
PsychoPy (Peirce, 2009). Both patches were 3° of visual angle in size, with a spatial frequency of 1.5 cycles/deg; the contrast of each was set to 90%, and the stimuli were drawn at 6° on either side of fixation. The background was gray with a luminance of 95.17 cd/m². The fixation dot was 0.2° in size and drawn in the center of the screen. The stimuli were displayed at a resolution of 1,024 × 768 on a gamma-corrected 17-in. Iiyama HM903DTB monitor (Iiyama, Tokyo, Japan) viewed from a distance of approximately 43.5 cm.
On each trial, the subject saw both Gabors drift simultaneously and horizontally for 1 s. Both Gabors moved in the same direction on a given trial (the direction, left or right, was selected randomly for each trial). One Gabor (the reference) drifted with speed s1 °/s and the other (the probe) with speed s2 °/s. The subject
was asked to judge which of the two was faster and
indicate this with a button press. The position of the
reference stimulus (left or right of fixation) was
randomized on each trial. The experiment was split into
two sessions, the ordering of which was counterbal-
anced across subjects. In one session (algorithm
session) subjects performed 1,500 trials, 750 of which
were adaptive trials and 750 were random trials. On an
adaptive trial, the Gabor speeds were selected using our
algorithm based on the previous stimuli (and responses)
generated by this algorithm; on a random trial the
speed of each Gabor was selected randomly from the
stimulus grid. The stimuli and parameter grids used
were the same as for the simulation experiment. In this
session the screen was refreshed at 72 Hz.
In another session (psi session), subjects performed 750 trials designed to measure their psychometric curve for five reference values (150 trials per reference; see Table 2 for the reference values used). On each trial, s1 was randomly selected from a set of five possible values, and the value of s2 on that trial was then selected using the psi-marginal algorithm (Prins, 2013; see Table 2 for the grids used). This was done in order to maximize the information gain about μ (the point of subjective equality) and σ (the standard deviation) for this particular value of s1, under the assumption that the probability of a subject's response follows:

p(r \mid s_2, s_1) = \lambda + (1 - 2\lambda)\, \Phi(s_2; \mu, \sigma^2) \qquad (5)

In this equation σ² is the variance of the normal distribution and μ is the mean of the distribution. Selecting stimuli in this manner allows us to assess how effective the more traditional fixed-reference approach is at separating sensory noise models compared to our algorithm. In this session, stimuli were refreshed at 144 Hz.
Table 1. Parameter grids used for simulation Experiment 1 and the adaptive and random conditions in our subject experiment.

Variable     Lower bound grid   Upper bound grid   Number of steps
s1 (deg/s)   0.6                9                  10
s2 (deg/s)   0.3                9                  20
a (deg/s)    -0.6               0.6                17
b ()         0.01               0.5                25
c (deg/s)    0                  2                  20
λ ()         0                  0.1                10

Note: s1 is the reference speed stimulus, s2 is the probe speed stimulus, a is a bias parameter, b is a scaling parameter for the subject's sensory uncertainty, c is the base sensory uncertainty of an observer (only used in the generalized model), and λ is the lapse rate of an observer.
Note that the probe s2 had a denser grid in this session (see Table 2 compared to Table 1); this allows us to better estimate the psychometric curve of each subject but may also give an advantage to this method in terms of model comparison. Prior to each session, subjects performed 20 practice trials from the respective session.
Analysis
For our analysis, we used Python 2.7 (Python
Software Foundation, https://www.python.org) and
additional Python-based toolboxes, primarily SciPy
(Jones, Oliphant, Peterson, & others, 2001), Numpy
(Walt, Colbert, & Varoquaux, 2011), Matplotlib
(Hunter, 2007), scikit-learn (Pedregosa et al., 2011),
and Pandas (McKinney, 2010). The data and code for
this article can be found at http://hdl.handle.net/11633/
di.dcc.DSC_2017.00053_185; in addition, a standalone
implementation of the algorithm can be found at
https://gitlab.socsci.ru.nl/sensorimotorlab/
AdaptiveModelSelection.
In addition to computing the model probabilities for
every subject for the different sampling methods, we
also estimated each subject’s parameters for each model
by maximizing the log-likelihood of the parameter
values based on the subject’s responses (to increase
accuracy we pooled the data from all sessions). This
provides more sensitive parameter estimates than the
grid we used for model comparison and also allows us
to check the parameters are not close to the edges of the
grids we used.
We assumed the subject's responses are independent across trials. The subject's response probability on each trial can then be computed using Equation 4. The log-likelihood of a parameter set given a subject's entire data set is given by

\log L(\theta) = \sum_{i=1}^{2250} \log \mathrm{Bern}\left(r_i;\, p(r_i \mid s_{2i}, s_{1i}, \theta)\right) \qquad (6)

in which i is the trial index, r is the vector of subject responses, s1 is the vector of reference stimuli, s2 is the vector of probe stimuli, and Bern stands for a Bernoulli distribution.
Parameter estimates θ̂ were then obtained by minimizing the negative log-likelihood:

\hat{\theta} = \arg\min_{\theta} \left(-\log L(\theta)\right) \qquad (7)
This optimization was done numerically using the L-BFGS-B algorithm (Byrd, Lu, Nocedal, & Zhu, 1995), implemented in SciPy (Jones et al., 2001) and applied via the scipy.optimize.minimize function. L-BFGS-B is an iterative algorithm designed to optimize a nonlinear function subject to parameter boundaries (Byrd et al., 1995).
Table 2. Parameter grids used in our fixed-reference condition.

Variable     Lower bound grid   Upper bound grid   Number of steps   Prior       s1 (deg/s)
μ (deg/s)    0.001              3                  41                N(0.5, 2)   0.5
σ (deg/s)    0.01               3                  51                U           0.5
λ ()         0                  0.1                15                B(2, 20)    0.5
s2 (deg/s)   0.01               3                  61                N/A         0.5
μ (deg/s)    0.001              4                  41                N(1, 2)     1
σ (deg/s)    0.01               4                  51                U           1
λ ()         0                  0.1                15                B(2, 20)    1
s2 (deg/s)   0.01               4                  61                N/A         1
μ (deg/s)    0.1                6                  41                N(2, 2)     2
σ (deg/s)    0.01               6                  51                U           2
λ ()         0                  0.1                15                B(2, 20)    2
s2 (deg/s)   0.1                6                  61                N/A         2
μ (deg/s)    1                  9                  41                N(4, 2)     4
σ (deg/s)    0.01               9                  51                U           4
λ ()         0                  0.1                15                B(2, 20)    4
s2 (deg/s)   1                  9                  61                N/A         4
μ (deg/s)    0.001              14                 41                N(8, 2)     8
σ (deg/s)    0.01               14                 51                U           8
λ ()         0                  0.1                15                B(2, 20)    8
s2 (deg/s)   3                  14                 61                N/A         8

Note: N(a, b) indicates the prior was normally distributed with mean a and standard deviation b, U indicates a discrete uniform distribution, and B(a, b) indicates a beta distribution with shape parameters a and b. The values for s1 were determined based on previous work on speed perception (Stocker & Simoncelli, 2004). The prior for μ was selected based on the assumption that the psychometric curve for a two-alternative forced-choice task will be close to unbiased. The prior for λ was selected based on recommendations for the psignifit toolbox (Fründ, Haenel, & Wichmann, 2011; see http://psignifit.sourceforge.net/).
The parameter bounds were set to those in Table 1, with the exception of b, which had bounds of [0.001, 1]. To ensure a global minimum was found, we used 200 random initializations and selected the parameter set with the highest log-likelihood. The initial values were obtained by drawing each parameter value from a continuous uniform distribution with the same bounds as those specified above.
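As a sketch of this fitting procedure for the generalized model (our own function names; the bounds follow the text above, and the restart loop stands in for the 200 random initializations), with s1, s2, and resp given as NumPy arrays:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(theta, s1, s2, resp):
    """Negative log-likelihood (Equation 6) for the generalized noise model."""
    a, b, c, lam = theta
    var = (c**2 + (b * s1)**2) + (c**2 + (b * s2)**2)
    p = lam + (1.0 - 2.0 * lam) * norm.cdf(s2 - s1, loc=a, scale=np.sqrt(var))
    p = np.clip(p, 1e-9, 1.0 - 1e-9)                 # guard the logarithm
    return -np.sum(resp * np.log(p) + (1 - resp) * np.log(1.0 - p))

def fit_generalized(s1, s2, resp, n_starts=200, seed=0):
    """L-BFGS-B with random restarts; returns the best scipy result found."""
    bounds = [(-0.6, 0.6), (0.001, 1.0), (0.0, 2.0), (0.0, 0.1)]   # a, b, c, lambda
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        res = minimize(neg_log_likelihood, x0, args=(s1, s2, resp),
                       method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best
```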
In order to validate the results of the grid-based model comparison we also computed the Akaike Information Criterion (AIC) for each of the models. This is a metric that summarizes how well a model fits the data (higher likelihood) while correcting for the number of parameters (Akaike, 1974; Burnham & Anderson, 2002),

\mathrm{AIC} = 2k - 2 \log L(\hat{\theta}) \qquad (8)

in which k is the number of parameters of the model. It is important to note that computing model probabilities using Equation 1 also implicitly corrects for the number of parameters (MacKay, 2003).
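In code the criterion is a one-liner; for instance, assuming a fit like the sketch above, where res.fun holds the minimized negative log-likelihood:

```python
def aic(n_params, max_log_likelihood):
    """Akaike Information Criterion, Equation 8."""
    return 2 * n_params - 2 * max_log_likelihood

# Hypothetical use: aic(4, -res.fun) for the 4-parameter generalized model;
# Delta-AIC values (as in Tables 3 and 5) are differences from the lowest AIC.
```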
Results
Simulation experiment
Figure 1 shows the model probabilities over trials
averaged across the different parameter sets from our
simulation experiment. As expected, the model proba-
bilities trend towards 1 along the diagonal, indicating
that both adaptive and random sampling converge
towards the correct model. This demonstrates that our
algorithm does not introduce any bias during model
comparison, even when the number of parameters
differs between models. It can also be observed that the
probability of the correct model rises faster and is higher
when we select stimuli adaptively (green curves) than
when stimuli are selected randomly (orange curves). This
indicates the strength of evidence towards the correct
model is higher when we use adaptive sampling.
Although Figure 1 provides evidence that adaptive
sampling improves the strength of evidence towards the
correct model, it does not quantify how this increase
would affect the conclusions of an experiment. In order
to quantify the practical benefit of adaptive sampling, we
computed the ratio of the probability of the generative
model against the other models (commonly referred to
as the Bayes factor). This ratio represents how much
more probable one model is than the other model
(MacKay, 2003). Because we consider three models, this
yields two Bayes factors, which the experimenter can use
to decide whether there is significant evidence in favor of
a particular model. A commonly used criterion is that
a Bayes factor over 3 indicates positive evidence towards
this model (Kass & Raftery, 1995).
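Because the model prior here is uniform, each Bayes factor is just the ratio of two posterior model probabilities; a small helper (our own, for illustration) could look like this:

```python
import numpy as np

def bayes_factors(posterior, prior=None):
    """Pairwise Bayes factors from posterior model probabilities.

    With a uniform prior, the Bayes factor of model i against model j is simply
    posterior[i] / posterior[j]; a non-uniform prior is divided out first.
    """
    posterior = np.asarray(posterior, dtype=float)
    if prior is None:
        prior = np.full_like(posterior, 1.0 / posterior.size)
    odds = posterior / np.asarray(prior, dtype=float)
    return odds[:, None] / odds[None, :]   # BF[i, j] compares model i to model j

# Example: conclude in favor of model 0 when bayes_factors(p_m)[0, 1:] are all above 3.
```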
Figure 2 shows the proportion of simulations where
the Bayes factors for the correct model against the
other two models were both over 3. This represents the
proportion of simulations in which we would find
evidence in favor of the correct model. We see that
adaptive sampling has a higher proportion than
random sampling, indicating an experimenter would
conclude in favor of the correct model more often using
adaptive sampling. For example, an experimenter
would be twice as likely to find strong evidence in favor
of the correct model using our approach if the
underlying model was the generalized one.
Figure 1. Evolution of model probabilities over trials for
different generative models and algorithms. Columns indicate
the model used to generate the data; rows indicate the
probability of each model. The dark lines indicate the mean
probability averaged over simulations; light lines indicate
example simulations. Green coloring indicates stimuli were
selected adaptively; orange coloring indicates stimuli were
selected at random from the same stimulus grid.
Figure 2. Proportion of simulations where both Bayes factors of
the generative model relative to an alternative model is over 3,
plotted as a function of the number of trials. Each column
indicates the model used to generate the data.
While Figure 2 shows that adaptive sampling increases the probability of concluding in favor of the true generative model, it is not apparent why the
proportion of Bayes factors over 3 is lower when
stimuli are selected randomly. One possibility is that
random sampling still supports the true generative
model but the strength of this support is insufficient;
another possibility is that random sampling supports
the incorrect model.
In order to explore these possibilities we plotted the
probability of the correct model for each sampling
method as a function of b and c (see Figure 3). Figure 3
shows that the model probabilities are primarily green
to yellow when the generative model is Weber or
Constant. This indicates both methods mostly select the
correct model. We can also see that in general adaptive
sampling produces model probabilities that trend closer
to 1 (i.e., yellow), indicating stronger evidence in favor
of the correct model. When the generative model is the
Generalized model, a substantial number of simula-
tions produce probabilities supporting alternative
models (indicated by the blue shading). At first this
seems counterintuitive. However, for small values of c
and b, the generalized model becomes almost equiva-
lent to the Weber and constant model. Because these
have fewer parameters, they are favored in this
situation.
Actual experiment
The previous section suggests that, in simulation,
adaptive sampling provides a large benefit to model
comparison. We next tested whether this improvement
also transfers to actual experiments. Figure 4 shows the
model probabilities of each subject obtained from our
speed perception experiment and the average across
subjects. As shown, on average adaptive sampling
supports the generalized model, which is consistent with
previous work (McKee, Silverman, & Nakayama, 1986;
Stocker & Simoncelli, 2006). By contrast, both random
sampling and sampling from the psi algorithm are
indecisive as to the underlying noise model. The reason
follows from inspecting the individual subject data.
When stimuli are selected adaptively, the probability of
the generalized model is high for all subjects. By
contrast, random sampling supports the Weber model
for three subjects and the generalized for the others
(although the probability is lower than that found from
adaptive sampling). The psi session provides similar
results to the random session: three subjects are best
described by a generalized model and the remaining by
the Weber model.
Figure 3. Probability of the generative model as a function of parameter values for different generative models and algorithms. Columns indicate the model used to generate the data; rows indicate the sampling method used to determine stimuli. Each point indicates the probability of the correct model as a function of the parameters c and b for one simulation. Note, the Weber and constant models are independent of c and thus model probabilities do not change systematically as a function of c. The c value plotted refers to the c used in the generalized model for this simulation; all other parameters are shared between the models. The red ellipse indicates the mean ± 2 SDs of the subjects' parameter estimates for c and b obtained from the generalized model (see Table 3).
Given that the findings of the different sampling methods are disparate, we also computed AIC values on the data of all sessions grouped together, which allows us to assess which model is best based on the entire data set (see Table 3). As shown in this table,
the AIC results favor the generalized model for every
subject, indicating that the results of the adaptive
sampling method are comparable to the results of the
grouped data. In addition, to assess the possibility that
our adaptive technique was supporting the incorrect
model, we performed additional simulations to verify
that the observed differences between the sampling
methods are as expected. Indeed, when the data is
generated from the generalized model, the random
sampling method often converges to the wrong (i.e.,
Weber) model (see Supplementary Material S1). Together, this suggests the conclusions drawn from the adaptive sampling method are more accurate than conclusions drawn from either random sampling or measuring independent psychometric curves.
Although the results of the model comparison match
previous work, it is important to note that a model
being the most likely does not entail it fits the data well,
just that it fits better than the other models. It is
important to check the predictions of the models
against the data.
Figure 5 illustrates the data of each subject obtained
from the psi session as well as the predicted psycho-
metric curves obtained from fitting the models to the
data obtained from the adaptive algorithm only
(therefore the models were not fit to the data shown). As
shown, the constant model is in general a poor predictor
of the data. By contrast, both the predictions of the
Weber and generalized model are close to the data. This
matches the results of AIC comparison (see Table 3),
which indicated that the Weber and generalized model
produce better fits to the data than the constant model.
This also means that the assumptions with regard to our
models (see Appendix A) are reasonable.
Another important property of adaptive algorithms
is that they do not sample uniformly across the entire
stimulus space. Instead, the stimuli selected are those
that are most informative to compare the models. In
order to visualize which stimuli these are in this
experiment, we plotted the stimuli selected using the
adaptive method for a representative subject (see
Figure 6). The adaptive sampling method alternates
between high and low speeds for the reference and
probe stimuli. This sampling strategy is sensible as the
noise models make distinct predictions for high and low
speeds and thus sampling at high and low speeds allows
for effective dissociation of the models.
Experiment 2: Target selection
Introduction
The previous section illustrates the use of our
algorithm as a method to dissociate different sensory
noise models. However, this is only one example
comparison. To ensure our algorithm is broadly
applicable, it is important to validate it in multiple
settings. Here, as an additional application, we
consider comparing models of saccadic target selec-
tion during self-motion (Rincon-Gonzalez et al.,
2016), a study recently performed in our lab. This
example allows us to investigate how much benefit our
algorithm provides when the models being compared
are highly nonlinear and the signal-to-noise ratio in
the data is low.
Figure 4. Evolution of model probabilities over trials for each subject. Columns indicate the probability of a particular model; rows indicate the subject. Green lines show the model probabilities when the stimuli were selected adaptively using our algorithm; orange lines indicate the model probabilities when stimuli were selected at random from the same stimulus grid; and blue lines indicate stimuli were selected using the psi algorithm. The lines in the mean plot show the mean model probabilities over subjects; the shaded area indicates ±1 SEM over subjects.
In this experiment, subjects were passively trans-
lated from left to right in a sinusoidal motion profile,
and at eight predefined phases of the oscillation two
targets were presented. The subjects were instructed to
make a saccade to one of the two targets, which were
presented asynchronously with a particular stimulus
onset asynchrony (SOA). This produces a single
psychometric curve of subject’s choice as a function of
SOA for each phase. This curve can then be used to
determine the SOA at which the probability of
selecting each target is equal, referred to as the
balanced time delay (BTD). The experiment showed
that, on the group level, BTD changes sinusoidally as
a function of the motion phase, suggesting that
subjects’ target selection behavior, and thus prefer-
ence, is influenced by current body motion. However,
the amplitude of the modulation was small and the
signal-to-noise ratio was low, which made comparing
a sinusoidal modulation to alternative models difficult
at the individual subject level. Our algorithm may
provide a solution to this difficulty, as adaptive
sampling selects the most informative stimuli to
dissociate the selected models.
Here, we first reanalyze data from this experiment
and show that the data of approximately half of the
subjects are best described by a sinusoidal modulation
rather than a constant choice bias. In other subjects the
results of the model comparison are inconclusive. We
next demonstrate with simulations that using our
algorithm for stimulus selection would have improved
model comparison accuracy. This suggests our algo-
rithm is also useful to help dissociate models in
circumstances where the signal-to-noise ratio is limited.
Methods
Models
In order to test whether self-motion has any effect on
psychophysical choice behavior we consider two
models of choice behavior, a constant bias model and a
sinusoidal bias model (Bakker et al., 2017). We model
choice behavior as:
p(r \mid \phi, \mathrm{SOA}) = \Phi(\mathrm{SOA}; \mu, \sigma) \qquad (9)

in which r is the subject's response, φ is the phase at which the targets are presented, and Φ is a cumulative Gaussian with mean μ and standard deviation σ evaluated at the SOA. For the constant model, μ is a fixed value across phases: μ = a. In this model choices are independent of the phase of the motion. The sinusoidal model entails that μ changes sinusoidally as a function of phase, and is thus written μ = a + b sin(φ + φ0), in which a, b, and φ0 are free parameters representing a subject's fixed bias, the amplitude of the modulation, and the phase offset, respectively. Regardless of the model, we can parameterize the subject's response probability using θ = [a, b, φ0, σ].
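As a minimal sketch (the function name and defaults are ours), both choice models can be expressed with a single response-probability function:

```python
import numpy as np
from scipy.stats import norm

def p_target_choice(soa, phase, a, b=0.0, phi0=0.0, sigma=1.0, sinusoidal=True):
    """Probability of choosing one of the two targets (Equation 9).

    Constant bias model:   mu = a
    Sinusoidal bias model: mu = a + b * sin(phase + phi0)
    """
    mu = a + b * np.sin(phase + phi0) if sinusoidal else a
    return norm.cdf(soa, loc=mu, scale=sigma)
```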
Reanalysis
In order to test whether the individual subject’s
choice behavior is modulated sinusoidally and to
obtain reasonable parameters to utilize in our simula-
tions, we reanalyzed the data of 17 subjects from
Rincon-Gonzalez et al. (2016). We fit both the
sinusoidal and constant bias models to each subject’s
choice data. We assumed the responses are independent
across trials. The response probability on each trial can be computed using Equation 9.
Table 3. Best-fit parameters and AIC differences (ΔAIC, relative to the generalized model) of each model and subject.

Subject   Model         a (deg/s)   b       λ       c (deg/s)   ΔAIC
1         Weber         0.043       0.354   0       N/A         17.459
1         Constant      0.075       0.073   0.1     N/A         202.656
1         Generalized   0.006       0.314   0       0.187       0
2         Weber         0.009       0.223   0.005   N/A         16.031
2         Constant      0.043       0.061   0.054   N/A         220.883
2         Generalized   0.016       0.197   0.004   0.122       0
3         Weber         0.136       0.268   0.055   N/A         171.633
3         Constant      0.027       0.195   0.027   N/A         141.811
3         Generalized   0.049       0.222   0.002   0.584       0
4         Weber         0.034       0.308   0.003   N/A         14.415
4         Constant      0.026       0.052   0.1     N/A         180.728
4         Generalized   0.002       0.272   0.004   0.161       0
5         Weber         0.019       0.256   0.023   N/A         109.731
5         Constant      0.074       0.182   0.007   N/A         151.102
5         Generalized   0.041       0.18    0       0.447       0
6         Weber         0.028       0.424   0       N/A         85.408
6         Constant      0.19        0.188   0.06    N/A         175.483
6         Generalized   0.149       0.303   0       0.533       0
Figure 5. Data and model predictions for psychometric curves measured in the psi session. Each row indicates the psychometric curves of a particular subject; each column indicates the reference value (s1) for this psychometric curve. Gray dots indicate the proportion of trials where observers report s2 > s1; proportions were obtained by binning responses in 10 bins from the minimum to maximum probe value (s2) for this subject and reference (s1) value. Curves indicate the predicted proportion from each of the models. Note, the parameters used for the predictions were obtained from fitting only to stimuli selected using our algorithm and thus were not fit to the data shown.
Figure 6. Stimuli adaptively selected for Subject 2. The left plot shows the probe (blue dots) and reference (red dots) selected on each trial in Experiment 1. The right plot shows a scatter plot of the combination of probe and reference. The radius of the data points is proportional to √N, where N is the number of times this combination was selected.
The log-likelihood of a subject's data set is then

\log L(\theta) = \sum_{i=1}^{N} \log \mathrm{Bern}\left(r_i;\, p(r_i \mid \mathrm{SOA}_i, \phi_i, \theta)\right) \qquad (10)

in which i is the trial index, N is the number of trials, r is a vector of subject responses, SOA is a vector of the SOAs presented to the subject, φ is a vector containing the phases at which the targets were presented, and Bern stands for a Bernoulli distribution.
Parameter estimates θ̂ were then obtained using Equation 7. As before, this optimization was done numerically using the L-BFGS-B algorithm (Byrd et al., 1995). The parameter bounds were set to those in Table 4, with the exception of a, b, and σ, which had bounds of [−250, 250], [0, 250], and [0.1, 250], respectively. To ensure a global minimum was found, we used 300 random initializations and selected the parameter set with the highest log-likelihood. The initial values were obtained by drawing each parameter value from a continuous uniform distribution with the same bounds as those specified above.
In order to validate the results of the grid-based
model comparison we also computed the AIC for each
of the models using Equation 8. As an additional
analysis we fit a cumulative Gaussian (see Equation 5)
to the data from each phase (using the same bounds as
for the constant model and kset to 0) to provide us
with a semiparametric estimate of BTD for each phase.
Simulation experiment
In order to investigate whether using our adaptive
algorithm could help to dissociate these different
models of target selection, we performed a simulation
experiment. The required grids are specified in Table 4.
As priors we used a uniform discrete distribution for
each parameter and a uniform distribution over the two
models.
We first generated 2,000 possible parameter combi-
nations. Parameters were drawn independently from a
continuous uniform distribution with the same upper
and lower bounds as those specified in Table 4. Next, in
order to assess how well we can infer the correct
generative model for each parameter combination, we
simulated a synthetic subject performing 1,000 trials for
each generative model and parameter combination.
Note, the constant model is independent of b and φ0, and thus they were removed from the parameter set when
simulating this model. The stimuli for these trials were
selected either randomly from the stimulus grid shown in
Table 4 or using our adaptive algorithm. This led to a
total of 8,000 simulated datasets. Additional simulations
were performed based on a truncated Gaussian param-
eter distribution, reflecting the estimated behavioral
parameter range (see Supplementary Material S1).
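Such truncated Gaussian parameter draws can be produced with scipy.stats.truncnorm; the mean, standard deviation, and bounds below are placeholders for illustration, not the values used in the Supplementary Material:

```python
from scipy.stats import truncnorm

def sample_truncated_gaussian(mean, sd, lower, upper, size):
    """Draw parameter values from a Gaussian truncated to the grid bounds."""
    a, b = (lower - mean) / sd, (upper - mean) / sd   # bounds in standardized units
    return truncnorm.rvs(a, b, loc=mean, scale=sd, size=size)

# Hypothetical example: modulation amplitudes b (ms) kept inside the Table 4 grid.
amplitudes = sample_truncated_gaussian(mean=15.0, sd=10.0, lower=0.0, upper=60.0, size=2000)
```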
Results
The AIC scores and parameter estimates for both
models are shown in Table 5. In order to interpret the
AIC scores it is useful to note that an AIC difference of
over 4 is considered positive evidence towards the
model with the lower score (Burnham & Anderson, 2002). This suggests the model comparison in eight of the subjects is ambiguous (AIC difference under 4), no subjects are best described by the constant bias model, and nine subjects are best described by the sinusoidal bias model.
Interestingly, it can be seen that even in the ambiguous
cases the amplitude parameter b is not at zero. This
implies the modulation of BTD is sinusoidal but the
effect on the log-likelihood is insufficient to overcome
the penalization for the additional parameters. This is
also supported by the model predictions shown in Figure 7, which illustrate that the sinusoidal model is a closer fit than the constant model to the independent estimate of BTD for each phase.
Table 4. Parameter grids used for simulation Experiment 2.

Variable    Lower bound grid   Upper bound grid   Number of steps
φ (rad)     0                  5.5                8
SOA (ms)    -250               250                25
a (ms)      -70                70                 15
b (ms)      0                  60                 15
φ0 (rad)    -3                 3                  15
σ (ms)      50                 190                15
Figure 7. Sinusoidal and constant model predictions for an example subject and across subjects. For the group, the dashed line indicates the mean predicted BTD across subjects; for the example subject it indicates the predicted BTD. The shaded regions indicate ±1 SEM across subjects. Data points are the BTD obtained by fitting a psychometric curve to each phase. The error bars indicate ±1 SEM across subjects.
In order to explore if our algorithm can facilitate
model comparison, we plotted the average model
probabilities across trials for both models and sampling
methods used in our simulation experiment (see Figure
8). The model probabilities trend to 1 along the
diagonal, indicating both adaptive and random sam-
pling converge towards the correct model. As before,
the probabilities are higher for the adaptive sampling
method compared to random sampling, suggesting that
our algorithm increases the strength of evidence
towards the correct model. The magnitude of this
increase is lower than observed in simulation Experi-
ment 1.
We also quantified how each sampling method
affects the conclusions drawn by computing the Bayes
factor of the generative model against the other model.
These Bayes factors are plotted in Figure 9. Interest-
ingly, if stimuli are selected randomly and the correct
model is sinusoidal we only conclude in favor of it in
60% of the simulations. This matches the mixed
results from the reanalysis. Adaptive sampling in-
creases the proportion of simulations in which we find
strong evidence in favor of the correct model. For the
sinusoidal model, we obtained a benefit of about 15%,
which is a smaller benefit than observed in the noise
model simulation.
In order to explore why the models cannot be
strongly dissociated in each simulation, we plotted the
probability of the correct model as a function of σ and b (see Figure 10).
Table 5. Best-fit parameters and AIC differences (ΔAIC, relative to the sinusoidal bias model) for each model for all subjects.

Subject   Model             a (ms)   b (ms)   φ0 (rad)   σ (ms)    ΔAIC
1         Sinusoidal bias   67.478   15.895   0.002      74.505    0.000
1         Constant bias     67.467   N/A      N/A        76.780    8.469
2         Sinusoidal bias   69.792   19.323   1.653      158.262   0.000
2         Constant bias     69.589   N/A      N/A        160.475   2.810
3         Sinusoidal bias   14.556   8.692    1.436      65.669    0.000
3         Constant bias     14.492   N/A      N/A        66.621    1.480
4         Sinusoidal bias   42.007   47.468   1.161      176.735   0.000
4         Constant bias     42.793   N/A      N/A        190.810   23.854
5         Sinusoidal bias   25.052   1.283    0.607      61.613    0.000
5         Constant bias     25.050   N/A      N/A        61.623    3.871
6         Sinusoidal bias   54.271   15.12    0.481      65.382    0.000
6         Constant bias     54.346   N/A      N/A        67.457    11.753
7         Sinusoidal bias   21.550   24.817   1.21       68.076    0.000
7         Constant bias     21.652   N/A      N/A        73.418    30.514
8         Sinusoidal bias   12.830   14.322   1.012      95.736    0.000
8         Constant bias     12.836   N/A      N/A        97.677    3.968
9         Sinusoidal bias   0.216    16.564   0.529      97.097    0.000
9         Constant bias     0.318    N/A      N/A        99.491    4.859
10        Sinusoidal bias   3.735    15.64    0.956      127.027   0.000
10        Constant bias     3.783    N/A      N/A        129.383   1.729
11        Sinusoidal bias   60.449   13.294   0.295      116.987   0.000
11        Constant bias     60.467   N/A      N/A        118.368   0.652
12        Sinusoidal bias   7.974    16.892   0.394      117.834   0.000
12        Constant bias     8.233    N/A      N/A        119.763   3.079
13        Sinusoidal bias   1.041    19.071   0.359      74.627    0.000
13        Constant bias     1.056    N/A      N/A        77.387    16.147
14        Sinusoidal bias   27.310   17.567   0.847      151.860   0.000
14        Constant bias     27.203   N/A      N/A        154.390   0.969
15        Sinusoidal bias   13.869   10.955   0.759      68.580    0.000
15        Constant bias     13.752   N/A      N/A        69.499    5.662
16        Sinusoidal bias   4.454    48.497   1.085      122.325   0.000
16        Constant bias     4.918    N/A      N/A        141.447   56.525
17        Sinusoidal bias   39.354   14.658   0.227      68.175    0.000
17        Constant bias     39.470   N/A      N/A        70.346    12.939
If the generative model is the constant bias model, both the adaptive and the random sampling methods lead to model probabilities favoring the correct
model but the adaptive method produces only slightly
higher probabilities. Adaptive sampling leads to the
probability of the correct model being slightly higher
(indicated by a more yellow hue), which leads to a
larger proportion of Bayes factors being over 3. By
contrast, when the generative model is the sinusoidal
model, the model probabilities range from strongly in
favor of the sinusoidal model to strongly in favor of the
constant model for both sampling methods. This is
understandable because the smaller the amplitude of
the sinusoid, the closer the sinusoidal model becomes to
the constant model and thus penalizing for the
additional parameters leads to favoring the simpler
constant model. Interestingly, the shift in model
probabilities from sinusoidal to constant is dependent
on the variability of a subject's decisions; the smaller σ is, the lower b can be, while still inferring in favor of the
sinusoidal model.
To determine why adaptive sampling improves the
chance of inferring in favor of the correct generative
model, Figure 11 illustrates the phase and SOA selected
using the adaptive algorithm for an example simula-
tion. In the initial trials, the algorithm samples broadly
over the phase and SOA, but then converges to a few
combinations of SOA and phase. Specifically, adaptive
sampling selects the phases where the BTD is maximal
or minimal and SOA values close to the current a
estimate.
Discussion
Using a series of simulations in which the correct
generative model is known, we show that selecting
stimuli adaptively increases the probability of inferring
the correct generative model. We further show this
increase affects the conclusions an experimenter could
draw. When stimuli are selected adaptively, an experimenter is more likely to conclude strongly in favor of the correct generative model and requires fewer trials to reach this conclusion. For example, in Figure 2, when the generative model is the generalized model, our adaptive algorithm yields strong evidence for the correct model in 60% of the simulations within only 250 trials. By contrast, almost none of the simulations using random sampling showed strong evidence.
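For reference, the Bayes factors reported here follow directly from the posterior model probabilities and the model priors. A minimal example of this conversion, using made-up probabilities, is shown below.

```python
import numpy as np

def bayes_factor(posterior, prior, i, j):
    """Bayes factor for model i over model j, given posterior and prior model probabilities."""
    return (posterior[i] / posterior[j]) * (prior[j] / prior[i])

# Made-up posterior over three models with equal priors:
posterior = np.array([0.80, 0.15, 0.05])
prior = np.ones(3) / 3
print(bayes_factor(posterior, prior, 0, 1))   # ~5.3, exceeding the criterion of 3
```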
We illustrate this model comparison benefit in two
distinct settings—first, dissociating different sensory
noise models and second, dissociating models of target
selection. As an additional step towards practical
application, we also used our algorithm to test between
sensory noise models of human speed perception. We
found that selecting stimuli adaptively increases the
strength of evidence towards the model previously
proposed (Stocker & Simoncelli, 2004, 2006).
Our findings are consistent with previous work in cognitive science showing that models of memory retention can be better dissociated by selecting stimuli adaptively (Cavagnaro et al., 2010; Cavagnaro et al., 2011). We also show that the magnitude of improvement provided by adaptive sampling is highly specific to the models being compared. Specifically, we found a dramatic improvement in dissociating sensory noise models but only a small improvement in dissociating models of saccadic target selection. This indicates that the performance of our algorithm will depend on a variety of factors, including the specific models being compared, where the subjects lie in the models' parameter space, and how coarse the grids are.
Figure 8. Evolution of model probabilities over trials for different generative models and algorithms. Columns indicate the model used to generate the data; rows indicate the probability of each model. The dark lines indicate the mean probability averaged over simulations; light lines indicate example simulations. Green coloring indicates stimuli were selected adaptively; orange coloring indicates stimuli were selected at random from the same stimulus grid.

Figure 9. Proportion of simulations in which the Bayes factor with respect to the generative model is over 3. Each column indicates the model used to generate the data.

Being able to compare models in an efficient manner encourages comparison of different models that may otherwise not be compared. For example, in many
cases the sensory noise model is a single component of
a more complex model (Acerbi et al., 2017; Acerbi et
al., 2012; Jazayeri & Shadlen, 2010). In the aforemen-
tioned work, the possibility of different sensory noise
models is dealt with through model comparison.
However, incorporating multiple sensory noise models adds an additional degree of freedom to the space of possible models, which can introduce difficulties in model comparison (Acerbi, Ma, & Vijayakumar, 2014).
Specifically, multiple models with different components
(for instance, sensory noise, priors, loss functions) can
fit the same data equally well, which makes inferring
the correct components difficult (Acerbi et al., 2014).
This study also indicated a possible solution to this problem: fixing certain model components and parameters based on previous work or additional experiments. As such, an experimenter could perform an additional experiment to test the sensory noise model (and also obtain parameter estimates) for each subject, which could then be fixed in the subsequent model comparison. Our algorithm provides an efficient way to test between noise models in a small number of trials and could therefore be used for this purpose.
Although we illustrate the benefits of using our algorithm in two distinct practical examples, there are limitations to our approach. One major limitation is the grid-based approach used in our algorithm. While this approach is reasonable for the relatively simple models tested here, it is infeasible for more complex models (models with more parameters or more stimulus dimensions). This is because, if a grid of the same size is used for each dimension, the number of points increases exponentially with the number of parameter or stimulus dimensions (DiMattina, 2015). For such models, these grids could exceed the RAM available on many computers, preventing our algorithm from being applied. In addition, more complex models will require more time to compute the optimal stimulus; this currently takes approximately 100 ms for the models considered here, and the additional computation time may render the current implementation infeasible for more complex models.
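To give a rough sense of this scaling, the back-of-the-envelope calculation below (with an arbitrary grid resolution, not the grid sizes used in our experiments) shows how quickly a dense joint grid outgrows typical memory.

```python
# Rough illustration of grid growth: with g points per dimension, a joint grid
# over d dimensions has g**d entries. The grid resolution here is arbitrary.
points_per_dim = 50
for n_dims in (2, 4, 6, 8):
    n_points = points_per_dim ** n_dims
    gigabytes = n_points * 8 / 1e9          # 8 bytes per double-precision entry
    print(f"{n_dims} dimensions: {n_points:.2e} points, ~{gigabytes:.3g} GB per array")
```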
Figure 10. Probability of the generative model as a function of parameter values for different generative models and algorithms. Columns indicate the model used to generate the data; rows indicate the sampling method used to determine stimuli. Each point indicates the probability of the correct model as a function of amplitude β and standard deviation of a subject's choices σ. Note that the constant bias model is independent of amplitude β; thus model probabilities do not change systematically as a function of β. The β value plotted refers to the value used in the sinusoidal model. The red ellipse indicates the mean ± 1 SD of the subjects' parameters obtained from the sinusoidal model (see Table 5).

Figure 11. Stimuli selected adaptively for an example simulation. The upper two plots indicate the phase and SOA sampled across trials. In both plots, the blue dots indicate the sampled stimuli for a particular trial. For the phase plot, the dashed lines indicate the phases (from our stimulus set) for which the BTD is maximal or minimal. For the SOA plot, the dashed line indicates the baseline BTD (the BTD independent of phase modulations). The lower plot is a scatter plot of the combinations of phase and SOA; the radius of each data point is proportional to √N, where N is the number of times that combination was selected.

Fortunately, there are a number of different approaches that can compensate for these problems. One
method is to use an adaptive approach to selecting the
number of grid points and their positions (Kim, Pitt,
Lu, Steyvers, & Myung, 2014; Pflüger, Peherstorfer, & Bungartz, 2010). The notion is that the contribution of
each point in the parameter space is not equal and thus
more points should be used for more informative
regions of the parameter space. This approach,
previously suggested in the context of parameter
estimation (DiMattina, 2015), could allow our algo-
rithm to scale to higher dimensional models, or to more
than three models. Another alternative solution is to
use an analytic approximation to the parameter
posterior—for example, by using a Laplace approxi-
mation (DiMattina, 2015) or by a sum-of-Gaussians
(DiMattina & Zhang, 2011)—and compute the optimal
stimuli based on the approximated posterior. With such
an approximation it is only necessary to maintain the
parameters for the approximation rather than large
grids. Again this allows our algorithm to scale up to
higher dimensions and more models. However, this
comes at the computational cost of having to refit each
of these approximations to every model on each trial.
As the time required to evaluate the likelihood typically
increases approximately linearly with the number of
data points, the time required to refit these approximations increases with the duration of the experiment (DiMattina, 2015). Additionally, if the
shapes of the posteriors are a poor match to these
approximations (for example, highly skewed distribu-
tions are poorly approximated using a Laplace
approximation), then this approach may perform
poorly compared to grid approximations that present a
nonparametric method of representing the posterior
(DiMattina, 2015). Given that these different ap-
proaches have distinct costs and benefits, it is
important to quantitatively test them to see how each
performs in terms of accuracy, computation time, and
memory usage. A detailed comparison of this type has
been performed in terms of adaptive stimulus selection
for parameter estimation (DiMattina, 2015), but to our
knowledge, no such analysis has been performed for
model comparison. An important avenue for further
work would be to explicitly compare our algorithm to
other existing algorithms (DiMattina, 2016; Cavagnaro
et al., 2010) to identify the relative costs and benefits of
each approach.
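As an illustration of the Laplace alternative discussed above, a parameter posterior can be summarized by its mode and local curvature, and the model evidence can then be approximated from that summary. The sketch below assumes the user supplies the negative log-posterior (and log-joint) for a given model; it is a minimal outline under those assumptions, not a drop-in replacement for the grid-based algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_fit(neg_log_posterior, theta0):
    """Gaussian (Laplace) approximation to a posterior: mode and approximate covariance."""
    result = minimize(neg_log_posterior, theta0, method="BFGS")
    # For the BFGS method, result.hess_inv approximates the inverse Hessian at the mode.
    return result.x, result.hess_inv

def laplace_log_evidence(log_joint, mode, covariance):
    """Laplace estimate of log p(data | model), where log_joint(theta) = log p(data, theta | model)."""
    d = len(mode)
    _, logdet = np.linalg.slogdet(covariance)
    return log_joint(mode) + 0.5 * d * np.log(2 * np.pi) + 0.5 * logdet

# Illustrative use with a hypothetical two-parameter model:
# mode, cov = laplace_fit(lambda th: -log_joint(th), np.zeros(2))
# log_evidence = laplace_log_evidence(log_joint, mode, cov)
```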
In addition to the practical limitations of our
approach, it is important to consider the theoretical
implications of using adaptive sampling on model
comparison. For example, adaptive sampling could significantly change the distribution of stimuli presented to the subjects (see Figure 6) and could therefore violate assumptions used in certain model comparisons. In particular, it is typically assumed that the subject's underlying model is independent of the stimuli presented, and Bayesian observer models often assume that the subject's priors match the stimulus distribution (Keshvari et al., 2012, 2013). The adaptive approach may cause violations of these assumptions.
To illustrate this, consider the change detection
experiments referenced above (Keshvari et al., 2012,
In these experiments, subjects are first shown a number of oriented ellipses that they have to memorize. Subsequently, the ellipses are displayed again either with the same orientation or a changed orientation, and the observer must report whether a change is perceived or not. All the models compared for this task assumed the subject used a circular uniform prior over the size of the change (the same prior used to generate the stimuli). If we were to generate the change magnitude adaptively instead, this could create a nonuniform distribution. Presenting a nonuniform distribution of change magnitudes may cause subjects to alter their response strategy. For example, if a subject is only presented trials with large changes, he/she may shift from encoding the stimuli precisely to a coarser encoding, as precise encoding is no longer needed for the task. This
biased distribution could also create a mismatch
between the assumed (uniform circular) prior in the
model and the actual experiment, which could cause
biases in model comparison.
Although these issues may seem severe, the risk can
be mitigated. Our suggestion is to not rely only on
adaptive techniques as definitive evidence towards a
model. It is important that multiple experiments and
sampling methods support the same model. In some
cases discrepancies may be found between sampling
methods (e.g., in our noise model comparison experi-
ment). In these cases it is important to perform
simulations to see if these results are to be expected (see
Supplementary Material S1 for the simulation we
performed) or if the adaptive technique could be
biasing the comparison.
A final theoretical point is that our algorithm makes a number of assumptions; for example, that the "true" model used by the subject is part of the included set of models being considered (an assumption in all parametric model comparisons). If the true model is not part of this set, then the stimuli are not optimized to find evidence for this model. Obviously, in real subjects, it is impossible to know what the true model is; rather, we are searching for realistic models that best explain the subject data. It is important to be aware that, when using any adaptive approach, the stimuli are only optimized for dissociating the assumed model set. We additionally assumed that the trials are conditionally independent, that is, there is no intertrial dependence. To our knowledge, there is no detailed analysis of how adaptive approaches fare if their assumptions are violated. An important direction
for further work would be to explore how robust
adaptive approaches are to violations of assumptions,
as well as ways to mitigate the effects of the violations.
An additional area for further work is the importance of priors in dissociating models. For simplicity, we used uniform priors for both models and parameters. However, this neglects prior information that may reduce the number of trials necessary to estimate which model is best. How should we determine these priors? Within statistics itself there is little consensus on how this should be done, ranging from the prior being a subjective choice of the experimenter (de Finetti, 2017) to the prior being objectively estimated from data (Jaynes, 2003). Recent work has embraced the latter approach and used hierarchical Bayesian modeling to estimate the prior based on previous subjects (Kim et al., 2014). For example, this approach has been successful in determining parameter priors for observers' contrast sensitivity functions, both in simulations and in actual experiments (Gu et al., 2016; Kim et al., 2014). A similar approach could be taken for estimating both parameter and model priors by creating a hierarchical model that incorporates the different models to be compared and fitting it to data from previous subjects. An important step for further work would be to formalize this generalization and investigate how these priors affect model inference.
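As a simplified, non-hierarchical illustration of this idea, parameter estimates from previously tested subjects could be used to construct an informative prior on the parameter grid. The values below are loosely in the range of Table 5 and serve only as placeholders.

```python
import numpy as np

# Fitted alpha estimates (ms) from hypothetical previous subjects, loosely in the
# range of Table 5, used to build an informative prior over an alpha grid.
previous_alpha = np.array([67.5, 69.8, 14.6, 42.0, 25.1, 54.3])
alpha_grid = np.linspace(0.0, 200.0, 201)

mu, sd = previous_alpha.mean(), previous_alpha.std(ddof=1)
prior = np.exp(-0.5 * ((alpha_grid - mu) / sd) ** 2)
prior /= prior.sum()    # normalized over the grid, so it can replace a uniform prior
```

A full hierarchical treatment (Kim et al., 2014; Gu et al., 2016) would instead infer the population-level distribution and the individual parameters jointly; the sketch only conveys how previous data could sharpen the starting prior.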
Keywords: Bayesian, adaptive experiment design, model comparison
Acknowledgments
This work was supported by the European Research
Council grant EU-ERC-283567 (to WPM) and the
Netherlands Organization of Scientific Research grants
NWO-VICI: 453-11-001 (to WPM and JRHC) and
NWO-VENI: 451-10-017 (to LPJS). We would like to
thank the two anonymous reviewers for their helpful
comments, including the change detection example.
Commercial relationships: none.
Corresponding author: James R. H. Cooke.
Email: j.cooke@donders.ru.nl.
Address: Radboud University, Donders Institute for
Brain, Cognition and Behaviour, Nijmegen, the
Netherlands.
References
Acerbi, L., Dokka, K., Angelaki, D. E., & Ma, W. J.
(2017, June). Bayesian comparison of explicit and
implicit causal inference strategies in multisensory
heading perception. bioRxiv, 150052, https://doi.
org/10.1101/150052.
Acerbi, L., Ma, W. J., & Vijayakumar, S. (2014). A
framework for testing identifiability of Bayesian
models of perception. In Advances in neural
information processing systems (pp. 1026–1034).
Red Hook, NY: Curran Associates, Inc.
Acerbi, L., Wolpert, D. M., & Vijayakumar, S. (2012).
Internal representations of temporal statistics and
feedback calibrate motor-sensory interval timing.
PLoS Computational Biology,8(11), e1002771,
https://doi.org/10.1371/journal.pcbi.1002771.
Acuna, D. E., Berniker, M., Fernandes, H. L., &
Kording, K. P. (2015). Using psychophysics to ask
if the brain samples or maximizes. Journal of
Vision,15(3):7, 1–16, https://doi.org/10.1167/15.3.
7. [PubMed] [Article]
Akaike, H. (1974). A new look at the statistical model
identification. IEEE Transactions on Automatic
Control,19(6), 716–723, https://doi.org/10.1109/
TAC.1974.1100705.
Bakker, R. S., Weijer, R. H. A., van Beers, R. J., Selen,
L. P. J., & Medendorp, W. P. (2017). Decisions in
motion: Passive body acceleration modulates hand
choice. Journal of Neurophysiology,117(6): 2250–
2261, https://doi.org/10.1152/jn.00022.2017.
Battaglia, P. W., Kersten, D., & Schrater, P. R. (2011).
How haptic size sensations improve distance
perception. PLoS Computational Biology,7(6),
e1002080, https://doi.org/10.1371/journal.pcbi.
1002080.
Burnham, K. P., & Anderson, D. R., (2002). Model
selection and multimodel inference: A practical
information-theoretic approach (2nd ed.). New
York, NY: Springer.
Byrd, R., Lu, P., Nocedal, J., & Zhu, C. (1995). A
limited memory algorithm for bound constrained
optimization. SIAM Journal on Scientific Comput-
ing,16(5), 1190–1208, https://doi.org/10.1137/
0916069.
Cavagnaro, D. R., Gonzalez, R., Myung, J. I., & Pitt,
M. A. (2013, February). Optimal decision stimuli
for risky choice experiments: An adaptive ap-
proach. Management Science,59(2), 358–375,
https://doi.org/10.1287/mnsc.1120.1558.
Cavagnaro, D. R., Myung, J. I., Pitt, M. A., & Kujala,
J. V. (2010). Adaptive design optimization: A
mutual information-based approach to model
discrimination in cognitive science. Neural Compu-
tation,22(4), 887–905.
Cavagnaro, D. R., Pitt, M. A., & Myung, J. I. (2011).
Model discrimination through adaptive experimen-
tation. Psychonomic Bulletin & Review,18(1), 204–
210, https://doi.org/10.3758/s13423-010-0030-4.
de Finetti, B. (2017). Theory of probability: A critical
introductory treatment. Chichester, UK: John Wiley
& Sons.
DiMattina, C. (2015). Fast adaptive estimation of
multidimensional psychometric functions. Journal
of Vision,15(9):5, 1–20, https://doi.org/10.1167/15.
9.5. [PubMed] [Article]
DiMattina, C. (2016). Comparing models of contrast
gain using psychophysical experiments. Journal of
Vision,16(9):1, 1–18, https://doi.org/10.1167/16.9.
1. [PubMed] [Article]
DiMattina, C., & Zhang, K. (2011). Active data
collection for efficient estimation and comparison
of nonlinear neural models. Neural Computation,
23(9), 2242–2288.
DiMattina, C., & Zhang, K. (2013). Adaptive stimulus
optimization for sensory systems neuroscience.
Frontiers in Neural Circuits, 7, https://doi.org/10.
3389/fncir.2013.00101.
Fründ, I., Haenel, N. V., & Wichmann, F. A. (2011).
Inference for psychometric functions in the pres-
ence of nonstationary behavior. Journal of Vision,
11(6):16, 1–19, https://doi.org/10.1167/11.6.16.
[PubMed] [Article]
Gardner, J., Malkomes, G., Garnett, R., Weinberger,
K. Q., Barbour, D., & Cunningham, J. P. (2015).
Bayesian active model selection with an application
to automated audiometry. In Advances in neural
information processing systems (pp. 2377–2385).
Red Hook, NY: Curran Associates, Inc.
Gu, H., Kim, W., Hou, F., Lesmes, L. A., Pitt, M. A.,
Lu, Z.-L., & Myung, J. I. (2016). A hierarchical
Bayesian approach to adaptive vision testing: A
case study with the contrast sensitivity function.
Journal of Vision,16(6):15, 1–17, https://doi.org/10.
1167/16.6.15. [PubMed] [Article]
Houlsby, N., Huszár, F., Ghahramani, Z., & Lengyel,
M. (2011). Bayesian active learning for classifica-
tion and preference learning. arXiv:1112.5745 [cs, stat].
Hunter, J. D. (2007). Matplotlib: A 2d graphics
environment. Computing in Science Engineering,
9(3), 90–95, https://doi.org/10.1109/MCSE.2007.
55.
Jaynes, E. T. (2003). Probability theory: The logic of
science. Cambridge, UK: Cambridge University
Press.
Jazayeri, M., & Shadlen, M. N. (2010). Temporal
context calibrates interval timing. Nature Neuro-
science,13(8), 1020–1026, https://doi.org/10.1038/
nn.2590.
Jones, E., Oliphant, T., Peterson, P., & others. (2001).
SciPy: Open source scientific tools for Python.
Retrieved from http://www.scipy.org/.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors.
Journal of the American Statistical Association,
90(430), 773–795, https://doi.org/10.1080/
01621459.1995.10476572.
Keshvari, S., van den Berg, R., & Ma, W. J. (2012).
Probabilistic computation in human perception
under variability in encoding precision. PLoS One,
7(6), e40216, https://doi.org/10.1371/journal.pone.
0040216.
Keshvari, S., van den Berg, R., & Ma, W. J. (2013). No
evidence for an item limit in change detection.
PLoS Computational Biology,9(2), e1002927.
Kim, W., Pitt, M. A., Lu, Z.-L., Steyvers, M., &
Myung, J. I. (2014). A hierarchical adaptive
approach to optimal experimental design. Neural
Computation,26(11), 2465–2492.
Kontsevich, L. L., & Tyler, C. W. (1999). Bayesian
adaptive estimation of psychometric slope and
threshold. Vision Research,39(16), 2729–2737,
https://doi.org/10.1016/S0042-6989(98)00285-5.
Kording, K. P., Beierholm, U., Ma, W. J., Quartz, S.,
Tenenbaum, J. B., & Shams, L. (2007). Causal
inference in multisensory perception. PLoS One,
2(9), e943, https://doi.org/10.1371/journal.pone.
0000943.
Kujala, J. V., & Lukka, T. J. (2006). Bayesian adaptive
estimation: The next dimension. Journal of Math-
ematical Psychology,50(4), 369–389, https://doi.
org/10.1016/j.jmp.2005.12.005.
Kulick, J., Lieck, R., & Toussaint, M. (2014). Active
learning of hyperparameters: An expected cross
entropy criterion for active model selection. ArXiv:
1409.7552v1.
Kwon, O.-S., Tadin, D., & Knill, D. C. (2015).
Unifying account of visual motion and position
perception. Proceedings of the National Academy of
Sciences,112(26), 8142–8147, https://doi.org/10.
1073/pnas.1500361112.
Lesmes, L. A., Lu, Z.-L., Baek, J., & Albright, T. D.
(2010). Bayesian adaptive estimation of the con-
trast sensitivity function: The quick CSF method.
Journal of Vision,10(3):17, 1–21, https://doi.org/10.
1167/10.3.17. [PubMed] [Article]
MacKay, D. J. (2003). Information theory, inference and
learning algorithms. Cambridge, UK: Cambridge
University Press.
McKee, S. P., Silverman, G. H., & Nakayama, K.
(1986). Precise velocity discrimination despite
random variations in temporal frequency and
contrast. Vision Research,26(4), 609–619.
McKinney, W. (2010). Data structures for statistical
computing in python. In Proceedings of the 9th
Python in Science Conference (Vol. 445, pp. 51–56).
SciPy Austin, TX.
Odegaard, B., Wozny, D. R., & Shams, L. (2015).
Biases in visual, auditory, and audiovisual percep-
tion of space. PLoS Computational Biology,11(12),
e1004649, https://doi.org/10.1371/journal.pcbi.
1004649.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel,
V., Thirion, B., Grisel, O., ... Duchesnay, E.
(2011). Scikit-learn: Machine learning in Python.
Journal of Machine Learning Research,12(Oct),
2825–2830.
Peirce, J. W. (2009). Generating stimuli for neurosci-
ence using PsychoPy. Frontiers in Neuroinformatics,
2, https://doi.org/10.3389/neuro.11.010.2008.
Petzschner, F. H., & Glasauer, S. (2011). Iterative
Bayesian estimation as an explanation for range
and regression effects: A study on human path
integration. Journal of Neuroscience,31(47), 17220–
17229, https://doi.org/10.1523/jneurosci.2028-11.
2011.
Pflüger, D., Peherstorfer, B., & Bungartz, H.-J. (2010).
Spatially adaptive sparse grids for high-dimen-
sional data-driven problems. Journal of Complexi-
ty,26(5), 508–522, https://doi.org/10.1016/j.jco.
2010.04.001.
Pillow, J. W., & Park, M. (2016). Adaptive Bayesian
methods for closed-loop neurophysiology. In
Closed Loop Neuroscience (pp. 3–18). Amsterdam,
the Netherlands: Elsevier.
Prins, N. (2013). The psi-marginal adaptive method:
How to give nuisance parameters the attention they
deserve (no more, no less). Journal of Vision,13(7):
3, 1–17, https://doi.org/10.1167/13.7.3. [PubMed]
[Article]
Rincon-Gonzalez, L., Selen, L. P. J., Halfwerk, K.,
Koppen, M., Corneil, B. D., & Medendorp, W. P.
(2016). Decisions in motion: vestibular contribu-
tions to saccadic target selection. Journal of
Neurophysiology,116(3), 977–985, https://doi.org/
10.1152/jn.01071.2015.
Sanborn, A. N., & Beierholm, U. R. (2016). Fast and
accurate learning when making discrete numerical
estimates. PLoS Computational Biology,12(4),
e1004859, https://doi.org/10.1371/journal.pcbi.
1004859.
Stocker, A. A., & Simoncelli, E. P. (2004). Constraining
a Bayesian model of human visual speed percep-
tion. In Advances in neural information processing
systems 17 (pp. 1361–1368). Cambridge, MA: MIT
Press.
Stocker, A. A., & Simoncelli, E. P. (2006). Noise
characteristics and prior expectations in human
visual speed perception. Nature Neuroscience,9(4),
578–585, https://doi.org/10.1038/nn1669.
Walt, S. v. d., Colbert, S. C., & Varoquaux, G. (2011,
March). The NumPy array: A structure for efficient
numerical computation. Computing in Science
Engineering,13(2), 22–30, https://doi.org/10.1109/
MCSE.2011.37.
Wang, Z., & Simoncelli, E. P. (2008). Maximum
differentiation (MAD) competition: A methodolo-
gy for comparing computational models of per-
ceptual quantities. Journal of Vision,8(12):8, 1–13,
https://doi.org/10.1167/8.12.8. [PubMed] [Article]
Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002).
Motion illusions as optimal percepts. Nature
Neuroscience,5(6), 598–604, https://doi.org/10.
1038/nn0602-858.
Appendix A
In order to model a subject's 2AFC behavior as a function of different sensory noise models, we assume a subject receives two sensory measurements, $x_1$ and $x_2$, one for the reference and one for the probe. We model these as normally distributed random variables, with a mean centered on the true reference and probe values and a variance that is a function of the underlying sensory noise model. As such, we can write $x_1$ and $x_2$ as

$$x_1 \sim N\!\left(s_1, \sigma_1^2(m)\right), \qquad x_2 \sim N\!\left(s_2, \sigma_2^2(m)\right).$$

We assume an observer responds 1 if $x_2 > x_1$ and 0 otherwise. In order to derive the distribution of an observer's response, it is useful to note that this is equivalent to $x_2 - x_1 > 0$. As $x_2$ and $x_1$ are normally distributed random variables, subtracting them produces another normally distributed variable $\Delta x$. Therefore the subject's response probability can be written as:

$$p(\Delta x \mid s_1, s_2) = N\!\left(s_2 - s_1,\; \sigma_2^2(m) + \sigma_1^2(m)\right).$$

The likelihood of an observer responding 1 is obtained by integrating over positive values of $\Delta x$,

$$p(r = 1 \mid s_1, s_2) = \int_0^{\infty} p(\Delta x \mid s_1, s_2)\, d\Delta x$$
$$p(r = 1 \mid s_1, s_2) = \Phi\!\left(s_2 - s_1;\; 0,\; \sigma_2^2 + \sigma_1^2\right).$$

Because the responses are mutually exclusive, it follows that the likelihood of a subject responding 0 is

$$p(r = 0 \mid s_1, s_2) = 1 - p(r = 1 \mid s_1, s_2),$$

in which $\Phi$ is the cumulative normal distribution, evaluated at point $s_2 - s_1$, with a mean of 0 and variance $\sigma_2^2 + \sigma_1^2$. This entails that a subject's 2AFC behavior is
unbiased and also that subjects do not lapse during the experiment. To make the model more realistic, we augment it with a small bias term $\alpha$ to account for small deviations from unbiased behavior and a lapse term $\lambda$ to account for lapses in the task. Therefore, the final response probability can be written as

$$p(r = 1 \mid s_1, s_2) = \lambda + (1 - 2\lambda)\, \Phi\!\left(s_2 - s_1;\; \alpha,\; \sigma_2^2 + \sigma_1^2\right).$$
Note that in this model the observer does not estimate the underlying speed (using Bayes' rule); responses are based directly on the sensory measurements rather than on posterior estimates. This was done for two reasons. First, there is
not a consensus on how additional information is
incorporated in speed perception; some models propose
that observers incorporate assumptions about motion
dynamics to create priors (Kwon, Tadin, & Knill,
2015); others propose statistics of natural stimuli are
used to form priors (Stocker & Simoncelli, 2004, 2006).
Second, unless a uniform prior (across the real line) or
conjugate prior is used, computing the response
probability in closed form is difficult (recent work has
analytically derived the effect of Gaussian priors in
2AFC tasks; Acuna, Berniker, Fernandes, & Kording,
2015).
Because our main focus is the sensory noise model,
not the incorporation of priors, our experiment was
designed such that the influence of priors should be
negligible and hence our derived response probability
should be a reasonable approximation. Specifically, it
has been shown that the bias in speed estimation
decreases when stimuli are close to fixation (Kwon et
al., 2015) and biases decrease when contrast is high
(Stocker & Simoncelli, 2004, 2006). Our stimuli were centered relatively close to fixation (6° eccentricity) compared to other speed perception experiments (Kwon et al., 2015) and also had a much higher contrast than is typical (Stocker & Simoncelli, 2004, 2006). This means that most of the subjects' behavior should be governed by the sensory noise and not the prior.