
How to measure metacognition

Frontiers in Human Neuroscience

July 2014
8:443

DOI:10.3389/fnhum.2014.00443

Source
PubMed

License
CC BY 3.0

Authors:

Stephen M Fleming

University College London

Hakwan Lau

University of California, Los Angeles

The ability to recognize one's own successful cognitive processing, in e.g., perceptual or memory tasks, is often referred to as metacognition. How should we quantitatively measure such ability? Here we focus on a class of measures that assess the correspondence between trial-by-trial accuracy and one's own confidence. In general, for healthy subjects endowed with metacognitive sensitivity, when one is confident, one is more likely to be correct. Thus, the degree of association between accuracy and confidence can be taken as a quantitative measure of metacognition. However, many studies use a statistical correlation coefficient (e.g., Pearson's r) or its variant to assess this degree of association, and such measures are susceptible to undesirable influences from factors such as response biases. Here we review other measures based on signal detection theory and receiver operating characteristics (ROC) analysis that are “bias free,” and relate these quantities to the calibration and discrimination measures developed in the probability estimation literature. We go on to distinguish between the related concepts of metacognitive bias (a difference in subjective confidence despite basic task performance remaining constant), metacognitive sensitivity (how good one is at distinguishing between one's own correct and incorrect judgments) and metacognitive efficiency (a subject's level of metacognitive sensitivity given a certain level of task performance). Finally, we discuss how these three concepts pose interesting questions for the study of metacognition and conscious awareness.



REVIEW ARTICLE

published: 15 July 2014

doi: 10.3389/fnhum.2014.00443

How to measure metacognition

Stephen M. Fleming¹,²* and Hakwan C. Lau³,⁴*

¹ Department of Experimental Psychology, University of Oxford, Oxford, UK

² Center for Neural Science, New York University, New York, NY, USA

³ Department of Psychology, Columbia University, New York, NY, USA

⁴ Department of Psychology, University of California, Los Angeles, Los Angeles, CA, USA

Edited by:

Harriet Brown, Oxford University,

Reviewed by:

David Huber, University of California, San Diego, USA

Michelle Arnold, Flinders University, Australia

*Correspondence:

Stephen M. Fleming, Center for Neural Science, 6 Washington Place, New York, NY 10003, USA

e-mail: sf102@nyu.edu;

Hakwan C. Lau, Department of Psychology, Columbia University, 1190 Amsterdam Avenue, New York, NY 10027, USA

e-mail: hakwan@gmail.com


Keywords: metacognition, confidence, signal detection theory, consciousness, probability judgment

INTRODUCTION

Early cognitive psychologists were interested in how well people could assess or monitor their own knowledge, and asking for confidence ratings was one of the mainstays of psychophysical analysis (Peirce and Jastrow, 1885). For example, Henmon (1911) summarized his results as follows: “While there is a positive correlation on the whole between degree of confidence and accuracy the degree of confidence is not a reliable index of accuracy.” This statement is largely supported by more recent research in the field of metacognition in a variety of domains from memory to perception and decision-making: subjects have some metacognitive sensitivity, but it is often subject to error (Nelson and Narens, 1990; Metcalfe and Shimamura, 1996). The determinants of metacognitive sensitivity are an active topic of investigation that has been reviewed at length elsewhere (e.g., Koriat, 2007; Fleming and Dolan, 2012). Here we are concerned with the best approach to measuring metacognition, a topic on which there remains substantial confusion and heterogeneity of approach.

From the outset, it is important to distinguish two aspects, namely sensitivity and bias. Metacognitive sensitivity is also known as metacognitive accuracy, type 2 sensitivity, discrimination, reliability, or the confidence-accuracy correlation. Metacognitive bias is also known as type 2 bias, over- or underconfidence or calibration. In Figure 1 we illustrate the difference between these two constructs. Each panel shows a cartoon density of confidence ratings separately for correct and incorrect trials on an arbitrary task (e.g., a perceptual discrimination). Intuitively, when these distributions are well separated, the subject is able to discriminate good and bad task performance using the confidence scale, and can be assigned a high degree of metacognitive sensitivity. However, note that bias “rides on top of” any measure of sensitivity. A subject might have high overall confidence but poor metacognitive sensitivity if the correct/error distributions are not separable. Both sensitivity and bias are important features of metacognitive judgments, but they are often conflated when interpreting data. In this paper we outline behavioral measures that are able to separately quantify sensitivity and bias.

A second important feature of metacognitive measures is that sensitivity is often affected by task performance itself; in other words, the same individual will appear to have greater metacognitive sensitivity on an easy task compared to a hard task. In contrast, it is reasonable to assume that an individual might have a particular level of metacognitive efficiency in a domain such as memory or decision-making that is independent of different levels of task performance. Nelson (1984) emphasized this desirable property of a measure of metacognition when he wrote that “there should not be a built-in relation between [a measure of] feeling-of-knowing accuracy and overall recognition,” thus providing for the “logical independence of metacognitive ability...and objective memory ability” (Nelson, 1984; p. 111). The question is then how to distil a measure of metacognitive efficiency from behavioral data. We highlight recent progress on this issue.

FIGURE 1 | Schematic showing the theoretical dissociation between metacognitive sensitivity and bias. Each graph shows a hypothetical probability density of confidence ratings for correct and incorrect trials, with confidence increasing from left to right along each x-axis. Metacognitive sensitivity is the separation between the distributions, that is, the extent to which confidence discriminates between correct and incorrect trials. Metacognitive bias is the overall level of confidence expressed, independent of whether the trial is correct or incorrect. Note that this is a cartoon schematic and we do not mean to imply any parametric form for these “Type 2” signal detection theoretic distributions. Indeed, as shown by Galvin et al. (2003), these distributions are unlikely to be Gaussian.

Frontiers in Human Neuroscience | www.frontiersin.org | July 2014 | Volume 8 | Article 443

We note there are a variety of methods for eliciting metacognitive judgments (e.g., wagering, scoring rules, confidence scales, awareness ratings) across different domains that have been discussed at length elsewhere (Keren, 1991; Hollard et al., 2010; Sandberg et al., 2010; Fleming and Dolan, 2012). Our focus here is on quantifying metacognition once a judgment has been elicited.

MEASURES OF METACOGNITIVE SENSITIVITY

A useful starting point for all the measures of metacognitive sensitivity that follow is the 2 × 2 confidence-accuracy table (Table 1). This table simply counts the number of high confidence ratings assigned to correct and incorrect judgments, and similarly for low confidence ratings. Intuitively, above-chance metacognitive sensitivity is found when correct trials are endorsed with high confidence to a greater degree than incorrect trials¹. Readers with a background in signal detection theory (SDT) will immediately see the connection between Table 1 and standard, “type 1” SDT (Green and Swets, 1966). In type 1 SDT, the relevant joint probability distribution is P(response, stimulus); parameters of this distribution such as d′ are concerned with how effectively an organism can discriminate objective states of the world. In contrast, Table 1 has been dubbed the “type 2” SDT table (Clarke et al., 1959), as the confidence ratings are conditioned on the observer’s responses (correct or incorrect), not on the objective state of the world. All measures of metacognitive sensitivity can be reduced to operations on this joint probability distribution P(confidence, accuracy) (see Mason, 2003, for a mathematical treatment).

¹ These ratings may be elicited either prospectively or retrospectively.

Table 1 | Classification of responses within type 2 signal detection theory.

Type 1 decision | High confidence | Low confidence

Correct | Type 2 hit (H2) | Type 2 miss (M2)

Incorrect | Type 2 false alarm (FA2) | Type 2 correct rejection (CR2)
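As a minimal sketch of how the four cells of Table 1 might be tallied from trial-by-trial data, the Python snippet below counts each accuracy and confidence combination. The function name `type2_table`, the 0/1 coding, and the six example trials are illustrative assumptions, not part of the original article or its supplementary code.

```python
from collections import Counter


def type2_table(correct, high_conf):
    """Count the four cells of the type 2 table from per-trial 0/1 codes."""
    if len(correct) != len(high_conf):
        raise ValueError("need one accuracy code and one confidence code per trial")
    # Each trial contributes one (accuracy, confidence) pair.
    cells = Counter((bool(c), bool(h)) for c, h in zip(correct, high_conf))
    return {
        "H2": cells[(True, True)],     # correct, high confidence
        "M2": cells[(True, False)],    # correct, low confidence
        "FA2": cells[(False, True)],   # incorrect, high confidence
        "CR2": cells[(False, False)],  # incorrect, low confidence
    }


# Hypothetical data for six trials (1 = correct / high confidence).
accuracy = [0, 1, 1, 0, 0, 1]
confidence = [0, 1, 1, 1, 0, 0]
table = type2_table(accuracy, confidence)
print(table)
```

Keeping the counts rather than proportions makes it easy to derive any of the measures discussed in this section from the same table.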

In the discussion that follows we assume that stimulus strength or task difficulty is held roughly constant. In such a design, fluctuations in accuracy and confidence can be attributed to noise internal to the observer, rather than external changes in signal strength. This “method of constant stimuli” is appropriate for fitting signal detection theoretic models, but it also rules out other potentially interesting experimental questions, such as how behavior and confidence change with stimulus strength. In the section Psychometric Function Measures we discuss approaches to measuring metacognitive sensitivity in designs that vary stimulus strength.

CORRELATION MEASURES

The simplest measure of association between the rows and columns of Table 1 is the phi (φ) correlation. In essence, phi is the standard Pearson r correlation between accuracy and confidence over trials. That is, if we code correct responses as 1’s, and incorrect responses as 0’s, accuracy over trials forms a vector, e.g., [0 1 1 0 0 1]. And if we code high confidence as 1, and low confidence as 0, we can likewise form a vector of the same length (number of trials). The Pearson r correlation between these two vectors defines the “phi” coefficient. A related and very common measure of metacognitive sensitivity, at least in the memory literature, is the Goodman–Kruskal gamma coefficient, G (Goodman and Kruskal, 1954; Nelson, 1984). In a classic paper, Nelson (1984) advocated G as a measure of metacognitive sensitivity that does not make the distributional assumptions of SDT.
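The two coefficients can be illustrated with a short Python sketch. It assumes the 0/1 coding described above and uses the standard pairwise definition of gamma (concordant minus discordant pairs, divided by their sum, with ties ignored). The function names and the confidence vector are hypothetical; only the accuracy vector [0 1 1 0 0 1] comes from the text.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; on two 0/1 vectors this is the phi coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either vector is constant


def goodman_kruskal_gamma(x, y):
    """Gamma = (concordant - discordant) / (concordant + discordant)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Pairs tied on either variable are ignored.
    return (concordant - discordant) / (concordant + discordant)


accuracy = [0, 1, 1, 0, 0, 1]    # example vector from the text
confidence = [0, 1, 1, 1, 0, 0]  # hypothetical high/low confidence codes
phi = pearson_r(accuracy, confidence)
gamma = goodman_kruskal_gamma(accuracy, confidence)
print(round(phi, 3), round(gamma, 3))
```

Because gamma only compares the ordering of pairs, the same function also accepts multi-level confidence ratings in place of the 0/1 codes.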

G can be easily expanded to handle designs in which confidence is rated using a rating scale rather than a dichotomous high/low design (Gonzalez and Nelson, 1996). Though popular, as measures of metacognitive sensitivity both phi and gamma correlations have a number of problems. The most prominent is the fact that both can be “contaminated” by metacognitive bias. That is, for subjects with a high or low tendency to give high confidence ratings overall, their phi correlation will be altered (Nelson, 1984)². Intuitively one can consider the extreme cases where subjects perform a task near threshold (i.e., between ceiling and chance performance), but rate every trial as low confidence, not because of a lack of ability to introspect, but because of an overly shy or humble personality. In such a case, the correspondence between confidence and accuracy is constrained by bias. In an extensive simulation study, Masson and Rotello (2009) showed that G was similarly sensitive to the tendency to use higher or lower confidence ratings (bias), and that this may lead to erroneous conclusions, such as interpreting a difference in G between groups as reflecting a true underlying difference in metacognitive sensitivity despite possible differences in bias.

² Another way of stating this is that phi is “margin sensitive”: the value of phi is affected by the marginal counts of Table 1 (the row and column sums) that describe an individual’s task performance and bias.

TYPE 2 d′

A standard way to remove the influence of bias in an estimation of sensitivity is to apply SDT (Green and Swets, 1966). In the case of type 1 detection tasks, overall percentage correct is “contaminated” by the subject’s bias, i.e., the propensity to say “yes” overall. To remove this influence of bias, researchers often estimate d′ based on the hit rate and false alarm rate, which (assuming equal-variance Gaussian distributions for internal signal strength) is mathematically independent of bias. That is, given a constant underlying sensitivity to detect the signal, estimated d′ will be constant given different biases.

There have been several evaluations of this approach to characterize metacognitive sensitivity (Clarke et al., 1959; Lachman et al., 1979; Ferrell and McGoey, 1980; Nelson, 1984; Kunimoto et al., 2001; Higham, 2007; Higham et al., 2009), where type 2 hit rate is defined as the proportion of trials in which subjects reported high confidence given their responses were correct (H2 in Table 1), and type 2 false alarm rate is defined as the proportion of trials in which subjects reported high confidence given their responses were incorrect (FA2 in Table 1). Type 2 d′ = z(H2) − z(FA2), where H2 and FA2 here denote the type 2 hit and false alarm rates and z is the inverse of the cumulative normal distribution function³. Theoretically, then, by using standard SDT, type 2 d′ is argued to be independent from metacognitive bias (the overall propensity to give high confidence responses).
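The formula can be made concrete with a small Python sketch that converts the cell counts of Table 1 into rates and applies the z-transform. The function name `type2_dprime` and the example counts are assumptions made for illustration, and the snippet shows only the arithmetic, not an endorsement of the measure.

```python
from statistics import NormalDist


def type2_dprime(h2, m2, fa2, cr2):
    """Type 2 d' = z(type 2 hit rate) - z(type 2 false alarm rate)."""
    hit_rate = h2 / (h2 + m2)    # P(high confidence | correct)
    fa_rate = fa2 / (fa2 + cr2)  # P(high confidence | incorrect)
    z = NormalDist().inv_cdf     # inverse cumulative standard normal
    # inv_cdf raises StatisticsError for rates of exactly 0 or 1, so real
    # data with empty cells would need a correction that is not shown here.
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 100 correct trials and 100 incorrect trials.
value = type2_dprime(h2=70, m2=30, fa2=40, cr2=60)
print(round(value, 3))
```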

However, type 2 dturns out to be problematic because SDT

assumes that the distribution of internal signals for “correct” and

“incorrect” trials are Gaussian with equal variances. While this

assumption is usually more or less acceptable at the type 1 level

(especially for 2-alternative forced-choice tasks), it is highly prob-

lematic for type 2 analysis. Galvin et al. (2003) showed that these

distributions are of different variance and highly non-Gaussian if

the equal variance assumption holds at the type 1 level. Using sim-

ulation data, Evans and Azzopardi (2007) showed that this leads

to the type 2 dmeasure proposed by Kunimoto et al. (2001) being

confounded by changes in metacognitive bias.

TYPE 2 ROC ANALYSIS

Because the standard parametric signal detection approach is problematic for type 2 analysis, one solution is to apply a non-parametric analysis that is free from the equal-variance Gaussian assumption. In type 1 SDT this is standardly achieved via ROC (receiver operating characteristic) analysis, in which data are obtained from multiple response criteria. For example, if the payoffs for making a hit and false alarm are systematically altered, it is possible to systematically induce more conservative or liberal criteria. For each criterion, hit rate and false alarm rate can be calculated. These are plotted as individual points on the ROC plot, with hit rate on the vertical axis and false alarm rate on the horizontal axis. With multiple criteria we have multiple points, and the curve that passes through these different points is the ROC curve. If the area under the ROC is 0.5, performance is at chance. Higher area under ROC (AUROC) indicates higher sensitivity.

³ Kunimoto and colleagues labeled their type 2 d′ measure a′.

Because this method is non-parametric, it does not depend on rigid assumptions about the nature of the underlying distributions and can similarly be applied to type 2 data. Recall that type 2 hit rate is simply the proportion of high confidence trials when the subject is correct, and type 2 false alarm rate is the proportion of high confidence trials when the subject is incorrect (Table 1). For two levels of confidence there is thus one criterion, and one pair of type 2 hit and false alarm rates. However, with multiple confidence ratings it is possible to construct the full type 2 ROC by treating each confidence level as a criterion that separates high from low confidence (Clarke et al., 1959; Galvin et al., 2003; Benjamin and Diaz, 2008). For instance, with a four-point scale, we start with a liberal criterion that assigns low confidence = 1 and high confidence = 2–4, then a higher criterion that assigns low confidence = 1 and 2 and high confidence = 3 and 4, and so on. For each split of the data, hit and false alarm rate pairs are calculated and plotted to obtain a type 2 ROC curve (Figure 2A). The area under the type 2 ROC curve (AUROC2) can then be used as a measure of metacognitive sensitivity (in the Supplementary Material we provide Matlab code for calculating AUROC2 from rating data). This method is more advantageous than the gamma and phi correlations because it is bias-free (i.e., it is theoretically uninfluenced by the overall propensity of the subject to say high confidence) and in contrast to type 2 d′ does not make parametric assumptions that are known to be false.
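The criterion-sweeping procedure described above can be sketched in Python as follows. This is not the authors' Matlab code: the function names, the trapezoidal area rule, and the eight example trials are assumptions chosen for illustration, with ratings coded 1 (lowest confidence) to `n_levels` (highest).

```python
def type2_roc_points(ratings, correct, n_levels):
    """Return (type 2 FA rate, type 2 hit rate) pairs, one per criterion."""
    n_correct = sum(1 for c in correct if c)
    n_incorrect = len(correct) - n_correct
    if n_correct == 0 or n_incorrect == 0:
        raise ValueError("need at least one correct and one incorrect trial")
    points = [(0.0, 0.0)]
    # Sweep from the strictest criterion (only the top rating counts as
    # "high confidence") to the most liberal one (ratings >= 2 count).
    for criterion in range(n_levels, 1, -1):
        hits = sum(1 for r, c in zip(ratings, correct) if c and r >= criterion)
        fas = sum(1 for r, c in zip(ratings, correct) if not c and r >= criterion)
        points.append((fas / n_incorrect, hits / n_correct))
    points.append((1.0, 1.0))
    return points


def auroc2(points):
    """Area under the type 2 ROC by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical four-point confidence ratings for eight trials.
ratings = [4, 4, 3, 2, 3, 2, 1, 1]
correct = [1, 1, 1, 1, 0, 0, 0, 0]
area = auroc2(type2_roc_points(ratings, correct, n_levels=4))
print(area)
```

With only a handful of rating levels the trapezoidal rule is a coarse approximation to the smooth curve, which is one reason to treat this as a sketch rather than a reference implementation.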

In summary, therefore, despite their intuitive appeal, simple measures of association such as the phi correlation and gamma do not separate metacognitive sensitivity from bias. Non-parametric methods such as AUROC2 provide bias-free measures of sensitivity. However, a further complication when studying metacognitive sensitivity is that the measures reviewed above are also affected by task performance. For instance, Galvin et al. (2003) showed mathematically that AUROC2 is affected by both type 1 d′ and type 1 criterion placement, a conclusion supported by experimental manipulation (Higham et al., 2009). In other words, a change in task performance is expected, a priori, to lead to changes in AUROC2, despite the subject’s endogenous metacognitive “efficiency” remaining unchanged. One approach to dealing with this confound is to use psychophysical techniques to control for differences in performance and then calculate AUROC2 (e.g., Fleming et al., 2010). An alternative approach is to explicitly model the connection between performance and metacognition.

MODEL-BASED APPROACHES

The recently developed meta-d′ measure (Maniscalco and Lau, 2012, 2014) exploits the fact that given Gaussian variance assumptions at the type 1 level, the shapes of the type 2 distributions are known even if they are not themselves Gaussian (Galvin et al., 2003). Theoretically therefore, ideal, maximum type 2 performance is constrained by one’s type 1 performance. Intuitively, one can again consider the extreme cases. Imagine a subject is performing a two-choice discrimination task completely at chance. Half of their trials are correct and half are incorrect due to chance responding despite zero type 1 sensitivity. To introspectively distinguish between correct and incorrect trials would be impossible, because the correct trials are flukes. Thus, when type 1 sensitivity is zero, type 2 sensitivity (metacognitive sensitivity) should also be so. This dependency places strong constraints on a measure of metacognitive sensitivity.

FIGURE 2 | (A) Example type 2 ROC function for a single subject. Each point plots the type 2 false alarm rate on the x-axis against the type 2 hit rate on the y-axis for a given confidence criterion. The shaded area under the curve indexes metacognitive sensitivity. (B) Example underconfident and overconfident probability calibration curves, modified after Harvey (1997).

Specifically, given a particular type 1 variance structure and bias, the form of the type 2 ROC is completely determined (Galvin et al., 2003). We can thus create a family of type 2 ROC curves, each of which will correspond to an underlying type 1 sensitivity assuming that the subject is metacognitively ideal (i.e., has maximal type 2 sensitivity given a certain type 1 sensitivity). Because such a family of type 2 ROC curves are all non-overlapping (Galvin et al., 2003), we can determine the curve from this family with just a single point, i.e., a single criterion. With this, we can obtain, given the subject’s actual type 2 performance data, the underlying type 1 sensitivity that we would expect if the subject were ideal in placing their confidence ratings. We label the underlying type 1 sensitivity of this ideal observer meta-d′. Because meta-d′ is in units of type 1 d′, we can think of it as the sensory evidence available for metacognition in signal-to-noise ratio units, just as type 1 d′ is the sensory evidence available for decision-making in signal-to-noise ratio units. Among currently available methods, we think meta-d′ is the best measure of metacognitive sensitivity, and it is quickly gaining popularity (e.g., Baird et al., 2013; Charles et al., 2013; Lee et al., 2013; McCurdy et al., 2013). Barrett et al. (2013) have conducted extensive normative tests of meta-d′, finding that it is robust to changes in bias and that it recovers simulated changes in metacognitive sensitivity (see also Maniscalco and Lau, 2014). Matlab code for fitting meta-d′ to rating data is available at http://www.columbia.edu/~bsm2105/type2sdt/.

One major advantage of meta-d′ over AUROC2 is its ease of interpretation and its elegant control over the influence of performance on metacognitive sensitivity. Specifically, because meta-d′ is in the same units as (type 1) d′, the two can be directly compared. Therefore, for a metacognitively ideal observer (a person who is rating confidence using the maximum possible metacognitive sensitivity), meta-d′ should equal d′. If meta-d′ < d′, metacognitive sensitivity is suboptimal within the SDT framework. We can therefore define metacognitive efficiency as the value of meta-d′ relative to d′, or meta-d′/d′. A meta-d′/d′ value of 1 indicates a theoretically ideal value of metacognitive efficiency. A value of 0.7 would indicate 70% metacognitive efficiency (30% of the sensory evidence available for the decision is lost when making metacognitive judgments), and so on. A closely related measure is the difference between meta-d′ and d′, i.e., meta-d′ − d′ (Rounis et al., 2010). One practical reason for using meta-d′ − d′ rather than meta-d′/d′ is that the latter is a ratio, and when the denominator (d′) is small, meta-d′/d′ can give rather extreme values which may undermine power in a group statistical analysis. However, this problem can also be addressed by taking the log of meta-d′/d′, as is often done to correct for the non-normality of ratio measures (Howell, 2009). Toward the end of this article we explore the implications of this metacognitive efficiency construct for a psychology of metacognition.
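Once meta-d′ and d′ have been estimated (the fitting itself is not shown here), the three efficiency summaries discussed above are simple arithmetic. The sketch below assumes both values are already available; the function name `efficiency_measures` and the numbers are hypothetical.

```python
from math import log


def efficiency_measures(meta_d, d):
    """Summaries of metacognitive efficiency from fitted meta-d' and d'."""
    if d == 0:
        raise ValueError("the ratio meta-d'/d' is undefined when d' is 0")
    ratio = meta_d / d
    return {
        "ratio": ratio,               # meta-d'/d'; 1 is the ideal value
        "difference": meta_d - d,     # meta-d' - d'; 0 is the ideal value
        # The log ratio is only defined for positive ratios.
        "log_ratio": log(ratio) if ratio > 0 else float("nan"),
    }


# Hypothetical subject who loses 30% of the available evidence.
typical = efficiency_measures(meta_d=1.05, d=1.5)
# Hypothetical subject with a small d': the ratio becomes extreme while
# the difference stays modest.
small_d = efficiency_measures(meta_d=0.3, d=0.1)
print(typical)
print(small_d)
```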

The meta-dapproach is based on an ideal observer model of

thelinkbetweentype1andtype2SDT,usingthisasabench-

mark against which to compare subjects’ metacognitive efﬁciency.

However, meta-dis unable to discriminate between different

causes of a change in metacognitive efﬁciency. In particular, like

standard SDT, meta-dis unable to dissociate trial-to-trial vari-

ability in the placement of conﬁdence criteria from additional

noise in the evidence used to make the conﬁdence rating—both

manifest as a decrease in metacognitive efﬁciency.

A similar bias-free approach to modeling metacognitive accuracy is the “Stochastic Detection and Retrieval Model” (SDRM) introduced by Jang et al. (2012). The SDRM not only measures metacognitive accuracy, but is also able to model different potential causes of metacognitive inaccuracy. The core of the model assumes two samplings of “evidence” per stimulus, one leading to a first-order behavior, such as memory retrieval, and the other leading to a confidence rating. These samples are distinct but drawn from a bivariate distribution with correlation parameter ρ. This variable correlation naturally accounts for dissociations between confidence and accuracy. For instance, if the samples are highly correlated, the subject will tend to be confident when behavioral performance is high, and less confident when behavioral performance is low. The SDRM additionally models noise in the confidence rating process itself through variability in the setting of confidence criteria from trial to trial. SDRM was originally developed to account for confidence in free recall involving a single class of items, but it can be naturally extended to two choice cases such as perceptual or mnemonic decisions. By modeling these two separate sources of variability, SDRM is able to unpack potential causes of a decrease in metacognitive efficiency. However, SDRM requires considerable interpretation of parameter fits to draw conclusions about underlying metacognitive processes, and meta-d′ may prove simpler to calculate and work with for many empirical applications.
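The intuition behind the two sources of variability can be illustrated with a toy simulation. This is not the SDRM of Jang et al. (2012): it is a simplified sketch that assumes standard bivariate normal evidence, a fixed retrieval threshold at 0, a single high/low confidence criterion, and Gaussian jitter on that criterion. All names and parameter values are hypothetical.

```python
import random
from math import sqrt


def phi_from_binary(x, y):
    """Phi coefficient computed from the 2 x 2 counts of two boolean lists."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = len(x) - n11 - n10 - n01
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom


def simulate(rho, criterion_sd=0.0, n_trials=5000, seed=1):
    """Confidence-accuracy association for correlated evidence samples."""
    rng = random.Random(seed)
    success, high_conf = [], []
    for _ in range(n_trials):
        x = rng.gauss(0.0, 1.0)  # evidence driving the first-order behavior
        # Second sample, correlated with the first with coefficient rho.
        y = rho * x + sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Optional trial-to-trial jitter in the confidence criterion.
        criterion = rng.gauss(0.0, criterion_sd) if criterion_sd > 0 else 0.0
        success.append(x > 0.0)
        high_conf.append(y > criterion)
    return phi_from_binary(success, high_conf)


tight = simulate(rho=0.9)                    # highly correlated samples
loose = simulate(rho=0.2)                    # weakly correlated samples
noisy = simulate(rho=0.9, criterion_sd=1.0)  # same rho, unstable criterion
print(round(tight, 2), round(loose, 2), round(noisy, 2))
```

In this toy setting both a lower correlation and a noisier criterion weaken the observed association, which is why a summary measure alone cannot tell the two causes apart.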

METACOGNITIVE BIAS

Metacognitive bias is the tendency to give high confidence ratings, all else being equal. The simplest measure of bias is the percentage of high confidence trials (i.e., the marginal proportion of high confidence judgments in Table 1, averaging over correct and incorrect trials), or the average confidence rating over trials. In standard type 1 SDT, a more liberal metacognitive bias corresponds to squeezing the flanking confidence-rating criteria toward the central decision criterion such that more area under both stimulus distributions falls beyond the “high confidence” criteria.

A more liberal metacognitive bias leads to different patterns of responding depending on how confidence is elicited. If confidence is elicited secondary to a decision about options “A” or “B,” squeezing the confidence criteria will lead to an overall increase in confidence, regardless of previous response. However, confidence is often elicited alongside the decision itself, using a scale such as 1 = sure “A” to 6 = sure “B,” where ratings 3 and 4 indicate low confidence “A” and “B,” respectively. A more liberal metacognitive bias in this case would lead to an increased use of the extremes of the scale (1 and 6) and a decreased use of the middle of the scale (3 and 4).
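To make the combined scale concrete, the sketch below folds a 1–6 rating into a choice and a confidence level, and then uses the average confidence level as a simple index of bias. The function names and the two example rating sequences are assumptions made for illustration.

```python
def decode_rating(rating, n_points=6):
    """Split a combined rating (1 = sure "A" ... 6 = sure "B") into
    (choice, confidence level), where level 1 is the lowest confidence."""
    if not 1 <= rating <= n_points:
        raise ValueError("rating outside the scale")
    half = n_points // 2
    if rating <= half:
        return "A", half - rating + 1  # 1 -> level 3, 3 -> level 1
    return "B", rating - half          # 4 -> level 1, 6 -> level 3


def mean_confidence(ratings):
    """Average confidence level over trials, a simple index of bias."""
    levels = [decode_rating(r)[1] for r in ratings]
    return sum(levels) / len(levels)


# Hypothetical observers making the same choices with different bias.
conservative = [3, 4, 3, 4, 2, 5]  # mostly the middle of the scale
liberal = [1, 6, 1, 6, 2, 5]       # mostly the extremes of the scale
print(mean_confidence(conservative), mean_confidence(liberal))
```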

PSYCHOMETRIC FUNCTION MEASURES

The methods for measuring metacognitive sensitivity we have discussed above assume data are obtained using a constant level of task difficulty or stimulus strength, equivalent to obtaining a measure of d′ in standard psychophysics. If a continuous range of stimulus difficulties is available, such as when a full psychometric function is estimated, it is of course possible to apply the same methods to each level of stimulus strength independently. An alternative approach is to compute an aggregate measure of metacognitive sensitivity as the difference in slope between psychometric functions constructed from high and low confidence trials (e.g., De Martino et al., 2013; de Gardelle and Mamassian, 2014). The extent to which the slope becomes steeper (more accurate) under high compared to low confidence is a measure of metacognitive sensitivity. However, this method may not be bias-free, or account for individual differences in task performance, as discussed above.

DISCREPANCY MEASURES

We close this section by pointing out that some researchers have used “one-shot” discrepancy measures to quantify metacognition. For instance, if we ask someone how good their memory is on a scale of 1–10, we obtain a rating that we can then compare to memory performance on a variety of tasks. This discrepancy score approach is often used in the clinical literature (e.g., Schmitz et al., 2006) and in social psychology (e.g., Kruger and Dunning, 1999) to quantify metacognitive skill or “insight.” It is hopefully clear from the preceding sections that if one only has access to a single rating of performance, it is not possible to tease apart bias from sensitivity, nor measure efficiency. To continue with the memory example, a large discrepancy score may be due to a reluctance to rate oneself as performing poorly (metacognitive bias), or a true blindness to one’s memory performance (metacognitive sensitivity). In contrast, by collecting trial-by-trial measures of performance and metacognitive judgments we can build up a picture of an individual’s bias, sensitivity and efficiency in a particular domain.

JUDGMENTS OF PROBABILITY

Metacognitive conﬁdence can be formalized as a probability judg-

ment directed toward one’s own actions—the probability of a

previous judgment being correct. There is a rich literature on the

correspondence between subjective judgments of probability and

the reality to which those judgments correspond. For example,

a weather forecaster may make several predictions of the chance

of rain throughout the year; if the average prediction (e.g., 60%)

ends up matching the frequency of rainy days in the long run we

can say that the forecaster is well calibrated. In this framework

metacognition has a normative interpretation as the accuracy of

a probability judgment about one’s own performance. We do not

aim to cover the literature on probability judgments here; instead

we refer the reader to several comprehensive reviews (Lichtenstein

et al., 1982; Keren, 1991; Harvey, 1997; Moore and Healy, 2008).

Here we highlight some developments in the judgment and

decision-making literature that directly bear on the measurement

of metacognition.

There are two general classes of probability judgment prob-

lem. Discrete cases refer to probabilities assigned to particular

statements, such as “the correct answer is A” or “it will rain

tomorrow.” Continuous cases are where the assessor provides a

conﬁdence interval or some other indication of their uncertainty

in a quantity such as the distance from London to Manchester.

While the accuracy of continuous judgments is also of interest,

our focus here is on discrete judgments, as they provide the clear-

est connection to the metacognition measures reviewed above.

For example, in a 2AFC task with stimulus class d and response a, an ideal observer should base their confidence on the quantity P(d = a).
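As a minimal sketch of this quantity, assume an equal-variance Gaussian signal detection model with class means at ±d′/2, unit variance, equal priors, and a response given by the sign of the internal sample x. These modelling choices, and the function name and parameters, are assumptions made for illustration only. Under them, the probability that the response matches the stimulus class depends only on d′ and the distance of x from the criterion:

```python
import math


def ideal_confidence(x, d_prime):
    """Posterior probability that the response a matches the stimulus class d.

    Illustrative assumptions: equal-variance Gaussian SDT model, class means
    at -d_prime/2 and +d_prime/2, unit variance, equal priors, and an observer
    who responds according to the sign of the internal sample x.
    """
    # Under these assumptions the log-likelihood ratio in favour of the
    # chosen class is d_prime * |x|, so the posterior is a logistic function.
    return 1.0 / (1.0 + math.exp(-d_prime * abs(x)))
```

A sample at the criterion (x = 0) yields a confidence of 0.5, and confidence grows with both the distance from the criterion and d′.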

An advantage of couching metacognitive judgments in a prob-

ability framework is that a meaningful measure of bias can be

elicited. In other words, while a conﬁdence rating of “4” does not

mean much outside of the context of the experiment, a probabil-

ity rating of 0.7 can be checked against the objective likelihood of

occurrence of the event in the environment; i.e., the probability

of being correct for a given conﬁdence level. Moreover, probabil-

ity judgments can be compared against quantities derived from

probabilistic models of conﬁdence (e.g., Kepecs and Mainen,

2012).

QUANTIFYING THE ACCURACY OF PROBABILITY JUDGMENTS

The judgment and decision-making literature has independently

developed indices of probability accuracy similar to G and meta-d′ in the metacognition literature. For example, following Harvey (1997), a “probability score” (PS) is the squared difference between the probability rating f and its actual occurrence c (where c = 1 or 0 for binary events, such as correct or incorrect judgments):

$$PS = (f - c)^2$$

The mean value of the PS averaged across estimates is known

as the Brier score (Brier, 1950). As the PS is an “error” score, a

lower value of PS is better. The Brier score is analogous to the phi

coefﬁcient discussed above.
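The two definitions above can be sketched in a few lines. The function names and the example data are hypothetical; ratings are assumed to be probabilities in [0, 1] and outcomes to be coded 1 (correct) or 0 (incorrect).

```python
def probability_score(f, c):
    """Squared difference between a probability rating f and the outcome c."""
    return (f - c) ** 2


def brier_score(ratings, outcomes):
    """Mean probability score across judgments (lower is better)."""
    if len(ratings) != len(outcomes) or len(ratings) == 0:
        raise ValueError("ratings and outcomes must be non-empty and equal in length")
    scores = [probability_score(f, c) for f, c in zip(ratings, outcomes)]
    return sum(scores) / len(scores)
```

For instance, ratings of 0.9, 0.6, 0.8 and 0.5 paired with outcomes 1, 0, 1 and 1 give individual scores of 0.01, 0.36, 0.04 and 0.25, and hence a Brier score of 0.165.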

The decomposition of the Brier score into its component

parts may be of particular interest to metacognition researchers.

Particularly, one can decompose the Brier score into the following

components (Murphy, 1973):

$$\overline{PS} = O + C - R$$

where O is the “outcome index” and reflects the variance of the outcome event c: $O = \bar{c}(1 - \bar{c})$, with $\bar{c}$ the mean of c across judgments; C is “calibration,” the goodness of fit between probability assessments and the corresponding proportion of correct responses; and R is “resolution,” the variance of the probability assessments. Note that in studies of

metacognitive conﬁdence in decision-making, memory, etc., the

outcome event is simply the performance of the subject. In other

words, when performance is near chance, the variance of the

outcomes (corrects and errors) is maximal, and O will be high.

In contrast, when performance is near ceiling, O is low. This

decomposition therefore echoes the SDT-based analysis discussed

above, and accordingly both reach the same conclusion: sim-

ple correlation measures between probabilities/conﬁdence and

outcomes/performance are themselves inﬂuenced by task per-

formance. Just as efforts have been made to correct measures

of metacognitive sensitivity for differences in performance and

bias, similar concerns led to the development of bias-free mea-

sures of discrimination. In particular, Yaniv et al. (1991) describe

an “adjusted normalized discrimination index” (ANDI) that

achieves such control.

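Calibration (C) is defined as:

$$C = \frac{1}{N} \sum_{j=1}^{J} N_j \left(f_j - \bar{c}_j\right)^2$$

where j indexes each of the J probability categories, $N_j$ is the number of judgments in category j, $f_j$ is the probability rating for that category, $\bar{c}_j$ is the proportion correct in that category, and N is the total number of judgments. Calibration quantifies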

the discrepancy between the mean performance level in a cat-

egory (e.g., 60%) and its associated rating (e.g., 80%), with a

lower discrepancy giving a better PS. A calibration curve is con-

structed by plotting the relative frequency of correct answers in

each probability judgment category (e.g., 50–60%) against the

mean probability rating for the category (e.g., 55%) (Figure 2B).

A typical ﬁnding is that observers are overconﬁdent (Lichtenstein

et al., 1982)—probability judgments are greater than mean %

correct.
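A calibration curve of the kind described above can be tabulated as follows. This is a sketch under stated assumptions: the equal-width binning, the default of ten bins, and the function names are illustrative choices rather than a prescribed procedure. The second function gives a simple over/underconfidence index (mean rating minus proportion correct).

```python
from collections import defaultdict


def calibration_curve(ratings, outcomes, n_bins=10):
    """Return (mean rating, proportion correct, count) for each occupied bin.

    Ratings are probabilities in [0, 1]; outcomes are 1 (correct) or 0.
    Bins are equal-width; a rating of exactly 1.0 falls in the top bin.
    """
    bins = defaultdict(list)
    for f, c in zip(ratings, outcomes):
        idx = min(int(f * n_bins), n_bins - 1)
        bins[idx].append((f, c))
    curve = []
    for idx in sorted(bins):
        fs, cs = zip(*bins[idx])
        curve.append((sum(fs) / len(fs), sum(cs) / len(cs), len(fs)))
    return curve


def overconfidence(ratings, outcomes):
    """Mean probability rating minus proportion correct (positive = overconfident)."""
    return sum(ratings) / len(ratings) - sum(outcomes) / len(outcomes)
```

Plotting the second element of each tuple against the first gives the calibration curve; points below the identity line indicate overconfidence in that category.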

Resolution is a measure of the variance of the probability

assessments, measuring the extent to which correct and incorrect

answers are assigned to different probability categories:

$$R = \frac{1}{N} \sum_{j=1}^{J} N_j \left(\bar{c}_j - \bar{c}\right)^2$$

As R is subtracted from the other terms in the PS, a larger variance is better, reflecting the observer’s ability to place correct and

incorrect judgments in distinct probability categories.
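The decomposition can be checked numerically. The sketch below treats each distinct rating value as its own probability category, which is an assumption made here so that the identity (mean PS = O + C − R) holds exactly; with coarser bins the identity is only approximate. Function names are hypothetical.

```python
from collections import defaultdict


def mean_probability_score(ratings, outcomes):
    """Brier score: mean squared difference between rating and outcome."""
    return sum((f - c) ** 2 for f, c in zip(ratings, outcomes)) / len(ratings)


def murphy_decomposition(ratings, outcomes):
    """Return (O, C, R): outcome index, calibration and resolution.

    Each distinct rating value is treated as one probability category.
    """
    n = len(ratings)
    base_rate = sum(outcomes) / n  # overall proportion correct
    groups = defaultdict(list)
    for f, c in zip(ratings, outcomes):
        groups[f].append(c)
    outcome_index = base_rate * (1 - base_rate)
    # Calibration: weighted squared gap between rating and category accuracy.
    calibration = sum(
        len(cs) * (f - sum(cs) / len(cs)) ** 2 for f, cs in groups.items()
    ) / n
    # Resolution: weighted squared gap between category accuracy and base rate.
    resolution = sum(
        len(cs) * (sum(cs) / len(cs) - base_rate) ** 2 for cs in groups.values()
    ) / n
    return outcome_index, calibration, resolution
```

Because O depends only on overall accuracy, two observers with identical calibration and resolution will still differ in their Brier scores if their task performance differs.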

Both calibration and resolution contribute to the overall

“accuracy” of probability judgments. To illustrate this, consider

the following contrived example. In a general knowledge task,

a subject rates each correct judgment as 90% likely to be cor-

rect, and each error as 80% likely to be correct. Her objective

mean performance level is 60%. She is poorly calibrated, in the

sense that the mean subjective probability of being correct out-

strips her actual performance. But she displays good resolution

for discriminating correct from incorrect trials using distinct lev-

els of the probability scale (although this resolution could be

even higher if she chose even more diverse ratings). This example

raises important questions as to the psychological processes that

permit metacognitive discrimination of internal states (e.g., reso-

lution, or sensitivity) and the mapping of these discriminations

onto a probability or conﬁdence scale (calibration; e.g., Ferrell

and McGoey, 1980). The learning of this mapping, and how it

may lead to changes in metacognition, has received relatively little

attention.
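The contrived example above can be worked through numerically. Assuming just two probability categories (a rating of 0.9 that contains only correct judgments, and a rating of 0.8 that contains only errors) and a proportion correct of 0.6, the components of the mean probability score are:

```latex
\begin{align*}
\bar{c} &= 0.6, \qquad O = \bar{c}\,(1 - \bar{c}) = 0.6 \times 0.4 = 0.24 \\
C &= 0.6\,(0.9 - 1)^2 + 0.4\,(0.8 - 0)^2 = 0.006 + 0.256 = 0.262 \\
R &= 0.6\,(1 - 0.6)^2 + 0.4\,(0 - 0.6)^2 = 0.096 + 0.144 = 0.24 \\
\overline{PS} &= O + C - R = 0.24 + 0.262 - 0.24 = 0.262
\end{align*}
```

Her mean rating is 0.6 × 0.9 + 0.4 × 0.8 = 0.86 against an accuracy of 0.6, so the calibration term carries the whole score. In this idealized case R equals O, because each category contains only correct or only incorrect judgments.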

IMPLICATIONS OF BIAS, SENSITIVITY, AND EFFICIENCY FOR A PSYCHOLOGY OF METACOGNITION

The psychological study of metacognition has been interested

in elucidating the determinants and impact of metacognitive

sensitivity. For instance, in a classic example, judgments of

learning (JOLs) show better sensitivity when the delay between

initial learning and JOL is increased (Nelson and Dunlosky,

1991), presumably due to delayed JOLs recruiting relevant diag-

nostic information from long-term memory. However, many

of these “classic” findings in the metacognition literature rely on measures such as G (Rhodes and Tauber, 2011) that may be confounded by bias and performance effects (although see Jang

et al., 2012). We strongly urge the application of bias-free

measures of metacognitive sensitivity reviewed above in future

studies.

More generally, we believe it is important to distinguish

between metacognitive sensitivity and efﬁciency. To recap,

metacognitive sensitivity is the ability to discriminate correct

from incorrect judgments; signal detection theoretic analysis

shows that metacognitive sensitivity scales with task performance.

In contrast, metacognitive efﬁciency is measured relative to a

particular performance level. Efﬁciency measures have several

possible applications. First, we may want to compare metacog-

nitive efﬁciency across domains in which it is not possible to

match performance levels. For instance, it is possible to quan-

tify metacognitive efﬁciency on visual and memory tasks to

elucidate their respective neural correlates (Baird et al., 2013;

McCurdy et al., 2013). Second, it is of interest to determine

whether different subject groups, such as patients and controls

(David et al., 2012) or older vs. younger adults (Souchay et al.,

2000), exhibit differential metacognitive efﬁciency after taking

into account differences in task performance. For example, Weil

et al. (2013) showed that metacognitive efﬁciency increases dur-

ing adolescence, consistent with the maturation of prefrontal

regions thought to underpin metacognition (Fleming and Dolan,

2012). Finally, it will be of particular interest to compare metacog-

nitive efﬁciency across different animal species. Several stud-

ies have established the presence of metacognitive sensitivity in

some non-human animals (Hampton, 2001; Kornell et al., 2007;

Middlebrooks and Sommer, 2011; Kepecs and Mainen, 2012).

However, it is unknown whether other species such as macaque

monkeys have levels of metacognitive efficiency similar to those

seen in humans.

Finally, the inﬂuence of performance, or skill, on efﬁciency

itself is of interest. In a highly cited paper, Kruger and Dunning

(1999) report a series of experiments in which the worst-

performing subjects on a variety of tests showed a bigger dis-

crepancy between actual performance and a one-shot rating than

the better performers. The authors concluded that “those with

limited knowledge in a domain suffer a dual burden: Not only

do they reach mistaken conclusions and make regrettable errors,

but their incompetence robs them of the ability to realize it”

(p. 1132). Notably the Dunning–Kruger effect has two distinct

interpretations in terms of sensitivity and efﬁciency. On the one

hand the effect is a direct consequence of metacognitive sensitivity being determined by type 1 d′. In other words, it would be

strange (based on the ideal observer model) if worse perform-

ing subjects didn’t make noisier ratings. On the other hand, it

is possible that skill in a domain and metacognitive efﬁciency

share resources (Dunning and Kruger’s preferred interpretation),

leading to a non-linear relationship between d′ and metacognitive sensitivity. As discussed above, one-shot ratings are unable to

disentangle bias, sensitivity and efﬁciency. Instead, by collecting

trial-by-trial metacognitive judgments and calculating efﬁciency,

it may be possible to ask whether efﬁciency itself is reduced in

subjects with poorer skill.

IMPLICATIONS OF BIAS, SENSITIVITY, AND EFFICIENCY FOR STUDIES OF CONSCIOUS AWARENESS

There has been a recent interest in interpreting metacognitive

measures as reﬂecting conscious awareness or subjective (often

visual) phenomenological experience, and in this ﬁnal section

we discuss some caveats associated with these thorny issues. As

early as Peirce and Jastrow (1885) it has been suggested that

a subject’s conﬁdence can be used to indicate level of sensory

awareness. Namely, if in making a perceptual judgment, a sub-

ject has zero conﬁdence and feels that a pure guess has been

made, then presumably the subject is not aware of sensory infor-

mation driving the decision. If their judgment turns out to be

correct, it would seem likely to be a ﬂuke or due to unconscious

processing.

However, confidence is typically correlated with task accuracy (type 1 d′); indeed, this is the essence of metacognitive sensitivity. It has been argued that type 1 d′ itself should not be taken as a measure of awareness because unconscious processing may also drive type 1 d′ (Lau, 2008), as demonstrated in clinical cases such as blindsight (Weiskrantz et al., 1974). Lau (2008) gives further arguments as to why type 1 d′ is a poor measure of subjective awareness and argues that it should be treated as a potential confound. In other words, because type 1 d′ does not necessarily reflect awareness, in measuring awareness we should compare conditions where type 1 d′ is matched or otherwise controlled for. Importantly, to match type 1 d′, it is difficult to focus the analysis at a single-trial level, because d′ is a property of a task condition or group of trials. Therefore, Lau and Passingham (2006) created task conditions that were matched for type 1 d′ but differed in level of subjective awareness, permitting an analysis of neural activity correlated with visual awareness but not performance. Essentially, such differences between conditions reflect a difference in metacognitive bias despite type 1 d′ being matched.

In contrast, other studies have focused on metacognitive sen-

sitivity, rather than bias, as a relevant measure of awareness. For

instance, Kolb and Braun (1995) used binocular presentation and

motion patterns to create stimuli in which subjects had positive

type 1 d(in a localization task), but near-zero metacognitive

sensitivity. Although this ﬁnding has proven difﬁcult to replicate

(Morgan and Mason, 1997), here we focus on the conceptual basis

of their argument. The notion of taking a lack of metacognitive

sensitivity as reﬂecting lack of awareness has also been discussed

in the literature on implicit learning (Dienes, 2008), and is intu-

itively appealing. Lack of metacognitive sensitivity indicates that

the subject has no ability to introspect upon the effectiveness of

their performance. One plausible reason for this lack of ability

is an absence of conscious experience on which the subject can

introspect.

However, there is another possibility. Metacognitive sensitiv-

ity is calculated with reference to the external world (whether

a judgment is objectively correct or incorrect), not the subject’s

experience, which is unknown to the experimenter. Thus, while

low metacognitive sensitivity could be due to an absence of con-

scious experience, it could also be due to hallucinations, such that

the subject vividly sees a false target and thus generates an incor-

rect type 1 response. Because of the vividness of the hallucination,

the subject may reasonably express high conﬁdence (a type 2 false

alarm, from the point of view of the experimenter). In the case

of hallucinations, the conscious experience does not correspond

to objects in the real world, but it is a conscious experience all

the same. Thus, low metacognitive sensitivity cannot be taken

unequivocally to mean lack of conscious experience.

That said, we acknowledge the close relationship between

metacognitive sensitivity and awareness in standard laboratory

experiments in the absence of psychosis. Intuitively, metacogni-

tive sensitivity is what gives conﬁdence ratings their meaning.

Conﬁdence or bias ﬂuctuates across individual trials (a single trial

might be rated as “seen” or highly conﬁdent), whereas metacog-

nitive sensitivity is a property of the individual, or at least a

particular condition in the experiment. High conﬁdence is only

meaningfully interpretable as successful recognition of one’s own

effective processing when it can be shown that there is some

reasonable level of metacognitive sensitivity; i.e., that conﬁdence

ratings were not given randomly. For instance, Schwiedrzik et al.

(2011) used this logic to argue that differences in metacog-

nitive bias reﬂected genuine differences in awareness, because

metacognitive sensitivity was positive and unchanged in their

experiment.

We note that criticisms also apply to using metacognitive

bias to index awareness. In all cases, we would need to make

sure that type 1 dis not a confound, and that the conﬁdence

level expressed is solely due to introspection of the conscious

experience in question. Thus, the strongest argument for pre-

ferring metacognitive bias rather than metacognitive sensitivity

as a measure of awareness is a conceptual one. Metacognitive

sensitivity measures the ability of the subject to introspect, not

what or how much conscious experience is being introspected

upon on any given trial. For instance, in what is sometimes

called type 2 blindsight, patients may develop a “hunch” that

the stimulus is presented, without acknowledging the existence

of a corresponding visual conscious experience. Such a hunch

may drive above-chance metacognitive sensitivity (Persaud et al.,

2011). More generally, it is unfortunate that researchers often

prefer sensitivity measures simply because they are

“bias free.” This advantage is only relevant when we have good

reasons to want to exclude the inﬂuence of bias! Otherwise, bias

and sensitivity measures are just different measures. This is true

for both type 1 and type 2 analyses. Instead it might be useful

to think of metacognitive sensitivity as a background against

which awareness reports should be referenced. Metacognitive

sensitivity indexes the amount we can trust the subject to tell us

something about the objective features of the stimulus. But lack

of trust does not immediately rule out an idiosyncratic conscious

experience divorced from features of the world prescribed by the

experimenter.

CONCLUSIONS

Here we have reviewed measures of metacognitive sensitivity,

and pointed out that bias is a confounding factor for popu-

lar measures of association such as gamma and phi. We point

out that there are alternative measures available based on SDT

and ROC analysis that are bias-free, and we relate these quan-

tities to the calibration and resolution measures developed in

the probability estimation literature. We strongly urge the application of the bias-free measures of metacognitive sensitivity

reviewed above in future studies of metacognition. We distin-

guished between the related concepts of metacognitive bias (a

difference in subjective conﬁdence despite basic task perfor-

mance remaining constant), metacognitive sensitivity (how good

one is at distinguishing between one’s own correct and incor-

rect judgments) and metacognitive efﬁciency (a subject’s level

of metacognition given a certain basic task performance or sig-

nal processing capacity). Finally, we discussed how these three

concepts pose interesting questions for future studies of metacog-

nition, and provide some cautionary warnings for directly

equating metacognitive sensitivity with awareness. Instead, we

advocate a more traditional approach that takes metacognitive

bias as reﬂecting levels of awareness and metacognitive sensi-

tivity as a background against which other measures should

be referenced.

ACKNOWLEDGMENTS

Stephen M. Fleming is supported by a Sir Henry Wellcome

Fellowship from the Wellcome Trust (WT096185). We thank

Brian Maniscalco for helpful discussions.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fnhum.2014.00443/abstract

REFERENCES

Baird, B., Smallwood, J., Gorgolewski, K. J., and Margulies, D. S. (2013).

Medial and lateral networks in anterior prefrontal cortex support metacog-

nitive ability for memory and perception. J. Neurosci. 33, 16657–16665. doi:

10.1523/JNEUROSCI.0786-13.2013

Barrett, A., Dienes, Z., and Seth, A. K. (2013). Measures of metacognition

on signal-detection theoretic models. Psychol. Methods 18, 535–552. doi:

10.1037/a0033268

Benjamin, A. S., and Diaz, M. (2008). “Measurement of relative metamnemonic

accuracy,” in Handbook of Metamemory and Memory, eds J. Dunlosky and R. A.

Bjork (New York, NY: Psychology Press), 73–94.

Brier, G. W. (1950). Veriﬁcation of forecasts expressed in terms of probability. Mon.

Weather Rev. 78, 1–3. doi: 10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2

Charles, L., Van Opstal, F., Marti, S., and Dehaene, S. (2013). Distinct brain mech-

anisms for conscious versus subliminal error detection. Neuroimage 73, 80–94.

doi: 10.1016/j.neuroimage.2013.01.054

Clarke, F., Birdsall, T., and Tanner, W. (1959). Two types of ROC curves and definition of parameters. J. Acoust. Soc. Am. 31, 629–630. doi: 10.1121/1.1907764

David, A. S., Bedford, N., Wiffen, B., and Gilleen, J. (2012). Failures of metacog-

nition and lack of insight in neuropsychiatric disorders. Philos. Trans. R. Soc.

Lond. B Biol. Sci. 367, 1379–1390. doi: 10.1098/rstb.2012.0002

de Gardelle, V., and Mamassian, P. (2014). Does conﬁdence use a com-

mon currency across two visual tasks? Psychol. Sci. 25, 1286–1288. doi:

10.1177/0956797614528956

De Martino, B., Fleming, S. M., Garrett, N., and Dolan, R. J. (2013). Conﬁdence in

value-based choice. Nat. Neurosci. 16, 105–110. doi: 10.1038/nn.3279

Dienes, Z. (2008). Subjective measures of unconscious knowledge. Prog. Brain Res.

168, 49–64. doi: 10.1016/S0079-6123(07)68005-4

Evans, S., and Azzopardi, P. (2007). Evaluation of a “bias-free” measure of aware-

ness. Spat. Vis. 20, 61–77. doi: 10.1163/156856807779369742

Ferrell, W. R., and McGoey, P. J. (1980). A model of calibration for subjective

probabilities. Organ. Behav. Hum. Perform. 26, 32–53. doi: 10.1016/0030-

5073(80)90045-8

Fleming, S. M., and Dolan, R. J. (2012). The neural basis of metacogni-

tive ability. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367, 1338–1349. doi:

10.1098/rstb.2011.0417

Fleming, S. M., Weil, R. S., Nagy, Z., Dolan, R. J., and Rees, G. (2010). Relating

introspective accuracy to individual differences in brain structure. Science 329,

1541–1543. doi: 10.1126/science.1191883

Galvin, S. J., Podd, J. V., Drga, V., and Whitmore, J. (2003). Type 2

tasks in the theory of signal detectability: discrimination between cor-

rect and incorrect decisions. Psychon. Bull. Rev. 10, 843–876. doi: 10.3758/

BF03196546

Gonzalez, R., and Nelson, T. O. (1996). Measuring ordinal association in situ-

ations that contain tied scores. Psychol. Bull. 119, 159. doi: 10.1037//0033-

2909.119.1.159

Goodman, L. A., and Kruskal, W. H. (1954). Measures of association for cross

classiﬁcations. J. Am. Stat. Assoc. 49, 732–764.

Green, D., and Swets, J. (1966). Signal Detection Theory and Psychophysics. New York, NY: Wiley.

Hampton, R. R. (2001). Rhesus monkeys know when they remember. Proc. Natl.

Acad. Sci. U.S.A. 98, 5359–5362. doi: 10.1073/pnas.071600998

Harvey, N. (1997). Confidence in judgment. Trends Cogn. Sci. 1, 78–82. doi:

10.1016/S1364-6613(97)01014-0

Henmon, V. (1911). The relation of the time of a judgment to its accuracy. Psychol.

Rev. 18, 186. doi: 10.1037/h0074579

Higham, P. A. (2007). No special K! A signal detection framework for the strategic

regulation of memory accuracy. J. Exp. Psychol. Gen. 136, 1. doi: 10.1037/0096-

3445.136.1.1

Higham, P. A., Perfect, T. J., and Bruno, D. (2009). Investigating strength and fre-

quency effects in recognition memory using type-2 signal detection theory.

J. Exp. Psychol. Learn. Mem. Cogn. 35, 57. doi: 10.1037/a0013865

Hollard, G., Massoni, S., and Vergnaud, J. C. (2010). Subjective Belief Formation and

Elicitation Rules: Experimental Evidence. Working paper.

Howell, D. C. (2009). Statistical Methods for Psychology. Pacific Grove, CA:

Wadsworth Pub Co.

Jang, Y., Wallsten, T. S., and Huber, D. E. (2012). A stochastic detection and

retrieval model for the study of metacognition. Psychol. Rev. 119, 186. doi:

10.1037/a0025960

Kepecs, A., and Mainen, Z. F. (2012). A computational framework for the study of

conﬁdence in humans and animals. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367,

1322–1337. doi: 10.1098/rstb.2012.0037

Keren, G. (1991). Calibration and probability judgements: conceptual and method-

ological issues. Acta Psychol. 77, 217–273. doi: 10.1016/0001-6918(91)90036-Y

Kolb, F. C., and Braun, J. (1995). Blindsight in normal observers. Nature 377,

336–338. doi: 10.1038/377336a0

Koriat, A. (2007). “Metacognition and consciousness,” in The Cambridge Handbook

of Consciousness, eds P. D. Zelazo, M. Moscovitch, and E. Davies (New York, NY:

Cambridge University Press), 289–326.

Kornell, N., Son, L. K., and Terrace, H. S. (2007). Transfer of metacognitive skills

and hint seeking in monkeys. Psychol. Sci. 18, 64–71. doi: 10.1111/j.1467-

9280.2007.01850.x

Kruger, J., and Dunning, D. (1999). Unskilled and unaware of it: how difﬁculties

in recognizing one’s own incompetence lead to inﬂated self-assessments. J. Pers.

Soc. Psychol. 77, 1121–1134. doi: 10.1037/0022-3514.77.6.1121

Kunimoto, C., Miller, J., and Pashler, H. (2001). Conﬁdence and accuracy of

near-threshold discrimination responses. Conscious. Cogn. 10, 294–340. doi:

10.1006/ccog.2000.0494

Lachman, J. L., Lachman, R., and Thronesbery, C. (1979). Metamemory

through the adult life span. Dev. Psychol. 15, 543. doi: 10.1037/0012-1649.15.

5.543

Lau, H. (2008). “Are we studying consciousness yet?” in Frontiers of Consciousness:

Chichele Lectures, eds L. Weiskrantz and M. Davies (Oxford: Oxford University

Press), 245–258.

Lau, H. C., and Passingham, R. E. (2006). Relative blindsight in normal observers

and the neural correlate of visual consciousness. Proc. Natl. Acad. Sci. U.S.A.

103, 18763–18768. doi: 10.1073/pnas.0607716103

Lee, T. G., Blumenfeld, R. S., and D’Esposito, M. (2013). Disruption of dorso-

lateral but not ventrolateral prefrontal cortex improves unconscious percep-

tual memories. J. Neurosci. 33, 13233–13237. doi: 10.1523/JNEUROSCI.5652-

12.2013

Lichtenstein, S., Fischhoff, B., and Phillips, L. D. (1982). “Calibration of probabili-

ties: the state of the art to 1980,” in Judgment Under Uncertainty: Heuristics and

Biases, eds D. Kahneman, P. Slovic, and A. Tversky (Cambridge, UK: Cambridge

University Press), 306–334.

Maniscalco, B., and Lau, H. (2012). A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Conscious. Cogn. 21,

422–430. doi: 10.1016/j.concog.2011.09.021

Maniscalco, B., and Lau, H. (2014). “Signal detection theory analysis of type 1 and

type 2 data: meta-d′, response-specific meta-d′, and the unequal variance SDT

Model,” in The Cognitive Neuroscience of Metacognition, eds S. M. Fleming and

C. D. Frith (Berlin: Springer), 25–66.

Mason, I. B. (2003). “Binary events,” in Forecast Veriﬁcation: A Practitioner’s Guide

in Atmospheric Science, eds I. T. Jolliffe and D. B. Stephenson (Chichester:

Wiley), 37–76.

Masson, M. E. J., and Rotello, C. M. (2009). Sources of bias in the Goodman–

Kruskal gamma coefﬁcient measure of association: implications for studies of

metacognitive processes. J. Exp. Psychol. Learn. Mem. Cogn. 35, 509–527. doi:

10.1037/a0014876

McCurdy, L. Y., Maniscalco, B., Metcalfe, J., Liu, K. Y., de Lange, F. P., and

Lau, H. (2013). Anatomical coupling between distinct metacognitive sys-

tems for memory and visual perception. J. Neurosci. 33, 1897–1906. doi:

10.1523/JNEUROSCI.1890-12.2013

Metcalfe, J., and Shimamura, A. P. (1996). Metacognition: Knowing About Knowing.

Cambridge, MA: MIT Press.

Middlebrooks, P. G., and Sommer, M. A. (2011). Metacognition in monkeys during an oculomotor task. J. Exp. Psychol. Learn. Mem. Cogn. 37, 325–337. doi:

10.1037/a0021611

Moore, D. A., and Healy, P. J. (2008). The trouble with overconﬁdence. Psychol. Rev.

115, 502–517. doi: 10.1037/0033-295X.115.2.502

Morgan, M., and Mason, A. (1997). Blindsight in normal subjects? Nature 385,

401–402. doi: 10.1038/385401b0

Murphy, A. H. (1973). A new vector partition of the probability score. J. Appl.

Meteor. 12, 595–600. doi: 10.1175/1520-0450(1973)012<0595:ANVPOT>2.

0.CO;2

Nelson, T. (1984). A comparison of current measures of the accuracy of

feeling-of-knowing predictions. Psychol. Bull. 95, 109–133. doi: 10.1037/0033-

2909.95.1.109

Nelson, T. O., and Dunlosky, J. (1991). When people’s Judgments of Learning

(JOLs) are extremely accurate at predicting subsequent recall: the ‘Delayed-JOL

Effect.’ Psychol. Sci. 2, 267–270. doi: 10.1111/j.1467-9280.1991.tb00147.x

Nelson, T. O., and Narens, L. (1990). Metamemory: a theoretical framework

and new ﬁndings. Psychol. Learn. Motiv. 26, 125–141. doi: 10.1016/S0079-

7421(08)60053-5

Peirce, C. S., and Jastrow, J. (1885). On small differences in sensation. Mem. Natl.

Acad. Sci. 3, 73–83.

Persaud, N., Davidson, M., Maniscalco, B., Mobbs, D., Passingham, R. E., Cowey,

A., et al. (2011). Awareness-related activity in prefrontal and parietal cortices

in blindsight reflects more than superior visual performance. Neuroimage 58,

605–611. doi: 10.1016/j.neuroimage.2011.06.081

Rhodes, M. G., and Tauber, S. K. (2011). The inﬂuence of delaying judgments of

learning on metacognitive accuracy: a meta-analytic review. Psychol. Bull. 137,

131. doi: 10.1037/a0021705

Rounis, E., Maniscalco, B., Rothwell, J., Passingham, R., and Lau, H. (2010).

Theta-burst transcranial magnetic stimulation to the prefrontal cortex

impairs metacognitive visual awareness. Cogn. Neurosci. 1, 165–175. doi:

10.1080/17588921003632529

Sandberg, K., Timmermans, B., Overgaard, M., and Cleeremans, A. (2010).

Measuring consciousness: is one measure better than the other? Conscious.

Cogn. 19, 1069–1078. doi: 10.1016/j.concog.2009.12.013

Schmitz, T. W., Rowley, H. A., Kawahara, T. N., and Johnson, S. C. (2006).

Neural correlates of self-evaluative accuracy after traumatic brain injury.

Neuropsychologia 44, 762–773. doi: 10.1016/j.neuropsychologia.2005.07.012

Schwiedrzik, C. M., Singer, W., and Melloni, L. (2011). Subjective and objective

learning effects dissociate in space and in time. Proc. Natl. Acad. Sci. U.S.A. 108,

4506–4511. doi: 10.1073/pnas.1009147108

Souchay, C., Isingrini, M., and Espagnet, L. (2000). Aging, episodic memory

feeling-of-knowing, and frontal functioning. Neuropsychology 14, 299. doi:

10.1037/0894-4105.14.2.299

Weil, L. G., Fleming, S. M., Dumontheil, I., Kilford, E. J., Weil, R. S., Rees, G., et al.

(2013). The development of metacognitive ability in adolescence. Conscious.

Cogn. 22, 264–271. doi: 10.1016/j.concog.2013.01.004

Weiskrantz, L., Warrington, E. K., Sanders, M. D., and Marshall, J. (1974). Visual

capacity in the hemianopic ﬁeld following a restricted occipital ablation. Brain

97, 709–728. doi: 10.1093/brain/97.1.709

Yaniv, I., Yates, J. F., and Smith, J. K. (1991). Measures of discrimination skill in

probabilistic judgment. Can. J. Exp. Psychol. 110, 611.

Conﬂict of Interest Statement: The Editor Dr. Harriet Brown declares that despite

having previously collaborated with the author Dr. Klaas Stephan the review pro-

cess was handled objectively. The authors declare that the research was conducted

in the absence of any commercial or ﬁnancial relationships that could be construed

as a potential conﬂict of interest.

Received: 23 January 2014; accepted: 02 June 2014; published online: 15 July 2014.

Citation: Fleming SM and Lau HC (2014) How to measure metacognition. Front.

Hum. Neurosci. 8:443. doi: 10.3389/fnhum.2014.00443

This article was submitted to the journal Frontiers in Human Neuroscience.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s)

or licensor are credited and that the original publication in this journal is cited, in

accordance with accepted academic practice. No use, distribution or reproduction is

permitted which does not comply with these terms.


Article

Full-text available

Jun 2024

Patients with anorexia nervosa (AN) typically hold altered beliefs about their body that they struggle to update, including global, prospective beliefs about their ability to know and regulate their body and particularly their interoceptive states. While clinical questionnaire studies have provided ample evidence on the role of such beliefs in the onset, maintenance, and treatment of AN, psychophysical studies have typically focused on perceptual and ‘local’ beliefs. Across two experiments, we examined how women at the acute AN (N = 86) and post-acute AN state (N = 87), compared to matched healthy controls (N = 180) formed and updated their self-efficacy beliefs retrospectively (Experiment 1) and prospectively (Experiment 2) about their heartbeat counting abilities in an adapted heartbeat counting task. As preregistered, while AN patients did not differ from controls in interoceptive accuracy per se, they hold and maintain ‘pessimistic’ interoceptive, metacognitive self-efficacy beliefs after performance. Modelling using a simplified computational Bayesian learning framework showed that neither local evidence from performance, nor retrospective beliefs following that performance (that themselves were suboptimally updated) seem to be sufficient to counter and update pessimistic, self-efficacy beliefs in AN. AN patients showed lower learning rates than controls, revealing a tendency to base their posterior beliefs more on prior beliefs rather than prediction errors in both retrospective and prospective belief updating. Further explorations showed that while these differences in both explicit beliefs, and the latent mechanisms of belief updating, were not explained by general cognitive flexibility differences, they were explained by negative mood comorbidity, even after the acute stage of illness.

Confidence judgments interfere with perceptual decision making

Article

Full-text available

Jun 2024

Determining one’s confidence in a decision is a vital part of decision-making. Traditionally, psychological experiments have assessed a person’s confidence by eliciting confidence judgments. The notion that such judgments can be elicited without impacting the accuracy of the decision has recently been challenged by several studies which have shown reactivity effects—either an increase or decrease in decision accuracy when confidence judgments are elicited. Evidence for the direction of reactivity effects has, however, been decidedly mixed. Here, we report three studies designed to specifically make reactivity effects more prominent by eliciting confidence judgment contemporaneously with perceptual decisions. We show that confidence judgments elicited contemporaneously produce an impairment in decision accuracy, this suggests that confidence judgments may rely on a partially distinct set of cues/evidence than the primary perceptual decision and, additionally, challenges the continued use of confidence ratings as an unobtrusive measure of metacognition.

Metacognition during fake news detection induces an ineffective demand for disambiguating information

Preprint

Full-text available

Feb 2024

The mechanisms by which individuals evaluate the veracity of uncertain news and subsequently decide whether to seek additional information to resolve uncertainty remain unclear. In a controlled experiment participants assessed non-partisan ambiguous news and made decisions about whether to acquire extra information. Interestingly, confidence in their judgments of news veracity did not reliably predict actual accuracy, indicating limited metacognitive ability in navigating ambiguous news. Nonetheless, the level of confidence, although uncalibrated, was the primary driver of the demand for additional information about the news, with lower confidence driving a greater demand, regardless of its veracity judgment. This demand for disambiguating information, driven by the uncalibrated metacognition, was increasingly ineffective as individuals became more enticed by the ambiguity of the news. Our findings highlight how metacognitive abilities shape decisions to seek or avoid additional information amidst ambiguity, suggesting that interventions targeting ambiguity and enhancing confidence calibration could effectively combat misinformation. Main Text

Can I See Your Answers? Applying the Fishbowl Method in Marketing Analytics Classes

Article

Jun 2024

Data-driven marketing analytics courses are integral to modern business management degrees in universities, yet many graduates focus solely on single, separated data analysis techniques during their learning process, hindering effective integration and practical performance. This study proposes that employing the Fishbowl method, which divides students into “fish” or “observers” to facilitate active problem-solving and analytical reflection, can effectively empower students to augment their learning and performance in marketing analysis by strengthening their metacognition. This research also explores the moderating effects of task complexity and students’ divergent thinking. Two field experiments (41 Cohort 22/23 students in Study 1; 39 Cohort 23/24 students in Study 2) were implemented. The results revealed that the Fishbowl method significantly enhances students’ metacognition, which affects their task-solving performance. Furthermore, students with higher (lower) divergent thinking perform better and are better suited to the observer (fish) roles. This moderating effect was strengthened when the task complexity was high. This study bridges the use of the Fishbowl method with the enhancement of metacognition in the context of marketing analytics courses. Appropriate utilization of the Fishbowl method during marketing analytics courses, along with grouping students based on their thinking traits, can significantly enhance learning effectiveness and performance.

Unconscious Perception of Vernier Offsets

Article

Jun 2024

The comparison between conscious and unconscious perception is a cornerstone of consciousness science. However, most studies reporting above-chance discrimination of unseen stimuli do not control for criterion biases when assessing awareness. We tested whether observers can discriminate subjectively invisible offsets of Vernier stimuli when visibility is probed using a bias-free task. To reduce visibility, stimuli were either backward masked or presented for very brief durations (1–3 milliseconds) using a modern-day Tachistoscope. We found some behavioral indicators of perception without awareness, and yet, no conclusive evidence thereof. To seek more decisive proof, we simulated a series of Bayesian observer models, including some that produce visibility judgements alongside type-1 judgements. Our data are best accounted for by observers with slightly suboptimal conscious access to sensory evidence. Overall, the stimuli and visibility manipulations employed here induced mild instances of blindsight-like behavior, making them attractive candidates for future investigation of this phenomenon.

Lack of effects of online HD-tDCS over the left or right DLPFC in an associative memory and metamemory monitoring task

Article

Full-text available

Jun 2024
PLOS ONE

Neuroimaging studies have shown that activity in the prefrontal cortex correlates with two critical aspects of normal memory functioning: retrieval of episodic memories and subjective “feelings-of-knowing" about our memory. Brain stimulation can be used to test the causal role of the prefrontal cortex in these processes, and whether the role differs for the left versus right prefrontal cortex. We compared the effects of online High-Definition transcranial Direct Current Stimulation (HD-tDCS) over the left or right dorsolateral prefrontal cortex (DLPFC) compared to sham during a proverb-name associative memory and feeling-of-knowing task. There were no significant effects of HD-tDCS on either associative recognition or feeling-of-knowing performance, with Bayesian analyses showing moderate support for the null hypotheses. Despite past work showing effects of HD-tDCS on other memory and feeling-of-knowing tasks, and neuroimaging showing effects with similar tasks, these findings add to the literature of non-significant effects with tDCS. This work highlights the need to better understand factors that determine the effectiveness of tDCS, especially if tDCS is to have a successful future as a clinical intervention.

Toward a universal theory of consciousness

Article

May 2024

While falsifiability has been broadly discussed as a desirable property of a theory of consciousness, in this paper, we introduce the meta-theoretic concept of “Universality” as an additional desirable property for a theory of consciousness. The concept of universality, often assumed in physics, posits that the fundamental laws of nature are consistent and apply equally everywhere in the universe and remain constant over time. This assumption is crucial in science, acting as a guiding principle for developing and testing theories. When applied to theories of consciousness, universality can be defined as the ability of a theory to determine whether any fully described dynamical system is conscious or non-conscious. Importantly, for a theory to be universal, the determinant of consciousness needs to be defined as an intrinsic property of a system as opposed to replying on the interpretation of the external observer. The importance of universality originates from the consideration that given that consciousness is a natural phenomenon, it could in principle manifest in any physical system that satisfies a certain set of conditions whether it is biological or non-biological. To date, apart from a few exceptions, most existing theories do not possess this property. Instead, they tend to make predictions as to the neural correlates of consciousness based on the interpretations of brain functions, which makes those theories only applicable to brain-centric systems. While current functionalist theories of consciousness tend to be heavily reliant on our interpretations of brain functions, we argue that functionalist theories could be converted to a universal theory by specifying mathematical formulations of the constituent concepts. While neurobiological and functionalist theories retain their utility in practice, we will eventually need a universal theory to fully explain why certain types of systems possess consciousness.

Confidence ratings do not distinguish imagination from reality

Article

May 2024
J VISION

Perceptual reality monitoring refers to the ability to distinguish internally triggered imagination from externally triggered reality. Such monitoring can take place at perceptual or cognitive levels-for example, in lucid dreaming, perceptual experience feels real but is accompanied by a cognitive insight that it is not real. We recently developed a paradigm to reveal perceptual reality monitoring errors during wakefulness in the general population, showing that imagined signals can be erroneously attributed to perception during a perceptual detection task. In the current study, we set out to investigate whether people have insight into perceptual reality monitoring errors by additionally measuring perceptual confidence. We used hierarchical Bayesian modeling of confidence criteria to characterize metacognitive insight into the effects of imagery on detection. Over two experiments, we found that confidence criteria moved in tandem with the decision criterion shift, indicating a failure of reality monitoring not only at a perceptual but also at a metacognitive level. These results further show that such failures have a perceptual rather than a decisional origin. Interestingly, offline queries at the end of the experiment revealed global, task-level insight, which was uncorrelated with local, trial-level insight as measured with confidence ratings. Taken together, our results demonstrate that confidence ratings do not distinguish imagination from reality during perceptual detection. Future research should further explore the different cognitive dimensions of insight into reality judgments and how they are related.

Aging, Episodic Memory Feeling-of-Knowing, and Frontal Functioning

Article

Full-text available

Apr 2000

Groups of normal old and young adults made episodic memory feeling-of-knowing (FOK) judgments and took 2 types of episodic memory tests (cued recall and recognition). Neuropsychological tests of executive and memory functions thought to respectively involve the frontal and medial temporal structures were also administered. Age differences were observed on the episodic memory measures and on all neuropsychological tests. Compared with young adults, older adults performed at chance level on FOK accuracy judgments. Partial correlations indicated that a composite measure of frontal functioning and FOK accuracy were closely related. Hierarchical regression analyses showed that the composite frontal functioning score accounted for a large proportion of the age-related variance in FOK accuracy. This finding supports the idea that the age-related decline in episodic memory FOK accuracy is mainly the result of executive or frontal limitations associated with aging.

The Neural Basis of Metacognitive Ability

Article

Full-text available

Dec 2013

Ability in cognitive domains is usually assessed by measuring task performance, such as decision accuracy. A similar analysis can be applied to metacognitive reports about a task to quantify the degree to which an individual is aware of his or her success or failure. Here, we review the psychological and neural underpinnings of metacognitive accuracy, drawing primarily on research in memory and decision-making. These data show that metacognitive accuracy is dissociable from task performance and varies across individuals. Convergent evidence indicates that the function of rostral and dorsal aspects of lateral prefrontal cortex is important for the accuracy of retrospective judgements of performance. In contrast, prospective judgements of performance may depend upon medial prefrontal cortex. We close by considering how metacognitive processes relate to concepts of cognitive control, and propose a neural synthesis in which dorsolateral and anterior prefrontal cortical subregions interact with interoceptive cortices (cingulate and insula) to promote accurate judgements of performance.

Binary events

Article

Jan 2003

I.B. Mason

Measurement of relative mnemonic accuracy

Article

Jan 2008

Calibration of probabilities: the state of the art to 1980

Book

Jan 1982

Signal Detection Theory Analysis of Type 1 and Type 2 Data: Meta-d′, Response-Specific Meta-d′, and the Unequal Variance SDT Model

Article

Dec 2013

Previously we have proposed a signal detection theory (SDT) methodology for measuring metacognitive sensitivity (Maniscalco and Lau, Conscious Cogn 21:422-430, 2012). Our SDT measure, meta-d′, provides a response-bias free measure of how well confidence ratings track task accuracy. Here we provide an overview of standard SDT and an extended formal treatment of meta-d′. However, whereas meta-d′ characterizes an observer's sensitivity in tracking overall accuracy, it may sometimes be of interest to assess metacognition for a particular kind of behavioral response. For instance, in a perceptual detection task, we may wish to characterize metacognition separately for reports of stimulus presence and absence. Here we discuss the methodology for computing such a response-specific meta-d′ and provide corresponding Matlab code. This approach potentially offers an alternative explanation for data that are typically taken to support the unequal variance SDT (UV-SDT) model. We demonstrate that simulated data generated from UV-SDT can be well fit by an equal variance SDT model positing different metacognitive ability for each kind of behavioral response, and likewise that data generated by the latter model can be captured by UV-SDT. This ambiguity entails that caution is needed in interpreting the processes underlying relative operating characteristic (ROC) curve properties. Type 1 ROC curves generated by combining type 1 and type 2 judgments, traditionally interpreted in terms of low-level processes (UV), can potentially be interpreted in terms of high-level processes instead (response-specific metacognition). Similarly, differences in area under response-specific type 2 ROC curves may reflect the influence of low-level processes (UV) rather than high-level metacognitive processes.

Measures of Association for Cross Classifications

Article

Jan 1954

Metamemory: A Theoretical Framework and New Findings

Chapter

Dec 1990
Psychol Learn Motiv

Thomas O. Nelson

This chapter focuses on research program, providing a description of a theoretical framework that has evolved out of metamemory research, followed by a few remarks about the methodology. Research in metamemory is initiated by the paradoxical findings that people can accurately predict their subsequent likelihood of recognizing nonrecallable items and that they can quickly and accurately decide-on the basis of no more than a cursory search through memory-that they will not retrieve particular sought after items. Those findings lead to develop a methodology based on psychophysical methods that are used to empirically investigate people's feeling of knowing. The results of the experiments convinced that for dealing with only a part of a complex metacognitive system and to account adequately for feeling-of-knowing phenomena, a larger perspective was needed. This eventuated in the present theoretical framework that emphasizes the role of control and monitoring processes. The embedding of the feeling of knowing in a richer framework helped to dissipate the paradoxical nature of the feeling of knowing. The chapter discusses that today there are many capable, active investigators and a wealth of solid empirical findings.

Verification of Forecasts Expressed in Terms of Probability,” Monthly Weather Review 78: 1-3

Article