How surprising is a simple pattern?
Quantifying “Eureka!”
Jacob Feldman
Department of Psychology, Center for Cognitive Science, Busch Campus, Rutgers University-New Brunswick,
152 Frelinghuysen Rd, Piscataway, NJ 08854, USA
E-mail address: jacob@ruccs.rutgers.edu (J. Feldman)
Received 8 August 2002; revised 30 July 2003; accepted 19 September 2003
Abstract
Simple patterns are compelling. When all the observed facts fit into a simple theory or “story,” we
are intuitively convinced that the pattern must be real rather than random. But how surprising is a
simple pattern, really? That is, given a pattern of featural data, such as the properties of a set of
objects, how unlikely would the pattern be if they were actually generated at random? In
conventional statistics dealing with patterns of numbers, this type of question would be answered by
reference to a null distribution such as the t distribution. This paper gives the analogous answer in the
realm of concept learning, that is, the formation of generalizations from patterns of featural data.
Using a formal but psychologically valid definition of complexity, I derive and exhibit the
distribution of subjective complexity under the hypothesis of no pattern. This leads directly to a
number of applications, including a statistical test indicating whether an observed pattern is
sufficiently simple that it is not likely to have been an accident: literally, the “significance of
simplicity.”
© 2004 Elsevier B.V. All rights reserved.
Keywords: Complexity; Concepts; Bayes; Pattern; Simplicity
Grand juries don’t like coincidences.
-Anonymous legal aphorism
Simple patterns are compelling. When a police detective finds that the fingerprints,
eyewitness testimony, forensic evidence, motive, etc. are all well-explained by the simple
hypothesis the butler did it, the case will be convincing to the jury. After all, such a large
amount of evidence is not likely to fit by accident into such a simple story. Like the solution
to a crossword puzzle, the simple explanation makes all the clues pop into place. Eureka!
The same set of clues might also be explained by the hypothesis the chauffeur did it, but framed the butler by bribing witnesses and concocting physical evidence… But this theory's complexity and arbitrariness make it very unconvincing. After all, almost any pattern of evidence could probably be squeezed into a theory that complicated. No Eureka.
Of course, if the evidence is not so clean (perhaps one eyewitness saw the chauffeur
rather than the butler leaving the crime scene), the detective may have to settle for a
somewhat more complex theory of the crime (the butler did it, but one witness is
mistaken). In this case, the moderate complexity of the theory makes it moderately
compelling. Medium Eureka.
In the more mundane domain of categorization and concept learning, roughly the same
unconscious reasoning seems to occur. Say a certain set of observed objects have certain
properties ab, ac, ad, ae… (e.g. red triangle, red square, red circle…). Although the
objects differ from each other in many ways, they all share the single common property a
(red). Surely (the unconscious reasoning goes) this commonality cannot be a coincidence; it is too unlikely that so many randomly selected objects would all happen to have property a. Hence it must be a real pattern, or (in Horace Barlow's suggestive phrase) a suspicious coincidence: the objects all belong to a common category whose members typically (or always) have property a (red things). Eureka!
Or consider the example of a "Bongard problem" (Bongard, 1970; Fig. 1a). Here eight objects are defined over four binary features (shape = square or circle, size = small or large, color = filled or not-filled, and side = left or right). The observer can quickly ascertain that these eight objects all obey the simple theory Squares on the left, circles on the right; in Boolean notation, (left → square) ∧ (right → circle). Surely given such a
large number of objects (8) each defined over so many features (4), such a simple pattern is
not likely to be the result of a random process. Rather some non-random process—in this
case, segregation by shape classes—must have occurred, which we as the observer will
surely care to note. Eureka!
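To make the featural coding concrete, here is a minimal Python sketch of how the eight objects might be represented as Boolean vectors and checked against the rule (left → square) ∧ (right → circle). The specific size and fill values are hypothetical placeholders (Fig. 1a is not reproduced here); only the shape/side relation matters for the check.

```python
# Features: shape (1 = square, 0 = circle), size (1 = large), filled (1 = yes),
# side (1 = left, 0 = right). The size/fill assignments below are illustrative only.
objects = [
    (1, 1, 1, 1), (1, 0, 0, 1), (1, 1, 0, 1), (1, 0, 1, 1),   # left side: squares
    (0, 1, 1, 0), (0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0),   # right side: circles
]

def obeys_rule(obj):
    """(left -> square) and (right -> circle)."""
    shape, size, filled, side = obj
    left_implies_square = (not side) or shape       # side = 1 means "left"
    right_implies_circle = side or (not shape)      # side = 0 means "right"
    return left_implies_square and right_implies_circle

print(all(obeys_rule(o) for o in objects))  # True: the simple theory fits all 8 objects
```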
Conversely, if we “scramble” the problem by exchanging two of the same eight objects as
in Fig. 1b, we get a more complicated situation: no simple description now applies. We can
still describe the arrangement, but only using a more complex and long-winded phrase such
as Big shaded square and small unshaded square and big shaded circle and small unshaded circle on the left, small shaded square and… and so on. Indeed, the description
is nothing other than an item-by-item recitation of the contents of the scene. No Eureka.
This line of reasoning presupposes that the observer has a way of measuring or
estimating exactly how unlikely a given pattern of features is as a function of the
complexity of the pattern: if you will, the degree of “Eureka.” In order to accomplish
this, the observer needs (a) a computable measure C of pattern complexity that accurately reflects the subjective impression of a theory's "badness"; and (b) the expected distribution p(C) of this complexity measure over random patterns—this
latter being essential if the observer is to judge the probability that the pattern is too
simple to be a random outcome. To my knowledge, the statistical properties of
conceptual complexity, including the form of the null distribution, have never
previously been discussed in the literature. This article builds on recent progress in
developing (a) (the complexity measure) by deriving and exhibiting (b) (the null distribution of complexity).

Fig. 1. (a) A "Bongard problem" (Bongard, 1970). Here, the categorical separation between the objects on the left and those on the right makes a "simple story" (complexity C = 4 literals), significantly simpler than a random pattern (see plot at right). In (b), two of the objects have been swapped, making a new problem with no simple solution. Complexity jumps to C = 20 literals, no longer simpler than a randomly constructed problem.
As illustrated below, the null distribution and the associated “significance test for
simplicity” lead to a number of useful and concrete applications to problems in cognitive
science. Broadly speaking, they are potentially relevant to any psychological process that
is modulated by complexity, because they make it possible to connect degrees of
complexity to degrees of probability. The premise is that this connection underlies the
ineffable sense of “Eureka”—the subjective feeling that simple patterns bespeak non-
random processes.
The mathematics governing the null distribution are somewhat involved, and many of
the details are given in Appendices A and B. However, much of the math can be boiled
down to a “thumbnail” or approximate version of the distribution, which gives nearly the
right values with a minimum of fuss and calculation. The existence of this quick-and-dirty
approximation is useful in practice, and moreover supports the idea that a rough
probability assessment could be implemented in biological hardware, and could thus truly
underlie our untutored intuitions about the genuineness of patterns.
1. Avoiding coincidences
Assume we have observed some set of objects, and coded them as a pattern with a given level of complexity C. Possessing the null complexity distribution would allow us to then say exactly how unlikely this degree of complexity is under the "null
hypothesis” of no pattern. The inference that a given pattern is “real” (as opposed to
random) corresponds to our subjective rejection of this null hypothesis.
The reasoning is extremely familiar from the domain of conventional diagnostic
statistics. Imagine we have sampled a numeric measure (say, the lengths of seed pods)
from two populations (e.g. species of plant), and would like to know whether the observed
mean difference is “real” or “random.” Are the two samples different enough to allow us to
infer that the two distributions are genuinely different? (In proper statistical language, we
say “the population means are not the same” instead of saying “the two distributions are
genuinely different.") In this situation, we might use the t distribution: we would calculate the t value corresponding to the observed mean difference, and then look up this value on a t table, which would give the probability of observing such a large t under the null hypothesis that the populations were truly the same. If the probability is sufficiently low, we reject the null hypothesis, and conclude that the populations are in fact different. An analogous step would be possible in the domain of pattern detection, if only we had a model of the null distribution of complexity analogous to the null distribution of the sample mean given by the t distribution.
2. A measure of subjective complexity
The idea that human observers have a preference for simple or regular patterns has deep
roots in psychology (Barlow, 1974; Hochberg & McAlister, 1953; Kanizsa, 1979)
and philosophy (Quine, 1965; Sober, 1975). But this idea has been slow to take hold in
what is perhaps its most natural domain of application, namely the formation of
generalizations and the induction of categories, or, as it is often called in the psychological
literature, concept learning. Four decades ago Neisser and Weene (1962) suggested that
human learners tended to prefer the logically simplest category consistent with their
observations, but their suggestion was not met with enthusiasm in the learning literature.
More recent models of human concept learning often have a very different flavor, emphasizing the use of stored examples ("exemplars") as a basis for generalization, with no explicit bias towards simplicity in the induced rule, and in fact no "induced rule" at all.[1]

[1] Exemplar-based categorization models may in fact tend to produce simple abstractions, but only as it were "epiphenomenally," that is, not as an overt aspect of their design.
Much of the trouble has stemmed from the difficulty in formulating the correct
definition of “simplicity” and “complexity.” Mathematicians beginning in the 1960s
have converged on a solution to this problem: simplicity is the degree to which a
given object can be faithfully compressed (an idea usually referred to as Kolmogorov complexity; see Li & Vitányi, 1997). ("Faithfully" here means "without loss of information.") Intrinsically complex objects are those that cannot be communicated without virtually quoting them verbatim, which takes as much information as the original object actually contains. Simple objects, by contrast, are those that can be expressed in a form much more compact than they are themselves.
In earlier work I have used a similar measure of complexity to study human learners’
intuitions about concepts to be learned. In the simplest situation, learners are presented
with a collection of sample objects, each of which can be represented by a set of Boolean
(binary-valued) features. In this situation, the set of sample objects can be thought of as a
long propositional formula, essentially the disjunction of the objects themselves. A simple measure of the complexity of these objects, then, is the length of the shortest equivalent formula, that is, the length of the most compact way of faithfully describing the same set of objects; this number is called the Boolean complexity (Givone, 1970; Wegener, 1987). Boolean complexity is most conveniently expressed in terms of the number of literals, or instances of variable names that appear in the shortest formula. For example, the object set red square or red circle is logically equivalent to red (assuming shapes are always squares or circles), and so has Boolean complexity 1; exactly one variable name appears in the compressed expression. Conversely, red square or blue circle cannot be compressed at all, so it has Boolean complexity 4.
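As a small illustration of what "counting literals" means, the following sketch verifies by exhaustive truth-table enumeration that the one-literal formula red has exactly the same extension as the four-literal disjunction red square or red circle. It only checks a candidate compression; it does not find the shortest formula, which is the computationally hard part discussed below.

```python
from itertools import product

# Objects over two Boolean features: feature 0 = color (1 = red, 0 = blue),
# feature 1 = shape (1 = square, 0 = circle). A concept is the set of feature
# vectors it admits.
concept = {(1, 1), (1, 0)}          # "red square or red circle"

def extension(formula, n_features=2):
    """All feature vectors over n_features that satisfy the formula."""
    return {v for v in product([0, 1], repeat=n_features) if formula(v)}

red = lambda v: v[0] == 1                                                     # 1 literal
verbatim = lambda v: (v[0] == 1 and v[1] == 1) or (v[0] == 1 and v[1] == 0)   # 4 literals

print(extension(red) == concept)       # True: "red" is a faithful one-literal compression
print(extension(verbatim) == concept)  # True, but it spends 4 literals to say the same thing
```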
Studies in my laboratory (Feldman, 2000b) have shown that the Boolean complexity
correlates well with the difficulty subjects have learning the given concept—that is, that
this number gives a good account of subjective or psychological complexity (see also
Feldman, in press). Human concept learners—like police detectives—prefer patterns of
data that can be summarized succinctly, i.e. “simple theories.”
2.1. Algebraic complexity
In what follows below, I will actually use a related but different measure of subjective
complexity, called the algebraic complexity (Feldman, 2001). Like Boolean complexity,
the algebraic complexity of a set of examples can be understood as the length of
the shortest faithful representation of the observed objects, and is also measured in
literals. However, rather than expressing formulae in the logicians’ conventional basis of
{∧, ∨, ¬}, the algebraic complexity uses a more subtle but more psychologically
motivated basis. A synopsis of algebraic complexity and the concept algebra behind it is
given in Appendix A. It should be noted, however, that the general argument below could
be applied to any concrete, computable complexity measure, in whatever domain one
happened to be interested in, and would probably yield generally similar results—although
this is not so with uncomputable measures such as Kolmogorov complexity, for reasons
discussed below.
Briefly, the idea behind algebraic complexity is to reduce each concept (or set of
examples) to a structured representation, called the implicational power series, which
expresses the concept in terms of all the regularities it obeys, expressed in a maximally
compact form. The representation commits to a particular choice of what counts as a
“regularity” (see Appendix A), assigning brief descriptions to those concepts that obey
such regularities, and long descriptions to those that do not. The key idea is that the power
series is stratified—overtly broken down by levels of component complexity. Thus the
series gives a kind of spectral breakdown of the concept, the same way the Fourier
decomposition breaks a function down into components at different frequencies (here,
complexity levels); Fig. 2 shows several examples of concepts and their spectra. The
spectral representation makes explicit how much of the concept’s structure can be
explained by simple rules, and how much can only be explained by more complex rules.
The algebraic complexity itself is then simply the mean spectral power, indicating where
in the complexity spectrum most of the concept’s structure lies.
Because the representation overtly breaks the concept down into simple and complex
components (and every level in between), it in effect expresses where the concept falls in
the spectrum between “theory” and “exceptions” (cf. Nosofsky, Palmeri, & McKinley,
1994). The more regular it is, the more spectral power at the low end, and thus the lower
the final complexity score C. The more internally "exceptional" it is, the more spectral power at the high end, and thus the higher the final complexity score (see Fig. 2). As C ranges from low to high, the associated concepts run the gamut from primarily rule-like
to primarily “exceptional” in nature.
As with any concrete, computable complexity measure, the commitment to a particular
regularity language means that some seemingly “regular” concepts are deemed complex
because the regularities they obey are not of the type recognized by the language. An
extreme example is the parity function (with D = 2 called exclusive-or; with D = 3 equivalent to Shepard, Hovland, and Jenkins (1961)'s type VI). This concept exhibits a neat alternating pattern and thus seems regular from a certain point of view, but nevertheless scores as complex and incompressible in any measure that doesn't know about that particular kind of alternation, such as the concept algebra (as well as many other languages; see Feldman, 2003; Schöning & Pruim, 1998 for discussion). Unlike Boolean
complexity, algebraic complexity can be readily computed for concepts defined over non-
Boolean discrete features, that is, features that have more than two possible values (e.g.
shape = {square, circle, triangle, …}), though I will consider only Boolean features in this paper (with the exception of the concepts illustrated in Fig. 2, which are defined over four-valued features).

Fig. 2. Three concepts and their algebraic power spectra. As complexity increases, the spectra shift to the right and become more "exceptional." Each concept here is defined over two features, each having four possible values; heavy dots at the appropriate vertices indicate the members of the concept. With a simple concept (a), all the spectral power is at minimal degree, indicating unanimous satisfaction of a simple regularity. In (b), one of the four objects now fails to obey the regularity (an exception), and consequently some of the spectral power shifts to higher degree. In (c), none of the four objects obeys any common regularities, and much of the spectral power shifts to higher degrees, yielding a high complexity value. The final complexity value is the sum of spectral power weighted by degree plus one. For more explanation of how the algebraic power spectrum is computed, see Appendix A and Feldman (2001).
There are several reasons for using algebraic complexity rather than Boolean
complexity as the measure in which to develop the null distribution. First, the actual
computation of Boolean complexity is heuristic, using a variety of minimization tricks
(because exact computation of it is computationally intractable); as a result it is difficult to
model analytically. Second, computation of Boolean complexity, even in its heuristic
approximation, is relatively intractable, which limits our ability to compute it to small
numbers of features. For example, the Boolean complexity of random set of objects
defined by five features would already be impractical to compute, whereas algebraic
complexity of concepts with five or six or more features are readily computable (as will be
seen below). Third, although the mathematics underlying algebraic complexity may seem
more difficult, it is better suited to modeling human inferences, in that it more directly
captures the main idea of building a more compact representation by extracting true
regularities from the observed data. Finally, algebraic complexity fits the human data
somewhat better than Boolean complexity (see Fig. 3).
The key points needed to understand what follows are these. Given a set of objects x = x_1, x_2, …, x_n, we can compute its complexity C(x), which is the length, measured in literals, of the shortest description of x. Low values of C mean very simple theories, e.g. C = 1 means the whole set of objects can be described by a single property. High values of C mean complex sets of observations that cannot be compressed. For example, with D features and n objects, one can list all the objects verbatim with a description of length Dn, so a complexity value this high would mean that the observations could not be compressed at all—a very unsatisfying theory.

Fig. 3. As algebraic complexity increases, human performance in a learning task steadily decreases; human learners prefer simple generalizations. See Feldman (2000b) for details on the source of these data.
Now, with the complexity measure in place, we turn to the main question: if we
generate object sets x at random, what will be the expected distribution of complexity p(C)?
3. The null distribution of complexity
As mentioned above, the mathematics required to explicitly model the distribution p(C)
is rather involved, but simply exhibiting and inspecting it is easy: all we need to do is to
generate many object sets at random, compute their complexities, and tabulate the results.
Here “at random” means that the binary features are all independent, with each taking each
of its possible values with probability 1/2—that is, with feature values decided by flips of a
fair coin. Of course, different assumptions about the nature of random concepts (e.g. different feature probabilities) would lead to different distributions.
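In code, the Monte Carlo tabulation described here is a few lines. The sketch below assumes an implementation of the complexity measure is available (the name algebraic_complexity is a hypothetical stand-in for the algorithm of Feldman, 2001, which is not reproduced here) and uses the fair-coin sampling scheme just described.

```python
import random
from collections import Counter
from itertools import product

def random_object_set(D):
    """Include each of the 2^D possible objects independently with probability 1/2,
    i.e. decide every feature pattern's membership by a fair coin flip."""
    return [obj for obj in product([0, 1], repeat=D) if random.random() < 0.5]

def null_distribution(D, complexity, n_samples=150_000):
    """Tabulate p(C) for randomly generated object sets over D Boolean features."""
    counts = Counter(complexity(random_object_set(D)) for _ in range(n_samples))
    total = sum(counts.values())
    return {c: k / total for c, k in sorted(counts.items())}

# `algebraic_complexity` is assumed, not shown:
# p_C = null_distribution(D=4, complexity=algebraic_complexity)
```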
Fig. 4 shows the results of such a simulation, tabulating the complexities of large samples of randomly generated concepts for several values of D (= the number of features). Actually for low values of D we do not need to estimate the distribution via this Monte Carlo technique; we can measure it precisely. Boolean concepts only come in a finite variety of "flavors" or basic types (see Feldman, 2003). For D = 2, for example, there are only three types: affirmation (a, C = 1); conjunction/disjunction (a ∨ b or a ∧ b, C = 2); and exclusive-or ((a ∧ ¬b) ∨ (¬a ∧ b), C = 4). All other two-feature concepts have essentially the same logical structure, and thus the same complexity, as one of these three types. This means, perhaps surprisingly, that for D = 2 complexity C can only take on the values 1, 2 or 4. The exact proportion of all concepts that fall into each of these three categories can be computed exactly, thus allowing the distribution p(C) itself to be computed exactly. This means that the somewhat "jaggy" distributions shown for D = 3 and 4 in the figure are not actually poor approximations, but are themselves the real distributions: the "truth" itself is jaggy. Unfortunately, this strategy cannot be pursued for higher values of D, where the taxonomy, albeit similar, becomes much more complicated (see Feldman, 2003); hence the use of Monte Carlo simulations to estimate the distribution in these cases.
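For small D the same tabulation can be carried out exactly rather than by sampling, simply by enumerating every possible object set; a sketch follows, again assuming the hypothetical algebraic_complexity function from the previous snippet.

```python
from collections import Counter
from itertools import product

def exact_null_distribution(D, complexity):
    """Enumerate every possible object set over D Boolean features (each of the
    2^D feature patterns is either in or out) and tabulate the complexities.
    Feasible only for small D: there are 2^(2^D) object sets."""
    objects = list(product([0, 1], repeat=D))
    counts = Counter()
    for membership in product([0, 1], repeat=len(objects)):
        x = [obj for obj, keep in zip(objects, membership) if keep]
        counts[complexity(x)] += 1
    total = sum(counts.values())
    return {c: k / total for c, k in sorted(counts.items())}

# For D = 3 this loops over 2^8 = 256 object sets, and for D = 4 over 65,536 --
# still exact and fast, but hopeless for D >= 5, where Monte Carlo takes over.
```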
Randomly generated object sets can have different numbers n of objects (ranging from 0 to 2^D); the curves in Fig. 4 aggregate complexities calculated from sets of all sizes. Instead, given a sample of n objects, one might like to evaluate it with respect to the expected complexity for sets of equal size. These distributions are again easy to tabulate, but not as easy to inspect as the aggregate curves, because for each D there are 2^D distinct curves, which tend to overlap. Fig. 5 shows the distributions for D = 4 and 5 broken down into various levels of n (not all levels of n are shown, but rather a sample drawn from n close to 2^(D−1)). Each of these individual curves gives the expected distribution of complexity for a set of n objects defined over D features. As can be seen in the figure, these curves are much "jaggier" and overlap each other substantially, making it hard to distinguish them visually. (As before, the jagginess is "real" and not the result of undersampling.) I show them anyway because these curves constitute arguably the most appropriate standard against which to judge the surprisingness of the complexity of a given observed sample of n objects. Conversely, when one wishes to consider the expected complexity more abstractly without reference to a specific object set, the aggregate curves make a convenient summary.
Fig. 4. The null distribution of complexity for D = 3, 4, 5 and 6. Each distribution tabulates the proportion of randomly generated concepts that have each given level of algebraic complexity. The curves for D = 3 and 4 are complete tabulations, not samples: the variety of basic distinct types in those cases is small enough that they can be counted exactly. The curves at D = 5 and D = 6 are each estimates based on tabulation of 150,000 random concepts.

Fig. 5. The null distributions of complexity for (a) D = 4 and (b) D = 5, broken down into various levels of n (the number of objects). For each D the figures show n's near 2^(D−1), i.e. near half the total objects.

3.1. Poisson and Gaussian approximations

The relatively smooth aggregate distributions (Fig. 4) resemble a Poisson distribution. A Poisson can be thought of as the expected number of "successes" in a large set of independent trials of a low-probability event (e.g. the total number of "heads" on a series of tosses of a very tails-heavy coin; see Wickens, 1982 for an introduction). Here the rare
event in question is roughly that a random set of observations will submit to a
representation that includes a particular single literal in its most compact description. This
observation is pursued more thoroughly in Appendix B, where the distribution is discussed
in greater detail.
A Poisson distribution asymptotically resembles a Gaussian (normal) with identical mean and variance, in which case the single parameter is usually called λ (again see Wickens, 1982). Hence Fig. 4 resembles an ensemble of normal distributions;[2] as D increases, both the mean μ_D and variance σ_D^2 of expected complexity increase. These curves are very close to normality for D ≥ 5: at D = 5 the correlation with a Gaussian of the same mean and variance is r = 0.9956, and at D = 6 it is r = 0.9976. Hence in each case the error in the normal approximation (1 − r^2) is less than 1%. Moreover, the means and variances of the distributions are almost perfectly correlated (r = 0.9994 in our sample of D = 2 through 6), supporting the claim that the distribution is approximately Poisson.

[2] The approximation by a normal distribution here cannot be completely correct, because a Gaussian has infinite tails in both directions, whereas complexity can never be negative. However, this is a numerically small error, as the probability assigned by the Gaussian model to negative complexities is vanishingly small, for example less than 0.000001 at D = 5 and decreasing with D. Nevertheless it should be kept in mind that the Gaussian model is only an approximation.
The approximately normal form of the aggregate distribution means that a typical
random concept defined over, say, four features will tend to have complexity of about 12
literals, with about 68% falling within one standard deviation, i.e. between 8.5 and 15.5.
Occasionally, such an object set will turn out simpler just by accident, but this will happen more rarely—the precise probability is given by the curve. Probabilities of ranges of complexity values—e.g. "simpler than C = 4"—can be computed by integrating the area under the curve, in the manner familiar from statistics books. In practice, because each curve (above D = 4) is approximately normal, this is easily accomplished by using a normal distribution look-up table found in any statistics book. This fact will be used below
to help create a quick-and-dirty recipe for computing the significance of a given observed
pattern.
The increase of μ_D (= σ_D^2 = λ) with D is approximately exponential (see Appendix B for some discussion of why this is so). Thus for D = 3 the distribution is centered at about C = 5 (s.d. about 2.2), for D = 4 at about C = 12 (s.d. about 3.6), for D = 5 about C = 28 (s.d. about 5); and for D = 6 about C = 64 (s.d. about 7.9); and so forth (Fig. 6). These values are well fit by a simple exponential growth model

μ_D ≈ α e^(βD).    (1)

The agreement between this model and the known means is very good (R^2 = 0.99978), with estimated parameters

α = 0.36287,    (2)
β = 0.86686.    (3)
This formula allows us to estimate the expected complexity for values of D greater than
those directly simulated here.
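For reference, Eq. (1) with the fitted parameters in Eqs. (2) and (3) reduces to a one-line computation; the sketch below reproduces the approximate means quoted above, together with standard deviations obtained from the Poisson property that the variance roughly equals the mean.

```python
import math

ALPHA, BETA = 0.36287, 0.86686   # fitted parameters from Eqs. (2) and (3)

def mu_D(D):
    """Expected complexity of a random concept over D Boolean features, Eq. (1)."""
    return ALPHA * math.exp(BETA * D)

def sigma_D(D):
    """Because the distribution is approximately Poisson, variance ~ mean."""
    return math.sqrt(mu_D(D))

for D in range(3, 7):
    print(D, round(mu_D(D), 2), round(sigma_D(D), 2))
# Model values: D=3 -> 4.89 (2.21), D=4 -> 11.63 (3.41), D=5 -> 27.68 (5.26),
# D=6 -> 65.89 (8.12); compare the simulated means and s.d.'s quoted in the text.
```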
As mentioned, a more acute comparison for the complexity of a given sample of n objects is actually given by the particular curve for that level of n (and D). Unfortunately, these curves cannot be approximated via anything as simple as a Gaussian or Poisson model. First, as mentioned, these distributions are inherently "jaggy." In part this is because these curves contain a substantial "periodic" or fluctuating component superimposed over their primarily unimodal shape. The period and phase of the fluctuation depend on n, so while these components smooth out when n's are combined, they are non-negligible for each individual n. An even more serious deviation from unimodality stems from the fact that some complexities can only occur for particular values of n; for example, only concepts with n = 2^(D−1) can exhibit C = 1 (see Feldman, 2003). For these reasons, when we wish to employ a quick-and-dirty approximation to the null distribution of complexity, we can refer only to the aggregate curves, although for a given set of n objects they admittedly make a less perfectly apposite standard.
Fig. 6. Plot showing the exponential rise in mean expected complexity C as a function of D.

4. A significance test for simplicity

Given a set of observations on D features with computed complexity C, one would like to be able to estimate how unlikely this level of complexity would have been if the observations were actually random. If the answer to this question is "very unlikely," then we can "reject the null hypothesis" and conclude that our theory[3] is pretty good.[4] If the answer is "moderately likely," we might conclude that the observations might have been random after all. Thus we have a quantitative tool to help answer the police detective's quandary: is my best theory of the crime good enough to take to the grand jury?

[3] Here "our theory" is just that the observations weren't generated randomly, i.e. the negation of the null hypothesis. Negating this doesn't support any specific affirmative theory of what did generate the data; see below for more discussion of this point.
[4] It is intriguing that some of the earliest foundational work on the mathematical theory of complexity, that of Martin-Löf (1966), focused on the tendency of random strings to pass arbitrary "significance tests" for regularity.
Ideally, we would like to conduct this significance testing using the true null distributions. In practice, the exact distribution may not be available, and one would like to substitute an easily computed approximation. As discussed above, for curves specific to each level of D and n, this is not generally possible. However, for the aggregate curves at each level of D, the Gaussian approximation affords a convenient and accurate approximation. As mentioned, for a given level of D the distribution of complexity is approximately Gaussian (normal). Hence an easy way to perform a significance test is via a z-test, using the mean and variance of the null distribution as estimated by Eq. (1) (recall that because the distribution is approximately Poisson, the mean and variance are approximately equal).
Step-by-step recipe. Putting this all together, here's how to conduct a quick-and-dirty significance test for simplicity, using the methods given in any introductory statistics book for conducting a z-test.

1. Compute the complexity C of your sample.[5]
2. Compute the mean μ_D of the distribution of complexity at the given number of features D, using Eqs. (1)–(3).
3. Compute the z-score of your sample's complexity C via

   z = (C − μ_D) / √μ_D.    (4)

4. Finally, evaluate the significance (one-tailed) of this z by looking at a table of z-scores in the back of any statistics book.

If the test is significant, we can reject the null hypothesis that the pattern is random.

[5] Unfortunately, there is no easy way to perform this step: see Feldman (2001) for all the gory details. An on-line version of the algorithm is under development.
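The recipe is compact enough to express directly in code. The sketch below follows Eqs. (1)–(4) and uses the standard normal CDF for the one-tailed probability; it assumes the complexity C has already been computed by some implementation of the algebraic complexity measure.

```python
import math

ALPHA, BETA = 0.36287, 0.86686   # Eqs. (2)-(3)

def significance_of_simplicity(C, D):
    """Quick-and-dirty significance test for simplicity.
    Returns (z, p): the z-score of the observed complexity C for D features,
    and the one-tailed probability of a complexity this low under the null."""
    mu = ALPHA * math.exp(BETA * D)                  # Eq. (1): expected complexity
    z = (C - mu) / math.sqrt(mu)                     # Eq. (4): Poisson, so variance ~ mean
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF, lower tail
    return z, p

z, p = significance_of_simplicity(C=4, D=4)   # the Bongard problem of Fig. 1a
print(round(z, 2), round(p, 4))               # about -2.24 and p ~ 0.013
```

Note the sign convention: the text reports the deviation below the mean as a positive number of standard deviations, whereas the signed z returned here is negative for unusually simple patterns.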
4.1. Examples
As an example, consider the simple Bongard problem from Fig. 1a. This was defined with four features and has eight objects, so we focus on the curve for D = 4, n = 8. First we compute the significance using the curve specific to this level of n, and then repeat the
test with the “quick-and-dirty” version that uses the Gaussian approximation to the
aggregate distribution.
The complexity score for the pattern was C = 4. Fig. 1 shows the area (shaded) under the D = 4, n = 8 complexity curve corresponding to patterns this simple or simpler: only 0.00901 of the curve. So our pattern is significant at the p = 0.009 level. Eureka!

In contrast, now consider the scrambled problem from Fig. 1b, whose complexity was C = 20. This is much larger than the mean of 12.8 for D = 4, n = 8 patterns, so we can immediately see that it is not simpler than a random concept. In fact its "significance" is 0.9538, meaning that it is more complex than over 95% of all comparable concepts. Definitely no Eureka.

The original, unscrambled Bongard problem (Fig. 1a) was pretty convincing. What would happen if one of the examples were simply deleted? Would one still believe the pattern with a bit less evidence? If so, how much less strongly? We can answer this question with another significance test. In the abridged problem, complexity C is now 7, rather than 4 as in the original problem—the pattern is no longer quite as "clean" as it was. This level of complexity is now only "marginally" significant at p = 0.0936. You might believe the pattern, but you couldn't publish it.
Now the quick-and-dirty z-test, which uses the expected distribution of complexity for D = 4 patterns generally. The predicted expected complexity at D = 4 is μ_4 = 0.36287 × exp(0.86686 × 4), or about 11.63 literals, and the standard deviation σ_4 is the square root of this, or about 3.41. (A look at Fig. 4 confirms that this is about right as the peak of the D = 4 curve.) So our measured complexity of C = 4 is about 2.24 (= (11.63 − 4)/√11.63) standard deviations less than the mean. Consulting a table of z-scores, we see that this z's deviation from zero is significant at p = 0.0125. This matches reasonably well with the true value of p = 0.009 we computed above by integrating under the non-aggregate curve, corroborating the accuracy of the quick-and-dirty test. By either test, we can feel confident that the Bongard problem has been solved; the pattern is too simple to be an accident. We are ready to take it to the grand jury!
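For comparison, the same quick-and-dirty computation can be run on all three versions of the problem at once. The numbers below come from the aggregate Gaussian approximation, so they differ slightly from the exact n-specific values quoted above.

```python
import math

ALPHA, BETA = 0.36287, 0.86686
mu = ALPHA * math.exp(BETA * 4)                     # D = 4
for C in (4, 7, 20):                                # original, abridged, scrambled
    z = (C - mu) / math.sqrt(mu)
    p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # one-tailed (lower) probability
    print(C, round(z, 2), round(p, 3))
# C=4  -> z ~ -2.24, p ~ 0.013   (clearly significant)
# C=7  -> z ~ -1.36, p ~ 0.087   (marginal, in the same range as the exact 0.0936)
# C=20 -> z ~ +2.46, p ~ 0.993   (not simpler than chance at all)
```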
5. Applications
Now let’s put a famous Boolean pattern to the test. The logic I’ll use is similar to
that used to solve a notorious problem in archaeology: the mystery of the supposedly
collinear patterns of paleolithic standing stones in Britain. These patterns, called ley
lines, had struck many as too collinear to be accidental, suggesting intentional planning
on the part of ancient Britons. But were they, perhaps, mere accidental alignments? To
answer this, you’d have to know just how many collinearities would be expected from
a random distribution of points in the plane—that is, the null distribution. Unfortunately
the statistics of spatial distributions, like the statistics of complexity, did not lend
themselves directly to any preexisting mathematical models. The problem was solved
by the statisticians Broadbent (1980) and Kendall and Kendall (1980), who provided
the necessary mathematics by inventing an essentially new field of statistics, now called
spatial statistics. The answer, by the way, is that the angles comprised in known ley
lines appeared to be well within the null distribution, and thus probably just
coincidences.
In a similar spirit, we ask here just how surprisingly simple a famous Boolean pattern is. We consider the pattern Only eat oysters during months containing the letter R.[6] We can code the 12 months using two Boolean features to represent contains an R and safe to eat oysters,[7] and three additional Boolean features as dummy variables to distinguish the months (note that there are 8 = 2^3 months that contain an R). The complexity of the resulting pattern is C = 8. At D = 5, the expected complexity is about μ_5 = 27.7, so our observed complexity of 8 is about 3.74 (= (27.7 − 8)/√27.7) standard deviations below the mean, which is highly significant (p = 0.000091). Unusually simple—and thus unusually memorable, as simplicity leads directly to ease of memorization (Feldman, 2000b). At p < 0.0001 this is a memorable pattern indeed—which is perhaps why it has been remembered by countless generations of seafood lovers.

[6] Or is it Never eat oysters during months containing the letter R? I can never remember. But either way, the complexity is the same, since the two versions differ only by a negation.
[7] Here I assume that the dictum is in fact correct, and that eating oysters during R months is in fact safe! The question is whether this pattern is too simple to appear to be an accident.
A more serious application concerns the evaluation of concepts widely used as target
concepts in experiments in the concept learning literature. For historical reasons, certain
specific concepts—particular combinations of Boolean features—have been used over and
over by different researchers in their experiments (mostly to allow easy comparisons of
data across methodologies, etc.). Much of the data has supported one broad class of
models, exemplar models, and the result is now that many contemporary researchers in
human concept learning feel that they provide the best overall account of human concept
learning (Nosofsky & Johansen, 2000; Palmeri & Nosofsky, 2001).
Recently, though, Smith and Minda (2000) have alleged that the impression of a good
fit between exemplar models and the data is actually an accident, substantially due to the
somewhat arbitrary range of concepts that have been frequently tested. Specifically, they
argue that many of the most-tested concepts are unusually complex and irregular. Such
concepts are exactly the type on which one would expect exemplar models to do well.
Exemplar models are based around the storage of particular examples rather than on the
extraction of common tendencies or regularities, as in prototype-based theories. When
faced with a complex or irregular concept, there is no point in trying to extract common
tendencies or regularities from it: such concepts—by definition—don’t have any common
tendencies or regularities. (From the point of view of complexity theory, that’s exactly
what makes them complex—the lack of any orderly structure that could form the basis of a
compression scheme.) With such concepts, exemplar models, which simply store
examples without trying to extract regularities, represent a more useful strategy. Thus
testing primarily random concepts would tend to artificially inflate exemplar models’ fit to
the data.
But is it really true that the concepts that have been widely tested are relatively complex
or irregular ones? This is impossible to say unless you can quantify complexity—which
the complexity measure C allows you to do. But furthermore, you also need to be able to quantify where the observed level of complexity stands in relation to the range of possible levels—that is, place the given complexity C in the distribution p(C). This of course is
exactly what we are trying to do in the current paper.
Let’s take two examples that have been at the forefront of the pro-exemplar argument
since its inception. Exemplar models were invented by Medin and Schaffer (1978), who
based their original argument around several experiments involving particular fixed
concepts, each defined by a particular combination of four Boolean features. The influence
of this paper was such that more recent papers have often used the same concepts (either
retesting them or modeling earlier data). Medin and Schaffer’s Exps. 2 and 3 concept, for
example, has been used by Medin, Altom, Edelson, and Freko (1982), Nosofsky et al.
(1994), and Pavel, Gluck, and Henkle (1988). This concept, which has D = 4, n = 5, has algebraic complexity C = 14. But how unusual is this level exactly? Consulting the (non-aggregate) distribution for D = 4, n = 5, we see that this is about the 85th complexity percentile (i.e. the area under the curve at or to the left of C = 14 is 0.85 of the total area). Thus this concept is more complex and irregular than more than three-quarters of all comparable concepts. Medin and Schaffer (1978)'s Exp. 4 concept (D = 4, n = 6), also front-and-center in the case for exemplar models, has complexity C = 12, which puts it in the 58th percentile. Another often-used concept was used by Medin, Altom, and Murphy (1984), and then re-used by McKinley and Nosofsky (1993) and Pavel et al. (1988). This one (at D = 4, n = 4) has complexity 8, which is at the 45th percentile, i.e. very close to
average for a random concept.
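The percentile calculation itself is just an empirical cumulative distribution over the tabulated null complexities; a minimal sketch, assuming a list of complexities simulated or enumerated for the relevant D and n as in Section 3 (the variable null_D4_n5 below is hypothetical):

```python
def complexity_percentile(C, null_complexities):
    """Fraction of the null distribution at or below complexity C, i.e. the
    area under the curve at or to the left of C."""
    at_or_below = sum(1 for c in null_complexities if c <= C)
    return at_or_below / len(null_complexities)

# null_D4_n5 would hold the complexities of all (or many sampled) random concepts
# with D = 4 and n = 5; the value reported in the text for C = 14 is about 0.85.
# complexity_percentile(14, null_D4_n5)
```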
Hence Smith and Minda (2000)’s worry was well-founded. These concepts, around
which much of the empirical case in favor of exemplar models is based, are drawn from
very random territory: average or even well above-average in complexity for randomly
generated concepts. Exemplar models may account well for how human learners handle
them, but this says little about how they might handle less complex concepts, where more
regular structure is available to be extracted. This is certainly not evidence against
exemplar models, but it does argue, as Smith and Minda (2000) suggested, that the
evidence is not as strong as has been claimed.
6. Trouble at the right tail?
One aspect of the shape of the null distribution is a bit paradoxical, or at least, puzzling:
it’s symmetrical. This means that just as “simple stories” are unlikely to occur by accident,
so are complex stories—indeed, high-complexity patterns are just as rare as low-
complexity patterns. But obviously simple stories are treated very differently from
complex ones psychologically. What’s up?
Just as a simple pattern can only happen by accident when many elements line up just
so, an extremely incompressible pattern can only happen when all the elements line up just
so as to break any possible incipient regular trend. Just as a very simple theory of a crime is
possible only when all the cues line up, a very complex theory is only necessary when all
the cues conflict—each, say, pointing to a different suspect—and this only happens by
accident a small proportion of the time.
So why do we as observers tend to focus on simple theories and not on complex ones?
Simple theories’ low probability from random sources cannot be the only reason we prefer
them, because that property is shared by complex theories. One answer is that simple
theories ought to be assigned higher prior probability than complex ones. This is
a common proposal in the Bayesian literature (often called an Occam factor; see Duda,
Hart, and Stork (2001) for an introduction). Indeed even human subjects in shape
classification experiments assign highest probability to the most symmetrical and regular
shapes (Feldman, 2000a).
Why assign simple theories higher priors? When criminals perpetrate crimes, the clues
generally tend to cluster into a simple theory, if only we had all the clues and could discern
the pattern. But there is no similar source for complex theories: generally, the evidence is
not created by an elaborate collusion designed to render it maximally complex. This is an
asymmetry between simple and complex theories. When a very simple pattern is seen, it
has low likelihood under the null hypothesis, and high likelihood under some high-prior-
probability simple model. But when a very complex pattern is seen, it too has low
likelihood under the null, but there isn't any high-prior-probability hypothesis that would explain it. Hence it is interpreted as a random pattern, albeit an unusually complex one.
The symmetry of the null complexity distribution is in stark contrast to the classical
theory of Kolmogorov complexity (Li & Vitányi, 1997), which entails that as D increases,
a larger and larger proportion of all patterns will have near-maximal complexity. In the
standard picture, as the size of the patterns under consideration increases, a higher and
higher fraction of the total number of patterns have complexity above any fixed threshold.
Inevitably, this means that the distribution of complexity does not tail off at all on the
righthand side, but rather climbs inexorably until it reaches an abrupt cliff at the maximum
possible value (approximately equal to the size of the patterns considered). Why is our
distribution pðCÞso different?
The key difference is that classical Kolmogorov complexity is a universal measure,
meaning that the complexity value assigned to a pattern is the lowest value available in any
description system or computer language. (This also entails that its actual value is
uncomputable; see Schöning & Pruim, 1998.) By contrast here we are dealing with a fixed
description language (the concept algebra); briefer descriptions in other languages don’t
count. With a universal measure, if a supposedly complex pattern were very rare, it could
ipso facto be referred to nearly uniquely, and hence described very compactly. For
example, at the extreme, the single most complex pattern could be uniquely identified by
the very brief phrase “the single most complex pattern”—and thus paradoxically have low
Kolmogorov complexity (cf. the “Berry paradox,” due to Bertrand Russell; Chaitin, 1995).
But a fixed language such as the concept algebra doesn’t necessarily allow such a
description, so its brevity doesn’t automatically confer low complexity. An extreme case
of this, mentioned above, is the parity function, which is unique, and thus has low
probability; it has a long description in the concept algebra notwithstanding its uniqueness
and consequent brief descriptions in other languages.
Another way of seeing the same point is that in the conventional picture, longer
descriptions automatically apply to more objects, because they allow more objects to be
uniquely identified; this yields the ever-rising right tail of the Kolmogorov null
distribution. But again this doesn’t apply to complexity as measured in any fixed
language, where uniqueness of reference doesn’t necessarily relate to brevity of
description. The result of all this is that very rare classes of patterns automatically have
low Kolmogorov complexity, but may well have high complexity in any fixed code, thus
allowing the vanishing right tail (without any paradox).
Real computing organisms, of course, have fixed description languages (or,
equivalently, must choose from a fixed and finite set of description languages), and
don’t have arbitrary access to alternative forms of representation; that’s what makes
complexity computable. Hence it’s worth considering that the Kolmogorov conception of
complexity, based on a universal code, misses an important part of the picture in subjective
(computable) complexity measures, embodied by the vanishing right tail.
7. Occam meets Bayes
All this raises the question: are human concept learners significance testers? The
question of whether even experimental psychologists ought to be significance testers has
quietly become controversial among statisticians in recent years (Dixon, 1993; Loftus,
1991). Conventional diagnostic statistics, many statisticians now argue, is in fact missing a
large part of the picture of inferential statistics—specifically, it explicitly considers the
probabilistic consequences of only one hypothesis, the null hypothesis, while disregarding
others, e.g. target hypotheses of interest to the scientists. A more complete picture,
arguably, is provided by Bayesian theory (Jaynes, 1983). Anderson (1991) has suggested
that human categorization might follow Bayesian principles (viz. optimal use of available
information), a proposal recently expanded and refined by Tenenbaum (1999).
But in order for the story advanced in this paper to bear Bayesian fruit, the null
distribution—complexity due to random processes—must be partnered with other
distributions—namely, of complexity due to regular processes. Technically, these
would be class-conditional likelihood functions giving probability as a function of
complexity for some number of regular classes or data sources. There are various ways of
constructing such distributions, depending on exactly how one defines “regular,” and as a
result these distributions do not have the uniquely well-defined status possessed by the
null. Understanding these distributions is of fundamental importance, because unlike the
null distribution developed above, they reflect the observer’s affirmative model of patterns
extant in the environment. I will not further develop this point in the current paper, but
simply put in a call for progress in this direction.
Hence the distribution of complexity under the null hypothesis is not the whole story:
we also need to look at the distribution of complexity under particular regular hypotheses.
The major point is that the subjective probability of categorical hypotheses is strongly
modulated by their complexity, at least in human minds, as attested by the behavioral data
(Feldman, 2000b). The null hypothesis is only one among many hypotheses that needs a
probability assigned to it, but it's a particularly important one—in Polya's (1954) phrase,
“the ever-present rival conjecture.” Hence the larger significance of the null distribution
developed in this paper is probably its role in a more complete theory of inference based on
both simplicity and Bayesian principles.
8. Intuitions of randomness
In considering the distribution of complexity in randomly generated patterns, it must be
remarked somewhere that subjective expectations about random processes are notoriously
inaccurate (see Falk & Konold, 1997). In the most famous illustration of this, Gilovich,
Vallone, and Tversky (1985) showed that subjects expect random runs of Bernoulli trials
(e.g. basketball shots or coin flips) to exhibit fewer long runs and more alternations than
they really do, the so-called hot-hand illusion.[8] Recently Griffiths and Tenenbaum (2001) have shown that human judgments of randomness can, in fact, be well-modeled if you make some assumptions about their subjective probabilistic expectations about patterns produced by regular processes, as mentioned above. This debate is not strictly relevant here, because we are talking about categories of objects in Boolean space, rather than sequential patterns per se: the math is not directly applicable. The point remains, however, that just because complexity C is distributed as some function p(C) doesn't necessarily entail that human intuitions will correctly recognize this. The distribution developed in the current paper rather plays the role played by the binomial distribution in the hot-hand literature: the objective distribution, albeit in this case of a subjective measure, complexity. It is impossible to begin an empirical investigation into human intuitions on this point, though, until the objective distribution is established, which is the goal of the current paper.

[8] Note though that some researchers (e.g. Kubovy & Gilden, 1991) have disputed their explanation for this phenomenon, while others (e.g. Kareev, 1992) have argued that human judgments are as reasonable as can be expected given short-term memory limitations.
9. Conclusion
Grand juries hate coincidences—and they are right to. Coincidences mean unsatisfying
complexities, unexplained anomalies, and awkward exceptions that mar an otherwise
elegant and coherent solution. When you’ve really solved the puzzle, all the clues ought to
fall into place in a simple, coherent story. And when the story is simple, intuition says it's unlikely to be an accident. The contribution of this paper is to quantify exactly how
unlikely, using a psychologically realistic measurement of simplicity.
One of the major advances in the theory of induction in the last few decades has been a
historical reconciliation of the ideas of simplicity and truth—the idea that the simplest
hypothesis can be shown, under various assumptions, to be the most likely to be correct.
Thus Rissanen (1978) has shown that under broad assumptions the least complex
(minimum description length or MDL) hypothesis has the highest Bayesian posterior, and
Chater (1996) that the simplest or most "Prägnant" interpretation of a visual scene is likely
the most veridical. The null distribution of complexity presented here takes a step towards
a similar reconciliation in the realm of concept learning.
Acknowledgements
I am grateful to Tom Griffiths, Josh Tenenbaum, and two anonymous reviewers for
helpful comments. Preparation of this manuscript was supported by NSF SBR-9875175.
Appendix A. Algebraic complexity
This appendix gives a precis of an algebraic theory of concepts, explained more fully by
Feldman (2001).
We begin with a set of observations or objects x = x_1, x_2, …, x_n defined over D features σ_1 … σ_D; each object is defined by a certain conjunction of D literals (positive or negative variable symbols). Our goal is to extract from these observations the set of regularities φ that occur in it, where by "regularity" we mean a lawful relation that all n objects satisfy.

We wish to consider regularities of the form "if object x satisfies [some proposition], then x will have some property σ"; such rules have the form of a "causal law." Also, we wish to distinguish regularities that involve small numbers of features from those that involve larger numbers of features, because the former entail simpler patterns, while the latter entail more complex ones. Thus we define each regularity φ_K as a formula of the form

σ_1 σ_2 … σ_K → σ_0,    (A1)

where we have arbitrarily renumbered the regularities in order to show that there are K of them on the left side of the implication and one of them on the right. The number K, giving the number of literals in the antecedent of the "law," is called the degree of the regularity. Thus σ (meaning, "the object has property σ") has degree K = 0 (there is no antecedent). A regularity of degree K = 1 has the form σ_1 → σ_2, meaning "if the object has property σ_1 then it has property σ_2." Again, if a regularity φ appears in the description of x it means that all n objects obey it.
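As a concrete but deliberately partial illustration, the sketch below enumerates the degree-0 and degree-1 regularities of this form that a given object set satisfies. It checks raw satisfaction only; constructing the minimal power series S(x) described next requires pruning redundant and vacuous regularities, which is not shown. All helper names are illustrative, not taken from the paper.

```python
from itertools import product

def literals(D):
    """All literals over D Boolean features, coded as (feature_index, required_value)."""
    return [(i, v) for i in range(D) for v in (0, 1)]

def satisfies(obj, lit):
    i, v = lit
    return obj[i] == v

def regularities_up_to_degree_one(objects, D):
    """Degree-0 regularities (a literal true of every object) and degree-1
    regularities (antecedent literal -> consequent literal, obeyed by every object)."""
    degree0 = [lit for lit in literals(D)
               if all(satisfies(o, lit) for o in objects)]
    degree1 = [(a, b) for a in literals(D) for b in literals(D)
               if a[0] != b[0]
               and all((not satisfies(o, a)) or satisfies(o, b) for o in objects)]
    return degree0, degree1

# Example: two objects over color (feature 0) and shape (feature 1), both red.
d0, d1 = regularities_up_to_degree_one([(1, 0), (1, 1)], D=2)
print(d0)        # [(0, 1)]: every object has feature 0 = 1 ("red"), a degree-0 regularity
print(len(d1))   # 4, including vacuous/derived implications that the minimal S(x) would prune
```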
Now consider the set Ŝ(x) of all regularities satisfied by a given observation set x. In its raw form, this representation of the objects is highly redundant, because some regularities are automatically entailed by others in the set. However, given certain notational conventions, there is a unique minimal set of regularities that contains only those that are essential in describing x, which is denoted S(x) and called the power series expansion of x. S(x) is in a sense the smallest set of regularities whose transitive closure is Ŝ(x), that is, that fully describes the original dataset.

The power series of x is so named because, like a Fourier or Taylor series, it decomposes the structure of x into components of differing degrees of complexity. All object sets can be fully expressed by regularities of maximal degree K = D − 1 (entailed by the so-called representation theorem), in the same way that all periodic signals may be expressed by sums of sine and cosine components at various frequencies (the Fourier decomposition), or an analytic function can be expressed by a weighted sum of its derivatives at various degrees (the Taylor series). A power series may or may not contain regularities of lower than maximal degree; it does so only if x contains some orderly structure.
The function $|\phi|_x(K)$ giving the number of regularities at each degree $K$ contained in $S(x)$ is called the power spectrum of $x$ (see Fig. 2). This function, giving the number of regularities of degree $K$ that are necessary in a minimal description of $x$, provides a very useful summary of $x$’s regularity content. Because regularities at degree $K = 0$ are the very simplest in form, a power series that contains many of them (i.e. has much power at $K = 0$) has a relatively large amount of its structure explained by extremely simple rules.
Similarly, series with power at $K = 1$ have structure that can be accounted for by implicational rules; and so forth for higher degrees. If $S(x)$ contains power primarily at higher degrees it means that $x$ is intrinsically complex in form; its structure can only be explained by positing relatively complex rules.
A useful numeric measure of the overall complexity of $x$ is its total spectral power weighted by degree plus one $(K + 1)$,

$$C(x) = \sum_{K=0}^{D-1} (K + 1)\,|\phi|_x(K), \qquad (A2)$$

which gives the average amount of spectral power contained in $x$’s spectrum. This is the measure of complexity used in the current paper.
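As a concrete reading of Eq. (A2), the following sketch (mine; it assumes the power spectrum has already been computed and is supplied as a mapping from degree to count, and the function name is illustrative) accumulates the weighted spectral power:

```python
# Minimal sketch of Eq. (A2): computing the minimal power series S(x)
# itself is not shown here, only the weighted sum over its spectrum.

def complexity(spectrum: dict) -> int:
    """C(x) = sum over K of (K + 1) * |phi|_x(K): each degree-K regularity
    contributes K + 1 literals to the minimal representation."""
    return sum((K + 1) * count for K, count in spectrum.items())

# Example: two degree-0 regularities and one degree-1 regularity
print(complexity({0: 2, 1: 1}))  # 2*1 + 1*2 = 4 literals
```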
$C(x)$ has a very direct interpretation as the total length in literals of the minimal representation $S(x)$, because each regularity of degree $K$ contributes exactly $K + 1$ literals to the total expression. As mentioned in the text, on a wide range of concepts tested by Feldman (2000b), this number $C(x)$ gives very good quantitative prediction of human subjects’ performance in a learning task (Fig. 3), accounting for more than half the variance in performance (Feldman, 2001), well over twice as much as competing learning models (e.g. ALCOVE, Kruschke, 1992). This result inspires the main premise of the current paper, that $C(x)$ makes a good model of subjective complexity.
Finally, note that a very useful consequence of partitioning the structure of $x$ into bins by degree is that it allows one to truncate the series, that is, to represent $x$ using only terms below some fixed degree, discarding higher-order terms. When higher-order terms are truncated, in general the observations $x$ can no longer be represented precisely; the representation has been “simplified,” bringing benefits for generalization. (Roughly, higher-order terms are more liable to represent “noise” or accidental structure in $x$, so discarding them avoids overfitting.) For example, truncating at $K \le 1$ (the so-called linear terms) amounts to representing the observations only in terms of their constant and implicational structure, and ignoring any regularities that require three or more symbols to express. There is good evidence (again see Feldman, 2001) that human learners adopt something like this linear truncation, termed a “bias towards linearity.”
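A sketch of this truncation step, under the same assumed spectrum representation as above (again, names are illustrative rather than the paper’s):

```python
# Keep only spectrum terms of degree K <= k_max; with k_max = 1 this retains
# the constant and implicational ("linear") structure only.

def truncate_spectrum(spectrum: dict, k_max: int = 1) -> dict:
    """Discard higher-order regularities, which are more liable to reflect
    accidental structure ("noise") in the observations."""
    return {K: count for K, count in spectrum.items() if K <= k_max}

spectrum = {0: 2, 1: 1, 2: 3}
truncated = truncate_spectrum(spectrum)
print(truncated)                                       # {0: 2, 1: 1}
print(sum((K + 1) * c for K, c in truncated.items()))  # 4 literals, down from 13
```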
Appendix B. Elements of the null distribution of complexity
This appendix discusses some internal elements of the distribution $p(C)$. For reasons
discussed below, the model given is approximate rather than analytic. However, several
aspects of the internal workings of the distribution are interesting, and moreover shed light
on why the distribution is approximately Poisson; thus I delve more deeply into the
distribution here for the benefit of interested readers.
We seek to model the distribution $p(C[x])$ of the parameter $C$ when the object set $x = x_1, x_2, \ldots, x_n$, defined over $D$ Boolean features, is chosen at random. By “random” we mean that each of the $2^D$ possible objects is included in $x$ with probability 1/2. This means that the number $n = |x|$ of objects will itself be distributed binomially with mean $2^D/2 = 2^{D-1}$. As explained in Appendix A and Feldman (2001), complexity $C$ is the total number of literals in the minimal power series $S(x)$.
The situation can be broken down by degree $K$. For each value of $K$, there are a certain total number $N_{D,K}$ of regularities. To visualize how these can be counted, note that each regularity is defined by deleting (prohibiting) one cell from a column, side, or “hyper-side” (depending on $K$) of the Boolean $D$-cube. (The regularity asserts that observed objects do not fall in this cell.) For example, with $D = 3$, a $K = 1$ regularity prohibits one of the four corners of one two-dimensional side of the three-dimensional Boolean cube. (e.g. the $K = 1$ regularity $\sigma_1 \rightarrow \sigma_2$ inhabits the $\langle\sigma_1, \sigma_2\rangle$ plane of the 3-cube, and prohibits the single cell $\sigma_1\bar{\sigma}_2$.) In general, a hyperside is defined by choosing $K + 1$ of the $D$ features ($K$ for the features on the left side of Eq. (A1), one for the single feature on the right side). Given a hyperside, a regularity is defined by selecting one of its $2^{K+1}$ vertices to be prohibited. Hence we see that the total number of regularities at a fixed level of $D$ and $K$ is given by

$$N_{D,K} = \binom{D}{K+1}\, 2^{K+1}, \qquad (B1)$$

where the notation $\binom{D}{K+1}$ denotes the binomial coefficient $D!/[(D - K - 1)!\,(K + 1)!]$, which gives the number of ways $D$ objects can be chosen $K + 1$ at a time.
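Eq. (B1) can be computed directly; the sketch below (the function name is mine) counts regularities by choosing a hyperside and then a prohibited vertex:

```python
# Direct computation of Eq. (B1): the number of degree-K regularities
# over D Boolean features.
from math import comb

def num_regularities(D: int, K: int) -> int:
    """Choose K + 1 of the D features to form a hyperside, then one of its
    2^(K+1) vertices to prohibit."""
    return comb(D, K + 1) * 2 ** (K + 1)

print(num_regularities(3, 1))  # 12 = 3 feature pairs x 4 corners each
```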
Each regularity has a certain probability $p_{D,K,n}$ of randomly occurring in (being satisfied by) a randomly selected object set of size $n = |x|$; this probability is given by

$$p_{D,K,n} = \prod_{i=0}^{n-1} \frac{2^D - 2^{D-K-1} - i}{2^D - i}, \qquad (B2)$$

which after algebraic manipulation is equivalent to the closed-form expression

$$p_{D,K,n} = \frac{(2^D - 2^{D-K-1})!\,(2^D - n)!}{(2^D - 2^{D-K-1} - n)!\,(2^D)!}.$$

(To see where this formula comes from, consider that the product in Eq. (B2) multiplies together a series of $n$ fractions, the numerators of which are the $n$ integers running downwards from $2^D - 2^{D-K-1}$, and the denominators of which are the $n$ integers running downwards from $2^D$.)
The complex form of Eq. (B2) is due to the fact that we are sampling objects without replacement; that is, once we have evaluated a single object with respect to the regularity, we move on to consider the remaining objects, which are reduced by one in number (this reduction is indexed by $i$ in Eq. (B2)). At each step, we ask whether the observed object falls within the region that satisfies the regularity. Of the $2^D - i$ objects remaining, $2^{D-K-1}$ are prohibited by the regularity, and the rest are allowed, which yields the ratio given in the formula.
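The following sketch (mine) evaluates Eq. (B2) exactly with rational arithmetic; the prohibited-cell count $2^{D-K-1}$ follows from the hyperside construction above:

```python
# Eq. (B2) as a product over sampled objects (sampling without replacement):
# of the 2^D cells, 2^(D-K-1) are prohibited by a degree-K regularity.
from fractions import Fraction

def p_regularity(D: int, K: int, n: int) -> Fraction:
    """Probability that a random n-object set satisfies one fixed
    degree-K regularity over D Boolean features."""
    total = 2 ** D
    allowed = total - 2 ** (D - K - 1)
    p = Fraction(1)
    for i in range(n):
        p *= Fraction(allowed - i, total - i)
    return p

print(p_regularity(4, 0, 8))         # 1/12870, matching the value quoted later
print(float(p_regularity(4, 1, 8)))  # ~0.0385
```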
The number of regularities of degree $K$ in the minimal power series of a random object set is just the number of successes among $N_{D,K}$ approximately independent Bernoulli trials with success probability $p_{D,K,n}$ and failure probability $1 - p_{D,K,n}$, and thus will follow a binomial distribution. Each regularity of degree $K$ that occurs in the final representation (minimal power series) contributes $K + 1$ literals to the total complexity. Hence the complexity distribution we seek is that of the random variable formed by the sum of a number of independent binomially distributed random variables. By the central limit theorem, such a distribution will tend to normality as the number of contributing distributions grows large. This number grows exponentially with $D$. (At each fixed value of $D$, there is one contributing distribution per value of $K$ and $n$. There are $D$ possible values of $K$, and $2^D$ possible values of $n$, for a total of $D2^D$ distributions all together.) Hence we can expect the distribution to approach normality rapidly as $D$ increases, exactly as can be seen from Fig. 4, in which the plotted curves for $D = 5$ and 6 deviate from normality by (respectively) less than one part in 200 and less than one part in 400.
In practice this approach to calculating the null distribution has several drawbacks.
First, it is a very complicated summation, because as mentioned the number of binomial
distributions we are summing up grows exponentially with $D$. Second, there are several
imperfect assumptions in the above that are difficult to correct for. One is that different
regularities (our Bernoulli trials) are not actually all independent. Second, and more
importantly, many regularities that occur by chance will not appear in the minimal power
series, because they are logically entailed by other regularities of smaller $K$. Thus the
above substantially overestimates the size of the actual power series. Hence the above
description is definitely not exactly correct.
Fortunately, there is a better way of getting a good approximation for $p(C)$. The probability $p_{D,K,n}$ here is usually small: any one regularity has little chance of being satisfied by a randomly chosen object set (for example with $D = 4$, $n = 8$, and $K = 0$ and 1, $p_{D,K,n}$ will be respectively 1/12,870 and 0.0385). In this situation the distribution will tend to be Poisson (see Wickens, 1982): that is, approximately normal with equal mean and variance.^9 Again, this is visibly confirmed by the pattern of distributions in Fig. 4, in which the variance of the distributions scales up almost exactly linearly with the mean of the distribution (again, $r = 0.9994$ in our sample). This single parameter is usually called $\lambda$ in the context of a Poisson distribution, which has general form

$$p(X) = \frac{e^{-\lambda}\lambda^X}{X!}. \qquad (B3)$$
Hence our task is reduced to estimating this single parameter $\lambda = \mu_D = \sigma^2_D$, which can be thought of as the expected (mean) number of literals in a typical object set containing a typical number $n$ of objects at a typical degree $K$. Because we are assuming random objects generated with each binary feature having probability 1/2, the expected number of objects is $2^{D-1}$ (that is, half the maximum of $2^D$); this is the “typical” value of $n$. Similarly, the “expected” value of $K$ grows with $D$ (recall that $K$ ranges from 0 to $D - 1$). Thus the total number $N_{D,K}$ of regularities (Eq. (B1)), which is exponential in $K$, is also exponential in $D$. Hence we have good reason to expect the expected mean $\mu_D$ (which is proportional to $N_{D,K}$) to grow exponentially with $D$, as in Eq. (1), with exponential growth parameters $\alpha$ and $\beta$. It then remains only to estimate values of $\alpha$ and $\beta$ from the known values of $\mu_D$ (2.00, 4.91, 12.11, 28.01, 63.87 for $D = 2$, 3, 4, 5, 6 respectively, plotted in Fig. 6), as is done in the text. This method has the advantage of avoiding all the modeling subtleties mentioned above, as these numbers reflect the true values of $C$, and hence $\mu_D$, after all relevant subtleties have been taken into account.

9 Even without considering the Poisson approximation, the approximate equality of the mean and variance follows from the fact that the probability of success $p_{D,K,n}$ is close to zero. The number of successes in $N$ trials of probability $p$ is approximately normal with mean $Np$ and variance $Np(1 - p)$. When $p$ is approximately zero, $1 - p$ is approximately 1, and $Np$ is approximately the same as $Np(1 - p)$.
The predicted value of $\lambda$ given by Eq. (1) can now be used to execute the quick-and-dirty z-test for pattern randomness (Eq. (4)), using $\mu = \lambda$ and $\sigma = \sqrt{\mu} = \sqrt{\lambda}$ as parameters, as illustrated in the text.
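As a rough illustration of this estimation-plus-test procedure, the sketch below (my own; it assumes an exponential form $\mu_D \approx a\,e^{bD}$, which may differ in parameterization from Eq. (1) in the main text, so $a$ and $b$ are illustrative estimates only) fits the tabulated means and applies the z-test:

```python
# Fit an exponential trend to the tabulated mean complexities mu_D, then run
# the quick-and-dirty z-test z = (C - lambda) / sqrt(lambda).
import math

D_vals = [2, 3, 4, 5, 6]
mu_vals = [2.00, 4.91, 12.11, 28.01, 63.87]  # empirical means (plotted in Fig. 6)

# Least-squares fit of log(mu_D) = log(a) + b * D
n = len(D_vals)
mean_D = sum(D_vals) / n
mean_log = sum(math.log(m) for m in mu_vals) / n
num = sum(d * math.log(m) for d, m in zip(D_vals, mu_vals)) - n * mean_D * mean_log
den = sum(d * d for d in D_vals) - n * mean_D ** 2
b = num / den
a = math.exp(mean_log - b * mean_D)

def z_score(C_observed: float, D: int) -> float:
    """Quick-and-dirty z-test for pattern randomness: negative values mean
    the observed complexity is lower (simpler) than expected by chance."""
    lam = a * math.exp(b * D)        # predicted mean (and variance) of C
    return (C_observed - lam) / math.sqrt(lam)

print(round(a * math.exp(b * 4), 2))  # ~11.6, near the tabulated mu_4 = 12.11
print(round(z_score(4.0, 4), 2))      # ~-2.24: a D = 4 set with C = 4 is simpler than chance
```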
References
Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review,98(3), 409– 429.
Barlow, H. B. (1974). Inductive inference, coding, perception, and language. Perception, 3, 123–134.
Bongard, M. (1970). Pattern recognition. New York: Spartan Books.
Broadbent, S. (1980). Simulating the ley hunter. Journal of the Royal Statistical Society A, 143(2), 109–140.
Chaitin, G. J. (1995). The Berry paradox. Complexity, 1(1), 26–30.
Chater, N. (1996). Reconciling simplicity and likelihood principles in perceptual organization. Psychological
Review,103(3), 566– 581.
Dixon, P. (1993). Why scientists value p values. Psychonomic Bulletin and Review, 5(3), 390–396.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. New York: Wiley.
Falk, R., & Konold, C. (1997). Making sense of randomness: Implicit encoding as a basis for judgment.
Psychological Review,104(2), 301– 318.
Feldman, J. (2000a). Bias toward regular form in mental shape spaces. Journal of Experimental Psychology:
Human Perception and Performance,26(1), 1– 14.
Feldman, J. (2000b). Minimization of Boolean complexity in human concept learning. Nature, 407, 630–633.
Feldman, J. (2001). An algebra of human concept learning. Under review.
Feldman, J. (2003). A catalog of Boolean concepts. Journal of Mathematical Psychology, 47(1), 98–112.
Feldman, J. (2003). The simplicity principle in human concept learning. Current Directions in Psychological Science, 12(6), 227–232.
Gilovich, T., Vallone, R., & Tversky, A. (1985). The hot hand in basketball: On the misperception of random
sequences. Cognitive Psychology,17(3), 295– 314.
Givone, D. D. (1970). Introduction to switching circuit theory. New York: McGraw Hill.
Griffiths, T. L., & Tenenbaum, J. B. (2001). Randomness and coincidences: Reconciling intuition and probability
theory. Proceedings of the 23rd Annual Conference of the Cognitive Science Society, 370– 375.
Hochberg, J., & McAlister, E. (1953). A quantitative approach to figural “goodness”. Journal of Experimental
Psychology,46, 361– 364.
Jaynes, E. T. (1983). Confidence intervals vs Bayesian intervals. In R. D. Rosenkrantz (Ed.), E. T. Jaynes: Papers on probability, statistics and statistical physics (pp. 757–804). Dordrecht: Reidel.
Kanizsa, G. (1979). Organization in vision: Essays on Gestalt perception. New York: Praeger.
Kareev, Y. (1992). Not that bad after all: Generation of random sequences. Journal of Experimental Psychology:
Human Perception and Performance,18(4), 1189– 1194.
Kendall, D. G., & Kendall, W. S. (1980). Alignments in two-dimensional random sets of points. Advances in
Applied Probability,12, 380– 424.
Kruschke, J. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological
Review,99(1), 22–44.
Kubovy, M., & Gilden, D. (1991). Apparent randomness is not always the complement of apparent order. In
G. Lockhead, & J. Pomerantz (Eds.), The perception of structure: Essays in honor of Wendell Garner.
Washington, DC: MIT Press.
Li, M., & Vitányi, P. (1997). An introduction to Kolmogorov complexity and its applications. New York: Springer.
Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology,
36(2), 102– 105.
Martin-Löf, P. (1966). The definition of random sequences. Information and Control, 9, 602–619.
McKinley, S. C., & Nosofsky, R. M. (1993). Attention learning in models of classification. (Cited in Nosofsky, Palmeri, and McKinley, 1994.)
Medin, D. L., Altom, M. W., Edelson, S. M., & Freko, D. (1982). Correlated symptoms and simulated medical
classification. Journal of Experimental Psychology: Learning, Memory, and Cognition,8, 37– 50.
Medin, D. L., Altom, M. W., & Murphy, T. D. (1984). Given versus induced category representations: Use of
prototype and exemplar information in classification. Journal of Experimental Psychology: Learning,
Memory, and Cognition,10(3), 333– 352.
Medin, D. L., & Schaffer, M. M. (1978). Context model of classification learning. Psychological Review,85,
207– 238.
Neisser, U., & Weene, P. (1962). Hierarchies in concept attainment. Journal of Experimental Psychology,64(6),
640– 645.
Nosofsky, R. M., & Johansen, M. K. (2000). Exemplar-based accounts of “multiple-system” phenomena in
perceptual categorization. Psychonomic Bulletin and Review,7(3), 375– 402.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning.
Psychological Review,101(1), 53– 79.
Palmeri, T. J., & Nosofsky, R. M. (2001). Central tendencies, extreme points, and prototype enhancement effects
in ill-defined perceptual categorization. The Quarterly Journal of Experimental Psychology,54A(1),
197– 235.
Pavel, M., Gluck, M. A., & Henkle, V. (1988). Generalization by humans and multi-layer networks. Proceedings
of the 10th Annual Conference of the Cognitive Science Society.
Polya, G. (1954). Mathematics and plausible reasoning. Princeton, NJ: Princeton University Press.
Quine, W. (1965). On simple theories of a complex world. In M. H. Foster, & M. L. Martin (Eds.), Probability,
confirmation, and simplicity: Readings in the philosophy of inductive logic. New York: Odyssey.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.
Schöning, U., & Pruim, R. (1998). Gems of theoretical computer science. Berlin: Springer.
Shepard, R., Hovland, C. L., & Jenkins, H. M. (1961). Learning and memorization of classifications.
Psychological Monographs: General and Applied,75(13), 1– 42.
Smith, J. D., & Minda, J. P. (2000). Thirty categorization results in search of a model. Journal of Experimental
Psychology: Learning Memory and Cognition,26(1), 3– 27.
Sober, E. (1975). Simplicity. London: Oxford University Press.
Tenenbaum, J. (1999). A Bayesian framework for concept learning. Unpublished doctoral dissertation, Massachusetts Institute of Technology.
Wegener, I. (1987). The complexity of Boolean functions. Chichester: Wiley.
Wickens, T. D. (1982). Models for behavior: Stochastic processes in psychology. San Francisco: Freeman.