GPT-3: Waterloo or Rubicon?
Here be Dragons
A Working Paper • Version 4.1 • May 7, 2022
William L. Benzon
GPT-3 is a significant achievement.
But I fear the community that has created it may, like other
communities have done before (machine translation in the mid-1960s,
symbolic computing in the mid-1980s), triumphantly walk
over the edge of a cliff and find itself standing proudly in mid-air.
This is not necessary and certainly not inevitable.
A great deal has been written about GPTs and transformers more generally, both in
the technical literature and in commentary of various levels of sophistication. I have
read only a small portion of this. But nothing I have read indicates any interest in the
nature of language or mind. Interest seems relegated to the GPT engine itself. And yet
the product of that engine, a language model, is opaque. I believe that, if we are to
move to a level of accomplishment beyond what has been exhibited to date, we must
understand what that engine is doing so that we may gain control over it. We must
think about the nature of language and of the mind.
That is what this working paper sets out to achieve: a beginning point, and only that.
By attending to ideas from Graham Neubig, Julian Michael, and Sydney Lamb, and by
extending them through the geometric semantics of Peter Gärdenfors, we can create a
framework in which to understand language and mind, a framework that is
commensurate with the operations of GPT-3. That framework can help us to
understand what GPT-3 is doing when it constructs a language model, and thereby to
gain control over that model so we can enhance and extend it.
It is in that speculative spirit that I offer the following remarks.
Bill Benzon, August 20, 2020
About the cover image: I created the background pattern in December, 1985, using MacPaint
running on a 1984 128K Apple Macintosh. I used a current version of Adobe’s Photoshop to overlay the
aleph.
GPT-3: Waterloo or Rubicon? Here be Dragons
William Benzon
Version 4.1, May 7, 2022
Abstract: GPT-3 is an AI engine that generates text in response to a prompt
given to it by a human user. It does not understand the language that it produces,
at least not as philosophers understand such things. And yet its output is in many
cases astonishingly like human language. How is this possible? Think of the mind
as a high-dimensional space of signifieds, that is, meaning-bearing elements.
Correlatively, text consists of one-dimensional strings of signifiers, that is, linguistic
forms. GPT-3 creates a language model by examining the distances and ordering
of signifiers in a collection of text strings and computing over them so as to reverse
engineer the trajectories texts take through that space. Peter Gärdenfors’ semantic
geometry provides a way of thinking about the dimensionality of mental space
and the multiplicity of phenomena in the world, about how mind mirrors the
world. Yet artificial systems are limited by the fact that they do not have a
sensorimotor system that has evolved over millions of years. They do have
inherent limits.
Contents
0. Starting point and preview ........................................................................................................... 1
1. Computers are strange beasts ....................................................................................................... 4
2. No meaning, no how: GPT-3 as Rubicon and Waterloo, a personal view .................................. 7
3. The brain, the mind, and GPT-3: An “isometric transform” onto meaning space ................... 15
4. Why is simple arithmetic difficult for deep learning systems? .................................................... 20
5. Metaphysics: The dimensionality of mind and world ................................................................ 22
6. Gestalt switch: GPT-3 as a model of the mind ........................................................................... 28
7. Engineered intelligence at liberty in the world ........................................................................... 30
Appendix: Semanticity, adhesion and relationality ........................................................................ 34
1301 Washington St., Apt. 311
Hoboken, NJ 07030
646.599.3232
bbenzon@mindspring.com
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
0. Starting point and preview
GPT-3 is based on distributional semantics. Warren Weaver had the basic idea in his 1949
memorandum, “Translation” (p. 7). Gerard Salton operationalized the idea in his work using
vector semantics for document retrieval in the 1960s and 1970s (p. 8). Since then distributional
semantics has developed as an empirical discipline. The last decade of work in NLP has seen
remarkable, even astonishing, progress. And yet we lack a robust theoretical framework in which
we can understand and explain that progress. Such a framework must also indicate the inherent
limitations of distributional semantics. This document is a first attempt to outline such a
framework; as such, its various formulations must be seen as speculative and provisional. I offer
them so that others may modify them, replace them, and move beyond them.
It started with a comment at a blog
On July 19, 2020, Tyler Cowen made a post to Marginal Revolution entitled “GPT-3, etc.”1 It
consisted of an email from a reader who asserted, “When future AI textbooks are written, I could
easily imagine them citing 2020 or 2021 as years when preliminary AGI first emerged. This is
very different than my own previous personal forecasts for AGI emerging in something like 20-50
years…” While I have my doubts about the concept of AGI (it’s too ill-defined to serve as anything other than a hook on which to hang dreams, anxieties, and fears), I think GPT-3 is
worth serious consideration.
Cowen’s post has attracted 52 comments so far, more than a few of acceptable or even high
quality. I made a long comment to that post. I then decided to expand that comment into a series
of blog posts, say three or four, and then to collect them into a single document as a working
paper. When it appeared that those three or four posts would grow to five or six I decided that I
would issue two working papers. This first one would concentrate on GPT-3 and the nature of
artificial intelligence, or whatever it is. The later ones would speculate about the future and take a
quick tour of the past.
Here is a slightly revised version of the comment I made at Marginal Revolution. This paper
covers the shaded material. The rest may be covered in later papers.2
Yes, GPT-3 [may] be a game changer. But to get there from here we need to rethink a
lot of things. And where that's going (that is, where I think it best should go) is more
than I can do in a comment.
Right now, we're doing it wrong, headed in the wrong direction. AGI, a really good
one, isn't going to be what we're imagining it to be, e.g. the Star Trek computer.
Think AI as platform, not feature (Andreessen).3 Obvious implication: the basic
computer will be an AI-as-platform. Every human will get their own as a very young
child. They’ll grow with it; it’ll grow with them. The child will care for it as with a pet.
Hence we have ethical obligations to them. As the child grows, so does the pet; the pet
will likely have to migrate to other physical platforms from time to time.
1
Tyler Cowen, GPT-3, Marginal Revolution, blog post, July 19, 2020,
https://marginalrevolution.com/marginalrevolution/2020/07/gpt-3-etc.html.
2
None of them have been written as of April 26, 2022.
3
Is AI a feature or a platform? [machine learning, artificial neural nets], New Savanna, blog post,
December 13, 2019, https://new-savanna.blogspot.com/2019/12/is-ai-feature-or-platfrom-machine.html.
Machine learning was the key breakthrough. Rodney Brooks’ Genghis, with its
subsumption architecture, was a key development as well, for it was directed at robots
moving about in the world. FWIW Brooks has teamed up with Gary Marcus and they
think we need to add some old school symbolic computing into the mix. I think they’re
right.
Machines, however, have a hard time learning the natural world as humans do. We're
born primed to deal with that world with millions of years of evolutionary history
behind us. Machines, alas, are a blank slate.
The native environment for computers is, of course, the computational environment.
That's where to apply machine learning. Note that writing code is one of GPT-3's skills.
So, the AGI of the future, let's call it GPT-42, will be looking in two directions, toward
the world of computers and toward the human world. It will be learning in both, but in
different styles and to different ends. In its interaction with other artificial computational
entities GPT-42 is in its native milieu. In its interaction with us, well, we'll necessarily be
in the driver’s seat.
Where are we with respect to the hockey stick growth curve? For the last three-quarters of
a century, since the end of WWII, we've been moving horizontally, along a plateau,
developing tech. GPT-3 is one signal that we've reached the toe of the next curve. But
to move up the curve, as I’ve said, we have to rethink the whole shebang.
We're IN the Singularity. Here be dragons.
[Superintelligent computers emerging out of the FOOM is bullshit.]
* * * * *
ADDENDUM: A friend of mine, David Porush, has reminded me that Neal
Stephenson has written of such a tutor in The Diamond Age: Or, A Young Lady's Illustrated
Primer (1995).4 I then remembered that I have played the role of such a tutor in real life,
The Freedoniad: A Tale of Epic Adventure in which Two BFFs Travel the Universe
and End up in Dunkirk, New York.5
* * * * *
Here’s what is in the rest of this paper:
1. Computers are strange beasts. They’re obviously inanimate, and yet we communicate
with them through language. They don’t fit pre-existing (19th century?) conceptual categories, and
so we are prone to strange views about them.
4
The Diamond Age, Wikipedia: https://en.wikipedia.org/wiki/The_Diamond_Age.
5
The Freedoniad: A Tale of Epic Adventure in which Two BFFs Travel the Universe and End up in
Dunkirk, New York, New Savanna, blog post, February 12, 2019, http://new-
savanna.blogspot.com/2014/10/the-freedoniad-tale-of-epic-adventure.html.
2. No meaning, no how: GPT-3 as Rubicon and Waterloo, a personal view. Arguing
from first principles it is clear that GPT-3 lacks understanding and access to meaning. And yet it
produces very convincing simulacra of understanding. But common sense understanding remains
elusive, as it did for old school symbolic processing. Much of common sense is deeply embedded
in the physical world. GPT-3, as it currently functions, is, in effect, an artificial brain in a vat.
3. The brain, the mind, and GPT-3: Dimensions and conceptual spaces. GPT-3
creates a language model by examining the distances and ordering of signifiers in a collection of
text strings and computing over them so as to reverse engineer the trajectories texts take through
a high-dimensional mental space of signifieds. Peter Gärdenfors’ semantic geometry provides a way
of thinking about the dimensionality of mental space and the multiplicity of phenomena in the
world.
4. Why is simple arithmetic difficult for deep learning systems? This difficulty
suggests that such systems are unable to distinguish between episodic and semantic memory, a distinction
introduced in the previous section.
5. Metaphysics: The dimensionality of mind and world. I suggest that large-scale
language models, such as GPT-3, give evidence of what I am calling the metaphysical structure of
the world, which is a function of how human minds categorize things.
6. Gestalt switch: GPT-3 as a model of the mind. GPT-3 creates: 1) a model of a body of
natural language texts, and only a model. 2) Those texts are the product of human minds. 3)
Through the application of 2 to 1 we may conclude that GPT-3 is also a model of the mind, albeit
a very limited one. Point 3 requires a Gestalt switch.
7. Engineered intelligence at liberty in the world. The “intelligence” in systems such as
GPT-3 is static and reactive. To liberate and mobilize it we need to endow AI systems with
mental models of the kind investigated in “old school” symbolic AI.
Appendix: Semanticity, adhesion and relationality: I pick up from the passage in section
7 where I distinguish between a relational aspect and an intentional aspect of meaning. I now
distinguish between intentionality and semanticity, where semanticity consists of relationality and
adhesion.
1. Computers are strange beasts
The purpose of this working paper and the next is to set out a vision for the evolution of artificial
intelligence beyond GPT-3 (GPT: Generative Pre-trained Transformer). As I explain in the next
section, “No meaning, no how”, GPT-3 is both a remarkable achievement (we are now at sea in
the Singularity and there is no turning back) and a temptation to continue with what has worked
so far. Thus the title of this paper suggests both possibilities, GPT-3: Waterloo or Rubicon?
Here be dragons. No doubt some are already yielding to temptation and itching to build more
of the same, but others are actively resisting that impulse, and have been for a while. What will
happen?
Of course I don’t know what will happen, but I have preferences.
The purpose of this series is to lay out those preferences. In the next section I quote extensively
from an article David Hays and I published in 1990, the first in a series of essays in which we
outlined a view of human cultural evolution over the longue durée. I conclude with some
observations on the value of being old: you’ve had plenty of failure from which to recover.
Beyond AGI
In “The Evolution of Cognition”6 David Hays and I argued that the long-term evolution of
human culture flows from the architectural foundations of thought and communication: first
speech, then writing, followed by systematized calculation, and most recently, computation. In
discussing the importance of the computer we remark:
One of the problems we have with the computer is deciding what kind of thing it is, and
therefore what sorts of tasks are suitable to it. The computer is ontologically ambiguous.
Can it think, or only calculate? Is it a brain or only a machine?
The steam locomotive, the so-called iron horse, posed a similar problem for people at
Rank 3. It is obviously a mechanism and it is inherently inanimate. Yet it is capable of
autonomous motion, something heretofore only within the capacity of animals and
humans. So, is it animate or not? Perhaps the key to acceptance of the iron horse was
the adoption of a system of thought that permits separation of autonomous motion from
autonomous decision. The iron horse is fearsome only if it may, at any time, choose to
leave the tracks and come after you like a charging rhinoceros. Once the system of
thought had shaken down in such a way that autonomous motion did not imply the
capacity for decision, people made peace with the locomotive.
The computer is similarly ambiguous. It is clearly an inanimate machine. Yet we
interact with it through language; a medium heretofore restricted to communication
with other people. To be sure, computer languages are very restricted, but they are
languages. They have words, punctuation marks, and syntactic rules. To learn to
program computers we must extend our mechanisms for natural language.
As a consequence it is easy for many people to think of computers as people. Thus
Joseph Weizenbaum, with considerable dis-ease and guilt, tells of discovering that his
6
William L. Benzon and David G. Hays, The Evolution of Cognition, Journal of Social and Biological Structures
13(4): 297-320, 1990, https://www.academia.edu/243486/The_Evolution_of_Cognition.
secretary “consults” Eliza (a simple program which mimics the responses of a
psychotherapist) as though she were interacting with a real person (Weizenbaum
1976). Beyond this, there are researchers who think it inevitable that computers will
surpass human intelligence and some who think that, at some time, it will be possible for
people to achieve a peculiar kind of immortality by “downloading” their minds to a
computer. As far as we can tell such speculation has no ground in either current
practice or theory. It is projective fantasy, projection made easy, perhaps inevitable, by
the ontological ambiguity of the computer. We still do, and forever will, put souls into
things we cannot understand, and project onto them our own hostility and sexuality,
and so forth.
A game of chess between a computer program and a human master is just as
profoundly silly as a race between a horse-drawn stagecoach and a train. But the
silliness is hard to see at the time. At the time it seems necessary to establish a purpose
for humankind by asserting that we have capacities that it does not. It is truly difficult to
give up the notion that one has to add “because . . . “ to the assertion “I’m important.”
But the evolution of technology will eventually invalidate any claim that follows
“because.” Sooner or later we will create a technology capable of doing what,
heretofore, only we could.
That is where we are now. The notion of an AGI (artificial general intelligence) that will bootstrap
itself into superintelligence is fantasy; it arises because, even after three-quarters of a century,
computers are still strange to us. We design, build, and operate them; but they challenge us;
they’ve got bugs, they crash, they don’t come to heel when we command. We don’t know what
they are. That is certainly the case with GPT-3. We’ve built it; its performance amazes (but
puzzles and disappoints as well). And we do not understand how it works. It is almost as puzzling
to us as we are to ourselves. Surely we can change that, no?
We conclude our essay with this paragraph:
We know that children can learn to program, that they enjoy doing so, and that a
suitable programming environment helps them to learn (Kay 1977, Papert 1980).
Seymour Papert argues that programming allows children to master abstract concepts
at an earlier age. In general it seems obvious to us that a generation of 20-year-olds who
have been programming computers since they were 4 or 5 years old are going to think
differently than we do. Most of what they have learned they will have learned from us.
But they will have learned it in a different way. Their ontology will be different from
ours. Concepts which tax our abilities may be routine for them, just as the calculus,
which taxed the abilities of Leibniz and Newton, is routine for us. These children will
have learned to learn Rank 4 concepts.
Frankly, I think we were ahead of the curve on this one. Had Hays and I hazarded to predict the
advance of computing into the lives of children (“The child is Father to the man”, as
Wordsworth observed), I fear we would be disappointed by the current situation.
Yes, relatively young programmers have done remarkable things and Silicon Valley teems with
young virtuosi. It is not the virtuosi I’m concerned about. It is the average, which is too low, by
far.
Oddly enough, the current pandemic may help raise that average, though only marginally. With
at-home schooling looming in the future, school districts are beginning to buy laptop machines for
children whose families cannot afford them. For without those machines, those children will not
be able to participate in the only education available to them. No doubt most of the instruction
they receive through those machines will train them to be only passive consumers of computation,
as most of us are and have been conditioned to be.
But some of them surely will be curious. They’ll take a look under the virtual hood (though some
of them will undoubtedly open up the physical machine itself, not that there’s much to see, with so
much action integrated on a single chip) and begin tinkering around. And before you know it,
they’ll do interesting things and Peter Thiel is going to be handing out more of those $100,000
fellowships to teens living in institutionally impoverished neighborhoods plagued with
substandard infrastructure.7

We’ll see.
My intellectual history
I was trained in computational semantics by the late David Hays, a first-generation researcher in
machine translation and one of the founders of computational linguistics.8 He saw the collapse of
that enterprise in the mid-1960s because it over-promised and under-delivered. He learned from
that collapse. But of course, I could not. For me it is just something that had happened in the past.
I could listen to the lessons Hays had taken from those events, and believe them, but those lessons
weren’t my lessons. I did not have to adjust my priors to accommodate to that collapse.
Symbolic AI, roughly similar to the computational semantics I learned from Hays, collapsed in
the mid-1980s. I had fully expected to see the development of symbolic systems capable of
“reading” a Shakespeare play in an intellectually interesting way.9 That was not to be. I have
made other adjustments, in response to other events, since then. I have NOT kept to a straight
and narrow path. My road has been a winding one.
But I have kept moving.
I have always believed that you should commit yourself to the strongest intellectual position you
can, but not in the expectation that it will pan out or that it is your duty to make it pan out come
hell or high water. No, you do it because it maximizes your ability to learn from what you got
wrong. If you don’t establish firm priors, you can’t correct them effectively.
My intellectual career has thus been a long sequence of error-correcting maneuvers. Have I got it
right at long last?
Are you crazy?
This section and the ones to follow are no more than my best assessment of the current situation,
subject to the fact that I’m doing this quickly. I will surely be wrong in many particulars, and
perhaps in overall direction as well. Consider this paper to be a set of Bayesian priors subject to
correction by later events.
7
Peter Thiel offers $100,000 fellowships to talented young people provided they drop out of college so they
can do new things, https://thielfellowship.org/.
8
David G. Hays, Wikipedia, https://en.wikipedia.org/wiki/David_G._Hays.
9
See the discussion of the Prospero system on pages 271-273 of William Benzon and David G. Hays,
Computational Linguistics and the Humanist, Computers and the Humanities, Vol. 10. 1976, pp. 265-274,
https://www.academia.edu/1334653/Computational_Linguistics_and_the_Humanist.
2. No meaning, no how: GPT-3 as Rubicon and Waterloo, a
personal view
I say that not merely because I am a person and, as such, I have a point of view on GPT-3 and
related matters. I say so because this discussion is informal, without journal-class examination of
this, that, and the other, along with the attendant burden of citation, though I will offer some
citations. I am trying to figure out just what it is that I think, and see value in doing so in public.
What value, you ask? It commits me to certain ideas, if only provisionally. It lays out a set of priors
and thus serves to sharpen my ideas as developments unfold and I reconsider and adjust.
GPT-3 represents an achievement of a high order; it deserves the attention it has received, if not
the hype. We are now deep in “here be dragons” territory and we cannot go back. And yet, if we
are not careful, we’ll never leave the dragons, we’ll always be wild and undisciplined. We will
never actually advance; we’ll just spin faster and faster. Hence GPT-3 is both a Rubicon, the
crossing of a threshold, and a potential Waterloo, a battle the AI community cannot win. If it
chooses to fight, that is, to continue with the largely empirical methods that have brought success
so far, it will lose as machine translation did in the mid-1960s and symbolic AI did in the mid-
1980s. By all means, continue the building and experimenting. But take a few moments to step
back and reflect about the enterprise and so develop a deeper understanding of what has been
done in the past and what can and should be done in the future.
Here’s my plan for this section of the paper: First we take a look at history, at the origins of
machine translation and symbolic AI. Next I develop a fairly standard critique of models such as
those used in GPT-3 and follow it with similar remarks by Martin Kay, one of the Grand Old
Men of computational linguistics. Then I look at the problem of common-sense reasoning and
conclude by looking ahead to the next stage of exposition in which I offer some speculations on
why (and perhaps even how) these models can succeed despite their severe and fundamental
shortcomings.
Background: MT and Symbolic computing
It all began with a famous memo Warren Weaver wrote in 1949. Weaver was director of the
Natural Sciences division of the Rockefeller Foundation from 1932 to 1955. He collaborated with
Claude Shannon on the publication of a book which popularized Shannon’s seminal work in
information theory, The Mathematical Theory of Communication. Weaver’s 1949 memorandum, simply
entitled “Translation”,10 is regarded as the catalytic document in the origin of machine translation
(MT) and hence of computational linguistics (CL) and (heck, why not?) artificial intelligence (AI).
Let’s skip to the fifth section of Weaver’s memo, “Meaning and Context” (p. 8):
First, let us think of a way in which the problem of multiple meaning can, in principle at
least, be solved. If one examines the words in a book, one at a time as through an
opaque mask with a hole in it one word wide, then it is obviously impossible to
determine, one at a time, the meaning of the words. “Fast” may mean “rapid”; or it
may mean “motionless”; and there is no way of telling which.
10
Warren Weaver, “Translation”, Carlsbad, NM, July 15, 1949, 12 pp. Online: http://www.mt-
archive.info/Weaver-1949.pdf.
But if one lengthens the slit in the opaque mask, until one can see not only the central
word in question, but also say N words on either side, then if N is large enough one can
unambiguously decide the meaning of the central word. The formal truth of this
statement becomes clear when one mentions that the middle word of a whole article or
a whole book is unambiguous if one has read the whole article or book, providing of
course that the article or book is sufficiently well written to communicate at all.
It wasn’t until the 1960s and ‘70s that computer scientists would make use of this insight; Gerard
Salton was the central figure and he was interested in document retrieval.11 Salton would
represent documents as a vector of words and then query a database of such representations by
using a vector composed from user input. Documents were retrieved as a function of similarity
between the input query vector and stored document vectors.
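A minimal sketch of that style of retrieval, with obvious simplifications (raw word counts, no term weighting, invented documents and query):

```python
# Salton-style retrieval in miniature: documents and queries become word-count
# vectors, and documents are ranked by cosine similarity to the query vector.
import math
from collections import Counter

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = {
    "d1": "the boat overturned and Smith drowned in the lake",
    "d2": "the committee met to discuss the budget for next year",
}
query = vectorize("drowning accident on the lake")
ranked = sorted(docs, key=lambda d: cosine(query, vectorize(docs[d])), reverse=True)
print(ranked)  # d1, the drowning story, ranks first
```

Real systems weight the terms (tf-idf and the like), but the geometric picture, meaning treated as position in a vector space, is already present.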
Work on MT went a different way. Various approaches were used, but at some relatively early
point researchers were writing formal grammars of languages. In some cases these grammars were
engineering conveniences while in others they were taken to represent the mental grammars of
humans. In any event, that enterprise fell apart in the mid-1960s. The prospects for practical
results could not justify federal funding and the government had little interest in supporting purely
scientific research into the nature of language.
Such research continued nonetheless, sometimes under the rubric of computational linguistics
(CL) and sometimes as AI. I encountered CL in graduate school in the mid-1970s when I joined
the research group of David Hays in the Linguistics Department of the State University of New
York at Buffalo (I was actually enrolled as a graduate student in English).
Many different semantic models were developed in various research groups, but we don’t need
anything like a review of that work, just a little taste. In particular let us look at a general type of
model known as a semantic or cognitive network. Hays had been developing such a model for
some years in conjunction with several graduate students.12 Here’s a fragment of a network from a
system developed by one of those students, Brian Philips, to tell whether or not newspaper stories
of people drowning were tragic.13 Here’s a representation of the semantics of capsize:
11
David Durbin, The Most Influential Paper Gerard Salton Never Wrote, Library Trends, vol. 52, No. 4,
Spring 2004, pp. 748-764.
12
For a basic account of cognitive networks, see David G. Hays. Networks, Cognitive. In (Allen
Kent, Harold Lancour, Jay E. Daily, eds.): Encyclopedia of Library and Information Science, Vol. 19.
Marcel Dekker, Inc., NY 1976, 281-300. You can download a copy here,
https://www.academia.edu/10900362/Networks_Cognitive.
13
Brian Phillips. A Model for Knowledge and Its Application to Discourse Analysis, American Journal of
Computational Linguistics, Microfiche 82, (1979).
Notice that there are two kinds of nodes in the network, square ones and round ones. The square
ones represent a scene while the round ones represent individual objects or events. Thus the
square node at the upper left indicates a scene with two sub-scenes (I’m just going to follow out
the logic of the network without explaining it in any detail). The first one asserts that there is a boat
that contains one Horatio Smith. The second one asserts that the boat overturns. And so forth through
the rest of the diagram.
This network represents semantic structure. In the terminology of semiotics, it represents a
network of signifieds. Though Philips didn’t do so, it would be entirely possible to link such a
semantic network with a syntactic network (composed of signifiers), and many systems of that era
did so.
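For concreteness, here is a rough sketch of such a network fragment as a data structure. The node names, node kinds, and relation labels are stand-ins of my own, not Philips’ actual notation:

```python
# Sketch of the kind of network described above: "scene" nodes grouping
# "entity" and "event" nodes, with labeled edges. Names and relations invented.

nodes = {
    "S1":       {"kind": "scene"},   # the overall episode
    "S1a":      {"kind": "scene"},   # sub-scene: Smith is in the boat
    "S1b":      {"kind": "scene"},   # sub-scene: the boat overturns
    "boat":     {"kind": "entity"},
    "smith":    {"kind": "entity"},  # one Horatio Smith
    "overturn": {"kind": "event"},
}

edges = [
    ("S1",  "S1a",      "has-part"),
    ("S1",  "S1b",      "has-part"),
    ("S1a", "boat",     "participant"),
    ("S1a", "smith",    "contained-in"),
    ("S1b", "boat",     "participant"),
    ("S1b", "overturn", "event"),
]

def neighbors(node):
    """Everything a node is directly related to. In Lamb's sense (section 3),
    a node's 'meaning' is its position in the whole graph, not its label."""
    return [(rel, dst) for src, dst, rel in edges if src == node]

print(neighbors("S1a"))
```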
Such networks were symbolic in the (obvious) sense that the objects in them were considered to be
symbols, not sense perceptions or motor actions nor, for that matter, neurons, whether real or
artificial. The relationship between such systems and the human brain was not explored, either in
theory or in experimental observation. It wasn’t an issue.
That enterprise collapsed in the mid-1980s. Why? The models had to be hand-coded, which took
time. They were computationally expensive and so-called common sense reasoning proved to be
endless, making the models larger and larger. (I discuss common sense below and I have many
posts at New Savanna on the topic.14)
The work didn’t stop entirely. Some researchers kept at it. But interests shifted toward machine
learning techniques and toward artificial neural networks. That is the line of evolution that has,
three or four decades later, resulted in systems like GPT-3, which also owe a debt to the vector
semantics pioneered by Salton. Such systems build huge language models from huge corpora
(GPT-3 is based on 300 billion tokens)15 and contain no explicit models of syntax or semantics
anywhere, at least none that researchers can recognize.
14
My various posts on common sense are at this link: https://new-
savanna.blogspot.com/search/label/common%20sense%20knowledge.
15
Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language Models are Few-Shot Learners,
arXiv:2005.14165v4 [cs.CL] 5 June 2020, p. 8. (https://arxiv.org/abs/2005.14165v4)
Researchers build a computational system that constructs a language model (“learns” the
language), but the inner workings of that model are opaque to the researchers. The system built
the model, not the researchers. They only built the system.
It is a strange situation.
No words, only signifiers
Let us start with the basics. There are no words as such in the corpus on which GPT-3’s language
model is based. Words have spelling, pronunciation, meaning, often various meanings,
grammatical usage, and connotations and implications. All that exists in the corpus are spellings,
bare naked signifiers. No signifieds, that is to say, semantics and more generally concepts and even
percepts. And certainly no referents, things and situations in the world to which words refer. The
corpus is thus utterly empty of signification, and yet it exhibits order, and the structure in GPT-3’s
language model derives from that order.
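The point is easy to make concrete. A toy word-level tokenizer (GPT-3 actually uses a byte-pair encoding, but the moral is the same) reduces a corpus to bare integer IDs:

```python
# As far as the machine is concerned, a corpus is a sequence of arbitrary IDs.
# Nothing about meaning, pronunciation, or reference rides along with them.

corpus = "the boat overturned and Smith drowned in the lake"

vocab, ids = {}, []
for word in corpus.split():
    if word not in vocab:
        vocab[word] = len(vocab)   # assign the next unused integer
    ids.append(vocab[word])

print(ids)  # [0, 1, 2, 3, 4, 5, 6, 0, 7]
# Swap which integer stands for which spelling and nothing detectable changes;
# only the pattern of repetitions and orderings remains. That pattern is all a
# language model has to work with.
```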
Remember, the corpus consists of words people wrote while negotiating their way in the world. In
their heads they’ve got a highly structured model of the world, a semantics (which rides on
perception and cognition). Let us say, for the moment, that the semantic model is
multidimensional. Linguistic syntax maps that multidimensional semantics onto a one-
dimensional string which can be communicated through speech, gesture, or writing.
GPT-3 has access only to those strings. It ‘knows’ nothing of the world, nor of syntax, much less of
semantics, cognition and perception. What it is ‘examining’, in those strings, however, reflects the
interaction of human minds and the world. While there is an obvious sense in which the structure in
those strings comes from the human mind, we also have to take the world into account. For the
people who created those strings were not just spinning language out for the fun of it (oh, some of them
were). But even poets, novelists, and playwrights attend to the world’s structure in their writing.
What GPT-3 recovers and constructs from the data it ingests is thus a simulacrum of the
interaction between people and the world. There is no meaning there. Only entanglement. And
yet what it does with that entanglement is very interesting and has profound consequences, but
more of that later.
* * * * *
Now, we as users, as clients of such systems, are fooled by our semiotic naiveté. Even when we’ve
taken semiotics 101, we look at a written signifier and we take it for a word, automatically and
without thought, with its various meanings and implications. But it isn’t a word, not really.
Yes, in normal circumstances (talking with one another, reading various documents) it makes
sense for us to treat signifiers as words. As such those signifiers are linked to signifieds (semantics,
concepts, percepts) and referents (things and situations in the world). But output from GPT-3 is
not normal circumstances. It’s working from a huge corpus of signifiers, but no matter how you
bend, fold, spindle, or mutilate those signifiers, you’re not going to get a scintilla of meaning. Any
meaning you see is meaning you put there.
Where did those signifiers come from? That’s right, those millions if not billions of people writing
away. Writing about the world. So there is in fact something about the world intertwined with
those signifiers, just as there is something about the structure of the minds that composed them.
The structures of minds and of the world have become entangled and projected onto one
freakishly long and entangled pile of strings. That is what GPT-3 works with to generate its
language model.
Let me repeat this once again, obvious though it is: Those words in the corpus were generated by
people conveying knowledge of, attempting to make sense of, the world. Those strings are coupled
with the world, albeit asynchronously. Without that coupling, that corpus would collapse into an
unordered pile of bare naked signifiers. It is that coupling with the world that authorizes our
treatment of those signifiers as full-on words.
We need to be clear on the distinction between the language system as it exists in the minds of
people and the many and various texts those people generate as they employ that system to
communicate about and make sense of the world. It would be a mistake to think that the GPT-3
language model is only about what is inside people’s heads. It is also about the world, for those
people use what is in their heads to negotiate their way in the world. [I intend to “cash out” on
my insistence on this point in the next section.]
Martin Kay, “an ignorance model”
With that in mind let us consider what Martin Kay has to say about statistical language
processing. Martin Kay is one of the Grand Old Men of computational linguistics. He was
originally trained in Great Britain by Margaret Masterman, a student of Ludwig Wittgenstein,
and moved to the United States in the 1950s, where he worked with my teacher and colleague,
David Hays. Before he came to SUNY Buffalo, Hays had run the RAND Corporation’s
program in machine translation.
In the early 2000s the Association for Computational Linguistics gave Kay a lifetime achievement
award and he delivered some remarks on that occasion.16 At the end he says (p. 438):
Statistical NLP has opened the road to applications, funding, and respectability for our
field. I wish it well. I think it is a great enterprise, despite what I may have seemed to say
to the contrary.
Prior to that he had this to say (p. 437):
Symbolic language processing is highly nondeterministic and often delivers large
numbers of alternative results because it has no means of resolving the ambiguities that
characterize ordinary language. This is for the clear and obvious reason that the
resolution of ambiguities is not a linguistic matter. After a responsible job has been done
of linguistic analysis, what remain are questions about the world. They are questions of
what would be a reasonable thing to say under the given circumstances, what it would
be reasonable to believe, suspect, fear, or desire in the given situation. If these questions
are in the purview of any academic discipline, it is presumably artificial intelligence. But
artificial intelligence has a lot on its plate and to attempt to fill the void that it leaves
open, in whatever way comes to hand, is entirely reasonable and proper. But it is
important to understand what we are doing when we do this and to calibrate our
expectations accordingly. What we are doing is to allow statistics over words that occur
very close to one another in a string to stand in for the world construed widely, so as to
include myths, and beliefs, and cultures, and truths and lies and so forth. As a stop-gap
for the time being, this may be as good as we can do, but we should clearly have only
16
Martin Kay, A Life of Language, Computational Linguistics, Volume 31 Issue 4, December 2005, pp. 425-
438, http://web.stanford.edu/~mjkay/LifeOfLanguage.pdf.
the most limited expectations of it because, for the purpose it is intended to serve, it is
clearly pathetically inadequate. The statistics are standing in for a vast number of things
for which we have no computer model. They are therefore what I call an “ignorance
model.”
An “ignorance model”: that, more or less, is what I said in the previous section when I said that
GPT-3 had no access to semantics.
There are some thinkers who believe it is necessary to somehow integrate these statistical methods
with the “old school” symbolic methods pioneered by Kay and by David Hays and many others.
That’s what I think, and have thought so for some time. I know that Gary Marcus and Rodney
Brooks have a start-up built on that premise, Robust AI.17 So does David Ferrucci, who headed
IBM’s Watson project when it won at Jeopardy; Elemental Cognition is his start-up.18
Common sense reasoning about the world
As I’ve already observed, one of the problems that sank symbolic processing in the mid-1980s is
that of common sense reasoning. This has become an issue with current work as well. While
GPT-3 does exhibit impressive performance in this area, there are signs that not all is as it seems.
Kevin Lacker observes:19
Traditionally, artificial intelligence struggles at “common sense”. But GPT-3 can
answer a lot of common-sense questions.
Q: What is your favorite animal?
A: My favorite animal is a dog.
Q: Why?
A: Because dogs are loyal and friendly.
Q: What are two reasons that a dog might be in a bad mood?
A: Two reasons that a dog might be in a bad mood are if it
is hungry or if it is hot.
Q: How many eyes does a giraffe have?
A: A giraffe has two eyes.
Q: How many legs does a frog have?
A: A frog has four legs.
Q: Are there any animals with three legs?
A: No, there are no animals with three legs.
Q: Why don't animals have three legs?
A: Animals don't have three legs because they would fall
over.
Ten years ago, if I had this conversation, I would have assumed the entity on the other
end was a human. You can no longer take it for granted that an AI does not know the
answer to “common sense” questions.
Yes. But Lacker goes on:
We need to ask questions that no normal human would ever talk about.
17
Robust AI, https://www.robust.ai/.
18
Elemental Cognition, https://www.elementalcognition.com/.
19
Kevin Lacker's blog, Giving GPT-3 a Turing Test, https://lacker.io/ai/2020/07/06/giving-gpt-3-a-
turing-test.html.
Q: How many eyes does a giraffe have?
A: A giraffe has two eyes.
Q: How many eyes does my foot have?
A: Your foot has two eyes.
Q: How many eyes does a spider have?
A: A spider has eight eyes.
Q: How many eyes does the sun have?
A: The sun has one eye.
Q: How many eyes does a blade of grass have?
A: A blade of grass has one eye.
Now we’re getting into surreal territory. GPT-3 knows how to have a normal
conversation. It doesn’t quite know how to say “Wait a moment... your question is
nonsense.” It also doesn’t know how to say “I don’t know.”
It seems that, if you probe a bit, GPT-3 does have problems with common sense reasoning.
A lot of common-sense reasoning takes place “close” to the physical world. I have come to
believe, but will not here argue, that much of our basic (‘common sense’) knowledge of the
physical world is grounded in analogue and quasi-analogue representations.20 This gives us the
power to generate language about such matters on the fly. Old school symbolic machines did not
have this capacity, nor do current statistical models, such as GPT-3.
But then how can a system generate analog or quasi-analog representations of the world unless it
has direct access to the world? The creators of GPT-3 acknowledge this as a limitation:
Finally, large pretrained language models are not grounded in other domains of
experience, such as video or real-world physical interaction, and thus lack a large
amount of context about the world [BHT+20]. For all these reasons, scaling pure self-
supervised prediction is likely to hit limits, and augmentation with a different approach
is likely to be necessary. Promising future directions in this vein might include learning
the objective function from humans [ZSW+19a], fine-tuning with reinforcement
learning, or adding additional modalities such as images to provide grounding and a
better model of the world [CLY+19].21
And yet GPT-3 seems so effective. How can that be?
The above critique is from first principles and, as such, seems to me to be unassailable. Equally
unassailable, however, are the facts on the ground: these systems do work. And here I’m not
talking only about GPT-3 and its immediate predecessors. I’ve done much of my thinking about
these matters in connection with other kinds of systems based on distributional semantics, such as
topic modeling, for one example.
20
For a superb analog model see William Powers, Behavior: The Control of Perception (Aldine) 1973. Don’t let
the publication date fool you; Powers develops his model with a simplicity and elegance that makes it well worth
our attention even now, almost 50 years later. Hays integrated Powers’ model into his cognitive network
model, see David G. Hays, Cognitive Structures, HRAF Press, 1981. Also, see my post, “Computation, Mind,
and the World [bounding AI]”, New Savanna, blog post, December 28, 2019, https://new-
savanna.blogspot.com/2019/12/computation-mind-and-world-bounding-ai.html.
21
Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language Models are Few-Shot Learners,
arXiv:2005.14165v4 [cs.CL] 5 June 2020, p. 34. (https://arxiv.org/abs/2005.14165v4)
Thus I have little choice, it seems, but to hazard an account of just why these models are effective.
That’s my task for the next section, “The brain, the mind, and GPT-3: Dimensions and
conceptual spaces”. Note that I do not mean to explicate the computational processes used in
GPT-3, not at all. Rather, I am going to speculate about what there is in the nature of the mind,
and perhaps even of the world, that allows such mechanisms to succeed.
It is common to think of language as loose, fuzzy, and imprecise. And so it is. But that cannot and
is not all there is to language. In order for language to work at all there must be a rigid and
inflexible aspect to it. That is what I’ll be talking about in the next section. I’ll be building on
theoretical work by Sydney Lamb, Peter Gärdenfors, and a comment Graham Neubig made in a
discussion about semantics and machine learning.
3. The brain, the mind, and GPT-3: An “isometric
transform” onto meaning space
The purpose of this section is to sketch a conceptual framework in which we can understand the
success of language models such as GPT-3 despite the fact that they are based on nothing more
than massive collections of unadorned signifiers. I have no intention of attempting to explain how
GPT-3 works. That it does work, in an astonishing variety of cases if (certainly) not universally, is
sufficient for my purposes.
First of all I present the insight that sent me down this path, a comment by Graham Neubig in an
online conversation that I was not a part of. Then I set that insight in the context of an insight by
Sydney Lamb (meaning resides in relations), a first-generation researcher in machine translation
and computational linguistics. I then take a grounding case from Julian Michael, that of color, and
suggest that it can be extended by the work of Peter Gärdenfors on conceptual spaces.
A clue: an isomorphic transform into meaning space
At the 58th Annual Meeting of the Association for Computational Linguistics Emily M. Bender
and Alexander Koller delivered a paper, “Climbing towards NLU: On Meaning, Form, and
Understanding in the Age of Data”,22 where NLU means natural language understanding. The
issue is pretty much the one I laid out in the previous section, in “No words, only
signifiers” and “Martin Kay, ‘an ignorance model’”. A lively discussion ensued online, which
Julian Michael has summarized and commented on in a recent blog post.23
In that post Michael quotes a remark by Graham Neubig:24
One thing from the twitter thread that it doesn’t seem made it into the paper... is the
idea of how pre-training on form might learn something like an “isomorphic transform”
onto meaning space. In other words, it will make it much easier to ground form to
meaning with a minimal amount of grounding. There are also concrete ways to
measure this, e.g. through work by Lena Voita or Dani Yogatama... This actually seems
like an important point to me, and saying “training only on form cannot surface
meaning,” while true, might be a little bit too harshsomething like “training on form
makes it easier to surface meaning, but at least a little bit of grounding is necessary to do
so” may be a bit more fair.
That’s my point of departure in this section, that notion of “an ‘isomorphic transform’ onto
meaning space.” I am going to sketch a framework in which we can begin unpacking that idea.
But it may take a while to get there.
22
Emily M. Bender and Alexander Koller, Climbing towards NLU: On Meaning, Form, and
Understanding in the Age of Data, Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics, pages 5185-5198, July 5-10, 2020.
23
Julian Michael, To Dissect an Octopus: Making Sense of the Form/Meaning Debate,
https://blog.julianmichael.org/2020/07/23/to-dissect-an-
octopus.html?fbclid=IwAR0LfzVkrmiMBggkm0tJyTN8hgZks5bN0b5Wg4MO96GWZBx9Fom
qhIJH4LQ.
24
He’s an Associate Professor at Carnegie Mellon. His website, http://www.phontron.com/.
Meaning is in relations
I want to develop an idea I have from Sydney Lamb, that meaning resides in relations. The idea
as Lamb understood it emerged in the “old school” world of symbolic computation, where
language is conceived as a relational network of items. The meaning of any item in the network is
a function of its position in the network. Note that this means that I am assuming that the mind is
constituted, in part, by something like an old school symbol model; that’s not all that constitutes
the mind, not by any means (recall p. 13). It is that symbolic model that is the object of Neubig’s
“isometric transform”.
Let’s start with this simple diagram:
It represents the fact that the central nervous system (CNS) is coupled to two worlds, each
external to it. To the left we have the external world. The CNS is aware of that world through
various senses (vision, hearing, smell, touch, taste, and perhaps others) and we act in that world
through the motor system. But the CNS is also coupled to the internal milieu, with which it shares
a physical body. The net is aware of that milieu by chemical sensors indicating contents of the
blood stream and of the lungs, and by sensors in the joints and muscles. And it acts in the world
through control of the endocrine system and the smooth muscles. Roughly speaking the CNS
guides the organism’s actions in the external world so as to preserve the integrity of the internal
milieu. When that integrity is gone, the organism is dead.
Now consider this more differentiated presentation of the same facts:
I have divided the CNS into four sections: A) senses the external world, B) senses the internal
milieu, C) guides action in the internal milieu, and D) guides action in the external world. I rather
doubt that even a very simple animal, such as C. elegans, with 302 neurons, is so simple. But I trust
my point will survive that oversimplification.
Lamb’s point is that the “meaning” or “significance” of any of those nodes (let’s not worry at the
moment whether they’re physical neurons or more abstract entities) is a function of its position
in the entire network, with its inputs from and outputs to the external world and the inner
milieu.25 To appreciate the full force of Lamb’s point we need to recall the diagrams typical of old
school symbolic computing, such as this diagram from Brian Philips we saw in the previous
section:
All of the nodes and edges have labels. Lamb’s point is that those labels exist for our convenience;
they aren’t actually a part of the system itself. If we think of that network as a fragment from a
human cognitive system (and I’m pretty sure that’s how Philips thought about it, even if he could
not justify it in detail; no one could, not then, not now), then it is ultimately connected to both
the external world and the inner milieu. All those labels fall away; they serve no purpose. Alas,
Philips was not building a sophisticated robot, and so those labels are necessary fictions.
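A small sketch of the point about labels, using a made-up toy graph: strip or swap the names and each node is still individuated by its pattern of connections.

```python
# Lamb's point in miniature: the labels are for our convenience. Keep only the
# pattern of connections and the nodes are still distinguishable by position.

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def degree_profile(edge_list):
    """Characterize each node purely by how many connections it has."""
    profile = {}
    for a, b in edge_list:
        profile[a] = profile.get(a, 0) + 1
        profile[b] = profile.get(b, 0) + 1
    return profile

# Relabel every node arbitrarily; the structure, and hence each node's
# 'position', is unchanged. Only the names differ.
relabel = {"A": "n1", "B": "n2", "C": "n3", "D": "n4"}
renamed = [(relabel[a], relabel[b]) for a, b in edges]

print(sorted(degree_profile(edges).values()))    # [1, 2, 2, 3]
print(sorted(degree_profile(renamed).values()))  # [1, 2, 2, 3] -- same structure
```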
But we’re interested in the real case, a human being making their way in the world. In that case
let us assume that, for one thing, the necessary diagram is WAY more complex, and that the
nodes and edges do not represent individual neurons. Rather, they represent various entities that
are implemented in neurons: sensations, thoughts, perceptions, and so forth. Just how such things
are realized in neural structures is a matter of some importance and is being pursued by hundreds
of thousands of investigators around the world. But we need not worry about that now. We’re
about to fry some rather more abstract fish.
Some of those nodes will represent signifiers, to use the Saussurian terminology I used earlier, and
some will represent signifieds. What’s the difference between a signifier and a signified?
25
Sydney Lamb, Pathways of the Brain, John Benjamins, 1999. See also Lamb’s most recent account
of his language model, Sydney M. Lamb, Linguistic Structure: A Plausible Theory, Language Under
Discussion, 2016, 4(1): 1-37. https://journals.helsinki.fi/lud/article/view/229.
Their position in the network as a whole. That’s all. No more, no less. Now, it seems to me, we can
begin thinking about Neubig’s “isomorphic transform” onto meaning space.
Let us notice, first of all, that language exists as strings of signifiers in the external world. In the
case that interests us, those are strings of written characters that have been encoded into
computer-readable form. Let us assume that the signifieds (which bear a major portion of
meaning, no?) exist in some high-dimensional network in mental space. This is, of course, an
abstract space rather than the physical space of neurons, which is necessarily three dimensional.
However many dimensions this mental space has, each signified exists at some point. Just how this
conceptual space is implemented in populations of neurons is a matter of considerable interest,
but we need not consider that here.26
What happens when you write? You produce a string of signifiers. The distance between signifiers
on this string, and their ordering relative to one another, are a function of the relative distances
and orientations of their associated signifieds in mental space. Perhaps that’s where to look for
Neubig’s isometric transform into meaning space. What GPT-3, and other NLP engines, does is to
examine the distances and ordering of signifiers in the string and compute over them so as to reverse engineer the
distances, orientations and relations of the associated signifieds in high-dimensional mental space.
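That claim is speculative, and GPT-3’s transformer is certainly not doing what follows, but the general family of distributional methods gives the flavor. A toy sketch over an invented corpus: each word gets a distance-weighted co-occurrence profile, and words with similar neighbors come out near one another.

```python
# Toy distributional sketch (not GPT-3's mechanism): estimate relative positions
# in 'meaning space' from the distances and orderings of signifiers in strings.
import math
from collections import defaultdict

corpus = ("the cat chased the mouse . the dog chased the cat . "
          "the dog ate the bone . the cat ate the mouse .").split()

cooc = defaultdict(lambda: defaultdict(float))
window = 2
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            cooc[w][corpus[j]] += 1.0 / abs(i - j)   # nearer neighbors count more

def similarity(u, v):
    keys = set(cooc[u]) | set(cooc[v])
    dot = sum(cooc[u][k] * cooc[v][k] for k in keys)
    nu = math.sqrt(sum(x * x for x in cooc[u].values()))
    nv = math.sqrt(sum(x * x for x in cooc[v].values()))
    return dot / (nu * nv)

print(similarity("cat", "dog"), similarity("cat", "bone"))
```

Here “cat” and “dog” share contexts (chased, ate, the) and so come out more similar than “cat” and “bone”. Scaled up enormously, and made predictive rather than merely counted, this is something like the territory in which GPT-3’s training operates.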
Let us recall the classic distinction between semantic and episodic memory. Roughly speaking
semantic memory is like a dictionary; it is a basic inventory of concepts. Episodic memory, first
characterized, I believe, by Endel Tulving,27 is more like an encyclopedia attached to histories
and news reports. Semantic memory28 is an inventory of types while entries in episodic memory
consist of tokens of those types. The square nodes in the Brian Philips network diagram represent
episodes while the circles represent semantic entities (p. 17).
What is the size of human semantic memory? One estimate places the number of English words
at roughly a million.29 No one individual will know all those words, but the corpus GPT-3 is based
on is certainly not the product of a single individual. Let us assume, then, for the sake of argument,
that GPT-3, or a similar engine, includes roughly one million word types (I believe that GPT-3’s
language model is in fact based on fewer than 100,000 words). Or, to be more precise, a million
signifier types, and they in turn correspond to a million signified types in mental space. We
certainly don’t need 175 billion parameters to characterize those million signifieds. We need them
to characterize all the episodes in the text base.
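A back-of-envelope check on that claim. The figures below are assumptions for the sake of argument, except the two noted as reported for GPT-3 (a vocabulary of roughly 50,000 tokens and 12,288-dimensional embeddings for the largest model, as I read the published description):

```python
# Rough arithmetic only; the dimensionalities are assumptions, not facts about
# how GPT-3 represents anything internally.

total_parameters = 175_000_000_000

# Generous case: one million signified types, each a 1,000-dimensional vector.
lexicon = 1_000_000 * 1_000
print(lexicon / total_parameters)     # ~0.006 -- well under 1% of the budget

# GPT-3's own token embeddings, using the reported vocabulary and width.
embeddings = 50_000 * 12_288
print(embeddings / total_parameters)  # ~0.0035

# Either way, characterizing the lexicon is a rounding error. On the argument
# here, the bulk of the parameters must be characterizing the episodes.
```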
Imagine some space adequate to characterize our million signifieds. Maybe it has 10, 100, or
more dimensions; as long as the signifieds are usefully differentiated from one another, just how
that is done is secondary. Each signified occupies a point in that space.
26
I offer some preliminary speculation in William Benzon, Attractor Nets, Series I: Notes Toward a New Theory of
Mind, Logic, and Dynamics in Relational Networks, Working Paper, 52 pp.,
https://www.academia.edu/9012847/Attractor_Nets_Series_I_Notes_Toward_a_New_Theory_of_Mind
_Logic_and_Dynamics_in_Relational_Networks.
27
Endel Tulving, Episodic and semantic memory, in: Endel Tulving and Wayne Donaldson (Eds.),
Organization of Memory (Academic Press, New York, 1972) 382 - 403.
28
Semantic memory is sometimes said to constitute an ontology. However, I suspect that human semantic
memory is not so orderly as the ontologies proposed by knowledge engineers. See Wikipedia’s entry,
Ontology (information science), https://en.wikipedia.org/wiki/Ontology_(information_science), and John
Sowa’s Ontology page, http://www.jfsowa.com/ontology/.
29
“How Many Words Are There In The English Language?” Dictionary.com, accessed August 1,
2020, https://www.dictionary.com/e/how-many-words-in-english/.
An episode, then, is a path or a trajectory in the space.30 An episode might be only a single token, or ten, thirty, or 256 tokens; but
it might also have 100,000 or more tokens. Those 175 billion parameters are characterizing those
episodes. When you present GPT-3 with a prompt, it treats that prompt as the initial segment of
an episode and continues that trajectory to complete the episode.
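For readers who find a data-structure analogy useful, here is one way to picture the type/token distinction and episodes-as-paths in code. It is a toy sketch with invented names, vocabulary, and dimensions, and it claims nothing about how GPT-3 actually stores anything.

    from dataclasses import dataclass
    import numpy as np

    # Semantic memory: an inventory of types, one point per signified.
    rng = np.random.default_rng(0)
    semantic_memory = {w: rng.normal(size=50) for w in ["dog", "bird", "salt", "chase", "taste"]}

    @dataclass
    class Episode:
        """Episodic memory entry: an ordered run of tokens, each token
        pointing back at a type in semantic memory."""
        tokens: list    # e.g. ["dog", "chase", "bird"]

        def trajectory(self):
            # The episode as a path: the sequence of points visited in the space.
            return np.stack([semantic_memory[t] for t in self.tokens])

    episodic_memory = [
        Episode(["dog", "chase", "bird"]),
        Episode(["salt", "taste"]),
    ]

    # A prompt is treated as the initial segment of such a path,
    # to be continued until the episode is complete.
    print(episodic_memory[0].trajectory().shape)    # (3, 50)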
In this framework we can think of the extended meaning of a semantic type as being a function of
the way its tokens participate in episode strings. I say extended because it is also defined in that base
space that distinguishes the types from one another. This extended meaning has a Wittgensteinian
feel: word meanings reside in their use.
However, even that extended meaning is not entirely adequate, as some of Kevin Lacker’s
common-sense examples suggest (p. 12 above). In the real world many basic concepts are
grounded in sensory-motor schemas of one kind or another, the image of a dog, the sound of a bird,
or the taste of salt. GPT-3 doesn’t have access to such schemas. Some of that information,
however, is characterized by sentences and phrases, that is, by episodes, and GPT-3 does have
access to those.
Many things, moreover, are characterized in multiple ways. Ordinary table salt is characterized
by its taste, appearance, and haptic feel. To the chemist, however, table salt consists mostly of
sodium chloride (NaCl), along with traces of various impurities. Conceptually sodium chloride
didn't even exist until the nineteenth century. While the substance is concrete, the
conceptualization is abstract. The same is true for dogs and cats and pine trees and grasses.
Young children recognize them by how they look, sound, feel, smell, and taste. The professional biologist, however, has altogether more abstract ways of characterizing them. And so it goes for the entirety of the natural world and the sciences that have arisen to study those phenomena. We live amid multiple overlapping ontologies. [31]
What does GPT-3 “know” of such things? We could, I suppose, ask, couldn’t we?
30
I’ve explored the notion of texts as paths in semantic space, William Benzon, Virtual Reading: The Prospero
Project Redux, Working Paper, Version 2, October 2018, 37 pp.,
https://www.academia.edu/34551243/Virtual_Reading_The_Prospero_Project_Redux.
31
I’ve written a bit about multiple ontologies. See William Benzon, Ontology of Common Sense, Hans
Burkhardt and Barry Smith, eds. Handbook of Metaphysics and Ontology, Muenchen: Philosophia
Verlag GmbH, 1991, pp. 159-161. The final draft is online,
https://www.academia.edu/28723042/Ontology_of_Common_Sense; Ontology in Knowledge Representation for
CIM, Center for Manufacturing Productivity and Technology Transfer, Rensselaer Polytechnic Institute.
Report No. CIMNW85TR034, January 1985,
https://www.academia.edu/19804747/Ontology_in_Knowledge_Representation_for_CIM.
4. Why is simple arithmetic difficult for deep learning
systems?
Video: Gary Marcus - Towards a Proper Foundation for Artificial General Intelligence:
https://youtu.be/8VWQQbngxXY
Gary Marcus makes this point twice in the video: c. 18:25 (multiplication of 2-digit numbers), c. 19:49 (3-digit addition). Why is this so difficult for deep learning models to grasp? This suggests a failure to distinguish between semantic and episodic memory, to use terms from Old School symbolic computation that I introduced in the previous section.
The question interests me because arithmetic calculation has well-understood procedures. We
know how people do it. And by that I mean that there’s nothing important about the process that’s
hidden, unlike our use of ordinary language. The mechanisms of both sentence-level grammar
and discourse structure are unconscious.
It's pretty clear to me that arithmetic requires episodic structure, to introduce a term from old symbolic-systems AI and computational linguistics. That's obvious from the fact that we don't teach it to children until grammar school, which is roughly when episodic-level cognition kicks in (see the paper Hays and I did on natural intelligence [32]).
I note that, while arithmetic is simple, it’s simple only in that there are no subtle conceptual issues
involved. But fluency requires years of drill. First the child must learn to count; that gives numbers
meaning. Once that is well in hand, children are drilled in arithmetic tables for the elementary
operations: addition, subtraction, multiplication, and division. The learning of addition and
subtraction tables proceeds along with exercises in counting, adding and subtracting items in
collections. Once this is going smoothly one learns the procedures for multiple-digit addition and subtraction, multiple-operand addition, and then multiplication and division. Multiple-digit
division is the most difficult because it requires guessing, which is then checked by actual
calculation (multiplication followed by subtraction).
Why do such intellectually simple procedures require so much drill? Because each individual
step must be correct. One mistake anywhere, and the whole calculation is thrown off. You need to
recall atomic facts (from the tables) many times in a given calculation and keep track of
intermediate results. The human mind is not well-suited to that. It doesn’t come naturally. Drill is
required. That drill is being managed by episodic cognition.
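To see how much step-keeping is involved, here is the grade-school procedure for multi-digit addition written out in Python. It is a sketch of the symbolic procedure children drill, offered for contrast; it says nothing about what a neural network does internally.

    # Every step recalls an "atomic fact" from a memorized table and
    # keeps track of a carried intermediate result.
    ADDITION_TABLE = {(a, b): a + b for a in range(10) for b in range(10)}

    def column_add(x: str, y: str) -> str:
        n = max(len(x), len(y))
        x, y = x.zfill(n), y.zfill(n)
        carry, digits = 0, []
        for a, b in zip(reversed(x), reversed(y)):       # rightmost column first
            s = ADDITION_TABLE[(int(a), int(b))] + carry  # recall a table fact
            digits.append(str(s % 10))                    # write down this column's digit
            carry = s // 10                               # remember the carry
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    assert column_add("457", "968") == "1425"
    # One wrong lookup or one dropped carry anywhere and the whole answer
    # is wrong, which is why fluency takes years of drill.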
It would seem that GPT-3 cannot pick up that kind of episodic structure. The question is: Can it
pick up any kind of episodic structure at all? I don’t know.
When humans produce the kind of coherent prose that GPT-3 does, they are using episodic
cognition. But that episodic cognition is unconscious. Does GPT-3 pick up episodic cognition of
32
William Benzon and David G. Hays, A Note on Why Natural Selection Leads to Complexity, Journal of
Social and Biological Structures 13: 33-40, 1990,
https://www.academia.edu/8488872/A_Note_on_Why_Natural_Selection_Leads_to_Complexity.
that kind? As I say, I don’t know. But I can imagine that it does not. If not, then what is GPT-3
doing to produce such convincing simulacra of coherent prose? I am tempted to say it is doing it
all with systemic-level cognition, but that may be a mistake as well. GPT-3 is doing it with some
other mechanism, one that doesn't differentiate between the semantic and episodic levels.
5. Metaphysics: The dimensionality of mind and world
Let’s bring this down to earth. Let’s return to Bender and Koller, who proposed a thought
experiment involving a superintelligent octopus listening in on a conversation between two
people. Julian Michael proposes the following:
As a concrete example, consider an extension to the octopus test concerning color, a grounded concept if there ever was one. Suppose our octopus O is still underwater, and he:
- Understands where all color words lie on a spectrum from light to dark... But he doesn't know what light or dark mean.
- Understands where all color words lie on a spectrum from warm to cool... But he doesn't understand what warm or cool mean.
- Understands where all color words lie on a spectrum of saturated to washed out... But he doesn't understand what saturated or washed-out mean.
Et cetera, for however many scalar concepts you think are necessary to span color space with sufficient fidelity. A while after interposing on A and B, O gets fed up with his benthic, meaningless existence and decides to meet A face-to-face. He follows the cable to the surface, meets A, and asks her to demonstrate what it means for a color to be light, warm, saturated, etc., and similarly for their opposites. After grounding these words, it stands to reason that O can immediately ground all color terms, a much larger subset of his lexicon. He can now demonstrate full, meaningful use of words like green and lavender, even if he never saw them used in a grounded context. This raises the question: When, or from where, did O learn the meaning of the word "lavender"?
It’s hard for me to accept any answer other than “partly underwater, and partly on
land.” Bender acknowledges this issue in the chat as well:
The thing about language is that it is not unstructured or random, there is a lot of information there in
the patterns. So as soon as you can get a toe hold somewhere, then you can (in principle, though I don’t
want to say it’s easy or that such systems exist), combine the toe hold + the structure to get a long ways.
The thing about color is that it is much investigated and well (if not completely) understood, from
genetics up through cultural variation in color terms. And color is understood in terms of three
dimensions, hue (warm to cool), saturation, and brightness (light to dark).
And that brings us to the work of Peter Gärdenfors, who has developed a very sophisticated
geometry of conceptual spaces where each space is organized along one or more dimensions. [33]
And he means real geometry, not geometry as metaphor. He starts with color, but then, over the
course of two books, extends the idea of conceptual spaces and their constitutive dimensions to a
wide and satisfying range of examples.
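As a toy illustration of what such a geometric treatment of color might look like, loosely in the spirit of Gärdenfors's prototypes and convex regions, consider the following sketch in Python. The three dimensions and every prototype coordinate are invented for the example.

    import numpy as np

    # Color words as prototype points in a three-dimensional space:
    # (hue on a warm-to-cool scale, saturation, brightness), each scaled to [0, 1].
    prototypes = {
        "red":      np.array([0.05, 0.90, 0.50]),
        "orange":   np.array([0.15, 0.90, 0.60]),
        "green":    np.array([0.55, 0.80, 0.50]),
        "blue":     np.array([0.75, 0.80, 0.50]),
        "lavender": np.array([0.80, 0.30, 0.80]),
        "black":    np.array([0.50, 0.00, 0.00]),
        "white":    np.array([0.50, 0.00, 1.00]),
    }

    def name_color(point):
        # Nearest-prototype categorization; the regions this induces are the
        # sort of convex (Voronoi) cells Gardenfors uses to model concepts.
        return min(prototypes, key=lambda w: np.linalg.norm(prototypes[w] - point))

    # Once the three scales are grounded, any point, and hence any color word
    # tied to a point, is grounded too.
    print(name_color(np.array([0.78, 0.35, 0.75])))    # "lavender"

Once the octopus grounds the three scales, every point in the space comes along for free, which is Michael's point about "lavender".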
This is not the time and place to even attempt a précis of his theory. But I note, for example, that
he has interesting treatments of properties, animal concepts, metaphor, prepositions, induction,
and computation in Conceptual Spaces.
33
Peter Gärdenfors, Conceptual Spaces: The Geometry of Thought, MIT Press, 2000; The Geometry of Meaning:
Semantics Based on Conceptual Spaces, MIT Press, 2014.
His more recent book, The Geometry of Meaning, has chapters on semantic domains, meeting of minds (in interaction), the semantics of nouns, adjectives, and actions, and propositions and compositionality. As a starting point I recommend his recent article [34], which also contains some remarks about computational implementation. In our immediate context a crucial point is that Gärdenfors regards his account of mental
spaces as being different from both classic symbolic accounts of mind (such as that embodied by
the Brian Philips example) and artificial neural networks, such as GPT-3. Though I am perhaps
interpreting him a bit, he sees mental spaces as a tertium quid between the two. In particular, to the
extent that Gärdenfors is more or less correct, we have a coherent and explicit way of understanding the
success of neural network models such as GPT-3. That is, if the world, on the one hand, and the human
sensorium and motor system, on the other, are like that, then the success of GPT-3 is intelligible on
those terms.
The metaphysical structure of the world
It seems to me that what Gärdenfors is looking at is what we might call, for lack of a better term,
the metaphysical structure of the world.
The metaphysical structure of the world?!
I don’t mean physical structure of the world, which is a subject for the various physical and, I
suppose, biological sciences. I mean metaphysical. Just what that is, I’m not sure. The metaphysical
structure of the world is that structure that makes the world intelligible to us; it exists in the
relationship between us, Homo sapiens sapiens, and the world. What is the world that it is perceptible,
that we can move around in it in a coherent fashion? Whatever it is, it is the product of millions of
years of evolution in which animals have had to make their way in the world.
Imagine, in contrast, that the world consisted entirely of elliptically shaped objects. Some are
perfectly circular, others only nearly circular. Still others seem almost flattened into lines. And we
have everything in between. In this world things beneficial to us are a random selection from the
full population of possible elliptical beings, and the same with things dangerous to us. Thus there
are no simple and obvious perceptual cues that separate good things from bad things. A very good
elliptical being may differ from a very bad being in a very minor way, difficult to detect. Such a
world would be all but impossible to live in.
That is not the world we have. Yes, there are cases where small differences are critical. But they
don’t dominate. Our world is intelligible. Plants are distinctly different from animals, tigers from
mice, oaks from petunias, rocks and water are not at all alike, and so on. It is thus possible to
construct a conceptual system capable of navigating in the external world so as to preserve and
even enhance the integrity of the internal milieu. That, I believe, is what Gärdenfors is looking at
when he talks of dimensionality and conceptual spaces. Conceptual spaces capture the variety in
the world in a way that nervous systems can compute over it. The metaphysical structure of the
world thus lies in the correspondence of language with the world.
I am, at least provisionally, calling that correspondence the metaphysical structure of the world.
Moreover, since humans did not arise de novo that metaphysical structure must necessarily extend
through the animal kingdom and, who knows, plants as well.
34
Peter Gärdenfors, An Epigenetic Approach to Semantic Categories, IEEE Transactions on Cognitive and Developmental Systems, Volume 12, Issue 2, June 2020, 139-147. DOI: 10.1109/TCDS.2018.2833387
(sci-hub link, https://sci-hub.tw/10.1109/TCDS.2018.2833387)
“How”, you might ask, “does this metaphysical structure of the world differ from the world’s
physical structure?” I will say, again provisionally, that it is a matter of intension rather than extension.
Extensionally the physical and the metaphysical are one and the same. But intensionally, they are
different. We think about them in different terms. We ask different things of them. They have
different conceptual affordances. The physical world is meaningless; it is simply there. It is in the
metaphysical world that we seek meaning.
Interlude, a little dialog
Does this make sense, philosophically? How would I know?
I get it, you’re just making this up.
Right.
Hmmmm… How does this relate to that object-oriented ontology stuff you
were so interested in a couple of years ago? [35]
Interesting question. Why don’t you think about it and get back to me.
I mean, that metaphysical structure you’re talking about, it seems
almost like a complex multidimensional tissue binding the world
together. It has a whiff of a Latourian actor-network about it.
Hmmm… Set that aside for awhile. I want to go somewhere else.
Still on GPT-3, eh?
You got it.
World, Mind, and Text
Text reflects this learnable, this metaphysical, structure, albeit at some remove:
35
See, for example, William Benzon, Living with Abundance in a Pluralist Cosmos: Some Metaphysical Sketches,
Working Paper, January 2013, 87 pp.,
https://www.academia.edu/4066568/Living_with_Abundance_in_a_Pluralist_Cosmos_Some_Metaphysi
cal_Sketches.
Learning engines are learning the structure inherent in the text. But that learnable structure is not
explicit in the language model created by the learning engine.
There are two things in play: 1) the fact that the text is learnable, and 2) that it is learnable by a
statistical process. How are these two related?
If we already had an explicit ‘old school’ propositional model in computable form, then we
wouldn’t need statistical learning at all. We could just run the propositional model over the corpus
and encode the result. But why do even that? If we can read the corpus with the propositional
model, in a simulation of human reading, then there’s no need to encode it at all. Just read
whatever aspect of the corpus is needed at the time.
So, statistical learning is a substitute for the lack of a usable propositional model. The statistical
model does work, but at the expense of explicitness.
But why does the statistical model work at all? That’s the question.
It’s not enough to say, because the world itself is learnable. That’s true for the propositional
model as well. Both work because the world is learnable.
BUT: Humans don’t learn the world with a statistical model. We learn it through a propositional
engine floating over an analogue or quasi-analogue engine with statistical properties. And it is the
propositional engine that allows us to produce language. A corpus is a product of the action of
a propositional engine, not a statistical model, acting on the world.
Description is one basic such action; narration is another. Analysis and explanation are perhaps
more sophisticated and depend on (logically) prior description and narration. Note that this
process of rendering into language is inherently and necessarily a temporal one. The order in
which signifiers are placed into the speech stream depends in some way, not necessarily obvious, on
the relations among the correlative signifieds in semantic or cognitive space. Distances between
signifiers in the speech stream reflect distances between correlative signifieds in semantic space.
We thus have systematic relationships between positions and distances of signifiers in the speech stream, on the one
hand, and positions and distances of signifieds in semantic space. It is those systematic relationships that allow
statistical analysis of the speech stream to reconstruct semantic space.
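One could, at least in principle, measure how systematic those relationships are. Here is a crude sketch, assuming we already have vectors for the signifieds (say, from something like the co-occurrence example earlier); the function name, the gap cutoff, and the use of a simple correlation are my own choices for illustration.

    import numpy as np

    def stream_vs_space(sentences, vectors, max_gap=10):
        """Collect (distance in the speech stream, distance in semantic space)
        pairs for word pairs in each sentence, and report how correlated they are."""
        gaps, dists = [], []
        for s in sentences:
            words = [w for w in s.split() if w in vectors]
            for i in range(len(words)):
                for j in range(i + 1, min(len(words), i + 1 + max_gap)):
                    gaps.append(j - i)   # how far apart the signifiers sit in the string
                    dists.append(np.linalg.norm(vectors[words[i]] - vectors[words[j]]))
        return float(np.corrcoef(gaps, dists)[0, 1])   # a rough index of the systematic relationship

Whatever regularity such a measurement picks up is, on the argument above, exactly what a statistical learner has to exploit.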
Note that time is not extrinsic to this process. Time is intrinsic and constitutive of computation. Speaking
involves computation, for it is language, as does the statistical analysis of the speech stream.
The propositional engine learns the world via Gärdenfors’ dimensions, and whatever else,
Powers' stack for example. [36]
Those dimensions are implicit in the resulting propositional model
and so become projected onto the speech stream via syntax, pragmatics, and discourse structure.
The language engine is then able to extract (a simulacrum of) those dimensions through statistical
learning. Those dimensions are expressed in the parameter weights of the model. THAT’s what
makes the knowledge so ‘frozen’. One has to cue it with actual speech.
36
William Powers, Behavior: The Control of Perception (Aldine) 1973. A decade later David Hays integrated
Powers’ model into his cognitive network model, David G. Hays, Cognitive Structures, HRAF Press, 1981.
The whole language model thus functions as associative memory. [37]
You present it with an input
cue, and it then associates from that cue and emits tokens which then project back into the model,
cueing other tokens, and so forth.
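In procedural terms the cueing loop looks something like the following schematic sketch. This is not OpenAI's API; model.next_token_distribution is a hypothetical stand-in for whatever the trained network actually computes.

    import random

    def continue_episode(model, prompt_tokens, n_steps=50):
        """Associative-memory style generation: present a cue, emit a token,
        fold the token back into the cue, and repeat."""
        tokens = list(prompt_tokens)
        for _ in range(n_steps):
            # Hypothetical call: map the cue (everything so far) to a
            # probability distribution over possible next tokens.
            probs = model.next_token_distribution(tokens)
            choices, weights = zip(*probs.items())
            nxt = random.choices(choices, weights=weights, k=1)[0]
            tokens.append(nxt)          # the emitted token projects back into the cue
            if nxt == "<end>":
                break
        return tokens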
Waterloo or Rubicon? Beyond blind success
We now have a way of beginning to think about just why language models such as GPT-3 can be
so successful even though, as I argued earlier, they have no direct access to the realm of signifieds,
of meaning. This argument does not change that. Whatever GPT-3 accomplishes, it does so on
the strength of structural relationships between the signifiers which it accessed in the corpus and the
signifieds in the minds of the people who produced the texts in that corpus. Its success is a
triumph of formalism devoid of meaning. Meaning requires access to the world.
And yet, if meaning inheres in relationships, as Lamb has argued, then those relationships exist in
the model even as the model is isolated from the world. But we are creatures of language. We
generate it and it generates us. Is it so very strange that an elaborate network of relationships
among bare naked signifiers should evoke the tantalizing prospect of flesh and blood
interlocutors?
What happens next with GPT-3 and other such models? At the moment they represent the
success of not-so-blind groping extending back to Gerard Salton’s first experiments with vector
semantics. As far as I can tell the creators of such models have little commitment to some theory
of how language and mind work. They may well know that the neurons in their networks
resemble real neurons about as much as a smiley face resembles the Mona Lisa, that their layers
have only a passing resemblance to the structure of the cerebral cortex; they may well have taken
Linguistics 101, and so forth. But thinking in those terms is not central to their work; they do not
arrive at the computer with a well thought-out account of mind-brain-world-and-language. That’s
not what they’re trying to figure out. They’re trying to figure out how to get computers to
produce simulacra of human language and cognitive behavior. They do that very well. As for
their remarkable results, they don’t know how their engines achieve them. They know only that
they do. And they no doubt are full of ideas about how to modify those engines so they do better.
Remarkable as the results have been, I do not see this as a long-term strategy for success. Students
of symbolic systems know a lot about how they work, though many details are in dispute. How
can that knowledge be brought to bear on the construction and operation of (large-scale) statistical models of language? I have suggested a framework in which that can be done, a fairly specific suggestion about Neubig's isometric transform onto meaning space, to be amplified and extended with Gärdenfors' conceptual spaces. It is only a beginning. Such a framework would be useful in
looking under the hood to examine the mechanics of these models so that we can improve them.
When I talk of GPT-3 as a crossing of the Rubicon, that is what I mean. Given a way of thinking
about how such models operate, we are at the threshold of even more remarkable developments.
But if the AI community refuses to develop such a framework then I fear that their work will,
sooner or later, crash and burn, as machine translation did in the mid 1960s, and as symbolic
37
The idea that the brain implements associative memory in a holographic fashion was championed by
Karl Pribram in the 1970s and 1980s. David Hays and I drew on that work in an article on metaphor,
William Benzon and David Hays, Metaphor, Recognition, and Neural Process, The American Journal of
Semiotics , Vol. 5, No. 1 (1987), 59-80,
https://www.academia.edu/238608/Metaphor_Recognition_and_Neural_Process.
computation did in the mid 1980s. They will have met yet another Waterloo, snatching defeat, to change metaphors in mid-stream, from success.
We have no choice but to move forward.
Unless, of course, the investors chicken out.
6. Gestalt switch: GPT-3 as a model of the mind
Here are some key paragraphs from section three; note the underlined sections:
Let us notice, first of all, that language exists as strings of signifiers in the external world.
In the case that interests us, those are strings of written characters that have been
encoded into computer-readable form. Let us assume that the signifieds (which bear a major portion of meaning, no?) exist in some high dimensional network in mental
space. This is, of course, an abstract space rather than the physical space of neurons,
which is necessarily three dimensional. However many dimensions this mental space
has, each signified exists at some point in that space and, as such, we can specify that
point by a vector containing its value along each dimension.
What happens when one writes? Well, one produces a string of signifiers. The distance
between signifiers on this string, and their ordering relative to one another, are a function
of the relative distances and orientations of their associated signifieds in mental space.
That’s where to look for Neubig’s isometric transform into meaning space. What GPT-3,
and other NLP engines, does is to examine the distances and ordering of signifiers in the string and
compute over them so as to reverse engineer the distances and orientations of the associated signifieds in
high-dimensional mental space.
The purpose of this section is simply to underline the seriousness of my assertion to treat the mind
as a high-dimensional space and that, therefore, we should treat the high-dimensional parameter
space of GPT-3 as a model of the mind. If you aren't comfortable with the idea, well, it takes a bit
of time for it to settle down. This section is a way of occupying some of that time.
If it’s not a model of the mind, after all, then what IS it a model of? “The language”, you say?
Where does the language come from, where does it reside? "The mind", that's right.
It is certainly not a complete model of the mind. The mind, for example, is quite fluid, is capable
of autonomous action, has access to the physical world, and is deeply social. GPT-3 seems static
and is only reactive. It cannot initiate action, has no direct access to the external world, and has
little capacity for social interaction. Nonetheless, it is still a rich model.
I built plastic models as a kid, models of rockets, of people, and of sailing ships. None of those
models completely captured the things they modeled. I was quite clear on that. I have a cousin
who builds museum-class ship models from wood of various kinds, metal, cloth, paper, thread and
twine (and perhaps some plastic here and there). They are much more accurate and aesthetically
pleasing than the models I assembled from plastic kits as a kid. But they are still only models.
So it is with GPT-3. It is a model of the mind. We need to get used to thinking of it in those terms,
dangerous as they may be. But, really, can the field get more narcissistic and hubristic than it
already is?
* * * * *
This is not the first time I've been through this drill. I've been thinking about this, that, and the other in the so-called digital humanities since 2014. Call it computational criticism. These particular investigators had been using various kinds of distributional semantics (topic modeling, vector space semantics) to examine literary texts and populations of texts. They don't think about their language models as models of the mind; they're just, well, you know, language models, models of texts. There's some kind of membrane, some kind of barrier, that keeps us (them, me, you) from moving from these statistical models of texts to thinking of them as models of the mind that produced the texts. They're not the real thing, they're stop gaps, approximations.
Yes, they are. And they are also models, as much models of the mind as a plastic schooner is a
model of the America.
* * * * *
I'm suggesting we need to perform a gestalt switch. As long as we think of the statistical object as a poor cousin of what we're really interested in (language, the mind) we see it as a rabbit. But then it starts walking like a duck and quacking like one. Shazaam! It's a duck.
Why am I saying this? Like I said, to underline the seriousness of my assertion to treat the mind as a high-dimensional space. In a common formulation, the mind is what the brain does. The brain is a three-dimensional physical object.
It consists of roughly 86 billion neurons,
each of which has roughly 10,000 connections with other neurons. The action at each of those
synaptic junctures is mediated by upward of 100 neurochemicals. The number of states a system
can take depends on 1) the number of elements it has, 2) the number of states each element can
take, and 3) the dependencies among those elements. How many states can that system assume?
We don't really know. Jillions, maybe zillions, maybe jillions of zillions. A lot.
That is a state space of very high dimensionality. That state space is the mind. GPT-3 is a model
of that. Compared to a jillion zillion possible mental states, 175 billion parameters is peanuts.
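To get a feel for "jillions of zillions", here is a deliberately crude lower bound, my own toy arithmetic, which ignores synapses, neurochemistry, and all the dependencies that matter most:

    import math

    neurons = 86_000_000_000
    # Pretend each neuron is a bare, independent on/off switch.
    log10_states = neurons * math.log10(2)          # about 2.6e10
    print(f"roughly 10^{log10_states:.2e} possible states")
    print(175_000_000_000)                          # 175 billion parameters, for comparison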
7. Engineered intelligence at liberty in the world
It’s time to wrap things up. To do so I will quote some passages from a recent blog post by David
Ferrucci. Full disclosure: I know Ferrucci, though not well. I haven’t seen or talked with him in
decades but we do exchange emails every few years. Back in the early to mid-1980s I was on the
faculty at the Rensselaer Polytechnic Institute in the Department of Language, Literature, and
Communications. Ferrucci was, I believe, getting a master’s degree in Computer Science. We
worked with the late Geoff Goldbogen on a project to evaluate the uses of AI for manufacturing.
Ferrucci ended up working with IBM while also collaborating with Selmer Bringsjord (Cognitive
Science at RPI) on a story generator called BRUTUS. [38]
He then assembled the team at IBM that
created Watson, the computer system that beat humans at Jeopardy in February 2011. He went
on to found Elemental Cognition. [39]
Ferrucci says
In a blog post on July 30, 2020, [40] Ferrucci observes:
A language model is just a string probability guesser. Its superpower is to look at a string
of text (a word, a sentence, a paragraph) and guess how likely it is that a human
would write that string. To make these guesses, language models analyze mounds of text
in search of statistical patterns, such as what words tend to appear near what other
words, or how key terms repeat throughout a paragraph.
Yes. That’s all GPT-3 does and that’s all GPT-3 can do. Ferrucci goes on to observe that this is a
tremendously useful skill. However:
But ultimately, NLP aims higher. We want machines to understand what they read, and
to converse, answer questions, and act based on their understanding. So I have to
wonder: how much closer are we now than we were a decade ago, before neural
network language models swept the field?
As impressive as today’s NLP is, I worry that it’s still on a path that comes with severe
limitations. A system’s “understanding” can only go so far when its world consists
entirely of what writers typically say. The concepts we want machines to learn just
aren’t evident in the data we’re giving them.
Yes, language models can learn that humans often write “bowl” near “kitchen.” But
that’s the grand total of what a language model understands about bowls and kitchens.
Everything else that humans know about these objects (that bowls have raised edges, that bowls often break apart if you drop them, that people go to kitchens when hungry to find food) is taken for granted. All this context is obvious to us thanks to our shared
experiences, so writers don’t bother to lay it all out.
38
Selmer Bringsjord and David Ferrucci, Artificial Intelligence and Literary Creativity: Inside the Mind of Brutus, A
Storytelling Machine, Psychology Press, 1999.
39
https://www.elementalcognition.com/.
40
David Ferrucci, Can super-parrots ever achieve language understanding? Elemental Cognition website,
accessed August 4, 2020, https://www.elementalcognition.com/super-parrots-blog.
Ferrucci was trained in old school symbolic processing, an enterprise in which researchers devoted a couple of decades to developing machine-tractable mental models of various domains in the world (not texts, but the world). The objective was to produce artificial systems that understand language in some meaningful way. Understanding was grounded in those mental models. BRUTUS was built on such models. While Watson employed them as well, it also employed the newer, shallower, statistical models. [41] As far as I can tell from publicly available material, his approach at Elemental Cognition is eclectic as well, though the objective is more general than the question-answering that governed Watson's architecture and so will require a different architecture.
Interlude in a Chinese room
Yet if you would believe John Searle, no matter how rich and detailed those old school mental
models, understanding would necessarily elude them. I am referring, of course, to his (in)famous
Chinese Room argument. [42]
When I first encountered it years ago my reaction was something like:
interesting, but irrelevant. Why irrelevant? Because it said absolutely nothing about the techniques AI
or cognitive science investigators used and so would provide no guidance toward improving that
work. He did, however, have a point: If the machine has no contact with the world, how can it
possibly be said to understand anything at all? All it does is grind away on syntax.
What Searle misses, though, is the way in which meaning is a function of relations among
concepts, as I pointed out earlier (pp. 17 ff.). It seems to me, however (and here I'm just making this up) that we can think of meaning as having both an intentional aspect, the connection of signs to the world, and a relational aspect, the relations of signs among themselves. Searle's argument
concentrated on the former and said nothing about the latter.
What of the intentional aspect when a person is writing or talking about things not immediately
present, which is, after all, quite common? In this case the intentional aspect of meaning is not
supported by the immediate world. Language use thus must necessarily be driven entirely by the
relations signifiers have among themselves, Sydney Lamb’s point which we have already
investigated (p. 17).
In this respect, however, it is not obvious to me that there is any difference among a system such as GPT-3, which is utterly lacking in mental models, old school symbolic systems, which were built on them, and an eclectic system, such as Ferrucci proposes (as do Rodney Brooks and Gary Marcus of Robust AI). What then is the value of having mental models?
Gaining control
Let us recall an earlier formulation (from p. 10) where we noted that GPT-3 was defined over a huge corpus of language strings. Those strings were created by people making their way in the world and thus express both their intentionality, which is directed at the world, and their
relationality, which is inherent in their minds, mental models plus language (syntax, morphology,
etc.). Those strings reflect both mind and world. The text corpora supporting NLP engines (such as
GPT-3) thus intertwine both the intentional aspect of meaning and the relational.
41
David Ferrucci, Eric Brown, Jennifer Chu-Carroll, et al. Building Watson: An Overview of the DeepQA
Project, AI Magazine, Fall 2010, pp. 59-79.
42
I have written a number of blog posts about this argument. Here’s one of them: Another romp around
Searle’s Chinese room, New Savanna, blog post, July 18, 2018, http://new-
savanna.blogspot.com/2018/07/another-romp-around-searles-chinese-room.html. You can find others at
the Searle link, which, however, contains other Searle posts as well, http://new-
savanna.blogspot.com/search/label/Searle.
The function of providing a machine with a mental model is, in effect, to liberate it from that
entanglement. It is that entanglement that limits GPT-3 to guessing, to What’s next?
I know nothing of Ferrucci’s technical approach (more likely, approaches) to integrating symbolic,
or deep semantics (to use the language from the Watson paper) based on mental models, and
shallow semantics, based on statistical models. It is the mere fact of such integration that is
important. For the relational information that is implicit in GPT-3’s language model is opaque to
outsiders, and cannot be manipulated directly. All one can do is submit a prompt and have GPT-
3 continue on, word after word after predicted word.
As Ferrucci says, we need to do better than that.
And we can.
Language mirrors the world
Let us return to Neubig's "isometric transform" onto meaning space (pp. 18 ff.). I began
explicating it in terms of the relationship between strings of signifiers in a corpus and a high
dimensional organization of signifieds in mental space. But doesn’t that relationship, that
transform, ultimately exist between the world and mental space?
For, as I’ve said many times before, those strings arise through the interaction of mental space
and the world: people making their way in the world through writing. The mind mirrors the
world. No more, no less. We have arrived back at the metaphysical structure of the world (pp. 23
ff.).
How could it be otherwise? The human mind is the product of millions of years of evolution, of
animals making their way in the world. Our sensorium is adapted to perceiving the world and our
motor system is adapted to moving about in the world. Perception and action are interrelated by
cognition and thought, the mind.
Natural language AIs cannot compete with the human sensorimotor system no matter how much
text they train on. Too much information is missing from the text. No doubt some of the gap can
be made up by variously hand-crafted augmentations and by having humans constantly interact
in partnership, as Ferrucci and his team are doing at Elemental Cognition. [43]
Then we have robots, which I haven’t discussed at all. Robots do perceive and move about in the
world. And, while we can equip robots with both perceptual and motor powers that we do not
have, those powers operate in very restricted domains. We do not know how to endow robots
with our sensorimotor capabilities. Our robots must necessarily remain strangers in the land.
The natural domain for an AI would be the digital world, that is the world in which an artificial
intelligence is a native. How do we endow an AI with the capacity to learn about and operate in a
purely digital world, and to what end?
With that question my line of thought in this working paper comes to an end. That is something I
intend to take up in a later working paper.
43
Elemental Cognition, “Continuous Human-Machine Collaboration”, accessed August 5, 2020,
https://www.elementalcognition.com/technology. See also his talk at the Allen Institute for AI in 2014,
https://youtu.be/F_0hpnLdNjk.
What’s next?
This working paper is at an end. As I indicated at the very beginning, this effort began with a long
comment I posted at Tyler Cowen’s blog, Marginal Revolution. This working paper has covered
the first two paragraphs in that comment. Here is the rest of that comment:
Think AI as platform, not feature (Andreessen). [44]
Obvious implication, the basic
computer will be an AI-as-platform. Every human will get their own as a very young
child. They'll grow with it; it'll grow with them. The child will care for it as with a pet. Hence we have ethical obligations to them. As the child grows, so does the pet; the pet will likely have to migrate to other physical platforms from time to time.
Machine learning was the key breakthrough. Rodney Brooks' Genghis, with its subsumption architecture, was a key development as well, for it was directed at robots
moving about in the world. FWIW Brooks has teamed up with Gary Marcus and they
think we need to add some old school symbolic computing into the mix. I think they’re
right.
Machines, however, have a hard time learning the natural world as humans do. We're
born primed to deal with that world with millions of years of evolutionary history
behind us. Machines, alas, are a blank slate.
The native environment for computers is, of course, the computational environment.
That's where to apply machine learning. Note that writing code is one of GPT-3's skills.
So, the AGI of the future, let's call it GPT-42, will be looking in two directions, toward
the world of computers and toward the human world. It will be learning in both, but in
different styles and to different ends. In its interaction with other artificial computational
entities GPT-42 is in its native milieu. In its interaction with us, well, we'll necessarily be
in the driver's seat.
Where are we with respect to the hockey stick growth curve? For the last three-quarters of a century, since the end of WWII, we've been moving horizontally, along a plateau,
developing tech. GPT-3 is one signal that we've reached the toe of the next curve. But
to move up the curve, as I've said, we have to rethink the whole shebang.
We're IN the Singularity. Here be dragons.
[Superintelligent computers emerging out of the FOOM is bullshit.]
When I posted the first version of this working paper in August of 2020 I had intended to cover
that material in another series of posts which I would then consolidate into a working paper
tentatively entitled, After GPT-X: The Star Trek computer, and beyond. I started down
that path, wrote a number of posts, planned another working paper or three, but never made it to
the end. Still, that seems like a worthy objective but I’ve just taken another path. So...
To the Star Trek computer, and beyond.
44
Is AI a feature or a platform? [machine learning, artificial neural nets], New Savanna, blog post,
December 13, 2019, https://new-savanna.blogspot.com/2019/12/is-ai-feature-or-platfrom-machine.html.
Appendix: Semanticity, adhesion and relationality
Let’s review a passage where I discuss Searle’s Chinese Room thought-experiment (p. 31):
Yet if you would believe John Searle, no matter how rich and detailed those old school
mental models, understanding would necessarily elude them. I am referring, of course,
to his (in)famous Chinese Room argument. When I first encountered it years ago my
reaction was something like: interesting, but irrelevant. Why irrelevant? Because it said
absolutely nothing about the techniques AI or cognitive science investigators used and
so would provide no guidance toward improving that work. He did, however, have a
point: If the machine has no contact with the world, how can it possibly be said to
understand anything at all? All it does is grind away on syntax.
What Searle misses, though, is the way in which meaning is a function of relations
among concepts, as I pointed out earlier (pp. 17 ff.). It seems to me, however (and here I'm just making this up) that we can think of meaning as having both an intentional aspect, the
connection of signs to the world, and a relational aspect, the relations of signs among
themselves. Searle’s argument concentrated on the former and said nothing about the
latter.
What of the intentional aspect when a person is writing or talking about things not
immediately present, which is, after all, quite common? In this case the intentional
aspect of meaning is not supported by the immediate world. Language use thus must
necessarily be driven entirely by the relations signifiers have among themselves, Sydney
Lamb’s point which we have already investigated (p. 17).
Those statistics are grabbing onto the relational aspect of meaning. The question is: How much of
that can these methods recover from texts? Let’s set that aside for the moment.
Intention, relationality, and adhesion
That passage mentions intention and relation. Intention resides in the relationship between a person
and the world. Relation resides in the relationships that signifiers have among themselves. It is a
property of the cognitive system. I am now thinking that it must be paired with adhesion. Taken
together they constitute semanticity. Thus we have semanticity and intention where semanticity is a
general capacity inherent in the cognitive system, in a person’s mind, and intention inheres in the
relation between a person and the world in a particular perceptual and/or cognitive activity.
What do I mean by adhesion? Adhesion is how words ‘cling’ to the world while relationality is the
differential interaction of words among themselves within the linguistic system. Words whose meaning is defined directly over the physical world, and also, to some extent, over the interpersonal world of signals and feelings, adhere to the world through sensorimotor schemas. Words whose meaning is abstract are more problematic. Their adhesion operates through patterns of
words and other signs and symbols (e.g. mathematics, data visualizations, illustrative diagrams of
various kinds, and so forth). Teasing out these systems of adhesion has just barely begun.
The psychologist J.J. Gibson talked of the affordances an environment presents to the organism.
Affordances are the features of the world which an organism can readily pick up during its life in
the world. Adhesions are the organism’s complement to environmental affordances; they are the
perceptual devices through which the organism relates to the affordances.
What this means for language models
Large language models built through deep neural networks, such as GPT-3, conflate the
interaction of three phenomena: 1) the word-level relational aspect of semanticity as captured in
the locations of word forms (signifiers) in a string, 2) the conventions of discourse structure, and 3)
the world itself. The world is present in the model because the texts over which the model was
constructed were created by people interacting in the world. They were in an intentional
relationship with the world when they wrote those texts. The conventions of discourse are present
simply because they organize the placement of word forms in a text, with special emphasis on the
long-distance relationships of words. As for relationality, that's all that can possibly be present in a
text. Adhesions belong to the realm of signifieds, of concepts and ideas, and they aren’t in the text
itself.
Would it somehow be possible to factor a language model into these three aspects? I have no idea.
The point of doing so would be to reduce the overall size of the model.
Putting that aside, let us ask: Given a sufficiently large database of texts and tokens and a high
enough number of parameters for our model, is it possible for a language model to extract all the
relationality from the texts? How much of that multidimensional relational semanticity can be
recovered from strings of word forms? Given a deep enough understanding of how relational
semantics is reflected in the structure of texts, can we calculate what is possible with various text
bases and model parameterization?
To answer those questions we need to have some account of semantic relationality which we can
examine. The models of Old School symbolic AI and computational linguistics provide such
accounts. Many such models have been created. Which ones would we choose as the basis for our
analysis? The sort of question that interests me is how many word forms have their meanings
given in adhesions to the physical world (that is, physical objects and events), to the interpersonal
world (facial expressions, gestures, etc.) and how many word forms are defined abstractly?
So many questions.