Cognition, 25 (1987) 71-102
Functional parallelism in spoken word-recognition
WILLIAM D. MARSLEN-WILSON*
Max-Planck-Institut für Psycholinguistik,
Nijmegen, and MRC Applied Psychology
Unit, Cambridge
Abstract
The process of spoken word-recognition breaks down into three basic func-
tions, of access, selection and integration. Access concerns the mapping of the
speech input onto the representations of lexical form, selection concerns the
discrimination of the best-fitting match to this input, and integration covers the
mapping of syntactic and semantic information at the lexical level onto higher
levels of processing. This paper describes two versions of a “cohort”-based
model of these processes, showing how it evolves from a partially interactive
model, where access is strictly autonomous but selection is subject to top-down
control, to a fully bottom-up model, where context plays no role in the process-
es of form-based access and selection. Context operates instead at the interface
between higher-level representations and information generated on-line about
the syntactic and semantic properties of members of the cohort. The new model
retains intact the fundamental characteristics of a cohort-based word-recogni-
tion process. It embodies the concepts of multiple access and multiple assess-
ment, allowing a maximally efficient recognition process, based on the princi-
ple of the contingency of perceptual choice.
1. Introduction
To understand spoken language is to relate sound to meaning. At the core
of this process is the recognition of spoken words, since it is the knowledge
representations in the mental lexicon that provide the actual bridge between
sounds and meanings, linking the phonological properties of specific word-
*I thank Uli Frauenfelder and Lorraine Tyler for their forbearance as editors, and for their comments on
the manuscript. I also thank Tom Bever and two anonymous reviewers for their stimulating criticism of
previous drafts. The first version of this paper was written with the support of the Department of Experimental
Psychology, University of Cambridge, which I gratefully acknowledge. Reprint requests should be sent to
William Marslen-Wilson, MPI für Psycholinguistik, Wundtlaan 1, 6525 XD Nijmegen, The Netherlands
0010-0277/87/$10.10 © 1987, Elsevier Science Publishers B.V.
forms to their syntactic and semantic attributes. This duality of lexical repre-
sentation enables the word-recognition process to mediate between two radi-
cally distinct computational domains-the acoustic-phonetic analysis of the
incoming speech signal, and the syntactic and semantic interpretation of the
message being communicated. In this paper, I am concerned with the conse-
quences of this duality of representation and of function for the organisation
of the word-recognition process as an information-processing system.
The overall process of spoken word-recognition breaks down into three
fundamental functions. These I will refer to as the access, the selection, and
the integration functions. The first of these, the access function, concerns the
relationship of the recognition process to the sensory input. The system must
provide the basis for a mapping of the speech signal onto the representations
of word-forms in the mental lexicon. Assuming some sort of acoustic-phone-
tic analysis of the speech input, it is a representation of the input in these
terms that is projected onto the mental lexicon.
The integration function, conversely, concerns the relationship of the rec-
ognition process to the higher-level representation of the utterance. In order
to complete the recognition process, the system must provide the basis for
the integration, into this higher level of representation, of the syntactic and
semantic information associated with the word that is being recognised.
Finally, and mediating between access and integration, there is the selec-
tion function. In addition to accessing word-forms from the sensory input,
the system must also discriminate between them, selecting the word-form
that best matches the available input.
These three functional requirements have to be realised in some way in
any model of spoken word-recognition. They need to be translated into claims
about the kinds of processes that subserve these functions, and about the
processing relations between them during the recognition of a word. I will
begin the discussion here by considering the way that the access and selection
functions are realised, and their relationship to the integration function. How
far do access, selection, and integration correspond to separate processing
stages in the recognition of a spoken word, and to what extent do they
operate in computational isolation from one another?
I will develop the argument here in its approximate historical sequence.
In Section 2 I will argue that, while the accessing of the mental lexicon is a
strictly autonomous, bottom-up process, there seems to be a close computa-
tional dependency between the process of selecting the word-form that best
matches the sensory input and the process of integrating the syntactic and
semantic properties of word-forms with their utterance context. The charac-
teristics of the real-time transfer function of the system suggest that the selec-
tion phase of the recognition process cannot depend on bottom-up informa-
tion alone, and that contextual constraints also affect its outcome. This, as I
will show in Section 3, led to the first version of the cohort model: a parallel,
interactive model of spoken word-recognition. In Section 4 I will examine the
properties and predictions of this early model. In Section 5 I will show how
this model now needs to be modified. In particular, I will argue that it needs
to incorporate the concept of activation, and I will re-examine the role of
top-down interaction in the on-line recognition process, suggesting a model
where different information sources are integrated together to give the per-
ceptual output of the system, but where they do not, in the conventional
sense, interact. In particular, I argue for the autonomy of form-based selec-
tion, as well as for the autonomy of form-based access.
2. The earliness of spoken word-recognition
The crucial constraint on the functional properties of access and selection is
the earliness of correct selection. This I define as the reliable identification
of spoken words, in utterance contexts, before sufficient acoustic-phonetic
information has become available to allow correct identification on that basis
alone. If this can be demonstrated, then it places strong restrictions not only
on how the selection process is organised, but also on the ways in which
representations are initially accessed from the bottom-up.
To prove early selection, two things must be established. The first is how
long it takes to recognise a given word. This reflects the timing with which
the selection function is completed. The second is whether the acoustic-pho-
netic information available at this estimated selection-point is or is not suffi-
cient, by itself, to support correct identification.
The major techniques for establishing the timing of on-line word-recogni-
tion-thereby answering the first of these two questions-involve fast reac-
tion-time tasks. Typical examples are the shadowing and the identical
monitoring tasks, where the listener responds directly to the words he hears-
either by repeating them aloud, or by making a detection response to a
word-target. The mean reaction-times in such tasks, measured from word-
onset, can be used as a direct estimate of selection-time, subject to a correc-
tion factor to allow for the time it takes to execute the response.¹ Typical
values obtained in these tasks (for one- and two-syllable content words heard
¹The use of a correction factor compensates for the fact that a monitoring reaction-time of, for example,
250 ms, does not mean that the word was not identified until 250 ms of it had been heard. There is undoubtedly
some lag between the internal decision process and the external evidence that this decision has been made.
The correction factor reflects this.
in normal utterance contexts) are of the order of 250-275 ms, which, with a
correction factor of 50-75 ms, gives a mean selection-time of around 200 ms
(e.g., Marslen-Wilson, 1973, 1985; Marslen-Wilson & Tyler, 1975, 1980).
Similar values can be obtained, more indirectly, from reaction-time tasks
where the listeners are asked to respond, not to the word itself, but to some
property of the word whose accessibility for response depends on first iden-
tifying the word in question. Examples of this are the rhyme-monitoring
results reported by Marslen-Wilson & Tyler (1975, 1980) and others (e.g.,
Seidenberg & Tanenhaus, 1979), and at least some research involving the
phoneme-monitoring task (e.g., Marslen-Wilson, 1984; Morton & Long,
1976). By subtracting an additional constant from the response-times in these
tasks, to take into account the extra phonological matching processes they
involve, one again arrives at selection-times for words in context of the order
of 200 ms from word-onset.
But these estimates are only half of the equation. It is also necessary to
establish whether or not the acoustic-phonetic information available at these
selection-points is sufficient for correct selection. For the research described
above, this could only be done indirectly, by estimating the average number
of phonemes that could be identified within 200 ms of word-onset, and then
using that estimate to determine how many words would normally still be
consistent with the input. If, as the available measurements suggest, 200 ms
would only be enough to specify an initial two phonemes, then there would
on average be more than 40 words still compatible with the available input
(this estimate is based on the analysis of a 20,000-word phonetic dictionary
of American English (Marslen-Wilson, 1984)). The limitation of this indirect
inference to early selection is that it cannot take into account possible coar-
ticulatory and prosodic effects. This could lead to an underestimate of the
amount of sensory information actually available to the listener after 200 ms.
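To make this kind of estimate concrete, the sketch below computes a word-initial cohort from a toy pronunciation lexicon. Everything in it (the lexicon format, the phoneme symbols, the five entries) is a simplified assumption for illustration, not the 20,000-word dictionary cited above.

```python
# A minimal sketch, assuming a lexicon that maps words to phoneme tuples.
# Entries and phoneme symbols are illustrative only.

LEXICON = {
    "trespass": ("t", "r", "e", "s", "p", "a", "s"),
    "tress":    ("t", "r", "e", "s"),
    "trestle":  ("t", "r", "e", "s", "l"),
    "trend":    ("t", "r", "e", "n", "d"),
    "take":     ("t", "ei", "k"),
}

def word_initial_cohort(prefix, lexicon=LEXICON):
    """All words whose transcription begins with the given phoneme prefix."""
    n = len(prefix)
    return sorted(w for w, phones in lexicon.items() if phones[:n] == tuple(prefix))

# If 200 ms of speech specifies roughly an initial two phonemes, the candidate
# set is everything sharing those two phonemes word-initially:
print(word_initial_cohort(("t", "r")))
# -> ['trend', 'trespass', 'tress', 'trestle']  (40+ words in a real dictionary)
```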
The second main technique allows a more direct measure of the sufficiency
of the acoustic-phonetic input available at the estimated selection-point. This
is the gating task, as developed by Grosjean (1980), and exploited by Tyler
and others (e.g., Salasoo & Pisoni, 1985; Tyler & Wessels, 1983). Listeners
are presented with successively longer fragments of a word, at increments
ranging (in different experiments) from 20 to 50 ms, and at each increment
they are asked to say what they think the word is, or is going to become.
This tells us exactly how much acoustic-phonetic input the listener needs to
hear to be able to reliably identify a word under various conditions. In the
original study by Grosjean (1980), we find that subjects needed to hear an
average of 199 ms of a word when it occurred in sentential context, as op-
posed to 333 ms for the same acoustic token presented in isolation.
Because of the unusual way the auditory input is presented in the gating
task, there has been some criticism of its validity as a reflection of normal
word-recognition processes. Since the listener hears the same fragments re-
peated many times in sequence, this might encourage abnormal response
strategies. This objection is met by Cotton and Grosjean (1984) and Salasoo
and Pisoni (1985), whose subjects heard only one fragment for any given
word, and where the pattern of responses matched very closely the results
for the same words when presented as complete sequences to each subject.
It is also possible that responses are distorted by the effectively unlimited
time-in comparison to normal listening-that listeners have available to
think about what the word could be at each presentation. This objection is
met by Tyler and Wessels (1985), in an experiment where subjects also heard
only one fragment from each word, and where they responded by naming the
word as quickly as possible. Mean naming latencies were 478 ms from frag-
ment offset, and the response patterns again closely corresponded to those
obtained without time-pressure.
In a recent study (Brown, Marslen-Wilson, & Tyler, unpublished) we have
combined reaction-time measures for words heard normally with gating tests
for the same words. This provides the most direct evidence presently available
for early selection. In the first half of the experiment, subjects monitored
pairs of sentences for word targets, with a mean reaction-time for words in
normal contexts of 241 ms. This gives an estimated selection-time of 200 ms
or less. In the second part of the experiment, the target-words were edited
out of the stimulus tapes and presented, as isolated words, to a different set
of subjects in a standard gating task. The mean identification-time estimated
here was 301 ms, indicating that the words were being responded to in the
monitoring task some 100 ms before sufficient acoustic-phonetic information
could have accumulated to allow recognition on that basis alone.²
Given, then, that we have accurate and reliable estimates of the two vari-
ables in our equation, simple arithmetic tells us that content words, heard in
utterance contexts, can usually be selected-and, indeed, recognised-earlier
than would be possible if just the acoustic-phonetic input was being taken
into account. Naturally, as Grosjean and Gee (1987, this issue) point out,
some words-especially function words and short, infrequent content
words-will often not be recognised early. In fact, under certain conditions
of temporary ambiguity, as Grosjean (1985) has documented, “late” selection
will occur, where the word is not only not recognised early, but may not even
be identified until the word following it has been heard. These observations
nonetheless do not change the significance of the fact that a large proportion
²There is still the problem here of factoring out the purely acoustic-phonetic effects of removing words
from their contexts. We are investigating this in current research.
of words are selected early. A theory of lexical access has to be able to
explain this, just as it has to deal with late selection as well. Late selection,
however, places far weaker constraints on the properties of the recognition
process than does early selection.³
A different type of objection is methodological in character. It is argued
that none of the tasks used to establish early selection are measuring “real”
word-recognition. Instead, by forcing subjects to respond unnaturally early,
they elicit some form of sophisticated guessing behaviour. Forster (1981), for
example, argues that when a subject responds before the end of the word,
as in the shadowing task, he must in some way be guessing what the word
will be, on the basis of fragmentary bottom-up cues plus knowledge of con-
text.
Such objections, however, have little force. First, because the claim that
subjects are responding “unnaturally early” does not have any independent
empirical basis. There is no counter evidence, from “more natural” tasks,
showing that under these conditions different estimates of recognition-time
are obtained-nor is the notion “more natural task” itself easy to defend
except in terms of subjective preference. Secondly, to distinguish under these
conditions between “perception of the target word and guessing” (Forster,
1981, p. 490; emphases in original) is to assume, as a theoretical a priori, a
particular answer to the fundamental questions at issue.
Forster apparently wants to rule out, as an instance of normal perception,
cases where the listener responds before all of the sensory information poten-
tially relevant to that response has become available. But this presupposes a
theory of perception where there is a very straightforward dependency be-
tween the sensory input and the corresponding percept. The claims that I am
trying to develop here allow for the possibility of a less direct causal relation-
ship between the sensory input and the percept (see Marcel, 1983, for a
discussion of some related issues). These claims may or may not prove to be
correct. But one cannot settle the issue in advance by excluding evidence on
the grounds that it conflicts with the theoretical assumptions whose validity
one is trying to establish. If one is advancing the view that normal perception
is just the outcome of the integration of partial bottom-up cues with contex-
tual constraints, then it is not an argument against this view simply to assert
that perception under these conditions is not perception.
³It should also be clear, contrary to Grosjean and others, that the phenomenon of “late selection” does
not constitute a problem for theories, like the cohort model, which emphasise the real-time nature of the
word-recognition process. Activation-based versions of the cohort model, as discussed in Section 5, and as
modelled, for example, in the McClelland and Elman (1986) TRACE model, function equally well indepen-
dent of whether the critical sensory information arrives before or after the word boundary (as classically
defined).
3. Implications of early selection
Early selection means that the acoustic-phonetic and the contextual con-
straints on the identity of a word can be integrated together at a point in time
when each source of constraint is inadequate, by itself, to uniquely specify
the correct candidate. The sensory input can do no more than specify a class
of potential candidates, consisting of those entries in the mental lexicon that
are compatible with the available input. Similarly, the current utterance and
discourse context provides a set of acceptability criteria that also can do no
more than delimit a class of potentially appropriate candidates. It is only by
intersecting these two sets of constraints that the identity of the correct can-
didate can be derived at the observed selection-point. It is this that forces a
parallel model of access and selection, and that poses intractable difficulties
for any model which depends on an autonomous bottom-up selection process
to reliably identify the single correct candidate for submission to subsequent
processing stages (e.g., Forster, 1976, 1979, 1981).
To see this, consider the major functional requirements that early selection
places upon the spoken word-recognition system. These are the requirements
of multiple access, of multiple assessment, and of real-time efficiency. They
reflect the properties the recognition system needs to have if it is to integrate
sensory and contextual constraints to yield mean selection-times of the order
of 200 ms.
Multiple access is the accessing of multiple candidates in the original map-
ping of the acoustic-phonetic input onto lexical representations. The sensory
input defines a class of potential word-candidates, and, in principle, all of
these need to be made available, via a multiple access process, to the selection
phase of spoken word-recognition. The second requirement is the require-
ment for multiple assessment. If contextual constraints are to affect the selec-
tion phase at a point in time when many candidates are compatible with the
sensory input, then the system must provide a mechanism whereby each of
these candidates can be assessed for their syntactic and semantic appropriate-
ness relative to the current context.
The final, and critical, requirement is for real-time efficiency. The system
must be organised to allow these access and assessment activities to take
place in real time, such that the correct candidate can be identified-and
begin to be integrated into an utterance-level representation-within about
200 ms of word-onset.
These requirements, taken together, cannot be met by a serial process
moving through the decision space one item at a time (cf. Fahlman, 1979).
They point, instead, to some form of parallel or distributed recognition model
(e.g., Hinton & Anderson, 1981). But they do not, however, uniquely deter-
mine the form of such a model. In particular, they do not unambiguously
dictate the manner in which the word-recognition process is divided up into
distinct processing stages. But they do place strong constraints on the func-
tional properties of the recognition model. The strategy that I have followed,
therefore, is to propose a model which rather literally embodies these con-
straints, and then to use this model as a heuristic starting-point for a detailed
investigation of the properties of on-line speech processing. Accordingly, I
will begin here by describing the first version of this model and the predictions
it makes. In a later section, I will discuss the ways the model now needs to
be expanded and modified.
The model in question, labelled an “active direct access model” in Marslen-
Wilson and Welsh (1978), but now usually referred to as the “cohort model”,
evolved out of an analysis of Morton’s logogen model (as stated in Morton,
1969) and of the Forster “bin” model (Forster, 1976). As originally stated, it
meets the requirements of multiple access and multiple assessment by assum-
ing a distributed, parallel processing system. In this system, each individual
entry in the mental lexicon is assumed to correspond to a separate computa-
tionally active recognition unit. This unit represents a functional coordination
of the acoustic-phonetic and of the syntactic and semantic specifications as-
sociated with a given lexical entry.
Given such an array of recognition elements, this leads to the characteristic
“cohort” view of the recognition process, with its specific claims about the
way this process develops over time. A lexical unit is assumed to become
active when the sensory input matches the acoustic-phonetic pattern specified
for that unit. The model prohibits top-down activation of these units in nor-
mal word-recognition, so that only the sensory input can activate a unit.
There is no contextually driven pre-selection of candidates, so that words
cannot become active as potential percepts without some bottom-up (sensory)
input to the structures representing these words.
Early in the word, when only the first 100-150 ms have been heard, then
the recognition devices corresponding to all of the words in the listener’s
mental lexicon that begin with this initial sequence will become active-
thereby meeting the requirement for multiple access.⁴ This subset of active
elements, constituting the word-initial cohort, monitors both the continuing
sensory input, and the compatibility of the words that the elements represent
⁴The notion of “activity” will be examined more closely in Section 5. What it means here is that each
lexical recognition unit, as a computationally independent pattern-matching device, can respond to the presence
of a match with the signal. All words that could match the input are matched by it, and this changes the state
of the relevant pattern-matching devices, thereby differentiating them from the other devices in the system,
which do not match the current input.
with the available structural and interpretative context-which meets the re-
quirement for multiple assessment. A mismatch with either source of con-
straint causes the elements to drop out of the pool of potential candidates.
This means that there will be a sequential reduction over time in the initial
set of candidates, until only one candidate is left. At this point, the correct
word-candidate can be recognised, and the correct word-sense, with its struc-
tural consequences, is incorporated into the message-level representation of
the utterance. This is a system that allows for optimal real-time efficiency,
since each word will be recognised as soon as the accumulating acoustic-
phonetic information permits, given the available contextual constraints.⁵
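A minimal procedural sketch of this access-selection cycle follows. It assumes the toy LEXICON from the sketch in Section 2, and it caricatures context as a simple yes/no acceptability test per candidate; both are drastic simplifications adopted only to make the control flow concrete.

```python
# Sketch of the first cohort model's selection cycle, using the illustrative
# LEXICON defined earlier. Context is reduced to a boolean test per word.

def cohort_select(phonemes, lexicon, fits_context=lambda w: True):
    """Reduce the word-initial cohort phoneme by phoneme.

    A candidate drops out on a mismatch with either the sensory input or
    the context; the word is selected when exactly one candidate remains.
    Returns (word, phonemes_needed), or (None, len(phonemes)) if no unique
    candidate emerges within the input.
    """
    cohort = set(lexicon)
    for i, ph in enumerate(phonemes):
        cohort = {w for w in cohort
                  if len(lexicon[w]) > i and lexicon[w][i] == ph
                  and fits_context(w)}
        if len(cohort) == 1:
            return cohort.pop(), i + 1
    return None, len(phonemes)
```

Note that selection in this sketch is contingent on the state of the whole ensemble: a word is selected not when its own evidence is complete, but when its competitors have been eliminated.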
In terms of the issues raised earlier in this paper, the model treats the
initial access phase as a functionally separable aspect of the recognition pro-
cess. It does not do this by postulating an independent processing component
which performs the access function-in the style, for example, of the
peripheral access files proposed by Forster and others (e.g., Forster, 1976;
Norris, 1981). It assumes, instead, that the processing mechanisms underlying
word-recognition can only be engaged by a bottom-up input. It is the speech
signal, and only the speech signal, that can activate perceptual structures in
the recognition lexicon. This has the effect of making access functionally
autonomous, without having to make claims about additional levels and pro-
cesses.
Once the word-initial cohort has been accessed, and the model has entered
into the selection phase, then top-down factors begin to affect its behaviour.
It is this that allows the model to account for early selection. When a word
is heard as part of a normal utterance, then both sensory and contextual
constraints contribute jointly to a process of mapping word senses onto
⁵The sequential cohort recognition process is sometimes treated as if it were equivalent to following a path
down a “pronunciation tree”. This is a branching structure, starting from a single phoneme (e.g., /t/), and
branching at each subsequent phonetic choice point. By following the path to its terminal node one arrives at
the correct word: trespass, tress, trend, or whatever. This captures in a limited sense the sequential decision
process represented in the cohort model. Where it fails, however, is to capture the treatment of context in
the cohort model. In a pronunciation tree, it is only when one reaches the terminal node that one can know
what word one is hearing. It is only at this point, therefore, that the syntactic and semantic information
associated with this word can be accessed, and made available for interaction with context. But the cohort
model, and the evidence on which it is based, require context to be able to operate much earlier in the word,
to help select the correct word even before the sensory input could have uniquely identified it. The pronunci-
ation tree is neither an adequate model of human word-recognition nor an accurate depiction of the cohort
model.
⁶It is not an argument against this claim to point out that one can often predict what someone is going to
say before they say it. There is no doubt that this is true. But to be able to predict what someone will say is
(i) not the same as having the percept that they have actually said it, nor (ii) is it evidence that this knowledge
can penetrate, top-down, into the mental lexicon, and change the state of the basic recognition devices; and
it is this that is at issue here.
higher-level representations. The way this is realised in the model is by allow-
ing the semantic and syntactic appropriateness of word-candidates to directly
affect their status in the current cohort, which causes the selection process to
converge on a single candidate earlier than it would if only acoustic-phonetic
constraints were being taken into consideration.
Even in this rough and ready form-that is, as stated in Marslen-Wilson
and Welsh (1978) and Marslen-Wilson and Tyler (1980)-the model serves
its heuristic purpose. It makes a number of strong predictions, which not only
differentiate it from other models, but also, more importantly, raise novel
and testable questions about the temporal microstructure of spoken word-rec-
ognition. In the next section of this paper I will summarise the research by
myself and others into three of these major predictions: The model’s claims
about the concept of “recognition-point”, about optimal real-time analysis,
and about the early activation of multiple semantic codes.
4. Some predictions of the cohort model
4.1. The concept of recognition-point
The unique feature of the cohort model is its ability to make predictions
about the precise timing of the selection and integration process for any
individual word in the language. Other models have had essentially nothing
to say about the recognition process at this level of specificity. The cohort
model, in contrast, provides a theoretical basis for predicting the recognition-
point for any given word. This is the point at which, starting from word-onset,
a word can be discriminated from the other members of its word-initial
cohort, taking into account both contextual and sensory constraints. For
many words, especially monosyllables, this point may only be reached when
all of the word has been heard. But for longer words, and for words of any
length heard in constraining contexts, the recognition-point can occur well
before the end of the word.⁷
⁷In a recent paper, Luce (1986) argues against the notion of recognition-point on the grounds that most
common words are monosyllables and that most monosyllables (as he establishes by searching a lexical data-
base) do not become unique until the end of the word or after. There are a number of problems with his
argument.
The first is that he does not take into account the role of prosodic structure and of various types of
anticipatory coarticulation in the recognition process. These will not only position the recognition-point earlier
than a purely phonemic analysis would indicate, but will also reduce the potential problem created by short
words that are also the first syllables of longer words. The second is that the claims of the cohort model derive,
in the first instance, from observations of word-recognition in context, where even monosyllables are normally
recognised before all of them have been heard (see Section 2 above). Thirdly, the important claim of the cohort
Take, for example, the word “trespass”. If this word is heard in isolation,
then its recognition-point-the point at which it can be securely identified-is
at the /p/, since it is here that it separates from words like “tress” and “tres-
tle”. The recognition-point for the same word in context might be at the first
/s/, however, if these competitors were syntactically or semantically excluded.
Similar predictions can be derived for any word in any context, given a specifi-
cation of the word-initial cohort for that word, and of the constraints deriv-
able from the context in which it is uttered.
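Using the toy LEXICON and the cohort_select sketch given earlier (both illustrative assumptions, not the materials of these studies), the two recognition-points just described fall out directly:

```python
# Isolation: "trespass" becomes unique only at the /p/.
print(cohort_select(("t", "r", "e", "s", "p"), LEXICON))
# -> ('trespass', 5)

# A context that excludes "tress" and "trestle" (standing in for syntactic
# or semantic constraints) moves the recognition-point up to the first /s/.
print(cohort_select(("t", "r", "e", "s"), LEXICON,
                    fits_context=lambda w: w not in {"tress", "trestle"}))
# -> ('trespass', 4)
```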
The crucial hypothesis underlying the notion of recognition-point is a claim
about the contingency of the recognition process. The identification of a word
does not depend simply on the information that a given word is present. It
also depends on the information that other words are not present. The word
“trespass”, heard in isolation, is only identifiable at the /p/ if the decision
process can take into account, in real-time, the status of other potential
word-candidates. The calculation of recognition-points directly reflects this.
If these predicted recognition-points are experimentally validated, then this
rules out all models of spoken word-recognition that do not allow for these
dependencies.
4.1.1. Evidence for recognition-points
Paralleling the various types of evidence for early selection summarised in
Section 2, the evidence for the psychological validity of recognition-points
derives from a mixture of reaction-time and gating tasks. In a first experiment
(Marslen-Wilson, 1978; Marslen-Wilson, 1984) response-latencies in a
phoneme-monitoring task were found to be closely correlated with re-
cognition-points, both as calculated a priori on the basis of cohort analysis,
and as operationally defined in a separate gating task.
In phoneme-monitoring, the subject is asked to monitor spoken materials
for a phoneme target defined in advance. There are two major strategies
listeners can use to do this (cf. Cutler & Norris, 1979). I exploited here the
lexical strategy, where the listener determines that a given phoneme is present
by reference to his stored phonological knowledge of the word involved.
When this strategy is used, response-latency is related to the timing of word
identification, since the phonological representation of the word in memory
cannot be consulted until it is known which word is being heard. If cohort
theory correctly specifies the timing of word-identification, then there should
model is, in any case, not whether the recognition-point falls early or late relative to the word-boundary, but
rather that the word is uniquely discriminated as soon as the available constraints (sensory, contextual) make
it possible for the system to do so. Wherever the recognition-point falls, that is where the listener should
identify the word in question. And for content words heard in utterance context, this will be, more often than
not, before all of the word has been heard.
be a close dependency between the monitoring response and the distance
between the phoneme-target and the recognition-point for that word. In par-
ticular, response-latency should decrease the later the target occurs relative
to the recognition-point, since there will be less of a delay before the subject
can identify the word and access its phonological representation.
I evaluated this question for a set of 60 three-syllable words, which con-
tained phoneme targets varying in position from the end of the first syllable
until the end of the third syllable. I had already confirmed that a lexical
strategy was being used for these stimuli, since overall response-latencies
dropped sharply over serial-positions, compared to a control set of nonsense
words where there was no change in latency as a function of position (for
further details, see Marslen-Wilson, 1984). The cohort structure of the mate-
rials was analysed to determine the recognition-point for each word, and the
distances measured between the recognition-points and the monitoring
targets. These recognition-points could occur as much as two or three
hundred ms before or after the target-phoneme.
A linear regression analysis showed that there was a close relationship
between these distances and the monitoring response (r = +.89).⁸ The vari-
ations in distance accounted for over 80% of the variance in the mean laten-
cies for the 60 individual words containing targets. This strong correlation
with phoneme-monitoring latency shows that recognition-points derived from
cohort analysis have a real status in the immediate, on-line processing of the
word. The subjects in this experiment were using a lexical strategy, so that
their response-latencies reflected the timing of word-recognition processes,
and the cohort model correctly specified the timing of these processes for the
words involved.
These results were checked in a follow-up study, which used the gating
task to operationally define the recognition-points for the same set of mate-
rials. Gating offers a variety of methods for calculating recognition-points,
depending on whether or not confidence ratings are taken into account. The
most satisfactory results are obtained when confidence ratings are included,
since this reduces the distorting effects of various response biases. Gating
recognition-points were therefore defined as the point in a word at which
85% of the subjects had correctly identified the word, and where these sub-
jects were at least 85% confident.⁹ These operationally derived recognition-
⁸The correlation is positive because the earlier the recognition-point occurs, relative to the position of the
target phoneme (which is also the point from which response-time is measured), the longer the subjects have
to wait until they can identify the word, access its phonological representation, and then make their response.
⁹The exact percentage chosen as criterial is not critical. Setting the level at 80 or 90%, for example, gives
equivalent results.
points correlated very highly both with the previous set of recognition-points
(calculated on an a priori basis) and with the phoneme-monitoring response
latencies (r = +.92).
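One way to operationalise the 85%/85% criterion is sketched below. The response format (gate duration mapped to per-subject response/confidence pairs) is a hypothetical simplification of the scoring procedures actually used in these studies.

```python
# Sketch of the gating recognition-point criterion of Section 4.1.1: the
# first gate at which at least 85% of subjects give the correct word and
# those subjects are at least 85% confident. Data format is hypothetical.

def gating_recognition_point(gates, target, prop=0.85, confidence=85):
    """`gates` maps gate duration (ms) to a list of (response, confidence)."""
    for ms in sorted(gates):
        responses = gates[ms]
        hits = sum(1 for word, conf in responses
                   if word == target and conf >= confidence)
        if hits >= prop * len(responses):
            return ms
    return None  # criterion never reached within the word
```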
The comparison between gating recognition-points and a priori recogni-
tion-points is further evidence that the cohort model does provide a basis for
correctly determining when a word can be recognised. The point at which a
word becomes uniquely identifiable, as established through an analysis of
that word’s initial cohort, corresponds very well to the point at which listeners
will confidently identify a word in the gating task. This has been confirmed
for a new set of materials, and, in particular, extended to words heard in
utterance contexts, in a recent study by Tyler and Wessels (1983). The gating
recognition-points calculated in this study are indeed the points at which a
single candidate is left, and this point is not only quite independent of the
total length of the word, but also varies in the manner predicted by the theory
as a function of the availability of contextual constraints.
4.1.2. Implications of on-line recognition-points
The evidence for the psychological reality of the recognition-points
specified by cohort analysis poses severe problems for certain classes of word-
recognition model. The recognition-points were calculated on the basis not
only of the positive information accumulating over time that a given word
was present, but also, and equally important, the information that certain
other words were not present. There is nothing, for example, about trespass
by itself that predicts a recognition-point at the /p/-or indeed, anywhere else
in the word. It is only in terms of the relationship of trespass to its initial
cohort that the recognition-point can be computed. This contingency of the
recognition response on the state of the ensemble of alternatives is in conflict
with the basic decision mechanisms employed both by logogen-based theories
and by serial search theories.
The results exclude, first, those recognition-models that depend on a self-
terminating serial search, in the manner of Forster’s models of access and
selection (Forster, 1976, 1979, 1981). In this type of model, word-forms are
stored in peripheral access files. These access files are organised into “bins”,
with the words within any one bin arranged in sequential order according to
frequency. Once a bin has been accessed, there follows a serial search through
the contents of the bin, terminating as soon as a word-form is encountered
which matches the search parameters. The search must be self-terminating,
since it is this that gives the model its ability to deal with frequency effects-
frequent words are recognised more quickly because they are encountered
earlier in the search process. Such a procedure could only take into account
the status of competing word-candidates if they were higher in frequency
than the actual best match. This would not predict the correct recognition-
points.
It is in general a problem for sequential search models if the outcome of
the recognition process needs to reflect the state of the entire ensemble of
possibilities, since this makes the process extremely sensitive to the size of
this ensemble. In fact, evidence I will cite later shows that the timing of
word-recognition processes is not affected by the number of alternatives that
need to be considered. Parallel access and selection processes are far better
suited to the task of providing information about the status of several word-
candidates simultaneously. But this by no means guarantees the suitability of
all parallel models.
One type of parallel model that is excluded by the present results (as well
as by the data reported in the next section) are the logogen-based models.
These models depend on the accumulation of positive evidence within a single
recognition device as the basis for recognition. Each device has a decision
threshold, and the word that is recognised is the one whose corresponding
recognition device (or logogen) crosses the threshold first, without reference
to the state of any other recognition devices. The model has no mechanism for
allowing the behaviour of one unit to take into account the behaviour of
other units in the ensemble. This means that it has no basis for computing
the recognition-point for a given word as a function of the timing with which
that word emerges from the ensemble of its competitors, and, therefore,
cannot explain the effectiveness of cohort-based recognition-points in ac-
counting for response variation in the phoneme-monitoring task.
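For concreteness, the decision rule being rejected here can be sketched as follows; the evidence increments and thresholds are placeholders, not parameters of the logogen model itself.

```python
# Sketch of a logogen-style decision rule: each unit accumulates its own
# evidence and fires at its own threshold, with no access to the state of
# the other units in the ensemble.

def logogen_winner(evidence_stream, thresholds):
    counts = dict.fromkeys(thresholds, 0.0)
    for increments in evidence_stream:   # one {word: evidence} dict per time-step
        for word, amount in increments.items():
            counts[word] += amount
            if counts[word] >= thresholds[word]:
                return word              # first threshold crossing wins outright
    return None                          # no logogen fired
```

Because the rule returns as soon as any single counter crosses its own threshold, nothing in it can represent the distance between the leading candidate and its competitors, which is precisely what a recognition-point encodes.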
4.2. Optimal real-time analysis
The evidence for the psychological reality of recognition-points is also evi-
dence for a more general claim about the properties of the word-recognition
system. In a distributed model of access and selection, information coming
into the system is made simultaneously available to all of the processing
entities to which it might be relevant. This makes the system capable, in
principle, of extracting the maximum information-value from the speech sig-
nal, in real-time as it is heard.
The information-value of the signal is defined with respect to the informa-
tion that it provides, over time, for the discrimination of the correct word-can-
didate from among the total set of possible words that might be uttered. To
use this information in an optimally efficient manner requires an access and
selection process that can continuously assess the sensory input against all
possible word-candidates. It is only by considering all possible lexical in-
terpretations of the accumulating sensory input that the system can be sure,
on the one hand, of not selecting an incorrect candidate, and, on the other,
of being able to select the single correct candidate as soon as it becomes
uniquely discriminable-that is, at the point where all other candidates be-
come excluded by the sensory input. A series of experiments, using an audit-
ory lexical decision task, show that listeners do have access, in real time, to
information about the sensory input that could only have derived from an
analysis process with these properties (Marslen-Wilson, 1980, 1984).
These experiments focused on the discrimination of nonwords, rather than
on the timing of real-word recognition, because this made it possible to ask
a wider range of questions about the processes of access and selection. The
nonword stimuli-which the subjects heard mixed in with an equal number
of real words-were constructed by analysing the cohort structure of sets of
English words. The sequence “trenker”, for example, becomes a nonword at
the /k/, since there are no words in English beginning with /tren/ which have
/k/ as a continuation. The use of this type of material allowed us to ask the
following questions.
First, can listeners detect that a sound sequence is a nonword at precisely
the point where the sequence diverges from the existing possibilities in En-
glish-that is, from the offset of the last phoneme in the nonword sequence
that could be part of the beginning of a real word in English? If the selection
process does continuously assess the incoming speech against possible word-
candidates, then decision-time should be constant relative to critical phoneme
offset. It should be independent both of the position of the critical phoneme
in the sequence, and of the length of the sequence as a whole.
The results were unambiguous. Decision-time, measured from the offset
of the last real word phoneme, was remarkably constant, at around 450 ms.¹⁰
It was unaffected either by variations in the position of the nonword point
(from the second to the fifth phoneme in the sequence), or by variations in
the length of the nonword sequences (from one to three syllables). It appears
that not only is there a continuous lexical assessment of the speech input, but
also that this input itself is not organised into processing units any larger than
a phoneme.
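In terms of the toy LEXICON used in the earlier sketches, the critical phoneme of a sequence like "trenker" is simply the first one at which no lexical continuation remains:

```python
# Sketch: locating the "nonword point" of a sequence, i.e. the first
# phoneme at which the sequence diverges from every word in the lexicon.

def nonword_point(phonemes, lexicon):
    """Index of the first phoneme with no possible lexical continuation,
    or None if the sequence stays word-compatible throughout."""
    for i in range(1, len(phonemes) + 1):
        prefix = tuple(phonemes[:i])
        if not any(phones[:i] == prefix for phones in lexicon.values()):
            return i - 1
    return None

print(nonword_point(("t", "r", "e", "n", "k", "er"), LEXICON))
# -> 4: the /k/, where /tren.../ stops matching "trend"
```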
This latter point was investigated in a subsequent experiment (Marslen-
Wilson, 1984), which looked specifically at the role of a larger unit-the
syllable-in access and selection. If the speech input is fed to the mental
¹⁰We can also look at the results in terms of the relationship between overall reaction-time (measured from
sequence onset) and the delay from sequence onset until the offset of the critical phoneme. In an optimal
system, the slope of this relationship should approach 1.0, since reaction-time from sequence onset should
increase as a linear function of the delay until the sequence becomes a nonword. The outcome is very close
to this, with an observed slope of +.90, and with a correlation coefficient of +.97.
lexicon in syllable-sized chunks, then nonword decision-time, which depends
on access to the lexicon, should increase the further the critical phoneme is
from the end of the syllable. To test this, I used nonword sequences where
the critical phoneme was either at the beginning, in the middle, or at the end
of a syllable. This variation in position had no effect on decision-time, which
remained constant at around 450 ms. The absence of any delay for syllable-in-
ternal targets shows that subjects do not need to wait until the end of a
syllable to make contact with the lexicon. This is consistent with recent evi-
dence (Cutler, Mehler, Norris, & Segui, 1983) that the syllable does not
function as a processing unit in English.
The absence of length effects in these experiments appears to be fatal for
standard logogen models. A weak point in this type of model, as I have noted
elsewhere (Marslen-Wilson & Welsh, 1978), is its treatment of nonwords.
A logogen-based recognition system cannot directly identify a nonword, since
recognition depends on the triggering of a logogen, and there can be no
logogen for a nonword. The system can only determine that a nonword has
occurred if no logogen fires in response to some sensory input. But to know
that no logogen will fire, it must wait until all of the relevant input has been
heard. In the present experiment, therefore, nonword decision-times should
have been closely related to item length. In fact, there was no relationship
at all between these two variables.
The predicted effect of length derives directly from the fundamental deci-
sion mechanism around which logogen-based recognition models are con-
structed. The failure of this prediction means that we must reject such
mechanisms as the building blocks for models of spoken word-recognition.
The second main question I was able to ask, using nonword stimuli, ad-
dressed more directly the claim for a parallel access and selection process. A
major diagnostic of a parallel, as opposed to serial system, is its relative
insensitivity to set size effects. For a distributed system like the cohort model,
it need make no difference to the timing of the word-recognition decision
whether two candidates have to be considered or two hundred. In either case,
the timing of the selection process reflects the point at which a unique solution
emerges. This is purely a matter of cohort structure, and has nothing to do
with the number of alternatives per se. For a serial process, however, which
moves through the alternatives in the decision space one item at a time, an
increase in the number of alternatives must mean an increase in decision-
time.
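The contrast can be put crudely as a cost comparison; the unit costs below are expository assumptions, not measured quantities.

```python
# Toy version of the set-size diagnostic: decision cost as a function of
# the number of surviving candidates. Per-item cost is assumed purely
# for exposition.

def serial_decision_time(cohort_size, ms_per_item=10.0):
    # A search that tests candidates one at a time scales with set size.
    return cohort_size * ms_per_item

def parallel_decision_time(cohort_size, ms_assess=10.0):
    # A distributed system assesses all candidates at once: flat cost.
    return ms_assess

for n in (1, 30, 70):
    print(n, serial_decision_time(n), parallel_decision_time(n))
```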
I investigated this in two experiments, in which I varied the size of the
“terminal cohort” of sets of nonword sequences. This refers to the number
of real words that are compatible with the nonword sequences at the point
where they start to become nonwords-that is, at the offset of the last real-
word phoneme. To make the nonword decision, all of these words presuma-
bly need to be analysed, to determine whether the subsequent speech input
is a possible continuation of any of them. In the first experiment, the size of
these terminal sets varied from one to 30. In the second, replicating the first,
the range was from one to over 70. In neither case did I find an effect of
set-size. Decision-time was constant, as predicted by the model, from the
offset of the last real-word phoneme in the sequence, irrespective of whether
only one real word remained consistent with the input up to this decision
point, or of whether more than 70 still remained. This is evidence against any
sequential search model of spoken word-recognition, whether self-termi-
nating or not.
4.3. The early activation of multiple semantic codes
The two preceding sections focused on the way the cohort model leads one
to think about the relationship between the sensory input and the mechanisms
of access and selection. Here I consider the role of contextual constraints in
the operation of these mechanisms.
The cohort model places severe restrictions on the ways in which contex-
tual variables can affect the access and selection process. In particular, it
prohibits the top-down pre-selection of potential word-candidates. It is the
sensory input that activates the initial set of candidates, which can then be
assessed against context. There is no top-down flow of activation (or inhibi-
tion) from higher centers, but, rather, the bottom-up activation of the syntac-
tic and semantic information associated with each of the word-forms that has
been accessed.
This has two major consequences. It means, first, that contextual con-
straints cannot prevent the initial accessing (i.e., the entry into the word-ini-
tial cohort) of words that do not fit the context. There is already indirect
evidence for this from earlier work on lexical ambiguity (e.g., Seidenberg,
Tanenhaus, Leiman, & Bienkowski, 1982; Swinney, 1979). More recently,
research by Tyler (1984) and Tyler and Wessels (1983) shows that subjects
in the gating task produce a substantial proportion of contextually inappro-
priate responses at the earlier gates-that is, when they have heard between
50 and 200 ms of the word. These are responses that are compatible with the
available sensory input, but which do not fit the semantic and syntactic con-
text in which these fragments occur. The existence of these responses at the
early gates is evidence for the priority given by the system to the bottom-up
input, and for the inability of context to suppress the initial activation of
inappropriate candidates.
The second major consequence is that early in the recognition process
there will be the activation of multiple semantic and syntactic codes.¹¹ If
contextual constraints are to affect the selection process, they can only do so,
within this framework, if they have access to the syntactic and semantic prop-
erties of the potential word-candidates. This information must be made avail-
able not only about the word that is actually being heard, but also about the
other words that are compatible with the sensory input-that is, the other
members of the current cohort.
We have evaluated these two claims by using cross-modal priming tasks to
tap the activation of different semantic codes early in the recognition process.
In these experiments (Marslen-Wilson, Brown, & Zwitserlood, in preparation;
Zwitserlood, 1985), the subjects heard spoken words, and made lexical decision
judgements to visual probes that were presented concurrently with these words.
Previous research by Swinney and his colleagues (e.g., Onifer & Swinney,
1981; Swinney, 1979) had shown that lexical decisions to visually presented
stimuli are facilitated when these words are associatively related to spoken
words that are being presented at the same time.
The spoken words in our experiments were drawn from pairs of words
such as CAPTAIN and CAPTIVE, which only diverge from each other rela-
tively late in the word-in this case at the onset of the vowel following the
/t/-burst. The visual probes, to which the subjects made their lexical decisions,
were semantically associated with one or the other member of the pair of
spoken words-in this case, for example, the probes might be the words
SHIP and GUARD, where SHIP is frequently produced as an associate to
CAPTAIN but never to CAPTIVE, and vice versa for GUARD. The critical
variable, however, was the timing with which the visual probes were present-
ed, relative to the separation-point in the spoken words. We contrasted two
probe positions in particular: an Early position, where the probe appeared
just before the separation-point, and a Late position, where it occurred at
the end of the word, well after the separation-point.
The cohort model claims that both CAPTAIN and CAPTIVE will be ac-
cessed early in the selection process, and that this will make available the
semantic codes linked to both of them. If this is correct, then there should
be facilitation of the lexical decision for both visual probes when they occur
¹¹It is important not to equate the kind of activation being postulated here with the activation effects
detected by Swinney (1979) and Seidenberg et al. (1982) in experiments using homophones. In these experi-
ments, subjects hear a complete word-form, like “bug” or “rose”, that has two or more different meanings.
Under these conditions, there is a strong activation of both meanings, which appears to persist for as much
as a second after word offset. This is not the same as the phenomena predicted here, where the transient
match, early in the word, of the incoming signal with a number of different word-forms leads to the transient
activation of the semantic and syntactic codes associated with these forms. These effects are only the faint
precursors of the activation effects to be expected when there is a full match of the input to a given word-form,
as in the homophone experiments.
in the Early position. Decision-time for SHIP and GUARD should, there-
fore, be affected equally when these probes are presented on or before the /t/
in either CAPTIVE or CAPTAIN. In contrast, when the probes are present-
ed in the Late position, then only the probe related to the actual word should
be facilitated. If the word is CAPTAIN, for example, there should be facili-
tation of SHIP at the end of the word but not of GUARD.
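These predictions can be restated compactly; the snippet below is just a summary of the claims in the text (for the spoken word CAPTAIN), not experimental data.

```python
# Predicted facilitation of the visual lexical decision, spoken word CAPTAIN.
# True = facilitated relative to an unrelated control probe.

predicted_facilitation = {
    ("SHIP",  "early"): True,   # semantic codes of both CAPTAIN and CAPTIVE
    ("GUARD", "early"): True,   # are active before the separation-point
    ("SHIP",  "late"):  True,   # after it, only the actual word's codes survive
    ("GUARD", "late"):  False,
}
```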
This pattern should hold both for isolated words and for the same words
in context. If the initial access, first of word-forms, and then of the syn-
tactic and semantic information associated with these word-forms, is trig-
gered from the bottom-up, and if contextual effects can only operate on this
information after it has been accessed in this way, then the presence or
absence of contextual constraints should not affect the pattern of activation
of semantic codes at the early positions.
In a series of experiments this was exactly what we found. For words in
isolation we see facilitation of both probes for the Early locations, but only
facilitation of one probe at the Late positions (Marslen-Wilson et al., in
preparation; Zwitserlood, 1985). The same pattern holds for words in context
(Zwitserlood, 1985). The differential facilitation of probes associated with
contextually appropriate as opposed to contextually less appropriate words
only begins to appear after about 200 ms. At earlier probe positions, there
is evidence for the activation of semantic codes linked to contextually inap-
propriate words, just as we find for words in isolation.
These results support the fundamental claim of the cohort model that the
recognition process is based not only on multiple bottom-up access, but also
on multiple contextual assessment (as discussed in Section 3). They also dem-
onstrate that the involvement of contextual variables early in the selection
process takes place under highly constrained conditions. No contextual pre-
selection is permitted, and context cannot prevent the accessing and activa-
tion of contextually inappropriate word-candidates.
These conclusions distinguish the first version of the cohort model both
from standard autonomy models and from standard interactive models. The
cohort model differs from autonomous models, because it allows contextual
variables to affect the selection process. But it shares with autonomy theories
the assumption that initial access is autonomous, in the sense that top-down
inputs cannot activate perceptual structures in the recognition lexicon.
This partial “autonomy” distinguishes the cohort model from theories
which do permit top-down influences on initial access. One example is the
logogen model, where logogens can be activated by inputs from the cognitive
system as well as by bottom-up inputs. Another, more topical example, is the
interactive activation model put forward by Rumelhart and McClelland
(1981), and recently applied to spoken word-recognition in the form of the
TRACE model (Elman & McClelland, 1984). This is an approach that can
accommodate many of the phenomena driving the cohort model-and, in-
deed, this was what it was initially designed to do.
It is not clear, however, whether TRACE (or its predecessor COHORT),
with its mixture of excitatory connections between levels and inhibitory con-
nections within levels, can accommodate the pattern of semantic activation
described here for members of the same cohort heard in context and isolation.
It should, first, predict differential patterns early in recognition for the con-
textually appropriate word, because of feed-forward from excitatory top-
down connections. Secondly, because of the inhibitory connections between
units within a level, there should be very little early activation of competing
words like CAPTAIN and CAPTIVE. They should mutually inhibit each
other until after their separation-point. Neither prediction is consistent with
our results.
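The force of the within-level inhibition argument can be made concrete with a toy simulation. The sketch below is emphatically not TRACE itself: the single word layer, the update rule, and all parameter values are simplifying assumptions, chosen only to exhibit the qualitative behaviour at issue.

```python
# Toy interactive-activation word layer with within-level inhibition.
# Illustrative assumptions only: one layer, invented parameter values,
# equal bottom-up support for both words before the separation-point.

def run(words, n_steps=8, input_strength=0.2, inhibition=1.0, decay=0.1):
    acts = {w: 0.0 for w in words}
    for _ in range(n_steps):
        new = {}
        for w, a in acts.items():
            # Each unit is excited from below and inhibited by the summed
            # activation of its within-level competitors.
            competition = sum(b for v, b in acts.items() if v != w)
            new[w] = max(0.0, a + input_strength
                         - inhibition * competition - decay * a)
        acts = new
    return {w: round(a, 2) for w, a in acts.items()}

print(run(["CAPTAIN"]))             # alone: activation builds freely
print(run(["CAPTAIN", "CAPTIVE"]))  # together: mutual inhibition caps both
```

With the inhibitory connections in place, neither candidate can build up much activation while both still match the input, which is precisely the prediction that the semantic activation data contradict.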
Finally, it is worth remembering that any evidence which reinforces the
claims for multiple contextual assessment also serves to underline the funda-
mental inability of sequential search models to explain the observed prop-
erties of the on-line transfer-function of the recognition system-namely, the
convergence of two sets of criteria, sensory and contextual, onto a unique
solution within 200 ms of word-onset.
5. Information and decision in the cohort model: Some revisions and
extensions
The results summarised in the preceding sections illustrate the value of the
cohort approach as a basis for research into spoken word-recognition, and
they support the accuracy of the claims it embodies about the functional
characteristics of the recognition process. Nonetheless, it is also clear that the
internal structure of the model, as originally stated, is over-simplified and
inadequate on several counts.
In this final section of the paper I want to discuss some problems with the
handling of information and decision in the cohort theory. These problems
concern the nature of the information coming into the system, the way that
information is represented within the system, and the way in which decisions
are taken to exclude or include candidates for selection and recognition.
I will argue, in particular, that the cohort model has to move away from
its binary concept of information and decision, where candidates are either
in the cohort or out of it, towards a more fluid form of organisation, incorpo-
rating the concept of activation. The rationale for this derives, first of all,
from some recent evidence for the role of word-frequency in the early stages
of access and selection.
5.1. Activation and word-frequency
As originally stated, the cohort model made no mention at all of word-fre-
quency. The main reason for this was the absence of compelling evidence
that word-frequency was an effective variable in the kinds of on-line analysis
processes with which the model is concerned. The older research in this area
(e.g., Broadbent, 1967; Howes, 1957; Morton, 1969; Pollack, Rubinstein, &
Decker, 1960) showed that word-frequency affects the intelligibility of spoken
words heard in noise. But it was never clear whether these were immediate
perceptual effects or due to post-perceptual response biases.
More recent research, using reaction-time techniques, was flawed by its
failure to take into account the distribution of information over time in
spoken words. Unless the high and low frequency words in an experiment
are matched for recognition-point, and unless reaction-time is measured with
respect to this point, then any measures of response-time to the two different
classes of stimuli are difficult to interpret. This is the problem, for example,
with the auditory lexical decision data reported by McCusker, Holley-Wilcox,
and Hillinger (1979) and by Blosfeld and Bradley (1981). Both studies show
faster response times to high frequency as opposed to low frequency monosyl-
lables. But in each case reaction-time was measured from word-onset, with
no correction for possible variations in recognition-point.
Two new studies provide better evidence for the role of word-frequency.
In a preliminary study I looked at lexical decision latencies for matched pairs
such as STREET and STREAK, where the recognition-point for each word
is in the word-final stop-consonant.[12] This means that reaction-time can be
measured from comparable points in each member of the pair-in this case,
from the release of the final stop. For a set of 35 matched pairs, with mean
frequencies, respectively, of 130 per million and 3 per million, there was a
considerable advantage for the high-frequency words (387 vs. 474 ms).
[Footnote 12: This was research carried out under my supervision by R. Sanders
and E. Eden in 1983, as part of an undergraduate research project in the
Cambridge Department of Experimental Psychology.]

Evidence of a different sort shows that these frequency effects can be
detected early in the selection process. This evidence comes from the research
on the early activation of semantic codes (see Section 4.3), where we found
that the frequency of the spoken words being heard indirectly affected the
amount of priming of the concurrent visual probe.

The effective variable was the difference in frequency between the word
being heard and its closest competitor-in this experiment usually the other
member of the stimulus pair. For the Early probes, presented before the
spoken words had separated from each other, we regularly found more faci-
litation for the probe related to the more frequent member of the pair, with
the size of this effect varying according to the size of the frequency difference
between the two words.
The word CAPTAIN, for example, is more frequent than its close com-
petitor CAPTIVE. For visual probes presented in the Early position, just
before the /t/, there would be more facilitation of SHIP (the probe related
to CAPTAIN) than of GUARD (the probe related to CAPTIVE), irrespec-
tive of whether the word actually being heard was CAPTAIN or CAPTIVE
(Marslen-Wilson et al., in preparation). But for Late probes, presented at
the end of the spoken word, these effects of relative frequency had disap-
peared, so that only the probe associated with the actual word being heard
would be facilitated. Comparable effects were found by Zwitserlood (1985),
in a study where the relative frequency of the members of such pairs was
systematically varied.
These appear to be genuine perceptual effects, reflecting competition be-
tween different candidates early in the selection process. Alternative explana-
tions, in terms of post-perceptual response-bias, can be excluded. If there are
any bias effects in the data, they will reflect the properties of the visual
probes rather than the spoken words, since it was the visual probes the sub-
jects were actually responding to. They were not being asked to make any
judgements about the identity or lexical status of the spoken words, nor, in
general, did they seem to be aware that there was a relationship between
these words and the visual probes. Furthermore, since the effects hold only
for the Early probes, they reflect the state of the system during the selection
phase, and not after it is completed.
Finally, and most significantly for the activation argument, these effects
are transient. The effects of relative difference in frequency have dissipated
by the time the Late probes are presented (between 200 and 300 ms later).
What we appear to be picking up earlier in the word is a temporary advantage
accorded to frequent word-forms, where the size of this advantage reflects
the degree of differential activation of word-forms and their closest com-
petitors.
Related transient effects can be seen in some other studies. For example,
Blosfeld and Bradley (1981) only found significant frequency effects for
monosyllables. In disyllabic words, lexical decision time did not vary accord-
ing to frequency. This is because lexical decision is a task where the listener
needs to wait until the end of the word before making a positive response,
to make sure that he is not hearing a nonword. If the end of the word comes
Functional parallelism in spoken word-recognition 93
significantly later than the recognition-point, as will usually be the case for
disyllables, then the effects of word-frequency at the recognition-point will
have dissipated when the time comes for the subject to respond.[13] Finally, in
the gating task the effects of frequency appear systematically only at the
earliest gates (Tyler, 1984).
On the basis of this, I conclude the following. We can still assume that all
word-forms which match a given input will be accessed by that input, and
will remain active candidates as long as there is a satisfactory match with the
sensory input. However, the response of higher-frequency word-forms ap-
pears to be enhanced in some way, such that the level of activation of these
elements can rise more rapidly, per unit information, than the activation of
less frequent elements (cf. Grosjean & Itzler, 1984).
This means that, early in the word, high-frequency words will be stronger
candidates than lower-frequency words, just because their relative level of
activation will be higher. This transient advantage is what the priming data
reflect. And since the selection process is dependent on the emergence of
one candidate from among a range of competitors, this should lead to faster
recognition-times for high-frequency words than for low-frequency words,
especially for low-frequency words with high-frequency competitors. This is
because the activation of high-frequency competitors will take longer to drop
below the level of the low-frequency candidate, once the critical sensory
information has become available which excludes this high-frequency com-
petitor.
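One way to cash out this account is as a difference in activation growth rates. The sketch below is a deliberately crude rendering of the idea: the logarithmic frequency scaling, the decay constant, the step counts and the frequency values are all invented for illustration, and are not claims of the model.

```python
# Frequency-weighted activation growth: an illustrative sketch in which
# the scaling function and every constant are expository assumptions.
import math

def freq_gain(freq_per_million, base=1.0, scale=0.5):
    # More frequent word-forms gain more activation per unit of matching
    # input; logarithmic compression keeps the advantage modest.
    return base + scale * math.log10(freq_per_million + 1)

def simulate(heard, candidates, separation_step, n_steps=8, fall=0.3):
    acts = {w: 0.0 for w in candidates}
    for t in range(n_steps):
        for w, freq in candidates.items():
            if t < separation_step or w == heard:
                acts[w] += freq_gain(freq)          # still matches the input
            else:
                acts[w] = max(0.0, acts[w] - fall)  # mismatch: activation falls
        print(t, {w: round(a, 2) for w, a in acts.items()})

# CAPTIVE is heard, but its more frequent competitor CAPTAIN matches the
# input until the separation-point (step 4). CAPTAIN's activation then
# decays, yet it remains the stronger candidate for a few steps after the
# signal has ruled it out, delaying the selection of CAPTIVE.
simulate("CAPTIVE", {"CAPTAIN": 60, "CAPTIVE": 3}, separation_step=4)
```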
To adopt this kind of account means that the behaviour of the cohort
system can no longer be characterised in terms of the simple presence or
absence of positive or negative information. Elements are not simply switched
on or off as the sensory and contextual information accumulates, until a
single candidate is left standing. Instead, the outcome and the timing of the
recognition process will reflect the differential levels of activation of success-
ful and unsuccessful candidates, and the rate at which their respective activa-
tion levels are rising and falling.
Some recent attempts to model a cohort-like analysis process have, in fact,
represented the behaviour of the system in these or very similar terms (e.g.,
Elman & McClelland, 1984; Marcus, 1981, 1984; McClelland & Elman, 1986;
Nusbaum & Slowiaczek, 1983). The results of these simulations show that an
activation-based system is capable of exhibiting the main characteristics of a
cohort selection process, with the correct candidate emerging from among its
competitors as the discriminating acoustic-phonetic information starts to
accumulate.

[Footnote 13: I should note, however, that recent research by Frauenfelder
(personal communication) has failed to find this fall-off of frequency effects
for disyllables.]
But apart from being strongly suggested by the word-frequency data, the
activation concept has advantages in other respects. In particular, it enables
us to deal in a more satisfactory manner with a second set of issues raised by
the cohort model’s treatment of information and decision. These concern the
nature of the sensory and contextual input to the decision process, and the
way that the matching of these inputs to lexical representations affects this
process.
5.2. Matching processes in access and selection
In the initial formulation of the cohort model it was assumed that the match-
ing process was conducted on an all-or-none basis. The sensory and the con-
textual input either did or did not match the specifications for a given candi-
date. If it did not, then the candidate would be dropped from the cohort.
The trouble with this account is that it makes the successful outcome of
the recognition process dependent on an unrealistically perfect match be-
tween the specifications of the correct candidate and the properties of the
sensory input and the context. I will begin with the problems raised by varia-
bility in the bottom-up input.
5.2.1. Matching the sensory input
The cohort model emphasises the role of sensory information in determin-
ing the scope and characteristics of the access and selection process. It is this
that determines the membership of the word-initial cohort, and that has the
priority in determining which candidates remain in the cohort and which are
dropped. The available evidence suggests that this is the correct view to take
(see Section 4.3).
To take this view, however, is to run the risk of making the recognition
process too sensitive to noise and variation in the sensory input. If sensory
information is the primary determinant of cohort membership, and if the
matching process operates on an all-or-none basis, then even a small amount
of variability in the sensory signal could lead to problems in recognition, with
the correct word-candidate either never making it into the word-initial cohort,
or being dropped from it for spurious reasons.
In fact, the human spoken word-recognition system seems to be remark-
ably indifferent to noise in the signal, so long as the disrupted input occurs
in an utterance context. Even when deviations are deliberately introduced
into words-as in the mispronunciation detection task (Cole, 1973; Marslen-
Wilson & Welsh, 1978)-listeners often fail to notice them. Over 70% of
small changes (i.e., changes by a single distinctive feature) are not detected
when they occur in words in utterance contexts, even though the same
changes are readily detectable in isolated syllables (Cole, 1973).
To accommodate this type of result, the model must find some way of
permitting deviant words to enter the cohort. The model can only allow
context to compensate for deficiencies in the bottom-up specification of the
correct candidate if this candidate nonetheless manages to find its way into
the cohort.
There are two aspects to the solution of this problem. The first follows
from the activation-based selection process sketched out in the previous sec-
tion. This is not a decision process that requires all-or-none matching, since
to discriminate the correct candidate it is not necessary to systematically
reduce the cohort to a single member. Selection does not depend on simple
presence or absence in the cohort, but on relative goodness of fit to the
sensory input. This makes it in principle possible for candidates that do not
fully match the sensory input to participate nonetheless in the recognition
process.
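A minimal decision rule of this kind is sketched below; the goodness-of-fit scores and the decision margin are invented for illustration, and the point is only that selection turns on relative fit, not on membership of a shrinking set.

```python
# Selection by relative goodness of fit rather than by reducing the
# cohort to a single survivor. Scores and margin are invented values.

def select(fits, margin=0.25):
    """fits: word -> goodness of fit to the sensory input so far (0-1).
    A word is selected once it leads its best competitor by `margin`,
    even if other candidates still partially match the input."""
    ranked = sorted(fits.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    return best if top - second >= margin else None

# A slightly mispronounced "captain" still fits CAPTAIN best, so CAPTAIN
# can be selected although CAPTIVE has not been driven out entirely.
print(select({"CAPTAIN": 0.85, "CAPTIVE": 0.55, "CAPSULE": 0.40}))  # CAPTAIN
print(select({"CAPTAIN": 0.60, "CAPTIVE": 0.55}))  # None: keep listening
```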
The second aspect of the solution involves the model’s assumptions about
the nature of the input. The system will respond quite differently to deviant
or noisy input, depending on the description under which this input is fed
into the decision process. The more highly categorised the output of acoustic-
phonetic analysis, the greater the problems that will be caused by variability
and error (cf. Klatt, 1980). In fact, if the cohort model is going to be able
to allow contextual constraint to compensate for bottom-up variability, then
the input to the lexicon cannot be anything as abstract as a string of
phonemes. Instead, a representation is required which preserves more infor-
mation about the acoustic-phonetic properties of the input-for example, a
representation in terms of a feature matrix.
To see this, consider the consequences of minor disruptions of the signal
when we adopt different assumptions about the input. Suppose that the dis-
turbance is such that a word-initial voiced stop-for example, /be/-is mis-
identified as a voiceless stop (/pe/). If the input to the word-recognition system
takes the form of a string of phonemic labels, then this error will have drastic
consequences for the membership of the cohort. A match will be established
for all words beginning with /pe/, and these will be strongly activated. But
the word intended by the speaker, beginning with a /b/, will receive no acti-
vation at all.
In contrast, if the input is specified in terms of a set of feature values, then
such an error will have much less drastic consequences. A minimal pair like
/b/ and /p/ only differ in their specifications along one feature parameter-in
this case voicing. Even if a wrong assignment is made on this parameter, the
input will still match the specifications for /b/ words along all of the other
parameters. This means much less differentiation in the degree of match and
mismatch between the /be/ and the /pe/ sets, so that the word-form intended
by the speaker has a much better chance of receiving sufficient activation to
be treated as a candidate for selection and recognition. In other words, the
system will become more tolerant of minor deviations in the sensory input.
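The difference between the two input descriptions can be seen in a small worked example. The three-feature inventory below is a toy placeholder rather than a serious phonological analysis; what matters is only the contrast between all-or-none and graded matching.

```python
# Contrast between phonemic-label input and featural input to the
# lexicon. The feature inventory is a toy assumption for illustration.

FEATURES = {
    "b": {"labial": 1, "stop": 1, "voiced": 1},
    "p": {"labial": 1, "stop": 1, "voiced": 0},
    "d": {"labial": 0, "stop": 1, "voiced": 1},
}

def label_match(input_seg, candidate_seg):
    # All-or-none: a single feature error wipes out the match entirely.
    return 1.0 if input_seg == candidate_seg else 0.0

def feature_match(input_seg, candidate_seg):
    # Graded: the proportion of feature parameters on which the input
    # and the candidate agree.
    f_in, f_cand = FEATURES[input_seg], FEATURES[candidate_seg]
    return sum(f_in[k] == f_cand[k] for k in f_in) / len(f_in)

# The speaker said /b/, but noise makes the input come in as /p/.
print(label_match("p", "b"))    # 0.0  -- /b/-words drop out entirely
print(feature_match("p", "b"))  # 0.67 -- /b/-words stay in contention
print(feature_match("p", "d"))  # 0.33 -- worse: place and voicing differ
```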
To assume a less highly categorised input to the lexicon does not sacrifice
the ability of the system to discriminate among different alternatives. There
is no inherent advantage to making phonemic distinctions at a pre-lexical
decision stage, and the choice between two phonemes can be made just as
well at the lexical level, as part of the choice between two words. In each
case, the decision takes into account the same bottom-up information. The
advantage of making the decision at the lexical level is that it enables the
system to delay committing itself to final decisions about the properties of
the sensory input until the other information relevant to this decision-includ-
ing the lexical status of different alternatives and their contextual roles-can
be taken into account (Klatt, 1980).
5.2.2. Matching the context
The evidence that selection is intimately bound up with integration lies at
the heart of the argument for a distributed model of spoken word-recognition.
But despite this, the way that the first version of the cohort model handles
the relationship between selection and contextual constraints is seriously
flawed.
Early statements of the model (e.g., Marslen-Wilson & Welsh, 1978) assert
that candidates drop out of the pool of word-candidates when they do not fit
the specifications of context, in the same way as when they do not fit the
accumulating sensory input. This runs into similar problems to the all-or-none
assumptions about sensory matching that I have just discussed. For the sen-
sory input, the problem was to explain how mispronounced, or otherwise
deviant words could nonetheless still be correctly identified. For context, the
problem is to explain how contextually anomalous words can be identified
(e.g., Norris, 1981).
Commonsense experience, as well as experimental evidence, tells us that
contextually inappropriate words can, in fact, be readily perceived and iden-
tified, so long as they are unambiguously specified in the signal. In a recent
experiment, for example, we compared monitoring latencies to the same
target under conditions where it was either normal with respect to its context,
or was anomalous in varying degrees of severity (Brown et al., unpublished).
Consistent with earlier results, there was a clear effect of anomaly. Response
latency to the word GUITAR increased by 27 ms over normal when it occurred
in an implausible context ("John buried the guitar"), and by a further 22
ms when it occurred in a semantically anomalous context (“John drank the
guitar”). But equally clearly, these anomalies are not causing a major break-
down of the recognition process. In the semantically anomalous condition,
for example, response-latencies remain well below 300 ms, and the error rate
is essentially zero. Even for grossly anomalous targets (“John slept the
guitar”), where verb sub-categorisation constraints are also violated, re-
sponse-time is still a relatively rapid 320 ms, and the error-rate remains low.
The relative speed and accuracy of correct selection for contextually inap-
propriate candidates is a reflection of the principle of bottom-up priority
(Marslen-Wilson & Tyler, 1980, 1983). The system is organised so that it
cannot override unambiguous bottom-up information. This means that there
is a considerable asymmetry in the degree to which context can override
bottom-up mismatch as opposed to the ability of bottom-up information to
override contextual mismatch. If the sensory input clearly differentiates one
candidate from all others, then that is the candidate that will emerge from
the perceptual process, irrespective of the degree of contextual anomaly. If
contextual variables clearly indicate a given candidate, it will nonetheless not
emerge as the choice of the system unless it also fits the bottom-up input
(within the limits of variation indicated earlier).
The clear implication of this is that context does not function to exclude
candidates from the cohort. There is no all-or-none matching with context,
and no all-or-none inclusion or exclusion of candidates on this basis. This
parallels the points made earlier (Section 4.3), prohibiting top-down influ-
ences upon initial access. It looks as if contextual factors can neither deter-
mine which candidates can enter the cohort, nor which candidates must leave
it. If we accept this conclusion, then there are two lines we can follow. One
is to maintain an interactive model, but to restrict the kinds of top-down
effects that are permitted. Since inhibitory effects are now excluded, context
will only have facilitatory effects, perhaps by increasing the level of activation
of candidates that fit the current context. Alternatively, we can turn towards
a different type of model, where no top-down interactions of any sort are
permitted. Different types of information are integrated together on-line to
produce the perceptual output of the system, but they do not interact in the
conventional sense. I will explore here the possibilities for this second kind
of account.
The effects of context, within the general framework I have adopted in
this paper, reflect the processing relationship between selection and integra-
tion. This is the relationship between, on the one hand, the set of potential
word-candidates, triggered from the bottom-up, and, on the other, the
higher-level representation of the current utterance and discourse. This con-
textual representation provides a structured interpretative framework against
which the senses associated with different word-forms can be assessed. In a
non-interactive model, this framework does not, itself, operate directly on
the activation levels of different candidates. These activation levels are a
measure of the relative goodness of fit of the candidates to the bottom-up
input, and context does not tamper with this measure.
We can capture, instead, the phenomena of early selection, and of contex-
tual compensation for bottom-up deficiency, by exploiting the capacity of a
parallel system for multiple access and multiple assessment. These will lead
to a form of on-line competition between the most salient candidates (those
most strongly activated by the sensory input) to occupy the available sites in
the higher-level representation. Once the appropriate senses associated with
a given word-form have been bound to these locations in the representation,
then we can say that recognition has taken place.[14]
The speed with which this is accomplished will be the joint function of two
variables: the extent to which the bottom-up fit for a given candidate differen-
tiates it from its competitors, and the extent to which the contextual match
similarly differentiates it. The facilitatory and compensatory effects of context
reflect the tendency of the system to commit itself to a particular structural
interpretation even though the sensory input may not have fully differentiated
the word-form associated with this interpretation. The reason for this lack of
full bottom-up differentiation may be either temporal-not all of the sensory
input relevant to the decision has been heard yet, or it may be substantive-
the sensory input is simply inadequate by itself to indicate a unique candidate.
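Under these assumptions, the decision to bind a sense into the higher-level representation might be sketched as follows, extending the margin rule given earlier. The multiplicative combination and all the scores are expository inventions; the essential property is that contextual fit enters only at this binding stage and never feeds back into the activation levels themselves.

```python
# Non-interactive combination of bottom-up activation and contextual fit
# at the integration stage. The combination rule and the scores are
# illustrative assumptions; context never modifies the activations.

def bind_sense(candidates, margin=1.0):
    """candidates: word -> (bottom_up_activation, contextual_fit).
    A sense is bound (recognised) once its combined support leads the
    runner-up by `margin`; otherwise the system keeps listening."""
    support = {w: act * ctx for w, (act, ctx) in candidates.items()}
    ranked = sorted(support.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    return best if top - second >= margin else None

# Early in the word the signal fits both candidates equally, but context
# already differentiates their senses, permitting early selection.
print(bind_sense({"CAPTAIN": (2.0, 1.5), "CAPTIVE": (2.0, 0.5)}))
# Bottom-up priority: once the signal clearly indicates CAPTIVE, it wins
# despite its poorer contextual fit.
print(bind_sense({"CAPTAIN": (0.5, 1.5), "CAPTIVE": (4.0, 0.5)}))
```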
On this account, both access and certain aspects of selection are autonom-
ous processes, in the sense that they are driven strictly from the bottom-up.
Whether the speech signal is heard in context or in isolation, the basic pat-
tern-matching routines of the system will operate in the same way, providing
information about the goodness of fit of the sensory signal to the array of
lexical representations of word-forms.
This means that when the signal is heard in isolation, we will get something
approximating the commonsense concept of word-recognition-that is, a pro-
cess of form-based selection culminating in the explicit decision that a given
word-form is present. But when the signal is heard in context-and note that
normal context is fluent conversational speech-there need be no explicit
form-based recognition decision. Selection-viewed as the decision that one
particular word-form rather than another has been heard-becomes a by-
product of the primary process of mapping word-senses into higher-level
representations. The bottom-up access and selection processes provide the
essential basis for rapid on-line comprehension processes, but they provide
no more than a partial input to an integrative system that is only peripherally
concerned with identifying word-forms, and whose primary function is to
uncover the meanings that the speaker is trying to communicate.

[Footnote 14: It is at this point (Marslen-Wilson & Welsh, 1978) that the
output of the system becomes perceptually available.]
5.2.3. The new cohort
In the preceding section of this paper I have suggested a number of mod-
ifications in the way that the cohort concept should be realised as a processing
model. These include the use of the activation concept, the introduction of
frequency effects into the early stages of the recognition process, the specifi-
cation of the bottom-up input in terms of some form of sub-phonemic rep-
resentation, and the exclusion of top-down contextual influences on the state
of the actual lexical recognition units. What do these changes mean for the
central concepts of the approach, with its emphasis on the contingency of
perceptual choice, and on the processing concept of the word-initial cohort?
By moving away from the concept of all-or-none matching against sensory
and contextual criteria, and by adopting an activation metaphor to represent
the goodness of fit of a given candidate to the bottom-up input, the model
abandons the convenient fiction that the cohort is a discrete, well-demarcated
entity in the mental life of the listener. The selection process does not depend
on the membership of the cohort reducing to a single candidate. It depends
instead on the process of mutual differentiation of levels of activation of
different candidates. The operation of the system still reflects the state of the
entire ensemble of possibilities, but the state of this ensemble is no longer
represented simply in terms of the all-or-none presence or absence of differ-
ent candidates.
Functionally, however, the cohort still exists. The effective core of salient
candidates will be much the same as it would have been under an all-or-none
regime. Although very many candidates will be momentarily activated as
aspects of their phonological representations transiently match the accumulat-
ing input, the preceding and subsequent input will not match, and they will
fall back into semi-quiescence. It takes some amount of time and input for
candidates to start to participate fully in the selection and integration process.
The effect of this is that the set of candidates which must be discriminated
among will look very similar to the membership of the word-initial cohort as
defined on an all-or-none basis. But by not defining it on this all-or-none
basis, the system becomes far better equipped to deal with the intrinsic and
constant variability of the speech signal.
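The point can be illustrated by thresholding a graded fit measure. The prefix-based scoring below is a crude stand-in for a real goodness-of-fit computation, and the miniature lexicon and threshold are invented; the observation is simply that the candidates left above threshold coincide with the classically defined word-initial cohort.

```python
# The "functional" cohort: thresholding a graded activation measure
# recovers something very like the word-initial cohort. The scoring,
# lexicon, and threshold are all illustrative assumptions.

LEXICON = ["captain", "captive", "capsule", "cab", "tin", "apt"]

def activation(word, heard):
    # Graded fit: length of the shared word-initial stretch. Words that
    # match only transiently, or not at all, stay near zero.
    shared = 0
    for a, b in zip(word, heard):
        if a != b:
            break
        shared += 1
    return float(shared)

def functional_cohort(heard, floor=3.0):
    acts = {w: activation(w, heard) for w in LEXICON}
    return sorted(w for w, a in acts.items() if a >= floor)

print(functional_cohort("capt"))  # ['capsule', 'captain', 'captive']
```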
Overall, none of the modifications I have suggested change the fundamental
functional characteristics of the cohort-based word-recognition process. It
still embodies the concepts of multiple access and multiple assessment, allow-
ing a maximally efficient recognition process, based on the principle of the
contingency of real-time perceptual choice.
References
Blosfeld, M.E., & Bradley, D.C. (1981). Visual and auditory word recognition: Effects of frequency and
syllabicity. Paper presented at the Third Australian Language and Speech Conference, Melbourne.
Broadbent, D.E. (1967). Word-frequency effect and response-bias. Psychological Review, 73, 504-506.
Brown, C.M., Marslen-Wilson, W.D., & Tyler, L.K. (no date). Sensory and contextual factors in spoken
word-recognition. Unpublished manuscript, Max-Planck Institute, Nijmegen.
Cole, R.A. (1973). Listening for mispronunciations: A measure of what we hear during speech. Perception &
Psychophysics, 13, 153-156.
Cotton, S., & Grosjean, F. (1984). The gating paradigm: A comparison of successive and individual presentation
formats. Perception & Psychophysics, 35, 41-48.
Cutler, A., Mehler, J., Norris, D., & Segui, J. (1983). A language-specific comprehension strategy. Nature,
304, 159-160.
Cutler, A., & Norris, D. (1979). Monitoring sentence comprehension. In W.E. Cooper & E. Walker (Eds.),
Sentence processing: Psycholinguistic studies presented to Merrill Garrett. Hillsdale, NJ: Erlbaum.
Elman, J.L., & McClelland, J.L. (1984). Speech perception as a cognitive process: The interactive activation
model. In N. Lass (Ed.), Speech and language, Vol. 10. New York: Academic Press.
Fahlman, S.E. (1979). NETL: A system for representing and using real-world knowledge. Cambridge, MA:
MIT Press.
Forster, K.I. (1976). Accessing the mental lexicon. In R.J. Wales & E. Walker (Eds.), New approaches to
language mechanisms. Amsterdam: North-Holland.
Forster, K.I. (1979). Levels of processing and the structure of the language processor. In W.E. Cooper & E.
Walker (Eds.), Sentence processing: Psycholinguistic studies presented to Merrill Garrett. Hillsdale, NJ:
Erlbaum.
Forster, K.I. (1981). Priming and the effects of sentence and lexical contexts on naming time: Evidence for
autonomous lexical processing. Quarterly Journal of Experimental Psychology, 33A, 465-495.
Grosjean, F. (1980). Spoken word recognition processes and the gating paradigm. Perception & Psychophysics,
28, 267-283.
Grosjean, F. (1985). The recognition of words after their acoustic offset: Evidence and implications. Perception
& Psychophysics, 38, 299-310.
Grosjean, F., & Gee, J.P. (1987). Another view of spoken word recognition. Cognition, 25, this issue.
Grosjean, F., & Itzler, J. (1984). Can semantic constraint reduce the role of word frequency during spoken-
word recognition? Bulletin of the Psychonomic Society, 22, 180-182.
Hinton, G.E., & Anderson, J.A. (Eds.) (1981). Parallel models of associative memory. Hillsdale, NJ:
Erlbaum.
Howes, D. (1957). On the relationship between the intelligibility and the frequency of occurrence of English
words. Journal of the Acoustical Society of America, 29, 296-305.
Klatt, D.H. (1980). Speech perception: A model of acoustic-phonetic analysis and lexical access. In R.A. Cole
(Ed.), Perception and production of fluent speech. Hillsdale, NJ: Erlbaum.
Luce, P.A. (1986). A computational analysis of uniqueness points in auditory word recognition. Perception
& Psychophysics, 39, 155-158.
Marcel, A.J. (1983). Conscious and unconscious perception: An approach to the relations between phenom-
enal experience and perceptual processes. Cognitive Psychology, 15, 238-300.
Marcus, S.M. (1981). ERIS - context-sensitive coding in speech perception. Journal of Phonetics, 9, 197-220.
Marcus, S.M. (1984). Recognizing speech: On the mapping from sound to word. In H. Bouma & D.G.
Bouwhuis (Eds.), Attention and Performance X: Control of language processes. Hillsdale, NJ: Erlbaum.
Marslen-Wilson, W.D. (1973). Linguistic structure and speech shadowing at very short latencies. Nature, 244,
522-523.
Marslen-Wilson, W.D. (1978). Sequential decision processes during spoken word recognition. Paper presented
to the Psychonomic Society, San Antonio, Texas.
Marslen-Wilson, W.D. (1980). Speech understanding as a psychological process. In J.C. Simon (Ed.), Spoken
language understanding and generation. Dordrecht: Reidel.
Marslen-Wilson, W.D. (1984). Function and process in spoken word-recognition. In H. Bouma & D.G.
Bouwhuis (Eds.), Attention and Performance X: Control of language processes. Hillsdale, NJ: Erlbaum.
Marslen-Wilson, W.D. (1985). Speech shadowing and speech comprehension. Speech Communication, 4,
55-73.
Marslen-Wilson, W., Brown, C.M., & Zwitserlood, P. (in preparation). Spoken word-recognition: Early acti-
vation of multiple semantic codes. Manuscript in preparation, Max-Planck Institute, Nijmegen.
Marslen-Wilson, W.D., & Tyler, L.K. (1975). Processing structure of sentence perception. Nature, 257,
784-786.
Marslen-Wilson, W.D., & Tyler, L.K. (1980). The temporal structure of spoken language understanding.
Cognition, 8, 1-71.
Marslen-Wilson, W.D., & Tyler, L.K. (1981). Central processes in speech understanding. Philosophical Trans-
actions of the Royal Society, Series B, 295, 317-332.
Marslen-Wilson, W.D., & Welsh, A. (1978). Processing interactions during word-recognition in continuous
speech. Cognitive Psychology, 10, 29-63.
McClelland, J.L., & Elman, J.L. (1986). The TRACE model of speech perception. In J.L. McClelland &
D.E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition.
Cambridge, MA: Bradford Books.
McCusker, L.X., Holley-Wilcox, P., & Hillinger, M.L. (1979). Frequency effects in auditory and visual word
recognition. Paper presented to the Southwestern Psychological Association, San Antonio, Texas.
Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.
Morton, J., & Long, J. (1976). Effect of word transitional probability on phoneme identification. Journal of
Verbal Learning and Verbal Behavior, 15, 43-52.
Norris, D. (1981). Autonomous processes in comprehension. Cognition, 11, 97-101.
Nusbaum, H.C., & Slowiaczek, L.M. (1983). An activation model of the cohort theory of auditory word
recognition. Paper presented at the Society for Mathematical Psychology, Boulder, Colorado.
Onifer, W., & Swinney, D.A. (1981). Accessing lexical ambiguities during sentence comprehension: Effects
of frequency of meaning and contextual bias. Memory & Cognition, 9, 225-236.
Pollack, I., Rubinstein, H., & Decker, L. (1960). Analysis of correct responses to an unknown message set.
Journal of the Acoustical Society of America, 32, 454-457.
Rumelhart, D.E., & McClelland, J.L. (1981). An interactive activation model of context effects in letter
perception, Part II: The contextual enhancement effect and some tests and extensions of the model.
Psychological Review, 89, 60-94.
Salasoo, A., & Pisoni, D. (1985). Interaction of knowledge sources in spoken word identification. Journal of
Verbal Learning and Verbal Behavior, 24, 210-231.
Seidenberg, M.S., & Tanenhaus, M.K. (1979). Orthographic effects on rhyme monitoring. Journal of Experi-
mental Psychology: Human Learning and Memory, 5, 546-554.
Seidenberg, M.S., Tanenhaus, M.K., Leiman, J.M., & Bienkowski, M. (1982). Automatic access of the
meanings of ambiguous words in context: Some limitations of knowledge-based processing. Cognitive
Psychology, 14, 489-537.
Swinney, D. (1979). Lexical access during sentence comprehension: (Re)consideration of context effects.
Journal of Verbal Learning and Verbal Behavior, 18, 645-659.
Tyler, L.K. (1984). The structure of the initial cohort: Evidence from gating. Perception & Psychophysics, 36,
417-427.
Tyler, L.K., & Wessels, J. (1983). Quantifying contextual contributions to word-recognition processes. Percep-
tion & Psychophysics, 34, 409-420.
Tyler, L.K., & Wessels, J. (1985). Is gating an on-line task? Evidence from naming latency data. Perception
& Psychophysics, 38, 217-222.
Zwitserlood, P. (1985). Activation of word candidates during spoken word-recognition. Paper presented to
Psychonomic Society Meetings, Boston, Mass.
... The prediction potential and current linguistic debates Predictions in language can be defined as the integration of both the linguistic context of the conversation (i.e., linguistic representations of the speaker's utterance) and the nonlinguistic context (e.g., shared background knowledge, shared visual information), which preactivates predictable linguistic concepts and their features before they appear (Amos & Pickering, 2020;Gambi & Pickering, 2017;Huettig, 2015;;Lelonkiewicz et al., 2021;Pickering & Gambi, 2018;Pulvermü ller & Grisoni, 2020). During a conversation, interlocutors continually make predictions about the other person's current and next utterance, intended meanings and communicative goals, and update them as the conversation progresses, as evidenced, for example, by single words identified by listeners before their ends (Altmann & Kamide, 1999;Marslen-Wilson, 1987;Sacks et al., 1974). In an EEG study, Magyari et al. (2014) showed that already 1250 msec before the end of an utterance, the interlocutor predicted the end of a turn, as indicated by a beta frequency desynchronization in the anterior cingulate cortex and the inferior parietal lobule. ...
... We define intelligibility in the traditional sense as defined by, e.g., Smith and Nelson (1985) as the ease/accuracy with which a listener (or reader) is able to recognize linguistic units (i.e., words) in the order in which they occur in the sentence. Words are easier to recognize if they occur in syntactically predictable sentence contexts (e.g., Marslen-Wilson 1987). If the listener/reader is not familiar with the non-native language, only cognates can be recognized, which will be recognized more successfully as the non-native version sounds/looks more similar to the native counterpart. ...
Chapter
Full-text available
This is a draft/preprint of a chapter for the Routledge Handbook of Eurolinguistics (ed. Joachim Grzega). Comments welcome. Cite published version (2025) only. This chapter presents a somewhat simplified overview of the phylogenetic (i.e., based on historically shared language changes) and glottochronic (proportion of cognate words in the basic vocabulary) relationships among the indigenous European languages spoken today. These family relations are compared with the results of a project undertaken by ourselves, in which we computed lexical, phonological and syntactic distances between 70 pairs of Indo-European languages based on a relatively small parallel translation corpus of some 800 words for 16 languages (Germanic: Danish, Dutch, English, German, Swedish; Romance: French, Italian, Portuguese, Romanian, Spanish; Slavic: Bulgarian, Croatian, Czech, Polish, Slovak, Slovene). We also report selected results of a large-scale intelligibility test done on these 70 language pairs, and predict the degree of cross-lingual intelligibility between the members of each pair from the linguistic distances measured between them. The results show that the lexical, phonological and syntactic distances are intercorrelated, correspond rather well with the historical phylogeny, and afford a good prediction of cross-lingual intelligibility.
... The positional analysis (Figs. S3 and S4 (OSM)) showed that word-initial overlap between training and test items supported decoding by more ROIs than did word-final overlap. This aligns with behavioral results that have shown that spoken word recognition relies more heavily on word onsets than offsets (Allopenna et al., 1998;Marslen-Wilson & Tyler, 1980;Marslen-Wilson, 1987). Despite this onsetbias, more ROIs decoded in the word analysis than in the onset analysis specifically in the post-offset period, suggesting that overlap at each position contributed to classification performance in the word condition. ...
Article
Full-text available
While the neural bases of the earliest stages of speech categorization have been widely explored using neural decoding methods, there is still a lack of consensus on questions as basic as how wordforms are represented and in what way this word-level representation influences downstream processing in the brain. Isolating and localizing the neural representations of wordform is challenging because spoken words activate a variety of representations (e.g., segmental, semantic, articula-tory) in addition to form-based representations. We addressed these challenges through a novel integrated neural decoding and effective connectivity design using region of interest (ROI)-based, source-reconstructed magnetoencephalography/ electroencephalography (MEG/EEG) data collected during a lexical decision task. To identify wordform representations, we trained classifiers on words and nonwords from different phonological neighborhoods and then tested the classifiers' ability to discriminate between untrained target words that overlapped phonologically with the trained items. Training with word neighbors supported significantly better decoding than training with nonword neighbors in the period immediately following target presentation. Decoding regions included mostly right hemisphere regions in the posterior temporal lobe implicated in phonetic and lexical representation. Additionally, neighbors that aligned with target word beginnings (critical for word recognition) supported decoding, but equivalent phonological overlap with word codas did not, suggesting lexical mediation. Effective connectivity analyses showed a rich pattern of interaction between ROIs that support decoding based on training with lexical neighbors, especially driven by right posterior middle temporal gyrus. Collectively, these results evidence functional representation of wordforms in temporal lobes isolated from phonemic or semantic representations.
Article
Full-text available
Quantitative predictions are made from a model for word recognition. The model has as its central feature a set of "logogens," devices which accept information relevant to a particular word response irrespective of the source of this information. When more than a threshold amount of information has accumulated in any logogen, that particular response becomes available for responding. The model is tested against data available on (1) the effect of word frequency on recognition, (2) the effect of limiting the number of response alternatives, (3) the interaction of stimulus and context, and (4) the interaction of successive presentations of stimuli. Implications of the underlying model are largely upheld. Other possible models for word recognition are discussed as are the implications of the logogen model for theories of memory. (30 ref.) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Models of language processing which stress the autonomy of processing at each level predict that the semantic properties of an incomplete sentence context should have no influence on lexical processing, either facilitatory or inhibitory. An experiment similar to those reported by Fischler and Bloom (1979) and Stanovich and West (1979, 1981) was conducted using naming time as an index of lexical access time. No facilitatory effects of context were observed for either highly predictable or semantically appropriate (but unpredictable) completions, whereas strong inhibitory effects were obtained for inappropriate completions. When lexical decision time was the dependent measure, the same results were obtained, except that predictable completions now produced strong facilitation. In a further experiment the inhibitory effects of context on lexical decision times for inappropriate targets were maintained, even though unfocussed contexts were used, in which no clear expectancy for a particular completion was involved. These results were interpreted in terms of a two-factor theory which attributes the facilitation observed with the lexical decision task to postaccess decision processes which are not involved in the naming task. The inhibitory effects were attributed to interference resulting from semantic integration. In contrast to the results for sentence contexts, lexical contexts of the doctor-nurse variety produced clear facilitation effects on naming time (but no inhibitory effects). It was also shown that relatively minor variations in the type of neutral context could completely alter the relative importance of facilitation and inhibition.
Article
An overview of recent psychological research into speech understanding is presented, with an emphasis on spoken word-recognition processes. Human speech understanding is shown to be interactive in character, with speech being understood as it is heard by optimally efficient processing procedures. The flow of analyses through the system is assumed to be controlled by the processing principles of bottom-up priority and of obligatory processing.