
Online dictionaries of English

January 2011


In book: e-Lexicography: The Internet, Digital Initiatives and Lexicography (pp.230–250)
Publisher: Continuum
Editors: Fuertes-Olivera, Pedro A. and Bergenholtz, Henning





Robert Lew

Adam Mickiewicz University

Abstract

In this paper I present an overview of the spectrum of available online English language

dictionaries, and then offer some general comments on a few selected key issues. Given the

current explosion of web content, it is quite pointless to try to list every single dictionary

available. It makes better sense to identify the salient categories of online dictionaries and

selectively focus on their prominent and typical representatives. The first notable category, so

important to the many learners of English worldwide, is that of the famous British monolingual

learners’ dictionaries (the Big Five). Here, it is interesting to observe the gradual transition to

the online medium in what has sometimes been called the freemium approach. Quality general

English dictionaries aimed at the native speaker are not so well represented, but there is a

wide choice of specialized (subject) dictionaries of varying quality and provenance. Special-

purpose dictionaries include pronouncing dictionaries and onomasiological dictionaries.

Diachronic dictionaries have also established a presence on the internet. As one guise of the

Web 2.0 experience, we witness the emergence of bottom-up (or user-involvement)

lexicography, with such prominent exemplars as the Urban Dictionary or Wiktionary.

Hyperlinking is a fundamental feature of the web, but it is, arguably, overused in the so-called

dictionary aggregators: dictionary portals which put together entries from several online

dictionaries. This creates highly redundant assemblages of lexicographic data. How to tap the

richness of the Web but present the results in a user-friendly manner without laborious human

intervention is a tough question. Another issue that still awaits satisfactory answers is the

organization of access to data in online dictionaries. Even in highly respected dictionaries,

there remain basic problems of access, such as with locating multi-word units,

notwithstanding the upbeat tone of metalexicographers who often just pronounce the problem

as essentially solved in the electronic medium. Other issues related to new technologies are

the use of graphics, multimedia and alternative presentation modes, and these receive some

attention. Finally, I play with the idea of the dictionary as an advanced query system sitting on

top of a text corpus. Using collocation dictionaries as an example, I demonstrate that the

difference between a sophisticated corpus query system and a more traditional lexicographic

product may soon become something of a technical subtlety.

Introductory

The present paper is intended as an overview of online dictionaries of English, often seen, and

probably rightly, as the leading lexicographic tradition of the present. Although a balanced

overview is my primary goal, I will also touch upon some general issues and adopt a more

evaluative position here and there. However, this will only be a secondary perspective, as the

specific issues are covered in greater depth in some of the other papers in the present volume.

Obviously, given the sheer number of currently available online dictionaries, no one

can hope to produce a complete catalogue, and this is not the purpose here. Rather, the idea is

to present prominent and representative exemplars of specific types of dictionaries and focus

on their properties of interest. But what are those types of dictionaries? As dictionaries can be,

and have been, compared on a number of different levels, classifying them has traditionally

been problematic. This has become even more of a challenge in the age of electronic

dictionaries. What, then, could be the basic classifying criteria for online dictionaries?

Clearly, most of the traditional criteria can still be applied to online products. Here, of

course, we find the complex (and at times confusing) network of overlapping oppositions:

general/specialized subject, general/special purpose, L1/L2/FL speaker, expert/layman,

contemporary/historical, etc.

There do appear, however, to be some criteria or oppositions that have not been inherited

from printed dictionaries but rather are specific to online dictionaries.

1 Some additional criteria for classifying online dictionaries

1.1 Institutional vs. collective

A variety of overlapping classification criteria have been used to categorize online

dictionaries. For example, in terms of user involvement, there is the institutional versus

collective opposition (Fuertes-Olivera 2009); the latter category signifies a collaborative effort

by a community of non-professionals, who can themselves be dictionary users; an earlier

paper by Carr (1997) also used the terms bottom-up and collaborative. User-involvement

is yet another designation for a similar concept, while open stresses a slightly different aspect

of what might again be a fairly similar formula.

1.2 Free vs. paid

Collective dictionaries would normally be free to use. Conversely, institutional dictionaries

need not necessarily involve fee-based access, so the free versus paid contrast is an

independent one. It is also increasingly difficult to demarcate clearly between free and paid,

with the clear cases leaving a substantial grey area in the middle, as revenue to the publisher

can take different forms. For example, individual pay-per-view or subscription-based access is

a clear case of paid access, but when a dictionary is syndicated as part of a more comprehensive service and sold, say, to

libraries, the end user often does not bear the direct cost. Then there are cases where online

access is offered (perhaps for a limited time) as a bonus for buyers of paper editions. Still

closer to the free end of the cline are ad-supported dictionaries, and this appears to be a rather

popular model at the moment.

1.3 Number of dictionaries

In terms of how many dictionaries are offered by the specific services, at least the following

four options come to mind:

1. individual dictionaries: just like the traditional printed dictionaries, there are

standalone, single online dictionaries;

2. dictionary sets consisting of clusters of related dictionaries may be offered from a

single landing page; a good example is the Cambridge dictionaries online page;1

3. dictionary portals only include hyperlinks to actual dictionaries (examples will be

presented below);

4. dictionary aggregators excel at pasting together the content of various dictionaries and

serving them on a single page (again, examples will be discussed below).
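The criteria of this section are independent of one another, so a given service can be described by one value on each. The following is only a minimal sketch of such a record in Python: the class and field names are hypothetical, and the three sample entries are dictionaries discussed in this paper, classified as the text describes them.

```python
from dataclasses import dataclass
from enum import Enum


class Authorship(Enum):          # section 1.1
    INSTITUTIONAL = "institutional"
    COLLECTIVE = "collective"


class Access(Enum):              # section 1.2: a cline rather than a binary
    SUBSCRIPTION = "subscription or pay-per-view"
    SYNDICATED = "syndicated, e.g. through a library"
    PRINT_BONUS = "bonus for buyers of the paper edition"
    AD_SUPPORTED = "ad-supported"
    FREE = "free"


class Packaging(Enum):           # section 1.3
    INDIVIDUAL = "individual dictionary"
    SET = "dictionary set"
    PORTAL = "dictionary portal"
    AGGREGATOR = "dictionary aggregator"


@dataclass(frozen=True)
class OnlineDictionary:
    name: str
    authorship: Authorship
    access: Access
    packaging: Packaging


# Illustrative entries only; the labels follow the descriptions in this paper.
EXAMPLES = [
    OnlineDictionary("Oxford English Dictionary", Authorship.INSTITUTIONAL,
                     Access.SUBSCRIPTION, Packaging.INDIVIDUAL),
    OnlineDictionary("Middle English Dictionary", Authorship.INSTITUTIONAL,
                     Access.FREE, Packaging.INDIVIDUAL),
    OnlineDictionary("Wiktionary", Authorship.COLLECTIVE,
                     Access.FREE, Packaging.INDIVIDUAL),
]


def select(entries, **criteria):
    """Return the entries whose fields equal every given criterion."""
    return [e for e in entries
            if all(getattr(e, field) == value for field, value in criteria.items())]
```

Selecting on `authorship=Authorship.INSTITUTIONAL` and `access=Access.FREE` returns the Middle English Dictionary alone, which illustrates that institutional authorship does not entail paid access.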

In my overview below, I will begin with some notable representatives of institutional

dictionaries offered free of charge to the world internet community.

2 Institutional Dictionaries

2.1 General English Dictionaries

General English Dictionaries are traditional general-purpose dictionaries whose word list is not

restricted by domain or register, and which provide the relatively rich microstructural

treatment of (primarily) contemporary English that is traditionally expected from general

reference desk dictionaries.

2.1.1 American

Traditional US dictionary publishers seem to have embraced the web: as many as three of the

major American players on the market of general desk and college dictionaries make their

dictionaries available online free of charge. These are the Merriam-Webster Online

Dictionary, American Heritage Dictionary and Random House Unabridged Dictionary, the

last one being included only as part of the Dictionary.com service (on which see 3.1 below).

2.1.2 British

Until recently, the range of online general-purpose dictionaries available on the British scene

had been less complete, with the traditional and most prestigious publishers (notably Oxford

University Press) apparently hesitant about placing their products online for free. Only very

recently, OUP created the new oxforddictionaries.com2 lexicographic portal, built around two

of the publisher’s recent dictionaries: the newest (third) edition of the Oxford Dictionary of

English (under the heading World English), and its American counterpart, the New Oxford

American Dictionary, also in its third edition. A premium subscription service is available,

with one year free for buyers of the printed copy.

The availability of the free/premium combination for these Oxford dictionaries exemplifies

rather well the new business model that is currently being followed by a number of

publishers: the model known by the linguistic blend freemium. The approach works on the

principle that basic content and functionality are offered essentially free of charge (in response,

we might say, to the free-lunch mindset of today’s netizens). The free offer, however, is used

as an opportunity to market and sell extra content, which might be richer lexicographic data

and/or non-lexicographic content, such as exercises or language testing materials. To continue

with our example, the premium oxforddictionaries.com service offers the following extra

features (Judy Pearsall, personal communication):

− sense-linked thesaurus of 600,000 synonyms and antonyms;

− advanced search and browse features;

− 1.9 million sense-linked examples from the Oxford English Corpus;

− audio pronunciations;

− My Oxford Dictionary personalization features;

− browsing and search by subject area, meaning category, part of speech, etc.;

− four additional zones fully linked to dictionary content, including Writing Skills zone,

Writers and Editors zone, Example sentences zone, and Puzzles zone.

To some extent, free online versions may drive the sales of paper copies — but of course this

argument could be reversed, with online access deterring some potential buyers from

purchasing a printed copy.

Apart from the two Oxford dictionaries, there are also other notable British dictionaries

offered free of charge. Collins offers what it refers to as the Collins English Free Dictionary.3

A closer examination reveals that this is not the same as the authoritative Collins English

Dictionary; the latter, however, does seem to be available, but only as part of

TheFreeDictionary service (on which see 3.1 below). The venerable Scottish publisher

Chambers offers on its website its Chambers 21st Century Dictionary.4 Again, though not

really the same as the renowned Chambers English Dictionary, the 21st Century is still a

usable, solid reference work for general consultation.

The Encarta World English Dictionary,5 having originated in a cooperation between the

London-based Bloomsbury publisher and Microsoft, actually comes in two versions, and both

are available via the same website; there is the World English version, marketed as the

dictionary that provides unrivalled treatment of the regional varieties of English, and the

localized US version; the site provides an option to switch quickly between the two, and it is

fascinating to observe, by switching back and forth, the differences in the coverage of

regional terms and meaning, spelling and pronunciation.

2.2 Learners' dictionaries: the Big Five

According to data from internetworldstats,6 English is the foreign language of some 86% of

Europe’s active internet users. Now, given that English is today’s de facto lingua franca and

that WWW content in English dwarfs that in any other language, it becomes clear that

non-native speakers are a significant category of online dictionary users, present or future. In

this context, the category of English learners’ dictionaries comes into focus, since these are

the reference works designed specifically with the non-native speaker in mind. English

learners’ dictionaries enjoy a long-standing tradition, which goes back to around the 1940s

or, as some claim, the 1930s (cf. Cowie 1999). Their content has been meticulously reworked

over numerous successive editions, and because of the worldwide customer base and the

corresponding sales volumes, publishers of monolingual English learners’ dictionaries have

been able to take advantage of select teams of expert lexicographers. These dictionaries have

enjoyed high levels of prestige, and so have their traditionally British publishers.

The last few years have seen free versions of British monolingual dictionaries for advanced

learners appear online, one by one. On the whole, the major British monolingual learners’ dictionaries (MLDs) have followed a

pattern of remarkable similarity (Yamada 2010), perhaps as part of the competitive drive, and

this is also reflected in the features offered in their online versions. There is also a more

down-to-earth reason for the similarities found in a number of British MLDs: they tend to use

the same software dictionary production platform from IDM.

The range of available English MLDs opens with the pioneer in this segment, Oxford

Advanced Learner's Dictionary,7 a free version now roughly based on the 7th print edition. A

long-time competitor, Longman Dictionary of Contemporary English, currently in its fifth

edition, has also offered a free online version8 for some time. The dictionary’s landing page

specifically mentions a limitation of the free version: recordings of spoken pronunciation are

only available for a subset of headwords and example sentences (more specifically, the audio

is available for the entries in the letter stretches D and S). The note further states that audio

recordings for all entries are available in “the CD-ROM version”: this is not quite accurate, as

the optical disk version is actually offered on a DVD-ROM. But the free version is not the

only online version of this dictionary: there is also a radically different premium online

edition9, which offers essentially the same content as the off-line DVD-ROM version.

Cambridge Dictionaries Online10 represents an example of an institutional dictionary set (as

defined in 1.3 above): apart from the flagship Cambridge Advanced Learner’s Dictionary,

four other learners’ dictionaries from the publisher are available at the same address.

Amongst the major British learners’ dictionaries, Macmillan English Dictionary may well

be the one to have offered the most complete set of lexicographic content online11 free of

charge, including audio pronunciations of all headwords and a sense-linked thesaurus.

The one member of the Big Five set which has remained apparently sceptical when it comes

to offering free online access of any kind is COBUILD. Although it has offered subscription-

based access for some time,12 none of this is available freely, if we disregard an outdated 4th

edition being hosted on a third-party service.13 Recently, it looked as if COBUILD was set to

become the most widely used learner’s dictionary when, in autumn 2009, Google apparently

obtained a licence for COBUILD content and placed it online as the main Google dictionary

for English. This was a questionable choice, as COBUILD is not really well-suited for the

type of uses that Google users were most likely to need the dictionary for, i.e. problems with

text reception: of all the major learners’ dictionaries, COBUILD has the smallest coverage

(Rundell 2006). On the other hand, the features supporting text production would remain

underused. Google’s half-hearted implementation of the interface certainly would not have

made users more sympathetic towards the dictionary. For example, Google dictionary

included COBUILD’s syntactic codes, but without a word of explanation anywhere. Surely, it

is a long shot to assume that a casual user of the Google dictionary will appreciate the

significance of a code such as “NVAR” (in this case, an indication that a noun in the sense so

marked has both mass and individuated uses). Considering all this, it is not at all surprising

that in August 2010, COBUILD was replaced as the database for Google dictionary with The

Oxford American College Dictionary (Judy Pearsall, personal communication).

2.2.1 American learners' dictionaries

Although it is the British publishers that lead the market of monolingual English learners’

dictionaries, such dictionaries have also been published elsewhere, and one particular

dictionary that made a premiere recently with quite a bit of publicity is the Merriam-Webster's

Learner's Dictionary.14 What is unusual about this dictionary is that the launch of its

online version coincided with the publication of the first paper edition. The free online content

includes audio pronunciation, and the user interface is at least as good as those of the British

dictionaries, but despite the marketing claims, the lexicographic content itself is not

groundbreaking, and still lacks a number of modern features now taken for granted in the

leading British products (Bogaards 2010; Hanks 2009). The dictionary does have more

examples than the competition, but their quality has been questioned (Hanks 2009).

Despite what some might be led to believe, the Merriam-Webster's Learner's Dictionary is

by no means the first American dictionary of its type: several have already been published,

and at least one of them, Heinle's Newbury House Dictionary of American English,15 is freely

available online. However, the latter is a rather small dictionary and not a particularly

impressive one. All in all, learners of American English may actually be better off using

British-published dictionaries of American English, such as the Cambridge Dictionary of

American English.16

2.2.2 Louvain EAP Dictionary (LEAD)

Apart from the established publishers, some academic centres are also trying to enter the field

of learners’ dictionaries. One particularly promising project currently in progress (not yet

publicly accessible) is the Louvain EAP dictionary (LEAD), which is being developed as a

dictionary for non-native writers. Its main novelty is that it is customizable in terms of field

domain (business, medicine) and mother tongue (French, Dutch). In consequence, usage notes

and equivalents match the L1 of the user, and some of the examples are domain-specific. The

dictionary will also have (as you might expect from a product created at the Centre for

English Corpus Linguistics) a solid grounding in corpora, and integrated corpus access.

2.3 User-involvement (bottom-up) lexicography

In the democratic world of the internet, users can play lexicographer as well and create their

own online dictionaries. There is quite an impressive range of these, but let us have a look at

three representative exemplars:

2.3.1 Urban dictionary

A success story in its own right, the Urban dictionary17 is a true bottom-up initiative which

recently celebrated its 10th anniversary. One of the community features exemplified here is

that users vote on the “best” definitions. But such democracy does not necessarily serve

lexicography well: as it turns out, the most liked definitions are not of the type that would

really help someone who does not already know the meaning. Clearly, true explanatory

definitions are too predictable and thus not “interesting” enough, and are being pushed back to

the bottom of the list. Instead, collaborative dictionary entries, unless properly moderated,

tend to become the playing ground for showing off wit, marking in-group membership and

venting prejudice. For example, one entry at the headword BOOTYISM runs as follows: “The

gospel according to Beyonce. Often confused with Buddhism.” This entry is written in an

abbreviated style posing as lexicographese, and manages to allude rather cleverly to the

semantics as well as origins of the slang term, but it would probably not be of much help to a

user who has no clue about the meaning. In this case, the author seems to be aware of this

deficiency, and makes up for it in the (entirely invented) example exchange:

Todd: I'm thinking about converting to Bootyism.

Michael: Nah man, it's BUDDHISM.

Todd: No, cause in Bootyism all you do is worship ass.

2.3.2 Wiktionary

Wiktionary18 may be the ultimate collaborative dictionary. A recent in-depth analysis of this

resource (Fuertes-Olivera 2009) presents a number of interesting findings. It is observed that,

contrary to what is often claimed, Wiktionary is not a multilingual dictionary but an English

dictionary with a translation overlay for several other languages. It is also noted that very

similar items may receive radically different treatments, lacking internal consistency and

contradicting the Wiktionary guidelines.

2.3.3 Wordnik

Wordnik19 presents an interesting blend of online dictionary genres, involving a collaborative

community-driven component built around a “professional” core. According to the founder

Erin McKean (personal communication), user-generated content is encouraged here but in

"guided" ways, with less emphasis on user-created definitions than is usual in collaborative

projects. Wordnik embeds content from other datasets: at this time, Twitter and Flickr are

being tapped for real-time citations and relevant images, respectively. The service employs

modern data mining techniques to identify in corpora citations of the self-defining and

exemplar types (McKean, personal communication). Overall, there is less reliance on

traditional definitions and the emphasis is shifted to citations.

2.3.4 Collaborative-institutional dictionaries

Commercial publishers also try to get their users actively interested and involved in

lexicography, perhaps in an effort to persuade them to stay on the site and come back for

more. Examples of collaborative sections hosted on institutional dictionary sites suggest that

the opposition institutional versus collective dictionary (Fuertes-Olivera 2009) may no longer

be a sharp one. Two such examples from well-known institutional publishers are the

Merriam-Webster Open Dictionary20 and Macmillan Open Dictionary.21 A perusal of the

user-added entries reveals that most of the entries added would not meet the criteria for

inclusion in the regular edition of the dictionary, and their presence merely provides evidence

of the conventional wisdom that “the dictionary” is a collection of “all the words” of a

language.

Apart from adding open dictionary components, online dictionaries sometimes offer other

extras aimed at involving the users. Recent add-ons include social networking features, such

as the award-winning Macmillan Dictionary blog.22

So far we have discussed general dictionaries of contemporary English, aimed at both

native speakers of English and foreign learners. Let us now move beyond these common

types, to diachronic and specialized dictionaries.

2.4 Diachronic (historical) dictionaries

Users of diachronic dictionaries are most typically language scholars, and so their level of

sophistication and language awareness is normally far beyond that of lay users. As language

experts, they can reasonably be trusted to make choices that a non-expert user will not be in a

position to make, such as the explicit selection of microstructural data categories (and we will

revisit the issue of customization in a later part of this article). The makers of academic

diachronic dictionaries appear to be aware of these ramifications, as exemplified by the online

version of what is perhaps the most famous dictionary world-wide (at least for English), the

Oxford English Dictionary. Access to the OED is subscription-based, and affiliated scholars

would most of the time rely on their institutional subscription rather than a personal one. In

contrast, a more restricted (in terms of period) but no less voluminous Middle English

Dictionary23 has been freely available online since 2007, when the University of Michigan

completed the digitization process with the help of a government grant. The dictionary offers

a rather large number of technically complex search options but these should be manageable

for language scholars and their students.

2.5 Subject field dictionaries

There are countless online specialized dictionaries out there on the web, most of them fairly

small in size, dealing with the vocabulary of a specific subject field (as well as narrower sub-

fields). Because of the sheer number, many users will find it useful to consult online

directories of such dictionaries, one of the most comprehensive being Glossarist.com: an

example of a dictionary portal as listed in my provisional taxonomy under 1.3 above.

Indexing portals of this type only include links to dictionaries on external pages, without

themselves hosting or displaying actual lexicographic content.

The lexicographic wisdom that content and presentation are largely two separate aspects is

strengthened by those products where there is a sharp contrast in quality between one and the

other. One case in point is Dorland's Medical Dictionary24 from the respectable pair Merck

Medicus and Elsevier, where solid content is marred by the uninspired (to say the least)

access interface. Users are presented with a long chain of alphabetic stretches which have to

be navigated linearly in a fashion resembling page-turning, only much slower (although there

is a term search window, it does not apply to the dictionary itself, but to other services). To

metalexicographers, this dictionary serves as a warning against sweeping generalizations

about electronic dictionaries being faster and superior in terms of access: apparently, it is

perfectly possible to produce an online dictionary where access is more cumbersome than in a

paper book.

2.6 Dictionaries with restricted macrostructure

One way to think of special-purpose dictionaries is that they often involve systematically

restricted treatment in either macrostructure or microstructure. In the former case, only a

distinct subset of the vocabulary is included in the wordlist. Field dictionaries, already

covered in 2.5 above, may be included here. Another exemplar of a restricted macrostructure

dictionary is the well-known and successful Acronym Finder25, which aims to include

acronyms, both those pronounced as one word and those pronounced letter by letter (the latter sometimes called

initialisms). Although Acronym Finder does not limit its headword list to English acronyms, it

is a fact that English very clearly dominates.

2.7 Dictionaries with restricted microstructure

In contrast to dictionaries with restricted macrostructure, restricted-microstructure dictionaries

are characterized by a systematic reduction, not in the word list itself, but in the lexicographic

data categories presented at each entry, compared to a general dictionary. The free Online

Etymology Dictionary26 is a representative of the genre: the lexicographic data for a given

headword is restricted to an explanation of the word’s origins.

Pronouncing dictionaries are another major category of restricted-microstructure

dictionaries, where the chief lexicographic data given indicates the phonetic form of the entry

word. Semantic information is only given in exceptional cases, such as to disambiguate

between graphemically identical words that are pronounced differently (i.e. homographs that

are not homophones). There is the question of the exact form in which information on

pronunciation is conveyed. In printed books, transcription (in one of a number of standards,

the most universal being the IPA) used to be the only option, but in the multimedia

environment of the web, the expectation of users is to be able to hear an audio rendition of an

item’s pronunciation. This expectation is met by the popular free online talking English

dictionary howjsay.com,27 which provides recorded audio clips, but no written transcription.

At the other end of the cline are academic pronouncing dictionaries such as the Carnegie

Mellon University Pronouncing Dictionary,28 which presents transcriptions in the ARPAbet

respelling system, or Péter Szigetvári’s English Pronouncing Dictionary,29 which employs a

variant of the SAMPA respelling system.

There is no denying that being able to hear what the word or phrase sounds like is an asset,

but does this mean, as most people seem to assume, that phonetic transcription is now

dispensable? It probably is for native speakers of English, but hardly so for speakers of other

languages looking up English pronunciation. For them, it is an illusion to believe that just

hearing a word pronounced in a foreign language is enough to register, less still learn, its

correct pronunciation. Due to the effect known as categorical perception, speakers of a

language tend to hear foreign language sounds through the filter of their native language

phonology. Consequently, what foreigners hear is mostly the sounds of their native language,

and they tend to miss the distinctions not present in it. For example, a speaker of

Polish may easily miss the difference between met and mat. The important advantage of

phonemic transcription is that it provides an explicit graphic representation of the phonemes

involved, drawing attention to the phonemes as entities. (This is not to say that the two

academic dictionaries cited above do this in a very user-friendly way: they do not.) Of course,

it is also true that efficient use of phonetic transcription does not usually come naturally for a

language learner and requires guided training.

But that is not the end of the story. Apart from pure phonemic identity, there is the

important subphonemic phonetic detail, including positional allophony which, again, is very

hard to hear for the untrained learner. Although traditional printed pronouncing dictionaries

tend not to give subphonemic detail, there is no principled reason why future online

dictionaries should not be able to offer a choice of the level of transcription, including a

narrow-phonetic rendition for those who might want or need it. Technically, it should not be

terribly difficult to take stock of at least the rule-based variants.
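
To illustrate how rule-based variants might be generated, the following Python sketch derives a narrow rendition from a broad phonemic transcription. It applies two textbook allophonic rules: aspiration of /p t k/ at the onset of a stressed syllable, and dark /l/ word-finally or before a consonant. The rule set, the vowel inventory and the function name narrow are illustrative assumptions, not the method of any dictionary discussed here.

```python
import re

# Assumed (incomplete) inventory of IPA vowel symbols, for illustration only.
VOWELS = "iɪeæɑɒɔʊuʌəɜaɛ"

# Each rule is a (pattern, replacement) pair applied to the broad transcription.
RULES = [
    # Aspiration: /p t k/ immediately after the stress mark, i.e. as the first
    # sound of a stressed syllable. Stops after /s/ (as in "spill") are left
    # alone, because the stress mark then precedes the /s/, not the stop.
    (re.compile(r"(ˈ)([ptk])"), r"\1\2ʰ"),
    # Dark l: /l/ that is not followed by a vowel (word-final or pre-consonantal).
    (re.compile(rf"l(?![{VOWELS}])"), "ɫ"),
]


def narrow(broad: str) -> str:
    """Return a narrower transcription by applying each rule in turn."""
    result = broad
    for pattern, replacement in RULES:
        result = pattern.sub(replacement, result)
    return result


for word in ("ˈpɪl", "ˈspɪl", "ˈlɪp", "əˈtæk"):
    print(word, "->", narrow(word))
```

A real implementation would need a much fuller rule set and proper syllabification, but the principle of offering a broad and a narrow view of the same underlying transcription stays the same.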

As noted by Sobkowiak (2009), phonetic transcription has a representational function and

an indexical function. The former has to do with the representation of the phonetic form of a

word (or, more generally, other linguistic string). The indexical function allows the user to use

symbols for accessing (sets of) lexical items, such as when looking for words that exhibit a

given phonetic pattern. A systematic transcription system is at present a prerequisite for the

indexical function to be possible, although not all dictionaries that do have transcription

allow ‘sound search’ options. Clearly, of the three free pronouncing dictionaries here

presented, Szigetvári’s English Pronouncing Dictionary is the most sophisticated in this

respect.
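
The indexical function can be sketched in a few lines of Python. The toy lexicon below uses hand-typed ARPAbet-style transcriptions without stress marks. The query syntax ("?" for any single phoneme, "*" for any sequence of phonemes) and the function names are assumptions made for this illustration and do not reproduce the search interface of any of the dictionaries mentioned.

```python
# Toy lexicon: orthographic form -> space-separated ARPAbet-style phonemes.
# The entries are hand-typed for illustration.
LEXICON = {
    "met": "M EH T",
    "mat": "M AE T",
    "team": "T IY M",
    "steam": "S T IY M",
    "dream": "D R IY M",
    "tooth": "T UW TH",
}


def matches(pattern, phones):
    """Match a list of pattern symbols against a list of phonemes.

    "?" stands for exactly one phoneme, "*" for zero or more phonemes.
    """
    if not pattern:
        return not phones
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Let the wildcard absorb 0, 1, 2, ... phonemes.
        return any(matches(rest, phones[i:]) for i in range(len(phones) + 1))
    if not phones:
        return False
    if head == "?" or head == phones[0]:
        return matches(rest, phones[1:])
    return False


def sound_search(query, lexicon):
    """Return all words whose transcription fits the phonetic pattern."""
    pattern = query.split()
    return sorted(w for w, t in lexicon.items() if matches(pattern, t.split()))


print(sound_search("* IY M", LEXICON))  # words ending in IY M
print(sound_search("M ? T", LEXICON))   # M + any one phoneme + T
```

The second query retrieves exactly the kind of minimal pair (met, mat) that a learner may fail to distinguish by ear. Such a search is only possible because the transcription is systematic.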

2.8 Onomasiological dictionaries

Onomasiological dictionaries are those that are specifically designed to take the user from a

concept or idea to linguistic form, rather than explaining the meaning or use of a given form.

A traditional paper dictionary of this type would most typically be a thesaurus or synonym

dictionary. Thesaurus.com30 is a companion site to the popular Dictionary.com aggregator

(see 3.1 below). A more interesting online example of such a dictionary is RhymeZone,31

which started off as a synonym dictionary calling itself the Semantic Rhyming Dictionary.

Somewhat predictably, probably because of the phrase “rhyming dictionary” in the name,

users arrived at the dictionary from search engines looking for traditional phonetic rhymes,

and this is what the default search mode now offers. In fact, searching for rhyming words is

also an onomasiological query, albeit in a broader sense. In the more restricted sense of

onomasiological, the dictionary offers lists of synonyms, antonyms and “related words”. For

these, RhymeZone relies on data from the English WordNet32 lexical database, just as so

many other lexical resources do these days: it has become one of the favourite datasets for

many online dictionaries, because it is free and NLP-tractable in ways that make such

integration relatively easy.

One interesting way in which WordNet data is used is in graphic visualization engines such as

VisuWords33 or Visual Thesaurus,34 where the idea is to represent WordNet’s lexical relations

in a visually appealing graphical form. The latter now shows up in Cambridge Dictionaries

Online entries.

Having completed a quick tour of the representative online dictionaries of English, we will

move on to a number of overarching issues that are relevant and topical for online dictionaries

of today and tomorrow.

3 Some issues in online dictionaries

3.1 The dictionary web

The World Wide Web is built around the concept of hypertext, where texts, documents and

media make up an interconnected network. Like most other sites, online dictionaries

hyperlink, interlink, embed and integrate, and it will not take long for a careful user of online

dictionaries to start noticing that quite a lot of the same content crops up again and again on a

variety of dictionary sites. For example, the very same Visual Thesaurus images which feature

in Cambridge Dictionaries Online are also present at the Dictionary.com35 site. The latter is an

example of a dictionary resource which does not rely on its own data, but instead aggregates

lexicographic content from other electronic (online) dictionaries. Dictionary.com is a

particularly popular such aggregator. The popularity, one might suspect, has a lot to do with

the attractive domain name, which to many users (and search engines?) strongly suggests that

this is the Dictionary (see e.g. Béjoint 2010 on the popular image of the dictionary). As of this

writing, the resource aggregates lexicographic content from 15 dictionaries, including the

American favourites Random House Dictionary and American Heritage Dictionary, as well as

half a dozen special-purpose and special-subject dictionaries.

Another aggregator is TheFreeDictionary, with American Heritage Dictionary (again!),

WordNet (again!), and Collins English Dictionary (and Thesaurus). The resource is worth

consulting for this last one, as this time (compare 2.1.2 above), it is indeed the respectable

Collins English Dictionary, which is generally not freely available elsewhere.

While the ability to hyperlink and embed is one that lies at the heart of the World Wide

Web, in dictionary aggregators the idea is taken to extremes, with the result that such

dictionary portals produce absurdly long entries by mechanically pasting together, back-to-

back, entries from several online dictionaries. These individual entries are often very similar,

which results in highly unhelpful, many-times redundant, tortuous assemblages of

disconnected lexicographic data.

3.2 Access

Electronic dictionaries, including online dictionaries, are often praised for their access

functionality, which is claimed to be superior compared to paper book form. Clearly, the

electronic interface is by definition more flexible and has a potential for efficiency that is not

achievable in static printed form, but it is also true that this potential is not always properly

utilized, especially if the online dictionary is retrospectively digitalized (Wiegand et al. 2010:

209). One example of a respectable online dictionary with paper-like access is the American

Heritage Dictionary, which has no search facility at all; worse still is Dorland's Medical

Dictionary (see 2.5 above), where outer access is even slower and more cumbersome than in a

printed book. However, some online dictionaries do take advantage of the electronic media

and explore alternative access routes. As an illustration, let us consider the problem of

access in cases where a search term potentially returns large amounts of data.

3.2.1 The step-wise approach to outer access?

Over ten years ago, Hulstijn and Atkins (1998) proposed what they called “step-wise access”

for electronic dictionaries. In this connection, it is interesting to observe how this proposal

stands up in view of the practical implementations in online English dictionaries. For this, we

need to examine the volume of data that a dictionary presents to the user in those cases when

a search term matches more than a single treatment unit, such as multiple lemmata (for example,

items of different parts of speech), or includes multi-word expressions (MWEs), such as fixed

phrases, idioms or phrasal verbs. The spectrum of actual solutions seen in English online

dictionaries can essentially be reduced to three options:

1. a menu of target items is presented;

2. a menu is presented, but the most likely choice opens by default;

3. partial entries are listed.

The first option, by far the most common, can be illustrated using Macmillan Dictionary

Online as an example. Here, a search on a word-long string team returns a vertical menu of

nine matches, each one hyperlinked to an entry or subentry. The top of the menu looks like

this:

team NOUN

team VERB

dream team NOUN

sales team NOUN

...

Option 2. features in the Merriam-Webster's Advanced Learner's English Dictionary, where

a search for team produces a similar list of seven items, but the first of these (here again, team

NOUN) is already given as the full entry immediately below the list.

Option 3. is implemented in the online dictionary at myCOBUILD.com,36 available to

buyers of the printed copy of the Collins COBUILD Advanced Dictionary. The approach is an

intermediate one between a bare lemma list (Option 1.) and complete entries (Option 2.). As

seen in Figure 1, showing the entry TEAM in myCOBUILD.com, the dictionary interface alerts

the user that multiple entries have been found, and then displays the top of each lemma with a

More link leading to the complete entry for that lemma.

Figure 1: The entry for TEAM in myCOBUILD.com as an example of a stepwise interface

Which of the three options is best? A universal answer, ignoring lexicographically relevant

details such as the nature of the lookup situation and specific user needs and skills, rarely

makes sense in lexicography, but let us offer some observations that might have a more

universal appeal. Option 2. looks attractive, but there is a danger here that users may fail to

recognize that the default choice (as here team NOUN) is the wrong one in their case. In

contrast, Option 1. seems relatively safe in terms of the risk of missing the right option, but

the problem here lies in the economy of effort (aka laziness): users may lack the patience to

navigate through the menu to actual full treatment, and may decide instead to ditch a tool

which requires too much clicking work. In view of the above reservations, Option 3. might

perhaps be optimal (other things being equal), and it is surprising that so few dictionaries have

adopted it.
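
The three presentation options discussed in this section can be made concrete with a small Python sketch. The Entry class, the function names and the placeholder entry texts are all hypothetical. They model the behaviour described above and are not taken from any publisher's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    headword: str
    pos: str
    body: str  # full entry text; placeholder strings are used below


# Hypothetical matches for a search on "team".
MATCHES = [
    Entry("team", "NOUN", "placeholder full entry for the noun team"),
    Entry("team", "VERB", "placeholder full entry for the verb team"),
    Entry("dream team", "NOUN", "placeholder full entry for dream team"),
]


def menu(matches):
    """Option 1: a bare menu of target items."""
    return [f"{e.headword} {e.pos}" for e in matches]


def menu_with_default(matches):
    """Option 2: the menu, followed by the full first entry opened by default."""
    lines = menu(matches)
    if matches:
        lines.append(matches[0].body)
    return lines


def partial_entries(matches, preview_chars=11):
    """Option 3: the top of each entry, with a More link when text is cut."""
    lines = []
    for e in matches:
        preview = e.body[:preview_chars]
        more = "... [More]" if len(e.body) > preview_chars else ""
        lines.append(f"{e.headword} {e.pos}: {preview}{more}")
    return lines


for line in partial_entries(MATCHES):
    print(line)
```

Seen this way, Option 3 differs from Option 1 only in how much of each entry is shown before the user clicks, which suggests that it should not be costly to implement.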

3.3 Customization and profiling in online English dictionaries

A recent study by Tono (2011), the first dictionary use study ever to employ eye tracking,

confirms the suspicion that dictionary users differ greatly in their consultation habits and

strategies. The realization that different users have different needs and expectations lies

behind efforts to vary or customize e-dictionaries (De Schryver 2009; Verlinde et al. 2010),

and, indeed, in some online dictionaries of English we have reviewed above, users do have

some ability to control the presentation of lexicographic data.

Oxford English Dictionary online has control buttons to display or hide away the following

data types: Pronunciation, Spellings, Etymology, Quotations, Date Chart, Additions. It should

be observed that this solution is not really lexicographic-function-driven (Tarp 2008), as the

user here is required to explicitly select the data fields included in the dictionary. However,

the users of an academic dictionary such as this usually represent a high level of

sophistication (many being language scholars), and so they are much more likely than naive

users to know directly and explicitly what data types they actually need.

Macmillan English Dictionary Online offers two pre-packaged presentation modes which

can be selected by flipping the Show Less/Show More control button located next to the

lemma sign. The choice is suggestive of the difference between a text reception mode and a

text production mode, respectively. Switching to the more basic mode hides away the

phonetic transcription, collocations (with examples), grammar labels and some of the

examples.

However, synonym links are still included, even though, arguably, a synonym list is not

very useful for text reception. Only a minority of dictionary users will be aware that the

dictionary has a third, even simpler mode, available via the so-called interstitial page,

accessible from collaborating news sites37 by double-clicking on any word in the text (luckily,

the engine includes lemmatization, so the word-form stealing takes the user to the lemma

STEAL). In this mode, all examples and synonyms are now absent, as one would expect in true

reception mode.
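
The pre-packaged modes just described amount to showing different subsets of the same data fields. The Python sketch below models this in a simplified way. The field names, mode names and placeholder entry are hypothetical, and the partial hiding of examples in the more basic mode is not modelled.

```python
# Which data fields each presentation mode displays (an assumed simplification
# of the behaviour described in the text).
MODES = {
    "show_more": {"definition", "transcription", "grammar",
                  "collocations", "examples", "synonyms"},
    "show_less": {"definition", "examples", "synonyms"},
    "interstitial": {"definition"},
}

# Placeholder entry: the values stand in for real lexicographic data.
ENTRY = {
    "definition": "(definition text)",
    "transcription": "(phonetic transcription)",
    "grammar": "(grammar labels)",
    "collocations": "(collocations with examples)",
    "examples": "(example sentences)",
    "synonyms": "(synonym links)",
}


def render(entry, mode):
    """Return only the fields that the chosen mode displays."""
    visible = MODES[mode]
    return {field: value for field, value in entry.items() if field in visible}


for mode in MODES:
    print(mode, "->", sorted(render(ENTRY, mode)))
```

In a function-driven design, the mode would be selected from the user's situation (reception or production) rather than from a list of data types.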

User profiling is one of the highlights in the new Louvain EAP dictionary (see also 2.2.2

above), now in development, where the content presented depends on the user-selected native

language and discipline (field domain) of interest.

3.4 Multimedia in online dictionaries

Online dictionaries can potentially include a range of multimedia content. The potential is

utilized in online dictionaries of English to varying degrees.

3.4.1 Graphics

Graphical elements are not the sole domain of electronic dictionaries, as drawings, and (to a

lesser extent) photographs, diagrams and tables have been used for a long time in paper

dictionaries. However, pictorials are more easily and cheaply included in electronic

dictionaries (Lew 2010). For example, illustrations are present in some entries in Cambridge

Dictionaries Online or the free online version of Longman Dictionary of Contemporary

English.

Thanks to the linkability of the web, it is quite possible to embed media from other

providers. However, one has to reckon with the ramifications of limited control over

hyperlinked content. For example, between (roughly) November 2009 and June 2010, the

Google Dictionary used to display popular images from Google’s own image search service

next to some entries. As a consequence, the Google Dictionary entry for KILT included a

photograph which, likely without conscious intent, conveyed all too clearly the cultural

information that kilts need no accompanying underwear (in the interest of propriety, no

screenshot is included in this article). As of this writing, the Google Dictionary has

discontinued the inclusion of images.

3.4.2 Audio

It is becoming increasingly popular for online dictionaries of English to offer audio recordings

of entry words. However, recordings of other verbal elements (definition, examples) are rarely

included: of the dictionaries discussed in this article, it is only the subscription version of

LDOCE which offers spoken recordings of all example sentences. One novel use of audio is

to present characteristic sounds associated with the entry word: an interesting subgenre of

ostensive defining. Proposals to include such elements in electronic dictionaries have been

made by Dodd (1989: 91) and Ooi (1998: 112). Dodd called them sound effects, and such

recordings are now available in the free Macmillan English Dictionary Online. There, the user

can hear the sounds produced by musical instruments under their relevant headwords, both

popular ones (GUITAR, PIANO, VIOLIN, RECORDER), as well as the less well-known (SITAR).

Animal noises and bird calls are likewise included (ROAR, HOOT: perhaps also worth linking

under the entries LION and OWL), as well as sounds made by humans (CLAP, LAUGH, HICCUP),

and noisy machines (TRAIN, HELICOPTER).

3.4.3 Video and animation

With the speed of the internet steadily on the increase, video content is becoming mainstream

on the web. However, English online dictionaries have not really embraced the video

technology so far. This caution may, in fact, be well-motivated: Chun and Plass (1996) point

out that video sequences are too transient to allow the spectator to build a stable mental

model. Thus, videos may not make good cognitive sense, because the viewer may be unable

to pace the information processing at the rate that works for them.

Similar reservations can be raised for animated graphics, and there is at least one empirical

study which appears to substantiate the pessimistic view of the effectiveness of animations, at

least for dictionary-induced vocabulary learning. Lew and Doroszewska’s recent study (2009)

found a strong and significant negative impact of viewing animations on vocabulary retention.

3.5 Dictionaries, corpora and lexical databases

We have repeatedly seen above online dictionaries using WordNet data. In fact, WordNet is

often loosely referred to as a “dictionary”, even though, in more careful usage, it is a lexical

database rather than a dictionary. I suspect that for the average user, the distinction is too fine

a point. Yet, if we look at the recent history of dictionary-making, we see the growing role of

information technology and structured data: corpora, databases, the use of structured markup

such as XML. The current trend then is towards a clearer separation of the data layer from

presentation, in line with Sue Atkins’ visionary proposal (1996). Increasingly, the dictionary

as the user sees it is likely to be but an epiphenomenon on a structured lexical database or

corpus, and the presentation layer is set to become an automated procedure, requiring little or

no human intervention (De Schryver 2009; Atkins et al. 2010; Kilgarriff and Rychlý 2010)

(also see Almind and Nielsen, this volume, Gouws, this volume?).

Indeed, as corpus interfaces and wrappers get increasingly sophisticated, they can be used

in some ways similar to dictionaries, so that even a more cultured user may not care what’s

“under the hood” as long as the interface can be used as a sort-of dictionary. As an example,

consider the fully automatic collocations dictionary ForBetterEnglish.com,38 which uses the

SketchEngine and GDEX technologies (Kilgarriff et al. 2008) on server-resident corpora to

automatically produce entries such as the one in Figure 2. Clearly, it takes quite an expert to

tell that this is not your usual human-made dictionary entry. The illusion would have been

even better if the type-of-collocation indicators (object_of, etc.) had been given less technical

and more user-friendly names.

Figure 2: Entry for TOOTH in the ForBetterEnglish.com automated collocations dictionary
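
The separation of the data layer from presentation, together with the relabelling of technical indicators suggested above, might look roughly as follows. This Python sketch uses the standard xml.etree.ElementTree module. The XML schema, the collocates and the user-friendly label are invented for illustration and do not reflect the actual data format of ForBetterEnglish.com or the SketchEngine.

```python
import xml.etree.ElementTree as ET

# Data layer: a structured entry (hypothetical schema and hypothetical data).
ENTRY_XML = """
<entry>
  <headword>tooth</headword>
  <pos>noun</pos>
  <collocations relation="object_of">
    <collocate>brush</collocate>
    <collocate>clean</collocate>
  </collocations>
</entry>
"""

# Presentation layer: technical relation names mapped to friendlier labels.
LABELS = {"object_of": "verbs taking the headword as object"}


def render(xml_text):
    """Turn a structured entry into display text without human intervention."""
    entry = ET.fromstring(xml_text.strip())
    headword = entry.findtext("headword").upper()
    lines = [f"{headword} ({entry.findtext('pos')})"]
    for group in entry.findall("collocations"):
        relation = group.get("relation")
        label = LABELS.get(relation, relation)  # fall back to the raw name
        words = ", ".join(c.text for c in group.findall("collocate"))
        lines.append(f"{label}: {words}")
    return "\n".join(lines)


print(render(ENTRY_XML))
```

Because the labels live in the presentation layer, they can be changed for different user groups without touching the underlying data.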

Another corpus-based online resource, also having to do with English collocations,

JustTheWord,39 is even capable of correcting unnatural word combinations. Figure 3 shows

the output for the query POWERFUL TEA with the “find alternatives” option selected. The

interface indicates whether the word combination is “good” (green bar on the right, colours

not shown in print), or “bad” (red bar), and the length of the bar indicates the (un)typicality of

the word combination. Further, the narrow blue bar directly underneath each combination

indicates the degree of meaning similarity between the combination to be replaced and each

candidate for replacement. Here, the collocation strong tea has the longest blue bar, and

indeed this is the idiomatic phrase that a learner of English would have wanted to use instead

of the non-idiomatic powerful tea, had they known any better themselves. All in all, the

information provided is very useful and relevant, and it may actually be hard to believe that

this output has been computed fully automatically.

Figure 3: JustTheWord alternative collocation suggestions for ‘powerful tea’

There exist other “smart” interfaces to corpora. One of them is http://corpus.byu.edu,

created and maintained by Mark Davies, and it offers free access to several corpora, including

the Corpus of Contemporary American English (COCA),40 currently the largest publicly

available corpus of English. Another one is the SketchEngine,41 available by subscription. A

subset of the British Academic Spoken English corpus is available through IBM’s clever Many

Eyes42 visualizing interface, allowing the user to investigate the syntagmatic

relationships of the most common words, though it is not all that useful for the less common

combinations, due to small corpus size. A rich and comprehensive lexical database of English

with a dictionary-like interface will very soon become publicly available online as part of the

DANTE43 project.

These resources represent a high level of sophistication and so there is not much hope that

their popularity will extend much beyond a relatively small group of power users; the others

will just increasingly Google for any answers, irrespective of the nature of the problem, and I

fear that this tendency presents a real threat to more specialized reference tools, including

dictionaries.

4 Summary and conclusion

In our necessarily sketchy overview of English online dictionaries, we have seen that a great

variety of dictionaries exist, but without proper guidance users run the risk of getting lost in

the riches. It is surprising to see so many of the online dictionaries (including some from

respectable publishers) still largely constrained by the paper model, with access mechanisms

to lexicographic data often being substandard for today’s technology. Furthermore, users may

get flooded with irrelevant and highly repetitive information, especially by dictionary

aggregators. And even if hyperlinking to external sources embodies the best practice in

hypertext philosophy, it is not without danger, as it relinquishes much of the control over the

content of “our” dictionary page. More generally, the universal use of search engines (or one

dominant search engine) presents a risk of dictionaries (or any specialized online works of

reference) being marginalized. Finally, learners of English are still waiting for a function-

driven lexical resource of the type represented by the excellent Base lexicale du français44

(Verlinde et al. 2010 and this volume).

Notes

1 http://dictionary.cambridge.org

2 http://oxforddictionaries.com

3 http://www.collinslanguage.com

4 http://www.chambersharrap.co.uk/chambers/features/chref/chref.py/main

5 http://encarta.msn.com/encnet/features/dictionary/dictionaryhome.aspx

6 http://www.internetworldstats.com

7 http://www.oup.com/elt/catalogue/teachersites/oald7/lookup?oup_jspFileName=document.jsp&cc=pl

8 http://www.ldoceonline.com

9 http://ldoce.longmandictionariesonline.com/dict/SearchEntry.html

10 http://dictionary.cambridge.org

11 http://www.macmillandictionary.com

12 http://www.mycobuild.com

13 http://dictionary.reverso.net/english-cobuild

14 http://www.learnersdictionary.com

15 http://nhd.heinle.com/home.aspx

16 http://dictionary.cambridge.org/Default.asp?dict=A

17 http://www.urbandictionary.com

18 http://en.wiktionary.org

19 http://www.wordnik.com

20 http://www3.merriam-webster.com/opendictionary/

21 http://www.macmillandictionary.com/open-dictionary/latestEntries.htm

22 http://www.macmillandictionaryblog.com, winner of the 2009 Edublog award for best education blog on the

web

23 http://quod.lib.umich.edu/m/med

24 http://www.merckmedicus.com/pp/us/hcp/thcp_dorlands_content_split.jsp?pg=/ppdocs/us/common/dorlands/drlnd/misc/dmd-a-b-000.htm

25 http://www.acronymfinder.com

26 http://www.etymonline.com

27 http://www.howjsay.com, the domain name being an eye-dialect rendition of the casual pronunciation of the

phrase ‘how do you say?’

28 http://www.cmu.edu

29 http://seas3.elte.hu/epd.html

30 http://thesaurus.com/?regHome=true

31 http://www.rhymezone.com

32 http://wordnetweb.princeton.edu

33 http://www.visuwords.com

34 http://www.visualthesaurus.com

35 http://dictionary.reference.com

36 http://www.myCobuild.com

37 One example is http://www.shanghaidaily.com

38 http://forbetterenglish.com

39 http://193.133.140.102/justTheWord, Sharp Laboratories

40 http://www.americancorpus.org

41 http://www.sketchengine.co.uk

42 http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/3e335458358611de909d000255111976

43 http://www.webdante.com

44 http://ilt.kuleuven.be/blf

References

Atkins, Beryl T. Sue. 1996. ‘Bilingual Dictionaries - Past, Present and Future’ in Gellerstam,

Martin, Jerker Jarborg, Sven-Göran Malmgren, Kerstin Noren, Lena Rogström and

Catarina Röjder Papmehl (eds.), EURALEX '96 Proceedings. Göteborg: Department of

Swedish, Göteborg University, 515-546.

Atkins, Beryl T. Sue, Adam Kilgarriff and Michael Rundell. 2010. ‘Database of Analysed

Texts of English (Dante): The Neid Database Project’ in Dykstra, Anne and Tanneke

Schoonheim (eds.), Proceedings of the XIV Euralex International Congress. Ljouwert:

Afûk, 549-556.

Béjoint, Henri. 2010. The Lexicography of English. From Origins to Present. Oxford:

Oxford University Press.

Bogaards, Paul. 2010. ‘The Evolution of Learners' Dictionaries and Merriam-Webster's

Advanced Learner's English Dictionary’ in Kernerman, Ilan and Paul Bogaards (eds.),

English Learners' Dictionaries at the DSNA 2009. Tel Aviv: K Dictionaries, 11-27.

Carr, Michael. 1997. ‘Internet Dictionaries and Lexicography.’ International Journal of

Lexicography 10.3: 209-230.

Chun, Dorothy M. and Jan L. Plass. 1996. ‘Effects of Multimedia Annotations on

Vocabulary Acquisition.’ Modern Language Journal 80.2: 183-198.

Cowie, Anthony Paul. 1999. English Dictionaries for Foreign Learners: A History. Oxford:

Clarendon Press.

De Schryver, Gilles-Maurice. 2009. ‘State-of-the-Art Software to Support Intelligent

Lexicography’ in Zhu, R. (ed.), Proceedings of the International Seminar on Kangxi

Dictionary & Lexicology. Beijing: Beijing Normal University, 565–580.

Dodd, W. Steven. 1989. ‘Lexicomputing and the Dictionary of the Future’ in James, Gregory

(ed.), Lexicographers and Their Works. Exeter Linguistic Studies 14. Exeter: Exeter

University Press, 83-93.

Fuertes-Olivera, Pedro A. 2009. ‘The Function Theory of Lexicography and Electronic

Dictionaries: Wiktionary as a Prototype of Collective Free Multiple-Language

Internet Dictionary’ in Bergenholtz, Henning, Sandro Nielsen and Sven Tarp (eds.),

Lexicography at a Crossroads: Dictionaries and Encyclopedias Today,

Lexicographical Tools Tomorrow. Linguistic Insights - Studies in Language and

Communication, Vol.90. Bern: Peter Lang, 99-134.

Hanks, Patrick. 2009. ‘Review of Stephen J. Perrault (Ed.). 2008. Merriam-Webster's

Advanced Learner's English Dictionary.’ International Journal of Lexicography 22.3:

301-315.

Hulstijn, Jan H. and Beryl T. Sue Atkins. 1998. ‘Empirical Research on Dictionary Use in

Foreign-Language Learning: Survey and Discussion’ in Atkins, Beryl T. Sue (ed.),

Using Dictionaries. Studies of Dictionary Use by Language Learners and Translators.

Lexicographica Series Maior 88. Tübingen: Niemeyer, 7-19.

Kilgarriff, Adam, Milos Husak, Katy McAdam, Michael Rundell and Pavel Rychlý.

2008. ‘GDEX: Automatically Finding Good Dictionary Examples in a Corpus’ in

Bernal, Elisenda and Janet DeCesaris (eds.), Proceedings of the XIII EURALEX

International Congress. Barcelona: Universitat Pompeu Fabra, 425-432.

Kilgarriff, Adam and Pavel Rychlý. 2010. ‘Semi-Automatic Dictionary Drafting’ in De

Schryver, Gilles-Maurice (ed.), A Way with Words: Recent Advances in Lexical

Theory and Analysis. A Festschrift for Patrick Hanks. Kampala: Menha Publishers,

299-312.

Lew, Robert. 2010. ‘Multimodal Lexicography: The Representation of Meaning in Electronic

Dictionaries.’ Lexikos 20.

Ooi, Vincent Beng Yeow. 1998. Computer Corpus Lexicography. Edinburgh: Edinburgh

University Press.

Rundell, Michael. 2006. ‘More Than One Way to Skin a Cat: Why Full-Sentence Definitions

Have Not Been Universally Adopted’ in Corino, Elisa, Carla Marello and Cristina

Onesti (eds.), Atti Del XII Congresso Di Lessicografia, Torino, 6-9 Settembre 2006.

Alessandria: Edizioni dell'Orso, 323-337.

Sobkowiak, Włodzimierz. 2009. ‘Review of Wells, John C., Longman Pronunciation

Dictionary (3rd Edition).’ International Journal of Lexicography 22.2: 191-209.

Tarp, Sven. 2008. Lexicography in the Borderland between Knowledge and Non-Knowledge:

General Lexicographical Theory with Particular Focus on Learner’s Lexicography.

(Lexicographica Series Maior 134.). Tübingen: Max Niemeyer Verlag.

Tono, Yukio. 2011. ‘Application of Eye-Tracking in EFL Learners’ Dictionary Look-up

Process Research.’ International Journal of Lexicography 23.1.

Verlinde, Serge, Patrick Leroyer and Jean Binon. 2010. ‘Search and You Will Find. From

Stand-Alone Lexicographic Tools to User Driven Task and Problem-Oriented

Multifunctional Leximats.’ International Journal of Lexicography 23.1: 1-17.

Wiegand, Herbert Ernst, Michael Beißwenger, Rufus H. Gouws, Matthias Kammerer,

Angelika Storrer and Werner Wolski. 2010. Wörterbuch Zur Lexikographie und

Wörterbuchforschung. Dictionary of Lexicography and Dictionary Research. Vol. 1

(A-C). Berlin: Walter de Gruyter.

Yamada, Shigeru. 2010. ‘EFL Dictionary Evolution: Innovations and Drawbacks’ in

Kernerman, Ilan and Paul Bogaards (eds.), English Learners' Dictionaries at the

DSNA 2009. Tel Aviv: K Dictionaries, 147-168.

The age of computer dictionary and its problems

Article

Full-text available

Jun 2022

Mehmet Atli

The use of computers in practical lexicography in the 1960s introduced a fundamental change in traditional lexicography. Especially in the 1990s, the accumulating of electronic language data and the increasing use of low-cost high-speed broadband internet at the beginning of the 21st century is leading to a rapid growth of e- lexicography studies. Along with these developments, many printed dictionaries became available as e-dictionaries within a short time. The advantage of electronically published dictionaries over printed dictionaries catches the attention of many lexicographers immediately. But nobody noticed the problems caused by hybridizations. Likewise, issues such as corpus coherence, data reliability, access path, quality, and utility attract few researchers' attention. Nevertheless, modern developments in e- lexicography have led to an unbalanced growth in expectations of the discipline, as well as a radical change in its functional field. In this study, the definition of the term computer lexicography is emphasized in addition to the historical advances of e-lexicography. And the phases of e-dictionary making were also discussed. In addition, an attempt was made to determine whether there are problems with today's e-dictionaries in terms of hybridization, corpus, data reliability, access path, personalization and quality. It is known that electronic devices require speech technology software to analyse linguistic and lexicographical data, and language technology applications require lexical data to be readable/understandable. Language needs to be standardized, because e-devices are more sensitive than humans to detect mistakes and errors. Therefore, this study makes some suggestions to identify and get rid of problems with e-dictionaries.

A proposed microstructure for a new kind of active learner’s dictionary

Article

Full-text available

Nov 2022

This article is intended as the first of a series of papers designing an electronic linguistic resource made up of three modules: (1) a phrase-based active dictionary thought of as a first attempt to implement John Sinclair’s vision of the “ultimate dictionary”; (2) a grammar / construction describing not only the morphologic and syntactic rules of a language but also its systematic (semantic) alternations and the derivations generating new meaningful constructions from old ones; (3) a phrase thesaurus / phraseological conceptual ontology taking WordNet into the modern “age of phraseology”. After introducing the theoretical framework of our project, we present the microstructure of our Phrase-based Active Dictionary (PAD) model and describe it in a general theory of lexicography.

Word formation of Chinese English words: Evidence from the Chinese English Dictionary

Article

Full-text available

Jun 2020

This study aims to cast light on the nature and features of word formation in the lexis of Chinese English and provide a synchronic formational analysis of Chinese English neologisms by adopting a sequential data‐driven approach with a combination of qualitative and quantitative methods. The study has 1) constructed a hierarchical and quantificational four‐level structure for Chinese English word formation through meticulous coding, categorization, and calculation of 3522 headwords collected in the Chinese English Dictionary; and 2) revealed departures of the present taxonomy with extant models of indigenized varieties in both qualitative (coherence, inclusiveness, refinedness, adaptability) and quantitative measures (the changing status of formation processes, source languages, and transliteration systems). The study sheds significant insights into the motivation of Chinese English words, the status of the Chinese English variety, the sociolinguistic conditions of the variety, and word formation in the Expanding Circle where a dictionary of the variety is available.

“View and Hide Definitions” of Racist Hate Speech: Ethnophaulisms in Google’s English Dictionary

Article

Full-text available

May 2024

Silvia Pettini

This paper aims to foster debate about the language of racist hate speech in online English lexicography. For this purpose, it presents a study on the treatment of ethnophaulisms, or ethnic slurs, in Google’s English dictionary, which is “powered by Oxford Languages”. The focus is on the perspective of the general user of the Internet, in light of the connection between two facets of this digital age. The first is the strong and growing tendency among Internet users to ‘google’ their language issues. The second is the alarming increase in cases of hate speech online, most of which are based on ethnicity and nationality, according to reports by the United Nations. Consequently, the free and pervasive content of Google’s English dictionary represents a case in point to investigate whether and how online users are warned against the power of these hate words. A selected sample of 285 English ethnic slurs was looked up in the dictionary and, if recorded, their entries were scrutinised to identify lexicographic data regarding their semantic relevance and offensiveness. Findings show that the majority are included and that they mostly present ethnicity-related senses, but fewer than half of the total are treated as ethnophaulisms. In this respect, the major dictionary markers indicating offensiveness are effect labels, predominantly alone or combined with definitions. Relative to their size, ethnophaulisms in Google’s English dictionary are thus clearly described as offensive or derogatory expressions, making online users aware of their hurtful nature.

Tools of electronic lexicography as a means to teaching foreign language for professional purposes

Article

Jul 2023

The paper is the first to describe the technology of applying methods of electronic lexicography to teaching a foreign language for professional purposes and to forming communicative competence among future teachers of the secondary vocational education system. It gives a detailed description of the procedure for creating a learning dictionary using the Dictionary Specification Language (DSL). The compilation of a bilingual electronic dictionary is considered both as an object of study and as a teaching tool. In addition to the analysis of scientific literature, an analysis of regulatory and legal documentation, as well as of educational and methodological literature, was carried out. An analytical review of existing electronic bilingual dictionaries is presented. The results of the study have been introduced into the training of secondary vocational education teachers, and the ideas and practical recommendations set out in the paper can be used in teaching a foreign language for professional purposes.

APPLICATION OF THE PRINCIPLES OF MULTIMODALITY IN MODERN LEXICOGRAPHY

Article

Jan 2022

Svetlana V. Burenkova

The search for the optimal and most comprehensible ways of semantizing lexical units for the dictionary user has long been a key problem of lexicography and metalexicography, and it remains one today. The introduction of computer technology has radically changed the concept of the dictionary, solved many problems of the lexicographic genre, and expanded the capabilities of the dictionary user. At the same time, the abundance of lexicographic works, including electronic ones and dictionary platforms that act as aggregators of several online dictionaries, together with an overabundance of reference information, disorients the inexperienced consumer. The objective of this article is therefore to provide a typology of existing online dictionaries and dictionary information platforms, based on German-language material, according to their content and general purpose. The most accurate multi-aspect interpretation of a word takes various features and connotations into account. Nevertheless, the reference nature of a dictionary requires a laconic description of the material, devoid of a formalized metalanguage, and the ability to find the necessary explanation quickly. With this in mind, the article analyses how the principles of multimodality can be applied to solve the problems of meaning representation in electronic dictionaries. The author also underlines the significance of multimodal meaning representation for the effective interpretation of foreign lexis in all its nuances and, therefore, for effective foreign-language vocabulary acquisition and for overcoming translation difficulties.

La metalexicografía del siglo XXI: un estado de la cuestión

Article

Full-text available

May 2022

Lexicography faces a pivotal transition in the 21st century. In a context of disruptive innovation, publishing companies confront a big challenge due to a plethora of open-access online dictionaries, mature free platforms such as WordReference, Reverso, Linguee or Wiktionary, and ever-progressing automatic translators. In order to analyse the critical moment that lexicography is going through, this paper deals with the technological impact and the advantages and disadvantages of paper and electronic dictionaries. Moreover, transversal issues such as the quality of dictionaries and the satisfaction of users’ needs are tackled. Our purpose is to provide a picture of present-day metalexicography and to indicate the challenges and obstacles that lexicography faces in the digital era.

La metalexicografía del siglo XXI: un estado de la cuestión [EN PRENSA]

Article

May 2022

Twenty-first-century lexicography faces a decisive transition. In a context of technological disruption, the proliferation of open-access online dictionaries, the consolidation of free platforms such as WordReference, Reverso, Linguee or Wiktionary, and the advance of machine translators pose a major challenge to publishers. To analyse the critical moment that lexicography is going through, this article addresses the technological impact, comparing the advantages and disadvantages of paper dictionaries with those of electronic dictionaries. Cross-cutting issues such as the quality of dictionaries and the satisfaction of users’ needs are also discussed. The aim is to offer a map of the current state of metalexicography and to point out the challenges and obstacles that lexicography faces in the digital era.

diccionarios como herramienta para el aprendizaje de idiomas extranjeros: uso, percepción y preferencias por parte de estudiantes de tercero de ESO

Article

Full-text available

Jan 2022

Rafael Onieva Palomar

Dictionaries are tools of extraordinary importance in the field of foreign-language learning, both for their long tradition in education and for their markedly pedagogical character. Despite this, their use is limited and, in most cases, restricted to the bilingual dictionary, even though there are monolingual works designed for this purpose: learners’ dictionaries. With the aim of deepening knowledge of how foreign-language students perceive and use the different types of dictionaries, this article presents the results obtained from the responses to a questionnaire distributed to 41 students in the third year of Educación Secundaria Obligatoria (E.S.O.) at C. D. P Madre del Divino Pastor in Andújar (Jaén), in order to determine the state of dictionaries at this educational level and to check whether regular use is made of works that can contribute to a drastic improvement in grades in this subject.

Merging Professional and Collaborative Lexicography: The Case of Czech Neology

Article

Sep 2021

This paper aims to relate two linguistic phenomena: neology (along with sources for its study) and collaborative lexicography. A pair of case studies is presented concerning two thematically defined groups of recent Czech neologisms: those abusing the Czech ex-president V. Havel’s name and those reflecting the Covid-19 pandemic. An initial dataset was provided by Čeština 2.0, a web dictionary of non-standard Czech built from user-generated content, and by the Neomat neology database, maintained by professional linguists. Objective data from a monitor corpus of Czech are contrasted with the initial dataset, which leads to some open questions, especially with regard to the extent to which the two branches of lexicography, amateur and professional, can inspire and enrich each other.

State-of-the-Art Software to Support Intelligent Lexicography

Conference Paper

Full-text available

Jan 2010

Gilles-Maurice de Schryver

This paper presents a proposal for a revolutionary type of electronic dictionary, one in which the potential is explored to link an automatically derived dynamic user profile to the proffered multimedia lexicographic output. Such adaptive and intelligent dictionaries may use the TshwaneLex dictionary production system at their core, to which a string of artificial intelligence components is added. This proposal is illustrated by means of the description of a project to compile an online Swahili to English dictionary. Swahili is both the most widely spoken African language, and the one sub-Saharan language most commonly taught throughout the world. As a theoretical framework for the development of this new type of electronic dictionary, the ‘fuzzy answer set programming’ framework (Van Nieuwenborgh et al. 2007) is advanced.

Search and You Will Find. From Stand-Alone Lexicographic Tools to User Driven Task and Problem-oriented Multifunctional Leximats

Article

Full-text available

Mar 2010

Regardless of their name (dictionary, glossary, encyclopaedia, or even ‘leximat’, in the case of a new generation of online, semi-automated lexicographic tools), subject-field, purpose, or medium (paper or cyber), lexicographic reference works should be regarded as functional information tools that are solely designed to cater to the information needs of their users in different usage situations and that consequently help them solve specific communication (reading, writing, translation) or knowledge problems (acquiring new knowledge or verifying existing knowledge, learning a language or a subject field). In this article, we briefly outline the evolution of lexicographic reference works from stand-alone to multifunctional lexicographic tools, and we describe the theoretical principles and innovative functionalities of a new task and problem-oriented lexical database, the Base Lexicale du Français. In line with Tarp (2006), this tool should truly be regarded as a ‘leximat’.

English Dictionaries for Foreign Learners: A History

Book

Oct 2023

A P Cowie

This is the first history of dictionaries of English for foreign learners, from their origins in Japan and East Asia in the 1920s to the computerized compilations of the present. Monolingual dictionaries for foreign speakers were a revolutionary development at their outset, and now represent a coming-together of intellectual, technological and commercial forces almost unequalled in book publishing. As the author shows, the early history of EFL dictionaries was research-driven, arising directly from research in linguistic theory and language pedagogy; now it is user-driven, determined by what users require or are thought to require. The pioneering dictionaries were the work of individuals. Current dictionaries are the products of huge databases manipulated by sophisticated processing, as publishers strive to share an immense and constantly growing global market. The book has both a thematic and a chronological structure. Three chapters describe the historical sequence over a period of some sixty years. These alternate with chapters dealing with phraseology, computers and corpus linguistics, and research into dictionary users and uses - three subjects central to the development of ELT dictionaries over the last thirty years. Dr Cowie examines the way in which availability of massive computing power has transformed the recording and analysis of current speech, and shows how the growth of research into the users and uses of dictionaries has led to developments both in ELT lexicography and method. This readable and non-technical account is directed both at professionals in applied linguistics and English language teaching, and at lexicographers, but it will interest and fascinate everyone concerned with the analysis of English and faced with the challenge of recording the subtleties of its grammar and meaning.

Lexicography in the Borderland between Knowledge and Non-Knowledge: General Lexicographical Theory with Particular Focus on Learner's Lexicography

Book

Jan 2008

Sven Tarp

Application Of Eye-Tracking In Efl Learners' Dictionary Look-Up Process Research

Article

Feb 2011

Yukio Tono

The present study aims to apply eye-tracking technologies to analyse the process of dictionary look-up by learners of English as a foreign language. An experiment was conducted to examine detailed processes of look-up in the microstructure. Several variables (the availability of supporting devices such as signposts or menus, different types of grammar codes, positions of target definitions) were carefully controlled to see how look-up behaviour would change in both monolingual and bilingual dictionary interfaces. The findings show that look-up processes within a microstructure are very complex, showing interactive effects among positions of target information within the microstructure, functions of supporting devices, and users’ proficiency levels. Pedagogical and methodological implications will be discussed.

The Dictionary-Making Process More than One Way to Skin a Cat: Why Full-Sentence Definitions Have not Been Universally Adopted

Article

Michael Rundell

At the last Euralex Congress, John Sinclair reiterated the case for full-sentence definitions (FSDs), and questioned why the COBUILD approach to defining had not been generally adopted by other dictionary publishers. This paper answers his question. The theoretical case for FSDs is reviewed (and in general not challenged), and it is shown how the full-sentence model often results in definitions that are more effective and more readable than could be achieved using traditional styles. But the FSD is not always the most appropriate strategy: the approach has several disadvantages, and a rigid adherence to this style does not always serve the best interests of dictionary users (especially language learners). Rather, it will be argued, the goals that gave rise to the FSD may often be achieved through other means. The paper concludes with proposals for a range of defining strategies (including FSDs), along with suggestions as to when each is likely to be most effective.

Empirical research on dictionary use in foreign-language learning: survey and discussion

Article

Zum Stand der Arbeiten am „Wörterbuch zur Lexikographie und Wörterbuchforschung / Dictionary of Lexicography and Dictionary Research“

Article

Jan 2007

Herbert Ernst Wiegand

Internet Dictionaries and Lexicography

Article

Sep 1997

Michael Carr

Cyberlexicography is definable as “employing the Internet to compile or create a dictionary.” Modern lexicographers can use the Net in various ways: participating in electronic conferences, consulting dictionaries and encyclopedias, or searching word usages within the ultimate corpus. An Appendix contains Figures 1–8. Submitted October 1996.

Effects of Multi-Media Annotations on Vocabulary Acquisition

Article

Jun 1996

Research on second language (L2) vocabulary acquisition has revealed that words associated with actual objects or imagery techniques are learned more easily than those without. With multimedia applications, it is possible to provide, in addition to traditional definitions of words, different types of information, such as pictures and videos. Thus, one of the fundamental research questions posed in the use of multimedia systems is: How effective are annotations with different media types for vocabulary acquisition? This article discusses the results of three studies done with 160 university German students using CyberBuch, a hypermedia application for reading German texts that contains a variety of annotations for words in the form of text, pictures, and video. The issues examined are related to (a) how well vocabulary is learned incidentally when the goal is reading comprehension, (b) the effectiveness of different types of annotations for vocabulary acquisition, and (c) the relationship between look‐up behavior and performance on vocabulary tests. The results showed a higher rate of incidental learning than expected (25% accuracy on production tests, 77% on recognition tests), significantly higher scores for words that were annotated with pictures + text than for those with video + text or text only, and a correlation between looking up a certain annotation type and using this type as the retrieval cue for remembering words.
