Online dictionaries of English
Robert Lew
Adam Mickiewicz University
Abstract
In this paper I present an overview of the spectrum of available online English language
dictionaries, and then offer some general comments on a few selected key issues. Given the
current explosion of web content, it is quite pointless to try to list every single dictionary
available. It makes better sense to identify the salient categories of online dictionaries and
selectively focus on their prominent and typical representatives. The first notable category, so
important to the many learners of English worldwide, is that of the famous British monolingual
learners’ dictionaries (the Big Five). Here, it is interesting to observe the gradual transition to
the online medium in what has sometimes been called the freemium approach. Quality general
English dictionaries aimed at the native speaker are not so well represented, but there is a
wide choice of specialized (subject) dictionaries of varying quality and provenance. Special-purpose
dictionaries include pronouncing dictionaries and onomasiological dictionaries.
Diachronic dictionaries have also established a presence on the internet. As one guise of the
Web 2.0 experience, we witness the emergence of bottom-up (or user-involvement)
lexicography, with such prominent exemplars as the Urban Dictionary or Wiktionary.
Hyperlinking is a fundamental feature of the web, but it is, arguably, overused in the so-called
dictionary aggregators: dictionary portals which put together entries from several online
dictionaries. This creates highly redundant assemblages of lexicographic data. How to tap the
richness of the Web but present the results in a user-friendly manner without laborious human
intervention is a tough question. Another issue that still awaits satisfactory answers is the
organization of access to data in online dictionaries. Even in highly respected dictionaries,
there remain basic problems of access, such as with locating multi-word units,
notwithstanding the upbeat tone of metalexicographers who often just pronounce the problem
as essentially solved in the electronic medium. Other issues related to new technologies are
the use of graphics, multimedia and alternative presentation modes, and these receive some
attention. Finally, I play with the idea of the dictionary as an advanced query system sitting on
top of a text corpus. Using collocation dictionaries as an example, I demonstrate that the
difference between a sophisticated corpus query system and a more traditional lexicographic
product may soon become something of a technical subtlety.
Introductory
The present paper is intended as an overview of online dictionaries of English, often seen, and
probably rightly, as the leading lexicographic tradition of the present. Although a balanced
overview is my primary goal, I will also touch upon some general issues and adopt a more
evaluative position here and there. However, this will only be a secondary perspective, as the
specific issues are covered in greater depth in some of the other papers in the present volume.
Obviously, given the sheer number of currently available online dictionaries, no one
can hope to produce a complete catalogue, and this is not the purpose here. Rather, the idea is
to present prominent and representative exemplars of specific types of dictionaries and focus
on their properties of interest. But what are those types of dictionaries? As dictionaries can be,
and have been, compared on a number of different levels, classifying them has traditionally
been problematic. This has become even more of a challenge in the age of electronic
dictionaries. What, then, could be the basic classifying criteria for online dictionaries?
Clearly, most of the traditional criteria can still be applied to online products. Here, of
course, we find the complex (and at times confusing) network of overlapping oppositions:
general/specialized subject, general/special purpose, L1/L2/FL speaker, expert/layman,
contemporary/historical, etc.
There do appear, however, to be some criteria or oppositions that have not been inherited
from printed dictionaries but rather are specific to online dictionaries.
1 Some additional criteria for classifying online dictionaries
1.1 Institutional vs. collective
A variety of overlapping classification criteria have been used to categorize online
dictionaries. For example, in terms of user involvement, there is the institutional versus
collective opposition (Fuertes-Olivera 2009); the latter category signifies a collaborative effort
by a community of non-professionals, who can themselves be dictionary users; an earlier
paper by Carr (1997) has also used the terms bottom-up and collaborative. User-involvement
is yet another designation for a similar concept, while open stresses a slightly different aspect
of what might again be a fairly similar formula.
1.2 Free vs. paid
Collective dictionaries would normally be free to use. Conversely, institutional dictionaries
need not necessarily involve fee-based access, so the free versus paid contrast is an
independent one. It is also increasingly difficult to draw a clear line between free and paid:
the clear cases at either end leave a substantial grey area in the middle, as revenue to the publisher
can take different forms. For example, individual pay-per-view or subscription-based access is
a clear case, but when syndicated as part of a more comprehensive service and sold, say, to
libraries, the end user often does not bear the direct cost. Then there are cases where online
access is offered (perhaps for a limited time) as a bonus for buyers of paper editions. Still
closer to the free end of the cline are ad-supported dictionaries, and this appears to be a rather
popular model at the moment.
1.3 Number of dictionaries
In terms of how many dictionaries are offered by the specific services, at least the following
four options come to mind:
1. individual dictionaries: just like the traditional printed dictionaries, there are
standalone, single online dictionaries;
2. dictionary sets consisting of clusters of related dictionaries may be offered from a
single landing page; a good example is the Cambridge Dictionaries Online page;1
3. dictionary portals only include hyperlinks to actual dictionaries (examples will be
presented below);
4. dictionary aggregators excel at pasting together the content of various dictionaries and
serving it on a single page (again, examples will be discussed below).
In my overview below, I will begin with some notable representatives of institutional
dictionaries offered free of charge to the world internet community.
2 Institutional Dictionaries
2.1 General English Dictionaries
General English dictionaries are traditional general-purpose dictionaries which provide the
relatively rich microstructural treatment of (primarily) contemporary English that is
traditionally expected of general desk reference dictionaries, and whose word list is not
restricted by domain or register.
2.1.1 American
Traditional US dictionary publishers seem to have embraced the web: as many as three of the
major American players in the market for general desk and college dictionaries make their
dictionaries available online free of charge. These are the Merriam-Webster Online
Dictionary, American Heritage Dictionary and Random House Unabridged Dictionary, the
last one being included only as part of the Dictionary.com service (on which see 3.1 below).
2.1.2 British
Until recently, the range of general-purpose dictionaries available online on the British scene
had been less complete, with the traditional and most prestigious publishers (notably Oxford
University Press) apparently hesitant about placing their products online for free. Only very
recently did OUP create the new oxforddictionaries.com2 lexicographic portal, built around two
of the publisher’s recent dictionaries: the newest (third) edition of the Oxford Dictionary of
English (under the heading World English), and its American counterpart, the New Oxford
American Dictionary, also in its third edition. A premium subscription service is available,
with one year free for buyers of the printed copy.
The availability of the free/premium combination for these Oxford dictionaries exemplifies
rather well the new business model that is currently being followed by a number of
publishers: the model known by the linguistic blend freemium. The approach works on the
principle that basic content and functionality are offered essentially free of charge (in response,
we might say, to the free-lunch mindset of today’s netizens). The free offer, however, is used
as an opportunity to market and sell extra content, which might be richer lexicographic data
and/or non-lexicographic content, such as exercises or language testing materials. To continue
with our example, the premium oxforddictionaries.com service offers the following extra
features (Judy Pearsall, personal communication):
− sense-linked thesaurus of 600,000 synonyms and antonyms;
− advanced search and browse features;
− 1.9 million sense-linked examples from the Oxford English Corpus;
− audio pronunciations;
− My Oxford Dictionary personalization features;
− browsing and search by subject area, meaning category, part of speech, etc.;
− four additional zones fully linked to dictionary content, including Writing Skills zone,
Writers and Editors zone, Example sentences zone, and Puzzles zone.
To some extent, free online versions may drive the sales of paper copies — but of course this
argument could be reversed, with online access deterring some potential buyers from
purchasing a printed copy.
Apart from the two Oxford dictionaries, there are also other notable British dictionaries
offered free of charge. Collins offers what it refers to as the Collins English Free Dictionary.3
A closer examination reveals that this is not the same as the authoritative Collins English
Dictionary; the latter, however, does seem to be available, but only as part of
TheFreeDictionary service (on which see 3.1 below). The venerable Scottish publisher
Chambers offers its Chambers 21st Century Dictionary on its website.4 Again, though not
really the same as the renowned Chambers English Dictionary, the 21st Century is still a
usable, solid reference work for general consultation.
The Encarta World English Dictionary,5 which originated in a collaboration between the
London-based publisher Bloomsbury and Microsoft, actually comes in two versions, both
available via the same website: the World English version, marketed as the dictionary that
provides unrivalled treatment of the regional varieties of English, and the localized US
version. The site provides an option to switch quickly between the two, and it is
fascinating to observe, by switching back and forth, the differences in the coverage of
regional terms and meanings, spelling and pronunciation.
2.2 Learners' dictionaries: the Big Five
According to data from internetworldstats,6 English is the foreign language of some 86% of
Europe’s active internet users. Now, given that English is today’s de facto lingua franca and
that WWW content in English dwarfs that in any other language, it becomes clear that
non-native speakers are a significant category of online dictionary users, present or future. In
this context, the category of English learners’ dictionaries comes into focus, since these are
the reference works designed specifically with the non-native speaker in mind. English
learners’ dictionaries enjoy a long-standing tradition, which goes back to around the 1940s
or, as some claim, the 1930s (cf. Cowie 1999). Their content has been meticulously reworked
over numerous successive editions, and because of the worldwide customer base and the
corresponding sales volumes, publishers of monolingual English learners’ dictionaries have
been able to take advantage of select teams of expert lexicographers. These dictionaries have
enjoyed high levels of prestige, and so have their traditionally British publishers.
The last few years have seen free versions of British monolingual dictionaries for advanced
learners appear online, one by one. On the whole, the major British monolingual learners’
dictionaries (MLDs) have followed a pattern of remarkable similarity (Yamada 2010), perhaps as
part of the competitive drive, and this is also reflected in the features offered in their online
versions. There is also a more down-to-earth reason for the similarities found in a number of
British MLDs: they tend to use the same dictionary production software platform from IDM.
The range of available English MLDs opens with the pioneer in this segment, the Oxford
Advanced Learner's Dictionary,7 whose free version is now roughly based on the 7th print edition. A
long-time competitor, Longman Dictionary of Contemporary English, currently in its fifth
edition, has also offered a free online version8 for some time. The dictionary’s landing page
specifically mentions a limitation of the free version: recordings of spoken pronunciation are
only available for a subset of headwords and example sentences (more specifically, the audio
is available for the entries in the letter stretches D and S). The note further states that audio
recordings for all entries are available in “the CD-ROM version”: this is not quite accurate, as
the optical disk version is actually offered on a DVD-ROM. But the free version is not the
only online version of this dictionary: there is also a radically different premium online
edition9, which offers essentially the same content as the off-line DVD-ROM version.
Cambridge Dictionaries Online10 represents an example of an institutional dictionary set (as
defined in 1.3 above): apart from the flagship Cambridge Advanced Learner’s Dictionary,
four other learners’ dictionaries from the publisher are available at the same address.
Amongst the major British learners’ dictionaries, Macmillan English Dictionary may well
be the one to have offered the most complete set of lexicographic content online11 free of
charge, including audio pronunciations of all headwords and a sense-linked thesaurus.
The one member of the Big Five which has apparently remained sceptical when it comes
to offering free online access of any kind is COBUILD. Although it has offered subscription-based
access for some time,12 none of this is freely available, if we disregard an outdated 4th
edition being hosted on a third-party service.13 Recently, it looked as if COBUILD was set to
become the most widely used learner’s dictionary when, in autumn 2009, Google apparently
obtained a licence for COBUILD content and placed it online as the main Google dictionary
for English. This was a questionable choice, as COBUILD is not really well suited to the
kind of use Google users were most likely to need the dictionary for, i.e. problems with
text reception: of all the major learners’ dictionaries, COBUILD has the smallest coverage
(Rundell 2006). At the same time, its features supporting text production would remain
underused. Google’s half-hearted implementation of the interface certainly would not have
made users more sympathetic towards the dictionary. For example, Google dictionary
included COBUILD’s syntactic codes, but without a word of explanation anywhere. Surely, it
is a long shot to assume that a casual user of the Google dictionary will appreciate the
significance of a code such as “NVAR” (in this case, an indication that a noun in the sense so
marked has both mass and individuated uses). Considering all this, it is not at all surprising
that in August 2010, COBUILD was replaced as the database for Google dictionary by The
Oxford American College Dictionary (Judy Pearsall, personal communication).
2.2.1 American learners' dictionaries
Although it is the British publishers that lead the market for monolingual English learners’
dictionaries, such dictionaries have also been published elsewhere, and one particular
dictionary that premiered recently with considerable publicity is the Merriam-Webster's
Learner's Dictionary.14 What is rather unusual about this dictionary is that the launch of its
online version coincided with the publication of the first paper edition. The free online content
includes audio pronunciation, and the user interface is at least as good as those of the British
dictionaries, but despite the marketing claims, the lexicographic content itself is not
groundbreaking, and still lacks a number of modern features now taken for granted in the
leading British products (Bogaards 2010; Hanks 2009). The dictionary does have more
examples than the competition, but their quality has been questioned (Hanks 2009).
Despite what some might be led to believe, the Merriam-Webster's Learner's Dictionary is
by no means the first American dictionary of its type: several have already been published,
and at least one of them, Heinle's Newbury House Dictionary of American English,15 is freely
available online. However, the latter is a rather small dictionary and not a particularly
impressive one. All in all, learners of American English may actually be better off using
British-published dictionaries of American English, such as the Cambridge Dictionary of
American English.16
2.2.2 Louvain EAP Dictionary (LEAD)
Apart from the established publishers, some academic centres are also trying to enter the field
of learners’ dictionaries. One particularly promising project currently in progress (not yet
publicly accessible) is the Louvain EAP dictionary (LEAD), which is being developed as a
dictionary for non-native writers. Its main novelty is that it is customizable in terms of subject
domain (business, medicine) and mother tongue (French, Dutch). As a result, usage notes
and equivalents match the L1 of the user, and some of the examples are domain-specific. The
dictionary will also have (as you might expect from a product created at the Centre for
English Corpus Linguistics) a solid grounding in corpora, and integrated corpus access.
2.3 User-involvement (bottom-up) lexicography
In the democratic world of the internet, users can play lexicographer as well and create their
own online dictionaries. There is quite an impressive range of these, but let us have a look at
three representative exemplars:
2.3.1 Urban Dictionary
A success story in its own right, the Urban Dictionary17 is a true bottom-up initiative which
recently celebrated its 10th anniversary. One of the community features exemplified here is
that users vote on the “best” definitions. But such democracy does not necessarily serve
lexicography well: as it turns out, the most liked definitions are not of the type that would
really help someone who does not already know the meaning. Apparently, true explanatory
definitions are too predictable and thus not “interesting” enough, and are pushed down to
the bottom of the list. Instead, collaborative dictionary entries, unless properly moderated,
tend to become a playground for showing off wit, marking in-group membership and
venting prejudice. For example, one entry at the headword BOOTYISM runs as follows: “The
gospel according to Beyonce. Often confused with Buddhism.” This entry is written in an
abbreviated style posing as lexicographese, and manages to allude rather cleverly to the
semantics as well as origins of the slang term, but it would probably not be of much help to a
user who has no clue about the meaning. In this case, the author seems to be aware of this
deficiency, and makes up for it in the (entirely invented) example exchange:
Todd: I'm thinking about converting to Bootyism.
Michael: Nah man, it's BUDDHISM.
Todd: No, cause in Bootyism all you do is worship ass.
2.3.2 Wiktionary
Wiktionary18 may be the ultimate collaborative dictionary. A recent in-depth analysis of this
resource (Fuertes-Olivera 2009) presents a number of interesting findings. It is observed that,
contrary to what is often claimed, Wiktionary is not a multilingual dictionary but an English
dictionary with a translation overlay for several other languages. It is also noted that very
similar items may receive radically different treatments, lacking internal consistency and
contradicting the Wiktionary guidelines.
2.3.3 Wordnik
Wordnik19 presents an interesting blend of online dictionary genres, involving a collaborative
community-driven component built around a “professional” core. According to its founder,
Erin McKean (personal communication), user-generated content is encouraged here but in
“guided” ways, with less emphasis on user-created definitions than is usual in collaborative
projects. Wordnik embeds content from other datasets: at this time, Twitter and Flickr are
being tapped for real-time citations and relevant images, respectively. The service employs
modern data mining techniques to identify, in corpora, citations of the self-defining and
exemplar types (McKean, personal communication). Overall, there is less reliance on
traditional definitions and the emphasis is shifted to citations.
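To give a concrete, if much simplified, idea of what identifying self-defining citations might involve, the Python sketch below flags corpus sentences in which a target word is immediately followed by a defining cue phrase (which means, refers to, is defined as, and the like). The cue list, the function name `self_defining` and the sample sentences are illustrative assumptions; they are not a description of Wordnik's actual techniques, which the account above does not detail.

```python
import re

# Hypothetical cue phrases that often signal a self-defining citation, i.e. a
# sentence that explains the very word it contains. This list is an
# illustrative assumption, not the method of any particular service.
CUES = r"(?:which\s+means|means|refers\s+to|is\s+defined\s+as|is\s+a\s+term\s+for)"


def self_defining(word: str, sentences: list[str]) -> list[str]:
    """Return the sentences in which `word` is directly followed by a cue phrase."""
    # Optional comma after the word, then whitespace, then one of the cues.
    pattern = re.compile(rf"\b{re.escape(word)}\b,?\s+{CUES}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


corpus = [
    "Petrichor, which means the smell of rain on dry earth, is a coined word.",
    "The petrichor was strong after the storm.",
    "The term petrichor refers to the scent of rain.",
    "She says she loves petrichor.",
]
for hit in self_defining("petrichor", corpus):
    print(hit)  # prints the first and third sentences
```

A cue-phrase heuristic of this kind will miss definitions phrased in other ways (appositives, for instance), so it should be read as an illustration of the idea rather than as a workable extractor.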
2.3.4 Collaborative-institutional dictionaries
Commercial publishers also try to get their users actively interested and involved in
lexicography, perhaps in an effort to persuade them to stay on the site and come back for
more. Examples of collaborative sections hosted on institutional dictionary sites suggest that
the opposition institutional versus collective dictionary (Fuertes-Olivera 2009) may no longer
be a sharp one. Two such examples from well-known institutional publishers are the
Merriam-Webster Open Dictionary20 and Macmillan Open Dictionary.21 A perusal of the
user-added entries reveals that most of them would not meet the criteria for
inclusion in the regular edition of the dictionary, and their presence merely reflects the
popular belief that “the dictionary” is a collection of “all the words” of a language.
Apart from adding open dictionary components, online dictionaries sometimes offer other
extras aimed at involving the users. Recent add-ons include social networking features, such
as the award-winning Macmillan Dictionary blog.22
So far we have discussed general dictionaries of contemporary English, aimed at both
native speakers of English and foreign learners. Let us now move beyond these common
types, to diachronic and specialized dictionaries.
2.4 Diachronic (historical) dictionaries
Users of diachronic dictionaries are most typically language scholars, and so their level of
sophistication and language awareness is normally far beyond that of lay users. As language
experts, they can reasonably be trusted to make choices that a non-expert user will not be in a
position to make, such as the explicit selection of microstructural data categories (and we will
revisit the issue of customization in a later part of this article). The makers of academic
diachronic dictionaries appear to be aware of these ramifications, as exemplified by the online
version of what is perhaps the most famous dictionary world-wide (at least for English), the
Oxford English Dictionary. Access to the OED is subscription-based, and affiliated scholars
would most of the time rely on their institutional subscription rather than a personal one. In
contrast, a more restricted (in terms of period) but no less voluminous Middle English
Dictionary23 has been freely available online since 2007, when the University of Michigan
completed the digitization process with the help of a government grant. The dictionary offers
a rather large number of technically complex search options, but these should be manageable
for language scholars and their students.
2.5 Subject field dictionaries
There are countless specialized dictionaries on the web, most of them fairly
small in size, dealing with the vocabulary of a specific subject field (as well as narrower sub-
fields). Because of the sheer number, many users will find it useful to consult online
directories of such dictionaries, one of the most comprehensive being Glossarist.com: an
example of a dictionary portal as listed in my provisional taxonomy under 1.3 above.
Indexing portals of this type only include links to dictionaries on external pages, without
themselves hosting or displaying actual lexicographic content.
The lexicographic truism that content and presentation are largely separate aspects is
borne out by those products where there is a sharp contrast in quality between one and the
other. One case in point is Dorland's Medical Dictionary24 from the respectable pair Merck
Medicus and Elsevier, where solid content is marred by the uninspired (to say the least)
access interface. Users are presented with a long chain of alphabetic stretches which have to
be navigated linearly in a fashion resembling page-turning, only much slower (although there
is a term search window, it does not apply to the dictionary itself, but to other services). To
metalexicographers, this dictionary serves as a warning against sweeping generalizations
about electronic dictionaries being faster and superior in terms of access: apparently, it is
perfectly possible to produce an online dictionary where access is more cumbersome than in a
paper book.
2.6 Dictionaries with restricted macrostructure
One way to think of special-purpose dictionaries is that they often involve systematically
restricted treatment in either macrostructure or microstructure. In the former case, only a
distinct subset of the vocabulary is included in the wordlist. Field dictionaries, already
covered in 2.5 above, may be included here. Another exemplar of a restricted macrostructure
dictionary is the well-known and successful Acronym Finder,25 which covers acronyms, both
those pronounced as one word and those pronounced letter by letter (the latter sometimes called
initialisms). Although Acronym Finder does not limit its headword list to English acronyms,
English very clearly dominates.
2.7 Dictionaries with restricted microstructure
In contrast to dictionaries with restricted macrostructure, restricted-microstructure dictionaries
are characterized by a systematic reduction, not in the word list itself, but in the lexicographic
data categories presented at each entry, compared to a general dictionary. The free Online
Etymology Dictionary26 is a representative of the genre: the lexicographic data for a given
headword is restricted to an explanation of the word’s origins.
Pronouncing dictionaries are another major category of restricted-microstructure
dictionaries, where the chief lexicographic data given indicates the phonetic form of the entry
word. Semantic information is only given in exceptional cases, such as to disambiguate
between graphemically identical words that are pronounced differently (i.e. homographs that
are not homophones). There is the question of the exact form in which information on
pronunciation is conveyed. In printed books, transcription (in one of a number of standards,
the most universal being the IPA) used to be the only option, but in the multimedia
environment of the web, the expectation of users is to be able to hear an audio rendition of an
item’s pronunciation. This expectation is met by the popular free online talking English
dictionary howjsay.com,27 which provides recorded audio clips, but no written transcription.
At the other end of the cline are academic pronouncing dictionaries such as the Carnegie
Mellon University Pronouncing Dictionary,28 which presents transcriptions in the ARPAbet
respelling system, or Péter Szigetvári’s English Pronouncing Dictionary,29 which employs a
variant of the SAMPA respelling system.
There is no denying that being able to hear what the word or phrase sounds like is an asset,
but does this mean, as most people seem to assume, that phonetic transcription is now
dispensable? It probably is for native speakers of English, but hardly so for speakers of other
languages looking up English pronunciation. For them, it is an illusion to believe that just
hearing a word pronounced in a foreign language is enough to register, let alone learn, its
correct pronunciation. Due to the effect known as categorical perception, speakers of a
language tend to hear foreign-language sounds through the filter of their native-language
phonology. Consequently, foreign listeners will mostly hear their native-language sounds
and tend to miss distinctions not present in their own language. For example, a speaker of
Polish may easily miss the difference between met and mat. The important advantage of
phonemic transcription is that it provides an explicit graphic representation of the phonemes
involved, drawing attention to the phonemes as entities. (This is not to say that the two
academic dictionaries cited above do this in a very user-friendly way: they do not.) Of course,
it is also true that efficient use of phonetic transcription does not usually come naturally for a
language learner and requires guided training.
But that is not the end of the story. Apart from pure phonemic identity, there is the
important subphonemic phonetic detail, including positional allophony which, again, is very
hard to hear for the untrained learner. Although traditional printed pronouncing dictionaries
tend not to give subphonemic detail, there is no principled reason why future online
dictionaries should not be able to offer a choice of the level of transcription, including a
narrow-phonetic rendition for those who might want or need it. Technically, it should not be
terribly difficult to take stock of at least the rule-based variants.
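As a rough illustration of how rule-based variants might be derived automatically, the Python sketch below turns a phonemic transcription (a list of IPA segments, with the stress mark as a separate item) into a narrower one using two textbook allophonic rules of English: aspiration of /p t k/ at the start of a stressed syllable, and velarized ("dark") /l/ when no vowel follows. Both rules are deliberately simplified assumptions, and the function name `narrow` is hypothetical.

```python
# A minimal, hypothetical sketch: deriving a narrower transcription from a
# phonemic one with two simplified allophonic rules of English.
VOWELS = {"ɪ", "e", "æ", "ʌ", "ɒ", "ʊ", "ə", "iː", "uː", "ɑː", "ɔː", "ɜː",
          "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ"}
VOICELESS_STOPS = {"p", "t", "k"}
STRESS = "ˈ"  # primary-stress mark, placed before the stressed syllable


def narrow(segments: list[str]) -> str:
    """Join phonemic segments into a narrower transcription.

    Simplified rules (illustrative assumptions only):
      1. /p t k/ are aspirated when they begin a stressed syllable, i.e. come
         directly after the stress mark (so the /t/ of 'still' is not).
      2. /l/ is 'dark' [ɫ] unless the next sound is a vowel.
    """
    out = []
    for i, seg in enumerate(segments):
        prev = segments[i - 1] if i > 0 else None
        # Look past stress marks to find the next actual sound, if any.
        following = [s for s in segments[i + 1:] if s != STRESS]
        nxt = following[0] if following else None
        if seg in VOICELESS_STOPS and prev == STRESS:
            out.append(seg + "ʰ")
        elif seg == "l" and nxt not in VOWELS:
            out.append("ɫ")
        else:
            out.append(seg)
    return "".join(out)


print(narrow(["ˈ", "t", "ɪ", "l"]))       # till  -> ˈtʰɪɫ
print(narrow(["ˈ", "s", "t", "ɪ", "l"]))  # still -> ˈstɪɫ
print(narrow(["ˈ", "l", "ɪ", "p"]))       # lip   -> ˈlɪp
```

A dictionary offering a choice of transcription level could store only the phonemic form and generate the narrow rendition on demand in this way, although a realistic rule set would need syllabification and many more processes.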
As noted by Sobkowiak (2009), phonetic transcription has a representational function and
an indexical function. The former has to do with the representation of the phonetic form of a
word (or, more generally, other linguistic string). The indexical function allows the user to use
symbols for accessing (sets of) lexical items, such as when looking for words that exhibit a
given phonetic pattern. A systematic transcription system is at present a prerequisite for the
indexical function to be possible, although not all dictionaries that do have transcription
allow ‘sound search’ options. Clearly, of the three free pronouncing dictionaries presented
here, Szigetvári’s English Pronouncing Dictionary is the most sophisticated in this
respect.
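To make the indexical function more tangible, here is a minimal Python sketch of a "sound search" over transcriptions in the ARPAbet format used by the CMU Pronouncing Dictionary. The seven-word lexicon is a hand-typed sample rather than the actual data file, and the query syntax (`*` for any run of phonemes, `?` for exactly one) and the function names are illustrative assumptions, not the interface of any of the dictionaries mentioned.

```python
# A small, hand-typed sample in the style of the CMU Pronouncing Dictionary
# (ARPAbet symbols; digits mark vowel stress). Not the real data file.
SAMPLE = """\
NATION  N EY1 SH AH0 N
STATION  S T EY1 SH AH0 N
FASHION  F AE1 SH AH0 N
MET  M EH1 T
MAT  M AE1 T
CAT  K AE1 T
KIT  K IH1 T
"""


def build_index(text: str) -> dict[str, list[str]]:
    """Map each word to its phonemes, with stress digits stripped."""
    index = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, *phones = line.split()
        index[word.lower()] = [p.rstrip("012") for p in phones]
    return index


def matches(pattern: list[str], phones: list[str]) -> bool:
    """Glob-style match: '*' = any run of phonemes, '?' = exactly one."""
    if not pattern:
        return not phones
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        return any(matches(rest, phones[i:]) for i in range(len(phones) + 1))
    if phones and (head == "?" or head == phones[0]):
        return matches(rest, phones[1:])
    return False


def sound_search(index: dict[str, list[str]], query: str) -> list[str]:
    """Return, alphabetically, the words whose phonemes match the query."""
    pattern = query.split()
    return sorted(w for w, phones in index.items() if matches(pattern, phones))


index = build_index(SAMPLE)
print(sound_search(index, "* SH AH N"))  # ['fashion', 'nation', 'station']
print(sound_search(index, "? AE ?"))     # ['cat', 'mat']
```

A query such as `M ? T` retrieves both met and mat, the kind of minimal-pair listing that might help a learner who struggles with that contrast.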
2.8 Onomasiological dictionaries
Onomasiological dictionaries are those that are specifically designed to take the user from a
concept or idea to linguistic form, rather than explaining the meaning or use of a given form.
A traditional paper dictionary of this type would most typically be a thesaurus or synonym
dictionary. Thesaurus.com30 is a companion site to the popular Dictionary.com aggregator
(see 3.1 below). A more interesting online example of such a dictionary is RhymeZone,31
which started off as a synonym dictionary calling itself the Semantic Rhyming Dictionary.
Somewhat predictably, probably because of the phrase “rhyming dictionary” in the name,
users arrived at the dictionary from search engines looking for traditional phonetic rhymes,
and this is what the default search mode now offers. In fact, searching for rhyming words is
also an onomasiological query, albeit in a broader sense. In the more restricted sense of
onomasiological, the dictionary offers lists of synonyms, antonyms and “related words”. For
these, RhymeZone relies on data from the English WordNet32 lexical database, just as so
many other lexical resources do these days: it has become one of the favourite datasets for
online dictionaries, because it is free and NLP-tractable in ways that make such
integration relatively easy.
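Since rhyme search also operates on phonetic form, it can be sketched along similar lines. The Python example below treats two words as rhyming when their phonemes are identical from the last stressed vowel to the end of the word, a common simplification; the small CMU-style sample lexicon and the function names are illustrative assumptions, and nothing here is meant to describe how RhymeZone actually computes its results.

```python
# A hand-typed sample in CMU Pronouncing Dictionary style (ARPAbet; the digit
# on each vowel marks stress: 1 primary, 2 secondary, 0 none).
SAMPLE = {
    "nation": "N EY1 SH AH0 N",
    "station": "S T EY1 SH AH0 N",
    "fashion": "F AE1 SH AH0 N",
    "cat": "K AE1 T",
    "hat": "HH AE1 T",
    "kit": "K IH1 T",
}


def rhyming_part(pron: str) -> str:
    """Return the phonemes from the last stressed vowel to the end."""
    phones = pron.split()
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":  # vowel with primary or secondary stress
            return " ".join(phones[i:])
    return pron  # no stressed vowel: fall back to the whole form


def rhymes(word: str, lexicon: dict[str, str]) -> list[str]:
    """Other words in the lexicon sharing the rhyming part of `word`."""
    target = rhyming_part(lexicon[word])
    return sorted(w for w, p in lexicon.items()
                  if w != word and rhyming_part(p) == target)


print(rhymes("nation", SAMPLE))  # ['station']
print(rhymes("cat", SAMPLE))     # ['hat']
```

Note that fashion does not come out as a rhyme for nation, because the stressed vowels differ; a tool aimed at songwriters might want a looser notion of near rhyme.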
One interesting use of WordNet data is in graphic visualization engines such as
VisuWords33 or Visual Thesaurus,34 where the idea is to represent WordNet’s lexical relations
in a visually appealing graphical form. The latter now shows up in Cambridge Dictionaries
Online entries.
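What such visualizers draw is, in essence, a small graph around the search word. The minimal sketch below uses a handful of invented WordNet-style relations, which are not copied from WordNet. `neighbourhood` is our own hypothetical helper. It collects, by breadth-first search, the edges within a given number of hops, which is roughly the subgraph a visual thesaurus would lay out on screen.

```python
from collections import deque

# Invented WordNet-style relations between words (not taken from WordNet itself).
RELATIONS = [
    ("strong", "similar_to", "powerful"),
    ("strong", "antonym", "weak"),
    ("powerful", "similar_to", "potent"),
    ("weak", "similar_to", "feeble"),
    ("potent", "similar_to", "mighty"),
]


def build_graph(triples):
    """Adjacency lists; both relations used here are symmetric, so store both directions."""
    graph = {}
    for a, rel, b in triples:
        graph.setdefault(a, []).append((rel, b))
        graph.setdefault(b, []).append((rel, a))
    return graph


def neighbourhood(graph, start, depth):
    """Breadth-first search: the edges lying within `depth` hops of `start`."""
    seen = {start}
    edges = set()
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested radius
        for rel, other in graph.get(node, []):
            a, b = sorted((node, other))
            edges.add((a, b, rel))  # undirected edge, stored in a canonical order
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return edges


graph = build_graph(RELATIONS)
for edge in sorted(neighbourhood(graph, "strong", 2)):
    print(edge)
```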
Having completed a quick tour of the representative online dictionaries of English, we will
move on to a number of overarching issues that are relevant and topical for online dictionaries
of today and tomorrow.
3 Some issues in online dictionaries
3.1 The dictionary web
The World Wide Web is built around the concept of hypertext, where texts, documents and
media make up an interconnected network. Like most other sites, online dictionaries
hyperlink, interlink, embed and integrate, and it will not take long for a careful user of online
dictionaries to start noticing that quite a lot of the same content crops up again and again on a
variety of dictionary sites. For example, the very same Visual Thesaurus images which feature
in Cambridge Dictionaries Online are also present at the Dictionary.com35 site. The latter is an
example of a dictionary resource which does not rely on its own data, but instead aggregates
lexicographic content from other electronic (online) dictionaries. Dictionary.com is a
particularly popular aggregator of this kind. Its popularity, one might suspect, has a lot to do with
the attractive domain name, which to many users (and search engines?) strongly suggests that
this is the Dictionary (see e.g. Béjoint 2010 on the popular image of the dictionary). As of this
writing, the resource aggregates lexicographic content from 15 dictionaries, including the
American favourites Random House Dictionary and American Heritage Dictionary, as well as
half a dozen special-purpose and special-subject dictionaries.
Another aggregator is TheFreeDictionary, with the American Heritage Dictionary (again!),
WordNet (again!), and the Collins English Dictionary (and Thesaurus). The resource is worth
consulting for this last one, as this time (compare 2.1.2 above), it is indeed the respectable
Collins English Dictionary, which is generally not freely available elsewhere.
While the ability to hyperlink and embed lies at the heart of the World Wide Web,
dictionary aggregators take the idea to extremes: such portals produce absurdly long
entries by mechanically pasting together, back-to-back, entries from several online
dictionaries. Since these individual entries are often very similar, the result is an
unhelpful, highly redundant and tortuous assemblage of disconnected lexicographic data.
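The redundancy can be made concrete with a small sketch. It assumes that each source dictionary supplies a list of sense definitions. The three sources and their definitions below are invented. The code pastes the senses back-to-back, as an aggregator page would, and uses a simple word-overlap (Jaccard) score to flag senses that nearly repeat one already on the page. The 0.7 threshold is arbitrary, and word overlap is only a crude proxy for sameness of meaning.

```python
import re

# Invented sense definitions from three hypothetical source dictionaries.
SOURCES = {
    "Dictionary A": [
        "a group of people who play a sport together",
        "a group of people who work together",
    ],
    "Dictionary B": [
        "a group of people who play a sport together against another group",
        "two or more animals pulling a vehicle",
    ],
    "Dictionary C": ["a group of people who work together on a task"],
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words two definitions have in common (0 to 1)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def aggregate(sources, threshold=0.7):
    """Paste all senses back-to-back, as an aggregator page does, and record
    each sense that nearly repeats one already on the page."""
    page, duplicates = [], []
    for source, senses in sources.items():
        for sense in senses:
            for earlier_source, earlier in page:
                if jaccard(sense, earlier) >= threshold:
                    duplicates.append((source, sense, earlier_source))
                    break
            page.append((source, sense))
    return page, duplicates


page, duplicates = aggregate(SOURCES)
print(f"{len(page)} senses on the page, {len(duplicates)} near-duplicates")
for source, sense, earlier_source in duplicates:
    print(f"{source}: '{sense}' repeats {earlier_source}")
```

Even in this tiny example, two of the five senses on the page add almost nothing new. Merging or hiding such near-duplicates automatically, without losing genuinely distinct senses, is the hard part.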
3.2 Access
Electronic dictionaries, including online dictionaries, are often praised for their access
functionality, which is claimed to be superior to that of the printed book. Clearly, the
electronic interface is by definition more flexible and has a potential for efficiency that is not
achievable in static printed form, but this potential is not always properly
utilized, especially if the online dictionary is retrospectively digitized (Wiegand et al. 2010:
209). One example of a respectable online dictionary with paper-like access is the American
Heritage Dictionary, which has no search facility at all. Worse still is Dorland's Medical
Dictionary (see 2.5 above), where outer access is even slower and more cumbersome than in a
printed book. However, some online dictionaries do take advantage of the electronic medium
and explore alternative access routes. To illustrate, let us consider how access is handled
when a search term potentially returns large amounts of data.
3.2.1 The step-wise approach to outer access?
Over ten years ago, Hulstijn and Atkins (1998) proposed what they called “step-wise access”
for electronic dictionaries. It is interesting to observe how this proposal
stands up in view of the practical implementations in online English dictionaries. To this end, we
need to examine the volume of data that a dictionary presents to the user when
a search term matches more than a single treatment unit, such as several lemmata (e.g.
items of different parts of speech) or multi-word expressions (MWEs) such as fixed
phrases, idioms or phrasal verbs. The solutions actually found in English online
dictionaries can essentially be reduced to three options:
1. a menu of target items is presented;
2. a menu is presented, but the most likely choice opens by default;
3. partial entries are listed.
The first option, by far the most common, can be illustrated using Macmillan Dictionary
Online as an example. Here, a search for the one-word string team returns a vertical menu of
nine matches, each one hyperlinked to an entry or subentry. The top of the menu looks like
this:
team NOUN
team VERB
dream team NOUN
sales team NOUN
...
Option 2 is found in Merriam-Webster's Advanced Learner's English Dictionary, where
a search for team produces a similar list of seven items, but the first of these (here again, team
NOUN) is also displayed as a full entry immediately below the list.
Option 3 is implemented in the online dictionary at myCOBUILD.com,36 available to
buyers of the printed Collins COBUILD Advanced Dictionary. The approach is
intermediate between a bare lemma list (Option 1) and complete entries (Option 2). As
Figure 1 shows for the entry TEAM in myCOBUILD.com, the dictionary interface alerts
the user that multiple entries have been found, and then displays the top of each lemma with a
More link leading to the complete entry for that lemma.
Figure 1: The entry for TEAM in myCOBUILD.com as an example of a stepwise interface
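The three options differ only in how much of each matching entry is rendered on the results page. Here is a minimal sketch that assumes a hypothetical `Entry` record and invented sense texts. Nothing is quoted from Macmillan, Merriam-Webster or COBUILD, and the one-sense cut-off for Option 3 is our own choice.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    lemma: str
    pos: str
    senses: list[str]


# Invented matches for a search on 'team' (not quoted from any dictionary).
MATCHES = [
    Entry("team", "NOUN", ["a group of people who play a sport together",
                           "a group of people who work together"]),
    Entry("team", "VERB", ["to join with someone in order to do something"]),
    Entry("dream team", "NOUN", ["a team whose members are all excellent"]),
]


def menu(matches):
    """Option 1: a bare menu of target items, each of which would be a link."""
    return [f"{e.lemma} {e.pos}" for e in matches]


def menu_with_default(matches):
    """Option 2: the menu, followed by the first match shown as a full entry."""
    if not matches:
        return []
    first = matches[0]
    return menu(matches) + [f"{first.lemma} {first.pos}: " + "; ".join(first.senses)]


def partial_entries(matches, shown=1):
    """Option 3: the top of every entry, with a More link when senses are hidden."""
    lines = []
    for e in matches:
        more = " [More]" if len(e.senses) > shown else ""
        lines.append(f"{e.lemma} {e.pos}: " + "; ".join(e.senses[:shown]) + more)
    return lines


for render in (menu, menu_with_default, partial_entries):
    print(render.__name__)
    print(render(MATCHES))
```

`menu_with_default` simply trusts the order of the matches. If the first item is not the one the user needs, the full entry on display is the wrong one.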
Which of the three options is best? A universal answer that ignores lexicographically relevant
details, such as the nature of the lookup situation and specific user needs and skills, rarely
makes sense in lexicography, but let us offer some observations that might have more
general appeal. Option 2 looks attractive, but there is a danger that users may fail to
recognize that the default choice (here team NOUN) is the wrong one in their case. In
contrast, Option 1 seems relatively safe in terms of the risk of missing the right option, but
here the problem lies in the economy of effort (aka laziness): users may lack the patience to
navigate through the menu to the full treatment, and may decide instead to ditch a tool
that requires too much clicking. In view of these reservations, Option 3 might
perhaps be optimal (other things being equal), and it is surprising that so few dictionaries have
adopted it.
3.3 Customization and profiling in online English dictionaries
A recent study by Tono (2011), the first dictionary use study ever to employ eye tracking,
confirms the suspicion that dictionary users differ greatly in their consultation habits and
strategies. The realization that different users have different needs and expectations lies
behind efforts to vary or customize e-dictionaries (De Schryver 2009; Verlinde et al. 2010),
and, indeed, in some of the online dictionaries of English reviewed above, users do have
some ability to control the presentation of lexicographic data.
The online Oxford English Dictionary has control buttons to show or hide the following
data types: Pronunciation, Spellings, Etymology, Quotations, Date Chart, Additions. It should
be observed that this solution is not really lexicographic-function-driven (Tarp 2008), as the
user is required to select explicitly the data fields to be displayed. However,
the users of an academic dictionary such as this one tend to be highly sophisticated
(many are language scholars), and so they are much more likely than naive
users to know explicitly which data types they actually need.
Macmillan English Dictionary Online offers two pre-packaged presentation modes, which
can be selected by toggling the Show Less/Show More control button located next to the
lemma sign. The two modes roughly correspond to text reception and text
production, respectively. Switching to the more basic mode hides the
phonetic transcription, collocations (with examples), grammar labels and some of the
examples.
However, synonym links are still included, even though, arguably, a synonym list is not
very useful for text reception. Only a minority of dictionary users are likely to be aware that the
dictionary has a third, even simpler mode, available via the so-called interstitial page,
accessible from collaborating news sites37 by double-clicking on any word in the text (luckily,
the engine includes lemmatization, so the word-form stealing takes the user to the lemma
STEAL). In this mode, all examples and synonyms are absent, as one would expect in a true
reception mode.
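Both kinds of control (OED-style explicit toggles and Macmillan-style preset modes) can be modelled as choosing a set of fields from one structured entry. Here is a minimal sketch. The entry content is invented, and so is the membership of each mode, which is only our approximation of the behaviour described above. The three-rule lemmatizer is a toy stand-in for whatever lemmatization the interstitial page actually uses.

```python
# A hypothetical entry with each data type in its own field (invented content).
ENTRY = {
    "lemma": "steal",
    "pronunciation": "/stiːl/",
    "grammar": "[intransitive/transitive]",
    "definition": "to take something that belongs to someone else without permission",
    "examples": ["Someone stole my bike."],
    "collocations": ["steal from someone"],
    "synonyms": ["take", "pinch"],
}
FIELD_ORDER = list(ENTRY)  # display order follows the entry's own field order

# Presentation modes as named field sets: our approximation of the three modes above.
MODES = {
    "show_more": set(FIELD_ORDER),
    "show_less": {"lemma", "definition", "examples", "synonyms"},
    "interstitial": {"lemma", "definition"},
}

# Irregular forms are listed; regular forms are handled by stripping suffixes.
IRREGULAR = {"stole": "steal", "stolen": "steal"}


def render(entry: dict, fields: set) -> dict:
    """Keep only the requested fields, in display order. `fields` can be a preset
    or a user's own selection, as with OED-style show/hide buttons."""
    return {f: entry[f] for f in FIELD_ORDER if f in fields and f in entry}


def lemmatize(form: str, headwords: set) -> str:
    """Toy lemmatizer: irregular table first, then a few suffix rules."""
    form = form.lower()
    if form in IRREGULAR:
        return IRREGULAR[form]
    for suffix in ("ing", "ed", "s"):
        stem = form[: -len(suffix)]
        if form.endswith(suffix) and stem in headwords:
            return stem
    return form


# A double-click on 'stealing' in a news story leads to STEAL in interstitial mode.
lemma = lemmatize("stealing", {ENTRY["lemma"]})
print(lemma, render(ENTRY, MODES["interstitial"]))
```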
User profiling is one of the highlights of the new Louvain EAP dictionary (see also 2.2.2
above), currently in development, in which the content presented depends on the native
language and discipline (subject field) selected by the user.
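Here is a brief sketch of how such a profile might drive content selection. We do not know how the Louvain EAP dictionary implements this. The sketch hypothetically assumes that each note carries optional tags for the native languages (L1) and disciplines it targets, and that an untagged note is shown to everyone. The note texts are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    text: str
    l1: set[str] = field(default_factory=set)           # empty set = shown to all L1s
    disciplines: set[str] = field(default_factory=set)  # empty set = shown in all fields


# Invented notes attached to one headword; the texts are placeholders.
NOTES = [
    Note("General usage note for all users"),
    Note("Note for French-speaking users", l1={"fr"}),
    Note("Note for users in medicine and biology", disciplines={"medicine", "biology"}),
    Note("Note for Dutch-speaking linguists", l1={"nl"}, disciplines={"linguistics"}),
]


def select_notes(notes: list[Note], l1: str, discipline: str) -> list[str]:
    """Keep the notes whose targeting matches the user's profile."""
    return [
        n.text
        for n in notes
        if (not n.l1 or l1 in n.l1) and (not n.disciplines or discipline in n.disciplines)
    ]


print(select_notes(NOTES, l1="fr", discipline="medicine"))
```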
3.4 Multimedia in online dictionaries
Online dictionaries can potentially include a range of multimedia content. The potential is
utilized in online dictionaries of English to varying degrees.
3.4.1 Graphics
Graphical elements are not exclusive to electronic dictionaries: drawings and, to a
lesser extent, photographs, diagrams and tables have long been used in paper
dictionaries. However, pictorial illustrations are more easily and cheaply included in electronic
dictionaries (Lew 2010). For example, illustrations are present in some entries in Cambridge
Dictionaries Online and in the free online version of the Longman Dictionary of Contemporary
English.
Thanks to the linkability of the web, it is quite possible to embed media from other
providers. However, one has to reckon with the consequences of limited control over
hyperlinked content. For example, between (roughly) November 2009 and June 2010, the
Google Dictionary displayed popular images from Google’s own image search service
next to some entries. As a consequence, the Google Dictionary entry for KILT included a
photograph which, likely without conscious intent, conveyed all too clearly the cultural
information that kilts need no accompanying underwear (in the interest of propriety, no
screenshot is included in this article). As of this writing, the Google Dictionary has
discontinued the inclusion of images.
3.4.2 Audio
It is becoming increasingly common for online dictionaries of English to offer audio recordings
of entry words. However, recordings of other verbal elements (definitions, examples) are rarely
included: of the dictionaries discussed in this article, only the subscription version of
LDOCE offers spoken recordings of all example sentences. One novel use of audio is
to present characteristic sounds associated with the entry word: an interesting subgenre of
ostensive defining. Proposals to include such elements in electronic dictionaries have been
made by Dodd (1989: 91) and Ooi (1998: 112). Dodd called them sound effects, and such
recordings are now available in the free Macmillan English Dictionary Online. There, the user
can hear the sounds produced by musical instruments under their relevant headwords, both
popular ones (GUITAR, PIANO, VIOLIN, RECORDER) and less familiar ones (SITAR).
Animal noises and bird calls are likewise included (ROAR, HOOT: perhaps also worth linking
under the entries LION and OWL), as are sounds made by humans (CLAP, LAUGH, HICCUP)
and by noisy machines (TRAIN, HELICOPTER).
3.4.3 Video and animation
With internet speeds steadily increasing, video content is becoming mainstream
on the web. However, English online dictionaries have so far not really embraced video
technology. This caution may, in fact, be well motivated: Chun and Plass (1996) point
out that video sequences are too transient to allow the viewer to build a stable mental
model. Video may thus make poor cognitive sense, because viewers may be unable
to pace their processing of the information at a rate that works for them.
Similar reservations can be raised for animated graphics, and there is at least one empirical
study which appears to substantiate the pessimistic view of the effectiveness of animations, at
least for dictionary-induced vocabulary learning. Lew and Doroszewska’s recent study (2009)
found a strong and significant negative impact of viewing animations on vocabulary retention.
3.5 Dictionaries, corpora and lexical databases
We have repeatedly seen above online dictionaries that use WordNet data. In fact, WordNet is
often loosely referred to as a “dictionary”, even though, in more careful usage, it is a lexical
database rather than a dictionary. I suspect that for the average user the distinction is too fine
a point. Yet, if we look at the recent history of dictionary-making, we see the growing role of
information technology and structured data: corpora, databases, and structured markup
such as XML. The current trend, then, is towards a clearer separation of the data layer from
presentation, in line with Sue Atkins’ visionary proposal (1996). Increasingly, the dictionary
as the user sees it is likely to be but an epiphenomenon of a structured lexical database or
corpus, and the presentation layer is set to become an automated procedure, requiring little or
no human intervention (De Schryver 2009; Atkins et al. 2010; Kilgarriff and Rychlý 2010;
see also Almind and Nielsen, this volume; Gouws, this volume).
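The separation of data from presentation can be illustrated in miniature. The sketch below assumes an invented XML schema. The element names `entry`, `sense`, `def` and `example` are ours and belong to no particular dictionary. It uses Python's standard `xml.etree.ElementTree`. The two rendering functions produce different views of one and the same stored entry, so changing the presentation requires no change to the data.

```python
import xml.etree.ElementTree as ET

# The data layer: an invented XML entry (the element names are ours, not a real DTD).
ENTRY_XML = """
<entry id="tooth">
  <headword>tooth</headword>
  <pos>noun</pos>
  <sense n="1">
    <def>one of the hard white objects in the mouth, used for biting</def>
    <example>She had a tooth pulled.</example>
  </sense>
  <sense n="2">
    <def>one of the pointed parts along the edge of a comb or saw</def>
  </sense>
</entry>
"""


def render_full(root: ET.Element) -> str:
    """One presentation: every sense with its examples."""
    lines = [f"{root.findtext('headword').upper()} ({root.findtext('pos')})"]
    for sense in root.iter("sense"):
        line = f"{sense.get('n')}. {sense.findtext('def')}"
        for example in sense.findall("example"):
            line += f" | {example.text}"
        lines.append(line)
    return "\n".join(lines)


def render_compact(root: ET.Element) -> str:
    """Another presentation of the same data: headword and first definition only."""
    first = root.find("sense")
    return f"{root.findtext('headword')}: {first.findtext('def')}"


root = ET.fromstring(ENTRY_XML.strip())
print(render_full(root))
print(render_compact(root))
```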
Indeed, as corpus interfaces and wrappers grow increasingly sophisticated, they can be used
in some ways like dictionaries, so that even a more discerning user may not care what is
“under the hood” as long as the interface can serve as a dictionary of sorts. As an example,
consider the fully automatic collocations dictionary ForBetterEnglish.com,38 which uses the
SketchEngine and GDEX technologies (Kilgarriff et al. 2008) on server-resident corpora to
automatically produce entries such as the one in Figure 2. Clearly, it takes quite an expert to
tell that this is not an ordinary human-made dictionary entry. The illusion would have been
even more complete if the collocation-type indicators (object_of, etc.) had been given less technical
and more user-friendly names.
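The kind of computation behind such an entry can be sketched briefly. The version below assumes that the corpus has already been parsed into (relation, headword, collocate) triples. It ranks collocates with the logDice association score, 14 + log2(2·f_xy / (f_x + f_y)), which to our knowledge is the score used in Sketch Engine's word sketches. All counts are invented, f_x and f_y are taken here as plain corpus frequencies, and GDEX-style example selection is not modelled.

```python
import math
from collections import Counter

# Invented output of a parser: one (relation, headword, collocate) triple per hit.
TRIPLES = (
    [("object_of", "tooth", "brush")] * 5
    + [("object_of", "tooth", "pull")] * 3
    + [("object_of", "tooth", "see")] * 2
    + [("modifier", "tooth", "wisdom")] * 4
    + [("modifier", "tooth", "sweet")] * 1
)
# Invented overall corpus frequencies.
FREQ = {"tooth": 40, "brush": 20, "pull": 60, "see": 2000, "wisdom": 30, "sweet": 80}


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice: 14 + log2(2 * f_xy / (f_x + f_y)); the theoretical maximum is 14."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def word_sketch(head, triples, freq, min_count=2):
    """Group the collocates of `head` by relation, strongest association first."""
    pair_counts = Counter((rel, col) for rel, h, col in triples if h == head)
    sketch = {}
    for (rel, col), f_xy in pair_counts.items():
        if f_xy < min_count:
            continue  # drop one-off co-occurrences
        score = log_dice(f_xy, freq[head], freq[col])
        sketch.setdefault(rel, []).append((col, round(score, 2)))
    for collocates in sketch.values():
        collocates.sort(key=lambda item: item[1], reverse=True)
    return sketch


for relation, collocates in word_sketch("tooth", TRIPLES, FREQ).items():
    print(relation, collocates)
```

Although *see* co-occurs with *tooth* twice, it ranks lowest because it is so frequent overall. This is the sort of filtering that makes an automatic entry look human-made.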
Figure 2: Entry for TOOTH in the ForBetterEnglish.com automated collocations dictionary
Another corpus-based online resource dealing with English collocations,
JustTheWord,39 can even suggest corrections for unnatural word combinations. Figure 3 shows
the output for the query POWERFUL TEA with the “find alternatives” option selected. The
interface indicates whether a word combination is “good” (green bar on the right; colours
not shown in print) or “bad” (red bar), and the length of the bar indicates the (un)typicality of
the combination. Further, the narrow blue bar directly underneath each combination
indicates the degree of meaning similarity between the combination to be replaced and each
candidate replacement. Here, the collocation strong tea has the longest blue bar, and
indeed this is the idiomatic phrase that a learner of English would have wanted to use instead
of the non-idiomatic powerful tea, had they known better. All in all, the
information provided is very useful and relevant, and it may be hard to believe that
this output has been computed fully automatically.
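JustTheWord's internals are not described here, so the following is only a guess at how such a check could work in principle, with every figure invented. A combination counts as ‘good’ if its pointwise mutual information (PMI) in a corpus is above zero. Candidate substitutes for the modifier come from a hypothetical thesaurus with similarity scores, which play the role of the blue bars. The candidates are then listed most-similar first, each with its own good/bad verdict.

```python
import math

N = 10_000_000  # invented corpus size in tokens
# Invented corpus counts for single words and adjective + noun pairs.
UNIGRAMS = {"strong": 50_000, "powerful": 30_000, "potent": 3_000, "mighty": 4_000, "tea": 20_000}
BIGRAMS = {("strong", "tea"): 120, ("powerful", "tea"): 1, ("potent", "tea"): 2}
# Invented thesaurus: substitutes for 'powerful' with a similarity score (0 to 1).
SUBSTITUTES = {"powerful": {"strong": 0.8, "potent": 0.7, "mighty": 0.6}}


def pmi(w1: str, w2: str) -> float:
    """Pointwise mutual information of the pair; -inf if never observed."""
    count = BIGRAMS.get((w1, w2), 0)
    if count == 0:
        return float("-inf")
    return math.log2(count * N / (UNIGRAMS[w1] * UNIGRAMS[w2]))


def verdict(w1: str, w2: str, threshold: float = 0.0) -> str:
    return "good" if pmi(w1, w2) > threshold else "bad"


def alternatives(modifier: str, noun: str):
    """Judge the input pair, then list substitutes, most similar first."""
    candidates = sorted(SUBSTITUTES.get(modifier, {}).items(), key=lambda kv: kv[1], reverse=True)
    return verdict(modifier, noun), [(alt, sim, verdict(alt, noun)) for alt, sim in candidates]


print(alternatives("powerful", "tea"))
```

Plain PMI is known to overrate rare pairs, so a real system would need larger counts or a more robust association measure. The sketch only shows how the good/bad verdict and the similarity ordering can be computed independently.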
Figure 3: JustTheWord alternative collocation suggestions for ‘powerful tea’
There exist other “smart” interfaces to corpora. One of them is http://corpus.byu.edu,
created and maintained by Mark Davies, which offers free access to several corpora, including
the Corpus of Contemporary American English (COCA),40 currently the largest publicly
available corpus of English. Another is the SketchEngine,41 available by subscription. A
subset of the British Academic Spoken English corpus is available through IBM’s Many
Eyes42 visualization interface, which allows the user to investigate the syntagmatic
relationships of the most common words, though it is of limited use for less common
combinations owing to the small size of the corpus. A rich and comprehensive lexical database of English
with a dictionary-like interface will very soon become publicly available online as part of the
DANTE43 project.
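Most of these interfaces are built around the concordance. Here is a minimal keyword-in-context (KWIC) sketch. It assumes a raw text string and a crude regular-expression tokenizer, whereas real corpus tools query large indexed and annotated corpora. It therefore illustrates the display format rather than any of the systems named above.

```python
import re


def kwic(text: str, node: str, width: int = 3) -> list[str]:
    """Keyword-in-context lines for `node`, with `width` words of context each side."""
    words = re.findall(r"[\w']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == node.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up in a column.
            lines.append(f"{left:>30} [{word}] {right}".rstrip())
    return lines


SAMPLE = "I drink strong tea every day. Strong tea keeps me awake. She prefers weak tea."
for line in kwic(SAMPLE, "tea", width=2):
    print(line)
```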
These resources are highly sophisticated, so there is little hope that
their popularity will extend much beyond a relatively small group of power users; everyone else
will increasingly just Google for answers, whatever the nature of the problem, and I
fear that this tendency poses a real threat to more specialized reference tools, including
dictionaries.
4 Summary and conclusion
In our necessarily sketchy overview of English online dictionaries, we have seen that a great
variety of dictionaries exists, but without proper guidance users run the risk of getting lost in
the riches. It is surprising to see so many online dictionaries (including some from
respectable publishers) still largely constrained by the paper model, with access mechanisms
that often lag behind what today’s technology allows. Furthermore, users may
get flooded with irrelevant and highly repetitive information, especially by dictionary
aggregators. And even if hyperlinking to external sources embodies best practice in
hypertext philosophy, it is not without danger, as it relinquishes much of the control over the
content of “our” dictionary page. More generally, the universal use of search engines (or of one
dominant search engine) poses a risk that dictionaries (or any specialized online works of
reference) will be marginalized. Finally, learners of English are still waiting for a
function-driven lexical resource of the type represented by the excellent Base lexicale du français44
(Verlinde et al. 2010 and this volume).
Notes
1 http://dictionary.cambridge.org
2 http://oxforddictionaries.com
3 http://www.collinslanguage.com
4 http://www.chambersharrap.co.uk/chambers/features/chref/chref.py/main
5 http://encarta.msn.com/encnet/features/dictionary/dictionaryhome.aspx
6 http://www.internetworldstats.com
7 http://www.oup.com/elt/catalogue/teachersites/oald7/lookup?oup_jspFileName=document.jsp&cc=pl
8 http://www.ldoceonline.com
9 http://ldoce.longmandictionariesonline.com/dict/SearchEntry.html
10 http://dictionary.cambridge.org
11 http://www.macmillandictionary.com
12 http://www.mycobuild.com
13 http://dictionary.reverso.net/english-cobuild
14 http://www.learnersdictionary.com
15 http://nhd.heinle.com/home.aspx
16 http://dictionary.cambridge.org/Default.asp?dict=A
17 http://www.urbandictionary.com
18 http://en.wiktionary.org
19 http://www.wordnik.com
20 http://www3.merriam-webster.com/opendictionary/
21 http://www.macmillandictionary.com/open-dictionary/latestEntries.htm
22 http://www.macmillandictionaryblog.com, winner of the 2009 Edublog award for best education blog on the
web
23 http://quod.lib.umich.edu/m/med
24 http://www.merckmedicus.com/pp/us/hcp/thcp_dorlands_content_split.jsp?pg=/ppdocs/us/common/dorlands/drlnd/misc/dmd-a-b-000.htm
25 http://www.acronymfinder.com
26 http://www.etymonline.com
27 http://www.howjsay.com, the domain name being an eye-dialect rendition of the casual pronunciation of the
phrase ‘how do you say?’
28 http://www.cmu.edu
29 http://seas3.elte.hu/epd.html
30 http://thesaurus.com/?regHome=true
31 http://www.rhymezone.com
32 http://wordnetweb.princeton.edu
33 http://www.visuwords.com
34 http://www.visualthesaurus.com
35 http://dictionary.reference.com
36 http://www.myCobuild.com
37 One example is http://www.shanghaidaily.com
38 http://forbetterenglish.com
39 http://193.133.140.102/justTheWord, Sharp Laboratories
40 http://www.americancorpus.org
41 http://www.sketchengine.co.uk
42 http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/3e335458358611de909d000255111976
43 http://www.webdante.com
44 http://ilt.kuleuven.be/blf
References
Atkins, Beryl T. Sue. 1996. ‘Bilingual Dictionaries - Past, Present and Future’ in Gellerstam,
Martin, Jerker Järborg, Sven-Göran Malmgren, Kerstin Norén, Lena Rogström and
Catarina Röjder Papmehl (eds.), EURALEX '96 Proceedings. Göteborg: Department of
Swedish, Göteborg University, 515-546.
Atkins, Beryl T. Sue, Adam Kilgarriff and Michael Rundell. 2010. ‘Database of Analysed
Texts of English (DANTE): The NEID Database Project’ in Dykstra, Anne and Tanneke
Schoonheim (eds.), Proceedings of the XIV Euralex International Congress. Ljouwert:
Afûk, 549-556.
Béjoint, Henri. 2010. The Lexicography of English. From Origins to Present. Oxford:
Oxford University Press.
Bogaards, Paul. 2010. ‘The Evolution of Learners' Dictionaries and Merriam-Webster's
Advanced Learner's English Dictionary’ in Kernerman, Ilan and Paul Bogaards (eds.),
English Learners' Dictionaries at the DSNA 2009. Tel Aviv: K Dictionaries, 11-27.
Carr, Michael. 1997. ‘Internet Dictionaries and Lexicography.’ International Journal of
Lexicography 10.3: 209-230.
Chun, Dorothy M. and Jan L. Plass. 1996. ‘Effects of Multimedia Annotations on
Vocabulary Acquisition.’ Modern Language Journal 80.2: 183-198.
Cowie, Anthony Paul. 1999. English Dictionaries for Foreign Learners: A History. Oxford:
Clarendon Press.
De Schryver, Gilles-Maurice. 2009. ‘State-of-the-Art Software to Support Intelligent
Lexicography’ in Zhu, R. (ed.), Proceedings of the International Seminar on Kangxi
Dictionary & Lexicology. Beijing: Beijing Normal University, 565-580.
Dodd, W. Steven. 1989. ‘Lexicomputing and the Dictionary of the Future’ in James, Gregory
(ed.), Lexicographers and Their Works. Exeter Linguistic Studies 14. Exeter: Exeter
University Press, 83-93.
Fuertes-Olivera, Pedro A. 2009. ‘The Function Theory of Lexicography and Electronic
Dictionaries: Wiktionary as a Prototype of Collective Free Multiple-Language
Internet Dictionary’ in Bergenholtz, Henning, Sandro Nielsen and Sven Tarp (eds.),
Lexicography at a Crossroads: Dictionaries and Encyclopedias Today,
Lexicographical Tools Tomorrow. Linguistic Insights - Studies in Language and
Communication, Vol.90. Bern: Peter Lang, 99-134.
Hanks, Patrick. 2009. ‘Review of Stephen J. Perrault (Ed.). 2008. Merriam-Webster's
Advanced Learner's English Dictionary.’ International Journal of Lexicography 22.3:
301-315.
Hulstijn, Jan H. and Beryl T. Sue Atkins. 1998. ‘Empirical Research on Dictionary Use in
Foreign-Language Learning: Survey and Discussion’ in Atkins, Beryl T. Sue (ed.),
Using Dictionaries. Studies of Dictionary Use by Language Learners and Translators.
Lexicographica Series Maior 88. Tübingen: Niemeyer, 7-19.
Kilgarriff, Adam, Miloš Husák, Katy McAdam, Michael Rundell and Pavel Rychlý.
2008. ‘GDEX: Automatically Finding Good Dictionary Examples in a Corpus’ in
Bernal, Elisenda and Janet DeCesaris (eds.), Proceedings of the XIII EURALEX
International Congress. Barcelona: Universitat Pompeu Fabra, 425-432.
Kilgarriff, Adam and Pavel Rychlý. 2010. ‘Semi-Automatic Dictionary Drafting’ in De
Schryver, Gilles-Maurice (ed.), A Way with Words: Recent Advances in Lexical
Theory and Analysis. A Festschrift for Patrick Hanks. Kampala: Menha Publishers,
299-312.
Lew, Robert. 2010. ‘Multimodal Lexicography: The Representation of Meaning in Electronic
Dictionaries.’ Lexikos 20.
Ooi, Vincent Beng Yeow. 1998. Computer Corpus Lexicography. Edinburgh: Edinburgh
University Press.
Rundell, Michael. 2006. ‘More Than One Way to Skin a Cat: Why Full-Sentence Definitions
Have Not Been Universally Adopted’ in Corino, Elisa, Carla Marello and Cristina
Onesti (eds.), Atti del XII Congresso di Lessicografia, Torino, 6-9 settembre 2006.
Alessandria: Edizioni dell'Orso, 323-337.
Sobkowiak, Włodzimierz. 2009. ‘Review of Wells, John C., Longman Pronunciation
Dictionary (3rd Edition).’ International Journal of Lexicography 22.2: 191-209.
Tarp, Sven. 2008. Lexicography in the Borderland between Knowledge and Non-Knowledge:
General Lexicographical Theory with Particular Focus on Learner’s Lexicography.
Lexicographica Series Maior 134. Tübingen: Max Niemeyer Verlag.
Tono, Yukio. 2011. ‘Application of Eye-Tracking in EFL Learners’ Dictionary Look-up
Process Research.’ International Journal of Lexicography 24.1.
Verlinde, Serge, Patrick Leroyer and Jean Binon. 2010. ‘Search and You Will Find. From
Stand-Alone Lexicographic Tools to User Driven Task and Problem-Oriented
Multifunctional Leximats.’ International Journal of Lexicography 23.1: 1-17.
Wiegand, Herbert Ernst, Michael Beißwenger, Rufus H. Gouws, Matthias Kammerer,
Angelika Storrer and Werner Wolski. 2010. Wörterbuch zur Lexikographie und
Wörterbuchforschung. Dictionary of Lexicography and Dictionary Research. Vol. 1
(A-C). Berlin: Walter de Gruyter.
Yamada, Shigeru. 2010. ‘EFL Dictionary Evolution: Innovations and Drawbacks’ in
Kernerman, Ilan and Paul Bogaards (eds.), English Learners' Dictionaries at the
DSNA 2009. Tel Aviv: K Dictionaries, 147-168.