Article
PDF available

The structure of the Merriam-Webster pocket dictionary

Authors:
Robert A. Amsler

Abstract

Thesis (Ph.D.), University of Texas at Austin, 1980. Vita. Includes bibliographical references (leaves 260-269).
... Dictionary definitions represent a large source of our knowledge of the meaning of words (Amsler, 1980). Definitions are composed of a 'genus' phrase and a 'differentia' phrase (Amsler, 1980). The 'genus' phrase identifies the general category of the defined word. ...
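As an illustration of the genus/differentia structure described in this excerpt, here is a minimal Python sketch; the head-word heuristic is our own simplification for illustration, not Amsler's actual procedure.

```python
def split_genus_differentia(definition: str):
    """Naive split of '(a|an|the) GENUS DIFFERENTIA...' (illustrative only)."""
    tokens = definition.lower().split()
    # drop a leading determiner if present
    if tokens and tokens[0] in {"a", "an", "the"}:
        tokens = tokens[1:]
    if not tokens:
        return None, None
    # first remaining token approximates the genus; the rest is the differentia
    return tokens[0], " ".join(tokens[1:])

print(split_genus_differentia("a tool for pounding"))
# ('tool', 'for pounding')
```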
... The syllabification model was trained using Webster's Pocket Dictionary [17], which was automatically pre-processed according to the numbered ONC procedure. ...
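If, as the excerpt suggests, ONC refers to onset/nucleus/coda labelling, a crude orthographic approximation might look like the sketch below; this reading of "ONC" is our assumption, and the code is not the cited paper's numbered procedure.

```python
VOWELS = set("aeiouy")

def onc_labels(word: str) -> str:
    """Label each letter Onset, Nucleus, or Coda (crude approximation)."""
    labels = []
    word = word.lower()
    for i, ch in enumerate(word):
        if ch in VOWELS:
            labels.append("N")                       # vowels form nuclei
        elif any(c in VOWELS for c in word[i + 1:]):
            labels.append("O")                       # consonant before a later vowel: onset
        else:
            labels.append("C")                       # trailing consonants: coda
    return "".join(labels)

print(onc_labels("dictionary"))  # 'ONOONNONON'
```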
... The first step in building the ontology was the specification of an inventory of basic concepts that are assumed to be lexicalized in a wide range of languages. The ontology acquisition methodology is based on the extraction of type_of hierarchies from dictionary definitions (Martín-Mingorance 1984, 1990, 1995; Hirst 2009; Amsler 1980, 1981). Whereas the genus designates the superordinate concept of the defined word, the differentiating features are the properties that make the concept different from other members of the same conceptual category. ...
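Building on the same idea, a type_of hierarchy can be sketched by linking each headword to the genus head of its definition, treating the dictionary as an implicit taxonomy; the helper repeats the naive heuristic above and the toy entries are invented.

```python
from collections import defaultdict

def genus(definition: str) -> str:
    """Approximate the genus as the first token after a determiner."""
    tokens = definition.lower().split()
    if tokens and tokens[0] in {"a", "an", "the"}:
        tokens = tokens[1:]
    return tokens[0] if tokens else ""

def build_type_of(entries: dict) -> dict:
    hierarchy = defaultdict(set)   # genus term -> set of its hyponyms
    for word, definition in entries.items():
        hierarchy[genus(definition)].add(word)
    return hierarchy

toy = {
    "hammer": "a tool for pounding",
    "saw": "a tool for cutting",
    "tool": "an implement used to do work",
}
print(dict(build_type_of(toy)))
# e.g. {'tool': {'hammer', 'saw'}, 'implement': {'tool'}}
```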
Chapter
Full-text available
Role and Reference Grammar (RRG) is a theory of language in which linguistic structures are accounted for in terms of the interplay of discourse, semantics and syntax. With contributions from a team of leading scholars, this Handbook provides a field-defining overview of RRG. Assuming no prior knowledge, it introduces the framework step-by-step, and includes a pedagogical guide for instructors. It features in-depth discussions of syntax, morphology, and lexical semantics, including treatments of lexical and grammatical categories, the syntax of simple clauses and complex sentences, and how the linking of syntax with semantics and discourse works in each of these domains. It illustrates RRG's contribution to the study of language acquisition, language change and processing, computational linguistics, and neurolinguistics, and also contains five grammatical sketches which show how RRG analyses work in practice. Comprehensive yet accessible, it is essential reading for anyone who is interested in how grammar interfaces with meaning.
... Especially in the late 1970s and throughout the 1980s, the period in which machine-readable dictionaries were widely used in natural language processing, new ideas developed about lexical databases as well as the discipline of statistical lexicology (see Amsler, 1980; Michiels, 1982; Kay, 1984; Wilks et al., 1996). The appearance of CDs mirroring printed dictionaries with identical content dates back to the 1980s. ...
Article
Full-text available
The use of computers in practical lexicography in the 1960s introduced a fundamental change in traditional lexicography. The accumulation of electronic language data, especially in the 1990s, and the increasing use of low-cost high-speed broadband internet at the beginning of the 21st century led to rapid growth in e-lexicography studies. Along with these developments, many printed dictionaries became available as e-dictionaries within a short time. The advantages of electronically published dictionaries over printed dictionaries immediately caught the attention of many lexicographers, but the problems caused by hybridization went largely unnoticed. Likewise, issues such as corpus coherence, data reliability, access paths, quality, and utility have attracted little research attention. Nevertheless, modern developments in e-lexicography have led to an unbalanced growth in expectations of the discipline, as well as a radical change in its functional field. This study discusses the definition of the term computer lexicography in addition to the historical advances of e-lexicography, and also examines the phases of e-dictionary making. In addition, an attempt is made to determine whether today's e-dictionaries have problems in terms of hybridization, corpus, data reliability, access path, personalization, and quality. Electronic devices require speech technology software to analyse linguistic and lexicographical data, and language technology applications require lexical data to be machine-readable. Language needs to be standardized, because e-devices are less tolerant of mistakes and errors than humans are. Therefore, this study makes some suggestions for identifying and resolving problems with e-dictionaries.
... Moreover, these definitions have a recurrent structure, which can be exploited to derive a simpler model. Definitions for a word w are often organized as a sentence that contains the super-type of w and a modifier, which specializes the super-type (Amsler, 1980). For example (Fig. 1), cheerlessness is defined in WordNet as a feeling, which is the super-type, and of dreary and pessimistic sadness, which is the modifier. ...
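The cheerlessness example can be reproduced with NLTK's WordNet interface (assuming the nltk package and its wordnet corpus are installed); the head-word split is again a naive illustration rather than the cited paper's method.

```python
# requires: pip install nltk; then nltk.download('wordnet')
from nltk.corpus import wordnet as wn

synset = wn.synsets("cheerlessness")[0]
gloss = synset.definition()          # e.g. 'a feeling of dreary or pessimistic sadness'
tokens = gloss.split()
head = tokens[1] if tokens[0] in ("a", "an", "the") else tokens[0]

print("gloss head:", head)               # 'feeling' (the super-type)
print("hypernyms:", synset.hypernyms())  # WordNet's own super-type synsets
```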
... The first methods proposed in the 1970s and 1980s were aimed at creating semantic language models through the analysis of computer dictionaries, which were considered as sources of taxonomic "hyponym-hyperonym" relations. To extract relations from dictionaries, rule-based algorithms were used [3,4]. The advantage of these methods was that they extracted semantic relations from reliable sources, which can be considered already partially structured, since dictionaries work as "implicit taxonomies." ...
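A rule-based extractor in the spirit of those early methods might be sketched as follows; the patterns and example definitions are invented for illustration and are not taken from [3,4].

```python
import re

# ordered patterns mapping a definition to its hypernym ("hyperonym")
RULES = [
    re.compile(r"^(?:a|an|the)\s+(?:kind|type|sort)\s+of\s+(\w+)"),
    re.compile(r"^(?:a|an|the)\s+(\w+)"),   # fallback: bare genus head
]

def extract_hypernym(definition: str):
    for rule in RULES:
        m = rule.match(definition.lower())
        if m:
            return m.group(1)
    return None

print(extract_hypernym("a type of falcon"))     # 'falcon'
print(extract_hypernym("a tool for pounding"))  # 'tool'
```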
Article
Full-text available
This article considers the problem of finding text documents in a corpus that are similar in meaning. We investigate a problem that arises when developing applied intelligent information systems: the TF-IDF algorithm misses part of the solution set, i.e., one can lose document pairs that are similar according to human assessment but receive a low similarity score from the program. A modification of the algorithm is proposed in which the complete vocabulary is replaced with a vocabulary of specific terms. Adding thesauri when building a corpus vector model based on a ranking function has not been investigated previously; the use of thesauri has so far been studied only for improving topic models. The purpose of this work is to improve the quality of the solution by minimizing the loss of its significant part while not adding "false similar" pairs of documents. The improvement is achieved by using a vocabulary of specific terms, extracted from the text of the analyzed documents, when calculating the TF-IDF values for the corpus vector representation. The experiment was carried out on two corpora of structured normative and technical documents united by subject area: state standards related to information technology and to the field of railways. The glossary of specific terms was compiled by automatic analysis of the text of the documents under consideration, using rule-based NER methods. It was demonstrated that calculating TF-IDF over the terminology vocabulary gives more relevant results for the problem under study, which confirmed the hypothesis put forward. The proposed method is less dependent on shortcomings of the text layer (such as recognition errors) than calculating document proximity using the complete vocabulary of the corpus. We identified the factors that can affect the quality of the decision: the way the terminology vocabulary is compiled, the choice of the n-gram range for the vocabulary, the correctness of the wording of specific terms, and the validity of their inclusion in the glossary of the document. The findings can be used to solve applied problems related to the search for documents that are close in meaning, such as semantic search that takes the subject area into account, corporate search in multi-user mode, detection of hidden plagiarism, identification of contradictions in a collection of documents, and determination of novelty in documents when building a knowledge base.
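A minimal sketch of the core idea, assuming scikit-learn; the documents and the glossary of specific terms below are invented, and the real glossary would come from the rule-based NER step the abstract describes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "This standard defines information technology security requirements.",
    "Security requirements for information technology are specified here.",
    "Railway track maintenance schedules are described in this document.",
]
glossary = ["information technology", "security requirements", "railway track"]

# restrict TF-IDF features to the terminology vocabulary (here, 1- and 2-grams)
vec = TfidfVectorizer(vocabulary=glossary, ngram_range=(1, 2), lowercase=True)
sim = cosine_similarity(vec.fit_transform(docs))
print(sim.round(2))   # docs 0 and 1 score high; doc 2 stays near zero
```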
... Rather than relying on such curated lexical resources, which are not readily available for the majority of languages, we propose a method capable of improving embeddings by leveraging the more common resource of monolingual dictionaries. Lexical databases such as WordNet (Fellbaum, 1998) are often built from dictionary definitions, as was proposed earlier by Amsler (1980). We propose to bypass the process of explicitly building a lexical database (during which information is structured but also lost) and instead directly use its detailed source: dictionary definitions. ...
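One simple, hypothetical way to use definitions directly (not necessarily the cited paper's method) is to derive a word's vector from the pretrained vectors of its definition words; the toy embeddings below stand in for real pretrained ones.

```python
import numpy as np

# toy pretrained embeddings; real ones would come from e.g. word2vec or fastText
pretrained = {
    "feeling": np.array([0.9, 0.1]),
    "sadness": np.array([0.8, 0.3]),
    "dreary":  np.array([0.7, 0.2]),
}

def definition_embedding(definition: str, vectors: dict) -> np.ndarray:
    """Average the vectors of definition words that have an embedding."""
    words = [w for w in definition.lower().split() if w in vectors]
    if not words:
        raise ValueError("no known words in definition")
    return np.mean([vectors[w] for w in words], axis=0)

print(definition_embedding("a feeling of dreary sadness", pretrained))
# [0.8 0.2], the mean of the vectors for 'feeling', 'dreary', 'sadness'
```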
Article
Full-text available
Polysemy is the type of lexical ambiguity where a word has multiple distinct but related interpretations. In the past decade, it has been the subject of a great many studies across multiple disciplines including linguistics, psychology, neuroscience, and computational linguistics, which have made it increasingly clear that the complexity of polysemy precludes simple, universal answers, especially concerning the representation and processing of polysemous words. But fuelled by the growing availability of large, crowdsourced datasets providing substantial empirical evidence; improved behavioral methodology; and the development of contextualised language models capable of encoding the fine-grained meaning of a word within a given context, the literature on polysemy has recently developed more complex theoretical analyses. In this survey we discuss these recent contributions to the investigation of polysemy against the backdrop of a long legacy of research across multiple decades and disciplines. Our aim is to bring together different perspectives to achieve a more complete picture of the heterogeneity and complexity of the phenomenon of polysemy. Specifically, we highlight evidence supporting a range of hybrid models of the mental processing of polysemes. These hybrid models combine elements from different previous theoretical approaches to explain patterns and idiosyncrasies in the processing of polysemous words that the best-known models so far have failed to account for. Our literature review finds that i) traditional analyses of polysemy can be limited in their generalisability by loose definitions and selective materials; ii) linguistic tests provide useful evidence on individual cases, but fail to capture the full range of factors involved in the processing of polysemous sense extensions; and iii) recent behavioural (psycho)linguistic studies, large-scale annotation efforts, and investigations leveraging contextualised language models provide accumulating evidence suggesting that polysemous sense similarity covers a wide spectrum between identity of sense and homonymy-like unrelatedness of meaning. We hope that the interdisciplinary account of polysemy provided in this survey inspires further fundamental research on the nature of polysemy and better equips applied research to deal with the complexity surrounding the phenomenon, e.g. by enabling the development of benchmarks and testing paradigms for large language models informed by a greater portion of the rich evidence on the phenomenon currently available.
Book
Full-text available
This monograph, written in Tamil, has eleven chapters. The first ten chapters deal with the principles of the generative lexicon based on James Pustejovsky's (1995) book "The Generative Lexicon". The generative lexicon (shortly GL) presents a new and exciting theory of lexical semantics that addresses the problem of the "multiplicity of word meaning", that is, how we are able to give an infinite number of senses to words with finite means. As the first formally elaborated theory of a generative approach to word meaning, it lays the foundation for an implemented computational treatment of word meaning that connects explicitly to a compositional semantics. In contrast to the static view of word meaning (where each word is characterized by a predetermined number of word senses), which imposes a tremendous bottleneck on the performance capability of any natural language processing system, Pustejovsky proposes that the lexicon become an active and central component in the linguistic description. The essence of his theory is that the lexicon functions generatively, first by providing a rich and expressive vocabulary for characterizing lexical information; then, by developing a framework for manipulating fine-grained distinctions in word descriptions; and finally, by formalizing a set of mechanisms for the specialized composition of aspects of such descriptions of words, so that, as the words occur in context, extended and novel senses are generated. The second part of the book is a polysemantic study of Tamil verbs based on the principles of the generative lexicon. Taking into account the principles of the generative lexicon propounded by Pustejovsky, the semantic extension of meanings has been studied using data from Crea's Modern Tamil Dictionary (Ramakrishnan 2008). Some 28 verbs have been taken from Crea and their meanings tabulated by extracting the participants of the actions expressed by the verbs, such as subject, object, indirect object, location, etc., from the examples given for the meanings of each verbal entry. The participants are realized as noun phrases (NPs). The verbs acai1 ‘move’, acai2 ‘cause to move’, anjcu ‘be afraid of’, aTangku ‘subside’, aTakku ‘control’, aTi ‘beat’, aTai1 ‘get’ and aTai2 ‘confine’, aNi ‘wear’, aNai1 ‘be extinguished’ and aNai2 ‘extinguish’, aNai3 ‘embrace’, amai1 ‘be established’, amai2 ‘establish’, aaTu ‘move’, aaTTu ‘shake’, izu ‘pull’, iir ‘attract’, uTai ‘be broken’, uTai ‘break’, uuRu ‘secrete’, uuRRu ‘pour’, eTu ‘take’, eeRu ‘climb’, eeRRu ‘load’, ooTu ‘run’, ooTTu ‘cause to run’, kaTTu ‘construct’, kalai ‘become disorderly’, kalai ‘change the order’, ceer1 ‘join’, ceer ‘assemble’, taTTu ‘pat’, ndaTa ‘walk’ and maRRu ‘change’ have been selected from Crea. From the meanings of each verb, and taking into account the participants of the action (subject, object, indirect object, location, etc.) realized as NPs, the semantic features of the participants are studied, and the polysemy and semantic extension of the verbs are plotted and explained from the point of view of Pustejovsky.
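For readers unfamiliar with GL, a schematic sketch of a lexical entry with its qualia structure (the formal, constitutive, telic, and agentive roles of Pustejovsky 1995) might look as follows; the field values are illustrative and not taken from the monograph.

```python
from dataclasses import dataclass

@dataclass
class QualiaStructure:
    formal: str        # what kind of thing it is (its type_of category)
    constitutive: str  # what it is made of / its parts
    telic: str         # its purpose or function
    agentive: str      # how it comes into being

@dataclass
class LexicalEntry:
    lemma: str
    qualia: QualiaStructure

novel = LexicalEntry(
    lemma="novel",
    qualia=QualiaStructure(
        formal="book",
        constitutive="narrative text",
        telic="read",        # licenses 'begin a novel' = 'begin reading it'
        agentive="write",    # licenses 'finish a novel' = 'finish writing it'
    ),
)
print(novel.qualia.telic)
```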