Combined sources of low predictability

Source publication
Article
Full-text available
Recent literature has highlighted the extent to which inflectional paradigms are organised into systems of implications allowing speakers to make full use of the inflection system on the basis of exposure to only a few forms of each word. The present paper contributes to this line of research by investigating in detail the implicative structure of...

Context in source publication

Context 1
... have now identified the two main sources of unpredictability in the system. This is shown visually in Figure 5, where rows and columns have been sorted by both stress placement and number of contrasts in theme vowels, and areas where identified determinants of predictability have the same effect have been materialised by dashed lines. It is striking here that, as expected, we have very low entropy (lower than 0.1) for pairs of cells in the four areas where (i) there is no contrast in stress placement between predictor and predictee, and (ii) the predictor exhibits as many or more theme vowel contrasts than the predictee. ...
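The entropy values mentioned in this passage are conditional entropies between pairs of paradigm cells. As a minimal sketch, assuming each lexeme has already been reduced to a pair of categorical labels (a class for the predictor form and an alternation pattern relating predictor to predictee), the measure could be computed as below. The function name, the labels, and the toy data are hypothetical and are not taken from the source publication or from any existing tool.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Return H(Y | X) in bits for a list of (x, y) observations.

    x: hypothetical label for the class of the predictor form.
    y: hypothetical label for the pattern relating predictor and predictee.
    """
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)                      # counts of (x, y)
    marginal = Counter(x for x, _ in pairs)     # counts of x
    # H(Y|X) = -sum over (x, y) of p(x, y) * log2 p(y | x)
    return -sum((c / n) * log2(c / marginal[x]) for (x, _), c in joint.items())


# Toy data (invented): class "a" always selects pattern "p1",
# class "b" is split evenly between "p2" and "p3".
toy = [("a", "p1"), ("a", "p1"), ("b", "p2"), ("b", "p3")]
print(conditional_entropy(toy))  # 0.5
```

A value near zero means the predictee is almost fully determined by the predictor, which is the situation the passage describes for the four low-entropy areas.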

Similar publications

Article
Full-text available
This research aims to analyse the emergence of templates, articulatory routines used as a means of lexical expansion, in the linguistic development of a bilingual child (subject B) who speaks European Portuguese (EP) and French. To this end, we use the Complexity Paradigm (THELEN; SMITH, 1994; LARSEN-FREEMAN, 1997), since we understand...

Citations

... The first is that many of the lower frequency verbs are unknown to even highly-educated native speakers of Romanian, and hence probably do not really form part of the average acquired inflectional system of the language, despite the presence of these verbs (and many others) in dictionaries and grammars that are understandably aimed at exhaustivity. The second is that this number of verbs is closer to the average number of items analyzed with identical methods in the extant literature on other Romance languages (see Bonami et al., 2014, Pellegrini & Passarotti, 2018, Pellegrini & Cignarella, 2020, Beniamine et al., 2021, and Herce, 2023), which will enable us to draw more meaningful cross-linguistic comparisons. After the mentioned exclusions (which included nonstandard overabundant forms), all remaining verbs and forms were analyzed in Qumin (Quantitative Modelling of Inflection, Beniamine, 2018). ...
... Z2, Z8, Z9, Z10, and Z11) are the ones based on predictable morphological alternations like the one in Table 2). Notable commonalities can be identified between Romanian verbal inflection and that of the other major Romance languages analyzed with this same methodology to date (Beniamine, 2018, Pellegrini & Cignarella, 2020, Beniamine et al., 2021, and Herce, 2023). The number of interpredictability areas (14 in Romanian, vs 15 in Italian [and Latin], 14 in Spanish and French, and 12 in Portuguese), and their distribution (e.g. ...
... For a more detailed explanation of how the alternations are identified, for example when multiple descriptions are possible, see Beniamine et al. (2021). ...
Article
Full-text available
This paper presents VeLeRo, an inflected lexicon of Standard Romanian which contains the full paradigm of 7297 verbs in phonological form. We explain the process by which the resource was compiled, and how stress, diphthongs and hiatus, consonant palatalization, and other relevant issues were handled in phonemization. On the basis of the most token-frequent verbs in VeLeRo, we also perform a quantitative analysis of morphological predictability in Romanian verbs, whose complexity patterns are presented within the broader Romance context.
... forms rather than by trying to segment forms and isolate stems and exponents (see also Beniamine et al., 2021). On the basis of suprasegmental alternations, verb paradigms can be partitioned into four areas of full interpredictability, where knowledge of one cell of the area provides full knowledge of all the other cells of the area (i.e. ...
... We thus follow the way set by an increasing number of works on well-described (Beniamine et al., 2021) or under-described (Jacques et al., 2012; Snoek et al., 2014; Crysmann, 2016; Harrigan et al., 2017; Pellard ...
Article
Full-text available
Formal and computational linguistics can enhance descriptive linguistics of endangered languages by providing them with precise models and quantitative perspectives. We exemplify the benefits of such an approach with the case of Asama’s verb inflectional morphology. We show that a Word-and-Paradigm framework can provide interesting insights and allow for both the identification and the quantification of the sources of uncertainty in the implicative relations within Asama’s verb paradigms. We describe Asama’s verb morphology by considering whole forms rather than exponents only, and we factor its alternation patterns in two types: segmental alternations and suprasegmental alternations. Measures of Shannon’s conditional entropy are then used to estimate the respective contributions of these factors to the complexity of the system. Suprasegmental alternations turn out to be the major source of uncertainty in implicative relations, and vowel length and tone alternations cannot be treated separately but strongly interact. We also show how the principal parts of the inflectional system can be determined with conditional entropy measures of n-ary implicative relations.
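The last point of this abstract, determining principal parts from conditional entropies of n-ary implicative relations, can be illustrated with a small sketch. It assumes that each lexeme is represented by one categorical label per paradigm cell and searches for the smallest set of cells that leaves zero entropy about every other cell. The cell names, labels, and both function names are hypothetical; this is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations
from math import log2


def cond_entropy(lexicon, predictors, target):
    """H(target | predictors) in bits, with predictors a sequence of cells."""
    n = len(lexicon)
    joint = Counter((tuple(lex[c] for c in predictors), lex[target]) for lex in lexicon)
    marginal = Counter(tuple(lex[c] for c in predictors) for lex in lexicon)
    return -sum((k / n) * log2(k / marginal[x]) for (x, _), k in joint.items())


def principal_parts(lexicon, cells, tol=1e-12):
    """Smallest set of cells from which every other cell is fully predictable."""
    for size in range(1, len(cells) + 1):
        for subset in combinations(cells, size):
            rest = [c for c in cells if c not in subset]
            if all(cond_entropy(lexicon, subset, c) <= tol for c in rest):
                return subset
    return tuple(cells)


# Toy lexicon (invented): cell C depends on the combination of A and B,
# so neither A nor B alone predicts it, but the two together do.
toy_lexicon = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c2"},
    {"A": "a2", "B": "b1", "C": "c2"},
    {"A": "a2", "B": "b2", "C": "c1"},
]
print(cond_entropy(toy_lexicon, ["A"], "C"))       # 1.0
print(principal_parts(toy_lexicon, ["A", "B", "C"]))  # ('A', 'B')
```

The toy case is deliberately built so that two factors interact: unary entropies stay high while the binary relation is fully predictive. The exhaustive search over subsets is only practical for small paradigms.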
... Maiden 2018). Regarding inflected lexicons in phonological rather than orthographic form, we have both family-wide but low-resolution ones (see ODRVM, Maiden et al. 2010), as well as high-resolution ones for some of the major languages (namely French [Bonami et al. 2014], Latin [Pellegrini & Passarotti 2018], Italian [Pellegrini & Cignarella 2020], and Portuguese [Beniamine et al. 2021]). There is, however, maybe surprisingly, no comparable resource for Spanish, and hence also no analogous quantitative analysis of morphological predictability in Spanish verbs. ...
... Zone 3, i.e. the 1SG.PRS.IND is, as Table 4 shows, the least informative cell (see also Table 5). This is also clearly the case in Portuguese (see Beniamine et al. 2021). Also shared with the other national standard Ibero-Romance language (and beyond in this case) is the fact that the rhizotonic cells of the present (the so-called N-morphome, i.e. domains Z1, Z3, Z4 and Z5) are the least predictable from other cells in the paradigm. ...
... This is also clearly the case in Portuguese (see Beniamine et al. 2021). Also shared with the other national standard Ibero-Romance language (and beyond in this case) is the fact that the rhizotonic cells of the present (the so-called N-morphome, i.e. domains Z1, Z3, Z4 and Z5) are the least predictable from other cells in the paradigm. In Spanish, this is largely the result of unpredictable stem-vowel diphthongizations (e.g. ...
Preprint
Full-text available
This paper presents VeLeSpa, a verbal lexicon of Peninsular Spanish, which contains the full paradigms (all 63 cells) in phonological form of 6553 verbs, along with their corresponding frequencies. In this paper, the process and decisions involved in the building of the resource are presented. In addition, based on the most frequent 3000+ verbs, a quantitative analysis is conducted of morphological predictability in Spanish verbal inflection. The results and their drivers are discussed, as well as observed differences with other Romance languages and Latin.
... A final observation with respect to the cognitive-paradigmatic domains identified here through morphological innovations in stems is that they are remarkably close to the domains of interpredictability identified through conditional entropies in individual Romance languages (e.g. Pellegrini & Cignarella, 2020, Beniamine et al., 2021) and also in Latin (Pellegrini, 2020), looking at whole word forms. Suffixal and stem allomorphy, in fact, seem to behave alike in many instances (see Tables 21 vs 22), which casts doubt on the usefulness and empirical basis of segmentation in at least some cases. [Table 21: Interpredictability areas based on suffixal allomorphy in the Spanish PRS.IND. Table 22: Interpredictability areas based on stem allomorphy in the Spanish PRS.IND.] ...
Article
Full-text available
Morphologists of different backgrounds disagree with respect to the degree of autonomy of the morphological component of language from syntax and semantics. A precise and objective quantification of the diachronic productivity of Romance morphomes is the piece of evidence most crucially missing from this debate. On the basis of 502 morphophonological innovations associated with the loss of stem-final consonants across 63 Romance varieties, this paper quantifies the degree of productivity of different morphomes (the N pattern is found to be the most productive one) and of morphomic templates generally (15% of novel stem alternations are found to abide by them). Although a strong attraction effect is detectable for morphomes, the numbers suggest that the morphological autonomy and longevity of stem alternations in the family might have been somewhat overstated. For an optimal account of the morphological innovations observed, reference to inherited morphomic structure, semantic structure, and to frequency of use are needed in similar proportions.
Article
Full-text available
The process by which awareness and/or knowledge of linguistic categories arises from exposure to patterns in data alone, known as emergence, is the cornerstone of usage-based approaches to language. The present paper zooms in on the types of patterns that language users may detect in the input to determine the content, and hence the nature, of the hypothesised morphological category of aspect. The large-scale corpus and computational studies we present focus on the morphological encoding of temporal information as exemplified by aspect (imperfective/perfective) in Polish. Aspect is so heavily grammaticalized that it is marked on every verb form, yielding the practice of positing infinitival verb pairs (‘do’ = ‘robić(impf)/zrobić(pf)’) to represent a complete aspectual paradigm. As has been shown for nominal declension, however, aspectual usage appears uneven, with 90% of verbs strongly preferring one aspect over the other. This makes the theoretical aspectual paradigm in practice very gappy, triggering an acute sense of partialness in usage. Operationalising emergence as learnability, we simulate learning to use aspect from exposure with a computational implementation of the Rescorla-Wagner rule of associative learning. We find that paradigmatic gappiness in usage does not diminish learnability; to the contrary, a very high prediction accuracy is achieved using as cues only the verb and its tense; contextual information does not further improve performance. Aspect emerges as a strongly lexical phenomenon. Hence, the question of cognitive reality of aspectual categories, as an example of morphological categories in general, should be reformulated to ask which continuous cues must be learned to enable categorisation of aspectual outcomes. We discuss how the gappiness of the paradigm plays a crucial role in this process, and how an iteratively learned, continuously developing association presents a possible mechanism by which language users process their experience of cue-outcome co-occurrences and learn to use morphological forms, without the need for abstractions.
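The Rescorla-Wagner rule named in this abstract updates cue-outcome association weights in proportion to prediction error. The following is a minimal sketch of that rule with the verb and its tense as the only cues, as the abstract describes. The cue coding, the learning rate, the toy events, and the function names are all assumptions made for illustration; the study's actual implementation and parameters are not given here.

```python
from collections import defaultdict


def rescorla_wagner(events, outcomes, rate=0.1, lam=1.0):
    """Learn cue-outcome weights from (cues, observed_outcome) events.

    rate stands in for the product of the usual salience parameters,
    lam for the maximum association strength (both assumed values).
    """
    weights = defaultdict(float)
    for cues, observed in events:
        for o in outcomes:
            total = sum(weights[(c, o)] for c in cues)   # current prediction
            target = lam if o == observed else 0.0
            delta = rate * (target - total)              # shared error term
            for c in cues:
                weights[(c, o)] += delta
    return weights


def predict(weights, cues, outcomes):
    """Pick the outcome with the highest summed activation."""
    return max(outcomes, key=lambda o: sum(weights[(c, o)] for c in cues))


# Toy learning events (invented): each verb occurs in only one aspect.
OUTCOMES = ["impf", "pf"]
toy_events = [
    (("verb:robić", "tense:present"), "impf"),
    (("verb:zrobić", "tense:past"), "pf"),
    (("verb:robić", "tense:past"), "impf"),
] * 100
w = rescorla_wagner(toy_events, OUTCOMES)
print(predict(w, ("verb:robić", "tense:past"), OUTCOMES))
```

In this toy setting the tense cue is ambiguous on its own, and the verb cue carries the decision, which loosely mirrors the abstract's description of aspect as a strongly lexical phenomenon.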
Article
Form predictability has long been known to influence speaker behaviour in language learning and use. However, this observation has largely remained dissociated from the question of the most apt theoretical framing of the effects observed. We set out to seek evidence that speakers’ relationship to form predictability is best characterised in paradigmatic terms: in an experimental task comparable to prediction of one word form from a related one, speakers appear sensitive to the probabilistic, implicative relations that make up a morphological paradigm. We find this effect to be omnidirectional, from any paradigm cell to any paradigm cell. Form predictability does not impact speaker behaviour in a vacuum, but instead works together with aspects of memory and learning to organise the mental lexicon and inform language use. In a corpus study, we map out the complex relationships that exist between paradigmatic form predictability, lexeme frequency and cell frequency in the context of naturalistic language use. Speakers appear to exploit all available probabilistic relationships between the word forms of a language in a way that is predicted by Word and Paradigm theories of morphology, with memory and predictive processing playing a mediating role in all aspects of language use.
Article
Analogy has returned to prominence in the field of inflectional morphology as a basis for new explanations of inflectional productivity. Here we review the rising profile of analogy, identifying key theoretical and methodological developments, areas of success, and priorities for future work. In morphological theory, work within so-called abstractive approaches places analogy at the center of productive processes, though significant conceptual and technical details remain to be settled. The computational modeling of inflectional analogy has a rich and diverse history, and attention is now increasingly directed to understanding inflectional systems through their internal complexity and cross-linguistic diversity. A tension exists between the prima facie promise of analogy to lead to new explanations and its relative lack of theoretical articulation. We bring this to light as we examine questions regarding inflectional defectiveness and whether analogy is reducible to grammar optimization resulting from simplicity biases in learning and language use. Expected final online publication date for the Annual Review of Linguistics, Volume 10 is January 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Article
Within “word-based”, “paradigm-based” or “abstractive” models of inflectional systems (Blevins 2006, 2016), only full inflected wordforms are considered primitives; subword strings are treated not as distinct entities, but as abstract generalisations inferred by speakers across multiple inflected forms. These models stand in contrast to “constructive” approaches, which proceed from individual, distinct subword units to full words. An argument consistently adduced in favour of abstractive approaches is that they afford a descriptive advantage regarding “fusional” systems characterised by pervasive non-canonical exponence, such as cumulative exponence, extended exponence, and morphomic structure (Stump 2016: 17–30). Via an exploration of inflectional phenomena including non-canonical exponence and arbitrary distributional regularities in the verb inflection of standard Swahili, a language usually described as exemplifying “agglutinating” inflection and amenable to constructive, morpheme-based, analyses, this paper will argue that abstractive systems are equally applicable to “agglutinating” inflection, offering greater empirical plausibility and in some cases descriptive advantage.