
The CELEX Lexical Database (Release 2) [CD-ROM]


Abstract

This corpus contains ASCII versions of the CELEX lexical databases of English (Version 2.5), Dutch (Version 3.1) and German (Version 2.0). CELEX was developed as a joint enterprise of the University of Nijmegen, the Institute for Dutch Lexicology in Leiden, the Max Planck Institute for Psycholinguistics in Nijmegen, and the Institute for Perception Research in Eindhoven. Pre-mastering and production were done by the LDC.

... The syllable stimuli consisted of high- and low-frequency syllables from the CELEX English lexical database (Baayen, Piepenbrock & Gulikers, 1995). The database was organised by 'frequency per million', with selection beginning from the most frequent syllable. ...
... Each list was repeated and concatenated with a 700ms gap between each syllable to form the predetermined scripts. This resulted in a running … (Baayen et al., 1995). ...
Thesis
Full-text available
Over the course of an infant’s first steps into their native language, they must learn and isolate new words whilst having very little knowledge of their language to support them. One proposed solution to this word-segmentation problem is the bottom-up, statistical learning mechanism, which tracks the transitional probabilities between syllables to discover initial word forms. In the present study, we explored statistical learning at the onset of word-segmentation and tested 5-month-old British English-learning infants’ recognition of frequent and rare English syllables using the head-turn preference procedure. We found the participants failed to demonstrate a significant preference for either set of syllables. Subsequently, no evidence was found to support the statistical learning hypothesis with 5-month-olds.
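The statistical-learning mechanism this thesis tests can be sketched in a few lines: the forward transitional probability of syllable B given syllable A is freq(A, B) / freq(A). The toy syllable stream below is invented for illustration and is not from the study.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Forward transitional probability P(B | A) = freq(A, B) / freq(A),
    computed over a flat stream of syllables."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Invented toy stream containing two "words" (ba-by, do-ggy):
stream = ["ba", "by", "do", "ggy", "ba", "by", "ba", "by", "do", "ggy"]
tps = transitional_probabilities(stream)
# Within-word transitions (ba -> by) come out higher than
# across-word transitions (by -> do), which is the segmentation cue.
```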
... Under this perspective, meaningful content categories (mainly verbs, nouns, and adjectives (Diaz and McCarthy 2009), but also adverbs and other categories) are contrasted with grammatical function categories (Corver and van Riemsdijk 2001). Articles, conjunctions, pronouns, prepositions, and auxiliary verbs, among other linguistic units, are usually considered function words (Baayen et al. 1995; Chung and Pennebaker 2007). Although healthy adult speakers of a language have between 20,000 and 100,000 words, only approximately 400 in English are function words (Baayen et al. 1995; Baddeley 1997). ...
... Being very frequent, function words tend to be short, following Zipf's law of abbreviation, and are often pronounced quickly, both in spontaneous speech and in reading (Chung and Pennebaker 2007), but not always (Bell et al. 2009). ...
Article
Full-text available
In the realm of linguistics, the concept of “semanticity” was recently introduced as a novel measure designed to study linguistic networks. In a given text, semanticity is defined as the ratio of the potential number of meanings associated with a word to the number of different words with which it is linguistically linked. This concept provides a quantitative indicator that reflects a word’s semantic complexity and its role in a language. In this pilot study, we applied the semanticity measure to the Catalan language, aiming to investigate its effectiveness in automatically distinguishing content words from function words. For this purpose, the measure of semanticity has been applied to a large corpus of texts written in Catalan. We show that the semanticity of words allows us to classify the word classes existing in Catalan in a simple way so that both the semantic and syntactic capacity of each word within a language can be integrated under this parameter. By means of this semanticity measure, it has been observed that adverbs behave like function words in Catalan. This approach offers a quantitative and objective tool for researchers and linguists to gain insights into the structure and dynamics of languages, contributing to a deeper understanding of their underlying principles. The application of semanticity to Catalan is a promising pilot study, with potential applications in other languages, which will allow progress to be made in the field of theoretical linguistics and contribute to the development of automated linguistic tools.
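As defined in this abstract, a word's semanticity is the ratio of its number of potential meanings to the number of distinct words it is linguistically linked with. A minimal sketch of the ratio follows; the sense count and the Catalan neighbour words are hypothetical examples, not data from the study.

```python
def semanticity(n_meanings, neighbors):
    """Semanticity of a word: potential meanings divided by the number
    of distinct words it is linguistically linked with."""
    distinct = set(neighbors)
    if not distinct:
        raise ValueError("word has no linguistic links")
    return n_meanings / len(distinct)

# Hypothetical example: a word with 4 dictionary senses linked to
# 7 distinct neighbouring words (one neighbour repeats in the list).
s = semanticity(4, ["casa", "gran", "vella", "nova",
                    "meva", "blanca", "gran", "petita"])
```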
... For the present section, we used two datasets: Small dataset: A subset of 2,646 singular and plural Dutch nouns and verbs (for which frequency was at least 1) taken from a dataset originally extracted from the Dutch CELEX database (Baayen et al., 1995) by Ernestus and Baayen (2003). These words have monomorphemic stems ending in an obstruent that is realized as voiceless when word-final but that in syllable onset appears with voicing in some words and without voicing in others. ...
... To access all child-directed speech, we utilized the childesr package (Sanchez et al., 2019) to extract the data of all utterances not made by the target children themselves, resulting in 189,772 and 373,606 tokens, respectively. Of these, we kept all tokens for which pronunciations were available in CELEX (Baayen et al., 1995) and for which we could obtain word embeddings in 300-dimensional Wikipedia2Vec word embeddings (Yamada et al., 2020). This resulted in 162,443 learning events (3,865 unique word tokens) for Sarah and 326,518 learning events (7,433 unique word tokens) for Lily. ...
Article
Full-text available
Word frequency is a strong predictor in most lexical processing tasks. Thus, any model of word recognition needs to account for how word frequency effects arise. The Discriminative Lexicon Model (DLM) models lexical processing with mappings between words' forms and their meanings. Comprehension and production are modeled via linear mappings between the two domains. So far, the mappings within the model can either be obtained incrementally via error-driven learning, a computationally expensive process able to capture frequency effects, or in an efficient, but frequency-agnostic solution modeling the theoretical endstate of learning (EL) where all words are learned optimally. In the present study we show how an efficient, yet frequency-informed mapping between form and meaning can be obtained (Frequency-informed learning; FIL). We find that FIL well approximates an incremental solution while being computationally much cheaper. FIL shows a relatively low type- and high token-accuracy, demonstrating that the model is able to process most word tokens encountered by speakers in daily life correctly. We use FIL to model reaction times in the Dutch Lexicon Project by means of a Gaussian Location Scale Model and find that FIL predicts well the S-shaped relationship between frequency and the mean of reaction times but underestimates the variance of reaction times for low frequency words. FIL is also better able to account for priming effects in an auditory lexical decision task in Mandarin Chinese, compared to EL. Finally, we used ordered data from CHILDES to compare mappings obtained with FIL and incremental learning. We show that the mappings are highly correlated, but that with FIL some nuances based on word ordering effects are lost. Our results show how frequency effects in a learning model can be simulated efficiently, and raise questions about how to best account for low-frequency words in cognitive models.
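The contrast the abstract draws between a frequency-agnostic endstate mapping (EL) and a frequency-informed one (FIL) can be illustrated as two least-squares problems over form and meaning matrices. Whether FIL is computed exactly as a frequency-weighted least-squares solution is an assumption here, and the random matrices below are stand-ins for the model's actual form and meaning vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_form, n_sem = 50, 20, 10

C = rng.normal(size=(n_words, n_form))   # form vectors (stand-ins)
S = rng.normal(size=(n_words, n_sem))    # meaning vectors (stand-ins)
freq = rng.integers(1, 1000, size=n_words).astype(float)  # token frequencies

# Endstate of learning (EL): frequency-agnostic least squares.
F_el = np.linalg.lstsq(C, S, rcond=None)[0]

# Frequency-informed variant: weight each word's row by its token
# frequency, i.e. least squares on sqrt(freq)-scaled rows, so frequent
# words dominate the mapping while the solution stays closed-form.
w = np.sqrt(freq)[:, None]
F_fil = np.linalg.lstsq(C * w, S * w, rcond=None)[0]
```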
... Duration and pause We measure the duration of words and pauses separately. Word durations are computed from word-level text-speech alignment boundaries, and are normalized by a word's syllable count, obtained from the CELEX dataset (Baayen et al., 1995), to yield a scalar capturing the average syllable duration. In the case that a following pause exists, we measure the interval of silence between a word's offset and the next word's onset in our text-audio alignments. ...
... This is motivated by the fact that English pitch accents anchor to the stressed syllable, and the same underlying pitch accent may be realized with phonologically irrelevant regions of flat pitch in polysyllabic words (Pierrehumbert, 1980). We identify the stressed syllable using our phoneme-level alignments and stress patterns from CELEX (Baayen et al., 1995) and extract up to 250ms of audio signal (less if the interval to word onset or offset is shorter, which is often the case) on either side of the stressed syllable's midpoint. Instead of working with the raw f0 curves, we translate them into a low-dimensional space. ...
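The two measurement steps quoted above (syllable-normalized word duration, and a bounded window around the stressed syllable's midpoint) reduce to simple arithmetic on alignment boundaries. The function names below are invented for illustration; times are in seconds and `sr` is the sample rate.

```python
def average_syllable_duration(word_onset, word_offset, n_syllables):
    """Word duration from alignment boundaries, normalized by the
    word's syllable count, yielding an average syllable duration."""
    return (word_offset - word_onset) / n_syllables

def stress_window(samples, sr, word_onset, word_offset, stress_midpoint,
                  max_half=0.250):
    """Up to 250 ms of signal on either side of the stressed syllable's
    midpoint, clipped to the word's own boundaries."""
    lo = max(word_onset, stress_midpoint - max_half)
    hi = min(word_offset, stress_midpoint + max_half)
    return samples[round(lo * sr):round(hi * sr)]
```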
... The second set of modules is a rule-based system that converts the graphemes of each poem into phonetic characters, divides words into stressed/unstressed syllables, and computes rhyming schemes at line and stanza level. To this end it uses grapheme-to-phoneme translations made available by different sources, amounting to some 500K entries, including the CMU dictionary, the MRC Psycholinguistic Database, and the CELEX database [8], plus a proprietary database of some 20,000 entries. Out-of-vocabulary words are processed by means of a prosodic parser implemented in a previous project [9] containing a large pronunciation dictionary which covers approximately 170,000 entries. ...
... In the Semantic Relational View, we are using dark colours for Concrete referents vs Abstract ones with lighter colours; dark colours also for Negatively marked words as opposed to Positively marked ones with lighter colours. The same strategy applies to other poetic maps: this technique certainly has the virtue of highlighting opposing differences at some level of abstraction. With this experiment I intend to verify the number of poems in Webb's corpus in which it is possible to establish a relationship between semantic content in terms of negative vs positive sense (usually referred to with one word as "the sentiment") and the sound produced by syllables, in particular stressed ones. ...
Preprint
Full-text available
We assume that poetic devices have an implicit goal: producing an overall sound scheme that will induce the reader to associate the intended and expressed meaning with the sound of the poem. Sounds may be organized into categories and assigned presumed meaning as suggested by traditional literary studies. In my work, I have automatically extracted the sound grids of all the sonnets by William Shakespeare and have combined them with the themes expressed by their contents. In a first experiment I computed lexically and semantically based sentiment analysis, obtaining 80% agreement. In a second experiment sentiment analysis was replaced by Appraisal Theory, yielding a more fine-grained interpretation which in some cases contradicts the first one. The computation for the second poet, regarded by many critics as the best of the last century, includes both vowels and consonants. In addition, it combines automatic semantically and lexically based sentiment analysis with sound grids. The results produce visual maps that clearly separate the poems into three clusters: negative harmony, positive harmony, and disharmony, where the latter instantiates the poet's need to encompass the opposites in a desperate attempt to reconcile them.
... To ensure the truncated words in each pair were segmentally the same and differed only suprasegmentally, the first syllable of all words always contained a full vowel. Mean log word frequencies in the CELEX lexical database of English (Baayen et al. 1995), as reported by Cooper et al. (2002), were 2.18 for first-syllable stress words and 1.88 for second-syllable stress words. Each word was truncated at the end of the first syllable and had been recorded twice, resulting in a total of 84 spoken word fragments, each presented twice (making 168 trials). ...
... First syllables in each pair were segmentally the same but suprasegmentally different, in that one word in each pair had primary stress on the first syllable (e.g., GIEter "watering can"), while primary stress for the other word fell on the second syllable (e.g., giTAAR "guitar"). Mean log word frequencies in the CELEX lexical database of Dutch (Baayen et al., 1995) were 0.79 for first-syllable stress words and 0.60 for second-syllable stress words. Each word was recorded; Table 4 reports values averaged across all fragments with the same stress type. ...
Article
Full-text available
Two languages, historically related, both have lexical stress, with word stress distinctions signalled in each by the same suprasegmental cues. In each language, words can overlap segmentally but differ in placement of primary versus secondary stress ( OCtopus , ocTOber ). However, secondary stress occurs more often in the words of one language, Dutch, than in the other, English, and largely because of this, Dutch listeners find it helpful to use suprasegmental stress cues when recognising spoken words. English listeners, in contrast, do not; indeed, Dutch listeners can outdo English listeners in correctly identifying the source words of English word fragments ( oc -). Here we show that Dutch-native listeners who reside in an English-speaking environment and have become dominant in English, though still maintaining their use of these stress cues in their L1, ignore the same cues in their L2 English, performing as poorly in the fragment identification task as the L1 English do.
... The percentage criterion was hypothesized to be ideal in minimizing the impact of a disproportionately high number of lexical items produced by few speakers for adjectives and adverbs. Unlike content words, function words have a relatively small pool of lexical items and their frequency of occurrence is very high (Baayen et al., 1995; Chung & Pennebaker, 2007). This nature of function words results in an increased likelihood of speakers producing the same function words. ...
... Contrary to content words, function words by nature consist of a finite number of items. Of the 100,000-word productive vocabulary that English speakers have, function words account for less than 0.4% of this total (Baayen et al., 1995; Chung & Pennebaker, 2007), and the top 10 most commonly used function words alone occupy 20% of the words produced in natural text samples (Chung & Pennebaker, 2007). In our data set, function words were the greatest number of words produced (e.g., 70,918 for Picnic), but the fewest number of different words produced (e.g., 118 for Picnic). ...
Article
Purpose Core lexicon measures have received growing attention in research. They are intended to provide clinicians with a clinician-friendly means to quantify word retrieval ability in discourse based on normal expectations of discourse production for specific discourse elicitation tasks. To date, different criteria have been used by groups of researchers to develop core lexicon measures. The need for statistical guidance in pursuit of a psychologically robust measure has been recognized. Aims This study aimed to investigate the best criterion for accurate measurement. Specifically, we focused on two criteria (frequency vs. percentage) that have previously been used for the development of core lexicon measures. Method Core lexicon measures consisting of five different checklists by word class (verbs, nouns, adjectives, adverbs, and function words) and developed by the two criteria were applied to language samples produced by 470 cognitively healthy adults. Performance in word retrieval ability at the discourse level was modeled as a latent variable based on the observed proportions of the production of core lexicon items in two different sets of core lexicon measures using structural equation modeling. Results Results indicated that both criteria for core lexicon measures capture word retrieval ability in discourse. Greater residual variances were found in the core lexicon measure established by the percentage criterion compared to the one established by the frequency criterion. This indicates that the measure based on the percentage criterion is more affected by measurement errors. Conclusions The findings provide evidence that the frequency criterion is better to use for the development of core lexicon measures for core nouns, verbs, adjectives, and adverbs, but not for function words. However, our findings are limited to core lexicon measures based on language samples elicited by wordless picture books.
This may not be easily applied to other core lexicon measures that use different discourse elicitation tasks due to the difference in quality and quantity of language samples. Ideally, the same approach should be replicated to evaluate the appropriateness of respective criteria in the development of core lexicon measures. Supplemental Material https://doi.org/10.23641/asha.20304144
... Participants provided written informed consent in accordance with Rutgers University Institutional Review Board protocol. Numbers of syllables and Levenshtein distances were computed using the quanteda and vwr packages in R, respectively (Benoit et al. 2018; Keuleers 2013), and bigram frequencies were obtained from the McWord online database of calculations based on CELEX (Baayen et al. 1995; Medler and Binder 2005). See Table 1 for a summary of word characteristics. ...
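The excerpt above computes Levenshtein distances with the R vwr package; for readers unfamiliar with the measure, here is a minimal pure-Python sketch of the standard dynamic-programming definition (not the vwr implementation).

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))          # distances from "" prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]                           # deleting all i chars of a[:i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution / match
        prev = cur
    return prev[-1]
```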
Article
Full-text available
To determine how language is implemented in the brain, it is important to know which brain areas are primarily engaged in language processing and which are not. Existing protocols for localizing language are typically univariate, treating each small unit of brain volume as independent. One prominent example that focuses on the overall language network in functional magnetic resonance imaging (fMRI) uses a contrast between neural responses to sentences and sets of pseudowords (pronounceable nonwords). This contrast reliably activates peri-sylvian language areas but is less sensitive to extra-sylvian areas that are also known to support aspects of language such as word meanings (semantics). In this study, we assess areas where a multivariate, pattern-based approach shows high reproducibility across multiple measurements and participants, identifying these areas as multivariate regions of interest (mROI). We then perform a representational similarity analysis (RSA) of an fMRI dataset where participants made familiarity judgments on written words. We also compare those results to univariate regions of interest (uROI) taken from previous sentences > pseudowords contrasts. RSA with word stimuli defined in terms of their semantic distance showed greater correspondence with neural patterns in mROI than uROI. This was confirmed in two independent datasets, one involving single-word recognition, and the other focused on the meaning of noun-noun phrases by contrasting meaningful phrases > pseudowords. In all cases, areas of spatial overlap between mROI and uROI showed the greatest neural association. This suggests that ROIs defined in terms of multivariate reproducibility can help localize components of language such as semantics. The multivariate approach can also be extended to focus on other aspects of language such as phonology, and can be used along with the univariate approach for inclusively mapping language cortex.
... Thirty pairs of disyllabic and trisyllabic critical target and competitor words were used as the experimental words, equally distributed in vowel-reduction and non-vowel-reduction conditions (see Appendix A). Fifty-one words in the critical trials were selected from Connell et al. (2018), and the remaining words in the critical trials were selected from the CELEX lexical database (Baayen et al. 1996). The first syllable of the target words was stressed with a full vowel in both the vowel-reduction and non-vowel-reduction conditions, but the first syllable of the competitor words was unstressed without vowel reduction in the non-vowel-reduction condition and with vowel reduction in the vowel-reduction condition. ...
Article
Full-text available
The relative weighting of f0 and vowel reduction in English spoken word recognition at the sentence level was investigated in a two-alternative forced-choice word identification experiment. In the experiment, an H* pitch-accented or a deaccented word fragment (e.g., AR- in the word archive) was presented at the end of a carrier sentence for identification. The results revealed differences in the cue weighting of English lexical stress perception between native and non-native listeners. For native English listeners, vowel quality was a more prominent cue than f0, while native Mandarin Chinese listeners employed both vowel quality and f0 in a comparable fashion. These results suggest that (a) vowel reduction is superior to f0 in signaling word-initial stress and (b) f0 facilitates the recognition of word-initial stress, which is modulated by the first language.
... Derivational families exclude compounds and multi-word expressions, as do word formation networks (Lango, Žabokrtský & Ševčíková, 2021), even though the latter term suggests a broader view on word formation mechanisms. A range of different computational lexicons grouping vocabulary based on the principles of derivational families or word formation networks are described in Lango et al. (2021), for instance, CELEX, which exists for German (Baayen, Piepenbrock & Gulikers, 1996) and a few other languages; DerIvaTario for Italian (Talamo, Celata & Bertinetto, 2016); and Derinet for Czech (Žabokrtský et al., 2016). ...
Article
Full-text available
The article introduces a novel lexical resource for Swedish based on word family principles. The development of the Swedish Word Family (SweWF) resource is set into the context of linguistic complexity in second language acquisition. The SweWF is particularly appropriate for that, given that it contains lexical items used in second language corpora, namely, in a corpus of coursebook texts, and in a corpus of learner essays. The main focus of the article is on the construction of the resource with its user interface and on its applicability for research, although it also opens vast possibilities for practical applications for language learning, testing and assessment. We demonstrate the value of the resource through several case studies.
... The mean overlap of onset competitors with the critical word in the spoken sentence was 3.8 phonemes for the L1, and 3.5 phonemes for the L2 version. Mean word frequency of these competitors based on the CELEX lexical database (Baayen et al., 1995) was 10.3 and 16.7 per million words, in the L1 and L2 materials respectively. Rhyme competitors had a mean overlap of 3.2 phonemes in the L1, and 3.5 phonemes in the L2 materials, and a mean CELEX word frequency of 51.8 and 20.2 per million words, respectively. ...
Article
Full-text available
To comprehend speech, listeners must resolve competition between potential candidate words. In second-language (L2) listening such competition may be inflated by spurious activation; the onsets of “reggae” and “legacy” may both activate “leg” for Japanese listeners, or the rhymes of “adapt” and “adept” may activate “apt” for Dutch listeners, while only one in each pair triggers competition for L1 listeners. Using eyetracking with L2-dominant bilingual emigrants, we directly compared within-language L1 and L2 lexical activation and competition in the same individuals. For these listeners, activation patterns did not differ across languages. Unexpectedly, however, we observed onset competition in both languages but rhyme competition in the L2 only (although the same stimuli elicited rhyme competition for control listeners in both languages). This suggests that L1 rhyme competition may disappear after long-time immersion in an L2 environment.
... The different stress positions in the target words were assigned such that they follow typical stress patterns of German words. In the CELEX database [3], for instance, the majority of German bisyllabic and trisyllabic words are stressed on the first syllable, followed by words with stress on the second syllable. Stress on the third syllable is also possible, but this pattern is much less frequent. ...
... We selected English and Spanish words from the CELEX English Corpus [81] and the LEX-ESP database [82] and used the N-Watch [83] and BuscaPalabras [84] programs, respectively, to compute and control for psycholinguistic indices. Within and between lists and languages, words were matched for estimated frequency. Within the blocks, words were presented in a pseudo-random order, with the restriction that no more than three items from the same font type appeared consecutively. ...
Article
Full-text available
Nowadays, use of a second language (L2) has taken a central role in daily activities. There are numerous contexts in which people have to process information, acquire new knowledge, or make decisions via a second language. For example, in academia and higher education, English is commonly used as the language of instruction and communication even though English might not be students’ native or first language (L1) and they might not be proficient in it. Such students may face different challenges when studying and learning in L2 relative to contexts in which they study and learn in their L1, and this may affect their metamemory strategies. However, little is yet known about whether metamemory processes undergo significant changes when learning is carried out in L2. The aim of the present study was to investigate the possible consequences on learning derived from studying materials in L2 and, more specifically, to explore whether the interplay between monitoring and control (metamemory processes) changes as a function of the language involved. In three experiments, we explored whether font type (Experiment 1), concreteness (Experiment 2), and relatedness (Experiment 3) affected judgments of learning (JOLs) and memory performance in both L1 and L2. JOLs are considered the result of metacognitive strategies involved in the monitoring of learning and have been reported to vary with the difficulty of the material. The results of this study showed that people were able to monitor their learning in both L1 and L2, even though they judged L2 learning as more difficult than L1. Interestingly, self-perceived difficulty did not hinder learning, and people recognized L2 materials as well or better than L1 materials. We suggest that this might be an example of a desirable difficulty for memory.
... See the XLE-Web Interface, which features a number of different computational LFG grammars that can be used interactively: https://clarino.uib.no/iness/xle-web. In principle, this step can be automated as well with the help of MAUS and CELEX [24], but as this paper is concerned with automatic speech understanding, the focus lies on the intersection between signal and grammar. ...
Conference Paper
Full-text available
This paper introduces a new, computationally implemented end-to-end system for German that takes a speech signal as input, interprets the phonetic data in phonological/prosodic terms, and makes the results available to a linguistically deep computational grammar. The grammar uses the provided information to disambiguate syntactically ambiguous structures, thus reducing overgeneration. A system evaluation showed promising results for this new combination of automatic speech signal analysis and computational grammars, which is a significant step towards a fine-grained linguistic analysis including all grammar modules and hence towards real automatic speech understanding.
... English: We experiment with the English past tense data from A&H, following both K&C and C&al. For training, we split the CELEX (Baayen et al., 1996) subset produced by A&H, consisting of 4253 verbs (218 irregular), into a random 80/10/10 train/dev/test split following K&C. We ensure that 10% of the irregular verbs are in each of the development and test sets. The English nonce words from A&H, used for computing the correlation of model ratings with human ratings and production probabilities, comprise 58 made-up verb stems, each of which has 1 regular and 1 irregular past tense inflection. ...
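The split protocol in this excerpt (random 80/10/10, with roughly 10% of the irregular verbs guaranteed in each of dev and test) amounts to stratifying on regularity before cutting. A sketch under that reading, with an invented toy lexicon in place of the A&H CELEX subset:

```python
import random

def stratified_split(verbs, is_irregular, seed=0):
    """80/10/10 split performed separately on irregular and regular verbs,
    so each of dev and test receives ~10% of the irregulars."""
    rng = random.Random(seed)
    groups = [[v for v in verbs if is_irregular(v)],
              [v for v in verbs if not is_irregular(v)]]
    train, dev, test = [], [], []
    for g in groups:
        rng.shuffle(g)
        n = len(g)
        train += g[:int(0.8 * n)]
        dev += g[int(0.8 * n):int(0.9 * n)]
        test += g[int(0.9 * n):]
    return train, dev, test

# Invented toy lexicon: 100 "verbs", the first 20 marked irregular.
train, dev, test = stratified_split(list(range(100)), lambda v: v < 20)
```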
... Regarding the interests of the current project, some databases may be specifically used to support research in English derivational morphology (e.g., Sánchez-Gutiérrez et al., 2018), compound words (e.g., Gagné et al., 2019;Juhasz et al., 2015;Kim et al., 2018;Schmidtke et al., 2021), and word processing broadly (e.g., Baayen et al., 1995;Balota et al., 2007;Brysbaert & New, 2009;Keuleers et al., 2012;Siegelman et al., 2022). Researchers have found them beneficial for evaluating the influence of different morphological constructions and psycholinguistic characteristics on word processing and production, as well as facilitating stimuli selection or pseudo-word creation for experiments. ...
Article
Full-text available
The Large Database of English Pseudo-compounds (LaDEP) contains nearly 7500 English words which mimic, but do not truly possess, a compound morphemic structure. These pseudo-compounds can be parsed into two free morpheme constituents (e.g., car-pet), but neither constituent functions as a morpheme within the overall word structure. The items were manually coded as pseudo-compounds, further coded for features related to their morphological structure (e.g., presence of multiple affixes, as in ruler-ship), and summarized using common psycholinguistic variables (e.g., length, frequency). This paper also presents an example analysis comparing the lexical decision response times between compound words, pseudo-compound words, and monomorphemic words. Pseudo-compounds and monomorphemic words did not differ in response time, and both groups had slower response times than compound words. This analysis replicates the facilitatory effect of compound constituents during lexical processing, and demonstrates the need to emphasize the pseudo-constituent structure of pseudo-compounds to parse their effects. Further applications of LaDEP include both psycholinguistic studies investigating the nature of human word processing or production and educational or clinical settings evaluating the impact of linguistic features on language learning and impairments. Overall, the items within LaDEP provide a varied and representative sample of the population of English pseudo-compounds which may be used to facilitate further research related to morphological decomposition, lexical access, meaning construction, orthographical influences, and much more.
... Our collection of DNVs starts with Rimell (2012: 130-131), in which 202 DNVs are collected. These DNVs are from the CELEX database (Baayen et al. 1995), an online lexical database of English, Dutch, and German. Of relevance to denominal verbs is the detailed information about the morphology (e.g., derivation and conversion) and syntax (e.g., word class and argument structure) of each word in CELEX. ...
Thesis
Full-text available
Denominal verbs (DNVs) are the verbs converted from nouns without affixation, as in to milk the cow. The source noun, milk, holds a role in the semantic structure of the denominal verb. And some DNVs may manifest multiple semantic structures, as an extraction event in to milk a cow and an addition event in to milk the coffee, as opposed to the one-to-one form-meaning mapping mostly assumed in the DNV literature. This thesis research explores a one-to-many association between a DNV and the multiple semantic structures it associates with, focusing on the semantic and syntactic structures of polysemous DNVs in American English. We identified polysemous DNVs from Rimell's (2012) collection of DNVs, observed language uses of the polysemous DNVs in an English corpus, COCA, and calculated the statistical measures to examine the constructional bias of each semantic class. Our findings show that polysemous DNVs account for almost one-third of the DNVs in American English, which mostly belong to two semantic classes. In particular, instrumental verbs tend to be one of the semantic classes associated with polysemous DNVs due to the versatility of instruments and its high productivity as a semantic class in American English. Lastly, constructional bias was found in each semantic class. This suggests that a DNV's syntactic structure is highly motivated by and reliant on its lexical semantics. We concluded that polysemous DNVs in American English are not uncommon and that a semantic class not only concerns the shared meaning between its DNV members but has constructional bias. Because of this, whenever a DNV obtains the membership of another semantic class, the DNV tends to alter its semantic and syntactic structure and manifest the one resembling other DNV members of the class.
... Word pairs were also matched on graphemic similarity [17], P < 0.05, to ensure that orthographic information did not generally cue participants to rhyming or nonrhyming responses. Stimuli were also matched on lexical frequency within language [18,19] (all P < 0.05). Since the Dutch primes appeared four times in the experiment, the stimulus presentation was semi-randomized so that every two repetitions were at least 72 trials apart. ...
Article
The study aimed to characterize the event-related potentials signature elicited by visual rhyme judgements across two alphabetic orthographies that differ in depth (shallow: Dutch; deep: English) and by spelling-sound consistency in a deep L2-English orthography. Twenty-four Dutch-English bilinguals who varied on measures of L2-English proficiency made rhyme judgments of semantically unrelated Dutch-English word pairs presented sequentially in the visual modality, while behavioral and electrophysiological responses were recorded. The spelling-sound consistency of target words was varied systematically. Nonrhyming targets elicited a larger N450 amplitude than rhyming targets, indicating sensitivity to mismatching phonology across languages. English target words with consistent spelling-sound mappings elicited less negative N250 amplitudes when preceded by rhyming Dutch primes. Overall, event-related potentials revealed robust responses to phonological mismatch, but subtle responses to spelling-sound inconsistency in L2-English. Results suggest that bilingual readers of a shallow L1 orthography who are immersed in an L1-speaking environment may not tune into the degree of spelling-sound consistency of a deep L2 orthography.
... These reference lexicons were then enriched with IPA. The German lexicon was created by combining the lexical entries in the German section of CELEX (Baayen, Piepenbrock, & Gulikers, 1995) with SUBTLEX-DE's frequency estimates. ...
Article
Full-text available
Artificial language learning research has become a popular tool for investigating universal mechanisms in language learning. However, it is often unclear whether observed effects are due to learning or to artefacts of the native or the artificial language, and whether findings in only one language will generalise to speakers of other languages. The present study offers a new approach to modelling the influence of both the L1 and the target artificial language on language learning. The idea is to control for linguistic factors of the artificial and the native language by incorporating measures of wordlikeness into the statistical analysis as covariates. To demonstrate the approach, we extend Linzen and Gallagher's (2017) study on consonant identity patterns to evaluate whether speakers of German and Mandarin rapidly learn the pattern when influences of the L1 and the artificial language are accounted for by incorporating measures assessed by analogical and discriminative learning models over the L1 and artificial lexicons. Results show that nonwords are more likely to be accepted as grammatical if they are more similar to the trained artificial lexicon and more different from the L1 and, crucially, that the identity effect is still present. The proposed approach is helpful for designing cross-linguistic studies.
... It is worth noting that in English, affixes tend to give rise to affix families that are much larger than the morphological families of their base words. Inspection of the CELEX database (Baayen et al. 1995) suggests that only 3.6% of English morphemes have more than 10 family members. In other words, it is conceivable that prefixes and suffixes can be productive precisely because their base words are relatively unproductive. ...
Article
Full-text available
We used word embeddings to study the relation between productivity and semantic transparency. We compiled a dataset with around 2700 two-syllable compounds that shared position-specific constituents (henceforth pivots) and some 1100 suffixed words. For each pivot and suffix, we calculated measures of productivity as well as measures of semantic transparency. For compounds, productivity ( P ) was negatively correlated with the number of types ( V ) and with the semantic similarity between non-pivot constituents and their compounds. Conversely, the greater semantic similarity of the pivot with either the compound or the non-pivot constituent predicted higher degrees of productivity. Visualization with t-SNE revealed clustering of suffixed words’ embeddings, but no by-pivot clustering for compounds, except for a minority of pivots whose regions in semantic space did not contain intruding unrelated compounds. A subset of these pivots was found to realize a fixed shift in semantic space from the base word to the corresponding compound, a property that also emerged for several suffixes. For these pivots, no correlation between P and V was present. Thus, Mandarin compounds appear to realize, at one extreme, motivated but unsystematic concept formation (where other pivots could just as well have been used), and at the other extreme, systematic suffix-like semantics.
... For nouns, in order to facilitate the measurement of item generalization within semantic categories, each participant also had two sets of untrained items [Untrained Lexical Condition (ULC) and Untrained Semantic Condition (USC)]. Each untrained set was matched with one of the treated sets on frequency (Baayen et al., 1995), semantic category, and length (number of syllables, phonemes, and letters). Semantic categories included Animals (e.g., cow), Appliances (e.g., toaster), Body Parts (e.g., arm), Clothing (e.g., shoe), Food (e.g., banana), Furniture (e.g., desk), Musical Instruments (e.g., guitar), Items from Nature (e.g., cloud), Common Objects (e.g., candle), People (e.g., doctor), Structures (e.g., bridge), Tools (e.g., hammer), and Vehicles (e.g., truck). ...
Article
Full-text available
Background: An individual’s diagnostic subtype may fail to predict the efficacy of a given type of treatment for anomia. Classification by conceptual-semantic impairment may be more informative. Aims: This study examined the effects of conceptual-semantic impairment and diagnostic subtype on anomia treatment effects in primary progressive aphasia (PPA) and Alzheimer’s disease (AD). Methods & Procedures: At baseline, the picture and word versions of the Pyramids and Palm Trees and Kissing and Dancing tests were used to measure conceptual-semantic processing. Based on norming that was conducted with unimpaired older adults, participants were classified as being impaired on both the picture and word versions (i.e., modality-general conceptual-semantic impairment), the picture version (Objects or Actions) only (i.e., visual-conceptual impairment), the word version (Nouns or Verbs) only (i.e., lexical-semantic impairment), or neither the picture nor the word version (i.e., no impairment). Following baseline testing, a lexical treatment and a semantic treatment were administered to all participants. The treatment stimuli consisted of nouns and verbs that were consistently named correctly at baseline (Prophylaxis items) and/or nouns and verbs that were consistently named incorrectly at baseline (Remediation items). Naming accuracy was measured at baseline and at three, seven, eleven, fourteen, eighteen, and twenty-one months. Outcomes & Results: Compared to baseline naming performance, lexical and semantic treatments both improved naming accuracy for treated Remediation nouns and verbs.
For Prophylaxis items, lexical treatment was effective for both nouns and verbs, and semantic treatment was effective for verbs, but the pattern of results was different for nouns – the effect of semantic treatment was initially nonsignificant or marginally significant, but it was significant beginning at 11 Months, suggesting that the effects of prophylactic semantic treatment may become more apparent as the disorder progresses. Furthermore, the interaction between baseline Conceptual-Semantic Impairment and the Treatment Condition (Lexical vs. Semantic) was significant for verb Prophylaxis items at 3 and 18 Months, and it was significant for noun Prophylaxis items at 14 and 18 Months. Conclusions: The pattern of results suggested that individuals who have modality-general conceptual-semantic impairment at baseline are more likely to benefit from lexical treatment, while individuals who have unimpaired conceptual-semantic processing at baseline are more likely to benefit from semantic treatment as the disorder progresses. In contrast to conceptual-semantic impairment, diagnostic subtype did not typically predict the treatment effects.
... The stimuli included in the LexTALE were extracted from a list of 240 items from the unpublished vocabulary test '10K' (Meara, 1996). All items have between 4 and 12 characters, and the 40 English words have a mean frequency of 6.3 (range: 1-26) occurrences per million in the CELEX database (Baayen et al., 1995). Lemhöfer and Broersma (2012) conducted a study to assess the LexTALE's reliability as a measure of vocabulary knowledge and global proficiency. ...
Article
The role of proficiency is widely discussed in multilingual language acquisition research, and yet there is little consensus as to how one should operationalize it in empirical investigations. The present study assesses the validity of the LexTALE (Lemhöfer & Broersma, 2012) as a ‘quick and valid’ measure of global proficiency. We first provide an overview of how the LexTALE has been used since its publication, showing that although the test has gained popularity in the last few years, its reliability has not been thoroughly examined. Thus, herein we present the results of a partial replication of Lemhöfer and Broersma (2012), in which we empirically assess the validity of the LexTALE as a measure of L2 global proficiency in two groups of learners of English with various degrees of proficiency (L1 Spanish, n = 288; L1 Chinese, n = 266). Results indicate that if we are to use the LexTALE in our investigations, we should do so with caution: irrespective of the L1 and proficiency level of the targeted participants, its reliability as a measure of global proficiency is called into question by the low and moderate correlations found with a standardised measure of global proficiency.
... n = 100 per condition, except for frequency (written: n_concrete = 95, n_abstract = 93; spoken: n_concrete = 100, n_abstract = 96). (a) CELEX database (Baayen, Piepenbrock, & Gulikers, 1996): frequency of occurrence of Mannheim lemmas per 1 million words. ...
Article
Theories on controlled semantic cognition assume that word concreteness and linguistic context interact during semantic word processing. Methodological approaches and findings on how this interaction manifests at the electrophysiological and behavioral levels are heterogeneous. We measured ERPs and RTs applying a validated cueing paradigm with 19 healthy participants, who performed similarity judgments on concrete or abstract words (e.g., “butterfly” or “tolerance”) after reading contextual and irrelevant sentential cues. Data-driven analyses showed that concreteness increased and context decreased negative-going deflections in broadly distributed bilateral clusters covering the N400 and N700/late positive component time range, whereas both reduced RTs. Crucially, within a frontotemporal cluster in the N400 time range, contextual (vs. irrelevant) information reduced negative-going amplitudes in response to concrete but not abstract words, whereas a contextual cue reduced RTs only in response to abstract but not concrete words. The N400 amplitudes did not explain additional variance in the RT data, which showed a stronger contextual facilitation for abstract than concrete words. Our results support separate but interacting effects of concreteness and context on automatic and controlled stages of contextual semantic processing and suggest that effects on the electrophysiological versus behavioral level obtained with this paradigm are dissociated.
... English: We experiment with the English past tense data from A&H, following both K&C and C&al. For training, we split the CELEX (Baayen et al., 1996) subset produced by A&H, consisting of 4253 verbs (218 irregular), into an 80/10/10 random train/dev/test split following K&C. We ensure that 10% of the irregular verbs are in each of the development and test sets. The English nonce words from A&H, used for computing the correlation of model ratings with human ratings and production probabilities, comprise 58 made-up verb stems, each of which has 1 regular and 1 irregular past tense inflection. ...
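The stratified split described in this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the verb list, and the `irregular` set are hypothetical, and the split simply routes 10% of the irregulars (and 10% of the regulars) into each of dev and test.

```python
import random

def split_80_10_10(verbs, irregular, seed=0):
    """Randomly split verbs 80/10/10 into train/dev/test, ensuring the
    dev and test portions each receive roughly 10% of the irregulars."""
    rng = random.Random(seed)
    irr = [v for v in verbs if v in irregular]
    reg = [v for v in verbs if v not in irregular]
    rng.shuffle(irr)
    rng.shuffle(reg)
    k, r = len(irr) // 10, len(reg) // 10
    dev = irr[:k] + reg[:r]          # 10% of irregulars + 10% of regulars
    test = irr[k:2 * k] + reg[r:2 * r]
    train = irr[2 * k:] + reg[2 * r:]  # the remaining ~80%
    return train, dev, test
```

Shuffling the irregular and regular verbs separately before slicing is what guarantees the irregulars are spread across all three portions rather than left to chance.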
Preprint
Full-text available
Neural networks have long been at the center of a debate around the cognitive mechanism by which humans process inflectional morphology. This debate has gravitated into NLP by way of the question: Are neural networks a feasible account for human behavior in morphological inflection? We address that question by measuring the correlation between human judgments and neural network probabilities for unknown word inflections. We test a larger range of architectures than previously studied on two important tasks for the cognitive processing debate: English past tense, and German number inflection. We find evidence that the Transformer may be a better account of human behavior than LSTMs on these datasets, and that LSTM features known to increase inflection accuracy do not always result in more human-like behavior.
... Our linguistically enriched data mapped each fixation to its underlying token, as shown in Figure 1. This was followed by adding word frequency and length data for each token from the CELEX database [Baayen et al. 1996]. For each token, we calculated eight eye movement metrics, including duration metrics (single fixation duration, first fixation duration, total time, gaze duration), and probability metrics (fixation probability, probability of making exactly one fixation, probability of making two or more fixations, probability of skipping) [Rayner 1998]. ...
Poster
Full-text available
The viability and need for eye movement-based authentication has been well established in light of the recent adoption of Virtual Reality headsets and Augmented Reality glasses. Previous research has demonstrated the practicality of eye movement-based authentication, but there still remains space for improvement in achieving higher identification accuracy. In this study, we focus on incorporating linguistic features in eye movement-based authentication, and we compare our approach to authentication based purely on common first-order metrics across 9 machine learning models. Using GazeBase, a large eye movement dataset with 322 participants, and the CELEX lexical database, we show that the AdaBoost classifier is the best performing model with an average F1 score of 74.6%. More importantly, we show that the use of linguistic features increased the accuracy of most classification models. Our results provide insights on the use of machine learning models, and motivate more work on incorporating text analysis in eye movement-based authentication.
... This pseudorandomized order was held constant for each participant, with all written stimuli appearing in 18-point Arial font. The category exemplars for the novel condition did not differ significantly from those of the familiar condition for CELEX written frequency (Baayen, Piepenbrock, & Gulikers, 1995; obtained via N-Watch software (Davis, 2005)) or letter length. Due to an experimental coding error, one of the written word targets in the familiar condition was presented twice, once paired with a related item and once paired with an unrelated item. ...
Article
Cross-situational statistical word learning (CSWL) refers to the process whereby participants learn new words by tracking ambiguous word-object co-occurrences across time. This study used event-related potentials to explore the acquisition of novel word meanings via CSWL in healthy adults. After learning to associate novel auditory words (e.g., ‘ket’) with familiar objects (e.g., sword), participants performed a semantic judgement task where the learned novel words were paired with a familiar word belonging to either the same (e.g., dagger) or a different (e.g., harp) semantic category. As a comparison, the task also included word pairs comprising two familiar words. The analyses revealed that the unrelated novel word pairs elicited a similar N400 to that of the unrelated familiar word pairs, but with a different hemispheric distribution (left hemisphere for novel words, right hemisphere for familiar words). These findings demonstrate rapid meaning acquisition via CSWL, which is reflected at a neurophysiological level.
... The real word stimuli were selected from a variety of sources, including words from the Buckeye Corpus of Conversational Speech (Pitt et al., 2007), the CELEX database (Baayen et al., 1995), and the word list from the English Lexicon Project (Balota et al., 2007). The pseudowords were then created using the Wuggy (Keuleers & Brysbaert, 2010) program modified to work with phonological representations of words instead of orthographic representations. ...
Article
Pseudowords are used as stimuli in many psycholinguistic experiments, yet they remain largely under-researched. To better understand the cognitive processing of pseudowords, we analysed the pseudoword responses in the Massive Auditory Lexical Decision megastudy data set. Linguistic characteristics that influence the processing of real English words–namely, phonotactic probability, phonological neighbourhood density, uniqueness point, and morphological complexity–were also found to influence the processing time of spoken pseudowords. Subsequently, we analysed how the linguistic characteristics of non-unique portions of pseudowords influenced processing time. We again found that the named linguistic characteristics affected processing time, highlighting the dynamicity of activation and competition. We argue these findings also speak to learning new words and spoken word recognition generally. We then discuss what aspects of pseudoword recognition a full model of spoken word recognition must account for. We finish with a re-description of the auditory lexical decision task in light of our results.
... In the absence of a corpus of words used in Chinese Englishlanguage classes, we used two widely used databases: the CELEX database (Baayen et al., 1996), which is based on the COBUILD corpus with around 17.9 million tokens (from both written and spoken sources), and the SubtlexUS database (Brysbaert and New, 2009), which contains 50 million tokens and is based on American movies and TV series subtitles. We used the Cob frequency (from CELEX) and the SUBTLwf (from SubtlexUS) to identify and exclude low-frequency words (e.g., narrator, magnificent), and to decide between semantically similar words (e.g., lollipop and candy), assuming that more frequent words were more likely to be taught (in this case candy has a higher frequency than lollipop). ...
Article
Full-text available
Millions of Chinese children learn English at increasingly younger ages. Yet when it comes to measuring proficiency, educators and researchers rely on assessments that have been developed for L1 learners and/or for different cultural contexts, or on non-validated, individually designed tests. We developed the Assessment of Chinese Children’s English Vocabulary test (ACCE-V) to address the need for a validated, culturally appropriate receptive vocabulary test, designed specifically for young Chinese learners. The items are drawn from current teaching materials used in China, and the depictions of people and objects are culturally appropriate. We evaluated the instrument’s reliability and validity in two field tests with a combined sample size of 1,092 children (181 children for the first field test and 911 children for the second field test; age range from 3.1 to 7.7, mean age: 5.2). Item Response Theory (IRT) analyses show that the ACCE-V is sufficiently sensitive to capture different proficiency levels and that it has good psychometric properties. ACCE-V scores were correlated with Peabody Picture Vocabulary Test scores, indicating concurrent validity. We found that children’s age and English learning experience can significantly predict the scores of ACCE-V, but the effect of English learning experience is greater. The ACCE-V thus offers an alternative to existing vocabulary tests. We argue that culturally appropriate assessments like the ACCE-V are fairer to learners and help promote an English learning and teaching environment that is less dominated by Western cultures and native speaker norms.
... To measure the phonemic similarity between each unique target-misperception pairing, the edit distance (i.e., Levenshtein distance; Navarro, 2001) was calculated. First, each target and error were phonetically transcribed via the CELEX database (Baayen et al., 1995). Next, the transcriptions were compared to calculate the edit distance by determining the minimum number of phonemic omissions, additions, or substitutions that differed between the misperception in relation to the target word. ...
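The edit-distance computation described here is the standard Levenshtein dynamic program applied to phoneme sequences rather than letters. A minimal sketch (the phoneme symbols in the example are invented placeholders, not actual CELEX transcription codes):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences: the minimum number of
    insertions, deletions, and substitutions turning `a` into `b`."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]

# Two transcriptions differing only in the final phoneme:
edit_distance(['k', '{', 't'], ['k', '{', 'p'])  # → 1
```

Because the function operates on arbitrary sequences, the same code serves for orthographic strings and for lists of phoneme symbols.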
Article
This study investigated how age and hearing loss influence the misperceptions made when listening to sentences in babble. Open-set responses to final words in sentences with low and high context were analyzed for younger adults with normal hearing and older adults with normal or impaired hearing. All groups performed similarly in overall accuracy but differed in error type. Misperceptions for all groups were analyzed according to phonological and semantic properties. Comparisons between groups indicated that misperceptions for older adults were more influenced by phonological factors. Furthermore, older adults with hearing loss omitted more responses. Overall, across all groups, results suggest that phonological confusions most explain misperceptions in low context sentences. In high context sentences, the meaningful sentence context appears to provide predictive cues that reduce misperceptions. When misperceptions do occur, responses tend to have greater semantic similarity and lesser phonological similarity to the target, compared to low context sentences. In this way, semantic similarity may index a postdictive process by which ambiguities due to phonological confusions are resolved to conform to the semantic context of the sentence. These patterns demonstrate that context, age, and hearing loss affect the misperceptions, and potential sentence interpretation, made when listening to sentences in babble.
... com). The items were 4-12 letters long with a mean frequency of 6.4 occurrences per million according to the CELEX database (Baayen et al., 1996). Unlike the original LexTALE, this LDT was a timed task programmed using DMDX software (Forster & Forster, 2003) to obtain response time data in addition to accuracy scores for each participant. ...
Article
Full-text available
The romanization of non-alphabetic scripts, particularly in digital contexts, is a widespread phenomenon across many languages. However, the effect of script romanization on English reading by bilinguals with English as a second language is underexamined. Guided by the premises of the script relativity hypothesis and the Bilingual Interactive Activation (BIA+) model, we examined differences in phonological activation during visual English word recognition by Hindi-English bilinguals after they were primed with interlingual homophones in Devanagari (traditional Hindi script) and Romanagari (romanized Hindi script). We also explored the specific roles played by diacritic markers and individual language proficiencies. Linear mixed-effects and regression modeling showed that participants were faster at English word recognition when primed by interlingual homophones in Romanagari than in Devanagari. Further, words with diacritics led to faster English word recognition than words without diacritics with both scripts. This was unexpected since Romanagari does not mark diacritics. Finally, lexical proficiency in English and Devanagari explained variance in phonological priming effects. The findings provide evidence that adopting an additional L1 script might reconfigure the architecture of the bilingual lexicon. Our results support the view that script differences play a critical role in language processing.
... (Vidra et al., 2021) The situation regarding the computational handling of compounds is different in some other languages. Specifically, GermaNet (Hamp and Feldweg, 1997) contains nearly 100,000 split nominal compounds, and CELEX (Baayen et al., 1996) includes 71,249 split compounds for Dutch, 12,853 split compounds for English and 19,768 for German, all done manually. ...
... However, previous work has focused primarily on a handful of languages, and some focus on orthographic syllabification rather than phonemic segmentation. Some use CELEX (Baayen et al., 1996), a popular dataset containing syllabified text, but it only contains syllabified words in English, German, and Dutch. In contrast, our extracted pronunciation lexicon is a unique multilingual resource that allows for developing and evaluating models and approaches on the new combined task of massively multilingual IPA syllabification and stress prediction across hundreds of languages. ...
... Data. We use CELEX (Baayen et al., 1995), a dataset covering three languages (English, German, and Dutch), as the source of data for our experiments. We restrict our analysis to monomorphemic words, and note that we count words with multiple parts of speech as homophones (as both Piantadosi et al. and Trott and Bergen do). ...
... One important criterion for the nonce-word generation was that phonotactic probabilities were matched for words and nonce-words (Vitevitch and Luce, 2004, p. 481). Similar to the web-based Phonotactic Probability Calculator for English [used, e.g., in van Heugten et al. (2018)], we extracted positional segment frequencies and position-specific biphone frequencies for each (nonce-)word from the wordform dictionary of the CELEX lexical database (Baayen et al., 1995) using a self-programmed script. [Table note: The first 12 pairs are used in the u-only condition, the first six and last six pairs in the u-varied condition. Segmental changes are highlighted in bold face.] ...
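Positional segment and position-specific biphone frequencies of the kind described can be tallied from a wordform list as follows. This is a minimal sketch assuming the lexicon is already available as phoneme sequences; the toy three-word lexicon below is invented for illustration, not CELEX data.

```python
from collections import Counter

def positional_counts(wordforms):
    """Count how often each segment occurs at each position, and each
    biphone (adjacent segment pair) at each starting position.
    `wordforms` is an iterable of phoneme sequences (one per word)."""
    seg_freq = Counter()      # (position, segment) -> count
    biphone_freq = Counter()  # (position, (seg, next_seg)) -> count
    for phones in wordforms:
        for i, p in enumerate(phones):
            seg_freq[(i, p)] += 1
        for i in range(len(phones) - 1):
            biphone_freq[(i, (phones[i], phones[i + 1]))] += 1
    return seg_freq, biphone_freq

lexicon = [list("fus"), list("fut"), list("mus")]
seg, bi = positional_counts(lexicon)
seg[(0, "f")]        # → 2 (word-initial /f/ occurs in two words)
bi[(0, ("f", "u"))]  # → 2
```

Dividing each count by the number of wordforms (or by token frequencies, if frequency weighting is wanted) turns these raw counts into the positional probabilities used for matching words and nonce-words.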
Article
Full-text available
Variability is pervasive in spoken language, in particular if one is exposed to two varieties of the same language (e.g., the standard variety and a dialect). Unlike in bilingual settings, standard and dialectal forms are often phonologically related, increasing the variability in word forms (e.g., German Fuß “foot” is produced as [fus] in Standard German and as [fs] in the Alemannic dialect). We investigate whether dialectal variability in children’s input affects their ability to recognize words in Standard German, testing non-dialectal vs. dialectal children. Non-dialectal children, who typically grow up in urban areas, mostly hear Standard German forms and hence encounter little segmental variability in their input. Dialectal children, in turn, who typically grow up in rural areas, hear both Standard German and dialectal forms and are hence exposed to a large amount of variability in their input. We employ the familiar word paradigm for German children aged 12–18 months. Since dialectal children from rural areas are hard to recruit for laboratory studies, we programmed an app that allows all parents to test their children at home. Looking times to familiar vs. non-familiar words were analyzed using a semi-automatic procedure based on neural networks. Our results replicate the familiarity preference for non-dialectal German 12–18-month-old children (longer looking times to familiar than to non-familiar words). Dialectal children in the same age range, on the other hand, showed a novelty preference. One explanation for the novelty preference in dialectal children may be more mature linguistic processing, caused by more variability of word forms in the input. This linguistic maturation hypothesis is addressed in Experiment 2, in which we tested older children (18–24-month-olds). These children, who are not exposed to dialectal forms, also showed a novelty preference.
Taken together, our findings show that both dialectal and non-dialectal German children recognized the familiar Standard German word forms, but their looking pattern differed as a function of the variability in the input. Frequent exposure to both dialectal and Standard German word forms may hence have affected the nature of (prelexical and/or) lexical representations, leading to more mature processing capacities.
Article
Full-text available
The ubiquity of formal education in modern nations is often accompanied by an assumption that students’ motivation for learning is innate and self-sustaining. The latter is true for most children in domains (e.g., language) that are universal and have a deep evolutionary history, but this does not extend to learning in evolutionarily novel domains (e.g., mathematics). Learning in evolutionarily novel domains requires more cognitive effort and thus is less motivating. The current study tested the associated hypothesis that learning will feel easier and more motivating for evolutionarily relevant (e.g., “mother,” “food”) than evolutionarily novel (e.g., “computer,” “gravity”) word pairs and that a growth mindset emphasizing the importance of effort in learning might moderate this effect. Specifically, 144 adults were presented with 32 word pairs (half evolutionarily relevant and half evolutionarily novel) and were randomly assigned to a growth mindset or a control condition. Evolutionarily relevant words were better remembered than evolutionarily novel words (d = 0.65), and the learning was reported as more enjoyable (d = 0.49), more interesting (d = 0.38), as well as less difficult (d = − 0.96) and effortful (d = − 0.78). Although the growth mindset intervention fostered a mindset belief, compared to the control condition, it did not lead to improved recall performance or changes in motivational beliefs. These results are consistent with the prediction of higher motivation and better learning of evolutionarily relevant words and concepts than for evolutionarily novel words and concepts. Implications for future research and educational practice are discussed.
Article
Full-text available
Estimating the difficulty of reading tests is critical in second language education and assessment. This study was aimed at examining various text features that might influence the difficulty level of a high-stakes reading comprehension test and predict test takers’ scores. To this end, the responses provided by 17,900 test takers on the reading comprehension subsection of a major high-stakes test, the Iranian National University Entrance Exam for the Master’s Program were examined. Overall, 63 reading passages in different versions of the test from 2017 to 2019 were studied with a focus on 16 indices that might help explain the reading difficulty and test takers’ scores. The results showed that the content word overlap index and the Flesch-Kincaid Reading Ease formula had significant correlations with the observed difficulty and could therefore be considered better predictors of test difficulty compared to other variables. The findings suggest the use of various indices to estimate the reading difficulty before administering tests to ensure the equivalency and validity of tests.
Preprint
Full-text available
Psycholinguistic analyses provide a means of evaluating large language model (LLM) output and making systematic comparisons to human-generated text. These methods can be used to characterize the psycholinguistic properties of LLM output and illustrate areas where LLMs fall short in comparison to human-generated text. In this work, we apply psycholinguistic methods to evaluate individual sentences from long-form analogies about biochemical concepts. We compare analogies generated by human subjects enrolled in introductory biochemistry courses to analogies generated by chatGPT. We perform a supervised classification analysis using 78 features extracted from Coh-Metrix that analyze text cohesion, language, and readability (Graesser et al., 2004). Results illustrate high performance for classifying student-generated and chatGPT-generated analogies. To evaluate which features contribute most to model performance, we use a hierarchical clustering approach. Results from this analysis illustrate several linguistic differences between the two sources.
Article
Pseudowords are letter strings that look like words but are not words. They are used in psycholinguistic research, particularly in tasks such as lexical decision. In this context, it is essential that the pseudowords respect the orthographic statistics of the target language. Pseudowords that violate them would be too easy to reject in a lexical decision and would not enforce word recognition on real words. We propose a new pseudoword generator, UniPseudo, using an algorithm based on Markov chains of orthographic n-grams. It generates pseudowords from a customizable database, which allows one to control the characteristics of the items. It can produce pseudowords in any language, in orthographic or phonological form. It is possible to generate pseudowords with specific characteristics, such as frequency of letters, bigrams, trigrams or quadrigrams, number of syllables, frequency of biphones, and number of morphemes. Thus, from a list of words composed of verbs, nouns, adjectives, or adverbs, UniPseudo can create pseudowords resembling verbs, nouns, adjectives, or adverbs in any language using an alphabetic or syllabic system.
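A Markov chain over orthographic n-grams, as the abstract describes, can be sketched in a few lines. This is not the UniPseudo implementation; it is a toy trigram generator over a tiny hypothetical word list, with `^`/`$` as assumed boundary markers:

```python
import random
from collections import defaultdict

def train_ngrams(words, n=2):
    """Collect n-gram transition counts from a word list, with boundary markers."""
    transitions = defaultdict(list)
    for w in words:
        padded = "^" * (n - 1) + w + "$"
        for i in range(len(padded) - n + 1):
            ctx, nxt = padded[i:i + n - 1], padded[i + n - 1]
            transitions[ctx].append(nxt)
    return transitions

def generate(transitions, n=2, rng=None):
    """Walk the Markov chain from the start context until the end marker."""
    rng = rng or random.Random(0)
    ctx, out = "^" * (n - 1), []
    while True:
        nxt = rng.choice(transitions[ctx])
        if nxt == "$":
            return "".join(out)
        out.append(nxt)
        ctx = (ctx + nxt)[-(n - 1):]

words = ["bank", "band", "sand", "sank", "rank"]
t = train_ngrams(words, n=3)
print(generate(t, n=3))
```

Because transitions are sampled in proportion to their training counts, the output respects the n-gram statistics of the input list; training on nouns yields noun-like pseudowords (e.g. "band" or the novel "sank"-like "rand" from the list above), which is the property the abstract emphasizes.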
Article
Morphological structures interact dynamically with lexical processing and storage, with the parameters of morphological typology being partly dependent on cognitive pathways for processing, storage and generalization of word structure, and vice versa. Bringing together a team of well-known scholars, this book examines the relationship between linguistic cognition and the morphological diversity found in the world's languages. It includes research from across linguistic and cognitive science sub-disciplines that looks at the nature of typological diversity and its relationship to cognition, touching on concepts such as complexity, interconnectedness within systems, and emergent organization. Chapters employ experimental, computational, corpus-based and theoretical methods to examine specific morphological phenomena, and an overview chapter provides a synthesis of major research trends, contextualizing work from different methodological and philosophical perspectives. Offering a novel perspective on how cognition contributes to our understanding of word structure, it is essential reading for psycholinguists, theoreticians, typologists, computational modelers and cognitive scientists.
Article
Full-text available
The presence of phonological neighbours facilitates word-form learning, suggesting that prior phonological knowledge supports vocabulary acquisition. We tested whether prior semantic knowledge similarly benefits word learning by teaching 7-to-10-year-old children (Experiment 1) and adults (Experiment 2) pseudowords assigned to novel concepts with low or high semantic neighbourhood density according to feature norms. Form recall, definition recall, and semantic categorisation tasks were administered immediately after training, the next day, and one week later. Across sessions, pseudowords assigned to low-density (versus high-density) semantic neighbourhood concepts elicited better word-form recall (for adults) and better meaning recall (for children). Exploratory cross-experiment analyses demonstrated that the neighbourhood influence was most robust for recalling meanings. Children showed greater gains in form recall than adults across the week, regardless of links to semantic knowledge. While the results suggest that close semantic neighbours interfere with word learning, we consider alternative semantic dimensions that may be beneficial.
Article
There is a common view that English has word stress but Chinese does not. I examine perceived stress in disyllabic lexical entries and show two similarities between the languages: (i) when both syllables carry a designated tone, such as bamboo or Red Cross in English, or 北京 Beijing ‘Beijing’ in Chinese, main stress is unclear to native speakers; and (ii) when just one syllable has a designated tone, such as yoga, magpie, or about in English, or 爸_爸 ba_ba [paː][pə] ‘pa_pa (papa)’ in Chinese, it is clearly perceived to carry main stress. However, case (ii) covers 86% of disyllabic entries in English but just 5% in Chinese. The difference is attributable to the independent fact that Chinese is a tone language, in which syllables with secondary stress also carry a designated tone, whereas in English they usually do not. I also show that English and Chinese share two further similarities: First, stressed and unstressed syllables are acoustically different, and second, stress plays other phonological roles, such as phrasal stress, contrastive stress, and meter in poetry.
Article
The linguistic study of Chinese, with its rich morphological, syntactic and prosodic/tonal structures, its complex writing system, and its diverse socio-historical background, is already a long-established and vast research area. With contributions from internationally renowned experts in the field, this Handbook provides a state-of-the-art survey of the central issues in Chinese linguistics. Chapters are divided into four thematic areas: writing systems and the neuro-cognitive processing of Chinese, morpho-lexical structures, phonetic and phonological characteristics, and issues in syntax, semantics, pragmatics, and discourse. By following a context-driven approach, it shows how theoretical issues in Chinese linguistics can be resolved with empirical evidence and argumentation, and provides a range of different perspectives. Its dialectical design sets a state-of-the-art benchmark for research in a wide range of interdisciplinary and cross-lingual studies involving the Chinese language. It is an essential resource for students and researchers wishing to explore the fascinating field of Chinese linguistics.
Preprint
Full-text available
Estimating the difficulty of reading texts is critical in second language education and assessment. This study examined various text features that might influence the difficulty level of a high-stakes reading comprehension test and predict test takers’ scores. To this end, the responses provided by 17,900 test takers on the reading comprehension subsection of a major high-stakes test, the Iranian National University Entrance Exam for the Master’s Program, were examined. Overall, 63 reading passages in different versions of the test from 2017 to 2019 were studied with a focus on 16 indices that might help explain the reading difficulty and test takers’ scores. The results showed that the content word overlap index and the Flesch-Kincaid Reading Ease formula had significant correlations with the observed difficulty and could therefore be considered better predictors of test difficulty than the other variables. The findings suggest using a range of indices to estimate reading difficulty before administering tests, to help ensure the equivalency and validity of tests.
Chapter
Many orthographies represent the morphological structure of words, i.e., keep the spelling of a morpheme constant despite variability in pronunciation (e.g., cats, dogs). Experimental work strongly suggests that this structure plays a beneficial role in both visual word recognition and spelling. Readers apparently decompose words into their constituent morphemes for the sake of lexical access. Moreover, early on, spellers rely on a word’s morphological structure to derive its spelling (e.g., picked, called). However, morphologically complex words can also be a spelling hurdle, more particularly, when different morphological structures yield different spellings (i.e., morpho-orthographic representations) with the same pronunciation, i.e., grammatical homophones. The error risk on these homophones is codetermined by the token frequencies of the homophones, the rule’s type frequency, and properties of working memory. The focus in this chapter is on a salient error type in the spelling of Dutch verb homophones but is extended to other languages as well. Keywords: Homophones, Homophone intrusions, Morphological decomposition, Morphological awareness, Spelling errors, Rule frequency, Homophone dominance, Working memory
Article
Background: Spoken language is constantly undergoing change: speakers within and across social and regional groups influence each other’s speech, leading to the emergence and drift of accents in a language. These processes are driven by mutual, unintentional imitation of the phonetic details of others’ speech in conversational interactions, suggesting that continuous auditory-motor adaptation takes place in interactive language use and that plasticity of auditory-motor representations of speech persists across the lifespan. The brain mechanisms underlying this large-scale social-linguistic behaviour are still poorly understood. Research aim: To investigate the role of cerebellar and basal ganglia dysfunctions in unintended adaptation to the speech rhythm and articulation rate of a second speaker. Methods: Twelve patients with spinocerebellar ataxia type 6 (SCA6), 15 patients with Parkinson’s disease (PD), and 27 neurologically healthy controls (CTRL) participated in two interactive speech tasks, i.e., sentence repetition and “turn-taking” (i.e., dyadic interaction with sentences produced by a model speaker). Production of scripted sentences was used as a control task. Two types of sentence rhythm were distinguished, i.e., regular and irregular, and model speech rate was manipulated in 12 steps between 2.9 and 4.0 syllables per second. Acoustic analyses of the participants’ utterances were performed to determine the extent to which participants adapted their speech rate and rhythm to the model. Results: Neurologically healthy speakers showed significant adaptation of rate in all conditions, and of rhythm in the repetition task and partly also in the turn-taking task. Patients with PD showed a stronger propensity to adapt than the controls. In contrast, the patients with cerebellar degeneration were largely insensitive to the model speaker’s rate and rhythm. Contrary to expectations, sentences with an irregular speech rhythm exerted a stronger adaptive attraction than regular sentences in the two patient groups. Conclusions: Cerebellar degeneration inhibits the propensity to covertly adapt to others’ speech. Striatal dysfunction in Parkinson’s disease spares or even promotes the tendency to accommodate to other speakers’ speech rate and rhythm.
Article
The morphological structure of complex words impacts how they are processed during visual word recognition. This impact varies over the course of reading acquisition and for different languages and writing systems. Many theories of morphological processing rely on a decomposition mechanism, in which words are decomposed into explicit representations of their constituent morphemes. In distributed accounts, in contrast, morphological sensitivity arises from the tuning of finer-grained representations to useful statistical regularities in the form-to-meaning mapping, without the need for explicit morpheme representations. In this theoretically guided review, we summarize research into the mechanisms of morphological processing, and discuss findings within the context of decomposition and distributed accounts. Although many findings fit within a decomposition model of morphological processing, we suggest that the full range of results is more naturally explained by a distributed approach, and discuss additional benefits of adopting this perspective.
Preprint
Full-text available
Why do some things succeed in the marketplace of ideas? While some argue that content drives success, others suggest that style, or the way ideas are presented, also plays an important role. To provide a stringent test of style's importance, we examine it in a context where content should be paramount: academic research. While scientists often see writing as a disinterested way to communicate unobstructed truth, a multi-method investigation indicates that writing style shapes impact. Separating style from content can be difficult as papers that tend to use certain language may also write about certain topics. Consequently, we focus on a unique class of words linked to style (i.e., function words such as "and," "the," and "on") that are completely devoid of content. Natural language processing of almost 30,000 articles from a range of disciplines finds that function words explain 13-27% of language's impact on citations. Ancillary analyses explore specific categories of function words to suggest how style matters, highlighting the role of writing simplicity, personal voice, and temporal perspective. Experiments further underscore the causal impact of style. The results suggest how to boost communication's impact and highlight the value of natural language processing for understanding the success of ideas.
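The core measurement in this abstract, isolating content-free function words, can be illustrated with a toy ratio computation. The word list and tokenizer below are hypothetical simplifications; the study's actual lexica and models are not reproduced here:

```python
# Small hypothetical sample of English function words (closed-class,
# content-free tokens such as articles, conjunctions, and prepositions)
FUNCTION_WORDS = {"and", "the", "on", "of", "a", "in", "to", "is", "that", "it"}

def function_word_ratio(text: str) -> float:
    """Share of tokens that are function words (toy tokenizer: strip
    punctuation, lowercase, split on whitespace)."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in FUNCTION_WORDS for t in tokens) / len(tokens)

print(function_word_ratio("The cat sat on the mat."))  # 3 of 6 tokens -> 0.5
```

Because function words carry no topical content, differences in this kind of ratio across documents index style rather than subject matter, which is what lets the authors separate the two.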
Article
Abstract: This article presents subjective frequency norms for 660 French words, collected from young (M = 22.6 years) and older (M = 71.2 years) adults. Subjective frequency was rated on a 7-point scale ranging from "never encountered" to "encountered several times a day". Analyses show that the estimates are reliable for both age groups. Correlations with data from similar studies are positive and significant. Moreover, subjective frequency correlates (0.42 to 0.65) with various objective frequency measures from Lexique 3.55 (New et al., 2007). Regression analyses indicate that young adults' subjective frequency is the best predictor of lexical decision performance in a young population (French Lexicon Project, Ferrand et al., 2010). Finally, the data show intergenerational differences in the estimates for 24% of the words. These norms, freely available at http://www.labopsycho-u-bordeaux2.fr/psycogni/equipe/cognitive/publis.php?login=robert, offer researchers a new tool for selecting French lexical materials for studying age-related effects on cognitive functioning.