Figure 1 - available from: Scientific Reports
Experimental designs: (A) Human Predictability was estimated from the online responses of several participants to a web cloze-task experiment. Each participant had to complete one of every 30 words, and the text was uncovered as they responded. (B) Eye movements were recorded from a separate set of participants who read three of the eight texts in the lab. The eye movement measures (Gaze Duration) were analyzed using Linear Mixed Models. (C) Computational algorithms were trained on a large corpus of texts from a domain similar to that of the tested short stories (A,B). Image sources: (B) R project (https://www.r-project.org/logo/, The R Foundation, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0/, no changes made).

Source publication
Article
Full-text available
When we read printed text, we are continuously predicting upcoming words to integrate information and guide future eye movements. Thus, the Predictability of a given word has become one of the most important variables when explaining human behaviour and information processing during reading. In parallel, the Natural Language Processing (NLP) field...

Contexts in source publication

Context 1
... was first estimated by humans' responses to a cloze-task. Approximately 2500 participants read 1-8 texts (mean = 1.92) and completed approximately 300 words out of 26366 unique words, where each participant completed one of every 30 words on an online platform (Fig. 1A). Correlations of (logit) cloze-Predictability with the repetition number, the (log) frequency in a corpus, and the (inverse of) word length (Fig. 2) showed the expected behaviour (i.e., the more frequent, the shorter, and the more repeated the words were inside the text, the more predictable they were) 19 ...
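As an illustration of this estimation, a minimal sketch in Python (toy data and hypothetical column names, not the authors' code) of how logit cloze-Predictability could be computed from cloze responses and correlated with word properties:

```python
# Minimal sketch (toy data, illustrative column names): compute logit
# cloze-Predictability per word and correlate it with word properties.
import numpy as np
import pandas as pd

# One row per (word occurrence, participant response); "correct" is 1 if the
# response matched the word in the text.
responses = pd.DataFrame({
    "word_id": [0, 0, 0, 1, 1, 1],
    "correct": [1, 0, 1, 0, 0, 1],
})

# Cloze probability = proportion of correct completions, smoothed away from 0/1
# so the logit transform stays finite.
cloze = responses.groupby("word_id")["correct"].agg(["sum", "count"])
p = (cloze["sum"] + 0.5) / (cloze["count"] + 1.0)
logit_pred = np.log(p / (1.0 - p))

# Illustrative per-word properties: log corpus frequency, inverse length,
# and number of previous repetitions inside the text.
words = pd.DataFrame({
    "log_freq":   [2.1, 0.7],
    "inv_length": [1 / 4, 1 / 9],
    "repetition": [3, 0],
}, index=[0, 1])

print(words.assign(logit_pred=logit_pred).corr()["logit_pred"])
```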
Context 2
... separate set of 36 participants performed an eye-tracking experiment in the lab. Each participant read three of the eight texts. Texts were assigned to participants pseudo-randomly (Fig. 1B). Finally, we trained different computational models drawn from the Natural Language Processing (NLP) field on a larger corpus. This corpus was also composed of short stories in Spanish. The original stories were not contained in the larger corpus (Fig. ...
Context 3
... read three of the eight texts. Texts were assigned to participants pseudo-randomly (Fig. 1B). Finally, we trained different computational models drawn from the Natural Language Processing (NLP) field on a larger corpus. This corpus was also composed of short stories in Spanish. The original stories were not contained in the larger corpus (Fig. ...
Context 4
... estimations of Predictability were evaluated one at a time by replacing the cloze-Predictability (M2.N to M4.N). We first explored the parameter space for the N-gram+cache predictabilities and decided to use N = 4, δ = 0.00015 and λ = 0.15 (see Supplementary Fig. S1). The resulting co-variable was included in the model (M2.N, N-gram+cache model), where it made a highly significant contribution (Fig. 3A, Supplementary Table S1). ...
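The exact form of the N-gram+cache combination is not given in this excerpt; the following hedged sketch only illustrates one plausible way δ (recency decay) and λ (mixing weight) could enter an interpolation between a base N-gram probability and a cache term:

```python
# Hedged sketch of an N-gram + cache Predictability: the base 4-gram probability
# is mixed with a "cache" term that rewards words already seen earlier in the same
# text, with an exponential decay delta over distance and a mixing weight lambda.
# The functional form is an assumption for illustration, not the paper's code.
import math

def cache_prob(word, history, delta):
    """Recency-weighted evidence that `word` already occurred in `history`."""
    weights = [math.exp(-delta * (len(history) - i))
               for i, w in enumerate(history) if w == word]
    total = sum(math.exp(-delta * (len(history) - i))
                for i in range(len(history))) or 1.0
    return sum(weights) / total

def ngram_cache_prob(p_ngram, word, history, delta=0.00015, lam=0.15):
    """Interpolate the base N-gram probability with the cache component."""
    return (1.0 - lam) * p_ngram + lam * cache_prob(word, history, delta)

# Example: a word repeated earlier in the story becomes more predictable.
history = "the fox jumped over the lazy dog and the fox".split()
print(ngram_cache_prob(0.02, "fox", history))
```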
Context 5
... a model that summed all these results was implemented, which used the N-gram+cache for the fixated word and CS-FT for N + 1 (M9.N + 1, Fig. 4A, Supplementary Table S2). This model resulted in an AIC close to that of the best of all the explored models, with only two co-variables added over the baseline model (Fig. 4C), and was significantly better than M0.N + 1 and M1.N + 1 (Fig. 4D). ...
Context 6
... that developed at the level of the integration of new information with information from the beginning of the text. We hypothesized that these differences, both in the estimation of Predictability and in the eye movements, were the reason for the negative relation between cloze-Predictability of the following word (N + 1) and GD on word N (Fig. 4, M1.N + 1). This negative relation was found previously only in Chinese sentence-reading 26 , but not in German or Spanish sentence-reading 10,25 ...
Context 7
... we analyzed computer estimations of word Predictability with four different algorithms: N-gram, LSA, Word2Vec, and FastText. A 4-gram model was used with the addition of the local word frequency (see Supplementary Fig. S1). LSA, Word2Vec, and FastText were studied using 300 dimensions and the average Cosine Similarity (CS) with the previous words (context) as a proxy for Predictability, testing different context sizes (see Supplementary Fig. S2). ...
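A minimal sketch of the cosine-similarity proxy described here, assuming a generic 300-dimensional embedding lookup (`get_vector` is a placeholder, not the paper's implementation) and averaging the similarities over a context window:

```python
# Sketch of the cosine-similarity proxy for Predictability: the embedding of the
# target word is compared with the embeddings of the preceding context words and
# the similarities are averaged. `get_vector` stands for any 300-dimensional
# lookup (LSA, Word2Vec or FastText); the exact aggregation used in the paper
# may differ.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def context_similarity(target, context, get_vector, window=10):
    """Average cosine similarity between the target word and its context window."""
    context = context[-window:]                      # keep the last `window` words
    sims = [cosine(get_vector(target), get_vector(w)) for w in context]
    return float(np.mean(sims)) if sims else np.nan

# Toy embedding lookup used only for illustration.
rng = np.random.default_rng(0)
vocab_vectors = {}
def get_vector(word, dim=300):
    return vocab_vectors.setdefault(word, rng.normal(size=dim))

print(context_similarity("perro", "el viejo marinero llamó a su".split(), get_vector))
```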
Context 8
... estimation of the impact of these algorithms was analyzed using Linear Mixed Models (LMMs) and the Gaze Duration as the dependent variable (Fig. 3, Supplementary Table S1). The results of each of these computer-based Predictabilities on the gaze models clearly showed that the one that best explained eye movements was the N-gram+cache, even though it generated a large decrease in the frequency effect, presumably because of the high correlation between these two variables. In comparison with ...
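A hedged sketch of this kind of LMM analysis using Python's statsmodels on synthetic stand-in data (the original analysis was run in R and likely used a richer random-effects structure):

```python
# Sketch of a gaze-duration mixed model: log gaze duration is modelled from word
# length, log frequency and a computational Predictability, with random intercepts
# per participant. All data below are synthetic stand-ins for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "subject": rng.integers(0, 36, n),
    "inv_length": 1 / rng.integers(2, 12, n),
    "log_freq": rng.normal(3, 1, n),
    "ngram_cache_pred": rng.uniform(0, 1, n),
})
data["log_gaze_duration"] = (5.5 - 0.3 * data["ngram_cache_pred"]
                             - 0.1 * data["log_freq"] + rng.normal(0, 0.2, n))

model = smf.mixedlm("log_gaze_duration ~ inv_length + log_freq + ngram_cache_pred",
                    data=data, groups=data["subject"])
result = model.fit()
print(result.summary())   # the t-value of ngram_cache_pred estimates its contribution
```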
Context 9
... N-gram probability for each word in the stories from the Buenos Aires Corpus was calculated using the SRILM package (http://www.speech.sri.com/projects/srilm/). The window used to determine the context (N) was optimized using the correlation with the cloze-Predictability (Supplementary Fig. S1A). The optimal value for N was 4, after which the curve showed a plateau, which indicated that long chains of words did not appear in our training corpus. ...
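A sketch of how the window size N could be selected by correlating per-word (log) N-gram probabilities with cloze-Predictability; the probabilities are assumed to have been exported from SRILM beforehand, and the toy arrays below only illustrate the procedure:

```python
# Sketch of the window-size selection: for each candidate N, correlate the per-word
# (log) N-gram probabilities with the (logit) cloze-Predictability, print the curve
# so the plateau can be inspected, and return the N with the highest correlation.
# The probabilities are assumed to come from SRILM; the arrays here are toy data.
import numpy as np
from scipy.stats import pearsonr

def best_window(logprobs_by_n, logit_cloze):
    """logprobs_by_n: dict mapping N -> array of per-word log probabilities."""
    corrs = {n: pearsonr(lp, logit_cloze)[0] for n, lp in logprobs_by_n.items()}
    for n, r in sorted(corrs.items()):
        print(f"N = {n}: r = {r:.3f}")
    return max(corrs, key=corrs.get)

rng = np.random.default_rng(1)
cloze = rng.normal(size=200)
logprobs = {n: cloze * (1 - 1 / n) + rng.normal(scale=0.5, size=200) for n in range(2, 6)}
print("chosen N:", best_window(logprobs, cloze))
```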
Context 10
... δ and λ parameters were optimized for the 4-gram probabilities. We performed a grid search over δ ∈ [0.00005, 0.0005] and λ ∈ [0.05, 0.6], measuring the t-value of the 4-gram+cache variable in the M2.N model (Supplementary Fig. S1B). We kept the values of δ and λ with the maximum absolute t-value (δ = 0.00015 and λ = 0.15). ...
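A sketch of the δ/λ grid search under the assumption that each candidate pair requires refitting the gaze model and reading off the t-value of the 4-gram+cache co-variable; `fit_m2n` is a hypothetical stand-in for that refit:

```python
# Sketch of the delta/lambda grid search: for each candidate pair, rebuild the
# 4-gram+cache co-variable, refit the gaze-duration model, and keep the pair with
# the largest absolute t-value for that co-variable.
import numpy as np

def grid_search(fit_m2n, deltas, lambdas):
    best, best_t = None, 0.0
    for delta in deltas:
        for lam in lambdas:
            t = fit_m2n(delta, lam)          # t-value of the 4-gram+cache predictor
            if abs(t) > abs(best_t):
                best, best_t = (delta, lam), t
    return best, best_t

deltas = np.linspace(0.00005, 0.0005, 10)
lambdas = np.linspace(0.05, 0.6, 12)

# Toy stand-in for the real refit, peaking (in absolute value) near the reported optimum.
def fit_m2n(delta, lam):
    return -8.0 * np.exp(-((delta - 0.00015) / 0.0002) ** 2 - ((lam - 0.15) / 0.3) ** 2)

print(grid_search(fit_m2n, deltas, lambdas))
```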

Citations

... Accordingly, the cost of processing a time-series of symbols is a function of how predictable the series is, given the context in which it appears (Levy, 2008). For example, predictability has been defined in the reading literature as the probability of knowing a word before reading it, and it has been used to understand the variation of gaze duration over words in eye tracking experiments (Bianchi et al., 2020;Kliegl et al., 2006;Rayner, 1998). ...
... To thoroughly address these questions, we measured how easily a trained TSOM can predict an input form, by showing the entire form to the map one letter at a time from '#' to '$'. The idea, borrowed from the literature on word/sentence reading (Bianchi et al., 2020;Kliegl et al., 2006;Rayner, 1998), is that the predictability of an input form correlates inversely with the serial processing cost of the form. Put simply, highly predictable words are easier to process than hardly predictable words. ...
Article
Full-text available
Over the last decades, several independent lines of research in morphology have questioned the hypothesis of a direct correspondence between sublexical units and their mental correlates. Word and paradigm models of morphology shifted the fundamental part-whole relation in an inflection system onto the relation between individual inflected word forms and inflectional paradigms. In turn, the use of artificial neural networks of densely interconnected parallel processing nodes for morphology learning marked a radical departure from a morpheme-based view of the mental lexicon. Lately, in computational models of Discriminative Learning, a network architecture has been combined with an uncertainty reducing mechanism that dispenses with the need for a one-to-one association between formal contrasts and meanings, leading to the dissolution of a discrete notion of the morpheme. The paper capitalises on these converging lines of development to offer a unifying information-theoretical, simulation-based analysis of the costs incurred in processing (ir)regularly inflected forms belonging to the verb systems of English, German, French, Spanish and Italian. Using Temporal Self-Organising Maps as a computational model of lexical storage and access, we show that a discriminative, recurrent neural network, based on Rescorla-Wagner’s equations, can replicate speakers’ exquisite sensitivity to widespread effects of word frequency, paradigm entropy and morphological (ir)regularity in lexical processing. The evidence suggests an explanatory hypothesis linking Word and paradigm morphology with principles of information theory and human perception of morphological structure. According to this hypothesis, the ways more or less regularly inflected words are structured in the mental lexicon are more related to a reduction in processing uncertainty and maximisation of predictive efficiency than to economy of storage.
... Whereas these results demonstrate the importance of predictability, they used only two or three words of context and did not demonstrate a simple method for generating predictabilities for any text. More recent models, including 4-grams and word embeddings like FastText, were used by [Bianchi et al., 2020], who showed that the 4-gram model in particular could explain a considerable share of the variance in fixation durations. Going a step further, [Umfurer et al., 2021] used an LSTM-based recurrent neural network model trained on the Spanish Wikipedia and fine-tuned it on a set of stories to explain eye movements. ...
... A more negative AIC indicates a better fit. Following an approach described in [Bianchi et al., 2020], we analyzed residuals of models including computational predictabilities with linear models that included cloze predictability as a single predictor in order to assess the degree to which computational predictabilities are redundant with cloze predictabilities. If the effect of cloze predictabilities on residuals is considerably smaller than its effect on the original data, we conclude that computational and cloze predictabilities are largely redundant. ...
... To assess to what degree computational predictabilities play a role similar to that of cloze predictabilities in influencing single fixation durations, we compared the cloze probability effect before and after taking computational predictabilities into account and examined how the t-values changed. Following [Bianchi et al., 2020], for models with cloze predictability data, we fitted linear models with current word cloze predictability as the single predictor to the residuals of the LMM including computational predictability. Table 4 shows a comparison of model coefficients and their associated t-values. ...
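A hedged sketch of this residual comparison on synthetic data, assuming correlated cloze and computational predictabilities and illustrative column names:

```python
# Sketch of the residual analysis: fit the mixed model with a computational
# predictability, then regress cloze predictability on its residuals; a much
# smaller t-value on the residuals than on the raw data indicates the two
# predictors are largely redundant. Data below are synthetic stand-ins.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
cloze = rng.uniform(0, 1, n)
comp = np.clip(cloze + rng.normal(0, 0.15, n), 0, 1)   # correlated with cloze
data = pd.DataFrame({
    "subject": rng.integers(0, 36, n),
    "cloze_pred": cloze,
    "comp_pred": comp,
})
data["log_gaze_duration"] = 5.5 - 0.4 * cloze + rng.normal(0, 0.2, n)

# LMM with the computational predictability, then cloze on its residuals.
lmm = smf.mixedlm("log_gaze_duration ~ comp_pred", data=data,
                  groups=data["subject"]).fit()
data["resid"] = lmm.resid

raw = smf.ols("log_gaze_duration ~ cloze_pred", data=data).fit()
res = smf.ols("resid ~ cloze_pred", data=data).fit()
print("t(cloze) on raw data:  ", round(raw.tvalues["cloze_pred"], 2))
print("t(cloze) on residuals: ", round(res.tvalues["cloze_pred"], 2))
```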
Conference Paper
Full-text available
A long tradition in eye movement research has focused on three linguistic variables explaining fixation durations during sentence reading: word length, frequency, and predictability. Lengths and frequencies are easily obtainable but predictabilities are tedious to collect, requiring the incremental cloze procedure. Modern large language models are trained using the objective of predicting the next word given previous context, hence they readily provide predictability information. This capability has largely been overlooked in eye movement research. Here we investigate the suitability of a synthetic predictability measure, extracted from pretrained GPT-2 models, as a surrogate for cloze predictability. Using several published eye movement corpora, we find that synthetic and cloze predictabilities are highly correlated, and that their influence on eye movements is qualitatively similar. Similar patterns are obtained when including synthetic predictabilities in data sets lacking cloze predictabilities. In conclusion, synthetic predictabilities can serve as a substitute for empirical cloze predictabilities.
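A minimal sketch of how such a synthetic predictability could be extracted from a pretrained GPT-2 with the Hugging Face transformers library; the checkpoint name and the single-sub-token simplification are assumptions for illustration:

```python
# Sketch of a "synthetic predictability": the probability a pretrained GPT-2
# assigns to a word given its preceding context. Multi-token words would need
# their sub-token probabilities multiplied; here only the first sub-token is used.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def synthetic_predictability(context, word):
    ids = tokenizer(context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]          # distribution over the next token
    probs = torch.softmax(logits, dim=-1)
    word_id = tokenizer(" " + word).input_ids[0]   # first sub-token of the word
    return probs[word_id].item()

print(synthetic_predictability("The cat sat on the", "mat"))
```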
... In general, algorithms need to convert raw text inputs into numerical representations (word embeddings) through a process known as language modeling, which forms the basis for knowledge distillation [3]. Thus, several modelling approaches have been designed, such as Deep Learning-based models. ...
... In a recent study, Bianchi et al. (2020) contrasted the GD predictions of an n-gram model with the predictions of an LSA-based match of the current word with the preceding nine words during discourse reading. They found that LSA did not provide effects over and above the n-gram model. ...
... In sum, the literature on document-level semantics presently provides no consistent picture. Long-range semantic effects might be comparably small (e.g., Hofmann et al., 2017), but they may be more likely to deliver consistent results when the analysis is not constrained to the long-range contextual match of the present word, but also of other words (Bianchi et al., 2020). A more consistent picture might emerge when short-range predictability is also considered, as reflected, e.g., in n-gram models (Wang et al., 2010; Bianchi et al., 2020). ...
... Long-range semantic effects might be comparably small (e.g., Hofmann et al., 2017), but they may be more likely to deliver consistent results when the analysis is not constrained to the long-range contextual match of the present word, but also of other words (Bianchi et al., 2020). A more consistent picture might emerge when short-range predictability is also considered, as reflected, e.g., in n-gram models (Wang et al., 2010; Bianchi et al., 2020). ...
Article
Full-text available
Though there is a strong consensus that word length and frequency are the most important single-word features determining visual-orthographic access to the mental lexicon, there is less agreement as to how best to capture syntactic and semantic factors. The traditional approach in cognitive reading research assumes that word predictability from sentence context is best captured by cloze completion probability (CCP) derived from human performance data. We review recent research suggesting that probabilistic language models provide deeper explanations for syntactic and semantic effects than CCP. Then we compare CCP with three probabilistic language models for predicting word viewing times in an English and a German eye-tracking sample: (1) Symbolic n-gram models consolidate syntactic and semantic short-range relations by computing the probability of a word to occur, given two preceding words. (2) Topic models rely on subsymbolic representations to capture long-range semantic similarity by word co-occurrence counts in documents. (3) In recurrent neural networks (RNNs), the subsymbolic units are trained to predict the next word, given all preceding words in the sentences. To examine lexical retrieval, these models were used to predict single fixation durations and gaze durations to capture rapidly successful and standard lexical access, and total viewing time to capture late semantic integration. The linear item-level analyses showed greater correlations of all language models with all eye-movement measures than CCP. Then we examined non-linear relations between the different types of predictability and the reading times using generalized additive models. N-gram and RNN probabilities of the present word more consistently predicted reading performance compared with topic models or CCP. For the effects of last-word probability on current-word viewing times, we obtained the best results with n-gram models. Such count-based models seem to best capture short-range access that is still underway when the eyes move on to the subsequent word. The prediction-trained RNN models, in contrast, better predicted early preprocessing of the next word. In sum, our results demonstrate that the different language models account for differential cognitive processes during reading. We discuss these algorithmically concrete blueprints of lexical consolidation as theoretically deep explanations for human reading.
... With this information, the time spent by the reader's eyes on each word (i.e., Gaze Duration, GD) is analysed as a reflection of its processing cost. This variable is known to correlate with word properties such as word length, lexical frequency, position in the sentence or text, and Predictability, among others [15,14,3]. Nowadays, these analyses are performed using Linear Mixed Models (LMMs), which allow one to understand how all these word properties relate to GD while accounting for the variance introduced by the subjects or the material selected for the experiment (random effects). Thus, this type of analysis makes it possible to understand which text features our brains use to process information. ...
... Researchers have made several attempts to model it using simple computational models but, until now, have not reached conclusive results [19,9,3,10,1]. In 2008, Ong and Kliegl [19] analysed how the conditional co-occurrence probability (CCP) of a word given its context, measured by frequencies from internet search engines (Google, Yahoo!, MSN), could replace the cloze-Predictability in eye movement models. ...
... In 2020, Bianchi and colleagues [3] showed that N-gram probabilities and semantic similarities from different distributional-semantics algorithms (LSA, word2vec, FastText) can partially replace the cloze-Predictability in Linear Mixed Models (LMMs) with GD as the dependent variable. In that study, they analysed how much variance in GD was left for the cloze-Predictability to explain in the residuals of LMMs fitted with computational Predictabilities. ...
Conference Paper
Full-text available
Modern Natural Language Processing (NLP) models can achieve great results resolving different types of linguistic tasks. This is possible thanks to a high volume of internal parameters that are optimized during the training phase and allow them to model high-level linguistic properties. For example, LSTM-based language models have the ability to find long-term dependencies between words in a text and to use them to make predictions about upcoming words. Nevertheless, their complexity makes it hard to understand which features they use to generate predictions. The neurolinguistic field faces a similar issue when studying how our brain processes language. For example, every adult reader has the ability to understand long texts and to make predictions of upcoming words. Nevertheless, our understanding of how these predictions are driven is limited. During the last decades, the study of eye movements during reading has shed some light on this topic, finding a relation between the time spent on a word (gaze duration) and its processing cost. Here, we aim to understand how LSTM-based models predict future words and how these predictions relate to human predictions, fitting statistical models commonly used in the neurolinguistic field with gaze duration as the dependent variable. We found that an AWD-LSTM Language Model can partially model eye movements, with high overlap with both human-Predictability and lexical frequency. Interestingly, this last overlap is seen to depend on the training corpus, being lower when the model is fine-tuned with a corpus similar to the one used for testing.
... Eye-computer-based interaction provides an effective alternative to joystick-based control of mobility scooters for people who cannot functionally use their upper limbs [8]- [10]. It represents a novel approach for human-machine interaction and assisted living [11]- [15]. In addition, eye tracking provides an additional hands-free level of control for non-disabled people during fast, cognitively demanding tasks such as driving or flying. ...
Article
Full-text available
The tracking of eye gesture movements using wearable technologies can undoubtedly improve quality of life for people with mobility and physical impairments by using spintronic sensors based on the tunnel magnetoresistance (TMR) effect in a human–machine interface. Our design involves integrating three TMR sensors on an eyeglass frame for detecting relative movement between the sensor and tiny magnets embedded in an in-house fabricated contact lens. Using TMR sensors with a sensitivity of 11 mV/V/Oe and ten <1 mm³ magnets embedded within a lens, an eye gesture system was implemented with a sampling frequency of up to 28 Hz. Three discrete eye movements were successfully classified when a participant looked up, right or left using a threshold-based classifier. Moreover, our proof-of-concept real-time interaction system was tested on 13 participants, who played a simplified Tetris game using their eye movements. Our results show that all participants were successful in completing the game with an average accuracy of 90.8%.
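A hedged sketch of a threshold-based gesture classifier in the spirit of the one described; the thresholds and channel-to-gesture mapping are illustrative, not taken from the paper:

```python
# Sketch of a threshold-based eye-gesture classifier: each baseline-corrected TMR
# channel is compared against a calibrated threshold, and the first channel that
# exceeds its threshold decides the gesture. Values below are illustrative only.
THRESHOLDS = {"up": 0.8, "right": 0.6, "left": 0.6}   # assumed normalized signal units

def classify_gesture(sample):
    """sample: dict of baseline-corrected readings for the 'up', 'right', 'left' channels."""
    for gesture, threshold in THRESHOLDS.items():
        if abs(sample.get(gesture, 0.0)) > threshold:
            return gesture
    return "none"

print(classify_gesture({"up": 0.1, "right": 0.9, "left": 0.05}))  # -> "right"
```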
Article
Full-text available
Predictions of incoming words performed during reading have an impact on how the reader moves their eyes and on the electrical brain potentials. Eye tracking (ET) experiments show that less predictable words are fixated for longer periods of time. Electroencephalography (EEG) experiments show that these words elicit a more negative potential around 400 ms (N400) after the word onset when reading one word at a time (foveated reading). Nevertheless, there was no N400 potential during the foveated reading of previously known sentences (memory-encoded), which suggests that the prediction of words from memory-encoded sentences is based on different mechanisms than predictions performed on common sentences. Here, we performed an ET-EEG co-registration experiment where participants read common and memory-encoded sentences. Our results show that the N400 potential disappears when the reader recognises the sentence. Furthermore, time-frequency analyses show a larger alpha lateralisation and a beta power increase for memory-encoded sentences. This suggests a more distributed attention and an active maintenance of the cognitive set, in concordance with the predictive coding framework.
Chapter
Predictability corpora built via the Cloze task generally accompany eye-tracking data for the study of the processing costs of linguistic structures in tasks of reading for comprehension. Two semantic measures are commonly calculated to evaluate expectations about forthcoming words: (i) the semantic fit of the target word with the previous context of a sentence, and (ii) semantic similarity scores that represent the semantic similarity between the target word and Cloze task responses for it. For Brazilian Portuguese (BP), there was no large eye-tracking corpus with predictability norms. The goal of this paper is to present a method to calculate the two semantic measures used in the first BP corpus of eye movements during silent reading of short paragraphs by undergraduate students. The method was informed by a large evaluation of both static and contextualized word embeddings, trained on large corpora of texts. Here, we make publicly available: (i) a BP corpus for a sentence-completion task to evaluate semantic similarity, (ii) a new methodology to build this corpus based on the scores of Cloze data taken from our project, and (iii) a hybrid method to compute the two semantic measures in order to build predictability corpora in BP.
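A minimal sketch of these two semantic measures, assuming a generic embedding lookup (`get_vector` is a placeholder, not the authors' implementation):

```python
# Sketch of the two semantic measures described: (i) the semantic fit of the target
# word with the preceding sentence context, and (ii) the mean similarity between the
# target word and the Cloze-task responses collected for it.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_fit(target, context_words, get_vector):
    context_vector = np.mean([get_vector(w) for w in context_words], axis=0)
    return cosine(get_vector(target), context_vector)

def cloze_similarity(target, responses, get_vector):
    return float(np.mean([cosine(get_vector(target), get_vector(r)) for r in responses]))

# Toy embedding lookup for illustration only.
rng = np.random.default_rng(2)
vectors = {}
get_vector = lambda w: vectors.setdefault(w, rng.normal(size=300))

print(semantic_fit("bola", "o menino chutou a".split(), get_vector))
print(cloze_similarity("bola", ["bola", "pedra", "porta"], get_vector))
```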