Classification results for each set of language representations.

Source publication
Preprint
To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages. The learned language representation...

Contexts in source publication

Context 1
... from WordLM and the Reinflect models, none of the representations reach a mean F1 of 0.7 for any of the features under investigation. Figure 6 shows how well different language representations can be used to predict whether a language tends to use prefixes or suffixes (affixation type), according to the weighted affixation index of Dryer (2013e). Languages classified as not using affixation, or with equal use of prefixes and suffixes, are excluded from the sample. ...
Context 2
... classified as not using affixation, or with equal use of prefixes and suffixes, are excluded from the sample. The language representations best able to predict this feature are Reinflect-Noun, followed by Reinflect-Verb and (when using gold-standard labels for training, Figure 6a) the WordLM representations. However, with WordLM representations, the object and verb order as well as the adposition and noun order features both explain the classification results about equally well (F1 within 1.5 percentage points). ...
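As a rough sketch of the kind of probing classifier these excerpts describe (not the authors' exact setup), the following Python example trains a logistic regression on fixed per-language vectors to predict a binary affixation type and reports a mean F1 score. The names language_vectors and affixation_labels, and the random stand-in data, are hypothetical placeholders for the learned representations and the Dryer-derived labels discussed above.

# A minimal sketch of a typological probing classifier, assuming the setup
# described in the excerpts: one fixed representation vector per language,
# binary labels (prefixing vs. suffixing), and mean F1 as the metric.
# `language_vectors` and `affixation_labels` are placeholders, not names
# taken from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Stand-in data: in the paper's setting these would be the learned
# language representations and affixation labels; languages with no
# or equal affixation are excluded upstream, as noted above.
n_languages, dim = 300, 64
language_vectors = rng.normal(size=(n_languages, dim))
affixation_labels = rng.integers(0, 2, size=n_languages)  # 0=prefix, 1=suffix

# Cross-validated predictions so every language is scored while held out.
clf = LogisticRegression(max_iter=1000)
predictions = cross_val_predict(clf, language_vectors, affixation_labels, cv=5)

# Macro-averaged F1 over the two classes, comparable to a "mean F1".
print("mean F1:", f1_score(affixation_labels, predictions, average="macro"))

The same loop could be repeated with labels for other features (e.g. object and verb order, adposition and noun order) to compare per-feature F1 scores in the manner of Context 2.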