Classification results for each set of language representations, using naive cross-validation where languages related to the evaluated language are not excluded from the training fold. The point of this figure is to demonstrate how unsound evaluation methods give misleading results; see the main text for details.


Source publication
Preprint
To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1295 languages. The learned language representation...

Context in source publication

Context 1
... To illustrate the effect of not following our cross-validation setup (Section 7.2), we now compare Figure 8a ... The NMT-based representations (NMTeng2x and NMTx2eng) perform equally poorly in both cases, suggesting that they do not correlate well with any type of language similarity. For representations such as Lexical and ASJP, the naive cross-validation setup results in a much higher classification F1 than the linguistically sound cross-validation. ...
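The evaluation pitfall described above can be reproduced on synthetic data: when languages from the same family appear in both the training and test folds, a classifier can score well simply by memorizing near-duplicate representations of related languages. A minimal sketch, assuming hypothetical family labels and synthetic "representations" (none of this is the paper's actual data or code):

```python
# Sketch: naive cross-validation vs. family-grouped cross-validation.
# All data is synthetic; family labels stand in for genealogical groups.
import random

random.seed(0)

N_FAMILIES, LANGS_PER_FAMILY, DIM = 5, 20, 8

# Each family gets a latent center; its languages are small perturbations,
# so related languages have near-duplicate representations.
centers = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_FAMILIES)]
X, y = [], []
for fam, center in enumerate(centers):
    for _ in range(LANGS_PER_FAMILY):
        X.append([v + random.gauss(0, 0.1) for v in center])
        y.append(fam)  # task: predict the language family

def nn_accuracy(train_idx, test_idx):
    """1-nearest-neighbour classification accuracy on the test fold."""
    correct = 0
    for i in test_idx:
        nearest = min(train_idx,
                      key=lambda j: sum((a - b) ** 2
                                        for a, b in zip(X[i], X[j])))
        correct += (y[nearest] == y[i])
    return correct / len(test_idx)

idx = list(range(len(X)))

# Naive CV: random folds, so each test language has same-family
# neighbours in the training fold -> near-perfect accuracy.
random.shuffle(idx)
folds = [idx[k::5] for k in range(5)]
naive = sum(nn_accuracy([i for f in folds if f is not fold for i in f], fold)
            for fold in folds) / 5

# Sound CV: hold out an entire family, so its label never occurs in
# training and accuracy collapses to zero.
fam_folds = [[i for i in range(len(X)) if y[i] == fam]
             for fam in range(N_FAMILIES)]
sound = sum(nn_accuracy([i for f in fam_folds if f is not fold for i in f],
                        fold)
            for fold in fam_folds) / 5

print(f"naive CV accuracy: {naive:.2f}")   # near 1.0: leakage inflates it
print(f"sound CV accuracy: {sound:.2f}")   # 0.00: family unseen in training
```

Because the held-out family's label never appears in the training fold, the grouped setup cannot be gamed by similarity to relatives, which is exactly why the naive scores for Lexical and ASJP are misleadingly high.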