Indo-European language tree 16

Source publication
Preprint
This research project aimed to overcome the challenge of analysing relationships between human languages, and to facilitate the grouping of languages and the formation of genealogical relationships between them, by developing automated comparison techniques. The techniques were based on the phonetic representation of certain key words and concepts. Example word sets included...

Contexts in source publication

Context 1
... clustering was performed with the best Silhouette value cut (Figure 10). The Silhouette value suggested making 9 clusters. ...
Context 2
... way of looking at the all-to-all comparison data was by producing 10 clusters. This was done using the "hcutVisual" and "cPurity" functions (see Appendix B.1, cluster Figure 11). Two out of ten clusters were pure, both containing the number "5". ...
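The idea of a "pure" cluster in Context 2 can be illustrated with a minimal sketch. It assumes that a cluster is pure when every word in it expresses the same concept (here, the same number). The function names and the sample data are hypothetical, and this is not the project's "cPurity" function, whose implementation is not shown in the excerpt.

```python
from collections import Counter


def cluster_purity(labels):
    """Fraction of cluster members that share the most common label."""
    if not labels:
        raise ValueError("cluster must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def pure_clusters(clusters):
    """Return the ids of clusters whose members all carry one label."""
    return [cid for cid, labels in clusters.items()
            if cluster_purity(labels) == 1.0]


# Hypothetical data: each cluster lists the number concept of its words.
clusters = {
    1: ["5", "5", "5"],
    2: ["1", "2", "1", "3"],
    3: ["5", "5"],
}
print(pure_clusters(clusters))  # [1, 3]
```

With real data, the labels would come from the word list and the cluster ids from cutting the dendrogram at the chosen number of clusters.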
Context 3
... examining the data calculated for "ColoursAll", none of the colours showed a clear tendency to be more preserved than others (Figure 13). All colours had large distances and a comparatively small standard deviation when compared with other groups. ...
Context 4
... standard deviation was most likely the result of most of the distances being large. Indo-European language group scores were similar to "ColoursAll", exhibiting a slightly larger standard deviation (Figure 14). The conclusion could be drawn that words for the colour "Red" are more similar in this group. ...
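The three per-colour quantities named in the figure captions (mean, SD and mean*SD) can be computed as in the sketch below. The distance values are invented for illustration, the function name is hypothetical, and the use of the population standard deviation is an assumption, since the excerpt does not say which form was used.

```python
from statistics import mean, pstdev


def summarise(distances_by_word):
    """Return mean, SD and mean*SD of the pairwise distances for each word."""
    summary = {}
    for word, distances in distances_by_word.items():
        m = mean(distances)
        sd = pstdev(distances)  # assumption: population standard deviation
        summary[word] = {"mean": m, "sd": sd, "mean_x_sd": m * sd}
    return summary


# Hypothetical pairwise distances in [0, 1] for two colour words.
example = {
    "Red": [0.2, 0.8],
    "Black": [0.5, 0.5],
}
for word, stats in summarise(example).items():
    print(word, stats)
```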
Context 5
... plots of all languages and Indo-European languages were similar: both had multiple peaks, with the highest density around scores of 0.75 (a large linguistic distance). Moreover, the density distribution for Germanic languages had two peaks for the words "White", "Blue" and "Green" (Figure 17). This could be the result of certain weightings in the Phonetic Substitution Table, or could indicate a possible further grouping of languages. ...
Context 6
... colour "Black" had a more normal distribution and a smoother bell shape than the others. Furthermore, Romance languages also produced density plots with two peaks for the words "White", "Yellow" and "Blue" (Figure 18). In contrast, the "Black", "Red" and "Green" distributions were quite smooth. ...
Context 7
... order to examine how the Phonetic Substitution Table affects the linguistic distances, the "densityP" function was applied to the linguistic distances calculated with the "GabyTable" substitution table. The aim was to eliminate the two peaks in the Germanic language group for the word "Green". In Germanic languages, the word for green tended to begin with either "gr" or "khr" (encoded as "Kr"), both of which sound similar phonetically. ...
Figure 14: Mean, SD and mean*SD of every colour of Indo-European languages
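How a substitution table can change a distance for forms such as "gr" and "Kr" can be sketched with a weighted edit distance. This is an illustration only: the excerpt does not describe the project's distance algorithm, so the Levenshtein-style recurrence, the normalisation by the longer word, the cost values and the encoded forms "grun" and "Krun" are all assumptions, and the table here is not "GabyTable".

```python
def weighted_distance(a, b, sub_cost, indel_cost=1.0):
    """Edit distance with table-driven substitution costs, scaled to [0, 1]."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cost = 0.0
            else:
                # Pairs missing from the table cost a full substitution.
                cost = sub_cost.get(frozenset((ca, cb)), 1.0)
            curr.append(min(prev[j] + indel_cost,      # deletion
                            curr[j - 1] + indel_cost,  # insertion
                            prev[j - 1] + cost))       # substitution or match
        prev = curr
    longest = max(len(a), len(b))
    return prev[-1] / longest if longest else 0.0


# Hypothetical table: "g" and "K" are treated as phonetically close.
table = {frozenset(("g", "K")): 0.2}
print(weighted_distance("grun", "Krun", {}))     # uniform substitution cost
print(weighted_distance("grun", "Krun", table))  # reduced cost for g/K
```

Lowering the cost of a pair of similar sounds pulls word pairs that differ only in that sound towards zero distance, which is one way a second peak in a density plot could shrink or merge with the first.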
Context 8
... a result, the preservation of particular words can be analysed across language groups, making it possible to compare and evaluate potential reasons behind it. ...
Figure 15: Mean, SD and mean*SD of every colour of Indo-European languages
Context 9
... begin with, clustering of all languages showed some interesting results that were consistent with the grouping of the languages (see the dendrogram in Figure 19). The cut suggested by the Silhouette value was 23. ...
Context 10
... all are spread across islands in the Indian Ocean. Moreover, clusters of Indo-European languages were quite pure as well (the groups are visible in the dendrogram of all languages; for clarity, however, see Figure 21). There were four larger groups that stood out. ...
Context 11
... clustering results of the Germanic languages file (Figure 22) show a strong relationship with the geographical distribution of the languages and the history of their development. German, Luxembourgish (which has similarities with other varieties of High German) and Yiddish (a High German-based language) were all in the same cluster. Also, Afrikaans and Dutch were placed in the same group, and it is known that Afrikaans derived from the Dutch vernacular of South Holland in the course of the 18th century. ...
Figure 17: Density plots of each colour of Germanic languages