The spread of Munda in prehistoric South Asia – the view from areal typology

To appear in: Volume in Celebration of the Bicentenary of Deccan College Post-Graduate and Research Institute (Deemed University).

October 2021

License
CC BY-NC-ND 4.0

Authors:

John Peterson

Christian-Albrechts-Universität zu Kiel

The spread of Munda in prehistoric South Asia – the view from areal typology

John Peterson, Kiel University, Germany

Abstract

The history of the Munda branch of the Austro-Asiatic

family in South Asia and its relation to the

other branches of this family have long been shrouded in mystery. While some studies place the

origin of this family in South Asia, from where it spread to Southeast Asia, others see its origin in

Southeast Asia, with a subsequent spread to South Asia in the west. As the original spread of the

Munda languages in South Asia plays a key role in many of these hypotheses, I examine a claim on

the earlier maximal spread of Munda languages made in a recent study (Rau & Sidwell, 2019) and

suggest a revision of this hypothesized spread, based primarily on areal-typological grounds, which

I believe more accurately reflects the true extent of the earlier spread of these ethnic groups in the

subcontinent.

1 Introduction

South Asia is home to some 600 languages belonging to at least seven stocks: Indo-European

(including Indo-Aryan, Iranian and Nuristani), Dravidian, Jarawa-Onge, Andamanese,

Tibeto-Burman/Trans-Himalayan, Tai-Kadai, and Austro-Asiatic, as well as various isolates such as

Burushaski, Kusunda and Nihali. Figure 1 provides an overview of most of these.

The linguistic history of South Asia before the advent of Indo-Aryan speakers is still largely

unknown. While we know that speakers of Indo-Aryan first appeared in the northwest of the

subcontinent some time before 1,000 BCE and subsequently spread southwards and eastwards, we

cannot be as sure of the prehistories of Dravidian, Munda and other languages/language families.

For example, the original speakers of the languages of the Munda branch of the Austro-Asiatic

family are considered by some (e.g., Kumar et al. 2007) to have originated in South Asia, from

where they spread to Southeast Asia. Most researchers, however, now assume the opposite direction,

viewing the Austro-Asiatic speakers of South Asia as descendants of migrants from the east,

perhaps from the Irrawaddy flood plains of Myanmar or the lower Brahmaputra in Assam and

Bangladesh (e.g. Diffloth, 2005).

In a very interesting recent study, Rau & Sidwell (2019) suggest that speakers of the form of

speech which was later to become the Munda languages arrived in South Asia ca. 3,500 – 4,000

years ago in the Mahanadi Delta and adjacent coastal plains, from where they later spread to other

regions. Rau & Sidwell assume a maritime migration consisting primarily of males. These

immigrants cultivated rice and millet and eventually established themselves as dominant in much

of eastern India. More important to our discussion here, these authors provide a very exact

description of what they consider to have been the maximal prehistorical spread of Munda

languages in South Asia, one which explicitly does not include the Gangetic Plains.

Footnote 1: Many thanks to Paul Sidwell and Felix Rau for their insightful and critical comments on an earlier version of this study. Although I am sure that they do not accept all of my conclusions here, their comments forced me to reconsider and reformulate my arguments somewhat on a number of different points, for which I am grateful. Needless to say, all remaining errors and misconceptions are my own.

Footnote 2: Although the spelling “Austroasiatic” without a hyphen is more common, in this and other works I consistently refer to this family as “Austro-Asiatic” as it consists of two components, “Austro” and “Asiatic”. This spelling brings it into line with other language families, such as “Indo-European”, “Tai-Kadai”, “Afro-Asiatic”, etc.

Footnote 3: Abbi (2009) argues that the languages of the Andaman Islands belong to two genealogically unrelated groups, whose protolanguages she refers to as “Proto Ang” and “Proto Great Andamanese”.

Figure 1: South Asian Language Families

In the present study I will not attempt to evaluate Rau & Sidwell’s arguments with respect to

whether Proto-Munda speakers arrived in India via land or sea, or where exactly they first settled.

Rather, I will propose a maximal prehistoric spread of Munda languages that I consider more

likely than the one these authors suggest. More specifically, I will argue that

the maximal prehistoric spread of Munda included the eastern half of the Gangetic Plains, although

Proto-Munda speakers were very likely not the only ethnic groups who inhabited these plains. My

arguments primarily come from linguistic typology in a very broad sense, including areal typology,

language contact, and spread zones vs. residual/accretion zones, but I will also make reference to

the presumed sedentary agricultural lifestyle of these groups as well as a few recent genetic studies.

Footnote 4 (source of Figure 1): Modified version of the map found at https://en.wikipedia.org/wiki/Languages_of_India. “South Asian Language Families, translated from Image: Südasien Sprachfamilien.png, from Language families and branches, languages and dialects in A Historical Atlas of South Asia, Oxford University Press. New York 1992. Nihali and Kusunda are not shown. Author – Wikipedia User Bishkek Rocks. Translated by Wikipedia User Kitkatcrazy.” Modified by Anvita Abbi and her research team to correctly portray the genealogical relations of the languages of the Andaman Islands and to include the Tai-Kadai languages. Reprinted here with the kind permission of Anvita Abbi.

The article is structured as follows. In Section 2, I provide a very general overview of Munda

and its place in the Austro-Asiatic family before I discuss the present spread of Munda languages

in Section 3. In Section 4 I present my arguments for assuming a larger prehistoric spread of Munda

to include the eastern half of the Gangetic Plains, divided into two main sections: linguistic

evidence (4.1) and recent genetic studies (4.2). Section 5, the conclusion, summarizes once again

my main arguments.

2 The Munda languages and Austro-Asiatic

The Munda languages form the western-most branch of Austro-Asiatic, which stretches from

Central India to Vietnam. Figure 2, from Sidwell (2015: 144), provides an overview of the extent

of this spread as well as of the major branches of this family.

Figure 2: The branches of Austro-Asiatic (Sidwell, 2015: 144)

Figure 3 presents the “traditional” internal classification of Munda, from Zide (1969: 412). In

recent years, this classification has undergone considerable revision. One recent revised

classification is given in Figure 4 from Sidwell (2015: 197), with a considerably flatter internal

classification. Despite all differences, however, all classifications from the last ca. 50 years agree

on the status of the North Munda branch and its subsequent bifurcation into Korku on the one hand

and the Kherwarian languages on the other, while there is much less agreement on the internal

classification of the rest of the family, i.e., Zide’s (1969) “South Munda”.

Footnote 5: For a detailed discussion of the internal classification of Austro-Asiatic, see Sidwell (2015, especially pp. 206-211).

Footnote 6: An overview of many of these classifications is given in Anderson (2015).

Figure 3: The Munda languages according to Zide (1969: 412)

Figure 4: The revised Munda classification of Sidwell (2015: 197)

Munda
    North Munda
        Korku
        Santali, Mundari
    Sora-Gorum
    Juang
    Kharia
    Gutob-Remo
    GtaɁ

3 The present spread of Munda

It has long been a matter of debate whether the Austro-Asiatic-speaking populations of South Asia

immigrated from Southeast Asia, or whether the Austro-Asiatic-speaking populations of Southeast

Asia represent the descendants of an eastward movement into Southeast Asia from India. At

present, however, there appears to be more general support for a migration into South Asia from

the east, for many reasons.

Perhaps the most obvious reason to assume a Southeast Asian homeland is the geographical

spread of Munda languages in South Asia today. With the exception of Korku, spoken in western

central India in Maharashtra and Madhya Pradesh, these languages are primarily concentrated in

the eastern half of the subcontinent, stretching from Jharkhand in the north to Odisha and

northeastern Andhra Pradesh in the south, and from Chhattisgarh/Madhya Pradesh in the west to western

West Bengal in the east. No Munda languages are found further west than Korku. Figure 5, from

Rau & Sidwell (2019: 37), provides an overview of districts in India with significant Munda

populations.

Figure 5: Districts with Munda populations as listed in Ethnologue (Rau & Sidwell, 2019: 37)

While Munda-speaking groups are now found as far north as Nepal, Assam and Bhutan, as well

as in Bangladesh and several parts of West Bengal, many of these groups represent a later migration

of (often forced) laborers primarily from Jharkhand, Odisha and Chhattisgarh to work on the tea

plantations of these regions in the 19th century. In contrast, much of the movement into more

southern areas of West Bengal appears to have begun considerably earlier and was unrelated to the

tea plantations (cf. Section 4.1.1, especially note 9). In this study I focus on the traditional

homelands of these peoples and will only refer in passing to these later migrations.

Figure 6, from Rau & Sidwell (2019: 40), provides a rough overview of where the modern

Munda languages are spoken together with their proposed original Munda homeland in India.

Figure 6: Map of the different Munda branches without modern settlements in other areas

(Rau & Sidwell, 2019: 40) – Abbreviations refer to the classification of Munda in Sidwell (2015), given in Figure 4

above: NM – North Munda, K – Kharia, J – Juang, SG – Sora-Gorum, GR – Gutob-Remo, G – GtaɁ

Due to the position of Korku in western central India at such a large distance from the other

North Munda languages, Rau & Sidwell (2019: 37; 39) consider this language to be an outlier of

North Munda, reflecting an expansion of North Munda from the Chotanagpur Plateau in the east.

The authors attribute the present geographic separation of these two branches of North Munda to a

later expansion of Dravidian-speaking groups such as the Kui, Gond and Kurukh, which drove a

wedge between the previously contiguous North Munda-speaking groups. However, Rau &

Sidwell (2019: 38) see “no evidence for Kherwarian speakers in the Gangetic Plains prior to

colonial and post-colonial migrations.”

4 Some thoughts on the prehistoric spread of Munda in India

From an areal-typological/contact-linguistic perspective, one would expect that Munda languages

were once spoken over a considerably larger area than the present distribution of this group

suggests. In the following sections I therefore present my arguments for assuming that the

prehistoric spread of Munda also included the eastern half of the Gangetic Plains, comprising parts

of West Bengal (and perhaps neighboring regions of Bangladesh), Bihar and the eastern half of

Uttar Pradesh. These include primarily linguistic arguments (4.1) but also data from a number of

studies from the field of genetics (4.2) which point in the same direction.

4.1 The linguistic evidence

The linguistic evidence I cite in the following comes from three different areas, which I will deal

with separately in the following sub-sections. These are spread zones vs. residual/accretion zones

(4.1.1), arguments based on linguistic terms for agriculture and domesticated animals (4.1.2) and

areal-typological considerations (4.1.3).

4.1.1 Spread zones vs. residual/accretion zones

Johanna Nichols (1992) introduces the terms “spread zones” and “residual zones”, the latter of

which she later refers to as “accretion zones” (e.g., Nichols, 1997), to refer to two different types

of areas with respect to language density.

Spread zones are areas of rapid language spread, characterized among other things by little genealogical

diversity, shallow language families and the use of a limited number of lingue franche or languages

of general communication between the different ethnic groups (e.g. Nichols, 1992: 16-17). Such

areas are typically easily accessible to outsiders, e.g. to invaders and

large-scale immigration.

In South Asia, the Gangetic Plains as we presently see them are a textbook example of a spread

zone. As is known from the Vedas, northwestern South Asia was inhabited by various ethnic groups

when Indo-Aryan speakers first arrived, most likely some time before 1,000 BCE, and a number

of words from Dravidian and (Para-) Munda languages are claimed by some to have found their

way into these texts (e.g. Kuiper 1948; Witzel 1999), although this is disputed by others.

From there some of these speakers began migrating eastwards at an early date, and by 600 BCE large

numbers of Indo-Aryan speakers had already settled throughout much of the Gangetic Plains,

where Indo-Aryan quickly established itself as the lingua franca and later became the first language

of most of the inhabitants there.

Footnote 7: Cf. Wikipedia, Substratum in the Vedic language for further discussion.

Towards the southern and southeastern peripheries of the Gangetic Plains we find several hill

tracts such as the Chotanagpur Plateau in the southeast, and the Vindhya and Satpura ranges and

the Vindhyan Scarplands in central India to the south of the Gangetic Plains. Further to the

southeast we also find the northern Eastern Ghats running parallel to the east coast. These are

typical residual or accretion zones in Nichols’ terminology as they possess a relatively high

genealogical density compared to the rest of the sub-Himalayan subcontinent and presumably only

relatively recent lingue franche, with local bilingualism and/or multilingualism apparently having

long been the norm (cf. e.g. Nichols 1992: 21).

Such residual or accretion zones are typically unattractive to newcomers

from an agricultural perspective, as the soil tends to be difficult to cultivate, and with respect to trade,

as these regions are comparatively difficult to access. Neither of these two features is true of

the Gangetic Plains, which are highly accessible and easy to cultivate. For reasons such as these,

languages in these residual/accretion zones tend to survive the onslaught of invaders/settlers better,

at least initially, as they offer refuge to ethnic groups who may be fleeing the newcomers and to

those who already live there.

It is precisely in regions such as these that we find the languages of the Munda family: on and

around the Chotanagpur Plateau, in the Garhjat Hills, on the Baghelkhand Plateau and in parts of the

Mahanadi Basin and Dandakaranya, which together comprise the Eastern Plateau, as well as in the Satpura

Range (Korku). There are also several Munda groups in the northern Eastern Ghats, the isolate

Nihali in the Satpura Range, Dravidian languages such as Kurukh and Malto on the Chotanagpur

Plateau, and Dravidian Gondi in the Satpura Range. Traces of other languages which were once

spoken in these regions are also occasionally found. For example, in the Indo-Aryan language

Kurmali, spoken in Jharkhand on the Chotanagpur Plateau, we find traces in the core vocabulary

of a language which is no longer spoken and which at least at present cannot be traced to Indo-

Aryan, Munda or Dravidian.

These regions form two of the major physiographic divisions of India and include part of a third

division, all of which fit the description of Nichols’ residual/accretion zones quite well:

• South Central Highlands – consisting of the Satpura and Vindhya ranges and the Vindhyan

Scarplands (e.g. Nihali, Korku, Gondi);

• Eastern Plateau – consisting of the Chotanagpur Plateau, the Baghelkhand Plateau, the

Garhjat Hills, the Mahanadi Basin and Dandakaranya (e.g. North Munda other than Korku,

Kharia, Juang);

• Eastern Hills – of which the northern Eastern Ghats comprise the northernmost one-third

(e.g. Gutob-Remo, Sora-Gorum, GtaɁ).

Note also that where we find Munda districts in northern coastal Andhra Pradesh and Odisha

in Figure 5, these are located where the Eastern Hills reach almost to the sea, i.e., these locations

are not wide coastal areas. These regions are shown in Figure 7.

In other words, from an areal-typological perspective the Munda districts from Figure 5 are

highly unlikely to be the entire original spread of Munda in South Asia, regardless of whether

Proto-Munda speakers entered South Asia by land or sea. Rather, these more likely represent

residual/accretion zones in which the above-mentioned languages have managed to survive.

Concentrating here on Munda, this strongly suggests that these languages were once also spoken

in surrounding areas, in my opinion above all in the Gangetic Plains, where they disappeared when

the population there later switched to Indo-Aryan. Thus, barring recent migrations, we can say that

where Munda languages are now found, they are spoken in regions which have traditionally been

of little interest to newcomers, from both an agricultural and an economic perspective, so

that these languages have managed to survive there.

Footnote 8 (on Kurmali): Cf. Paudyal & Peterson (2021: 22). This list includes such common words as very, last, open, eat, sleep and see.

Figure 7: Physiographic divisions of India

Chaubey et al. (2017: 493) note in this respect that the Vindhya and Satpura ranges are a “fringe

area” where a “combination of the more rudimentary technological level of development of the

resident populations and geographical remoteness may have facilitated the gradual admixture and

assimilation of incursive populations willing to adapt to the subsistence strategies practiced locally,

while impeding the bearers of technologically more advanced cultural assemblages.”

Taken together, these observations suggest that the present-day spread of Munda, even including the areas

between Korku and the other North Munda languages where Gondi and Indo-Aryan languages are

now spoken (cf. Section 4.2), most likely does not provide a realistic indication of the full extent

of the spread of people who are assumed to have once belonged to the dominant agricultural

societies of eastern India (cf. Section 4.1.2).

Footnote 9: With respect to the eastern-most Santali groups of West Bengal, outside of the so-called “tea districts”, it is generally assumed that the Santals migrated to their present homeland in eastern Jharkhand from western and central Jharkhand, hence the eastern-most Santali-speaking regions in Figure 5 represent a later development, perhaps as early as the mid-14th century, although this eastward movement continued at least up to the 19th century (cf. Das, 2020: 1224-1225). I also assume a similarly late (i.e., 19th century) migration into this region by other groups who are now also found there, such as the Turi (North Munda), but who are otherwise found in western Jharkhand, eastern Chhattisgarh and also in Bihar.

Footnote 10 (source of Figure 7): This map is from the “Physical map of India with various physiographic divisions” in Wikipedia, Geography of India: https://commons.wikimedia.org/wiki/File:Physical_Map_of_India.jpg, Creative Commons Attribution-Share Alike 4.0 International license. Only the inset map “Physiographic divisions of India” is shown here.

Footnote 11: Cf. e.g. also Heggarty (2014: 620) “[…] the hunter-gatherers’ languages, if they survive at all, invariably end up cantoned into inhospitable areas of little value to agriculturalists.”

One final note on spread zones is important here. As Epps (2020) shows in her discussion of

lowland regions in South America, a flat, easily accessible area with a river flowing through it does

not guarantee that the respective region will be a spread zone. As important as the terrain is for the

spread of languages, the cultures of the peoples who live there and their degree of social (in)equality

are equally important. For example, the lowlands of the Amazon which Epps discusses are

characterized by a high degree of social equality, where the different ethnic groups “relate to each

other as distinct parts of the same machine” (Epps, 2020: 285) and where language, among other

cultural traits, is seen as an essential part of the identity of the individual groups. Consequently, the

borrowing of words from one language to another is quite rare in these areas, although considerable

morphosyntactic convergence is found, as is expected in areas of such intense language contact.

As these societies are not characterized by top-down political or social structures, there appears to

have been no general lingua franca in at least some of these regions until the arrival of Portuguese

(Epps, 2020: 282). Thus, with respect to linguistic density, these zones more closely resemble the

above-mentioned residual/accretion zones than spread zones, despite their terrain.

With respect to the Gangetic Plains, the spread of Indo-Aryan languages throughout this region

is to be expected from a hierarchical social order in this kind of terrain. This is in line with the social,

political, military and economic predominance of the Indo-Aryan newcomers to this region. We

can only speculate with respect to the social structures which were predominant in the region before

the arrival of these speakers. However, we must be careful not to simply assume that the ethnic

groups which inhabited this region prior to the arrival of Indo-Aryan speakers also had similarly

hierarchical social structures. In fact, I would argue that these groups were in general more

egalitarian than the new arrivals, practicing small-scale agriculture, living in small villages and

perhaps also practicing a certain degree of hunting.

This is significant since Rau & Sidwell (2019: 45) note that convincing linguistic evidence for

a Munda substrate in the languages of the Gangetic Plains is lacking, as “we find no pattern of

linguistic remnants in residual zones around the Gangetic Plain.” However, I do not consider this

to be convincing evidence against an earlier Munda presence there. To begin with, formerly distinct

ethnic groups are generally believed to have been incorporated into mainstream Indian society as

“low-caste” groups, generally performing menial, often “unclean” tasks. Thus, in order to verify a Munda

substrate in these languages (if indeed there is one) we would need detailed data on the lexicons of

the Indo-Aryan speech of these communities, which is not available at present.

However, I would argue that such data will probably never be found, at least not on a larger

scale. Assuming that these were indeed small-scale agricultural societies and relatively egalitarian,

it seems unreasonable to expect to find traces of one clear substrate in the entire region, as there

was no single privileged language. Rather, we would expect to find scattered traces from several

languages, many of them belonging to families about which we know nothing at present. A likely

candidate here is Kurmali, mentioned above, in which we find traces of an unknown language

(family) in the core lexicon (cf. footnote 8). But given what we now know about language spread,

we are unlikely to find a single substrate language throughout the entire region.

In other words, a lack of clear evidence for a Munda substrate in these languages is not

necessarily an argument against the presence of Munda speakers in the eastern Gangetic Plains in

earlier times – it is simply a lack of positive evidence that they were there.

Footnote 12: There appear to have been no notable larger settlements in the lower plains, with urban centers only appearing after 600 BCE, with the arrival of the Indo-Aryan-speaking settlers from the west (Kulke & Rothermund, 1991: 52-53).

4.1.2 The importance of agricultural terminology for the prehistoric spread of Munda

Zide & Zide (1976) argue that Proto-Munda speakers were agriculturalists who most likely grew

rice and different types of millet and who kept domesticated animals, and that modern Munda

groups such as the Juang and the Birhor, who until recently were predominantly hunters and

gatherers, are “examples of reversion from a more complex culture to a simpler one” (Zide & Zide,

1976: 1296). In light of my comments in the preceding sections, I argue that groups such as these

“reverted” to a hunter-and-gatherer lifestyle only after moving into the Eastern Plateau, where

agriculture likely proved difficult, at least initially. For the sake of brevity, I will only cite here the

respective agricultural terms in English, without their suggested Proto-Munda forms.

Zide & Zide (1976) find evidence for Proto-Munda names for various types of fruits, such as

‘wild fig’, ‘mango’, ‘green or unripe mango’, ‘jamun or Indian blackberry’, ‘turmeric’, ‘tamarind’

and ‘(wild) date’ but more importantly also various words for ‘rice’ such as ‘uncooked, husked

rice’, ‘paddy’ and ‘cooked rice’, different types of millet and gourds, as well as words for ‘pestle’,

‘mortar’ and ‘husking hole’ (Zide & Zide, 1976: 1297-1315). While not all of these, especially

terms for ‘millet’, have known cognates in non-Munda Austro-Asiatic languages, these terms

nevertheless strongly suggest a familiarity of Proto-Munda speakers with these crops, tree fruits,

legumes and gourds and how to prepare them for cooking. Similarly, terms for domesticable

animals such as ‘dog’, ‘chicken’, ‘goat’, ‘pig’, ‘buffalo’, ‘cat’ and ‘cattle’ can be identified (Zide

& Zide, 1976: 1315-1324). The authors admit that this does not prove that these early Proto-Munda

speakers were agriculturalists, although the evidence is nevertheless quite suggestive:

“The data presented in this paper provides good evidence that the Proto-Mundas, presumably at least 3500

years [before the present, JP] (or earlier) at a conservative estimate, had a subsistence agriculture which

produced or at least knew grain – in particular rice, two or three millets, and at least three legumes. Further,

the agricultural technology included implements which presuppose the knowledge and use of such grains and

legumes as food, since the specific and consistent meanings for ‘husking pestle’ and ‘mortar’ go back, at

least in one item, to Proto-Austroasiatic.” (Zide & Zide, 1976: 1324)

“Further, the existence of certain terms for agricultural operations (e.g. ‘winnowing’, ‘transplanting’)

strongly suggests that some degree of domestication of these plants was likely, and this in turn presupposes

some degree of sedentary agriculture.” (Zide & Zide, 1976: 1327)

Assuming that the Proto-Munda speakers were cultivators, something of a dilemma arises, one

which Rau & Sidwell (2019: 43-44) also recognize: Why would agriculturalists choose to settle in

the less hospitable hills of eastern and central India? If they did first inhabit the wetlands of Odisha

in this area, in line with Rau & Sidwell’s Maritime Hypothesis, why would they not have followed

the coast to the north, eventually reaching the Ganges Delta, and then have followed the Ganges

with its expansive plains upstream to the west? Surely the Gangetic Plains provide better land for

the cultivation of rice than the rough terrain which is characteristic of so much of the eastern and

central highlands. In contrast, assuming the prehistoric spread of Munda which I have suggested

above on different grounds and which includes the eastern half of the Gangetic Plains would

provide a simple solution to this dilemma.

In fact, based on archaeological evidence, Kingwell-Banham et al. (2018: 11) suggest a very

different scenario from that in Rau & Sidwell (2019). They propose that the agriculturalists of the

Odisha wetlands may in fact have come from the Gangetic Plains, bringing the cultivation of rice

with them. This analysis, although it speaks against Rau & Sidwell’s Maritime Hypothesis, is

compatible with my assumption that the Proto-Munda speakers once inhabited not only the hill and

wetland regions of Odisha but also the eastern Gangetic Plains. Thus, whether Proto-Munda

speakers first settled in the Mahanadi Delta or in the Ganges Delta, we should expect them as

cultivators of rice to have inhabited at least some sections of the eastern Gangetic Plains.

Footnote 13: Or from the Vindhyan Region of central India, although this suggestion appears to be based entirely on similarities in pottery.

There are historical cases of settlement along the coast in this region. We know that these

coastal regions were settled by Indo-Aryan-speaking groups rather early. For example, the kingdom

of Kalinga once stretched along the coast in precisely this region, while the hill regions to the west

for some time remained largely the refuge of “unconquered tribes” (cf. e.g. Map 4 in Kulke &

Rothermund, 1991: 378). This makes the status of these hill regions as areas of refuge all the more

apparent, perhaps as a consequence of the war which led to the incorporation of Kalinga into the

Mauryan empire by Emperor Ashoka in 261 BCE, whose brutality he himself reports on (cf. among

others Kulke & Rothermund, 1991: 65).

4.1.3 The “Indo-Aryan East-West Divide”

The suggested spread of Munda in prehistoric times is also compatible with linguistic evidence

suggesting that the eastern half of the Gangetic Plains was once inhabited by large numbers of non-

Indo-Aryan-speaking ethnic groups, whereas the western half was Indo-Aryan-speaking, or at least

dominated by Indo-Aryan, by the 6th century BCE.

In a number of recent works by myself and my research team, we have examined the

morphological and syntactic structures of the modern languages of South Asia in an attempt to

learn more about prehistoric migration and settlement patterns. This is possible because when

different ethnic groups speaking different languages live side-by-side and trade with one another

and perhaps also have social contacts above and beyond this, many members of one or both of the

respective communities become bilingual over time. This eventually has an impact on the structures

of these languages and this type of information can reveal much about the history of these

languages and their speakers. For example, when a large number of adult speakers learn a new

language at the same time, this often results in a simplification e.g. of the case system of the new

language, while long-term community bilingualism from childhood onward can lead to

morphological complexification.

In Peterson (2018) I present the results of a revised and somewhat expanded data set from an

earlier study (Peterson, 2017); these are illustrated in Figure 8 in a NeighborNet visualization of

the data.

This visualization illustrates that the western Indo-Aryan languages such as Hindi, Braj

Bhasha, Marathi and Konkani, as well as Nepali (a relatively recent arrival in eastern South Asia

from the west) cluster together structurally, and together with Dravidian languages such as Telugu

and Kannada (far left of diagram). In contrast, eastern Indo-Aryan languages such as Maithili,

Bengali, Odiya and others (center of Figure 8, dotted lines) cluster with North and South Munda

languages, as well as with eastern Dravidian languages such as Kurukh and Malto (right half of

Figure 8). Bhojpuri clusters with the languages of the east in this visualization, but as we will see

below, it is a borderline case and often clusters with the languages of the west when different

criteria are used.

For a detailed introduction to this area of linguistics, referred to as “sociolinguistic typology”, see Trudgill (2011).

NeighborNet (Bryant & Moulton, 2004) is often used in contact linguistics to portray the effects of language contact.

In these networks, the length of branches corresponds directly to the degree of divergence or “distance” between

individual languages. Instead of trying to find an optimal tree-like format to portray similarities and differences

between languages, NeighborNet suggests alternative trees, resembling networks, to portray the possible paths which

may be taken between two points when there are conflicting signals in the data, as is often the case in language contact.

Figure 8: NeighborNet visualization of the structural similarities of selected languages of South Asia (29 languages,

46 morphosyntactic features) (from Peterson, 2018)
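
A NeighborNet analysis takes a matrix of pairwise distances between languages as its input. The following is a minimal sketch of how such a matrix could be derived from a feature table, assuming binary feature coding and a normalized Hamming distance. The variety names, the feature values and the choice of distance measure are all hypothetical and are not taken from Peterson (2018).

```python
from itertools import combinations

# Toy feature table: 1 = feature present, 0 = feature absent.
# Names and values are invented for illustration only.
features = {
    "variety_A": [1, 0, 1, 1, 0, 0],
    "variety_B": [1, 0, 1, 0, 0, 0],
    "variety_C": [0, 1, 0, 0, 1, 1],
}


def hamming_distance(a, b):
    """Share of features on which two varieties differ (0.0 to 1.0)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(table):
    """Return {(name1, name2): distance} for every unordered pair of varieties."""
    return {
        (m, n): hamming_distance(table[m], table[n])
        for m, n in combinations(sorted(table), 2)
    }


distances = distance_matrix(features)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

In this toy table, variety_A and variety_B differ in only one of six features, while variety_C differs from both in most features. A network built from such distances would therefore place A and B close together and C further away.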

A later study by myself and my research team came to similar results. Figure 9 shows the

results of a statistical analysis of 16 Indo-Aryan languages with respect to 217 morphosyntactic

features and their respective structural distance from the Munda languages. The names of the

languages in green are those Indo-Aryan languages which are structurally closest to Munda, while

the languages given in red are those which are maximally different from Munda. I refer to this

structural schism within Indo-Aryan, which runs through central Uttar Pradesh from north to south

(see following text), as the “Indo-Aryan East-West Divide”.

Figure 9: The Indo-Aryan East-West Divide (Ivani et al., 2021: 19)
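
The idea of ranking languages by their structural distance from a reference group can be illustrated with a small sketch. It assumes binary feature vectors and a simple mean of pairwise distances; this is a deliberately simplified stand-in and not the statistical procedure used in Ivani et al. (2021). All names and values below are hypothetical.

```python
# Toy data: invented values, not taken from any published data set.
reference_group = {
    "ref_1": [1, 1, 0, 0],
    "ref_2": [1, 1, 0, 1],
}
candidates = {
    "cand_east": [1, 1, 0, 0],
    "cand_mid": [1, 0, 0, 1],
    "cand_west": [0, 0, 1, 1],
}


def share_differing(a, b):
    """Share of features on which two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_distance_to_group(vector, group):
    """Average distance between one variety and every member of a reference group."""
    return sum(share_differing(vector, ref) for ref in group.values()) / len(group)


# Sort candidates from structurally closest to structurally most distant.
ranking = sorted(
    candidates,
    key=lambda name: mean_distance_to_group(candidates[name], reference_group),
)
print(ranking)
```

With these toy values the candidate that shares most features with the reference group comes first in the ranking and the one that shares fewest comes last, which is the kind of ordering that the green and red labels in a figure of this type summarize.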

As there is no natural barrier separating eastern and western Indo-Aryan from one another in

the Gangetic Plains, I attribute this schism to the fact that eastern India at the time of the eastward

expansion of Indo-Aryan was populated by a large number of ethnic groups, many of them Munda-

speaking but certainly also other groups, so that many of the first speakers of Indo-Aryan in eastern

India will have been adult second-language learners, which led to numerous morphosyntactic

simplifications in these languages (cf. Peterson, in press). To the west of this border, the number

of Munda and other non-Indo-Aryan-speaking groups was considerably lower, so that these

languages do not show such strong simplifications.

Note that Bhojpuri in Figure 9 clusters with the western languages, whereas it clusters with

the eastern languages in Figure 8.

This suggests that the border between eastern and western Indo-

Aryan is diffuse or “fuzzy” and lies somewhere in central Uttar Pradesh. As some dialects of

Awadhi show various features which are typical of eastern Indo-Aryan languages while other

dialects of the same language show more western features (Peterson, 2017: 244; Stroński &

Verbeke, 2020), I tentatively assume that Munda speakers were present in the eastern half of Uttar

Pradesh and that the diffuse East-West border was in central-to-eastern Uttar Pradesh, probably

between what are now the Awadhi and Bhojpuri-speaking areas.

This would mean that Munda languages were spoken in the eastern half of the Gangetic Plains,

but probably not much further west than central Uttar Pradesh: As enticing as it may be, I see no

evidence that Munda languages were spoken to the west of this region, and certainly not in the

Indus Valley Civilization.

4.2 Genetic studies

Studies in genetics, while still comparatively scarce for this region, are beginning to play an

increasingly important role in uncovering the prehistory of eastern and central India. For example,

Chaubey et al. (2017) investigate the genetic affiliation of the Dravidian-speaking Gond tribe,

located among others in the Satpura Range between Korku and the Kherwarian languages, and

conclude that despite speaking a Dravidian language closely related to Telugu, “all the Gond

groups shared extensive portions of their genomes within the group as well as with North and South

Munda groups” (Chaubey et al., 2017: 497). This leads them to assume large-scale language

shifting from Munda to Dravidian in that region.

There are also a small number of genetic studies suggesting that people of Austro-Asiatic

descent may be found throughout much of the Gangetic Plains. One of these is Chaubey et al.

(2008), which analyzes the genetic make-up of the Indo-Aryan-speaking Mushar.

The name of this group derives from Indo-Aryan and means ‘mouse-eater’, reflecting their

traditional occupation of flushing rats, which they also ate, out of their holes in fields. As the map in Chaubey et al.

(2008: 43) shows, the Mushar are now primarily concentrated in northern and western Bihar and

in northern and eastern Uttar Pradesh.

Nepali is also a “flip-flop” language, being western in Figure 8 and eastern in Figure 9. This is probably due to the

fact that it is originally a western language that spread to the east in the past few centuries so that it, like Awadhi (see

the following main text), has eastern and western dialects showing very different morphosyntactic features.

Genetic studies such as Narasimhan et al. (2019) and Shinde et al. (2019) suggest this as well.

Also spelled Musahar, Mushera and Mushahar.

Based on samples collected from 168 Mushar, 135 “Austro-Asiatic” speakers and 151 Indo-

European-speaking individuals, Chaubey et al. (2008) conclude: “Indeed, this analysis shows

unambiguously that the Mushar population clusters with the Austro-Asiatic populations both in the

mtDNA and Y chromosomal PCA slots.” (Chaubey et al., 2008: 44) They attribute this to language

shift from an Austro-Asiatic language to the Indo-European languages of the region. They also

note that some speakers apparently still speak a Munda language, although they do not mention its

name (cf. Chaubey et al., 2008: 42). This would be important information, as I am not aware of

any Munda language spoken in Uttar Pradesh and Bihar up to the border with Nepal, where the

highest density of Mushar live, according to the map in Chaubey et al. (2008: 43).

Studies by David Reich have uncovered similar examples. Reich and his colleagues identify

two main genetic groups in South Asia, which they term “ANI” for “Ancestral North Indians” and

“ASI” for “Ancestral South Indians”. The ANI group is ultimately related to western Eurasians and

derives from the migration of this group into the subcontinent from the Eurasian steppe, which

presumably also brought Indo-European languages to South Asia. The second group, ASI, is

hypothesized to be deeply related to the Andaman Islanders and to have been in South Asia for

several millennia.

Reich and his colleagues speak of the “Indian Cline” to refer to the different

proportions of ANI and ASI ancestry in the genetic make-up of individuals and/or ethnic groups.

In Reich et al. (2009: 492), a number of ethnic groups of South Asia are positioned with respect to

this cline, with Kashmiri Pandits showing the most affinity with Europeans and Dravidian-speaking

Kurumbas among those showing the least.

Not surprisingly, the Munda-speaking Santals and Kharia are not positioned along this cline

but at some distance from it, due to their Southeast Asian ancestry. More interestingly, this also

holds true of the Sahariya, a “low-caste” Indo-Aryan-speaking group represented in this study by

four samples from individuals living near Allahabad in the eastern part of Uttar Pradesh.

This suggests that this ethnic group originally spoke a Munda

language which they gave up at some point in favor of an Indo-European one.
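
What it means for a group to lie “at some distance” from a cline can be made concrete with a small geometric sketch. It assumes that samples are plotted as points in a two-dimensional space (for example, the first two principal components) and that the cline is approximated by a straight line through two endpoint positions. The coordinates and names are hypothetical and do not reproduce the data or the method of Reich et al. (2009).

```python
import math


def distance_from_cline(point, end_a, end_b):
    """Perpendicular distance of a 2-D point from the line through end_a and end_b."""
    (px, py), (ax, ay), (bx, by) = point, end_a, end_b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("cline endpoints must differ")
    # Absolute value of the 2-D cross product, divided by the length of the line's direction vector.
    return abs(dx * (py - ay) - dy * (px - ax)) / length


# Invented coordinates: the cline runs along the horizontal axis.
cline_start = (0.0, 0.0)
cline_end = (4.0, 0.0)
samples = {"on_cline": (2.0, 0.0), "off_cline": (2.0, 3.0)}

offsets = {
    name: distance_from_cline(xy, cline_start, cline_end)
    for name, xy in samples.items()
}
print(offsets)
```

A sample with an offset near zero sits on the cline, whereas a clearly positive offset marks a sample as off-cline, which would point to ancestry not captured by the two endpoint populations.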

Similarly, members of the Tharu ethnic group were also represented in this study. The Tharu

are Indo-Aryan-speaking tribal groups found primarily in the Nepalese lowlands, although the nine

samples in this study were from Uttarakhand, near Nainital. In this study, one sample clustered

closely with the above-mentioned Sahariya off-cline, while the remaining members of this ethnic

group in the diagram are either on or much closer to the cline.

The significance of this information is that we find tribal and “low-caste” groups/individuals

both to the south and north of the Ganges River showing genetic similarities with speakers of

Munda languages. These facts, combined with the data above from Chaubey et al. (2008) with

respect to the Mushar, suggest that what are now Bihar and the eastern half of Uttar Pradesh,

possibly up to the Tarai lowlands of southern Nepal, were once settled by Munda-speaking ethnic

groups – and also by other groups – just as what are now Gondi- (and Indo-Aryan-) speaking areas

between Korku and the remaining North Munda languages further to the east were once also likely

Munda-speaking (Chaubey et al., 2017).

Note that “Austro-Asiatic” here refers to well established Munda groups but also includes the Mawasi group of Sidhi

District of eastern Madhya Pradesh. Unfortunately, I could find no literature confirming that this group is in fact

“Austro-Asiatic”, at least with respect to its traditional language.

The Ethnologue (Eberhard et al., 2021) lists two forms of Musahar, one a dialect of Maithili (Indo-Aryan), the other

an alternative name of the Indo-Aryan language Musasa of Nepal, and Musahari, a dialect of Bhojpuri (Indo-Aryan);

the Glottolog (Hammarström et al., 2021) contains an entry for Musahari, but only as a dialect of Bhojpuri. No mention

is made in either source of a Munda language with this name.

This discussion is in fact considerably more complex than described here. Cf. e.g. Narasimhan et al. (2019) and

Shinde et al. (2019) with respect to the Indus-Periphery and its role in the formation of ANI and ASI.

In sum: The data portrayed in Figures 8 and 9 above clearly indicate a linguistic division cutting

right through Uttar Pradesh which undoubtedly has to do with ethnic groups speaking different

languages meeting at approximately this position in the distant past, as there are no topographic

barriers here to separate these groups and cause such a clear typological division. I argue that the

ethnicities to the east of this divide will have been Munda-speaking and other groups, living in

largely egalitarian societies with no great differences between them with respect to status and with

no single predominant language. To the west of this divide, Indo-Aryan had become predominant

by ca. 600 BCE. The fact that an increasing number of genetic studies are producing evidence that

members of different “low-caste” or tribal groups in this area show genetic features typical of

Austro-Asiatic speakers lends further support to this conclusion.

5 Conclusion

In the present paper I suggest that the original spread of Munda languages in India was considerably

larger than the present-day spread of this family suggests and originally included the eastern

Gangetic Plains. Although the Munda-speaking groups until recently only inhabited the South

Central Highlands, the Eastern Plateau and the northernmost Eastern Hills, there is good reason to

believe that this was not the maximal prehistoric spread of these groups.

From a linguistic point of view, an original spread of Munda restricted to these often rugged

hill regions is problematic, as it would mean that the spread of Munda languages until quite recently

was essentially restricted to residual/accretion zones, in the terminology of Nichols (1992; 1997),

i.e., relatively isolated regions of little value with respect to agriculture or trade. As Proto-Munda

speakers very likely cultivated rice and other crops (e.g. Zide & Zide, 1976), it seems highly

counter-intuitive that they would have chosen to live in remote, difficult-to-reach areas with

suboptimal conditions for this type of agriculture when they must surely have been aware of the

fertile Gangetic Plains just to the north of their habitat.

A small but growing number of genetic studies (e.g. Chaubey et al., 2008; Reich et al., 2009)

also present evidence suggesting that the genetic make-up of tribal and “low-caste” ethnic groups

in Uttar Pradesh and Bihar, who now speak Indo-Aryan languages, shows similarities with speakers

of Austro-Asiatic languages, again raising the possibility that Munda-speakers were once found

throughout the eastern Gangetic Plains, before they later switched to Indo-Aryan.

Finally, the so-called “Indo-Aryan East-West Divide”, suggested in Peterson (2017) and Ivani

et al. (2021), shows a clear typological division between eastern and western Indo-Aryan languages

whose diffuse border lies in Uttar Pradesh. This is significant since the present linguistic situation

in the Gangetic Plains is a textbook example of a spread zone, with no major geological barrier

such as a mountain range which would have separated the eastern and western groups from one

another physically.

I suggest that different ethnic groups speaking different languages were found on either side of

this Indo-Aryan divide at an earlier date: Indo-Aryan languages were predominant on the western

side of this divide by 600 BCE, whereas Munda-speaking and other groups predominated in the

east, where they had presumably already settled before the first Indo-Aryan speakers arrived in

South Asia. Due to the military, technological and economic advantages of the Indo-Aryan

speakers, their languages quickly spread after this time to the eastern Gangetic Plains as well,

replacing the Munda (and other) languages there. As most of these early learners will have been

adult learners, this resulted in considerable morphosyntactic simplifications in eastern Indo-Aryan

languages which did not take place in western Indo-Aryan languages (cf. Peterson, in press),

resulting in the Indo-Aryan East-West Divide that we find today.

With the advance of Indo-Aryan speakers in ever larger numbers into the Gangetic Plains, it is

likely that many Munda speakers sought refuge in the hill regions to the south, as there were

undoubtedly at least occasional skirmishes with the newcomers, and along the east coast fierce

battles over the kingdom of Kalinga are known to have taken place in the 3rd century BCE.

These hill regions will certainly not have been entirely empty, with speakers of residual

languages from pre-Munda times already living there and perhaps already a few Munda groups.

Here, in their new, relatively secluded homelands, many of these Munda languages have managed

to survive up to the present, often at the expense of their new neighbors’ languages, although the

isolate Nihali and several smaller Dravidian languages have likewise survived. In

contrast, those Munda-speaking groups who remained in the plains will probably have switched

entirely to Indo-Aryan within a few generations. It is this smaller distribution of Munda languages,

restricted to the central and eastern hill regions, which has remained essentially unchanged until

the present, not the earlier maximal spread.

The views presented in this study are those of a language typologist who specializes in language

contact, so that the data I present here is necessarily interpreted from this perspective. Nevertheless,

I believe that the maximal prehistoric spread of Munda suggested here, with Munda speakers

previously inhabiting large swaths of the eastern Gangetic Plains, is compatible not only with

general principles of language contact and areal typology, but also with the presumed agricultural

status of Proto-Munda speakers, with findings from archaeology, as well as with an increasing

number of genetic studies of ethnic groups of the Gangetic Plains.

Acknowledgments

I wish to thank the German Research Council (DFG) for a generous grant which allowed me to

conduct this research as well as the research cited here as Peterson and Baraik (in press), Ivani et

al. (2021), Paudyal and Peterson (2021) and Peterson (in press) within the project Towards a

Linguistic Prehistory of Eastern Central South Asia (and Beyond), as well as for the Cluster of

Excellence ROOTS - Social, Environmental and Cultural Connectivities of Past Societies, to

which I belong, which provided a stimulating research environment for this work.

6 Literature

Abbi, Anvita. 2009. Is Great Andamanese genealogically and typologically distinct from Onge and Jarawa? Language

Sciences 31(6). 791–812. https://doi.org/10.1016/j.langsci.2008.02.002

Anderson, Gregory D.S. 2015. Overview of the Munda languages. In Jenny & Sidwell (eds.), 364-414.

Bryant, David & Vincent Moulton. 2004. Neighbor-Net: An agglomerative method for the construction of

phylogenetic networks. Molecular Biology and Evolution 21/2: 255–265.

Chaubey, Gyaneshwer, Mait Metspalu, Monika Karmin, Kumarasamy Thangaraj, Siiri Rootsi,

Juri Parik, Anu Solnik, Deepa Selvi Rani, Vijay Kumar Singh, B. Prathap Naidu, Alla G. Reddy, Ene Metspalu, Lalji

Singh, Toomas Kivisild & Richard Villems. 2008. Language Shift by Indigenous Population: A model genetic

study in South Asia. International Journal of Human Genetics 8/1-2: 41-50.

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.526.9968&rep=rep1&type=pdf

Chaubey, Gyaneshwer, Rakesh Tamang, Erwan Pennarun, Pavan Dubey, Niraj Rai, Rakesh Kumar Upadhyay,

Rajendra Prasad Meena, Jayanti R. Patel, George van Driem, Kumarasamy Thangaraj, Mait Metspalu & Richard

Villems. 2017. Reconstructing the population history of the largest tribe of India: the Dravidian speaking Gond.

European Journal of Human Genetics 25, 493–49. DOI: 10.1038/ejhg.2016.198

Das, Nayan Jyoti. 2020. History of origin of the Santals of India. Journal of Xi’an University of Architecture &

Technology, 12:5. 1222-1226. https://doi.org/10.37896/JXAT12.05/1520

Diffloth, Gérard. 2005. The contribution of linguistic palaeontology to the homeland of Austro-Asiatic. In Laurent Sagart,

Roger Blench & Alicia Sanchez-Mazas (eds.), The peopling of East Asia: Putting together archaeology, linguistics

and genetics. London: RoutledgeCurzon. 77-80.

Eberhard, David M., Gary F. Simons, and Charles D. Fennig (eds.). 2021. Ethnologue: Languages of the World.

Twenty-fourth edition. Dallas, Texas: SIL International. Online version: http://www.ethnologue.com (Accessed

21 August, 2021).

Epps, Patience. 2020. Amazonian linguistic diversity and its sociocultural correlates. In: Mily Crevels & Pieter

Muysken (eds.), Language dispersal, diversification and contact: A global perspective. Oxford: Oxford University

Press. 275-290.

Hammarström, Harald, Robert Forkel, Martin Haspelmath & Sebastian Bank. 2021.

Glottolog 4.4. Leipzig: Max Planck Institute for Evolutionary Anthropology.

http://glottolog.org (Accessed on 2021-08-21).

Heggarty, Paul. 2014. Prehistory through language and archaeology. In Claire Bowern & Bethwyn Evans (eds.),

Routledge handbook of historical linguistics, 598–626. London: Taylor and Francis.

Ivani, Jessica, Netra Prasad Paudyal & John Peterson. 2021. A house divided? Evidence for the East-West Indo-Aryan

divide and its significance for the study of northern South Asia. In J. Ivani & J. Peterson (eds.). Special issue of

Journal of South Asian Languages and Linguistics.

Jenny, Mathias & Paul Sidwell (eds.). 2015. The handbook of Austroasiatic languages [Grammars and Language Sketches

of the World’s Languages, Mainland and Insular Southeast Asia]. Leiden & Boston: Brill.

Jenny, Mathias, Tobias Weber & Rachel Weymuth. 2015. The Austroasiatic Languages: A Typological Overview. In

Jenny & Sidwell (eds.), 13–143.

Kingwell-Banham, Eleanor, Emma Karoune nee Harvey, Rabindra Kumar Mohanty & Dorian Q. Fuller. 2018.

Archaeobotanical investigations into Golbai Sasan and Gopalpur, two Neolithic-Chalcolithic settlements of

Odisha. Ancient Asia, 9/5: 1-14. DOI: https://doi.org/10.5334/aa.164

Kuiper, Franciscus Bernardus Jacobus. 1948. Proto-Munda words in Sanskrit. [Verhandeling der Koninklijke

Nederlandse Akademie van Wetenschappen, Afd. Letterkunde, Nieuwe Reeks, Deel LI, No. 3.] N.V. Noord-

Hollandsche Uitgevers Maatschappij.

Kulke, Hermann & Dietmar Rothermund. 1991. A History of India. Calcutta/Allahabad/Bombay/Delhi: Rupa.

Kumar, Vikrant, Arimanda N.S. Reddy, Jagedeesh P. Babu, Tipirisetti N. Rao, Banrida T. Langstieh, Kumarasamy

Thangaraj, Alla G. Reddy, Lalji Singh & Battini M. Reddy. 2007. Y-chromosome evidence suggests a common

paternal heritage of Austro-Asiatic populations. BMC Evolutionary Biology, 7/47. DOI: 10.1186/1471-2148-7-47.

Metspalu, Mait, Mayukh Mondal & Gyaneshwer Chaubey. 2018. The genetic makings of South Asia. Current Opinion

in Genetics & Development, 53: 128-133.

https://doi.org/10.1016/j.gde.2018.09.003

Narasimhan, Vagheesh M., Nick Patterson, Priya Moorjani, et al. 2019. The formation of human populations in South

and Central Asia. Science, Vol. 365, Issue 6457, eaat7487: 1-15. DOI: 10.1126/science.aat7487

Nichols, Johanna. 1992. Linguistic Diversity in Space and Time. Chicago: Chicago University Press.

Nichols, Johanna. 1997. Modeling Ancient Population Structures and Movement in Linguistics. Annual Review of

Anthropology 26. 359–384.

Paudyal, Netra Prasad and John Peterson. 2021. How one language became four: The impact of different contact-

scenarios between “Sadani” and the tribal languages of Jharkhand. In J. Ivani & J. Peterson (eds.). Special issue

of Journal of South Asian Languages and Linguistics. https://doi.org/10.1515/jsall-2021-2028

Peterson, John. 2017. “Fitting the pieces together. Towards a linguistic prehistory of eastern-central South Asia (and

beyond).” Journal of South Asian Languages and Linguistics, 4/2: 211-257.

https://doi.org/10.1515/jsall-2017-0008

Peterson, John. 2018. “Towards a linguistic prehistory of eastern-central South Asia (and beyond).” Keynote speech

at the 39th Conference of the Linguistic Society of Nepal. Tribhuvan University, Kathmandu, Nepal. November

29, 2018.

Peterson, John. in press. A sociolinguistic-typological approach to the linguistic prehistory of South Asia - Two case

studies. Language Dynamics and Change.

Peterson, John. forthcoming. Mountains, plains, rivers and social (in)equality – the linguistic perspective.

Rau, Felix and Paul Sidwell. 2019. The Munda maritime hypothesis. Journal of the Southeast Asian Linguistics Society

JSEALS 12:2: 35-57. http://hdl.handle.net/10524/52454

Reich, David, Kumarasamy Thangaraj, Nick Patterson, Alkes L. Price & Lalji Singh. 2009. Reconstructing Indian

population history. Nature 461. 489–495.

Risley, H.H. The tribes and castes of Bengal. Two volumes. Calcutta: Bengal Secretariat Press. [Reprint: Calcutta: P.

Mukerjee].

Shinde, Vasant, Vagheesh M. Narasimhan, Nadin Rohland, Swapan Mallick, Matthew Mah, Mark Lipson, Nathan

Nakatsuka, Nicole Adamski, Nasreen Broomandkhoshbacht, Matthew Ferry, et al. 2019. An ancient Harappan

genome lacks ancestry from steppe pastoralists or Iranian farmers. Cell 179(3). 729–735.

Sidwell, Paul. 2015. Austroasiatic Classification. In Jenny & Sidwell (eds.), 144-220.

Sidwell, Paul & Felix Rau. 2015. Austroasiatic comparative-historical reconstruction: An overview. In Jenny &

Sidwell (eds.), 221-363.

Stroński, Krzysztof & Saartje Verbeke. 2020. Shaping modern Indo-Aryan isoglosses. Poznań Studies in

Contemporary Linguistics 56/3: 529-552.

Trudgill, Peter. 2011. Sociolinguistic typology. Social determinants of linguistic complexity. Oxford: Oxford

University Press.

Wikipedia. Geography of India. https://en.wikipedia.org/wiki/Geography_of_India

Wikipedia. Substratum in the Vedic language:

https://en.wikipedia.org/wiki/Substratum_in_Vedic_Sanskrit#cite_note-35

Witzel, Michael. 1999. Substrate languages in Old Indo-Aryan (Ṛgvedic, Middle and Late Vedic). Electronic Journal

of Vedic Studies. 5(1). 1–67.

Zide, Arlene R.K. & Norman H. Zide. 1976. Proto-Munda cultural vocabulary: evidence for early agriculture.

Austroasiatic Studies 2, edited by Philip N. Jenner, Laurence C. Thompson and Stanley Starosta. [Oceanic

Linguistics Special Publications, 13]. Honolulu: University Press of Hawaii. 1295-1334.

Zide, Norman H. 1969. Munda and non-Munda Austroasiatic languages. In Thomas Sebeok (ed.), Current Trends in

Linguistics 5, 411–430. The Hague: Mouton.

ResearchGate has not been able to resolve any citations for this publication.

Language Dispersal, Diversification, and Contact

Article

Full-text available

Jul 2020

How did languages spread across the globe? Why do we sometimes find large language families, distributed over a wider area, and sometimes clusters of very small families or language isolates (i.e. languages without known relatives)? What was the role of agriculture in language spread? What do different language ideologies and patterns of ethnic identity formation contribute? What influence do geography and climate have?The availability of increasingly large databases and new analytical research techniques make it possible to provide new answers to these long standing questions. This book focuses on patterns of language dispersal, diversification, and contact in a global perspective by comparing the complex language and population histories of Island Southeast Asia/Oceania, Africa, and South America in terms of history and patterns of settlement, conceptions of ethnicity, and communication strategies. These three regions were selected because they show interesting contrasts in the distribution of languages and language families.

A sociolinguistic-typological approach to the linguistic prehistory of South Asia Two case studies

Article

Full-text available

Mar 2022

John Peterson

The present study compares two Indo-Aryan languages, Sadri and Konkani, with respect to their morphological complexity. Based on assumptions made in sociolin-guistic typology (e.g., Trudgill, 2011), which forms part of a larger research program investigating the effects of social factors on language structures, this study attempts to reconstruct various aspects of prehistoric society based on the structures of these two modern languages as typical representatives of eastern and western Indo-Aryan, respectively. The results suggest that 2,000-2,500 years ago eastern and western Indo-Aryan languages were spoken in very different sociolinguistic environments, with a high degree of ethnic and linguistic diversity in eastern India and a comparatively low level of diversity in the west. The study also confirms the results of other studies which suggest that different areas of grammar, such as nominal and verbal systems, may be affected to different degrees in language contact and that their respective rates of (re)complexification may also differ.

Indo-Aryan – a house divided? Evidence for the east–west Indo-Aryan divide and its significance for the study of northern South Asia

Article

Full-text available

Aug 2021

In this study, we investigate the possible presence of an east–west divide in Indo-Aryan languages suggested in previous literature (Peterson, John. 2017a. Fitting the pieces together – towards a linguistic prehistory of eastern-central South Asia (and beyond). Journal of South Asian Languages and Linguistics 4(2). 211–257.), with the further hypothesis that this divide may be linked to the influence of the Munda languages, spoken in the eastern part of the subcontinent. Working with 217 fine-grained variables on a sample of 27 Indo-Aryan and Munda languages, we test the presence of a geographical divide within Indo-Aryan using computational methods such as cluster analysis in combination with visual statistical inference. Our results confirm the presence of a geographical divide for the whole dataset and most of the individual features. We then proceed to compute the degree of similarity between the Indo-Aryan languages and Munda, using a Bayesian alternative to a t-test. The results for most features support the claim that the languages identified in the eastern clusters are indeed more similar to Munda, thereby opening up further research scenarios for the history of this region.

How one language became four: the impact of different contact-scenarios between “Sadani” and the tribal languages of Jharkhand

Article

Full-text available

May 2021

Four Indo-Aryan linguistic varieties are spoken in the state of Jharkhand in eastern central India, Sadri/Nagpuri, Khortha, Kurmali and Panchparganiya, which are considered by most linguists to be dialects of other, larger languages of the region, such as Bhojpuri, Magahi and Maithili, although their speakers consider them to be four distinct but closely related languages, collectively referred to as “Sadani”. In the present paper, we first make use of the program COG by the Summer Institute of Linguistics (SIL) to show that these four varieties do indeed form a distinct, compact genealogical group within the Magadhan language group of Indo-Aryan. We then go on to argue that the traditional classification of these languages as dialects of other languages appears to be based on morphosyntactic differences between these four languages and similarities with their larger neighbors such as Bhojpuri and Magahi, differences which have arisen due to the different contact situations in which they are found.

The formation of human populations in South and Central Asia

Article

Full-text available

Sep 2019

Ancient human movements through Asia Ancient DNA has allowed us to begin tracing the history of human movements across the globe. Narasimhan et al. identify a complex pattern of human migrations and admixture events in South and Central Asia by performing genetic analysis of more than 500 people who lived over the past 8000 years (see the Perspective by Schaefer and Shapiro). They establish key phases in the population prehistory of Eurasia, including the spread of farming peoples from the Near East, with movements both westward and eastward. The people known as the Yamnaya in the Bronze Age also moved both westward and eastward from a focal area located north of the Black Sea. The overall patterns of genetic clines reflect similar and parallel patterns in South Asia and Europe. Science , this issue p. eaat7487 ; see also p. 981

Reconstructing the population history of the largest tribe of India: the Dravidian speaking Gond

Article

Full-text available

Feb 2017
EUR J HUM GENET

The Gond comprise the largest tribal group of India with a population exceeding 12 million. Linguistically, the Gond belong to the Gondi-Manda subgroup of the South Central branch of the Dravidian language family. Ethnographers, anthropologists and linguists entertain mutually incompatible hypotheses on their origin. Genetic studies of these people have thus far suffered from the low resolution of the genetic data or the limited number of samples. Therefore, to gain a more comprehensive view on ancient ancestry and genetic affinities of the Gond with the neighbouring populations speaking Indo-European, Dravidian and Austroasiatic languages, we have studied four geographically distinct groups of Gond using high-resolution data. All the Gond groups share a common ancestry with a certain degree of isolation and differentiation. Our allele frequency and haplotype-based analyses reveal that the Gond share substantial genetic ancestry with the Indian Austroasiatic (ie, Munda) groups, rather than with the other Dravidian groups to whom they are most closely related linguistically.European Journal of Human Genetics advance online publication, 1 February 2017; doi:10.1038/ejhg.2016.198.

An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers

Article

Sep 2019
CELL

We report an ancient genome from the Indus Valley Civilization (IVC). The individual we sequenced fits as a mixture of people related to ancient Iranians (the largest component) and Southeast Asian hunter-gatherers, a unique profile that matches ancient DNA from 11 genetic outliers from sites in Iran and Turkmenistan in cultural communication with the IVC. These individuals had little if any Steppe pastoralist-derived ancestry, showing that it was not ubiquitous in northwest South Asia during the IVC as it is today. The Iranian-related ancestry in the IVC derives from a lineage leading to early Iranian farmers, herders, and hunter-gatherers before their ancestors separated, contradicting the hypothesis that the shared ancestry between early Iranians and South Asians reflects a large-scale spread of western Iranian farmers east. Instead, sampled ancient genomes from the Iranian plateau and IVC descend from different groups of hunter-gatherers who began farming without being connected by substantial movement of people.

Fitting the pieces together – Towards a linguistic prehistory of eastern-central South Asia (and beyond)

Article

Sep 2017

John Peterson

This study summarizes preliminary research into the distribution of morphosyntactic patterns in the languages of South Asia from three different families, above all in eastern-central South Asia, in a first attempt to unravel the linguistic prehistory of this part of the subcontinent. To achieve this goal, a small, preliminary morphosyntactic database has been compiled on 29 languages from throughout South Asia based on data from published resources, original fieldwork, as well as questionnaires sent out to researchers working on a number of languages from the region. This database, although still quite limited, will serve as the starting point for a much larger, finer-grained analysis of languages from throughout the subcontinent which will ultimately contribute substantially to our knowledge of the linguistic prehistory of this region.

3 Austroasiatic Classification

Chapter

Dec 2014

Paul Sidwell

Fitting the pieces together. Towards a linguistic prehistory of eastern-central South Asia (and beyond)

Working Paper

Dec 2016

John Peterson
