A Linguistic Analysis of Northern Sotho (524 pages)

The Lemmatisation of Adverbs in Northern Sotho*

Article

Oct 2011
LEXIKOS

D.J. Prinsloo

Abstract: To date, Northern Sotho metalexicographers have focused their attention on lemmatisation problems in respect of the so-called main or primary part-of-speech categories, viz. nouns and verbs. See, for example, Prinsloo and De Schryver (1999) and Prinsloo and Gouws (1996). No attention has been given to the lemmatisation of adverbs, which Ziervogel and Mokgokong (1975: 114, Introduction) regard as a "secondary part of speech". The treatment of adverbs in Northern Sotho dictionaries is marred by inconsistencies such as omissions from the macrostructure, insufficient and inconsistent labelling, inferior treatment in the microstructure, and underutilization of the mediostructure and outer texts, and it reflects the lack of a strategy for selecting items for lemmatisation. Linguistic descriptions of adverbs in currently available grammars vary substantially and therefore confuse learners of the language and inexperienced lexicographers. The aim of this article is to offer solutions to the lemmatisation problems regarding adverbs in Northern Sotho and to propose guiding entries for paper and electronic dictionaries which could serve as models for future dictionaries. The treatment of adverbs in Northern Sotho dictionaries will also be critically evaluated, especially in terms of frequency of use and target users' needs. Keywords: LEXICOGRAPHY, LEMMATISATION, ADVERBS, INFORMATION RETRIEVAL, ELECTRONIC DICTIONARY, MACROSTRUCTURE, MICROSTRUCTURE, CROSS-REFERENCING, MEDIOSTRUCTURE, DICTIONARY, AFRICAN LANGUAGES, BACK MATTER, NORTHERN SOTHO

A computational implementation of the Northern Sotho infinitive

Article

Oct 2012

The aim of this article is to describe the infinitive in Northern Sotho on the basis of corpus data and the relevant literature. So far, the literature shares a single view: the infinitive is at the same time a noun (of class 15) and a verb; 'it manifests both nominal as well as verbal features' (Poulos & Louwrens, 1994:42). When these constellations are implemented in a parser, however, a new perspective emerges: for a successful implementation, the infinitive must be defined on the one hand as a verb and on the other as a noun of class 15, derived from this verb through nominalization (transposition). Instead of a subject concord, the verb stem in the infinitive is preceded by the corresponding class prefix.
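The two-way analysis described above can be sketched as a pair of lexicon entries in which the class 15 noun is derived from the verb by transposition. This is a minimal illustration, not the authors' parser: the `Verb` and `Noun` classes, the `nominalize_infinitive` function and the disjunctively written class 15 prefix `go` (as in *go rata* 'to love') are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    stem: str  # verb stem, e.g. "rata"


@dataclass(frozen=True)
class Noun:
    form: str        # surface form as written (disjunctive orthography)
    noun_class: int  # noun class number
    source: Verb     # the verb this noun was derived from


# Hypothetical constant: the class 15 prefix, written as a separate word.
CLASS15_PREFIX = "go"


def nominalize_infinitive(verb: Verb) -> Noun:
    """Derive the class 15 infinitive noun from a verb (transposition).

    Instead of a subject concord, the class prefix precedes the verb stem.
    """
    return Noun(form=f"{CLASS15_PREFIX} {verb.stem}", noun_class=15, source=verb)


rata = Verb("rata")
infinitive = nominalize_infinitive(rata)
print(infinitive.form, infinitive.noun_class)
```

Keeping the verb as the `source` of the derived noun lets a parser treat the same string both as a verbal and as a nominal constituent, which is the dual behaviour the abstract describes.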

Questions in Northern Sotho

Article

Jan 2006

Sabine Zerbian

This article gives an overview of the marking of polar and constituent questions in Northern Sotho, a Bantu language of South Africa, and thereby contributes to the typological investigation of sentence types in the world's languages. As will be shown, Northern Sotho follows cross-linguistic tendencies in marking interrogative sentences: it uses intonation as the main indicator in polar questions and question words as the main indicator in constituent questions. Nevertheless, it also shows interesting language-specific variation, e.g. in the location of raised intonation in polar questions, the presence of two pragmatically distinct question particles in polar questions, and a split in the formation of constituent questions based on the grammatical function of the questioned constituent.

The lemmatization of copulatives in northern Sotho

Article

Feb 2010
LEXIKOS

D.J. Prinsloo

For learners of Northern Sotho as a second or even foreign language, the copulative system is probably the most complicated grammatical system to master. The encoding needs of such learners, i.e. to find enough information in dictionaries in order to actively use copulatives in speech and writing, are poorly served in currently available dictionaries. The aim of this article is to offer solutions to the lemmatization problems regarding copulatives in Northern Sotho and to propose guiding entries for paper and electronic dictionaries which could serve as models for future dictionaries. It will be illustrated that the maximum utilisation of macrostructural and microstructural strategies as well as the mediostructure is called for in order to reach this objective. Prerequisites will be to reconstruct the entire copulative system in a user-friendly way, to abstract the rules governing the use of copulatives and to isolate the appropriate lemmas. The treatment of copulatives in Northern Sotho dictionaries will also be critically evaluated, especially in terms of frequency of use and target users' needs. Keywords: lexicography, lemmatization, copulatives, information retrieval, access structure, electronic dictionary, macrostructure, microstructure, cross-referencing, mediostructure, dictionary, African languages

Re-examining the relationship between the subject agreement morpheme and (in)definiteness in Northern Sotho

Article

Mar 2013

Mampaka Lydia Mojapelo

In Northern Sotho, the subject noun phrase occupies the grammatical position to the left of the predicate. The subject agreement morpheme is an obligatory link between the subject noun phrase and the predicate. Scholars have studied the role of this morpheme from various perspectives, and it is well established that the morpheme has a dual function: firstly, to establish agreement between the subject noun and the predicate and, secondly, to serve as a pronoun referring back to preceding discourse. This article re-examines the primary role of the subject agreement morpheme in Northern Sotho as a function of the interpretation of a subject noun as definite or indefinite. This is done (1) by revisiting existing works that are directly or indirectly concerned with (in)definiteness and/or the subject agreement morpheme; (2) by analysing selected texts that can advance the discussion; and (3) by relating the findings of previous works to the present analysis. The first hypothesis of this article is that there may be cases in which a noun phrase is interpreted as definite when a class 9 noun phrase denoting persons takes a class 1 subject agreement morpheme. The second hypothesis is that, although the subject position is mostly regarded as topical and definite, it does not exclude indefinite noun phrases, so that indefinite noun phrases may also agree with predicates by means of the subject agreement morpheme.

Morphology and semantics of proper names in Northern Sotho

Article

Oct 2012

Mampaka Lydia Mojapelo

The objective of this article is to present the identifying features of the Northern Sotho proper name as a subcategory of the broader word category N (noun). The article analyses the identifying morphological features and semantic interpretations of two types of proper name, namely personal and place names, in comparison with the common noun. Some proper names in Northern Sotho have the same phonetic form as common nouns but can be distinguished by morpho-syntactic means. Common nouns may serve as modifiers and combine with personal proper names in compounds for specific semantic effects. Four morphological forms reveal that both common and proper names are used to form place names, with specific morphological elements selecting either common or proper names. As languages develop, linguistic elements such as these require further exploration so that they can be reliably identified automatically and extracted from texts for various applications.

Morphological Sources of Phonological Length

Article

Anne Pycha

Determining the core vocabulary used by Sepedi-speaking children during regular preschool activities

Article

Feb 2021

Purpose: In order to provide equitable communication intervention and support services to clients from diverse cultural and linguistic backgrounds, language-specific resources for assessment and intervention need to be developed. The purpose of the study was to develop a core vocabulary list based on language samples from Sepedi-speaking children, as a resource to inform vocabulary selection for augmentative and alternative communication (AAC) systems for children in need of AAC from a Sepedi language background. Method: The speech of six typically developing Sepedi-speaking children aged 5-6 years was recorded using small body-worn audio recording devices during their regular preschool day. The recordings were transcribed, coded and analysed. Result: The composite transcript consisted of 17 579 words, of which 1023 were different words. The core vocabulary was determined by identifying all words that were used with a minimum frequency of 0.05% and by at least half of the participants. The Sepedi core vocabulary consisted of 226 words that accounted for 88.1% of the composite sample. Conclusion: The core vocabulary determined in this study represents a small pool of reusable linguistic elements that form the grammatical framework of the Sepedi language. As such, it is a valuable resource for vocabulary selection for children who require AAC and who come from a Sepedi language background.
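The two selection criteria can be expressed directly as a filter over per-participant word lists. In the reported sample, a 0.05% threshold corresponds to 17 579 × 0.0005 ≈ 8.8, i.e. at least nine occurrences. The `core_vocabulary` function below is a hypothetical sketch, not the study's analysis script: it assumes the transcripts are already tokenised into words and reads "at least half of the participants" as a speaker count of at least half the number of participants.

```python
from collections import Counter


def core_vocabulary(samples, min_freq=0.0005, min_share=0.5):
    """Return the words that meet both core-vocabulary criteria.

    samples: mapping of participant id -> list of word tokens.
    min_freq: minimum relative frequency in the composite sample (0.05% = 0.0005).
    min_share: minimum proportion of participants who must use the word.
    """
    composite = Counter()  # token counts over all participants
    users = Counter()      # number of participants using each word
    for tokens in samples.values():
        composite.update(tokens)
        users.update(set(tokens))  # count each participant at most once per word
    total = sum(composite.values())
    n_participants = len(samples)
    return {
        word
        for word, count in composite.items()
        if count / total >= min_freq and users[word] >= min_share * n_participants
    }
```

Because the two criteria are applied together, a word used very often by a single child is excluded, as is a word used by every child but too rarely overall.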

The Status of Tone in Sesotho: A Production and Perception Study

Article

Dec 2017
J Afr Stud

Sesotho is generally described as a tonal language. This paper describes an investigation into the ability of a group of young speakers of Sesotho to perceive and produce tonal distinctions in Sesotho. In particular, it focuses on the status of the subject concord morpheme in the question phrases o batlang? (2SG; 'What do you want?') and ó batlang? (3SG/CL1; 'What does (s)he want?'). Thirty young mother tongue speakers aged between 15 and 17 participated in the study, which was conducted over a period of three months. The findings of both experiments clearly showed the functional use of tone by these speakers to be limited, or in some cases even totally absent. These results are suggestive of a system in a state of flux, perhaps indicating the start of an evolutionary process.

A corpus-driven account of the noun classes and genders in Northern Sotho

Article

Oct 2016
SO AFR LINGUIST APPL

This article offers a distributional corpus analysis of the Northern Sotho noun class and gender system. The aim is twofold: first, to assess whether existing descriptions of the noun class system in Northern Sotho are corroborated by the analysis of a large electronic corpus for this language, with specific reference to singular-plural pairings; and second, to present a number of novel visualisation aids for characterising a noun class system (in a radar diagram) and a noun gender system (using a two-directional weighted representation) for Northern Sotho in particular and for any Bantu language in general. The findings include the discovery of two new genders in Northern Sotho (i.e. class pairs 1/6 and 3/10) and also indicate that the Northern Sotho noun class system, and by extension that of any Bantu language, should be seen as dynamic.
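Genders correspond to singular-plural class pairings, which can be tallied from corpus counts. The sketch below computes one plausible pair of weights for each class pair: the share of a singular class's nouns whose plural falls in a given plural class, and the share of a plural class's nouns whose singular comes from a given singular class. It is an assumption about what a two-directional weighting could be built on, not the authors' procedure; the `gender_weights` function and its input format are hypothetical.

```python
from collections import Counter


def gender_weights(pairings):
    """Compute two-directional weights for singular/plural class pairings.

    pairings: iterable of (singular_class, plural_class) tuples, one per
    attested singular-plural noun pair (hypothetical input format).
    Returns {(sg, pl): (share_of_sg, share_of_pl)}, where share_of_sg is the
    fraction of nouns in class sg whose plural is in class pl, and share_of_pl
    is the fraction of nouns in class pl whose singular is in class sg.
    """
    pair_counts = Counter(pairings)
    sg_totals = Counter()
    pl_totals = Counter()
    for (sg, pl), n in pair_counts.items():
        sg_totals[sg] += n
        pl_totals[pl] += n
    return {
        (sg, pl): (n / sg_totals[sg], n / pl_totals[pl])
        for (sg, pl), n in pair_counts.items()
    }
```

With invented toy counts of eight 1/2 pairs and two 1/6 pairs, the 1/6 gender receives a weight of 0.2 from the singular side but 1.0 from the plural side, showing why a minority pairing can still matter when viewed from the plural class.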

Implementation of a Part-of-Speech Ontology: Morphemic Units of Bantu languages.

Article

Jan 2015
J Afr Stud

In a previous article (Faaß et al., 2012), a first attempt was made at documenting and encoding morphemic units of two South African Bantu languages, i.e. Northern Sotho and Zulu, with the aim of describing and storing the morphemic units of these two languages in a single relational database, structured as a hierarchical ontology. As a follow-up, the current article describes the implementation of our part-of-speech ontology. We give a detailed description of the morphemes and categories contained in the database, highlighting the need and reasons for a flexible ontology which will provide for both language specific and general linguistic information. By giving a detailed account of the methodology for the population of the database, we provide linguists from other Bantu languages with a road map for extending the database to also include their languages of specialization.
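The combination of a general category hierarchy with language-specific morpheme entries can be sketched with Python's built-in `sqlite3` module. The schema, table and column names, the category labels and the sample row (the Northern Sotho first-person singular subject concord *ke*, tagged with the ISO 639-3 code `nso`) are illustrative assumptions, not the published database design. A recursive query walks from any category up to the root of the ontology.

```python
import sqlite3

# Hypothetical schema: a category tree plus morphemes attached to categories.
SCHEMA = """
CREATE TABLE category (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES category(id)  -- NULL for the root
);
CREATE TABLE morpheme (
    id          INTEGER PRIMARY KEY,
    form        TEXT NOT NULL,
    language    TEXT NOT NULL,  -- e.g. 'nso' (Northern Sotho) or 'zul' (Zulu)
    category_id INTEGER NOT NULL REFERENCES category(id)
);
"""


def ancestors(conn, category_name):
    """Return the path from a category up to the root of the hierarchy."""
    rows = conn.execute(
        """
        WITH RECURSIVE path(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM category WHERE name = ?
            UNION ALL
            SELECT c.id, c.name, c.parent_id, p.depth + 1
            FROM category c JOIN path p ON c.id = p.parent_id
        )
        SELECT name FROM path ORDER BY depth
        """,
        (category_name,),
    ).fetchall()
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO category (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "morpheme", None), (2, "concord", 1), (3, "subject concord", 2)],
)
conn.execute(
    "INSERT INTO morpheme (form, language, category_id) VALUES (?, ?, ?)",
    ("ke", "nso", 3),
)
print(ancestors(conn, "subject concord"))
```

Keeping the hierarchy language-neutral and storing the language only on morpheme rows is one way to let linguists add another Bantu language without altering the shared categories.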

The Afrikaans Orthographic Rules as Guide for Other South African Languages

Article

Dec 2013
LEXIKOS

Marietta Alberts

The spelling and orthographic rules of a language are very important for compilers of general and technical dictionaries, and lexicographers and terminographers should adhere to these rules when compiling a dictionary. The word-forming principles of a language form part of these rules, and new terms can only be coined in a given language if its spelling and orthographic rules are followed. This article reports, by and large, on work on the rules of Afrikaans spelling and orthography. It is hoped that some of the lessons learned in the process could serve as guidelines for parallel processes of standardising the spelling and orthographies of the other South African languages. With the establishment of the National Language Body for Afrikaans (NLBA) of the Pan South African Language Board (PanSALB), it was decided that the members of the Taalkommissie (Language Commission) of the Suid-Afrikaanse Akademie vir Wetenskap en Kuns (SAAWK) (South African Academy for Science and the Arts) would become members of the NLBA. These members then became the Technical Committee for Standardisation (TC Standardisation). Since then, the members of the Taalkommissie cum TC for Standardisation (Commission) have continued their work on the standardisation of the spelling and orthographic rules for Afrikaans. Alongside this work, the long-term objectives set by the Commission include, inter alia, the conversion and adaptation of the Afrikaanse Woordelys en Spelreëls (AWS) to an electronic format, to be made available as an e-book and online, and the compilation of a standard grammar of Afrikaans for international access on the Internet. The AWS is a resource compiled by the Commission to assist users of Afrikaans in writing the standard variety of the language. It explains the ground rules of Afrikaans spelling and orthographic conventions, with the basic rules given in simplified language.
No language is static, and language change must always be taken into account: the vocabulary and pronunciation of a dynamic language may change over time, and these changes should be reflected in its spelling and orthographic system. The article addresses issues such as the front matter of the AWS, spelling and orthographic principles and rules, and the back matter (i.e. a list of abbreviations, a list of international place names, a transliteration table, etc.). Although the AWS is not a dictionary, it also contains a word list. Words that present spelling problems, neologisms and items requiring recognition as part of the Afrikaans vocabulary are among the categories considered for inclusion in this list. In 2004 the NLBs of the other nine official African languages began revising the spelling and orthographic rules of these languages. The first editions were published by PanSALB in 2008, and no revisions have been compiled since. The AWS could serve as an example of what could be done for the other African languages and even for the Khoe and San languages.

Devices for Information Presentation in Electronic Dictionaries

Article

Nov 2012
LEXIKOS

Electronic dictionaries should support users by guiding them in text production and text reception, alongside a user-definable offer of lexicographic data for cognitive purposes. In this article, we sketch the principles of an interactive and dynamic electronic dictionary aimed at text production and text reception that guides users in innovative ways, especially with respect to difficult, complicated or confusing issues. The lexicographer has to analyse the nature of the possible problems very carefully in order to suggest an optimal solution for a specific problem. We are of the opinion that there are numerous complex situations in which users need more detailed support than is currently available in e-dictionaries to enable them to make valid and correct choices. For highly complex situations, we suggest guidance through a decision tree-like device. We assume that the solutions proposed here are not specific to one language but can, after careful analysis, be applied to e-dictionaries in different languages across the world.
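A decision tree-like guidance device can be modelled as a tree of questions whose leaves hold the advice shown to the user. The minimal sketch below assumes a hypothetical `Node`/`guide` design and invented placeholder questions; it illustrates only the navigation mechanism, not the content or interface proposed in the article.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A guidance step: a question with labelled answers, or (if leaf) final advice."""
    text: str
    answers: dict = field(default_factory=dict)  # answer label -> child Node

    def is_leaf(self):
        return not self.answers


def guide(root, choices):
    """Follow the user's successive answers from the root; return the advice reached."""
    node = root
    for choice in choices:
        if node.is_leaf():
            break  # extra answers are ignored once advice has been reached
        node = node.answers[choice]
    if not node.is_leaf():
        raise ValueError(f"More input needed: {node.text}")
    return node.text


# Hypothetical placeholder tree; a real device would encode linguistic choices.
root = Node("Are you producing or receiving a text?", {
    "producing": Node("Do you already know the lemma you need?", {
        "yes": Node("Show the full microstructural entry for that lemma."),
        "no": Node("Offer a guided search by meaning."),
    }),
    "receiving": Node("Show a short translation equivalent first."),
})

print(guide(root, ["producing", "no"]))
```

Because each answer narrows the path, the user only ever sees the questions relevant to the current problem, which is the point of using a tree rather than a single long entry.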

Mapping between Disjoining and Conjoining Writing Systems in Bantu Languages: Implementation on Kwanyama

Article

Jan 2001

Arvi Hurskainen

Several Bantu languages have adopted a disjoining writing system, which poses a special challenge for the automatic analysis of written text. In such systems, some bound morphemes are written as independent words, whereas other languages treat equivalent morphemes as affixes of a head morpheme. The concept of the word is blurred in disjoining writing systems because the writing conventions do not follow a systematic set of rules: not only are bound morphemes written as separate words, but independent words are also sometimes written together as one string. The paper claims that disjoining writing systems require special treatment before they can be analyzed successfully. One option is to pre-process the text into a conjoining format, so that the analysis program receives its input in a form that conforms to the linguistically more motivated writing system. Another is to construct the morphological analyzer so that it identifies bound morphemes even when they are written as separate words. This paper takes the first approach and describes a system that maps between a disjoining and a conjoining writing system. The implementation was made on Kwanyama, a Bantu language spoken in southern Angola and northern Namibia, and the performance of the system is evaluated.
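The pre-processing option can be illustrated with a toy mapper that attaches each run of bound morphemes to the next free word. The sketch uses the Northern Sotho phrase *ke a go rata* 'I love you' rather than Kwanyama, and its `BOUND_PREFIXES` set is a hypothetical, context-free list. A real system such as the one described needs context-sensitive rules, because forms such as *a* are ambiguous, and the joined output only demonstrates the mapping, not a proposed orthography.

```python
# Hypothetical toy list of forms treated as bound morphemes; a real mapper
# needs context-sensitive rules, since such forms are ambiguous.
BOUND_PREFIXES = {"ke", "a", "go"}


def conjoin(tokens, bound=BOUND_PREFIXES):
    """Attach each run of bound-morpheme tokens to the next free token."""
    words, pending = [], []
    for token in tokens:
        if token in bound:
            pending.append(token)  # hold until a host word appears
        else:
            words.append("".join(pending) + token)
            pending = []
    words.extend(pending)  # leftover morphemes with no host stay separate
    return words


print(conjoin("ke a go rata".split()))
```

The reverse problem mentioned in the abstract, independent words written together as one string, would need a separate splitting step and is not handled here.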

Designing a noun guesser for part of speech tagging in Northern Sotho

Article

Oct 2012

In this article, we describe one element of a suite of computational tools for assigning word-class tags (as a preparation for part-of-speech (POS) tagging) to word forms in unrestricted Northern Sotho texts. POS tagging is a step towards the linguistic analysis of texts, which in turn allows for advanced data extraction. The component described here identifies (and classifies) noun forms. Several types of linguistic knowledge are used to recognize nouns that are not contained in the system's noun lexicon: the relationship between singular and plural noun prefixes, knowledge about noun derivation, and data about the co-occurrence of the candidate with concords, pronouns and adjectives in a local context. Our implementation is a symbolic, voting-based process in which all tests jointly determine whether a candidate is a noun; its accuracy on unseen test data is around 92%.
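The voting idea can be sketched as independent boolean tests whose votes are summed. This is a simplified illustration, not the authors' system: the prefix list is a small illustrative subset (e.g. *se-/di-* as in *selo/dilo* 'thing/things'), the noun-derivation test is omitted, the cue set for the context test is supplied by the caller, and the two-vote threshold is an assumption.

```python
# Illustrative subset of singular/plural noun prefix pairs (assumption: a real
# guesser would use the full prefix inventory of the language).
PREFIX_PAIRS = [("mo", "ba"), ("mo", "me"), ("le", "ma"), ("se", "di")]


def has_noun_prefix(word):
    """Test 1: the word starts with a known singular noun prefix."""
    return any(word.startswith(sg) for sg, _ in PREFIX_PAIRS)


def plural_attested(word, vocabulary):
    """Test 2: swapping the singular prefix for its plural gives an attested word."""
    return any(
        word.startswith(sg) and pl + word[len(sg):] in vocabulary
        for sg, pl in PREFIX_PAIRS
    )


def context_supports(word, contexts, cues):
    """Test 3: at least half of the tokens following the word are cue forms.

    contexts maps a word to the tokens observed right after it; cues is a
    caller-supplied (hypothetical) set of concord, pronoun or adjective forms.
    """
    following = contexts.get(word, [])
    if not following:
        return False
    return sum(token in cues for token in following) / len(following) >= 0.5


def is_noun(word, vocabulary, contexts, cues, min_votes=2):
    """Accept the candidate as a noun if enough tests vote for it."""
    votes = [
        has_noun_prefix(word),
        plural_attested(word, vocabulary),
        context_supports(word, contexts, cues),
    ]
    return sum(votes) >= min_votes
```

Voting lets a weak but cheap test, such as prefix shape, contribute without deciding on its own: many non-nouns also begin with strings like *mo* or *se*.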

Expression of information structure in the Bantu language Northern Sotho

Article

Jan 2006

Sabine Zerbian

The main thesis of this dissertation is that Northern Sotho makes no obligatory use of grammatical means to mark focus, whether in syntax, prosody or morphology. Nevertheless, the language structures an utterance according to information-structural aspects. Constituents that are given in the discourse are either deleted, pronominalised, or dislocated to the right or left periphery of the sentence. These (morpho-)syntactic processes interact in such a way that the focused constituent often appears in final position in its clause. Although final position is not a designated focus position, knowledge of this tendency is crucial for understanding a morphological alternation that appears on the verb in Northern Sotho and that has been discussed in the literature in connection with focus. Thus, although Northern Sotho lacks a direct grammatical expression of formal F(ocus)-marking, F-marking is nevertheless crucial for the grammar of this language: focused logical subjects cannot appear in the canonical preverbal position. Instead, they appear either postverbally or in a cleft, depending on the valency of the verb. Although Northern Sotho shows a correspondence of complex form with complex meaning in its use of clefts for objects, this correspondence does not hold for logical subjects. The dissertation models these findings within the theoretical framework of Optimality Theory (OT). Syntactic in-situ focus and the absence of prosodic focus marking can be captured with uncontroversial constraints. To account for the ungrammaticality of focused logical subjects in preverbal position, the dissertation proposes a modification of a constraint from the literature that is of decisive importance in Northern Sotho.
The form-meaning correspondence, like other phenomena of the pragmatic division of labour, is treated within weak bidirectional Optimality Theory.
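The core of evaluation in Optimality Theory is strict domination: a candidate loses as soon as it incurs more violations than a competitor on the highest-ranked constraint on which the two differ. Comparing violation profiles as tuples ordered by the constraint ranking implements exactly this. The sketch below is generic; the `optimal` function and the candidate and constraint names are hypothetical placeholders and do not represent the constraints proposed in the dissertation.

```python
def optimal(candidates, ranking):
    """Return the optimal candidate under strict constraint domination.

    candidates: mapping of candidate name -> {constraint name: violations}.
    ranking: constraint names ordered from highest- to lowest-ranked.
    Constraints missing from a candidate's mapping count as 0 violations.
    """
    def profile(name):
        marks = candidates[name]
        return tuple(marks.get(constraint, 0) for constraint in ranking)

    # Python compares tuples lexicographically, so the first (highest-ranked)
    # constraint on which two candidates differ decides between them.
    # On a complete tie, min() keeps the first candidate it encountered.
    return min(candidates, key=profile)


# Placeholder tableau: candidate and constraint names are hypothetical.
tableau = {
    "cand1": {"C1": 1},
    "cand2": {"C2": 2},
    "cand3": {"C2": 1, "C3": 5},
}
print(optimal(tableau, ["C1", "C2", "C3"]))
```

Reranking the same constraints as `["C3", "C2", "C1"]` makes `cand1` optimal instead, which is how OT derives variation from a shared constraint set.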

Explaining Variation in Reading Comprehension in Northern Sotho-English Bilingual Readers: A Simple View of Reading Perspective on Longitudinal Data

Article

Dec 2023

AN ANALYSIS OF PERSONAL NAMING APPROACHES BY YOUNGER AND OLDER GENERATIONS AMONG THE BAPEDI NATION: AN ONOMASTIC COMPARISON

Article

Feb 2023

The article explores the naming approaches that have been and are still used among the Bapedi (Sepedi speakers) by both the older and the younger generation. It employs a qualitative approach: Sepedi personal names were collected randomly and analysed qualitatively. Two categories of names are investigated and analysed, namely those given by the older generation and those given by the younger generation. The article argues that naming is a linguistic act used to express and communicate different messages to the members of the society. However, it finds that name-giving among the Bapedi has become problematic, in the sense that a practice that used to be the prerogative and responsibility of elders, that is, grandparents and other senior members of the family, is now undertaken by both the older and the younger generation. Unfortunately, in the naming process the younger generation lacks transparency and neither consults nor informs the elders. The article also explores whether both groups use the naming system to communicate and record historical events and happenings in the family before and during the birth of the child. The research established similarities and differences in the naming process and in the names given to newborn babies.

The elevation of Sepedi from a dialect to an official standard language: Cultural and economic power and political influence matter

Article

Feb 2022

This study explored the role played by economic, cultural and political power and influence when a particular dialect was elevated to the status of an official standard language. It was a qualitative study that employed text analysis: journal articles, dissertations, theses, academic books and minutes of the Parliamentary Joint Constitutional Review were considered for data collection and analysis. To supplement this method, 267 research participants, comprising students (undergraduate and postgraduate) and lecturers from five selected South African universities as well as members of the language authorities, were invited to participate in the study. Self-administered survey questionnaires and face-to-face interviews were chosen as qualitative methods for data collection. From a dialectal point of view, the study indicated that all official standard languages were once dialects; these dialects were considered superior and elevated to the status of official languages because of socio-economic power and political influence. The article further records that status language planning in the South African context is markedly political, and not merely linguistic, in nature. Against this background, the researchers claim that there is no official standard language that was not once a dialect.

The lexicographic treatment of the demonstrative copulative in Sesotho sa Leboa — an exercise in multiple cross-referencing

Article

Feb 2010
LEXIKOS

In this research article an in-depth investigation is presented of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This case study serves as an example to illustrate the so-called 'paradigmatic lemmatisation' of closed-class words in the African languages. The need for such an approach follows from a discussion, in Sections 1 and 2 respectively, of the present and missing directions in African-language metalexicography. A theoretical conspectus of the DC in Sesotho sa Leboa is then offered in Section 3, while Section 4 examines the treatment of the DC in the four existing desktop dictionaries for this language. The outcomes of these two sections are then used in Section 5, which analyses the problems of and options for a sound lexicographic treatment of the DC in bilingual and monolingual dictionaries. The next two sections review the practical implementation of the DC lemmatisation suggestions in PyaSsaL, i.e. the Pukuntšutlhaloši ya Sesotho sa Leboa 'Explanatory Sesotho sa Leboa Dictionary', with Section 6 focusing on the hardcopy and Section 7 on the online version. In the process, the very first fully monolingual African-language dictionary on the Internet is introduced. Section 8, finally, concludes briefly. Keywords: lexicography, paradigmatic lemmatisation, African languages, Sesotho sa Leboa (Northern Sotho, Sepedi), demonstrative copulative, cross-referencing, corpus, monolingual dictionary, bilingual dictionary

The lexicographic treatment of sublexical and multilexical items in a northern Sotho monolingual dictionary: a challenge for lexicographers

Article

Feb 2010
LEXIKOS

Motlokwe Clifford Mphahlele

Dictionaries have in the past used a word-based approach in which sublexical and multilexical items were not regarded as lemmata. Metalexicography, as the theoretical component of lexicography, requires that sublexical and multilexical items be lemmatized and treated as independent lemmata in the macrostructure of dictionaries. One of the greater challenges in compiling a better, user-oriented Northern Sotho monolingual dictionary is to treat sublexical and multilexical items as macrostructural elements, and in treating these items the lexicographer faces quite a number of challenges. This article proposes possible ways in which sublexical and multilexical elements could be successfully treated in a Northern Sotho monolingual dictionary. Taking stock of these challenges, the writer offers suggestions that would assist lexicographers in compiling a user-friendly, lexicon-based monolingual dictionary that leads users to successful information retrieval. Keywords: sublexical items, multilexical items, affixes, prefixal morphemes, suffixal morphemes, integrated microstructure, word-based approach, lemmatization, collocations, compounds, complexes, group prepositions, fixed expressions, morphemes, metalexicographical aspects, word-formation processes, word-internal function, user-oriented monolingual dictionary, grammar, homonymous lemma, polysemous lemma, decoding, encoding, semantic comment, semantic transparency

Multilingualism in South Africa: with a focus on KwaZulu-Natal and Metropolitan Durban

Book

Jan 2002

This publication deals with the rhetoric and facts about multilingualism in South Africa, with a focus on KwaZulu-Natal and the metropolitan area of Durban. For those interested in the opportunities and challenges presented by multilingualism in a multicultural society, South Africa is, for a number of reasons, one of the most fascinating places to look at. First of all, it has a unique and complex history of apartheid and post-apartheid, in which sociolinguistic issues play a central role. During the years of apartheid (1948–1994), English and Afrikaans were the only two languages with officially recognized nationwide status, despite the wide variety of other languages that were (and are) learnt and spoken. Derived from this context, the myth of South Africa as a bilingual English-Afrikaans country persisted for many years. Until 1994, language policy was decided by the apartheid regime and imposed on all inhabitants of South Africa and on all of their languages. The Constitutional Assembly of the post-apartheid Republic of South Africa adopted a new Constitution in 1996 which, at least in writing, is probably more generous to multilingualism than any other Constitution in the world: no fewer than eleven official languages have formally been adopted. The obvious challenge is how to move away from an apartheid language ideology to a post-apartheid one, not only in rhetoric but also in actual practice. Another reason for focusing on South Africa derives from the concept of language as a core value of culture (cf. Smolicz, 1980; 1992). According to Smolicz and other researchers, the home or ancestral language of socioculturally dominated groups in a multicultural society may or may not be a core value of culture for such groups. In South Africa, where, from a demographic perspective, socioculturally dominant groups have been minority rather than majority groups, a most interesting continuum of attitudes towards this issue emerges.
Native speakers of English adhere to the concept of language as a core value of culture more commonly than any other group in South Africa, even to such a degree that they often have a monolingual habitus. Most commonly, native speakers of Afrikaans consider Afrikaans to be a major value or even the core value of their culture; in many Afrikaans-speaking homes, however, English is spoken as well. One of the consequences of the apartheid regime has been that indigenous African languages, spoken by the majority of the people in South Africa, have been stigmatized to such a degree that their own speakers often hold them in low esteem. As a result, African languages are conceived as core values of culture by their native speakers to a much lower degree. At the extreme of the continuum, Indian languages are rarely conceived as core values of culture by Indian South Africans, at least in terms of communicative use: most Indian South Africans speak English at home, although for many of them Indian languages hold symbolic value. This publication is divided into three chapters. The first chapter outlines the new constitutional context of multilingualism in South Africa since the end of apartheid. It also examines the present distribution of languages in South Africa in general and in KwaZulu-Natal in particular, and the outcomes and shortcomings of available census data on language use. Moreover, the status of Afrikaans, English, African languages and Indian languages, respectively, is discussed in a historical context. From 1996 to 1999, a joint research project was carried out by the Department of Afrikaans and Nederlands at Natal University in Durban and by Babylon, Center for Studies of Multilingualism in the Multicultural Society at Tilburg University in the Netherlands, in order to collect data on the languages that primary school children in the greater Durban metropolitan area come into contact with at home and at school.
In 1996 and in 1998, more than 10,000 children participated in a large-scale survey. Chapter 2 gives an overview of the aims, method and sample of the survey. It also describes which languages are used at home and which languages children would like to learn. Profiles are drawn up of the 10 most frequently mentioned home languages in terms of five dimensions, i.e. language repertoire, language proficiency, language choice, language dominance and language preference. A crosslinguistic comparison based on these profiles reveals the positions of these languages relative to one another. The final chapter is an epilogue to the previous two chapters. It takes up the political context of multilingualism and language planning in the years of waning apartheid, and deals with the rhetoric and the actual practice of multilingualism, in particular in the context of education.
The metropolitan stratification of languages in South Africa and the nature of the interaction between languages in contact are in urgent need of investigation. The greater metropolitan area of Durban in KwaZulu-Natal offers a context par excellence for the empirical investigation of multilingualism at home and at school. First of all, the whole range of languages, including English, Afrikaans, African languages and Indian languages, plays a role in this multicultural area, probably more so than anywhere else in South Africa. Second, according to many people involved, Durban is the last British outpost in South Africa; nevertheless, African languages, Indian languages and Afrikaans are undoubtedly in strong competition with English in this area. Third, the University of Natal at Durban and Tilburg University in the Netherlands have an agreement of cooperation and have been working together in this domain of research for a number of years (see, e.g., Extra and Maartens, 1998). The reported findings of the Durban Language Survey point to interesting patterns of language variation.
The multitude of languages that the children bring to the classrooms and the bi-/multilingual home environment of many children will come as a surprise to educational planners who have not made any provision for this in the educational system. The survey has the potential to be an important and extensive source of data on language and the primary school child in KwaZulu-Natal. The knowledge this brings is a prerequisite for any strategic educational planning in this large and educationally underdeveloped area.

The Lexicographic Treatment of Sublexical and Multilexical Items in a Northern Sotho Monolingual Dictionary: A Challenge for Lexicographers

Article

Oct 2011
LEXIKOS

Motlokwe Clifford Mphahlele

Abstract: Dictionaries have in the past used a word-based approach in which sublexical and multilexical items were not regarded as lemmata. Metalexicography, as the theoretical component of lexicography, requires that sublexical and multilexical items be lemmatized and treated as independent lemmata in the macrostructure of dictionaries. One of the greater challenges in compiling a better, user-oriented Northern Sotho monolingual dictionary is to treat sublexical and multilexical items as macrostructural elements, and in treating these items the lexicographer faces quite a number of challenges. This article proposes possible ways in which sublexical and multilexical elements could be successfully treated in a Northern Sotho monolingual dictionary. Taking stock of these challenges, the writer offers suggestions that would assist lexicographers in the compilation of a user-friendly, lexicon-based monolingual dictionary that would lead users to successful information retrieval. Keywords: SUBLEXICAL ITEMS, MULTILEXICAL ITEMS, AFFIXES, PREFIXAL MORPHEMES, SUFFIXAL MORPHEMES, INTEGRATED MICROSTRUCTURE, WORD-BASED APPROACH, LEMMATIZATION, COLLOCATIONS, COMPOUNDS, COMPLEXES, GROUP PREPOSITIONS, FIXED EXPRESSIONS, MORPHEMES, METALEXICOGRAPHICAL ASPECTS, WORD-FORMATION PROCESSES, WORD-INTERNAL FUNCTION, USER-ORIENTED MONOLINGUAL DICTIONARY, GRAMMAR, HOMONYMOUS LEMMA, POLYSEMOUS LEMMA, DECODING, ENCODING, SEMANTIC COMMENT, SEMANTIC TRANSPARENCY Afrikaans summary: The lexicographic treatment of sublexical and multilexical items in a Northern Sotho monolingual dictionary: A challenge for lexicographers. In the past, dictionaries followed a word-based approach in which sublexical and multilexical items were not regarded as lemmas. Metalexicography, as the theoretical component of lexicography, requires that sublexical and multilexical items be lemmatized and treated as independent lemmas in the macrostructure of dictionaries.
One of the greater challenges in compiling a better, user-oriented Northern Sotho monolingual dictionary is to treat sublexical and multilexical items as macrostructural elements. In treating these items, the lexicographer is confronted with quite a number of challenges. This article proposes possible ways in which sublexical and multilexical elements can be treated successfully in a Northern Sotho monolingual dictionary. Taking these challenges into consideration, the writer offers suggestions that will help lexicographers compile a user-friendly, lexicon-based monolingual dictionary that will lead users to successful information retrieval. Keywords: SUBLEXICAL ITEMS, MULTILEXICAL ITEMS, AFFIXES, PREFIXAL MORPHEMES, SUFFIXAL MORPHEMES, INTEGRATED MICROSTRUCTURE, WORD-BASED APPROACH, LEMMATIZATION, COLLOCATIONS, COMPOUNDS, COMPLEXES, GROUP PREPOSITIONS, FIXED EXPRESSIONS, MORPHEMES, METALEXICOGRAPHICAL ASPECTS, WORD-FORMATION PROCESSES, WORD-INTERNAL FUNCTION, USER-ORIENTED MONOLINGUAL DICTIONARY, GRAMMAR, HOMONYMOUS LEMMA, POLYSEMOUS LEMMA, DECODING, ENCODING, SEMANTIC COMMENT, SEMANTIC TRANSPARENCY

The Lexicographic Treatment of the Demonstrative Copulative in Sesotho sa Leboa – An Exercise in Multiple Cross-referencing

Article

Oct 2011
LEXIKOS

Abstract: In this research article an in-depth investigation is presented of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This one case study serves as an example to illustrate the so-called 'paradigmatic lemmatisation' of closed-class words in the African languages. The need for such an approach follows from a discussion, in Sections 1 and 2 respectively, of the present and missing directions in African-language metalexicography. A theoretical conspectus of the DC in Sesotho sa Leboa is then offered in Section 3, while Section 4 examines the treatment of the DC in the four existing desktop dictionaries for this language. The outcomes from these two sections are then used in Section 5, which analyses the problems of and options for a sound lexicographic treatment of the DC in bilingual and monolingual dictionaries. The next two sections review the practical implementation of the DC lemmatisation suggestions in PyaSsaL, i.e. the Pukuntšutlhaloši ya Sesotho sa Leboa 'Explanatory Sesotho sa Leboa Dictionary', with Section 6 focussing on the hardcopy and Section 7 on the online version. In the process, the very first fully monolingual African-language dictionary on the Internet is introduced. Section 8, finally, concludes briefly. Keywords: LEXICOGRAPHY, PARADIGMATIC LEMMATISATION, AFRICAN LANGUAGES, SESOTHO SA LEBOA (NORTHERN SOTHO, SEPEDI), DEMONSTRATIVE COPULATIVE, CROSS-REFERENCING, CORPUS, MONOLINGUAL DICTIONARY, BILINGUAL DICTIONARY, HARDCOPY, ONLINE, INTERNET, EXPLANATORY SESOTHO SA LEBOA DICTIONARY (PYASSAL), SIMULTANEOUS FEEDBACK (SF)
Northern Sotho summary: The lexicographic treatment of the demonstrative copulative in a Sesotho sa Leboa dictionary: An exercise in multiple cross-referencing. In this research article, an in-depth investigation is presented of the lemmatisation and treatment of the demonstrative copulative in a Sesotho sa Leboa dictionary. This case study serves as an example to illustrate what is called paradigmatic lemmatisation (tokelo ya mantšu ka lenaneo) of closed-class words in the African languages. The need for this type of investigation follows a discussion of the present and missing directions in the theoretical lexicography of the African languages, which is presented in Sections 1 and 2. A theoretical description of the demonstrative copulative in Sesotho sa Leboa is given in Section 3, while Section 4 examines the lemmatisation and treatment of the demonstrative copulative in the four existing dictionaries for this language. The results of Sections 3 and 4 are used in Section 5, which analyses the problems of and options for the lexicographic treatment of the demonstrative copulative in bilingual and monolingual dictionaries. The two sections that follow continue with an evaluation of the implementation of the suggestions for the treatment of the demonstrative copulative in PyaSsaL, i.e. the Pukuntšutlhaloši ya Sesotho sa Leboa. Section 6 deals with the paper version, while Section 7 deals with the Internet version. In doing so, the very first monolingual dictionary of an African language on the Internet is introduced. Finally, Section 8 gives a brief conclusion. Keywords: LEXICOGRAPHY, PARADIGMATIC LEMMATISATION, AFRICAN LANGUAGES, SESOTHO SA LEBOA, DEMONSTRATIVE COPULATIVE, CROSS-REFERENCING, CORPUS, MONOLINGUAL DICTIONARY, BILINGUAL DICTIONARY, PAPER DICTIONARY, ONLINE, INTERNET, PUKUNTŠUTLHALOŠI YA SESOTHO SA LEBOA (PYASSAL), SIMULTANEOUS FEEDBACK (SF)

A GF miniature resource grammar for Tswana: modelling the proper verb

Article

Mar 2017

The Grammatical Framework (GF) not only offers state-of-the-art grammar-based machine translation support between an increasing number of languages through its so-called Resource Grammar Library, but is also fast becoming a de facto framework for developing multilingual controlled natural languages (CNLs). For a natural language to share maximally in the opportunities that GF-based multilingual CNL support presents, it has to have a GF resource grammar. Tswana, an agglutinating Bantu language spoken in Southern Africa and one of the eleven official languages of South Africa, does not yet have such a grammar. This article reports on the development of a so-called miniature resource grammar, a first step towards a full resource grammar for Tswana. The focus is on the modelling of the Tswana proper verb as it occurs in simple sentences. The (proper) verb is the morphologically most complex word category in Tswana, and modelling it therefore constitutes a notable contribution towards the development of a GF resource grammar for Tswana. The computational model is discussed in some detail, implemented, and tested on a systematically constructed treebank.

The Bantu attribute noun class prefixes and their suffixal counterparts, with special reference to Zulu

Article

Mar 2004

Linkie Mohlala

The Alleged Class 2a Prefix bO in Eton: A Plural Word

Article

Feb 2005

Mark Van De Velde

Tswana Finite State Tokenisation

Article

Dec 2014
LANG RESOUR EVAL

Tswana, a Bantu language in the Sotho group, is characterised by an agglutinative morphology and a disjunctive orthography, which mainly affects the verb category. In particular, verbal prefixes are usually written disjunctively, while suffixes follow a conjunctive writing style. Therefore, Tswana tokenisation cannot be based solely on whitespace, as is the case in many alphabetic, segmented languages, including the conjunctively written Nguni group of South African Bantu languages. This paper shows how two finite-state tokeniser transducers and a finite-state morphological analyser are combined to solve the Tswana (verb) tokenisation problem. The approach has the important advantage of bringing the processing of Tswana, beyond the morphological analysis level, in line with what is appropriate for the Nguni languages. This means that the challenge of the disjunctive orthography is met at the tokenisation/morphological analysis level and does not in principle propagate to subsequent levels of analysis such as POS tagging and shallow parsing. The tokenisation approach is novel and, when implemented and evaluated, yields an F1-score of 95% with respect to a hand-tokenised gold standard.
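To make the disjunctive-orthography problem concrete, the following Python sketch merges disjunctively written verbal prefixes with the verb stem they precede into a single token. It is not the authors' finite-state method: a greedy left-to-right scan stands in for their composed transducers, and the tiny prefix and verb-stem sets are hypothetical, chosen around the Tswana sentence "Ke a go rata" ('I love you').

```python
# Hypothetical mini-lexicons, for illustration only: a real system would
# obtain this information from a full morphological analyser.
PREFIXES = {"ke", "o", "ba", "a", "tla", "go", "mo"}  # concords and tense markers
VERB_STEMS = {"rata", "bona", "tsamaya"}  # 'love', 'see', 'go/walk'


def tokenise(sentence: str) -> list[str]:
    """Merge runs of disjunctively written prefixes with a following verb.

    A run of words from PREFIXES becomes part of one verbal token only if
    the word after the run is in VERB_STEMS; otherwise each word is
    emitted as a token of its own.
    """
    words = sentence.lower().split()
    tokens: list[str] = []
    i = 0
    while i < len(words):
        j = i
        # Extend the candidate span over consecutive prefix morphemes.
        while j < len(words) and words[j] in PREFIXES:
            j += 1
        if j > i and j < len(words) and words[j] in VERB_STEMS:
            # Prefixes plus verb stem form a single verbal token.
            tokens.append(" ".join(words[i : j + 1]))
            i = j + 1
        else:
            # No verb follows (or no prefix run starts here): emit the word as-is.
            tokens.append(words[i])
            i += 1
    return tokens


print(tokenise("Ke a go rata"))  # ['ke a go rata']
```

A greedy scan like this cannot resolve genuine ambiguity, for instance a prefix-like word that does not belong to a verb, which is precisely where a finite-state morphological analyser earns its place in the pipeline.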

THE DISTINCTION BETWEEN ABSOLUTE AND RELATIVE TENSES WITH REFERENCE TO ZULU (AND OTHER BANTU LANGUAGES)

Article

Lionel Posthumus

Aims: The first aim of this article is to distinguish between the two types of tense system in Zulu, namely absolute and relative tense. Secondly, it is essential to analyse the individual tense forms distinguished within each of these tense systems within an adequate framework and to adopt appropriate terminology to refer to the array of tense forms of Zulu (and the other Bantu languages).

The complexity of language change: The case of Ancient Hebrew

Article

Dec 2012

Jacobus A Naudé

This article develops a theory of language change and diffusion in the light of new developments in contemporary linguistics on the themes of language evolution and the rise of linguistic complexity. The core assumptions of this article are, first, that a language inevitably changes and diffuses over time and, second, that a language inherently displays variation, which originates in geography or in the idiolect of a single speaker. The article focuses on the description and explanation of linguistic variation in Biblical Hebrew. The traditional division of Biblical Hebrew into chronological periods assumes that both spoken and written Hebrew constituted a single homogeneous variety at any given time, which transitioned into another variety over time by means of exceptionless sound changes (Hurvitz, 2006). These (Neogrammarian) assumptions concerning language change and diffusion are not psychologically feasible. However, the claims of Young et al. (2008), that the language features used by Hurvitz to distinguish pre-exilic from post-exilic Hebrew are no more than manifestations of synchronic styles available to biblical authors, are also not psychologically real.

Adverbial descriptions in Northern Sotho

Article

Oct 2012

W. J. Pretorius

Owing to the virtual lack of a ‘basic adverbs’ word class in Northern Sotho, the objective of this article is to investigate some of the language strategies that are exploited to express adverbial meanings in this language. Popular Northern Sotho grammar books clearly state that adverbs in Northern Sotho are usually derived from other parts of speech. Important language strategies, such as the use of auxiliary word groups and ideophones to express adverbial meanings, are to a great extent neglected in most popular Northern Sotho grammars. Because in most cases no one-to-one lexical equivalence exists between Northern Sotho adverbs and, for instance, English adverbs, compilers of Northern Sotho-English dictionaries find it difficult to accommodate the different strategies in these dictionaries. In this article, various ways of conveying adverbial expressions in Northern Sotho are illustrated, with the focus mainly on auxiliary word groups and ideophones. The aim of this article is not to provide a grammatical analysis of the adverb and ideophone word classes as such, but rather to focus on the descriptive function of these words in certain Northern Sotho contexts.

Locative trigrams in Northern Sotho, preceded by analyses of formative bigrams

Article

Jan 2006
LINGUISTICS

In Northern Sotho, one of the strategies to express locality makes use of locative particle groups, i.e. complements preceded by any of the so-called locative particles ka, kua, mo, ga or go. Current linguistic descriptions shy away from cases in which sequences of such particles are employed. In this article these sequences are termed “locative n-grams” and are studied for the first time. It will be shown that, synchronically, just a handful of locative trigrams and bigrams actually occur in a relatively large corpus. An in-depth study of the examples allows stock to be taken of the existing structures, provides data regarding the distribution of all the n-grams, and hints at the semantic content as well as the restrictions placed on the nature of the complements. In order to get clarity on the latter two aspects, a diachronic approach is often pursued. As a by-product, the study of the higher-order n-grams also brings hitherto overlooked features of the unigrams to light. The main research question that drove this investigation was thus whether or not higher-order locative n-grams exist in Northern Sotho. As the answer was found to be positive, the major objective became to describe the structures found in detail, drawing on corpus data.
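A first pass at finding candidate locative n-grams in a corpus amounts to collecting maximal runs of consecutive locative particles. The Python sketch below assumes whitespace-tokenised text and treats the five particles listed above as the only relevant forms; it is an illustration, not the article's procedure. Because particles such as go and ga also have non-locative functions, every hit would still need manual inspection.

```python
from collections import Counter

# The locative particles listed in the abstract above.
LOCATIVE_PARTICLES = {"ka", "kua", "mo", "ga", "go"}


def locative_ngrams(text: str, min_n: int = 2) -> Counter:
    """Count maximal runs of at least min_n consecutive locative particles."""
    counts: Counter = Counter()
    run: list[str] = []
    # An empty-string sentinel flushes a run that ends the text.
    for word in text.lower().split() + [""]:
        if word in LOCATIVE_PARTICLES:
            run.append(word)
            continue
        if len(run) >= min_n:
            counts[tuple(run)] += 1
        run = []
    return counts


# Phrase taken from the Northern Sotho summary earlier on this page.
print(locative_ngrams("ka mo pukuntšung ya Sesotho sa Leboa"))
# Counter({('ka', 'mo'): 1})
```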

The nanosyntax of Nguni noun class prefixes and concords

Article

Jun 2010
LINGUA

Tarald Taraldsen

This article shows that once it is accepted that a single morpheme can lexicalize a “span” of heads rather than a single head, it becomes possible to establish the complex structures underlying noun class prefixes and agreement markers in Nguni (Xhosa, Zulu, Ndebele and Swati) in a mechanical way, based on the distributional properties of the morphemes involved. These structures, together with general principles of lexicalization, turn out to make an accurate prediction about the syncretism patterns observed among the different types of agreement markers (concords), and by comparing Nguni to other Southern Bantu languages we are also led to conclude that the size of nominal projections must be a locus of parametric variation.

On the development of a tagset for Northern Sotho with special reference to the issue of standardisation

Article

Jul 2008

Working with corpora in the South African Bantu languages has until now been limited to the utilisation of raw corpora. Such corpora, however, have limited functionality. Thus the next logical step in any NLP application is the development of software for the automatic tagging of electronic texts. The development of a tagset is one of the first steps in corpus annotation. The authors of this article argue that the design of a tagset cannot be isolated from the purpose of the tagset, or from the place of the tagset and its design within the bigger picture of the architecture of corpus annotation. Usage-related aspects therefore feature prominently in the design of the tagset for Northern Sotho. It is explained why this proposed tagset is biased towards human readability rather than machine readability, the choice of a stochastic tagger is motivated, and the relationship between tokenising, tagging, morphological analysis and parsing is discussed. In order to account, at least to some extent, for the morphological complexity of Northern Sotho at the tagging level, a multilevel annotation is opted for: the first level comprising obligatory information and the second optional and recommended information. Finally, aspects of standardisation are considered against the background of reuse, of sharing of resources, and of possible adaptation for use by other disjunc-

A Comparison of Approaches to Word Class Tagging: Disjunctively vs. Conjunctively Written Bantu Languages

Article

Jan 2006

Northern Sotho and Zulu are two South African Bantu languages that use different writing systems, viz. a disjunctive and a conjunctive writing system respectively. In this article it is argued that the different orthographic systems obscure the morphological similarities between the two languages and that these systems impact directly on word class tagging for them. It is illustrated not only that different approaches are needed for word class tagging, but also that the sequencing of tasks is to a large extent determined by the difference in writing systems.

The Lexicographic Treatment of the Demonstrative Copulative in Sesotho sa Leboa — An Exercise in Multiple Cross-referencing

Article

Dec 2004

In this research article an in-depth investigation is presented of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This one case study serves as an example to illustrate the so-called 'paradigmatic lemmatisation' of closed-class words in the African languages. The need for such an approach follows a discussion, in Sections 1 and 2 respectively, of the present and missing directions in African-language metalexicography. A theoretical conspectus of the DC in Sesotho sa Leboa is then offered in Section 3, while Section 4 examines the treatment of the DC in the four existing desktop dictionaries for this language. The outcomes from the two latter sections are then used in Section 5, which analyses the problems of and options for a sound lexicographic treatment of the DC in bilingual and monolingual dictionaries. The next two sections proceed with a review of the practical implementation of the DC lemmatisation suggestions in PyaSsaL, i.e. the Pukuntšutlhaloši ya Sesotho sa Leboa 'Explanatory Sesotho sa Leboa Dictionary', with Section 6 focussing on the hardcopy and Section 7 on the online version. In the process, the very first fully monolingual African-language dictionary on the Internet is introduced. Section 8, finally, concludes briefly.

A word-class tagset for Setswana

Article

Nov 2003

This paper aims to present a general tagset for use in an automatic word-class tagger, functioning largely at the level of word-classes, rather than pure morphological information. In view of the importance of reusability, guidelines and standards for tagsets are identified, concentrating on the standards proposed by the Expert Advisory Group on Language Engineering Standards (EAGLES) within the framework of the European Union language technology initiatives. Certain criteria for both tagsets and tag labels are identified. Thereafter, problems and solutions for tokenisation in Setswana are discussed, with emphasis on the challenge presented by the disjunctive orthography and the agglutinative character of Bantu languages. The bulk of the article is then devoted to the development of a tagset for the various part-of-speech categories of Setswana, as a test for the extent to which the EAGLES standards can be adopted and adjusted to make them suitable for an agglutinating language. The conclusion is that this is indeed possible to a large extent, with minor elaborations necessary, in particular as far as the disjunctively written prefixes of verbs are concerned.

Towards a better understanding of the nature of the word in African languages

Article

Jun 2007

George Poulos

The analysis of the core linguistic element, the ‘word’, has been a debatable issue in African languages ever since these languages were first written, and to this day linguists differ in their opinions on what truly constitutes a word in these languages. The very fact that the official languages of South Africa do not all conform to one specific writing system bears testimony to the differences of opinion that have prevailed in the analyses of these languages. The African languages are primarily agglutinating in structure, which should be reflected in a common writing system rather than in the distinctly diverse disjunctive and conjunctive systems that have prevailed for over a century. In an article by Louwrens and Poulos (2006), the shortcomings of the disjunctive system of writing are discussed in detail. In the current article, the focus shifts to languages that use the conjunctive system of writing, and certain relevant issues of word structure are also drawn from other language types, for example from an inflectional language such as Greek. It is believed that, by carrying out this type of typological investigation, we might move closer towards a clearer understanding of the criteria that govern word boundaries in our African languages.

Relative clause formation in the Bantu languages of South Africa

Article

Feb 2004

Jochen Zeller

This article discusses (verbal) relative clauses in the Bantu languages spoken in South Africa. The first part of the article offers a comparison of the relative clause formation strategies in Sotho, Tsonga, Nguni and Venda. An interesting difference between these language groups concerns the syntactic position and the agreement properties of the relative marker. Whereas the relative markers in Sotho, Tsonga and Venda are clause-initial elements, which express agreement with the head noun, the relative markers in the Nguni languages are relative concords, which are prefixed to the verb and agree with the subject of the relative clause. The second part of the article addresses this difference and shows that there is a historical relation between these two types of relative constructions. It is argued that earlier forms of Nguni employed relative markers similar to those used in present-day Sotho and Tsonga. In Nguni, these relative markers underwent a grammaticalisation process which turned them into relative concords. A detailed analysis of the syntactic conditions for, and the properties of, this grammaticalisation process leads to a hypothesis about the reasons why relative concords have developed in Nguni, but not (to the same extent) in Tsonga, Sotho and Venda.

Electronic Dictionaries viewed from South Africa

Article

Mar 2017

D. J. Prinsloo

The aim of this article is to evaluate currently available electronic dictionaries from a South African perspective for the eleven official languages of South Africa, namely English, Afrikaans and the nine Bantu languages Zulu, Xhosa, Swazi, Ndebele, Northern Sotho, Southern Sotho, Tswana, Tsonga and Venda. A brief discussion of the needs and status quo for English and Afrikaans will be followed by a more detailed discussion of the unique nature and consequent electronic dictionary requirements of the Bantu languages. In the latter category the focus will be on problematic aspects of lemmatisation which can only be solved in the electronic dictionary dimension.

Grammar-based tools for the creation of tagging resources for an unresourced language: the case of Northern Sotho

Article

We describe an architecture for the parallel construction of a tagger lexicon and an annotated reference corpus for the part-of-speech tagging of Northern Sotho, a Bantu language of South Africa, for which no tagged resources have been available so far. Our tools make use of grammatical properties (morphological and syntactic) of the language. We use symbolic pretagging, followed by stochastic tagging, an architecture which proves useful not only for the bootstrapping of tagging resources, but also for the tagging of any new text. We discuss the tagset design, the tool architecture and the current state of our ongoing effort.

On the relation between noun prefixes and grammaticalisation in Nguni relative clauses*

Article

Aug 2006
Stud Ling

Jochen Zeller

This paper discusses morphological and syntactic aspects of relative clauses in two related Southern Bantu language groups. In Sotho-Tswana, object relative clauses are formed by means of clause-initial relative complementisers which agree with the head noun. In contrast, object relatives in the Nguni languages are formed by means of relative concords which are attached to the relative clause predicate and express agreement with the subject. I suggest that the Nguni relative concords are the result of a grammaticalisation process in which early Nguni relative complementisers first turned into clitics and then into relative concords. On the basis of a detailed analysis of this process I further argue that the syntactic difference between Sotho-Tswana and Nguni relative clauses is correlated with a morphological difference between nouns in these languages.

On the development of a tagset for Northern Sotho with special reference to the issue of standardization

Article

Jan 2008

Designing a verb guesser for part of speech tagging in Northern Sotho

Article

Oct 2008

The aim of this article is to describe the design and implementation of a verb guesser that will enhance the results of statistical part of speech (POS) tagging of verbs in Northern Sotho. It will be illustrated that verb stems in Northern Sotho can successfully be recognised by examining their suffixes and combinations of suffixes. Two approaches to verbal derivation analysis will be utilised, namely morphological analysis and corpus querying of suffixes and combinations of suffixes.
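The suffix-based recognition described above can be sketched in a few lines of Python. The suffix list is a hypothetical, deliberately small sample (causative -iša, applicative -ela, perfect -ile, passive -wa and the combination -išwa), not the inventory used in the article, and the example words are taken from the Northern Sotho summary earlier on this page (šomišwa 'be used', hlalošwa 'be explained'). A match is only meant as a hint passed to the statistical tagger.

```python
from typing import Optional

# Hypothetical sample of verbal suffixes and suffix combinations, each with
# an analysis label; not the suffix inventory used in the article.
VERB_SUFFIXES = [
    ("išwa", "causative+passive"),
    ("iša", "causative"),
    ("ela", "applicative"),
    ("ile", "perfect"),
    ("wa", "passive"),
]


def guess_verb(word: str) -> Optional[str]:
    """Return a label for the longest matching verbal suffix, or None.

    A match is only a hint for the tagger: other parts of speech can share
    these endings, and verbs without an extension are not recognised.
    """
    word = word.lower()
    # Try longer suffixes first so that -išwa wins over -wa.
    for suffix, label in sorted(VERB_SUFFIXES, key=lambda s: len(s[0]), reverse=True):
        # Require at least two characters of stem before the suffix.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return label
    return None


for w in ["šomiša", "šomišwa", "hlalošwa", "šoma"]:
    print(w, guess_verb(w))
```

Trying suffixes longest-first mirrors the idea of examining combinations of suffixes rather than single endings. A corpus-derived suffix list, as proposed in the article, would replace the hand-picked one used here.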

Describing Verbs in Disjoining Writing Systems

Conference Paper

Sep 2005

Many Bantu languages, especially in Southern Africa, have a writing system in which most verb morphemes preceding the verb stem, and some suffixes, are written as separate words. These languages also have other writing conventions that differ from those used for related languages. These two systems are conventionally called disjoining and conjoining writing systems. Disjoining writing can be considered simply an under-specified way of writing, but for computational description it is a challenge, especially if the system allows only continuous sequences of characters to be recognised as units of analysis. In order to reduce unnecessary ambiguity, verb morphemes should be isolated from strings of characters that are real words.

Finite state tokenisation of an orthographical disjunctive agglutinative language: The verbal segment of Northern Sotho

Conference Paper

Jan 2006

Tokenisation is an important first pre-processing step required to adequately test finite-state morphological analysers. In agglutinative languages, morphemes are added concatenatively to form a complete morphological structure. Disjunctive agglutinative languages like Northern Sotho write these morphemes, for certain morphological categories only, as separate words delimited by spaces or line breaks. These breaks are, by their nature, different from the breaks that separate “words” that are written conjunctively. A tokeniser is therefore required to isolate categories such as the verb from raw text before they can be correctly analysed morphologically. The authors have successfully produced a finite-state tokeniser for Northern Sotho, in which verbal segments are written disjunctively but nominal segments conjunctively. The authors show that, since reduplication in Northern Sotho does not affect the pre-processing tokeniser, the disjunctive standard verbal segment in Northern Sotho is deterministic, finite-state and a regular (Type 3) language in the Chomsky hierarchy, and that the copulative verbal segment, owing to its semi-disjunctivism, is ambiguously non-deterministic.

Verbal extension sequencing: An examination from a computational perspective

Conference Paper

Jan 2007

Lexical transducers utilise a two-level finite-state network to simultaneously code morphological analysis and morphological generation rewrite rules. Multiple extensions following the verb root can be morphologically analysed as a closed morpheme class using different computational techniques. Analysis of a multiple extension sequence is achieved by trivial analysis, based on any combination of the closed class members, but this produces unnecessary over-generation of lexical items, many of which may not occur in a lexicon. Limiting the extension combinations in an attempt to represent examples that may actually exist in terms of both the possible number of extensions in a sequence and the relative ordering of the extensions, leads to a radical reduction in the generation of lexical items while the ability to analyse adequately is maintained. The presenters highlight details of their findings as well as the testing of possible extension sequences and morphophonemic alternations of extensions for Northern Sotho, garnered from literature research, lexicographic investigation and the computational morphological analysis of texts.
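Why ordering constraints shrink the set of generated forms so sharply can be shown with a small count. The Python sketch below assumes four extensions, each used at most once, and adopts the causative-applicative-reciprocal-passive template often cited for Bantu as a stand-in ordering. Both the labels and the template are assumptions for illustration, not the Northern Sotho sequences reported by the presenters.

```python
from itertools import combinations, permutations

# Assumed template order; labels are illustrative, not the presenters' data.
TEMPLATE = ["causative", "applicative", "reciprocal", "passive"]


def unrestricted(exts: list[str]) -> int:
    """Count sequences of distinct extensions in any order (incl. empty)."""
    return sum(len(list(permutations(exts, k))) for k in range(len(exts) + 1))


def template_ordered(exts: list[str]) -> int:
    """Count sequences that respect the template order (incl. empty)."""
    # combinations() keeps input order, i.e. template order, so each subset
    # contributes exactly one sequence.
    return sum(len(list(combinations(exts, k))) for k in range(len(exts) + 1))


print(unrestricted(TEMPLATE), template_ordered(TEMPLATE))  # 65 16
```

In this toy setting, fixing the order cuts 65 candidate sequences to 16. A model built on attested combinations would also have to allow for any deviations from the template that actually occur.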

A Grammatical Analysis of the Tswana Adverbial

Article

Jurie Le Roux

Experiences in data collection for the training of an automatic speech recognizer in Sepedi

Conference Paper

Nov 2002

This paper reports on the process, experiences and statistics of collecting Northern Sotho (Sepedi) telephone speech data for the training of an automatic speech recogniser. A toll-free telephone speech recording platform was used for the data collection. An analysis of the collected speech data together with observations made during supervised calls are included before discussing preliminary results.

Recommended publications

A Linguistic Analysis of Zulu (590 pages)

A Linguistic Analysis of Venda (605 pages)