Language documentation 20 years on

In the last decade of the 20th century a new sub-field of linguistics emerged that has come to be known as ‘language documentation’ or ‘documentary linguistics’ (Himmelmann 1998, 2002, 2006, Lehmann 2001, Austin 2010a, Grenoble 2010, Woodbury 2003, 2011). In this paper we explore how it was defined in the seminal work of Himmelmann (1998) and others, including what were presented as significant characteristics that distinguished language documentation from language description, and how the field has changed and evolved over the past 20 years. A focus on best practices, standards, tools and models for documentary corpora appeared in the early years, which led later to more critical discussions of the goals and methods of language documentation. The paper examines some current developments, including new approaches to language archiving, and suggests that there are opportunities for language documentation to adopt a more socially-engaged approach to languages and to linguistic research, including better engagement with language revitalisation. There are also opportunities to work towards addressing what is currently a language documentation output gap through experimentation with new genres and innovations in writing and publication.
Peter K. Austin
SOAS, University of London
submitted: 2015-03-27
revised: 2015-06-09
To appear in Martin Pütz and Luna Filipovic (eds.) Endangered Languages across the
planet: Issues of ecology, policy and documentation. Amsterdam: John Benjamins.
1 Defining language documentation¹
Language documentation (also known by the term ‘documentary linguistics’) aims, according
to the seminal definition in Himmelmann (1998: 161), ‘to provide a comprehensive record of
the linguistic practices characteristic of a given speech community... This... differs
fundamentally from... language description [which] aims at the record of a language... as a
system of abstract elements, constructions, and rules.’ Himmelmann (2006) presents it as the
subfield of linguistics that is ‘concerned with the methods, tools, and theoretical
underpinnings for compiling a representative and lasting multipurpose record of a natural
language or one of its varieties’ (Himmelmann 2006:v). Language documentation is by its
nature multi-disciplinary, and as Woodbury (2011) notes, it is not restricted to theory and
methods from linguistics but draws on ‘concepts and techniques from linguistics,
ethnography, psychology, computer science, recording arts, and more’ (see Harrison 2005,
Coelho 2005, Eisenbeiss 2005 for arguments).
Documentary linguistics has developed over the past 20 years as a response to the
growing realisation among linguists, dating from the late 1980s², that a majority of the
world’s 7,000 languages are endangered, in the sense that they are being spoken by
decreasing and aging populations in reducing numbers of domains and are not being passed
on to the next generation of speakers (Robins and Uhlenbeck 1991, Hale et al. 1992, Crystal
2000, Austin 2007, Whalen 2004, Grenoble 2011). A desire among some researchers to
create a lasting, and potentially unrepeatable, record of language use in its social and cultural
context was one of the driving forces behind the interest in this new approach. This involved
a renewed attention to context, influenced by the ethnography of communication (pioneered
by Hymes 1964), and the discourse-based approach of Sherzer 1987.
There was also a concern from the beginning of language documentation for supporting
speakers and communities who wish to retrieve, revitalise or maintain their languages by
1
This is a revised and extended version of Austin 2014 (published in the student publication JournaLIPP). For
detailed comments on an earlier draft I am grateful to Christine Beier, Aaron Broadwell, Shobhana L. Chelliah,
Lise Dobrin, Lauren Gawne, Anthony Grant, Lenore Grenoble, Guillaume Jacques, Friederike Luepke, Waruno
Mahdi, David Nathan, Willem de Reuse, Julia Sallabank, Norval Smith, Mauro Tosco, Anthony Woodbury,
Joshua Wilbur and an anonymous reviewer; I alone am responsible for any errors.
2
Himmelmann (2008: 339) argues that the trigger was a short presentation by Johannes Bechert at the
fourteenth International Congress of Linguists in East Berlin in 1987. ... [and] a motion drafted by Christian
Lehmann, which was presented to the business meeting of the Comité International Permanent des Linguistes
(CIPL) … [urging] the committee to take action with the goal of bringing the issue of language endangerment
to the attention of professional linguists and the general public. Also important, especially in North America,
was Hale et al. 1992.
providing documentation corpora that could be connected to revitalization work (but see
Section 5 below). Also playing a role were advances in information, media, communication
and archiving technologies (see Nathan 2010a, 2010b and Section 4) which made possible
the collection, analysis, preservation and dissemination of documentary corpora in ways
which were not feasible previously. Language documentation also paid attention to the rights
and needs of language speakers and community members, and encouraged collaborative
approaches that would include their direct involvement in the documentation and support of
their own languages (see Grinevald 2003, Austin 2010, Yamada 2007).
A concurrent and supporting development was the availability of extensive new
funding resources for research on endangered languages from several sources, and the
requirements of these funders to adopt a documentary perspective and to archive the recorded
data and analyses. The new funders included the Endangered Languages Documentation
Programme (ELDP)³ at SOAS (established in 2002 by Arcadia Fund, it has now provided
around 350 documentation grants), the Volkswagen Foundation DoBeS⁴ project (which ran
from 2001 to 2014 and funded 80 projects), and the Documenting Endangered Languages
(DEL)⁵ inter-agency programme of the National Science Foundation and the National
Endowment for the Humanities (established 2005, it has funded 320 projects to date). Other
smaller sources also emerged (the Endangered Language Fund (ELF)⁶, Foundation for
Endangered Languages (FEL)⁷, Gesellschaft für bedrohte Sprachen (GBS)⁸ and Unesco⁹) and
have made more modest grants supporting scores of projects, many of which are community-
based. This new funding influenced the topics that linguists (and others) chose to research,
and the research methods they employed (see Sections 2 and 4 below).
The broader impact on the field of linguistics can be seen in the development of:
• academic journals specialising in language documentation topics (Language
  Documentation and Conservation¹⁰, Language Documentation and Description¹¹),
  and special issues of other linguistics journals dedicated to documentation and
  revitalisation (e.g. Volume 34/4 (2013) of the Journal of Multilingual and
  Multicultural Development¹²);
• specialist conferences, such as the International Conference on Language
  Documentation and Conservation held biennially in Hawaii¹³ and the Language
  Documentation and Linguistic Theory (LDLT)¹⁴ conference held biennially since
  2007 at SOAS;
• workshops and training courses, including the Summer Institutes of
  CoLang/InField¹⁵ run biennially in the United States since 2008, summer schools
  of the 3L consortium (Leiden-London-Lyon) that also commenced in 2008, and
  the LingDy¹⁶ training course held annually at Tokyo University of Foreign Studies
  since 2008;
• specialist MA and PhD programmes at SOAS¹⁷ (Austin 2008), University of
  Hawaii¹⁸, and the increasing introduction of documentation topics in
  undergraduate and postgraduate Linguistics programmes elsewhere;
• a growing number of book publications on topics related to language
  documentation (for an annotated bibliography see Austin 2013);
• increased attention among linguists with a range of interests, objectives and
  theoretical persuasions in issues of data quality, portability, data citation, glossing
  standardization, and data sources (including elicitation, translation, story boarding,
  naturalistic observations, and experimentation).
3
http://www.hrelp.org/grants/, accessed 11 March 2015
4
http://dobes.mpi.nl/dobesprogramme, accessed 12 March 2015
5
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=12816, accessed 27 March 2015
6
http://www.endangeredlanguagefund.org/, accessed 27 March 2015
7
http://www.ogmios.org/index.php, accessed 27 March 2015
8
http://www.uni-koeln.de/gbs/, accessed 27 March 2015
9
http://www.unesco.org/new/en/culture/themes/endangered-languages/, accessed 27 March 2015
10
http://nflrc.hawaii.edu/ldc/, accessed 14 March 2015
11
http://www.elpublishing.org, accessed 10 March 2015
12
http://www.tandfonline.com/toc/rmmm20/34/4#.VRTkmvmUeSo, accessed 27 March 2015
13
http://icldc-hawaii.org/, accessed 27 March 2015
14
http://www.hrelp.org/events/, accessed 27 March 2015
15
http://www.alaska.edu/colang2016/charter/, accessed 27 March 2015
Himmelmann (2006: 15) identified five major characteristics of language
documentation that he proposed would distinguish it from other approaches to the study of
human languages:
• focus on primary data: language documentation concerns the collection and
  analysis of an array of primary language data to be made available for a wide
  range of users (further elaborated in Himmelmann 2012);
• explicit concern for accountability: access to primary data and representations of
  it makes evaluation of linguistic analyses possible and expected;
• concern for long-term storage and preservation of primary data: language
  documentation includes a focus on archiving in order to ensure that documentary
  materials are made available to potential users now and into the distant future;
• work in interdisciplinary teams: documentation requires input and expertise from
  a range of disciplines and is not restricted to linguistics alone;
• close cooperation with and direct involvement of the speech community:
  language documentation requires active and collaborative work with community
  members both as producers of language materials and as co-researchers.¹⁹
The application of these principles results, according to Himmelmann (1998, 2002, 2006), in
the creation of a record of the linguistic practices and traditions of a speech community
together with information about speakers’ metalinguistic knowledge of those practices and
traditions. This is achieved by systematic recording, transcription, translation and analysis of
a variety of spoken (and written) language samples collected within their appropriate social
and cultural context. Analysis within language documentation under this view is aimed at
making the records accessible to a broad range of potential users which includes not only
linguists but also researchers in other disciplines, community members and others, who may
not have first-hand knowledge of the documented language. The record is also intended for
posterity (and hence should be preservable and portable, in the sense of Bird and Simons
2003), and so some level of processing is required. There is a need for systematic recording
of metadata (data about the data) to make the archived materials understandable, findable,
preservable and usable.
16
http://www.aa.tufs.ac.jp/en/training/fieldling-ws/docling, accessed 27 March 2015
17
https://www.soas.ac.uk/linguistics/programmes/malangdocdesc/, accessed 27 March 2015
18
http://ling.hawaii.edu/, accessed 27 March 2015
19
Issues concerning communities, collaboration and ethics of research have been an ongoing thread in papers
published in the journal Language Documentation and Conservation over a number of years.
The core of a language documentation defined in this way was generally understood
to be a corpus of audio and/or video materials with time-aligned transcription, annotation,
and translation into a language of wider communication (Schultze-Berndt 2006), and relevant
metadata on context and use of the materials. Woodbury (2003) argued that the corpus will
ideally cover a diverse range of genres and contexts, and be large, expandable, opportunistic,
portable, transparent, ethical and preservable. Austin (2006a, 2008, 2010) proposes that there
are five activities (not necessarily sequential) which are identifiable in this documentation
approach and which contribute to corpus creation, analysis, preservation and dissemination:
• recording of media and text (including metadata) in context;
• transfer to a data management environment;
• adding value: the transcription, translation, annotation and notation and linking
  of metadata to the recordings;
• archiving: creating archival objects and assigning them access and usage
  rights;
• mobilisation: creation, publication and distribution of outputs, in a range of
  formats for a range of different users and uses.
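To make the notion of a time-aligned, annotated corpus item concrete, the following is a minimal sketch in Python of one recording session with its added value (transcription, translation, metadata). All class names, field names, file names and sample values are hypothetical illustrations; they are not the format of any archive or annotation tool mentioned in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One time-aligned segment of a recording (times in seconds)."""
    start: float
    end: float
    transcription: str
    translation: str  # into a language of wider communication


@dataclass
class Session:
    """A recording plus the value added to it (hypothetical field names)."""
    media_file: str
    metadata: dict = field(default_factory=dict)
    annotations: list = field(default_factory=list)

    def annotated_seconds(self) -> float:
        """Total duration covered by the annotations."""
        return sum(a.end - a.start for a in self.annotations)

    def untranslated(self) -> list:
        """Segments that still lack a translation."""
        return [a for a in self.annotations if not a.translation.strip()]


# Placeholder data only: no real recording or language is represented here.
session = Session(
    media_file="session_001.wav",
    metadata={"genre": "narrative", "setting": "placeholder"},
    annotations=[
        Annotation(0.0, 2.5, "placeholder utterance 1", "placeholder translation 1"),
        Annotation(2.5, 6.0, "placeholder utterance 2", ""),
    ],
)

print(session.annotated_seconds())  # 6.0
print(len(session.untranslated()))  # 1
```

A structure of this kind keeps the recording, its metadata and its annotations together, so that simple checks on how much value has been added (for instance, which segments remain untranslated) can be run before archiving.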
2 Best practices, tools and models
The establishment of the DoBeS project in 2001 saw the emergence of a unified ‘DoBeS
model’ for language documentation that the funded projects were expected to adopt.²⁰ This
included specifications for archival storage, recommendations about recording and analysis
formats, and the development of new software tools to assist with audio and video annotation
(such as ELAN²¹), and the creation and management of metadata (various IMDI tools²²).
Researchers affiliated with DoBeS also proposed general principles (or ‘best practice’) for
language documentation, such as sampling (to meet Himmelmann’s desideratum that the
documentary record should be ‘representative’, see Seifart 2008), data collection methods
(Lüpke 2009) and a typology of data types (Himmelmann 2012).
Definition of best practice, standards, tools and models was also a central goal of the
E-MELD project²³, funded by the National Science Foundation, which ran from 2001 to 2006
and aimed to develop recommendations for metadata, annotation markup, language identification
and linguistic ontology (essentially the sets of labels employed in interlinear glossing). This
resulted in a series of papers²⁴ defining formats for lexical entries (Bell and Bird 2000),
interlinear text (Bird and Liberman 2001, Bowe et al. 2003), paradigms (Penton et al. 2004)
and a generalised ontology for glossing (Farrar et al. 2002, Farrar and Langendoen 2003a, b).
E-MELD set up a ‘School of Best Practices’ (Aristar 2003, Aristar-Dry 2004)²⁵ with case
studies, a reference list of readings and tools, and a classroom ‘designed to offer “lessons”
and tutorials which explain the recommendations of best practices’.
20
http://dobes.mpi.nl/dobesprogramme and
http://www.mpi.nl/corpus/a4guides/a4-guide-dobes-format-encoding.pdf, accessed 10 March 2015.
21
https://tla.mpi.nl/tools/tla-tools/elan/, accessed 10 March 2015.
22
http://www.mpi.nl/IMDI/, accessed 10 March 2015
23
http://emeld.org/, accessed 10 March 2015
24
http://emeld.org/documents/index.cfm#loc-papers, accessed 10 March 2015
25
http://emeld.org/school/index.html, accessed 10 March 2015
Probably the most ambitious attempt to define best practice and what would constitute
a complete documentation of a language is to be found in CELP 2007, which attempted to
define everything that an adequate documentation should cover: all the basic phonology,
morphology, syntactic constructions (in context), and provide a lexicon covering all the basic
vocabulary and important areas of special expertise in the culture, with at least glosses for all
words/morphemes in the corpus, plus a full range of textual genres and registers. It offered a
set of ‘accounting standards’ to determine adequacy, including quantitative measures such as
a figure of 10,000 items for a lexicon, and a text corpus of one million words (around 1200
hours of recorded speech). Other qualitative measures were suggested such as the notion that
research on an endangered language is completed ‘when nothing new is coming up in non-
elicited material and when any apparent lacunae in the phonological system can be shown to
be real and not an accident of data collection’.
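The quantitative side of these ‘accounting standards’ implies a rate of speech that is easy to check. The following minimal sketch in Python uses only the two corpus figures quoted above (one million words, around 1200 hours); the variable names are illustrative and the calculation is simple division, not part of the CELP proposal itself.

```python
# Figures quoted in the text for an 'adequate' text corpus.
corpus_words = 1_000_000
recorded_hours = 1200

# Implied average density of the corpus.
words_per_hour = corpus_words / recorded_hours
words_per_minute = words_per_hour / 60

print(round(words_per_hour))       # 833
print(round(words_per_minute, 1))  # 13.9
```

On these figures the standard assumes an average of roughly 14 words per recorded minute, which gives a sense of how much recording time the proposed corpus size presupposes.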
It is doubtful if linguists would ever suggest it is possible to qualitatively and
quantitatively determine when a research project is ‘complete’ for non-endangered languages,
yet this is precisely what was suggested for language documentation.
Both DoBeS and E-MELD were influential in encouraging linguists to begin to pay
attention to data types, data structures, analytical processes and workflows, together with
preservability and transparency. However, the notion that there was a ‘documentation model’
or a ‘best practice’ (or a small number of ‘best practices’) was questioned by some
researchers, beginning around 2004.
3 Critical responses
The role of archives in defining the goals and values of language documentation was
challenged by Nathan 2004 who introduced the term ‘archivism’ to describe the idea that
quantifiable properties such as recording hours, data volume, and file parameters, and
technical desiderata like ‘archival quality’ and ‘portability’ could be reference points in
assessing the aims and outcomes of language documentation. He argued that these should not
be measures of quality of a documentation project, and that there had been a lack of
discussion of research methodology among language documenters, including about what such
quality measures might be.
Nathan and Austin (2004) addressed the issue of metadata and argued that all value-
adding that researchers provide for the audio or video records they make should be
understood as metadata, and that it should be as rich as possible and designed for the
documentation purpose at hand. This means that metadata should not be constrained by
specifications in the form of an ‘ontology’ or standard minimal set (such as that proposed by
OLAC²⁶). The need for richer metadata and meta-documentation (documentation of the
language documentation) was further elaborated on by Austin (2009, 2013); see also Gawne
et al. (2015).
Two important issues for the definition of language documentation were raised in
2006, namely the difference between documentation and description, which was considered
fundamental in Himmelmann’s seminal paper (see quotation in Section 1 above), and the approach to
audio recording within documentation. Austin 2006b (revised and published as Austin and
Grenoble 2007) noted that, as Himmelmann 1998 made clear, language documentation and
26
http://www.language-archives.org/OLAC/olacms.html, accessed 10 March 2015
description differ in terms of their goals, areas of interest, research methods, workflows, and
outcomes. Language description focusses on languages as sets of structures and systems, and
typically aims to produce grammars, dictionaries, and collections of texts, the intended
audience of which is usually linguistics specialists. By contrast, documentation is discourse-
centered: its primary goal is the representation of a range of instances and types of language
use in their social and cultural context. Although description may draw on a corpus, it
involves analysis of a different order, oriented to providing an understanding of language at a
more abstract level, as a system of elements, rules, and constructions. Austin and Grenoble
(2007: 22) challenged this sharp separation of description and documentation and argued
that:
[d]ocumentation projects must rely on the application of theoretical and descriptive
linguistic techniques in order to ensure that they are usable (i.e. have accessible entry
points via transcription, translation and annotation), as well as to ensure that they are
comprehensive. It is only through linguistic analysis that we can discover that some
crucial speech genre, lexical form, grammatical paradigm or sentence construction is
missing or under-represented in the documentary record. Without good analysis,
recorded audio and video materials do not serve as data for any community of
potential users.
In terms of workflow, they also differ. For description, linguistic knowledge and
decision-making is applied to some event in the real world to make an inscription (e.g. an
audio recording) that is not itself of interest but serves as a source which can then be selected,
analysed and systematised in order to create analytical representations, typically in the form
of lists, summaries and analyses (e.g. statements about phonology, morphology or syntax). It
is these representations which are the main focus of interest and which are then presented and
distributed to users, typically other linguists. For documentation, linguistic and cultural
knowledge and documentary techniques are applied to some event in the real world to make
an inscription (audio or video recording) that recapitulates aspects of the original event (such
as social or spatial relationships; see Nathan 2010a) and is itself a focus of interest (e.g. for
archiving, preservation and distribution). The documentary researcher adds value to the
inscription by making decisions and applying linguistic and other knowledge to create
representations, typically in the form of transcriptions, translations and annotations. These
representations are the second major focus of interest and will be archived and/or mobilized
and distributed. The same representations could, of course, also be the input to the selection
and analytical procedures of description, thereby linking the descriptive outcomes to the
documentary corpus. From this viewpoint, documentation and description are complementary
activities with complementary goals, methods and outcomes.
Nathan (2006) argued that despite the expressed concern by language documenters for
recording language in its social and cultural context, many researchers took an unscientific
approach to audio recording in particular, ignoring issues such as spatiality and microphone
selection and placement. He extended this critique in Nathan (2009, 2010a) and argued for
the need to establish an epistemology for audio recording within language documentation.
A broader critique of documentation and contemporary endangered languages research
can be found in Dobrin et al. (2007) who identify and highlight tendencies towards
objectification of languages, and reliance on familiar qualitative metrics to measure quality,
progress and value. More specifically, they argue that ‘subtle and pervasive kinds of
commoditisation (reduction of languages to common exchange values) abound, particularly
in competitive and programmatic contexts such as grant-seeking and standard-setting where
languages are necessarily compared and ranked’. Bowern (2011: 468) also points to
commoditisation and suggests that ‘community members report sometimes feeling that the
linguist comes in, reifies the language, turns it into a commodity, and then takes it away.’
Dobrin et al. (2007) echo Nathan (2004) in pointing to archivism as problematic, and
join Nathan (2006) in arguing that documentary linguists show little or no knowledge about
recording arts, including microphone types, properties and placement, even though
microphone choice and handling is the single greatest determiner of audio recording quality.
They also note that evidence from archival deposits shows that video tends to be poorly used
by documentary linguists, with video recordings being made without reference to articulated
hypotheses, goals, or methodology, simply because the technology is available, portable and
relatively inexpensive. Finally, in contrast to earlier approaches, they point to diversity as an
important aspect of language documentation. As researchers respond to the unique and
particular social, cultural and linguistic contexts within which the languages they are studying
are spoken or signed, actual documentation projects, as evidenced by grant project proposals
and materials deposited in archives, show a diversity of approaches, techniques,
methodologies, skills and responses. In the last 10 years we also find an increasing diversity
of materials that can be included in corpora, so that alongside the traditional field interviews,
observations, experiments and narrative collections that have been the bread and butter of
documentation and description, we also find materials, much of them created by native
speakers, from YouTube uploads, Twitter feeds, Facebook posts, blogs, email, chat, Skype
calls, and local pedagogy developed for revitalization. Similarly, the outcomes of
documentation are increasingly diverse so that alongside books, papers and archive deposits,
today research projects are also generating YouTube uploads²⁷, Twitter and Facebook posts,
blogs²⁸, multimedia (such as Gayarragi Winangali²⁹), and mobile apps (such as Ma!
Iwaidja³⁰). Rather than aiming for comprehensiveness or representativeness, research funded
recently by ELDP, for example, shows specificity, focussing on topics such as
traditional song in its diaspora context, language use by blacksmiths, bark cloth making,
libation rituals, fishing practices, child language, interactive speech, and ethnobotany
(projects funded in 2012 and 2014).³¹
In a recent handbook, Woodbury (2011: 159) presents a definition of language
documentation which reflects this shift away from representative samples towards more
specific goals as the creation, annotation, preservation and dissemination of transparent
records of a language. He also identifies some gaps in the earlier conceptions of
documentation, especially because language encompasses conscious and unconscious
knowledge, ideation and cognitive ability, as well as overt social behaviour (ibid.). The role
27
For example, Anthony Jukes’ subtitled video on Minahasan food and cooking methods at
https://www.youtube.com/watch?v=wVy2QsFqdYI, accessed 9 June 2015 (see also
https://www.youtube.com/watch?v=hqNQ-z9sIBw for further details)
28
For example, Austin’s Dieri blog at http://www.dieriyawarra.wordpress.com, accessed 15 March 2015
29
http://www.dnathan.com/projects/gw/, accessed 27 March 2015
30
http://www.iwaidja.org, accessed 15 March 2015
31
http://www.hrelp.org/grants/projects/index.php?year=2012,
http://www.hrelp.org/grants/projects/index.php?year=2014, accessed 14 March 2015. Note that there has not
been a complete shift away from the ‘whole language documentation’ approach with quite a number of funded
projects still taking such an approach.
of ideologies of language structure and use, attitudes of speakers to their and others’ speech,
and the relationships of beliefs and attitudes to actual performance in the world are only
beginning to be addressed by documentary linguists (see Austin and Sallabank 2014). As
Woodbury (2011: 160) notes, ‘humans experience their own and other people’s languages
viscerally and have differing stakes, purposes, goals and aspirations for language records and
language documentation’.
Woodbury (2011) has also highlighted a need to develop a theory of documentary
corpora (covering the principles by which a particular corpus ‘hangs together’), as well as a
need for accounts of individual documentation project designs. Austin (2013) extends this to
a general call for reflexive meta-documentation of their work by researchers concerning their
documentary models, processes and practices. This would include: the identity of
stakeholders and their roles; the attitudes and ideologies of language consultants and the
communities within which they are located (towards their languages as well as the
documenter and documentation project³²); the relationships between researchers, research
project participants and the wider community; the goals and methodology adopted by the
project, including research methods and tools (see Lüpke 2009); corpus theorization
(Woodbury 2011); theoretical assumptions embedded in annotation and translation (e.g. in
abbreviations, glosses); and considerations of the potential for a project to contribute to
revitalization. In addition, it is important to know the biography of the project, including
background knowledge and experience of the researcher and main consultants (e.g. how
much fieldwork the researcher had done at the beginning of the project and under what
conditions, what training the researcher and consultants had received). Austin (2013)
suggests that such meta-documentation can draw upon knowledge and practices in other
disciplines (such as social and cultural anthropology, archaeology, archiving and museum
studies), and from considerations that surface in the interpretation of past documentations (of
legacy materials). The many parallels between language documentation and ethnomusicology
in terms of these and other topics are explored in detail by Grant (2014).
Austin and Sallabank (2015) point out that the early emphasis on ‘compiling a
representative and lasting multipurpose record of a language’ has led documenters to focus
on defining and describing individual languages in isolation with a narrow attention to what
Woodbury (2011: 177) calls ‘the ancestral code’, rather than documenting dynamic language
practices and real-life interactions in their sociolinguistic context (see also Sugita 2007;
Amery 2009; Childs et al. 2014). By definition, endangered languages do not exist in
isolation but are always spoken in relationships with other languages, varieties, codes, styles,
registers, etc., in a complex linguistic ecology (Haugen 1972; Mühlhäusler 1992, 2000;
Calvet 2006). Grenoble (2011) has argued that linguists should aim to document language
ecologies, not just what they define as individual languages or varieties (the ancestral code
approach). At the very least they should pay attention to multilingual repertoires, mixed
codes, the sociolinguistic and structural effects of contact, and language variation and change
(Lüpke and Storch 2013). Gullberg (2012) has explored the interplay between
multilingualism and multimodality, arguing that ‘language documentation data has the
potential to inform theoretical and empirical studies of linguistics, bilingualism and
multimodality in entirely new ways, and, conversely, that documentation work would benefit
from taking the bilingual and multimodal nature of its data into account’ (Gullberg 2012: 46).
32
See Kroskrity (2015) for an example relating to a dictionary project.
It is also important to consider extra-linguistic factors such as language attitudes and
ideologies (Sallabank 2013, Austin and Sallabank 2014). The dominant model of language
documentation from 1995 to 2010 could be described as ‘saving the morphemes two-by-two’
in a ‘Noah’s arc(hive)’, salvage-linguistics approach which reflects a purist notion of single
languages in isolation. From 2010, for at least some language documenters, the approach has
become more particular, dynamic, pluralistic and socially-engaged.
4 Developments in archiving
The rise of language documentation has also seen the development of a number of internet-
accessible digital archives focusing in particular on the preservation of materials on
endangered languages. These include DoBeS in the Netherlands³³, Paradisec in Australia³⁴,
Pangloss in Paris³⁵, the California Archive in Berkeley³⁶, AILLA in Texas³⁷, and ANLA in
Alaska³⁸.
One of the most dramatic developments of the 21st century has been the rise of social
network models on the internet (so-called Web 2.0) that aim to link people rather than
documents, with a focus on interaction and collaboration instead of passive downloading and
viewing of content. These new models have been taken up in the last 10 years by some
language documentation archives (such as ELAR³⁹ at SOAS) leading to what Nathan (2010b)
calls ‘Archives 2.0’.
Traditionally, archiving has focused heavily on preservation (and on cataloguing and
standards; see Section 3 above). However, language documentation raises a number of new
methodological challenges, especially in relation to endangered languages where speakers
tend to use their language more and more to speak of private, local, sensitive and secret
matters. So ‘the primary data of documentary linguistics maximises the likelihood of
including content that can cause discomfort or harm to the recorded speakers’ (Nathan 2014:
191) or their families and descendants. Thus documentation corpora often contain ritual or
sacred material that may be restricted in terms of who can be exposed to it, as well as
gossip which may contain references to private knowledge or events. As a result, language
documentation archives need powerful but flexible access management that is transparent,
easy to understand, and able to be changed as circumstances develop. The basis for access
will be via relationships between the providers of the materials (archive depositors and the
stakeholders they work with) and those who wish to use them. Beginning in 2005, the ELAR
archive at SOAS developed a richly articulated system of ‘access protocols’ designed to
formulate and implement speakers’ rights and sensitivities, together with rigorous methods
and processes for controlled access to the archival materials. Each resource is assigned one of
five levels of access: U (open to all registered users), R (for registered researchers only), C
(for community members only), S (for subscribers who negotiate access with the
depositor), and X (closed to all but the depositor). Registered users are then categorized by
archive staff and their access to particular materials depends on their status (e.g. they are R
[33] http://dobes.mpi.nl/, accessed 9 June 2015
[34] http://paradisec.org.au/, accessed 9 June 2015
[35] http://lacito.vjf.cnrs.fr/pangloss/presentation_en.htm, accessed 9 June 2015
[36] http://cla.berkeley.edu/, accessed 9 June 2015
[37] http://www.ailla.utexas.org/site/welcome.html, accessed 9 June 2015
[38] https://www.uaf.edu/anla/, accessed 9 June 2015
[39] http://www.elar-archive.org, accessed 10 March 2015
by virtue of being associated with an academic programme, and/or C because they explain
that they have links to a particular community [40]) and the access type of the materials they
wish to use. A similar access protocol system is in use by TLA, The Language Archive, at the
Max Planck Institute for Psycholinguistics (which includes the DoBeS endangered languages
archive). [41] Endangered language archiving thus requires a special response to the well-publicised
movement for complete open access that is current in much other academic
research and publication.
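The interaction between a user's status and a resource's access level can be made concrete with a minimal sketch in Python. This is an illustration only and not ELAR's actual implementation: the class names, fields and the `can_access` function are hypothetical, every user is assumed to be already registered, and community membership is assumed to be recorded per community by archive staff.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessLevel(Enum):
    """The five access levels described in the text."""
    U = "open to all registered users"
    R = "registered researchers only"
    C = "community members only"
    S = "subscribers who negotiate access with the depositor"
    X = "closed to all but the depositor"


@dataclass
class User:
    # Hypothetical model of a registered user.
    name: str
    is_researcher: bool = False                    # status R, set by archive staff
    communities: set = field(default_factory=set)  # status C, one entry per community


@dataclass
class Resource:
    # Hypothetical model of a single archived resource.
    title: str
    depositor: str
    level: AccessLevel
    community: str = ""                            # community the material comes from
    subscribers: set = field(default_factory=set)  # users approved by the depositor


def can_access(user: User, resource: Resource) -> bool:
    """Return True if this registered user may open this resource."""
    if user.name == resource.depositor:
        return True                                # depositors always see their own deposits
    if resource.level is AccessLevel.U:
        return True
    if resource.level is AccessLevel.R:
        return user.is_researcher
    if resource.level is AccessLevel.C:
        return resource.community in user.communities
    if resource.level is AccessLevel.S:
        return user.name in resource.subscribers
    return False                                   # AccessLevel.X


# Usage: access to an S-level resource changes once the depositor approves a request.
story = Resource("Clan history", depositor="depositor_a", level=AccessLevel.S)
visitor = User("visitor_b")
before = can_access(visitor, story)
story.subscribers.add("visitor_b")
after = can_access(visitor, story)
```

Keeping the subscriber list on the resource rather than on the user reflects the point made above that access rests on a relationship with the depositor, and that it can be changed as circumstances develop.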
In this view, such an archive can also be seen as a place for establishing and transacting
relationships and sharing, and Web 2.0 models provide a technology for instantiating this.
The general model of the ELAR archive is presented by Nathan (2010b) as in Figure 1.
Figure 1: ELAR Archive 2.0 model
There are several other archiving developments that have been pioneered by ELAR in the last
10 years. The first, called progressive archiving, sees archiving as a whole-of-project
relationship: depositor accounts are established at the beginning of a research project, and
researchers add and manage or update their materials over time, as well as managing and
engaging in interactions with the curators and users. Secondly, ELAR has developed a web-accessible
archive interface that has been designed to provide contextualization, different
degrees of presentation for different projects, and ease of navigation for users. The interface
directly reflects the interests and needs of the materials providers and the users, rather than
being, for example, a unified tree structure across the whole collection, as in other archives such
as DoBeS and AILLA. Thirdly, ELAR has promoted increased participation so that users can
negotiate access to particular materials and bookmark their favourites, while depositors can
negotiate access requests and monitor usage. A communication channel has also been
established in order for both groups to exchange and share information. Nathan (2014) gives
examples of these exchanges and how they can lead to creative outcomes and collaborations
between researchers and members of the community of users.
Possible future developments in endangered languages archiving may include
community curation of archived materials (Linn 2014), participant identification and
[40] This can be one of the most difficult and complex statuses for an archive to determine.
[41] http://dobes.mpi.nl/access_registration/, accessed 10 March 2015
expression of rights (Garrett 2014), and the creation of new kinds of outputs that draw upon a
range of materials from several collections within the archive (just as museums and
galleries choose, select and exhibit their resources for educational or other purposes; see
Holton 2014). The overall flavour of archiving in the last five years has changed from finality
and completeness to being open and evolutionary. These developments also raise questions
for archives about what a deposit or depositor really is, and recast archives as providers of
services within a revised, holistic concept of language documentation.
5 Language documentation and revitalization
The term ‘language revitalization’ is used to describe principles and activities aimed at
increasing the number of users of a language, and/or the range of domains within which it is
used (Fishman 1991, 2001; Hinton and Hale 2001; Hornberger 2010; Hinton 2011; Romaine
2007; Grenoble and Whaley 2006). It has been in operation for more than 20 years longer
than language documentation as its origins go back to community-based activities by Māori
in New Zealand in the 1970s (Spolsky 1989, 2003, Bentahila and Davies 1993) and by other
groups such as North American indigenous people (Niedzielski 1992, Kapono 1995, Hinton
1993, 2002, 2013), and European minorities such as the Catalan, Welsh and Basque.
The relationship between language documentation and language revitalization is a
rather complex one, and is explored in some detail in Austin and Sallabank (2015). For many
language documenters revitalization has been seen as a waste of resources, a viewpoint
connected to the ‘language-as-system’ ideology that sees linguistic data as the only thing
worth collecting and preserving, in contrast to ‘linguistic social work’ (Newman 2003: 6; see
also Dimmendaal 2004: 84 and Blench 2008: 153). [42]
Although documentation defined itself from the beginning as a field that set out to create
a multipurpose record for a wide range of users, including community members, language
revitalization has been treated as a simple ‘technical add-on’ that involves creation of
orthographies, dictionaries, sub-titled videos, and primers and multimedia, including
websites, rather than as a field of research or activity that requires theoretical and applied
knowledge. This view has also been strongly supported by the funding agencies (including
ELDP, Volkswagen Stiftung, NSF-NEH), who excluded revitalization-oriented projects from
grants and severely limited the amount of money that could be included for revitalisation
materials creation or ‘community publication’ of research results.
Much of the material that has ended up in language documentation archives is unsuitable
for revitalization for a variety of reasons, including inappropriate genres or topics, recordings
and analyses in difficult-to-access archival formats that require specialised software (such as
ELAN or FLEx), or glossing and translation into languages such as English that have little or
no place in the local linguistic ecology. Documentation is also heavily biased towards the
performances of older fluent speakers, resulting in language that may be too fast, heavily
context-dependent, marked by slurring or elisions, or even affected by physiological
factors (not least of which may be lack of teeth). Few, if any, documentary corpora include
[42] Newman’s views were repeated and further elaborated in Newman (2013); see also the response by Whalen
(2013).
samples of children’s ordinary language use [43] or learner-directed speech; in addition, as noted
by Cope (2014), documentary linguists are not trained in pedagogical materials design, and
applied linguists are rarely included in language documentation teams. The relationship
between language documentation and revitalization has thus varied from avoidance or
subordination to, at best, only an indirect connection (Sallabank 2012). There is a need for
much more exploration and development of this area in the future (see also Austin and
Sallabank 2015 for further discussion).
6 Documentation and academia
The development of language documentation as a sub-field with its own principles and
practices appeared to many researchers in its foundation period at the end of the 20th century
to offer an opportunity to change the socio-political academic balance between fieldworkers
and so-called ‘armchair linguists’ (typologists, theoreticians) (Fillmore 1992, Aikhenvald
2007: 4, Crowley 2007: 11-13) by providing a foundation (theory, best practices) for corpus
creation, data collection and analysis. Many perceived that fieldwork and language
description were in a subordinate sociological position (Newman (2009: 124) [44] states
explicitly that ‘theoreticians belittle descriptivists as linguistically second-class citizens’ [45]),
and hoped that language documentation and the work of corpus creation and associated
activities would raise their status in academic linguistics. Indeed, lobbying by documenters
and others led in 2010 to the Linguistic Society of America ‘Resolution Recognizing the
Scholarly Merit of Language Documentation’ which states that:
[a] shift in practice has broadened the range of scholarly work to include not only
grammars, dictionaries, and text collections, but also archives of primary data,
electronic databases, corpora, critical editions of legacy materials, pedagogical works
designed for the use of speech communities, software, websites, or other digital
media. The products of language documentation and work supporting linguistic
vitality are of significant importance to the preservation of linguistic diversity, are
fundamental and permanent contributions to the foundation of linguistics, and are
intellectual achievements which require sophisticated analytical skills, deep
theoretical knowledge, and broad linguistic expertise
The resolution ‘support[ed] the recognition of these materials as scholarly contributions to be
given weight in the awarding of advanced degrees and in decisions on hiring, tenure, and
promotion of faculty’. In addition, the resolution encouraged ‘the development of appropriate
means of review of such works so that their functionality, import, and scope can be assessed
relative to other language resources and to more traditional publications’.
[43] An exception is the DoBeS Chintang/Puma project; see http://dobes.mpi.nl/projects/chintang/, accessed 9
June 2015. There is incidental children’s language material in the ELAR archive, such as children’s retellings of
the Frog Story book (Mayer 1969); however, this material has not been systematically collected.
[44] Originally published in 1992.
[45] Newman (2009: 124) considers this to be an ‘unintended consequence of Chomsky’s (1964) hierarchy of
levels of adequacy in grammar, namely, from the bottom up, observational adequacy “A grammar that aims
for observational adequacy is concerned merely to give an account of the primary data” (p. 63, italics mine),
descriptive adequacy, and explanatory adequacy’.
To date, criteria for this kind of review of documentary corpora, or examples of such
reviews (parallel, say, to book reviews), have not appeared. In the five years since this
resolution was passed, there still remains what we can call an ‘output gap’: traditional products
of language description and typological and theoretical research (grammars, book chapters,
journal articles) are understood and accorded value in determining promotion, award of
tenure and in decision making about new job appointments, but the newer outputs in the form
of digital archival deposits, multimedia products, and pedagogical materials for revitalization
are either not valued or discounted.
According to Thieberger (2012) similar discussions have taken place in Australia
beginning in 2011 between the Australian Linguistic Society (ALS) and the Australian
Research Council (ARC), and ‘although the ARC accepted that curated corpora could
legitimately be seen as research output, it would be the responsibility of the ALS (or the
scholarly community more generally) to establish conventions to accord scholarly credibility
to such products’. He reports on proposals for a possible review procedure but recognizes that
‘the question of what criteria to use in evaluating a corpus is more problematic’. For some
suggestions for criteria see Thieberger (2012) and Thieberger et al. (2012); again no action
appears to have been taken to date to actually implement these Australian proposals.
In my view, to address this output gap, there is a need for experimentation and the
development of new genres, so far unfamiliar to linguists, that link and contextualise
analytical outputs and the archival corpus. These could include ethnographies of
documentation project designs, accounts of data collection (cf. the genre of research
publications in archaeology called ‘field reports’), finding-aids to corpus collections, or
‘exhibitions’ or ‘guided tours’ of archival deposits (along the lines of exhibitions and
associated products regularly mounted by museums to display parts of their collections, see
also Woodbury 2014). Similarly, reviews of corpora or these new kinds of writing could also
be attempted.
There has been a very recent development in Linguistics of free online open access
publication platforms (e.g. Language Science Press, established in April 2013 [46], and EL
Publishing, launched in July 2014 [47]), with all the usual academic requirements such as
double-blind reviewing and professional editing, design and layout. While Language Science
Press publishes digital versions of traditional books, EL Publishing has set out to provide and
encourage new opportunities for language documenters to publish multimedia and the other
innovative types of output mentioned above. It remains to be seen whether these
opportunities will be taken up by practitioners, and whether they will go some way to
addressing the output gap in the future.
7 Conclusions
The past 20 years have seen the emergence and gradual development of a new sub-field of
research called ‘documentary linguistics’ or ‘language documentation’ which has
concentrated on recording, analysing, preserving and disseminating records of languages in
use in ways that can serve a wide range of constituencies, particularly the language
communities themselves. In the early period of its development there was a concentration on
[46] http://langsci-press.org/index, accessed 9 June 2015
[47] http://www.elpublishing.org, accessed 1 March 2015.
defining a model for language documentation and specifying best practices, tools and
analytical categories; however, the past 10 years have seen a shift in perspective responding to
criticism of these early concerns. Today, there is more recognition of diversity of contexts,
goals, methods and outcomes of language documentation, and indications of the introduction
of social models of research, especially in the area of archiving. Much work remains to be
done, however, to engage better with language revitalization and to establish reliable and
replicable measures for evaluating the quality, significance and value of language
documentation research so that its position alongside such sub-fields as descriptive linguistics
and theoretical linguistics can be assured and enhanced.
References
Aikhenvald, Alexandra Y. 2007. Linguistic fieldwork: setting the scene. Sprachtypologie und
Universalienforschung 60(1), 3-11.
Amery, Rob. 2009. Phoenix or Relic? Documentation of Languages with Revitalization in Mind.
Language Documentation and Conservation 3(2): 138-148.
Aristar, Anthony Rodrigues. 2003. The school of best practice. SOAS Workshop on Archives for
Endangered Languages, London, November 21-22.
Aristar-Dry, Helen. 2004. E-MELD School of best practices in digital language documentation.
Presentation at E-MELD Conference 2004: Workshop on linguistic databases and best practice.
Detroit, Michigan, July 15-18.
Austin, Peter K. 2006a. Data and language documentation. In Jost Gippert, Nikolaus Himmelmann
and Ulrike Mosel (eds.) Essentials of Language Documentation (Trends in Linguistics. Studies
and Monographs, 178), 87-112. Berlin: Mouton de Gruyter.
Austin, Peter K. 2006b. Defining language documentation. Paper presented at the Georgetown
University Roundtable on Linguistics, Georgetown University, Washington, DC, March.
Austin, Peter K. 2007. Survival of Languages. In Emily F. Shuckburgh (ed.) Survival: Darwin
College Lectures, 80-98. Cambridge: Cambridge University Press.
Austin, Peter K. 2008. Training for language documentation: Experiences at the School of Oriental
and African Studies. In Margaret Florey and Victoria Rau (eds.) Documenting and Revitalising
Austronesian Languages, 25-41. Language Documentation and Conservation Special
Publication No. 1. Hawaii: University of Hawaii Press.
Austin, Peter K. 2009. Meta-documentary linguistics. Paper given at the Aboriginal languages
workshop, Kioloa, Australia, March 2009. [Powerpoint slides available at
http://www.slideshare.net/pkaustin/2010-march-kioloa]
Austin, Peter K. 2010. Current issues in language documentation. Language Documentation and
Description, Volume 7, 12-33. London: SOAS.
Austin, Peter K. 2013. Language documentation and meta-documentation. In Sarah Ogilvie and Mari
Jones (eds.) Keeping Languages Alive: Documentation, Pedagogy and Revitalization, 3-15.
Cambridge: Cambridge University Press.
Austin, Peter K. 2014. Language documentation in the 21st century. JournaLIPP 3: 57-71.
Austin, Peter K. and Lenore Grenoble. 2007. Current trends in language documentation. Language
Documentation and Description, Volume 4, 12-25. London: SOAS.
Austin, Peter K. and Julia Sallabank. 2014. Endangered Languages: Ideologies and beliefs in
language documentation and revitalization. London: British Academy.
Austin, Peter K. and Julia Sallabank. 2015. Language documentation and language revitalization:
partners or just good friends? SOAS manuscript.
Bell, John and Steven Bird. 2000. Preliminary study of the structure of lexicon entries. Proceedings of
the Workshop on Web-Based Language Documentation and Description, Philadelphia,
December.
Bentahila, Abdelali, and Eirlys E. Davies. 1993. Language revival: restoration or transformation?
Journal of Multilingual and Multicultural Development 14(5), 355-374.
Bird, Steven and Mark Liberman. 2001. A formal framework for linguistic annotation. Speech
Communication, 33 (1, 2), 23-60.
Bird, Steven and Gary Simons. 2003. Seven dimensions of portability for language documentation
and description. Language 79(3), 557-82.
Blench, Roger. 2008. Endangered languages in West Africa. In Matthias Brenzinger (ed.) Language
Diversity Endangered, 140-162. Berlin: Mouton de Gruyter.
Bow, Cathy, Baden Hughes and Steven Bird. 2003. Towards a general model for interlinear text.
Proceedings of EMELD-03 [http://emeld.org/workshop/2003/bowbadenbird-paper.html,
accessed 2013-08-12]
Bowern, Claire. 2011. Planning a language documentation project. In Peter K. Austin and Julia
Sallabank (eds.) The Cambridge Handbook of Endangered Languages, 459-482. Cambridge:
Cambridge University Press.
Calvet, Jean-Louis. 2006. Towards an Ecology of World Languages. Cambridge: Polity.
CELP. 2007. Adequacy of documentation. Document circulated at the January 2007 meeting of the
Linguistic Society of America Committee on Endangered Languages and Their Preservation.
Childs, Tucker, Jeff Good and Alice Mitchell. 2014. Beyond the ancestral code: Towards a model for
sociolinguistic language documentation. Language Documentation and Conservation 8, 168-191.
Chomsky, Noam. 1964. Current issues in linguistic theory. In Jerry A. Fodor and Jerrold J. Katz (eds.)
The structure of language, 50-118. Englewood Cliffs, NJ: Prentice Hall.
Coelho, Gail. 2005. Language documentation and ecology: areas of interaction. In Peter K. Austin
(ed.) Language Documentation and Description, Volume 3, 63-74. London: SOAS.
Cope, Lida. (ed.) 2014. Applied Linguists Needed: Cross-disciplinary Networking in Endangered
Language Contexts. Abingdon: Routledge.
Crowley, Terry. 2007. Field Linguistics: A Beginner’s Guide. Oxford: Oxford University Press.
Crystal, David. 2000. Language death. Cambridge: Cambridge University Press.
Dimmendaal, Gerrit. 2004. Capacity building in an African context. In Peter K. Austin (ed.)
Language Documentation and Description, volume 2, 71-89. London: SOAS.
Dobrin, Lise, Peter K. Austin, and David Nathan. 2007. Dying to be counted: commodification of
endangered languages in documentary linguistics. Language Documentation and Description,
Volume 6, 37-52. London: SOAS.
Eisenbeiss, Sonja. 2005. Psycholinguistic contributions to language documentation. Language
Documentation and Description, Volume 3, 106-140. London: SOAS.
Farrar, Scott and D. Terence Langendoen. 2003a. A linguistic ontology for the semantic web. GLOT
International 7(3): 97-100.
Farrar, Scott and D. Terence Langendoen. 2003b. Markup and the GOLD ontology. Proceedings of
EMELD-03. [http://staff.washington.edu/farrar/documents/inproceedings/FarLang03a.pdf,
accessed 12 August 2013]
Farrar, Scott, William D. Lewis and D. Terence Langendoen. 2002. A common ontology for linguistic
concepts. Proceedings of the Knowledge Technologies Conference, Seattle, Washington
[http://staff.washington.edu/farrar/documents/inproceedings/FarLewLang02a.pdf, accessed 12
August 2013].
Fillmore, Charles. 1992. ‘Corpus linguistics’ or ‘computer-aided armchair linguistics’. In Jan Svartvik
(ed.) Directions in Corpus Linguistics, 35-45. Berlin: Mouton de Gruyter.
Fishman, Joshua. 1991. Reversing Language Shift: Theory and Practice of Assistance to Threatened
Languages. Bristol: Multilingual Matters.
Fishman, Joshua A. (ed.) 2001. Can Threatened Languages be Saved? Reversing Language Shift,
Revisited: A 21st Century Perspective. Bristol: Multilingual Matters.
Garrett, Edward. 2014. Participant driven language archiving. In David Nathan and Peter K. Austin
(eds.) Language Documentation and Description, Volume 12, 68-84. London: SOAS.
Gawne, Lauren, Barbara F. Kelly, Andrea Berez and Tyler Heston. 2015. Putting practice into words:
fieldwork methodology in grammatical descriptions. Paper presented at ICLDC 4 Conference,
Hawaii, February 2015.
Grant, Catherine. 2014. Music endangerment: how language maintenance can help. Oxford: Oxford
University Press.
Grenoble, Lenore. 2010. Language documentation and field linguistics: The state of the field. In
Lenore A. Grenoble and N. Louanna Furbee (eds.) Language documentation: Practice and
values, 289-309. Amsterdam: John Benjamins.
Grenoble, Lenore A. 2011. Language ecology and endangerment. In Peter K. Austin and Julia
Sallabank (eds.) The Cambridge Handbook of Endangered Languages, 27-45. Cambridge:
Cambridge University Press.
Grenoble, Lenore A. and Lindsay J. Whaley. 2006. Saving languages: An introduction to language
revitalization. Cambridge: Cambridge University Press.
Grinevald, Colette. 2003. Speakers and documentation of endangered languages. Language
Documentation and Description, Volume 1, 52-72.
Gullberg, Marianne. 2012. Bilingual multimodality and language documentation. In Frank Seifart,
Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung, Anna Margetts and Paul Trilsbeek
(eds.) Potentials of Language Documentation: Methods, Analyses, and Utilization, 46-53.
Language Documentation and Conservation Special Publication No. 3. Honolulu: University of
Hawai’i Press.
Hale, Kenneth, Michael Krauss, L. J. Watahomigie, Akira Y. Yamamoto, Colette Craig, Laverne M.
Jeanne, and Nora C. England. 1992. Endangered languages. Language 68(1): 1-42.
Harrison, K. David. 2005. Ethnographically informed language documentation. Language Documentation and
Description, Volume 3, 22-41. London: SOAS.
Haugen, Einar. 1972. The Ecology of Language. Stanford, CA: Stanford University Press.
Himmelmann, Nikolaus P. 1998. Documentary and descriptive linguistics. Linguistics 36:161-95.
Himmelmann, Nikolaus P. 2002. Documentary and descriptive linguistics. In Osamu Sakiyama and
Fubito Endo (eds.) Lectures on endangered languages, Volume V, 37-83. Kyoto: Endangered
Languages of the Pacific Rim.
Himmelmann, Nikolaus P. 2006. Language documentation: What is it and what is it good for? In Jost
Gippert, Nikolaus P. Himmelmann and Ulrike Mosel (eds.) Essentials of Language
Documentation (Trends in Linguistics. Studies and Monographs, 178), 1-30. Berlin: Mouton de
Gruyter.
Himmelmann, Nikolaus P. 2008. Reproduction and preservation of linguistic knowledge: Linguistics’
response to language endangerment. Annual Review of Anthropology, Volume 37, 337-350.
Himmelmann, Nikolaus P. 2012. Linguistic data types and the interface between language
documentation and description. Language Documentation and Conservation 6, 187-207.
Hinton, Leanne. 1993. Flutes of Fire. Berkeley: Heyday Books.
Hinton, Leanne. 2002. How to keep your language alive. Berkeley: Heyday Books
Hinton, Leanne. 2011. Revitalization of endangered languages. In Peter K. Austin, and Julia
Sallabank (eds.) The Cambridge Handbook of Endangered Languages, 291-311. Cambridge:
Cambridge University Press.
Hinton, Leanne. 2013. Bringing Our Languages Home: Language Revitalization for Families.
Berkeley: Heyday Books
Hinton, Leanne, and Kenneth Hale, eds. 2001. The green book of language revitalization in practice.
San Diego, CA: Academic Press.
Holton, Gary. 2014. Mediating language documentation. In David Nathan and Peter K. Austin (eds.)
Language Documentation and Description, Volume 12, 37-52. London: SOAS.
Hornberger, Nancy H. 2010. Language shift and language revitalization. In Robert B. Kaplan (ed.)
The Oxford Handbook of Applied Linguistics (2nd edition), 365-373. Oxford: Oxford
University Press.
Hymes, Dell. 1964. Introduction: toward ethnographies of communication. American Anthropologist
66(6), 1-34.
Kapono, Eric. 1995. Hawaiian language revitalization and immersion education. International
Journal of the Sociology of Language 112, 121.
Kroskrity, Paul V. 2015. Designing a Dictionary for an Endangered Language Community:
Lexicographical Deliberations, Language Ideological Clarifications. Language Documentation
and Conservation 9, 140-157.
Lehmann, Christian. 2001. Language documentation: A program. In Walter Bisang (ed.) Aspects of
typology and universals, 83-97. Berlin: Akademie Verlag.
Linn, Mary-Anne. 2014. Living archives: A community-based language archive model. In David
Nathan and Peter K. Austin (eds.) Language Documentation and Description, Volume 12, 53-67.
London: SOAS.
Lüpke, Friederike. 2009. Data collection methods for field-based language documentation. In Peter K.
Austin (ed.) Language Documentation and Description, Volume 6, 53-100. London: SOAS.
Lüpke, Friederike and Anna Storch. 2013. Repertoires and Choices in African Languages. Berlin:
Mouton de Gruyter.
Mayer, Mercer. 1969. Frog where are you? New York: Dial Books for Young Readers.
Mühlhäusler, Peter. 1992. Preserving languages or language ecologies? A top-down approach to
language survival. Oceanic Linguistics 31(2), 163-80.
Mühlhäusler, Peter. 2000. Language planning and language ecology. Current Issues in Language
Planning 1(3), 306-367.
Nathan, David. 2004. Documentary linguistics: alarm bells and whistles? Seminar presentation,
SOAS. 23 November 2004.
Nathan, David. 2006. Sound and unsound documentation: Questions about the roles of audio in
language documentation. Paper presented at the Georgetown University Roundtable on
Linguistics, Georgetown University, Washington, DC, March.
Nathan, David. 2009. The soundness of documentation: Towards an epistemology for audio in
documentary linguistics. Journal of the International Association of Sound Archives (IASA) 33.
[Available online at www.iasa-web.org/book/iasa-journal-no-33-june-2009 accessed 22 March
2012]
Nathan, David. 2010a. Sound and unsound practices in documentary linguistics: towards an
epistemology for audio. Language Documentation and Description, Volume 7, 1-17. London:
SOAS.
Nathan, David. 2010b. Archives 2.0 for endangered languages: From disk space to MySpace.
International Journal of Humanities and Arts Computing 4(1-2): 111-124.
Nathan, David. 2014. Access and accessibility at ELAR, an archive for endangered languages
documentation. In David Nathan and Peter K. Austin (eds.) Language Documentation and
Description, volume 12, 187-208. London: SOAS.
Nathan, David and Peter K. Austin. 2004. Reconceiving metadata: language documentation through
thick and thin. In Peter K. Austin (ed.) Language Documentation and Description, Volume 2,
179-187. London: SOAS.
Newman, Paul. 2003. The endangered languages issue as a hopeless cause. In Mark Janse and Sijmen
Tol (eds.) Language Death and Language Maintenance, 1-13. Amsterdam: John Benjamins.
Newman, Paul. 2009. Fieldwork and fieldmethods in linguistics. Language Documentation and
Conservation 3(1), 113-125.
Newman, Paul. 2013. The Law of Unintended Consequences: How the Endangered Languages
Movement Undermines Field Linguistics as a Scientific Enterprise. Seminar presented at
SOAS, 15th October 2013. Audio recording available at http://bit.ly/1FPIXY6 and video at
https://www.youtube.com/watch?v=xziE08ozQok, both accessed 27 March 2015.
Niedzielski, H. Z. 1992. The Hawaiian model for the revitalization of native minority languages and
cultures. In Willem Fase, Koen Jaspaert and Sjaak Kroon (eds.) Maintenance and Loss of
Minority Languages, 369-384. Amsterdam: John Benjamins.
Penton, David, Cathy Bow, Steven Bird and Baden Hughes. 2004. Towards a general model for
linguistic paradigms. Proceedings of EMELD-04. [http://emeld.org/workshop/2004/bird-
paper.html, accessed 12 August 2013]
Robins, Robert H. and Eugenius M. Uhlenbeck. 1991. Endangered Languages. New York: Berg
Publishers.
Romaine, Suzanne. 2007. Preserving endangered languages. Language and Linguistics Compass 1(1-2),
115-132.
Sallabank, Julia. 2012. From language documentation to language planning: not necessarily a direct
route. In Frank Seifart, Geoffrey Haig, Nikolaus P. Himmelmann, Dagmar Jung, Anna
Margetts and Paul Trilsbeek (eds.) Potentials of Language Documentation: Methods, Analyses,
and Utilization, 118-125. Language Documentation and Conservation Special Publication No.
3. Honolulu: University of Hawai’i Press.
Sallabank, Julia. 2013. Endangered Languages: Attitudes, Identities and Policies. Cambridge:
Cambridge University Press.
Schultze-Berndt, Eva. 2006. Linguistic Annotation. In Jost Gippert, Nikolaus P. Himmelmann and
Ulrike Mosel (eds.) Essentials of Language Documentation, 213-251. Berlin: Mouton de
Gruyter.
Seifart, Frank. 2008. The representativeness of language documentations. Language Documentation
and Description, Volume 5, 60-76. London: SOAS.
Sherzer, Joel. 1987. A discourse-centered approach to language and culture. American Anthropologist
89(2), 295-309.
Spolsky, Bernard. 1989. Maori bilingual education and language revitalization. Journal of
Multilingual and Multicultural Development 10(2), 89-106.
Spolsky, Bernard. 2003. Reassessing Maori regeneration. Language in Society 32, 553-578.
Sugita, Yuko. 2007. Language revitalization or language fossilization? Some suggestions for language
documentation from the viewpoint of interactional linguistics. In Peter K. Austin, Oliver Bond
and David Nathan (eds.) Proceedings of First Conference on Language Documentation and
Linguistic Theory, 243-250. London: SOAS. [www.hrelp.org/eprints/ldlt_28.pdf, accessed 27
February 2015]
Thieberger, Nick. 2012. Counting collections. Post on Paradisec blog, 29 November 2012.
[http://www.paradisec.org.au/blog/2012/11/counting-collections/, accessed 6 June 2015]
Thieberger, Nick, Anna Margetts, Stephen Morey, Simon Musgrave and Adam Schembri. 2012.
Assessing curated corpora as research output. Paper presented to the Annual Conference of the
Australian Linguistic Society, University of Western Australia.
Whalen, Doug. 2004. How the study of endangered languages will revolutionize linguistics. In Piet
van Sterkenburg (ed.) Linguistics Today: Facing a greater challenge, 321-342. Amsterdam:
John Benjamins.
Whalen, Doug. 2013. Response to Paul Newman. Blog post dated 3rd December 2013.
[http://elar-archive.org/blog/response-to-paul-newman-by-doug-whalen/, accessed 27 March 2015]
Woodbury, Tony. 2003. Defining documentary linguistics. Language Documentation and
Description, Volume 1, 35-51. London: SOAS.
Woodbury, Tony. 2011. Language documentation. In Peter K. Austin and Julia Sallabank (eds.) The
Cambridge Handbook of Endangered Languages, 159-186. Cambridge: Cambridge University
Press.
Woodbury, Anthony C. 2014. Archives and audiences: Toward making endangered language
documentations people can read, use, understand, and admire. In David Nathan and Peter K.
Austin (eds.) Language Documentation and Description, volume 12: Special Issue on Language
Documentation and Archiving, 19-36. London: SOAS.
Yamada, Racquel-María. 2007. Collaborative linguistic fieldwork: Practical application of the
empowerment model. Language Documentation and Conservation 1(2), 257-282.