Terttu Nevalainen's research while affiliated with University of Helsinki and other places

Publications (38)

Chapter
This volume brings together contributions selected from papers delivered at the 21st International Conference on English Historical Linguistics (ICEHL, Leiden 2021). The chapters deal with aspects of language use throughout the history of English, including efforts to prescribe and regulate language in texts that share specific forms, functions and...
Article
Full-text available
This paper discusses the process of part-of-speech tagging the Corpus of Early English Correspondence Extension (CEECE), as well as the end result. The process involved normalisation of historical spelling variation, conversion from a legacy format into TEI-XML, and finally, tokenisation and tagging by the CLAWS software. At each stage, we had to f...
Article
Full-text available
In this paper, we explore the rate of language change in the history of English. Our main focus is on detecting periods of accelerated change in Middle English (1150–1500), but we also compare the Middle English data with the Early Modern period (1500–1700) in order to establish a longer diachrony for the pace at which English has changed over time...
Article
Full-text available
This issue of the Journal of Historical Sociolinguistics aims to contribute to our understanding of language change in real time by presenting a group of articles particularly focused on social and sociocultural factors underlying language diversification and change. By analysing data from a varied set of languages, including Greek, English, and th...
Conference Paper
Full-text available
Research in the digital humanities and computational social sciences requires overcoming complexity in research data, methodology, and research questions. In this article, we show, through case studies of three different digital humanities and computational social science projects, that these problems are prevalent, multiform, as well as laborious t...
Article
Full-text available
Research into orthography in the history of English is not a simple venture. The history of English spelling is primarily based on printed texts, which fail to capture the range of variation inherent in the language; many manuscript phenomena are simply not found in printed texts. Manuscript-based corpora would be the ideal research data, but as th...
Article
The paper introduces our new project on diachronic sociolinguistics, focusing on the problems of compiling a representative corpus for this purpose. We study long-term linguistic change in the Late Middle and Early Modern English periods (1420-1680) in a computer-readable corpus of personal letters, which is designed specifically for the purposes o...
Chapter
Human communicative practices are organized in terms of genres, and people are highly skilled at recognizing genre differences. In text corpora, genres are typically defined on the basis of text-external features, such as medium, function and format. We show that the core genres of face-to-face conversation, prose fiction, broadsheet newspapers, an...
Book
Historical Sociolinguistics: Language Change in Tudor and Stuart England is the seminal text in the field of historical sociolinguistics. Demonstrating the real-world application of sociolinguistic research methodologies, this book examines the social factors which promoted linguistic changes in English, laying the foundation for Modern Standard En...
Article
Full-text available
We introduce the Language Change Database (LCD), which provides access to the results of previous corpus-based research dealing with change in the English language. The LCD will be published on an open-access linked data platform that will allow users to enter information about their own publications into the database and to conduct searches based...
Article
This paper takes a look at recent shifts in sociolinguistic paradigms and considers their applications to historical sociolinguistic research. Besides growth areas such as multilingualism, a current trend is convergence of established approaches. My discussion focuses on those that go even further, bridging the gap between macro- and micro-levels o...
Article
Place is an integral part of social network analysis, which reconstructs network structures and documents the network members’ linguistic practices in a community. Historical network analysis presents particular challenges in both respects. This article first discusses the kinds of data, official documents, personal letters and diaries that histori...
Article
Finding out whether a word occurs significantly more often in one text or corpus than in another is an important question in analysing corpora. As noted by Kilgarriff (Language is never, ever, ever, random, Corpus Linguistics and Linguistic Theory, 2005; 1(2): 263–76.), the use of the χ² and log-likelihood ratio tests is problematic in this context...
Article
Full-text available
This paper reviews the gap between current methods of text visualization and the needs of corpus-linguistic research, and introduces a tool that takes a step towards bridging that gap. Current text visualization methods tend to treat the problem as a data-encoding issue only, and do not strive for interactive, tightly coupled representations of tex...
Chapter
Methodological know-how has become one of the key qualifications in contemporary linguistics, which has a strong empirical focus. Containing 23 chapters, each devoted to a different research method, this volume brings together the expertise and insight of a range of established practitioners. The chapters are arranged in three parts, devoted to thr...
Conference Paper
Full-text available
Being able to trace language change in corpus data is premised on the assumption that the corpora providing the evidence remain comparable over time. General-purpose corpora such as the Helsinki Corpus of English Texts have been compiled using as closely similar text selection criteria as possible within each major period and even across periods. A...
Article
Many corpus linguists make the tacit assumption that part-of-speech frequencies remain constant during the period of observation. In this article, we will consider two related issues: (1) the reliability of part-of-speech tagging in a diachronic corpus and (2) shifts in tag ratios over time. The purpose is both to serve the users of the corpus by m...
Article
A major issue in the study of language change is the degree to which individual speakers participate in ongoing linguistic changes as these progress over time. In this study, we examine the hypothesis, suggested by research based on the apparent-time model, that in any given period most people are neither progressive nor conservative with regard to...
Article
In this paper linguists and researchers of visual data analysis outline the requirements and benefits of an information visualization approach for corpus linguistics. Over the years, the information visualization community has come up with a number of methods to visualize text, but the majority of these techniques do not serve the needs of the ling...
Chapter
Setting the Scene: People and Cities; Vernacularization; Enrichment of the Written Language; Towards a Standard Language; References and Further Reading
Chapter
Introduction: An Interdisciplinary Approach; Empirical Foundations; The Widening Scope of Research; Time-depth of Sociolinguistic Generalizations; Conclusion
Chapter
The Corpus of Early English Correspondence (CEEC) was compiled within the Sociolinguistics and Language History research project, which was funded by the Academy of Finland and the University of Helsinki in 1993–97. After that date, the researchers concerned with this project formed the core of the Historical Sociolinguistics team in the Research U...
Article
Estimating the relative frequencies of linguistic features is a fundamental task in linguistic computation. As the amount of text or speech that is available from a given user of the language typically varies greatly, and the sample sizes tend to be small, the most straightforward methods do not always give the most informative answers. Bootstrap a...
Article
This article is a contribution to the study of English vernacular universals, and its aims are twofold. Its empirical aim is to give a sociolinguistic account of the use and nonuse of negative concord, or multiple negation, from Late Middle to Late Modern English between 1400 and 1800. Its second aim is theory-driven: to consider the spread of nona...
Article
Full-text available
The English language has a well-documented history which can be traced back over twelve hundred years. This paper discusses the history of English focussing on the evidence it offers for sociolinguistic inquiry and raising issues to do with the social, historical and empirical validity of the enterprise. As the documentation on the earliest stages...
Article
... discusses language and gender research on three topics that have been carried out within the frameworks of historical sociolinguistics and pragmatics: linguistic stereotyping, gender roles encoded in language use, and gender variation in discourse styles. The studies compare female and male usage across time and indicate the degree to which gender...
Article
The paper discusses the adverbialization of two roughly synonymous present-day English intensifiers, pretty and fairly. Based on electronic corpora, a quantitative analysis of their long-term history is provided using the framework of adverb functions proposed by Quirk et al. [Quirk, R., Greenbaum, S., Leech, G., Svartvik, J., 1985. A Comprehensive...

Citations

... The spelling was standardised in two stages: first semi-automatically by the Variant Detector software (VARD 2; Baron, 2011a, 2011b) and then manually by a team of people paying special attention to remaining variation recognised as problematic for the tagging, such as obsolete abbreviations and non-modern punctuation marks (Saario & Säily, 2020). The standardised-spelling CEECE was then converted into XML and part-of-speech tagged by CLAWS (Saario et al., 2021). ...
... We would argue that this phenomenon is not limited to the English Civil War but can be seen as part of what Dixon [6] calls "punctuated equilibria", which is the notion that language history is characterized by periods of relative stability punctuated by external events that cause sudden changes in the linguistic situation and hence accelerate the rate of linguistic change. Nevalainen et al. [26] have shown that in Middle English, such punctuating events included the Norman Conquest and the Black Death; at the lexical level, the legacy of the conquest shows up in the many French loanwords in English. ...
... Creating linked data manually is costly but automatic methods may not be available and automation lowers data quality. Using structured semantic data and making the knowledge structures explicit to the end user in the UI calls for a new kind of digital data literacy and source criticism from the end user [51,60]. What the underlying data actually means is not always clear and issues of Big Data quality, such as completeness, veracity, skewness, uncertainty, fuzziness, and errors of data arise. ...
... In addition to the grammar component, the LCD includes detailed information about the composition of various corpora, which can be used, for instance, when searching for research on specific genres (e.g. according to the detailed genre classification of the Helsinki Corpus of English Texts). Once published, the LCD will be accompanied by a tool designed to streamline the workflow related to meta-analysis, which further reduces the need for manual data processing (LADA; see Kesäniemi et al. 2018). The LADA tool can access the corpus information included in the LCD, which makes it useful for a variety of purposes in the meta-analytical process, such as the normalization of raw frequencies according to specific time periods or genres. ...
... Yet, the prevalence of counterexamples, coupled with the fundamental indeterminacy of grammatical structure (Cornips & Corrigan, 2005; Labov, 1978), means that the assertion that grammatical variables are less subject to evaluation and hence less socially stratified than phonetic/lexical variables is not entirely uncontroversial. Such counterexamples include work on historical changes in English (Nevalainen, 2006; Raumolin-Brunberg, 2005; Raumolin-Brunberg & Nevalainen, 1994), Meyerhoff's (1997) analysis of phonetically null versus overt pronominal subjects in Bislama, and research on morphosyntactic variation in French, particularly negation (Ashby, 1981; Coveney, 1996), subject doubling (Coveney, 2005; Nadasdi, 1995), and future time reference (Roberts, 2014). In fact, a study that compared variability on different levels of the grammar in English (Cheshire, Kerswill, & Williams, 2005) found no evidence that there is less social variation in morphosyntax and syntax than in phonology. ...
... Diachronically, Krielke et al. (2019) have shown a remarkable decrease in pronominal adverbs in scientific English between 1650 and 1850. This paradigm reduction of pronominal adverbs over time can partly be explained by the typological drift from synthetic to analytic (Nevalainen and Raumolin-Brunberg, 2012), e.g., whereby becoming by which. ...
... By doing a cross-corpus validation (see, e.g., Lijffijt & Nevalainen, 2017), we have also shown that the research findings can be similar across corpora, which, to some extent, can be explained by the fact that "both written and spoken genres are constituted by bundles of co-occurring features" (Bamford & Bondi, 2005, p. XY). ...
... Though these data need to be treated with care, as it concerns a study of letters from edited volumes (cf. Sairio et al. 2018), this seems to suggest that Norwich participated with the other non-northern cities in the adoption of -s, but that text type was indeed a constraining factor. ...
... Its variance λ2 is less important than that corresponding to the first direction. Therefore, the diagonal elements of Λ are arranged in descending order [46][47][48][49]: λ1 ≥ ⋯ ≥ λm. Considering the matrix P, the data vector x(k) can be transformed without any loss of information into a vector of principal components (PC) [36,37,39]: ...
... From another point of view, letters also provide contextual details that are highly relevant for sociolinguistic analysis, such as the relations of power and solidarity between the participants, their social and dialectal background, etc. (Nevalainen and Raumolin-Brunberg, 2017 [2003]). In our case, to all this must be added the exceptional historical circumstance of the migrations of Spaniards to America, which generated an immense flow of letters on both sides of the Atlantic, in which personal stories full of emotion were set down on paper. ...