FST Morphology for the Endangered Skolt Sami Language

April 2020

License
CC BY 4.0

Authors:

Jack Rueter

University of Helsinki

Mika Hämäläinen

Helsinki Metropolia University of Applied Sciences

We present advances in the development of a FST-based morphological analyzer and generator for Skolt Sami. Like other minority Uralic languages, Skolt Sami exhibits a rich morphology, on the one hand, and there is little golden standard material for it, on the other. This makes NLP approaches for its study difficult without a solid morphological analysis. The language is severely endangered and the work presented in this paper forms a part of a greater whole in its revitalization efforts. Furthermore, we intersperse our description with facilitation and description practices not well documented in the infrastructure. Currently, the analyzer covers over 30,000 Skolt Sami words in 148 inflectional paradigms and over 12 derivational forms.


Jack Rueter, Mika Hämäläinen

Department of Digital Humanities

University of Helsinki

{jack.rueter, mika.hamalainen}@helsinki.fi

Keywords: Skolt Sami, endangered languages, morphology

1. Introduction

Skolt Sami is a minority language belonging to the Sami branch of the Uralic language family. With only around 300 native speakers, it is considered a severely endangered language (Moseley, 2010), which, despite its pluricentric potential, is decidedly focusing on one mutual language (Rueter and Hämäläinen, 2019). In this paper, we present our open-source FST morphology for the language, which is a part of the wider context of its ongoing revitalization efforts.

The intricacies of Skolt Sami morphology include quality and quantity variation in the word stem as well as suprasegmental palatalization before subsequent affixes. Like Northern Sami and Estonian, Skolt Sami has consonant quantity and quality variation that surpasses that of Finnish: both vowels and consonants in a given word can occur in as many as three lengths.

The finite-state description of Skolt Sami involves developing strategies for reusability of open-source documentation in other minority languages. In other words, the FST description is designed in such a fashion that it can be applied to other languages with minimal modifications. Skolt Sami, like many other minority Uralic languages, attests to a fair degree of regular morphology, i.e., its nouns are marked for the categories of number, possession and numerous case forms with regular diminutive derivation, and its verbs are conjugated for tense, mood and person in addition to undergoing several regular derivations. Morphological descriptions have been developed in the GiellaLT (Sami Language Technology) infrastructure at the Norwegian Arctic University in Tromsø, using Helsinki Finite-State Technology (HFST) (Lindén et al., 2013).

Working in the GiellaLT infrastructure, it is possible to apply ready-made solutions to multiple language learning, facilitation and empowerment tasks. Leading into the digital age, there are ongoing implementations, such as keyboards¹ for various platforms and corpora², which are being expanded to provide developers, researchers and language community members with direct access to language materials. The trick is to find new uses and reuses for data sets and technologies as well as to bring development closer to the language community. If development follows the North Sámi lead, any project can benefit from the work already done.

¹ http://divvun.no/keyboards/index.html/

² http://gtweb.uit.no/korp/

Extensive work has already been done on data and tool development in the GiellaLT infrastructure (Moshagen et al., 2013; Moshagen et al., 2014), and previous work also exists for Skolt Sami³ (Sammallahti and Mosnikoff, 1991; Sammallahti, 2015; Feist, 2015). There are online and click-in-text dictionaries (Rueter, 2017)⁴ and spell checkers (Morottaja et al., 2018)⁵, which are implemented in OpenOffice, although some of the more prominent languages are also supported in MS Word, as well as rule-based language learning (Antonsen et al., 2013; Uibo et al., 2015). For languages with extensive description and documentation, there are syntax checkers (Wiechetek et al., 2019), machine translation (Antonsen et al., 2017) and speech synthesis and recognition (Hjortnaes et al., 2020), just to mention the tip of the iceberg (Rueter, 2014). From a language learner and research point of departure, the development and application of these tools point to well-organized morpho-syntactic and lexical descriptions of the language in focus.

By well-organized descriptions, we mean approaching the tasks at hand with applied reusability. Reusability is illustrated in the construction of a morphological analyzer for linguists, which, because it is able to recognize and analyze regular morphological forms, can also serve as a morphological spell checker. In fact, this same analyzer can be reversed and used as a generator, which is useful in providing language learners with fixed, analogous and random tasks in morphology. The same morphological analyzer, when augmented by glosses, can immediately begin to provide online dictionary and click-in-text analyses.

³ http://oahpa.no/sms/useoahpa/background.eng.html/; read further in this article for subsequent developments in http://oahpa.no/nuorti/

⁴ The forerunner https://sanit.oahpa.no/read/, an online dictionary here, and on analogous pages of other dictionaries (e.g., https://saan.oahpa.no/read/), can be dragged to the tool bar of Firefox and Google Chrome.

⁵ http://divvun.no/korrektur/korrektur.html/

arXiv:2004.04803v1 [cs.CL] 9 Apr 2020

The development of an optimal morphological analyzer and glossing for a language like Skolt Sami requires concise morphological and lexical work, on the one hand, and access to corpora including language learning materials, on the other. Corpora provide access to language in use, and language learning materials help to establish a received understanding of the language. To this end, the morphological analyzer for Skolt Sami has been constructed to analyze and generate a pedagogically enhanced orthography, which indicates short and long diphthongs preceding geminates as well as mid low front vowels, as might be rendered in a pronouncing dictionary. One such example can be seen in the word kuẹʹtt ‘hut’ as opposed to the literal norm kueʹtt, where the dot below the e not only indicates a slightly lowered pronunciation of the vowel but also assists in identifying the paradigm type: kuẹʹtt : kueʹđid ‘hut+N+Pl+Acc’ versus kueʹll : kuõʹlid ‘fish+N+Pl+Acc’.

By focusing on the construction of a pedagogically enhanced analyzer-generator, teaching resources can be developed that target randomly generated morphological tasks for the language learner, as in the North Sami learning tool Davvi⁶. In any given language reader, there are texts with words in various forms and an accompanying vocabulary. While vocabulary translation can readily be utilized as a fixed task in language learning, inflectional tasks, especially in morphologically rich languages, can be developed as random exercises. Although the contextual word forms in the reader are quite limited, it is possible to construct randomized morphological exercises where the student is expected to inflect nouns, adjectives and verbs alike in forms that have been taught but not explicitly given for the random words provided in the reader vocabulary. For nouns, for example, the student may select vocabulary from reader A, chapters 1–5, with a randomized task for nouns in the plural comitative with a third person singular possessive suffix: +N+Pl+Com+PxSg3. Essentially all nouns in the selected vocabulary available for this reading are inadvertently presented to the learner.

2. Related Work

In the past, multiple methods have been proposed for auto-

matically learning morphology for a given language. One

of these is Morfessor (Creutz and Lagus, 2007), which is a

set of tools designed to learn morphology from raw textual

data. It has been developed with Finnish in mind, and this

means that it is intended to perform well with extensive reg-

ular morphology, i.e. morphologically rich languages, too.

Bergmanis and Goldwater (2017) present another statistical approach that can also take spelling variation into account. Their approach is based on the notion of a morphological chain consisting of child-parent pairs. When analyzing the morphology of a language, the approach takes several features into account, such as the presence of the parent in the training data, semantic similarity, likely affixes and so on.

Such statistical approaches, however, are data-hungry. This is a problem for various reasons in the case of Skolt Sami. The scarce quantity of textual data is one limitation, but it is an even greater one given that the language is still being standardized and the users provide a variety of forms and vocabulary when expressing themselves in their native language. This means an even greater variety in morphology that the statistical model should be able to capture from a limited dataset.

⁶ http://oanpa.no/davvi/morfas/

In the absence of a reasonably sized descriptive corpus of

the language, annotated or not, the most accurate way to

model the morphology is by using a rule-based methodol-

ogy.

FSTs (Finite-State Transducers) have been shown in the past to be an effective way to model morphology even for languages with an abundance of morphological features (cf. Beesley and Karttunen, 2003). Perhaps one of the largest-scale FSTs to model the morphology of a language is the one developed for Finnish (Pirinen et al., 2017). This tool, Omorfi, serves as the state-of-the-art morphological analyzer for Finnish.

3. The FST Model Development Pipeline

Developing a morphological description of a language presupposes a language-learning and documentary approach. Other people have learned the language and become proficient in it before you, so extract paradigms from grammars, readers and research to build the language model. If you are the first researcher to describe the language, take hints from the language learners, if there are any. They may still be developing their own understanding of the language morpho-syntax, and, at times, they may provide you with informative interpretations of the language.

Idiosyncrasies of a language can, sometimes, be captured

through comparison to those of another. When a descrip-

tion of Skolt Sami, Finnish, Estonian, etc. introduces alien

phenomena, such as word-stem quality and quantity vari-

ation as well as suprasegmental palatalization, it is a good

idea to try describing them both separately and in tandem.

Word-stem quality variation affects both consonants and vowels. In consonants, an analogous English example might be illustrated with the f:v variation found in the English words life, lives and loaf, loaves. From a historical perspective, the verb to live will serve as an instance where long and short vowels accentuate a distinction between nouns and verbs. In a like manner, the English verb paradigm (sing, sang, sung) provides a sample of vowel variation with regular semantic alignment in other verbs, such as swim and drink. These seemingly peripheral phenomena of English, however, are central to the description of Skolt Sami morphology, where consonant quality and quantity variation permeate the verbal and nominal inflection systems. Suprasegmental palatalization is yet another phenomenon to be dealt with, as it may exert its own influence on sound variations in both the consonants and vowels in the same coda of a word stem. These require sound variation modeling in what is referred to as a two-level model, where awareness of underlying hypothetical sound patterns and surface-level reflexes are united to facilitate analysis and generation of paradigmatic stem type variation, e.g. an underlying sw{iau}m could be configured with a ^VowI trigger to call the form swim, ^VowA the form swam, and ^VowU the form swum.
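
The English illustration can be imitated in a few lines of Python. This is only a sketch of the idea of an underlying array plus a selecting trigger, not the two-level rule formalism itself; the table TRIGGER_TO_VOWEL and the function realize are hypothetical names introduced for illustration.

```python
import re

# Hypothetical trigger table: each trigger selects one member of the {iau} array.
TRIGGER_TO_VOWEL = {"^VowI": "i", "^VowA": "a", "^VowU": "u"}


def realize(underlying: str, trigger: str) -> str:
    """Replace each {...} array with the vowel the trigger selects.

    An array is only rewritten if it actually lists the selected vowel;
    otherwise it is left untouched.
    """
    vowel = TRIGGER_TO_VOWEL[trigger]
    return re.sub(
        r"\{([^}]*)\}",
        lambda m: vowel if vowel in m.group(1) else m.group(0),
        underlying,
    )


forms = [realize("sw{iau}m", t) for t in ("^VowI", "^VowA", "^VowU")]
print(forms)
```

In a real two-level grammar the same selection is stated as parallel rules over the underlying and surface strings; the dictionary lookup here merely stands in for those rules.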

Theoretically speaking, Skolt Sámi has vowel and consonant quantity variation in three lengths, i.e. monophthongs and diphthongs as well as geminates and consonant clusters are subject to three lengths. One problem with the initial finite-state description of Skolt Sami was that attempts were made to describe Skolt Sami according to the complementary distribution of quantity found in North Sámi⁷.

By chance, the author set out to describe vowel and conso-

nant quantity as separate conjoined phenomena, and when

the instance of short vowel and shortened consonant in tan-

dem presented itself, only a little extra implementation was

required for identifying this new variation. In fact, the phe-

nomenon had been described earlier as allegro versus largo,

but it had been ignored in some of the linguistic literature

(Koponen and Rueter, 2016).

Preparing the description of a single word is much like writing a terse dictionary entry. The required information consists of a head word form or lemma, a stem form from which to derive all required stems, a continuation lexicon indicating paradigm type (part of speech is also interesting), and finally a gloss or note. The word radio ‘radio’ might be presented as follows:

radio+N:radio N_RADIO "radio" ;

The LEMMA:STEM CONTINUATION-LEXICON NOTE presentation represents one line of code consisting of four pieces of data. First comes the index, which consists of the lemma and part-of-speech tag. Second, after a separating colon, comes the stem, which, together with the continuation lexicon (the third constituent), makes paradigm compilation possible by indicating what base all subsequent concatenated morphology connects to; the loanword ‘radio’ has no stem-internal variation. Finally, there is the optional NOTE constituent, where a gloss has been provided.
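
To make the four constituents concrete, the following Python sketch splits one such entry line into its parts. It assumes the note is written in plain double quotes and that the line ends in a semicolon; the class LexcEntry and the function parse_entry are hypothetical helpers, not part of HFST or GiellaLT.

```python
import re
from typing import NamedTuple, Optional


class LexcEntry(NamedTuple):
    lemma: str
    tags: str            # e.g. "+N"
    stem: str
    continuation: str    # continuation lexicon name, e.g. "N_RADIO"
    note: Optional[str]  # optional gloss or note


ENTRY = re.compile(
    r'^(?P<lemma>[^+:\s]+)(?P<tags>(?:\+[^+:\s]+)*)'  # lemma plus +Tag sequence
    r':(?P<stem>\S+)\s+'                              # stem after the colon
    r'(?P<cont>\S+)'                                  # continuation lexicon
    r'(?:\s+"(?P<note>[^"]*)")?\s*;\s*$'              # optional quoted note, then ;
)


def parse_entry(line: str) -> LexcEntry:
    """Split a LEMMA:STEM CONTINUATION-LEXICON NOTE line into its four parts."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"not a recognised entry line: {line!r}")
    return LexcEntry(m["lemma"], m["tags"], m["stem"], m["cont"], m["note"])


entry = parse_entry('radio+N:radio N_RADIO "radio" ;')
print(entry)
```

The sketch deliberately ignores lexc escaping with %, so it is suited only to simple entries such as the one for radio.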

The Continuation lexicon name has been written in upper-

case letters to distinguish it from the remainder of the code

line. In this language, continuation lexicon names are ini-

tially marked for part of speech, hence the initial ‘N_’. This

part-of-speech increment is more of a mnemonic note to

help facilitate faster manual coding. After initial denom-

inal derivation lexica, nouns, adjectives and numerals are

directed to mutual handling of case, number and possessive

marking.

This initial line of code may encode even more complex data. One such entry might be observed in the noun veʹrdd ‘stream’, which exhibits necessary information for complex stem variation:

veʹrdd+N:ve^1VOW{ʹØ}rdd N_KAQLBB "flow, stream" ;

The index veʹrdd+N: (LEMMA constituent and part-of-speech tag), as such, is readily comprehensible. The part-of-speech tag may also be preceded by tags indicating variants in order of preference (+v1, +v2) and homonymy (+Hom1, +Hom2), and it may be followed by tags indicating semantics (+Sem...) and part-of-speech subtypes (e.g. +Prop for proper nouns, +Dem as in demonstrative pronoun). Tags, of course, may be inserted at the root or in subsequent continuation lexica; this is simply a matter of taste and the complexity of the continuation lexicon network.

⁷ In North Sámi, there is a three-way gradation system where grade one has an extra-long vowel and short consonant, grade two has a long vowel with a long consonant, and grade three has a short vowel with an extra-long consonant.

The STEM ve^1VOW{ʹØ}rdd in combination with the CONTINUATION-LEXICON N_KAQLBB is what captures the proliferation of six separate stem forms used in regular inflection: veʹrdd ‘SG+NOM’, veeʹrd ‘SG+GEN’, vẹrdda ‘SG+ILL’, viiʹrdi ‘PL+GEN’, veʹrdstes ‘SG+LOC+PXSG3’, ˙e˙rdaž ‘DIMIN+SG+NOM’. While vowel and consonant variation might be considered peripheral in English, these extensive patterns are widespread in Skolt Sami inflection. Some verb types may even have as many as eleven separate stem forms used in regular inflection and derivation. Hence, consonant and vowel quality together with quantity in both provide a challenge for description of the regular inflectional paradigms of Skolt Sámi.

The continuation lexicon N_KAQLBB mnemonically points to the Skolt Sámi word käʹlbb ‘calf (anim.)’ as a reference to paradigm type.

Reference to paradigms has traditionally been done using numbers. This entails access to a set of paradigm descriptions, because no one can be expected to memorize large sets of paradigm types by number alone. Using familiar words to allude to paradigm types may be straightforward from a native speaker’s perspective, but they too will require documentation in test code. Test code might be located adjacent to the appropriate affix continuation lexicon or in a separate set of test files (see also the noun algg ‘beginning’ in Figure 1, below). The NOTE section, of course, is open for virtually any type of data.

Development of guidelines helps newcomers join a tradition and construct analogous, parallel descriptions in the same or similar infrastructures. The presupposition of a willingness to adapt new projects to the practices of established analogous work is an important element in open-source FST development at GiellaLT, which has been adopted as the basis for guideline development. At GiellaLT, documentation is sometimes sparse, incomplete or difficult to find, and therefore it is imperative that all possible reference be made to shared practices. For maximized short-term achievement (2 to 5 years), the project languages to consult first are North Sami (sme) and South Sami (sma), whereas the experience from the Skolt Sami language project is discussed here.

Skolt Sami specific descriptive materials have been dealt with in the light of work in closely related languages. Here, practice with analogous work in other Sami and Uralic languages has been helpful in learning mnemonic methods that can be applied as well as lexicon code line writing and sound variation modeling. Each language has many of its own requirements, but, wherever possible, we should seek out ways to align all projects.

The tag sets used with various language parsers at GiellaLT are extensive and have been directly adapted to work in the Skolt Sami project to ensure a high usability of tools already implemented and in mutual use in many language projects. Ordering of tags reflects parsing no later than 2005, e.g. N+Sg+Nom giehta ... (Moshagen and Trosterud, 2005). Inflection types are indicated mnemonically by use of a frequent representative of the type, a strategy also observed in Omorfi, e.g. an initial continuation class marking N_ALGG (algg ‘beginning’) is given for nouns with a coda structure in V_high C_1 C_2 C_2. Inflection type naming of this kind draws the developer’s attention to the familiar word and helps to minimize the specification consultation required when inflection types are only numerically coded, e.g. 1, 2, 3... Both systems, however, require set specifications for each inflection type.

In order to enable morpho-lexical variation detection, FST description presupposes a degree of wrong form generation. Indeed, wrong form coverage is what facilitates intelligent spell checking suggestions, e.g. generation of a four-year-old’s simple past rendition, swimmed, with a hint tag +regular-past-error could be useful. For extended coverage, more inflection types and extensions are described than would otherwise be assumed from mere phonological descriptions. There is diversity in the spoken language, which has meant that certain stem types or individual forms must be provided with multiple realizations. Here we want to avoid assigning multiple paradigms to individual lemmas where the distinction between the paradigms may lie in only one or two forms (cf. Iva, 2007).

In Skolt Sami, building a slightly more demanding description of the phonology has meant the inclusion of otherwise pedagogical characters and graphemes. Special filtering is available for converting pedagogic target transducers into normative transducers, and spell relaxes extend these in turn to descriptive transducers. These same methods are shared by other language projects in the GiellaLT infrastructure. In the long run, tweaking the description for pedagogic targeting means that even more uses are being made available, and that basic work is almost immediately available for continuation projects already realized or under construction in other language projects, i.e. syntactic disambiguation, text-to-speech, etymology suggestion.
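
The relation between pedagogic and normative forms can be pictured as the removal of pedagogical diacritics. The sketch below is a string-level approximation in Python and assumes that the only pedagogical mark is the dot below (U+0323) mentioned in the introduction for the word ‘hut’; the function name to_normative is hypothetical, and in the infrastructure itself the filtering is applied to transducers rather than to individual strings.

```python
import unicodedata

DOT_BELOW = "\u0323"  # combining dot below, the pedagogical mark assumed here


def to_normative(pedagogic: str) -> str:
    """Drop the pedagogical dot below and return the normative spelling."""
    decomposed = unicodedata.normalize("NFD", pedagogic)  # split base + diacritics
    stripped = decomposed.replace(DOT_BELOW, "")          # remove only the dot below
    return unicodedata.normalize("NFC", stripped)         # recompose õ, ä, ž, etc.


# 'hut' in the pedagogic orthography: k u e+dot-below soft-sign(U+02B9) t t
normative = to_normative("kue\u0323\u02b9tt")
print(normative)
```

Other diacritics that belong to the norm, such as those in õ, ä or ž, survive because only U+0323 is removed before recomposition.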

3.1. Development of the two-level description

Skolt Sami finite-state transducer development reuses descriptive materials for both concatenation strategies and testing. Work in the GiellaLT infrastructure begins with generation-analysis code test files (yaml), with content as in Figure 1. Each line contains a lemma, subsequent tag set and resulting output word form or forms following a colon, e.g. algg+N+Sg+Gen: aalg.
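
The structure of such a test line can be shown with a short Python sketch. It uses plain string handling rather than a YAML library, and it assumes that multiple expected forms, when present, are given as a bracketed, comma-separated list; the function parse_test_line is a hypothetical helper.

```python
def parse_test_line(line: str):
    """Split 'lemma+Tag+Tag: form' into lemma, tag list and expected forms."""
    key, _, value = line.partition(":")
    lemma, *tags = key.strip().split("+")
    # Accept either a single form or a bracketed list such as [form1, form2].
    forms = [f.strip() for f in value.strip().strip("[]").split(",") if f.strip()]
    return lemma, tags, forms


lemma, tags, forms = parse_test_line("algg+N+Sg+Gen: aalg")
print(lemma, tags, forms)
```

Each parsed line then amounts to one generation test: feeding the lemma and tags to the generator should yield exactly the listed forms.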

The lines of description in the yaml test file (lemma + tag set + resulting word forms) are readily copied to a lexc affix description file for further editing and implementation as code (Figure 2). Here it can be observed that concatenational morphology is added after the colon (:), but at the same time there is a certain amount of further required morphological quality and quantity change.

Editing in the continuation lexica in the affixes/*.lexc files entails stripping the lemma and the part of the target word forms that can serve as the stem. Since Skolt Sami is not a language with entirely simple concatenation strategies, we can make a few observations of the interplay between simple morphological concatenation and the complementary two-level model facilitation.

Figure 1: A diagram showing file content for yaml analyzer-generator testing

Figure 2: A diagram showing LEXICON development for ALGG type nouns

The lemma for the word algg ‘beginning’ is the same as the nominative singular and has no morpho-phonological changes, hence no triggers are present when coding +N+Sg+Nom. In the genitive and accusative singular, however, coding (e.g. +N+Sg+Acc) co-occurs with coda vowel lengthening, indicated with the trigger V2VV (i.e. one vowel becomes two), and consonant cluster weakening, indicated with the trigger XYY2XY (i.e. the consonant cluster alternation between -lgg and -lg). Compare the concatenation and phenomena in Figure 2 with the compound of concatenational morphology and the accompanying triggers V2VV and XYY2XY in Figure 3.

Figure 3: A diagram showing some triggers used in description of ALGG type nouns
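
The effect of the two triggers on algg can be imitated with ordinary string rewriting. The Python sketch below is an approximation under stated assumptions: the vowel inventory in VOWELS is a simplified guess, the functions v2vv, xyy2xy and apply_triggers are hypothetical, and the real description uses two-level rules conditioned on trigger symbols rather than regular-expression substitution.

```python
import re

VOWELS = "aeiouâõäå"  # simplified vowel inventory, assumed for this sketch


def v2vv(stem: str) -> str:
    """Lengthen the last vowel of the stem (one vowel becomes two)."""
    return re.sub(rf"([{VOWELS}])(?=[^{VOWELS}]*$)", r"\1\1", stem, count=1)


def xyy2xy(stem: str) -> str:
    """Weaken a stem-final cluster XYY to XY, e.g. -lgg to -lg."""
    return re.sub(rf"([^{VOWELS}])([^{VOWELS}])\2$", r"\1\2", stem)


TRIGGERS = {"V2VV": v2vv, "XYY2XY": xyy2xy}


def apply_triggers(stem: str, triggers) -> str:
    """Apply the named triggers to the stem in the order given."""
    for name in triggers:
        stem = TRIGGERS[name](stem)
    return stem


nominative = apply_triggers("algg", [])
genitive = apply_triggers("algg", ["V2VV", "XYY2XY"])
print(nominative, genitive)
```

With no triggers the lemma is returned unchanged, matching the nominative singular; with both triggers the result is the lengthened and weakened stem of the genitive and accusative singular.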

The .yaml code test content can be further utilized as in-line testing code by simply flipping content left-to-right for analysis reading, as shown in Figure 4. Implicit in the test data, we can observe five different stems for the monophthong noun algg: algg ‘Sg+Nom’, aalg ‘Sg+Gen’, aʹlǧǧe ‘Sg+Ill’, algstan ‘Sg+Loc+PxSg1’, aaʹlje ‘Dimin+N+Sg+Gen’.

Figure 4: A diagram showing some test data for ALGG type noun analysis
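
Flipping generation tests into analysis tests amounts to inverting a mapping. A minimal Python sketch follows, with the test content held in an ordinary dictionary; the function flip and the sample dictionary are illustrative assumptions, and the accusative entry is included on the basis of the remark that genitive and accusative singular share the same stem changes.

```python
from collections import defaultdict


def flip(generation_tests):
    """Invert {analysis string: [word forms]} into {word form: [analysis strings]}."""
    analysis_tests = defaultdict(list)
    for analysis_string, forms in generation_tests.items():
        for form in forms:
            analysis_tests[form].append(analysis_string)
    return dict(analysis_tests)


generation = {
    "algg+N+Sg+Nom": ["algg"],
    "algg+N+Sg+Gen": ["aalg"],
    "algg+N+Sg+Acc": ["aalg"],
}
analysis = flip(generation)
print(analysis)
```

A syncretic form such as aalg ends up with several expected analyses, which is exactly what an analyzer test needs to check.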

Although there are instances of single stems taking numerous affixes, e.g. biografia or radio, above, most nominals and verbs require multiple stems. The extensive stem variation observed in the noun algg, above, is surpassed in the verb tieʹtted ‘to know’. It uses the following 10 stems in regular inflection: tieʹtt- ‘Inf’, tieʹđ- ‘Ind+Prt+Sg3’, tiõt’t- ‘Imprt+ConNeg’, tiõđ- ‘Deriv’, tiõʹt’t- ‘Ind+Prt+Pl3’, tiõʹđ- ‘Pot’, teât’t- ‘Imprt+Pl3’, teâtt- ‘Ind+Prs+Sg3’, teâđ- ‘Cond’, teäʹt’t- ‘Ind+Prs+Pl3’. The vowel quality variation in Skolt Sami and North Sami is analogous to what is observed in Germanic irregular verbs, e.g. sing, sang, sung.

Skolt Sami provides a challenge deserving the integration of morpholexical and two-level model descriptions, as originally introduced by Koskenniemi (1983). Integration of concatenation lexicon and morphophonological two-level description has required both intuition and a working knowledge of the target language. Whereas concatenation alludes to simply adding one morpheme to another, morphophonology draws our attention to changes required in the stems; hence the challenge of defining 10 separate stems for a single lemma in Skolt Sami provided above. (More extensive descriptions of quality, quantity and suprasegmental variation are provided in Feist, 2015, and Sammallahti, 2015.)

The two-level model utilizes parallel constraints for phonological description. As mentioned above, descriptive grammars of the Skolt Sami language indicate multiple simultaneous, coordinated variation in the stem. Thus work on the two-level model initially opted to provide separate triggers for each individual phenomenon, here ^V2VV for quantity, ^VowRaise for quality and ^PAL for palatalization.

In brief, triggers are an artificial means of replacing the natural phonological features occurring in the morphology. They can be used for causing phenomena in what follows (right context) or in what precedes (left context). For example, if front-back vowel harmony is highly predictable on the basis of the preceding stem, the individual stems can be marked with {front} or {back} triggers in order to elicit the front or back allomorphs of subsequent suffixes, i.e. triggers are set for right-context phenomena. A trigger also provides for manipulation of the harmony reflexes necessary for incorrect morphology, i.e. something needed in recognizing misspellings in intelligent computer-assisted language learning and spell checker suggestions; recall the instance of swimmed, above.
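
The harmony example can be sketched in Python as follows. The Finnish words and the archiphoneme letters A, O and U are assumptions chosen for illustration and are not taken from the Skolt Sami description; the table HARMONY and the function attach are hypothetical.

```python
# Hypothetical realizations of suffix archiphonemes under each harmony trigger.
HARMONY = {
    "{back}": {"A": "a", "O": "o", "U": "u"},
    "{front}": {"A": "ä", "O": "ö", "U": "y"},
}


def attach(stem_with_trigger: str, suffix: str) -> str:
    """Attach a suffix, letting the stem-final trigger choose its allomorph."""
    for trigger, mapping in HARMONY.items():
        if stem_with_trigger.endswith(trigger):
            stem = stem_with_trigger[: -len(trigger)]
            return stem + "".join(mapping.get(ch, ch) for ch in suffix)
    raise ValueError(f"no harmony trigger on {stem_with_trigger!r}")


correct_back = attach("talo{back}", "ssA")    # Finnish 'in the house'
correct_front = attach("kylä{front}", "ssA")  # Finnish 'in the village'
wrong_form = attach("kylä{back}", "ssA")      # deliberately mis-triggered
print(correct_back, correct_front, wrong_form)
```

Because the trigger, not the stem vowels, decides the allomorph, swapping the trigger yields a predictable ill-formed word, which is the kind of form an error-tagged path for spell checker suggestions would need to cover.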

The two-level model rules facilitate simultaneous variation of many features in the same word. Left and right contexts play an important role in this description: both contexts can contain morpho-phonological phenomena seen to precede or follow the change elicited by a given rule, or they can disregard them. Triggers are used in rule writing, because the actual morphophonology of the words does not necessarily reflect ideal, consistent trigger patterning.

Zero-to-surface-entity rules present in the early phases of the project have been corrected by adding multicharacter archiphones to the individual stems. Stem-internal change, such as matters of vowel quantity and quality, is indicated with these symbols. For purposes of phenomenon recognition, curly brackets have been used for displaying arrays of variation, e.g. {eöâä} indicates there is a vowel variation of four separate qualities as required in the various stems. Parallel multiple-character symbols have been implemented for suprasegmentals, length markers and other stem variation in the word.

Modeling quantity in Skolt Sami has meant a divorce from the description of other Sami languages. Quantity variation is generally viewed as a coordinated phenomenon affecting vowel and consonant length simultaneously (see reference to North Sámi and complementary distribution of quantity, above). Skolt Sami deviates here: the predictable ‘extra long vowel + short consonant’, ‘long vowel + long consonant’ and ‘short vowel + extra long consonant’ combinations are supplemented by a fourth ‘extra short vowel + extra short consonant’ pattern. The four-way split required little new coding, since the original quantity modeling had treated vowel and consonant length as separate phenomena. When the fourth pattern became more apparent after the first half year, all triggers were present, and actually little work was required to implement their use. Since the fourth pattern alternates with the long-vowel-long-consonant pattern, algstan (allegro) ∼ aalgstan (largo), respectively, ‘begin+N+Sg+Loc+PxSg1’, more language documentation was required, as this variation was found to permeate the inflection and derivation patterns of the language.

Modeling quality in Skolt Sami has introduced multi-character symbols in the stem. These multi-character symbols contain arrays of realizations in escaped curly brackets, e.g. t%{ie%}%{eöâä%}%{ʹØ%}tt ‘to know’, above. Each array indicates a mnemonic list of variables. These lists are easy to interpret and consistent with guesser and cognate search development, where sound change is consistently traceable (Kimmo Koskenniemi and Heiki-Jaan Kaalep, p.c.). Moreover, array notations are analogous with inflection group identifying model words as in N_ALGG and N_KAQLBB, above.

Variation in the multi-character symbols as well as the unmarked consonants is modeled with triggers. Triggers are used to elicit vowel length and height, suprasegmental palatalization (which may affect the realization of both the preceding vowel and subsequent consonantism), as well as consonant length and quality. In the Skolt Sami project, vowel length is triggered with the multi-character symbols %^V2VV (short to long) and %^VV2V (long to short).
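
How a stem string is read as a sequence of multi-character symbols and single letters can be shown with a greedy longest-match tokenizer. The Python sketch below assumes a small declared symbol list and writes the soft sign as ʹ; the function tokenize is hypothetical, and pairing the stem for ‘to know’ with the %^V2VV trigger is done purely for illustration.

```python
def tokenize(text: str, multichar):
    """Split text into declared multi-character symbols and single characters."""
    symbols = sorted(multichar, key=len, reverse=True)  # longest match first
    tokens, i = [], 0
    while i < len(text):
        for sym in symbols:
            if text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            # No declared symbol starts here, so emit one ordinary character.
            tokens.append(text[i])
            i += 1
    return tokens


# Assumed symbol declarations for this sketch.
MULTICHAR = ["%{ie%}", "%{eöâä%}", "%{ʹØ%}", "%^V2VV", "%^VV2V"]

tokens = tokenize("t%{ie%}%{eöâä%}%{ʹØ%}tt%^V2VV", MULTICHAR)
print(tokens)
```

Each array and each trigger surfaces as a single token, which is what allows rules to refer to them as units rather than as runs of ordinary letters.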

To avoid balancing problems introduced with flag diacritics and further unexpected complications, triggers are ordered and follow the stem before concatenated suffixes. The tieʹđ- stem required for rendering the form V+Pot+Sg3: tieʹđež is elicited with the consecutive triggers %^VOWRaise, %^PALE, %^PAL and %^CC2C, i.e. vowel raising (which would regularly render iõ), suprasegmental coloring (rendering iõ ⇒ ie), palatalization (ʹ) and consonant quality change via shortening. The large number of triggers demanded a large amount of memory, and to alleviate the problem a reversed-intersect function was implemented in the GiellaLT infrastructure as recommended by a member of the HFST team.

3.2. Deviation from Point of Departure on

GiellaLT

The Skolt Sami project has seen departure from previous work in the infrastructure but simultaneously adherence to a mnemonic system of description. In the course of the project, the policy of lemma followed by a simple orthographic stem has not been retained. The number of nominal stem types has risen to 308 from the 56 described in (Sammallahti and Mosnikoff, 1991), while the number of verbal stem types is 115 as compared to 30 (ibid.). Adjectives and numerals share inflection types with nouns. Before the commencement of the project in 2013, for instance, only 280 verbs and 828 nouns were partially facilitated by the system, whereas by the end of 2018 the analogous figures were 4844 verb stems with over 40 conjugation forms as well as numerous verbal and nominal derivations, and 23683 noun stems with over 98 declensional forms as well as additional derivations, and the entire lemma count exceeded 36000.

Multi-character symbol development favors mnemonic forms. Arrays enclosed in curly brackets are used for indicating vowel quality and quantity variation, a practice analogous to inflection type model words that hint at the type of stem variation. Triggers have, in matters of length, been drafted to reflect specific nuances of coda description, e.g. %^VV2V indicates vowel shortening, %^CCC2CC geminate shortening, and %^XYY2XY consonant cluster shortening.

Triggers have been fashioned for and subsequent afﬁxes.

The stem has been ﬁlled with multiple-character symbols to

indicate which letters and graphemes undergo change and

what kind of change. Ordered triggers have been applied

to bring about these changes regardless of the orthographic

context, which simpliﬁes the generation of incorrect forms,

a necessity in the recognition of ill-formed word forms and

their alignment with the desired words.

Trigger ordering is aligned with the orthographic realization of phonological phenomena. Thus, changes in penultimate syllables precede those in ultimate syllables, which is similar to vowel changes preceding suprasegmental marking and subsequent consonants. A special context marker Pen is used before each trigger effecting change in the penultimate syllable. The trigger count in a given stem may reach six.

Word Class    glossed  unglossed  inflections  derivations
Adjectives       4190        166           16            3
Nouns           21640        712           99           3+
Verbs            4845         23           33           6+
total           30675        901          148          12+

Figure 5: morpholexical coverage

4. Lexical and Morphological Coverage

In the absence of gold annotated data, we do not conduct an evaluation typical of current mainstream NLP, but rather describe the coverage of forms and lexemes in the transducer. Here we will limit our discussion to the most extensive paradigms, i.e. adjectives, nouns and verbs (see Figure 5). In addition to statistics on the glossed and unglossed lexicon, where glossed is a loose term for the presence of at least one single-word translation for each Skolt Sami word in the Akusanat dictionary (Hämäläinen and Rueter, 2018), we will discuss regular inflection and derivation. While inflection refers to conjugation and declension, on the one hand, derivation indicates part-of-speech transformation brought about by morphological means, on the other. As a result of this work, the Skolt Sami transducer represents a lexicon of over 30,000 lemmas with a coverage of over 2.3 million inflectional forms, not to mention the derivational exponent or compound nouns.
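As a rough cross-check of the figure of over 2.3 million forms, the lexeme counts can be multiplied by the number of regular inflectional forms per word class. The sketch below uses the glossed lexeme counts and the per-class inflection counts reported in Figure 5; whether the figure in the text was computed in exactly this way is an assumption.

```python
# Glossed lexemes and regular inflectional forms per word class (Figure 5).
COVERAGE = {
    "Adjectives": (4190, 16),
    "Nouns": (21640, 99),
    "Verbs": (4845, 33),
}


def inflected_form_count(coverage):
    """Sum lexemes x forms-per-lexeme over all word classes."""
    return sum(lexemes * forms for lexemes, forms in coverage.values())


total = inflected_form_count(COVERAGE)
print(total)  # 2369285
```

Nouns dominate the total, since they combine the largest lexicon with the largest paradigm.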

Adjectives in Skolt Sami may have special attribute forms for use in the noun phrase, as is the situation in other Sami languages. Adjectives are also known to decline in the same case forms as nouns, which brings us to a total of approximately 16 paradigmatic forms associated with the declension of each adjective. Regular derivation, it will be noted, is generally limited to comparative and superlative inflection in all cases as well as nominalization, which goes on to feed regular noun inflection.

Nouns, like adjectives, can be declined in seven cases for singular and plural with the addition of the partitive⁸. In contrast to the adjectives, however, number and case can be augmented with possession markers for three persons and two numbers, which brings the number of paradigmatic cells in declension to nearly 100. Nouns can further be derived as regular diminutives (which again feed regular derivation) and as two types of adjectives with the meanings ‘without X’ (privative) and ‘full of X’ (both of which can be further derived as nouns, and the former is regularly derived as a verb).

⁸ The partitive has no morphological distinction for number.

The verbal paradigm is also relatively extensive. Each tense and additional mood, with the exception of the imperative, has three categories for person, two for number and an indefinite personal form (seven forms in all). Thus, in addition to the 4 × 7 = 28 forms of the two tenses in the indicative, the subjunctive and the potential mood, there are five more forms for the imperative, which brings us to a total of 33 forms in a given conjugation paradigm. Non-finite derivation, participles in addition to deverbal nouns and verbs, adds feeders to nominal and verbal derivation alike.
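The count of 33 can be reproduced by enumerating the cells. In the sketch below the labels are illustrative placeholders rather than the tag set of the transducer (only V, Pot and Sg3 appear in the text above), and the five imperative cells are simply numbered because the text does not name them.

```python
from itertools import product

# Placeholder labels (assumptions), chosen only to make the cells countable.
TENSE_MOODS = ["Ind+Prs", "Ind+Prt", "Subj", "Pot"]  # two indicative tenses, two moods
PERSONS = [f"{num}{p}" for num, p in product(["Sg", "Pl"], "123")]  # 3 persons x 2 numbers
PERSONS.append("Indef")                              # indefinite personal form
IMPERATIVE = [f"Imp{i}" for i in range(1, 6)]        # five unnamed imperative cells

# Every tense/mood combines with every person form; the imperative is added on top.
cells = [f"V+{tm}+{p}" for tm, p in product(TENSE_MOODS, PERSONS)]
cells += [f"V+{imp}" for imp in IMPERATIVE]

print(len(PERSONS), len(cells))  # 7 33
```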

A large percentage of this regular inflection is in place and available in UralicNLP, a Python library for Uralic minority languages (Hämäläinen, 2019). The lexical database for Skolt Sami is also undergoing rigorous scrutiny and development in the editing of the forthcoming Moshnikoff Skolt Sami dictionary in Ve′rdd⁹, an open-source dictionary environment for minority language community editor and developer collaboration (Alnajjar et al., 2020). Ve′rdd ‘stream, flow’ also provides an interface for feedback into the dictionary system.

5. Discussion and Future Work

The FSTs are released in the GiellaLT infrastructure as a continuously updated bleeding-edge release. Efforts have been made to bring the writing of the FST lexc materials into an easier MediaWiki-based framework (Rueter and Hämäläinen, 2017). All edits to the FSTs made in the MediaWiki platform are automatically synchronized with those uploaded to GiellaLT.

According to statistics at GiellaLT for online dictionary usage, the Skolt Sami–Finnish dictionary enjoys great popularity among the language community. It is second only to North Sami–Norwegian (Trosterud, p.c. 2019-06-04). The statistics provide pointers to where elaboration is needed in definitions as well as to the shortcomings of the transducer (analysis of misspelled words).

In order to make the FSTs more accessible to other researchers conducting NLP tasks focused on Skolt Sami, the FSTs have been made available through UralicNLP (Hämäläinen, 2019). This is a specialized Python library for NLP for Uralic languages which makes using FSTs easier by providing a documented programmatic interface. Furthermore, the library uses precompiled models, which further facilitates the reuse of our FSTs.
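A minimal sketch of how such access might look is given below. It assumes that the uralicNLP package is installed and that the Skolt Sami models have been downloaded under the ISO 639-3 code `sms`; `build_query` and `generate_forms` are hypothetical helpers written for this illustration, and `LEMMA` is a placeholder rather than an actual dictionary entry. The tag sequence V+Pot+Sg3 is the one used earlier in the paper.

```python
def build_query(lemma, *tags):
    """Join a lemma and its morphological tags into a generator query string."""
    return "+".join([lemma, *tags])


def generate_forms(lemma, *tags, lang="sms"):
    """Return generated surface forms, or None if UralicNLP or its models are unavailable."""
    try:
        from uralicNLP import uralicApi
        # uralicApi.generate returns a list of (form, weight) tuples.
        results = uralicApi.generate(build_query(lemma, *tags), lang)
    except Exception:
        # Library not installed, or the models for this language not downloaded.
        return None
    return [form for form, _weight in results]


print(build_query("LEMMA", "V", "Pot", "Sg3"))   # LEMMA+V+Pot+Sg3
print(generate_forms("LEMMA", "V", "Pot", "Sg3"))
```

Before the first call, the precompiled models would typically be fetched once, for instance with `uralicApi.download("sms")`. The reverse direction, from a word form to its readings, is handled by `uralicApi.analyze`.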

Modeling diphthongs is still a challenge for Skolt Sami. Fu-

ture work will attempt to develop separate triggers for the

ﬁrst and second element. Thus, the treatment of diphthongs

will be analogous to that of quantity. Especially front and

fronted diphthongs still offer unresolved variation in the

paradigms of a number of nouns.

FSTs provide a good starting point for development of

higher level NLP tools that embrace the new neural network

methods. For instance, FSTs can be used to generate paral-

lel sentences out of lexica and abstract syntax descriptions

to be used for neural machine translation in scenarios with-

out any real parallel data (Hämäläinen and Alnajjar, 2019).

Neural models for morphological tagging can also benefit from readings provided by FSTs (Ens et al., 2019).

6. Conclusions

We have presented the current state of our ongoing project of modeling Skolt Sami morphology. The transducers are made available in a continuously updated fashion through multiple channels, to promote their use in any task that contributes to the revitalization of the language.

⁹ https://akusanat.com/verdd/

The highly phonological Skolt Sami orthography has

strengthened the notion that one description might be uti-

lized in multiple tools, i.e. text-to-speech, orthographic,

pedagogical, etc. This has led to the addition of two extra

characters in the alphabet and the addition of a pedagogic

dictionary type generator.

Mnemonic formation of inﬂection type indicators has

been followed by the formulation of mnemonic multiple-

character symbols and triggers. Triggers have been or-

dered, and regular inﬂection has been modeled to exceed

mere ﬁnite conjugation and nominal declension. Additional

trigger work may be required for the description of diph-

thong quality change and derivation, but this must be done

in collaboration with the language community, language re-

searchers and the normative body.

7. Bibliographical References

Alnajjar, K., Hämäläinen, M., and Rueter, J. (2020). On editing dictionaries for Uralic languages in an online environment. In Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages.

Antonsen, L., Johnson, R., Trosterud, T., and Uibo, H.

(2013). Generating modular grammar exercises with

ﬁnite-state transducers. In Proceedings of the second

workshop on NLP for computer-assisted language learn-

ing at NoDaLiDa 2013, pages 27–38.

Antonsen, L., Gerstenberger, C., Kappfjell, M., Nystø Rahka, S., Olthuis, M.-L., Trosterud, T., and Tyers, F. M. (2017). Machine translation with North Saami as a pivot language. In Proceedings of the 21st Nordic Conference on Computational Linguistics, pages 123–131, Gothenburg, Sweden, May. Association for Computational Linguistics.

Beesley, K. R. and Karttunen, L., (2003). Finite-State Mor-

phology, pages 451–454. Stanford, CA: CSLI Publica-

tions.

Bergmanis, T. and Goldwater, S. (2017). From Segmen-

tation to Analyses: A Probabilistic Model for Unsuper-

vised Morphology Induction. In Proceedings of the 15th

Conference of the European Chapter of the Association

for Computational Linguistics: Volume 1, Long Papers,

pages 337–346.

Creutz, M. and Lagus, K. (2007). Unsupervised models

for morpheme segmentation and morphology learning.

ACM Transactions on Speech and Language Processing,

4(1), January.

Ens, J., Hämäläinen, M., Rueter, J., and Pasquier, P. (2019).

Morphosyntactic disambiguation in an endangered lan-

guage setting. In Proceedings of the 22nd Nordic Con-

ference on Computational Linguistics, pages 345–349.

Feist, T., (2015). A Grammar of Skolt Saami, volume 273,

pages 137–216. Helsinki: Suomalais-Ugrilainen Seura.

Hämäläinen, M. and Alnajjar, K. (2019). A template based approach for training NMT for low-resource Uralic languages - a pilot with Finnish. In Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, pages 520–525.

Hämäläinen, M. (2019). UralicNLP: An NLP library for

Uralic languages. Journal of Open Source Software,

4(37):1345.

Hjortnaes, N., Partanen, N., Rießler, M., and Tyers, F. M. (2020). Towards a speech recognizer for Komi, an endangered and low-resource Uralic language. In Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages, pages 31–37, Wien, Austria, 10–11 January. Association for Computational Linguistics.

Hämäläinen, M. and Rueter, J. (2018). Advances in Syn-

chronized XML-MediaWiki Dictionary Development in

the Context of Endangered Uralic Languages. In Pro-

ceedings of the Eighteenth EURALEX International

Congress, pages 967–978.

Iva, S. (2007). Võru kirjakeele sõnamuutmissüsteem. [The

Inﬂection System of the Võro Literary Language.] PhD

thesis. University of Tartu.

Koponen, E. and Rueter, J. (2016). The first complete scientific grammar of Skolt Saami in English. In Finnisch-Ugrische Forschungen, 2016(63), pages 254–266. Suomalais-Ugrilainen Seura.

Koskenniemi, K. (1983). Two-Level Morphology: A Gen-

eral Computational Model for Word-Form Recognition

and Production. Helsinki: University of Helsinki, De-

partment of General Linguistics.

Lindén, K., Axelson, E., Drobac, S., Hardwick, S.,

Kuokkala, J., Niemi, J., Pirinen, T. A., and Silfverberg,

M. (2013). HFST a system for creating NLP tools. In

International Workshop on Systems and Frameworks for

Computational Morphology, pages 53–71. Springer.

Morottaja, P., Olthuis, M.-L., Trosterud, T., and Antonsen, L. (2018). Anarâškielâ tivvoomohjelm – Kielâ- já ortografiafeeilâi kuorrâm tivvoomohjelmáin. Dutkansearvvi dieđalaš áigečála, 1(2):63–259.

Moseley, C., editor. (2010). Atlas of the World's Languages in Danger. UNESCO Publishing, 3rd edition. Online version: http://www.unesco.org/languages-atlas/.

Moshagen, S. N., Pirinen, T. A., and Trosterud, T. (2013).

Building an open-source development infrastructure for

language technology projects. In Proceedings of the

19th Nordic Conference of Computational Linguistics

(NODALIDA 2013); May 22-24; 2013; Oslo University;

Norway., number 85 in 16, pages 343–352. Linköping

University Electronic Press; Linköpings universitet.

Moshagen, S., Rueter, J., Pirinen, T., Trosterud, T., and

Tyers, F. M. (2014). Open-source infrastructures for

collaborative work on under-resourced languages. The

LREC 2014 Workshop “CCURL 2014 - Collaboration

and Computing for Under-Resourced Languages in the

Linked Open Data Era”.

Pirinen, T. A., Listenmaa, I., Johnson, R., Tyers, F. M.,

and Kuokkala, J. (2017). Open morphology of Finnish.

LINDAT/CLARIN digital library at the Institute of For-

mal and Applied Linguistics, Charles University.

Rueter, J. and Hämäläinen, M. (2017). Synchronized Me-

diawiki Based Analyzer Dictionary Development. In

Proceedings of the Third Workshop on Computational

Linguistics for Uralic Languages, pages 1–7.

Rueter, J. and Hämäläinen, M. (2019). Skolt Sami, the makings of a pluricentric language, where does it stand? In Rudolf Muhr, et al., editors, European Pluricentric Languages in Contact and Conflict, Bern, Switzerland. Peter Lang.

Rueter, J. (2014). The Livonian-Estonian-Latvian Dictio-

nary as a threshold to the era of language technological

applications. Eesti ja soome-ugri keeleteaduse ajakiri,

5(1):251–259.

Rueter, J. (2017). DEMO: Giellatekno open-source click-

in-text dictionaries for bringing closely related languages

into contact. In Proceedings of the Third Workshop on

Computational Linguistics for Uralic Languages, pages

8–9, St. Petersburg, Russia, January. Association for

Computational Linguistics.

Sammallahti, P. and Mosnikoff, J. (1991). Suomi-Koltansaame sanakirja. Lää′dd-sää′m sää′nnǩe′rjj [Finnish-Skolt Sami Dictionary]. Ohcejohka: Girjegiisá Oy.

Sammallahti, P. (2015). Vuõ′lǧǧe jåå′tted ooudâs, De fas johttájedje, Taas mentiin: Sää′mǩiõllsaž lookkâmǩe′rjj, Nuortalašgiel lohkosat, Koltansaamen lukemisto, volume 14, pages 150–171. Oulu: Oulun Yliopisto.

Moshagen, S., Sammallahti, P., and Trosterud, T. (2005). Twol at work. CSLI Studies in Computational Linguistics ONLINE, pages 94–105.

Uibo, H., Pruulmann-Vengerfeldt, J., Rueter, J., and Iva, S. (2015). Oahpa! Õpi! Opiq! Developing free online programs for learning Estonian and Võro. In Proceedings of the fourth workshop on NLP for computer-assisted language learning, pages 51–64, Vilnius, Lithuania, May. LiU Electronic Press.

Wiechetek, L., Moshagen, S. N., and Omma, T. (2019). Is

this the end? two-step tokenization of sentence bound-

aries. In Proceedings of the Fifth International Workshop

on Computational Linguistics for Uralic Languages,

pages 141–153, Tartu, Estonia, January. Association for

Computational Linguistics.

8. Language Resource References

Sammallahti, P. and Mosnikoff, J. (1991). Suomi-Koltansaame sanakirja. Lää′dd-sää′m sää′nnǩe′rjj [Finnish-Skolt Sami Dictionary], pages 180–202. Ohcejohka: Girjegiisá Oy.
