Illustration of the embedding step. The word and its affixes are embedded to obtain their vector representations. Character embeddings of the word are composed with a max-pooled CNN. The final word embedding is the concatenation of all the result vectors.

Illustration of the embedding step. The word and its affixes are embedded to obtain their vector representations. Character embeddings of the word are composed with a max-pooled CNN. The final word embedding is the concatenation of all the result vectors.

Source publication
Preprint
Full-text available
Previous work in Indonesian part-of-speech (POS) tagging are hard to compare as they are not evaluated on a common dataset. Furthermore, in spite of the success of neural network models for English POS tagging, they are rarely explored for Indonesian. In this paper, we explored various techniques for Indonesian POS tagging, including rule-based, CR...

Context in source publication

Context 1
... Prefix features are the first 2 and 3 characters of the word. Likewise, suffix features are the last 2 and 3 characters of the word. 7 For the character features, we followed [10] by embedding each character and composing the resulting vectors with a max-pooled CNN. The final embedding of a word is then the concatenation of all these vectors. Fig. 1 shows an illustration of the ...

Similar publications

Preprint
Full-text available
Detecting and identifying user intent from text, both written and spoken, plays an important role in modelling and understand dialogs. Existing research for intent discovery model it as a classification task with a predefined set of known categories. To generailze beyond these preexisting classes, we define a new task of \textit{open intent discove...
Conference Paper
Full-text available
The rapid growth of Twitter usage attracts many researchers to utilize Twitter data for several purposes, including emotion analysis. However, there is resource limitation with respect to standard dataset for emotion analysis task for under-resourced language, especially Indonesian. In this study, we build an Indonesian twitter dataset for emotion...