Illustration of the embedding step. The word and its affixes are embedded to obtain their vector representations. Character embeddings of the word are composed with a max-pooled CNN. The final word embedding is the concatenation of all the result vectors.

Source publication

Figure 2. Confusion matrix of the best biLSTM with CRF tagger from the...

Toward a Standardized and More Accurate Indonesian Part-of-Speech Tagging

Preprint

Sep 2018

Previous work in Indonesian part-of-speech (POS) tagging is hard to compare because the studies are not evaluated on a common dataset. Furthermore, in spite of the success of neural network models for English POS tagging, they are rarely explored for Indonesian. In this paper, we explored various techniques for Indonesian POS tagging, including rule-based, CR...

Context 1

... Prefix features are the first 2 and 3 characters of the word. Likewise, suffix features are the last 2 and 3 characters of the word. For the character features, we followed [10] by embedding each character and composing the resulting vectors with a max-pooled CNN. The final embedding of a word is then the concatenation of all these vectors. Fig. 1 shows an illustration of the ...
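
The embedding step described in this context can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the sizes DIM, NUM_FILTERS and WIDTH, the random toy weights, the sharing of one lookup table across 2- and 3-character affixes, the zero padding, and the omission of a nonlinearity are all assumptions made here for brevity, and the helper names (Embedding, affixes, char_cnn, embed_word) are hypothetical.

```python
import random

random.seed(0)  # reproducible toy weights

DIM = 4          # assumed embedding size for words, affixes, and characters
NUM_FILTERS = 3  # assumed number of CNN filters
WIDTH = 3        # assumed filter width, in characters


def rand_vec(n):
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


class Embedding:
    """Toy lookup table: creates a random vector the first time a key is seen."""

    def __init__(self, dim):
        self.dim = dim
        self.table = {}

    def __call__(self, key):
        if key not in self.table:
            self.table[key] = rand_vec(self.dim)
        return self.table[key]


def affixes(word):
    """Prefixes are the first 2 and 3 characters; suffixes are the last 2 and 3."""
    return [word[:2], word[:3]], [word[-2:], word[-3:]]


def char_cnn(char_vecs, filters, width):
    """Apply each filter to every window of `width` characters, then max-pool.

    Assumes a non-empty word. Zero padding (an assumption of this sketch)
    guarantees at least one window even for very short words.
    """
    dim = len(char_vecs[0])
    pad = [[0.0] * dim] * (width - 1)
    padded = pad + char_vecs + pad
    pooled = []
    for f in filters:
        scores = []
        for i in range(len(padded) - width + 1):
            window = [x for vec in padded[i:i + width] for x in vec]
            scores.append(sum(w * x for w, x in zip(f, window)))
        pooled.append(max(scores))  # max over character positions
    return pooled


word_emb = Embedding(DIM)
prefix_emb = Embedding(DIM)
suffix_emb = Embedding(DIM)
char_emb = Embedding(DIM)
filters = [rand_vec(WIDTH * DIM) for _ in range(NUM_FILTERS)]


def embed_word(word):
    """Concatenate word, prefix, suffix, and character-CNN vectors."""
    prefixes, suffixes = affixes(word)
    parts = [word_emb(word)]
    parts += [prefix_emb(p) for p in prefixes]
    parts += [suffix_emb(s) for s in suffixes]
    parts.append(char_cnn([char_emb(c) for c in word], filters, WIDTH))
    return [x for part in parts for x in part]


vec = embed_word("membaca")
print(len(vec))  # 5 * DIM + NUM_FILTERS
```

With these toy sizes the result has 5 * DIM + NUM_FILTERS entries: one word vector, two prefix vectors, two suffix vectors, and one pooled value per CNN filter. In a trained tagger the tables and filters would be learned parameters rather than random values.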

Towards Open Intent Discovery for Conversational Text

Preprint

Apr 2019

Detecting and identifying user intent from text, both written and spoken, plays an important role in modelling and understanding dialogs. Existing research on intent discovery models it as a classification task with a predefined set of known categories. To generalize beyond these preexisting classes, we define a new task of open intent discove...

Emotion Classification on Indonesian Twitter Dataset

Conference Paper

Nov 2018

The rapid growth of Twitter usage attracts many researchers to utilize Twitter data for several purposes, including emotion analysis. However, standard datasets for the emotion analysis task are scarce for under-resourced languages, especially Indonesian. In this study, we build an Indonesian Twitter dataset for emotion...

Similar publications