A Systematic Mapping Study of Language Features
Identification from Large Text Collection
Diellza Nagavci Mati
Faculty of Computer Science and Technologies
South East European University (SEEU)
Tetovo, Macedonia
dn16574@seeu.edu.mk
Jaumin Ajdari
Faculty of Computer Science and Technologies
South East European University (SEEU)
Tetovo, Macedonia
j.ajdari@seeu.edu.mk
Bujar Raufi
Faculty of Computer Science and Technologies
South East European University (SEEU)
Tetovo, Macedonia
b.raufi@seeu.edu.mk
Mentor Hamiti
Faculty of Computer Science and Technologies
South East European University (SEEU)
Tetovo, Macedonia
m.hamiti@seeu.edu.mk
Besnik Selimi
Faculty of Computer Science and Technologies
South East European University (SEEU)
Tetovo, Macedonia
b.selimi@seeu.edu.mk
Abstract— Natural Language Processing (NLP) is an emerging research area in today's era. NLP resources are quite useful when it comes to building a machine capable of translating between language pairs, a solution that strives to resolve language barrier problems. Based on this premise, we are focusing our research on feature identification from large text collections of the Albanian language. Rule-based or statistical Part-of-Speech (POS) taggers are typically employed, and they require either considerable time for rule development or a sufficient amount of manually labelled data.
In light of this, the impact of this research lies in exploring numerous cases that are conducive to progress and further development of this field. One of the goals of this paper is to conduct a systematic review study: to explore and analyze existing research that targets low-resource languages, such as the Albanian language. Based on prior observation of published research conducted since 2015, we focus on studies that have been published in areas relevant to Natural Language Processing. Given the considerable amount of related research in this field, it is essential to conduct a review and provide an outline of the research situation, as well as the current developments, in this specific but important field of research.
Keywords— Natural Language Processing; Machine Learning; Part-of-Speech; Algorithms; Chinese Whispers; Clustering.
I. INTRODUCTION
Nowadays, assigning syntactic classes to words is a crucial pre-processing step for many NLP applications. On this note, POS tags are largely utilized when we seek to parse, chunk, resolve anaphora, recognize named entities and extract knowledge, among other uses. Basically, constructing a tagger requires two ingredients: a lexicon of tags-for-words, and a mechanism that attributes such tags to the relevant tokens in a text. Based on previous research, the focus will be on analyzing Natural Language Processing and Machine Learning papers that have proposed different methods and algorithms for dealing with low-resource languages. As such, the methodology used in this paper seeks to analyze and resolve the relevant research questions that arise. The paper then summarizes a classification scheme over the fields of Machine Learning, Natural Language Processing, and related Ontology. After that, four research questions are answered and four others are proposed as future research interests. The literature review section of the paper also includes the time series of papers according to the machine learning and natural language processing areas of interest. Unsupervised POS-tagging methods and Chinese Whispers (CW) are specifically proposed as future research areas for low-resource languages.
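As a minimal illustration of these two ingredients, a dictionary-lookup tagger can be sketched in a few lines of Python. The lexicon entries below are toy assumptions for illustration only, not an actual Albanian lexicon or the tagger of any surveyed paper.

```python
# Minimal sketch of the two ingredients of a tagger:
# (1) a lexicon of tags-for-words and (2) a mechanism that assigns
# a tag to each token of a text; unknown tokens receive a default tag.

# Toy lexicon; the entries are illustrative assumptions only.
LEXICON = {
    "ai": "PRON",      # Albanian: "he"
    "lexon": "VERB",   # Albanian: "reads"
    "librin": "NOUN",  # Albanian: "the book" (accusative)
}

def tag_tokens(tokens, lexicon=LEXICON, default="UNK"):
    """Look each token up in the lexicon and attach its tag."""
    return [(token, lexicon.get(token.lower(), default)) for token in tokens]

if __name__ == "__main__":
    print(tag_tokens("Ai lexon librin".split()))
    # [('Ai', 'PRON'), ('lexon', 'VERB'), ('librin', 'NOUN')]
```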
II. METHODOLOGY
The main scope of this systematic review study is to determine and answer the research questions by analyzing and cross-referencing related articles. After conducting a comprehensive query, the most adequate and relevant papers are selected, and based on them the classification scheme is defined. The research questions are then addressed based on the outcomes of the mapping and of the entire systematic review process. This approach is appropriate, since it often provides a visual summary, a map, of its results [7]. Initially, this paper endeavors to amass all the relevant publications related to the field of interest, and at the same time to provide an outline of this research field in which the quantity and type of research, along with the available results, are identified. The second part of this paper focuses on the major leading papers in the field, excluding the remaining studies that are not relevant to the research questions.
TABLE 1. SEARCH STRINGS

No.   Search String                                                                                                           No. of papers
SS1   (("Abstract": Natural Language Processing OR "Abstract": Machine Learning) AND "Abstract": Language Identification)     205
SS2   ((("Natural Language Processing") OR "Unsupervised") AND Machine Learning) AND Language Identification                  132
SS3   (((("Natural Language Processing") OR "Unsupervised") AND Machine Learning) AND Language Identification) AND prediction 23
After this, the paper addresses the research questions that guide the structure of the research. The third part of this paper then applies the classification scheme, which seeks to capitalize on the existing published studies in order to provide more accurate results. Answering the research questions ensures data extraction and a complete mapping of studies, by identifying, analyzing and interpreting the suitable evidence. The classification scheme depicts the fields of interest mentioned earlier, namely: Machine Learning, NLP and related Ontology.
A. Research Questions and Search Strategy
The aim of this paper is to analyze publications that have tackled Natural Language Processing, guided by research questions such as:
- What is the core topic of interest discussed in the papers?
- What types of methods were used with regard to Natural Language Processing?
- How have publications evolved over time? What are the research and publication trends?
- Which algorithms can help towards finding rarely used words?
The majority of the research publications used for cross-referencing and analysis in this research were extracted from digital libraries such as IEEE Xplore and ACM, while some additional articles were taken from SpringerLink. The search strings shown in Table 1 are the ones used to perform the queries in the digital libraries mentioned above.
Several candidate search strings were considered; out of those, we selected only the ones deemed most relevant to the field of interest of this paper (shown in Table 1). Most of the selected papers were published in recent years. Table 3 shows the number of publications in recent years (from 2015 to 2019), again focusing on the ones that are appropriate to this study. The selected papers were analyzed further, keeping only the papers related to NLP, Machine Learning, unsupervised learning and language identification. As a result, after removing duplicates and irrelevant papers, only 125 relevant articles remained.
III. CLASSIFICATION SCHEME
The classification scheme is organized in three columns presenting the main fields of interest related to the research (Fig. 1). Machine Learning is the main area of interest on which the focus is placed, which will eventually lead towards creating dictionaries for low-resource languages by using unsupervised POS-tagging methods in the future. The fields of interest are defined in the first column of the scheme: Machine Learning, NLP and Ontology. Then, as shown in the second column, unsupervised learning can be used to help improve automated POS-tagging in low-resource languages. Finally, the third column lists the Machine Learning algorithms, such as the Chinese Whispers algorithm, that can be used for low-resource languages. Based on the analysis of the collected papers, the gap was identified with respect to low-resource languages, where such research can make a valuable contribution.
Figure 1. Classification scheme
IV. RESULTS
All selected research papers were classified into distinct categories in order to provide answers to the research questions. The research questions and the resulting responses of the systematic review study are presented in the following paragraphs.
A. RQ1: What is the core topic of interest that has been
discussed in the papers?
This question deals with the main field of interest investigated in each of the papers. We are interested in Natural Language Processing, but since we used several search strings, we obtained several results. In order to answer this question, we created the 'Field of Interest' classification for the papers. From Table 2 we can observe that about 54% of the papers have machine learning as their main focus [9], along with the various methods that have been utilized.
The second most mentioned area of interest is Natural Language Processing, reaching almost 37%. This category of papers includes semantic role labeling, spatial expression recognition, opinion summarization, topic linking and visualization plug-ins, etc. NLP also has many other major applications, such as OCR, parsing, natural language understanding, named entity recognition and machine translation [10].
TABLE 2. NUMBER OF PAPERS BY MAIN FIELD OF INTEREST

Field of interest             Number of papers    Percentage
Machine Learning              68                  54%
Natural Language Processing   46                  37%
Ontology                      11                  9%
B. RQ2: How have publications evolved over time? What are the research and publication trends?
While examining the publication year of each paper, we focus specifically on the time range between 2015 and 2019. The majority of the selected papers (44.80%) were published in 2018. In fact, looking at the graph in Fig. 2, it can be observed that the publication rate increases from year to year, indicating a growing interest in the field.
TABLE 3. NUMBER OF PAPERS PER YEAR UNTIL Q1-2019

Year    Number of papers    %
2019    24                  19.20%
2018    56                  44.80%
2017    18                  14.40%
2016    17                  13.60%
2015    10                  8.00%
Figure 2. Number of papers per year
C. RQ3: What types of methods were used with regard to Natural Language Processing?
In recent years, Natural Language Processing has been applied to solve different problems faced in linguistics. In his paper on natural language processing, Dipanjan Das [11] has shown the efficacy of graph-based label propagation for projecting part-of-speech information across different languages. Such results show that it is possible to learn accurate POS taggers for languages that have no annotated data but do have translations into a resource-rich language. The results also lean in support of unsupervised POS-tagging and of related approaches that rely on direct projections, bridging the gap between purely supervised and purely unsupervised POS tagging models [12]. From the results of this mapping (see Table 4), the majority of the selected papers focused on methods related to unsupervised Part-of-Speech tagging (61%), whereas the remainder, approximately 39%, deal with clustering.
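As a rough illustration of the label propagation idea mentioned above, a minimal sketch is given below. The word graph, edge weights and seed label distributions are toy assumptions; they do not reproduce the bilingual graph construction of [11].

```python
# Toy sketch of label propagation on a word graph: seed nodes carry
# POS label distributions (e.g., projected through word alignments),
# and unlabeled nodes repeatedly average the distributions of their
# weighted neighbors.
from collections import defaultdict

# Hypothetical undirected, weighted word graph (edges by similarity).
edges = {("dog", "cat"): 1.0, ("cat", "mouse"): 0.8,
         ("runs", "walks"): 1.0, ("dog", "runs"): 0.1}
seeds = {"dog": {"NOUN": 1.0}, "walks": {"VERB": 1.0}}   # projected seed labels

neighbors = defaultdict(dict)
for (u, v), w in edges.items():
    neighbors[u][v] = w
    neighbors[v][u] = w

labels = {n: dict(seeds.get(n, {})) for n in neighbors}

for _ in range(10):                               # a few propagation iterations
    new_labels = {}
    for node in neighbors:
        if node in seeds:                         # keep seed labels fixed
            new_labels[node] = dict(seeds[node])
            continue
        scores = defaultdict(float)
        for nb, w in neighbors[node].items():
            for tag, p in labels[nb].items():
                scores[tag] += w * p              # weighted vote from neighbors
        total = sum(scores.values())
        new_labels[node] = {t: s / total for t, s in scores.items()} if total else {}
    labels = new_labels

for node, dist in labels.items():
    print(node, dist)   # "cat" ends up mostly NOUN, "runs" mostly VERB
```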
[Figure 1 panels: Field of interest (Machine Learning, Natural Language Processing, Ontology); Methods (Unsupervised POS-tagger, Clustering, N-Gram); Algorithms (Chinese Whispers, Linear Regression, Logistic Regression, Classification, Active Learning).]
TABLE 4. NUMBER OF PAPERS BY FRAMEWORK TYPE

Method type                 Number of papers    Percentage
Unsupervised POS-tagging    76                  61%
Clustering                  49                  39%
D. RQ4: Which algorithms can help towards finding rarely
used words?
One important part of our research is to find out which algorithms are used for finding the rarest words in low-resource languages. Regarding Natural Language Processing, one such algorithm, as used by Biemann [5], is Chinese Whispers, a very basic algorithm for partitioning the nodes of weighted, undirected graphs.
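A minimal sketch of the Chinese Whispers idea follows: every node starts in its own class and, visiting nodes in random order, each node adopts the class with the highest total edge weight among its neighbors. The toy graph below is an assumption for illustration and is not data from [5].

```python
import random
from collections import defaultdict

def chinese_whispers(nodes, edges, iterations=20, seed=0):
    """Toy Chinese Whispers: partition the nodes of a weighted, undirected graph.

    nodes: iterable of hashable node ids.
    edges: dict mapping (u, v) pairs to positive weights.
    """
    rng = random.Random(seed)
    neighbors = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w

    labels = {n: n for n in nodes}              # every node starts in its own class
    for _ in range(iterations):
        order = list(nodes)
        rng.shuffle(order)                      # visit nodes in random order
        for node in order:
            scores = defaultdict(float)
            for nb, w in neighbors[node].items():
                scores[labels[nb]] += w         # sum edge weights per neighbor class
            if scores:
                labels[node] = max(scores, key=scores.get)   # adopt the strongest class
    return labels

# Toy word graph with two obvious clusters.
nodes = ["dog", "cat", "mouse", "run", "walk", "jump"]
edges = {("dog", "cat"): 1.0, ("cat", "mouse"): 1.0, ("dog", "mouse"): 0.5,
         ("run", "walk"): 1.0, ("walk", "jump"): 1.0, ("run", "jump"): 0.5}
print(chinese_whispers(nodes, edges))
```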
Other algorithms, such as Naïve Bayes, SGD, Logistic, HyperPipes and RBFNetwork, were used by Marenglen et al. [13] to experimentally evaluate the performance of classification algorithms for opinion mining in a multi-domain corpus of the Albanian language. They created 11 text corpora of written Albanian opinions collected from different well-known Albanian newspapers. Each corpus has an identical number of text documents categorized as positive opinions and text documents categorized as negative opinions. Another entity recognition system was created and evaluated on top of existing machine learning algorithms, such as decision trees and neural networks, by Georgios et al. [14]. These systems were evaluated on a Greek text collection, and they lead to the recognition of the disadvantages and restrictions imposed by the inspected algorithms when applied to natural language data. This new technique belongs to the category of inductive grammar learning. The fundamental advantages of this method with respect to other machine learning methods are the ability to handle textual data, as well as the possibility of using the learned grammars in actual systems, replacing manually developed grammars [16]. For applying inductive grammar learning, a new algorithm has been created that learns grammars from positive examples only. This new algorithm can infer context-free grammars, and it is founded on the existing GRIDS algorithm [15], improving both user-friendliness and the search process in the space of possible grammars, jointly increasing the applicability of the new algorithm to bigger collections of data.
Anchor-NMF has been presented by Karl et al. [18]. This learning algorithm is used to deal with the task of unsupervised POS tagging. The goal of this task is to induce the correct sequence of POS tags (hidden states) given a sequence of words (observations). For each POS tag, the anchor condition corresponds to the assumption that at least one word occurs only under that tag.
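In formulas, the anchor-word assumption described above can be paraphrased as follows (this is a generic restatement, not the exact notation of [18]): for every tag there is at least one word that can only be generated by that tag.

```latex
% For every POS tag t there exists an anchor word w_t that occurs
% only under t, i.e. its emission probability is zero for all other tags.
\forall t \in \mathcal{T}\ \exists\, w_t \in \mathcal{V}:
\quad P(w_t \mid t) > 0
\quad \text{and} \quad
P(w_t \mid t') = 0 \ \text{for all } t' \neq t .
```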
E. RQ5: How can raw text be used to generate spell-check dictionaries?
Low-resource languages have lots of raw text, so we have to see how it can be used to generate spell-check dictionaries [63] (a minimal baseline is sketched after the questions below). Some questions that need to be answered for unsupervised algorithms are the following:
- Which software should be used to generate the spell-check dictionaries?
- Are we going to obtain results from raw text?
- What methods should be applied to generate the spell-check dictionaries?
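One hedged, minimal baseline (not a method proposed in the surveyed papers) is to treat the frequency list of a raw corpus as the dictionary and, for an unknown word, to suggest the most frequent known word within edit distance one. The corpus and alphabet below are toy assumptions.

```python
# Minimal baseline: build a word-frequency "dictionary" from raw text and
# suggest frequent words within edit distance 1 for unknown tokens.
# The corpus below is a toy assumption; a real system would use a large
# raw-text collection and a more careful tokenizer.
import re
from collections import Counter

raw_text = "libri lexohet libri lexohet nga nxenesi libri"   # toy Albanian-like corpus
counts = Counter(re.findall(r"\w+", raw_text.lower()))

def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyzëç"):
    """All strings at edit distance 1 (deletes, transposes, replaces, inserts)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def suggest(word):
    """Return the most frequent in-dictionary word at edit distance <= 1."""
    if word in counts:
        return word                                   # already a known word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word

print(suggest("libra"))   # -> "libri" in this toy corpus
```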
F. RQ6: Can usage differences detect misspelled words?
In order to detect misspelled words through usage differences in low-resource languages, some questions pertaining to the unsupervised setting need to be answered:
- How can usage differences in a textual context detect misspelled words in a text collection?
- Which results can be obtained in low-resource languages?
V. DISCUSSION
After analyzing 125 research papers and shared experiences in Natural Language Processing, we are ready to apply these findings to low-resource languages. Many studies have applied different Machine Learning algorithms, using supervised or unsupervised learning [2], in different languages. The Chinese Whispers graph clustering algorithm has been used to perform the necessary abstractions and generalizations [22] for grouping words into POS classes in text collections for languages such as Dutch, Italian, Swedish, Hungarian and German.
TABLE 6. METHODS THAT ARE USED

Method                      Can be used
Unsupervised POS-tagging    Used to draw inferences from datasets consisting of input data without labelled responses.
Clustering                  Used as a statistical data analysis technique in many fields.
N-gram                      Each n-gram is composed of n words.
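As a concrete illustration of the n-gram entry in Table 6, the short sketch below extracts word n-grams from a sentence; the example sentence is arbitrary.

```python
# Extract word n-grams: every n-gram is a sequence of n consecutive words.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "natural language processing for low resource languages".split()
print(ngrams(tokens, 2))   # bigrams, e.g. ('natural', 'language'), ...
print(ngrams(tokens, 3))   # trigrams
```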
Figure 3. Time series of papers according to methods field of interest
[Figure 3 data: yearly counts of papers for the Unsupervised POS-tagging, Clustering and N-gram methods, 2015-2019.]
The system presented by Biemann et al. [5] uses CW clustering of graphs constructed by distributional similarity to induce a lexicon of supposedly non-ambiguous words with respect to Part-of-Speech, by choosing mostly safe cases and excluding questionable ones from the lexicon. In this implementation, two clusterings are combined: one for high- and medium-frequency words, and the other for medium- and low-frequency words. High- and medium-frequency words are clustered by the similarity of their stop-word context feature vectors, such that the resulting graph includes only words that are involved in highly similar pairs [4]. Clustering such a graph of typically 5,000 vertices results in several hundred clusters, which are further used as POS categories. To extend the lexicon, words of medium and low frequency are clustered using a graph that encodes the similarity of their neighboring co-occurrences. Both clusterings are mapped by overlapping elements into a lexicon that provides POS information for some 50,000 words. To obtain a clustering on data sets of this size, an efficient algorithm like CW is crucial; finally, a word tagger with a morphological extension is trained, which in turn assigns a tag to each token within the corpus [3].
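The graph construction step described above can be illustrated with a minimal sketch: words are represented by context feature vectors (here, toy counts of neighboring stop words), and an edge is kept between two words only when the cosine similarity of their vectors exceeds a threshold. The vectors and the threshold are assumptions for illustration; they do not reproduce the exact features of [5].

```python
# Toy sketch: build a word similarity graph from context feature vectors
# (e.g., counts of co-occurring stop words) and keep only highly similar pairs.
import math

# Hypothetical context feature vectors over the stop words ("the", "a", "of", "to").
vectors = {
    "dog":   [12, 7, 1, 0],
    "cat":   [11, 8, 0, 1],
    "house": [9, 6, 2, 0],
    "run":   [0, 1, 0, 9],
    "walk":  [1, 0, 1, 8],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

threshold = 0.95
edges = {}
words = list(vectors)
for i, u in enumerate(words):
    for v in words[i + 1:]:
        sim = cosine(vectors[u], vectors[v])
        if sim >= threshold:              # keep only highly similar pairs
            edges[(u, v)] = sim

print(edges)   # the resulting graph could then be clustered, e.g. with Chinese Whispers
```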
VI. FUTURE RESEARCH AREA
This paper is based on many studies that have great relevance to this topic. Orientations in this field follow the latest technology developments and trends. Natural language processing is an emerging research area in this contemporary age, especially for low-resource languages. It provides solutions for people from different linguistic backgrounds in the context of language learning. Through NLP resources, a language translator can be developed and the language barrier problem can be reduced among populations [35].
Taking into consideration the great challenges of creating a vocabulary for any language, this research will strive to contribute towards enriching this low-resource language with the results that will ensue, all the more since the Albanian language still possesses no proper digital dictionary. In this instance, the limitations of supervised learning methods would be overcome through the use of unsupervised methods for identifying language features in text collections. Two research questions are proposed as future research areas:
- Which results can be obtained when applying unsupervised POS tagging to a large text collection in low-resource languages?
- How can increasing the text collection affect the improvement of accuracy?
VII. CONCLUSION
According to the analyzed literature, unsupervised learning for NLP has seen an increase in usage in recent years, as per the research trends. This paper introduces a systematic mapping study aimed at improving language feature identification from large text collections. In addition, we have introduced several research questions as well as some future research questions which have to be elaborated further. We also examined the time series of papers with respect to the gaps in the NLP fields of interest. Most of the papers were supported by different analyses and models of NLP and unsupervised POS-tagging.
REFERENCES
[1] Alam, H., & Kumar, A. (2015). Multi-lingual author identification and
linguistic feature extraction — A machine learning approach. IEEE.
[2] Anjana, J. S., & Poorna, S. S. (2018). Language Identification from
Speech Features Using SVM and LDA. IEEE.
[3] Atzeni, M., & Atzori, M. (2018). Translating Natural Language to Code:
An Unsupervised Ontology-Based Approach. IEEE.
[4] Bais, H., Machkour, M., & Koutti, L. (2016). Querying database using a
universal natural language interface based on machine learning. IEEE.
[5] Biemann, C. (2016). Unsupervised Part-of-Speech Tagging in the Large.
Germany: Res on Lang and Comput.
[6] Bodapati, S. B., Ramaswamy, S., & Narayanan, G. (2018). A Machine
Learning Approach to Detecting Start Reading Location of eBooks. IEEE.
[7] Bodapati, S. B., Ramaswamy, S., & Narayanan, G. (2018). A Machine
Learning Approach to Detecting Start Reading Location of eBooks. IEEE.
[8] Cai, J., Li, J., Li, W., & Wang, J. (2018). Deeplearning Model Used in
Text Classification. IEEE.
[9] Caranica, A., Cucu, H., & Buzo, A. (2016). Exploring an unsupervised,
language independent, spoken document retrieval system. IEEE.
[10] Carbajal, M. J., Dawud, A., Thiollière, R., & Dupoux, E. (2016). The
“language filter” hypothesis: A feasibility study of language separation in
infancy using unsupervised clustering of I-vectors. IEEE.
[11] Celikyilmaz, A., Sarikaya, R., Jeong, M., & Deoras, A. (2016). An Empirical Investigation of Word Class-Based Features for Natural Language Understanding. IEEE.
[12] Collobert, R. (2016). A Unified Architecture for Natural Language
Processing: Deep Neural Networks with Multitask Learning. IEEE.
[13] Das, D., & Petrov, S. (2011). Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 600–609.
[14] Dodal, S. S., & Kulkarni, P. V. (2018). Multi-Lingual Information
Retrieval Using Deep Learning. IEEE.
[15] Cambria, E., & White, B. (2015). Jumping NLP Curves: A Review of Natural Language Processing Research. Digital Object Identifier 10.1109/MCI.
[16] Gharge, S., & Chavan, M. (2017). An integrated approach for malicious tweets detection using NLP. IEEE.
[17] Goldberg, D. (2015). Genetic Algorithms in Search, Optimization, and Machine Learning. IEEE.
[18] representations. IEEE.
[19] Gunn, S. R. (2015). Support Vector Machines for Classification and
Regression.
[20] Hung, C.-K. (2017). Making machine-learning tools accessible to
language teachers and other non-techies: T-SNE-lab and rocanr as first
examples. IEEE.
[21] Hutchinson, T. (2018). Protecting Privacy in the Archives: Supervised
Machine Learning and Born-Digital Records. IEEE.
[22] IEEE/ACM Transactions on Audio, S. a. (2015). Supervised Detection
and Unsupervised Discovery of Pronunciation Error Patterns for
Computer-Assisted Language Learning. IEEE.
[23] Itauma Itauma, M. S.-w. (2015). Unsupervised Learning and Image Classification in High Performance Computing Cluster. IEEE 14th International Conference on Machine Learning and Applications.