The Translation of English Dataset to Malay

Source publication
Article
Full-text available
The generation of text has increased dramatically in this era. Text consists of structured and unstructured forms. The enormous amount of unstructured text can easily be perceived by humans but cannot simply be processed by computers. Efficient techniques are needed to reduce this information into more valuable vectors. In this...
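As a minimal sketch of the kind of vector representation the abstract alludes to, the snippet below builds TF-IDF vectors with scikit-learn; the toy corpus and parameters are illustrative assumptions, not the setup used in the publication.

# Minimal sketch: representing unstructured text as TF-IDF vectors.
# The toy corpus and parameters are illustrative placeholders, not the
# dataset or configuration used in the source publication.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "unstructured text is easy for humans to read",
    "computers need text reduced to numeric vectors",
    "efficient techniques turn documents into vectors",
]

vectorizer = TfidfVectorizer(stop_words="english")  # drop common function words
X = vectorizer.fit_transform(corpus)                # sparse document-term matrix

print(X.shape)                                  # (3 documents, N vocabulary terms)
print(vectorizer.get_feature_names_out()[:5])   # a few learned vocabulary terms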

Contexts in source publication

Context 1
... selected words from three English datasets were translated into Malay. Table 2 shows the translation of the dataset from English to Malay. As seen in Table 2, the English dataset was translated into Malay using an online dictionary, the Cambridge Dictionary (https://dictionary.cambridge.org/) ...
Context 2
... 2 shows the translation of the dataset from English to Malay. As seen in Table 2, the English dataset was translated into Malay using an online dictionary, the Cambridge Dictionary (https://dictionary.cambridge.org/), and Kamus Oxford Fajar (Hawkins, 2006). ...
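The translation step described in these contexts was carried out with online and print dictionaries rather than code; as a rough, hypothetical programmatic analogue, a hand-built English-Malay lookup table might be applied like this:

# Hypothetical sketch of a dictionary-based English-to-Malay translation
# step. The lookup table below is illustrative; the source publication used
# the online Cambridge Dictionary and Kamus Oxford Fajar, not this code.
en_to_ms = {
    "house": "rumah",
    "water": "air",
    "book": "buku",
}

def translate_words(words, dictionary):
    """Translate each word, flagging untranslated words for manual review."""
    return [(w, dictionary.get(w, "<untranslated>")) for w in words]

selected_words = ["house", "water", "book", "cloud"]
for english, malay in translate_words(selected_words, en_to_ms):
    print(f"{english} -> {malay}")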

Similar publications

Preprint
Full-text available
Cross-document event coreference resolution is a foundational task for NLP applications involving multi-text processing. However, existing corpora for this task are scarce and relatively small, and annotate only modest-size clusters of documents belonging to the same topic. To complement these resources and enhance future research, we present W...
Preprint
Full-text available
References are an essential part of Wikipedia. Each statement in Wikipedia should be referenced. In this paper, we explore the creation and collection of references for new Wikipedia articles from the editors' perspective. We map out the workflow of editors when creating a new article, emphasising how they select references.
Article
Full-text available
Despite its importance, the effects of feedback in online environments have not been widely evaluated, nor is there consensus on how they should be measured. The aim of this exploratory study is to analyse the effects of teacher feedback during the development of an online discussion forum. Over a period of t...
Article
Full-text available
We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks. To tackle the challenge of accounting for these different types of datasets, we propose a novel framework based on Multilayer Networks and Stochastic Block Models. The main inno...
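As a simplified stand-in for this paper's Multilayer Network and Stochastic Block Model framework (which the sketch below does not reproduce), the general idea of network-based document clustering can be illustrated by linking documents to their words in a graph and running Louvain community detection with networkx; all data here are hypothetical.

# Simplified illustration of network-based document clustering: build a
# bipartite document-word graph and detect communities. Louvain modularity
# via networkx is used here as a stand-in for the paper's Stochastic Block
# Model framework, which this sketch does not reproduce.
import networkx as nx

docs = {
    "d1": ["network", "community", "graph"],
    "d2": ["graph", "model", "block"],
    "d3": ["topic", "document", "cluster"],
    "d4": ["document", "cluster", "metadata"],
}

G = nx.Graph()
for doc, words in docs.items():
    for w in words:
        G.add_edge(doc, w)  # link each document to the words it contains

communities = nx.community.louvain_communities(G, seed=42)
for i, members in enumerate(communities):
    print(i, sorted(m for m in members if m in docs))  # documents per community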
Preprint
Full-text available
teacher.js is a web-based digital communication and teaching tool. Developed as part of the teacher.solar project (whose goal is to take teaching outdoors, using entirely solar-powered devices), one of its main requirements was to keep both bandwidth and power consumption low. For that reason, teacher.js does not make use of video streaming (whic...
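The bandwidth-saving idea described here (relaying small state-update events instead of streaming video) could be sketched as follows; this is a hypothetical Python analogue using the third-party websockets package, not teacher.js's actual JavaScript implementation.

# Hypothetical sketch of the low-bandwidth idea behind teacher.js: relay
# small state-update events (e.g. slide changes, strokes) to all connected
# clients instead of streaming video. Uses the third-party 'websockets'
# package; this is not teacher.js's actual (JavaScript) implementation.
import asyncio
import websockets

CLIENTS = set()

async def relay(ws, path=None):  # 'path' kept for older websockets versions
    CLIENTS.add(ws)
    try:
        async for event in ws:          # a small JSON/text event, not video
            for client in CLIENTS:
                if client is not ws:
                    await client.send(event)
    finally:
        CLIENTS.discard(ws)

async def main():
    async with websockets.serve(relay, "localhost", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())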

Citations

... These methods capture the lexical features of the text and are simple to implement; however, they ignore the semantic and syntactic features of the text. To address this issue, several studies have expanded and enriched the context of data from an ontology [68,69] or Wikipedia [70,71]. However, these techniques require a great deal of understanding of NLP. ...
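The enrichment technique this context mentions (expanding a short text with external knowledge before representing it) might look roughly like the sketch below, where fetch_background is a hypothetical stand-in for a real ontology or Wikipedia lookup.

# Hedged sketch of short-text enrichment: append external background text
# to a sparse short text before vectorizing it. fetch_background is a
# hypothetical stand-in for a real ontology or Wikipedia lookup.
from sklearn.feature_extraction.text import TfidfVectorizer

def fetch_background(term):
    """Hypothetical knowledge lookup; a real system would query Wikipedia."""
    canned = {
        "jaguar": "large wild cat native to the americas",
        "python": "high level programming language",
    }
    return canned.get(term, "")

short_texts = ["jaguar speed", "python tutorial"]
enriched = [t + " " + " ".join(fetch_background(w) for w in t.split())
            for t in short_texts]

X = TfidfVectorizer().fit_transform(enriched)  # denser vectors than raw input
print(X.shape)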
Article
Full-text available
The number of online documents has grown rapidly, and with the expansion of the Web, document analysis, or text analysis, has become an essential task for preparing, storing, visualizing and mining documents. The texts generated daily on social media platforms such as Twitter, Instagram and Facebook are vast and unstructured. Most of these generated texts come in the form of short text, which needs special analysis because short text suffers from a lack of information and sparsity. Thus, this topic has attracted growing attention from researchers in the data storage and processing community for knowledge discovery. Short text clustering (STC) has become a critical task for automatically grouping various unlabelled texts into meaningful clusters. STC is a necessary step in many applications, including Twitter personalization, sentiment analysis, spam filtering, customer reviews and many other social-network-related applications. In the last few years, the natural-language-processing research community has concentrated on STC and attempted to overcome the problems of sparseness, dimensionality and lack of information. We comprehensively review the various STC approaches proposed in the literature. Providing insights into the technological components should assist researchers in identifying the possibilities and challenges facing STC. To gain such insights, we review literature, journals and academic papers focusing on STC techniques. The contents of this study are prepared by reviewing, analysing and summarizing diverse types of journals and scholarly articles, with a focus on STC techniques, from five authoritative databases: IEEE Xplore, Web of Science, Science Direct, Scopus and Google Scholar. This study focuses on STC techniques: text clustering, challenges to short texts, pre-processing, document representation, dimensionality reduction, similarity measurement of short text and evaluation.
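To make the pipeline stages the survey reviews concrete (document representation, dimensionality reduction and clustering), here is a minimal hedged sketch with scikit-learn; the toy texts and parameter values are illustrative assumptions, not recommendations from the survey.

# Minimal sketch of a short-text clustering pipeline mirroring the stages
# the survey reviews: representation (TF-IDF), dimensionality reduction
# (truncated SVD), and clustering (k-means). Toy data and parameters are
# illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

tweets = [
    "great phone battery life",
    "battery drains too fast on this phone",
    "loved the movie soundtrack",
    "the film score was amazing",
]

X = TfidfVectorizer(stop_words="english").fit_transform(tweets)
X_reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)

for tweet, label in zip(tweets, labels):
    print(label, tweet)  # cluster id per short text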