The Translation of English Dataset to Malay

Source publication
Article
Full-text available
The generation of text has increased dramatically in this era. Text consists of structured and unstructured forms. The enormous amount of unstructured text can easily be perceived by humans but cannot simply be processed by computers. Efficient techniques are needed to reduce this information into more valuable vectors. In this...
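As a minimal sketch of the kind of vector representation the abstract alludes to, the snippet below builds TF-IDF vectors with scikit-learn; the toy corpus and parameters are illustrative assumptions, not the setup used in the publication.

# Minimal sketch: representing unstructured text as TF-IDF vectors.
# The toy corpus and parameters are illustrative placeholders, not the
# dataset or configuration used in the source publication.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "unstructured text is easy for humans to read",
    "computers need text reduced to numeric vectors",
    "efficient techniques turn documents into vectors",
]

vectorizer = TfidfVectorizer(stop_words="english")  # drop common function words
X = vectorizer.fit_transform(corpus)                # sparse document-term matrix

print(X.shape)                                  # (3 documents, N vocabulary terms)
print(vectorizer.get_feature_names_out()[:5])   # a few learned vocabulary terms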

Contexts in source publication

Context 1
... selected words from three English datasets were translated into Malay. Table 2 shows the translation of the dataset from English to Malay. As seen in Table 2, the English dataset was translated into Malay using an online dictionary, the Cambridge Dictionary (https://dictionary.cambridge.org/) ...
Context 2
... 2 shows the translation of the dataset from English to Malay. As seen in Table 2, the English dataset was translated into Malay using an online dictionary, the Cambridge Dictionary (https://dictionary.cambridge.org/), and Kamus Oxford Fajar (Hawkins, 2006). ...
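The translation step described in these contexts was carried out with online and print dictionaries rather than code; as a rough, hypothetical programmatic analogue, a hand-built English-Malay lookup table might be applied like this:

# Hypothetical sketch of a dictionary-based English-to-Malay translation
# step. The lookup table below is illustrative; the source publication used
# the online Cambridge Dictionary and Kamus Oxford Fajar, not this code.
en_to_ms = {
    "house": "rumah",
    "water": "air",
    "book": "buku",
}

def translate_words(words, dictionary):
    """Translate each word, flagging untranslated words for manual review."""
    return [(w, dictionary.get(w, "<untranslated>")) for w in words]

selected_words = ["house", "water", "book", "cloud"]
for english, malay in translate_words(selected_words, en_to_ms):
    print(f"{english} -> {malay}")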

Similar publications

Preprint
Full-text available
Cross-document event coreference resolution is a foundational task for NLP applications involving multi-text processing. However, existing corpora for this task are scarce and relatively small, and annotate only modest-size clusters of documents belonging to the same topic. To complement these resources and enhance future research, we present W...
Preprint
Full-text available
References are an essential part of Wikipedia. Each statement in Wikipedia should be referenced. In this paper, we explore the creation and collection of references for new Wikipedia articles from the editors' perspective. We map out the workflow of editors when creating a new article, emphasising how they select references.
Article
Full-text available
Despite its importance, the effects of feedback in online environments have not been widely evaluated, nor is there consensus on how they should be measured. The aim of this exploratory study is to analyse the effects of teacher feedback during the development of an online discussion forum. Over a period of t...
Article
Full-text available
We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks. To tackle the challenge of accounting for these different types of datasets, we propose a novel framework based on Multilayer Networks and Stochastic Block Models. The main inno...
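As a simplified stand-in for this paper's Multilayer Network and Stochastic Block Model framework (which the sketch below does not reproduce), the general idea of network-based document clustering can be illustrated by linking documents to their words in a graph and running Louvain community detection with networkx; all data here are hypothetical.

# Simplified illustration of network-based document clustering: build a
# bipartite document-word graph and detect communities. Louvain modularity
# via networkx is used here as a stand-in for the paper's Stochastic Block
# Model framework, which this sketch does not reproduce.
import networkx as nx

docs = {
    "d1": ["network", "community", "graph"],
    "d2": ["graph", "model", "block"],
    "d3": ["topic", "document", "cluster"],
    "d4": ["document", "cluster", "metadata"],
}

G = nx.Graph()
for doc, words in docs.items():
    for w in words:
        G.add_edge(doc, w)  # link each document to the words it contains

communities = nx.community.louvain_communities(G, seed=42)
for i, members in enumerate(communities):
    print(i, sorted(m for m in members if m in docs))  # documents per community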
Preprint
Full-text available
teacher.js is a web-based digital communication and teaching tool. Developed as part of the teacher.solar project (whose goal is to take teaching outdoors, using entirely solar-powered devices), one of its main requirements was to keep both bandwidth and power consumption low. For that reason, teacher.js does not make use of video streaming (whic...
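The bandwidth-saving idea described here (relaying small state-update events instead of streaming video) could be sketched as follows; this is a hypothetical Python analogue using the third-party websockets package, not teacher.js's actual JavaScript implementation.

# Hypothetical sketch of the low-bandwidth idea behind teacher.js: relay
# small state-update events (e.g. slide changes, strokes) to all connected
# clients instead of streaming video. Uses the third-party 'websockets'
# package; this is not teacher.js's actual (JavaScript) implementation.
import asyncio
import websockets

CLIENTS = set()

async def relay(ws, path=None):  # 'path' kept for older websockets versions
    CLIENTS.add(ws)
    try:
        async for event in ws:          # a small JSON/text event, not video
            for client in CLIENTS:
                if client is not ws:
                    await client.send(event)
    finally:
        CLIENTS.discard(ws)

async def main():
    async with websockets.serve(relay, "localhost", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())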

Citations

... These methods capture the lexical features of the text and are simple to implement; however, they ignore the semantic and syntactic features of the text. To address this issue, several studies have expanded and enriched the context of data from an ontology [68,69] or Wikipedia [70,71]. However, these techniques require a great deal of understanding of NLP. ...
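The enrichment technique this context mentions (expanding a short text with external knowledge before representing it) might look roughly like the sketch below, where fetch_background is a hypothetical stand-in for a real ontology or Wikipedia lookup.

# Hedged sketch of short-text enrichment: append external background text
# to a sparse short text before vectorizing it. fetch_background is a
# hypothetical stand-in for a real ontology or Wikipedia lookup.
from sklearn.feature_extraction.text import TfidfVectorizer

def fetch_background(term):
    """Hypothetical knowledge lookup; a real system would query Wikipedia."""
    canned = {
        "jaguar": "large wild cat native to the americas",
        "python": "high level programming language",
    }
    return canned.get(term, "")

short_texts = ["jaguar speed", "python tutorial"]
enriched = [t + " " + " ".join(fetch_background(w) for w in t.split())
            for t in short_texts]

X = TfidfVectorizer().fit_transform(enriched)  # denser vectors than raw input
print(X.shape)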
Article
Full-text available
The number of online documents has grown rapidly, and with the expansion of the Web, document analysis, or text analysis, has become an essential task for preparing, storing, visualizing and mining documents. The texts generated daily on social media platforms such as Twitter, Instagram and Facebook are vast and unstructured. Most of these generated texts come in the form of short text, which needs special analysis because short text suffers from a lack of information and sparsity. Thus, this topic has attracted growing attention from researchers in the data storage and processing community for knowledge discovery. Short text clustering (STC) has become a critical task for automatically grouping various unlabelled texts into meaningful clusters. STC is a necessary step in many applications, including Twitter personalization, sentiment analysis, spam filtering, customer reviews and many other social-network-related applications. In the last few years, the natural-language-processing research community has concentrated on STC and attempted to overcome the problems of sparseness, dimensionality and lack of information. We comprehensively review the various STC approaches proposed in the literature. Providing insights into the technological components should assist researchers in identifying the possibilities and challenges facing STC. To gain such insights, we review literature, journals and academic papers focusing on STC techniques. The contents of this study are prepared by reviewing, analysing and summarizing diverse types of journals and scholarly articles, with a focus on STC techniques, from five authoritative databases: IEEE Xplore, Web of Science, Science Direct, Scopus and Google Scholar. This study focuses on STC techniques: text clustering, challenges to short texts, pre-processing, document representation, dimensionality reduction, similarity measurement of short text and evaluation.
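To make the pipeline stages the survey reviews concrete (document representation, dimensionality reduction and clustering), here is a minimal hedged sketch with scikit-learn; the toy texts and parameter values are illustrative assumptions, not recommendations from the survey.

# Minimal sketch of a short-text clustering pipeline mirroring the stages
# the survey reviews: representation (TF-IDF), dimensionality reduction
# (truncated SVD), and clustering (k-means). Toy data and parameters are
# illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

tweets = [
    "great phone battery life",
    "battery drains too fast on this phone",
    "loved the movie soundtrack",
    "the film score was amazing",
]

X = TfidfVectorizer(stop_words="english").fit_transform(tweets)
X_reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)

for tweet, label in zip(tweets, labels):
    print(label, tweet)  # cluster id per short text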