Hien Nguyen

Ton Duc Thang University | TDT · Faculty of Information Technology

About

55
Publications
13,965
Reads
610
Citations


Publications (55)
Chapter
The rise of hate speech content on social network platforms has recently become a topic of interest. Many studies have sought to develop systems that can automatically detect hate speech content. In this paper, we propose a knowledge-rich solution to hate speech detection by incorporating hate speech embeddings to generate a more accurate...
Chapter
In this study, we propose a new deep language model that takes advantage of the Transformer model for the task of Vietnamese sentence classification. We construct a new Vietnamese dataset for evaluating the model. We also conduct experiments on English corpora to evaluate our proposed model.
Article
We present a novel method based on interdependent representations of short texts for determining their degree of semantic similarity. The method represents each short text as two dense vectors: the former is built using the word-to-word similarity based on pre-trained word vectors, the latter is built using the word-to-word similarity based on exte...
Article
Full-text available
Time series classification is one of the most important issues in time series data mining. This problem has attracted increasing attention from researchers in recent years. Among the methods proposed in the literature, 1-Nearest Neighbor (1-NN) and its variants and improvements have been widely considered hard to beat on classification of time series...
Conference Paper
Full-text available
In this study, MIKE NAM, artificial neural networks (ANNs), and a hybridization of ANNs and Particle Swarm Optimization (ANN-PSO) are utilized to predict the Dak Nong runoff. ANNs are trained by the back-propagation (BP) procedure, which is based on the gradient descent algorithm, and by an algorithm combining PSO and BP. Moreover, to improve the...
Article
Full-text available
Unsupervised word alignments are widely used in phrase-based statistical machine translation. The quality of this model is proportional to the size and quality of a bilingual corpus. However, for low-resource language pairs such as Chinese and Vietnamese, the result of unsupervised word alignment is sometimes of low quality due to sparse data....
Conference Paper
We propose a Weighted Local Mean-based k Nearest Neighbors (WLMk-NN) classifier for time series. The proposed method is different from Local Mean-based k-Nearest Neighbors (LMk-NN) in that it assigns a weight for each element of time series when calculating local mean vectors. Indeed, our proposed method determines the local mean vectors more effic...
Conference Paper
Full-text available
We propose an approach to diagnosing brain hemorrhage by using deep learning. In particular, three types of convolutional neural networks, namely LeNet, GoogLeNet, and Inception-ResNet, are employed. In the training phase, we train only the last fully-connected layers of GoogLeNet and Inception-ResNet, but train all layers of LeNet. We build a d...
Article
Full-text available
Background: Named entity recognition (NER) is the task of detecting named entities in documents and categorizing them into predefined classes, such as person, location, and organization. This paper focuses on tweets posted on Twitter. Since tweets are noisy, irregular, brief, and include acronyms and spelling errors, NER in those tweets is a challenging...
Conference Paper
1-Nearest Neighbor has been endorsed as an efficient method for time series classification, as it outperforms more advanced classification algorithms in most cases. However, the time and space efficiency of this method depends on the number of instances in a training set. In order to improve its running time and space usage, one can apply an approach ca...
Conference Paper
Identifying “influential spreaders” means finding a subset of individuals in a social network such that, when information is injected into this subset, it spreads most broadly to the rest of the network. Determining the degree of information influence of an individual plays an important role in online social networking. Once there is a li...
Conference Paper
Community detection aims to detect groups of densely connected nodes that have sparse connections between them. Many researchers indicate that detecting community structures in complex networks can extract plenty of useful information, such as the structural features, network properties, and dynamic characteristics of the community. Seve...
Conference Paper
In this paper, we present a method for measuring semantic similarity between short texts by combining two different kinds of features: (1) distributed representations of words, and (2) knowledge-based and corpus-based metrics. Then, we present experiments to evaluate our method on two popular datasets - Microsoft Research Paraphrase Corpus and SemEval-20...
Article
Full-text available
Chinese and Vietnamese are both isolating languages, and their words are not delimited by spaces. In machine translation, word segmentation is often done first when translating from Chinese or Vietnamese into different languages (typically English) and vice versa. However, it is a matter for consideration that words may or may not be segmente...
Article
Full-text available
Background/Objectives: Data-driven models such as Recurrent Fuzzy Neural Network (RFNN) have been proven to be great methods for modeling, characterizing and predicting various kinds of nonlinear hydrologic time series data such as rainfall, water quality and river runoff. In modeling and predicting river runoff, the most important advantage of dat...
Article
Online social networks have become an effective and important social platform for communication, opinion exchange, and information sharing. However, they also enable the rapid and wide diffusion of misinformation, which may lead to pernicious influences on individuals or society. Hence, it is extremely important and necessary to detect the...
Conference Paper
Dengue fever is a very dangerous disease that is rampant in several tropical countries in general and in Vietnam in particular. In Vietnam, due to the lack of diagnosis support tools, approximately 100 dengue patients die each year. Thus, it is necessary to develop a diagnosis support tool for dengue. In the past few decades, Deci...
Conference Paper
Dengue fever is a dangerous disease that is very prevalent in several tropical countries in general and in Vietnam in particular. In Vietnam, due to the lack of diagnosis support tools, approximately 100 dengue patients die each year. Thus, it is necessary to develop a diagnosis support tool for dengue. In the past few decades, Decision...
Conference Paper
Full-text available
We study how to optimize the efficiency of a steam boiler, which is the most important component in a fertilizer plant. In particular, we have proposed several methods for forecasting when the boiler efficiency is trending downward so that some control parameters of the boiler can be adjusted to keep its efficiency stable. This is a chall...
Conference Paper
In this paper, we systematically investigate the methods of measuring semantic relatedness between concepts and categorize them into six types: path-based, information content, gloss-based, vector-based, corpus-based, and string-based. In addition, we re-implement those types of methods for evaluation on the latest knowledge sources and made AP...
Conference Paper
Text compression is a technique to reduce the size of a text file, increase the transfer rate, and save storage space. Many approaches have been proposed to tackle this problem for several languages, such as English, Chinese, Turkish, Japanese, and French. In this paper, we propose a method to compress Vietnamese text using syllables based on...
Conference Paper
This paper presents a new and efficient method for text compression using a tri-gram dictionary. Many methods have been proposed for text compression, such as run-length coding, Huffman coding, and Lempel-Ziv-Welch (LZW) coding. Most of them are based on the frequency of occurrence of letters in the text. In this paper, we propose a method to compress...
Article
Full-text available
We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It has a significant compression ratio in comparison with those of state-of-the-art methods on the same dataset. Given a text, first, the proposed method splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we us...
Book
This book constitutes the refereed proceedings of the 5th International Conference on Computational Social Networks, CSoNet 2016, held in Ho Chi Minh City, Vietnam, in August 2016. The 30 revised full papers presented were carefully reviewed and selected from 79 submissions. The papers cover topics on common principles, algorithms and tools that go...
Conference Paper
We propose a novel method for measuring semantic similarity between two sentences. The method exploits both syntactic and semantic features to assess the similarity. In our method, words in a sentence are weighted using their information content. The weights of words help differentiate their contribution towards the meaning of the sentence. The ori...
Conference Paper
Full-text available
We study the task of noisy text normalization, focusing on Vietnamese tweets. This task aims to improve the performance of applications mining or analyzing semantics of social media contents as well as other social network analysis applications. Since tweets on Twitter are noisy, irregular, short and contain acronyms and spelling errors, processing th...
Article
Evaluating semantic similarity between concepts is a very common component in many applications dealing with textual data such as information extraction, information retrieval, natural language processing, or knowledge acquisition. This paper presents an approach to assess semantic similarity between Vietnamese concepts using Vietnamese Wikipedia....
Article
This paper proposes an algorithm for mining frequent closed itemsets from multidimensional databases. The algorithm, which does not require transforming a database into a transaction database, is based on the intersections of object identifications for fast computation of the supports of itemsets. Experimental results show that the algorithm is faster t...
Conference Paper
Time series prediction has attracted the attention of many researchers and practitioners from different fields, and many approaches have been proposed. Traditionally, the sliding window technique was employed to transform the data first, and then learning models such as fuzzy neural networks were exploited for prediction. In order to improve the pred...
Conference Paper
We propose a novel method for measuring semantic similarity of two sentences. The originality of the method is the way that it explores the similarity of concepts referred to in the sentences using Wikipedia. The method also exploits Wiktionary to measure word-to-word similarity. The overall semantic similarity is a linear combination of word-to-wo...
Conference Paper
Data mining has been one of the most exciting fields in recent years. Its purpose is to discover useful information and knowledge from large databases for business decisions and other areas. One engineering topic of data mining is utility mining, which discovers high-utility itemsets. An itemset in traditional utility mining considers individual profits and quanti...
Conference Paper
Full-text available
Recurrent fuzzy neural network (RFNN) is proven to be a great method for modeling, characterizing and predicting many kinds of nonlinear hydrological time series data such as rainfall, water quality, and river runoff. In our study, we employed RFNN to find out the correlation between the climate data and the runoff of Srepok River in Vietnam and th...
Conference Paper
Full-text available
We study the task of entity linking for Vietnamese tweets, which aims at detecting entity mentions and linking them to corresponding entries in a given knowledge base. Unlike authored news or textual web content, tweets are noisy, irregular, and short, which makes entity linking in tweets much more challenging. We propose an approach to build an en...
Article
As an imperative channel for fast information propagation, online social networks (OSNs) also have their defects. One of them is information leakage, i.e., information could be spread via OSNs to users with whom we are not willing to share it. Thus, the problem of constructing a circle of trust to share information with as many friends as possi...
Article
Assessing network systems for failures is critical to mitigate the risk and develop proactive responses. In this paper, we investigate devastating consequences of link failures in networks. We propose an exact algorithm and a spectral lower-bound on the minimum number of removed links to incur a significant level of disruption. Our exact solution c...
Conference Paper
Entity linking refers to the task of mapping name strings in a text to their corresponding entities in a given knowledge base. It is an essential component in natural language processing applications and a challenging task. This paper proposes a method that combines heuristics and learning for entity linking by (i) learning coherence among co-occur...
Conference Paper
Social networking services (in short, SNS) allow users to share their own data with family, friends, and communities. Since many kinds of information have been uploaded and shared through SNS, the amount of information on SNS keeps increasing exponentially. In particular, Facebook has adopted some interesting features related t...
Article
Online social media have been playing an important role in creating and diffusing information to many users. This means that users can exert cognitive influence on other users. Thus, it is important to understand how information can be diffused by interactions among users through online social media. In this paper, we design a social media monit...
Article
Full-text available
Semantic annotation of named entities for enriching unstructured content is a critical step in the development of the Semantic Web and many Natural Language Processing applications. To this end, this paper addresses the named entity disambiguation problem that aims at detecting entity mentions in a text and then linking them to entries in a knowledge base....
Conference Paper
Full-text available
We present two methods for entity linking in two of our systems submitted to TAC-KBP 2012. The first one, namely Method 1, learns coherence among co-occurrence entities referred to within a text by exploiting Wikipedia's link structure and the second one, namely Method 2, combines some heuristics with a statistical model for entity linking. Metho...
Conference Paper
With the wide usage of Wikipedia in research and applications, disambiguation of concepts and entities to Wikipedia is an essential component in natural language processing. This paper addresses the task of identifying and linking specific words or phrases in a text to their referents described by Wikipedia articles. In this work, we propose a metho...
Conference Paper
Precisely identifying entities is essential for semantic annotation. This paper addresses the problem of named entity disambiguation that aims at mapping entity mentions in a text onto the right entities in Wikipedia. The aim of this paper is to explore and evaluate various combinations of features extracted from Wikipedia and texts for the disambi...
Conference Paper
Full-text available
The rapidly increasing use of large-scale data on the Web has made named entity disambiguation one of the main challenges to research in Information Extraction and the development of the Semantic Web. This paper presents a novel method for detecting proper names in a text and linking them to the right entities in Wikipedia. The method is hybrid, contai...
Conference Paper
Currently, for named entity disambiguation, the shortage of training data is a problem. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. The corpus is then enriched with new and informative features extracted from Wikipedia data. Moreover, rather than purs...
Conference Paper
Full-text available
Named entity disambiguation has been one of the main challenges to research in Information Extraction and development of Semantic Web. Therefore, it has attracted much research effort, with various methods introduced for different domains, scopes, and purposes. In this paper, we propose a new approach that is not limited to some entity classes and...
Article
Full-text available
Detecting entity mentions in a text and then mapping them to their right entities in a given knowledge source is significant to the realization of the Semantic Web, as well as the advanced development of natural language processing applications. The knowledge sources used are often closed ontologies, built by small groups of experts, and Wikipedia. To da...
