Hien Nguyen
Ton Duc Thang University | TDT · Faculty of Information Technology

About

Publications

13,965 Reads

610 Citations

Skills and Expertise

Data Mining and Knowledge Discovery

Information Extraction

Semantic Web

Knowledge Bases

Natural Language Processing

Data Mining

Publications

Detecting Hate Speech Contents Using Embedding Models

Chapter

Dec 2021

The rise of hate speech contents on social network platforms has recently become a topic of interest. There have been a lot of studies to develop systems that can automatically detect hate speech contents. In this paper, we propose a knowledge-rich solution to hate speech detection by incorporating hate speech embeddings to generate a more accurate...

Visualizing Vietnam’s Scientific Research Projects Based on Pre-trained Language Models and UMAP

Conference Paper

Nov 2020

New Avenues in Mobile Tourism

Conference Paper

Jul 2020

An improvement of SAX representation for time series by using complexity invariance

Article

Full-text available

May 2020

Learning Representations for Vietnamese Sentence Classification (Extended Abstract)

Chapter

Nov 2019

In this study, we propose a new deep language model that takes advantage of the Transformer model for the task of Vietnamese sentence classification. We construct a new Vietnamese dataset for evaluating the model. We also conduct experiments on English corpora to evaluate our proposed model.

Understanding the Role of Social Media in Backpacker Tourism

Conference Paper

Nov 2019

Learning short-text semantic similarity with word embeddings and external knowledge sources

Article

Jul 2019

We present a novel method based on interdependent representations of short texts for determining their degree of semantic similarity. The method represents each short text as two dense vectors: the former is built using the word-to-word similarity based on pre-trained word vectors, the latter is built using the word-to-word similarity based on exte...

A novel non-parametric method for time series classification based on k-Nearest Neighbors and Dynamic Time Warping Barycenter Averaging

Article

Full-text available

Feb 2019

Time series classification is one of the most important issues in time series data mining, and it has attracted growing attention from researchers in recent years. Among the methods proposed in the literature, 1-Nearest Neighbor (1-NN) and its variants and improvements have been widely considered hard to beat on classification of time series...

A Hybrid Approach to Paraphrase Detection

Conference Paper

Nov 2018

A Hybrid Approach to Answer Selection in Question Answering Systems

Chapter

Feb 2018

A Short Review on Deep Learning for Entity Recognition: 5th International Conference, FDSE 2018, Ho Chi Minh City, Vietnam, November 28–30, 2018, Proceedings

Chapter

Oct 2018

Question Understanding in Community-Based Question Answering Systems: 7th International Conference, CSoNet 2018, Shanghai, China, December 18–20, 2018, Proceedings

Chapter

Nov 2018

A comprehensive study on predicting river runoff

Conference Paper

Full-text available

Oct 2017

In this study, MIKE NAM, artificial neural networks (ANNs), and a hybridization of ANNs and Particle Swarm Optimization (ANN-PSO) are utilized to predict the Dak Nong runoff. ANNs are trained by the back-propagation (BP) procedure which is based on the gradient descent algorithm and an incorporating algorithm of PSO and BP. Moreover, to improve the...

Improving word alignment based on named entity

Article

Full-text available

Jul 2017

Unsupervised word alignments are widely used in phrase-based statistical machine translation. The quality of this model is proportional to the size and quality of a bilingual corpus. However, for low-resource language pairs such as Chinese and Vietnamese, the result of unsupervised word alignment sometimes is of low quality due to the sparse data....

A Weighted Local Mean-Based k-Nearest Neighbors Classifier for Time Series

Conference Paper

Feb 2017

We propose a Weighted Local Mean-based k Nearest Neighbors (WLMk-NN) classifier for time series. The proposed method is different from Local Mean-based k-Nearest Neighbors (LMk-NN) in that it assigns a weight for each element of time series when calculating local mean vectors. Indeed, our proposed method determines the local mean vectors more effic...

Brain Hemorrhage Diagnosis by Using Deep Learning

Conference Paper

Full-text available

Jan 2017

We propose an approach to diagnosing brain hemorrhage by using deep learning. In particular, three types of convolutional neural networks, namely LeNet, GoogLeNet, and Inception-ResNet, are employed. In the training phase, we train only the last fully-connected layers of GoogLeNet and Inception-ResNet, but train all layers of LeNet. We build a d...

Text normalization for named entity recognition in Vietnamese tweets

Article

Full-text available

Dec 2016

Background: Named entity recognition (NER) is the task of detecting named entities in documents and categorizing them into predefined classes, such as person, location, and organization. This paper focuses on tweets posted on Twitter. Since tweets are noisy, irregular, brief, and include acronyms and spelling errors, NER in those tweets is a challenging...

Instance Reduction for Time Series Classification by Exploiting Representative Characteristics using k-means

Conference Paper

Nov 2016

1-Nearest Neighbor has been endorsed as an efficient method for time series classification, as it outperforms more advanced classification algorithms in most cases. However, the time and space efficiency of this method depend on the number of instances in a training set. In order to improve its running time and space usage, one can apply an approach ca...

Identify Influential Spreaders in Online Social Networks Based on Social Meta Path and PageRank

Conference Paper

Aug 2016

Identifying “influential spreaders” means finding a subset of individuals in a social network such that, when information is injected into this subset, it spreads most broadly to the rest of the network. Determining the degree of information influence of each individual plays an important role in online social networking. Once there is a li...

Improving Node Similarity for Discovering Community Structure in Complex Networks

Conference Paper

Aug 2016

Community detection aims to find groups of densely connected nodes that have only sparse connections between groups. Many researchers indicate that detecting community structures in complex networks can extract plenty of useful information, such as the structural features, network properties, and dynamic characteristics of the community. Seve...

Conference Paper

Aug 2016

In this paper, we present a method for measuring semantic similarity between short texts by combining two different kinds of features: (1) distributed representation of word, (2) knowledge-based and corpus-based metrics. Then, we present experiments to evaluate our method on two popular datasets - Microsoft Research Paraphrase Corpus and SemEval-20...

Examples for incorrect translations of CL and WL translation systems.

A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation

Article

Full-text available

Jun 2016

Chinese and Vietnamese are both isolating languages; that is, the words are not delimited by spaces. In machine translation, word segmentation is often done first when translating from Chinese or Vietnamese into different languages (typically English) and vice versa. However, it is a matter for consideration that words may or may not be segmente...

A comparative study of SWAT, RFNN and RFNN-GA for predicting river runoff

Article

Full-text available

May 2016

Background/Objectives: Data-driven models such as Recurrent Fuzzy Neural Network (RFNN) have been proven to be great methods for modeling, characterizing and predicting various kinds of nonlinear hydrologic time series data such as rainfall, water quality and river runoff. In modeling and predicting river runoff, the most important advantage of dat...

Misinformation in Online Social Networks: Detect Them All with a Limited Budget

Article

Apr 2016

Online social networks have become an effective and important social platform for communication, opinions exchange, and information sharing. However, they also make it possible for rapid and wide misinformation diffusion, which may lead to pernicious influences on individuals or society. Hence, it is extremely important and necessary to detect the...

Decision trees for diagnosis of dengue fever

Conference Paper

Jan 2016

Dengue fever is a very dangerous disease that is rampant in several tropical countries, Vietnam in particular. In Vietnam, due to the lack of diagnosis support tools, approximately 100 dengue patients die each year. Thus it is necessary to develop a diagnosis support tool for dengue. In the past few decades, Deci...

Decision trees for diagnosis of dengue fever

Conference Paper

Jan 2016

Dengue fever is a dangerous disease that is very common in several tropical countries, Vietnam in particular. In Vietnam, due to the lack of diagnosis support tools, approximately 100 dengue patients die each year. Thus it is necessary to develop a diagnosis support tool for dengue. In the past few decades, Decision...

A multi-step-ahead and real-time approach for forecasting boiler efficiency

Conference Paper

Full-text available

Jan 2016

We study how to optimize the efficiency of a steam boiler, which is the most important component in a fertilizer plant. In particular, we propose several methods for forecasting when the boiler efficiency is trending downward, so that control parameters of the boiler can be adjusted to keep its efficiency stable. This is a chall...

Evaluating Semantic Relatedness Between Concepts

Conference Paper

Jan 2016

In this paper, we systematically investigate the methods of measuring semantic relatedness between concepts and categorize them into 6 types of methods: path based, information content, gloss based, vector based, corpus based and string based. Besides, we re-implement those types of methods for evaluating on the latest knowledge sources and made AP...

A syllable-based method for Vietnamese text compression

Conference Paper

Jan 2016

Text compression is a technique to reduce the size of a text file, increasing the transfer rate and saving storage space. Many approaches have been proposed to tackle this problem in several languages, such as English, Chinese, Turkish, Japanese, and French. In this paper, we propose a method to compress Vietnamese text using syllables based on...

Trigram-Based Vietnamese Text Compression

Conference Paper

Jan 2016

This paper presents a new and efficient method for text compression using a trigram dictionary. Many methods have been proposed for text compression, such as run-length coding, Huffman coding, and Lempel-Ziv-Welch (LZW) coding. Most of them are based on the frequency of occurrence of letters in the text. In this paper, we propose a method to compress...

Compression ratio of our method [13, 14].

Compression ratio of our method, WinRAR, and WinZIP.

n-Gram-Based Text Compression

Article

Full-text available

Jan 2016

We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It has a significant compression ratio in comparison with those of state-of-the-art methods on the same dataset. Given a text, first, the proposed method splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we us...

Computational Social Networks: 5th International Conference, CSoNet 2016, Ho Chi Minh City, Vietnam, August 2-4, 2016, Proceedings

Book

Jan 2016

This book constitutes the refereed proceedings of the 5th International Conference on Computational Social Networks, CSoNet 2016, held in Ho Chi Minh City, Vietnam, in August 2016. The 30 revised full papers presented were carefully reviewed and selected from 79 submissions. The papers cover topics on common principles, algorithms and tools that go...

A Multifaceted Approach to Sentence Similarity

Conference Paper

Oct 2015

We propose a novel method for measuring semantic similarity between two sentences. The method exploits both syntactic and semantic features to assess the similarity. In our method, words in a sentence are weighted using their information content. The weights of words help differentiate their contribution towards the meaning of the sentence. The ori...

Normalization of Vietnamese Tweets on Twitter

Conference Paper

Full-text available

Jun 2015

We study the task of noisy text normalization, focusing on Vietnamese tweets. This task aims to improve the performance of applications that mine or analyze the semantics of social media content, as well as other social network analysis applications. Since tweets on Twitter are noisy, irregular, short, and contain acronyms and spelling errors, processing th...

Monitor Placement to Timely Detect Misinformation in Online Social Networks

Conference Paper

Jun 2015

Computing Semantic Similarity for Vietnamese Concepts Using Wikipedia

Article

Feb 2015

Hien Nguyen

Evaluating semantic similarity between concepts is a very common component in many applications dealing with textual data such as information extraction, information retrieval, natural language processing, or knowledge acquisition. This paper presents an approach to assess semantic similarity between Vietnamese concepts using Vietnamese Wikipedia....

Mining frequent closed itemsets from multidimensional databases

Article

Jan 2015

This paper proposes an algorithm for mining frequent closed itemsets from multidimensional databases. The algorithm, which does not require transforming a database into a transaction database, is based on intersections of object identifiers for fast computation of itemset supports. Experimental results show that the algorithm is faster t...

A Hybrid Approach for Predicting River Runoff

Conference Paper

Jun 2015

Time series prediction has attracted attention of many researchers as well as practitioners from different fields and many approaches have been proposed. Traditionally, sliding window technique was employed to transform data first and then some learning models such as fuzzy neural networks were exploited for prediction. In order to improve the pred...

Vietnamese Sentence Similarity Based on Concepts

Conference Paper

Nov 2014

We propose a novel method for measuring semantic similarity of two sentences. The originality of the method is the way that it explores the similarity of concepts referred to in the sentences using Wikipedia. The method also exploits Wiktionary to measure word-to-word similarity. The overall semantic similarity is a linear combination of word-to-wo...

A New Method for Mining High Average Utility Itemsets

Conference Paper

Nov 2014

Data mining has been one of the exciting fields in recent years. Its purpose is to discover useful information and knowledge from large databases for business decisions and other areas. One engineering topic of data mining is utility mining, which discovers high-utility itemsets. An itemset in traditional utility mining considers individual profits and quanti...

Applying Recurrent Fuzzy Neural Network to Predict the Runoff of Srepok River

Conference Paper

Full-text available

Nov 2014

Recurrent fuzzy neural network (RFNN) is proven to be a great method for modeling, characterizing and predicting many kinds of nonlinear hydrological time series data such as rainfall, water quality, and river runoff. In our study, we employed RFNN to find out the correlation between the climate data and the runoff of Srepok River in Vietnam and th...

Entity Linking for Vietnamese Tweets

Conference Paper

Full-text available

Oct 2014

We study the task of entity linking for Vietnamese tweets, which aims at detecting entity mentions and linking them to corresponding entries in a given knowledge base. Unlike authored news or textual web content, tweets are noisy, irregular, and short, which makes entity linking in tweets much more challenging. We propose an approach to build an en...

Staying safe and visible via message sharing in online social networks

Article

Jul 2014

As an imperative channel for fast information propagation, online social networks (OSNs) also have their defects. One of them is the information leakage, i.e., information could be spread via OSNs to the users whom we are not willing to share with. Thus the problem of constructing a circle of trust to share information with as many friends as possi...

Bound and exact methods for assessing link vulnerability in complex networks

Article

Jul 2014

Assessing network systems for failures is critical to mitigate the risk and develop proactive responses. In this paper, we investigate devastating consequences of link failures in networks. We propose an exact algorithm and a spectral lower-bound on the minimum number of removed links to incur a significant level of disruption. Our exact solution c...

Combining Heuristics and Learning for Entity Linking

Conference Paper

Apr 2014

Hien Nguyen

Entity linking refers to the task of mapping name strings in a text to their corresponding entities in a given knowledge base. It is an essential component in natural language processing applications and a challenging task. This paper proposes a method that combines heuristics and learning for entity linking by (i) learning coherence among co-occur...

Social Context-Based Movie Recommendation: A Case Study on MyMovieHistory

Conference Paper

Apr 2014

Social networking services (in short, SNS) allow users to share their own data with family, friends, and communities. Since many kinds of information have been uploaded and shared through SNS, the amount of information on SNS keeps increasing exponentially. Particularly, Facebook has adopted some interesting features related t...

Social data visualization system for understanding diffusion patterns on twitter: A case study on Korean enterprises

Article

Jan 2014

Online social media have been playing an important role in creating and diffusing information to many users. This means that users can exert cognitive influence on other users. Thus, it is important to understand how information can be diffused by interactions among users through online social media. In this paper, we design a social media monit...

Table 1. Statistics of mentions in the datasets.

Table 3. Statistics of mentions in the dataset D1

Named Entity Disambiguation: A Hybrid Approach

Article

Full-text available

Nov 2012

Semantic annotation of named entities for enriching unstructured content is a critical step in development of Semantic Web and many Natural Language Processing applications. To this end, this paper addresses the named entity disambiguation problem that aims at detecting entity mentions in a text and then linking them to entries in a knowledge base....

JVN-TDT Entity Linking Systems at TAC-KBP2012

Conference Paper

Full-text available

Nov 2012

We present two methods for entity linking in two of our systems submitted to TAC-KBP 2012. The first one, namely Method 1, learns coherence among co-occurring entities referred to within a text by exploiting Wikipedia's link structure and the second one, namely Method 2, combines some heuristics with a statistical model for entity linking. Metho...

Heuristics- and Statistics-Based Wikification

Conference Paper

Sep 2012

With the wide usage of Wikipedia in research and applications, disambiguation of concepts and entities to Wikipedia is an essential component in natural language processing. This paper addresses the task of identifying and linking specific words or phrases in a text to their referents described by Wikipedia articles. In this work, we propose a metho...

Exploring Wikipedia and Text Features for Named Entity Disambiguation

Conference Paper

Mar 2010

Precisely identifying entities is essential for semantic annotation. This paper addresses the problem of named entity disambiguation that aims at mapping entity mentions in a text onto the right entities in Wikipedia. The aim of this paper is to explore and evaluate various combinations of features extracted from Wikipedia and texts for the disambi...

Named Entity Disambiguation: A Hybrid Statistical and Rule-Based Incremental Approach

Conference Paper

Full-text available

Dec 2008

The rapidly increasing use of large-scale data on the Web has made named entity disambiguation one of the main challenges to research in Information Extraction and development of the Semantic Web. This paper presents a novel method for detecting proper names in a text and linking them to the right entities in Wikipedia. The method is hybrid, contai...

Named entity disambiguation on an ontology enriched by Wikipedia

Conference Paper

Aug 2008

Currently, for named entity disambiguation, the shortage of training data is a problem. This paper presents a novel method that overcomes this problem by automatically generating an annotated corpus based on a specific ontology. Then the corpus was enriched with new and informative features extracted from Wikipedia data. Moreover, rather than purs...

A Knowledge-Based Approach to Named Entity Disambiguation in News Articles

Conference Paper

Full-text available

Dec 2007

Named entity disambiguation has been one of the main challenges to research in Information Extraction and development of Semantic Web. Therefore, it has attracted much research effort, with various methods introduced for different domains, scopes, and purposes. In this paper, we propose a new approach that is not limited to some entity classes and...

Enriching Ontologies for Named Entity Disambiguation

Article

Full-text available

Detecting entity mentions in a text and then mapping them to their right entities in a given knowledge source is significant to realization of the semantic web, as well as advanced development of natural language processing applications. The knowledge sources used are often closed ontologies, built by small groups of experts, and Wikipedia. To da...