Thang Hoang Ta

University of Dalat

Doctor of Philosophy

About

19
Publications
3,189
Reads
41
Citations
Introduction
I graduated with my Ph.D. from Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN). My research interests include Natural Language Generation, Knowledge Bases, and Sentiment Analysis. I am also a web designer with more than 10 years of experience.
Additional affiliations
April 2008 - present
University of Dalat
Position
  • Lecturer
Education
October 2012 - February 2015
Shinawatra University
Field of study
  • Information Technology
September 2003 - January 2008
University of Dalat
Field of study
  • Software Engineering

Publications

Publications (19)
Preprint
Full-text available
In this paper, we introduce BSRBF-KAN, a Kolmogorov-Arnold Network (KAN) that combines B-splines and radial basis functions (RBFs) to fit input vectors during training. We perform experiments with BSRBF-KAN, MLP, and other popular KANs, including EfficientKAN, FastKAN, FasterKAN, and GottliebKAN, over the MNIST and FashionMNIST datasets. BSRBF-KAN s...
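The abstract above describes combining B-spline and radial basis functions over the same inputs. As a rough, hypothetical sketch of that idea (not the paper's implementation; the shared center grid, the linear "hat" B-spline basis, and the bandwidth choice below are all assumptions):

```python
import numpy as np

def hat_basis(x, centers, width):
    """Linear B-spline ('hat') basis: 1 at each center, 0 beyond +/- width."""
    return np.clip(1.0 - np.abs(x[:, None] - centers[None, :]) / width, 0.0, 1.0)

def gaussian_rbf_basis(x, centers, h):
    """Gaussian radial basis functions around the same grid of centers."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * h ** 2))

def bsrbf_features(x, num_centers=8, lo=-1.0, hi=1.0):
    """Sum a B-spline basis and an RBF basis over a shared grid,
    mirroring the idea of combining the two function families."""
    centers = np.linspace(lo, hi, num_centers)
    width = (hi - lo) / (num_centers - 1)
    return hat_basis(x, centers, width) + gaussian_rbf_basis(x, centers, width)

x = np.linspace(-1, 1, 5)
feats = bsrbf_features(x)
print(feats.shape)  # (5, 8)
```

In a full KAN layer, such per-input basis values would be combined with learnable coefficients; this sketch only shows the basis construction.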
Preprint
Full-text available
Emotions are integral to human social interactions, with diverse responses elicited by various situational contexts. Particularly, the prevalence of negative emotional states has been correlated with negative outcomes for mental health, necessitating a comprehensive analysis of their occurrence and impact on individuals. In this paper, we introduce...
Preprint
Full-text available
This paper introduces a novel training model, self-training from self-memory (STSM) in data-to-text generation (DTG), allowing the model to self-train on subsets, including self-memory as outputs inferred directly from the trained models and/or the new data. The quality of self-memory is validated by two models, data-to-text (D2T) and text-to-data...
Chapter
The use of transfer learning methods is largely responsible for the present breakthrough in Natural Language Processing (NLP) tasks across multiple domains. In order to solve the problem of sentiment detection, we examined the performance of four different types of well-known state-of-the-art transformer models for text classification. Models such...
Preprint
Full-text available
Acknowledged as one of the most successful online collaborative projects in human society, Wikipedia has grown rapidly in recent years and continuously seeks to expand its content and disseminate knowledge to everyone globally. The shortage of volunteers causes many issues for Wikipedia, including developing content for over 300 language...
Article
As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many Natural Language Processing (NLP) tasks, such as information retrieval, knowledge base building, machine translation, text classification, and text summarization. In this paper, we introduce WikiDes, a novel dataset to generate short descriptions of...
Preprint
Full-text available
As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many Natural Language Processing (NLP) tasks, such as information retrieval, knowledge base building, machine translation, text classification, and text summarization. In this paper, we introduce WikiDes, a novel dataset to generate short descriptions of...
Article
Full-text available
In this paper, we participate in the task of Detection of Aggressive and Violent INCIdents from Social Media in Spanish (DA-VINCIS). We apply a multi-task learning network, MT-DNN, to train on text embeddings of users' tweets from pre-trained transformer models. In the first subtask, we obtained the best F1 of 74.80%, Precision of 75.52%, and Rec...
Conference Paper
Full-text available
In this paper, we address Subtask 1 of Detection of Aggressive and Violent INCIdents from Social Media in Spanish (DA-VINCIS). We introduce our method, which uses text embeddings from pre-trained transformer models to train GAN-BERT, an adversarial learning architecture. Finally, we obtained F1 of 74.43%, Precision of 74.08%, and Rec...
Conference Paper
Full-text available
In this paper, we address Task 1 and Task 2 of EXIST 2022, detecting sexism in a broad sense, from ideological inequality, sexual violence, and misogyny to other expressions that involve implicit sexist behaviours in social networks. We apply transfer learning from a pre-trained multilingual DeBERTa (mDeBERTa) model and its zero classificatio...
Conference Paper
Full-text available
In this paper, we work on Paraphrase Identification in Mexican Spanish (PAR-MEX) at the sentence level. We introduce two lightweight methods, linear regression and a multilayer perceptron, trained on features extracted from pre-trained models. As a rule of thumb, pair similarity is used to filter noise from the positive examples. We obtained t...
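The pair-similarity filter mentioned in this abstract can be sketched as cosine similarity over sentence embeddings; the threshold value, the toy embeddings, and the convention that only positive pairs are filtered are assumptions for illustration, not the paper's actual settings:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between paired rows of two embedding matrices."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den

def filter_positives(emb1, emb2, labels, threshold=0.5):
    """Keep a positive (paraphrase) pair only if its embedding similarity
    clears the threshold; negative pairs are kept as-is."""
    sims = cosine_sim(emb1, emb2)
    return (labels == 0) | (sims >= threshold)

rng = np.random.default_rng(0)
emb1 = rng.normal(size=(4, 8))
emb2 = emb1 + rng.normal(scale=0.05, size=(4, 8))  # near-duplicate embeddings
labels = np.array([1, 1, 0, 0])                    # 1 = paraphrase pair
mask = filter_positives(emb1, emb2, labels, threshold=0.5)
print(mask)
```

The rows surviving the mask would then be used as the cleaned training set.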
Conference Paper
Full-text available
In this paper, we address the task of Paraphrase Identification in Mexican Spanish (PAR-MEX) at the sentence level. We introduce our method, which uses text embeddings from pre-trained transformer models to train GAN-BERT, an adversarial learning architecture. We modified the noise for the generator, which has a random rate and the same size as the hid...
Conference Paper
Full-text available
This paper presents our participation in the task of detecting the gender, profession, and political ideology of Spanish users from their tweets, from both a binary and a multi-class perspective. The task plays an important role in identifying the political ideology of parties and politicians, especially newly emerging ones. This may support relevant tasks to make prediction...
Article
Full-text available
Hatred spreading through the use of language on social media platforms and in online groups is becoming a well-known phenomenon. Comparing two text representations, bag of words (BoW) and pre-trained word embeddings (GloVe), we used a binary classification approach to automatically process user content and detect hate speech. The Naive Bayes...
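A minimal BoW baseline of the kind compared in this abstract can be sketched with scikit-learn; the toy corpus and labels below are invented for illustration and stand in for the much larger dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus standing in for the real hate-speech dataset.
texts = [
    "i hate you and your whole group",
    "you people are awful and disgusting",
    "have a wonderful day my friend",
    "thanks for the kind and helpful words",
]
labels = [1, 1, 0, 0]  # 1 = hateful, 0 = not hateful

vectorizer = CountVectorizer()           # bag-of-words term counts
X = vectorizer.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)     # Naive Bayes on BoW features

pred = clf.predict(vectorizer.transform(["what a kind and wonderful friend"]))
print(pred[0])  # 0
```

Swapping the BoW features for averaged GloVe vectors (with a classifier that accepts dense input) would give the second representation compared in the paper.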
Article
Full-text available
In this paper, we engage in Task 2 of the SMART Task 2021 challenge, predicting the relations used to identify the correct answer to a given question. This is a subtask of Knowledge Base Question Answering (KBQA) and offers valuable insights for the development of KBQA systems. We introduce our method, combining BERT and data oversampling with text...
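The data-oversampling step mentioned in this abstract can be illustrated by randomly duplicating minority-class rows until the classes balance; this is a generic sketch, not the paper's exact procedure:

```python
import numpy as np

def oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows until every class
    matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        rows = np.flatnonzero(y == c)
        extra = rng.choice(rows, size=target - rows.size, replace=True)
        idx.extend(rows)
        idx.extend(extra)
    idx = np.array(idx)
    return X[idx], y[idx]

X = np.arange(10).reshape(5, 2)
y = np.array([0, 0, 0, 1, 1])
Xb, yb = oversample(X, y)
print(np.bincount(yb))  # [3 3] -- balanced class counts
```

The balanced set would then be fed to the classifier (BERT, in the paper's setup) in place of the original skewed data.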
Chapter
Full-text available
In this paper, we extract quotations from Al Jazeera’s news articles containing keywords related to the COVID-19 pandemic. We apply Latent Dirichlet allocation (LDA), coherence measures, and clustering algorithms to unsupervisedly explore latent topics from the dataset of about 3400 quotations to see how coronavirus impacts human beings. By combini...
Article
Full-text available
Wikipedia is well known as the largest open encyclopedia today, with the goal of spreading knowledge to everyone in the world. Thanks to bots used for automatic article creation, the Vietnamese edition is one of 13 language projects with more than one million articles. However, this creates many challenges for the Vietnamese Wikipedia in improving the qual...
Article
Full-text available
Wikidata is an open online database that stores resources shared by related projects managed by the Wikimedia Foundation. Unifying Wikipedia's infoboxes was set out in phase 2 of the Wikidata plan: infoboxes are to be harmonized to avoid data inconsistencies across language projects. ...
Data
Wikipedia currently hosts a large body of converged data with millions of contributions in more than 287 languages. Its content changes rapidly and continuously, with thousands of edits every hour, which creates many challenges for Wikipedia in controlling, associating, and balancing article content among language editions. This paper provides some process...

Questions

Questions (2)
Question
I have found many papers researching monolingual or bilingual issues in NLP, so there seems to be little left to explore there. Given a problem, many scholars try to generalize it by approaching it multilingually. Should we conclude that multilingualism is the future of NLP?
Question
I am looking for professors who research Wikipedia and related topics.

Network

Cited By