Thang Hoang Ta
University of Dalat
Doctor of Philosophy
About
19
Publications
3,189
Reads
41
Citations
Introduction
I received my Ph.D. from the Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN). My research interests include Natural Language Generation, Knowledge Bases, and Sentiment Analysis. I am also a web designer with more than 10 years of experience.
Additional affiliations
April 2008 - present
Education
October 2012 - February 2015
September 2003 - January 2008
Publications
Publications (19)
In this paper, we introduce BSRBF-KAN, a Kolmogorov-Arnold Network (KAN) that combines B-splines and radial basis functions (RBFs) to fit input vectors in data training. We perform experiments with BSRBF-KAN, MLP, and other popular KANs, including EfficientKAN, FastKAN, FasterKAN, and GottliebKAN over the MNIST and FashionMNIST datasets. BSRBF-KAN s...
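The core idea — summing a B-spline basis and an RBF basis over the same input — can be illustrated with a toy NumPy sketch. This is not the paper's implementation: the real BSRBF-KAN uses trainable weights and higher-order B-splines, while this sketch uses fixed Gaussian RBFs and a degree-1 (hat) B-spline stand-in. The function names and grid choices here are illustrative assumptions.

```python
import numpy as np

def rbf_basis(x, centers, gamma=1.0):
    # Gaussian radial basis functions evaluated at each x for each center
    return np.exp(-gamma * (x[:, None] - centers[None, :]) ** 2)

def linear_bspline_basis(x, knots):
    # Degree-1 (hat) B-spline basis on a uniform knot grid — a simple
    # stand-in for the higher-order B-splines used in KAN implementations
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)

def bsrbf_features(x, centers, knots):
    # Illustrative combination: sum the two bases per input, mirroring
    # the idea of fusing B-spline and RBF components
    return rbf_basis(x, centers) + linear_bspline_basis(x, knots)

x = np.linspace(0.0, 1.0, 5)
grid = np.linspace(0.0, 1.0, 4)
feats = bsrbf_features(x, grid, grid)
print(feats.shape)  # (5, 4)
```

At an input that coincides with a grid point, both bases peak (RBF = 1, hat = 1), so the combined feature there is 2 — a quick sanity check that both components contribute.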
Emotions are integral to human social interactions, with diverse responses elicited by various situational contexts. Particularly, the prevalence of negative emotional states has been correlated with negative outcomes for mental health, necessitating a comprehensive analysis of their occurrence and impact on individuals. In this paper, we introduce...
This paper introduces a novel training model, self-training from self-memory (STSM) in data-to-text generation (DTG), allowing the model to self-train on subsets, including self-memory as outputs inferred directly from the trained models and/or the new data. The quality of self-memory is validated by two models, data-to-text (D2T) and text-to-data...
The use of transfer learning methods is largely responsible for the present breakthrough in Natural Language Processing (NLP) tasks across multiple domains. In order to solve the problem of sentiment detection, we examined the performance of four different types of well-known state-of-the-art transformer models for text classification. Models such...
Acknowledged as one of the most successful online cooperative projects in human society, Wikipedia has grown rapidly in recent years and continuously seeks to expand its content and disseminate knowledge to everyone globally. A shortage of volunteers brings Wikipedia many issues, including developing content for over 300 language...
As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many Natural Language Processing (NLP) tasks, such as information retrieval, knowledge base building, machine translation, text classification, and text summarization. In this paper, we introduce WikiDes, a novel dataset to generate short descriptions of...
In this paper, we participate in the task of Detection of Aggressive and Violent INCIdents from Social Media in Spanish (DA-VINCIS). We apply a multi-task learning network, MT-DNN, to train on users' tweets using text embeddings from pre-trained transformer models. In the first subtask, we obtained the best F1 of 74.80%, Precision of 75.52%, and Rec...
In this paper, we address Subtask 1 of Detection of Aggressive and Violent INCIdents from Social Media in Spanish (DA-VINCIS). Our method uses text embeddings from pre-trained transformer models for training with GAN-BERT, an adversarial learning architecture. Finally, we obtained F1 of 74.43%, Precision of 74.08%, and Rec...
In this paper, we address Task 1 and Task 2 of EXIST 2022, detecting sexism in a broad sense, from ideological inequality, sexual violence, and misogyny to other expressions involving implicit sexist behaviours in social networks. We apply transfer learning from a pre-trained multilingual DeBERTa (mDeBERTa) model and its zero classificatio...
In this paper, we work on Paraphrase Identification in Mexican Spanish (PAR-MEX) at the sentence level. We introduced two lightweight methods, linear regression and a multilayer perceptron, for training on features extracted from pre-trained models. As a rule of thumb, pair similarity is used to filter noise from the positive examples. We obtained t...
In this paper, we address the task of Paraphrase Identification in Mexican Spanish (PAR-MEX) at the sentence level. Our method uses text embeddings from pre-trained transformer models for training with GAN-BERT, an adversarial learning architecture. We modified the noise for the generator, which has a random rate and the same size as the hid...
This paper presents our participation in the task of detecting gender, profession, and political ideology in tweets of Spanish users, from both binary and multi-class perspectives. The task plays an important role in identifying the political ideology of parties and politicians, especially newly emerging ones. This may support relevant tasks to make prediction...
Hatred spreading through language on social media platforms and in online groups is becoming a well-known phenomenon. Comparing two text representations, bag of words (BoW) and pre-trained GloVe word embeddings, we used a binary classification approach to automatically process user content and detect hate speech. The Naive Bayes...
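The BoW branch of such a pipeline is straightforward to sketch with scikit-learn. This is a minimal illustration, not the paper's setup: the tiny corpus and labels below are invented for demonstration, and the real work would use a large annotated hate-speech dataset and proper evaluation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented dataset standing in for an annotated hate-speech corpus
texts = ["I hate you", "you are awful", "have a nice day", "great work friend"]
labels = [1, 1, 0, 0]  # 1 = hateful, 0 = not hateful

vec = CountVectorizer()           # bag-of-words representation
X = vec.fit_transform(texts)

clf = MultinomialNB().fit(X, labels)  # Naive Bayes binary classifier
pred = clf.predict(vec.transform(["I hate awful people"]))
print(pred[0])  # 1
```

Swapping `CountVectorizer` for averaged GloVe vectors (with a Gaussian rather than multinomial Naive Bayes) gives the embedding-based comparison point.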
In this paper, we engage in Task 2 of the SMART Task 2021 challenge, predicting relations used to identify the correct answer to a given question. This is a subtask of Knowledge Base Question Answering (KBQA) and offers valuable insights for the development of KBQA systems. We introduce our method, combining BERT and data oversampling with text...
In this paper, we extract quotations from Al Jazeera’s news articles containing keywords related to the COVID-19 pandemic. We apply Latent Dirichlet allocation (LDA), coherence measures, and clustering algorithms to explore, in an unsupervised way, latent topics from a dataset of about 3,400 quotations to see how the coronavirus impacts human beings. By combini...
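The LDA step can be sketched with scikit-learn. This is an illustrative toy, not the paper's pipeline: the five mock quotations below stand in for the ~3,400 real ones, and the topic count and preprocessing are assumptions for demonstration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Mock quotations standing in for the COVID-19 quotation dataset
quotes = [
    "lockdown closed schools and businesses",
    "vaccine trials show promising results",
    "hospital beds filled with covid patients",
    "schools reopen after lockdown eases",
    "new vaccine doses arrive at hospitals",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(quotes)

# Fit LDA with an assumed 2 latent topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)

# Each row is a per-document topic distribution summing to 1
print(doc_topics.shape)  # (5, 2)
```

In practice the number of topics would be chosen by the coherence measures the abstract mentions, and the resulting document-topic vectors fed to the clustering step.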
Wikipedia is well known as the largest open encyclopedia today, with the aim of spreading knowledge to everyone in the world. By applying bots to automatic article creation, the Vietnamese edition is one of 13 language projects with more than one million articles. However, this creates many challenges for Vietnamese Wikipedia in improving the qual...
Wikidata is an open online database that stores shared resources for related projects managed by the Wikimedia Foundation. Unifying Wikipedia's infoboxes is set out in phase 2 of the Wikidata plan. Accordingly, infoboxes will be unified to avoid data divergence among the language projects...
Wikipedia currently hosts a large body of converged data with millions of contributions across more than 287 languages. Its content changes rapidly and continuously, with thousands of edits every hour, which triggers many challenges for Wikipedia in controlling, associating, and balancing article content among language editions. This paper provides some process...
Questions
Questions (2)
I have found many papers researching monolingual or bilingual issues in NLP, so there does not seem to be much left to do there. Given a problem, many scholars try to extend it to the general case by approaching multilingualism. Should we conclude that multilingualism will be the future of NLP?
I am looking for any professors who research Wikipedia and related topics.