Short Text Classification With a Convolutional Neural Network-Based Method
Yibo Hu, Yang Li, Tao Yang, Quan Pan
Abstract— Traditional machine learning algorithms are easily affected by the dataset in short text classification tasks, so they generalize poorly when confronted with new situations. This paper presents a new method, SVMCNN, which combines Convolutional Neural Networks and Support Vector Machines. The SVMCNN model is trained with labeled datasets and tested on collected Twitter data. The results show that SVMCNN, and especially pre-trained SVMCNN, performs well in short text classification, achieving high Precision, Recall, and F1-measure.
I. INTRODUCTION
According to the latest statistics, the number of global Internet users has exceeded 4 billion. In 2017, 250 million new Internet users came online, meaning the Internet penetration rate had exceeded 50%, and the number of social media users had grown by 13%. Numerous social applications attract many Internet users; for example, the number of Twitter's monthly active users had reached 330 million as of 2017¹. Internet users generate
a large amount of information on these social applications
every day, including articles, news, and comments. This information is mainly distributed as text, and much of it consists of short texts. Through this information, it is possible to filter spam, advertisements, and illegal content. Meanwhile, by analyzing it, significant events such as natural disasters and large-scale public events may be extracted. Text classification is a useful technology for these scenarios.
Traditional text classification methods are mainly based
on statistical principles, using manually labeled datasets to
train classifiers, which then classify new data. Nii et al. use a KNN (K-Nearest Neighbor) based system to classify Japanese nursing-care texts and select the candidate category for each text[1]. Naive Bayes (NB) is often used as a baseline in text classification because it is fast and easy to implement. Rennie et al. show that, with proper preprocessing, NB can be competitive with more advanced methods such as support vector machines[2]. Diab and Hindi use three
This work was not supported by any organization.
Y. Hu is with the School of Automation, Northwestern Polytechnical University, Xi'an 710072, China (hyb@mail.nwpu.edu.cn).
Y. Li is with the School of Automation, Northwestern Polytechnical University, Xi'an 710072, China (liyangnpu@mail.nwpu.edu.cn).
T. Yang is with the School of Automation, Northwestern Polytechnical University, Xi'an 710072, China (corresponding author; phone: 13571913583; yangtao107@nwpu.edu.cn).
Q. Pan is with the School of Automation, Northwestern Polytechnical University, Xi'an 710072, China (quanpan@nwpu.edu.cn).
¹https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
ways to improve the performance of NB when dealing with sparse text data[3]. Xu proposes three Bayesian counterparts and shows that a Bayesian NB classifier with a Gaussian event model is clearly better than the classical counterpart for text classification[4]. Lilleberg et al. use SVM (Support Vector Machine) to verify that combining TF-IDF and word2vec can outperform TF-IDF alone on text classification, but this study does not consider the impact of redundant features on SVM[5]. Xia et al. use SVM to perform Chinese sentiment analysis on online hotel reviews[6]; the effects of different stop-word filtering methods and feature selection methods are verified with SVM, but the text representation loses semantic information. Though NB and SVM are commonly used for text classification, their performance depends heavily on the variant, features, and datasets used. For short snippet sentiment tasks, NB actually does better than SVM, while the opposite holds for longer documents. Wang et al. propose the NBSVM model, which uses NB log-count ratios as feature values and SVM for classification[7]. But NB cannot guarantee that it will provide the most representative features.
In recent years, the use of neural networks to build language models has gradually matured. Bengio et al. propose a neural probabilistic language model[8]. Hinton proposes the concept of distributed representations, i.e., word embeddings, which are valued by more and more researchers[9]. Word embeddings not only avoid the "curse of dimensionality" but also describe the relationships between words at a higher semantic level.
With the emergence and development of deep learning, many fields have adopted it, including text classification[10][11]. Kim uses a simple convolutional neural network (CNN) model for the semantic classification of sentences[12]; the experimental results show that the CNN model performs no worse than traditional methods on this task. Kalchbrenner et al. propose a multi-layered CNN with k-max pooling for sentence classification[13]. Zhang et al. use a CNN for text feature extraction[14]. Conneau et al. propose a very deep CNN model for text classification, but it takes a long time to train[15]. Lai et al. propose the RCNN model, which combines CNN and RNN[16]: the RNN captures context information, and the CNN constructs a semantic representation of the text. Lee and Dernoncourt use RNN and CNN to classify sequential short texts and show that the CNN works better[17]. At the same time, the short text representation outperforms the class representation, and the effect degrades when they are used simultaneously, because the short text representation contains richer information than the class representation.
This paper focuses on user comments on the Twitter social platform and discusses a short text classification method that combines CNN with SVM, using the CNN for feature extraction and the SVM for classification.
II. SHORT TEXT CLASSIFICATION
Short texts are unstructured data and need to be converted into structured data that a computer can process directly. Structured representations contain a large amount of semantically relevant information, that is, a large number of features, many of which are of little use for classification. The most important features are therefore extracted and used to train the classifiers.
At present, commonly used text classification methods include traditional methods and deep learning based methods. Traditional text classification methods are mainly based on machine learning and use statistical principles to classify. Deep learning based methods mainly use neural networks to extract text features, combining low-level features into more abstract high-level features. This paper combines these two approaches and proposes a Support Vector Machine with Convolutional Neural Network (SVMCNN).
III. SVMCNN MOD EL
This paper combines CNN with SVM because CNN can capture features between consecutive words through convolution processing, while SVM can obtain the optimal solution from the available information even with limited samples. Fig. 1 shows the workflow of the SVMCNN model, which uses the CNN to extract features of short texts and then uses an SVM classifier for classification.
There are many short texts of various lengths in the Twitter datasets. Each short text is initialized as a sequence of vectors $(V_1, V_2, \ldots, V_N)$, where $V_i \in \mathbb{R}^k$ is the $k$-dimensional word vector corresponding to the $i$-th word in the short text, and $N$ is the number of words in the longest text of the Twitter datasets. Short texts with a length less than $N$ are padded so that all short texts have the same length. The word vectors of each short text are concatenated as

$$S = V_1 \oplus V_2 \oplus \cdots \oplus V_N \quad (1)$$

where $S$ is the representation of the short text and $\oplus$ is the concatenation operator. The word vectors are concatenated into a matrix, which is then input into the CNN model for feature extraction.
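As an illustration, the padding and concatenation of Eq. (1) can be sketched in a few lines of Python; the toy vocabulary, dimensions, and random vectors below are stand-ins for this sketch, not the paper's actual data.

```python
import numpy as np

k = 4          # word vector dimension (the paper uses 128)
N = 6          # number of words in the longest text
rng = np.random.default_rng(0)
embedding = {w: rng.normal(size=k) for w in "so sad to hear of the attack".split()}

def text_to_matrix(tokens, N, k):
    """Pad to length N and stack word vectors into an N x k matrix, as in Eq. (1)."""
    rows = [embedding.get(t, np.zeros(k)) for t in tokens[:N]]
    rows += [np.zeros(k)] * (N - len(rows))   # zero-pad texts shorter than N
    return np.stack(rows)                     # S = V1 (+) V2 (+) ... (+) VN

S = text_to_matrix("so sad to hear".split(), N, k)
print(S.shape)                                # (6, 4)
```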
A convolution operation involves a filter $W \in \mathbb{R}^{h \times k}$, where $h$ is the height of the convolution kernel window; a window of $h$ word vectors is mapped to produce a new feature. Let $V_{i:i+h-1}$ denote the concatenation of $h$ word vectors $(V_i, V_{i+1}, \ldots, V_{i+h-1})$; their feature is generated by

$$c_i = f(W \cdot V_{i:i+h-1} + b) \quad (2)$$

where $b \in \mathbb{R}$ is a bias term and $f$ is a non-linear activation function such as Sigmoid, Tanh, or ReLU. This paper uses ReLU as the activation function. ReLU is a piecewise linear function that can reduce the interdependence of parameters and can therefore ease the overfitting problem.
The filter is applied to each possible window of word vectors $(V_{1:h}, V_{2:h+1}, \ldots, V_{N-h+1:N})$ to produce a feature map $[c_1, c_2, \ldots, c_{N-h+1}]$. Different filters extract text features from different perspectives, so different feature maps can be obtained by setting the filter sizes and the number of filters of each size.

After the feature maps are produced, a max-pooling layer reduces the number of parameters and retains the optimal features. All the local optimal features are then connected through a fully connected layer whose output is the feature vector of the short text.
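A minimal numpy sketch of Eq. (2) and the max-pooling step may make this concrete; the filter values, bias, and dimensions below are illustrative assumptions, not the trained parameters.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def feature_map(S, W, b):
    """Slide an h x k filter over S: c_i = f(W . V_{i:i+h-1} + b), as in Eq. (2)."""
    N, _ = S.shape
    h = W.shape[0]
    return np.array([relu(np.sum(W * S[i:i + h]) + b) for i in range(N - h + 1)])

rng = np.random.default_rng(1)
S = rng.normal(size=(6, 4))          # stand-in padded short text from Eq. (1)
pooled = []
for h in (3, 4, 5):                  # the paper's filter sizes
    W, b = rng.normal(size=(h, 4)), 0.1
    c = feature_map(S, W, b)         # feature map [c_1, ..., c_{N-h+1}]
    pooled.append(c.max())           # max-pooling keeps the strongest feature
text_vector = np.array(pooled)       # concatenated local optima, one per filter
print(text_vector.shape)             # (3,)
```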
Finally, the SVM classifier is used to classify the short text features. Let $x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(m)})^T$ be the $i$-th $m$-dimensional feature vector, and $y_i \in \{-1, 1\}$ the category of the $i$-th short text. The SVM classifier separates the feature vectors of the short texts by learning to find a hyperplane

$$\vec{\omega} \cdot x + b = 0 \quad (3)$$

where $\vec{\omega}$ is the normal vector that determines the direction of the hyperplane, and $b$ is the displacement term that determines the distance between the hyperplane and the origin. The distance from $x_i$ to the hyperplane is

$$r_i = \frac{|\vec{\omega} \cdot x_i + b|}{\|\vec{\omega}\|} \quad (4)$$

Finding the optimal hyperplane means finding the two nearest vectors of different classes whose distances to the hyperplane are equal and whose summed distance to the hyperplane is largest. This is equivalent to

$$\min \frac{1}{2}\|\vec{\omega}\|^2 \quad \text{s.t.} \quad y_i(\omega^T x_i + b) \geq 1, \; i = 1, \ldots, n \quad (5)$$

After $\vec{\omega}$ and $b$ of the optimal hyperplane are obtained, this hyperplane is used to classify the short text feature vectors.
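The classification stage can be sketched with scikit-learn, whose LinearSVC solves a soft-margin form of Eq. (5); the 384-dimensional features and labels below are random stand-ins for the CNN outputs, not the paper's data.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 384))     # stand-in 384-dim CNN feature vectors
y = rng.choice([-1, 1], size=200)   # sentiment labels y_i in {-1, 1}

clf = LinearSVC(C=1.0)              # learns omega and b of the hyperplane, Eq. (3)
clf.fit(X, y)
omega, b = clf.coef_[0], clf.intercept_[0]
r = np.abs(X @ omega + b) / np.linalg.norm(omega)  # distances to the hyperplane, Eq. (4)
print(clf.predict(X[:5]), r[:5])
```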
IV. EXPERIMENT DESIGN AND RESULTS ANALYSIS
A. Datasets and Evaluation
This paper uses two datasets. One is the sentiment polarity datasets². Their role is to train the SVMCNN model parameters. The datasets include a positive subset and a negative subset, each containing more than 5,000 movie reviews. This paper trains the model with 10-fold cross-validation. The other is the Twitter datasets, whose role is to evaluate the generalization ability of the SVMCNN model. The Twitter datasets include a total of 3,169 comments collected from the Twitter social platform, and every piece of data carries the user's sentiment. For example, "thankfully, overall, in the long run, things are getting better in the world" is obviously positive, while "so sad to hear of the terrorist attack in Egypt" expresses a negative sentiment.
²http://www.cs.cornell.edu/people/pabo/movie-review-data/
Fig. 1. SVMCNN model
There are many illegal characters in the Twitter data that would affect short text classification, so the Twitter data are pre-processed to remove the unnecessary characters. This paper trains the model on the movie-review data and evaluates it on the Twitter data, in order to assess the model's ability to adapt to very different data.
This paper evaluates the performance of the algorithms with three indicators: Precision, Recall, and F1-measure.
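For concreteness, the three indicators can be computed with scikit-learn; the label arrays below are illustrative stand-ins, not the paper's actual predictions.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, -1, 1, -1, -1, 1]   # manually determined categories
y_pred = [1, -1, -1, 1, -1, 1, 1]   # a model's predictions

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
```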
B. Experiment Design
It is essential to use the CNN to extract feature vectors of short texts. Short texts must be initialized as word vectors when they are input into the CNN model, and different initialization methods yield different classification results. This paper uses random initialization and pre-trained initialization for text representation, respectively.
Random initialization simply inputs the datasets into the CNN model, whose input layer performs the text quantization. With pre-trained initialization, the datasets are mapped to word vectors based on a pre-trained model; this paper uses a public word embedding model pre-trained with word2vec³. Word2vec obtains a word vector space by performing unsupervised learning on a large text corpus. As long as a large corpus covering most everyday language is collected, a universal word embedding model can be pre-trained. Using this word embedding model makes the initialized word vectors carry more semantic information.
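As an illustration, such a pre-trained model can be loaded with gensim; the file path and vocabulary below are assumptions for this sketch, and out-of-vocabulary words fall back to zero vectors.

```python
import numpy as np
from gensim.models import KeyedVectors

# Load the public pre-trained model (footnote 3); the local path is illustrative.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

vocab = ["sad", "hear", "terrorist", "attack"]   # stand-in vocabulary
dim = w2v.vector_size                            # 300 for this model
embedding_matrix = np.stack(
    [w2v[w] if w in w2v else np.zeros(dim) for w in vocab])
print(embedding_matrix.shape)                    # (4, 300)
```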
This paper builds a CNN model with three convolutional layers based on TensorFlow⁴. The specific parameters of the CNN model are given in Table I. After feature extraction by the CNN, each short text is represented as a 384-dimensional vector. These vectors capture the main features of each short text and are input into the SVM classifier for classification.
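A hedged sketch of such a feature extractor, built in Keras from Table I's hyperparameters, is given below; the exact layer arrangement and the maximum text length N are assumptions, since the paper lists only the hyperparameters.

```python
import tensorflow as tf
from tensorflow.keras import layers

N, k = 56, 128                    # max text length (illustrative) and Table I's vector dim
inp = layers.Input(shape=(N, k))
pooled = []
for h in (3, 4, 5):               # filter sizes from Table I
    c = layers.Conv1D(128, h, activation="relu")(inp)  # 128 filters per size
    pooled.append(layers.GlobalMaxPooling1D()(c))      # max-pooling over each feature map
features = layers.Concatenate()(pooled)                # 3 x 128 = 384-dim feature vector
features = layers.Dropout(0.5)(features)               # dropout rate from Table I
extractor = tf.keras.Model(inp, features)
print(extractor.output_shape)                          # (None, 384)
```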
C. Results and Analysis
This paper compares random initialization and pre-trained initialization for short text feature extraction with the CNN model. The CNN model is trained on the sentiment polarity datasets.
³GoogleNews-vectors-negative300.bin
⁴https://www.tensorflow.org/
TABLE I
CNN MODEL PARAMETERS

Parameter                Value
Word vector dimension    128
Filter sizes             3, 4, 5
Filter number            128
Dropout rate             0.5
Batch size               64
Training steps           3000
Learning rate            10⁻³
With random initialization, the model starts to converge after 2000 steps, and the accuracy on the training set is 95% when training completes. With pre-trained initialization, training already begins to converge at 1000 steps, and the accuracy on the training set is close to 100%.
Various algorithms are then used to predict the categories of the Twitter short texts. In addition to SVMCNN and CNN, this paper also uses other text classification methods. All of these models are trained on the sentiment polarity datasets and then make predictions on the Twitter datasets. Since the Twitter datasets are unlabeled, this paper determines the real categories manually. The real categories are compared with the models' predictions, and the classification results of all models are then computed. The training results on the sentiment polarity datasets are given in Table II, and the prediction results on the Twitter datasets in Table III.
From the results, it can be seen that the models with pre-trained initialization are better than those with random initialization: all three indicators of the former are higher than those of the latter. In addition, the SVMCNN model performs well in all respects. In particular, SVMCNN with pre-trained initialization achieves the highest value on all three indicators: its classification precision on the sentiment polarity datasets is about 92%, and its test precision on Twitter is also close to 90%. The Recall and F1-Measure likewise reflect the good performance of the SVMCNN model. SVMCNN achieves such results because it makes full use of the advantages of CNN and SVM: it can handle interactions between nonlinear features without relying on all the data. Apart from this, SVMCNN has high generalization ability.
TABLE II
THE RESULTS ON THE SENTIMENT POLARITY DATASETS

              Precision              Recall               F1-Measure
Model     Random   Pre-trained  Random   Pre-trained  Random   Pre-trained
SVMCNN    91.89%   92.11%       85.00%   87.50%       88.30%   89.70%
CNN       88.89%   91.70%       82.00%   82.50%       85.30%   86.86%
SVM           86.20%                67.65%                75.76%
NB            87.50%                52.50%                65.25%
RNN           88.93%                84.10%                86.45%
LSTM          90.79%                85.32%                87.97%
TABLE III
THE RESULTS ON THE TWITTER DATASETS

              Precision              Recall               F1-Measure
Model     Random   Pre-trained  Random   Pre-trained  Random   Pre-trained
SVMCNN    87.19%   88.32%       80.03%   82.63%       83.46%   85.38%
CNN       79.58%   81.40%       74.32%   77.30%       76.86%   79.30%
SVM           76.30%                59.50%                66.86%
NB            75.21%                51.34%                61.02%
RNN           80.42%                73.53%                76.82%
LSTM          84.37%                80.10%                82.18%
The model obtains good results even though the scenarios of the two datasets are very different, and with just a little fine-tuning, the SVMCNN model can adapt to multiple scenarios.
V. CONCLUSIONS
This paper addresses the problem that SVM is easily affected by the dataset in short text classification, and proposes combining CNN with SVM to improve the classification performance. In tests on Twitter users' comments, the results show that SVMCNN with pre-trained initialization performs better than the other algorithms. SVMCNN can play a role in public opinion analysis and sensitive information identification on online social platforms, which helps to guide and maintain a safe and clean network environment.
REFERENCES
[1] M. Nii, K. Takahama, A. Uchinuno, and R. Sakashita, "Soft class decision for nursing-care text classification using a k-nearest neighbor based system", in IEEE International Conference on Fuzzy Systems, 2014, pp. 1825-1830.
[2] J.D.M. Rennie, L. Shih, J. Teevan, and D.R. Karger, "Tackling the poor assumptions of naive Bayes text classifiers", in Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), 2003, pp. 616-623.
[3] D.M. Diab and K.M.E. Hindi, "Using differential evolution for fine tuning naïve Bayesian classifiers and its application for text classification", Applied Soft Computing, vol. 54, 2017, pp. 183-199.
[4] S. Xu, "Bayesian naive Bayes classifiers to text classification", Journal of Information Science, vol. 44, no. 1, 2018, pp. 48-59.
[5] J. Lilleberg, Y. Zhu, and Y. Zhang, "Support vector machines and word2vec for text classification with semantic features", in 14th IEEE International Conference on Cognitive Informatics & Cognitive Computing, 2015, pp. 136-140.
[6] H. Xia, M. Tao, and Y. Wang, "Sentiment text classification of customers reviews on the web based on SVM", in Sixth International Conference on Natural Computation, 2010, pp. 3633-3637.
[7] S. Wang and C.D. Manning, "Baselines and bigrams: Simple, good sentiment and topic classification", in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, vol. 2, 2012, pp. 90-94.
[8] Y. Bengio, R. Ducharme, P. Vincent, et al., "A neural probabilistic language model", Journal of Machine Learning Research, vol. 3, 2003, pp. 1137-1155.
[9] G.E. Hinton, "Learning distributed representations of concepts", in Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1986, pp. 1-12.
[10] G.E. Hinton and R.R. Salakhutdinov, "Reducing the dimensionality of data with neural networks", Science, vol. 313, no. 5786, 2006, pp. 504-507.
[11] Y. LeCun, Y. Bengio, and G.E. Hinton, "Deep learning", Nature, vol. 521, 2015, pp. 436-444.
[12] Y. Kim, "Convolutional neural networks for sentence classification", in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1746-1751.
[13] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, "A convolutional neural network for modelling sentences", in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol. 1, 2014, pp. 655-665.
[14] T. Zhang, C. Li, N. Cao, et al., "Text feature extraction and classification based on convolutional neural network (CNN)", in Data Science, 2017, pp. 472-485.
[15] A. Conneau, H. Schwenk, L. Barrault, et al., "Very deep convolutional networks for text classification", in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol. 1, 2017, pp. 1107-1116.
[16] S. Lai, L. Xu, K. Liu, and J. Zhao, "Recurrent convolutional neural networks for text classification", in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, pp. 2267-2273.
[17] J.Y. Lee and F. Dernoncourt, "Sequential short-text classification with recurrent and convolutional neural networks", in The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 515-520.