
Short Text Classification With A Convolutional Neural Networks Based Method


Abstract

Traditional machine learning algorithms are easily affected by the datasets used in short text classification tasks, so they generalize poorly when confronted with new situations. This paper presents a new method, SVMCNN, that combines Convolutional Neural Networks and Support Vector Machines. The SVMCNN model is trained with labeled datasets and tested on collected Twitter data. The results show that SVMCNN, especially pre-trained SVMCNN, performs well in short text classification, achieving high precision, recall, and F1-measure.
Yibo Hu, Yang Li, Tao Yang, Quan Pan
I. INTRODUCTION
According to the latest statistics, the number of global Internet
users has exceeded 4 billion. In 2017, the number of newly added
Internet users was 250 million, which meant that the Internet
penetration rate had exceeded 50%, and the number of social media
users had increased by 13%. Numerous social applications attract many
Internet users; for example, the number of Twitter's monthly active
users had reached 330 million as of 2017¹. Internet users generate a
large amount of information on these social applications every day,
including articles, news, and comments. This information is mainly
distributed as text, much of it as short texts. By analyzing this
information, it is possible to filter spam, advertisements, and
illegal information. Significant events, such as natural disasters
and large-scale public events, may also be extracted. Text
classification is a useful technology for these scenarios.
Traditional text classification methods are mainly based on
statistical principles: manually labeled datasets are used to train
classifiers, which then classify new data. Nii et al. use a KNN
(K-Nearest Neighbor) based system to classify Japanese nursing-care
texts and select the candidate category for each text [1]. Naive
Bayes (NB) is often used as a baseline in text classification because
it is fast and easy to implement. Rennie et al. show that, with
proper preprocessing, NB is competitive with more advanced methods
such as support vector machines [2]. Diab and Hindi use three ways to
improve the performance of NB when dealing with sparse text data [3].
Xu proposes three Bayesian counterparts and shows that the Bayesian
NB classifier with a Gaussian event model is clearly better than its
classical counterpart for text classification [4]. Lilleberg et al.
use SVM (Support Vector Machine) to verify that combining TF-IDF and
word2vec can outperform TF-IDF alone on text classification, but this
study does not consider the impact of redundant features on SVM [5].
Xia et al. use SVM to perform Chinese sentiment analysis on online
hotel reviews [6]. The effects of different stop-word filtering
methods and feature selection methods are verified with SVM, but the
representation of the text data loses semantic information. Although
NB and SVM are commonly used for text classification, their
performance depends heavily on the variant, features, and datasets
used. For short-snippet sentiment tasks, NB actually does better than
SVM, while the opposite holds for longer documents. Wang et al.
propose the NBSVM model, which uses NB log-count ratios as feature
values and SVM for classification [7]. However, NB cannot guarantee
that it provides the most representative features.
This work was not supported by any organization.
Y. Hu is with the School of Automation, Northwestern Polytechnical
University, Xi'an, China 710072 hyb@mail.nwpu.edu.cn
Y. Li is with the School of Automation, Northwestern Polytechnical
University, Xi'an, China 710072 liyangnpu@mail.nwpu.edu.cn
T. Yang is with the School of Automation, Northwestern Polytechnical
University, Xi'an, China 710072 (corresponding author, phone:
13571913583) yangtao107@nwpu.edu.cn
Q. Pan is with the School of Automation, Northwestern Polytechnical
University, Xi'an, China 710072 quanpan@nwpu.edu.cn
¹ https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
In recent years, the use of neural networks to build language models
has gradually matured. Bengio et al. propose a neural network method
to construct a binary language model [8]. Hinton proposes the concept
of word embedding, which is valued by more and more researchers [9].
Word embedding not only avoids the "curse of dimensionality", but
also describes the relationships between words at a higher semantic
level.
With the emergence and development of deep learning, it has been
applied in many fields, including text classification [10][11]. Kim
uses a simple convolutional neural network (CNN) model for the
semantic classification of text [12]. The experimental results show
that the CNN model performs no worse than traditional methods in the
semantic classification of sentences. Kalchbrenner et al. propose a
multi-layered CNN that uses k-max pooling for sentence
classification [13]. Zhang et al. use CNN for text feature
extraction [14]. Conneau et al. propose a deep CNN model for text
classification, but it is time-consuming [15]. Lai et al. propose the
RCNN model, which combines CNN and RNN [16]. The RCNN model uses RNN
to capture context information and CNN to construct a semantic
representation of the text. Lee and Dernoncourt use RNN and CNN to
classify sequential short texts and show that CNN works better [17].
In the same work, the short text representation is better than the
class representation, and performance drops when the two are used
simultaneously. This is because the short text representation
contains richer information than the class representation.
2018 15th International Conference on
Control, Automation, Robotics and Vision (ICARCV)
Singapore, November 18-21, 2018
978-1-5386-9582-1/18/$31.00 ©2018 IEEE 1432
This paper focuses on user comments on the Twitter social platform
and discusses a short text classification method that combines CNN
with SVM, using CNN for feature extraction and SVM for
classification.
II. SHORT TEXT CLASSIFICATION
Short texts are unstructured data and need to be converted into
structured data that a computer can process directly. Structured
representations contain a large amount of semantically relevant
information, that is, a large number of features, many of which are
of little use for classification. The most important features are
therefore extracted and used to train the classifiers.
At present, commonly used text classification methods include
traditional methods and deep learning based methods. Traditional text
classification methods are mainly based on machine learning and use
statistical principles to classify. Deep learning based methods
mainly use neural networks to extract text features, combining
low-level features to form more abstract high-level features. This
paper combines these two approaches and proposes a Support Vector
Machine with Convolutional Neural Network (SVMCNN).
III. SVMCNN MODEL
This paper combines CNN with SVM because CNN can capture features
between consecutive words through convolution, and SVM can find the
optimal solution from the available information when samples are
limited. Fig. 1 shows the workflow of the SVMCNN model, which uses
CNN to extract features of short texts and then uses an SVM
classifier for classification.
The Twitter datasets contain many short texts of various lengths.
Each short text is initialized as a sequence of vectors
$(V_1, V_2, \ldots, V_N)$, where $V_i \in \mathbb{R}^k$ is the
$k$-dimensional word vector corresponding to the $i$-th word in the
short text and $N$ is the number of words in the longest text of the
Twitter datasets. Short texts with fewer than $N$ words are padded so
that all short texts have the same length. The word vectors of each
short text are concatenated as
$$S = V_1 \oplus V_2 \oplus \cdots \oplus V_N \qquad (1)$$
where $S$ is the representation of the short text and $\oplus$ is the
concatenation operator. The concatenated word vectors form a matrix,
which is input into the CNN model for feature extraction.
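The padding and concatenation in Eq. (1) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `build_input_matrix`, the toy embedding values, and the use of a zero vector for padding and unknown words are assumptions, since the paper does not state which padding value is used.

```python
from typing import Dict, List

def build_input_matrix(tokens: List[str],
                       embeddings: Dict[str, List[float]],
                       max_len: int,
                       dim: int) -> List[List[float]]:
    """Stack word vectors row by row and pad to max_len rows (Eq. 1)."""
    pad = [0.0] * dim  # assumed padding vector (also used for unknown words)
    rows = [list(embeddings.get(tok, pad)) for tok in tokens[:max_len]]
    # Short texts are filled up to N = max_len rows.
    rows += [list(pad) for _ in range(max_len - len(rows))]
    return rows

# Toy 2-dimensional embeddings (hypothetical values, k = 2, N = 4).
toy_embeddings = {"so": [0.1, 0.2], "sad": [0.9, -0.4]}
S = build_input_matrix(["so", "sad"], toy_embeddings, max_len=4, dim=2)
```

Each row of `S` is one word vector, so the result is the N x k matrix that the convolution filters slide over.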
A convolution operation involves a filter
$W \in \mathbb{R}^{h \times k}$, where $h$ is the height of the
convolution kernel window; the filter is applied to a window of $h$
word vectors to produce a new feature. Let $V_{i:i+h-1}$ denote the
concatenation of the $h$ word vectors
$(V_i, V_{i+1}, \ldots, V_{i+h-1})$; the feature is generated by
$$c_i = f(W \cdot V_{i:i+h-1} + b) \qquad (2)$$
where $b \in \mathbb{R}$ is a bias term and $f$ is a non-linear
activation function such as Sigmoid, Tanh, or ReLU. This paper uses
ReLU as the activation function. ReLU is a piecewise linear function
that can reduce the interdependence of parameters, and can therefore
ease the overfitting problem.
The filter is applied to each possible window of word vectors
$(V_{1:h}, V_{2:h+1}, \ldots, V_{N-h+1:N})$ to produce a feature map
$[c_1, c_2, \ldots, c_{N-h+1}]$. Different filters extract text
features from different perspectives, so different feature maps can
be obtained by varying the filter sizes and the number of filters of
each size.
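To make Eq. (2) and the feature map concrete, the sketch below slides a single filter over a toy input matrix in plain Python. The function names and the numeric values are illustrative assumptions; a real model would rely on a framework's convolution layer and learn $W$ and $b$ during training.

```python
from typing import List

def relu(x: float) -> float:
    """ReLU activation: max(0, x)."""
    return max(0.0, x)

def conv_feature_map(S: List[List[float]],
                     W: List[List[float]],
                     b: float) -> List[float]:
    """Slide an h x k filter W over the N x k matrix S (Eq. 2)."""
    h = len(W)
    n = len(S)
    feature_map = []
    for i in range(n - h + 1):  # one window per start position
        window = S[i:i + h]
        total = sum(W[r][c] * window[r][c]
                    for r in range(h) for c in range(len(W[r])))
        feature_map.append(relu(total + b))
    return feature_map

# Toy input: N = 4 words, k = 2 dimensions; filter height h = 2.
S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
W = [[1.0, -1.0], [0.5, 0.5]]
c = conv_feature_map(S, W, b=0.0)  # N - h + 1 = 3 features
```

The feature map has one entry per window, which is why its length is $N-h+1$ and depends on the filter height.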
After the feature maps are computed, a max-pooling layer reduces the
number of parameters and keeps the locally optimal feature of each
map. All the locally optimal features are then connected through a
fully connected layer, whose output is the feature vector of the
short text.
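The pooling step can be sketched as below. It shows only max-pooling over each feature map and concatenation of the pooled values, and omits the fully connected layer. The placeholder activations and the text length of 10 words are assumptions for illustration; the filter sizes (3, 4, 5) and the 128 filters per size follow Table I.

```python
from typing import List

def max_over_time(feature_maps: List[List[float]]) -> List[float]:
    """Keep the largest activation of each feature map."""
    return [max(fm) for fm in feature_maps]

# Hypothetical feature maps: 3 filter sizes x 128 filters each.
# A text of N words yields N - h + 1 activations per filter of height h.
N, filters_per_size = 10, 128
pooled: List[float] = []
for h in (3, 4, 5):
    maps = [[float((f + i) % 7) for i in range(N - h + 1)]
            for f in range(filters_per_size)]  # placeholder activations
    pooled.extend(max_over_time(maps))

feature_vector = pooled  # 3 * 128 = 384 values
```

Because pooling keeps one value per filter, the feature vector has the same length regardless of the filter height or the text length.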
Finally, an SVM classifier is used to classify the short text
features. Let
$x_i = (x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(m)})^T$ be the
$m$-dimensional feature vector of the $i$-th short text, and let
$y_i \in \{-1, 1\}$ be its category. The SVM classifier separates the
feature vectors of the short texts by learning a hyperplane
$$\vec{\omega}^{T} x + b = 0 \qquad (3)$$
where $\vec{\omega}$ is the normal vector that determines the
direction of the hyperplane and $b$ is the displacement term that
determines the distance between the hyperplane and the origin. The
distance from $x_i$ to the hyperplane is
$$r_i = \frac{|\vec{\omega}^{T} x_i + b|}{\|\vec{\omega}\|} \qquad (4)$$
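As a small numerical check of Eqs. (3) and (4), the following sketch computes the distance of a point to a hyperplane and the side on which it falls. The hyperplane parameters and the points are hypothetical values chosen so that the arithmetic is easy to follow.

```python
import math
from typing import List

def decision_value(w: List[float], b: float, x: List[float]) -> float:
    """Left-hand side of Eq. (3): w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def distance_to_hyperplane(w: List[float], b: float, x: List[float]) -> float:
    """Eq. (4): |w^T x + b| / ||w||."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return abs(decision_value(w, b, x)) / norm

def predict(w: List[float], b: float, x: List[float]) -> int:
    """Category in {-1, 1} given by the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

w, b = [3.0, 4.0], -5.0  # hypothetical hyperplane, ||w|| = 5
r = distance_to_hyperplane(w, b, [3.0, 4.0])  # |9 + 16 - 5| / 5
label = predict(w, b, [0.0, 0.0])             # -5 < 0, so category -1
```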
Finding the optimal hyperplane means finding the hyperplane for which
the nearest vectors of the two different categories are equally
distant from it and the sum of their distances to it (the margin) is
as large as possible. This is equivalent to
$$\min_{\vec{\omega}, b} \ \frac{1}{2}\|\vec{\omega}\|^2 \quad \text{s.t.} \quad y_i(\vec{\omega}^{T} x_i + b) \ge 1, \quad i = 1, \cdots, n \qquad (5)$$
After $\vec{\omega}$ and $b$ of the optimal hyperplane are obtained,
this hyperplane is used to classify the short text feature vectors.
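The step from "largest margin" to the minimization problem in Eq. (5) is the standard hard-margin argument, which the paper does not spell out. A brief sketch, where the margin symbol $\gamma$ is introduced here only for illustration:

```latex
% Rescale (\vec{\omega}, b) so that the vectors closest to the hyperplane
% satisfy y_i(\vec{\omega}^{T} x_i + b) = 1. By Eq. (4), each of them lies
% at distance 1 / \|\vec{\omega}\| from the hyperplane, so the margin is
\gamma = \frac{2}{\|\vec{\omega}\|},
\qquad
\max_{\vec{\omega}, b} \frac{2}{\|\vec{\omega}\|}
\;\Longleftrightarrow\;
\min_{\vec{\omega}, b} \frac{1}{2}\|\vec{\omega}\|^{2}
\quad \text{s.t.} \quad y_i(\vec{\omega}^{T} x_i + b) \ge 1,
\quad i = 1, \cdots, n.
```

Maximizing the margin and minimizing the squared norm of the normal vector therefore select the same hyperplane.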
IV. EXPERIMENT DESIGN AND RESULTS ANALYSIS
A. Datasets and Evaluation
This paper uses two datasets. The first is the Sentiment polarity
datasets², which are used to train the SVMCNN model parameters. They
include a positive subset and a negative subset, each containing more
than 5,000 movie-review items. The model is trained with 10-fold
cross-validation. The second is the Twitter datasets, which are used
to evaluate the generalization ability of the SVMCNN model. The
Twitter datasets include a total of 3,169 comments collected from the
Twitter social platform, and every comment carries the user's
sentiment. For example, "thankfully, overall, in the long run, things
are getting better in the world" is clearly a positive sentiment,
while "so sad to hear of the terrorist attack in Egypt" expresses a
negative sentiment. Since the Twitter data contain many illegal
characters that would affect short text classification, the data need
to be pre-processed to remove unnecessary characters. This paper
trains the model on movie-review data and evaluates it on Twitter
data in order to assess the model's ability to adapt to very
different data.
² http://www.cs.cornell.edu/people/pabo/movie-review-data/
Fig. 1. SVMCNN model
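The paper does not list which characters are removed from the Twitter data. As one possible reading of the pre-processing step, the sketch below strips links, user mentions, and symbols with regular expressions; the function name `clean_tweet` and all three rules are assumptions, not the authors' procedure.

```python
import re

def clean_tweet(text: str) -> str:
    """Remove URLs, @mentions and symbols (assumed cleaning rules)."""
    text = re.sub(r"https?://\S+", " ", text)     # links
    text = re.sub(r"@\w+", " ", text)             # user mentions
    text = re.sub(r"[^A-Za-z0-9'\s]", " ", text)  # emoji, '#', punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

cleaned = clean_tweet(
    "So sad to hear of the terrorist attack in Egypt :( @user https://t.co/x"
)
```

Lower-casing is also an assumption here; whether it helps depends on the vocabulary of the word vectors being used.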
This paper evaluates the performance of the algorithms with three
indicators: precision, recall, and F1-measure.
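For reference, the three indicators can be computed from true and predicted categories as in the sketch below. The paper does not say whether the scores are reported for the positive class or averaged over both classes, so treating category 1 as the positive class is an assumption, as are the toy labels.

```python
from typing import List, Tuple

def precision_recall_f1(y_true: List[int], y_pred: List[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1-measure for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, -1, -1, 1]
y_pred = [1, 1, -1, 1, -1, 1]
p, r, f1 = precision_recall_f1(y_true, y_pred)
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.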
B. Experiment Design
CNN is used to extract the feature vectors of short texts. Short
texts must be converted into word vectors before they are input into
the CNN model, and different initialization methods lead to different
classification results. This paper uses random initialization and
pre-trained initialization for text representation, respectively.
Random initialization only requires feeding the datasets into the CNN
model, whose input layer vectorizes the text. In the pre-trained
initialization, the datasets are mapped to word vectors using a
pre-trained word embedding model. This paper uses a public word
embedding model pre-trained with word2vec³. Word2vec obtains a word
vector space by performing unsupervised learning on large text
corpora. Given a sufficiently large text corpus covering most
everyday language, a universal word embedding model can be
pre-trained. Using this word embedding model lets the initialized
word vectors carry more semantic information.
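The difference between the two initialization methods can be sketched as follows. This is an illustration under stated assumptions: the function name `init_embeddings`, the uniform range of ±0.25, and the tiny `pretrained` dictionary standing in for a real word2vec model are all hypothetical.

```python
import random
from typing import Dict, List, Optional

def init_embeddings(vocab: List[str], dim: int,
                    pretrained: Optional[Dict[str, List[float]]] = None,
                    seed: int = 0) -> Dict[str, List[float]]:
    """Random initialization, optionally overwritten by pre-trained vectors."""
    rng = random.Random(seed)
    # Assumed range for the random vectors: uniform in [-0.25, 0.25].
    table = {w: [rng.uniform(-0.25, 0.25) for _ in range(dim)] for w in vocab}
    if pretrained:
        for w in vocab:
            if w in pretrained:  # words missing from the model stay random
                table[w] = list(pretrained[w])
    return table

vocab = ["good", "zzz"]
pretrained = {"good": [1.0, 2.0, 3.0]}  # stand-in for word2vec vectors
random_table = init_embeddings(vocab, dim=3)
pretrained_table = init_embeddings(vocab, dim=3, pretrained=pretrained)
```

With pre-trained vectors the model starts from representations that already encode word similarity, which fits the faster convergence reported in the results.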
This paper builds a CNN model with three convolutional layers based
on TensorFlow⁴. The specific parameters of the CNN model are listed
in Table I.
After feature extraction by CNN, each short text is represented as a
384-dimensional vector (three filter sizes with 128 filters each).
These vectors represent the main features of each short text and are
input into the SVM classifier for classification.
C. Results and Analysis
This paper compares random initialization and pre-trained
initialization for short text feature extraction with the CNN model.
The CNN model is trained on the sentiment polarity datasets. With
random initialization, the model starts to converge after 2000 steps,
and the accuracy on the training set is 95% when training is
completed. With pre-trained initialization, however, training already
begins to converge at 1000 steps, and the accuracy on the training
set is close to 100%.
³ GoogleNews-vectors-negative300.bin
⁴ https://www.tensorflow.org/
TABLE I
CNN MODEL PARAMETERS
Parameter               Value
Word vector dimension   128
Filter size             3, 4, 5
Filter number           128
Dropout rate            0.5
Batch size              64
Steps                   3000
Learning rate           $10^{-3}$
Various algorithms are then used to predict the categories of the
Twitter short texts. In addition to SVMCNN and CNN, this paper also
uses other text classification methods. All of these models are
trained on the sentiment polarity datasets and then predict the
Twitter datasets. Since the Twitter datasets are unlabeled, the real
categories are determined manually. The real categories are compared
with the predictions of the models to calculate the classification
results of all models. The model training results on the Sentiment
polarity datasets are shown in Table II, and the prediction results
on the Twitter datasets are shown in Table III.
The results show that the models initialized with pre-trained vectors
are better than those with random initialization: all three
indicators of the former are higher than those of the latter. In
addition, the SVMCNN model performs well in all aspects. In
particular, SVMCNN with pre-trained initialization achieves the
highest values of all three indicators. Its precision on the
Sentiment polarity datasets is about 92%, and its test precision on
Twitter is also close to 90%. The recall and F1-measure likewise
reflect the good performance of the SVMCNN model. SVMCNN achieves
such results because it makes full use of the advantages of both CNN
and SVM: it can handle interactions between nonlinear features
without relying on all of the data. Apart from this, SVMCNN
generalizes well and obtains good results even when the scenarios of
the two datasets are very different. With only fine-tuning, the
SVMCNN model can adapt to multiple scenarios.
TABLE II
THE RESULTS ON THE SENTIMENT POLARITY DATASETS
Model     Precision               Recall                  F1-Measure
          Random    Pre-trained   Random    Pre-trained   Random    Pre-trained
SVMCNN    91.89%    92.11%        85.00%    87.50%        88.30%    89.70%
CNN       88.89%    91.70%        82.00%    82.50%        85.30%    86.86%
SVM                86.20%                  67.65%                  75.76%
NB                 87.50%                  52.50%                  65.25%
RNN                88.93%                  84.10%                  86.45%
LSTM               90.79%                  85.32%                  87.97%
TABLE III
THE RESULTS ON THE TWITTER DATASETS
Model     Precision               Recall                  F1-Measure
          Random    Pre-trained   Random    Pre-trained   Random    Pre-trained
SVMCNN    87.19%    88.32%        80.03%    82.63%        83.46%    85.38%
CNN       79.58%    81.40%        74.32%    77.30%        76.86%    79.30%
SVM                76.30%                  59.50%                  66.86%
NB                 75.21%                  51.34%                  61.02%
RNN                80.42%                  73.53%                  76.82%
LSTM               84.37%                  80.10%                  82.18%
V. CONCLUSIONS
This paper addresses the problem that SVM is easily affected by
datasets in short text classification and proposes combining CNN with
SVM to improve classification performance. Tests on Twitter users'
remarks show that SVMCNN with pre-trained initialization performs
better than the other algorithms. SVMCNN can play a role in public
opinion analysis and sensitive information identification on online
social platforms, which helps to guide and maintain a safe and clean
network environment.
REFERENCES
[1] M. Nii, K. Takahama, A. Uchinuno, and R. Sakashita, "Soft class
decision for nursing-care text classification using a k-nearest
neighbor based system", in IEEE International Conference on Fuzzy
Systems, 2014, pp. 1825-1830.
[2] J.D.M. Rennie, L. Shih, J. Teevan, and D.R. Karger, "Tackling the
poor assumptions of naive bayes text classifiers", in Machine
Learning, Proceedings of the Twentieth International Conference
(ICML 2003), 2003, pp. 616-623.
[3] D.M. Diab and K.M.E. Hindi, "Using differential evolution for
fine tuning naïve Bayesian classifiers and its application for text
classification", Applied Soft Computing, vol. 54, 2017, pp. 183-199.
[4] S. Xu, "Bayesian naive bayes classifiers to text classification",
Journal of Information Science, vol. 44, no. 1, 2018, pp. 48-59.
[5] J. Lilleberg, Y. Zhu, and Y. Zhang, "Support vector machines and
word2vec for text classification with semantic features", in 14th
IEEE International Conference on Cognitive Informatics & Cognitive
Computing, 2015, pp. 136-140.
[6] H. Xia, M. Tao, and Y. Wang, "Sentiment text classification of
customers reviews on the web based on SVM", in Sixth International
Conference on Natural Computation, 2010, pp. 3633-3637.
[7] S. Wang and C.D. Manning, "Baselines and bigrams: Simple, good
sentiment and topic classification", in Proceedings of the 50th
Annual Meeting of the Association for Computational Linguistics:
Short Papers, vol. 2, 2012, pp. 90-94.
[8] Y. Bengio, R. Ducharme, P. Vincent, et al., "A neural
probabilistic language model", Journal of Machine Learning Research,
vol. 3, 2003, pp. 1137-1155.
[9] G.E. Hinton, "Learning distributed representations of concepts",
in Proceedings of the Eighth Annual Conference of the Cognitive
Science Society, 1986, pp. 1-12.
[10] G.E. Hinton and R.R. Salakhutdinov, "Reducing the dimensionality
of data with neural networks", Science, vol. 313, no. 5786, 2006,
pp. 504-507.
[11] Y. LeCun, Y. Bengio, and G.E. Hinton, "Deep learning", Nature,
vol. 521, 2015, pp. 436-444.
[12] Y. Kim, "Convolutional neural networks for sentence
classification", in Proceedings of the 2014 Conference on Empirical
Methods in Natural Language Processing, 2014, pp. 1746-1751.
[13] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, "A
convolutional neural network for modelling sentences", Association
for Computational Linguistics, vol. 1, 2014, pp. 655-665.
[14] T. Zhang, C. Li, N. Cao, et al., "Text feature extraction and
classification based on convolutional neural network (CNN)", in Data
Science, 2017, pp. 472-485.
[15] A. Conneau, H. Schwenk, L. Barrault, et al., "Very deep
convolutional networks for text classification", Association for
Computational Linguistics, vol. 1, 2017, pp. 1107-1116.
[16] S. Lai, L. Xu, K. Liu, and J. Zhao, "Recurrent convolutional
neural networks for text classification", in Proceedings of the
Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015,
pp. 2267-2273.
[17] J.Y. Lee and F. Dernoncourt, "Sequential short-text
classification with recurrent and convolutional neural networks", in
The 2016 Conference of the North American Chapter of the Association
for Computational Linguistics: Human Language Technologies, 2016,
pp. 515-520.
1435
... In [17], a model that combines CNNs networks with SVM classifier was applied to the classification of simple sentences expressing either positive and negative feelings. In the approach presented in [17], first, a word embedding method (such as Word2Wec) is used to obtain the vector representation of each word in the text corpus. ...
... In [17], a model that combines CNNs networks with SVM classifier was applied to the classification of simple sentences expressing either positive and negative feelings. In the approach presented in [17], first, a word embedding method (such as Word2Wec) is used to obtain the vector representation of each word in the text corpus. Subsequently, each text in the corpus is represented as a sequence of vectors, each corresponding to one of the text's words. ...
... The results obtained from the fully-connected layer are used as training data for the SVM classifier. The results of experiments presented in [17] suggest that this approach can provide better classification results than separate CNNs and SVM classifiers. ...
... In [17], a model that combines CNNs networks with SVM classifier was applied to the classification of simple sentences expressing either positive and negative feelings. In the approach presented in [17], first, a word embedding method (such as Word2Wec) is used to obtain the vector representation of each word in the text corpus. ...
... In [17], a model that combines CNNs networks with SVM classifier was applied to the classification of simple sentences expressing either positive and negative feelings. In the approach presented in [17], first, a word embedding method (such as Word2Wec) is used to obtain the vector representation of each word in the text corpus. Subsequently, each text in the corpus is represented as a sequence of vectors, each corresponding to one of the text's words. ...
... The results obtained from the fully-connected layer are used as training data for the SVM classifier. The results of experiments presented in [17] suggest that this approach can provide better classification results than separate CNNs and SVM classifiers. ...
Preprint
Full-text available
Short text classification is an important task widely used in many applications. However, few works investigated applying Spiking Neural Networks (SNNs) for text classification. To the best of our knowledge, there were no attempts to apply SNNs as classifiers of short texts. In this paper, we offer a comparative study of short text classification using SNNs. To this end, we selected and evaluated three popular implementations of SNNs: evolving Spiking Neural Networks (eSNN), the NeuCube implementation of SNNs, as well as the SNNTorch implementation that is available as the Python language package. In order to test the selected classifiers, we selected and preprocessed three publicly available datasets: 20-newsgroup dataset as well as imbalanced and balanced PubMed datasets of medical publications. The preprocessed 20-newsgroup dataset consists of first 100 words of each text, while for the classification of PubMed datasets we use only a title of each publication. As a text representation of documents, we applied the TF-IDF encoding. In this work, we also offered a new encoding method for eSNN networks, that can effectively encode values of input features having non-uniform distributions. The designed method works especially effectively with the TF-IDF encoding. The results of our study suggest that SNN networks may provide the classification quality is some cases matching or outperforming other types of classifiers.
... In [14], a new method called SVMCNN is presented which is a combination of two methods of convolutional neural networks and a support vector machine. CNN can capture properties between consecutive words through convolution processing, and SVM can provide the optimal solution to the information available in finite instances. ...
... A lightweight alternative method to unsupervised machine learning is to utilize local text features and statistical information, such as term frequencies and co-occurrences. This approach involves analyzing the document to identify common patterns and associations between terms without requiring pre-labeled training data [12]. By relying on statistical information and local text features, this method can provide a scalable and computationally efficient solution for keyword extraction tasks. ...
Article
Full-text available
Keyword extraction is a critical task that enables various applications, including text classification, sentiment analysis, and information retrieval. However, the lack of a suitable dataset for semantic analysis of keyword extraction remains a serious problem that hinders progress in this field. Although some datasets exist for this task, they may not be representative, diverse, or of high quality, leading to suboptimal performance, inaccurate results, and reduced efficiency. To address this issue, we conducted a study to identify a suitable dataset for keyword extraction based on three key factors: dataset structure, complexity, and quality. The structure of a dataset should contain real-time data that is easily accessible and readable. The complexity should also reflect the diversity of sentences and their distribution in real-world scenarios. Finally, the quality of the dataset is a crucial factor in selecting a suitable dataset for keyword extraction. The quality depends on its accuracy, consistency, and completeness. The dataset should be annotated with high-quality labels that accurately reflect the keywords in the text. It should also be complete, with enough examples to accurately evaluate the performance of keyword extraction algorithms. Consistency in annotations is also essential, ensuring that the dataset is reliable and useful for further research.
... In this regard, many scholars started to find new methods to solve this problem. Fan et al. [6] proposed a co-occurrence-based short text feature expansion method, Hu et al. [7] proposed a new method by combining CNN with support vector machines, and Zhihao et al. [8] started to use graph convolutional networks for headline classification. All of these methods improve the generalization ability of machine learning in face of new situations to some extent and also relatively improve the classification accuracy, but none of them essentially solve the problem of inefficient news headline classification. ...
Article
Full-text available
Existing work generally classifies news headlines as a matter of short text classification. However, due to the strong domain nature and limited text length of news headlines, their classification results are usually determined by several specific keywords, which makes the traditional short text classification method ineffective. In this paper, we propose a new method to identify keywords in news headlines and expand their features from sentence level and word level respectively, and finally use convolutional neural networks (CNN) to extract and classify their features. The proposed model was tested on the Sogou News Corpus dataset and achieved 93.42 $$\%$$ % accuracy.
... Moreover, many traditional algorithms of machine learning affect the generalization ability of short text. Due to this fact, Hu et al. [22] presented a CNN with a support vector machine convolutional neural network (SVMCNN) to improve short text classification. Meanwhile, the researchers trained their model on TensorFlow by utilizing the Twitter social platform. ...
Article
Full-text available
In natural language processing, short-text semantic similarity (STSS) is a very prominent field. It has a significant impact on a broad range of applications, such as question–answering systems, information retrieval, entity recognition, text analytics, sentiment classification, and so on. Despite their widespread use, many traditional machine learning techniques are incapable of identifying the semantics of short text. Traditional methods are based on ontologies, knowledge graphs, and corpus-based methods. The performance of these methods is influenced by the manually defined rules. Applying such measures is still difficult, since it poses various semantic challenges. In the existing literature, the most recent advances in short-text semantic similarity (STSS) research are not included. This study presents the systematic literature review (SLR) with the aim to (i) explain short sentence barriers in semantic similarity, (ii) identify the most appropriate standard deep learning techniques for the semantics of a short text, (iii) classify the language models that produce high-level contextual semantic information, (iv) determine appropriate datasets that are only intended for short text, and (v) highlight research challenges and proposed future improvements. To the best of our knowledge, we have provided an in-depth, comprehensive, and systematic review of short text semantic similarity trends, which will assist the researchers to reuse and enhance the semantic information.
... Kim et al. showed that the use of CNN in short text classifications, such as movie reviews increase the accuracy rate [40]. Hu et al. presented a new method combine the CNN and SVM, called SVMCNN for used in short text classification [41]. ...
Preprint
Full-text available
The novel coronavirus disease 2019 (COVID-19) has had a serious impact on everyone's health and lives. Consequently, people started to post a lot on Twitter about this disease. The Twitter data can be used in many fields such as health, education, economy and politics. However, analyzing a large amount of text data is not a trivial task and requires advanced machine learning methods. Although there are some proposed methods to overcome this problem, it is considered that a comprehensive analysis that includes most of the most advanced machine learning methods is highly valued. In this study, it has been analyzed whether the tweets are related to COVID-19, and sentiments of Twitter users with the use of machine and deep learning based text mining techniques. A total of sixteen different machine learning based methods have been used including supervised and ensemble learning algorithms. In addition, the architecture designs based on deep learning techniques Convolutional Neural Network (CNN), Recurrent Neural Networks (RNNs), and CNN+RNNs hyrid models have been used. In sentiment analysis task, the highest predictive performance has been obtained by Bidirectional Gated Recurrent Unit (BiGRU) architecture design in conjunction with pretrained GloVe word embedding method, with a classification accuracy of 95.35%. In topic classification task, the highest predictive performance has been obtained by Stacking ensemble method using together unigram and bigram word based Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction technique, with a classification accuracy of 91.49%.
... The individual sentences of the same note are processed in parallel by four convolutional and max pool layers (Conv Max Pool) of the region (k) size 1, 2, 3, and 4 and 50 filters, each of which generates sentence-level feature representations (sr i ). We use convolutional neural networks (CNN) as they are easier to parallelise, faster to train than recurrent neural networks, and effective for short sentences (Hu et al., 2018;Wang et al., 2021) (average sentence length in the CEASE corpus is 15). Word vectors, at the sentence level, are fetched from the pre-trained GloVe (Pennington et al., 2014) embedding. ...
Conference Paper
The World Health Organization has emphasised the need for stepping up suicide prevention efforts to meet the United Nations' Sustainable Development Goal target of 2030 (Goal 3: Good health and well-being). We address the challenging task of personality subtyping from suicide notes. Most research on personality subtyping has relied on statistical analysis and feature engineering. Moreover, state-of-the-art transformer models have received relatively little attention in the automated personality subtyping problem. We develop a novel EMotion-assisted PERSONAlity Detection Framework (EM-PERSONA). We annotate the benchmark CEASE-v2.0 suicide notes dataset with personality traits across four dichotomies: Introversion (I)-Extraversion (E), Intuition (N)-Sensing (S), Thinking (T)-Feeling (F), Judging (J)-Perceiving (P). Our proposed method outperforms all baselines on comprehensive evaluation using multiple state-of-the-art systems. Across the four dichotomies, EM-PERSONA improved accuracy by 2.04%, 3.69%, 4.52%, and 3.42%, respectively, over the highest-performing single-task systems.
... Wang et al. (2017) successfully categorize labeled short text documents in Chinese using kernel SVM as the classifier, and their results show that the SVM method outperforms other conventional classification methods such as k-Nearest Neighbor and Decision Tree. In sum, SVM has been widely used in the short text classification of social media sites (Yin et al., 2015; Hu et al., 2018). ...
Article
Customers are increasingly joining online brand communities. However, evidence shows that there are nuances between different user segments, and only a small group of users are active. Thus, one key concern marketers face is identifying and targeting specific segments and decreasing user churn rates in an online environment. To this end, this study aims to propose a UGC-based segmentation of online brand community users, identify the characteristics of each segment, and consequently reduce online brand community users' churn rate. We used Python to obtain users' post data from a well-known online brand community in China between July 2012 and December 2019, resulting in 912,452 posts and 20,493 users. We then used text mining and clustering methods to segment the users and compare the differences between the segments. Three groups emerged: information-oriented users, entertainment-oriented users, and multi-motivation users. Our results imply that entertainment-oriented users were the most active, yet multi-directional users have the lowest probability of churn, with a churn rate only 0.607 times that of users who focus either on information or entertainment. Implications for marketing and future research opportunities are discussed.
Article
Measuring and analyzing user perceptions and behaviors in order to make user-centric decisions was a topic of research long before the invention of social media platforms. In the past, the main approaches for measuring user perceptions were conducting surveys, interviewing experts and collecting data through questionnaires. The main challenge with these methods was that the extracted perceptions represented only a small group of people and not the whole public. This challenge was resolved when social media platforms like Twitter and Facebook were introduced and users started to share their perceptions about any product, topic, or event on these platforms. As these platforms became popular, the amount of data shared on them started to grow exponentially, and this growth led to another challenge: analyzing this huge amount of data to understand or measure user perceptions. Computational techniques are used to address this challenge. This paper briefly describes artificial intelligence (AI) techniques, which are one type of computational technique available for analyzing social media data. Along with brief information about the AI techniques, this paper also presents state-of-the-art studies that use AI techniques to measure user perceptions from social media data.
Article
The Naive Bayes (NB) learning algorithm is simple and effective in many domains including text classification. However, its performance depends on the accuracy of the estimated conditional probability terms. Sometimes these terms are hard to estimate accurately, especially when the training data is scarce. This work transforms the probability estimation problem into an optimization problem, and exploits three metaheuristic approaches to solve it. These approaches are Genetic Algorithms (GA), Simulated Annealing (SA), and Differential Evolution (DE). We also propose a novel DE algorithm that uses multi-parent mutation and crossover operations (MPDE) and three different methods to select the final solution. We create an initial population by manipulating the solution generated by a method used for fine tuning the NB. We evaluate the proposed methods by using their resulting solutions to build NB classifiers and compare their results with the results obtained from classical NB and the Fine-Tuning Naïve Bayesian (FTNB) algorithm, using 53 UCI benchmark data sets. We name the obtained classifiers NBGA, NBSA, NBDE, and NB-MPDE respectively. We also evaluate the performance of NB-MPDE for text classification using 18 text-classification data sets, and compare its results with the results obtained from FTNB, BNB, and MNB. The experimental results show that DE in general, and the proposed MPDE algorithm in particular, are more suitable for fine-tuning NB than all other methods, including the other two metaheuristic methods (GA and SA). They also indicate that NB-MPDE outperforms classical NB, FTNB, NBDE, NBGA, NBSA, MNB, and BNB.
Conference Paper
Recent approaches based on artificial neural networks (ANNs) have shown promising results for short-text classification. However, many short texts occur in sequences (e.g., sentences in a document or utterances in a dialog), and most existing ANN-based systems do not leverage the preceding short texts when classifying a subsequent one. In this work, we present a model based on recurrent neural networks and convolutional neural networks that incorporates the preceding short texts. Our model achieves state-of-the-art results on three different datasets for dialog act prediction.
Conference Paper
With the rapid development of the Internet, a growing number of Internet users post subjective comments on BBS, blogs and shopping websites. These comments contain the commenters' attitudes, emotions, views and other information. Using this information sensibly can help in understanding public opinion and responding to it in a timely manner, help dealers improve the quality and service of their products, and help consumers learn about merchandise. This paper mainly discusses using a convolutional neural network (CNN) for text feature extraction, and describes the concrete realization. The extracted features are then combined with other text classifiers to perform classification. The experimental results show the effectiveness of the proposed method.
Conference Paper
The dominant approaches for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks. However, these architectures are rather shallow in comparison to the deep convolutional networks which have pushed the state-of-the-art in computer vision. We present a new architecture (VDCNN) for text processing which operates directly at the character level and uses only small convolutions and pooling operations. We are able to show that the performance of this model increases with the depth: using up to 29 convolutional layers, we report improvements over the state-of-the-art on several public text classification tasks. To the best of our knowledge, this is the first time that very deep convolutional nets have been applied to text processing.
Article
Text classification is the task of assigning predefined categories to natural language documents, and it can provide conceptual views of document collections. The Naïve Bayes (NB) classifier is a family of simple probabilistic classifiers based on a common assumption that all features are independent of each other, given the category variable, and it is often used as the baseline in text classification. However, classical NB classifiers with multinomial, Bernoulli and Gaussian event models are not fully Bayesian. This study proposes three Bayesian counterparts, and it turns out that the classical NB classifier with the Bernoulli event model is equivalent to its Bayesian counterpart. Finally, experimental results on the 20 newsgroups and WebKB data sets show that the performance of the Bayesian NB classifier with the multinomial event model is similar to that of its classical counterpart, but the Bayesian NB classifier with the Gaussian event model is clearly better than its classical counterpart.
Article
We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We first show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static word vectors. The CNN models discussed herein improve upon the state-of-the-art on 4 out of 7 tasks, which include sentiment analysis and question classification.
Conference Paper
Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used and task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
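As an illustration of the "NB log-count ratios as feature values" idea in point (iii), here is a minimal NumPy sketch under stated assumptions. The abstract does not give the formula. The smoothing constant `alpha`, the L1 normalisation of the class counts, the toy binary document-term matrix, and the function name `log_count_ratio` are all choices made for this example, and the SVM that would be trained on the rescaled features is omitted.

```python
import numpy as np

def log_count_ratio(X, y, alpha=1.0):
    """Per-feature log-count ratio between the positive and negative class.

    X:     (n_docs, n_features) non-negative feature counts or indicators
    y:     (n_docs,) binary labels, 1 = positive, 0 = negative
    alpha: additive smoothing constant (assumed value)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = alpha + X[y == 1].sum(axis=0)  # smoothed counts in positive docs
    q = alpha + X[y == 0].sum(axis=0)  # smoothed counts in negative docs
    # Normalise each count vector to sum to 1, then take the log of the ratio.
    return np.log((p / p.sum()) / (q / q.sum()))

# Toy document-term matrix: 4 documents, 3 vocabulary terms.
X = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
])
y = np.array([1, 1, 0, 0])

r = log_count_ratio(X, y)
# Rescale each feature by its ratio; a linear SVM would be trained on X_nb.
X_nb = X * r
print(r)
```

In this toy data the first term appears only in positive documents and gets a positive ratio, the third appears only in negative documents and gets a negative ratio, and the second is equally frequent in both classes and gets a ratio of zero.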