Social Media-Based Surveillance Systems for Healthcare using Machine Learning

November 2022
European Journal of Engineering and Technology Research 7(6):21-28

DOI:10.24018/ejeng.2022.7.6.2914

License
CC BY 4.0

Authors:

Chetanpal Singh

Central Queensland University

Rahul Thakkar

European Journal of Engineering and Technology Research

ISSN: 2736-576X

DOI: http://dx.doi.org/10.24018/ejeng.2022.7.6.2914

Vol 7 | Issue 6 | November 2022

Abstract - Real-time surveillance has become one of the most active research areas in health informatics, and it has prompted many public health informatics initiatives. Such surveillance uses information from social media to predict disease outbreaks and to monitor diseases. The recent availability of social media data, especially from Twitter, has given researchers a basis for real-time syndromic surveillance, allowing quick analysis and conclusions when investigating a disease outbreak. This paper reviews recent machine learning and text classification work used by surveillance systems that draw on social media data in healthcare. It also discusses the limitations and challenges of these systems and the future directions that can be considered in this domain.

Keywords - Disease Prediction, Health Prediction, Instagram, Machine Learning, Outbreak, Social Media, Surveillance Systems, Twitter.

I. INTRODUCTION

Health information available on the internet has been seen as an opportunity to enhance public health surveillance. Health surveillance has traditionally depended on established systems of mandatory and voluntary reporting of infectious diseases by doctors and laboratories [1]. Social media now gives direct access to data that can support surveillance epidemiology, which monitors public health threats such as new diseases and provides early warning of pandemics. However, whether data from social media and the internet actually helps in analyzing potential public health threats remains an open question. Public health surveillance has wide-reaching applications in this century, but utilizing these surveillance systems for infectious disease epidemiology brings challenges, such as the specific resources needed, the technical essentials, and acceptance by public health practitioners and policymakers [2].

Many machine learning algorithms, such as deep neural networks, Naive Bayes, and Multinomial Naive Bayes, have been proposed and used for epidemic prediction and classification in surveillance systems in the health informatics domain [3].

Submitted on October 10, 2022.

Published on November 07, 2022.

C. Singh, VIT, Australia.

(e-mail: Chetanpal.singh vit.edu.au)

A. Research Objective

This paper deals with the latest trends in social media-based surveillance in healthcare. It also gives an overview of the machine learning algorithms used for monitoring the data [3]. The research questions are listed below.

RQ1: Which types of machine learning have become popular among the authors of research papers that develop social media-based surveillance systems in the health sector?

RQ2: Which social media data sources are most commonly used for surveillance in the healthcare domain?

RQ3: What are the applications of a social media-based surveillance system in the field of health informatics?

RQ4: What challenges do syndromic surveillance systems face when they include data from social media?

B. Research Motivation

This paper makes the following contributions that were not present in previous papers. It describes the article selection queries run against different digital library databases to choose relevant articles [4]. It gives an overview of the popular machine learning classification algorithms used in social media-based surveillance systems in healthcare. It also presents a statistical analysis of the social media platforms and health topics studied by the selected articles [5].

C. Research Gap

Social media data has a major role to play in healthcare decision-making, even though social media is widely used in other sectors. New approaches and technologies are needed to reinforce the capability of traditional syndromic surveillance systems, the early detection of disease, and immediate public health response. Many review papers have examined the different surveillance systems that utilize social media data, covering data sources, technologies, algorithms, applications, and evaluations. According to a recent review of social media-based surveillance systems in health informatics, researchers have not been much impressed with the development so far [2]. This paper gives a complete analysis of machine learning technology and its approaches in this field, specifically in the recent past, and also covers its challenges and future directions.

Dr. Chetanpal Singh, Dr. Rahul Thakkar, and Jatinder Warraich

R. Thakkar, VIT, Australia.

(e-mail: rahul.thakkar vit.edu.au)

J. Warraich, Holmesglen, Australia.

(e-mail: Jatinder.warraich holmesglen.edu.au).

II. LITERATURE REVIEW

A. Machine Learning Methods Utilized by Surveillance Systems to Process Social Media Data

Machine learning has grown in popularity in recent years for detecting patterns in images and raw data. According to [6], progress in machine learning offers epidemiologists the opportunity to mine a broad set of digital data. [7] studied several supervised machine learning algorithms for detecting personal health experiences, together with a "deep gramulator" approach that improved decisions when applied to an independent test set. [8] provided a detailed analysis of combining natural language processing and machine learning with social media platforms to analyze huge datasets for population-level mental health research. Some architectures remain popular across the methodological variations of machine learning.

Unlike a supervised classification algorithm, an unsupervised algorithm does not require labeled datasets to predict its output. This makes unsupervised classification a popular alternative for analyzing text; however, it is difficult for such methods to achieve the same accuracy as supervised ones. [9] observed this when classifying tweets with both supervised and unsupervised methods. [9] also discussed topic modeling, presented as one of the best supervised techniques, which gives control over topic contents in contrast to older classifiers, particularly on a naturally noisy media channel.

1) Multinomial naive bayes

Multinomial Naive Bayes is one of the most popular supervised classification approaches for Twitter content. [10] built a real-time allergy surveillance system that classifies tweets as either positive or negative. A tweet is positive when it indicates that the author or another person has allergy symptoms, and negative when it mentions things such as news or advertisements for general allergy awareness. The author concluded that the Naive Bayes Multinomial (NBM) model gave the best text classification performance as measured by F-measure. [11], on the other hand, used machine learning methods to classify tweets as personal or news-related.

The author then classified the personal tweets into two categories, negative or neutral. NBM provided the best result, outperforming the other two techniques used. Classifiers such as Naive Bayes and SVM did not produce results as satisfactory as the Naive Bayes Multinomial model [12].

Fig. 1. Architecture of MNB [10], a) Naive bayes supervised approach; b) Naïve bayes with F features.

TABLE I: NAÏVE BAYES APPROACH OUTCOME

Methods used | Application | Outcome | Reference
Multinomial Naive Bayes | Real-time allergy surveillance system for classification of tweets as either positive or negative | Naive Bayes Multinomial model with an F measure is the best solution for the text classification performance | [10]
Multinomial Naive Bayes, Naive Bayes and SVM | Classification of tweets based on personal, or news | NBM has provided the best result | [11], [12]
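
To make the Multinomial Naive Bayes approach above concrete, the following is a minimal sketch written with the Python standard library only. It is not the system built in [10]: the toy tweets, the labels, the whitespace tokenizer, and the function names `train_mnb` and `predict_mnb` are all assumptions made for illustration, and add-one (Laplace) smoothing is one common choice rather than a detail taken from the source.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def train_mnb(labeled_tweets):
    """Collect per-class document counts, per-class word counts and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_tweets:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_mnb(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # words never seen in training are ignored
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: "positive" = someone has symptoms, "negative" = news/adverts.
TOY_TWEETS = [
    ("my allergy is killing me sneezing all day", "positive"),
    ("pollen allergy again my eyes are so itchy", "positive"),
    ("new allergy awareness campaign launched today", "negative"),
    ("pharmacy advert allergy relief now on sale", "negative"),
]

mnb_model = train_mnb(TOY_TWEETS)
print(predict_mnb(mnb_model, "sneezing and itchy eyes all day"))
```

A real surveillance pipeline would need far more training data and a tokenizer that handles hashtags, mentions, and punctuation; the sketch only shows how word counts per class drive the decision.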

2) Support vector machine

The performance of a classification algorithm depends heavily on its input parameters and application; nevertheless, SVM is often the best suited for classification tasks. Using an SVM classification model, [13] successfully classified microblog posts as sick or non-sick. The author also highlighted that the time SVM needed for the classification task was not affected as the number of microblog posts increased, whereas the time KNN needed to complete the task increased. According to [13], SVM was the best classification method when compared with the other machine learning techniques. SVM plays a crucial role in classifying social media data on a range of health issues. According to [14], the SVM classifier was best suited in terms of accuracy in predicting the class of tweets. Likewise, a study by [15] found that this algorithm can reach 90% accuracy when tweets from social media are separated into epidemiological and non-epidemiological.

Fig. 2. Support Vector Machine [13].

TABLE II: SVM APPROACH OUTCOME

Methods used | Application | Outcome | Reference
SVM classification model | Classification of sick microblog and non-sick microblog posts | SVM is best suited | [13]
SVM classification model | Accurate prediction of the class of tweets | SVM classifier is best suited | [14]
SVM classification model | Segregation of tweets from social media as epidemiological and non-epidemiological | SVM can reach 90% accuracy | [15]
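
As a rough illustration of the sick versus non-sick classification described above, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss. This is a deliberately simplified stand-in, not the solver, kernel, or feature set used in [13]-[15]; the example posts, the hyperparameters, and the names `train_linear_svm` and `predict_sick` are hypothetical.

```python
from collections import defaultdict


def features(text):
    """Binary bag-of-words: the set of lowercase tokens in a post."""
    return set(text.lower().split())


def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss; labels are +1 or -1."""
    w = defaultdict(float)
    for _ in range(epochs):
        for text, y in examples:
            x = features(text)
            margin = y * sum(w[t] for t in x)
            for t in list(w):
                w[t] *= 1 - lr * lam  # regularization step shrinks every weight
            if margin < 1:  # example is misclassified or inside the margin
                for t in x:
                    w[t] += lr * y
    return w


def predict_sick(w, text):
    """Return +1 (sick) for a positive score, otherwise -1 (non-sick)."""
    score = sum(w.get(t, 0.0) for t in features(text))
    return 1 if score > 0 else -1


# Hypothetical microblog posts: +1 = sick, -1 = non-sick.
POSTS = [
    ("i have a fever and a bad cough", 1),
    ("stuck in bed with fever", 1),
    ("great movie tonight", -1),
    ("lunch with friends then a movie", -1),
]

svm_weights = train_linear_svm(POSTS)
print(predict_sick(svm_weights, "fever and cough"))
```

Words that occur only in sick posts end up with positive weights and words that occur only in non-sick posts with negative weights, which is the separating behavior the SVM studies rely on at much larger scale.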

3) Deep neural network

Convolutional deep neural networks play a crucial role in classifying health-related text. [16] combined different types of DNN, namely a convolutional neural network (CNN) and bidirectional long short-term memory (BLSTM), with machine learning approaches for a measles-related tweet classification task, and the researchers pointed out that the CNN produced remarkable results.

Fig. 3. Deep Neural Network [16].

TABLE III: CNN BLSTM APPROACH OUTCOME

Methods used | Application | Outcome | Reference
Convolutional neural network (CNN), bidirectional long short term memory (BLSTM) | Classification of measles-related tweet classification tasks | CNN has provided remarkable result | [16]

4) Decision tree

The decision tree classifier has performed well in predicting positive and negative tweets about personal health experiences [17]. [18] also used a decision tree classifier to differentiate tweets about swine flu. [19] achieved only average results with a decision tree classifier when classifying personal health experience tweets.

5) Logistic regression

Logistic regression is another popular choice among classification algorithms for data classification tasks. In a study by [20], logistic regression showed a better F1 measure and recall than SVM when classifying tweets as relevant or irrelevant to asthma. The maximum entropy classifier is also widely used for text classification [21]. [17] also used a logistic regression classifier. Moreover, illness tweets were monitored with a maximum entropy classifier [22], and another tweet classification study used maximum entropy as well [23].

Fig. 4. Decision Tree Algorithm [17].

TABLE IV: DECISION TREE CLASSIFIER APPROACH OUTCOME

Methods used | Application | Outcome | Reference
Decision tree classifier | Predicting positive as well as negative tweets surrounding the personal health experience | Decision Tree gives good performance | [17]
Decision tree classifier | To differentiate tweets surrounding the swine flu | Good performance by decision tree | [18]
Decision tree classifier | For classifying the personal health experience tweets | Decision Tree gives average result | [19]

Fig. 5. Logistic Regression [20].

TABLE V: LOGISTIC REGRESSION APPROACH OUTCOME

Methods used | Application | Outcome | Reference
Logistic regression | Text classification |  | [17]
Logistic regression | For classification of relevant and irrelevant tweets regarding asthma | Logistic regression showed better F1 measure and recall as compared to SVM | [20]
Maximum entropy classifier | Text classification |  | [21], [23]
Maximum entropy classifier | Monitoring illness tweets |  | [22]
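
Several of the comparisons in this review are stated in terms of precision, recall, and the F1 measure (for example the logistic regression versus SVM result in [20]). The sketch below shows how these three scores are computed from true and predicted labels. The label names and the sample data are hypothetical, and the function name `precision_recall_f1` is chosen only for this illustration.

```python
def precision_recall_f1(y_true, y_pred, positive="relevant"):
    """Compute precision, recall and F1 for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold labels and classifier output for five asthma-related tweets.
gold = ["relevant", "relevant", "relevant", "irrelevant", "irrelevant"]
predicted = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]

scores = precision_recall_f1(gold, predicted)
print(scores)
```

Because F1 combines precision and recall, a classifier can rank higher on F1 than a competitor that has better raw accuracy, which is why the reviewed studies often report it separately.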

6) Naive bayes

[14] used SVM and Naive Bayes to classify datasets of tweets about mosquito-borne disease, and the tweets considered were further classified into three classes (symptoms, fear, and prevention) with the same classifiers. [18] classified tweets to separate swine flu-related text from the noise of irrelevant tweets, using various machine learning techniques such as decision trees and random forests.

Fig. 6. Naive Bayes [24].

TABLE VI: NAÏVE BAYES AND SVM APPROACH OUTCOME

Methods used | Application | Outcome | Reference
SVM and Naive Bayes | Classification of data sets into mosquito-borne | Further classification of tweets was possible | [14]
Naive Bayes, SVM | To differentiate swine flu-related text from the noise of all the tweets | Naive Bayes and SVM have given the best outcome with a measure of 0.77 as compared to decision tree and random forest | [18]
Naive Bayes | For text classification | Naïve Bayes gives average performance compared to other classifiers | [24]
Naive Bayes | Dengue suspected tweet was marked as irrelevant or relevant | Naive Bayes classifier gave the best performance | [23]

The author took into account all the swine flu-related words and found that Naive Bayes and SVM gave the best outcome, with a measure of 0.77. Naive Bayes has become a popular choice for text classification, although it shows average performance compared with other classifiers [24]. The Naive Bayes classifier gave the best performance when dengue-suspected tweets were marked as relevant or irrelevant; for this task, bigrams, trigrams, emojis, and location information were also considered [23].
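
The bigram and trigram features mentioned for the dengue classifier can be extracted from a tokenized tweet as sketched below. The helper name `ngrams` and the sample tweet are assumptions for illustration; how [23] actually tokenized tweets or encoded emojis and location is not stated in this review.

```python
def ngrams(tokens, n):
    """Return all contiguous runs of n tokens as tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical tweet, tokenized on whitespace.
tokens = "i think i have dengue".split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```

N-grams let a classifier distinguish "have dengue" from "dengue awareness", which single-word features cannot do.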

7) Random forest

The random forest approach has been used for social media text classification alongside conventional machine learning approaches. [25] compared it experimentally with the Naive Bayes classifier and found that random forest performed far better than the Naive Bayes method. Other machine learning approaches to text mining, such as clustering and k-means, have also been tried [13], [24]. In these, tweets are grouped on the basis of similar words, and tweets can also be differentiated by a similarity measure.

Fig. 7. Random Forest [25].

TABLE VII: RANDOM FOREST, NAÏVE BAYES, RANDOM FOREST AND K MEANS APPROACH OUTCOME

Methods used | Application | Outcome | Reference
Random Forest Classifier, Naïve Bayes | Social media text classification | Random Forest gave better performance | [25]
Random Forest Classifier, clustering, k means | Text mining for differentiating the tweets | Random Forest gave better performance | [13] [24]

8) K nearest neighbor

[12] used KNN alongside Naive Bayes, SVM, and Naive Bayes Multinomial to identify and monitor messages that report and discuss various types of allergies. The author concluded that k-NN had better precision than the other approaches in identifying whether a tweet reports an actual allergy incident or is an awareness tweet.

Fig. 8. K-nearest Neighbor [12].

TABLE VIII: KNN WITH NAÏVE BAYES, SVM AND NAÏVE BAYES APPROACH OUTCOME

Methods used | Application | Outcome | Reference
KNN with Naive Bayes, SVM, and Naive Bayes Multinomial (NBM) | To figure out and monitor messages reporting and discuss various types of allergies | K-NN has better precision for identifying tweets as actual incident of allergy or awareness tweets | [12]
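
A k-nearest-neighbor classifier of the kind described above can be sketched as follows. The similarity measure (Jaccard overlap of word sets), the value k = 3, the toy tweets, and the names `jaccard` and `knn_predict` are all assumptions for illustration; the features and distance function used in [12] are not given in this review.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between the word sets of two texts."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def knn_predict(train, text, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    ranked = sorted(train, key=lambda item: jaccard(item[0], text), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training tweets: actual allergy incidents versus awareness messages.
ALLERGY_TWEETS = [
    ("my allergy is acting up sneezing nonstop", "incident"),
    ("sneezing all morning allergy season hit me", "incident"),
    ("itchy eyes and sneezing from pollen allergy", "incident"),
    ("allergy awareness week starts monday", "awareness"),
    ("learn about allergy awareness at the clinic", "awareness"),
    ("free allergy awareness leaflet available", "awareness"),
]

print(knn_predict(ALLERGY_TWEETS, "sneezing nonstop this morning"))
```

Each prediction compares the new tweet against every stored tweet, so prediction time grows with the size of the training collection, consistent with the KNN timing behavior noted in the SVM discussion.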

B. Various Kinds of Social Media Data Sources for Data Collection

The number of social media users is increasing over time, and users share many different kinds of information, so researchers need to track the significant information in order to monitor social media activity relevant to public health. Social media also covers many topics other than public health.

Social media platforms are a rich source of information about public health. However, some researchers still doubt whether social media data can play a crucial role in detecting outbreaks and whether social media content can be analyzed for healthcare data [26]. Social media posts and online search behaviour can act as an important source of information about health outbreaks.

1) Twitter

Twitter is one of the most popular microblogging services; its many users post tweets, which unregistered users can also read. With more than 300 million monthly active users, it is regarded as a credible platform for identifying the incidence of diseases in the population. [27] used a set of seven search terms to gather more than 50,000 tweets, which were then studied and classified in an analysis of cardiac arrest and resuscitation. For disease surveillance, factors such as location, volume, time, and public perception are taken into account [28].

Recent work by [29] for various health organizations collected information from the social media platform and used it to understand the situation during an epidemic, which proved very helpful to those organizations.

A natural language processing approach was used to sort Ebola-related tweets into topics such as risk factors, prevention, education, disease trends, and compassion [30]. [31] studied the potential of the social media platform as a new way of sharing information. Twitter posts are trusted by millions of users and are considered a fast source for identifying the incidence of diseases in the population, so researchers consider it important to find an efficient method for examining and processing health-related tweets.

Moreover, [24] used an unsupervised method for partition vector representation. [10] studied allergy activity in a collection of tweets that mention allergies. Twitter has also been helpful in detecting health problems such as respiratory and gastrointestinal illness. Twitter data has been used to study a variety of public health issues, such as allergies, mosquito-borne diseases, and dengue [5].

2) Instagram

Instagram, another popular social media platform, was founded in 2010. It is a photo and video sharing platform, and since its founding the number of registered users has risen to 800 million [32]. Because it is a photo and video sharing platform, the data it holds can be a good source for disease surveillance, and it has given satisfactory results [32]. [33] studied Ebola-related posts on two social media platforms, Twitter and Instagram, and found that Instagram was the better of the two for communicating with and reaching people during a health crisis.

C. Application of Social Media-Based Surveillance System

This section discusses recent applications of popular surveillance systems in the health informatics domain. These include disease prediction, tracking misinformation, and global awareness.

Fig. 9. Applications of social media-based surveillance system [3].

[Fig. 9 diagram: application of social media-based surveillance system, with four branches: global awareness of the event; syndromic surveillance-based disease prediction; event-based surveillance and disease prediction; magnitude estimation of disease over time]

1) Global awareness of the event

Once an event has been detected, the surveillance system plays a crucial role in monitoring general public awareness and perception of the health event. Social media carries user-generated sentiment about the outbreak situation that reflects people's knowledge, attitudes, and perceptions [34]. Users on social media can share their sentiments, opinions, and responses during the outbreak.

2) Syndromic surveillance-based disease prediction

Syndromic surveillance is one of the best tools for predicting outbreaks for public health purposes using data gathered from different sources. The information acquired is used to minimize the population's exposure to the disease and to take proper preventive measures. In the past few years, information from social media has been widely used to study disease incidence and to identify outbreaks. Social media data helps public health officials detect an outbreak earlier than traditional means. Studies have shown that surveillance systems in the healthcare domain help to predict diseases of public health concern, with the surveillance data taking the form of self-reported symptoms. For early warning and outbreak detection, Twitter data has been used as a tool for predicting swine flu, tuberculosis, Ebola, and syphilis [35]-[38]. Another study examined the incidence of diseases such as dengue and typhoid fever [24].
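
One simple way to turn daily counts of symptom-related posts into an early-warning signal is to flag days whose count rises well above a recent baseline. The sketch below is only an illustration of that idea: the rule (mean plus two standard deviations over the previous seven days), the function name `flag_spikes`, and the counts are assumptions, not a method taken from the studies cited above.

```python
from statistics import mean, pstdev


def flag_spikes(daily_counts, window=7, z=2.0):
    """Return indices of days whose count exceeds baseline mean + z * std.

    The baseline for each day is the `window` days immediately before it.
    """
    flagged = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]
        threshold = mean(baseline) + z * pstdev(baseline)
        if daily_counts[day] > threshold:
            flagged.append(day)
    return flagged


# Hypothetical daily counts of tweets that self-report flu symptoms.
counts = [10, 12, 11, 9, 10, 11, 12, 40, 11]

print(flag_spikes(counts))
```

Note that the spike on day 7 inflates the baseline for day 8, so production systems usually use more robust baselines; this sketch ignores seasonality and reporting delays entirely.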

3) Event-based surveillance and disease prediction

Event-based surveillance is the rapid, organized capture of information about events that pose a serious risk to public health. The data can come from diverse internet sources, such as media reports, online discussion platforms, routine reporting systems, personal information, or even rumors. In the context of web forums, an event is defined as excessive news posting. The importance of an event is proportional to the total number of postings about the topic, so the effect of the event on topic diffusion can be determined from the total number of postings on that topic. A study by [39] analyzed the Zika virus epidemic using a Twitter corpus.
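
The statement that an event's importance is proportional to the number of postings about it can be sketched as a simple count-and-rank step. The topic labels, the sample postings, and the function name `rank_events` are hypothetical; assigning a topic to each posting is assumed to have been done already by some upstream classifier.

```python
from collections import Counter


def rank_events(postings):
    """Rank topics by posting count; also return each topic's share of all postings."""
    counts = Counter(topic for topic, _ in postings)
    total = sum(counts.values())
    return [(topic, n, n / total) for topic, n in counts.most_common()]


# Hypothetical (topic, text) pairs from a web forum.
postings = [
    ("zika", "new zika cases reported in the city"),
    ("zika", "is zika spreading near the coast"),
    ("flu", "flu shots available at the pharmacy"),
    ("zika", "travel warning issued over zika"),
]

ranking = rank_events(postings)
print(ranking)
```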

The use of social media-based public health intelligence monitoring to provide situational awareness of public health threats in support of surveillance activities has risen remarkably over the last 20 years [5]. [40] proposed a software system, DEFENDER, that detects potential health events by analyzing Twitter streams and then presents the detected events to users through a front-end user interface.

4) Magnitude estimation of disease over time

A surveillance system makes it easy to determine the magnitude of a health problem. Estimates of the future course of diseases can inform planning for the allocation of resources to treatment and prevention [2]. Moreover, analysis of surveillance data can play a significant role in establishing the level of disease over a certain period, and assessments can be made on that basis.

III. METHODS AND MATERIALS

The research aims to examine social media-based surveillance systems that use machine learning techniques to forecast illness in real time or near real time. The article selection criteria were set to include articles published between 2010 and 2018.

To compile a thorough bibliography of publications on social media-based surveillance systems in healthcare, the following scientific databases were searched: IEEE Xplore, Science Direct, PubMed, and ACM Digital Library.

The following query was formed using the advanced search of the IEEE Xplore database:

((("Abstract": surveillance) OR "Document Title": leadership OR "Abstract": outbreak))

After filters were applied, 656 items (Journals & Magazines and conferences) were found.

A. Describing and Running the Query in the ACM Digital Library System

The ACM Digital Library was searched in a similar way with the query:

(((outbreak OR surveillance) OR acmdlTitle(+surveillance) AND (health* OR illness)

This search returned 265 articles. In addition, we conducted an advanced search of the ScienceDirect database for the following terms in the title, abstract, and keywords:

(surveillance OR outbreak) AND (health* OR illness) AND “social media”

This search retrieved 75 articles.

Finally, PubMed, which accesses the MEDLINE database, was used to find papers with the query:

(surveillance [Title/Abstract]) OR (((epidemic [Title/Abstract]) AND ((health[Title/Abstract]) OR ((disease[Title/Abstract])) AND ((social media[Title/Abstract])) AND ((health[Title/Abstract]) OR ((disease[Title/Abstract]) AND ((social media[Title/Abstract]) AND

A total of 1240 articles were discovered for further research, out of which we acquired roughly 244 articles.
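
The reported search counts can be cross-checked with a short tally. Reading the "roughly 244" articles as the PubMed share is an assumption made here, not something the text states outright; it is, however, the value that makes the four database counts add up to the stated total of 1240.

```python
# Hits per database as reported in the text; the PubMed figure is an inferred reading.
hits_per_database = {
    "IEEE Xplore": 656,
    "ACM Digital Library": 265,
    "ScienceDirect": 75,
    "PubMed": 244,  # assumption: "roughly 244 articles" refers to PubMed
}

total_hits = sum(hits_per_database.values())
print(total_hits)
```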

In addition, results and statistics from Google Scholar were used to reflect recent trends related to our research. The search terms “surveillance system”, “social media”, “machine learning”, and “health informatics” were used. This graph clearly illustrates a rise over time in the number of articles in healthcare. The use of social media data and of various machine learning algorithms in surveillance systems is the central area of concern.

Each of the roughly 1240 articles found was separately vetted by each of this paper's authors on the basis of its abstract and title. We accepted an article for further inquiry if the abstract, the title, or both described social media or web-based monitoring; otherwise, it was dismissed. We then looked at the papers that included machine learning algorithms in their methodology.
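
The two screening steps above were carried out by the authors by hand. Purely as an illustration of the decision logic, the sketch below applies the same two checks with keyword matching; the keyword lists, the record fields (`title`, `abstract`, `methodology`), and the function name `screen` are all hypothetical stand-ins for human judgment.

```python
# Hypothetical keyword lists; the authors screened articles manually.
SCOPE_TERMS = ("social media", "web-based")
ML_TERMS = ("machine learning", "classifier", "naive bayes", "svm", "neural network")


def screen(article):
    """Apply the two-step screening: scope check first, then machine learning check."""
    header = (article["title"] + " " + article["abstract"]).lower()
    if not any(term in header for term in SCOPE_TERMS):
        return "dismissed"
    methods = article.get("methodology", "").lower()
    if any(term in methods for term in ML_TERMS):
        return "included"
    return "no ML method"


# Hypothetical article records.
in_scope = {
    "title": "Twitter flu tracking",
    "abstract": "We use social media posts",
    "methodology": "An SVM classifier",
}
out_of_scope = {
    "title": "Hospital records audit",
    "abstract": "Paper charts",
    "methodology": "svm",
}
no_ml = {
    "title": "Web-based monitoring of dengue",
    "abstract": "survey",
    "methodology": "manual counting",
}

print(screen(in_scope), screen(out_of_scope), screen(no_ml))
```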

IV. RESULTS AND DISCUSSION

Although surveillance systems based on social media data for detecting outbreaks or health events have improved the early identification of epidemics and related events, other researchers have questioned the effectiveness of these monitoring systems for the following reasons:

A. Privacy Issues

Privacy issues arise when data, including private datasets, are obtained from social media accounts for health purposes. Even though social media data are publicly available, individuals may not want their posts or data to be used for research [41]. Users' expectations regarding public data and privacy are therefore a significant factor.

B. Verification of the Data Set

An issue with gathering data from social media is that the data must be validated. Standardization, verification, and control issues may arise if unofficial data from social media are used [42]. The veracity of a vast quantity of diverse social media data was assessed by [43].

The dataset is an essential part of a prediction model. The following kinds of data have a significant impact on the outcomes of prediction models:

i. Historical data

ii. Training data

iii. Testing data

All of the datasets mentioned above are used in prediction models. A considerable quantity of training data is necessary to train a forecasting model before its predictions can be tested.
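One common way to derive training and testing data from historical data is a chronological split, so that the model is always tested on posts newer than those it was trained on. The sketch below assumes the historical data are (date, text) pairs; the 80/20 ratio and the function name `chronological_split` are illustrative choices, not values taken from the surveyed studies.

```python
from datetime import date

def chronological_split(records, train_fraction=0.8):
    """Split dated records into an earlier training set and a later testing set."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    ordered = sorted(records, key=lambda record: record[0])  # oldest first
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical historical data: (posting date, post text).
historical = [
    (date(2018, 3, 1), "fever and chills all week"),
    (date(2018, 1, 5), "down with the flu"),
    (date(2018, 2, 10), "coughing nonstop"),
    (date(2018, 4, 20), "flu season is over"),
    (date(2018, 5, 2), "feeling sick again"),
]
training, testing = chronological_split(historical)
print(len(training), len(testing))
```

Splitting by time rather than at random avoids testing a forecasting model on posts that predate its training data.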

C. Noise

Noise is one of the most significant issues encountered during data collection. The information gathered through social networking sites might include data unrelated to the goal: posts that contain sickness words but have no bearing on anyone's health. For example, posts featuring the keyword “Irish Flu” may register as a slew of flu-related activity [20]. There are also situations in which a user posts a status and is mistakenly assumed to be infected. In this way, false information might affect illness management by public health departments.

To eliminate such noise, additional training is required so that only the relevant data are retained for further analysis.
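To illustrate how a trained classifier can separate genuine health reports from noisy keyword matches, the sketch below implements a small multinomial Naive Bayes classifier with Laplace smoothing. The choice of Naive Bayes, the class `NaiveBayes`, and the six training posts are assumptions made for this example; a real system would need a much larger labeled corpus.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {}
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)  # class prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled posts: "relevant" means a personal health report.
texts = [
    "i have the flu and a fever",
    "stuck in bed with flu symptoms",
    "my fever is getting worse",
    "irish flu after the party last night",
    "bieber fever at the concert",
    "irish flu hangover party",
]
labels = ["relevant"] * 3 + ["noise"] * 3
model = NaiveBayes().fit(texts, labels)
print(model.predict("in bed with a fever"))
print(model.predict("irish flu party tonight"))
```

Both test posts contain an illness keyword, so a plain keyword filter would count them equally; the classifier uses the surrounding words to tell them apart.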

D. Bias Based on Demographics

Although social media can help collect demographic information such as age, gender, and race, determining it is complex, and tuning algorithms so that public healthcare efforts are directed accordingly makes the task harder still.

Such research also covers only part of the population: it excludes those who are not active on social media, such as the elderly and children, who are least involved in such platforms [44]. One of the few studies on this question [45] examined the profiles of users commenting on Facebook, recorded their sex, and discovered that men wrote more posts per person than women. The majority of social media users are under the age of thirty. The finding that social media data are weighted towards frequently active users and towards young people further supports the existence of this bias.

E. Variability in Lexicon and Language

Although communication via social media aids in extracting healthcare data, the language is difficult to evaluate semantically. The informal and imprecise nature of social media messages leads to incomplete results. This constraint has been studied by [46].

F. Low Confidence

Low confidence is another issue that occurs when using social media data. One study examined conspiracy talk about the Zika virus on Reddit during a public health crisis [47]. Handling such content requires further training of the classification algorithm.

According to a recent study [48], official websites are a more reliable source of vaccination information than social media. The quality of health-related data available on the internet varies. Many social media data analysis tools may read hyped data as a sign that something significant is happening, when it might reflect panic rather than an actual illness outbreak.

Also, users may claim to have the flu when they have a common cold, or others may discuss the illness owing to heightened media coverage.

V. CONCLUSION

According to the data, Twitter was the most frequently studied platform, and SVM was the most often utilized classification approach. Furthermore, when data were categorized into two classes, SVM was the best classifier. This research looks at the most recent trends in public health monitoring systems that use different algorithms. Social media-based surveillance systems are found to outperform traditional methods. The paper has also discussed how the collected data can be further used to improve monitoring systems in the field of public health.
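For readers unfamiliar with the two-class SVM setting mentioned above, the following sketch trains a linear SVM on bag-of-words features by sub-gradient descent on the regularized hinge loss. It is a toy illustration under stated assumptions, not the implementation used in any surveyed study: the six posts, the labels (+1 for health-related, -1 otherwise), and the hyperparameters `epochs`, `lr`, and `lam` are all hypothetical.

```python
from collections import defaultdict

def featurize(text):
    """Bag-of-words counts for one post."""
    counts = defaultdict(float)
    for token in text.lower().split():
        counts[token] += 1.0
    return counts

def train_linear_svm(texts, labels, epochs=50, lr=0.1, lam=0.001):
    """Minimize hinge loss plus an L2 penalty with sub-gradient steps."""
    weights = defaultdict(float)
    bias = 0.0
    samples = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (sum(weights[w] * v for w, v in x.items()) + bias)
            # The L2 penalty shrinks every weight a little on each step.
            for w in weights:
                weights[w] *= 1.0 - lr * lam
            # The hinge loss only contributes when the margin is below 1.
            if margin < 1.0:
                for w, v in x.items():
                    weights[w] += lr * y * v
                bias += lr * y
    return weights, bias

def predict(weights, bias, text):
    score = sum(weights.get(w, 0.0) * v for w, v in featurize(text).items()) + bias
    return 1 if score >= 0 else -1

# Hypothetical two-class data: +1 health-related, -1 unrelated.
train_texts = [
    "i have the flu", "fever and cough today", "flu symptoms getting worse",
    "great game last night", "new phone arrived today", "traffic was bad last night",
]
train_labels = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm(train_texts, train_labels)
print(predict(weights, bias, "flu and fever"))
print(predict(weights, bias, "game last night"))
```

In practice an established library implementation would normally be preferred; the explicit loop is shown only to make the margin-based update visible.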

A. Future Work

Internet data can be combined with current circumstances such as weather, demographic data, and so on to improve forecast outcomes.

Factors such as sentiment, comments, locations, and other input characteristics from user postings can be combined with the text material to improve the overall analysis.

Sorting user postings into multiple categories, with varying weights given to each class, may increase prediction accuracy, and text analysis can be extended to images. There is a lot of room for development in topic modeling to generate more precise findings.
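As one possible reading of the class-weighting idea above, the sketch below aggregates categorized posts into a daily signal in which each category contributes a different weight. The category names, the weights, and the function `weighted_daily_signal` are hypothetical and are chosen only to show the mechanism.

```python
from collections import defaultdict

# Hypothetical weights: a personal report counts more than a news share.
CLASS_WEIGHTS = {"self_report": 1.0, "concern": 0.5, "news": 0.1}

def weighted_daily_signal(posts, weights=CLASS_WEIGHTS):
    """Sum per-category weights for each day; unknown categories count as 0."""
    signal = defaultdict(float)
    for day, category in posts:
        signal[day] += weights.get(category, 0.0)
    return dict(signal)

# Hypothetical categorized posts: (day, category assigned by a classifier).
posts = [
    ("2022-01-01", "self_report"),
    ("2022-01-01", "news"),
    ("2022-01-02", "concern"),
    ("2022-01-02", "concern"),
    ("2022-01-02", "spam"),
]
signal = weighted_daily_signal(posts)
print(signal)
```

Down-weighting news shares in this way is one option for limiting the effect of media-driven spikes on the aggregated signal.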

REFERENCES

[1] Du LJ, Tang L. Using a Machine Learning Approach to Monitor

COVID-19 Vaccine Adverse Events (VAE) from Twitter Data.

Vaccines, 2022; 10(103): 1-11.


[2] Aiello E, Renson A, Zivich PN. Social Media and Internet-Based

Disease Surveillance for Public Health. Annu. Rev. Public Health,

2020; 41: 101-118.

[3] Gupta, Katarya R. Social media based surveillance systems for

healthcare using machine learning: A systematic review. Journal of

Biomedical Informatics, 2020; 108: 103500.

[4] Hossein Abad ZS, Kline A, Sultana M, Noaeen M. Digital public health

surveillance: a systematic scoping review. NPJ Digital Medicine, 2021;

4(41): 1-13.

[5] Chiolero, Buckeridge D. Glossary for public health surveillance in the

age of data science. Journal of Epidemiology Community Health, 2020;

74(7): 612-616.

[6] Bates M. Tracking Disease: Digital Epidemiology Offers New Promise

in Predicting Outbreaks. IEEE Pulse, 2017; 8: 18-22.

[7] Calix R, Gupta R, Gupta M, Jiang K. Deep gramulator: Improving

precision in the classification of personal health-experience tweets with

deep learning. 2017 IEEE International Conference on Bioinformatics

and Biomedicine (BIBM); 2017.

[8] Mike, Daniel C. Social media, big data, and mental health: current

advances and ethical implications. Current Opinion in Psychology,

2016; 9: 77-82.

[9] Sousa L, de Mello R, Cedrim D, Garcia A, Missier P, Uchôa A, Oliveira

A, Romanovsky A. VazaDengue: An information system for

preventing and combating mosquito-borne diseases with social

networks. Information Systems, 2018; 75: 26-42.

[10] Ji X, Chun S, Geller J. Monitoring public health concerns using twitter

sentiment classifications. 2013 IEEE International Conference on

Healthcare Informatics, IEEE; 2013.

[11] Lee K, Agrawal A, Choudhary A. Mining social media streams to

improve public health allergy surveillance. Proceedings of the 2015

IEEE/ACM International Conference on Advances in Social Networks

Analysis and Mining, 2015.

[12] Nargund K, Natarajan S. Public health allergy surveillance using

micro-blogs. 2016 Int. Conf. Adv. Comput. Commun. Informatics,

ICACCI; 2016.

[13] Yang N, Cui X, Hu C, Zhu W, Yang C. Chinese social media analysis

for disease surveillance. 2014 International Conference on

Identification, Information and Knowledge in the Internet of Things,

IEEE; 2014.

[14] Jain V, Kumar S. Effective surveillance and predictive mapping of

mosquito-borne diseases using social media. J. Comput. Sci., 2018; 25:

406-415.

[15] Espina K, Regina M, Estuar J. Infodemiology for Syndromic

Surveillance of Dengue and Typhoid Fever in the Philippines. Proc.

Comput. Sci., 2017; 121: 554-561.

[16] Du J, Tang L, Xiang Y, Zhi D, Xu J, Song H, Tao C. Public perception

analysis of tweets during the 2015 measles outbreak: Comparative

study using convolutional neural network models. J. Med. Internet

Res., 2018; 20: 1-11.

[17] Jiang K, Gupta R, Gupta M, Calix R, Bernard G. Identifying Personal Health Experience Tweets with Deep Neural Networks. 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); 2017.

[18] Kumar V, Kumar S. An Effective Approach to Track Levels of

Influenza-A (H1N1) Pandemic in India Using Twitter. Procedia

Computer Science, 2015; 70: 801-807.

[19] Calix R, Gupta R, Gupta M, Jiang K. Deep gramulator: Improving

precision in the classification of personal health-experience tweets with

deep learning. Proc.-2017 IEEE Int. Conf. Bioinforma. Biomed. BIBM;

January 2017.

[20] Korkontzelos, Piliouras D, Dowsey A, Ananiadou S. Boosting drug

named entity recognition using an aggregate classifier. Artif. Intell.

Med., 2015; 65: 145-153.

[21] Zhang W, Ram S, Burkart M, Pengetnze Y. Extracting signals from

social media for chronic disease surveillance. Proceedings of the 6th

International Conference on Digital Health Conference; 2016.

[22] Mowery. Twitter Influenza Surveillance: Quantifying Seasonal

Misdiagnosis Patterns and their Impact on Surveillance Estimates.

Online J Public Heal. Inf., 2016; 8(3).

[23] Nsoesie E, Flor L, Hawkins J, Maharana A, Skotnes T, Marinho F,

Brownstein J. Social media as a sentinel for disease surveillance: what

does sociodemographic status have to do with it? PLoS Currents, 2016;

[24] Dai X, Bikdash M, Meyer B. From social media to public health

surveillance: Word embedding based clustering method for twitter

classification. SoutheastCon 2017, IEEE; 2017.

[25] Sousa, de Mello R, Cedrim D, Garcia A, Missier P, Uchôa A, Oliveira

A, Romanovsky A. VazaDengue: An information system for

preventing and combating mosquito-borne diseases with social

networks. Inf. Syst., 2018; 75: 26-42.

[26] Chaudhary S, Naaz S. Use of big data in computational epidemiology

for public health surveillance. 2017 International Conference on

Computing and Communication Technologies for Smart Nation

(IC3TSN), IEEE; 2017.

[27] Bosley J, Zhao N, Hill S, Shofer F, Asch D, Becker L, Merchant R.

Decoding twitter: Surveillance and trends for cardiac arrest and

resuscitation communication. Resuscitation, 2013; 84: 206-212.

[28] Stefanidis, Vraga E, Lamprianidis G, Radzikowski J, Delamater P,

Jacobsen K, Pfoser D, et al. Zika in Twitter: temporal variations of

locations, actors, and concepts. JMIR public health and surveillance,

2017; 3(2): e6925.

[29] Rudra, Sharma A, Ganguly N, Imran M. Classifying information from

microblogs during epidemics. Proceedings of the 2017 international

conference on digital health; 2017.

[30] Edd, Rn S. What can we learn about the Ebola outbreak from tweets?

Am. J. Infect. Control., 2015; 43: 563-571.

[31] Kwak H, Lee C, Park H, Moon S. What is Twitter, a Social Network or

a News Media? Arch. Zootec., 2011; 11: 297-300.

[32] Systrom K. Strengthening our commitment to safety and kindness for

800 million. Instagram Press, 2017. Accessed 9 March 2022. [Internet].

Available: https://instagram.tumblr.com/post/165759350412/170926-

news.

[33] Guidry J, Jin Y, Orr C, Messner M, Meganck S. Ebola on Instagram

and Twitter: How health organizations address the health crisis in their

social media engagement. Public Relat. Rev., 2017; 43: 477-486.

[34] Tang L, Bie B, Zhi D. Tweeting about measles during stages of an

outbreak: A semantic network approach to the framing of an emerging

infectious disease. American Journal of Infection Control, 2018;

46(12), 1375–1380. https://doi.org/10.1016/j.ajic.2018.05.019.

[35] Kostkova P, Szomszor M, St. Luis C. #swineflu: The Use of Twitter as

an Early Warning and Risk Communication. ACM Transactions on

Management Information Systems, 2014; 5(2), 1–25.

https://doi.org/10.1145/2597892X.

[36] Zhou, Ye J, Feng Y. Tuberculosis surveillance by analyzing google

trends. IEEE Trans. Biomed. Eng., 2011; 58: 2247-2254.

[37] Yom-Tov E. Ebola data from the Internet: An opportunity for

syndromic surveillance or a news event? Proceedings of the 5th

international conference on digital health; 2015.

[38] Young S, Mercer N, Weiss R, Torrone E, Aral S. Using social media

as a tool to predict syphilis. Prev. Med. (Baltim), 2018; 109: 58-61.

[39] Nolasco D, Oliveira J. Subevents Detection through Topic Modeling in

Social Media Posts. Future Generation Computer Systems, 2018; 93:

290-303.

[40] Thapen, Simmie D, Hankin C, Gillard J. DEFENDER: detecting and

forecasting epidemics using novel data-analytics for enhanced

response. PloS one, 2016; 11(5): e0155417.

[41] Mckee R. Ethical issues in using social media for health and health care

research. Health Policy (New. York), 2013; 110: 298-301.

[42] Blouin-Genest G, Miller A. The politics of participatory epidemiology:

Technologies, social media and influenza surveillance in the US. Heal.

Policy Technol., 2017; 6: 192-197.

[43] Bodnar T, Salathé M. Validating models for disease detection using

twitter. Proceedings of the 22nd International Conference on World

Wide Web; 2013.

[44] Charles-Smith L, Reynolds T, Cameron M, Conway M, Lau E, Olsen

J, Pavlin J, et al. Using social media for actionable disease surveillance

and outbreak management: A systematic literature review. PloS one,

2015; 10(10): e0139701.

[45] Strekalova Y. Emergent health risks and audience information

engagement on social media. Am. J. Infect. Control., 2016; 44: 363-

365.

[46] Limsopatham, Collier N. Towards the semantic interpretation of

personal health messages from social media. Proceedings of the ACM

First International Workshop on Understanding the City with Urban

Informatics; 2015.

[47] Kou Y, Gui X, Chen Y, Pine K. Conspiracy Talk on Social Media:

Collective Sensemaking during a Public Health Crisis. Proc. ACM

Human-Computer Interact, 2017; 1: 1-21.

[48] Cataldi J, Dempsey A, O'Leary S. Measles, the media, and MMR:

Impact of the 2014–15 measles outbreak. Vaccine, 2016; 34: 6375-

6380.
