
New Approach for Detecting Spammers on Twitter using Machine Learning Framework

April 2020
SSRN Electronic Journal 7(2):794-799


Authors:

Harrisburg University



New Approach for Detecting Spammers on Twitter using Machine Learning Framework

Deepali Prakash Sonawane, Dr. Baisa L. Gunjal

Amrutvahini College of Engineering, Sangamner, India

Abstract: Social networking sites serve billions of users worldwide. User interactions on these sites, such as Twitter, have a tremendous and occasionally undesirable impact on daily life. The major social networking sites have become a target platform for spammers to spread large amounts of irrelevant and harmful information. Twitter has become one of the most popular microblogging services of all time, and it is widely exploited to share an excessive amount of spam. Fake users send unwanted tweets to promote services or websites, which not only affects legitimate users but also wastes resources. Furthermore, the possibility of spreading false information to users through fake identities has increased, resulting in malicious content. Recently, the detection of spammers and the identification of fake users and fake tweets on Twitter has become an important area of research in online social networks (OSNs). This paper proposes techniques for detecting spammers on Twitter. In addition, it presents a taxonomy of Twitter spam detection approaches that classifies techniques by their ability to detect fake content, URL-based spam, and spam in trending topics. Between twelve and nineteen features, including six newly defined features and two redefined features, are identified to train two supervised machine learning classifiers on a real-time dataset to distinguish legitimate users from spammers.

Index Terms – Classification, Social Network Security, Intrusion, Spam Detection, Machine Learning.

I. INTRODUCTION

Online social networking sites such as Twitter, Facebook, and Instagram have become extremely popular in recent years. People spend a lot of time on OSNs making friends with people they know or are interested in. The growing popularity of social sites lets users gather abundant data and information about other users. The large volumes of information available on these sites also draw the attention of spammers. Twitter has quickly become an online hotspot for obtaining real-time data about users. Twitter is an Online Social Network (OSN) where users can share anything and everything, such as news, opinions, and even their moods. Discussions can be held on different topics, such as politics, current affairs, and important events. When a user tweets something, it is immediately passed on to his or her followers, enabling them to spread the received information at a much broader level. With the development of OSNs, the need to study and analyze users' behavior on online social platforms has intensified. Many people who do not know much about OSNs can easily be deceived by fraudsters. There is also a demand to combat and control the people who use OSNs only for advertisements and thereby spam others' accounts.

Recently, the detection of spam on social networking sites has attracted the attention of researchers. Spam detection is a difficult task in maintaining the security of social networks. It is essential to recognize spam on OSN sites to protect users from various kinds of malicious attacks and to preserve their security and privacy. The harmful tactics adopted by spammers cause considerable damage to the community in the real world. Twitter spammers have various objectives, such as spreading false information, fake news, rumors, and unsolicited messages. Spammers achieve their malicious objectives through advertisements and several other means, in which they support different mailing lists and then dispatch spam messages at random to broadcast their interests. These activities disturb the genuine users, who are known as non-spammers. Furthermore, they also damage the reputation of OSN platforms. Therefore, it is essential to design a scheme to detect spammers so that corrective measures can be taken to counter their malicious activities.

The ability to classify useful information is essential for academia and industry to discover hidden insights and predict trends on Twitter. However, spam generates a lot of noise on Twitter. To detect spam automatically, researchers have applied machine learning algorithms, treating spam detection as a classification problem. Classifying an individual tweet, rather than a Twitter user, as spam or non-spam is more realistic in practice.

II. LITERATURE SURVEY

Nathan Aston, Jacob Liddle, and Wei Hu [4], in "Twitter Sentiment in Data Streams with Perceptron", show that feature reduction makes the Perceptron and Voted Perceptron algorithms more viable in a stream environment. The paper develops methods by which Twitter sentiment can be determined both quickly and accurately on a large scale.

Q. Cao, M. Sirivianos, X. Yang, and T. Pregueiro [6], in "Aiding the detection of fake accounts in large scale social online services", present SybilRank, an effective and efficient fake account inference scheme that allows OSNs to rank accounts according to their perceived likelihood of being fake. It works on knowledge extracted from the network to detect, verify, and remove fake accounts.

G. Stringhini, C. Kruegel, and G. Vigna [8], in "Detecting spammers on social networks", help detect spam profiles even when they do not contact a honey-profile. Irregular behavior of user profiles is detected, and based on that behavior a profile is developed to identify spammers.

J. Song, S. Lee, and J. Kim [9], in "Spam filtering in Twitter using sender receiver relationship", propose a spam filtering method for social networks that uses relation information between users. The system uses distance and connectivity as features, which are hard for spammers to manipulate and effective for classifying spammers.

IJRAR2004252

International Journal of Research and Analytical Reviews (IJRAR)

www.ijrar.org

794

K. Lee, J. Caverlee, and S. Webb [10], in "Uncovering social spammers: social honeypots and machine learning", analyze how spammers who target social networking sites operate. To collect data about spamming activity, the system created a large set of honey-profiles on three large social networking sites.

K. Thomas, C. Grier, D. Song, and V. Paxson [12], in "Suspended accounts in retrospect: An analysis of Twitter spam", characterize the behaviors of spammers on Twitter by analyzing the tweets sent by suspended users in retrospect. They identify an emerging spam-as-a-service market that includes reputable and not-so-reputable affiliate programs, ad-based shorteners, and Twitter account sellers.

K. Thomas, C. Grier, J. Ma, V. Paxson, and D. Song [14], in "Design and evaluation of a real-time URL spam filtering service", present Monarch, a real-time system for filtering scam, phishing, and malware URLs as they are submitted to web services. Monarch's architecture generalizes to many web services targeted by URL spam, but accurate classification hinges on having an intimate understanding of the spam campaigns abusing a service.

X. Jin, C. X. Lin, J. Luo, and J. Han [20], in "Socialspamguard: A data mining based spam detection system for social media networks", automatically harvest spam activities in social networks by monitoring social sensors with popular user bases. Introducing both image.

III. PROPOSED METHODOLOGY

We evaluate spam detection performance on our dataset using machine learning algorithms. The process of Twitter spam detection with machine learning is as follows. Before classification, a classifier that contains the knowledge structure must be trained with pre-labeled tweets. After the classification model has learned the knowledge structure of the training data, it can be used to predict the class of a new incoming tweet.

The whole process consists of two steps:

1. Learning

2. Classifying.

First, features of tweets are extracted and formatted as a vector. The class labels (spam or non-spam) can be obtained by other means, such as manual inspection. The features and the class label are combined into one instance for training. One training tweet is thus represented by a pair containing a feature vector, which represents the tweet, and the expected result; the training set is the collection of such pairs. The training set is the input to the machine learning algorithm, and the classification model is built by the training process. In the classifying step, newly captured tweets are labeled by the trained classification model.
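The pairing of a feature vector with its expected label can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four features below (length, URL count, hashtag count, mention count) and the helper names `extract_features` and `build_training_set` are hypothetical and are not the twelve to nineteen features used in the paper.

```python
import re
from typing import List, Sequence, Tuple

URL_PATTERN = re.compile(r"https?://\S+")


def extract_features(tweet: str) -> List[float]:
    """Map a tweet to a fixed-length numeric vector (illustrative features only)."""
    words = tweet.split()
    return [
        float(len(tweet)),                             # number of characters
        float(len(URL_PATTERN.findall(tweet))),        # number of URLs
        float(sum(w.startswith("#") for w in words)),  # number of hashtags
        float(sum(w.startswith("@") for w in words)),  # number of mentions
    ]


def build_training_set(
    labeled_tweets: Sequence[Tuple[str, str]],
) -> List[Tuple[List[float], str]]:
    """Combine each feature vector with its class label into one training instance."""
    return [(extract_features(text), label) for text, label in labeled_tweets]


# Hypothetical pre-labeled tweets (labels assumed to come from manual inspection).
labeled = [
    ("Win a FREE phone now http://spam.example #free #win", "spam"),
    ("Heading to the library with @sam this afternoon", "nonspam"),
]
training_set = build_training_set(labeled)
```

Each element of `training_set` is one (feature vector, label) instance; the list as a whole plays the role of the training set passed to the learning algorithm.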

A. Architecture:

Figure 1: Proposed System Architecture

1. Tweets related to trending topics on Twitter are collected. After the tweets are stored in a particular file format, they are analyzed.

2. Spam labelling is performed by checking the tweets against all available datasets to detect malicious URLs.

3. Feature extraction derives characteristics based on a language model, which uses language as a tool and helps determine whether the tweets are fake or not.

4. Classification of the dataset is performed by selecting a set of tweets, described by the set of features, that is provided to the classifier to train the model and acquire the knowledge for spam detection.

5. Spam detection uses the classification technique to accept tweets as input and classify them as spam or non-spam.
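The labelling step (step 2) can be illustrated with a small sketch that marks a tweet as spam when one of its URLs points to a known malicious domain. The blacklist contents and the function name `label_tweet` are assumptions made for this example; the paper does not specify which URL datasets are consulted.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")

# Hypothetical blacklist; a real system would load this from a URL reputation dataset.
BLACKLISTED_DOMAINS = {"malicious.example", "phish.example"}


def label_tweet(text: str, blacklist=BLACKLISTED_DOMAINS) -> str:
    """Return "spam" if any URL in the tweet points to a blacklisted domain."""
    for url in URL_PATTERN.findall(text):
        domain = urlparse(url).netloc.lower()
        if domain in blacklist:
            return "spam"
    return "nonspam"


print(label_tweet("Claim your prize http://malicious.example/offer"))
print(label_tweet("Reading http://news.example/story today"))
```

Labels produced this way would then be attached to the extracted feature vectors before training (steps 3 and 4).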


B. Algorithm:

1. Support Vector Machine:

Support Vector Machine (SVM) is used to classify the tweets. Support vector machines are mainly two-class classifiers with linear or non-linear class boundaries. The idea behind SVM is to form a hyperplane between the data sets that indicates which class each point belongs to. The task is to train the machine with known data; the SVM then finds the optimal hyperplane, which gives the maximum distance to the nearest training data points of any class.

Steps:

Step 1: Read the test features and the training features.

Step 2: Check all the test features and get all the training features.

Step 3: Choose the kernel.

Step 4: Train the SVM using both features and show the output.

Step 5: Classify an observation using the trained SVM classifier.
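The steps above can be sketched with a minimal linear SVM trained by sub-gradient descent on the regularised hinge loss. This is an illustrative stand-in, not the authors' implementation: it assumes a linear kernel, labels encoded as +1 (spam) and -1 (non-spam), and hypothetical two-dimensional feature vectors; the hyperparameters `lam`, `lr`, and `epochs` are arbitrary choices for the toy data.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit weights w and bias b by sub-gradient descent on the regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Point is inside the margin or misclassified: hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is outside the margin: only the regulariser shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (spam) or -1 (non-spam) depending on the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical feature vectors, e.g. [URL count, hashtag count].
X_train = [[2.0, 2.0], [3.0, 1.5], [0.0, 0.5], [0.5, 0.0]]
y_train = [1, 1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
predictions = [predict(w, b, x) for x in X_train]
```

A non-linear boundary would require a kernelised formulation, which this sketch deliberately leaves out to stay short.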

2. Naïve Bayes Classification:

The Naive Bayes algorithm learns the probability that an object with certain features belongs to a particular group or class. In short, it is a probabilistic classifier. The algorithm is called naive because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.

The Naive Bayesian classifier is based on Bayes' theorem with an independence assumption between predictors. A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets. Despite its simplicity, the Naive Bayesian classifier often performs surprisingly well and is widely used because it often outperforms more sophisticated classification methods.
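As a rough illustration of the independence assumption, the sketch below implements a multinomial Naive Bayes classifier over tweet tokens with Laplace (add-one) smoothing. The token-based representation, the smoothing choice, and the toy training tweets are assumptions for this example; the paper does not state which Naive Bayes variant is used.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs is a list of (tokens, label) pairs; returns the counts needed for prediction."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, tokens):
    """Pick the class with the highest log posterior, treating tokens as independent."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore tokens never seen during training
            # Laplace smoothing avoids zero probabilities for unseen (token, class) pairs.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical tokenised training tweets.
train_docs = [
    (["free", "prize", "click"], "spam"),
    (["free", "click", "now"], "spam"),
    (["meeting", "at", "noon"], "nonspam"),
    (["lunch", "at", "noon"], "nonspam"),
]
model = train_naive_bayes(train_docs)
```

Training is a single counting pass with no iterative optimisation, which is why the model scales well to large datasets.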

C. Mathematical Model:

1) Working of Support Vector Machine:

We have k sub-spaces, so there are k sub-space classification results for classifying breast cancer cells, called $CL_{SS_1}, CL_{SS_2}, \ldots, CL_{SS_k}$. The problem is how to integrate all of those results. The simplest way to integrate them is to calculate the mean value:

$$CL = \frac{1}{k} \sum_{i=1}^{k} CL_{SS_i} \tag{1}$$

or the weighted mean value:

$$CL = \sum_{i=1}^{k} W_i \, CL_{SS_i} \tag{2}$$

where $W_i$ is the weight of the classification result of sub-space $SS_i$, i.e. the breast cancer cells result, and satisfies:

$$\sum_{i=1}^{k} W_i = 1 \tag{3}$$

The centroid is calculated as follows:

$$(\bar{x}, \bar{y}) = \left( \frac{1}{k} \sum_{i=1}^{k} x_i,\ \frac{1}{k} \sum_{i=1}^{k} y_i \right) \tag{4}$$

where $(\bar{x}, \bar{y})$ represents the centroid of the hand, $x_i$ and $y_i$ are the x and y coordinates of the i-th pixel in the hand region, and k denotes the number of histopathological image pixels that represent only the hand portion. In the next step, the distance between the centroid and the pixel value is calculated using the Euclidean distance:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{5}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ represent the coordinates of the two histopathological image pixels.

2) Working of Naïve Bayes Classification:

Bayes' theorem gives us a method to calculate conditional probability, i.e., the probability of an event based on previous knowledge available about related events. Here we use this technique for spam classification. More formally, Bayes' theorem is stated as the following equation:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{6}$$


The components of the above statement are:

$P(A \mid B)$: Probability (conditional probability) of the occurrence of event A given that event B is true.

$P(B \mid A)$: Probability of the occurrence of event B given that event A is true.

$P(A)$ and $P(B)$: Probabilities of the occurrence of events A and B on their own.
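A short numeric example may make the theorem concrete. Assume, purely for illustration, that A is "the tweet is spam" and B is "the tweet contains a URL", with made-up probabilities P(A) = 0.2, P(B|A) = 0.9, and P(B|not A) = 0.1; none of these values come from the paper. P(B) is obtained with the law of total probability.

```python
def posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Compute P(A|B) with Bayes' theorem, expanding P(B) by total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical values: 20% of tweets are spam, 90% of spam tweets and
# 10% of legitimate tweets contain a URL.
p_spam_given_url = posterior(p_b_given_a=0.9, p_a=0.2, p_b_given_not_a=0.1)
print(round(p_spam_given_url, 4))
```

Under these assumed numbers, observing a URL raises the probability of spam from the prior 0.2 to 0.18 / 0.26, roughly 0.69.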

IV. RESULT AND DISCUSSION

An experimental evaluation is carried out to compare the performance of Naive Bayes and Support Vector Machine. For the evaluation of the experimental results, we use the following notation:

TP: True positive (number of instances correctly predicted as positive)

FP: False positive (number of instances incorrectly predicted as positive)

TN: True negative (number of instances correctly predicted as negative)

FN: False negative (number of instances incorrectly predicted as negative)

On the basis of these parameters, we calculate the accuracy:

Accuracy = (TP + TN) / (TP + FP + TN + FN)
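The accuracy computation can be sketched as follows, assuming "spam" is treated as the positive class. The label lists are hypothetical and are not the paper's experimental data.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count TP, FP, TN, FN for a binary task with the given positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + FP + TN + FN); returns 0.0 for empty input."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical ground truth and predictions.
y_true = ["spam", "spam", "nonspam", "nonspam", "spam"]
y_pred = ["spam", "nonspam", "nonspam", "spam", "spam"]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
```

With these lists the counts are TP = 2, FP = 1, TN = 1, FN = 1, giving an accuracy of 3/5.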

Figure 2: Accuracy graph

Sr.No    Support Vector Machine    Naïve Bayes

         83%                       92%

Table 1: Comparative Table

V. CONCLUSION

In this paper, we reviewed techniques used for detecting spammers on Twitter. In addition, we presented a taxonomy of Twitter spam detection approaches and categorized them as fake content detection, URL-based spam detection, spam detection in trending topics, and fake user detection techniques. We also compared the presented techniques based on several features, such as user features, content features, graph features, structure features, and time features. Moreover, the techniques were compared in terms of their specified goals and the datasets used. It is anticipated that the presented review will help researchers find information on state-of-the-art Twitter spam detection techniques in a consolidated form.


REFERENCES

www.ijrar.org (E-ISSN 2348-1269, P- ISSN 2349-5138)

[1] Mohd Fazil and Muhammad Abulaish, “A Hybrid Approach for Detecting Automated Spammers in Twitter”, IEEE Transactions on Information Forensics and Security, Vol. 11, No. 2, January 2019.

[2] Ishaq Azhar Mohammed, "ARTIFICIAL INTELLIGENCE: THE KEY TO SELF-DRIVING IDENTITY GOVERNANCE",

International Journal of Creative Research Thoughts (IJCRT), ISSN:2320-2882, Volume.4, Issue 4, pp.664-667, November 2016,

Available at :http://www.ijcrt.org/papers/IJCRT1134112.pdf

[3] Sikender Mohsienuddin Mohammad, Surya Lakshmisri , "SECURITY AUTOMATION IN INFORMATION TECHNOLOGY",

International Journal of Creative Research Thoughts (IJCRT), ISSN:2320-2882, Volume.6, Issue 2, pp.901-905, June 2018,

Available at :http://www.ijcrt.org/papers/IJCRT1133434.pdf

[4] Nathan Aston, Jacob Liddle and Wei Hu*, Twitter Sentiment in Data Streams with Perceptron, in Journal of Computer and

Communications, 2014, Vol-2 No-11.

[5] Sudhir Allam, "BIG DATA MIGHT JUST CURE CANCER - THE RESEARCH AND THE REALITY", International Journal of

Creative Research Thoughts (IJCRT), ISSN:2320-2882, Volume.7, Issue 4, pp.820-825, December-2019, Available at

:http://www.ijcrt.org/papers/IJCRT1133998.pdf

[6] Q. Cao, M. Sirivianos, X. Yang, and T. Pregueiro, Aiding the detection of fake accounts in large scale social online services, in Proc. Symp. Netw. Syst. Des. Implement. (NSDI), 2012, pp. 197–210.

[7] Ishaq Azhar Mohammed, "Identity-based Encryption: From Identity and Access Management to Enterprise Privacy

Management", International Journal of Emerging Technologies and Innovative Research (www.jetir.org | UGC and issn

Approved), ISSN:2349-5162, Vol.4, Issue 9, page no. pp719-722, September-2017, Available at :

http://www.jetir.org/papers/JETIR1709107.pdf

[8] G. Stringhini, C. Kruegel, and G. Vigna, Detecting spammers on social networks, in Proc. 26th Annu. Comput. Sec. Appl. Conf., 2010, pp. 1–9.

[9] J. Song, S. Lee, and J. Kim, Spam filtering in Twitter using sender receiver relationship, in Proc. 14th Int. Conf. Recent Adv. Intrusion Detection, 2011, pp. 301–317.

[10] K. Lee, J. Caverlee, and S. Webb, Uncovering social spammers: social honeypots + machine learning, in Proc. 33rd Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2010, pp. 435–442.

[11] Ravi Teja Yarlagadda, "Implementation of DevOps in healthcare systems", International Journal of Emerging Technologies and

Innovative Research (www.jetir.org), ISSN:2349-5162, Vol.4, Issue 6, page no.537-541, June-2017, Available

:http://www.jetir.org/papers/JETIR1706100.pdf

[12] K. Thomas, C. Grier, D. Song, and V. Paxson, Suspended accounts in retrospect: An analysis of Twitter spam, in Proc. ACM SIGCOMM Conf. Internet Meas., 2011, pp. 243–258.

[13] Ishaq Azhar Mohammed, "Identity and Access Management for the Internet of Things", International Journal of Emerging

Technologies and Innovative Research (www.jetir.org), ISSN:2349-5162, Vol.5, Issue 5, page no.1299-1303, May-2018,

Available :http://www.jetir.org/papers/JETIR1805954.pdf

[14] K. Thomas, C. Grier, J. Ma, V. Paxson, and D. Song, Design and evaluation of a real-time URL spam filtering service, in Proc. IEEE Symp. Sec. Privacy, 2011, pp. 447–462.

[15] Manishaben Jaiswal “ SOFTWARE ARCHITECTURE AND SOFTWARE DESIGN” International Research Journal of

Engineering and Technology (IRJET) e-ISSN: 2395-0056, p-ISSN: 2395-0072, Volume: 06 Issue: 11, s. no -303 , pp. 2452-2454 , Nov

2019 Available at: https://www.irjet.net/archives/V6/i11/IRJET-V6I11303.pdf

[16] Manishaben Jaiswal "RISK ANALYSIS IN INFORMATION TECHNOLOGY" , International Journal of Scientific Research and

Engineering Development (IJSRED) , ISSN:2581-7175, Vol 2-Issue 6, P110, pp. 857-860, November - December 2019 Available

at: http://www.ijsred.com/volume2/issue6/IJSRED-V2I6P110.pdf

[17] Manishaben Jaiswal, Mehul Patel “THE LEARNING ON CRM IN ERP- WITH SPECIAL REFERENCES TO SELECTED

ENGINEERING COMPANIES IN GUJARAT”, International Journal of Management and Humanities Scopus (IJMH) , published

by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), ISSN 2394-0913, Volume-4 Issue-8, April 2020, Pg-

117-126,Available At,http://www.ijmh.org/wp-content/uploads/papers/v4i8/H0798044820.pdf

[18] Sudhir Allam, "RESEARCH ON THE SECURE MEDICAL BIG DATA ECOSYSTEM BASED ON HADOOP", International

Journal of Creative Research Thoughts (IJCRT), ISSN:2320-2882, Volume.7, Issue 1, pp.815-819, March 2019, Available at

:http://www.ijcrt.org/papers/IJCRT1133997.pdf

[19] Ishaq Azhar Mohammed. (2019). A SYSTEMATIC LITERATURE MAPPING ON SECURE IDENTITY MANAGEMENT

USING BLOCKCHAIN TECHNOLOGY. International Journal of Innovations in Engineering Research and Technology, 6(5),

86–91. Retrieved from https://repo.ijiert.org/index.php/ijiert/article/view/2798

[20] X. Jin, C. X. Lin, J. Luo, and J. Han, Socialspamguard: A data mining based spam detection system for social media networks, PVLDB, vol. 4, no. 12, pp. 1458–1461, 2011.

[21] Surya Lakshmisri, "SOFTWARE AS A SERVICE IN CLOUD COMPUTING", International Journal of Creative Research

Thoughts (IJCRT), ISSN:2320-2882, Volume.7, Issue 4, pp.182-186, December 2019, Available at

:http://www.ijcrt.org/papers/IJCRT1133471.pdf

[22] S. Ghosh et al., Understanding and combating link farming in the Twitter social network, in Proc. 21st Int. Conf. World Wide Web, 2012, pp. 61–70.

[23] H. Costa, F. Benevenuto, and L. H. C. Merschmann, Detecting tip spam in location-based social networks, in Proc. 28th Annu. ACM Symp. Appl. Comput., 2013, pp. 724–729.

[24] M. Tsikerdekis, Identity deception prevention using common contribution network data, IEEE Transactions on Information

[25] Ishaq Azhar Mohammed, "RISK-BASED ACCESS CONTROL MODEL: A SYSTEMATIC LITERATURE REVIEW",

International Journal of Creative Research Thoughts (IJCRT), ISSN:2320-2882, Volume.7, Issue 2, pp.794-797, May 2019,

Available at :http://www.ijcrt.org/papers/IJCRT1134133.pdf

[26] T. Anwar and M. Abulaish, Ranking radically influential web forum users, IEEE Transactions on Information Forensics and Security, vol. 10, no. 6, pp. 1289–1298, 2015.

[27] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu, Design and analysis of social botnet, Computer Networks, vol. 57, no. 2, pp. 556–578, 2013.


[28] Ravi Teja Yarlagadda, "Understanding DevOps & bridging the gap from continuous integration to continuous delivery",

International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN:2349-5162, Vol.5, Issue 2, page

no.1420-1424, February-2018, Available :http://www.jetir.org/papers/JETIR1802284.pdf

[29] D. Fletcher, A brief history of spam, TIME, Tech. Rep., 2009.

[30] Y. Boshmaf, M. Ripeanu, K. Beznosov, and E. Santos-Neto, Thwarting fake osn accounts by predicting their victims, in Proc. AISec., Denver, 2015, pp. 81–89.

[31] Ishaq Azhar Mohammed, "Artificial Intelligence for Caregivers of Persons with Alzheimer’s Disease and Related Dementias:

Systematic Literature Review", International Journal of Emerging Technologies and Innovative Research (www.jetir.org | UGC

and issn Approved), ISSN:2349-5162, Vol.6, Issue 1, page no. pp741-744, January-2019, Available at :

http://www.jetir.org/papers/JETIR1901E97.pdf

[32] N. R. Amit A Amleshwaram, S. Yadav, G. Gu, and C. Yang, Cats: Characterizing automation of twitter spammers, in Proc.

[33] Sudhir Allam, "THE FUTURE OF URBAN MODELS IN THE BIG DATA AND AI ERA: A BIBLIOMETRIC ANALYSIS",

International Journal of Creative Research Thoughts (IJCRT), ISSN:2320-2882, Volume.6, Issue 1, pp.797-800, February-2018,

Available at :http://www.ijcrt.org/papers/IJCRT1133993.pdf

[34] K. Lee, J. Caverlee, and S. Webb, Uncovering social spammers: Social honeypots + machine learning, in Proc. SIGIR, Geneva, 2010, pp. 435–442. [18] G. Stringhini, C. Kruegel, and G. Vigna, Detecting spammers on social networks, in Proc. ACSAC, Austin, Texas, 2010, pp. 1–9.

[35] Lakshmisri Surya,Ravi Teja Yarlagadda, "AI ECONOMICAL SMART DEVICE TO IDENTIFY COVID-19 PANDEMIC, AND

ALERT ON SOCIAL DISTANCING WHO MEASURES", International Journal of Creative Research Thoughts (IJCRT),

ISSN:2320-2882, Volume.8, Issue 5, pp.4152-4156, May 2020, Available at :http://www.ijcrt.org/papers/IJCRT2005556.pdf

[36] Ishaq Azhar Mohammed. (2019). CLOUD IDENTITY AND ACCESS MANAGEMENT – A MODEL PROPOSAL.

International Journal of Innovations in Engineering Research and Technology, 6(10), 1–8. Retrieved from

https://repo.ijiert.org/index.php/ijiert/article/view/2781

[37] H. Yu, M. Kaminsky, P. B. Gibbons, and A. Flaxman, Sybilguard: Defending against sybil attacks via social networks, IEEE/ACM Transactions on Networking, vol. 16, no. 3, pp. 576–589, 2008.

IJRAR2004252

International Journal of Research and Analytical Reviews (IJRAR)

www.ijrar.org

799

Comparative study of different machine learning models for detecting spam tweet

Conference Paper

Jan 2023

A Review: Twitter Spam Detection Techniques

Chapter

Jun 2023

One of the most well-liked social media is Twitter. Spam is one of the several issues that negatively affect users. The objective of this study is to provide an overview of different techniques used for detecting spam in twitter. The proposed framework mainly contains the comparison of four existing twitter spam detection techniques namely, machine learning, feature based detection, combinational algorithm, and deep learning. Machine learning detection uses techniques such as SVM, future engineering, machine learning framework, and semantic similarity function to assess spam. In feature based detection, metadata based, tweet based, user based, and graph based techniques are used to detect spammers. In combinatorial algorithm detection, Naive Bayes-SVM, K-nearest neighbour-SVM, random forest-SVM and RNN-Short term memory techniques are used to detect spam. Deep learning detection uses feature based, semantic cnn, convolution-short term memory nn, and deep learning convolution technique to identify spam. This paper covers relevant work and comparison of several anti spamming techniques.

Bi-Modal Meta-Classification of Tweet Spamicity Using Machine Learning Approach

Chapter

Nov 2022

Nowadays, social media plays a centric role in sharing and disseminating information on products and services. Social network users post messages for attracting followers and increasing popularity. This attracts spammers to post fake and untruthful opinions on products. So, a strong and dependable framework for distinguishing spam users is important. Most of the research on detecting spam focuses on either tweets or metadata from Twitter profiles. Generally, spam tweets are generated using automated tools. The existence and credibility of the users in the social network is an important parameter in deciding the tweet is spam or not. In this work, a meta-classification approach is used with dynamic features extracted from the tweets, and tweeter behavior. The proposed work confirms spamicity in tweets using an aggregated model with spamicity inference generated by LSTM and machine learning models. Machine learning models are selected as it learns to extract patterns and interpret results from large dataset efficiently. The identification of distinct behavioral patterns from tweets is potent in various practical applications such as automatic knowledge extraction fields and large-scale human content generated platforms. The performance of the bi-modal meta-classifier is tested with benchmark datasets, and the accuracy of 99.76% achieved is found to be better than some recent related works.

Identity and Access Management for the Internet of Things

Article

Full-text available

May 2018

Ishaq Azhar Mohammed

The main aim of this paper is to investigate the idea of identity and access control for Internet of Things applications. Organizations collect enormous amounts of data from their activities nowadays. This information is derived from transactions performed by a person or a paired device. The internet has now become the preferred method of contemporary communication, raising the need for a way to monitor and protect various connections [1]. To guarantee the system's credibility, interconnected equipment gathering data must be protected. A company should not only verify the identities of its network users and be able to track their activities, but also trust the technology that provides this information. The Internet of Things is a broad-based network of devices, which may interconnect and cooperate in the production of a range of services, anywhere and in any manner [1]. Balancing access control, authentication, and mobile identity management while interacting with other equipment, resources and infrastructure is a major issue for identity management. In the modern Internet communication environment, the maintenance of identities poses major difficulties. These difficulties on the internet are compounded by the unlimited number of users and anticipated resource limits [2]. Modern identity management systems focus primarily on the identities often used by end-users as well as on networked services. Even so, these identity management systems are developed because substantial resources are available and their application to the resource-laden Internet of things requires careful study. To effectively manage the myriad of applications and believe that the identity of a machine would be verified, businesses need to implement digital credential solutions with a robust foundation of trust [2,3]. Historically, this has been accomplished via the use of Public Key Infrastructure (PKI) and a smart card. 
Combining blockchain with public key infrastructure solutions enables the provision of identity and access management platforms for the internet of things (IoT). RFID security measures and different blockchain solutions provide are viable alternatives for securing IoT device authentication and authorization.

Artificial Intelligence for Caregivers of Persons with Alzheimer’s Disease and Related Dementias: Systematic Literature Review

Article

Full-text available

Jan 2019

Ishaq Azhar Mohammed

Artificial Intelligence (AI) has a tremendous great potential to enhance the healthcare and quality of life of people with Alzheimer's disease and its associated dementias (ADRD). So far, there has been a dearth of comprehensive literature assessments of the impact of AI on ADRD care [1]. This paper is designed to explore and evaluate AI impacts and that offer knowledge to help ADRD caregivers manage ADRDs. Improving our perspective of Alzheimer's disease outside health care may be seen as being akin to physical and intellectual impairments, which affect every aspect of a human being's life. Using this perspective as a starting point, policies are required to ensure that individuals with dementia and their caregivers have access to equipment, services, and other resources that will help them live as independently as possible. This facilitates the establishment of direct technology, funding streams, the provision of maintenance and monitoring support, the provision of new digital technologies programs for people with the disease, scientific resources, the utilization of data analytics to predict patterns of need, and the proactive identification of people who are at risk [1]. A range of ethical concerns, such as privacy, identity management, access and use, liabilities, rights, obligations, and interactions (including data sharing), between private companies and legal authorities, also need to be resolved to support technology-friendly prevention and care measures.

Identity-based Encryption: From Identity and Access Management to Enterprise Privacy Management

Article

Sep 2017

Ishaq Azhar Mohammed

The main purpose of this paper is to explore the concept of identity-based encryption, with a focus on its use from identity and access management to enterprise privacy management. In today's society, data privacy is a very sensitive subject. Whether data is stored or sent, encryption is a key component for the anonymity and protection of information [1]. Encryption falls into two broad categories: symmetric and asymmetric. Asymmetric systems need two keys, one for encrypting and one for decrypting, whereas symmetric systems use a single key for both encryption and decryption. Identity-Based Encryption is a kind of asymmetric cryptography that is used to protect sensitive information. To simplify certificate handling, the message is encrypted with arbitrary strings. The bulk of our communications are carried out online with the aid of amazing technological progress. This is why encryption is more essential than ever before. In today's world, we rely heavily on technological communications. For commercial and communication reasons, we transmit data, movies, and images [1]. We communicate through internet services such as social networks and messaging applications. We send millions of emails and download large amounts of information. As a consequence, we place ourselves in a position of extreme vulnerability. If hackers or other attackers take over our digital communications, they may harm us. That is why we use a variety of cryptographic techniques to guarantee the security and integrity of our interactions [2]. Identity-Based Encryption is one of the most common types of encryption and provides comprehensive security for people and organizations. This article will examine in depth what identity-based encryption is and why many companies need it for security purposes.

Understanding DevOps & bridging the gap from continuous integration to continuous delivery

Article

Feb 2018

Ravi Teja Yarlagadda

DevOps is considered an emerging concept that various experts see as a strong solution for eliminating the traditional division and barriers that exist between operations and developers in many organizations today. As the transition to Agile has taken hold in the last few years, IT companies have begun to incorporate continuous integration concepts in their software development lifecycle, thereby improving the efficiency and reliability of the overall development process [1]. The biggest advantage of DevOps is its quick and regular product updates through continuous delivery of software. These features allow fast responses to customers' evolving needs. This is a huge benefit for businesses looking to gain a competitive advantage over other market players. Continuous delivery depends on efficient teamwork and automation to respond to various customer needs faster. It is now known that the time spent on formalizing and integrating processes alone does not accelerate delivery speed or promote overall organizational performance. Efficiency in an organization will be achieved only if the whole delivery chain functions smoothly [1]. This paper will focus on the concept of DevOps as well as how the gaps from continuous integration to continuous delivery can be bridged. The information covered will include how the aspects of DevOps apply to different SDLC phases in terms of customer needs and how to switch from continuous integration to continuous delivery, including the various benefits. It briefly discusses different factors one must think about before implementing DevOps, including the advantages that one might expect.

Implementation of DevOps in healthcare systems

Article

Jun 2017

Ravi Teja Yarlagadda

Organizations have a critical mission to transform their IT and company operations and to align IT operations with their strategic objectives. DevOps is a collection of techniques and strategies designed to allow production and IT teams to work together more closely. Nowadays, more and more companies are adopting DevOps because of the introduction of continuous delivery in software development domains. This paper discusses the steps that are now being taken in healthcare programs to implement DevOps methods and how they are helping them succeed in the U.S. Traditionally, IT functions were organized into distinct subunits that were quite independent. Following the recognition that joint, cross-functional DevOps practices are required to meet consumer expectations and handle progressively complicated IT architectures, many organizations and teams have started to embrace DevOps; these teams organize and streamline operations to connect functions, respond better to customers' evolving needs, and apply a continuous delivery methodology to product development [1]. This paper will focus on the DevOps progression, the whole concept of DevOps in healthcare, and its many advantages, which include meeting service and application goals while maintaining IT quality, effectiveness, and adaptability, and improving healthcare IT practices, all at the same time. Many healthcare organizations are increasingly acknowledging the significance of DevOps as they continue to implement data-driven programs and use cutting-edge technology to help serve patients and keep costs down. Since healthcare providers have to do whatever it takes to be successful, they turn to DevOps for ways to be on the cutting edge of the new digital medical practice, especially with IT [1]. Examples of DevOps being used in sectors such as finance and manufacturing, as well as in retail and consumer applications, are seen almost every day.
Nevertheless, healthcare is in a position to gain from DevOps implementation, with enough room to execute, if done correctly. Several entities in healthcare organizations are uncertain of where to begin, how to allocate their budget, how much it would cost, and how to become successful.

AI Economical Smart Device to Identify Covid-19 Pandemic, and Alert on Social Distancing WHO Measures

Article

May 2020

The COVID-19 pandemic has spread throughout the world and dramatically changed all facets of our everyday lives. The WHO and CDC expect substantial spikes in the rate of infection cases and deaths, although there are signs of hope due to the availability of vaccines that are currently being administered. The situation is still bad in many countries, and new ways of dealing with the pandemic will be significant. Inspired by the rapid application and development of AI and Big Data in different fields, this paper aims to underline their significance in responding to the COVID-19 outbreak and avoiding the severe consequences of the disease [1]. First, I will include a review of artificial intelligence, then describe its applications to combat COVID-19, and then come up with a smart device that will detect the disease and alert people to maintain social distance. Since the outbreak of the pandemic, various groups of researchers have been making rapid efforts to leverage a broad range of technology to tackle this threat worldwide [2]. This paper will offer new perspectives to epidemiologists and communities on the significance of AI and big data in helping to reduce the spread of the disease. The use of AI-linked smart devices during this crisis reduces the potential spread of COVID-19 to other patients through early detection and by alerting patients to maintain social distancing protocol. The main aim of this research paper is to come up with an economical artificial intelligence smart device that identifies COVID-19 and alerts people about social distancing per the WHO measures [3].

Security Automation in Information Technology

Article

Jun 2018

Security automation has been a major issue for many companies in the fight against rising cyber threats enabled by new cloud network attacks and the proliferating Internet of Things. A recent survey by the threat detection and hunting company Fidelis Cybersecurity revealed this trend among 300 CISOs, CIOs, CTOs, architects, engineers, and analysts across a range of industries. More than half of the professionals surveyed (57 percent) said that their companies are concerned with a lack of automation [21]. Cybersecurity automation is one of the developments in information technology. Automating human-driven, repeatable processes lets organizations and individuals focus on more productive problem-solving tasks. Focusing on these issues will foster innovation and contribute to a more robust organization from a cybersecurity point of view. Automation also adds to the complexity of information systems in an organization, and as malicious targets grow, cybersecurity initiatives must be prepared to implement automated cybersecurity solutions. As long as the information is available, the confidentiality, integrity, and availability of the cybersecurity programs must be safeguarded [2]. In most industries, automation is the main force of transition. By 2030, automation is expected to replace over 800 million workers, and technology is transforming our way of working, organizing, and communicating with others. The almost constant occurrence of data breaches suggests that the threat will not stop, so organizations cannot afford long-term reservations regarding security automation concepts and capabilities. Automating IT security infrastructure and keeping information systems safe is a priority [9]. Automating policy enforcement, alert control and prioritization, and incident preparation will increase the efficiency of businesses and reduce costs significantly.
By automating the analysis, response, and remediation of threats in their entirety, businesses can replicate the expertise and reasoning of seasoned cyber experts on an international basis, ensuring a greater overall degree of protection and compliance [7]. That is never the case today. For example, most organizations, due to the huge resources needed for performing audits, only audit a representative sample of their processes. If an organization has 50,000 similarly configured laptop computers, it is common practice to audit only a few of them, or to audit only the basic security configuration they all need to have. Given the audit tools required, these approaches are understandable. Methodologies have been developed over the years to safeguard data, but the complexity required to ensure security still has not changed. Without security automation, analysts need to address threats manually. This often involves investigating the issue, comparing it against the company's threat information to determine its validity, agreeing on a course of action, and then manually solving the problem, all with possibly millions of signals and often incomplete information [5]. Moreover, many of these tasks are repetitive. Analysts also waste valuable time on repeated tasks, which prevents them from identifying more critical problems. Security automation does a great deal for the information technology team. If an alert appears, it determines instantly whether an action, based on previous responses to similar incidents, is required, and if so, it can remedy the problem automatically [2]. Meanwhile, security analysts have more time to focus on strategic planning, threats, and more thorough research, which adds value to the company.

A Hybrid Approach for Detecting Automated Spammers in Twitter

Article

Apr 2018

Identity Deception Prevention Using Common Contribution Network Data

Article

Sep 2016

Michael Tsikerdekis

Identity deception in social media applications has negatively impacted online communities and is likely to increase as the social media user population grows. The ease of generating new accounts on social media has exacerbated the issue. Many previous studies have focused on verbal, non-verbal, and network data produced by users in an attempt to detect identity deception. However, although these methods produced high accuracy, they are mainly reactive to the issue of identity deception. This paper proposes a proactive approach that leverages social network data and focuses on identity deception prevention for online sub-communities, that is, communities that exist within larger communities (e.g., Facebook groups or Subreddits). The method can be applied to various types of social media applications and produces high accuracy in identifying deceptive accounts at the time of attempted entry to a sub-community. Performance results as well as limitations of the method are presented. A discussion follows on the possible implications of this study for social media applications, and future directions on deception prevention are proposed.

Thwarting Fake OSN Accounts by Predicting their Victims

Conference Paper

Oct 2015

Traditional defense mechanisms for fighting against automated fake accounts in online social networks are victim-agnostic. Even though victims of fake accounts play an important role in the viability of subsequent attacks, there is no work on utilizing this insight to improve the status quo. In this position paper, we take the first step and propose to incorporate predictions about victims of unknown fakes into the workflows of existing defense mechanisms. In particular, we investigated how such an integration could lead to more robust fake account defense mechanisms. We also used real-world datasets from Facebook and Tuenti to evaluate the feasibility of predicting victims of fake accounts using supervised machine learning.
