
Email Spam Filtering using Supervised Machine Learning Techniques

December 2010
International Journal on Computer Science and Engineering 2(9)


License
CC BY 4.0



V. Christina et al. / (IJCSE) International Journal on Computer Science and Engineering

Vol. 02, No. 09, 2010, 3126-3129


V. Christina, S. Karpagavalli, G. Suganya

M.Phil Research Scholar, Department of Computer Science (PG)
P.S.G.R Krishnammal College for Women
Senior Lecturer
GR Govindarajulu School of Applied Computer Technology

Abstract: E-mail spam, also known as unsolicited bulk e-mail (UBE), junk mail, or unsolicited commercial e-mail (UCE), is the practice of sending unwanted e-mail messages, frequently with commercial content, in large quantities to an indiscriminate set of recipients. Spam is prevalent on the Internet because the transaction cost of electronic communication is radically lower than that of any alternative form of communication. Many spam filters exist, using different approaches to identify an incoming message as spam: whitelists and blacklists, Bayesian analysis, keyword matching, mail header analysis, postage, legislation, and content scanning. Even so, we are still flooded with spam every day. This is not because the filters are not powerful enough; it is due to the swift adoption of new techniques by spammers and the inability of spam filters to adapt to the changes. In our work, we employed supervised machine learning techniques to filter e-mail spam. Three widely used supervised techniques, namely the C4.5 decision tree classifier, the Multilayer Perceptron, and the Naïve Bayes classifier, are used to learn the features of spam e-mails, and the models are built by training on known spam and legitimate e-mails. The results of the models are discussed.

Keywords: Spam, Spam filter, Spammer, Mail header, Machine learning, Classifier

I. INTRODUCTION

The Internet has become an integral part of everyday life, and e-mail has become a powerful tool for information exchange. Along with the growth of the Internet and e-mail, there has been a dramatic growth in spam in recent years. Spam can originate from any location across the globe where Internet access is available. Despite the development of anti-spam services and technologies, the number of spam messages continues to increase rapidly. To address this growing problem, each organization must analyze the tools available to determine how best to counter spam in its environment. Tools such as the corporate e-mail system, e-mail filtering gateways, contracted anti-spam services, and end-user training provide an important arsenal for any organization. However, users still cannot avoid having to deal with large amounts of spam on a regular basis. Without anti-spam measures, spam will inundate network systems, reduce employee productivity, consume bandwidth, and still be there tomorrow.

II. SPAM FILTER ARCHITECTURE AND METHODS

E-mail spam, also known as unsolicited bulk e-mail (UBE), junk mail, or unsolicited commercial e-mail (UCE), is the practice of sending unwanted e-mail messages, frequently with commercial content, in large quantities to an indiscriminate set of recipients. A technical definition of spam is: ‘An electronic message is "spam" if (A) the recipient's personal identity and context are irrelevant because the message is equally applicable to many other potential recipients; and (B) the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent’. The risk in filtering spam is that legitimate mail may sometimes be rejected or marked as spam. The risk in not filtering spam is that the constant flood of spam clogs networks and user inboxes, drains valuable resources such as bandwidth and storage capacity, reduces productivity, and interferes with the timely delivery of legitimate e-mail.

Spam filters can be implemented at every layer. A filter can sit at a firewall in front of the e-mail server, or at the MTA (Mail Transfer Agent) on the e-mail server itself, to provide an integrated anti-spam and anti-virus solution that protects e-mail at the network perimeter, before unwanted or potentially dangerous e-mail reaches the network. Spam filters can also be installed at the MDA (Mail Delivery Agent) level, as a service to all customers. At the e-mail client, users can have personalized spam filters that automatically filter mail according to criteria they choose. Figure 1 shows the typical architecture of a spam filter.

Several methods are used to identify incoming messages as spam: whitelists and blacklists, Bayesian analysis, mail header analysis, and keyword checking. A whitelist is a list of all addresses from which the user always wishes to receive mail. Users can add e-mail addresses, entire domains, or functional domains. An interesting option is an automatic whitelist management tool, which eliminates the need for administrators to enter approved addresses manually and ensures that mail from particular senders or domains is never flagged as spam.

ISSN : 0975-3397


The number of records in the automatic whitelist can be configured; when an overflow occurs, obsolete records are overwritten. A blacklist works as it does in comparable products: it is a list of addresses from which the user never wants to receive mail. Mail header checking consists of a set of rules; when a mail header matches one of them, the mail server returns the message. Typical rules catch messages that have a blank "From" field, that list many addresses from the same source in the "To" field, or that have too many digits in e-mail addresses (a fairly popular method of generating false addresses). Messages can also be returned by matching the language code declared in the header.
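The header rules described above can be illustrated with a minimal sketch built on Python's standard `email` package. The function name `header_flags` and the limits `MAX_RECIPIENTS` and `MAX_DIGITS` are hypothetical choices made for this example; the text does not specify concrete limits, and the "same source" condition on recipients is left out for brevity.

```python
import re
from email import message_from_string
from email.utils import getaddresses

MAX_RECIPIENTS = 10  # hypothetical limit on addresses in the "To" field
MAX_DIGITS = 5       # hypothetical limit on digits in the sender's local part


def header_flags(raw_message):
    """Return the names of the header rules that a raw message triggers."""
    msg = message_from_string(raw_message)
    flags = []

    # Rule: blank or missing "From" field.
    if not (msg.get("From") or "").strip():
        flags.append("blank_from")

    # Rule: a lot of addresses listed in the "To" field.
    recipients = getaddresses(msg.get_all("To", []))
    if len(recipients) > MAX_RECIPIENTS:
        flags.append("many_recipients")

    # Rule: too many digits in the sender's e-mail address.
    for _name, addr in getaddresses(msg.get_all("From", [])):
        local_part = addr.split("@")[0]
        if len(re.findall(r"\d", local_part)) > MAX_DIGITS:
            flags.append("digits_in_address")
            break

    return flags
```

In a deployed filter each flag would either cause the message to be returned or contribute to a score; the sketch only reports which rules matched.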

In Bayesian analysis, word probabilities (also known as likelihood functions) are used to compute the probability that an e-mail containing a particular set of words belongs to either category. This probability is called the posterior probability and is computed using Bayes' theorem. The e-mail's spam probability is then computed over all words in the e-mail, and if it exceeds a certain threshold, the filter marks the e-mail as spam.

Keyword checking is another widely used method of filtering spam. It works by scanning both the subject and the body of the e-mail. Using "conditions", i.e. combinations of keywords, is a good way to improve filtering efficiency. We can specify and update the combinations of words that must appear in a spam e-mail; all messages that include these words are blocked.
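The Bayesian step described above can be written out explicitly. The following is a sketch using S and L for the spam and legitimate classes; the threshold symbol t and the word-independence assumption used to combine the words w_1, ..., w_n are assumptions made here for illustration, not details given in the text.

```latex
P(S \mid w) = \frac{P(w \mid S)\,P(S)}{P(w \mid S)\,P(S) + P(w \mid L)\,P(L)}

P(S \mid w_1, \dots, w_n) =
  \frac{P(S)\prod_{i=1}^{n} P(w_i \mid S)}
       {P(S)\prod_{i=1}^{n} P(w_i \mid S) + P(L)\prod_{i=1}^{n} P(w_i \mid L)}

\text{mark the e-mail as spam if } P(S \mid w_1, \dots, w_n) > t
```

The first line is Bayes' theorem for a single word; the second combines all words of the e-mail under the independence assumption.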

III. METHODOLOGY

Most spam filtering techniques are based on text categorization methods, so spam filtering becomes a classification problem. In our work, rules are framed to extract a feature vector from each e-mail. Because the characteristics that discriminate spam from legitimate mail are not well defined, it is more convenient to apply machine learning techniques. Three machine learning algorithms, the C4.5 decision tree classifier, the Multilayer Perceptron, and the Naïve Bayes classifier, are used to learn the classification model.

A. Multilayer Perceptron

The Multilayer Perceptron (MLP) network is the most widely used neural network classifier. MLP networks are general-purpose, flexible, nonlinear models consisting of a number of units organised into multiple layers. The complexity of an MLP network can be changed by varying the number of layers and the number of units in each layer. It has been shown that, given enough hidden units and enough data, MLPs can approximate virtually any function to any desired accuracy; in other words, MLPs are universal approximators. MLPs are valuable tools for problems in which one has little or no knowledge about the form of the relationship between input vectors and their corresponding outputs.
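To make the role of the hidden layer concrete, the sketch below builds a tiny 2-2-1 network that computes XOR, a function that no single-layer (linear) unit can represent. The weights are hand-chosen for illustration rather than learned, and the names `layer` and `mlp_xor` are hypothetical; this is not the network configuration used in the experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: one sigmoid unit per row of weights."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-chosen (not learned) weights for a 2-2-1 network.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]  # unit 1 acts like OR, unit 2 like NAND
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]                  # output acts like AND of the hidden units
OUTPUT_B = [-30.0]


def mlp_xor(x1, x2):
    """Forward pass: inputs -> hidden layer -> single output unit."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]
```

Rounding `mlp_xor(a, b)` for the four binary input pairs gives the XOR truth table. In practice the weights are found by training (for example, backpropagation), which is what makes model building slower than for the other two classifiers.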

B. C4.5 Decision Tree Induction

Decision tree classification generates its output as a tree-like structure called a decision tree, in which each branch node represents a choice between a number of alternatives and each leaf node represents a classification or decision. A decision tree model contains rules to predict the target variable. The algorithm scales well, even with varying numbers of training examples and considerable numbers of attributes in large databases.

J48 is WEKA's implementation of the C4.5 decision tree learner. It uses a greedy technique to induce decision trees for classification. A decision tree model is built by analyzing training data, and the model is then used to classify unseen data. J48 generates decision trees whose nodes evaluate the existence or significance of individual features.
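The greedy step can be illustrated with the split criterion usually attributed to C4.5, the gain ratio (information gain divided by the split information). The sketch below is an assumption about how a node would rank candidate features, not a description of J48's internals; `entropy` and `gain_ratio` are hypothetical helper names.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Gain ratio obtained by splitting `labels` on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)

    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder

    # Split information penalises features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

At each node, a greedy learner would evaluate `gain_ratio` for every remaining feature, split on the best one, and recurse on the resulting subsets.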

C. Naïve Bayes Classification

The Naïve Bayes (NB) classifier is a simple but effective classifier that has been used in numerous information processing applications, including natural language processing and information retrieval. The technique is based on Bayes' theorem and is particularly suited to inputs of high dimensionality. Naïve Bayes classifiers assume that the effect of a variable's value on a given class is independent of the values of the other variables. The Naïve Bayes inducer computes the conditional probabilities of the classes given the instance and picks the class with the highest posterior. Depending on the precise nature of the probability model, Naïve Bayes classifiers can be trained very efficiently in a supervised learning setting.
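A minimal sketch of this procedure for binary feature vectors (such as rule outcomes) is given below. The function names `train_nb` and `predict_nb` are hypothetical, and the add-one (Laplace) smoothing is an assumption added to avoid zero probabilities; it is not stated in the text.

```python
import math


def train_nb(vectors, labels):
    """Estimate class priors and P(x_j = 1 | class) from binary vectors,
    using add-one (Laplace) smoothing."""
    model = {}
    n_features = len(vectors[0])
    for cls in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        prior = len(rows) / len(vectors)
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[cls] = (prior, probs)
    return model


def predict_nb(model, vector):
    """Return the class with the highest (log) posterior for `vector`."""
    def log_posterior(cls):
        prior, probs = model[cls]
        # Independence assumption: per-feature log-likelihoods simply add up.
        return math.log(prior) + sum(
            math.log(p if x else 1.0 - p) for p, x in zip(probs, vector))
    return max(model, key=log_posterior)
```

Training is a single counting pass over the data, which is consistent with the very short build time reported for Naïve Bayes in the experiments.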

IV. FEATURE EXTRACTION

The work is rule-based and uses a score-based system. The rules are framed by analyzing the mail header information, keyword matching, and the body of the message, and a relative score is assigned to each rule.


A number of rules are framed by considering the various features that help to identify spam messages effectively. Each rule performs a test on the e-mail, and each rule has a score. When an e-mail is processed, it is tested against each rule. For each rule found to be true for an e-mail, the score associated with the rule is added to the overall score for that e-mail. Once all the rules have been applied, the total score for the e-mail is compared to a threshold value. If the score exceeds the threshold, the e-mail is marked as spam; otherwise it is classified as legitimate mail. The rules used in this work are listed in Table I.

TABLE I
SCHEME OF RULES ASSIGNED TO EACH SPAM FEATURE

From name meaningful
From domain name
Blocked IP
Apostrophe in From name
From name in Auto Whitelist (AWL)
From address in User’s Block list
From address in User’s White list
Content Type
Content Boundary exists
To name meaningful
To address Undisclosed recipients
To header original
From address and To address same
Is subject present
Subject content has obfuscated words
Is forwarded message
Is reply message
Subject Reply without reference header
Is message body present
Sensual message
Repeated double quotes in body
Character set includes foreign language
More blank lines in body

Of these 23 rules, some are simple and some are associated with one another. A simple rule could search for the word ‘Viagra’ in the subject line of an e-mail, while a complex rule might compare an e-mail against an online database of spam. Each rule adds to the overall score, so an e-mail that triggers only one rule, for example through the use of the word ‘Viagra’, will not necessarily be marked as spam. However, if an e-mail triggers several rules, its combined score could exceed the threshold and the e-mail could be marked as spam.
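The scoring scheme described in this section can be sketched as follows. The rule names, scores, and threshold below are hypothetical placeholders, since the scores assigned in this work are not listed; only the mechanism (test every rule, sum the scores of the rules that fire, compare with a threshold) is taken from the text. Messages are represented as plain dictionaries for simplicity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    score: float
    test: Callable[[dict], bool]


# Hypothetical rules and scores, loosely modelled on entries of Table I.
RULES = [
    Rule("subject_missing", 1.5, lambda m: not m.get("subject")),
    Rule("keyword_in_subject", 2.0,
         lambda m: "viagra" in m.get("subject", "").lower()),
    Rule("undisclosed_recipients", 1.0,
         lambda m: "undisclosed" in m.get("to", "").lower()),
    Rule("from_equals_to", 1.0,
         lambda m: bool(m.get("from")) and m.get("from") == m.get("to")),
]
THRESHOLD = 2.5  # hypothetical threshold


def score_email(message):
    """Return (total score, label, binary feature vector) for a message."""
    features = [int(rule.test(message)) for rule in RULES]
    total = sum(rule.score for rule, hit in zip(RULES, features) if hit)
    label = "S" if total > THRESHOLD else "L"
    return total, label, features
```

With these placeholder values, a message that only mentions the keyword scores 2.0 and stays legitimate, while one that also hides its recipients scores 3.0 and is marked as spam. The binary vector returned alongside the score is the kind of feature vector that can be passed to a learned classifier.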

V. EXPERIMENT AND RESULTS

The e-mail spam filtering experiments were carried out using WEKA. WEKA is an open-source, portable, GUI-based workbench that provides a collection of state-of-the-art machine learning algorithms and data preprocessing tools.

The training dataset, a corpus of spam and legitimate messages, was generated from the mail we received from our institute's mail server over a period of six months. The mails were analyzed, and 23 rules were identified that greatly ease the process of classifying spam messages. The corpus consists of 750 spam messages and 750 legitimate messages. Feature vectors are extracted from the corpus by analyzing the message headers, keyword checking, whitelist/blacklist checking, etc. The class labels L and S represent legitimate and spam messages respectively.

The Naïve Bayes classifier, the C4.5 decision tree classifier, and the Multilayer Perceptron are trained on the dataset in the WEKA environment. Training is carried out with the feature vectors extracted by analyzing each message header, keyword checking, and whitelist/blacklist checking.

The performance of the trained models is evaluated using 10-fold cross-validation. Predictive accuracy is used as the performance measure for e-mail spam classification; it is measured as the ratio of the number of correctly classified instances in the test dataset to the total number of test cases.

In spam filtering, a false negative means that a spam mail is classified as legitimate and moved to the inbox. A false positive means that a legitimate e-mail is mistakenly identified as spam and moved to the spam folder or discarded. For most users, missing a legitimate e-mail is an order of magnitude worse than receiving spam, so the false positive rate of each classifier is also considered in measuring its performance.
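The evaluation procedure can be sketched in a few lines. This is an illustrative, unstratified k-fold split, not WEKA's own implementation, and the false positive rate is assumed here to be the share of legitimate messages that are predicted as spam; the names `k_fold_indices` and `accuracy_and_fp_rate` are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k roughly equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def accuracy_and_fp_rate(actual, predicted, spam="S", legit="L"):
    """Accuracy = correct / total; FP rate = legitimate marked spam / legitimate."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    legit_total = sum(a == legit for a in actual)
    false_pos = sum(a == legit and p == spam
                    for a, p in zip(actual, predicted))
    accuracy = correct / len(actual)
    fp_rate = false_pos / legit_total if legit_total else 0.0
    return accuracy, fp_rate
```

For a corpus of 1500 messages, each of the ten folds holds out 150 messages for testing and trains on the remaining 1350, so every message is used for testing exactly once.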

The performance of the classifiers is summarized in Table II and shown in Fig. 2 and Fig. 3.

TABLE II
COMPARATIVE RESULTS OF THE CLASSIFIERS

Evaluation Criteria              Naïve Bayes   J48     MLP
Training time (secs)             0.15          0.20    138.05
Correctly Classified Instances   1479          1449    1490
Prediction Accuracy (%)          98.6          96.6    99.3
False Positive (%)               5             4       1

The performance of the three models was evaluated on three criteria: prediction accuracy, learning time, and false positive rate. The Multilayer Perceptron predicts better than the other algorithms.

Fig. 2 Classification Accuracy (bar chart of accuracy in % for Naïve Bayes, J48, and MLP: 98.6, 96.6, and 99.3 respectively)


Fig. 3 Learning Time of the Models (bar chart of build time in seconds for Naïve Bayes, J48, and MLP: 0.15, 0.2, and 138.05 respectively)

The Multilayer Perceptron, the neural network classifier, takes more time to build its model. Naïve Bayes, the probabilistic classifier, and the decision tree model learn more rapidly on the given dataset.

VI. CONCLUSION

Although many e-mail spam filtering tools exist, the persistence of spammers and their adoption of new techniques make e-mail spam filtering a challenging problem for researchers. In our work, we generated a corpus of spam and legitimate messages from recent mail and employed machine learning techniques to build the models. The performance of the models was evaluated using 10-fold cross-validation, and we observed that the Multilayer Perceptron classifier outperforms the other classifiers and that its false positive rate is also very low compared to the other algorithms. E-mail spam filters using this approach can be adopted at either the mail server or the mail client to reduce the amount of spam and the associated productivity loss and bandwidth and storage usage.


An Optimized Approach For Detection and Classification of Spam Email’s Using Ensemble Methods

Preprint

Full-text available

Sep 2022

Since the advent of email services, spam emails are a major concern because users’ security depends on the classification of emails as ham or spam. It’s a malware attack that has been used for spear phishing, whaling, clone phishing, website forgery, and other harmful activities. However, various ensemble Machine Learning (ML) algorithms used for the detection and filtering of spam emails have been less explored. In this research, we offer a ML based optimized algorithm for detecting spam emails that have been enhanced using Hyper-parameter tuning approaches. The proposed approach uses two feature extraction modules, namely Count-Vectorizer and TFIDF-Vectorizer that provide the most effective classification results when we applied them to three different publicly available email data sets: Ling Spam, UCI SMS Spam, and Proposed dataset. Moreover, to extend the performance of classifiers we used various ML methods such as Naive Bayes (NB), Logistic Regression (LR), Extra Tree, Stochastic Gradient Descent (SGD), XG-Boost, Support Vector Machine (SVM), Random Forest (RF), Multi Layer Perception (MLP), and parameter optimization approaches such as Manual search, Random search, Grid search, and Genetic algorithm. For all three data sets, the SGD outperformed other algorithms. All of the other ensembles (Extra Tree, RF), linear models (LR, Linear-SVC), and MLP performed admirably, with relatively high precision, recall, accuracies and F1-Score.

Enhancing Spam Email Classification Using Effective Preprocessing Strategies and Optimal Machine Learning Algorithms

Preprint

Full-text available

Oct 2023

This paper proposes a content-based spam email classification by applying various text preprocessing techniques like Stopping, Stemming, and Lemmatization. NLP techniques have been applied to preprocessing techniques to avoid loss of information while preprocessing. Machine learning algorithms like Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF) are applied to classify an email as ham or spam. The pre-processing technique used in this model has greatly enhanced the classification results. Two standard datasets Enron and SpamAssassin are used to evaluate the performance of the models. On the Enron dataset, it scored an impressive accuracy of 98.3%, and on the SpamAssassin dataset, it provided an even greater accuracy of 99.2%. These outcomes demonstrate the efficacy of preprocessing techniques for the classification of spam emails. Further, the proposed model's accuracy was validated using a dataset of personal emails sourced from Yahoo mailbox. The Yahoo inbuilt classifier offered an accuracy of 89%, however, the proposed models provided a staggering 97% accuracy on the personal email dataset. The experiment on the personal email dataset indicates the model's suitability for real-world email contexts, indicating its potential effectiveness in spam email categorization.

Enhancing email security: A hybrid machine learning approach for spam and malware detection

Article

Full-text available

May 2024

Recent research indicates a notable surge in SMS spam, posing as entities aiming to deceive individuals into divulging private account or identity details, commonly termed "phishing" or "email spam". Conventional spam filters struggle to adequately identify these malicious emails, leading to challenges for both consumers and businesses engaged in online transactions. Addressing this issue presents a significant learning challenge. While initially appearing as a straightforward text classification problem, the classification process is complicated by the striking similarity between spam and legitimate emails. In this study, we introduce a novel method named "filter" designed specifically for detecting deceptive SMS spam. By incorporating features tailored to expose the deceptive techniques employed to dupe users, we achieved an accurate classification rate of over 99.01% for SMS spam emails, while maintaining a low false positive rate. These results were attained using a dataset comprising 746 instances of spam and 4822 instances of legitimate emails. The filter's accuracy, evaluated on a dataset with two attributes and 5568 instances, notably surpasses existing methodologies. Our proposed model, a Hybrid NB-ANN model, achieves the highest accuracy at 99.01%, outperforming both Naïve Bayes (98.57%) and Artificial Neural Network (98.12%). This highlights the efficacy of the hybrid approach in enhancing accuracy for email spam detection and malware filtering, ensuring comprehensive coverage across training and test datasets for improved feedback loops.

Classification of Spam E-mail based on Naïve Bayes Classification Model

Article

Full-text available

Apr 2023

Shaopeng Cheng

With the rising number of spam email, the need of more sufficient antispam filter is surging. Phishing attack can lead to extremely large losses of companies and individual, even more than 1 billion dollars in one year. This paper investigates and combines Naïve Bayes Classification and clustering algorithm in the application of identifying spam emails. With sample emails to create a dynamic dictionary containing most frequent words in spam and normal emails, this distribution of spam filter will provide a stricter method to prevent spam emails than those methods used in mail companies, e.g., Google, Yahoo, and Outlook.com. Besides, this paper also compares several algorithms used today in classifying spams and the future techniques of deep learning and machine learning’s application in classifying spam emails. According to the analysis, Google’s algorithm has the most comprehensive function, but such algorithm has less strict rule than Yahoo’s. Outlook.com, as a combination of Microsoft application, it has a unique algorithm for encrypting and filtering spams. Overall, these results shed light on guiding further exploration of both comprehensive and strict rule for classifying spams.

Spam Classification Based on Machine Learning Algorithm

Article

Full-text available

Feb 2023

Yichun Huang

As technology continues to advance, email is used in almost every field. However, the dramatic increase in the number of spam emails has led to a growing need for accurate and powerful spam classifiers. Unfortunately, spam in its many forms is constantly being updated as the Internet evolves, and the challenge of fighting spam is enormous. With the emerging of deep learning methods, deep based algorithms are widely applied in most of the real-world scene and tasks. In this paper, we used CNN, RNN, LSTM and naïve Bayes models to implement spam filtering, and compared them based on information such as accuracy, model strengths and weaknesses. We test all the models on the public dataset and measure them by four metrics, including accuracy, precision, recall and F1-score. Finally, we conclude that the naïve bayes model achieves the best performance among all those methods, which can deal with the threat of spam efficiently.

Predicting Online Job Recruitment Fraudulent Using Machine Learning

Article

Mar 2023

Biman Barua

Employing individuals via the Internet has been a boon for businesses in the modern day. It is much simpler and more convenient than traditional recruitment methods. However, several scammers are abusing this platform, which may result in financial and privacy loss for job seekers and damage to the reputable organisation's name. In this research, we proposed a technique for detecting Online Recruitment Fraud (ORF). This model uses a publicly available dataset containing 17,780 job postings. We apply the four classification models to determine which classification model performs best for our suggested model. In this model, we use decision trees, random forests, Naive Bayes and logistic regression methods. We have estimated and evaluated the accuracy of several prediction systems. The random forest classifier provides the greatest accuracy, 97.16%, on our dataset. We have endeavoured to develop a method for detecting bogus recruiting postings.

Strengthening Cybersecurity: A Comparative Study of KNN and Random Forest for Spam Detection

Chapter

Mar 2024

At present, email has become a necessary communication medium for exchanging messages and is considered an important part of business, commerce, government, education, entertainment, and other fields in various countries. As it is known that everything has its pros and cons, in spite of benefitting society everybody have to face its drawbacks, which include email spamming. Email spam or junk emails are unwanted emails which are annoying as well as dangerous, containing links trying to corrupt the computer system, stealing your bank details, and stealing your identity with the help of Botnet or real humans, fraudulent access to the information through data breaches. Spam filtering algorithms detect undesired, infected mail and block these messages from reaching to user’s inboxes and keeping their email servers safe from getting overloaded. These are adaptable and provide sustainability to all the spam detected and provide security to the emails. It is important to make the network and system to be free from spammers, malicious links, and viruses. In this research, it has been demonstrated that the K-Nearest Neighbor algorithm by storing the training data and the class labels and is used for classification as well as regression and Random Forest containing more than one decision tree on different subsets improves the accuracy and achieve high accuracy rates, often above 95% by classifying email into spam and ham.

Machine Learning Approaches for Text Mining and Spam E-mail Filtering: Industry 4.0 Perspective

Chapter

Aug 2023

Artificial Intelligence and Data Science in Recommendation System: Current Trends, Technologies and Applications captures the state of the art in usage of artificial intelligence in different types of recommendation systems and predictive analysis. The book provides guidelines and case studies for application of artificial intelligence in recommendation from expert researchers and practitioners. A detailed analysis of the relevant theoretical and practical aspects, current trends and future directions is presented. The book highlights many use cases for recommendation systems: - Basic application of machine learning and deep learning in recommendation process and the evaluation metrics - Machine learning techniques for text mining and spam email filtering considering the perspective of Industry 4.0 - Tensor factorization in different types of recommendation system - Ranking framework and topic modeling to recommend author specialization based on content. - Movie recommendation systems - Point of interest recommendations - Mobile tourism recommendation systems for visually disabled persons - Automation of fashion retail outlets - Human resource management (employee assessment and interview screening) This reference is essential reading for students, faculty members, researchers and industry professionals seeking insight into the working and design of recommendation systems.

Cloud-IoT Technologies in Society 5.0

Book

Apr 2023

At present, we live in a self-motivated and dynamic global society where technologies and challenges are unexpectedly changing overnight. These rapid changes in globalization and technological advances are creating new market forces every day. Therefore, day-to-day innovation is essential for any business or institution to survive and flourish in such an atmosphere. Though, innovation is no longer just to create value to do good to individuals, societies, or organizations. The utmost purpose of innovation is to create a smart futuristic society where people can enjoy the best quality of life using natural resources and manmade technologies including cloud-IoT technologies, and industry4.0. Hence, the innovators and their innovations must search for intelligent solutions to tackle major socio-technical problems and remove barriers of rural, urban and smart city societies. This book provides in-depth knowledge in the areas of convergence of cloud-IoT technologies and industry 4.0 with society 5.0, machine-to-machine communication, machine-to-person communication, techno-psychological perspective of society 5.0, sentiment analysis of smart digital societies, multi-access edge computing for 5G networks, discovery & location reporting of multi-access edge enabled clients/servers, m-health systems, enhancing the concert of M-health technologies in smart societies, supervising communication services in smart societies, life quality enhancement in smart city societies, multiple disease infection predictions, and societal opinion mining algorithms for smart cities societies using cloud-IoT integrated intelligent machine / deep learning technologies to the readers in the distributive environment. 
In this book, the authors have mandatorily discussed the implementation of cloud-IoT-based machine learning technologies like clustering technique, Naïve Bayes classifier, artificial neural network (ANN), Firefly algorithm, Rough set classifiers, support vector machine classifier, decision tree classifier, ensemble classifier, random forest, and deep learning algorithms to analyze the behavior of intelligent machines and human habits using automated data scheduling and smart digital networks. Smart digitization and the intelligent implementation of manufacturing development processes are the necessities for today’s rural, urban, and smart city industries. All types of industries including development, manufacturing, and research are presently shifting from bunch production to customized production. The fast advancements in manufacturing technologies have an in-depth impact on all types of societies including societies of rural areas, urban areas, and smart cities. Industry 4.0 includes the Internet of Things (IoT), Industrial Internet, Smart Manufacturing, Cloud-based computing, and Manufacturing Technologies. The objective of this book is to establish a linkage between the Industry 4.0 components and various rural, urban & smart city societies (including society 5.0) to bring actual prosperity where human values, peace of mind, human relations, man-machine-relations, and calmness will have utmost preference. These objectives can be achieved by the integration of human societal values, and social opinion mining (SOM) approaches with the existing technologies.

Societal Opinion Mining Using Machine Intelligence

Chapter

Apr 2023

The rapid growth in the volume of social media messages has created a pressing need for more reliable and robust techniques to mine social opinion from social media messages related to a particular domain. Recent machine learning techniques can be applied successfully to this task. In this chapter, the authors present a critical review of different machine learning approaches for social opinion mining (SOM) and compare their performance. The chapter brings together several aspects of SOM: essential concepts, implementation details, and the efficiency of mining the opinion of society from social media messages using machine learning techniques. The basic discussion and theoretical background explore the application of machine learning strategies to SOM. The chapter also discusses the usual opinion mining process, the attempts made by different researchers to tackle the pertinent problems of SOM using social media and machine learning techniques, and the challenges associated with this task. Mathematical models of social opinion mining are proposed based on fuzzy concepts and linear algebra. Further, the chapter gives the algorithmic details of ten machine learning techniques for SOM. Finally, experiments were carried out for these ten machine learning techniques using Google embeddings and Amazon embeddings on two datasets, and it was observed that convolutional neural network (CNN) based deep learning outperforms the other machine learning techniques for SOM.

Keywords: Opinion mining; Social opinion mining; Machine learning; Social network; Word embeddings

Data Mining: Practical Machine Learning Tools and Techniques

Chapter

Nov 2010

An Overview of Content-Based Spam Filtering Techniques.

Article

Oct 2007

Ahmed Khorsi

So fast, so cheap, so efficient: the Internet is now incontestably the communication means of choice for personal, business, and academic purposes. Unfortunately, the Internet does not have only this attractive face. Malicious activities also benefit from this fast, cheap, and efficient medium. In the last decade, Internet worms took the spotlight. In recent years, spam has been invading one of the most used Internet services: email. This paper summarizes most of the techniques used to filter spam by analyzing email content.

SpamAssassin: A Practical Guide to Integration and Configuration

Jan 2004

Alistair McDonald

Alistair McDonald, "SpamAssassin: A Practical Guide to Integration and Configuration", 1st Edition, Packt Publishers, 2004.
